goplt/docs/content/architecture/system-behavior.md
0x1d 8c90075086
fix: correct Mermaid graph syntax for endpoint labels with slashes
Mermaid graphs require node labels with special characters like forward
slashes to be quoted. Changed /healthz and /metrics from square bracket
format to quoted string format to fix the lexical error.
2025-11-05 21:26:17 +01:00

System Behavior Overview

Purpose

This document provides a high-level explanation of how the Go Platform behaves end-to-end, focusing on system-level operations, flows, and interactions rather than implementation details.

Overview

The Go Platform is a microservices-based system where each module operates as an independent service. Services communicate via gRPC (primary) or HTTP (fallback), share infrastructure components (PostgreSQL, Redis, Kafka), and are orchestrated through service discovery and dependency injection.

Key Concepts

  • Services: Independent processes that can be deployed and scaled separately
  • Service Clients: Abstraction layer for inter-service communication
  • Service Registry: Central registry for service discovery
  • Event Bus: Asynchronous communication channel for events
  • DI Container: Dependency injection container managing service lifecycle

Application Bootstrap Sequence

The platform starts in a fixed order so that each component is initialized before anything that depends on it, and the service is registered for discovery before the servers start.

sequenceDiagram
    participant Main
    participant Config
    participant Logger
    participant DI
    participant Registry
    participant ModuleLoader
    participant ServiceRegistry
    participant HTTP
    participant gRPC
    
    Main->>Config: Load configuration
    Config-->>Main: Config ready
    
    Main->>Logger: Initialize logger
    Logger-->>Main: Logger ready
    
    Main->>DI: Create DI container
    DI->>DI: Register core services
    DI-->>Main: DI container ready
    
    Main->>ModuleLoader: Discover modules
    ModuleLoader->>ModuleLoader: Scan module directories
    ModuleLoader->>ModuleLoader: Load module.yaml files
    ModuleLoader-->>Main: Module list
    
    Main->>Registry: Register modules
    Registry->>Registry: Resolve dependencies
    Registry->>Registry: Order modules
    Registry-->>Main: Ordered modules
    
    loop For each module
        Main->>Module: Initialize module
        Module->>DI: Register services
        Module->>Registry: Register routes
        Module->>Registry: Register migrations
    end
    
    Main->>Registry: Run migrations
    Registry->>Registry: Execute in dependency order
    
    Main->>ServiceRegistry: Register service
    ServiceRegistry->>ServiceRegistry: Register with Consul/K8s
    ServiceRegistry-->>Main: Service registered
    
    Main->>gRPC: Start gRPC server
    Main->>HTTP: Start HTTP server
    HTTP-->>Main: Server ready
    gRPC-->>Main: Server ready
    
    Main->>DI: Start lifecycle
    DI->>DI: Execute OnStart hooks
    DI-->>Main: All services started

Bootstrap Phases

  1. Configuration Loading: Load YAML files, environment variables, and secrets
  2. Foundation Services: Initialize logger, config provider, DI container
  3. Module Discovery: Scan and load module manifests
  4. Dependency Resolution: Build dependency graph and order modules
  5. Module Initialization: Initialize each module in dependency order
  6. Database Migrations: Run migrations in dependency order
  7. Service Registration: Register service with service registry
  8. Server Startup: Start gRPC and HTTP servers
  9. Lifecycle Hooks: Execute OnStart hooks for all services
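
Phase 4 (Dependency Resolution) can be illustrated with a topological sort. The following is a minimal sketch, not the platform's actual loader: the function name `orderModules`, the map-based input, and the module names are all assumptions made for illustration. It uses Kahn's algorithm and reports unknown dependencies and cycles as errors.

```go
package main

import (
	"fmt"
	"sort"
)

// orderModules returns module names so that each module comes after all of
// its dependencies (Kahn's algorithm). deps maps a module name to the names
// it depends on. Unknown dependencies and cycles are reported as errors.
// The name and signature are hypothetical, chosen for this sketch.
func orderModules(deps map[string][]string) ([]string, error) {
	remaining := make(map[string]int, len(deps)) // unmet dependency count
	dependents := make(map[string][]string)      // dependency -> modules waiting on it
	for mod, ds := range deps {
		remaining[mod] = len(ds)
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("module %q depends on unknown module %q", mod, d)
			}
			dependents[d] = append(dependents[d], mod)
		}
	}

	var ready []string
	for mod, n := range remaining {
		if n == 0 {
			ready = append(ready, mod)
		}
	}

	order := make([]string, 0, len(deps))
	for len(ready) > 0 {
		sort.Strings(ready) // keep the result deterministic
		mod := ready[0]
		ready = ready[1:]
		order = append(order, mod)
		for _, waiting := range dependents[mod] {
			remaining[waiting]--
			if remaining[waiting] == 0 {
				ready = append(ready, waiting)
			}
		}
	}
	if len(order) != len(deps) {
		return nil, fmt.Errorf("dependency cycle among modules")
	}
	return order, nil
}

func main() {
	order, err := orderModules(map[string][]string{
		"core":    nil,
		"auth":    {"core"},
		"audit":   {"core"},
		"billing": {"auth", "core"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // [core audit auth billing]
}
```

The same order could then drive both module initialization (phase 5) and migrations (phase 6), since both are described as running in dependency order.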

Request Processing Pipeline

Every HTTP request flows through a standardized pipeline that ensures security, observability, and proper error handling.

graph TD
    Start([HTTP Request]) --> Auth[Authentication Middleware]
    Auth -->|Valid Token| Authz[Authorization Middleware]
    Auth -->|Invalid Token| Error1[401 Unauthorized]
    
    Authz -->|Authorized| RateLimit[Rate Limiting]
    Authz -->|Not Authorized| Error2[403 Forbidden]
    
    RateLimit -->|Within Limits| Tracing[OpenTelemetry Tracing]
    RateLimit -->|Rate Limited| Error3[429 Too Many Requests]
    
    Tracing --> Handler[Request Handler]
    Handler --> Service[Domain Service]
    
    Service --> Cache{Cache Check}
    Cache -->|Hit| Return[Return Cached Data]
    Cache -->|Miss| Repo[Repository]
    
    Repo --> DB[(Database)]
    DB --> Repo
    Repo --> Service
    Service --> CacheStore[Update Cache]
    
    Service --> EventBus[Publish Events]
    Service --> Audit[Audit Logging]
    Service --> Metrics[Update Metrics]
    
    Service --> Handler
    Handler --> Tracing
    Tracing --> Response[HTTP Response]
    
    Error1 --> Response
    Error2 --> Response
    Error3 --> Response
    Return --> Response
    
    style Auth fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style Authz fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style Service fill:#50c878,stroke:#2e7d4e,stroke-width:2px,color:#fff

Request Processing Stages

  1. Authentication: Extract and validate JWT token, add user to context
  2. Authorization: Check user permissions for requested resource
  3. Rate Limiting: Enforce per-user and per-IP rate limits
  4. Tracing: Start/continue distributed trace
  5. Handler Processing: Execute request handler
  6. Service Logic: Execute business logic
  7. Data Access: Check the cache first; on a miss, query the database and update the cache
  8. Side Effects: Publish events, audit logs, update metrics
  9. Response: Return HTTP response with tracing context
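
The first three stages can be sketched as a chain of `net/http` middleware. This is an illustrative sketch under stated assumptions, not the platform's middleware: the fixed bearer token stands in for JWT validation, the `/admin` rule stands in for a real permission check, and the rate limiter is a simple total-request budget rather than a per-user or per-IP limiter. Tracing is omitted. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// middleware wraps a handler with one pipeline stage.
type middleware func(http.Handler) http.Handler

// chain applies the stages so that the first one listed runs first.
func chain(h http.Handler, stages ...middleware) http.Handler {
	for i := len(stages) - 1; i >= 0; i-- {
		h = stages[i](h)
	}
	return h
}

// authenticate returns 401 unless the placeholder token is present.
// Assumption: a real implementation would validate a JWT here.
func authenticate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer demo-token" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// authorize returns 403 for /admin. Assumption: placeholder permission rule.
func authorize(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/admin" {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// rateLimit returns 429 once the request budget is used up.
func rateLimit(budget int) middleware {
	var mu sync.Mutex
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			mu.Lock()
			allowed := budget > 0
			if allowed {
				budget--
			}
			mu.Unlock()
			if !allowed {
				http.Error(w, "too many requests", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// newPipeline builds the staged handler around a trivial request handler.
func newPipeline(budget int) http.Handler {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	return chain(handler, authenticate, authorize, rateLimit(budget))
}

// statusFor sends one in-memory request and returns the response status.
func statusFor(h http.Handler, path, token string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newPipeline(1)
	fmt.Println(statusFor(h, "/items", ""))           // 401
	fmt.Println(statusFor(h, "/admin", "demo-token")) // 403
	fmt.Println(statusFor(h, "/items", "demo-token")) // 200
	fmt.Println(statusFor(h, "/items", "demo-token")) // 429
}
```

Because each stage returns early on failure, rejected requests never reach later stages, which is why the 401 and 403 responses above do not consume the rate-limit budget.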

Event-Driven Interactions

The platform uses an event bus for asynchronous communication between services, enabling loose coupling and scalability.

sequenceDiagram
    participant Publisher
    participant EventBus
    participant Kafka
    participant Subscriber1
    participant Subscriber2
    
    Publisher->>EventBus: Publish(event)
    EventBus->>EventBus: Serialize event
    EventBus->>EventBus: Add metadata (trace_id, user_id)
    EventBus->>Kafka: Send to topic
    Kafka-->>EventBus: Acknowledged
    
    Kafka->>Subscriber1: Deliver event
    Kafka->>Subscriber2: Deliver event
    
    Subscriber1->>Subscriber1: Process event
    Subscriber1->>Subscriber1: Update state
    Subscriber1->>Subscriber1: Emit new events (optional)
    
    Subscriber2->>Subscriber2: Process event
    Subscriber2->>Subscriber2: Update state
    
    Note over Subscriber1,Subscriber2: Events processed asynchronously

Event Processing Flow

  1. Event Publishing: Service publishes event to event bus
  2. Event Serialization: Event is serialized with metadata
  3. Event Distribution: Event bus distributes to Kafka topic
  4. Event Consumption: Subscribers consume events from Kafka
  5. Event Processing: Each subscriber processes event independently
  6. State Updates: Subscribers update their own state
  7. Cascade Events: Subscribers may publish new events

Background Job Processing

Background jobs are scheduled and processed asynchronously, enabling long-running tasks and scheduled operations.

sequenceDiagram
    participant Scheduler
    participant JobQueue
    participant Worker
    participant Service
    participant DB
    participant EventBus
    
    Scheduler->>JobQueue: Enqueue job
    JobQueue->>JobQueue: Store job definition
    
    Worker->>JobQueue: Poll for jobs
    JobQueue-->>Worker: Job definition
    
    Worker->>Worker: Start job execution
    Worker->>Service: Execute job logic
    Service->>DB: Update data
    Service->>EventBus: Publish events
    
    Service-->>Worker: Job complete
    Worker->>JobQueue: Mark job complete
    
    alt Job fails
        Worker->>JobQueue: Mark job failed
        JobQueue->>JobQueue: Schedule retry
    end

Background Job Flow

  1. Job Scheduling: Jobs scheduled via cron or programmatically
  2. Job Enqueueing: Job definition stored in job queue
  3. Job Polling: Workers poll queue for available jobs
  4. Job Execution: Worker executes job logic
  5. Job Completion: Job marked as complete or failed
  6. Job Retry: Failed jobs retried with exponential backoff
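
Step 6 mentions exponential backoff. A minimal sketch of that retry policy follows. The base delay, cap, attempt limit, and function names are assumptions chosen for illustration, since the document does not state the platform's actual values. Jitter is left out to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical policy values; the platform's real settings are not documented here.
const (
	baseDelay   = time.Second
	maxDelay    = 30 * time.Second
	maxAttempts = 5
)

var errTransient = errors.New("transient failure")

// retryDelay returns the wait after failed attempt number `attempt`
// (0-based): baseDelay doubled per attempt, capped at maxDelay.
func retryDelay(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// noSleep skips waiting; useful for demos and tests.
func noSleep(time.Duration) {}

// runWithRetry runs job until it succeeds or maxAttempts is reached,
// calling sleep between attempts. It returns the number of attempts made.
func runWithRetry(job func() error, sleep func(time.Duration)) (int, error) {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = job(); err == nil {
			return attempt + 1, nil
		}
		if attempt < maxAttempts-1 {
			sleep(retryDelay(attempt))
		}
	}
	return maxAttempts, fmt.Errorf("job failed after %d attempts: %w", maxAttempts, err)
}

func main() {
	for attempt := 0; attempt <= 5; attempt++ {
		fmt.Println(attempt, retryDelay(attempt)) // 1s, 2s, 4s, 8s, 16s, 30s
	}

	calls := 0
	attempts, err := runWithRetry(func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}, noSleep)
	fmt.Println(attempts, err) // 3 <nil>
}
```

In a worker, `time.Sleep` would be passed instead of `noSleep`. In a queue-based design the delay would more likely be stored with the job as its next run time, so that a waiting retry does not occupy a worker.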

Error Recovery and Resilience

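The platform handles errors in layers: middleware recovers panics, errors are classified as business or system errors, and each one is routed through a central error bus to logging, Sentry, and metrics before the client receives a response.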

graph TD
    Error[Error Occurs] --> Handler{Error Handler}
    
    Handler -->|Business Error| BusinessError[Business Error Handler]
    Handler -->|System Error| SystemError[System Error Handler]
    Handler -->|Panic| PanicHandler[Panic Recovery]
    
    BusinessError --> ErrorBus[Error Bus]
    SystemError --> ErrorBus
    PanicHandler --> ErrorBus
    
    ErrorBus --> Logger[Logger]
    ErrorBus --> Sentry[Sentry]
    ErrorBus --> Metrics[Metrics]
    
    BusinessError --> Response[HTTP Response]
    SystemError --> Response
    PanicHandler --> Response
    
    Response --> Client[Client]
    
    style Error fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style ErrorBus fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff

Error Handling Layers

  1. Panic Recovery: Middleware catches panics and prevents crashes
  2. Error Classification: Errors classified as business or system errors
  3. Error Bus: Central error bus collects all errors
  4. Error Logging: Errors logged with full context
  5. Error Reporting: Critical errors reported to Sentry
  6. Error Metrics: Errors tracked in metrics
  7. Error Response: Appropriate HTTP response returned
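
The layers above can be sketched as a single handler adapter. This is an assumption-laden illustration, not the platform's error package: the `BusinessError` type, the `report` struct, and the buffered channel standing in for the error bus are all hypothetical, and nothing here talks to Sentry or a metrics backend.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// BusinessError marks an expected, caller-facing failure (hypothetical type).
type BusinessError struct {
	Status  int
	Message string
}

func (e *BusinessError) Error() string { return e.Message }

// report is what gets sent to the error bus.
type report struct {
	kind string // "business", "system", or "panic"
	err  error
}

// appHandler returns an error instead of writing the error response itself.
type appHandler func(w http.ResponseWriter, r *http.Request) error

// handle recovers panics, classifies returned errors, reports them to the
// bus, and writes the matching HTTP response. The channel stands in for the
// fan-out to logger, Sentry, and metrics; a full channel would block here.
func handle(bus chan<- report, h appHandler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				bus <- report{kind: "panic", err: fmt.Errorf("panic: %v", rec)}
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		err := h(w, r)
		if err == nil {
			return
		}
		var be *BusinessError
		if errors.As(err, &be) {
			bus <- report{kind: "business", err: err}
			http.Error(w, be.Message, be.Status)
			return
		}
		// System errors are reported in full but hidden from the client.
		bus <- report{kind: "system", err: err}
		http.Error(w, "internal server error", http.StatusInternalServerError)
	})
}

// Sample handlers, one per error class.
func notFound(w http.ResponseWriter, r *http.Request) error {
	return &BusinessError{Status: http.StatusNotFound, Message: "order not found"}
}

func dbDown(w http.ResponseWriter, r *http.Request) error {
	return errors.New("database connection refused")
}

func crashes(w http.ResponseWriter, r *http.Request) error {
	panic("unexpected state")
}

// statusOf sends one in-memory request and returns the response status.
func statusOf(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	bus := make(chan report, 3)
	for _, h := range []appHandler{notFound, dbDown, crashes} {
		status := statusOf(handle(bus, h))
		rep := <-bus
		fmt.Println(status, rep.kind, rep.err)
	}
}
```

Business errors carry their own status and message to the client, while system errors and panics are collapsed to a generic 500 so internal details do not leak.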

System Shutdown Sequence

The platform implements graceful shutdown to ensure data consistency and proper resource cleanup.

sequenceDiagram
    participant Signal
    participant Main
    participant HTTP
    participant gRPC
    participant ServiceRegistry
    participant DI
    participant Workers
    participant DB
    
    Signal->>Main: SIGTERM/SIGINT
    Main->>HTTP: Stop accepting requests
    HTTP->>HTTP: Wait for active requests
    HTTP-->>Main: HTTP server stopped
    
    Main->>gRPC: Stop accepting connections
    gRPC->>gRPC: Wait for active calls
    gRPC-->>Main: gRPC server stopped
    
    Main->>ServiceRegistry: Deregister service
    ServiceRegistry->>ServiceRegistry: Remove from registry
    ServiceRegistry-->>Main: Service deregistered
    
    Main->>Workers: Stop workers
    Workers->>Workers: Finish current jobs
    Workers-->>Main: Workers stopped
    
    Main->>DI: Stop lifecycle
    DI->>DI: Execute OnStop hooks
    DI->>DI: Close connections
    DI->>DB: Close DB connections
    DI-->>Main: Services stopped
    
    Main->>Main: Exit

Shutdown Phases

  1. Signal Reception: Receive SIGTERM or SIGINT
  2. Stop Accepting Requests: HTTP and gRPC servers stop accepting new requests
  3. Wait for Active Requests: Wait for in-flight requests to complete
  4. Service Deregistration: Remove service from service registry
  5. Worker Shutdown: Stop background workers gracefully
  6. Lifecycle Hooks: Execute OnStop hooks for all services
  7. Resource Cleanup: Close database connections, release resources
  8. Application Exit: Exit application cleanly
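
The shutdown phases can be sketched as an ordered list of stop steps run under one deadline. This is a minimal sketch under stated assumptions: only the HTTP step uses a real call (`http.Server.Shutdown`), while the gRPC, registry, worker, and lifecycle steps are no-op placeholders. The `step` type, the helper names, and the five-second timeout are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// Hypothetical overall deadline for the whole shutdown sequence.
const shutdownTimeout = 5 * time.Second

// step is one named shutdown phase.
type step struct {
	name string
	stop func(ctx context.Context) error
}

// noop is a placeholder for a phase this sketch does not implement.
func noop(name string) step {
	return step{name: name, stop: func(context.Context) error { return nil }}
}

// failing is a placeholder phase that always reports an error.
func failing(name string) step {
	return step{name: name, stop: func(context.Context) error { return errors.New("stop failed") }}
}

// shutdown runs the steps in order under a shared deadline. A failed step is
// recorded but does not prevent later cleanup. It returns the names of the
// steps that completed and the joined errors of those that did not.
func shutdown(timeout time.Duration, steps []step) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	var done []string
	var errs []error
	for _, s := range steps {
		if err := s.stop(ctx); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
			continue
		}
		done = append(done, s.name)
	}
	return done, errors.Join(errs...)
}

func main() {
	srv := &http.Server{Addr: "127.0.0.1:0"}
	go func() { _ = srv.ListenAndServe() }()

	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()
	select {
	case <-ctx.Done(): // SIGINT or SIGTERM received
	case <-time.After(200 * time.Millisecond): // demo only: do not wait forever
	}

	done, err := shutdown(shutdownTimeout, []step{
		{name: "http", stop: srv.Shutdown}, // stops accepting, waits for in-flight requests
		noop("grpc"),
		noop("service-registry"),
		noop("workers"),
		noop("lifecycle-hooks"),
	})
	fmt.Println(done, err)
}
```

Continuing past a failed step is a design choice in this sketch: it favors releasing as many resources as possible over stopping at the first error. The shared deadline bounds the total time so an orchestrator's kill timeout is not exceeded.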

Health Check and Monitoring Flow

Health checks and metrics provide visibility into system health and performance.

graph TD
    HealthEndpoint["/healthz"] --> HealthRegistry[Health Registry]
    HealthRegistry --> CheckDB[Check Database]
    HealthRegistry --> CheckCache[Check Cache]
    HealthRegistry --> CheckEventBus[Check Event Bus]
    
    CheckDB -->|Result| Aggregate[Aggregate Results]
    CheckCache -->|Result| Aggregate
    CheckEventBus -->|Result| Aggregate
    
    Aggregate -->|All Healthy| Response200[200 OK]
    Aggregate -->|Unhealthy| Response503[503 Service Unavailable]
    
    MetricsEndpoint["/metrics"] --> MetricsRegistry[Metrics Registry]
    MetricsRegistry --> Prometheus[Prometheus Format]
    Prometheus --> ResponseMetrics[Metrics Response]
    
    style HealthRegistry fill:#50c878,stroke:#2e7d4e,stroke-width:2px,color:#fff
    style MetricsRegistry fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff

Health Check Components

  • Liveness Check: Service is running (process health)
  • Readiness Check: Service is ready to accept requests (dependency health)
  • Dependency Checks: Database, cache, event bus connectivity
  • Metrics Collection: Request counts, durations, error rates
  • Metrics Export: Prometheus-formatted metrics
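
The aggregation behind `/healthz` can be sketched as a handler that runs every dependency check and returns 200 only if all of them pass. This is an illustrative sketch, not the platform's health registry: the `check` type, the helper names, and the JSON response shape are assumptions. The metrics endpoint is not covered.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// check probes one dependency and returns nil when it is healthy.
type check func(ctx context.Context) error

// healthHandler runs every check and responds 200 if all pass, 503 otherwise.
// The body maps each check name to "ok" or its error message.
func healthHandler(checks map[string]check) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		results := make(map[string]string, len(checks))
		status := http.StatusOK
		for name, c := range checks {
			if err := c(r.Context()); err != nil {
				results[name] = err.Error()
				status = http.StatusServiceUnavailable
				continue
			}
			results[name] = "ok"
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		_ = json.NewEncoder(w).Encode(results)
	})
}

// healthy and unhealthy are stand-ins for real database, cache, and
// event bus probes.
func healthy() check {
	return func(context.Context) error { return nil }
}

func unhealthy(msg string) check {
	return func(context.Context) error { return errors.New(msg) }
}

// probe sends one in-memory request to /healthz and returns the status.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	allUp := healthHandler(map[string]check{
		"database":  healthy(),
		"cache":     healthy(),
		"event_bus": healthy(),
	})
	cacheDown := healthHandler(map[string]check{
		"database": healthy(),
		"cache":    unhealthy("connection refused"),
	})
	fmt.Println(probe(allUp))     // 200
	fmt.Println(probe(cacheDown)) // 503
}
```

A handler like this corresponds to the readiness check, since it depends on external services. A liveness check would typically skip the dependency probes and only confirm that the process can answer.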

Integration Points

This system behavior integrates with: