System Behavior Overview

Purpose

This document provides a high-level explanation of how the Go Platform behaves end-to-end, focusing on system-level operations, flows, and interactions rather than implementation details.

Overview

The Go Platform is a microservices-based system where each service is independently deployable from day one. Services communicate via gRPC (primary) or HTTP (fallback) through service clients, share infrastructure components (a PostgreSQL instance, Redis, Kafka), and are orchestrated through service discovery and dependency injection. All external traffic enters the platform through the API Gateway.

Key Concepts

  • Services: Independent processes that can be deployed and scaled separately
  • Service Clients: Abstraction layer for inter-service communication
  • Service Registry: Central registry for service discovery
  • Event Bus: Asynchronous communication channel for events
  • DI Container: Dependency injection container managing service lifecycle
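
These concepts map onto a handful of small Go abstractions. The sketch below is illustrative only; the interface names and method signatures (ServiceClient, ServiceRegistry, EventBus) are assumptions made for this document, not the platform's actual API.

```go
package platform

import "context"

// ServiceClient abstracts inter-service communication so callers do not care
// whether the transport is gRPC (primary) or HTTP (fallback).
type ServiceClient interface {
	// Call invokes a named method on the remote service and decodes the reply into resp.
	Call(ctx context.Context, method string, req, resp any) error
	Close() error
}

// ServiceRegistry is the discovery abstraction backed by Consul or Kubernetes.
type ServiceRegistry interface {
	Register(ctx context.Context, name, addr string) error
	Deregister(ctx context.Context, name string) error
	Resolve(ctx context.Context, name string) (addr string, err error)
}

// EventBus carries asynchronous events between services over Kafka.
type EventBus interface {
	Publish(ctx context.Context, topic string, event []byte) error
	Subscribe(ctx context.Context, topic string, handle func(event []byte) error) error
}
```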

Service Bootstrap Sequence

Each service (API Gateway, Auth, Identity, Authz, Audit, and feature services) follows the same well-defined startup sequence and bootstraps independently of the others.

Individual Service Startup

sequenceDiagram
    participant Main
    participant Config
    participant Logger
    participant DI
    participant ServiceImpl
    participant ServiceRegistry
    participant DB
    participant HTTP
    participant gRPC
    
    Main->>Config: Load configuration
    Config-->>Main: Config ready
    
    Main->>Logger: Initialize logger
    Logger-->>Main: Logger ready
    
    Main->>DI: Create DI container
    DI->>DI: Register core kernel services
    DI-->>Main: DI container ready
    
    Main->>ServiceImpl: Register service implementation
    ServiceImpl->>DI: Register service dependencies
    ServiceImpl->>DB: Connect to database
    DB-->>ServiceImpl: Connection ready
    
    Main->>DB: Run migrations
    DB-->>Main: Migrations complete
    
    Main->>ServiceRegistry: Register service
    ServiceRegistry->>ServiceRegistry: Register with Consul/K8s
    ServiceRegistry-->>Main: Service registered
    
    Main->>gRPC: Start gRPC server
    Main->>HTTP: Start HTTP server (if needed)
    HTTP-->>Main: HTTP server ready
    gRPC-->>Main: gRPC server ready
    
    Main->>DI: Start lifecycle
    DI->>DI: Execute OnStart hooks
    DI-->>Main: Service started

Platform Startup (All Services)

sequenceDiagram
    participant Docker
    participant Gateway
    participant AuthSvc
    participant IdentitySvc
    participant AuthzSvc
    participant AuditSvc
    participant BlogSvc
    participant Registry
    participant DB
    
    Docker->>DB: Start PostgreSQL
    Docker->>Registry: Start Consul
    DB-->>Docker: Database ready
    Registry-->>Docker: Registry ready
    
    par Service Startup (in parallel)
        Docker->>Gateway: Start API Gateway
        Gateway->>Registry: Register
        Gateway->>Gateway: Start HTTP server
        Gateway-->>Docker: Gateway ready
    and
        Docker->>AuthSvc: Start Auth Service
        AuthSvc->>DB: Connect
        AuthSvc->>Registry: Register
        AuthSvc->>AuthSvc: Start gRPC server
        AuthSvc-->>Docker: Auth Service ready
    and
        Docker->>IdentitySvc: Start Identity Service
        IdentitySvc->>DB: Connect
        IdentitySvc->>Registry: Register
        IdentitySvc->>IdentitySvc: Start gRPC server
        IdentitySvc-->>Docker: Identity Service ready
    and
        Docker->>AuthzSvc: Start Authz Service
        AuthzSvc->>DB: Connect
        AuthzSvc->>Registry: Register
        AuthzSvc->>AuthzSvc: Start gRPC server
        AuthzSvc-->>Docker: Authz Service ready
    and
        Docker->>AuditSvc: Start Audit Service
        AuditSvc->>DB: Connect
        AuditSvc->>Registry: Register
        AuditSvc->>AuditSvc: Start gRPC server
        AuditSvc-->>Docker: Audit Service ready
    and
        Docker->>BlogSvc: Start Blog Service
        BlogSvc->>DB: Connect
        BlogSvc->>Registry: Register
        BlogSvc->>BlogSvc: Start gRPC server
        BlogSvc-->>Docker: Blog Service ready
    end
    
    Docker->>Docker: All services ready

Service Bootstrap Phases (Per Service)

  1. Configuration Loading: Load YAML files, environment variables, and secrets
  2. Foundation Services: Initialize core kernel (logger, config, DI container)
  3. Database Connection: Connect to the database using the service's own connection pool
  4. Service Implementation: Register service-specific implementations
  5. Database Migrations: Run service-specific migrations
  6. Service Registration: Register service with service registry
  7. Server Startup: Start gRPC server (and HTTP if needed)
  8. Lifecycle Hooks: Execute OnStart hooks
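
A condensed `main` for a single service might look like the following. This is a minimal sketch using only the standard library plus google.golang.org/grpc; the real platform wires these phases through its DI container, and the database, migration, and registration steps are indicated only as comments.

```go
// Sketch of a single service's bootstrap, following the phases above.
// Config keys and defaults are illustrative, not the platform's real ones.
package main

import (
	"context"
	"log/slog"
	"net"
	"os"
	"os/signal"
	"syscall"

	"google.golang.org/grpc"
)

func main() {
	// 1. Configuration loading (env vars stand in for YAML files + secrets here).
	grpcAddr := os.Getenv("GRPC_ADDR")
	if grpcAddr == "" {
		grpcAddr = ":9090"
	}

	// 2. Foundation services: structured logger.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	// 3-5. Database connection, service-specific wiring, and migrations would
	//      run here, each aborting startup on error.

	// 6. Service registration with Consul/K8s would follow, once the service
	//    knows the address it is reachable on.

	// 7. Server startup: gRPC first, HTTP only if the service exposes one.
	lis, err := net.Listen("tcp", grpcAddr)
	if err != nil {
		logger.Error("listen failed", "err", err)
		os.Exit(1)
	}
	srv := grpc.NewServer()
	go func() {
		if err := srv.Serve(lis); err != nil {
			logger.Error("grpc serve failed", "err", err)
		}
	}()
	logger.Info("service started", "addr", grpcAddr)

	// 8. Lifecycle: block until shutdown is requested, then stop gracefully.
	<-ctx.Done()
	srv.GracefulStop()
}
```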

Platform Startup Order

  1. Infrastructure: Start PostgreSQL, Redis, Kafka, Consul
  2. Core Services: Start Auth, Identity, Authz, Audit services (can start in parallel)
  3. API Gateway: Start API Gateway (depends on service registry)
  4. Feature Services: Start Blog, Billing, etc. (can start in parallel)
  5. Health Checks: All services report healthy to registry

Request Processing Pipeline

Every external HTTP request enters through the API Gateway first and is then routed to backend services. The pipeline ensures security, observability, and proper error handling.

graph TD
    Start([HTTP Request]) --> Gateway[API Gateway]
    Gateway --> RateLimit[Rate Limiting]
    RateLimit -->|Allowed| Auth[Validate JWT via Auth Service]
    RateLimit -->|Exceeded| Error0[429 Too Many Requests]
    
    Auth -->|Valid Token| Authz[Check Permission via Authz Service]
    Auth -->|Invalid Token| Error1[401 Unauthorized]
    
    Authz -->|Authorized| Tracing[OpenTelemetry Tracing]
    Authz -->|Unauthorized| Error2[403 Forbidden]
    
    Tracing --> Handler[Request Handler]
    Handler --> Service[Domain Service]
    
    Service --> Cache{Cache Check}
    Cache -->|Hit| Return[Return Cached Data]
    Cache -->|Miss| Repo[Repository]
    
    Repo --> DB[(Database)]
    DB --> Repo
    Repo --> Service
    Service --> CacheStore[Update Cache]
    
    Service --> EventBus[Publish Events]
    Service --> Audit[Audit Logging]
    Service --> Metrics[Update Metrics]
    
    Service --> Handler
    Handler --> Tracing
    Tracing --> Response[HTTP Response]
    
    Error1 --> Response
    Error2 --> Response
    Error0 --> Response
    Return --> Response
    
    style Auth fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style Authz fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style Service fill:#50c878,stroke:#2e7d4e,stroke-width:2px,color:#fff

Request Processing Stages

  1. Rate Limiting: Enforce per-user and per-IP rate limits at the API Gateway
  2. Authentication: Extract and validate the JWT via the Auth Service, add user to context
  3. Authorization: Check user permissions for the requested resource via the Authz Service
  4. Tracing: Start/continue distributed trace
  5. Handler Processing: Execute request handler
  6. Service Logic: Execute business logic
  7. Data Access: Query database or cache
  8. Side Effects: Publish events, audit logs, update metrics
  9. Response: Return HTTP response with tracing context
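
The gateway-side ordering of these stages can be expressed as a middleware chain. The sketch below is hypothetical: the middleware constructors (rateLimit, authenticate, authorize, trace) are stand-ins for the real gateway middleware, shown here only to make the ordering explicit.

```go
package main

import "net/http"

type middleware func(http.Handler) http.Handler

// chain applies middlewares so that the first one listed runs first.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

func main() {
	proxy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Route the request to the backend service resolved via the registry.
		w.WriteHeader(http.StatusBadGateway) // placeholder backend
	})

	handler := chain(proxy,
		rateLimit,    // 429 when per-user/per-IP limits are exceeded
		authenticate, // 401 when the JWT is missing or invalid
		authorize,    // 403 when the Authz Service denies the permission
		trace,        // start/continue the distributed trace
	)
	_ = http.ListenAndServe(":8080", handler)
}

// The stubs below only demonstrate the shape of each middleware.
func rateLimit(next http.Handler) http.Handler    { return next }
func authenticate(next http.Handler) http.Handler { return next }
func authorize(next http.Handler) http.Handler    { return next }
func trace(next http.Handler) http.Handler        { return next }
```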

Event-Driven Interactions

The platform uses an event bus for asynchronous communication between services, enabling loose coupling and scalability.

sequenceDiagram
    participant Publisher
    participant EventBus
    participant Kafka
    participant Subscriber1
    participant Subscriber2
    
    Publisher->>EventBus: Publish(event)
    EventBus->>EventBus: Serialize event
    EventBus->>EventBus: Add metadata (trace_id, user_id)
    EventBus->>Kafka: Send to topic
    Kafka-->>EventBus: Acknowledged
    
    Kafka->>Subscriber1: Deliver event
    Kafka->>Subscriber2: Deliver event
    
    Subscriber1->>Subscriber1: Process event
    Subscriber1->>Subscriber1: Update state
    Subscriber1->>Subscriber1: Emit new events (optional)
    
    Subscriber2->>Subscriber2: Process event
    Subscriber2->>Subscriber2: Update state
    
    Note over Subscriber1,Subscriber2: Events processed asynchronously

Event Processing Flow

  1. Event Publishing: Service publishes event to event bus
  2. Event Serialization: Event is serialized with metadata
  3. Event Distribution: Event bus distributes to Kafka topic
  4. Event Consumption: Subscribers consume events from Kafka
  5. Event Processing: Each subscriber processes event independently
  6. State Updates: Subscribers update their own state
  7. Cascade Events: Subscribers may publish new events
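
The publish side of this flow amounts to enveloping the payload with metadata before handing it to the Kafka producer. The sketch below assumes a hypothetical Producer interface and Envelope shape; it is not the platform's actual event contract.

```go
package events

import (
	"context"
	"encoding/json"
	"time"
)

// Producer is satisfied by whatever Kafka client the platform uses.
type Producer interface {
	Send(ctx context.Context, topic string, key, value []byte) error
}

// Envelope wraps the payload with the metadata the diagram mentions.
type Envelope struct {
	Type       string          `json:"type"`
	TraceID    string          `json:"trace_id"`
	UserID     string          `json:"user_id,omitempty"`
	OccurredAt time.Time       `json:"occurred_at"`
	Payload    json.RawMessage `json:"payload"`
}

// Publish serializes the event, attaches metadata, and hands it to the producer.
func Publish(ctx context.Context, p Producer, topic string, env Envelope) error {
	if env.OccurredAt.IsZero() {
		env.OccurredAt = time.Now().UTC()
	}
	b, err := json.Marshal(env)
	if err != nil {
		return err
	}
	// The event type doubles as the partition key so related events stay ordered.
	return p.Send(ctx, topic, []byte(env.Type), b)
}
```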

Background Job Processing

Background jobs are enqueued and processed asynchronously, supporting both long-running tasks and scheduled operations.

sequenceDiagram
    participant Scheduler
    participant JobQueue
    participant Worker
    participant Service
    participant DB
    participant EventBus
    
    Scheduler->>JobQueue: Enqueue job
    JobQueue->>JobQueue: Store job definition
    
    Worker->>JobQueue: Poll for jobs
    JobQueue-->>Worker: Job definition
    
    Worker->>Worker: Start job execution
    Worker->>Service: Execute job logic
    Service->>DB: Update data
    Service->>EventBus: Publish events
    
    Service-->>Worker: Job complete
    Worker->>JobQueue: Mark job complete
    
    alt Job fails
        Worker->>JobQueue: Mark job failed
        JobQueue->>JobQueue: Schedule retry
    end

Background Job Flow

  1. Job Scheduling: Jobs scheduled via cron or programmatically
  2. Job Enqueueing: Job definition stored in job queue
  3. Job Polling: Workers poll queue for available jobs
  4. Job Execution: Worker executes job logic
  5. Job Completion: Job marked as complete or failed
  6. Job Retry: Failed jobs retried with exponential backoff
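
A worker's poll/execute/retry loop can be sketched as follows, assuming hypothetical Job and JobQueue types; the exponential backoff values are illustrative.

```go
package worker

import (
	"context"
	"time"
)

type Job struct {
	ID      string
	Attempt int
}

type JobQueue interface {
	// Dequeue blocks until a job is available or the context is cancelled.
	Dequeue(ctx context.Context) (Job, error)
	Complete(ctx context.Context, id string) error
	// Retry re-enqueues the job to run again after the given delay.
	Retry(ctx context.Context, id string, after time.Duration) error
}

// Run polls the queue and executes jobs until ctx is cancelled.
func Run(ctx context.Context, q JobQueue, execute func(context.Context, Job) error) {
	for {
		job, err := q.Dequeue(ctx)
		if err != nil {
			return // queue closed or context cancelled
		}
		if err := execute(ctx, job); err != nil {
			// Exponential backoff: 1s, 2s, 4s, ... capped at 256s.
			attempt := job.Attempt
			if attempt > 8 {
				attempt = 8
			}
			_ = q.Retry(ctx, job.ID, time.Second<<uint(attempt))
			continue
		}
		_ = q.Complete(ctx, job.ID)
	}
}
```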

Error Recovery and Resilience

The platform implements multiple layers of error handling to ensure system resilience.

graph TD
    Error[Error Occurs] --> Handler{Error Handler}
    
    Handler -->|Business Error| BusinessError[Business Error Handler]
    Handler -->|System Error| SystemError[System Error Handler]
    Handler -->|Panic| PanicHandler[Panic Recovery]
    
    BusinessError --> ErrorBus[Error Bus]
    SystemError --> ErrorBus
    PanicHandler --> ErrorBus
    
    ErrorBus --> Logger[Logger]
    ErrorBus --> Sentry[Sentry]
    ErrorBus --> Metrics[Metrics]
    
    BusinessError --> Response[HTTP Response]
    SystemError --> Response
    PanicHandler --> Response
    
    Response --> Client[Client]
    
    style Error fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style ErrorBus fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff

Error Handling Layers

  1. Panic Recovery: Middleware catches panics and prevents crashes
  2. Error Classification: Errors classified as business or system errors
  3. Error Bus: Central error bus collects all errors
  4. Error Logging: Errors logged with full context
  5. Error Reporting: Critical errors reported to Sentry
  6. Error Metrics: Errors tracked in metrics
  7. Error Response: Appropriate HTTP response returned
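
A minimal version of the panic-recovery and classification layers is sketched below. BusinessError, Report, and the status-code mapping are assumptions made for illustration; the real error bus fans out to the logger, Sentry, and metrics as described above.

```go
package errs

import (
	"errors"
	"fmt"
	"log/slog"
	"net/http"
)

// BusinessError represents an expected domain failure with a safe client message.
type BusinessError struct {
	Code    int
	Message string
}

func (e *BusinessError) Error() string { return e.Message }

// Report stands in for the error bus fan-out to logger, Sentry, and metrics.
func Report(err error) {
	slog.Error("error reported", "err", err)
	// Sentry capture and a metrics counter increment would go here.
}

// Recover is middleware that turns panics into 500 responses instead of crashes.
func Recover(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				Report(fmt.Errorf("panic: %v", rec))
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// Respond maps classified errors onto HTTP responses.
func Respond(w http.ResponseWriter, err error) {
	Report(err)
	var be *BusinessError
	if errors.As(err, &be) {
		http.Error(w, be.Message, be.Code) // e.g. 400/404/409
		return
	}
	http.Error(w, "internal server error", http.StatusInternalServerError)
}
```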

System Shutdown Sequence

The platform implements graceful shutdown to ensure data consistency and proper resource cleanup.

sequenceDiagram
    participant Signal
    participant Main
    participant HTTP
    participant gRPC
    participant ServiceRegistry
    participant DI
    participant Workers
    participant DB
    
    Signal->>Main: SIGTERM/SIGINT
    Main->>HTTP: Stop accepting requests
    HTTP->>HTTP: Wait for active requests
    HTTP-->>Main: HTTP server stopped
    
    Main->>gRPC: Stop accepting connections
    gRPC->>gRPC: Wait for active calls
    gRPC-->>Main: gRPC server stopped
    
    Main->>ServiceRegistry: Deregister service
    ServiceRegistry->>ServiceRegistry: Remove from registry
    ServiceRegistry-->>Main: Service deregistered
    
    Main->>Workers: Stop workers
    Workers->>Workers: Finish current jobs
    Workers-->>Main: Workers stopped
    
    Main->>DI: Stop lifecycle
    DI->>DI: Execute OnStop hooks
    DI->>DI: Close connections
    DI->>DB: Close DB connections
    DI-->>Main: Services stopped
    
    Main->>Main: Exit

Shutdown Phases

  1. Signal Reception: Receive SIGTERM or SIGINT
  2. Stop Accepting Requests: HTTP and gRPC servers stop accepting new requests
  3. Wait for Active Requests: Wait for in-flight requests to complete
  4. Service Deregistration: Remove service from service registry
  5. Worker Shutdown: Stop background workers gracefully
  6. Lifecycle Hooks: Execute OnStop hooks for all services
  7. Resource Cleanup: Close database connections, release resources
  8. Application Exit: Exit application cleanly
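
In Go, this ordering typically hangs off signal.NotifyContext plus the servers' graceful-stop methods. The sketch below assumes injected deregister and stopWorkers callbacks and a 30-second drain timeout; both are illustrative choices, not the platform's configured values.

```go
package lifecycle

import (
	"context"
	"net/http"
	"os/signal"
	"syscall"
	"time"

	"google.golang.org/grpc"
)

// WaitAndShutdown blocks until a termination signal arrives, then shuts the
// service down in the order described above.
func WaitAndShutdown(httpSrv *http.Server, grpcSrv *grpc.Server,
	deregister func(context.Context) error, stopWorkers func()) {

	// 1. Wait for SIGTERM/SIGINT.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()
	<-ctx.Done()

	// 2-3. Stop accepting new requests and wait (bounded) for in-flight ones.
	drainCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	_ = httpSrv.Shutdown(drainCtx)
	grpcSrv.GracefulStop()

	// 4. Deregister from the service registry so no new traffic is routed here.
	_ = deregister(drainCtx)

	// 5. Let background workers finish their current jobs.
	stopWorkers()

	// 6-8. OnStop hooks, closing DB connections, and process exit follow in main.
}
```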

Health Check and Monitoring Flow

Health checks and metrics provide visibility into system health and performance.

graph TD
    HealthEndpoint["/healthz"] --> HealthRegistry[Health Registry]
    HealthRegistry --> CheckDB[Check Database]
    HealthRegistry --> CheckCache[Check Cache]
    HealthRegistry --> CheckEventBus[Check Event Bus]
    
    CheckDB -->|Healthy| Aggregate[Aggregate Results]
    CheckCache -->|Healthy| Aggregate
    CheckEventBus -->|Healthy| Aggregate
    
    Aggregate -->|All Healthy| Response200[200 OK]
    Aggregate -->|Unhealthy| Response503[503 Service Unavailable]
    
    MetricsEndpoint["/metrics"] --> MetricsRegistry[Metrics Registry]
    MetricsRegistry --> Prometheus[Prometheus Format]
    Prometheus --> ResponseMetrics[Metrics Response]
    
    style HealthRegistry fill:#50c878,stroke:#2e7d4e,stroke-width:2px,color:#fff
    style MetricsRegistry fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff

Health Check Components

  • Liveness Check: Service is running (process health)
  • Readiness Check: Service is ready to accept requests (dependency health)
  • Dependency Checks: Database, cache, event bus connectivity
  • Metrics Collection: Request counts, durations, error rates
  • Metrics Export: Prometheus-formatted metrics
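
A /healthz handler that aggregates dependency checks might look like the sketch below; the Check type and the 2-second timeout are assumptions, and real checks would ping the database, cache, and event bus.

```go
package health

import (
	"context"
	"encoding/json"
	"net/http"
	"time"
)

// Check probes a single dependency and returns an error if it is unhealthy.
type Check func(ctx context.Context) error

// Handler runs every registered check and returns 200 only if all pass.
func Handler(checks map[string]Check) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		status := http.StatusOK
		results := make(map[string]string, len(checks))
		for name, check := range checks {
			if err := check(ctx); err != nil {
				results[name] = err.Error()
				status = http.StatusServiceUnavailable
			} else {
				results[name] = "ok"
			}
		}

		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		_ = json.NewEncoder(w).Encode(results)
	})
}
```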

Integration Points

This system behavior integrates with: