Transform all documentation from modular monolith to true microservices
architecture where core services are independently deployable.
Key Changes:
- Core Kernel: Infrastructure only (no business logic)
- Core Services: Auth, Identity, Authz, Audit as separate microservices
  - Each service has own entry point (cmd/{service}/)
  - Each service has own gRPC server and database schema
  - Services register with Consul for service discovery
- API Gateway: Moved from Epic 8 to Epic 1 as core infrastructure
  - Single entry point for all external traffic
  - Handles routing, JWT validation, rate limiting, CORS
- Service Discovery: Consul as primary mechanism (ADR-0033)
- Database Pattern: Per-service connections with schema isolation
Documentation Updates:
- Updated all 9 architecture documents
- Updated 4 ADRs and created 2 new ADRs (API Gateway, Service Discovery)
- Rewrote Epic 1: Core Kernel & Infrastructure (infrastructure only)
- Rewrote Epic 2: Core Services (Auth, Identity, Authz, Audit as services)
- Updated Epic 3-8 stories for service architecture
- Updated plan.md, playbook.md, requirements.md, index.md
- Updated all epic READMEs and story files
New ADRs:
- ADR-0032: API Gateway Strategy
- ADR-0033: Service Discovery Implementation (Consul)
New Stories:
- Epic 1.7: Service Client Interfaces
- Epic 1.8: API Gateway Implementation
# System Behavior Overview

## Purpose

This document provides a high-level explanation of how the Go Platform behaves end-to-end, focusing on system-level operations, flows, and interactions rather than implementation details.

## Overview

The Go Platform is a microservices-based system where each service is independently deployable from day one. Services communicate via gRPC (primary) or HTTP (fallback) through service clients, share infrastructure components (PostgreSQL instance, Redis, Kafka), and are orchestrated through service discovery and dependency injection. All external traffic enters through the API Gateway.

## Key Concepts

- **Services**: Independent processes that can be deployed and scaled separately
- **Service Clients**: Abstraction layer for inter-service communication
- **Service Registry**: Central registry for service discovery
- **Event Bus**: Asynchronous communication channel for events
- **DI Container**: Dependency injection container managing service lifecycle

## Service Bootstrap Sequence

Each service (API Gateway, Auth, Identity, Authz, Audit, and feature services) follows a well-defined startup sequence. Services bootstrap independently.

### Individual Service Startup

```mermaid
sequenceDiagram
    participant Main
    participant Config
    participant Logger
    participant DI
    participant ServiceImpl
    participant ServiceRegistry
    participant DB
    participant HTTP
    participant gRPC

    Main->>Config: Load configuration
    Config-->>Main: Config ready

    Main->>Logger: Initialize logger
    Logger-->>Main: Logger ready

    Main->>DI: Create DI container
    DI->>DI: Register core kernel services
    DI-->>Main: DI container ready

    Main->>ServiceImpl: Register service implementation
    ServiceImpl->>DI: Register service dependencies
    ServiceImpl->>DB: Connect to database
    DB-->>ServiceImpl: Connection ready

    Main->>DB: Run migrations
    DB-->>Main: Migrations complete

    Main->>ServiceRegistry: Register service
    ServiceRegistry->>ServiceRegistry: Register with Consul/K8s
    ServiceRegistry-->>Main: Service registered

    Main->>gRPC: Start gRPC server
    Main->>HTTP: Start HTTP server (if needed)
    HTTP-->>Main: HTTP server ready
    gRPC-->>Main: gRPC server ready

    Main->>DI: Start lifecycle
    DI->>DI: Execute OnStart hooks
    DI-->>Main: Service started
```

### Platform Startup (All Services)

```mermaid
sequenceDiagram
    participant Docker
    participant Gateway
    participant AuthSvc
    participant IdentitySvc
    participant AuthzSvc
    participant AuditSvc
    participant BlogSvc
    participant Registry
    participant DB

    Docker->>DB: Start PostgreSQL
    Docker->>Registry: Start Consul
    DB-->>Docker: Database ready
    Registry-->>Docker: Registry ready

    par Service Startup (in parallel)
        Docker->>Gateway: Start API Gateway
        Gateway->>Registry: Register
        Gateway->>Gateway: Start HTTP server
        Gateway-->>Docker: Gateway ready
    and
        Docker->>AuthSvc: Start Auth Service
        AuthSvc->>DB: Connect
        AuthSvc->>Registry: Register
        AuthSvc->>AuthSvc: Start gRPC server
        AuthSvc-->>Docker: Auth Service ready
    and
        Docker->>IdentitySvc: Start Identity Service
        IdentitySvc->>DB: Connect
        IdentitySvc->>Registry: Register
        IdentitySvc->>IdentitySvc: Start gRPC server
        IdentitySvc-->>Docker: Identity Service ready
    and
        Docker->>AuthzSvc: Start Authz Service
        AuthzSvc->>DB: Connect
        AuthzSvc->>Registry: Register
        AuthzSvc->>AuthzSvc: Start gRPC server
        AuthzSvc-->>Docker: Authz Service ready
    and
        Docker->>AuditSvc: Start Audit Service
        AuditSvc->>DB: Connect
        AuditSvc->>Registry: Register
        AuditSvc->>AuditSvc: Start gRPC server
        AuditSvc-->>Docker: Audit Service ready
    and
        Docker->>BlogSvc: Start Blog Service
        BlogSvc->>DB: Connect
        BlogSvc->>Registry: Register
        BlogSvc->>BlogSvc: Start gRPC server
        BlogSvc-->>Docker: Blog Service ready
    end

    Docker->>Docker: All services ready
```

### Service Bootstrap Phases (Per Service)

1. **Configuration Loading**: Load YAML files, environment variables, and secrets
2. **Foundation Services**: Initialize core kernel (logger, config, DI container)
3. **Database Connection**: Connect to database with own connection pool
4. **Service Implementation**: Register service-specific implementations
5. **Database Migrations**: Run service-specific migrations
6. **Service Registration**: Register service with service registry
7. **Server Startup**: Start gRPC server (and HTTP if needed)
8. **Lifecycle Hooks**: Execute OnStart hooks

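
The phases above can be read as an ordered list of steps that stops at the first failure. The following Go sketch illustrates only that shape; `Phase`, `PhaseNames`, and `Bootstrap` are hypothetical names rather than the platform's actual API, and each `Run` function stands in for real initialization code.

```go
package main

import "fmt"

// Phase is one named bootstrap step. Run holds the initialization code
// for that step (hypothetical; supplied by the service).
type Phase struct {
	Name string
	Run  func() error
}

// PhaseNames lists the eight per-service phases in the documented order.
var PhaseNames = []string{
	"configuration loading",
	"foundation services",
	"database connection",
	"service implementation",
	"database migrations",
	"service registration",
	"server startup",
	"lifecycle hooks",
}

// Bootstrap runs the phases in order and stops at the first failure.
// It returns the names of the phases that completed successfully.
func Bootstrap(phases []Phase) ([]string, error) {
	completed := make([]string, 0, len(phases))
	for _, p := range phases {
		if p.Run == nil {
			return completed, fmt.Errorf("bootstrap phase %q has no Run function", p.Name)
		}
		if err := p.Run(); err != nil {
			return completed, fmt.Errorf("bootstrap phase %q failed: %w", p.Name, err)
		}
		completed = append(completed, p.Name)
	}
	return completed, nil
}
```

Stopping at the first failure means a service does not register itself or open its servers if an earlier step, such as migrations, fails.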
### Platform Startup Order

1. **Infrastructure**: Start PostgreSQL, Redis, Kafka, Consul
2. **Core Services**: Start Auth, Identity, Authz, Audit services (can start in parallel)
3. **API Gateway**: Start API Gateway (depends on service registry)
4. **Feature Services**: Start Blog, Billing, etc. (can start in parallel)
5. **Health Checks**: All services report healthy to registry

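
One way to picture this ordering: stages run one after another, while the components inside a stage start concurrently. The sketch below assumes a hypothetical `start` callback (for example, something that launches a container and waits until it is ready); `Stage`, `PlatformStages`, and `StartPlatform` are illustrative names, and the final health-check step is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Stage is a group of components that may start in parallel.
type Stage struct {
	Name       string
	Components []string
}

// PlatformStages mirrors the startup order listed above.
var PlatformStages = []Stage{
	{Name: "infrastructure", Components: []string{"postgresql", "redis", "kafka", "consul"}},
	{Name: "core services", Components: []string{"auth", "identity", "authz", "audit"}},
	{Name: "api gateway", Components: []string{"gateway"}},
	{Name: "feature services", Components: []string{"blog", "billing"}},
}

// StartPlatform starts the stages in order. Components inside a stage start
// concurrently; the next stage begins only after every component of the
// current stage has returned from start.
func StartPlatform(stages []Stage, start func(component string) error) error {
	for _, stage := range stages {
		var wg sync.WaitGroup
		errs := make([]error, len(stage.Components))
		for i, c := range stage.Components {
			wg.Add(1)
			go func(i int, c string) {
				defer wg.Done()
				errs[i] = start(c) // each goroutine writes only its own slot
			}(i, c)
		}
		wg.Wait()
		for i, err := range errs {
			if err != nil {
				return fmt.Errorf("stage %q: component %q failed to start: %w",
					stage.Name, stage.Components[i], err)
			}
		}
	}
	return nil
}
```

In a Docker Compose or Kubernetes deployment the same ordering would more likely be expressed through dependency declarations and readiness probes than through code like this.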
## Request Processing Pipeline

Every HTTP request flows through the API Gateway first, then to backend services. The pipeline ensures security, observability, and proper error handling.

```mermaid
graph TD
    Start([HTTP Request]) --> Gateway[API Gateway]
    Gateway --> RateLimit[Rate Limiting]
    RateLimit -->|Allowed| Auth[Validate JWT via Auth Service]
    RateLimit -->|Exceeded| Error0[429 Too Many Requests]

    Auth -->|Valid Token| Authz[Check Permission via Authz Service]
    Auth -->|Invalid Token| Error1[401 Unauthorized]

    Authz -->|Authorized| PostAuthRateLimit[Rate Limiting]
    Authz -->|Unauthorized| Error2[403 Forbidden]

    PostAuthRateLimit -->|Within Limits| Tracing[OpenTelemetry Tracing]
    PostAuthRateLimit -->|Rate Limited| Error3[429 Too Many Requests]

    Tracing --> Handler[Request Handler]
    Handler --> Service[Domain Service]

    Service --> Cache{Cache Check}
    Cache -->|Hit| Return[Return Cached Data]
    Cache -->|Miss| Repo[Repository]

    Repo --> DB[(Database)]
    DB --> Repo
    Repo --> Service
    Service --> CacheStore[Update Cache]

    Service --> EventBus[Publish Events]
    Service --> Audit[Audit Logging]
    Service --> Metrics[Update Metrics]

    Service --> Handler
    Handler --> Tracing
    Tracing --> Response[HTTP Response]

    Error0 --> Response
    Error1 --> Response
    Error2 --> Response
    Error3 --> Response
    Return --> Response

    style Auth fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style Authz fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style Service fill:#50c878,stroke:#2e7d4e,stroke-width:2px,color:#fff
```

### Request Processing Stages

1. **Authentication**: Extract and validate JWT token, add user to context
2. **Authorization**: Check user permissions for requested resource
3. **Rate Limiting**: Enforce per-user and per-IP rate limits
4. **Tracing**: Start/continue distributed trace
5. **Handler Processing**: Execute request handler
6. **Service Logic**: Execute business logic
7. **Data Access**: Query database or cache
8. **Side Effects**: Publish events, audit logs, update metrics
9. **Response**: Return HTTP response with tracing context

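
The first three stages map naturally onto chained `net/http` middleware. The sketch below is a minimal illustration under stated assumptions, not the gateway's implementation: the `validate` and `allowed` callbacks stand in for calls to the Auth and Authz services, the limiter is a per-user counter that never resets, tracing is omitted, and all names (`Chain`, `Authenticate`, `Authorize`, `RateLimit`, `NewPipeline`, `Status`) are hypothetical.

```go
package main

import (
	"context"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Middleware wraps a handler with one pipeline stage.
type Middleware func(http.Handler) http.Handler

// Chain applies the stages so that the first one listed runs first.
func Chain(h http.Handler, stages ...Middleware) http.Handler {
	for i := len(stages) - 1; i >= 0; i-- {
		h = stages[i](h)
	}
	return h
}

type userKey struct{}

// Authenticate extracts the bearer token and stores the user in the context.
func Authenticate(validate func(token string) (string, bool)) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
			var user string
			if ok {
				user, ok = validate(token)
			}
			if !ok {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			ctx := context.WithValue(r.Context(), userKey{}, user)
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

// Authorize checks the authenticated user's permission for the request.
func Authorize(allowed func(user, method, path string) bool) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			user, _ := r.Context().Value(userKey{}).(string)
			if !allowed(user, r.Method, r.URL.Path) {
				http.Error(w, "forbidden", http.StatusForbidden)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// RateLimit allows each user at most limit requests in total. A real limiter
// would use a time window or token bucket; this counter never resets.
func RateLimit(limit int) Middleware {
	var mu sync.Mutex
	counts := make(map[string]int)
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			user, _ := r.Context().Value(userKey{}).(string)
			mu.Lock()
			counts[user]++
			over := counts[user] > limit
			mu.Unlock()
			if over {
				http.Error(w, "too many requests", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// NewPipeline wires the stages in the documented order around a trivial handler.
func NewPipeline() http.Handler {
	users := map[string]string{"alice-token": "alice", "bob-token": "bob"}
	validate := func(token string) (string, bool) { u, ok := users[token]; return u, ok }
	allowed := func(user, method, path string) bool {
		return user == "alice" || !strings.HasPrefix(path, "/admin")
	}
	handler := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return Chain(handler, Authenticate(validate), Authorize(allowed), RateLimit(2))
}

// Status sends one in-process GET request through h and returns the status code.
func Status(h http.Handler, path, token string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}
```

Because each stage returns early on failure, a request rejected by authentication or authorization never reaches the limiter or the handler, which matches the 401 and 403 branches in the diagram.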
## Event-Driven Interactions

The platform uses an event bus for asynchronous communication between services, enabling loose coupling and scalability.

```mermaid
sequenceDiagram
    participant Publisher
    participant EventBus
    participant Kafka
    participant Subscriber1
    participant Subscriber2

    Publisher->>EventBus: Publish(event)
    EventBus->>EventBus: Serialize event
    EventBus->>EventBus: Add metadata (trace_id, user_id)
    EventBus->>Kafka: Send to topic
    Kafka-->>EventBus: Acknowledged

    Kafka->>Subscriber1: Deliver event
    Kafka->>Subscriber2: Deliver event

    Subscriber1->>Subscriber1: Process event
    Subscriber1->>Subscriber1: Update state
    Subscriber1->>Subscriber1: Emit new events (optional)

    Subscriber2->>Subscriber2: Process event
    Subscriber2->>Subscriber2: Update state

    Note over Subscriber1,Subscriber2: Events processed asynchronously
```

### Event Processing Flow

1. **Event Publishing**: Service publishes event to event bus
2. **Event Serialization**: Event is serialized with metadata
3. **Event Distribution**: Event bus distributes to Kafka topic
4. **Event Consumption**: Subscribers consume events from Kafka
5. **Event Processing**: Each subscriber processes event independently
6. **State Updates**: Subscribers update their own state
7. **Cascade Events**: Subscribers may publish new events

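
To make the flow concrete, here is a small in-memory stand-in for the event bus. It is a sketch under assumptions: there is no Kafka involved, delivery is synchronous so the behavior is deterministic, and `Envelope`, `Bus`, `Subscribe`, and `Publish` are hypothetical names. The `trace_id` and `user_id` metadata fields come from the diagram above.

```go
package main

import (
	"encoding/json"
	"sync"
)

// Envelope is a serialized event together with its metadata.
type Envelope struct {
	Topic   string          `json:"topic"`
	TraceID string          `json:"trace_id"`
	UserID  string          `json:"user_id"`
	Payload json.RawMessage `json:"payload"`
}

// Handler processes one delivered event.
type Handler func(Envelope) error

// Bus is an in-memory stand-in for the Kafka-backed event bus.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]Handler
}

// NewBus returns an empty bus.
func NewBus() *Bus {
	return &Bus{subs: make(map[string][]Handler)}
}

// Subscribe registers a handler for a topic.
func (b *Bus) Subscribe(topic string, h Handler) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], h)
}

// Publish serializes the payload, attaches metadata, and delivers the
// envelope to every subscriber of the topic. It returns the first handler
// error, if any, after all subscribers have been called.
func (b *Bus) Publish(topic, traceID, userID string, payload any) error {
	raw, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	env := Envelope{Topic: topic, TraceID: traceID, UserID: userID, Payload: raw}

	// Copy the handler list so handlers can publish or subscribe
	// without deadlocking on the bus lock.
	b.mu.RLock()
	handlers := append([]Handler(nil), b.subs[topic]...)
	b.mu.RUnlock()

	var firstErr error
	for _, h := range handlers {
		if err := h(env); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```

A subscriber can call `Publish` from inside its handler to emit a follow-up event, which corresponds to the cascade step. With Kafka in place, each subscriber would instead consume from the topic on its own schedule.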
## Background Job Processing

Background jobs are scheduled and processed asynchronously, enabling long-running tasks and scheduled operations.

```mermaid
sequenceDiagram
    participant Scheduler
    participant JobQueue
    participant Worker
    participant Service
    participant DB
    participant EventBus

    Scheduler->>JobQueue: Enqueue job
    JobQueue->>JobQueue: Store job definition

    Worker->>JobQueue: Poll for jobs
    JobQueue-->>Worker: Job definition

    Worker->>Worker: Start job execution
    Worker->>Service: Execute job logic
    Service->>DB: Update data
    Service->>EventBus: Publish events

    Service-->>Worker: Job complete
    Worker->>JobQueue: Mark job complete

    alt Job fails
        Worker->>JobQueue: Mark job failed
        JobQueue->>JobQueue: Schedule retry
    end
```

### Background Job Flow

1. **Job Scheduling**: Jobs scheduled via cron or programmatically
2. **Job Enqueueing**: Job definition stored in job queue
3. **Job Polling**: Workers poll queue for available jobs
4. **Job Execution**: Worker executes job logic
5. **Job Completion**: Job marked as complete or failed
6. **Job Retry**: Failed jobs retried with exponential backoff

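
The retry step can be sketched as a delay that doubles after each failed attempt, up to a cap. The base delay, cap, and attempt limit below are assumptions chosen for illustration, since the document does not specify them; `Backoff`, `RunWithRetry`, and `FlakyJob` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Backoff returns the delay before the retry that follows failed attempt
// number attempt (1-based): base doubled per attempt, capped at maxDelay.
func Backoff(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

// RunWithRetry runs job until it succeeds or maxAttempts is reached and
// returns the number of attempts made. sleep is injectable so callers can
// control waiting; passing nil uses time.Sleep.
func RunWithRetry(job func() error, maxAttempts int, base, maxDelay time.Duration,
	sleep func(time.Duration)) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	if sleep == nil {
		sleep = time.Sleep
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = job(); err == nil {
			return attempt, nil
		}
		if attempt < maxAttempts {
			sleep(Backoff(attempt, base, maxDelay))
		}
	}
	return maxAttempts, fmt.Errorf("job failed after %d attempts: %w", maxAttempts, err)
}

// FlakyJob returns a demo job that fails the first failures times it runs.
func FlakyJob(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return fmt.Errorf("attempt %d failed", calls)
		}
		return nil
	}
}
```

For example, `RunWithRetry(FlakyJob(2), 5, time.Second, 30*time.Second, nil)` would wait one second, then two seconds, and succeed on the third attempt. A queue-backed worker would typically persist the next retry time in the job queue rather than sleep in process.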
## Error Recovery and Resilience

The platform implements multiple layers of error handling to ensure system resilience.

```mermaid
graph TD
    Error[Error Occurs] --> Handler{Error Handler}

    Handler -->|Business Error| BusinessError[Business Error Handler]
    Handler -->|System Error| SystemError[System Error Handler]
    Handler -->|Panic| PanicHandler[Panic Recovery]

    BusinessError --> ErrorBus[Error Bus]
    SystemError --> ErrorBus
    PanicHandler --> ErrorBus

    ErrorBus --> Logger[Logger]
    ErrorBus --> Sentry[Sentry]
    ErrorBus --> Metrics[Metrics]

    BusinessError --> Response[HTTP Response]
    SystemError --> Response
    PanicHandler --> Response

    Response --> Client[Client]

    style Error fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style ErrorBus fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
```

### Error Handling Layers

1. **Panic Recovery**: Middleware catches panics and prevents crashes
2. **Error Classification**: Errors classified as business or system errors
3. **Error Bus**: Central error bus collects all errors
4. **Error Logging**: Errors logged with full context
5. **Error Reporting**: Critical errors reported to Sentry
6. **Error Metrics**: Errors tracked in metrics
7. **Error Response**: Appropriate HTTP response returned

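
A rough sketch of how panic recovery, classification, and the error bus could fit together in one HTTP wrapper follows. It rests on assumptions: `BusinessError`, `ErrorBus`, `Wrap`, and the helper functions are hypothetical names, the bus only collects errors in memory, and the logger, Sentry, and metrics sinks are not wired in.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// BusinessError is an expected, client-facing failure such as a validation error.
type BusinessError struct {
	Status  int
	Message string
}

func (e *BusinessError) Error() string { return e.Message }

// ErrStorageUnavailable is an example system error.
var ErrStorageUnavailable = errors.New("storage unavailable")

// ErrorBus collects reported errors; a real bus would fan out to sinks.
type ErrorBus struct {
	mu       sync.Mutex
	reported []error
}

// Report records one error on the bus.
func (b *ErrorBus) Report(err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.reported = append(b.reported, err)
}

// Count returns how many errors have been reported.
func (b *ErrorBus) Count() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.reported)
}

// HandlerFunc is a handler that returns its failure instead of writing it.
type HandlerFunc func(w http.ResponseWriter, r *http.Request) error

// Wrap recovers panics, reports every failure to the bus, and maps business
// errors to their own status while hiding system error details behind a 500.
func Wrap(bus *ErrorBus, h HandlerFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				bus.Report(fmt.Errorf("panic: %v", rec))
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		err := h(w, r)
		if err == nil {
			return
		}
		bus.Report(err)
		var be *BusinessError
		if errors.As(err, &be) {
			http.Error(w, be.Message, be.Status)
			return
		}
		http.Error(w, "internal server error", http.StatusInternalServerError)
	})
}

// FailWith returns a demo handler that fails with err (or succeeds if err is nil).
func FailWith(err error) HandlerFunc {
	return func(http.ResponseWriter, *http.Request) error { return err }
}

// PanicWith returns a demo handler that panics with v.
func PanicWith(v any) HandlerFunc {
	return func(http.ResponseWriter, *http.Request) error { panic(v) }
}

// StatusOf sends one in-process GET request to h and returns the status code.
func StatusOf(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}
```

Keeping classification in one wrapper means handlers only return errors, and the decision about what the client sees is made in a single place.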
## System Shutdown Sequence

The platform implements graceful shutdown to ensure data consistency and proper resource cleanup.

```mermaid
sequenceDiagram
    participant Signal
    participant Main
    participant HTTP
    participant gRPC
    participant ServiceRegistry
    participant DI
    participant Workers
    participant DB

    Signal->>Main: SIGTERM/SIGINT
    Main->>HTTP: Stop accepting requests
    HTTP->>HTTP: Wait for active requests
    HTTP-->>Main: HTTP server stopped

    Main->>gRPC: Stop accepting connections
    gRPC->>gRPC: Wait for active calls
    gRPC-->>Main: gRPC server stopped

    Main->>ServiceRegistry: Deregister service
    ServiceRegistry->>ServiceRegistry: Remove from registry
    ServiceRegistry-->>Main: Service deregistered

    Main->>Workers: Stop workers
    Workers->>Workers: Finish current jobs
    Workers-->>Main: Workers stopped

    Main->>DI: Stop lifecycle
    DI->>DI: Execute OnStop hooks
    DI->>DI: Close connections
    DI->>DB: Close DB connections
    DI-->>Main: Services stopped

    Main->>Main: Exit
```

### Shutdown Phases

1. **Signal Reception**: Receive SIGTERM or SIGINT
2. **Stop Accepting Requests**: HTTP and gRPC servers stop accepting new requests
3. **Wait for Active Requests**: Wait for in-flight requests to complete
4. **Service Deregistration**: Remove service from service registry
5. **Worker Shutdown**: Stop background workers gracefully
6. **Lifecycle Hooks**: Execute OnStop hooks for all services
7. **Resource Cleanup**: Close database connections, release resources
8. **Application Exit**: Exit application cleanly

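
The shutdown phases can be sketched as ordered steps that share one deadline. This is an illustration under assumptions, not the platform's lifecycle code: `Step`, `Plain`, `Shutdown`, and `WaitAndShutdown` are hypothetical names, and the choice to keep going after a failed step (so later resources are still released) is a design assumption. The standard library's `http.Server.Shutdown` method already has the `Stop` signature used here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// Step is one shutdown phase. Stop should return once the component has
// drained or the context has expired.
type Step struct {
	Name string
	Stop func(ctx context.Context) error
}

// Plain adapts a function that ignores the context into a Step.
func Plain(name string, fn func() error) Step {
	return Step{Name: name, Stop: func(context.Context) error { return fn() }}
}

// Shutdown runs the steps in order under one overall timeout. A failing
// step is recorded and the remaining steps still run. It returns the names
// of the steps that succeeded and all failures joined into one error.
func Shutdown(timeout time.Duration, steps []Step) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var completed []string
	var errs []error
	for _, s := range steps {
		if s.Stop == nil {
			errs = append(errs, fmt.Errorf("shutdown step %q has no Stop function", s.Name))
			continue
		}
		if err := s.Stop(ctx); err != nil {
			errs = append(errs, fmt.Errorf("shutdown step %q: %w", s.Name, err))
			continue
		}
		completed = append(completed, s.Name)
	}
	return completed, errors.Join(errs...)
}

// WaitAndShutdown blocks until SIGTERM or SIGINT arrives, then runs the steps.
func WaitAndShutdown(timeout time.Duration, steps []Step) ([]string, error) {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()
	<-ctx.Done()
	return Shutdown(timeout, steps)
}
```

A service's `main` could pass the steps in the documented order: HTTP server, gRPC server, registry deregistration, workers, lifecycle hooks, and finally database connections.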
## Health Check and Monitoring Flow

Health checks and metrics provide visibility into system health and performance.

```mermaid
graph TD
    HealthEndpoint["/healthz"] --> HealthRegistry[Health Registry]
    HealthRegistry --> CheckDB[Check Database]
    HealthRegistry --> CheckCache[Check Cache]
    HealthRegistry --> CheckEventBus[Check Event Bus]

    CheckDB -->|Healthy| Aggregate[Aggregate Results]
    CheckCache -->|Healthy| Aggregate
    CheckEventBus -->|Healthy| Aggregate

    Aggregate -->|All Healthy| Response200[200 OK]
    Aggregate -->|Unhealthy| Response503[503 Service Unavailable]

    MetricsEndpoint["/metrics"] --> MetricsRegistry[Metrics Registry]
    MetricsRegistry --> Prometheus[Prometheus Format]
    Prometheus --> ResponseMetrics[Metrics Response]

    style HealthRegistry fill:#50c878,stroke:#2e7d4e,stroke-width:2px,color:#fff
    style MetricsRegistry fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
```

### Health Check Components

- **Liveness Check**: Service is running (process health)
- **Readiness Check**: Service is ready to accept requests (dependency health)
- **Dependency Checks**: Database, cache, event bus connectivity
- **Metrics Collection**: Request counts, durations, error rates
- **Metrics Export**: Prometheus-formatted metrics

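
The aggregation step in the diagram (200 when every dependency check passes, 503 otherwise) might look like the following sketch. It models the readiness side only; `Check`, `Registry`, `ErrUnavailable`, and `Probe` are hypothetical names, the JSON body format is an assumption, and checks are expected to be registered at startup because the registry does no locking.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
	"net/http/httptest"
)

// Check reports nil when a dependency is reachable.
type Check func() error

// ErrUnavailable is an example failure a check might return.
var ErrUnavailable = errors.New("dependency unavailable")

// Registry holds named dependency checks in registration order.
type Registry struct {
	names  []string
	checks map[string]Check
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{checks: make(map[string]Check)}
}

// Register adds or replaces the check for a dependency.
func (r *Registry) Register(name string, c Check) {
	if _, exists := r.checks[name]; !exists {
		r.names = append(r.names, name)
	}
	r.checks[name] = c
}

// Run executes every check and returns a per-dependency status plus
// whether all of them passed.
func (r *Registry) Run() (map[string]string, bool) {
	results := make(map[string]string, len(r.names))
	healthy := true
	for _, name := range r.names {
		if err := r.checks[name](); err != nil {
			results[name] = err.Error()
			healthy = false
			continue
		}
		results[name] = "ok"
	}
	return results, healthy
}

// Handler serves the aggregated result: 200 when all checks pass, 503 otherwise.
func (r *Registry) Handler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		results, healthy := r.Run()
		w.Header().Set("Content-Type", "application/json")
		if healthy {
			w.WriteHeader(http.StatusOK)
		} else {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		_ = json.NewEncoder(w).Encode(results)
	})
}

// Probe sends one in-process GET /healthz request to h and returns the status code.
func Probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}
```

A liveness endpoint would be simpler still: a handler that returns 200 whenever the process can serve requests, without consulting any dependency.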
## Integration Points

This system behavior integrates with:

- **[Service Orchestration](service-orchestration.md)**: How services coordinate during startup and operation
- **[Module Integration Patterns](module-integration-patterns.md)**: How modules integrate during bootstrap
- **[Operational Scenarios](operational-scenarios.md)**: Specific operational flows and use cases
- **[Data Flow Patterns](data-flow-patterns.md)**: Detailed data flow through the system
- **[Architecture Overview](architecture.md)**: System architecture and component relationships

## Related Documentation

- [Architecture Overview](architecture.md) - System architecture
- [Service Orchestration](service-orchestration.md) - Service coordination
- [Module Integration Patterns](module-integration-patterns.md) - Module integration
- [Operational Scenarios](operational-scenarios.md) - Common operational flows
- [Component Relationships](component-relationships.md) - Component dependencies