# System Behavior Overview

## Purpose

This document provides a high-level explanation of how the Go Platform behaves end-to-end, focusing on system-level operations, flows, and interactions rather than implementation details.

## Overview

The Go Platform is a microservices-based system where each module operates as an independent service. Services communicate via gRPC (primary) or HTTP (fallback), share infrastructure components (PostgreSQL, Redis, Kafka), locate each other through service discovery, and are wired internally through dependency injection.

## Key Concepts

- **Services**: Independent processes that can be deployed and scaled separately
- **Service Clients**: Abstraction layer for inter-service communication
- **Service Registry**: Central registry for service discovery
- **Event Bus**: Asynchronous communication channel for events
- **DI Container**: Dependency injection container managing service lifecycle

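
To make the Service Registry concept concrete, here is a minimal in-memory sketch of register, deregister, and lookup. The `Registry` and `Instance` types are hypothetical; the platform delegates this role to Consul or Kubernetes (see the bootstrap sequence below), which add health checking and TTL-based expiry that this sketch omits.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Instance is one running copy of a service (hypothetical shape).
type Instance struct {
	ID      string
	Address string
}

// Registry is a toy, in-process service registry that is safe for concurrent use.
type Registry struct {
	mu       sync.RWMutex
	services map[string]map[string]Instance // service name -> instance ID -> instance
}

func NewRegistry() *Registry {
	return &Registry{services: make(map[string]map[string]Instance)}
}

// Register adds or replaces an instance under the given service name.
func (r *Registry) Register(service string, inst Instance) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.services[service] == nil {
		r.services[service] = make(map[string]Instance)
	}
	r.services[service][inst.ID] = inst
}

// Deregister removes an instance; unknown services or IDs are ignored.
func (r *Registry) Deregister(service, id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.services[service], id) // delete on a nil map is a no-op
}

// Lookup returns the addresses of all instances of a service, sorted for stable output.
func (r *Registry) Lookup(service string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	addrs := make([]string, 0, len(r.services[service]))
	for _, inst := range r.services[service] {
		addrs = append(addrs, inst.Address)
	}
	sort.Strings(addrs)
	return addrs
}

func main() {
	reg := NewRegistry()
	reg.Register("users", Instance{ID: "u1", Address: "10.0.0.1:9090"})
	reg.Register("users", Instance{ID: "u2", Address: "10.0.0.2:9090"})
	reg.Deregister("users", "u1")
	fmt.Println(reg.Lookup("users")) // [10.0.0.2:9090]
}
```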
## Application Bootstrap Sequence

The platform follows a well-defined startup sequence that ensures all services are properly initialized and registered.

```mermaid
sequenceDiagram
    participant Main
    participant Config
    participant Logger
    participant DI
    participant Registry as Module Registry
    participant ModuleLoader
    participant Module
    participant ServiceRegistry
    participant HTTP
    participant gRPC

    Main->>Config: Load configuration
    Config-->>Main: Config ready

    Main->>Logger: Initialize logger
    Logger-->>Main: Logger ready

    Main->>DI: Create DI container
    DI->>DI: Register core services
    DI-->>Main: DI container ready

    Main->>ModuleLoader: Discover modules
    ModuleLoader->>ModuleLoader: Scan module directories
    ModuleLoader->>ModuleLoader: Load module.yaml files
    ModuleLoader-->>Main: Module list

    Main->>Registry: Register modules
    Registry->>Registry: Resolve dependencies
    Registry->>Registry: Order modules
    Registry-->>Main: Ordered modules

    loop For each module
        Main->>Module: Initialize module
        Module->>DI: Register services
        Module->>Registry: Register routes
        Module->>Registry: Register migrations
    end

    Main->>Registry: Run migrations
    Registry->>Registry: Execute in dependency order

    Main->>ServiceRegistry: Register service
    ServiceRegistry->>ServiceRegistry: Register with Consul/K8s
    ServiceRegistry-->>Main: Service registered

    Main->>gRPC: Start gRPC server
    Main->>HTTP: Start HTTP server
    HTTP-->>Main: Server ready
    gRPC-->>Main: Server ready

    Main->>DI: Start lifecycle
    DI->>DI: Execute OnStart hooks
    DI-->>Main: All services started
```

### Bootstrap Phases

1. **Configuration Loading**: Load YAML files, environment variables, and secrets
2. **Foundation Services**: Initialize logger, config provider, DI container
3. **Module Discovery**: Scan and load module manifests
4. **Dependency Resolution**: Build dependency graph and order modules
5. **Module Initialization**: Initialize each module in dependency order
6. **Database Migrations**: Run migrations in dependency order
7. **Service Registration**: Register service with service registry
8. **Server Startup**: Start HTTP and gRPC servers
9. **Lifecycle Hooks**: Execute OnStart hooks for all services

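
Phases 4 to 6 rely on ordering modules so that each one is initialized and migrated after the modules it depends on. One standard way to do this is Kahn's topological sort, sketched below under the assumption that each module manifest lists its dependencies by name. The `orderModules` function and its map input are hypothetical, not the platform's registry code. The sketch also reports unknown dependencies and cycles, either of which should abort startup.

```go
package main

import (
	"fmt"
	"sort"
)

// orderModules returns module names so that every module appears after all of
// its dependencies (Kahn's algorithm). deps maps module -> modules it requires.
func orderModules(deps map[string][]string) ([]string, error) {
	inDegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for mod := range deps {
		inDegree[mod] = 0
	}
	for mod, reqs := range deps {
		for _, req := range reqs {
			if _, ok := deps[req]; !ok {
				return nil, fmt.Errorf("module %q depends on unknown module %q", mod, req)
			}
			inDegree[mod]++
			dependents[req] = append(dependents[req], mod)
		}
	}

	// Start with modules that have no dependencies; sort for a deterministic order.
	var ready []string
	for mod, n := range inDegree {
		if n == 0 {
			ready = append(ready, mod)
		}
	}
	sort.Strings(ready)

	order := make([]string, 0, len(deps))
	for len(ready) > 0 {
		mod := ready[0]
		ready = ready[1:]
		order = append(order, mod)
		var next []string
		for _, d := range dependents[mod] {
			inDegree[d]--
			if inDegree[d] == 0 {
				next = append(next, d)
			}
		}
		sort.Strings(next)
		ready = append(ready, next...)
	}
	if len(order) != len(deps) {
		return nil, fmt.Errorf("dependency cycle among %d modules", len(deps)-len(order))
	}
	return order, nil
}

func main() {
	order, err := orderModules(map[string][]string{
		"core":    {},
		"auth":    {"core"},
		"billing": {"auth", "core"},
		"audit":   {"core"},
	})
	fmt.Println(order, err) // [core audit auth billing] <nil>
}
```

Running migrations in the same order as initialization means a module's tables can safely reference tables created by its dependencies.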
## Request Processing Pipeline

Every HTTP request flows through a standardized pipeline that ensures security, observability, and proper error handling.

```mermaid
graph TD
    Start([HTTP Request]) --> Auth[Authentication Middleware]
    Auth -->|Valid Token| Authz[Authorization Middleware]
    Auth -->|Invalid Token| Error1[401 Unauthorized]

    Authz -->|Authorized| RateLimit[Rate Limiting]
    Authz -->|Unauthorized| Error2[403 Forbidden]

    RateLimit -->|Within Limits| Tracing[OpenTelemetry Tracing]
    RateLimit -->|Rate Limited| Error3[429 Too Many Requests]

    Tracing --> Handler[Request Handler]
    Handler --> Service[Domain Service]

    Service --> Cache{Cache Check}
    Cache -->|Hit| Return[Return Cached Data]
    Cache -->|Miss| Repo[Repository]

    Repo --> DB[(Database)]
    DB --> Repo
    Repo --> Service
    Service --> CacheStore[Update Cache]

    Service --> EventBus[Publish Events]
    Service --> Audit[Audit Logging]
    Service --> Metrics[Update Metrics]

    Service --> Handler
    Handler --> Tracing
    Tracing --> Response[HTTP Response]

    Error1 --> Response
    Error2 --> Response
    Error3 --> Response
    Return --> Response

    style Auth fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style Authz fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style Service fill:#50c878,stroke:#2e7d4e,stroke-width:2px,color:#fff
```

### Request Processing Stages

1. **Authentication**: Extract and validate JWT token, add user to context
2. **Authorization**: Check user permissions for requested resource
3. **Rate Limiting**: Enforce per-user and per-IP rate limits
4. **Tracing**: Start or continue the distributed trace
5. **Handler Processing**: Execute request handler
6. **Service Logic**: Execute business logic
7. **Data Access**: Check the cache; on a miss, query the database and update the cache
8. **Side Effects**: Publish events, write audit logs, update metrics
9. **Response**: Return HTTP response with tracing context

## Event-Driven Interactions

The platform uses an event bus for asynchronous communication between services, enabling loose coupling and scalability.

```mermaid
sequenceDiagram
    participant Publisher
    participant EventBus
    participant Kafka
    participant Subscriber1
    participant Subscriber2

    Publisher->>EventBus: Publish(event)
    EventBus->>EventBus: Serialize event
    EventBus->>EventBus: Add metadata (trace_id, user_id)
    EventBus->>Kafka: Send to topic
    Kafka-->>EventBus: Acknowledged

    Kafka->>Subscriber1: Deliver event
    Kafka->>Subscriber2: Deliver event

    Subscriber1->>Subscriber1: Process event
    Subscriber1->>Subscriber1: Update state
    Subscriber1->>Subscriber1: Emit new events (optional)

    Subscriber2->>Subscriber2: Process event
    Subscriber2->>Subscriber2: Update state

    Note over Subscriber1,Subscriber2: Events processed asynchronously
```

### Event Processing Flow

1. **Event Publishing**: Service publishes event to event bus
2. **Event Serialization**: Event is serialized with metadata (trace_id, user_id)
3. **Event Distribution**: Event bus sends the event to a Kafka topic
4. **Event Consumption**: Subscribers consume events from Kafka
5. **Event Processing**: Each subscriber processes event independently
6. **State Updates**: Subscribers update their own state
7. **Cascade Events**: Subscribers may publish new events

## Background Job Processing

Background jobs are scheduled and processed asynchronously, enabling long-running tasks and scheduled operations.

```mermaid
sequenceDiagram
    participant Scheduler
    participant JobQueue
    participant Worker
    participant Service
    participant DB
    participant EventBus

    Scheduler->>JobQueue: Enqueue job
    JobQueue->>JobQueue: Store job definition

    Worker->>JobQueue: Poll for jobs
    JobQueue-->>Worker: Job definition

    Worker->>Worker: Start job execution
    Worker->>Service: Execute job logic

    alt Job succeeds
        Service->>DB: Update data
        Service->>EventBus: Publish events
        Service-->>Worker: Job complete
        Worker->>JobQueue: Mark job complete
    else Job fails
        Service-->>Worker: Error
        Worker->>JobQueue: Mark job failed
        JobQueue->>JobQueue: Schedule retry with backoff
    end
```

### Background Job Flow

1. **Job Scheduling**: Jobs scheduled via cron or programmatically
2. **Job Enqueueing**: Job definition stored in job queue
3. **Job Polling**: Workers poll queue for available jobs
4. **Job Execution**: Worker executes job logic
5. **Job Completion**: Job marked as complete or failed
6. **Job Retry**: Failed jobs retried with exponential backoff

## Error Recovery and Resilience

The platform implements multiple layers of error handling to ensure system resilience.

```mermaid
graph TD
    Error[Error Occurs] --> Handler{Error Handler}

    Handler -->|Business Error| BusinessError[Business Error Handler]
    Handler -->|System Error| SystemError[System Error Handler]
    Handler -->|Panic| PanicHandler[Panic Recovery]

    BusinessError --> ErrorBus[Error Bus]
    SystemError --> ErrorBus
    PanicHandler --> ErrorBus

    ErrorBus --> Logger[Logger]
    ErrorBus --> Sentry[Sentry]
    ErrorBus --> Metrics[Metrics]

    BusinessError --> Response[HTTP Response]
    SystemError --> Response
    PanicHandler --> Response

    Response --> Client[Client]

    style Error fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style ErrorBus fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
```

### Error Handling Layers

1. **Panic Recovery**: Middleware catches panics and prevents crashes
2. **Error Classification**: Errors classified as business or system errors
3. **Error Bus**: Central error bus collects all errors
4. **Error Logging**: Errors logged with full context
5. **Error Reporting**: Critical errors reported to Sentry
6. **Error Metrics**: Errors tracked in metrics
7. **Error Response**: Appropriate HTTP response returned

## System Shutdown Sequence

The platform implements graceful shutdown to ensure data consistency and proper resource cleanup.

```mermaid
sequenceDiagram
    participant Signal
    participant Main
    participant HTTP
    participant gRPC
    participant ServiceRegistry
    participant DI
    participant Workers
    participant DB

    Signal->>Main: SIGTERM/SIGINT
    Main->>HTTP: Stop accepting requests
    HTTP->>HTTP: Wait for active requests
    HTTP-->>Main: HTTP server stopped

    Main->>gRPC: Stop accepting connections
    gRPC->>gRPC: Wait for active calls
    gRPC-->>Main: gRPC server stopped

    Main->>ServiceRegistry: Deregister service
    ServiceRegistry->>ServiceRegistry: Remove from registry
    ServiceRegistry-->>Main: Service deregistered

    Main->>Workers: Stop workers
    Workers->>Workers: Finish current jobs
    Workers-->>Main: Workers stopped

    Main->>DI: Stop lifecycle
    DI->>DI: Execute OnStop hooks
    DI->>DI: Close connections
    DI->>DB: Close DB connections
    DI-->>Main: Services stopped

    Main->>Main: Exit
```

### Shutdown Phases

1. **Signal Reception**: Receive SIGTERM or SIGINT
2. **Stop Accepting Requests**: HTTP and gRPC servers stop accepting new requests
3. **Wait for Active Requests**: Wait for in-flight requests to complete
4. **Service Deregistration**: Remove service from service registry
5. **Worker Shutdown**: Stop background workers gracefully
6. **Lifecycle Hooks**: Execute OnStop hooks for all services
7. **Resource Cleanup**: Close database connections, release resources
8. **Application Exit**: Exit application cleanly

## Health Check and Monitoring Flow

Health checks and metrics provide visibility into system health and performance.

```mermaid
graph TD
    HealthEndpoint["/healthz"] --> HealthRegistry[Health Registry]
    HealthRegistry --> CheckDB[Check Database]
    HealthRegistry --> CheckCache[Check Cache]
    HealthRegistry --> CheckEventBus[Check Event Bus]

    CheckDB -->|Healthy| Aggregate[Aggregate Results]
    CheckCache -->|Healthy| Aggregate
    CheckEventBus -->|Healthy| Aggregate

    Aggregate -->|All Healthy| Response200[200 OK]
    Aggregate -->|Unhealthy| Response503[503 Service Unavailable]

    MetricsEndpoint["/metrics"] --> MetricsRegistry[Metrics Registry]
    MetricsRegistry --> Prometheus[Prometheus Format]
    Prometheus --> ResponseMetrics[Metrics Response]

    style HealthRegistry fill:#50c878,stroke:#2e7d4e,stroke-width:2px,color:#fff
    style MetricsRegistry fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
```

### Health Check Components

- **Liveness Check**: Service is running (process health)
- **Readiness Check**: Service is ready to accept requests (dependency health)
- **Dependency Checks**: Database, cache, event bus connectivity
- **Metrics Collection**: Request counts, durations, error rates
- **Metrics Export**: Prometheus-formatted metrics

## Integration Points

This system behavior integrates with:

- **[Service Orchestration](service-orchestration.md)**: How services coordinate during startup and operation
- **[Module Integration Patterns](module-integration-patterns.md)**: How modules integrate during bootstrap
- **[Operational Scenarios](operational-scenarios.md)**: Specific operational flows and use cases
- **[Data Flow Patterns](data-flow-patterns.md)**: Detailed data flow through the system
- **[Architecture Overview](architecture.md)**: System architecture and component relationships

## Related Documentation

- [Architecture Overview](architecture.md) - System architecture
- [Service Orchestration](service-orchestration.md) - Service coordination
- [Module Integration Patterns](module-integration-patterns.md) - Module integration
- [Operational Scenarios](operational-scenarios.md) - Common operational flows
- [Component Relationships](component-relationships.md) - Component dependencies