docs: Align documentation with true microservices architecture

Transform all documentation from a modular monolith to a true microservices
architecture in which core services are independently deployable.

Key Changes:
- Core Kernel: Infrastructure only (no business logic)
- Core Services: Auth, Identity, Authz, Audit as separate microservices
  - Each service has own entry point (cmd/{service}/)
  - Each service has own gRPC server and database schema
  - Services register with Consul for service discovery
- API Gateway: Moved from Epic 8 to Epic 1 as core infrastructure
  - Single entry point for all external traffic
  - Handles routing, JWT validation, rate limiting, CORS
- Service Discovery: Consul as primary mechanism (ADR-0033)
- Database Pattern: Per-service connections with schema isolation

Documentation Updates:
- Updated all 9 architecture documents
- Updated 4 ADRs and created 2 new ADRs (API Gateway, Service Discovery)
- Rewrote Epic 1: Core Kernel & Infrastructure (infrastructure only)
- Rewrote Epic 2: Core Services (Auth, Identity, Authz, Audit as services)
- Updated stories in Epics 3-8 for the service architecture
- Updated plan.md, playbook.md, requirements.md, index.md
- Updated all epic READMEs and story files

New ADRs:
- ADR-0032: API Gateway Strategy
- ADR-0033: Service Discovery Implementation (Consul)

New Stories:
- Epic 1.7: Service Client Interfaces
- Epic 1.8: API Gateway Implementation
2025-11-06 08:47:27 +01:00
parent cab7cadf9e
commit 38a251968c
47 changed files with 3190 additions and 1613 deletions


@@ -1,14 +1,16 @@
# Epic 6: Observability & Production Readiness
## Overview
Enhance observability with full OpenTelemetry integration, add comprehensive error reporting (Sentry), create Grafana dashboards, improve logging with request correlation, add rate limiting and security hardening, and optimize performance.
Enhance observability with full OpenTelemetry integration across all services, add comprehensive error reporting (Sentry), create Grafana dashboards for service monitoring, improve logging with request correlation across services, add rate limiting (primarily at the API Gateway), harden security, and optimize performance for the microservices architecture.
**Note:** Observability spans all services - distributed tracing, service-level metrics, and cross-service log correlation.
## Stories
### 6.1 Enhanced Observability
- [Story: 6.1 - Enhanced Observability](./6.1-enhanced-observability.md)
- **Goal:** Enhance observability with full OpenTelemetry integration, comprehensive Prometheus metrics, and improved logging.
- **Deliverables:** Complete OpenTelemetry integration, expanded metrics, enhanced logging
- **Goal:** Enhance observability with full OpenTelemetry integration across all services, comprehensive Prometheus metrics per service, and improved logging with trace correlation.
- **Deliverables:** Complete OpenTelemetry integration, expanded metrics per service, enhanced logging with trace IDs
### 6.2 Error Reporting (Sentry)
- [Story: 6.2 - Error Reporting](./6.2-error-reporting.md)
@@ -17,13 +19,15 @@ Enhance observability with full OpenTelemetry integration, add comprehensive err
### 6.3 Grafana Dashboards
- [Story: 6.3 - Grafana Dashboards](./6.3-grafana-dashboards.md)
- **Goal:** Create comprehensive Grafana dashboards for monitoring.
- **Deliverables:** Grafana dashboard JSON files, documentation
- **Goal:** Create comprehensive Grafana dashboards for monitoring all services.
- **Deliverables:** Grafana dashboard JSON files per service, service-level dashboards, cross-service dashboards, documentation
### 6.4 Rate Limiting
- [Story: 6.4 - Rate Limiting](./6.4-rate-limiting.md)
- **Goal:** Implement rate limiting to prevent API abuse.
- **Deliverables:** Rate limiting middleware, configuration
- **Goal:** Implement rate limiting primarily at API Gateway level, with per-service rate limiting support.
- **Deliverables:** Rate limiting middleware for API Gateway, per-service rate limiting support, Redis-backed rate limiting
**Note:** Rate limiting is primarily implemented in API Gateway (Epic 1, Story 1.8). This story adds per-service rate limiting capabilities.
### 6.5 Security Hardening
- [Story: 6.5 - Security Hardening](./6.5-security-hardening.md)
@@ -46,10 +50,11 @@ Enhance observability with full OpenTelemetry integration, add comprehensive err
- [ ] Performance optimizations
## Acceptance Criteria
- Traces are exported and visible in Jaeger
- Errors are reported to Sentry with context
- Logs include request IDs and trace IDs
- Metrics are exposed and scraped by Prometheus
- Rate limiting prevents abuse
- Security headers are present
- Distributed traces span all services and are visible in Jaeger
- Errors are reported to Sentry with service context
- Logs include request IDs and trace IDs for correlation across services
- Metrics are exposed per service and scraped by Prometheus
- Rate limiting prevents abuse (primarily at API Gateway)
- Security headers are present on all services
- Performance meets SLA (< 100ms p95 for auth endpoints)
- Service-level dashboards available in Grafana