feat: reword phase to epic, update mkdocs
72
docs/content/stories/epic6/6.1-enhanced-observability.md
Normal file
@@ -0,0 +1,72 @@
# Story 6.1: Enhanced Observability

## Metadata
- **Story ID**: 6.1
- **Title**: Enhanced Observability
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: High
- **Estimated Time**: 6-8 hours
- **Dependencies**: 1.6, 5.2, 5.1

## Goal
Enhance observability with full OpenTelemetry integration, comprehensive Prometheus metrics expansion, and improved logging with request correlation.

## Description
This story enhances the observability system by completing OpenTelemetry integration with all infrastructure components, expanding Prometheus metrics, and improving logging with better correlation and structured fields.

## Deliverables

### 1. Complete OpenTelemetry Integration
- Export traces to Jaeger/OTLP collector
- Add database instrumentation (Ent interceptor)
- Add Kafka instrumentation
- Add Redis instrumentation
- Create custom spans:
    - Module initialization spans
    - Background job spans
    - Event publishing spans
- Trace context propagation:
    - Include trace ID in logs
    - Propagate across HTTP calls
    - Include in error reports

### 2. Prometheus Metrics Expansion
- Add more metrics:
    - Database connection pool stats
    - Cache hit/miss ratio
    - Event bus publish/consume rates
    - Background job execution times
    - Module-specific metrics (via module interface)
- Create metric labels:
    - `module` label for module metrics
    - `tenant_id` label (if multi-tenant)
    - `status` label for error rates
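
To show how the `module` and `status` labels combine into an error rate, here is a minimal standard-library sketch of a labeled counter. It is a stand-in for a Prometheus counter vector, not the project's metrics code: the type `labeledCounter`, the function `errorRate`, and the status values `"ok"` and `"error"` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// labelSet mirrors two of the labels listed above. With a real Prometheus
// client these would be label values on a counter vector.
type labelSet struct {
	Module string
	Status string // assumed values in this sketch: "ok" or "error"
}

// labeledCounter is a tiny, concurrency-safe counter keyed by label values.
type labeledCounter struct {
	mu     sync.Mutex
	counts map[labelSet]uint64
}

func newLabeledCounter() *labeledCounter {
	return &labeledCounter{counts: make(map[labelSet]uint64)}
}

// Inc adds one observation for the given label combination.
func (c *labeledCounter) Inc(l labelSet) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[l]++
}

// Get returns the current count for the given label combination.
func (c *labeledCounter) Get(l labelSet) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[l]
}

// errorRate returns errors / (ok + errors) for one module,
// or 0 when nothing has been recorded for it.
func errorRate(c *labeledCounter, module string) float64 {
	errs := c.Get(labelSet{Module: module, Status: "error"})
	oks := c.Get(labelSet{Module: module, Status: "ok"})
	total := errs + oks
	if total == 0 {
		return 0
	}
	return float64(errs) / float64(total)
}

func main() {
	requests := newLabeledCounter()
	for i := 0; i < 3; i++ {
		requests.Inc(labelSet{Module: "billing", Status: "ok"})
	}
	requests.Inc(labelSet{Module: "billing", Status: "error"})

	fmt.Println(errorRate(requests, "billing"))
}
```

In a Prometheus setup the division would usually happen at query time rather than in the service, and a `tenant_id` label deserves care because every distinct tenant creates a new time series.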

### 3. Enhanced Logging
- Add structured fields:
    - `user_id` from context
    - `tenant_id` from context
    - `module` name for module logs
    - `trace_id` from OpenTelemetry
- Create log aggregation config:
    - JSON format for production
    - Human-readable for development
    - Support for Loki/CloudWatch/ELK

## Acceptance Criteria
- [ ] Traces are exported and visible in Jaeger
- [ ] All infrastructure components are instrumented
- [ ] Trace IDs are included in logs
- [ ] Metrics are expanded with new dimensions
- [ ] Logs include all correlation fields
- [ ] Log aggregation works correctly

## Files to Create/Modify
- `internal/observability/tracer.go` - Enhanced tracing
- `internal/infra/database/client.go` - Add tracing
- `internal/infra/cache/redis_cache.go` - Add tracing
- `internal/infra/bus/kafka_bus.go` - Add tracing
- `internal/metrics/metrics.go` - Expanded metrics
- `internal/logger/zap_logger.go` - Enhanced logging