Story 6.1: Enhanced Observability
Metadata
- Story ID: 6.1
- Title: Enhanced Observability
- Epic: 6 - Observability & Production Readiness
- Status: Pending
- Priority: High
- Estimated Time: 6-8 hours
- Dependencies: 1.6, 5.2, 5.1
Goal
Enhance observability with full OpenTelemetry integration, an expanded set of Prometheus metrics, and improved logging with request correlation.
Description
This story enhances the observability system by completing OpenTelemetry integration with all infrastructure components, expanding Prometheus metrics, and improving logging with better correlation and structured fields.
Deliverables
1. Complete OpenTelemetry Integration
- Export traces to Jaeger/OTLP collector
- Add database instrumentation (Ent interceptor)
- Add Kafka instrumentation
- Add Redis instrumentation
- Create custom spans:
  - Module initialization spans
  - Background job spans
  - Event publishing spans
- Trace context propagation:
  - Include trace ID in logs
  - Propagate across HTTP calls
  - Include in error reports
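The propagation item above can be sketched with the standard library alone. This is only an illustration of the idea, assuming the W3C `traceparent` header format; a real implementation would most likely rely on OpenTelemetry's own propagators rather than hand-written parsing. The names `WithTraceID`, `TraceID`, `Inject`, `Extract`, and `roundTrip` are hypothetical and do not come from the project.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strings"
)

// traceKey is an unexported context key type, so other packages cannot collide with it.
type traceKey struct{}

// WithTraceID returns a copy of ctx that carries traceID (hypothetical helper).
func WithTraceID(ctx context.Context, traceID string) context.Context {
	return context.WithValue(ctx, traceKey{}, traceID)
}

// TraceID returns the trace ID stored in ctx, or "" if there is none.
func TraceID(ctx context.Context) string {
	id, _ := ctx.Value(traceKey{}).(string)
	return id
}

// Inject writes a W3C-style traceparent header for an outgoing HTTP call.
// Layout: version "00", 32-hex trace ID, 16-hex span ID, flags "01" (sampled).
func Inject(ctx context.Context, h http.Header, spanID string) {
	if id := TraceID(ctx); id != "" {
		h.Set("traceparent", fmt.Sprintf("00-%s-%s-01", id, spanID))
	}
}

// Extract reads traceparent from incoming headers and stores the trace ID in ctx.
// Headers that do not have the expected shape are ignored.
func Extract(ctx context.Context, h http.Header) context.Context {
	parts := strings.Split(h.Get("traceparent"), "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return ctx
	}
	return WithTraceID(ctx, parts[1])
}

// roundTrip injects a trace ID into fresh headers and extracts it again,
// returning the header value and the recovered trace ID.
func roundTrip(traceID, spanID string) (header, recovered string) {
	out := http.Header{}
	Inject(WithTraceID(context.Background(), traceID), out, spanID)
	in := Extract(context.Background(), out)
	return out.Get("traceparent"), TraceID(in)
}

func main() {
	header, recovered := roundTrip("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
	fmt.Println(header)
	fmt.Println(recovered)
}
```

The same trace ID read from the context is what would be attached to log lines and error reports, so all three bullets share one source of truth.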
2. Prometheus Metrics Expansion
- Add more metrics:
  - Database connection pool stats
  - Cache hit/miss ratio
  - Event bus publish/consume rates
  - Background job execution times
  - Module-specific metrics (via module interface)
- Create metric labels:
  - module label for module metrics
  - tenant_id label (if multi-tenant)
  - status label for error rates
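To make the role of the labels concrete, here is a minimal sketch of a labeled counter and an error rate derived from the status label. It deliberately uses only the standard library as a stand-in; the story itself targets Prometheus, whose client library would provide the real counter types. `Labels`, `CounterVec`, and `ErrorRate` are hypothetical names, and the "ok"/"error" status values are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Labels holds metric dimensions such as module, tenant_id, and status.
type Labels map[string]string

// key renders labels in sorted order so the same label set always maps to the same series.
func (l Labels) key() string {
	names := make([]string, 0, len(l))
	for n := range l {
		names = append(names, n)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, fmt.Sprintf("%s=%q", n, l[n]))
	}
	return strings.Join(parts, ",")
}

// CounterVec is a hypothetical, minimal labeled counter (one value per label set).
type CounterVec struct {
	mu     sync.Mutex
	series map[string]float64
}

// NewCounterVec returns an empty counter family.
func NewCounterVec() *CounterVec {
	return &CounterVec{series: map[string]float64{}}
}

// Inc adds one to the series identified by l.
func (c *CounterVec) Inc(l Labels) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[l.key()]++
}

// Value returns the current count for the series identified by l (0 if unseen).
func (c *CounterVec) Value(l Labels) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.series[l.key()]
}

// ErrorRate divides the "error" count by the total for one module.
// It returns 0 when the module has recorded nothing yet.
func ErrorRate(c *CounterVec, module string) float64 {
	ok := c.Value(Labels{"module": module, "status": "ok"})
	failed := c.Value(Labels{"module": module, "status": "error"})
	if ok+failed == 0 {
		return 0
	}
	return failed / (ok + failed)
}

func main() {
	requests := NewCounterVec()
	for i := 0; i < 3; i++ {
		requests.Inc(Labels{"module": "billing", "status": "ok"})
	}
	requests.Inc(Labels{"module": "billing", "status": "error"})
	fmt.Printf("billing error rate: %.2f\n", ErrorRate(requests, "billing"))
}
```

A tenant_id label would be added to the label set in the same way. Because every distinct label combination creates a new series, labels with many possible values (tenant IDs in particular) are worth reviewing before they are enabled.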
3. Enhanced Logging
- Add structured fields:
  - user_id from context
  - tenant_id from context
  - module name for module logs
  - trace_id from OpenTelemetry
- Create log aggregation config:
  - JSON format for production
  - Human-readable for development
  - Support for Loki/CloudWatch/ELK
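The logging items above could look roughly like the following. This sketch uses the standard library's `log/slog` package as a stand-in for the project's zap-based logger, so it shows the shape of the idea rather than the intended implementation. `WithCorrelation`, `contextHandler`, `NewLogger`, and `sampleFields` are hypothetical names, and storing the fields under string-typed context keys is an assumption.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"io"
	"log/slog"
	"os"
)

// ctxKey is a private key type for correlation values stored in a context.
type ctxKey string

// correlationKeys lists the fields copied from the context onto every log record.
var correlationKeys = []ctxKey{"user_id", "tenant_id", "module", "trace_id"}

// WithCorrelation stores the four correlation values in ctx (hypothetical helper).
func WithCorrelation(ctx context.Context, userID, tenantID, module, traceID string) context.Context {
	for i, v := range []string{userID, tenantID, module, traceID} {
		ctx = context.WithValue(ctx, correlationKeys[i], v)
	}
	return ctx
}

// contextHandler wraps another slog.Handler and adds correlation fields from ctx.
type contextHandler struct{ slog.Handler }

// Handle appends every non-empty correlation value before delegating.
func (h contextHandler) Handle(ctx context.Context, r slog.Record) error {
	for _, k := range correlationKeys {
		if v, ok := ctx.Value(k).(string); ok && v != "" {
			r.AddAttrs(slog.String(string(k), v))
		}
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so derived loggers keep the behavior.
func (h contextHandler) WithAttrs(a []slog.Attr) slog.Handler {
	return contextHandler{h.Handler.WithAttrs(a)}
}

func (h contextHandler) WithGroup(name string) slog.Handler {
	return contextHandler{h.Handler.WithGroup(name)}
}

// NewLogger returns a JSON logger for production and a text logger for development.
func NewLogger(w io.Writer, production bool) *slog.Logger {
	var base slog.Handler
	if production {
		base = slog.NewJSONHandler(w, nil)
	} else {
		base = slog.NewTextHandler(w, nil)
	}
	return slog.New(contextHandler{base})
}

// sampleFields logs one production (JSON) line and decodes it so the fields can be inspected.
func sampleFields(userID, tenantID, module, traceID string) map[string]any {
	var buf bytes.Buffer
	ctx := WithCorrelation(context.Background(), userID, tenantID, module, traceID)
	NewLogger(&buf, true).InfoContext(ctx, "order created")
	fields := map[string]any{}
	_ = json.Unmarshal(buf.Bytes(), &fields)
	return fields
}

func main() {
	ctx := WithCorrelation(context.Background(), "u-42", "t-7", "billing", "4bf92f3577b34da6a3ce929d0e0e4736")
	NewLogger(os.Stdout, true).InfoContext(ctx, "order created")  // JSON line
	NewLogger(os.Stdout, false).InfoContext(ctx, "order created") // key=value text line
}
```

Keeping the format switch in one constructor means the aggregation backend (Loki, CloudWatch, or ELK) only ever sees the JSON form, while empty values such as tenant_id in a single-tenant deployment are simply omitted.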
Acceptance Criteria
- Traces are exported and visible in Jaeger
- All infrastructure components are instrumented
- Trace IDs are included in logs
- Metrics are expanded with new dimensions
- Logs include all correlation fields
- Logs are emitted as JSON in production and in human-readable form in development, and can be ingested by Loki/CloudWatch/ELK
Files to Create/Modify
- internal/observability/tracer.go - Enhanced tracing
- internal/infra/database/client.go - Add tracing
- internal/infra/cache/redis_cache.go - Add tracing
- internal/infra/bus/kafka_bus.go - Add tracing
- internal/metrics/metrics.go - Expanded metrics
- internal/logger/zap_logger.go - Enhanced logging