# Story 6.1: Enhanced Observability

## Metadata

- **Story ID**: 6.1
- **Title**: Enhanced Observability
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: High
- **Estimated Time**: 6-8 hours
- **Dependencies**: 1.6, 5.2, 5.1

## Goal

Enhance observability with full OpenTelemetry integration, expanded Prometheus metrics, and improved logging with request correlation.

## Description

This story completes the OpenTelemetry integration across the infrastructure components (database, Kafka, and Redis), expands the Prometheus metrics, and improves logging with request correlation and additional structured fields.

## Deliverables

### 1. Complete OpenTelemetry Integration

- Export traces to Jaeger/OTLP collector
- Add database instrumentation (Ent interceptor)
- Add Kafka instrumentation
- Add Redis instrumentation
- Create custom spans:
  - Module initialization spans
  - Background job spans
  - Event publishing spans
- Trace context propagation:
  - Include trace ID in logs
  - Propagate across HTTP calls
  - Include in error reports
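
The trace context propagation items above can be sketched without a tracing SDK. The following is a stdlib-only illustration, not the OpenTelemetry API: it stores a trace ID in a `context.Context`, writes it to an outgoing request as a W3C `traceparent` header, and recovers it on the receiving side so it can be attached to a log line. Every function name here (`formatTraceparent`, `parseTraceparent`, `inject`, `extract`) is hypothetical, as are the sample IDs and URL; in the actual story, OpenTelemetry's propagators would be expected to do this work.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strings"
)

// traceKey is an unexported context key type, so other packages cannot collide with it.
type traceKey struct{}

// withTraceID returns a copy of ctx that carries the given trace ID.
func withTraceID(ctx context.Context, traceID string) context.Context {
	return context.WithValue(ctx, traceKey{}, traceID)
}

// traceIDFrom returns the trace ID stored in ctx, or "" if none is set.
func traceIDFrom(ctx context.Context) string {
	id, _ := ctx.Value(traceKey{}).(string)
	return id
}

// formatTraceparent builds a W3C traceparent value: version "00", a 32-hex
// trace ID, a 16-hex parent span ID, and flags "01" (sampled).
func formatTraceparent(traceID, spanID string) string {
	return "00-" + traceID + "-" + spanID + "-01"
}

// parseTraceparent returns the trace ID from a traceparent value.
// ok is false when the value does not have the expected shape.
func parseTraceparent(v string) (traceID string, ok bool) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", false
	}
	return parts[1], true
}

// inject copies the trace ID from ctx into the outgoing headers, if one is set.
func inject(ctx context.Context, h http.Header, spanID string) {
	if id := traceIDFrom(ctx); id != "" {
		h.Set("traceparent", formatTraceparent(id, spanID))
	}
}

// extract returns a context carrying the trace ID found in the incoming headers.
// A missing or malformed header leaves ctx unchanged.
func extract(ctx context.Context, h http.Header) context.Context {
	if id, ok := parseTraceparent(h.Get("traceparent")); ok {
		return withTraceID(ctx, id)
	}
	return ctx
}

func main() {
	// Caller side: attach the trace ID to an outgoing request (never sent here).
	ctx := withTraceID(context.Background(), "4bf92f3577b34da6a3ce929d0e0e4736")
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://example.invalid/", nil)
	if err != nil {
		panic(err)
	}
	inject(ctx, req.Header, "00f067aa0ba902b7")

	// Callee side: recover the trace ID and include it in a log line.
	serverCtx := extract(context.Background(), req.Header)
	fmt.Printf("trace_id=%s msg=%q\n", traceIDFrom(serverCtx), "handling request")
}
```

Keeping the header formatting and parsing in pure functions makes the propagation logic easy to unit test without an HTTP server.
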
### 2. Prometheus Metrics Expansion

- Add more metrics:
  - Database connection pool stats
  - Cache hit/miss ratio
  - Event bus publish/consume rates
  - Background job execution times
  - Module-specific metrics (via module interface)
- Create metric labels:
  - `module` label for module metrics
  - `tenant_id` label (if multi-tenant)
  - `status` label for error rates
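
As a rough illustration of a labeled metric such as the cache hit/miss ratio, the sketch below keeps one counter value per label combination and renders it in the Prometheus text exposition format. It uses only the standard library as a stand-in; the real `internal/metrics/metrics.go` would presumably use a Prometheus client library instead. The type `labeledCounter`, the metric name `cache_requests_total`, and the `result` label are assumptions made for this example; only the `module` label comes from the list above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// labeledCounter is a minimal stand-in for a labeled counter: one
// monotonically increasing value per (module, result) pair.
type labeledCounter struct {
	mu     sync.Mutex
	name   string
	values map[[2]string]float64
}

func newLabeledCounter(name string) *labeledCounter {
	return &labeledCounter{name: name, values: make(map[[2]string]float64)}
}

// Inc adds one to the series identified by module and result.
func (c *labeledCounter) Inc(module, result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[[2]string{module, result}]++
}

// HitRatio returns hits / (hits + misses) for a module, or 0 with no traffic.
func (c *labeledCounter) HitRatio(module string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	hits := c.values[[2]string{module, "hit"}]
	total := hits + c.values[[2]string{module, "miss"}]
	if total == 0 {
		return 0
	}
	return hits / total
}

// Expose renders all series in the Prometheus text exposition format,
// sorted by label values so the output is deterministic.
func (c *labeledCounter) Expose() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([][2]string, 0, len(c.values))
	for k := range c.values {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i][0] != keys[j][0] {
			return keys[i][0] < keys[j][0]
		}
		return keys[i][1] < keys[j][1]
	})
	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE %s counter\n", c.name)
	for _, k := range keys {
		// %q is adequate for simple label values like the ones used here.
		fmt.Fprintf(&b, "%s{module=%q,result=%q} %g\n", c.name, k[0], k[1], c.values[k])
	}
	return b.String()
}

func main() {
	cache := newLabeledCounter("cache_requests_total")
	for i := 0; i < 3; i++ {
		cache.Inc("billing", "hit")
	}
	cache.Inc("billing", "miss")

	fmt.Print(cache.Expose())
	fmt.Printf("billing hit ratio: %.2f\n", cache.HitRatio("billing"))
}
```

Exporting raw hit and miss counts, rather than a precomputed ratio, leaves the ratio to be computed at query time over any window; `HitRatio` is included only to show the arithmetic.
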
### 3. Enhanced Logging

- Add structured fields:
  - `user_id` from context
  - `tenant_id` from context
  - `module` name for module logs
  - `trace_id` from OpenTelemetry
- Create log aggregation config:
  - JSON format for production
  - Human-readable for development
  - Support for Loki/CloudWatch/ELK
## Acceptance Criteria

- [ ] Traces are exported and visible in Jaeger
- [ ] All infrastructure components (database, Kafka, Redis) are instrumented
- [ ] Trace IDs are included in logs
- [ ] New metrics are exposed with the `module`, `tenant_id`, and `status` labels
- [ ] Logs include all correlation fields (`user_id`, `tenant_id`, `module`, `trace_id`)
- [ ] Logs are emitted as JSON in production and can be ingested by the configured aggregator (Loki/CloudWatch/ELK)

## Files to Create/Modify

- `internal/observability/tracer.go` - Enhanced tracing
- `internal/infra/database/client.go` - Add tracing
- `internal/infra/cache/redis_cache.go` - Add tracing
- `internal/infra/bus/kafka_bus.go` - Add tracing
- `internal/metrics/metrics.go` - Expanded metrics
- `internal/logger/zap_logger.go` - Enhanced logging