# Story 6.1: Enhanced Observability
## Metadata
- **Story ID**: 6.1
- **Title**: Enhanced Observability
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: High
- **Estimated Time**: 6-8 hours
- **Dependencies**: 1.6, 5.2, 5.1
## Goal
Enhance observability with full OpenTelemetry integration, expanded Prometheus metrics, and improved logging with request correlation.
## Description
This story completes the OpenTelemetry integration across all infrastructure components (database, Kafka, Redis), expands the Prometheus metrics, and improves logging with request correlation and structured fields.
## Deliverables
### 1. Complete OpenTelemetry Integration
- Export traces to Jaeger/OTLP collector
- Add database instrumentation (Ent interceptor)
- Add Kafka instrumentation
- Add Redis instrumentation
- Create custom spans:
  - Module initialization spans
  - Background job spans
  - Event publishing spans
- Trace context propagation:
  - Include trace ID in logs
  - Propagate across HTTP calls
  - Include in error reports
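
To make "propagate across HTTP calls" concrete, the sketch below builds and parses a W3C `traceparent` header value using only the standard library. It is an illustration of the wire format, not the planned implementation: in this story the OpenTelemetry propagator would normally handle this, and the helper names `formatTraceparent` and `parseTraceparent` are hypothetical. The sketch only checks field lengths and the `00` version; it does not validate hex digits or reject all-zero IDs.

```go
package main

import (
	"fmt"
	"strings"
)

// formatTraceparent builds a version-00 W3C traceparent value:
// 00-<32 hex trace ID>-<16 hex span ID>-<2 hex flags>.
// Hypothetical helper for illustration only.
func formatTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags)
}

// parseTraceparent extracts the trace and span IDs from a version-00 header.
// ok is false when the value does not have the expected shape.
func parseTraceparent(h string) (traceID, spanID string, ok bool) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", false
	}
	if len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	// Example IDs; a real tracer generates these per request.
	h := formatTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
	fmt.Println(h)

	// The receiving service recovers the trace ID and can attach it to
	// its own spans, logs, and error reports.
	traceID, spanID, ok := parseTraceparent(h)
	fmt.Println(traceID, spanID, ok)
}
```

The same trace ID recovered on the receiving side is what would be written to logs and error reports, so a single request can be followed across services.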
### 2. Prometheus Metrics Expansion
- Add more metrics:
  - Database connection pool stats
  - Cache hit/miss ratio
  - Event bus publish/consume rates
  - Background job execution times
  - Module-specific metrics (via module interface)
- Create metric labels:
  - `module` label for module metrics
  - `tenant_id` label (if multi-tenant)
  - `status` label for error rates
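
As a rough illustration of how labeled counters relate to the cache hit/miss ratio, the sketch below keeps per-label counts in a mutex-protected map. It is a standard-library stand-in, not the Prometheus client: `cacheCounters`, `labelKey`, and the `result` label are assumptions made for this example, and only the `module` label comes from the list above.

```go
package main

import (
	"fmt"
	"sync"
)

// labelKey mirrors a metric label set: the `module` label from the story
// plus a hypothetical `result` label ("hit" or "miss").
type labelKey struct {
	module string
	result string
}

// cacheCounters is a stdlib stand-in for a labeled counter metric.
type cacheCounters struct {
	mu     sync.Mutex
	counts map[labelKey]uint64
}

func newCacheCounters() *cacheCounters {
	return &cacheCounters{counts: make(map[labelKey]uint64)}
}

// inc records one observation for the given label values.
func (c *cacheCounters) inc(module, result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[labelKey{module, result}]++
}

// hitRatio returns hits / (hits + misses) for a module, or 0 with no data.
func (c *cacheCounters) hitRatio(module string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	hits := c.counts[labelKey{module, "hit"}]
	misses := c.counts[labelKey{module, "miss"}]
	if hits+misses == 0 {
		return 0
	}
	return float64(hits) / float64(hits+misses)
}

func main() {
	c := newCacheCounters()
	for i := 0; i < 3; i++ {
		c.inc("auth", "hit")
	}
	c.inc("auth", "miss")
	fmt.Printf("auth hit ratio: %.2f\n", c.hitRatio("auth")) // prints "auth hit ratio: 0.75"
}
```

With Prometheus it is usually preferable to export the raw hit and miss counters and compute the ratio at query time. Labels such as `tenant_id` create one time series per distinct value, so they are worth adding only where the number of tenants is bounded.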
### 3. Enhanced Logging
- Add structured fields:
  - `user_id` from context
  - `tenant_id` from context
  - `module` name for module logs
  - `trace_id` from OpenTelemetry
- Create log aggregation config:
  - JSON format for production
  - Human-readable for development
  - Support for Loki/CloudWatch/ELK
## Acceptance Criteria
- [ ] Traces are exported and visible in Jaeger
- [ ] All infrastructure components are instrumented
- [ ] Trace IDs are included in logs
- [ ] New metrics are exposed with the `module`, `tenant_id` (if multi-tenant), and `status` labels
- [ ] Logs include all correlation fields
- [ ] Logs are emitted as JSON in production and human-readable in development, and can be ingested by Loki/CloudWatch/ELK
## Files to Create/Modify
- `internal/observability/tracer.go` - Enhanced tracing
- `internal/infra/database/client.go` - Add tracing
- `internal/infra/cache/redis_cache.go` - Add tracing
- `internal/infra/bus/kafka_bus.go` - Add tracing
- `internal/metrics/metrics.go` - Expanded metrics
- `internal/logger/zap_logger.go` - Enhanced logging