goplt/docs/content/adr/0016-opentelemetry-observability.md
2025-11-05 11:00:36 +01:00

# ADR-0016: OpenTelemetry Observability Strategy
## Status
Accepted
## Context
The platform needs distributed tracing and observability for:
- Request tracing across services/modules
- Performance monitoring
- Debugging production issues
- Integration with observability tools (Jaeger, Grafana, etc.)
Options considered:
1. **OpenTelemetry** - Industry standard, vendor-neutral
2. **Zipkin** - Older, tracing-focused standard with a smaller ecosystem
3. **Custom tracing** - Build our own
4. **No tracing** - Only logs and metrics
## Decision
Use **OpenTelemetry (OTel)** for all observability signals:
1. **Tracing**: Distributed tracing with spans
2. **Metrics**: Prometheus-compatible metrics
3. **Logs**: Structured logs with trace correlation
4. **Export**: OTLP to an OpenTelemetry Collector in production, stdout in development
**Rationale:**
- Industry standard, vendor-neutral
- Excellent Go SDK support
- Integrates with major observability tools
- Supports metrics, traces, and logs
- Recommended in playbook.md
- Future-proof (not locked to a specific vendor)
## Consequences
### Positive
- Vendor-neutral (can switch backends)
- Rich ecosystem and tooling
- Excellent Go SDK
- Supports all observability signals
### Negative
- Learning curve for OpenTelemetry concepts
- Small runtime overhead from creating and exporting spans (reduced by sampling)
- Requires an OpenTelemetry Collector or an OTLP-compatible backend
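
To make the sampling trade-off concrete, here is a minimal, stdlib-only sketch of ratio-based sampling keyed on the trace ID. It is an illustration, not the SDK's implementation: in the Go SDK this role is typically filled by `sdktrace.TraceIDRatioBased(0.1)`, often wrapped in `sdktrace.ParentBased`. The function name `shouldSample` and the sample trace IDs are assumptions made for this example.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// shouldSample reports whether the trace with the given 32-hex-digit ID falls
// inside the sampled fraction. The decision depends only on the trace ID, so
// every service that sees the same trace reaches the same answer.
// (Hypothetical helper for illustration; not part of the OpenTelemetry API.)
func shouldSample(traceIDHex string, fraction float64) (bool, error) {
	raw, err := hex.DecodeString(traceIDHex)
	if err != nil || len(raw) != 16 {
		return false, fmt.Errorf("invalid trace ID %q", traceIDHex)
	}
	switch {
	case fraction >= 1:
		return true, nil
	case fraction <= 0:
		return false, nil
	}
	// Scale the fraction onto a 63-bit range and compare it with the low
	// 8 bytes of the trace ID, reduced to 63 bits.
	upperBound := uint64(fraction * (1 << 63))
	value := binary.BigEndian.Uint64(raw[8:16]) >> 1
	return value < upperBound, nil
}

func main() {
	ids := []string{
		"4bf92f3577b34da6a3ce929d0e0e4736",
		"00000000000000000000000000000001",
		"0000000000000000ffffffffffffffff",
	}
	for _, id := range ids {
		sampled, err := shouldSample(id, 0.10)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s sampled=%t\n", id, sampled)
	}
}
```

Because the decision is a pure function of the trace ID, a 10% ratio keeps roughly one trace in ten while still recording every span of the traces it keeps.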
### Implementation Notes
- Install: `go.opentelemetry.io/otel` and contrib packages
- Initialize TracerProvider in `internal/observability/tracer.go`
- Use HTTP instrumentation middleware: `otelhttp.NewHandler()`
- Add database instrumentation via Ent interceptor
- Export to stdout for development, OTLP for production
- Include trace ID in structured logs
- Configure sampling for production (e.g., a fixed 10% ratio, or adaptive sampling)
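
As an illustration of the trace-ID-in-logs note, the sketch below wraps a `log/slog` handler so that every record logged with a context gains a `trace_id` field. It assumes `log/slog` as the structured logger, which this ADR does not specify. To stay stdlib-only, the trace ID is read from a hypothetical context key (`traceIDKey`); with the OpenTelemetry SDK in place it would instead come from `trace.SpanContextFromContext(ctx).TraceID().String()`. The names `traceHandler`, `withTraceID`, and `logLine` are invented for this example.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"strings"
)

// traceIDKey is a hypothetical context key standing in for an active span.
type traceIDKey struct{}

// withTraceID stores a trace ID in the context.
func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// traceHandler decorates another slog.Handler with a trace_id attribute.
type traceHandler struct{ inner slog.Handler }

func (h traceHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.inner.Enabled(ctx, l)
}

// Handle adds trace_id when the context carries one, then delegates.
func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok && id != "" {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.inner.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the result so the decoration is not lost.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{inner: h.inner.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{inner: h.inner.WithGroup(name)}
}

// logLine logs msg as JSON, attaching traceID if it is non-empty, and returns
// the emitted line. The timestamp is dropped so the output is stable.
func logLine(traceID, msg string) string {
	var buf bytes.Buffer
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{} // a zero Attr discards the field
			}
			return a
		},
	}
	logger := slog.New(traceHandler{inner: slog.NewJSONHandler(&buf, opts)})
	ctx := context.Background()
	if traceID != "" {
		ctx = withTraceID(ctx, traceID)
	}
	logger.InfoContext(ctx, msg)
	return strings.TrimSpace(buf.String())
}

func main() {
	fmt.Println(logLine("4bf92f3577b34da6a3ce929d0e0e4736", "order created"))
}
```

Putting the correlation in the handler rather than at each call site means application code only needs to pass its request context to the logger.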