# ADR-0016: OpenTelemetry Observability Strategy

## Status

Accepted

## Context

The platform needs distributed tracing and observability for:

- Request tracing across services/modules
- Performance monitoring
- Debugging production issues
- Integration with observability tools (Jaeger, Grafana, etc.)

Options considered:

1. **OpenTelemetry** - Industry standard, vendor-neutral
2. **Zipkin** - Older standard, less ecosystem support
3. **Custom tracing** - Build our own
4. **No tracing** - Only logs and metrics

## Decision

Use **OpenTelemetry (OTEL)** for all observability:

1. **Tracing**: Distributed tracing with spans
2. **Metrics**: Prometheus-compatible metrics
3. **Logs**: Structured logs with trace correlation
4. **Export**: OTLP collector for production, stdout for development

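
To make item 3 concrete, here is a minimal sketch of log/trace correlation using only the standard library's `log/slog` (Go 1.21+). It wraps a handler so that every record logged with a trace-carrying context gains a `trace_id` field. The names `traceIDKey`, `withTraceID`, `traceHandler`, and `demoRecord` are hypothetical and exist only for this illustration; in an OpenTelemetry setup the ID would normally be read from the active span context (for example via `trace.SpanContextFromContext(ctx)`) rather than from a custom context key.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// traceIDKey is a hypothetical context key standing in for the span
// context that OpenTelemetry would normally carry.
type traceIDKey struct{}

// withTraceID returns a context that carries the given trace ID.
func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// traceHandler wraps another slog.Handler and adds a trace_id attribute
// to every record whose context carries a trace ID.
type traceHandler struct {
	slog.Handler
}

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok && id != "" {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the derived handler so the trace_id
// behaviour survives calls such as logger.With(...).
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// demoRecord logs msg once through the wrapped handler and returns the
// decoded JSON record. An empty traceID simulates a request with no trace.
func demoRecord(traceID, msg string) (map[string]any, error) {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, nil)})

	ctx := context.Background()
	if traceID != "" {
		ctx = withTraceID(ctx, traceID)
	}
	logger.InfoContext(ctx, msg)

	fields := map[string]any{}
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	rec, err := demoRecord("4bf92f3577b34da6a3ce929d0e0e4736", "order created")
	if err != nil {
		panic(err)
	}
	fmt.Println(rec["msg"], rec["trace_id"])
}
```

Putting the correlation in a handler wrapper means call sites only need to pass the request context (`logger.InfoContext(ctx, ...)`), so the trace ID cannot be forgotten at individual log statements.
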
**Rationale:**

- Industry standard, vendor-neutral
- Excellent Go SDK support
- Integrates with major observability tools
- Supports metrics, traces, and logs
- Recommended in `playbook.md`
- Future-proof (not locked to a specific vendor)

## Consequences

### Positive

- Vendor-neutral (can switch backends)
- Rich ecosystem and tooling
- Excellent Go SDK
- Supports all observability signals

### Negative

- Learning curve for OpenTelemetry concepts
- Slight overhead (minimal with sampling)
- Requires OTLP collector or compatible backend

### Implementation Notes

- Install: `go.opentelemetry.io/otel` and contrib packages
- Initialize TracerProvider in `internal/observability/tracer.go`
- Use HTTP instrumentation middleware: `otelhttp.NewHandler()`
- Add database instrumentation via Ent interceptor
- Export to stdout for development, OTLP for production
- Include trace ID in structured logs
- Configure sampling for production (e.g., 10% or adaptive)

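
As an illustration of the 10% sampling note, the sketch below shows how a ratio-based sampler can decide from the trace ID alone, so that every service seeing the same trace reaches the same verdict. It is a standard-library-only approximation of the idea behind the Go SDK's `TraceIDRatioBased` sampler, not the SDK code itself; `TraceID` and `shouldSample` are hypothetical names. In a real setup one would more likely configure the provider with something like `sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.10))` instead of hand-rolling this.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// TraceID mirrors the 16-byte trace identifier used by W3C Trace Context.
type TraceID [16]byte

// shouldSample reports whether a trace is kept at the given ratio.
// The decision is a pure function of the trace ID, so it is consistent
// across processes without any coordination.
func shouldSample(id TraceID, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Scale the ratio onto a 63-bit range and compare it against the
	// low 8 bytes of the ID (shifted to 63 bits).
	bound := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(id[8:16]) >> 1
	return x < bound
}

func main() {
	const total = 100000
	kept := 0
	for i := 0; i < total; i++ {
		var id TraceID
		if _, err := rand.Read(id[:]); err != nil {
			panic(err)
		}
		if shouldSample(id, 0.10) {
			kept++
		}
	}
	// With random IDs the kept share should land close to 10%.
	fmt.Printf("kept %d of %d traces (%.1f%%)\n", kept, total, 100*float64(kept)/total)
}
```

A fixed ratio keeps overhead predictable; the "adaptive" alternative mentioned above would vary the ratio with load and is not covered by this sketch.
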