ADR-0016: OpenTelemetry Observability Strategy
Status
Accepted
Context
The platform needs distributed tracing and observability for:
- Request tracing across services/modules
- Performance monitoring
- Debugging production issues
- Integration with observability tools (Jaeger, Grafana, etc.)
Options considered:
- OpenTelemetry - Industry standard, vendor-neutral
- Zipkin - Older standard, less ecosystem support
- Custom tracing - Build our own
- No tracing - Only logs and metrics
Decision
Use OpenTelemetry (OTEL) for all observability:
- Tracing: Distributed tracing with spans
- Metrics: Prometheus-compatible metrics
- Logs: Structured logs with trace correlation
- Export: OTLP to a collector in production, stdout in development
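The "structured logs with trace correlation" item can be illustrated with a small sketch. It uses only the standard library (log/slog, Go 1.21+) and is not the project's actual logger. The names traceIDKey, withTraceID, traceHandler, newLogger, and renderLog are hypothetical. With the OTel SDK, the trace ID would normally come from the active span via trace.SpanContextFromContext(ctx) rather than a custom context key.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log/slog"
)

// traceIDKey is a hypothetical context key standing in for an active span.
type traceIDKey struct{}

// withTraceID stores a trace ID in the context.
func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// traceHandler wraps any slog.Handler and adds a trace_id attribute
// to each record whose context carries a trace ID.
type traceHandler struct{ inner slog.Handler }

func (h traceHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.inner.Enabled(ctx, l)
}

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok && id != "" {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.inner.Handle(ctx, r)
}

func (h traceHandler) WithAttrs(as []slog.Attr) slog.Handler {
	return traceHandler{h.inner.WithAttrs(as)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.inner.WithGroup(name)}
}

// newLogger builds a JSON logger. The timestamp is dropped only to keep
// the demo output stable; a real logger would keep it.
func newLogger(w io.Writer) *slog.Logger {
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{}
			}
			return a
		},
	}
	return slog.New(traceHandler{slog.NewJSONHandler(w, opts)})
}

// renderLog logs one message under the given trace ID and returns the JSON line.
func renderLog(traceID, msg string) string {
	var buf bytes.Buffer
	ctx := withTraceID(context.Background(), traceID)
	newLogger(&buf).InfoContext(ctx, msg)
	return buf.String()
}

func main() {
	fmt.Print(renderLog("4bf92f3577b34da6a3ce929d0e0e4736", "order created"))
}
```

Putting the correlation in a handler wrapper means call sites only need to pass the request context (InfoContext, ErrorContext); they never format the trace ID themselves.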
Rationale:
- Industry standard, vendor-neutral
- Excellent Go SDK support
- Integrates with major observability tools
- Supports metrics, traces, and logs
- Recommended in playbook.md
- Future-proof (not locked to specific vendor)
Consequences
Positive
- Vendor-neutral (can switch backends)
- Rich ecosystem and tooling
- Excellent Go SDK
- Supports all observability signals
Negative
- Learning curve for OpenTelemetry concepts
- Slight runtime overhead from instrumentation (kept small by sampling)
- Requires OTLP collector or compatible backend
Implementation Notes
- Install: go.opentelemetry.io/otel and contrib packages
- Initialize TracerProvider in internal/observability/tracer.go
- Use HTTP instrumentation middleware: otelhttp.NewHandler()
- Add database instrumentation via Ent interceptor
- Export to stdout for development, OTLP for production
- Include trace ID in structured logs
- Configure sampling for production (e.g., 10% or adaptive)
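The last few notes (exporter per environment, production sampling) can be sketched as follows. This is a standard-library illustration of the idea, not the project's configuration code. The Settings type, settingsFor, shouldSample, and the "production" environment name are assumptions made for the example. In practice the OTel Go SDK ships its own ratio-based sampler (TraceIDRatioBased in the sdk/trace package), which would be used instead of a hand-written one.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Settings is a hypothetical config type, not part of the OTel SDK.
type Settings struct {
	Exporter    string  // "stdout" or "otlp"
	SampleRatio float64 // fraction of traces to keep, 0.0 to 1.0
}

// settingsFor picks the exporter and sampling ratio from an environment name.
// The 10% production ratio mirrors the example figure in the notes above.
func settingsFor(env string) Settings {
	if env == "production" {
		return Settings{Exporter: "otlp", SampleRatio: 0.10}
	}
	return Settings{Exporter: "stdout", SampleRatio: 1.0}
}

// shouldSample makes a deterministic keep/drop decision from the trace ID,
// so every service that sees the same trace reaches the same answer.
func shouldSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	bound := uint64(ratio * (1 << 63))
	// Treat the low 8 bytes, shifted down to 63 bits, as a value in [0, 2^63).
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1
	return x < bound
}

func main() {
	s := settingsFor("production")
	var id [16]byte
	id[8] = 0x10 // arbitrary example trace ID
	fmt.Println(s.Exporter, shouldSample(id, s.SampleRatio))
}
```

Deriving the decision from the trace ID, rather than from a random draw per service, keeps traces complete: a trace is either recorded everywhere or nowhere.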