# ADR-0016: OpenTelemetry Observability Strategy

## Status

Accepted

## Context

The platform needs distributed tracing and observability for:

- Request tracing across services/modules
- Performance monitoring
- Debugging production issues
- Integration with observability tools (Jaeger, Grafana, etc.)

Options considered:

1. **OpenTelemetry** - Industry standard, vendor-neutral
2. **Zipkin** - Older standard, less ecosystem support
3. **Custom tracing** - Build our own
4. **No tracing** - Only logs and metrics

## Decision

Use **OpenTelemetry (OTEL)** for all observability:

1. **Tracing**: Distributed tracing with spans
2. **Metrics**: Prometheus-compatible metrics
3. **Logs**: Structured logs with trace correlation
4. **Export**: OTLP collector for production, stdout for development

**Rationale:**

- Industry standard, vendor-neutral
- Excellent Go SDK support
- Integrates with major observability tools
- Supports metrics, traces, and logs
- Recommended in playbook.md
- Future-proof (not locked to specific vendor)

## Consequences

### Positive

- Vendor-neutral (can switch backends)
- Rich ecosystem and tooling
- Excellent Go SDK
- Supports all observability signals

### Negative

- Learning curve for OpenTelemetry concepts
- Slight overhead (minimal with sampling)
- Requires OTLP collector or compatible backend

### Implementation Notes

- Install: `go.opentelemetry.io/otel` and contrib packages
- Initialize TracerProvider in `internal/observability/tracer.go`
- Use HTTP instrumentation middleware: `otelhttp.NewHandler()`
- Add database instrumentation via Ent interceptor
- Export to stdout for development, OTLP for production
- Include trace ID in structured logs
- Configure sampling for production (e.g., 10% or adaptive)