# ADR-0016: OpenTelemetry Observability Strategy

## Status
Accepted

## Context
The platform needs distributed tracing and observability for:
- Request tracing across services/modules
- Performance monitoring
- Debugging production issues
- Integration with observability tools (Jaeger, Grafana, etc.)

Options considered:
1. **OpenTelemetry** - Industry standard, vendor-neutral
2. **Zipkin** - Older standard, less ecosystem support
3. **Custom tracing** - Build our own
4. **No tracing** - Only logs and metrics

## Decision
Use **OpenTelemetry (OTEL)** for all observability:

1. **Tracing**: Distributed tracing with spans
2. **Metrics**: Prometheus-compatible metrics
3. **Logs**: Structured logs with trace correlation
4. **Export**: OTLP collector for production, stdout for development

**Rationale:**
- Industry standard, vendor-neutral
- Excellent Go SDK support
- Integrates with major observability tools
- Supports metrics, traces, and logs
- Recommended in playbook.md
- Future-proof (not locked to specific vendor)

## Consequences

### Positive
- Vendor-neutral (can switch backends)
- Rich ecosystem and tooling
- Excellent Go SDK
- Supports all observability signals

### Negative
- Learning curve for OpenTelemetry concepts
- Slight overhead (minimal with sampling)
- Requires OTLP collector or compatible backend

### Implementation Notes
- Install: `go.opentelemetry.io/otel` and contrib packages
- Initialize TracerProvider in `internal/observability/tracer.go`
- Use HTTP instrumentation middleware: `otelhttp.NewHandler()`
- Add database instrumentation via Ent interceptor
- Export to stdout for development, OTLP for production
- Include trace ID in structured logs
- Configure sampling for production (e.g., 10% or adaptive)