ADR-0016: OpenTelemetry Observability Strategy

Status

Accepted

Context

The platform needs distributed tracing and observability for:

  • Request tracing across services/modules
  • Performance monitoring
  • Debugging production issues
  • Integration with observability tools (Jaeger, Grafana, etc.)

Options considered:

  1. OpenTelemetry - Industry standard, vendor-neutral
  2. Zipkin - Older standard, less ecosystem support
  3. Custom tracing - Build our own
  4. No tracing - Only logs and metrics

Decision

Use OpenTelemetry (OTel) for all observability signals:

  1. Tracing: Distributed tracing with spans
  2. Metrics: Prometheus-compatible metrics
  3. Logs: Structured logs with trace correlation
  4. Export: OTLP collector for production, stdout for development

Rationale:

  • Industry standard, vendor-neutral
  • Excellent Go SDK support
  • Integrates with major observability tools
  • Supports metrics, traces, and logs
  • Recommended in playbook-golang.md
  • Future-proof (not locked to specific vendor)

Consequences

Positive

  • Vendor-neutral (can switch backends)
  • Rich ecosystem and tooling
  • Excellent Go SDK
  • Supports all observability signals

Negative

  • Learning curve for OpenTelemetry concepts
  • Some runtime overhead from creating and exporting spans (kept small by sampling)
  • Requires an OTLP collector or a compatible backend
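
To make the sampling trade-off concrete: ratio-based sampling is usually a deterministic function of the trace ID, so every service that sees the same trace reaches the same keep-or-drop decision and only the kept fraction pays the export cost. The sketch below illustrates that idea with the standard library only. `shouldSample` is a hypothetical helper written for illustration, not the SDK's implementation; in the Go SDK the corresponding option is `sdktrace.TraceIDRatioBased`.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
)

// shouldSample reports whether a trace with the given 16-byte ID is kept
// when sampling at the given ratio (0.0 to 1.0). The decision depends only
// on the trace ID, so it is stable across services.
func shouldSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Treat the low 8 bytes as a 63-bit number and keep the trace when it
	// falls below ratio * 2^63.
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	bound := uint64(ratio * (1 << 63))
	return x < bound
}

func main() {
	// Feed pseudo-random trace IDs through the sampler and count how many
	// are kept at a 10% ratio.
	rng := rand.New(rand.NewSource(1))
	const total = 100000
	kept := 0
	for i := 0; i < total; i++ {
		var id [16]byte
		rng.Read(id[:])
		if shouldSample(id, 0.10) {
			kept++
		}
	}
	fmt.Printf("kept %d of %d traces\n", kept, total)
}
```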

Implementation Notes

  • Install: go.opentelemetry.io/otel and the go.opentelemetry.io/contrib instrumentation packages
  • Initialize TracerProvider in internal/observability/tracer.go
  • Use HTTP instrumentation middleware: otelhttp.NewHandler()
  • Add database instrumentation via an Ent interceptor
  • Export to stdout for development, OTLP for production
  • Include trace ID in structured logs
  • Configure sampling for production (e.g., 10% or adaptive)