ADR-0016: OpenTelemetry Observability Strategy
Status
Accepted
Context
The platform needs distributed tracing and observability for:
- Request tracing across services/modules
- Performance monitoring
- Debugging production issues
- Integration with observability tools (Jaeger, Grafana, etc.)
Options considered:
- OpenTelemetry - Industry standard, vendor-neutral
- Zipkin - Older standard, less ecosystem support
- Custom tracing - Build our own
- No tracing - Only logs and metrics
Decision
Use OpenTelemetry (OTEL) for all observability:
- Tracing: Distributed tracing with spans
- Metrics: Prometheus-compatible metrics
- Logs: Structured logs with trace correlation
- Export: OTLP collector for production, stdout for development
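The "structured logs with trace correlation" item can be illustrated with a small standard-library sketch. It assumes the trace ID is already available on the request context. Here it is stored under a hypothetical traceIDKey, whereas with the OpenTelemetry SDK it would be read from the active span's context. The names WithTraceID, traceHandler, and logFields are illustrative and not part of any library.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// traceIDKey is a hypothetical context key used only in this sketch.
type traceIDKey struct{}

// WithTraceID returns a context that carries the given trace ID.
func WithTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// traceHandler wraps another slog.Handler and adds a trace_id attribute
// to every record whose context carries a trace ID.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok && id != "" {
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the derived handler so that loggers
// created via logger.With(...) keep adding the trace ID.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

// logFields writes one JSON log line for the given trace ID and returns
// the decoded fields, so the result is easy to inspect.
func logFields(traceID, msg string) map[string]any {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{slog.NewJSONHandler(&buf, nil)})

	ctx := context.Background()
	if traceID != "" {
		ctx = WithTraceID(ctx, traceID)
	}
	logger.InfoContext(ctx, msg)

	fields := map[string]any{}
	_ = json.Unmarshal(buf.Bytes(), &fields)
	return fields
}

func main() {
	fields := logFields("4bf92f3577b34da6a3ce929d0e0e4736", "order created")
	fmt.Println(fields["msg"], fields["trace_id"])
}
```

Putting the correlation logic in the handler rather than at each call site means application code only has to pass the context to the logger. Log lines written without a trace ID in the context are left unchanged.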
Rationale:
- Industry standard, vendor-neutral
- Excellent Go SDK support
- Integrates with major observability tools
- Supports metrics, traces, and logs
- Recommended in playbook-golang.md
- Future-proof (not locked to specific vendor)
Consequences
Positive
- Vendor-neutral (can switch backends)
- Rich ecosystem and tooling
- Excellent Go SDK
- Supports all observability signals
Negative
- Learning curve for OpenTelemetry concepts
- Slight runtime overhead (kept minimal by sampling)
- Requires OTLP collector or compatible backend
Implementation Notes
- Install: go.opentelemetry.io/otel and contrib packages
- Initialize TracerProvider in internal/observability/tracer.go
- Use HTTP instrumentation middleware: otelhttp.NewHandler()
- Add database instrumentation via Ent interceptor
- Export to stdout for development, OTLP for production
- Include trace ID in structured logs
- Configure sampling for production (e.g., 10% or adaptive)
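To make the 10% sampling note concrete, the following is a minimal standard-library sketch of ratio-based sampling keyed on the trace ID. ShouldSample is a hypothetical helper written for illustration and is not an OpenTelemetry API. In the platform itself the SDK's own ratio-based sampler would presumably be configured on the TracerProvider rather than hand-written.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// ShouldSample reports whether a trace is kept at the given ratio
// (0.10 keeps roughly 10% of traces). The decision is derived from the
// trace ID, so every service that sees the same ID decides the same way.
// traceID is expected to be 32 hex characters (16 bytes).
func ShouldSample(traceID string, ratio float64) bool {
	if ratio <= 0 {
		return false
	}
	if ratio >= 1 {
		return true
	}
	raw, err := hex.DecodeString(traceID)
	if err != nil || len(raw) != 16 {
		return false // malformed IDs are dropped in this sketch
	}
	// Treat the low 8 bytes as a 63-bit number and compare it against
	// the matching fraction of the 63-bit range.
	x := binary.BigEndian.Uint64(raw[8:]) >> 1
	threshold := uint64(ratio * (1 << 63))
	return x < threshold
}

func main() {
	ids := []string{
		"00000000000000000000000000000000",
		"4bf92f3577b34da6a3ce929d0e0e4736",
		"ffffffffffffffffffffffffffffffff",
	}
	for _, id := range ids {
		fmt.Printf("%s sampled=%v\n", id, ShouldSample(id, 0.10))
	}
}
```

Deriving the decision from the trace ID, instead of drawing a random number per request, keeps a trace either fully recorded or fully dropped across services, assuming trace IDs are uniformly distributed.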