docs: add mkdocs, update links, add architecture documentation
docs/content/adr/0014-health-check-implementation.md
# ADR-0014: Health Check Implementation

## Status

Accepted

## Context

The platform needs health check endpoints for:

- Kubernetes liveness probes (`/healthz`)
- Kubernetes readiness probes (`/ready`)
- Monitoring and alerting
- Load balancer health checks

Health checks should:

- Be fast and lightweight
- Check critical dependencies (database, cache, etc.)
- Provide clear status indicators

## Decision

Implement a **custom health check registry** with composable checkers:

1. **Liveness endpoint** (`/healthz`): Always returns 200 if the process is running
2. **Readiness endpoint** (`/ready`): Checks all registered health checkers
3. **Health check interface**: `type HealthChecker interface { Check(ctx context.Context) error }`
4. **Registry pattern**: Modules can register additional health checkers

**Rationale:**

- A custom implementation gives us full control
- The composable design lets modules add their own checks
- The single-method interface is easy to test
- Basic functionality needs no external dependencies
- It can be extended with Prometheus metrics later

## Consequences

### Positive

- Lightweight and fast
- Extensible by modules
- Easy to test
- Clear separation of liveness and readiness

### Negative

- We must implement it ourselves (though it is simple)
- We must maintain the registry

### Implementation Notes

- Define the `HealthChecker` interface in `pkg/health/health.go`
- Implement the registry in `internal/health/registry.go`, backed by a map of checkers
- Register the core checkers: database and cache (if enabled)
- Add both endpoints to the HTTP router
- Return a JSON response: `{"status": "ok", "checks": {...}}`
- Consider a timeout (e.g., 5 seconds) for readiness checks