feat: reword phase to epic, update mkdocs
126
docs/content/stories/epic1/1.3-health-metrics-system.md
Normal file
@@ -0,0 +1,126 @@
# Story 1.3: Health Monitoring and Metrics System

## Metadata
- **Story ID**: 1.3
- **Title**: Health Monitoring and Metrics System
- **Epic**: 1 - Core Kernel & Infrastructure
- **Status**: Pending
- **Priority**: High
- **Estimated Time**: 5-6 hours
- **Dependencies**: 1.1, 1.2

## Goal
Implement health checks and Prometheus metrics for monitoring platform health and performance.

## Description
This story adds a health monitoring system with liveness and readiness probes, plus a Prometheus metrics system that tracks HTTP requests, database queries, and errors.

## Deliverables

### 1. Health Check System
- **HealthChecker Interface** (`pkg/health/health.go`):
  - `HealthChecker` interface with `Check(ctx context.Context) error` method
  - Health status types
- **Health Registry** (`internal/health/registry.go`):
  - Thread-safe registry of health checkers
  - Register multiple health checkers
  - Aggregate health status
  - `GET /healthz` endpoint (liveness probe)
  - `GET /ready` endpoint (readiness probe with database check)
  - Individual component health checks

### 2. Prometheus Metrics System
- **Metrics Registry** (`internal/metrics/metrics.go`):
  - Prometheus registry setup
  - HTTP request duration histogram
  - HTTP request counter (by method, path, status code)
  - Database query duration histogram (via Ent interceptor)
  - Error counter (by type)
  - Custom metrics support
- **Metrics Endpoint**:
  - `GET /metrics` endpoint (Prometheus format)
  - Proper content type headers

### 3. Database Health Check
- Database connectivity check
- Connection pool status
- Query execution test

### 4. Integration
- Integration with HTTP server
- Integration with DI container
- Middleware for automatic metrics collection

## Implementation Steps

1. **Install Dependencies**
   ```bash
   go get github.com/prometheus/client_golang/prometheus
   ```

2. **Create Health Check Interface**
   - Create `pkg/health/health.go`
   - Define HealthChecker interface

3. **Implement Health Registry**
   - Create `internal/health/registry.go`
   - Implement registry and endpoints

4. **Create Metrics System**
   - Create `internal/metrics/metrics.go`
   - Define all metrics
   - Create registry

5. **Add Database Health Check**
   - Implement database health checker
   - Register with health registry

6. **Integrate with HTTP Server**
   - Add health endpoints
   - Add metrics endpoint
   - Add metrics middleware

7. **Integrate with DI**
   - Create provider functions
   - Register in container

## Acceptance Criteria
- [ ] `/healthz` returns 200 when the service is alive
- [ ] `/ready` checks database connectivity and returns the appropriate status
- [ ] `/metrics` exposes Prometheus metrics in the correct format
- [ ] All HTTP requests are measured
- [ ] Database queries are instrumented
- [ ] Metrics are registered in the DI container
- [ ] Health checks can be extended by modules
- [ ] Metrics follow Prometheus naming conventions

## Related ADRs
- [ADR-0014: Health Check Implementation](../../adr/0014-health-check-implementation.md)

## Implementation Notes
- Use the Prometheus client library
- Follow Prometheus naming conventions
- Health checks should be fast (< 1 second)
- Metrics should have appropriate labels
- Consider adding custom business metrics in the future

## Testing
```bash
# Test health endpoints
curl http://localhost:8080/healthz
curl http://localhost:8080/ready

# Test metrics endpoint
curl http://localhost:8080/metrics

# Test metrics collection
go test ./internal/metrics/...
```

## Files to Create/Modify
- `pkg/health/health.go` - Health checker interface
- `internal/health/registry.go` - Health registry
- `internal/metrics/metrics.go` - Metrics system
- `internal/server/server.go` - Add endpoints
- `internal/di/providers.go` - Add providers