# Story 1.3: Health Monitoring and Metrics System

## Metadata
- **Story ID**: 1.3
- **Title**: Health Monitoring and Metrics System
- **Epic**: 1 - Core Kernel & Infrastructure
- **Status**: Completed
- **Priority**: High
- **Estimated Time**: 5-6 hours
- **Dependencies**: 1.1, 1.2

## Goal
Implement comprehensive health checks and Prometheus metrics for monitoring platform health and performance.

## Description
This story creates a complete health monitoring system with liveness and readiness probes, and a comprehensive Prometheus metrics system for tracking HTTP requests, database queries, and errors.

## Deliverables

### 1. Health Check System
- **HealthChecker Interface** (`pkg/health/health.go`):
  - `HealthChecker` interface with `Check(ctx context.Context) error` method
  - Health status types
- **Health Registry** (`internal/health/registry.go`):
  - Thread-safe registry of health checkers
  - Register multiple health checkers
  - Aggregate health status
  - `GET /healthz` endpoint (liveness probe)
  - `GET /ready` endpoint (readiness probe with database check)
  - Individual component health checks

### 2. Prometheus Metrics System
- **Metrics Registry** (`internal/metrics/metrics.go`):
  - Prometheus registry setup
  - HTTP request duration histogram
  - HTTP request counter (by method, path, status code)
  - Database query duration histogram (via Ent interceptor)
  - Error counter (by type)
  - Custom metrics support
- **Metrics Endpoint**:
  - `GET /metrics` endpoint (Prometheus format)
  - Proper content type headers

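The HTTP request counter and duration histogram need a middleware that captures the method, path, status code, and elapsed time of each request. The sketch below shows that capture step using only the standard library. The `Recorder` interface, `Instrument`, and `memoryRecorder` are hypothetical names; in the real system the recorder would presumably forward to the Prometheus counter and histogram defined in `internal/metrics/metrics.go`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Recorder is a hypothetical seam between the middleware and the metrics
// backend; a production implementation would update Prometheus collectors.
type Recorder interface {
	ObserveRequest(method, path string, status int, d time.Duration)
}

// statusWriter remembers the status code written by the wrapped handler.
type statusWriter struct {
	http.ResponseWriter
	status int
}

// WriteHeader records the code before passing it on.
func (w *statusWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

// Instrument wraps next and reports every request to rec.
func Instrument(rec Recorder, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Default to 200 because handlers may never call WriteHeader.
		sw := &statusWriter{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(sw, r)
		rec.ObserveRequest(r.Method, r.URL.Path, sw.status, time.Since(start))
	})
}

// memoryRecorder counts requests by "METHOD path status". It is for
// demonstration only and is not safe for concurrent use.
type memoryRecorder struct{ counts map[string]int }

func (m *memoryRecorder) ObserveRequest(method, path string, status int, _ time.Duration) {
	m.counts[fmt.Sprintf("%s %s %d", method, path, status)]++
}

// demo sends one request through the middleware and returns the counts.
func demo() map[string]int {
	rec := &memoryRecorder{counts: map[string]int{}}
	h := Instrument(rec, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNotFound)
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/missing", nil))
	return rec.counts
}

func main() {
	fmt.Println(demo())
}
```

One design point to keep in mind: using the raw URL path as a label can create an unbounded number of time series when paths contain IDs, so labelling by route pattern is usually the safer choice.
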
### 3. Database Health Check
- Database connectivity check
- Connection pool status
- Query execution test

### 4. Integration
- Integration with HTTP server
- Integration with DI container
- Middleware for automatic metrics collection

## Implementation Steps

1. **Install Dependencies**
   ```bash
   go get github.com/prometheus/client_golang/prometheus
   ```

2. **Create Health Check Interface**
   - Create `pkg/health/health.go`
   - Define HealthChecker interface

3. **Implement Health Registry**
   - Create `internal/health/registry.go`
   - Implement registry and endpoints

4. **Create Metrics System**
   - Create `internal/metrics/metrics.go`
   - Define all metrics
   - Create registry

5. **Add Database Health Check**
   - Implement database health checker
   - Register with health registry

6. **Integrate with HTTP Server**
   - Add health endpoints
   - Add metrics endpoint
   - Add metrics middleware

7. **Integrate with DI**
   - Create provider functions
   - Register in container

## Acceptance Criteria
- [x] `/healthz` returns 200 when service is alive
- [x] `/ready` checks database connectivity and returns appropriate status
- [x] `/metrics` exposes Prometheus metrics in correct format
- [x] All HTTP requests are measured
- [x] Database queries are instrumented
- [x] Metrics are registered in DI container
- [x] Health checks can be extended by modules
- [x] Metrics follow Prometheus naming conventions

## Related ADRs
- [ADR-0014: Health Check Implementation](../../adr/0014-health-check-implementation.md)

## Implementation Notes
- Use Prometheus client library
- Follow Prometheus naming conventions
- Health checks should be fast (< 1 second)
- Metrics should have appropriate labels
- Consider adding custom business metrics in future

## Testing
```bash
# Test health endpoints
curl http://localhost:8080/healthz
curl http://localhost:8080/ready

# Test metrics endpoint
curl http://localhost:8080/metrics

# Test metrics collection
go test ./internal/metrics/...
```

## Files to Create/Modify
- `pkg/health/health.go` - Health checker interface
- `internal/health/registry.go` - Health registry
- `internal/metrics/metrics.go` - Metrics system
- `internal/server/server.go` - Add endpoints
- `internal/di/providers.go` - Add providers