feat: reword phase to epic, update mkdocs
72
docs/content/stories/epic6/6.1-enhanced-observability.md
Normal file
@@ -0,0 +1,72 @@
# Story 6.1: Enhanced Observability

## Metadata
- **Story ID**: 6.1
- **Title**: Enhanced Observability
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: High
- **Estimated Time**: 6-8 hours
- **Dependencies**: 1.6, 5.2, 5.1

## Goal
Enhance observability with full OpenTelemetry integration, comprehensive Prometheus metrics expansion, and improved logging with request correlation.

## Description
This story enhances the observability system by completing OpenTelemetry integration with all infrastructure components, expanding Prometheus metrics, and improving logging with better correlation and structured fields.

## Deliverables

### 1. Complete OpenTelemetry Integration
- Export traces to Jaeger/OTLP collector
- Add database instrumentation (Ent interceptor)
- Add Kafka instrumentation
- Add Redis instrumentation
- Create custom spans:
  - Module initialization spans
  - Background job spans
  - Event publishing spans
- Trace context propagation:
  - Include trace ID in logs
  - Propagate across HTTP calls
  - Include in error reports

### 2. Prometheus Metrics Expansion
- Add more metrics:
  - Database connection pool stats
  - Cache hit/miss ratio
  - Event bus publish/consume rates
  - Background job execution times
  - Module-specific metrics (via module interface)
- Create metric labels:
  - `module` label for module metrics
  - `tenant_id` label (if multi-tenant)
  - `status` label for error rates

### 3. Enhanced Logging
- Add structured fields:
  - `user_id` from context
  - `tenant_id` from context
  - `module` name for module logs
  - `trace_id` from OpenTelemetry
- Create log aggregation config:
  - JSON format for production
  - Human-readable for development
  - Support for Loki/CloudWatch/ELK

## Acceptance Criteria
- [ ] Traces are exported and visible in Jaeger
- [ ] All infrastructure components are instrumented
- [ ] Trace IDs are included in logs
- [ ] Metrics are expanded with new dimensions
- [ ] Logs include all correlation fields
- [ ] Log aggregation works correctly

## Files to Create/Modify
- `internal/observability/tracer.go` - Enhanced tracing
- `internal/infra/database/client.go` - Add tracing
- `internal/infra/cache/redis_cache.go` - Add tracing
- `internal/infra/bus/kafka_bus.go` - Add tracing
- `internal/metrics/metrics.go` - Expanded metrics
- `internal/logger/zap_logger.go` - Enhanced logging

53
docs/content/stories/epic6/6.2-error-reporting.md
Normal file
@@ -0,0 +1,53 @@
# Story 6.2: Error Reporting (Sentry)

## Metadata
- **Story ID**: 6.2
- **Title**: Error Reporting (Sentry)
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: High
- **Estimated Time**: 4-5 hours
- **Dependencies**: 1.4

## Goal
Add comprehensive error reporting with Sentry integration that captures errors with full context.

## Description
This story integrates Sentry for error reporting, sending all errors from the error bus to Sentry with complete context including trace IDs, user information, and module context.

## Deliverables

### 1. Sentry Integration
- Install and configure Sentry SDK
- Integrate with error bus:
  - Send errors to Sentry
  - Include trace ID in Sentry events
  - Add user context (user ID, email)
  - Add module context (module name)
- Sentry middleware:
  - Capture panics
  - Capture HTTP errors (4xx, 5xx)
- Configure Sentry DSN via config

### 2. Error Context Enhancement
- Enrich errors with:
  - Request context
  - User information
  - Module information
  - Stack traces
  - Environment information

## Acceptance Criteria
- [ ] Errors are reported to Sentry with context
- [ ] Panics are captured and reported
- [ ] HTTP errors are captured
- [ ] Trace IDs are included in Sentry events
- [ ] User context is included
- [ ] Sentry DSN is configurable

## Files to Create/Modify
- `internal/errorbus/sentry_bus.go` - Sentry integration
- `internal/server/middleware.go` - Sentry middleware
- `internal/di/providers.go` - Add Sentry provider
- `config/default.yaml` - Add Sentry config

46
docs/content/stories/epic6/6.3-grafana-dashboards.md
Normal file
@@ -0,0 +1,46 @@
# Story 6.3: Grafana Dashboards

## Metadata
- **Story ID**: 6.3
- **Title**: Grafana Dashboards
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: Medium
- **Estimated Time**: 4-5 hours
- **Dependencies**: 1.3, 6.1

## Goal
Create comprehensive Grafana dashboards for monitoring platform health, performance, and errors.

## Description
This story creates Grafana dashboard JSON files that visualize platform metrics, health, and performance data from Prometheus.

## Deliverables

### 1. Grafana Dashboards (`ops/grafana/dashboards/`)
- `platform-overview.json` - Overall health dashboard
- `http-metrics.json` - HTTP request metrics
- `database-metrics.json` - Database performance
- `module-metrics.json` - Per-module metrics
- `error-rates.json` - Error tracking
- Dashboard setup documentation

### 2. Documentation
- Document dashboard setup in `docs/operations.md`
- Dashboard import instructions
- Metric explanation

## Acceptance Criteria
- [ ] All dashboards are created
- [ ] Dashboards display correct metrics
- [ ] Dashboard setup is documented
- [ ] Dashboards can be imported into Grafana

## Files to Create/Modify
- `ops/grafana/dashboards/platform-overview.json`
- `ops/grafana/dashboards/http-metrics.json`
- `ops/grafana/dashboards/database-metrics.json`
- `ops/grafana/dashboards/module-metrics.json`
- `ops/grafana/dashboards/error-rates.json`
- `docs/operations.md` - Dashboard documentation

53
docs/content/stories/epic6/6.4-rate-limiting.md
Normal file
@@ -0,0 +1,53 @@
# Story 6.4: Rate Limiting

## Metadata
- **Story ID**: 6.4
- **Title**: Rate Limiting
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: High
- **Estimated Time**: 4-5 hours
- **Dependencies**: 1.5, 5.1

## Goal
Implement rate limiting to prevent API abuse and ensure fair resource usage.

## Description
This story implements rate limiting middleware that limits requests per user and per IP address, with configurable limits per endpoint.

## Deliverables

### 1. Rate Limiting Middleware
- Per-user rate limiting
- Per-IP rate limiting
- Configurable limits per endpoint
- Rate limit storage (Redis)
- Return `X-RateLimit-*` headers

### 2. Configuration
- Rate limit config in `config/default.yaml`:
  ```yaml
  rate_limiting:
    enabled: true
    per_user: 100/minute
    per_ip: 1000/minute
  ```

### 3. Integration
- Integrate with HTTP server
- Add to middleware stack
- Error responses for rate limit exceeded

## Acceptance Criteria
- [ ] Rate limiting prevents abuse
- [ ] Per-user limits work correctly
- [ ] Per-IP limits work correctly
- [ ] Rate limit headers are returned
- [ ] Configuration is flexible
- [ ] Rate limits are stored in Redis

## Files to Create/Modify
- `internal/server/middleware.go` - Rate limiting middleware
- `internal/infra/ratelimit/limiter.go` - Rate limiter implementation
- `config/default.yaml` - Add rate limit config

54
docs/content/stories/epic6/6.5-security-hardening.md
Normal file
@@ -0,0 +1,54 @@
# Story 6.5: Security Hardening

## Metadata
- **Story ID**: 6.5
- **Title**: Security Hardening
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: High
- **Estimated Time**: 5-6 hours
- **Dependencies**: 1.5

## Goal
Add comprehensive security hardening including security headers, input validation, and request size limits.

## Description
This story implements security best practices including security headers, input validation, request size limits, and SQL injection protection.

## Deliverables

### 1. Security Headers Middleware
- `X-Content-Type-Options: nosniff`
- `X-Frame-Options: DENY`
- `X-XSS-Protection: 1; mode=block`
- `Strict-Transport-Security` (if HTTPS)
- `Content-Security-Policy`

### 2. Request Size Limits
- Max body size (10MB default)
- Max header size
- Configurable limits

### 3. Input Validation
- Use `github.com/go-playground/validator`
- Validate all request bodies
- Sanitize user inputs
- Validation error responses

### 4. SQL Injection Protection
- Use parameterized queries (Ent already does this)
- Add linter rule to prevent raw SQL
- Security scanning

## Acceptance Criteria
- [ ] Security headers are present
- [ ] Request size limits are enforced
- [ ] Input validation works
- [ ] SQL injection protection is in place
- [ ] Security headers are configurable

## Files to Create/Modify
- `internal/server/middleware.go` - Security headers middleware
- `internal/server/validation.go` - Input validation
- `config/default.yaml` - Add security config

53
docs/content/stories/epic6/6.6-performance-optimization.md
Normal file
@@ -0,0 +1,53 @@
# Story 6.6: Performance Optimization

## Metadata
- **Story ID**: 6.6
- **Title**: Performance Optimization
- **Epic**: 6 - Observability & Production Readiness
- **Status**: Pending
- **Priority**: Medium
- **Estimated Time**: 6-8 hours
- **Dependencies**: 1.2, 5.1

## Goal
Optimize platform performance through database connection pooling, query optimization, response compression, and caching strategies.

## Description
This story implements performance optimizations including database connection pooling, query optimization, response compression, and strategic caching.

## Deliverables

### 1. Database Connection Pooling
- Configure max connections
- Configure idle timeout
- Monitor pool stats
- Connection health checks

### 2. Query Optimization
- Add indexes for common queries
- Use database query logging (development)
- Add slow query detection
- Query performance monitoring

### 3. Response Compression
- Gzip middleware for large responses
- Configurable compression levels
- Content type filtering

### 4. Caching Strategy
- Cache frequently accessed data (user permissions, roles)
- Cache invalidation strategies
- Cache warming

## Acceptance Criteria
- [ ] Database connection pooling is optimized
- [ ] Query performance is improved
- [ ] Response compression works
- [ ] Caching strategy is effective
- [ ] Performance meets SLA (< 100ms p95 for auth endpoints)

## Files to Create/Modify
- `internal/infra/database/client.go` - Connection pooling
- `internal/server/middleware.go` - Compression middleware
- `internal/perm/in_memory_resolver.go` - Add caching

55
docs/content/stories/epic6/README.md
Normal file
@@ -0,0 +1,55 @@
# Epic 6: Observability & Production Readiness

## Overview
Enhance observability with full OpenTelemetry integration, add comprehensive error reporting (Sentry), create Grafana dashboards, improve logging with request correlation, add rate limiting and security hardening, and optimize performance.

## Stories

### 6.1 Enhanced Observability
- [Story: 6.1 - Enhanced Observability](./6.1-enhanced-observability.md)
- **Goal:** Enhance observability with full OpenTelemetry integration, comprehensive Prometheus metrics, and improved logging.
- **Deliverables:** Complete OpenTelemetry integration, expanded metrics, enhanced logging

### 6.2 Error Reporting (Sentry)
- [Story: 6.2 - Error Reporting](./6.2-error-reporting.md)
- **Goal:** Add comprehensive error reporting with Sentry integration.
- **Deliverables:** Sentry integration, error context enhancement

### 6.3 Grafana Dashboards
- [Story: 6.3 - Grafana Dashboards](./6.3-grafana-dashboards.md)
- **Goal:** Create comprehensive Grafana dashboards for monitoring.
- **Deliverables:** Grafana dashboard JSON files, documentation

### 6.4 Rate Limiting
- [Story: 6.4 - Rate Limiting](./6.4-rate-limiting.md)
- **Goal:** Implement rate limiting to prevent API abuse.
- **Deliverables:** Rate limiting middleware, configuration

### 6.5 Security Hardening
- [Story: 6.5 - Security Hardening](./6.5-security-hardening.md)
- **Goal:** Add comprehensive security hardening.
- **Deliverables:** Security headers, input validation, request limits

### 6.6 Performance Optimization
- [Story: 6.6 - Performance Optimization](./6.6-performance-optimization.md)
- **Goal:** Optimize platform performance.
- **Deliverables:** Connection pooling, query optimization, compression, caching

## Deliverables Checklist
- [ ] Full OpenTelemetry integration
- [ ] Sentry error reporting
- [ ] Enhanced logging with correlation
- [ ] Comprehensive Prometheus metrics
- [ ] Grafana dashboards
- [ ] Rate limiting
- [ ] Security hardening
- [ ] Performance optimizations

## Acceptance Criteria
- Traces are exported and visible in Jaeger
- Errors are reported to Sentry with context
- Logs include request IDs and trace IDs
- Metrics are exposed and scraped by Prometheus
- Rate limiting prevents abuse
- Security headers are present
- Performance meets SLA (< 100ms p95 for auth endpoints)