docs: Align documentation with true microservices architecture

Transform all documentation from a modular monolith to a true microservices
architecture in which core services are independently deployable.

Key Changes:
- Core Kernel: infrastructure only (no business logic)
- Core Services: Auth, Identity, Authz, Audit as separate microservices
  - Each service has its own entry point (cmd/{service}/)
  - Each service has its own gRPC server and database schema
  - Services register with Consul for service discovery
- API Gateway: moved from Epic 8 to Epic 1 as core infrastructure
  - Single entry point for all external traffic
  - Handles routing, JWT validation, rate limiting, CORS
- Service Discovery: Consul as the primary mechanism (ADR-0033)
- Database Pattern: per-service connections with schema isolation

Documentation Updates:
- Updated all 9 architecture documents
- Updated 4 ADRs and created 2 new ADRs (API Gateway, Service Discovery)
- Rewrote Epic 1: Core Kernel & Infrastructure (infrastructure only)
- Rewrote Epic 2: Core Services (Auth, Identity, Authz, Audit as services)
- Updated Epic 3-8 stories for the service architecture
- Updated plan.md, playbook.md, requirements.md, index.md
- Updated all epic READMEs and story files

New ADRs:
- ADR-0032: API Gateway Strategy
- ADR-0033: Service Discovery Implementation (Consul)

New Stories:
- Epic 1.7: Service Client Interfaces
- Epic 1.8: API Gateway Implementation

docs/content/adr/0033-service-discovery-implementation.md | 309 (new file)
@@ -0,0 +1,309 @@
# ADR-0033: Service Discovery Implementation

## Status

Accepted

## Context

The platform follows a microservices architecture where services need to discover and communicate with each other. We need a service discovery mechanism that:

- Enables services to find each other dynamically
- Supports health checking and automatic deregistration
- Works in both development (Docker Compose) and production (Kubernetes) environments
- Provides service registration and discovery APIs
- Supports multiple service instances (load balancing)

Options considered:

1. **Consul** - HashiCorp's service discovery and configuration tool
2. **etcd** - Distributed key-value store with service discovery
3. **Kubernetes Service Discovery** - Native K8s service discovery
4. **Eureka** - Netflix service discovery (Java-focused)
5. **Custom Registry** - Build our own service registry

## Decision

Use **Consul** as the primary service discovery implementation, with Kubernetes-native service discovery supported as an alternative.

### Rationale

1. **Mature and Production-Ready**:
   - Battle-tested in production environments
   - Active development and strong community
   - Comprehensive documentation

2. **Feature-Rich**:
   - Service registration and health checking
   - Key-value store for configuration
   - Service mesh capabilities (Consul Connect)
   - Multi-datacenter support
   - DNS-based service discovery

3. **Development-Friendly**:
   - Easy to run locally (single binary or Docker)
   - Docker Compose integration
   - Good for local development setup

4. **Production-Ready**:
   - Works well in Kubernetes (Consul on Kubernetes)
   - Can be used alongside Kubernetes service discovery
   - Supports high availability and clustering

5. **Language-Agnostic**:
   - HTTP API for service registration
   - gRPC support
   - Go client library available

6. **Health Checking**:
   - Built-in health checking with automatic deregistration
   - Multiple health check types (HTTP, TCP, gRPC, script)
   - Health status propagation

## Architecture

### Service Registry Interface

```go
// pkg/registry/registry.go
package registry

import "context"

// ServiceRegistry abstracts registration and discovery so the rest of
// the codebase does not depend on Consul directly.
type ServiceRegistry interface {
	// Register a service instance
	Register(ctx context.Context, service *ServiceInstance) error

	// Deregister a service instance
	Deregister(ctx context.Context, serviceID string) error

	// Discover service instances
	Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error)

	// Watch for service changes
	Watch(ctx context.Context, serviceName string) (<-chan []*ServiceInstance, error)

	// Get service health
	Health(ctx context.Context, serviceID string) (*HealthStatus, error)
}

// ServiceInstance describes one registered instance of a service.
type ServiceInstance struct {
	ID       string
	Name     string
	Address  string
	Port     int
	Tags     []string
	Metadata map[string]string
}
```
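Callers typically resolve a service name via `Discover` and then rotate across the returned instances. A round-robin picker is one simple client-side strategy; a self-contained sketch (the `Picker` type is illustrative, not part of the ADR, and the instance struct is repeated here in trimmed form):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ServiceInstance mirrors the registry type, trimmed so the sketch is
// self-contained.
type ServiceInstance struct {
	ID      string
	Address string
	Port    int
}

// Picker rotates round-robin over a discovered instance list.
type Picker struct {
	next      atomic.Uint64
	instances []*ServiceInstance
}

// Pick returns the next instance in rotation, or nil if none are known.
func (p *Picker) Pick() *ServiceInstance {
	if len(p.instances) == 0 {
		return nil
	}
	n := p.next.Add(1) - 1
	return p.instances[n%uint64(len(p.instances))]
}

func main() {
	p := &Picker{instances: []*ServiceInstance{
		{ID: "auth-1", Address: "10.0.0.1", Port: 9000},
		{ID: "auth-2", Address: "10.0.0.2", Port: 9000},
	}}
	for i := 0; i < 3; i++ {
		inst := p.Pick()
		fmt.Printf("%s -> %s:%d\n", inst.ID, inst.Address, inst.Port)
	}
}
```

The atomic counter makes `Pick` safe for concurrent callers; a production picker would also refresh its instance list from `Watch`.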

### Consul Implementation

```go
// internal/registry/consul/consul.go
package consul

import (
	"context"
	"fmt"

	consul "github.com/hashicorp/consul/api"
)

type ConsulRegistry struct {
	client *consul.Client
	config *ConsulConfig
}

// Register service with Consul, attaching an HTTP health check so that
// unhealthy instances are deregistered automatically.
func (r *ConsulRegistry) Register(ctx context.Context, service *ServiceInstance) error {
	registration := &consul.AgentServiceRegistration{
		ID:      service.ID,
		Name:    service.Name,
		Address: service.Address,
		Port:    service.Port,
		Tags:    service.Tags,
		Meta:    service.Metadata,
		Check: &consul.AgentServiceCheck{
			HTTP:                           fmt.Sprintf("http://%s:%d/healthz", service.Address, service.Port),
			Interval:                       "10s",
			Timeout:                        "3s",
			DeregisterCriticalServiceAfter: "30s",
		},
	}
	return r.client.Agent().ServiceRegister(registration)
}
```
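The corresponding `Deregister` is essentially a one-liner over the same client: `r.client.Agent().ServiceDeregister(serviceID)`. One detail worth unit-testing without a live Consul agent is the health check URL construction; a sketch with it extracted as a pure helper (the helper name is illustrative):

```go
package main

import "fmt"

// healthCheckURL builds the HTTP probe target Consul polls for a service
// instance. Extracted as a pure function so it can be tested without a
// running Consul agent.
func healthCheckURL(address string, port int) string {
	return fmt.Sprintf("http://%s:%d/healthz", address, port)
}

func main() {
	fmt.Println(healthCheckURL("10.0.0.1", 8080)) // http://10.0.0.1:8080/healthz
}
```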

## Implementation Strategy

### Phase 1: Consul Implementation (Epic 1)

- Create service registry interface in `pkg/registry/`
- Implement Consul registry in `internal/registry/consul/`
- Basic service registration and discovery
- Health check integration

### Phase 2: Kubernetes Support (Epic 6)

- Implement Kubernetes service discovery as an alternative
- Service registry factory that selects the implementation based on environment
- Support for both Consul and K8s in the same codebase

### Phase 3: Advanced Features (Epic 6)

- Service mesh integration (Consul Connect)
- Multi-datacenter support
- Service tags and filtering
- Service metadata and configuration

## Configuration

```yaml
registry:
  type: consul # or "kubernetes"
  consul:
    address: "localhost:8500"
    datacenter: "dc1"
    scheme: "http"
    health_check:
      interval: "10s"
      timeout: "3s"
      deregister_after: "30s"
  kubernetes:
    namespace: "default"
    in_cluster: true
```

## Service Registration Flow

```mermaid
sequenceDiagram
    participant Service
    participant Registry as Service Registry Interface
    participant Consul
    participant Health as Health Check

    Service->>Registry: Register(serviceInstance)
    Registry->>Consul: Register service
    Consul->>Consul: Store service info
    Consul->>Health: Start health checks

    loop Health Check
        Health->>Service: GET /healthz
        Service-->>Health: 200 OK
        Health->>Consul: Update health status
    end

    Service->>Registry: Deregister(serviceID)
    Registry->>Consul: Deregister service
    Consul->>Consul: Remove service
```

## Service Discovery Flow

```mermaid
sequenceDiagram
    participant Client
    participant Registry as Service Registry
    participant Consul
    participant Service1 as Service Instance 1
    participant Service2 as Service Instance 2

    Client->>Registry: Discover("auth-service")
    Registry->>Consul: Query service instances
    Consul-->>Registry: [instance1, instance2]
    Registry->>Registry: Filter healthy instances
    Registry-->>Client: [healthy instances]

    Client->>Service1: gRPC call
    Service1-->>Client: Response
```
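Clients that hold long-lived gRPC connections usually consume `Watch` instead of polling `Discover`, keeping a local snapshot of instances fresh. A sketch of that consumer loop (the channel wiring here is simulated; in practice the channel would come from `registry.Watch`):

```go
package main

import "fmt"

// ServiceInstance mirrors the registry type, trimmed so the sketch is
// self-contained.
type ServiceInstance struct {
	ID string
}

// consumeWatch drains instance-list updates until the channel closes and
// returns the last snapshot seen.
func consumeWatch(updates <-chan []*ServiceInstance) []*ServiceInstance {
	var current []*ServiceInstance
	for snapshot := range updates {
		current = snapshot
		fmt.Printf("instances now: %d\n", len(current))
	}
	return current
}

func main() {
	// Simulate two updates arriving on a Watch channel.
	ch := make(chan []*ServiceInstance, 2)
	ch <- []*ServiceInstance{{ID: "auth-1"}}
	ch <- []*ServiceInstance{{ID: "auth-1"}, {ID: "auth-2"}}
	close(ch)

	final := consumeWatch(ch)
	fmt.Println(len(final)) // 2
}
```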

## Development Setup

### Docker Compose

```yaml
services:
  consul:
    image: consul:latest
    ports:
      - "8500:8500"
    command: consul agent -dev -client=0.0.0.0
    volumes:
      - consul-data:/consul/data

volumes:
  consul-data:
```

### Local Development

```bash
# Run Consul in dev mode
consul agent -dev

# Or use Docker
docker run -d --name consul -p 8500:8500 consul:latest
```

## Production Deployment

### Kubernetes

```bash
# Install Consul via the official Helm chart
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul --set global.datacenter=dc1
```

### Standalone Cluster

- Deploy a Consul cluster (3-5 nodes)
- Configure service discovery endpoints
- Set up Consul Connect for service mesh (optional)

## Consequences

### Positive

- **Dynamic Service Discovery**: Services can be added or removed without configuration changes
- **Health Checking**: Automatic removal of unhealthy services
- **Load Balancing**: Multiple service instances automatically discovered
- **Configuration Management**: Consul KV store for service configuration
- **Service Mesh Ready**: Can use Consul Connect for advanced features
- **Development Friendly**: Easy local setup with Docker

### Negative

- **Additional Infrastructure**: Requires a Consul cluster in production
- **Network Dependency**: Services depend on Consul availability
- **Configuration Complexity**: Need to configure the Consul cluster
- **Learning Curve**: Team needs to understand Consul concepts

### Mitigations

1. **High Availability**: Deploy a Consul cluster (3+ nodes)
2. **Caching**: Cache service instances to reduce Consul queries
3. **Fallback**: Support Kubernetes service discovery as a fallback
4. **Documentation**: Comprehensive setup and usage documentation
5. **Monitoring**: Monitor Consul health and service registration

## Alternative: Kubernetes Service Discovery

For Kubernetes deployments, we also support native Kubernetes service discovery:

```go
// internal/registry/kubernetes/k8s.go
type KubernetesRegistry struct {
	clientset kubernetes.Interface
	namespace string
}

func (r *KubernetesRegistry) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	endpoints, err := r.clientset.CoreV1().Endpoints(r.namespace).Get(ctx, serviceName, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	// Convert K8s endpoints to ServiceInstance
	var instances []*ServiceInstance
	for _, subset := range endpoints.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				instances = append(instances, &ServiceInstance{
					Name:    serviceName,
					Address: addr.IP,
					Port:    int(port.Port),
				})
			}
		}
	}
	return instances, nil
}
```

## Service Registry Factory

```go
// internal/registry/factory.go
func NewServiceRegistry(cfg *config.Config) (registry.ServiceRegistry, error) {
	switch cfg.Registry.Type {
	case "consul":
		return consul.NewRegistry(cfg.Registry.Consul)
	case "kubernetes":
		return kubernetes.NewRegistry(cfg.Registry.Kubernetes)
	default:
		return nil, fmt.Errorf("unknown registry type: %s", cfg.Registry.Type)
	}
}
```

## References

- [ADR-0029: Microservices Architecture](./0029-microservices-architecture.md)
- [ADR-0030: Service Communication Strategy](./0030-service-communication-strategy.md)
- [ADR-0031: Service Repository Structure](./0031-service-repository-structure.md)
- [Consul Documentation](https://www.consul.io/docs)
- [Consul Go Client](https://github.com/hashicorp/consul/api)
- [Consul Kubernetes](https://www.consul.io/docs/k8s)