Transform all documentation from modular monolith to true microservices
architecture where core services are independently deployable.
Key Changes:
- Core Kernel: Infrastructure only (no business logic)
- Core Services: Auth, Identity, Authz, Audit as separate microservices
  - Each service has its own entry point (cmd/{service}/)
  - Each service has its own gRPC server and database schema
  - Services register with Consul for service discovery
- API Gateway: Moved from Epic 8 to Epic 1 as core infrastructure
  - Single entry point for all external traffic
  - Handles routing, JWT validation, rate limiting, CORS
- Service Discovery: Consul as primary mechanism (ADR-0033)
- Database Pattern: Per-service connections with schema isolation
Documentation Updates:
- Updated all 9 architecture documents
- Updated 4 ADRs and created 2 new ADRs (API Gateway, Service Discovery)
- Rewrote Epic 1: Core Kernel & Infrastructure (infrastructure only)
- Rewrote Epic 2: Core Services (Auth, Identity, Authz, Audit as services)
- Updated Epic 3-8 stories for service architecture
- Updated plan.md, playbook.md, requirements.md, index.md
- Updated all epic READMEs and story files
New ADRs:
- ADR-0032: API Gateway Strategy
- ADR-0033: Service Discovery Implementation (Consul)
New Stories:
- Epic 1.7: Service Client Interfaces
- Epic 1.8: API Gateway Implementation
ADR-0033: Service Discovery Implementation
Status
Accepted
Context
The platform follows a microservices architecture where services need to discover and communicate with each other. We need a service discovery mechanism that:
- Enables services to find each other dynamically
- Supports health checking and automatic deregistration
- Works in both development (Docker Compose) and production (Kubernetes) environments
- Provides service registration and discovery APIs
- Supports multiple service instances (load balancing)
Options considered:
- Consul - HashiCorp's service discovery and configuration tool
- etcd - Distributed key-value store with service discovery
- Kubernetes Service Discovery - Native K8s service discovery
- Eureka - Netflix service discovery (Java-focused)
- Custom Registry - Build our own service registry
Decision
Use Consul as the primary service discovery implementation with support for Kubernetes service discovery as an alternative.
Rationale
- Mature and Production-Ready:
  - Battle-tested in production environments
  - Active development and strong community
  - Comprehensive documentation
- Feature-Rich:
  - Service registration and health checking
  - Key-value store for configuration
  - Service mesh capabilities (Consul Connect)
  - Multi-datacenter support
  - DNS-based service discovery
- Development-Friendly:
  - Easy to run locally (single binary or Docker)
  - Docker Compose integration
  - Good for local development setup
- Production-Ready:
  - Works well in Kubernetes (Consul K8s)
  - Can be used alongside Kubernetes service discovery
  - Supports high availability and clustering
- Language Agnostic:
  - HTTP API for service registration
  - gRPC support
  - Go client library available
- Health Checking:
  - Built-in health checking with automatic deregistration
  - Multiple health check types (HTTP, TCP, gRPC, script)
  - Health status propagation
Architecture
Service Registry Interface
// pkg/registry/registry.go
type ServiceRegistry interface {
	// Register a service instance
	Register(ctx context.Context, service *ServiceInstance) error
	// Deregister a service instance
	Deregister(ctx context.Context, serviceID string) error
	// Discover service instances
	Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error)
	// Watch for service changes
	Watch(ctx context.Context, serviceName string) (<-chan []*ServiceInstance, error)
	// Get service health
	Health(ctx context.Context, serviceID string) (*HealthStatus, error)
}

type ServiceInstance struct {
	ID       string
	Name     string
	Address  string
	Port     int
	Tags     []string
	Metadata map[string]string
}
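To make the contract concrete, here is a minimal sketch of an in-memory registry that could back unit tests. It is not part of this ADR's decision: `MemoryRegistry` and `NewMemoryRegistry` are hypothetical names, and only `Register`, `Deregister`, and `Discover` are covered (`Watch` and `Health` are left out to keep the example short).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ServiceInstance mirrors the struct defined above.
type ServiceInstance struct {
	ID       string
	Name     string
	Address  string
	Port     int
	Tags     []string
	Metadata map[string]string
}

// MemoryRegistry is a hypothetical in-memory registry intended for tests.
type MemoryRegistry struct {
	mu        sync.RWMutex
	instances map[string]*ServiceInstance // keyed by instance ID
}

func NewMemoryRegistry() *MemoryRegistry {
	return &MemoryRegistry{instances: make(map[string]*ServiceInstance)}
}

// Register stores the instance, replacing any previous entry with the same ID.
func (r *MemoryRegistry) Register(ctx context.Context, s *ServiceInstance) error {
	if s == nil || s.ID == "" || s.Name == "" {
		return errors.New("register: instance needs a non-empty ID and Name")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.instances[s.ID] = s
	return nil
}

// Deregister removes the instance and reports unknown IDs as errors.
func (r *MemoryRegistry) Deregister(ctx context.Context, serviceID string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.instances[serviceID]; !ok {
		return fmt.Errorf("deregister: unknown service ID %q", serviceID)
	}
	delete(r.instances, serviceID)
	return nil
}

// Discover returns every registered instance with the given service name.
func (r *MemoryRegistry) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var out []*ServiceInstance
	for _, s := range r.instances {
		if s.Name == serviceName {
			out = append(out, s)
		}
	}
	return out, nil
}

func main() {
	ctx := context.Background()
	reg := NewMemoryRegistry()
	for _, id := range []string{"auth-1", "auth-2"} {
		inst := &ServiceInstance{ID: id, Name: "auth-service", Address: "127.0.0.1", Port: 9090}
		if err := reg.Register(ctx, inst); err != nil {
			fmt.Println("register failed:", err)
			return
		}
	}
	found, _ := reg.Discover(ctx, "auth-service")
	fmt.Println(len(found)) // 2
}
```

A fake like this lets service code be tested against the interface without a running Consul agent.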
Consul Implementation
// internal/registry/consul/consul.go
// Assumes the import alias: consul "github.com/hashicorp/consul/api"
type ConsulRegistry struct {
	client *consul.Client
	config *ConsulConfig
}

// Register service with Consul
func (r *ConsulRegistry) Register(ctx context.Context, service *ServiceInstance) error {
	registration := &consul.AgentServiceRegistration{
		ID:      service.ID,
		Name:    service.Name,
		Address: service.Address,
		Port:    service.Port,
		Tags:    service.Tags,
		Meta:    service.Metadata,
		Check: &consul.AgentServiceCheck{
			HTTP:     fmt.Sprintf("http://%s:%d/healthz", service.Address, service.Port),
			Interval: "10s",
			Timeout:  "3s",
			// Consul documents a one-minute minimum for this setting,
			// so a shorter value such as "30s" is raised to one minute.
			DeregisterCriticalServiceAfter: "30s",
		},
	}
	// Pass ctx through so cancellation and deadlines apply to the API call.
	return r.client.Agent().ServiceRegisterOpts(registration, consul.ServiceRegisterOpts{}.WithContext(ctx))
}
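The check registered above polls `GET /healthz` on the service itself, so each service has to expose that endpoint. A minimal sketch of such a handler follows; the `Health` type and its readiness flag are assumptions made for illustration, not something this ADR prescribes. Consul's HTTP checks treat any 2xx response as passing, 429 as a warning, and everything else as critical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Health is a hypothetical readiness-aware handler for /healthz.
type Health struct{ ready atomic.Bool }

// SetReady flips the readiness flag, e.g. once the database connection is up.
func (h *Health) SetReady(v bool) { h.ready.Store(v) }

// ServeHTTP answers 200 when ready and 503 otherwise.
func (h *Health) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if !h.ready.Load() {
		http.Error(w, "not ready", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok") // the first write sends an implicit 200 OK
}

// probe issues an in-process GET /healthz and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	health := &Health{}
	mux := http.NewServeMux()
	mux.Handle("/healthz", health)

	fmt.Println(probe(mux)) // 503 while the service is still starting
	health.SetReady(true)
	fmt.Println(probe(mux)) // 200 once dependencies are available
}
```

Returning 503 during startup or shutdown lets Consul mark the instance critical before traffic reaches it.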
Implementation Strategy
Phase 1: Consul Implementation (Epic 1)
- Create service registry interface in pkg/registry/
- Implement Consul registry in internal/registry/consul/
- Basic service registration and discovery
- Health check integration
Phase 2: Kubernetes Support (Epic 6)
- Implement Kubernetes service discovery as alternative
- Service registry factory that selects implementation based on environment
- Support for both Consul and K8s in same codebase
Phase 3: Advanced Features (Epic 6)
- Service mesh integration (Consul Connect)
- Multi-datacenter support
- Service tags and filtering
- Service metadata and configuration
Configuration
registry:
  type: consul # or "kubernetes"
  consul:
    address: "localhost:8500"
    datacenter: "dc1"
    scheme: "http"
    health_check:
      interval: "10s"
      timeout: "3s"
      deregister_after: "30s"
  kubernetes:
    namespace: "default"
    in_cluster: true
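One way this configuration could be modelled and checked at startup is sketched below. The struct names, the `yaml` tags, and the `Validate` method are assumptions for illustration, as is the placement of `health_check` under `consul`. The choice of YAML decoder is left open.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// HealthCheckConfig holds durations as strings, as in the YAML above.
type HealthCheckConfig struct {
	Interval        string `yaml:"interval"`
	Timeout         string `yaml:"timeout"`
	DeregisterAfter string `yaml:"deregister_after"`
}

type ConsulConfig struct {
	Address     string            `yaml:"address"`
	Datacenter  string            `yaml:"datacenter"`
	Scheme      string            `yaml:"scheme"`
	HealthCheck HealthCheckConfig `yaml:"health_check"`
}

type KubernetesConfig struct {
	Namespace string `yaml:"namespace"`
	InCluster bool   `yaml:"in_cluster"`
}

type RegistryConfig struct {
	Type       string           `yaml:"type"`
	Consul     ConsulConfig     `yaml:"consul"`
	Kubernetes KubernetesConfig `yaml:"kubernetes"`
}

// Validate checks only the section selected by Type.
func (c RegistryConfig) Validate() error {
	switch c.Type {
	case "consul":
		if c.Consul.Address == "" {
			return errors.New("registry.consul.address is required")
		}
		hc := c.Consul.HealthCheck
		durations := []struct{ key, value string }{
			{"interval", hc.Interval},
			{"timeout", hc.Timeout},
			{"deregister_after", hc.DeregisterAfter},
		}
		for _, d := range durations {
			parsed, err := time.ParseDuration(d.value)
			if err != nil || parsed <= 0 {
				return fmt.Errorf("registry.consul.health_check.%s: invalid duration %q", d.key, d.value)
			}
		}
		return nil
	case "kubernetes":
		if c.Kubernetes.Namespace == "" {
			return errors.New("registry.kubernetes.namespace is required")
		}
		return nil
	default:
		return fmt.Errorf("unknown registry type: %q", c.Type)
	}
}

func main() {
	cfg := RegistryConfig{
		Type: "consul",
		Consul: ConsulConfig{
			Address:     "localhost:8500",
			Datacenter:  "dc1",
			Scheme:      "http",
			HealthCheck: HealthCheckConfig{Interval: "10s", Timeout: "3s", DeregisterAfter: "30s"},
		},
	}
	fmt.Println(cfg.Validate()) // <nil>

	cfg.Consul.HealthCheck.Interval = "ten seconds"
	fmt.Println(cfg.Validate() != nil) // true
}
```

Failing fast on a malformed duration keeps the error close to the configuration file rather than surfacing later as a rejected registration.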
Service Registration Flow
sequenceDiagram
participant Service
participant Registry as Service Registry Interface
participant Consul
participant Health as Health Check
Service->>Registry: Register(serviceInstance)
Registry->>Consul: Register service
Consul->>Consul: Store service info
Consul->>Health: Start health checks
loop Health Check
Health->>Service: GET /healthz
Service-->>Health: 200 OK
Health->>Consul: Update health status
end
Service->>Registry: Deregister(serviceID)
Registry->>Consul: Deregister service
Consul->>Consul: Remove service
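The register-on-start, deregister-on-shutdown sequence in the diagram could be wrapped in a small helper, sketched here under stated assumptions: `RunRegistered`, the `Registrar` interface, and the five-second deregistration timeout are illustrative choices, and `recorder` is a test double standing in for a real registry.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

type ServiceInstance struct {
	ID      string
	Name    string
	Address string
	Port    int
}

// Registrar is the subset of ServiceRegistry this sketch needs.
type Registrar interface {
	Register(ctx context.Context, s *ServiceInstance) error
	Deregister(ctx context.Context, serviceID string) error
}

// RunRegistered registers inst, runs serve until it returns, then deregisters.
func RunRegistered(ctx context.Context, reg Registrar, inst *ServiceInstance, serve func(context.Context) error) error {
	if err := reg.Register(ctx, inst); err != nil {
		return fmt.Errorf("register %s: %w", inst.ID, err)
	}
	defer func() {
		// ctx is usually already cancelled at shutdown, so use a fresh one.
		dctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		if err := reg.Deregister(dctx, inst.ID); err != nil {
			log.Printf("deregister %s: %v", inst.ID, err)
		}
	}()
	return serve(ctx)
}

// recorder is a test double that records the order of calls.
type recorder struct{ calls []string }

func (r *recorder) Register(ctx context.Context, s *ServiceInstance) error {
	r.calls = append(r.calls, "register")
	return nil
}

func (r *recorder) Deregister(ctx context.Context, serviceID string) error {
	r.calls = append(r.calls, "deregister")
	return nil
}

// demo runs the lifecycle once and returns the recorded call order.
func demo() []string {
	rec := &recorder{}
	inst := &ServiceInstance{ID: "auth-1", Name: "auth-service", Address: "127.0.0.1", Port: 9090}
	_ = RunRegistered(context.Background(), rec, inst, func(ctx context.Context) error {
		rec.calls = append(rec.calls, "serve")
		return nil
	})
	return rec.calls
}

func main() {
	fmt.Println(demo()) // [register serve deregister]
}
```

In a real service the context would typically come from `signal.NotifyContext`, so that SIGTERM ends `serve` and triggers deregistration. The health check's `DeregisterCriticalServiceAfter` setting remains the safety net for instances that crash without deregistering.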
Service Discovery Flow
sequenceDiagram
participant Client
participant Registry as Service Registry
participant Consul
participant Service1 as Service Instance 1
participant Service2 as Service Instance 2
Client->>Registry: Discover("auth-service")
Registry->>Consul: Query service instances
Consul-->>Registry: [instance1, instance2]
Registry->>Registry: Filter healthy instances
Registry-->>Client: [healthy instances]
Client->>Service1: gRPC call
Service1-->>Client: Response
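The diagram leaves open how the client chooses among the healthy instances it receives. A minimal sketch of one option, round-robin selection, is shown below. The `Instance` type with its `Healthy` flag and the `RoundRobin` picker are hypothetical and not part of the registry interface defined earlier.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Instance is a simplified view of a discovered instance.
// The Healthy flag is an assumption made for this sketch.
type Instance struct {
	Address string
	Port    int
	Healthy bool
}

var ErrNoHealthyInstances = errors.New("no healthy instances")

// RoundRobin cycles through healthy instances; it is safe for concurrent use.
type RoundRobin struct{ next atomic.Uint64 }

// Pick filters out unhealthy instances and returns the next one in rotation.
func (p *RoundRobin) Pick(instances []Instance) (Instance, error) {
	healthy := make([]Instance, 0, len(instances))
	for _, inst := range instances {
		if inst.Healthy {
			healthy = append(healthy, inst)
		}
	}
	if len(healthy) == 0 {
		return Instance{}, ErrNoHealthyInstances
	}
	n := p.next.Add(1) - 1 // 0, 1, 2, ... across calls
	return healthy[n%uint64(len(healthy))], nil
}

func main() {
	instances := []Instance{
		{Address: "10.0.0.1", Port: 9090, Healthy: true},
		{Address: "10.0.0.2", Port: 9090, Healthy: false},
		{Address: "10.0.0.3", Port: 9090, Healthy: true},
	}
	var picker RoundRobin
	for i := 0; i < 3; i++ {
		inst, err := picker.Pick(instances)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("%s:%d\n", inst.Address, inst.Port)
	}
	// Prints 10.0.0.1:9090, 10.0.0.3:9090, 10.0.0.1:9090
}
```

With the Consul Go client, filtering can also happen server-side by passing `passingOnly=true` to `Health().Service`, in which case the client only needs the rotation step.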
Development Setup
Docker Compose
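services:
  consul:
    # The "consul" image on Docker Hub is no longer updated; use hashicorp/consul.
    image: hashicorp/consul:latest
    ports:
      - "8500:8500"
    command: consul agent -dev -client=0.0.0.0
    # -dev keeps state in memory, so this volume only matters once -dev is dropped.
    volumes:
      - consul-data:/consul/data
volumes:
  consul-data: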
Local Development
# Run Consul in dev mode
consul agent -dev
# Or use Docker (the image is published as hashicorp/consul)
docker run -d --name consul -p 8500:8500 hashicorp/consul:latest
Production Deployment
Kubernetes
# Consul Helm Chart
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul --set global.datacenter=dc1
Standalone Cluster
- Deploy Consul cluster (3-5 nodes)
- Configure service discovery endpoints
- Set up Consul Connect for service mesh (optional)
Consequences
Positive
- Dynamic Service Discovery: Services can be added/removed without configuration changes
- Health Checking: Automatic removal of unhealthy services
- Load Balancing: Multiple service instances are discovered automatically, so clients can spread requests across them
- Configuration Management: Consul KV store for service configuration
- Service Mesh Ready: Can use Consul Connect for advanced features
- Development Friendly: Easy local setup with Docker
Negative
- Additional Infrastructure: Requires Consul cluster in production
- Network Dependency: Services depend on Consul availability
- Configuration Complexity: Need to configure Consul cluster
- Learning Curve: Team needs to understand Consul concepts
Mitigations
- High Availability: Deploy Consul cluster (3+ nodes)
- Caching: Cache service instances to reduce Consul queries
- Fallback: Support Kubernetes service discovery as fallback
- Documentation: Comprehensive setup and usage documentation
- Monitoring: Monitor Consul health and service registration
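The caching mitigation could take the form of a thin wrapper around discovery, sketched below. `CachedDiscoverer`, the TTL policy, and the choice to serve stale entries when the backend fails are all assumptions for illustration; `flakyBackend` is a test double.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

type ServiceInstance struct {
	ID      string
	Name    string
	Address string
	Port    int
}

// Discoverer is the part of ServiceRegistry that the cache wraps.
type Discoverer interface {
	Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error)
}

type cacheEntry struct {
	instances []*ServiceInstance
	expires   time.Time
}

// CachedDiscoverer is a hypothetical TTL cache in front of a Discoverer.
type CachedDiscoverer struct {
	inner   Discoverer
	ttl     time.Duration
	mu      sync.Mutex
	entries map[string]cacheEntry
}

func NewCachedDiscoverer(inner Discoverer, ttl time.Duration) *CachedDiscoverer {
	return &CachedDiscoverer{inner: inner, ttl: ttl, entries: make(map[string]cacheEntry)}
}

// Discover serves fresh cache hits, refreshes expired entries, and falls
// back to the stale entry when the backend cannot be reached.
func (c *CachedDiscoverer) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	c.mu.Lock()
	entry, cached := c.entries[serviceName]
	c.mu.Unlock()
	if cached && time.Now().Before(entry.expires) {
		return entry.instances, nil
	}
	fresh, err := c.inner.Discover(ctx, serviceName)
	if err != nil {
		if cached {
			return entry.instances, nil // stale data beats no data
		}
		return nil, err
	}
	c.mu.Lock()
	c.entries[serviceName] = cacheEntry{instances: fresh, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return fresh, nil
}

// flakyBackend is a test double that counts calls and can simulate an outage.
type flakyBackend struct {
	calls int
	down  bool
}

func (b *flakyBackend) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	b.calls++
	if b.down {
		return nil, errors.New("registry unreachable")
	}
	return []*ServiceInstance{{ID: serviceName + "-1", Name: serviceName, Address: "10.0.0.1", Port: 9090}}, nil
}

func main() {
	backend := &flakyBackend{}
	cache := NewCachedDiscoverer(backend, time.Minute)
	ctx := context.Background()
	cache.Discover(ctx, "auth-service")
	cache.Discover(ctx, "auth-service")
	fmt.Println(backend.calls) // 1: the second lookup is served from the cache
}
```

The trade-off is staleness: a longer TTL reduces load on Consul but delays the moment clients notice a deregistered instance, so the TTL is probably best kept near the health check interval.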
Alternative: Kubernetes Service Discovery
For Kubernetes deployments, we also support native Kubernetes service discovery:
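// internal/registry/kubernetes/k8s.go
type KubernetesRegistry struct {
	clientset kubernetes.Interface
	namespace string
}

func (r *KubernetesRegistry) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	endpoints, err := r.clientset.CoreV1().Endpoints(r.namespace).Get(ctx, serviceName, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("get endpoints for %q: %w", serviceName, err)
	}
	// Convert K8s endpoints to ServiceInstance.
	// toServiceInstances is a hypothetical helper that is not shown here.
	return toServiceInstances(serviceName, endpoints.Subsets), nil
}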
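The conversion step named in the comment above could look roughly like the following sketch. To keep the example free of external dependencies, the endpoint types are simplified stand-ins for those in `k8s.io/api/core/v1`; `toServiceInstances` and the ID format are assumptions.

```go
package main

import "fmt"

// Simplified stand-ins for the k8s.io/api/core/v1 endpoint types.
type EndpointAddress struct{ IP string }

type EndpointPort struct {
	Name string
	Port int32
}

type EndpointSubset struct {
	Addresses []EndpointAddress // ready addresses only
	Ports     []EndpointPort
}

type ServiceInstance struct {
	ID      string
	Name    string
	Address string
	Port    int
}

// toServiceInstances emits one instance per ready address and port pair.
func toServiceInstances(serviceName string, subsets []EndpointSubset) []*ServiceInstance {
	var out []*ServiceInstance
	for _, subset := range subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				out = append(out, &ServiceInstance{
					ID:      fmt.Sprintf("%s-%s-%d", serviceName, addr.IP, port.Port),
					Name:    serviceName,
					Address: addr.IP,
					Port:    int(port.Port),
				})
			}
		}
	}
	return out
}

func main() {
	subsets := []EndpointSubset{{
		Addresses: []EndpointAddress{{IP: "10.0.0.1"}, {IP: "10.0.0.2"}},
		Ports:     []EndpointPort{{Name: "grpc", Port: 9090}},
	}}
	for _, inst := range toServiceInstances("auth-service", subsets) {
		fmt.Println(inst.ID, inst.Address, inst.Port)
	}
}
```

Because Kubernetes lists only ready pods under `Addresses`, no separate health filter is needed on this path.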
Service Registry Factory
// internal/registry/factory.go
func NewServiceRegistry(cfg *config.Config) (registry.ServiceRegistry, error) {
	switch cfg.Registry.Type {
	case "consul":
		return consul.NewRegistry(cfg.Registry.Consul)
	case "kubernetes":
		return kubernetes.NewRegistry(cfg.Registry.Kubernetes)
	default:
		return nil, fmt.Errorf("unknown registry type: %s", cfg.Registry.Type)
	}
}
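Phase 2 mentions selecting the implementation based on environment. One possible reading, sketched here as an assumption rather than a decided behavior, is to honor an explicit `registry.type` and otherwise detect whether the process runs inside a Kubernetes pod. Kubernetes sets `KUBERNETES_SERVICE_HOST` in every pod; `resolveRegistryType` is a hypothetical helper whose result would feed the factory above.

```go
package main

import "fmt"

// resolveRegistryType returns the explicit setting when present. Otherwise it
// picks "kubernetes" inside a pod and falls back to "consul" elsewhere.
// getenv is injected so the function can be tested without touching the
// process environment.
func resolveRegistryType(configured string, getenv func(string) string) string {
	if configured != "" {
		return configured
	}
	if getenv("KUBERNETES_SERVICE_HOST") != "" {
		return "kubernetes"
	}
	return "consul"
}

func main() {
	inPod := func(key string) string {
		if key == "KUBERNETES_SERVICE_HOST" {
			return "10.96.0.1"
		}
		return ""
	}
	noEnv := func(string) string { return "" }

	fmt.Println(resolveRegistryType("", inPod))       // kubernetes
	fmt.Println(resolveRegistryType("", noEnv))       // consul
	fmt.Println(resolveRegistryType("consul", inPod)) // consul
}
```

In production code the second argument would be `os.Getenv`. An explicit setting always wins, so Consul can still be used inside a cluster.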