# ADR-0033: Service Discovery Implementation

## Status
Accepted

## Context
The platform follows a microservices architecture in which services need to discover and communicate with each other. We need a service discovery mechanism that:
- Enables services to find each other dynamically
- Supports health checking and automatic deregistration
- Works in both development (Docker Compose) and production (Kubernetes) environments
- Provides service registration and discovery APIs
- Supports multiple service instances (load balancing)

Options considered:
1. **Consul** - HashiCorp's service discovery and configuration tool
2. **etcd** - Distributed key-value store with service discovery
3. **Kubernetes Service Discovery** - Native K8s service discovery
4. **Eureka** - Netflix service discovery (Java-focused)
5. **Custom Registry** - Build our own service registry

## Decision
Use **Consul** as the primary service discovery implementation, with Kubernetes service discovery supported as an alternative.

### Rationale

1. **Mature and Production-Ready**:
   - Battle-tested in production environments
   - Active development and strong community
   - Comprehensive documentation

2. **Feature-Rich**:
   - Service registration and health checking
   - Key-value store for configuration
   - Service mesh capabilities (Consul Connect)
   - Multi-datacenter support
   - DNS-based service discovery

3. **Development-Friendly**:
   - Easy to run locally (single binary or Docker)
   - Docker Compose integration
   - Good for local development setup

4. **Production Deployment**:
   - Works well in Kubernetes (Consul K8s)
   - Can be used alongside Kubernetes service discovery
   - Supports high availability and clustering

5. **Language Agnostic**:
   - HTTP API for service registration
   - gRPC support
   - Go client library available

6. **Health Checking**:
   - Built-in health checking with automatic deregistration
   - Multiple health check types (HTTP, TCP, gRPC, script)
   - Health status propagation

## Architecture

### Service Registry Interface

```go
// pkg/registry/registry.go
type ServiceRegistry interface {
	// Register a service instance
	Register(ctx context.Context, service *ServiceInstance) error

	// Deregister a service instance
	Deregister(ctx context.Context, serviceID string) error

	// Discover service instances
	Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error)

	// Watch for service changes
	Watch(ctx context.Context, serviceName string) (<-chan []*ServiceInstance, error)

	// Get service health
	Health(ctx context.Context, serviceID string) (*HealthStatus, error)
}

type ServiceInstance struct {
	ID       string
	Name     string
	Address  string
	Port     int
	Tags     []string
	Metadata map[string]string
}
```

### Consul Implementation

```go
// internal/registry/consul/consul.go
import (
	"context"
	"fmt"

	consul "github.com/hashicorp/consul/api"
)

type ConsulRegistry struct {
	client *consul.Client
	config *ConsulConfig
}

// Register service with Consul
func (r *ConsulRegistry) Register(ctx context.Context, service *ServiceInstance) error {
	registration := &consul.AgentServiceRegistration{
		ID:      service.ID,
		Name:    service.Name,
		Address: service.Address,
		Port:    service.Port,
		Tags:    service.Tags,
		Meta:    service.Metadata,
		Check: &consul.AgentServiceCheck{
			HTTP:     fmt.Sprintf("http://%s:%d/healthz", service.Address, service.Port),
			Interval: "10s",
			Timeout:  "3s",
			// Note: Consul applies a 1-minute minimum to this value.
			DeregisterCriticalServiceAfter: "30s",
		},
	}

	return r.client.Agent().ServiceRegister(registration)
}
```

## Implementation Strategy

### Phase 1: Consul Implementation (Epic 1)
- Create service registry interface in `pkg/registry/`
- Implement Consul registry in `internal/registry/consul/`
- Basic service registration and discovery
- Health check integration

### Phase 2: Kubernetes Support (Epic 6)
- Implement Kubernetes service discovery as alternative
- Service registry factory that selects implementation based on environment
- Support for both Consul and K8s in same codebase

### Phase 3: Advanced Features (Epic 6)
- Service mesh integration (Consul Connect)
- Multi-datacenter support
- Service tags and filtering
- Service metadata and configuration

## Configuration

```yaml
registry:
  type: consul  # or "kubernetes"
  consul:
    address: "localhost:8500"
    datacenter: "dc1"
    scheme: "http"
    health_check:
      interval: "10s"
      timeout: "3s"
      deregister_after: "30s"
  kubernetes:
    namespace: "default"
    in_cluster: true
```

## Service Registration Flow

```mermaid
sequenceDiagram
    participant Service
    participant Registry as Service Registry Interface
    participant Consul
    participant Health as Health Check

    Service->>Registry: Register(serviceInstance)
    Registry->>Consul: Register service
    Consul->>Consul: Store service info
    Consul->>Health: Start health checks

    loop Health Check
        Health->>Service: GET /healthz
        Service-->>Health: 200 OK
        Health->>Consul: Update health status
    end

    Service->>Registry: Deregister(serviceID)
    Registry->>Consul: Deregister service
    Consul->>Consul: Remove service
```

## Service Discovery Flow

```mermaid
sequenceDiagram
    participant Client
    participant Registry as Service Registry
    participant Consul
    participant Service1 as Service Instance 1
    participant Service2 as Service Instance 2

    Client->>Registry: Discover("auth-service")
    Registry->>Consul: Query service instances
    Consul-->>Registry: [instance1, instance2]
    Registry->>Registry: Filter healthy instances
    Registry-->>Client: [healthy instances]
    Client->>Service1: gRPC call
    Service1-->>Client: Response
```

## Development Setup

### Docker Compose

```yaml
services:
  consul:
    image: hashicorp/consul:latest
    ports:
      - "8500:8500"
    command: consul agent -dev -client=0.0.0.0
    volumes:
      - consul-data:/consul/data

volumes:
  consul-data:
```

### Local Development

```bash
# Run Consul in dev mode
consul agent -dev

# Or use Docker
docker run -d --name consul -p 8500:8500 hashicorp/consul:latest
```

## Production Deployment

### Kubernetes

```bash
# Consul Helm Chart
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul --set global.datacenter=dc1
```

### Standalone Cluster
- Deploy Consul cluster (3-5 nodes)
- Configure service discovery endpoints
- Set up Consul Connect for service mesh (optional)

## Consequences

### Positive
- **Dynamic Service Discovery**: Services can be added/removed without configuration changes
- **Health Checking**: Automatic removal of unhealthy services
- **Load Balancing**: Multiple service instances automatically discovered
- **Configuration Management**: Consul KV store for service configuration
- **Service Mesh Ready**: Can use Consul Connect for advanced features
- **Development Friendly**: Easy local setup with Docker

### Negative
- **Additional Infrastructure**: Requires Consul cluster in production
- **Network Dependency**: Services depend on Consul availability
- **Configuration Complexity**: Need to configure Consul cluster
- **Learning Curve**: Team needs to understand Consul concepts

### Mitigations
1. **High Availability**: Deploy Consul cluster (3+ nodes)
2. **Caching**: Cache service instances to reduce Consul queries
3. **Fallback**: Support Kubernetes service discovery as fallback
4. **Documentation**: Comprehensive setup and usage documentation
5. **Monitoring**: Monitor Consul health and service registration

## Alternative: Kubernetes Service Discovery

For Kubernetes deployments, we also support native Kubernetes service discovery:

```go
// internal/registry/kubernetes/k8s.go
type KubernetesRegistry struct {
	clientset kubernetes.Interface
	namespace string
}

func (r *KubernetesRegistry) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	endpoints, err := r.clientset.CoreV1().Endpoints(r.namespace).Get(ctx, serviceName, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	// Convert K8s endpoints to ServiceInstance: one per ready address and port.
	var instances []*ServiceInstance
	for _, subset := range endpoints.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				instances = append(instances, &ServiceInstance{Name: serviceName, Address: addr.IP, Port: int(port.Port)})
			}
		}
	}
	return instances, nil
}
```

## Service Registry Factory

```go
// internal/registry/factory.go
func NewServiceRegistry(cfg *config.Config) (registry.ServiceRegistry, error) {
	switch cfg.Registry.Type {
	case "consul":
		return consul.NewRegistry(cfg.Registry.Consul)
	case "kubernetes":
		return kubernetes.NewRegistry(cfg.Registry.Kubernetes)
	default:
		return nil, fmt.Errorf("unknown registry type: %s", cfg.Registry.Type)
	}
}
```

## References

- [ADR-0029: Microservices Architecture](./0029-microservices-architecture.md)
- [ADR-0030: Service Communication Strategy](./0030-service-communication-strategy.md)
- [ADR-0031: Service Repository Structure](./0031-service-repository-structure.md)
- [Consul Documentation](https://www.consul.io/docs)
- [Consul Go Client](https://github.com/hashicorp/consul/api)
- [Consul Kubernetes](https://www.consul.io/docs/k8s)