
ADR-0033: Service Discovery Implementation

Status

Accepted

Context

The platform follows a microservices architecture where services need to discover and communicate with each other. We need a service discovery mechanism that:

  • Enables services to find each other dynamically
  • Supports health checking and automatic deregistration
  • Works in both development (Docker Compose) and production (Kubernetes) environments
  • Provides service registration and discovery APIs
  • Supports multiple service instances (load balancing)

Options considered:

  1. Consul - HashiCorp's service discovery and configuration tool
  2. etcd - Distributed key-value store with service discovery
  3. Kubernetes Service Discovery - Native K8s service discovery
  4. Eureka - Netflix service discovery (Java-focused)
  5. Custom Registry - Build our own service registry

Decision

Use Consul as the primary service discovery implementation with support for Kubernetes service discovery as an alternative.

Rationale

  1. Mature and Production-Ready:

    • Battle-tested in production environments
    • Active development and strong community
    • Comprehensive documentation
  2. Feature-Rich:

    • Service registration and health checking
    • Key-value store for configuration
    • Service mesh capabilities (Consul Connect)
    • Multi-datacenter support
    • DNS-based service discovery
  3. Development-Friendly:

    • Easy to run locally (single binary or Docker)
    • Docker Compose integration
    • Good for local development setup
  4. Kubernetes-Compatible:

    • Works well in Kubernetes (Consul K8s)
    • Can be used alongside Kubernetes service discovery
    • Supports high availability and clustering
  5. Language Agnostic:

    • HTTP API for service registration
    • gRPC support
    • Go client library available
  6. Health Checking:

    • Built-in health checking with automatic deregistration
    • Multiple health check types (HTTP, TCP, gRPC, script)
    • Health status propagation

Architecture

Service Registry Interface

```go
// pkg/registry/registry.go
type ServiceRegistry interface {
    // Register a service instance
    Register(ctx context.Context, service *ServiceInstance) error

    // Deregister a service instance
    Deregister(ctx context.Context, serviceID string) error

    // Discover service instances
    Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error)

    // Watch for service changes
    Watch(ctx context.Context, serviceName string) (<-chan []*ServiceInstance, error)

    // Get service health
    Health(ctx context.Context, serviceID string) (*HealthStatus, error)
}

type ServiceInstance struct {
    ID       string
    Name     string
    Address  string
    Port     int
    Tags     []string
    Metadata map[string]string
}
```

Consul Implementation

```go
// internal/registry/consul/consul.go
// (consul here aliases the official client, github.com/hashicorp/consul/api)
type ConsulRegistry struct {
    client *consul.Client
    config *ConsulConfig
}

// Register service with Consul
func (r *ConsulRegistry) Register(ctx context.Context, service *ServiceInstance) error {
    registration := &consul.AgentServiceRegistration{
        ID:      service.ID,
        Name:    service.Name,
        Address: service.Address,
        Port:    service.Port,
        Tags:    service.Tags,
        Meta:    service.Metadata,
        Check: &consul.AgentServiceCheck{
            HTTP:                           fmt.Sprintf("http://%s:%d/healthz", service.Address, service.Port),
            Interval:                       "10s",
            Timeout:                        "3s",
            DeregisterCriticalServiceAfter: "30s",
        },
    }
    return r.client.Agent().ServiceRegister(registration)
}
```

Implementation Strategy

Phase 1: Consul Implementation (Epic 1)

  • Create service registry interface in pkg/registry/
  • Implement Consul registry in internal/registry/consul/
  • Basic service registration and discovery
  • Health check integration

Phase 2: Kubernetes Support (Epic 6)

  • Implement Kubernetes service discovery as alternative
  • Service registry factory that selects implementation based on environment
  • Support for both Consul and K8s in same codebase

Phase 3: Advanced Features (Epic 6)

  • Service mesh integration (Consul Connect)
  • Multi-datacenter support
  • Service tags and filtering
  • Service metadata and configuration

Configuration

```yaml
registry:
  type: consul  # or "kubernetes"
  consul:
    address: "localhost:8500"
    datacenter: "dc1"
    scheme: "http"
    health_check:
      interval: "10s"
      timeout: "3s"
      deregister_after: "30s"
  kubernetes:
    namespace: "default"
    in_cluster: true
```

Service Registration Flow

```mermaid
sequenceDiagram
    participant Service
    participant Registry as Service Registry Interface
    participant Consul
    participant Health as Health Check

    Service->>Registry: Register(serviceInstance)
    Registry->>Consul: Register service
    Consul->>Consul: Store service info
    Consul->>Health: Start health checks

    loop Health Check
        Health->>Service: GET /healthz
        Service-->>Health: 200 OK
        Health->>Consul: Update health status
    end

    Service->>Registry: Deregister(serviceID)
    Registry->>Consul: Deregister service
    Consul->>Consul: Remove service
```

Service Discovery Flow

```mermaid
sequenceDiagram
    participant Client
    participant Registry as Service Registry
    participant Consul
    participant Service1 as Service Instance 1
    participant Service2 as Service Instance 2

    Client->>Registry: Discover("auth-service")
    Registry->>Consul: Query service instances
    Consul-->>Registry: [instance1, instance2]
    Registry->>Registry: Filter healthy instances
    Registry-->>Client: [healthy instances]

    Client->>Service1: gRPC call
    Service1-->>Client: Response
```
Development Setup

Docker Compose

```yaml
services:
  consul:
    image: hashicorp/consul:latest  # the legacy "consul" Docker Hub image is deprecated
    ports:
      - "8500:8500"
    command: consul agent -dev -client=0.0.0.0
    volumes:
      - consul-data:/consul/data

volumes:
  consul-data:
```

Local Development

```bash
# Run Consul in dev mode
consul agent -dev

# Or use Docker
docker run -d --name consul -p 8500:8500 hashicorp/consul:latest
```

Production Deployment

Kubernetes

```bash
# Consul Helm Chart
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul --set global.datacenter=dc1
```

Standalone Cluster

  • Deploy Consul cluster (3-5 nodes)
  • Configure service discovery endpoints
  • Set up Consul Connect for service mesh (optional)

Consequences

Positive

  • Dynamic Service Discovery: Services can be added/removed without configuration changes
  • Health Checking: Automatic removal of unhealthy services
  • Load Balancing: Multiple service instances automatically discovered
  • Configuration Management: Consul KV store for service configuration
  • Service Mesh Ready: Can use Consul Connect for advanced features
  • Development Friendly: Easy local setup with Docker

Negative

  • Additional Infrastructure: Requires Consul cluster in production
  • Network Dependency: Services depend on Consul availability
  • Configuration Complexity: Need to configure Consul cluster
  • Learning Curve: Team needs to understand Consul concepts

Mitigations

  1. High Availability: Deploy Consul cluster (3+ nodes)
  2. Caching: Cache service instances to reduce Consul queries
  3. Fallback: Support Kubernetes service discovery as fallback
  4. Documentation: Comprehensive setup and usage documentation
  5. Monitoring: Monitor Consul health and service registration

Alternative: Kubernetes Service Discovery

For Kubernetes deployments, we also support native Kubernetes service discovery:

```go
// internal/registry/kubernetes/k8s.go
type KubernetesRegistry struct {
    clientset kubernetes.Interface
    namespace string
}

func (r *KubernetesRegistry) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
    endpoints, err := r.clientset.CoreV1().Endpoints(r.namespace).Get(ctx, serviceName, metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    // Convert K8s endpoints to ServiceInstance
    var instances []*ServiceInstance
    for _, subset := range endpoints.Subsets {
        for _, addr := range subset.Addresses {
            for _, port := range subset.Ports {
                instances = append(instances, &ServiceInstance{
                    ID:      fmt.Sprintf("%s-%s-%d", serviceName, addr.IP, port.Port),
                    Name:    serviceName,
                    Address: addr.IP,
                    Port:    int(port.Port),
                })
            }
        }
    }
    return instances, nil
}
```

Service Registry Factory

```go
// internal/registry/factory.go
func NewServiceRegistry(cfg *config.Config) (registry.ServiceRegistry, error) {
    switch cfg.Registry.Type {
    case "consul":
        return consul.NewRegistry(cfg.Registry.Consul)
    case "kubernetes":
        return kubernetes.NewRegistry(cfg.Registry.Kubernetes)
    default:
        return nil, fmt.Errorf("unknown registry type: %s", cfg.Registry.Type)
    }
}
```

References