goplt/docs/content/adr/0033-service-discovery-implementation.md
2025-11-06 10:56:50 +01:00


# ADR-0033: Service Discovery Implementation
## Status
Accepted
## Context
The platform follows a microservices architecture where services need to discover and communicate with each other. We need a service discovery mechanism that:

- Enables services to find each other dynamically
- Supports health checking and automatic deregistration
- Works in both development (Docker Compose) and production (Kubernetes) environments
- Provides service registration and discovery APIs
- Supports multiple service instances (load balancing)

Options considered:

1. **Consul** - HashiCorp's service discovery and configuration tool
2. **etcd** - Distributed key-value store with service discovery
3. **Kubernetes Service Discovery** - Native K8s service discovery
4. **Eureka** - Netflix service discovery (Java-focused)
5. **Custom Registry** - Build our own service registry
## Decision
Use **Consul** as the primary service discovery implementation, with Kubernetes service discovery supported as an alternative.
### Rationale
1. **Mature and Production-Ready**:
    - Battle-tested in production environments
    - Active development and strong community
    - Comprehensive documentation
2. **Feature-Rich**:
    - Service registration and health checking
    - Key-value store for configuration
    - Service mesh capabilities (Consul Connect)
    - Multi-datacenter support
    - DNS-based service discovery
3. **Development-Friendly**:
    - Easy to run locally (single binary or Docker)
    - Docker Compose integration
    - Good for local development setup
4. **Production Deployment Options**:
    - Works well in Kubernetes (Consul K8s)
    - Can be used alongside Kubernetes service discovery
    - Supports high availability and clustering
5. **Language Agnostic**:
    - HTTP API for service registration
    - gRPC support
    - Go client library available
6. **Health Checking**:
    - Built-in health checking with automatic deregistration
    - Multiple health check types (HTTP, TCP, gRPC, script)
    - Health status propagation
## Architecture
### Service Registry Interface
```go
// pkg/registry/registry.go
type ServiceRegistry interface {
	// Register a service instance
	Register(ctx context.Context, service *ServiceInstance) error
	// Deregister a service instance
	Deregister(ctx context.Context, serviceID string) error
	// Discover service instances
	Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error)
	// Watch for service changes
	Watch(ctx context.Context, serviceName string) (<-chan []*ServiceInstance, error)
	// Get service health
	Health(ctx context.Context, serviceID string) (*HealthStatus, error)
}

type ServiceInstance struct {
	ID       string
	Name     string
	Address  string
	Port     int
	Tags     []string
	Metadata map[string]string
}
```
### Consul Implementation
```go
// internal/registry/consul/consul.go
import (
	"context"
	"fmt"

	consul "github.com/hashicorp/consul/api"
)

type ConsulRegistry struct {
	client *consul.Client
	config *ConsulConfig
}

// Register service with Consul
func (r *ConsulRegistry) Register(ctx context.Context, service *ServiceInstance) error {
	registration := &consul.AgentServiceRegistration{
		ID:      service.ID,
		Name:    service.Name,
		Address: service.Address,
		Port:    service.Port,
		Tags:    service.Tags,
		Meta:    service.Metadata,
		Check: &consul.AgentServiceCheck{
			HTTP:     fmt.Sprintf("http://%s:%d/healthz", service.Address, service.Port),
			Interval: "10s",
			Timeout:  "3s",
			// Consul applies a 1-minute minimum to this setting, so "30s" behaves like "1m".
			DeregisterCriticalServiceAfter: "30s",
		},
	}
	// Pass ctx through so cancellation and deadlines apply to the HTTP call.
	return r.client.Agent().ServiceRegisterOpts(registration, consul.ServiceRegisterOpts{}.WithContext(ctx))
}
```
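The ADR shows registration only. As a rough illustration of the discovery side, the sketch below queries Consul's HTTP health endpoint (`/v1/health/service/<name>?passing=true`) using only the standard library, so it can be read without the Go client. `DiscoverHTTP` and `ParseHealthResponse` are hypothetical names, and only the response fields the sketch needs are decoded; the actual `ConsulRegistry.Discover` would more likely go through the client library.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// ServiceInstance mirrors the struct from pkg/registry.
type ServiceInstance struct {
	ID       string
	Name     string
	Address  string
	Port     int
	Tags     []string
	Metadata map[string]string
}

// healthEntry holds the fields this sketch reads from one element of the
// health endpoint's JSON array.
type healthEntry struct {
	Node struct {
		Address string
	}
	Service struct {
		ID      string
		Service string
		Address string
		Port    int
		Tags    []string
		Meta    map[string]string
	}
}

// ParseHealthResponse converts a health-endpoint response body into instances.
func ParseHealthResponse(body []byte) ([]*ServiceInstance, error) {
	var entries []healthEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, fmt.Errorf("decode consul response: %w", err)
	}
	instances := make([]*ServiceInstance, 0, len(entries))
	for _, e := range entries {
		addr := e.Service.Address
		if addr == "" {
			addr = e.Node.Address // an empty service address means "use the node's"
		}
		instances = append(instances, &ServiceInstance{
			ID: e.Service.ID, Name: e.Service.Service, Address: addr,
			Port: e.Service.Port, Tags: e.Service.Tags, Metadata: e.Service.Meta,
		})
	}
	return instances, nil
}

// DiscoverHTTP asks the agent at baseURL (for example http://localhost:8500)
// for instances of name whose health checks are passing.
func DiscoverHTTP(ctx context.Context, hc *http.Client, baseURL, name string) ([]*ServiceInstance, error) {
	endpoint := baseURL + "/v1/health/service/" + url.PathEscape(name) + "?passing=true"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	resp, err := hc.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("consul returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return ParseHealthResponse(body)
}
```

Asking Consul for passing instances only moves the health filter to the server side, so the registry does not have to inspect check results itself.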
## Implementation Strategy
### Phase 1: Consul Implementation (Epic 1)
- Create service registry interface in `pkg/registry/`
- Implement Consul registry in `internal/registry/consul/`
- Basic service registration and discovery
- Health check integration
### Phase 2: Kubernetes Support (Epic 6)
- Implement Kubernetes service discovery as alternative
- Service registry factory that selects implementation based on environment
- Support for both Consul and K8s in same codebase
### Phase 3: Advanced Features (Epic 6)
- Service mesh integration (Consul Connect)
- Multi-datacenter support
- Service tags and filtering
- Service metadata and configuration
## Configuration
```yaml
registry:
  type: consul # or "kubernetes"
  consul:
    address: "localhost:8500"
    datacenter: "dc1"
    scheme: "http"
    health_check:
      interval: "10s"
      timeout: "3s"
      # Consul applies a 1-minute minimum to deregistration, so "30s" behaves like "1m"
      deregister_after: "30s"
  kubernetes:
    namespace: "default"
    in_cluster: true
```
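The health-check values are duration strings, so it may be worth validating them at startup instead of letting a typo surface as a Consul API error. The sketch below is one way to do that; `HealthCheckConfig`, `Validate`, and `minDeregisterAfter` are hypothetical names, and the one-minute floor reflects Consul's documented minimum for `DeregisterCriticalServiceAfter`. Note that the sample value of `30s` would be flagged by this check.

```go
package main

import (
	"fmt"
	"time"
)

// HealthCheckConfig mirrors registry.consul.health_check in the YAML.
// The struct name and layout are assumptions made for this sketch.
type HealthCheckConfig struct {
	Interval        string
	Timeout         string
	DeregisterAfter string
}

// minDeregisterAfter is the minimum Consul documents for deregistering
// a service whose check stays critical.
const minDeregisterAfter = time.Minute

// parsePositive parses a Go duration string and rejects zero or negative values.
func parsePositive(field, value string) (time.Duration, error) {
	d, err := time.ParseDuration(value)
	if err != nil {
		return 0, fmt.Errorf("health_check.%s: %w", field, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("health_check.%s must be positive, got %s", field, d)
	}
	return d, nil
}

// Validate checks that all three durations parse and that deregister_after
// is not below Consul's minimum.
func (c HealthCheckConfig) Validate() error {
	if _, err := parsePositive("interval", c.Interval); err != nil {
		return err
	}
	if _, err := parsePositive("timeout", c.Timeout); err != nil {
		return err
	}
	after, err := parsePositive("deregister_after", c.DeregisterAfter)
	if err != nil {
		return err
	}
	if after < minDeregisterAfter {
		return fmt.Errorf("health_check.deregister_after is %s; Consul applies a minimum of %s", after, minDeregisterAfter)
	}
	return nil
}
```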
## Service Registration Flow
```mermaid
sequenceDiagram
    participant Service
    participant Registry as Service Registry Interface
    participant Consul
    participant Health as Health Check
    Service->>Registry: Register(serviceInstance)
    Registry->>Consul: Register service
    Consul->>Consul: Store service info
    Consul->>Health: Start health checks
    loop Health Check
        Health->>Service: GET /healthz
        Service-->>Health: 200 OK
        Health->>Consul: Update health status
    end
    Service->>Registry: Deregister(serviceID)
    Registry->>Consul: Deregister service
    Consul->>Consul: Remove service
```
## Service Discovery Flow
```mermaid
sequenceDiagram
    participant Client
    participant Registry as Service Registry
    participant Consul
    participant Service1 as Service Instance 1
    participant Service2 as Service Instance 2
    Client->>Registry: Discover("auth-service")
    Registry->>Consul: Query service instances
    Consul-->>Registry: [instance1, instance2]
    Registry->>Registry: Filter healthy instances
    Registry-->>Client: [healthy instances]
    Client->>Service1: gRPC call
    Service1-->>Client: Response
```
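The diagram leaves open how the client chooses among the healthy instances it gets back. One common option is client-side round-robin, sketched below; `RoundRobin`, `Pick`, and `ErrNoInstances` are hypothetical names, and the ADR does not prescribe a selection policy.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// ServiceInstance is a trimmed copy of the struct from pkg/registry.
type ServiceInstance struct {
	ID      string
	Address string
	Port    int
}

// ErrNoInstances is returned when Discover yielded nothing to choose from.
var ErrNoInstances = errors.New("no healthy instances available")

// RoundRobin cycles through whatever instance list it is handed.
// The zero value is ready to use.
type RoundRobin struct {
	next atomic.Uint64
}

// Pick returns the next instance in rotation. It is safe for concurrent use
// because the counter is advanced atomically.
func (rr *RoundRobin) Pick(instances []*ServiceInstance) (*ServiceInstance, error) {
	if len(instances) == 0 {
		return nil, ErrNoInstances
	}
	n := rr.next.Add(1) - 1 // index of this call, starting at 0
	return instances[n%uint64(len(instances))], nil
}
```

The rotation is only even if `Discover` returns instances in a stable order; if the list changes between calls, the counter simply continues over the new list.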
## Development Setup
### Docker Compose
```yaml
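services:
  consul:
    # The "consul" official image is no longer published; HashiCorp ships "hashicorp/consul"
    image: hashicorp/consul:latest
    ports:
      - "8500:8500"
    command: consul agent -dev -client=0.0.0.0
    # Dev mode keeps state in memory; the volume only matters once -dev is dropped
    volumes:
      - consul-data:/consul/data
volumes:
  consul-data: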
```
### Local Development
```bash
# Run Consul in dev mode
consul agent -dev
# Or use Docker
docker run -d --name consul -p 8500:8500 hashicorp/consul:latest
```
## Production Deployment
### Kubernetes
```bash
# Install Consul from the HashiCorp Helm chart
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul --set global.datacenter=dc1
```
### Standalone Cluster
- Deploy a Consul cluster of 3 or 5 server nodes (an odd count keeps a Raft quorum through a node failure)
- Configure service discovery endpoints
- Set up Consul Connect for service mesh (optional)
## Consequences
### Positive
- **Dynamic Service Discovery**: Services can be added/removed without configuration changes
- **Health Checking**: Automatic removal of unhealthy services
- **Load Balancing**: Multiple service instances automatically discovered
- **Configuration Management**: Consul KV store for service configuration
- **Service Mesh Ready**: Can use Consul Connect for advanced features
- **Development Friendly**: Easy local setup with Docker
### Negative
- **Additional Infrastructure**: Requires Consul cluster in production
- **Network Dependency**: Services depend on Consul availability
- **Configuration Complexity**: Need to configure Consul cluster
- **Learning Curve**: Team needs to understand Consul concepts
### Mitigations
1. **High Availability**: Deploy Consul cluster (3+ nodes)
2. **Caching**: Cache service instances to reduce Consul queries
3. **Fallback**: Support Kubernetes service discovery as fallback
4. **Documentation**: Comprehensive setup and usage documentation
5. **Monitoring**: Monitor Consul health and service registration
## Alternative: Kubernetes Service Discovery
For Kubernetes deployments, we also support native Kubernetes service discovery:
```go
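// internal/registry/kubernetes/k8s.go
type KubernetesRegistry struct {
	clientset kubernetes.Interface
	namespace string
}
func (r *KubernetesRegistry) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	endpoints, err := r.clientset.CoreV1().Endpoints(r.namespace).Get(ctx, serviceName, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("get endpoints for %q: %w", serviceName, err)
	}
	// Convert K8s endpoints to ServiceInstance; Subsets[].Addresses lists ready endpoints only
	var instances []*ServiceInstance
	for _, subset := range endpoints.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				instances = append(instances, &ServiceInstance{Name: serviceName, Address: addr.IP, Port: int(port.Port)})
			}
		}
	}
	return instances, nil
}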
```
## Service Registry Factory
```go
// internal/registry/factory.go
func NewServiceRegistry(cfg *config.Config) (registry.ServiceRegistry, error) {
	switch cfg.Registry.Type {
	case "consul":
		return consul.NewRegistry(cfg.Registry.Consul)
	case "kubernetes":
		return kubernetes.NewRegistry(cfg.Registry.Kubernetes)
	default:
		return nil, fmt.Errorf("unknown registry type: %s", cfg.Registry.Type)
	}
}
```
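The factory above switches on an explicit `registry.type`. If the type should instead be derived from the environment when it is left empty, one possible policy is sketched below. `DetectRegistryType` and `DefaultRegistryType` are hypothetical names; the sketch relies on `KUBERNETES_SERVICE_HOST`, which Kubernetes sets inside every pod, and assumes that Consul remains the default everywhere else.

```go
package main

import "os"

// DetectRegistryType returns the configured type if one is set. Otherwise it
// picks "kubernetes" when running inside a pod and "consul" everywhere else.
// getenv is injected so the policy can be tested without touching the process
// environment.
func DetectRegistryType(configured string, getenv func(string) string) string {
	if configured != "" {
		return configured // an explicit setting always wins
	}
	if getenv("KUBERNETES_SERVICE_HOST") != "" {
		return "kubernetes"
	}
	return "consul"
}

// DefaultRegistryType applies the policy to the real process environment.
func DefaultRegistryType(configured string) string {
	return DetectRegistryType(configured, os.Getenv)
}
```

Because Consul can also run inside Kubernetes, teams that deploy it there would keep `registry.type: consul` set explicitly, and the detection would not override it.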
## References
- [ADR-0029: Microservices Architecture](./0029-microservices-architecture.md)
- [ADR-0030: Service Communication Strategy](./0030-service-communication-strategy.md)
- [ADR-0031: Service Repository Structure](./0031-service-repository-structure.md)
- [Consul Documentation](https://www.consul.io/docs)
- [Consul Go Client](https://pkg.go.dev/github.com/hashicorp/consul/api)
- [Consul Kubernetes](https://www.consul.io/docs/k8s)