# ADR-0033: Service Discovery Implementation

## Status

Accepted

## Context

The platform follows a microservices architecture in which services need to discover and communicate with each other. We need a service discovery mechanism that:

- Enables services to find each other dynamically
- Supports health checking and automatic deregistration
- Works in both development (Docker Compose) and production (Kubernetes) environments
- Provides service registration and discovery APIs
- Supports multiple service instances (load balancing)

Options considered:

1. **Consul** - HashiCorp's service discovery and configuration tool
2. **etcd** - Distributed key-value store on which service discovery can be built
3. **Kubernetes Service Discovery** - Native K8s service discovery
4. **Eureka** - Netflix service discovery (Java-focused)
5. **Custom Registry** - Build our own service registry

## Decision

Use **Consul** as the primary service discovery implementation, with Kubernetes service discovery supported as an alternative.

### Rationale

1. **Mature and Production-Ready**:
   - Battle-tested in production environments
   - Active development and strong community
   - Comprehensive documentation

2. **Feature-Rich**:
   - Service registration and health checking
   - Key-value store for configuration
   - Service mesh capabilities (Consul Connect)
   - Multi-datacenter support
   - DNS-based service discovery

3. **Development-Friendly**:
   - Easy to run locally (single binary or Docker)
   - Docker Compose integration
   - Good for local development setup

4. **Production-Ready**:
   - Works well in Kubernetes (Consul K8s)
   - Can be used alongside Kubernetes service discovery
   - Supports high availability and clustering

5. **Language Agnostic**:
   - HTTP API for service registration
   - gRPC support
   - Go client library available

6. **Health Checking**:
   - Built-in health checking with automatic deregistration
   - Multiple health check types (HTTP, TCP, gRPC, script)
   - Health status propagation

## Architecture

### Service Registry Interface

```go
// pkg/registry/registry.go
package registry

import "context"

// ServiceRegistry abstracts the discovery backend (Consul or Kubernetes).
type ServiceRegistry interface {
	// Register a service instance
	Register(ctx context.Context, service *ServiceInstance) error

	// Deregister a service instance
	Deregister(ctx context.Context, serviceID string) error

	// Discover service instances
	Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error)

	// Watch for service changes
	Watch(ctx context.Context, serviceName string) (<-chan []*ServiceInstance, error)

	// Get service health
	Health(ctx context.Context, serviceID string) (*HealthStatus, error)
}

// ServiceInstance describes one registered instance of a service.
type ServiceInstance struct {
	ID       string
	Name     string
	Address  string
	Port     int
	Tags     []string
	Metadata map[string]string
}

// HealthStatus is returned by Health but is not specified in this ADR;
// the fields below are an assumed minimal shape.
type HealthStatus struct {
	ServiceID string
	Status    string // e.g. "passing", "warning", "critical"
}
```

### Consul Implementation

```go
// internal/registry/consul/consul.go
package consul

import (
	"context"
	"fmt"

	consul "github.com/hashicorp/consul/api"
)

// ConsulRegistry implements the ServiceRegistry interface on top of a Consul agent.
// ServiceInstance (pkg/registry) and ConsulConfig are assumed to be in scope.
type ConsulRegistry struct {
	client *consul.Client
	config *ConsulConfig
}

// Register service with Consul
func (r *ConsulRegistry) Register(ctx context.Context, service *ServiceInstance) error {
	registration := &consul.AgentServiceRegistration{
		ID:      service.ID,
		Name:    service.Name,
		Address: service.Address,
		Port:    service.Port,
		Tags:    service.Tags,
		Meta:    service.Metadata,
		Check: &consul.AgentServiceCheck{
			HTTP:     fmt.Sprintf("http://%s:%d/healthz", service.Address, service.Port),
			Interval: "10s",
			Timeout:  "3s",
			// Consul applies a 1m minimum to this setting, so values
			// below that (such as 30s) take effect as 1m.
			DeregisterCriticalServiceAfter: "30s",
		},
	}
	// ServiceRegisterOpts passes ctx to the underlying HTTP request.
	opts := consul.ServiceRegisterOpts{}.WithContext(ctx)
	return r.client.Agent().ServiceRegisterOpts(registration, opts)
}
```

## Implementation Strategy

### Phase 1: Consul Implementation (Epic 1)

- Create service registry interface in `pkg/registry/`
- Implement Consul registry in `internal/registry/consul/`
- Basic service registration and discovery
- Health check integration

### Phase 2: Kubernetes Support (Epic 6)

- Implement Kubernetes service discovery as alternative
- Service registry factory that selects implementation based on environment
- Support for both Consul and K8s in same codebase

### Phase 3: Advanced Features (Epic 6)

- Service mesh integration (Consul Connect)
- Multi-datacenter support
- Service tags and filtering
- Service metadata and configuration

## Configuration

```yaml
registry:
  type: consul # or "kubernetes"
  consul:
    address: "localhost:8500"
    datacenter: "dc1"
    scheme: "http"
    health_check:
      interval: "10s"
      timeout: "3s"
      deregister_after: "30s"
  kubernetes:
    namespace: "default"
    in_cluster: true
```

## Service Registration Flow

```mermaid
sequenceDiagram
    participant Service
    participant Registry as Service Registry Interface
    participant Consul
    participant Health as Health Check

    Service->>Registry: Register(serviceInstance)
    Registry->>Consul: Register service
    Consul->>Consul: Store service info
    Consul->>Health: Start health checks

    loop Health Check
        Health->>Service: GET /healthz
        Service-->>Health: 200 OK
        Health->>Consul: Update health status
    end

    Service->>Registry: Deregister(serviceID)
    Registry->>Consul: Deregister service
    Consul->>Consul: Remove service
```

## Service Discovery Flow

```mermaid
sequenceDiagram
    participant Client
    participant Registry as Service Registry
    participant Consul
    participant Service1 as Service Instance 1
    participant Service2 as Service Instance 2

    Client->>Registry: Discover("auth-service")
    Registry->>Consul: Query service instances
    Consul-->>Registry: [instance1, instance2]
    Registry->>Registry: Filter healthy instances
    Registry-->>Client: [healthy instances]

    Client->>Service1: gRPC call
    Service1-->>Client: Response
```

## Development Setup

### Docker Compose

```yaml
services:
  consul:
    # The Docker Hub "consul" image is deprecated; HashiCorp publishes hashicorp/consul
    image: hashicorp/consul:latest
    ports:
      - "8500:8500"
    command: consul agent -dev -client=0.0.0.0
    volumes:
      - consul-data:/consul/data

volumes:
  consul-data:
```

### Local Development

```bash
# Run Consul in dev mode
consul agent -dev

# Or use Docker
docker run -d --name consul -p 8500:8500 hashicorp/consul:latest
```

## Production Deployment

### Kubernetes

```bash
# Consul Helm Chart
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul --set global.datacenter=dc1
```

### Standalone Cluster

- Deploy a Consul cluster (3-5 server nodes)
- Configure service discovery endpoints
- Set up Consul Connect for service mesh (optional)

## Consequences

### Positive

- **Dynamic Service Discovery**: Services can be added/removed without configuration changes
- **Health Checking**: Automatic removal of unhealthy services
- **Load Balancing**: Multiple service instances automatically discovered
- **Configuration Management**: Consul KV store for service configuration
- **Service Mesh Ready**: Can use Consul Connect for advanced features
- **Development Friendly**: Easy local setup with Docker

### Negative

- **Additional Infrastructure**: Requires Consul cluster in production
- **Network Dependency**: Services depend on Consul availability
- **Configuration Complexity**: Need to configure Consul cluster
- **Learning Curve**: Team needs to understand Consul concepts

### Mitigations

1. **High Availability**: Deploy Consul cluster (3+ nodes)
2. **Caching**: Cache service instances to reduce Consul queries
3. **Fallback**: Support Kubernetes service discovery as fallback
4. **Documentation**: Comprehensive setup and usage documentation
5. **Monitoring**: Monitor Consul health and service registration

## Alternative: Kubernetes Service Discovery

For Kubernetes deployments, we also support native Kubernetes service discovery:

```go
// internal/registry/kubernetes/k8s.go
type KubernetesRegistry struct {
	clientset kubernetes.Interface
	namespace string
}

func (r *KubernetesRegistry) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	endpoints, err := r.clientset.CoreV1().Endpoints(r.namespace).Get(ctx, serviceName, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	// Convert K8s endpoints to ServiceInstance (one per ready address and port)
	var instances []*ServiceInstance
	for _, s := range endpoints.Subsets {
		for _, a := range s.Addresses {
			for _, p := range s.Ports {
				instances = append(instances, &ServiceInstance{Name: serviceName, Address: a.IP, Port: int(p.Port)})
			}
		}
	}
	return instances, nil
}
```

## Service Registry Factory

```go
// internal/registry/factory.go
func NewServiceRegistry(cfg *config.Config) (registry.ServiceRegistry, error) {
	switch cfg.Registry.Type {
	case "consul":
		return consul.NewRegistry(cfg.Registry.Consul)
	case "kubernetes":
		return kubernetes.NewRegistry(cfg.Registry.Kubernetes)
	default:
		return nil, fmt.Errorf("unknown registry type: %s", cfg.Registry.Type)
	}
}
```

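The factory above relies on an explicit `type` setting. If the type should instead be inferred from the environment, as Phase 2 suggests, one possible approach is to check for `KUBERNETES_SERVICE_HOST`, which Kubernetes sets inside pods. The `detectRegistryType` function and the choice of Consul as the default are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"os"
)

// detectRegistryType returns the explicitly configured type when present.
// Otherwise it returns "kubernetes" if the process appears to run inside a
// pod, and falls back to "consul". getenv is injected to keep it testable.
func detectRegistryType(configured string, getenv func(string) string) string {
	if configured != "" {
		return configured
	}
	if getenv("KUBERNETES_SERVICE_HOST") != "" {
		return "kubernetes"
	}
	return "consul"
}

func main() {
	// An empty configured value means "decide from the environment".
	fmt.Println("registry type:", detectRegistryType("", os.Getenv))
}
```

An explicit setting still wins in this sketch, so a Kubernetes deployment that runs Consul can keep `type: consul`.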
## References

- [ADR-0029: Microservices Architecture](./0029-microservices-architecture.md)
- [ADR-0030: Service Communication Strategy](./0030-service-communication-strategy.md)
- [ADR-0031: Service Repository Structure](./0031-service-repository-structure.md)
- [Consul Documentation](https://www.consul.io/docs)
- [Consul Go Client](https://github.com/hashicorp/consul/api)
- [Consul Kubernetes](https://www.consul.io/docs/k8s)