docs: Align documentation with true microservices architecture
Transform all documentation from a modular monolith to a true microservices
architecture in which core services are independently deployable.
Key Changes:
- Core Kernel: Infrastructure only (no business logic)
- Core Services: Auth, Identity, Authz, Audit as separate microservices
  - Each service has its own entry point (cmd/{service}/)
  - Each service has its own gRPC server and database schema
  - Services register with Consul for service discovery
- API Gateway: Moved from Epic 8 to Epic 1 as core infrastructure
  - Single entry point for all external traffic
  - Handles routing, JWT validation, rate limiting, CORS
- Service Discovery: Consul as primary mechanism (ADR-0033)
- Database Pattern: Per-service connections with schema isolation
Documentation Updates:
- Updated all 9 architecture documents
- Updated 4 ADRs and created 2 new ADRs (API Gateway, Service Discovery)
- Rewrote Epic 1: Core Kernel & Infrastructure (infrastructure only)
- Rewrote Epic 2: Core Services (Auth, Identity, Authz, Audit as services)
- Updated Epic 3-8 stories for service architecture
- Updated plan.md, playbook.md, requirements.md, index.md
- Updated all epic READMEs and story files
New ADRs:
- ADR-0032: API Gateway Strategy
- ADR-0033: Service Discovery Implementation (Consul)
New Stories:
- Epic 1.7: Service Client Interfaces
- Epic 1.8: API Gateway Implementation
@@ -4,13 +4,14 @@
 Accepted
 
 ## Context
 
-The platform needs a database ORM/library that:
-
-- Supports PostgreSQL (primary database)
-- Provides type-safe query building
-- Supports code generation (reduces boilerplate)
-- Handles migrations
-- Supports relationships (many-to-many, etc.)
-- Integrates with Ent (code generation)
+The platform follows a microservices architecture where each service has its own database connection. The ORM/library must:
+
+- Support PostgreSQL (primary database)
+- Provide type-safe query building
+- Support code generation (reduces boilerplate)
+- Handle migrations per service
+- Support relationships (many-to-many, etc.)
+- Integrate with Ent (code generation)
+- Support schema isolation (each service owns its schema)
 
 Options considered:
 
 1. **entgo.io/ent** - Code-generated, type-safe ORM
@@ -45,10 +46,18 @@ Use **entgo.io/ent** as the primary ORM for the platform.
 - Less flexible than raw SQL for complex queries
 - Generated code must be committed or verified in CI
 
+### Database Access Pattern
+- **Each service has its own database connection pool**: Services do not share database connections
+- **Schema isolation**: Each service owns its database schema (e.g., `auth_schema`, `identity_schema`, `blog_schema`)
+- **No cross-service database access**: Services communicate via APIs, not direct database queries
+- **Shared database instance**: Services share the same PostgreSQL instance but use different schemas
+- **Alternative**: Database-per-service pattern (each service has its own database) for maximum isolation
+
 ### Implementation Notes
 - Install: `go get entgo.io/ent/cmd/ent`
-- Initialize schema: `go run entgo.io/ent/cmd/ent init User Role Permission`
-- Use `//go:generate` directives for code generation
-- Run migrations on startup via `client.Schema.Create()`
-- Create wrapper in `internal/infra/database/client.go` for DI injection
+- Each service initializes its own schema: `go run entgo.io/ent/cmd/ent init User Role Permission` (Identity Service)
+- Use `//go:generate` directives for code generation per service
+- Run migrations on startup via `client.Schema.Create()` for each service
+- Create database client wrapper per service in `services/{service}/internal/database/client.go`
+- Each service manages its own connection pool configuration
@@ -9,31 +9,49 @@ The platform needs to scale independently, support team autonomy, and enable fle
 
 ## Decision
 Design the platform as **microservices architecture from day one**:
 
-1. **Service-Based Architecture**: All modules are independent services:
-   - Each module is a separate service with its own process
+1. **Core Services**: Core business services are separate microservices:
+   - **Auth Service** (`cmd/auth-service/`): JWT token generation/validation
+   - **Identity Service** (`cmd/identity-service/`): User CRUD, password management
+   - **Authz Service** (`cmd/authz-service/`): Permission resolution, authorization
+   - **Audit Service** (`cmd/audit-service/`): Audit logging
+   - Each service has its own process, database connection, and deployment
+
+2. **API Gateway**: Core infrastructure component (implemented in Epic 1):
+   - Single entry point for all external traffic
+   - Routes requests to backend services via service discovery
+   - Handles authentication, rate limiting, CORS at the edge
+   - Not optional - required for microservices architecture
+
+3. **Service-Based Architecture**: All modules are independent services:
+   - Each module/service is a separate service with its own process
    - Services communicate via gRPC (primary) or HTTP (fallback)
    - Service client interfaces for all inter-service communication
    - No direct in-process calls between services
 
-2. **Service Registry**: Central registry for service discovery:
+4. **Service Registry**: Central registry for service discovery:
    - All services register on startup
    - Service discovery via registry
    - Health checking and automatic deregistration
    - Support for Consul, etcd, or Kubernetes service discovery
 
-3. **Communication Patterns**:
+5. **Communication Patterns**:
    - **Synchronous**: gRPC service calls (primary), HTTP/REST (fallback)
    - **Asynchronous**: Event bus via Kafka
-   - **Shared State**: Cache (Redis) and Database (PostgreSQL)
+   - **Shared Infrastructure**: Cache (Redis) and Database (PostgreSQL instance)
+   - **Database Access**: Each service has its own connection pool and schema
 
-4. **Service Boundaries**: Each module is an independent service:
+6. **Service Boundaries**: Each service is independent:
    - Independent Go modules (`go.mod`)
-   - Own database schema (via Ent)
-   - Own API routes
+   - Own database schema (via Ent) - schema isolation
+   - Own API routes (gRPC/HTTP)
    - Own process and deployment
    - Can be scaled independently
 
-5. **Development Simplification**: For local development, multiple services can run in the same process, but they still communicate via service clients (no direct calls)
+7. **Development Mode**: For local development, services run in the same repository:
+   - Each service has its own entry point and process
+   - Services still communicate via service clients (gRPC/HTTP)
+   - No direct in-process calls
+   - Docker Compose for easy local setup
 
 ## Consequences
@@ -55,29 +73,36 @@ Design the platform as **microservices architecture from day one**:
 - **Development Setup**: More complex local development (multiple services)
 
 ### Mitigations
-- **Service Mesh**: Use service mesh (Istio, Linkerd) for advanced microservices features
-- **API Gateway**: Central gateway for routing and cross-cutting concerns
+- **API Gateway**: Implemented in Epic 1 as core infrastructure - handles routing, authentication, rate limiting
+- **Service Mesh**: Use service mesh (Istio, Linkerd) for advanced microservices features (optional)
 - **Event Sourcing**: Use events for eventual consistency
 - **Circuit Breakers**: Implement circuit breakers for resilience
 - **Comprehensive Observability**: OpenTelemetry, metrics, logging essential
 - **Docker Compose**: Simplify local development with docker-compose
-- **Development Mode**: Run multiple services in same process for local dev (still use service clients)
+- **Service Clients**: All inter-service communication via service clients (gRPC/HTTP)
 
 ## Implementation Strategy
 
-### Epic 1: Service Client Interfaces (Epic 1)
-- Define service client interfaces for all core services
-- All inter-service communication goes through interfaces
+### Epic 1: Core Kernel & Infrastructure
+- Core kernel (infrastructure only): config, logger, DI, health, metrics, observability
+- API Gateway implementation (core infrastructure component)
+- Service client interfaces for all core services
+- Service registry interface and basic implementation
 
-### Epic 2: Service Registry (Epic 3)
-- Create service registry interface
-- Implement service discovery
-- Support for Consul, Kubernetes service discovery
+### Epic 2: Core Services Separation
+- Separate Auth, Identity, Authz, Audit into independent services
+- Each service: own entry point (`cmd/{service}/`), gRPC server, database connection
+- Service client implementations (gRPC/HTTP)
+- Service registration with registry
 
-### Epic 3: gRPC Services (Epic 5)
-- Implement gRPC service definitions
-- Create gRPC servers for all services
-- Create gRPC clients for service communication
+### Epic 3: Service Registry & Discovery (Epic 3)
+- Complete service registry implementation
+- Service discovery (Consul, Kubernetes)
+- Service health checking and deregistration
+
+### Epic 5: gRPC Services (Epic 5)
+- Complete gRPC service definitions for all services
+- gRPC clients for service communication
+- HTTP clients as fallback option
 
 ## References
@@ -7,22 +7,30 @@ Accepted
 Services need to communicate with each other in a microservices architecture. All communication must go through well-defined interfaces that support network calls.
 
 ## Decision
-Use a **service client-based communication strategy**:
+Use a **service client-based communication strategy** with API Gateway as the entry point:
 
-1. **Service Client Interfaces** (Primary for synchronous calls):
+1. **API Gateway** (Entry Point):
+   - All external traffic enters through API Gateway
+   - Gateway routes requests to backend services via service discovery
+   - Gateway handles authentication (JWT validation via Auth Service)
+   - Gateway handles rate limiting, CORS, request transformation
+
+2. **Service Client Interfaces** (Primary for synchronous calls):
    - Define interfaces in `pkg/services/` for all services
    - All implementations are network-based:
     - `internal/services/grpc/client/` - gRPC clients (primary)
     - `internal/services/http/client/` - HTTP clients (fallback)
+   - Gateway uses service clients to communicate with backend services
   - Services use service clients for inter-service communication
 
-2. **Event Bus** (Primary for asynchronous communication):
+3. **Event Bus** (Primary for asynchronous communication):
   - Distributed via Kafka
   - Preferred for cross-service communication
   - Event-driven architecture for loose coupling
 
-3. **Shared Infrastructure** (For state):
+4. **Shared Infrastructure** (For state):
   - Redis for cache and distributed state
-  - PostgreSQL for persistent data
+  - PostgreSQL instance for persistent data (each service has its own schema)
   - Kafka for events
 
 ## Service Client Pattern
@@ -47,8 +55,21 @@ type httpIdentityClient struct {
 }
 ```
 
+## Communication Flow
+
+```
+Client → API Gateway → Backend Service (via service client)
+Backend Service → Other Service (via service client)
+```
+
+All communication goes through service clients - no direct in-process calls even in development mode.
+
 ## Development Mode
-For local development, multiple services can run in the same process, but they still communicate via service clients (gRPC or HTTP) - no direct in-process calls. This ensures the architecture is consistent.
+For local development, services run in the same repository but as separate processes:
+- Each service has its own entry point (`cmd/{service}/`)
+- Services communicate via service clients (gRPC or HTTP) - no direct in-process calls
+- Docker Compose orchestrates all services
+- This ensures the architecture is consistent with production
 
 ## Consequences
@@ -30,12 +30,13 @@ Use a **monorepo structure with service directories** for all services:
 ```
 goplt/
 ├── cmd/
-│   ├── platform/             # Core kernel entry point
-│   ├── auth-service/         # Auth Service entry point
-│   ├── identity-service/     # Identity Service entry point
-│   ├── authz-service/        # Authz Service entry point
-│   ├── audit-service/        # Audit Service entry point
-│   └── blog-service/         # Blog module service entry point
+│   ├── platform/             # Core kernel entry point (minimal, infrastructure only)
+│   ├── api-gateway/          # API Gateway service entry point
+│   ├── auth-service/         # Auth Service entry point
+│   ├── identity-service/     # Identity Service entry point
+│   ├── authz-service/        # Authz Service entry point
+│   ├── audit-service/        # Audit Service entry point
+│   └── blog-service/         # Blog feature service entry point
 ├── services/                 # Service implementations (optional alternative)
 │   ├── auth/
 │   │   ├── internal/         # Service implementation
@@ -145,17 +146,22 @@ Use a **monorepo structure with service directories** for all services:
 - Single entry point `cmd/platform/`
 - Shared infrastructure established
 
-### Phase 2: Service Structure (Epic 2)
-- Create service directories in `cmd/`:
-  - `cmd/auth-service/`
-  - `cmd/identity-service/`
-  - `cmd/authz-service/`
-  - `cmd/audit-service/`
+### Phase 2: Service Structure (Epic 1-2)
+- **Epic 1**: Create API Gateway service:
+  - `cmd/api-gateway/` - API Gateway entry point
+  - Service discovery integration
+  - Request routing to backend services
+- **Epic 2**: Create core service directories:
+  - `cmd/auth-service/` - Auth Service entry point
+  - `cmd/identity-service/` - Identity Service entry point
+  - `cmd/authz-service/` - Authz Service entry point
+  - `cmd/audit-service/` - Audit Service entry point
 - Create service implementations:
-  - Option A: `services/{service}/internal/` for each service
-  - Option B: `internal/{service}/` for each service (if keeping all in internal/)
-  - Option C: Service code directly in `cmd/{service}/` for simple services
-- Define service client interfaces in `pkg/services/`
+  - Option A: `services/{service}/internal/` for each service (recommended)
+  - Option B: `internal/{service}/` for each service
+- Each service has its own database connection pool
+- Define service client interfaces in `pkg/services/`:
+  - `AuthServiceClient`, `IdentityServiceClient`, `AuthzServiceClient`, `AuditServiceClient`
+- Implement gRPC/HTTP clients in `internal/services/`
 
 ### Phase 3: Module Services (Epic 4+)
@@ -170,18 +176,27 @@ Use a **monorepo structure with service directories** for all services:
 ```
 goplt/
 ├── cmd/
-│   ├── platform/             # Core kernel
+│   ├── platform/             # Core kernel (minimal, infrastructure only)
+│   ├── api-gateway/          # API Gateway entry point
 │   ├── auth-service/         # Auth entry point
 │   ├── identity-service/     # Identity entry point
-│   └── ...
+│   ├── authz-service/        # Authz entry point
+│   ├── audit-service/        # Audit entry point
+│   └── blog-service/         # Blog feature service entry point
 ├── services/                 # Service implementations
+│   ├── gateway/
+│   │   ├── internal/         # Gateway implementation
+│   │   └── api/              # Routing logic
 │   ├── auth/
 │   │   ├── internal/         # Service implementation
 │   │   └── api/              # gRPC/HTTP definitions
 │   └── ...
-├── internal/                 # Core kernel (shared)
-│   ├── identity/
-│   ├── authz/
-│   ├── audit/
-│   └── blog/
+├── internal/                 # Core kernel (shared infrastructure)
 ├── pkg/                      # Public interfaces
-└── modules/                  # Feature modules
+└── modules/                  # Feature modules (optional structure)
 ```
 
 This provides:
166  docs/content/adr/0032-api-gateway-strategy.md  Normal file
@@ -0,0 +1,166 @@
# ADR-0032: API Gateway Strategy

## Status
Accepted

## Context
The platform follows a microservices architecture where each service is independently deployable. We need a central entry point that handles:
- Request routing to backend services
- Authentication and authorization at the edge
- Rate limiting and throttling
- CORS and request/response transformation
- Service discovery integration

Options considered:
1. **Custom API Gateway** - Build our own gateway service
2. **Kong** - Open-source API Gateway
3. **Envoy** - High-performance proxy
4. **Traefik** - Modern reverse proxy

## Decision
Implement a **custom API Gateway service** as a core infrastructure component in Epic 1:

1. **API Gateway as Core Component**:
   - Entry point: `cmd/api-gateway/`
   - Implementation: `services/gateway/internal/`
   - Implemented in Epic 1 (not deferred to Epic 8)
   - Required for microservices architecture, not optional

2. **Responsibilities**:
   - **Request Routing**: Route requests to backend services via service discovery
   - **Authentication**: Validate JWT tokens via Auth Service
   - **Authorization**: Check permissions via Authz Service (for route-level auth)
   - **Rate Limiting**: Per-user and per-IP rate limiting
   - **CORS**: Handle cross-origin requests
   - **Request Transformation**: Modify requests before forwarding
   - **Response Transformation**: Modify responses before returning
   - **Load Balancing**: Distribute requests across service instances
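The per-user and per-IP limits named above are classic token buckets. A minimal sketch of the mechanism using only the standard library (the real gateway would keep this state in Redis, as noted under Integration Points; names like `RateLimiter` are illustrative, not the platform's API):

```go
package main

import (
	"sync"
	"time"
)

// bucket tracks remaining tokens for one key (user ID or client IP).
type bucket struct {
	tokens   float64
	lastSeen time.Time
}

// RateLimiter refills each key's bucket at `rate` tokens per second,
// capped at `burst` tokens.
type RateLimiter struct {
	mu      sync.Mutex
	buckets map[string]*bucket
	rate    float64
	burst   float64
}

func NewRateLimiter(rate, burst float64) *RateLimiter {
	return &RateLimiter{buckets: make(map[string]*bucket), rate: rate, burst: burst}
}

// Allow reports whether one request for key may proceed.
func (l *RateLimiter) Allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.burst, lastSeen: now}
		l.buckets[key] = b
	}
	// Refill proportionally to elapsed time, capped at burst.
	b.tokens += now.Sub(b.lastSeen).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.lastSeen = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```

A per-user limit of `100/minute` would map to `NewRateLimiter(100.0/60.0, 100)`.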
3. **Integration Points**:
   - Service registry for service discovery
   - Auth Service client for token validation
   - Authz Service client for permission checks
   - Cache (Redis) for rate limiting state

4. **Implementation Approach**:
   - Built with Go (Gin/Echo framework)
   - Uses service clients for backend communication
   - Configurable routing rules
   - Middleware-based architecture
## Architecture

```mermaid
graph TB
    Client[Client] --> Gateway[API Gateway<br/>:8080]

    Gateway --> AuthClient[Auth Service Client]
    Gateway --> AuthzClient[Authz Service Client]
    Gateway --> ServiceRegistry[Service Registry]
    Gateway --> Cache[Cache<br/>Rate Limiting]

    AuthClient --> AuthSvc[Auth Service<br/>:8081]
    AuthzClient --> AuthzSvc[Authz Service<br/>:8083]
    ServiceRegistry --> BackendSvc[Backend Services]

    Gateway --> BackendSvc

    style Gateway fill:#4a90e2,stroke:#2e5c8a,stroke-width:3px,color:#fff
    style AuthSvc fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style BackendSvc fill:#7b68ee,stroke:#5a4fcf,stroke-width:2px,color:#fff
```

## Request Flow

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant AuthSvc
    participant AuthzSvc
    participant Registry
    participant BackendSvc

    Client->>Gateway: HTTP Request
    Gateway->>Gateway: Rate limiting check
    Gateway->>AuthSvc: Validate JWT (gRPC)
    AuthSvc-->>Gateway: Token valid + user info
    Gateway->>AuthzSvc: Check route permission (gRPC, optional)
    AuthzSvc-->>Gateway: Authorized
    Gateway->>Registry: Discover backend service
    Registry-->>Gateway: Service endpoint
    Gateway->>BackendSvc: Forward request (gRPC/HTTP)
    BackendSvc-->>Gateway: Response
    Gateway-->>Client: HTTP Response
```
## Consequences

### Positive
- **Single Entry Point**: All external traffic goes through one gateway
- **Centralized Security**: Authentication and authorization at the edge
- **Performance**: Rate limiting and caching at gateway level
- **Flexibility**: Easy to add new routes and services
- **Consistency**: Uniform API interface for clients
- **Observability**: Central point for metrics and logging

### Negative
- **Single Point of Failure**: Gateway failure affects all traffic
- **Additional Latency**: Extra hop in request path
- **Complexity**: Additional service to maintain and deploy
- **Scaling**: Gateway must scale to handle all traffic

### Mitigations
1. **High Availability**: Deploy multiple gateway instances behind a load balancer
2. **Circuit Breakers**: Implement circuit breakers for backend service failures
3. **Caching**: Cache authentication results and service endpoints
4. **Monitoring**: Comprehensive monitoring and alerting
5. **Graceful Degradation**: Fallback mechanisms for service failures
## Implementation Strategy

### Epic 1: Core Infrastructure
- Create `cmd/api-gateway/` entry point
- Implement basic routing with service discovery
- JWT validation via Auth Service client
- Rate limiting middleware
- CORS support

### Epic 2-3: Enhanced Features
- Permission-based routing (via Authz Service)
- Request/response transformation
- Advanced load balancing
- Health check integration

## Configuration
```yaml
gateway:
  port: 8080
  routes:
    - path: /api/v1/auth/**
      service: auth-service
      auth_required: false
    - path: /api/v1/users/**
      service: identity-service
      auth_required: true
      permission: user.read
    - path: /api/v1/blog/**
      service: blog-service
      auth_required: true
      permission: blog.post.read
  rate_limiting:
    enabled: true
    per_user: 100/minute
    per_ip: 1000/minute
  cors:
    allowed_origins: ["*"]
    allowed_methods: ["GET", "POST", "PUT", "DELETE"]
```
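Resolving a request path against a route table like the one above amounts to longest-prefix matching on the part before `/**`. A sketch, assuming `**` means "any (possibly empty) path suffix" and using an illustrative `Route` struct rather than the gateway's actual config types:

```go
package main

import "strings"

// Route mirrors one entry of the gateway's route table.
type Route struct {
	Path         string // e.g. "/api/v1/users/**"
	Service      string
	AuthRequired bool
}

// MatchRoute returns the most specific route whose prefix (the part
// before "/**") matches the request path, or nil if none matches.
func MatchRoute(routes []Route, path string) *Route {
	var best *Route
	bestLen := -1
	for i := range routes {
		prefix := strings.TrimSuffix(routes[i].Path, "/**")
		if (path == prefix || strings.HasPrefix(path, prefix+"/")) && len(prefix) > bestLen {
			best = &routes[i]
			bestLen = len(prefix)
		}
	}
	return best
}
```

Longest-prefix wins, so a more specific route such as `/api/v1/users/admin/**` would shadow `/api/v1/users/**` without any ordering requirement in the config file.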
## References
- [ADR-0029: Microservices Architecture](./0029-microservices-architecture.md)
- [ADR-0030: Service Communication Strategy](./0030-service-communication-strategy.md)
- [ADR-0031: Service Repository Structure](./0031-service-repository-structure.md)
- [API Gateway Pattern](https://microservices.io/patterns/apigateway.html)
309  docs/content/adr/0033-service-discovery-implementation.md  Normal file
@@ -0,0 +1,309 @@
# ADR-0033: Service Discovery Implementation

## Status
Accepted

## Context
The platform follows a microservices architecture where services need to discover and communicate with each other. We need a service discovery mechanism that:
- Enables services to find each other dynamically
- Supports health checking and automatic deregistration
- Works in both development (Docker Compose) and production (Kubernetes) environments
- Provides service registration and discovery APIs
- Supports multiple service instances (load balancing)

Options considered:
1. **Consul** - HashiCorp's service discovery and configuration tool
2. **etcd** - Distributed key-value store with service discovery
3. **Kubernetes Service Discovery** - Native K8s service discovery
4. **Eureka** - Netflix service discovery (Java-focused)
5. **Custom Registry** - Build our own service registry

## Decision
Use **Consul** as the primary service discovery implementation, with support for Kubernetes service discovery as an alternative.

### Rationale

1. **Mature and Production-Ready**:
   - Battle-tested in production environments
   - Active development and strong community
   - Comprehensive documentation

2. **Feature-Rich**:
   - Service registration and health checking
   - Key-value store for configuration
   - Service mesh capabilities (Consul Connect)
   - Multi-datacenter support
   - DNS-based service discovery

3. **Development-Friendly**:
   - Easy to run locally (single binary or Docker)
   - Docker Compose integration
   - Good for local development setup

4. **Kubernetes-Compatible**:
   - Works well in Kubernetes (Consul K8s)
   - Can be used alongside Kubernetes service discovery
   - Supports high availability and clustering

5. **Language-Agnostic**:
   - HTTP API for service registration
   - gRPC support
   - Go client library available

6. **Health Checking**:
   - Built-in health checking with automatic deregistration
   - Multiple health check types (HTTP, TCP, gRPC, script)
   - Health status propagation
## Architecture

### Service Registry Interface

```go
// pkg/registry/registry.go
type ServiceRegistry interface {
	// Register a service instance
	Register(ctx context.Context, service *ServiceInstance) error

	// Deregister a service instance
	Deregister(ctx context.Context, serviceID string) error

	// Discover service instances
	Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error)

	// Watch for service changes
	Watch(ctx context.Context, serviceName string) (<-chan []*ServiceInstance, error)

	// Get service health
	Health(ctx context.Context, serviceID string) (*HealthStatus, error)
}

type ServiceInstance struct {
	ID       string
	Name     string
	Address  string
	Port     int
	Tags     []string
	Metadata map[string]string
}
```
### Consul Implementation

```go
// internal/registry/consul/consul.go
type ConsulRegistry struct {
	client *consul.Client
	config *ConsulConfig
}

// Register service with Consul
func (r *ConsulRegistry) Register(ctx context.Context, service *ServiceInstance) error {
	registration := &consul.AgentServiceRegistration{
		ID:      service.ID,
		Name:    service.Name,
		Address: service.Address,
		Port:    service.Port,
		Tags:    service.Tags,
		Meta:    service.Metadata,
		Check: &consul.AgentServiceCheck{
			HTTP:                           fmt.Sprintf("http://%s:%d/healthz", service.Address, service.Port),
			Interval:                       "10s",
			Timeout:                        "3s",
			DeregisterCriticalServiceAfter: "30s",
		},
	}
	return r.client.Agent().ServiceRegister(registration)
}
```
## Implementation Strategy

### Phase 1: Consul Implementation (Epic 1)
- Create service registry interface in `pkg/registry/`
- Implement Consul registry in `internal/registry/consul/`
- Basic service registration and discovery
- Health check integration

### Phase 2: Kubernetes Support (Epic 6)
- Implement Kubernetes service discovery as an alternative
- Service registry factory that selects the implementation based on environment
- Support for both Consul and K8s in the same codebase

### Phase 3: Advanced Features (Epic 6)
- Service mesh integration (Consul Connect)
- Multi-datacenter support
- Service tags and filtering
- Service metadata and configuration
## Configuration

```yaml
registry:
  type: consul  # or "kubernetes"
  consul:
    address: "localhost:8500"
    datacenter: "dc1"
    scheme: "http"
    health_check:
      interval: "10s"
      timeout: "3s"
      deregister_after: "30s"
  kubernetes:
    namespace: "default"
    in_cluster: true
```
## Service Registration Flow

```mermaid
sequenceDiagram
    participant Service
    participant Registry as Service Registry Interface
    participant Consul
    participant Health as Health Check

    Service->>Registry: Register(serviceInstance)
    Registry->>Consul: Register service
    Consul->>Consul: Store service info
    Consul->>Health: Start health checks

    loop Health Check
        Health->>Service: GET /healthz
        Service-->>Health: 200 OK
        Health->>Consul: Update health status
    end

    Service->>Registry: Deregister(serviceID)
    Registry->>Consul: Deregister service
    Consul->>Consul: Remove service
```
## Service Discovery Flow

```mermaid
sequenceDiagram
    participant Client
    participant Registry as Service Registry
    participant Consul
    participant Service1 as Service Instance 1
    participant Service2 as Service Instance 2

    Client->>Registry: Discover("auth-service")
    Registry->>Consul: Query service instances
    Consul-->>Registry: [instance1, instance2]
    Registry->>Registry: Filter healthy instances
    Registry-->>Client: [healthy instances]

    Client->>Service1: gRPC call
    Service1-->>Client: Response
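Once Discover returns multiple healthy instances, the caller still has to pick one per request. A minimal client-side round-robin sketch (the `ServiceInstance` type is redeclared in reduced form so the snippet stands alone; `RoundRobin` is illustrative, not the platform's API):

```go
package main

import (
	"errors"
	"sync/atomic"
)

type ServiceInstance struct {
	ID      string
	Name    string
	Address string
	Port    int
}

var ErrNoInstances = errors.New("no healthy instances")

// RoundRobin cycles through the instance list returned by Discover.
type RoundRobin struct {
	next uint64
}

// Pick returns the next instance in rotation.
func (rr *RoundRobin) Pick(instances []*ServiceInstance) (*ServiceInstance, error) {
	if len(instances) == 0 {
		return nil, ErrNoInstances
	}
	// atomic.AddUint64 makes Pick safe from concurrent request handlers.
	n := atomic.AddUint64(&rr.next, 1)
	return instances[(n-1)%uint64(len(instances))], nil
}
```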
## Development Setup

### Docker Compose

```yaml
services:
  consul:
    image: hashicorp/consul:latest
    ports:
      - "8500:8500"
    command: consul agent -dev -client=0.0.0.0
    volumes:
      - consul-data:/consul/data

volumes:
  consul-data:
```

### Local Development

```bash
# Run Consul in dev mode
consul agent -dev

# Or use Docker
docker run -d --name consul -p 8500:8500 hashicorp/consul:latest
```
## Production Deployment

### Kubernetes

```bash
# Install Consul via the official Helm chart
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul --set global.datacenter=dc1
```

### Standalone Cluster

- Deploy a Consul cluster (3-5 nodes)
- Configure service discovery endpoints
- Set up Consul Connect for service mesh (optional)
## Consequences

### Positive
- **Dynamic Service Discovery**: Services can be added/removed without configuration changes
- **Health Checking**: Automatic removal of unhealthy services
- **Load Balancing**: Multiple service instances automatically discovered
- **Configuration Management**: Consul KV store for service configuration
- **Service Mesh Ready**: Can use Consul Connect for advanced features
- **Development Friendly**: Easy local setup with Docker

### Negative
- **Additional Infrastructure**: Requires a Consul cluster in production
- **Network Dependency**: Services depend on Consul availability
- **Configuration Complexity**: Need to configure a Consul cluster
- **Learning Curve**: Team needs to understand Consul concepts

### Mitigations
1. **High Availability**: Deploy a Consul cluster (3+ nodes)
2. **Caching**: Cache service instances to reduce Consul queries
3. **Fallback**: Support Kubernetes service discovery as a fallback
4. **Documentation**: Comprehensive setup and usage documentation
5. **Monitoring**: Monitor Consul health and service registration
## Alternative: Kubernetes Service Discovery

For Kubernetes deployments, we also support native Kubernetes service discovery:

```go
// internal/registry/kubernetes/k8s.go
type KubernetesRegistry struct {
	clientset kubernetes.Interface
	namespace string
}

func (r *KubernetesRegistry) Discover(ctx context.Context, serviceName string) ([]*ServiceInstance, error) {
	endpoints, err := r.clientset.CoreV1().Endpoints(r.namespace).Get(ctx, serviceName, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	// Convert K8s endpoints to ServiceInstance
	var instances []*ServiceInstance
	for _, subset := range endpoints.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				instances = append(instances, &ServiceInstance{
					ID:      fmt.Sprintf("%s-%s-%d", serviceName, addr.IP, port.Port),
					Name:    serviceName,
					Address: addr.IP,
					Port:    int(port.Port),
				})
			}
		}
	}
	return instances, nil
}
```
## Service Registry Factory

```go
// internal/registry/factory.go
func NewServiceRegistry(cfg *config.Config) (registry.ServiceRegistry, error) {
	switch cfg.Registry.Type {
	case "consul":
		return consul.NewRegistry(cfg.Registry.Consul)
	case "kubernetes":
		return kubernetes.NewRegistry(cfg.Registry.Kubernetes)
	default:
		return nil, fmt.Errorf("unknown registry type: %s", cfg.Registry.Type)
	}
}
```
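The switch-based factory works for two backends; an alternative sketch lets each backend self-register (for example from an `init()` in its own package), so adding a registry type does not touch the factory. All names here are illustrative, and `Registry` stands in for the `ServiceRegistry` interface:

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a stand-in for the ServiceRegistry interface.
type Registry interface{ Kind() string }

type constructor func() (Registry, error)

var (
	mu           sync.RWMutex
	constructors = map[string]constructor{}
)

// RegisterConstructor lets each backend (consul, kubernetes, memory)
// register itself under a config-file type name.
func RegisterConstructor(name string, c constructor) {
	mu.Lock()
	defer mu.Unlock()
	constructors[name] = c
}

// NewServiceRegistry selects a backend by the configured type.
func NewServiceRegistry(kind string) (Registry, error) {
	mu.RLock()
	c, ok := constructors[kind]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown registry type: %s", kind)
	}
	return c()
}

type memoryRegistry struct{}

func (memoryRegistry) Kind() string { return "memory" }
```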
## References
- [ADR-0029: Microservices Architecture](./0029-microservices-architecture.md)
- [ADR-0030: Service Communication Strategy](./0030-service-communication-strategy.md)
- [ADR-0031: Service Repository Structure](./0031-service-repository-structure.md)
- [Consul Documentation](https://www.consul.io/docs)
- [Consul Go Client](https://github.com/hashicorp/consul/api)
- [Consul Kubernetes](https://www.consul.io/docs/k8s)
@@ -71,9 +71,11 @@ Each ADR follows this structure:
 
 ### Architecture & Scaling
 
-- [ADR-0029: Microservices Architecture](./0029-microservices-architecture.md) - Microservices architecture from day one
-- [ADR-0030: Service Communication Strategy](./0030-service-communication-strategy.md) - Service client abstraction and communication patterns
+- [ADR-0029: Microservices Architecture](./0029-microservices-architecture.md) - Microservices architecture from day one
+- [ADR-0030: Service Communication Strategy](./0030-service-communication-strategy.md) - Service client abstraction and communication patterns with API Gateway
 - [ADR-0031: Service Repository Structure](./0031-service-repository-structure.md) - Monorepo with service directories
+- [ADR-0032: API Gateway Strategy](./0032-api-gateway-strategy.md) - API Gateway as core infrastructure component
+- [ADR-0033: Service Discovery Implementation](./0033-service-discovery-implementation.md) - Consul-based service discovery (primary), Kubernetes as alternative
 
 ## Adding New ADRs