docs: add mkdocs, update links, add architecture documentation

2025-11-05 07:44:21 +01:00
parent 6a17236474
commit 54a047f5dc
351 changed files with 3482 additions and 10 deletions

# ADR-0001: Go Module Path
## Status
Accepted
## Context
The project needs a Go module path that uniquely identifies the platform. This path will be used:
- In `go.mod` file
- For importing packages within the project
- For module dependencies
- For future module publishing
## Decision
Use `git.dcentral.systems/toolz/goplt` as the Go module path.
**Rationale:**
- Matches the organization's Git hosting structure
- Follows Go module naming conventions
- Clearly identifies the project as a Go platform tool
- Prevents naming conflicts with other modules
## Consequences
### Positive
- Clear, descriptive module path
- Aligns with organization's infrastructure
- Easy to identify in dependency graphs
### Negative
- Requires access to `git.dcentral.systems` for module resolution
- May need to configure GOPRIVATE/GONOPROXY if using private registry
### Implementation Notes
- Initialize module: `go mod init git.dcentral.systems/toolz/goplt`
- Update all import paths in code to use this module path
- Configure `~/.gitconfig` (e.g. an `insteadOf` rewrite) or Go environment variables (`GOPRIVATE=git.dcentral.systems`) if needed for private module access

# ADR-0002: Go Version
## Status
Accepted
## Context
Go releases new versions regularly with new features, performance improvements, and security fixes. We need to choose a Go version that:
- Provides necessary features for the platform
- Has good ecosystem support
- Is stable and production-ready
- Supports required tooling (plugins, etc.)
## Decision
Use **Go 1.24.3** as the minimum required version for the platform.
**Rationale:**
- Latest stable version available
- Provides all required features for the platform
- Ensures compatibility with modern Go tooling
- Supports all planned features (modules, plugins, generics)
## Consequences
### Positive
- Access to latest Go features and performance improvements
- Better security with latest patches
- Modern tooling support
### Negative
- Requires developers to have Go 1.24.3+ installed
- CI/CD must use compatible Go version
- May limit compatibility with some older dependencies (if any)
### Implementation Notes
- Specify in `go.mod`: `go 1.24.3` (the `go` directive accepts patch versions since Go 1.21, matching the stated minimum)
- Document in `README.md` and CI configuration
- Update `.github/workflows/ci.yml` to use `actions/setup-go@v5` with version `1.24.3`
- Add version check script if needed

# ADR-0003: Dependency Injection Framework
## Status
Accepted
## Context
The platform requires dependency injection to:
- Manage service lifecycle
- Wire dependencies between components
- Support module system initialization
- Handle graceful shutdown
- Provide testability through dependency substitution
Options considered:
1. **uber-go/fx** - Runtime dependency injection with lifecycle management
2. **uber-go/dig** - Reflection-based dependency injection container (fx builds on it)
3. **Manual constructor injection** - No framework, explicit wiring
## Decision
Use **uber-go/fx** (v1.23.0+) as the dependency injection framework.
**Rationale:**
- Provides lifecycle management (OnStart/OnStop hooks) crucial for services
- Supports module-based architecture through fx.Option composition
- Runtime dependency resolution; the dependency graph is validated at startup, so wiring errors fail fast
- Excellent for modular monolith architecture
- Well-documented and actively maintained
- Used by major Go projects (Uber, etc.)
## Consequences
### Positive
- Clean lifecycle management for services
- Easy module composition via fx.Option
- Graceful shutdown handling built-in
- Test-friendly with fx.Options for test overrides
### Negative
- Runtime reflection overhead (minimal)
- Learning curve for developers unfamiliar with fx
- Slightly more complex error messages on dependency resolution failures
### Implementation Notes
- Install: `go get go.uber.org/fx@v1.23.0`
- Create `internal/di/container.go` with fx.New()
- Use fx.Provide() for service registration
- Use fx.Invoke() for initialization tasks
- Leverage fx.Lifecycle for service startup/shutdown

# ADR-0004: Configuration Management Library
## Status
Accepted
## Context
The platform needs a configuration system that:
- Supports hierarchical configuration (defaults → files → env → secrets)
- Handles multiple formats (YAML, JSON, env vars)
- Provides type-safe access to configuration values
- Supports environment-specific overrides
- Can integrate with secret managers (future)
Options considered:
1. **spf13/viper** - Comprehensive configuration management
2. **envconfig** - Environment variable only
3. **koanf** - Lightweight configuration library
4. **Standard library + manual parsing** - No external dependency
## Decision
Use **spf13/viper** (v1.18.0+) with **spf13/cobra** (v1.8.0+) for configuration management.
**Rationale:**
- Industry standard for Go configuration management
- Supports multiple sources (files, env vars, flags)
- Hierarchical configuration with precedence rules
- Easy integration with Cobra for CLI commands
- Well-documented and widely used
- Supports future secret manager integration
## Consequences
### Positive
- Flexible configuration loading from multiple sources
- Easy to add new configuration sources
- Type-safe access methods
- Environment variable support via automatic env binding
### Negative
- Additional dependency
- Viper can be verbose for simple use cases
- Some learning curve for advanced features
### Implementation Notes
- Install: `go get github.com/spf13/viper@v1.18.0` and `github.com/spf13/cobra@v1.8.0`
- Create `pkg/config/config.go` interface to abstract Viper
- Implement `internal/config/viper_config.go` as concrete implementation
- Load order: `default.yaml` → `development.yaml`/`production.yaml` → env vars → secrets (future)
- Use typed getters (GetString, GetInt, GetBool) for type safety

# ADR-0005: Logging Framework
## Status
Accepted
## Context
The platform requires structured logging that:
- Supports multiple log levels
- Provides structured output (JSON for production)
- Allows adding contextual fields
- Performs well under load
- Integrates with observability tools
Options considered:
1. **go.uber.org/zap** - High-performance structured logging
2. **rs/zerolog** - Zero-allocation logger
3. **sirupsen/logrus** - Structured logger (maintenance mode)
4. **Standard library log** - Basic logging (insufficient)
## Decision
Use **go.uber.org/zap** (v1.26.0+) as the logging framework.
**Rationale:**
- Industry standard for high-performance Go applications
- Excellent structured logging with field support
- Very low overhead (designed for high-throughput systems)
- JSON output for production, human-readable for development
- Strong ecosystem integration
- Actively maintained by Uber
## Consequences
### Positive
- High performance (low latency, high throughput)
- Rich structured logging with fields
- Easy integration with observability tools
- Configurable output formats (JSON/console)
### Negative
- Slightly more verbose API than standard library
- Requires wrapping for common use cases (we'll abstract via interface)
### Implementation Notes
- Install: `go get go.uber.org/zap@v1.26.0`
- Create `pkg/logger/logger.go` interface to abstract zap
- Implement `internal/logger/zap_logger.go` as concrete implementation
- Use JSON encoder for production, console encoder for development
- Support request-scoped fields via context
- Export global logger via `pkg/logger` package

# ADR-0006: HTTP Framework
## Status
Accepted
## Context
The platform needs an HTTP framework for:
- REST API endpoints
- Middleware support (auth, logging, metrics)
- Request/response handling
- Route registration from modules
- Integration with observability tools
Options considered:
1. **gin-gonic/gin** - Fast, feature-rich HTTP web framework
2. **gorilla/mux** - Lightweight router
3. **go-chi/chi** - Lightweight, idiomatic router
4. **net/http** (standard library) - No external dependency
## Decision
Use **gin-gonic/gin** (v1.9.1+) as the HTTP framework.
**Rationale:**
- Fast performance (comparable to net/http)
- Rich middleware ecosystem
- Excellent for REST APIs
- Easy route grouping (useful for modules)
- Good OpenTelemetry integration support
- Widely used and well-documented
- Recommended in playbook-golang.md
## Consequences
### Positive
- High performance
- Easy middleware chaining
- Route grouping supports module architecture
- Good ecosystem support
### Negative
- Additional dependency (though lightweight)
- Slight learning curve for developers unfamiliar with Gin
### Implementation Notes
- Install: `go get github.com/gin-gonic/gin@v1.9.1`
- Create router in `internal/server/server.go`
- Use route groups for module isolation: `r.Group("/api/v1/blog")`
- Add middleware stack: logging, recovery, metrics, auth (later)
- Support graceful shutdown via fx lifecycle

# ADR-0007: Project Directory Structure
## Status
Accepted
## Context
The project needs a clear, scalable directory structure that:
- Follows Go best practices
- Separates public interfaces from implementations
- Supports modular architecture
- Is maintainable and discoverable
- Aligns with Go community standards
## Decision
Adopt a **standard Go project layout** with **internal/** and **pkg/** separation:
```
goplt/
├── cmd/
│ └── platform/ # Application entry point
├── internal/ # Private implementation code
│ ├── di/ # Dependency injection
│ ├── registry/ # Module registry
│ ├── pluginloader/ # Plugin loader (optional)
│ ├── config/ # Config implementation
│ ├── logger/ # Logger implementation
│ └── infra/ # Infrastructure adapters
├── pkg/ # Public interfaces (exported)
│ ├── config/ # ConfigProvider interface
│ ├── logger/ # Logger interface
│ ├── module/ # IModule interface
│ ├── auth/ # Auth interfaces (Phase 2)
│ ├── perm/ # Permission DSL (Phase 2)
│ └── infra/ # Infrastructure interfaces
├── modules/ # Feature modules
│ └── blog/ # Sample module (Phase 4)
├── config/ # Configuration files
│ ├── default.yaml
│ ├── development.yaml
│ └── production.yaml
├── api/ # OpenAPI specs
├── scripts/ # Build/test scripts
├── docs/ # Documentation
│ └── adr/ # Architecture Decision Records
├── ops/ # Operations (Grafana dashboards, etc.)
├── .github/
│ └── workflows/
│ └── ci.yml
├── Dockerfile
├── docker-compose.yml
├── docker-compose.test.yml
└── go.mod
```
**Rationale:**
- `internal/` prevents external packages from importing implementation details
- `pkg/` exposes only interfaces that modules need
- `cmd/` follows Go standard for application entry points
- `modules/` clearly separates feature modules
- `config/` centralizes configuration files
- Separates concerns and supports clean architecture
## Consequences
### Positive
- Clear separation of concerns
- Prevents circular dependencies
- Easy to navigate and understand
- Aligns with Go community standards
- Supports modular architecture
### Negative
- Slightly more directories than minimal structure
- Requires discipline to maintain boundaries
### Implementation Notes
- Initialize with `go mod init git.dcentral.systems/toolz/goplt`
- Create all directories upfront in Phase 0
- Document structure in `README.md`
- Enforce boundaries via `internal/` package visibility
- Use `go build ./...` to verify structure

# ADR-0008: Error Handling Strategy
## Status
Accepted
## Context
Go's error handling philosophy requires explicit error checking. We need a consistent approach for:
- Error creation and wrapping
- Error propagation
- Error classification (domain vs infrastructure)
- Error reporting (logging, monitoring)
- HTTP error responses
## Decision
Adopt a **wrapped error pattern** with **structured error types**:
1. **Error Wrapping**: Use `fmt.Errorf("context: %w", err)` for error wrapping
2. **Error Types**: Define custom error types for domain errors
3. **Error Classification**: Distinguish between:
- Domain errors (business logic failures)
- Infrastructure errors (external system failures)
- Validation errors (input validation failures)
4. **Error Context**: Always wrap errors with context about where they occurred
**Rationale:**
- Follows Go 1.13+ error wrapping best practices
- Enables error inspection with `errors.Is()` and `errors.As()`
- Maintains error chains for debugging
- Allows structured error handling
## Consequences
### Positive
- Full error traceability through call stack
- Can inspect and handle specific error types
- Better debugging with error context
- Aligns with Go best practices
### Negative
- Requires discipline to wrap errors consistently
- Can be verbose in some cases
### Implementation Notes
- Always wrap errors: `return nil, fmt.Errorf("failed to load config: %w", err)`
- Create error types for domain errors:
```go
type ConfigError struct {
    Key   string
    Cause error
}

func (e *ConfigError) Error() string {
    return fmt.Sprintf("config key %q: %v", e.Key, e.Cause)
}

func (e *ConfigError) Unwrap() error { return e.Cause }
```
- Use `errors.Is()` and `errors.As()` for error checking
- Log errors with context before returning
- Map domain errors to HTTP status codes in handlers

# ADR-0009: Context Key Types
## Status
Accepted
## Context
The platform will use `context.Context` to propagate request-scoped values such as:
- User ID (from authentication)
- Request ID (for tracing)
- Tenant ID (for multi-tenancy)
- Logger instance (with request-scoped fields)
Go best practices recommend using typed keys instead of string keys to avoid collisions.
## Decision
Use **typed context keys** for all context values:
```go
type contextKey string

const (
    userIDKey    contextKey = "user_id"
    requestIDKey contextKey = "request_id"
    tenantIDKey  contextKey = "tenant_id"
    loggerKey    contextKey = "logger"
)
```
**Rationale:**
- Prevents key collisions between packages
- Type-safe access to context values
- Aligns with Go best practices (see `context.WithValue` documentation)
- Makes context usage explicit and discoverable
## Consequences
### Positive
- Type-safe context access
- Prevents accidental key collisions
- Clear intent in code
- Better IDE support
### Negative
- Slightly more verbose than string keys
- Requires defining keys upfront
### Implementation Notes
- Create `pkg/context/keys.go` with all context key definitions
- Provide helper functions for setting/getting values:
```go
func WithUserID(ctx context.Context, userID string) context.Context
func UserIDFromContext(ctx context.Context) (string, bool)
```
- Use in middleware and services
- Document all context keys and their usage

# ADR-0010: CI/CD Platform
## Status
Accepted
## Context
The platform needs a CI/CD system for:
- Automated testing on pull requests
- Code quality checks (linting, formatting)
- Building binaries and Docker images
- Publishing artifacts
- Running integration tests
Options considered:
1. **GitHub Actions** - Native GitHub integration
2. **GitLab CI** - If using GitLab
3. **Jenkins** - Self-hosted option
4. **CircleCI** - Cloud-based CI/CD
## Decision
Use **GitHub Actions** for CI/CD pipeline.
**Rationale:**
- Native integration with GitHub repositories
- Free for public repos, reasonable for private
- Rich ecosystem of actions
- Easy to configure with YAML
- Good documentation and community support
- Recommended in playbook-golang.md
## Consequences
### Positive
- Easy setup and configuration
- Good GitHub integration
- Large action marketplace
- Free for public repositories
### Negative
- Tied to GitHub (if migrating Git hosts, need to migrate CI)
- Limited customization compared to self-hosted solutions
### Implementation Notes
- Create `.github/workflows/ci.yml`
- Use `actions/setup-go@v5` for Go setup
- Configure caching for Go modules
- Run: linting, unit tests, integration tests, build
- Use `actions/cache@v4` for module caching
- Add build matrix if needed for multiple Go versions (future)

# ADR-0011: Code Generation Tools
## Status
Accepted
## Context
The platform will use code generation for:
- Permission constants from module manifests
- Ent ORM code generation
- Mock generation for testing
- OpenAPI client/server code (future)
We need to decide on tooling and workflow.
## Decision
Use **standard Go generation tools** with `go generate`:
1. **Ent ORM**: `entgo.io/ent/cmd/ent` for schema code generation
2. **Mocks**: `github.com/vektra/mockery/v2` or `github.com/golang/mock/mockgen`
3. **Permissions**: Custom `scripts/generate-permissions.go`
4. **OpenAPI**: `github.com/deepmap/oapi-codegen` (future)
**Workflow:**
- Use `//go:generate` directives in source files
- Run `go generate ./...` before commits
- Document in `Makefile` with `make generate` target
- CI should verify generated code is up-to-date
**Rationale:**
- Standard Go tooling, well-supported
- `go generate` is the idiomatic way to run code generation
- Easy to integrate into CI/CD
- Reduces manual code maintenance
## Consequences
### Positive
- Automated code generation reduces errors
- Consistent code style
- Easy to maintain
- Standard Go workflow
### Negative
- Requires developers to run generation before commits
- Generated code must be committed (or verified in CI)
- Slight learning curve for new developers
### Implementation Notes
- Add `//go:generate` directives where needed
- Create `Makefile` target: `make generate`
- Add CI step to verify generated code: `go generate ./... && git diff --exit-code`
- Document in `CONTRIBUTING.md`

# ADR-0012: Logger Interface Design
## Status
Accepted
## Context
We're using zap for logging, but want to abstract it behind an interface for:
- Testability (mock logger in tests)
- Flexibility (could swap implementations)
- Module compatibility (modules use interface, not concrete type)
We need to decide on the interface design.
## Decision
Create a **simple logger interface** that matches zap's API pattern but uses generic types:
```go
type Field interface {
    // Field represents a key-value pair for structured logging
}

type Logger interface {
    Debug(msg string, fields ...Field)
    Info(msg string, fields ...Field)
    Warn(msg string, fields ...Field)
    Error(msg string, fields ...Field)
    With(fields ...Field) Logger
}
```
**Implementation:**
- Use `zap.Field` as the Field type (no abstraction needed for now)
- Provide helper functions in `pkg/logger` for creating fields:
```go
func String(key, value string) Field
func Int(key string, value int) Field
func Error(err error) Field
```
**Rationale:**
- Simple interface that modules can depend on
- Matches zap's usage patterns
- Easy to test with mock implementations
- Allows future swap if needed (though unlikely)
## Consequences
### Positive
- Clean abstraction for modules
- Testable with mocks
- Simple API for modules to use
### Negative
- Slight indirection overhead
- Need to maintain interface compatibility
### Implementation Notes
- Define interface in `pkg/logger/logger.go`
- Implement in `internal/logger/zap_logger.go`
- Export helper functions in `pkg/logger/fields.go`
- Modules import `pkg/logger`, not `internal/logger`

# ADR-0013: Database ORM Selection
## Status
Accepted
## Context
The platform needs a database ORM/library that:
- Supports PostgreSQL (primary database)
- Provides type-safe query building
- Supports code generation (reduces boilerplate)
- Handles migrations
- Supports relationships (many-to-many, etc.)
- Integrates with Ent (code generation)
Options considered:
1. **entgo.io/ent** - Code-generated, type-safe ORM
2. **gorm.io/gorm** - Feature-rich ORM with reflection
3. **sqlx** - Lightweight wrapper around database/sql
4. **Standard library database/sql** - No ORM, raw SQL
## Decision
Use **entgo.io/ent** as the primary ORM for the platform.
**Rationale:**
- Code generation provides compile-time type safety
- Excellent schema definition and migration support
- Strong relationship modeling
- Good performance (no reflection at runtime)
- Active development and good documentation
- Recommended in playbook-golang.md
- Easy to integrate with OpenTelemetry
## Consequences
### Positive
- Type-safe queries eliminate runtime errors
- Schema changes are explicit and versioned
- Code generation reduces boilerplate
- Good migration support
- Strong relationship support
### Negative
- Requires code generation step (`go generate`)
- Learning curve for developers unfamiliar with Ent
- Less flexible than raw SQL for complex queries
- Generated code must be committed or verified in CI
### Implementation Notes
- Install: `go get entgo.io/ent/cmd/ent`
- Initialize schemas: `go run entgo.io/ent/cmd/ent new User Role Permission` (`new` replaces the older `init` subcommand)
- Use `//go:generate` directives for code generation
- Run migrations on startup via `client.Schema.Create()`
- Create wrapper in `internal/infra/database/client.go` for DI injection

# ADR-0014: Health Check Implementation
## Status
Accepted
## Context
The platform needs health check endpoints for:
- Kubernetes liveness probes (`/healthz`)
- Kubernetes readiness probes (`/ready`)
- Monitoring and alerting
- Load balancer health checks
Health checks should be:
- Fast and lightweight
- Check critical dependencies (database, cache, etc.)
- Provide clear status indicators
## Decision
Implement **custom health check registry** with composable checkers:
1. **Liveness endpoint** (`/healthz`): Always returns 200 if process is running
2. **Readiness endpoint** (`/ready`): Checks all registered health checkers
3. **Health check interface**: `type HealthChecker interface { Check(ctx context.Context) error }`
4. **Registry pattern**: Modules can register additional health checkers
**Rationale:**
- Custom implementation gives full control
- Composable design allows modules to add checks
- Simple interface is easy to test
- No external dependency for basic functionality
- Can extend with Prometheus metrics later
## Consequences
### Positive
- Lightweight and fast
- Extensible by modules
- Easy to test
- Clear separation of liveness vs readiness
### Negative
- Need to implement ourselves (though simple)
- Must maintain the registry
### Implementation Notes
- Create `pkg/health/health.go` interface
- Implement `internal/health/registry.go` with checker map
- Register core checkers: database, cache (if enabled)
- Add endpoints to HTTP router
- Return JSON response: `{"status": "ok", "checks": {...}}`
- Consider timeout (e.g., 5 seconds) for readiness checks

# ADR-0015: Error Bus Implementation
## Status
Accepted
## Context
The platform needs a centralized error handling mechanism for:
- Capturing panics and errors
- Logging errors consistently
- Sending errors to external services (Sentry, etc.)
- Avoiding error handling duplication
Options considered:
1. **Channel-based in-process bus** - Simple, Go-idiomatic
2. **Event bus integration** - Use existing event bus
3. **Direct logging** - No bus, direct integration
4. **External service integration** - Direct to Sentry
## Decision
Implement a **channel-based error bus** with pluggable sinks:
1. **Error bus interface**: `type ErrorPublisher interface { Publish(err error) }`
2. **Channel-based implementation**: Background goroutine consumes errors from channel
3. **Pluggable sinks**: Logger (always), Sentry (optional, Phase 6)
4. **Panic recovery middleware**: Automatically publishes panics to error bus
**Rationale:**
- Simple, idiomatic Go pattern
- Non-blocking error publishing (buffered channel)
- Decouples error capture from error handling
- Easy to add new sinks (Sentry, logging, metrics)
- Can be extended to use event bus later if needed
## Consequences
### Positive
- Centralized error handling
- Non-blocking (doesn't slow down request path)
- Easy to extend with new sinks
- Consistent error handling across the platform
### Negative
- Additional goroutine overhead (minimal)
- Must ensure error bus doesn't become bottleneck
### Implementation Notes
- Create `pkg/errorbus/errorbus.go` interface
- Implement `internal/errorbus/channel_bus.go`:
- Buffered channel (e.g., size 100)
- Background goroutine consumes errors
- Multiple sinks (logger, optional Sentry)
- Add panic recovery middleware that publishes to bus
- Register in DI container as singleton
- Monitor channel size to detect error storms

# ADR-0016: OpenTelemetry Observability Strategy
## Status
Accepted
## Context
The platform needs distributed tracing and observability for:
- Request tracing across services/modules
- Performance monitoring
- Debugging production issues
- Integration with observability tools (Jaeger, Grafana, etc.)
Options considered:
1. **OpenTelemetry** - Industry standard, vendor-neutral
2. **Zipkin** - Older standard, less ecosystem support
3. **Custom tracing** - Build our own
4. **No tracing** - Only logs and metrics
## Decision
Use **OpenTelemetry (OTEL)** for all observability:
1. **Tracing**: Distributed tracing with spans
2. **Metrics**: Prometheus-compatible metrics
3. **Logs**: Structured logs with trace correlation
4. **Export**: OTLP collector for production, stdout for development
**Rationale:**
- Industry standard, vendor-neutral
- Excellent Go SDK support
- Integrates with major observability tools
- Supports metrics, traces, and logs
- Recommended in playbook-golang.md
- Future-proof (not locked to specific vendor)
## Consequences
### Positive
- Vendor-neutral (can switch backends)
- Rich ecosystem and tooling
- Excellent Go SDK
- Supports all observability signals
### Negative
- Learning curve for OpenTelemetry concepts
- Slight overhead (minimal with sampling)
- Requires OTLP collector or compatible backend
### Implementation Notes
- Install: `go.opentelemetry.io/otel` and contrib packages
- Initialize TracerProvider in `internal/observability/tracer.go`
- Use HTTP instrumentation middleware: `otelhttp.NewHandler()`
- Add database instrumentation via Ent interceptor
- Export to stdout for development, OTLP for production
- Include trace ID in structured logs
- Configure sampling for production (e.g., 10% or adaptive)

# ADR-0017: JWT Token Strategy
## Status
Accepted
## Context
The platform needs authentication tokens that:
- Are stateless (no server-side session storage)
- Support role and permission claims
- Can be revoked (a known challenge for stateless tokens)
- Have appropriate lifetimes
- Support multi-tenancy (tenant ID in claims)
Token strategies considered:
1. **Short-lived access tokens + long-lived refresh tokens** - Industry standard
2. **Single long-lived tokens** - Simple but insecure
3. **Short-lived tokens only** - Secure but poor UX
4. **Session-based** - Stateful, requires storage
## Decision
Use **short-lived access tokens + long-lived refresh tokens**:
1. **Access tokens**: 15 minutes lifetime, contain user ID, roles, tenant ID
2. **Refresh tokens**: 7 days lifetime, stored in database (for revocation)
3. **Token format**: JWT with claims: `sub` (user ID), `roles`, `tenant_id`, `exp`
4. **Revocation**: Refresh tokens stored in DB, can be revoked/deleted
**Rationale:**
- Industry best practice (OAuth2/OIDC pattern)
- Good balance of security and UX
- Access tokens can't be revoked (short lifetime mitigates risk)
- Refresh tokens can be revoked (stored in DB)
- Supports stateless authentication for most requests
## Consequences
### Positive
- Secure (short access token lifetime)
- Good UX (refresh tokens prevent frequent re-login)
- Stateless for most requests (access tokens)
- Supports revocation (refresh tokens)
### Negative
- Requires refresh token storage (DB table)
- More complex than single token
- Need to handle token refresh flow
### Implementation Notes
- Use `github.com/golang-jwt/jwt/v5` for JWT handling
- Store refresh tokens in `refresh_tokens` table (user_id, token_hash, expires_at)
- Generate access tokens with HS256 or RS256 signing
- Include roles in token claims (not just role IDs)
- Validate token signature and expiration on each request
- Refresh endpoint validates refresh token and issues new access token

# ADR-0018: Password Hashing Algorithm
## Status
Accepted
## Context
The platform needs to securely store user passwords. Requirements:
- Resist brute-force attacks
- Resist rainbow table attacks
- Future-proof against advances in computing
- Reasonable performance (not too slow)
Options considered:
1. **bcrypt** - Battle-tested, widely used
2. **argon2id** - Modern, memory-hard, recommended by OWASP
3. **scrypt** - Memory-hard, good alternative
4. **PBKDF2** - Older standard, less secure
## Decision
Use **argon2id** for password hashing with recommended parameters:
- **Algorithm**: argon2id (variant)
- **Memory**: 64 MiB (65536 KiB)
- **Iterations**: 3 (time cost)
- **Parallelism**: 4 (number of threads)
- **Salt length**: 16 bytes (random, unique per password)
**Rationale:**
- Recommended by OWASP for new applications
- Memory-hard algorithm (resistant to GPU/ASIC attacks)
- Good balance of security and performance
- Future-proof design
- Maintained Go implementation in `golang.org/x/crypto/argon2` (an official module, though not the standard library proper)
## Consequences
### Positive
- Strong security guarantees
- Memory-hard (resistant to hardware attacks)
- OWASP recommended
- Official Go implementation available (`golang.org/x/crypto`)
### Negative
- Slightly slower than bcrypt (acceptable trade-off)
- Requires tuning parameters for production
### Implementation Notes
- Use `golang.org/x/crypto/argon2` package
- Store hash in format: `$argon2id$v=19$m=65536,t=3,p=4$salt$hash`
- Use `crypto/rand` for salt generation
- Verify passwords by re-deriving the hash with `argon2.IDKey` using the stored salt and parameters, then comparing with `crypto/subtle.ConstantTimeCompare` (the package has no bcrypt-style compare function)
- Consider increasing parameters for high-security environments

# ADR-0019: Permission DSL Format
## Status
Accepted
## Context
The platform needs a permission system that:
- Is extensible by modules
- Prevents typos and errors (compile-time safety)
- Supports hierarchical permissions
- Is easy to understand and use
Permission formats considered:
1. **String format**: `"module.resource.action"` - Simple, flexible
2. **Enum/Constants**: Type-safe but less flexible
3. **Hierarchical tree**: Complex but powerful
4. **Bitmask**: Efficient but hard to read
## Decision
Use **string-based permission format** with **code-generated constants**:
1. **Format**: `"{module}.{resource}.{action}"`
- Examples: `blog.post.create`, `user.read`, `system.health.check`
2. **Code generation**: Generate constants from `module.yaml` files
3. **Type safety**: `type Permission string` with generated constants
4. **Validation**: Compile-time constants prevent typos
**Rationale:**
- Simple and readable
- Easy to extend (modules define in manifest)
- Code generation provides compile-time safety
- Flexible (modules can define any format)
- Hierarchical structure is intuitive
- Easy to parse and match
## Consequences
### Positive
- Simple and intuitive format
- Compile-time safety via code generation
- Easy to extend by modules
- Human-readable
- Flexible for various permission models
### Negative
- String comparisons (minimal performance impact)
- Requires code generation step
- Potential for permission string conflicts (mitigated by module prefix)
### Implementation Notes
- Define `type Permission string` in `pkg/perm/perm.go`
- Create code generator: `scripts/generate-permissions.go`
- Scan `modules/*/module.yaml` for permissions
- Generate constants in `pkg/perm/generated.go`
- Use `//go:generate` directive
- Validate format: `^[a-z0-9]+(\.[a-z0-9]+)*$` (lowercase, dots)

# ADR-0020: Audit Logging Storage
## Status
Accepted
## Context
The platform needs to audit all security-relevant actions:
- User logins and authentication attempts
- Permission changes
- Data modifications
- Administrative actions
Audit logs must be:
- Immutable (append-only)
- Queryable
- Performant (don't slow down operations)
- Compliant with audit requirements
Storage options considered:
1. **PostgreSQL table** - Simple, queryable, transactional
2. **Elasticsearch** - Excellent for searching, but additional dependency
3. **File-based logs** - Simple but hard to query
4. **External audit service** - Overkill for initial version
## Decision
Store audit logs in **PostgreSQL append-only table** with JSON metadata:
1. **Table structure**: `audit_logs` with columns:
- `id`, `actor_id`, `action`, `target_id`, `metadata` (JSONB), `timestamp`
2. **Append-only**: No UPDATE or DELETE operations
3. **JSON metadata**: Flexible storage for additional context
4. **Indexing**: Index on `actor_id`, `action`, `timestamp` for queries
**Rationale:**
- Simple (no additional infrastructure)
- Queryable via SQL
- Transactional (consistent with other data)
- JSONB provides flexibility for metadata
- Can migrate to Elasticsearch later if needed
- Good performance for typical audit volumes
## Consequences
### Positive
- Simple implementation
- Queryable via SQL
- No additional infrastructure
- Transactional consistency
- Can archive old logs if needed
### Negative
- Adds load to primary database
- May need archiving strategy for large volumes
- Less powerful search than Elasticsearch
### Implementation Notes
- Create `audit_logs` table via Ent schema
- Use JSONB for metadata column (PostgreSQL-specific)
- Add indexes: `(actor_id, timestamp)`, `(action, timestamp)`
- Implement async logging (optional, via channel) for high throughput
- Consider partitioning by date for large volumes
- Add retention policy (e.g., archive after 1 year)
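
The append-only write path can be sketched as follows (the query text and helper name are illustrative, not Ent's generated SQL; in practice the Ent schema owns the DDL and the insert):

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// AuditEntry mirrors the audit_logs columns named in the ADR.
type AuditEntry struct {
	ActorID   string
	Action    string
	TargetID  string
	Metadata  map[string]any
	Timestamp time.Time
}

// insertArgs marshals the metadata map into the JSONB column value and
// returns the statement plus positional args. There is deliberately no
// corresponding UPDATE or DELETE helper: the table is append-only.
func insertArgs(e AuditEntry) (string, []any, error) {
	meta, err := json.Marshal(e.Metadata)
	if err != nil {
		return "", nil, err
	}
	const q = `INSERT INTO audit_logs (actor_id, action, target_id, metadata, timestamp)
VALUES ($1, $2, $3, $4, $5)`
	return q, []any{e.ActorID, e.Action, e.TargetID, string(meta), e.Timestamp}, nil
}

func main() {
	_, args, err := insertArgs(AuditEntry{
		ActorID:   "u-42",
		Action:    "user.role.grant",
		TargetID:  "u-7",
		Metadata:  map[string]any{"role": "editor"},
		Timestamp: time.Now().UTC(),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(args[1], args[3])
}
```

The async variant would push `AuditEntry` values onto a channel and batch the inserts from a single writer goroutine.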

# ADR-0021: Module Loading Strategy
## Status
Accepted
## Context
The platform needs to support pluggable modules. Two approaches:
1. **Static registration** - Modules compiled into binary
2. **Dynamic plugin loading** - Load `.so` files at runtime
Each has trade-offs for development, CI, and production.
## Decision
Support **both approaches** with **static registration as primary**:
1. **Static registration (primary)**:
- Modules register via `init()` function
- Imported via `import _ "module/pkg"` in main
- Works everywhere (Windows, Linux, macOS)
- Compile-time type safety
2. **Dynamic plugin loading (optional)**:
- Support via Go `plugin` package
- Load `.so` files from `./plugins/` directory
- Only for production scenarios requiring hot-swap
- Linux/macOS only (Go plugin limitation)
**Rationale:**
- Static registration is simpler and more reliable
- Works in CI/CD (no plugin compilation needed)
- Compile-time safety catches errors early
- Dynamic loading provides flexibility for specific use cases
- Modules can choose their approach
## Consequences
### Positive
- Flexible: static for most cases, dynamic when needed
- Static registration works everywhere
- Compile-time safety with static
- Hot-swap capability with dynamic (Linux/macOS)
### Negative
- Two code paths to maintain
- Dynamic plugins have version compatibility constraints
- Plugin debugging is harder
### Implementation Notes
- Implement static registry in `internal/registry/registry.go`
- Modules register via: `registry.Register(Module)` in `init()`
- Implement plugin loader in `internal/pluginloader/plugin_loader.go` (optional)
- Document when to use each approach
- Validate plugin version compatibility if using dynamic loading
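
The static path can be sketched as follows (the `Module` interface and function names are illustrative, not the actual `internal/registry` API):

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Module is a minimal stand-in for the platform's module contract.
type Module interface {
	Name() string
}

var (
	mu      sync.Mutex
	modules = map[string]Module{}
)

// Register is called from each module's init(); the main package pulls
// the module in via a blank import: import _ "module/pkg".
func Register(m Module) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := modules[m.Name()]; dup {
		panic("registry: duplicate module " + m.Name())
	}
	modules[m.Name()] = m
}

// Names returns the registered module names in stable order.
func Names() []string {
	mu.Lock()
	defer mu.Unlock()
	out := make([]string, 0, len(modules))
	for n := range modules {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

// blogModule simulates what a module package would do in its own file.
type blogModule struct{}

func (blogModule) Name() string { return "blog" }

func init() { Register(blogModule{}) }

func main() { fmt.Println(Names()) }
```

The dynamic loader would call the same `Register` after resolving the module symbol from a `.so` file, so the rest of the kernel sees one registry regardless of how a module arrived.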

# ADR-0022: Cache Implementation
## Status
Accepted
## Context
The platform needs caching for:
- Performance optimization (reduce database load)
- Frequently accessed data (user permissions, roles)
- Session data (optional)
- Query results
Options considered:
1. **Redis** - Industry standard, feature-rich
2. **In-memory cache** - Simple, no external dependency
3. **Memcached** - Simple, but fewer features than Redis
4. **No cache** - Simplest, but poor performance at scale
## Decision
Use **Redis** as the primary cache with **in-memory fallback**:
1. **Primary**: Redis for production
2. **Fallback**: In-memory cache for development/testing
3. **Interface abstraction**: `Cache` interface allows swapping implementations
4. **Use cases**: Permission lookups, role assignments, query caching
**Rationale:**
- Industry standard, widely supported
- Rich feature set (TTL, pub/sub, etc.)
- Can be shared across instances (multi-instance deployments)
- Good performance
- Easy to abstract behind interface
## Consequences
### Positive
- High performance
- Shared across instances
- Rich feature set
- Easy to scale horizontally
- Abstraction allows swapping implementations
### Negative
- Additional infrastructure dependency
- Network latency (minimal with proper setup)
- Need to handle Redis failures gracefully
### Implementation Notes
- Install: `github.com/redis/go-redis/v9`
- Create `pkg/infra/cache/cache.go` interface
- Implement `internal/infra/cache/redis_cache.go`
- Implement `internal/infra/cache/memory_cache.go` for fallback
- Use connection pooling
- Handle Redis failures gracefully (fallback or error)
- Configure TTLs appropriately (e.g., 5 minutes for permissions)

# ADR-0023: Event Bus Implementation
## Status
Accepted
## Context
The platform needs an event bus for:
- Module-to-module communication
- Decoupled event publishing
- Event sourcing (optional, future)
- Integration with external systems
Options considered:
1. **In-process channel-based bus** - Simple, for development/testing
2. **Kafka** - Production-grade, scalable
3. **RabbitMQ** - Alternative message broker
4. **Redis pub/sub** - Simple but less reliable
## Decision
Support **dual implementation** with **in-process primary, Kafka for production**:
1. **In-process bus (default)**:
- Channel-based implementation
- Used for development, testing, small deployments
- Simple, no external dependencies
2. **Kafka bus (production)**:
- Full Kafka integration via `segmentio/kafka-go`
- Producer/consumer groups
- Configurable via environment (switch implementation)
**Rationale:**
- In-process bus is simple for development
- Kafka provides production-grade reliability and scalability
- Interface abstraction allows swapping
- Modules don't need to know which implementation is in use
- Can start simple and scale up
## Consequences
### Positive
- Simple for development (no Kafka needed)
- Scalable for production (Kafka)
- Flexible (can choose implementation)
- Modules are decoupled from implementation
### Negative
- Two implementations to maintain
- Need to ensure interface covers both use cases
- Kafka adds infrastructure complexity
### Implementation Notes
- Create `pkg/eventbus/eventbus.go` interface
- Implement `internal/infra/bus/inprocess_bus.go` (channel-based)
- Implement `internal/infra/bus/kafka_bus.go` (Kafka)
- Select implementation via config
- Support both sync and async event publishing
- Handle errors gracefully (retry, dead letter queue)
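
A minimal sketch of the channel-based implementation (names are illustrative; the real `pkg/eventbus` interface will need context support, error returns, and unsubscribe so that it can also cover the Kafka implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// Event is the unit both implementations carry.
type Event struct {
	Topic   string
	Payload []byte
}

// Bus fans events out to per-topic subscriber channels.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

func NewBus() *Bus { return &Bus{subs: map[string][]chan Event{}} }

// Subscribe returns a buffered channel receiving events for topic.
func (b *Bus) Subscribe(topic string) <-chan Event {
	ch := make(chan Event, 16)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers the event to all subscribers. Note the trade-off
// made visible here: a full buffer drops the event, whereas Kafka
// would persist it — one reason the shared interface must be designed
// against both implementations' semantics.
func (b *Bus) Publish(e Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs[e.Topic] {
		select {
		case ch <- e:
		default: // subscriber too slow; drop
		}
	}
}

func main() {
	bus := NewBus()
	ch := bus.Subscribe("user.created")
	bus.Publish(Event{Topic: "user.created", Payload: []byte(`{"id":"u-1"}`)})
	fmt.Println(string((<-ch).Payload))
}
```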

# ADR-0024: Background Job Scheduler
## Status
Accepted
## Context
The platform needs background job processing for:
- Periodic tasks (cron jobs)
- Asynchronous processing
- Long-running operations
- Retry logic for failed jobs
Options considered:
1. **asynq (Redis-based)** - Simple, feature-rich
2. **cron + custom queue** - Build our own
3. **Kafka consumers** - Use event bus
4. **External service** - AWS SQS, etc.
## Decision
Use **asynq** (Redis-backed) for job scheduling:
1. **Cron jobs**: `github.com/robfig/cron/v3` for periodic tasks
2. **Job queue**: `github.com/hibiken/asynq` for async jobs
3. **Storage**: Redis (shared with cache)
4. **Features**: Retries, backoff, job status tracking
**Rationale:**
- Simple, Redis-backed (no new infrastructure)
- Good Go library support
- Built-in retry and backoff
- Job status tracking
- Easy to integrate
- Can scale horizontally (multiple workers)
## Consequences
### Positive
- Simple (uses existing Redis)
- Feature-rich (retries, backoff)
- Good performance
- Easy to scale
- Job status tracking
### Negative
- Tied to Redis (but we're already using it)
- Requires Redis to be available
### Implementation Notes
- Install: `github.com/hibiken/asynq` and `github.com/robfig/cron/v3`
- Create `pkg/scheduler/scheduler.go` interface
- Implement `internal/infra/scheduler/asynq_scheduler.go`
- Register jobs in `internal/infra/scheduler/job_registry.go`
- Start worker in fx lifecycle
- Configure retry policies (exponential backoff)
- Add job monitoring endpoint
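
The task-definition pattern can be sketched without a live Redis connection (the task type name and payload shape are hypothetical; with asynq, the pair would be wrapped in `asynq.NewTask` and enqueued through an `asynq.Client` with retry options):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Task types are namespaced strings, mirroring asynq's convention of
// pairing a type string with a JSON payload.
const TypeEmailWelcome = "email:welcome"

// EmailWelcomePayload is an illustrative payload, not a real module's.
type EmailWelcomePayload struct {
	UserID string `json:"user_id"`
}

// NewEmailWelcomeTask builds the (type, payload) pair that the worker's
// handler — registered against TypeEmailWelcome in the job registry —
// would unmarshal on the other side.
func NewEmailWelcomeTask(userID string) (string, []byte, error) {
	p, err := json.Marshal(EmailWelcomePayload{UserID: userID})
	if err != nil {
		return "", nil, err
	}
	return TypeEmailWelcome, p, nil
}

func main() {
	typ, payload, err := NewEmailWelcomeTask("u-42")
	if err != nil {
		panic(err)
	}
	fmt.Println(typ, string(payload))
}
```

Keeping payload construction in plain functions like this keeps the asynq dependency at the edges and makes the job logic unit-testable.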

# ADR-0025: Multi-tenancy Model
## Status
Accepted
## Context
The platform may need multi-tenancy support for SaaS deployments. Options:
1. **Shared database with tenant_id column** - Single DB, row-level isolation
2. **Schema-per-tenant** - Single DB, separate schemas
3. **Database-per-tenant** - Separate databases
Each has trade-offs for isolation, performance, and operational complexity.
## Decision
Use **shared database with tenant_id column** (optional feature):
1. **Model**: Single PostgreSQL database with `tenant_id` column on tenant-scoped tables
2. **Isolation**: Row-level via Ent interceptors (automatic filtering)
3. **Tenant resolution**: From header (`X-Tenant-ID`), subdomain, or JWT claim
4. **Optional**: Can be disabled for single-tenant deployments
**Rationale:**
- Simplest operational model (single database)
- Good performance (can index tenant_id)
- Easy to implement (Ent interceptors)
- Can migrate to schema-per-tenant later if needed
- Flexible (can support both single and multi-tenant)
## Consequences
### Positive
- Simple operations (single database)
- Good performance with proper indexing
- Easy to implement
- Flexible (optional feature)
### Negative
- Requires careful query design (ensure tenant_id filtering)
- Data isolation at application level (not database level)
- Potential for data leakage if bugs occur
### Implementation Notes
- Make tenant_id optional (nullable) for single-tenant mode
- Add Ent interceptor to automatically filter by tenant_id
- Resolve tenant from context via middleware
- Add tenant_id to JWT claims
- Document tenant isolation guarantees
- Consider adding tenant_id to all tenant-scoped tables

# ADR-0026: Error Reporting Service
## Status
Accepted
## Context
The platform needs error reporting for:
- Production error tracking
- Stack trace collection
- Error aggregation and analysis
- Integration with monitoring
Options considered:
1. **Sentry** - Popular, feature-rich
2. **Rollbar** - Alternative error tracking
3. **Custom solution** - Build our own
4. **Logs only** - No external service
## Decision
Use **Sentry** for error reporting (optional, configurable):
1. **Integration**: Via error bus sink
2. **Configuration**: Sentry DSN from config
3. **Context**: Include user ID, trace ID, module name
4. **Optional**: Can be disabled for development
**Rationale:**
- Industry standard error tracking
- Excellent Go SDK
- Rich features (release tracking, grouping, etc.)
- Good free tier
- Easy to integrate
## Consequences
### Positive
- Excellent error tracking
- Rich context and grouping
- Easy integration
- Good free tier
### Negative
- External dependency
- Additional cost at scale
- Privacy considerations (data sent to Sentry)
### Implementation Notes
- Install: `github.com/getsentry/sentry-go`
- Create Sentry sink for error bus
- Configure via environment variable
- Include context: user ID, trace ID, module name
- Set up release tracking
- Configure sampling for high-volume deployments
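
The sink abstraction can be sketched as follows (names are illustrative; a Sentry sink would implement `Sink` using the sentry-go SDK, and enabling or disabling it becomes pure configuration):

```go
package main

import (
	"fmt"
	"sync"
)

// Report carries the context the ADR requires per error.
type Report struct {
	Err     error
	UserID  string
	TraceID string
	Module  string
}

// Sink is the contract each destination (Sentry, logs, ...) implements.
type Sink interface {
	Capture(r Report)
}

// SinkFunc adapts a plain function to the Sink interface.
type SinkFunc func(Report)

func (fn SinkFunc) Capture(r Report) { fn(r) }

// Fanout delivers each report to every configured sink.
type Fanout struct {
	mu    sync.Mutex
	sinks []Sink
}

func (f *Fanout) Add(s Sink) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.sinks = append(f.sinks, s)
}

func (f *Fanout) Capture(r Report) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for _, s := range f.sinks {
		s.Capture(r)
	}
}

func main() {
	var bus Fanout
	// In development only a log sink is registered; in production the
	// Sentry sink would be added when a DSN is configured.
	bus.Add(SinkFunc(func(r Report) {
		fmt.Printf("module=%s trace=%s err=%v\n", r.Module, r.TraceID, r.Err)
	}))
	bus.Capture(Report{Err: fmt.Errorf("boom"), Module: "blog", TraceID: "t-1"})
}
```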

# ADR-0027: Rate Limiting Strategy
## Status
Accepted
## Context
The platform needs rate limiting to:
- Prevent abuse and DoS attacks
- Protect against brute-force attacks
- Ensure fair resource usage
- Comply with API usage policies
Rate limiting strategies:
1. **Per-user rate limiting** - Based on authenticated user
2. **Per-IP rate limiting** - Based on client IP
3. **Fixed rate limiting** - Global limits
4. **Distributed rate limiting** - Shared state across instances
## Decision
Implement **multi-level rate limiting**:
1. **Per-user rate limiting**: For authenticated requests (e.g., 100 req/min)
2. **Per-IP rate limiting**: For all requests (e.g., 1000 req/min)
3. **Storage**: Redis for distributed rate limiting
4. **Algorithm**: Token bucket or sliding window
**Rationale:**
- Multi-level provides defense in depth
- Per-user prevents abuse by authenticated users
- Per-IP protects against unauthenticated abuse
- Redis enables distributed rate limiting (multi-instance)
- Token bucket provides smooth rate limiting
## Consequences
### Positive
- Multi-layer protection
- Works with multiple instances
- Configurable per endpoint
- Standard approach
### Negative
- Requires Redis (or shared state)
- Additional latency (minimal)
- Need to handle Redis failures gracefully
### Implementation Notes
- Use `github.com/ulule/limiter/v3` library
- Configure limits in config file
- Store rate limit state in Redis
- Return `X-RateLimit-*` headers
- Handle Redis failures gracefully (fail open or closed based on config)
- Configure different limits for different endpoints

# ADR-0028: Testing Strategy
## Status
Accepted
## Context
The platform needs a comprehensive testing strategy:
- Unit tests for individual components
- Integration tests for full flows
- Contract tests for API compatibility
- Load tests for performance
Testing tools and approaches vary in complexity and coverage.
## Decision
Adopt a **multi-layered testing approach**:
1. **Unit tests**:
- Tool: Standard `testing` package + `testify`
- Coverage: >80% for core modules
- Mocks: `mockery` or `mockgen`
- Fast execution (< 1 second)
2. **Integration tests**:
- Tool: `testcontainers-go` for Docker-based services
- Coverage: End-to-end flows (auth, modules, etc.)
- Infrastructure: PostgreSQL, Redis, Kafka via testcontainers
- Tagged: `//go:build integration`
3. **Contract tests**:
- Tool: OpenAPI validator (`kin-openapi`)
- Coverage: API request/response validation
- Optional: Pact for service contracts
4. **Load tests**:
- Tool: k6 or vegeta
- Coverage: Critical endpoints (auth, API)
- Performance benchmarks
**Rationale:**
- Comprehensive coverage across layers
- Fast feedback with unit tests
- Realistic testing with integration tests
- API compatibility with contract tests
- Performance validation with load tests
## Consequences
### Positive
- High confidence in code quality
- Fast unit tests for quick feedback
- Realistic integration tests
- API compatibility guaranteed
### Negative
- Integration tests are slower
- Requires Docker for testcontainers
- More complex CI setup
### Implementation Notes
- Use `testify` for assertions: `require` and `assert`
- Generate mocks with `mockery` or `mockgen`
- Create test helpers in `internal/testutil/`
- Use test tags: `go test -tags=integration ./...`
- Run integration tests in separate CI job
- Document testing approach in `CONTRIBUTING.md`
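
The unit-test layer's table-driven style can be sketched with the stdlib (`slugify` is a stand-in function under test; in practice testify's `require`/`assert` calls replace the manual `t.Errorf`):

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// slugify is a trivial function used only to show the test shape.
func slugify(s string) string {
	return strings.ReplaceAll(strings.ToLower(strings.TrimSpace(s)), " ", "-")
}

// TestSlugify shows the table-driven pattern: one case table, one
// t.Run subtest per case, so failures name the exact scenario.
func TestSlugify(t *testing.T) {
	cases := []struct {
		name, in, want string
	}{
		{"lowercases", "Hello", "hello"},
		{"spaces to dashes", "a b", "a-b"},
		{"trims whitespace", "  x ", "x"},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := slugify(tc.in); got != tc.want {
				t.Errorf("slugify(%q) = %q, want %q", tc.in, got, tc.want)
			}
		})
	}
}

func main() { fmt.Println(slugify("Hello World")) }
```

Integration tests follow the same shape but live behind `//go:build integration` and start their dependencies via testcontainers.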

# Architecture Decision Records (ADRs)
This directory contains Architecture Decision Records (ADRs) for the Go Platform project.
## What are ADRs?
ADRs document important architectural decisions made during the project. They help:
- Track why decisions were made
- Understand the context and constraints
- Review decisions when requirements change
- Onboard new team members
## ADR Format
Each ADR follows this structure:
- **Status**: Proposed | Accepted | Rejected | Superseded
- **Context**: The situation that led to the decision
- **Decision**: What was decided
- **Consequences**: Positive and negative impacts
## ADR Index
### Phase 0: Project Setup & Foundation
- [ADR-0001: Go Module Path](./0001-go-module-path.md) - Module path: `git.dcentral.systems/toolz/goplt`
- [ADR-0002: Go Version](./0002-go-version.md) - Go 1.24.3
- [ADR-0003: Dependency Injection Framework](./0003-dependency-injection-framework.md) - uber-go/fx
- [ADR-0004: Configuration Management](./0004-configuration-management.md) - spf13/viper + cobra
- [ADR-0005: Logging Framework](./0005-logging-framework.md) - go.uber.org/zap
- [ADR-0006: HTTP Framework](./0006-http-framework.md) - gin-gonic/gin
- [ADR-0007: Project Directory Structure](./0007-project-directory-structure.md) - Standard Go layout with internal/pkg separation
- [ADR-0008: Error Handling Strategy](./0008-error-handling-strategy.md) - Wrapped errors with typed errors
- [ADR-0009: Context Key Types](./0009-context-key-types.md) - Typed context keys
- [ADR-0010: CI/CD Platform](./0010-ci-cd-platform.md) - GitHub Actions
- [ADR-0011: Code Generation Tools](./0011-code-generation-tools.md) - go generate workflow
- [ADR-0012: Logger Interface Design](./0012-logger-interface-design.md) - Logger interface abstraction
### Phase 1: Core Kernel & Infrastructure
- [ADR-0013: Database ORM Selection](./0013-database-orm.md) - entgo.io/ent
- [ADR-0014: Health Check Implementation](./0014-health-check-implementation.md) - Custom health check registry
- [ADR-0015: Error Bus Implementation](./0015-error-bus-implementation.md) - Channel-based error bus with pluggable sinks
- [ADR-0016: OpenTelemetry Observability Strategy](./0016-opentelemetry-observability.md) - OpenTelemetry for tracing, metrics, logs
### Phase 2: Authentication & Authorization
- [ADR-0017: JWT Token Strategy](./0017-jwt-token-strategy.md) - Short-lived access tokens + long-lived refresh tokens
- [ADR-0018: Password Hashing Algorithm](./0018-password-hashing.md) - argon2id
- [ADR-0019: Permission DSL Format](./0019-permission-dsl-format.md) - String-based format with code generation
- [ADR-0020: Audit Logging Storage](./0020-audit-logging-storage.md) - PostgreSQL append-only table with JSONB metadata
### Phase 3: Module Framework
- [ADR-0021: Module Loading Strategy](./0021-module-loading-strategy.md) - Static registration (primary) + dynamic plugin loading (optional)
### Phase 5: Infrastructure Adapters
- [ADR-0022: Cache Implementation](./0022-cache-implementation.md) - Redis with in-memory fallback
- [ADR-0023: Event Bus Implementation](./0023-event-bus-implementation.md) - In-process bus (default) + Kafka (production)
- [ADR-0024: Background Job Scheduler](./0024-job-scheduler.md) - asynq (Redis-backed) + cron
- [ADR-0025: Multi-tenancy Model](./0025-multitenancy-model.md) - Shared database with tenant_id column (optional)
### Phase 6: Observability & Production Readiness
- [ADR-0026: Error Reporting Service](./0026-error-reporting-service.md) - Sentry (optional, configurable)
- [ADR-0027: Rate Limiting Strategy](./0027-rate-limiting-strategy.md) - Multi-level (per-user + per-IP) with Redis
### Phase 7: Testing, Documentation & CI/CD
- [ADR-0028: Testing Strategy](./0028-testing-strategy.md) - Multi-layered (unit, integration, contract, load)
## Adding New ADRs
When making a new architectural decision:
1. Create a new file: `XXXX-short-title.md` (next sequential number)
2. Follow the ADR template
3. Update this README with the new entry
4. Set status to "Proposed" initially
5. Update to "Accepted" after review/approval
## References
- [ADR Template](https://adr.github.io/madr/)
- [Documenting Architecture Decisions](https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions)