docs: update

This commit is contained in:
2025-09-24 21:12:22 +02:00
parent 921e2c7152
commit 921eec3848
6 changed files with 248 additions and 82 deletions

View File

@@ -25,9 +25,9 @@ The system architecture consists of several key components working together:
- **Service Registry**: Track available services across the cluster
### Task Scheduler
- **Cooperative Multitasking**: Background task management system
- **Task Lifecycle Management**: Automatic task execution and monitoring
- **Resource Optimization**: Efficient task scheduling and execution
- **Cooperative Multitasking**: Background task management system (`TaskManager`)
- **Task Lifecycle Management**: Enable/disable tasks and set intervals at runtime
- **Execution Model**: Tasks run in `Spore::loop()` when their interval elapses
### Node Context
- **Central Context**: Shared resources and configuration
@@ -40,27 +40,30 @@ The cluster uses a UDP-based discovery protocol for automatic node detection:
### Discovery Process
1. **Discovery Broadcast**: Nodes periodically send UDP packets on port 4210
2. **Response Handling**: Nodes respond with their hostname and IP address
3. **Member Management**: Discovered nodes are automatically added to the cluster
4. **Health Monitoring**: Continuous status checking via HTTP API calls
1. **Discovery Broadcast**: Nodes periodically send UDP packets on port `udp_port` (default 4210)
2. **Response Handling**: Nodes respond with `CLUSTER_RESPONSE:<hostname>`
3. **Member Management**: Discovered nodes are added/updated in the cluster
4. **Node Info via UDP**: Heartbeat triggers peers to send `CLUSTER_NODE_INFO:<hostname>:<json>`
### Protocol Details
- **UDP Port**: 4210 (configurable)
- **UDP Port**: 4210 (configurable via `Config.udp_port`)
- **Discovery Message**: `CLUSTER_DISCOVERY`
- **Response Message**: `CLUSTER_RESPONSE`
- **Heartbeat Message**: `CLUSTER_HEARTBEAT`
- **Node Info Message**: `CLUSTER_NODE_INFO:<hostname>:<json>`
- **Broadcast Address**: 255.255.255.255
- **Discovery Interval**: 1 second (configurable)
- **Listen Interval**: 100ms (configurable)
- **Discovery Interval**: `Config.discovery_interval_ms` (default 1000 ms)
- **Listen Interval**: `Config.discovery_interval_ms / 10` (default 100 ms)
- **Heartbeat Interval**: `Config.heartbeat_interval_ms` (default 5000 ms)
### Node Status Categories
Nodes are automatically categorized by their activity:
- **ACTIVE**: Responding within 10 seconds
- **INACTIVE**: No response for 10-60 seconds
- **DEAD**: No response for over 60 seconds
- **ACTIVE**: lastSeen < `node_inactive_threshold_ms` (default 10s)
- **INACTIVE**: < `node_dead_threshold_ms` (default 120s)
- **DEAD**: `node_dead_threshold_ms`
## Task Scheduling System
@@ -68,14 +71,14 @@ The system runs several background tasks at different intervals:
### Core System Tasks
| Task | Interval | Purpose |
|------|----------|---------|
| **Discovery Send** | 1 second | Send UDP discovery packets |
| **Discovery Listen** | 100ms | Listen for discovery responses |
| **Status Updates** | 1 second | Monitor cluster member health |
| **Heartbeat** | 2 seconds | Maintain cluster connectivity |
| **Member Info** | 10 seconds | Update detailed node information |
| **Debug Output** | 5 seconds | Print cluster status |
| Task | Interval (default) | Purpose |
|------|--------------------|---------|
| `discovery_send` | 1000 ms | Send UDP discovery packets |
| `discovery_listen` | 100 ms | Listen for discovery/heartbeat/node-info |
| `status_update` | 1000 ms | Update node status categories, purge dead |
| `heartbeat` | 5000 ms | Broadcast heartbeat and update local resources |
| `update_members_info` | 10000 ms | Reserved; no-op (info via UDP) |
| `print_members` | 5000 ms | Log current member list |
### Task Management Features
@@ -112,10 +115,7 @@ ctx.fire("cluster_updated", &clusterData);
### Available Events
- **`node_discovered`**: New node added to cluster
- **`cluster_updated`**: Cluster membership changed
- **`resource_update`**: Node resources updated
- **`health_check`**: Node health status changed
- **`node_discovered`**: New node added or local node refreshed
## Resource Monitoring
@@ -155,10 +155,8 @@ The system includes automatic WiFi fallback for robust operation:
### Configuration
- **SSID Format**: `SPORE_<MAC_LAST_4>`
- **Password**: Configurable fallback password
- **IP Range**: 192.168.4.x subnet
- **Gateway**: 192.168.4.1
- **Hostname**: Derived from MAC (`esp-<mac>`) and assigned to `ctx.hostname`
- **AP Mode**: If STA connection fails, device switches to AP mode with configured SSID/password
## Cluster Topology
@@ -170,32 +168,30 @@ The system includes automatic WiFi fallback for robust operation:
### Network Architecture
- **Mesh-like Structure**: Nodes can communicate with each other
- **Dynamic Routing**: Automatic path discovery between nodes
- **Load Distribution**: Tasks distributed across available nodes
- **Fault Tolerance**: Automatic failover and recovery
- UDP broadcast-based discovery and heartbeats on local subnet
- Optional HTTP polling (disabled by default; node info exchanged via UDP)
## Data Flow
### Node Discovery
1. **UDP Broadcast**: Nodes broadcast discovery packets on port 4210
2. **UDP Response**: Receiving nodes responds with hostname
2. **UDP Response**: Receiving nodes respond with hostname
3. **Registration**: Discovered nodes are added to local cluster member list
### Health Monitoring
1. **Periodic Checks**: Cluster manager polls member nodes every 1 second
2. **Status Collection**: Each node returns resource usage and health metrics
1. **Periodic Checks**: Cluster manager updates node status categories
2. **Status Collection**: Each node updates resources via UDP node-info messages
### Task Management
1. **Scheduling**: TaskScheduler executes registered tasks at configured intervals
2. **Execution**: Tasks run cooperatively, yielding control to other tasks
3. **Monitoring**: Task status and results are exposed via REST API endpoints
1. **Scheduling**: `TaskManager` executes registered tasks at configured intervals
2. **Execution**: Tasks run cooperatively in the main loop without preemption
3. **Monitoring**: Task status is exposed via REST (`/api/tasks/status`)
## Performance Characteristics
### Memory Usage
- **Base System**: ~15-20KB RAM
- **Base System**: ~15-20KB RAM (device dependent)
- **Per Task**: ~100-200 bytes per task
- **Cluster Members**: ~50-100 bytes per member
- **API Endpoints**: ~20-30 bytes per endpoint
@@ -219,7 +215,7 @@ The system includes automatic WiFi fallback for robust operation:
### Current Implementation
- **Network Access**: Local network only (no internet exposure)
- **Authentication**: None currently implemented
- **Authentication**: None currently implemented; LAN-only access assumed
- **Data Validation**: Basic input validation
- **Resource Limits**: Memory and processing constraints