docs: update

2025-09-24 21:12:22 +02:00
parent 921e2c7152
commit 921eec3848
6 changed files with 248 additions and 82 deletions
--- a/docs/Architecture.md
+++ b/docs/Architecture.md
@@ -25,9 +25,9 @@ The system architecture consists of several key components working together:
 - **Service Registry**: Track available services across the cluster

 ### Task Scheduler
- **Cooperative Multitasking**: Background task management system
- **Task Lifecycle Management**: Automatic task execution and monitoring
- **Resource Optimization**: Efficient task scheduling and execution
+- **Cooperative Multitasking**: Background task management system (`TaskManager`)
+- **Task Lifecycle Management**: Enable/disable tasks and set intervals at runtime
+- **Execution Model**: Tasks run in `Spore::loop()` when their interval elapses

 ### Node Context
 - **Central Context**: Shared resources and configuration
@@ -40,27 +40,30 @@ The cluster uses a UDP-based discovery protocol for automatic node detection:

 ### Discovery Process

-1. **Discovery Broadcast**: Nodes periodically send UDP packets on port 4210
-2. **Response Handling**: Nodes respond with their hostname and IP address
-3. **Member Management**: Discovered nodes are automatically added to the cluster
-4. **Health Monitoring**: Continuous status checking via HTTP API calls
+1. **Discovery Broadcast**: Nodes periodically send UDP packets on port `udp_port` (default 4210)
+2. **Response Handling**: Nodes respond with `CLUSTER_RESPONSE:<hostname>`
+3. **Member Management**: Discovered nodes are added/updated in the cluster
+4. **Node Info via UDP**: Heartbeat triggers peers to send `CLUSTER_NODE_INFO:<hostname>:<json>`

 ### Protocol Details

- **UDP Port**: 4210 (configurable)
+- **UDP Port**: 4210 (configurable via `Config.udp_port`)
 - **Discovery Message**: `CLUSTER_DISCOVERY`
 - **Response Message**: `CLUSTER_RESPONSE`
+- **Heartbeat Message**: `CLUSTER_HEARTBEAT`
+- **Node Info Message**: `CLUSTER_NODE_INFO:<hostname>:<json>`
 - **Broadcast Address**: 255.255.255.255
- **Discovery Interval**: 1 second (configurable)
- **Listen Interval**: 100ms (configurable)
+- **Discovery Interval**: `Config.discovery_interval_ms` (default 1000 ms)
+- **Listen Interval**: `Config.discovery_interval_ms / 10` (default 100 ms)
+- **Heartbeat Interval**: `Config.heartbeat_interval_ms` (default 5000 ms)
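
The `CLUSTER_NODE_INFO:<hostname>:<json>` format listed above could be split into its two fields roughly as follows. This is a minimal sketch, not the project's actual parser: `NodeInfoMessage` and `parseNodeInfo` are hypothetical names, the code assumes C++17 and that hostnames never contain a colon, and the JSON payload is left unparsed.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical container for a decoded node-info packet (not from the source).
struct NodeInfoMessage {
    std::string hostname;
    std::string json;  // raw JSON payload, left unparsed here
};

// Splits "CLUSTER_NODE_INFO:<hostname>:<json>" at the first ':' after the prefix.
// Assumption: the hostname itself contains no ':'; the JSON part may contain many.
std::optional<NodeInfoMessage> parseNodeInfo(std::string_view packet) {
    constexpr std::string_view prefix = "CLUSTER_NODE_INFO:";
    if (packet.substr(0, prefix.size()) != prefix) {
        return std::nullopt;  // some other message type
    }
    packet.remove_prefix(prefix.size());

    const auto sep = packet.find(':');
    if (sep == std::string_view::npos || sep == 0) {
        return std::nullopt;  // missing separator or empty hostname
    }
    return NodeInfoMessage{std::string(packet.substr(0, sep)),
                           std::string(packet.substr(sep + 1))};
}
```

Splitting on the first colon only keeps the colons inside the JSON body intact.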

 ### Node Status Categories

 Nodes are automatically categorized by their activity:

- **ACTIVE**: Responding within 10 seconds
- **INACTIVE**: No response for 10-60 seconds  
- **DEAD**: No response for over 60 seconds
+- **ACTIVE**: Last seen less than `node_inactive_threshold_ms` ago (default 10 s)
+- **INACTIVE**: Last seen at least `node_inactive_threshold_ms` but less than `node_dead_threshold_ms` ago (default 120 s)
+- **DEAD**: Last seen `node_dead_threshold_ms` or more ago
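
The threshold rules above amount to a two-step comparison on the time since a node was last seen. The sketch below illustrates that logic under stated assumptions: `NodeStatus`, `StatusThresholds`, and `classifyNode` are hypothetical names, timestamps are assumed to be millisecond counters that may wrap (as `millis()` does on Arduino-style targets), and the defaults mirror the values listed above.

```cpp
#include <cstdint>

enum class NodeStatus { ACTIVE, INACTIVE, DEAD };

// Hypothetical holder for the two thresholds; defaults follow the list above.
struct StatusThresholds {
    std::uint32_t node_inactive_threshold_ms = 10000;
    std::uint32_t node_dead_threshold_ms = 120000;
};

// Classifies a node by the time elapsed since it was last seen.
// Unsigned subtraction keeps the result correct across counter wraparound.
NodeStatus classifyNode(std::uint32_t now_ms, std::uint32_t last_seen_ms,
                        const StatusThresholds& t = {}) {
    const std::uint32_t elapsed = now_ms - last_seen_ms;
    if (elapsed < t.node_inactive_threshold_ms) return NodeStatus::ACTIVE;
    if (elapsed < t.node_dead_threshold_ms) return NodeStatus::INACTIVE;
    return NodeStatus::DEAD;
}
```

With the defaults, a node seen exactly 10 000 ms ago is already `INACTIVE`, and one seen 120 000 ms ago is `DEAD`.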

 ## Task Scheduling System

@@ -68,14 +71,14 @@ The system runs several background tasks at different intervals:

 ### Core System Tasks

-| Task | Interval | Purpose |
-|------|----------|---------|
-| **Discovery Send** | 1 second | Send UDP discovery packets |
-| **Discovery Listen** | 100ms | Listen for discovery responses |
-| **Status Updates** | 1 second | Monitor cluster member health |
-| **Heartbeat** | 2 seconds | Maintain cluster connectivity |
-| **Member Info** | 10 seconds | Update detailed node information |
-| **Debug Output** | 5 seconds | Print cluster status |
+| Task | Interval (default) | Purpose |
+|------|--------------------|---------|
+| `discovery_send` | 1000 ms | Send UDP discovery packets |
+| `discovery_listen` | 100 ms | Listen for discovery/heartbeat/node-info |
+| `status_update` | 1000 ms | Update node status categories, purge dead |
+| `heartbeat` | 5000 ms | Broadcast heartbeat and update local resources |
+| `update_members_info` | 10000 ms | Reserved; currently a no-op (node info arrives via UDP) |
+| `print_members` | 5000 ms | Log current member list |
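
To illustrate how interval-driven tasks like those in the table might be dispatched from a main loop, here is a minimal sketch of a cooperative scheduler. It is not the project's `TaskManager`: `MiniTaskManager`, `Task`, `add`, `setEnabled`, and `tick` are hypothetical names, and the current time is passed in explicitly so the logic can run without hardware.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task record: a named callback with a run interval.
struct Task {
    std::string name;
    std::uint32_t interval_ms;
    std::function<void()> callback;
    bool enabled = true;
    std::uint32_t last_run_ms = 0;
};

class MiniTaskManager {
public:
    void add(std::string name, std::uint32_t interval_ms, std::function<void()> cb) {
        tasks_.push_back(Task{std::move(name), interval_ms, std::move(cb)});
    }

    // Enables or disables a task at runtime; returns false if the name is unknown.
    bool setEnabled(const std::string& name, bool enabled) {
        for (auto& t : tasks_) {
            if (t.name == name) {
                t.enabled = enabled;
                return true;
            }
        }
        return false;
    }

    // Call once per main-loop iteration. Runs every enabled task whose
    // interval has elapsed; nothing is preempted, each callback runs to completion.
    void tick(std::uint32_t now_ms) {
        for (auto& t : tasks_) {
            if (!t.enabled) continue;
            if (now_ms - t.last_run_ms >= t.interval_ms) {
                t.last_run_ms = now_ms;
                t.callback();
            }
        }
    }

private:
    std::vector<Task> tasks_;
};
```

On a device, `tick` would presumably be called from the main loop with the current millisecond counter. Because callbacks run to completion, a slow task delays every task after it.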

 ### Task Management Features

@@ -112,10 +115,7 @@ ctx.fire("cluster_updated", &clusterData);

 ### Available Events

- **`node_discovered`**: New node added to cluster
- **`cluster_updated`**: Cluster membership changed
- **`resource_update`**: Node resources updated
- **`health_check`**: Node health status changed
+- **`node_discovered`**: New node added or local node refreshed

 ## Resource Monitoring

@@ -155,10 +155,8 @@ The system includes automatic WiFi fallback for robust operation:

 ### Configuration

- **SSID Format**: `SPORE_<MAC_LAST_4>`
- **Password**: Configurable fallback password
- **IP Range**: 192.168.4.x subnet
- **Gateway**: 192.168.4.1
+- **Hostname**: Derived from MAC (`esp-<mac>`) and assigned to `ctx.hostname`
+- **AP Mode**: If the STA connection fails, the device switches to AP mode with the configured SSID/password

 ## Cluster Topology

@@ -170,32 +168,30 @@ The system includes automatic WiFi fallback for robust operation:

 ### Network Architecture

- **Mesh-like Structure**: Nodes can communicate with each other
- **Dynamic Routing**: Automatic path discovery between nodes
- **Load Distribution**: Tasks distributed across available nodes
- **Fault Tolerance**: Automatic failover and recovery
+- UDP broadcast-based discovery and heartbeats on local subnet
+- Optional HTTP polling (disabled by default; node info exchanged via UDP)

 ## Data Flow

 ### Node Discovery
 1. **UDP Broadcast**: Nodes broadcast discovery packets on port 4210
-2. **UDP Response**: Receiving nodes responds with hostname
+2. **UDP Response**: Receiving nodes respond with hostname
 3. **Registration**: Discovered nodes are added to local cluster member list

 ### Health Monitoring
-1. **Periodic Checks**: Cluster manager polls member nodes every 1 second
-2. **Status Collection**: Each node returns resource usage and health metrics
+1. **Periodic Checks**: Cluster manager updates node status categories
+2. **Status Collection**: Each node updates resources via UDP node-info messages

 ### Task Management
-1. **Scheduling**: TaskScheduler executes registered tasks at configured intervals
-2. **Execution**: Tasks run cooperatively, yielding control to other tasks
-3. **Monitoring**: Task status and results are exposed via REST API endpoints
+1. **Scheduling**: `TaskManager` executes registered tasks at configured intervals
+2. **Execution**: Tasks run cooperatively in the main loop without preemption
+3. **Monitoring**: Task status is exposed via REST (`/api/tasks/status`)

 ## Performance Characteristics

 ### Memory Usage

- **Base System**: ~15-20KB RAM
+- **Base System**: ~15-20KB RAM (device-dependent)
 - **Per Task**: ~100-200 bytes per task
 - **Cluster Members**: ~50-100 bytes per member
 - **API Endpoints**: ~20-30 bytes per endpoint
@@ -219,7 +215,7 @@ The system includes automatic WiFi fallback for robust operation:
 ### Current Implementation

 - **Network Access**: Local network only (no internet exposure)
- **Authentication**: None currently implemented
+- **Authentication**: None currently implemented; LAN-only access assumed
 - **Data Validation**: Basic input validation
 - **Resource Limits**: Memory and processing constraints