docs: update

2025-09-24 21:12:22 +02:00
parent 921e2c7152
commit 921eec3848
6 changed files with 248 additions and 82 deletions
--- a/docs/API.md
+++ b/docs/API.md
@@ -15,12 +15,18 @@ The SPORE system provides a comprehensive RESTful API for monitoring and control
 | Endpoint | Method | Description | Response |
 |----------|--------|-------------|----------|
-| `/api/node/status` | GET | System resource information and API endpoint registry | System metrics and API catalog |
+| `/api/node/status` | GET | System resource information | System metrics |
 | `/api/node/endpoints` | GET | API endpoints and parameters | Detailed endpoint specifications |
 | `/api/cluster/members` | GET | Cluster membership and node health information | Cluster topology and health status |
 | `/api/node/update` | POST | Handle firmware updates via OTA | Update progress and status |
 | `/api/node/restart` | POST | Trigger system restart | Restart confirmation |
 ### Monitoring API
 | Endpoint | Method | Description | Response |
 |----------|--------|-------------|----------|
 | `/api/monitoring/resources` | GET | CPU, memory, filesystem, and uptime | System resource metrics |
 ### Network Management API
 | Endpoint | Method | Description | Response |
@@ -140,7 +146,7 @@ Controls the execution state of individual tasks. Supports enabling, disabling,
 #### GET /api/node/status
-Returns comprehensive system resource information including memory usage, chip details, and a registry of all available API endpoints.
+Returns comprehensive system resource information including memory usage and chip details. For a list of available API endpoints, use `/api/node/endpoints`.
 **Response Fields:**
 - `freeHeap`: Available RAM in bytes
@@ -168,7 +174,7 @@ Returns comprehensive system resource information including memory usage, chip d
 #### GET /api/node/endpoints
-Returns detailed information about all available API endpoints, including their parameters, types, and validation rules.
+Returns detailed information about all available API endpoints, including their parameters, types, and validation rules. Methods are returned as strings (e.g., "GET", "POST").
 **Response Fields:**
 - `endpoints[]`: Array of endpoint capability objects
@@ -236,6 +242,54 @@ Initiates an over-the-air firmware update. The firmware file should be uploaded
 Triggers a system restart. The response will be sent before the restart occurs.
 ### Monitoring
 #### GET /api/monitoring/resources
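 Returns real-time system resource metrics. The full field list, including the extended memory and uptime fields, is documented in [MonitoringService.md](./MonitoringService.md).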
 **Response Fields:**
 - `cpu.current_usage`: Current CPU usage percent
 - `cpu.average_usage`: Average CPU usage percent
 - `cpu.max_usage`: Max observed CPU usage
 - `cpu.min_usage`: Min observed CPU usage
 - `cpu.measurement_count`: Number of measurements
 - `cpu.is_measuring`: Whether measurement is active
 - `memory.free_heap`: Free heap bytes
 - `memory.total_heap`: Total heap bytes (approximate)
 - `memory.heap_fragmentation`: Fragmentation percent (0 on ESP8266)
 - `filesystem.total_bytes`: LittleFS total bytes
 - `filesystem.used_bytes`: Used bytes
 - `filesystem.free_bytes`: Free bytes
 - `filesystem.usage_percent`: Usage percent
 - `system.uptime_ms`: Uptime in milliseconds
 **Example Response:**
 ```json
 {
  "cpu": {
    "current_usage": 3.5,
    "average_usage": 2.1,
    "max_usage": 15.2,
    "min_usage": 0.0,
    "measurement_count": 120,
    "is_measuring": true
  },
  "memory": {
    "free_heap": 48748,
    "total_heap": 81920,
    "heap_fragmentation": 0
  },
  "filesystem": {
    "total_bytes": 65536,
    "used_bytes": 10240,
    "free_bytes": 55296,
    "usage_percent": 15.6
  },
  "system": {
    "uptime_ms": 123456
  }
 }
 ```
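The filesystem fields in the example are related: `free_bytes` is `total_bytes - used_bytes`, and `usage_percent` is the used share of the total (10240 / 65536 = 15.625, shown as 15.6). A minimal C++ sketch of that arithmetic, assuming one-decimal rounding inferred from the example; `FilesystemMetrics` and `deriveFilesystemMetrics` are hypothetical names, not the firmware's code.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical container mirroring the "filesystem" object in the response.
struct FilesystemMetrics {
    uint32_t total_bytes;
    uint32_t used_bytes;
    uint32_t free_bytes;
    double usage_percent;
};

// Derives free_bytes and usage_percent from the raw totals.
// Assumption: usage_percent is rounded to one decimal place.
FilesystemMetrics deriveFilesystemMetrics(uint32_t total, uint32_t used) {
    FilesystemMetrics m{total, used, 0, 0.0};
    m.free_bytes = (used <= total) ? total - used : 0;
    if (total > 0) {  // avoid dividing by zero when no filesystem is mounted
        double pct = 100.0 * static_cast<double>(used) / static_cast<double>(total);
        m.usage_percent = std::round(pct * 10.0) / 10.0;
    }
    return m;
}
```

A `total_bytes` of zero yields a usage of 0 rather than a division error, which matches the troubleshooting hint about an unmounted LittleFS image.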
 ### Network Management
 #### GET /api/network/status
--- a/docs/Architecture.md
+++ b/docs/Architecture.md
@@ -25,9 +25,9 @@ The system architecture consists of several key components working together:
 - **Service Registry**: Track available services across the cluster
 ### Task Scheduler
- **Cooperative Multitasking**: Background task management system
+- **Cooperative Multitasking**: Background task management system (`TaskManager`)
- **Task Lifecycle Management**: Automatic task execution and monitoring
+- **Task Lifecycle Management**: Enable/disable tasks and set intervals at runtime
- **Resource Optimization**: Efficient task scheduling and execution
+- **Execution Model**: Tasks run in `Spore::loop()` when their interval elapses
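The execution model can be pictured as a cooperative loop that, on every pass, runs each enabled task whose interval has elapsed. The sketch below is a simplified, hypothetical stand-in for `TaskManager` (`SimpleTaskManager`, `addTask`, `setEnabled`, `setInterval`, and `tick` are assumed names, not the framework's API); it takes the current time as a parameter so the logic can be exercised without hardware.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cooperative scheduler illustrating the model described above.
class SimpleTaskManager {
public:
    void addTask(const std::string& name, uint32_t intervalMs, std::function<void()> fn) {
        tasks_.push_back(Task{name, intervalMs, 0, true, std::move(fn)});
    }

    // Runtime control: enable/disable a task or change its interval.
    bool setEnabled(const std::string& name, bool enabled) {
        Task* t = find(name);
        if (t) t->enabled = enabled;
        return t != nullptr;
    }
    bool setInterval(const std::string& name, uint32_t intervalMs) {
        Task* t = find(name);
        if (t) t->intervalMs = intervalMs;
        return t != nullptr;
    }

    // Called once per main-loop pass (e.g. from Spore::loop()).
    // Each due task runs to completion; nothing is preempted.
    void tick(uint32_t nowMs) {
        for (Task& t : tasks_) {
            // Unsigned subtraction keeps working across millis() wraparound.
            if (t.enabled && nowMs - t.lastRunMs >= t.intervalMs) {
                t.lastRunMs = nowMs;
                t.fn();
            }
        }
    }

private:
    struct Task {
        std::string name;
        uint32_t intervalMs;
        uint32_t lastRunMs;
        bool enabled;
        std::function<void()> fn;
    };
    Task* find(const std::string& name) {
        for (Task& t : tasks_)
            if (t.name == name) return &t;
        return nullptr;
    }
    std::vector<Task> tasks_;
};
```

Because tasks run to completion inside `tick`, a slow task delays every other task in the same pass; this is the usual trade-off of cooperative scheduling.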
 ### Node Context
 - **Central Context**: Shared resources and configuration
@@ -40,27 +40,30 @@ The cluster uses a UDP-based discovery protocol for automatic node detection:
 ### Discovery Process
-1. **Discovery Broadcast**: Nodes periodically send UDP packets on port 4210
+1. **Discovery Broadcast**: Nodes periodically send UDP packets on port `Config.udp_port` (default 4210)
-2. **Response Handling**: Nodes respond with their hostname and IP address
+2. **Response Handling**: Nodes respond with `CLUSTER_RESPONSE:<hostname>`
-3. **Member Management**: Discovered nodes are automatically added to the cluster
+3. **Member Management**: Discovered nodes are added/updated in the cluster
-4. **Health Monitoring**: Continuous status checking via HTTP API calls
+4. **Node Info via UDP**: Heartbeat triggers peers to send `CLUSTER_NODE_INFO:<hostname>:<json>`
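Member management in steps 2-4 amounts to an upsert keyed by hostname: each response either creates a member entry or refreshes an existing one and stamps its last-seen time. A minimal sketch, assuming a map keyed by hostname; `MemberInfo`, `MemberTable`, and `upsertMember` are hypothetical names, not the `ClusterManager` API.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical member record; the real ClusterManager types may differ.
struct MemberInfo {
    std::string hostname;
    std::string ip;
    uint32_t lastSeenMs = 0;
};

using MemberTable = std::map<std::string, MemberInfo>;

// Adds the node if unknown, otherwise refreshes it.
// Returns true when a new member was added (a natural point to fire an event).
bool upsertMember(MemberTable& members, const std::string& hostname,
                  const std::string& ip, uint32_t nowMs) {
    auto it = members.find(hostname);
    if (it == members.end()) {
        MemberInfo info;
        info.hostname = hostname;
        info.ip = ip;
        info.lastSeenMs = nowMs;
        members.emplace(hostname, info);
        return true;
    }
    it->second.ip = ip;  // the IP may change, e.g. after a DHCP renewal
    it->second.lastSeenMs = nowMs;
    return false;
}
```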
 ### Protocol Details
- **UDP Port**: 4210 (configurable)
+- **UDP Port**: 4210 (configurable via `Config.udp_port`)
 - **Discovery Message**: `CLUSTER_DISCOVERY`
 - **Response Message**: `CLUSTER_RESPONSE`
 - **Heartbeat Message**: `CLUSTER_HEARTBEAT`
 - **Node Info Message**: `CLUSTER_NODE_INFO:<hostname>:<json>`
 - **Broadcast Address**: 255.255.255.255
- **Discovery Interval**: 1 second (configurable)
+- **Discovery Interval**: `Config.discovery_interval_ms` (default 1000 ms)
- **Listen Interval**: 100ms (configurable)
+- **Listen Interval**: `Config.discovery_interval_ms / 10` (default 100 ms)
 - **Heartbeat Interval**: `Config.heartbeat_interval_ms` (default 5000 ms)
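The messages above are plain-text prefixes, so a receiver can classify each packet by prefix. For node info, the payload must be split at the first `:` after the hostname, because the JSON itself can contain colons. A hypothetical parser sketch (`MsgType`, `ClusterMessage`, and `parseClusterMessage` are assumed names); matching the discovery and heartbeat messages by prefix rather than exact equality is also an assumption, in case a payload follows them.

```cpp
#include <string>

// Hypothetical message kinds matching the protocol strings above.
enum class MsgType { Discovery, Response, Heartbeat, NodeInfo, Unknown };

struct ClusterMessage {
    MsgType type = MsgType::Unknown;
    std::string hostname;  // set for Response and NodeInfo
    std::string json;      // set for NodeInfo
};

static bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

ClusterMessage parseClusterMessage(const std::string& packet) {
    static const std::string kDiscovery = "CLUSTER_DISCOVERY";
    static const std::string kHeartbeat = "CLUSTER_HEARTBEAT";
    static const std::string kResponse = "CLUSTER_RESPONSE:";
    static const std::string kNodeInfo = "CLUSTER_NODE_INFO:";

    ClusterMessage msg;
    if (startsWith(packet, kResponse)) {
        msg.type = MsgType::Response;
        msg.hostname = packet.substr(kResponse.size());
    } else if (startsWith(packet, kNodeInfo)) {
        // <hostname>:<json>: split at the FIRST ':' so colons inside the JSON survive.
        const std::string rest = packet.substr(kNodeInfo.size());
        const std::string::size_type sep = rest.find(':');
        if (sep != std::string::npos) {
            msg.type = MsgType::NodeInfo;
            msg.hostname = rest.substr(0, sep);
            msg.json = rest.substr(sep + 1);
        }
    } else if (startsWith(packet, kHeartbeat)) {
        msg.type = MsgType::Heartbeat;
    } else if (startsWith(packet, kDiscovery)) {
        msg.type = MsgType::Discovery;
    }
    return msg;
}
```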
 ### Node Status Categories
 Nodes are automatically categorized by their activity:
- **ACTIVE**: Responding within 10 seconds
+- **ACTIVE**: Time since last seen < `node_inactive_threshold_ms` (default 10 s)
- **INACTIVE**: No response for 10-60 seconds  
+- **INACTIVE**: Time since last seen ≥ `node_inactive_threshold_ms` and < `node_dead_threshold_ms` (default 120 s)
- **DEAD**: No response for over 60 seconds
+- **DEAD**: Time since last seen ≥ `node_dead_threshold_ms`
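The categories reduce to comparing the time since a node was last seen against the two thresholds. A minimal sketch with the documented defaults; `NodeStatus`, `StatusThresholds`, and `classifyNode` are hypothetical names, and the use of wrapping 32-bit millisecond arithmetic (as with Arduino `millis()`) is an assumption.

```cpp
#include <cstdint>

enum class NodeStatus { Active, Inactive, Dead };

// Hypothetical mirror of the two thresholds, using the documented defaults.
struct StatusThresholds {
    uint32_t node_inactive_threshold_ms = 10000;
    uint32_t node_dead_threshold_ms = 120000;
};

// Unsigned subtraction stays correct when a 32-bit millisecond counter wraps.
NodeStatus classifyNode(uint32_t nowMs, uint32_t lastSeenMs, const StatusThresholds& cfg) {
    const uint32_t age = nowMs - lastSeenMs;
    if (age < cfg.node_inactive_threshold_ms) return NodeStatus::Active;
    if (age < cfg.node_dead_threshold_ms) return NodeStatus::Inactive;
    return NodeStatus::Dead;  // candidates for purging by the status_update task
}
```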
 ## Task Scheduling System
@@ -68,14 +71,14 @@ The system runs several background tasks at different intervals:
 ### Core System Tasks
-| Task | Interval | Purpose |
+| Task | Interval (default) | Purpose |
-|------|----------|---------|
+|------|--------------------|---------|
-| **Discovery Send** | 1 second | Send UDP discovery packets |
+| `discovery_send` | 1000 ms | Send UDP discovery packets |
-| **Discovery Listen** | 100ms | Listen for discovery responses |
+| `discovery_listen` | 100 ms | Listen for discovery/heartbeat/node-info |
-| **Status Updates** | 1 second | Monitor cluster member health |
+| `status_update` | 1000 ms | Update node status categories, purge dead |
-| **Heartbeat** | 2 seconds | Maintain cluster connectivity |
+| `heartbeat` | 5000 ms | Broadcast heartbeat and update local resources |
-| **Member Info** | 10 seconds | Update detailed node information |
+| `update_members_info` | 10000 ms | Reserved; no-op (info via UDP) |
-| **Debug Output** | 5 seconds | Print cluster status |
+| `print_members` | 5000 ms | Log current member list |
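Several defaults in the table follow from the configuration values named in the protocol section: `discovery_send` uses `discovery_interval_ms`, `discovery_listen` uses a tenth of it, and `heartbeat` uses `heartbeat_interval_ms`. The sketch below makes that mapping explicit. `ConfigSubset`, `TaskDefault`, and `coreTaskIntervals` are hypothetical. The remaining intervals are hard-coded from the table because their configuration keys are not documented here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of Config with the documented defaults.
struct ConfigSubset {
    uint32_t discovery_interval_ms = 1000;
    uint32_t heartbeat_interval_ms = 5000;
};

struct TaskDefault {
    std::string name;
    uint32_t intervalMs;
};

std::vector<TaskDefault> coreTaskIntervals(const ConfigSubset& cfg) {
    return {
        {"discovery_send", cfg.discovery_interval_ms},
        {"discovery_listen", cfg.discovery_interval_ms / 10},
        {"status_update", 1000},         // from the table; config key not documented
        {"heartbeat", cfg.heartbeat_interval_ms},
        {"update_members_info", 10000},  // reserved, currently a no-op
        {"print_members", 5000},         // from the table; config key not documented
    };
}
```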
 ### Task Management Features
@@ -112,10 +115,7 @@ ctx.fire("cluster_updated", &clusterData);
 ### Available Events
- **`node_discovered`**: New node added to cluster
+- **`node_discovered`**: New node added or local node refreshed
 - **`cluster_updated`**: Cluster membership changed
 - **`resource_update`**: Node resources updated
 - **`health_check`**: Node health status changed
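Events are published by name with an untyped payload pointer, as in the `ctx.fire("cluster_updated", &clusterData)` call. The sketch below shows a minimal name-keyed publish/subscribe registry in that style. `EventBus`, `on`, and the `void*` callback signature are assumptions for illustration, not the actual `NodeContext` interface.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event registry modelled on ctx.fire(name, payload).
class EventBus {
public:
    using Callback = std::function<void(void*)>;

    void on(const std::string& event, Callback cb) {
        subscribers_[event].push_back(std::move(cb));
    }

    // Invokes every subscriber for the event in registration order;
    // events with no subscribers are silently ignored.
    void fire(const std::string& event, void* data) {
        auto it = subscribers_.find(event);
        if (it == subscribers_.end()) return;
        for (auto& cb : it->second) cb(data);
    }

private:
    std::map<std::string, std::vector<Callback>> subscribers_;
};
```

Since `void*` erases the payload type, each subscriber must know which type a given event carries; a typed payload per event name would catch mismatches at compile time, at the cost of a more complex registry.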
 ## Resource Monitoring
@@ -155,10 +155,8 @@ The system includes automatic WiFi fallback for robust operation:
 ### Configuration
- **SSID Format**: `SPORE_<MAC_LAST_4>`
+- **Hostname**: Derived from MAC (`esp-<mac>`) and assigned to `ctx.hostname`
- **Password**: Configurable fallback password
+- **AP Mode**: If the STA connection fails, the device switches to AP mode with the configured SSID/password
 - **IP Range**: 192.168.4.x subnet
 - **Gateway**: 192.168.4.1
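The hostname rule can be sketched as formatting the station MAC into `esp-<mac>`. The exact `<mac>` rendering is not specified here. This sketch assumes the six MAC bytes are written as lowercase hex with no separators, and `hostnameFromMac` is a hypothetical helper.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumption: <mac> is the 6-byte MAC as 12 lowercase hex digits without separators.
std::string hostnameFromMac(const std::array<uint8_t, 6>& mac) {
    char buf[4 + 12 + 1];  // "esp-" + 12 hex digits + terminating NUL
    std::snprintf(buf, sizeof(buf), "esp-%02x%02x%02x%02x%02x%02x",
                  mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return std::string(buf);
}
```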
 ## Cluster Topology
@@ -170,32 +168,30 @@ The system includes automatic WiFi fallback for robust operation:
 ### Network Architecture
- **Mesh-like Structure**: Nodes can communicate with each other
+- **Discovery**: UDP broadcast-based discovery and heartbeats on the local subnet
- **Dynamic Routing**: Automatic path discovery between nodes
+- **Node Info Exchange**: Node info is exchanged via UDP; HTTP polling is optional and disabled by default
 - **Load Distribution**: Tasks distributed across available nodes
 - **Fault Tolerance**: Automatic failover and recovery
 ## Data Flow
 ### Node Discovery
 1. **UDP Broadcast**: Nodes broadcast discovery packets on port `Config.udp_port` (default 4210)
-2. **UDP Response**: Receiving nodes responds with hostname
+2. **UDP Response**: Receiving nodes respond with hostname
 3. **Registration**: Discovered nodes are added to local cluster member list
 ### Health Monitoring
-1. **Periodic Checks**: Cluster manager polls member nodes every 1 second
+1. **Periodic Checks**: Cluster manager updates node status categories
-2. **Status Collection**: Each node returns resource usage and health metrics
+2. **Status Collection**: Member resource data is refreshed from incoming UDP node-info messages
 ### Task Management
-1. **Scheduling**: TaskScheduler executes registered tasks at configured intervals
+1. **Scheduling**: `TaskManager` executes registered tasks at configured intervals
-2. **Execution**: Tasks run cooperatively, yielding control to other tasks
+2. **Execution**: Tasks run cooperatively in the main loop without preemption
-3. **Monitoring**: Task status and results are exposed via REST API endpoints
+3. **Monitoring**: Task status is exposed via REST (`/api/tasks/status`)
 ## Performance Characteristics
 ### Memory Usage
- **Base System**: ~15-20KB RAM
+- **Base System**: ~15-20KB RAM (device dependent)
 - **Per Task**: ~100-200 bytes per task
 - **Cluster Members**: ~50-100 bytes per member
 - **API Endpoints**: ~20-30 bytes per endpoint
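The per-item figures combine into a rough RAM budget: the base system plus per-task, per-member, and per-endpoint costs. A worked sketch using the low and high end of each range. `RamEstimate` and `estimateRam` are hypothetical, treating 1 KB as 1024 bytes is an assumption, and real usage is device dependent as noted above.

```cpp
#include <cstdint>

struct RamEstimate {
    uint32_t lowBytes;
    uint32_t highBytes;
};

// Rough budget from the documented ranges:
// base ~15-20 KB, task ~100-200 B, member ~50-100 B, endpoint ~20-30 B.
RamEstimate estimateRam(uint32_t tasks, uint32_t members, uint32_t endpoints) {
    const uint32_t kb = 1024;  // assumption: KB means 1024 bytes
    RamEstimate e;
    e.lowBytes = 15 * kb + tasks * 100 + members * 50 + endpoints * 20;
    e.highBytes = 20 * kb + tasks * 200 + members * 100 + endpoints * 30;
    return e;
}
```

For the six core tasks, five members, and twenty endpoints, this gives roughly 16,610 to 22,780 bytes.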
@@ -219,7 +215,7 @@ The system includes automatic WiFi fallback for robust operation:
 ### Current Implementation
 - **Network Access**: Local network only (no internet exposure)
- **Authentication**: None currently implemented
+- **Authentication**: None currently implemented; LAN-only access assumed
 - **Data Validation**: Basic input validation
 - **Resource Limits**: Memory and processing constraints
--- a/docs/Development.md
+++ b/docs/Development.md
@@ -20,57 +20,99 @@
 ```
 spore/
-├── src/                    # Source code
+├── src/                    # Source code (framework under src/spore)
-│   ├── main.cpp           # Main application entry point
+│   └── spore/
-│   ├── ApiServer.cpp      # HTTP API server implementation
+│       ├── Spore.cpp               # Framework lifecycle (setup/begin/loop)
-│   ├── ClusterManager.cpp # Cluster management logic
+│       ├── core/                   # Core components
-│   ├── NetworkManager.cpp # WiFi and network handling
+│       │   ├── ApiServer.cpp       # HTTP API server implementation
-│   ├── TaskManager.cpp    # Background task management
+│       │   ├── ClusterManager.cpp  # Cluster management logic
-│   └── NodeContext.cpp    # Central context and events
+│       │   ├── NetworkManager.cpp  # WiFi and network handling
 │       │   ├── TaskManager.cpp     # Background task management
 │       │   └── NodeContext.cpp     # Central context and events
 │       ├── services/               # Built-in services
 │       │   ├── NodeService.cpp
 │       │   ├── NetworkService.cpp
 │       │   ├── ClusterService.cpp
 │       │   ├── TaskService.cpp
 │       │   ├── StaticFileService.cpp
 │       │   └── MonitoringService.cpp
 │       └── types/                  # Shared types
 ├── include/                # Header files
-├── lib/                    # Library files
+├── examples/               # Example apps per env (base, relay, neopattern)
 ├── docs/                   # Documentation
 ├── api/                    # OpenAPI specification
-├── examples/               # Example code
+├── platformio.ini          # PlatformIO configuration
-├── test/                   # Test files
+└── ctl.sh                  # Build and deployment scripts
-├── platformio.ini         # PlatformIO configuration
-└── ctl.sh                 # Build and deployment scripts
 ```
 ## PlatformIO Configuration
 ### Framework and Board
-The project uses PlatformIO with the following configuration:
+The project uses PlatformIO with the following configuration (excerpt):
 ```ini
-[env:esp01_1m]
+[platformio]
 default_envs = base
 src_dir = .
 data_dir = ${PROJECT_DIR}/examples/${PIOENV}/data
 [common]
 monitor_speed = 115200
 lib_deps = 
    esp32async/ESPAsyncWebServer@^3.8.0
    bblanchon/ArduinoJson@^7.4.2
 [env:base]
 platform = platformio/espressif8266@^4.2.1
 board = esp01_1m
 framework = arduino
 upload_speed = 115200
-flash_mode = dout
+monitor_speed = 115200
 board_build.f_cpu = 80000000L
 board_build.flash_mode = qio
 board_build.filesystem = littlefs
 ; note: on ESP8266 the partition table is not used for the flash layout; the ldscript defines it
 board_build.ldscript = eagle.flash.1m64.ld
 lib_deps = ${common.lib_deps}
 build_src_filter = 
    +<examples/base/*.cpp>
    +<src/spore/*.cpp>
    +<src/spore/core/*.cpp>
    +<src/spore/services/*.cpp>
    +<src/spore/types/*.cpp>
    +<src/spore/util/*.cpp>
    +<src/internal/*.cpp>
 [env:d1_mini]
 platform = platformio/espressif8266@^4.2.1
 board = d1_mini
 framework = arduino
 upload_speed = 115200
 monitor_speed = 115200
 board_build.filesystem = littlefs
 board_build.flash_mode = dio           ; D1 Mini uses DIO on its 4 MB flash
 board_build.flash_size = 4M
 board_build.ldscript = eagle.flash.4m1m.ld
 lib_deps = ${common.lib_deps}
 build_src_filter = 
    +<examples/base/*.cpp>
    +<src/spore/*.cpp>
    +<src/spore/core/*.cpp>
    +<src/spore/services/*.cpp>
    +<src/spore/types/*.cpp>
    +<src/spore/util/*.cpp>
    +<src/internal/*.cpp>
 ```
 ### Key Configuration Details
 - **Framework**: Arduino
 - **Board**: ESP-01 with 1MB flash (`base`); D1 Mini with 4MB flash (`d1_mini`)
 - **Upload Speed**: 115200 baud
 - **Flash Mode**: QIO for `base` (`board_build.flash_mode = qio`); DIO for `d1_mini`
 - **Build Type**: Release (optimized for production)
 ### Dependencies
-The project requires the following libraries:
+The project requires the following libraries (resolved via PlatformIO):
 ```ini
 lib_deps =
    esp32async/ESPAsyncWebServer@^3.8.0
    bblanchon/ArduinoJson@^7.4.2
    arkhipenko/TaskScheduler@^3.8.5
    ESP8266HTTPClient@1.2
    ESP8266WiFi@1.0
 ```
 ### Filesystem, Linker Scripts, and Flash Layout
@@ -103,7 +145,6 @@ Notes:
 - If you need a different FS size, select an appropriate ldscript variant and keep `board_build.filesystem = littlefs`.
 - On ESP8266, custom partition CSVs are not used for layout; the linker script defines the flash map. This project removed prior `board_build.partitions` usage in favor of explicit `board_build.ldscript` entries per environment.
 ## Building
 ### Basic Build Commands
@@ -308,7 +349,7 @@ export API_NODE=192.168.1.100
 Key configuration files:
 - **`platformio.ini`**: Build and upload configuration
- **`src/Config.cpp`**: Application configuration
+- **`src/spore/types/Config.cpp`**: Default runtime configuration
 - **`.env`**: Environment variables
 - **`ctl.sh`**: Build and deployment scripts
--- a/docs/MonitoringService.md
+++ b/docs/MonitoringService.md
@@ -0,0 +1,79 @@
 # Monitoring Service
 Exposes system resource metrics via HTTP for observability.
 ## Overview
 - **Service name**: `MonitoringService`
 - **Endpoint**: `GET /api/monitoring/resources`
 - **Metrics**: CPU usage, memory, filesystem, uptime
 ## Endpoint
 ### GET /api/monitoring/resources
 Returns real-time system resource metrics.
 Response fields:
 - `cpu.current_usage`: Current CPU usage percent
 - `cpu.average_usage`: Average CPU usage percent
 - `cpu.max_usage`: Max observed CPU usage
 - `cpu.min_usage`: Min observed CPU usage
 - `cpu.measurement_count`: Number of measurements
 - `cpu.is_measuring`: Whether measurement is active
 - `memory.free_heap`: Free heap bytes
 - `memory.total_heap`: Total heap bytes (approximate)
 - `memory.min_free_heap`: Minimum free heap (0 on ESP8266)
 - `memory.max_alloc_heap`: Max allocatable heap (0 on ESP8266)
 - `memory.heap_fragmentation`: Fragmentation percent (0 on ESP8266)
 - `memory.heap_usage_percent`: Heap usage percent
 - `filesystem.total_bytes`: LittleFS total bytes
 - `filesystem.used_bytes`: Used bytes
 - `filesystem.free_bytes`: Free bytes
 - `filesystem.usage_percent`: Usage percent
 - `system.uptime_ms`: Uptime in milliseconds
 - `system.uptime_seconds`: Uptime in seconds
 - `system.uptime_formatted`: Human-readable uptime
 Example:
 ```json
 {
  "cpu": {
    "current_usage": 3.5,
    "average_usage": 2.1,
    "max_usage": 15.2,
    "min_usage": 0.0,
    "measurement_count": 120,
    "is_measuring": true
  },
  "memory": {
    "free_heap": 48748,
    "total_heap": 81920,
    "min_free_heap": 0,
    "max_alloc_heap": 0,
    "heap_fragmentation": 0,
    "heap_usage_percent": 40.4
  },
  "filesystem": {
    "total_bytes": 65536,
    "used_bytes": 10240,
    "free_bytes": 55296,
    "usage_percent": 15.6
  },
  "system": {
    "uptime_ms": 123456,
    "uptime_seconds": 123,
    "uptime_formatted": "0h 2m 3s"
  }
 }
 ```
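The three uptime fields come from the same counter: `uptime_seconds` is `uptime_ms / 1000` (truncated), and `uptime_formatted` splits that into hours, minutes, and seconds. A minimal sketch that reproduces the example values. `formatUptime` is a hypothetical name, and because the document does not say whether days are broken out, this sketch lets hours exceed 24.

```cpp
#include <cstdint>
#include <string>

// Formats milliseconds as "<h>h <m>m <s>s"; hours are not wrapped into days (assumption).
std::string formatUptime(uint64_t uptimeMs) {
    const uint64_t totalSeconds = uptimeMs / 1000;
    const uint64_t hours = totalSeconds / 3600;
    const uint64_t minutes = (totalSeconds % 3600) / 60;
    const uint64_t seconds = totalSeconds % 60;
    return std::to_string(hours) + "h " + std::to_string(minutes) + "m " +
           std::to_string(seconds) + "s";
}
```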
 ## Implementation Notes
 - `MonitoringService` reads from `CpuUsage` and ESP8266 SDK APIs.
 - Filesystem metrics are gathered from LittleFS.
 - CPU measurement is bracketed by `Spore::loop()` calling `cpuUsage.startMeasurement()` and `cpuUsage.endMeasurement()`.
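One plausible way to turn the `startMeasurement()`/`endMeasurement()` bracketing into a percentage is to accumulate the time spent inside the bracket and divide it by wall-clock time over a window. The sketch below is an assumption about the approach, not the actual `CpuUsage` implementation. `BusyTimeTracker` and its window parameter are hypothetical, and timestamps are passed in explicitly so the logic can be tested off-device.

```cpp
#include <cstdint>

// Hypothetical busy-time tracker; NOT the framework's CpuUsage class.
// Assumption: time inside start/end counts as busy, time between brackets as idle.
class BusyTimeTracker {
public:
    explicit BusyTimeTracker(uint32_t windowUs) : windowUs_(windowUs) {}

    void startMeasurement(uint32_t nowUs) {
        if (!windowOpen_) {  // first bracket of a new window
            windowStartUs_ = nowUs;
            windowOpen_ = true;
        }
        startUs_ = nowUs;
        measuring_ = true;
    }

    void endMeasurement(uint32_t nowUs) {
        if (!measuring_) return;
        measuring_ = false;
        busyUs_ += nowUs - startUs_;
        const uint32_t elapsedUs = nowUs - windowStartUs_;
        if (elapsedUs > 0 && elapsedUs >= windowUs_) {  // window full: publish and reset
            currentUsage_ = 100.0f * static_cast<float>(busyUs_) / static_cast<float>(elapsedUs);
            busyUs_ = 0;
            windowOpen_ = false;
        }
    }

    float currentUsage() const { return currentUsage_; }  // 0 until the first window closes
    bool isMeasuring() const { return measuring_; }

private:
    uint32_t windowUs_;
    uint32_t windowStartUs_ = 0;
    uint32_t startUs_ = 0;
    uint32_t busyUs_ = 0;
    float currentUsage_ = 0.0f;
    bool windowOpen_ = false;
    bool measuring_ = false;
};
```

In this model, usage stays at zero until a full window has been bracketed, which is consistent with the troubleshooting note about CPU values remaining zero before the main loop runs.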
 ## Troubleshooting
 - If `filesystem.total_bytes` is zero, ensure LittleFS is enabled in `platformio.ini` and an FS image is uploaded.
 - CPU usage values remain zero until the main loop runs and CPU measurement is started.
--- a/docs/README.md
+++ b/docs/README.md
@@ -15,15 +15,8 @@ Complete API reference with detailed endpoint documentation, examples, and integ
 - Task management workflows
 - Cluster monitoring examples
-### 📖 [TaskManager.md](./TaskManager.md)
+### 📖 [MonitoringService.md](./MonitoringService.md)
-Comprehensive guide to the TaskManager system for background task management.
+System resource monitoring API for CPU, memory, filesystem, and uptime.
-**Includes:**
-- Basic usage examples
-- Advanced binding techniques
-- Task status monitoring
-- API integration details
-- Performance considerations
 ### 📖 [TaskManagement.md](./TaskManagement.md)
 Complete guide to the task management system with examples and best practices.
--- a/src/spore/core/ClusterManager.cpp
+++ b/src/spore/core/ClusterManager.cpp
@@ -29,6 +29,9 @@ void ClusterManager::sendDiscovery() {
    ctx.udp->endPacket();
 }
 // TODO the various if statements here are a mess, we need to clean them up
 // TODO we should use a state machine to handle the different types of messages
 // TODO we should use a class to handle the different types of messages using predicate functions
 void ClusterManager::listenForDiscovery() {
    int packetSize = ctx.udp->parsePacket();
    if (packetSize) {