feat: task manager endpoint, updated documentation

2025-08-22 15:47:08 +02:00
parent d7d307e3ce
commit 30a5f8b8cb
14 changed files with 2550 additions and 551 deletions
--- a/docs/API.md
+++ b/docs/API.md
@@ -0,0 +1,279 @@
+# SPORE API Documentation
+
+The SPORE system provides a RESTful API for monitoring and controlling the embedded device. All endpoints return JSON responses and use standard HTTP status codes.
+
+## Quick Reference
+
+### Task Management API
+
+| Endpoint | Method | Description | Parameters | Response |
+|----------|--------|-------------|------------|----------|
+| `/api/tasks/status` | GET | Get comprehensive status of all tasks and system information | None | Task status overview with system metrics |
+| `/api/tasks/control` | POST | Control individual task operations | `task`, `action` | Operation result with task details |
+
+### System Status API
+
+| Endpoint | Method | Description | Response |
+|----------|--------|-------------|----------|
+| `/api/node/status` | GET | System resource information and API endpoint registry | System metrics and API catalog |
+| `/api/cluster/members` | GET | Cluster membership and node health information | Cluster topology and health status |
+| `/api/node/update` | POST | Handle firmware updates via OTA | Update progress and status |
+| `/api/node/restart` | POST | Trigger system restart | Restart confirmation |
+
+## Detailed API Reference
+
+### Task Management
+
+#### GET /api/tasks/status
+
+Returns comprehensive status information for all registered tasks, including system resource metrics and task execution details.
+
+**Response Fields:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `summary.totalTasks` | integer | Total number of registered tasks |
+| `summary.activeTasks` | integer | Number of currently enabled tasks |
+| `tasks[].name` | string | Unique task identifier |
+| `tasks[].interval` | integer | Execution frequency in milliseconds |
+| `tasks[].enabled` | boolean | Whether task is currently enabled |
+| `tasks[].running` | boolean | Whether task is actively executing |
+| `tasks[].autoStart` | boolean | Whether task starts automatically |
+| `system.freeHeap` | integer | Available RAM in bytes |
+| `system.uptime` | integer | System uptime in milliseconds |
+
+**Example Response:**
+```json
+{
+  "summary": {
+    "totalTasks": 6,
+    "activeTasks": 5
+  },
+  "tasks": [
+    {
+      "name": "discovery_send",
+      "interval": 1000,
+      "enabled": true,
+      "running": true,
+      "autoStart": true
+    }
+  ],
+  "system": {
+    "freeHeap": 48748,
+    "uptime": 12345
+  }
+}
+```
+
+#### POST /api/tasks/control
+
+Controls the execution state of individual tasks. Supports enabling, disabling, starting, stopping, and getting detailed status for specific tasks.
+
+**Parameters:**
+- `task` (required): Name of the task to control
+- `action` (required): Action to perform
+
+**Available Actions:**
+
+| Action | Description | Use Case |
+|--------|-------------|----------|
+| `enable` | Enable a disabled task | Resume background operations |
+| `disable` | Disable a running task | Pause resource-intensive tasks |
+| `start` | Start a stopped task | Begin task execution |
+| `stop` | Stop a running task | Halt task execution |
+| `status` | Get detailed status for a specific task | Monitor individual task health |
+
+**Example Response:**
+```json
+{
+  "success": true,
+  "message": "Task enabled",
+  "task": "heartbeat",
+  "action": "enable"
+}
+```
+
+**Task Status Response:**
+```json
+{
+  "success": true,
+  "message": "Task status retrieved",
+  "task": "discovery_send",
+  "action": "status",
+  "taskDetails": {
+    "name": "discovery_send",
+    "enabled": true,
+    "running": true,
+    "interval": 1000,
+    "system": {
+      "freeHeap": 48748,
+      "uptime": 12345
+    }
+  }
+}
+```
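+
+The form-encoded request used by the curl examples in this document can also be built and checked programmatically. The following is a minimal Python sketch, not part of SPORE: `build_control_body` and `control_succeeded` are hypothetical helper names, and rejecting unknown actions on the client side is an assumption about how a caller might want to fail fast before contacting the node.
+
+```python
+from urllib.parse import urlencode
+
+# Actions listed in the "Available Actions" table.
+VALID_ACTIONS = {"enable", "disable", "start", "stop", "status"}
+
+
+def build_control_body(task: str, action: str) -> str:
+    """Return the form-encoded body for POST /api/tasks/control."""
+    if not task:
+        raise ValueError("task is required")
+    if action not in VALID_ACTIONS:
+        raise ValueError(f"unsupported action: {action!r}")
+    return urlencode({"task": task, "action": action})
+
+
+def control_succeeded(response: dict, task: str, action: str) -> bool:
+    """Check that a parsed response reports success for the request we sent."""
+    return (
+        response.get("success") is True
+        and response.get("task") == task
+        and response.get("action") == action
+    )
+
+
+print(build_control_body("heartbeat", "disable"))
+```
+
+The resulting string can be sent as the POST body with the `application/x-www-form-urlencoded` content type.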
+
+### System Status
+
+#### GET /api/node/status
+
+Returns comprehensive system resource information including memory usage, chip details, and a registry of all available API endpoints.
+
+**Response Fields:**
+- `freeHeap`: Available RAM in bytes
+- `chipId`: ESP8266 chip ID
+- `sdkVersion`: ESP8266 SDK version
+- `cpuFreqMHz`: CPU frequency in MHz
+- `flashChipSize`: Flash chip size in bytes
+- `api`: Array of registered API endpoints
+
+#### GET /api/cluster/members
+
+Returns information about all nodes in the cluster, including their health status, resources, and API endpoints.
+
+**Response Fields:**
+- `members[]`: Array of cluster node information
+- `members[].hostname`: Node hostname
+- `members[].ip`: Node IP address
+- `members[].lastSeen`: Timestamp of last communication
+- `members[].latency`: Network latency in milliseconds
+- `members[].status`: Node health status (ACTIVE, INACTIVE, DEAD)
+- `members[].resources`: System resource information
+- `members[].api`: Available API endpoints
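+
+A client will often want to reduce this response to a health summary. The sketch below is one way to do that in Python, assuming the per-node fields are nested inside each `members[]` entry (as the `jq` examples in this document suggest). The sample payload, its hostnames, and the helper names are made up for illustration.
+
+```python
+from collections import Counter
+
+
+def summarize_members(payload: dict) -> dict:
+    """Count cluster members per health status."""
+    counts = Counter(m.get("status", "UNKNOWN") for m in payload.get("members", []))
+    return dict(counts)
+
+
+def unhealthy_hosts(payload: dict) -> list:
+    """Return the hostnames of members that are not ACTIVE."""
+    return [
+        m.get("hostname")
+        for m in payload.get("members", [])
+        if m.get("status") != "ACTIVE"
+    ]
+
+
+# Hypothetical payload, shaped after the field list above.
+sample = {
+    "members": [
+        {"hostname": "node-a", "ip": "10.0.1.60", "status": "ACTIVE", "latency": 12},
+        {"hostname": "node-b", "ip": "10.0.1.61", "status": "INACTIVE", "latency": 250},
+        {"hostname": "node-c", "ip": "10.0.1.62", "status": "ACTIVE", "latency": 9},
+    ]
+}
+
+print(summarize_members(sample))
+print(unhealthy_hosts(sample))
+```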
+
+### System Management
+
+#### POST /api/node/update
+
+Initiates an over-the-air firmware update. The firmware file should be uploaded as multipart/form-data.
+
+**Parameters:**
+- `firmware`: Firmware binary file (.bin)
+
+#### POST /api/node/restart
+
+Triggers a system restart. The response is sent before the restart occurs.
+
+## HTTP Status Codes
+
+| Code | Description | Use Case |
+|------|-------------|----------|
+| 200 | Success | Operation completed successfully |
+| 400 | Bad Request | Invalid parameters or action |
+| 404 | Not Found | Task or endpoint not found |
+| 500 | Internal Server Error | System error occurred |
+
+## OpenAPI Specification
+
+A complete OpenAPI 3.0 specification is available in the [`api/`](../api/) folder. This specification can be used to:
+
+- Generate client libraries in multiple programming languages
+- Create interactive API documentation
+- Validate API requests and responses
+- Generate mock servers for testing
+- Integrate with API management platforms
+
+See [`api/README.md`](../api/README.md) for detailed usage instructions.
+
+## Usage Examples
+
+### Basic Task Status Check
+```bash
+curl -s http://10.0.1.60/api/tasks/status | jq '.'
+```
+
+### Task Control
+```bash
+# Disable a task
+curl -X POST http://10.0.1.60/api/tasks/control \
+  -d "task=heartbeat&action=disable"
+
+# Get detailed status
+curl -X POST http://10.0.1.60/api/tasks/control \
+  -d "task=discovery_send&action=status"
+```
+
+### System Monitoring
+```bash
+# Check system resources
+curl -s http://10.0.1.60/api/node/status | jq '.freeHeap'
+
+# Monitor cluster health
+curl -s http://10.0.1.60/api/cluster/members | jq '.members[].status'
+```
+
+## Integration Examples
+
+### Python Client
+```python
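+import requests
+
+BASE_URL = 'http://10.0.1.60'
+
+# Get task status (the timeout keeps the call from hanging on an unreachable node)
+response = requests.get(f'{BASE_URL}/api/tasks/status', timeout=5)
+response.raise_for_status()
+tasks = response.json()
+
+# Check active tasks
+active_count = tasks['summary']['activeTasks']
+print(f"Active tasks: {active_count}")
+
+# Control a task (sent form-encoded, like the curl examples)
+control_data = {'task': 'heartbeat', 'action': 'disable'}
+response = requests.post(f'{BASE_URL}/api/tasks/control', data=control_data, timeout=5)
+response.raise_for_status()
+print(response.json()['message'])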
+```
+
+### JavaScript Client
+```javascript
+// Get task status
+fetch('http://10.0.1.60/api/tasks/status')
+  .then(response => response.json())
+  .then(data => {
+    console.log(`Total tasks: ${data.summary.totalTasks}`);
+    console.log(`Active tasks: ${data.summary.activeTasks}`);
+  });
+
+// Control a task
+fetch('http://10.0.1.60/api/tasks/control', {
+  method: 'POST',
+  headers: {'Content-Type': 'application/x-www-form-urlencoded'},
+  body: 'task=heartbeat&action=disable'
+});
+```
+
+## Task Management Examples
+
+### Monitoring Task Health
+```bash
+# Check overall task status
+curl -s http://10.0.1.60/api/tasks/status | jq '.'
+
+# Monitor specific task
+curl -s -X POST http://10.0.1.60/api/tasks/control \
+  -d "task=heartbeat&action=status" | jq '.'
+
+# Watch for low memory conditions
+watch -n 5 'curl -s http://10.0.1.60/api/tasks/status | jq ".system.freeHeap"'
+```
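+
+The same checks can be applied to an already parsed `/api/tasks/status` response. This is a rough Python sketch: the `check_status` helper and the 20,000-byte threshold are assumptions chosen for illustration, not SPORE defaults.
+
+```python
+LOW_HEAP_BYTES = 20_000  # hypothetical alert threshold, not a SPORE default
+
+
+def check_status(status: dict, low_heap_bytes: int = LOW_HEAP_BYTES) -> list:
+    """Return human-readable warnings for a parsed /api/tasks/status response."""
+    warnings = []
+    free_heap = status["system"]["freeHeap"]
+    if free_heap < low_heap_bytes:
+        warnings.append(f"low memory: {free_heap} bytes free")
+    for task in status["tasks"]:
+        if not task["enabled"]:
+            warnings.append(f"task disabled: {task['name']}")
+    return warnings
+
+
+# Values taken from the example response in this document.
+example = {
+    "summary": {"totalTasks": 6, "activeTasks": 5},
+    "tasks": [{"name": "discovery_send", "interval": 1000, "enabled": True,
+               "running": True, "autoStart": True}],
+    "system": {"freeHeap": 48748, "uptime": 12345},
+}
+
+print(check_status(example))
+```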
+
+### Task Control Workflows
+```bash
+# Temporarily disable discovery to reduce network traffic
+curl -X POST http://10.0.1.60/api/tasks/control \
+  -d "task=discovery_send&action=disable"
+
+# Check if it's disabled
+curl -s -X POST http://10.0.1.60/api/tasks/control \
+  -d "task=discovery_send&action=status" | jq '.taskDetails.enabled'
+
+# Re-enable when needed
+curl -X POST http://10.0.1.60/api/tasks/control \
+  -d "task=discovery_send&action=enable"
+```
+
+### Cluster Health Monitoring
+```bash
+# Monitor all nodes in cluster
+for ip in 10.0.1.60 10.0.1.61 10.0.1.62; do
+  echo "=== Node $ip ==="
+  curl -s "http://$ip/api/tasks/status" | jq '.summary'
+done
+``` 
--- a/docs/Architecture.md
+++ b/docs/Architecture.md
@@ -0,0 +1,358 @@
+# SPORE Architecture & Implementation
+
+## System Overview
+
+SPORE (SProcket ORchestration Engine) is a cluster engine for ESP8266 microcontrollers that provides automatic node discovery, health monitoring, and over-the-air updates in a distributed network environment.
+
+## Core Components
+
+The system architecture consists of several key components working together:
+
+### Network Manager
+- **WiFi Connection Handling**: Automatic WiFi STA/AP configuration
+- **Hostname Configuration**: MAC-based hostname generation
+- **Fallback Management**: Automatic access point creation if WiFi connection fails
+
+### Cluster Manager
+- **Node Discovery**: UDP-based automatic node detection
+- **Member List Management**: Dynamic cluster membership tracking
+- **Health Monitoring**: Continuous node status checking
+- **Resource Tracking**: Monitor node resources and capabilities
+
+### API Server
+- **HTTP API Server**: RESTful API for cluster management
+- **Dynamic Endpoint Registration**: Automatic API endpoint discovery
+- **Service Registry**: Track available services across the cluster
+
+### Task Scheduler
+- **Cooperative Multitasking**: Background task management system
+- **Task Lifecycle Management**: Automatic task execution and monitoring
+- **Resource Optimization**: Efficient task scheduling and execution
+
+### Node Context
+- **Central Context**: Shared resources and configuration
+- **Event System**: Local and cluster-wide event publishing/subscription
+- **Resource Management**: Centralized resource allocation and monitoring
+
+## Auto Discovery Protocol
+
+The cluster uses a UDP-based discovery protocol for automatic node detection:
+
+### Discovery Process
+
+1. **Discovery Broadcast**: Nodes periodically send UDP packets on port 4210
+2. **Response Handling**: Nodes respond with their hostname and IP address
+3. **Member Management**: Discovered nodes are automatically added to the cluster
+4. **Health Monitoring**: Continuous status checking via HTTP API calls
+
+### Protocol Details
+
+- **UDP Port**: 4210 (configurable)
+- **Discovery Message**: `CLUSTER_DISCOVERY`
+- **Response Message**: `CLUSTER_RESPONSE`
+- **Broadcast Address**: 255.255.255.255
+- **Discovery Interval**: 1 second (configurable)
+- **Listen Interval**: 100ms (configurable)
+
+### Node Status Categories
+
+Nodes are automatically categorized by their activity:
+
+- **ACTIVE**: Responding within 10 seconds
+- **INACTIVE**: No response for 10-60 seconds  
+- **DEAD**: No response for over 60 seconds
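+
+The thresholds above translate into a small classification function. The Python sketch below is illustrative only and is not the firmware's implementation. How the exact boundaries (10 and 60 seconds) are treated is an assumption: here a node seen exactly 10 seconds ago still counts as ACTIVE, and one seen exactly 60 seconds ago as INACTIVE.
+
+```python
+ACTIVE_WINDOW_S = 10  # "Responding within 10 seconds"
+DEAD_AFTER_S = 60     # "No response for over 60 seconds"
+
+
+def classify_node(seconds_since_last_seen: float) -> str:
+    """Map the time since a node's last response to a status category."""
+    if seconds_since_last_seen <= ACTIVE_WINDOW_S:
+        return "ACTIVE"
+    if seconds_since_last_seen <= DEAD_AFTER_S:
+        return "INACTIVE"
+    return "DEAD"
+
+
+for elapsed in (3, 30, 61):
+    print(elapsed, classify_node(elapsed))
+```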
+
+## Task Scheduling System
+
+The system runs several background tasks at different intervals:
+
+### Core System Tasks
+
+| Task | Interval | Purpose |
+|------|----------|---------|
+| **Discovery Send** | 1 second | Send UDP discovery packets |
+| **Discovery Listen** | 100ms | Listen for discovery responses |
+| **Status Updates** | 1 second | Monitor cluster member health |
+| **Heartbeat** | 2 seconds | Maintain cluster connectivity |
+| **Member Info** | 10 seconds | Update detailed node information |
+| **Debug Output** | 5 seconds | Print cluster status |
+
+### Task Management Features
+
+- **Dynamic Intervals**: Change execution frequency on-the-fly
+- **Runtime Control**: Enable/disable tasks without restart
+- **Status Monitoring**: Real-time task health tracking
+- **Resource Integration**: View task status with system resources
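+
+To make interval-based cooperative scheduling concrete, here is a minimal Python model of the idea. It is a sketch for illustration, not the firmware code and not the TaskScheduler library API: the `Scheduler` and `Task` classes and their method names are hypothetical, and the caller supplies the current time in milliseconds.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Dict
+
+
+@dataclass
+class Task:
+    interval_ms: int
+    callback: Callable[[], None]
+    enabled: bool = True
+    last_run_ms: int = 0
+
+
+class Scheduler:
+    def __init__(self) -> None:
+        self.tasks: Dict[str, Task] = {}
+
+    def register(self, name: str, interval_ms: int, callback: Callable[[], None]) -> None:
+        self.tasks[name] = Task(interval_ms, callback)
+
+    def set_enabled(self, name: str, enabled: bool) -> None:
+        # Runtime control: toggle a task without restarting anything.
+        self.tasks[name].enabled = enabled
+
+    def set_interval(self, name: str, interval_ms: int) -> None:
+        # Dynamic intervals: takes effect on the next tick.
+        self.tasks[name].interval_ms = interval_ms
+
+    def tick(self, now_ms: int) -> None:
+        """Run every enabled task whose interval has elapsed."""
+        for task in self.tasks.values():
+            if task.enabled and now_ms - task.last_run_ms >= task.interval_ms:
+                task.callback()
+                task.last_run_ms = now_ms
+
+
+runs = []
+scheduler = Scheduler()
+scheduler.register("heartbeat", 2000, lambda: runs.append("heartbeat"))
+scheduler.register("discovery_send", 1000, lambda: runs.append("discovery_send"))
+
+# Simulate a main loop that ticks once per second for four seconds.
+for now in range(0, 4001, 1000):
+    scheduler.tick(now)
+
+print(runs.count("heartbeat"), runs.count("discovery_send"))
+```
+
+Each task runs to completion inside `tick`, so a slow callback delays every other task. That is the usual trade-off of cooperative multitasking.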
+
+## Event System
+
+The `NodeContext` provides an event-driven architecture for system-wide communication:
+
+### Event Subscription
+
+```cpp
+// Subscribe to events
+ctx.on("node_discovered", [](void* data) {
+    NodeInfo* node = static_cast<NodeInfo*>(data);
+    // Handle new node discovery
+});
+
+ctx.on("cluster_updated", [](void* data) {
+    // Handle cluster membership changes
+});
+```
+
+### Event Publishing
+
+```cpp
+// Publish events
+ctx.fire("node_discovered", &newNode);
+ctx.fire("cluster_updated", &clusterData);
+```
+
+### Available Events
+
+- **`node_discovered`**: New node added to cluster
+- **`cluster_updated`**: Cluster membership changed
+- **`resource_update`**: Node resources updated
+- **`health_check`**: Node health status changed
+
+## Resource Monitoring
+
+Each node tracks comprehensive system resources:
+
+### System Resources
+
+- **Free Heap Memory**: Available RAM in bytes
+- **Chip ID**: Unique ESP8266 identifier
+- **SDK Version**: ESP8266 SDK version
+- **CPU Frequency**: Operating frequency in MHz
+- **Flash Chip Size**: Total flash storage in bytes
+
+### API Endpoint Registry
+
+- **Dynamic Discovery**: Automatically detect available endpoints
+- **Method Information**: HTTP method (GET, POST, etc.)
+- **Service Catalog**: Complete service registry across cluster
+
+### Health Metrics
+
+- **Response Time**: API response latency
+- **Uptime**: System uptime in milliseconds
+- **Connection Status**: Network connectivity health
+- **Resource Utilization**: Memory and CPU usage
+
+## WiFi Fallback System
+
+The system includes automatic WiFi fallback for robust operation:
+
+### Fallback Process
+
+1. **Primary Connection**: Attempts to connect to configured WiFi network
+2. **Connection Failure**: If connection fails, creates an access point
+3. **Hostname Generation**: Automatically generates hostname from MAC address
+4. **Service Continuity**: Maintains cluster functionality in fallback mode
+
+### Configuration
+
+- **SSID Format**: `SPORE_<MAC_LAST_4>`
+- **Password**: Configurable fallback password
+- **IP Range**: 192.168.4.x subnet
+- **Gateway**: 192.168.4.1
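+
+As an illustration of the SSID format, the Python sketch below derives a fallback SSID from a MAC address string. It assumes that `<MAC_LAST_4>` means the last four hexadecimal digits of the MAC in upper case. The firmware may format it differently, and `fallback_ssid` is a hypothetical helper name.
+
+```python
+def fallback_ssid(mac: str) -> str:
+    """Build an SSID of the form SPORE_<MAC_LAST_4> from a MAC address string."""
+    # Keep only the hex digits so that "AA:BB:..." and "AA-BB-..." both work.
+    hex_digits = "".join(ch for ch in mac if ch in "0123456789abcdefABCDEF")
+    if len(hex_digits) != 12:
+        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
+    return "SPORE_" + hex_digits[-4:].upper()
+
+
+print(fallback_ssid("5C:CF:7F:1A:2B:3C"))
+```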
+
+## Cluster Topology
+
+### Node Types
+
+- **Master Node**: Primary cluster coordinator (if applicable)
+- **Worker Nodes**: Standard cluster members
+- **Edge Nodes**: Network edge devices
+
+### Network Architecture
+
+- **Mesh-like Structure**: Nodes can communicate with each other
+- **Dynamic Routing**: Automatic path discovery between nodes
+- **Load Distribution**: Tasks distributed across available nodes
+- **Fault Tolerance**: Automatic failover and recovery
+
+## Data Flow
+
+### Discovery Flow
+
+```
+Node A → UDP Broadcast → Node B
+Node B → UDP Response (CLUSTER_RESPONSE) → Node A
+Node A → Add to Cluster → Update Member List
+```
+
+### Health Monitoring Flow
+
+```
+Cluster Manager → HTTP Request → Node Status
+Node → JSON Response → Resource Information
+Cluster Manager → Update Health → Fire Events
+```
+
+### Task Execution Flow
+
+```
+Task Scheduler → Check Intervals → Execute Tasks
+Task → Update Status → API Server
+API Server → JSON Response → Client
+```
+
+## Performance Characteristics
+
+### Memory Usage
+
+- **Base System**: ~15-20KB RAM
+- **Per Task**: ~100-200 bytes per task
+- **Cluster Members**: ~50-100 bytes per member
+- **API Endpoints**: ~20-30 bytes per endpoint
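+
+These per-item figures can be combined into a rough RAM budget. The Python sketch below simply adds up the low and high ends of each range. It assumes binary kilobytes (1 KB = 1024 bytes) and a hypothetical node with 6 tasks, 20 cluster members, and 6 API endpoints. The result is an estimate derived from the approximate numbers above, not a measurement.
+
+```python
+KB = 1024  # assumption: the figures above use binary kilobytes
+
+# (low, high) estimates in bytes, taken from the list above.
+BASE = (15 * KB, 20 * KB)
+PER_TASK = (100, 200)
+PER_MEMBER = (50, 100)
+PER_ENDPOINT = (20, 30)
+
+
+def estimate_ram(tasks: int, members: int, endpoints: int) -> tuple:
+    """Return (low, high) RAM estimates in bytes for the given counts."""
+    low = BASE[0] + tasks * PER_TASK[0] + members * PER_MEMBER[0] + endpoints * PER_ENDPOINT[0]
+    high = BASE[1] + tasks * PER_TASK[1] + members * PER_MEMBER[1] + endpoints * PER_ENDPOINT[1]
+    return low, high
+
+
+print(estimate_ram(tasks=6, members=20, endpoints=6))  # (17080, 23860)
+```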
+
+### Network Overhead
+
+- **Discovery Packets**: 64 bytes every 1 second
+- **Health Checks**: ~200-500 bytes every 1 second
+- **Status Updates**: ~1-2KB per node
+- **API Responses**: Varies by endpoint (typically 100B-5KB)
+
+### Processing Overhead
+
+- **Task Execution**: Minimal overhead per task
+- **Event Processing**: Fast event dispatch
+- **JSON Parsing**: Efficient ArduinoJson usage
+- **Network I/O**: Asynchronous operations
+
+## Security Considerations
+
+### Current Implementation
+
+- **Network Access**: Local network only (no internet exposure)
+- **Authentication**: None currently implemented
+- **Data Validation**: Basic input validation
+- **Resource Limits**: Memory and processing constraints
+
+### Future Enhancements
+
+- **TLS/SSL**: Encrypted communications
+- **API Keys**: Authentication for API access
+- **Access Control**: Role-based permissions
+- **Audit Logging**: Security event tracking
+
+## Scalability
+
+### Cluster Size Limits
+
+- **Theoretical**: Up to 254 nodes (usable host addresses in a /24 subnet)
+- **Practical**: 20-50 nodes for optimal performance
+- **Memory Constraint**: ~8KB available for member tracking
+- **Network Constraint**: UDP packet size limits
+
+### Performance Scaling
+
+- **Linear Scaling**: Most operations scale linearly with node count
+- **Discovery Overhead**: Increases with cluster size
+- **Health Monitoring**: Parallel HTTP requests
+- **Task Management**: Independent per-node execution
+
+## Configuration Management
+
+### Environment Variables
+
+```bash
+# API node IP for cluster management
+export API_NODE=192.168.1.100
+
+# Cluster configuration
+export CLUSTER_PORT=4210
+export DISCOVERY_INTERVAL=1000
+export HEALTH_CHECK_INTERVAL=1000
+```
+
+### PlatformIO Configuration
+
+The project uses PlatformIO with the following configuration:
+
+- **Framework**: Arduino
+- **Board**: ESP-01 with 1MB flash
+- **Upload Speed**: 115200 baud
+- **Flash Mode**: DOUT (required for ESP-01S)
+
+### Dependencies
+
+The project requires the following libraries:
+- `esp32async/ESPAsyncWebServer@^3.8.0` - HTTP API server
+- `bblanchon/ArduinoJson@^7.4.2` - JSON processing
+- `arkhipenko/TaskScheduler@^3.8.5` - Cooperative multitasking
+
+## Development Workflow
+
+### Building
+
+Build the firmware for a specific chip:
+
+```bash
+./ctl.sh build target esp01_1m
+```
+
+### Flashing
+
+Flash firmware to a connected device:
+
+```bash
+./ctl.sh flash target esp01_1m
+```
+
+### Over-The-Air Updates
+
+Update a specific node:
+
+```bash
+./ctl.sh ota update 192.168.1.100 esp01_1m
+```
+
+Update all nodes in the cluster:
+
+```bash
+./ctl.sh ota all esp01_1m
+```
+
+### Cluster Management
+
+View cluster members:
+
+```bash
+./ctl.sh cluster members
+```
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Discovery Failures**: Check UDP port 4210 is not blocked
+2. **WiFi Connection**: Verify SSID/password in Config.cpp
+3. **OTA Updates**: Ensure sufficient flash space (1MB minimum)
+4. **Cluster Split**: Check network connectivity between nodes
+
+### Debug Output
+
+Enable serial monitoring to see cluster activity:
+
+```bash
+pio device monitor
+```
+
+### Performance Monitoring
+
+- **Memory Usage**: Monitor free heap with `/api/node/status`
+- **Task Health**: Check task status with `/api/tasks/status`
+- **Cluster Health**: Monitor member status with `/api/cluster/members`
+- **Network Latency**: Track response times in cluster data
+
+## Related Documentation
+
+- **[Task Management](./TaskManagement.md)** - Background task system
+- **[API Reference](./API.md)** - REST API documentation
+- **[TaskManager API](./TaskManager.md)** - TaskManager class reference
+- **[OpenAPI Specification](../api/)** - Machine-readable API specification 
--- a/docs/Development.md
+++ b/docs/Development.md
@@ -0,0 +1,437 @@
+# Development & Deployment Guide
+
+## Prerequisites
+
+### Required Tools
+
+- **PlatformIO Core** or **PlatformIO IDE**
+- **ESP8266 development tools**
+- **`jq`** for JSON processing in scripts
+- **Git** for version control
+
+### System Requirements
+
+- **Operating System**: Linux, macOS, or Windows
+- **Python**: 3.7+ (for PlatformIO)
+- **Memory**: 4GB+ RAM recommended
+- **Storage**: 2GB+ free space for development environment
+
+## Project Structure
+
+```
+spore/
+├── src/                    # Source code
+│   ├── main.cpp           # Main application entry point
+│   ├── ApiServer.cpp      # HTTP API server implementation
+│   ├── ClusterManager.cpp # Cluster management logic
+│   ├── NetworkManager.cpp # WiFi and network handling
+│   ├── TaskManager.cpp    # Background task management
+│   └── NodeContext.cpp    # Central context and events
+├── include/                # Header files
+├── lib/                    # Library files
+├── docs/                   # Documentation
+├── api/                    # OpenAPI specification
+├── examples/               # Example code
+├── test/                   # Test files
+├── platformio.ini         # PlatformIO configuration
+└── ctl.sh                 # Build and deployment scripts
+```
+
+## PlatformIO Configuration
+
+### Framework and Board
+
+The project uses PlatformIO with the following configuration:
+
+```ini
+[env:esp01_1m]
+platform = platformio/espressif8266@^4.2.1
+board = esp01_1m
+framework = arduino
+upload_speed = 115200
+flash_mode = dout
+```
+
+### Key Configuration Details
+
+- **Framework**: Arduino
+- **Board**: ESP-01 with 1MB flash
+- **Upload Speed**: 115200 baud
+- **Flash Mode**: DOUT (required for ESP-01S)
+- **Build Type**: Release (optimized for production)
+
+### Dependencies
+
+The project requires the following libraries:
+
+```ini
+lib_deps =
+    esp32async/ESPAsyncWebServer@^3.8.0
+    bblanchon/ArduinoJson@^7.4.2
+    arkhipenko/TaskScheduler@^3.8.5
+    ESP8266HTTPClient@1.2
+    ESP8266WiFi@1.0
+```
+
+## Building
+
+### Basic Build Commands
+
+Build the firmware for a specific chip:
+
+```bash
+# Build for ESP-01 1MB
+./ctl.sh build target esp01_1m
+
+# Build for D1 Mini
+./ctl.sh build target d1_mini
+
+# Build with verbose output
+pio run -v
+```
+
+### Build Targets
+
+Available build targets:
+
+| Target | Description | Flash Size |
+|--------|-------------|------------|
+| `esp01_1m` | ESP-01 with 1MB flash | 1MB |
+| `d1_mini` | D1 Mini with 4MB flash | 4MB |
+
+### Build Artifacts
+
+After successful build:
+
+- **Firmware**: `.pio/build/{target}/firmware.bin`
+- **ELF File**: `.pio/build/{target}/firmware.elf`
+- **Map File**: `.pio/build/{target}/firmware.map`
+
+## Flashing
+
+### Direct USB Flashing
+
+Flash firmware to a connected device:
+
+```bash
+# Flash ESP-01
+./ctl.sh flash target esp01_1m
+
+# Flash D1 Mini
+./ctl.sh flash target d1_mini
+
+# Manual flash command
+pio run --target upload
+```
+
+### Flash Settings
+
+- **Upload Speed**: 115200 baud (optimal for ESP-01)
+- **Flash Mode**: DOUT (required for ESP-01S)
+- **Reset Method**: Hardware reset or manual reset
+
+### Troubleshooting Flashing
+
+Common flashing issues:
+
+1. **Connection Failed**: Check USB cable and drivers
+2. **Wrong Upload Speed**: Try lower speeds (9600, 57600)
+3. **Flash Mode Error**: Ensure DOUT mode for ESP-01S
+4. **Permission Denied**: Run with sudo or add user to dialout group
+
+## Over-The-Air Updates
+
+### Single Node Update
+
+Update a specific node:
+
+```bash
+# Update specific node
+./ctl.sh ota update 192.168.1.100 esp01_1m
+
+# Update with custom firmware
+./ctl.sh ota update 192.168.1.100 esp01_1m custom_firmware.bin
+```
+
+### Cluster-Wide Updates
+
+Update all nodes in the cluster:
+
+```bash
+# Update all nodes
+./ctl.sh ota all esp01_1m
+```
+
+### OTA Process
+
+1. **Firmware Upload**: Send firmware to target node
+2. **Verification**: Check firmware integrity
+3. **Installation**: Install new firmware
+4. **Restart**: Node restarts with new firmware
+5. **Verification**: Confirm successful update
+
+### OTA Requirements
+
+- **Flash Space**: Minimum 1MB for OTA updates
+- **Network**: Stable WiFi connection
+- **Power**: Stable power supply during update
+- **Memory**: Sufficient RAM for firmware processing
+
+## Cluster Management
+
+### View Cluster Status
+
+```bash
+# View all cluster members
+./ctl.sh cluster members
+
+# View specific node details
+./ctl.sh cluster members --node 192.168.1.100
+```
+
+### Cluster Commands
+
+Available cluster management commands:
+
+| Command | Description |
+|---------|-------------|
+| `members` | List all cluster members |
+| `status` | Show cluster health status |
+| `discover` | Force discovery process |
+| `health` | Check cluster member health |
+
+### Cluster Monitoring
+
+Monitor cluster health in real-time:
+
+```bash
+# Watch cluster status
+watch -n 5 './ctl.sh cluster members'
+
+# Monitor specific metrics
+./ctl.sh cluster members | jq '.members[] | {hostname, status, latency}'
+```
+
+## Development Workflow
+
+### Local Development
+
+1. **Setup Environment**:
+   ```bash
+   git clone <repository>
+   cd spore
+   pio run
+   ```
+
+2. **Make Changes**:
+   - Edit source files in `src/`
+   - Modify headers in `include/`
+   - Update configuration in `platformio.ini`
+
+3. **Test Changes**:
+   ```bash
+   pio run
+   pio check
+   ```
+
+### Testing
+
+Run various tests:
+
+```bash
+# Code quality check
+pio check
+
+# Unit tests (if available)
+pio test
+
+# Memory usage analysis
+pio run --target size
+```
+
+### Debugging
+
+Enable debug output:
+
+```bash
+# Serial monitoring
+pio device monitor
+
+# Build with the DEBUG macro defined (pio run has no --build-flags option; use the environment variable)
+PLATFORMIO_BUILD_FLAGS="-DDEBUG" pio run --environment esp01_1m
+```
+
+## Configuration Management
+
+### Environment Setup
+
+Create a `.env` file in your project root:
+
+```bash
+# API node IP for cluster management
+export API_NODE=192.168.1.100
+```
+
+### Configuration Files
+
+Key configuration files:
+
+- **`platformio.ini`**: Build and upload configuration
+- **`src/Config.cpp`**: Application configuration
+- **`.env`**: Environment variables
+- **`ctl.sh`**: Build and deployment scripts
+
+### Configuration Options
+
+Available configuration options:
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `CLUSTER_PORT` | 4210 | UDP discovery port |
+| `DISCOVERY_INTERVAL` | 1000 | Discovery packet interval (ms) |
+| `HEALTH_CHECK_INTERVAL` | 1000 | Health check interval (ms) |
+| `API_SERVER_PORT` | 80 | HTTP API server port |
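+
+For tooling that runs next to the cluster, these options could be read from the environment with the defaults from the table. The Python sketch below shows one way to do so. Whether the firmware or `ctl.sh` actually reads these names from the environment is not stated here, so treat `load_config` as a hypothetical helper for client-side scripts.
+
+```python
+import os
+
+# Defaults copied from the table above.
+DEFAULTS = {
+    "CLUSTER_PORT": 4210,
+    "DISCOVERY_INTERVAL": 1000,
+    "HEALTH_CHECK_INTERVAL": 1000,
+    "API_SERVER_PORT": 80,
+}
+
+
+def load_config(env=None) -> dict:
+    """Return the options as integers, letting environment values override defaults."""
+    env = os.environ if env is None else env
+    config = {}
+    for name, default in DEFAULTS.items():
+        raw = env.get(name)
+        config[name] = default if raw is None else int(raw)
+    return config
+
+
+print(load_config({"CLUSTER_PORT": "5000"}))
+```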
+
+## Deployment Strategies
+
+### Development Deployment
+
+For development and testing:
+
+1. **Build**: `pio run`
+2. **Flash**: `pio run --target upload`
+3. **Monitor**: `pio device monitor`
+
+### Production Deployment
+
+For production systems:
+
+1. **Build Release**: `pio run --environment esp01_1m`
+2. **OTA Update**: `./ctl.sh ota update <ip> esp01_1m`
+3. **Verify**: Check node status via API
+
+### Continuous Integration
+
+Automated deployment pipeline:
+
+```yaml
+# Example GitHub Actions workflow
+- name: Build Firmware
+  run: pio run --environment esp01_1m
+
+- name: Deploy to Test Cluster
+  run: ./ctl.sh ota all esp01_1m --target test
+
+- name: Deploy to Production
+  run: ./ctl.sh ota all esp01_1m --target production
+```
+
+## Monitoring and Debugging
+
+### Serial Output
+
+Enable serial monitoring:
+
+```bash
+# Basic monitoring
+pio device monitor
+
+# With specific baud rate
+pio device monitor --baud 115200
+
+# Filter specific messages
+pio device monitor | grep "Cluster"
+```
+
+### API Monitoring
+
+Monitor system via HTTP API:
+
+```bash
+# Check system status
+curl -s http://192.168.1.100/api/node/status | jq '.'
+
+# Monitor tasks
+curl -s http://192.168.1.100/api/tasks/status | jq '.'
+
+# Check cluster health
+curl -s http://192.168.1.100/api/cluster/members | jq '.'
+```
+
+### Performance Monitoring
+
+Track system performance:
+
+```bash
+# Memory usage over time
+watch -n 5 'curl -s http://192.168.1.100/api/node/status | jq ".freeHeap"'
+
+# Task execution status
+watch -n 10 'curl -s http://192.168.1.100/api/tasks/status | jq ".summary"'
+```
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Discovery Failures**: Check UDP port 4210 is not blocked
+2. **WiFi Connection**: Verify SSID/password in Config.cpp
+3. **OTA Updates**: Ensure sufficient flash space (1MB minimum)
+4. **Cluster Split**: Check network connectivity between nodes
+
+### Debug Commands
+
+Useful debugging commands:
+
+```bash
+# Check network connectivity
+ping 192.168.1.100
+
+# Test UDP port
+nc -u 192.168.1.100 4210
+
+# Check HTTP API
+curl -v http://192.168.1.100/api/node/status
+
+# Monitor system resources
+./ctl.sh cluster members | jq '.members[] | {hostname, status, freeHeap: .resources.freeHeap}'
+```
+
+### Performance Issues
+
+Common performance problems:
+
+- **Memory Leaks**: Monitor free heap over time
+- **Network Congestion**: Check discovery intervals
+- **Task Overload**: Review task execution intervals
+- **WiFi Interference**: Check channel and signal strength
+
+## Best Practices
+
+### Code Organization
+
+1. **Modular Design**: Keep components loosely coupled
+2. **Clear Interfaces**: Define clear APIs between components
+3. **Error Handling**: Implement proper error handling and logging
+4. **Resource Management**: Efficient memory and resource usage
+
+### Testing Strategy
+
+1. **Unit Tests**: Test individual components
+2. **Integration Tests**: Test component interactions
+3. **System Tests**: Test complete system functionality
+4. **Performance Tests**: Monitor resource usage and performance
+
+### Deployment Strategy
+
+1. **Staged Rollout**: Deploy to test cluster first
+2. **Rollback Plan**: Maintain the ability to roll back updates
+3. **Monitoring**: Monitor system health during deployment
+4. **Documentation**: Keep deployment procedures updated
+
+## Related Documentation
+
+- **[Architecture Guide](./Architecture.md)** - System architecture overview
+- **[Task Management](./TaskManagement.md)** - Background task system
+- **[API Reference](./API.md)** - REST API documentation
+- **[OpenAPI Specification](../api/)** - Machine-readable API specification 
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,85 @@
+# SPORE Documentation
+
+This folder contains comprehensive documentation for the SPORE embedded system.
+
+## Available Documentation
+
+### 📖 [API.md](./API.md)
+Complete API reference with detailed endpoint documentation, examples, and integration guides.
+
+**Includes:**
+- API endpoint specifications
+- Request/response examples
+- HTTP status codes
+- Integration examples (Python, JavaScript)
+- Task management workflows
+- Cluster monitoring examples
+
+### 📖 [TaskManager.md](./TaskManager.md)
+Comprehensive guide to the TaskManager system for background task management.
+
+**Includes:**
+- Basic usage examples
+- Advanced binding techniques
+- Task status monitoring
+- API integration details
+- Performance considerations
+
+### 📖 [TaskManagement.md](./TaskManagement.md)
+Complete guide to the task management system with examples and best practices.
+
+**Includes:**
+- Task registration methods (std::bind, lambdas, functions)
+- Task control and lifecycle management
+- Remote task management via API
+- Performance considerations and best practices
+- Migration guides and compatibility information
+
+### 📖 [Architecture.md](./Architecture.md)
+Comprehensive system architecture and implementation details.
+
+**Includes:**
+- Core component descriptions
+- Auto discovery protocol details
+- Task scheduling system
+- Event system architecture
+- Resource monitoring
+- Performance characteristics
+- Security and scalability considerations
+
+### 📖 [Development.md](./Development.md)
+Complete development and deployment guide.
+
+**Includes:**
+- PlatformIO configuration
+- Build and flash instructions
+- OTA update procedures
+- Cluster management commands
+- Development workflow
+- Troubleshooting guide
+- Best practices
+
+## Quick Links
+
+- **Main Project**: [../README.md](../README.md)
+- **OpenAPI Specification**: [../api/](../api/)
+- **Source Code**: [../src/](../src/)
+
+## Contributing
+
+When adding new documentation:
+
+1. Create a new `.md` file in this folder
+2. Use clear, descriptive filenames
+3. Include practical examples and code snippets
+4. Update this README.md to reference new files
+5. Follow the existing documentation style
+
+## Documentation Style Guide
+
+- Use clear, concise language
+- Include practical examples
+- Use code blocks with appropriate language tags
+- Include links to related documentation
+- Use emojis sparingly for visual organization
+- Keep README.md files focused and scoped 
--- a/docs/TaskManagement.md
+++ b/docs/TaskManagement.md
@@ -0,0 +1,348 @@
+# Task Management System
+
+SPORE includes a TaskManager that registers, configures, and controls background tasks through a single interface, which keeps scheduling logic out of the main application code.
+
+## Overview
+
+The TaskManager system provides:
+- **Easy Task Registration**: Simple API for adding new tasks with configurable intervals
+- **Dynamic Control**: Enable/disable tasks at runtime
+- **Interval Management**: Change task execution frequency on the fly
+- **Status Monitoring**: View task status and configuration
+- **Automatic Lifecycle**: Tasks are automatically managed and executed
+
+## Basic Usage
+
+```cpp
+#include "TaskManager.h"
+
+// Create task manager
+TaskManager taskManager(ctx);
+
+// Register tasks
+taskManager.registerTask("heartbeat", 2000, heartbeatFunction);
+taskManager.registerTask("maintenance", 30000, maintenanceFunction);
+
+// Initialize and start all tasks
+taskManager.initialize();
+```
+
+## Task Registration Methods
+
+### Using std::bind with Member Functions (Recommended)
+
+```cpp
+#include <functional>
+#include "TaskManager.h"
+
+class MyService {
+public:
+    void sendHeartbeat() {
+        Serial.println("Service heartbeat");
+    }
+    
+    void performMaintenance() {
+        Serial.println("Running maintenance");
+    }
+};
+
+MyService service;
+TaskManager taskManager(ctx);
+
+// Register member functions using std::bind
+taskManager.registerTask("heartbeat", 2000, 
+                       std::bind(&MyService::sendHeartbeat, &service));
+taskManager.registerTask("maintenance", 30000, 
+                       std::bind(&MyService::performMaintenance, &service));
+
+// Initialize and start all tasks
+taskManager.initialize();
+```
+
+### Using Lambda Functions
+
+```cpp
+// Register lambda functions directly
+taskManager.registerTask("counter", 1000, []() {
+    static int count = 0;
+    Serial.printf("Count: %d\n", ++count);
+});
+
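+// Lambda with capture (by value, so the callback keeps its own copy
+// and never reads a variable that has gone out of scope)
+uint32_t threshold = 100;  // bytes
+taskManager.registerTask("monitor", 5000, [threshold]() {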
+    if (ESP.getFreeHeap() < threshold) {
+        Serial.println("Low memory warning!");
+    }
+});
+```
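+
+A registered callback runs long after `registerTask` returns, so anything a lambda captures by reference must outlive the task. The sketch below illustrates the difference in plain C++ on a host machine, without Arduino headers. `HeapCheck`, `makeCheckByValue`, and `makeCheckByReference` are hypothetical names used only for illustration and are not part of TaskManager.
+
+```cpp
+#include <functional>
+
+// Hypothetical check for illustration: true when `freeHeap` is below the limit.
+using HeapCheck = std::function<bool(unsigned long)>;
+
+// Captures `threshold` by value: the callback owns its own copy,
+// so it stays valid after this function returns.
+HeapCheck makeCheckByValue() {
+    unsigned long threshold = 100;
+    return [threshold](unsigned long freeHeap) { return freeHeap < threshold; };
+}
+
+// Captures by reference: only safe when the caller guarantees that
+// `threshold` outlives the returned callback (for example a global or a
+// long-lived member). The callback then sees later changes to the variable.
+HeapCheck makeCheckByReference(unsigned long& threshold) {
+    return [&threshold](unsigned long freeHeap) { return freeHeap < threshold; };
+}
+```
+
+Capturing by value is the safer default for task callbacks. Capture by reference only when the task is meant to observe a long-lived variable that changes over time.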
+
+### Complex Task Registration
+
+```cpp
+class NetworkManager {
+public:
+    void checkConnection() { /* ... */ }
+    void sendData(String data) { /* ... */ }
+};
+
+NetworkManager network;
+
+// Multiple operations in one task
+// (`network` must outlive the task, since it is captured by reference)
+taskManager.registerTask("network_ops", 3000, [&network]() {
+    network.checkConnection();
+    network.sendData("status_update");
+});
+```
+
+## Task Control API
+
+### Basic Operations
+
+```cpp
+// Enable/disable tasks
+taskManager.enableTask("heartbeat");
+taskManager.disableTask("maintenance");
+
+// Change intervals
+taskManager.setTaskInterval("heartbeat", 5000);  // 5 seconds
+
+// Check status
+bool isEnabled = taskManager.isTaskEnabled("heartbeat");
+unsigned long interval = taskManager.getTaskInterval("heartbeat");
+
+// Print all task statuses
+taskManager.printTaskStatus();
+```
+
+### Task Lifecycle Management
+
+```cpp
+// Start/stop tasks
+taskManager.startTask("heartbeat");
+taskManager.stopTask("discovery");
+
+// Bulk operations
+taskManager.enableAllTasks();
+taskManager.disableAllTasks();
+```
+
+## Task Configuration Options
+
+When registering tasks, you can specify:
+
+- **Name**: Unique identifier for the task
+- **Interval**: Execution frequency in milliseconds
+- **Callback**: Function, bound method, or lambda to execute
+- **Enabled**: Whether the task starts enabled (default: true)
+- **AutoStart**: Whether to start automatically (default: true)
+
+```cpp
+// Traditional function (the trailing arguments are enabled = true, autoStart = false)
+taskManager.registerTask("delayed_task", 5000, taskFunction, true, false);
+
+// Member function with std::bind
+taskManager.registerTask("service_task", 3000, 
+                       std::bind(&Service::method, &instance), true, false);
+
+// Lambda function
+taskManager.registerTask("lambda_task", 2000, 
+                       []() { Serial.println("Lambda!"); }, true, false);
+```
+
+## Adding Custom Tasks
+
+### Method 1: Using std::bind (Recommended)
+
+1. **Create your service class**:
+   ```cpp
+   class SensorService {
+   public:
+       void readTemperature() {
+           // Read sensor logic
+           Serial.println("Reading temperature");
+       }
+       
+       void calibrateSensors() {
+           // Calibration logic
+           Serial.println("Calibrating sensors");
+       }
+   };
+   ```
+
+2. **Register with TaskManager**:
+   ```cpp
+   SensorService sensors;
+   
+   taskManager.registerTask("temp_read", 1000, 
+                          std::bind(&SensorService::readTemperature, &sensors));
+   taskManager.registerTask("calibrate", 60000, 
+                          std::bind(&SensorService::calibrateSensors, &sensors));
+   ```
+
+### Method 2: Traditional Functions
+
+1. **Define your task function**:
+   ```cpp
+   void myCustomTask() {
+       // Your task logic here
+       Serial.println("Custom task executed");
+   }
+   ```
+
+2. **Register with TaskManager**:
+   ```cpp
+   taskManager.registerTask("my_task", 10000, myCustomTask);
+   ```
+
+## Enhanced TaskManager Capabilities
+
+### Task Status Monitoring
+- **Real-time Status**: Check enabled/disabled state and running status
+- **Performance Metrics**: Monitor execution intervals and timing
+- **System Integration**: View task status alongside system resources
+- **Bulk Operations**: Get status of all tasks at once
+
+### Task Control Features
+- **Runtime Control**: Enable/disable tasks without restart
+- **Dynamic Intervals**: Change task execution frequency on-the-fly
+- **Individual Status**: Get detailed information about specific tasks
+- **Health Monitoring**: Track task health and system resources
+
+## Remote Task Management
+
+The TaskManager integrates with the API server to provide comprehensive remote task control and monitoring.
+
+### Task Status Overview
+
+Get a complete overview of all tasks and system status:
+
+```bash
+# Get comprehensive task status
+curl http://192.168.1.100/api/tasks/status
+```
+
+**Response includes:**
+- **Summary**: Total task count and active task count
+- **Task Details**: Individual status for each task (name, interval, enabled, running, auto-start)
+- **System Info**: Free heap memory and uptime
+
+**Example Response:**
+```json
+{
+  "summary": {
+    "totalTasks": 6,
+    "activeTasks": 5
+  },
+  "tasks": [
+    {
+      "name": "discovery_send",
+      "interval": 1000,
+      "enabled": true,
+      "running": true,
+      "autoStart": true
+    },
+    {
+      "name": "heartbeat",
+      "interval": 2000,
+      "enabled": true,
+      "running": true,
+      "autoStart": true
+    }
+  ],
+  "system": {
+    "freeHeap": 48748,
+    "uptime": 12345
+  }
+}
+```
+
+### Individual Task Control
+
+Control individual tasks with various actions:
+
+```bash
+# Control tasks
+curl -X POST http://192.168.1.100/api/tasks/control \
+  -d "task=heartbeat&action=disable"
+
+# Get detailed status for a specific task
+curl -X POST http://192.168.1.100/api/tasks/control \
+  -d "task=discovery_send&action=status"
+```
+
+**Available Actions:**
+- `enable` - Enable a task
+- `disable` - Disable a task  
+- `start` - Start a task
+- `stop` - Stop a task
+- `status` - Get detailed status for a specific task
+
+**Task Status Response:**
+```json
+{
+  "success": true,
+  "message": "Task status retrieved",
+  "task": "discovery_send",
+  "action": "status",
+  "taskDetails": {
+    "name": "discovery_send",
+    "enabled": true,
+    "running": true,
+    "interval": 1000,
+    "system": {
+      "freeHeap": 48748,
+      "uptime": 12345
+    }
+  }
+}
+```
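+
+The five documented actions arrive as plain strings in the `action` form parameter. As a rough sketch of how such a parameter could be validated before it reaches the task registry, the following standard C++ maps each string to an enum and rejects anything else. `TaskAction`, `parseAction`, and `changesTaskState` are hypothetical helpers for illustration; the actual server-side handling is not shown in this document.
+
+```cpp
+#include <string>
+
+// The five actions documented for /api/tasks/control, plus a fallback.
+enum class TaskAction { Enable, Disable, Start, Stop, Status, Unknown };
+
+// Maps the `action` form parameter to an enum value.
+// Any string outside the documented set becomes Unknown.
+TaskAction parseAction(const std::string& action) {
+    if (action == "enable")  return TaskAction::Enable;
+    if (action == "disable") return TaskAction::Disable;
+    if (action == "start")   return TaskAction::Start;
+    if (action == "stop")    return TaskAction::Stop;
+    if (action == "status")  return TaskAction::Status;
+    return TaskAction::Unknown;
+}
+
+// `status` only reads task state; the other four documented actions modify it.
+bool changesTaskState(TaskAction action) {
+    switch (action) {
+        case TaskAction::Enable:
+        case TaskAction::Disable:
+        case TaskAction::Start:
+        case TaskAction::Stop:
+            return true;
+        default:
+            return false;
+    }
+}
+```
+
+Separating parsing from execution this way lets a handler answer unknown actions with an error before touching any task.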
+
+## Performance Considerations
+
+- `std::bind` returns a callable object, so invoking it can cost slightly more than calling a plain function pointer
+- For tasks with very short intervals, measure this cost rather than assuming it is free
+- For most embedded applications the overhead is negligible
+- The TaskManager keeps registered callbacks in a registry, so each callback is bound once at registration rather than on every run
+
+## Best Practices
+
+1. **Use std::bind for member functions**: Cleaner than wrapper functions
+2. **Group related tasks**: Register multiple related operations in a single task
+3. **Monitor task health**: Use the status API to monitor task performance
+4. **Plan intervals carefully**: Balance responsiveness with system resources
+5. **Use descriptive names**: Make task names clear and meaningful
+
+## Migration from Wrapper Functions
+
+### Before (with wrapper functions):
+```cpp
+void discoverySendTask() { cluster.sendDiscovery(); }
+void discoveryListenTask() { cluster.listenForDiscovery(); }
+
+taskManager.registerTask("discovery_send", interval, discoverySendTask);
+taskManager.registerTask("discovery_listen", interval, discoveryListenTask);
+```
+
+### After (with std::bind):
+```cpp
+taskManager.registerTask("discovery_send", interval, 
+                       std::bind(&ClusterManager::sendDiscovery, &cluster));
+taskManager.registerTask("discovery_listen", interval, 
+                       std::bind(&ClusterManager::listenForDiscovery, &cluster));
+```
+
+## Compatibility
+
+- The new `std::bind` support is fully backward compatible
+- Existing code using function pointers will continue to work
+- You can mix both approaches in the same project
+- All existing TaskManager methods remain unchanged
+- New status monitoring methods are additive and don't break existing functionality
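+
+One way to see why function pointers, `std::bind` results, and lambdas can coexist is the minimal sketch below, written in standard C++ without Arduino headers. It assumes callbacks are stored as `std::function<void()>`. The names `TaskCallback`, `registry`, `Service`, and `registerAndRunAll` are hypothetical and do not describe the actual TaskManager internals.
+
+```cpp
+#include <functional>
+#include <map>
+#include <string>
+
+// Assumption: callbacks are stored as std::function<void()>, which accepts
+// plain functions, std::bind results, and lambdas alike.
+using TaskCallback = std::function<void()>;
+
+// Hypothetical stand-in for the task registry, keyed by task name.
+std::map<std::string, TaskCallback> registry;
+
+int calls = 0;  // counts invocations so the effect is observable
+
+void plainTask() { ++calls; }
+
+struct Service {
+    int ticks = 0;
+    void tick() { ++ticks; ++calls; }
+};
+
+Service service;
+
+// Registers one callable of each kind, runs every entry once,
+// and returns the total number of invocations so far.
+int registerAndRunAll() {
+    registry["plain"] = plainTask;                            // function pointer
+    registry["bound"] = std::bind(&Service::tick, &service);  // bound member function
+    registry["lambda"] = [] { ++calls; };                     // lambda
+    for (auto& entry : registry) {
+        entry.second();
+    }
+    return calls;
+}
+```
+
+Because all three callables convert to the same `std::function<void()>` type, the code that runs them does not need to know how each one was created.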
+
+## Related Documentation
+
+- **[TaskManager API Reference](./TaskManager.md)** - Detailed API documentation
+- **[API Reference](./API.md)** - REST API for remote task management
+- **[OpenAPI Specification](../api/)** - Machine-readable API specification 
--- a/docs/TaskManager.md
+++ b/docs/TaskManager.md
@@ -1,180 +0,0 @@
-# TaskManager
-
-## Basic Usage
-
-### Including Required Headers
-
-```cpp
-#include <functional>  // For std::bind
-#include "TaskManager.h"
-```
-
-### Registering Member Functions
-
-```cpp
-class MyClass {
-public:
-    void myMethod() {
-        Serial.println("My method called");
-    }
-    
-    void methodWithParams(int value, String text) {
-        Serial.printf("Method called with %d and %s\n", value, text.c_str());
-    }
-};
-
-// Create an instance
-MyClass myObject;
-
-// Register member function
-taskManager.registerTask("my_task", 1000, 
-                       std::bind(&MyClass::myMethod, &myObject));
-
-// Register method with parameters
-taskManager.registerTask("param_task", 2000, 
-                       std::bind(&MyClass::methodWithParams, &myObject, 42, "hello"));
-```
-
-### Registering Lambda Functions
-
-```cpp
-// Simple lambda
-taskManager.registerTask("lambda_task", 3000, []() {
-    Serial.println("Lambda executed");
-});
-
-// Lambda with capture
-int counter = 0;
-taskManager.registerTask("counter_task", 4000, [&counter]() {
-    counter++;
-    Serial.printf("Counter: %d\n", counter);
-});
-
-// Lambda that calls multiple methods
-taskManager.registerTask("multi_task", 5000, [&myObject]() {
-    myObject.myMethod();
-    // Do other work...
-});
-```
-
-### Registering Global Functions
-
-```cpp
-void globalFunction() {
-    Serial.println("Global function called");
-}
-
-// Still supported for backward compatibility
-taskManager.registerTask("global_task", 6000, globalFunction);
-```
-
-## Advanced Examples
-
-### Binding to Different Object Types
-
-```cpp
-class NetworkManager {
-public:
-    void sendHeartbeat() { /* ... */ }
-    void checkConnection() { /* ... */ }
-};
-
-class SensorManager {
-public:
-    void readSensors() { /* ... */ }
-    void calibrate() { /* ... */ }
-};
-
-NetworkManager network;
-SensorManager sensors;
-
-// Bind to different objects
-taskManager.registerTask("heartbeat", 1000, 
-                       std::bind(&NetworkManager::sendHeartbeat, &network));
-taskManager.registerTask("sensor_read", 500, 
-                       std::bind(&SensorManager::readSensors, &sensors));
-```
-
-### Using std::placeholders for Complex Binding
-
-```cpp
-#include <functional>
-
-class ConfigManager {
-public:
-    void updateConfig(int interval, bool enabled) {
-        Serial.printf("Updating config: interval=%d, enabled=%d\n", interval, enabled);
-    }
-};
-
-ConfigManager config;
-
-// Use placeholders for complex parameter binding
-using namespace std::placeholders;
-taskManager.registerTask("config_update", 10000, 
-                       std::bind(&ConfigManager::updateConfig, &config, _1, _2));
-```
-
-### Conditional Task Execution
-
-```cpp
-class TaskController {
-public:
-    bool shouldExecute() {
-        return millis() % 10000 < 5000; // Execute only in first 5 seconds of each 10-second cycle
-    }
-    
-    void conditionalTask() {
-        if (shouldExecute()) {
-            Serial.println("Conditional task executed");
-        }
-    }
-};
-
-TaskController controller;
-
-taskManager.registerTask("conditional", 1000, 
-                       std::bind(&TaskController::conditionalTask, &controller));
-```
-
-## Benefits of Using std::bind
-
-1. **Cleaner Code**: No need for wrapper functions
-2. **Direct Binding**: Bind member functions directly to objects
-3. **Parameter Passing**: Easily pass parameters to bound methods
-4. **Lambda Support**: Use lambdas for complex logic
-5. **Type Safety**: Better type checking than function pointers
-6. **Flexibility**: Mix and match different callable types
-
-## Migration from Wrapper Functions
-
-### Before (with wrapper functions):
-```cpp
-void discoverySendTask() { cluster.sendDiscovery(); }
-void discoveryListenTask() { cluster.listenForDiscovery(); }
-
-taskManager.registerTask("discovery_send", interval, discoverySendTask);
-taskManager.registerTask("discovery_listen", interval, discoveryListenTask);
-```
-
-### After (with std::bind):
-```cpp
-taskManager.registerTask("discovery_send", interval, 
-                       std::bind(&ClusterManager::sendDiscovery, &cluster));
-taskManager.registerTask("discovery_listen", interval, 
-                       std::bind(&ClusterManager::listenForDiscovery, &cluster));
-```
-
-## Performance Considerations
-
-- `std::bind` creates a callable object that may have a small overhead compared to direct function pointers
-- For high-frequency tasks, consider the performance impact
-- The overhead is typically negligible for most embedded applications
-- The TaskManager stores bound functions efficiently in a registry
-
-## Compatibility
-
-- The new `std::bind` support is fully backward compatible
-- Existing code using function pointers will continue to work
-- You can mix both approaches in the same project
-- All existing TaskManager methods remain unchanged