SPORE Architecture & Implementation
System Overview
SPORE (SProcket ORchestration Engine) is a cluster engine for ESP8266 microcontrollers that provides automatic node discovery, health monitoring, and over-the-air updates in a distributed network environment.
Core Components
The system architecture consists of several key components working together:
Network Manager
- WiFi Connection Handling: Automatic WiFi STA/AP configuration
- Hostname Configuration: MAC-based hostname generation
- Fallback Management: Automatic access point creation if WiFi connection fails
Cluster Manager
- Node Discovery: UDP-based automatic node detection
- Member List Management: Dynamic cluster membership tracking
- Health Monitoring: Continuous node status checking
- Resource Tracking: Monitor node resources and capabilities
API Server
- HTTP API Server: RESTful API for cluster management
- Dynamic Endpoint Registration: Services register endpoints via `registerEndpoints(ApiServer&)`
- Service Registry: Track available services across the cluster
- Service Lifecycle: Services register both endpoints and tasks through unified interface
Task Scheduler
- Cooperative Multitasking: Background task management system (`TaskManager`)
- Service Task Registration: Services register tasks via `registerTasks(TaskManager&)`
- Task Lifecycle Management: Enable/disable tasks and set intervals at runtime
- Execution Model: Tasks run in `Spore::loop()` when their interval elapses
Node Context
- Central Context: Shared resources and configuration
- Event System: Local and cluster-wide event publishing/subscription
- Resource Management: Centralized resource allocation and monitoring
Auto Discovery Protocol
The cluster uses a UDP-based discovery protocol for automatic node detection:
Discovery Process
- Discovery Broadcast: Nodes periodically send UDP packets on port `udp_port` (default 4210)
- Response Handling: Nodes respond with `CLUSTER_RESPONSE:<hostname>`
- Member Management: Discovered nodes are added/updated in the cluster
- Node Info via UDP: Heartbeat triggers peers to send `CLUSTER_NODE_INFO:<hostname>:<json>`
Protocol Details
- UDP Port: 4210 (configurable via `Config.udp_port`)
- Discovery Message: `CLUSTER_DISCOVERY`
- Response Message: `CLUSTER_RESPONSE`
- Heartbeat Message: `CLUSTER_HEARTBEAT`
- Node Info Message: `CLUSTER_NODE_INFO:<hostname>:<json>`
- Broadcast Address: 255.255.255.255
- Discovery Interval: `Config.discovery_interval_ms` (default 1000 ms)
- Listen Interval: `Config.cluster_listen_interval_ms` (default 10 ms)
- Heartbeat Interval: `Config.heartbeat_interval_ms` (default 5000 ms)
Message Formats
- Discovery: `CLUSTER_DISCOVERY`
  - Sender: any node, broadcast to 255.255.255.255:`udp_port`
  - Purpose: announce presence and solicit peer identification
- Response: `CLUSTER_RESPONSE:<hostname>`
  - Sender: node receiving a discovery; unicast to requester IP
  - Purpose: provide hostname so requester can register/update member
- Heartbeat: `CLUSTER_HEARTBEAT:<hostname>`
  - Sender: each node, broadcast to 255.255.255.255:`udp_port` on interval
  - Purpose: prompt peers to reply with their node info and confirm liveness
- Node Info: `CLUSTER_NODE_INFO:<hostname>:<json>`
  - Sender: node receiving a heartbeat; unicast to heartbeat sender IP
  - JSON fields: freeHeap, chipId, sdkVersion, cpuFreqMHz, flashChipSize, optional labels
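The prefix dispatch for the message formats above can be sketched in plain C++. This is an illustrative sketch: the type names, struct layout, and helper are assumptions, not the actual SPORE parser.

```cpp
#include <string>

// Hypothetical sketch of prefix-based dispatch for the cluster UDP messages.
enum class MsgType { Discovery, Response, Heartbeat, NodeInfo, Unknown };

struct ParsedMsg {
    MsgType type = MsgType::Unknown;
    std::string hostname;  // empty for CLUSTER_DISCOVERY
    std::string json;      // only set for CLUSTER_NODE_INFO
};

ParsedMsg parsePacket(const std::string& pkt) {
    ParsedMsg m;
    auto after = [&](const std::string& prefix) { return pkt.substr(prefix.size()); };
    if (pkt == "CLUSTER_DISCOVERY") {
        m.type = MsgType::Discovery;
    } else if (pkt.rfind("CLUSTER_RESPONSE:", 0) == 0) {
        m.type = MsgType::Response;
        m.hostname = after("CLUSTER_RESPONSE:");
    } else if (pkt.rfind("CLUSTER_HEARTBEAT:", 0) == 0) {
        m.type = MsgType::Heartbeat;
        m.hostname = after("CLUSTER_HEARTBEAT:");
    } else if (pkt.rfind("CLUSTER_NODE_INFO:", 0) == 0) {
        std::string rest = after("CLUSTER_NODE_INFO:");
        size_t colon = rest.find(':');  // assumes hostname contains no ':'
        if (colon != std::string::npos) {
            m.type = MsgType::NodeInfo;
            m.hostname = rest.substr(0, colon);
            m.json = rest.substr(colon + 1);
        }
    }
    return m;
}
```

Unrecognized packets fall through as `Unknown`, which matches the listener's silent-ignore behavior for unrelated broadcast traffic.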
Discovery Flow
- Sender broadcasts `CLUSTER_DISCOVERY`
- Each receiver responds with `CLUSTER_RESPONSE:<hostname>` to the sender IP
- Sender registers/updates the node using hostname and source IP
Heartbeat Flow
- A node broadcasts `CLUSTER_HEARTBEAT:<hostname>`
- Each receiver replies with `CLUSTER_NODE_INFO:<hostname>:<json>` to the heartbeat sender IP
- The sender:
  - Ensures the node exists or creates it with `hostname` and sender IP
  - Parses JSON and updates resources, labels, `status = ACTIVE`, `lastSeen = now`
  - Sets `latency = now - lastHeartbeatSentAt` (per-node, measured at heartbeat origin)
Listener Behavior
The `cluster_listen` task parses one UDP packet per run and dispatches by prefix:
- Discovery → send `CLUSTER_RESPONSE`
- Heartbeat → send `CLUSTER_NODE_INFO` JSON
- Response → add/update node using provided hostname and source IP
- Node Info → update resources/status/labels and record latency
Timing and Intervals
- UDP Port: `Config.udp_port` (default 4210)
- Discovery Interval: `Config.discovery_interval_ms` (default 1000 ms)
- Listen Interval: `Config.cluster_listen_interval_ms` (default 10 ms)
- Heartbeat Interval: `Config.heartbeat_interval_ms` (default 5000 ms)
Node Status Categories
Nodes are automatically categorized by their activity:
- ACTIVE: time since lastSeen < `node_inactive_threshold_ms` (default 10 s)
- INACTIVE: time since lastSeen ≥ `node_inactive_threshold_ms` but < `node_dead_threshold_ms` (default 120 s)
- DEAD: time since lastSeen ≥ `node_dead_threshold_ms`
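The thresholds above reduce to a small classification function. A sketch in standard C++; the enum, constant, and function names are assumptions rather than the actual SPORE code.

```cpp
#include <cstdint>

// Illustrative node status classification based on time since lastSeen.
enum class NodeStatus { ACTIVE, INACTIVE, DEAD };

constexpr uint32_t kInactiveThresholdMs = 10000;   // node_inactive_threshold_ms
constexpr uint32_t kDeadThresholdMs     = 120000;  // node_dead_threshold_ms

NodeStatus classify(uint32_t nowMs, uint32_t lastSeenMs) {
    uint32_t age = nowMs - lastSeenMs;  // unsigned subtraction tolerates millis() wraparound
    if (age < kInactiveThresholdMs) return NodeStatus::ACTIVE;
    if (age < kDeadThresholdMs)     return NodeStatus::INACTIVE;
    return NodeStatus::DEAD;  // DEAD nodes are candidates for purging by status_update
}
```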
Task Scheduling System
The system runs several background tasks at different intervals:
Core System Tasks
| Task | Interval (default) | Purpose |
|---|---|---|
| `cluster_discovery` | 1000 ms | Send UDP discovery packets |
| `cluster_listen` | 10 ms | Listen for discovery/heartbeat/node-info |
| `status_update` | 1000 ms | Update node status categories, purge dead |
| `heartbeat` | 5000 ms | Broadcast heartbeat and update local resources |
| `cluster_update_members_info` | 10000 ms | Reserved; no-op (info via UDP) |
| `print_members` | 5000 ms | Log current member list |
Task Management Features
- Dynamic Intervals: Change execution frequency on-the-fly
- Runtime Control: Enable/disable tasks without restart
- Status Monitoring: Real-time task health tracking
- Resource Integration: View task status with system resources
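The cooperative, interval-driven execution model described above can be sketched with a simplified `TaskManager`-like class. This is a minimal assumption-laden sketch, not the real TaskManager API.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal cooperative scheduler: tasks run from the main loop when due.
struct Task {
    std::string name;
    uint32_t intervalMs;
    uint32_t lastRunMs = 0;
    bool enabled = true;
    std::function<void()> fn;
};

class MiniTaskManager {
public:
    void registerTask(std::string name, uint32_t intervalMs, std::function<void()> fn) {
        tasks_.push_back({std::move(name), intervalMs, 0, true, std::move(fn)});
    }
    // Runtime control: enable/disable without restart.
    void setEnabled(const std::string& name, bool on) {
        for (auto& t : tasks_) if (t.name == name) t.enabled = on;
    }
    // Called from loop(); runs each due task once, no preemption.
    void run(uint32_t nowMs) {
        for (auto& t : tasks_) {
            if (t.enabled && nowMs - t.lastRunMs >= t.intervalMs) {
                t.lastRunMs = nowMs;
                t.fn();
            }
        }
    }
private:
    std::vector<Task> tasks_;
};
```

Because nothing preempts a running task, long-running work must be split into short slices, which is why the real listener handles only one UDP packet per run.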
Event System
The `NodeContext` provides an event-driven architecture for system-wide communication:
Event Subscription
```cpp
// Subscribe to events
ctx.on("node_discovered", [](void* data) {
    NodeInfo* node = static_cast<NodeInfo*>(data);
    // Handle new node discovery
});

ctx.on("cluster_updated", [](void* data) {
    // Handle cluster membership changes
});
```
Event Publishing
```cpp
// Publish events
ctx.fire("node_discovered", &newNode);
ctx.fire("cluster_updated", &clusterData);
```
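The `on()`/`fire()` pattern can be sketched as a string-keyed callback registry. This is an assumption about the mechanism, not the actual `NodeContext` internals.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Minimal event bus: subscribers are stored per event name and invoked in
// registration order when the event fires.
class EventBus {
public:
    using Handler = std::function<void(void*)>;

    void on(const std::string& event, Handler h) {
        handlers_[event].push_back(std::move(h));
    }

    void fire(const std::string& event, void* data) {
        auto it = handlers_.find(event);
        if (it == handlers_.end()) return;  // no subscribers: silently ignore
        for (auto& h : it->second) h(data);
    }

private:
    std::map<std::string, std::vector<Handler>> handlers_;
};
```

The `void*` payload keeps the bus type-agnostic on a memory-constrained target, at the cost of requiring each subscriber to know the concrete payload type.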
Available Events
- `node_discovered`: New node added or local node refreshed
Resource Monitoring
Each node tracks comprehensive system resources:
System Resources
- Free Heap Memory: Available RAM in bytes
- Chip ID: Unique ESP8266 identifier
- SDK Version: ESP8266 firmware version
- CPU Frequency: Operating frequency in MHz
- Flash Chip Size: Total flash storage in bytes
API Endpoint Registry
- Dynamic Discovery: Automatically detect available endpoints
- Method Information: HTTP method (GET, POST, etc.)
- Service Catalog: Complete service registry across cluster
Health Metrics
- Response Time: API response latency
- Uptime: System uptime in milliseconds
- Connection Status: Network connectivity health
- Resource Utilization: Memory and CPU usage
WiFi Fallback System
The system includes automatic WiFi fallback for robust operation:
Fallback Process
- Primary Connection: Attempts to connect to configured WiFi network
- Connection Failure: If connection fails, creates an access point
- Hostname Generation: Automatically generates hostname from MAC address
- Service Continuity: Maintains cluster functionality in fallback mode
Configuration
- Hostname: Derived from MAC (`esp-<mac>`) and assigned to `ctx.hostname`
- AP Mode: If STA connection fails, device switches to AP mode with configured SSID/password
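The `esp-<mac>` hostname derivation might look like the following; the exact formatting (lowercase hex, no separators) is an assumption, not taken from the SPORE source.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Build a hostname like "esp-aabbcc010203" from the 6-byte station MAC.
std::string hostnameFromMac(const uint8_t mac[6]) {
    char buf[sizeof("esp-aabbccddeeff")];  // 16 chars + NUL
    std::snprintf(buf, sizeof(buf), "esp-%02x%02x%02x%02x%02x%02x",
                  mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return std::string(buf);
}
```

Deriving the hostname from the MAC guarantees uniqueness per device without any coordination, which is what lets nodes self-identify in discovery responses.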
Cluster Topology
Node Types
- Master Node: Primary cluster coordinator (if applicable)
- Worker Nodes: Standard cluster members
- Edge Nodes: Network edge devices
Network Architecture
- UDP broadcast-based discovery and heartbeats on local subnet
- Optional HTTP polling (disabled by default; node info exchanged via UDP)
Data Flow
Node Discovery
- UDP Broadcast: Nodes broadcast discovery packets on port 4210
- UDP Response: Receiving nodes respond with hostname
- Registration: Discovered nodes are added to local cluster member list
Health Monitoring
- Periodic Checks: Cluster manager updates node status categories
- Status Collection: Each node updates resources via UDP node-info messages
Task Management
- Scheduling: `TaskManager` executes registered tasks at configured intervals
- Execution: Tasks run cooperatively in the main loop without preemption
- Monitoring: Task status is exposed via REST (`/api/tasks/status`)
Performance Characteristics
Memory Usage
- Base System: ~15-20KB RAM (device dependent)
- Per Task: ~100-200 bytes per task
- Cluster Members: ~50-100 bytes per member
- API Endpoints: ~20-30 bytes per endpoint
Network Overhead
- Discovery Packets: 64 bytes every 1 second
- Health Checks: ~200-500 bytes every 1 second
- Status Updates: ~1-2KB per node
- API Responses: Varies by endpoint (typically 100B-5KB)
Processing Overhead
- Task Execution: Minimal overhead per task
- Event Processing: Fast event dispatch
- JSON Parsing: Efficient ArduinoJson usage
- Network I/O: Asynchronous operations
Security Considerations
Current Implementation
- Network Access: Local network only (no internet exposure)
- Authentication: None currently implemented; LAN-only access assumed
- Data Validation: Basic input validation
- Resource Limits: Memory and processing constraints
Future Enhancements
- TLS/SSL: Encrypted communications
- API Keys: Authentication for API access
- Access Control: Role-based permissions
- Audit Logging: Security event tracking
Scalability
Cluster Size Limits
- Theoretical: Up to 255 nodes (IP subnet limit)
- Practical: 20-50 nodes for optimal performance
- Memory Constraint: ~8KB available for member tracking
- Network Constraint: UDP packet size limits
Performance Scaling
- Linear Scaling: Most operations scale linearly with node count
- Discovery Overhead: Increases with cluster size
- Health Monitoring: Heartbeat and node-info exchange over UDP (HTTP polling optional, disabled by default)
- Task Management: Independent per-node execution
Configuration Management
SPORE implements a persistent configuration system that manages device settings across reboots and provides runtime reconfiguration capabilities.
Configuration Architecture
The configuration system consists of several key components:
- Config Class: Central configuration management with default constants
- LittleFS Storage: Persistent file-based storage (`/config.json`)
- Runtime Updates: Live configuration changes via HTTP API
- Automatic Persistence: Configuration changes are automatically saved
Configuration Categories
| Category | Description | Examples |
|---|---|---|
| WiFi Configuration | Network connection settings | SSID, password, timeouts |
| Network Configuration | Network service settings | UDP port, API server port |
| Cluster Configuration | Cluster management settings | Discovery intervals, heartbeat timing |
| Node Status Thresholds | Health monitoring thresholds | Active/inactive/dead timeouts |
| System Configuration | Core system settings | Restart delay, JSON document size |
| Memory Management | Resource management settings | Memory thresholds, HTTP request limits |
Configuration Lifecycle
- Boot Process: Load configuration from `/config.json` or use defaults
- Runtime Updates: Configuration changes via HTTP API
- Persistent Storage: Changes automatically saved to LittleFS
- Service Integration: Configuration applied to all system services
Default Value Management
All default values are defined as `constexpr` constants in the `Config` class:

```cpp
static constexpr const char* DEFAULT_WIFI_SSID = "shroud";
static constexpr uint16_t DEFAULT_UDP_PORT = 4210;
static constexpr unsigned long DEFAULT_HEARTBEAT_INTERVAL_MS = 5000;
```
This ensures:
- Single Source of Truth: All defaults defined once
- Type Safety: Compile-time type checking
- Maintainability: Easy to update default values
- Consistency: Same defaults used in `setDefaults()` and `loadFromFile()`
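The single-source-of-truth pattern can be sketched as follows; the member names and the shape of `setDefaults()` are illustrative assumptions about the `Config` class.

```cpp
#include <cstdint>

// Sketch: runtime fields are initialized from, and reset to, the same
// constexpr constants, so defaults are defined exactly once.
struct Config {
    static constexpr uint16_t DEFAULT_UDP_PORT = 4210;
    static constexpr unsigned long DEFAULT_HEARTBEAT_INTERVAL_MS = 5000;

    uint16_t udp_port = DEFAULT_UDP_PORT;
    unsigned long heartbeat_interval_ms = DEFAULT_HEARTBEAT_INTERVAL_MS;

    void setDefaults() {
        udp_port = DEFAULT_UDP_PORT;
        heartbeat_interval_ms = DEFAULT_HEARTBEAT_INTERVAL_MS;
    }
};
```

A loader that falls back per-field to the same constants when a key is missing from `/config.json` keeps boot-time and reset-time behavior identical.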
Environment Variables
```sh
# API node IP for cluster management
export API_NODE=192.168.1.100
```
PlatformIO Configuration
The project uses PlatformIO with the following configuration:
- Framework: Arduino
- Board: ESP-01 with 1MB flash
- Upload Speed: 115200 baud
- Flash Mode: DOUT (required for ESP-01S)
Dependencies
The project requires the following libraries:
- `esp32async/ESPAsyncWebServer@^3.8.0` - HTTP API server
- `bblanchon/ArduinoJson@^7.4.2` - JSON processing
- `arkhipenko/TaskScheduler@^3.8.5` - Cooperative multitasking
Development Workflow
Building
Build the firmware for specific chip:
```sh
./ctl.sh build target esp01_1m
```
Flashing
Flash firmware to a connected device:
```sh
./ctl.sh flash target esp01_1m
```
Over-The-Air Updates
Update a specific node:
```sh
./ctl.sh ota update 192.168.1.100 esp01_1m
```
Update all nodes in the cluster:
```sh
./ctl.sh ota all esp01_1m
```
Cluster Management
View cluster members:
```sh
./ctl.sh cluster members
```
Troubleshooting
Common Issues
- Discovery Failures: Check UDP port 4210 is not blocked
- WiFi Connection: Verify SSID/password in Config.cpp
- OTA Updates: Ensure sufficient flash space (1MB minimum)
- Cluster Split: Check network connectivity between nodes
Debug Output
Enable serial monitoring to see cluster activity:
```sh
pio device monitor
```
Performance Monitoring
- Memory Usage: Monitor free heap with `/api/node/status`
- Task Health: Check task status with `/api/tasks/status`
- Cluster Health: Monitor member status with `/api/cluster/members`
- Network Latency: Track response times in cluster data
Related Documentation
- Configuration Management - Persistent configuration system
- WiFi Configuration - WiFi setup and reconfiguration process
- Task Management - Background task system
- API Reference - REST API documentation
- TaskManager API - TaskManager class reference
- OpenAPI Specification - Machine-readable API specification