feature/improved-cluster-forming #15
3
ctl.sh
@@ -372,11 +372,10 @@ function node {
|
|||||||
|
|
||||||
# Cluster Configuration
|
# Cluster Configuration
|
||||||
echo "=== Cluster Configuration ==="
|
echo "=== Cluster Configuration ==="
|
||||||
echo "Discovery Interval: $(echo "$response_body" | jq -r '.cluster.discovery_interval_ms // "N/A"') ms"
|
|
||||||
echo "Heartbeat Interval: $(echo "$response_body" | jq -r '.cluster.heartbeat_interval_ms // "N/A"') ms"
|
echo "Heartbeat Interval: $(echo "$response_body" | jq -r '.cluster.heartbeat_interval_ms // "N/A"') ms"
|
||||||
echo "Cluster Listen Interval: $(echo "$response_body" | jq -r '.cluster.cluster_listen_interval_ms // "N/A"') ms"
|
echo "Cluster Listen Interval: $(echo "$response_body" | jq -r '.cluster.cluster_listen_interval_ms // "N/A"') ms"
|
||||||
echo "Status Update Interval: $(echo "$response_body" | jq -r '.cluster.status_update_interval_ms // "N/A"') ms"
|
echo "Status Update Interval: $(echo "$response_body" | jq -r '.cluster.status_update_interval_ms // "N/A"') ms"
|
||||||
echo "Member Info Update Interval: $(echo "$response_body" | jq -r '.cluster.member_info_update_interval_ms // "N/A"') ms"
|
echo "Node Update Broadcast Interval: $(echo "$response_body" | jq -r '.cluster.node_update_broadcast_interval_ms // "N/A"') ms"
|
||||||
echo "Print Interval: $(echo "$response_body" | jq -r '.cluster.print_interval_ms // "N/A"') ms"
|
echo "Print Interval: $(echo "$response_body" | jq -r '.cluster.print_interval_ms // "N/A"') ms"
|
||||||
echo ""
|
echo ""
|
||||||
|
|
||||||
|
|||||||
@@ -52,57 +52,50 @@ The cluster uses a UDP-based discovery protocol for automatic node detection:
|
|||||||
- **UDP Port**: 4210 (configurable via `Config.udp_port`)
|
- **UDP Port**: 4210 (configurable via `Config.udp_port`)
|
||||||
- **Discovery Message**: `CLUSTER_DISCOVERY`
|
- **Discovery Message**: `CLUSTER_DISCOVERY`
|
||||||
- **Response Message**: `CLUSTER_RESPONSE`
|
- **Response Message**: `CLUSTER_RESPONSE`
|
||||||
- **Heartbeat Message**: `CLUSTER_HEARTBEAT`
|
- **Heartbeat Message**: `CLUSTER_HEARTBEAT:hostname`
|
||||||
- **Node Info Message**: `CLUSTER_NODE_INFO:<hostname>:<json>`
|
- **Node Update Message**: `NODE_UPDATE:hostname:{json}`
|
||||||
- **Broadcast Address**: 255.255.255.255
|
- **Broadcast Address**: 255.255.255.255
|
||||||
- **Discovery Interval**: `Config.discovery_interval_ms` (default 1000 ms)
|
|
||||||
- **Listen Interval**: `Config.cluster_listen_interval_ms` (default 10 ms)
|
- **Listen Interval**: `Config.cluster_listen_interval_ms` (default 10 ms)
|
||||||
- **Heartbeat Interval**: `Config.heartbeat_interval_ms` (default 5000 ms)
|
- **Heartbeat Interval**: `Config.heartbeat_interval_ms` (default 5000 ms)
|
||||||
|
- **Node Update Broadcast Interval**: `Config.node_update_broadcast_interval_ms` (default 5000 ms)
|
||||||
|
|
||||||
### Message Formats
|
### Message Formats
|
||||||
|
|
||||||
- **Discovery**: `CLUSTER_DISCOVERY`
|
- **Heartbeat**: `CLUSTER_HEARTBEAT:hostname`
|
||||||
- Sender: any node, broadcast to 255.255.255.255:`udp_port`
|
|
||||||
- Purpose: announce presence and solicit peer identification
|
|
||||||
- **Response**: `CLUSTER_RESPONSE:<hostname>`
|
|
||||||
- Sender: node receiving a discovery; unicast to requester IP
|
|
||||||
- Purpose: provide hostname so requester can register/update member
|
|
||||||
- **Heartbeat**: `CLUSTER_HEARTBEAT:<hostname>`
|
|
||||||
- Sender: each node, broadcast to 255.255.255.255:`udp_port` on interval
|
- Sender: each node, broadcast to 255.255.255.255:`udp_port` on interval
|
||||||
- Purpose: prompt peers to reply with their node info and keep liveness
|
- Purpose: announce presence, prompt peers for node info, and keep liveness
|
||||||
- **Node Info**: `CLUSTER_NODE_INFO:<hostname>:<json>`
|
- **Node Update**: `NODE_UPDATE:hostname:{json}`
|
||||||
- Sender: node receiving a heartbeat; unicast to heartbeat sender IP
|
- Sender: node receiving a heartbeat; unicast to heartbeat sender IP
|
||||||
- JSON fields: freeHeap, chipId, sdkVersion, cpuFreqMHz, flashChipSize, optional labels
|
- JSON fields: hostname, ip, uptime, optional labels
|
||||||
|
- Purpose: provide minimal node information in response to heartbeat
|
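Reviewer note: the `NODE_UPDATE:hostname:{json}` framing above can be illustrated with a small standalone parser. This is a minimal sketch in standard C++17, not the firmware code: `NodeUpdateParts` and `parseNodeUpdate` are hypothetical names, and it assumes hostnames never contain a colon. It splits only at the first two colons so that colons inside the JSON payload survive.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper type for illustration; not part of the codebase in this diff.
struct NodeUpdateParts {
    std::string hostname;
    std::string json;
};

// Splits "NODE_UPDATE:hostname:{json}" at the first two colons only, so
// colons inside the JSON payload are left untouched.
// Assumption: the hostname itself never contains ':'.
std::optional<NodeUpdateParts> parseNodeUpdate(std::string_view msg) {
    constexpr std::string_view prefix = "NODE_UPDATE:";
    if (msg.substr(0, prefix.size()) != prefix) {
        return std::nullopt;  // not a NODE_UPDATE message
    }
    msg.remove_prefix(prefix.size());
    const std::size_t colon = msg.find(':');
    if (colon == std::string_view::npos || colon == 0) {
        return std::nullopt;  // empty hostname or missing payload separator
    }
    return NodeUpdateParts{std::string(msg.substr(0, colon)),
                           std::string(msg.substr(colon + 1))};
}
```

Returning `std::nullopt` for malformed input mirrors the early-return-with-warning style of the handlers in this PR without depending on the Arduino `String` type.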
||||||
### Discovery Flow
|
|
||||||
|
|
||||||
1. **Sender broadcasts** `CLUSTER_DISCOVERY`
|
|
||||||
2. **Each receiver responds** with `CLUSTER_RESPONSE:<hostname>` to the sender IP
|
|
||||||
3. **Sender registers/updates** the node using hostname and source IP
|
|
||||||
|
|
||||||
### Heartbeat Flow
|
### Heartbeat Flow
|
||||||
|
|
||||||
1. **A node broadcasts** `CLUSTER_HEARTBEAT:<hostname>`
|
1. **A node broadcasts** `CLUSTER_HEARTBEAT:hostname`
|
||||||
2. **Each receiver replies** with `CLUSTER_NODE_INFO:<hostname>:<json>` to the heartbeat sender IP
|
2. **Each receiver responds** with `NODE_UPDATE:hostname:{json}` to the heartbeat sender IP
|
||||||
3. **The sender**:
|
3. **The sender**:
|
||||||
- Ensures the node exists or creates it with `hostname` and sender IP
|
- Ensures the node exists or creates it with `hostname` and sender IP
|
||||||
- Parses JSON and updates resources, labels, `status = ACTIVE`, `lastSeen = now`
|
- Parses JSON and updates node info, `status = ACTIVE`, `lastSeen = now`
|
||||||
- Sets `latency = now - lastHeartbeatSentAt` (per-node, measured at heartbeat origin)
|
- Sets `latency = now - lastHeartbeatSentAt` (per-node, measured at heartbeat origin)
|
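Reviewer note: the `latency = now - lastHeartbeatSentAt` rule stays correct when the millisecond counter wraps, provided the arithmetic is unsigned. The sketch below assumes a 32-bit counter (as Arduino's `millis()` is on targets where `unsigned long` is 32 bits); `elapsedMs` is a hypothetical helper name, not a function from this PR.

```cpp
#include <cstdint>

// Elapsed milliseconds between two readings of a 32-bit millisecond counter.
// The result is reduced modulo 2^32, so it remains correct across a single
// counter wraparound between the two readings.
constexpr std::uint32_t elapsedMs(std::uint32_t now, std::uint32_t sentAt) {
    return now - sentAt;
}
```

For example, a heartbeat sent 100 ms before the counter wraps and answered 100 ms after it yields an elapsed time of 200 ms, not a huge or negative value.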
||||||
|
|
||||||
|
### Node Update Broadcasting
|
||||||
|
|
||||||
|
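1. **Periodic broadcast**: Each node broadcasts `NODE_UPDATE:hostname:{json}` every `Config.node_update_broadcast_interval_ms` (default 5000 ms), and immediately whenever a `cluster/node_update` event fires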
|
||||||
|
2. **All receivers**: Update their memberlist entry for the broadcasting node
|
||||||
|
3. **Purpose**: Ensures all nodes have current information about each other
|
||||||
|
|
||||||
### Listener Behavior
|
### Listener Behavior
|
||||||
|
|
||||||
The `cluster_listen` task parses one UDP packet per run and dispatches by prefix to:
|
The `cluster_listen` task parses one UDP packet per run and dispatches by prefix to:
|
||||||
- **Discovery** → send `CLUSTER_RESPONSE`
|
- **Heartbeat** → add/update node and send `NODE_UPDATE` JSON response
|
||||||
- **Heartbeat** → send `CLUSTER_NODE_INFO` JSON
|
- **Node Update** → update node information and status
|
||||||
- **Response** → add/update node using provided hostname and source IP
|
|
||||||
- **Node Info** → update resources/status/labels and record latency
|
|
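Reviewer note: the prefix dispatch described above can be sketched as a table of predicate/handler pairs, in the spirit of the `messageHandlers` vector in this PR. This is a standalone approximation in standard C++: the `Handler` struct, `hasPrefix`, and `dispatch` are hypothetical names, and the first-match-wins policy is an assumption, since the body of `handleIncomingMessage` is not shown in the diff.

```cpp
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the MessageHandler entries registered in the diff.
struct Handler {
    bool (*matches)(const char* msg);
    std::function<void(const char*)> handle;
    const char* name;
};

// True when msg starts with prefix.
bool hasPrefix(const char* msg, const char* prefix) {
    return std::strncmp(msg, prefix, std::strlen(prefix)) == 0;
}

bool isHeartbeat(const char* msg) { return hasPrefix(msg, "CLUSTER_HEARTBEAT"); }
bool isNodeUpdate(const char* msg) { return hasPrefix(msg, "NODE_UPDATE"); }

// Runs the first handler whose predicate accepts the message and returns its
// name, or an empty string when no handler matches (assumed policy).
std::string dispatch(const std::vector<Handler>& handlers, const char* msg) {
    for (const Handler& h : handlers) {
        if (h.matches(msg)) {
            h.handle(msg);
            return h.name;
        }
    }
    return "";
}
```

Keeping predicates and handlers in one table means that removing a message type, as this PR does for discovery and response, only requires deleting its table entry.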
||||||
|
|
||||||
### Timing and Intervals
|
### Timing and Intervals
|
||||||
|
|
||||||
- **UDP Port**: `Config.udp_port` (default 4210)
|
- **UDP Port**: `Config.udp_port` (default 4210)
|
||||||
- **Discovery Interval**: `Config.discovery_interval_ms` (default 1000 ms)
|
|
||||||
- **Listen Interval**: `Config.cluster_listen_interval_ms` (default 10 ms)
|
- **Listen Interval**: `Config.cluster_listen_interval_ms` (default 10 ms)
|
||||||
- **Heartbeat Interval**: `Config.heartbeat_interval_ms` (default 5000 ms)
|
- **Heartbeat Interval**: `Config.heartbeat_interval_ms` (default 5000 ms)
|
||||||
|
- **Node Update Broadcast Interval**: `Config.node_update_broadcast_interval_ms` (default 5000 ms)
|
||||||
|
|
||||||
### Node Status Categories
|
### Node Status Categories
|
||||||
|
|
||||||
|
|||||||
@@ -93,11 +93,10 @@ The configuration is stored as a JSON file with the following structure:
|
|||||||
"api_server_port": 80
|
"api_server_port": 80
|
||||||
},
|
},
|
||||||
"cluster": {
|
"cluster": {
|
||||||
"discovery_interval_ms": 1000,
|
|
||||||
"heartbeat_interval_ms": 5000,
|
"heartbeat_interval_ms": 5000,
|
||||||
"cluster_listen_interval_ms": 10,
|
"cluster_listen_interval_ms": 10,
|
||||||
"status_update_interval_ms": 1000,
|
"status_update_interval_ms": 1000,
|
||||||
"member_info_update_interval_ms": 10000,
|
"node_update_broadcast_interval_ms": 5000,
|
||||||
"print_interval_ms": 5000
|
"print_interval_ms": 5000
|
||||||
},
|
},
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
|
|||||||
@@ -14,7 +14,6 @@ class ClusterManager {
|
|||||||
public:
|
public:
|
||||||
ClusterManager(NodeContext& ctx, TaskManager& taskMgr);
|
ClusterManager(NodeContext& ctx, TaskManager& taskMgr);
|
||||||
void registerTasks();
|
void registerTasks();
|
||||||
void sendDiscovery();
|
|
||||||
void listen();
|
void listen();
|
||||||
void addOrUpdateNode(const String& nodeHost, IPAddress nodeIP);
|
void addOrUpdateNode(const String& nodeHost, IPAddress nodeIP);
|
||||||
void updateAllNodeStatuses();
|
void updateAllNodeStatuses();
|
||||||
@@ -25,6 +24,7 @@ public:
|
|||||||
void updateLocalNodeResources();
|
void updateLocalNodeResources();
|
||||||
void heartbeatTaskCallback();
|
void heartbeatTaskCallback();
|
||||||
void updateAllMembersInfoTaskCallback();
|
void updateAllMembersInfoTaskCallback();
|
||||||
|
void broadcastNodeUpdate();
|
||||||
private:
|
private:
|
||||||
NodeContext& ctx;
|
NodeContext& ctx;
|
||||||
TaskManager& taskManager;
|
TaskManager& taskManager;
|
||||||
@@ -35,18 +35,15 @@ private:
|
|||||||
};
|
};
|
||||||
void initMessageHandlers();
|
void initMessageHandlers();
|
||||||
void handleIncomingMessage(const char* incoming);
|
void handleIncomingMessage(const char* incoming);
|
||||||
static bool isDiscoveryMsg(const char* msg);
|
|
||||||
static bool isHeartbeatMsg(const char* msg);
|
static bool isHeartbeatMsg(const char* msg);
|
||||||
static bool isResponseMsg(const char* msg);
|
static bool isNodeUpdateMsg(const char* msg);
|
||||||
static bool isNodeInfoMsg(const char* msg);
|
|
||||||
static bool isClusterEventMsg(const char* msg);
|
static bool isClusterEventMsg(const char* msg);
|
||||||
static bool isRawMsg(const char* msg);
|
static bool isRawMsg(const char* msg);
|
||||||
void onDiscovery(const char* msg);
|
|
||||||
void onHeartbeat(const char* msg);
|
void onHeartbeat(const char* msg);
|
||||||
void onResponse(const char* msg);
|
void onNodeUpdate(const char* msg);
|
||||||
void onNodeInfo(const char* msg);
|
|
||||||
void onClusterEvent(const char* msg);
|
void onClusterEvent(const char* msg);
|
||||||
void onRawMessage(const char* msg);
|
void onRawMessage(const char* msg);
|
||||||
|
void sendNodeInfo(const String& hostname, const IPAddress& targetIP);
|
||||||
unsigned long lastHeartbeatSentAt = 0;
|
unsigned long lastHeartbeatSentAt = 0;
|
||||||
std::vector<MessageHandler> messageHandlers;
|
std::vector<MessageHandler> messageHandlers;
|
||||||
};
|
};
|
||||||
|
|||||||
@@ -5,10 +5,9 @@
|
|||||||
|
|
||||||
// Cluster protocol and API constants
|
// Cluster protocol and API constants
|
||||||
namespace ClusterProtocol {
|
namespace ClusterProtocol {
|
||||||
constexpr const char* DISCOVERY_MSG = "CLUSTER_DISCOVERY";
|
// Simplified heartbeat-only protocol
|
||||||
constexpr const char* RESPONSE_MSG = "CLUSTER_RESPONSE";
|
|
||||||
constexpr const char* HEARTBEAT_MSG = "CLUSTER_HEARTBEAT";
|
constexpr const char* HEARTBEAT_MSG = "CLUSTER_HEARTBEAT";
|
||||||
constexpr const char* NODE_INFO_MSG = "CLUSTER_NODE_INFO";
|
constexpr const char* NODE_UPDATE_MSG = "NODE_UPDATE";
|
||||||
constexpr const char* CLUSTER_EVENT_MSG = "CLUSTER_EVENT";
|
constexpr const char* CLUSTER_EVENT_MSG = "CLUSTER_EVENT";
|
||||||
constexpr const char* RAW_MSG = "RAW";
|
constexpr const char* RAW_MSG = "RAW";
|
||||||
constexpr uint16_t UDP_PORT = 4210;
|
constexpr uint16_t UDP_PORT = 4210;
|
||||||
@@ -18,12 +17,9 @@ namespace ClusterProtocol {
|
|||||||
}
|
}
|
||||||
|
|
||||||
namespace TaskIntervals {
|
namespace TaskIntervals {
|
||||||
constexpr unsigned long SEND_DISCOVERY = 1000;
|
|
||||||
constexpr unsigned long LISTEN_FOR_DISCOVERY = 100;
|
|
||||||
constexpr unsigned long UPDATE_STATUS = 1000;
|
constexpr unsigned long UPDATE_STATUS = 1000;
|
||||||
constexpr unsigned long PRINT_MEMBER_LIST = 5000;
|
constexpr unsigned long PRINT_MEMBER_LIST = 5000;
|
||||||
constexpr unsigned long HEARTBEAT = 2000;
|
constexpr unsigned long HEARTBEAT = 2000;
|
||||||
constexpr unsigned long UPDATE_ALL_MEMBERS_INFO = 10000;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
constexpr unsigned long NODE_ACTIVE_THRESHOLD = 10000;
|
constexpr unsigned long NODE_ACTIVE_THRESHOLD = 10000;
|
||||||
|
|||||||
@@ -12,11 +12,10 @@ public:
|
|||||||
static constexpr const char* DEFAULT_WIFI_PASSWORD = "th3r31sn0sp00n";
|
static constexpr const char* DEFAULT_WIFI_PASSWORD = "th3r31sn0sp00n";
|
||||||
static constexpr uint16_t DEFAULT_UDP_PORT = 4210;
|
static constexpr uint16_t DEFAULT_UDP_PORT = 4210;
|
||||||
static constexpr uint16_t DEFAULT_API_SERVER_PORT = 80;
|
static constexpr uint16_t DEFAULT_API_SERVER_PORT = 80;
|
||||||
static constexpr unsigned long DEFAULT_DISCOVERY_INTERVAL_MS = 1000;
|
|
||||||
static constexpr unsigned long DEFAULT_CLUSTER_LISTEN_INTERVAL_MS = 10;
|
static constexpr unsigned long DEFAULT_CLUSTER_LISTEN_INTERVAL_MS = 10;
|
||||||
static constexpr unsigned long DEFAULT_HEARTBEAT_INTERVAL_MS = 5000;
|
static constexpr unsigned long DEFAULT_HEARTBEAT_INTERVAL_MS = 5000;
|
||||||
static constexpr unsigned long DEFAULT_STATUS_UPDATE_INTERVAL_MS = 1000;
|
static constexpr unsigned long DEFAULT_STATUS_UPDATE_INTERVAL_MS = 1000;
|
||||||
static constexpr unsigned long DEFAULT_MEMBER_INFO_UPDATE_INTERVAL_MS = 10000;
|
static constexpr unsigned long DEFAULT_NODE_UPDATE_BROADCAST_INTERVAL_MS = 5000;
|
||||||
static constexpr unsigned long DEFAULT_PRINT_INTERVAL_MS = 5000;
|
static constexpr unsigned long DEFAULT_PRINT_INTERVAL_MS = 5000;
|
||||||
static constexpr unsigned long DEFAULT_NODE_ACTIVE_THRESHOLD_MS = 10000;
|
static constexpr unsigned long DEFAULT_NODE_ACTIVE_THRESHOLD_MS = 10000;
|
||||||
static constexpr unsigned long DEFAULT_NODE_INACTIVE_THRESHOLD_MS = 60000;
|
static constexpr unsigned long DEFAULT_NODE_INACTIVE_THRESHOLD_MS = 60000;
|
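Reviewer note: the two threshold defaults above suggest a three-way status classification based on the time since a node was last seen. The sketch below is an assumption about how they might map onto `ACTIVE`, `INACTIVE`, and `DEAD`; the actual logic in `updateAllNodeStatuses()` is not part of this diff, and `classify` plus the `k...` constants are hypothetical names.

```cpp
#include <cstdint>

enum class Status { ACTIVE, INACTIVE, DEAD };

// Values copied from the defaults in the diff; the mapping below is assumed.
constexpr std::uint32_t kActiveThresholdMs = 10000;
constexpr std::uint32_t kInactiveThresholdMs = 60000;

// Classifies a node by how long ago it was last seen.
// Unsigned subtraction keeps the age correct across a counter wraparound.
Status classify(std::uint32_t now, std::uint32_t lastSeen) {
    const std::uint32_t age = now - lastSeen;
    if (age < kActiveThresholdMs) return Status::ACTIVE;
    if (age < kInactiveThresholdMs) return Status::INACTIVE;
    return Status::DEAD;
}
```

With a 5000 ms heartbeat interval, a node would have to miss roughly two consecutive heartbeats before leaving `ACTIVE` under this assumed mapping.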
||||||
@@ -38,11 +37,10 @@ public:
|
|||||||
uint16_t api_server_port;
|
uint16_t api_server_port;
|
||||||
|
|
||||||
// Cluster Configuration
|
// Cluster Configuration
|
||||||
unsigned long discovery_interval_ms;
|
|
||||||
unsigned long heartbeat_interval_ms;
|
unsigned long heartbeat_interval_ms;
|
||||||
unsigned long cluster_listen_interval_ms;
|
unsigned long cluster_listen_interval_ms;
|
||||||
unsigned long status_update_interval_ms;
|
unsigned long status_update_interval_ms;
|
||||||
unsigned long member_info_update_interval_ms;
|
unsigned long node_update_broadcast_interval_ms;
|
||||||
unsigned long print_interval_ms;
|
unsigned long print_interval_ms;
|
||||||
|
|
||||||
// Node Status Thresholds
|
// Node Status Thresholds
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ struct NodeInfo {
|
|||||||
String hostname;
|
String hostname;
|
||||||
IPAddress ip;
|
IPAddress ip;
|
||||||
unsigned long lastSeen;
|
unsigned long lastSeen;
|
||||||
|
unsigned long uptime = 0; // milliseconds since node started
|
||||||
enum Status { ACTIVE, INACTIVE, DEAD } status;
|
enum Status { ACTIVE, INACTIVE, DEAD } status;
|
||||||
struct Resources {
|
struct Resources {
|
||||||
uint32_t freeHeap = 0;
|
uint32_t freeHeap = 0;
|
||||||
@@ -17,7 +18,7 @@ struct NodeInfo {
|
|||||||
uint32_t cpuFreqMHz = 0;
|
uint32_t cpuFreqMHz = 0;
|
||||||
uint32_t flashChipSize = 0;
|
uint32_t flashChipSize = 0;
|
||||||
} resources;
|
} resources;
|
||||||
unsigned long latency = 0; // ms from heartbeat broadcast to NODE_INFO receipt
|
unsigned long latency = 0; // ms from heartbeat broadcast to NODE_UPDATE receipt
|
||||||
std::vector<EndpointInfo> endpoints; // List of registered endpoints
|
std::vector<EndpointInfo> endpoints; // List of registered endpoints
|
||||||
std::map<String, String> labels; // Arbitrary node labels (key -> value)
|
std::map<String, String> labels; // Arbitrary node labels (key -> value)
|
||||||
};
|
};
|
||||||
|
|||||||
@@ -25,27 +25,27 @@ ClusterManager::ClusterManager(NodeContext& ctx, TaskManager& taskMgr) : ctx(ctx
|
|||||||
this->ctx.udp->write(msg.c_str());
|
this->ctx.udp->write(msg.c_str());
|
||||||
this->ctx.udp->endPacket();
|
this->ctx.udp->endPacket();
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// Handler for node update broadcasts: services fire 'cluster/node_update' when their node info changes
|
||||||
|
ctx.on("cluster/node_update", [this](void* data) {
|
||||||
|
// Trigger immediate NODE_UPDATE broadcast when node info changes
|
||||||
|
broadcastNodeUpdate();
|
||||||
|
});
|
||||||
// Register tasks
|
// Register tasks
|
||||||
registerTasks();
|
registerTasks();
|
||||||
initMessageHandlers();
|
initMessageHandlers();
|
||||||
}
|
}
|
||||||
|
|
||||||
void ClusterManager::registerTasks() {
|
void ClusterManager::registerTasks() {
|
||||||
taskManager.registerTask("cluster_discovery", ctx.config.discovery_interval_ms, [this]() { sendDiscovery(); });
|
|
||||||
taskManager.registerTask("cluster_listen", ctx.config.cluster_listen_interval_ms, [this]() { listen(); });
|
taskManager.registerTask("cluster_listen", ctx.config.cluster_listen_interval_ms, [this]() { listen(); });
|
||||||
taskManager.registerTask("status_update", ctx.config.status_update_interval_ms, [this]() { updateAllNodeStatuses(); removeDeadNodes(); });
|
taskManager.registerTask("status_update", ctx.config.status_update_interval_ms, [this]() { updateAllNodeStatuses(); removeDeadNodes(); });
|
||||||
taskManager.registerTask("print_members", ctx.config.print_interval_ms, [this]() { printMemberList(); });
|
taskManager.registerTask("print_members", ctx.config.print_interval_ms, [this]() { printMemberList(); });
|
||||||
taskManager.registerTask("heartbeat", ctx.config.heartbeat_interval_ms, [this]() { heartbeatTaskCallback(); });
|
taskManager.registerTask("heartbeat", ctx.config.heartbeat_interval_ms, [this]() { heartbeatTaskCallback(); });
|
||||||
taskManager.registerTask("cluster_update_members_info", ctx.config.member_info_update_interval_ms, [this]() { updateAllMembersInfoTaskCallback(); });
|
taskManager.registerTask("node_update_broadcast", ctx.config.node_update_broadcast_interval_ms, [this]() { broadcastNodeUpdate(); });
|
||||||
LOG_INFO("ClusterManager", "Registered all cluster tasks");
|
LOG_INFO("ClusterManager", "Registered all cluster tasks");
|
||||||
}
|
}
|
||||||
|
|
||||||
void ClusterManager::sendDiscovery() {
|
// Discovery functionality removed - using heartbeat-only approach
|
||||||
//LOG_DEBUG(ctx, "Cluster", "Sending discovery packet...");
|
|
||||||
ctx.udp->beginPacket("255.255.255.255", ctx.config.udp_port);
|
|
||||||
ctx.udp->write(ClusterProtocol::DISCOVERY_MSG);
|
|
||||||
ctx.udp->endPacket();
|
|
||||||
}
|
|
||||||
|
|
||||||
void ClusterManager::listen() {
|
void ClusterManager::listen() {
|
||||||
int packetSize = ctx.udp->parsePacket();
|
int packetSize = ctx.udp->parsePacket();
|
||||||
@@ -69,10 +69,8 @@ void ClusterManager::listen() {
|
|||||||
void ClusterManager::initMessageHandlers() {
|
void ClusterManager::initMessageHandlers() {
|
||||||
messageHandlers.clear();
|
messageHandlers.clear();
|
||||||
messageHandlers.push_back({ &ClusterManager::isRawMsg, [this](const char* msg){ this->onRawMessage(msg); }, "RAW" });
|
messageHandlers.push_back({ &ClusterManager::isRawMsg, [this](const char* msg){ this->onRawMessage(msg); }, "RAW" });
|
||||||
messageHandlers.push_back({ &ClusterManager::isDiscoveryMsg, [this](const char* msg){ this->onDiscovery(msg); }, "DISCOVERY" });
|
|
||||||
messageHandlers.push_back({ &ClusterManager::isHeartbeatMsg, [this](const char* msg){ this->onHeartbeat(msg); }, "HEARTBEAT" });
|
messageHandlers.push_back({ &ClusterManager::isHeartbeatMsg, [this](const char* msg){ this->onHeartbeat(msg); }, "HEARTBEAT" });
|
||||||
messageHandlers.push_back({ &ClusterManager::isResponseMsg, [this](const char* msg){ this->onResponse(msg); }, "RESPONSE" });
|
messageHandlers.push_back({ &ClusterManager::isNodeUpdateMsg, [this](const char* msg){ this->onNodeUpdate(msg); }, "NODE_UPDATE" });
|
||||||
messageHandlers.push_back({ &ClusterManager::isNodeInfoMsg, [this](const char* msg){ this->onNodeInfo(msg); }, "NODE_INFO" });
|
|
||||||
messageHandlers.push_back({ &ClusterManager::isClusterEventMsg, [this](const char* msg){ this->onClusterEvent(msg); }, "CLUSTER_EVENT" });
|
messageHandlers.push_back({ &ClusterManager::isClusterEventMsg, [this](const char* msg){ this->onClusterEvent(msg); }, "CLUSTER_EVENT" });
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -94,20 +92,12 @@ void ClusterManager::handleIncomingMessage(const char* incoming) {
|
|||||||
LOG_DEBUG("Cluster", String("Unknown cluster message: ") + head);
|
LOG_DEBUG("Cluster", String("Unknown cluster message: ") + head);
|
||||||
}
|
}
|
||||||
|
|
||||||
bool ClusterManager::isDiscoveryMsg(const char* msg) {
|
|
||||||
return strcmp(msg, ClusterProtocol::DISCOVERY_MSG) == 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
bool ClusterManager::isHeartbeatMsg(const char* msg) {
|
bool ClusterManager::isHeartbeatMsg(const char* msg) {
|
||||||
return strncmp(msg, ClusterProtocol::HEARTBEAT_MSG, strlen(ClusterProtocol::HEARTBEAT_MSG)) == 0;
|
return strncmp(msg, ClusterProtocol::HEARTBEAT_MSG, strlen(ClusterProtocol::HEARTBEAT_MSG)) == 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
bool ClusterManager::isResponseMsg(const char* msg) {
|
bool ClusterManager::isNodeUpdateMsg(const char* msg) {
|
||||||
return strncmp(msg, ClusterProtocol::RESPONSE_MSG, strlen(ClusterProtocol::RESPONSE_MSG)) == 0;
|
return strncmp(msg, ClusterProtocol::NODE_UPDATE_MSG, strlen(ClusterProtocol::NODE_UPDATE_MSG)) == 0;
|
||||||
}
|
|
||||||
|
|
||||||
bool ClusterManager::isNodeInfoMsg(const char* msg) {
|
|
||||||
return strncmp(msg, ClusterProtocol::NODE_INFO_MSG, strlen(ClusterProtocol::NODE_INFO_MSG)) == 0;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
bool ClusterManager::isClusterEventMsg(const char* msg) {
|
bool ClusterManager::isClusterEventMsg(const char* msg) {
|
||||||
@@ -123,87 +113,122 @@ bool ClusterManager::isRawMsg(const char* msg) {
|
|||||||
return msg[prefixLen] == ':';
|
return msg[prefixLen] == ':';
|
||||||
}
|
}
|
||||||
|
|
||||||
void ClusterManager::onDiscovery(const char* /*msg*/) {
|
// Discovery functionality removed - using heartbeat-only approach
|
||||||
ctx.udp->beginPacket(ctx.udp->remoteIP(), ctx.config.udp_port);
|
|
||||||
String response = String(ClusterProtocol::RESPONSE_MSG) + ":" + ctx.hostname;
|
void ClusterManager::onHeartbeat(const char* msg) {
|
||||||
ctx.udp->write(response.c_str());
|
// Extract hostname from heartbeat message: "CLUSTER_HEARTBEAT:hostname"
|
||||||
ctx.udp->endPacket();
|
const char* colon = strchr(msg, ':');
|
||||||
|
if (!colon) {
|
||||||
|
LOG_WARN("Cluster", "Invalid heartbeat message format");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
String hostname = String(colon + 1);
|
||||||
|
IPAddress senderIP = ctx.udp->remoteIP();
|
||||||
|
|
||||||
|
// Update memberlist with the heartbeat
|
||||||
|
addOrUpdateNode(hostname, senderIP);
|
||||||
|
|
||||||
|
// Respond with minimal node info (hostname, ip, uptime, labels)
|
||||||
|
sendNodeInfo(hostname, senderIP);
|
||||||
}
|
}
|
||||||
|
|
||||||
void ClusterManager::onHeartbeat(const char* /*msg*/) {
|
void ClusterManager::onNodeUpdate(const char* msg) {
|
||||||
|
// Message format: "NODE_UPDATE:hostname:{json}"
|
||||||
|
const char* firstColon = strchr(msg, ':');
|
||||||
|
if (!firstColon) {
|
||||||
|
LOG_WARN("Cluster", "Invalid NODE_UPDATE message format");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const char* secondColon = strchr(firstColon + 1, ':');
|
||||||
|
if (!secondColon) {
|
||||||
|
LOG_WARN("Cluster", "Invalid NODE_UPDATE message format");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
String hostnamePart = String(firstColon + 1);
|
||||||
|
String hostname = hostnamePart.substring(0, secondColon - firstColon - 1);
|
||||||
|
const char* jsonCStr = secondColon + 1;
|
||||||
|
|
||||||
|
JsonDocument doc;
|
||||||
|
DeserializationError err = deserializeJson(doc, jsonCStr);
|
||||||
|
if (err) {
|
||||||
|
LOG_WARN("Cluster", String("Failed to parse NODE_UPDATE JSON from ") + ctx.udp->remoteIP().toString());
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Update the specific node in memberlist
|
||||||
|
auto& memberList = *ctx.memberList;
|
||||||
|
auto it = memberList.find(hostname);
|
||||||
|
if (it != memberList.end()) {
|
||||||
|
NodeInfo& node = it->second;
|
||||||
|
|
||||||
|
// Update basic info if provided
|
||||||
|
if (doc["hostname"].is<const char*>()) {
|
||||||
|
node.hostname = doc["hostname"].as<const char*>();
|
||||||
|
}
|
||||||
|
if (doc["uptime"].is<unsigned long>()) {
|
||||||
|
node.uptime = doc["uptime"];
|
||||||
|
}
|
||||||
|
|
||||||
|
// Update labels if provided
|
||||||
|
if (doc["labels"].is<JsonObject>()) {
|
||||||
|
node.labels.clear();
|
||||||
|
JsonObject labelsObj = doc["labels"].as<JsonObject>();
|
||||||
|
for (JsonPair kvp : labelsObj) {
|
||||||
|
const char* key = kvp.key().c_str();
|
||||||
|
const char* value = labelsObj[kvp.key()];
|
||||||
|
node.labels[key] = String(value);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
node.lastSeen = millis();
|
||||||
|
node.status = NodeInfo::ACTIVE;
|
||||||
|
|
||||||
|
LOG_DEBUG("Cluster", String("Updated node ") + hostname + " from NODE_UPDATE");
|
||||||
|
} else {
|
||||||
|
LOG_WARN("Cluster", String("Received NODE_UPDATE for unknown node: ") + hostname);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void ClusterManager::sendNodeInfo(const String& hostname, const IPAddress& targetIP) {
|
||||||
JsonDocument doc;
|
JsonDocument doc;
|
||||||
|
|
||||||
if (ctx.memberList) {
|
// Get our node info for the response
|
||||||
auto it = ctx.memberList->find(ctx.hostname);
|
auto& memberList = *ctx.memberList;
|
||||||
if (it != ctx.memberList->end()) {
|
auto it = memberList.find(ctx.hostname);
|
||||||
|
if (it != memberList.end()) {
|
||||||
|
const NodeInfo& node = it->second;
|
||||||
|
|
||||||
|
// Minimal response: hostname, ip, uptime, labels
|
||||||
|
doc["hostname"] = node.hostname;
|
||||||
|
doc["ip"] = node.ip.toString();
|
||||||
|
doc["uptime"] = millis(); // Uptime in ms since boot (lastSeen is refreshed on every heartbeat, so it cannot be used here)
|
||||||
|
|
||||||
|
// Add labels if present
|
||||||
|
if (!node.labels.empty()) {
|
||||||
JsonObject labelsObj = doc["labels"].to<JsonObject>();
|
JsonObject labelsObj = doc["labels"].to<JsonObject>();
|
||||||
for (const auto& kv : it->second.labels) {
|
for (const auto& kv : node.labels) {
|
||||||
labelsObj[kv.first.c_str()] = kv.second;
|
|
||||||
}
|
|
||||||
} else if (!ctx.self.labels.empty()) {
|
|
||||||
JsonObject labelsObj = doc["labels"].to<JsonObject>();
|
|
||||||
for (const auto& kv : ctx.self.labels) {
|
|
||||||
labelsObj[kv.first.c_str()] = kv.second;
|
labelsObj[kv.first.c_str()] = kv.second;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
} else {
|
||||||
|
// Fallback to basic info
|
||||||
|
doc["hostname"] = ctx.hostname;
|
||||||
|
doc["ip"] = ctx.localIP.toString();
|
||||||
|
doc["uptime"] = millis();
|
||||||
}
|
}
|
||||||
|
|
||||||
String json;
|
String json;
|
||||||
serializeJson(doc, json);
|
serializeJson(doc, json);
|
||||||
|
|
||||||
ctx.udp->beginPacket(ctx.udp->remoteIP(), ctx.config.udp_port);
|
ctx.udp->beginPacket(targetIP, ctx.config.udp_port);
|
||||||
String msg = String(ClusterProtocol::NODE_INFO_MSG) + ":" + ctx.hostname + ":" + json;
|
String msg = String(ClusterProtocol::NODE_UPDATE_MSG) + ":" + ctx.hostname + ":" + json; // Prefix with our own hostname so the receiver updates our entry, not its own
|
||||||
ctx.udp->write(msg.c_str());
|
ctx.udp->write(msg.c_str());
|
||||||
ctx.udp->endPacket();
|
ctx.udp->endPacket();
|
||||||
}
|
|
||||||
|
|
||||||
void ClusterManager::onResponse(const char* msg) {
|
LOG_DEBUG("Cluster", String("Sent NODE_UPDATE response to ") + hostname + " @ " + targetIP.toString());
|
||||||
char* hostPtr = const_cast<char*>(msg) + strlen(ClusterProtocol::RESPONSE_MSG) + 1;
|
|
||||||
String nodeHost = String(hostPtr);
|
|
||||||
addOrUpdateNode(nodeHost, ctx.udp->remoteIP());
|
|
||||||
}
|
|
||||||
|
|
||||||
void ClusterManager::onNodeInfo(const char* msg) {
|
|
||||||
char* p = const_cast<char*>(msg) + strlen(ClusterProtocol::NODE_INFO_MSG) + 1;
|
|
||||||
char* hostEnd = strchr(p, ':');
|
|
||||||
if (hostEnd) {
|
|
||||||
*hostEnd = '\0';
|
|
||||||
const char* hostCStr = p;
|
|
||||||
const char* jsonCStr = hostEnd + 1;
|
|
||||||
|
|
||||||
String nodeHost = String(hostCStr);
|
|
||||||
IPAddress senderIP = ctx.udp->remoteIP();
|
|
||||||
|
|
||||||
addOrUpdateNode(nodeHost, senderIP);
|
|
||||||
|
|
||||||
JsonDocument doc;
|
|
||||||
DeserializationError err = deserializeJson(doc, jsonCStr);
|
|
||||||
if (!err) {
|
|
||||||
auto& memberList = *ctx.memberList;
|
|
||||||
auto it = memberList.find(nodeHost);
|
|
||||||
if (it != memberList.end()) {
|
|
||||||
NodeInfo& node = it->second;
|
|
||||||
node.status = NodeInfo::ACTIVE;
|
|
||||||
unsigned long now = millis();
|
|
||||||
node.lastSeen = now;
|
|
||||||
if (lastHeartbeatSentAt != 0) {
|
|
||||||
node.latency = now - lastHeartbeatSentAt;
|
|
||||||
}
|
|
||||||
|
|
||||||
node.labels.clear();
|
|
||||||
if (doc["labels"].is<JsonObject>()) {
|
|
||||||
JsonObject labelsObj = doc["labels"].as<JsonObject>();
|
|
||||||
for (JsonPair kvp : labelsObj) {
|
|
||||||
const char* key = kvp.key().c_str();
|
|
||||||
const char* value = labelsObj[kvp.key()];
|
|
||||||
node.labels[key] = value;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
LOG_WARN("Cluster", String("Failed to parse NODE_INFO JSON from ") + senderIP.toString());
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
void ClusterManager::onClusterEvent(const char* msg) {
|
void ClusterManager::onClusterEvent(const char* msg) {
|
||||||
@@ -406,16 +431,19 @@ void ClusterManager::heartbeatTaskCallback() {
|
|||||||
NodeInfo& node = it->second;
|
NodeInfo& node = it->second;
|
||||||
node.lastSeen = millis();
|
node.lastSeen = millis();
|
||||||
node.status = NodeInfo::ACTIVE;
|
node.status = NodeInfo::ACTIVE;
|
||||||
|
node.uptime = millis(); // Update uptime
|
||||||
updateLocalNodeResources();
|
updateLocalNodeResources();
|
||||||
addOrUpdateNode(ctx.hostname, ctx.localIP);
|
addOrUpdateNode(ctx.hostname, ctx.localIP);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Broadcast heartbeat so peers can respond with their node info
|
// Broadcast heartbeat - peers will respond with NODE_UPDATE
|
||||||
lastHeartbeatSentAt = millis();
|
lastHeartbeatSentAt = millis();
|
||||||
ctx.udp->beginPacket("255.255.255.255", ctx.config.udp_port);
|
ctx.udp->beginPacket("255.255.255.255", ctx.config.udp_port);
|
||||||
String hb = String(ClusterProtocol::HEARTBEAT_MSG) + ":" + ctx.hostname;
|
String hb = String(ClusterProtocol::HEARTBEAT_MSG) + ":" + ctx.hostname;
|
||||||
ctx.udp->write(hb.c_str());
|
ctx.udp->write(hb.c_str());
|
||||||
ctx.udp->endPacket();
|
ctx.udp->endPacket();
|
||||||
|
|
||||||
|
LOG_DEBUG("Cluster", String("Sent heartbeat: ") + ctx.hostname);
|
||||||
}
|
}
|
||||||
|
|
||||||
void ClusterManager::updateAllMembersInfoTaskCallback() {
|
void ClusterManager::updateAllMembersInfoTaskCallback() {
|
||||||
@@ -423,6 +451,40 @@ void ClusterManager::updateAllMembersInfoTaskCallback() {
     // No-op to reduce network and memory usage
 }

+void ClusterManager::broadcastNodeUpdate() {
+    // Broadcast our current node info as NODE_UPDATE to all cluster members
+    auto& memberList = *ctx.memberList;
+    auto it = memberList.find(ctx.hostname);
+    if (it == memberList.end()) {
+        return;
+    }
+
+    const NodeInfo& node = it->second;
+
+    JsonDocument doc;
+    doc["hostname"] = node.hostname;
+    doc["uptime"] = node.uptime;
+
+    // Add labels if present
+    if (!node.labels.empty()) {
+        JsonObject labelsObj = doc["labels"].to<JsonObject>();
+        for (const auto& kv : node.labels) {
+            labelsObj[kv.first.c_str()] = kv.second;
+        }
+    }
+
+    String json;
+    serializeJson(doc, json);
+
+    // Broadcast to all cluster members
+    ctx.udp->beginPacket("255.255.255.255", ctx.config.udp_port);
+    String msg = String(ClusterProtocol::NODE_UPDATE_MSG) + ":" + ctx.hostname + ":" + json;
+    ctx.udp->write(msg.c_str());
+    ctx.udp->endPacket();
+
+    LOG_DEBUG("Cluster", String("Broadcasted NODE_UPDATE for ") + ctx.hostname);
+}
+
 void ClusterManager::updateAllNodeStatuses() {
     auto& memberList = *ctx.memberList;
     unsigned long now = millis();
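The new `broadcastNodeUpdate()` sends a colon-delimited message of the form `<NODE_UPDATE_MSG>:<hostname>:<json>`. Because the JSON payload itself contains colons, a receiver can only split on the first two. The following is a minimal sketch of that framing in portable C++, not the project's actual receive path: `kNodeUpdateTag`, `NodeUpdate`, `buildNodeUpdate`, and `parseNodeUpdate` are hypothetical names, and the tag value `"NODE_UPDATE"` is an assumption since the real constant's value does not appear in this diff.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for ClusterProtocol::NODE_UPDATE_MSG; the real
// constant's value is not shown in this diff.
const std::string kNodeUpdateTag = "NODE_UPDATE";

struct NodeUpdate {
    std::string hostname;
    std::string json;  // raw JSON payload, left unparsed here
};

// Mirrors the concatenation in broadcastNodeUpdate(): tag ":" hostname ":" json.
std::string buildNodeUpdate(const std::string& hostname, const std::string& json) {
    return kNodeUpdateTag + ":" + hostname + ":" + json;
}

// Splits on the first two ':' only, because the JSON payload contains colons.
std::optional<NodeUpdate> parseNodeUpdate(const std::string& msg) {
    const std::string prefix = kNodeUpdateTag + ":";
    if (msg.compare(0, prefix.size(), prefix) != 0) {
        return std::nullopt;  // not a NODE_UPDATE message
    }
    const std::size_t hostEnd = msg.find(':', prefix.size());
    if (hostEnd == std::string::npos) {
        return std::nullopt;  // hostname/payload separator is missing
    }
    return NodeUpdate{msg.substr(prefix.size(), hostEnd - prefix.size()),
                      msg.substr(hostEnd + 1)};
}
```

One consequence of this framing is that a hostname containing `:` would be split in the wrong place, so the sketch assumes hostnames never contain that character.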
@@ -255,11 +255,10 @@ void NodeService::handleGetConfigRequest(AsyncWebServerRequest* request) {

     // Cluster Configuration
     JsonObject clusterObj = doc["cluster"].to<JsonObject>();
-    clusterObj["discovery_interval_ms"] = ctx.config.discovery_interval_ms;
     clusterObj["heartbeat_interval_ms"] = ctx.config.heartbeat_interval_ms;
     clusterObj["cluster_listen_interval_ms"] = ctx.config.cluster_listen_interval_ms;
     clusterObj["status_update_interval_ms"] = ctx.config.status_update_interval_ms;
-    clusterObj["member_info_update_interval_ms"] = ctx.config.member_info_update_interval_ms;
+    clusterObj["node_update_broadcast_interval_ms"] = ctx.config.node_update_broadcast_interval_ms;
     clusterObj["print_interval_ms"] = ctx.config.print_interval_ms;

     // Node Status Thresholds
@@ -32,11 +32,10 @@ void Config::setDefaults() {
     api_server_port = DEFAULT_API_SERVER_PORT;

     // Cluster Configuration
-    discovery_interval_ms = DEFAULT_DISCOVERY_INTERVAL_MS; // TODO retire this in favor of heartbeat_interval_ms
     cluster_listen_interval_ms = DEFAULT_CLUSTER_LISTEN_INTERVAL_MS;
     heartbeat_interval_ms = DEFAULT_HEARTBEAT_INTERVAL_MS;
     status_update_interval_ms = DEFAULT_STATUS_UPDATE_INTERVAL_MS;
-    member_info_update_interval_ms = DEFAULT_MEMBER_INFO_UPDATE_INTERVAL_MS; // TODO retire this in favor of heartbeat_interval_ms
+    node_update_broadcast_interval_ms = DEFAULT_NODE_UPDATE_BROADCAST_INTERVAL_MS;
     print_interval_ms = DEFAULT_PRINT_INTERVAL_MS;

     // Node Status Thresholds
@@ -86,11 +85,10 @@ bool Config::saveToFile(const String& filename) {
     doc["network"]["api_server_port"] = api_server_port;

     // Cluster Configuration
-    doc["cluster"]["discovery_interval_ms"] = discovery_interval_ms;
     doc["cluster"]["heartbeat_interval_ms"] = heartbeat_interval_ms;
     doc["cluster"]["cluster_listen_interval_ms"] = cluster_listen_interval_ms;
     doc["cluster"]["status_update_interval_ms"] = status_update_interval_ms;
-    doc["cluster"]["member_info_update_interval_ms"] = member_info_update_interval_ms;
+    doc["cluster"]["node_update_broadcast_interval_ms"] = node_update_broadcast_interval_ms;
     doc["cluster"]["print_interval_ms"] = print_interval_ms;

     // Node Status Thresholds
@@ -166,11 +164,10 @@ bool Config::loadFromFile(const String& filename) {
     api_server_port = doc["network"]["api_server_port"] | DEFAULT_API_SERVER_PORT;

     // Load Cluster Configuration with defaults
-    discovery_interval_ms = doc["cluster"]["discovery_interval_ms"] | DEFAULT_DISCOVERY_INTERVAL_MS;
     heartbeat_interval_ms = doc["cluster"]["heartbeat_interval_ms"] | DEFAULT_HEARTBEAT_INTERVAL_MS;
     cluster_listen_interval_ms = doc["cluster"]["cluster_listen_interval_ms"] | DEFAULT_CLUSTER_LISTEN_INTERVAL_MS;
     status_update_interval_ms = doc["cluster"]["status_update_interval_ms"] | DEFAULT_STATUS_UPDATE_INTERVAL_MS;
-    member_info_update_interval_ms = doc["cluster"]["member_info_update_interval_ms"] | DEFAULT_MEMBER_INFO_UPDATE_INTERVAL_MS;
+    node_update_broadcast_interval_ms = doc["cluster"]["node_update_broadcast_interval_ms"] | DEFAULT_NODE_UPDATE_BROADCAST_INTERVAL_MS;
     print_interval_ms = doc["cluster"]["print_interval_ms"] | DEFAULT_PRINT_INTERVAL_MS;

     // Load Node Status Thresholds with defaults
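`loadFromFile()` reads each key with a `value | DEFAULT` fallback, so a config file saved before this change, which still carries `member_info_update_interval_ms`, would not populate the renamed `node_update_broadcast_interval_ms` field and the default would apply instead. The sketch below imitates that fallback in plain C++ with a `std::map` in place of the JSON document. `ClusterSection`, `valueOr`, and `loadNodeUpdateBroadcastMs` are hypothetical names, and the default of 5000 ms is an assumption because the real constant's value is not shown in this diff.

```cpp
#include <map>
#include <string>

// Hypothetical default; the real DEFAULT_NODE_UPDATE_BROADCAST_INTERVAL_MS
// value is not shown in this diff.
constexpr unsigned long kDefaultNodeUpdateBroadcastMs = 5000;

using ClusterSection = std::map<std::string, unsigned long>;

// Plain-C++ stand-in for the `doc["cluster"][key] | DEFAULT` pattern:
// return the stored value if the key exists, otherwise the fallback.
unsigned long valueOr(const ClusterSection& cluster, const std::string& key,
                      unsigned long fallback) {
    const auto it = cluster.find(key);
    return it == cluster.end() ? fallback : it->second;
}

// Reads the renamed key the way loadFromFile() does after this change.
unsigned long loadNodeUpdateBroadcastMs(const ClusterSection& cluster) {
    return valueOr(cluster, "node_update_broadcast_interval_ms",
                   kDefaultNodeUpdateBroadcastMs);
}
```

With this behavior, a section containing only the old key yields the default, while a section containing the new key yields the stored value.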