# Monitoring Resources Endpoint

## Overview

The `/api/monitoring/resources` endpoint returns a real-time snapshot of resource usage (CPU, memory, network, and flash) for every node in the cluster, together with cluster-wide summary statistics.

## Endpoint

```
GET /api/monitoring/resources
```

## Response Format

Example response (the `nodes` array is shortened to a single entry; the `summary` block reflects all five nodes). The top-level `timestamp` is an ISO 8601 UTC string, while each node's `timestamp` is a Unix timestamp in seconds.

```json
{
  "timestamp": "2025-10-24T10:30:45Z",
  "nodes": [
    {
      "timestamp": 1729763445,
      "node_ip": "192.168.1.100",
      "hostname": "spore-node-1",
      "cpu": {
        "frequency_mhz": 160,
        "usage_percent": 42.5,
        "temperature_c": 58.3
      },
      "memory": {
        "total_bytes": 98304,
        "free_bytes": 45632,
        "used_bytes": 52672,
        "usage_percent": 53.6
      },
      "network": {
        "bytes_sent": 3245678,
        "bytes_received": 5678901,
        "packets_sent": 32456,
        "packets_received": 56789,
        "rssi_dbm": -65,
        "signal_quality_percent": 75.5
      },
      "flash": {
        "total_bytes": 4194304,
        "used_bytes": 2097152,
        "free_bytes": 2097152,
        "usage_percent": 50.0
      },
      "labels": {
        "version": "1.0.0",
        "stable": "true",
        "env": "production",
        "zone": "zone-1",
        "type": "spore-node"
      }
    }
  ],
  "summary": {
    "total_nodes": 5,
    "avg_cpu_usage_percent": 38.7,
    "avg_memory_usage_percent": 51.2,
    "avg_flash_usage_percent": 52.8,
    "total_bytes_sent": 16228390,
    "total_bytes_received": 28394505
  }
}
```
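
A client may want to load the response into typed objects before using it. The sketch below uses only the standard library and assumes the field names shown in the example response; `NodeSnapshot` and `parse_nodes` are hypothetical client-side names, not part of the API, and only a subset of fields is kept.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class NodeSnapshot:
    """Hypothetical client type holding a subset of one `nodes` entry."""
    hostname: str
    node_ip: str
    timestamp: int  # Unix seconds, as in the example response
    cpu_usage_percent: float
    memory_usage_percent: float
    flash_usage_percent: float
    rssi_dbm: int
    labels: dict[str, str]


def parse_nodes(payload: dict[str, Any]) -> list[NodeSnapshot]:
    """Convert the decoded JSON body into NodeSnapshot objects.

    Raises KeyError if an expected field is missing.
    """
    snapshots = []
    for node in payload["nodes"]:
        snapshots.append(
            NodeSnapshot(
                hostname=node["hostname"],
                node_ip=node["node_ip"],
                timestamp=int(node["timestamp"]),
                cpu_usage_percent=float(node["cpu"]["usage_percent"]),
                memory_usage_percent=float(node["memory"]["usage_percent"]),
                flash_usage_percent=float(node["flash"]["usage_percent"]),
                rssi_dbm=int(node["network"]["rssi_dbm"]),
                labels=dict(node["labels"]),
            )
        )
    return snapshots


# Trimmed copy of the example response (only the fields parse_nodes reads)
example = {
    "nodes": [{
        "timestamp": 1729763445, "node_ip": "192.168.1.100", "hostname": "spore-node-1",
        "cpu": {"usage_percent": 42.5}, "memory": {"usage_percent": 53.6},
        "flash": {"usage_percent": 50.0}, "network": {"rssi_dbm": -65},
        "labels": {"version": "1.0.0", "stable": "true"},
    }]
}
print(parse_nodes(example)[0].hostname)  # spore-node-1
```

Failing loudly with `KeyError` on a missing field is a deliberate choice here; a dashboard that must tolerate partial data could use `dict.get` with defaults instead.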

## Data Fields

### CPU Metrics

- **frequency_mhz**: Current CPU frequency in MHz (typically 80-240 MHz on ESP32)
- **usage_percent**: CPU utilization percentage (0-100%)
- **temperature_c**: CPU temperature in degrees Celsius (typically 45-65°C)

### Memory Metrics

- **total_bytes**: Total RAM in bytes (typically 64-128 KB)
- **free_bytes**: Free RAM in bytes
- **used_bytes**: Used RAM in bytes
- **usage_percent**: Memory utilization percentage
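
In the example response the memory fields are related: `used_bytes` equals `total_bytes - free_bytes` (98304 - 45632 = 52672), and `usage_percent` equals `used_bytes / total_bytes * 100` rounded to one decimal (53.6). The flash fields in the example follow the same pattern. Assuming that relationship holds in general, a client can recompute or sanity-check the values; `usage_from_bytes` is a hypothetical helper name.

```python
def usage_from_bytes(total_bytes: int, free_bytes: int) -> tuple[int, float]:
    """Return (used_bytes, usage_percent rounded to one decimal).

    Assumes used = total - free, as in the example response.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used = total_bytes - free_bytes
    percent = round(used / total_bytes * 100, 1)
    return used, percent


# Memory values from the example response
print(usage_from_bytes(98304, 45632))      # (52672, 53.6)
# Flash values from the example response
print(usage_from_bytes(4194304, 2097152))  # (2097152, 50.0)
```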

### Network Metrics

- **bytes_sent**: Total bytes transmitted since boot
- **bytes_received**: Total bytes received since boot
- **packets_sent**: Total packets transmitted
- **packets_received**: Total packets received
- **rssi_dbm**: WiFi signal strength in dBm (typically -30 to -90; values closer to 0 indicate a stronger signal)
- **signal_quality_percent**: WiFi signal quality derived from RSSI (0-100%)
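
Because `bytes_sent` and `bytes_received` are cumulative since boot, throughput has to be derived from two snapshots of the same node. The sketch below assumes the node-level `timestamp` is in seconds; it divides the counter delta by the elapsed time and treats a counter that went backwards as a reboot, returning `None` instead of a negative rate. The function name is hypothetical.

```python
from typing import Optional


def bytes_per_second(
    prev_count: int, prev_ts: int, curr_count: int, curr_ts: int
) -> Optional[float]:
    """Average rate between two cumulative counter readings.

    Returns None when the interval is not positive or the counter
    decreased (for example, the node rebooted and its counters restarted).
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0 or curr_count < prev_count:
        return None
    return (curr_count - prev_count) / elapsed


# Two hypothetical readings of bytes_sent taken 10 seconds apart
print(bytes_per_second(3245678, 1729763445, 3255678, 1729763455))  # 1000.0
```

The same function works for `bytes_received`, `packets_sent`, and `packets_received`.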

### Flash Metrics

- **total_bytes**: Total flash storage in bytes (typically 4 MB, i.e. 4194304 bytes)
- **used_bytes**: Used flash storage in bytes
- **free_bytes**: Free flash storage in bytes
- **usage_percent**: Flash utilization percentage

### Node Labels

Each node carries string-valued labels describing its firmware and deployment:

- **version**: Current firmware version (e.g., "1.0.0", "1.1.0", "1.2.0")
- **stable**: Whether this is a stable release, as a string ("true" or "false")
- **env**: Environment (e.g., "production", "beta")
- **zone**: Deployment zone (e.g., "zone-1", "zone-2", "zone-3")
- **type**: Node type (e.g., "spore-node")
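
Since every label value is a string, including `stable`, a filter should compare against `"true"` rather than a boolean. A minimal selector-style filter, assuming exact string matches on every requested label; `select_nodes`, its keyword-argument selector format, and the second sample node (`spore-node-5`) are illustrative only.

```python
from typing import Any


def select_nodes(nodes: list[dict[str, Any]], **selector: str) -> list[dict[str, Any]]:
    """Return nodes whose labels match every key=value pair in selector.

    Label values are compared as strings, so pass stable="true", not True.
    An empty selector matches every node.
    """
    return [
        node for node in nodes
        if all(node.get("labels", {}).get(key) == value for key, value in selector.items())
    ]


nodes = [
    {"hostname": "spore-node-1", "labels": {"version": "1.0.0", "stable": "true", "env": "production"}},
    {"hostname": "spore-node-5", "labels": {"version": "1.2.0", "stable": "false", "env": "beta"}},
]
print([n["hostname"] for n in select_nodes(nodes, env="production", stable="true")])
# ['spore-node-1']
```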

### Summary Statistics

Aggregate metrics across all nodes:

- **total_nodes**: Total number of nodes monitored
- **avg_cpu_usage_percent**: Average CPU usage across all nodes
- **avg_memory_usage_percent**: Average memory usage across all nodes
- **avg_flash_usage_percent**: Average flash usage across all nodes
- **total_bytes_sent**: Combined bytes sent by all nodes
- **total_bytes_received**: Combined bytes received by all nodes
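
The summary can be reproduced from the `nodes` array, which is handy for checking the gateway's numbers or for summarizing a filtered subset of nodes. This sketch assumes the `avg_*` fields are plain arithmetic means and the totals are sums over the listed nodes; rounding to one decimal is inferred from the example values. `summarize` and the two sample nodes are hypothetical.

```python
from statistics import mean
from typing import Any


def summarize(nodes: list[dict[str, Any]]) -> dict[str, Any]:
    """Recompute summary-style statistics from a list of node entries."""
    if not nodes:
        # Averages are undefined for an empty list, so only the count is returned
        return {"total_nodes": 0}
    return {
        "total_nodes": len(nodes),
        "avg_cpu_usage_percent": round(mean(n["cpu"]["usage_percent"] for n in nodes), 1),
        "avg_memory_usage_percent": round(mean(n["memory"]["usage_percent"] for n in nodes), 1),
        "avg_flash_usage_percent": round(mean(n["flash"]["usage_percent"] for n in nodes), 1),
        "total_bytes_sent": sum(n["network"]["bytes_sent"] for n in nodes),
        "total_bytes_received": sum(n["network"]["bytes_received"] for n in nodes),
    }


two_nodes = [
    {"cpu": {"usage_percent": 40.0}, "memory": {"usage_percent": 50.0},
     "flash": {"usage_percent": 50.0}, "network": {"bytes_sent": 100, "bytes_received": 10}},
    {"cpu": {"usage_percent": 30.0}, "memory": {"usage_percent": 60.0},
     "flash": {"usage_percent": 40.0}, "network": {"bytes_sent": 200, "bytes_received": 20}},
]
print(summarize(two_nodes))
```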

## Firmware Version Matching

Node labels are automatically synchronized with the firmware available in the registry:

| Version | Registry Status | Node Distribution | Environment |
|---------|-----------------|-------------------|-------------|
| 1.0.0   | Stable          | 40% of nodes      | production  |
| 1.1.0   | Stable          | 40% of nodes      | production  |
| 1.2.0   | Beta            | 20% of nodes      | beta        |

This ensures that monitoring data accurately reflects which firmware versions are deployed across the cluster.
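
To compare a live response against the distribution in the table, count nodes per `labels.version`. A minimal sketch; `version_distribution` is a hypothetical helper, and percentages are relative to the nodes present in the response.

```python
from collections import Counter
from typing import Any


def version_distribution(nodes: list[dict[str, Any]]) -> dict[str, float]:
    """Map each firmware version to its share of nodes, in percent."""
    counts = Counter(node["labels"]["version"] for node in nodes)
    total = sum(counts.values())
    return {version: round(count / total * 100, 1) for version, count in sorted(counts.items())}


sample = [{"labels": {"version": v}} for v in ["1.0.0", "1.0.0", "1.1.0", "1.1.0", "1.2.0"]]
print(version_distribution(sample))  # {'1.0.0': 40.0, '1.1.0': 40.0, '1.2.0': 20.0}
```

Versions are sorted as plain strings, which is fine for the versions listed here but would order "1.10.0" before "1.2.0".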

## Use Cases

### 1. Real-time Dashboard

Display live resource usage for all nodes in a monitoring dashboard.

### 2. Alerting

Set up alerts based on thresholds such as:

- CPU usage > 80%
- Memory usage > 90%
- Flash usage > 95%
- WiFi signal quality < 30%
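
The example thresholds map directly onto per-node checks. A sketch using exactly those thresholds; the `THRESHOLDS` table and `check_alerts` function are illustrative, not a built-in alerting feature of the gateway.

```python
from typing import Any

# Example thresholds from the list above: (section, field, comparison, limit)
THRESHOLDS = [
    ("cpu", "usage_percent", ">", 80.0),
    ("memory", "usage_percent", ">", 90.0),
    ("flash", "usage_percent", ">", 95.0),
    ("network", "signal_quality_percent", "<", 30.0),
]


def check_alerts(node: dict[str, Any]) -> list[str]:
    """Return one human-readable message per threshold the node breaches."""
    alerts = []
    for section, field, op, limit in THRESHOLDS:
        value = node[section][field]
        breached = value > limit if op == ">" else value < limit
        if breached:
            alerts.append(f"{node['hostname']}: {section}.{field} = {value} ({op} {limit})")
    return alerts


node = {
    "hostname": "spore-node-1",
    "cpu": {"usage_percent": 85.0},
    "memory": {"usage_percent": 53.6},
    "flash": {"usage_percent": 50.0},
    "network": {"signal_quality_percent": 25.0},
}
for message in check_alerts(node):
    print(message)
```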

### 3. Capacity Planning

Track resource trends to plan firmware optimizations or hardware upgrades.
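
One simple way to quantify a trend is to store successive snapshots and fit a least-squares line to a metric over time. The sketch assumes the node `timestamp` is in seconds and that you collect the samples yourself, since the endpoint returns only the current snapshot. Linear regression is just one choice (it is sensitive to outliers), and `trend_per_hour` is a hypothetical name.

```python
def trend_per_hour(samples: list[tuple[int, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, value) samples, in units per hour."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    return cov / var * 3600  # slope per second, scaled to per hour


# Hypothetical flash usage readings, one per hour, growing 0.5 points/hour
readings = [(1729763445 + 3600 * i, 50.0 + 0.5 * i) for i in range(4)]
print(round(trend_per_hour(readings), 3))  # 0.5
```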

### 4. Firmware Rollout Monitoring

Monitor resource usage before, during, and after firmware rollouts to detect issues.
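
During a rollout, comparing resource usage across firmware versions can surface regressions. A minimal sketch that groups nodes by `labels.version` and averages one metric; `average_by_version` and its `section`/`field` parameters are illustrative names, and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Any


def average_by_version(
    nodes: list[dict[str, Any]], section: str = "cpu", field: str = "usage_percent"
) -> dict[str, float]:
    """Mean of node[section][field] for each firmware version."""
    groups: dict[str, list[float]] = defaultdict(list)
    for node in nodes:
        groups[node["labels"]["version"]].append(node[section][field])
    return {version: round(mean(values), 1) for version, values in sorted(groups.items())}


nodes = [
    {"labels": {"version": "1.1.0"}, "cpu": {"usage_percent": 40.0}},
    {"labels": {"version": "1.1.0"}, "cpu": {"usage_percent": 44.0}},
    {"labels": {"version": "1.2.0"}, "cpu": {"usage_percent": 61.0}},
]
print(average_by_version(nodes))  # {'1.1.0': 42.0, '1.2.0': 61.0}
```

Passing `section="memory"` or `section="flash"` compares those metrics instead.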

### 5. Network Health

Track WiFi signal quality and network traffic to identify connectivity issues.

## Example Usage

### cURL

```bash
curl http://localhost:3001/api/monitoring/resources
```

### JavaScript (fetch)

```javascript
// Run as an ES module (or inside an async function) so top-level await works
const response = await fetch('http://localhost:3001/api/monitoring/resources');
if (!response.ok) {
  throw new Error(`Request failed with HTTP ${response.status}`);
}
const data = await response.json();

console.log(`Monitoring ${data.summary.total_nodes} nodes`);
console.log(`Average CPU: ${data.summary.avg_cpu_usage_percent.toFixed(1)}%`);
console.log(`Average Memory: ${data.summary.avg_memory_usage_percent.toFixed(1)}%`);

// One line per node: hostname, firmware version, and CPU usage
data.nodes.forEach(node => {
  console.log(`${node.hostname} (${node.labels.version}): CPU ${node.cpu.usage_percent.toFixed(1)}%`);
});
```

### Python

```python
import requests

# Time out instead of hanging, and raise on non-2xx responses
response = requests.get('http://localhost:3001/api/monitoring/resources', timeout=5)
response.raise_for_status()
data = response.json()

print(f"Monitoring {data['summary']['total_nodes']} nodes")
print(f"Average CPU: {data['summary']['avg_cpu_usage_percent']:.1f}%")
print(f"Average Memory: {data['summary']['avg_memory_usage_percent']:.1f}%")

# One line per node: hostname, firmware version, and CPU usage
for node in data['nodes']:
    print(f"{node['hostname']} ({node['labels']['version']}): "
          f"CPU {node['cpu']['usage_percent']:.1f}%")
```

## Mock Gateway Behavior

The mock gateway generates realistic monitoring data with:

- **Dynamic values**: CPU, memory, and network metrics vary on each request
- **Realistic ranges**: Values stay within typical ESP32 hardware limits
- **Signal quality**: WiFi RSSI is converted to a quality percentage
- **Consistent labels**: Node labels always match firmware registry versions
- **Aggregate summaries**: Cluster-wide statistics are calculated automatically

## Integration with WebSocket

For real-time updates, combine this endpoint with the WebSocket connection at `/ws`, which broadcasts:

- Node status changes
- Firmware update progress
- Cluster membership changes

The monitoring endpoint provides detailed point-in-time snapshots, while the WebSocket provides a real-time event stream.