spore-gateway/docs/monitoring-example.md
Patrick Balsiger 3c3fb886a3 feat: mock gateway
2025-10-24 14:24:14 +02:00

Monitoring Resources Endpoint

Overview

The /api/monitoring/resources endpoint returns a point-in-time snapshot of resource usage (CPU, memory, network, and flash) for every node in the cluster, together with a cluster-wide summary.

Endpoint

GET /api/monitoring/resources

Response Format

{
  "timestamp": "2025-10-24T10:30:45Z",
  "nodes": [
    {
      "timestamp": 1729763445,
      "node_ip": "192.168.1.100",
      "hostname": "spore-node-1",
      "cpu": {
        "frequency_mhz": 160,
        "usage_percent": 42.5,
        "temperature_c": 58.3
      },
      "memory": {
        "total_bytes": 98304,
        "free_bytes": 45632,
        "used_bytes": 52672,
        "usage_percent": 53.6
      },
      "network": {
        "bytes_sent": 3245678,
        "bytes_received": 5678901,
        "packets_sent": 32456,
        "packets_received": 56789,
        "rssi_dbm": -65,
        "signal_quality_percent": 75.5
      },
      "flash": {
        "total_bytes": 4194304,
        "used_bytes": 2097152,
        "free_bytes": 2097152,
        "usage_percent": 50.0
      },
      "labels": {
        "version": "1.0.0",
        "stable": "true",
        "env": "production",
        "zone": "zone-1",
        "type": "spore-node"
      }
    }
  ],
  "summary": {
    "total_nodes": 5,
    "avg_cpu_usage_percent": 38.7,
    "avg_memory_usage_percent": 51.2,
    "avg_flash_usage_percent": 52.8,
    "total_bytes_sent": 16228390,
    "total_bytes_received": 28394505
  }
}
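
The sample response carries two timestamps in different encodings: the top-level value looks like an ISO 8601 string in UTC, while the per-node value looks like Unix epoch seconds. Both interpretations are inferred from the sample and are assumptions. A minimal sketch for normalizing them to timezone-aware datetimes (the helper names are hypothetical):

```python
from datetime import datetime, timezone


def parse_snapshot_time(value: str) -> datetime:
    """Parse the top-level timestamp, assumed to be ISO 8601 with a 'Z' suffix."""
    # Swap 'Z' for an explicit offset so this also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_node_time(value: int) -> datetime:
    """Parse a per-node timestamp, assumed to be Unix epoch seconds."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


snapshot_time = parse_snapshot_time("2025-10-24T10:30:45Z")
node_time = parse_node_time(1729763445)
print(snapshot_time.isoformat())
print(node_time.isoformat())
```

The sample values are illustrative mock data, so the two timestamps need not agree with each other.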

Data Fields

CPU Metrics

  • frequency_mhz: Current CPU frequency in MHz (80-240 MHz typical for ESP32)
  • usage_percent: CPU utilization percentage (0-100%)
  • temperature_c: CPU temperature in Celsius (45-65°C typical)

Memory Metrics

  • total_bytes: Total RAM in bytes (64-128 KB typical)
  • free_bytes: RAM currently free, in bytes
  • used_bytes: RAM currently in use, in bytes
  • usage_percent: Memory utilization percentage

Network Metrics

  • bytes_sent: Total bytes transmitted since boot
  • bytes_received: Total bytes received since boot
  • packets_sent: Total packets transmitted
  • packets_received: Total packets received
  • rssi_dbm: WiFi signal strength in dBm (-30 to -90 typical)
  • signal_quality_percent: WiFi signal quality (0-100%)

Flash Metrics

  • total_bytes: Total flash storage in bytes (typically 4 MB)
  • used_bytes: Used flash storage
  • free_bytes: Free flash storage
  • usage_percent: Flash utilization percentage
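
In the sample response, the memory and flash sections both satisfy used_bytes + free_bytes = total_bytes, and usage_percent equals used_bytes / total_bytes × 100 rounded to one decimal. The source does not state these relationships as guarantees, so the sketch below treats them as assumptions and simply checks them against the sample values (function names are hypothetical):

```python
def usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Utilization as a percentage, rounded to one decimal place."""
    if total_bytes <= 0:
        return 0.0
    return round(used_bytes / total_bytes * 100, 1)


def is_consistent(section: dict, tolerance: float = 0.1) -> bool:
    """Check that the byte counts add up and match the reported percentage."""
    sums_match = section["used_bytes"] + section["free_bytes"] == section["total_bytes"]
    expected = usage_percent(section["used_bytes"], section["total_bytes"])
    return sums_match and abs(expected - section["usage_percent"]) <= tolerance


# Values copied from the sample response.
memory = {"total_bytes": 98304, "free_bytes": 45632, "used_bytes": 52672, "usage_percent": 53.6}
flash = {"total_bytes": 4194304, "used_bytes": 2097152, "free_bytes": 2097152, "usage_percent": 50.0}

print(is_consistent(memory), is_consistent(flash))
```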

Node Labels

Each node carries labels that describe the firmware it runs and where it is deployed:

  • version: Current firmware version (e.g., "1.0.0", "1.1.0", "1.2.0")
  • stable: Whether this is a stable release ("true" or "false")
  • env: Environment (e.g., "production", "beta")
  • zone: Deployment zone (e.g., "zone-1", "zone-2", "zone-3")
  • type: Node type (e.g., "spore-node")

Summary Statistics

Aggregate metrics across all nodes:

  • total_nodes: Total number of nodes monitored
  • avg_cpu_usage_percent: Average CPU usage across all nodes
  • avg_memory_usage_percent: Average memory usage across all nodes
  • avg_flash_usage_percent: Average flash usage across all nodes
  • total_bytes_sent: Combined network traffic sent
  • total_bytes_received: Combined network traffic received
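
The summary fields can be reproduced client-side from the nodes array, which is handy for cross-checking a response or summarizing a filtered subset of nodes. This is a minimal sketch, not the gateway's implementation; rounding to one decimal and the `summarize` name are assumptions.

```python
from statistics import mean


def summarize(nodes: list[dict]) -> dict:
    """Compute cluster-wide aggregates from a list of node entries."""
    if not nodes:
        return {
            "total_nodes": 0,
            "avg_cpu_usage_percent": 0.0,
            "avg_memory_usage_percent": 0.0,
            "avg_flash_usage_percent": 0.0,
            "total_bytes_sent": 0,
            "total_bytes_received": 0,
        }
    return {
        "total_nodes": len(nodes),
        "avg_cpu_usage_percent": round(mean(n["cpu"]["usage_percent"] for n in nodes), 1),
        "avg_memory_usage_percent": round(mean(n["memory"]["usage_percent"] for n in nodes), 1),
        "avg_flash_usage_percent": round(mean(n["flash"]["usage_percent"] for n in nodes), 1),
        "total_bytes_sent": sum(n["network"]["bytes_sent"] for n in nodes),
        "total_bytes_received": sum(n["network"]["bytes_received"] for n in nodes),
    }


# Two made-up nodes containing only the fields the summary needs.
example_nodes = [
    {"cpu": {"usage_percent": 40.0}, "memory": {"usage_percent": 50.0},
     "flash": {"usage_percent": 50.0}, "network": {"bytes_sent": 100, "bytes_received": 300}},
    {"cpu": {"usage_percent": 50.0}, "memory": {"usage_percent": 60.0},
     "flash": {"usage_percent": 52.0}, "network": {"bytes_sent": 200, "bytes_received": 400}},
]
example_summary = summarize(example_nodes)
print(example_summary)
```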

Firmware Version Matching

Node labels are automatically synchronized with the firmware available in the registry:

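| Version | Registry Status | Node Distribution | Environment |
|---------|-----------------|-------------------|-------------|
| 1.0.0   | Stable          | 40% of nodes      | production  |
| 1.1.0   | Stable          | 40% of nodes      | production  |
| 1.2.0   | Beta            | 20% of nodes      | beta        |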

This ensures that monitoring data accurately reflects which firmware versions are deployed across the cluster.
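
To see how firmware versions are distributed in an actual response, the version label of each node can be tallied. The sketch below is illustrative only: the `version_distribution` helper is hypothetical, and the five-node input is made up to mirror the 40/40/20 split in the table.

```python
from collections import Counter


def version_distribution(nodes: list[dict]) -> dict[str, float]:
    """Return the percentage of nodes running each firmware version."""
    counts = Counter(node["labels"]["version"] for node in nodes)
    total = sum(counts.values())
    # An empty node list yields an empty dict, so no division by zero occurs.
    return {version: count * 100 / total for version, count in sorted(counts.items())}


# Made-up nodes carrying only the label this function reads.
example_nodes = [
    {"labels": {"version": "1.0.0"}},
    {"labels": {"version": "1.0.0"}},
    {"labels": {"version": "1.1.0"}},
    {"labels": {"version": "1.1.0"}},
    {"labels": {"version": "1.2.0"}},
]
distribution = version_distribution(example_nodes)
print(distribution)
```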

Use Cases

1. Real-time Dashboard

Display live resource usage for all nodes in a monitoring dashboard.

2. Alerting

Set up alerts based on thresholds:

  • CPU usage > 80%
  • Memory usage > 90%
  • Flash usage > 95%
  • WiFi signal quality < 30%
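
The thresholds above can be applied to each node entry in a response. This is a minimal sketch: the `check_alerts` function and the alert names it returns are hypothetical, and only the threshold values come from the list above.

```python
def check_alerts(node: dict) -> list[str]:
    """Return the names of all thresholds a node currently violates."""
    alerts = []
    if node["cpu"]["usage_percent"] > 80:
        alerts.append("cpu")
    if node["memory"]["usage_percent"] > 90:
        alerts.append("memory")
    if node["flash"]["usage_percent"] > 95:
        alerts.append("flash")
    if node["network"]["signal_quality_percent"] < 30:
        alerts.append("wifi")
    return alerts


# Values from the sample response: nothing exceeds a threshold.
healthy_node = {
    "cpu": {"usage_percent": 42.5},
    "memory": {"usage_percent": 53.6},
    "flash": {"usage_percent": 50.0},
    "network": {"signal_quality_percent": 75.5},
}
# Made-up values that violate every threshold.
stressed_node = {
    "cpu": {"usage_percent": 91.0},
    "memory": {"usage_percent": 95.5},
    "flash": {"usage_percent": 97.0},
    "network": {"signal_quality_percent": 12.0},
}
print(check_alerts(healthy_node))
print(check_alerts(stressed_node))
```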

3. Capacity Planning

Track resource trends to plan firmware optimizations or hardware upgrades.

4. Firmware Rollout Monitoring

Monitor resource usage before, during, and after firmware rollouts to detect issues.
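
One simple way to compare the cluster before and after a rollout is to diff the summary averages of two snapshots. The sketch below assumes both snapshots are responses from this endpoint; the `summary_delta` helper and the "after" values are made up for illustration.

```python
SUMMARY_KEYS = (
    "avg_cpu_usage_percent",
    "avg_memory_usage_percent",
    "avg_flash_usage_percent",
)


def summary_delta(before: dict, after: dict) -> dict[str, float]:
    """Return the change in each summary average (after minus before)."""
    return {
        key: round(after["summary"][key] - before["summary"][key], 1)
        for key in SUMMARY_KEYS
    }


# "before" uses the summary from the sample response; "after" is hypothetical.
before = {"summary": {"avg_cpu_usage_percent": 38.7,
                      "avg_memory_usage_percent": 51.2,
                      "avg_flash_usage_percent": 52.8}}
after = {"summary": {"avg_cpu_usage_percent": 45.2,
                     "avg_memory_usage_percent": 51.0,
                     "avg_flash_usage_percent": 55.3}}
delta = summary_delta(before, after)
print(delta)
```

A positive delta means usage rose after the rollout; what counts as a regression depends on the deployment.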

5. Network Health

Track WiFi signal quality and network traffic to identify connectivity issues.

Example Usage

cURL

curl http://localhost:3001/api/monitoring/resources

JavaScript (fetch)

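// Top-level await requires an ES module; otherwise wrap this in an async function.
const response = await fetch('http://localhost:3001/api/monitoring/resources');
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}
const data = await response.json();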

console.log(`Monitoring ${data.summary.total_nodes} nodes`);
console.log(`Average CPU: ${data.summary.avg_cpu_usage_percent.toFixed(1)}%`);
console.log(`Average Memory: ${data.summary.avg_memory_usage_percent.toFixed(1)}%`);

data.nodes.forEach(node => {
  console.log(`${node.hostname} (${node.labels.version}): CPU ${node.cpu.usage_percent.toFixed(1)}%`);
});

Python

import requests

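# A timeout keeps the script from hanging if the gateway is unreachable.
response = requests.get('http://localhost:3001/api/monitoring/resources', timeout=5)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
data = response.json()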

print(f"Monitoring {data['summary']['total_nodes']} nodes")
print(f"Average CPU: {data['summary']['avg_cpu_usage_percent']:.1f}%")
print(f"Average Memory: {data['summary']['avg_memory_usage_percent']:.1f}%")

for node in data['nodes']:
    print(f"{node['hostname']} ({node['labels']['version']}): "
          f"CPU {node['cpu']['usage_percent']:.1f}%")

Mock Gateway Behavior

The mock gateway generates realistic monitoring data with:

  • Dynamic values: CPU, memory, and network metrics vary on each request
  • Realistic ranges: Values stay within typical ESP32 hardware limits
  • Signal quality: WiFi RSSI is converted to a quality percentage
  • Consistent labels: Node labels always match firmware registry versions
  • Aggregate summaries: Automatic calculation of cluster-wide statistics
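
The source does not say which formula the mock gateway uses to turn RSSI into a quality percentage. For orientation only, the sketch below shows one common linear approximation that maps -100 dBm to 0% and -50 dBm to 100%. This mapping is an assumption, not the gateway's method: it yields 70% for the sample's -65 dBm, whereas the sample response reports 75.5%.

```python
def rssi_to_quality(rssi_dbm: float) -> float:
    """Map RSSI in dBm to a 0-100% quality value.

    Assumed linear mapping: -100 dBm -> 0%, -50 dBm -> 100%, clamped at both ends.
    """
    quality = 2 * (rssi_dbm + 100)
    return max(0.0, min(100.0, float(quality)))


for rssi in (-30, -65, -90, -110):
    print(rssi, rssi_to_quality(rssi))
```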

Integration with WebSocket

For real-time updates, combine this endpoint with the WebSocket connection at /ws, which broadcasts:

  • Node status changes
  • Firmware update progress
  • Cluster membership changes

The monitoring endpoint provides detailed point-in-time snapshots, while WebSocket provides real-time event streams.