spore-gateway/docs/Rollout.md
2025-10-22 19:57:48 +02:00

# Rollout
The rollout feature provides orchestrated firmware updates across multiple SPORE nodes. It integrates with the spore-registry to manage firmware binaries and uses WebSocket communication for real-time progress updates.
## Architecture
### Components
- **spore-gateway**: Orchestrates rollouts, proxies registry calls, manages WebSocket communication
- **spore-registry**: Stores firmware binaries and metadata
- **spore-ui**: Provides rollout interface and real-time status updates
- **SPORE Nodes**: Target devices for firmware updates
### Data Flow
1. **UI Discovery**: Frontend queries `/api/cluster/node/versions` to find matching nodes
2. **Rollout Initiation**: Frontend sends firmware info and node list to `/api/rollout`
3. **Parallel Processing**: Gateway processes multiple nodes concurrently using goroutines
4. **Real-time Updates**: Progress and status updates sent via WebSocket
5. **Status Display**: UI shows updating status directly on cluster view nodes
## API Endpoints
### `/api/cluster/node/versions` (GET)
Returns the cluster members together with each node's current firmware version, which is read from the node's `version` label.
**Response:**
```json
{
  "members": [
    {
      "ip": "10.0.1.134",
      "version": "1.1.0",
      "labels": {"app": "base", "role": "debug"}
    }
  ]
}
```
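
A minimal sketch of how a Go client might decode this response. The type names `ClusterMember` and `VersionsResponse` and the helper `parseVersions` are illustrative assumptions, not names taken from the gateway source; only the JSON field names come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterMember mirrors one entry of the "members" array shown above.
// The Go type name is hypothetical; the JSON keys are from the example.
type ClusterMember struct {
	IP      string            `json:"ip"`
	Version string            `json:"version"`
	Labels  map[string]string `json:"labels"`
}

// VersionsResponse is a hypothetical name for the response envelope.
type VersionsResponse struct {
	Members []ClusterMember `json:"members"`
}

// parseVersions decodes a response body from /api/cluster/node/versions.
func parseVersions(body []byte) (VersionsResponse, error) {
	var resp VersionsResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	body := []byte(`{"members":[{"ip":"10.0.1.134","version":"1.1.0","labels":{"app":"base","role":"debug"}}]}`)
	resp, err := parseVersions(body)
	if err != nil {
		panic(err)
	}
	for _, m := range resp.Members {
		fmt.Printf("%s runs %s (labels: %v)\n", m.IP, m.Version, m.Labels)
	}
}
```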
### `/api/rollout` (POST)
Initiates a firmware rollout for specified nodes.
**Request Body:**
```json
{
  "firmware": {
    "name": "my-firmware",
    "version": "1.0.0",
    "labels": {"app": "base"}
  },
  "nodes": [
    {
      "ip": "10.0.1.134",
      "version": "1.1.0",
      "labels": {"app": "base", "role": "debug"}
    }
  ]
}
```
**Response:**
```json
{
  "success": true,
  "message": "Rollout started for 3 nodes",
  "rolloutId": "rollout_1761076653",
  "totalNodes": 3,
  "firmwareUrl": "http://localhost:3002/firmware/my-firmware/1.0.0"
}
```
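
The request and response shapes above could be exercised from Go roughly as follows. This is a sketch under assumptions: the type names, `startRollout`, and `newStubGateway` are hypothetical, the stub stands in for the real gateway so the example runs on its own, and treating any 2xx status as success is a guess, since the success status code is not documented here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Firmware and Node mirror the objects in the request body above.
type Firmware struct {
	Name    string            `json:"name"`
	Version string            `json:"version"`
	Labels  map[string]string `json:"labels"`
}

type Node struct {
	IP      string            `json:"ip"`
	Version string            `json:"version"`
	Labels  map[string]string `json:"labels"`
}

type RolloutRequest struct {
	Firmware Firmware `json:"firmware"`
	Nodes    []Node   `json:"nodes"`
}

type RolloutResponse struct {
	Success     bool   `json:"success"`
	Message     string `json:"message"`
	RolloutID   string `json:"rolloutId"`
	TotalNodes  int    `json:"totalNodes"`
	FirmwareURL string `json:"firmwareUrl"`
}

// startRollout POSTs req to <baseURL>/api/rollout and decodes the reply.
func startRollout(baseURL string, req RolloutRequest) (RolloutResponse, error) {
	var out RolloutResponse
	payload, err := json.Marshal(req)
	if err != nil {
		return out, err
	}
	resp, err := http.Post(baseURL+"/api/rollout", "application/json", bytes.NewReader(payload))
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	// Assumption: any 2xx status means the rollout was accepted.
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return out, fmt.Errorf("rollout request failed: %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// newStubGateway is a local stand-in for the gateway (not its real handler).
// It echoes the node count back in the documented response shape.
func newStubGateway() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req RolloutRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(RolloutResponse{
			Success:    true,
			Message:    fmt.Sprintf("Rollout started for %d nodes", len(req.Nodes)),
			RolloutID:  "rollout_1761076653",
			TotalNodes: len(req.Nodes),
		})
	}))
}

func main() {
	gw := newStubGateway()
	defer gw.Close()

	resp, err := startRollout(gw.URL, RolloutRequest{
		Firmware: Firmware{Name: "my-firmware", Version: "1.0.0", Labels: map[string]string{"app": "base"}},
		Nodes:    []Node{{IP: "10.0.1.134", Version: "1.1.0", Labels: map[string]string{"app": "base", "role": "debug"}}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Message, resp.RolloutID)
}
```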
## Rollout Process
### 1. Firmware Lookup
- Gateway looks up firmware in registry by name and version
- Validates firmware exists and is accessible
### 2. Parallel Node Processing
- Each node is processed in a separate goroutine
- Uses `sync.WaitGroup` for coordination
- All nodes in a rollout are started at once, so up to N updates run concurrently (where N = total nodes)
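
The fan-out described above can be sketched as follows: one goroutine per node, a `sync.WaitGroup` to wait for all of them, and a mutex-protected map to collect per-node results. `rolloutAll` and `errUnreachable` are hypothetical names for illustration, and the `update` callback stands in for the real per-node work.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnreachable is a hypothetical error used only to simulate a failing node.
var errUnreachable = errors.New("node unreachable")

// rolloutAll starts one goroutine per node IP, waits for all of them,
// and returns each node's result (nil on success).
func rolloutAll(ips []string, update func(ip string) error) map[string]error {
	results := make(map[string]error, len(ips))
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	for _, ip := range ips {
		wg.Add(1)
		go func(ip string) {
			defer wg.Done()
			err := update(ip)
			// The map is shared between goroutines, so guard each write.
			mu.Lock()
			results[ip] = err
			mu.Unlock()
		}(ip)
	}
	wg.Wait() // blocks until every goroutine has called Done
	return results
}

func main() {
	ips := []string{"10.0.1.134", "10.0.1.135", "10.0.1.136"}
	results := rolloutAll(ips, func(ip string) error {
		if ip == "10.0.1.135" {
			return errUnreachable
		}
		return nil
	})
	for _, ip := range ips {
		fmt.Printf("%s: %v\n", ip, results[ip])
	}
}
```

A failure on one node only affects that node's entry in the result map; the other goroutines run to completion.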
### 3. Node Update Sequence
For each node:
1. **Status Update**: Broadcast `"updating"` status via WebSocket
2. **Label Update**: Update node's `version` label to new firmware version
3. **Firmware Upload**: Upload firmware binary to node
4. **Status Completion**: Broadcast `"online"` status via WebSocket
### 4. Error Handling
- When a node's update fails, the gateway still broadcasts `"online"` for that node so the UI returns it to its normal state
- Rollout continues for remaining nodes
- Detailed error logging for debugging
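
The node update sequence and its failure path can be sketched together. The `nodeOps` hooks, `updateNode`, `recordingOps`, and `errUploadFailed` are all hypothetical stand-ins for the gateway's real WebSocket broadcast, label update, and upload calls. The point of the sketch is the ordering, and that `"online"` is broadcast on both the success and failure paths.

```go
package main

import (
	"errors"
	"fmt"
)

// errUploadFailed is a hypothetical error used to simulate a failed upload.
var errUploadFailed = errors.New("upload failed")

// nodeOps bundles the side effects of a node update as injectable hooks.
type nodeOps struct {
	broadcastStatus func(ip, status string)
	updateLabel     func(ip, version string) error
	uploadFirmware  func(ip string) error
}

// updateNode runs the documented sequence for one node.
func updateNode(ops nodeOps, ip, version string) error {
	ops.broadcastStatus(ip, "updating")
	// Deferred so the node returns to "online" whether or not a step fails.
	defer ops.broadcastStatus(ip, "online")

	if err := ops.updateLabel(ip, version); err != nil {
		return fmt.Errorf("label update on %s: %w", ip, err)
	}
	if err := ops.uploadFirmware(ip); err != nil {
		return fmt.Errorf("firmware upload to %s: %w", ip, err)
	}
	return nil
}

// recordingOps returns hooks that append each step to log instead of doing
// real work; uploadErr, if non-nil, is returned by the upload step.
func recordingOps(log *[]string, uploadErr error) nodeOps {
	return nodeOps{
		broadcastStatus: func(ip, status string) { *log = append(*log, "status:"+status) },
		updateLabel: func(ip, version string) error {
			*log = append(*log, "label:"+version)
			return nil
		},
		uploadFirmware: func(ip string) error {
			*log = append(*log, "upload")
			return uploadErr
		},
	}
}

func main() {
	var log []string
	err := updateNode(recordingOps(&log, nil), "10.0.1.134", "1.0.0")
	fmt.Println(log, err)

	log = nil
	err = updateNode(recordingOps(&log, errUploadFailed), "10.0.1.134", "1.0.0")
	fmt.Println(log, err)
}
```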
## WebSocket Communication
### Message Types
#### `rollout_progress`
```json
{
  "type": "rollout_progress",
  "rolloutId": "rollout_1761076653",
  "nodeIp": "10.0.1.134",
  "status": "uploading",
  "current": 2,
  "total": 3,
  "progress": 67,
  "timestamp": "2025-01-21T20:05:00Z"
}
```
**Status Values:**
- `updating_labels`: Node labels being updated
- `uploading`: Firmware being uploaded to node
- `completed`: Node update completed successfully
- `failed`: Node update failed
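
A sketch of how such a message might be built in Go. It assumes that `progress` is `current / total` expressed as a percentage and rounded to the nearest integer, which is consistent with the sample (2 of 3 gives 67) but is not confirmed by the source. The names `RolloutProgress`, `progressPercent`, and `newProgress` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// RolloutProgress mirrors the rollout_progress message shown above.
type RolloutProgress struct {
	Type      string `json:"type"`
	RolloutID string `json:"rolloutId"`
	NodeIP    string `json:"nodeIp"`
	Status    string `json:"status"`
	Current   int    `json:"current"`
	Total     int    `json:"total"`
	Progress  int    `json:"progress"`
	Timestamp string `json:"timestamp"`
}

// progressPercent returns current/total as a percentage, rounded to the
// nearest integer. Assumption: this is how "progress" is derived.
func progressPercent(current, total int) int {
	if total <= 0 {
		return 0
	}
	return (current*100 + total/2) / total
}

// newProgress fills in the derived fields (type, progress, timestamp).
func newProgress(rolloutID, nodeIP, status string, current, total int, now time.Time) RolloutProgress {
	return RolloutProgress{
		Type:      "rollout_progress",
		RolloutID: rolloutID,
		NodeIP:    nodeIP,
		Status:    status,
		Current:   current,
		Total:     total,
		Progress:  progressPercent(current, total),
		Timestamp: now.UTC().Format(time.RFC3339),
	}
}

func main() {
	msg := newProgress("rollout_1761076653", "10.0.1.134", "uploading", 2, 3,
		time.Date(2025, time.January, 21, 20, 5, 0, 0, time.UTC))
	out, err := json.MarshalIndent(msg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```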
#### `node_status_update`
```json
{
  "type": "node_status_update",
  "nodeIp": "10.0.1.134",
  "status": "updating",
  "timestamp": "2025-01-21T20:05:00Z"
}
```
**Status Values:**
- `updating`: Node is being updated (blue indicator)
- `online`: Node is online and operational (green indicator)
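
Since both message types share a `type` field, a Go consumer could decode in two passes: first the envelope to read `type`, then the concrete message. The sketch below illustrates that pattern; `handleMessage` and the struct names are hypothetical, and the returned strings are only there to make the dispatch visible.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope reads only the discriminator field shared by all messages.
type envelope struct {
	Type string `json:"type"`
}

type RolloutProgress struct {
	RolloutID string `json:"rolloutId"`
	NodeIP    string `json:"nodeIp"`
	Status    string `json:"status"`
	Current   int    `json:"current"`
	Total     int    `json:"total"`
	Progress  int    `json:"progress"`
}

type NodeStatusUpdate struct {
	NodeIP string `json:"nodeIp"`
	Status string `json:"status"`
}

// handleMessage decodes a raw WebSocket payload and returns a short
// human-readable summary of it.
func handleMessage(raw []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch env.Type {
	case "rollout_progress":
		var m RolloutProgress
		if err := json.Unmarshal(raw, &m); err != nil {
			return "", err
		}
		return fmt.Sprintf("rollout %s: node %s %s (%d/%d, %d%%)",
			m.RolloutID, m.NodeIP, m.Status, m.Current, m.Total, m.Progress), nil
	case "node_status_update":
		var m NodeStatusUpdate
		if err := json.Unmarshal(raw, &m); err != nil {
			return "", err
		}
		return fmt.Sprintf("node %s is now %s", m.NodeIP, m.Status), nil
	default:
		return "", fmt.Errorf("unknown message type %q", env.Type)
	}
}

func main() {
	raw := []byte(`{"type":"node_status_update","nodeIp":"10.0.1.134","status":"updating","timestamp":"2025-01-21T20:05:00Z"}`)
	summary, err := handleMessage(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(summary)
}
```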
## UI Behavior
### Rollout Panel
- Shows firmware details and matching nodes
- Displays node IP, current version, and labels
- Provides "Rollout" button to initiate process
### Real-time Updates
- **Node Status**: Cluster view shows blue "updating" indicator during rollout
- **Progress Tracking**: Rollout panel shows individual node status
- **Completion Detection**: Automatically detects when all nodes complete
### Status Indicators
- **Ready**: Node ready for rollout (gray)
- **Updating**: Node being updated (blue, accent-secondary color)
- **Completed**: Node update completed (green)
- **Failed**: Node update failed (red)
## Registry Integration
### Firmware Lookup
- Gateway uses `FindFirmwareByNameAndVersion()` for direct lookup
- No label-based matching required
- Ensures exact firmware version is deployed
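
To illustrate what an exact name-and-version lookup means in contrast to label matching, here is a small in-memory sketch. It is not the implementation of `FindFirmwareByNameAndVersion()`, whose signature is not shown in this document; `FirmwareRecord` and `findExact` are hypothetical.

```go
package main

import "fmt"

// FirmwareRecord is a hypothetical, simplified registry entry.
type FirmwareRecord struct {
	Name    string
	Version string
	Labels  map[string]string
}

// findExact returns the record whose name and version both match exactly.
// Labels are deliberately ignored.
func findExact(records []FirmwareRecord, name, version string) (FirmwareRecord, bool) {
	for _, r := range records {
		if r.Name == name && r.Version == version {
			return r, true
		}
	}
	return FirmwareRecord{}, false
}

func main() {
	records := []FirmwareRecord{
		{Name: "my-firmware", Version: "1.0.0", Labels: map[string]string{"app": "base"}},
		{Name: "my-firmware", Version: "1.1.0", Labels: map[string]string{"app": "base"}},
	}
	if fw, ok := findExact(records, "my-firmware", "1.0.0"); ok {
		fmt.Println("found", fw.Name, fw.Version)
	}
	if _, ok := findExact(records, "my-firmware", "2.0.0"); !ok {
		fmt.Println("2.0.0 not found")
	}
}
```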
### Proxy Endpoints
All registry operations are proxied through the gateway:
- `GET /api/registry/health` - Registry health check
- `GET /api/registry/firmware` - List firmware
- `POST /api/registry/firmware` - Upload firmware
- `GET /api/registry/firmware/{name}/{version}` - Download firmware
- `PUT /api/registry/firmware/{name}/{version}` - Update firmware metadata
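
One way such a proxy could be built in Go is with `httputil.NewSingleHostReverseProxy`. The sketch assumes the registry serves the same routes without the `/api/registry` prefix (for example `/firmware/{name}/{version}`), which is a guess based on the `firmwareUrl` shown earlier. `registryPath` and `newRegistryProxy` are hypothetical names, and the stub registry exists only so the example runs on its own.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

const proxyPrefix = "/api/registry"

// registryPath maps a gateway path to the path requested from the registry.
// Assumption: the registry uses the same routes minus the proxy prefix.
func registryPath(gatewayPath string) (string, bool) {
	if !strings.HasPrefix(gatewayPath, proxyPrefix+"/") {
		return "", false
	}
	return strings.TrimPrefix(gatewayPath, proxyPrefix), true
}

// newRegistryProxy forwards /api/registry/... requests to the registry.
func newRegistryProxy(registry *url.URL) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(registry)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		p, ok := registryPath(r.URL.Path)
		if !ok {
			http.NotFound(w, r)
			return
		}
		// Clone so the inbound request is left untouched.
		out := r.Clone(r.Context())
		out.URL.Path = p
		out.URL.RawPath = ""
		proxy.ServeHTTP(w, out)
	})
}

func main() {
	// Stub registry that reports what it received.
	registry := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "registry saw %s %s", r.Method, r.URL.Path)
	}))
	defer registry.Close()

	target, err := url.Parse(registry.URL)
	if err != nil {
		panic(err)
	}
	gateway := httptest.NewServer(newRegistryProxy(target))
	defer gateway.Close()

	resp, err := http.Get(gateway.URL + "/api/registry/health")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```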
## Error Handling
### Common Error Scenarios
1. **Firmware Not Found**: Returns 404 with specific error message
2. **Node Communication Failure**: Logs error, continues with other nodes
3. **Registry Unavailable**: Returns 503 service unavailable
4. **Invalid Request**: Returns 400 with validation details
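
The mapping from error scenario to HTTP status could be centralized in one function, as in the sketch below. The sentinel errors and `statusFor` are hypothetical. Node communication failures are left out because, per the list above, they are logged rather than returned to the caller, and the 500 fallback for unclassified errors is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors for the scenarios listed above.
var (
	errFirmwareNotFound    = errors.New("firmware not found")
	errRegistryUnavailable = errors.New("registry unavailable")
	errInvalidRequest      = errors.New("invalid request")
)

// statusFor picks the HTTP status for an error, looking through wrapping.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errFirmwareNotFound):
		return http.StatusNotFound
	case errors.Is(err, errRegistryUnavailable):
		return http.StatusServiceUnavailable
	case errors.Is(err, errInvalidRequest):
		return http.StatusBadRequest
	default:
		// Assumption: anything unclassified is reported as a server error.
		return http.StatusInternalServerError
	}
}

func main() {
	err := fmt.Errorf("lookup my-firmware 1.0.0: %w", errFirmwareNotFound)
	fmt.Println(statusFor(err), err)
}
```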
### Logging
- Detailed logs for each rollout step
- Node-specific error tracking
- Performance metrics (upload times, success rates)
## Performance Considerations
### Parallel Processing
- Multiple nodes updated simultaneously
- Configurable concurrency limits
- Efficient resource utilization
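
A concurrency limit is commonly implemented in Go with a buffered channel used as a semaphore. The sketch below shows that pattern; it is not taken from the gateway, and `runLimited` is a hypothetical name. The returned peak value is only there so the effect of the limit can be observed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited calls work once per item with at most limit calls in flight.
// It returns the highest number of simultaneous calls it observed.
func runLimited(items []string, limit int, work func(item string)) int {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // one slot per allowed worker
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
		peak     int
	)
	for _, item := range items {
		sem <- struct{}{} // blocks while limit workers are already running
		wg.Add(1)
		go func(item string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this worker ends

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			work(item)

			mu.Lock()
			inFlight--
			mu.Unlock()
		}(item)
	}
	wg.Wait()
	return peak
}

func main() {
	ips := []string{"10.0.1.131", "10.0.1.132", "10.0.1.133", "10.0.1.134", "10.0.1.135"}
	peak := runLimited(ips, 2, func(ip string) {
		time.Sleep(10 * time.Millisecond) // stand-in for a node update
	})
	fmt.Println("peak concurrency:", peak)
}
```

Setting `limit` to `len(items)` reproduces the unbounded one-goroutine-per-node behavior.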
### WebSocket Optimization
- Batched status updates
- Efficient message serialization
- Connection pooling for registry calls
### Memory Management
- Streaming firmware downloads
- Bounded goroutine pools
- Proper resource cleanup