docs: update rollout docs

2025-10-21 22:34:07 +02:00
parent 9c86e215fe
commit e6bca2c2e5
3 changed files with 192 additions and 13 deletions


@@ -1,14 +1,192 @@
# Rollout
The rollout feature provides orchestrated firmware updates across multiple SPORE nodes. It integrates with the spore-registry to manage firmware binaries and uses WebSocket communication for real-time progress updates.
The gateway provides an endpoint, `/api/cluster/node/versions`, that reports which firmware version is installed on each node, based on the node's `version` label.
A rollout is started by calling the `/api/rollout` endpoint. The gateway looks up the corresponding firmware in the spore-registry and uploads it in the background to the cluster members that match the firmware's labels. Rollout and upload progress is sent over WebSocket.
Before the upload to a node starts, the `version` label on that node is updated with the firmware version from the registry.
The spore-ui provides a rollout button on each firmware version. Clicking it opens the existing drawer with the Rollout panel.
The UI queries the gateway endpoint `/api/cluster/node/versions` for the matching members affected by the rollout and displays them in the Rollout panel.
Clicking the `Rollout` button calls the `/api/rollout` endpoint with the selected firmware (including its label set) and the matching nodes.
Rollout and upload progress is received over WebSocket, and the Rollout panel is updated in real time.
All UI interaction is blocked during a rollout; the UI behaves like Firmware Deploy in the cluster view, including the backdrop and info message.
## Architecture
### Components
- **spore-gateway**: Orchestrates rollouts, proxies registry calls, manages WebSocket communication
- **spore-registry**: Stores firmware binaries and metadata
- **spore-ui**: Provides rollout interface and real-time status updates
- **SPORE Nodes**: Target devices for firmware updates
### Data Flow
1. **UI Discovery**: Frontend queries `/api/cluster/node/versions` to find matching nodes
2. **Rollout Initiation**: Frontend sends firmware info and node list to `/api/rollout`
3. **Parallel Processing**: Gateway processes multiple nodes concurrently using goroutines
4. **Real-time Updates**: Progress and status updates sent via WebSocket
5. **Status Display**: UI shows updating status directly on cluster view nodes
## API Endpoints
### `/api/cluster/node/versions` (GET)
Returns cluster members with their current firmware versions based on the `version` label.
**Response:**
```json
{
"members": [
{
"ip": "10.0.1.134",
"version": "1.1.0",
"labels": {"app": "base", "role": "debug"}
}
]
}
```
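
As an illustration of how a client might use this response to find the nodes affected by a rollout, the Go sketch below decodes it and keeps the members that carry every label of the selected firmware. The type names and the subset-matching rule are assumptions; this document does not specify how the UI performs the match.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Member mirrors one entry of the "members" array (hypothetical type name).
type Member struct {
	IP      string            `json:"ip"`
	Version string            `json:"version"`
	Labels  map[string]string `json:"labels"`
}

// VersionsResponse mirrors the /api/cluster/node/versions payload (hypothetical type name).
type VersionsResponse struct {
	Members []Member `json:"members"`
}

// matches reports whether every firmware label is present on the node with the same value.
// Assumption: a node matches when the firmware labels are a subset of its labels.
func matches(fwLabels, nodeLabels map[string]string) bool {
	for k, want := range fwLabels {
		if got, ok := nodeLabels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// matchingMembers keeps only the members whose labels match the firmware labels.
func matchingMembers(resp VersionsResponse, fwLabels map[string]string) []Member {
	var out []Member
	for _, m := range resp.Members {
		if matches(fwLabels, m.Labels) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// The second member is hypothetical, added to show a non-match.
	body := []byte(`{"members":[
		{"ip":"10.0.1.134","version":"1.1.0","labels":{"app":"base","role":"debug"}},
		{"ip":"10.0.1.135","version":"1.0.0","labels":{"app":"sensor"}}]}`)
	var resp VersionsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		panic(err)
	}
	for _, m := range matchingMembers(resp, map[string]string{"app": "base"}) {
		fmt.Println(m.IP, m.Version)
	}
}
```

With this sample data only `10.0.1.134 1.1.0` is printed, since the second member lacks `app: base`.
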
### `/api/rollout` (POST)
Initiates a firmware rollout for specified nodes.
**Request Body:**
```json
{
"firmware": {
"name": "my-firmware",
"version": "1.0.0",
"labels": {"app": "base"}
},
"nodes": [
{
"ip": "10.0.1.134",
"version": "1.1.0",
"labels": {"app": "base", "role": "debug"}
}
]
}
```
**Response** (example for a request targeting three nodes):
```json
{
"success": true,
"message": "Rollout started for 3 nodes",
"rolloutId": "rollout_1761076653",
"totalNodes": 3,
"firmwareUrl": "http://localhost:3002/firmware/my-firmware/1.0.0"
}
```
## Rollout Process
### 1. Firmware Lookup
- Gateway looks up firmware in registry by name and version
- Validates firmware exists and is accessible
### 2. Parallel Node Processing
- Each node is processed in a separate goroutine
- Uses `sync.WaitGroup` for coordination
- Processes up to N nodes concurrently (where N = total nodes)
### 3. Node Update Sequence
For each node:
1. **Status Update**: Broadcast `"updating"` status via WebSocket
2. **Label Update**: Update node's `version` label to new firmware version
3. **Firmware Upload**: Upload firmware binary to node
4. **Status Completion**: Broadcast `"online"` status via WebSocket
### 4. Error Handling
- For a failed node, the gateway still broadcasts `"online"` so the node's indicator returns to normal
- Rollout continues for remaining nodes
- Detailed error logging for debugging
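
Assuming the gateway talks to WebSocket clients and to nodes through small interfaces (both hypothetical here), the per-node sequence and its error handling could be sketched as follows. The `defer` guarantees the final `"online"` broadcast on both the success and the failure path; `rollout_progress` messages are omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Broadcaster sends node_status_update messages to WebSocket clients (hypothetical interface).
type Broadcaster interface {
	NodeStatus(ip, status string)
}

// NodeClient performs the node-side operations (hypothetical interface).
type NodeClient interface {
	SetLabel(ip, key, value string) error
	UploadFirmware(ip string, firmware []byte) error
}

// updateNode runs the documented sequence for one node. Whatever happens,
// the node ends with an "online" broadcast so the UI indicator returns to normal.
func updateNode(b Broadcaster, c NodeClient, ip, version string, firmware []byte) error {
	b.NodeStatus(ip, "updating")
	defer b.NodeStatus(ip, "online")

	if err := c.SetLabel(ip, "version", version); err != nil {
		log.Printf("rollout: label update on %s failed: %v", ip, err)
		return fmt.Errorf("label update on %s: %w", ip, err)
	}
	if err := c.UploadFirmware(ip, firmware); err != nil {
		log.Printf("rollout: upload to %s failed: %v", ip, err)
		return fmt.Errorf("upload to %s: %w", ip, err)
	}
	return nil
}

// recorder and fakeNode are in-memory stand-ins used only for this demo.
type recorder struct{ events []string }

func (r *recorder) NodeStatus(ip, status string) { r.events = append(r.events, ip+":"+status) }

type fakeNode struct{ failUpload bool }

func (f fakeNode) SetLabel(ip, key, value string) error { return nil }

func (f fakeNode) UploadFirmware(ip string, firmware []byte) error {
	if f.failUpload {
		return errors.New("connection reset")
	}
	return nil
}

func main() {
	rec := &recorder{}
	err := updateNode(rec, fakeNode{failUpload: true}, "10.0.1.134", "1.0.0", []byte{0x00})
	fmt.Println(err)
	fmt.Println(rec.events)
}
```
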
## WebSocket Communication
### Message Types
#### `rollout_progress`
```json
{
"type": "rollout_progress",
"rolloutId": "rollout_1761076653",
"nodeIp": "10.0.1.134",
"status": "uploading",
"current": 2,
"total": 3,
"progress": 67,
"timestamp": "2025-01-21T20:05:00Z"
}
```
**Status Values:**
- `updating_labels`: Node labels being updated
- `uploading`: Firmware being uploaded to node
- `completed`: Node update completed successfully
- `failed`: Node update failed
#### `node_status_update`
```json
{
"type": "node_status_update",
"nodeIp": "10.0.1.134",
"status": "updating",
"timestamp": "2025-01-21T20:05:00Z"
}
```
**Status Values:**
- `updating`: Node is being updated (blue indicator)
- `online`: Node is online and operational (green indicator)
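
Because both message types share a `type` field, a client can decode that field first and then unmarshal into the matching struct. The Go sketch below shows this dispatch for a hypothetical Go client; the spore-ui's own frontend code is not described in this document, and the struct names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope captures the shared "type" field so a message can be routed before full decoding.
type envelope struct {
	Type string `json:"type"`
}

// NodeStatusUpdate mirrors the node_status_update message (hypothetical type name).
type NodeStatusUpdate struct {
	NodeIP    string `json:"nodeIp"`
	Status    string `json:"status"` // updating | online
	Timestamp string `json:"timestamp"`
}

// RolloutProgress keeps only the rollout_progress fields this sketch needs.
type RolloutProgress struct {
	NodeIP   string `json:"nodeIp"`
	Status   string `json:"status"`
	Progress int    `json:"progress"`
}

// indicatorColor maps node_status_update values to the cluster-view colors listed above.
func indicatorColor(status string) string {
	switch status {
	case "updating":
		return "blue"
	case "online":
		return "green"
	default:
		return "unknown"
	}
}

// handle routes one WebSocket text frame by its "type" field and returns a short description.
func handle(raw []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	switch env.Type {
	case "node_status_update":
		var m NodeStatusUpdate
		if err := json.Unmarshal(raw, &m); err != nil {
			return "", err
		}
		return fmt.Sprintf("%s -> %s", m.NodeIP, indicatorColor(m.Status)), nil
	case "rollout_progress":
		var m RolloutProgress
		if err := json.Unmarshal(raw, &m); err != nil {
			return "", err
		}
		return fmt.Sprintf("%s %s (%d%%)", m.NodeIP, m.Status, m.Progress), nil
	default:
		return "", fmt.Errorf("unknown message type %q", env.Type)
	}
}

func main() {
	out, err := handle([]byte(`{"type":"node_status_update","nodeIp":"10.0.1.134","status":"updating","timestamp":"2025-01-21T20:05:00Z"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```
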
## UI Behavior
### Rollout Panel
- Shows firmware details and matching nodes
- Displays node IP, current version, and labels
- Provides "Rollout" button to initiate process
### Real-time Updates
- **Node Status**: Cluster view shows blue "updating" indicator during rollout
- **Progress Tracking**: Rollout panel shows individual node status
- **Completion Detection**: Automatically detects when all nodes complete
### Status Indicators
- **Ready**: Node ready for rollout (gray)
- **Updating**: Node being updated (blue, accent-secondary color)
- **Completed**: Node update completed (green)
- **Failed**: Node update failed (red)
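
Completion detection can be modelled as a per-node status map that is done once every node is `completed` or `failed`. The tracker below is a hypothetical Go sketch of that rule, fed with the `rollout_progress` status values; it is not taken from the spore-ui.

```go
package main

import "fmt"

// Tracker records the latest status per node (hypothetical type).
type Tracker struct {
	status map[string]string
}

// NewTracker starts every node in the "ready" state.
func NewTracker(ips []string) *Tracker {
	t := &Tracker{status: make(map[string]string, len(ips))}
	for _, ip := range ips {
		t.status[ip] = "ready"
	}
	return t
}

// Update applies a rollout_progress status; messages for unknown nodes are ignored.
func (t *Tracker) Update(ip, status string) {
	if _, ok := t.status[ip]; ok {
		t.status[ip] = status
	}
}

// Done reports whether every node has reached a terminal state.
func (t *Tracker) Done() bool {
	for _, s := range t.status {
		if s != "completed" && s != "failed" {
			return false
		}
	}
	return true
}

// Counts returns how many nodes completed and how many failed.
func (t *Tracker) Counts() (completed, failed int) {
	for _, s := range t.status {
		switch s {
		case "completed":
			completed++
		case "failed":
			failed++
		}
	}
	return completed, failed
}

func main() {
	t := NewTracker([]string{"10.0.1.134", "10.0.1.135"})
	t.Update("10.0.1.134", "uploading")
	fmt.Println(t.Done())
	t.Update("10.0.1.134", "completed")
	t.Update("10.0.1.135", "failed")
	c, f := t.Counts()
	fmt.Println(t.Done(), c, f)
}
```

Treating `failed` as terminal lets the panel leave the blocked state even when some nodes do not update.
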
## Registry Integration
### Firmware Lookup
- Gateway uses `FindFirmwareByNameAndVersion()` for direct lookup
- No label-based matching required
- Ensures exact firmware version is deployed
### Proxy Endpoints
All registry operations are proxied through the gateway:
- `GET /api/registry/health` - Registry health check
- `GET /api/registry/firmware` - List firmware
- `POST /api/registry/firmware` - Upload firmware
- `GET /api/registry/firmware/{name}/{version}` - Download firmware
- `PUT /api/registry/firmware/{name}/{version}` - Update firmware metadata
## Error Handling
### Common Error Scenarios
1. **Firmware Not Found**: Returns 404 with specific error message
2. **Node Communication Failure**: Logs error, continues with other nodes
3. **Registry Unavailable**: Returns 503 service unavailable
4. **Invalid Request**: Returns 400 with validation details
### Logging
- Detailed logs for each rollout step
- Node-specific error tracking
- Performance metrics (upload times, success rates)
## Performance Considerations
### Parallel Processing
- Multiple nodes updated simultaneously
- Configurable concurrency limits
- Efficient resource utilization
### WebSocket Optimization
- Batched status updates
- Efficient message serialization
- Connection pooling for registry calls
### Memory Management
- Streaming firmware downloads
- Bounded goroutine pools
- Proper resource cleanup