# Rollout

The rollout feature provides orchestrated firmware updates across multiple SPORE nodes. It integrates with the spore-registry to manage firmware binaries and uses WebSocket communication for real-time progress updates.

## Architecture

### Components

- **spore-gateway**: Orchestrates rollouts, proxies registry calls, manages WebSocket communication
- **spore-registry**: Stores firmware binaries and metadata
- **spore-ui**: Provides rollout interface and real-time status updates
- **SPORE Nodes**: Target devices for firmware updates

### Data Flow

1. **UI Discovery**: Frontend queries `/api/cluster/node/versions` to find matching nodes
2. **Rollout Initiation**: Frontend sends firmware info and node list to `/api/rollout`
3. **Parallel Processing**: Gateway processes multiple nodes concurrently using goroutines
4. **Real-time Updates**: Progress and status updates sent via WebSocket
5. **Status Display**: UI shows updating status directly on cluster view nodes

## API Endpoints

### `/api/cluster/node/versions` (GET)

Returns cluster members with their current firmware versions based on the `version` label.

**Response:**

```json
{
  "members": [
    {
      "ip": "10.0.1.134",
      "version": "1.1.0",
      "labels": {"app": "base", "role": "debug"}
    }
  ]
}
```

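
A client could decode this response with types like the ones below. This is a minimal sketch: the Go type and function names (`Member`, `VersionsResponse`, `parseVersions`) are illustrative and only the JSON field names come from the sample response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Member mirrors one entry of the "members" array in the sample response.
type Member struct {
	IP      string            `json:"ip"`
	Version string            `json:"version"`
	Labels  map[string]string `json:"labels"`
}

// VersionsResponse mirrors the top-level response object.
type VersionsResponse struct {
	Members []Member `json:"members"`
}

// parseVersions decodes a response body from /api/cluster/node/versions.
func parseVersions(body []byte) (VersionsResponse, error) {
	var resp VersionsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return VersionsResponse{}, fmt.Errorf("decode versions response: %w", err)
	}
	return resp, nil
}

func main() {
	body := []byte(`{"members":[{"ip":"10.0.1.134","version":"1.1.0","labels":{"app":"base","role":"debug"}}]}`)
	resp, err := parseVersions(body)
	if err != nil {
		panic(err)
	}
	for _, m := range resp.Members {
		fmt.Printf("%s runs version %s (app=%s)\n", m.IP, m.Version, m.Labels["app"])
	}
}
```
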
### `/api/rollout` (POST)

Initiates a firmware rollout for specified nodes.

**Request Body:**

```json
{
  "firmware": {
    "name": "my-firmware",
    "version": "1.0.0",
    "labels": {"app": "base"}
  },
  "nodes": [
    {
      "ip": "10.0.1.134",
      "version": "1.1.0",
      "labels": {"app": "base", "role": "debug"}
    }
  ]
}
```

**Response:**

```json
{
  "success": true,
  "message": "Rollout started for 3 nodes",
  "rolloutId": "rollout_1761076653",
  "totalNodes": 3,
  "firmwareUrl": "http://localhost:3002/firmware/my-firmware/1.0.0"
}
```

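
The request and response shapes can be modeled as plain structs. The sketch below builds a request body and decodes a response; the Go names (`RolloutRequest`, `buildRolloutBody`, `parseRolloutResponse`, and so on) are hypothetical, and only the JSON keys are taken from the examples above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Firmware identifies the firmware to roll out.
type Firmware struct {
	Name    string            `json:"name"`
	Version string            `json:"version"`
	Labels  map[string]string `json:"labels"`
}

// Node is one rollout target, as returned by /api/cluster/node/versions.
type Node struct {
	IP      string            `json:"ip"`
	Version string            `json:"version"`
	Labels  map[string]string `json:"labels"`
}

// RolloutRequest is the POST body for /api/rollout.
type RolloutRequest struct {
	Firmware Firmware `json:"firmware"`
	Nodes    []Node   `json:"nodes"`
}

// RolloutResponse is the JSON returned when a rollout is accepted.
type RolloutResponse struct {
	Success     bool   `json:"success"`
	Message     string `json:"message"`
	RolloutID   string `json:"rolloutId"`
	TotalNodes  int    `json:"totalNodes"`
	FirmwareURL string `json:"firmwareUrl"`
}

// buildRolloutBody serializes the request payload.
func buildRolloutBody(fw Firmware, nodes []Node) ([]byte, error) {
	return json.Marshal(RolloutRequest{Firmware: fw, Nodes: nodes})
}

// parseRolloutResponse decodes the gateway's reply.
func parseRolloutResponse(body []byte) (RolloutResponse, error) {
	var resp RolloutResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	body, err := buildRolloutBody(
		Firmware{Name: "my-firmware", Version: "1.0.0", Labels: map[string]string{"app": "base"}},
		[]Node{{IP: "10.0.1.134", Version: "1.1.0", Labels: map[string]string{"app": "base", "role": "debug"}}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	resp, err := parseRolloutResponse([]byte(`{"success":true,"rolloutId":"rollout_1761076653","totalNodes":3}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.RolloutID, resp.TotalNodes)
}
```
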
## Rollout Process

### 1. Firmware Lookup

- Gateway looks up firmware in registry by name and version
- Validates firmware exists and is accessible

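
As an illustration of what a name-and-version lookup resolves to, the sketch below builds a firmware URL. The `/firmware/{name}/{version}` layout is inferred from the sample `firmwareUrl` in the `/api/rollout` response and is an assumption, as is the helper name `firmwareURL`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// firmwareURL builds the download URL for one firmware version.
// Assumption: the registry serves binaries at /firmware/{name}/{version},
// as suggested by the sample firmwareUrl in this document.
func firmwareURL(registryBase, name, version string) string {
	return strings.TrimRight(registryBase, "/") +
		"/firmware/" + url.PathEscape(name) + "/" + url.PathEscape(version)
}

func main() {
	fmt.Println(firmwareURL("http://localhost:3002", "my-firmware", "1.0.0"))
}
```

Escaping each path segment keeps a name or version that contains `/` or spaces from changing the URL structure.
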
### 2. Parallel Node Processing

- Each node is processed in a separate goroutine
- Uses `sync.WaitGroup` for coordination
- Concurrency equals the number of nodes in the request (N goroutines for N nodes)

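
The goroutine-per-node pattern with `sync.WaitGroup` might look like the sketch below. It is not the gateway's code: `rolloutAll` and the `update` callback are hypothetical stand-ins for the real per-node work.

```go
package main

import (
	"fmt"
	"sync"
)

// rolloutAll starts one goroutine per node and waits for all of them.
// It returns the error (or nil) reported for each node, keyed by IP.
func rolloutAll(ips []string, update func(ip string) error) map[string]error {
	results := make(map[string]error, len(ips))
	var mu sync.Mutex // guards results, which several goroutines write to
	var wg sync.WaitGroup

	for _, ip := range ips {
		wg.Add(1)
		go func(ip string) {
			defer wg.Done()
			err := update(ip)
			mu.Lock()
			results[ip] = err
			mu.Unlock()
		}(ip)
	}

	wg.Wait() // blocks until every node has finished
	return results
}

func main() {
	ips := []string{"10.0.1.134", "10.0.1.135", "10.0.1.136"}
	results := rolloutAll(ips, func(ip string) error {
		// Placeholder for the real label update and firmware upload.
		return nil
	})
	for _, ip := range ips {
		fmt.Printf("%s: err=%v\n", ip, results[ip])
	}
}
```
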
### 3. Node Update Sequence

For each node:

1. **Status Update**: Broadcast `"updating"` status via WebSocket
2. **Label Update**: Update node's `version` label to new firmware version
3. **Firmware Upload**: Upload firmware binary to node
4. **Status Completion**: Broadcast `"online"` status via WebSocket

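
The four steps can be sketched as a single function. Everything named here (`nodeOps`, `updateNode`, `errUploadFailed`) is hypothetical; the callbacks stand in for the gateway's WebSocket broadcast, label update, and upload calls. Using `defer` for the final broadcast means the node returns to `"online"` on both success and failure.

```go
package main

import (
	"errors"
	"fmt"
)

// errUploadFailed is a sample error used by the demo below.
var errUploadFailed = errors.New("upload failed")

// nodeOps groups the per-node actions as injectable callbacks.
type nodeOps struct {
	broadcast  func(ip, status string)
	setVersion func(ip, version string) error
	upload     func(ip string, firmware []byte) error
}

// updateNode runs the update sequence for one node.
func updateNode(ops nodeOps, ip, version string, firmware []byte) error {
	ops.broadcast(ip, "updating")     // step 1
	defer ops.broadcast(ip, "online") // step 4, also runs when a step fails

	if err := ops.setVersion(ip, version); err != nil { // step 2
		return fmt.Errorf("update version label on %s: %w", ip, err)
	}
	if err := ops.upload(ip, firmware); err != nil { // step 3
		return fmt.Errorf("upload firmware to %s: %w", ip, err)
	}
	return nil
}

func main() {
	ops := nodeOps{
		broadcast:  func(ip, status string) { fmt.Printf("ws: %s -> %s\n", ip, status) },
		setVersion: func(ip, version string) error { fmt.Printf("label: %s version=%s\n", ip, version); return nil },
		upload:     func(ip string, fw []byte) error { return errUploadFailed },
	}
	err := updateNode(ops, "10.0.1.134", "1.0.0", []byte("firmware"))
	fmt.Println("result:", err)
}
```
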
### 4. Error Handling

- When a node's update fails, an `"online"` status is broadcast for that node so it returns to its normal state
- Rollout continues for the remaining nodes
- Errors are logged in detail for debugging

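
The "log and continue" behavior can be illustrated with a small loop. This sketch runs nodes sequentially only to keep the failure handling easy to read (the gateway itself processes nodes in parallel); `runRollout` and `errNodeUnreachable` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNodeUnreachable is a sample failure used by the demo below.
var errNodeUnreachable = errors.New("node unreachable")

// runRollout updates every node and never aborts early: a failed node is
// logged and recorded, and the loop moves on to the next one.
func runRollout(ips []string, update func(ip string) error) (succeeded, failed []string) {
	for _, ip := range ips {
		if err := update(ip); err != nil {
			log.Printf("rollout: node %s failed: %v", ip, err)
			failed = append(failed, ip)
			continue
		}
		succeeded = append(succeeded, ip)
	}
	return succeeded, failed
}

func main() {
	ok, bad := runRollout([]string{"10.0.1.134", "10.0.1.135", "10.0.1.136"}, func(ip string) error {
		if ip == "10.0.1.135" {
			return errNodeUnreachable
		}
		return nil
	})
	fmt.Println("succeeded:", ok)
	fmt.Println("failed:", bad)
}
```
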
## WebSocket Communication

### Message Types

#### `rollout_progress`

```json
{
  "type": "rollout_progress",
  "rolloutId": "rollout_1761076653",
  "nodeIp": "10.0.1.134",
  "status": "uploading",
  "current": 2,
  "total": 3,
  "progress": 67,
  "timestamp": "2025-01-21T20:05:00Z"
}
```

**Status Values:**

- `updating_labels`: Node labels being updated
- `uploading`: Firmware being uploaded to node
- `completed`: Node update completed successfully
- `failed`: Node update failed

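
The sample values are consistent with `progress` being `current / total` as a percentage rounded to the nearest integer (2 of 3 is 66.7%, shown as 67). The sketch below assumes that formula and an RFC 3339 UTC timestamp; both are inferences from the example, and the Go names are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"time"
)

// RolloutProgress mirrors the rollout_progress message fields.
type RolloutProgress struct {
	Type      string `json:"type"`
	RolloutID string `json:"rolloutId"`
	NodeIP    string `json:"nodeIp"`
	Status    string `json:"status"`
	Current   int    `json:"current"`
	Total     int    `json:"total"`
	Progress  int    `json:"progress"`
	Timestamp string `json:"timestamp"`
}

// progressPercent returns current/total as a whole percentage, rounded
// to the nearest integer. A non-positive total yields 0.
func progressPercent(current, total int) int {
	if total <= 0 {
		return 0
	}
	return int(math.Round(float64(current) * 100 / float64(total)))
}

// newProgress assembles a message for one node at the given time.
func newProgress(rolloutID, nodeIP, status string, current, total int, now time.Time) RolloutProgress {
	return RolloutProgress{
		Type:      "rollout_progress",
		RolloutID: rolloutID,
		NodeIP:    nodeIP,
		Status:    status,
		Current:   current,
		Total:     total,
		Progress:  progressPercent(current, total),
		Timestamp: now.UTC().Format(time.RFC3339),
	}
}

func main() {
	msg := newProgress("rollout_1761076653", "10.0.1.134", "uploading", 2, 3, time.Now())
	out, err := json.MarshalIndent(msg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```
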
#### `node_status_update`

```json
{
  "type": "node_status_update",
  "nodeIp": "10.0.1.134",
  "status": "updating",
  "timestamp": "2025-01-21T20:05:00Z"
}
```

**Status Values:**

- `updating`: Node is being updated (blue indicator)
- `online`: Node is online and operational (green indicator)

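
Because both message types share a `type` field, a receiver can decode that field first and then decode the full payload. The following is a sketch of that two-pass approach; the type and function names are hypothetical, and how a real client reads frames from the socket is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope captures only the discriminator shared by all messages.
type envelope struct {
	Type string `json:"type"`
}

// nodeStatusUpdate holds the fields used from node_status_update.
type nodeStatusUpdate struct {
	NodeIP string `json:"nodeIp"`
	Status string `json:"status"`
}

// rolloutProgress holds the fields used from rollout_progress.
type rolloutProgress struct {
	NodeIP   string `json:"nodeIp"`
	Status   string `json:"status"`
	Progress int    `json:"progress"`
}

// describe decodes one raw message and returns a readable summary.
func describe(raw []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", fmt.Errorf("decode envelope: %w", err)
	}
	switch env.Type {
	case "node_status_update":
		var m nodeStatusUpdate
		if err := json.Unmarshal(raw, &m); err != nil {
			return "", err
		}
		return fmt.Sprintf("node %s is %s", m.NodeIP, m.Status), nil
	case "rollout_progress":
		var m rolloutProgress
		if err := json.Unmarshal(raw, &m); err != nil {
			return "", err
		}
		return fmt.Sprintf("node %s: %s (%d%%)", m.NodeIP, m.Status, m.Progress), nil
	default:
		return "", fmt.Errorf("unknown message type %q", env.Type)
	}
}

func main() {
	msg := []byte(`{"type":"node_status_update","nodeIp":"10.0.1.134","status":"updating","timestamp":"2025-01-21T20:05:00Z"}`)
	summary, err := describe(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(summary)
}
```
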
## UI Behavior

### Rollout Panel

- Shows firmware details and matching nodes
- Displays node IP, current version, and labels
- Provides "Rollout" button to initiate process

### Real-time Updates

- **Node Status**: Cluster view shows blue "updating" indicator during rollout
- **Progress Tracking**: Rollout panel shows individual node status
- **Completion Detection**: Automatically detects when all nodes complete

### Status Indicators

- **Ready**: Node ready for rollout (gray)
- **Updating**: Node being updated (blue, accent-secondary color)
- **Completed**: Node update completed (green)
- **Failed**: Node update failed (red)

## Registry Integration

### Firmware Lookup

- Gateway uses `FindFirmwareByNameAndVersion()` for direct lookup
- No label-based matching required
- Ensures exact firmware version is deployed

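
To make "direct lookup" concrete, the sketch below matches on name and version only and ignores labels. It does not reproduce `FindFirmwareByNameAndVersion()`, whose signature is not shown in this document; `FirmwareRecord` and `findExact` are hypothetical, and an in-memory slice stands in for the registry.

```go
package main

import "fmt"

// FirmwareRecord is a simplified stand-in for a registry entry.
type FirmwareRecord struct {
	Name    string
	Version string
	Labels  map[string]string
}

// findExact returns the record whose name and version both match exactly.
// Labels are deliberately not consulted.
func findExact(records []FirmwareRecord, name, version string) (FirmwareRecord, bool) {
	for _, r := range records {
		if r.Name == name && r.Version == version {
			return r, true
		}
	}
	return FirmwareRecord{}, false
}

func main() {
	records := []FirmwareRecord{
		{Name: "my-firmware", Version: "1.0.0", Labels: map[string]string{"app": "base"}},
		{Name: "my-firmware", Version: "1.1.0", Labels: map[string]string{"app": "base"}},
	}
	if r, ok := findExact(records, "my-firmware", "1.0.0"); ok {
		fmt.Printf("found %s %s\n", r.Name, r.Version)
	}
	if _, ok := findExact(records, "my-firmware", "2.0.0"); !ok {
		fmt.Println("my-firmware 2.0.0 not found")
	}
}
```
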
### Proxy Endpoints

All registry operations are proxied through the gateway:

- `GET /api/registry/health` - Registry health check
- `GET /api/registry/firmware` - List firmware
- `POST /api/registry/firmware` - Upload firmware
- `GET /api/registry/firmware/{name}/{version}` - Download firmware
- `PUT /api/registry/firmware/{name}/{version}` - Update firmware metadata

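
A proxy like this can be built from the Go standard library. The sketch below assumes the registry exposes the same paths without the `/api/registry` prefix (for example `/firmware/{name}/{version}`, as the sample `firmwareUrl` suggests); that mapping is an assumption, and the helper names are hypothetical. A local test server stands in for the registry so the example runs on its own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// proxyPrefix is the gateway path prefix removed before forwarding.
const proxyPrefix = "/api/registry"

// newRegistryProxy returns a handler that forwards /api/registry/* requests
// to the registry at registryURL with the prefix stripped.
func newRegistryProxy(registryURL string) (http.Handler, error) {
	target, err := url.Parse(registryURL)
	if err != nil {
		return nil, fmt.Errorf("parse registry URL: %w", err)
	}
	return http.StripPrefix(proxyPrefix, httputil.NewSingleHostReverseProxy(target)), nil
}

// newFakeRegistry starts a local stand-in that echoes the method and path.
func newFakeRegistry() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s %s", r.Method, r.URL.Path)
	}))
}

// proxyGet sends a GET through the handler and returns status and body.
func proxyGet(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	registry := newFakeRegistry()
	defer registry.Close()

	proxy, err := newRegistryProxy(registry.URL)
	if err != nil {
		panic(err)
	}
	code, body := proxyGet(proxy, "/api/registry/firmware/my-firmware/1.0.0")
	fmt.Println(code, body)
}
```

`http.StripPrefix` answers with 404 for any path that lacks the prefix, so requests outside `/api/registry` never reach the registry.
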
## Error Handling

### Common Error Scenarios

1. **Firmware Not Found**: Returns 404 with a specific error message
2. **Node Communication Failure**: Logs the error and continues with the other nodes
3. **Registry Unavailable**: Returns 503 Service Unavailable
4. **Invalid Request**: Returns 400 with validation details

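
The request-level scenarios map naturally onto sentinel errors and a single status lookup. This is a sketch: the error variables and `statusFor` are hypothetical, and the 500 fallback for unclassified errors is an assumption that the list above does not state.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinel errors for the request-level failure scenarios.
var (
	ErrFirmwareNotFound    = errors.New("firmware not found")
	ErrRegistryUnavailable = errors.New("registry unavailable")
	ErrInvalidRequest      = errors.New("invalid request")
)

// statusFor maps an error to the HTTP status the gateway would return.
// errors.Is also matches errors wrapped with %w.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrFirmwareNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrRegistryUnavailable):
		return http.StatusServiceUnavailable
	case errors.Is(err, ErrInvalidRequest):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError // assumed fallback
	}
}

func main() {
	wrapped := fmt.Errorf("lookup my-firmware 1.0.0: %w", ErrFirmwareNotFound)
	fmt.Println(statusFor(wrapped))
	fmt.Println(statusFor(ErrRegistryUnavailable))
	fmt.Println(statusFor(ErrInvalidRequest))
}
```

Node communication failures are absent from the mapping on purpose: they are logged per node and do not change the status of the rollout request itself.
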
### Logging

- Detailed logs for each rollout step
- Node-specific error tracking
- Performance metrics (upload times, success rates)

## Performance Considerations

### Parallel Processing

- Multiple nodes updated simultaneously
- Configurable concurrency limits
- Efficient resource utilization

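
A common way to cap concurrency in Go is a buffered channel used as a semaphore. The sketch below shows that technique; it is not taken from the gateway, and `rolloutBounded`, the `limit` parameter, and the `peakConcurrency` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// rolloutBounded updates all nodes but lets at most limit run at once.
func rolloutBounded(ips []string, limit int, update func(ip string) error) map[string]error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // one slot per allowed concurrent update
	results := make(map[string]error, len(ips))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, ip := range ips {
		sem <- struct{}{} // blocks while limit updates are in flight
		wg.Add(1)
		go func(ip string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot
			err := update(ip)
			mu.Lock()
			results[ip] = err
			mu.Unlock()
		}(ip)
	}
	wg.Wait()
	return results
}

// peakConcurrency runs n simulated updates and reports how many
// overlapped at most.
func peakConcurrency(n, limit int) int {
	ips := make([]string, n)
	for i := range ips {
		ips[i] = fmt.Sprintf("10.0.1.%d", 100+i)
	}
	var mu sync.Mutex
	inFlight, peak := 0, 0
	rolloutBounded(ips, limit, func(ip string) error {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()
		time.Sleep(5 * time.Millisecond) // simulated upload time
		mu.Lock()
		inFlight--
		mu.Unlock()
		return nil
	})
	return peak
}

func main() {
	fmt.Println("peak concurrent updates:", peakConcurrency(10, 3))
}
```
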
### WebSocket Optimization

- Batched status updates
- Efficient message serialization
- Connection pooling for registry calls

### Memory Management

- Streaming firmware downloads
- Bounded goroutine pools
- Proper resource cleanup