docs: update rollout docs
docs/Rollout.md
@@ -1,14 +1,192 @@
# Rollout

The rollout feature works together with the spore-registry.
It provides an endpoint `/cluster/node/versions` to determine which versions are installed on which nodes through the `version` label.
A rollout can be started by calling the `/rollout` endpoint and providing a set of labels.
The endpoint then searches for the corresponding firmware in the spore-registry and checks which cluster members match the labels.
The gateway then uploads the firmware that was found to the matching cluster members in the background. Rollout and upload progress is sent over WebSocket.
Before the upload starts, the `version` label on the member node is updated with the firmware version from the registry.
The rollout feature provides orchestrated firmware updates across multiple SPORE nodes. It integrates with the spore-registry to manage firmware binaries and uses WebSocket communication for real-time progress updates.

The spore-ui provides a rollout button on each firmware version. When clicked, the existing drawer is shown with the Rollout panel.
The gateway is consulted (endpoint `/cluster/node/versions`) to return the list of matching members affected by the rollout, which is displayed inside the Rollout panel.
Once clicked, the `Rollout` button triggers the `/rollout` endpoint with the label set of the selected firmware that needs to be rolled out.
Rollout and upload progress is received over WebSocket and the Rollout panel is updated in real time.
Any UI interaction is blocked during rollout, and the UI behaves like the Firmware Deploy on the cluster view (also with backdrop and info message).

## Architecture

### Components
- **spore-gateway**: Orchestrates rollouts, proxies registry calls, manages WebSocket communication
- **spore-registry**: Stores firmware binaries and metadata
- **spore-ui**: Provides rollout interface and real-time status updates
- **SPORE Nodes**: Target devices for firmware updates

### Data Flow
1. **UI Discovery**: Frontend queries `/api/cluster/node/versions` to find matching nodes
2. **Rollout Initiation**: Frontend sends firmware info and node list to `/api/rollout`
3. **Parallel Processing**: Gateway processes multiple nodes concurrently using goroutines
4. **Real-time Updates**: Progress and status updates sent via WebSocket
5. **Status Display**: UI shows updating status directly on cluster view nodes
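
The "matching nodes" in step 1 could be selected by comparing label sets. The following is a minimal sketch, assuming that a node matches when it carries every label of the selector with the same value. The `Member` type, the helper names, and the second sample IP are hypothetical and not taken from the gateway code.

```go
package main

import "fmt"

// Member is a hypothetical, trimmed-down view of a cluster member.
type Member struct {
	IP     string
	Labels map[string]string
}

// matchesLabels reports whether every selector label is present on the node
// with the same value. Extra labels on the node are ignored (assumed semantics).
func matchesLabels(node, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := node[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// selectMembers returns the members whose labels satisfy the selector.
func selectMembers(members []Member, selector map[string]string) []Member {
	var out []Member
	for _, m := range members {
		if matchesLabels(m.Labels, selector) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	members := []Member{
		{IP: "10.0.1.134", Labels: map[string]string{"app": "base", "role": "debug"}},
		{IP: "10.0.1.135", Labels: map[string]string{"app": "sensor"}}, // made-up sample node
	}
	for _, m := range selectMembers(members, map[string]string{"app": "base"}) {
		fmt.Println(m.IP)
	}
}
```

With this subset rule, an empty selector matches every node, which may or may not be what the real gateway does.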

## API Endpoints

### `/api/cluster/node/versions` (GET)
Returns cluster members with their current firmware versions based on the `version` label.

**Response:**
```json
{
  "members": [
    {
      "ip": "10.0.1.134",
      "version": "1.1.0",
      "labels": {"app": "base", "role": "debug"}
    }
  ]
}
```
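
A client could decode this response with plain `encoding/json`. The sketch below mirrors the JSON fields shown above. The Go type names `NodeVersion` and `VersionsResponse` and the helper `parseVersions` are illustrative assumptions, not the gateway's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeVersion mirrors one entry of the "members" array (type name assumed).
type NodeVersion struct {
	IP      string            `json:"ip"`
	Version string            `json:"version"`
	Labels  map[string]string `json:"labels"`
}

// VersionsResponse mirrors the top-level response object (type name assumed).
type VersionsResponse struct {
	Members []NodeVersion `json:"members"`
}

// parseVersions decodes a response body into a VersionsResponse.
func parseVersions(body []byte) (VersionsResponse, error) {
	var resp VersionsResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	body := []byte(`{"members":[{"ip":"10.0.1.134","version":"1.1.0","labels":{"app":"base","role":"debug"}}]}`)
	resp, err := parseVersions(body)
	if err != nil {
		panic(err)
	}
	for _, m := range resp.Members {
		fmt.Printf("%s runs %s with labels %v\n", m.IP, m.Version, m.Labels)
	}
}
```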

### `/api/rollout` (POST)
Initiates a firmware rollout for specified nodes.

**Request Body:**
```json
{
  "firmware": {
    "name": "my-firmware",
    "version": "1.0.0",
    "labels": {"app": "base"}
  },
  "nodes": [
    {
      "ip": "10.0.1.134",
      "version": "1.1.0",
      "labels": {"app": "base", "role": "debug"}
    }
  ]
}
```

**Response:**
```json
{
  "success": true,
  "message": "Rollout started for 3 nodes",
  "rolloutId": "rollout_1761076653",
  "totalNodes": 3,
  "firmwareUrl": "http://localhost:3002/firmware/my-firmware/1.0.0"
}
```

## Rollout Process

### 1. Firmware Lookup
- Gateway looks up firmware in registry by name and version
- Validates firmware exists and is accessible

### 2. Parallel Node Processing
- Each node is processed in a separate goroutine
- Uses `sync.WaitGroup` for coordination
- Processes up to N nodes concurrently (where N = total nodes)

### 3. Node Update Sequence
For each node:
1. **Status Update**: Broadcast `"updating"` status via WebSocket
2. **Label Update**: Update node's `version` label to new firmware version
3. **Firmware Upload**: Upload firmware binary to node
4. **Status Completion**: Broadcast `"online"` status via WebSocket

### 4. Error Handling
- Failed nodes broadcast `"online"` status to return to normal
- Rollout continues for remaining nodes
- Detailed error logging for debugging

## WebSocket Communication

### Message Types

#### `rollout_progress`
```json
{
  "type": "rollout_progress",
  "rolloutId": "rollout_1761076653",
  "nodeIp": "10.0.1.134",
  "status": "uploading",
  "current": 2,
  "total": 3,
  "progress": 67,
  "timestamp": "2025-01-21T20:05:00Z"
}
```

**Status Values:**
- `updating_labels`: Node labels being updated
- `uploading`: Firmware being uploaded to node
- `completed`: Node update completed successfully
- `failed`: Node update failed

#### `node_status_update`
```json
{
  "type": "node_status_update",
  "nodeIp": "10.0.1.134",
  "status": "updating",
  "timestamp": "2025-01-21T20:05:00Z"
}
```

**Status Values:**
- `updating`: Node is being updated (blue indicator)
- `online`: Node is online and operational (green indicator)
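
A receiver has to tell the two message types apart before decoding them. One common approach is to read the `type` field first and then decode into the matching struct. The sketch below does that and maps the node status to the indicator colours listed above. The function names and the returned summary strings are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeStatusUpdate mirrors the node_status_update message shown above.
type NodeStatusUpdate struct {
	Type      string `json:"type"`
	NodeIP    string `json:"nodeIp"`
	Status    string `json:"status"`
	Timestamp string `json:"timestamp"`
}

// indicatorColor maps a node status to the colour named in the status list.
func indicatorColor(status string) string {
	switch status {
	case "updating":
		return "blue"
	case "online":
		return "green"
	default:
		return "unknown"
	}
}

// handleMessage peeks at the "type" field, then decodes the full message.
func handleMessage(raw []byte) (string, error) {
	var head struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(raw, &head); err != nil {
		return "", err
	}
	switch head.Type {
	case "node_status_update":
		var msg NodeStatusUpdate
		if err := json.Unmarshal(raw, &msg); err != nil {
			return "", err
		}
		return fmt.Sprintf("%s -> %s", msg.NodeIP, indicatorColor(msg.Status)), nil
	case "rollout_progress":
		return "rollout progress", nil
	default:
		return "", fmt.Errorf("unknown message type %q", head.Type)
	}
}

func main() {
	raw := []byte(`{"type":"node_status_update","nodeIp":"10.0.1.134","status":"updating","timestamp":"2025-01-21T20:05:00Z"}`)
	summary, err := handleMessage(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(summary)
}
```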

## UI Behavior

### Rollout Panel
- Shows firmware details and matching nodes
- Displays node IP, current version, and labels
- Provides "Rollout" button to initiate process

### Real-time Updates
- **Node Status**: Cluster view shows blue "updating" indicator during rollout
- **Progress Tracking**: Rollout panel shows individual node status
- **Completion Detection**: Automatically detects when all nodes complete

### Status Indicators
- **Ready**: Node ready for rollout (gray)
- **Updating**: Node being updated (blue, accent-secondary color)
- **Completed**: Node update completed (green)
- **Failed**: Node update failed (red)

## Registry Integration

### Firmware Lookup
- Gateway uses `FindFirmwareByNameAndVersion()` for direct lookup
- No label-based matching required
- Ensures exact firmware version is deployed
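
The signature of `FindFirmwareByNameAndVersion()` is not shown here, so the sketch below uses a separate, hypothetical helper `findExact` over an in-memory list to illustrate what an exact name-and-version lookup means, as opposed to label matching.

```go
package main

import "fmt"

// Firmware is a hypothetical, minimal registry entry.
type Firmware struct {
	Name    string
	Version string
}

// findExact returns the entry whose name and version both match exactly.
// The boolean is false when no such entry exists.
func findExact(all []Firmware, name, version string) (Firmware, bool) {
	for _, fw := range all {
		if fw.Name == name && fw.Version == version {
			return fw, true
		}
	}
	return Firmware{}, false
}

func main() {
	registry := []Firmware{
		{Name: "my-firmware", Version: "1.0.0"},
		{Name: "my-firmware", Version: "1.1.0"},
	}
	if fw, ok := findExact(registry, "my-firmware", "1.0.0"); ok {
		fmt.Println("deploying", fw.Name, fw.Version)
	}
	if _, ok := findExact(registry, "my-firmware", "2.0.0"); !ok {
		fmt.Println("no such firmware version")
	}
}
```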

### Proxy Endpoints
All registry operations are proxied through the gateway:
- `GET /api/registry/health` - Registry health check
- `GET /api/registry/firmware` - List firmware
- `POST /api/registry/firmware` - Upload firmware
- `GET /api/registry/firmware/{name}/{version}` - Download firmware
- `PUT /api/registry/firmware/{name}/{version}` - Update firmware metadata

## Error Handling

### Common Error Scenarios
1. **Firmware Not Found**: Returns 404 with specific error message
2. **Node Communication Failure**: Logs error, continues with other nodes
3. **Registry Unavailable**: Returns 503 service unavailable
4. **Invalid Request**: Returns 400 with validation details

### Logging
- Detailed logs for each rollout step
- Node-specific error tracking
- Performance metrics (upload times, success rates)

## Performance Considerations

### Parallel Processing
- Multiple nodes updated simultaneously
- Configurable concurrency limits
- Efficient resource utilization

### WebSocket Optimization
- Batched status updates
- Efficient message serialization
- Connection pooling for registry calls

### Memory Management
- Streaming firmware downloads
- Bounded goroutine pools
- Proper resource cleanup

@@ -78,7 +78,8 @@ func (hs *HTTPServer) corsMiddleware(next http.Handler) http.Handler {
     return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
         w.Header().Set("Access-Control-Allow-Origin", "*")
         w.Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
-        w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization")
+        w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization, Accept")
+        w.Header().Set("Access-Control-Expose-Headers", "Content-Type, Content-Length")
 
         if r.Method == "OPTIONS" {
             w.WriteHeader(http.StatusOK)

@@ -132,7 +132,7 @@ func (c *RegistryClient) UploadFirmware(metadata FirmwareMetadata, firmwareFile
     }
     defer resp.Body.Close()
 
-    if resp.StatusCode != http.StatusOK {
+    if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
         body, _ := io.ReadAll(resp.Body)
         return nil, fmt.Errorf("firmware upload failed with status %d: %s", resp.StatusCode, string(body))
     }