diff --git a/docs/Rollout.md b/docs/Rollout.md
index a84fcb2..2bf351a 100644
--- a/docs/Rollout.md
+++ b/docs/Rollout.md
@@ -1,14 +1,192 @@
 # Rollout
 
-The rollout feature works together with the spore-registry.
-It provides an endpoint `/cluster/node/versions` to determin which version are installed on which nodes through the `version` label.
-A rollout can be started by calling the `/rollout` endpoint and providing a set of labels.
-The endpoint will then search the the corresponding firmware in the spore-registry and checks the cluster members that match the labels.
-The gateway will then upload the firmware that was found to the matching cluster members in the background. Rollout and upload progress is sent through websocket.
-Before the upload starts, the `version` label on the member node is updated with the firmware version from the registry.
+The rollout feature provides orchestrated firmware updates across multiple SPORE nodes. It integrates with the spore-registry to manage firmware binaries and uses WebSocket communication for real-time progress updates.
 
-The spore-ui provides a rollout button on each firmware version. When clicked, the existing drawer is shown with the Rollout panel.
-The gateway is consulted (endpoint `/cluster/node/versions`) o return the list of matching members that are affected by the rollout and displayed inside the Rollout panel.
-The button `Rollout` will, once clicked, trigger the `/rollout` endpoint with the label set of the selected firmware that needs to be rolled out.
-Rollout and upload progress is received through websocket and the Rollout panel updated in realtime.
-Any UI interaction is blocked during rollout and the UI behaves like the Firmware Deploy on the cluster view (also with backdrop and info message).
\ No newline at end of file
+## Architecture
+
+### Components
+- **spore-gateway**: Orchestrates rollouts, proxies registry calls, manages WebSocket communication
+- **spore-registry**: Stores firmware binaries and metadata
+- **spore-ui**: Provides rollout interface and real-time status updates
+- **SPORE Nodes**: Target devices for firmware updates
+
+### Data Flow
+1. **UI Discovery**: Frontend queries `/api/cluster/node/versions` to find matching nodes
+2. **Rollout Initiation**: Frontend sends firmware info and node list to `/api/rollout`
+3. **Parallel Processing**: Gateway processes multiple nodes concurrently using goroutines
+4. **Real-time Updates**: Progress and status updates sent via WebSocket
+5. **Status Display**: UI shows updating status directly on cluster view nodes
+
+## API Endpoints
+
+### `/api/cluster/node/versions` (GET)
+Returns cluster members with their current firmware versions based on the `version` label.
+
+**Response:**
+```json
+{
+  "members": [
+    {
+      "ip": "10.0.1.134",
+      "version": "1.1.0",
+      "labels": {"app": "base", "role": "debug"}
+    }
+  ]
+}
+```
+
+### `/api/rollout` (POST)
+Initiates a firmware rollout for specified nodes.
+
+**Request Body:**
+```json
+{
+  "firmware": {
+    "name": "my-firmware",
+    "version": "1.0.0",
+    "labels": {"app": "base"}
+  },
+  "nodes": [
+    {
+      "ip": "10.0.1.134",
+      "version": "1.1.0",
+      "labels": {"app": "base", "role": "debug"}
+    }
+  ]
+}
+```
+
+**Response:**
+```json
+{
+  "success": true,
+  "message": "Rollout started for 3 nodes",
+  "rolloutId": "rollout_1761076653",
+  "totalNodes": 3,
+  "firmwareUrl": "http://localhost:3002/firmware/my-firmware/1.0.0"
+}
+```
+
+## Rollout Process
+
+### 1. Firmware Lookup
+- Gateway looks up firmware in registry by name and version
+- Validates firmware exists and is accessible
+
+### 2. Parallel Node Processing
+- Each node is processed in a separate goroutine
+- Uses `sync.WaitGroup` for coordination
+- Processes up to N nodes concurrently (where N = total nodes)
+
+### 3. Node Update Sequence
+For each node:
+1. **Status Update**: Broadcast `"updating"` status via WebSocket
+2. **Label Update**: Update node's `version` label to new firmware version
+3. **Firmware Upload**: Upload firmware binary to node
+4. **Status Completion**: Broadcast `"online"` status via WebSocket
+
+### 4. Error Handling
+- Failed nodes broadcast `"online"` status to return to normal
+- Rollout continues for remaining nodes
+- Detailed error logging for debugging
+
+## WebSocket Communication
+
+### Message Types
+
+#### `rollout_progress`
+```json
+{
+  "type": "rollout_progress",
+  "rolloutId": "rollout_1761076653",
+  "nodeIp": "10.0.1.134",
+  "status": "uploading",
+  "current": 2,
+  "total": 3,
+  "progress": 67,
+  "timestamp": "2025-01-21T20:05:00Z"
+}
+```
+
+**Status Values:**
+- `updating_labels`: Node labels being updated
+- `uploading`: Firmware being uploaded to node
+- `completed`: Node update completed successfully
+- `failed`: Node update failed
+
+#### `node_status_update`
+```json
+{
+  "type": "node_status_update",
+  "nodeIp": "10.0.1.134",
+  "status": "updating",
+  "timestamp": "2025-01-21T20:05:00Z"
+}
+```
+
+**Status Values:**
+- `updating`: Node is being updated (blue indicator)
+- `online`: Node is online and operational (green indicator)
+
+## UI Behavior
+
+### Rollout Panel
+- Shows firmware details and matching nodes
+- Displays node IP, current version, and labels
+- Provides "Rollout" button to initiate process
+
+### Real-time Updates
+- **Node Status**: Cluster view shows blue "updating" indicator during rollout
+- **Progress Tracking**: Rollout panel shows individual node status
+- **Completion Detection**: Automatically detects when all nodes complete
+
+### Status Indicators
+- **Ready**: Node ready for rollout (gray)
+- **Updating**: Node being updated (blue, accent-secondary color)
+- **Completed**: Node update completed (green)
+- **Failed**: Node update failed (red)
+
+## Registry Integration
+
+### Firmware Lookup
+- Gateway uses `FindFirmwareByNameAndVersion()` for direct lookup
+- No label-based matching required
+- Ensures exact firmware version is deployed
+
+### Proxy Endpoints
+All registry operations are proxied through the gateway:
+- `GET /api/registry/health` - Registry health check
+- `GET /api/registry/firmware` - List firmware
+- `POST /api/registry/firmware` - Upload firmware
+- `GET /api/registry/firmware/{name}/{version}` - Download firmware
+- `PUT /api/registry/firmware/{name}/{version}` - Update firmware metadata
+
+## Error Handling
+
+### Common Error Scenarios
+1. **Firmware Not Found**: Returns 404 with specific error message
+2. **Node Communication Failure**: Logs error, continues with other nodes
+3. **Registry Unavailable**: Returns 503 service unavailable
+4. **Invalid Request**: Returns 400 with validation details
+
+### Logging
+- Detailed logs for each rollout step
+- Node-specific error tracking
+- Performance metrics (upload times, success rates)
+
+## Performance Considerations
+
+### Parallel Processing
+- Multiple nodes updated simultaneously
+- Configurable concurrency limits
+- Efficient resource utilization
+
+### WebSocket Optimization
+- Batched status updates
+- Efficient message serialization
+- Connection pooling for registry calls
+
+### Memory Management
+- Streaming firmware downloads
+- Bounded goroutine pools
+- Proper resource cleanup
\ No newline at end of file
diff --git a/internal/server/server.go b/internal/server/server.go
index b03deca..b4f7a20 100644
--- a/internal/server/server.go
+++ b/internal/server/server.go
@@ -78,7 +78,8 @@ func (hs *HTTPServer) corsMiddleware(next http.Handler) http.Handler {
 	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		w.Header().Set("Access-Control-Allow-Origin", "*")
 		w.Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
-		w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization")
+		w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization, Accept")
+		w.Header().Set("Access-Control-Expose-Headers", "Content-Type, Content-Length")
 
 		if r.Method == "OPTIONS" {
 			w.WriteHeader(http.StatusOK)
diff --git a/pkg/registry/registry.go b/pkg/registry/registry.go
index d67a873..1f1a5ae 100644
--- a/pkg/registry/registry.go
+++ b/pkg/registry/registry.go
@@ -132,7 +132,7 @@ func (c *RegistryClient) UploadFirmware(metadata FirmwareMetadata, firmwareFile
 	}
 	defer resp.Body.Close()
 
-	if resp.StatusCode != http.StatusOK {
+	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusCreated {
 		body, _ := io.ReadAll(resp.Body)
 		return nil, fmt.Errorf("firmware upload failed with status %d: %s", resp.StatusCode, string(body))
 	}
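
Reviewer note: the rollout code itself is not part of this diff, so the following is only a minimal sketch of the per-node sequence that docs/Rollout.md describes (one goroutine per node, `sync.WaitGroup` for coordination, `"updating"` then `"online"` broadcasts, and failed nodes not stopping the others). The names `Node`, `NodeClient`, `Broadcaster`, `runRollout`, and `fakeClient` are hypothetical and chosen for illustration; they are not taken from the gateway source.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Node mirrors a node entry in the /api/rollout request body (hypothetical type).
type Node struct {
	IP      string
	Version string
}

// NodeClient is a hypothetical abstraction over the per-node operations.
type NodeClient interface {
	SetVersionLabel(ip, version string) error
	UploadFirmware(ip string, firmware []byte) error
}

// Broadcaster is a hypothetical stand-in for the gateway's WebSocket broadcast.
// It must be safe to call from multiple goroutines.
type Broadcaster func(nodeIP, status string)

// runRollout updates every node in its own goroutine and returns the
// per-node errors keyed by IP. A failed node does not stop the others.
func runRollout(nodes []Node, version string, firmware []byte, c NodeClient, broadcast Broadcaster) map[string]error {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex // guards failed
		failed = make(map[string]error)
	)
	for _, n := range nodes {
		wg.Add(1)
		go func(n Node) {
			defer wg.Done()
			broadcast(n.IP, "updating")
			// Success and failure both return the node to "online".
			defer broadcast(n.IP, "online")

			// Label update first, then the firmware upload.
			err := c.SetVersionLabel(n.IP, version)
			if err == nil {
				err = c.UploadFirmware(n.IP, firmware)
			}
			if err != nil {
				mu.Lock()
				failed[n.IP] = err
				mu.Unlock()
			}
		}(n)
	}
	wg.Wait()
	return failed
}

// fakeClient simulates nodes; uploads to badIP always fail.
type fakeClient struct{ badIP string }

func (f fakeClient) SetVersionLabel(ip, version string) error { return nil }

func (f fakeClient) UploadFirmware(ip string, firmware []byte) error {
	if ip == f.badIP {
		return errors.New("upload failed")
	}
	return nil
}

func main() {
	nodes := []Node{
		{IP: "10.0.1.134", Version: "1.1.0"},
		{IP: "10.0.1.135", Version: "1.1.0"},
	}
	var mu sync.Mutex
	events := 0
	broadcast := func(ip, status string) {
		mu.Lock()
		events++
		mu.Unlock()
	}
	failed := runRollout(nodes, "1.0.0", []byte("fw"), fakeClient{badIP: "10.0.1.135"}, broadcast)
	fmt.Println(len(failed), events) // prints "1 4"
}
```

With one goroutine per node, concurrency is unbounded (N = total nodes). If the "configurable concurrency limits" mentioned under Performance Considerations are intended, a buffered channel used as a semaphore around the goroutine body would be one way to cap it.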