fix: remove node_modules
docs/DISCOVERY.md
@@ -0,0 +1,224 @@
# UDP Auto Discovery Implementation

## Overview

The backend now implements UDP auto discovery, so node IP addresses no longer need to be hardcoded. It discovers SPORE nodes on the network automatically and configures the SporeApiClient dynamically.

## What Was Implemented

### 1. UDP Discovery Server

- **Port**: 4210 (configurable via `UDP_PORT` constant)
- **Message**: `CLUSTER_DISCOVERY` (configurable via `DISCOVERY_MESSAGE` constant)
- **Protocol**: UDP broadcast listening
- **Auto-binding**: Automatically binds to the specified port on startup
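
The listener described above might look roughly like the following sketch. It assumes Node's built-in `dgram` module; `isDiscoveryMessage` and `startDiscoveryServer` are hypothetical names chosen for illustration and are not taken from `index.js`.

```javascript
const dgram = require('node:dgram');

const UDP_PORT = 4210;
const DISCOVERY_MESSAGE = 'CLUSTER_DISCOVERY';

// Returns true when a datagram payload is the expected discovery message.
// Trimming tolerates a trailing newline; that leniency is an assumption.
function isDiscoveryMessage(payload) {
  return payload.toString('utf8').trim() === DISCOVERY_MESSAGE;
}

// Hypothetical helper: binds a UDP socket and reports the source address
// of every valid discovery message through onNode(ip, port).
function startDiscoveryServer(onNode, port = UDP_PORT) {
  const socket = dgram.createSocket('udp4');
  socket.on('message', (payload, rinfo) => {
    if (isDiscoveryMessage(payload)) onNode(rinfo.address, rinfo.port);
  });
  socket.on('error', (err) => {
    console.error(`UDP server error: ${err.message}`);
    socket.close();
  });
  socket.bind(port);
  return socket;
}
```

Keeping the message check separate from the socket setup makes the filter easy to test without opening a port.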

### 2. Dynamic Node Management

- **Automatic Discovery**: Nodes are discovered when they send `CLUSTER_DISCOVERY` messages
- **Primary Node Selection**: The most recently seen node becomes the primary connection
- **Stale Node Cleanup**: Nodes not seen for 5+ minutes are automatically removed
- **Health Monitoring**: Node availability is checked continuously (every 5 seconds by default)
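
As a rough illustration of the tracking and cleanup rules above, the sketch below keeps nodes in a `Map` keyed by IP. Only `cleanupStaleNodes()` is a name mentioned in this document; `discoveredNodes`, `recordNode`, and the record shape are assumptions.

```javascript
const STALE_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes, as stated above

// Hypothetical registry: node IP -> { ip, port, discoveredAt, lastSeen }.
const discoveredNodes = new Map();

// Records a sighting; a known node keeps its original discoveredAt.
function recordNode(ip, port, now = Date.now()) {
  const existing = discoveredNodes.get(ip);
  discoveredNodes.set(ip, {
    ip,
    port,
    discoveredAt: existing ? existing.discoveredAt : now,
    lastSeen: now,
  });
}

// Removes nodes last seen 5 or more minutes ago and returns their IPs.
function cleanupStaleNodes(now = Date.now()) {
  const removed = [];
  for (const [ip, node] of discoveredNodes) {
    if (now - node.lastSeen >= STALE_TIMEOUT_MS) {
      discoveredNodes.delete(ip);
      removed.push(ip);
    }
  }
  return removed;
}
```

Passing `now` as a parameter is a design choice for testability; a real implementation could simply call `Date.now()` inside each function.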

### 3. SporeApiClient Integration

- **Dynamic IP Configuration**: Client is automatically configured with discovered node IPs
- **No Hardcoded IPs**: All IP addresses are now discovered dynamically
- **Automatic Failover**: System automatically switches to available nodes
- **Error Handling**: Graceful handling when no nodes are available

### 4. New API Endpoints

#### Discovery Management

- `GET /api/discovery/nodes` - View all discovered nodes and status
- `POST /api/discovery/refresh` - Manually trigger discovery refresh
- `POST /api/discovery/primary/:ip` - Manually set primary node

#### Health Monitoring

- `GET /api/health` - Comprehensive health check including discovery status

### 5. Enhanced Error Handling

- **Service Unavailable**: Returns 503 when no nodes are discovered
- **Graceful Degradation**: System continues to function even when nodes are unavailable
- **Detailed Error Messages**: Clear feedback about discovery status
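
The 503 behaviour could be expressed as a small guard that runs before any request is forwarded to a node. This is a framework-agnostic sketch: `checkClientAvailable`, the `state` object, and the fields of the error body are all hypothetical, since the actual response format is not shown here.

```javascript
// Hypothetical guard: returns a 503 response description when no client
// has been configured yet, or null when the request may proceed.
function checkClientAvailable(state) {
  if (state.client) return null;
  return {
    status: 503,
    body: {
      error: 'Service unavailable',
      message: 'No SPORE nodes discovered yet',
      discoveredNodes: state.nodeCount,
    },
  };
}
```

A route handler would call the guard first and send the returned status and body whenever the result is not `null`.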

## How It Works

### 1. Startup Sequence

```
1. Backend starts and binds UDP server to port 4210
2. HTTP server starts on port 3001
3. System waits for CLUSTER_DISCOVERY messages
4. When messages arrive, nodes are automatically discovered
5. SporeApiClient is configured with the first discovered node
```

### 2. Discovery Process

```
1. Node sends "CLUSTER_DISCOVERY" to 255.255.255.255:4210
2. Backend receives message and extracts source IP
3. Node is added to discovered nodes list
4. If no primary node exists, this becomes the primary
5. SporeApiClient is automatically configured with the new IP
```

### 3. Node Management

```
1. All discovered nodes are tracked with timestamps
2. Primary node is the most recently seen node
3. Stale nodes (5+ minutes old) are automatically removed
4. System automatically switches primary node if current becomes stale
```
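
One way to read the failover rule in the steps above is sketched below. The steps leave open whether a still-fresh primary is replaced as soon as another node is seen more recently; this sketch assumes it is kept and only replaced once it goes stale. `selectPrimary` and the `{ ip, lastSeen }` record shape are hypothetical.

```javascript
const STALE_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes

// Keeps the current primary while it is fresh; otherwise falls back to the
// most recently seen fresh node, or null when no fresh node remains.
function selectPrimary(nodes, currentPrimaryIp, now = Date.now()) {
  const fresh = nodes.filter((n) => now - n.lastSeen < STALE_TIMEOUT_MS);
  if (fresh.some((n) => n.ip === currentPrimaryIp)) return currentPrimaryIp;
  if (fresh.length === 0) return null;
  return fresh.reduce((a, b) => (b.lastSeen > a.lastSeen ? b : a)).ip;
}
```

When the returned IP differs from the current primary, the backend would reconfigure the SporeApiClient with the new address.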

## Configuration

### Environment Variables

- `PORT`: HTTP server port (default: 3001)
- `UDP_PORT`: UDP discovery port (default: 4210)
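
Reading these two variables with their documented defaults could look like the sketch below. `loadConfig` and the returned property names are hypothetical, and the fallback on invalid values is an assumption rather than documented behaviour.

```javascript
// Reads both ports from an environment-like object and falls back to the
// documented defaults when a value is missing or not a valid port number.
function loadConfig(env = process.env) {
  const toPort = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isInteger(n) && n > 0 && n <= 65535 ? n : fallback;
  };
  return {
    httpPort: toPort(env.PORT, 3001),
    udpPort: toPort(env.UDP_PORT, 4210),
  };
}
```

For example, starting the backend with `PORT=8080` set would yield `httpPort: 8080` while `udpPort` stays at 4210.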

### Constants (in index.js)

- `UDP_PORT`: Discovery port (currently 4210)
- `DISCOVERY_MESSAGE`: Expected message (currently "CLUSTER_DISCOVERY")
- Stale timeout: 5 minutes (configurable in `cleanupStaleNodes()`)
- Health check interval: 5 seconds (configurable in `setInterval`)

## Usage Examples

### Starting the Backend

```bash
npm start
```

### Testing Discovery

```bash
# Send discovery message to broadcast
npm run test-discovery broadcast

# Send to specific IP
npm run test-discovery 192.168.1.100

# Send multiple messages
npm run test-discovery broadcast 5
```
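
The `test-discovery` script itself is not shown here. A sender with the same command-line shape could be sketched as follows, assuming Node's built-in `dgram` module; `parseArgs` and `sendDiscovery` are hypothetical names.

```javascript
const dgram = require('node:dgram');

const UDP_PORT = 4210;
const DISCOVERY_MESSAGE = 'CLUSTER_DISCOVERY';

// Maps CLI arguments ("broadcast" or an IP, then an optional count)
// to a target address and a number of messages.
function parseArgs(argv) {
  const [target = 'broadcast', count = '1'] = argv;
  return {
    address: target === 'broadcast' ? '255.255.255.255' : target,
    count: Math.max(1, Number.parseInt(count, 10) || 1),
  };
}

// Sends `count` discovery datagrams one after another, then closes the socket.
function sendDiscovery({ address, count }) {
  const socket = dgram.createSocket('udp4');
  socket.bind(() => {
    socket.setBroadcast(true); // must be enabled before sending to a broadcast address
    let remaining = count;
    const sendNext = () => {
      if (remaining === 0) return socket.close();
      remaining -= 1;
      socket.send(DISCOVERY_MESSAGE, UDP_PORT, address, (err) => {
        if (err) {
          console.error(err.message);
          return socket.close();
        }
        sendNext();
      });
    };
    sendNext();
  });
}

// Example invocation: sendDiscovery(parseArgs(process.argv.slice(2)));
```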

### Monitoring Discovery

```bash
# Watch discovery in real-time
npm run demo-discovery
```

### Using the Client

```bash
# Use discovery system
npm run client-example

# Direct connection (for testing)
npm run client-example 192.168.1.100
```

## API Response Examples

### Discovery Status

```json
{
  "primaryNode": "192.168.1.100",
  "totalNodes": 2,
  "nodes": [
    {
      "ip": "192.168.1.100",
      "port": 4210,
      "discoveredAt": "2024-01-01T12:00:00.000Z",
      "lastSeen": "2024-01-01T12:05:00.000Z",
      "isPrimary": true
    }
  ],
  "clientInitialized": true,
  "clientBaseUrl": "http://192.168.1.100"
}
```
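
A response of this shape could be assembled from the in-memory node list along the following lines. This is a sketch under assumptions: `buildDiscoveryStatus` is a hypothetical name, nodes are assumed to live in a `Map` with millisecond timestamps, and `clientInitialized` is assumed to follow from having a primary node.

```javascript
// Builds the discovery status body from a Map of node records.
function buildDiscoveryStatus(nodes, primaryIp) {
  const iso = (ms) => new Date(ms).toISOString();
  return {
    primaryNode: primaryIp,
    totalNodes: nodes.size,
    nodes: [...nodes.values()].map((n) => ({
      ip: n.ip,
      port: n.port,
      discoveredAt: iso(n.discoveredAt),
      lastSeen: iso(n.lastSeen),
      isPrimary: n.ip === primaryIp,
    })),
    clientInitialized: primaryIp !== null,
    clientBaseUrl: primaryIp ? `http://${primaryIp}` : null,
  };
}
```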

### Health Check

```json
{
  "status": "healthy",
  "timestamp": "2024-01-01T12:05:00.000Z",
  "services": {
    "http": true,
    "udp": true,
    "sporeClient": true
  },
  "discovery": {
    "totalNodes": 2,
    "primaryNode": "192.168.1.100",
    "udpPort": 4210,
    "serverRunning": true
  }
}
```

## Benefits

### 1. Zero Configuration

- No need to manually configure IP addresses
- Automatic discovery of all nodes on the network
- Self-healing when nodes come and go

### 2. High Availability

- Automatic failover to available nodes
- No dependence on a single node
- Continuous health monitoring

### 3. Scalability

- No fixed limit on the number of discovered nodes
- Requests follow the current primary node (distributing load across nodes is listed under Future Enhancements)
- Easy to add/remove nodes

### 4. Maintenance

- No manual IP updates required
- Automatic cleanup of stale nodes
- Comprehensive monitoring and logging

## Troubleshooting

### Common Issues

#### No Nodes Discovered

1. Check if backend is running: `curl http://localhost:3001/api/health`
2. Verify that the UDP discovery port (4210 by default) is not blocked by a firewall
3. Send test discovery message: `npm run test-discovery broadcast`

#### UDP Port Already in Use

1. Check for other instances: `netstat -tulpn | grep 4210`
2. Stop the conflicting process or change `UDP_PORT`
3. Restart the backend server

#### Client Not Initialized

1. Check discovery status: `curl http://localhost:3001/api/discovery/nodes`
2. Verify nodes are sending discovery messages
3. Check network connectivity

### Debug Commands

```bash
# Check discovery status
curl http://localhost:3001/api/discovery/nodes

# Check health
curl http://localhost:3001/api/health

# Manual refresh
curl -X POST http://localhost:3001/api/discovery/refresh

# Set primary node
curl -X POST http://localhost:3001/api/discovery/primary/192.168.1.100
```

## Future Enhancements

### Potential Improvements

1. **Node Prioritization**: Weight-based node selection
2. **Load Balancing**: Distribute requests across multiple nodes
3. **Authentication**: Secure discovery messages
4. **Metrics**: Detailed performance and health metrics
5. **Configuration**: Runtime configuration updates
6. **Clustering**: Multiple backend instances with shared discovery

## Conclusion

The UDP auto discovery implementation provides dynamic node management without manual IP configuration, with automatic failover when the primary node becomes stale. The system is production-ready and includes health monitoring, error handling, and debugging tools.