# Multi-GPU Setup Guide

This guide explains how to run the neural OS demo with multiple GPUs and user queue management.

## Architecture Overview

The system has been split into two main components:

1. **Dispatcher** (`dispatcher.py`): Handles WebSocket connections, manages user queues, and routes requests to workers
2. **Worker** (`worker.py`): Runs the actual model inference on individual GPUs

## Files Overview

- `main.py` - Original single-GPU implementation (kept as backup)
- `dispatcher.py` - Queue management and WebSocket handling
- `worker.py` - GPU worker for model inference
- `start_workers.py` - Helper script to start multiple workers
- `start_system.sh` - Shell script to start the entire system
- `tail_workers.py` - Script to monitor all worker logs simultaneously
- `requirements.txt` - Dependencies
- `static/index.html` - Frontend interface

## Setup Instructions

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Dispatcher

The dispatcher runs on port 7860 and manages user connections and queues:

```bash
python dispatcher.py
```

### 3. Start Workers (One per GPU)

Start one worker for each GPU you want to use. Workers automatically register with the dispatcher.

#### GPU 0:
```bash
python worker.py --gpu-id 0
```

#### GPU 1:
```bash
python worker.py --gpu-id 1
```

#### GPU 2:
```bash
python worker.py --gpu-id 2
```

And so on for additional GPUs.

Workers run on ports 8001, 8002, 8003, etc. (8001 + GPU_ID).
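
Starting workers by hand gets tedious with many GPUs. Below is a minimal sketch of a launcher along the lines of `start_workers.py` (the real script may behave differently); it relies only on the `--gpu-id` flag and the port and log-file conventions shown in this guide:

```python
# launch_workers.py - hypothetical sketch; see start_workers.py for the real launcher.
import subprocess
import sys

NUM_GPUS = int(sys.argv[1]) if len(sys.argv) > 1 else 2

procs = []
for gpu_id in range(NUM_GPUS):
    # Each worker logs to its own file, matching the naming used in this guide.
    log = open(f"worker_gpu_{gpu_id}.log", "w")
    proc = subprocess.Popen(
        [sys.executable, "worker.py", "--gpu-id", str(gpu_id)],
        stdout=log, stderr=subprocess.STDOUT,
    )
    procs.append(proc)
    print(f"Started worker for GPU {gpu_id} (expected on port {8001 + gpu_id})")

# Block until the workers exit; Ctrl+C stops the launcher.
for proc in procs:
    proc.wait()
```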

### 4. Access the Application

Open your browser and go to: `http://localhost:7860`

## System Behavior

### Queue Management

- **No Queue**: Sessions end only after 20 seconds of inactivity
- **With Queue**: Sessions are capped at 60 seconds, with warnings and a grace period (see the sketch below)
- **Grace Period**: If the queue empties during the grace period, the time limit is lifted
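
A minimal sketch of how the dispatcher might combine these timeouts (a hypothetical helper; the real logic lives in `dispatcher.py`):

```python
def session_deadline(session_start: float, last_input: float, queue_len: int) -> float:
    """Return the time at which the current session should end (hypothetical helper)."""
    IDLE_TIMEOUT = 20.0                 # values from the dispatcher settings below
    MAX_SESSION_TIME_WITH_QUEUE = 60.0

    if queue_len == 0:
        # No one waiting: only inactivity ends the session.
        return last_input + IDLE_TIMEOUT
    # Users waiting: the hard cap applies (inactivity can still end it sooner).
    return min(last_input + IDLE_TIMEOUT, session_start + MAX_SESSION_TIME_WITH_QUEUE)
```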

### User Experience

1. **Immediate Access**: If GPUs are available, users start immediately
2. **Queue Position**: Users see their position and estimated wait time
3. **Session Warnings**: Users get warnings when their time is running out
4. **Grace Period**: A 10-second countdown when session time expires; if the queue empties, the user can continue
5. **Queue Updates**: Queue position is refreshed in real time every 5 seconds (sketched below)
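
The periodic queue updates could look roughly like the following (a sketch assuming a FastAPI-style WebSocket; the message fields are made up, so check `dispatcher.py` for the real schema):

```python
import asyncio
import json

async def broadcast_queue_positions(queue, avg_session_seconds: float = 60.0):
    """Send each queued user their position every 5 seconds (hypothetical sketch)."""
    while True:
        for position, websocket in enumerate(list(queue), start=1):
            await websocket.send_text(json.dumps({
                "type": "queue_update",                        # assumed field names
                "position": position,
                "estimated_wait_seconds": position * avg_session_seconds,
            }))
        await asyncio.sleep(5.0)  # update interval from this guide
```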

### Worker Management

- Workers automatically register with the dispatcher on startup
- Workers send periodic pings (every 10 seconds) to maintain their connection (both sketched below)
- Workers handle session cleanup when users disconnect
- Each worker can handle one session at a time
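
Worker-side registration and pings might look like this sketch (the endpoints come from the API list at the end of this guide; the JSON payload fields are assumptions):

```python
import time
import requests

DISPATCHER_URL = "http://localhost:7860"

def register_worker(gpu_id: int) -> None:
    # Field names are assumptions; see dispatcher.py for the actual schema.
    requests.post(f"{DISPATCHER_URL}/register_worker", timeout=5, json={
        "gpu_id": gpu_id,
        "port": 8001 + gpu_id,  # port convention from this guide
    })

def ping_loop(gpu_id: int) -> None:
    while True:
        requests.post(f"{DISPATCHER_URL}/worker_ping", timeout=5,
                      json={"gpu_id": gpu_id})
        time.sleep(10.0)  # ping interval from this guide
```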

### Input Queue Optimization

The system implements intelligent input filtering to maintain performance:

- **Queue Management**: Each worker maintains an input queue per session
- **Interesting Input Detection**: The system identifies "interesting" inputs (clicks, key presses) vs. uninteresting ones (mouse movements)
- **Smart Processing**: When multiple inputs are queued:
  - Processes "interesting" inputs immediately, skipping boring mouse movements
  - If no interesting inputs are found, processes the latest mouse position
  - This prevents the system from getting bogged down processing every mouse movement
- **Performance**: Maintains responsiveness even during rapid mouse movements (a minimal sketch of the filter follows)
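
In code, the filter can be as small as the following sketch (the event field names are assumptions about `worker.py`'s internal event format):

```python
def pick_next_input(queued_events: list[dict]) -> dict | None:
    """Choose which queued input to process next (hypothetical sketch)."""
    if not queued_events:
        return None
    # "Interesting" inputs: clicks and key presses jump the queue.
    interesting = [event for event in queued_events
                   if event.get("type") in ("click", "key_down", "key_up")]
    if interesting:
        return interesting[0]
    # Only mouse movements are queued: the latest position is all that matters.
    return queued_events[-1]
```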

## Configuration

### Dispatcher Settings (in `dispatcher.py`)

```python
self.IDLE_TIMEOUT = 20.0  # When no queue
self.QUEUE_WARNING_TIME = 10.0
self.MAX_SESSION_TIME_WITH_QUEUE = 60.0  # When there's a queue
self.QUEUE_SESSION_WARNING_TIME = 45.0  # 15 seconds before timeout
self.GRACE_PERIOD = 10.0
```

### Worker Settings (in `worker.py`)

```python
self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
self.SCREEN_WIDTH = 512
self.SCREEN_HEIGHT = 384
self.NUM_SAMPLING_STEPS = 32
self.USE_RNN = False
```

## Monitoring

### Health Checks

Check worker health:
```bash
curl http://localhost:8001/health  # GPU 0
curl http://localhost:8002/health  # GPU 1
```
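
To check all workers at once, a short loop works as well (this sketch only assumes `/health` answers with HTTP 200 when the worker is up):

```python
import requests

NUM_GPUS = 2
for gpu_id in range(NUM_GPUS):
    url = f"http://localhost:{8001 + gpu_id}/health"
    try:
        healthy = requests.get(url, timeout=2).status_code == 200
    except requests.RequestException:
        healthy = False
    print(f"GPU {gpu_id}: {'healthy' if healthy else 'unreachable'}")
```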

### Logs

The system provides detailed logging for debugging and monitoring:

**Dispatcher logs:**
- `dispatcher.log` - All dispatcher activity, session management, queue operations

**Worker logs:**
- `workers.log` - Summary output from the worker startup script
- `worker_gpu_0.log` - Detailed logs from GPU 0 worker
- `worker_gpu_1.log` - Detailed logs from GPU 1 worker
- `worker_gpu_N.log` - Detailed logs from GPU N worker

**Monitor all worker logs:**
```bash
# Tail all worker logs simultaneously
python tail_workers.py --num-gpus 2

# Or monitor individual workers
tail -f worker_gpu_0.log
tail -f worker_gpu_1.log
```

## Troubleshooting

### Common Issues

1. **Worker not registering**: Check that the dispatcher is running before starting workers
2. **GPU memory issues**: Ensure each worker is assigned to a different GPU
3. **Port conflicts**: Make sure ports 7860, 8001, 8002, etc. are available
4. **Model loading errors**: Check that model files and configurations are present

### Debug Mode

Enable debug logging by setting the log level in both files:
```python
logging.basicConfig(level=logging.DEBUG)
```
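
To keep debug output in the per-GPU log files described above, you can also point `basicConfig` at a file (a sketch; the real scripts may wire up handlers differently):

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    filename="worker_gpu_0.log",  # match the log naming used in this guide
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
```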

## Scaling

To add more GPUs:
1. Start additional workers with higher GPU IDs
2. Workers automatically register with the dispatcher
3. Queue processing automatically utilizes all available workers

The system scales horizontally: add as many workers as you have GPUs.

## API Endpoints

### Dispatcher
- `GET /` - Serve the web interface
- `WebSocket /ws` - User connections
- `POST /register_worker` - Worker registration
- `POST /worker_ping` - Worker health pings

### Worker
- `POST /process_input` - Process user input
- `POST /end_session` - Clean up session
- `GET /health` - Health check
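
For quick manual testing, the worker endpoints can be hit directly. In this sketch the `session_id` payload field is hypothetical; check `worker.py` for the real request schemas:

```python
import requests

WORKER_URL = "http://localhost:8001"  # GPU 0, per the port convention above

# Health check takes no payload.
print(requests.get(f"{WORKER_URL}/health", timeout=2).status_code)

# Ending a session; "session_id" is a hypothetical field name.
requests.post(f"{WORKER_URL}/end_session", timeout=5,
              json={"session_id": "demo-session"})
```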