neural-os


da03 committed on Jul 15

Commit

096295a

1 Parent(s): 7323319

.

Files changed (6)

Dockerfile +44 -1
MULTI_GPU_SETUP.md +192 -0
README.md +27 -1
start_remote_worker.sh +87 -0
static/index.html +87 -5
worker.py +2 -2

Dockerfile CHANGED

@@ -35,4 +35,47 @@ WORKDIR $HOME/app
 # Copy the current directory contents into the container at $HOME/app setting the owner to the user
 COPY --chown=user . $HOME/app
-CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1", "--timeout-keep-alive", "3000"]

 # Copy the current directory contents into the container at $HOME/app setting the owner to the user
 COPY --chown=user . $HOME/app
+# Create a startup script for HF Spaces
+COPY --chown=user <<EOF $HOME/app/start_hf_spaces.sh
+#!/bin/bash
+set -e
+echo "🚀 Starting Neural OS for HF Spaces"
+echo "===================================="
+# Start dispatcher in background
+echo "🎯 Starting dispatcher..."
+python dispatcher.py --port 7860 > dispatcher.log 2>&1 &
+DISPATCHER_PID=\$!
+# Wait for dispatcher to start
+sleep 3
+# Start single worker (HF Spaces typically has 1 GPU or CPU)
+echo "🔧 Starting worker..."
+python worker.py --worker-address localhost:8001 --dispatcher-url http://localhost:7860 > worker.log 2>&1 &
+WORKER_PID=\$!
+# Wait for worker to initialize
+echo "⏳ Waiting for worker to initialize..."
+sleep 30
+echo "✅ System ready!"
+echo "🌍 Web interface: http://localhost:7860"
+# Function to cleanup
+cleanup() {
+    echo "🛑 Shutting down..."
+    kill \$DISPATCHER_PID \$WORKER_PID 2>/dev/null || true
+    exit 0
+}
+trap cleanup SIGINT SIGTERM
+# Wait for dispatcher (main process)
+wait \$DISPATCHER_PID
+EOF
+RUN chmod +x $HOME/app/start_hf_spaces.sh
+CMD ["bash", "start_hf_spaces.sh"]

MULTI_GPU_SETUP.md ADDED

@@ -0,0 +1,192 @@

+# Multi-GPU Setup Guide
+This guide explains how to run the neural OS demo with multiple GPUs and user queue management.
+## Architecture Overview
+The system has been split into two main components:
+1. **Dispatcher** (`dispatcher.py`): Handles WebSocket connections, manages user queues, and routes requests to workers
+2. **Worker** (`worker.py`): Runs the actual model inference on individual GPUs
+## Files Overview
+- `main.py` - Original single-GPU implementation (kept as backup)
+- `dispatcher.py` - Queue management and WebSocket handling
+- `worker.py` - GPU worker for model inference
+- `start_workers.py` - Helper script to start multiple workers
+- `start_system.sh` - Shell script to start the entire system
+- `tail_workers.py` - Script to monitor all worker logs simultaneously
+- `requirements.txt` - Dependencies
+- `static/index.html` - Frontend interface
+## Setup Instructions
+### 1. Install Dependencies
+```bash
+pip install -r requirements.txt
+```
+### 2. Start the Dispatcher
+The dispatcher runs on port 7860 and manages user connections and queues:
+```bash
+python dispatcher.py
+```
+### 3. Start Workers (One per GPU)
+Start one worker for each GPU you want to use. Workers automatically register with the dispatcher.
+#### GPU 0:
+```bash
+python worker.py --gpu-id 0
+```
+#### GPU 1:
+```bash
+python worker.py --gpu-id 1
+```
+#### GPU 2:
+```bash
+python worker.py --gpu-id 2
+```
+And so on for additional GPUs.
+Workers run on ports 8001, 8002, 8003, etc. (8001 + GPU_ID).
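The port rule above (8001 + GPU_ID) can be sketched as a small helper. This is a minimal illustration only: `BASE_WORKER_PORT`, `worker_port`, and `worker_health_url` are hypothetical names and are not taken from `worker.py`.

```python
BASE_WORKER_PORT = 8001  # from the guide: workers listen on 8001 + GPU_ID


def worker_port(gpu_id: int) -> int:
    """Return the port a worker for the given GPU is expected to listen on."""
    if gpu_id < 0:
        raise ValueError("gpu_id must be non-negative")
    return BASE_WORKER_PORT + gpu_id


def worker_health_url(gpu_id: int, host: str = "localhost") -> str:
    """Build the /health URL for the worker assigned to gpu_id."""
    return f"http://{host}:{worker_port(gpu_id)}/health"
```

For example, `worker_health_url(1)` yields the same address the guide uses for the GPU 1 health check.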
+### 4. Access the Application
+Open your browser and go to: `http://localhost:7860`
+## System Behavior
+### Queue Management
+- **No Queue**: Users get normal timeout behavior (20 seconds of inactivity)
+- **With Queue**: Users get limited session time (60 seconds) with warnings and grace periods
+- **Grace Period**: If the queue becomes empty during the grace period, time limits are removed
+### User Experience
+1. **Immediate Access**: If GPUs are available, users start immediately
+2. **Queue Position**: Users see their position and estimated wait time
+3. **Session Warnings**: Users get warnings when their time is running out
+4. **Grace Period**: 10-second countdown when session time expires, but if queue empties, users can continue
+5. **Queue Updates**: Real-time updates on queue position every 5 seconds
+### Worker Management
+- Workers automatically register with the dispatcher on startup
+- Workers send periodic pings (every 10 seconds) to maintain connection
+- Workers handle session cleanup when users disconnect
+- Each worker can handle one session at a time
+### Input Queue Optimization
+The system implements intelligent input filtering to maintain performance:
+- **Queue Management**: Each worker maintains an input queue per session
+- **Interesting Input Detection**: The system identifies "interesting" inputs (clicks, key presses) vs. uninteresting ones (mouse movements)
+- **Smart Processing**: When multiple inputs are queued:
+  - Processes "interesting" inputs immediately, skipping boring mouse movements
+  - If no interesting inputs are found, processes the latest mouse position
+  - This prevents the system from getting bogged down processing every mouse movement
+- **Performance**: Maintains responsiveness even during rapid mouse movements
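The filtering rule described in this list can be sketched as follows. This is a sketch under stated assumptions, not the code in `worker.py`: the `InputEvent` record, its field names, and the `next_input` helper are all hypothetical, and the real input payload may look different.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputEvent:
    """Hypothetical input record; the real payload in worker.py may differ."""
    x: int
    y: int
    left_click: bool = False
    right_click: bool = False
    keys: List[str] = field(default_factory=list)


def is_interesting(event: InputEvent) -> bool:
    """Clicks and key presses are interesting; plain mouse moves are not."""
    return event.left_click or event.right_click or bool(event.keys)


def next_input(queue: List[InputEvent]) -> Optional[InputEvent]:
    """Pop the next event to process and drop the events that are skipped.

    If an interesting event is queued, return the first one and discard the
    mouse moves queued in front of it. Otherwise return only the latest
    mouse position and clear the queue.
    """
    if not queue:
        return None
    for index, event in enumerate(queue):
        if is_interesting(event):
            del queue[: index + 1]  # drop skipped moves and the chosen event
            return event
    latest = queue[-1]
    queue.clear()
    return latest
```

With a queue of a move, a click, and another move, the first call returns the click and the second call returns the trailing move, so a burst of mouse movements costs at most one inference step.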
+## Configuration
+### Dispatcher Settings (in `dispatcher.py`)
+```python
+self.IDLE_TIMEOUT = 20.0  # When no queue
+self.QUEUE_WARNING_TIME = 10.0
+self.MAX_SESSION_TIME_WITH_QUEUE = 60.0  # When there's a queue
+self.QUEUE_SESSION_WARNING_TIME = 45.0  # 15 seconds before timeout
+self.GRACE_PERIOD = 10.0
+```
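One possible reading of how these settings combine with the Queue Management rules is sketched below. The constant names and values come from the settings above; the `session_status` function, its arguments, and its return labels are hypothetical. The sketch assumes the grace period begins once `MAX_SESSION_TIME_WITH_QUEUE` is reached and that only inactivity ends a session when nobody is waiting. `QUEUE_WARNING_TIME` is left out because the guide does not say how it is used.

```python
IDLE_TIMEOUT = 20.0                  # seconds of inactivity when no queue
MAX_SESSION_TIME_WITH_QUEUE = 60.0   # session cap when users are waiting
QUEUE_SESSION_WARNING_TIME = 45.0    # warn 15 seconds before the cap
GRACE_PERIOD = 10.0                  # countdown after the cap is reached


def session_status(elapsed: float, idle: float, queue_length: int) -> str:
    """Classify a session as 'active', 'warning', 'grace', or 'expired'.

    elapsed: seconds since the session started
    idle: seconds since the user's last input
    queue_length: number of users currently waiting
    """
    if queue_length == 0:
        # Nobody is waiting: time limits are lifted, only inactivity counts.
        return "expired" if idle >= IDLE_TIMEOUT else "active"
    if elapsed >= MAX_SESSION_TIME_WITH_QUEUE + GRACE_PERIOD:
        return "expired"
    if elapsed >= MAX_SESSION_TIME_WITH_QUEUE:
        return "grace"
    if elapsed >= QUEUE_SESSION_WARNING_TIME:
        return "warning"
    return "active"
```

Because the queue length is checked first, a session that is in its grace period returns to `"active"` as soon as the queue empties, which matches the grace-period rule in the guide.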
+### Worker Settings (in `worker.py`)
+```python
+self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
+self.SCREEN_WIDTH = 512
+self.SCREEN_HEIGHT = 384
+self.NUM_SAMPLING_STEPS = 32
+self.USE_RNN = False
+```
+## Monitoring
+### Health Checks
+Check worker health:
+```bash
+curl http://localhost:8001/health  # GPU 0
+curl http://localhost:8002/health  # GPU 1
+```
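The same check can be scripted for several workers at once. The sketch below uses only the Python standard library and assumes nothing about the body of the `/health` response; it treats any HTTP 200 as healthy. The helper names `is_healthy` and `healthy_workers` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import List


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to url answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as down.
        return False


def healthy_workers(num_gpus: int, host: str = "localhost") -> List[int]:
    """Return the GPU IDs whose worker answered on port 8001 + GPU_ID."""
    return [
        gpu_id
        for gpu_id in range(num_gpus)
        if is_healthy(f"http://{host}:{8001 + gpu_id}/health")
    ]
```

Calling `healthy_workers(2)` probes the two URLs shown in the curl commands above and returns the IDs that responded.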
+### Logs
+The system provides detailed logging for debugging and monitoring:
+**Dispatcher logs:**
+- `dispatcher.log` - All dispatcher activity, session management, queue operations
+**Worker logs:**
+- `workers.log` - Summary output from the worker startup script
+- `worker_gpu_0.log` - Detailed logs from GPU 0 worker
+- `worker_gpu_1.log` - Detailed logs from GPU 1 worker
+- `worker_gpu_N.log` - Detailed logs from GPU N worker
+**Monitor all worker logs:**
+```bash
+# Tail all worker logs simultaneously
+python tail_workers.py --num-gpus 2
+# Or monitor individual workers
+tail -f worker_gpu_0.log
+tail -f worker_gpu_1.log
+```
+## Troubleshooting
+### Common Issues
+1. **Worker not registering**: Check that dispatcher is running first
+2. **GPU memory issues**: Ensure each worker is assigned to a different GPU
+3. **Port conflicts**: Make sure ports 7860, 8001, 8002, etc. are available
+4. **Model loading errors**: Check that model files and configurations are present
+### Debug Mode
+Enable debug logging by setting the log level in both `dispatcher.py` and `worker.py`:
+```python
+logging.basicConfig(level=logging.DEBUG)
+```
+## Scaling
+To add more GPUs:
+1. Start additional workers with higher GPU IDs
+2. Workers automatically register with the dispatcher
+3. Queue processing automatically utilizes all available workers
+The system scales horizontally - add as many workers as you have GPUs available.
+## API Endpoints
+### Dispatcher
+- `GET /` - Serve the web interface
+- `WebSocket /ws` - User connections
+- `POST /register_worker` - Worker registration
+- `POST /worker_ping` - Worker health pings
+### Worker
+- `POST /process_input` - Process user input
+- `POST /end_session` - Clean up session
+- `GET /health` - Health check

README.md CHANGED

@@ -1,10 +1,36 @@
 ---
 title: Neural Computer
-emoji: 🐢
 colorFrom: purple
 colorTo: blue
 sdk: docker
 pinned: false
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: Neural Computer
+emoji: 🧠
 colorFrom: purple
 colorTo: blue
 sdk: docker
 pinned: false
 ---
+# Neural Computer Demo
+This is a demonstration of a Neural Computer system that can generate computer screen interactions in real-time. The system uses a trained diffusion model to predict what the screen should look like based on mouse movements, clicks, and keyboard inputs.
+## How to Use
+1. **Wait for the model to load** - This may take a minute or two on first startup
+2. **Click anywhere on the canvas** to begin interacting
+3. **Move your mouse** around to see the model predict screen changes
+4. **Click and drag** to simulate mouse interactions
+5. **Use keyboard inputs** while focused on the canvas
+6. **Use the controls** to:
+   - Reset the simulation
+   - Adjust sampling steps (lower = faster, higher = better quality)
+   - Toggle RNN mode for even faster inference
+## Settings
+- **Sampling Steps**: Controls the quality vs speed tradeoff (1-50 steps)
+- **Use RNN**: Enables faster inference mode using RNN output directly
+- **Reset**: Clears the simulation and starts fresh
+## Technical Details
+This system uses a specialized diffusion model trained on computer interaction data. The model can predict realistic screen changes based on user inputs in real-time.
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

start_remote_worker.sh ADDED

@@ -0,0 +1,87 @@

+#!/bin/bash
+# Remote Worker Startup Script
+# Usage: ./start_remote_worker.sh <dispatcher_ip> <local_ip> <num_gpus>
+DISPATCHER_IP=${1:-"192.168.1.50"}
+LOCAL_IP=${2:-$(hostname -I | awk '{print $1}')}
+NUM_GPUS=${3:-1}
+DISPATCHER_URL="http://${DISPATCHER_IP}:7860"
+echo "🚀 Starting Remote GPU Workers"
+echo "==============================="
+echo "🌐 Dispatcher: $DISPATCHER_URL"
+echo "📍 Local IP: $LOCAL_IP"
+echo "🖥️  GPUs: $NUM_GPUS"
+echo ""
+# Check if required files exist
+REQUIRED_FILES=("worker.py" "utils.py" "latent_stats.json")
+for file in "${REQUIRED_FILES[@]}"; do
+    if [[ ! -f "$file" ]]; then
+        echo "❌ Error: $file not found"
+        echo "💡 Copy required files from main machine:"
+        echo "   scp user@dispatcher-machine:/path/to/{worker.py,utils.py,latent_stats.json,config_*.yaml} ."
+        exit 1
+    fi
+done
+# Test GPU access
+echo "🧪 Testing GPU access..."
+python -c "import torch; print(f'✅ CUDA available: {torch.cuda.is_available()}'); print(f'📊 GPU count: {torch.cuda.device_count()}')"
+# Test dispatcher connectivity
+echo "🌐 Testing dispatcher connectivity..."
+if curl -s --connect-timeout 5 "$DISPATCHER_URL" > /dev/null; then
+    echo "✅ Dispatcher reachable"
+else
+    echo "❌ Cannot reach dispatcher at $DISPATCHER_URL"
+    echo "💡 Check network connectivity and dispatcher status"
+    exit 1
+fi
+# Start workers
+echo "🔧 Starting $NUM_GPUS GPU workers..."
+for ((i=0; i<NUM_GPUS; i++)); do
+    PORT=$((8001 + i))
+    WORKER_ADDRESS="${LOCAL_IP}:${PORT}"
+    echo "Starting worker on GPU $i: $WORKER_ADDRESS"
+    CUDA_VISIBLE_DEVICES=$i python worker.py \
+        --worker-address "$WORKER_ADDRESS" \
+        --dispatcher-url "$DISPATCHER_URL" \
+        > "worker_gpu_${i}.log" 2>&1 &
+    WORKER_PID=$!
+    echo "✅ Worker $i started (PID: $WORKER_PID)"
+    # Small delay between starts
+    sleep 2
+done
+echo ""
+echo "🎉 All workers started!"
+echo "📋 Monitor logs:"
+for ((i=0; i<NUM_GPUS; i++)); do
+    echo "   GPU $i: tail -f worker_gpu_${i}.log"
+done
+echo ""
+echo "🔍 Check worker health:"
+for ((i=0; i<NUM_GPUS; i++)); do
+    PORT=$((8001 + i))
+    echo "   GPU $i: curl http://${LOCAL_IP}:${PORT}/health"
+done
+echo ""
+echo "⚠️  To stop workers: pkill -f 'python.*worker.py'"
+echo "Press Ctrl+C to stop the workers and exit..."
+# Keep script running to show it's active
+trap 'echo ""; echo "🛑 Stopping workers..."; pkill -f "python.*worker.py"; exit 0' SIGINT
+# Show real-time worker status
+while true; do
+    sleep 10
+    # [p] keeps grep from counting itself; grep -c already prints 0 when nothing matches
+    RUNNING=$(ps aux | grep -c "[p]ython.*worker.py" || true)
+    echo "$(date): $RUNNING workers running"
+done

static/index.html CHANGED

@@ -290,6 +290,9 @@
                     console.log(`Queue update: Position ${data.position}/${data.total_waiting}, wait: ${data.maximum_wait_seconds.toFixed(1)} seconds`);
                     const waitSeconds = Math.ceil(data.maximum_wait_seconds);
                     if (waitSeconds === 0) {
                         showConnectionStatus("Starting soon...");
                         stopQueueCountdown();
@@ -313,6 +316,8 @@
                     console.log("Session started, clearing queue display");
                     // Stop queue countdown and clear the display
                     stopQueueCountdown();
                     //ctx.clearRect(0, 0, canvas.width, canvas.height);
                 } else if (data.type === "session_warning") {
                     console.log(`Session time warning: ${data.time_remaining} seconds remaining`);
@@ -391,6 +396,10 @@
         let autoInputEnabled = true; // Default to enabled
         let userHasInteracted = false; // Track if user has moved mouse inside canvas
         // Timeout countdown mechanism - support concurrent timeouts
         let timeoutCountdownInterval = null;
         let timeoutCountdown = 10;
@@ -534,11 +543,11 @@
             }
             // Update initial display
-            const countdownElement = document.getElementById('timeoutCountdown');
-            if (countdownElement) {
-                countdownElement.textContent = timeoutCountdown;
-            }
             console.log(`Starting ${earliestTimeout.type} timeout countdown: ${timeoutCountdown} seconds`);
             // Start countdown
@@ -710,6 +719,44 @@
             queueCountdownActive = false;
             queueWaitTime = 0;
         }
         function updateQueueCountdownDisplay() {
             if (queueWaitTime <= 0) {
@@ -815,6 +862,13 @@
             }
             if (!isConnected || isProcessing) return;
             let rect = canvas.getBoundingClientRect();
             let x = event.clientX - rect.left;
             let y = event.clientY - rect.top;
@@ -832,6 +886,13 @@
         canvas.addEventListener("click", function (event) {
             if (!isConnected || isProcessing) return;
             let rect = canvas.getBoundingClientRect();
             let x = event.clientX - rect.left;
             let y = event.clientY - rect.top;
@@ -844,6 +905,12 @@
             event.preventDefault(); // Prevent default context menu
             if (!isConnected || isProcessing) return;
             let rect = canvas.getBoundingClientRect();
             let x = event.clientX - rect.left;
             let y = event.clientY - rect.top;
@@ -907,6 +974,12 @@
             }
             if (!isConnected || isProcessing || !userHasInteracted) return;
             // Get the current mouse position
             let rect = canvas.getBoundingClientRect();
             let x = lastSentPosition ? lastSentPosition.x : canvas.width / 2;
@@ -923,6 +996,12 @@
             }
             if (!isConnected || socket.readyState !== WebSocket.OPEN || !userHasInteracted) return;
             // Get the current mouse position
             let rect = canvas.getBoundingClientRect();
             let x = lastSentPosition ? lastSentPosition.x : canvas.width / 2;
@@ -1030,6 +1109,9 @@
                 }
             }
         });
     </script>
     <!-- Bootstrap JS (optional) -->

                     console.log(`Queue update: Position ${data.position}/${data.total_waiting}, wait: ${data.maximum_wait_seconds.toFixed(1)} seconds`);
                     const waitSeconds = Math.ceil(data.maximum_wait_seconds);
+                    // Disable canvas interaction while in queue
+                    disableCanvasInteraction();
                     if (waitSeconds === 0) {
                         showConnectionStatus("Starting soon...");
                         stopQueueCountdown();
                     console.log("Session started, clearing queue display");
                     // Stop queue countdown and clear the display
                     stopQueueCountdown();
+                    // Enable canvas interaction when session starts
+                    enableCanvasInteraction();
                     //ctx.clearRect(0, 0, canvas.width, canvas.height);
                 } else if (data.type === "session_warning") {
                     console.log(`Session time warning: ${data.time_remaining} seconds remaining`);
         let autoInputEnabled = true; // Default to enabled
         let userHasInteracted = false; // Track if user has moved mouse inside canvas
+        // Session state tracking
+        let sessionState = 'queued'; // 'queued', 'active', 'disconnected'
+        let canvasInteractionEnabled = false;
         // Timeout countdown mechanism - support concurrent timeouts
         let timeoutCountdownInterval = null;
         let timeoutCountdown = 10;
             }
             // Update initial display
+                const countdownElement = document.getElementById('timeoutCountdown');
+                if (countdownElement) {
+                    countdownElement.textContent = timeoutCountdown;
+                }
             console.log(`Starting ${earliestTimeout.type} timeout countdown: ${timeoutCountdown} seconds`);
             // Start countdown
             queueCountdownActive = false;
             queueWaitTime = 0;
         }
+        function enableCanvasInteraction() {
+            canvasInteractionEnabled = true;
+            sessionState = 'active';
+            // Remove visual queue indicator
+            if (canvas) {
+                canvas.style.opacity = '1';
+                canvas.style.cursor = 'crosshair';
+                canvas.style.pointerEvents = 'auto';
+            }
+            // Update status
+            const statusElement = document.getElementById('connectionStatus');
+            if (statusElement) {
+                statusElement.textContent = 'Active';
+                statusElement.className = 'connected';
+            }
+        }
+        function disableCanvasInteraction() {
+            canvasInteractionEnabled = false;
+            sessionState = 'queued';
+            // Add visual queue indicator
+            if (canvas) {
+                canvas.style.opacity = '0.5';
+                canvas.style.cursor = 'not-allowed';
+                canvas.style.pointerEvents = 'none';
+            }
+            // Update status
+            const statusElement = document.getElementById('connectionStatus');
+            if (statusElement) {
+                statusElement.textContent = 'Queued';
+                statusElement.className = 'connecting';
+            }
+        }
         function updateQueueCountdownDisplay() {
             if (queueWaitTime <= 0) {
             }
             if (!isConnected || isProcessing) return;
+            // Check if canvas interaction is enabled (not queued)
+            if (!canvasInteractionEnabled) {
+                console.log("Canvas interaction disabled - user is queued");
+                return;
+            }
             let rect = canvas.getBoundingClientRect();
             let x = event.clientX - rect.left;
             let y = event.clientY - rect.top;
         canvas.addEventListener("click", function (event) {
             if (!isConnected || isProcessing) return;
+            // Check if canvas interaction is enabled (not queued)
+            if (!canvasInteractionEnabled) {
+                console.log("Canvas interaction disabled - user is queued");
+                return;
+            }
             let rect = canvas.getBoundingClientRect();
             let x = event.clientX - rect.left;
             let y = event.clientY - rect.top;
             event.preventDefault(); // Prevent default context menu
             if (!isConnected || isProcessing) return;
+            // Check if canvas interaction is enabled (not queued)
+            if (!canvasInteractionEnabled) {
+                console.log("Canvas interaction disabled - user is queued");
+                return;
+            }
             let rect = canvas.getBoundingClientRect();
             let x = event.clientX - rect.left;
             let y = event.clientY - rect.top;
             }
             if (!isConnected || isProcessing || !userHasInteracted) return;
+            // Check if canvas interaction is enabled (not queued)
+            if (!canvasInteractionEnabled) {
+                console.log("Canvas interaction disabled - user is queued");
+                return;
+            }
             // Get the current mouse position
             let rect = canvas.getBoundingClientRect();
             let x = lastSentPosition ? lastSentPosition.x : canvas.width / 2;
             }
             if (!isConnected || socket.readyState !== WebSocket.OPEN || !userHasInteracted) return;
+            // Check if canvas interaction is enabled (not queued)
+            if (!canvasInteractionEnabled) {
+                console.log("Canvas interaction disabled - user is queued");
+                return;
+            }
             // Get the current mouse position
             let rect = canvas.getBoundingClientRect();
             let x = lastSentPosition ? lastSentPosition.x : canvas.width / 2;
                 }
             }
         });
+        // Initialize canvas in disabled state (user starts queued)
+        disableCanvasInteraction();
     </script>
     <!-- Bootstrap JS (optional) -->

worker.py CHANGED

@@ -53,7 +53,7 @@ class GPUWorker:
         self.NUM_SAMPLING_STEPS = 32
         self.USE_RNN = False
-        self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k72-108k"
         # Initialize model
         self._initialize_model()
@@ -810,4 +810,4 @@ if __name__ == "__main__":
         logger.error(f"❌ Failed to start worker: {e}")
         import traceback
         logger.error(f"🔍 Full traceback: {traceback.format_exc()}")
-        raise

         self.NUM_SAMPLING_STEPS = 32
         self.USE_RNN = False
+        self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
         # Initialize model
         self._initialize_model()
         logger.error(f"❌ Failed to start worker: {e}")
         import traceback
         logger.error(f"🔍 Full traceback: {traceback.format_exc()}")
+        raise