---
title: RobotHub Inference Server
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8001
suggested_hardware: t4-small
suggested_storage: medium
short_description: Real-time ACT model inference server for robot control
tags:
  - robotics
pinned: false
fullWidth: true
---

# 🤖 RobotHub Inference Server

**AI-Powered Robot Control Engine for Real-time Robotics**

The RobotHub Inference Server is the **AI brain** of the RobotHub ecosystem. It is a FastAPI server that processes real-time camera feeds and robot state data to generate control commands using policies such as ACT, Pi0, SmolVLA, and Diffusion Policy.

## 🏗️ How It Works in the RobotHub Ecosystem

The RobotHub Inference Server is part of a complete robotics control pipeline:

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    RobotHub     │    │    RobotHub     │    │    RobotHub     │    │    Physical     │
│    Frontend     │───▶│ TransportServer │───▶│ InferenceServer │───▶│     Robot       │
│                 │    │                 │    │                 │    │                 │
│ • Web Interface │    │ • Video Streams │    │ • AI Models     │    │ • USB/Network   │
│ • Robot Config  │    │ • Joint States  │    │ • Real-time     │    │ • Joint Control │
│ • Monitoring    │    │ • WebRTC/WS     │    │ • Inference     │    │ • Cameras       │
└─────────────────┘    └─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       ▲                       │                       │
         │                       │                       │                       │
         └───────────────────────┼───────────────────────┼───────────────────────┘
                                 │                       │
                         Status & Control         Action Commands
```

### 🔄 Data Flow

1. **Input Sources** → **TransportServer**:
   - **Camera Feeds**: Real-time video from robot cameras (front, wrist, overhead, etc.)
   - **Joint States**: Current robot joint positions and velocities
   - **Robot Configuration**: Joint limits, kinematics, calibration data

2. **TransportServer** → **Inference Server**:
   - Streams normalized camera images (RGB, 224x224 or custom resolution)
   - Sends normalized joint positions (most joints -100 to +100, gripper 0 to +100)
   - Maintains real-time communication via WebSocket/WebRTC

3. **Inference Server** → **AI Processing**:
   - **Vision Processing**: Multi-camera image preprocessing and encoding
   - **State Encoding**: Joint position normalization and history buffering
   - **Policy Inference**: Transformer model processes visual + proprioceptive data
   - **Action Generation**: Outputs a sequence of robot joint commands

4. **Output** → **Robot Execution**:
   - **Action Chunks**: Sequences of joint commands (ACT outputs 10-100 actions per inference)
   - **Real-time Control**: 20 Hz control loop fed by a 2 Hz inference loop (see the sketch after this list)
   - **Safety Monitoring**: Emergency stop, joint limit checking
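
The chunk-and-queue pattern above can be pictured as two loops running at different rates. The sketch below is illustrative only, not the server's internal implementation; `policy.predict_action_chunk`, `observations.latest`, and `robot.send_joint_command` are hypothetical placeholders standing in for the real model and transport interfaces.

```python
import asyncio
from collections import deque

CONTROL_HZ = 20   # rate at which queued joint commands are sent to the robot
INFERENCE_HZ = 2  # rate at which the policy produces a new action chunk

action_queue: deque = deque()

async def inference_loop(policy, observations):
    """Periodically run the policy on the latest observation and refill the action queue."""
    while True:
        obs = await observations.latest()           # camera images + joint state (placeholder)
        chunk = policy.predict_action_chunk(obs)    # e.g. 10-100 future joint commands (placeholder)
        action_queue.extend(chunk)
        await asyncio.sleep(1 / INFERENCE_HZ)

async def control_loop(robot):
    """Pop the next queued action at the control rate and send it to the robot."""
    while True:
        if action_queue:
            robot.send_joint_command(action_queue.popleft())  # placeholder robot interface
        await asyncio.sleep(1 / CONTROL_HZ)
```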

## 🚀 Quick Start

The server is primarily a **FastAPI REST API**, but includes an optional **Gradio web interface** for easy debugging and testing without needing to write code or use curl commands.

### Option 1: Server + UI (Recommended for Testing)

```bash
# Clone and setup
git clone https://github.com/julien-blanchon/RobotHub-InferenceServer
cd RobotHub-InferenceServer
uv sync

# Launch with integrated UI (FastAPI + Gradio on the same port)
python launch_simple.py
```

**Access Points:**

- 🎨 **Web Interface**: http://localhost:7860/ (create sessions, monitor performance)
- 📖 **API Documentation**: http://localhost:7860/api/docs (REST API reference)
- 🔍 **Health Check**: http://localhost:7860/api/health (system status)

### Option 2: Server Only (Production)

```bash
# Launch the FastAPI server only (no UI)
python -m inference_server.cli --server-only

# Or with custom configuration
python -m inference_server.cli --server-only --host localhost --port 8080
```

**Access (default port):**

- 📖 **API Only**: http://localhost:7860/api/docs
- 🔍 **Health Check**: http://localhost:7860/api/health

### Option 3: Docker

```bash
# Build and run
docker build -t robothub-inference-server .
docker run -p 7860:7860 \
  -v /path/to/your/models:/app/checkpoints \
  robothub-inference-server
```

## 🛠️ Setting Up Your Robot

### 1. **Connect Your Hardware**

You need the RobotHub TransportServer running first:

```bash
# Start the RobotHub TransportServer (dependency)
cd ../RobotHub-TransportServer
docker run -p 8000:8000 robothub-transport-server
```

### 2. **Create an Inference Session**

**Via Web Interface (Gradio UI):**

1. Open http://localhost:7860/
2. Enter your **model path** (e.g., `./checkpoints/act_pick_place_model`)
3. Configure **camera names** (e.g., `front,wrist,overhead`)
4. Set the **TransportServer URL** (default: `http://localhost:8000`)
5. Click **"Create & Start AI Control"**

**Via REST API:**

```python
import asyncio

import httpx

session_config = {
    "session_id": "robot_assembly_task",
    "policy_path": "./checkpoints/act_assembly_model",
    "policy_type": "act",  # or "pi0", "smolvla", "diffusion"
    "camera_names": ["front_cam", "wrist_cam"],
    "transport_server_url": "http://localhost:8000",
    "language_instruction": "Pick up the red block and place it on the blue platform",  # for language-conditioned policies such as SmolVLA
}

async def main():
    async with httpx.AsyncClient() as client:
        # Create the session
        await client.post("http://localhost:7860/api/sessions", json=session_config)
        # Start inference
        await client.post(f"http://localhost:7860/api/sessions/{session_config['session_id']}/start")

asyncio.run(main())
```

### 3. **Connect Robot & Cameras**

The robot and cameras connect to the **TransportServer**, not directly to the Inference Server:

```python
# Example: connect the robot to the TransportServer.
# workspace_id and the room IDs come from the session created above
# (see the create-session response used in the Integration Example below).
from transport_server_client import RoboticsConsumer, RoboticsProducer
from transport_server_client.video import VideoProducer

# Robot receives AI commands and executes them
joint_consumer = RoboticsConsumer('http://localhost:8000')
await joint_consumer.connect(workspace_id, joint_input_room_id)

def execute_joint_commands(commands):
    """Execute commands on your actual robot hardware."""
    for cmd in commands:
        joint_name = cmd['name']
        position = cmd['value']  # normalized: most joints -100 to +100, gripper 0 to +100
        robot.move_joint(joint_name, position)

joint_consumer.on_joint_update(execute_joint_commands)

# Robot sends its current state back
joint_producer = RoboticsProducer('http://localhost:8000')
await joint_producer.connect(workspace_id, joint_output_room_id)

# Send the current robot state periodically
await joint_producer.send_state_sync({
    'shoulder_pan_joint': current_joint_positions[0],
    'shoulder_lift_joint': current_joint_positions[1],
    # ... etc
})

# Cameras stream to the TransportServer
for camera_name, camera_device in cameras.items():
    video_producer = VideoProducer('http://localhost:8000')
    await video_producer.connect(workspace_id, camera_room_ids[camera_name])
    await video_producer.start_camera(camera_device)
```

## 🎮 Supported AI Models

### **ACT (Action Chunking Transformer)**

- **Best for**: Complex manipulation tasks requiring temporal coherence
- **Output**: Chunks of 10-100 future actions per inference
- **Use case**: Pick-and-place, assembly, cooking tasks

### **Pi0 (Vision-Language Policy)**

- **Best for**: Tasks requiring language understanding
- **Output**: Single actions with language conditioning
- **Use case**: "Pick up the red mug", "Open the top drawer"

### **SmolVLA (Small Vision-Language-Action)**

- **Best for**: Lightweight vision-language tasks
- **Use case**: Simple manipulation with natural language

### **Diffusion Policy**

- **Best for**: High-precision continuous control
- **Use case**: Precise assembly, drawing, writing
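
The model type is selected per session via the `policy_type` field shown in the REST example above. As a rough sketch of how configs differ between policy families (the checkpoint paths below are placeholders, and the exact set of accepted fields depends on the server version):

```python
# ACT: no language input needed; action chunks come from vision + joint state
act_config = {
    "session_id": "act_demo",
    "policy_path": "./checkpoints/act_pick_place_model",  # your trained ACT checkpoint
    "policy_type": "act",
    "camera_names": ["front", "wrist"],
}

# Pi0 / SmolVLA: language-conditioned, so pass a natural-language instruction
smolvla_config = {
    "session_id": "smolvla_demo",
    "policy_path": "./checkpoints/smolvla_task_model",    # placeholder path
    "policy_type": "smolvla",                             # or "pi0"
    "camera_names": ["front"],
    "language_instruction": "Open the top drawer",
}
```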

## 📊 Monitoring & Debugging

### Using the Web Interface

The Gradio UI provides real-time monitoring:

- **Active Sessions**: View all running inference sessions
- **Performance Metrics**: Inference rate, control rate, camera FPS
- **Action Queue**: Current action buffer status
- **Error Logs**: Real-time error tracking

### Using the REST API

```bash
# Check active sessions
curl http://localhost:7860/api/sessions

# Get detailed session info
curl http://localhost:7860/api/sessions/my_robot_session

# Stop a session
curl -X POST http://localhost:7860/api/sessions/my_robot_session/stop

# Emergency stop all sessions
curl -X POST http://localhost:7860/api/debug/emergency_stop
```

## 🔧 Configuration

### Multi-Camera Setup

```python
# Configure multiple camera angles
session_config = {
    "camera_names": ["front_cam", "wrist_cam", "overhead_cam", "side_cam"],
    # Each camera gets its own TransportServer room
}
```

### Custom Joint Mappings

The server handles various robot joint naming conventions automatically:

- **LeRobot names**: `shoulder_pan_joint`, `shoulder_lift_joint`, `elbow_joint`, etc.
- **Custom names**: `base_rotation`, `shoulder_tilt`, `elbow_bend`, etc.
- **Alternative names**: `joint_1`, `joint_2`, `base_joint`, etc.

See `src/inference_server/models/joint_config.py` for full mapping details.
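
To illustrate the idea only (the authoritative mapping lives in `src/inference_server/models/joint_config.py`; the entries below are made up for this sketch and are not the actual table):

```python
# Hypothetical mapping from robot-specific joint names to canonical LeRobot names
CUSTOM_TO_LEROBOT = {
    "base_rotation": "shoulder_pan_joint",
    "shoulder_tilt": "shoulder_lift_joint",
    "elbow_bend": "elbow_joint",
    "joint_1": "shoulder_pan_joint",
    "joint_2": "shoulder_lift_joint",
}

def canonical_joint_name(name: str) -> str:
    """Return the canonical LeRobot joint name if a mapping exists, else the name unchanged."""
    return CUSTOM_TO_LEROBOT.get(name, name)
```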

## 🔌 Integration Examples

### **Standalone Python Application**

```python
import asyncio
import time

import httpx
from transport_server_client import RoboticsProducer, RoboticsConsumer
from transport_server_client.video import VideoProducer

class RobotAIController:
    def __init__(self):
        self.inference_client = httpx.AsyncClient(base_url="http://localhost:7860/api")
        self.transport_url = "http://localhost:8000"

    async def start_ai_control(self, task_description: str):
        # 1. Create an inference session
        session_config = {
            "session_id": f"task_{int(time.time())}",
            "policy_path": "./checkpoints/general_manipulation_act",
            "policy_type": "act",
            "camera_names": ["front", "wrist"],
            "language_instruction": task_description,
        }
        response = await self.inference_client.post("/sessions", json=session_config)
        session_data = response.json()

        # 2. Connect the robot to the same workspace/rooms
        #    (implement this for your hardware, as in "Setting Up Your Robot" above)
        await self.connect_robot_hardware(session_data)

        # 3. Start AI inference
        await self.inference_client.post(f"/sessions/{session_config['session_id']}/start")
        print(f"🤖 AI control started for task: {task_description}")

# Usage
controller = RobotAIController()
await controller.start_ai_control("Pick up the blue cup and place it on the shelf")
```

## 🚨 Safety & Best Practices

- **Emergency Stop**: Built-in emergency stop via the API: `/sessions/{id}/stop`
- **Joint Limits**: All joint values are normalized (most joints -100 to +100, gripper 0 to +100)
- **Hardware Limits**: Your robot driver should enforce the actual hardware joint limits (see the sketch after this list)
- **Session Timeouts**: Automatic cleanup prevents runaway processes
- **Error Handling**: Graceful degradation when cameras disconnect
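
A minimal sketch of driver-side limit handling, assuming hypothetical per-joint hardware ranges in degrees (the joint names and ranges below are placeholders; the server only guarantees normalized values, your driver owns the real limits):

```python
# Placeholder hardware ranges in degrees; replace with your robot's actual limits
HARDWARE_LIMITS_DEG = {
    "shoulder_pan_joint": (-110.0, 110.0),
    "gripper": (0.0, 45.0),
}

def normalized_to_hardware(joint: str, value: float) -> float:
    """Clamp a normalized command and map it into the joint's hardware range (degrees)."""
    low, high = HARDWARE_LIMITS_DEG[joint]
    if joint == "gripper":
        fraction = max(0.0, min(100.0, value)) / 100.0               # gripper commands are 0..100
    else:
        fraction = (max(-100.0, min(100.0, value)) + 100.0) / 200.0  # other joints are -100..100
    return low + fraction * (high - low)
```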

## 🚀 Deployment

### **Local Development**

```bash
# All services on one machine
python launch_simple.py  # Inference Server with UI
```

### **Production Setup**

```bash
# Server only (no UI)
python -m inference_server.cli --server-only --host localhost --port 7860

# Or with Docker
docker run -p 7860:7860 robothub-inference-server
```