---
title: RTMO Checkpoint Tester
emoji: 👀
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: RTMO PyTorch Checkpoint Tester
---

# RTMO PyTorch Checkpoint Tester

This HuggingFace Space provides a real-time 2D multi-person pose estimation demo using the RTMO model from OpenMMLab, accelerated with ZeroGPU. It supports both image and video inputs.

## Features

- **Remote Checkpoint Selection**: Choose from multiple pre-trained variants (COCO, BODY7, CrowdPose, retrainable RTMO-s) via a dropdown.
- **Custom Checkpoint Upload**: Upload your own `.pth` file; the application auto-detects RTMO-t/s/m/l variants.
- **Image Input**: Upload images for single-frame pose estimation.
- **Video Input**: Upload video files (e.g., `.mp4`, `.mov`, `.avi`, `.mkv`, `.webm`) to perform pose estimation on video sequences and view annotated outputs.
- **Threshold Adjustment**: Fine-tune **Bounding Box Threshold** and **NMS Threshold** sliders to refine detections.
- **Example Images**: Three license-free images with people are included for quick testing via the **Examples** panel.
- **ZeroGPU Acceleration**: Utilizes the `@spaces.GPU()` decorator for GPU inference on HuggingFace Spaces.
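The Features list names the accepted video extensions. The sketch below shows one minimal way to route an upload by file suffix. It assumes the app tells images and videos apart by extension alone. The helper names `is_video` and `classify_input` are hypothetical and do not come from `app.py`.

```python
from pathlib import Path

# Video extensions taken from the Features list; treating every other
# suffix as an image is an assumption of this sketch.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def is_video(path: str) -> bool:
    """Return True if the (case-insensitive) suffix marks a video file."""
    return Path(path).suffix.lower() in VIDEO_EXTENSIONS


def classify_input(path: str) -> str:
    """Hypothetical router: 'video' for known video suffixes, else 'image'."""
    return "video" if is_video(path) else "image"
```

Lower-casing the suffix means that files such as `clip.MP4` from phone cameras are routed the same way as `clip.mp4`.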

## Usage

1. **Upload Image**: Drag-and-drop or select an image in the **Upload Image** component (or choose from **Examples**).
2. **Upload Video**: Drag-and-drop or select a video file in the **Upload Video** component.
3. **Select Remote Checkpoint**: Pick a preloaded variant from the dropdown menu.
4. **(Optional) Upload Your Own Checkpoint**: Provide a `.pth` file to override the remote selection; the model variant is detected automatically.
5. **Adjust Thresholds**: Set **Bounding Box Threshold** (`bbox_thr`) and **NMS Threshold** (`nms_thr`) to control confidence and suppression behavior.
6. **Run Inference**: Click **Run Inference**.
7. **View Results**:
   - For images, the annotated image will appear in the **Annotated Image** panel.
   - For videos, the annotated video will appear in the **Annotated Video** panel.
   - The name of the active checkpoint is shown below the results.
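Steps 5 to 7 come down to a single inference call with the two thresholds. The sketch below is minimal and makes two assumptions. First, `MMPoseInferencer` is built per request. Second, the `spaces` package may be missing when running locally, so a no-op decorator stands in for `@spaces.GPU()`. The name `predict` matches Implementation Details, but its signature is hypothetical, and so is the helper `build_call_kwargs`.

```python
# ZeroGPU decorator with a local fallback (the fallback is an assumption
# of this sketch, not necessarily what app.py does).
try:
    import spaces

    gpu = spaces.GPU()
except ImportError:  # e.g. running outside Hugging Face Spaces
    def gpu(fn):
        return fn


def build_call_kwargs(bbox_thr: float, nms_thr: float,
                      vis_dir: str = "/tmp/vis") -> dict:
    """Per-call options forwarded to MMPoseInferencer.__call__."""
    for name, value in (("bbox_thr", bbox_thr), ("nms_thr", nms_thr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return {"bbox_thr": bbox_thr, "nms_thr": nms_thr, "vis_out_dir": vis_dir}


@gpu
def predict(input_path: str, pose2d: str, checkpoint: str,
            bbox_thr: float, nms_thr: float, vis_dir: str = "/tmp/vis") -> str:
    """Run RTMO on an image or video and return the visualization directory."""
    # Lazy import so the helper above works without MMPose installed.
    from mmpose.apis import MMPoseInferencer

    inferencer = MMPoseInferencer(
        pose2d=pose2d,              # RTMO config name or path
        pose2d_weights=checkpoint,  # local .pth file
        det_cat_ids=[0],            # detection category 0 = person
    )
    # The call returns a generator; exhausting it processes every frame
    # and writes the annotated output to vis_dir.
    kwargs = build_call_kwargs(bbox_thr, nms_thr, vis_dir)
    for _ in inferencer(input_path, **kwargs):
        pass
    return vis_dir
```

Building the inferencer inside `predict` keeps the sketch short. Caching one inferencer per checkpoint would avoid reloading the weights on every click.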

## Remote Checkpoints

The following variants are available out of the box:

- `rtmo-s_8xb32-600e_coco`
- `rtmo-m_16xb16-600e_coco`
- `rtmo-l_16xb16-600e_coco`
- `rtmo-t_8xb32-600e_body7`
- `rtmo-s_8xb32-600e_body7`
- `rtmo-m_16xb16-600e_body7`
- `rtmo-l_16xb16-600e_body7`
- `rtmo-s_8xb32-700e_crowdpose`
- `rtmo-m_16xb16-700e_crowdpose`
- `rtmo-l_16xb16-700e_crowdpose`
- `rtmo-s_coco_retrainable` (from Hugging Face)
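Each key above corresponds to a local file `/tmp/{key}.pth`, which is downloaded on demand (see Implementation Details). A user upload takes precedence over the remote selection (Usage step 4). The sketch below shows minimal caching logic under those rules. This README does not list the download URLs, so `url` is supplied by the caller. The name `resolve_checkpoint` is hypothetical.

```python
import os
import urllib.request
from typing import Optional


def resolve_checkpoint(key: str, url: str, uploaded: Optional[str] = None,
                       cache_dir: str = "/tmp") -> str:
    """Return a local .pth path, preferring a user upload.

    Remote checkpoints are cached as {cache_dir}/{key}.pth and fetched
    only when that file does not exist yet.
    """
    if uploaded:
        return uploaded
    path = os.path.join(cache_dir, f"{key}.pth")
    if not os.path.exists(path):
        tmp = path + ".part"
        urllib.request.urlretrieve(url, tmp)
        # Rename only after a complete download so a failed transfer
        # never leaves a truncated file at the cached path.
        os.replace(tmp, path)
    return path
```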

## Implementation Details

- **GPU Decorator**: `@spaces.GPU()` marks the `predict` function for GPU execution under ZeroGPU.
- **Inference API**: Leverages `MMPoseInferencer` from MMPose with `pose2d`, `pose2d_weights`, and category `[0]` for person detection.
- **Monkey-Patch**: Applies a regex patch to bypass `mmdet`’s MMCV version assertion for compatibility.
- **Variant Detection**: Reads the number of output channels of `backbone.stem.conv.conv.weight` in the checkpoint to select the correct RTMO variant.
- **Checkpoint Management**: Remote files are downloaded to `/tmp/{key}.pth` on demand; uploads use the provided local path.
- **Image & Video Support**: The `predict` function handles both image and video inputs, saves the annotated image or video to `/tmp/vis`, and displays it in the corresponding UI panel.
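Variant detection works because each RTMO size has a different stem width. The sketch below infers the size from that width. It assumes the stem follows YOLOX-style CSPDarknet scaling, where the output channels are `64 × widen_factor`. With widen factors of 0.375, 0.5, 0.75 and 1.0, that gives 24, 32, 48 and 64 channels for RTMO-t/s/m/l. Treat this mapping as an assumption to verify against real checkpoints. The name `detect_variant` is hypothetical.

```python
from typing import Any, Mapping

STEM_KEY = "backbone.stem.conv.conv.weight"

# Assumed mapping: stem output channels = int(64 * widen_factor).
CHANNELS_TO_VARIANT = {24: "rtmo-t", 32: "rtmo-s", 48: "rtmo-m", 64: "rtmo-l"}


def detect_variant(checkpoint: Mapping[str, Any]) -> str:
    """Infer the RTMO size from a loaded checkpoint dict.

    Accepts a bare state dict or one wrapped under "state_dict". Only
    `.shape` is read, so any tensor-like value works.
    """
    state = checkpoint.get("state_dict", checkpoint)
    if STEM_KEY not in state:
        raise KeyError(f"{STEM_KEY} not found; is this an RTMO checkpoint?")
    out_channels = state[STEM_KEY].shape[0]  # weight is (out, in, kH, kW)
    try:
        return CHANNELS_TO_VARIANT[out_channels]
    except KeyError:
        raise ValueError(f"unexpected stem width: {out_channels}") from None
```

In the app, the dict passed to this function would come from something like `torch.load(path, map_location="cpu")`.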
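The MMCV version bypass can be done in more than one way. The sketch below shows a file-level version that raises the upper bound before `mmdet` is first imported. It assumes `mmdet/__init__.py` declares a module-level `mmcv_maximum_version = '…'` string, which the version check then asserts against. Check this against the installed release. The app may instead patch in memory, and the names `relax_mmcv_ceiling` and `patch_installed_mmdet` are hypothetical.

```python
import importlib.util
import re
from pathlib import Path

# Assumed form of the guard in mmdet/__init__.py, e.g.:
#     mmcv_maximum_version = '2.2.0'
MAX_VERSION_RE = re.compile(r"""(mmcv_maximum_version\s*=\s*)(['"])[^'"]*\2""")


def relax_mmcv_ceiling(source: str, new_max: str) -> str:
    """Return `source` with the MMCV upper bound replaced by `new_max`.

    Raises instead of silently doing nothing when the pattern is absent,
    so an mmdet release with a different guard gets noticed.
    """
    patched, count = MAX_VERSION_RE.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{new_max}{m.group(2)}", source
    )
    if count == 0:
        raise RuntimeError("mmcv_maximum_version assignment not found")
    return patched


def patch_installed_mmdet(new_max: str) -> Path:
    """Rewrite the installed mmdet/__init__.py before mmdet is imported."""
    # find_spec locates a top-level package without executing it, so the
    # version assertion is not triggered here.
    spec = importlib.util.find_spec("mmdet")
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError("mmdet is not installed")
    init_file = Path(spec.origin)
    init_file.write_text(relax_mmcv_ceiling(init_file.read_text(), new_max))
    return init_file
```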

## Files

- **app.py**: Main Gradio application script.
- **requirements.txt**: Python dependencies, including MMCV and MMPose.
- **README.md**: This documentation file.