---
title: RTMO Checkpoint Tester
emoji: π
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: RTMO PyTorch Checkpoint Tester
---
# RTMO PyTorch Checkpoint Tester

This HuggingFace Space provides a real-time 2D multi-person pose estimation demo using the RTMO model from OpenMMLab, accelerated with ZeroGPU. It supports both image and video inputs.
## Features

- **Remote Checkpoint Selection**: Choose from multiple pre-trained variants (COCO, BODY7, CrowdPose, retrainable RTMO-s) via a dropdown.
- **Custom Checkpoint Upload**: Upload your own `.pth` file; the application auto-detects RTMO-t/s/m/l variants.
- **Image Input**: Upload images for single-frame pose estimation.
- **Video Input**: Upload video files (e.g., `.mp4`, `.mov`, `.avi`, `.mkv`, `.webm`) to perform pose estimation on video sequences and view annotated outputs.
- **Threshold Adjustment**: Fine-tune **Bounding Box Threshold** and **NMS Threshold** sliders to refine detections.
- **Example Images**: Three license-free images with people are included for quick testing via the **Examples** panel.
- **ZeroGPU Acceleration**: Utilizes the `@spaces.GPU()` decorator for GPU inference on HuggingFace Spaces.
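The ZeroGPU pattern the app relies on can be sketched as follows. This is a minimal illustration, not the app's actual code; the `try`/`except` fallback is an assumption added so the sketch also runs outside HuggingFace Spaces, where the `spaces` package is unavailable.

```python
# Decorate the inference function so Spaces schedules it on a GPU worker.
try:
    import spaces
    gpu_decorator = spaces.GPU
except ImportError:
    # Assumed fallback for local runs: a no-op decorator factory.
    def gpu_decorator(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap

@gpu_decorator()
def predict(x):
    # The real app runs MMPoseInferencer here; a doubled value stands in
    # for the inference output in this sketch.
    return x * 2
```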
## Usage

1. **Upload Image**: Drag-and-drop or select an image in the **Upload Image** component (or choose from **Examples**).
2. **Upload Video**: Drag-and-drop or select a video file in the **Upload Video** component.
3. **Select Remote Checkpoint**: Pick a preloaded variant from the dropdown menu.
4. **(Optional) Upload Your Own Checkpoint**: Provide a `.pth` file to override the remote selection; the model variant is detected automatically.
5. **Adjust Thresholds**: Set **Bounding Box Threshold** (`bbox_thr`) and **NMS Threshold** (`nms_thr`) to control confidence and suppression behavior.
6. **Run Inference**: Click **Run Inference**.
7. **View Results**:
   - For images, the annotated image will appear in the **Annotated Image** panel.
   - For videos, the annotated video will appear in the **Annotated Video** panel.

The active checkpoint name will appear below the output panels.
## Remote Checkpoints

The following variants are available out of the box:
- `rtmo-s_8xb32-600e_coco` | |
- `rtmo-m_16xb16-600e_coco` | |
- `rtmo-l_16xb16-600e_coco` | |
- `rtmo-t_8xb32-600e_body7` | |
- `rtmo-s_8xb32-600e_body7` | |
- `rtmo-m_16xb16-600e_body7` | |
- `rtmo-l_16xb16-600e_body7` | |
- `rtmo-s_8xb32-700e_crowdpose` | |
- `rtmo-m_16xb16-700e_crowdpose` | |
- `rtmo-l_16xb16-700e_crowdpose` | |
- `rtmo-s_coco_retrainable` (from Hugging Face) | |
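Selected remote checkpoints are fetched on demand and cached locally as `/tmp/{key}.pth`. A minimal sketch of that caching step follows; the `checkpoint_path` helper name and the `url` argument are assumptions for illustration, not the app's actual code.

```python
import os
import urllib.request
from typing import Optional

def checkpoint_path(key: str, url: Optional[str] = None,
                    cache_dir: str = "/tmp") -> str:
    """Return the local path for a checkpoint, downloading it if missing."""
    path = os.path.join(cache_dir, f"{key}.pth")
    if url is not None and not os.path.exists(path):
        # Downloaded once; subsequent calls reuse the cached file.
        urllib.request.urlretrieve(url, path)
    return path
```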
## Implementation Details

- **GPU Decorator**: `@spaces.GPU()` marks the `predict` function for GPU execution under ZeroGPU.
- **Inference API**: Leverages `MMPoseInferencer` from MMPose with `pose2d`, `pose2d_weights`, and category `[0]` for person detection.
- **Monkey-Patch**: Applies a regex patch to bypass `mmdet`'s MMCV version assertion for compatibility.
- **Variant Detection**: Inspects the `backbone.stem.conv.conv.weight` channels in the checkpoint to select the correct RTMO variant.
- **Checkpoint Management**: Remote files are downloaded to `/tmp/{key}.pth` on demand; uploaded checkpoints use their provided local path.
- **Image & Video Support**: The `predict` function automatically handles both image and video inputs.
- **Output**: Annotated images and videos are saved to `/tmp/vis` and displayed in the UI panels.
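The variant-detection step can be sketched as below: read the stem convolution's output-channel count from the checkpoint's state dict and map it to a variant. The channel widths used here (24/32/48/64 for t/s/m/l) are illustrative assumptions, not verified values from the RTMO configs.

```python
# Assumed mapping from stem conv width to RTMO variant (illustrative only).
STEM_CHANNELS_TO_VARIANT = {24: "rtmo-t", 32: "rtmo-s", 48: "rtmo-m", 64: "rtmo-l"}

def detect_variant(state_dict) -> str:
    """Pick the RTMO variant from the stem conv's output-channel count."""
    weight = state_dict["backbone.stem.conv.conv.weight"]  # (out_ch, in_ch, kh, kw)
    out_channels = weight.shape[0]
    if out_channels not in STEM_CHANNELS_TO_VARIANT:
        raise ValueError(f"unrecognized stem width: {out_channels}")
    return STEM_CHANNELS_TO_VARIANT[out_channels]
```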
## Files

- **app.py**: Main Gradio application script.
- **requirements.txt**: Python dependencies, including MMCV and MMPose.
- **README.md**: This documentation file.