---
title: TalkSHOW Speech-to-Motion Translation
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# Team 14 - TalkSHOW: Generating Holistic 3D Human Motion from Speech

**Contributors:** Abinaya Odeti, Shipra, Shravani, Vishal
## About

This repository hosts the implementation of "TalkSHOW: A Speech-to-Motion Translation System", which maps raw audio input to full-body 3D motion using the SMPL-X model. It generates synchronized, expressive human motion (face, hands, and body) from speech input, supporting real-time animation, virtual avatars, and digital storytelling.
## Highlights

- Translates raw `.wav` audio into natural whole-body motion (jaw, pose, expressions, hands) using deep learning.
- Built on the SMPL-X model for realistic 3D human mesh generation.
- Modular pipeline with support for face-body composition.
- Visualization with OpenGL and FFmpeg for the final rendered video.
- End-to-end customizable configuration covering audio models, latent generation, and rendering.
## Prerequisites

- Python 3.7+
- Anaconda for environment management

Install the required packages:

```bash
pip install -r requirements.txt
```

Install FFmpeg: extract the FFmpeg ZIP and add its `bin` folder to the system PATH.
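As a quick sanity check before going further, a short script such as the one below can confirm that FFmpeg is actually visible from your environment (the `check_env.py` name is just an example; it is not part of the repository):

```python
# check_env.py - illustrative environment check, not shipped with TalkSHOW
import shutil
import subprocess
import sys

# TalkSHOW targets Python 3.7+, so warn on anything older.
if sys.version_info < (3, 7):
    print(f"Warning: Python {sys.version.split()[0]} detected; 3.7+ is expected.")

# FFmpeg must be reachable on PATH for the final video rendering step.
ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    print("FFmpeg not found on PATH - add its bin folder to the system PATH.")
else:
    version = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
    print("FFmpeg found:", version.stdout.splitlines()[0])
```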
## Getting Started

The visualization code was tested on Windows 10 and requires:

- Python 3.7
- conda3 or miniconda3
- a CUDA-capable GPU (one is enough)
### 1. Setup

Clone the repo:

```bash
git clone https://github.com/YOUR_USERNAME/TALKSHOW-speech-to-motion-translation-system.git
cd TalkSHOW
```

Create the conda environment and install dependencies:

```bash
conda create -n talkshow python=3.7 -y
conda activate talkshow
pip install -r requirements.txt
```
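Since a CUDA-capable GPU is required, it is worth confirming that PyTorch can see it before downloading any models. The snippet below assumes `torch` is installed by `requirements.txt` (which the TalkSHOW pipeline depends on):

```python
# gpu_check.py - illustrative helper to verify CUDA visibility from PyTorch
import torch

if torch.cuda.is_available():
    # One GPU is enough for inference; report the first device.
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device detected - rendering/inference may fail or be very slow.")
```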
### 2. Download models

Download or place the required checkpoints:

- Download the pretrained models, unzip them, and place them in the TalkSHOW folder, i.e. `path-to-TalkSHOW/experiments`.
- Download the SMPL-X model (please register on the official SMPL-X webpage before using it) and place it in `path-to-TalkSHOW/visualise/smplx_model`.

When visualising the test set and the generated results (in each video, left: generated result; right: ground truth), the videos and generated motion data are saved in `./visualise/video/body-pixel`.

Make sure the following files are in place:

- SMPL-X model weights: `visualise/smplx_model/SMPLX_NEUTRAL_2020.npz`
- Extra joints, regressors, and YAML configs: inside `visualise/smplx_model/`
- `vq_path` in `body_pixel.json` must point to a valid `.pth` checkpoint (under `./experiments/.../ckpt-*.pth`)
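Before moving on, a small helper like the sketch below can confirm that the files listed above are in place. It is not part of the repository, and it assumes `vq_path` is a top-level key in `config/body_pixel.json`; adjust the lookup if the config nests it differently:

```python
# verify_assets.py - illustrative pre-flight check for required model files
import json
from pathlib import Path

required = [
    Path("visualise/smplx_model/SMPLX_NEUTRAL_2020.npz"),  # SMPL-X weights
    Path("config/body_pixel.json"),                        # inference config
]

for path in required:
    print(f"{'ok' if path.exists() else 'MISSING':7s} {path}")

# Check that vq_path points to an existing .pth checkpoint.
config_path = Path("config/body_pixel.json")
if config_path.exists():
    config = json.loads(config_path.read_text())
    vq_path = config.get("vq_path")  # assumption: top-level key
    if vq_path and Path(vq_path).exists():
        print(f"ok      vq_path -> {vq_path}")
    else:
        print(f"MISSING vq_path -> {vq_path} (expected something like ./experiments/.../ckpt-*.pth)")
```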
### 3. Running Inference

To generate a 3D animated video from an audio file:

```bash
python scripts/demo.py \
    --config_file ./config/body_pixel.json \
    --infer \
    --audio_file ./demo_audio/1st-page.wav \
    --id 0 \
    --whole_body
```
To change the input, replace the `--audio_file` value with the path to your own `.wav` file.
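If you prefer to drive the demo from Python instead of the shell (for example, to batch several audio files), the documented command can be wrapped with `subprocess`. This loop is only a sketch that reuses the exact flags shown above:

```python
# run_demo.py - illustrative batch runner around scripts/demo.py
import subprocess
from pathlib import Path

audio_dir = Path("./demo_audio")  # put your .wav files here

for wav in sorted(audio_dir.glob("*.wav")):
    # Same flags as the single-file command documented above.
    cmd = [
        "python", "scripts/demo.py",
        "--config_file", "./config/body_pixel.json",
        "--infer",
        "--audio_file", str(wav),
        "--id", "0",
        "--whole_body",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```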
### 4. Output

The final 3D animated video will be saved under:
`visualise/video/body-pixel2/<audio_file_name>/1st-page.mp4`
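To find the rendered videos programmatically, a glob over the documented output folder is enough; the snippet below only assumes the layout quoted above:

```python
# find_output.py - illustrative lookup of rendered videos
from pathlib import Path

output_root = Path("visualise/video/body-pixel2")

# Each audio file gets its own sub-folder containing the rendered .mp4.
for video in sorted(output_root.glob("*/*.mp4")):
    size_mb = video.stat().st_size / 1e6
    print(f"{video}  ({size_mb:.1f} MB)")
```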
### Exact command used to run the project

```bash
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/1st-page.wav --id 0 --whole_body
```
## Contact

For issues or questions, open an issue on the repository or contact the contributors directly.