---
title: TalkSHOW Speech-to-Motion Translation
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# Team 14 - TalkSHOW: Generating Holistic 3D Human Motion from Speech

**Contributors:** Abinaya Odeti, Shipra, Shravani, Vishal
## About

This repository hosts the implementation of "TalkSHOW: A Speech-to-Motion Translation System", which maps raw audio input to full-body 3D motion using the SMPL-X model. It generates synchronized, expressive human motion (face, hands, and body) from speech input, supporting real-time animation, virtual avatars, and digital storytelling.
## Highlights

- Translates raw `.wav` audio into natural whole-body motion (jaw, pose, expressions, hands) using deep learning.
- Built on the SMPL-X model for realistic 3D human mesh generation.
- Modular pipeline with support for face-body composition.
- Visualization with OpenGL and FFmpeg for the final rendered video.
- End-to-end customizable configuration covering audio models, latent generation, and rendering.
## Prerequisites

- Python 3.7+
- Anaconda for environment management

Install the required packages:

```bash
pip install -r requirements.txt
```

Install FFmpeg: extract the FFmpeg ZIP and add its `bin` folder to the system PATH.
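As a quick sanity check before going further, a short script such as the one below can confirm that FFmpeg is actually visible from your environment (the `check_env.py` name is just an example; it is not part of the repository):

```python
# check_env.py - illustrative environment check, not shipped with TalkSHOW
import shutil
import subprocess
import sys

# TalkSHOW targets Python 3.7+, so warn on anything older.
if sys.version_info < (3, 7):
    print(f"Warning: Python {sys.version.split()[0]} detected; 3.7+ is expected.")

# FFmpeg must be reachable on PATH for the final video rendering step.
ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    print("FFmpeg not found on PATH - add its bin folder to the system PATH.")
else:
    version = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
    print("FFmpeg found:", version.stdout.splitlines()[0])
```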
## Getting Started

The visualization code was tested on Windows 10 and requires:

- Python 3.7
- conda3 or miniconda3
- a CUDA-capable GPU (one is enough)
### 1. Setup

Clone the repo:

```bash
git clone https://github.com/YOUR_USERNAME/TALKSHOW-speech-to-motion-translation-system.git
cd TalkSHOW
```

Create the conda environment and install dependencies:

```bash
conda create -n talkshow python=3.7 -y
conda activate talkshow
pip install -r requirements.txt
```
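Since a CUDA-capable GPU is required, it is worth confirming that PyTorch can see it before downloading any models. The snippet below assumes `torch` is installed by `requirements.txt` (which the TalkSHOW pipeline depends on):

```python
# gpu_check.py - illustrative helper to verify CUDA visibility from PyTorch
import torch

if torch.cuda.is_available():
    # One GPU is enough for inference; report the first device.
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device detected - rendering/inference may fail or be very slow.")
```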
### 2. Download models

Download or place the required checkpoints:

- Download the pretrained models, unzip them, and place them in the TalkSHOW folder, i.e. `path-to-TalkSHOW/experiments`.
- Download the SMPL-X model (please register on the official SMPL-X webpage before using it) and place it in `path-to-TalkSHOW/visualise/smplx_model`.

When visualising the test set and the generated results (in each video, left: generated result; right: ground truth), the videos and generated motion data are saved in `./visualise/video/body-pixel`.

Make sure the following files are in place:

- SMPL-X model weights: `visualise/smplx_model/SMPLX_NEUTRAL_2020.npz`
- Extra joints, regressors, and YAML configs: inside `visualise/smplx_model/`
- `vq_path` in `body_pixel.json` must point to a valid `.pth` checkpoint (under `./experiments/.../ckpt-*.pth`)
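Before moving on, a small helper like the sketch below can confirm that the files listed above are in place. It is not part of the repository, and it assumes `vq_path` is a top-level key in `config/body_pixel.json`; adjust the lookup if the config nests it differently:

```python
# verify_assets.py - illustrative pre-flight check for required model files
import json
from pathlib import Path

required = [
    Path("visualise/smplx_model/SMPLX_NEUTRAL_2020.npz"),  # SMPL-X weights
    Path("config/body_pixel.json"),                        # inference config
]

for path in required:
    print(f"{'ok' if path.exists() else 'MISSING':7s} {path}")

# Check that vq_path points to an existing .pth checkpoint.
config_path = Path("config/body_pixel.json")
if config_path.exists():
    config = json.loads(config_path.read_text())
    vq_path = config.get("vq_path")  # assumption: top-level key
    if vq_path and Path(vq_path).exists():
        print(f"ok      vq_path -> {vq_path}")
    else:
        print(f"MISSING vq_path -> {vq_path} (expected something like ./experiments/.../ckpt-*.pth)")
```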
### 3. Running Inference

To generate a 3D animated video from an audio file:

```bash
python scripts/demo.py \
    --config_file ./config/body_pixel.json \
    --infer \
    --audio_file ./demo_audio/1st-page.wav \
    --id 0 \
    --whole_body
```
To change the input, replace the `--audio_file` value with the path to your own `.wav` file.
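If you prefer to drive the demo from Python instead of the shell (for example, to batch several audio files), the documented command can be wrapped with `subprocess`. This loop is only a sketch that reuses the exact flags shown above:

```python
# run_demo.py - illustrative batch runner around scripts/demo.py
import subprocess
from pathlib import Path

audio_dir = Path("./demo_audio")  # put your .wav files here

for wav in sorted(audio_dir.glob("*.wav")):
    # Same flags as the single-file command documented above.
    cmd = [
        "python", "scripts/demo.py",
        "--config_file", "./config/body_pixel.json",
        "--infer",
        "--audio_file", str(wav),
        "--id", "0",
        "--whole_body",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```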
### 4. Output

The final 3D animated video will be saved under:
`visualise/video/body-pixel2/<audio_file_name>/1st-page.mp4`
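To find the rendered videos programmatically, a glob over the documented output folder is enough; the snippet below only assumes the layout quoted above:

```python
# find_output.py - illustrative lookup of rendered videos
from pathlib import Path

output_root = Path("visualise/video/body-pixel2")

# Each audio file gets its own sub-folder containing the rendered .mp4.
for video in sorted(output_root.glob("*/*.mp4")):
    size_mb = video.stat().st_size / 1e6
    print(f"{video}  ({size_mb:.1f} MB)")
```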
### Exact command used to run the project

```bash
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/1st-page.wav --id 0 --whole_body
```
## Contact

For issues or questions, open an issue on the repository or contact the contributors directly.