---
title: TalkSHOW Speech-to-Motion Translation
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# Team 14 - TalkSHOW: Generating Holistic 3D Human Motion from Speech
Contributors: Abinaya Odeti, Shipra, Shravani, Vishal

## About
This repository hosts the implementation of "TalkSHOW: A Speech-to-Motion Translation System", which maps raw audio input to full-body 3D motion using the SMPL-X model. It generates synchronized, expressive whole-body motion (face, hands, and body) from speech, supporting real-time animation, virtual avatars, and digital storytelling.
## Highlights
* Translates raw `.wav` audio into natural whole-body motion (jaw, pose, expressions, hands) using deep learning.
* Built on the SMPL-X model for realistic 3D human mesh generation.
* Modular pipeline with support for face-body composition.
* Visualization with OpenGL and FFmpeg for the final rendered video.
* End-to-end customizable configuration covering audio models, latent generation, and rendering.
## Prerequisites
* Python 3.7+
* Anaconda for environment management

Install the required packages:
```bash
pip install -r requirements.txt
```
Install FFmpeg: extract the FFmpeg ZIP and add its `bin` folder to the system PATH.
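To confirm that FFmpeg is actually reachable from the PATH before rendering, a quick check such as the following can be used (a minimal sketch; it assumes only that the `ffmpeg` executable has been installed as described above):
```python
# Quick sanity check that FFmpeg is reachable on the system PATH.
import shutil
import subprocess

ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    raise SystemExit("ffmpeg not found on PATH - add the FFmpeg bin folder to PATH first.")

# Print the first line of the FFmpeg version banner.
banner = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
print(banner.stdout.splitlines()[0])
```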
## Getting started
The visualization code was tested on `Windows 10`, and it requires:
* Python 3.7
* Anaconda3 or Miniconda3
* A CUDA-capable GPU (one is enough)
### 1. Setup
Clone the repo:
```bash
git clone https://github.com/YOUR_USERNAME/TALKSHOW-speech-to-motion-translation-system.git TalkSHOW
cd TalkSHOW
```
Create conda environment:
```bash
conda create -n talkshow python=3.7 -y
conda activate talkshow
pip install -r requirements.txt
```
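After activating the environment, a quick check such as the one below confirms the Python version and whether a CUDA-capable GPU is visible (a minimal sketch, assuming PyTorch is installed via `requirements.txt`):
```python
# Verify the conda environment: Python version and CUDA-capable GPU visibility.
import sys

print("Python:", sys.version.split()[0])  # the project targets Python 3.7

try:
    import torch  # assumed to be installed via requirements.txt
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
except ImportError:
    print("PyTorch not installed - re-check the requirements.txt installation.")
```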
### 2. Download models
Download and place the required checkpoints:

* Download the [**pretrained models**](https://drive.google.com/file/d/1bC0ZTza8HOhLB46WOJ05sBywFvcotDZG/view?usp=sharing), unzip the archive, and place it in the TalkSHOW folder, i.e. ``path-to-TalkSHOW/experiments``.
* Download the [**smplx model**](https://drive.google.com/file/d/1Ly_hQNLQcZ89KG0Nj4jYZwccQiimSUVn/view?usp=share_link) (please register on the official [**SMPLX webpage**](https://smpl-x.is.tue.mpg.de) before using it) and place it in ``path-to-TalkSHOW/visualise/smplx_model``.

When visualising the test set and the generated results, each video shows the generated result on the left and the ground truth on the right; the videos and generated motion data are saved in ``./visualise/video/body-pixel``.

Expected file layout:

* SMPL-X model weights: ``visualise/smplx_model/SMPLX_NEUTRAL_2020.npz``
* Extra joints, regressors, and YAML configs: inside ``visualise/smplx_model/``
* Ensure ``vq_path`` in ``body_pixel.json`` points to a valid ``.pth`` checkpoint (in ``./experiments/.../ckpt-*.pth``), as checked in the sketch below.
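Before running inference, it can help to verify that the downloaded files sit where the pipeline expects them and that `vq_path` resolves to an existing checkpoint. The snippet below is a minimal sketch based only on the paths listed above; the exact key layout of `body_pixel.json` is an assumption, so adjust the lookup if needed.
```python
# Sanity-check the file layout described above before running inference.
# Expected paths come from this README; the body_pixel.json key layout is an assumption.
import json
from pathlib import Path

root = Path(".")  # run this from the TalkSHOW folder

expected = [
    root / "experiments",
    root / "config/body_pixel.json",
    root / "visualise/smplx_model/SMPLX_NEUTRAL_2020.npz",
]
for path in expected:
    print(f"{'OK     ' if path.exists() else 'MISSING'} {path}")


def find_key(obj, key):
    """Recursively search nested dicts/lists for the first occurrence of `key`."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        for value in obj.values():
            found = find_key(value, key)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_key(item, key)
            if found is not None:
                return found
    return None


config_file = root / "config/body_pixel.json"
if config_file.exists():
    config = json.loads(config_file.read_text())
    vq_path = find_key(config, "vq_path")
    if vq_path is None:
        print("No vq_path key found - inspect config/body_pixel.json manually.")
    else:
        status = "OK" if Path(vq_path).exists() else "MISSING"
        print(f"vq_path -> {vq_path} ({status})")
```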
### 3. 🎙️ Running Inference
To generate a 3D animated video from an audio file:
```bash
python scripts/demo.py \
--config_file ./config/body_pixel.json \
--infer \
--audio_file ./demo_audio/1st-page.wav \
--id 0 \
--whole_body
```
**Change the input:** replace the `--audio_file` value with the path to your own `.wav` file.
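If you prefer to drive inference from Python (for example, to batch several audio files), the documented command can be launched with `subprocess`. This is only a convenience sketch around the CLI shown above; `my_audio/` is a hypothetical folder name.
```python
# Convenience wrapper around the documented CLI: run inference on every .wav
# file in a folder. "my_audio" is a hypothetical example folder, not part of TalkSHOW.
import subprocess
import sys
from pathlib import Path

for wav in sorted(Path("my_audio").glob("*.wav")):
    cmd = [
        sys.executable, "scripts/demo.py",
        "--config_file", "./config/body_pixel.json",
        "--infer",
        "--audio_file", str(wav),
        "--id", "0",
        "--whole_body",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```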
### 4. Output
The final 3D animated video will be saved under:
```bash
visualise/video/body-pixel2/<audio_file_name>/1st-page.mp4
```
For reference, the exact command used to run the project:
```bash
python scripts/demo.py --config_file ./config/body_pixel.json --infer --audio_file ./demo_audio/1st-page.wav --id 0 --whole_body
```
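Since each output folder is keyed by the audio file name, a short script like the following can list whatever videos have been rendered so far (a minimal sketch assuming the `visualise/video/` layout described above):
```python
# List every rendered video under visualise/video/, newest first.
from pathlib import Path

videos = sorted(Path("visualise/video").rglob("*.mp4"),
                key=lambda p: p.stat().st_mtime, reverse=True)
for mp4 in videos:
    print(f"{mp4}  ({mp4.stat().st_size / 1e6:.1f} MB)")
```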
### Contact
For issues or questions, raise an issue or contact the contributors directly!