---
title: Sound AI SFX
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Text to Audio (Sound SFX) Generator
---

## TangoFlux: Text-to-Audio Generation System

TangoFlux is a state-of-the-art text-to-audio generation system that converts natural-language descriptions into high-quality audio. It is built on flow matching and CLAP-ranked preference optimization, which together support fast synthesis that stays faithful to the prompt.

### Key Features

**1. Advanced Audio Generation**

- Converts detailed text descriptions into realistic audio
- Supports complex soundscapes with multiple elements
- Generates audio up to 30 seconds long
- Outputs high-quality audio at a 44.1 kHz sample rate

**2. Flexible Generation Controls**

- **Steps (10-100)**: Trades generation quality against speed; more steps take longer
- **Guidance Scale (1-10)**: Adjusts how closely the audio follows the prompt
- **Duration (1-30s)**: Sets the length of the generated audio

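
The three controls above each have a fixed range, so user input can be clamped before it reaches the model. The following is a minimal sketch of that idea, not the Space's actual code: the names `GenerationSettings`, `clamp`, and `from_user_input` are hypothetical, and the default values are placeholders chosen for illustration. Only the ranges come from the list above.

```python
from dataclasses import dataclass

# Ranges taken from the control list; everything else here is hypothetical.
STEPS_RANGE = (10, 100)
GUIDANCE_RANGE = (1.0, 10.0)
DURATION_RANGE = (1, 30)  # seconds


def clamp(value, bounds):
    """Limit value to the inclusive (low, high) bounds."""
    low, high = bounds
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenerationSettings:
    # Default values are illustrative assumptions, not the Space's defaults.
    steps: int = 25
    guidance_scale: float = 4.5
    duration: int = 10

    @classmethod
    def from_user_input(cls, steps, guidance_scale, duration):
        """Build settings from raw input, clamping each value to its range."""
        return cls(
            steps=int(clamp(steps, STEPS_RANGE)),
            guidance_scale=float(clamp(guidance_scale, GUIDANCE_RANGE)),
            duration=int(clamp(duration, DURATION_RANGE)),
        )
```

For example, `GenerationSettings.from_user_input(500, 0.2, 45)` yields 100 steps, a guidance scale of 1.0, and a 30-second duration.
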
**3. Diverse Audio Capabilities**

- Natural sounds (ocean waves, thunder, rain)
- Animal sounds (dogs barking, cats meowing, birds singing)
- Human sounds (laughter, speaking, whistling, snoring)
- Mechanical sounds (engines, vehicles, machinery)
- Complex soundscapes (multiple layered sounds)

**4. Technical Architecture**

- Uses flow matching for efficient generation
- Applies CLAP-ranked preference optimization to improve output quality
- Runs GPU-accelerated inference with CUDA support
- Encodes text prompts with a transformer-based encoder
- Uses the `@spaces.GPU` decorator for fast GPU-backed generation on Spaces

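
To give a feel for how flow matching relates to the Steps and Guidance Scale controls, here is a toy sketch in plain Python. It is not TangoFlux's sampler: the velocity field is a hand-written stand-in for a learned network, the "audio" is a short list of floats, and all function names are hypothetical. It shows generic Euler integration of a velocity field from noise (t = 0) toward a target (t = 1), plus the classifier-free guidance formula that a guidance scale commonly controls; whether the model applies guidance in exactly this form is an assumption.

```python
import random


def guided_velocity(v_uncond, v_cond, guidance_scale):
    """Classifier-free guidance: move from the unconditional velocity
    toward the conditional one, scaled by guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(v_uncond, v_cond)]


def toy_velocity(x, t, target):
    """Stand-in for a learned network: the velocity of the straight-line
    path that reaches `target` at t = 1."""
    return [(tg - xi) / (1.0 - t) for xi, tg in zip(x, target)]


def sample(target, steps, seed=0):
    """Integrate dx/dt = v(x, t) from noise at t = 0 to t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In this toy setting the path is a straight line, so any step count lands on the target. A learned velocity field is only approximately straight, which is why more steps generally trade speed for quality.
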
### How It Works

1. **Text Input**: Describe the desired audio in natural language
2. **Parameter Adjustment**: Fine-tune the generation settings
3. **AI Processing**: The model interprets the text and generates the corresponding audio
4. **Audio Output**: Play or download the generated WAV file

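
The final step above produces a WAV file. As a rough illustration of what that involves at the stated 44.1 kHz rate, the sketch below writes a WAV using only Python's standard library. A sine tone stands in for model output, and the helper names `sine_placeholder` and `write_wav` are hypothetical. Mono 16-bit PCM is an assumption made for brevity; the source does not state the channel layout or bit depth. At this rate a 30-second clip holds 44,100 × 30 = 1,323,000 samples per channel.

```python
import math
import struct
import wave

SAMPLE_RATE = 44_100  # Hz, the output rate stated in the feature list


def sine_placeholder(duration_s, freq_hz=440.0):
    """Return a sine tone as floats in [-1, 1]; a stand-in for model output."""
    n = int(SAMPLE_RATE * duration_s)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def write_wav(path, samples):
    """Write float samples as mono 16-bit PCM and return the frame count."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono (assumption)
        wav.setsampwidth(2)           # 16-bit samples (assumption)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return len(samples)
```

Calling `write_wav("tone.wav", sine_placeholder(1))` writes a one-second file containing 44,100 frames.
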
### Example Use Cases

- **Film & Video Production**: Create custom sound effects and ambiences
- **Game Development**: Generate dynamic environmental sounds
- **Podcast Production**: Add realistic background sounds
- **Music Production**: Create unique sound textures and effects
- **Educational Content**: Generate illustrative audio examples
- **Accessibility**: Convert text descriptions to audio experiences

The system includes 20+ pre-configured examples demonstrating various audio generation capabilities, from simple single sounds to complex multi-layered soundscapes.