---
title: Sound AI SFX
emoji: 🐠
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Text to Audio (Sound SFX) Generator
---
## TangoFlux: Text-to-Audio Generation System
TangoFlux is a text-to-audio generation system that turns natural-language descriptions into audio. It is built on flow matching and CLAP-ranked preference optimization, and is designed for fast synthesis that stays faithful to the prompt.
### Key Features
**1. Advanced Audio Generation**
- Converts detailed text descriptions into realistic audio
- Supports complex soundscapes with multiple elements
- Generates audio up to 30 seconds in duration
- Outputs audio at a 44.1 kHz sample rate
**2. Flexible Generation Controls**
- **Steps (10-100)**: Number of sampling steps; more steps generally improve quality at the cost of speed
- **Guidance Scale (1-10)**: Adjusts how closely the audio follows the prompt
- **Duration (1-30s)**: Sets the length of generated audio
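
The three controls above can be sketched as a small settings object that clamps each value to its documented range before generation. This is only an illustration: `GenerationSettings`, `clamp`, and the default values are hypothetical names and numbers chosen for the example, not taken from the app's code.

```python
from dataclasses import dataclass

# Ranges copied from the list above; every name in this block is hypothetical.
STEPS_RANGE = (10, 100)
GUIDANCE_RANGE = (1.0, 10.0)
DURATION_RANGE = (1, 30)


def clamp(value, bounds):
    """Return value limited to the inclusive (low, high) bounds."""
    low, high = bounds
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenerationSettings:
    # Illustrative defaults only; the app may use different ones.
    steps: int = 25
    guidance_scale: float = 4.5
    duration: int = 10

    @classmethod
    def from_user_input(cls, steps, guidance_scale, duration):
        """Build settings, clamping each value to its documented range."""
        return cls(
            steps=int(clamp(steps, STEPS_RANGE)),
            guidance_scale=float(clamp(guidance_scale, GUIDANCE_RANGE)),
            duration=int(clamp(duration, DURATION_RANGE)),
        )


settings = GenerationSettings.from_user_input(steps=250, guidance_scale=0.5, duration=12)
print(settings)
# GenerationSettings(steps=100, guidance_scale=1.0, duration=12)
```

Clamping rather than raising an error keeps out-of-range values from ever reaching the model; an app that prefers strict validation could raise `ValueError` instead.
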
**3. Diverse Audio Capabilities**
- Natural sounds (ocean waves, thunder, rain)
- Animal sounds (dogs barking, cats meowing, birds singing)
- Human sounds (laughter, speaking, whistling, snoring)
- Mechanical sounds (engines, vehicles, machinery)
- Complex soundscapes (multiple layered sounds)
**4. Technical Architecture**
- Uses flow matching for efficient generation
- CLAP-ranked preference optimization for quality
- GPU-accelerated inference with CUDA support
- Transformer-based text encoding
- Runs on Hugging Face ZeroGPU hardware through the `@spaces.GPU` decorator
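
To give a feel for what flow-matching sampling does, here is a toy sketch that integrates a velocity field from Gaussian noise at t = 0 to a target at t = 1 with Euler steps. It is not TangoFlux's implementation: `toy_velocity` is a closed-form stand-in for the velocity a trained network would predict from the prompt, and the three-element lists stand in for real audio data.

```python
import random


def toy_velocity(x, t, target):
    """Velocity of a straight-line path that reaches `target` at t = 1.

    A trained model would predict this from the current sample, the time
    value, and the text embedding; here it is written in closed form.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(target, steps, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # largest t is 1 - dt, so (1 - t) is never zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target = [0.5, -0.25, 1.0]
result = sample(target, steps=10)
print([round(value, 6) for value in result])
```

With this idealized velocity field the sampler lands on the target for any step count. A learned field is only approximate, which is why the number of steps becomes a quality versus speed trade-off in practice.
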
### How It Works
1. **Text Input**: Describe the desired audio in natural language
2. **Parameter Adjustment**: Fine-tune generation settings
3. **AI Processing**: The model interprets text and generates corresponding audio
4. **Audio Output**: Download or play the generated WAV file
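
The last step hands back a WAV file. As a minimal sketch of that step using only the Python standard library, the code below writes mono samples at 44.1 kHz. The helper names `write_wav` and `sine_tone` are hypothetical, the sine wave is a stand-in for model output, and the Space itself may well rely on other tooling to save audio.

```python
import math
import struct
import wave

SAMPLE_RATE = 44_100  # Hz, the output rate stated in this README


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the valid range
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def sine_tone(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Stand-in for generated audio: a plain sine wave at half amplitude."""
    count = int(duration_s * sample_rate)
    return [0.5 * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(count)]


# One second of a 440 Hz tone, saved next to the script.
write_wav("example.wav", sine_tone(440.0, 1.0))
```
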
### Example Use Cases
- **Film & Video Production**: Create custom sound effects and ambiences
- **Game Development**: Generate dynamic environmental sounds
- **Podcast Production**: Add realistic background sounds
- **Music Production**: Create unique sound textures and effects
- **Educational Content**: Generate illustrative audio examples
- **Accessibility**: Convert text descriptions to audio experiences
The system includes 20+ pre-configured examples demonstrating various audio generation capabilities, from simple single sounds to complex multi-layered soundscapes.