Update README.md
README.md
@@ -12,139 +12,155 @@ license: mit
tag: agent-demo-track
---
# Scriptura: A Multi-Agent System for Screenplay Creation and Editing

An explanation video is available [here](https://www.youtube.com/watch?v=I0201ruB1Uo).

The sample screenplay used in the video is available [here](https://www.studiobinder.com/blog/best-free-movie-scripts-online/).

## Introduction

**Scriptura** is a multi-agent AI framework built on Hugging Face's smolagents that streamlines the creation of screenplays, storyboards, and soundtracks. It automates the analysis, summarization, and multimodal enrichment stages, freeing authors to focus on pure creativity.

At its heart:

* Qwen3-32B serves as the primary orchestrating agent, coordinating workflows and managing high-level reasoning across the system.
* Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, accepting text and image inputs to refine narrative elements and prepare them for downstream generation.

For media generation, Scriptura integrates:

* MusicGen models (per the AudioCraft MusicGen specification), deployed via Hugging Face Spaces, which let the agent produce original soundtracks and sound effects from text prompts or from text combined with audio samples.
* FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation, well suited to storyboards, concept art, and visual references that tie into the narrative flow.
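To make the hand-off to the two generators concrete, the sketch below turns one analyzed scene into a text prompt for FLUX and another for MusicGen. It is an illustration under assumptions: the `Scene` fields, the prompt wording, and both function names are hypothetical, since the README does not document Scriptura's prompt templates, and the sketch stops before calling either model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scene:
    """Hypothetical scene summary handed over by the analysis step."""
    heading: str                      # e.g. "EXT. HARBOR - NIGHT"
    summary: str                      # one-line description of the action
    mood: str                         # emotional tone shared by both prompts
    characters: List[str] = field(default_factory=list)


def storyboard_prompt(scene: Scene) -> str:
    """Compose a cinematic text-to-image prompt for a FLUX-style model."""
    cast = ", ".join(scene.characters) if scene.characters else "no visible characters"
    return (
        f"Cinematic storyboard frame, {scene.heading.lower()}: {scene.summary}. "
        f"Characters: {cast}. Mood: {scene.mood}. Wide shot, film lighting."
    )


def soundtrack_prompt(scene: Scene, style: str = "orchestral") -> str:
    """Compose a short text prompt for a MusicGen-style model."""
    return f"{scene.mood} {style} underscore for a film scene: {scene.summary}"


if __name__ == "__main__":
    harbor = Scene("EXT. HARBOR - NIGHT", "a smuggler waits for a boat", "tense", ["MARA"])
    print(storyboard_prompt(harbor))
    print(soundtrack_prompt(harbor))
```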

Optionally, Scriptura can query external sources (e.g., through DuckDuckGo search) to pull in reference scripts, sound samples, or research materials, so that every draft is contextually informed as well as creatively rich.

---
## Agent Capabilities
Scriptura provides a rich set of agents and tools to cover the full screenplay production and enrichment pipeline:

- **Text Analysis & Summarization**
  - Automatically extracts key themes, character arcs, and plot points
  - Segments and summarizes scenes for rapid iteration

- **Multimodal Ingestion**
  - Supports PDF, DOCX, ODT, TXT, and image uploads
  - Transcribes audio files using OpenAI Whisper

- **Image Generation**
  - On-the-fly storyboard and concept art creation via FLUX (black-forest-labs/FLUX.1-dev)

- **Audio Generation**
  - Produces original soundtracks and SFX with MusicGen (AudioCraft spec)
  - Supports generation conditioned on an audio sample

- **Captioning & Metadata**
  - Auto-generates captions and descriptions for images using Gemma-3-27B-IT

- **Optional Web Research**
  - Queries DuckDuckGo to fetch example scripts, sound samples, or contextual references
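The README attributes this extraction to the agents. For orientation, the sketch below is a rule-based stand-in rather than Scriptura's method: it relies on standard screenplay layout (scene headings beginning with `INT.` or `EXT.`, character cues in capitals above dialogue) to segment a plain-text script into scenes and collect locations and speaking characters. All names are hypothetical, and the heuristic will miscount all-caps action lines as speakers, so treat it as a sanity check rather than a replacement for model-based analysis.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Scene headings such as "INT. KITCHEN - NIGHT"; the longest prefix is listed first.
SCENE_HEADING = re.compile(r"^(INT\./EXT\.|INT\.|EXT\.)\s+(.+?)(?:\s+-\s+(.+))?$")
# Character cues: an all-caps line, optionally followed by an extension such as "(V.O.)".
CHARACTER_CUE = re.compile(r"^([A-Z][A-Z .'-]*[A-Z])(?:\s*\(.*\))?$")


@dataclass
class Scene:
    heading: str
    setting: str                      # "INT.", "EXT." or "INT./EXT."
    location: str
    time_of_day: Optional[str]
    lines: List[str] = field(default_factory=list)
    speakers: Set[str] = field(default_factory=set)


def segment_script(text: str) -> List[Scene]:
    """Split a plain-text screenplay into scenes and record who speaks in each."""
    scenes: List[Scene] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SCENE_HEADING.match(line)
        if match:
            setting, location, time_of_day = match.groups()
            scenes.append(Scene(line, setting, location, time_of_day))
        elif scenes:  # lines before the first heading (title page, FADE IN:) are skipped
            scenes[-1].lines.append(line)
            cue = CHARACTER_CUE.match(line)
            if cue:
                scenes[-1].speakers.add(cue.group(1).strip())
    return scenes


def characters_and_locations(scenes: List[Scene]) -> Tuple[List[str], List[str]]:
    """Return sorted, de-duplicated speaking characters and locations."""
    characters = sorted({name for scene in scenes for name in scene.speakers})
    locations = sorted({scene.location for scene in scenes})
    return characters, locations
```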

---
## Agent Flow
Here’s an example flow demonstrating how you could use the agent.

<img alt="Flowchart" src="https://www.canva.com/design/DAGphLlng2I/MZ2cOAnS520rFtnhTP5H6A/view?utm_content=DAGphLlng2I&utm_campaign=designshare&utm_medium=link2&utm_source=uniquelinks&utlId=hca1222039d" width="600"/>

---

## Code Overview

```text
.
├── app.py              # Entry point: defines Gradio interface and routing logic
├── system_prompt.txt   # System-level prompt template for the CodeAgent
├── requirements.txt    # Python dependencies (Gradio, SmolAgents, OpenAI, etc.)
└── README.md           # Project documentation
```

* **app.py**
  * **Agent** class: loads the Qwen3-32B model and registers all tools
  * **respond()**: orchestrates between the Gradio inputs and the CodeAgent
  * Decorated `@tool` functions for image download, media generation, transcription, and captioning
  * Gradio `ChatInterface` setup with text/file support and an “Enable web search” toggle

* **system\_prompt.txt**
  * Injects the agent’s “way of thinking,” including reasoning structure and error handling

* **requirements.txt**
  * Lists all required libraries (Gradio, SmolAgents, OpenAI, HuggingFace, PDFPlumber, etc.)
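The wiring described for `respond()` can be outlined in plain Python. This is a hypothetical sketch, not the code in `app.py`: it assumes the `ChatInterface` runs in multimodal mode, where each message arrives as a dict with `text` and `files` keys, and it uses made-up tool names. It shows one way the “Enable web search” toggle could decide whether the search tool is handed to the CodeAgent.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical tool names; the real @tool functions in app.py may be named differently.
BASE_TOOLS = ["download_image", "generate_image", "generate_audio", "transcribe_audio", "caption_image"]
WEB_TOOLS = ["web_search"]

Message = Union[str, Dict[str, Any]]


def select_tools(web_search_enabled: bool) -> List[str]:
    """Hand the web-search tool to the agent only when the UI toggle is on."""
    return BASE_TOOLS + (WEB_TOOLS if web_search_enabled else [])


def file_paths(message: Message) -> List[str]:
    """Collect uploaded file paths from a multimodal message (plain strings carry none)."""
    if isinstance(message, str):
        return []
    paths = []
    for item in message.get("files", []):
        # Depending on the Gradio version, a file may be a path string or a dict with a "path" key.
        paths.append(item["path"] if isinstance(item, dict) else str(item))
    return paths


def build_task(message: Message, web_search_enabled: bool) -> Tuple[str, List[str]]:
    """Turn one chat turn into (task text, tool names) for the agent."""
    text = message if isinstance(message, str) else message.get("text", "")
    task = text.strip()
    paths = file_paths(message)
    if paths:
        task += "\n\nAttached files:\n" + "\n".join(f"- {p}" for p in paths)
    return task, select_tools(web_search_enabled)
```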

---

## Deployment & Access

### Hugging Face Spaces
1. Include `app.py`, `system_prompt.txt`, and `requirements.txt` in the root of your Space.
2. Configure `OPENAI_API_KEY` and `HF_TOKEN` as Secrets in your Space’s settings.
3. Make sure the Space runs **Python 3.10 or higher**.
4. Select **Gradio** as the SDK (version 5.32.1).
5. Pin or share the Space link to collaborate with your team.

> **Note:** If you clone this repository to run it locally, set your own `OPENAI_API_KEY` and `HF_TOKEN` environment variables before launching.
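Space secrets reach the app as environment variables, and a local run needs the same variables, so a small preflight check can catch a missing key or an old interpreter before the interface starts. This is a hypothetical helper, not part of `app.py`; calling `preflight()` near the top of the entry point is one possible placement.

```python
import os
import sys
from typing import List, Mapping, Tuple

REQUIRED_SECRETS = ("OPENAI_API_KEY", "HF_TOKEN")
MIN_PYTHON = (3, 10)


def missing_secrets(env: Mapping[str, str] = os.environ) -> List[str]:
    """Names of required secrets that are unset or empty."""
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


def preflight(env: Mapping[str, str] = os.environ,
              version: Tuple[int, int] = (sys.version_info.major, sys.version_info.minor)) -> None:
    """Raise early, with a readable message, instead of failing mid-request."""
    if tuple(version) < MIN_PYTHON:
        raise RuntimeError(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {version[0]}.{version[1]}"
        )
    missing = missing_secrets(env)
    if missing:
        raise RuntimeError("Missing secrets: " + ", ".join(missing))
```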

---
## Use Cases
**Independent Writer**

* Upload a screenplay and quickly get a summary plus lists of its characters and locations.
* Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).
* Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV).

**Film Production Company**

* Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.
* Use the web search feature to find reference scripts or specific sound effects from free or paid sources.
* Develop visual storyboards and audio prototypes to share with directors, artists, and investors.

**Translation and Adaptation Agency**

* Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).
* Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).
* Produce reference audio via MusicGen to test culturally appropriate music for the target audience.
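For the JSON/CSV deliverable, one flat record per entity keeps both formats in sync. This standard-library sketch assumes a hypothetical `Entity` schema, since the README does not specify Scriptura's output fields. `ensure_ascii=False` matters for foreign-language scripts, where names often contain non-ASCII characters.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class Entity:
    """Hypothetical flat record for one extracted entity."""
    kind: str          # e.g. "character" or "location"
    name: str
    description: str
    first_scene: int   # 1-based number of the scene where it first appears


def to_json(entities: List[Entity]) -> str:
    """JSON array of entities; ensure_ascii=False keeps accented names readable."""
    return json.dumps([asdict(e) for e in entities], ensure_ascii=False, indent=2)


def to_csv(entities: List[Entity]) -> str:
    """CSV with a header row derived from the Entity fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Entity)])
    writer.writeheader()
    for entity in entities:
        writer.writerow(asdict(entity))
    return buffer.getvalue()
```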

**Digital Humanities Course**

* Demonstrate how to build a text-mining tool for the performing arts that combines NLP, image, and audio pipelines.
* Let students analyze real scripts and generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.
* Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.

---

## Contributors

* Code development and implementation by luke9705;
* Idea creation, testing, and video production by OrianIce;
* Research and testing by Loren1214;
* Code revisions by DDPM.

---
## Sources
The following libraries, models, and tools power Scriptura’s agents and multimodal capabilities:

- **Qwen3-32B** – primary orchestrating LLM for high-level reasoning and workflow management
- **Gradio** – interactive web UI framework
- **smolagents** – lightweight multi-agent orchestrator from Hugging Face
- **huggingface_hub** – model & dataset management
- **duckduckgo-search** – optional web research integration
- **openai** – Whisper transcription, GPT-based reasoning
- **anthropic** – Claude model support
- **pdfplumber** – PDF text extraction
- **docx2txt** – DOCX parsing
- **odfpy** – ODT parsing
- **pandas** – data handling
- **Pillow (PIL)** – image processing
- **requests** – HTTP client for external APIs
- **numpy** – numerical operations
- **MusicGen (AudioCraft)** – soundtrack and SFX generation
- **FLUX (black-forest-labs/FLUX.1-dev)** – on-the-fly image generation
- **Gemma-3-27B-IT** – multimodal captioning and metadata
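The parsing libraries listed above can sit behind a single entry point keyed on file extension. The sketch below is a hypothetical dispatcher, not Scriptura's code: it uses `pdfplumber.open`, `docx2txt.process`, odfpy's `teletype.extractText`, and the OpenAI Whisper transcription endpoint (model `whisper-1` assumed), while the function names and the extension table, including which audio formats are accepted, are assumptions. Only the TXT path runs without third-party packages; image uploads would go to captioning instead and are left out.

```python
from pathlib import Path
from typing import Callable, Dict, Union


def read_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


def read_pdf(path: Path) -> str:
    import pdfplumber  # third-party imports are deferred so unused formats need no install
    with pdfplumber.open(str(path)) as pdf:
        # extract_text() returns None for pages without a text layer, such as scans
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def read_docx(path: Path) -> str:
    import docx2txt
    return docx2txt.process(str(path))


def read_odt(path: Path) -> str:
    from odf import teletype, text
    from odf.opendocument import load
    document = load(str(path))
    return "\n".join(teletype.extractText(p) for p in document.getElementsByType(text.P))


def transcribe_audio(path: Path) -> str:
    from openai import OpenAI  # the client reads OPENAI_API_KEY from the environment
    client = OpenAI()
    with path.open("rb") as audio:
        return client.audio.transcriptions.create(model="whisper-1", file=audio).text


LOADERS: Dict[str, Callable[[Path], str]] = {
    ".txt": read_txt,
    ".pdf": read_pdf,
    ".docx": read_docx,
    ".odt": read_odt,
    ".mp3": transcribe_audio,
    ".wav": transcribe_audio,
}


def extract_text(path: Union[str, Path]) -> str:
    """Route a file to the loader registered for its extension (case-insensitive)."""
    file_path = Path(path)
    loader = LOADERS.get(file_path.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported file type: {file_path.suffix or '(no extension)'}")
    return loader(file_path)
```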