luke9705 committed on
Commit 39b7a18 · verified · 1 Parent(s): b21d1ac

Update README.md

Files changed (1)
  1. README.md +8 -13
README.md CHANGED
@@ -15,22 +15,17 @@ tag: agent-demo-track
 
  ## Introduction
 
- **Scriptura** is a multi-agent AI system designed to assist authors in creating screenplays, storyboards, and soundtracks. Its main goal is to automate and accelerate the stages of analysis, summarization, and enrichment of narrative text, allowing screenwriters to focus on the creative aspects.
+ **Scriptura** is a multi-agent AI framework based on HF-SmolAgents that streamlines the creation of screenplays, storyboards, and soundtracks by automating the stages of analysis, summarization, and multimodal enrichment—freeing authors to focus on pure creativity.
 
- The core stack includes:
- - **DeepSeek (deepseek-ai/DeepSeek-R1)** as the base model for all text operations (analysis, summarization, generation) via APIs managed by Nebius AI.
- - **FLUX (black-forest-labs/FLUX.1-dev)** for image generation (storyboards, concept art) integrated into the narrative flow.
- - **MusicGen (facebook/musicgen-melody)** to create short audio tracks or sound effects, useful for prototyping or presenting.
- - Optional web search (integrated with DuckDuckGo API) to fetch external resources (original scripts, sound effects, reference materials).
+ At its heart:
+ • Qwen3-32B serves as the primary orchestrating agent, coordinating workflows and managing high-level reasoning across the system.
+ • Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, supporting both text and audio inputs to refine narrative elements and prepare them for downstream generation.
 
- **Scriptura** supports inputs in various formats:
- - **Text**: TXT, PDF, DOCX (automatically converted to structured plain text)
- - **Images**: JPEG, PNG (for analyzing existing storyboards or screenshots)
- - **Audio**: MP3, WAV (for transcribing dialogue or analyzing uploaded soundtracks)
+ For media generation, Scriptura integrates:
+ • MusicGen models (per the AudioCraft MusicGen specification), deployed via Hugging Face Spaces, enabling the agent to produce original soundtracks and sound effects from text prompts or combined text + audio samples.
+ • FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation—ideal for storyboards, concept art, and visual references that seamlessly tie into the narrative flow.
 
- There are size and duration checks on uploaded files to prevent excessively large inputs.
-
- ---
+ Optionally, Scriptura can query external sources (e.g., via a DuckDuckGo API integration) to pull in reference scripts, sound samples, or research materials, ensuring that every draft is not only creatively rich but also contextually informed.
 
  ## Agent Capabilities
 