Spaces:
Sleeping
Sleeping
| title: TorchTransformers Diffusion CV SFT | |
| emoji: โก | |
| colorFrom: yellow | |
| colorTo: indigo | |
| sdk: streamlit | |
| sdk_version: 1.43.2 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| short_description: Torch Transformers Diffusion SFT f. Streamlit & C. Vision | |
| # TorchTransformers Diffusion CV SFT Titans ๐ | |
| A Streamlit app blending `torch`, `transformers`, and `diffusers` for vision and NLP fun! Snap PDFs ๐, turn them into double-page spreads ๐ผ๏ธ, extract text with GPT ๐ค, and craft emoji-packed Markdown outlines ๐โall with a witty UI and CPU-friendly SFT. | |
| ## Integration Details | |
| 1. **SFT Tiny Titans (First Listing)**: | |
| - Features: Causal LM and Diffusion SFT, camera snap, RAG party. | |
| - Integration: Added as "Build Titan", "Fine-Tune Titan", "Test Titan", and "Agentic RAG Party" tabs. Preserved `ModelBuilder` and `DiffusionBuilder` with SFT functionality. | |
| 2. **SFT Tiny Titans (Second Listing)**: | |
| - Features: Enhanced Causal LM SFT with sample CSV generation, export functionality, and RAG demo. | |
| - Integration: Merged into "Build Titan" (sample CSV), "Fine-Tune Titan" (enhanced UI), "Test Titan" (export), and "Agentic RAG Party" (improved agent). | |
| 3. **AI Vision Titans (Current)**: | |
| - Features: PDF snapshotting, OCR with GOT-OCR2_0, Image Gen, GPT-based text extraction. | |
| - Integration: Added as "Download PDFs", "Test OCR", "Test Image Gen", "PDF Process", "Image Process", and "MD Gallery" tabs. Retained async processing and gallery updates. | |
| 4. **Sidebar, Session, and History**: | |
| - Unified gallery shows PNGs, PDFs, and MD files from all tabs. | |
| - Session state (`captured_files`, `builder`, `model_loaded`, `processing`, `history`) tracks all operations. | |
| - History log in sidebar records key actions (snapshots, SFT, tests). | |
| 5. **Workflow**: | |
| - Snap images or download PDFs, snapshot to double-page spreads, extract text with GPT, summarize into emoji outlinesโall saved in the gallery. | |
| 6. **Verification**: | |
| - Run: `streamlit run app.py` | |
| - Check: Camera snaps, PDF downloads, GPT text extraction, and Markdown outlines in gallery. | |
| 7. **Notes**: | |
| - PDF URLs need direct links (e.g., arXivโs `/pdf/` path). | |
| - CPU defaults with CUDA fallback for broad compatibility. | |
| ## Abstract | |
| Fuse `torch`, `transformers`, and `diffusers` with GPT vision for a wild AI ride! Dual `st.camera_input` ๐ท and PDF downloads ๐ feed a gallery, powering GOT-OCR2_0 ๐, Stable Diffusion ๐จ, and GPT text extraction ๐ค. Key papers: | |
| - ๐ **[Streamlit Framework](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI magic. | |
| - ๐ฅ **[PyTorch DL](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Torch core. | |
| - ๐ง **[Attention is All You Need](https://arxiv.org/abs/1706.03762)** - Vaswani et al., 2017: NLP transformers. | |
| - ๐จ **[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)** - Ho et al., 2020: Diffusion basics. | |
| - ๐ **[GOT: General OCR Theory](https://arxiv.org/abs/2408.11039)** - Li et al., 2024: Advanced OCR. | |
| - ๐จ **[Latent Diffusion Models](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image generation. | |
| - โ๏ธ **[LoRA: Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)** - Hu et al., 2021: SFT efficiency. | |
| - ๐ **[RAG: Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)** - Lewis et al., 2020: RAG foundations. | |
| - ๐๏ธ **[Vision Transformers](https://arxiv.org/abs/2010.11929)** - Dosovitskiy et al., 2020: Vision backbone. | |
| - ๐ **[GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)** - OpenAI, 2023: GPT power. | |
| - ๐ผ๏ธ **[CLIP: Learning Transferable Visual Models](https://arxiv.org/abs/2103.00020)** - Radford et al., 2021: Vision-language bridge. | |
| - โฐ **[Time Zone Handling in Python](https://arxiv.org/abs/2308.11235)** - Henshaw, 2023: `pytz` context. | |
| Run: `pip install -r requirements.txt`, `streamlit run app.py`. Snap, process, summarize! โก | |
| ## Usage ๐ฏ | |
| - ๐ท **Camera Snap**: Capture pics with dual cams. | |
| - ๐ฅ **Download PDFs**: Fetch papers (e.g., arXiv links below). | |
| - ๐ **PDF Process**: Snapshot to double-page spreads, extract text with GPT. | |
| - ๐ผ๏ธ **Image Process**: OCR images with GPT vision. | |
| - ๐ **MD Gallery**: Summarize Markdown files into emoji outlines. | |
| ## Tutorial: Single to Double Page Emoji Outlines | |
| ### Single Page Outline: Key Functions in `app.py` | |
| | **Function** | **Purpose** ๐ฏ | **How It Works** ๐ ๏ธ | **Emoji Insight** ๐ | | |
| |----------------------------|---------------------------------------------|--------------------------------------------------|-------------------------------| | |
| | `generate_filename` | Unique file names ๐ | Adds timestamp to sequence | ๐ฐ๏ธ Timeโs your file buddy! | | |
| | `pdf_url_to_filename` | Safe PDF names ๐๏ธ | Cleans URLs to underscores | ๐ซ No URL mess! | | |
| | `get_download_link` | Downloadable files โฌ๏ธ | Base64-encodes for HTML links | ๐ฆ Grab it, go! | | |
| | `download_pdf` | Web PDF snatcher ๐ | Fetches PDFs with `requests` | ๐ PDF pirate ahoy! | | |
| | `process_pdf_snapshot` | PDF to images ๐ผ๏ธ | Async snapshots (single/double/all) with `fitz` | ๐ธ Double-page dazzle! | | |
| | `process_ocr` | Image text extractor ๐ | Async GOT-OCR2_0 with `transformers` | ๐ Text ninja strikes! | | |
| | `process_image_gen` | Prompt to image ๐จ | Async Stable Diffusion with `diffusers` | ๐๏ธ Art from wordsโbam! | | |
| | `process_image_with_prompt`| GPT image analysis ๐ค | Base64 to GPT vision | ๐ง GPT sees all! | | |
| | `process_text_with_prompt` | GPT text summarizer โ๏ธ | Text to GPT for outlining | ๐ Summarize like a pro! | | |
| | `update_gallery` | File showcase ๐ผ๏ธ๐ | Sidebar display with delete options | ๐ Your creations shine! | | |
| ### Double Page Outline: Libraries in `requirements.txt` | |
| | **Library** | **Single Page Purpose** ๐ฏ | **Double Page Usage** ๐ ๏ธ | **Emoji Insight** ๐ | | |
| |---------------|-------------------------------------------|----------------------------------------------------|-------------------------------| | |
| | `streamlit` | App UI ๐ | Tabs like โPDF Process ๐โ and โMD Gallery ๐โ | ๐ฌ App starโlights, action! | | |
| | `pandas` | Data crunching ๐ | Ready for OCR/metadata tables | ๐ Table tamer awaits! | | |
| | `torch` | ML engine ๐ฅ | Powers `transformers` and `diffusers` | ๐ฅ AIโs fiery heart! | | |
| | `requests` | Web grabber ๐ | Downloads PDFs in `download_pdf` | ๐ Web loot collector! | | |
| | `aiofiles` | Fast file ops โก | Async writes in `process_ocr` | โ๏ธ File speed demon! | | |
| | `pillow` | Image magic ๐๏ธ | PDF to image in `process_pdf_snapshot` | ๐ผ๏ธ Pixel Picasso! | | |
| | `PyMuPDF` | PDF handler ๐ | Snapshots in `process_pdf_snapshot` | ๐ PDF scroll master! | | |
| | `transformers`| AI models ๐ฃ๏ธ | GOT-OCR2_0 in `process_ocr` | ๐ค Brain in a box! | | |
| | `diffusers` | Image gen ๐จ | Stable Diffusion in `process_image_gen` | ๐จ Art generator supreme! | | |
| | `openai` | GPT vision/text ๐ค | Image/text processing in GPT functions | ๐ All-seeing AI oracle! | | |
| | `glob2` | File finder ๐ | Gallery files in `update_gallery` | ๐ต๏ธ File sleuth! | | |
| | `pytz` | Time zones โฐ | Timestamps in `generate_filename` | โณ Time wizard! | | |
| ## Automation Instructions: Witty & Funny Steps ๐ | |
| 1. **Load PDFs** ๐ | |
| - Drop URLs into โDownload PDFs ๐ฅโ or upload files. | |
| - *Emoji Tip*: ๐ฆ Unleash the PDF beastโroar through arXiv! | |
| 2. **Double-Page Snap** ๐ธ | |
| - Click โSnapshot Selected ๐ธโ with โTwo Pages (High-Res)โโlandscape glory! | |
| - *Witty Note*: Two pages > one, because who reads half a comic? ๐ฆธ | |
| 3. **GPT Vision Zap** โก | |
| - In โPDF Process ๐โ, pick a GPT model (e.g., `gpt-4o-mini`) and zap text out. | |
| - *Funny Bit*: GPTโs like โI see text, mortals!โ ๐๏ธ | |
| 4. **Markdown Mash** ๐ | |
| - โMD Gallery ๐โ takes Markdown files, smashes them into a 12-point emoji outline. | |
| - *Sassy Tip*: 12 pointsโbecause 11โs weak and 13โs overkill! ๐ | |
| ## Innovative Features ๐ | |
| - **Double-Page Spreads**: High-res, landscape images from PDFsโperfect for apps! ๐ฅ๏ธ | |
| - **GPT Model Picker**: Swap `gpt-4o` for `gpt-4o-mini`โspeed vs. smarts! โก๐ง | |
| - **12-Point Emoji Outline**: Clusters facts into 12 witty sectionsโe.g., โ1. Heroes ๐ฆธโ, โ2. Tech ๐งโ. ๐ | |
| ## Mermaid Process Flow ๐งโโ๏ธ | |
| ```mermaid | |
| graph TD | |
| A[๐ PDFs] -->|๐ฅ Download| B[๐ PDF Process] | |
| B -->|๐ธ Snapshot| C[๐ผ๏ธ Double-Page Images] | |
| C -->|๐ค GPT Vision| D[๐ Markdown Files] | |
| D -->|๐ MD Gallery| E[โ๏ธ 12-Point Emoji Outline] | |
| A:::pdf | |
| B:::process | |
| C:::image | |
| D:::markdown | |
| E:::outline | |
| classDef pdf fill:#f9f,stroke:#333,stroke-width:2px; | |
| classDef process fill:#bbf,stroke:#333,stroke-width:2px; | |
| classDef image fill:#bfb,stroke:#333,stroke-width:2px; | |
| classDef markdown fill:#ffb,stroke:#333,stroke-width:2px; | |
| classDef outline fill:#fbf,stroke:#333,stroke-width:2px; | |
| ``` | |
| Flow Explained: | |
| 1. ๐ PDFs: Start with one or more PDFs on a topic. | |
| 2. ๐ PDF Process: Download and snapshot into high-res double-page spreads. | |
| 3. ๐ผ๏ธ Double-Page Images: Landscape images ideal for apps, processed by GPT. | |
| 4. ๐ Markdown Files: Text extracted per document, saved as Markdown. | |
| 5. โ๏ธ 12-Point Emoji Outline: Combines Markdown files into a 12-section summary (e.g., โ1. Context ๐โ, โ2. Methods ๐ฌโ, ..., โ12. Future ๐โ). | |
| Run: pip install -r requirements.txt, streamlit run app.py. Snap, process, outlineโAI magic! โก | |
| --- | |
| ### Key Updates | |
| 1. **Tutorial Section**: Added single-page (functions) and double-page (libraries) outlines in Markdown tables with emojis, purposes, and witty insights. | |
| 2. **Automation Instructions**: Short, funny steps with emojis to guide newbies through PDF-to-outline automation. | |
| 3. **Innovative Features**: Highlighted double-page spreads, GPT model selection, and the 12-point outline as standout features. | |
| 4. **Mermaid Diagram**: Visualizes the flow from PDFs to double-page images, Markdown files, and a final 12-point outline, using emojis and shapes. | |
| 5. **Updated arXiv Links**: Refreshed to match current functionality (vision, OCR, GPT, diffusion): | |
| - Added GOT-OCR2_0, Vision Transformers, GPT-4, and CLIP papers. | |
| - Kept core papers (Streamlit, PyTorch, etc.) and adjusted for relevance. | |
| ### How to Use | |
| - Save this as `README.md` in your project folder. | |
| - View it in a Markdown renderer (e.g., GitHub, VS Code) to see tables and Mermaid diagram rendered. | |
| - Follow the automation steps to process PDFs and generate outlinesโperfect for learners exploring AI vision and text summarization! | |
| This README now serves as both a project overview and a tutorial, making it a fun, educational asset for all! ๐ | |