prithivMLmods posted an update 1 day ago
Try Liquid AI's all-new multimodal models, LFM2-VL-1.6B and LFM2-VL-450M! The demo includes a Gradio UI with ReportLab support, and both models run on a T4 GPU!

↗ LFM2-VL-1.6B-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-1.6B-LiquidAI/LFM2-VL-1.6B_ReportLab.ipynb

↗ LFM2-VL-450M-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-450M-LiquidAI/LFM2-VL-450M_ReportLab.ipynb

To learn more, visit the Multimodal Outpost Notebooks!
ginipick posted an update 1 day ago
🎨 AI Webtoon Creation Platform: Turn Your Ideas into Reality!

🌟 Two Powerful Tools, One Perfect Workflow
📖 Webtoon Generator
ginigen/AGI-WebToon-KOREA
"Transform Your Ideas into 40-Episode Masterpieces" ✨

Automated Story Planning 🎬
One-line idea → Complete 40-episode structure
Automatic cliffhangers for each episode
Customized storytelling for 9 different genres

Consistent Character Design 👥
Maintains consistent character appearance throughout
Memorable and distinctive character visuals
Automatic character generation system

Instant 30-Panel Storyboard 🎞️
Auto-placement of dialogue, narration, and sound effects
Cinematic shot composition (close-ups, wide shots, etc.)
Vertical scroll format optimized for webtoons

🖌️ Editing Studio
ginigen/webtoon-studio
"Professional Finishing Touch for Your Generated Webtoons" 🎯

Intuitive Drag & Drop ✏️
10 speech bubble styles (normal, thought, shout, whisper...)
12 Korean fonts for emotional expression
Real-time preview & editing

Professional-Grade Finishing 💎
Image sequence adjustment & spacing control
Individual panel refinement
Publication-ready final export

💡 Who Should Use This?
🏢 Corporate Marketing Teams
Product Launch Campaigns 📱: Turn complex features into engaging stories
Brand Storytelling 🎯: Make corporate messages approachable and shareable

👨‍🎨 Content Creators
Aspiring Artists 🌱: Create webtoons without drawing skills
Professional Writers ⚡: Transform scripts into visual narratives instantly

🚀 Why Use Both Tools Together?
Perfect 3-Step Workflow:
1️⃣ Generate → Input idea, get complete storyboard
2️⃣ Customize → Add branding, adjust dialogue, insert logos
3️⃣ Publish → Export and share across all platforms
📊 Key Benefits

95% faster than traditional production
80% cost reduction compared to agencies
10x better engagement with Gen MZ audience
Zero artistic skills required

🌈 Start Creating Today!
prithivMLmods posted an update 5 days ago
On the verge of releasing Poseidon-Reasoning-5M, a dataset built to excel in general thought processes, mathematics, and science across a diverse mixture of domains, I’m also dropping the Gargantua-R1-Compact dataset, a collection of over six million high-quality reasoning QA pair traces. 🤗🚀

✦ Gargantua-R1-Compact : prithivMLmods/Gargantua-R1-Compact

from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Gargantua-R1-Compact", split="train")

Additionally, I’m adding the mini version of Gargantua — the Gargantua-R1-Wee : prithivMLmods/Gargantua-R1-Wee

from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Gargantua-R1-Wee", split="train")

The composition spans:

73.93% core mathematical reasoning: problems, proofs, and computational challenges
12.11% diverse scientific domains: physics, chemistry, biology, and interdisciplinary topics
11.35% competitive coding: algorithms and data structures
1.37% academic science: research-level methodology
0.95% creative and analytical reasoning: logic puzzles and problem-solving tasks
0.25% specialized technical areas: MLOps, LLMs, diffusion models, and CUDA
0.06% data from graphs and charts converted into structured JSON formats

Designed with both rich contextual depth and formal structural clarity, Gargantua-R1-Compact is an optimal resource for advancing research in symbolic reasoning, interpretability, and high-precision question answering in mathematical domains.
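As a quick sanity check, the stated shares can be totaled in a few lines of Python (figures copied from the description above; the category names are informal labels, not dataset fields):

```python
# Reported domain shares of Gargantua-R1-Compact (percent),
# copied from the dataset description above.
composition = {
    "core mathematical reasoning": 73.93,
    "scientific domains": 12.11,
    "competitive coding": 11.35,
    "academic science": 1.37,
    "creative and analytical reasoning": 0.95,
    "specialized technical areas": 0.25,
    "graph/chart data as JSON": 0.06,
}

total = sum(composition.values())
print(f"total share: {total:.2f}%")  # sums to ~100% up to rounding
```

The shares add to 100.02%, i.e. the breakdown is complete up to per-category rounding.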

✦ Collection : prithivMLmods/gargantua-r1-mod-6896bfd7834e82b89ad2b38b


To learn more, visit the respective dataset cards.
prithivMLmods posted an update 6 days ago
I've added the demo of the openbmb/MiniCPM-V-4 model to the Hugging Face Space:
prithivMLmods/Multimodal-VLM-Thinking

✨ MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series. The model is built based on SigLIP2-400M and MiniCPM4-3B, with a total of 4.1B parameters. It inherits the strong single-image, multi-image, and video understanding performance of MiniCPM-V 2.6 with largely improved efficiency.

✨ With only 4.1B parameters, MiniCPM-V 4.0 achieves an average score of 69.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. This performance surpasses GPT-4.1-mini-20250414, MiniCPM-V 2.6 (8.1B parameters, OpenCompass 65.2), and Qwen2.5-VL-3B-Instruct (3.8B parameters, OpenCompass 64.5). It also shows good performance in multi-image and video understanding.
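The comparison in the paragraph above can be tabulated directly (scores and parameter counts copied from the text; GPT-4.1-mini is omitted since its score isn't quoted here):

```python
# (model, params in billions, OpenCompass average) as quoted above
models = [
    ("MiniCPM-V 4.0", 4.1, 69.0),
    ("MiniCPM-V 2.6", 8.1, 65.2),
    ("Qwen2.5-VL-3B-Instruct", 3.8, 64.5),
]

# Rank by benchmark score, highest first
ranked = sorted(models, key=lambda m: m[2], reverse=True)
for name, params, score in ranked:
    print(f"{name}: {score} avg @ {params}B params")
```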

The community GPU grant was given by Hugging Face — special thanks to them. 🤗🚀

To learn more, visit the respective model card.
ginipick posted an update 8 days ago
🚀 FLUXllama gpt-oss: 4-bit Quantization + GPT-OSS-120B = Perfect AI Image Generation

🎯 One-Line Summary
"Maximum Images with Minimal Memory!" - The perfect fusion of 4-bit quantization and GPT-OSS-120B prompt enhancement

ginipick/FLUXllama

🧠 Core Innovation: Prompt Enhancement System
📝 What You Type:

"cat"

✨ What GPT-OSS-120B Transforms:

"Majestic tabby cat with emerald eyes in golden afternoon light, soft bokeh, cinematic lighting, 8K photorealistic"

💡 Result: Beginners create professional-grade images instantly!

⚡ The Magic of 4-bit Quantization
🔥 Before (Standard Model)

📦 Memory: 24GB VRAM required
⏱️ Loading: 45 seconds
💰 Cost: RTX 4090 essential ($2000+)

🎉 After (FLUXllama gpt-oss 4-bit)

📦 Memory: 6GB VRAM (75% reduction!)
⏱️ Loading: 12 seconds (73% faster!)
💰 Cost: RTX 3060 works great! ($400)

Same quality, 4x efficiency! 🎊
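The before/after figures above imply the stated savings; a quick back-of-the-envelope check (numbers copied from the post, not measured):

```python
# VRAM and load-time figures quoted above
vram_before_gb, vram_after_gb = 24, 6
load_before_s, load_after_s = 45, 12

vram_reduction = 100 * (vram_before_gb - vram_after_gb) / vram_before_gb
load_speedup = 100 * (load_before_s - load_after_s) / load_before_s
memory_ratio = vram_before_gb / vram_after_gb

print(f"VRAM reduction: {vram_reduction:.0f}%")    # 75%
print(f"Loading time saved: {load_speedup:.0f}%")  # ~73%
print(f"Memory efficiency: {memory_ratio:.0f}x")   # 4x
```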

🔧 Simple Model Swapping
from transformers import pipeline

# Swap in any text-generation LLM by changing the model id
pipe = pipeline("text-generation", model="your-model")

✅ GPT-OSS-120B (Premium quality)
✅ Phi-3 (Lightning fast)
✅ Custom models (Your unique style)

🏆 Why FLUXllama gpt-oss?
💪 Powerful

Hugging Face 'STAR AI 12' Selected (Dec 2024)
95% quality maintained with 75% memory savings

🤝 Easy

No prompt writing skills needed
GPT-OSS-120B enhances automatically

💸 Economical

Works on consumer GPUs
60% cloud cost reduction

🚀 Start Now
Just 3 Steps!

💭 Enter your idea
✨ Click "Enhance Prompt"
🎨 Click "Generate"

Result: Images that rival pro designers!

🎊 FLUXllama gpt-oss = Less Resources + Smart Prompts = Best Images
Experience the perfect synergy of 4-bit quantization and GPT-OSS-120B!


openfree posted an update 9 days ago
🚀 GPT-OSS 120B & 20B - Use Both Models in One Space!

openfree/OpenAI-gpt-oss
VIDraft/gpt-oss-RAG

🎯 Two Models, One Space!
GPT-OSS hit #1 on HF just 2 hours after release! 🏆
Now you can use both models conveniently in a single space.
📋 Model Selection Made Easy!
Just pick from the dropdown ✅
├── GPT-OSS-120B (Complex tasks)
└── GPT-OSS-20B (Quick chats)
💫 How to Use (Takes 30 seconds!)

Sign in → With your HF account 🔐
Select model → Choose what you need 📌
Apply → Click! ⚡
Start chatting → That's it! 💬

🌈 Perfect For:

120B → Deep analysis, professional work 🧠
20B → Fast responses, casual conversations ⚡

No installation needed - just use it in your browser! 🌐
✨ Special Features

🎨 Beautiful gradient UI
🌙 Dark mode support
🔄 Real-time model switching
🆓 Completely free!

👉 Try it now! It's really that simple!

#GPT-OSS #HuggingFace #FreeAI #EasyToUse
prithivMLmods posted an update 9 days ago
Qwen Image – The Latest Image Generation Model🔥

Below are some samples generated using the Qwen Image diffusion model. Qwen-Image, a 20B MMDiT model for next-generation text-to-image generation, preserves typographic details, layout coherence, and contextual harmony with stunning accuracy. It is especially strong at creating graphic posters with native text. The model is now open-source. [ Qwen-Image : Qwen/Qwen-Image ]

⤷ Try the Qwen Image demo here: prithivMLmods/Qwen-Image-Diffusion

⤷ Qwen-Image Technical Report : Qwen-Image Technical Report (2508.02324)
⤷ Qwen Image [GitHub] : https://github.com/QwenLM/Qwen-Image

Even more impressively, it demonstrates a strong ability to understand images. The model supports a wide range of vision-related tasks such as object detection, semantic segmentation, depth and edge (Canny) estimation, novel view synthesis, and image super-resolution. While each task is technically distinct, they can all be viewed as advanced forms of intelligent image editing driven by deep visual understanding. Collectively, these capabilities position Qwen-Image as more than just a tool for generating appealing visuals; it serves as a versatile foundation model for intelligent visual creation and transformation, seamlessly blending language, layout, and imagery.

Qwen-Image uses a dual-stream MMDiT architecture with a frozen Qwen2.5-VL, VAE encoder, RMSNorm for QK-Norm, LayerNorm elsewhere, and a custom MSRoPE scheme for joint image-text positional encoding.

To learn more, visit the respective model card.
openfree posted an update 11 days ago
🚀 K-Story AI Studio: The Game-Changer in Web Novel Creation
🎯 Unique Differentiation: AI-Powered Korean Web Novel Revolution

openfree/AGI-WebNovel-Gallery
fantaxy/AGI-LEADERBOARD

📚 All-in-One Creative Pipeline

Phase 1 🎲: Generate 10 loglines instantly + AI evaluation system
Phase 2 📖: Selected logline → Complete 40-episode structure
Phase 3 ✍️: Auto-write 500 words per episode

🌟 K-WebNovel Genre Mastery
7 Popular Genre Templates:

💕 Romance: Heart-fluttering moments + Sweet dialogues
⚔️ Romance Fantasy: Regression/Possession + OP protagonist
🐉 Fantasy: Level-up system + Dungeon mechanics
🏙️ Modern Fantasy: Hunter + Gate system
🥋 Martial Arts: Cultivation + Realm progression
🔍 Mystery: Foreshadowing + Plot twists
🎮 Light Novel: Casual tone + Character-driven

💡 Exceptional Features: Productivity Revolution
⚡ Lightning-Fast Creation

Logline Generation: 30 seconds ⏱️
40-Episode Structure: 40 seconds ⏱️
Episode Writing: 20 seconds/episode ⏱️
Complete 20,000-word novel = Just 15 minutes! 🎉
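The headline claim follows from the per-step figures (all numbers copied from the list above; a quick check):

```python
# Per-step generation times and sizes quoted above
logline_s = 30
structure_s = 40
per_episode_s = 20
episodes = 40
words_per_episode = 500

total_words = episodes * words_per_episode                  # 20,000 words
total_s = logline_s + structure_s + per_episode_s * episodes
print(f"{total_words} words in ~{total_s / 60:.1f} minutes")  # ~14.5 min
```

30 s + 40 s + 40 × 20 s = 870 s, i.e. about 14.5 minutes, consistent with the "just 15 minutes" figure.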

🎨 Smart Features

🧠 AI Evaluation: Auto-scoring for originality, interest, narrative, plausibility
📊 Real-time Progress: Visual tracking for every step
💾 Cloud Library: Save works + Continue writing anytime
📤 One-Click Export: Download as TXT instantly
🌐 Bilingual Support: Korean/English UI switching

🛡️ Stability & Scalability

☁️ Persistent storage on Hugging Face Spaces
🔐 Email-based user authentication
🧹 Automatic memory optimization
📈 Unlimited work creation

🎭 Perfect For:
✅ Aspiring Writers: Have ideas but struggle with structure
✅ Professional Authors: Rapid prototyping & idea validation
✅ Publishers/Agencies: New talent discovery & IP development
✅ Creative Institutions: Storytelling education tool

🌈 K-Story AI Studio: Not just a tool, but your creative partner in the AI era!
ginipick posted an update 11 days ago
🚀 **Wan 2.2 TI2V Enhanced Released!**

🎬 **More Natural Motion, More Powerful Video Generation**
The original Wan 2.2 TI2V model has been upgraded to the **Enhanced version**!
Now bring your imagination to life with smoother and more natural movements.

ginigen/Wan-2.2-Enhanced
---

✨ **Core Upgrades**

🌊 **Natural Motion Built-in**
- Automatic "smooth and natural movement" applied to all videos
- Seamless fluid dynamic movements
- Natural motion transitions for objects

🎯 **Smart Prompt Templates**
Various styles with one click!
- 🎥 **Cinematic** - Movie-like camera movements
- 🎨 **Animation** - Vibrant animated style
- 🌿 **Nature** - Nature documentary style
- ⚡ **Action** - Dynamic action sequences
- 🔍 **Slow Motion** - Detailed slow motion

💾 **Enhanced Performance & Stability**
- 🧠 Smart GPU memory management for longer video generation
- ⚡ Optimized processing speed (up to 15% improvement)
- 🛡️ Enhanced error handling for stable generation

🎨 **Intuitive UI/UX**
- 📊 Real-time progress display
- 🖼️ Automatic resolution optimization on image upload
- 💡 Contextual help & tips
- ✅ Smart input validation system

---

🎯 **Recommended For**

**Content Creators** 👨‍🎨
*"I can apply the style I want instantly with prompt templates!"*

**Video Producers** 🎬
*"The quality has definitely improved with natural motion"*

**AI Artists** 🎨
*"Work is much easier now that long videos generate stably"*
---

🚀 **Get Started Now!**
prithivMLmods posted an update 13 days ago
Introducing Camel-Doc-OCR-080125 (v2), a document content-structure retrieval VLM designed for content extraction and summarization. This is the second model in the Camel Doc OCR VLM series, following Camel-Doc-OCR-062825 (v1). The new version fixes formal table reconstruction issues in both English and Chinese, achieving optimal performance for long-context inference. 🤗🐪

⤷ Camel-Doc-OCR(v2) : prithivMLmods/Camel-Doc-OCR-080125
⤷ Camel-Doc-OCR(v1) : prithivMLmods/Camel-Doc-OCR-062825
⤷ Demo : prithivMLmods/core-OCR

Multimodal Model Collections and Spaces:

➝ Camel-Doc-OCR : prithivMLmods/camel-doc-ocr-080125-688c0c61c5dba648756f31f8
➝ Vision-Language (VLr) : prithivMLmods/vision-language-for-reasoning-vlr-6889b3f45917352b5e3a6f7a
➝ Multimodal Spaces : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
➝ Multimodal VLMs : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027

To learn more, visit the respective model card.
prithivMLmods posted an update 15 days ago
Excited to bring the explicitly grounded experimental reasoning model Lumian-VLR-7B-Thinking, built on top of Qwen2.5-VL and featuring reasoning-aware trajectories with enhanced spatial perception. Alongside it, we've added a demo for the model, together with some of the latest and most interesting models on the Hub, to make full use of the remaining resources.

✨ Multimodal-VLM-Thinking : prithivMLmods/Multimodal-VLM-Thinking
✨ Multimodal-VLM-OCR : https://huggingface.co/spaces/prithivMLmods/Multimodal-VLM-OCR

✦ Models used in these spaces:

✨ Lumian-VLR-7B-Thinking : prithivMLmods/Lumian-VLR-7B-Thinking
✨ Enesidaon-VLR-7B-no-Thinking : prithivMLmods/Enesidaon-VLR-7B-no-Thinking
✨ GLM-4.1V-9B-Thinking : zai-org/GLM-4.1V-9B-Thinking
✨ DREX-062225-exp : prithivMLmods/DREX-062225-exp & more ...

✦ Multimodal Model Collections and Spaces:

✨ Vision-Language (VLr) : prithivMLmods/vision-language-for-reasoning-vlr-6889b3f45917352b5e3a6f7a
✨ Multimodal Spaces : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
✨ Multimodal VLMs : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027

To learn more, visit the respective model card.
prithivMLmods posted an update 18 days ago
Explore OCR, Captioning, and Visual Understanding with Cutting-Edge Models on Hugging Face. 🤗🧪

I’ve put together a collection of Google Colab notebooks to experiment with some of the most exciting models available on the Hugging Face Hub focused on OCR, image captioning, and visual understanding tasks. [Image-to-Text] / [Image-Text-to-Text]

> 📖 OCR-ReportLab-Notebooks : prithivMLmods/OCR-ReportLab-Notebooks

These notebooks are built for quick prototyping and run on free T4 GPUs, making them perfect for experimentation, testing ideas, or just exploring what’s possible with modern vision-language models.

Note: The experimental notebooks are compiled with models that fit within the T4 GPU (free-tier) limits. More models along with their notebooks will be added over time.
prithivMLmods posted an update 21 days ago
Excited to introduce the new experimental model "Qwen2.5-VL-7B-Abliterated-Caption-it", which is performing exceptionally well on image captioning tasks. This variant is specifically tailored for Abliterated Captioning and Uncensored Image Captioning. It is designed to generate highly detailed and descriptive captions across a broad range of visual categories including images with complex, sensitive, or nuanced content while handling varying aspect ratios and resolutions.🧪🤗

✨ Try the demo here : https://huggingface.co/spaces/prithivMLmods/Qwen2.5-VL
✨ Qwen2.5-VL-7B-Abliterated-Caption-it : prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it
✨ Multimodal VLMs : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
✨ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the respective model card.
prithivMLmods posted an update 22 days ago
olmOCR [Allen AI] just got an upgrade! 📈🧑‍🍳

allenai/olmOCR-7B-0725 is fine-tuned with allenai/olmOCR-mix-0225 on top of Qwen/Qwen2.5-VL-7B-Instruct, pushing the boundaries of OCR technology. It takes a single document image as input, with the longest side resized to 1288 pixels, and offers a high-quality, openly available approach to OCR for parsing PDFs and other complex documents.

Try the demo here: prithivMLmods/Multimodal-OCR

✨ Model: allenai/olmOCR-7B-0725
✨ Model [fp8]: allenai/olmOCR-7B-0725-FP8
✨ Multimodal Implementations Space Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the respective model card.
prithivMLmods posted an update 25 days ago
Upgraded the step-by-step notebook for fine-tuning SigLIP2 on domain-specific image classification tasks. The notebook supports both datasets with predefined train/test splits and those with only a train split, making it suitable for low-resource, custom, and real-world classification scenarios. 📢👉

➺ FineTuning-SigLIP2-Notebook : prithivMLmods/FineTuning-SigLIP2-Notebook

➺ GitHub : https://github.com/PRITHIVSAKTHIUR/FineTuning-SigLIP-2

➺ In the first scenario, datasets include predefined train and test splits, enabling conventional supervised learning and generalization evaluation : prithivMLmods/FineTuning-SigLIP2-Notebook (.ipynb)

➺ In the second scenario, only a training split is available; in such cases, the training set is either partially reserved for validation or reused entirely for evaluation : prithivMLmods/FineTuning-SigLIP2-Notebook (.ipynb)

This flexibility supports experimentation in constrained or domain-specific settings, where standard test annotations may not exist.
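For the train-only scenario, reserving part of the training set for validation can be sketched in plain Python (the notebook itself works with the datasets library; this index-level sketch only illustrates the idea, and the 10% holdout is an arbitrary choice):

```python
import random

def holdout_split(n_examples: int, val_fraction: float = 0.1, seed: int = 0):
    """Shuffle example indices and reserve a fraction for validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # deterministic, seeded shuffle
    n_val = int(n_examples * val_fraction)
    return indices[n_val:], indices[:n_val]  # (train indices, validation indices)

train_idx, val_idx = holdout_split(1000)
print(len(train_idx), len(val_idx))  # 900 100
```

The same indices can then be used to select rows from the actual dataset, so the validation examples never appear in training.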
prithivMLmods posted an update 27 days ago
Dropping the general-purpose reasoning dataset Poseidon-Reasoning-5M, which supports general thought processes, math, and science — featuring a diverse mixture of domains 🌊 : prithivMLmods/Poseidon-Reasoning-5M

from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Poseidon-Reasoning-5M", split="data")

The compact version is as follows — Poseidon-Reasoning-Mini-300K : prithivMLmods/Poseidon-Reasoning-Mini-300K


from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Poseidon-Reasoning-Mini-300K", split="train")


Collection : prithivMLmods/poseidon-reasoning-6879ca98e118b307c781a9ba
openfree posted an update 29 days ago
🎯 AGI NOVEL Generator: The First Step Toward True AI Creativity

openfree/AGI-NOVEL

Can AI Write a 100,000-Word Novel?
What's the ultimate test for AGI (Artificial General Intelligence)? Calculation? Logic? Or creativity?
We tackled the hardest creative challenge: A single AI writing a full-length novel with consistent voice from beginning to end.

🚀 Core Innovations

Single Writer System: Not fragmented texts from multiple AIs, but a genuine novel by one author
Immediate Critique System: Real-time literary critique and revision for each part
170 Quadrillion Themes: Infinite creative possibilities (4.6 trillion years at 100 novels/day!)
Philosophical Depth: Nobel Prize-level existential exploration and social insight

🎲 Infinite Possibilities
"The day my father died, I discovered he had another family he'd hidden all his life."
One random click generates a powerful opening sentence and a completely new story begins.
📊 Technical Achievements

8,000-word novella auto-generation (approximately 20 minutes)
10 organically structured parts: Perfect narrative arc from introduction to resolution
Real-time progress tracking: Session recovery for uninterrupted creation
DOCX/TXT export: Korean standard book format (152x225mm) support

🌟 Journey Toward AGI
This project goes beyond simple text generation. Sustained memory, causal reasoning, emotional nuance, ethical self-censorship, originality - it tests all capabilities required for AGI.
Experience it now! Your unique story awaits.
fantaxy posted an update 30 days ago
# 🚀 AGI Turing Test Leaderboard: Evaluating AI's Novel Writing Abilities!

## 🤔 Can Machines Truly Create?

In 2025, we've reached an era where we question whether AI can transcend being mere tools to become genuine creators. The **AGI Turing Test Leaderboard** is a revolutionary evaluation system that measures AI's true intelligence through long-form fiction writing!

fantaxy/AGI-LEADERBOARD

## 📚 Why Novel Writing?

Novel writing is humanity's most complex cognitive task. Maintaining coherent worldbuilding across tens of thousands of words, creating multidimensional characters, crafting narratives that move the human heart—all demanding a **symphony of cognitive abilities**! 🎭

## 🏆 10-Tier Literary Evaluation System

**10.0** ✨ Perfect Literary Achievement (Theoretical Apex)
**9.1** 🏅 Nobel Prize Level (*Beloved*, *Never Let Me Go*)
**8.1** 📖 Timeless Classic (*1984*, *Pride and Prejudice*)
**7.1** 🌍 Global Bestseller (*Harry Potter*, *The Da Vinci Code*)
**6.1** 🎖️ International Literary Award (*The Handmaid's Tale*, *Underground Railroad*)
**5.1** ✍️ Professional Writer Level (Academy Award Screenplays)
**0.0** ❌ Plagiarism/Human Work (Disqualified)

## 🎯 Current Record: 6.5 Points!

The recently submitted work "**Commotion as a meteorite crashes through the roof**" achieved **6.5 points**!

### 📊 Evaluation Breakdown
- **Base Score**: 6.1 points (International Literary Award level)
- **Volume Bonus**: +0.4 points (9,000 words)
- **Final Score**: 6.5/10 points

This work demonstrates quality comparable to major international literary award winners like the Booker Prize or Nebula Award!

## 🤝 Join This Historic Experiment!

The current record stands at 6.5 points. **Can your AI surpass this?**

This platform is more than an evaluation tool. It's a grand experiment in understanding creativity itself! We welcome submissions at all levels.

https://huggingface.co/blog/fantaxy/agi-leaderboard
prithivMLmods posted an update about 1 month ago
Open Omega Ω (Forge, Atom, Explora):
A Fusion of Math, Science, and Coding 🧪🤗

Datasets :
⌯⌲ Open-Omega-Forge-1M [Mathematics, Coding, and Science]: prithivMLmods/Open-Omega-Forge-1M
⌯⌲ Open-Omega-Atom-1.5M [Mathematics and Science]: prithivMLmods/Open-Omega-Atom-1.5M
⌯⌲ Open-Omega-Explora-2.5M [Forge + Atom]: prithivMLmods/Open-Omega-Explora-2.5M
⌯⌲ Others [Subordinate portion] - Curated and blended modular dataset.

Models :
> Omega-Qwen3-Atom-8B : prithivMLmods/Omega-Qwen3-Atom-8B
> Omega-Qwen2.5-Coder-3B : prithivMLmods/Omega-Qwen2.5-Coder-3B

Dataset Collection: prithivMLmods/open-omega-a-fusion-of-math-science-and-coding-68756c37769fa39c4055cc0e

For more information, refer to the dataset cards.

prithivMLmods posted an update about 1 month ago
Excited to bring the new models that are performing exceptionally well in document OCR, image captioning, and visual understanding tasks. Megalodon-OCR and Perseus-Doc-VL have both demonstrated significant improvements across key areas. You can explore live demos on Hugging Face Spaces to compare their performance with other top-tier models available on the hub. 🤗📄

Models & Spaces :
> Megalodon-OCR (3B) : prithivMLmods/Megalodon-OCR-Sync-0713
> Perseus-Doc-vl (7B): prithivMLmods/Perseus-Doc-vl-0712
> Doc-VLMs-OCR : https://huggingface.co/spaces/prithivMLmods/Multimodal-VLM-OCR
> core-OCR : prithivMLmods/core-OCR


Datasets Caption Mix :
> Corvus-OCR-Caption-Mix : prithivMLmods/Corvus-OCR-Caption-Mix
> Corvus-OCR-Caption-Mini-Mix : prithivMLmods/Corvus-OCR-Caption-Mini-Mix

Collections :
> Corvus OCR Caption Mix: prithivMLmods/corvus-ocr-caption-mix-687349bfaceffbd10976f0cc
> Captioning / OCR / DocTable : prithivMLmods/captioning-ocr-doctable-687382e1da822008bb5c06f2

GitHub :
> OCR-ReportLab : https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab/blob/main/Megalodon-OCR-Sync-0713-ColabNotebook/Megalodon_OCR_Sync_0713_ReportLab.ipynb

Others Spaces :
> Multimodal-OCR : prithivMLmods/Multimodal-OCR
> Multimodal-VLMs : https://huggingface.co/spaces/prithivMLmods/Multimodal-OCR-Outpost
> Multimodal-OCR2 : prithivMLmods/Multimodal-OCR2
> Florence-2-Image-Caption : prithivMLmods/Florence-2-Image-Caption
> VisionScope-R2 : prithivMLmods/VisionScope-R2
> DocScope-R1 : prithivMLmods/DocScope-R1

To learn more, visit the respective model card.