AI & ML interests

None defined yet.

Recent Activity

fdaudens posted an update about 7 hours ago
Want to learn to build an AI Agent? I put together a cookbook for creating your own news research agent with OpenAI GPT-OSS:

- Searches headlines & specific sites
- Pulls full articles when you need depth
- Summarizes with clickable sources
- Runs in a simple Gradio chat UI
- No GPU, no local setup — just open-weight GPT-OSS models via Hugging Face

If you’ve been wanting to try agents but weren’t sure where to start, this is an end-to-end example you can fork, run, and adapt.

Full guide + code: https://huggingface.co/blog/fdaudens/openai-gpt-oss-agent-inference-providers
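
This isn't the cookbook's full agent (that one adds headline search and article-fetching tools), but here's a minimal sketch of the same plumbing, assuming the standard huggingface_hub and Gradio APIs: an InferenceClient pointed at the open-weight model, wrapped in a chat UI. The system prompt is illustrative.

```python
import gradio as gr
from huggingface_hub import InferenceClient

# Open-weight GPT-OSS served through Hugging Face inference providers; no local GPU needed
client = InferenceClient(model="openai/gpt-oss-20b")

def respond(message, history):
    # With type="messages", history arrives as OpenAI-style {"role", "content"} dicts
    messages = [{"role": "system", "content": "You are a news research assistant. Cite your sources."}]
    messages += history + [{"role": "user", "content": message}]
    reply = client.chat_completion(messages=messages, max_tokens=512)
    return reply.choices[0].message.content

gr.ChatInterface(respond, type="messages").launch()
```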
prithivMLmods posted an update 1 day ago
Try Liquid AI's all-new multimodal models, LFM2-VL-1.6B & LFM2-VL-450M! The demo ships with a Gradio UI and ReportLab support, and both models run on a T4 GPU!

↗ LFM2-VL-1.6B-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-1.6B-LiquidAI/LFM2-VL-1.6B_ReportLab.ipynb

↗ LFM2-VL-450M-LiquidAI : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/LFM2-VL-450M-LiquidAI/LFM2-VL-450M_ReportLab.ipynb
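
As a quick orientation before opening the notebooks, here's a rough loading sketch, assuming the models follow the standard transformers image-text-to-text chat pattern (the notebooks above are the authoritative reference; the image path is a placeholder):

```python
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "LiquidAI/LFM2-VL-1.6B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto", torch_dtype="bfloat16")

conversation = [{"role": "user", "content": [
    {"type": "image", "image": Image.open("photo.jpg")},  # placeholder image
    {"type": "text", "text": "Describe this image."},
]}]
inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```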

To learn more, visit the Multimodal Outpost Notebooks!
ZennyKenny posted an update 1 day ago
fdaudens posted an update 2 days ago
What can OpenAI’s new open models do with the news? I built a News Agent to find out.

It can answer questions about the news in real time, and every answer comes with original source links so you can dive deeper.

Ask it things like:
- "What are the top news stories today?"
- "What's the latest on artificial intelligence?"
- Follow-up questions on specific stories

Runs with Hugging Face inference providers, letting you compare results from the OpenAI 20B and 120B models.

So far, I’m quite impressed by the capabilities of even the smaller 20B model. Definitely not a perfect project, but curious to hear your thoughts!

fdaudens/gpt-oss-news-agent
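
If you just want to see the provider plumbing behind the comparison, here's a bare-bones sketch assuming the standard InferenceClient chat API. The agent itself adds live search on top, so raw answers here come only from training data:

```python
from huggingface_hub import InferenceClient

question = "What are the top news stories today?"

# Same prompt against both open-weight sizes, routed through inference providers
for model_id in ("openai/gpt-oss-20b", "openai/gpt-oss-120b"):
    client = InferenceClient(model=model_id)
    reply = client.chat_completion(
        messages=[{"role": "user", "content": question}],
        max_tokens=300,
    )
    print(f"--- {model_id} ---\n{reply.choices[0].message.content}\n")
```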
fdaudens posted an update 3 days ago
OpenAI’s GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1’s first-week numbers.

For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.

It’s also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.

Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K

The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don't include calls to the models via inference providers, so real usage is likely even higher, especially for the 120B version.)
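
If you want to pull rankings like this yourself, here's a small sketch with the Hub API (counts move constantly, so expect different figures than the snapshot above):

```python
from huggingface_hub import HfApi

api = HfApi()
# Top Hub models matching "gpt-oss", sorted by all-time downloads, descending
for m in api.list_models(search="gpt-oss", sort="downloads", direction=-1, limit=5):
    print(f"{m.id}: {m.downloads:,} downloads")
```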

Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀
prithivMLmods posted an update 5 days ago
On the verge of releasing Poseidon-Reasoning-5M, a dataset built for general thought processes, mathematics, and science across a diverse mixture of domains, I'm also dropping the Gargantua-R1-Compact dataset, a collection of over six million high-quality reasoning QA-pair traces. 🤗🚀

✦ Gargantua-R1-Compact : prithivMLmods/Gargantua-R1-Compact

```python
from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Gargantua-R1-Compact", split="train")
```

Additionally, I’m adding the mini version of Gargantua — the Gargantua-R1-Wee : prithivMLmods/Gargantua-R1-Wee

```python
from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Gargantua-R1-Wee", split="train")
```

The composition spans:
- 73.93% core mathematical reasoning: problems, proofs, and computational challenges
- 12.11% diverse scientific domains: physics, chemistry, biology, and interdisciplinary topics
- 11.35% competitive coding: algorithms and data structures
- 1.37% academic science: research-level methodology
- 0.95% creative and analytical reasoning: logic puzzles and problem-solving tasks
- 0.25% specialized technical areas: MLOps, LLMs, diffusion models, and CUDA
- 0.06% data from graphs and charts converted into structured JSON formats

Designed with both rich contextual depth and formal structural clarity, Gargantua-R1-Compact is an optimal resource for advancing research in symbolic reasoning, interpretability, and high-precision question answering in mathematical domains.

✦ Collection : prithivMLmods/gargantua-r1-mod-6896bfd7834e82b89ad2b38b


To learn more, visit the respective dataset cards!
prithivMLmods posted an update 6 days ago
I've added the demo of the openbmb/MiniCPM-V-4 model to the Hugging Face Space:
prithivMLmods/Multimodal-VLM-Thinking

✨ MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series, built on SigLIP2-400M and MiniCPM4-3B with a total of 4.1B parameters. It inherits the strong single-image, multi-image, and video understanding performance of MiniCPM-V 2.6 with significantly improved efficiency.

✨ With only 4.1B parameters, MiniCPM-V 4.0 achieves an average score of 69.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. This performance surpasses GPT-4.1-mini-20250414, MiniCPM-V 2.6 (8.1B parameters, OpenCompass 65.2), and Qwen2.5-VL-3B-Instruct (3.8B parameters, OpenCompass 64.5). It also shows good performance in multi-image and video understanding.
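
A rough usage sketch, assuming MiniCPM-V 4.0 keeps the series' custom chat helper exposed via trust_remote_code (check the model card for the exact signature; the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-V-4"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.bfloat16).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")  # placeholder image
msgs = [{"role": "user", "content": [image, "Describe this image."]}]
print(model.chat(msgs=msgs, tokenizer=tokenizer))  # chat() helper per the model card
```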

The community GPU grant was given by Hugging Face — special thanks to them. 🤗🚀

To learn more, visit the respective model card!
fdaudens posted an update 9 days ago
Well, it took just 2 hours for openai/gpt-oss-120b to hit #1 on Hugging Face. Don’t remember seeing anything rise that fast!
alielfilali01 posted an update 9 days ago
prithivMLmods posted an update 10 days ago
Qwen Image – The Latest Image Generation Model🔥

Below are some samples generated using the Qwen Image Diffusion Model. Qwen-Image, a 20B MMDiT model for next-generation text-to-image generation, preserves typographic details, layout coherence, and contextual harmony with impressive accuracy. It is especially strong at creating stunning graphic posters with native text. The model is now open-source. [ 𝚀𝚠𝚎𝚗-𝙸𝚖𝚊𝚐𝚎 : Qwen/Qwen-Image ]

⤷ Try the Qwen Image demo here: prithivMLmods/Qwen-Image-Diffusion

⤷ Qwen-Image Technical Report (2508.02324)
⤷ Qwen Image [GitHub] : https://github.com/QwenLM/Qwen-Image

Even more impressively, it demonstrates a strong ability to understand images. The model supports a wide range of vision-related tasks such as object detection, semantic segmentation, depth and edge (Canny) estimation, novel view synthesis, and image super-resolution. While each task is technically distinct, they can all be viewed as advanced forms of intelligent image editing driven by deep visual understanding. Collectively, these capabilities position Qwen-Image as more than just a tool for generating appealing visuals; it serves as a versatile foundation model for intelligent visual creation and transformation, seamlessly blending language, layout, and imagery.

Qwen-Image uses a dual-stream MMDiT architecture with a frozen Qwen2.5-VL, VAE encoder, RMSNorm for QK-Norm, LayerNorm elsewhere, and a custom MSRoPE scheme for joint image-text positional encoding.
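
A minimal text-to-image sketch, assuming Qwen-Image is exposed through the generic diffusers loader (the model card documents further generation knobs):

```python
import torch
from diffusers import DiffusionPipeline

# Assumes Qwen-Image loads via the generic diffusers pipeline entry point
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Native-text posters are a highlighted strength, so prompt with explicit typography
image = pipe(
    prompt='A minimalist graphic poster with the words "OPEN WEIGHTS" in bold red type',
    num_inference_steps=50,
).images[0]
image.save("poster.png")
```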

To learn more, visit the respective model card!
prithivMLmods posted an update 13 days ago
Introducing Camel-Doc-OCR-080125 (v2), a document content-structure retrieval VLM designed for content extraction and summarization. This is the second model in the Camel-Doc-OCR VLM series, following Camel-Doc-OCR-062825 (v1). The new version fixes formal table reconstruction issues in both English and Chinese and achieves strong performance on long-context inference.🤗🐪

⤷ Camel-Doc-OCR(v2) : prithivMLmods/Camel-Doc-OCR-080125
⤷ Camel-Doc-OCR(v1) : prithivMLmods/Camel-Doc-OCR-062825
⤷ Demo : prithivMLmods/core-OCR

Multimodal Model Collections and Spaces:

➝ Camel-Doc-OCR : prithivMLmods/camel-doc-ocr-080125-688c0c61c5dba648756f31f8
➝ Vision-Language (VLr) : prithivMLmods/vision-language-for-reasoning-vlr-6889b3f45917352b5e3a6f7a
➝ Multimodal Spaces : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
➝ Multimodal VLMs : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
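
A hedged inference sketch using the generic transformers image-text-to-text pipeline; the model card's own snippet is authoritative, and the image URL is a placeholder:

```python
from transformers import pipeline

# Generic multimodal chat pipeline; assumes the checkpoint is supported by transformers
ocr = pipeline("image-text-to-text", model="prithivMLmods/Camel-Doc-OCR-080125", device_map="auto")

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/scanned_page.png"},  # placeholder document
    {"type": "text", "text": "Reconstruct the tables on this page as markdown."},
]}]
out = ocr(text=messages, max_new_tokens=1024)
print(out[0]["generated_text"][-1]["content"])  # last turn holds the model's reply
```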

To learn more, visit the respective model card!
prithivMLmods posted an update 15 days ago
Excited to bring you Lumian-VLR-7B-Thinking, an explicitly grounded experimental reasoning model built on top of Qwen2.5-VL, featuring reasoning-aware trajectories with enhanced spatial perception. Alongside it, we've added a demo for the model and brought some of the latest and most interesting models on the Hub into the spaces to make full use of the remaining resources.

✨ Multimodal-VLM-Thinking : prithivMLmods/Multimodal-VLM-Thinking
✨ Multimodal-VLM-OCR : https://huggingface.co/spaces/prithivMLmods/Multimodal-VLM-OCR

✦ Models used in these spaces:

✨ Lumian-VLR-7B-Thinking : prithivMLmods/Lumian-VLR-7B-Thinking
✨ Enesidaon-VLR-7B-no-Thinking : prithivMLmods/Enesidaon-VLR-7B-no-Thinking
✨ GLM-4.1V-9B-Thinking : zai-org/GLM-4.1V-9B-Thinking
✨ DREX-062225-exp : prithivMLmods/DREX-062225-exp & more ...

✦ Multimodal Model Collections and Spaces:

✨ Vision-Language (VLr) : prithivMLmods/vision-language-for-reasoning-vlr-6889b3f45917352b5e3a6f7a
✨ Multimodal Spaces : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
✨ Multimodal VLMs : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027

To learn more, visit the respective model card!
prithivMLmods posted an update 18 days ago
Explore OCR, Captioning, and Visual Understanding with Cutting-Edge Models on Hugging Face. 🤗🧪

I’ve put together a collection of Google Colab notebooks to experiment with some of the most exciting models available on the Hugging Face Hub focused on OCR, image captioning, and visual understanding tasks. [Image-to-Text] / [Image-Text-to-Text]

> 📖 OCR-ReportLab-Notebooks : prithivMLmods/OCR-ReportLab-Notebooks

These notebooks are built for quick prototyping and run on free T4 GPUs, making them perfect for experimentation, testing ideas, or just exploring what’s possible with modern vision-language models.

Note: The experimental notebooks are compiled with models that fit within the T4 GPU (free-tier) limits. More models along with their notebooks will be added over time.
prithivMLmods posted an update 21 days ago
Excited to introduce the new experimental model "Qwen2.5-VL-7B-Abliterated-Caption-it", which performs exceptionally well on image captioning tasks. This variant is specifically tailored for abliterated and uncensored image captioning. It is designed to generate highly detailed and descriptive captions across a broad range of visual categories, including images with complex, sensitive, or nuanced content, while handling varying aspect ratios and resolutions.🧪🤗

✨ Try the demo here : https://huggingface.co/spaces/prithivMLmods/Qwen2.5-VL
✨ Qwen2.5-VL-7B-Abliterated-Caption-it : prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it
✨ Multimodal VLMs : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
✨ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the respective model card!
prithivMLmods posted an update 22 days ago
olmOCR [Allen AI] just got an upgrade! 📈🧑‍🍳

allenai/olmOCR-7B-0725 is fine-tuned with allenai/olmOCR-mix-0225 on top of Qwen/Qwen2.5-VL-7B-Instruct, pushing the boundaries of OCR technology. It takes a single document image as input, with the longest side resized to 1288 pixels, and offers a high-quality, openly available approach to OCR for parsing PDFs and other complex documents.

Try the demo here: prithivMLmods/Multimodal-OCR

✨ Model: allenai/olmOCR-7B-0725
✨ Model [fp8]: allenai/olmOCR-7B-0725-FP8
✨ Multimodal Implementations Space Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
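
Since the model expects the longest side at 1288 pixels, here's a small preprocessing helper in plain PIL (the filename is a placeholder):

```python
from PIL import Image

def resize_longest_side(img: Image.Image, target: int = 1288) -> Image.Image:
    """Scale a page image so its longest side is `target` px, preserving aspect ratio."""
    scale = target / max(img.size)
    return img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)

page = resize_longest_side(Image.open("scanned_page.png"))  # placeholder input image
```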

To learn more, visit the respective model card!
AtAndDev posted an update 23 days ago
Qwen 3 Coder is a personal attack on K2, and I love it.
It achieves near-SOTA on LCB while not having reasoning.
Finally people are understanding that reasoning isn't necessary for high benchmark scores...

Qwen ftw!

DECENTRALIZE DECENTRALIZE DECENTRALIZE
davidberenstein1957 posted an update 24 days ago
prithivMLmods posted an update 25 days ago
Upgraded the step-by-step notebook for fine-tuning SigLIP2 on domain-specific image classification tasks. The notebook supports both datasets with predefined train/test splits and those with only a train split, making it suitable for low-resource, custom, and real-world classification scenarios. 📢👉

➺ FineTuning-SigLIP2-Notebook : prithivMLmods/FineTuning-SigLIP2-Notebook

➺ GitHub : https://github.com/PRITHIVSAKTHIUR/FineTuning-SigLIP-2

➺ In the first scenario, datasets include predefined train and test splits, enabling conventional supervised learning and generalization evaluation : prithivMLmods/FineTuning-SigLIP2-Notebook (.ipynb)

➺ In the second scenario, only a training split is available; in such cases, the training set is either partially reserved for validation or reused entirely for evaluation : prithivMLmods/FineTuning-SigLIP2-Notebook (.ipynb)

This flexibility supports experimentation in constrained or domain-specific settings, where standard test annotations may not exist.
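
For the train-only case, a minimal sketch of reserving part of the training split for evaluation with the datasets library (the repo id is a placeholder):

```python
from datasets import load_dataset

# "your-org/your-dataset" stands in for any dataset shipped with only a train split
ds = load_dataset("your-org/your-dataset", split="train")

# Reserve 20% of the training data for evaluation, as in the second scenario
splits = ds.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(len(train_ds), len(eval_ds))
```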