
Victor Mustar PRO

victor

AI & ML interests

Building the UX of this website

Recent Activity

liked a model about 1 hour ago
lodestones/Chroma
liked a model about 3 hours ago
stepfun-ai/Step1X-3D
liked a Space about 3 hours ago
stepfun-ai/Step1X-3D

Organizations

Hugging Face, Google, Competitions, Safetensors, 21 RNN, Spaces-explorers, Text Generation Inference, CVPR Demo Track, Spaces Examples, Hugging Chat, Webhooks Explorers (BETA), lora concepts library, Scanned Tokens, Huggingface Projects, hf admins, Hugging Face OSS Metrics, Stable Diffusion Dreambooth Concepts Library, Core ML Projects, temp-org, Blog-explorers, Mustarz, Open LLM Leaderboard, Enterprise Explorers, The Collectionists, ZeroGPU Explorers, Hugging Face Tools, TstOrg141, Stable Video benchmark, Social Post Explorers, Dev Mode Explorers, LLHF, SLLHF, Self-serve FTW, Inference Explorers, hf-inference

victor's activity

reacted to RiverZ's post with 🤗 9 days ago
🚀 Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer~

🎨 Daily Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔓 Code is now open source!
🔥 Huggingface DEMO: RiverZ/ICEdit
🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface: sanaka87/ICEdit-MoE-LoRA
📄 arXiv Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)


🔥 Why it's cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data of existing methods, making it extremely efficient.
- Beats several commercial models on background preservation, ID control, and consistency.
- Open-source, low-cost, faster, and stronger: think of it as the "DeepSeek of image editing" 👀

We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video; happy to send it your way!
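If you want to try the LoRA outside the demo app, something along these lines should work: a minimal sketch, assuming the weights target FLUX.1-Fill-dev via 🤗 Diffusers (the base model choice, the full-image mask, and the step count here are our assumptions, not an official snippet; the repo's gradio_demo.py linked above has the exact setup):

import torch
from PIL import Image
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

# Assumption: the ICEdit LoRA plugs into the FLUX.1-Fill-dev pipeline.
pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("sanaka87/ICEdit-MoE-LoRA")

image = load_image("input.png")  # placeholder path: the image you want to edit
mask = Image.new("L", image.size, 255)  # assumption: edit the whole image

result = pipe(
    prompt="make the sky look like a sunset",
    image=image,
    mask_image=mask,
    num_inference_steps=28,  # illustrative value
).images[0]
result.save("edited.png")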
reacted to prithivMLmods's post with 🔥 12 days ago
Dropping downstream-task models, post-trained with newly initialized parameters and weights for domain-specific image classification, based on the SigLIP2 models Patch-16/224, Patch-16/256, and Patch-32/256. For more details, please refer to the respective model cards: 🤗

+ watermark detection : prithivMLmods/Watermark-Detection-SigLIP2
+ resisc45 : prithivMLmods/RESISC45-SigLIP2
+ pacs dg : prithivMLmods/PACS-DG-SigLIP2
+ 3d printed or not : prithivMLmods/3D-Printed-Or-Not-SigLIP2
+ formula or text : prithivMLmods/Formula-Text-Detection

Categorizing Unsafe Content:
- explicit content patch16 256 : prithivMLmods/siglip2-x256-explicit-content
- explicit content patch32 256 : prithivMLmods/siglip2-x256p32-explicit-content

Collection:
> SigLIP2 Content Filters 042025 Final : https://huggingface.co/collections/prithivMLmods/siglip2-content-filters-04202-final-680fe4aa1a9d589bf2c915ff
> SigLIP2 : google/siglip2-67b5dcef38c175486e240107
> SigLIP2 Multilingual Vision-Language Encoders : https://arxiv.org/pdf/2502.14786
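Since these are standard image-classification checkpoints, a 🤗 Transformers pipeline one-liner should be enough to try one out (a minimal sketch; the label set comes from each model card, and "photo.jpg" is a placeholder):

from transformers import pipeline

# Sketch: any of the checkpoints above should drop into the
# image-classification pipeline; labels depend on the model card.
clf = pipeline("image-classification", model="prithivMLmods/Watermark-Detection-SigLIP2")
print(clf("photo.jpg"))  # list of {label, score} dicts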
reacted to MikeDoes's post with 🚀 12 days ago
PII-Masking-1M Final Day (7/7)! 🚀 Today, we unveil 5 NEW Enterprise PII (E-PII) Dataset PREVIEWS!

Standard PII tools often miss sensitive *business* data. That's why we built E-PII previews for the data that powers your operations and compliance needs.

Get a first look (representing 100,000 samples each!) into datasets designed for real-world enterprise security across these categories:

๐Ÿฅ **PHI Preview**: For Healthcare Data
๐Ÿ’ณ **PFI Preview:** For Financial Data
๐Ÿข **PWI Preview:** For Workplace Data
๐Ÿ’ป **PDI Preview:** For Digital Activity Data
๐Ÿ“ **PLI Preview:** For Location Data


That wraps up our 7-day #PIIMasking1M announcement series! HUGE thanks for following along and for your engagement.
Explore ALL our releases, including these E-PII previews, in the Ai4Privacy Hugging Face Collection & show some love ❤️ if you find them useful!
🔗 Visit the Collection: https://huggingface.co/ai4privacy

Let's keep building safer AI, together!
replied to onekq's post 12 days ago

It's trained to think by default, probably with the idea that you use /no_think selectively for messages in a conversation where you don't want it to think :) (/no_think is probably more a product feature than something meant to be used as the default.)
reacted to merterbak's post with 🔥 13 days ago
Qwen3 models released 🔥
It offers 2 MoE and 6 dense models with the following parameter sizes: 0.6B, 1.7B, 4B, 8B, 14B, 30B (MoE), 32B, and 235B (MoE).
Models: Qwen/qwen3-67dd247413f0e2e4f653967f
Blog: https://qwenlm.github.io/blog/qwen3/
Demo: Qwen/Qwen3-Demo
GitHub: https://github.com/QwenLM/Qwen3

✅ Pre-trained on 119 languages and dialects (36 trillion tokens) with strong translation and instruction-following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
✅ Qwen3 dense models match the performance of larger Qwen2.5 models: for example, Qwen3-1.7B/4B/8B/14B/32B perform like Qwen2.5-3B/7B/14B/32B/72B.
✅ Three-stage pretraining:
• Stage 1: general language learning and knowledge building
• Stage 2: reasoning boost with STEM, coding, and logic skills
• Stage 3: long-context training
✅ Supports MCP
✅ Strong agent skills
✅ Supports seamless switching between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) inside the chat template; see the sketch below.
✅ Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.
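A minimal sketch of that thinking/non-thinking switch, assuming the enable_thinking flag documented on the Qwen3 model cards (the model size and generation settings here are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"  # illustrative choice among the released sizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a haiku about MoE models."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # True for hard tasks (math/coding), False for fast chat
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))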
reacted to AdinaY's post with 🚀 13 days ago
DeepSeek, Alibaba, Skywork, Xiaomi, ByteDance...
And those are just some of the companies from the Chinese community that released open models in April 🤯

zh-ai-community/april-2025-open-releases-from-the-chinese-community-67ea699965f6e4c135cab10f

🎬 Video
> MAGI-1 by SandAI
> SkyReels-A2 & SkyReels-V2 by Skywork
> Wan2.1-FLF2V by Alibaba-Wan

🎨 Image
> HiDream-I1 by Vivago AI
> Kimi-VL by Moonshot AI
> InstantCharacter by InstantX & Tencent-Hunyuan
> Step1X-Edit by StepFun
> EasyControl by Shanghai Jiaotong University

🧠 Reasoning
> MiMo by Xiaomi
> Skywork-R1V 2.0 by Skywork
> ChatTS by ByteDance
> Kimina by Moonshot AI & Numina
> GLM-Z1 by Zhipu AI
> Skywork OR1 by Skywork
> Kimi-VL-Thinking by Moonshot AI

🔊 Audio
> Kimi-Audio by Moonshot AI
> IndexTTS by BiliBili
> MegaTTS3 by ByteDance
> Dolphin by DataOceanAI

🔢 Math
> DeepSeek Prover V2 by DeepSeek

๐ŸŒ LLM
> Qwen by Alibaba-Qwen
> InternVL3 by Shanghai AI lab
> Ernie4.5 (demo) by Baidu

📊 Dataset
> PHYBench by Eureka-Lab
> ChildMandarin & Seniortalk by BAAI

Please feel free to add if I missed anything!
reacted to Xenova's post with 🔥 13 days ago
Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗

Check it out! 👇
Demo: onnx-community/model-explorer
Dataset: onnx-community/model-explorer
Source code: https://github.com/xenova/model-explorer
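Since the graphs ship as a regular dataset repo, pulling them into your own project should be a one-liner with 🤗 Datasets (a sketch; we haven't checked the repo's actual configs or column names, so inspect the result for the schema):

from datasets import load_dataset

# Sketch: load the released graph dataset and inspect its structure.
ds = load_dataset("onnx-community/model-explorer")
print(ds)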
reacted to abidlabs's post with 🤗🚀🔥 13 days ago
Hi folks! Excited to share a new feature from the Gradio team along with a tutorial.

If you don't already know, Gradio is an open-source Python library used to build interfaces for machine learning models. Beyond just creating UIs, Gradio also exposes API capabilities, and now Gradio apps can be launched as Model Context Protocol (MCP) servers for LLMs.

If you already know how to use Gradio, there are only two additional things you need to do:
* Add standard docstrings to your function (these will be used to generate the descriptions for your tools for the LLM)
* Set mcp_server=True in launch()


Here's a complete example (make sure you already have the latest version of Gradio installed):

import gradio as gr

def letter_counter(word, letter):
    """Count the occurrences of a specific letter in a word.

    Args:
        word: The word or phrase to analyze
        letter: The letter to count occurrences of

    Returns:
        The number of times the letter appears in the word
    """
    return word.lower().count(letter.lower())

# The docstring above is what gets exposed to the LLM as the tool description.
demo = gr.Interface(
    fn=letter_counter,
    inputs=["text", "text"],
    outputs="number",
    title="Letter Counter",
    description="Count how many times a letter appears in a word",
)

# mcp_server=True starts an MCP server alongside the regular web app.
demo.launch(mcp_server=True)



This is a very simple example, but you can add the ability to generate Ghibli images or speak emotions to any LLM that supports MCP. Once you have an MCP server running locally, you can copy-paste the same app to host it on [Hugging Face Spaces](https://huggingface.co/spaces/) as well.
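As a quick sanity check that the app is actually serving, you can call the same function over Gradio's regular API with gradio_client (a sketch: api_name="/predict" is the usual default for a single-function Interface, but confirm it on the app's "Use via API" page):

from gradio_client import Client

# Sketch: call the locally running Letter Counter app's API endpoint.
client = Client("http://127.0.0.1:7860/")
print(client.predict("strawberry", "r", api_name="/predict"))  # 3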

All free and open-source of course! Full tutorial: https://www.gradio.app/guides/building-mcp-server-with-gradio
reacted to abhi1nandy2's post with 👍 13 days ago
Little late to the party 🥳, but I'm thrilled to finally share our AAAI 2025-accepted work! Check out the project homepage here: https://midas-pro-mds.github.io/

🚀 What's MiDAS-PRo all about?
We tackle the challenge of coherent, non-redundant multi-document summarization with source attribution via a three-stage LLM-based pipeline:

1. Plan a hierarchical document organization
2. Reason by generating entities/topics
3. Summarize the collection into a cohesive narrative

All "planning" and "reasoning" steps are framed as code-completion tasks, guided by graph attention network-based in-context example selection, boosting both automated and human evaluation scores!



🔗 Resources

- Paper: https://ojs.aaai.org/index.php/AAAI/article/view/34676

- Slides: https://drive.google.com/file/d/1lWqQtHRnpn-g2IQ3guloj_2j8sDm4z0V/view?usp=drivesdk

- Poster: https://drive.google.com/file/d/1EQqgwbcS7xkVx38y0qPvdRh0gQyeOH5u/view?usp=drivesdk

- Video: https://youtube.com/shorts/6ecxLLUpWJE?si=BAluAeP4-_eCmfu7


Big thanks to my mentor and co-author Sambaran Bandyopadhyay at Adobe Research for the guidance (Summer 2024 internship days FTW!) 🙏.

#AAAI2025 #MultiDocumentSummarization #LLM #Research #NLP
reacted to sanaka87's post with 🔥 13 days ago
replied to gtvracer's post 15 days ago
reacted to AdinaY's post with 🔥🔥🔥 15 days ago
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
reacted to jasoncorkill's post with 🚀 15 days ago
🚀 Building Better Evaluations: 32K Image Annotations Now Available

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, collected in under two weeks using the Rapidata Python API.

Rapidata/text-2-image-Rich-Human-Feedback-32k

A few months ago, we published one of our most-liked datasets, with 13K images based on the @data-is-better-together dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.

Rapidata/text-2-image-Rich-Human-Feedback

In the dataset, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].

We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
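If you want to build on these annotations yourself, the datasets load like any other Hub dataset (a sketch; see the dataset cards for the actual configs and schema):

from datasets import load_dataset

# Sketch: load the 32K-image rich-feedback dataset and inspect its schema.
ds = load_dataset("Rapidata/text-2-image-Rich-Human-Feedback-32k")
print(ds)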
reacted to Aurelien-Morgan's post with 👀 15 days ago
The Almighty function-caller

How would you like to build smart GenAI infrastructure? Give your edge agentic system extensive tool memory, and optimize the resources it takes to run a high-performance set of agents.

We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use cases.

Read our full-fledged blog article on this on Hugging Face:
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller
replied to prithivMLmods's post 18 days ago
reacted to prithivMLmods's post with 🔥 18 days ago
Dropping domain-specific downstream image-classification and content-moderation models, including anime image-type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification, along with the datasets. 🔥

╰┈➤ Models:
+ GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet
+ IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet
+ B&W vs Colored : prithivMLmods/BnW-vs-Colored-Detection
+ Anime Image Type : prithivMLmods/Anime-Classification-v1.0
+ Multilabel Portrait : prithivMLmods/Multilabel-Portrait-SigLIP2

╰┈➤ Datasets:
- GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet-16K
- IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet-20K
- BnW vs Colored : prithivMLmods/BnW-vs-Colored-10K
- Multilabel Portrait : prithivMLmods/Multilabel-Portrait-18K

╰┈➤ Collections:
> Multilabel Image Classification Datasets : prithivMLmods/multilabel-image-classification-datasets-6809aa64637f45d4c47fa6ca
> Model Collection : prithivMLmods/siglip2-content-filters-models-v2-68053a958c42ef17a3a3f4d1

Note: The anime scene type dataset is not mentioned in the list because it is private and only accessible to members of the DeepGHS organization.

For raw ZIP files or more information about the datasets, visit: https://www.kaggle.com/prithivsakthiur/datasets
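For raw inference without the pipeline wrapper, the usual SigLIP classification pattern should apply (a sketch assuming these are standard SiglipForImageClassification checkpoints; "room.jpg" is a placeholder path):

import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

model_id = "prithivMLmods/IndoorOutdoorNet"
processor = AutoImageProcessor.from_pretrained(model_id)
model = SiglipForImageClassification.from_pretrained(model_id)

image = Image.open("room.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
# id2label maps the predicted class index back to its label name.
print(model.config.id2label[logits.argmax(-1).item()])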