It's not possible to add .html graphics to Hugging Face blog posts. If you speak French, we really invite you to read the French version here instead, where the graphics display properly. For non-French speakers, we can only include screenshots in this article. Under each of them you will find a link to a dynamic plot.
Please note that in the following, the text within a blue box expresses only our personal reflections.
Finally, it should be noted that on October 9, Elastic announced the acquisition of Jina.ai. We haven't had time to modify the various graphs to take this into account.
Introduction
In this blog post, we analyze the most impactful open-source models in practice. To do so, we focus on a very pragmatic metric: "Which models are downloaded most often on the Hugging Face Hub?". The assumption is that models that are downloaded massively are those used in the real world. This approach is also intended to be fairer to individuals/organizations that do not have a communications department or are not massively followed/liked on the Hub.
TL;DR
The analysis of the 50 most downloaded entities on the Hugging Face Hub (80.22% of total Hub downloads) shows that:
Among all open-source models whose size can be determined (96.94% of the top 50 and 77.76% of the Hub), small models are by far the most downloaded:
92.48% of downloads are for models with fewer than one billion parameters,
86.33% for models with fewer than 500 million parameters,
69.83% for models with fewer than 200 million parameters,
40.17% for models with fewer than 100 million parameters.
Model downloads primarily concern NLP (58.1%), followed by CV at 21.2%, audio at 15.1%, various forms of multimodality at 3.3%, time series at 1.7%, with the rest undetermined due to a lack of correctly annotated metadata.
Text encoders (base models plus their fine-tunings on specific tasks) represent more than 45% of total downloads (i.e., 77.5% of the NLP modality), compared with only 9.5% for decoders (16.5% of the modality) and 3% for encoder-decoders (5% of the modality). Thus, contrary to the hype surrounding these models, LLMs are not downloaded massively in open source. Could it be that their real-world use lies more on the side of private APIs?
English represents more than 79.46% of downloads of models (monolingual or multilingual) using a language (and even 92.85% if we only consider models with a language tag). This language is far ahead of the others. For example, French, which comes in second place, accounts for only 17.48% (20.43% of models with a language tag).
Companies are the largest contributors to open source, accounting for 63.2% of downloads (20 entities out of 50), followed by universities at 20.7% (10 entities), then individuals at 12.1% (16 entities), non-profit organizations at 3.8% (4 entities), and finally hybrid laboratories at 0.3% (1 entity).
The United States is present everywhere, covering all modalities (NLP, vision, audio, time series, multimodalities) and all model sizes (from fewer than 5M parameters to tens/hundreds of billions). Americans are notably driven by their open-source companies (crushing all competition in this segment) but also have strengths in all types of existing organizations (excluding hybrid laboratories), being represented 18 times in this top 50. Europe (notably Germany, France, and the United Kingdom) is also positioned in all types of existing organizations outside of hybrid laboratories (present 20 times) but stands out for the impact of its specialized universities on small models (<200M parameters). It is also present in all modalities except time series. China (represented by five entities) has a strong presence in the large open-source model segment (31.8% vs. 43.1% for the United States and 24% for Europe on models with more than 7.5 billion parameters). However, it is poorly positioned in all other model size categories (only 130 million downloads of models with fewer than 100 million parameters, compared with 7.05 billion for the United States and 5.3 billion for Europe). Its lack of positioning in vision (barely 4 million downloads) and audio (0 downloads) also penalizes it. These are not areas in which it is known to be lagging behind, but it clearly does not currently publish open-source content in these areas on Hugging Face (a platform that is not accessible in the country). It dominates the non-profit sector and is the only player in the university/business hybrid laboratory sector. Finally, the other countries in this top 50 only benefit from one specialized player in a given modality.
Data collected
The data shown in this article was collected on October 1, 2025. After identifying the 50 open-source entities with the most downloads, we collected all the models associated with them.
This represents 72,423 models out of the 2,126,833 hosted on the Hub, or 3.41% of the total. These accounts represent exactly 36,450,707,797 downloads out of a total of 45,438,836,957, or 80.22%.
For each model, the pipeline and language tags were also collected when available. Similarly, the size was estimated from the .safetensors file.
When information was missing (no tags or no model size in particular), we manually corrected the data by consulting the model card or the associated publication for the 1,000 most downloaded open-source models. These alone represent 77.89% of all Hub downloads and 97.10% of those of the 50 entities analyzed.
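As an illustration, here is a minimal sketch of how such a collection could be performed with the huggingface_hub library (the exact expand fields depend on the library version, and the language-tag extraction below is a crude heuristic):

```python
import pandas as pd
from huggingface_hub import HfApi

api = HfApi()
rows = []
# Example with the accounts making up the Google entity (see the next section)
for account in ["google", "google-bert", "google-t5", "albert"]:
    for m in api.list_models(author=account,
                             expand=["downloadsAllTime", "safetensors",
                                     "pipeline_tag", "tags"]):
        rows.append({
            "account": account,
            "model": m.id,
            "downloads": m.downloads_all_time,
            "pipeline_tag": m.pipeline_tag,
            # Crude heuristic: language tags are mixed in with the other
            # repo tags, as 2- or 3-letter ISO 639 codes
            "languages": [t for t in (m.tags or []) if len(t) in (2, 3)],
            # Parameter count as reported by the .safetensors metadata, if any
            "size": m.safetensors.total if m.safetensors else None,
        })

df = pd.DataFrame(rows)
```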
Everything was finally stored in a dataframe that looked like this:
All the graphs shown below are generated from this data.
The amounts are rounded to the nearest million for clarity, but also because the Hub is experiencing some display issues. For example, for the sentence-transformers/static-retrieval-mrl-en-v1 model, no downloads are shown. The goal here is mainly to understand the orders of magnitude rather than to focus on exact numbers that change every day.
Figure 2: Example of what can be observed for models where the Hub does not correctly display the number of downloads
Plots
Overview
In this first section, we display the overall downloads for each of the top 50 entities contributing to open source, as well as their category type and country of origin. We will discuss these last two points in dedicated sections.
Note that we use the term "entity" rather than (Hugging Face) "account" because an entity can be composed of several accounts. For example, Google is composed of google, google-bert, google-t5, and albert. We therefore offer a global plot allowing you to compare the most downloaded entities, and another plot by sub-account to visualize how the different accounts are distributed within them.
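For example, the grouping of sub-accounts into entities boils down to a simple mapping applied to the dataframe sketched above (illustrative excerpt, not the full mapping):

```python
# Illustrative excerpt of the entity -> accounts mapping
ENTITIES = {
    "Google": ["google", "google-bert", "google-t5", "albert"],
    "Meta": ["FacebookAI", "facebook", "meta-llama"],
    "Sentence-transformers": ["sentence-transformers", "cross-encoder"],
}
ACCOUNT_TO_ENTITY = {acc: ent for ent, accs in ENTITIES.items() for acc in accs}

df["entity"] = df["account"].map(ACCOUNT_TO_ENTITY)
downloads_by_entity = (df.groupby("entity")["downloads"]
                         .sum()
                         .sort_values(ascending=False))
```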
Overview
Figure 3: Top 50 Hugging Face Entities by Total Downloads Dynamic version available
here
Sub-accounts view
Figure 4: Top 50 Hugging Face Entities by Total Downloads (with sub-account breakdown) Dynamic version available
here
A word about each entity
1. The Google entity is composed of google, google-bert, google-t5 and albert. More than 74% of its downloads come from "old models", namely 64% from BERT, 6.8% from T5, and 3.2% from ALBERT.
2. The Meta entity is composed of FacebookAI, facebook and meta-llama. The observation is similar to Google's, but to a lesser extent, with 48.3% of downloads coming from its RoBERTa models versus 9% for the Llamas.
3. The Sentence-transformers entity (from the Ubiquitous Knowledge Processing Lab in Darmstadt, Germany, and more specifically the work of Nils Reimers) completes the trio. This entity is composed of sentence-transformers and cross-encoder. The sentence-transformers account is actually the most downloaded on all of Hugging Face.
5. The OpenAI entity is composed of openai (72%) and openai-community (28%). Although it publishes little in open source, the company is extremely impactful when it does (its CLIP and Whisper are particularly downloaded).
7. Microsoft is another popular Big Tech company. The chart in the section on modalities shows that it is the most diversified organization in terms of modalities addressed (whereas, with the exception of a few others, most entities in the top 50 are highly specialized in a given modality).
8. Jonatas Grosman is the most downloaded individual. With barely 300 followers on the Hub, he is an illustration of the fact that it is not necessarily the most followed/liked entities that are the most impactful. He specialized for a time in finetuning wav2vec2 but has not published new models for 3 years now.
9. Pyannote specializes in small audio segmentation and diarization models.
10. 99% of Falcons.ai's downloads (350 followers) come from its nsfw_image_detection model, which is the seventh most downloaded model on the entire Hub.
11. BAAI is the most downloaded Chinese entity on Hugging Face, notably through its bge models. It is also the most downloaded non-profit entity.
15. Cardiffnlp is downloaded for its numerous sentiment classification models.
16. The Stability AI entity is composed of stabilityai (80%) and stable-diffusion-v1-5 (20%). It is primarily downloaded for its various versions of Stable Diffusion.
17. The Maziyar Panahi entity is composed of MaziyarPanahi (80.7%), which is his individual account where he offers GGUF versions of LLMs, and OpenMed (19.3%), which is a non-profit organization he created dedicated to medical models. He is an individual still active in 2025.
18. Helsinki-NLP is downloaded for its numerous machine translation models.
19. Laion is primarily downloaded for its numerous models reproducing CLIP.
20. Juan Manuel Pérez via the pysentimiento organization is primarily downloaded for his sentiment classification models in Spanish. He has not published models since 2023.
21. Bingsu is primarily downloaded for the various YOLOs contained in Bingsu/adetailer. He is an individual still active in 2025.
22. Half of AllenAI's downloads come from its longformer.
28. Salesforce has found success with its various versions of its BLIP model.
29. Intfloat is considered a Chinese individual since Liang Wang decided to publish his work under his name on Hugging Face. In practice, these are the e5 models created as part of his work at Microsoft Asia. He is an individual still active in 2025.
30. TheBloke is known for offering quantized versions of models. His most successful version is the phi-2-GGUF. He has not published models since 2024.
33. Emily Alsentzer found some success with her Bio_ClinicalBERT. She has not published models since 2020.
34. NVIDIA is extremely balanced (no single model drives all of the entity's downloads). Its speakerverification_en_titanet_large model stands out slightly from the others.
35. LM Studio is composed of lmstudio-community (44.1%), bartowski (55.8%) and lmstudio-ai. Please note that we really hesitated to include the bartowski account in this entity. He describes himself on HF as the "Official Model Curator for https://lmstudio.ai/" but is primarily a "Research Engineer at arcee.ai".
Furthermore, he is located in Canada, while LM Studio is in the United States. The choice was to either combine them or not include either one, as they are not individually in the top 50 (answerdotai being the 51st entity).
36. David S. Lim is downloaded for his English NER models. He has not published models since 2024.
37. Unsloth is primarily downloaded for its quantized versions of LLMs.
38. mradermacher is primarily downloaded for quantized versions of LLMs. It is a group of individuals still active in 2025.
39. Moritz Laurer is primarily downloaded for his NLI models, notably DeBERTa-v3-base-mnli-fever-anli. He has not published models since 2024.
40. Jean-Baptiste Polle is downloaded for his French NER models. He has not published models since 2023.
48. Supabase is primarily downloaded for its gte-small model.
49. Jina AI is primarily downloaded for its embedding models.
50. colbert-ir is primarily downloaded for its colbertv2.0 model.
View by country of origin of the account
For the country of origin, in this version we consider the location of individuals and of companies' headquarters. The aim here is to estimate the number of countries with an environment enabling the creation of the most downloaded models.
View by country (individuals)
Figure 5: Total Downloads by Country (Individual Countries) Dynamic version available
here
With more than 20.6 billion downloads, the United States accounts for 56.4% of downloads of the 50 most downloaded open-source entities on Hugging Face. This is driven in particular by Big Tech and its high density there, with 18 of the 50 entities located in the country. Germany is second with 13.2% of the total and 4.8 billion downloads (about four times fewer), of which 79% come from sentence-transformers. It accounts for 8 of the 50 entities. France is third with 3.4 billion downloads, representing 9.3% of the total. With five entities, its distribution of contributions across different players is slightly more balanced. In fourth place, China accounts for 1.9 billion downloads, or 5.2% of the total, also with five entities (four if intfloat is associated with Microsoft). With the exception of the United Kingdom, which also has four entities, all other countries are represented by a single entity.
It may be surprising to observe that China ranks only fourth in terms of downloads. This can be explained by the fact that Hugging Face is not accessible in this country, preventing entities there from counting local users in addition to international ones. In the rest of the post, analysis of the collected data highlights other possible explanations for this observation (see the section on model size or the section on modalities).
View by country (EU grouped)
Figure 6: Total Downloads by Country (EU Grouped) Dynamic version available
here
When the countries of the European Union (13 entities) are grouped together, their share of total downloads rises to 24%.
View by country (continents)
Figure 7: Total Downloads by Continent Dynamic version available
here
A comparison at the continental level shows that North America accounts for 56.7% of downloads of the top 50 open-source models (and 19 entities), Europe 29% (20 entities), Asia 8.9% (8 entities), and South America 4.9% (2 entities). The remainder is either undetermined or consists of international initiatives (2 entities).
View by entity type
The type of entity is determined by what it has chosen from the options offered by Hugging Face when creating an organization (Company, University, Classroom, Non-profit, Government, Community). We added Individual to designate individuals offering models under their own name outside of an organization. Note also a special case, HFL, which is a joint laboratory between a university and a company. We therefore had to create a Hybrid Lab category not available on Hugging Face.
Figure 8: Total Downloads by Entity Type Dynamic version available
here
Companies (20 entities out of 50) account for 63.2% of downloads of open-source models in the top 50, universities (10 entities) 20.7%, individuals (16 entities) 12.1%, non-profit organizations (4 entities) 3.8%, and hybrid laboratories 0.3% (1 entity).
For universities, most entities are in fact only research teams/groups and not the entire university/institution. The inability to create sub-teams within a large Hugging Face organization results in the creation of several organizations (one per team, or individuals publishing under their own name rather than that of the university), with the consequence of splintering the total count and making them less visible. For example, the CNRS (not technically a university) is not represented in the top 50, even though pyannote was created there by Hervé Bredin, Maziyar Panahi works there, and BLOOM from BigScience was trained on its servers.
Looking at the activity of the different entities on the Hub, we can see that overall, most of them are still active in 2025. One notable exception is the Individual category. Of the 16 entities in this category, only 6 published new models in 2025. It therefore appears that, unlike other categories, the contribution of individuals to open source is not a sustainable activity over time. This phenomenon can be offset by a renewal of contributors, but it would be interesting to know the reasons why these people are turning away from open source in order to find a way to support them and remedy this problem.
The large share of companies in the open-source model could also be a vulnerability of the model if they decide in the future to stop contributing, as some have done in the past.
Type x country
View by country (individuals)
Figure 9: Total Downloads by Entity Type (Individual Countries) Dynamic version available
here
The United States is present in four categories: first by a wide margin among companies with 76.3% of downloads, second among universities with 30% of the category, third among individuals with 12.5%, and third among non-profits with 15.3%. Germany also appears in four categories: sixth among companies with 1.4% of downloads, first among universities with 54.1%, 2.5% among individuals, and second among non-profits with 20.3%. China appears in four categories too: fourth among companies with 3.7%, absent from universities but the only case of a hybrid university/company laboratory in the top 50, 3.6% among individuals (the intfloat case, otherwise absent), and finally first among non-profits with 57.5%. France is present in three categories: second among companies with 12.7%, sixth among universities with 1.2%, and fourth among individuals with 8.9%. The United Kingdom is ranked in three categories: fifth among companies with 1.9% of downloads, third among universities with 7.4%, and 5.4% among individuals. All other countries are represented only once, in a specific category.
Although not applied in this first version, a system for weighting the various figures presented would be needed. Indeed, the number of downloads in a country can be influenced by its population size, the number of companies in the country, the rate of AI usage among the population, etc. That is why, for example, we offer individual graphs but also graphs at the EU or continental level. In addition, discussions are taking place with the Hugging Face teams to try to determine the location of model downloads, in order to distinguish what constitutes "domestic consumption" from what constitutes "exports".
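As a toy example, a per-capita weighting could look as follows (population figures are approximate orders of magnitude given for illustration only, and a country column is assumed to have been added to the dataframe):

```python
# Approximate populations (2025 orders of magnitude), for illustration only
POPULATION = {"United States": 335e6, "Germany": 84e6,
              "France": 68e6, "China": 1.41e9}

downloads_by_country = df.groupby("country")["downloads"].sum()
per_capita = {country: downloads_by_country[country] / pop
              for country, pop in POPULATION.items()}
```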
View by country (EU grouped)
Figure 10: Total Downloads by Entity Type (EU grouped) Dynamic version available
here
In this configuration, the European Union ranks second among open-source companies at 14.1%, first among universities at 60%, third among individuals at 15.6% and second among non-profits at 20.3%.
View by country (continents)
Figure 11: Total Downloads by Entity Type (Continental grouped) Dynamic version available
here
At the continental level, North America's position is essentially the same as that listed for the United States. Only the share of individuals increases, from 12.5% to 14.4%, once the Canadian contributor is included. Europe benefits from the figures for the United Kingdom and Ukraine, giving it second place for companies at 16%, first place for universities at 67.4%, second place for individuals at 34.3%, and second place for non-profits at 20.3%. Asia ranks third for companies at 7.7%, third for universities at 2.6%, fourth for individuals at 8.6%, first for non-profits at 57.4%, and alone in the hybrid laboratory segment. Finally, it should be noted that South America stands out for its individual contributions, where it ranks first with 40.7% of the total.
View by modality
Overview
Figure 12: Contribution of Entities to the Modalities Dynamic version available
here
We can see that NLP is the most downloaded modality among the top 50, with 58.1% of downloads, followed by CV at 21.2% and audio at 15.1%. The "Unknown" modality includes all models whose pipeline tag was not specified and could not be corrected.
The number of downloads for a modality seems to be related to whether or not Hugging Face includes models for that modality in its transformers library. NLP thus appears to be favored (Hugging Face being best known for this modality). It is not clear whether practitioners of other modalities use this library for their use cases; in vision, for example, there are several alternatives with tools from open-mmlab, roboflow, etc. In the vision modality, the most downloaded model is CLIP, which is integrated into transformers.
View by entity
In the following graph, for each entity, we show the proportion of each modality in its downloads.
Figure 13: Top 50 Hugging Face Entities by Total Downloads with Modality Breakdown Dynamic version available
here
We observe that few entities are diversified; each seems to have a specialty. 32 are mainly involved in NLP, 10 in vision (paradoxically including Hugging Face, via its acquisition of timm, and OpenAI, which released no open-source NLP models between gpt2 and August 2025), 4 in audio, 2 in multimodal NLP/vision, 1 in time series, and 1 undetermined (mradermacher offers quantized versions of models, so this would likely be NLP).
View by sub-account
The chart below provides a little more detail by showing the breakdown by sub-account for each entity.
Figure 14: Top 50 Hugging Face Entities by Total Downloads with Modality Breakdown (with sub-account breakdown) Dynamic version available
here
View by country (individuals)
Figure 15: Contribution of Countries to the Modalities Dynamic version available
here
It can be noted that the United States ranks first in all modalities. Except in vision, where it scores "only" 46.6%, it captures the majority of every modality. France ranks second or third in the NLP, vision, and audio modalities. China, for its part, is absent from the CV and audio modalities. This could explain the observation made above that it ranks "only" fourth in overall downloads, or rather why Germany (present in NLP and vision) and France register more downloads.
View by country (UE grouped)
Figure 16: Contribution of Countries (EU grouped) to the Modalities Dynamic version available
here
The European Union thus ranks second or third in terms of NLP, vision, and audio modalities.
View by country (continents)
Figure 17: Contribution of Continents to the Modalities Dynamic version available
here
View by task
The modality graphs were obtained from the pipeline tags using the associations available in the following dictionary.
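An illustrative excerpt of this mapping (the full dictionary covers every pipeline tag encountered in the data):

```python
# Illustrative excerpt of the pipeline-tag -> modality mapping
PIPELINE_TO_MODALITY = {
    "fill-mask": "NLP",
    "text-classification": "NLP",
    "token-classification": "NLP",
    "sentence-similarity": "NLP",
    "text-generation": "NLP",
    "text2text-generation": "NLP",
    "translation": "NLP",
    "image-classification": "CV",
    "zero-shot-image-classification": "CV",
    "object-detection": "CV",
    "automatic-speech-recognition": "Audio",
    "audio-classification": "Audio",
    "voice-activity-detection": "Audio",
    "time-series-forecasting": "Time Series",
}

df["modality"] = df["pipeline_tag"].map(PIPELINE_TO_MODALITY).fillna("Unknown")
```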
In this section, we display the different tasks by modality to get a better overview of the most popular ones.
Overview
NLP tasks are shown in different shades of blue, vision tasks in different shades of yellow, and audio tasks in different shades of red.
After exchanges with the Hugging Face team, we decided to merge the sentence-similarity and feature-extraction pipeline tags.
Finally, we did not group text2text-generation with text-generation. The first tag is mainly used by encoder-decoder models that generate text (T5, BART), while the second is used by decoder-only models. The goal here was to show that these encoder-decoders account for as many downloads as Alibaba's Qwen and more than Meta's Llamas.
Figure 18: Contribution of Entities to Pipeline Tags Dynamic version available
here
Base encoder-only models, visible via the fill-mask task, are by far the most downloaded, accounting for 22.3% of downloads. Their fine-tunings account for 22.7% of downloads, mainly on the sentence-similarity task (which we hesitated to merge with text-ranking), followed by text-classification, token-classification, zero-shot-classification, and finally question-answering.
Pure decoder generation models account for 9.5% of downloads. Encoder-decoder models account for 1.4%, to which we can add the translation task at 1.6%, for which they are mainly used. Next comes vision, with 11.1% of downloads in classification, 6.4% in zero-shot-classification (CLIP and derivatives), 1.8% in image feature extraction, 0.9% in object detection, and 0.5% in object segmentation (the rest of the tasks are not significant).
For audio, we have 7% ASR, 6% classification, and 1.2% voice activity detection.
View by entity
Figure 19: Top 50 Hugging Face Entities by Total Downloads with Pipeline Tag Breakdown Dynamic version available
here
The graphs by sub-account and by country are not displayed. They are completely illegible with more than 50 tasks listed in total.
View by language
In this section, we focus only on models related to tasks where language is applicable (NLP tasks, ASR, text-to-image, etc.). This represents 24,592,908,565 downloads out of the initial 36,450,707,797 in the top 50, or 67.47%.
In practice, it turns out that for 14.42% of these 24.6 billion downloads, the language tag is not specified.
After analysis, 184 languages are referenced in this top 50 (224 other values were found but are not ISO 639-1 or ISO 639-3 codes). Here too, for readability, only the top 20 languages are displayed in the following graph. All the figures can be found in the dataset available here.
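A minimal sketch of this filtering, assuming the pycountry package as the source of ISO 639 codes:

```python
import pycountry

# All ISO 639-3 codes, plus the two-letter ISO 639-1 codes where they exist
ISO639_CODES = {lang.alpha_3 for lang in pycountry.languages}
ISO639_CODES |= {lang.alpha_2 for lang in pycountry.languages
                 if hasattr(lang, "alpha_2")}

def valid_language_tags(tags):
    """Keep only tags that are valid ISO 639-1 or ISO 639-3 codes."""
    return [t for t in tags if t.lower() in ISO639_CODES]
```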
Figure 20: Total Downloads by Language Dynamic version available
here
It can be noted that among all the models available in the top 50 based on language, English accounts for more than 79.46% of downloads of these models (monolingual or multilingual) and even 92.85% of models with a language tag. It is far ahead of other languages. For example, French, which comes in second place, accounts for only 17.48% (20.43% of models with a language tag). In this top 20, languages with a Latin alphabet occupy the top positions but account for only 8 out of 20, showing that multilingual models are quite diverse.
View by model size
In this section, we focus exclusively on the 11,263 models for which we were able to determine a model size.
They account for 35,333,543,289 downloads out of 36,450,707,797, i.e. 96.94% of the top 50 and 77.76% of the entire Hub.
Overview
Figure 21: Total Downloads by Model Size Dynamic version available
here
We can see that:
92.48% of downloads are for models with fewer than one billion parameters,
86.33% are for models with fewer than 500 million parameters,
69.83% are for models with fewer than 200 million parameters,
40.17% are for models with fewer than 100 million parameters.
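These cumulative shares can be computed directly from the dataframe sketched earlier (reusing its size and downloads columns):

```python
# Restrict to models whose parameter count could be determined
sized = df.dropna(subset=["size"])
total = sized["downloads"].sum()

for threshold, label in [(1e9, "<1B"), (500e6, "<500M"),
                         (200e6, "<200M"), (100e6, "<100M")]:
    share = sized.loc[sized["size"] < threshold, "downloads"].sum() / total
    print(f"{label} parameters: {share:.2%}")
```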
In more detail:
Figure 22: Total Downloads by Model Size (more detailed) Dynamic version available
here
LLMs are therefore not massively downloaded models (at least in open source). One hypothesis is that the explanation lies in the profile of Hugging Face users (particularly their computing power, which can be entered here), but we have not found a way to retrieve this information automatically. In any case, an entity wishing to have an impact in open source probably needs to propose models with fewer than 500 million parameters, or even fewer than 200 million, in order to be downloaded by a large target audience. Given that 92.5% of downloads are for models with fewer than one billion parameters, it would be useful if Hugging Face allowed more refined filtering options.
View by entity type
Figure 23: Total Downloads by Model Size and Entity Type Dynamic version available
here
Companies are present in all model size ranges. They account for a significant share of models with fewer than 5 million parameters (94% of downloads in this range) and more than 500 million parameters (77 to 85% of downloads). Between 5 million and 500 million parameters (in practice, closer to 200 million), university models can be seen as alternatives to those developed by companies.
View by modality
Figure 24: Total Downloads by Model Size and Modality Dynamic version available
here
NLP is present in all size ranges, notably from 100 to 500M parameters. CV is mainly driven by models with fewer than 100M parameters, although it is also present between 100 and 500M. Audio is mostly spread from <5M to 500M parameters but is surprisingly absent from the 100-200M range. Time series appear in models with fewer than 50M parameters.
View by task
Figure 25: Total Downloads by Model Size and Task (Pipeline Tag) Dynamic version available
here
We have a little more detail at the task level (for example, for vision, we can see that models <100M are mostly classification, while between 100 and 500M they are mostly CLIP).
Readers can make their own analyses based on the fact that NLP tasks are different shades of blue, vision tasks are shades of yellow, and audio tasks are shades of red.
View by entity
Figure 26: Total Downloads by Model Size and Entity Dynamic version available
here
We invite readers to click on the legend to keep only the entities that interest them. This makes it possible to see, for a given entity, how its downloads are distributed across the different size segments and thus reveal its profile, or even to compare several entities.
Figure 27: Total Downloads by Model Size and Entity (focus Google and Meta)
For example, if we compare Google and Meta, which are the two most downloaded open-source entities, Google dominates models with <200M parameters (and is extremely strong in the 100-200M range), whereas Meta dominates models with 200M and above.
It is also possible to zoom in:
Figure 28: Total Downloads by Model Size and Entity (1B parameters and more models)
In this example, focusing only on the large-model players in the top 50 that exceed 5% of downloads in one of the ranges above 1B parameters (namely Google, Meta, Microsoft, Alibaba, Mistral, Unsloth, Deepseek, plus individuals offering quantized versions of models such as TheBloke and Maziyar Panahi), we can see:
In the 1B-3B range: Alibaba is the most downloaded in this range with 20.2%, ahead of Meta at 16.3%, Google at 11.7%, TheBloke at 11.4%, and Maziyar Panahi at 6.3%.
In the 3B-7.5B range: Meta is the most downloaded in this range with 31.4%, ahead of Mistral at 14.6%, Alibaba at 12.3%, Maziyar Panahi at 9.7%, Microsoft at 7.6%, and Unsloth at 7.2%.
In the 7.5B+ segment: Alibaba is the most downloaded in this segment with 24.6%, ahead of Meta with 23.2%, Maziyar Panahi with 15.5%, Mistral with 7.1%, Deepseek with 6.9%, and Google with 5.9%.
Overall (not shown in this graph), in the 1B+ segment: Meta is the most downloaded with 23.2% of the segment, ahead of Alibaba at 20%, Maziyar Panahi at 11.1%, Google at 7%, Mistral at 6.8%, TheBloke at 4.5%, Deepseek at 3.8%, and Microsoft at 3.3%.
15.6% of the segment is captured by individuals who offer quantized weights for the base models. The entities creating these models could capture these downloads if they made quantized versions directly available at upload time rather than leaving it to the community. Unless Meta releases new open-source models in the near future, a very naive estimate (a linear extrapolation based on downloads from the Qwen and meta-llama accounts between September 21 and October 1, 2025) suggests that Alibaba should become the leader in this segment by the end of November. Its Qwen/Qwen2.5-1.5B-Instruct model is already the most downloaded textual LLM, ahead of meta-llama/Llama-3.1-8B-Instruct (the smallest models being the most downloaded).
A Chinese player would therefore be in first place in this metric (and in the open-source LLM segment), although overall US entities still have a margin over all Chinese entities (49.8% for the US in the 1B+ range compared to 24.2% for China, and 43.1% vs. 31.8% in the 7.5B+ range).
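The naive extrapolation mentioned above boils down to a constant-rate crossover calculation; here is a sketch with illustrative totals and daily rates rather than the measured values:

```python
from math import ceil

def days_to_crossover(lead_total, lead_rate, chaser_total, chaser_rate):
    """Days until the chaser's cumulative downloads pass the leader's,
    assuming constant daily download rates for both."""
    if chaser_rate <= lead_rate:
        return None  # the gap never closes
    return ceil((lead_total - chaser_total) / (chaser_rate - lead_rate))

# Hypothetical figures: Meta leads the segment but Alibaba downloads faster
print(days_to_crossover(lead_total=2.0e9, lead_rate=1.0e6,
                        chaser_total=1.8e9, chaser_rate=3.5e6))  # -> 80 days
```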
View by country (individuals)
Figure 29: Total Downloads by Model Size and Countries Dynamic version available
here
View by country (EU grouped)
Figure 30: Total Downloads by Model Size and Country (EU Countries Grouped) Dynamic version available
here
The United States and the European Union are present across all model size segments. The EU has a majority position in models with fewer than 25 million parameters, while the United States leads in all other segments. It should be noted that China is nearly absent (with barely 130 million downloads) from models with fewer than 100 million parameters. Given that these are the most downloaded models, this explains the observation made at the beginning of the article that the country ranks only fourth in global downloads, behind Germany and France, while it ranks second for models with one billion parameters and above, which are not widely downloaded in open source.
View by country (continents)
Figure 31: Total Downloads by Model Size and Continents Dynamic version available
here
Future work
In an update to this work (which will require analyzing a large number of publications), we would like to add two additional layers, namely the city where the authors of the model are located and their nationality. The aim will be to determine in which cities the most downloaded models are developed, as well as the countries with the education systems that produce the most downloaded authors.
For example, downloads of LLaMA 1 would not be counted as 100% American, but as 12/14 French (Paris), 1/14 American (San Francisco), and 1/14 Pakistani (London). Several weighting systems could be applied, such as a higher weighting for the main authors, for example.
We would also like to offer a view that distinguishes the impact of a model within the ecosystem, i.e., in addition to the downloads of a given model, add all the downloads resulting from its fine-tunings/merges/adapters/quantizations. To this end, we have already conducted a few experiments using the Model tree, but it turns out to be often incomplete, especially for older models. One method would be to analyze the names of the fine-tuning heads used by the models in order to identify, if not the base model, then at least its architecture.
The goal is then to determine which base models have the greatest impact (this recent article by the Hugging Face teams would be a good starting point).
Figure 32: Model tree of google-bert/bert-base-uncased
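Since models declaring a base_model in their card metadata are exposed through repo tags on the Hub, a model tree like the one above can in principle be queried directly. A sketch, assuming the base_model tag filter behaves as described (and keeping in mind the incomplete coverage noted above):

```python
from huggingface_hub import HfApi

api = HfApi()

def direct_derivatives(base_id: str, limit: int = 1000):
    """Models declaring `base_id` as their base model
    (fine-tunings, merges, adapters, quantized versions)."""
    return [m.id for m in api.list_models(filter=f"base_model:{base_id}",
                                          limit=limit)]

children = direct_derivatives("google-bert/bert-base-uncased")
```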
As mentioned in the introduction, downloads of certain models are currently being incorrectly counted on the Hub. This could devalue certain entities. We are currently in contact with the Hugging Face teams to correct this issue as best we can.
Finally, we are considering writing a similar article for datasets instead of models.
Citation
@inproceedings{HF_models_stats_blog_post,
author = {Loïck BOURDOIS},
title = {Statistiques des modèles des 50 entités les plus téléchargées sur Hugging Face },
year = {2025},
url = {https://lbourdois.github.io/blog/HF_stats_models/}
}
Happy to see a fellow Spaniard in the top 50 @mrm8488 (well done!). I'm surprised that Falconsai is so high in the table and Alibaba so low, I was expecting maybe the opposite, and it's amazing to see @TheBloke in the middle of the table even though he has been inactive for so long. Great work!
Falconsai is high, although I don't really know who uses its NSFW content detection model so extensively
Alibaba has huge momentum right now, with the entity gaining 3.5 million downloads per day (i.e., over 100 million per month) over the past few weeks
And TheBloke simply has the third most downloaded LLM: