It's not possible to add .html graphics to Hugging Face blog posts. If you speak French, we really invite you to read the French version here instead, where the graphics display properly. For non-French speakers, we can only include screenshots in this article. Under each of them you will find a link to a dynamic plot.
Please note that in the following, the text within a blue box expresses only our personal reflections.
Finally, it should be noted that on October 9, Elastic announced the acquisition of Jina.ai. We haven't had time to modify the various graphs to take this into account.
Introduction
In this blog post, we analyze the most impactful open-source models in practice. To do so, we focus on a very pragmatic metric: "Which models are downloaded most often on the Hugging Face Hub?". The assumption is that models that are downloaded massively are those used in the real world. This approach is also intended to be fairer to individuals/organizations that do not have a communications department or are not massively followed/liked on the Hub.
TL;DR
The analysis of the 50 most downloaded entities on the Hugging Face Hub (80.22% of total Hub downloads) shows that:
Among all open-source models whose size can be determined (96.94% of the top 50 and 77.76% of the Hub), small models are by far the most downloaded:
92.48% of downloads are for models with fewer than one billion parameters,
86.33% for models with fewer than 500 million parameters,
69.83% for models with fewer than 200 million parameters,
40.17% for models with fewer than 100 million parameters.
Model downloads primarily concern NLP (58.1%), followed by CV at 21.2%, audio at 15.1%, various forms of multimodality at 3.3%, time series at 1.7%, with the rest undetermined due to a lack of correctly annotated metadata.
Text encoders (base models plus their fine-tunings on specific tasks) represent more than 45% of total downloads (i.e., 77.5% of the NLP modality), compared with only 9.5% for decoders (16.5% of the modality) and 3% for encoder-decoders (5% of the modality). Thus, contrary to the hype surrounding these models, LLMs are not downloaded massively in open source. Could it be that their real-world use lies more on the side of private APIs?
English represents more than 79.46% of downloads of models (monolingual or multilingual) using a language (and even 92.85% if we only consider models with a language tag). This language is far ahead of the others. For example, French, which comes in second place, accounts for only 17.48% (20.43% of models with a language tag).
Companies are the largest contributors to open source, accounting for 63.2% of downloads (20 entities out of 50), followed by universities at 20.7% (10 entities), then individuals at 12.1% (16 entities), non-profit organizations at 3.8% (4 entities), and finally hybrid laboratories at 0.3% (1 entity).
The United States is present everywhere, covering all modalities (NLP, vision, audio, time series, multimodalities) and all model sizes (from fewer than 5M parameters to tens/hundreds of billions). Americans are notably driven by their open-source companies (crushing all competition in this segment) but also have strengths in all types of existing organizations (excluding hybrid laboratories), being represented 18 times in this top 50. Europe (notably Germany, France, and the United Kingdom) is also positioned in all types of existing organizations outside of hybrid laboratories (present 20 times) but stands out for the impact of its specialized universities on small models (<200M parameters). It is also present in all modalities except time series. China (represented by five entities) has a strong presence in the large open-source model segment (31.8% vs. 43.1% for the United States and 24% for Europe on models with more than 7.5 billion parameters). However, it is poorly positioned in all other model size categories (only 130 million downloads of models with fewer than 100 million parameters, compared with 7.05 billion for the United States and 5.3 billion for Europe). Its lack of positioning in vision (barely 4 million downloads) and audio (0 downloads) also penalizes it. These are not areas in which it is known to be lagging behind, but it clearly does not currently publish open-source content in these areas on Hugging Face (a platform that is not accessible in the country). It dominates the non-profit sector and is the only player in the university/business hybrid laboratory sector. Finally, the other countries in this top 50 only benefit from one specialized player in a given modality.
Data collected
The data shown in this article was collected on October 1, 2025. After identifying the 50 open-source entities with the most downloads, we collected all the models associated with them.
This represents 72,423 models out of the 2,126,833 hosted on the Hub, or 3.41% of the total. These accounts represent exactly 36,450,707,797 downloads out of a total of 45,438,836,957, or 80.22%.
For each model, the pipeline and language tags were also collected when available. Similarly, the size was estimated from the .safetensors file.
When information was missing (no tags or no model size in particular), we manually corrected the data by consulting the model card or the associated publication for the 1,000 most downloaded open-source models. These alone represent 77.89% of all Hub downloads and 97.10% of those of the 50 entities analyzed.
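As an illustration, here is a minimal sketch of how such a collection could be performed with the huggingface_hub library (the exact expand fields depend on the library version, and the language-tag extraction below is a crude heuristic):

```python
import pandas as pd
from huggingface_hub import HfApi

api = HfApi()
rows = []
# Example with the accounts making up the Google entity (see the next section)
for account in ["google", "google-bert", "google-t5", "albert"]:
    for m in api.list_models(author=account,
                             expand=["downloadsAllTime", "safetensors",
                                     "pipeline_tag", "tags"]):
        rows.append({
            "account": account,
            "model": m.id,
            "downloads": m.downloads_all_time,
            "pipeline_tag": m.pipeline_tag,
            # Crude heuristic: language tags are mixed in with the other
            # repo tags, as 2- or 3-letter ISO 639 codes
            "languages": [t for t in (m.tags or []) if len(t) in (2, 3)],
            # Parameter count as reported by the .safetensors metadata, if any
            "size": m.safetensors.total if m.safetensors else None,
        })

df = pd.DataFrame(rows)
```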
Everything was finally stored in a dataframe that looked like this:
All the graphs shown below are generated from this data.
The amounts are rounded to the nearest million for clarity, but also because the Hub is experiencing some display issues. For example, for the sentence-transformers/static-retrieval-mrl-en-v1 model, no downloads are shown. The goal here is mainly to understand the orders of magnitude rather than to focus on exact numbers that change every day.
Figure 2: Example of what can be observed for models where the Hub does not correctly display the number of downloads
Plots
Overview
In this first section, we display the overall downloads for each of the top 50 entities contributing to open source, as well as their category type and country of origin. We will discuss these last two points in dedicated sections.
Note that we use the term "entity" rather than (Hugging Face) "account" because an entity can be composed of several accounts. For example, Google is composed of google, google-bert, google-t5, and albert. We therefore offer a global plot allowing you to compare the most downloaded entities, and another plot by sub-account to visualize how the different accounts are distributed within them.
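For example, the grouping of sub-accounts into entities boils down to a simple mapping applied to the dataframe sketched above (illustrative excerpt, not the full mapping):

```python
# Illustrative excerpt of the entity -> accounts mapping
ENTITIES = {
    "Google": ["google", "google-bert", "google-t5", "albert"],
    "Meta": ["FacebookAI", "facebook", "meta-llama"],
    "Sentence-transformers": ["sentence-transformers", "cross-encoder"],
}
ACCOUNT_TO_ENTITY = {acc: ent for ent, accs in ENTITIES.items() for acc in accs}

df["entity"] = df["account"].map(ACCOUNT_TO_ENTITY)
downloads_by_entity = (df.groupby("entity")["downloads"]
                         .sum()
                         .sort_values(ascending=False))
```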
Overview
Figure 3: Top 50 Hugging Face Entities by Total Downloads Dynamic version available
here
Sub-accounts view
Figure 4: Top 50 Hugging Face Entities by Total Downloads (with sub-account breakdown) Dynamic version available
here
A word about each entity
1. The Google entity is composed of google, google-bert, google-t5 and albert. More than 74% of its downloads come from "old models", namely 64% from BERT, 6.8% from T5, and 3.2% from ALBERT.
2. The Meta entity is composed of FacebookAI, facebook and meta-llama. The observation is similar to Google's, but to a lesser extent, with 48.3% of downloads coming from its RoBERTa models versus 9% for the Llamas.
3. The Sentence-transformers entity (from the Ubiquitous Knowledge Processing Lab in Darmstadt, Germany, and more specifically the work of Nils Reimers) completes the trio. This entity is composed of sentence-transformers and cross-encoder. The sentence-transformers account is actually the most downloaded on all of Hugging Face.
5. The OpenAI entity is composed of openai (72%) and openai-community (28%). Although it publishes little in open source, the company is extremely impactful when it does (its CLIP and Whisper are particularly downloaded).
7. Microsoft is another popular Big Tech company. The chart in the section on modalities shows that it is the most diversified organization in terms of modalities addressed (whereas, with the exception of a few others, most entities in the top 50 are highly specialized in a given modality).
8. Jonatas Grosman is the most downloaded individual. With barely 300 followers on the Hub, he is an illustration of the fact that it is not necessarily the most followed/liked entities that are the most impactful. He specialized for a time in finetuning wav2vec2 but has not published new models for 3 years now.
9. Pyannote specializes in small audio segmentation and diarization models.
10. 99% of Falcons.ai's downloads (350 followers) come from its nsfw_image_detection model, which is the seventh most downloaded model on the entire Hub.
11. BAAI is the most downloaded Chinese entity on Hugging Face, notably through its bge models. It is also the most downloaded non-profit entity.
15. Cardiffnlp is downloaded for its numerous sentiment classification models.
16. The Stability AI entity is composed of stabilityai (80%) and stable-diffusion-v1-5 (20%). It is primarily downloaded for its various versions of Stable Diffusion.
17. The Maziyar Panahi entity is composed of MaziyarPanahi (80.7%), which is his individual account where he offers GGUF versions of LLMs, and OpenMed (19.3%), which is a non-profit organization he created dedicated to medical models. He is an individual still active in 2025.
18. Helsinki-NLP is downloaded for its numerous machine translation models.
19. Laion is primarily downloaded for its numerous models reproducing CLIP.
20. Juan Manuel Pérez via the pysentimiento organization is primarily downloaded for his sentiment classification models in Spanish. He has not published models since 2023.
21. Bingsu is primarily downloaded for the various YOLOs contained in Bingsu/adetailer. He is an individual still active in 2025.
22. Half of AllenAI's downloads come from its longformer.
28. Salesforce has found success with its various versions of its BLIP model.
29. Intfloat is considered a Chinese individual since Liang Wang decided to publish his work under his name on Hugging Face. In practice, these are the e5 models created as part of his work at Microsoft Asia. He is an individual still active in 2025.
30. TheBloke is known for offering quantized versions of models. His most successful version is the phi-2-GGUF. He has not published models since 2024.
33. Emily Alsentzer found some success with her Bio_ClinicalBERT. She has not published models since 2020.
34. NVIDIA is extremely balanced (no single model drives all of the entity's downloads). Its speakerverification_en_titanet_large model stands out slightly from the others.
35. LM Studio is composed of lmstudio-community (44.1%), bartowski (55.8%) and lmstudio-ai. Please note that we really hesitated to include the bartowski account in this entity. He describes himself on HF as the "Official Model Curator for https://lmstudio.ai/" but is primarily a "Research Engineer at arcee.ai".
Furthermore, he is located in Canada, while LM Studio is in the United States. The choice was to either combine them or not include either one, as they are not individually in the top 50 (answerdotai being the 51st entity).
36. David S. Lim is downloaded for his English NER models. He has not published models since 2024.
37. Unsloth is primarily downloaded for its quantized versions of LLMs.
38. mradermacher is primarily downloaded for quantized versions of LLMs. It is a group of individuals still active in 2025.
39. Moritz Laurer is primarily downloaded for his NLI models, notably DeBERTa-v3-base-mnli-fever-anli. He has not published models since 2024.
40. Jean-Baptiste Polle is downloaded for his French NER models. He has not published models since 2023.
48. Supabase is primarily downloaded for its gte-small model.
49. Jina AI is primarily downloaded for its embedding models.
50. colbert-ir is primarily downloaded for its colbertv2.0 model.
View by country of origin of the account
For the country of origin, in this version we consider the location of individuals and of companies' headquarters. The aim here is to estimate the number of countries with an environment enabling the creation of the most downloaded models.
View by country (individuals)
Figure 5: Total Downloads by Country (Individual Countries) Dynamic version available
here
With more than 20.6 billion downloads, the United States accounts for 56.4% of downloads of the 50 most downloaded open-source entities on Hugging Face. This is driven in particular by Big Tech and its high density there, with 18 of the 50 entities located in the country. Germany is second with 13.2% of the total and 4.8 billion downloads (about four times fewer), of which 79% come from sentence-transformers. It accounts for 8 of the 50 entities. France is third with 3.4 billion downloads, representing 9.3% of the total. With five entities, its distribution of contributions across different players is slightly more balanced. In fourth place, China accounts for 1.9 billion downloads, or 5.2% of the total, also with five entities (four if intfloat is associated with Microsoft). With the exception of the United Kingdom, which also has four entities, all other countries are represented by a single entity.
It may be surprising to observe that China ranks only fourth in terms of downloads. This can be explained by the fact that Hugging Face is not accessible in this country, preventing entities there from counting local users in addition to international ones. In the rest of the post, analysis of the collected data highlights other possible explanations for this observation (see the section on model size or the section on modalities).
View by country (EU grouped)
Figure 6: Total Downloads by Country (EU Grouped) Dynamic version available
here
When the countries of the European Union (13 entities) are grouped together, their share of total downloads rises to 24%.
View by country (continents)
Figure 7: Total Downloads by Continent Dynamic version available
here
A comparison at the continental level shows that North America accounts for 56.7% of downloads of the top 50 open-source models (and 19 entities), Europe 29% (20 entities), Asia 8.9% (8 entities), and South America 4.9% (2 entities). The remainder is either undetermined or consists of international initiatives (2 entities).
View by entity type
The type of entity is determined by what it has chosen from the options offered by Hugging Face when creating an organization (Company, University, Classroom, Non-profit, Government, Community). We added Individual to designate individuals offering models under their own name outside of an organization. Note also a special case, HFL, which is a joint laboratory between a university and a company. We therefore had to create a Hybrid Lab category not available on Hugging Face.
Figure 8: Total Downloads by Entity Type Dynamic version available
here
Companies (20 entities out of 50) account for 63.2% of downloads of open-source models in the top 50, universities (10 entities) 20.7%, individuals (16 entities) 12.1%, non-profit organizations (4 entities) 3.8%, and hybrid laboratories 0.3% (1 entity).
For universities, most entities are in fact only research teams/groups and not the entire university/institution. The inability to create sub-teams within a large Hugging Face organization results in the creation of several organizations (one per team, or individuals publishing under their own name rather than that of the university), with the consequence of splintering the total count and making them less visible. For example, the CNRS (not technically a university) is not represented in the top 50, even though pyannote was created there by Hervé Bredin, Maziyar Panahi works there, and BLOOM from BigScience was trained on its servers.
Looking at the activity of the different entities on the Hub, we can see that overall, most of them are still active in 2025. One notable exception is the Individual category. Of the 16 entities in this category, only 6 published new models in 2025. It therefore appears that, unlike other categories, the contribution of individuals to open source is not a sustainable activity over time. This phenomenon can be offset by a renewal of contributors, but it would be interesting to know the reasons why these people are turning away from open source in order to find a way to support them and remedy this problem.
The large share of companies in the open-source model could also be a vulnerability of the model if they decide in the future to stop contributing, as some have done in the past.
Type x country
View by country (individuals)
Figure 9: Total Downloads by Entity Type (Individual Countries) Dynamic version available
here
The United States is present in four categories: first by a wide margin among companies with 76.3% of downloads, second among universities with 30% of the category, third among individuals with 12.5%, and third among non-profits with 15.3%. Germany also appears in four categories: sixth among companies with 1.4% of downloads, first among universities with 54.1%, 2.5% among individuals, and second among non-profits with 20.3%. China appears in four categories too: fourth among companies with 3.7%, absent from universities but the only case of a hybrid university/company laboratory in the top 50, 3.6% among individuals (the intfloat case, otherwise absent), and finally first among non-profits with 57.5%. France is present in three categories: second among companies with 12.7%, sixth among universities with 1.2%, and fourth among individuals with 8.9%. The United Kingdom is ranked in three categories: fifth among companies with 1.9% of downloads, third among universities with 7.4%, and 5.4% among individuals. All other countries are represented only once, in a specific category.
Although not applied in this first version, a system for weighting the various figures presented would be needed. Indeed, the number of downloads in a country can be influenced by its population size, the number of companies in the country, the rate of AI usage among the population, etc. That is why, for example, we offer individual graphs but also graphs at the EU or continental level. In addition, discussions are taking place with the Hugging Face teams to try to determine the location of model downloads, in order to distinguish what constitutes "domestic consumption" from what constitutes "exports".
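As a toy example, a per-capita weighting could look as follows (population figures are approximate orders of magnitude given for illustration only, and a country column is assumed to have been added to the dataframe):

```python
# Approximate populations (2025 orders of magnitude), for illustration only
POPULATION = {"United States": 335e6, "Germany": 84e6,
              "France": 68e6, "China": 1.41e9}

downloads_by_country = df.groupby("country")["downloads"].sum()
per_capita = {country: downloads_by_country[country] / pop
              for country, pop in POPULATION.items()}
```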
View by country (EU grouped)
Figure 10: Total Downloads by Entity Type (EU grouped) Dynamic version available
here
In this configuration, the European Union ranks second among open-source companies at 14.1%, first among universities at 60%, third among individuals at 15.6% and second among non-profits at 20.3%.
View by country (continents)
Figure 11: Total Downloads by Entity Type (Continental grouped) Dynamic version available
here
At the continental level, North America's position is essentially the same as that listed for the United States. Only the share of individuals increases, from 12.5% to 14.4%, once the Canadian contributor is included. Europe benefits from the figures for the United Kingdom and Ukraine, giving it second place for companies at 16%, first place for universities at 67.4%, second place for individuals at 34.3%, and second place for non-profits at 20.3%. Asia ranks third for companies at 7.7%, third for universities at 2.6%, fourth for individuals at 8.6%, first for non-profits at 57.4%, and alone in the hybrid laboratory segment. Finally, it should be noted that South America stands out for its individual contributions, where it ranks first with 40.7% of the total.
View by modality
Overview
Figure 12: Contribution of Entities to the Modalities Dynamic version available
here
We can see that NLP is the most downloaded modality among the top 50, with 58.1% of downloads, followed by CV at 21.2% and audio at 15.1%. The "Unknown" modality includes all models whose pipeline tag was not specified and could not be corrected.
The number of downloads for a modality seems to be related to whether or not Hugging Face includes models for that modality in its transformers library. NLP thus appears to be favored (Hugging Face being best known for this modality). It is not clear whether practitioners of other modalities use this library for their use cases; in vision, for example, there are several alternatives with tools from open-mmlab, roboflow, etc. In the vision modality, the most downloaded model is CLIP, which is integrated into transformers.
View by entity
In the following graph, for each entity, we show the proportion of each modality in its downloads.
Figure 13: Top 50 Hugging Face Entities by Total Downloads with Modality Breakdown Dynamic version available
here
We observe that few entities are diversified; each seems to have a specialty. 32 are mainly involved in NLP, 10 in vision (paradoxically including Hugging Face, via its acquisition of timm, and OpenAI, which released no open-source NLP models between gpt2 and August 2025), 4 in audio, 2 in multimodal NLP/vision, 1 in time series, and 1 undetermined (mradermacher offers quantized versions of models, so this would likely be NLP).
View by sub-account
The chart below provides a little more detail by showing the breakdown by sub-account for each entity.
Figure 14: Top 50 Hugging Face Entities by Total Downloads with Modality Breakdown (with sub-account breakdown) Dynamic version available
here
View by country (individuals)
Figure 15: Contribution of Countries to the Modalities Dynamic version available
here
It can be noted that the United States ranks first in all modalities. Except in vision, where it scores "only" 46.6%, it captures the majority of every modality. France ranks second or third in the NLP, vision, and audio modalities. China, for its part, is absent from the CV and audio modalities. This could explain the observation made above that it ranks "only" fourth in overall downloads, or rather why Germany (present in NLP and vision) and France register more downloads.
View by country (UE grouped)
Figure 16: Contribution of Countries (EU grouped) to the Modalities Dynamic version available
here
The European Union thus ranks second or third in terms of NLP, vision, and audio modalities.
View by country (continents)
Figure 17: Contribution of Continents to the Modalities Dynamic version available
here
View by task
The modality graphs were obtained from the pipeline tags using the associations available in the following dictionary.
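An illustrative excerpt of this mapping (the full dictionary covers every pipeline tag encountered in the data):

```python
# Illustrative excerpt of the pipeline-tag -> modality mapping
PIPELINE_TO_MODALITY = {
    "fill-mask": "NLP",
    "text-classification": "NLP",
    "token-classification": "NLP",
    "sentence-similarity": "NLP",
    "text-generation": "NLP",
    "text2text-generation": "NLP",
    "translation": "NLP",
    "image-classification": "CV",
    "zero-shot-image-classification": "CV",
    "object-detection": "CV",
    "automatic-speech-recognition": "Audio",
    "audio-classification": "Audio",
    "voice-activity-detection": "Audio",
    "time-series-forecasting": "Time Series",
}

df["modality"] = df["pipeline_tag"].map(PIPELINE_TO_MODALITY).fillna("Unknown")
```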
In this section, we display the different tasks by modality to get a better overview of the most popular ones.
Overview
NLP tasks are shown in different shades of blue, vision tasks in different shades of yellow, and audio tasks in different shades of red.
After exchanges with the Hugging Face team, we decided to merge the sentence-similarity and feature-extraction pipeline tags.
Finally, we did not group text2text-generation with text-generation. The first tag is mainly used by encoder-decoder models that generate text (T5, BART), while the second is used by decoder-only models. The goal here was to show that these encoder-decoders account for as many downloads as Alibaba's Qwen and more than Meta's Llamas.
Figure 18: Contribution of Entities to Pipeline Tags Dynamic version available
here
Base encoder-only models, visible via the fill-mask task, are by far the most downloaded, accounting for 22.3% of downloads. Their fine-tunings account for 22.7% of downloads, mainly on the sentence-similarity task (which we hesitated to merge with text-ranking), followed by text-classification, token-classification, zero-shot-classification, and finally question-answering.
Pure decoder generation models account for 9.5% of downloads. Encoder-decoder models account for 1.4%, to which we can add the translation task at 1.6%, for which they are mainly used. Next comes vision, with 11.1% of downloads in classification, 6.4% in zero-shot-classification (CLIP and derivatives), 1.8% in image feature extraction, 0.9% in object detection, and 0.5% in object segmentation (the rest of the tasks are not significant).
For audio, we have 7% ASR, 6% classification, and 1.2% voice activity detection.
View by entity
Figure 19: Top 50 Hugging Face Entities by Total Downloads with Pipeline Tag Breakdown Dynamic version available
here
The graphs by sub-account and by country are not displayed. They are completely illegible with more than 50 tasks listed in total.
View by language
In this section, we focus only on models related to tasks where language is applicable (NLP tasks, ASR, text-to-image, etc.). This represents 24,592,908,565 downloads out of the initial 36,450,707,797 in the top 50, or 67.47%.
In practice, it turns out that for 14.42% of these 24.6 billion downloads, the language tag is not specified.
After analysis, 184 languages are referenced in this top 50 (224 other values were found but are not ISO 639-1 or ISO 639-3 codes). Here too, for readability, only the top 20 languages are displayed in the following graph. All the figures can be found in the dataset available here.
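A minimal sketch of this filtering, assuming the pycountry package as the source of ISO 639 codes:

```python
import pycountry

# All ISO 639-3 codes, plus the two-letter ISO 639-1 codes where they exist
ISO639_CODES = {lang.alpha_3 for lang in pycountry.languages}
ISO639_CODES |= {lang.alpha_2 for lang in pycountry.languages
                 if hasattr(lang, "alpha_2")}

def valid_language_tags(tags):
    """Keep only tags that are valid ISO 639-1 or ISO 639-3 codes."""
    return [t for t in tags if t.lower() in ISO639_CODES]
```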
Figure 20: Total Downloads by Language Dynamic version available
here
It can be noted that among all the models available in the top 50 based on language, English accounts for more than 79.46% of downloads of these models (monolingual or multilingual) and even 92.85% of models with a language tag. It is far ahead of other languages. For example, French, which comes in second place, accounts for only 17.48% (20.43% of models with a language tag). In this top 20, languages with a Latin alphabet occupy the top positions but account for only 8 out of 20, showing that multilingual models are quite diverse.
View by model size
In this section, we focus exclusively on the 11,263 models for which we were able to determine a model size.
They account for 35,333,543,289 downloads out of 36,450,707,797, i.e. 96.94% of the top 50 and 77.76% of the entire Hub.
Overview
Figure 21: Total Downloads by Model Size Dynamic version available
here
We can see that:
92.48% of downloads are for models with fewer than one billion parameters,
86.33% are for models with fewer than 500 million parameters,
69.83% are for models with fewer than 200 million parameters,
40.17% are for models with fewer than 100 million parameters.
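These cumulative shares can be computed directly from the dataframe sketched earlier (reusing its size and downloads columns):

```python
# Restrict to models whose parameter count could be determined
sized = df.dropna(subset=["size"])
total = sized["downloads"].sum()

for threshold, label in [(1e9, "<1B"), (500e6, "<500M"),
                         (200e6, "<200M"), (100e6, "<100M")]:
    share = sized.loc[sized["size"] < threshold, "downloads"].sum() / total
    print(f"{label} parameters: {share:.2%}")
```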
In more detail:
Figure 22: Total Downloads by Model Size (more detailed) Dynamic version available
here
LLMs are therefore not massively downloaded models (at least in open source). One hypothesis is that the explanation lies in the profile of Hugging Face users (particularly their computing power, which can be entered here), but we have not found a way to retrieve this information automatically. In any case, an entity wishing to have an impact in open source probably needs to propose models with fewer than 500 million parameters, or even fewer than 200 million, in order to be downloaded by a large target audience. Given that 92.5% of downloads are for models with fewer than one billion parameters, it would be useful if Hugging Face allowed more refined filtering options.
View by entity type
Figure 23: Total Downloads by Model Size and Entity Type Dynamic version available
here
Companies are present in all model size ranges. They account for a significant share of models with fewer than 5 million parameters (94% of downloads in this range) and more than 500 million parameters (77 to 85% of downloads). Between 5 million and 500 million parameters (in practice, closer to 200 million), university models can be seen as alternatives to those developed by companies.
View by modality
Figure 24: Total Downloads by Model Size and Modality Dynamic version available
here
NLP is present in all size ranges, notably from 100 to 500M parameters. CV is mainly driven by models with fewer than 100M parameters, although it is also present between 100 and 500M. Audio is mostly spread from <5M to 500M parameters but is surprisingly absent from the 100-200M range. Time series appear in models with fewer than 50M parameters.
View by task
Figure 25: Total Downloads by Model Size and Task (Pipeline Tag) Dynamic version available
here
We have a little more detail at the task level (for example, for vision, we can see that models <100M are mostly classification, while between 100 and 500M they are mostly CLIP).
Readers can make their own analyses based on the fact that NLP tasks are different shades of blue, vision tasks are shades of yellow, and audio tasks are shades of red.
View by entity
Figure 26: Total Downloads by Model Size and Entity Dynamic version available
here
We invite readers to click on the legend to keep only the entities that interest them. This makes it possible to see, for a given entity, how its downloads are distributed across the different size segments and thus reveal its profile, or even to compare several entities.
Figure 27: Total Downloads by Model Size and Entity (focus Google and Meta)
For example, if we compare Google and Meta, which are the two most downloaded open-source entities, Google dominates models with <200M parameters (and is extremely strong in the 100-200M range), whereas Meta dominates models with 200M and above.
It is also possible to zoom in:
Figure 28: Total Downloads by Model Size and Entity (1B parameters and more models)
In this example, focusing only on the large-model players in the top 50 that exceed 5% of downloads in one of the ranges above 1B parameters (namely Google, Meta, Microsoft, Alibaba, Mistral, Unsloth, Deepseek, plus individuals offering quantized versions of models such as TheBloke and Maziyar Panahi), we can see:
In the 1B-3B range: Alibaba is the most downloaded in this range with 20.2%, ahead of Meta at 16.3%, Google at 11.7%, TheBloke at 11.4%, and Maziyar Panahi at 6.3%.
In the 3B-7.5B range: Meta is the most downloaded in this range with 31.4%, ahead of Mistral at 14.6%, Alibaba at 12.3%, Maziyar Panahi at 9.7%, Microsoft at 7.6%, and Unsloth at 7.2%.
In the 7.5B+ segment: Alibaba is the most downloaded in this segment with 24.6%, ahead of Meta with 23.2%, Maziyar Panahi with 15.5%, Mistral with 7.1%, Deepseek with 6.9%, and Google with 5.9%.
Overall (not shown in this graph), in the 1B+ segment: Meta is the most downloaded with 23.2% of the segment, ahead of Alibaba at 20%, Maziyar Panahi at 11.1%, Google at 7%, Mistral at 6.8%, TheBloke at 4.5%, Deepseek at 3.8%, and Microsoft at 3.3%.
15.6% of the segment is captured by individuals who offer quantized weights for the base models. The entities creating these models could capture these downloads if they made quantized versions directly available at upload time rather than leaving it to the community. Unless Meta releases new open-source models in the near future, a very naive estimate (a linear extrapolation based on downloads from the Qwen and meta-llama accounts between September 21 and October 1, 2025) suggests that Alibaba should become the leader in this segment by the end of November. Its Qwen/Qwen2.5-1.5B-Instruct model is already the most downloaded textual LLM, ahead of meta-llama/Llama-3.1-8B-Instruct (the smallest models being the most downloaded).
A Chinese player would therefore be in first place in this metric (and in the open-source LLM segment), although overall US entities still have a margin over all Chinese entities (49.8% for the US in the 1B+ range compared to 24.2% for China, and 43.1% vs. 31.8% in the 7.5B+ range).
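The naive extrapolation mentioned above boils down to a constant-rate crossover calculation; here is a sketch with illustrative totals and daily rates rather than the measured values:

```python
from math import ceil

def days_to_crossover(lead_total, lead_rate, chaser_total, chaser_rate):
    """Days until the chaser's cumulative downloads pass the leader's,
    assuming constant daily download rates for both."""
    if chaser_rate <= lead_rate:
        return None  # the gap never closes
    return ceil((lead_total - chaser_total) / (chaser_rate - lead_rate))

# Hypothetical figures: Meta leads the segment but Alibaba downloads faster
print(days_to_crossover(lead_total=2.0e9, lead_rate=1.0e6,
                        chaser_total=1.8e9, chaser_rate=3.5e6))  # -> 80 days
```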
View by country (individuals)
Figure 29: Total Downloads by Model Size and Countries Dynamic version available
here
View by country (EU grouped)
Figure 30: Total Downloads by Model Size and Country (EU Countries Grouped) Dynamic version available
here
The United States and the European Union are present across all model size segments. The EU has a majority position in models with fewer than 25 million parameters, while the United States leads in all other segments. It should be noted that China is nearly absent (with barely 130 million downloads) from models with fewer than 100 million parameters. Given that these are the most downloaded models, this explains the observation made at the beginning of the article that the country ranks only fourth in global downloads, behind Germany and France, while it ranks second for models with one billion parameters and above, which are not widely downloaded in open source.
View by country (continents)
Figure 31: Total Downloads by Model Size and Continents Dynamic version available
here
Future work
In an update to this work (which will require analyzing a large number of publications), we would like to add two additional layers, namely the city where the authors of the model are located and their nationality. The aim will be to determine in which cities the most downloaded models are developed, as well as the countries with the education systems that produce the most downloaded authors.
For example, downloads of LLaMA 1 would not be counted as 100% American, but as 12/14 French (Paris), 1/14 American (San Francisco), and 1/14 Pakistani (London). Several weighting systems could be applied, such as a higher weighting for the main authors, for example.
We would also like to offer a view that distinguishes the impact of a model within the ecosystem, i.e., in addition to the downloads of a given model, add all the downloads resulting from its fine-tunings/merges/adapters/quantizations. To this end, we have already conducted a few experiments using the Model tree, but it turns out to be often incomplete, especially for older models. One method would be to analyze the names of the fine-tuning heads used by the models in order to identify, if not the base model, then at least its architecture.
The goal is then to determine which base models have the greatest impact (this recent article by the Hugging Face teams would be a good starting point).
Figure 32: Model tree of google-bert/bert-base-uncased
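Since models declaring a base_model in their card metadata are exposed through repo tags on the Hub, a model tree like the one above can in principle be queried directly. A sketch, assuming the base_model tag filter behaves as described (and keeping in mind the incomplete coverage noted above):

```python
from huggingface_hub import HfApi

api = HfApi()

def direct_derivatives(base_id: str, limit: int = 1000):
    """Models declaring `base_id` as their base model
    (fine-tunings, merges, adapters, quantized versions)."""
    return [m.id for m in api.list_models(filter=f"base_model:{base_id}",
                                          limit=limit)]

children = direct_derivatives("google-bert/bert-base-uncased")
```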
As mentioned in the introduction, downloads of certain models are currently being incorrectly counted on the Hub. This could devalue certain entities. We are currently in contact with the Hugging Face teams to correct this issue as best we can.
Finally, we are considering writing a similar article for datasets instead of models.
Citation
@inproceedings{HF_models_stats_blog_post,
author = {Loïck BOURDOIS},
title = {Statistiques des modèles des 50 entités les plus téléchargées sur Hugging Face },
year = {2025},
url = {https://lbourdois.github.io/blog/HF_stats_models/}
}
Happy to see a fellow Spaniard in the top 50 @mrm8488 (well done!). I'm surprised that Falconsai is so high in the table and Alibaba so low, I was expecting maybe the opposite, and it's amazing to see @TheBloke in the middle of the table even though he has been inactive for so long. Great work!
Falconsai is high, although I don't really know who uses its NSFW content detection model so extensively
Alibaba has huge momentum right now, with the entity gaining 3.5 million downloads per day (i.e., over 100 million per month) over the past few weeks
And TheBloke simply has the third most downloaded LLM: