pagezyhf HF Staff committed on
Commit 714e085 · 1 Parent(s): a8a7665
Files changed (4)
  1. .gitignore +53 -0
  2. README.md +89 -14
  3. app.py +303 -0
  4. requirements.txt +5 -0
.gitignore ADDED
@@ -0,0 +1,53 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Virtual environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # IDEs
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Gradio
+ gradio_cached_examples/
+ flagged/
+
+ # Logs
+ *.log
+
+ # Temporary files
+ *.tmp
+ *.temp
README.md CHANGED
@@ -1,14 +1,89 @@
- ---
- title: Trending Model Dashboard
- emoji: 🦀
- colorFrom: pink
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.38.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: trending model dashboard
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Trending Models Dashboard
+
+ A Hugging Face Space dashboard for visualizing and analyzing trending model data from the `hf-azure-internal/trending-models-analysis` dataset.
+
+ ## Overview
+
+ This dashboard provides an interactive interface to explore trending machine learning models, their metadata, and their compliance status. Users can filter by date and view models organized by status or trending rank.
+
+ ## Data Source
+
+ The dashboard uses data from: [`hf-azure-internal/trending-models-analysis`](https://huggingface.co/datasets/hf-azure-internal/trending-models-analysis)
+
+ ### Dataset Schema
+
+ The `models` section contains the following columns:
+
+ | Column | Type | Description |
+ |--------|------|-------------|
+ | `id` | string | Model identifier |
+ | `trending_rank` | integer | Current trending position |
+ | `author` | string | Model author/organization |
+ | `tags` | array | Model tags |
+ | `license` | string | Model license type |
+ | `library_name` | string | Framework used (e.g., transformers, diffusers) |
+ | `gated` | boolean | Whether model access is gated |
+ | `task` | string | Model task type |
+ | `is_in_catalog` | boolean | Whether model is in catalog |
+ | `is_custom_code` | boolean | Whether model uses custom code |
+ | `is_excluded_org` | boolean | Whether organization is excluded |
+ | `is_supported_license` | boolean | Whether license is supported |
+ | `is_supported_library` | boolean | Whether library is supported |
+ | `is_safetensors` | boolean | Whether model uses safetensors |
+ | `is_supported_task` | boolean | Whether task is supported |
+ | `is_securely_scanned` | boolean | Whether model is securely scanned |
+ | `collected_at` | datetime | Data collection timestamp |
+ | `model_status` | string | Status: "to add", "blocked", or "added" |
+
+ ## Features
+
+ ### User Controls
+
+ - **Date Filter**: Select a specific date to filter models by `collected_at` timestamp
+ - **Display Mode**: Choose between two viewing modes:
+   - **By Status**: Organize models into three sections based on `model_status`
+   - **By Trending Rank**: Display all models sorted by `trending_rank` (ascending)
+
+ ### Display Sections
+
+ #### By Status Mode
+ Models are organized into three sections:
+ - **To Add**: Models with `model_status = "to add"`
+ - **Blocked**: Models with `model_status = "blocked"`
+ - **Added**: Models with `model_status = "added"`
+
+ #### By Trending Rank Mode
+ - Single section displaying all models sorted by `trending_rank`
+
+ ### Model Cards
+
+ Each model displays:
+ - **Model ID**: Raw value from `id` column
+ - **Status Indicators**: Boolean fields shown as emojis
+   - 🟢 Green: `true` values
+   - 🔴 Red: `false` values
+   - Fields: `is_in_catalog`, `is_custom_code`, `is_excluded_org`, `is_supported_license`, `is_supported_library`, `is_safetensors`, `is_supported_task`
+ - **Model Status**: Displayed in bold text
+
+ ### Interactive Details
+
+ Clicking on any model card reveals additional information:
+ - **License Details**: Full license information
+ - **Task Information**: Specific task type
+ - **Library Details**: Framework/library used
+ - **Author Information**: Model author/organization
+
+ ## Technical Requirements
+
+ - Will be hosted as a Hugging Face Space
+ - Interactive web interface
+ - Real-time data filtering
+ - Smooth animations for detail views
+ - Responsive design
+
+ ## Development
+
+ This project will be implemented as a Hugging Face Space using:
+ - Python backend for data processing
+ - Interactive web framework (Gradio/Streamlit)
+ - Dataset integration via Hugging Face Hub
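Note on the README above: the "By Status" mode is described only in prose, while the `app.py` added in this commit renders a single table sorted by rank. A minimal sketch of how the three sections could be derived with pandas, assuming the column names from the schema table. The helper `split_by_status` and the sample rows are hypothetical and not part of the commit.

```python
import pandas as pd

# Status values listed in the README schema, in display order.
STATUS_ORDER = ["to add", "blocked", "added"]


def split_by_status(df: pd.DataFrame) -> dict:
    """Hypothetical helper: one rank-sorted DataFrame per model_status value."""
    sections = {}
    for status in STATUS_ORDER:
        # Case-insensitive match, mirroring get_status_with_color in app.py.
        mask = df["model_status"].astype(str).str.lower() == status
        sections[status] = (
            df[mask].sort_values("trending_rank").reset_index(drop=True)
        )
    return sections


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "id": ["org-a/model-1", "org-b/model-2", "org-c/model-3", "org-d/model-4"],
    "trending_rank": [3, 1, 2, 4],
    "model_status": ["added", "to add", "blocked", "to add"],
})

sections = split_by_status(sample)
for status, frame in sections.items():
    print(status, frame["id"].tolist())
```

Each section could then be passed through the same display formatting as the rank-sorted table, so the two modes would differ only in grouping.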
app.py ADDED
@@ -0,0 +1,303 @@
+ import gradio as gr
+ import pandas as pd
+ from datasets import load_dataset
+ from datetime import datetime, date
+ import numpy as np
+ from functools import lru_cache
+
+ # Load the dataset
+ @lru_cache(maxsize=1)
+ def load_trending_models_data():
+     """Load the trending models dataset from Hugging Face"""
+     try:
+         print("Loading dataset from hf-azure-internal/trending-models-analysis...")
+
+         # First, check what splits are available
+         dataset_info = load_dataset("hf-azure-internal/trending-models-analysis")
+         print(f"Available splits: {list(dataset_info.keys())}")
+
+         # Try to load the correct split
+         if "models" in dataset_info:
+             print("Using 'models' split...")
+             dataset = dataset_info["models"]
+         elif "train" in dataset_info:
+             print("Using 'train' split...")
+             dataset = dataset_info["train"]
+         else:
+             # Fallback to first available split
+             split_name = list(dataset_info.keys())[0]
+             print(f"Using '{split_name}' split...")
+             dataset = dataset_info[split_name]
+
+         print(f"Dataset loaded. Type: {type(dataset)}")
+
+         df = dataset.to_pandas()
+         print(f"Converted to pandas. Shape: {df.shape}")
+         print(f"Columns: {list(df.columns)}")
+
+         # Convert collected_at to datetime if it's not already
+         if 'collected_at' in df.columns:
+             print(f"collected_at column found. Sample values:")
+             print(df['collected_at'].head(3).tolist())
+             df['collected_at'] = pd.to_datetime(df['collected_at'])
+             print(f"After conversion, dtype: {df['collected_at'].dtype}")
+
+             # Show unique dates
+             unique_dates = df['collected_at'].dt.date.unique()
+             print(f"Unique dates in dataset: {sorted(unique_dates)}")
+         else:
+             print("No 'collected_at' column found!")
+
+         return df
+     except Exception as e:
+         print(f"Error loading dataset: {e}")
+         # Return empty dataframe with expected columns for development
+         return pd.DataFrame(columns=[
+             'id', 'trending_rank', 'author', 'tags', 'license', 'library_name',
+             'gated', 'task', 'is_in_catalog', 'is_custom_code', 'is_excluded_org',
+             'is_supported_license', 'is_supported_library', 'is_safetensors',
+             'is_supported_task', 'is_securely_scanned', 'collected_at', 'model_status'
+         ])
+
+ def get_status_emoji(value):
+     """Convert boolean values to emoji indicators"""
+     if pd.isna(value):
+         return "❓"
+     return "🟢" if value else "🔴"
+
+ def get_negative_status_emoji(value):
+     """Convert boolean values to emoji indicators where True is bad (red) and False is good (green)"""
+     if pd.isna(value):
+         return "❓"
+     return "🔴" if value else "🟢"
+
+ def get_status_with_text(value, text_value=None):
+     """Convert boolean values to emoji indicators with optional text"""
+     if pd.isna(value):
+         emoji = "❓"
+     else:
+         emoji = "🟢" if value else "🔴"
+
+     if text_value and not pd.isna(text_value):
+         return f"{emoji} {text_value}"
+     else:
+         return emoji
+
+ def get_negative_status_with_text(value, text_value=None):
+     """Convert boolean values to emoji indicators where True is bad (red) and False is good (green), with optional text"""
+     if pd.isna(value):
+         emoji = "❓"
+     else:
+         emoji = "🔴" if value else "🟢"
+
+     if text_value and not pd.isna(text_value):
+         return f"{emoji} {text_value}"
+     else:
+         return emoji
+
+ def create_clickable_model_id(model_id):
+     """Convert model ID to clickable link"""
+     if pd.isna(model_id) or not model_id:
+         return ""
+     return f'<a href="https://hf.co/{model_id}" target="_blank" style="text-decoration: underline; color: #0066cc;">{model_id}</a>'
+
+ def get_status_with_color(status):
+     """Add color coding to status values"""
+     if pd.isna(status) or not status:
+         return ""
+
+     status_lower = str(status).lower()
+     if status_lower == "to add":
+         return f'<span style="color: #0066ff; font-weight: bold; background-color: #e6f3ff; padding: 2px 6px; border-radius: 4px;">{status}</span>'
+     elif status_lower == "added":
+         return f'<span style="color: #00aa00; font-weight: bold; background-color: #e6ffe6; padding: 2px 6px; border-radius: 4px;">{status}</span>'
+     elif status_lower == "blocked":
+         return f'<span style="color: #cc0000; font-weight: bold; background-color: #ffe6e6; padding: 2px 6px; border-radius: 4px;">{status}</span>'
+     else:
+         return f'<span style="padding: 2px 6px; border-radius: 4px;">{status}</span>'
+
+ def create_display_dataframe(df, selected_date):
+     """Create a DataFrame for display"""
+     if df.empty:
+         return pd.DataFrame()
+
+     # Filter by date if specified
+     filtered_df = df.copy()
+     if selected_date and 'collected_at' in df.columns:
+         # Convert selected_date to just the date part for comparison
+         if isinstance(selected_date, str):
+             target_date = pd.to_datetime(selected_date).date()
+         elif hasattr(selected_date, 'date'):
+             target_date = selected_date.date()
+         else:
+             target_date = selected_date
+
+         # Filter by comparing just the date parts (ignoring time)
+         filtered_df = filtered_df[filtered_df['collected_at'].dt.date == target_date]
+
+     if filtered_df.empty:
+         return pd.DataFrame()
+
+     # Create display dataframe with key columns
+     display_df = filtered_df[['trending_rank', 'id', 'is_custom_code', 'is_excluded_org',
+                               'is_supported_license', 'is_supported_library', 'is_safetensors',
+                               'is_supported_task', 'is_securely_scanned', 'model_status']].copy()
+
+     # Convert boolean columns to emojis for better display
+     display_df['Custom Code'] = filtered_df['is_custom_code'].apply(get_negative_status_emoji)
+     display_df['Excluded Org'] = filtered_df.apply(lambda row: get_negative_status_with_text(row['is_excluded_org'], row.get('author')), axis=1)
+     display_df['Supported License'] = filtered_df.apply(lambda row: get_status_with_text(row['is_supported_license'], row.get('license')), axis=1)
+     display_df['Supported Library'] = filtered_df.apply(lambda row: get_status_with_text(row['is_supported_library'], row.get('library_name')), axis=1)
+     display_df['Safetensors'] = filtered_df['is_safetensors'].apply(get_status_emoji)
+     display_df['Supported Task'] = filtered_df.apply(lambda row: get_status_with_text(row['is_supported_task'], row.get('task')), axis=1)
+     display_df['Security Check'] = filtered_df['is_securely_scanned'].apply(get_status_emoji)
+
+     # Create clickable model IDs and colored status
+     display_df['Model ID'] = filtered_df['id'].apply(create_clickable_model_id)
+     display_df['Status'] = filtered_df['model_status'].apply(get_status_with_color)
+
+     # Rename and reorder columns
+     display_df = display_df.rename(columns={
+         'trending_rank': 'Rank'
+     })
+
+     # Select final columns for display
+     final_columns = ['Rank', 'Model ID', 'Custom Code', 'Excluded Org', 'Supported License',
+                      'Supported Library', 'Safetensors', 'Supported Task', 'Security Check', 'Status']
+     display_df = display_df[final_columns]
+
+     # Sort by rank and reset index to get clean row indices
+     display_df = display_df.sort_values('Rank').reset_index(drop=True)
+
+     return display_df
+
+ def update_dashboard(selected_date):
+     """Update the dashboard based on user selections"""
+     df = load_trending_models_data()
+     display_df = create_display_dataframe(df, selected_date)
+     return display_df
+
+ def get_available_dates():
+     """Get list of available dates from the dataset"""
+     df = load_trending_models_data()
+     if df.empty or 'collected_at' not in df.columns:
+         return [], None, None
+
+     dates = df['collected_at'].dt.date.unique()
+     valid_dates = sorted([d for d in dates if pd.notna(d)], reverse=True)
+
+     if not valid_dates:
+         return [], None, None
+
+     return valid_dates, valid_dates[-1], valid_dates[0]  # all_dates, min_date, max_date
+
+ # Create the Gradio interface
+ def create_interface():
+     # Custom CSS for enhanced styling
+     custom_css = """
+     .dataframe-container {
+         border-radius: 12px;
+         overflow: hidden;
+         box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+     }
+
+     .info-text {
+         background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+         color: white;
+         padding: 12px 16px;
+         border-radius: 8px;
+         text-align: center;
+         font-weight: 500;
+         margin: 8px 0;
+     }
+     """
+
+     with gr.Blocks(title="Trending Models Dashboard", theme=gr.themes.Soft(), css=custom_css) as demo:
+         gr.Markdown("""
+         # Trending Models Support Dashboard
+         **Data Source:** [hf-azure-internal/trending-models-analysis](https://huggingface.co/datasets/hf-azure-internal/trending-models-analysis)
+         """)
+
+         # Get date information
+         available_dates, min_date, max_date = get_available_dates()
+
+         # Controls row at the top
+         with gr.Row():
+             with gr.Column(scale=1):
+                 date_picker = gr.Textbox(
+                     value=str(max_date) if max_date else "",
+                     label="📅 Date Selection",
+                     placeholder="2025-01-21",
+                     info="Enter date in YYYY-MM-DD format"
+                 )
+
+             with gr.Column(scale=1):
+                 refresh_btn = gr.Button("🔄 Refresh Data", variant="primary", size="lg")
+
+         # Main dataframe display
+         with gr.Row():
+             dataframe_display = gr.Dataframe(
+                 label="📊 Trending Models Overview",
+                 interactive=False,
+                 wrap=True,
+                 elem_classes=["dataframe-container"],
+                 datatype=["number", "html", "str", "str", "str", "str", "str", "str", "str", "html"]
+             )
+
+         # Event handlers
+         def update_dashboard_wrapper(selected_date_text):
+             """Wrapper to handle the dashboard update"""
+             selected_date = None
+             if selected_date_text:
+                 try:
+                     selected_date = pd.to_datetime(selected_date_text).date()
+                 except Exception as e:
+                     print(f"Date conversion error: {e}, value: {selected_date_text}")
+                     selected_date = None
+
+             return update_dashboard(selected_date)
+
+         # Wire up events
+         date_picker.change(
+             fn=update_dashboard_wrapper,
+             inputs=[date_picker],
+             outputs=[dataframe_display]
+         )
+
+         def refresh_data(selected_date_text):
+             """Refresh data and update dashboard"""
+             load_trending_models_data.cache_clear()  # drop the lru_cache entry so the dataset is actually re-read
+             available_dates, _, max_date = get_available_dates()
+             selected_date = max_date
+             if selected_date_text:
+                 try:
+                     selected_date = pd.to_datetime(selected_date_text).date()
+                 except Exception as e:
+                     print(f"Date conversion error in refresh: {e}, value: {selected_date_text}")
+                     selected_date = max_date
+
+             display_df = update_dashboard(selected_date)
+             return (
+                 str(max_date) if max_date else "",
+                 display_df
+             )
+
+         refresh_btn.click(
+             fn=refresh_data,
+             inputs=[date_picker],
+             outputs=[date_picker, dataframe_display]
+         )
+
+         # Load initial data
+         demo.load(
+             fn=update_dashboard_wrapper,
+             inputs=[date_picker],
+             outputs=[dataframe_display]
+         )
+
+     return demo
+
+ # Launch the app
+ if __name__ == "__main__":
+     demo = create_interface()
+     demo.launch()
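Note on the date textbox in `app.py`: it asks for YYYY-MM-DD, but `update_dashboard_wrapper` hands the text to `pd.to_datetime`, which accepts many other formats. A parse failure sets the date to `None`, which disables the filter, so rows from every date are shown. A tighter parse could look like the sketch below. `parse_date_text` is a hypothetical helper, not part of this commit, and it uses only the standard library.

```python
from datetime import date, datetime
from typing import Optional


def parse_date_text(text: str) -> Optional[date]:
    """Return a date for year-month-day input with dashes, else None."""
    if not text:
        return None
    try:
        # strptime rejects other separators and field orders; note that it
        # still tolerates missing zero padding such as "2025-1-5".
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


print(parse_date_text("2025-01-21"))  # prints 2025-01-21
print(parse_date_text("21/01/2025"))  # prints None
```

A caller could treat `None` as "keep the previous table" rather than "show everything", which avoids mixing several collection dates in one view after a typo.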
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ gradio>=4.0.0
+ pandas>=1.5.0
+ datasets>=2.14.0
+ numpy>=1.21.0
+ huggingface_hub>=0.16.0