Lisa Dunlap committed on
Commit f850bde · 1 Parent(s): 0ac505b

Add persistent storage support for Hugging Face Spaces

- Enhanced app.py with automatic persistent storage detection
- Added comprehensive persistent storage utilities
- Added documentation and examples
- Automatic HF_HOME and cache configuration for /data directory

PERSISTENT_STORAGE_README.md ADDED
@@ -0,0 +1,384 @@
+# Persistent Storage Setup for Hugging Face Spaces
+
+This guide explains how to set up and use persistent storage in Hugging Face Spaces for your LMM-Vibes application.
+
+## Overview
+
+Hugging Face Spaces provides persistent storage at the `/data` directory that persists across app restarts and deployments. This storage is perfect for:
+
+- Caching models and datasets
+- Storing user uploads and results
+- Maintaining application state
+- Saving experiment results
+
+## Quick Start
+
+### 1. Automatic Setup (Already Implemented)
+
+Your application automatically detects and configures persistent storage when running in Hugging Face Spaces:
+
+```python
+# This is already handled in app.py
+if is_persistent_storage_available():
+    # Configure HF cache to persistent storage
+    hf_home = get_hf_home_dir()
+    os.environ.setdefault("HF_HOME", str(hf_home))
+
+    # Set cache directories
+    cache_dir = get_cache_dir()
+    os.environ.setdefault("TRANSFORMERS_CACHE", str(cache_dir / "transformers"))
+    os.environ.setdefault("HF_DATASETS_CACHE", str(cache_dir / "datasets"))
+```
+
+### 2. Storage Structure
+
+When persistent storage is available, your data is organized as follows:
+
+```
+/data/
+├── app_data/              # Main application data
+│   ├── experiments/       # Pipeline results and experiments
+│   ├── dataframes/        # Saved pandas DataFrames
+│   ├── logs/              # Application logs
+│   └── uploads/           # User uploaded files
+├── .cache/                # Application cache
+│   ├── transformers/      # Hugging Face Transformers cache
+│   └── datasets/          # Hugging Face Datasets cache
+└── .huggingface/          # Hugging Face model cache
+```
+
+## Usage Examples
+
+### Saving Data
+
+```python
+from lmmvibes.utils.persistent_storage import (
+    save_data_to_persistent,
+    save_uploaded_file
+)
+
+# Save binary data
+data_bytes = b"your binary data"
+saved_path = save_data_to_persistent(
+    data=data_bytes,
+    filename="my_data.bin",
+    subdirectory="experiments"
+)
+
+# Save uploaded file from Gradio
+def handle_upload(uploaded_file):
+    if uploaded_file:
+        saved_path = save_uploaded_file(uploaded_file, "user_upload.zip")
+        return f"Saved to: {saved_path}"
+```
+
+### Loading Data
+
+```python
+from lmmvibes.utils.persistent_storage import load_data_from_persistent
+
+# Load binary data
+data_bytes = load_data_from_persistent("my_data.bin", "experiments")
+if data_bytes:
+    # Process the data
+    data = data_bytes.decode('utf-8')
+```
+
+### Listing Files
+
+```python
+from lmmvibes.utils.persistent_storage import list_persistent_files
+
+# List all files
+all_files = list_persistent_files()
+
+# List specific types of files
+json_files = list_persistent_files(subdirectory="experiments", pattern="*.json")
+parquet_files = list_persistent_files(subdirectory="dataframes", pattern="*.parquet")
+```
+
+### Checking Storage Status
+
+```python
+from lmmvibes.utils.persistent_storage import get_storage_info
+
+info = get_storage_info()
+print(f"Persistent storage available: {info['persistent_available']}")
+print(f"Data directory: {info['data_dir']}")
+print(f"Free space: {info['storage_paths']['free_gb']:.1f}GB")
+```
+
+## Integration with Your Application
+
+### 1. Data Loading
+
+Your application already uses persistent storage for loading pipeline results:
+
+```python
+# In data_loader.py - automatically uses persistent storage when available
+def load_pipeline_results(results_dir: str):
+    # The function automatically checks for data in persistent storage
+    # Falls back to local storage if persistent storage is not available
+    pass
+```
+
+### 2. Caching
+
+The application automatically caches data in persistent storage:
+
+```python
+# In data_loader.py - DataCache uses persistent storage when available
+class DataCache:
+    @classmethod
+    def get(cls, key: str):
+        # Check persistent storage first, then memory cache
+        return cls._cache.get(key)
+```
+
+### 3. User Uploads
+
+For handling user uploads in Gradio:
+
+```python
+import gradio as gr
+from lmmvibes.utils.persistent_storage import save_uploaded_file
+
+def handle_file_upload(file):
+    if file:
+        saved_path = save_uploaded_file(file, "user_upload.zip")
+        if saved_path:
+            return f"✅ File saved to persistent storage: {saved_path.name}"
+        else:
+            return "❌ Failed to save - persistent storage not available"
+    return "⚠️ No file uploaded"
+
+# In your Gradio interface
+with gr.Blocks() as demo:
+    file_input = gr.File(label="Upload data")
+    upload_btn = gr.Button("Save to persistent storage")
+    result = gr.Textbox(label="Status")
+
+    upload_btn.click(handle_file_upload, inputs=[file_input], outputs=[result])
+```
+
+## Best Practices
+
+### 1. Check Availability
+
+Always check if persistent storage is available before trying to use it:
+
+```python
+from lmmvibes.utils.persistent_storage import is_persistent_storage_available, save_data_to_persistent
+
+if is_persistent_storage_available():
+    # Use persistent storage
+    save_data_to_persistent(data, "important_data.json")
+else:
+    # Fall back to local storage or in-memory
+    print("Persistent storage not available")
+```
+
+### 2. Organize Data
+
+Use subdirectories to organize your data:
+
+```python
+# Save experiments in their own directory
+save_data_to_persistent(
+    data=experiment_data,
+    filename=f"{experiment_name}_results.json",
+    subdirectory="experiments"
+)
+
+# Save dataframes separately
+save_data_to_persistent(
+    data=dataframe_bytes,
+    filename=f"{dataset_name}_data.parquet",
+    subdirectory="dataframes"
+)
+```
+
+### 3. Handle Errors Gracefully
+
+```python
+def safe_save_data(data, filename):
+    try:
+        saved_path = save_data_to_persistent(data, filename)
+        if saved_path:
+            return f"✅ Saved to {saved_path}"
+        else:
+            return "❌ Failed to save - storage not available"
+    except Exception as e:
+        return f"❌ Error saving data: {e}"
+```
+
+### 4. Clean Up Old Data
+
+Periodically clean up old files to manage storage space:
+
+```python
+from lmmvibes.utils.persistent_storage import list_persistent_files, delete_persistent_file
+
+def cleanup_old_files(days_old=30):
+    """Delete files older than specified days."""
+    import time
+    cutoff_time = time.time() - (days_old * 24 * 60 * 60)
+
+    for file in list_persistent_files():
+        if file.is_file() and file.stat().st_mtime < cutoff_time:  # the "*" pattern also matches directories
+            delete_persistent_file(file.name)
+```
+
+## Troubleshooting
+
+### 1. Storage Not Available
+
+If persistent storage is not working:
+
+```python
+from lmmvibes.utils.persistent_storage import get_storage_info
+
+info = get_storage_info()
+print(f"Storage available: {info['persistent_available']}")
+print(f"Data directory: {info['data_dir']}")
+```
+
+### 2. Permission Issues
+
+If you encounter permission issues:
+
+```python
+# The utilities automatically create directories with proper permissions
+# If issues persist, check if /data exists and is writable
+import os
+if os.path.isdir("/data") and os.access("/data", os.W_OK):
+    print("✅ Persistent storage is accessible and writable")
+else:
+    print("❌ Persistent storage not accessible")
+```
+
+### 3. Storage Full
+
+Monitor storage usage:
+
+```python
+info = get_storage_info()
+if info['storage_paths']:
+    usage_pct = (info['storage_paths']['used_gb'] / info['storage_paths']['total_gb']) * 100
+    if usage_pct > 90:
+        print(f"⚠️ Storage nearly full: {usage_pct:.1f}% used")
+        # Implement cleanup logic
+```
+
+## Migration from Local Storage
+
+If you're migrating from local storage to persistent storage:
+
+1. **Back up existing data**: Copy your local `data/` directory to persistent storage
+2. **Update paths**: Use the persistent storage utilities instead of hardcoded paths
+3. **Test thoroughly**: Ensure all functionality works with persistent storage
+4. **Monitor usage**: Keep track of storage usage and implement cleanup
+
+## Example: Complete Integration
+
+Here's a complete example of integrating persistent storage into your application:
+
+```python
+import gradio as gr
+import json
+import pandas as pd
+from lmmvibes.utils.persistent_storage import (
+    save_data_to_persistent,
+    load_data_from_persistent,
+    list_persistent_files,
+    get_storage_info,
+    is_persistent_storage_available
+)
+
+def save_experiment_results(results_data, experiment_name):
+    """Save experiment results to persistent storage."""
+    if not is_persistent_storage_available():
+        return "❌ Persistent storage not available"
+
+    try:
+        results_json = json.dumps(results_data, indent=2)
+        results_bytes = results_json.encode('utf-8')
+
+        filename = f"{experiment_name}_results.json"
+        saved_path = save_data_to_persistent(
+            data=results_bytes,
+            filename=filename,
+            subdirectory="experiments"
+        )
+
+        if saved_path:
+            return f"✅ Saved experiment to: {saved_path.name}"
+        else:
+            return "❌ Failed to save experiment"
+    except Exception as e:
+        return f"❌ Error: {e}"
+
+def load_experiment_results(experiment_name):
+    """Load experiment results from persistent storage."""
+    filename = f"{experiment_name}_results.json"
+    results_bytes = load_data_from_persistent(
+        filename=filename,
+        subdirectory="experiments"
+    )
+
+    if results_bytes:
+        results_data = json.loads(results_bytes.decode('utf-8'))
+        return json.dumps(results_data, indent=2)
+    else:
+        return "No results found"
+
+def get_available_experiments():
+    """List all available experiments."""
+    experiment_files = list_persistent_files(subdirectory="experiments", pattern="*_results.json")
+    if experiment_files:
+        return "\n".join([f.name for f in experiment_files])
+    else:
+        return "No experiments found"
+
+# Gradio interface
+with gr.Blocks(title="Persistent Storage Demo") as demo:
+    gr.Markdown("# Persistent Storage Demo")
+
+    with gr.Tab("Save Experiment"):
+        experiment_name = gr.Textbox(label="Experiment Name")
+        results_json = gr.Textbox(label="Results (JSON)", lines=5)
+        save_btn = gr.Button("Save Experiment")
+        save_result = gr.Textbox(label="Save Result")
+
+        save_btn.click(
+            save_experiment_results,
+            inputs=[results_json, experiment_name],
+            outputs=[save_result]
+        )
+
+    with gr.Tab("Load Experiment"):
+        load_experiment_name = gr.Textbox(label="Experiment Name")
+        load_btn = gr.Button("Load Experiment")
+        load_result = gr.Textbox(label="Loaded Results", lines=10)
+
+        load_btn.click(
+            load_experiment_results,
+            inputs=[load_experiment_name],
+            outputs=[load_result]
+        )
+
+    with gr.Tab("Storage Info"):
+        info_btn = gr.Button("Get Storage Info")
+        storage_info = gr.Textbox(label="Storage Information", lines=10)
+
+        def get_info():
+            info = get_storage_info()
+            return json.dumps(info, indent=2)
+
+        info_btn.click(get_info, outputs=[storage_info])
+
+if __name__ == "__main__":
+    demo.launch()
+```
+
+This comprehensive setup ensures your application can take full advantage of Hugging Face Spaces' persistent storage capabilities while maintaining backward compatibility with local development.
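
Review note on the complete example: the `Results (JSON)` textbox hands its content to `save_experiment_results` as a string, so `json.dumps` would likely store the whole payload as one quoted JSON string instead of an object. A minimal sketch of parsing and validating the text before saving follows. `parse_results_text`, `encode_results`, and `decode_results` are hypothetical helpers written for this note and are not part of `lmmvibes`.

```python
import json


def parse_results_text(text: str) -> dict:
    """Parse textbox content into a dict, rejecting anything that is not a JSON object.

    Hypothetical helper for illustration only.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Results are not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("Results must be a JSON object")
    return parsed


def encode_results(results: dict) -> bytes:
    """Serialize results to UTF-8 bytes, the form save_data_to_persistent expects."""
    return json.dumps(results, indent=2).encode("utf-8")


def decode_results(raw: bytes) -> dict:
    """Inverse of encode_results."""
    return json.loads(raw.decode("utf-8"))
```

With this in place, a save handler could call `encode_results(parse_results_text(text))` and report the `ValueError` message to the user when the input is malformed.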
README.md CHANGED
@@ -2,6 +2,7 @@
 title: Whatever This Is
 colorFrom: yellow
 colorTo: gray
+emoji: 🇬🇮
 sdk: gradio
 sdk_version: 5.41.1
 app_file: app.py
app.py CHANGED
@@ -1,10 +1,39 @@
 import os
+from pathlib import Path
 
 from lmmvibes.vis_gradio.app import launch_app
+from lmmvibes.utils.persistent_storage import (
+    get_hf_home_dir,
+    get_cache_dir,
+    is_persistent_storage_available,
+    get_storage_info
+)
 
 # Launch the app for Hugging Face Spaces
 if __name__ == "__main__":
-    # Optimize HF cache to persistent storage in Spaces
-    if os.path.isdir("/data"):
-        os.environ.setdefault("HF_HOME", "/data/.huggingface")
+    # Set up persistent storage for Hugging Face Spaces
+    if is_persistent_storage_available():
+        print("🚀 Persistent storage available - configuring for HF Spaces")
+
+        # Set Hugging Face cache to persistent storage
+        hf_home = get_hf_home_dir()
+        os.environ.setdefault("HF_HOME", str(hf_home))
+
+        # Set cache directory for other libraries
+        cache_dir = get_cache_dir()
+        os.environ.setdefault("TRANSFORMERS_CACHE", str(cache_dir / "transformers"))
+        os.environ.setdefault("HF_DATASETS_CACHE", str(cache_dir / "datasets"))
+
+        # Print storage info
+        storage_info = get_storage_info()
+        print(f"📁 Data directory: {storage_info['data_dir']}")
+        print(f"🗄️ Cache directory: {storage_info['cache_dir']}")
+        print(f"🤗 HF Home: {storage_info['hf_home']}")
+
+        if storage_info['storage_paths']:
+            print(f"💾 Storage: {storage_info['storage_paths']['free_gb']:.1f}GB free / {storage_info['storage_paths']['total_gb']:.1f}GB total")
+    else:
+        print("⚠️ Persistent storage not available - using local storage")
+
+    # Launch the Gradio app
     launch_app(share=False, server_name="0.0.0.0", server_port=7860)
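
Review note on the cache configuration in `app.py`: `os.environ.setdefault` only writes a variable that is not already set, so a value configured in the Space settings takes precedence. Two caveats seem worth checking. Recent `transformers` releases emit a deprecation warning for `TRANSFORMERS_CACHE` and recommend `HF_HOME`. The variables are also set after `launch_app` has been imported, so any Hugging Face library already imported by that module may have read its cache location before these lines run. The sketch below isolates the setup into a function that can be called before other imports. `configure_hf_cache` and its `env` parameter are hypothetical names introduced for this note, not part of the commit.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional


def configure_hf_cache(
    data_root: Path,
    env: Optional[MutableMapping[str, str]] = None,
) -> bool:
    """Point Hugging Face caches at data_root when it is a writable directory.

    Hypothetical helper for illustration. Returns True when data_root was
    usable, False when nothing was changed.
    """
    target = os.environ if env is None else env
    if not (data_root.is_dir() and os.access(data_root, os.W_OK)):
        return False
    # setdefault keeps any value that is already present in the mapping
    target.setdefault("HF_HOME", str(data_root / ".huggingface"))
    target.setdefault("HF_DATASETS_CACHE", str(data_root / ".cache" / "datasets"))
    return True
```

Passing a plain dictionary as `env` makes the behavior easy to check without touching the real process environment.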
lmmvibes/utils/persistent_storage.py CHANGED
@@ -1,14 +1,22 @@
 """
 Utilities for persistent storage in Hugging Face Spaces.
+
+This module provides utilities for managing persistent storage in Hugging Face Spaces,
+including data directories, cache management, and file operations.
 """
 import os
+import shutil
 from pathlib import Path
-from typing import Optional
+from typing import Optional, Union
+import tempfile
 
 
 def get_persistent_data_dir() -> Optional[Path]:
     """Get the persistent data directory if available.
 
+    In Hugging Face Spaces, this will be `/data/app_data`.
+    Returns None if persistent storage is not available.
+
     Returns:
         Path to persistent storage directory if available, None otherwise.
     """
@@ -22,6 +30,9 @@ def get_persistent_data_dir() -> Optional[Path]:
 def get_cache_dir() -> Path:
     """Get the appropriate cache directory (persistent if available, temp otherwise).
 
+    In Hugging Face Spaces, this will be `/data/.cache`.
+    Falls back to temp directory in local development.
+
     Returns:
         Path to cache directory.
     """
@@ -31,10 +42,27 @@ def get_cache_dir() -> Path:
         return cache_dir
     else:
         # Fallback to temp directory
-        import tempfile
         return Path(tempfile.gettempdir()) / "app_cache"
 
 
+def get_hf_home_dir() -> Path:
+    """Get the Hugging Face home directory for model caching.
+
+    In Hugging Face Spaces, this will be `/data/.huggingface`.
+    Falls back to default ~/.cache/huggingface in local development.
+
+    Returns:
+        Path to HF home directory.
+    """
+    if os.path.isdir("/data"):
+        hf_home = Path("/data/.huggingface")
+        hf_home.mkdir(exist_ok=True)
+        return hf_home
+    else:
+        # Fallback to default location
+        return Path.home() / ".cache" / "huggingface"
+
+
 def save_uploaded_file(uploaded_file, filename: str) -> Optional[Path]:
     """Save an uploaded file to persistent storage.
 
@@ -51,12 +79,112 @@ def save_uploaded_file(uploaded_file, filename: str) -> Optional[Path]:
         save_path.parent.mkdir(parents=True, exist_ok=True)
 
         # Copy the uploaded file to persistent storage
-        import shutil
-        shutil.copy2(uploaded_file, save_path)
+        if hasattr(uploaded_file, 'name'):
+            # Gradio file object
+            shutil.copy2(uploaded_file.name, save_path)
+        else:
+            # Direct file path
+            shutil.copy2(uploaded_file, save_path)
+        return save_path
+    return None
+
+
+def save_data_to_persistent(data: bytes, filename: str, subdirectory: str = "") -> Optional[Path]:
+    """Save binary data to persistent storage.
+
+    Args:
+        data: Binary data to save
+        filename: Name to save the file as
+        subdirectory: Optional subdirectory within persistent storage
+
+    Returns:
+        Path to saved file if successful, None otherwise.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        if subdirectory:
+            save_dir = persistent_dir / subdirectory
+            save_dir.mkdir(exist_ok=True)
+        else:
+            save_dir = persistent_dir
+
+        save_path = save_dir / filename
+        save_path.parent.mkdir(parents=True, exist_ok=True)
+
+        with open(save_path, 'wb') as f:
+            f.write(data)
         return save_path
     return None
 
 
+def load_data_from_persistent(filename: str, subdirectory: str = "") -> Optional[bytes]:
+    """Load binary data from persistent storage.
+
+    Args:
+        filename: Name of the file to load
+        subdirectory: Optional subdirectory within persistent storage
+
+    Returns:
+        Binary data if successful, None otherwise.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        if subdirectory:
+            load_path = persistent_dir / subdirectory / filename
+        else:
+            load_path = persistent_dir / filename
+
+        if load_path.exists():
+            with open(load_path, 'rb') as f:
+                return f.read()
+    return None
+
+
+def list_persistent_files(subdirectory: str = "", pattern: str = "*") -> list[Path]:
+    """List files in persistent storage.
+
+    Args:
+        subdirectory: Optional subdirectory within persistent storage
+        pattern: Glob pattern to match files (e.g., "*.json", "data_*")
+
+    Returns:
+        List of Path objects for matching files.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        if subdirectory:
+            search_dir = persistent_dir / subdirectory
+        else:
+            search_dir = persistent_dir
+
+        if search_dir.exists():
+            return list(search_dir.glob(pattern))
+    return []
+
+
+def delete_persistent_file(filename: str, subdirectory: str = "") -> bool:
+    """Delete a file from persistent storage.
+
+    Args:
+        filename: Name of the file to delete
+        subdirectory: Optional subdirectory within persistent storage
+
+    Returns:
+        True if successful, False otherwise.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        if subdirectory:
+            file_path = persistent_dir / subdirectory / filename
+        else:
+            file_path = persistent_dir / filename
+
+        if file_path.exists():
+            file_path.unlink()
+            return True
+    return False
+
+
 def is_persistent_storage_available() -> bool:
     """Check if persistent storage is available.
 
@@ -77,4 +205,50 @@ def get_persistent_results_dir() -> Optional[Path]:
         results_dir = persistent_dir / "results"
         results_dir.mkdir(exist_ok=True)
         return results_dir
-    return None
+    return None
+
+
+def get_persistent_logs_dir() -> Optional[Path]:
+    """Get the persistent logs directory for storing application logs.
+
+    Returns:
+        Path to persistent logs directory if available, None otherwise.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        logs_dir = persistent_dir / "logs"
+        logs_dir.mkdir(exist_ok=True)
+        return logs_dir
+    return None
+
+
+def get_storage_info() -> dict:
+    """Get information about available storage.
+
+    Returns:
+        Dictionary with storage information.
+    """
+    info = {
+        "persistent_available": is_persistent_storage_available(),
+        "data_dir": None,
+        "cache_dir": str(get_cache_dir()),
+        "hf_home": str(get_hf_home_dir()),
+        "storage_paths": {}
+    }
+
+    if info["persistent_available"]:
+        data_dir = get_persistent_data_dir()
+        info["data_dir"] = str(data_dir)
+
+        # Check available space
+        try:
+            total, used, free = shutil.disk_usage(data_dir)
+            info["storage_paths"] = {
+                "total_gb": round(total / (1024**3), 2),
+                "used_gb": round(used / (1024**3), 2),
+                "free_gb": round(free / (1024**3), 2)
+            }
+        except OSError:
+            pass
+
+    return info
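
Review note on the save, load, and delete helpers: each one joins `subdirectory` and `filename` onto the data directory without checking the result, so a name containing `..` or an absolute path could resolve outside persistent storage. Where those arguments can come from user input, a guard along the following lines may be worth adding. This is a sketch under that assumption; `safe_join` is a hypothetical helper and not part of the module.

```python
from pathlib import Path


def safe_join(base: Path, subdirectory: str, filename: str) -> Path:
    """Join user-supplied parts under base, refusing paths that escape it.

    Hypothetical helper for illustration only.
    """
    base_resolved = base.resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # filename replaces everything before it when joined with "/".
    candidate = (base_resolved / subdirectory / filename).resolve()
    try:
        candidate.relative_to(base_resolved)
    except ValueError:
        raise ValueError(f"Path escapes storage root: {candidate}") from None
    return candidate
```

A helper like this could be called at the top of each file operation in place of the bare `persistent_dir / subdirectory / filename` expression.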
lmmvibes/utils/persistent_storage_example.py ADDED
@@ -0,0 +1,252 @@
+"""
+Example usage of persistent storage utilities for Hugging Face Spaces.
+
+This file demonstrates how to use the persistent storage utilities
+for saving and loading data in Hugging Face Spaces.
+"""
+
+import json
+import pandas as pd
+from pathlib import Path
+
+from .persistent_storage import (
+    get_persistent_data_dir,
+    get_cache_dir,
+    get_hf_home_dir,
+    save_data_to_persistent,
+    load_data_from_persistent,
+    save_uploaded_file,
+    list_persistent_files,
+    delete_persistent_file,
+    is_persistent_storage_available,
+    get_storage_info
+)
+
+
+def example_save_results(results_data: dict, experiment_name: str):
+    """Example: Save pipeline results to persistent storage.
+
+    Args:
+        results_data: Dictionary containing pipeline results
+        experiment_name: Name of the experiment
+    """
+    if not is_persistent_storage_available():
+        print("⚠️ Persistent storage not available - skipping save")
+        return None
+
+    # Save results as JSON
+    results_json = json.dumps(results_data, indent=2)
+    results_bytes = results_json.encode('utf-8')
+
+    filename = f"{experiment_name}_results.json"
+    saved_path = save_data_to_persistent(
+        data=results_bytes,
+        filename=filename,
+        subdirectory="experiments"
+    )
+
+    if saved_path:
+        print(f"✅ Saved results to: {saved_path}")
+        return saved_path
+    else:
+        print("❌ Failed to save results")
+        return None
+
+
+def example_load_results(experiment_name: str):
+    """Example: Load pipeline results from persistent storage.
+
+    Args:
+        experiment_name: Name of the experiment
+
+    Returns:
+        Dictionary containing the loaded results or None
+    """
+    filename = f"{experiment_name}_results.json"
+    results_bytes = load_data_from_persistent(
+        filename=filename,
+        subdirectory="experiments"
+    )
+
+    if results_bytes:
+        results_data = json.loads(results_bytes.decode('utf-8'))
+        print(f"✅ Loaded results from: {filename}")
+        return results_data
+    else:
+        print(f"❌ No results found for: {filename}")
+        return None
+
+
+def example_save_dataframe(df: pd.DataFrame, filename: str):
+    """Example: Save a pandas DataFrame to persistent storage.
+
+    Args:
+        df: DataFrame to save
+        filename: Name of the file (with .parquet extension)
+    """
+    if not is_persistent_storage_available():
+        print("⚠️ Persistent storage not available - skipping save")
+        return None
+
+    # Convert DataFrame to parquet bytes
+    try:
+        parquet_bytes = df.to_parquet()
+        saved_path = save_data_to_persistent(
+            data=parquet_bytes,
+            filename=filename,
+            subdirectory="dataframes"
+        )
+
+        if saved_path:
+            print(f"✅ Saved DataFrame to: {saved_path}")
+            return saved_path
+        else:
+            print("❌ Failed to save DataFrame")
+            return None
+    except Exception as e:
+        print(f"❌ Error saving DataFrame: {e}")
+        return None
+
+
+def example_list_saved_files():
+    """Example: List all files saved in persistent storage."""
+    if not is_persistent_storage_available():
+        print("⚠️ Persistent storage not available")
+        return []
+
+    print("📁 Files in persistent storage:")
+
+    # List all files
+    all_files = list_persistent_files()
+    if all_files:
+        for file in all_files:
+            print(f" - {file.name}")
+    else:
+        print(" No files found")
+
+    # List experiment files
+    experiment_files = list_persistent_files(subdirectory="experiments", pattern="*.json")
+    if experiment_files:
+        print("\n🔬 Experiment files:")
+        for file in experiment_files:
+            print(f" - {file.name}")
+
+    # List dataframe files
+    dataframe_files = list_persistent_files(subdirectory="dataframes", pattern="*.parquet")
+    if dataframe_files:
+        print("\n📊 DataFrame files:")
+        for file in dataframe_files:
+            print(f" - {file.name}")
+
+    return all_files
+
+
+def example_storage_cleanup(days_old: int = 30):
+    """Example: Clean up old files from persistent storage.
+
+    Args:
+        days_old: Delete files older than this many days
+    """
+    if not is_persistent_storage_available():
+        print("⚠️ Persistent storage not available")
+        return
+
+    import time
+    from datetime import datetime, timedelta
+
+    cutoff_time = time.time() - (days_old * 24 * 60 * 60)
+
+    print(f"🧹 Cleaning up files older than {days_old} days...")
+
+    # List all files and check their modification time
+    all_files = list_persistent_files()
+    deleted_count = 0
+
+    for file in all_files:
+        if file.is_file() and file.stat().st_mtime < cutoff_time:  # the "*" pattern also matches directories
+            if delete_persistent_file(file.name):
+                print(f"🗑️ Deleted: {file.name}")
+                deleted_count += 1
+
+    print(f"✅ Cleanup complete - deleted {deleted_count} files")
+
+
+def example_storage_info():
+    """Example: Display information about persistent storage."""
+    info = get_storage_info()
+
+    print("📊 Persistent Storage Information:")
+    print(f" Available: {info['persistent_available']}")
+
+    if info['persistent_available']:
+        print(f" Data directory: {info['data_dir']}")
+        print(f" Cache directory: {info['cache_dir']}")
+        print(f" HF Home: {info['hf_home']}")
+
+        if info['storage_paths']:
+            print(f" Total storage: {info['storage_paths']['total_gb']:.1f}GB")
+            print(f" Used storage: {info['storage_paths']['used_gb']:.1f}GB")
+            print(f" Free storage: {info['storage_paths']['free_gb']:.1f}GB")
+
+            # Calculate usage percentage
+            usage_pct = (info['storage_paths']['used_gb'] / info['storage_paths']['total_gb']) * 100
+            print(f" Usage: {usage_pct:.1f}%")
+
+
+# Example usage in a Gradio app
+def example_gradio_integration():
+    """Example: How to integrate persistent storage with Gradio."""
+
+    def save_uploaded_data(uploaded_file):
+        """Save a file uploaded through Gradio."""
+        if uploaded_file:
+            saved_path = save_uploaded_file(uploaded_file, "user_upload.txt")
+            if saved_path:
+                return f"✅ File saved to persistent storage: {saved_path.name}"
+            else:
+                return "❌ Failed to save file - persistent storage not available"
+        return "⚠️ No file uploaded"
+
+    def load_user_data():
+        """Load previously uploaded data."""
+        data_bytes = load_data_from_persistent("user_upload.txt")
+        if data_bytes:
+            return data_bytes.decode('utf-8')
+        return "No data found"
+
+    # This would be used in a Gradio interface like:
+    # import gradio as gr
+    #
+    # with gr.Blocks() as demo:
+    #     file_input = gr.File(label="Upload file")
+    #     upload_btn = gr.Button("Save to persistent storage")
+    #     download_btn = gr.Button("Load from persistent storage")
+    #
+    #     upload_btn.click(save_uploaded_data, inputs=[file_input])
+    #     download_btn.click(load_user_data)
+
+
+if __name__ == "__main__":
+    # Run examples
+    print("🔍 Persistent Storage Examples")
+    print("=" * 40)
+
+    example_storage_info()
+    print()
+
+    example_list_saved_files()
+    print()
+
+    # Example: Save some test data
+    test_data = {"experiment": "test", "results": [1, 2, 3], "timestamp": "2024-01-01"}
+    example_save_results(test_data, "test_experiment")
+    print()
+
+    # Example: Load the test data
+    loaded_data = example_load_results("test_experiment")
+    if loaded_data:
+        print(f"📊 Loaded data: {loaded_data}")
+    print()
+
+    # Example: List files again
+    example_list_saved_files()
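
Review note on the cleanup example: `list_persistent_files()` with its default arguments calls `glob("*")` on the data directory, which returns only top-level entries and includes subdirectories such as `experiments/` and `dataframes/` themselves. Files stored inside those subdirectories are therefore never considered for deletion. A recursive variant might look like the sketch below. It works directly on a `Path` rather than through the module's helpers, and `cleanup_old_files` with a `root` parameter is a hypothetical signature chosen for this note.

```python
import os
import time
from pathlib import Path


def cleanup_old_files(root: Path, days_old: int = 30) -> int:
    """Delete regular files under root older than days_old days.

    Hypothetical helper for illustration. Directories are left in place.
    Returns the number of files deleted.
    """
    cutoff = time.time() - days_old * 24 * 60 * 60
    deleted = 0
    # Materialize the listing first so deletions do not affect the walk.
    for path in list(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted += 1
    return deleted
```

Returning the count keeps the function free of printing, so a caller can decide how to report the result.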