Jordi Catafal committed on
Commit dda5c3b · 1 Parent(s): 47bf3b3

trying to solve api

CLAUDE.md ADDED
@@ -0,0 +1,103 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a FastAPI-based multilingual embedding API that provides access to 5 specialized models for generating embeddings from Spanish, Catalan, English, and multilingual text. The API is deployed on Hugging Face Spaces and serves embeddings for different use cases, including legal documents and general-purpose text.

## Available Models

The API serves 5 models with different specializations:

- **jina**: Bilingual Spanish-English (768D, 8192 tokens)
- **robertalex**: Spanish legal domain (768D, 512 tokens)
- **jina-v3**: Latest-generation multilingual (1024D, 8192 tokens)
- **legal-bert**: English legal domain (768D, 512 tokens)
- **roberta-ca**: Catalan general purpose (1024D, 512 tokens)

## Architecture

### Core Components

- **app.py**: Main FastAPI application with endpoints for embedding generation, model listing, and health checks
- **models/schemas.py**: Pydantic models for request/response validation and API documentation
- **utils/helpers.py**: Model loading, embedding generation, and memory management utilities

### Key Design Patterns

- **Global model cache**: All 5 models are loaded at startup and cached in memory for fast inference (see the sketch below)
- **Batch processing**: Memory-efficient batching, with batch sizes that vary by model complexity
- **Memory optimization**: Automatic cleanup after large batches, torch dtype optimization
- **Device handling**: Automatic CPU/GPU detection with appropriate tensor placement
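
A minimal sketch of the cache-and-device pattern, assuming Hugging Face `transformers`; the checkpoint mapping and function body are illustrative, not the exact contents of `utils/helpers.py`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical mapping - the actual checkpoints are defined in the repo.
MODEL_IDS = {"jina": "jinaai/jina-embeddings-v2-base-es"}

def load_models() -> dict:
    """Load every model once at startup and cache it with its device."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    cache = {}
    for name, repo_id in MODEL_IDS.items():
        tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
        model = AutoModel.from_pretrained(
            repo_id, torch_dtype=dtype, trust_remote_code=True
        ).to(device).eval()
        cache[name] = {"model": model, "tokenizer": tokenizer, "device": device}
    return cache
```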

## Development Commands

### Running the API

```bash
python app.py
```

The API will start on `http://0.0.0.0:7860` by default.

### Testing the API

```bash
# Using the test script
python test_api.py

# Manual testing with curl
# Health check
curl http://localhost:7860/health

# Generate embeddings
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Texto de prueba"], "model": "jina"}'

# List models
curl http://localhost:7860/models
```

### Docker Development

```bash
# Build image
docker build -t spanish-embeddings-api .

# Run container
docker run -p 7860:7860 spanish-embeddings-api
```

## Important Implementation Details

### Model Loading Strategy

- All models are loaded at startup in the `load_models()` function
- Different tokenizer classes are used depending on model architecture (AutoTokenizer, RobertaTokenizer, BertTokenizer)
- Memory is optimized with torch.float16 on GPU and torch.float32 on CPU
- Each model is cached with its tokenizer, device, and pooling strategy

### Embedding Generation

- Supports two pooling strategies: mean pooling (Jina models) and CLS token (BERT-based models); see the sketch below
- Implements dynamic batching based on model complexity
- Automatic memory cleanup for large batches (>20 texts)
- Text validation and cleaning in preprocessing
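
A hedged sketch of the two pooling strategies; the helper names are hypothetical and the real code in `utils/helpers.py` may differ:

```python
import torch

def mean_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors, masking out padding (Jina models)."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden.dtype)  # (batch, seq, 1)
    return (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def cls_pool(last_hidden: torch.Tensor) -> torch.Tensor:
    """Use the first ([CLS]) token vector (BERT-based models)."""
    return last_hidden[:, 0]
```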

### API Rate Limiting

- Maximum 50 texts per request
- Model-specific max_length validation
- Memory-aware batch sizing

### Error Handling

- Comprehensive validation in Pydantic schemas (see the schema sketch below)
- HTTP status code mapping for different error types
- Model availability checks
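
A sketch of how the request schema could enforce these limits, assuming Pydantic v2; the field names follow the documented payload, but the real `models/schemas.py` may differ:

```python
from typing import List
from pydantic import BaseModel, ConfigDict, field_validator

class EmbeddingRequest(BaseModel):
    model_config = ConfigDict(protected_namespaces=())  # allow a field named "model"

    texts: List[str]
    model: str = "jina"
    normalize: bool = True

    @field_validator("texts")
    @classmethod
    def validate_texts(cls, v: List[str]) -> List[str]:
        if not v:
            raise ValueError("texts must not be empty")
        if len(v) > 50:  # documented cap: maximum 50 texts per request
            raise ValueError("maximum 50 texts per request")
        return [t.strip() for t in v]
```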

## Environment Variables

The following variables configure caching and runtime behavior (see the example below):

- `TRANSFORMERS_CACHE`: Model cache directory
- `HF_HOME`: Hugging Face cache directory
- `PYTORCH_CUDA_ALLOC_CONF`: CUDA memory management
- `TOKENIZERS_PARALLELISM`: Set to `false` to avoid warnings
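
For example, these can be set before the app imports `transformers` (the paths and the allocator value below are placeholders, not values mandated by the repo):

```python
import os

os.environ.setdefault("HF_HOME", "/data/hf-cache")
os.environ.setdefault("TRANSFORMERS_CACHE", "/data/hf-cache/transformers")
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
```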

## Memory Management

The application implements several memory optimization strategies (sketched below):

- Automatic garbage collection after model loading
- Batch size reduction for large models (jina-v3, roberta-ca)
- CUDA cache clearing for GPU deployments
- Memory cleanup after processing large batches
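
A plausible shape for the cleanup routine these bullets describe (the actual `cleanup_memory()` in `utils/helpers.py` may differ):

```python
import gc
import torch

def cleanup_memory() -> None:
    """Release Python garbage and return cached CUDA blocks to the driver."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```
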
__pycache__/app.cpython-311.pyc ADDED
Binary file (6.2 kB)
 
app.py CHANGED
@@ -1,28 +1,48 @@
 from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from contextlib import asynccontextmanager
 from typing import List
 import torch
 import uvicorn
-import gc
-import os
 
 from models.schemas import EmbeddingRequest, EmbeddingResponse, ModelInfo
 from utils.helpers import load_models, get_embeddings, cleanup_memory
 
+# Global model cache
+models_cache = {}
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    """Application lifespan handler for startup and shutdown"""
+    # Startup
+    try:
+        global models_cache
+        print("Loading models...")
+        models_cache = load_models()
+        print("All models loaded successfully!")
+        yield
+    except Exception as e:
+        print(f"Failed to load models: {str(e)}")
+        raise
+    finally:
+        # Shutdown - cleanup resources
+        cleanup_memory()
+
 app = FastAPI(
     title="Multilingual & Legal Embedding API",
     description="Multi-model embedding API for Spanish, Catalan, English and Legal texts",
-    version="3.0.0"
+    version="3.0.0",
+    lifespan=lifespan
 )
 
-# Global model cache
-models_cache = {}
-
-@app.on_event("startup")
-async def startup_event():
-    """Load models on startup"""
-    global models_cache
-    models_cache = load_models()
-    print("All models loaded successfully!")
+# Add CORS middleware to allow cross-origin requests
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],  # In production, specify actual domains
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
 
 @app.get("/")
 async def root():
@@ -122,10 +142,13 @@ async def list_models():
 @app.get("/health")
 async def health_check():
     """Health check endpoint"""
+    models_loaded = len(models_cache) == 5
     return {
-        "status": "healthy",
-        "models_loaded": len(models_cache) == 5,
-        "available_models": list(models_cache.keys())
+        "status": "healthy" if models_loaded else "degraded",
+        "models_loaded": models_loaded,
+        "available_models": list(models_cache.keys()),
+        "expected_models": ["jina", "robertalex", "jina-v3", "legal-bert", "roberta-ca"],
+        "models_count": len(models_cache)
     }
 
 if __name__ == "__main__":

models/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (409 Bytes)

models/__pycache__/schemas.cpython-311.pyc ADDED
Binary file (5.72 kB)
 
test_api.py ADDED
@@ -0,0 +1,64 @@
#!/usr/bin/env python3
"""
Simple test script for the embedding API
"""

import requests
import json
import time

def test_api(base_url="http://localhost:7860"):
    """Test the API endpoints"""

    print(f"Testing API at {base_url}")

    # Test root endpoint
    try:
        response = requests.get(f"{base_url}/")
        print(f"✓ Root endpoint: {response.status_code}")
        print(f"  Response: {response.json()}")
    except Exception as e:
        print(f"✗ Root endpoint failed: {e}")
        return False

    # Test health endpoint
    try:
        response = requests.get(f"{base_url}/health")
        print(f"✓ Health endpoint: {response.status_code}")
        health_data = response.json()
        print(f"  Models loaded: {health_data.get('models_loaded', False)}")
        print(f"  Available models: {health_data.get('available_models', [])}")
    except Exception as e:
        print(f"✗ Health endpoint failed: {e}")

    # Test models endpoint
    try:
        response = requests.get(f"{base_url}/models")
        print(f"✓ Models endpoint: {response.status_code}")
        models = response.json()
        print(f"  Found {len(models)} model definitions")
    except Exception as e:
        print(f"✗ Models endpoint failed: {e}")

    # Test embedding endpoint
    try:
        payload = {
            "texts": ["Hello world", "Test text"],
            "model": "jina",
            "normalize": True
        }
        response = requests.post(f"{base_url}/embed", json=payload)
        print(f"✓ Embed endpoint: {response.status_code}")
        if response.status_code == 200:
            data = response.json()
            print(f"  Generated {data.get('num_texts', 0)} embeddings")
            print(f"  Dimensions: {data.get('dimensions', 0)}")
        else:
            print(f"  Error: {response.text}")
    except Exception as e:
        print(f"✗ Embed endpoint failed: {e}")

    return True

if __name__ == "__main__":
    test_api()
utils/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (445 Bytes)

utils/__pycache__/helpers.cpython-311.pyc ADDED
Binary file (10.4 kB)