Spaces: laureBe/submission


laureBe committed on Jan 14

Commit 15c4bfa (verified) · 1 Parent(s): b66d092

Upload 2 files

Files changed (2)

README.md +6 -6
app.py +105 -26

README.md CHANGED

@@ -8,11 +8,11 @@ pinned: false
 ---
-# Random Baseline Model for Climate Disinformation Classification
 ## Model Description
-This is a random baseline model for the Frugal AI Challenge 2024, specifically for the text classification task of identifying climate disinformation. The model serves as a performance floor, randomly assigning labels to text inputs without any learning.
 ### Intended Use
@@ -40,7 +40,7 @@ The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
 ## Performance
 ### Metrics
-- **Accuracy**: ~12.5% (random chance with 8 classes)
 - **Environmental Impact**:
   - Emissions tracked in gCO2eq
   - Energy consumption tracked in Wh
@@ -57,10 +57,10 @@ Environmental impact is tracked using CodeCarbon, measuring:
 This tracking helps establish a baseline for the environmental impact of model deployment and inference.
 ## Limitations
-- Makes completely random predictions
 - No learning or pattern recognition
-- No consideration of input text
-- Serves only as a baseline reference
 - Not suitable for any real-world applications
 ## Ethical Considerations

 ---
+# Logistic Regression Model for Climate Disinformation Classification
 ## Model Description
+This is a logistic regression baseline model for the Frugal AI Challenge 2024, specifically for the text classification task of identifying climate disinformation. It classifies TF-IDF vectors of the input text and serves as a performance floor.
 ### Intended Use
 ## Performance
 ### Metrics
+- **Accuracy**: ~63.5%
 - **Environmental Impact**:
   - Emissions tracked in gCO2eq
   - Energy consumption tracked in Wh
 This tracking helps establish a baseline for the environmental impact of model deployment and inference.
 ## Limitations
+- Predictions come from a linear logistic regression model
 - No learning or pattern recognition
+- Input text is vectorized with TF-IDF
+- Serves only as a logistic regression baseline reference
 - Not suitable for any real-world applications
 ## Ethical Considerations

app.py CHANGED

@@ -1,27 +1,106 @@
-from fastapi import FastAPI
-from dotenv import load_dotenv
-from tasks import text, image, audio
-# Load environment variables
-load_dotenv()
-app = FastAPI(
-    title="Frugal AI Challenge API",
-    description="API for the Frugal AI Challenge evaluation endpoints"
-)
-# Include all routers
-app.include_router(text.router)
-app.include_router(image.router)
-app.include_router(audio.router)
-@app.get("/")
-async def root():
-    return {
-        "message": "Welcome to the Frugal AI Challenge API",
-        "endpoints": {
-            "text": "/text - Text classification task",
-            "image": "/image - Image classification task (coming soon)",
-            "audio": "/audio - Audio classification task (coming soon)"
         }
-    }

+from fastapi import APIRouter
+from datetime import datetime
+from datasets import load_dataset
+from sklearn.metrics import accuracy_score
+from sklearn.linear_model import LogisticRegression
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.model_selection import train_test_split
+from .utils.evaluation import TextEvaluationRequest
+from .utils.emissions import tracker, clean_emissions_data, get_space_info
+router = APIRouter()
+DESCRIPTION = "Logistic Regression"
+ROUTE = "/text"
+@router.post(ROUTE, tags=["Text Task"],
+             description=DESCRIPTION)
+async def evaluate_text(request: TextEvaluationRequest):
+    """
+    Evaluate text classification for climate disinformation detection.
+    Current Model: Logistic regression
+    - Used as a baseline for comparison
+    """
+    # Get space info
+    username, space_url = get_space_info()
+    # Define the label mapping
+    LABEL_MAPPING = {
+        "0_not_relevant": 0,
+        "1_not_happening": 1,
+        "2_not_human": 2,
+        "3_not_bad": 3,
+        "4_solutions_harmful_unnecessary": 4,
+        "5_science_unreliable": 5,
+        "6_proponents_biased": 6,
+        "7_fossil_fuels_needed": 7
+    }
+    # Load and prepare the dataset
+    dataset = load_dataset(request.dataset_name)
+    # Convert string labels to integers
+    dataset = dataset.map(lambda x: {"label": LABEL_MAPPING[x["label"]]})
+    # Split dataset
+    #train_test = dataset.train_test_split(test_size=.33, seed=42)
+    train_test = dataset["train"].train_test_split(test_size=request.test_size, seed=request.test_seed)
+    train_dataset = train_test["train"]
+    test_dataset = train_test["test"]
+    # Fit the TF-IDF vocabulary on the training quotes only, then reuse it for the test quotes
+    tfidf_vect = TfidfVectorizer(stop_words="english")
+    tfidf_train = tfidf_vect.fit_transform(train_dataset["quote"])
+    tfidf_test = tfidf_vect.transform(test_dataset["quote"])
+    # Start tracking emissions
+    tracker.start()
+    tracker.start_task("inference")
+    #--------------------------------------------------------------------------------------------
+    # YOUR MODEL INFERENCE CODE HERE
+    # Update the code below to replace the random baseline by your model inference within the inference pass where the energy consumption and emissions are tracked.
+    #--------------------------------------------------------------------------------------------
+    # Train a logistic regression on the TF-IDF features and predict the test split
+    true_labels = test_dataset["label"]
+    LR = LogisticRegression(class_weight='balanced', max_iter=20, random_state=1234,
+                            solver='liblinear')
+    # scikit-learn accepts SciPy sparse matrices directly, so no pandas conversion is needed
+    LR.fit(tfidf_train, train_dataset["label"])
+    predictions = LR.predict(tfidf_test)
+    #--------------------------------------------------------------------------------------------
+    # YOUR MODEL INFERENCE STOPS HERE
+    #--------------------------------------------------------------------------------------------
+    # Stop tracking emissions
+    emissions_data = tracker.stop_task()
+    # Calculate accuracy
+    accuracy = accuracy_score(true_labels, predictions)
+    # Prepare results dictionary
+    results = {
+        "username": username,
+        "space_url": space_url,
+        "submission_timestamp": datetime.now().isoformat(),
+        "model_description": DESCRIPTION,
+        "accuracy": float(accuracy),
+        # CodeCarbon reports energy in kWh and emissions in kg CO2eq; multiply by 1000 for Wh and g
+        "energy_consumed_wh": emissions_data.energy_consumed * 1000,
+        "emissions_gco2eq": emissions_data.emissions * 1000,
+        "emissions_data": clean_emissions_data(emissions_data),
+        "api_route": ROUTE,
+        "dataset_config": {
+            "dataset_name": request.dataset_name,
+            "test_size": request.test_size,
+            "test_seed": request.test_seed
         }
+    }
+    return results
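
The training and inference step in app.py can be tried in isolation. The following is a minimal sketch, assuming scikit-learn is installed. The quotes and binary labels are hypothetical toy data standing in for the dataset's "quote" and "label" columns; the hyperparameters mirror the ones in the diff. It is an illustration of the TF-IDF plus logistic regression flow, not the submission's actual evaluation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical toy data (not from the challenge dataset).
train_quotes = [
    "global warming is a hoax invented by scientists",
    "climate has always changed naturally without humans",
    "the football match was postponed because of rain",
    "the new bakery downtown sells excellent bread",
]
train_labels = [1, 1, 0, 0]
test_quotes = ["warming is a hoax", "the bakery sells bread"]
test_labels = [1, 0]

# Fit the vocabulary on the training text only, then reuse it for the test text.
tfidf_vect = TfidfVectorizer(stop_words="english")
tfidf_train = tfidf_vect.fit_transform(train_quotes)
tfidf_test = tfidf_vect.transform(test_quotes)

# The sparse TF-IDF matrices are passed to the classifier as they are.
model = LogisticRegression(class_weight="balanced", max_iter=20,
                           random_state=1234, solver="liblinear")
model.fit(tfidf_train, train_labels)
predictions = model.predict(tfidf_test)

accuracy = accuracy_score(test_labels, predictions)
print(f"accuracy on the toy split: {accuracy:.2f}")
```

Passing the sparse matrices directly avoids the pandas round trip and keeps memory use proportional to the number of non-zero TF-IDF entries.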