update content with the text model from Thomas repository https://huggingface.co/spaces/tombou/frugal-ai-challenge
metadata
title: Frugal AI Challenge Submission
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false

Models for Climate Disinformation Classification

Evaluate locally

To evaluate a model locally, run the following command:

python main.py --config config_evaluation_{model_name}.json

where {model_name} is either distilBERT or embeddingML.
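As a minimal sketch, the two config file names follow from substituting `{model_name}` into the pattern above. The helper `evaluation_command` is hypothetical (it is not part of the repository) and assumes the config files sit next to main.py; it only builds the argument list.

```python
import sys

# Model names listed in this README.
MODEL_NAMES = ("distilBERT", "embeddingML")


def evaluation_command(model_name: str) -> list[str]:
    """Build the argument list for one evaluation run (hypothetical helper)."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name!r}")
    config = f"config_evaluation_{model_name}.json"
    return [sys.executable, "main.py", "--config", config]


if __name__ == "__main__":
    for name in MODEL_NAMES:
        # Only prints the command; pass the list to subprocess.run(..., check=True) to execute it.
        print(" ".join(evaluation_command(name)))
```

Rejecting unknown names early avoids a less obvious "file not found" error from main.py.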

Models Description

DistilBERT Model

This model is distilbert-base-uncased from the Hugging Face Transformers library, fine-tuned on the training dataset (see Training Data below).
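The sketch below shows one way such a model could be set up with Transformers, assuming an 8-way sequence classification head on top of the pretrained checkpoint. The function names `build_classifier` and `predict` are hypothetical, and the fine-tuning loop itself is omitted, so this is not the repository's actual training code.

```python
NUM_LABELS = 8  # the 8 categories listed in this README
CHECKPOINT = "distilbert-base-uncased"


def build_classifier():
    """Load the tokenizer and a model with a fresh 8-way classification head.

    Downloads the checkpoint on first use. The head is randomly initialised
    and must be fine-tuned before its predictions mean anything.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=NUM_LABELS
    )
    return tokenizer, model


def predict(texts, tokenizer, model):
    """Return one predicted label index per input text."""
    import torch

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()
```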

Embedding + ML Model

This model computes a text embedding and passes it to a classical ML classifier. Currently, the embedding is a TF-IDF vectorizer and the classifier is a logistic regression.
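A minimal sketch of this combination with scikit-learn is shown below. The toy texts and label indices are invented for illustration and do not come from the challenge dataset, and the hyperparameters are library defaults rather than the repository's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples invented for illustration; not taken from the challenge dataset.
texts = [
    "the climate has always changed naturally",
    "we still need coal and oil for decades",
    "the weather forecast for tomorrow is sunny",
    "sun cycles explain the warming, not people",
    "fossil fuels keep the economy running",
    "the match ended in a draw last night",
]
labels = [2, 7, 0, 2, 7, 0]  # hypothetical label indices

# TF-IDF turns each text into a sparse vector; logistic regression classifies it.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(texts, labels)

predictions = classifier.predict(["oil and coal are still needed"])
print(predictions)
```

Wrapping both steps in one pipeline keeps the vocabulary learned at training time attached to the classifier, so inference only needs raw text.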

Training Data

The models use the QuotaClimat/frugalaichallenge-text-train dataset:

  • Size: ~6000 examples
  • Split: 80% train, 20% test
  • 8 categories of climate disinformation claims
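The 80/20 split can be sketched with the `datasets` library as follows. The seed value and the presence of a `train` split on the Hub are assumptions, and `load_and_split` and `expected_split_sizes` are hypothetical helper names.

```python
DATASET_NAME = "QuotaClimat/frugalaichallenge-text-train"
TEST_FRACTION = 0.2  # 80% train, 20% test
SEED = 42  # assumed value; any fixed seed makes the split reproducible


def load_and_split():
    """Download the dataset and return a DatasetDict with 'train' and 'test'."""
    # Imported lazily so the module can be imported without datasets installed.
    from datasets import load_dataset

    dataset = load_dataset(DATASET_NAME, split="train")  # assumes a 'train' split exists
    return dataset.train_test_split(test_size=TEST_FRACTION, seed=SEED)


def expected_split_sizes(n_examples, test_fraction=TEST_FRACTION):
    """Approximate (train, test) sizes for a given number of examples."""
    n_test = round(n_examples * test_fraction)
    return n_examples - n_test, n_test


if __name__ == "__main__":
    # With roughly 6000 examples, about 4800 go to training and 1200 to testing.
    print(expected_split_sizes(6000))
```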

Labels

  1. No relevant claim detected
  2. Global warming is not happening
  3. Not caused by humans
  4. Not bad or beneficial
  5. Solutions harmful/unnecessary
  6. Science is unreliable
  7. Proponents are biased
  8. Fossil fuels are needed
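In code, the labels are convenient to keep as a lookup table. The sketch below assumes 0-based class indices in the same order as the list above; the exact label strings stored in the dataset may differ.

```python
# Label descriptions in the order given in this README.
LABELS = [
    "No relevant claim detected",
    "Global warming is not happening",
    "Not caused by humans",
    "Not bad or beneficial",
    "Solutions harmful/unnecessary",
    "Science is unreliable",
    "Proponents are biased",
    "Fossil fuels are needed",
]

# 0-based indices are an assumption; the README numbers the labels from 1.
ID_TO_LABEL = dict(enumerate(LABELS))
LABEL_TO_ID = {label: index for index, label in ID_TO_LABEL.items()}

if __name__ == "__main__":
    for index, label in ID_TO_LABEL.items():
        print(index, label)
```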

Performance

Metrics

  • Accuracy: ~12.5% for the random baseline (chance level with 8 classes, i.e. 1/8)
  • Environmental Impact:
    • Emissions tracked in gCO2eq
    • Energy consumption tracked in Wh

Model Architecture

The baseline picks one of the 8 possible labels at random, ignoring the input text, which makes it the simplest possible reference point.
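A random baseline of this kind can be sketched in a few lines. The sketch assumes a uniform choice over the 8 labels; the function names and the simulated targets are illustrative only. With uniform predictions, the expected accuracy is 1/8 = 12.5% regardless of how the true labels are distributed.

```python
import random

NUM_LABELS = 8


def random_baseline(texts, seed=None):
    """Return one uniformly random label index per text, ignoring the text itself."""
    rng = random.Random(seed)
    return [rng.randrange(NUM_LABELS) for _ in texts]


def accuracy(predictions, targets):
    """Fraction of predictions that match the targets."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    # Simulated targets, only used to check the chance-level accuracy empirically.
    rng = random.Random(0)
    targets = [rng.randrange(NUM_LABELS) for _ in range(80_000)]
    predictions = random_baseline(targets, seed=1)
    print(f"expected: {1 / NUM_LABELS:.3f}")
    print(f"simulated: {accuracy(predictions, targets):.3f}")
```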

Environmental Impact

Environmental impact is tracked using CodeCarbon, measuring:

  • Carbon emissions during inference
  • Energy consumption during inference

This tracking helps establish a baseline for the environmental impact of model deployment and inference.
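The sketch below shows one way to wrap an inference call with CodeCarbon and report the two quantities in the units used above. It assumes a CodeCarbon 2.x release, where `EmissionsTracker.stop()` returns emissions in kg CO2eq and `final_emissions_data.energy_consumed` is expressed in kWh; `track_inference` and the conversion helpers are hypothetical names.

```python
def kg_to_g(kilograms):
    """Convert kg CO2eq to g CO2eq."""
    return kilograms * 1000


def kwh_to_wh(kilowatt_hours):
    """Convert kWh to Wh."""
    return kilowatt_hours * 1000


def track_inference(run_inference):
    """Run `run_inference()` under CodeCarbon; return (result, gCO2eq, Wh)."""
    # Imported lazily so the module can be imported without codecarbon installed.
    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker()
    tracker.start()
    try:
        result = run_inference()
    finally:
        # Always stop the tracker, even if inference raises.
        emissions_kg = tracker.stop()
    # Assumes the tracker exposes its final measurements after stop().
    energy_kwh = tracker.final_emissions_data.energy_consumed
    return result, kg_to_g(emissions_kg), kwh_to_wh(energy_kwh)
```

A call such as `track_inference(lambda: classifier.predict(texts))` would then return the predictions together with the emissions in gCO2eq and the energy in Wh, where `classifier` and `texts` stand for whatever model and inputs are being evaluated.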

Limitations

  • Makes completely random predictions
  • No learning or pattern recognition
  • No consideration of input text
  • Serves only as a baseline reference
  • Not suitable for any real-world applications

Ethical Considerations

  • Dataset contains sensitive topics related to climate disinformation
  • Model makes random predictions and should not be used for actual classification
  • Environmental impact is tracked to promote awareness of AI's carbon footprint