metadata

title: Submission Template
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: docker
pinned: false

Climate Disinformation Classification using XGBoost over TF-IDF-vectorized input, with hyperparameters tuned by RandomizedSearchCV

Model Description

This model is an XGBoost classifier over TF-IDF-vectorized text, built for the Frugal AI Challenge 2024 text classification task of identifying climate disinformation. It serves as a baseline against which other submissions can be compared.

Intended Use

Primary intended uses: Comparison for climate disinformation classification models
Primary intended users: Researchers and developers participating in the Frugal AI Challenge
Out-of-scope use cases: Not intended for production use or real-world classification tasks

Training Data

The model uses the QuotaClimat/frugalaichallenge-text-train dataset:

Size: ~6000 examples
Split: 80% train, 20% test
8 categories of climate disinformation claims
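
The split described above can be sketched as follows. This is a minimal sketch, assuming the Hugging Face `datasets` library and a single split named `train` in the dataset repository. The seed value and both function names are illustrative and not taken from the submission.

```python
def split_sizes(n_examples: int, test_fraction: float = 0.2) -> tuple[int, int]:
    """Return approximate (train, test) sizes for a given held-out fraction."""
    n_test = round(n_examples * test_fraction)
    return n_examples - n_test, n_test


def load_and_split(seed: int = 42):
    """Load the challenge dataset and hold out 20% of it for testing.

    Assumes the `datasets` package is installed and the dataset is reachable.
    """
    from datasets import load_dataset  # imported lazily: heavy, needs network

    dataset = load_dataset("QuotaClimat/frugalaichallenge-text-train")
    # Assumption: the repository exposes one split named "train".
    return dataset["train"].train_test_split(test_size=0.2, seed=seed)
```

With roughly 6000 examples, `split_sizes(6000)` gives about 4800 training and 1200 test examples.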

Labels

No relevant claim detected
Global warming is not happening
Not caused by humans
Not bad or beneficial
Solutions harmful/unnecessary
Science is unreliable
Proponents are biased
Fossil fuels are needed
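
For illustration, the eight labels can be mapped to integer class ids. This sketch assumes the ids follow the order in which the labels are listed above. The names `LABELS`, `ID_TO_LABEL`, `LABEL_TO_ID` and `decode` are hypothetical helpers, not part of the submission.

```python
# Assumption: integer ids 0..7 follow the order of the list above.
LABELS = [
    "No relevant claim detected",
    "Global warming is not happening",
    "Not caused by humans",
    "Not bad or beneficial",
    "Solutions harmful/unnecessary",
    "Science is unreliable",
    "Proponents are biased",
    "Fossil fuels are needed",
]

ID_TO_LABEL = dict(enumerate(LABELS))
LABEL_TO_ID = {name: idx for idx, name in ID_TO_LABEL.items()}


def decode(prediction: int) -> str:
    """Turn a predicted class id into its human-readable label."""
    return ID_TO_LABEL[prediction]
```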

Performance

Metrics

Accuracy: 0.9815384615384616
Environmental Impact:
- Emissions tracked in gCO2eq: 0.19426531051455168
- Energy consumption tracked in Wh: 0.5262726046395284
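
Accuracy here is the fraction of test examples whose predicted label matches the true label. A minimal sketch of that computation, with a hypothetical `accuracy` helper that is not taken from the submission code:

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

For example, `accuracy([0, 1, 2, 7], [0, 1, 2, 3])` returns `0.75`, since three of the four predictions are correct.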

Model Architecture

The model vectorizes the input text with TF-IDF and assigns it one of the 8 possible labels using an XGBoost classifier whose hyperparameters were selected with RandomizedSearchCV.
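
The title and the Limitations section describe a TF-IDF vectorizer feeding an XGBoost classifier, tuned with RandomizedSearchCV. A minimal sketch of such a setup, assuming scikit-learn and the `xgboost` package are installed. The search space, `n_iter`, `cv` and seed values are illustrative assumptions, not the settings used for this submission.

```python
# Illustrative search space; the actual ranges used are not documented here.
SEARCH_SPACE = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__min_df": [1, 2, 5],
    "xgb__n_estimators": [100, 200, 400],
    "xgb__max_depth": [3, 6, 9],
    "xgb__learning_rate": [0.05, 0.1, 0.3],
}


def build_search(n_iter: int = 20, seed: int = 42):
    """Build a randomized hyperparameter search over a TF-IDF + XGBoost pipeline."""
    # Imported lazily so the search space can be inspected without the
    # heavier dependencies being installed.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.pipeline import Pipeline
    from xgboost import XGBClassifier

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),            # text -> sparse TF-IDF matrix
        ("xgb", XGBClassifier(random_state=seed)),  # gradient-boosted trees
    ])
    return RandomizedSearchCV(
        pipeline,
        param_distributions=SEARCH_SPACE,
        n_iter=n_iter,          # number of sampled configurations
        scoring="accuracy",
        cv=3,
        random_state=seed,
    )
```

A search built this way would be fitted with `build_search().fit(texts, label_ids)`, where `texts` is a list of strings and `label_ids` a list of integer class ids; the best pipeline is then available as `best_estimator_`.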

Environmental Impact

Environmental impact is tracked using CodeCarbon, measuring:

Carbon emissions during inference
Energy consumption during inference

This tracking helps establish a baseline for the environmental impact of model deployment and inference.
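
A minimal sketch of how inference could be wrapped with CodeCarbon, assuming the `codecarbon` package is installed and that the tracker reports emissions in kg CO2eq and energy in kWh, as in recent releases. The helpers `kg_to_g`, `kwh_to_wh` and `track_inference` are hypothetical, and reading the `final_emissions_data` attribute after `stop()` is an assumption about recent CodeCarbon versions.

```python
def kg_to_g(kilograms: float) -> float:
    """Convert kg CO2eq to g CO2eq."""
    return kilograms * 1000.0


def kwh_to_wh(kilowatt_hours: float) -> float:
    """Convert kWh to Wh."""
    return kilowatt_hours * 1000.0


def track_inference(predict_fn, texts):
    """Run predict_fn(texts) under a CodeCarbon tracker.

    Returns (predictions, emissions in gCO2eq, energy in Wh).
    """
    from codecarbon import EmissionsTracker  # imported lazily

    tracker = EmissionsTracker()
    tracker.start()
    try:
        predictions = predict_fn(texts)
    finally:
        tracker.stop()  # finalizes the measurement
    # Assumption: recent CodeCarbon releases expose the final figures here.
    data = tracker.final_emissions_data
    return predictions, kg_to_g(data.emissions), kwh_to_wh(data.energy_consumed)
```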

Limitations

Classifies text with XGBoost over TF-IDF features
XGBoost hyperparameters selected with RandomizedSearchCV
Serves as a baseline reference
Not suitable for real-world applications

Ethical Considerations

Dataset contains sensitive topics related to climate disinformation
Model is a challenge baseline and should not be used for real-world classification
Environmental impact is tracked to promote awareness of AI's carbon footprint