---
title: Submission Template
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: docker
pinned: false
---
|
|
|
|
|
# Climate Disinformation Classification with XGBoost over TF-IDF Features, Tuned with RandomizedSearchCV
|
|
|
## Model Description |
|
|
|
This model is an XGBoost classifier trained on TF-IDF-vectorized text, with hyperparameters tuned using RandomizedSearchCV. It was built for the Frugal AI Challenge 2024, specifically for the text classification task of identifying climate disinformation. The model serves as a performance floor against which other submissions can be compared.
|
|
|
### Intended Use |
|
|
|
- **Primary intended uses**: Comparison for climate disinformation classification models |
|
- **Primary intended users**: Researchers and developers participating in the Frugal AI Challenge |
|
- **Out-of-scope use cases**: Not intended for production use or real-world classification tasks |
|
|
|
## Training Data |
|
|
|
The model uses the QuotaClimat/frugalaichallenge-text-train dataset: |
|
- Size: ~6000 examples |
|
- Split: 80% train, 20% test |
|
- 8 categories of climate disinformation claims |
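
As a rough illustration of the 80/20 split listed above, the sketch below shuffles example indices and holds out 20% for testing. It uses only the standard library and assumes a round figure of 6000 examples; the helper name `split_indices` and the seed are hypothetical, and the actual split may have been produced differently (for example with a stratified splitter).

```python
import random


def split_indices(n_examples, test_fraction=0.2, seed=42):
    """Shuffle indices 0..n_examples-1 and return (train_indices, test_indices)."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(n_examples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Assumed round dataset size; the card only states "~6000 examples".
train_idx, test_idx = split_indices(6000)
print(len(train_idx), len(test_idx))  # 4800 1200
```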
|
|
|
### Labels |
|
0. No relevant claim detected |
|
1. Global warming is not happening |
|
2. Not caused by humans |
|
3. Not bad or beneficial |
|
4. Solutions harmful/unnecessary |
|
5. Science is unreliable |
|
6. Proponents are biased |
|
7. Fossil fuels are needed |
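
For reference, the label list above can be expressed as a lookup table for decoding integer predictions. The names `ID_TO_LABEL`, `LABEL_TO_ID`, and `decode` are illustrative assumptions; the strings in the dataset itself may be formatted differently.

```python
# Integer class id -> human-readable label, following the list above.
ID_TO_LABEL = {
    0: "No relevant claim detected",
    1: "Global warming is not happening",
    2: "Not caused by humans",
    3: "Not bad or beneficial",
    4: "Solutions harmful/unnecessary",
    5: "Science is unreliable",
    6: "Proponents are biased",
    7: "Fossil fuels are needed",
}

# Reverse mapping, useful when encoding string labels for training.
LABEL_TO_ID = {label: idx for idx, label in ID_TO_LABEL.items()}


def decode(predictions):
    """Map an iterable of integer class ids to their label strings."""
    return [ID_TO_LABEL[int(p)] for p in predictions]
```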
|
|
|
## Performance |
|
|
|
### Metrics |
|
- **Accuracy**: 0.9815384615384616 |
|
- **Environmental Impact**:
  - Emissions tracked in gCO2eq: 0.19426531051455168
  - Energy consumption tracked in Wh: 0.5262726046395284
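
Accuracy here is the fraction of examples whose predicted label matches the true label. A minimal sketch of that computation, with a hypothetical helper name and toy data rather than the challenge's evaluation code:

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example: three of four predictions are correct.
print(accuracy([0, 1, 2, 7], [0, 1, 3, 7]))  # 0.75
```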
|
|
|
### Model Architecture |
|
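The model is a two-stage pipeline: a TF-IDF vectorizer converts each input text into a sparse feature vector, and an XGBoost classifier predicts one of the 8 possible labels. The XGBoost hyperparameters were selected with RandomizedSearchCV.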
|
|
|
## Environmental Impact |
|
|
|
Environmental impact is tracked using CodeCarbon, measuring: |
|
- Carbon emissions during inference |
|
- Energy consumption during inference |
|
|
|
This tracking helps establish a baseline for the environmental impact of model deployment and inference. |
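
A sketch of how inference could be wrapped with CodeCarbon and converted to the units reported in this card (gCO2eq and Wh) is shown below. The helper names `to_reporting_units` and `track_inference` are hypothetical, and the sketch assumes that CodeCarbon's task API returns emissions in kg CO2eq and energy in kWh, which may vary between versions.

```python
def to_reporting_units(emissions_kg, energy_kwh):
    """Convert kg CO2eq and kWh into the gCO2eq and Wh figures used in this card."""
    return {
        "emissions_gco2eq": emissions_kg * 1000.0,
        "energy_consumed_wh": energy_kwh * 1000.0,
    }


def track_inference(predict_fn, inputs):
    """Run predict_fn(inputs) under a CodeCarbon tracker and report its footprint."""
    # Imported lazily so the conversion helper works without codecarbon installed.
    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker(save_to_file=False)
    tracker.start()
    tracker.start_task("inference")
    try:
        predictions = predict_fn(inputs)
    finally:
        # Assumption: stop_task() returns an object exposing `emissions` (kg)
        # and `energy_consumed` (kWh) for the task that just ended.
        data = tracker.stop_task()
        tracker.stop()
    return predictions, to_reporting_units(data.emissions, data.energy_consumed)
```

With a fitted model, `track_inference(model.predict, test_texts)` would return the predictions alongside a dictionary of the two footprint figures.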
|
|
|
## Limitations |
|
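- Text classification uses XGBoost over TF-IDF features
- Input text is vectorized with TF-IDF, which ignores word order and context
- XGBoost hyperparameters come from a RandomizedSearchCV search, which samples the search space rather than covering it exhaustively
- Serves as a baseline reference
- Not suitable for real-world applications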
|
|
|
## Ethical Considerations |
|
|
|
- Dataset contains sensitive topics related to climate disinformation |
|
- Model is intended as a challenge baseline and should not be used for actual classification
|
- Environmental impact is tracked to promote awareness of AI's carbon footprint |
|