---
title: Submission Template
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: docker
pinned: false
---


# Climate Disinformation Classification using XGBoost on TF-IDF Features, Tuned with RandomizedSearchCV

## Model Description

This model is an XGBoost classifier trained on TF-IDF vectorized text for the Frugal AI Challenge 2024, specifically for the text classification task of identifying climate disinformation. It serves as a performance floor against which other submissions can be compared.

### Intended Use

- **Primary intended uses**: Comparison for climate disinformation classification models
- **Primary intended users**: Researchers and developers participating in the Frugal AI Challenge
- **Out-of-scope use cases**: Not intended for production use or real-world classification tasks

## Training Data

The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
- Size: ~6000 examples
- Split: 80% train, 20% test
- 8 categories of climate disinformation claims
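
As an illustration of the 80/20 split, here is a minimal sketch using scikit-learn's `train_test_split`. The placeholder data, the fixed `random_state`, and the stratification are assumptions made for the example, not details confirmed by this card.

```python
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the ~6000 quotes and their label ids (0-7).
texts = [f"quote {i}" for i in range(6000)]
labels = [i % 8 for i in range(6000)]

# 80% train / 20% test. Stratifying keeps the label proportions
# the same in both parts (an assumption, not stated in this card).
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

print(len(train_texts), len(test_texts))  # 4800 1200
```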

### Labels
0. No relevant claim detected
1. Global warming is not happening
2. Not caused by humans
3. Not bad or beneficial
4. Solutions harmful/unnecessary
5. Science is unreliable
6. Proponents are biased
7. Fossil fuels are needed
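
For readers who want to turn predicted class ids back into readable names, the list above can be written as a lookup table. This is a small sketch; the names `ID2LABEL`, `LABEL2ID`, and `describe` are hypothetical helpers, not part of the submission code.

```python
# Class ids and names exactly as listed in this card.
ID2LABEL = {
    0: "No relevant claim detected",
    1: "Global warming is not happening",
    2: "Not caused by humans",
    3: "Not bad or beneficial",
    4: "Solutions harmful/unnecessary",
    5: "Science is unreliable",
    6: "Proponents are biased",
    7: "Fossil fuels are needed",
}

# Reverse mapping, useful when encoding string labels for training.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def describe(prediction: int) -> str:
    """Return the human-readable name for a predicted class id."""
    return ID2LABEL[prediction]


print(describe(5))  # Science is unreliable
```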

## Performance

### Metrics
- **Accuracy**: 0.9815384615384616
- **Environmental Impact**:
  - Emissions tracked in gCO2eq: 0.19426531051455168
  - Energy consumption tracked in Wh: 0.5262726046395284

### Model Architecture
The model vectorizes the input text with TF-IDF and assigns one of the 8 possible labels with an XGBoost classifier whose hyperparameters were selected with RandomizedSearchCV. It serves as a simple baseline.

## Environmental Impact

Environmental impact is tracked using CodeCarbon, measuring:
- Carbon emissions during inference
- Energy consumption during inference

This tracking helps establish a baseline for the environmental impact of model deployment and inference.
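
A minimal sketch of how inference could be wrapped with CodeCarbon to obtain figures in the units reported above (gCO2eq and Wh). It assumes CodeCarbon's task API (`start_task` / `stop_task`), which reports emissions in kg CO2eq and energy in kWh. The helper names `to_reported_units` and `track_inference` are hypothetical.

```python
def to_reported_units(emissions_kg: float, energy_kwh: float) -> dict:
    """Convert CodeCarbon's native units (kg CO2eq, kWh) to gCO2eq and Wh."""
    return {
        "emissions_gco2eq": emissions_kg * 1000,
        "energy_consumed_wh": energy_kwh * 1000,
    }


def track_inference(predict_fn, texts):
    """Run predict_fn(texts) while CodeCarbon measures emissions and energy."""
    # Imported here so the conversion helper above works without codecarbon installed.
    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker()
    tracker.start()
    tracker.start_task("inference")
    predictions = predict_fn(texts)
    data = tracker.stop_task()  # measurements for the "inference" task only
    tracker.stop()
    return predictions, to_reported_units(data.emissions, data.energy_consumed)
```

With a fitted model, a call such as `track_inference(model.predict, test_texts)` would return the predictions together with a dictionary of the two impact figures.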

## Limitations
- Text classification using XGBoost
- Input text vectorized with TF-IDF
- XGBoost hyperparameter search with RandomizedSearchCV
- Serves as baseline reference
- Not suitable for any real-world applications

## Ethical Considerations

- Dataset contains sensitive topics related to climate disinformation
- Model is a baseline and should not be used for actual classification
- Environmental impact is tracked to promote awareness of AI's carbon footprint