laureBe

Update README.md

e862b5e verified 6 months ago

	---
	title: Submission Template
	emoji: 🔥
	colorFrom: yellow
	colorTo: green
	sdk: docker
	pinned: false
	---


	# Climate Disinformation Classification using XGBoost over TF-IDF Features, Tuned with RandomizedSearchCV

	## Model Description

	This model is an XGBoost classifier trained on TF-IDF-vectorized text for the Frugal AI Challenge 2024, specifically the text classification task of identifying climate disinformation. It serves as a performance floor against which other submissions can be compared.

	### Intended Use

	- Primary intended uses: Comparison for climate disinformation classification models
	- Primary intended users: Researchers and developers participating in the Frugal AI Challenge
	- Out-of-scope use cases: Not intended for production use or real-world classification tasks

	## Training Data

	The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
	- Size: ~6000 examples
	- Split: 80% train, 20% test
	- 8 categories of climate disinformation claims
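
The 80/20 split above could be reproduced along the following lines. This is a minimal sketch in plain Python, assuming a seeded shuffle; the function name `split_80_20`, the seed, and the toy rows are illustrative, and the actual submission may split the data differently.

```python
import random

def split_80_20(rows, test_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed and return (train, test)."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded, so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test               # everything before `cut` is training data
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for the ~6000 (text, label) examples; not real dataset content.
rows = [(f"quote {i}", i % 8) for i in range(6000)]
train, test = split_80_20(rows)
print(len(train), len(test))
```

Fixing the seed keeps the held-out examples identical between runs, which matters when accuracy figures are compared across models.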

	### Labels
	0. No relevant claim detected
	1. Global warming is not happening
	2. Not caused by humans
	3. Not bad or beneficial
	4. Solutions harmful/unnecessary
	5. Science is unreliable
	6. Proponents are biased
	7. Fossil fuels are needed
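
For use in code, the label list above can be expressed as a lookup table. This is a minimal sketch: the names `ID_TO_LABEL` and `describe` are illustrative and not taken from the submission, and the strings are the descriptions from this README rather than the dataset's raw label values.

```python
# Class ids and their descriptions, copied from the label list in this README.
ID_TO_LABEL = {
    0: "No relevant claim detected",
    1: "Global warming is not happening",
    2: "Not caused by humans",
    3: "Not bad or beneficial",
    4: "Solutions harmful/unnecessary",
    5: "Science is unreliable",
    6: "Proponents are biased",
    7: "Fossil fuels are needed",
}

def describe(class_id: int) -> str:
    """Return the human-readable description for a predicted class id."""
    return ID_TO_LABEL[class_id]

print(describe(5))
```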

	## Performance

	### Metrics
	- Accuracy: 0.9815384615384616
	- Environmental Impact:
	  - Emissions tracked in gCO2eq: 0.19426531051455168
	  - Energy consumption tracked in Wh: 0.5262726046395284
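
To make the figures above easier to interpret: accuracy is the fraction of correct predictions, and dividing emissions by energy gives the implied carbon intensity of the electricity used. The sketch below is illustrative; the helper names and the toy predictions are assumptions, and only the two environmental figures come from this README.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def carbon_intensity_g_per_kwh(emissions_g, energy_wh):
    """Implied carbon intensity in gCO2eq per kWh (1 kWh = 1000 Wh)."""
    return emissions_g / energy_wh * 1000.0

# Toy labels (not from the dataset): 3 of 4 predictions are correct.
toy_accuracy = accuracy([0, 1, 2, 3], [0, 1, 2, 7])
print(toy_accuracy)

# Emissions and energy as reported in the Metrics section.
intensity = carbon_intensity_g_per_kwh(0.19426531051455168, 0.5262726046395284)
print(round(intensity), "gCO2eq/kWh")
```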

	### Model Architecture
	The model is an XGBoost classifier applied to TF-IDF-vectorized input text, with hyperparameters selected by RandomizedSearchCV. It predicts one of the 8 possible labels and serves as a simple baseline.
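
A TF-IDF, XGBoost, and RandomizedSearchCV setup of the kind named in this README could be assembled roughly as follows. This is a sketch under assumptions, not the submission's actual code: the search space, `n_iter`, `cv`, and the names `PARAM_SPACE` and `build_search` are hypothetical. It relies on scikit-learn and the `xgboost` package, which are imported inside the function so the search space can be inspected without them installed.

```python
# Hypothetical search space; keys follow scikit-learn's "<step>__<param>" convention.
PARAM_SPACE = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__min_df": [1, 2, 5],
    "clf__n_estimators": [100, 200, 400],
    "clf__max_depth": [3, 5, 7],
    "clf__learning_rate": [0.05, 0.1, 0.3],
}

def build_search(n_iter=10, cv=3, seed=42):
    """Return an unfitted RandomizedSearchCV over a TF-IDF + XGBoost pipeline."""
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.pipeline import Pipeline
    from xgboost import XGBClassifier

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),   # raw text -> sparse TF-IDF matrix
        ("clf", XGBClassifier()),       # gradient-boosted trees over the classes
    ])
    return RandomizedSearchCV(
        pipeline,
        param_distributions=PARAM_SPACE,  # lists are sampled uniformly at random
        n_iter=n_iter,                    # number of sampled configurations
        scoring="accuracy",
        cv=cv,
        random_state=seed,
    )
```

Calling `build_search().fit(train_texts, train_labels)` with integer labels 0 to 7 would run the search, after which `best_estimator_` holds the tuned pipeline. Wrapping the vectorizer and classifier in one `Pipeline` means TF-IDF is refit inside each cross-validation fold, so vocabulary statistics from validation folds do not leak into training.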

	## Environmental Impact

	Environmental impact is tracked using CodeCarbon, measuring:
	- Carbon emissions during inference
	- Energy consumption during inference

	This tracking helps establish a baseline for the environmental impact of model deployment and inference.
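
A tracking setup of this kind might look like the sketch below. It assumes the `codecarbon` package and its `EmissionsTracker` task API (`start_task` / `stop_task`), which reports emissions in kg CO2eq and energy in kWh; the helper names `to_report` and `track_inference` and the task name `"inference"` are illustrative, not taken from the submission.

```python
def to_report(emissions_kg, energy_kwh):
    """Convert CodeCarbon's units (kg CO2eq, kWh) to gCO2eq and Wh."""
    return {
        "emissions_gco2eq": emissions_kg * 1000.0,
        "energy_consumed_wh": energy_kwh * 1000.0,
    }

def track_inference(predict_fn, texts):
    """Run predict_fn(texts) under a CodeCarbon tracker; return (predictions, report)."""
    from codecarbon import EmissionsTracker  # imported lazily: optional dependency

    tracker = EmissionsTracker()
    tracker.start()
    tracker.start_task("inference")
    try:
        predictions = predict_fn(texts)
        # Assumption: the returned object exposes .emissions and .energy_consumed.
        data = tracker.stop_task()
    finally:
        tracker.stop()
    return predictions, to_report(data.emissions, data.energy_consumed)
```

Scoping the task to the prediction call alone keeps model loading and data preparation out of the reported inference figures.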

	## Limitations
	- Text classification using XGBoost
	- Input text vectorized with TF-IDF
	- XGBoost parameter search with RandomizedSearchCV
	- Serves as a baseline reference
	- Not suitable for any real-world applications

	## Ethical Considerations

	- Dataset contains sensitive topics related to climate disinformation
	- Model is a baseline and should not be used for actual classification
	- Environmental impact is tracked to promote awareness of AI's carbon footprint