---
title: Frugal AI Challenge Submission
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---


# Models for Climate Disinformation Classification

## Evaluate locally

To evaluate a model locally, run:

```bash
python main.py --config config_evaluation_{model_name}.json
```

where `{model_name}` is either `distilBERT` or `embeddingML`.


## Models Description

### DistilBERT Model

This model fine-tunes `distilbert-base-uncased` from the Hugging Face Transformers library on the
training dataset (see below).

### Embedding + ML Model

This model combines a text-embedding step with a classical ML classifier. Currently, the embedding is a simple
TF-IDF vectorizer and the classifier is a logistic regression.
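The TF-IDF + logistic regression approach can be sketched with scikit-learn. This is a minimal illustration only: the toy texts and labels below are made up, and the repository's actual vectorizer settings and training code may differ.

```python
# Minimal sketch of the embedding + ML approach: TF-IDF features
# feeding a logistic regression. Toy data, for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Global temperatures have risen over the last century.",
    "The climate has always changed; humans play no role.",
]
labels = [0, 2]  # 0 = no relevant claim, 2 = "not caused by humans"

# TF-IDF turns each text into a sparse term-weight vector;
# logistic regression classifies those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

prediction = model.predict(["Humans are not responsible for warming."])[0]
```

Wrapping both steps in a single `Pipeline` keeps vectorization and classification consistent between training and inference.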

## Training Data

The model uses the [`QuotaClimat/frugalaichallenge-text-train`](https://huggingface.co/datasets/QuotaClimat/frugalaichallenge-text-train) dataset:
- Size: ~6000 examples
- Split: 80% train, 20% test
- 8 categories of climate disinformation claims

### Labels
0. No relevant claim detected
1. Global warming is not happening
2. Not caused by humans
3. Not bad or beneficial
4. Solutions harmful/unnecessary
5. Science is unreliable
6. Proponents are biased
7. Fossil fuels are needed
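For reference, the label ids above can be written as a plain dictionary. The names are paraphrased from the list above; the dataset's canonical label strings may differ.

```python
# Hypothetical id-to-name mapping based on the label list above;
# check the dataset card for the canonical label strings.
LABELS = {
    0: "No relevant claim detected",
    1: "Global warming is not happening",
    2: "Not caused by humans",
    3: "Not bad or beneficial",
    4: "Solutions harmful/unnecessary",
    5: "Science is unreliable",
    6: "Proponents are biased",
    7: "Fossil fuels are needed",
}
```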

## Performance

### Metrics
- **Accuracy**: ~12.5% (random chance with 8 classes)
- **Environmental Impact**:
  - Emissions tracked in gCO2eq
  - Energy consumption tracked in Wh

### Model Architecture
The baseline implements a uniformly random choice among the 8 possible labels, serving as the simplest possible reference point for the models described above.
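Such a baseline can be sketched in a few lines (a sketch, not the repository's actual implementation):

```python
import random

NUM_CLASSES = 8  # the 8 claim categories listed above

def random_baseline(texts):
    """Ignore the input entirely and pick a uniformly random label per text."""
    return [random.randrange(NUM_CLASSES) for _ in texts]

preds = random_baseline(["some claim", "another claim"])
```

Because each label is drawn uniformly, expected accuracy is 1/8 = 12.5%, matching the figure above.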

## Environmental Impact

Environmental impact is tracked using CodeCarbon, measuring:
- Carbon emissions during inference
- Energy consumption during inference

This tracking helps establish a baseline for the environmental impact of model deployment and inference.

## Limitations of the Random Baseline
- Makes completely random predictions
- No learning or pattern recognition
- No consideration of input text
- Serves only as a baseline reference
- Not suitable for any real-world applications

## Ethical Considerations

- Dataset contains sensitive topics related to climate disinformation
- Model makes random predictions and should not be used for actual classification
- Environmental impact is tracked to promote awareness of AI's carbon footprint