Update README.md
---
datasets:
- yongchao/gptgen_text_detection
metrics:
- accuracy
pipeline_tag: text-classification
---

# BERT-based Classification Model for AI Generated Text Detection

## Model Overview

This BERT-based model is fine-tuned for AI-generated text detection, particularly in the text-to-SQL scenario.

Please note that this model is still in the testing phase; its validity has not been fully evaluated.
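
For reference, here is a minimal usage sketch with the `transformers` text-classification pipeline. The model id below is a placeholder for wherever this checkpoint is published, and the labels follow the default `LABEL_0`/`LABEL_1` convention, which may differ in the actual checkpoint:

```python
from transformers import pipeline

# Placeholder model id; substitute the actual Hub repo id for this checkpoint.
detector = pipeline("text-classification", model="your-username/bert-ai-text-detector")

# Score a candidate question, e.g. one aimed at a text-to-SQL system.
result = detector("List the names of all employees hired after 2020.")
print(result)  # e.g. [{'label': 'LABEL_1', 'score': 0.97}] -- label mapping is checkpoint-specific
```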

## Model Details

- **Architecture**: BERT (bert-base-uncased)
- **Training Data**: The model was trained on a dataset of 2,000 questions labeled as human-written or AI-generated.
- **Training Procedure** (see the training sketch below):
  - **Epochs**: 10
  - **Batch Size**: 16
  - **Learning Rate**: 2e-5
  - **Warmup Steps**: 500
  - **Weight Decay**: 0.01
- **Model Performance** (see the evaluation sketch below):
  - **Accuracy**: 84.5%
  - **Precision**: 1.0
  - **Recall**: 0.845
  - **F1 Score**: 0.916
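
The hyperparameters above map directly onto Hugging Face `TrainingArguments`. As a minimal, hypothetical sketch of the fine-tuning setup (dataset preparation is not specified in this card, so `train_dataset` stands in for a tokenized dataset):

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # binary task: human-written vs. AI-generated
)

# Hyperparameters taken from the Training Procedure list above.
training_args = TrainingArguments(
    output_dir="bert-ai-text-detector",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: a tokenized dataset with input_ids and labels
)
trainer.train()
```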
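The reported metrics follow the standard binary-classification definitions (a precision of 1.0 means no human-written text was flagged as AI-generated on the test set). A small illustrative sketch with `scikit-learn`, using made-up labels where 1 = AI-generated:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy labels for illustration only; 1 = AI-generated, 0 = human-written.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
print(precision_score(y_true, y_pred))  # of texts flagged as AI, how many truly are
print(recall_score(y_true, y_pred))     # of AI texts, how many were caught
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```
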
## Limitations and Ethical Considerations
### Limitations
The model may not perform well on text that differs significantly from its training data.
### Ethical Considerations
Be aware of potential biases in the training data that could affect the model's predictions. Ensure that the model is used in a fair and unbiased manner.
## References
- **BERT Paper**: Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT.
- **Dataset**: [yongchao/gptgen_text_detection](https://huggingface.co/datasets/yongchao/gptgen_text_detection)