# Sentence Transformers

This task lets you easily train or fine-tune a Sentence Transformer model on your own dataset.

AutoTrain supports the following types of Sentence Transformer fine-tuning:

- `pair`: a dataset with two sentences per row, `anchor` and `positive`
- `pair_class`: a dataset with two sentences per row, `premise` and `hypothesis`, plus a target `label`
- `pair_score`: a dataset with two sentences per row, `sentence1` and `sentence2`, plus a target `score`
- `triplet`: a dataset with three sentences per row, `anchor`, `positive`, and `negative`
- `qa`: a dataset with two sentences per row, `query` and `answer`
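
As a quick pre-upload check, the sketch below maps each trainer type to the column names listed above and reports any columns that a file is missing. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names and are not part of AutoTrain. The sketch also assumes your data uses these default column names unchanged.

```python
# Column names for each trainer type, copied from the list above.
# Assumption: the dataset uses these default names as-is (no renaming).
REQUIRED_COLUMNS = {
    "pair": ["anchor", "positive"],
    "pair_class": ["premise", "hypothesis", "label"],
    "pair_score": ["sentence1", "sentence2", "score"],
    "triplet": ["anchor", "positive", "negative"],
    "qa": ["query", "answer"],
}


def missing_columns(trainer: str, columns: list[str]) -> list[str]:
    """Return the required columns for `trainer` that are absent from `columns`.

    Hypothetical helper for a pre-upload sanity check; not an AutoTrain API.
    """
    if trainer not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown trainer type: {trainer!r}")
    present = set(columns)
    return [col for col in REQUIRED_COLUMNS[trainer] if col not in present]
```

For example, `missing_columns("triplet", ["anchor", "positive"])` returns `["negative"]`.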

## Data Format

Sentence Transformers fine-tuning accepts data in CSV or JSONL format. You can also use a dataset from the Hugging Face Hub.
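
The following minimal sketch shows what the two file formats look like, using only the Python standard library. CSV has one header row followed by one line per example. JSONL stores one JSON object per line. The helper names `to_csv` and `to_jsonl` are illustrative and are not part of AutoTrain.

```python
import csv
import io
import json


def to_csv(rows: list[dict]) -> str:
    """Serialize rows (dicts sharing the same keys) as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


rows = [{"anchor": "hello", "positive": "hi"}]
csv_text = to_csv(rows)      # "anchor,positive\nhello,hi\n"
jsonl_text = to_jsonl(rows)  # '{"anchor": "hello", "positive": "hi"}\n'
```

To save a result to disk, write the string to a file such as `train.csv` or `train.jsonl`. These file names are placeholders.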

### `pair`

For `pair` training, the data should be in the following format:

| anchor | positive |
|--------|----------|
| hello | hi |
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |
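
You may want to run an optional cleaning pass before training on `pair` data. The sketch below uses a hypothetical `clean_pairs` helper, which is not an AutoTrain feature. It strips surrounding whitespace, drops rows where either side is empty, and removes exact duplicate pairs.

```python
def clean_pairs(rows: list[dict]) -> list[dict]:
    """Strip whitespace, drop incomplete rows, and remove exact duplicate pairs."""
    seen = set()
    cleaned = []
    for row in rows:
        anchor = (row.get("anchor") or "").strip()
        positive = (row.get("positive") or "").strip()
        if not anchor or not positive:
            continue  # skip rows with a missing side
        key = (anchor, positive)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({"anchor": anchor, "positive": positive})
    return cleaned


raw = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
    {"anchor": "hello ", "positive": "hi"},  # duplicate after stripping
    {"anchor": "What is your name?", "positive": ""},  # incomplete row
]
pairs = clean_pairs(raw)
```

Here `pairs` keeps only the first two rows. The third row becomes a duplicate of the first after stripping, and the fourth has an empty `positive`.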

### `pair_class`

For `pair_class` training, the data should be in the following format:

| premise | hypothesis | label |
|---------|------------|-------|
| hello | hi | 1 |
| how are you | I am fine | 0 |
| What is your name? | My name is Abhishek | 1 |
| Which is the best programming language? | Python | 1 |
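
In the table above, the `label` column holds integer class ids (0 and 1). The sketch below parses the example as CSV, converts each label to `int`, and counts examples per class, which is a quick way to spot class imbalance. It assumes integer labels. `load_pair_class` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# The pair_class example table above, as CSV text.
CSV_TEXT = """premise,hypothesis,label
hello,hi,1
how are you,I am fine,0
What is your name?,My name is Abhishek,1
Which is the best programming language?,Python,1
"""


def load_pair_class(text: str) -> list[dict]:
    """Parse pair_class CSV and convert the label column to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["label"] = int(row["label"])  # raises ValueError on non-integer labels
        rows.append(row)
    return rows


rows = load_pair_class(CSV_TEXT)
label_counts = Counter(row["label"] for row in rows)
```

For the example table, `label_counts` is `Counter({1: 3, 0: 1})`.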

### `pair_score`

For `pair_score` training, the data should be in the following format:

| sentence1 | sentence2 | score |
|-----------|-----------|-------|
| hello | hi | 0.8 |
| how are you | I am fine | 0.2 |
| What is your name? | My name is Abhishek | 0.9 |
| Which is the best programming language? | Python | 0.7 |
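
Each row in the `score` column holds a number for its sentence pair. The sketch below loads the example table from JSONL and checks that every score parses as a float within [0, 1]. That range matches the example values but is an assumption, because this page does not state a required range. Adjust `low` and `high` to fit your data. `load_scores` is a hypothetical helper.

```python
import json

# The pair_score example table above, as JSON Lines.
JSONL_TEXT = """\
{"sentence1": "hello", "sentence2": "hi", "score": 0.8}
{"sentence1": "how are you", "sentence2": "I am fine", "score": 0.2}
{"sentence1": "What is your name?", "sentence2": "My name is Abhishek", "score": 0.9}
{"sentence1": "Which is the best programming language?", "sentence2": "Python", "score": 0.7}
"""


def load_scores(text: str, low: float = 0.0, high: float = 1.0) -> list[dict]:
    """Parse pair_score JSONL and check every score is a number in [low, high].

    The [0, 1] default mirrors the example values; it is an assumption,
    not a documented AutoTrain requirement.
    """
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    for i, row in enumerate(rows):
        score = float(row["score"])
        if not low <= score <= high:
            raise ValueError(f"row {i}: score {score} outside [{low}, {high}]")
        row["score"] = score
    return rows


rows = load_scores(JSONL_TEXT)
```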

### `triplet`

For `triplet` training, the data should be in the following format:

| anchor | positive | negative |
|--------|----------|----------|
| hello | hi | bye |
| how are you | I am fine | I am not fine |
| What is your name? | My name is Abhishek | What's it to you? |
| Which is the best programming language? | Python | JavaScript |
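
If you only have `pair` data, one simple way to produce `triplet` rows is to borrow another row's `positive` as the `negative`. This is an illustration only and does not describe how AutoTrain builds triplets. The hypothetical `pairs_to_triplets` helper below uses a rotate-by-one scheme. It assumes that no two rows share a positive, and it yields random negatives rather than hard negatives.

```python
def pairs_to_triplets(pairs: list[dict]) -> list[dict]:
    """Build triplets by reusing the next row's positive as each row's negative.

    The rotate-by-one scheme is only an illustration: it assumes no two rows
    share a positive, and it yields easy (random) negatives rather than
    carefully mined hard negatives.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to borrow negatives from")
    triplets = []
    for i, row in enumerate(pairs):
        other = pairs[(i + 1) % len(pairs)]  # wrap around for the last row
        triplets.append({
            "anchor": row["anchor"],
            "positive": row["positive"],
            "negative": other["positive"],
        })
    return triplets


pairs = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
    {"anchor": "Which is the best programming language?", "positive": "Python"},
]
triplets = pairs_to_triplets(pairs)
```

With these three pairs, the `negative` for "hello" is "I am fine". The last row wraps around and borrows "hi".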

### `qa`

For `qa` training, the data should be in the following format:

| query | answer |
|-------|--------|
| hello | hi |
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |
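
Source data for `qa` training often maps each question to one or more answers. The sketch below flattens such a mapping into one `query`/`answer` row per pair and skips blank answers. The `flatten_qa` helper and the sample answers are hypothetical. The sketch also assumes that repeated queries suit your training objective, which is worth checking.

```python
def flatten_qa(faq: dict[str, list[str]]) -> list[dict]:
    """Turn {question: [answers...]} into one qa row per (query, answer) pair."""
    rows = []
    for query, answers in faq.items():
        for answer in answers:
            if answer.strip():  # ignore blank answers
                rows.append({"query": query, "answer": answer})
    return rows


faq = {
    "hello": ["hi", "hey"],
    "how are you": ["I am fine", " "],  # the blank answer is skipped
}
rows = flatten_qa(faq)
```

The resulting `rows` list has three entries: two for "hello" and one for "how are you".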

## Parameters

[[autodoc]] trainers.sent_transformers.params.SentenceTransformersParams