# Sentence Transformers
This task lets you train or fine-tune a Sentence Transformer model on your own dataset.
AutoTrain supports the following types of Sentence Transformer fine-tuning:
- `pair`: dataset with two sentences: anchor and positive
- `pair_class`: dataset with two sentences (premise and hypothesis) and a target label
- `pair_score`: dataset with two sentences (sentence1 and sentence2) and a target score
- `triplet`: dataset with three sentences: anchor, positive and negative
- `qa`: dataset with two sentences: query and answer
## Data Format
Sentence Transformer fine-tuning accepts data in CSV or JSONL format. You can also use a dataset from the Hugging Face Hub.
### `pair`
For `pair` training, the data should be in the following format:
| anchor | positive |
|--------|----------|
| hello | hi |
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |
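
As a rough illustration, the table above could be produced as CSV text with Python's standard `csv` module. This is a minimal sketch, assuming the CSV header must match the column names shown in the table. The helper name `pairs_to_csv` is hypothetical and not part of AutoTrain.

```python
import csv
import io

# Example rows copied from the `pair` table above.
pair_rows = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
    {"anchor": "What is your name?", "positive": "My name is Abhishek"},
    {"anchor": "Which is the best programming language?", "positive": "Python"},
]


def pairs_to_csv(rows):
    """Serialize rows to CSV text with an `anchor,positive` header.

    Hypothetical helper for illustration; not an AutoTrain function.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["anchor", "positive"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = pairs_to_csv(pair_rows)
print(csv_text)
```

To write a file instead, the same `csv.DictWriter` calls can target a handle opened with `open("train.csv", "w", newline="")`, where `train.csv` is only an example file name.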
### `pair_class`
For `pair_class` training, the data should be in the following format:
| premise | hypothesis | label |
|---------|------------|-------|
| hello | hi | 1 |
| how are you | I am fine | 0 |
| What is your name? | My name is Abhishek | 1 |
| Which is the best programming language? | Python | 1 |
### `pair_score`
For `pair_score` training, the data should be in the following format:
| sentence1 | sentence2 | score |
|-----------|-----------|-------|
| hello | hi | 0.8 |
| how are you | I am fine | 0.2 |
| What is your name? | My name is Abhishek | 0.9 |
| Which is the best programming language? | Python | 0.7 |
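
The same data can also be expressed as JSONL, with one JSON object per line. The sketch below assumes that the JSONL keys mirror the column names in the table above and that `score` is stored as a JSON number. The helper `to_jsonl` is hypothetical and not part of AutoTrain.

```python
import json

# Example rows copied from the `pair_score` table above.
score_rows = [
    {"sentence1": "hello", "sentence2": "hi", "score": 0.8},
    {"sentence1": "how are you", "sentence2": "I am fine", "score": 0.2},
    {"sentence1": "What is your name?", "sentence2": "My name is Abhishek", "score": 0.9},
    {"sentence1": "Which is the best programming language?", "sentence2": "Python", "score": 0.7},
]


def to_jsonl(rows):
    """Serialize each row as one JSON object per line (JSONL).

    Hypothetical helper for illustration; not an AutoTrain function.
    """
    return "\n".join(json.dumps(row) for row in rows) + "\n"


jsonl_text = to_jsonl(score_rows)
print(jsonl_text)
```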
### `triplet`
For `triplet` training, the data should be in the following format:
| anchor | positive | negative |
|--------|----------|----------|
| hello | hi | bye |
| how are you | I am fine | I am not fine |
| What is your name? | My name is Abhishek | What's it to you? |
| Which is the best programming language? | Python | JavaScript |
### `qa`
For `qa` training, the data should be in the following format:
| query | answer |
|-------|--------|
| hello | hi |
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |
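
Across the five task types, the tables above differ only in their column names. Before training, it can help to confirm that a CSV header contains the columns for the chosen type. The following is a minimal sketch of such a check, assuming the default column names from the tables. The `EXPECTED_COLUMNS` mapping and the `missing_columns` helper are hypothetical and do not reflect AutoTrain's own validation, which may also allow custom column names.

```python
import csv
import io

# Column names taken from the tables above, keyed by task type.
EXPECTED_COLUMNS = {
    "pair": ["anchor", "positive"],
    "pair_class": ["premise", "hypothesis", "label"],
    "pair_score": ["sentence1", "sentence2", "score"],
    "triplet": ["anchor", "positive", "negative"],
    "qa": ["query", "answer"],
}


def missing_columns(csv_text, task):
    """Return the expected columns that are absent from the CSV header.

    Hypothetical helper for illustration; not an AutoTrain function.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [col for col in EXPECTED_COLUMNS[task] if col not in header]


sample = "anchor,positive\nhello,hi\n"
print(missing_columns(sample, "pair"))     # []
print(missing_columns(sample, "triplet"))  # ['negative']
```
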
## Parameters
[[autodoc]] trainers.sent_transformers.params.SentenceTransformersParams