# Sentence Transformers

This task lets you easily train or fine-tune a Sentence Transformer model on your own dataset.

AutoTrain supports the following types of Sentence Transformer fine-tuning:

- `pair`: dataset with two sentences: `anchor` and `positive`
- `pair_class`: dataset with two sentences, `premise` and `hypothesis`, and a target `label`
- `pair_score`: dataset with two sentences, `sentence1` and `sentence2`, and a target `score`
- `triplet`: dataset with three sentences: `anchor`, `positive` and `negative`
- `qa`: dataset with two sentences: `query` and `answer`

## Data Format

Sentence Transformers fine-tuning accepts data in CSV or JSONL format. You can also use a dataset from the Hugging Face Hub.
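
As a minimal sketch of the two file formats, the snippet below writes the same `pair` rows to a CSV file and to a JSONL file using only the Python standard library. The file names and output directory are placeholders, and it assumes that the JSONL keys mirror the CSV column names; check your own column mapping before training.

```python
import csv
import json
import tempfile
from pathlib import Path

# Example rows for the `pair` format; column names follow this document.
rows = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
]

out_dir = Path(tempfile.mkdtemp())  # placeholder output location
csv_path = out_dir / "train.csv"      # hypothetical file name
jsonl_path = out_dir / "train.jsonl"  # hypothetical file name

# CSV: one header row, then one row per example.
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["anchor", "positive"])
    writer.writeheader()
    writer.writerows(rows)

# JSONL: one JSON object per line, with the same keys as the CSV columns.
with jsonl_path.open("w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```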

### `pair`

For `pair` training, the data should be in the following format:

| anchor | positive |
|--------|----------|
| hello  | hi       |
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |

### `pair_class`

For `pair_class` training, the data should be in the following format:

| premise | hypothesis | label |
|---------|------------|-------|
| hello   | hi         | 1     |
| how are you | I am fine | 0 |
| What is your name? | My name is Abhishek | 1 |
| Which is the best programming language? | Python | 1 |

### `pair_score`

For `pair_score` training, the data should be in the following format:

| sentence1 | sentence2 | score |
|-----------|-----------|-------|
| hello     | hi        | 0.8   |
| how are you | I am fine | 0.2 |
| What is your name? | My name is Abhishek | 0.9 |
| Which is the best programming language? | Python | 0.7 |
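
The `score` column is a numeric target, while a CSV reader returns every field as a string. The following is a minimal sketch, using only the standard library, of loading such rows for inspection and casting the score to a float. Whether AutoTrain performs this cast itself is not stated here, so treat this as a local sanity check rather than a required step.

```python
import csv
import io

# Rows copied from the `pair_score` table above.
csv_text = """sentence1,sentence2,score
hello,hi,0.8
how are you,I am fine,0.2
"""

records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # The csv module yields strings, so cast the target explicitly.
    row["score"] = float(row["score"])
    records.append(row)

print(records[0])
```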

### `triplet`

For `triplet` training, the data should be in the following format:

| anchor | positive | negative |
|--------|----------|----------|
| hello  | hi       | bye      |
| how are you | I am fine | I am not fine |
| What is your name? | My name is Abhishek | What's it to you? |
| Which is the best programming language? | Python | Javascript |

### `qa`

For `qa` training, the data should be in the following format:

| query | answer |
|-------|--------|
| hello | hi     |
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |
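
The column layouts above can be checked before training with a small script. This is only an illustrative sketch: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, not part of AutoTrain, and the column lists are copied from the tables in this section.

```python
import csv
import io
from typing import List

# Columns per trainer type, copied from the tables in this section.
EXPECTED_COLUMNS = {
    "pair": ["anchor", "positive"],
    "pair_class": ["premise", "hypothesis", "label"],
    "pair_score": ["sentence1", "sentence2", "score"],
    "triplet": ["anchor", "positive", "negative"],
    "qa": ["query", "answer"],
}


def missing_columns(csv_text: str, trainer: str) -> List[str]:
    """Return the expected columns that are absent from the CSV header.

    Hypothetical helper for illustration; not an AutoTrain function.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [col for col in EXPECTED_COLUMNS[trainer] if col not in header]


sample = "anchor,positive\nhello,hi\n"
print(missing_columns(sample, "pair"))     # []
print(missing_columns(sample, "triplet"))  # ['negative']
```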


## Parameters
    
[[autodoc]] trainers.sent_transformers.params.SentenceTransformersParams