# Sentence Transformers

This task lets you easily train or fine-tune a Sentence Transformer model on your own dataset.

AutoTrain supports the following types of Sentence Transformer fine-tuning:

- `pair`: each row has two sentences, an anchor and a positive
- `pair_class`: each row has two sentences, a premise and a hypothesis, plus a target label
- `pair_score`: each row has two sentences, sentence1 and sentence2, plus a target score
- `triplet`: each row has three sentences: an anchor, a positive, and a negative
- `qa`: each row has two sentences, a query and an answer
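The column layout for each type can be sketched as a small lookup table. This is only an illustration: the column names are the ones used in the examples on this page, and `EXPECTED_COLUMNS` and `missing_columns` are hypothetical helpers, not part of AutoTrain.

```python
# Column names per trainer type, copied from the examples on this page.
# Assumption: your dataset uses these exact names.
EXPECTED_COLUMNS = {
    "pair": ["anchor", "positive"],
    "pair_class": ["premise", "hypothesis", "label"],
    "pair_score": ["sentence1", "sentence2", "score"],
    "triplet": ["anchor", "positive", "negative"],
    "qa": ["query", "answer"],
}


def missing_columns(trainer, header):
    """Return the expected columns that are absent from `header` (hypothetical helper)."""
    if trainer not in EXPECTED_COLUMNS:
        raise ValueError(f"unknown trainer type: {trainer!r}")
    present = set(header)
    return [col for col in EXPECTED_COLUMNS[trainer] if col not in present]


# A triplet dataset that lacks its negative column.
print(missing_columns("triplet", ["anchor", "positive"]))  # ['negative']
```

A check like this can catch a mislabeled header before a training run is started.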

## Data Format

Sentence Transformers fine-tuning accepts data in CSV or JSONL format. You can also use a dataset from the Hugging Face Hub.
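As a rough sketch of what the two file formats look like in practice, the snippet below writes the same `pair` rows to a CSV file and a JSONL file using only the Python standard library. The file names `train.csv` and `train.jsonl` and the temporary output directory are assumptions chosen for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path

# Two example rows in the `pair` layout (anchor, positive).
rows = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
]

# Hypothetical output location; any writable directory works.
out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / "train.csv"
jsonl_path = out_dir / "train.jsonl"

# CSV: one header row, then one line per example.
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["anchor", "positive"])
    writer.writeheader()
    writer.writerows(rows)

# JSONL: one JSON object per line, keyed by column name.
with jsonl_path.open("w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Both files carry the same information; JSONL tends to be easier to handle when sentences contain commas, quotes, or newlines.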

### `pair`

For `pair` training, the data should be in the following format:

| anchor | positive |
|--------|----------|
| hello  | hi       |
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |

### `pair_class`

For `pair_class` training, the data should be in the following format:

| premise | hypothesis | label |
|---------|------------|-------|
| hello   | hi         | 1     |
| how are you | I am fine | 0 |
| What is your name? | My name is Abhishek | 1 |
| Which is the best programming language? | Python | 1 |

### `pair_score`

For `pair_score` training, the data should be in the following format:

| sentence1 | sentence2 | score |
|-----------|-----------|-------|
| hello     | hi        | 0.8   |
| how are you | I am fine | 0.2 |
| What is your name? | My name is Abhishek | 0.9 |
| Which is the best programming language? | Python | 0.7 |
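Since the `score` column must hold numbers, it can help to confirm that every value parses as a float before training. The following is a minimal sketch under that assumption; `load_pair_score` is a hypothetical helper, and this page does not specify a required score range, so none is enforced here.

```python
import csv
import io

# Sample data copied from the table above.
sample = """sentence1,sentence2,score
hello,hi,0.8
how are you,I am fine,0.2
"""


def load_pair_score(text):
    """Parse pair_score CSV text into (sentence1, sentence2, float score) tuples."""
    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        try:
            score = float(row["score"])
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"line {line_no}: score is not numeric: {row['score']!r}"
            ) from exc
        rows.append((row["sentence1"], row["sentence2"], score))
    return rows


pairs = load_pair_score(sample)
```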

### `triplet`

For `triplet` training, the data should be in the following format:

| anchor | positive | negative |
|--------|----------|----------|
| hello  | hi       | bye      |
| how are you | I am fine | I am not fine |
| What is your name? | My name is Abhishek | What's it to you? |
| Which is the best programming language? | Python | JavaScript |

### `qa`

For `qa` training, the data should be in the following format:

| query | answer |
|-------|--------|
| hello | hi     |
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |


## Parameters
    
[[autodoc]] trainers.sent_transformers.params.SentenceTransformersParams