	# Sentence Transformers

	This task lets you easily train or fine-tune a Sentence Transformer model on your own dataset.

	AutoTrain supports the following types of Sentence Transformer fine-tuning:

	- `pair`: two text columns, `anchor` and `positive`
	- `pair_class`: two text columns, `premise` and `hypothesis`, plus a target `label`
	- `pair_score`: two text columns, `sentence1` and `sentence2`, plus a target `score`
	- `triplet`: three text columns, `anchor`, `positive` and `negative`
	- `qa`: two text columns, `query` and `answer`
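The column layout of each type can be written down as a small lookup table, which makes it easy to check a file header before starting a run. This is only a sketch: it assumes your file uses exactly the column names shown in this document, and `REQUIRED_COLUMNS` and `missing_columns` are illustrative names, not part of AutoTrain.

```python
# Column names come from the list above; the dict and helper are hypothetical.
REQUIRED_COLUMNS = {
    "pair": ["anchor", "positive"],
    "pair_class": ["premise", "hypothesis", "label"],
    "pair_score": ["sentence1", "sentence2", "score"],
    "triplet": ["anchor", "positive", "negative"],
    "qa": ["query", "answer"],
}


def missing_columns(trainer, header):
    """Return the columns expected for `trainer` that are absent from `header`."""
    required = REQUIRED_COLUMNS[trainer]
    return [name for name in required if name not in header]


# A triplet file without a `negative` column is reported as incomplete.
print(missing_columns("triplet", ["anchor", "positive"]))
```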

	## Data Format

	Sentence Transformers fine-tuning accepts data in CSV or JSONL format. You can also use a dataset from the Hugging Face Hub.
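To make the two file formats concrete, the sketch below serializes the same records once as CSV (a header row followed by one row per example) and once as JSONL (one JSON object per line). It uses only the Python standard library. The `anchor` and `positive` columns are those of the `pair` type; the helper names `to_csv` and `to_jsonl` are illustrative.

```python
import csv
import io
import json

# Two example records using the `pair` columns.
rows = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
]


def to_csv(records):
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_jsonl(records):
    """Serialize records as JSONL text, one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


csv_text = to_csv(rows)
jsonl_text = to_jsonl(rows)
print(csv_text)
print(jsonl_text)
```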

	### `pair`

	For `pair` training, the data should be in the following format:

	| anchor | positive |
	|--------|----------|
	| hello | hi |
	| how are you | I am fine |
	| What is your name? | My name is Abhishek |
	| Which is the best programming language? | Python |

	### `pair_class`

	For `pair_class` training, the data should be in the following format:

	| premise | hypothesis | label |
	|---------|------------|-------|
	| hello | hi | 1 |
	| how are you | I am fine | 0 |
	| What is your name? | My name is Abhishek | 1 |
	| Which is the best programming language? | Python | 1 |

	### `pair_score`

	For `pair_score` training, the data should be in the following format:

	| sentence1 | sentence2 | score |
	|-----------|-----------|-------|
	| hello | hi | 0.8 |
	| how are you | I am fine | 0.2 |
	| What is your name? | My name is Abhishek | 0.9 |
	| Which is the best programming language? | Python | 0.7 |
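When you inspect a `pair_score` CSV file yourself, note that Python's `csv` module returns every field as a string, so the `score` column has to be converted to a number before you can check or compare values. A minimal sketch, assuming the column names shown above (`load_pair_score` is an illustrative helper, not an AutoTrain function):

```python
import csv
import io

# Inline CSV text standing in for a file on disk.
csv_text = """sentence1,sentence2,score
hello,hi,0.8
how are you,I am fine,0.2
"""


def load_pair_score(text):
    """Parse pair_score CSV text and convert the score column to float."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record["score"] = float(record["score"])
        rows.append(record)
    return rows


examples = load_pair_score(csv_text)
print(examples[0])
```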

	### `triplet`

	For `triplet` training, the data should be in the following format:

	| anchor | positive | negative |
	|--------|----------|----------|
	| hello | hi | bye |
	| how are you | I am fine | I am not fine |
	| What is your name? | My name is Abhishek | What's it to you? |
	| Which is the best programming language? | Python | JavaScript |

	### `qa`

	For `qa` training, the data should be in the following format:

	| query | answer |
	|-------|--------|
	| hello | hi |
	| how are you | I am fine |
	| What is your name? | My name is Abhishek |
	| Which is the best programming language? | Python |


	## Parameters

	[[autodoc]] trainers.sent_transformers.params.SentenceTransformersParams
