radoslavralev committed on
Commit ce94dd5 · verified · 1 Parent(s): cc01ad0

Add new SentenceTransformer model

Files changed (2)
  1. README.md +68 -65
  2. model.safetensors +1 -1
README.md CHANGED
@@ -16,50 +16,53 @@ tags:
16
  - loss:MultipleNegativesSymmetricRankingLoss
17
  base_model: Alibaba-NLP/gte-modernbert-base
18
  widget:
19
- - source_sentence: In 2015 Adolf Hitler appeared in the kickstarter short movie ``
20
- Kung Fury `` as Taccone ( A.K.A .
21
  sentences:
22
- - In 2015 , Adolf Hitler appeared in the Kickstarter - short film `` Kung Fury ``
23
- as Taccone ( A.K.A .
24
- - In 1795 , the only white residents were Dr. John Laidley and two brothers with
25
- the surname Ainslie .
26
- - The 125th University Match was played in March 2014 at the Rye Golf Club , Oxford
27
- , East Sussex won the game 8.5 - 6.5 .
28
- - source_sentence: From 1973 to 1974 , Aubrey toured with the Cambridge Theatre Company
29
- as Diggory in `` She Stoops to Conquer `` and again as Aguecheek .
 
30
  sentences:
31
- - Oxide can be reduced to metallic samarium at higher temperatures by heating with
32
- a reducing agent such as hydrogen or carbon monoxide .
33
- - From 1973 to 1974 Aguecheek toured with the Cambridge Theatre Company as Diggory
34
- in `` You Stoops to Conquer `` and again as Aubrey .
35
- - The medals were presented by Barry Maister , IOC member , New Zealand and Sarah
36
- Webb Gosling , Vice President of World Sailing .
37
- - source_sentence: There is no official wall on the border , although there are sections
38
- of fence near populated areas and continuous border crossings .
39
  sentences:
40
- - The 2014 -- 15 Boston Bruins season was the 91st season for the National Hockey
41
- League franchise that was established on November 1 , 1924 .
42
- - He was trained by the Inghams and owned by John Hawkes .
43
- - There is no continuous wall on the border , although there are fence sections
44
- near populated areas and official border crossings .
45
- - source_sentence: Capital . `` The French established similar hill stations in Indochina
46
- , such as Dalat built in 1921 .
 
 
47
  sentences:
48
- - Lubuk China is a small town in Alor Gajah District , Melaka , Malaysia . It is
49
- situated near the border with Negeri Sembilan .
50
- - The French established similar hill stations in Indochina , such as Dalat , built
51
- in 1921 .
52
- - John Potts ( or Pott ) was a doctor and colonial governor of Virginia in the Jamestown
53
- settlement at Virginia Colony in the early 17th century .
54
- - source_sentence: The band pursued `` signals `` in January 2012 in three weeks ,
55
- and drums were recorded in a day and a half .
56
  sentences:
57
- - It was repaired at the beginning of the 20th century and is listed as closed in
58
- our records .
59
- - The band tracked `` Signals `` in three weeks in January 2012 . Drums were recorded
60
- in a day and a half .
61
- - Contributors include actor Anton LaVey , Satanist Christopher Lee , serial killer
62
- expert Clive Barker , author Karen Greenlee , and necrophile Robert Ressler .
63
  datasets:
64
  - redis/langcache-sentencepairs-v1
65
  pipeline_tag: sentence-similarity
@@ -159,9 +162,9 @@ from sentence_transformers import SentenceTransformer
159
  model = SentenceTransformer("redis/langcache-embed-v3")
160
  # Run inference
161
  sentences = [
162
- 'The band pursued `` signals `` in January 2012 in three weeks , and drums were recorded in a day and a half .',
163
- 'The band tracked `` Signals `` in three weeks in January 2012 . Drums were recorded in a day and a half .',
164
- 'Contributors include actor Anton LaVey , Satanist Christopher Lee , serial killer expert Clive Barker , author Karen Greenlee , and necrophile Robert Ressler .',
165
  ]
166
  embeddings = model.encode(sentences)
167
  print(embeddings.shape)
@@ -170,9 +173,9 @@ print(embeddings.shape)
170
  # Get the similarity scores for the embeddings
171
  similarities = model.similarity(embeddings, embeddings)
172
  print(similarities)
173
- # tensor([[0.9961, 0.9570, 0.4941],
174
- # [0.9570, 0.9961, 0.5078],
175
- # [0.4941, 0.5078, 1.0000]], dtype=torch.bfloat16)
176
  ```
177
 
178
  <!--
@@ -238,19 +241,19 @@ You can finetune this model on your own dataset.
238
  #### LangCache Sentence Pairs (all)
239
 
240
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
241
- * Size: 62,021 training samples
242
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
243
  * Approximate statistics based on the first 1000 samples:
244
- | | sentence1 | sentence2 | label |
245
- |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
246
- | type | string | string | int |
247
- | details | <ul><li>min: 8 tokens</li><li>mean: 27.46 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 27.36 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>0: ~50.30%</li><li>1: ~49.70%</li></ul> |
248
  * Samples:
249
- | sentence1 | sentence2 | label |
250
- |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
251
- | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>1</code> |
252
- | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> | <code>0</code> |
253
- | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
254
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
255
  ```json
256
  {
@@ -265,19 +268,19 @@ You can finetune this model on your own dataset.
265
  #### LangCache Sentence Pairs (all)
266
 
267
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
268
- * Size: 62,021 evaluation samples
269
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
270
  * Approximate statistics based on the first 1000 samples:
271
- | | sentence1 | sentence2 | label |
272
- |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
273
- | type | string | string | int |
274
- | details | <ul><li>min: 8 tokens</li><li>mean: 27.46 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 27.36 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>0: ~50.30%</li><li>1: ~49.70%</li></ul> |
275
  * Samples:
276
- | sentence1 | sentence2 | label |
277
- |:--------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
278
- | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>1</code> |
279
- | <code>Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .</code> | <code>Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .</code> | <code>0</code> |
280
- | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
281
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
282
  ```json
283
  {
 
16
  - loss:MultipleNegativesSymmetricRankingLoss
17
  base_model: Alibaba-NLP/gte-modernbert-base
18
  widget:
19
+ - source_sentence: 'See Precambrian time scale # Proposed Geologic timeline for another
20
+ set of periods 4600 -- 541 MYA .'
21
  sentences:
22
+ - In 2014 election , Biju Janata Dal candidate Tathagat Satapathy Bharatiya Janata
23
+ party candidate Rudra Narayan Pany defeated with a margin of 1.37,340 votes .
24
+ - In Scotland , the Strathclyde Partnership for Transport , formerly known as Strathclyde
25
+ Passenger Transport Executive , comprises the former Strathclyde region , which
26
+ includes the urban area around Glasgow .
27
+ - 'See Precambrian Time Scale # Proposed Geological Timeline for another set of
28
+ periods of 4600 -- 541 MYA .'
29
+ - source_sentence: It is also 5 kilometers northeast of Tamaqua , 27 miles south of
30
+ Allentown and 9 miles northwest of Hazleton .
31
  sentences:
32
+ - In 1948 he moved to Massachusetts , and eventually settled in Vermont .
33
+ - Suddenly I remembered that I was a New Zealander , I caught the first plane home
34
+ and came back .
35
+ - It is also 5 miles northeast of Tamaqua , 27 miles south of Allentown , and 9
36
+ miles northwest of Hazleton .
37
+ - source_sentence: The party has a Member of Parliament , a member of the House of
38
+ Lords , three members of the London Assembly and two Members of the European Parliament
39
+ .
40
  sentences:
41
+ - The party has one Member of Parliament , one member of the House of Lords , three
42
+ Members of the London Assembly and two Members of the European Parliament .
43
+ - Grapsid crabs dominate in Australia , Malaysia and Panama , while gastropods Cerithidea
44
+ scalariformis and Melampus coeffeus are important seed predators in Florida mangroves
45
+ .
46
+ - Music Story is a music service website and international music data provider that
47
+ curates , aggregates and analyses metadata for digital music services .
48
+ - source_sentence: 'The play received two 1969 Tony Award nominations : Best Actress
49
+ in a Play ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .'
50
  sentences:
51
+ - Ravishanker is a fellow of the International Statistical Institute and an elected
52
+ member of the American Statistical Association .
53
+ - 'In 1969 , the play received two Tony - Award nominations : Best Actress in a
54
+ Theatre Play ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .'
55
+ - AMD and Nvidia both have proprietary methods of scaling , CrossFireX for AMD ,
56
+ and SLI for Nvidia .
57
+ - source_sentence: He was a close friend of Ángel Cabrera and is a cousin of golfer
58
+ Tony Croatto .
59
  sentences:
60
+ - He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto
61
+ .
62
+ - Eugenijus Bartulis ( born December 7 , 1949 in Kaunas ) is a Lithuanian Roman
63
+ Catholic priest , and Bishop of Šiauliai .
64
+ - UWIRE also distributes its members content to professional media outlets , including
65
+ Yahoo , CNN and CBS News .
66
  datasets:
67
  - redis/langcache-sentencepairs-v1
68
  pipeline_tag: sentence-similarity
 
162
  model = SentenceTransformer("redis/langcache-embed-v3")
163
  # Run inference
164
  sentences = [
165
+ 'He was a close friend of Ángel Cabrera and is a cousin of golfer Tony Croatto .',
166
+ 'He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto .',
167
+ 'UWIRE also distributes its members content to professional media outlets , including Yahoo , CNN and CBS News .',
168
  ]
169
  embeddings = model.encode(sentences)
170
  print(embeddings.shape)
 
173
  # Get the similarity scores for the embeddings
174
  similarities = model.similarity(embeddings, embeddings)
175
  print(similarities)
176
+ # tensor([[0.9922, 0.9922, 0.5352],
177
+ # [0.9922, 0.9961, 0.5391],
178
+ # [0.5352, 0.5391, 1.0000]], dtype=torch.bfloat16)
179
  ```
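
To illustrate what the similarity matrix printed in the snippet above represents: assuming the model uses cosine similarity (to my knowledge the Sentence Transformers default when no other similarity function is configured), the matrix can be reproduced from raw embedding vectors as in the minimal sketch below. The `cosine_similarity_matrix` helper and the toy vectors are hypothetical and are not part of the model card.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity matrix for a list of vectors.

    Assumes every vector is non-zero; a zero vector would cause a
    division by zero and is not handled in this sketch.
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
            for v, norm_v in zip(vectors, norms)
        ]
        for u, norm_u in zip(vectors, norms)
    ]


# Toy 3-dimensional "embeddings" (hypothetical values, not model output).
# The first two are identical, the third is orthogonal to both.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
similarities = cosine_similarity_matrix(toy_embeddings)
```

In the README output, the two near-duplicate sentences score close to 1.0 and the unrelated sentence scores around 0.5. Diagonal values slightly below 1.0 there are likely an effect of bfloat16 rounding rather than a real difference between a sentence and itself.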
180
 
181
  <!--
 
241
  #### LangCache Sentence Pairs (all)
242
 
243
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
244
+ * Size: 26,850 training samples
245
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
246
  * Approximate statistics based on the first 1000 samples:
247
+ | | sentence1 | sentence2 | label |
248
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------|
249
+ | type | string | string | int |
250
+ | details | <ul><li>min: 8 tokens</li><li>mean: 27.35 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
251
  * Samples:
252
+ | sentence1 | sentence2 | label |
253
+ |:----------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------|:---------------|
254
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>1</code> |
255
+ | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
256
+ | <code>The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 .</code> | <code>The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon .</code> | <code>1</code> |
257
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
258
  ```json
259
  {
 
268
  #### LangCache Sentence Pairs (all)
269
 
270
  * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
271
+ * Size: 26,850 evaluation samples
272
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
273
  * Approximate statistics based on the first 1000 samples:
274
+ | | sentence1 | sentence2 | label |
275
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------|
276
+ | type | string | string | int |
277
+ | details | <ul><li>min: 8 tokens</li><li>mean: 27.35 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 27.27 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
278
  * Samples:
279
+ | sentence1 | sentence2 | label |
280
+ |:----------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------|:---------------|
281
+ | <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats .</code> | <code>The newer punts are still very much in existence today and run in the same fleets as the older boats .</code> | <code>1</code> |
282
+ | <code>After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall .</code> | <code>Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall .</code> | <code>1</code> |
283
+ | <code>The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 .</code> | <code>The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon .</code> | <code>1</code> |
284
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
285
  ```json
286
  {
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2436bd6a57ba4489ec4f129a2afc8517adaad6486ea6331ba055bc4c6ed07f24
3
  size 298041696
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
3
  size 298041696
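
A Git LFS pointer records the SHA-256 digest (`oid`) and the byte size of the real file, so a downloaded `model.safetensors` can be checked against the new pointer shown above. The following is a minimal sketch, assuming the weights have already been downloaded to a local path; `parse_lfs_pointer`, `file_sha256`, and `matches_pointer` are hypothetical helper names used only for illustration.

```python
import hashlib
import os

# Pointer contents copied from the new side of the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
size 298041696
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weights need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path, pointer_text):
    """Return True if the file's size and SHA-256 match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    if os.path.getsize(path) != int(fields["size"]):
        return False
    return file_sha256(path) == expected_digest
```

Calling `matches_pointer("model.safetensors", NEW_POINTER)` on a local copy (the path is an assumption) would then indicate whether the file corresponds to this commit. Because the size is unchanged between the two pointers, only the digest distinguishes the old weights from the new ones.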