Jakh0103 committed on
Commit 0e14495 · verified · 1 Parent(s): b806a38

Create README.md

Files changed (1)
  1. README.md +38 -0
README.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ datasets:
+ - cis-lmu/glotlid-corpus
+ pipeline_tag: text-classification
+ metrics:
+ - f1
+ ---
+
+ ## Description
+ **ConLID** is a language identification model that supports more than 2,000 languages. Each label pairs a three-letter ISO language code with a script code, for example `eng_Latn`. For the full list of supported labels, see [labels.json](https://huggingface.co/Jakh0103/lid/blob/main/labels.json).
+
+ Repository: [GitHub](https://github.com/epfl-nlp/language-identification)
+
+ ## Usage
+
+ **Download the model**
+ ```python
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(repo_id="Jakh0103/lid", local_dir="checkpoint")
+ ```
+
+ **Use the model**
+ ```python
+ from model import LID
+ model = LID.from_pretrained(dir='checkpoint')
+
+ # Print the supported labels
+ print(model.get_labels())
+ # ['aai_Latn', 'aak_Latn', 'aau_Latn', 'aaz_Latn', 'aba_Latn', ...]
+
+ # Predict the most likely label; pass k to get the top-k labels
+ model.predict("The cat climbed onto the roof to enjoy the warm sunlight peacefully!")
+ # (['eng_Latn'], [0.970989465713501])
+
+ model.predict("The cat climbed onto the roof to enjoy the warm sunlight peacefully!", k=3)
+ # (['eng_Latn', 'sco_Latn', 'jam_Latn'], [0.970989465713501, 0.006496887654066086, 0.00487488554790616])
+ ```
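
The tuple printed by `model.predict` in the README can be post-processed without loading the model. The following is a minimal sketch, assuming the return value always has the `(labels, scores)` shape shown in the example output and that every label has the form `<language>_<script>`. `split_label` and `filter_predictions` are hypothetical helpers for illustration, not part of the repository.

```python
from typing import List, Tuple

# Shape assumed from the README example output: (labels, scores)
Prediction = Tuple[List[str], List[float]]


def split_label(label: str) -> Tuple[str, str]:
    """Split a label such as 'eng_Latn' into (language, script)."""
    language, _, script = label.rpartition("_")
    if not language or not script:
        raise ValueError(f"unexpected label format: {label!r}")
    return language, script


def filter_predictions(
    prediction: Prediction, min_score: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Keep only predictions whose score is at least min_score."""
    labels, scores = prediction
    return [
        (*split_label(label), score)
        for label, score in zip(labels, scores)
        if score >= min_score
    ]


# Values copied from the README's k=3 example output
example = (
    ["eng_Latn", "sco_Latn", "jam_Latn"],
    [0.970989465713501, 0.006496887654066086, 0.00487488554790616],
)

print(filter_predictions(example, min_score=0.005))
# [('eng', 'Latn', 0.970989465713501), ('sco', 'Latn', 0.006496887654066086)]
```

The `0.5` default threshold is an arbitrary illustration value; a suitable cutoff depends on the data and on how costly a wrong label is.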