firqaaa committed on
Commit
f806aa6
·
1 Parent(s): d43571c

Update README.md

  1. README.md +60 -2
README.md CHANGED
@@ -1,4 +1,62 @@
  ---
- license: cc-by-nc-sa-4.0
  pipeline_tag: feature-extraction
- ---
+ tags:
+ - feature-extraction
+ - transformers
+ license: apache-2.0
+ language:
+ - id
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ datasets:
+ - squad_v2
+ ---
+ ### indo-dpr-question_encoder-single-squad-base
+ <p style="font-size:16px">Indonesian Dense Passage Retrieval (DPR) question encoder, trained on a translated SQuAD v2.0 dataset in DPR format.</p>
+
+
+ ### Evaluation
+
+ | Class | Precision | Recall | F1-Score | Support |
+ |-------|-----------|--------|----------|---------|
+ | hard_negative | 0.9963 | 0.9963 | 0.9963 | 183090 |
+ | positive | 0.8849 | 0.8849 | 0.8849 | 5910 |
+
+ | Metric | Value |
+ |--------|-------|
+ | Accuracy | 0.9928 |
+ | Macro Average (Precision / Recall / F1) | 0.9406 |
+ | Weighted Average (Precision / Recall / F1) | 0.9928 |
+
+ <p style="font-size:16px">Note: These results are from evaluation on the dev set after 12,000 training batches.</p>
+
+ ### Usage
+
+ ```python
+ from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
+
+ # Load the question encoder and its tokenizer from the Hub
+ tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('firqaaa/indo-dpr-question_encoder-single-squad-base')
+ model = DPRQuestionEncoder.from_pretrained('firqaaa/indo-dpr-question_encoder-single-squad-base')
+ # Tokenize an Indonesian question ("Where is the capital of Indonesia located?")
+ input_ids = tokenizer("Ibukota Indonesia terletak dimana?", return_tensors='pt')["input_ids"]
+ # pooler_output holds one dense embedding per input question
+ embeddings = model(input_ids).pooler_output
+ ```
+
+ You can also use the model with `haystack` as follows:
+
+ ```python
+ from haystack.nodes import DensePassageRetriever
+ from haystack.document_stores import InMemoryDocumentStore
+
+ # The query side uses the question encoder; the passage side uses the context encoder
+ retriever = DensePassageRetriever(document_store=InMemoryDocumentStore(),
+                                   query_embedding_model="firqaaa/indo-dpr-question_encoder-single-squad-base",
+                                   passage_embedding_model="firqaaa/indo-dpr-ctx_encoder-single-squad-base",
+                                   max_seq_len_query=64,
+                                   max_seq_len_passage=256,
+                                   batch_size=16,
+                                   use_gpu=True,
+                                   embed_title=True,
+                                   use_fast_tokenizers=True)
+ ```
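
For context on how the embeddings from the snippets above are typically used: in the standard DPR setup, a question embedding is compared with each passage embedding by dot product, and passages are ranked by that score. The following is a minimal sketch of that ranking step only. It is not part of this commit; `dot`, `rank_passages`, and the toy 3-dimensional vectors are hypothetical stand-ins for real encoder outputs, which have many more dimensions.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs sorted by descending dot-product score."""
    scores = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical values) standing in for encoder outputs.
question_vec = [1.0, 0.0, 2.0]
passage_vecs = [
    [0.5, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.5],
]

ranking = rank_passages(question_vec, passage_vecs)
print(ranking)
```

With real models, `question_vec` would come from the question encoder's `pooler_output` and each entry of `passage_vecs` from the context encoder; a retriever such as the `haystack` one above handles this scoring internally.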