---
pipeline_tag: feature-extraction
tags:
- feature-extraction
- transformers
license: apache-2.0
language:
- id
metrics:
- accuracy
- f1
- precision
- recall
datasets:
- squad_v2
---
### indo-dpr-question_encoder-single-squad-base
<p style="font-size:16px">Indonesian Dense Passage Retrieval (DPR) question encoder, trained on a translated SQuAD v2.0 dataset in DPR format.</p>

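"DPR format" presumably refers to the JSON layout of the original DPR training data, where each question is stored together with its answers, positive passages, and (hard) negative passages. The sketch below builds one such record and checks its keys. The field names follow that common layout; the helper names `make_record` and `is_valid_record` and all the texts are hypothetical placeholders, not samples from the actual training set.

```python
import json

# Top-level keys of the commonly used DPR training layout (one object per question).
REQUIRED_KEYS = {"question", "answers", "positive_ctxs", "negative_ctxs", "hard_negative_ctxs"}
# Minimal keys expected on every passage ("ctx") entry.
CTX_KEYS = {"title", "text"}


def make_record(question, answers, positive, hard_negatives):
    """Build one DPR-style record from (title, text) passage tuples (hypothetical helper)."""
    def ctx(title, text):
        return {"title": title, "text": text}

    return {
        "question": question,
        "answers": list(answers),
        "positive_ctxs": [ctx(*positive)],
        "negative_ctxs": [],
        "hard_negative_ctxs": [ctx(t, x) for t, x in hard_negatives],
    }


def is_valid_record(record):
    """Check the top-level keys and that every passage has a title and a text."""
    if not REQUIRED_KEYS.issubset(record):
        return False
    passages = record["positive_ctxs"] + record["negative_ctxs"] + record["hard_negative_ctxs"]
    return all(CTX_KEYS.issubset(p) for p in passages)


# Placeholder example only; not taken from the real training data.
record = make_record(
    question="example question?",
    answers=["example answer"],
    positive=("Title A", "A paragraph that contains the example answer."),
    hard_negatives=[("Title B", "A similar-looking paragraph without the answer.")],
)
print(json.dumps(record, indent=2))
print(is_valid_record(record))
```

Hard negatives are passages that look relevant but do not contain the answer; the `hard_negative` class in the evaluation below presumably refers to such passages.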
### Evaluation

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| hard_negative | 0.9963 | 0.9963 | 0.9963 | 183090 |
| positive | 0.8849 | 0.8849 | 0.8849 | 5910 |

| Metric | Value |
|--------|-------|
| Accuracy | 0.9928 |
| Macro Average | 0.9406 |
| Weighted Average | 0.9928 |

<p style="font-size:16px">Note: This report covers evaluation on the dev set after 12000 batches.</p>

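The two averages in the second table can be reproduced from the per-class rows: the macro average is the unweighted mean over the two classes, while the weighted average weights each class by its support. Because precision, recall, and F1 coincide for each class here, the result is the same whichever of the three is averaged. A quick check using only the values from the table:

```python
# Per-class F1 and support, copied from the evaluation table.
f1 = {"hard_negative": 0.9963, "positive": 0.8849}
support = {"hard_negative": 183090, "positive": 5910}

# Macro average: plain mean over classes, regardless of class size.
macro = sum(f1.values()) / len(f1)

# Weighted average: each class contributes in proportion to its support.
total = sum(support.values())
weighted = sum(f1[c] * support[c] for c in f1) / total

print(f"macro={macro:.4f} weighted={weighted:.4f}")
```

The large gap between the two averages reflects the class imbalance: there are roughly 31 hard negatives per positive, so the weighted average is dominated by the `hard_negative` class.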
### Usage

```python
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

# This card describes the question encoder, so load the question-encoder classes.
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('firqaaa/indo-dpr-question_encoder-single-squad-base')
model = DPRQuestionEncoder.from_pretrained('firqaaa/indo-dpr-question_encoder-single-squad-base')

# "Where is the capital of Indonesia located?"
input_ids = tokenizer("Ibukota Indonesia terletak dimana?", return_tensors='pt')["input_ids"]
embeddings = model(input_ids).pooler_output  # one dense vector per question
```

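In DPR, a question embedding like the one above is scored against passage embeddings produced by the paired context encoder (presumably `firqaaa/indo-dpr-ctx_encoder-single-squad-base`) using the inner product, and the highest-scoring passages are retrieved. The sketch below shows only that ranking step. The helper names are hypothetical, and the tiny 3-dimensional vectors stand in for real `pooler_output` tensors.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product, the similarity score used by DPR."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(question_emb: List[float],
                  passage_embs: List[List[float]],
                  top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, highest score first."""
    scores = [(i, dot(question_emb, p)) for i, p in enumerate(passage_embs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy embeddings; real ones come from the question and context encoders.
question_emb = [0.9, 0.1, 0.0]
passage_embs = [
    [0.1, 0.8, 0.1],  # passage 0: unrelated
    [0.8, 0.2, 0.0],  # passage 1: close to the question
    [0.5, 0.0, 0.5],  # passage 2: partially related
]

print(rank_passages(question_emb, passage_embs))
```

The vectors are deliberately not normalized. DPR ranks passages by the raw inner product rather than by cosine similarity.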
You can also use it with `haystack`, paired with the matching context encoder, as follows:

```python
# Haystack 1.x API (farm-haystack); these modules do not exist in Haystack 2.x.
from haystack.nodes import DensePassageRetriever
from haystack.document_stores import InMemoryDocumentStore

retriever = DensePassageRetriever(document_store=InMemoryDocumentStore(),
                                  query_embedding_model="firqaaa/indo-dpr-question_encoder-single-squad-base",
                                  passage_embedding_model="firqaaa/indo-dpr-ctx_encoder-single-squad-base",
                                  max_seq_len_query=64,
                                  max_seq_len_passage=256,
                                  batch_size=16,
                                  use_gpu=True,
                                  embed_title=True,
                                  use_fast_tokenizers=True)
```