fernandogd97 committed
Commit cd4009e (verified)
1 Parent(s): f642df6

Add model card

Files changed (1)
  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ license: apache-2.0
+ language:
+ - es
+ base_model:
+ - PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
+ tags:
+ - medical
+ - spanish
+ - bi-encoder
+ - entity-linking
+ - sapbert
+ - umls
+ - snomed-ct
+ ---
+
+ # **MedProcNER-bi-encoder**
+
+ ## Model Description
+
+ MedProcNER-bi-encoder is a domain-specific bi-encoder for medical entity linking in Spanish. It was trained on synonym pairs drawn from the MedProcNER gold standard corpus and from SNOMED-CT (Fully Specified Names and preferred synonyms); these knowledge-based synonyms enrich the corpus data to improve entity normalization.
+
+ ## 💡 Intended Use
+ - **Domain**: Spanish Clinical NLP
+ - **Tasks**: Entity linking of MedProcNER mentions to SNOMED-CT concepts
+ - **Evaluated On**: MedProcNER (Gold Standard, Unseen Mentions, Unseen Codes)
+ - **Users**: Researchers and developers focusing on specialized medical NEL
+
+ ### 💬 Definitions
+ - **Gold Standard**: Mentions present in the training set (seen mentions and codes).
+ - **Unseen Mentions**: Mentions that do not appear in training but reference known codes.
+ - **Unseen Codes**: Mentions associated with SNOMED-CT codes never seen during training.
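The three categories above can be illustrated with a small sketch that reads the definitions literally. It is not the authors' evaluation code: the helper `classify_mention`, the lowercase matching of mentions, and the toy codes (`C001`, `C002`, `C003`) are all assumptions made for illustration, and the label `seen` stands for the Gold Standard category as this card defines it.

```python
def classify_mention(mention, code, train_mentions, train_codes):
    """Assign one test annotation to a category, reading the card's definitions literally."""
    if code not in train_codes:
        return "unseen_code"      # code never observed during training
    if mention.lower() not in train_mentions:
        return "unseen_mention"   # known code, new surface form
    return "seen"                 # both mention and code appear in training


# Toy training annotations (hypothetical pairs, not real SNOMED-CT codes).
train = [("biopsia hepática", "C001"), ("radiografía de tórax", "C002")]
train_mentions = {mention.lower() for mention, _ in train}  # lowercasing is an assumption
train_codes = {code for _, code in train}

test = [
    ("biopsia hepática", "C001"),  # mention and code both seen
    ("rx de tórax", "C002"),       # new surface form, known code
    ("trasplante renal", "C003"),  # code absent from training
]
labels = [classify_mention(m, c, train_mentions, train_codes) for m, c in test]
print(labels)
```

Checking the code before the mention means an annotation with an unseen code is never counted as an unseen mention, so the three categories are disjoint.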
+
+ ## 📈 Performance Summary (Top-25 Accuracy)
+
+ | Evaluation Split | Top-25 Accuracy |
+ |------------------|-----------------|
+ | Gold Standard    | 0.917           |
+ | Unseen Mentions  | 0.831           |
+ | Unseen Codes     | 0.808           |
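The card does not spell out how the metric is computed. As a minimal sketch, assuming the usual definition (a mention counts as correct when its gold code appears among the first k retrieved candidates), top-k accuracy can be computed as below. The function name `top_k_accuracy` and the toy codes are hypothetical.

```python
def top_k_accuracy(ranked_candidates, gold_codes, k=25):
    """Fraction of mentions whose gold code is among the first k ranked candidates."""
    if not gold_codes:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(ranked_candidates, gold_codes)
        if gold in ranked[:k]
    )
    return hits / len(gold_codes)


# Toy example: one ranked candidate list per mention (hypothetical codes).
ranked = [["C1", "C7", "C3"], ["C4", "C2", "C9"], ["C5", "C6", "C8"]]
gold = ["C3", "C2", "C0"]

acc_at_1 = top_k_accuracy(ranked, gold, k=1)  # no gold code is ranked first
acc_at_3 = top_k_accuracy(ranked, gold, k=3)  # two of three gold codes are retrieved
print(acc_at_1, acc_at_3)
```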
+
+ ## 🧪 Usage
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+ import torch
+
+ # Load the bi-encoder and its tokenizer from the Hugging Face Hub
+ model = AutoModel.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")
+ tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")
+
+ mention = "insuficiencia renal aguda"
+ inputs = tokenizer(mention, return_tensors="pt")
+ with torch.no_grad():  # inference only, no gradients needed
+     outputs = model(**inputs)
+ # Use the hidden state of the first token as the mention embedding
+ embedding = outputs.last_hidden_state[:, 0, :]
+ print(embedding.shape)  # torch.Size([1, hidden_size])
+ ```
+
+ Use the resulting embeddings with [Faiss](https://github.com/facebookresearch/faiss) or [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) for efficient retrieval of candidate concepts.
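To make the retrieval step concrete, here is a brute-force sketch in plain Python of what a nearest-neighbor index does with these embeddings. It assumes cosine similarity as the ranking score, which the card does not state, and it uses toy 3-dimensional vectors in place of real model embeddings. The helper names `normalize` and `top_k_codes` are hypothetical. At scale, an inner-product Faiss index (for example `IndexFlatIP`) over L2-normalized vectors would produce the same ranking.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_codes(query_vec, concept_vecs, concept_codes, k=25):
    """Return the k concept codes whose embeddings are most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for code, vec in zip(concept_codes, concept_vecs):
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), code))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [code for _, code in scored[:k]]


# Toy concept embeddings (hypothetical codes; real vectors come from the model).
concept_codes = ["C1", "C2", "C3"]
concept_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]

candidates = top_k_codes(query_vec, concept_vecs, concept_codes, k=2)
print(candidates)
```

In practice the concept vectors would be computed once for every SNOMED-CT term and stored in the index, so that only the mention has to be encoded at query time.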
+
+ ## ⚠️ Limitations
+
+ - The model is specialized for MedProcNER mentions and may underperform in other domains or corpora.
+ - Expert supervision is advised for clinical deployment.
+
+ ## 📚 Citation
+
+ > Gallego, Fernando and López-García, Guillermo and Gasco, Luis and Krallinger, Martin and Veredas, Francisco J., Clinlinker-Kb: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986
+
+ ## Authors
+
+ Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas