from dataclasses import dataclass
from enum import Enum

@dataclass
class Task:
    benchmark: str
    metric: str
    col_name: str


# Select your tasks here
# ---------------------------------------------------
class Tasks(Enum):
    # Task(task_key in the results JSON, metric_key in the results JSON, display name in the leaderboard)
    emea_ner = Task("emea_ner", "f1", "EMEA NER")
    medline_ner = Task("medline_ner", "f1", "MEDLINE NER")
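
# The (benchmark, metric) pairs above index into each model's results JSON.
# Hedged illustration of the assumed layout (an assumption, not a spec):
#   {"results": {"emea_ner": {"f1": 0.85}, "medline_ner": {"f1": 0.78}}}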

NUM_FEWSHOT = 0  # Number of few-shot examples; adjust to your setup
# ---------------------------------------------------



# Your leaderboard name
TITLE = """<h1 align="center" id="space-title">🏥 French Medical NLP Leaderboard</h1>"""

# What does your leaderboard evaluate?
INTRODUCTION_TEXT = """
This leaderboard evaluates French NLP models on biomedical Named Entity Recognition (NER) tasks.
We focus on BERT-like models with plans to extend to other architectures.

**Current Tasks:**
- **EMEA NER**: Named Entity Recognition on French medical texts from EMEA (European Medicines Agency)
- **MEDLINE NER**: Named Entity Recognition on French medical abstracts from MEDLINE

**Entity Types:** ANAT, CHEM, DEVI, DISO, GEOG, LIVB, OBJC, PHEN, PHYS, PROC
"""

# Which evaluations are you running? how can people reproduce what you have?
LLM_BENCHMARKS_TEXT = """
## How it works

We evaluate models by **fine-tuning** them on French medical NER tasks following the CamemBERT-bio methodology:

**Fine-tuning Parameters:**
- **Optimizer**: AdamW (following CamemBERT-bio paper)
- **Learning Rate**: 5e-5 (optimal value from the paper's Optuna search, kept unchanged)
- **Scheduler**: Cosine with restarts (22.4% warmup ratio)
- **Steps**: 2000 (same as paper)
- **Batch Size**: 4 (CPU constraint)
- **Gradient Accumulation**: 4 steps (effective batch size 16)
- **Max Length**: 512 tokens
- **Output**: Simple linear layer (no CRF)

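As a hedged sketch, these settings map onto the standard `transformers.TrainingArguments` API roughly as follows (the actual training script is not reproduced here; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Illustrative only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="ner-finetune",                 # placeholder path
    learning_rate=5e-5,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.224,                        # 22.4% warmup
    max_steps=2000,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,             # effective batch size 16
)
```
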
**Evaluation**: Uses seqeval with IOB2 scheme for entity-level **micro F1**, precision, and recall.
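
For reference, entity-level scores of this kind can be computed with `seqeval` along these lines (the tag sequences below are toy examples):

```python
from seqeval.metrics import f1_score, precision_score, recall_score
from seqeval.scheme import IOB2

# Toy IOB2-tagged sequences, illustrative only.
y_true = [["B-DISO", "I-DISO", "O", "B-CHEM"]]
y_pred = [["B-DISO", "I-DISO", "O", "O"]]

print("precision:", precision_score(y_true, y_pred, mode="strict", scheme=IOB2))
print("recall:", recall_score(y_true, y_pred, mode="strict", scheme=IOB2))
print("micro F1:", f1_score(y_true, y_pred, mode="strict", scheme=IOB2))  # micro average by default
```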

## Reproducibility
Results are obtained through proper fine-tuning, not zero-shot evaluation. Each model is fine-tuned independently on each task.

**Datasets:**
- EMEA: `rntc/quaero-frenchmed-ner-emea-sen`
- MEDLINE: `rntc/quaero-frenchmed-ner-medline`
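
Both datasets load directly from the Hugging Face Hub; a minimal sketch (available splits and column names should be checked on the dataset pages):

```python
from datasets import load_dataset

# Hub IDs taken from the list above.
emea = load_dataset("rntc/quaero-frenchmed-ner-emea-sen")
medline = load_dataset("rntc/quaero-frenchmed-ner-medline")
print(emea)  # inspect splits and features
```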
"""

EVALUATION_QUEUE_TEXT = """
## Before submitting a model

### 1) Ensure your model is compatible with AutoClasses:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("your_model_name")
model = AutoModelForTokenClassification.from_pretrained("your_model_name")
```

### 2) Model requirements:
- Must be a fine-tuned model for token classification (not just a base model)
- Should be trained on French medical NER data
- Must be publicly available on Hugging Face Hub
- Prefer safetensors format for faster loading

### 3) Expected performance:
- Base models without fine-tuning will get very low scores (~0.02 F1)
- Fine-tuned models should achieve significantly higher scores

### 4) Model card recommendations:
- Specify the training dataset used
- Include model architecture details
- Add performance metrics if available
- Use an open license

## Troubleshooting
If your model fails evaluation:
1. Check that it loads properly with AutoModelForTokenClassification
2. Verify it's trained for token classification (not just language modeling)
3. Ensure the model is public and accessible
"""

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
"""