Spaces:
Sleeping
Sleeping
File size: 3,467 Bytes
29546b4 91e8a06 6dff40c 29546b4 91e8a06 32b707a 29546b4 4f3c2a8 81722bf 01ea22b 32b707a 29546b4 81722bf 58733e4 29546b4 b98f07f 81722bf e7226cc 29546b4 e7226cc 3aa78c2 81722bf 3aa78c2 81722bf f7d1b51 81722bf 3aa78c2 b98f07f 81722bf 3aa78c2 81722bf 3aa78c2 81722bf 3aa78c2 81722bf 58733e4 2a73469 fccd458 2a860f6 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 |
from dataclasses import dataclass
from enum import Enum
@dataclass
class Task:
benchmark: str
metric: str
col_name: str
# Select your tasks here
# ---------------------------------------------------
class Tasks(Enum):
# task_key in the json file, metric_key in the json file, name to display in the leaderboard
emea_ner = Task("emea_ner", "f1", "EMEA NER")
medline_ner = Task("medline_ner", "f1", "MEDLINE NER")
NUM_FEWSHOT = 0 # Change with your few shot
# ---------------------------------------------------
# Your leaderboard name
TITLE = """<h1 align="center" id="space-title">🏥 French Medical NLP Leaderboard</h1>"""
# What does your leaderboard evaluate?
INTRODUCTION_TEXT = """
This leaderboard evaluates French NLP models on biomedical Named Entity Recognition (NER) tasks.
We focus on BERT-like models with plans to extend to other architectures.
**Current Tasks:**
- **EMEA NER**: Named Entity Recognition on French medical texts from EMEA (European Medicines Agency)
- **MEDLINE NER**: Named Entity Recognition on French medical abstracts from MEDLINE
**Entity Types:** ANAT, CHEM, DEVI, DISO, GEOG, LIVB, OBJC, PHEN, PHYS, PROC
"""
# Which evaluations are you running? how can people reproduce what you have?
LLM_BENCHMARKS_TEXT = f"""
## How it works
We evaluate models by **fine-tuning** them on French medical NER tasks following the CamemBERT-bio methodology:
**Fine-tuning Parameters:**
- **Optimizer**: AdamW (following CamemBERT-bio paper)
- **Learning Rate**: 5e-5 (optimal from Optuna search - unchanged)
- **Scheduler**: Cosine with restarts (22.4% warmup ratio)
- **Steps**: 2000 (same as paper)
- **Batch Size**: 4 (CPU constraint)
- **Gradient Accumulation**: 4 steps (effective batch size 16)
- **Max Length**: 512 tokens
- **Output**: Simple linear layer (no CRF)
**Evaluation**: Uses seqeval with IOB2 scheme for entity-level **micro F1**, precision, and recall.
## Reproducibility
Results are obtained through proper fine-tuning, not zero-shot evaluation. Each model is fine-tuned independently on each task.
**Datasets:**
- EMEA: `rntc/quaero-frenchmed-ner-emea-sen`
- MEDLINE: `rntc/quaero-frenchmed-ner-medline`
"""
EVALUATION_QUEUE_TEXT = """
## Before submitting a model
### 1) Ensure your model is compatible with AutoClasses:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("your_model_name")
model = AutoModelForTokenClassification.from_pretrained("your_model_name")
```
### 2) Model requirements:
- Must be a fine-tuned model for token classification (not just a base model)
- Should be trained on French medical NER data
- Must be publicly available on Hugging Face Hub
- Prefer safetensors format for faster loading
### 3) Expected performance:
- Base models without fine-tuning will get very low scores (~0.02 F1)
- Fine-tuned models should achieve significantly higher scores
### 4) Model card recommendations:
- Specify the training dataset used
- Include model architecture details
- Add performance metrics if available
- Use an open license
## Troubleshooting
If your model fails evaluation:
1. Check that it loads properly with AutoModelForTokenClassification
2. Verify it's trained for token classification (not just language modeling)
3. Ensure the model is public and accessible
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
"""
|