How to use monsoon-nlp/dv-wave with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("monsoon-nlp/dv-wave", dtype="auto")
```
Update README.md
# dv-wave

This is a second attempt at a Dhivehi language model trained with
Google Research's [ELECTRA](https://github.com/google-research/electra).

Tokenization and training Colab: https://colab.research.google.com/drive/1ZJ3tU9MwyWj6UtQ-8G7QJKTn-hG1uQ9v?usp=sharing

V1: performance similar to mBERT after 3 epochs

V2: fixed the tokenizer settings (`do_lower_case=False` and `strip_accents=False`) to preserve Dhivehi vowel signs

## Corpus

… of Dhivehi text (79MB deduped).

## Vocabulary

Included as `vocab.txt` in the upload; `vocab_size` is 29874.
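The V2 note about `do_lower_case=False` and `strip_accents=False` can be illustrated with a small, self-contained sketch. ELECTRA's tokenizer is a BERT-style WordPiece tokenizer, and BERT-style accent stripping NFD-normalizes text and then drops every character in Unicode category `Mn` (nonspacing mark). Thaana vowel signs (fili) are `Mn` characters, so stripping removes them and leaves only the consonants. The helpers `strip_accents` and `basic_normalize` below are hypothetical approximations of that logic, not the Transformers API, and the assumption that lowercasing triggers stripping unless the flag is explicitly `False` is modeled on a BERT-style basic tokenizer.

```python
import unicodedata


def strip_accents(text: str) -> str:
    # Approximation of BERT-style accent stripping: decompose with NFD,
    # then drop every nonspacing mark (Unicode category "Mn").
    text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def basic_normalize(text: str, do_lower_case: bool = True, strip_accents_flag=None) -> str:
    # Hypothetical helper approximating a BERT-style basic tokenizer:
    # with lowercasing on, accents are stripped unless the flag is
    # explicitly False; with it off, only an explicit True strips them.
    if do_lower_case:
        text = text.lower()
        if strip_accents_flag is not False:
            text = strip_accents(text)
    elif strip_accents_flag:
        text = strip_accents(text)
    return text


# "Dhivehi" in Thaana: three consonants, each followed by a vowel sign (fili).
word = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"

print([unicodedata.category(ch) for ch in word])  # ['Lo', 'Mn', 'Lo', 'Mn', 'Lo', 'Mn']

default_settings = basic_normalize(word)  # vowel signs are dropped
v2_settings = basic_normalize(word, do_lower_case=False, strip_accents_flag=False)

print(len(word), len(default_settings), len(v2_settings))  # 6 3 6
```

In Transformers, `BertTokenizer` and `ElectraTokenizer` accept `do_lower_case` and `strip_accents` keyword arguments, which correspond to the two flags named in the V2 note.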