Lyte posted an update 4 days ago

Introducing Nanochat Moroccan

Nanochat Moroccan is the first language model family built specifically for Moroccan Darija.

This project brings together a small family of models and datasets centered on Darija, with the goal of building something genuinely useful for a language that is still underserved in AI.

1. Models

- KandirResearch/Nanochat-Moroccan-Base-0.7B
- KandirResearch/Nanochat-Moroccan-Instruct-0.7B-pt-raw
- KandirResearch/Nanochat-Moroccan-Instruct-0.7B

2. Data

- Lyte/darija-pretraining-corpus
- Lyte/darija-pretraining-corpus-nanochat
- Lyte/Moroccan-Darija-Instruct-573K
- GemMaroc/TULU-3-50k-darija-english

3. Collection

- https://huggingface.co/collections/KandirResearch/nanochat-the-first-moroccan-darija-language-model-family

Moroccan Darija is spoken by millions of people, yet it remains underrepresented in language technology. Nanochat Moroccan is a step toward building tools that take the language seriously.

You are welcome to try the instruct model and chat with it here:
Lyte/Nanochat-Moroccco-Instruct