arxiv:2410.15539

Grammatical Error Correction for Low-Resource Languages: The Case of Zarma

Published on Oct 20, 2024

Authors:

Abstract

A study on grammatical error correction for Zarma and Bambara compares rule-based methods, machine translation models, and large language models, finding that MT models outperform the others in low-resource settings.

AI-generated summary

Grammatical error correction (GEC) aims to improve quality and readability of texts through accurate correction of linguistic mistakes. Previous work has focused on high-resource languages, while low-resource languages lack robust tools. However, low-resource languages often face problems such as: non-standard orthography, limited annotated corpora, and diverse dialects, which slows down the development of GEC tools. We present a study on GEC for Zarma, spoken by over five million in West Africa. We compare three approaches: rule-based methods, machine translation (MT) models, and large language models (LLMs). We evaluated them using a dataset of more than 250,000 examples, including synthetic and human-annotated data. Our results showed that the MT-based approach using M2M100 outperforms others, with a detection rate of 95. 82% and a suggestion accuracy of 78. 90% in automatic evaluations (AE) and an average score of 3.0 out of 5.0 in manual evaluation (ME) from native speakers for grammar and logical corrections. The rule-based method was effective for spelling errors but failed on complex context-level errors. LLMs -- MT5-small -- showed moderate performance. Our work supports use of MT models to enhance GEC in low-resource settings, and we validated these results with Bambara, another West African language.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2410.15539 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2410.15539 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.