MALIBA-AI/bambara-embeddings


	---
	language: bm
	tags:
	- bambara
	- fasttext
	- embeddings
	- word-vectors
	- african-nlp
	- low-resource
	license: apache-2.0
	datasets:
	- bambara-corpus
	metrics:
	- cosine_similarity
	pipeline_tag: feature-extraction
	---

	# Bambara FastText Embeddings

	## Model Description

	This model provides FastText word embeddings for the Bambara language (Bamanankan), a Mande language spoken primarily in Mali. The embeddings capture semantic relationships between Bambara words and can serve as input features for downstream NLP tasks in this low-resource African language.

	- Model Type: FastText Word Embeddings
	- Language: Bambara (bm)
	- License: Apache 2.0


	## Model Details

	### Model Architecture
	- Algorithm: FastText with subword information
	- Vector Dimension: 300
	- Vocabulary Size: 9,973 unique Bambara words
	- Training Method: Skip-gram with negative sampling
	- Subword Information: Character n-grams (enables handling of out-of-vocabulary words)
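
To illustrate how character n-grams let FastText handle out-of-vocabulary words, here is a minimal sketch in plain Python. It is not the model's actual implementation: the n-gram range (3 to 6) is fastText's documented default and is only assumed here, since this card does not state the range used. The `ngram_vectors` table and its values are hypothetical, and real fastText hashes n-grams into buckets instead of using a dictionary.

```python
from typing import Dict, List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of `word`, wrapped in < and > boundary markers.

    minn/maxn are assumed values (fastText defaults), not taken from this model.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def oov_vector(word: str, ngram_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the word's known n-grams (simplified sketch)."""
    known = [g for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return [0.0] * dim  # no shared subwords: fall back to a zero vector
    total = [0.0] * dim
    for gram in known:
        for i, value in enumerate(ngram_vectors[gram]):
            total[i] += value
    return [value / len(known) for value in total]


# Hypothetical 2-dimensional n-gram table, for illustration only.
toy_ngram_vectors = {"<mu": [1.0, 0.0], "so>": [0.0, 1.0]}

print(char_ngrams("muso", 3, 3))
print(oov_vector("musow", toy_ngram_vectors, dim=2))
```

Because `musow` shares the n-gram `<mu` with `muso` ("woman"), it still receives a non-zero vector even if it never appeared in the training vocabulary.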

	### Training Data
	The model was trained on Bambara text corpora, building on David Ifeoluwa Adelani's research on African language embeddings.

	### Intended Use
	This model is designed for:
	- Semantic similarity tasks in Bambara
	- Information retrieval for Bambara documents
	- Cross-lingual research involving Bambara
	- Cultural preservation and digital humanities projects
	- Educational applications for Bambara language learning
	- Foundation for downstream NLP tasks in Bambara
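
The semantic similarity and retrieval uses listed above typically reduce to comparing word vectors by cosine similarity. The sketch below shows that computation in plain Python. The 3-dimensional vectors are invented for illustration (the real embeddings have 300 dimensions), and `most_similar` is a hypothetical helper, not part of any released API.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, vectors: Dict[str, List[float]], topn: int = 3) -> List[Tuple[str, float]]:
    """Rank every other word in `vectors` by cosine similarity to `query`."""
    query_vec = vectors[query]
    scored = [
        (word, cosine_similarity(query_vec, vec))
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Made-up toy vectors; the values do not come from the model.
toy_vectors = {
    "muso": [0.9, 0.1, 0.0],
    "cɛ": [0.8, 0.2, 0.1],
    "ji": [0.0, 0.1, 0.9],
}

for word, score in most_similar("muso", toy_vectors):
    print(word, round(score, 3))
```

The same ranking loop works for document retrieval if each document is represented by a single vector, for example the average of its word vectors.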


	## Usage
	```
	Coming soon
	```
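
As a stopgap, the following sketch parses vectors from the plain-text `.vec` format that fastText can export (a header line with the word count and dimension, then one word and its values per line). Whether this repository ships such a file is an assumption, and the file name `bambara.vec` in the comment is hypothetical. The example reads from an in-memory string so that it runs without any download.

```python
import io
from typing import Dict, List, TextIO


def load_vec(handle: TextIO) -> Dict[str, List[float]]:
    """Parse fastText/word2vec text-format vectors from an open text file."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<word count> <dimension>"
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a real file; the values are invented.
sample = io.StringIO("2 3\nmuso 0.9 0.1 0.0\nji 0.0 0.1 0.9\n")
vectors = load_vec(sample)
print(sorted(vectors))

# With a real file (hypothetical name), the call would look like:
# with open("bambara.vec", encoding="utf-8") as handle:
#     vectors = load_vec(handle)
```

A text-format file contains only full-word vectors, so subword-based handling of out-of-vocabulary words would require the binary model instead.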