---
title: Frugal AI Challenge Submission
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

# 🔊 Audio classification

## Strategy for solving the problem

To minimize energy consumption, we deliberately chose not to use deep learning techniques such as CNN-based spectrogram analysis, LSTMs on raw audio signals, or transformer models, which are generally more computationally intensive.

Instead, a more lightweight approach was adopted:

- Feature extraction from the audio signal (MFCCs and spectral contrast)
- Training a simple machine learning model (a decision tree) on these extracted features (see the sketch below)
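
The pipeline can be sketched as follows. This is a minimal, hedged example: the file names, the number of MFCCs, and the tree depth are illustrative assumptions, not the exact code from the notebook:

```python
# Minimal sketch of the feature-extraction + decision-tree approach above.
# File names, n_mfcc, and max_depth are illustrative assumptions, not the
# exact settings from notebooks/Audio_Challenge.ipynb.
import librosa
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_features(path: str, sr: int = 16000) -> np.ndarray:
    """Summarize one clip as mean MFCCs + mean spectral contrast."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, frames)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # (7, frames)
    # Average over time so every clip maps to a fixed-size vector.
    return np.concatenate([mfcc.mean(axis=1), contrast.mean(axis=1)])

# Hypothetical training data: one feature vector per clip, one label each.
X = np.array([extract_features(p) for p in ["clip_0.wav", "clip_1.wav"]])
y = np.array([0, 1])

clf = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X, y)
```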

## Potential improvements (not yet tested)

- Hyperparameter tuning for better performance
- Exploring alternative lightweight ML models, such as logistic regression or k-nearest neighbors
- Feature extraction without Librosa, using NumPy directly to compute basic signal properties, further reducing dependencies and overhead (see the sketch below)
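
As an illustration of the last idea, a few basic signal properties can be computed with NumPy alone. The features below are hypothetical examples, not a tested replacement for the MFCC/spectral-contrast set:

```python
# Hypothetical Librosa-free features computed with NumPy only.
import numpy as np

def basic_features(y: np.ndarray, sr: int = 16000, n_fft: int = 2048) -> np.ndarray:
    """A few cheap descriptors of a mono signal `y` sampled at `sr` Hz."""
    rms = np.sqrt(np.mean(y ** 2))                  # overall energy
    zcr = np.mean(np.abs(np.diff(np.sign(y))) > 0)  # zero-crossing rate
    spectrum = np.abs(np.fft.rfft(y, n=n_fft))      # magnitude spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-10)
    return np.array([rms, zcr, centroid])
```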

The model is exported from the notebook `notebooks/Audio_Challenge.ipynb` and saved as `model_audio.pkl`.

# 📚 Text classification

## Evaluate locally

To evaluate the model locally, you can use the following command:

```bash
python main.py --config config_evaluation_{model_name}.json
```

where `{model_name}` is either `distilBERT` or `embeddingML`.

## Model descriptions

### DistilBERT model

This model is `distilbert-base-uncased` from the Hugging Face Transformers library, fine-tuned on the training dataset (see below).
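
A minimal fine-tuning sketch is shown below. The tiny inline dataset, the two-class label count, and the training arguments are placeholders, not the settings used for this submission:

```python
# Hedged sketch of fine-tuning distilbert-base-uncased for classification.
# The inline dataset, label count, and training arguments are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

num_labels = 2  # placeholder: set to the challenge's actual label count
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=num_labels)

train_ds = Dataset.from_dict(
    {"text": ["sample text a", "sample text b"], "label": [0, 1]})

def tokenize(batch):
    # Fixed-length padding keeps the default data collator simple.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=train_ds.map(tokenize, batched=True),
)
trainer.train()
```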

### Embedding + ML model

The model uses a simple embedding layer followed by a classic ML model. Currently, the embedding layer is a simple TF-IDF vectorizer, and the ML model is a logistic regression.
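
A minimal scikit-learn sketch of that pipeline is shown below; the n-gram range and the tiny corpus are illustrative assumptions:

```python
# Minimal sketch of the TF-IDF + logistic-regression pipeline; the
# vectorizer settings, corpus, and labels are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["sample text a", "sample text b"]  # placeholder corpus
labels = [0, 1]                             # placeholder labels

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
).fit(texts, labels)

print(clf.predict(["another sample"]))
```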