metadata
tags:
- pytorch_model_hub_mixin
- model_hub_mixin
- gender-classification
- VoxCeleb
license: mit
datasets:
- ProgramComputer/voxceleb
pipeline_tag: audio-classification
Voice gender classifier
- This repo contains the inference code for a pretrained human voice gender classifier.
- You can also try the 🤗 Hugging Face online demo.
Installation
First, clone the original GitHub repository
git clone https://github.com/JaesungHuh/voice-gender-classifier.git
and install the packages via pip.
cd voice-gender-classifier
pip install -r requirements.txt
Usage
import torch
from model import ECAPA_gender
# Download the pretrained model directly from the Hugging Face model hub
model = ECAPA_gender.from_pretrained("JaesungHuh/voice-gender-classifier")
model.eval()
# Move the model to the GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
# Load the audio file and call predict() to get the output directly
example_file = "data/00001.wav"
with torch.no_grad():
    output = model.predict(example_file, device=device)
    print("Gender : ", output)
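To classify several recordings in one go, the single-file call above can be wrapped in a small loop. The following is only a sketch: `classify_files` is a hypothetical helper that is not part of the repository, and it assumes nothing about `model.predict` beyond what the usage example shows (it takes a file path and returns a result that can be printed).

```python
from typing import Any, Callable, Dict, Iterable


def classify_files(predict: Callable[[str], Any], paths: Iterable[str]) -> Dict[str, Any]:
    """Call `predict` once per path and collect the results keyed by path.

    `predict` is any callable that maps a file path to a prediction,
    for example a thin wrapper around `model.predict` (an assumption,
    see the commented usage below).
    """
    results: Dict[str, Any] = {}
    for path in paths:
        results[str(path)] = predict(str(path))
    return results


# Hypothetical usage with the model and device from the example above:
#
#   import glob
#   with torch.no_grad():
#       results = classify_files(
#           lambda p: model.predict(p, device=device),
#           sorted(glob.glob("data/*.wav")),
#       )
#   for path, gender in results.items():
#       print(path, gender)
```

Passing the prediction function in as an argument keeps the helper independent of the model class, so it can be tried with a stub before loading any weights.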
Pretrained weights
If you need the pretrained weights, you can download them here.
Training details
State-of-the-art speaker verification models already produce a good representation of the speaker's gender.
I used the pretrained ECAPA-TDNN from TaoRuijie's repository, added one linear layer to make a two-class classifier, and fine-tuned the model on the VoxCeleb2 dev set.
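The idea of putting one linear layer on top of a speaker-embedding encoder can be sketched as follows. This is an illustration, not the repository's `ECAPA_gender` class: `GenderHead` is a hypothetical name, the encoder is a dummy placeholder rather than ECAPA-TDNN, and the embedding size of 192 is an assumption made for the example.

```python
import torch
import torch.nn as nn


class GenderHead(nn.Module):
    """Two-class linear classifier on top of a speaker-embedding encoder."""

    def __init__(self, encoder: nn.Module, embed_dim: int = 192):
        super().__init__()
        # Stands in for the pretrained speaker verification model.
        self.encoder = encoder
        # The single added layer: embedding -> two class logits.
        self.classifier = nn.Linear(embed_dim, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        embedding = self.encoder(x)          # shape: (batch, embed_dim)
        return self.classifier(embedding)    # shape: (batch, 2)


# Placeholder encoder: NOT ECAPA-TDNN, it only mimics the output shape.
dummy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(1600, 192))
model = GenderHead(dummy_encoder)

# Four fake "waveforms" of 1600 samples each.
logits = model(torch.randn(4, 1600))
predicted_class = logits.argmax(dim=1)       # one class index per input
```

Which class index corresponds to which label, and how the layers were trained, is defined by the repository code and not by this sketch.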
The model achieved 98.7% accuracy on the VoxCeleb1 identification test split.
Caveat
I would like to note that the training dataset I used for this model (VoxCeleb) may not represent the global human population. Please be mindful of unintended biases when using this model.
Reference
- Original GitHub repository
- I modified the model architecture from TaoRuijie's repository.
- For more details about ECAPA-TDNN, check the paper.