MuQ & MuQ-MuLan


This is the official repository for the paper *"MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization"*. For details, see https://github.com/tencent-ailab/MuQ and the paper.

In this repo, the following models are released:

  • MuQ (OpenMuQ/MuQ-large-msd-iter): A large music foundation model pre-trained via Self-Supervised Learning (SSL), achieving state-of-the-art (SOTA) results on various music information retrieval (MIR) tasks.
  • MuQ-MuLan (OpenMuQ/MuQ-MuLan-large): A music-text joint embedding model trained via contrastive learning, supporting both English and Chinese text.

Usage

Install the official muq library with pip. Python 3.8 or later is required:

pip3 install muq
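
Before importing the library, it can help to confirm that the interpreter meets the Python 3.8 requirement and that the package is importable. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and not part of muq:

```python
import importlib.util
import sys


def check_environment(min_version=(3, 8), package="muq"):
    """Return (python_ok, package_installed) for the current interpreter."""
    python_ok = sys.version_info >= min_version
    # find_spec returns None when the top-level package cannot be located
    package_installed = importlib.util.find_spec(package) is not None
    return python_ok, package_installed


if __name__ == "__main__":
    python_ok, package_installed = check_environment()
    print(f"Python >= 3.8: {python_ok}")
    print(f"muq installed: {package_installed}")
```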

To extract music and text embeddings with MuQ-MuLan and calculate their similarity:

import torch, librosa
from muq import MuQMuLan

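# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# from_pretrained automatically fetches the checkpoint from Hugging Face
mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
mulan = mulan.to(device).eval()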

# Extract music embeddings (audio is resampled to 24 kHz on load)
wav, sr = librosa.load("path/to/music_audio.wav", sr=24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)
with torch.no_grad():
    audio_embeds = mulan(wavs=wavs)

# Extract text embeddings (texts can be in English or Chinese)
# The second prompt means "a violin piece suited to seaside scenery, with a lively rhythm"
texts = ["classical genres, hopeful mood, piano.", "一首适合海边风景的小提琴曲，节奏欢快"]
with torch.no_grad():
    text_embeds = mulan(texts = texts)

# Calculate dot product similarity
sim = mulan.calc_similarity(audio_embeds, text_embeds)
print(sim)
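
To make the similarity step concrete, the sketch below computes a dot-product similarity matrix in plain Python and ranks the text prompts for one audio clip. It illustrates the idea only and is not the implementation of `calc_similarity`; the helpers `dot_similarity` and `rank_texts` and the toy vectors are hypothetical, and the layout (one row per audio clip, one column per text) is an assumption.

```python
def dot_similarity(audio_embeds, text_embeds):
    """Return an (n_audio x n_text) matrix of dot products."""
    return [
        [sum(a * t for a, t in zip(audio_vec, text_vec)) for text_vec in text_embeds]
        for audio_vec in audio_embeds
    ]


def rank_texts(sim_row, texts):
    """Sort texts by descending similarity to one audio clip."""
    return sorted(zip(texts, sim_row), key=lambda pair: pair[1], reverse=True)


# Toy unit-length vectors standing in for real embeddings (illustrative values only)
audio = [[1.0, 0.0]]
text = [[0.6, 0.8], [1.0, 0.0]]
labels = ["piano, hopeful", "violin, lively"]

toy_sim = dot_similarity(audio, text)
print(rank_texts(toy_sim[0], labels))
```

Assuming `sim` from the snippet above is a 2-D tensor with one row per audio clip, `rank_texts(sim[0].tolist(), texts)` would order the prompts for the first clip in the same way.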

To extract music audio features using MuQ:

import torch, librosa
from muq import MuQ

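# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the audio resampled to 24 kHz and add a batch dimension
wav, sr = librosa.load("path/to/music_audio.wav", sr=24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)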

# This will automatically fetch the checkpoint from huggingface
muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
muq = muq.to(device).eval()

with torch.no_grad():
    output = muq(wavs, output_hidden_states=True)

print('Total number of layers: ', len(output.hidden_states))
print('Feature shape: ', output.last_hidden_state.shape)
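
A common next step is to collapse the per-frame features into one vector per clip by averaging over time. The sketch below shows mean pooling on nested lists; it assumes the features are laid out as (time, feature_dim), and `mean_pool` is a hypothetical helper rather than part of muq.

```python
def mean_pool(frames):
    """Average a sequence of equal-length frame vectors into one vector."""
    if not frames:
        raise ValueError("frames must contain at least one vector")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


# Three toy frames with two features each (illustrative values only)
toy_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(toy_frames))
```

If `output.last_hidden_state` has shape (batch, time, feature_dim), which is an assumption here, the tensor equivalent would be `output.last_hidden_state.mean(dim=1)`.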

Model Checkpoints

| Model Name | Parameters | Data | HuggingFace 🤗 |
| --- | --- | --- | --- |
| MuQ | ~300M | MSD dataset | OpenMuQ/MuQ-large-msd-iter |
| MuQ-MuLan | ~700M | music-text pairs | OpenMuQ/MuQ-MuLan-large |

Note: The open-sourced MuQ was trained on the Million Song Dataset (MSD). Because this training set differs in size from the one used in the paper, the open-sourced model may not match the performance reported there.

License

The code is released under the MIT license.

The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) are released under the CC-BY-NC 4.0 license.

Citation

@article{zhu2025muq,
      title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization}, 
      author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
      journal={arXiv preprint arXiv:2501.01108},
      year={2025}
} 