consolidate notebooks
- README.md +4 -6
- notebooks/test_model.ipynb +24 -51
- notebooks/test_model_breaks.ipynb +0 -0

README.md
CHANGED

@@ -9,14 +9,13 @@ app_file: app.py
 pinned: false
 license: gpl-3.0
 ---
-
-# audio-diffusion
+# audio-diffusion [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb)
 
 ### Apply [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) using the new Hugging Face [diffusers](https://github.com/huggingface/diffusers) package to synthesize music instead of images.
 
 ---
 
-**UPDATE**: I've trained a new [model](https://huggingface.co/teticio/audio-diffusion-breaks-256) on 30,000 samples that have been used in music, sourced from [WhoSampled](https://whosampled.com) and [YouTube](https://youtube.com). The idea is that the model could be used to generate loops or "breaks" that can be sampled to make new tracks. People ("crate diggers") go to a lot of lengths or are willing to pay a lot of money to find breaks in old records.
+**UPDATE**: I've trained a new [model](https://huggingface.co/teticio/audio-diffusion-breaks-256) on 30,000 samples that have been used in music, sourced from [WhoSampled](https://whosampled.com) and [YouTube](https://youtube.com). The idea is that the model could be used to generate loops or "breaks" that can be sampled to make new tracks. People ("crate diggers") go to a lot of lengths or are willing to pay a lot of money to find breaks in old records.
 
 ---
 
@@ -26,10 +25,9 @@ license: gpl-3.0
 
 Audio can be represented as images by transforming to a [mel spectrogram](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), such as the one shown above. The class `Mel` in `mel.py` can convert a slice of audio into a mel spectrogram of `x_res` x `y_res` and vice versa. The higher the resolution, the less audio information will be lost. You can see how this works in the [`test_mel.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_mel.ipynb) notebook.
 
-A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio.
-
-You can play around with the model I trained on about 500 songs from my Spotify "liked" playlist on [Google Colab](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb) or [Hugging Face spaces](https://huggingface.co/spaces/teticio/audio-diffusion). Check out some automatically generated loops [here](https://soundcloud.com/teticio2/sets/audio-diffusion-loops).
+A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio.
+
+You can play around with the model on [Google Colab](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb) or [Hugging Face spaces](https://huggingface.co/spaces/teticio/audio-diffusion). Check out some automatically generated loops [here](https://soundcloud.com/teticio2/sets/audio-diffusion-loops).
 
 ---
 
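The README text above explains the core trick: a slice of audio becomes an `x_res` x `y_res` mel spectrogram image and can be turned back into audio, with higher resolutions losing less information. As a rough, self-contained illustration of that round trip (a generic librosa sketch, not the repo's `Mel` class; mapping `x_res`/`y_res` onto `hop_length` and `n_mels` is an assumption made here for illustration):

```python
import librosa
import numpy as np

# Illustrative only: a generic librosa round trip, not the Mel class from mel.py.
# y_res ~ number of mel bins (image height); x_res ~ number of time frames
# (image width), controlled here via hop_length -- both mappings are assumptions.
x_res, y_res = 256, 256
hop_length = 512

audio, sr = librosa.load(librosa.example("trumpet"), sr=22050)
audio_slice = audio[: x_res * hop_length]  # roughly x_res frames of audio

# Audio slice -> mel spectrogram "image" (dB-scaled)
S = librosa.feature.melspectrogram(y=audio_slice, sr=sr,
                                   n_mels=y_res, hop_length=hop_length)
image = librosa.power_to_db(S, ref=np.max)

# Mel spectrogram -> audio again (Griffin-Lim inversion); the higher the
# resolution, the less audio information is lost in this step.
S_power = librosa.db_to_power(image, ref=np.max(S))
recovered = librosa.feature.inverse.mel_to_audio(S_power, sr=sr,
                                                 hop_length=hop_length)
```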

notebooks/test_model.ipynb
CHANGED

@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "
+   "id": "0a627a6f",
    "metadata": {},
    "source": [
     "<a href=\"https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
@@ -51,6 +51,26 @@
     "from audiodiffusion import AudioDiffusion"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7fd945bb",
+   "metadata": {},
+   "source": [
+    "### Select model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "97f24046",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#@markdown teticio/audio-diffusion-256 - trained on my Spotify \"liked\" playlist\n",
+    "#@markdown teticio/audio-diffusion-256-breaks - trained on samples used in music\n",
+    "model_id = \"teticio/audio-diffusion-256\" #@param [\"teticio/audio-diffusion-256\", \"teticio/audio-diffusion-256-breaks\"]"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "011fb5a1",
@@ -61,12 +81,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count":
+   "execution_count": 4,
    "id": "a3d45c36",
    "metadata": {},
    "outputs": [],
    "source": [
-    "audio_diffusion = AudioDiffusion(model_id
+    "audio_diffusion = AudioDiffusion(model_id=model_id)"
    ]
   },
   {
@@ -112,7 +132,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "ds = load_dataset(
+    "ds = load_dataset(model_id)"
    ]
   },
   {
@@ -168,53 +188,6 @@
     "Audio(data=audio, rate=mel.get_sample_rate())"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "946fdb4d",
-   "metadata": {},
-   "source": [
-    "### Push model to hub"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "37c0564e",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from diffusers.hub_utils import init_git_repo, push_to_hub\n",
-    "\n",
-    "\n",
-    "class AttributeDict(dict):\n",
-    "\n",
-    "    def __getattr__(self, attr):\n",
-    "        return self[attr]\n",
-    "\n",
-    "    def __setattr__(self, attr, value):\n",
-    "        self[attr] = value\n",
-    "\n",
-    "\n",
-    "args = AttributeDict({\n",
-    "    \"hub_model_id\":\n",
-    "    \"teticio/audio-diffusion-256\",\n",
-    "    \"output_dir\":\n",
-    "    \"../ddpm-ema-audio-256-repo\",\n",
-    "    \"local_rank\":\n",
-    "    -1,\n",
-    "    \"hub_token\":\n",
-    "    open(os.path.join(os.environ['HOME'], '.huggingface/token'), 'rt').read(),\n",
-    "    \"hub_private_repo\":\n",
-    "    False,\n",
-    "    \"overwrite_output_dir\":\n",
-    "    False\n",
-    "})\n",
-    "\n",
-    "repo = init_git_repo(args, at_init=True)\n",
-    "ddpm = DDPMPipeline.from_pretrained('../ddpm-ema-audio-256')\n",
-    "push_to_hub(args, ddpm, repo)"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
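Read end to end, the notebook changes above amount to: pick one of the two model ids in the new "Select model" cell, construct `AudioDiffusion` from it, and load the dataset published under the same id. A minimal script-style sketch of that flow follows; `AudioDiffusion(model_id=model_id)` and `load_dataset(model_id)` come straight from the diff, while the `generate_spectrogram_and_audio()` call and its return values are assumptions about the `audiodiffusion` API, not something shown in this commit:

```python
from datasets import load_dataset
from IPython.display import Audio
from audiodiffusion import AudioDiffusion

# Either id from the new "Select model" cell
model_id = "teticio/audio-diffusion-256"  # or "teticio/audio-diffusion-256-breaks"

# As in the updated notebook
audio_diffusion = AudioDiffusion(model_id=model_id)

# ASSUMPTION: the generation method name and return shape are not confirmed
# by this diff and may differ in the actual package.
image, (sample_rate, audio) = audio_diffusion.generate_spectrogram_and_audio()
Audio(data=audio, rate=sample_rate)  # renders an audio player in a notebook

# The updated notebook also loads a dataset under the same id
ds = load_dataset(model_id)
```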
notebooks/test_model_breaks.ipynb
DELETED
The diff for this file is too large to render.