MusicGen Tutorial (2026): Generate AI Music Locally, Free

You know that moment when you’re building a video, streaming, or just need ambient music and realize you’re either paying Spotify, licensing fees, or trapped in some SaaS subscription? Yeah, I was there too. Then I discovered MusicGen running locally on my homelab, and I haven’t looked back.

This is genuinely one of those tools that feels like cheating. Describe what you want — “lo-fi chill beats with jazz piano and light rain” — and it generates a full track. No waiting for human composers. No subscriptions. No watermarks. Just pure AI-generated audio, running on your own hardware.

Here’s the kicker: it takes about 10 minutes to set up, and it’s open source.

Why MusicGen Deserves Real Estate on Your Homelab

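I’ve been running MusicGen for a few months now, and it’s genuinely replaced my reliance on royalty-free music libraries and loop packs. The quality is solid, it’s fast on even modest GPUs, and nothing is metered or watermarked. One thing to check before you ship client work: the AudioCraft code is MIT-licensed, but Meta released the MusicGen weights under CC-BY-NC 4.0, so read the license terms if your use is commercial.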

The real magic is the text-to-music generation. You describe exactly what you want, and it understands genre, mood, instrumentation, and pacing. “Upbeat electronic dance track with heavy bass and synths” gives you something completely different from “medieval tavern ambiance with lute and distant thunder.” It actually gets it.

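There’s also melody conditioning: you can hum into a mic or upload a short audio clip, and the melody variant of MusicGen will generate a full arrangement that follows it. Note that it takes audio, not MIDI; the model conditions on a chromagram extracted from the recording. I’ve used this to turn rough voice memos into polished background tracks for client projects. It’s absurdly good for something running on your own box.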

Best part? It integrates beautifully with your existing homelab. Home Assistant automation workflows, Plex metadata enrichment, streaming setups, creative projects — wherever you need audio generation, MusicGen slots right in.

The Install (It’s Stupidly Easy)

If you’re running Docker on your homelab (and you should be), this is a quick job. Here’s the reality: MusicGen isn’t some complicated beast. It’s a Python package (part of Meta’s AudioCraft), and wrapping it in a small Gradio app gives you both a web UI and an HTTP API.

You’ve got two paths: Hugging Face’s hosted demo (zero setup, but cloud-dependent) or self-hosted using Docker. Obviously, we’re doing self-hosted.

Create a docker-compose.yml:

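version: '3.8'
services:
  musicgen:
    image: docker.io/library/python:3.11-slim
    container_name: musicgen
    working_dir: /app
    volumes:
      # app.py must exist before the first `up`, or Docker creates a directory in its place
      - ./app.py:/app/app.py:ro
      - ./musicgen_outputs:/app/outputs
      # Cache the downloaded model so a recreated container doesn't fetch it again
      - ./hf_cache:/root/.cache/huggingface
    ports:
      - "7860:7860"
    # app.py loads the model at startup, so no separate preload step is needed
    command: >
      bash -c "
      pip install -q gradio torch torchaudio audiocraft &&
      python app.py
      "
    environment:
      - GRADIO_SERVER_NAME=0.0.0.0
      - GRADIO_SERVER_PORT=7860
    # Requires the NVIDIA Container Toolkit; delete this block on CPU-only hosts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped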

Then create app.py in the same directory:

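import time
import wave
from pathlib import Path

import gradio as gr
import torch
from audiocraft.models import MusicGen

# Mounted to ./musicgen_outputs on the host by docker-compose.yml
OUTPUT_DIR = Path("outputs")
OUTPUT_DIR.mkdir(exist_ok=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
# The checkpoint is downloaded from Hugging Face on first run
model = MusicGen.get_pretrained("facebook/musicgen-large", device=device)

def generate_music(description, duration=30):
    model.set_generation_params(use_sampling=True, top_k=250, duration=duration)
    wav = model.generate([description])  # shape: [batch, channels, samples]
    # Convert the mono float waveform in [-1, 1] to 16-bit PCM
    pcm = (wav[0, 0].clamp(-1, 1) * 32767).to(torch.int16).cpu().numpy()
    # Keep a copy on disk so the track outlives the browser session
    with wave.open(str(OUTPUT_DIR / f"musicgen_{int(time.time())}.wav"), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(model.sample_rate)
        f.writeframes(pcm.tobytes())
    # Use the model's own sample rate (32 kHz for MusicGen) rather than a hard-coded one
    return (model.sample_rate, pcm)

iface = gr.Interface(
    fn=generate_music,
    inputs=[
        gr.Textbox(label="Music Description", placeholder="e.g., 'upbeat dance track with heavy bass'"),
        gr.Slider(5, 60, value=30, step=5, label="Duration (seconds)")
    ],
    outputs=gr.Audio(label="Generated Music"),
    title="MusicGen",
    description="Generate music from text descriptions"
)

iface.launch(share=False)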

Run it:

docker-compose up -d

Wait for the model to download (first run takes a few minutes), then hit http://localhost:7860. That’s it. You’re generating music now.

GPU matters here. I’m running this on an RTX 3060, and generating a 30-second track takes about 15-20 seconds. On CPU? You’re looking at 2-3 minutes. Still faster than waiting for a human composer, but yeah, get a GPU if you can.

Actually Using It (Real Workflows)

Text prompts are the gateway drug. But here’s where it gets practical:

Background music for videos. I use this for client project intros and B-roll. Instead of hunting royalty-free sites, I describe the vibe in one sentence and regenerate until it fits. Takes 5 minutes instead of an hour of digging.

Ambient soundscapes. Running a stream or podcast? Generate theme music, intro music, transitions. “Cinematic sci-fi ambiance with pad synths and subtle electronic pulses” beats buying a $30 music pack.

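Melody conditioning. This is the power move. Have a hummed melody or a rough recording of a riff? Feed the audio to the melody model, and MusicGen arranges a full track around it. It wants audio rather than MIDI, so render a MIDI idea to a WAV first. I’ve turned rough voice notes into polished audio in one pass.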
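Here's roughly what that looks like in code. Treat it as a sketch, not the one true way: it assumes the `facebook/musicgen-melody` checkpoint and AudioCraft's `generate_with_chroma` method, and the helper names and the `hummed_idea.wav` / `arranged.wav` file names are placeholders I made up for illustration.

```python
def arrange_melody(model, description, melody, melody_sample_rate, duration=15):
    """Generate audio that follows `melody` in the style of `description`.

    `model` is expected to be a MusicGen melody model and `melody` a
    [channels, samples] waveform tensor.
    """
    model.set_generation_params(duration=duration)
    # generate_with_chroma accepts a list of [C, T] tensors, one per description
    wavs = model.generate_with_chroma([description], [melody], melody_sample_rate)
    return wavs[0]  # first (and only) track in the batch


def main(melody_path="hummed_idea.wav"):  # hypothetical input file
    # Heavy imports live here so the helper above stays importable without them
    import torchaudio
    from audiocraft.models import MusicGen

    model = MusicGen.get_pretrained("facebook/musicgen-melody")
    melody, sample_rate = torchaudio.load(melody_path)
    track = arrange_melody(
        model, "warm lo-fi arrangement with jazz piano", melody, sample_rate
    )
    torchaudio.save("arranged.wav", track.cpu(), model.sample_rate)
```

Call `main()` with the path to your own recording. The text prompt still steers genre and instrumentation; the recording only supplies the tune.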

Home Assistant automation. Hook the Gradio app’s API into automation scripts. Play generated music at specific times, create personalized notifications with audio, or kick off generation whenever an automation fires. Your homelab can literally compose on demand.

Proxmox cluster integration. Running this in a VM or container on your Proxmox setup? Excellent. I’ve got it in a dedicated LXC container with GPU passthrough. Scales beautifully if you want to batch-generate multiple tracks.
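For batch generation, a minimal sketch might look like the following. It leans on the fact that `model.generate` takes a list of prompts; the `chunked` and `generate_batches` helpers and the default batch size of 4 are my own assumptions, so tune the batch size to whatever your VRAM tolerates.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def generate_batches(model, prompts, batch_size=4, duration=30):
    """Generate one track per prompt, `batch_size` prompts per call."""
    model.set_generation_params(duration=duration)
    tracks = []
    for batch in chunked(prompts, batch_size):
        wavs = model.generate(batch)  # one waveform per prompt, in order
        tracks.extend(zip(batch, wavs))
    return tracks  # list of (prompt, waveform) pairs
```

Pass it a model loaded the same way as in app.py plus your list of prompts, then write each waveform to disk however you prefer.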

Reality Check: What Works, What Doesn’t

MusicGen is genuinely impressive, but it’s not magic. Vocals are rough — don’t expect it to sing lyrics. If you need specific samples or sounds, it might not nail it first try. And yes, it’s limited by your hardware; cloud services will be faster if you’ve got 500 videos to process.

That said, for background music, ambient audio, jingles, and experimental stuff? It’s genuinely better than most alternatives I’ve tried. The quality floor is respectably high.

The model sizes matter. The large model (3.3B parameters) gives you the best quality. medium (1.5B) is faster and still solid. small (300M) is for testing. Start with large if you have the VRAM; you won’t regret it.
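Switching sizes is a one-line change, because AudioCraft loads each variant by its Hugging Face checkpoint ID. Here's a small sketch of a lookup you could drop into app.py; the `checkpoint_for` and `load_model` helpers are hypothetical names of mine, not part of the library.

```python
# Hugging Face checkpoint IDs for the MusicGen variants
CHECKPOINTS = {
    "small": "facebook/musicgen-small",
    "medium": "facebook/musicgen-medium",
    "large": "facebook/musicgen-large",
    "melody": "facebook/musicgen-melody",
}


def checkpoint_for(size):
    """Return the checkpoint ID for a size name, or raise a readable error."""
    try:
        return CHECKPOINTS[size.lower()]
    except KeyError:
        raise ValueError(
            f"unknown size {size!r}; choose from {sorted(CHECKPOINTS)}"
        ) from None


def load_model(size="small", device="cpu"):
    """Load a MusicGen variant; audiocraft is imported lazily on first use."""
    from audiocraft.models import MusicGen

    return MusicGen.get_pretrained(checkpoint_for(size), device=device)
```

Something like `load_model("medium", device="cuda")` then lets you step down a size when large doesn't fit.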

Putting It in Your Homelab Stack

Running Traefik for reverse proxies? Slap MusicGen behind it with a subdomain and you’ve got a proper self-hosted music generation API accessible from anywhere on your network.
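To call that API from a script, the `gradio_client` package is probably the least painful route. The sketch below assumes the Gradio app from earlier is reachable at `http://localhost:7860` (swap in your Traefik subdomain) and that it exposes Gradio's default `/predict` endpoint for a `gr.Interface`; the `connect` and `request_track` helpers are my own naming.

```python
def connect(url="http://localhost:7860"):
    """Open a client session against the running Gradio app."""
    # Imported lazily so request_track can be used and tested without the package
    from gradio_client import Client

    return Client(url)


def request_track(client, description, duration=30):
    """Ask the app for a track and return whatever the endpoint yields.

    For an Audio output, the client normally hands back the path of a
    downloaded audio file.
    """
    return client.predict(description, duration, api_name="/predict")
```

A cron job or Home Assistant shell command could then run something like `request_track(connect(), "calm morning piano", 20)` and move the resulting file wherever it's needed.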

Want to make it accessible without exposing your homelab? Throw Cloudflare Tunnel in front of it and generate music from anywhere. Just be sensible about rate limiting.

The outputs directory is where your generated tracks live. Back them up, organize them by project, throw them in a Plex music library if you’re feeling cute.

This is legitimately one of those tools that makes you wonder why you were doing it the old way. Stop paying for music subscriptions. Stop hunting royalty-free libraries. Just describe what you need and let your homelab compose it for you.
