Piper Makes Local Text-to-Speech Stupid Easy (Here’s Why)

You know that moment when Home Assistant tries to speak to you and it takes three seconds to respond? Or when you realize Google’s been keeping recordings of every time your smart home said something? Yeah, I’m done with that.

I just ditched cloud TTS entirely for Piper, and honestly, I’m kicking myself for not doing it sooner. This is a self-hosted, CPU-based neural text-to-speech engine that produces genuinely natural-sounding speech in dozens of languages. It’s fast, it’s private, it’s open source, and it runs on basically nothing for hardware. If you’re running a homelab and you haven’t heard of it, this is the nudge you need.

Why Piper Beats Every Cloud TTS Service

Let’s be real: cloud TTS is convenient until it isn’t. You’re paying per request, your data leaves your network, there’s latency, and you’re locked into whatever voice Google or Amazon decides to give you.

Piper does none of that. It runs entirely locally on your CPU — no GPU required. I’m running it on a 2-core VM with 2GB RAM and response times are sub-100ms. That’s fast.

The voice quality is genuinely impressive. We’re talking neural network stuff, but trained specifically for speed and local inference. Dozens of languages, multiple voice options per language, and the accents don’t sound like a robotic nightmare. I’ve got English, Spanish, and German models loaded and they all sound natural enough that I don’t cringe when they announce things in my house.

And here’s the kicker: it’s completely private. No cloud calls, no analytics, no subscription fees. Your data never leaves your network. For a homelab setup where you care about that stuff, this is huge.

The Install (It’s Stupidly Easy)

If you’re running this in Docker — and you absolutely should be — here’s literally all you need:

version: '3.8'
services:
  piper:
    # Wyoming protocol server that wraps Piper
    image: rhasspy/wyoming-piper:latest
    container_name: piper
    restart: unless-stopped
    ports:
      - "10200:10200"
    volumes:
      # Downloaded voice models are cached here
      - ./piper-data:/data
    # The voice is passed as a command-line argument
    command: --voice en_US-lessac-medium

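That’s it. Spin it up and the container listens on port 10200. One catch: that port speaks the Wyoming protocol (the TCP protocol Home Assistant uses for voice services), not HTTP, so you point Home Assistant at it rather than a browser. The first run downloads the voice model (usually 50-100MB depending on which one you pick), and then you’re golden.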
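
If you want to confirm the container is actually listening before you touch Home Assistant, a plain TCP connect is enough. Here's a minimal sketch, assuming Piper is published on localhost:10200 as in the compose file; the port_open helper is my own name, not part of Piper.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: the compose file publishes Piper on localhost:10200.
    print("Piper reachable:", port_open("127.0.0.1", 10200))
```

This only proves something accepted the connection. It doesn't check that the voice model finished downloading, so give the first start a minute.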

If you want to get fancy with Traefik for reverse proxy access or load balance it across multiple containers, sure, do that. But honestly, the basic setup takes five minutes tops.

Wiring It Into Home Assistant (The Smart Part)

Home Assistant already knows about Piper natively. This is actually where Piper got famous — the Home Assistant team built it into their system for exactly this reason.

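There’s no YAML to write for this part. In Home Assistant, go to Settings > Devices & Services > Add Integration, pick Wyoming Protocol, and enter the host running Piper plus port 10200. Home Assistant then creates a TTS entity for it (tts.piper by default).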

Then in an automation, you can do stuff like:

service: tts.speak
target:
  entity_id: tts.piper
data:
  media_player_entity_id: media_player.living_room_speaker
  message: "Motion detected in the garage"

The whole thing just works. No API keys, no internet required, and the latency is so low you won’t notice any delay between the trigger and the announcement. I’ve got it calling out zone names when motion is detected, telling me when the dishwasher is done, reminding me when the garage door is open. It’s *actually useful* because it’s so responsive.
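
You can trigger the same announcement from outside Home Assistant through its REST API. Here's a rough sketch, assuming the tts.speak service, a TTS entity called tts.piper, and a long-lived access token created in your Home Assistant profile (that's Home Assistant's own auth, nothing cloud-related). The base URL, entity names, and helper functions are placeholders of mine, so swap in your own.

```python
import json
import urllib.request


def build_speak_request(base_url, token, tts_entity, media_player, message):
    """Build (but do not send) a POST to Home Assistant's tts.speak service."""
    payload = {
        "entity_id": tts_entity,                  # assumed TTS entity, e.g. tts.piper
        "media_player_entity_id": media_player,   # speaker that plays the audio
        "message": message,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def speak(request, timeout=10.0):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it makes it easy to log or inspect what you're about to post before it hits your speakers.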

Voices, Languages, and Why Medium Trumps High

Here’s a pro tip nobody mentions: don’t automatically grab the “high” quality models just because they exist. (Piper’s quality tiers are x_low, low, medium, and high; there is no “large”.)

I tested lessac-high vs lessac-medium side-by-side and honestly, the medium voice sounds better for announcements and alerts. On my setup it’s also about 30% faster and uses half the memory. The high models are great if you’re doing long-form narration, but for homelab stuff? Medium is the sweet spot.
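
If you want to run your own side-by-side, a tiny timing harness does the job. This is a sketch, not how I got my numbers to the decimal: it assumes you supply one callable per voice that takes text and performs a synthesis (via whatever route you use), and the function names here are mine.

```python
import statistics
import time


def median_latency(synthesize, text, runs=5):
    """Call synthesize(text) `runs` times and return the median wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(voices, text, runs=5):
    """Map each voice label to its median latency; `voices` maps label -> callable."""
    return {label: median_latency(fn, text, runs) for label, fn in voices.items()}
```

Use the same sentence for every voice and take the median rather than the mean, so one slow cold-start run doesn't skew the comparison.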

Piper has voices for English, Spanish, German, French, Italian, Portuguese, Dutch, Russian, and more. Want to switch languages? Just change the voice name in your compose file. It’s that simple.

Grab the full list here: https://huggingface.co/rhasspy/piper-voices. Browse it, pick what sounds natural to your ear, update the voice name in your compose file, and restart the container. Done.
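
Voice names follow a language-name-quality pattern, like en_US-lessac-medium. Here's a small sketch that splits a name into its parts and works out where the model file sits in the piper-voices repository. The folder layout is my reading of that repo, so double-check it against the listing; the helper names are hypothetical.

```python
from typing import NamedTuple

# Quality tiers used in Piper voice names.
QUALITIES = {"x_low", "low", "medium", "high"}


class Voice(NamedTuple):
    language: str  # e.g. "en_US"
    name: str      # e.g. "lessac"
    quality: str   # e.g. "medium"


def parse_voice(voice_id: str) -> Voice:
    """Split 'en_US-lessac-medium' into language, name, and quality."""
    language, rest = voice_id.split("-", 1)
    name, quality = rest.rsplit("-", 1)
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality tier: {quality!r}")
    return Voice(language, name, quality)


def repo_path(voice_id: str) -> str:
    """Relative path of the .onnx model inside the piper-voices repo (assumed layout)."""
    voice = parse_voice(voice_id)
    family = voice.language.split("_", 1)[0]  # "en" from "en_US"
    return f"{family}/{voice.language}/{voice.name}/{voice.quality}/{voice_id}.onnx"
```

Each model also has a matching .onnx.json config file next to it, and Piper needs both.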

What I’m Using Piper For (And What You Could Too)

This is where it gets fun. Beyond Home Assistant announcements, here’s what I’ve built:

Smart doorbell notifications: When someone rings the bell, Piper announces it through my speakers with the person’s name (detected by face recognition).
Temperature alerts: If my server room hits 28°C, it tells me before it becomes a problem.
Security status checks: Voice query “Is the house locked?” and it responds with current door/window status.
Custom automations: Integration with Node-RED for complex announcements based on multiple conditions.

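You can call it from anywhere in your stack: Proxmox event scripts, custom Python apps, bash one-liners, whatever. Port 10200 speaks the Wyoming protocol rather than plain HTTP, so the two easy routes are calling Home Assistant’s tts.speak service through its REST API, or running the piper command-line tool directly: text in on stdin, WAV file out. That’s the entire contract.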
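
For scripts that just need a WAV file, Piper's command-line tool is handy: it reads text on stdin and writes audio to a file. Here's a minimal sketch of wrapping it, assuming a piper binary on your PATH and a voice model you've already downloaded. The model path and function names are placeholders.

```python
import subprocess
from pathlib import Path


def build_piper_command(model_path, output_path, binary="piper"):
    """Return the argument list for one piper CLI synthesis run."""
    return [binary, "--model", str(model_path), "--output_file", str(output_path)]


def synthesize_to_wav(text, model_path, output_path, binary="piper"):
    """Pipe text to piper on stdin and return the path of the WAV it wrote."""
    command = build_piper_command(model_path, output_path, binary)
    # check=True raises CalledProcessError if piper exits with an error.
    subprocess.run(command, input=text.encode("utf-8"), check=True)
    return Path(output_path)
```

Something like synthesize_to_wav("Backup finished", "voices/en_US-lessac-medium.onnx", "/tmp/backup.wav") then gives you a file you can hand to any player.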

The Real Talk: What Piper Isn’t

Let me be honest about the limitations so you’re not disappointed.

Piper is not a speech-to-text system. For that, you want Whisper.cpp. It’s not a full voice assistant. It’s just TTS — one direction, text in, speech out. And while the quality is great for a local system, it’s not going to fool anyone into thinking it’s a human. But it doesn’t need to. Your smart home doesn’t need a Hollywood voice actor; it needs reliability and low latency, and Piper crushes that.

Performance-wise, I haven’t hit any ceiling yet. Running three simultaneous TTS requests on my setup doesn’t even blink. YMMV depending on your hardware, but CPU usage sits around 15-20% per synthesis.
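
To see how your own box copes with simultaneous requests, you can fire several syntheses at once and time the batch. This is a sketch under one assumption: you pass in a callable that takes text and performs one synthesis. The helper name is mine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(synthesize, texts):
    """Run synthesize(text) for every text in parallel threads.

    Returns (results in input order, total wall-clock seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(texts))) as pool:
        results = list(pool.map(synthesize, texts))
    return results, time.perf_counter() - start
```

If the batch takes roughly as long as a single call, your hardware has headroom. If it takes about three times as long, requests are effectively being served one at a time.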

Deployment Tips Nobody Tells You

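If you’re running Piper in production (and you should be, it’s that solid), pin both the image tag and the voice name in your compose file instead of riding on latest. Replace <pinned-tag> with the release you’ve tested:

image: rhasspy/wyoming-piper:<pinned-tag>
command: --voice en_US-lessac-medium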

This matters because model updates can change synthesis behavior. You want consistency.

Also, mount your piper-data volume on persistent storage. Models are downloaded once and cached. Don’t re-download them on every container restart.
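
A quick way to confirm the cache is really surviving restarts is to list the model files in the mounted folder. Here's a minimal sketch, assuming the models are .onnx files somewhere under your piper-data directory; the function name is hypothetical.

```python
from pathlib import Path


def cached_models(data_dir):
    """Return {model filename: size in MB} for every .onnx file under data_dir."""
    root = Path(data_dir)
    return {
        path.name: round(path.stat().st_size / 1_000_000, 1)
        for path in sorted(root.rglob("*.onnx"))
    }


if __name__ == "__main__":
    # Assumption: the compose file mounts ./piper-data as the model cache.
    for name, size_mb in cached_models("./piper-data").items():
        print(f"{name}: {size_mb} MB")
```

Run it before and after a container restart. If the list is empty the second time, your volume isn't mounted where the container writes its models.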

If you’re running multiple languages, yeah, your storage footprint grows, but we’re talking a few hundred MB total. Not a concern for anyone with a homelab.

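One more thing: if you’re proxying Piper through Traefik or nginx, remember that port 10200 carries the Wyoming protocol over raw TCP, not HTTP. Use a TCP router in Traefik or the stream module in nginx rather than a regular HTTP proxy rule, and don’t put any response caching in front of it. Each TTS call generates unique audio output based on the input text.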

Closing Thought

Piper is one of those homelab tools that feels almost too good to be true. It’s free, it’s open source, it works better than paid alternatives, and it gives you full control over your data. If you’re serious about building a truly independent homelab instead of renting features from cloud providers, this is non-negotiable.

Spin up a container tonight. You’ll wonder why you didn’t do it sooner.
