SillyTavern on my homelab: the install I wrote down

I kept spinning up SillyTavern, forgetting the exact sequence three weeks later, and having to re-learn where the config files lived. This time I wrote it down.

[Screenshot: SillyTavern, from the official site]

The appeal is straightforward: you get a proper UI for talking to local language models—the kind you run on your own hardware via Ollama or KoboldCpp. No API keys, no monthly bills, no history shared with anyone. You can build character cards, set up world details, run group conversations. The interface is polished enough that you don’t feel like you’re wrestling with a science project.

What you need first

You’ll want Node.js 18 or later. I have 20.x running, which works fine. A running instance of Ollama or KoboldCpp is assumed—if you haven’t set either up yet, do that first. They’re separate installs. SillyTavern is just the frontend layer talking to them.

A few GB of free disk space, nothing crazy. 8GB of RAM minimum on the machine running SillyTavern itself (not necessarily where your model runs). I’m doing this on an old Ryzen 5, bare metal, not in a container.
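Since the Node.js version is the one hard requirement here, a quick check before cloning can save a confusing install failure. This is a minimal sketch, not something SillyTavern ships: node_major_ok is a hypothetical helper name, and the minimum of 18 is simply the number from this guide.

```shell
#!/usr/bin/env bash
# Sketch: check that a Node.js version string meets a minimum major version.
# node_major_ok is a hypothetical helper name, not part of SillyTavern.

node_major_ok() {
  local version="${1#v}"        # strip the leading "v" from e.g. "v20.11.1"
  local minimum="${2:-18}"      # minimum major version, 18 per this guide
  local major="${version%%.*}"  # keep everything before the first dot
  # Reject empty or non-numeric input
  case "$major" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$major" -ge "$minimum" ]
}

# Only run the live check when node is actually installed.
if command -v node >/dev/null 2>&1; then
  if node_major_ok "$(node --version)"; then
    echo "Node.js $(node --version) is new enough"
  else
    echo "Node.js $(node --version) is too old; need 18 or later" >&2
  fi
else
  echo "node not found on PATH" >&2
fi
```

If it reports that node is missing or too old, sort that out before running npm install.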

Getting it running in about fifteen minutes

Clone the repo first. It’s substantial—around 800MB after dependencies.

git clone https://github.com/SillyTavern/SillyTavern.git
cd SillyTavern
npm install

This will take a few minutes. If you see warnings about peer dependencies or deprecated packages, that’s normal. They don’t block the install.

Before you start the server, check the config file. On current releases it lives at config.yaml in the root directory. The install step normally creates it for you from default/config.yaml; if it’s missing, copy the default yourself:

cp default/config.yaml config.yaml

Now open config.yaml in whatever editor you use. The defaults are mostly fine. This file covers the server itself, such as the port and who is allowed to connect. The address of your model backend is not set here; you enter it in the UI once the server is up, so note it down now. For KoboldCpp on its default port, that is:

http://localhost:5001

If you’re using Ollama instead:

http://localhost:11434

Port 11434 is Ollama’s default. If you’ve remapped it, use whatever you actually set. Same goes for KoboldCpp: recent builds default to 5001 (5000 was the older KoboldAI default), but check your launch command.

The config file also has settings for where to store characters and chat logs. By default they go in a data folder relative to wherever you run the server from. That’s fine unless you’re paranoid about keeping data outside the install directory. I leave it as-is.

Start the server:

npm start

You’ll see output telling you which port it bound to. By default, that’s 8000. Open a browser and go to http://localhost:8000. You should see the SillyTavern UI immediately.

First conversation

The interface needs to be pointed at a backend. Open the API Connections panel (the plug icon in the top bar on current releases), pick the backend type you’re using, and enter the URL where it lives. You only do this once; SillyTavern saves the connection settings with the rest of your preferences.

Hit “Test Connection.” If your Ollama or KoboldCpp instance is actually running, it will say so. If it fails, the error message usually tells you what’s wrong—port mismatch, service not running, firewall issue.
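When the connection test fails, it helps to rule out the backend itself from a terminal. The sketch below assumes curl is installed and uses two endpoints the backends expose: /api/tags on Ollama (lists installed models) and /api/v1/model on KoboldCpp (reports the loaded model). The function names backend_probe_url and probe_backend are hypothetical helpers for illustration.

```shell
#!/usr/bin/env bash
# Sketch: probe a local backend before blaming SillyTavern.
# backend_probe_url and probe_backend are hypothetical helper names.

backend_probe_url() {
  local kind="$1"
  local base="${2%/}"   # strip a trailing slash from the base URL
  case "$kind" in
    ollama) echo "$base/api/tags" ;;       # Ollama: lists installed models
    kobold) echo "$base/api/v1/model" ;;   # KoboldCpp: reports the loaded model
    *)      return 1 ;;
  esac
}

probe_backend() {
  local url
  url="$(backend_probe_url "$1" "$2")" || {
    echo "unknown backend: $1" >&2
    return 2
  }
  # -f: fail on HTTP errors, -sS: quiet but still print errors, 5 second cap
  if curl -fsS --max-time 5 "$url" >/dev/null; then
    echo "OK   $url"
  else
    echo "FAIL $url"
    return 1
  fi
}
```

Call it as probe_backend ollama http://localhost:11434, or probe_backend kobold with whatever URL and port you launched KoboldCpp on. A FAIL here means the problem is the backend or the port, not the SillyTavern UI.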

Once that passes, you can start a chat. Click the big plus button to create a new character. You can write one from scratch or paste in a character card if you have one lying around. The format is flexible—SillyTavern handles most card styles.

The first response might take a while depending on your model and hardware. I’m using Mistral 7B, which takes maybe five to ten seconds per response on a 5700X. Larger models will be slower, smaller ones faster. None of this is a surprise: it’s the trade-off you accept for running this yourself instead of hitting an API.

The one thing I always forget

Network access. If you’re trying to reach SillyTavern from a different machine on your network, it won’t work out of the box, because the server only listens on localhost by default. On current releases the fix is two settings in config.yaml: turn on listening and whitelist the addresses that are allowed in.

listen: true
whitelist:
  - 127.0.0.1
  - 192.168.1.*

The 192.168.1.* entry is an example; use your own subnet. Restart the server after editing. Because it lives in the config file, it survives restarts, so you only have to remember it once per install.

Without it, the page simply won’t load from the other machine, or you’ll get a forbidden message if your address isn’t whitelisted. It looks like the server isn’t responding, but actually it isn’t allowed to answer you. Took me an embarrassing amount of time to remember this the second time I set it up.

What surprised me

The memory system works better than I expected. You can define “world info”—details about the setting, characters, or scenario—and SillyTavern injects that context into the prompt automatically. I thought this would be fragile or hit token limits fast. It’s not. It’s one of the things that makes extended roleplay actually coherent instead of the conversation drifting into nonsense after ten exchanges.

Also: it’s extensible. There’s a whole plugin system for things like regex replacements, custom API calls, or piping text through other tools. I haven’t dug into that yet, but knowing it exists is good. The base functionality is complete enough that I don’t need to customize much.

One thing that didn’t surprise me: the UI can feel sluggish if you’re on older hardware. The JavaScript bundle is not tiny. It works fine on my Ryzen machine, but I wouldn’t try running this on a Raspberry Pi 4. You need actual CPU horsepower, not just enough RAM.

Keeping it running long-term

If you want SillyTavern to start on boot, add it to systemd or just throw it in a screen session with a simple wrapper script. Nothing fancy needed.
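For the systemd route, a unit file along these lines is usually enough. This is a sketch under assumptions: render_unit is a hypothetical helper, the user name and install path are placeholders, and it expects npm to be on the system PATH (a Node.js installed through a per-user version manager would need a full path in ExecStart instead).

```shell
#!/usr/bin/env bash
# Sketch: print a systemd unit for SillyTavern to stdout.
# render_unit is a hypothetical helper; user and path are placeholders.

render_unit() {
  local user="$1"
  local install_dir="$2"
  cat <<EOF
[Unit]
Description=SillyTavern
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=$user
WorkingDirectory=$install_dir
ExecStart=/usr/bin/env npm start
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

To install it, something like render_unit youruser /home/youruser/SillyTavern | sudo tee /etc/systemd/system/sillytavern.service, followed by sudo systemctl daemon-reload and sudo systemctl enable --now sillytavern. The unit name sillytavern.service is my choice here, not a convention the project mandates.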

Backups: your character cards and chat logs live in that data folder. If you care about preserving conversations, point it somewhere that gets backed up. I use a symlink to my NAS.
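If a symlink to a NAS isn’t an option, a timestamped tarball of the data folder does the job too. A minimal sketch, assuming the data folder and destination paths are passed in; backup_data is a hypothetical helper name, and it is worth stopping the server first so files aren’t written mid-archive.

```shell
#!/usr/bin/env bash
# Sketch: archive the data folder into a timestamped tarball.
# backup_data is a hypothetical helper; both paths are placeholders.

backup_data() {
  local data_dir="$1"
  local dest_dir="$2"
  local stamp archive
  [ -d "$data_dir" ] || {
    echo "no such data folder: $data_dir" >&2
    return 1
  }
  mkdir -p "$dest_dir" || return 1
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest_dir/sillytavern-data-$stamp.tar.gz"
  # -C makes paths inside the archive relative to the data folder's parent
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1
  echo "$archive"   # print the archive path so callers can log or copy it
}
```

For example, backup_data ./data /mnt/nas/backups writes one archive per run and prints its path; the /mnt/nas/backups location is only an illustration.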

Updates come fairly regularly. New characters, bug fixes, features. Just pull the repo, run npm install again to pick up dependency changes, and restart. I’ve never had a breaking change that required config tweaks.
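The update routine fits in one small function. This is a sketch of the steps above rather than an official updater: update_sillytavern is a hypothetical name, and --ff-only is my own cautious choice so that git refuses to merge if the checkout has local commits.

```shell
#!/usr/bin/env bash
# Sketch: pull the repo and refresh dependencies in one step.
# update_sillytavern is a hypothetical helper name.

update_sillytavern() {
  local install_dir="$1"
  [ -d "$install_dir/.git" ] || {
    echo "not a git checkout: $install_dir" >&2
    return 1
  }
  (
    # Subshell so the caller's working directory is left alone
    cd "$install_dir" || exit 1
    git pull --ff-only || exit 1   # refuse to merge if local commits diverge
    npm install || exit 1          # pick up dependency changes
  )
}
```

Run update_sillytavern with the path you cloned into, then restart the server however you normally run it.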

One last thing: if you’re running a model locally and also trying to do other work on the same machine, expect some friction. Inference on a large model will peg your CPU or GPU. SillyTavern doesn’t cause this by itself (that’s your model doing the work), but it’s worth knowing the bottleneck isn’t the UI layer.
