You know that stack of documents on your desk? The one that’s been there for six months? Yeah, that’s going away today. I’ve been running Paperless-ngx in my homelab for almost a year now, and it’s genuinely the best “set it and forget it” tool I’ve deployed. It takes your scans and uploads, OCRs them, tags them, and organizes everything automatically. And because it’s self-hosted, your documents never touch anyone else’s servers.
Most people don’t realize how expensive the “paperless” services are. Evernote wants $15/month. Microsoft’s document scanning is locked behind Microsoft 365. Meanwhile, Paperless-ngx is free, open source, and runs in Docker on hardware as modest as a Raspberry Pi. You’re literally leaving money on the table if you’re not using this.
What Paperless-ngx Actually Does (And Why It’s Better Than the Hype)
This isn’t just a PDF organizer. Paperless-ngx is a full document management system that lives on your infrastructure and uses machine learning to sort whatever you feed it.
You throw a document at it — scan a receipt, upload a utility bill, drop a contract — and it automatically:
- Extracts text via OCR (using Tesseract, which is solid)
- Detects the document source (“This is from Amex”, “This is from your landlord”)
- Applies tags intelligently based on content
- Makes everything searchable by actual text, not just filename
- Stores the files on disk and the metadata in a database you control
The magic part? Once it learns your patterns, it does this automatically for every new document. Upload a bill, and it tags it as “Bills” and assigns the company as the correspondent. Upload a receipt, and it picks up the date and the vendor without you typing anything.
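Date detection is the easiest piece to picture. Paperless’s real parser handles many formats and a configurable date order; the sketch below is a deliberately simplified stand-in (my own, not Paperless code) that only recognizes ISO dates and US-style month/day/year, and returns the first valid one found in the OCR text.

```python
import re
from datetime import date
from typing import Optional

# Simplified assumption: only two date formats are recognized.
_ISO = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")      # 2021-04-01
_US = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")   # 03/15/2021 (month first)


def first_date(text: str) -> Optional[date]:
    """Return the first valid date (by position) in the OCR text, or None."""
    candidates = []
    for m in _ISO.finditer(text):
        y, mo, d = (int(g) for g in m.groups())
        candidates.append((m.start(), y, mo, d))
    for m in _US.finditer(text):
        mo, d, y = (int(g) for g in m.groups())
        candidates.append((m.start(), y, mo, d))
    # Walk candidates in reading order; skip matches that are not real dates.
    for _, y, mo, d in sorted(candidates):
        try:
            return date(y, mo, d)
        except ValueError:
            continue  # e.g. 13/45/2020 fits the pattern but is not a date
    return None
```

Taking the first date works surprisingly well for bills and receipts, because the issue date usually sits near the top of the page.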
I’ve got roughly 8,000 documents indexed now. Last week I needed a receipt from three years ago. One search, found it in 0.2 seconds. Try doing that with a filing cabinet.
The Install (It’s Stupidly Easy)
Seriously, if you can run Docker, you can have this running in under 10 minutes. Here’s the Docker Compose setup I use:
services:
  # Paperless-ngx needs a Redis broker for its background task queue.
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - ./paperless/redis:/data

  paperless-ngx:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless
    restart: unless-stopped
    depends_on:
      - broker
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_SECRET_KEY: your-super-secret-key-here   # use a long random string
      PAPERLESS_ALLOWED_HOSTS: "*"
      PAPERLESS_DEBUG: "false"
      PAPERLESS_ENABLE_COMPRESSION: "true"
      PAPERLESS_OCR_LANGUAGE: "eng"
      PAPERLESS_CONSUMER_POLLING: "5"   # check the consume folder every 5 seconds
      PAPERLESS_TIME_ZONE: "America/New_York"
    volumes:
      - ./paperless/data:/usr/src/paperless/data
      - ./paperless/media:/usr/src/paperless/media
      - ./paperless/export:/usr/src/paperless/export
      - ./paperless/consume:/usr/src/paperless/consume
    healthcheck:
      test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
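That’s it. Run docker compose up -d and you’re live at http://localhost:8000. There’s no built-in admin/admin login, though: add PAPERLESS_ADMIN_USER and PAPERLESS_ADMIN_PASSWORD to the environment block before the first start so Paperless creates your superuser, then remove the password from the file once you’re in.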
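The consume folder is where the magic happens: drop PDFs or images there, and Paperless processes them automatically. I’ve got mine shared over Samba so I can drag and drop from my laptop. That’s also why the config turns on PAPERLESS_CONSUMER_POLLING, since file-change notifications generally don’t fire for files written over a network share.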
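If your scanner saves somewhere other than the consume folder, a small script can move finished scans over. This sketch copies PDFs and common image types, writing each file under a temporary .partial name and then renaming it, so the consumer never sees a half-copied file. The extension list and the .partial suffix are my assumptions, not Paperless settings; check which formats your version accepts.

```python
import shutil
from pathlib import Path

# Assumed set of file types worth handing to Paperless.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def drop_into_consume(src_dir: Path, consume_dir: Path) -> list[Path]:
    """Copy supported scans from src_dir into consume_dir; return the final paths."""
    consume_dir.mkdir(parents=True, exist_ok=True)
    placed = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in SUPPORTED:
            continue  # skip folders and file types we don't want consumed
        final = consume_dir / src.name
        tmp = consume_dir / (src.name + ".partial")  # hypothetical temp suffix
        shutil.copy2(src, tmp)   # copy contents and timestamps
        tmp.replace(final)       # rename within one directory, so it is atomic
        placed.append(final)
    return placed
```

Copy-then-rename matters most over a network share, where a large multi-page PDF can take a few seconds to land.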
Making It Actually Useful (Smart Tagging and Automation)
Out of the box, Paperless-ngx is good. But if you spend 20 minutes configuring it, it becomes genuinely incredible.
Set up correspondents first. These are the document sources: “Amex”, “Landlord”, “Electric Company”, whatever. Create a few, leave their matching on Auto, and assign them to some example documents; Paperless then learns to detect them on its own. I’ve got about 30 set up, and it gets maybe 95% right on the first try.
Then use tags strategically. Don’t go crazy: I use about 20 across 8,000 documents (“Bills”, “Medical”, “Tax”, “Receipts”, “Insurance”, “Contracts”, “Personal”, etc.). Paperless learns which tags fit which content, and a document can carry several. A medical bill? Automatically tagged as both “Bills” and “Medical”.
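Each correspondent and tag has a matching algorithm: any word, all words, exact match, regular expression, fuzzy, or Auto (the learned classifier). The sketch below models only four rule-based modes, simplified and case-insensitive, to show why a document ends up with one correspondent but several tags. The Rule class and function names are hypothetical, not Paperless internals.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str       # correspondent or tag name
    algorithm: str  # "any", "all", "literal" or "regex" (simplified subset)
    match: str      # words, phrase, or pattern to look for


def matches(rule: Rule, text: str) -> bool:
    """Case-insensitive check of one rule against OCR text."""
    t = text.lower()
    words = rule.match.lower().split()
    if rule.algorithm == "any":
        return any(re.search(rf"\b{re.escape(w)}\b", t) for w in words)
    if rule.algorithm == "all":
        return all(re.search(rf"\b{re.escape(w)}\b", t) for w in words)
    if rule.algorithm == "literal":
        return rule.match.lower() in t
    if rule.algorithm == "regex":
        return re.search(rule.match, text, re.IGNORECASE) is not None
    raise ValueError(f"unknown algorithm: {rule.algorithm}")


def classify(text: str, correspondents: list[Rule],
             tags: list[Rule]) -> tuple[Optional[str], list[str]]:
    """At most one correspondent (first match), but every tag whose rule matches."""
    correspondent = next((r.name for r in correspondents if matches(r, text)), None)
    return correspondent, [r.name for r in tags if matches(r, text)]
```

For a hospital statement, a rule set like Rule("Bills", "any", "amount due statement") plus Rule("Medical", "regex", r"hospital|clinic") yields both tags, which is the “Bills” plus “Medical” behavior described above.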
Document types are optional but useful if you want to sort by invoice vs. receipt vs. letter. I don’t bother — search is powerful enough.
Pro tip: Use the REST API to integrate with Home Assistant or Node-RED for automated workflows. When a bill arrives, send yourself a notification. When it’s tax season, pull every document tagged “Tax” into one list. The possibilities are endless if you’re willing to tinker.
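As a starting point for that tax-season list, here’s a minimal standard-library sketch that asks the documents endpoint for everything with a given tag. It assumes token authentication (an “Authorization: Token …” header), the tags__name__iexact filter, and the page_size parameter on /api/documents/; double-check those against the API docs for your version. The base URL and token are placeholders.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"      # assumption: the compose setup above
API_TOKEN = "replace-with-your-token"   # hypothetical placeholder


def documents_url(base_url: str, tag: str, page_size: int = 25) -> str:
    """Build a document-list URL filtered by tag name (case-insensitive)."""
    query = urllib.parse.urlencode({"tags__name__iexact": tag, "page_size": page_size})
    return f"{base_url.rstrip('/')}/api/documents/?{query}"


def fetch_documents(url: str, token: str) -> list[dict]:
    """GET one page of results; the response is paginated (see its "next" field)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    return payload["results"]
```

Calling fetch_documents(documents_url(BASE_URL, "Tax"), API_TOKEN) returns the first page. A Node-RED or Home Assistant flow would make the same HTTP request, just without Python.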
The Performance Reality (What You Need to Know)
Paperless-ngx is not a lightweight app if you’re running it at scale. OCR is CPU-intensive. Indexing 8,000 documents took about 4 hours on my Ryzen 5 with parallel workers enabled.
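The 4-hour figure works out to a per-document rate you can use to size your own backlog. This sketch assumes time scales linearly with document count, which ignores page count and OCR difficulty. Parallelism itself is tuned with PAPERLESS_TASK_WORKERS and PAPERLESS_THREADS_PER_WORKER.

```python
def seconds_per_doc(docs: int, hours: float) -> float:
    """Average wall-clock seconds per document for a completed batch."""
    return hours * 3600 / docs


def estimate_hours(docs: int, secs_per_doc: float) -> float:
    """Projected wall-clock hours for a backlog, assuming the same average rate."""
    return docs * secs_per_doc / 3600


rate = seconds_per_doc(8000, 4)          # the Ryzen 5 run: about 1.8 s per document
weekly_minutes = estimate_hours(50, rate) * 60   # 50 docs/week: about 1.5 minutes
```

At that rate a steady trickle of new documents barely registers; only the first big import takes real time.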
Here’s what I’d recommend:
- Raspberry Pi 4: Works fine for casual use (few hundred docs). Don’t expect instant OCR.
- Any modern NAS: A great fit if it runs Docker (Synology, TrueNAS, whatever), since always-on storage is exactly what they’re built for. Just expect OCR speed to track the CPU.
- Proxmox VM or dedicated Docker host: Best performance. Give it 4+ CPU cores, 4GB RAM, and you’re golden.
Storage-wise, budget about 2-5MB per document, depending mostly on scan resolution and page count (Paperless keeps both your original and an OCR’d archive copy). My 8,000 docs are around 30GB with media and backups.
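To sanity-check that budget, multiply it out. This sketch uses decimal megabytes and gigabytes and scales linearly, which is an assumption: real per-document size varies a lot with page count.

```python
def storage_range_gb(docs: int, low_mb: float = 2.0, high_mb: float = 5.0) -> tuple[float, float]:
    """Decimal-GB range for a document count at the quoted per-document sizes."""
    return docs * low_mb / 1000, docs * high_mb / 1000


current = storage_range_gb(8000)         # (16.0, 40.0) GB for 8,000 documents
yearly_growth = storage_range_gb(50 * 52)  # roughly 5.2 to 13 GB per year at 50 docs/week
```

My 30GB sits inside the 16-40GB range, and a year of new documents adds a few gigabytes at most, so a modest dedicated disk goes a long way.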
The biggest gotcha? The initial import is slow. But after that, it chugs along silently. I process about 50 documents a week now, and my machine barely notices.
Why This Beats Every Commercial Option
I could rant about this forever, but the simple math: Evernote wants $15/month forever. That’s $180/year. Microsoft wants you locked into Microsoft 365. Evernote doesn’t let you export easily (terrible for data portability).
Paperless-ngx costs zero. It’s open source. You own your data. You can export everything: your original files, the OCR’d PDFs, and a JSON manifest with all your tags and metadata. If you decide to move to something else in five years, you’re not trapped.
Plus, self-hosting means you’re not letting some SaaS company scrape your documents for training data. Your bills, receipts, contracts — they stay on your infrastructure. That alone is worth it.
And honestly? The UI is way cleaner than Evernote’s. Search is better. The workflow is faster. I’m not exaggerating when I say this replaced three different services for me.
One More Thing: Backups Are Non-Negotiable
Your documents are now your single source of truth. Don’t be dumb about backups. I run a nightly backup to offsite storage (Backblaze B2) using a simple script. Takes 10 minutes to write, saves months of heartbreak if your drive dies.
Paperless has a built-in exporter. Use it. Run a full export (documents plus all metadata) at least monthly. You’ll thank me when you need it.
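Here’s a minimal sketch of that kind of backup script. It runs Paperless’s document_exporter inside the container (the "../export" argument follows the official docs’ example and lands in the export volume), then pushes the export folder offsite with rclone. The service name matches the compose file above; the rclone remote and bucket name are hypothetical placeholders for one you’ve configured yourself.

```python
import subprocess
from pathlib import Path

SERVICE = "paperless-ngx"                 # service name from the compose file
EXPORT_DIR = Path("./paperless/export")   # host side of the export volume
REMOTE = "b2:my-paperless-backups"        # hypothetical rclone remote:bucket


def export_cmd(service: str = SERVICE) -> list[str]:
    # -T disables TTY allocation so this also works from cron.
    return ["docker", "compose", "exec", "-T", service, "document_exporter", "../export"]


def upload_cmd(local: Path = EXPORT_DIR, remote: str = REMOTE) -> list[str]:
    # "sync" makes the remote mirror the local folder, deletions included.
    return ["rclone", "sync", str(local), remote]


def run_backup() -> None:
    """Export first, then upload; check=True stops at the first failing step."""
    for cmd in (export_cmd(), upload_cmd()):
        subprocess.run(cmd, check=True)
```

Call run_backup() from a nightly cron job or systemd timer, started in the directory that holds your compose file (the export path is relative). If you’d rather have the bucket keep files you’ve deleted locally, rclone copy is the safer verb than sync.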
Set up Paperless-ngx this weekend. Scan your desk into oblivion. By next month, you’ll have a fully indexed, searchable archive of every important document you own. No subscriptions. No SaaS nonsense. Just you, your documents, and a machine that actually respects your data. That’s the whole point of a homelab, isn’t it?