I have a filing cabinet full of documents I will never organize by hand. Bank statements, medical records, old invoices, tax stuff. They sit there. So I bought a decent scanner, looked at commercial solutions, and decided to run Paperless-ngx in Docker instead. It’s open source, indexes everything into a searchable database, and doesn’t phone home. Four months in, I’ve scanned about 800 documents and actually found one when I needed it. That’s already a win.
Why this instead of just scanning to a folder
You could throw PDFs into a folder and use desktop search. I did that for years. It mostly works until you have 2000 documents and you’re trying to remember which bank statement contained that one transaction. Paperless-ngx does OCR on everything, so you search the text inside the scans, not just the filename. It auto-tags documents by correspondent (the company that sent it) and type (invoice, statement, receipt). That alone saves time.
What surprised me: the AI tagging is actually useful. It learns from documents you tag manually and starts suggesting tags. I was skeptical it would work in a homelab environment, but it does. Not perfect—it occasionally suggests tags that make no sense—but it works well enough that I don’t have to manually categorize everything.
The catch is setup. Docker, PostgreSQL, Redis, Tesseract for OCR, a reverse proxy if you want it accessible outside your network. It’s not hard, but it’s not zero-effort either.
Getting it running on Docker
I’m running this in a Proxmox LXC with Ubuntu 22.04, 4 cores, 4GB RAM. That’s overkill for light use but fine if you plan to scan regularly. Here’s a working docker-compose:
version: '3.4'
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: your_secure_password_here
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    volumes:
      - redisdata:/data
  paperless-ngx:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    depends_on:
      - db
      - redis
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      # Must match POSTGRES_PASSWORD above
      PAPERLESS_DBPASS: your_secure_password_here
      PAPERLESS_SECRET_KEY: generate_a_random_string_here_min_32_chars
      PAPERLESS_TIME_ZONE: America/New_York
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: your_admin_password
      PAPERLESS_ENABLE_COMPRESSION: 'true'
    ports:
      - "8000:8000"
    volumes:
      # The official image keeps its state under /usr/src/paperless
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
    restart: unless-stopped
volumes:
  pgdata:
  redisdata:
Save that as docker-compose.yml, make sure the three volume directories exist, then docker-compose up -d. It’ll take a minute or two to initialize the database. Once it’s running, hit http://your-host:8000 and log in with the admin credentials you set.
The environment variables are where the magic happens. PAPERLESS_OCR_LANGUAGE defaults to English, but you can add multiple languages if you need them: PAPERLESS_OCR_LANGUAGE: eng+deu for English and German, for example. I stuck with English since that’s 99% of what I scan.
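As a rough sketch of a multi-language setup, these are the lines that would change under the paperless-ngx service. As far as I know the official image already bundles English and German, so only languages beyond the bundled set need PAPERLESS_OCR_LANGUAGES. Swedish (swe) is here purely as an illustrative assumption, not something I run.

```yaml
services:
  paperless-ngx:
    environment:
      # Languages Tesseract should try on each document, joined with "+"
      PAPERLESS_OCR_LANGUAGE: eng+deu+swe
      # Assumption for illustration: extra language packs to install at
      # container start (space-separated Tesseract codes)
      PAPERLESS_OCR_LANGUAGES: swe
```

Each extra language makes OCR a little slower, so list only the ones you actually scan.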
First run: import and configure
Once the container is up, go to the web interface and poke around. You’ll see settings under the gear icon. The important ones:
- Consumption settings: Where Paperless watches for new documents. You can set it to watch a folder on your server, or upload through the web UI. I have a /home/paperless/media/documents/inbox folder mounted to my scanner’s network share, so I drop scans there and they get picked up automatically every minute.
- Processing settings: Tweak OCR threads. If you only have 2 CPU cores, leave this at 1. More cores, you can bump it up. I’m at 2 on a 4-core system and OCR takes about 2-3 seconds per page.
- Email settings: Optional, but nice. You can forward emails to Paperless and it’ll save them as documents. I use this for bills I get emailed.
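The consumption and processing choices can also be pinned down as environment variables on the paperless-ngx service. This is a minimal sketch, not my exact config. It assumes the official image’s default consume directory (/usr/src/paperless/consume) and a hypothetical ./consume folder on the host, so point both at wherever your scanner actually writes. Polling is worth setting on network shares, where filesystem change notifications often never arrive.

```yaml
services:
  paperless-ngx:
    environment:
      # Check the consume folder every 60 seconds instead of relying on inotify
      PAPERLESS_CONSUMER_POLLING: 60
      # Number of documents processed in parallel
      PAPERLESS_TASK_WORKERS: 1
      # Threads each worker may use, mostly spent on OCR
      PAPERLESS_THREADS_PER_WORKER: 2
    volumes:
      # Hypothetical host folder that the scanner share writes into
      - ./consume:/usr/src/paperless/consume
```

Keep workers times threads at or below your core count, or OCR jobs will fight each other for CPU.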
For the first document you import, manually set a correspondent and document type, then hit save. After five or six documents, the AI starts making suggestions. By document twenty, it’s usually right. Don’t overthink the tagging structure at the beginning; you can retag everything later if needed.
The bit that caught me off guard
Paperless needs a lot of disk space if you keep the originals. A 500-page book scanned at 300 DPI is about 1.5GB. I have about 100GB allocated to storage and I’m at 35GB after 800 documents, which works out to roughly 44MB per document. If you’re scanning decades of paperwork, budget for it. Note that PAPERLESS_ENABLE_COMPRESSION, which I enabled, compresses the web server’s HTTP responses rather than the stored PDFs. It trades a little CPU for less network traffic, but it does not reduce disk usage.
Also: the search is powerful but not instant. If you’re searching for something specific, use the filter bar on the left—select correspondent, document type, date range—and narrow it down. Full-text search across thousands of documents takes a few seconds. It’s fine, just don’t expect Ctrl+F speed.
What to do next
Set up a routine. Once a week or twice a month, scan your mail and receipts in batch. Don’t let it pile up. I learned this the hard way—I left a stack of bank statements unopened for two months, then tried to backfill them all at once. It’s tedious either way, but at least scanning as you go means you catch problems faster.
If you want it accessible outside your network, stick it behind a reverse proxy (I use Nginx Proxy Manager) and set PAPERLESS_ALLOWED_HOSTS to your domain. Don’t expose port 8000 directly to the internet.
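Here is a sketch of the compose changes that go with a reverse proxy. The domain paperless.example.com is a placeholder. The loopback binding assumes the proxy runs directly on the same host. If your proxy lives in its own container, put both on a shared Docker network and drop the ports entry instead. PAPERLESS_URL is optional here, but as far as I can tell it also covers the CSRF and CORS settings that tend to bite behind a proxy.

```yaml
services:
  paperless-ngx:
    environment:
      # Placeholder domain; use the hostname your proxy actually serves
      PAPERLESS_URL: https://paperless.example.com
      PAPERLESS_ALLOWED_HOSTS: paperless.example.com
    ports:
      # Replaces "8000:8000": only processes on this host can reach the port
      - "127.0.0.1:8000:8000"
```

After changing these, recreate the container with docker-compose up -d and check that the login page loads through the proxy before closing anything off on your router.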
The export feature is good for peace of mind. Every few months I export everything as PDF + metadata so I’m not locked into the database. Takes a few minutes, not a backup strategy, but it exists.
I’m still building a habit around this. Some days I remember to scan things immediately. Other days I find a pile of documents in the inbox folder I forgot about. It’s better than paper in a cabinet, but it’s not magic—you still have to actually use it.