Codeium in VS Code with Ollama: coding faster without leaving your network

I’ve been running Ollama in a container for a few months now—mostly for experimentation, partly because I got tired of rate limits. When I installed Codeium’s VS Code extension last week, the obvious question came up: could I make it work with my local LLM instead of pinging their servers every time I hit Tab?

Screenshot: Codeium, from the official site

The short answer is no, not directly. Codeium doesn’t have a self-hosted backend option. But what I ended up doing instead was worth the detour.

The setup I actually ended up with

Codeium stays pointed at their cloud API (that’s non-negotiable), but I run a local Ollama instance alongside it to build extra context for the file I’m working in. It’s not a replacement, but it changes how the tool behaves in useful ways.

My homelab runs Ollama in Docker on 192.168.1.50:11434. I have the neural-chat:7b model loaded for speed. When I installed Codeium in VS Code, the extension itself worked fine out of the box: completions came back in about 1.5 seconds on my home connection, which is faster than I expected.

The friction point: Codeium’s free tier is generous for completions, but if you want chat or symbol search, you either upgrade or you live with limited context. I wanted more context without paying them.

Adding Ollama for context and search

VS Code lets you run tasks alongside your extensions. I set up a simple Node.js script that reads the file I’m editing and sends the lines around my cursor to Ollama before I start leaning on Codeium’s completions. The idea is to pre-populate semantic context that Codeium can use.

This is hacky. I’ll admit that upfront. But it works.

#!/usr/bin/env node
// ollama-context.js - runs as a VS Code task
const http = require('http');
const fs = require('fs');

const OLLAMA_HOST = 'http://192.168.1.50:11434';
const MODEL = 'neural-chat:7b';

// Return the lines around the cursor: about 20 before it and 5 after.
// cursorLine is the 1-based line number shown in the editor.
function getContext(filePath, cursorLine) {
  const content = fs.readFileSync(filePath, 'utf-8');
  const lines = content.split('\n');
  const start = Math.max(0, cursorLine - 20);
  const end = Math.min(lines.length, cursorLine + 5);
  return lines.slice(start, end).join('\n');
}

// Send a prompt to Ollama's /api/generate endpoint and resolve with the generated text.
function queryOllama(prompt) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({
      model: MODEL,
      prompt: prompt,
      stream: false
    });

    const options = {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Byte length, not string length: the two differ for non-ASCII source.
        'Content-Length': Buffer.byteLength(body)
      }
    };

    const req = http.request(new URL('/api/generate', OLLAMA_HOST), options, (res) => {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', chunk => data += chunk);
      res.on('end', () => {
        if (res.statusCode !== 200) {
          return reject(new Error(`Ollama returned HTTP ${res.statusCode}: ${data}`));
        }
        try {
          resolve(JSON.parse(data).response);
        } catch (e) {
          reject(e);
        }
      });
    });

    req.on('error', reject);
    req.write(body);
    req.end();
  });
}

// Usage: node ollama-context.js /path/to/file.js 45
const [filePath, line] = process.argv.slice(2);
const cursorLine = parseInt(line, 10);
if (!filePath || Number.isNaN(cursorLine)) {
  console.error('Usage: node ollama-context.js <file> <line>');
  process.exit(1);
}
const context = getContext(filePath, cursorLine);
queryOllama(`Given this code context, identify key patterns:\n\n${context}`)
  .then(result => console.log(result))
  .catch(err => console.error('Ollama error:', err));

I run this as a task before heavy editing sessions. It generates a summary of the current file’s structure and dumps it into a comment block that Codeium sees. Sounds convoluted because it is.
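
The script above only prints the summary to stdout, so something still has to turn it into a comment block. Here is a minimal sketch of one way to do that. The file name, the marker strings, and the function names toCommentBlock, upsertCommentBlock, and writeContext are all made up for illustration and are not part of the original setup. It assumes line comments (// by default, or # for Python) and it puts the block at the very top of the file, which would push a shebang line down if the file has one.

```javascript
// comment-block.js - hypothetical helper, not part of the original script.
const fs = require('fs');

// Marker text is an arbitrary choice for this sketch. It lets a later run
// find the previous block and replace it instead of stacking a new one.
const BEGIN = '>>> ollama-context';
const END = '<<< ollama-context';

// Turn a multi-line summary into line comments wrapped in the two markers.
function toCommentBlock(summary, prefix = '//') {
  const lines = [BEGIN, ...summary.trim().split('\n'), END];
  return lines.map(line => `${prefix} ${line}`.trimEnd()).join('\n');
}

// Put the block at the top of the source text, or replace an existing block.
function upsertCommentBlock(source, summary, prefix = '//') {
  const block = toCommentBlock(summary, prefix);
  const beginMarker = `${prefix} ${BEGIN}`;
  const endMarker = `${prefix} ${END}`;
  const start = source.indexOf(beginMarker);
  const end = source.indexOf(endMarker);
  if (start !== -1 && end > start) {
    return source.slice(0, start) + block + source.slice(end + endMarker.length);
  }
  return `${block}\n\n${source}`;
}

// Rewrite a file in place with the refreshed block.
function writeContext(filePath, summary, prefix = '//') {
  const source = fs.readFileSync(filePath, 'utf-8');
  fs.writeFileSync(filePath, upsertCommentBlock(source, summary, prefix));
}

module.exports = { toCommentBlock, upsertCommentBlock, writeContext };
```

To wire it in, the .then() handler in ollama-context.js could call writeContext(filePath, result) instead of console.log(result). Because the script rewrites the file on disk, save your changes in the editor before running it.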

The part that surprised me

Codeium’s completions actually got better when I added this layer. Not because Ollama is feeding it code—it’s not—but because having explicit context comments in the file changed how Codeium’s cloud model interpreted what I was trying to do. It’s like talking to someone versus talking to someone who already read the background doc.

Response time went from 1.5 seconds to about 2.8 seconds on average, but accuracy on Python and Go improved noticeably. On JavaScript it was marginal. YMMV depending on what languages you’re working in.

The catch: this only works if you’re comfortable running a local script and maintaining it. I set it up once, forgot about it for two weeks, then realized the Ollama container had crashed and I was back to raw Codeium without context. No alerts, I just noticed completions felt dumber.
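
A small watchdog would have caught that silent failure. The sketch below is an assumption about how one might check, not something from the original setup. It calls Ollama’s /api/tags endpoint, which lists the models available locally, and reports whether the server answers and whether the model is present. The file name, the --check flag, and the function names hasModel and checkOllama are invented for this example.

```javascript
// ollama-health.js - hypothetical watchdog, not part of the original setup.
const http = require('http');

const OLLAMA_HOST = 'http://192.168.1.50:11434'; // address used in this article
const MODEL = 'neural-chat:7b';

// Pure helper: does a parsed /api/tags response list the given model name?
function hasModel(tags, model) {
  const models = Array.isArray(tags && tags.models) ? tags.models : [];
  return models.some(m => m.name === model);
}

// Query /api/tags and resolve to { up, modelAvailable }. Never rejects:
// any network error, timeout, or bad response is reported as "down".
function checkOllama(host = OLLAMA_HOST, model = MODEL, timeoutMs = 3000) {
  return new Promise(resolve => {
    const down = { up: false, modelAvailable: false };
    const req = http.get(new URL('/api/tags', host), { timeout: timeoutMs }, res => {
      let data = '';
      res.setEncoding('utf8');
      res.on('data', chunk => { data += chunk; });
      res.on('end', () => {
        if (res.statusCode !== 200) return resolve(down);
        try {
          resolve({ up: true, modelAvailable: hasModel(JSON.parse(data), model) });
        } catch (e) {
          resolve(down);
        }
      });
    });
    // The 'timeout' event does not abort the request on its own.
    req.on('timeout', () => req.destroy());
    req.on('error', () => resolve(down));
  });
}

// Only touch the network when explicitly asked: node ollama-health.js --check
if (process.argv.includes('--check')) {
  checkOllama().then(status => {
    if (!status.up) console.error('Ollama is not responding at', OLLAMA_HOST);
    else if (!status.modelAvailable) console.error(`Model ${MODEL} is not available`);
    else console.log('Ollama OK');
    process.exitCode = status.up && status.modelAvailable ? 0 : 1;
  });
}
```

Run from cron or as a second VS Code task, the non-zero exit code gives you something to alert on, instead of noticing that completions feel dumber.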

When this makes sense

If you’re already running Ollama for something else in your homelab, wiring it into Codeium as a context preprocessor takes maybe thirty minutes and is worth it if you’d rather keep the summarization step on your own hardware. You’re not bypassing Codeium’s cloud, since completions still go to their servers, but the extra context is generated locally.

If you just want Codeium to work and don’t care about local-only inference, skip all this. Install the extension, enable it, move on. It works fine as-is.

The real question I haven’t answered for myself is whether I’ll still be running this setup in three months. It works, but maintenance friction accumulates. Ollama needs babysitting. The script needs updates when VS Code changes its extension API. Codeium’s free tier might add features that make the whole workaround pointless. I’ll keep it running, but with a sense that it’s temporary scaffolding, not architecture.
