Claude Context: Semantic Code Search for AI Coding Agents

Claude Context is an MCP plugin that gives Claude Code (and other AI agents) semantic search across your entire codebase, with a claimed ~40% token saving.


Updated 14 June 2026

By Rob · 14 June 2026 · 5 min read

One of the awkward realities of using Claude Code (or any AI coding agent) on a real-world codebase is that the agent only knows what you put in front of it. Show it the file you're editing and it does great. Ask it about a function in another package and you're either reading files yourself to figure out which ones to pin into context, or you're burning tokens letting the agent grep around blindly.

Claude Context is a Model Context Protocol (MCP) plugin from the Zilliz team that fixes this. Once installed, your codebase gets indexed into a vector database, and the agent can do natural-language semantic search across it - finding the relevant 200 lines instead of you having to feed it the relevant 20 files.

How it actually works

The architecture splits into three parts:

Core indexing engine - a TypeScript library that walks your codebase, chunks the source files, generates embeddings via your chosen provider (OpenAI, VoyageAI, Ollama, Gemini), and stores them in Milvus or Zilliz Cloud.
MCP server - exposes the index to AI agents that speak Model Context Protocol. Claude Code is the headline target; any MCP client works.
VSCode extension - lets you use the same index from inside your editor for human-driven semantic search.
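To make the "chunks the source files" step concrete, here is a minimal sketch of a fixed-size line chunker with overlap. It is an illustration only: the names (Chunk, chunk_lines) and the size and overlap defaults are assumptions, and the real indexing engine may well split files differently (for example along syntax boundaries).

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def chunk_lines(path: str, source: str, size: int = 40, overlap: int = 10) -> list[Chunk]:
    """Split source into windows of `size` lines that share `overlap` lines with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    lines = source.splitlines()
    chunks: list[Chunk] = []
    step = size - overlap  # how far each window advances
    for start in range(0, len(lines), step):
        window = lines[start:start + size]
        chunks.append(Chunk(path, start + 1, start + len(window), "\n".join(window)))
        if start + size >= len(lines):
            break  # this window already reached the end of the file
    return chunks
```

Each chunk keeps its file path and line range, which is what lets a search result point the agent at a specific span instead of a whole file. The overlap means a function that straddles a window boundary still appears intact in at least one chunk, provided it is shorter than the overlap.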

The retrieval is hybrid: BM25 for keyword precision plus dense-vector embeddings for semantic similarity. That combination tends to outperform either one alone on code search, because exact identifier matches (a function name, say) and similar-but-not-identical patterns both contribute to the result ranking.
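How the two result lists get merged is not something this review can confirm for Claude Context specifically. One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF), sketched below as an assumption. The chunk ids are hypothetical, and k = 60 is just the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings; each list adds 1 / (k + rank) to a chunk's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk ids returned by the two retrievers for one query.
bm25_hits = ["retry.ts:10", "webhook.ts:42", "queue.ts:7"]
dense_hits = ["webhook.ts:42", "backoff.ts:3", "retry.ts:10"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

A chunk that ranks well in both lists beats one that tops only a single list, which is the behaviour you want when an exact name match and a semantic match point at the same code.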

What the install looks like

You need credentials for two services before you start: one for the vector database (Zilliz Cloud is the SaaS option; self-hosted Milvus is the free path) and one for the embedding provider (OpenAI is the easy default; Ollama works if you want to keep embeddings on local hardware).

With those in hand, registering the MCP plugin against Claude Code is a one-liner:

claude mcp add claude-context \
  -e OPENAI_API_KEY=sk-... \
  -e MILVUS_ADDRESS=... \
  -e MILVUS_TOKEN=... \
  -- npx @zilliz/claude-context-mcp@latest

Then inside Claude Code, ask it to 'index this codebase' and it kicks off the chunking + embedding job. Indexing time depends on codebase size and your embedding provider's throughput - small repos take minutes, large monorepos an hour or more.

Once indexed, you query semantically: "show me how we handle webhook retries" or "where do we validate user-supplied URLs?" and the agent gets back the relevant snippets without you needing to know the file paths upfront.

Why ~40% token savings is realistic

The 40% saving figure comes from how AI coding agents handle code-discovery without semantic search. Without indexing, the typical pattern is: agent runs grep/find, parses the file list, opens 8-12 files looking for the relevant function, then reads 2-3 of them in full before finding what it actually needed. Each of those reads is context-window tokens you pay for.

With Claude Context's semantic search, the same task collapses to one query that returns the 200 relevant lines directly. On a medium-sized codebase (~100k lines of code) the difference shows up immediately on tasks like "refactor this function across all callers" or "add a feature that touches our auth + billing modules".
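The reasoning above is simple arithmetic, and it helps to separate the saving on discovery from the saving on the whole session. The sketch below does that under stated assumptions: every number in it is a made-up placeholder (only the "10 files skimmed, 3 read in full" shape comes from the pattern described above), so measure your own sessions before trusting any figure.

```python
def discovery_tokens_grep(listing_tokens: int, files_skimmed: int, tokens_per_skim: int,
                          files_read: int, tokens_per_file: int) -> int:
    """Tokens spent when the agent lists files, skims some, then reads a few in full."""
    return listing_tokens + files_skimmed * tokens_per_skim + files_read * tokens_per_file


def session_saving(discovery_before: int, discovery_after: int, other_tokens: int) -> float:
    """Fraction of a whole session's tokens saved when only the discovery cost changes."""
    total_before = discovery_before + other_tokens
    return (discovery_before - discovery_after) / total_before


# Hypothetical inputs, for illustration only.
before = discovery_tokens_grep(listing_tokens=500, files_skimmed=10, tokens_per_skim=400,
                               files_read=3, tokens_per_file=3000)
after = 200 * 12       # 200 returned lines at an assumed 12 tokens per line
other = 20_000         # assumed tokens for prompts, edits, and replies
saving = session_saving(before, after, other)
```

The design point is that the session-level saving is always smaller than the discovery-level saving, because tokens spent on prompts, edits, and replies do not shrink. Tasks dominated by code discovery benefit most.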

For a heavy Claude Code user the cumulative effect over a working month is material - particularly if you're on usage-based pricing rather than a fixed subscription.

Caveats and limitations

Things worth knowing before you commit setup time:

You need a vector database: either pay for Zilliz Cloud (free tier is generous for individual codebases; pricing scales with stored vectors) or run Milvus on a server. Not a one-click install if you're allergic to infrastructure.
Embedding API calls aren't free either, though they are cheap: indexing a 100k-line codebase usually costs well under a dollar on OpenAI's text-embedding-3-small, which has been priced at $0.02 per million tokens (check current pricing). Ongoing re-indexing on file changes adds more, in smaller increments. Self-hosted Ollama embeddings remove this cost entirely if you've got the GPU.
Supported file types: TypeScript, JavaScript, Python, Java, Go, Rust and several other mainstream languages are covered. Niche file types may need explicit inclusion rules.
Configuration matters: file-inclusion and exclusion rules need tuning for your project. Default settings work but you'll get better retrieval after a pass at the config.
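To illustrate what tuning inclusion and exclusion rules means in practice, here is a minimal sketch of a path filter. It is not Claude Context's configuration format: the function name and every pattern are hypothetical, and paths are assumed to be relative with forward slashes.

```python
from fnmatch import fnmatchcase


def should_index(path: str, include: list[str], exclude: list[str]) -> bool:
    """Keep a path if it matches at least one include pattern and no exclude pattern."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


# Hypothetical rules: index source files plus a niche type, skip vendored and built code.
INCLUDE = ["*.ts", "*.py", "*.proto"]
EXCLUDE = ["node_modules/*", "*/node_modules/*", "dist/*", "*.min.js"]

candidates = [
    "src/auth/login.ts",
    "node_modules/pkg/index.ts",
    "packages/web/node_modules/pkg/index.ts",
    "api/schema.proto",
    "README.md",
]
indexed = [p for p in candidates if should_index(p, INCLUDE, EXCLUDE)]
```

Exclusions are checked first so that vendored or generated code never reaches the index, even when its extension matches an include rule. Keeping that noise out is usually the biggest retrieval-quality win from a config pass.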

Should you bother?

If you use Claude Code (or any MCP-compatible AI agent) for more than light single-file edits, and you've ever found yourself manually pinning files into context to help the agent navigate, Claude Context is worth the afternoon of setup. The hybrid BM25 + dense-vector retrieval is the right architecture for code search, and the MCP integration means it slots in without disrupting your existing agent workflow.

With 11.8k GitHub stars and 28 tagged releases, this is one of the more mature MCP plugins in the ecosystem - well past the experimental stage. The Zilliz team has a clear commercial interest in keeping it polished (it drives adoption of their vector database), which usually means a longer-term maintenance commitment than hobbyist projects get.

For local-AI / self-hosting folks specifically: the Ollama embedding support plus self-hosted Milvus means you can run the whole stack without any data leaving your network. That's the configuration that matters most for anyone using AI agents on private codebases or regulated data.