# gnosis-mcp
> Self-hosted documentation search server for AI agents. Zero-config SQLite,
> hybrid FTS5 + vector search, optional cross-encoder reranking, git history
> indexing, web crawl, REST API. Python 3.11+. MIT licence.
gnosis-mcp indexes developer documentation (Markdown, plain text, Jupyter
notebooks, TOML, CSV, JSON, optional reStructuredText and PDF) into a local
database and exposes it through the Model Context Protocol. AI coding agents
(Claude Code, Cursor, Windsurf, Claude Desktop, VS Code, JetBrains, Cline)
search the private corpus instead of hallucinating API signatures or pasting
large files into context.
## What it is
- **An MCP server.** Implements the Model Context Protocol, the open standard
for connecting LLMs to tools and data.
- **Zero-config.** `pip install gnosis-mcp && gnosis-mcp ingest ./docs && gnosis-mcp serve`.
No Docker, no external services, no API keys required.
- **SQLite-first.** Default storage is SQLite with FTS5 full-text search.
Optionally swap to PostgreSQL with pgvector for scale.
- **Hybrid search.** Keyword (BM25) + semantic (local ONNX embeddings) merged
via Reciprocal Rank Fusion. Tunable via `GNOSIS_MCP_RRF_K`.
- **Optional reranking.** Opt-in `[reranking]` extra adds a 22M-parameter
ONNX cross-encoder to re-score top candidates.
- **Git history as searchable docs.** `gnosis-mcp ingest-git <repo>` turns
commit messages into searchable context.
- **Web crawl.** `gnosis-mcp crawl <url>` discovers pages via sitemap or
breadth-first search (BFS), honours robots.txt, restricts redirects to the
same host, and extracts content with trafilatura.
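
The hybrid search bullet above names Reciprocal Rank Fusion (RRF). As a rough illustration of the general technique, and not gnosis-mcp's actual code, the sketch below merges a keyword ranking and a semantic ranking by summing `1 / (k + rank)` per document. The function name, the sample file names, and the fallback value of 60 for `GNOSIS_MCP_RRF_K` are assumptions made for this example; 60 is the constant commonly used in the RRF literature, and the project's real default may differ.

```python
import os
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. A larger k flattens the gap between top and
    lower ranks. (Hypothetical helper, not part of gnosis-mcp.)
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs: what a BM25 query and an embedding query might return.
keyword_hits = ["auth.md", "tokens.md", "setup.md"]
semantic_hits = ["tokens.md", "faq.md", "auth.md"]

# Assumption: the environment variable holds the RRF constant k.
k = int(os.environ.get("GNOSIS_MCP_RRF_K", "60"))
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits], k=k)
print(fused)  # ['tokens.md', 'auth.md', 'faq.md', 'setup.md']
```

Because RRF uses only rank positions, BM25 scores and embedding similarities never need to be normalised onto a common scale before merging.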
## Measured numbers (v0.11.0, laptop CPU)
- Search speed (SQLite FTS5, in-memory, median of 3 runs):
  - 100 docs / 300 chunks: **9,463 QPS**, p95 **0.16 ms**
  - 1,000 docs / 3,000 chunks: 2,768 QPS, p95 0.72 ms
  - 5,000 docs / 15,000 chunks: 839 QPS, p95 2.97 ms
  - 10,000 docs / 30,000 chunks: 471 QPS, p95 5.60 ms
- End-to-end through MCP stdio protocol: **8.7 ms mean, 13.0 ms p95** per tool call.
- Retrieval qua