Privacy-first semantic search for scholars who work with serious text collections. Your books, your metadata, your machine.
RAG — retrieval-augmented generation — is how an AI reasons over your own texts: it retrieves passages and feeds them to the model. But most RAG is naive: everything goes in as chunks, everything comes out as chunks. A footnote weighs the same as an abstract; tag schemas, cross-references, tables of contents, chapter boundaries — invisible. Generic models stay generic precisely where you need them sharpest: in your own field. While the RAG market leader just reversed course and now compiles structure upstream of retrieval, Archilles uses what’s already there.
A scholar’s library carries two kinds of structure already. Outside each document: the tags, highlights, cross-references and reading notes that took a decade to settle. Inside each document: the table of contents, chapters and headings — the architecture of the text itself. Archilles indexes both — structure-aware chunking on the inside, your curation on the outside — and hands the model the edges naive retrieval cannot recover. The hardest part of intelligent search is intelligence. Scholars bring their own.
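A minimal sketch of what structure-aware chunking can mean. It assumes the text has already been extracted with Markdown-style headings. The `Chunk` record and `chunk_by_headings` function are hypothetical illustrations, not Archilles' actual code. Each chunk keeps the heading path it sits under, so a retrieved passage still knows its book and chapter instead of arriving as an anonymous slice of text.

```python
import re
from dataclasses import dataclass


# Hypothetical chunk record: the passage text plus the heading path above it.
@dataclass
class Chunk:
    heading_path: tuple[str, ...]
    text: str


# Matches Markdown-style headings such as "# Book I" or "## Chapter 2".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list[Chunk]:
    """Split text at headings, recording the chapter path of each chunk."""
    chunks: list[Chunk] = []
    path: list[str] = []
    buffer: list[str] = []

    def flush() -> None:
        # Close the section collected so far, skipping empty sections.
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk(tuple(path), body))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            buffer.append(line)
    flush()
    return chunks
```

For `"# Book I\nIntro.\n## Chapter 1\nFirst passage.\n## Chapter 2\nSecond passage."` this yields three chunks, the last one tagged `("Book I", "Chapter 2")`. A real pipeline would take headings from the EPUB or PDF table of contents rather than from Markdown markers.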
No cloud uploads, no telemetry, no tracking. Full sovereignty over research data that took you years to build.
Every result points to a precise location in your source. Hallucinations are for dreamers, not scholars.
Connect Archilles to Claude Desktop, ChatGPT*, Codex, or any MCP-compatible assistant. Both stdio and HTTP/SSE transports are supported.
Source adapters auto-detect your library layout. No migration, no proprietary format. Over thirty file types supported.
Search German, English, Latin, Greek and French together. BGE-M3 embeddings understand meaning across languages.
Custom fields, reading status, project tags, annotations. The organising work you already did becomes a research superpower.
* ChatGPT Desktop does not currently support custom MCP servers. The MCP guide documents a working bridge.
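Auto-detecting a library layout can be as simple as looking for the file each application keeps at the root of its data folder. The sketch below assumes that approach. The marker table and `detect_layout` are hypothetical and not Archilles' actual adapters. The markers themselves are the conventional ones: `metadata.db` in a Calibre library, `zotero.sqlite` in a Zotero data directory, and a `.obsidian` folder in an Obsidian vault.

```python
from pathlib import Path

# Hypothetical detection table: (library type, marker file or folder at its root).
MARKERS = [
    ("calibre", "metadata.db"),   # Calibre library database
    ("zotero", "zotero.sqlite"),  # Zotero data directory database
    ("obsidian", ".obsidian"),    # Obsidian vault configuration folder
]


def detect_layout(library_path: str) -> str:
    """Guess the library type, falling back to a plain folder of files."""
    root = Path(library_path)
    for kind, marker in MARKERS:
        if (root / marker).exists():
            return kind
    return "directory"
```

Falling back to `"directory"` means an unrecognised folder is still indexed as plain files rather than rejected.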
Set the library path to your Zotero library, Obsidian vault, Calibre folder, or any other directory.
Batch-index by tag, author, or the whole library. GPU-accelerated, resumable, crash-safe.
Hybrid semantic and keyword search, merged with reciprocal rank fusion (RRF) and optionally refined by cross-encoder reranking.
Results with page numbers and chapters. Export to BibTeX, RIS, EndNote, JSON, CSV.
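Reciprocal rank fusion merges result lists using only positions, so semantic similarity scores and keyword scores never need to share a scale. The following is a minimal sketch of standard RRF. It assumes each retriever returns passage IDs best-first. The function name and IDs are hypothetical. The optional cross-encoder step, which would re-score the fused top hits, is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of IDs: each ID scores sum(1 / (k + rank)), ranks from 1.

    k = 60 is a common default; larger k flattens the advantage of top ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With a semantic list `["p12", "p7", "p3"]` and a keyword list `["p7", "p40", "p12"]`, the fused order is p7, p12, p40, p3. p7 wins because it ranks near the top of both lists.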
Find all discussions of trade routes between the Mediterranean and Northern Europe before 1500.
Searches across Latin primary sources, German monographs, and English translations simultaneously.
Trace the motif of unreliable narrators across these fifty twentieth-century novels.
Finds passages that demonstrate the concept — even when the texts never name it.
Compare views on the hard problem of consciousness across Chalmers, Dennett and Nagel.
Precise name matching meets semantic understanding of the underlying concepts.
Find all precedents on liability for AI-generated decisions across my EU law collection.
Commentary, case law and regulatory texts searched in one pass. Custom fields act as filters.
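Custom fields act as filters by narrowing the candidate passages to those whose book metadata matches, before or alongside ranking. A minimal sketch, assuming each passage carries its location and its book's custom fields as a plain dictionary. `Passage`, `apply_filters` and field names such as `jurisdiction` are hypothetical, chosen to fit the EU-law example.

```python
from dataclasses import dataclass, field


# Hypothetical passage record: where the text sits, plus the book's custom fields.
@dataclass
class Passage:
    doc_id: str
    page: int
    chapter: str
    fields: dict[str, str] = field(default_factory=dict)


def apply_filters(passages: list[Passage], **wanted: str) -> list[Passage]:
    """Keep only passages whose custom fields match every requested value."""
    return [
        p for p in passages
        if all(p.fields.get(name) == value for name, value in wanted.items())
    ]
```

Calling `apply_filters(passages, jurisdiction="EU", source_type="case law")` keeps only EU case law. Each surviving passage still carries its page and chapter for citation. With no filters given, every passage passes.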
Archilles is open source and in active development. The code lives on GitHub. Field notes and longer-form thinking on Substack and LinkedIn.