Archilles · v0.9

Ask your library.
Get cited answers.

Privacy-first semantic search for scholars who work with serious text collections. Your books, your metadata, your machine.

Local · Open source · MIT · Python 3.11+
Archilles
Research library semantic search
MMXXVI
Zotero · Obsidian · Calibre · folders

Informed RAG.

Problem

Naive retrieval has no edges.

RAG — retrieval-augmented generation — is how an AI reasons over your own texts: it retrieves passages and feeds them to the model. But most RAG is naive: everything goes in as chunks, everything comes out as chunks. A footnote weighs the same as an abstract; tag schemas, cross-references, tables of contents, chapter boundaries — invisible. Generic models stay generic precisely where you need them sharpest: in your own field. While the RAG market leader just reversed course and now compiles structure upstream of retrieval, Archilles uses what’s already there.

Solution

Built on the structure that’s already there.

A scholar’s library carries two kinds of structure already. Outside each document: the tags, highlights, cross-references and reading notes that took a decade to settle. Inside each document: the table of contents, chapters and headings — the architecture of the text itself. Archilles indexes both — structure-aware chunking on the inside, your curation on the outside — and hands the model the edges naive retrieval cannot recover. The hardest part of intelligent search is intelligence. Scholars bring their own.
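To make "structure-aware chunking" concrete, here is a minimal sketch, not Archilles's actual implementation. It assumes the document has already been converted to text with Markdown-style `#` headings standing in for chapters and sections; `Chunk` and `chunk_by_headings` are hypothetical names used only for illustration. Each chunk keeps the heading path it sits under, so a retrieved passage still knows which chapter and section it came from.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading_path: tuple[str, ...]  # e.g. ("Chapter 1", "1.1 Sources")
    text: str


def chunk_by_headings(document: str) -> list[Chunk]:
    """Split on Markdown-style headings and keep each chunk's heading path."""
    chunks: list[Chunk] = []
    path: list[str] = []    # stack of headings above the current position
    buffer: list[str] = []  # body lines collected for the current section

    def flush() -> None:
        # Emit the buffered body (if any) under the current heading path.
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk(tuple(path), body))
        buffer.clear()

    for line in document.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            # Drop headings at the same or a deeper level, then push this one.
            del path[level - 1:]
            path.append(title)
        else:
            buffer.append(line)
    flush()
    return chunks
```

A naive chunker would return the same passages without `heading_path`; keeping it is what lets a result be cited by chapter and section rather than by character offset.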

What it does, precisely.

I · Local processing

Your books never leave your machine.

No cloud uploads, no telemetry, no tracking. Full sovereignty over research data that took you years to build.

II · Exact citations

Page, chapter, section — every time.

Every result points to a precise location in your source. Hallucinations are for dreamers, not scholars.

III · MCP native

Twelve tools, any MCP client.

Connect Archilles to Claude Desktop, ChatGPT*, Codex, or any MCP-compatible assistant. Both stdio and HTTP/SSE transports are shipped.

IV · Any source

Zotero, Obsidian, Calibre — or a folder.

Source adapters auto-detect your library layout. No migration, no proprietary format. Over thirty file types supported.

V · Multilingual

Seventy-five languages, one query.

Search German, English, Latin, Greek and French together. BGE-M3 embeddings understand meaning across languages.

VI · Metadata amplified

Tags, notes, highlights — all searchable.

Custom fields, reading status, project tags, annotations. The organising work you already did becomes a research superpower.

* ChatGPT Desktop does not currently support custom MCP servers. The MCP guide documents a working bridge.

Four steps from folder to citation.

01

Point

Set a library path to your Zotero, Obsidian vault, Calibre folder, or any directory.

02

Index

Batch-index by tag, author, or the whole library. GPU-accelerated, resumable, crash-safe.

03

Search

Hybrid semantic plus keyword search with RRF fusion and optional cross-encoder reranking.

04

Cite

Results with page numbers and chapters. Export to BibTeX, RIS, EndNote, JSON, CSV.
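The RRF fusion named in step 03 can be sketched in a few lines. This is the standard reciprocal rank fusion formula, not necessarily how Archilles wires it up: each document scores the sum of 1 / (k + rank) over every ranking it appears in. The function name `rrf_fuse`, the default `k = 60` (a commonly used constant) and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]  # hypothetical semantic-search ranking
keyword_hits = ["b", "d", "a"]   # hypothetical keyword-search ranking
fused = rrf_fuse([semantic_hits, keyword_hits])
```

Because RRF works on ranks rather than raw scores, the semantic and keyword scores never need to be normalised against each other. Documents found by both searches, here `a` and `b`, rise above those found by only one.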

Built for serious readers.

V.I

Historians

Find all discussions of trade routes between the Mediterranean and Northern Europe before 1500.

Searches across Latin primary sources, German monographs, and English translations simultaneously.

V.II

Literary Scholars

Trace the motif of unreliable narrators across these fifty twentieth-century novels.

Finds passages that demonstrate the concept — even when the texts never name it.

V.III

Philosophers

Compare views on the hard problem of consciousness across Chalmers, Dennett and Nagel.

Precise name matching meets semantic understanding of the underlying concepts.

V.IV

Legal Scholars

Find all precedents on liability for AI-generated decisions across my EU law collection.

Commentary, case law and regulatory texts searched in one pass. Custom fields act as filters.
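As a rough illustration of custom fields acting as filters, the sketch below narrows a set of search hits by metadata. The `Passage` class, the `filter_by_fields` helper and the field names are all hypothetical; the real Archilles data model may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    text: str
    score: float
    fields: dict[str, str] = field(default_factory=dict)  # custom metadata


def filter_by_fields(passages: list[Passage], **required: str) -> list[Passage]:
    """Keep only passages whose custom fields match every required value."""
    return [
        p for p in passages
        if all(p.fields.get(name) == value for name, value in required.items())
    ]


hits = [
    Passage("Liability of the deployer ...", 0.91, {"jurisdiction": "EU", "type": "case law"}),
    Passage("Strict liability doctrine ...", 0.88, {"jurisdiction": "US", "type": "commentary"}),
    Passage("Regulatory obligations ...", 0.80, {"jurisdiction": "EU", "type": "regulation"}),
]
eu_only = filter_by_fields(hits, jurisdiction="EU")
```

The filter preserves the incoming order, so a ranking computed beforehand survives the narrowing.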

Read seriously.
Search seriously.

Archilles is open source and in active development. The code lives on GitHub. Field notes and longer-form thinking appear on Substack and LinkedIn.

Open source · No tracking · MIT licensed