How indexing works

oxcode index runs one pipeline — scan → extract → resolve → store → format — that turns a source tree into a queryable code graph in the embedded oxgraph database. The resolver, graph schema, storage, and PageRank context are language-neutral; only the extractor changes per language.

Scan

Walk the project, respecting ignore rules, and select the files with a registered extractor. Recognized-but-unindexed files are reported as skipped, never silently dropped.
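
As a rough illustration of the scan step, the sketch below walks a directory, prunes ignored paths, and separates files that have an extractor from files that are recognized but not indexed. The `EXTRACTORS` and `RECOGNIZED` tables, the default ignore patterns, and the `scan` function are all hypothetical; oxcode's actual ignore handling and registry are not shown in this document.

```python
import fnmatch
import os

# Hypothetical tables, for illustration only.
EXTRACTORS = {".rs": "rust", ".py": "python"}   # extension -> extractor name
RECOGNIZED = {".rs", ".py", ".md", ".toml"}     # known file types, not all indexed


def scan(root, ignore_patterns=(".git", "target", "node_modules")):
    """Return (selected, skipped) lists of paths relative to root."""

    def ignored(name):
        return any(fnmatch.fnmatch(name, pattern) for pattern in ignore_patterns)

    selected, skipped = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not ignored(d))
        for name in sorted(filenames):
            if ignored(name):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            ext = os.path.splitext(name)[1]
            if ext in EXTRACTORS:
                selected.append(rel)
            elif ext in RECOGNIZED:
                # Recognized but no extractor: report it instead of dropping it.
                skipped.append(rel)
    return selected, skipped
```

Returning the skipped list alongside the selected one is what lets a caller report those files rather than lose them.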

Extract

tree-sitter parses each source file into a syntax tree, and a per-language extractor walks it — hand-written for the high-fidelity languages, or a shared tree-sitter query for the generic ones. Extraction emits symbol nodes (file, module, class, struct, trait, interface, function, method, field, …) and edges (contains, calls, imports, references, implements). Qualified names are normalized to a ::-joined internal form regardless of the language's own separator, so the resolver and graph stay language-neutral.
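
The name normalization can be pictured with a small sketch. It assumes a naive separator set (`::`, `.`, `\`, `#`) and a hypothetical `normalize_qualified_name` helper; a real extractor would apply language-specific rules rather than one shared regular expression.

```python
import re

# Assumed separator set, for illustration: '::', '.', '\' and '#'.
_SEPARATORS = re.compile(r"::|\.|\\|#")


def normalize_qualified_name(name: str) -> str:
    """Rewrite a language-native qualified name into the '::'-joined form."""
    segments = [segment for segment in _SEPARATORS.split(name) if segment]
    return "::".join(segments)
```

With this helper, `pkg.mod.Class.method` and `pkg::mod::Class::method` normalize to the same string, which is the property that lets later stages ignore the source language.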

Resolve

References resolve to definitions across files through ordered tiers: exact qualified name → enclosing module scope → in-scope imports → receiver type → bare name. Ambiguous matches are kept and marked, not dropped, so a best-effort edge is still navigable.
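
A minimal sketch of the tier order, assuming definitions are held as a set of `::`-joined names. The `Reference` record and `resolve` function are hypothetical and simplify each tier to a single lookup; the real resolver is likely richer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reference:
    name: str                                    # name as written at the use site
    module: str = ""                             # enclosing module, '::'-joined
    imports: dict = field(default_factory=dict)  # local alias -> qualified name
    receiver_type: Optional[str] = None          # type of the receiver, if known


def resolve(ref, definitions):
    """Return (targets, tier, ambiguous) from the first tier that matches."""
    head, _, rest = ref.name.partition("::")
    imported = None
    if head in ref.imports:
        imported = ref.imports[head] + ("::" + rest if rest else "")
    candidates = [
        ("exact", ref.name),
        ("module", f"{ref.module}::{ref.name}" if ref.module else None),
        ("import", imported),
        ("receiver", f"{ref.receiver_type}::{ref.name}" if ref.receiver_type else None),
    ]
    for tier, qualified in candidates:
        if qualified is not None and qualified in definitions:
            return [qualified], tier, False
    # Last tier: match on the final segment only. Several hits are all kept
    # and flagged as ambiguous instead of being dropped.
    last = ref.name.rsplit("::", 1)[-1]
    matches = sorted(d for d in definitions if d.rsplit("::", 1)[-1] == last)
    if not matches:
        return [], "unresolved", False
    return matches, "bare", len(matches) > 1
```

Returning the tier and the ambiguity flag with the targets means a consumer can still follow a best-effort edge while knowing how much to trust it.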

Store

The resolved graph is reconciled into oxgraph-db with stable symbol identities: unchanged symbols and edges keep their ids and emit zero mutations. That is what makes re-indexing O(change) rather than O(repo): the per-reindex write-ahead log shrinks from hundreds of megabytes to a few megabytes. See oxgraph Benchmarks for the numbers.
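
The reconciliation idea can be sketched as a diff between two maps keyed by a stable identity. The `stable_id` scheme (a hash of kind and qualified name) and the mutation tuples are assumptions made for illustration, not oxgraph-db's actual format.

```python
import hashlib


def stable_id(kind: str, qualified_name: str) -> str:
    """Hypothetical identity: depends on kind and name, not on file position."""
    return hashlib.sha256(f"{kind}\x00{qualified_name}".encode()).hexdigest()[:16]


def reconcile(stored: dict, fresh: dict) -> list:
    """Diff two id -> payload maps and return only the mutations required."""
    mutations = []
    for key, payload in fresh.items():
        if key not in stored:
            mutations.append(("insert", key, payload))
        elif stored[key] != payload:
            mutations.append(("update", key, payload))
        # An unchanged entry falls through and emits nothing.
    for key in stored:
        if key not in fresh:
            mutations.append(("delete", key, None))
    return mutations
```

This sketch still compares every entry, so it only demonstrates that the volume of writes tracks the size of the change, not the size of the repository.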

Format

Navigation commands expand graph ids back into agent-readable context — function names, definition ranges, signatures, docstrings, source previews, and call-site source — through the report DTOs in oxcode-model.

What PageRank does

Personalized PageRank over the stored graph ranks symbols by centrality. The context command (and the oxcode_explore MCP tool) seeds it on the query's entry points, then expands nearby calls, contains, references, and implements relationships within a byte budget. The result is bounded and relevance-ranked rather than a raw dump: deterministic and graph-derived, not a re-embedding on every call.
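
For readers unfamiliar with the technique, here is a small power-iteration sketch of personalized PageRank followed by a budgeted selection. The function names, the damping factor of 0.85, the fixed iteration count, and the stop-at-first-overflow budget rule are assumptions chosen for illustration; the document does not specify oxcode's parameters.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Rank nodes by random-walk mass, restarting only at the seed nodes."""
    seeds = set(seeds)
    nodes = sorted({n for edge in edges for n in edge} | seeds)
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    # Teleport distribution: uniform over seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for dst in out[n]:
                    nxt[dst] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * rank[n] * teleport[s]
        rank = nxt
    return rank


def select_within_budget(rank, sizes, budget):
    """Take symbols in rank order until the next one would exceed the budget."""
    chosen, used = [], 0
    for name in sorted(rank, key=lambda n: (-rank[n], n)):
        if used + sizes[name] > budget:
            break
        chosen.append(name)
        used += sizes[name]
    return chosen
```

Because the walk only restarts at the seeds, symbols unreachable from the query's entry points receive no rank, and sorting ties by name keeps the output deterministic.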

Re-indexing

Because identities are stable, you re-index by re-running oxcode index. The .oxcode/manifest.json content digest tells oxcode what actually changed, so only changed files are re-extracted and only changed subjects are written.
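
The change detection can be pictured as a comparison of two digest maps. The layout of .oxcode/manifest.json is not described here, so the sketch assumes a flat path-to-digest mapping with SHA-256 digests; `content_digest` and `changed_files` are hypothetical helpers.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Digest of a file's bytes; SHA-256 is an assumption, not a documented choice."""
    return hashlib.sha256(data).hexdigest()


def changed_files(previous: dict, current: dict):
    """Compare path -> digest maps; return (to_extract, removed) path lists."""
    to_extract = sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )
    removed = sorted(path for path in previous if path not in current)
    return to_extract, removed
```

Only the paths in the first list would need re-extraction; files whose digests match the previous run are left alone.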