How indexing works
oxcode index runs a single pipeline (scan → extract → resolve → store → format)
that turns a source tree into a queryable code graph in the embedded oxgraph
database. The resolver, graph schema, storage, and PageRank-based context
ranking are language-neutral; only the extractor changes per language.
Scan
oxcode walks the project, respecting ignore rules, and selects the files that have a registered extractor. Files it recognizes but does not index are reported as skipped, never silently dropped.
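A minimal sketch of the scan step, assuming a hypothetical extension-to-extractor registry and simplified glob-style ignore patterns (real ignore handling, such as full .gitignore semantics, is more involved). `EXTRACTORS`, `RECOGNIZED_ONLY`, `select_files`, and `scan` are illustrative names, not oxcode's API. The filtering is split from the filesystem walk so it can be exercised on a plain list of paths.

```python
import fnmatch
from dataclasses import dataclass, field
from pathlib import Path, PurePosixPath

# Hypothetical registry: file extension -> extractor name.
EXTRACTORS = {".rs": "rust", ".py": "python", ".go": "go"}
# Hypothetical: extensions that are recognized but have no extractor.
RECOGNIZED_ONLY = {".md": "markdown", ".toml": "toml"}


@dataclass
class ScanResult:
    indexed: list = field(default_factory=list)  # (relative path, extractor)
    skipped: list = field(default_factory=list)  # (relative path, reason)


def is_ignored(rel_path, patterns):
    # Simplified ignore check: a glob matches the whole path or any path component.
    parts = rel_path.split("/")
    return any(
        fnmatch.fnmatch(rel_path, pat) or any(fnmatch.fnmatch(part, pat) for part in parts)
        for pat in patterns
    )


def select_files(rel_paths, ignore_patterns):
    result = ScanResult()
    for rel in rel_paths:
        if is_ignored(rel, ignore_patterns):
            continue
        ext = PurePosixPath(rel).suffix
        if ext in EXTRACTORS:
            result.indexed.append((rel, EXTRACTORS[ext]))
        elif ext in RECOGNIZED_ONLY:
            # Recognized but not indexable: report it instead of dropping it silently.
            result.skipped.append((rel, f"no extractor for {RECOGNIZED_ONLY[ext]}"))
        # Any other file is not a known source type and is left out entirely.
    return result


def scan(root, ignore_patterns):
    # Walk the tree in a stable order and pass relative POSIX paths to select_files.
    root = Path(root)
    rel_paths = (
        p.relative_to(root).as_posix() for p in sorted(root.rglob("*")) if p.is_file()
    )
    return select_files(rel_paths, ignore_patterns)
```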
Extract
tree-sitter parses each source file into a syntax tree, and a per-language
extractor walks it: a hand-written extractor for the high-fidelity languages,
or a shared tree-sitter query for the generic ones. Extraction emits symbol
nodes (file, module, class, struct, trait,
interface, function, method, field, …) and edges (contains, calls,
imports, references, implements). Qualified names are normalized to a
::-joined internal form regardless of the language's own separator, so the
resolver and graph stay language-neutral.
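Qualified-name normalization can be sketched as "split on the native separator, rejoin with `::`". The separator table below is an assumption for illustration, and `Symbol`, `Edge`, and `normalize` are hypothetical names rather than oxcode's extractor types.

```python
from dataclasses import dataclass

# Hypothetical per-language separators; oxcode's real table may differ.
SEPARATORS = {"python": ".", "java": ".", "go": ".", "rust": "::", "php": "\\"}


@dataclass(frozen=True)
class Symbol:
    kind: str       # e.g. "module", "class", "method"
    qualified: str  # normalized "::"-joined name


@dataclass(frozen=True)
class Edge:
    kind: str       # "contains", "calls", "imports", "references", "implements"
    source: str     # qualified name of the source symbol
    target: str     # qualified name of the target symbol


def normalize(language, native_name):
    # Split on the language's own separator, drop empty segments (such as the
    # one produced by a PHP leading backslash), and rejoin with "::".
    sep = SEPARATORS[language]
    return "::".join(part for part in native_name.split(sep) if part)


# Example: a Python method nested in a class yields two symbols and a contains edge.
cls = Symbol("class", normalize("python", "pkg.mod.Client"))
meth = Symbol("method", normalize("python", "pkg.mod.Client.send"))
edge = Edge("contains", cls.qualified, meth.qualified)
```

Because every language ends up in the same `::` form, downstream code can compare and join names without knowing which extractor produced them.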
Resolve
References resolve to definitions across files through ordered tiers: exact qualified name → enclosing module scope → in-scope imports → receiver type → bare name. Ambiguous matches are kept and marked, not dropped, so a best-effort edge is still navigable.
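The ordered tiers can be sketched as a list of lookup functions tried in turn, where the first tier that finds anything wins. This is an illustrative model only: the `Scope` fields, the tier functions, and the `Resolution` record are hypothetical, and definitions are modeled as a plain set of normalized qualified names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    module: str                                  # enclosing module, e.g. "app::views"
    imports: dict = field(default_factory=dict)  # local alias -> qualified name
    receiver_type: Optional[str] = None          # receiver's type, if known


@dataclass
class Resolution:
    targets: list
    tier: str
    ambiguous: bool


def _exact(name, scope, defs):
    return [name] if name in defs else []


def _module(name, scope, defs):
    cand = f"{scope.module}::{name}"
    return [cand] if cand in defs else []


def _imports(name, scope, defs):
    head, *rest = name.split("::")
    if head not in scope.imports:
        return []
    cand = "::".join([scope.imports[head], *rest])
    return [cand] if cand in defs else []


def _receiver(name, scope, defs):
    if scope.receiver_type is None:
        return []
    cand = f"{scope.receiver_type}::{name.split('::')[-1]}"
    return [cand] if cand in defs else []


def _bare(name, scope, defs):
    last = name.split("::")[-1]
    return sorted(d for d in defs if d.split("::")[-1] == last)


# Tiers in the documented order; the first tier with any match wins.
TIERS = [("exact", _exact), ("module", _module), ("import", _imports),
         ("receiver", _receiver), ("bare", _bare)]


def resolve(name, scope, defs):
    for tier, lookup in TIERS:
        found = lookup(name, scope, defs)
        if found:
            # Keep every candidate and mark ambiguity instead of dropping the edge.
            return Resolution(found, tier, ambiguous=len(found) > 1)
    return None
```

Only the bare-name tier can realistically return several candidates here, which is where marking ambiguity matters most.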
Store
The resolved graph is reconciled into oxgraph-db with stable symbol
identities: unchanged symbols and edges keep their ids and emit zero
mutations. That is what makes re-indexing O(change) rather than O(repo): the
write-ahead log produced by a re-index shrinks from hundreds of megabytes to a
few megabytes. See oxgraph Benchmarks for the numbers.
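Reconciliation with stable identities can be sketched as a keyed diff. This assumes a hypothetical identity scheme (a hash of file path, normalized qualified name, and kind, so editing a body does not change the id) and models stored state as plain dicts instead of oxgraph-db; oxcode's real identity computation may differ.

```python
import hashlib


def stable_id(file, qualified, kind):
    # Hypothetical identity: hash *where* and *what* the symbol is, not its body,
    # so re-extracting an unchanged or lightly edited symbol yields the same id.
    key = f"{file}\0{qualified}\0{kind}".encode()
    return hashlib.sha256(key).hexdigest()[:16]


def reconcile(stored, fresh):
    """Diff stored vs freshly extracted {id: payload} maps into mutations.

    Entries whose id and payload are unchanged produce no mutation at all.
    """
    mutations = []
    for sid, payload in fresh.items():
        if sid not in stored:
            mutations.append(("insert", sid))
        elif stored[sid] != payload:
            mutations.append(("update", sid))
    for sid in stored:
        if sid not in fresh:
            mutations.append(("delete", sid))
    return mutations
```

Running `reconcile` on identical input returns an empty list, so the write volume tracks the size of the change rather than the size of the repository.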
Format
Navigation commands expand graph ids back into agent-readable context
(function names, definition ranges, signatures, docstrings, source previews,
and call-site source) through the report DTOs in oxcode-model.
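The formatting step can be pictured as looking up a report record per graph id and rendering it as text. `SymbolReport`, `render`, and `expand` are hypothetical stand-ins; the field names are illustrative and are not oxcode-model's actual DTO types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolReport:
    """Hypothetical report record; field names are illustrative only."""
    name: str
    file: str
    start_line: int
    end_line: int
    signature: str
    docstring: Optional[str] = None
    preview: Optional[str] = None


def render(report):
    # Header with the definition range, then signature, docstring, and preview.
    lines = [f"{report.name}  {report.file}:{report.start_line}-{report.end_line}",
             f"  {report.signature}"]
    if report.docstring:
        lines.append(f"  doc: {report.docstring}")
    if report.preview:
        lines.extend("  | " + line for line in report.preview.splitlines())
    return "\n".join(lines)


def expand(ids, reports_by_id):
    # Turn graph ids back into text, skipping ids that have no report.
    return "\n\n".join(render(reports_by_id[i]) for i in ids if i in reports_by_id)
```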
What PageRank does
Personalized PageRank over the stored graph ranks symbols by centrality
relative to a set of seed nodes. The context command (and the oxcode_explore
MCP tool) seeds it on the query's entry points, then expands nearby calls,
contains, references, and implements relationships within a byte budget. The
result is bounded and relevance-ranked rather than a raw dump. It is
deterministic and derived from the graph, not from re-embedding code on every
call.
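For intuition, here is a textbook power-iteration form of personalized PageRank followed by a greedy byte-budget cut. It is a sketch, not oxcode's implementation: the damping factor of 0.85, the iteration count, following edges only in their stored direction, and the `sizes` map of rendered bytes per symbol are all assumptions.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank over directed (src, dst) edges.

    Restart (teleport) probability goes only to the seed nodes, so scores measure
    centrality relative to the seeds rather than to the whole graph.
    """
    seeds = set(seeds)
    if not seeds:
        return {}
    nodes = set(seeds)
    out = {}
    for src, dst in edges:
        nodes.update((src, dst))
        out.setdefault(src, []).append(dst)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        dangling = 0.0
        for n in nodes:
            targets = out.get(n)
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                dangling += damping * rank[n]
        # Mass from nodes with no outgoing edges restarts at the seeds,
        # so the scores keep summing to 1.
        for n in nodes:
            nxt[n] += dangling * restart[n]
        rank = nxt
    return rank


def select_within_budget(rank, sizes, budget):
    """Greedily keep the highest-ranked symbols whose rendered size still fits."""
    chosen, used = [], 0
    for node in sorted(rank, key=lambda n: (-rank[n], n)):
        cost = sizes.get(node, 0)
        if rank[node] > 0 and used + cost <= budget:
            chosen.append(node)
            used += cost
    return chosen
```

Nodes unreachable from the entry points score zero, so in this sketch the budget is spent only on symbols connected to the query, and identical inputs always produce the same selection.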
Re-indexing
Because identities are stable, you re-index by re-running oxcode index. The
.oxcode/manifest.json content digest tells oxcode what actually changed, so
only changed files are re-extracted and only changed subjects are written.
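Digest-based change detection can be sketched as comparing a previous path-to-digest map against freshly hashed file contents. SHA-256 and the flat `{path: digest}` shape are assumptions for illustration; the actual layout of .oxcode/manifest.json may differ.

```python
import hashlib


def digest(content):
    # SHA-256 is an assumption; any stable content hash works for change detection.
    return hashlib.sha256(content).hexdigest()


def diff_manifest(previous, current_files):
    """Compare a previous {path: digest} manifest with current {path: bytes} contents."""
    current = {path: digest(data) for path, data in current_files.items()}
    return {
        "changed": sorted(p for p in current if p in previous and previous[p] != current[p]),
        "added": sorted(p for p in current if p not in previous),
        "removed": sorted(p for p in previous if p not in current),
        "manifest": current,  # persisted for the next run
    }
```

Under this model, only "changed" and "added" files go back through extraction, and symbols from "removed" files become deletions during reconciliation.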