Andrej Karpathy described the LLM Wiki idea: your second brain is a corpus that an LLM compiles into pages an LLM can read back. The compiler writes, the linter checks, the agent regenerates. Files, not RAG. The vault is the source of truth.
Translation: I have a folder of notes, and an AI helps me write, lint, and re-read them like a private Wikipedia. The notes themselves are the database. Asking the AI a question routes through the notes, not through a vector store.
I have been running a version of this on a real Obsidian vault for almost half a year.
What it looks like in practice
Three pieces, all local:
- A vault with a few thousand Markdown notes spread across PARA folders. Health monographs, project logs, daily notes, ideas. Standard Obsidian.
- A knowledge graph sitting next to the vault as a SQLite file. Subject, predicate, object triples (a sketch of the shape follows below). Around 1,400 of them at last count, across roughly 135 canonical predicates. Populated by the skills that operate on health and bloodwork notes. (The Starter zip ships the schema, not these triples. Those are mine.)
- Skills. A dozen small Markdown files that tell Claude Code what to do when I type a slash command. The slash command is the entry point. The skill loads, the agent reads the relevant parts of the vault, the agent writes or rewrites a note, a diary entry logs what changed.
The skills are the program. The vault is the database. Claude Code is the runtime.
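The store is nothing exotic. A minimal sketch of the shape, with table and column names as my assumptions here; the canonical schema is the one that ships in the Starter zip:

```python
import sqlite3

# Minimal sketch of the triple store. Table and column names are
# assumptions; the canonical schema ships in the Starter zip.
conn = sqlite3.connect("vault-kg.sqlite")
conn.executescript("""
CREATE TABLE IF NOT EXISTS triples (
    id        INTEGER PRIMARY KEY,
    subject   TEXT NOT NULL,
    predicate TEXT NOT NULL,  -- one of the ~135 canonical predicates
    object    TEXT NOT NULL,
    source    TEXT,           -- the note that produced the triple
    UNIQUE (subject, predicate, object)
);
""")
conn.commit()
```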
A concrete example
Last month I added a new compound to my health stack. The slash command was /health-review berberine. The skill walked the agent through nine phases: intake, contradiction check, citation chain, gap scan, methodology audit, master synthesis, assumption inventory, KG hygiene, output write.
The agent pulled dozens of papers from PubMed, cross-checked them against the existing compound monographs in the vault, found one contradiction in a recent review, flagged it, and wrote a long file at 2-areas/health/supplements/Berberine.md. It then added a handful of triples to the knowledge graph: berberine has_indication insulin_resistance, berberine synergy_with metformin, and so on.
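Those triples are ordinary rows. A sketch of the write against the assumed schema above; the skill's actual write path is its own code:

```python
import sqlite3

# The two triples named above, written roughly the way the KG
# hygiene phase records them. Schema per the earlier sketch.
conn = sqlite3.connect("vault-kg.sqlite")
conn.executemany(
    "INSERT OR IGNORE INTO triples (subject, predicate, object, source) "
    "VALUES (?, ?, ?, ?)",
    [
        ("berberine", "has_indication", "insulin_resistance",
         "2-areas/health/supplements/Berberine.md"),
        ("berberine", "synergy_with", "metformin",
         "2-areas/health/supplements/Berberine.md"),
    ],
)
conn.commit()
```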
The next slash command, /supplement-eval, asked: should I take this? The skill read the new file, the existing stack file, and the bloodwork PDFs, and answered: probably not at your current insulin sensitivity. With reasoning. With links to the source notes.
That round trip cost a few dollars in API spend and finished inside a coffee break. No human in the loop except the slash command and the read at the end.
What works
Vault as source of truth, not RAG. I tried RAG first. It missed cross-references between compounds because embedding similarity does not encode "interacts_with" or "antagonism_with"; explicit predicate triples did. Reading whole files into context plus a one-shot KG query beats vector retrieval for any task where I care about the reasoning chain.
The knowledge graph as a separate index works. The vault has the prose, the graph has the relations. Queries against the graph surface contradictions the agent missed on a single-file pass.
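A one-shot query is enough to catch the loudest class of contradiction: pairs the graph claims are both synergistic and antagonistic. Predicate names as above; this is a sketch, not the skill's actual hygiene pass:

```python
import sqlite3

# Surface compound pairs asserted as both synergy_with and
# antagonism_with. Runs against the assumed schema sketched earlier.
conn = sqlite3.connect("vault-kg.sqlite")
query = """
SELECT a.subject, a.object
FROM triples a
JOIN triples b
  ON a.subject = b.subject AND a.object = b.object
WHERE a.predicate = 'synergy_with'
  AND b.predicate = 'antagonism_with';
"""
for subject, obj in conn.execute(query):
    print(f"contradiction: {subject} vs {obj}")
```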
Skills as Markdown files work. Each one is between 100 and 400 lines. Version-controlled. Editable in the same Obsidian window where I read the output.
What broke
ChromaDB 1.x segfaulted on every write call from the MCP server. I downgraded to 0.6.3 and filed an issue upstream. Working since the downgrade.
The vault miner segfaults on directories with more than 40 immediate children. The fix is to chunk the inputs. I have not chunked the inputs. I work around it.
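For the record, the unbuilt fix is a few lines. run_miner below is a hypothetical stand-in for the miner's entry point:

```python
import os
from itertools import islice

def chunked_children(root: str, limit: int = 40):
    # Yield the immediate children of `root` in batches at or below
    # the observed segfault threshold of 40.
    entries = iter(sorted(os.listdir(root)))
    while batch := list(islice(entries, limit)):
        yield batch

# for batch in chunked_children("2-areas/health"):
#     run_miner(batch)  # run_miner: hypothetical miner entry point
```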
A stub-versus-monograph collision bug ate two evenings: an inbox note named the same as an existing supplement file silently overwrote the monograph because of macOS case-insensitive filesystem behavior. The fix was an explicit case-aware write check. The git log of that fix is the longest commit message in the repo.
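The check itself is small. A sketch, assuming every note write funnels through one helper:

```python
from pathlib import Path

def safe_write(path: Path, text: str) -> None:
    # On macOS's default case-insensitive filesystem, a write to
    # "berberine.md" resolves to an existing "Berberine.md" and
    # silently replaces it. Refuse unless the on-disk name matches
    # the requested name exactly.
    for sibling in path.parent.iterdir():
        if sibling.name.lower() == path.name.lower() and sibling.name != path.name:
            raise FileExistsError(
                f"case collision: {sibling.name} exists, refusing {path.name}"
            )
    path.write_text(text, encoding="utf-8")
```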
About once a week, a skill returns garbage output because a frontmatter field changed shape and nothing checks it. The fix is a JSON schema validator on every skill output. I have not built that yet. It's on the list.
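When I do build it, it is maybe twenty lines of jsonschema. A sketch, with hypothetical field names; the real required fields would come from whatever each skill actually emits:

```python
import yaml                # PyYAML
from jsonschema import validate, ValidationError

# Hypothetical frontmatter schema. Field names are placeholders.
MONOGRAPH_SCHEMA = {
    "type": "object",
    "required": ["title", "tags", "status"],
    "properties": {
        "title":  {"type": "string"},
        "tags":   {"type": "array", "items": {"type": "string"}},
        "status": {"enum": ["stub", "draft", "monograph"]},
    },
}

def check_frontmatter(note_text: str) -> None:
    # Assumes the note opens with a standard ----fenced YAML block.
    _, fm, _ = note_text.split("---", 2)
    try:
        validate(instance=yaml.safe_load(fm), schema=MONOGRAPH_SCHEMA)
    except ValidationError as e:
        raise SystemExit(f"frontmatter check failed: {e.message}")
```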
Issue 02
Next week: the autoresearch loop. Same pattern, but the agent picks the next thing to research without me asking. It stops when the metric stops improving. Or when the API budget runs out.
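Roughly this shape. pick_next_topic and run_research do not exist yet, and patience is my stand-in for "stops improving":

```python
from typing import Callable, Tuple

# Rough sketch of the autoresearch loop. Both callables are
# hypothetical; the only committed stopping conditions are a metric
# plateau and the API budget.
def autoresearch(
    pick_next_topic: Callable[[], str],
    run_research: Callable[[str], Tuple[float, float]],  # -> (cost_usd, metric)
    budget_usd: float,
    patience: int = 3,
) -> None:
    best, stalls, spent = float("-inf"), 0, 0.0
    while spent < budget_usd and stalls < patience:
        cost, metric = run_research(pick_next_topic())
        spent += cost
        if metric > best:
            best, stalls = metric, 0  # still improving
        else:
            stalls += 1               # plateau counts toward stopping
```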
If you want to run a version of this on your own vault, the Starter zip on Gumroad is the same skills, the same hooks, the same KG schema. Download it, drop it into your own vault, open it in Claude Code, and hit the slash commands.