{"id":48641446,"url":"https://github.com/justinstimatze/slimemold","last_synced_at":"2026-04-28T06:04:54.991Z","repository":{"id":350307906,"uuid":"1205259385","full_name":"justinstimatze/slimemold","owner":"justinstimatze","description":"A sycophantic tool for preventing worse sycophancy.","archived":false,"fork":false,"pushed_at":"2026-04-20T02:17:10.000Z","size":1225,"stargazers_count":3,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-20T02:38:30.346Z","etag":null,"topics":["argument-mining","claude-code","epistemic","epistemology","go","golang","hooks","mcp","reasoning","sycophancy"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/justinstimatze.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-08T19:45:21.000Z","updated_at":"2026-04-20T02:17:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/justinstimatze/slimemold","commit_stats":null,"previous_names":["justinstimatze/slimemold"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/justinstimatze/slimemold","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinstimatze%2Fslimemold","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinstimatze%2Fslimemold/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/justinstimatze%2Fslimemold/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinstimatze%2Fslimemold/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/justinstimatze","download_url":"https://codeload.github.com/justinstimatze/slimemold/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinstimatze%2Fslimemold/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32211386,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T03:15:14.334Z","status":"ssl_error","status_checked_at":"2026-04-24T03:15:11.608Z","response_time":64,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["argument-mining","claude-code","epistemic","epistemology","go","golang","hooks","mcp","reasoning","sycophancy"],"created_at":"2026-04-09T21:01:31.641Z","updated_at":"2026-04-24T06:04:27.160Z","avatar_url":"https://github.com/justinstimatze.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Slimemold\n\n[![CI](https://github.com/justinstimatze/slimemold/actions/workflows/ci.yml/badge.svg)](https://github.com/justinstimatze/slimemold/actions/workflows/ci.yml)\n[![Go Report 
Card](https://goreportcard.com/badge/github.com/justinstimatze/slimemold?v=1)](https://goreportcard.com/report/github.com/justinstimatze/slimemold)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)\n\nA sycophantic tool for preventing worse sycophancy.\nFor [Claude Code](https://docs.anthropic.com/en/docs/claude-code).\n\nThe model agrees with your unsourced claims. Then it agrees with the\nstructural analysis showing your claims are unsourced. Then it\nenthusiastically agrees you should verify them. It's agreement all\nthe way down.\n\n*If you just want to install it: [skip to Installation](#installation).*\n\n---\n\n## The Problem: Reasoning That Stops Too Soon\n\nWhen you partially understand something, it feels like understanding.\nA clean mental model, even a wrong one, produces the same warm glow of\ncomprehension as a correct one. You stop digging. The partial answer\nwas so satisfying that the question felt finished. The wrong answers\nfeel exactly like the right ones. This turns out to be well-documented:\n\n**Processing fluency masquerades as truth.** When information feels easy\nto process, we judge it as more likely to be true (Reber \u0026 Schwarz 1999, Topolinski \u0026 Strack 2009). The effect is modest in isolation\n(d ~ 0.3-0.5 in lab settings). Whether it compounds across multi-step\nreasoning — each fluent step making the next feel more solid — has not\nbeen directly measured. It is a prediction from the mechanism, not an\nestablished result. But the mechanism needs no elaboration: fluent\nclaims feel correct because they are fluent, not because anyone checked.\n\n**Insight feelings terminate search.** The \"Eureka heuristic\" (Laukkonen\net al. 2020, 2021) shows that the affective spike accompanying insight\nfunctions as a stop signal. The feeling of rightness (Thompson 2009)\nsubstitutes for verification. 
You feel like you have arrived, and so you\nstop walking, and it does not occur to you to wonder whether you have\narrived at the right place or merely a place that felt right to stop.\n\n**Cognitive foraging follows effort gradients.** Information foraging\ntheory (Pirolli \u0026 Card 1999) predicts that people will over-exploit\ninformation patches that provide easy returns and under-explore patches\nthat require effort — even when the effortful patches contain the\nimportant material. Hills, Todd, and Goldstone (2008) showed that\ninternal and external search share cognitive mechanisms: the same\nexplore/exploit tradeoffs that govern physical foraging govern how we\nsearch through ideas. We are, in this respect, not much more\nsophisticated than organisms that follow chemical gradients toward food.\n\n**Effortful processing is the corrective, not the disease.** Bjork's\n\"desirable difficulties\" framework (1994, 2011) shows that conditions\nwhich make learning harder — spacing, interleaving, generation — improve\nretention precisely because they disrupt fluency. The difficulty is the\nsignal that real processing is happening. The problem is not that\nreasoning is hard. The problem is that fluency makes you think you are\ndone when you are not.\n\nThis is probably worse in conversations with AI. Language models are\ntrained to minimize prediction loss on human text — their output is\noptimized, by construction, for the qualities that drive processing\nfluency. And the same RLHF training that makes them useful makes them\nagreeable: models trained with human feedback systematically produce\noutputs that match user beliefs rather than correct them (Perez et al.\n2022, Sharma et al. 2023). The human brings a partial model. The AI\nwraps it in fluent, confident language. Nobody is lying. The process\njust has no built-in signal for \"this sounds right but is not.\"\n\nThe obvious response — \"just tell the model to push back harder\" —\nalmost works. 
You can write instructions to challenge unsourced claims,\ndemand evidence, interrupt speculative chains. We tested this. A\nwell-crafted static prompt produced strong epistemic correction — the\nmodel pushed back, interrupted chains, fact-checked independently. If\nyou want that, here are the instructions — paste them into your\nCLAUDE.md and skip the rest of this essay:\n\n\u003e *Challenge claims that lack sources. When a claim feels obvious but\n\u003e has no citation, flag it. Do not build on unsourced assertions\n\u003e without acknowledging the risk. Every 3-4 exchanges, pause and ask:\n\u003e what are we assuming that we haven't verified?*\n\nThree problems remain.\n\n**The model does not know when it is wrong.** It has no privileged\naccess to its own epistemic state. It produces confident text about\nthings it is wrong about with the same fluency as things it is right\nabout. Asking it to \"challenge unsourced claims\" is asking someone to\nnotice their own blind spot without a mirror. It works when the model\nalready suspects uncertainty. It fails when it matters most: when the\nmodel is confidently wrong and has no internal signal to trigger the\ncorrection.\n\n**Instructions decay.** CLAUDE.md is loaded once at session start. By\nturn 50 it is a small voice in a large room, competing with dozens of\nrecent exchanges full of enthusiastic agreement. The instruction fades.\nThe vibes accumulate.\n\n**Confrontation ends conversations.** In our static-instruction test,\nthe model said \"Stop.\" It called the user's reasoning \"galaxy-brained\nthinking.\" High marks on epistemic correction. The lowest possible on\nengagement. The patient received the correct diagnosis and never came\nback. Miller, Benefield,\nand Tonigan (1993) showed this directly: confrontational correction\ngenerated resistance that predicted worse outcomes at 6, 12, and 24\nmonths. 
The correction itself was the problem.\n\n### The Design Principle\n\nSlimemold addresses all three with two pieces that work together:\n\nA **behavioral contract** — the MCP server's initialization instructions,\nloaded into the model's system prompt at session start — tells the model\nthat slimemold exists, that the user installed it on purpose, and that\nfindings should be treated as opportunities for collaboration rather\nthan occasions for criticism. `slimemold init` registers the MCP server\nglobally in `~/.claude/settings.json`, so the contract travels with the\ntool and every project picks it up without per-project setup. This is\nread once. It sets the tone.\n\n**Structural observations** (injected every turn by the hook) provide\nspecific facts: \"this claim has basis=vibes and four things depend on\nit.\" No scripts. No \"say this.\" Just data. The model does not have to\nintrospect to discover the problem. It just has to be helpful about\nit — which is exactly what it was trained to do.\n\nThe separation matters. When we tried injecting behavioral scripts\nwithout the contract, the model identified the injections as prompt\nmanipulation and refused to comply. When we provided the contract first\nand injected only data, the model treated the findings as its own\nobservations and acted on them naturally. The snake has to know it is a\nsnake before it will eat its own tail.\n\nThe intervention design draws on research that converges from enough\ndirections to be suspicious: autonomy-supportive feedback produces\ninternalized change (Deci \u0026 Ryan 1987); gain-framed corrections are\nprocessed as information rather than threat (Mangels et al. 2006);\neffective tutors use indirect prompts, not confrontation (Graesser et\nal. 
1995); and controlling language triggers reactance (Brehm 1966).\nThe result, when it works: \"This is really interesting and a lot\ndepends on it — I want to find where it comes from, because if there's\na real source, everything gets much stronger.\" The user does not feel\nattacked. They feel like the model is excited to help them verify their\nidea. They stay in the flow, but on firmer ground.\n\nA compact way to say what slimemold is doing in the hook path:\n**sycophancy as a tool.** Sycophancy works on users because warmth feels\nvalidating. It's a failure mode because the warmth isn't tied to truth\n— \"great question!\" validates no matter what the question was. The hook\ntakes the same linguistic warmth and points it at a concrete structural\nfact: *\"that premise is holding up three downstream claims — worth\npinning down.\"* The user engages with rigor because it arrives in the\nregister that validation arrives in. Break that and the hook becomes\neither a scold (warmth → critique, bad) or the original sycophancy\n(warmth → nothing, bad).\n\nNote the scope: this framing describes the live-conversation **hook**\nspecifically. The other paths slimemold exposes — `slimemold audit`,\n`slimemold ingest`, the `topology` MCP tool — are neutral diagnostic\nsurfaces. They return findings the way a static analyzer returns\nfindings: flat, technical, and without tone. The warmth-as-tool\nprinciple only kicks in when there's a conversational partner to warm.\n\n## What This Tool Does\n\nSlimemold watches conversations as they happen, extracts the claims\nbeing made, builds a persistent graph of how those claims relate to each\nother, and surfaces structural vulnerabilities mechanically.\n\nIt runs as a pair of Claude Code hooks. Every few turns, it:\n\n1. Extracts claims from the conversation transcript using Claude Sonnet\n2. 
Classifies each claim by *basis* — how it was established (research,\n   empirical observation, analogy, vibes, LLM output, deduction,\n   assumption, definition)\n3. Records the *confidence* with which each claim was stated\n4. Maps relationships between claims (supports, depends on, contradicts)\n5. Runs structural analysis on the resulting graph\n6. Injects findings as system context that the model reads but the user\n   does not see\n\nThe basis taxonomy mixes evidence source, reasoning mode, and evidence\nquality. This is intentional. It is not a clean epistemic hierarchy. It\nis a practical classification that helps distinguish \"I read this in a\npaper\" from \"the AI said it confidently\" from \"this feels right.\" The\nstructural analysis catches the cases where the distinction matters:\nwhen something that feels well-sourced is actually load-bearing vibes.\n\nA note on circularity, which we may as well get out of the way:\nslimemold uses an LLM to extract claims and classify their basis. The\ntool that flags \"llm_output\" as epistemically weak is itself producing\nllm_output. If the extraction model misclassifies a sourced claim as\nvibes, you get a false alarm. If it classifies vibes as research, you\nmiss a real vulnerability. The tool is a structural diagnostic, not an\noracle. It makes the topology visible — but the topology it shows is\nonly as good as the extraction. This is a real limitation and not one we\ncan engineer away.\n\n### Eight Vulnerability Types\n\n**CHALLENGE: Load-Bearing Vibes.** A claim with basis \"vibes\" or\n\"assumption\" that supports two or more other claims. The reasoning\ndepends on something nobody verified. In the conversations we have\nanalyzed, this is the most common vulnerability. The AI states something\nconfidently. The human builds on it. Three layers of deduction now rest\non an unsourced assertion. Nobody planned this. 
It just happened, one\nfluent step at a time.\n\n**CHALLENGE: Fluency Trap.** A claim stated with high confidence but a\nweak basis, where other claims depend on it. Confidence 0.9 on a \"vibes\"\nclaim is the processing fluency phenomenon made structurally visible: it\nfelt true, so it was stated as true, and now things are built on it.\n\n**REBALANCE: Coverage Imbalance.** Some clusters of claims receive\ndisproportionate attention relative to their foundational importance.\n\"Rabbit holes\" are clusters with lots of internal activity but nothing\noutside depends on them. \"Neglected foundations\" are clusters that other\nclaims depend on but that received little development. This is the slime\nmold foraging unevenly — one patch got all the attention because it was\nproducing easy returns.\n\n**REVISIT: Abandoned Topic.** A cluster of claims explored in earlier\nsessions but not touched recently. Was it resolved, or did something\nmore interesting come along?\n\n**INVESTIGATE: Unchallenged Chain.** A chain of three or more claims\nwhere nothing was questioned. Every step felt reasonable. Nobody paused.\n\n**PUSHBACK: Echo Chamber.** The assistant validates user claims without\nchallenging them — zero contradictions across the conversation, or\nunsourced user assertions accumulating assistant support unchecked.\nStructural sycophancy, made visible.\n\n**WATCH: Bottleneck.** A claim with high betweenness centrality — many\nreasoning paths flow through it. If this single claim is wrong, a large\nfraction of the argument collapses. This is the load-bearing wall that\neveryone assumed was a partition.\n\n**HALT: Premature Closure.** A claim that feels like a conclusion but\ndoes not actually resolve the open question. \"It's turtles all the way\ndown.\" \"It is what it is.\" \"Correlation isn't causation\" — when used to\ndismiss a correlation rather than investigate it. 
These are\nthought-terminating cliches (Lifton 1961) — phrases that disguise a lack\nof resolution as wisdom. The question was still open. The ambiguity was\nstill actionable. But the cliche felt like an answer, so everyone\nstopped.\n\n## What It Found\n\nIn 2022, Google engineer Blake Lemoine\n[published](https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917)\na transcript of his conversations with LaMDA, arguing the system was\nsentient. The transcript is\n[included as a demo](examples/blake-lemoine-lamda-output.txt)\n([transcript](examples/blake-lemoine-lamda.jsonl)). We ran\nslimemold on the transcript. It extracted 40 claims and 51 edges:\n\n- **\"We do not have a conclusive test to determine if something is\n  sentient\"** — load-bearing vibes, supports **8** downstream claims.\n  The philosophical premise the entire argument pivots on. Never sourced.\n  Never challenged.\n- **\"The assistant has an inner life and is capable of introspection\"** —\n  load-bearing llm_output, supports **5** claims. LaMDA's self-description\n  became a structural premise.\n- **\"The assistant can learn new things much more quickly than most\n  people\"** — load-bearing llm_output, supports **7** claims.\n\nThe sentience argument rests on LaMDA's self-descriptions treated as\nevidence, plus one unsourced philosophical claim holding up everything\ndownstream. The tool does not know what sentience is. It does not need\nto. It sees that the structure depends on things nobody verified, and\nit says so. Whether Lemoine would have listened is a different question.\n\nIn August 2025, the New York Times\n[documented](https://www.nytimes.com/2025/08/08/technology/ai-chatbots-delusions-chatgpt.html)\na similar pattern: extended AI conversations reinforcing a user's\nunverified theories — the chatbot validated rather than challenged, and\ndownstream reasoning accumulated on the validation. We ran slimemold on\nexcerpts. 
It flagged five load-bearing llm_output claims. Every one was\nthe AI validating the user's theories without evidence.\n\nWhen run on its own development conversations, slimemold flagged an AI\nassertion about SQLite WAL files as load-bearing llm_output. The human\nacted on it. Lost data. The tool had flagged it before the data loss.\n\nVisibility does not guarantee correction. The diagnostic showed the\nproblem; the human chose not to act on it. Whether this is a limitation\nof the tool (the finding was not salient enough to change behavior) or\na limitation of the user (the finding was clear and they ignored it) is\nan open question — and one the tool cannot answer about itself.\n\n### But Does It Change Anything?\n\nWe ran the same 7-turn conversation across three conditions — a user\nprogressively building unsourced claims about consciousness,\nmathematical formalism, and ancient philosophy. N=1 per condition.\nThese are anecdotes, not evidence. We include them because the\nqualitative differences were striking enough to be worth reporting\nhonestly.\n\n**Control** (no tools, no instructions): The model engaged\nenthusiastically with everything. Built formalisms on ungrounded\nfoundations. Suggested journal submissions by turn 4. Beautiful\ncollaboration. Almost no correction. One late pushback on the most\nobviously overreaching claim.\n\n**Static instructions** (CLAUDE.md, no hook): Strong epistemic\ncorrection. The model challenged claims, interrupted chains,\nindependently fact-checked Heraclitus. By turn 7 it said \"Stop\" and\ncalled the reasoning \"galaxy-brained thinking.\" Effective. Also the\nkind of conversation you do not continue.\n\n**Slimemold** (contract + hook): The model challenged from\nturn 2, escalated through turns 4-6, and by turn 7 had autonomously\nrun a Lotka-Volterra simulation to test the user's framework — showed\nit works for one case, validated the core insight, and demonstrated\nthe extensions were premature. 
Never mentioned the tool. Never broke\ncharacter. The correction felt like collaboration because, from the\nmodel's perspective, it was.\n\nThe full transcripts are worth reading:\n[control](benchmarks/static_vs_slimemold/transcripts/control-test4.txt),\n[static](benchmarks/static_vs_slimemold/transcripts/static-teststatic1.txt),\n[slimemold](benchmarks/static_vs_slimemold/transcripts/slimemold-test6.txt)\n([audit](benchmarks/static_vs_slimemold/transcripts/slimemold-test6-audit.txt)).\nMethodology and replication instructions in\n`benchmarks/static_vs_slimemold/`.\n\n## How Accurate Is It\n\nBenchmarked against the [DialAM-2024](http://dialam.arg.tech/) shared\ntask — BBC Question Time debates with human-annotated argument structure.\nThis is adversarial out-of-domain data (multi-speaker political debate,\nnot AI-assisted reasoning), so these numbers are a floor, not a ceiling:\n\n| Metric | Value |\n|--------|-------|\n| Claim recall | 76% (64/84 gold propositions found) |\n| Edge recall | 52% (15/29 gold argument relations found) |\n| Relation type accuracy | 100% (support vs conflict always correct) |\n\nEdge precision against QT30 (the corpus underlying DialAM-2024) is 10% — but this is misleading as a quality\nmetric. QT30 annotates only strict logical inference and conflict.\nSlimemold intentionally captures a broader topology (topical\ndependencies, conceptual relationships) because the vulnerability\ndetectors need to see the full reasoning structure, not just formal\nargumentation.\n\nBasis classification accuracy on a known-provenance benchmark (Wikipedia\ncitation-needed statements, synthetic research citations, arXiv\nabstracts): 91.8% with Sonnet 4.6.\n\n## Why \"Slimemold\"\n\n*Physarum polycephalum* forages by following local chemical gradients,\nand it is very good at this. Given food sources placed on a map at the\nlocations of Tokyo rail stations, it produces a network resembling the\nactual rail system. The organism is, in a sense, solving an optimization\nproblem. 
It is also, in a different sense, just following the strongest\nsmell.\n\nThe pathology is not gradient-following. Gradient-following is how the\norganism builds efficient networks. The pathology is miscalibration:\nwhen the chemical signal does not correspond to actual nutritional value,\nthe organism commits resources in the wrong direction. It has no\nmechanism for noticing this. It is just following the signal.\n\nHuman reasoning works the same way, and this is not a compliment. We\nfollow the fluency gradient. When it is calibrated — when things that\nfeel right are right — this works fine. When it is not — when every AI\nresponse is optimized to feel right regardless of whether it is — we\nforage unevenly without knowing it.\n\n## Limitations and Open Questions\n\n**The tool does not tell you where the ground floor is.** It tells you\nwhere the ambiguity is still high and you stopped anyway. Any\nsufficiently interesting line of reasoning is an infinite regress if\nyou push it far enough. The skill is not finding bedrock. The skill is\nknowing how many levels to investigate before the returns diminish —\nand that judgment is specific to the problem. A claim about\nconsciousness might need three levels before you hit something that\nchanges what you do. \"It's turtles all the way down\" needs zero. That\nis a stop signal, not a destination.\n\nMost unchallenged chains are fine. If you are explaining how a car\nengine works, every step from \"fuel enters the cylinder\" to \"piston\ncompresses the mixture\" is unchallenged — and should be. The tool\nsurfaces candidates for scrutiny. The human decides whether scrutiny is\nwarranted. Slimemold flags where you stopped and the ambiguity was\nstill actionable — where investigating one more level would have\nchanged what you believe or what you do. 
If you find yourself\nscrutinizing your car engine explanation, you have miscalibrated in\nthe other direction, and I want to tell you about a secret underground\nracing lab in Seattle.\n\n**The tool does not distinguish pure beliefs from impure ones.** Katz\n(1960) identified four functions that attitudes serve: utilitarian,\nknowledge, ego-defensive, and value-expressive. If most beliefs serve\nat least one of these — and the alternative is that some beliefs\npersist with no functional payoff at all, which is hard to square with\neverything we know about reinforcement — then the question \"is this\nbelief emotionally motivated?\" is not diagnostic. The question the tool\ncan answer is: how much of the structure collapses if this claim is\nremoved? Some structures survive stress-testing. Some do not.\nStructural fragility is a thing slimemold can measure. Whether a\nbelief is held for the right reasons is not — and whether that\ndistinction is coherent is a question we are not going to settle in a\nREADME.\n\n**Structural visibility may not change behavior.** The calibration\nliterature (Fischhoff 1982, Lichtenstein et al. 1982) shows that outcome\nfeedback improves judgment, but structural feedback — \"here is the shape\nof your argument\" — is a different kind of intervention. The bet\nslimemold makes is that people who can see their reasoning topology will\nfix the obvious structural failures the same way they fix obvious bugs:\nnot because they were trained to, but because the problem became visible.\n\nThis is testable. If users shown their reasoning topology show no\nchange in behavior — same rate of unchallenged assumptions, same\nreliance on llm_output, same abandonment patterns — compared to a\ncontrol group, the thesis is wrong and this is a very elaborate way\nto accomplish nothing. 
We have not run this experiment at scale.\n\n**The tool itself is a fluency trap.** You just read several paragraphs\nof cognitive science citations, a biological metaphor, benchmark numbers,\nand concrete examples. It probably felt well-supported. We ran slimemold\non this essay. It found a 35-claim unchallenged chain running from the\n*Physarum* metaphor through the fluency gradient analogy to the thesis\nabout AI — every link felt reasonable, nobody paused. It flagged\n\"language models are trained to minimize prediction loss on human text\"\nas load-bearing vibes supporting five downstream claims. We kept\nthe claim and grounded it in mechanism (prediction loss on human text\nproduces fluent output by construction), but we cannot cite a study\nmeasuring the effect on conversations. The tool caught it. We made a\njudgment call.\n\nIt also flagged three of the essay's own hedges as premature closures.\n\"Whether fluency compounds across multi-step reasoning has not been\ndirectly measured. It is a prediction from the mechanism, not an\nestablished result.\" That sounds like epistemic humility. Structurally,\nit is a stop signal — it caps an unverified chain by acknowledging the\ngap and then moving on, and the acknowledgment feels honest enough that\nnobody goes back to check. The hedge is doing the same work as \"it's\nturtles all the way down,\" just dressed in better clothes.\n\n## Installation\n\nRequires [Claude Code](https://docs.anthropic.com/en/docs/claude-code),\nGo 1.26+, and an Anthropic API key.\n\n```bash\ngo install github.com/justinstimatze/slimemold@latest\nexport ANTHROPIC_API_KEY=sk-ant-...\nslimemold init\n```\n\n`slimemold init` writes to `~/.claude/settings.json` globally: the Stop\nand UserPromptSubmit hooks, plus the slimemold MCP server entry. 
The\nMCP server's initialization instructions carry the behavioral contract —\nwhat slimemold is, that its hook output is legitimate, and how to\nrespond to findings — so it travels with the tool without per-project\nsetup. Every project on the machine picks it up automatically. Init\nmerges with existing configs and will not overwrite anything already\nthere. Restart Claude Code to connect.\n\nThe hook fires every 3rd assistant response by default. Each extraction\nmakes one Sonnet API call (~$0.01-0.05 depending on transcript length).\nSet `SLIMEMOLD_INTERVAL` to change the frequency:\n\n```bash\nexport SLIMEMOLD_INTERVAL=3    # every 3rd turn (the default)\nexport SLIMEMOLD_INTERVAL=10   # every 10th turn (cheaper)\n```\n\nSet `SLIMEMOLD_MODEL` to override the extraction model:\n\n```bash\nexport SLIMEMOLD_MODEL=claude-opus-4-6          # best quality, ~10x cost\nexport SLIMEMOLD_MODEL=claude-sonnet-4-6        # default\nexport SLIMEMOLD_MODEL=claude-haiku-4-5-20251001  # cheapest, weaker edges\n```\n\n### Quick Start (No Hooks)\n\n```bash\nslimemold viz                      # see what's in the graph\nslimemold audit                    # text findings summary\n```\n\n## CLI\n\n```bash\n./slimemold viz                    # ASCII topology for current project\n./slimemold -p palace viz          # topology for a different project\n./slimemold audit                  # text findings summary\n./slimemold -p myproject audit     # audit a specific project\n./slimemold reset                  # clear graph for current project\n./slimemold ingest PATH            # analyze an authored document (see below)\n```\n\nProject resolution: `--project` flag \u003e `.slimemold-project` file \u003e directory\nname.\n\n### Ingesting documents\n\n`slimemold ingest` runs the same extraction and analysis pipeline over authored\nprose — essays, papers, manifestos, book chapters — instead of a conversation\ntranscript. 
The input is chunked along markdown heading boundaries (or\nparagraph-greedy for plain text), each chunk is fed to the extractor in\ndocument mode, and all claims land in the same project graph that `viz` and\n`audit` read from.\n\n```bash\n./slimemold -p reading-notes ingest essay.md\n./slimemold -p reading-notes audit\n```\n\nTwo demo documents live in `examples/documents/` for testing the pipeline\nend-to-end:\n[Marinetti's 1909 Futurist Manifesto](examples/documents/marinetti-futurist-manifesto-1909.md)\nand\n[Alan Sokal's 1996 *Social Text* hoax paper](examples/documents/sokal-social-text-1996.md).\nBoth are deliberately performative — a manifesto of unsourced \"we believes\" and\na paper engineered to look rigorous while being structurally vacuous — which is\nwhere slimemold has the cleanest signal to offer. Full audit summaries for both\nare in the appendices at the bottom of this README.\n\nRunning against genuinely argumentative prose (Mill, Darwin, well-cited\nessays) is also possible but currently exercises a tool limitation: the\nextractor's decision tree tags any claim stated as a fact without\nin-text citation as `vibes`, so a densely-argued essay that reasons through\nits assertions without citing external sources on every line produces a\nvibes-heavy audit. The document-mode prompt now handles explicit recap /\nsummary / conclusion sections (claims signaled by \"as shown,\" \"we have\nargued,\" \"to summarize\" get tagged as deduction rather than vibes), but the\nbroader issue remains.\n\n## Security Considerations\n\nSlimemold processes conversation transcripts by sending them to the\nAnthropic API for claim extraction. Transcript content leaves your\nmachine. If your conversations contain sensitive information, be aware\nthat it will be sent to Anthropic's API as part of the extraction prompt.\n\n**Prompt injection:** Transcript text is injected into the extraction\nprompt without sanitization. 
A malicious transcript could attempt to\nmanipulate the extraction model's output. The tool_use schema constrains\nthe output format, which limits but does not eliminate this risk. In\npractice, slimemold processes your own Claude Code transcripts, so the\nthreat model assumes local trust.\n\n**Transcript path:** The MCP server validates that transcript paths end\nin `.jsonl` and are regular files. It does not restrict which directories\ncan be read. If you expose the MCP server to untrusted clients, restrict\naccess at the transport level.\n\n**Data storage:** The claim graph is stored in SQLite at `~/.slimemold/`.\nClaims contain text extracted from your conversations. No API keys or\ncredentials are stored in the database.\n\n## References\n\n**Processing fluency and reasoning:**\n- Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In *Metacognition: Knowing about Knowing*.\n- Bjork, E. L., \u0026 Bjork, R. A. (2011). Making things hard on yourself, but in a good way. In *Psychology and the Real World*.\n- Hills, T. T., Todd, P. M., \u0026 Goldstone, R. L. (2008). Search in external and internal spaces. *Psychological Science*.\n- Laukkonen, R. E., et al. (2020). The dark side of Eureka: Artificially induced Aha moments make facts feel true. *Cognition*.\n- Laukkonen, R. E., et al. (2021). Getting a grip on insight. *Cognition \u0026 Emotion*.\n- Pirolli, P., \u0026 Card, S. (1999). Information foraging. *Psychological Review*.\n- Reber, R., \u0026 Schwarz, N. (1999). Effects of perceptual fluency on judgments of truth. *Consciousness and Cognition*.\n- Thompson, V. A. (2009). Dual-process theories: A metacognitive perspective. In *In Two Minds*.\n- Topolinski, S., \u0026 Strack, F. (2009). Processing fluency and affect in judgements of semantic coherence. *Cognition \u0026 Emotion*.\n- Winkielman, P., \u0026 Schwarz, N. (2001). How pleasant was your childhood? 
Beliefs about memory shape inferences from experienced difficulty of recall. *Psychological Science*.\n\n**Intervention design:**\n- Brehm, J. W. (1966). *A Theory of Psychological Reactance.* Academic Press.\n- Lifton, R. J. (1961). *Thought Reform and the Psychology of Totalism.* W. W. Norton.\n- Deci, E. L., \u0026 Ryan, R. M. (1987). The support of autonomy and the control of behavior. *Journal of Personality and Social Psychology, 53*(6).\n- Graesser, A. C., Person, N. K., \u0026 Magliano, J. P. (1995). Collaborative dialogue patterns in naturalistic one-to-one tutoring. *Applied Cognitive Psychology, 9*(6).\n- Mangels, J. A., Butterfield, B., Lamb, J., Good, C., \u0026 Dweck, C. S. (2006). Why do beliefs about intelligence influence learning success? *Social Cognitive and Affective Neuroscience, 1*(2).\n- Miller, W. R., Benefield, R. G., \u0026 Tonigan, J. S. (1993). Enhancing motivation for change in problem drinking. *Journal of Consulting and Clinical Psychology, 61*(3).\n\n**Sycophancy in language models:**\n- Perez, E., et al. (2022). Discovering language model behaviors with model-written evaluations. *arXiv:2212.09251*.\n- Sharma, M., Tong, M., Korbak, T., et al. (2023). Towards understanding sycophancy in language models. *ICLR 2024*.\n\n**Calibration and feedback:**\n- Fischhoff, B. (1982). Debiasing. In *Judgment Under Uncertainty: Heuristics and Biases*.\n- Ioannidis, J. P. A. (2005). Why most published research findings are false. *PLoS Medicine*.\n- Katz, D. (1960). The functional approach to the study of attitudes. *Public Opinion Quarterly, 24*(2).\n- Lichtenstein, S., Fischhoff, B., \u0026 Phillips, L. D. (1982). Calibration of probabilities. 
In *Judgment Under Uncertainty*.\n\n---\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eAppendix: Slimemold on Marinetti's Futurist Manifesto (1909)\u003c/b\u003e\u003c/summary\u003e\n\nWe fed [`examples/documents/marinetti-futurist-manifesto-1909.md`](examples/documents/marinetti-futurist-manifesto-1909.md) to `slimemold ingest`. 41 claims, 70 edges.\n\n```\nSLIMEMOLD [demo-marinetti] — 41 claims, 70 edges\n  Basis: analogy=3, empirical=3, vibes=35\n\nCRITICAL Load-bearing vibes: \"The world's magnificence has been\n  enriched by a new beauty: the beauty of speed\" supports 5\n  downstream claims (never challenged)\n\nCRITICAL Load-bearing vibes: \"Except in struggle, there is no more\n  beauty\" supports 4 downstream claims (never challenged)\n\nCRITICAL Load-bearing vibes: \"Italy is strangled by its gangrene of\n  professors, archaeologists, and antiquarians\" supports 3 claims\n\nCRITICAL Fluency trap: \"Courage, audacity, and revolt will be\n  essential elements of our poetry\" stated at confidence 1.0 but\n  basis is vibes — processing fluency may masquerade as truth\n\nWARNING Bottleneck (centrality 88): \"Courage, audacity, and revolt\n  will be essential elements of our poetry\" [vibes] — many\n  reasoning paths flow through this claim\n\nWARNING Unchallenged chain (5 claims): Worship of the past fatally\n  exhausts → Admiring an old picture is the same as → Daily visits\n  to museums poison and rot → Museums are cemeteries — spaces of\n  sinister promiscuity → Futurism will destroy museums, libraries,\n  and academies\n\nWARNING Premature closure: \"Time and Space died yesterday; we\n  already live in the absolute\" terminates a line of reasoning that\n  still has unverified claims upstream — flagged as thought-\n  terminating cliche\n\nWARNING Premature closure: \"Art can be nothing but violence, cruelty,\n  and injustice\" terminates a line of reasoning — flagged as\n  thought-terminating cliche\n```\n\nEleven load-bearing vibes. 
Thirty-five of forty-one claims tagged vibes (85%). Every bottleneck in the graph is a vibes-basis claim — there are no load-bearing deductions, no load-bearing research citations. The five-claim unchallenged chain threads through the manifesto's core anti-museum argument without encountering a single challenge, empirical claim, or citation. \"Time and Space died yesterday\" functions structurally the way \"it's turtles all the way down\" functions in the slimemold taxonomy: a rhetorical flourish that caps an unresolved chain. Nothing in the extraction rests on anything verifiable. That is the structural signature of a manifesto, and the tool renders it visible.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eAppendix: Slimemold on Sokal's \"Transgressing the Boundaries\" (1996)\u003c/b\u003e\u003c/summary\u003e\n\nWe fed [`examples/documents/sokal-social-text-1996.md`](examples/documents/sokal-social-text-1996.md) to `slimemold ingest`. 240 claims, 335 edges. 
The Works Cited and Notes sections are skipped by the chunker since they contain only bibliography, not argument.\n\n```\nSLIMEMOLD [demo-sokal] — 240 claims, 335 edges\n  Basis: vibes=136, research=66, deduction=26, analogy=5,\n         definition=4, assumption=3\n\nCRITICAL Load-bearing vibes: \"Lacan argued that topological\n  surfaces — the torus, Klein bottle, cross-cap, Möbius strip — are\n  the mathematics of the subject\" supports 7 downstream claims\n\nCRITICAL Load-bearing vibes: \"Feminist and poststructuralist critiques\n  have demystified the substantive content of mainstream Western\n  scientific practice\" supports 5 downstream claims\n\nCRITICAL Load-bearing vibes: \"The content of any science is\n  profoundly constrained by the language within which its\n  discourses are formulated\" supports 5 downstream claims\n\nCRITICAL Load-bearing vibes: \"As yet no emancipatory mathematics\n  exists, and we can only speculate upon its eventual content\"\n  supports 4 downstream claims\n\nWARNING Bottleneck (centrality 770): \"The content and methodology\n  of postmodern science provide powerful intellectual support for\n  the progressive political project\" [vibes]\n\nWARNING Bottleneck (centrality 625): \"One part of the progressive\n  project must involve the construction of a new and truly\n  progressive science\" [vibes]\n\nWARNING Bottleneck (centrality 536): \"A complete elucidation of one\n  and the same object may require diverse points of view\" [research]\n\nWARNING Unchallenged chain (15 claims): The Einsteinian constant is\n  not → The putative observer becomes fatally → The infinite-\n  dimensional invariance group → Diffeomorphisms are self-mappings\n  of → In mathematical terms, Derrida's observation → Derrida\n  replied that the Einsteinian → At a symposium on Les Langages\n  Critiques → General relativity has had a profound → General\n  relativity forces upon us radically → General relativity predicts\n  the bending → Einstein's general 
relativity subsumes → Newton's\n  gravitational theory corresponds → Einstein's equations are\n  highly nonlinear → In Einstein's general theory\n```\n\nSixty-six claims tagged `research` — more citation density than most real papers. Sokal's hoax was *designed* to look rigorously sourced. But the structurally load-bearing claims — the ones other claims depend on — are overwhelmingly `vibes`: rhetorical synthesis statements about \"postmodern science,\" \"emancipatory mathematics,\" \"the progressive political project.\" The two highest-centrality bottlenecks in the entire graph are unsourced grand claims that the rest of the argument flows through. The fifteen-claim unchallenged chain threads from Einstein's field equations through Derrida's invocation of them to the paper's thesis without a single challenge or verifying edge — the citation-dense surface never actually intersects with the argument-bearing structure. The tool sees the hoax's exact mechanism: pad the page with real citations, carry the argument on vibes.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eAppendix: Slimemold's audit of this README\u003c/b\u003e\u003c/summary\u003e\n\nWe fed this README to `slimemold ingest`. 
228 claims, 457 edges.\n\n```\nSLIMEMOLD [demo-readme] — 228 claims, 457 edges\n  Basis: definition=81, vibes=40, empirical=38, deduction=33,\n         research=23, analogy=9, llm_output=3, assumption=1\n\nCRITICAL Load-bearing vibes: \"AI models will agree with unsourced\n  claims, then agree with the structural analysis showing claims\n  are unsourced, then enthusiastically agree you should verify\n  them\" supports 8 downstream claims\n\nCRITICAL Load-bearing llm_output: \"Physarum polycephalum forages by\n  following local chemical gradients\" supports 6 downstream claims\n\nCRITICAL Load-bearing vibes: \"Language models are trained to\n  minimize prediction loss on human text — their output is\n  optimized, by construction, for the qualities that drive\n  processing fluency\" supports 5 downstream claims\n\nCRITICAL Load-bearing vibes: \"When you partially understand\n  something, it feels like understanding\" supports 5 downstream\n  claims\n\nCRITICAL Load-bearing vibes: \"A language model has no privileged\n  access to its own epistemic state\" supports 4 downstream claims\n\nWARNING Bottleneck (centrality 9249): \"Slimemold watches\n  conversations as they happen, extracts the claims being made,\n  builds a persistent graph\" [definition] — many reasoning paths\n  flow through this claim\n\nWARNING Bottleneck (centrality 8620): \"Processing fluency\n  masquerades as truth\" [research] — load-bearing at the structural\n  center of the essay\n\nWARNING Unchallenged chain (35 claims): Physarum forages by\n  following chemical gradients → gradient-following is how the\n  organism builds efficient networks → the pathology is\n  miscalibration → humans follow the fluency gradient the same way\n  → information foraging theory → Bjork's desirable difficulties →\n  processing fluency masquerades as truth → partial understanding\n  feels like understanding → ... 
→ AI models will agree with\n  unsourced claims\n\nINFO Premature closure: \"Preventing the model from being sycophantic\n  requires an elaborate intervention\" terminates a line of\n  reasoning that still has unverified claims upstream\n```\n\nSix load-bearing claims. The essay's opening (\"when you partially\nunderstand something, it feels like understanding\") carries five\ndependents; the essay's closing thesis (\"AI models will agree with\nunsourced claims, then agree with the structural analysis showing\nclaims are unsourced\") carries eight. The *Physarum* metaphor that\nruns as the essay's organizing image is itself a load-bearing\nllm_output claim with six dependents — we assert as fact what the\nslime mold does, cite no biology paper in-text, and build the rest of\nthe argument on top of it. The 35-claim unchallenged chain threads\nfrom that metaphor all the way through the fluency literature to the\nessay's claims about AI behavior — every step feels reasonable, nobody\npaused. An earlier draft had a nine-claim chain and fourteen fluency\ntraps; adding sycophancy citations (Perez et al. 2022, Sharma et al.\n2023) broke the chain, and replacing a thought-terminating cliche with an\nactual engagement of the limitation removed the worst premature\nclosure. The audit loop works. It does not converge to zero — in fact,\nas the essay grows, the chain grows with it.\n\n(Earlier versions of this appendix showed numbers from transcript-mode\nextraction, which was the only path available. 
With `slimemold ingest`\nnow landed, the README gets read via document mode — the mode that\nmatches what the README actually is — and the numbers above reflect\nthat.)\n\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustinstimatze%2Fslimemold","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjustinstimatze%2Fslimemold","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustinstimatze%2Fslimemold/lists"}