{"id":50436866,"url":"https://github.com/blmayer/brain","last_synced_at":"2026-05-31T17:30:44.906Z","repository":{"id":359422671,"uuid":"1245181587","full_name":"blmayer/brain","owner":"blmayer","description":"Mirror of brain project","archived":false,"fork":false,"pushed_at":"2026-05-21T20:10:21.000Z","size":1852,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-22T04:57:07.832Z","etag":null,"topics":["knowledge-graph","ontology","pos-tagging","python"],"latest_commit_sha":null,"homepage":"https://terminal.pink/brain/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blmayer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-21T01:47:30.000Z","updated_at":"2026-05-21T20:10:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/blmayer/brain","commit_stats":null,"previous_names":["blmayer/brain"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/blmayer/brain","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmayer%2Fbrain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmayer%2Fbrain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmayer%2Fbrain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmayer%2Fbrain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blmayer","download_url":"https://codeload.github.com/blmayer/brain/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmayer%2Fbrain/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33742184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["knowledge-graph","ontology","pos-tagging","python"],"created_at":"2026-05-31T17:30:44.115Z","updated_at":"2026-05-31T17:30:44.898Z","avatar_url":"https://github.com/blmayer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# brain\n\n**Knowledge-driven program synthesis and semantic reasoning.**\n\n`brain` stores knowledge as structured, queryable facts (originally triplets, now also rich plan templates) and uses that knowledge to assemble correct outputs instead of relying on LLM hallucination.\n\nThe active implementation is in Python.\n\n---\n\n## Python Path (Current Development Focus)\n\nThis is the actively evolving implementation.\n\n### Pipeline\n\nThe system follows an ontology-driven flow:\n\n```mermaid\nflowchart TD\n    A[Input Sentence] \n    --\u003e B[process_input\u003cbr/\u003eNLTK + Coreference Resolution]\n    \n    B --\u003e C[_extract_intent_features]\n    \n    C --\u003e D[_map_features_to_initial_concepts\u003cbr/\u003ePer-word keyword matching]\n    \n    D --\u003e E[_resolve_dependencies\u003cbr/\u003eRecursive relation walking + type satisfaction]\n    \n    E --\u003e F[_features_to_plan]\n    \n    F --\u003e G[tree_to_solved_plan]\n    \n    G --\u003e H[solve_plan]\n    \n    H --\u003e I[emit\u003cbr/\u003eUsing Concept emitters]\n    \n    D -.-\u003e J[get_ontology]\n    E -.-\u003e J\n```\n\nCurrent demo: \"write a Golang program that reads 2 integers and prints their sum\" → correct program using `var`/`fmt.Scanf`/`+`/`fmt.Println` emitted from the knowledge base.\n\n#### Detailed Data Flow Example\n\nThe following trace shows the **exact data flux** for the canonical example used in `test_augment.py`:\n\n```mermaid\nsequenceDiagram\n    autonumber\n    participant U as User/Test\n    participant M as main.py:process_input\n    participant C as coreference_resolver.py\n    participant A as augment.py\n    participant K as kb.py:Ontology\n\n    U-\u003e\u003eM: process_input(sentence: str)\n    Note over M: In: \"write a Golang program that reads 2 integers and prints their sum\"\n    M-\u003e\u003eM: nltk.word_tokenize + pos_tag + ne_chunk\n    Note right of M: parsed_tree: nltk.Tree\n    M-\u003e\u003eC: resolve_pronouns(parsed_tree)\n    Note right of C: returns tree with dict leaves\u003cbr/\u003e{word, pos, reference}\n    C--\u003e\u003eM: resolved_tree\n    M--\u003e\u003eU: (resolved_tree, parsed_tree)\n\n    U-\u003e\u003eA: tree_to_solved_plan(parsed_tree, resolved_tree)\n    Note right of A: Public entry point (ontology-native path)\n\n    A-\u003e\u003eA: _extract_intent_features(tree)\n    Note right of A: In: resolved_tree\u003cbr/\u003eOut (observed in test):\u003cbr/\u003e  verbs=['write','reads','prints',...]\u003cbr/\u003e  languages=['golang']\u003cbr/\u003e  io_verbs={}, arithmetic={},\u003cbr/\u003e  detected_concepts={}   (at print)\n    A-\u003e\u003eA: _map_features_to_initial_concepts(features)\n    Note right of A: In: features\u003cbr/\u003eOut (exact):\u003cbr/\u003e  [PrintOperation,\u003cbr/\u003e   fmt.Scanf,\u003cbr/\u003e   BinaryAdd]\u003cbr/\u003e(ranked by match count)\n    A-\u003e\u003eK: find_concepts_matching (per word, strict=True)\n    K--\u003e\u003eA: ranked Concept matches\u003cbr/\u003e(PrintOperation scored 2)\n    A-\u003e\u003eA: _resolve_dependencies(initial_concepts)\n    Note right of A: In: [PrintOperation, fmt.Scanf, BinaryAdd]\u003cbr/\u003eOut (exact list + order):\u003cbr/\u003e  [fmt.Println,\u003cbr/\u003e   fmt.Scanf,\u003cbr/\u003e   FunctionType,\u003cbr/\u003e   BinaryAdd]\n    A-\u003e\u003eA: _features_to_plan(features)\n    Note right of A: In: features\u003cbr/\u003eOut (exact):\u003cbr/\u003e  { \"type\": \"ontology_driven_plan\",\u003cbr/\u003e    \"starting_concepts\": [\"PrintOperation\",\"fmt.Scanf\",\"BinaryAdd\"],\u003cbr/\u003e    \"resolved_dependencies\": [\"fmt.Println\",\"fmt.Scanf\",\"FunctionType\",\"BinaryAdd\"],\u003cbr/\u003e    \"all_concepts\": [...] }\n    A-\u003e\u003eA: solve_plan(plan, Context())\n    Note right of A: In: ontology_driven_plan\u003cbr/\u003eOut (exact):\u003cbr/\u003e  ExecNode(\u003cbr/\u003e    concept=FunctionDeclaration,\u003cbr/\u003e    deps=[\u003cbr/\u003e      ExecNode(fmt.Println),\u003cbr/\u003e      ExecNode(fmt.Scanf),\u003cbr/\u003e      ExecNode(FunctionType),\u003cbr/\u003e      ExecNode(BinaryAdd)\u003cbr/\u003e    ] )\n    A--\u003e\u003eU: solved: ExecNode\n\n    U-\u003e\u003eA: emit(solved: ExecNode)\n    Note over A: DFS post-order, render() using emitters[0].template\n    A--\u003e\u003eU: lines: list[str]\u003cbr/\u003e(exact, current run):\u003cbr/\u003e  0: fmt.Println(a)\u003cbr/\u003e  1: fmt.Scanf(format, args)\u003cbr/\u003e  2: // no emitter defined for FunctionType\u003cbr/\u003e  3: left + right\u003cbr/\u003e  4: func namesignature {\u003cbr/\u003ebody\u003cbr/\u003e}\n```\n\n**Key observations from this trace:**\n- Concept discovery is entirely **KB-driven** (no hardcoded verb lists).\n- `_extract_intent_features` returns mostly raw words/POS + language hints. The actual `Concept` objects are produced later by `_map_features_to_initial_concepts` (via `ontology.find_concepts_matching`). In the inspection test print, `features['detected_concepts']` was still empty at that snapshot.\n- `_resolve_dependencies` is the heart of the new ontology system — it walks `specializes`, `hasParameter`, `produces`, and `implementedBy` relations to go from the 3 initial matches to the final 4 concrete concepts.\n- The final `ExecNode` tree is emitted via per-Concept Jinja-style templates stored in the ontology JSON files.\n- Emission is still maturing (hence the placeholder lines above); the important part is that the correct concepts were discovered and wired together.\n\n### Getting Started (Python)\n\n1. Install dependencies:\n\n   ```bash\n   pip install nltk\n   ```\n\n2. **Download the required NLTK data** (run once):\n\n   ```python\n   import nltk\n   nltk.download('punkt')\n   nltk.download('punkt_tab')          # newer NLTK versions\n   nltk.download('averaged_perceptron_tagger')\n   nltk.download('maxent_ne_chunker')\n   nltk.download('words')\n   ```\n\n   Or from the command line:\n\n   ```bash\n   python -c \"import nltk; nltk.download(['punkt','punkt_tab','averaged_perceptron_tagger','maxent_ne_chunker','words'])\"\n   ```\n\n3. Run the demo:\n\n   ```bash\n   python main.py\n   ```\n\n   Type sentences like:\n   - `write a Golang program that reads 2 integers and prints their sum`\n   - `write a python program that ...`\n\n4. Run the tests:\n\n   ```bash\n   python -m unittest test_augment test_coreference_resolver -v\n   ```\n\n### Key Python Files\n\n| File                        | Purpose |\n|----------------------------|---------|\n| `main.py`                  | Interactive entry point + `process_input()` (NLTK pipeline) |\n| `coreference_resolver.py`  | Pronoun resolution on the parsed tree |\n| `augment.py`               | `tree_to_solved_plan()`, generic plan builder, `solve_plan()`, `emit()` |\n| `kb.py`                    | Python-native Knowledge Base (Node with `needs`, `produces`, `emits`) |\n| `test_augment.py`          | Tests for the plan solver and end-to-end NLTK → emission flow |\n\n---\n\n## Knowledge Base\n\nKnowledge lives in two forms:\n\n1. **Rich plan templates** (`kb.py` + the JSONs under `kb/programming_languages/go/...`)\n   - Used by the Python solver (`sum`, `print`, `read`, `declaration`, etc.).\n   - Each node declares `needs`, `produces`, and `emits` (text + references).\n\n2. **Triplet KB** (the original `kb/*.json` files)\n   - Classic `subject verb object` facts with confidence, context, date, etc.\n\nAdding new capabilities is usually just adding a new node to `kb.py` (or a JSON file). The Python feature extractor automatically recognizes any new node IDs that appear in user sentences.\n\n---\n\n## Design Goals\n\n- Move from \"ask the LLM to write code\" to **\"parse intent + assemble from verified knowledge atoms\"**.\n- Make the system **extensible by data**, not by code changes.\n- Support traceability: every emitted line can be traced back to a specific KB entry.\n\nSee `DESIGN_DOC.md` for the original four-phase architecture.\n\n---\n\n## Limitations \u0026 Future Work\n\n- The current Python KB is still small (focused on the \"sum two numbers\" example).\n- NLTK parsing is a cheap local approximation — a real LLM parser (as described in the design doc) would be more robust for complex sentences.\n- No persistent storage or multi-turn conversation context yet in the Python path.\n\nContributions that expand `kb.py` with new reusable templates (loops, conditionals, different languages, etc.) are very welcome.\n\n---\n\n## License\n\nThis project is licensed under the BSD 3-Clause License — see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblmayer%2Fbrain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblmayer%2Fbrain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblmayer%2Fbrain/lists"}