{"id":25359638,"url":"https://github.com/monarch-initiative/dppkb","last_synced_at":"2025-04-09T02:43:17.974Z","repository":{"id":244860240,"uuid":"816485474","full_name":"monarch-initiative/dppkb","owner":"monarch-initiative","description":"DEMO example knowledge base created using DRAGON-AI","archived":false,"fork":false,"pushed_at":"2024-09-27T23:34:48.000Z","size":3143,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-02-14T21:07:07.464Z","etag":null,"topics":["curation","dragon-ai","human-phenotype-ontology","knowledge-base","large-language-models","monarchinitiative","onto-gpt","ontologies","pathophysiology"],"latest_commit_sha":null,"homepage":"https://monarch-initiative.github.io/dppkb/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/monarch-initiative.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-17T21:19:53.000Z","updated_at":"2024-11-18T14:00:36.000Z","dependencies_parsed_at":"2024-06-17T23:30:01.468Z","dependency_job_id":"be9b70f0-dc32-453d-872a-1eecf9fe7739","html_url":"https://github.com/monarch-initiative/dppkb","commit_stats":null,"previous_names":["monarch-initiative/dppkb"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/monarch-initiative%2Fdppkb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/monarch-initiative%2Fdppkb/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/monarch-initiative%2Fdppkb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/monarch-initiative%2Fdppkb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/monarch-initiative","download_url":"https://codeload.github.com/monarch-initiative/dppkb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247968249,"owners_count":21025797,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["curation","dragon-ai","human-phenotype-ontology","knowledge-base","large-language-models","monarchinitiative","onto-gpt","ontologies","pathophysiology"],"created_at":"2025-02-14T21:07:09.786Z","updated_at":"2025-04-09T02:43:17.946Z","avatar_url":"https://github.com/monarch-initiative.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dppkb\n\nDisease Pathophysiology Knowledge Base FOR DEMO PURPOSES\n\nThis repo contains a mostly automated demo KB of diseases, pathophysiology, treatments,\netiology, etc., generated using DRAGON-AI/CurateGPT.\n\nThe KB is created via a cycle:\n\n1. A human expert creates one or two seed entries\n2. New entries are created from the latent knowledge base of the LLM\n3. PubMed is searched for supporting/refuting evidence on a per-assertion basis\n4. The LLM acts as a critic, guided by a human, to continually refine the entries\n\n## Website\n\n[https://monarch-initiative.github.io/dppkb](https://monarch-initiative.github.io/dppkb)\n\nClick on \"Diseases\" to browse the \"Knowledge Base\". 
You will see a highly generic\nrendering of auto-generated disease entries.\n\n## What is this?\n\nThis is an experiment in using CurateGPT for de-novo human-driven Knowledge Base curation.\n\nThe general workflow is:\n\n1. A human writes some sample YAML files for a few entries\n   - the schema can be invented \"on the fly\"\n2. Iterate using claude.ai\n   - ask it to suggest other fields\n   - use as a template to create more\n3. Save as a .yaml file\n4. Iterate with curate-gpt\n   - `complete` command will generate a new entry\n   - `citeseek` command will add support/refute evidence from PubMed\n   - `update` command will enrich specific fields\n   - `review` command will use LLM as a critic and suggest changes\n\n\n## Files\n\n- [kb/dppkb.yaml](kb/dppkb.yaml) - main KB\n\n\n## Details\n\n### Create a CurateGPT index\n\nRun\n\n`make index`\n\nThis should be run periodically - it makes a local ChromaDB that will be used for RAG.\n\nNote: this loads a pre-processed version that has the evidence removed; we want to\nhide this when doing RAG as we want to avoid publication hallucination.\n\n### Generate a new entity\n\nRun this:\n\n`make tmp/complete-Tuberculosis.yaml`\n\nThis uses RAG/DRAGON-AI to make a candidate entry. You can then copy this into the kb/dppkb.yaml, or\nyou can manually tweak it, or ask Claude to tweak it.\n\nThe idea is that as the KB is incrementally built up with high quality examples, there will be\nless need for manual tweaking; RAG will be good enough.\n\nAlso recall that entries can be enhanced in later steps.\n\nNOTE: This step does not use PubMed directly. We are relying on the fact that the LLM has already ingested\nand compressed all the literature and can do a pretty good first-pass job at re-exporting that in any\nformat we like. 
It doesn't have to be perfect, though; subsequent steps are designed to refine this.\n\n### Adding evidence\n\n`make tmp/with-evidence.yaml`\n\nThis will run CurateGPT `citeseek` over all assertions; if an assertion has no `evidence` tag, it will\nquery PubMed for supporting/refuting evidence.\n\n### Periodic Review\n\nIt is recommended to periodically inspect the file in a lead curator role, and to ask for reviews.\n\nEither global reviews:\n\n`curategpt review --model gpt-4o -p db -c disease \"{}\"  -t patch --primary-key name \u003e tmp/review.patch.yaml`\n\nOr focused, e.g. if you want `pathophysiology` to be fleshed out:\n\n`curategpt -vv review --model gpt-4o -p db -c disease \"{}\" -Z pathophysiology -P name -t patch --primary-key name --rule \"include as many mechanisms and molecular steps as you can\" \u003e tmp/pathophys-review.yaml`\n\nThe result is a patch file. This can be manually examined, edited, and applied:\n\n`curategpt apply-patch --patch tmp/patch.yaml --primary-key name kb/dppkb.yaml \u003e tmp/patched.kb.yaml`\n\nDo a diff, then move the patched file into place.\n\n### YAML normalization\n\nThere are different ways to write YAML. 
Ensure the KB representation is normalized:\n\n`make normalize`\n\n### Linking to ontology term IDs\n\nCurrently we use labels, not IDs, as these are easier for humans reviewing the YAML, and for LLMs.\n\nGrounding is expected to be trivial and highly reliable; a simple mapping will be added to every entry.\n\n### End to end automation\n\nTODO\n\n## Running the app\n\n`make app`\n\nThis will create a Streamlit app where you can chat with the KB, visualize clusters, etc.\n\n### Clustering\n\nAsk a question:\n\n\u003cimg width=\"785\" alt=\"image\" src=\"https://github.com/monarch-initiative/dppkb/assets/50745/40fef947-3bd9-4c28-9a3f-3345c41026c2\"\u003e\n\nSee results clustered:\n\n\u003cimg width=\"770\" alt=\"image\" src=\"https://github.com/monarch-initiative/dppkb/assets/50745/10eb74f0-9861-459a-9188-2494b5b250b3\"\u003e\n\n### Chat\n\n\u003cimg width=\"696\" alt=\"image\" src=\"https://github.com/monarch-initiative/dppkb/assets/50745/fca67f84-31b9-44ed-b157-4644d6a15297\"\u003e\n\nResults:\n\n\u003cimg width=\"767\" alt=\"image\" src=\"https://github.com/monarch-initiative/dppkb/assets/50745/c09926a6-6d58-42fa-a841-d5c0a77158e6\"\u003e\n\n\u003cimg width=\"705\" alt=\"image\" src=\"https://github.com/monarch-initiative/dppkb/assets/50745/17b495ad-3dd9-4bf1-82b4-278c6e7f2e0e\"\u003e\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmonarch-initiative%2Fdppkb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmonarch-initiative%2Fdppkb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmonarch-initiative%2Fdppkb/lists"}