https://github.com/monarch-initiative/dppkb
DEMO example knowledge base created using DRAGON-AI
- Host: GitHub
- URL: https://github.com/monarch-initiative/dppkb
- Owner: monarch-initiative
- License: bsd-3-clause
- Created: 2024-06-17T21:19:53.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-27T23:34:48.000Z (almost 2 years ago)
- Last Synced: 2025-02-14T21:07:07.464Z (over 1 year ago)
- Topics: curation, dragon-ai, human-phenotype-ontology, knowledge-base, large-language-models, monarchinitiative, onto-gpt, ontologies, pathophysiology
- Language: HTML
- Homepage: https://monarch-initiative.github.io/dppkb/
- Size: 3 MB
- Stars: 5
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# dppkb
Disease Pathophysiology Knowledge Base FOR DEMO PURPOSES
This repo contains a mostly automated demo KB of diseases, pathophysiology, treatments,
etiology, etc., generated using DRAGON-AI/CurateGPT.
The KB is created via a cycle:
1. A human expert creates one or two seed entries.
2. New entries are generated from the latent knowledge of the LLM.
3. PubMed is searched for supporting or refuting evidence on a per-assertion basis.
4. The LLM acts as a critic, guided by a human, to continually refine the entries.
## Website
[https://monarch-initiative.github.io/dppkb](https://monarch-initiative.github.io/dppkb)
Click on "Diseases" to browse the "Knowledge Base". You will see a highly generic
rendering of auto-generated disease entries.
## What is this?
This is an experiment in using CurateGPT for de novo, human-driven Knowledge Base curation.
The general workflow is:
1. A human writes some sample YAML files for a few entries
   - the schema can be invented "on the fly"
2. Iterate using claude.ai
   - ask it to suggest other fields
   - use the samples as a template to create more entries
3. Save as a .yaml file
4. Iterate with curate-gpt
   - the `complete` command generates a new entry
   - the `citeseek` command adds supporting or refuting evidence from PubMed
   - the `update` command enriches specific fields
   - the `review` command uses the LLM as a critic and suggests changes
## Files
- [kb/dppkb.yaml](kb/dppkb.yaml) - main KB
## Details
### Create a CurateGPT index
Run
`make index`
Run this periodically. It builds a local ChromaDB that is used for RAG.
Note: this loads a pre-processed version of the KB with the evidence removed. The evidence is
hidden during RAG to avoid hallucinated publications.
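As a rough illustration of the pre-processing idea, the sketch below removes every `evidence` field from a nested entry before it would be indexed. It is not the code behind `make index`; the entry layout, the placeholder reference, and the `strip_evidence` helper are assumptions made for this example.

```python
from typing import Any


def strip_evidence(node: Any, key: str = "evidence") -> Any:
    """Return a copy of a nested dict/list structure with every `key` field removed."""
    if isinstance(node, dict):
        return {k: strip_evidence(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_evidence(item, key) for item in node]
    return node


# Hypothetical entry; the real schema in kb/dppkb.yaml may differ.
entry = {
    "name": "Tuberculosis",
    "pathophysiology": [
        {
            "name": "Granuloma formation",
            # Placeholder reference, not a real citation.
            "evidence": [{"reference": "PMID:0000000"}],
        }
    ],
}

cleaned = strip_evidence(entry)
print(cleaned)
```

The function builds a new structure rather than editing in place, so the full KB (with evidence) stays untouched while only the stripped copy is handed to the index.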
### Generate a new entity
Run this:
`make tmp/complete-Tuberculosis.yaml`
This uses RAG/DRAGON-AI to make a candidate entry. You can then copy it into kb/dppkb.yaml as is,
tweak it manually, or ask Claude to tweak it.
The idea is that as the KB is incrementally built up with high-quality examples, there will be
less need for manual tweaking, because RAG alone will be good enough.
Recall also that entries can be enhanced in later steps.
NOTE: This step does not use PubMed directly. We rely on the LLM having already ingested
and compressed the literature, so it can do a pretty good first-pass job of re-exporting that knowledge in any
format we like. It doesn't have to be perfect, because subsequent steps are designed to refine it.
### Adding evidence
`make tmp/with-evidence.yaml`
This will run CurateGPT `citeseek` over all assertions. For any assertion with no `evidence` tag, it will
query PubMed for supporting or refuting evidence.
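The selection rule described above (only assertions without an `evidence` tag get a lookup) can be sketched as follows. This is not how `citeseek` is implemented; it assumes, purely for illustration, that each entry is a dict whose list-valued fields hold assertion dicts, and `assertions_missing_evidence` is a hypothetical helper.

```python
from typing import Any, Iterator, Tuple


def assertions_missing_evidence(entry: dict) -> Iterator[Tuple[str, Any]]:
    """Yield (field, assertion) pairs for assertion dicts that have no `evidence` key."""
    for field, value in entry.items():
        if not isinstance(value, list):
            continue  # scalar fields such as `name` are not assertions
        for item in value:
            if isinstance(item, dict) and "evidence" not in item:
                yield field, item


# Hypothetical entry; field names and the placeholder reference are assumptions.
entry = {
    "name": "Tuberculosis",
    "pathophysiology": [
        {"name": "Granuloma formation", "evidence": [{"reference": "PMID:0000000"}]},
    ],
    "treatments": [
        {"name": "Combination antibiotic therapy"},
    ],
}

pending = list(assertions_missing_evidence(entry))
for field, assertion in pending:
    print(f"needs evidence lookup: {field} / {assertion['name']}")
```

Assertions that already carry evidence are skipped, which keeps repeated runs from re-querying work that has already been done.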
### Periodic Review
It is recommended to periodically inspect the file in the role of lead curator, and to ask for reviews.
Either global reviews:
`curategpt review --model gpt-4o -p db -c disease "{}" -t patch --primary-key name > tmp/review.patch.yaml`
Or focused, e.g. if you want `pathophysiology` to be fleshed out:
`curategpt -vv review --model gpt-4o -p db -c disease "{}" -Z pathophysiology -P name -t patch --primary-key name --rule "include as many mechanisms and molecular steps as you can" > tmp/pathophys-review.yaml`
The result is a patch file. This can be manually examined, edited, and applied:
`curategpt apply-patch --patch tmp/patch.yaml --primary-key name kb/dppkb.yaml > tmp/patched.kb.yaml`
Diff the patched file against the original, then move it into place.
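One way to script the diff-then-move step is sketched below using only the Python standard library. The `review_and_promote` helper is hypothetical, and the demonstration runs on throwaway files standing in for kb/dppkb.yaml and tmp/patched.kb.yaml rather than on the real KB.

```python
import difflib
import tempfile
from pathlib import Path


def review_and_promote(original: Path, patched: Path, accept: bool) -> str:
    """Return a unified diff of the two files; overwrite `original` with `patched` if accepted."""
    old_lines = original.read_text().splitlines(keepends=True)
    new_lines = patched.read_text().splitlines(keepends=True)
    diff = "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=str(original), tofile=str(patched))
    )
    if accept and diff:
        # Path.replace overwrites the destination; both paths must be on the same filesystem.
        patched.replace(original)
    return diff


# Throwaway files with made-up content, used only to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    kb = Path(tmp) / "dppkb.yaml"
    patched = Path(tmp) / "patched.kb.yaml"
    kb.write_text("- name: Tuberculosis\n")
    patched.write_text("- name: Tuberculosis\n  category: Infectious\n")
    diff_text = review_and_promote(kb, patched, accept=True)
    promoted = kb.read_text()

print(diff_text)
```

In practice `accept` would be set only after a curator has read the diff; an empty diff leaves both files where they are.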
### YAML normalization
There are different ways to write YAML. Ensure the KB representation is normalized:
`make normalize`
### Linking to ontology term IDs
Currently we use labels rather than IDs, as labels are easier for humans reviewing the YAML and for LLMs.
Grounding is expected to be trivial and highly reliable; simple mappings will be added to every entry.
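A minimal sketch of what label-to-ID grounding could look like is given below, assuming a plain lookup table keyed by normalized label. The lexicon, the `EX:` identifiers, and the `ground_labels` helper are invented placeholders, not real ontology term IDs and not part of this repo.

```python
from typing import Dict, Iterable, List, Tuple


def normalize(label: str) -> str:
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def ground_labels(
    labels: Iterable[str], lexicon: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Map each label to an ID via exact match on the normalized form; collect misses."""
    index = {normalize(name): term_id for name, term_id in lexicon.items()}
    grounded: Dict[str, str] = {}
    unmatched: List[str] = []
    for label in labels:
        term_id = index.get(normalize(label))
        if term_id is None:
            unmatched.append(label)
        else:
            grounded[label] = term_id
    return grounded, unmatched


# Placeholder lexicon: these IDs are made up for illustration only.
lexicon = {"Fever": "EX:0000001", "Chronic cough": "EX:0000002"}

grounded, unmatched = ground_labels(["fever", "Chronic  Cough", "Night sweats"], lexicon)
print(grounded)
print(unmatched)
```

Returning the unmatched labels separately makes it easy to hand the leftovers to a curator instead of silently dropping them.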
### End to end automation
TODO
## Running the app
`make app`
This will create a Streamlit app where you can chat with the KB, visualize clusters, etc.
### Clustering
Ask a question, then see the results clustered.
### Chat
Chat with the KB and view the results.