https://github.com/dermatologist/pyomop
Python package for managing OHDSI clinical data models. Includes support for LLM based plain text queries, MCP server and FHIR import.
- Host: GitHub
- URL: https://github.com/dermatologist/pyomop
- Owner: dermatologist
- License: gpl-3.0
- Created: 2020-05-02T16:09:27.000Z (about 6 years ago)
- Default Branch: develop
- Last Pushed: 2025-12-20T13:54:19.000Z (6 months ago)
- Last Synced: 2025-12-22T17:17:59.571Z (6 months ago)
- Topics: cdm, clinical-trials, datawarehouse, hacktoberfest, health-data-analysis, health-informatics, llm, ohdsi, python, text-to-sql
- Language: Python
- Homepage: https://nuchange.ca/2025/08/vibe-coding-fhir-to-omop-cdm.html?github
- Size: 2.37 MB
- Stars: 56
- Watchers: 2
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Authors: AUTHORS.md
- Agents: AGENTS.md
Awesome Lists containing this project
- awesome-medical-ai-skills - pyomop - Python package for managing OHDSI clinical data models. Includes support for LLM based plain text queries, MCP server and FHIR import. (Medical MCP Servers)
- awesome-healthcare-mcp-servers - PyOMOP - OHDSI/OMOP clinical data management with FHIR import for observational research (`L3` `B` `Python`). (Quality Compliance and Regulatory / Clinical Validity Score)
README
# pyomop: OMOP Swiss Army Knife 🔧
## ✨ Overview
**pyomop** is your OMOP Swiss Army Knife 🔧 for working with [OHDSI](https://www.ohdsi.org/) OMOP Common Data Model (CDM) v5.4 or v6 compliant databases, using SQLAlchemy as the ORM. It converts query results to pandas DataFrames for machine learning pipelines and provides utilities for working with OMOP vocabularies. Table definitions are based on the [omop-cdm](https://github.com/thehyve/omop-cdm) library. pyomop is designed to be a lightweight, easy-to-use library for researchers and developers who are experimenting with and testing OMOP CDM databases. It can be used both as a command-line tool and as an imported library in your code.
- Supports SQLite, PostgreSQL, and MySQL. CDM and Vocab tables are created in the same schema. (See usage below for more details)
- LLM-based natural language queries via langchain. [Usage](examples/llm_example.py).
- 🔥 FHIR to OMOP conversion utilities. (See usage below for more details)
- Execute [QueryLibrary](https://github.com/OHDSI/QueryLibrary). (See usage below for more details)
Please ⭐️ if you find this project useful!
## Installation
**Stable release:**
```
pip install pyomop
```
**Development version:**
```
git clone https://github.com/dermatologist/pyomop.git
cd pyomop
pip install -e .
```
**LLM support:**
```
pip install pyomop[llm]
```
#### ✨ See [this notebook](examples/llm_example.ipynb) or [script](examples/llm_example.py) for examples. 👇 [MCP SERVER](#mcp-server) is recommended for advanced usage.
## Docker
* A [docker-compose](/docker-compose.yml) file is provided to quickly set up an environment with PostgreSQL, [WebAPI](https://github.com/OHDSI/WebAPI) and [Atlas](https://github.com/OHDSI/atlas), along with a [SQL script](/examples/webapi_source.sql) that creates a source in WebAPI. The script can be run using the `psql` command-line tool or via the WebAPI UI. After running the script, refresh the sources by sending a request to `/WebAPI/source/refresh`.
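The refresh step can also be scripted. The following is a minimal sketch using only the Python standard library; it assumes WebAPI is reachable at a base URL such as `http://localhost:8080` (a hypothetical address, check the ports in your compose file) and that a plain GET request is accepted by the refresh endpoint.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def refresh_url(base_url: str) -> str:
    """Build the source-refresh URL from a WebAPI base URL."""
    # Normalise the base so exactly one slash separates host and path.
    return urljoin(base_url.rstrip("/") + "/", "WebAPI/source/refresh")


def refresh_sources(base_url: str, timeout: float = 10.0) -> int:
    """Send a GET request to the refresh endpoint and return the HTTP status."""
    with urlopen(refresh_url(base_url), timeout=timeout) as response:
        return response.status


# Example call (assumed address, requires the containers to be running):
# refresh_sources("http://localhost:8080")
```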
## 🔧 Usage
```python
import asyncio
import datetime

from sqlalchemy import select

from pyomop import CdmEngineFactory, CdmVector, CdmVocabulary

# cdm6 and cdm54 are supported
from pyomop.cdm54 import Base, Cohort, Person, Vocabulary


async def main():
    cdm = CdmEngineFactory()  # Creates SQLite database by default for fast testing
    # cdm = CdmEngineFactory(db='pgsql', host='', port=5432,
    #                       user='', pw='',
    #                       name='', schema='')
    # cdm = CdmEngineFactory(db='mysql', host='', port=3306,
    #                       user='', pw='',
    #                       name='')
    engine = cdm.engine
    # Comment the following line if using an existing database. Both cdm6 and cdm54 are supported, see the import statements above
    await cdm.init_models(Base.metadata)  # Initializes the database with the OMOP CDM tables
    vocab = CdmVocabulary(cdm, version='cdm54')  # or 'cdm6' for v6

    # Uncomment the following line to create a new vocabulary from CSV files
    # vocab.create_vocab('/path/to/csv/files')

    async with cdm.session() as session:  # type: ignore
        # Add Persons
        async with session.begin():
            session.add(
                Person(
                    person_id=100,
                    gender_concept_id=8532,
                    gender_source_concept_id=8512,
                    year_of_birth=1980,
                    month_of_birth=1,
                    day_of_birth=1,
                    birth_datetime=datetime.datetime(1980, 1, 1),
                    race_concept_id=8552,
                    race_source_concept_id=8552,
                    ethnicity_concept_id=38003564,
                    ethnicity_source_concept_id=38003564,
                )
            )
            session.add(
                Person(
                    person_id=101,
                    gender_concept_id=8532,
                    gender_source_concept_id=8512,
                    year_of_birth=1980,
                    month_of_birth=1,
                    day_of_birth=1,
                    birth_datetime=datetime.datetime(1980, 1, 1),
                    race_concept_id=8552,
                    race_source_concept_id=8552,
                    ethnicity_concept_id=38003564,
                    ethnicity_source_concept_id=38003564,
                )
            )

        # Query the Person
        stmt = select(Person).where(Person.person_id == 100)
        result = await session.execute(stmt)
        for row in result.scalars():
            print(row)
            assert row.person_id == 100

        # Query the person pattern 2
        person = await session.get(Person, 100)
        print(person)
        assert person is not None
        assert person.person_id == 100

    # Convert result to a pandas dataframe
    vec = CdmVector()

    # https://github.com/OHDSI/QueryLibrary/blob/master/inst/shinyApps/QueryLibrary/queries/person/PE02.md
    result = await vec.query_library(cdm, resource='person', query_name='PE02')
    df = vec.result_to_df(result)
    print("DataFrame from result:")
    print(df.head())

    result = await vec.execute(cdm, query='SELECT * from person;')
    print("Executing custom query:")
    df = vec.result_to_df(result)
    print("DataFrame from result:")
    print(df.head())

    # Close engine
    await engine.dispose()  # type: ignore


# Run the main function
asyncio.run(main())
```
## 🔥 FHIR to OMOP mapping
pyomop can load FHIR Bulk Export (NDJSON) files into an OMOP CDM database.
- Sample datasets: https://github.com/smart-on-fhir/sample-bulk-fhir-datasets
- Remove any non-FHIR files (for example, `log.ndjson`) from the input folder.
- Download OMOP vocabulary CSV files (for example from OHDSI Athena) and place them in a folder.
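
Before running the import, the input folder can be checked for stray non-FHIR files. This is a minimal sketch, not part of pyomop: the helper name `non_fhir_ndjson_files` is hypothetical, and it assumes that every FHIR NDJSON line is a JSON object carrying a `resourceType` key, so only the first non-empty line of each file is inspected.

```python
import json
from pathlib import Path


def non_fhir_ndjson_files(folder: str) -> list[str]:
    """Return names of .ndjson files whose first record has no resourceType."""
    suspicious = []
    for path in sorted(Path(folder).glob("*.ndjson")):
        first = None
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    first = line
                    break
        try:
            record = json.loads(first) if first else None
        except json.JSONDecodeError:
            record = None
        # Empty files, invalid JSON and objects without resourceType are flagged.
        if not isinstance(record, dict) or "resourceType" not in record:
            suspicious.append(path.name)
    return suspicious
```

Files reported by this check (for example `log.ndjson`) can then be moved out of the folder by hand.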
Run:
```bash
pyomop --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/
```
This creates an OMOP CDM in SQLite, loads the vocabulary files, and imports the FHIR data from the input folder. It then reconciles the vocabulary by mapping each `source_value` to a `concept_id`. The mapping is defined in the `mapping.example.json` file; the default mapping is [here](src/pyomop/mapping.default.json). Mapping happens in five steps, implemented [here](src/pyomop/loader.py).
* Example using postgres (Docker)
```bash
pyomop --dbtype pgsql --host localhost --user postgres --pw mypass --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/
```
* FHIR to data frame mapping is done with [FHIRy](https://github.com/dermatologist/fhiry)
* Most of the code for this functionality was written by an LLM agent. The prompts used are [here](notes/prompt.md)
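
To illustrate what reconciling a `source_value` to a `concept_id` means, here is a minimal, self-contained sketch using the standard-library `sqlite3` module. It is not pyomop's loader: the two tables are cut down to a few columns, the rows are toy data, the function name `reconcile_conditions` is hypothetical, and the match on `concept_code` within a single vocabulary is an assumed simplification. Unmatched rows keep `concept_id` 0, the OMOP convention for "no matching concept".

```python
import sqlite3


def reconcile_conditions(conn: sqlite3.Connection, vocabulary_id: str = "SNOMED") -> None:
    """Fill condition_concept_id by looking up condition_source_value in concept."""
    conn.execute(
        """
        UPDATE condition_occurrence
        SET condition_concept_id = COALESCE(
            (SELECT c.concept_id
             FROM concept AS c
             WHERE c.concept_code = condition_occurrence.condition_source_value
               AND c.vocabulary_id = ?),
            0)
        """,
        (vocabulary_id,),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_code TEXT,
        vocabulary_id TEXT
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        condition_source_value TEXT,
        condition_concept_id INTEGER DEFAULT 0
    );
    -- Toy rows for illustration only.
    INSERT INTO concept VALUES (201826, '44054006', 'SNOMED');
    INSERT INTO condition_occurrence (condition_occurrence_id, condition_source_value)
        VALUES (1, '44054006'), (2, 'not-a-code');
    """
)
reconcile_conditions(conn)
rows = conn.execute(
    "SELECT condition_occurrence_id, condition_concept_id "
    "FROM condition_occurrence ORDER BY condition_occurrence_id"
).fetchall()
print(rows)
```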
### Command-line
```text
  -c, --create                Create CDM tables (see --version).
  -t, --dbtype TEXT           Database Type for creating CDM (sqlite, mysql or
                              pgsql)
  -h, --host TEXT             Database host
  -p, --port TEXT             Database port
  -u, --user TEXT             Database user
  -w, --pw TEXT               Database password
  -v, --version TEXT          CDM version (cdm54 (default) or cdm6)
  -n, --name TEXT             Database name
  -s, --schema TEXT           Database schema (for pgsql)
  -i, --vocab TEXT            Folder with vocabulary files (csv) to import
  -f, --input DIRECTORY       Input folder with FHIR bundles or ndjson files.
  -e, --eunomia-dataset TEXT  Download and load Eunomia dataset (e.g.,
                              'GiBleed', 'Synthea')
  --eunomia-path TEXT         Path to store/find Eunomia datasets (uses
                              EUNOMIA_DATA_FOLDER env var if not specified)
  --connection-info           Display connection information for the database
                              (For R package compatibility)
  --mcp-server                Start MCP server for stdio interaction
  --pyhealth-path TEXT        Path to export PyHealth compatible CSV files
  --help                      Show this message and exit.
```
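
When the CLI is driven from a script, the argument list can be assembled programmatically. The sketch below is an assumption-laden helper, not part of pyomop: the function name `pyomop_import_command` and the example paths are hypothetical, and only flags listed above are used. It builds the argument list without executing anything.

```python
import shlex


def pyomop_import_command(vocab_dir, input_dir, dbtype="sqlite",
                          host=None, user=None, pw=None):
    """Build the argv list for a create-and-import run of the pyomop CLI."""
    args = ["pyomop", "--dbtype", dbtype]
    # Connection flags are only added when a value was supplied.
    for flag, value in (("--host", host), ("--user", user), ("--pw", pw)):
        if value is not None:
            args += [flag, value]
    args += ["--create", "--vocab", vocab_dir, "--input", input_dir]
    return args


command = pyomop_import_command("/data/omop-vocab", "/data/fhir",
                                dbtype="pgsql", host="localhost",
                                user="postgres", pw="mypass")
# Print a shell-safe rendering; pass the list to subprocess.run(command, check=True) to execute it.
print(shlex.join(command))
```

Passing the list form to `subprocess.run` avoids shell quoting issues; note that `~` is not expanded in that case, so use absolute paths or `os.path.expanduser`.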
## MCP Server
pyomop includes an MCP (Model Context Protocol) server that exposes tools for interacting with OMOP CDM databases. This allows MCP clients to create databases, load data, and execute SQL statements.
### Starting the MCP Server
To start the MCP server for stdio interaction:
```bash
# Using the main CLI
pyomop --mcp-server
```
#### Usage with MCP Clients
The server communicates via stdio and can be used with any MCP-compatible client. Example configuration for [vscode](/.vscode/mcp.json):
```json
{
"servers": {
"pyomop": {
"command": "uv",
"args": ["run", "pyomop", "--mcp-server"]
}
}
}
```
* *If the vocabulary is not installed locally or advanced vocabulary support is required from Athena, it is recommended to combine [omop_mcp](https://github.com/OHNLP/omop_mcp) with PyOMOP.*
#### Available MCP Tools
- **create_cdm**: Create an empty CDM database
- **create_eunomia**: Add Eunomia sample dataset
- **get_table_columns**: Get column names for a specific table
- **get_single_table_info**: Get detailed table information, including foreign keys
- **get_usable_table_names**: Get a list of all available table names
- **run_sql**: Execute SQL statements with error handling
* `create_cdm` and `create_eunomia` support only local SQLite databases, to avoid inadvertent data loss in production databases.
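
For readers curious about what an MCP client sends over stdio, the sketch below builds the JSON-RPC 2.0 messages for a typical session: `initialize`, `tools/list`, and `tools/call`. It only constructs the messages and does not start the server. The client name is made up, and calling `get_usable_table_names` with empty arguments is an assumption; the actual argument schema for each tool is reported by the server in its `tools/list` response.

```python
import json


def jsonrpc_request(request_id, method, params=None):
    """Serialise one JSON-RPC 2.0 request as a single line of JSON."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Handshake: the client announces its protocol version and identity.
initialize = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical client
})
# After the server replies, a client also sends a "notifications/initialized" notification.

# Ask the server which tools it exposes and what arguments they take.
list_tools = jsonrpc_request(2, "tools/list")

# Call a tool by name; the empty arguments object is an assumption.
call_tool = jsonrpc_request(3, "tools/call", {
    "name": "get_usable_table_names",
    "arguments": {},
})

for line in (initialize, list_tools, call_tool):
    print(line)  # each message would be written to the server's stdin, one per line
```

In practice an MCP-compatible client or SDK handles this exchange for you.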
#### Available Prompts
- **query_execution_steps**: Provides step-by-step guidance for executing database queries based on free text instructions
### Eunomia import and cohort creation
```
pyomop -e Synthea27Nj -v 5.4 --connection-info
pyomop -e GiBleed -v 5.3 --connection-info
```
## PyHealth and PLP Compatibility (For Machine Learning pipelines)
pyomop can export OMOP CDM data (to `--pyhealth-path`) in a format compatible with [PyHealth](https://github.com/sunlabuiuc/PyHealth), a machine learning library for healthcare data analysis ([see the notebook](/examples/pyhealth.ipynb) and the usage below). You can also export the connection information for use with R packages such as [PatientLevelPrediction](https://ohdsi.github.io/PatientLevelPrediction/) using the `--connection-info` option.
```bash
pyomop -e GiBleed -v 5.3 --connection-info --pyhealth-path ~/pyhealth
```
## Additional Tools
- **Convert FHIR to pandas DataFrame:** [fhiry](https://github.com/dermatologist/fhiry)
- **.NET and Golang OMOP CDM:** [.NET](https://github.com/dermatologist/omopcdm-dot-net), [Golang](https://github.com/E-Health/gocdm)
## Supported Databases
- PostgreSQL
- MySQL
- SQLite
## Environment Variables for Database Connection
You can configure database connection parameters using environment variables. These will be used as defaults by pyomop and the MCP server:
- `PYOMOP_DB`: Database type (`sqlite`, `mysql`, `pgsql`)
- `PYOMOP_HOST`: Database host
- `PYOMOP_PORT`: Database port
- `PYOMOP_USER`: Database user
- `PYOMOP_PW`: Database password
- `PYOMOP_SCHEMA`: Database schema (for PostgreSQL)
Example usage:
```bash
export PYOMOP_DB=pgsql
export PYOMOP_HOST=localhost
export PYOMOP_PORT=5432
export PYOMOP_USER=postgres
export PYOMOP_PW=mypass
export PYOMOP_SCHEMA=omop
```
These environment variables will be checked before assigning default values for database connection in pyomop and MCP server tools.
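
The lookup order described above (environment variable first, then a default) can be pictured with the following sketch. pyomop performs this lookup itself; the function `connection_settings` and the fallback values shown here are hypothetical and only illustrate the pattern. The dictionary keys mirror the `CdmEngineFactory` keyword arguments used in the usage example.

```python
import os


def connection_settings(env=None):
    """Resolve connection settings from PYOMOP_* variables, with assumed fallbacks."""
    env = os.environ if env is None else env
    return {
        "db": env.get("PYOMOP_DB", "sqlite"),          # assumed fallback
        "host": env.get("PYOMOP_HOST", "localhost"),   # assumed fallback
        "port": int(env.get("PYOMOP_PORT", "5432")),   # assumed fallback
        "user": env.get("PYOMOP_USER", ""),
        "pw": env.get("PYOMOP_PW", ""),
        "schema": env.get("PYOMOP_SCHEMA", ""),
    }


# Passing a plain dict instead of os.environ makes the behaviour easy to inspect.
settings = connection_settings({"PYOMOP_DB": "pgsql", "PYOMOP_PORT": "5433"})
print(settings)
```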
## Contributing
Pull requests are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md).
## Contributors
- [Bell Eapen](https://nuchange.ca) ([Twitter](https://twitter.com/beapen))