https://github.com/scriptogre/biomedical-data
Biomedical databases as Parquet files (DrugCentral, Pharos, Open Targets, AACT Clinical Trials, MONDO, STRING, CORUM)
https://github.com/scriptogre/biomedical-data
Last synced: about 2 months ago
JSON representation
Biomedical databases as Parquet files (DrugCentral, Pharos, Open Targets, AACT Clinical Trials, MONDO, STRING, CORUM)
- Host: GitHub
- URL: https://github.com/scriptogre/biomedical-data
- Owner: scriptogre
- Created: 2026-04-22T12:45:52.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-22T13:46:45.000Z (about 2 months ago)
- Last Synced: 2026-04-22T15:03:33.251Z (about 2 months ago)
- Language: Python
- Size: 15.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Biomedical Data (Parquet)
Zstd-compressed Parquet exports of biomedical databases for use with [DuckDB](https://duckdb.org/). These are **curated subsets** tailored for pharmacology research (not full database mirrors).
Automated monthly via GitHub Actions. Download from [Releases](https://github.com/scriptogre/biomedical-data/releases).
## Databases
| Database | What's included | Source |
|---|---|---|
| **DrugCentral** | All tables from the official schema | [Public PostgreSQL](https://unmtid-dbs.net/) |
| **Pharos/TCRD** | All tables from the MySQL dump | [FigShare](https://figshare.com/search?q=pharos+database+mysql) |
| **Open Targets** | Selected datasets (targets, diseases, drugs, associations, mechanisms, indications) | [EBI FTP](https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/latest/) |
| **Clinical Trials (AACT)** | All tables from the daily flat file export | [AACT](https://aact.ctti-clinicaltrials.org/downloads) |
| **MONDO** | Full ontology (nodes + edges) | [GitHub releases](https://github.com/monarch-initiative/mondo/releases) |
| **STRING** | Human (taxon 9606) only: protein info, aliases, physical interactions | [StringDB](https://string-db.org/) |
| **CORUM** | Full complex dataset | [Helmholtz Munich](https://mips.helmholtz-muenchen.de/corum/) |
## Usage
```bash
# Download a database
gh release download drugcentral/latest -R scriptogre/biomedical-data
tar xzf drugcentral.tar.gz -C data/drugcentral/
```
```python
import duckdb
conn = duckdb.connect()
conn.execute("SELECT * FROM read_parquet('data/drugcentral/structures.parquet') LIMIT 10")
```
## License
Each database retains its original license. This repository only provides format conversion.