https://github.com/moleculehub/moleculedatasets.jl
A collection of cheminformatics datasets
- Host: GitHub
- URL: https://github.com/moleculehub/moleculedatasets.jl
- Owner: MoleculeHub
- License: MIT
- Created: 2023-12-23T08:46:50.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-09-28T05:04:29.000Z (9 months ago)
- Last Synced: 2025-09-28T07:08:21.310Z (9 months ago)
- Topics: cadd, cheminformatics, chemistry, drug-discovery, julia, julia-language
- Language: Julia
- Homepage:
- Size: 2.69 MB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# MoleculeDatasets.jl
[Code Style: Blue](https://github.com/JuliaDiff/BlueStyle)
[Aqua QA](https://github.com/JuliaTesting/Aqua.jl)
A Julia package for easily downloading and accessing popular cheminformatics datasets.
## Installation
```julia
using Pkg
Pkg.add("MoleculeDatasets")
```
## Quick Start
```julia
using MoleculeDatasets
# Download and load a dataset
data = get_mol_dataset("esol")
```
## Available Datasets
See [dataset_info.jl](https://github.com/MoleculeHub/MoleculeDatasets.jl/blob/main/src/dataset_info.jl) for the datasets currently registered in the package.
## Adding a Dataset
To add a new dataset to the package, edit the `MOL_DATASETS` dictionary in [`src/dataset_info.jl`](src/dataset_info.jl). Each dataset entry should include:
**For local datasets:**
```julia
"dataset_key" => Dict(
    "name" => "Dataset Display Name",
    "description" => "Brief description of the dataset",
    "filepath" => "data/filename.csv",
    "format" => "csv",
    "size" => "file size",
    "type" => "local",
    "reference" => "Full citation",
    "doi" => "DOI if available",
    "website" => "URL if available"
)
```
**For remote datasets:**
```julia
"dataset_key" => Dict(
    "name" => "Dataset Display Name",
    "description" => "Brief description of the dataset",
    "url" => "https://example.com/dataset.csv",
    "format" => "csv",
    "size" => "file size",
    "type" => "remote",
    "reference" => "Full citation",
    "doi" => "DOI if available",
    "website" => "URL if available"
)
```
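Before opening a pull request, it can help to check that a new entry carries every key from the templates above. The following is a minimal sketch of such a check. `COMMON_KEYS` and `missing_keys` are hypothetical names used for illustration and are not part of MoleculeDatasets.jl, and the assumption that all listed keys are required is taken from the templates rather than from the package's code.

```julia
# Hypothetical helper, NOT part of MoleculeDatasets.jl.
# Keys shared by the local and remote templates shown above.
const COMMON_KEYS = [
    "name", "description", "format", "size",
    "type", "reference", "doi", "website",
]

# Return the template keys that `entry` is missing.
function missing_keys(entry::AbstractDict)
    kind = get(entry, "type", nothing)
    # Local entries point at a file path, remote entries at a URL.
    location_key = kind == "local" ? "filepath" :
                   kind == "remote" ? "url" : nothing
    location_key === nothing && error("\"type\" must be \"local\" or \"remote\"")
    required = vcat(COMMON_KEYS, location_key)
    return [k for k in required if !haskey(entry, k)]
end

# Example entry following the remote template (placeholder values).
entry = Dict(
    "name" => "Example Dataset",
    "description" => "Placeholder entry for illustration",
    "url" => "https://example.com/dataset.csv",
    "format" => "csv",
    "size" => "file size",
    "type" => "remote",
    "reference" => "Full citation",
    "doi" => "DOI if available",
    "website" => "URL if available",
)

println(missing_keys(entry))
```

An empty result means the entry matches the template. Removing a key such as `"url"` from the entry makes it show up in the returned list.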
## API Reference
### Dataset Functions
- `get_mol_dataset(name; output_dir="data", force_download=false, verbose=true)`: Download and load a dataset as a DataFrame
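As a usage sketch, the call below passes every keyword argument from the signature above explicitly. The behavior noted in the comments is inferred from the argument names and is an assumption, not confirmed by the package documentation. The `"datasets"` directory name is arbitrary, and running the example requires the package to be installed and, presumably, network access.

```julia
using MoleculeDatasets

# Keyword names come from the signature above; their effects are assumed
# from the names alone.
data = get_mol_dataset(
    "esol";
    output_dir="datasets",   # assumed: directory where files are stored
    force_download=true,     # assumed: fetch again even if already present
    verbose=false,           # assumed: suppress progress messages
)

# The result is documented as a DataFrame, so the usual accessors apply.
println(size(data))             # (rows, columns)
println(first(names(data), 3))  # up to three column names
```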