https://github.com/openml/openml-python
OpenML's Python API for a World of Data and More 💫
- Host: GitHub
- URL: https://github.com/openml/openml-python
- Owner: openml
- License: other
- Created: 2014-03-20T10:46:41.000Z (almost 12 years ago)
- Default Branch: develop
- Last Pushed: 2025-04-01T16:01:56.000Z (11 months ago)
- Last Synced: 2025-04-03T02:13:58.969Z (11 months ago)
- Topics: benchmarking, data, datascience, machine-learning, meta-learning, openml, python, tabular-data
- Language: Python
- Homepage: http://openml.github.io/openml-python/
- Size: 194 MB
- Stars: 291
- Watchers: 21
- Forks: 147
- Open Issues: 129
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Citation: CITATION.cff
# OpenML-Python
## The Python API for a World of Data and More :dizzy:
[Releases](https://github.com/openml/openml-python/releases) | [PyPI](https://pypi.org/project/openml/) | [Downloads](https://pepy.tech/project/openml) | [License: BSD 3-Clause](https://opensource.org/licenses/BSD-3-Clause)
[Installation](https://openml.github.io/openml-python/main/#how-to-get-openml-for-python) | [Documentation](https://openml.github.io/openml-python) | [Contribution guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md)
OpenML-Python provides an easy-to-use Python interface to [OpenML](http://openml.org), an online platform for open science collaboration in machine learning.
It can download data from OpenML and upload data to it, including datasets and machine learning experiment results.
## :joystick: Minimal Example
Use the following code to get the [credit-g](https://www.openml.org/search?type=data&sort=runs&status=active&id=31) [dataset](https://docs.openml.org/concepts/data/):
```python
import openml
dataset = openml.datasets.get_dataset("credit-g") # or by ID get_dataset(31)
X, y, categorical_indicator, attribute_names = dataset.get_data(target="class")
```
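`get_data` returns `categorical_indicator` and `attribute_names` as parallel lists, with one entry per feature column. A minimal sketch of pairing them up to separate categorical columns from the rest, for instance before choosing an encoder. The helper name `split_by_kind` and the sample values are illustrative and not part of OpenML-Python:

```python
from typing import List, Sequence, Tuple


def split_by_kind(
    attribute_names: Sequence[str],
    categorical_indicator: Sequence[bool],
) -> Tuple[List[str], List[str]]:
    """Return (categorical, non_categorical) column names from two parallel sequences."""
    if len(attribute_names) != len(categorical_indicator):
        raise ValueError("expected one indicator per attribute name")
    pairs = list(zip(attribute_names, categorical_indicator))
    categorical = [name for name, is_cat in pairs if is_cat]
    non_categorical = [name for name, is_cat in pairs if not is_cat]
    return categorical, non_categorical


# Illustrative stand-ins for the values returned by dataset.get_data(...)
names = ["checking_status", "duration", "purpose", "credit_amount"]
flags = [True, False, True, False]
categorical, non_categorical = split_by_kind(names, flags)
print(categorical)
print(non_categorical)
```

In real use, pass the `attribute_names` and `categorical_indicator` values from the example above instead of the stand-in lists.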
Get a [task](https://docs.openml.org/concepts/tasks/) for [supervised classification on credit-g](https://www.openml.org/search?type=task&id=31&source_data.data_id=31):
```python
import openml
task = openml.tasks.get_task(31)
dataset = task.get_dataset()
X, y, categorical_indicator, attribute_names = dataset.get_data(target=task.target_name)
# get splits for the first fold of 10-fold cross-validation
train_indices, test_indices = task.get_train_test_split_indices(fold=0)
```
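The two index arrays hold positional row indices into the dataset. A small sanity check, sketched here with plain Python sequences, confirms that a split is disjoint and within range before the indices are used to slice `X` and `y`. The function name `check_split` and the toy indices are hypothetical; with the default DataFrame output, rows would typically be selected with `X.iloc[train_indices]`.

```python
from typing import Iterable


def check_split(train_indices: Iterable[int], test_indices: Iterable[int], n_rows: int) -> None:
    """Raise ValueError if the split overlaps or points outside the data."""
    train = {int(i) for i in train_indices}
    test = {int(i) for i in test_indices}
    overlap = train & test
    if overlap:
        raise ValueError(f"{len(overlap)} rows appear in both train and test")
    out_of_range = [i for i in train | test if not 0 <= i < n_rows]
    if out_of_range:
        raise ValueError(f"{len(out_of_range)} indices fall outside 0..{n_rows - 1}")


# Toy indices standing in for task.get_train_test_split_indices(fold=0)
check_split(train_indices=[0, 1, 2, 3], test_indices=[4, 5], n_rows=6)
```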
Use an [OpenML benchmarking suite](https://docs.openml.org/concepts/benchmarking/) to get a curated list of machine-learning tasks:
```python
import openml
suite = openml.study.get_suite("amlb-classification-all") # Get a curated list of tasks for classification
for task_id in suite.tasks:
    task = openml.tasks.get_task(task_id)
```
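Each `get_task` call may trigger a download, so it can help to try the loop on a few tasks first. A sketch, assuming the loader is passed in as a callable (in real use, `openml.tasks.get_task`) so the helper itself has no network dependency. `load_tasks`, the stub loader, and the sample IDs are hypothetical:

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Optional


def load_tasks(
    task_ids: Iterable[int],
    get_task: Callable[[int], Any],
    limit: Optional[int] = None,
) -> Dict[int, Any]:
    """Fetch up to `limit` tasks and return them keyed by task ID."""
    selected = islice(task_ids, limit)  # limit=None keeps every ID
    return {task_id: get_task(task_id) for task_id in selected}


# Stub loader for demonstration; in real use pass openml.tasks.get_task
def fake_get_task(task_id: int) -> str:
    return f"task-{task_id}"


tasks = load_tasks([31, 32, 37], get_task=fake_get_task, limit=2)
print(tasks)
```

With a real suite, this would be called as `load_tasks(suite.tasks, openml.tasks.get_task, limit=3)`.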
## :magic_wand: Installation
OpenML-Python supports Python 3.8 through 3.13 and runs on Linux, macOS, and Windows.
You can install OpenML-Python with:
```bash
pip install openml
```
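To confirm the install from Python, the standard library's `importlib.metadata` can report the installed version. A minimal sketch: the helper names are hypothetical, and the supported range is taken from the statement above rather than queried from the package.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def python_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Check an interpreter version against the 3.8 to 3.13 range stated in this README."""
    return (3, 8) <= tuple(version[:2]) <= (3, 13)


print("openml version:", installed_version("openml"))
print("Python supported:", python_supported())
```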
## :page_facing_up: Citing OpenML-Python
If you use OpenML-Python in a scientific publication, we would appreciate a reference to the following paper:
[Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas Müller, Joaquin Vanschoren, Frank Hutter
**OpenML-Python: an extensible Python API for OpenML**
Journal of Machine Learning Research, 22(100):1–5, 2021](https://www.jmlr.org/papers/v22/19-920.html)
BibTeX entry:
```bibtex
@article{JMLR:v22:19-920,
  author  = {Matthias Feurer and Jan N. van Rijn and Arlind Kadra and Pieter Gijsbers and Neeratyoy Mallik and Sahithya Ravi and Andreas Müller and Joaquin Vanschoren and Frank Hutter},
  title   = {OpenML-Python: an extensible Python API for OpenML},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {100},
  pages   = {1--5},
  url     = {http://jmlr.org/papers/v22/19-920.html}
}
```