https://github.com/lycosystem/lydata-package
Python package for programmatic access to the lyDATA tables as well as utilities to handle the datasets.
- Host: GitHub
- URL: https://github.com/lycosystem/lydata-package
- Owner: lycosystem
- License: mit
- Created: 2025-06-18T12:34:41.000Z
- Default Branch: main
- Last Pushed: 2025-09-04T09:23:02.000Z
- Last Synced: 2025-09-04T10:37:20.252Z
- Language: Python
- Homepage: https://lydata.readthedocs.io
- Size: 566 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Python Library for Loading and Manipulating lyDATA Tables
[Release](https://github.com/lycosystem/lydata-package/actions/workflows/release.yml)
[Tests](https://github.com/lycosystem/lydata-package/actions/workflows/tests.yml)
[Documentation](https://lydata.readthedocs.io/stable/?badge=stable)
[Coverage](https://htmlpreview.github.io/?https://github.com/lycosystem/lydata-package/blob/python-coverage-comment-action-data/htmlcov/index.html)
This repository provides a Python library for loading, manipulating, and validating the datasets available on [lyDATA](https://github.com/lycosystem/lydata).
> [!WARNING]
> This Python library is still highly experimental!
>
> Also, it has recently been spun off from the repository of datasets, [lyDATA](https://github.com/lycosystem/lydata), and some things might still not work as expected.
## Installation
### 1. Install from PyPI
You can install the library from PyPI using pip:
```bash
pip install lydata
```
### 2. Install from Source
If you want to install the library from source, you can clone the repository and install it using pip:
```bash
git clone https://github.com/lycosystem/lydata-package
cd lydata-package
pip install -e .
```
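To check that either installation method worked, one option is to ask the standard library's `importlib.metadata` for the installed version. This is a generic sketch, not part of lydata's API; it assumes the distribution is named `lydata`, as in the `pip install` command above, and `installed_lydata_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_lydata_version() -> Optional[str]:
    """Hypothetical helper: return the installed lydata version, or None if absent."""
    try:
        # Assumes the distribution name used by pip is "lydata".
        return version("lydata")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_lydata_version() or "lydata is not installed")
```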
## Usage
The first and most common use case is probably listing and loading the published datasets:
```python
>>> import lydata
>>> for dataset_spec in lydata.available_datasets(
... year=2023, # show all datasets added in 2023
... ref="61a17e", # a specific commit hash, tag, or branch
... ):
... print(dataset_spec.name)
2023-clb-multisite
2023-isb-multisite

>>> # load_datasets() returns a generator of datasets that include oropharyngeal tumor patients
>>> first_dataset = next(lydata.load_datasets(subsite="oropharynx"))
>>> print(first_dataset.head())
... # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
patient ... positive_dissected
# ... contra
id institution sex ... III IV V
0 P011 Centre Léon Bérard male ... 0.0 0.0 0.0
1 P012 Centre Léon Bérard female ... 0.0 0.0 0.0
2 P014 Centre Léon Bérard male ... 0.0 0.0 NaN
3 P015 Centre Léon Bérard male ... 0.0 0.0 NaN
4 P018 Centre Léon Bérard male ... NaN NaN NaN
[5 rows x 82 columns]
```
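The listed names appear to follow a `<year>-<institution>-<subsite>` pattern. Assuming that pattern holds, a small helper (hypothetical, not part of lydata) can split a name into its parts, for instance to see which institutions contributed datasets in a given year. Keep in mind that the `subsite` filter of `load_datasets` selects datasets that *include* patients with that subsite, so a `multisite` dataset may still match `subsite="oropharynx"`.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    """Hypothetical container for the parts of a dataset name."""

    year: int
    institution: str
    subsite: str


def parse_dataset_name(name: str) -> DatasetName:
    """Split a name like '2023-clb-multisite' into year, institution, and subsite.

    Assumes the '<year>-<institution>-<subsite>' pattern; any further hyphens
    are kept as part of the subsite.
    """
    year, institution, subsite = name.split("-", maxsplit=2)
    return DatasetName(int(year), institution, subsite)


names = ["2023-clb-multisite", "2023-isb-multisite"]
print([parse_dataset_name(n).institution for n in names])  # ['clb', 'isb']
```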
Since the three-level header of the tables is a little unwieldy at times, we also provide some shortcodes via a custom pandas accessor, `ly`. As soon as `lydata` is imported, it can be used like this:
```python
>>> print(first_dataset.ly.age)
... # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
0 67
1 62
...
261 60
262 60
Name: (patient, #, age), Length: 263, dtype: int64
```
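Conceptually, the accessor translates a short name such as `age` into the full three-level column key, here `(patient, #, age)`. lydata registers this as a pandas accessor; the sketch below only mimics the name translation with plain dictionaries so that it runs without pandas. The `SHORTCODES` table and the `ShortcodeAccessor` class are hypothetical and cover only a few of the `patient` columns visible in the tables above.

```python
from __future__ import annotations

# Hypothetical shortcode table: short name -> full three-level column key.
# lydata's real mapping may differ and covers many more columns.
SHORTCODES: dict[str, tuple[str, str, str]] = {
    "id": ("patient", "#", "id"),
    "institution": ("patient", "#", "institution"),
    "sex": ("patient", "#", "sex"),
    "age": ("patient", "#", "age"),
}


class ShortcodeAccessor:
    """Minimal stand-in for an accessor like `df.ly`, over a dict of columns."""

    def __init__(self, columns: dict[tuple[str, str, str], list]) -> None:
        self._columns = columns

    def __getattr__(self, name: str) -> list:
        # Only called when normal attribute lookup fails: translate the short
        # name into its full key and return that column, if it exists.
        key = SHORTCODES.get(name)
        if name.startswith("_") or key is None or key not in self._columns:
            raise AttributeError(f"no column for shortcode {name!r}")
        return self._columns[key]


# Two rows taken from the example output above.
table = {
    ("patient", "#", "id"): ["P011", "P012"],
    ("patient", "#", "age"): [67, 62],
}
ly = ShortcodeAccessor(table)
print(ly.age)  # [67, 62]
```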
We have also implemented `Q` and `C` objects, inspired by Django, that make querying the tables easier:
```python
>>> from lydata import C
>>> # select patients younger than 50 that are not HPV positive (includes NaNs)
>>> query_result = first_dataset.ly.query((C("age") < 50) & ~(C("hpv") == True))
>>> (query_result.ly.age < 50).all()
np.True_
>>> (query_result.ly.hpv == False).all()
np.True_
```
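In lydata, these objects are applied to whole tables through `ly.query`. The sketch below is a simplified, hypothetical re-creation over plain dicts (one dict per patient) that shows only the underlying idea: comparing a column reference `C` to a value produces a query `Q`, and `Q` objects combine via the overloaded `&`, `|`, and `~` operators. It is not lydata's implementation, and it uses `None` as a stand-in for missing values.

```python
from __future__ import annotations

import operator
from typing import Any, Callable, Mapping

# One patient record: column name -> value (None marks a missing value).
Row = Mapping[str, Any]


class Q:
    """Hypothetical composable query wrapping a predicate over a single row."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate

    def __and__(self, other: Q) -> Q:
        return Q(lambda row: self.predicate(row) and other.predicate(row))

    def __or__(self, other: Q) -> Q:
        return Q(lambda row: self.predicate(row) or other.predicate(row))

    def __invert__(self) -> Q:
        return Q(lambda row: not self.predicate(row))

    def __call__(self, row: Row) -> bool:
        return self.predicate(row)


class C:
    """Hypothetical column reference; comparing it to a value yields a Q."""

    def __init__(self, name: str) -> None:
        self.name = name

    def _compare(self, op: Callable[[Any, Any], bool], value: Any) -> Q:
        def predicate(row: Row) -> bool:
            cell = row.get(self.name)
            # A missing value never satisfies a comparison, so negating the
            # comparison with ~ keeps rows where the value is missing.
            return cell is not None and op(cell, value)

        return Q(predicate)

    def __lt__(self, value: Any) -> Q:
        return self._compare(operator.lt, value)

    def __gt__(self, value: Any) -> Q:
        return self._compare(operator.gt, value)

    def __eq__(self, value: Any) -> Q:  # type: ignore[override]
        return self._compare(operator.eq, value)


# Toy records, purely illustrative.
patients = [
    {"age": 45, "hpv": False},
    {"age": 38, "hpv": True},
    {"age": 47, "hpv": None},
    {"age": 62, "hpv": False},
]
query = (C("age") < 50) & ~(C("hpv") == True)  # noqa: E712
print([p for p in patients if query(p)])
# [{'age': 45, 'hpv': False}, {'age': 47, 'hpv': None}]
```

Because a missing `hpv` value never satisfies `C("hpv") == True`, negating that comparison keeps such patients, which matches the "(includes NaNs)" comment in the example above.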
For more details and further examples or use cases, have a look at the [official documentation](https://lydata.readthedocs.org/).