An open API service indexing awesome lists of open source software.

https://github.com/limix/pandas-plink

PLINK reader for Python.
https://github.com/limix/pandas-plink

bed-format file-format genotype plink reader

Last synced: about 1 month ago
JSON representation

PLINK reader for Python.

Awesome Lists containing this project

README

        

# pandas-plink

Pandas-plink is a Python package for reading [PLINK binary file format](https://www.cog-genomics.org/plink2/formats)
and realized relationship matrices (PLINK or GCTA).
The file reading is taken place via [lazy loading](https://en.wikipedia.org/wiki/Lazy_loading), meaning that it saves up memory by actually reading only the genotypes that are actually accessed by the user.

Notable changes can be found at the [CHANGELOG.md](https://raw.githubusercontent.com/limix/pandas-plink/master/CHANGELOG.md).

## Install

It can be installed using [pip](https://pypi.python.org/pypi/pip):

```bash
pip install pandas-plink
```

Alternatively it can be intalled via [conda](http://conda.pydata.org/docs/index.html):

```bash
conda install -c conda-forge pandas-plink
```

## Usage

It is as simple as

```python
>>> from pandas_plink import read_plink1_bin
>>> G = read_plink1_bin("chr11.bed", "chr11.bim", "chr11.fam", verbose=False)
>>> print(G)

dask.array
Coordinates:
* sample (sample) object 'B001' 'B002' 'B003' ... 'B012' 'B013' 'B014'
* variant (variant) object '11_316849996' '11_316874359' ... '11_345698259'
father (sample) >> print(G.sel(sample="B003", variant="11_316874359").values)
0.0
>>> print(G.a0.sel(variant="11_316874359").values)
G
>>> print(G.sel(sample="B003", variant="11_316941526").values)
2.0
>>> print(G.a1.sel(variant="11_316941526").values)
C
```
Portions of the genotype will be read as the user access them.

Covariance matrices can also be read very easily.
Example:

```python
>>> from pandas_plink import read_rel
>>> K = read_rel("plink2.rel.bin")
>>> print(K)

array([[ 0.885782, 0.233846, -0.186339, -0.009789, -0.138897, 0.287779,
0.269977, -0.231279, -0.095472, -0.213979],
[ 0.233846, 1.077493, -0.452858, 0.192877, -0.186027, 0.171027,
0.406056, -0.013149, -0.131477, -0.134314],
[-0.186339, -0.452858, 1.183312, -0.040948, -0.146034, -0.204510,
-0.314808, -0.042503, 0.296828, -0.011661],
[-0.009789, 0.192877, -0.040948, 0.895360, -0.068605, 0.012023,
0.057827, -0.192152, -0.089094, 0.174269],
[-0.138897, -0.186027, -0.146034, -0.068605, 1.183237, 0.085104,
-0.032974, 0.103608, 0.215769, 0.166648],
[ 0.287779, 0.171027, -0.204510, 0.012023, 0.085104, 0.956921,
0.065427, -0.043752, -0.091492, -0.227673],
[ 0.269977, 0.406056, -0.314808, 0.057827, -0.032974, 0.065427,
0.714746, -0.101254, -0.088171, -0.063964],
[-0.231279, -0.013149, -0.042503, -0.192152, 0.103608, -0.043752,
-0.101254, 1.423033, -0.298255, -0.074334],
[-0.095472, -0.131477, 0.296828, -0.089094, 0.215769, -0.091492,
-0.088171, -0.298255, 0.910274, -0.024663],
[-0.213979, -0.134314, -0.011661, 0.174269, 0.166648, -0.227673,
-0.063964, -0.074334, -0.024663, 0.914586]])
Coordinates:
* sample_0 (sample_0) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
* sample_1 (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
fid (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
iid (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
>>> print(K.values)
[[ 0.89 0.23 -0.19 -0.01 -0.14 0.29 0.27 -0.23 -0.10 -0.21]
[ 0.23 1.08 -0.45 0.19 -0.19 0.17 0.41 -0.01 -0.13 -0.13]
[-0.19 -0.45 1.18 -0.04 -0.15 -0.20 -0.31 -0.04 0.30 -0.01]
[-0.01 0.19 -0.04 0.90 -0.07 0.01 0.06 -0.19 -0.09 0.17]
[-0.14 -0.19 -0.15 -0.07 1.18 0.09 -0.03 0.10 0.22 0.17]
[ 0.29 0.17 -0.20 0.01 0.09 0.96 0.07 -0.04 -0.09 -0.23]
[ 0.27 0.41 -0.31 0.06 -0.03 0.07 0.71 -0.10 -0.09 -0.06]
[-0.23 -0.01 -0.04 -0.19 0.10 -0.04 -0.10 1.42 -0.30 -0.07]
[-0.10 -0.13 0.30 -0.09 0.22 -0.09 -0.09 -0.30 0.91 -0.02]
[-0.21 -0.13 -0.01 0.17 0.17 -0.23 -0.06 -0.07 -0.02 0.91]]
```

Please, refer to the [pandas-plink documentation](https://pandas-plink.readthedocs.io/) for more information.

## Authors

* [Danilo Horta](https://github.com/horta)

## License

This project is licensed under the [MIT License](https://raw.githubusercontent.com/limix/pandas-plink/master/LICENSE.md).