https://github.com/holgern/pyrdatasets

2293 datasets from various R packages packed as DataFrames through compressed pickle files

data-science datasets python rdatasets

Host: GitHub
URL: https://github.com/holgern/pyrdatasets
Owner: holgern
License: gpl-3.0
Created: 2019-11-12T14:16:08.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2024-06-19T15:04:21.000Z (11 months ago)
Last Synced: 2024-11-01T20:12:15.810Z (7 months ago)
Topics: data-science, datasets, python, rdatasets
Language: Python
Homepage: http://vincentarelbundock.github.io/Rdatasets/datasets.html
Size: 140 MB
Stars: 8
Watchers: 3
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# pyRdatasets

[![PyPi Version](https://img.shields.io/pypi/v/rdatasets.svg)](https://pypi.python.org/pypi/rdatasets/)

[![Anaconda-Server Badge](https://anaconda.org/conda-forge/rdatasets/badges/version.svg)](https://anaconda.org/conda-forge/rdatasets)

[![Anaconda-Server Badge](https://anaconda.org/conda-forge/rdatasets/badges/downloads.svg)](https://anaconda.org/conda-forge/rdatasets)

pyRdatasets is a collection of 2293 datasets taken from https://github.com/vincentarelbundock/Rdatasets.

The datasets were extracted from various R packages and are stored as gzip-compressed pickle files, each holding a pandas DataFrame.

A description of each dataset can be found at: http://vincentarelbundock.github.io/Rdatasets/datasets.html

All 2293 datasets ship with the package, so no internet connection is needed. The package is about 40 MB in size.
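
To illustrate the storage format described above, here is a minimal sketch of a gzip-compressed pickle round trip. It is not the package's own loading code: the helper names `save_gzip_pickle` and `load_gzip_pickle`, the file name `iris.pkl.gz`, and the use of a plain list of rows in place of a pandas DataFrame are all assumptions made so that the example runs with the standard library alone.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Stand-in for a dataset: two rows of the iris table.
# The real package stores pandas DataFrames; plain Python objects are
# used here (an assumption) so the sketch needs no third-party imports.
rows = [
    {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Species": "setosa"},
    {"Sepal.Length": 4.9, "Sepal.Width": 3.0, "Species": "setosa"},
]


def save_gzip_pickle(obj, path):
    """Pickle obj and write it through a gzip stream (hypothetical helper)."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzip_pickle(path):
    """Read a gzip-compressed pickle back into a Python object (hypothetical helper)."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "iris.pkl.gz"  # hypothetical file name
    save_gzip_pickle(rows, path)
    restored = load_gzip_pickle(path)

print(restored == rows)
```

With pandas installed, `DataFrame.to_pickle(path, compression="gzip")` and `pandas.read_pickle(path, compression="gzip")` perform the same round trip on a DataFrame. Pickle files can execute code when loaded, so only unpickle files from a source you trust.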

## Installation

```
pip install rdatasets
```

or

```
conda install conda-forge::rdatasets
```

## Usage

```
>>> import rdatasets
>>> dataset = rdatasets.data("iris")
>>> dataset
     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

[150 rows x 5 columns]
>>> rdatasets.data("forecast", "co2")
Could not read forecast/co2
Which item did you mean: ['gas', 'gold', 'taylor', 'wineind', 'woolyrnq']?
>>> rdatasets.data("forecast", "gas")
            time  value
0    1956.000000   1709
1    1956.083333   1646
2    1956.166667   1794
3    1956.250000   1878
4    1956.333333   2173
..           ...    ...
471  1995.250000  49013
472  1995.333333  56624
473  1995.416667  61739
474  1995.500000  66600
475  1995.583333  60054

[476 rows x 2 columns]
```

To print the description of a dataset:

```
import rdatasets

print(rdatasets.descr("iris"))
```

A summary of all datasets is available as a DataFrame:

```
import rdatasets

rdatasets.summary()
```

## Thanks to

The archive of datasets distributed with R: https://github.com/vincentarelbundock/Rdatasets

## Pre-commit-config

### Installation

```
$ pip install pre-commit
```

### Using homebrew:

```
$ brew install pre-commit
```

```
$ pre-commit --version
pre-commit 2.10.0
```

### Install the git hook scripts

```
$ pre-commit install
```

### Run against all the files

```
pre-commit run --all-files
pre-commit run --show-diff-on-failure --color=always --all-files
```

### Update package rev in pre-commit yaml

```bash
pre-commit autoupdate
pre-commit run --show-diff-on-failure --color=always --all-files
```
