https://github.com/codait/pardata
- Host: GitHub
- URL: https://github.com/codait/pardata
- Owner: CODAIT
- License: apache-2.0
- Created: 2020-11-17T22:42:05.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2024-10-27T22:25:52.000Z (18 days ago)
- Last Synced: 2024-10-28T02:18:40.257Z (18 days ago)
- Topics: artificial-intelligence, data-science, dataset, machine-learning, python
- Language: Python
- Homepage: https://pardata.readthedocs.io/en/latest/
- Size: 45.7 MB
- Stars: 17
- Watchers: 12
- Forks: 6
- Open Issues: 47
Metadata Files:
- Readme: README.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
- Authors: AUTHORS.rst
.. role:: file(literal)
.. role:: func(literal)

.. readme-start

ParData
=======

.. image:: https://img.shields.io/pypi/v/pardata.svg
   :target: https://pypi.python.org/pypi/pardata
   :alt: PyPI

.. image:: https://img.shields.io/pypi/pyversions/pardata
   :target: https://pypi.python.org/pypi/pardata
   :alt: PyPI - Python Version

.. image:: https://img.shields.io/pypi/implementation/pardata
   :target: https://pypi.python.org/pypi/pardata
   :alt: PyPI - Implementation

.. image:: https://badges.gitter.im/codait/pardata.svg
   :target: https://gitter.im/codait/pardata
   :alt: Gitter

.. image:: https://github.com/codait/pardata/workflows/Runtime%20Tests/badge.svg
   :target: https://github.com/CODAIT/pardata/commit/master
   :alt: Runtime Tests

.. image:: https://github.com/codait/pardata/workflows/Lint/badge.svg
   :target: https://github.com/CODAIT/pardata/commit/master
   :alt: Lint

.. image:: https://github.com/codait/pardata/workflows/Docs/badge.svg
   :target: https://github.com/CODAIT/pardata/commit/master
   :alt: Docs

.. image:: https://github.com/codait/pardata/workflows/Development%20Environment/badge.svg
   :target: https://github.com/CODAIT/pardata/commit/master
   :alt: Development Environment

ParData (homophone of *partake*) is a Python API that enables data consumers and distributors to easily use and share
datasets, and establishes a standard for exchanging data assets. It enables:

- a data scientist to have a simpler and more unified way to begin working with a wide range of datasets, and
- a data distributor to have a consistent, safe, and open source way to share datasets with interested communities.

.. sidebar:: Quick Example

   .. code-block:: python

      >>> import pardata
      >>> pardata.list_all_datasets()
      {'claim_sentences_search': ('1.0.2',),
      ..., 'wikitext103': ('1.0.1',)}
      >>> pardata.load_dataset('wikitext103')
      {...} # Content of the dataset

Install the Package & its Dependencies
--------------------------------------

To install the latest version of ParData, run

.. code-block:: console

   $ pip install pardata

Alternatively, if you have downloaded the source, switch to the source directory (same directory as this README file,
``cd /path/to/pardata-source``) and run

.. code-block:: console

   $ pip install -U .

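To confirm that the installation succeeded, one option is to ask the Python standard library which version of the
distribution is installed. This is a minimal sketch, not something ParData provides: ``installed_version`` is a
hypothetical helper name, and it relies only on ``importlib.metadata`` (Python 3.8 or later).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of ``package``, or None if it is absent.

    Hypothetical helper for illustration; not part of ParData.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None when pardata is not installed.
print(installed_version('pardata'))
```

A ``None`` result suggests that ``pip`` installed the package into a different environment than the interpreter
running the check.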
Quick Start
-----------

Import the package and load a dataset. ParData will download the WikiText-103 dataset (version ``1.0.1``) if it's not
already downloaded, and then load it.

.. code-block:: python

   import pardata
   wikitext103_data = pardata.load_dataset('wikitext103')

View available ParData datasets and their versions.

.. code-block:: python

   >>> pardata.list_all_datasets()
   {'claim_sentences_search': ('1.0.2',), ..., 'wikitext103': ('1.0.1',)}

To view your globally set configs for ParData, such as your default data directory, use :func:`pardata.get_config`.

.. code-block:: python

   >>> pardata.get_config()
   Config(DATADIR=PosixPath('dir/to/download/load/from'), ..., DATASET_SCHEMA_FILE_URL='file/to/load/datasets/from')

By default, :func:`pardata.load_dataset` downloads to and loads from
:file:`~/.pardata/data///`. To change the default data directory, use :func:`pardata.init`.

.. code-block:: python

   pardata.init(DATADIR='new/dir/to/download/load/from')

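The mapping returned by ``pardata.list_all_datasets()`` pairs each dataset name with a tuple of version strings, as in
the output shown earlier. As a minimal sketch, assuming versions are dot-separated integers such as ``1.0.2``, the
newest listed version of a dataset could be picked as follows. The helper ``latest_version`` and the sample mapping are
hypothetical and not part of ParData; in practice the mapping would come from ``pardata.list_all_datasets()``.

```python
from typing import Mapping, Tuple


def latest_version(datasets: Mapping[str, Tuple[str, ...]], name: str) -> str:
    """Return the highest version listed for ``name``.

    Hypothetical helper; assumes dot-separated integer versions.
    """
    if name not in datasets:
        raise KeyError(f'unknown dataset: {name!r}')
    # Compare numerically so that '1.0.10' ranks above '1.0.2'.
    return max(datasets[name], key=lambda v: tuple(int(p) for p in v.split('.')))


# Sample mapping in the shape shown above. The 'example_dataset' entry is
# made up purely to illustrate a dataset with more than one version.
sample = {
    'claim_sentences_search': ('1.0.2',),
    'wikitext103': ('1.0.1',),
    'example_dataset': ('1.0.2', '1.0.10'),
}

print(latest_version(sample, 'wikitext103'))
print(latest_version(sample, 'example_dataset'))
```

Comparing integer tuples rather than raw strings avoids the lexicographic pitfall in which ``'1.0.2'`` would sort after
``'1.0.10'``.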
Load a previously downloaded dataset using :func:`pardata.load_dataset`. With the new default data dir set, ParData now
searches for the Groningen Meaning Bank dataset (version ``1.0.2``) in
:file:`new/dir/to/download/load/from/gmb/1.0.2/`.

.. code-block:: python

   # Assumes the GMB dataset was already downloaded
   gmb_data = pardata.load_dataset('gmb', version='1.0.2', download=False)

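The lookup path in this example suggests a layout of one subdirectory per dataset name and version under the data
directory. That layout can be sketched with ``pathlib``. This is only an illustration under that assumption:
``expected_dataset_dir`` is a hypothetical helper, not a ParData function, and ParData's own lookup logic may differ.

```python
from pathlib import PurePosixPath


def expected_dataset_dir(datadir: str, name: str, version: str) -> PurePosixPath:
    """Build ``<datadir>/<name>/<version>``.

    Hypothetical helper; the layout is assumed from the example path above.
    """
    return PurePosixPath(datadir) / name / version


gmb_dir = expected_dataset_dir('new/dir/to/download/load/from', 'gmb', '1.0.2')
print(gmb_dir)  # new/dir/to/download/load/from/gmb/1.0.2
```

``PurePosixPath`` only manipulates the path string and never touches the file system, so the sketch behaves the same
whether or not the dataset has been downloaded.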
To learn more about ParData, check out `the documentation <https://pardata.readthedocs.io/en/latest/>`__ and the
tutorial.