{"id":19140792,"url":"https://github.com/codait/pardata","last_synced_at":"2025-12-14T15:35:51.090Z","repository":{"id":41887156,"uuid":"313761295","full_name":"CODAIT/pardata","owner":"CODAIT","description":null,"archived":false,"fork":false,"pushed_at":"2025-11-18T04:00:40.000Z","size":47892,"stargazers_count":17,"open_issues_count":46,"forks_count":6,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-11-18T06:08:38.577Z","etag":null,"topics":["artificial-intelligence","data-science","dataset","machine-learning","python"],"latest_commit_sha":null,"homepage":"https://pardata.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CODAIT.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-11-17T22:42:05.000Z","updated_at":"2024-02-11T02:01:18.000Z","dependencies_parsed_at":"2023-10-03T01:03:07.858Z","dependency_job_id":"de23bcdb-005d-444b-a210-28fd4a9dd9fa","html_url":"https://github.com/CODAIT/pardata","commit_stats":null,"previous_names":["codait/pydax"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/CODAIT/pardata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fpardata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fpardata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fpardata/releases","manifests
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fpardata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CODAIT","download_url":"https://codeload.github.com/CODAIT/pardata/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fpardata/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27730563,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-14T02:00:11.348Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","data-science","dataset","machine-learning","python"],"created_at":"2024-11-09T07:18:51.033Z","updated_at":"2025-12-14T15:35:51.074Z","avatar_url":"https://github.com/CODAIT.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":".. role:: file(literal)\n.. role:: func(literal)\n\n.. readme-start\n\nParData\n=======\n\n.. image:: https://img.shields.io/pypi/v/pardata.svg\n   :target: https://pypi.python.org/pypi/pardata\n   :alt: PyPI\n\n.. image:: https://img.shields.io/pypi/pyversions/pardata\n   :target: https://pypi.python.org/pypi/pardata\n   :alt: PyPI - Python Version\n\n.. image:: https://img.shields.io/pypi/implementation/pardata\n   :target: https://pypi.python.org/pypi/pardata\n   :alt: PyPI - Implementation\n\n.. 
image:: https://badges.gitter.im/codait/pardata.svg\n   :target: https://gitter.im/codait/pardata\n   :alt: Gitter\n\n.. image:: https://github.com/codait/pardata/workflows/Runtime%20Tests/badge.svg\n   :target: https://github.com/CODAIT/pardata/commit/master\n   :alt: Runtime Tests\n\n.. image:: https://github.com/codait/pardata/workflows/Lint/badge.svg\n   :target: https://github.com/CODAIT/pardata/commit/master\n   :alt: Lint\n\n.. image:: https://github.com/codait/pardata/workflows/Docs/badge.svg\n   :target: https://github.com/CODAIT/pardata/commit/master\n   :alt: Docs\n\n.. image:: https://github.com/codait/pardata/workflows/Development%20Environment/badge.svg\n   :target: https://github.com/CODAIT/pardata/commit/master\n   :alt: Development Environment\n\nParData (homophone of *partake*) is a Python API that enables data consumers and distributors to easily use and share\ndatasets, and establishes a standard for exchanging data assets. It enables:\n\n- a data scientist to have a simpler and more unified way to begin working with a wide range of datasets, and\n- a data distributor to have a consistent, safe, and open source way to share datasets with interested communities.\n\n.. sidebar:: Quick Example\n\n   .. code-block:: python\n\n      \u003e\u003e\u003e import pardata\n      \u003e\u003e\u003e pardata.list_all_datasets()\n      {'claim_sentences_search': ('1.0.2',),\n       ..., 'wikitext103': ('1.0.1',)}\n      \u003e\u003e\u003e pardata.load_dataset('wikitext103')\n      {...}  # Content of the dataset\n\nInstall the Package \u0026 its Dependencies\n--------------------------------------\n\nTo install the latest version of ParData, run\n\n.. code-block:: console\n\n   $ pip install pardata\n\nAlternatively, if you have downloaded the source, switch to the source directory (same directory as this README file,\n``cd /path/to/pardata-source``) and run\n\n.. 
code-block:: console\n\n   $ pip install -U .\n\nQuick Start\n-----------\n\nImport the package and load a dataset. ParData will download the `WikiText-103\n\u003chttps://developer.ibm.com/exchanges/data/all/wikitext-103/\u003e`__ dataset (version ``1.0.1``) if it's not already\ndownloaded, and then load it.\n\n.. code-block:: python\n\n   import pardata\n   wikitext103_data = pardata.load_dataset('wikitext103')\n\nView available ParData datasets and their versions.\n\n.. code-block:: python\n\n   \u003e\u003e\u003e pardata.list_all_datasets()\n   {'claim_sentences_search': ('1.0.2',), ..., 'wikitext103': ('1.0.1',)}\n\nTo view ParData's global configuration, such as the default data directory, use :func:`pardata.get_config`.\n\n.. code-block:: python\n\n   \u003e\u003e\u003e pardata.get_config()\n   Config(DATADIR=PosixPath('dir/to/download/load/from'), ..., DATASET_SCHEMA_FILE_URL='file/to/load/datasets/from')\n\nBy default, :func:`pardata.load_dataset` downloads to and loads from\n:file:`~/.pardata/data/\u003cdataset-name\u003e/\u003cdataset-version\u003e/`. To change the default data directory, use :func:`pardata.init`.\n\n.. code-block:: python\n\n   pardata.init(DATADIR='new/dir/to/download/load/from')\n\nLoad a previously downloaded dataset using :func:`pardata.load_dataset`. With the new default data directory set, ParData now\nsearches for the `Groningen Meaning Bank \u003chttps://developer.ibm.com/exchanges/data/all/groningen-meaning-bank/\u003e`__\ndataset (version ``1.0.2``) in :file:`new/dir/to/download/load/from/gmb/1.0.2/`.\n\n.. 
code-block:: python\n\n   gmb_data = pardata.load_dataset('gmb', version='1.0.2', download=False)  # assumes the GMB dataset was already downloaded\n\nTo learn more about ParData, check out `the documentation \u003chttps://pardata.readthedocs.io\u003e`__ and the\n`tutorial \u003chttps://pardata.readthedocs.io#tutorial\u003e`__.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodait%2Fpardata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodait%2Fpardata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodait%2Fpardata/lists"}