{"id":25740136,"url":"https://github.com/datadotworld/data.world-py","last_synced_at":"2025-04-06T12:11:54.096Z","repository":{"id":41190263,"uuid":"79499370","full_name":"datadotworld/data.world-py","owner":"datadotworld","description":"Python package for data.world","archived":false,"fork":false,"pushed_at":"2024-04-19T12:50:33.000Z","size":2173,"stargazers_count":101,"open_issues_count":23,"forks_count":27,"subscribers_count":42,"default_branch":"main","last_synced_at":"2025-03-30T10:07:54.441Z","etag":null,"topics":["api-client","datasets","dwstruct-t01-dist","open-data","reference-implementation"],"latest_commit_sha":null,"homepage":"https://data.world/integrations/python","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datadotworld.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-19T21:57:19.000Z","updated_at":"2025-01-29T17:50:46.000Z","dependencies_parsed_at":"2023-12-05T03:31:16.042Z","dependency_job_id":"c7e4e3d3-7597-4488-8ca7-68deffff8577","html_url":"https://github.com/datadotworld/data.world-py","commit_stats":{"total_commits":95,"total_committers":17,"mean_commits":5.588235294117647,"dds":0.736842105263158,"last_synced_commit":"9cf755f0c93d47507356fd2cae40bb7f7351eb4e"},"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datadotworld%2Fdata.world-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datadotworld%2Fdata.world-py/tags","releases
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datadotworld%2Fdata.world-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datadotworld%2Fdata.world-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datadotworld","download_url":"https://codeload.github.com/datadotworld/data.world-py/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247478324,"owners_count":20945266,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-client","datasets","dwstruct-t01-dist","open-data","reference-implementation"],"created_at":"2025-02-26T08:36:37.984Z","updated_at":"2025-04-06T12:11:54.076Z","avatar_url":"https://github.com/datadotworld.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"=============\ndata.world-py\n=============\n\nA python library for working with data.world datasets.\n\nThis library makes it easy for data.world users to pull and work with data stored on data.world.\nAdditionally, the library provides convenient wrappers for data.world APIs, allowing users to create and update\ndatasets, add and modify files, etc, and possibly implement entire apps on top of data.world.\n\n\nQuick start\n===========\n\nInstall\n-------\n\nYou can install it using ``pip`` directly from PyPI::\n\n    pip install datadotworld\n\nOptionally, you can install the library including pandas support::\n\n    pip install datadotworld[pandas]\n\nIf you use ``conda`` to manage your python distribution, you can install 
from the community-maintained `conda-forge \u003chttps://conda-forge.github.io/\u003e`_ channel::\n\n    conda install -c conda-forge datadotworld-py\n\n\nConfigure\n---------\n\nThis library requires a data.world API authentication token to work.\n\nYour authentication token can be obtained on data.world once you enable Python under\n`Integrations \u003e Python \u003chttps://data.world/integrations/python\u003e`_.\n\nTo configure the library, run the following command::\n\n    dw configure\n\n\nAlternatively, tokens can be provided via the ``DW_AUTH_TOKEN`` environment variable.\nOn macOS or Unix machines, run (replacing ``\u003cYOUR_TOKEN\u003e`` below with the token obtained earlier)::\n\n    export DW_AUTH_TOKEN=\u003cYOUR_TOKEN\u003e\n\nLoad a dataset\n--------------\n\nThe ``load_dataset()`` function facilitates maintaining copies of datasets on the local filesystem.\nIt will download a given dataset's `datapackage \u003chttp://specs.frictionlessdata.io/data-package/\u003e`_\nand store it under ``~/.dw/cache``. When used subsequently, ``load_dataset()`` will use the copy stored on disk and will\nwork offline, unless it's called with ``force_update=True`` or ``auto_update=True``. ``force_update=True`` will overwrite your local copy unconditionally. ``auto_update=True`` will only overwrite your local copy if a newer version of the dataset is available on data.world.\n\nOnce loaded, a dataset (data and metadata) can be conveniently accessed via the object returned by ``load_dataset()``.\n\nStart by importing the ``datadotworld`` module:\n\n.. 
code-block:: python\n\n    intro_dataset = dw.load_dataset('jonloyens/an-intro-to-dataworld-dataset')\n\nDataset objects allow access to data via three different properties: ``raw_data``, ``tables`` and ``dataframes``.\nEach of these properties is a mapping (dict) whose values are of type ``bytes``, ``list`` and ``pandas.DataFrame``,\nrespectively. Values are lazy-loaded and cached once loaded. Their keys are the names of the files\ncontained in the dataset.\n\nFor example:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e intro_dataset.dataframes\n    LazyLoadedDict({\n        'changelog': LazyLoadedValue(\u003cpandas.DataFrame\u003e),\n        'datadotworldbballstats': LazyLoadedValue(\u003cpandas.DataFrame\u003e),\n        'datadotworldbballteam': LazyLoadedValue(\u003cpandas.DataFrame\u003e)})\n\n**IMPORTANT**: Not all files in a dataset are tabular; some are therefore exposed via ``raw_data`` only.\n\nTables are lists of rows, each represented by a mapping (dict) of column names to their respective values.\n\nFor example:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e stats_table = intro_dataset.tables['datadotworldbballstats']\n    \u003e\u003e\u003e stats_table[0]\n    OrderedDict([('Name', 'Jon'),\n                 ('PointsPerGame', Decimal('20.4')),\n                 ('AssistsPerGame', Decimal('1.3'))])\n\nYou can also review the metadata associated with a file or the entire dataset, using the ``describe()`` function.\nFor example:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e intro_dataset.describe()\n    {'homepage': 'https://data.world/jonloyens/an-intro-to-dataworld-dataset',\n     'name': 'jonloyens_an-intro-to-dataworld-dataset',\n     'resources': [{'format': 'csv',\n       'name': 'changelog',\n       'path': 'data/ChangeLog.csv'},\n      {'format': 'csv',\n       'name': 'datadotworldbballstats',\n       'path': 'data/DataDotWorldBBallStats.csv'},\n      {'format': 'csv',\n       'name': 'datadotworldbballteam',\n       'path': 'data/DataDotWorldBBallTeam.csv'}]}\n    \u003e\u003e\u003e intro_dataset.describe('datadotworldbballstats')\n    {'format': 'csv',\n     'name': 'datadotworldbballstats',\n     'path': 'data/DataDotWorldBBallStats.csv',\n     'schema': {'fields': [{'name': 'Name', 'title': 'Name', 'type': 'string'},\n                           {'name': 'PointsPerGame',\n                            'title': 'PointsPerGame',\n                            'type': 'number'},\n                           {'name': 'AssistsPerGame',\n                            'title': 'AssistsPerGame',\n                            'type': 'number'}]}}\n\nQuery a dataset\n---------------\n\nThe ``query()`` function allows datasets to be queried live using ``SQL`` or ``SPARQL`` query languages.\n\nTo query a dataset, invoke the ``query()`` function.\nFor example:\n\n.. code-block:: python\n\n    results = dw.query('jonloyens/an-intro-to-dataworld-dataset', 'SELECT * FROM DataDotWorldBBallStats')\n\nQuery result objects allow access to the data via ``raw_data``, ``table`` and ``dataframe`` properties, of type\n``json``, ``list`` and ``pandas.DataFrame``, respectively.\n\nFor example:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e results.dataframe\n          Name  PointsPerGame  AssistsPerGame\n    0      Jon           20.4             1.3\n    1      Rob           15.5             8.0\n    2   Sharon           30.1            11.2\n    3     Alex            8.2             0.5\n    4  Rebecca           12.3            17.0\n    5   Ariane           18.1             3.0\n    6    Bryon           16.0             8.5\n    7     Matt           13.0             2.1\n\n\nTables are lists of rows, each represented by a mapping (dict) of column names to their respective values.\nFor example:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e results.table[0]\n    OrderedDict([('Name', 'Jon'),\n                 ('PointsPerGame', Decimal('20.4')),\n                 ('AssistsPerGame', Decimal('1.3'))])\n\nTo query using ``SPARQL``, invoke ``query()`` with ``query_type='sparql'``; otherwise, the query is assumed\nto be a ``SQL`` query.\n\nJust like in the dataset case, you can view the metadata associated with a query result using the ``describe()``\nfunction.\n\nFor example:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e results.describe()\n    {'fields': [{'name': 'Name', 'type': 'string'},\n                {'name': 'PointsPerGame', 'type': 'number'},\n                {'name': 'AssistsPerGame', 'type': 'number'}]}\n\nWork with files\n---------------\n\nThe ``open_remote_file()`` function allows you to write data to or read data from a file in a\ndata.world dataset.\n\nWriting files\n.............\n\nThe object that is returned from the ``open_remote_file()`` call is similar to a file handle that\nwould be used to write to a local file - it has a ``write()`` method, and contents sent to that\nmethod will be written to the file remotely.\n\n.. 
code-block:: python\n\n        \u003e\u003e\u003e import datadotworld as dw\n        \u003e\u003e\u003e\n        \u003e\u003e\u003e with dw.open_remote_file('username/test-dataset', 'test.txt') as w:\n        ...   w.write(\"this is a test.\")\n        \u003e\u003e\u003e\n\nOf course, writing a text file isn't the primary use case for data.world - you want to write your\ndata! The object returned by ``open_remote_file()`` should be usable anywhere you could normally\nuse a local file handle in write mode - so you can use it to serialize the contents of a pandas\n``DataFrame`` to a CSV file...\n\n.. code-block:: python\n\n        \u003e\u003e\u003e import pandas as pd\n        \u003e\u003e\u003e df = pd.DataFrame({'foo':[1,2,3,4],'bar':['a','b','c','d']})\n        \u003e\u003e\u003e with dw.open_remote_file('username/test-dataset', 'dataframe.csv') as w:\n        ...   df.to_csv(w, index=False)\n\nOr, to write a series of ``dict`` objects as a JSON Lines file (one JSON document per line)...\n\n.. code-block:: python\n\n        \u003e\u003e\u003e import json\n        \u003e\u003e\u003e with dw.open_remote_file('username/test-dataset', 'test.jsonl') as w:\n        ...   json.dump({'foo':42, 'bar':\"A\"}, w)\n        ...   w.write(\"\\n\")\n        ...   json.dump({'foo':13, 'bar':\"B\"}, w)\n        ...   w.write(\"\\n\")\n        \u003e\u003e\u003e\n\nOr to write a series of ``dict`` objects as a CSV...\n\n.. code-block:: python\n\n        \u003e\u003e\u003e import csv\n        \u003e\u003e\u003e with dw.open_remote_file('username/test-dataset', 'test.csv') as w:\n        ...   csvw = csv.DictWriter(w, fieldnames=['foo', 'bar'])\n        ...   csvw.writeheader()\n        ...   csvw.writerow({'foo':42, 'bar':\"A\"})\n        ...   csvw.writerow({'foo':13, 'bar':\"B\"})\n        \u003e\u003e\u003e\n\nAnd finally, you can write binary data by streaming ``bytes`` or ``bytearray`` objects, if you open the\nfile in binary mode...\n\n.. 
code-block:: python\n\n        \u003e\u003e\u003e with dw.open_remote_file('username/test-dataset', 'test.txt', mode='wb') as w:\n        ...   w.write(bytes([100,97,116,97,46,119,111,114,108,100]))\n\nReading files\n.............\n\nYou can also read data from a file in a similar fashion:\n\n.. code-block:: python\n\n        \u003e\u003e\u003e with dw.open_remote_file('username/test-dataset', 'test.txt', mode='r') as r:\n        ...   print(r.read())\n\n\nReading from the file into common parsing libraries works naturally, too - when opened in 'r' mode, the\nfile object acts as an iterator of the lines in the file:\n\n.. code-block:: python\n\n        \u003e\u003e\u003e with dw.open_remote_file('username/test-dataset', 'test.txt', mode='r') as r:\n        ...   csvr = csv.DictReader(r)\n        ...   for row in csvr:\n        ...      print(row['column a'], row['column b'])\n\n\nReading binary files works naturally, too - when opened in 'rb' mode, ``read()`` returns the contents of\nthe file as a byte array, and the file object acts as an iterator of bytes:\n\n.. code-block:: python\n\n        \u003e\u003e\u003e with dw.open_remote_file('username/test-dataset', 'test', mode='rb') as r:\n        ...   data = r.read()\n\n\nAdditional API Features\n-----------------------\n\nFor a complete list of available API operations, see the\n`official documentation \u003chttps://docs.data.world/documentation/api/\u003e`_.\n\nPython wrappers are implemented by the ``ApiClient`` class. To obtain an instance, simply call ``api_client()``.\nFor example:\n\n.. 
code-block:: python\n\n    client = dw.api_client()\n\nThe client currently implements the following functions:\n\n* ``create_dataset``\n* ``update_dataset``\n* ``replace_dataset``\n* ``get_dataset``\n* ``delete_dataset``\n* ``add_files_via_url``\n* ``append_records``\n* ``upload_files``\n* ``upload_file``\n* ``delete_files``\n* ``sync_files``\n* ``download_dataset``\n* ``download_file``\n* ``get_user_data``\n* ``fetch_contributing_datasets``\n* ``fetch_liked_datasets``\n* ``fetch_datasets``\n* ``fetch_contributing_projects``\n* ``fetch_liked_projects``\n* ``fetch_projects``\n* ``get_project``\n* ``create_project``\n* ``update_project``\n* ``replace_project``\n* ``add_linked_dataset``\n* ``remove_linked_dataset``\n* ``delete_project``\n* ``get_insight``\n* ``get_insights_for_project``\n* ``create_insight``\n* ``replace_insight``\n* ``update_insight``\n* ``delete_insight``\n* ``search_resources``\n* ``create_new_tables``\n* ``create_new_connections``\n\nFor a few examples of what the ``ApiClient`` can be used for, see below.\n\nAdd files from URL\n..................\n\nThe ``add_files_via_url()`` function can be used to add files to a dataset from a URL.\nThis can be done by specifying ``files`` as a dictionary where each key is the desired file name and each value is a dictionary containing ``url``, ``description`` and ``labels``.\n\nFor example:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e client = dw.api_client()\n    \u003e\u003e\u003e client.add_files_via_url('username/test-dataset', files={'sample.xls': {'url':'http://www.sample.com/sample.xls', 'description': 'sample doc', 'labels': ['raw data']}})\n\nAppend records to stream\n........................\n\nThe ``append_records()`` function allows you to append JSON data to a data stream associated with a dataset. Streams do not need to be created in advance; they are created automatically the first time a ``streamId`` is used in an append operation.\n\nFor example:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e client = dw.api_client()\n    \u003e\u003e\u003e client.append_records('username/test-dataset','streamId', {'data': 'data'})\n\nContents of a stream will appear as part of the respective dataset as a .jsonl file.\n\nYou can find more about those functions using ``help(client)``\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatadotworld%2Fdata.world-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatadotworld%2Fdata.world-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatadotworld%2Fdata.world-py/lists"}