{"id":23473000,"url":"https://github.com/caltechlibrary/py_dataset","last_synced_at":"2025-04-14T18:42:25.848Z","repository":{"id":57455833,"uuid":"175684474","full_name":"caltechlibrary/py_dataset","owner":"caltechlibrary","description":"Python package of dataset (https://github.com/caltechlibrary/dataset) for working with JSON objects as collections on disc","archived":false,"fork":false,"pushed_at":"2023-09-27T00:40:21.000Z","size":426818,"stargazers_count":2,"open_issues_count":3,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-12T00:56:58.146Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://caltechlibrary.github.io/py_dataset","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/caltechlibrary.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2019-03-14T19:15:15.000Z","updated_at":"2025-03-16T15:16:37.000Z","dependencies_parsed_at":"2023-09-27T05:27:08.438Z","dependency_job_id":null,"html_url":"https://github.com/caltechlibrary/py_dataset","commit_stats":{"total_commits":93,"total_committers":3,"mean_commits":31.0,"dds":"0.22580645161290325","last_synced_commit":"72e38a3c6a5993f0c953ec274a974dd5e2791c58"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caltechlibrary%2Fpy_dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caltechlibrary%2Fpy_dataset/tags","releases_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/caltechlibrary%2Fpy_dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caltechlibrary%2Fpy_dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/caltechlibrary","download_url":"https://codeload.github.com/caltechlibrary/py_dataset/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248938380,"owners_count":21186396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-24T17:14:41.076Z","updated_at":"2025-04-14T18:42:25.805Z","avatar_url":"https://github.com/caltechlibrary.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# py_dataset   [![DOI](https://data.caltech.edu/badge/175684474.svg)](https://data.caltech.edu/badge/latestdoi/175684474)\n\npy_dataset is a Python wrapper for the [dataset](https://github.com/caltechlibrary/dataset) \nlibdataset, a C shared library for working with \n[JSON](https://en.wikipedia.org/wiki/JSON) objects as collections. \nCollections can be stored on disc or in Cloud Storage.  JSON objects \nare stored in collections using a pairtree as plain UTF-8 text files.\nThis means the objects can be accessed with common \nUnix text processing tools as well as most programming languages.\n\nThis package wraps all [dataset](docs/) operations such \nas initialization of collections, creation, \nreading, updating and deleting JSON objects in the collection. 
Some of \nits enhanced features include the ability to generate data \n[frames](docs/frame.html) as well as the ability to \nimport and export JSON objects to and from CSV files.\n\npy_dataset is released under a [BSD](LICENSE) style license.\n\n## Features\n\n[dataset](docs/) supports \n\n- Basic storage actions ([create](docs/create.html), [read](docs/read.html), [update](docs/update.html) and [delete](docs/delete.html))\n- Listing of collection [keys](docs/keys.html) (including filtering and sorting)\n- Import/export of [CSV](docs/csv.html) files\n- The ability to reshape data by performing simple object [joins](docs/join.html)\n- The ability to create data [frames](docs/frames.html) from collections based on key lists and [dot paths](docs/dotpath.html) into the JSON objects stored\n\nSee [docs](docs/) for details.\n\n### Limitations of _dataset_\n\n_dataset_ has many limitations; some are listed below:\n\n- it is not a multi-process, multi-user data store (it's files on \"disc\" without locking)\n- it is not a replacement for a repository management system\n- it is not a general purpose database system\n- it does not supply version control on collections or objects\n\n## Install\n\nAvailable via pip (`pip install py_dataset`) or by downloading this repo and\ntyping `python setup.py install`. 
This repo includes dataset shared C libraries\ncompiled for Windows, Mac, and Linux, and the appropriate library will be used\nautomatically.\n\n## Quick Tutorial\n\nThis module provides the functionality of the _dataset_ command line tool as a Python 3.10 module.\nOnce installed, try out the following commands to see if everything is in order (or to get familiar with\n_dataset_).\n\nThe \"#\" comments don't have to be typed in; they are there to explain the commands as you type them.\nStart the tour by launching Python3 in interactive mode.\n\n```shell\n    python3\n```\n\nThen run the following Python commands.\n\n```python\n    from py_dataset import dataset\n    # Almost all the commands require the collection_name as the first parameter, \n    # so we're storing that name in c_name for convenience.\n    c_name = \"a_tour_of_dataset.ds\"\n\n    # Let's create a dataset collection. We use the method called \n    # 'init'; it returns True on success or False otherwise.\n    dataset.init(c_name)\n\n    # Let's check whether our collection exists: True if it exists,\n    # False if it doesn't.\n    dataset.status(c_name)\n\n    # Let's count the records in our collection (should be zero)\n    cnt = dataset.count(c_name)\n    print(cnt)\n\n    # Let's read all the keys in the collection (should be an empty list)\n    keys = dataset.keys(c_name)\n    print(keys)\n\n    # Now let's add a record to our collection. To create a record we need to know\n    # the collection name (e.g. c_name), the key (must be a string) and have a \n    # record (i.e. 
a dict literal or variable)\n    key = \"one\"\n    record = {\"one\": 1}\n    # If create returns False, we can check the last error message \n    # with the 'error_message' method\n    if not dataset.create(c_name, key, record):\n        print(dataset.error_message())\n\n    # Let's count and list the keys in our collection; we should see a count of '1' and a key of 'one'\n    dataset.count(c_name)\n    keys = dataset.keys(c_name)\n    print(keys)\n\n    # We can read the record we stored using the 'read' method.\n    new_record, err = dataset.read(c_name, key)\n    if err != '':\n        print(err)\n    else:\n        print(new_record)\n\n    # Let's modify new_record and update the record in our collection\n    new_record[\"two\"] = 2\n    if not dataset.update(c_name, key, new_record):\n        print(dataset.error_message())\n\n    # Let's print out the record we stored using the read method.\n    # read returns a tuple, so we're printing the first element.\n    print(dataset.read(c_name, key)[0])\n\n    # Finally we can remove (delete) a record from our collection\n    if not dataset.delete(c_name, key):\n        print(dataset.error_message())\n\n    # We should now have a count of zero records\n    cnt = dataset.count(c_name)\n    print(cnt)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaltechlibrary%2Fpy_dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcaltechlibrary%2Fpy_dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaltechlibrary%2Fpy_dataset/lists"}