{"id":13936883,"url":"https://github.com/iamaziz/PyDataset","last_synced_at":"2025-07-19T22:33:21.091Z","repository":{"id":37390999,"uuid":"50794193","full_name":"iamaziz/PyDataset","owner":"iamaziz","description":"Instant access to many datasets in Python.","archived":false,"fork":false,"pushed_at":"2022-03-25T16:24:01.000Z","size":15673,"stargazers_count":932,"open_issues_count":13,"forks_count":86,"subscribers_count":34,"default_branch":"master","last_synced_at":"2024-04-25T16:03:16.412Z","etag":null,"topics":["data-science","datasets","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iamaziz.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-01-31T20:43:28.000Z","updated_at":"2024-03-31T07:55:11.000Z","dependencies_parsed_at":"2022-07-20T12:02:39.326Z","dependency_job_id":null,"html_url":"https://github.com/iamaziz/PyDataset","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamaziz%2FPyDataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamaziz%2FPyDataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamaziz%2FPyDataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamaziz%2FPyDataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iamaziz","download_url":"https://codeload.github.com/iamaziz/PyDataset/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind"
:"github","repositories_count":226693903,"owners_count":17667757,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","datasets","python"],"created_at":"2024-08-07T23:03:05.214Z","updated_at":"2024-11-27T05:30:41.275Z","avatar_url":"https://github.com/iamaziz.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"## PyDataset\n [![PyPI version](https://badge.fury.io/py/pydataset.svg)](http://badge.fury.io/py/pydataset)\n\nProvides instant access to many datasets right from Python, as pandas DataFrames.\n\n### What?\n\nThe idea is simple. There are various datasets available out there, but they are scattered across different places on the web.\nIs there a quick way (in Python) to access them instantly, without the hassle of searching for, downloading, and reading each one?\nPyDataset tries to address that question :)\n\n\n### Usage:\n\nStart by importing `data()`:\n```python\nfrom pydataset import data\n```\n- To load a dataset:\n```python\ntitanic = data('titanic')\n```\n- To display the documentation of a dataset:\n```python\ndata('titanic', show_doc=True)\n```\n- To see the available datasets:\n```python\ndata()\n```\n\nThat's it.\nSee more [examples](examples).\n\n\n### Why?\n\nIn `R`, there is a very easy and immediate way to access multiple statistical datasets,\nwith almost no effort.\nAll it takes is one line: `\u003e data(dataset_name)`.\nThis makes life easier for quick prototyping and testing.\nWell, I am jealous that Python does not have similar functionality.\nThus, the aim of `pydataset` is to fill that gap.\n\nCurrently, `pydataset` has about 757 (mostly numerical) datasets, based on `RDatasets`.\nIn the future, I plan to scale it to include a larger set of datasets.\nFor example,\n1) include textual data for NLP-related tasks, and\n2) allow adding a new dataset to the in-module repository.\n\n\n### Installation:\n\n`$ pip install pydataset`\n\n#### Uninstall:\n\n- `$ pip uninstall pydataset`\n- `$ rm -rf $HOME/.pydataset`\n\n### Changelog\n\n**0.2.0**\n\n- Add dataset search by name similarity.\n- Example:\n\n```python\n\u003e\u003e\u003e data('heat')\nDid you mean:\nWheat, heart, Heating, Yeast, eidat, badhealth, deaths, agefat, hla, heptathlon, azt\n```\n\n**0.1.1**\n\n- Fix: add support for Windows and fix file paths, issue #1\n\n### Dependency:\n- pandas\n\n### Miscellaneous:\n\n- Tested on OS X and Linux (Debian).\n- Supports both Python 2 (2.7.11) and Python 3 (3.5.1).\n\n\n#### TODO:\n- add textual datasets (e.g. NLTK stuff).\n- add sample generators.\n\n\n#### Thanks to:\n\n- [RDatasets](https://github.com/vincentarelbundock/Rdatasets): R's datasets collection.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamaziz%2FPyDataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiamaziz%2FPyDataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamaziz%2FPyDataset/lists"}