{"id":26491736,"url":"https://github.com/danielc92/python-data-guide","last_synced_at":"2026-04-02T18:52:19.680Z","repository":{"id":100324088,"uuid":"188955533","full_name":"danielc92/python-data-guide","owner":"danielc92","description":" This repository contains detailed instructional notebooks which examine various well known data formats using python modules exclusively. I have setup a notebook for each data format within the notebooks directory of this repo. ","archived":false,"fork":false,"pushed_at":"2019-06-24T04:37:55.000Z","size":48734,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-07-13T08:50:27.412Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danielc92.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-28T04:46:10.000Z","updated_at":"2019-07-19T01:11:26.000Z","dependencies_parsed_at":"2023-05-13T19:19:20.308Z","dependency_job_id":null,"html_url":"https://github.com/danielc92/python-data-guide","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/danielc92/python-data-guide","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielc92%2Fpython-data-guide","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielc92%2Fpython-data-guide/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielc92%2Fpython-dat
a-guide/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielc92%2Fpython-data-guide/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danielc92","download_url":"https://codeload.github.com/danielc92/python-data-guide/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielc92%2Fpython-data-guide/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269277967,"owners_count":24389969,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-07T02:00:09.698Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-20T08:50:28.670Z","updated_at":"2025-12-30T21:49:15.535Z","avatar_url":"https://github.com/danielc92.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Accessing Data with Python Guide\nThis repository contains detailed instructional notebooks which examine a wide array of data formats using `python3` modules. I have created a notebook for each data format/theme within the `notebooks` directory of this repo. The notebooks will attempt to go through the process of reading in, exploring, manipulating and exporting the data format of interest. 
The following data formats are currently available:\n\n**Common formats**\n\n- EXCEL (.xlsx, .xls)\n- CSV (.csv, .tsv, .psv)\n- JSON (.json)\n\n**Markup Languages**\n\n- HTML (.html, .htm)\n- XML (.xml)\n\n**Geography**\n\n- SHAPEFILE (.shp)\n\n**Media**\n\n- IMAGE (.jpg, .jpeg, .png, .bmp, .tiff)\n- VIDEO (.mp4, .avi)\n\n**Databases**\n\n- SQL (SQLite, PostgreSQL, MySQL, Microsoft SQL Server, Oracle Server)\n- NOSQL (MongoDB)\n\n\n# Setup\nThis project was developed on *Ubuntu 19.04* with *Python 3.7*.\n\n**Setting up virtual environment**\n\n```sh\n# Get the location of the python3 binary\nwhich python3\n```\n\n**Create a virtual environment**\n\n```sh\n# specify the python path collected above; replace name_of_virtualenv with a name of your choice\nvirtualenv --python=/usr/bin/python3 name_of_virtualenv\n```\n\n**Activate the virtual environment**\n\n```sh\n# activate using the source command\nsource name_of_virtualenv/bin/activate\n```\n\n**Python requirements**\n\n```sh\npip install pandas opencv-python lxml bs4 fiona descartes matplotlib jupyter\n```\n\n# Formats\nThe following data formats are available in this guide.\n\n|Data format|Repo location|Link|Available|\n| ----- | ----- | ----- | ----- |\n|EXCEL|`notebooks/EXCEL.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/EXCEL.ipynb)|`yes`|\n|CSV|`notebooks/CSV.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/CSV.ipynb)|`yes`|\n|JSON|`notebooks/JSON.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/JSON.ipynb)|`yes`|\n|HTML|`notebooks/HTML.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/HTML.ipynb)|`yes`|\n|XML|`notebooks/XML.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/XML.ipynb)|`yes`|\n|Shapefile|`notebooks/Shapefiles.ipynb`|[click 
here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/Shapefiles.ipynb)|`yes`|\n|IMAGE|`notebooks/IMAGE.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/IMAGE.ipynb)|`yes`|\n|VIDEO|`notebooks/VIDEO.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/VIDEO.ipynb)|`yes`|\n|SQL|`notebooks/SQL.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/SQL.ipynb)|`yes`|\n|NOSQL|`notebooks/NOSQL.ipynb`|[click here](https://github.com/danielc92/python-data-guide/blob/master/notebooks/NOSQL.ipynb)|`yes`|\n\n# Contributors\n- Daniel Corcoran\n\n# Sources\n**Data Sources**\n\n- [XML file source](https://data.gov.au/dataset/ds-dga-4b7b5b50-774f-4416-90ce-5b7df85ff8ce/details?q=XML)\n- [Shapefiles for Australia 2018 release by ABS](https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.003July%202018?OpenDocument)\n- [flights database file](https://www.dropbox.com/s/a2wax843eniq12g/flights.db?dl=0)\n- [illegal tipping video (youtube)](https://www.youtube.com/watch?v=pTXQXp1mDkQ)\n- [Image from Pixabay](https://pixabay.com/photos/image-statue-brass-child-art-1465348/)\n\n**Documentation Sources**\n\n- [`pandas` library](https://pandas.pydata.org/pandas-docs/stable/)\n- [`cv2` library](https://opencv-python-tutroals.readthedocs.io/en/latest/index.html)\n- [`fiona` library](https://pypi.org/project/Fiona/)\n- [SQLite DB](https://www.sqlite.org/draft/docs.html)\n- [`pymongo` library](https://api.mongodb.com/python/current/)\n- [`bs4` library](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielc92%2Fpython-data-guide","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanielc92%2Fpython-data-guide","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielc92%2Fpython-data-guide/lists"}