{"id":27443885,"url":"https://github.com/josephmachado/python_essentials_for_data_engineers","last_synced_at":"2025-07-27T20:38:11.579Z","repository":{"id":242231802,"uuid":"809029117","full_name":"josephmachado/python_essentials_for_data_engineers","owner":"josephmachado","description":"Code for blog at https://www.startdataengineering.com/post/python-for-de/","archived":false,"fork":false,"pushed_at":"2024-06-07T19:05:59.000Z","size":412,"stargazers_count":73,"open_issues_count":0,"forks_count":92,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-15T02:58:01.469Z","etag":null,"topics":["data-engineering","data-quality-checks","duckdb","polars","python","transformations"],"latest_commit_sha":null,"homepage":"https://www.startdataengineering.com/post/python-for-de/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/josephmachado.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-01T13:37:47.000Z","updated_at":"2025-04-02T16:19:00.000Z","dependencies_parsed_at":"2024-06-07T20:06:53.279Z","dependency_job_id":null,"html_url":"https://github.com/josephmachado/python_essentials_for_data_engineers","commit_stats":null,"previous_names":["josephmachado/python_essentials_for_data_engineers"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/josephmachado%2Fpython_essentials_for_data_engineers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/josephmachado%2Fpython_essentials_for_data_engineers/tag
s","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/josephmachado%2Fpython_essentials_for_data_engineers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/josephmachado%2Fpython_essentials_for_data_engineers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/josephmachado","download_url":"https://codeload.github.com/josephmachado/python_essentials_for_data_engineers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248997095,"owners_count":21195797,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","data-quality-checks","duckdb","polars","python","transformations"],"created_at":"2025-04-15T02:58:04.453Z","updated_at":"2025-04-15T02:58:04.976Z","avatar_url":"https://github.com/josephmachado.png","language":"Python","readme":"\n\n* [Python Essentials for Data Engineers](#python-essentials-for-data-engineers)\n    * [Run on Codespaces](#run-on-codespaces)\n    * [Running on your laptop](#running-on-your-laptop)\n    * [Using python REPL](#using-python-repl)\n\nCode for Blog at: [Python Essentials for Data Engineers](https://www.startdataengineering.com/post/python-for-de/).\n\n# Python Essentials for Data Engineers \n\n## Run on Codespaces\n\nOpen codespaces and wait for codespaces to set up. 
The process of opening codespaces and waiting for completion is shown below.\n\n**NOTE**: Make sure to turn off codespaces, since you only have limited free usage per month.\n\n![Open codespace](./assets/cs.png)\n![Wait for codespace to set up](./assets/cs2.png)\n\n## Running on your laptop\n\nClone the repo, `cd` into it, and set up the virtual environment as shown below.\n\n```bash\ngit clone https://github.com/josephmachado/python_essentials_for_data_engineers.git\ncd python_essentials_for_data_engineers\n\npython -m venv myenv\nsource myenv/bin/activate\npip install -r requirements.txt\n\n# open the Python REPL with\npython\n```\n\n## Using python REPL\n\n![REPL](./assets/repl.png)\n\nIn the Python REPL you can try out the commands and do the exercises.\n\nTo run the tests (under the `./tests` folder), run the `python -m pytest ./tests` command.\n\nThe question files have names ending in `-questions.py`; use these as starting points to practice Python for data engineering. While the workbooks have solutions, there are multiple ways to do the same thing, and as long as you get the correct answer, you should be good.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjosephmachado%2Fpython_essentials_for_data_engineers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjosephmachado%2Fpython_essentials_for_data_engineers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjosephmachado%2Fpython_essentials_for_data_engineers/lists"}