{"id":23090484,"url":"https://github.com/dgerlanc/programming-with-data","last_synced_at":"2025-04-06T03:10:30.828Z","repository":{"id":44419355,"uuid":"68971623","full_name":"dgerlanc/programming-with-data","owner":"dgerlanc","description":"🐍 Learn Python and Pandas from the ground up","archived":false,"fork":false,"pushed_at":"2022-12-04T22:17:13.000Z","size":5591,"stargazers_count":246,"open_issues_count":2,"forks_count":215,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-03-30T02:07:28.814Z","etag":null,"topics":["dangerlanc","data-science","pandas","pandas-tutorial","python","workshop"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dgerlanc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-09-22T23:52:37.000Z","updated_at":"2024-12-02T02:38:07.000Z","dependencies_parsed_at":"2023-01-23T07:00:16.602Z","dependency_job_id":null,"html_url":"https://github.com/dgerlanc/programming-with-data","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgerlanc%2Fprogramming-with-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgerlanc%2Fprogramming-with-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgerlanc%2Fprogramming-with-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgerlanc%2Fprogramming-with-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dgerlanc","download_url"
:"https://codeload.github.com/dgerlanc/programming-with-data/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247427006,"owners_count":20937201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dangerlanc","data-science","pandas","pandas-tutorial","python","workshop"],"created_at":"2024-12-16T20:54:35.895Z","updated_at":"2025-04-06T03:10:30.798Z","avatar_url":"https://github.com/dgerlanc.png","language":"Jupyter Notebook","readme":"# Programming with Data: Python and Pandas\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dgerlanc/programming-with-data/main?urlpath=lab)\n\nThis repository contains the slides, exercises, and answers for *Programming\nwith Data: Python and Pandas*. The goal of this tutorial is to teach you,\nsomeone with experience programming in Python, most of the features available in\nPandas. The material from this course has been presented at conferences\nincluding ODSC and Battlefin Discovery Data and online through the O'Reilly\nplatform.\n\n## Why this course exists\nWhether in R, MATLAB, Stata, or Python, modern data analysis, for many\nresearchers, requires some kind of programming. 
The preponderance of tools and\nspecialized languages for data analysis suggests that general-purpose\nprogramming languages like C and Java do not readily address the needs of data\nscientists; something more is needed.\n\nIn this workshop, you will learn how to accelerate your data analyses using the\nPython language and Pandas, a library specifically designed for interactive data\nanalysis. Pandas is a massive library, so we will focus on its core\nfunctionality: loading, filtering, grouping, and transforming\ndata. Having completed this workshop, you will understand the fundamentals of\nPandas, be aware of common pitfalls, and be ready to perform your own analyses.\n\n### Prerequisites:\n\nThis workshop assumes that participants have intermediate-level programming ability\nin Python. Participants should know the difference between a `dict`, `list`, and\n`tuple`. Familiarity with control flow (`if/else/for/while`) and error handling\n(`try/except`) is required.\n\nNo statistics background is required.\n\n## Installation\n\n### Binder\n\nIf you have a stable Internet connection and the free Binder service isn't under\ntoo much load, the easiest way to interactively run the slides and try the\nexercises is to click the Binder badge (make sure to open it in a new window).\nKeep in mind that Binder aggressively shuts down idle instances, so you'll need to\nrefresh the link if you're idle for too long.\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dgerlanc/programming-with-data/main)\n\n### Prerendered Notebooks\n\nYou may view the HTML versions of the slides and answers directly in your browser on GitHub,\nthough you will not be able to run them interactively:\n\n* [Lesson 1 - Series](https://github.com/dgerlanc/programming-with-data/blob/main/01-intro-to-pandas-part-1-slides.ipynb)\n* [Lesson 2 - DataFrames](https://github.com/dgerlanc/programming-with-data/blob/main/02-intro-to-pandas-part-2-slides.ipynb)\n* [Lesson 3 - Split, 
Apply, Combine](https://github.com/dgerlanc/programming-with-data/blob/main/03-group-apply-slides.ipynb)\n* [Lesson 4 - Time Series](https://github.com/dgerlanc/programming-with-data/blob/main/04-time-series-slides.ipynb)\n* [Lesson 5 - Merge and Concat](https://github.com/dgerlanc/programming-with-data/blob/main/05-merge-pivot-slides.ipynb)\n* [Lesson 6 - Advanced Merge and Reshape](https://github.com/dgerlanc/programming-with-data/blob/main/06-advanced-merge-reshape-slides.ipynb)\n\n### Local Installation\n\nIf you're taking the course, want to follow along with the slides and do the\nexercises, and may not have Internet access, download and\ninstall the Anaconda Python 3 distribution and `conda` package manager\nahead of time:\n\n```\nhttps://www.anaconda.com/download/\n```\n\nDownload the latest version of the course materials\n[here](https://github.com/dgerlanc/programming-with-data/archive/main.zip).\n\nAlternatively, you may clone the course repository using `git`:\n\n```\n$ git clone https://github.com/dgerlanc/programming-with-data.git\n```\n\nThe remainder of the installation requires that you use the command line.\n\nTo complete the course exercises, you must use `conda` to install the\ndependencies specified in the `environment.yml` file in the repository:\n\n```\n$ conda env create -f environment.yml\n```\n\nThis will create a `conda` environment called `progwd`, which may be\n\"activated\" with the following commands:\n\n* Windows: `activate progwd`\n* Linux and Mac: `conda activate progwd`\n\nOnce you've activated the environment, your prompt will probably\nlook something like this:\n\n```\n(progwd) $\n```\n\nThe entire course is designed to use `jupyter` notebooks. 
Launch\nJupyterLab to get started:\n\n```\n(progwd) $ jupyter lab\n```\n\n## Feedback\n\nYour feedback on the course helps to improve it for future students.\nPlease leave feedback [here](https://danielgerlanc.typeform.com/to/RyB6AJ).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgerlanc%2Fprogramming-with-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdgerlanc%2Fprogramming-with-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgerlanc%2Fprogramming-with-data/lists"}