https://github.com/ejw-data/pandas-adv-groupby
Example of advanced applications of python groupby method. Advanced aggregation and filter was applied to a song dataset.
https://github.com/ejw-data/pandas-adv-groupby
pandas
Last synced: 2 months ago
JSON representation
Example of advanced applications of python groupby method. Advanced aggregation and filter was applied to a song dataset.
- Host: GitHub
- URL: https://github.com/ejw-data/pandas-adv-groupby
- Owner: ejw-data
- Created: 2022-04-01T00:30:58.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-09-24T01:07:09.000Z (almost 3 years ago)
- Last Synced: 2025-03-15T16:44:27.790Z (over 1 year ago)
- Topics: pandas
- Language: Jupyter Notebook
- Homepage:
- Size: 4.59 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# pandas-adv-groupby

Photo by [Debbie Molle](https://unsplash.com/@djmle29n?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/pandas?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
> Purpose: Often people don't realize that tasks sometimes performed with `for loops` can actaully be completed by using a `pandas dataframe` and `groupby` method if the pattern is consistent. If more complex conditions are needed then a `for loop` would be necessary or a multistep solution using pandas.
## Overview
In this file, we import in a file of historical songs and their annual rankings by year. The data file has several issues that need resolved before applying a groupby.
An example of how to use pandas dataframes to find ordered data is provided. In this case, we order the data by the 'Annual Ranking' and then find the first two rows of each year. This is a nice example that shows how the groupby method works.
## Status
Some touchup will be made to the code but the content for this activity is mostly complete.
## Technologies
* Python
* Pandas
* Matplotlib
## Data Source
Grimi94 (April 8, 2016). data-science-music (commit 7779de0). https://github.com/Grimi94/data-science-music. (Accessed March 31, 2022).
## Setup and Installation
1. Environment needs the following:
* Python 3.6+
* Pandas
1. Activate your environment
1. Clone the repo to your local machine
1. Start Jupyter Notebook within the environment from the repo
1. Run `pandas-groupby-songs.ipynb`
## Analysis
There is no real analysis to provide. The take aways of the activity show the following:
* `.groupby(column)` starts at the top of the dataframe and adds the values to new dataframe for each column. In Fig.1, the top path shows a groupby of by one column (last name) with only one other column(first name). The order of the values in the list is based on the order that the data appears in the dataframe.
* `.sort_values(by=column, ascending=True).groupby(column)` first sorts the dataframe from smallest to largest value for numerical values or from a-z for text values for the specified column. Next, the data is grouped based on the order of the data from the top of the dataframe to the bottom of the dataframe.
* `.sort_values(by=column, ascending=False).groupby(column)` first sorts the dataframe from largest to smallest value for numerical values or from z-a for text values for the specified column. Next, the data is grouped based on the order of the data from the top of the dataframe to the bottom of the dataframe.

Fig.1 - Diagram of what sorting, groupby, and list methods do
* series methods like `.head(x)` can be used to obtain the first x values.
* series methods like `.apply(list)` convert the groupby object into lists. This is helpful when you need a list of lists such as inputs for pandas boxplots.