https://github.com/aegis301/nyc_high_school_project
Data cleaning project using NYC high school data
- Host: GitHub
- URL: https://github.com/aegis301/nyc_high_school_project
- Owner: aegis301
- Created: 2020-12-21T20:48:44.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2020-12-21T21:09:25.000Z (about 4 years ago)
- Last Synced: 2024-11-06T19:55:58.862Z (3 months ago)
- Topics: data-analytics, data-cleaning, data-science, data-visualization, pandas
- Language: Jupyter Notebook
- Homepage:
- Size: 18 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# NYC High School Project
*Disclaimer: This project is based on the data cleaning walkthrough provided by dataquest.io, though my own take on it may differ from what can be found on their website.*

## The Project
In this project I showcase my skills in data cleaning, data exploration, and presentation. The analyses I perform here are deliberately kept at a lower level of complexity. If you want to see how I handle more complex problems, please refer to my other projects.
## The Question
In this project, I investigate whether standardized testing in U.S. high schools is efficient and whether certain groups are at a disadvantage.
## The Data
To answer this question, I use publicly accessible 2012 SAT data from New York City. Investigating demographics requires additional data, though. Here is a list of all the datasets I am going to use:

* [SAT scores by school](https://data.cityofnewyork.us/Education/2012-SAT-Results/f9bf-2cp4) - SAT scores for each high school in New York City
* [School attendance](https://data.cityofnewyork.us/Education/2010-2011-School-Attendance-and-Enrollment-Statist/7z8d-msnt) - Attendance information for each school in New York City
* [Class size](https://data.cityofnewyork.us/Education/2010-2011-Class-Size-School-level-detail/urz7-pzb3) - Information on class size for each school
* [AP test results](https://data.cityofnewyork.us/Education/AP-College-Board-2010-School-Level-Results/itfs-ms3e) - Advanced Placement (AP) exam results for each high school (passing an optional AP exam in a particular subject can earn a student college credit in that subject)
* [Graduation outcomes](https://data.cityofnewyork.us/Education/2005-2010-Graduation-Outcomes-School-Level/vh2h-md7a) - The percentage of students who graduated, and other outcome information
* [Demographics](https://data.cityofnewyork.us/Education/2006-2012-School-Demographics-and-Accountability-S/ihfw-zy9j) - Demographic information for each school
* [School survey](https://data.cityofnewyork.us/Education/2011-NYC-School-Survey/mnz3-dyi8) - Surveys of parents, teachers, and students at each school

## Skills Used
* reading different file formats into pandas
* condensing data by concatenating and merging pandas DataFrames
* converting data types into different formats
* converting, cleaning and recalculating rows and columns using vectorized methods
* working with geographic data
* handling and replacing missing data
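
Several of the skills above (type conversion, merging, and replacing missing data) can be illustrated with a minimal sketch. This is not the notebook's actual code: the two tiny DataFrames stand in for the real datasets, the column names `sat_math` and `total_enrollment` are made up for illustration, and the use of `DBN` as the shared school identifier and of `"s"` as a placeholder for a suppressed score are assumptions.

```python
import pandas as pd

# Hypothetical miniature stand-ins for two of the datasets.
# Column names and values are illustrative assumptions, not the real schema.
sat = pd.DataFrame({
    "DBN": ["01M292", "01M448", "01M450"],
    "sat_math": ["404", "423", "s"],  # scores stored as text; "s" is an assumed placeholder
})
demographics = pd.DataFrame({
    "DBN": ["01M292", "01M448"],
    "total_enrollment": [422, 394],
})

# Convert the text scores to numbers; entries that cannot be parsed become NaN.
sat["sat_math"] = pd.to_numeric(sat["sat_math"], errors="coerce")

# A left merge keeps every school from the SAT table, even without a match.
combined = sat.merge(demographics, on="DBN", how="left")

# Replace missing numeric values with the mean of their column.
numeric_cols = ["sat_math", "total_enrollment"]
combined[numeric_cols] = combined[numeric_cols].fillna(combined[numeric_cols].mean())

print(combined)
```

Filling with the column mean is only one option; depending on the analysis, dropping incomplete rows or filling with a constant may be more appropriate.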