https://github.com/aegis301/nyc_high_school_project

Data cleaning project using NYC high school data
data-analytics data-cleaning data-science data-visualization pandas

# NYC High School Project
*Disclaimer: This project is based on the data cleaning walkthrough provided by dataquest.io, though my own take on it might differ from what can be found on their website.*

## The Project
In this project, I will try to showcase my skills in data cleaning, data exploration, and presentation. I will perform some analyses along the way, but they will remain at a low level of complexity. If you want to see how I handle more complex problems, please have a look at my other projects.
## The Question
In this project, I will try to investigate whether standardized testing in U.S. high schools is efficient and whether certain groups are at a disadvantage.
## The Data
To answer this question, I am going to use publicly accessible 2012 SAT data from the city of New York. Investigating demographics requires more data, though. Here's a list of all the datasets I am going to use:

* [SAT scores by school](https://data.cityofnewyork.us/Education/2012-SAT-Results/f9bf-2cp4) - SAT scores for each high school in New York City
* [School attendance](https://data.cityofnewyork.us/Education/2010-2011-School-Attendance-and-Enrollment-Statist/7z8d-msnt) - Attendance information for each school in New York City
* [Class size](https://data.cityofnewyork.us/Education/2010-2011-Class-Size-School-level-detail/urz7-pzb3) - Information on class size for each school
* [AP test results](https://data.cityofnewyork.us/Education/AP-College-Board-2010-School-Level-Results/itfs-ms3e) - Advanced Placement (AP) exam results for each high school (passing an optional AP exam in a particular subject can earn a student college credit in that subject)
* [Graduation outcomes](https://data.cityofnewyork.us/Education/2005-2010-Graduation-Outcomes-School-Level/vh2h-md7a) - The percentage of students who graduated, and other outcome information
* [Demographics](https://data.cityofnewyork.us/Education/2006-2012-School-Demographics-and-Accountability-S/ihfw-zy9j) - Demographic information for each school
* [School survey](https://data.cityofnewyork.us/Education/2011-NYC-School-Survey/mnz3-dyi8) - Surveys of parents, teachers, and students at each school

## Skills Used
* reading different file formats into pandas
* condensing data by concatenating and merging pandas DataFrames
* converting data types into different formats
* converting, cleaning and recalculating rows and columns using vectorized methods
* working with geographic data
* handling and replacing missing data
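
The skills listed above can be illustrated with a minimal sketch. It is not the code from this project: the inline data, the column names (`school_id`, `math`, `reading`, `writing`, `safety_score`, `location`), the `"s"` placeholder for suppressed scores, and the helper `build_combined` are all hypothetical and made up for illustration. Filling gaps with column means is likewise just one possible choice.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real files; every value here is invented.
SAT_CSV = """school_id,math,reading,writing
01M001,400,380,370
01M002,s,s,s
02M003,520,500,510
"""
SURVEY_PART_1 = "school_id\tsafety_score\n01M001\t7.5\n01M002\t6.0\n"
SURVEY_PART_2 = "school_id\tsafety_score\n02M003\t8.2\n"
LOCATIONS_CSV = """school_id,location
01M001,"1 Example St (40.71, -73.98)"
01M002,"2 Example Ave (40.72, -73.99)"
02M003,"3 Example Blvd (40.80, -73.95)"
"""


def build_combined() -> pd.DataFrame:
    # Read different formats: comma-separated and tab-separated text.
    sat = pd.read_csv(io.StringIO(SAT_CSV))
    locations = pd.read_csv(io.StringIO(LOCATIONS_CSV))
    survey_parts = [
        pd.read_csv(io.StringIO(part), sep="\t")
        for part in (SURVEY_PART_1, SURVEY_PART_2)
    ]

    # Concatenate the survey parts row-wise into one DataFrame.
    survey = pd.concat(survey_parts, ignore_index=True)

    # Convert score columns from text to numbers; "s" becomes NaN.
    score_cols = ["math", "reading", "writing"]
    sat[score_cols] = sat[score_cols].apply(pd.to_numeric, errors="coerce")

    # Vectorized recalculation: total score per school.
    # min_count=3 keeps the total missing when the scores are missing.
    sat["sat_total"] = sat[score_cols].sum(axis=1, min_count=3)

    # Geographic data: pull latitude and longitude out of the location text.
    pattern = r"\((?P<lat>-?\d+\.\d+),\s*(?P<lon>-?\d+\.\d+)\)"
    coords = locations["location"].str.extract(pattern).astype(float)
    locations = locations.join(coords)

    # Merge everything on the shared (hypothetical) school identifier.
    combined = sat.merge(survey, on="school_id", how="left").merge(
        locations[["school_id", "lat", "lon"]], on="school_id", how="left"
    )

    # Replace missing scores with the column means (one option among many).
    fill_cols = score_cols + ["sat_total"]
    combined[fill_cols] = combined[fill_cols].fillna(combined[fill_cols].mean())
    return combined


if __name__ == "__main__":
    print(build_combined())
```

Left merges keep every school from the SAT table even when another table has no matching row, which makes the resulting gaps visible before deciding how to fill them.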