https://github.com/memgonzales/pisa-2018-analysis
Jupyter notebook presenting the process of data preparation, research question formulation, data analysis, and data modeling with the goal of extracting insights from the 2018 PISA Dataset
- Host: GitHub
- URL: https://github.com/memgonzales/pisa-2018-analysis
- Owner: memgonzales
- Created: 2022-01-08T01:22:58.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-12-26T16:03:57.000Z (over 2 years ago)
- Last Synced: 2025-06-13T11:07:42.123Z (9 days ago)
- Topics: data-cleaning, data-modeling, data-science, data-visualization, exploratory-data-analysis, jupyter-notebook, matplotlib, numpy, oecd-data, pandas, pisa, scipy, statistical-inference
- Language: Jupyter Notebook
- Homepage:
- Size: 3.02 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 2018 PISA Analysis
![badge][badge-jupyter]
![badge][badge-pandas]
![badge][badge-numpy]
![badge][badge-scipy]

This project presents the process of data preparation, research question formulation, data analysis, and data modeling with the goal of extracting insights from the **2018 PISA Dataset**. The [PISA](https://www.oecd.org/pisa/), which stands for **Programme for International Student Assessment**, is a worldwide set of tests conducted by the Organisation for Economic Co-operation and Development (OECD) to gauge the knowledge and competence of 15-year-old students in the key subject areas of reading, mathematics, and science.

This is a major course output in a statistical modeling and simulation class under Mr. Arren C. Antioquia of the Department of Software Technology, De La Salle University.
## Task
The task is to create a [Jupyter notebook](https://github.com/memgonzales/pisa-2018-analysis/blob/master/2018%20PISA%20Analysis.ipynb) that presents the process leading up to the generation of insights from a raw dataset:
- Dataset Representation
- Data Cleaning
- Exploratory Data Analysis
- Research Questions
- Statistical Inference
- Insights and Conclusions

The complete project specifications can be found in the document [`Project Specifications.pdf`](https://github.com/memgonzales/pisa-2018-analysis/blob/master/Project%20Specifications.pdf).
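
As a rough illustration of the Data Cleaning and Statistical Inference steps, the sketch below drops incomplete records with `pandas` and compares two groups using Welch's t-test from `scipy.stats`. It is a hypothetical example and not the notebook's actual method. The column names (`school_type`, `avg_score`) and the randomly generated values are placeholders, not real PISA fields or data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data standing in for school-level records (NOT real PISA values).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "school_type": rng.choice(["public", "private"], size=200),
    "avg_score": rng.normal(loc=480, scale=40, size=200),
})

# Blank out 10% of the scores to mimic missing answers in raw survey data.
df.loc[df.sample(frac=0.1, random_state=0).index, "avg_score"] = np.nan

# Data cleaning: keep only rows that have a score.
clean = df.dropna(subset=["avg_score"])

# Statistical inference: Welch's t-test, which does not assume equal variances.
public = clean.loc[clean["school_type"] == "public", "avg_score"]
private = clean.loc[clean["school_type"] == "private", "avg_score"]
result = stats.ttest_ind(public, private, equal_var=False)

print(f"rows kept: {len(clean)} of {len(df)}")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

Welch's variant is used here only because it is a safe default when group variances may differ. The tests the notebook actually applies depend on its research questions.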
## Datasets
The following real-world data sources (one primary dataset and two auxiliary datasets) were used:

Dataset | Source
-- | --
2018 OECD PISA School Questionnaire Dataset *(Primary Dataset)* | [Kaggle](https://www.kaggle.com/dilaraahan/pisa-2018-school-questionnaire)
2018 OECD PISA Average Score of Mathematics, Science, and Reading Test Scores Dataset *(Auxiliary Dataset)* | [FactsMaps](https://factsmaps.com/pisa-2018-worldwide-ranking-average-score-of-mathematics-science-reading/)
ISO 3166-1 alpha-3 Code List *(Auxiliary Dataset)* | [ISO](https://www.iso.org/publication/PUB500001.html)

## Built Using
This project is a Jupyter notebook that uses the following Python libraries and modules:

Library/Module | Description | License
-- | -- | --
[`os`](https://docs.python.org/3/library/os.html) | Provides miscellaneous operating system interfaces | Python Software Foundation License
[`pandas`](https://pandas.pydata.org/) | Provides functions for data analysis and manipulation | BSD 3-Clause "New" or "Revised" License
[`numpy`](https://numpy.org/) | Provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays | BSD 3-Clause "New" or "Revised" License
[`scipy`](https://scipy.org/) | Provides efficient numerical routines, such as those for numerical integration, interpolation, optimization, linear algebra, and statistics | BSD 3-Clause "New" or "Revised" License
[`matplotlib`](https://matplotlib.org/) | Provides functions for creating static, animated, and interactive visualizations | Matplotlib License (BSD-Compatible)

*The descriptions are taken from their respective websites.*
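
The sketch below shows one way these libraries might fit together with the datasets listed above. It attaches country-level average scores to school-level records through a shared ISO 3166-1 alpha-3 code using `pandas.DataFrame.merge`, then computes a Pearson correlation with `scipy.stats.pearsonr`. This is a hypothetical illustration only, and the notebook may join and analyze the data differently. The column names (`country_code`, `student_teacher_ratio`, `pisa_avg`) are assumptions, the values are made up, and the codes `XAA` to `XAD` come from the ISO user-assigned range, so they do not denote real countries.

```python
import pandas as pd
from scipy import stats

# Hypothetical school-level records keyed by placeholder alpha-3 codes (made-up values).
schools = pd.DataFrame({
    "country_code": ["XAA", "XAA", "XAB", "XAB", "XAC", "XAC", "XAD", "XAD"],
    "student_teacher_ratio": [30.0, 34.0, 14.0, 16.0, 11.0, 13.0, 15.0, 17.0],
})

# Hypothetical country-level averages standing in for the auxiliary score dataset.
country_scores = pd.DataFrame({
    "country_code": ["XAA", "XAB", "XAC", "XAD"],
    "pisa_avg": [400.0, 540.0, 510.0, 500.0],
})

# Aggregate to one row per country, then join on the shared country code.
per_country = (
    schools.groupby("country_code", as_index=False)["student_teacher_ratio"].mean()
)
merged = per_country.merge(
    country_scores, on="country_code", how="inner", validate="one_to_one"
)

# Pearson correlation between the two country-level variables.
r, p = stats.pearsonr(merged["student_teacher_ratio"], merged["pisa_avg"])

print(merged)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Aggregating before the join keeps the merge one-to-one, and `validate="one_to_one"` makes pandas raise an error if a duplicate country code slips through.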

## Authors
- **Mark Edward M. Gonzales**
- **Hylene Jules G. Lee**

[badge-jupyter]: https://img.shields.io/badge/Jupyter-F37626.svg?&style=flat&logo=Jupyter&logoColor=white
[badge-pandas]: https://img.shields.io/badge/Pandas-2C2D72?style=flat&logo=pandas&logoColor=white
[badge-numpy]: https://img.shields.io/badge/Numpy-777BB4?style=flat&logo=numpy&logoColor=white
[badge-scipy]: https://img.shields.io/badge/SciPy-654FF0?style=flat&logo=SciPy&logoColor=white