https://github.com/mchenryspagg/investigate_a_dataset

This is a data analysis project that demonstrates the student's ability to use Python data analysis libraries such as pandas, NumPy and Matplotlib's pyplot to investigate a dataset and answer specific questions about it, demonstrating skills in data cleaning, data wrangling, and exploratory data analysis.

data-analysis datetime descriptive-analysis descriptive-statistics exploratory-data-analysis numpy pandas pyplot python visualization

# Investigate_a_dataset

### What
Showcase ability to investigate a dataset.

### How
This project demonstrates the student's ability to use Python data analysis libraries such as pandas, NumPy and Matplotlib's pyplot to investigate a dataset and answer specific questions about it, demonstrating the requisite know-how in data cleaning, data wrangling, and exploratory data analysis using statistical analysis and visualizations.

## Table of contents

- [Overview](#overview)
- [The challenge](#the-challenge)
- [Outcome](#outcome)
- [Links](#links)
- [Built with](#built-with)
- [What was learnt](#what-was-learnt)
- [Key Insights](#key-insights)
- [Continued development](#continued-development)
- [Useful resources](#useful-resources)
- [Acknowledgments](#acknowledgments)

## Overview
The dataset contains 110,527 medical appointment records from Brazil. It focuses on whether patients who booked an appointment actually showed up for it. Each record is described by 14 variables (characteristics), which are explained further in the analysis.

The original dataset can be found [here](https://www.kaggle.com/datasets/joniarroba/noshowappointments?select=KaggleV2-May-2016.csv).
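Before answering any questions, it helps to check the raw table for obvious quality problems. The sketch below is a minimal illustration, not the notebook's actual code; it assumes the column names of the Kaggle CSV (such as `Age` and `No-show`), and the helper names `quality_report` and `drop_invalid_rows` are hypothetical.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Count common data-quality problems in the raw appointments table."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Total number of missing cells across all columns.
        "missing_values": int(df.isna().sum().sum()),
        # Rows that are exact copies of an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # Ages below zero cannot be valid (assumed column name "Age").
        "negative_ages": int((df["Age"] < 0).sum()),
    }


def drop_invalid_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without exact duplicates or negative ages."""
    out = df.drop_duplicates()
    out = out[out["Age"] >= 0]
    return out.reset_index(drop=True)
```

With the CSV from the Kaggle link saved locally, `quality_report(pd.read_csv("KaggleV2-May-2016.csv"))` would give a first overview. Treating a negative age as invalid is a judgment call for this sketch rather than something the dataset documents.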

### The challenge
The goal of this project is to investigate a dataset of medical appointment records from Brazilian public hospitals. The data includes several attributes of each patient and indicates whether the patient showed up to the appointment.

### Outcome
The analysis focuses on finding trends that influence whether patients show up to their appointments. Using descriptive statistics and appropriate visualizations to show relationships, it considers the following questions:

1. Which gender books more medical appointments?
2. Does the waiting interval between booking and the appointment date affect whether patients show up?
3. Which factors are important for predicting whether patients will show up for their appointments?
4. Is there a relationship between age and having a scholarship?
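Each of these questions maps onto a simple pandas group-by. The following is a hedged sketch, not the notebook's code; it assumes the raw Kaggle column names (`Gender`, `ScheduledDay`, `AppointmentDay`, `Age`, `Scholarship`, `Hipertension`, `Diabetes`, `Alcoholism`, `SMS_received`, `No-show`), and all function names and waiting-time bucket edges are illustrative choices. The waiting interval is taken as whole days between scheduling and the appointment, with `.dt.days` turning the timedelta into an integer.

```python
import pandas as pd


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add an attendance flag and the waiting interval in whole days."""
    out = df.copy()
    # In the Kaggle data, "No" in the No-show column means the patient attended.
    out["showed_up"] = out["No-show"].eq("No")
    # Drop the time of day so both timestamps are compared at day resolution.
    scheduled = pd.to_datetime(out["ScheduledDay"]).dt.normalize()
    appointment = pd.to_datetime(out["AppointmentDay"]).dt.normalize()
    out["waiting_days"] = (appointment - scheduled).dt.days
    return out


def bookings_by_gender(df: pd.DataFrame) -> pd.Series:
    # Q1: number of appointments booked per gender.
    return df["Gender"].value_counts()


def show_rate_by_wait(df: pd.DataFrame, bins=(0, 1, 8, 31, 366)) -> pd.Series:
    # Q2: share of appointments attended per waiting-time bucket.
    # Buckets are [0, 1), [1, 8), ...; waits outside the edges are left out.
    buckets = pd.cut(df["waiting_days"], bins=list(bins), right=False)
    return df.groupby(buckets, observed=False)["showed_up"].mean()


def show_rate_by_flag(
    df: pd.DataFrame,
    # Spellings follow the CSV header, including "Hipertension".
    flags=("Scholarship", "Hipertension", "Diabetes", "Alcoholism", "SMS_received"),
) -> dict:
    # Q3: attendance rate for patients with (1) and without (0) each flag.
    return {flag: df.groupby(flag)["showed_up"].mean() for flag in flags}


def age_vs_scholarship(df: pd.DataFrame):
    # Q4: age distribution per scholarship group, plus the Pearson
    # correlation between age and the 0/1 scholarship flag.
    summary = df.groupby("Scholarship")["Age"].describe()
    return summary, df["Age"].corr(df["Scholarship"])
```

Comparing attendance rates rather than raw counts keeps groups of very different sizes comparable, which matters when one gender or one waiting bucket dominates the bookings.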

The detailed analysis is available as a [Kaggle notebook](https://www.kaggle.com/code/henryokam/noshowappointment-investigate-a-dataset) or a [Jupyter notebook](./Investigate_a_Dataset.ipynb).

### Links

- Summary report: [Portfolio URL](https://sites.google.com/view/mchenrys-portfolio/data-analysis-projects/investigate-a-dataset-noshowappointment)

### Built with

- Jupyter Notebook
- Python
- pandas, Matplotlib and NumPy libraries

### What was learnt

The ability to investigate a dataset: understanding it, asking questions of it, carrying out appropriate data wrangling and data cleaning, and performing exploratory data analysis to answer those questions by investigating trends and relationships in the data.

### Key Insights

- More females than males booked appointments.
- The waiting interval between booking and the appointment was not found to have any major effect on whether patients showed up.
- No single feature was found to have a distinct effect on whether patients showed up for their appointments.
- No relationship was found between a patient's age and receiving a scholarship.
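These insights rest on visual comparisons as well as statistics. A minimal pyplot sketch of that kind of check, assuming the raw Kaggle `No-show` column and a hypothetical helper name `plot_show_rate`, draws the attendance rate for each value of one categorical column such as `Gender` or `SMS_received`. It is an illustration, not a reproduction of the notebook's charts.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_show_rate(df: pd.DataFrame, column: str, ax=None):
    """Bar chart of the share of appointments attended for each value of `column`."""
    # "No" in the No-show column means the patient attended (Kaggle convention).
    rates = df["No-show"].eq("No").groupby(df[column]).mean()
    if ax is None:
        _, ax = plt.subplots()
    # One bar per category; string labels give a categorical x-axis.
    ax.bar(rates.index.astype(str), rates.values)
    ax.set_xlabel(column)
    ax.set_ylabel("Share of appointments attended")
    ax.set_ylim(0, 1)
    ax.set_title(f"Attendance rate by {column}")
    return ax
```

Plotting rates rather than counts matters here because more appointments were booked by females; a bar chart of raw counts would mostly reflect that imbalance rather than any difference in attendance.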

### Continued development

This is the first data analysis project in a Udacity Nanodegree program. Learning Python was still ongoing when this project was completed, and more projects will follow.

### Useful resources

- [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) - The official reference for the pandas `DataFrame` API
- [Stack Overflow](https://stackoverflow.com/questions/25646200/python-convert-timedelta-to-int-in-a-dataframe) - A useful platform to search for answers to questions, here on converting a timedelta to an integer in a DataFrame
- [No-show Appointments Dataset](https://www.kaggle.com/joniarroba/noshowappointments)
- [Info on Bolsa Família, the program behind the dataset's Scholarship variable](https://en.wikipedia.org/wiki/Bolsa_Fam%C3%ADlia)

## Acknowledgments
Special thanks to ALX-T, the entire ALG/ALX team, and their sponsors for sponsoring this Udacity Data Analysis Nanodegree program. It's a great privilege that I don't take for granted.