https://github.com/seif-elkateb/dataset-analysis-r
- Host: GitHub
- URL: https://github.com/seif-elkateb/dataset-analysis-r
- Owner: Seif-Elkateb
- Created: 2025-03-07T06:32:48.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-07T07:34:22.000Z (over 1 year ago)
- Last Synced: 2025-04-01T05:46:56.977Z (over 1 year ago)
- Topics: cu-boulder, data, data-analysis, datamodeling, datascience, ms-ds, msds434, r
- Language: HTML
- Homepage:
- Size: 8.4 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Analysis Projects
This repository contains two data analysis projects: one focusing on NYPD data and the other on COVID-19 data. Both projects utilize R for data manipulation, analysis, and visualization.
## Projects Overview
### 1. NYPD Data Analysis
This project analyzes NYPD data to uncover trends and insights related to crime in New York City.
#### Libraries Used
- Tidyverse
- Lubridate
#### Data Import
The following CSV file is imported for analysis:
- `NYPD_Shooting_Incident_Data__Historic_.csv`
#### Data Tidying and Transformation
- Removed unnecessary columns.
- Created new columns such as a Day/Night indicator.
- Added new columns for day, month, and year.
- Converted variables such as `occur_time` and `occur_date` to proper time and date types with lubridate parsing functions inside `mutate()`.
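The tidying steps above could look roughly like the sketch below. It is illustrative only: the toy rows, the upper-case column names (`OCCUR_DATE`, `OCCUR_TIME`, `BORO`), the new column names, and the 06:00 to 18:00 cutoff for "Day" are all assumptions, not taken from the project scripts.

```r
library(dplyr)
library(lubridate)

# Toy rows standing in for the real CSV; column names are assumed.
shootings <- tibble(
  OCCUR_DATE = c("01/15/2021", "07/04/2022"),
  OCCUR_TIME = c("23:10:00", "14:30:00"),
  BORO       = c("BRONX", "QUEENS")
)

shootings_tidy <- shootings %>%
  mutate(
    OCCUR_DATE  = mdy(OCCUR_DATE),    # character -> Date
    OCCUR_TIME  = hms(OCCUR_TIME),    # character -> lubridate Period
    occur_year  = year(OCCUR_DATE),
    occur_month = month(OCCUR_DATE),
    occur_day   = day(OCCUR_DATE),
    # Assumed cutoff: hours 6 through 17 count as "Day", everything else as "Night".
    day_night   = if_else(hour(OCCUR_TIME) >= 6 & hour(OCCUR_TIME) < 18, "Day", "Night")
  )
```

Note that `mutate()` comes from dplyr; lubridate supplies the parsers (`mdy()`, `hms()`) and the accessors (`year()`, `month()`, `day()`, `hour()`).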
#### Data Summary
- Filtered out irrelevant observations.
- Summarized the data to include only relevant observations.
#### Data Visualization
- Visualized crime trends over time.
- Analyzed crime distribution by borough and precinct.
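As a rough illustration of the kind of plot described above, the sketch below counts incidents per year and borough and draws a grouped bar chart. The toy data and the column names `occur_year` and `BORO` are assumptions, and the project's actual plots may differ.

```r
library(dplyr)
library(ggplot2)

# Toy incident records; in the real analysis these would come from the tidied CSV.
incidents <- tibble(
  occur_year = c(2020, 2020, 2021, 2021, 2021),
  BORO       = c("BRONX", "BROOKLYN", "BRONX", "BRONX", "QUEENS")
)

# One row per (year, borough) with the number of incidents.
by_year_boro <- incidents %>%
  count(occur_year, BORO, name = "incidents")

# Grouped bars: one bar per borough within each year.
trend_plot <- ggplot(by_year_boro,
                     aes(x = factor(occur_year), y = incidents, fill = BORO)) +
  geom_col(position = "dodge") +
  labs(x = "Year", y = "Incidents", fill = "Borough")
```

Calling `print(trend_plot)` renders the chart; swapping `BORO` for a precinct column would give the per-precinct view.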
### 2. COVID-19 Data Analysis
This project analyzes COVID-19 data to understand the spread and impact of the pandemic globally and in the US.
#### Libraries Used
- Tidyverse
- Lubridate
#### Data Import
The following CSV files are imported for analysis:
- `time_series_covid19_confirmed_US.csv`
- `time_series_covid19_confirmed_global.csv`
- `time_series_covid19_deaths_US.csv`
- `time_series_covid19_deaths_global.csv`
- `UID_ISO_FIPS_LookUp_Table.csv`
#### Data Tidying and Transformation
- Removed unnecessary columns.
- Transformed date columns into a single `date` column.
- Created new columns for analysis, such as `cases` and `deaths`.
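A minimal sketch of this reshaping follows, assuming the global files use one column per date (named like `1/22/20`) next to `Province/State`, `Country/Region`, `Lat`, and `Long`. The toy tables and the helper `to_long()` are hypothetical and only illustrate the technique.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy wide tables mimicking the assumed layout of the global time series files.
confirmed_wide <- tibble(
  `Province/State` = c(NA_character_, NA_character_),
  `Country/Region` = c("Italy", "Spain"),
  Lat = c(41.9, 40.5), Long = c(12.6, -3.7),
  `1/22/20` = c(0, 0), `1/23/20` = c(0, 1)
)
deaths_wide <- tibble(
  `Province/State` = c(NA_character_, NA_character_),
  `Country/Region` = c("Italy", "Spain"),
  Lat = c(41.9, 40.5), Long = c(12.6, -3.7),
  `1/22/20` = c(0, 0), `1/23/20` = c(0, 0)
)

# Hypothetical helper: collapse the per-date columns into `date` plus one value column.
to_long <- function(wide, value_name) {
  wide %>%
    pivot_longer(cols = -c(`Province/State`, `Country/Region`, Lat, Long),
                 names_to = "date", values_to = value_name) %>%
    select(-c(Lat, Long)) %>%      # drop columns not needed downstream
    mutate(date = mdy(date))       # "1/22/20" -> Date
}

# Combine cases and deaths into one table keyed by location and date.
global <- full_join(
  to_long(confirmed_wide, "cases"),
  to_long(deaths_wide, "deaths"),
  by = c("Province/State", "Country/Region", "date")
)
```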
#### Data Summary
- Filtered out observations with zero cases.
- Summarized the data to include only relevant observations.
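Dropping zero-case rows is a one-line `filter()`. The sketch below uses made-up values and assumes a tidy table with `date`, `cases`, and `deaths` columns.

```r
library(dplyr)

# Toy tidy table; values are invented for illustration.
covid <- tibble(
  date   = as.Date(c("2020-01-22", "2020-01-23", "2020-01-24")),
  cases  = c(0, 1, 5),
  deaths = c(0, 0, 1)
)

# Keep only observations with at least one recorded case.
covid_nonzero <- covid %>%
  filter(cases > 0)

# Quick check of the ranges that remain.
summary(covid_nonzero)
```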
#### Data Visualization
- Visualized cases and deaths over time.
- Summarized total cases and deaths by state and country.
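One possible way to get per-state totals and a time series plot is sketched below. It assumes the US data are cumulative counts per county and date, so a state's total is the maximum of its per-date sums. The column names `Province_State` and `Admin2` and all values are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy county-level cumulative counts.
us <- tibble(
  Province_State = c("Colorado", "Colorado", "Colorado", "Colorado", "Utah", "Utah"),
  Admin2         = c("Boulder", "Boulder", "Denver", "Denver", "Salt Lake", "Salt Lake"),
  date           = as.Date(rep(c("2020-03-01", "2020-03-02"), 3)),
  cases          = c(1, 3, 2, 5, 0, 4),
  deaths         = c(0, 0, 0, 1, 0, 0)
)

# Sum counties within each state for every date.
state_by_date <- us %>%
  group_by(Province_State, date) %>%
  summarize(cases = sum(cases), deaths = sum(deaths), .groups = "drop")

# With cumulative counts, the latest (largest) value is the running total.
state_totals <- state_by_date %>%
  group_by(Province_State) %>%
  summarize(total_cases = max(cases), total_deaths = max(deaths))

# Cases over time, one line per state.
cases_plot <- ggplot(state_by_date,
                     aes(x = date, y = cases, colour = Province_State)) +
  geom_line() +
  labs(x = "Date", y = "Cumulative cases", colour = "State")
```

The same pattern applies to countries by grouping on the country column instead of the state column.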
## How to Run
1. Clone the repository.
2. Install the required R packages (tidyverse and lubridate).
3. Run the R scripts to reproduce the analysis and visualizations for both projects.
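For step 2, a small sketch that installs only the packages that are not already available. The CRAN mirror URL is an assumption; any mirror works.

```r
# Packages named in the "Libraries Used" sections.
required <- c("tidyverse", "lubridate")

# TRUE for each package that can already be loaded.
available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!available]

# Install only what is missing (mirror URL is an assumed default).
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```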
## Summary Statistics
### NYPD Data
- Key insights and trends related to crime in NYC.
### COVID-19 Data
- Last date in the dataset: `2023-03-09`
- Maximum number of cases: `103,802,702`
- Maximum number of deaths: `1,123,836`
---