https://github.com/abdoomohamedd/data-science-projects

A collection of data science projects ranging from exploratory data analysis to predictive modeling and clustering. Each project is designed to solve specific problems or explore particular datasets using various data science techniques and tools.

data-analysis data-analysis-python data-cleaning data-science data-visualization machine-learning machine-learning-algorithms


# Data Science Projects

Welcome to my Data Science projects repository! This repository contains a collection of small to medium-sized data science projects that I have worked on. Each project is organized into its own folder and contains all necessary files, including data, code, and documentation.

## Table of Contents

- [Introduction](#introduction)
- [Projects](#projects)
  - [Small Projects](#small-projects)
  - [Medium Projects](#medium-projects)
- [Getting Started](#getting-started)
- [Contributing](#contributing)
- [Contact](#contact)

## Introduction

This repository is a compilation of various data science projects that I have created. The projects range from simple analyses and visualizations to more complex machine learning models. Each project aims to solve a specific problem or explore a particular dataset.

## Projects

### Small Projects

Small projects are quick and concise, often focusing on a single concept or technique. They are great for beginners or for anyone looking to understand the basics of data science.

1. **1- Exploring NYC Public School Test Result Scores**
- Description: This project analyzes test result scores from NYC public schools to identify trends and insights that can help improve educational outcomes.
- Technologies: Python, Pandas, Matplotlib, Seaborn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/1-%20Exploring%20NYC%20Public%20School%20Test%20Result%20Scores)

2. **2- Netflix Data Analysis**
- Description: This project analyzes Netflix data to determine whether movie lengths are getting shorter, to identify the most frequent movie duration in the 1990s, and to count the short action movies released in that decade.
- Technologies: Python, Pandas, Matplotlib, Seaborn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/2-%20Investigating%20Netflix%20Movies)

3. **3- Visualizing the History of Nobel Prize Winners**
- Description: This project analyzes Nobel Prize data to answer several key questions about the demographics and trends of laureates.
- Technologies: Python, Pandas, Matplotlib, Seaborn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/3-%20Visualizing%20the%20History%20of%20Nobel%20Prize%20Winners)

4. **4- Analyzing Crime in Los Angeles**
- Description: This project analyzes crime data in Los Angeles to help the Los Angeles Police Department gain insights about the crimes in the city.
- Technologies: Python, Pandas, Matplotlib, Seaborn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/4-%20Analyzing%20Crime%20in%20Los%20Angeles)

5. **5- Project: Customer Analytics: Preparing Data for Modeling**
- Description: This project builds a DataFrame called `ds_jobs_transformed` that stores the data from `customer_train.csv` much more efficiently. The goal is to optimize data types and filter the dataset based on specific criteria to reduce memory usage.
- Technologies: Python, Pandas, NumPy
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/5-%20Project%20Customer%20Analytics%20Preparing%20Data%20for%20Modeling)

6. **6- Exploring Airbnb Market Trends**
- Description: This project investigates the short-term rental market in New York using Airbnb listing data. It involves analyzing the dates of the earliest and most recent reviews, counting the number of private rooms, and calculating the average listing price. The results are combined into a summary DataFrame.
- Technologies: Python, Pandas, Jupyter Notebook, Seaborn, Matplotlib
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/6-%20Exploring%20Airbnb%20Market%20Trends)

7. **7- Modeling Car Insurance Claim Outcomes**
- Description: This project aims to build a model to predict whether a customer will make a claim on their car insurance during the policy period.
- Technologies: Python, Pandas, Scikit-learn, statsmodels
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/7-%20Modeling%20Car%20Insurance%20Claim%20Outcomes)

8. **8- Hypothesis Testing with Men's and Women's Soccer Matches**
- Description: This project investigates whether more goals are scored in women's international soccer matches than in men's. The analysis focuses on official FIFA World Cup matches since January 1, 2002, and uses a statistical hypothesis test to decide whether the difference is significant.
- Technologies: Python, Pandas, Matplotlib, Pingouin, SciPy
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/8-%20Hypothesis%20Testing%20with%20Men's%20and%20Women's%20Soccer%20Matches)

9. **9- Predictive Modeling for Agriculture**
- Description: This project aims to assist farmers in selecting the best crops to plant each season by using machine learning to predict crop yields based on soil conditions.
- Technologies: Python, Pandas, Scikit-learn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/9-%20Predictive%20Modeling%20for%20Agriculture)

10. **10- Clustering Antarctic Penguin Species**
- Description: This project uses clustering techniques to identify different species of Antarctic penguins based on their physical characteristics.
- Technologies: Python, Pandas, Scikit-learn, Matplotlib, Seaborn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/10-%20Clustering%20Antarctic%20Penguin%20Species)

11. **11- Predicting Movie Rental Durations**
- Description: This project aims to predict the duration for which a movie will be rented based on various features.
- Technologies: Python, Pandas, Scikit-learn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/11-%20Predicting%20Movie%20Rental%20Durations)
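To give a flavor of the kind of work in these projects, here is a minimal sketch of the data type optimization described for project 5. It is not the project's actual code: the column names, the sample values, and the filter criterion (`experience_years >= 10`) are assumptions made up for illustration, and only the name `ds_jobs_transformed` comes from the project description.

```python
import pandas as pd

# Hypothetical stand-in for customer_train.csv. The columns and values are
# illustrative assumptions, not the project's real schema.
n_repeats = 1000
raw = pd.DataFrame({
    "student_id": list(range(6 * n_repeats)),
    "city": ["city_1", "city_2", "city_1", "city_3", "city_2", "city_1"] * n_repeats,
    "experience_years": [3, 12, 15, 1, 20, 11] * n_repeats,
    "training_hours": [36.0, 47.0, 83.0, 52.0, 8.0, 18.0] * n_repeats,
    "job_change": [1.0, 0.0, 0.0, 1.0, 0.0, 1.0] * n_repeats,
})


def optimize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a filtered copy of df that uses smaller data types."""
    out = df.copy()
    # Integers that fit in 32 bits do not need the default 64-bit storage.
    out["student_id"] = out["student_id"].astype("int32")
    out["experience_years"] = out["experience_years"].astype("int32")
    # Single precision is enough for these hour counts.
    out["training_hours"] = out["training_hours"].astype("float32")
    # A 0.0/1.0 flag is stored more compactly as a boolean.
    out["job_change"] = out["job_change"].astype("bool")
    # A text column with few distinct values compresses well as a category.
    out["city"] = out["city"].astype("category")
    # Assumed filter criterion: keep only rows with 10 or more years of experience.
    return out[out["experience_years"] >= 10].reset_index(drop=True)


ds_jobs_transformed = optimize(raw)

before = raw.memory_usage(deep=True).sum()
after = ds_jobs_transformed.memory_usage(deep=True).sum()
print(f"Memory before: {before} bytes, after: {after} bytes")
```

Passing `deep=True` to `memory_usage` makes pandas count the actual string contents rather than only the references to them, which gives a fairer comparison with the categorical version.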

### Medium Projects

Medium projects are more comprehensive and involve multiple steps and techniques. They are suitable for those who have a basic understanding of data science and want to delve deeper into more complex problems.

1. **12- DataCamp Data Scientist Associate Practical Supermarket Loyalty**
- Description: This project involves analyzing supermarket loyalty data to understand customer behavior and predict future loyalty.
- Technologies: Python, Pandas, Scikit-learn, Matplotlib, Seaborn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/12-%20DataCamp%20Data%20Scientist%20Associate%20Proctical%20Supermarket%20Loyalty)

2. **13- DataCamp Data Scientist Associate Certification DS601P**
- Description: This project is part of the DataCamp Data Scientist Associate Certification and involves various data science tasks to demonstrate proficiency in data analysis and modeling.
- Technologies: Python, Pandas, Scikit-learn, Matplotlib, Seaborn
- [Link to Project](https://github.com/AbdooMohamedd/Data-Science-projects/tree/main/13-%20DataCamp%20Data%20Scientist%20Associate%20Certification%20DS601P)

## Getting Started

To get started with any of the projects, follow these steps:

1. **Clone the repository:**

```bash
git clone https://github.com/AbdooMohamedd/data-science-projects.git
cd data-science-projects
```

2. **Navigate to the desired project folder:**

```bash
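# Replace with any project folder; quote the name because it contains spaces
cd "1- Exploring NYC Public School Test Result Scores"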
```

3. **Install the required dependencies:**

Each project specifies its dependencies in the documentation. Install the dependencies using pip:

```bash
pip install pandas matplotlib seaborn scikit-learn
```

4. **Run the project:**

Follow the instructions provided in the project's README or documentation file to run the project.
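If you are unsure whether step 3 installed everything, a small check like the one below can help. It is only a sketch: the module list is assembled from the technologies named in this README, and a given project may need fewer or more packages. Note that the scikit-learn package is imported as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the libraries mentioned in this README (an assumed list;
# adjust it to the project you want to run).
REQUIRED_MODULES = [
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
    "sklearn",  # installed with: pip install scikit-learn
    "statsmodels",
    "scipy",
    "pingouin",
]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are available.")
```

`find_spec` only looks a module up without importing it, so the check stays fast even for large libraries.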

## Contributing

I welcome contributions to this repository. If you have a project that you would like to add or improvements to suggest, please fork the repository and create a pull request.

## Contact

If you have any questions or suggestions, feel free to contact me at [abdelrahman.mohamed1081@gmail.com](mailto:abdelrahman.mohamed1081@gmail.com).