https://github.com/victorlcastro-dsa/pbl-datacamp

This repository features projects from DataCamp's Project-Based Learning (PBL) courses, showcasing practical applications of data analysis, machine learning, and visualization. Explore real-world datasets and interactive results that highlight the skills gained through hands-on learning.


Host: GitHub
URL: https://github.com/victorlcastro-dsa/pbl-datacamp
Owner: victorlcastro-dsa
Created: 2024-08-23T22:38:39.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-10-23T01:25:24.000Z (9 months ago)
Last Synced: 2025-03-03T09:42:39.043Z (4 months ago)
Topics: data-analysis, data-science, data-visualization, datacamp-projects, hypothesis-testing, machine-learning, project-based-learning
Language: Jupyter Notebook
Homepage:
Size: 33 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

---

## 📊 DataCamp Project-Based Learning Repository

Welcome to my repository of projects completed through DataCamp's project-based learning courses. The projects work with real-world datasets and cover data analysis, machine learning, and visualization, showing practical applications of the skills and concepts I acquired during my studies.

### 📁 Contents

- **Project 1: [Insurance Claims Prediction](workspaces/modeling_car_insurance_claim_outcomes)**

A machine learning project focused on predicting whether a customer will make a claim on their car insurance during the policy period. Key techniques used include data preprocessing, feature engineering, and model selection to optimize accuracy.

- **Project 2: [Predictive Modeling for Agriculture](workspaces/predictive_modeling_for_agriculture)**

A machine learning project aimed at helping farmers select the optimal crop for their field using supervised learning and feature selection techniques. The challenge is a budget constraint: only one of four critical soil measures can be assessed, namely nitrogen, phosphorus, potassium, or pH value. The goal is to identify the single most influential feature for accurate crop prediction, demonstrating feature selection in a real-world agricultural context.

- **Project 3: [Exploring Airbnb Market Trends](workspaces/exploring_airbnb_market_trends)**

This project conducts a detailed analysis of the short-term rental market in New York City using Airbnb listing data from 2019. By merging information from multiple files, the study explores key aspects such as the earliest and most recent review dates, the number of private room listings, and the average listing price. The primary goal is to provide valuable insights for a real estate startup, with a focus on understanding the market for private rooms, and to consolidate the results into a single DataFrame for easy interpretation and decision-making.

- **Project 4: [Customer Analytics: Preparing Data for Modeling](workspaces/customer_analytics_preparing_data_for_modeling)**

This project focuses on optimizing a large customer dataset for *Training Data Ltd.*, with the aim of enhancing the efficiency of predictive modeling. The dataset contains detailed information about students, including their demographics, education, and work experience, which will be used to predict whether they are seeking new job opportunities. By applying various data transformation techniques—such as converting categorical data to more memory-efficient types and filtering the dataset for relevant entries—the project significantly reduces memory usage, enabling faster and more efficient model training and predictions.
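The memory savings described above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names, the values, and the `experience_years >= 3` filter are all invented for the example.

```python
import pandas as pd

# Invented sample data; the real dataset's columns and values may differ.
df = pd.DataFrame({
    "education_level": ["Graduate", "Masters", "Graduate", "High School"] * 2500,
    "experience_years": [3, 12, 7, 1] * 2500,
    "job_change": [1.0, 0.0, 0.0, 1.0] * 2500,
})
before = df.memory_usage(deep=True).sum()

# Store repeated strings as categories, and use narrower numeric/boolean types.
optimized = df.assign(
    education_level=df["education_level"].astype("category"),
    experience_years=df["experience_years"].astype("int32"),
    job_change=df["job_change"].astype("bool"),
)

# Hypothetical relevance filter: keep only rows with at least 3 years of experience.
optimized = optimized[optimized["experience_years"] >= 3]
after = optimized.memory_usage(deep=True).sum()

print(f"Memory before: {before:,} bytes, after: {after:,} bytes")
```

Passing `deep=True` makes `memory_usage` count the actual string contents, which is what makes the gain from the `category` conversion visible.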

- **Project 5: [Visualizing the History of Nobel Prize Winners](workspaces/visualizing_the_history_of_nobel_prize_winners)**

This project analyzes a century's worth of Nobel Prize data to uncover trends and insights into the distribution of awards across different demographics and categories. By exploring the dataset, the project identifies key patterns, such as the most common gender and birth country among Laureates, the prominence of US-born winners across decades, and the representation of female Laureates. Through data manipulation and visualization, it sheds light on potential biases and notable achievements in the history of the Nobel Prize.

- **Project 6: [Analyzing Crime in Los Angeles](workspaces/analyzing_crime_in_los_angeles)**

This project analyzes crime data in Los Angeles to help the LAPD allocate resources more effectively. By examining crime records, the project identifies patterns in criminal activity, such as peak hours for crimes and areas with high frequencies of night crimes. The analysis also explores the demographics of crime victims, providing insights into age group vulnerabilities. These findings will support strategic decision-making to enhance public safety in Los Angeles.

- **Project 7: [Exploring NYC Public School Test Result Scores](workspaces/exploring_nyc_public_school_test_result_scores)**

This project analyzes the SAT performance of New York City's public schools, focusing on identifying schools with the highest math scores, examining score variations across different boroughs, and ranking the top ten schools based on their combined SAT scores. The insights derived from this analysis are valuable for educators, policymakers, and parents in making informed decisions about school performance and educational quality in NYC.

- **Project 8: [Investigating Netflix Movies](workspaces/investigating_netflix_movies)**

This project analyzes Netflix movie data to explore trends and patterns in film production over a specific decade. By utilizing Python data manipulation and visualization techniques, the goal is to identify common features among popular films, such as frequently watched genres, typical durations, and factors influencing viewer ratings. These analyses provide valuable insights for film production and market understanding on the platform.

- **Project 9: [Mobile Games A/B Testing with Cookie Cats](workspaces/mobile_games_a_b_testing_with_cookie_cats)**

This project analyzes an A/B test in the mobile game *Cookie Cats* to evaluate the impact of moving a gameplay gate from level 30 to level 40 on player retention. By examining 1-day and 7-day retention rates, the analysis reveals that keeping the gate at level 30 results in higher retention. The findings suggest that an earlier gate encourages players to take breaks, prolonging their enjoyment of the game and improving retention, which is critical for maintaining an active player base.
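One common way to compare retention between two groups is a bootstrap of the difference in retention rates. The sketch below uses only the standard library and invented counts; it is not the notebook's code, and the function name `bootstrap_retention_diff` is hypothetical.

```python
import random


def bootstrap_retention_diff(group_a, group_b, n_boot=1000, seed=0):
    """Bootstrap the difference in retention rate (group_a minus group_b).

    group_a and group_b are lists of 0/1 flags, where 1 means the player returned.
    Returns a list with one difference per bootstrap resample.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        sample_a = rng.choices(group_a, k=len(group_a))
        sample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    return diffs


# Invented data for illustration only: 1 = player retained after 7 days.
gate_30 = [1] * 190 + [0] * 810  # 19.0% retained
gate_40 = [1] * 180 + [0] * 820  # 18.0% retained

diffs = bootstrap_retention_diff(gate_30, gate_40)
share_positive = sum(d > 0 for d in diffs) / len(diffs)
print(f"Share of resamples where gate_30 retains better: {share_positive:.2f}")
```

The share of resamples with a positive difference gives a rough sense of how confident one can be that the level-30 gate retains more players than the level-40 gate.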

- **Project 10: [The Android App Market on Google Play](workspaces/the_android_app_market_on_google_play)**

This project analyzes the Android app market by examining over ten thousand apps on the Google Play Store. The analysis includes a comparison between free and paid apps, focusing on user sentiment, number of installs, and other metrics that influence app success. Through sentiment analysis of user reviews, the project reveals that free apps often receive more negative feedback than paid apps, which tend to be of higher quality. The insights gained from this analysis can guide strategies for app development and marketing, helping developers make informed decisions to enhance app performance and user satisfaction.

- **Project 11: [What and Where are the World's Oldest Businesses](workspaces/what_and_where_are_the_worlds_oldest_businesses)**

This project explores the world's oldest businesses, analyzing data from various industries and regions to understand how these companies have managed to survive for centuries. The analysis reveals that certain industries, such as banking and finance, have a higher representation among the oldest businesses, while regions like Europe and Asia host many of these ancient companies. The insights gained from this study can inform strategies for long-term business success and sustainability.

- **Project 12: [Creating Functions to Register App Users](workspaces/creating_functions_to_register_app_users)**

This project involves developing and integrating Python validation functions into a mobile app's registration system. By catching errors in user inputs, the functions aim to improve the robustness of the sign-up process, ensuring a better onboarding experience for app users. The notebook provides a step-by-step guide to defining, testing, and deploying these functions within the app.
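Validation functions of this kind might look like the sketch below. The specific rules (minimum name length, the loose email pattern, the password requirements) and the function names are assumptions made for illustration; the notebook's actual rules may differ.

```python
import re


def validate_name(name):
    """Assumed rule: a name is a string with more than two characters."""
    return isinstance(name, str) and len(name.strip()) > 2


def validate_email(email):
    """Assumed rule: loosely shaped like local@domain.tld, with no whitespace."""
    return (
        isinstance(email, str)
        and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) is not None
    )


def validate_password(password):
    """Assumed rule: 8+ characters with an uppercase letter, a lowercase letter, and a digit."""
    return (
        isinstance(password, str)
        and len(password) >= 8
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
    )


def register_user(name, email, password):
    """Return a user record if every field is valid, else raise ValueError naming the bad fields."""
    checks = {
        "name": validate_name(name),
        "email": validate_email(email),
        "password": validate_password(password),
    }
    errors = [field for field, ok in checks.items() if not ok]
    if errors:
        raise ValueError("Invalid fields: " + ", ".join(errors))
    # The password is deliberately left out of the returned record.
    return {"name": name.strip(), "email": email}


print(register_user("Ada Lovelace", "ada@example.com", "Secret123"))
```

Keeping each rule in its own small function makes the checks easy to test in isolation before they are combined in the registration step.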

- **Project 13: [Interstellar Delivery: Mastering Datetime in Python](workspaces/interstellar_delivery_mastering_datetime_in_python)**

This project creates reusable functions for managing a fictional space logistics startup. Using Python’s `datetime` module, it covers timestamp formatting, rocket landing time estimation, and days-until-delivery calculations. These functions handle schedules and deadlines in an intergalactic scenario, offering a practical and educational exercise in working with dates and times in Python.
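The three tasks named above can be sketched with the standard `datetime` module. The function names, signatures, and the timestamp format are assumptions for illustration rather than the notebook's own definitions.

```python
from datetime import datetime, timedelta


def format_timestamp(ts):
    """Render a datetime as 'YYYY-MM-DD HH:MM:SS' (an assumed display format)."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")


def estimate_landing_time(launch_time, travel_days):
    """Add the travel duration in days (fractions allowed) to the launch time."""
    return launch_time + timedelta(days=travel_days)


def days_until_delivery(expected_delivery, now):
    """Whole days remaining between now and the expected delivery."""
    return (expected_delivery - now).days


launch = datetime(2024, 1, 1, 12, 0, 0)
landing = estimate_landing_time(launch, travel_days=2.5)

print(format_timestamp(landing))                 # 2024-01-04 00:00:00
print(days_until_delivery(landing, now=launch))  # 2
```

Passing `now` explicitly, instead of calling `datetime.now()` inside the function, keeps the calculation deterministic and easy to test.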

- **Project 14: [Predicting Temperature in London](workspaces/predicting_temperature_in_london)**

This project focuses on building a machine learning pipeline to predict the mean temperature in London, England. By experimenting with various regression models and utilizing `sklearn` and `mlflow`, the project aims to identify the best approach for predicting weather patterns. The analysis involves exploring the `london_weather.csv` dataset and evaluating different weather factors to enhance prediction accuracy. Additionally, the use of `mlflow` enables tracking and optimization of model performance throughout the project.

- **Project 15: [Hypothesis Testing With Men's and Women's Soccer Matches](workspaces/hypothesis_testing_with_men's_and_women's_soccer_matches)**

This project compares the performance of men's and women's soccer teams to test the null hypothesis that there is no difference in their performance. The analysis uses a dataset of World Cup matches to explore the distributions of goals scored, shots on goal, and possession percentage. The project applies hypothesis testing to evaluate the differences between male and female teams and provides insights into the results. The study's findings can be used to improve strategies for both men's and women's teams, as well as highlight areas of improvement for teams in both categories.

- **Project 16: [Clustering Antarctic Penguin Species](workspaces/clustering_antartic_penguin_species)**

This project leverages clustering techniques to analyze and categorize Antarctic penguin species based on physical measurements like beak size, flipper length, and body weight. By identifying patterns and distinct characteristics, the analysis aims to assist researchers in understanding the unique adaptations of each species. The findings can provide valuable insights into biodiversity and ecological strategies in extreme climates.

- **Project 17: [Hypothesis Testing in Healthcare](workspaces/hypothesis_testing_in_healthcare)**

This project applies hypothesis testing to analyze the relationship between a new drug and five adverse effects: headache, abdominal pain, dyspepsia, upper respiratory infection, and chronic obstructive airway disease (COAD). The analysis focuses on identifying statistically significant differences in the proportion of patients experiencing each adverse effect between the drug and placebo groups. The study's findings can inform healthcare professionals about the potential risks associated with the drug and guide decision-making for treatment strategies.
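A comparison of proportions between two groups can be illustrated with a two-sided two-proportion z-test. The sketch below is a standard-library version with invented counts; the notebook may use a different test or a statistics library, and `two_proportion_ztest` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(events_a, n_a, events_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Uses the pooled proportion to estimate the standard error under the null
    hypothesis that both groups share the same underlying rate.
    Returns (z statistic, p-value).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented counts for illustration: patients reporting headache in each group.
z, p_value = two_proportion_ztest(events_a=100, n_a=1000, events_b=40, n_b=500)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

When the same test is repeated for each of the five adverse effects, a multiple-comparison adjustment to the significance threshold is worth considering.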

- **More Projects** `Coming Soon`

### ✨ Highlights

- **Real-World Data:** Projects utilize real-world datasets to provide hands-on experience with practical challenges. 🌍
- **Comprehensive Analysis:** Includes detailed analysis, data preprocessing, and machine learning model implementation. 🔍
- **Interactive Visualizations:** Incorporates visualizations to effectively communicate insights and findings. 📈

### 🎯 Learning Objectives

Through these projects, I have gained valuable experience in:

- **Data Cleaning and Preprocessing:** Techniques for handling and preparing data for analysis. 🧹
- **Exploratory Data Analysis (EDA):** Methods for understanding data patterns and relationships. 🕵️‍♂️
- **Machine Learning:** Implementing various algorithms for predictive modeling and classification. 🤖
- **Visualization:** Creating informative and compelling visual representations of data. 🎨

### 🔧 Installation Instructions

To get started with these projects, you'll need to have the following installed:

- Python 3.x
- Required libraries (listed in `requirements.txt`)

You can install the required libraries using pip:

```bash
pip install -r requirements.txt
```

### 📜 Documentation

Detailed documentation for each project is available in the corresponding Jupyter notebooks or scripts. For additional information, please refer to the comments within the code.

### 📈 Examples

Below are some examples of visualizations and results obtained from the projects:

`Coming Soon`

### 🛠️ Tools and Technologies

- **Python** 🐍
- **Pandas** 📊
- **NumPy** 🧮
- **scikit-learn** 🤖
- **Matplotlib** 📉
- **Jupyter Notebook** 📓
- **And More**

### 🤝 Contributing

Contributions are welcome! If you have suggestions or improvements, please fork the repository and submit a pull request. For any major changes, please open an issue first to discuss what you would like to change.

### 📧 Contact

Feel free to reach out via [email](mailto:[email protected]) for any questions or feedback.

---
