https://github.com/als8446/tripleten-data-science-projects
Projects Overview Projects made in the Data Scientist course from TripleTen LatAm
data data-analysis hypothesis-tests machine matplotlib numpy pandas python scipy sklearn
- Host: GitHub
- URL: https://github.com/als8446/tripleten-data-science-projects
- Owner: als8446
- Created: 2025-09-26T03:24:28.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-09-26T05:31:20.000Z (9 months ago)
- Last Synced: 2025-09-26T05:40:21.024Z (9 months ago)
- Topics: data, data-analysis, hypothesis-tests, machine, matplotlib, numpy, pandas, python, scipy, sklearn
- Language: Jupyter Notebook
- Homepage:
- Size: 942 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# TripleTen-Data-Science-Projects
These are the projects I completed in the Data Scientist course at TripleTen LatAm.
They cover data preprocessing, exploratory data analysis, and statistical analysis. Some of them also involve building machine learning models.
## 📊 Projects Overview
| Topic | Project | Description | Highlights & Libraries |
|-------------------------------|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
| Data Preprocessing | Customer Loyalty Program Preparation | The e-commerce company **Store 1** is preparing the launch of a new Customer Loyalty Program. The goal is to analyze and clean its customer database to ensure data is complete, consistent, and ready for analysis. This includes cleaning customer profiles, standardizing names and ages, calculating total spending, and validating data consistency. | Python, pandas, data cleaning, preprocessing, KPIs |
| Exploratory Data Analysis | Customer Segmentation & Consumption Trends | In the final stage of the project, I focused on **customer segmentation** and **analyzing consumption trends over time** for Store 1, using enriched datasets to gain deeper insights and improve personalized marketing campaigns. | Python, pandas, data analysis, customer segmentation |
| Exploratory Data Analysis | Music Streaming Analysis | I worked with **real music streaming data** from Springfield and Shelbyville to explore and process user listening habits. This included describing the dataset, cleaning and preprocessing the data, and analyzing activity patterns by city and day of the week. | Python, pandas, numpy, matplotlib, seaborn, data cleaning, EDA, time series analysis |
| Exploratory Data Analysis | Instacart Customer Purchase Analysis | I analyzed **Instacart data** to explore customer grocery shopping behavior. The project involved cleaning and preprocessing multiple datasets, creating visualizations for order patterns by hour and day, analyzing repeated orders, and identifying top products and purchasing trends. | Python, pandas, numpy, matplotlib, seaborn, data cleaning, EDA, multi-table analysis |
| Statistical Data Analysis | Megaline Tariff Revenue Analysis | I analyzed **Megaline telecom data** to determine which prepaid plan, Surf or Ultimate, generates more revenue. The project included preprocessing multiple datasets, calculating user usage, estimating monthly revenue per user, analyzing usage patterns, and testing statistical hypotheses about revenue differences. | Python, pandas, numpy, matplotlib, seaborn, statistical analysis, hypothesis testing |
| ⭐ Project | Video Games Success Analysis | I analyzed **historical video game data** from the online store Ice to identify patterns that determine whether a game is successful. The project involved preprocessing data, analyzing sales by platform, genre, and region, exploring relationships between critic/user reviews and sales, and testing hypotheses on ratings and genres. | Python, pandas, numpy, matplotlib, seaborn, hypothesis testing, EDA |
| Software Engineering & Web App| Streamlit Dashboard for Vehicle Ads | I developed a **web application using Streamlit** to explore and visualize vehicle sales data. This included creating a Python virtual environment, performing basic exploratory data analysis (EDA) with Plotly, and building interactive visualizations (histogram and scatter plot) that can be triggered via buttons or checkboxes in the web app. | Python, pandas, plotly, streamlit, EDA, web app development, GitHub, Render deployment |
| Data Analysis & Hypothesis Testing | Chicago Taxi Trips Analysis | I analyzed **taxi trips data for Chicago** to identify travel patterns and the impact of external factors like weather. The project involved writing SQL queries to aggregate trips by taxi company and neighborhoods, joining trip data with weather records, conducting exploratory analysis with Python, visualizing results, and testing the hypothesis that "average trip duration from the Loop to O'Hare changes on rainy Saturdays." | Python, pandas, matplotlib, seaborn, SQL, data cleaning, EDA, hypothesis testing |
| Machine Learning | Megaline Plan Recommendation Model | I built a **classification model** to recommend Megaline's new plans (Smart or Ultra) based on user behavior. The task involved feature engineering, training/validation/test splits, hyperparameter tuning, model evaluation (accuracy threshold: 0.75), and sanity checks. | Python, pandas, scikit-learn, xgboost, model selection, classification, hyperparameter tuning |
| Machine Learning & Modeling | Beta Bank Customer Churn Prediction | I developed a churn prediction pipeline for **Beta Bank** to predict whether customers will leave. The focus was maximizing **F1 score** (target ≥ 0.59) and evaluating AUC-ROC. The project includes preprocessing, class imbalance handling, model selection/tuning, evaluation on a test set, and interpretation. | Python, pandas, scikit-learn, imbalanced-learn, xgboost, feature engineering, F1, AUC-ROC |
| Regression & Bootstrapping | Oil Wells Selection & Region Profit Analysis | I built a **linear regression pipeline** to predict oil reserves for candidate wells across three regions, selected the top 200 wells per region, estimated profit under a $100M budget constraint, and quantified risk with **bootstrapping** (1000 samples). I selected the region(s) that meet the loss-risk threshold (<2.5%) and have the highest mean profit. | Python, pandas, scikit-learn (LinearRegression), numpy, scipy, bootstrapping, RMSE, profit simulation |
| Machine Learning & Regression | Gold Recovery Prediction | I developed a **gold recovery prediction pipeline** for industrial ore processing datasets. The workflow included cleaning, analyzing missing values and anomalies, exploring metal concentrations, preprocessing features, training regression models, evaluating with **sMAPE**, and selecting the best model for final predictions. | Python, pandas, scikit-learn, numpy, matplotlib, seaborn, data cleaning, regression, sMAPE |
| Linear Algebra & ML | Sure Tomorrow Insurance Analysis | I applied **linear algebra and machine learning** to solve practical insurance tasks. The tasks included finding similar clients, predicting the probability of receiving insurance benefits, predicting benefit amounts with linear regression, and implementing **data masking** to protect sensitive information without reducing model quality. | Python, pandas, numpy, scikit-learn, linear algebra, regression, data masking, classification, similarity analysis |
| Machine Learning & Regression | Rusty Bargain Car Price Prediction | I developed a **car price prediction pipeline** for **Rusty Bargain** using historical car listings. The project involved preprocessing categorical and numerical features, training multiple regression models (Linear Regression, Random Forest, Gradient Boosting), tuning hyperparameters, evaluating model quality using **RMSE**, and analyzing prediction speed and training time. | Python, pandas, scikit-learn, xgboost, lightgbm, catboost, regression, hyperparameter tuning, model evaluation |
| Time Series Forecasting & ML | Sweet Lift Taxi Orders Prediction | I developed a **time series prediction model** for **Sweet Lift Taxi** to forecast taxi orders at airports. The project involved resampling historical data to hourly intervals, exploring patterns, training multiple regression and boosting models, tuning hyperparameters, evaluating with **RMSE**, and ensuring the error stays below the required threshold for operational planning. | Python, pandas, scikit-learn, xgboost, lightgbm, time series, regression, RMSE, forecasting |
| NLP & Classification | Film Junky Union Movie Review Sentiment | I developed a **text classification pipeline** for **Film Junky Union** to detect negative movie reviews automatically. The project included preprocessing text data, exploring class balance, training at least three models (Logistic Regression, Gradient Boosting, others), evaluating with **F1 ≥ 0.85**, and performing custom review predictions. | Python, pandas, scikit-learn, NLP, text preprocessing, classification, sentiment analysis, F1 |
| Computer Vision & ML | Good Seed Alcohol Age Verification | I developed a **computer vision model** for **Good Seed** to automatically verify customer age when purchasing alcohol. The project included exploratory data analysis of images, model training on GPU, evaluation of accuracy, and reporting insights to ensure legal compliance for alcohol sales. | Python, OpenCV, TensorFlow, Keras, computer vision, CNN, image preprocessing, model evaluation |
| Exploratory Data Analysis & ML| Project 18 Task Planning & Model Solution | I conducted **exploratory data analysis and planning** to develop a structured approach for completing tasks. The project focused on clarifying requirements, creating a multi-step plan, implementing code solutions, testing models, and reporting outcomes with explanations for all decisions and challenges faced. | Python, pandas, numpy, Jupyter Notebook, data analysis, model evaluation, planning, reporting |
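
The bootstrapping step in the Oil Wells row can be illustrated with a minimal sketch. This is not the notebook's actual code: the function name `bootstrap_profit`, the `revenue_per_unit` value, and the `sample_size` of 500 wells per draw are assumptions made for illustration. Only the top-200 selection, the $100M budget, the 1000 bootstrap samples, and the 2.5% loss-risk threshold come from the table above.

```python
import numpy as np


def bootstrap_profit(predicted, actual, n_select=200, budget=100_000_000,
                     revenue_per_unit=4_500, sample_size=500,
                     n_samples=1000, seed=0):
    """Estimate mean profit and loss risk for one region by bootstrapping.

    revenue_per_unit and sample_size are assumed values, not taken from
    the README; replace them with the project's real business parameters.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    rng = np.random.default_rng(seed)
    profits = np.empty(n_samples)
    for i in range(n_samples):
        # Draw candidate wells with replacement.
        idx = rng.integers(0, len(predicted), size=sample_size)
        # Keep the wells with the highest predicted reserves.
        top = idx[np.argsort(predicted[idx])[-n_select:]]
        # Profit is computed from the actual reserves of the chosen wells.
        profits[i] = actual[top].sum() * revenue_per_unit - budget
    # Loss risk is the share of bootstrap samples with negative profit.
    return float(profits.mean()), float((profits < 0).mean())


# Usage with synthetic data (hypothetical numbers, for illustration only).
demo_rng = np.random.default_rng(42)
actual_reserves = demo_rng.normal(loc=95.0, scale=45.0, size=10_000)
predicted_reserves = actual_reserves + demo_rng.normal(scale=30.0, size=10_000)
mean_profit, loss_risk = bootstrap_profit(predicted_reserves, actual_reserves)
print(f"mean profit: {mean_profit:,.0f}, loss risk: {loss_risk:.1%}")
print("meets 2.5% risk threshold:", loss_risk < 0.025)
```

Selecting wells by predicted reserves but scoring them on actual reserves keeps the simulation honest about model error, which is what makes the loss-risk estimate meaningful.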