Projects in Awesome Lists tagged with data-cleaning-and-preprocessing
A curated list of projects in awesome lists tagged with data-cleaning-and-preprocessing.
https://github.com/aliiimaher/laptop-price-prediction
This is an AI model for predicting laptop prices, trained on about 1,200 data points.
ai data-cleaning-and-preprocessing linear-algebra linear-regression price-prediction-model
Last synced: 21 Sep 2025
https://github.com/patilni3/project_power_bi
Global Superstore BI Dashboard
dashboard data-cleaning-and-preprocessing database powerbi report
Last synced: 03 Apr 2025
https://github.com/mindful-ai-assistants/sp2024-election-analysis
An analysis of voting patterns in São Paulo's 2024 elections, focusing on voter behavior, absenteeism, and geographic trends.
beautifulsoup dashboards data-analysis data-cleaning-and-preprocessing data-science dataset-creation datavisualization election-sp-brazil-2024 geolocalization geolocation geolocator maps oneness-consciousness power-bi python web-scraping-python
Last synced: 14 Apr 2025
https://github.com/harshita2234/potato-prices-prediction
Project aims to forecast potato prices in India using LSTM, KNN, and Random Forest Regression, integrating historical data on prices, regional stats, and rainfall patterns. Targeting agricultural stakeholders for informed decision-making.
csv-files data-cleaning-and-preprocessing data-mining-python k-nearest-neighbours knn long-short-term-memory lstm machine-learning-algorithms predictive-modeling python3 random-forest-regression random-forest-regressor
Last synced: 25 Oct 2025
https://github.com/1sumer/1sumer
Data Analyst | Python | SQL | Power BI | R | Excel | PySpark | EDA | ETL | Data Visualization | Statistical Analysis | Data Wrangling | Data Modeling | MongoDB | Machine Learning | Deployment | GitHub | AWS
data-analyst data-cleaning-and-preprocessing data-engineer data-modelling data-scientist data-visualization
Last synced: 19 Jan 2026
https://github.com/jim60105/image-dataset-prep-tools
Scripts for cleaning, converting, and managing image datasets for ML training. (Zsh/Python)
data-cleaning data-cleaning-and-preprocessing ml python zsh
Last synced: 22 Sep 2025
https://github.com/vidhi1290/robust-yield-prediction-
"Predicting a Greener Future. Delve into the world of agriculture and data science with our Yield Prediction project. We harness machine learning and weather data to forecast crop yields accurately. Join us in cultivating smarter farming practices for a sustainable tomorrow."
artificial-intelligence data-analysis data-cleaning-and-preprocessing data-science data-visualization dataexploration devops docker machine-learning machine-learning-algorithms matplotlib matplotlib-pyplot pandas python scikit-learn scikitlearn-machine-learning streamlit yield-prediction-for-food-processing
Last synced: 15 Apr 2026
https://github.com/karlyndiary/global-electronics-retailer-sales-and-customer-insights
Developed an analysis using Python, SQL, and Excel to examine sales and customer demographics for a Global Electronics Retailer. The findings aim to enhance business strategies and improve overall performance.
dashboard data-analysis data-cleaning-and-preprocessing data-pipeline data-visualization etl microsoft-excel microsoft-sql-server python sql
Last synced: 14 Feb 2026
https://github.com/randomgamingdev/grabcraft-to-schema
A Python library and its CLI for converting GrabCraft to schema (more specifically, Litematica schematic) files.
ai data-cleaning data-cleaning-and-preprocessing data-science grabcraft library litematica mc minecraft minecraft-build minecraft-building python schematic schematics
Last synced: 18 Feb 2026
https://github.com/madhurimarawat/data-warehousing
This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.
data-aggregation data-cleaning data-cleaning-and-preprocessing data-warehousing detailed-documentation etl etl-pipeline mysql normalization olap-cube olap-data olap-database query-optimization snowflake-schema star-schema
Last synced: 28 Apr 2026
https://github.com/anarya22/tata-data-visualization-empowering-business-with-effective-insights-job-simulation-on-forage
Completed a simulation involving creating data visualizations for Tata Consultancy Services. Created visuals for data analysis to help executives with effective decision making.
business-analysis data-cleaning-and-preprocessing data-visualization excel powerbi
Last synced: 07 Jan 2026
https://github.com/willie-conway/datavista
DataVista is a comprehensive, production-grade data analysis and machine learning platform that combines real-time data ingestion from live APIs, interactive visualizations, statistical analysis, hypothesis testing, and machine learning model training, all in a unified, professional-grade interface. Built with React and Recharts.
analytics-platform api-integration classification coingecko-api csv-import data-analysis data-cleaning-and-preprocessing data-pipeline data-science data-visualizations etl hypothesis-testing json-export machine-learning-models open-meteo react recharts regression statistics world-bank
Last synced: 30 May 2026
https://github.com/mayankyadav23/t20i-world-cup-2024-analysis
Explore my Jupyter Notebook featuring comprehensive datasets and visualizations from the 2024 T20 World Cup analysis. Discover key insights into player performances, match statistics, and team dynamics, making it a valuable resource for cricket enthusiasts and analysts alike!
cricket data-cleaning-and-preprocessing data-visualization icc insights jupyter-notebook t20-world-cup
Last synced: 28 Aug 2025
https://github.com/sudarshanasrao/from-data-to-gold--my-journey-creating-an-olympic-tableau-dashboard
Developed an interactive dashboard using Tableau with Kaggle's Olympic dataset.
data-cleaning-and-preprocessing eda python tableau-dashboards
Last synced: 20 Jun 2026
https://github.com/hamada-khairi/pfda-hamada
A comprehensive R-based data analysis project that examines housing rental patterns across multiple cities, using statistical methods and visualization techniques to analyze data points for 4,746 properties, including rent prices, locations, and amenities. The project employs various R libraries to clean, process, and visualize rental market trends.
apu data-analysis data-analysis-in-r data-cleaning-and-preprocessing data-processing-and-analysis data-science data-visualization-project ggplot2 house-rent-prediction r-programming-projects r-statistics r-studio real-estate-analytics
Last synced: 16 Mar 2025
https://github.com/shrawans007/hotel_customers_sentiments
Sentiment Analysis for a Hotel Based on Customer's Reviews
2018-2019 data-analysis data-analysis-in-excel data-cleaning data-cleaning-and-preprocessing data-visualization excel excel-pivot-tables github hotel-review-sentiments hotel-service ms-excel ms-excel-data-analytics pivot-tables sentiment-analysis tableau tableau-public text-reviews treemap
Last synced: 22 Mar 2025
https://github.com/girish119628/codsoft
Data Enthusiast | Predictive Modeler | Turning Insights into Strategies
cross-validation data-cleaning-and-preprocessing exploratory-data-analysis model-selection-and-evaluation
Last synced: 08 May 2026
https://github.com/yash22222/data-analysis-on-real-time-social-media-comments
EngageInsight analyzes user interactions in comment data. It provides insights through visualizations created using Python libraries like Pandas and Matplotlib. The project aims to uncover patterns and trends in user engagement. The visualizations provide an overview of comment lengths and the frequency of different types of replies.
data-analysis data-cleaning-and-preprocessing data-visualization matplotlib pandas pattern-recognition real-time-social-media-data seaborn trend-analysis
Last synced: 14 May 2026
https://github.com/sayamalt/superstore-sales-prediction
Successfully established a machine learning model that can accurately predict the sales of a superstore based on various features such as quantity, profit, discount, postal code, etc. The features are mainly associated with order details and customer demographics.
azure-machine-learning azure-web-app-service cicd-deployment cross-validation data-cleaning-and-preprocessing data-visualization exploratory-data-analysis feature-engineering github-actions-ci-cd hyperparameter-tuning machine-learning model-deployment model-retraining model-testing model-training-and-evaluation regression-models
Last synced: 09 Nov 2025
https://github.com/farhad-here/adventureworks_interactive_sales_dashboard_powerbi
An interactive Power BI dashboard for Adventure Works sales team to analyze performance, customers, products, and employees. Includes data cleaning, data modeling, DAX measures and advanced visualization features.
business-intelligence chart csv data-analysis data-cleaning data-cleaning-and-preprocessing data-visualization dax powerbi
Last synced: 13 Aug 2025
https://github.com/lkethridge/integrated_project_2
Integrated Project 2 from TripleTen
anomaly-detection cross-validation data-analytics data-cleaning-and-preprocessing data-science feature-engineering gold-recovery machine-learning metal-purification model-evaluation pandas portfolio-project python scikit-learn smape supervised-learning
Last synced: 18 Apr 2026
https://github.com/lmizner/cs249_datasciencefundamentals
Course work from UCLA's CS249 - Data Science Fundamentals
data-cleaning-and-preprocessing exploratory-data-analysis jupyter-notebook knn-regression linear-regression matplotlib numpy pandas python ridge-regression
Last synced: 12 Apr 2026
https://github.com/shubhamgoyal575/tableau-visualization-dashboard
This repository features interactive Tableau dashboards for sales performance and healthcare analysis. It includes insights on revenue trends, regional sales, patient demographics, and hospital occupancy for data-driven decision-making.
dashborad data-analysis data-cleaning-and-preprocessing healthcare-analysis healthcare-dashboard sales-dashboard sales-data-analysis-project tableau tableau-dashboards tableau-public visualization visualization-tools
Last synced: 20 Feb 2026
https://github.com/who-else-but-arjun/convolve
This repository contains the projects developed for the Convolve PAN IIT AI-ML Hackathon, conducted by IDFC Bank. Predicting Credit Card Defaulters: a deep learning-based model to assess the risk of credit card default. Optimizing Email Engagement Time Slots: a machine learning model to determine the best time slots for personalised emails.
data-cleaning-and-preprocessing feature-engineering hyperparameter-tuning lstm neural-networks regression-models
Last synced: 22 Aug 2025
https://github.com/adi3042/data_science
Explore the Data Science Universe! Unlock insights and master data skills with hands-on assignments spanning machine learning, visualization, and more. Your journey to becoming a data expert starts here! DataScienceJourney
anomaly-detection big-data-processing classification clustering computer-vision data-cleaning-and-preprocessing data-visualization deep-learning dimensionality-reduction ensemble-learning exploratory-data-analysis feature-engineering machine-learning model-deployment model-selection-and-evaluation natural-language-processing regression-analysis statistical-analysis time-series-analysis-and-forecasting
Last synced: 17 Jan 2026
https://github.com/sayamalt/life-expectancy-prediction
Successfully established a machine learning model which can accurately predict the expected life duration of a human being based on several demographic features such as alcohol consumption per capita, average BMI of entire population, etc.
cross-validation data-cleaning-and-preprocessing data-visualization docker end-to-end-pipeline exploratory-data-analysis feature-engineering github-actions-workflow hyperparameter-tuning machine-learning model-deployment model-training-and-evaluation
Last synced: 04 May 2026
https://github.com/jiyanshgarg/delhivery-logistics-data-analysis
This project analyzes Delhivery's logistics delivery dataset to understand delivery performance, route efficiency, and operational patterns using data analytics techniques. The analysis focuses on transforming raw segment-level logistics data into meaningful trip-level insights that can help improve delivery efficiency and route planning.
business-insights-and-recommendations data-analysis data-cleaning-and-preprocessing data-visualization exploratory-data-analysis feature-engineering feature-extraction feature-selection hypothesis-testing outlier-detection outlier-treatment
Last synced: 12 Jun 2026
https://github.com/aishwaryagade02/analyzing-animated-movie-release-date-patterns-and-its-effect-on-revenue
Predicting the release date for an anime movie that maximizes its revenue.
data-cleaning-and-preprocessing data-visualization machine-learning model-optimization neural-networks pytorch regression
Last synced: 11 Jun 2025
https://github.com/asuquoaa/ann_arbor_weather_analysis_2005-2015
This project analyzes historical weather data from Ann Arbor, Michigan, collected by the National Centers for Environmental Information (NCEI) Global Historical Climatology Network daily (GHCNd).
data-cleaning-and-preprocessing data-visualization
Last synced: 03 Apr 2025
https://github.com/jdavydovportfolio/moneypulse
Offline-first OCR → LLM → validation pipeline with a PySide6 GUI that ingests PDFs/images, extracts key merchant fields, enforces business rules, and exports clean CSV/JSON for CRM upload.
ai credit-analytics csv data-cleaning-and-preprocessing data-validation data-validation-and-error-handling etl financial-data fintech json llm lm-studio localllm ocr offline-first ollama pdf pyinstaller python pytorch
Last synced: 05 May 2026
https://github.com/udhaya2823/cardheko-used_car_price_prediction
Car Dheko - Used Car Price Prediction. This project enhances Car Dheko's customer experience by deploying an ML model that predicts used car prices accurately. Using a multi-city dataset, we perform data cleaning, feature engineering, and model optimization. The final model is hosted on a Streamlit app, providing instant price prediction.
data-cleaning-and-preprocessing documentation-and-reporting exploratory-data-analysis machine-learning-model-deployment model-deployment model-evaluation-and-optimization price-prediction-techniques streamlit-application-development
Last synced: 14 Oct 2025
https://github.com/aninditaws/questionnaire-exploratory-data-analysis
A comprehensive EDA project for analyzing questionnaire results. Includes data cleaning, descriptive statistics, and visualizations to identify trends and patterns in survey responses.
data-cleaning-and-preprocessing descriptive-statistics exploratory-data-analysis jupyter-notebook probability-and-statistics
Last synced: 26 Mar 2025
https://github.com/abhijeet107/final-project
Final project summation INTERNSHIP PROJECTS (2 WEEKS)
data-analysis data-cleaning-and-preprocessing excel mysql-database python tableau-public
Last synced: 23 Feb 2026
https://github.com/manishrajmss13/regression_project
A predictive machine learning model to forecast the Algerian Forest Fire FWI using Python, Scikit-learn, and Statsmodels. Includes complete data cleaning and EDA.
data-cleaning-and-preprocessing data-science eda feature-engineering learning-by-doing linear-regression machine-learning python regression scikit-learn statsmodel
Last synced: 09 May 2026
https://github.com/brooks-code/toulouse-biblio-chronicle
Snapshot of Toulouse public library customer habits: cleaning raw, messy datasets of musical, cinematic, and literary checkouts; includes data-cleaning steps and an analysis notebook revealing cultural tastes in the Pink City.
data-analysis data-cleaning data-cleaning-and-preprocessing data-quality exploratory-data-analysis jupyter-notebook library-data misaligned-data mojibake tutorial
Last synced: 10 Oct 2025
https://github.com/roushankhalid/structural-heart-disease
This project uses machine learning on ECG data to predict Structural Heart Disease (SHD), with fine-tuned models, explainable AI for feature insights, and an LLM-powered recommendation system to support clinical decision-making.
data-cleaning-and-preprocessing fine-tuning llm machine-learning-algorithms python3 recommendation-system
Last synced: 17 May 2026
https://github.com/srosalino/data_wrangling_investigations
Series of 3 investigation works, regarding the subject of Data Wrangling (Acquire data from different sources; Understand how to clean and pre-process data; Transform data for analytics purposes; Perform feature engineering; Visualize data)
data-cleaning-and-preprocessing data-extraction-and-pre-processing data-visualization feature-engineering
Last synced: 19 Oct 2025
https://github.com/mdfaisalahmed025/global-university-insights
Global University Insights
data-cleaning-and-preprocessing data-visualisation python selenium-webdriver tableau-public web-scraping
Last synced: 15 Apr 2026
https://github.com/kathisnehith/medicare-ip-hospital-analysis
In-depth Data analysis and visualization of Medicare inpatient hospital data.
data-analysis data-cleaning-and-preprocessing data-merging excel exploratory-data-analysis medicare-claim-costs-prediction powerquery sql tableau-dashboards
Last synced: 10 Feb 2026
https://github.com/quantum-software-development/5-datamining_datacleaning_preparation_anomalies_outlier
5-Data Mining - Data Cleaning, Preparation and Detection of Anomalies (Outlier Detectio
accuracy-metrics data-cleaning-and-preprocessing data-exploratory fraud-detection logistic-regression random-forest test-model
Last synced: 14 Feb 2026
https://github.com/crazy-dot/zomato-data-analysis
This project analyzes 50k Bengaluru restaurants from Zomato, focusing on 17 features like location and ratings. It cleans, explores, and visualizes data to improve services. Key visualizations include delivery, booking, location, and cost. The goal is to provide insights for better customer experiences.
data-cleaning-and-preprocessing data-manipulation-with-pandas inferential-statistics kaggle-dataset numpy pandas-python python zomato-data-analysis
Last synced: 19 Apr 2026
https://github.com/jrili/data-engineer-portfolio
Jessa Rili-Migriño's Data Engineer Portfolio
beautifulsoup4 data-cleaning-and-preprocessing etl pandas python webscraping
Last synced: 24 Apr 2026
https://github.com/hossein-rahmati/airbnb-property-dataset
This project explores, cleans, and analyzes an Airbnb property dataset to uncover insights related to listings, pricing, and availability. The goal is to better understand patterns in Airbnb listings, detect outliers, and prepare data for potential machine learning models or business insights.
airbnb data-cleaning-and-preprocessing eda pandas sklearn
Last synced: 06 May 2026
https://github.com/asuquoaa/big_4_sports_teams_and_city_population_analysis-2018-
Analysis of sports teams' win/loss ratios vs. metro area populations across NFL, NBA, MLB, and NHL.
data-cleaning-and-preprocessing numpy pandas
Last synced: 13 May 2026
https://github.com/mohsinraza2999/generous-tipper
A production-level, modular data science project that aims to predict generous tippers for taxi drivers.
backend-development ci-pipeline data-analysis data-cleaning-and-preprocessing docker exploratory-data-analysis fastapi feature-engineering front-end hypothesis-testing logistic-regression randon-forest understanding-business-problem xgboost-classifier
Last synced: 14 Jun 2026
https://github.com/ganesh2409/strive_towards_ai
This repository contains materials from a two-session workshop on Machine Learning and Deep Learning. Session 1 covers data preprocessing techniques including data cleaning, feature engineering, and exploratory data analysis. Session 2 focuses on building and training a neural network using TensorFlow and the Fashion MNIST dataset.
data-cleaning-and-preprocessing deep-learning exploratory-data-analysis machine-learning
Last synced: 16 Jun 2026
https://github.com/tanmayborse/institionistic_fuzzy_approx_space
This model introduces a hybrid approach that utilizes rough sets on intuitionistic fuzzy approximation spaces for pre-processing and soft sets for post-processing, resulting in an effective decision-making solution.
data-cleaning-and-preprocessing data-science data-visualization decision-making fuzzy-logic
Last synced: 17 Jun 2026
https://github.com/rajesh9943/visualizing-global-development-trends-an-animated-analysis-of-life-expectancy-and-fertility-rates
To clean and analyze data to find trends in global population, fertility, and life expectancy from 1960 to 2016. This idea was inspired by Hans Rosling. To analyze the data, I used a scatter bubble chart, which clearly shows how the population increased and the fertility rate decreased from 1960 to 2016.
data-analysis data-cleaning-and-preprocessing data-exploration expolatory-data-analysis identify-patterns reporting vizualisation
Last synced: 08 Oct 2025
https://github.com/mattsebastianh/Analyze-Data-with-Python-Portfolio-Project
Analyze Data with Python
barplot categories chi-square-test conservation contingency-table crosstab data-analysis data-cleaning-and-preprocessing eda endangered-species matplotlib national-parks pandas-dataframe species species-conservation
Last synced: 18 Jun 2026
https://github.com/narpat78/proactive-fraud-detection
A fraud detection project with Data Cleaning, Exploratory Data Analysis, Feature Engineering, and Modeling using Logistic Regression and Random Forest on transaction data.
data-cleaning-and-preprocessing data-modeling eda feature-engineering fraud-detection logistic-regression random-forest-classifier
Last synced: 09 Sep 2025
https://github.com/narpat78/layoffs-data-cleaning-and-eda-using-sql
A SQL-based project to clean and analyze a layoffs dataset. Focuses on standardizing data, handling nulls, converting data types, and performing exploratory queries for business insights.
data-cleaning-and-preprocessing eda mysql mysql-workbench sql
Last synced: 09 Sep 2025
https://github.com/syedanimrafatima/coffee-shop-sales-analysis-sql-powerbi
This Repository's details will be updated in a while.
advanced-dax advanced-sql business-intelligence csv data-analytics data-cleaning-and-preprocessing data-loading data-transformation data-visualization excel mysql-database powerbi sales-analysis sql-queries
Last synced: 05 Apr 2025
https://github.com/whereishussain/data-science
Projects related to Data Visualisation, Cleaning, Preprocessing, Machine Learning, Deep Learning, ANN and CNN, Model Training, and Model Evaluation
data-cleaning-and-preprocessing data-science data-visualisation machine-learning machine-learning-models model-training-and-evaluation neural-networks
Last synced: 24 Jun 2025
https://github.com/aruppatra04/end-to-end-data_warehouse-pipeline
Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
bronze-silver-gold data-cleaning-and-preprocessing data-warehouse sql sql-server
Last synced: 02 Feb 2026
https://github.com/AsuquoAA/Big_4_Sports_Teams_and_City_Population_Analysis-2018-
Analysis of sports teams' win/loss ratios vs. metro area populations across NFL, NBA, MLB, and NHL.
data-cleaning-and-preprocessing numpy pandas
Last synced: 21 Jul 2025
https://github.com/annaanastasy/mushroom-binary-classification-eda-ml
Explored and modeled a competition dataset of mushroom species, focusing on data cleaning, exploratory data analysis, and building machine learning models for accurate classification of edible and poisonous mushrooms.
binary-classification data data-cleaning-and-preprocessing data-science exploratory-data-analysis machine-learning-algorithms xgboost-classifier
Last synced: 29 Mar 2025
https://github.com/calebtheman116/hotel_customers_sentiments
Sentiment Analysis for a Hotel Based on Customer's Reviews
2018-2019 data-analysis data-analysis-in-excel data-cleaning data-cleaning-and-preprocessing data-visualization excel excel-pivot-tables github hotel-review-sentiments pivot-tables sentiment-analysis tableau-public text-reviews
Last synced: 21 Jul 2025
https://github.com/vbhvsingh0/nflteam_corr_population
The goal of this project is to find the correlation between NFL teams' win/loss records and the population of their cities.
data-analysis data-cleaning-and-preprocessing data-manipulation-with-pandas numpy-library pandas-python pearson-correlation python3
Last synced: 04 Mar 2025
https://github.com/m-hussain-x199/data-science
Projects related to Data Visualisation, Cleaning, Preprocessing, Machine Learning, Deep Learning, ANN and CNN, Model Training, and Model Evaluation
data-cleaning-and-preprocessing data-science data-visualisation machine-learning machine-learning-models model-training-and-evaluation neural-networks
Last synced: 12 May 2025
https://github.com/muhammadrauhan/project-using-pyspark
Cleaned and Processed an E-Commerce Orders Dataset using PySpark.
apache-spark data-cleaning-and-preprocessing data-processing pyspark
Last synced: 15 May 2026
https://github.com/chiugo-nsoke/student-performance-analysis
An analysis of student performance factors using Python, featuring data cleaning, EDA, and machine learning for prediction.
data-cleaning-and-preprocessing exploratory-data-analysis jupyter-notebook logistic-regression machine-learning
Last synced: 14 Mar 2025
https://github.com/srosalino/determining_traffic_accident_severity_in_the_usa
Helping the authorities to better understand traffic problems and to establish public policies to minimize this issue, and for insurance companies to define their commercial policy
data-cleaning-and-preprocessing data-engineering data-wrangling feature-engineering machine-learning
Last synced: 12 Jun 2026
https://github.com/sayamalt/travel-insurance-claim-prediction
Successfully established a supervised machine learning model that can accurately predict whether the travel insurance claim of a particular customer should be approved or not by a travel insurance agency.
binary-classification cross-validation data-cleaning-and-preprocessing exploratory-data-analysis feature-engineering hyperparameter-tuning model-training-and-evaluation supervised-machine-learning
Last synced: 28 Jun 2025
https://github.com/omari-kd/transborder-freight-data-analysis
This project analyses transportation data from the Bureau of Transportation Statistics (BTS) to uncover insights into cross-border freight's efficiency, safety and environmental impacts across road, rail, air and water modes.
data-analysis data-analysis-in-r data-cleaning-and-preprocessing data-science data-visualization powerbi
Last synced: 30 Mar 2025
https://github.com/jrili/ibm-etl-car-dealership
ETL project on car dealership data, taken from IBM's Python Project for Data Engineering on Coursera.
data-cleaning-and-preprocessing etl pandas python
Last synced: 04 Aug 2025