An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-cleaning-and-preprocessing

A curated list of projects in awesome lists tagged with data-cleaning-and-preprocessing.

https://github.com/aliiimaher/laptop-price-prediction

This is an AI model for predicting laptop prices, trained on about 1,200 data points.

ai data-cleaning-and-preprocessing linear-algebra linear-regression price-prediction-model

Last synced: 21 Sep 2025

https://github.com/harshita2234/potato-prices-prediction

This project aims to forecast potato prices in India using LSTM, KNN, and Random Forest regression, integrating historical data on prices, regional statistics, and rainfall patterns. It targets agricultural stakeholders to support informed decision-making.

csv-files data-cleaning-and-preprocessing data-mining-python k-nearest-neighbours knn long-short-term-memory lstm machine-learning-algorithms predictive-modeling python3 random-forest-regression random-forest-regressor

Last synced: 25 Oct 2025

https://github.com/1sumer/1sumer

Data Analyst | Python | SQL | Power BI | R | Excel | PySpark | EDA | ETL | Data Visualization | Statistical Analysis | Data Wrangling | Data Modeling | MongoDB | Machine Learning | Deployment | GitHub | AWS

data-analyst data-cleaning-and-preprocessing data-engineer data-modelling data-scientist data-visualization

Last synced: 19 Jan 2026

https://github.com/jim60105/image-dataset-prep-tools

Scripts for cleaning, converting, and managing image datasets for ML training. (Zsh/Python)

data-cleaning data-cleaning-and-preprocessing ml python zsh

Last synced: 22 Sep 2025

https://github.com/vidhi1290/robust-yield-prediction-

"Predicting a Greener Future 🌾📊 Delve into the world of agriculture and data science with our Yield Prediction project. We harness machine learning and weather data to forecast crop yields accurately. Join us in cultivating smarter farming practices for a sustainable tomorrow."

artificial-intelligence data-analysis data-cleaning-and-preprocessing data-science data-visualization dataexploration devops docker machine-learning machine-learning-algorithms matplotlib matplotlib-pyplot pandas python scikit-learn scikitlearn-machine-learning streamlit yield-prediction-for-food-processing

Last synced: 15 Apr 2026

https://github.com/karlyndiary/global-electronics-retailer-sales-and-customer-insights

Developed an analysis using Python, SQL, and Excel to examine sales and customer demographics for a Global Electronics Retailer. The findings aim to enhance business strategies and improve overall performance.

dashboard data-analysis data-cleaning-and-preprocessing data-pipeline data-visualization etl microsoft-excel microsoft-sql-server python sql

Last synced: 14 Feb 2026

https://github.com/randomgamingdev/grabcraft-to-schema

A Python library and its CLI for converting GrabCraft to schema (more specifically, Litematica schematic) files.

ai data-cleaning data-cleaning-and-preprocessing data-science grabcraft library litematica mc minecraft minecraft-build minecraft-building python schematic schematics

Last synced: 18 Feb 2026

https://github.com/madhurimarawat/data-warehousing

This repository contains practical examples of data warehousing concepts, including star schema and ETL processes, all implemented using MySQL.

data-aggregation data-cleaning data-cleaning-and-preprocessing data-warehousing detailed-documentation etl etl-pipeline mysql normalization olap-cube olap-data olap-database query-optimization snowflake-schema star-schema

Last synced: 28 Apr 2026

https://github.com/anarya22/tata-data-visualization-empowering-business-with-effective-insights-job-simulation-on-forage

Completed a simulation involving creating data visualizations for Tata Consultancy Services. Created visuals for data analysis to help executives with effective decision making.

business-analysis data-cleaning-and-preprocessing data-visualization excel powerbi

Last synced: 07 Jan 2026

https://github.com/willie-conway/datavista

DataVista is a comprehensive, production-grade data analysis and machine learning platform that combines real-time data ingestion from live APIs, interactive visualizations, statistical analysis, hypothesis testing, and machine learning model training, all in a unified, professional-grade interface. Built with React and Recharts.

analytics-platform api-integration classification coingecko-api csv-import data-analysis data-cleaning-and-preprocessing data-pipeline data-science data-visualizations etl hypothesis-testing json-export machine-learning-models open-meteo react recharts regression statistics world-bank

Last synced: 30 May 2026

https://github.com/mayankyadav23/t20i-world-cup-2024-analysis

Explore my Jupyter Notebook 📊 featuring comprehensive datasets and visualizations from the 2024 T20 World Cup analysis. Discover key insights into player performances 🏏, match statistics 📈, and team dynamics, making it a valuable resource for cricket enthusiasts and analysts alike! 🌟

cricket data-cleaning-and-preprocessing data-visualization icc insights jupyter-notebook t20-world-cup

Last synced: 28 Aug 2025

https://github.com/sudarshanasrao/from-data-to-gold--my-journey-creating-an-olympic-tableau-dashboard

Developed an interactive dashboard using Tableau with Kaggle's Olympic dataset.

data-cleaning-and-preprocessing eda python tableau-dashboards

Last synced: 20 Jun 2026

https://github.com/hamada-khairi/pfda-hamada

A comprehensive R-based data analysis project that examines housing rental patterns across multiple cities, using statistical methods and visualization techniques to analyze data on 4,746 properties, including rent prices, locations, and amenities. The project employs various R libraries to clean, process, and visualize rental market trends.

apu data-analysis data-analysis-in-r data-cleaning-and-preprocessing data-processing-and-analysis data-science data-visualization-project ggplot2 house-rent-prediction r-programming-projects r-statistics r-studio real-estate-analytics

Last synced: 16 Mar 2025

https://github.com/girish119628/codsoft

Data Enthusiast | Predictive Modeler | Turning Insights into Strategies

cross-validation data-cleaning-and-preprocessing exploratory-data-analysis model-selection-and-evaluation

Last synced: 08 May 2026

https://github.com/yash22222/data-analysis-on-real-time-social-media-comments

EngageInsight analyzes user interactions in comment data. It provides insights through visualizations created using Python libraries like Pandas and Matplotlib. The project aims to uncover patterns and trends in user engagement. The visualizations provide an overview of comment lengths and the frequency of different types of replies.

data-analysis data-cleaning-and-preprocessing data-visualization matplotlib pandas pattern-recognition real-time-social-media-data seaborn trend-analysis

Last synced: 14 May 2026

https://github.com/sayamalt/superstore-sales-prediction

Successfully established a machine learning model that can accurately predict the sales of a superstore based on various features such as quantity, profit, discount, postal code, etc. The features are mainly associated with order details and customer demographics.

azure-machine-learning azure-web-app-service cicd-deployment cross-validation data-cleaning-and-preprocessing data-visualization exploratory-data-analysis feature-engineering github-actions-ci-cd hyperparameter-tuning machine-learning model-deployment model-retraining model-testing model-training-and-evaluation regression-models

Last synced: 09 Nov 2025

https://github.com/farhad-here/adventureworks_interactive_sales_dashboard_powerbi

An interactive Power BI dashboard for Adventure Works sales team to analyze performance, customers, products, and employees. Includes data cleaning, data modeling, DAX measures and advanced visualization features.

business-intelligence chart csv data-analysis data-cleaning data-cleaning-and-preprocessing data-visualization dax powerbi

Last synced: 13 Aug 2025

https://github.com/shubhamgoyal575/tableau-visualization-dashboard

This repository features interactive Tableau dashboards for sales performance and healthcare analysis. It includes insights on revenue trends, regional sales, patient demographics, and hospital occupancy for data-driven decision-making. 🚀

dashborad data-analysis data-cleaning-and-preprocessing healthcare-analysis healthcare-dashboard sales-dashboard sales-data-analysis-project tableau tableau-dashboards tableau-public visualization visualization-tools

Last synced: 20 Feb 2026

https://github.com/who-else-but-arjun/convolve

This repository contains the projects developed for the Convolve PAN IIT AI-ML Hackathon, conducted by IDFC Bank. Predicting Credit Card Defaulters – A deep learning-based model to assess the risk of credit card default. Optimizing Email Engagement Time Slots – A machine learning model to determine the best time slots for personalised emails.

data-cleaning-and-preprocessing feature-engineering hyperparameter-tuning lstm neural-networks regression-models

Last synced: 22 Aug 2025

https://github.com/adi3042/data_science

📊🚀 Explore the Data Science Universe! Unlock insights and master data skills with hands-on assignments spanning machine learning, visualization, and more. Your journey to becoming a data expert starts here! 🎯💡 DataScienceJourney

anomaly-detection big-data-processing classification clustering computer-vision data-cleaning-and-preprocessing data-visualization deep-learning dimensionality-reduction ensemble-learning exploratory-data-analysis feature-engineering machine-learning model-deployment model-selection-and-evaluation natural-language-processing regression-analysis statistical-analysis time-series-analysis-and-forecasting

Last synced: 17 Jan 2026

https://github.com/sayamalt/life-expectancy-prediction

Successfully established a machine learning model that can accurately predict the expected life duration of a human being based on several demographic features such as alcohol consumption per capita, the average BMI of the entire population, etc.

cross-validation data-cleaning-and-preprocessing data-visualization docker end-to-end-pipeline exploratory-data-analysis feature-engineering github-actions-workflow hyperparameter-tuning machine-learning model-deployment model-training-and-evaluation

Last synced: 04 May 2026

https://github.com/jiyanshgarg/delhivery-logistics-data-analysis

This project analyzes Delhivery's logistics delivery dataset to understand delivery performance, route efficiency, and operational patterns using data analytics techniques. The analysis focuses on transforming raw segment-level logistics data into meaningful trip-level insights that can help improve delivery efficiency and route planning.

business-insights-and-recommendations data-analysis data-cleaning-and-preprocessing data-visualization exploratory-data-analysis feature-engineering feature-extraction feature-selection hypothesis-testing outlier-detection outlier-treatment

Last synced: 12 Jun 2026

https://github.com/asuquoaa/ann_arbor_weather_analysis_2005-2015

This project analyzes historical weather data from Ann Arbor, Michigan, collected by the National Centers for Environmental Information (NCEI) Global Historical Climatology Network daily (GHCNd).

data-cleaning-and-preprocessing data-visualization

Last synced: 03 Apr 2025

https://github.com/jdavydovportfolio/moneypulse

Offline-first OCR → LLM → validation pipeline with a PySide6 GUI that ingests PDFs/images, extracts key merchant fields, enforces business rules, and exports clean CSV/JSON for CRM upload.

ai credit-analytics csv data-cleaning-and-preprocessing data-validation data-validation-and-error-handling etl financial-data fintech json llm lm-studio localllm ocr offline-first ollama pdf pyinstaller python pytorch

Last synced: 05 May 2026

https://github.com/udhaya2823/cardheko-used_car_price_prediction

🚗 Car Dheko - Used Car Price Prediction. This project enhances Car Dheko's customer experience by deploying an ML model that predicts used car prices accurately. Using a multi-city dataset, we perform data cleaning, feature engineering, and model optimization. The final model is hosted on a Streamlit app, providing instant price prediction.

data-cleaning-and-preprocessing documentation-and-reporting exploratory-data-analysis machine-learning-model-deployment model-deployment model-evaluation-and-optimization price-prediction-techniques streamlit-application-development

Last synced: 14 Oct 2025

https://github.com/aninditaws/questionnaire-exploratory-data-analysis

A comprehensive EDA project for analyzing questionnaire results. Includes data cleaning, descriptive statistics, and visualizations to identify trends and patterns in survey responses.

data-cleaning-and-preprocessing descriptive-statistics exploratory-data-analysis jupyter-notebook probability-and-statistics

Last synced: 26 Mar 2025

https://github.com/abhijeet107/final-project

Final project summation: internship projects (2 weeks)

data-analysis data-cleaning-and-preprocessing excel mysql-database python tableau-public

Last synced: 23 Feb 2026

https://github.com/manishrajmss13/regression_project

A predictive machine learning model to forecast the Algerian Forest Fire FWI using Python, Scikit-learn, and Statsmodels. Includes complete data cleaning and EDA.

data-cleaning-and-preprocessing data-science eda feature-engineering learning-by-doing linear-regression machine-learning python regression scikit-learn statsmodel

Last synced: 09 May 2026

https://github.com/brooks-code/toulouse-biblio-chronicle

Snapshot of Toulouse public library customer habits: cleaning raw, messy datasets of musical, cinematic, and literary checkouts; includes data-cleaning steps and an analysis notebook revealing cultural tastes in the Pink City.

data-analysis data-cleaning data-cleaning-and-preprocessing data-quality exploratory-data-analysis jupyter-notebook library-data misaligned-data mojibake tutorial

Last synced: 10 Oct 2025

https://github.com/roushankhalid/structural-heart-disease

This project uses machine learning on ECG data to predict Structural Heart Disease (SHD), with fine-tuned models, explainable AI for feature insights, and an LLM-powered recommendation system to support clinical decision-making.

data-cleaning-and-preprocessing fine-tuning llm machine-learning-algorithms python3 recommendation-system

Last synced: 17 May 2026

https://github.com/srosalino/data_wrangling_investigations

A series of three investigations on the subject of data wrangling (acquiring data from different sources; understanding how to clean and pre-process data; transforming data for analytics purposes; performing feature engineering; visualizing data).

data-cleaning-and-preprocessing data-extraction-and-pre-processing data-visualization feature-engineering

Last synced: 19 Oct 2025

https://github.com/quantum-software-development/5-datamining_datacleaning_preparation_anomalies_outlier

👩🏻‍🚀 5-Data Mining - Data Cleaning, Preparation and Detection of Anomalies (Outlier Detectio

accuracy-metrics data-cleaning-and-preprocessing data-exploratory fraud-detection logistic-regression random-forest test-model

Last synced: 14 Feb 2026

https://github.com/crazy-dot/zomato-data-analysis

This project analyzes 50k Bengaluru restaurants from Zomato, focusing on 17 features like location and ratings. It cleans, explores, and visualizes data to improve services. Key visualizations include delivery, booking, location, and cost. The goal is to provide insights for better customer experiences.

data-cleaning-and-preprocessing data-manipulation-with-pandas inferential-statistics kaggle-dataset numpy pandas-python python zomato-data-analysis

Last synced: 19 Apr 2026

https://github.com/jrili/data-engineer-portfolio

Jessa Rili-Migriño's Data Engineer Portfolio

beautifulsoup4 data-cleaning-and-preprocessing etl pandas python webscraping

Last synced: 24 Apr 2026

https://github.com/hossein-rahmati/airbnb-property-dataset

This project explores, cleans, and analyzes an Airbnb property dataset to uncover insights related to listings, pricing, and availability. The goal is to better understand patterns in Airbnb listings, detect outliers, and prepare data for potential machine learning models or business insights.

airbnb data-cleaning-and-preprocessing eda pandas sklearn

Last synced: 06 May 2026

https://github.com/asuquoaa/big_4_sports_teams_and_city_population_analysis-2018-

Analysis of sports teams' win/loss ratios vs. metro area populations across NFL, NBA, MLB, and NHL.

data-cleaning-and-preprocessing numpy pandas

Last synced: 13 May 2026

https://github.com/ganesh2409/strive_towards_ai

This repository contains materials from a two-session workshop on Machine Learning and Deep Learning. Session 1 covers data preprocessing techniques including data cleaning, feature engineering, and exploratory data analysis. Session 2 focuses on building and training a neural network using TensorFlow and the Fashion MNIST dataset.

data-cleaning-and-preprocessing deep-learning exploratory-data-analysis machine-learning

Last synced: 16 Jun 2026

https://github.com/tanmayborse/institionistic_fuzzy_approx_space

This model introduces a hybrid approach that utilizes rough sets on intuitionistic fuzzy approximation spaces for pre-processing and soft sets for post-processing, resulting in an effective decision-making solution.

data-cleaning-and-preprocessing data-science data-visualization decision-making fuzzy-logic

Last synced: 17 Jun 2026

https://github.com/rajesh9943/visualizing-global-development-trends-an-animated-analysis-of-life-expectancy-and-fertility-rates

Cleans and analyzes data to find trends in global population, fertility, and life expectancy from 1960 to 2016. The idea was inspired by Hans Rosling. To analyze the data, I used a scatter bubble chart, which clearly shows how the population increased and the fertility rate decreased from 1960 to 2016.

data-analysis data-cleaning-and-preprocessing data-exploration expolatory-data-analysis identify-patterns reporting vizualisation

Last synced: 08 Oct 2025

https://github.com/narpat78/proactive-fraud-detection

A fraud detection project with data cleaning, exploratory data analysis, feature engineering, and modeling using logistic regression and random forest on transaction data.

data-cleaning-and-preprocessing data-modeling eda feature-engineering fraud-detection logistic-regression random-forest-classifier

Last synced: 09 Sep 2025

https://github.com/narpat78/layoffs-data-cleaning-and-eda-using-sql

A SQL-based project to clean and analyze a layoffs dataset. Focuses on standardizing data, handling nulls, converting data types, and performing exploratory queries for business insights.

data-cleaning-and-preprocessing eda mysql mysql-workbench sql

Last synced: 09 Sep 2025

https://github.com/whereishussain/data-science

Projects related to data visualisation, cleaning, preprocessing, machine learning, deep learning, ANN and CNN, model training, and model evaluation.

data-cleaning-and-preprocessing data-science data-visualisation machine-learning machine-learning-models model-training-and-evaluation neural-networks

Last synced: 24 Jun 2025

https://github.com/aruppatra04/end-to-end-data_warehouse-pipeline

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

bronze-silver-gold data-cleaning-and-preprocessing data-warehouse sql sql-server

Last synced: 02 Feb 2026

https://github.com/AsuquoAA/Big_4_Sports_Teams_and_City_Population_Analysis-2018-

Analysis of sports teams' win/loss ratios vs. metro area populations across NFL, NBA, MLB, and NHL.

data-cleaning-and-preprocessing numpy pandas

Last synced: 21 Jul 2025

https://github.com/annaanastasy/mushroom-binary-classification-eda-ml

Explored and modeled a competition dataset of mushroom species, focusing on data cleaning, exploratory data analysis, and building machine learning models for accurate classification of edible and poisonous mushrooms.

binary-classification data data-cleaning-and-preprocessing data-science exploratory-data-analysis machine-learning-algorithms xgboost-classifier

Last synced: 29 Mar 2025

https://github.com/vbhvsingh0/nflteam_corr_population

The goal of this project is to find the correlation between NFL teams' win/loss records and the population of their cities.

data-analysis data-cleaning-and-preprocessing data-manipulation-with-pandas numpy-library pandas-python pearson-correlation python3

Last synced: 04 Mar 2025

https://github.com/m-hussain-x199/data-science

Projects related to data visualisation, cleaning, preprocessing, machine learning, deep learning, ANN and CNN, model training, and model evaluation.

data-cleaning-and-preprocessing data-science data-visualisation machine-learning machine-learning-models model-training-and-evaluation neural-networks

Last synced: 12 May 2025

https://github.com/muhammadrauhan/project-using-pyspark

Cleaned and Processed an E-Commerce Orders Dataset using PySpark.

apache-spark data-cleaning-and-preprocessing data-processing pyspark

Last synced: 15 May 2026

https://github.com/chiugo-nsoke/student-performance-analysis

An analysis of student performance factors using Python, featuring data cleaning, EDA, and machine learning for prediction.

data-cleaning-and-preprocessing exploratory-data-analysis jupyter-notebook logistic-regression machine-learning

Last synced: 14 Mar 2025

https://github.com/srosalino/determining_traffic_accident_severity_in_the_usa

Helps authorities better understand traffic problems and establish public policies to reduce them, and helps insurance companies define their commercial policy.

data-cleaning-and-preprocessing data-engineering data-wrangling feature-engineering machine-learning

Last synced: 12 Jun 2026

https://github.com/sayamalt/travel-insurance-claim-prediction

Successfully established a supervised machine learning model that can accurately predict whether the travel insurance claim of a particular customer should be approved or not by a travel insurance agency.

binary-classification cross-validation data-cleaning-and-preprocessing exploratory-data-analysis feature-engineering hyperparameter-tuning model-training-and-evaluation supervised-machine-learning

Last synced: 28 Jun 2025

https://github.com/omari-kd/transborder-freight-data-analysis

This project analyses transportation data from the Bureau of Transportation Statistics (BTS) to uncover insights into cross-border freight's efficiency, safety and environmental impacts across road, rail, air and water modes.

data-analysis data-analysis-in-r data-cleaning-and-preprocessing data-science data-visualization powerbi

Last synced: 30 Mar 2025

https://github.com/jrili/ibm-etl-car-dealership

ETL project on car dealership data, taken from the IBM Python Project for Data Engineering course on Coursera.

data-cleaning-and-preprocessing etl pandas python

Last synced: 04 Aug 2025