An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with cleaning-data

A curated list of projects in awesome lists tagged with cleaning-data.

https://github.com/pyjanitor-devs/pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

cleaning-data data data-engineering dataframe hacktoberfest pandas pydata

Last synced: 18 Feb 2026

https://github.com/araafroyall/Cleaner-Royall

🚀 A Most Advanced Cleaner for Android [Root]

cache cache-cleaner cache-control cache-storage cachemanager clean cleaner cleaner-android cleaner-app cleaner-apps cleaning cleaning-data cleanup lsposed magisk magisk-module sdmaid storage-manager xposed xposed-module

Last synced: 15 Apr 2025

https://github.com/prasanthg3/cleantext

An open-source package for Python to clean raw text data

cleaning-data cleantext datacleaning nlp python

Last synced: 18 Feb 2026

https://github.com/longnguyen010203/youtube-recommend-master-etl-pipeline

💜🌈📊 A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Docker. Data from Kaggle and youtube-api 🌺

cleaning-data dagster data-engineering data-engineering-pipeline dbt docker docker-compose dockerfile etl-pipeline metabase minio mysql polars postgresql processing pyspark spark streamlit youtube youtube-api

Last synced: 09 Jul 2025

https://github.com/futuresearch/everyrow-sdk

Intelligent pandas dataframe ops: sort, filter, dedupe & join by qualitative criteria

cleaning-data dedupe entity-resolution filtering llm-agents merging-algorithms pandas-dataframe ranking semantic-analysis

Last synced: 20 Feb 2026

https://github.com/nikhiljsk/preprocess_nlp

A fast framework for pre-processing (cleaning text, reduction of vocabulary, feature extraction and vectorization). Implemented with parallel processing using a custom number of processes.

cleaning-data feature-extraction glove natural-language-processing nlp parallel-processing preprocess python3 reduction spacy stages tfidf vectorization word2vec

Last synced: 12 Apr 2025

https://github.com/kingabzpro/annual-recycled-energy-saved-in-singapore

Learn how much energy Singapore saves per year by recycling plastics, paper, glass, and ferrous and non-ferrous metals

cleaning-data data-analysis data-science deepnote energy environment

Last synced: 19 Jun 2025

https://github.com/aflah02/cleansetext

This is a simple library to help you clean your textual data

cleaning-data nlp preprocessing pypi text

Last synced: 06 Oct 2025

https://github.com/engineering87/sharpsanitizer

A .NET library for sanitizing and validating object properties using customizable rules to ensure clean and secure data

cleaning-data csharp dotnet sanitizer validation

Last synced: 18 Feb 2026

https://github.com/emadbeltaje/gaza_flutter_cleaner

Clean all your Flutter projects with one command and save disk space 🚀

cleaner cleaning-data cli-app dart flutter

Last synced: 23 Mar 2025

https://github.com/aaurelions/clearrr

🧹 Effortlessly clear heavy temp folders: safely, fast, and with full control.

cache-cleaner cache-control cleaner cleaning-data clear cleardata delete node nodejs php python rust

Last synced: 11 Apr 2026

https://github.com/dimitriskatos/health_stroke_prediction

Predicting possible strokes using RandomForest and PCA

cleaning-data pca prediction random-forest-classifier smote

Last synced: 06 Apr 2025

https://github.com/michalwols/awesome-data-curation

🗑️ ✨ 📊 Awesome things related to data collection, annotation, cleaning and management.

active-learning annotation cleaning-data data data-science deep-learning machine-learning

Last synced: 24 Jun 2026

https://github.com/madhuresh2011/gen-z-project-using-sql

Analysis of Gen-Z career aspiration response data. It aims to provide actionable strategies for businesses targeting this influential generation.

analysing cleaning-data csv-files database dataset excel sql-queries sql-server standardization

Last synced: 28 Oct 2025

https://github.com/madhuresh2011/amazon-sales-report-analysis-using-python

This project focuses on analyzing Amazon sales data using Python to uncover insights into sales performance, customer behavior, and product trends

charts cleaning-data data-analysis jupyter-notebook matplotlib numpy pandas python seaborn visualization

Last synced: 17 Apr 2026

https://github.com/mariam-badr-mb/gtc-ml-project1-hotel-bookings

The goal of this project is to build a robust data preprocessing pipeline for a hotel booking cancellation prediction model. The focus is not on training the final machine learning model but on ensuring that the dataset is clean, consistent, and ML-ready.

cleaning-data data-analysis exploratory-data-analysis

Last synced: 05 Sep 2025

https://github.com/nikhilash45/live_ipl_report

This repository hosts the source code for an interactive IPL (Indian Premier League) Dashboard built using PowerBI. The dashboard provides real-time updates on ongoing matches, including live scores, batting and bowling statistics for both teams, and the points table.

analysts cleaning-data cricket-data dashboard data data-analysis data-visualization dax powerbi

Last synced: 19 Mar 2026

https://github.com/nafisalawalidris/buybuy-e-commerce-company

The BuyBuy E-commerce Company repository is a comprehensive hub for the company's e-commerce platform. It includes source code, documentation, and data analysis insights, providing a data-driven approach to improve customer experience, drive revenue, and inform decision-making.

buybuy cleaning-data company customer-experience data data-analysis decision-making documentation e-commerce excel insights postgresql repository revenue source-code sql

Last synced: 16 Mar 2025

https://github.com/shuklayash02/data_analysis_using_r

COVID-19 data cleaning and analysis, in which age at death and deaths by gender are cleaned and analysed

analysis cleaning-data data-analysis data-visualization rprogramming

Last synced: 09 Oct 2025

https://github.com/idemio/tiny-clean

Lightweight, high-performance sanitizers for common use cases

cleaning-data owasp sanitation sanitizer string-manipulation

Last synced: 26 Jun 2025

https://github.com/lisakey/datacamp-data-analyst-python-sql-projects

Several projects completed during my Data Analyst 📊 training on the DataCamp platform with Python and SQL 🗃️. Each project addresses real-world challenges using modern analytical tools and techniques.

analysis cleaning-data data dataanalysis dataanalyst matplotlib pandas python seaborn sql transformation visuali

Last synced: 19 Apr 2026

https://github.com/gre1wy/mathmod

KPI IPT course, 5th semester

cleaning-data regression-analysis

Last synced: 17 Oct 2025

https://github.com/saob007/modelado_retencion_personal_proyecto

Building a machine learning model that predicts whether or not an employee will leave an industrial automotive development company

cleaning-data exploratory-data-analysis exploratory-data-visualizations jupyter-notebook logistic-regression-classifier pickle python3 random random-forest-classifier scikitlearn-machine-learning xgboost-classifier

Last synced: 16 May 2026

https://github.com/codepurge/codepurgekit

Swift package providing shared data models, utilities, and SwiftUI components to manage and organize purgeable items in Xcode environments.

cleaning-data cleanup developer-tools macos swift swiftui swiftui-example xcode

Last synced: 07 May 2026

https://github.com/mastermindromii/data-cleaning-using-pandas

This repo is all about efficient data cleaning with Pandas 🧹.

cleaning-data dataframe pandas python tricks

Last synced: 06 May 2026

https://github.com/prakashpandey16/sql_data_warehouse_project

Building a modern data warehouse with SQL Server, including ETL Processes, data modeling, and analytics.

cleaning-data data data-engineering data-science database etl-pipeline sqlserver

Last synced: 03 May 2026

https://github.com/vikkiezdev/ai-global-index-analysis

This project analyzes the AI readiness of 62 countries using key indicators like government strategy, commercial activity, research, development, and infrastructure. Through data cleaning, EDA, and visualization, it identifies key drivers of AI adoption and competitiveness.

cleaning-data correlation-analysis eda matplotlib numpy pandas python3 seaborn statistical-analysis

Last synced: 06 May 2026

https://github.com/ireddragonicy/booruprompt

A simple web application built with NextJS to extract tags from booru websites. Just paste the URL of a booru post, and this tool will fetch and display the associated tags, ready for you to copy.

booru cleaning-data crawler nextjs noobai tags typescript web

Last synced: 07 May 2026

https://github.com/gre1wy/mtad

KPI IPT course, 5th semester

cleaning-data visualization

Last synced: 05 Jan 2026

https://github.com/rajesh9943/web-scraping-analysis-of-top-us-company-revenue-growth-in-2023

Explore the landscape of US business growth in 2023 with our dynamic project, 'Web Scraping for US 2023 Revenue Growth.' Utilizing advanced web scraping techniques, we unveil insights into the top companies driving economic expansion.

cleaning-data data data-analysis data-visualization manipulation numpy pandas pre-fill

Last synced: 16 Aug 2025

https://github.com/jackmnob/python-tableau-eda-stockdash

Data cleaning, preparation, and manipulation (EDA) for an interactive stock market dashboard with Tableau - using pandas (Python) via JupyterLab

cleaning-data dashboard data-analysis data-preparation eda jupyter-notebook jupyterlab python tableau-public

Last synced: 14 May 2026

https://github.com/ddzikri/mini-project

Mini Project Data Engineer at Alterra Academy

cleaning-data dataset etl-pipeline firebase gcp

Last synced: 15 May 2026

https://github.com/istinnew/enaic-s-discount-strategy-analysis

**(Open to Collaboration):** This project evaluates the impact of discounts on sales and customer retention for Eniac. It includes data cleaning, visualization, storytelling, and strategic insights to optimize discount strategies while maintaining brand reputation. 📊🛍️✨

cleaning-data cleaning-data-in-python cost-optimization data-analysis data-science data-visualization library presentation python visualization

Last synced: 03 Apr 2025

https://github.com/brunofsbravo/us-household-income

Cleaning and exploration of household income data made available by the United States government.

cleaning-data exploratory-data-analysis mysql sql

Last synced: 03 Apr 2025

https://github.com/gagan8605/zepto_sql_analysis

This project explores and analyzes the inventory data of Zepto, a rapidly growing 10-minute grocery delivery platform in India. The dataset contains over 3,000 SKUs across key product categories such as Fruits & Vegetables, Dairy, Beverages, Packaged Foods, and more. The analysis was performed using PostgreSQL, covering both data cleaning and bus

cleaning-data data-analysis database-management postgresql sql

Last synced: 16 Jul 2025

https://github.com/ndomah1/excel

Learning materials for Microsoft Excel.

charts cleaning-data conditional-formatting excel formulas pivot-tables vlookup xlookup

Last synced: 10 Feb 2026

https://github.com/aksharabhavitha/covid19-analysis

This repository contains a COVID-19 analysis with visualisations, including the CSV file used.

cleaning-data eda jupyter-notebook python visualisations

Last synced: 17 May 2026

https://github.com/kevinwood15/python_twitter_datawrangling_project

The main objective of this project is to wrangle (clean) and analyze Twitter data. I deal with some messy data, clean it, then plot visualizations of the data to analyze it.

cleaning-data data-science data-visualization python wrangling-data

Last synced: 18 May 2026

https://github.com/amarlearning/exploring-the-evolution-of-linux

Data analysis of the development of the Linux operating system, based on exploring its Git repository history.

cleaning-data data data-analysis data-wrangling datacamp first-commit git-history linux

Last synced: 12 May 2026

https://github.com/tanyagarg25/local_store_performance_analysis

Analyzing local store performance using sales data to identify trends, inefficiencies, and opportunities for growth. This project includes data cleaning, descriptive statistics, and interactive visualizations using Tableau and Excel

analytics cleaning-data eda excel tableau visualization

Last synced: 11 Feb 2026

https://github.com/nadaabdelmalek97/-deforestation-sql-project-

“Deforestation SQL project” 📌 This project was a comprehensive journey that encompassed rigorous data cleaning, schema building, and insights into the forestation trend between 1990 and 2016. Our main goal was to explore and analyse the dataset by writing simple SQL queries.

cleaning-data insights postgresql quire remove-duplicates schema sql trends

Last synced: 24 Aug 2025

https://github.com/badranalyst/data-cleaning-and-exploratory-data-analysis-project

This project uses SQL to clean and analyze a layoffs dataset. Data cleaning tasks include removing duplicates, standardizing values, and handling missing data. Exploratory analysis is performed to identify trends in layoffs across companies, industries, and time periods.

cleaning-data data database dataset mysql mysql-database sql

Last synced: 07 Apr 2025

https://github.com/nadaabdelmalek97/supplier-quality-analysis-

Analysis of a real dataset that aims to improve manufacturing quality by identifying key causes of downtime and defects, along with vendor and material performance.

cleaning-data excel powerquery python r tableau

Last synced: 10 Apr 2026

https://github.com/akanshdivker/gitclean

A simple Python script to clean files from directories based on their extensions.

cleaning-data github python repository-management repository-tools research-tool tool tooling tools

Last synced: 20 May 2026

https://github.com/sdam-au/epigraphic_cleaning

Cleaning of epigraphic texts for further text mining and analysis

cleaning-data epigraphy etl r regex text-mining

Last synced: 15 May 2025

https://github.com/ashu3291/blinkit-app-store-

Conducted a comprehensive analysis of Blinkit's sales performance, customer satisfaction, and inventory distribution to improve sales performance.

cleaning-data data dataanalysis-projects powerbi-visuals powerbidashboard sql

Last synced: 05 Jan 2026

https://github.com/juanparias29/bigdataprocessingproject

This repository contains a large-scale data analysis and processing project based on the CRISP-DM methodology, focused on answering business questions in the education sector.

apache-spark big-data big-data-analytics cleaning-data cluster datamodeling exploratory-data-analysis visualization

Last synced: 21 Apr 2026

https://github.com/shellynagar27/mobile-sales-analysis

Analyzed 2024 mobile sales data to uncover product trends, customer behavior, and regional insights using Power BI dashboards and structured data modeling.

cleaning-data data-analysis data-visualization dax eda figma modelling powerbi powerquery storytelling wireframe

Last synced: 16 May 2025

https://github.com/sehgal-vishal/citi-bike-data-analysis

The goal of this project is to analyze the usage patterns of Citi Bike in New York City

advanceexcel business-analytics citibike cleaning-data dataanalysis pivot-tables visualisation

Last synced: 26 Jan 2026

https://github.com/tanyagarg25/uber_vs_lyft_price_analysis

A comprehensive analysis comparing Uber and Lyft ride prices and service performance. The project explores key factors such as distance, surge pricing, and weather conditions affecting fares. Data cleaning, visualizations, and predictive modeling were used to provide insights into pricing strategies and market positioning.

azure cleaning-data excel modeling tableau visualization

Last synced: 24 Mar 2025

https://github.com/shellynagar27/transportation-and-logistics-challenge

Analyzing logistics data to optimize shipment efficiency, reduce delays, and enhance supply chain visibility using Power BI. Insights include top routes, delays, supplier trends, and peak shipments.

cleaning-data critical-thinking data-analysis data-visualization exploratory-data-analysis feature-engineering powerbi preprocessing-data problem-solving python

Last synced: 16 May 2026

https://github.com/theo-liang/sql-and-tableau-project-analysis-for-rockbuster-stealth

This project involved analyzing data for Rockbuster Stealth LLC, a fictional movie rental company transitioning to an online video rental service.

cleaning-data common-table-expressions filtering-data relational-databases sql subqueries-and-joins

Last synced: 27 Feb 2026

https://github.com/moonmoonsamal/customer_purchase_behavior_analysis

Customer purchase analysis with SQL, Python, and PowerBI

cleaning-data cte eda manipulate-data normalization visualization

Last synced: 25 Mar 2025

https://github.com/prajjwol09/data-cleaning-project

This project is dedicated to cleaning and standardizing a dataset, and to handling null values, from a CSV file named "layoffs" using MySQL, with MySQL Workbench as the workspace environment. The goal is to prepare the data for analysis.

cleaning-data columns data-analysis database duplicates mysql rows standard

Last synced: 20 Apr 2026

https://github.com/anarya22/e-commerce_analysis

E-Commerce_Analysis is a data analysis project performed on the Superstore_USA dataset. It explores various aspects of e-commerce performance, including sales trends, customer demographics, product categories, and regional performance. The analysis includes data cleaning, visualizations, and insights on factors influencing sales and profitability.

analysis analytics cleaning-data data

Last synced: 09 Oct 2025

https://github.com/badranalyst/nashville-housing-sql-data-cleaning

This project focuses on cleaning and preparing the Nashville Housing dataset for analysis using SQL. It involves identifying and rectifying inconsistencies, handling missing values, and optimizing the dataset for further exploration. The cleaned data is essential for accurate insights into housing trends and patterns in Nashville.

cleaning-data data-cleaning database dataset sql

Last synced: 11 Oct 2025

https://github.com/kingflow-23/wikipedia-topic-clustering

This project scrapes Wikipedia pages on various topics, processes the text using TF-IDF vectorization, and clusters the topics using KMeans. The results are visualized in a 2D plot using UMAP, providing insights into the relationships and groupings of different Wikipedia topics based on their content.

beautifulsoup4 cleaning-data clustering jupyrt-notebook python scraping umap vectorization wikipedia-api

Last synced: 15 Apr 2026

https://github.com/nagar2nd/ml-regressionmodel---cardekho-price-prediction

This repository features a machine learning model for predicting used car prices using data from CarDekho.com. The project leverages exploratory data analysis and regression techniques to empower sellers and buyers with actionable insights in the Indian used car market.

analytics cleaning-data data linear-regression machine-learning matplotlib numpy pandas python seaborn

Last synced: 16 Apr 2026

https://github.com/ddeepanshu-997/datascience-e-commerce-shopping-details-

In this project I apply data preprocessing techniques to clean the dataset using libraries, derive insights and analyses to find the hot picks in shopping, and use data visualisation libraries to surface trends and other aspects, as a small contribution to the field of data science.

cleaning-data data data-science data-visualization dataframe datapreprocessing dataset libraries matplotlib-pyplot numpy pandas plots python visualization

Last synced: 30 Apr 2026

https://github.com/kisaa-fatima/amazon-fashion-products-data-processing-and-database-schema

Contains work on preprocessing, cleaning, and storing Amazon fashion product data in a MySQL database. The project builds on data collected in a previous assignment, focusing on ensuring data quality, handling image data, and designing a normalized database schema.

cleaning-data datahandling imagehandler mysql preprocessing

Last synced: 01 May 2026