Projects in Awesome Lists tagged with cleaning-data
A curated list of projects in awesome lists tagged with cleaning-data.
https://github.com/pyjanitor-devs/pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
cleaning-data data data-engineering dataframe hacktoberfest pandas pydata
Last synced: 18 Feb 2026
https://github.com/Meteor-Community-Packages/meteor-simple-schema
Meteor integration package for simpl-schema
cleaning-data form-generation form-validation meteor meteor-package meteorjs schema validation
Last synced: 12 Nov 2025
https://github.com/araafroyall/Cleaner-Royall
Most advanced cleaner for Android [Root]
cache cache-cleaner cache-control cache-storage cachemanager clean cleaner cleaner-android cleaner-app cleaner-apps cleaning cleaning-data cleanup lsposed magisk magisk-module sdmaid storage-manager xposed xposed-module
Last synced: 15 Apr 2025
https://github.com/prasanthg3/cleantext
An open-source package for Python to clean raw text data
cleaning-data cleantext datacleaning nlp python
Last synced: 18 Feb 2026
https://github.com/notesjor/corpusexplorer2.0
Corpus linguistics has never been so easy...
big-data cleaning-data cooccurrence corpus-linguistics corpus-processing data-minig data-mining data-science datajournalism journalism linguistics natural-language-processing natural-language-understanding nlp sdk tagger text-analysis text-mining text-processing visualization
Last synced: 17 Jan 2026
https://github.com/longnguyen010203/youtube-recommend-master-etl-pipeline
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
cleaning-data dagster data-engineering data-engineering-pipeline dbt docker docker-compose dockerfile etl-pipeline metabase minio mysql polars postgresql processing pyspark spark streamlit youtube youtube-api
Last synced: 09 Jul 2025
https://github.com/futuresearch/everyrow-sdk
Intelligent pandas dataframe ops: sort, filter, dedupe & join by qualitative criteria
cleaning-data dedupe entity-resolution filtering llm-agents merging-algorithms pandas-dataframe ranking semantic-analysis
Last synced: 20 Feb 2026
https://github.com/nikhiljsk/preprocess_nlp
A fast framework for pre-processing (Cleaning text, Reduction of vocabulary, Feature extraction and Vectorization). Implemented with parallel processing using custom number of processes.
cleaning-data feature-extraction glove natural-language-processing nlp parallel-processing preprocess python3 reduction spacy stages tfidf vectorization word2vec
Last synced: 12 Apr 2025
https://github.com/kingabzpro/annual-recycled-energy-saved-in-singapore
Learn how much energy Singapore saves per year by recycling plastics, paper, glass, and ferrous and non-ferrous metals
cleaning-data data-analysis data-science deepnote energy environment
Last synced: 19 Jun 2025
https://github.com/aflah02/cleansetext
This is a simple library to help you clean your textual data
cleaning-data nlp preprocessing pypi text
Last synced: 06 Oct 2025
https://github.com/engineering87/sharpsanitizer
A .NET library for sanitizing and validating object properties using customizable rules to ensure clean and secure data
cleaning-data csharp dotnet sanitizer validation
Last synced: 18 Feb 2026
https://github.com/emadbeltaje/gaza_flutter_cleaner
Clean all your Flutter projects with one command line and save your disk space
cleaner cleaning-data cli-app dart flutter
Last synced: 23 Mar 2025
https://github.com/byteplant/address-validator-net
NodeJS wrapper for the address-validator.net API
address address-autocomplete address-cleaning address-matching address-validation address-verification autocomplete byteplant cleaning cleaning-data data-quality data-validation javascript node-js node-module typescript validation verification wrapper
Last synced: 16 Jul 2025
https://github.com/byteplant/email-validator-net
NodeJS wrapper for the email-validator.net API
byteplant cleaning cleaning-data data-quality data-validation email email-cleaning email-marketing email-validation email-verification javascript node-js node-module typescript validation verification
Last synced: 29 Oct 2025
https://github.com/aaurelions/clearrr
🧹 Effortlessly clear heavy temp folders: safely, fast, and with full control.
cache-cleaner cache-control cleaner cleaning-data clear cleardata delete node nodejs php python rust
Last synced: 11 Apr 2026
https://github.com/dimitriskatos/health_stroke_prediction
Predicting possible strokes using RandomForest and PCA
cleaning-data pca prediction random-forest-classifier smote
Last synced: 06 Apr 2025
https://github.com/moindalvs/sentiment_analysis_on_-elon_musk_tweets
Perform sentiment analysis on Elon Musk's tweets (Elon-musk.csv)
bag-of-words cleaning-data elon-musk feature-engineering nlp nltk polarity sentiment-analysis sentiment-intensity sentiment-polarity spacy subjectivity text-mining text-processing textblob-sentiment-analysis tfidf tfidf-vectorizer tokenizer tweet-analysis twitter-sentiment-analysis
Last synced: 21 Apr 2026
https://github.com/byteplant/phone-validator-net
NodeJS wrapper for the phone-validator.net API
byteplant cleaning cleaning-data data-quality data-validation javascript node-js node-module phone phone-marketing phone-number phone-number-verification phone-validation phonenumber typescript validation
Last synced: 02 Jul 2025
https://github.com/nicetink/effinitum-x
"System optimization tool created with WPF"
cleaner cleaning-data optimization optimize optimizer registry-cleaner registry-hacks toolbox tweaks windows-10 windows-11-tweaker windowstweaks winodws wpf
Last synced: 07 May 2026
https://github.com/michalwols/awesome-data-curation
Awesome things related to data collection, annotation, cleaning and management.
active-learning annotation cleaning-data data data-science deep-learning machine-learning
Last synced: 24 Jun 2026
https://github.com/madhuresh2011/gen-z-project-using-sql
Analysis of Gen-Z career aspiration response data. It aims to provide actionable strategies for businesses targeting this influential generation.
analysing cleaning-data csv-files database dataset excel sql-queries sql-server standardization
Last synced: 28 Oct 2025
https://github.com/madhuresh2011/amazon-sales-report-analysis-using-python
This project focuses on analyzing Amazon sales data using Python to uncover insights into sales performance, customer behavior, and product trends
charts cleaning-data data-analysis jupyter-notebook matplotlib numpy pandas python seaborn visualization
Last synced: 17 Apr 2026
https://github.com/mariam-badr-mb/gtc-ml-project1-hotel-bookings
The goal of this project is to build a robust data preprocessing pipeline for a hotel booking cancellation prediction model. The focus is not on training the final machine learning model but on ensuring that the dataset is clean, consistent, and ML-ready.
cleaning-data data-analysis exploratory-data-analysis
Last synced: 05 Sep 2025
https://github.com/nikhilash45/live_ipl_report
This repository hosts the source code for an interactive IPL (Indian Premier League) Dashboard built using PowerBI. The dashboard provides real-time updates on ongoing matches, including live scores, batting and bowling statistics for both teams, and the points table.
analysts cleaning-data cricket-data dashboard data data-analysis data-visualization dax powerbi
Last synced: 19 Mar 2026
https://github.com/nafisalawalidris/buybuy-e-commerce-company
The BuyBuy E-commerce Company repository is a comprehensive hub for the company's e-commerce platform. It includes source code, documentation, and data analysis insights, providing a data-driven approach to improve customer experience, drive revenue, and inform decision-making.
buybuy cleaning-data company customer-experience data data-analysis decision-making documentation e-commerce excel insights postgresql repository revenue source-code sql
Last synced: 16 Mar 2025
https://github.com/shuklayash02/data_analysis_using_r
COVID-19 data cleaning and analysis, in which age at death and deaths by gender are cleaned and analysed
analysis cleaning-data data-analysis data-visualization rprogramming
Last synced: 09 Oct 2025
https://github.com/idemio/tiny-clean
Lightweight, high-performance sanitizers for common use cases
cleaning-data owasp sanitation sanitizer string-manipulation
Last synced: 26 Jun 2025
https://github.com/lisakey/datacamp-data-analyst-python-sql-projects
Several projects completed during my Data Analyst training on the DataCamp platform with Python and SQL. Each project addresses real-world challenges using modern analytical tools and techniques.
analysis cleaning-data data dataanalysis dataanalyst matplotlib pandas python seaborn sql transformation visuali
Last synced: 19 Apr 2026
https://github.com/gre1wy/mathmod
KPI IPT course, 5 semester
cleaning-data regression-analysis
Last synced: 17 Oct 2025
https://github.com/saob007/modelado_retencion_personal_proyecto
Building a machine learning model that predicts whether or not an employee will leave an industrial automotive development company
cleaning-data exploratory-data-analysis exploratory-data-visualizations jupyter-notebook logistic-regression-classifier pickle python3 random random-forest-classifier scikitlearn-machine-learning xgboost-classifier
Last synced: 16 May 2026
https://github.com/codepurge/codepurgekit
Swift package providing shared data models, utilities, and SwiftUI components to manage and organize purgable items in Xcode environments.
cleaning-data cleanup developer-tools macos swift swiftui swiftui-example xcode
Last synced: 07 May 2026
https://github.com/mastermindromii/data-cleaning-using-pandas
This repo is all about efficient data cleaning with Pandas 🧹.
cleaning-data dataframe pandas python tricks
Last synced: 06 May 2026
https://github.com/prakashpandey16/sql_data_warehouse_project
Building a modern data warehouse with SQL Server, including ETL Processes, data modeling, and analytics.
cleaning-data data data-engineering data-science database etl-pipeline sqlserver
Last synced: 03 May 2026
https://github.com/vikkiezdev/ai-global-index-analysis
This project analyzes the AI readiness of 62 countries using key indicators like government strategy, commercial activity, research, development, and infrastructure. Through data cleaning, EDA, and visualization, it identifies key drivers of AI adoption and competitiveness.
cleaning-data correlation-analysis eda matplotlib numpy pandas python3 seaborn statistical-analysis
Last synced: 06 May 2026
https://github.com/ireddragonicy/booruprompt
A simple web application built with NextJS to extract tags from booru websites. Just paste the URL of a booru post, and this tool will fetch and display the associated tags, ready for you to copy.
booru cleaning-data crawler nextjs noobai tags typescript web
Last synced: 07 May 2026
https://github.com/mehtadigisha/clean-visualize-analyze
Clean Visualize Analyze
cleaning-data data-analysis data-cleaning data-visualization eda juypter-notebook pandas python seaborn seaborn-plots visualization
Last synced: 09 May 2026
https://github.com/rajesh9943/web-scraping-analysis-of-top-us-company-revenue-growth-in-2023
Explore the landscape of US business growth in 2023 with our dynamic project, 'Web Scraping for US 2023 Revenue Growth.' Utilizing advanced web scraping techniques, we unveil insights into the top companies driving economic expansion.
cleaning-data data data-analysis data-visualization manipulation numpy pandas pre-fill
Last synced: 16 Aug 2025
https://github.com/jackmnob/python-tableau-eda-stockdash
Data cleaning, preparation, and manipulation (EDA) for an interactive stock market dashboard with Tableau - using pandas (Python) via JupyterLab
cleaning-data dashboard data-analysis data-preparation eda jupyter-notebook jupyterlab python tableau-public
Last synced: 14 May 2026
https://github.com/ddzikri/mini-project
Mini Project Data Engineer at Alterra Academy
cleaning-data dataset etl-pipeline firebase gcp
Last synced: 15 May 2026
https://github.com/istinnew/enaic-s-discount-strategy-analysis
**(Open to Collaboration):** This project evaluates the impact of discounts on sales and customer retention for Eniac. It includes data cleaning, visualization, storytelling, and strategic insights to optimize discount strategies while maintaining brand reputation.
cleaning-data cleaning-data-in-python cost-optimization data-analysis data-science data-visualization library presentation python visualization
Last synced: 03 Apr 2025
https://github.com/brunofsbravo/us-household-income
Cleaning and exploration of household income data made available by the United States government.
cleaning-data exploratory-data-analysis mysql sql
Last synced: 03 Apr 2025
https://github.com/gagan8605/zepto_sql_analysis
This project explores and analyzes the inventory data of Zepto, a rapidly growing 10-minute grocery delivery platform in India. The dataset contains over 3,000+ SKUs across key product categories such as Fruits & Vegetables, Dairy, Beverages, Packaged Foods, and more. The analysis was performed using PostgreSQL, covering both data cleaning and bus
cleaning-data data-analysis database-management postgresql sql
Last synced: 16 Jul 2025
https://github.com/ndomah1/excel
Learning materials for Microsoft Excel.
charts cleaning-data conditional-formatting excel formulas pivot-tables vlookup xlookup
Last synced: 10 Feb 2026
https://github.com/aksharabhavitha/covid19-analysis
This repository contains the analysis of COVID19 and Visualisations including the CSV file used.
cleaning-data eda jupyter-notebook python visualisations
Last synced: 17 May 2026
https://github.com/kevinwood15/python_twitter_datawrangling_project
The main objective of this project is to wrangle (clean) and analyze Twitter data. I deal with some messy data, clean it, then plot some visualizations of the data to analyze it.
cleaning-data data-science data-visualization python wrangling-data
Last synced: 18 May 2026
https://github.com/amarlearning/exploring-the-evolution-of-linux
Data Analysis about the development of the Linux operating system by exploring its Git repository history.
cleaning-data data data-analysis data-wrangling datacamp first-commit git-history linux
Last synced: 12 May 2026
https://github.com/tanyagarg25/local_store_performance_analysis
Analyzing local store performance using sales data to identify trends, inefficiencies, and opportunities for growth. This project includes data cleaning, descriptive statistics, and interactive visualizations using Tableau and Excel
analytics cleaning-data eda excel tableau visualization
Last synced: 11 Feb 2026
https://github.com/elkronos/helper_py
Helper functions in python
cleaning-data data-science helpers python strings-manipulation
Last synced: 24 Jul 2025
https://github.com/nadaabdelmalek97/-deforestation-sql-project-
Deforestation SQL project. This project was a comprehensive journey that encompassed rigorous data cleaning, schema building, and insights into the forestation trend between 1990 and 2016. Our main goal was to explore and analyze the dataset by writing simple SQL queries.
cleaning-data insights postgresql quire remove-duplicates schema sql trends
Last synced: 24 Aug 2025
https://github.com/badranalyst/data-cleaning-and-exploratory-data-analysis-project
This project uses SQL to clean and analyze a layoffs dataset. Data cleaning tasks include removing duplicates, standardizing values, and handling missing data. Exploratory analysis is performed to identify trends in layoffs across companies, industries, and time periods.
cleaning-data data database dataset mysql mysql-database sql
Last synced: 07 Apr 2025
https://github.com/nadaabdelmalek97/supplier-quality-analysis-
Analysis of a real dataset that aims to improve manufacturing quality by identifying key causes of downtime and defects, along with vendor and material performance.
cleaning-data excel powerquery python r tableau
Last synced: 10 Apr 2026
https://github.com/akanshdivker/gitclean
A simple Python script to clean files from directories based on their extensions.
cleaning-data github python repository-management repository-tools research-tool tool tooling tools
Last synced: 20 May 2026
https://github.com/bationoa/how_does_a_bike_share_navigate_speedy_success
Bike renting case study
analytics business-intelligence cleaning-data data-analysis data-collection data-visualization r
Last synced: 26 May 2026
https://github.com/sdam-au/epigraphic_cleaning
Cleaning of epigraphic texts for further text mining and analysis
cleaning-data epigraphy etl r regex text-mining
Last synced: 15 May 2025
https://github.com/ashu3291/blinkit-app-store-
Conducted a comprehensive analysis of Blinkit's sales performance, customer satisfaction, and inventory distribution to improve sales performance.
cleaning-data data dataanalysis-projects powerbi-visuals powerbidashboard sql
Last synced: 05 Jan 2026
https://github.com/juanparias29/bigdataprocessingproject
This repository contains a large-scale data analysis and processing project based on the CRISP-DM methodology, focused on answering business questions in the education sector.
apache-spark big-data big-data-analytics cleaning-data cluster datamodeling exploratory-data-analysis visualization
Last synced: 21 Apr 2026
https://github.com/shellynagar27/mobile-sales-analysis
Analyzed 2024 mobile sales data to uncover product trends, customer behavior, and regional insights using Power BI dashboards and structured data modeling.
cleaning-data data-analysis data-visualization dax eda figma modelling powerbi powerquery storytelling wireframe
Last synced: 16 May 2025
https://github.com/sehgal-vishal/citi-bike-data-analysis
The goal of this project is to analyze the usage patterns of Citi Bike in New York City
advanceexcel business-analytics citibike cleaning-data dataanalysis pivot-tables visualisation
Last synced: 26 Jan 2026
https://github.com/tanyagarg25/uber_vs_lyft_price_analysis
A comprehensive analysis comparing Uber and Lyft ride prices and service performance. The project explores key factors such as distance, surge pricing, and weather conditions affecting fares. Data cleaning, visualizations, and predictive modeling were used to provide insights into pricing strategies and market positioning.
azure cleaning-data excel modeling tableau visualization
Last synced: 24 Mar 2025
https://github.com/shellynagar27/transportation-and-logistics-challenge
Analyzing logistics data to optimize shipment efficiency, reduce delays, and enhance supply chain visibility using Power BI. Insights include top routes, delays, supplier trends, and peak shipments.
cleaning-data critical-thinking data-analysis data-visualization exploratory-data-analysis feature-engineering powerbi preprocessing-data problem-solving python
Last synced: 16 May 2026
https://github.com/theo-liang/sql-and-tableau-project-analysis-for-rockbuster-stealth
This project involved analyzing data for Rockbuster Stealth LLC, a fictional movie rental company transitioning to an online video rental service.
cleaning-data common-table-expressions filtering-data relational-databases sql subqueries-and-joins
Last synced: 27 Feb 2026
https://github.com/moonmoonsamal/customer_purchase_behavior_analysis
Customer purchase analysis with SQL, Python, and PowerBI
cleaning-data cte eda manipulate-data normalization visualization
Last synced: 25 Mar 2025
https://github.com/prajjwol09/data-cleaning-project
This project is dedicated to cleaning and standardizing a dataset and handling null values in a CSV file named "layoffs", using MySQL with MySQL Workbench as the workspace environment. The goal is to prepare the data for analysis.
cleaning-data columns data-analysis database duplicates mysql rows standard
Last synced: 20 Apr 2026
https://github.com/anarya22/e-commerce_analysis
E-Commerce_Analysis is a data analysis project performed on the Superstore_USA dataset. It explores various aspects of e-commerce performance, including sales trends, customer demographics, product categories, and regional performance. The analysis includes data cleaning, visualizations, and insights on factors influencing sales and profitability.
analysis analytics cleaning-data data
Last synced: 09 Oct 2025
https://github.com/badranalyst/nashville-housing-sql-data-cleaning
This project focuses on cleaning and preparing the Nashville Housing dataset for analysis using SQL. It involves identifying and rectifying inconsistencies, handling missing values, and optimizing the dataset for further exploration. The cleaned data is essential for accurate insights into housing trends and patterns in Nashville.
cleaning-data data-cleaning database dataset sql
Last synced: 11 Oct 2025
https://github.com/kingflow-23/wikipedia-topic-clustering
This project scrapes Wikipedia pages on various topics, processes the text using TF-IDF vectorization, and clusters the topics using KMeans. The results are visualized in a 2D plot using UMAP, providing insights into the relationships and groupings of different Wikipedia topics based on their content.
beautifulsoup4 cleaning-data clustering jupyrt-notebook python scraping umap vectorization wikipedia-api
Last synced: 15 Apr 2026
https://github.com/nagar2nd/ml-regressionmodel---cardekho-price-prediction
This repository features a machine learning model for predicting used car prices using data from CarDekho.com. The project leverages exploratory data analysis and regression techniques to empower sellers and buyers with actionable insights in the Indian used car market.
analytics cleaning-data data linear-regression machine-learning matplotlib numpy pandas python seaborn
Last synced: 16 Apr 2026
https://github.com/ddeepanshu-997/datascience-e-commerce-shopping-details-
In this project I apply data preprocessing techniques to clean the dataset using libraries, derive insights to find the shopping hot picks, and use data visualization libraries to reveal trends and other aspects, as a small contribution to the field of data science.
cleaning-data data data-science data-visualization dataframe datapreprocessing dataset libraries matplotlib-pyplot numpy pandas plots python visualization
Last synced: 30 Apr 2026
https://github.com/kisaa-fatima/amazon-fashion-products-data-processing-and-database-schema
Contains work on preprocessing, cleaning, and storing Amazon fashion product data in a MySQL database. The project builds on data collected in a previous assignment, focusing on ensuring data quality, handling image data, and designing a normalized database schema.
cleaning-data datahandling imagehandler mysql preprocessing
Last synced: 01 May 2026