Projects in Awesome Lists tagged with datacleaning
A curated list of projects in awesome lists tagged with datacleaning.
https://github.com/openrefine/openrefine
OpenRefine is a free, open source power tool for working with messy data and improving it
data-analysis data-science data-wrangling datacleaning datacleansing datajournalism datamining java journalism opendata reconciliation wikidata
Last synced: 13 May 2025
https://github.com/great-expectations/great_expectations
Always know what to expect from your data.
cleandata data-engineering data-profilers data-profiling data-quality data-science data-unit-tests datacleaner datacleaning dataquality dataunittest eda exploratory-analysis exploratory-data-analysis exploratorydataanalysis mlops pipeline pipeline-debt pipeline-testing pipeline-tests
Last synced: 12 May 2025
https://github.com/sfu-db/dataprep
Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
apis apiwrapper cleaning connector data-exploration data-science datacleaning dataconnector dataprep datapreparation eda exploratory-data-analysis webconnector
Last synced: 14 May 2025
https://github.com/yobulkdev/yobulkdev
🔥 🔥 🔥 Open Source & AI-driven Data Onboarding Platform: Free flatfile.com alternative
csv-import csv-parser csv-reader data-engineering datacleaning embeddable javascript languagemodel mongodb nextjs nodejs open-source react stream streaming
Last synced: 21 Apr 2025
https://github.com/datacanvasio/hypergbm
A full pipeline AutoML tool for tabular data
adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost
Last synced: 15 May 2025
https://github.com/DataKitchen/data-observability-installer
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake
Last synced: 05 May 2025
https://github.com/benchopt/benchmark_bilevel
Benchmark for bi-level optimization solvers
bilevel-optimization datacleaning hyperparameter-optimization
Last synced: 01 Aug 2025
https://github.com/mundipagg/amora-data-build-tool
Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.
analytics analytics-dashboard analytics-engineering bigquery business-intelligence data-engineering data-modeling datacleaning dataquality elt machine-learning python transformation
Last synced: 08 Sep 2025
https://github.com/data-cleaning/validatedb
Validate on a table in a DB, using dbplyr
database datacleaning validation
Last synced: 22 Oct 2025
https://github.com/nirala96/bangalore-house-prediction-app
Predicts home prices of Bangalore. Used Flutter, Flask and Jupyter Notebook.
data-science datacleaning exploratory-data-analysis flask-api flutter jupyter-notebook linear-regression python
Last synced: 23 Mar 2025
https://github.com/ropensci/excluder
Checks for Exclusion Criteria in Online Data
datacleaning exclusion mturk qualtrics r r-package rstats
Last synced: 22 Oct 2025
https://github.com/ronlee12355/kaggle-with-r
All Kaggle datasets and the R code
dataanalytics datacleaning datascience eda kaggle machine-learning r
Last synced: 22 Jul 2025
https://github.com/easonlai/samples_for_azure_databricks_orientation
Samples for Azure Databricks Orientation
azure azure-storage azureblobstorage azuresqldb databricks databricks-notebooks datacleaning json json-schema matplotlib matplotlib-pyplot pandas pandas-dataframe pyodbc pyspark pyspark-notebook pyspark-tutorial python seaborn seaborn-plots
Last synced: 26 Apr 2025
https://github.com/nelson-gon/mde
mde: Missing Data Explorer
data-analysis data-cleaning data-exploration data-science datacleaner datacleaning exploratory-data-analysis missing missing-data missing-value-treatment missing-values missingness omit r r-package r-stats recode replace rstats statistics
Last synced: 24 Jul 2025
https://github.com/salaah01/pandas-data-cleaner
A package to aid with data cleaning using pandas.
Last synced: 11 Aug 2025
https://github.com/NhanAZ/DataCleaner
Clean up unnecessary data inside plugin_data folder
dataclean datacleaning php plugin pmmp pocketmine pocketmine-mp
Last synced: 09 Jul 2025
https://github.com/vijishmadhavan/parse-clip
A simple CLIP based project for combining images from multiple datasets.
clip data datacleaning dataexploration dataset fastai image python
Last synced: 09 Oct 2025
https://github.com/divithraju/divith-raju-immigration-data-engineering
A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)
apachespark bigdata bigdataprocessing bigdataproject capstone-project datacleaning dataengineering datalake datamodeling datapipeline dataprocessing dataschema dataset datawherehouse pandas sql
Last synced: 20 Feb 2025
https://github.com/mchenryspagg/google-play-store-apps-analysis-visualization
An analysis and visualization of scraped Google Play Store app data for the period 2010-2018. This project aims at cleaning the dataset, analyzing it, and mining informative, high-quality insights. It also involves visualizing the data to make trends and categories easier to understand.
dataanalysis datacleaning datavisualization documentation mysql powerbi preprocessing python sql
Last synced: 20 Feb 2025
https://github.com/siddharthbadal/sql
SQL Data Analysis Projects
big-query data-science datacleaning datanalysis exploratory-data-analysis mysql postgresql sql
Last synced: 14 Apr 2025
https://github.com/siddh34/dsml-project
Regression
datacleaning datavisualization prediction regression-analysis
Last synced: 25 Feb 2025
https://github.com/vishrut-b/clustering-analysis-of-online-retail-data
This project leverages machine learning techniques to analyze online retail data through customer segmentation. It uses KMeans clustering to identify key customer groups and proposes tailored business strategies based on their purchasing behaviors.
clustering datacleaning exploratory-data-analysis feature-engineering kmeans-clustering machine-learning numpy online-retail pandas sciki seaborn
Last synced: 12 May 2025
https://github.com/ngambip/top-uk-youtubers-2024.githu.io
This project involves a comprehensive analysis to determine the top YouTubers in the UK for 2024, using Excel, SQL, and Power BI.
analysis dashboards datacleaning dataqualitycheck dax excel kpi mockup powerbi recommendations testing tsql
Last synced: 11 Oct 2025
https://github.com/girish119628/girish119628
Data Enthusiast | Predictive Modeler | Turning Insights into Strategies
cross-validation data-visualization datacleaning feature-engineering modeling preprocessing
Last synced: 17 Jul 2025
https://github.com/ngambip/diabetes_factors_2024
Exploring BMI Categories and Health Factors.
dashboards data datacleaning dax-languague powerbi sql sqlstudio tsql visualization
Last synced: 24 Nov 2025
https://github.com/arzan101/ev--car-data-analysis
This Power BI dashboard provides an interactive and data-driven overview of the electric vehicle (EV) landscape. It visualizes key insights across various dimensions including sales trends, model performance, manufacturer comparisons, and market growth. The purpose of the dashboard is to enable stakeholders to explore and analyze development
data-analysis data-science data-visualization database datacleaning excel powerbi
Last synced: 17 Jun 2025
https://github.com/sonu275981/big-mart-sales-prediction
Uses machine learning algorithms for regression analysis to predict sales patterns, with data analysis and data visualizations to support it.
bigmart-sales-prediction data-science database datacleaning feature-engineering machine-learning pandas python sales xgboost-algorithm
Last synced: 06 Aug 2025
https://github.com/pavankethavath/car_dekho_car_price_prediction
A Streamlit web app utilizing Python, scikit-learn, and pandas for used car price prediction. Features data preprocessing (scaling, encoding), Random Forest model optimization with GridSearchCV, and interactive user input handling. Achieves high accuracy (R² score: 0.9028), showcasing skills in machine learning, data engineering, and deployment.
dataanalysis datacleaning datapreprocessing eda encoding feature-extraction feature-selection featureimportance fine-tuning machine-learning minmaxscaling normalization pandas pickle prediction-model python random-forest randomsearch-cv regression streamlit
Last synced: 23 Apr 2025
https://github.com/cintia0528/data_science-ab_testing
Conduct a 5-way AB Test on Montana State University Library's website, comparing the original "Interact" button with new versions ("Learn," "Help," "Connect," "Services") to boost user engagement.
abtesting bonferroni chisquare-test data data-science datacleaning datavisualization hypothesis-testing mde statistics
Last synced: 31 Mar 2025
https://github.com/eshaagarwa/sales_insight_project
Sales insights project using Power BI and SQL
data-analysis data-visualization databse datacleaning datamodeling microsoft-power-bi mysql-database powerbi sales-insights sql
Last synced: 08 Aug 2025
https://github.com/r-mahesh45/hr---resume-text-classification
Text Classification for Resumes: Conducted Exploratory Data Analysis (EDA) on a vast collection of resumes. Organized the data using Bag of Words (BoW) and TF-IDF techniques. Built and evaluated multiple models, with Logistic Regression delivering standout performance. Created Word Clouds and Histograms.
data datacleaning extract-transform-load feature-extraction nlp nltk-tokenizer text-mining text-processing
Last synced: 12 Sep 2025
https://github.com/jenderal92/data-cleaning-tools
This tool is simple and effective for cleaning datasets in CSV format. With its features, you can improve data quality automatically.
data-cleaing-tools datacleaning python python27 remove-duplicates remove-empty-rows
Last synced: 25 Mar 2025
https://github.com/mariaegbuna/road-accidents
Analyzing a road accidents dataset using Python.
data-visualization datacleaning jupyter-notebook pandas-dataframe python
Last synced: 10 Oct 2025
https://github.com/geo-y20/coursera-managment-system
ML and Data Science-based recommendation system
course coursera data data-science data-visualization datacleaning machine-learning mean-square-error recommendation-system
Last synced: 25 Feb 2025
https://github.com/shivam1808/data-cleaning-project
We take raw housing data and transform it in SQL Server to make it more usable for analysis.
analysis data datacleaning sql sqlserver
Last synced: 06 Mar 2025
https://github.com/abhijit2505/coupon-redemption-prediction
A machine learning test case to predict the redemption of the coupon.
data-science datacleaning decision-trees logistic-regression machine-learning-algorithms python3
Last synced: 17 Mar 2025
https://github.com/imsalmanmalik/linear-regression-model-airbnb-prices-seattle
Linear Regression Model on Airbnb prices of Seattle using Dash and Python
airbnb choropleth-map dash dataanalysis datacleaning datamanupilation datascience exploratory-data-analysis feature-engineering machine-learning matplotlib normalization numpy onehot-encoding pandas python seaborn-plots sklearn-library trainandtestsets visualization
Last synced: 05 Sep 2025
https://github.com/sksubhadeep/nashville-housing-data-cleaning-project-using-sql
SQL Data Cleaning Project on Nashville Housing Dataset
Last synced: 20 Feb 2025
https://github.com/kiranmayi5/data-warehouse-development-and-analysis
This project highlights my ability to design a comprehensive data warehouse and leverage SQL to generate actionable insights for strategic decision-making.
datacleaning datawarehousing etl sql
Last synced: 26 Feb 2025
https://github.com/shuklayash02/excel_complete_vrindastore_dataanalysis
Complete analysis: data cleaning, processing, and data analysis with an interactive dashboard
analysis data data-visualization datacleaning excel excel-vba
Last synced: 12 Jun 2025
https://github.com/gaurav-van/toxic-comment-web_app
Data Science Project to classify a comment into several toxicity categories. This Repository is used for deployment of the project.
classification data-science datacleaning exploratory-data-analysis machine-learning nlp nlp-machine-learning python streamlit
Last synced: 28 Mar 2025
https://github.com/madhuresh2011/career-aspiration-of-gen-z-project-using-excel
Career Aspiration of Gen-Z: exploring the industries, roles, and pathways using Excel.
dashboards data-analysis data-visualization datacleaning dataset designing excel functional-dashboard gen-z kpi pivot-charts pivot-tables project-using-excel
Last synced: 20 Dec 2025
https://github.com/shuklayash02/sales_dashboard_powerbi
Created an interactive dashboard to track and analyze online sales data. Used complex parameters to drill down in the worksheet, with customization using filters and slicers.
data-visualization datacleaning excel powerbi
Last synced: 09 Apr 2025
https://github.com/rosanafss/alteryx-journey
Practicing for the Udacity Data Track. Data analysis I carried out based on Udacity's free course "Creating an Analytical Dataset".
aggregating alteryx cross-tabbing dataanalysis datacleaning transposing webscraping
Last synced: 19 Nov 2025
https://github.com/rizz1406/customer-churn-analysis
Telco Customer Churn Analysis - Data analysis and visualization to identify churn patterns in telecom customers. Includes EDA, feature engineering, and optional machine learning modeling to predict churn and provide business insights.
churn-analysis dataanalysis dataanalysisusingpython datacleaning jupyter-notebook python visualization
Last synced: 09 Mar 2025
https://github.com/sreejabethu/sales-data-analysis-forecasting
Welcome to the Sales Data Analysis & Forecasting project! 🚀 This repository showcases my data analysis skills through exploratory data analysis (EDA), data cleaning, and visualization of sales and customer feedback data. The goal is to extract actionable insights to drive business decisions.
analysis barchart data-visualization datacleaning exploratory-data-analysis forecasting histogram matplotlib-pyplot numpy-library pandas-library pycharm-ide sales-analysis salesdata salesdataanalysis seaborn-plots transformation
Last synced: 31 Aug 2025
https://github.com/cintia0528/data_science-unsupervised_machine_learning
I aim to automate playlist creation for Moosic, a startup known for manual curation, using Machine Learning, while addressing skepticism about the ability of audio features to capture playlist "mood."
data data-preprocessing data-scaling data-science data-visualization datacleaning elbow-method kclustering machine-learning pandas python silhouette-score unsupervised-machine-learning
Last synced: 31 Mar 2025
https://github.com/netcodez/data-science-projects
Data Science Projects completed on DataCamp Data Scientist with Python Career Track
data data-analysis data-visualization datacleaning feature-engineering feature-extraction machine-learning predictive-analytics predictive-modeling python scikit-learn-python scikitlearn-machine-learning statistical-analysis statistical-models
Last synced: 31 Dec 2025
https://github.com/dhruwsunita/data-analysis-projects
Data Analysis Projects
data-visualization dataanalysis datacleaning datainterpretation eda
Last synced: 04 May 2025
https://github.com/makepath/medaprep
medaprep is a data preparation and feature engineering toolkit for geospatial applications.
data data-science datacleaning eda exploratory-data-analysis xarray
Last synced: 29 Jun 2025
https://github.com/wakolivotes/data-processing-and-preparation
In this tutorial, we use the Titanic Data (obtained from Kaggle) to illustrate key aspects of Data Processing and Preparation by relying on useful Python Libraries
data-science datacleaning jupyter-notebook python
Last synced: 22 Mar 2025
https://github.com/simran2911/sales-analysis-dashboard
This GitHub repository contains a comprehensive sales analysis dashboard. The objective of this Tableau project is to create an interactive and insightful dashboard that provides a comprehensive analysis of sales data.
analysis datacleaning excel tableau
Last synced: 26 Feb 2025
https://github.com/shuklayash02/complete_data_analysis_project
A full data analysis project in which sales data goes through the ask, prepare, process, analyze, share, and act steps of the data analysis process
data data-visualization dataanalysis database datacleaning powerbi sql
Last synced: 16 Jul 2025
https://github.com/cintia0528/data_cleaning_and_analytics-python
Evaluate if aggressive discounting benefits Eniac long-term, considering differing views on customer acquisition and brand positioning. Focus on data cleaning for informed decision-making.
colab-notebook data data-analysis datacleaning dataquality jupyter-notebook matplotlib pandas python seaborn
Last synced: 31 Mar 2025
https://github.com/datarohit/data-cleaning-exercise-2
In this exercise, partial cleaning and reordering of column headings were done in Excel, and the rest of the cleaning was done in Python.
data-cleaning datacleaning pandas
Last synced: 15 Aug 2025
https://github.com/rakumar99/power-bi-projects
This repository contains various Power BI projects and dashboards for Human Resources and Financial Analysis, built using Power BI Desktop.
dashboards data-analysis data-visualization databases datacleaning datamodeling etl powerbi powerquery reports
Last synced: 26 Feb 2025
https://github.com/joyalshaji135/product-sale-report-using-power-bi
In Power BI, load the Sales and Category tables, create a relationship between them using CategoryID, and define measures like Total Sales. Build a report with visuals (e.g., bar charts, tables) to display sales data by category, format the visuals, and add slicers for dynamic filtering by category and date.
Last synced: 05 Jan 2026
https://github.com/rizz1406/superstore-sales-analysis
Power BI dashboard analyzing superstore sales trends and forecasting future sales
datacleaning datamodeling datavizualization microsoft-excel powerbi powerbidashboard salesanalysis
Last synced: 02 Mar 2025
https://github.com/adi-200/bank_loan_data_analysis_using_power-bi
Bank Loan Data Analysis Using Power BI
charts datacleaning datatransformation datavisualization dax-expression dax-query powerbi
Last synced: 29 Jun 2025
https://github.com/edochiari/layoffs-data_cleaning
SQL script for cleaning a dataset related to work layoffs. It removes duplicates, standardizes data, handles null values, and eliminates irrelevant columns and rows, ensuring data integrity
Last synced: 29 Mar 2025
https://github.com/edochiari/tiktok-project
This project builds a predictive model to help TikTok classify user-reported content claims, improving moderation efficiency by identifying and prioritizing content that may need review. Insights from this model enable TikTok to manage reports more effectively, ensuring a safer and more engaging platform.
content-claims dataanalysis datacleaning hypothesis-testing jupyter-notebook regression tiktok
Last synced: 29 Mar 2025
https://github.com/edochiari/coffee_sales-data_analysis
This project involves creating a dynamic Coffee Sales Performance Dashboard in Excel, offering actionable insights into sales across various dimensions. Users can filter and explore data interactively, focusing on total sales, sales by country, and top customers, helping stakeholders identify trends and make informed decisions.
coffee dataanalysis datacleaning datavisualization excel sales
Last synced: 29 Mar 2025
https://github.com/edochiari/automatidata-project
This project uses taxi trip data to identify key factors that influence tipping, providing insights to help drivers maximize tips through optimized service.
dataanalysis datacleaning hypothesis-testing jupyter-notebook machine-learning regression taxi tipping
Last synced: 29 Mar 2025
https://github.com/bipinoli/complex-sentence-splitter-to-simple-sentences
A package to split a complex text into simple sentences.
datacleaning nlp-library nlp-parsing python
Last synced: 15 Jul 2025
https://github.com/georgehanymilad/end-to-end-shopping-trends-data-analysis
SQL + Python + Power BI Project for Data Analysis
data-analysis data-visualization datacleaning mssql powerbi python sql
Last synced: 29 Oct 2025
https://github.com/shekharkram/project
A collection of data analytics projects showcasing skills in data cleaning, exploration, visualization, and basic SQL queries. Designed to demonstrate entry-level data analyst competencies using real-world datasets and tools.
datacleaning excel jupyter-notebook mysql numpy pandas postgresql python sql
Last synced: 24 Dec 2025
https://github.com/tejaswirupa/early-prediction-of-diabetes-risk-using-machine-learning
Built a predictive model using CDC health data to identify individuals at risk of developing diabetes. Achieved 90.6% F1-score using Logistic Regression and revealed key health indicators like BMI and blood pressure as top predictors.
data-science datacleaning exploratory-data-analysis modelevaluation preprocessing-data python scikit-learn supervised-machine-learning
Last synced: 15 Jul 2025
https://github.com/udhaya2823/dataspark-illuminating-insights-for-global-electronics
✨DataSpark✨ is a powerful analytics project transforming raw retail data into actionable insights for Global Electronics. By leveraging Python, SQL, and interactive visualizations, it uncovers trends in customer behavior, sales performance, and product popularity, driving smarter business decisions and boosting growth.
data-science data-visualization database-management datacleaning exploratory-data-analysis matplotlib numpy pandas powerbi python seaborn sql version-control
Last synced: 17 Jul 2025
https://github.com/athari22/statistics-from-stock-data
Statistics from Stock Data
cvs data data-science dataanalysis datacleaning dataframe jupyter pandas pandas-python python statistics stock table
Last synced: 07 Oct 2025
https://github.com/huseinhaji/projects
This repository is a collection of projects I have worked on, showcasing my skills in data analysis, data science, and machine learning.
businessanalytics dataanalysis datacleaning datavisualization machinelearning matplotlib python sklearn
Last synced: 19 Jun 2025
https://github.com/edochiari/customer_clustering-project
This project applies K-Means clustering to segment customers based on RFM metrics, helping identify key customer groups for targeted marketing and loyalty strategies.
dataanalysis datacleaning jupyter-notebook kmeans-clustering
Last synced: 12 Mar 2025
https://github.com/shubhamsoni98/survey-data-analysis
Survey Data Analysis
analysis dashboards data data-mining data-visualization dataanalysis datacleaning datascience datasets insights pivot-tables pivotanalysis
Last synced: 02 Mar 2025
https://github.com/yadavkaushal/datascience-e-commerce-shopping-details
This project analyzes customer purchase data, including details such as location, company, credit card usage, browser info, job roles, and purchase price. It explores patterns in payment methods, spending behavior, and online transactions. Using Pandas, Matplotlib, and Seaborn, we clean, analyze, and visualize key trends to derive actionable insights.
data datacleaning dataframe datapreprocessing dataset libraries matplotlib numpy pandas plots visulaization
Last synced: 24 Dec 2025
https://github.com/kimatudo3/atliq-hardware-dashboard
The AtliQ Hardware BI 360 Dashboard is a comprehensive business intelligence tool crafted to empower AtliQ Hardware with data-driven insights across various departments.
atliq dashboard data-engineering data-visualization database database-management datacleaning dax-query m mysql powerbi-desktop powerquery sql-server visualization
Last synced: 21 Mar 2025
https://github.com/aadityasikder/Object-Detection-with-raspberry-pi-implementing-TinyML-models
Repository for Raspberry Pi-based object detection with TinyML models like TensorFlow Lite, PyTorch Nano, including data gathering, mAP evaluation, and image data preparation in Jupyter notebooks.
data-gathering datacleaning dataprocessing image-preparation object-detection pytorch-nano raspberry-pi-4 tensorflow-lite tinyml
Last synced: 16 Dec 2025
https://github.com/lazakiro/finally-postgres
Ready-to-use PostgreSQL development environment with Docker. Simple setup, smart defaults, and comprehensive management commands for local development and testing. Zero configuration needed to start, fully configurable when needed.
containerization database datacleaning development development-environment devops docker-environment lambda-functions local-development nextjs postgres-docker postgresql quickdbd react
Last synced: 04 Mar 2025
https://github.com/vincenzopalazzo/visualsars2chart
Visual analytics of COVID-19 (SARS 2) data with Python and Tableau
covd-19 covid-2019 covid19 data-visualization datacleaning dataset python3
Last synced: 28 Mar 2025
https://github.com/ahmad-ali-rafique/random-forest-regressor-modeling
Detailed exploration of random forest regressors, including data cleaning, model building, and performance evaluation on various datasets.
data dataanalytics datacleaning evaluation-metrics modeling random-forest random-forest-regression regression regression-analysis
Last synced: 05 Mar 2025
https://github.com/zawadi-wanjiru/house-prices-prediction-group-project
Predicting House Prices Using Regression Analysis
datacleaning datavisualization descriptive-statistics exploratory-data-analysis jupyter-notebook matplotlib modelling pandas-library predictive-analysis python regression-analysis scikit-learn seaborn-python
Last synced: 29 Mar 2025
https://github.com/kuldeepsharma-dataanalyst/college_database_system_sql_project
SQL project demonstrating database design, queries, and analysis for a college management system.
columns datacleaning datamanagement dbms dbmsproject mysql-database pgadmin postgresql queries rows sql sqlqueries tables
Last synced: 05 Nov 2025
https://github.com/ashish-kr-srivastava/olympic-games-eda---python
Exploratory Data Analysis of a historical Olympic Games dataset, including all the Games from Athens 1896 to Rio 2016.
data-visualization datacleaning eda matpotlib numpy pandas python seaborn seaborn-python
Last synced: 24 Oct 2025
https://github.com/abhijeet107/task-1
Data Cleaning and Preprocessing
datacleaning excel pandas python
Last synced: 13 Apr 2025
https://github.com/mohamed-khaled0/covid-data-exploration.sql
Covid-19 data
covid19-data data-analysis datacleaning microsoft-sql-server sql
Last synced: 20 Jul 2025
https://github.com/sathyanarayanan2002/ml_project
A house price prediction website built with Django that allows users to input property details and receive real-time price estimates from a machine learning model. The site uses Django for backend functionality and serves machine learning predictions based on user input.
algorithms css3 datacleaning django html5 linear-regression python
Last synced: 29 Dec 2025
https://github.com/thebaldanalyst/projects
A collection of various data analytics projects showcasing skills in EDA, data cleaning, data visualization, and data scraping
dashboard datacleaning datavisualization eda excel powerbi python smss sql tableau
Last synced: 09 Apr 2025
https://github.com/mastermindromii/data-cleaning-using-power-query
A Simple Real-Time Data Cleaning Using Power Query in Power BI
datacleaning netflixdata powerbi powerquery rawdata-converter
Last synced: 25 Feb 2025
https://github.com/swethajoseph/credit-risk-assessment-eda-case-study
Conducted an Exploratory Data Analysis (EDA) using Python to assess credit risk, identifying key factors that contribute to loan defaults and improving lending decisions
data-analysis data-visualization datacleaning datapreparation exploratory-data-analysis feature-engineering jupyter-notebook matplotlib-pyplot numpy-library pandas-library python-library risk-analysis risk-assessment risk-management seaborn-plots visual-studio-code
Last synced: 27 Oct 2025
https://github.com/priyapuranik/diwali_sales_dashboard
A Power BI dashboard that analyzes Diwali sales data, providing insights into revenue, orders, and customer demographics across various categories and regions.
charts dashboard datacleaning dax-query powerbi
Last synced: 15 Jul 2025
https://github.com/ahmad-ali-rafique/electricity-consumption-analysis-household-dataset
This repository contains analysis and predictive modeling of household electricity consumption using Python. It includes data cleaning, exploratory data analysis (EDA), time series forecasting (ARIMA, SARIMA, LSTM), and model evaluation to optimize energy usage.
arima-forecasting artificial-intelligence artificial-neural-networks data data-science dataanalytics datacleaning evaluation-metrics exploratory-data-analysis long-short-term-memory lstmmodel modeling time-series timeseries-forecasting
Last synced: 23 Jun 2025