Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with eda

A curated list of projects in awesome lists tagged with eda.

https://github.com/ashishtukaral/data-science-projects

This collection showcases a variety of projects where I've applied machine learning and data analysis techniques to tackle real-world problems

classification clustering data-science detection eda feature-engineering machine-learning prediction python regression

Last synced: 08 Jan 2025

https://github.com/arxiver/airbnb-eda-and-regression

Big data exploration and analysis of an Airbnb dataset, along with a regression model for price prediction of entities

airbnb analysis big-data big-data-analytics bigdata eda python regression regression-models visualization xgboost

Last synced: 15 Jan 2025

https://github.com/kunalshelke90/wine-quality-testing

This project is about creating a machine learning algorithm that can predict the quality of wine based on the given dataset. Different machine learning algorithms such as logistic regression, decision tree and random forest are used in this project.

eda feature-engineering flask machine-learning numpy pandas python

Last synced: 03 Jan 2025

https://github.com/mindlessmuse666/eda-pandas

An exploratory data analysis (EDA) project on Titanic passengers using the Pandas library. It covers data loading, preprocessing, statistical analysis, visualization, and building pivot tables. The goal of the project is to demonstrate the core methods and tools of EDA for analyzing and understanding data.

data-analysis data-processing data-science data-visualization eda exploratory-data-analysis matplotlib pandas python titanic

Last synced: 27 Jan 2025

https://github.com/christosg88/gen_design

Generate a Verilog file and an accompanying SPEF file for a hierarchical design

cpp cpp20 eda

Last synced: 09 Nov 2024

https://github.com/ammahmoudi/bike-sharing-trends

Predicting bike sharing trends using classic machine learning methods (linear regression, decision tree)

bike-sharing decision-trees eda linear-regression machine-learning ml

Last synced: 15 Jan 2025

https://github.com/27ahmad/amazon-sales-analysis

This repository contains an exploratory data analysis (EDA) and visualization project of Amazon sales data. The goal is to uncover insights and present key metrics through a Tableau dashboard.

data-analysis eda pandas python seaborn tableau

Last synced: 15 Jan 2025

https://github.com/27ahmad/foreign-direct-investment-analytics

This repository contains an exploratory data analysis (EDA) and visualization project on a dataset of Foreign Direct Investment (FDI) by companies. The objective is to analyze FDI trends and present key insights through an interactive Tableau dashboard.

data-analysis eda matplotlib pandas python seaborn tableau

Last synced: 15 Jan 2025

https://github.com/gappeah/layoffs-exploratory-data-analysis

This project uses MySQL to perform data cleaning and exploratory data analysis (EDA) on a dataset detailing company layoffs. The primary goal is to process, clean, and explore the data to gain insights into trends and patterns related to layoffs across various sectors.

data dataanalysis eda mysql sql

Last synced: 07 Jan 2025

https://github.com/controldata23/product-sales-from-amazon

This is an Exploratory Data Analysis done on the Amazon Product Sales dataset from Kaggle.

data-analysis-python data-cleaning data-exploration data-visualisation eda matplotlib

Last synced: 08 Jan 2025

https://github.com/realsubhamsahoo/data-cleaning-eda-project-mysql

Data Cleaning and EDA Project using MySQL

data-cleaning eda mysql sql

Last synced: 09 Jan 2025

https://github.com/krzysikd/uber_fare_prediction

Predicting Uber fares using advanced machine learning models and feature engineering techniques

data-analysis data-processing eda hyperparameter-tuning jupyter machine-learning regression-models

Last synced: 15 Dec 2024

https://github.com/ashishsingh789/customer_purchase_prediction_using_decision-tree-_classifier

Decision Tree Classifier to predict customer purchases using demographic and behavioral data. Key steps: data preprocessing, EDA, model training, evaluation, and feature importance analysis.

data datascience desiciontree eda machine-learning-algorithms matplotlib numpy pandas-dataframe python seaborn

Last synced: 15 Jan 2025

https://github.com/aniruddha-10/data-201-group-project

Simple and basic exploratory data analysis

eda excel openrefine tableau

Last synced: 15 Jan 2025

https://github.com/karlyndiary/mavens-pizza-sales-insight

Analyzing Maven Pizza's sales performance and business insights by exploring key metrics, product trends, customer behavior, and peak sales periods, utilizing SQL for querying and Excel for dashboard visualizations.

analysis data-exploratory data-pipeline data-visualization eda etl excel-dashboard pizza-dataset sql

Last synced: 28 Jan 2025

https://github.com/raufjatoi/diabetes

EDA and model implementation on diabetes dataset

data-visualization eda machine-learning

Last synced: 25 Jan 2025

https://github.com/faizantkhan/machine-learning

Machine Learning Practice and Exercises. Welcome to our repository dedicated to the practice and mastery of machine learning (ML) concepts and techniques. This repository serves as a comprehensive resource for learners and enthusiasts looking to enhance their ML skills through hands-on exercises and practical applications.

classification-algorithm clustering-algorithm data-science datavisualization decision-trees eda linear-regression logistic-regression machine-learning machine-learning-algorithms machine-learning-library math matplotlib-pyplot model-selection pandas python sklearn-library testing-data training-data

Last synced: 15 Jan 2025

https://github.com/faizantkhan/automated-eda

This repository showcases tools for automatic Exploratory Data Analysis (EDA) in Python. These tools help you quickly understand your datasets and generate insightful reports.

automatic automation autoviz data-analysis data-analysis-python data-science data-visualization dtale dtale-library eda exploratory-data-analysis ml pandas pandas-profiling python python-library sweetviz

Last synced: 15 Jan 2025

https://github.com/atharvapathak/credit_eda_case_study

This repository contains my Data Science assignment on Exploratory Data Analysis, aimed at detecting the type of customers who are more likely to accept a loan with a minimal default rate.

credit credit-eda-github eda exploratory-data-analysis loan-default-analysis loan-prediction python

Last synced: 15 Jan 2025

https://github.com/snowkylin/npn

A boolean matcher that computes the NPN canonical representative for a given boolean function.

boolean-matcher cpp eda logic-synthesis npn pypi-package python

Last synced: 15 Jan 2025

https://github.com/brayvid/satellites-eda

Flatiron School Data Science Bootcamp Phase 2 Project

data-science earth-orbit eda inferential-statistics kaggle

Last synced: 11 Jan 2025

https://github.com/brayvid/space-debris-eda

Flatiron School Data Science Bootcamp Phase 1 Project

data-science descriptive-statistics earth-orbit eda kaggle

Last synced: 11 Jan 2025

https://github.com/ayeshaaaaaaaaa/app-reviews-sentiment-analysis

App Reviews Sentiment Analysis is the process of extracting meaningful insights from the textual content of app reviews by determining the sentiment expressed within. Sentiment analysis helps categorize reviews as positive, neutral, or negative, making it easier for developers to prioritize enhancements based on user feedback.

app-review bda eda logistic-regression naive-bayes sentiment-analysis svm

Last synced: 15 Jan 2025

https://github.com/sharoonjoseph321/insurance_fraud_detection

Fraud detection using the K-Nearest Neighbors machine learning algorithm. Data exploration using PySpark and Matplotlib.

analytics data data-science eda high-performance knn-algorithm knn-classification machine-learning matplotlib-pyplot pyspark python seaborn spark statistics

Last synced: 28 Jan 2025

https://github.com/lu-sketch/chocolate-imports-dataset

Chocolate Imports for South Africa

data eda visualization

Last synced: 28 Jan 2025

https://github.com/anderson-andre-p/exploratory-data-analysis.roller-coaster

This repository contains an exploratory data analysis (EDA) project focused on roller coasters. The project involved organizing, cleaning, and visualizing the data to gain insights into roller coasters' characteristics and performance.

data-analysis eda exploratory-data-analysis exploratory-data-visualizations notebook

Last synced: 22 Jan 2025

https://github.com/skywardai/cecilia

EDA tools and datasets generator for ML projects

dataset dataset-generation eda embeddings

Last synced: 11 Nov 2024

https://github.com/nikhilsree5/targetcasestudy

An exploratory and in-depth study of the e-commerce market in Brazil.

bigquery eda sql visualization

Last synced: 22 Jan 2025

https://github.com/harmanveer-2546/bird-species-prediction-using-deep-learning

Uses convolutional neural networks to build and train a bird species classifier on bird images with corresponding species labels, and builds a GUI for it.

3d-graph callback deep-learning eda gui gui-application image-generator imageclassification keras-tensorflow matplotlib maxpooling mobilenetv2 numpy opencv pillow plotly python seaborn transfer-learning visualization

Last synced: 11 Jan 2025

https://github.com/harmanveer-2546/eda-on-indian-railways

Indian Railways is a statutory body under the ownership of the Ministry of Railways of the Government of India that operates India's national railway system. As of 2023, it manages the fourth largest national railway system by size with a track length of 132,310 km, running track length of 106,493 km and route length of 68,584 km.

clean-data eda exploratory-data-analysis geometry geopandas indian-railways json linestring matplotlib numpy os pandas plotly python railway seaborn shapely train visualization

Last synced: 11 Jan 2025

https://github.com/harmanveer-2546/chatgpt-reviews-eda

The data in this EDA consists of daily-updated user reviews and ratings for the ChatGPT Android App. It also contains data on the relevancy of these reviews and the dates they were posted.

calendar chatgpt correlation datacollection droppingirrelevantcolumns eda exploration exploratory-data-analysis insights-data matplotlib mticker pandas python removingduplicates reviews-eda seaborn

Last synced: 11 Jan 2025

https://github.com/ribin-baby/the-sparks-foundation-data-science-internship

This repository contains tasks and solutions assigned as part of an internship program, including workbooks on data analysis and model building.

data-analysis eda python3

Last synced: 16 Jan 2025

https://github.com/jsinkx/traffic-accident-ml

Forecast car and traffic accidents depending on weather with machine learning

eda forecast matplotlib ml numpy optuna pandas randomforest seaborn sklearn tpot

Last synced: 09 Jan 2025

https://github.com/no-country-simulation/s16-21-n-data-bi

COVID-19 analysis: insights into the evolution of the pandemic and its impact in 5 South American countries.

eda etl machine-learning matplotlib pandas powerbi python scikit-learn seabron streamlit

Last synced: 11 Nov 2024

https://github.com/albertofaraujo/sql_eda_capes

The goal of this exploratory analysis is to identify patterns and trends in the funding of scholarships in Brazil and abroad, promoted by Capes since 2005.

apache-spark data-science databricks eda sql

Last synced: 17 Jan 2025

https://github.com/alchemine/know-based-job-recommendation

KNOW-based job recommendation algorithm competition

eda machine-learning python

Last synced: 16 Jan 2025

https://github.com/alchemine/liver-microsome-prediction

Competition to develop a model predicting the metabolic stability of compounds against human and mouse liver metabolic enzymes

drug-discovery eda gnn python

Last synced: 16 Jan 2025

https://github.com/mehrab-kalantari/spotify-songs-analysis

Analysis and Hypothesis Testing of Spotify Songs Dataset

eda hypothesis-testing hypothesis-tests statistical-tests

Last synced: 16 Jan 2025

https://github.com/mehrab-kalantari/advanced-house-prices-analysis

Analysis and Hypothesis Testing of Advanced House Prices Dataset

eda hypothesis-testing hypothesis-tests ttest

Last synced: 16 Jan 2025

https://github.com/virajbhutada/telecom-customer-churn-prediction

Predict and prevent customer churn in the telecom industry with this project. Harness the power of advanced analytics and Machine Learning on a diverse dataset to develop a robust classification model. Gain deep insights into customer behavior and identify critical factors influencing churn using interactive Power BI visualizations.

churn-prediction classification-models customer-attrition-analysis customer-churn-prediction data-analysis data-science decision-tree-classifier eda logistic-regression machine-learning machine-learning-algorithms machine-learning-models pandas powerbi powerbi-desktop python random-forest-classifier roc-curve xgboost-classifier

Last synced: 10 Jan 2025

https://github.com/hannah-aji/bank-loan-segmentation-using-sql-excel-tableau

The project aims to uncover insights related to loan applications, funding, repayments, and borrower demographics, facilitating data-driven decision-making in the banking sector.

data-analytics data-science dax-query eda etl excel-functions loan-analysis ms-sql-server tableau-server

Last synced: 18 Jan 2025

https://github.com/balajimohan18/loan-clustering-datascience-project

This project uses machine learning to cluster loans based on their similarities. It uses a dataset of loan applications that includes information about the loan amount and balance, then applies a clustering algorithm to group similar loans together.

clustering-algorithm data-analysis data-science data-visualization eda kmeans-clustering machine-learning sql unsupervised-learning

Last synced: 14 Jan 2025

https://github.com/shubhamsoni98/prediction-with-binomial-logistic-regression

To predict client subscription to term deposits and optimize marketing strategies by identifying potential subscribers.

binomial data data-science eda machine-learning matplotlib pipeline python scikit-learn seaborn sklearn sql visualization

Last synced: 22 Jan 2025

https://github.com/tazeenrashid/orders-analysis-using-python-sql-server-and-tableau

I sourced some Orders data through Kaggle; did EDA using Python and then fetched some insights out of cleaned data using SQL Server (SSMS). Then, I built a Tableau Dashboard for some visual insights. Have a look and share your feedback!

analytics data eda jupyter-notebook python sql tableau

Last synced: 22 Dec 2024

https://github.com/somjit101/data_science-eda

A collection of useful implementations to perform EDA on a new dataset in order to understand preliminary patterns in the dataset and gain a high-level grasp of the dataset using plots and visualizations.

boxplots contour-plots distribution eda histogram iris-dataset plots qqplot seaborn-plots statistical-analysis violin-plots

Last synced: 16 Jan 2025

https://github.com/duckyot/mcdonald_nutritional_analysis

This project provides a comprehensive analysis of McDonald's menu items, focusing on their nutritional content. The project empowers stakeholders to make data-driven decisions to optimize menu offerings and promote healthier choices.

calorie-breakdown data-science eda food-analytics interactive-dashboard jupyter-notebook mcdonalds menu-optimization nutritional-data pandas public-health restaurant-analytics sales tableau

Last synced: 22 Jan 2025

https://github.com/vlada-pv/eda-regression-marketingdata

Engineering a regression model to predict clients' annual income, evaluating feature distributions, testing hypotheses, and assessing the model using the OLS method.

eda ols prediction-model regression-analysis regression-models statistics

Last synced: 22 Jan 2025

https://github.com/venkyiyer/project-deployment

A project about creating a model in the research environment, then transforming the research code into production code, packaging it, deploying it to an API, and adding continuous integration and continuous delivery.

cicd eda jupyter-notebooks ml python3

Last synced: 28 Jan 2025

https://github.com/tomfreudenberg/cedra

Harnessing the strengths of Cement for fast CLI apps, Dramatiq for reliable task processing, and gRPC for external access, CEDRA redefines efficiency in event-driven architecture.

cement dramatiq eda event-driven grpc message-queue python rabbitmq trpc

Last synced: 15 Dec 2024

https://github.com/ajmannust41288/data-analyst

Data Analyst, Microsoft Professional expert, Power BI Desktop, Tableau, and dashboards with ChatGPT-4 AI uses

business-analytics data-analysis data-analyst data-analytics eda

Last synced: 22 Jan 2025

https://github.com/hariprasath-v/amazon-ml-hiring-challenge

Online machine learning hackathon to classify the customer based on various activity scores on the e-commerce website.

dataanalysis eda exploratory-data-analysis ggplot2 r

Last synced: 13 Jan 2025

https://github.com/abhipatel35/moviematcher-movie-recommender-system

A robust movie recommendation system using the MovieLens dataset, employing Collaborative Filtering, Matrix Factorization, and Hybrid Models to enhance recommendation accuracy and diversity.

collaborative-filtering content-based-filtering data-analysis eda hybrid-models machine-learning matrix-factorization movie-recommendations movielens-dataset python recommender-system surprise-library

Last synced: 22 Jan 2025

https://github.com/ammahmoudi/water-treatment-plant

Categorizing the plant's operation state from sensor data using SVMs.

eda knn machine-learning ml svm water-treatment

Last synced: 15 Jan 2025

https://github.com/priyapuranik/gold_price_prediction

This regression project predicts the gold price with a Random Forest regressor model by analyzing historical market data.

eda jupyter-notebook machine-learning-algorithms python random-forest-regression visualization

Last synced: 22 Jan 2025

https://github.com/ibm-cloud-architecture/refarch-eda-item-inventory-sql-flink

A SQL Flink implementation to compute the item inventory and store inventory

eda kafka

Last synced: 17 Jan 2025

https://github.com/srking501/futurelearn_mooc

A summative coursework for CSC8631 Data Management and Exploratory Data Analysis

crisp-dm data-mining data-preprocessing data-science data-visualization deployment eda exploratory-data-analysis

Last synced: 28 Jan 2025

https://github.com/omarsaad21/credit-train-data-science-project

This is a full web application to predict the credit score of clients, plus many visualizations that present insights in charts.

eda matplotlib ml numpy pandas python sklearn streamlit-webapp

Last synced: 15 Jan 2025

https://github.com/noodleslove/house-of-representatives-analysis-ii

In this project, we want to estimate if a transaction will have capital gains exceeding $200 using the provided dataset.

coursework data-analysis data-science eda feature-engineering pandas python3

Last synced: 28 Jan 2025

https://github.com/anuuragg/human-microbiome---eda

Fundamentals of Data Science - End Semester Project 1

data-science data-visualization eda fds microbiome

Last synced: 21 Jan 2025

https://github.com/mdanwarulkarim/netflix-data-analysis-excel-project

This project analyzes Netflix's content data, emphasizing trends in production and distribution. It addresses business questions through an interactive dashboard, exploring movie and TV show distribution, key contributors, genre trends, and geographic diversity. The analysis provides insights into Netflix's expanding library.

eda excel

Last synced: 29 Jan 2025

https://github.com/alchemine/computer-vision-anomaly-detection

Computer Vision anomaly detection algorithm competition

eda machine-learning python

Last synced: 16 Jan 2025

https://github.com/alchemine/diabetes-prediction

Diabetes Prediction and Analysis (NHIS-2018)

eda jupyter python scikit-learn streamlit

Last synced: 16 Jan 2025

https://github.com/abhash-rai/regression-car-price-prediction

This repository contains my first complete data science project, from web scraping for data to data preprocessing, cleaning, exploratory data analysis, model training and deployment.

data data-science data-visualization eda exploratory-data-analysis machine-learning neural-network prediction prediction-model regression

Last synced: 16 Jan 2025

https://github.com/vishnu-vamshii/layoffs-data-analysis-in-sql

This project focuses on the cleaning and exploratory analysis of a dataset containing layoff information. It includes data deduplication, standardization of columns, handling null and blank values, and analyzing layoffs by company, industry, country, and date. Various SQL queries are used to explore trends and patterns in layoffs over time.

data-analysis eda mysql

Last synced: 29 Jan 2025

https://github.com/sehgal-vishal/sql-nyc-collision-analysis

This analysis is based on collisions (accidents) that happened in New York City. I used SQL Server for EDA (Exploratory Data Analysis).

data-analysis database eda sql-server

Last synced: 18 Jan 2025

https://github.com/ashish-kr-srivastava/social-media-database-analysis-sql-project

I recently worked on a project analyzing a social media platform's data on MS SQL Server. In this project I used advanced SQL features such as views, indexes, CTEs, window functions, and many more.

eda joins mssqlserver schema views windowsfunction

Last synced: 18 Jan 2025

https://github.com/abinashsahoo007/project-bankruptcy-prevention

The project is to create a classification model that predicts the chances of a business facing bankruptcy based on key features such as Industrial Risk, Management Risk, Financial Flexibility, Credibility, Competitiveness, and Operating Risk.

data-analysis data-mining data-visualization deployments eda machine-learning pickle python statistics streamlit

Last synced: 09 Jan 2025

https://github.com/abinashsahoo007/project-song-recommendation-system

This project is a simple content-based song recommendation system. It suggests similar items to the user based on the content the user provides.

correlation cosine-similarity data-mining dbscan-clustering deployment eda heirarchical-clustering k-means-clustering pandas-profiling pca pickle recommender-system statistics streamlit visualization

Last synced: 09 Jan 2025

https://github.com/kaushikrohida/bank-customer-data-prep

Cleaning and Exploring the bank customer data to prepare it for machine learning models

business eda finance geospatial python

Last synced: 19 Jan 2025

https://github.com/silvano315/churn-prediction-with-shap

This project aims to classify potential churn customers using a Telco Customer Dataset from IBM. Its main contributions are the integration of explainability with the SHAP algorithm and an interactive dashboard with exploratory, classification, and explainability insights.

churn-prediction classification data-science eda explainability ibm machine-learning shap

Last synced: 29 Jan 2025

https://github.com/silvano315/stroke_prediction

Stroke prediction with machine learning and SHAP algorithm using Kaggle dataset

classification eda explainability machine-learning shap stroke

Last synced: 29 Jan 2025

https://github.com/silvano315/health-insurance-prediction

This repository tests some machine learning and ELI5 explainability techniques to predict whether a customer would be interested in vehicle insurance, given information about demographics, vehicles, and policy.

adasyn data-science eda eli5 explainability health insurance machine-learning random-forest vehicle xgboost

Last synced: 29 Jan 2025

https://github.com/gpsyrou/binary_classification_of_bank_marketing_campaigns

Exploratory data analysis (EDA) and development of classification algorithms (Logistic Regression, Random Forest) to predict clients that are most likely to subscribe to a bank's product, as a result of marketing campaigns.

classification eda logistic-regression python random-forest

Last synced: 23 Jan 2025

https://github.com/dragonman225/ngrp

An Ngspice ASCII rawfile parser written in JavaScript.

eda eletronics ic-design ngspice spice

Last synced: 19 Jan 2025

https://github.com/sparab16/creditcardprediction

To build a classification methodology to determine whether a person defaults on the credit card payment for the next month.

eda flask machine-learning naive-bayes python sqllite3 xgboost

Last synced: 29 Jan 2025

https://github.com/nicklasbekkevold/anonymized-dataset-classification

Classifying an unknown data set using ensemble machine learning methods with a focus on exploratory data analysis. This was a part of the course TDT05 - Modern Machine Learning in Practice at NTNU autumn 2021.

eda ensemble-learning machine-learning ntnu

Last synced: 22 Dec 2024

https://github.com/karthikarajagopal44/pandas-beginner-to-advanced

This repository is designed to be a comprehensive guide to mastering pandas, the powerful data manipulation and analysis library in Python.

data-manipulation datascience eda pandas pandas-dataframe python

Last synced: 25 Jan 2025

https://github.com/shubhamsoni98/project_using_knn

This project applies the K-Nearest Neighbors (KNN) algorithm to predict iPhone purchases based on customer data. Using features like age, salary, and previous purchase behavior, the KNN model classifies customers into buyers and non-buyers.

anaconda analytics data data-science eda knn knn-classification machine-learning-algorithms predict project python scikit-learn tableau

Last synced: 22 Jan 2025

https://github.com/gregoritsch3/ml_eda_classification_loanapprovalprediction

An EDA and Machine Learning Classification exercise on the Loan Approval dataset demonstrating EDA, feature engineering, StratifiedKFold and the use of TensorFlow NN, SVC, LinearSVC, XGBoost, Naive Bayes, Bagging, Random Forest and Decision Tree algorithms, etc. The models are optimized using hyperparameter tuning through GridSearchCV.

eda feature-engineering machine-learning matplotlib numpy pandas scikit-learn scipy seaborn tensorflow

Last synced: 02 Feb 2025

https://github.com/ashwin331133/sql-project--sales-data-analysis--walmart

This SQL-based Walmart data analysis project aims to identify top-performing branches and products and to optimize sales strategies using Kaggle's Walmart Sales Forecasting Competition dataset.

data-analysis eda sql

Last synced: 22 Jan 2025