An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/monish-nallagondalla/diamondpriceprediction

Diamond Price Prediction is an end-to-end machine learning project that predicts diamond prices based on attributes like carat, cut, color, clarity, and dimensions. It features a Flask web application for real-time predictions and utilizes models such as Linear Regression, Lasso, and Ridge.

data-analysis data-science flask jupyter-notebooks machine-learning predictive-modeling python

Last synced: 06 May 2026

https://github.com/muneeb1030/eda-of-physionets-ecg

EDA of the PhysioNet dataset "A Large Scale 12 Lead Electrocardiogram Database for Arrhythmia Study 1.0.0". This project focuses on preprocessing electrocardiogram (ECG) signals and uses Principal Component Analysis (PCA) for dimensionality reduction.

12-lead-ecg data-analysis ecg-signal eda pca python3 wfdb

Last synced: 25 Jul 2025

https://github.com/sarathchandranpm/restaurant_order_analysis

This project entails an in-depth analysis of a restaurant's order and menu data. The focus is on exploring customer ordering behaviors, menu item attributes, and order specifics. By investigating the connections between order details, menu items, and order dates, the project seeks to generate valuable insights into the restaurant's operations.

data-analysis mysql sql

Last synced: 10 Apr 2025

https://github.com/backdoorali/insider-threat-detection-project

Personal data analysis project combining insider threat detection, cybersecurity, and exploratory data analytics. Built for portfolio showcase and practical skills demonstration.

cybersecurity data-analysis data-analysis-excel data-analysis-project data-analyst data-analytics data-visualization eda excel insider-threat jupyter-lab jupyter-notebook matplotlib numbers pandas portfolio-project python python3 threat-detection threat-intelligence

Last synced: 07 May 2026

https://github.com/cworld1/novel-analysis

A simple project for analyzing Chinese novels

data-analysis novel

Last synced: 17 Mar 2025

https://github.com/robinmillford/analytics_for_fashion_supply_management

This Streamlit dashboard provides a comprehensive analysis of supply chain data, focusing on key metrics such as production volumes, stock levels, order quantities, revenue, manufacturing costs, lead times, shipping costs, transportation routes, risk factors, and sustainability factors

dashboard data-analysis data-visualization streamlit supply-chain-management

Last synced: 07 Sep 2025

https://github.com/prime-infinity/type-one

Software to visualize and analyze GitHub repos based on certain statistics such as stars, forks and issues

data-analysis data-visualization

Last synced: 03 Feb 2026

https://github.com/jubinjacob03/heartdiseaseclassify-ml

Heart Disease Dataset Analysis & Classification using ML models such as linear regression, support vector machine, k-means, k-nearest neighbors, and logistic regression.

data-analysis data-science data-visualization ipython-notebook kaggle-dataset kmeans knn linear-regression logistic-regression machine-learning matplotlib python seaborn support-vector-machine

Last synced: 18 Jan 2026

https://github.com/jethronap/jstat-gui

Web-based GUI application for data analysis

data-analysis data-visualization java jstat mongodb

Last synced: 08 May 2026

https://github.com/cagandemirmr/google-play-yorum-analizi

I gathered insights by analyzing variables such as review sentiment, the number of likes on reviews, the company's responses, and user nicknames for My Supermarket Simulator 3D, the most liked game in Turkey in 2024.

bert data-analysis data-science nlp

Last synced: 10 Jun 2026

https://github.com/aekanshd/crazytics-suicidesindia

Basic interpretation of the Suicides in India data-set using R.

data-analysis data-science graph india r suicides

Last synced: 10 Jun 2026

https://github.com/jayita11/atliqo-bank-credit-card-launch-eda

This project involves exploratory data analysis and statistical testing for AtliQo Bank's new credit card launch. Key insights include targeting high-income occupations and the 18-25 age group. Recommendations focus on tailored marketing campaigns, education, and incentives to enhance credit card adoption and usage among young adults.

data-analysis hypothesis-testing matplotlib p-value pandas python seaborn statistics z-test

Last synced: 09 Apr 2026

https://github.com/devexpress-examples/aspnet-pivot-grid-custom-aggregates

This example shows how to aggregate data by the field's first value.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 06 Jul 2025

https://github.com/ahmednasef3/udemy-courses-full-eda

Exploratory Data Analysis of the factors that can affect promotions and earnings for Udemy courses, and of how to make a course that sells well on Udemy.

data-analysis data-science data-visualization eda exploratory-data-analysis matplotlib pandas seaborn udemy-course-project

Last synced: 01 May 2026

https://github.com/mariam-badr-mb/gtc-ml-project2-diabetes-prediction

This project is part of the GTC Machine Learning Program. It demonstrates the end-to-end ML workflow by building a predictive model for diabetes detection

classification-algorithm data-analysis data-visualization diabetes-prediction gridsearchcv hyperparameter-tuning machine-learning python

Last synced: 09 May 2026

https://github.com/asifdotexe/air-quality-analysis-aqa

AQA is a data-driven project focused on analyzing air quality data sourced from data.gov.in. The project encompasses data preprocessing, analysis, and visualization to gain insights into air pollution levels across various locations in India. By examining six key pollutants, the project aims to raise awareness about the environmental issues

aqi-analysis data-analysis data-preprocessing data-science data-visualization presentation

Last synced: 07 Jun 2026

https://github.com/happybono/sonatasmooth

Provides three noise reduction algorithms for smoothing data: Rectangular Averaging, Binomial Median Filtering, and Binomial Averaging. It processes data from a list and displays the results in another list.

algorithms average binomial binomial-coefficient binomial-theorem calibration csharp data-analysis data-calibration dynamic-noise-reduction median noise-algorithms noise-reduction noise-reduction-kernel outliers rectangular-averaging windows-desktop windows-desktop-application windows-forms winforms

Last synced: 30 Oct 2025

https://github.com/nafisalawalidris/northwind-traders-sales-analysis

The Northwind Traders Sales Analysis project analyses sales data for a fictitious company. It utilises the Northwind Database and includes SQL queries to provide insights on employees, products, suppliers and revenue. The project aims to help the company gain valuable information for business decision-making.

business-insights data-analysis database northwind-traders sales sql

Last synced: 07 Aug 2025

https://github.com/anushadatta/airbnb-in-seattle

🏨 Understanding the Airbnb rental landscape in Seattle using data science.

airbnb data-analysis data-exploration data-visualization datascience sentiment-analysis

Last synced: 13 Jun 2025

https://github.com/gauranshgoel123/predictive-demand-analysis

Demand Forecasting Project: a web application for predicting future demand for part numbers based on historical data. Built with React for the frontend and FastAPI with Python for the backend, this application visualizes demand trends and allows users to input additional data for improved accuracy. On Render, analyzer is the frontend and analysis is the backend.

chartjs data-analysis data-science data-visualization dataset deployment full-stack machine-learning numpy pandas predictive-analysis prophet-model python reactjs render

Last synced: 13 Apr 2026

https://github.com/muneeb1030/dataannotation

This project streamlines the process of annotating data for machine learning tasks, making it easier and more efficient for teams to create labeled datasets by leveraging Label Studio and Bulk.

bulk data-analysis data-annotation label-studio python

Last synced: 10 May 2026

https://github.com/zpreisler/modules

Python libraries and modules for processing simulation outputs

data-analysis python scripts tensorflow

Last synced: 13 May 2026

https://github.com/whis99/userfunnelanalysis

An ecommerce user funnel conversion data analysis with matplotlib & python.

data-analysis data-analysis-python data-analyst data-visualization google-colab jupyter-notebook matplotlib python

Last synced: 13 Apr 2026

https://github.com/pferreirafabricio/data-immersion

🏊🏻‍♂️ Activities and exercises from 'Imersão Dados' event

data data-analysis data-science dataset jupiter-notebook python

Last synced: 14 May 2026

https://github.com/incubrain/awesome-maharashtra-data

A collection of datasets specific to Maharashtra, India. WIP

ai artificial-intelligence data data-analysis data-science datasets maharashtra marathi

Last synced: 23 May 2026

https://github.com/jatin-mehra119/bike-rentals-dataset

This repository focuses on optimizing bike rental availability during peak hours and days using machine learning techniques. Leveraging publicly available data from the UCI Machine Learning Repository, it includes scripts for data preprocessing, model training, and visualization, along with detailed observations and results.

data-analysis data-science ensemble-model pandas scikitlearn-machine-learning

Last synced: 15 Apr 2026

https://github.com/0xpr03/clantool

CF Management & Data Analysis Tool, crawler backend in Rust

backend-server crawler data-analysis rust

Last synced: 05 Feb 2026

https://github.com/abhi18av/innovation-competition

Submission for a programming challenge

clojure clojurescript data-analysis

Last synced: 13 Jun 2026

https://github.com/gustavo-zamai/product_return_data_analysis

Analysis of product returns from different stores

data-analysis excel pandas plotly-express python3 pywin32

Last synced: 13 May 2026

https://github.com/ryannapp12/quant_trading_engine

A modular and scalable quantitative trading engine built in Python. This project demonstrates efficient data caching with SQLite, concurrent backtesting, and advanced risk analytics, showcasing best practices in clean code architecture and performance optimization.

algorithmic-trading backtesting dash data-analysis data-visualization fintech lstm machine-learning numpy pandas plotly python quantitative-finance real-time risk-management sqlite technical-analysis tensorflow time-series-analysis trading-strategies

Last synced: 11 Apr 2026

https://github.com/kaz-yos/distributed

Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulation Study (Pharmacoepidemiol Drug Saf 2018)

data-analysis epidemiology statistics

Last synced: 15 Jun 2026

https://github.com/luochang212/weibo-analysis

Data analysis based on Sina Weibo.

data-analysis weibo

Last synced: 03 Apr 2026

https://github.com/parmeetbhamrah/air-quality-india-analysis

Exploratory data analysis of real-time air quality data from Indian cities using Python, Pandas, Matplotlib, and Seaborn.

air-quality data-analysis eda exploratory-data-analysis government-data india matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/mindgamesnl/yanderestats

https://mindgamesnl.github.io/YandereStats/

data-analysis reporting-pipeline yandere yandere-sim

Last synced: 18 Jun 2026

https://github.com/rayyan9477/diamond-price-forecasting

This is a comprehensive machine learning project focused on predicting diamond prices. Using a dataset of diamond attributes, the project implements various machine learning models to forecast prices. Key features include data preprocessing, exploratory data analysis (EDA), and model training with algorithms such as Linear Regression, Decision Tree

data-analysis data-science decision-trees eda linear-regression machine-learning

Last synced: 26 Jul 2025

https://github.com/derrickbaruga7/mapping-median-age-europe

An R project that creates an interactive map of the median age across European regions using Eurostat data and spatial visualization packages.

data-analysis data-science data-visualization datascience european-union mapping r

Last synced: 25 Mar 2025

https://github.com/rogernet/desafio-profissional-produto-data-driven

Helps train Product Analysts, PMs, and Business Managers capable of making strategic, data-driven decisions.

data-analysis data-science data-visualization product

Last synced: 23 Jun 2026

https://github.com/ayaanjawaid/google_playstore_data_analysis

This project provides an in-depth analysis of Google Play Store apps and user reviews, focusing on understanding app performance, user sentiment, and key trends in app categories. Using Python, I performed data cleaning, feature engineering, and exploratory data analysis (EDA) on app data and reviews.

data-analysis eda html numpy pandas-dataframe plotly python vizualisation

Last synced: 24 Feb 2026

https://github.com/fatihilhan42/tourist_analysis_in_turkey_with_python

In this project, the number of tourists coming to Turkey between 2008 and 2021 was analyzed. The data set, which you can find in the repository, was first cleaned using data cleaning algorithms. The cleaned data were then plotted using data visualization algorithms.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 03 May 2026

https://github.com/zients/tw-lottery-recommandation

Taiwan lottery draw analyzer & number recommender with Transformer ML model. Supports 539, 649, 638, 3D, and 4D lotteries.

cli data-analysis lottery machine-learning python pytorch taiwan transformer

Last synced: 03 May 2026

https://github.com/maddieemihle/python-challenge

Creating a Python script that analyzes financial records and election results

data-analysis python

Last synced: 09 Jun 2026

https://github.com/ababic/dumpling

Fast, flexible, powerful static data anonymisation for SQL dumps

anonymisation cli data-analysis data-science pii pii-redaction postgres privacy rust rust-lang scrubber scrubbing security tooling

Last synced: 03 May 2026

https://github.com/syed-m-nofel/python-data-science-fundamentals

Python notebooks for data manipulation (Pandas/NumPy) and API workflows – from basics to practical examples.

api beginner-friendly data-analysis data-science http-requests jupyter-notebook numpy pandas pandas-dataframe python tutorial

Last synced: 03 May 2026

https://github.com/joelfaldin/data-analysis

A collection of data-analysis projects I've built over time! ✨⛏️

data-analysis python r

Last synced: 03 May 2026

https://github.com/devesh8423/machine_learning

Machine Learning practice projects, Jupyter notebooks, and datasets for learning regression, classification, and data analysis.

classification data-analysis data-science data-visualization jupyter-notebook machine-learning matplotlib ml-project numpy-library pandas python regression sckit-learn seaborn

Last synced: 03 May 2026

https://github.com/donmaruko/flask-data-analysis

Flask API for statistical calculations. Data analysis, cleansing, visualization, and manipulation. Documented by Swagger.

api api-rest data-analysis data-science data-visualization datascience flasgger matplotlib pandas seaborn sqlite wordcloud

Last synced: 03 May 2026

https://github.com/bpkaur/whats-in-a-name

Exploring a dataset of first names of babies born in the US to uncover interesting stories

data-analysis datacamp numpy pandas python3

Last synced: 04 May 2026

https://github.com/mindlessmuse666/titanic-data-visualization

A data visualization project on Titanic passengers using the Python libraries Matplotlib, Seaborn, and Plotly.

data-analysis data-visualization matplotlib pandas plotly python seaborn titanic

Last synced: 04 May 2026

https://github.com/nickenshidqia/uber-new-york-data-analysis

Analyzes Uber pickups in New York to extract insights from the data

data-analysis data-analyst exploratory-data-analysis python

Last synced: 04 May 2026

https://github.com/fatihilhan42/the-office-eda

Data analysis study of my favorite sitcom, The Office (US).

data-analysis data-science data-visualization fatihilhan office python sitcom

Last synced: 04 May 2026

https://github.com/damisparks/become_data_analyst

Are you new to data analysis? Here you will find simple notebooks that will help you through your journey. These are personal projects that I am still working on.

data data-analysis data-visualization matplotlib numpy pandas-tutorial

Last synced: 04 May 2026

https://github.com/mr-chang95/sf_data_visualization

In this personal project, I am interested in examining all of the active businesses in the San Francisco Bay Area while performing some simple data visualizations, mainly on categorical variables.

business data-analysis data-visualization jupyter-notebook pandas python san-francisco

Last synced: 04 May 2026

https://github.com/hyperplasma/olympic-visualization-analysis

Multidimensional analysis and visualization of Olympic medals, economy, and happiness index.

data-analysis data-visualization matplotlib numpy pandas python wordcloud

Last synced: 04 May 2026

https://github.com/jendives2000/regressions

A linear regression analysis to determine the strength of the relationship between the number of reviews and sales for a retail company.

data-analysis linear-regression pearson-correlation-coefficient regression

Last synced: 04 May 2026

https://github.com/dhruvsrikanth/basic-data-science

A short Data Science Project I took up for fun! This is a data analysis based on a dataset I created to predict the distribution of wealth within an economy as well as several characteristics of each class within society!

analysis data-analysis data-pipeline data-science data-visualization machine-learning matplotlib pandas python seaborn sklearn

Last synced: 05 May 2026

https://github.com/zafir100100/cancer-stage-prediction

This code predicts cancer data using various regression models, calculates their average R-squared scores, and prints the best model.

cross-validation data-analysis data-preprocessing decision-trees gradient-boosting linear-regression machine-learning-algorithms numpy pandas random-forest regression scikit-learn

Last synced: 05 May 2026

https://github.com/cicku/en.650.672

Homework for EN.650.672

analytics data-analysis numpy pandas

Last synced: 05 May 2026

https://github.com/monish-nallagondalla/universal-bank

Credit Card Ownership Prediction: a machine learning project that predicts credit card ownership using features like age and income, balancing class distributions for improved accuracy.

classification-models credit-card-prediction data-analysis data-classification decision-tree-classifier imbalanced-datasets machine-learning model-evaluation python scikit-learn

Last synced: 05 May 2026

https://github.com/aryar-06/linear-regression

A Python project demonstrating basic linear regression with gradient descent and matrix operations, alongside scikit-learn comparison.

data-analysis data-preprocessing educational-project gradient-descent linear-regression machine-learning python regression-algorithms scikit-learn

Last synced: 05 May 2026

https://github.com/caesaredia/ymusic-project

Exploratory data analysis (EDA) of music streaming behavior in two fictional cities using Python, Pandas, and Jupyter Notebook. It explores user behavior, genre preferences, and listening patterns throughout the week.

data-analysis eda pandas python

Last synced: 05 May 2026

https://github.com/donmaruko/python-eda-toolkit

CLI-run EDA with 30 commands covering text-related functions, statistical calculations, data visualization, and data manipulation.

data data-analysis data-science data-visualization matplotlib pandas scipy seaborn statistical-analysis statistics wordcloud

Last synced: 06 May 2026

https://github.com/ryuzen6/bangalore-real-estate-price-prediction

This is a data science project that predicts the cost of real estate in Bangalore. Requirements: Jupyter Notebook (for data cleaning and building the linear regression model with various Python libraries), PyCharm (Python IDE for creating the Python Flask server), and Visual Studio Code (to create the UI with HTML, CSS, and JavaScript).

css3 data-analysis data-science html5 javascript jupyter-notebook machine-learning python3

Last synced: 06 May 2026

https://github.com/yashpaneliya/bank-loan-default-analysis

Analyze and understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default.

data-analysis loan-default-analysis matplotlib numpy pandas python

Last synced: 06 May 2026

https://github.com/ankitwalimbe/sentiment-analysis

Sentiment analysis of Amazon Fashion reviews using VADER and a baseline ML model (TF-IDF + SGDClassifier). Includes visualizations, reproducible notebook, and recruiter-ready documentation.

data-analysis machine-learning matplotlib nlp pandas python seaborn sentiment-analysis sklearn

Last synced: 06 May 2026

https://github.com/harryrlk/data_analysis_showcase

This repository showcases my data analysis and visualization projects using Excel, Python, R, and Tableau. Some projects are under NDA, so key figures and specific numbers are not included, but brief overviews and methodologies are provided. Feel free to explore and contact me for further details.

data-analysis data-science data-visualization excel portfolio python r tableau

Last synced: 06 May 2026

https://github.com/josepablodmg/python--linear-regression-advertising

A linear regression analysis to predict sales based on advertising spending across TV, radio, and newspaper channels. The project includes exploratory data analysis, model training, coefficient visualization, and residual analysis.

advertising data-analysis exploratory-data-analysis linear-regression machine-learning python regression scikit-learn visualization

Last synced: 06 May 2026

https://github.com/fbarffmann/home_sales

Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.

aws big-data data-analysis home-sales parquet pyspark python spark spark-sql sql

Last synced: 06 May 2026

https://github.com/suhas-005/jovian-data-analysis-course-assignment

These are my assignments for the "Data Analysis: Zero to Pandas" course by Jovian.ai

data-analysis data-analytics numpy pandas python

Last synced: 07 May 2026

https://github.com/jpgiant/gujaratrainfallanalysis_2021

Analysis of the rainfall that occurred in the districts of Gujarat state in 2021

data-analysis exploratory-data-analysis exploratory-data-visualizations matplotlib numpy pandas-python python

Last synced: 07 May 2026

https://github.com/pedrosfaria2/fugascomhelicoptero

My first use of Jupyter Notebook in a project

analise-de-dados data-analysis jupyter-notebook matplotlib pandas python

Last synced: 07 May 2026

https://github.com/vyjayanthipolapragada/genai_smart_retail_recommendation

GenAI Smart Retail is a recommendation system designed for retail environments. It provides personalized product recommendations to users based on product descriptions using a content-based filtering approach. The system leverages FastAPI for backend integration, allowing users to interact with the recommendation engine via an API. This project aim

content-based-recommendation data-analysis data-science data-visualization fastapi gen-ai instacart-data jupyter-notebook open-ai python3 retail scikitlearn-machine-learning stream

Last synced: 07 May 2026

https://github.com/blladerunner/customer-churn-dashboard

Customer Churn Dashboard — SQL + Python analytics project exploring customer retention patterns, churn rate by demographics and services, and key insights for telecom business strategy.

business-intelligence churn-analysis customer-retention dashboard data-analysis data-analytics data-science pandas powerbi python sql sqlite telecom

Last synced: 08 May 2026

https://github.com/danmadeira/algoritmos-estatistica-python

Demonstration of statistics algorithms in Python

algorithms data-analysis data-science python statistics

Last synced: 08 May 2026

https://github.com/0290192029/apartment-price-predictor

A Python project for predicting apartment rental prices using linear regression. Practical assignment on the topic "Fundamentals of Machine Learning" for the course "MDK 13.01: Fundamentals of Applying Artificial Intelligence Methods in Programming".

apartment-price-prediction apartments-for-rent api correios-api data-analysis feature-engineering feature-enginering linear-regression linear-regression-models mlops numpy prediction-model r seaborn

Last synced: 08 May 2026