An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/prateek5525/retail-sales-analysis-project

This project involves analyzing retail sales data using SQL to uncover insights into sales patterns, customer behavior, and product performance. It serves as an exercise to develop foundational SQL skills in data exploration, cleaning, and analysis.

data-analysis data-cleaning retail-sales-data sql

Last synced: 03 Jan 2026

https://github.com/malakasupun/crime-data-analysis-of-lapd

This project aims to explore and analyse crime patterns in Los Angeles using a dataset spanning from 2020 to the present. The primary focus is to extract meaningful insights by integrating structured data analysis and advanced techniques in SQL and Natural Language Processing (NLP).

data-analysis data-visualization llm nlp sql

Last synced: 29 Jul 2025

https://github.com/ozep/genshincharacteranalysis

Uses a spreadsheet with Character Data and organizes it into readable graphs.

data-analysis jypyternotebook python

Last synced: 18 Apr 2026

https://github.com/antrita/stroke_prediction_model

A model that combines Kaggle's Stroke Prediction Dataset with live weather and air quality data to implement an FDA-compliant MLOps pipeline, showing expertise in healthcare regulations and real-time inference.

ai data-analysis deep-learning kaggle-dataset machine-learning prediction-model random-forest real-time scikit-learn streamlit weather-api xgboost

Last synced: 07 May 2026

https://github.com/alanmenchaca/getting-and-cleaning-data-course-project

The purpose of this project is to demonstrate how to collect, work with, and clean a data set.

data-analysis getting-and-cleaning-data rstudio tidy-data

Last synced: 31 Jul 2025

https://github.com/ayeshathoi/simulation-sessional-412

Simulation of SSQS, Inventory System, Transient State, PERT, Monte Carlo algorithm, etc.

data-analysis excel inventory-system monte-carlo python simulation ssqs triangle-distributions

Last synced: 31 Jul 2025

https://github.com/farrelfaricaf/exploratorydataanalyst---titanic

This project analyzes the Titanic dataset using exploratory data analysis (EDA) and visualization techniques to identify survival patterns. The goal is to understand how demographic factors like gender and age influenced survival rates during the 1912 disaster.

data data-analysis data-science data-visualization eda python titanic-dataset

Last synced: 31 Jul 2025

https://github.com/computingvictor/mercadona_agent

Web app to explore supermarket products with advanced filters, search, favorites, and nutritional info. Includes data analysis notebooks for deeper insights.

css data-analysis data-science data-visualization filtering html interactive-ui javascript notebooks nutritional-info pandas product-catalog python supermarket webapp

Last synced: 09 Apr 2026

https://github.com/aygp-dr/claude-log-stream

Advanced analytics engine for Claude Code logs with real-time processing capabilities

claude-api clojure data-analysis monitoring

Last synced: 24 Sep 2025

https://github.com/rodolfo-brandao/pos-graduacao

[pt-BR] Repository for storing some materials and projects from each module of my specialization in Data Science (2025–2027)

artificial-intelligence data-analysis data-science data-visualization databases deep-learning jupyter linear-algebra machine-learning python r statistics

Last synced: 09 Apr 2026

https://github.com/analyst-lochan/flight-delay-and-cancellation-dataset-2019-2023-

This project demonstrates a complete data analytics pipeline starting from raw real-world flight data to professional visual dashboards using SQL Server and Power BI. It showcases data import, cleaning, optimization, transformation, and dynamic DAX-based visual reporting.

airline-performance business-intelligence data-analysis data-cleaning data-modeling data-visualization dax etl flight-data kaggle-dataset portfolio-project powerbi powerbi-dashboard sql sql-server

Last synced: 09 Sep 2025

https://github.com/jigyasag18/airline-performance-and-passenger-satisfaction-project-using-big-data-analytics

This project analyzes 10 years of U.S. domestic airline data (~3GB) using Hadoop (Cloudera) and Hive for data processing. Power BI dashboards visualize key metrics like delays, on-time rates, air time, and diversions. The solution includes Hive queries, DAX measures, HDFS ingestion scripts, and year-wise insights with recommendations.

big-data big-data-analytics bigdata cloudera cloudera-hadoop cloudera-hadoop-framework data data-analysis data-visualization database hadoop hive power-bi powerbi powerbi-dashboard powerbi-dashboards powerbi-report powerbi-visuals powerbi-visuals-tools powerbidashboard

Last synced: 01 Aug 2025

https://github.com/takshak26/predict_blood_donations-

The title of the project is “Predict Blood Donations”. It uses Python as the language, data science and machine learning as the field of operation, the TPOT library for model selection, logistic regression for model building, and Jupyter Notebook as the code editor.

data-analysis data-visualization datascience machine-learning python3

Last synced: 16 May 2026

https://github.com/nushratjabenaurnima/cse_477_data_mining

A collection of labs, reports, Jupyter notebooks, and project outputs for the CSE 477 Data Mining course. This repository tracks my learning journey through data preprocessing, association rules, clustering, classification, and real-world data analysis with Python.

data data-analysis data-mining data-science google-colab-notebook jupyter-notebook machine-learning python python-3

Last synced: 09 Apr 2026

https://github.com/quesocosteno03/data-analysis-projects

This repository serves as a collection of all my projects.

data-analysis jupyter-notebook powerbi

Last synced: 02 Aug 2025

https://github.com/lc-rezende/eqx_boston_dataset

Exploratory data analysis, clustering, and forecasting on Boston crime data (2011-2015), revealing key crime trends, hotspots, and temporal patterns to support data-driven insights for urban safety and policing strategies.

data-analysis exploratory-data-analysis jupyter-notebook kmeans matplotlib numpy pandas prophet-facebook python scikit-learn seaborn

Last synced: 09 Apr 2026

https://github.com/faint-liebfraumilch101/fraud-detection-sql-unsupervised

🕵️‍♂️ Detect fraud in bank transactions using SQL for feature engineering and Python's Isolation Forest for unsupervised anomaly detection.

anomaly-detection banking-data data-analysis data-science financial-analytics fraud-detection isolation-forest machine-learning portfolio-project python sql sqlite unsupervised-learning

Last synced: 07 May 2026

https://github.com/mxagar/space_exploration

This repository is a collection of mini-projects and tutorials related to space images and geo-spatial data.

data-analysis deep-learning geospatial machine-learning

Last synced: 29 Sep 2025

https://github.com/asghar-rizvi/hotel_reservation_data_analysis

This project involves a comprehensive data analysis of a hotel reservation dataset using Excel. The primary focus is on examining reservation cancellations through detailed analysis and visual representation.

dashboard dashboard-templates data-analysis data-analysis-excel data-representation data-science excel

Last synced: 02 Mar 2026

https://github.com/lyubov0406/data_analyst_portfolio

This repository collects pet projects that demonstrate my data analytics skills

data-analysis matplotlib numpy pandas portfolio python scipy seaborn sql tableau visualization

Last synced: 09 Apr 2026

https://github.com/acerbilab/svbmc

Stacking Variational Bayesian Monte Carlo (S-VBMC) algorithm for combining Variational Bayesian Monte Carlo (VBMC) posteriors to boost inference performance.

bayesian-inference data-analysis machine-learning model-fitting python stacking variational-inference

Last synced: 20 Jan 2026

https://github.com/fortunewalla/birdstrikes

Birdstrikes database created for PostgreSQL with simple sample queries

birdstrikes csv data-analysis data-science database dataset pgsql postgresql practice sample sql sql-query workshop

Last synced: 02 Oct 2025

https://github.com/omdoshi13/pricing-of-laptops-using-ml

Data Analysis, training Machine Learning models, and Model Evaluation and Refinement for Pricing of Laptops dataset.

data-analysis data-analysis-project datascience google-colab jupyter-notebook machine-learning matplotlib model-evaluation model-refinement numpy pandas python scikit-learn

Last synced: 09 Apr 2026

https://github.com/jagoda11/elastic-vision

This repository contains a full-stack application designed to explore data from ElasticSearch 🧐 indices and visualize it using charts and graphs. The backend is built using Node.js and the frontend is powered 🚀 by React.

backend chartjs dashboard-development data-analysis data-visualization docker elasticsearch frontend fullstack javascript material-ui monorepo mui-x node pie-chart react restful-api tables

Last synced: 09 Apr 2026

https://github.com/muneeb706/human_activity_recognition

This project performs data cleaning and data exploration steps for the Human Activity Recognition Using Smartphones Data Set in the R programming language.

data-analysis data-cleaning data-exploration r-programming

Last synced: 08 Aug 2025

https://github.com/abhigyan126/prompt2query

A Python desktop application for streamlined data analysis, enabling users to generate and execute Pandas and SQL queries with ease. Focuses on reducing analysis time through an intuitive interface and efficient workflows.

data-analysis data-science data-visualization database gemini generative-ai ide llm pandas pandas-interface python sql-interface

Last synced: 13 Feb 2026

https://github.com/brunomontezano/digital-interventions-for-depression

📱 "Digital interventions for depressive symptoms: a randomized clinical trial" code

academia clinical-trials cognitive-behavioral-therapy data-analysis digital-health open-science smartphone-app

Last synced: 03 Oct 2025

https://github.com/hemangsharma/hotel-revenue-booking-analysis

This project provides a comprehensive revenue and reservation analysis for Highfield Hotel using historical data exported from booking systems and internal revenue reports. The goal is to derive actionable insights to improve room profitability, understand booking patterns, and support data-driven decision-making.

analysis data-analysis data-visualization hotel

Last synced: 10 Aug 2025

https://github.com/ifigeneiatsiflidou/applied-statistics-project

Project for an Applied Statistics course, involving exploratory data analysis and predictive modeling of movie revenue using engineered features and multiple linear regression.

correlation-analysis data-analysis linear-regression python scikit-learn visualization

Last synced: 29 Apr 2026

https://github.com/gutow/langmuir_trough

Code to run a homebuilt Langmuir trough using Jupyter and Python.

data-acquisition data-analysis jupyter langmuir-trough plotting

Last synced: 11 Aug 2025

https://github.com/ct83/become-a-data-analyst-udacity

This repository contains all of the code, projects and reports that I wrote as I pursued my Udacity - Data Analyst NanoDegree.

data-analysis data-analysis-python data-analyst data-visualisation data-visualization-project datascience python udacity udacity-data-analyst-nanodegree

Last synced: 12 Aug 2025

https://github.com/nabilalibou/uber_fare_prediction_explained

This repository documents a complete ML workflow to model Uber fares in Paris, from granular EDA and feature engineering to building and fine-tuning a stacking regressor on 10k real-world rides.

data-analysis data-science eda feature-engineering machine-learning predictive-analytics pricing-model python regression-model stacking-ensemble uber

Last synced: 12 Aug 2025

https://github.com/itsachrafmansari/moroccan-real-estate-analysis

Scrape, process, analyze, and visualize data from Avito.ma to uncover current trends in Morocco's real estate market.

api-scraping data data-analysis data-mining data-science data-scraping data-visualization eda exploratory-data-analysis morocco real-estate web-scraping

Last synced: 13 Aug 2025

https://github.com/dcs-training/scottishaccounts

This repo contains various examples of analysis that can be performed on the Statistical Accounts of Scotland dataset. Go to the readme file

data-analysis data-visualisation data-wrangling geographical-data r rmarkdown text-analysis

Last synced: 16 Aug 2025

https://github.com/i-e-b/dynamictimewarp

A quick C# implementation of https://jeremykun.com/2012/07/25/dynamic-time-warping/

data-analysis pattern-matching working

Last synced: 17 Aug 2025

https://github.com/edoardotosin/january-2025-southern-california-wildfires-burn-severity-sentinel2

Scripts and data for analyzing burn severity of the January 2025 Southern California wildfires using Sentinel-2 satellite imagery. This project explores the use of the Differenced Normalized Burn Ratio (dNBR) and Relativized Burn Ratio (RBR) to classify burn severity, leveraging publicly available satellite data.

burn-severity copernicus data-analysis earth-observation satellite-imagery sentinel-2 wildfire wildfire-detection wildfires

Last synced: 09 Feb 2026
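
The two burn-severity indices named in the entry above can be sketched in a few lines. This is a minimal illustration of the commonly published definitions, not code from the repository: NBR is computed from near-infrared and shortwave-infrared reflectance, dNBR is the pre-fire NBR minus the post-fire NBR, and RBR divides dNBR by the pre-fire NBR plus 1.001. The function names and the sample reflectance values are hypothetical, and the choice of Sentinel-2 bands to use for NIR and SWIR is left to the caller.

```python
def nbr(nir: float, swir: float) -> float:
    """Normalized Burn Ratio from NIR and SWIR reflectance."""
    return (nir - swir) / (nir + swir)


def dnbr(pre_nbr: float, post_nbr: float) -> float:
    """Differenced NBR: pre-fire NBR minus post-fire NBR."""
    return pre_nbr - post_nbr


def rbr(pre_nbr: float, post_nbr: float) -> float:
    """Relativized Burn Ratio: dNBR scaled by (pre-fire NBR + 1.001)."""
    return dnbr(pre_nbr, post_nbr) / (pre_nbr + 1.001)


# Hypothetical reflectance values for one pixel (illustrative only).
pre = nbr(nir=0.4, swir=0.1)     # healthy vegetation before the fire
post = nbr(nir=0.15, swir=0.25)  # burned surface after the fire
print(round(dnbr(pre, post), 3), round(rbr(pre, post), 3))
```

Larger dNBR or RBR values indicate more severe burning; the thresholds used to split values into severity classes vary by study and are not assumed here.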

https://github.com/harshindcoder/online_retail_data_clustering_project

This marketing analytics project uses RFM (Recency, Frequency, Monetary) features for customer classification, inspired by the online retail mining paper. The RFM model helps segment customers, identify high-value ones, and optimize marketing strategies.

customer-segmentation data-analysis data-visualization market-analytics

Last synced: 17 Aug 2025
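
As a rough sketch of what the RFM features in the entry above look like in practice, the snippet below derives recency, frequency, and monetary values per customer from a list of transactions. It is not taken from the repository: the function name `rfm_table`, the tuple layout, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    snapshot: reference date used to measure recency.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # Track the most recent purchase date per customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


# Hypothetical transactions (illustrative only).
sample = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 10), 5.0),
    ("b", date(2024, 1, 5), 20.0),
]
print(rfm_table(sample, snapshot=date(2024, 1, 11)))
```

Segmentation then typically ranks or clusters customers on these three values; a low recency with high frequency and monetary totals marks a high-value customer.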

https://github.com/davidzajac1/four-percent-rule-pandas-analysis

Analysis of the 4% Personal Finance Rule of Thumb

data-analysis data-visualization pandas python

Last synced: 20 Apr 2026

https://github.com/berkekaragoz/media-investments-data-analysis

Advertisement Investments Distribution of Turkey by Medium

data-analysis r

Last synced: 19 Aug 2025

https://github.com/shadz23/smart-energy-dashboard

Power BI dashboard analyzing household electricity consumption to reveal usage patterns, peak hours, and estimated costs for smarter energy management and reduced bills. 🐙

chart data-analysis data-visualization dax energy-consumption hs110 hs300 ibm ibm-cloud influxdb jupyter-notebook kasa kp115 linuxone observability photovoltaics-dashboard plotly sense

Last synced: 19 Aug 2025

https://github.com/rahmamohammad/retail_project

Retail & Data analytics: KPIs, sales trends, Excel planning pack, forecasting & inventory tracking.

data-analysis data-visualization ecommerce excel jupyter-notebook matplotlib python retail-analytics storytelling

Last synced: 17 May 2026

https://github.com/apostolis-bloutsos-data/employee-data-eda

Mini EDA project on synthetic employee records using Python, pandas, and matplotlib

data-analysis eda jupyter-notebook matplotlib pandas python seaborn

Last synced: 09 May 2026

https://github.com/cyberoctane29/epa-air-quality-aqi-analysis

This project involved analyzing air quality data from the EPA, focusing on the Air Quality Index (AQI). I used Python data structures like dictionaries and sets to manage and process the data, simulating real-world data analysis to assess pollution levels and their health implications.

data-analysis numpy pandas python statistics

Last synced: 10 Apr 2026

https://github.com/jedrzej-wydra/competition-cooperation

Competition, cooperation, and parental effects in larval aggregations formed on carrion by communally breeding beetles Necrodes littoralis (Staphylinidae: Silphinae)

data-analysis non-linear-regression r

Last synced: 20 Aug 2025

https://github.com/shriansh8619/sql_eda

Explored relational databases using SQL to perform comprehensive Exploratory Data Analysis (EDA), covering database exploration, segmentation, trend analysis, and performance ranking. Developed reusable SQL scripts to analyze dimensions, measures, and time-based metrics, helping uncover key business insights.

data-analysis exploratory-data-analysis mysql

Last synced: 20 Aug 2025

https://github.com/myriamba/neuraview

AI-Powered Data Insights and Visualization Generator

data-analysis data-engineering data-insights data-visualization generative-ai llm

Last synced: 21 Aug 2025

https://github.com/nickenshidqia/sql-for-financial-data-analysis

Design SQL queries to generate accurate and timely financial reports including Profit and Loss statements, Balance Sheets, and Cash Flow statements

azure-data-studio data-analysis finance microsoft-sql-server sql

Last synced: 09 Mar 2026

https://github.com/debjyotisaha/tableau-projects-phase-2

Published interactive dashboards on Tableau Public, highlighting expertise in data visualization and storytelling through analyses of transportation patterns, sales trends, and demographic studies. These projects showcase the ability to transform complex datasets into actionable, intuitive visuals for decision-making.

dashboards data data-analysis data-visualisation tableau

Last synced: 26 Aug 2025

https://github.com/putuwaw/dashboard-ecommerce

Dashboard for E-Commerce Public Dataset using Streamlit and Plotly

dashboard data-analysis dicoding plotly streamlit

Last synced: 20 Feb 2026

https://github.com/sarathchandranpm/walmart-sales-analysis

Analysis of Walmart Myanmar's Q1 2019 sales data covering customer behavior, product performance, general operations, and sales patterns.

data-analysis mysql sql

Last synced: 29 Aug 2025

https://github.com/shubhammittal-data/hr_dashboard_tableau

An interactive HR Analytics Dashboard built using Tableau. Provides insights into workforce demographics, hiring trends, salary analysis, and employee records for data-driven decision-making.

chatgpt4 data data-analysis data-visualization drawio-tools faker-generator hr-analytics hr-analytics-dashboard human-resources numpy python tableau tableau-public

Last synced: 17 May 2026

https://github.com/devanshsahu47/talentscape-glassdoor-analysis

TalentScape is an end-to-end Python project that cleans and analyzes a comprehensive Glassdoor Jobs dataset. It features robust data wrangling and 20 insightful visualizations to uncover trends in job titles, salary ranges, company ratings, and more—providing actionable recommendations to optimize recruitment and compensation strategies.

business-intelligence data-analysis data-vizualisation jupyter-notebook python3

Last synced: 15 May 2026

https://github.com/mehrab-kalantari/olympics-data-analysis

A streamlit application to analyze the Olympics dataset from several views

data-analysis streamlit-dashboard streamlit-webapp

Last synced: 20 Apr 2026

https://github.com/mysftz/statistical-analysis

An in-depth review of statistical analysis in Python from datasets.

data-analysis python python3 statistics university university-project

Last synced: 14 May 2025

https://github.com/als8446/tripleten-data-science-projects

Projects made in the Data Scientist course from TripleTen LatAm

data data-analysis hypothesis-tests machine matplotlib numpy pandas python scipy sklearn

Last synced: 10 Apr 2026

https://github.com/iness000/online-retail-customer-segmentation

This project performs comprehensive customer segmentation analysis on an online retail dataset using machine learning clustering techniques and RFM (Recency, Frequency, Monetary) analysis. The goal is to identify distinct customer segments to drive better customer relationship management strategies and business insights.

customer-segmentation data-analysis k-means

Last synced: 31 Aug 2025

https://github.com/rdrahul123/ecommerce-sales-dashboard

This project focuses on analyzing e-commerce sales data to uncover actionable insights and improve business decision-making. Using interactive dashboards and data analysis techniques, the project evaluates key performance metrics, customer behavior, sales trends, and payment modes across different categories and regions.

data-analysis data-science excel powerbi

Last synced: 22 Mar 2025

https://github.com/jayqi/data-analysis-tools

Presentation on Data Analysis Tools

data-analysis presentation-slides

Last synced: 06 Jan 2026

https://github.com/evanwporter/sloth

Faster Pandas Dataframe

cython data-analysis dataframe pandas

Last synced: 14 Mar 2025

https://github.com/virajbhutada/hr-analytics-excel-sql-tableau-powerbi

Explore a comprehensive HR Analytics portfolio showcasing data analysis and visualization skills. Featuring dashboards in Power BI, Excel, and Tableau, along with SQL queries for deeper insights. A holistic view of expertise in HR analytics, data visualization, and database management. Let's dive into the game of data insights!

data-analysis data-management data-visualization excel hr-analytics interactive-dashboards portfolio-project postgresql powerbi powerbi-visuals sql sql-queries tableau tableau-public

Last synced: 02 Aug 2025

https://github.com/barraharrison/spotify-listening-trends

Using EDA to look at song longevity, regional preferences, and streaming behavior in the charts and on Spotify.

data-analysis data-visualization jupyter-notebook kaggle-dataset

Last synced: 03 Feb 2026

https://github.com/ronylpatil/whatsapp-group-chat-analysis

This data analysis project extracts useful information from the chat of our college's official WhatsApp group. Extracted features include the most active members of the group, the most active day of the week, the top 10 media contributors in the group, and many more.

data-analysis data-preprocessing data-wrangling feature-engineering

Last synced: 14 Jun 2025

https://github.com/satyam4229/omnify-dataanalysis

Our assessment of Omnify focused on data-driven strategies to maximize profitability. We identified "Product X" as the most profitable product and recommended leveraging the "Wellness Solutions" keyword category for optimal keyword strategy.

data-analysis data-science data-visualization excel omnify

Last synced: 04 Jan 2026

https://github.com/serlo/data-pipeline-interactive-exercises

processing pipeline for exercise dashboards

data-analysis serlo

Last synced: 26 Feb 2025

https://github.com/andrii04/ga4-gcs-to-bigquery-etl

Automated Data Pipeline that ingests daily GA4-formatted CSV files from a private Google Cloud Storage bucket, validates and loads them into BigQuery, and prepares analysis-ready views. The solution is built for deployment as a Cloud Function triggered by Cloud Scheduler and uses Python with the Google Cloud Storage and BigQuery client libraries.

automation bigquery cloud cloudfunctions data data-analysis data-engineering etl etlpipeline gcp google googlecloudplatform pipeline python sql

Last synced: 18 May 2026

https://github.com/rahul-jha98/restauranttrends.stats-backend

Application that scrapes the Zomato Dataset and enables the user to visualise the results.

data-analysis data-extraction firebase-storage web-scraping zomato-api

Last synced: 16 Mar 2026

https://github.com/azaz9026/data_cleaning

Welcome to the Data Cleaning repository! This collection is dedicated to showcasing techniques and methods for cleaning and preparing datasets for analysis.

data-analysis data-engineering data-structures data-visualization eda feature-engineering machine-learning numpy outliers pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/bibymaths/python_snippets

A collection of Python scripts for bioinformatics data analysis, including tools for transcription counts, nucleotide composition, and protein sequence evaluation.

amino-acid-scoring bioinformatics data-analysis fasta-generation mathematical-evaluation nucleotide-analysis protein-sequence-analysis transcription-counts

Last synced: 29 Jul 2025

https://github.com/lopez86/rust-mlearn

Machine Learning Tools in Rust

data-analysis data-science machine-learning rust

Last synced: 15 May 2025

https://github.com/shivam5509/power-bi-project

Expert in creating interactive dashboards and reports using Power BI, utilizing 10+ visual tools like cards, slicers, and charts. Skilled in cleaning and transforming large datasets with Power Query Editor. Proficient in advanced DAX functions (SUMX, FILTER, CALCULATE) to derive insights and drive data-driven decisions.

advanced-excel computer-science data-analysis data-mining data-visualization engineering mysql numpy pandas powerbi pyhton3 sql sql-server

Last synced: 11 Apr 2026

https://github.com/grlyntng/rpims

Django Code and documentation for the Retail Pharmacy Inventory Management System (best final year project award)

data-analysis django erp forecasting-models lstm-neural-networks reporting

Last synced: 26 May 2026

https://github.com/weybsonalves/prevendo-o-atrito-de-clientes

A project in which I walk through the stages of the data science life cycle in order to predict customer churn for a bank's credit card service.

data-analysis data-science data-visualization machine-learning python

Last synced: 06 May 2026

https://github.com/elakkiya-u/digital-marketing-campaign

A machine learning project to predict whether a customer will convert based on digital marketing campaign data.

campaigns data-analysis deployment digital-marketing machine-learning predictive-modeling python

Last synced: 30 Jun 2025

https://github.com/jayita11/healthcare-management-optimization-analysis-and-visualization

This project analyzes healthcare data from 2019 to May 2024, optimizing patient care, resource allocation, and financial management. Insights include billing trends, blood bank management, doctor performance, and medication demand, supported by Excel, interactive Tableau dashboards, and SQL analysis.

data-analysis excel healthcare interactive-dashboards mysql sql tableau-dashboards

Last synced: 23 Mar 2025

https://github.com/diem0n/100daysofdatascience

This repository is a collection of things I do each day as a data scientist hired at a fictional company called Keko Corp.

data-analysis data-engineering data-science data-science-from-scratch data-warehousing machine-learning python

Last synced: 09 Apr 2026