An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/harkishen/Agriculture-DS

An agriculture-based M.Tech data science project that predicts crop growth from previous years' records.

data-analysis pandas python

Last synced: 11 Dec 2025

https://github.com/ankitwalimbe/ecommerce-funnel-analysis

SQL-based analysis of the Olist e-commerce dataset — building an order funnel (purchase → approval → delivery) with breakdowns by payment type, product category, region, and monthly trend. Includes insights, CSV exports, and Tableau dashboard.

bigquery business-intelligence data-analysis ecommerce funnel-analysis sql tableau-public

Last synced: 05 Oct 2025

https://github.com/josepablodmg/python--linear-regression---housing-exercise

A predictive analysis exploring the relationship between household characteristics and median income in California. Using linear regression, the project investigates whether blocks with fewer households correspond to higher median incomes.

california data-analysis data-science exploratory-data-analysis housing-data linear-regression machine-learning python regression scikit-learn statistics visualization

Last synced: 05 Oct 2025

https://github.com/farhad-here/height-distribution-analysis

Statistical comparison of height distributions in two groups using mean, standard deviation, and boxplots.

coefficient-of-variation data-analysis interquartile-ranges matplotlib mean numpy python scipy standard-deviation variance

Last synced: 13 Apr 2026

https://github.com/davifeliciano/modern_physics_experiments

A collection of Python data analysis and visualization scripts for several modern physics experiments.

data-analysis data-visualization modern-physics physics physics-experiments

Last synced: 18 Jan 2026

https://github.com/eharshit/end-to-end-vendor-insights

End-to-end analysis of vendor performance for wholesale/retail businesses, featuring data ingestion, cleaning, insights, and interactive Power BI dashboards.

analysis analysis-algorithms analytics dashboard data data-analysis datascience jupyter jupyter-notebook pandas powerbi powerbi-report retail wholesale

Last synced: 07 Oct 2025

https://github.com/shourya1997/boston_housing

In this project, you will apply basic machine learning concepts on data collected for housing prices in the Boston, Massachusetts area to predict the selling price of a new home.

boston-housing-dataset data-analysis jupyter-notebook machine-learning python unsupervised-machine-learning

Last synced: 18 May 2026

https://github.com/dcs-training/machinelearning

Introduction to Machine Learning with Python, delivered by the centre in 2022-23. See the README file for details.

data-analysis data-wrangling machine-learning python statistics

Last synced: 08 Oct 2025

https://github.com/kmranrg/bikeshare

A data analysis project.

data-analysis python

Last synced: 08 Oct 2025

https://github.com/alexquilis1/spanish-fuel-stations-analysis

Real-time analysis of Spanish fuel prices using government API data with interactive maps and regional comparisons

data-analysis data-visualization fuel-prices geospatial-analysis ggplot2 government-data leaflet open-data r shiny spain tidyverse

Last synced: 08 Oct 2025

https://github.com/jlee9503/telecommunication-churn

Analyze key factors influencing customer churn using Python data analytics techniques. Explore key factors through data preprocessing, exploratory data analysis (EDA), and predictive modeling.

data-analysis data-visualization matplotlib pandas python scikit-learn

Last synced: 18 Jan 2026

https://github.com/faisal-khann/ipl-analysis

The IPL Analysis project is a comprehensive data-driven exploration of the Indian Premier League (IPL), analyzing historical match data to uncover patterns in team performance, player statistics, and match outcomes.

data-analysis exploratory-data-analysis jupyter-notebook matplotlib numpy pandas seaborn

Last synced: 08 May 2026

https://github.com/priyanshubiswas-tech/priyanshubiswas-tech

SWE-Data Engineer @ EDN | Kubeflow-MLOps | Kubernetes | Databricks | AWS EMR-Lambda-Glue, Eventbridge, SQS-SNS | OCI Multi-Cloud Architect Professional | GCP GA4 | Gen AI | IEEE Brand Amb. | Ex-Chair, PES | Ex-Sec, SB

apache-spark aws data-analysis data-engineering data-visualization dbt hadoop kubernetes python3 sql

Last synced: 21 Jan 2026

https://github.com/ninadpatil09/hospital_emergency_room_analysis

This comprehensive analysis delves into the performance and characteristics of the hospital's emergency room over the past year. By scrutinizing key metrics and patient demographics, this study aims to provide valuable insights for optimizing patient care, resource allocation, and overall operational efficiency.

data-analysis tableau-public visualization

Last synced: 15 Feb 2026

https://github.com/samuelsoaress/wkd-default-reduction

Reducing the default rate from 35% to 25% or less using machine learning techniques.

data-analysis data-exploration data-science machine-learning-algorithms

Last synced: 10 Oct 2025

https://github.com/loaiwalid07/automation_data_overviwe

A Streamlit app that gives an overview of a dataset you upload.

automation data data-analysis data-exploration data-science data-transformation data-visualization

Last synced: 19 May 2026

https://github.com/brooks-code/toulouse-biblio-chronicle

Snapshot of Toulouse public library customer habits: cleaning raw, messy datasets of musical, cinematic, and literary checkouts. Includes data-cleaning steps and an analysis notebook revealing cultural tastes in the Pink City.

data-analysis data-cleaning data-cleaning-and-preprocessing data-quality exploratory-data-analysis jupyter-notebook library-data misaligned-data mojibake tutorial

Last synced: 10 Oct 2025

https://github.com/its-ekanshi/sql-analytics-project

Designed relational tables with primary and foreign keys, populated with sample data for real-world testing. Implemented advanced SQL techniques such as CTEs, window functions, aggregates, and filters to extract valuable insights.

business-intelligence data-analysis exploratory-data-analysis microsoft-sql-server sql sql-queries

Last synced: 10 Oct 2025

https://github.com/salma-mamdoh/a-visual-history-of-nobel-prize-winners-project

A DataCamp project for practicing data analysis and data visualization.

data-analysis data-visualization datacamp matplotlib pandas python seaborn

Last synced: 04 May 2026

https://github.com/cyberoctane29/diamonds-anova-analysis

This project uses ANOVA in Python to analyze how diamond color and cut affect pricing. By testing for statistical significance and running post hoc comparisons, it reveals key pricing patterns. Built with pandas, statsmodels, and Seaborn, the findings help inform diamond valuation and purchasing decisions.

anova-test data-analysis data-analytics data-science diamonds-dataset regression-analysis statistical-analysis tukey-hsd

Last synced: 10 Oct 2025

https://github.com/saifalibaig/covid-19-death-rate-analysis-using-python

Analysis of COVID-19 data alongside the World Happiness Report to determine whether there is a relationship between death rates and happiness levels of countries around the world.

data-analysis data-visualization numpy pandas python3 sns visualization

Last synced: 03 May 2026

https://github.com/ahsankhizar5/titanic-eda-visualization

Exploratory Data Analysis and Visualization on the Titanic Dataset using Python, Pandas, Matplotlib, and Seaborn to uncover survival patterns.

data-analysis data-science data-visualization eda kaggle machine-learning matplotlib pandas python seaborn titanic-dataset

Last synced: 31 May 2026

https://github.com/silvermete0r/sdu_hackathon_uss_db_analysis

Smart Data Ukimet Hackathon: solution to the "Data Modeling" case. Topic: store analysis based on the Unified Star Schema.

data-analysis data-modeling postgresql python sql unified-star-schema

Last synced: 14 Apr 2026

https://github.com/navp7/pizzasales_powerbi

This project involves creating a comprehensive sales performance dashboard using Power BI to visualize and analyze the sales data of an Italian pizza company.

data-analysis ms-sql-server ms-word powerbi visualization

Last synced: 13 Mar 2026

https://github.com/tzerk/esr

R package 'ESR' for plotting and analysing ESR spectra in dating applications

data-analysis data-visualization electron-spin-resonance geochronology r

Last synced: 13 Mar 2026

https://github.com/sunsided/esc2024

Exploratory data analysis of the ESC 2024 (Eurovision Song Contest) results.

csv data-analysis eurovision-song-contest scraping

Last synced: 18 Feb 2026

https://github.com/louisfernando1204/websocket-benchmark

A comprehensive performance testing and analysis suite designed to evaluate and compare different WebSocket server implementations across various programming languages and libraries.

benchmarking broadcast-test coder-websocket csv data-analysis data-visualization echo-test golang gorilla-websocket nodejs python3 socket-io websocket-client websocket-server ws

Last synced: 09 Apr 2026

https://github.com/gmalbert/supreme-court

Data analysis of the US Supreme Court from 1790 to the present.

data-analysis data-science supreme-court

Last synced: 31 May 2026

https://github.com/angelalim88/jakarta-air-quality-index-classification

This project classifies Jakarta's Air Quality Index (AQI) from 2010 to 2023 using machine learning models (Random Forest, MLP, SVM) based on pollutant concentrations.

data-analysis data-visua machine-learning scikit-learn tensorflow

Last synced: 13 Oct 2025

https://github.com/szymon-budziak/real_estate_house_prices_prediction

Predicting real estate house prices using various machine learning algorithms, including data exploration, preprocessing, model training, and evaluation.

data-analysis data-preprocessing data-science eda jupyter-notebook machine-learning matplotlib numpy optuna pandas predictive-modeling price-prediction python random-forest regression scikit-learn seaborn

Last synced: 21 Jan 2026

https://github.com/inddrsingh/restaurant_orders_mysql

Complex SQL queries on restaurant data for more precise insights.

data-analysis insights mysql

Last synced: 28 Jan 2026

https://github.com/ankitpoddar07/excel-project_back-office

📊 Coffee Sales Analytics – Back Office Excel Project

data-analysis ms-excel

Last synced: 05 Feb 2026

https://github.com/a26nine/msc-dissertation-bitcoin-dashboard

An interactive data visualisation dashboard built using Tableau Desktop to research and analyse the relationship between the price volatility and adoptability of bitcoin.

data-analysis data-science data-visualization tableau tableau-desktop tableau-prep

Last synced: 17 Feb 2026

https://github.com/kunalpisolkar24/winequalityprediction

Predicting wine quality using machine learning with matplotlib, numpy, pandas, and seaborn for insightful data analysis. 🍇🤖📊

data-analysis data-science data-visualization machine-learning prediction-model

Last synced: 16 Oct 2025

https://github.com/aishwaryahastak/ipl_analysis

Analysis of IPL dataset using PySpark

data-analysis mllib pyspark

Last synced: 16 Oct 2025

https://github.com/fatihilhan42/nba-players-data-1950-to-2021

This project examines data on NBA players from 1950 to 2021. After obtaining the players' seasons, heights, performance, scoring averages, teams, and positions from CSV files, it builds key tables and graphs using data cleaning and data visualization techniques.

data data-analysis data-engineering data-science data-visualization

Last synced: 16 Oct 2025

https://github.com/balajimohan18/sql-projects

This repository contains Structured Query Language (SQL) scripts for various projects, covering data cleaning, data preprocessing, data processing, data transformation, and extracting insights through queries.

data-analysis data-mining data-science eta microsoft-sql-server query-language sql sql-server sql-server-management-studio sqlqueries

Last synced: 14 Mar 2026

https://github.com/pizofreude/da-with-r

Data analysis with R, a data-centric programming language.

data-analysis r

Last synced: 17 Oct 2025

https://github.com/abhijeet107/task-4

Design an interactive dashboard for business stakeholders.

data-analysis excel-csv tableau-dashboards tableau-public

Last synced: 22 Jan 2026

https://github.com/casassg/ms_thesis

Social Media Analysis for Crisis Informatics in the Cloud

casassg-thesis data-analysis google-cloud kubernetes

Last synced: 19 Oct 2025

https://github.com/abishek0103/olist-ecommerce-sql-project

SQL Project using Olist Dataset – E-commerce analysis with MS SQL Server to extract business insights.

business-insights data-analysis sql-server

Last synced: 19 Oct 2025

https://github.com/navp7/hr_analysis_excel

This project utilizes Microsoft Excel to conduct a comprehensive analysis of HR data, focusing on identifying the various reasons for employee attrition and evaluating job satisfaction

dashboards data-analysis excel visualization

Last synced: 23 Jan 2026

https://github.com/brianlesko/r_data_science_stat5730

Written by Brian Lesko, this repository contains R scripts demonstrating data science topics, largely originating from study at Ohio State. The contents were written in RStudio as R Markdown files. As of 1/21/23, future projects on data science, statistics, and machine learning will be in Python, in my machine learning repository.

data data-analysis flight-data ggplot2 olympics-data r-markdown tidyverse

Last synced: 23 Jan 2026

https://github.com/psychelzh/cogstruct-old

Data Analysis on Cognitive Structure

cognition data-analysis intelligence psychology

Last synced: 25 Oct 2025

https://github.com/limatix/limatix

Limatix datacollect and processtrak tools

data-analysis python scientific-workflows

Last synced: 23 Jan 2026

https://github.com/alunera-data/alunera-data

Hi, I’m Yvonne – building data solutions at the intersection of BI, SQL & Service Management

business-intelligence data-analysis data-engineering data-science github-profile portfolio rstats sql

Last synced: 28 Jan 2026

https://github.com/aneeshmurali-n/global-superstore-sales-dashboard---power-bi-stunning-dark-theme

This Power BI dashboard provides a comprehensive view of sales data, enabling users to analyze sales trends, identify top-performing regions, and gain insights into customer behavior.

dark-theme dashboard data-analysis data-science data-visualization powerbi salesdashboard

Last synced: 28 Jan 2026

https://github.com/andreicirciumaru/best-of-breed

CSV fundamentals screener: schema validation + market-cap weights

csv data-analysis finance pandas python screener

Last synced: 15 Apr 2026

https://github.com/smahala02/magnetism-lab

This repository contains Python scripts and data for analyzing inductance in toroidal coils to calculate the magnetic permeability of ferrite materials. The project helps classify materials as soft or hard magnets based on experimental data.

data-analysis inductance jupyter-notebook magnetism python toroids

Last synced: 29 Jan 2026

https://github.com/joannescode/regex_with_py

Learning regular expressions (regex) in Python through practice.

data-analysis python3 regex

Last synced: 30 Jan 2026

https://github.com/ljadhav25/decision-tree-random-forest-algorithm-data-science-

This repository contains an implementation of decision tree and random forest algorithms from scratch in Python. Decision trees and random forests are popular machine learning algorithms used for classification and regression tasks. The goal of this project is to provide a clear and understandable implementation of these algorithms

data-analysis data-science decision-trees machine-learning-algorithms matplotlib numpy pandas python random-forest-classifier

Last synced: 15 Apr 2026

https://github.com/manishabarse/hr_data_analysis

HR data analysis using Microsoft SQL Server Management Studio and Power BI.

data-analysis powerbi sql ssms

Last synced: 30 Jan 2026

https://github.com/tralahm/parliament-2017-dataset

Concise, clean datasets of the 2017 Kenyan general election results for the composition of the Senate and National Assembly.

csv-parsing data-analysis data-visualization datasets election-data ipynb-jupyter-notebook kaggle-dataset kenya-constituencies kenya-counties matplotlib python3 tralahtek

Last synced: 31 Jan 2026

https://github.com/jujulis18/olympicsmedalsdashboard

Olympic Dashboard – Paris 2024 is an interactive dashboard for exploring the performance of medal-winning athletes at the Paris 2024 Summer Olympic Games.

dashboard data-analysis data-visualization eda olympic python streamlit

Last synced: 31 Jan 2026

https://github.com/ginanti-riski/streamlit_datapenyewaansepeda

Bike Sharing Analysis is a project that aims to understand bike rental patterns based on factors such as weather, season, and day. The project uses data analysis techniques to gain deeper insights into bike rental trends.

data-analysis data-analysis-python data-science data-visualization python streamlit

Last synced: 15 Apr 2026

https://github.com/gastonstat/stat133

STAT 133: Concepts in Computing with Data

data-analysis data-science data-visualization r-programming syllabus

Last synced: 25 Feb 2026

https://github.com/emediongfrancis/unified-data-lake-implementation-gcp-kafka-airflow-snowflake

This project demonstrates the integration of data from multiple sources into a unified data lake. The project showcases the use of Apache Airflow for ETL tasks, Google Cloud Storage as a data lake, Apache Kafka for data movement automation, Snowflake for data warehousing, and Google BigQuery for analysis.

airflow data-analysis data-warehousing etl etl-pipeline gcp-storage kafka snowflake value variety

Last synced: 07 Feb 2026

https://github.com/ludreinsalvador/life-expectancy-data-analysis

Contains Power BI dashboards analyzing global life expectancy trends, mortality rates, and health expenditures. Using a dataset sourced from Google Sheets, the project explores the impact of economic and healthcare factors on longevity.

dashboard data-analysis data-visualization healthcare-analysis life-expectancy powerbi

Last synced: 25 Feb 2026

https://github.com/rissh/titanicsurvivalpredictionusingml

Predicting Titanic passenger survival through machine learning. This project includes data preprocessing, exploratory data analysis, feature engineering, and model training using Python. 🚢

data data-analysis data-science data-visualization dataanalysis jupiter-notebook machine-learning machine-learning-algorithms machinelearning matplotlib numpy pandas prediction prediction-model python python3 seaborn tenserflow tflearn titanic

Last synced: 01 Feb 2026

https://github.com/nagar2nd/jenson-usa-mysql-analysis

We are analyzing Jenson USA's dataset to gain valuable insights into customer behavior, staff performance, inventory management, and store operations. By crafting advanced SQL queries, the analysis explores key metrics such as product sales, customer spending, and order patterns, ultimately guiding strategic decision-making and operations.

data-analysis problem-solving sql

Last synced: 01 Feb 2026

https://github.com/yeuner/file-analysis-sql-demo

Streamlit-based application that leverages pandas, sqlite3, and file handling libraries (OpenPyXL and PyArrow) to practice SQL queries, analyze datasets, and export results. A personal project to enhance Python and SQL skills.

data-analysis dataset pandas sql sqlite streamlit vizualization

Last synced: 15 Apr 2026

https://github.com/khanovico/python-stock-analyzer

A web app built with Python and several data science frameworks that enables online stock trend analysis.

amcharts-js-charts data-analysis data-visualization flask javascript pandas python scikit-learn

Last synced: 02 Feb 2026

https://github.com/vladimiracunadev-create/python-data-science-program

Python Data Science Program: 197 classes in 9 parts. An advanced curriculum derived from Géron, VanderPlas, Huyen, ISLP, and Barocas/Hardt/Narayanan. A personal resource for learning, teaching, and continuous improvement.

bootcamp data-analysis data-science education jupyter machine-learning matplotlib numpy pandas python scikit-learn

Last synced: 01 Jun 2026

https://github.com/shubham200137/customer-churn-analysis

In this case study, we analyze customer churn for a telecom company serving Southern California. The company faces increased competition and wants to retain customers by understanding the reasons for churn. Our objectives include improving service quality, identifying churn factors, pinpointing attractive services, and retaining high LTV customers.

data-analysis data-visualization numpy-python pandas-python sqlite tableau

Last synced: 15 Apr 2026

https://github.com/suhail25/hotel-booking-analysis

Analyzed hotel booking cancellations and summarized insights for the hotel manager with the aim of increasing profit by 30%. Demonstrates data exploration, cleaning, and analysis using Python and its libraries: pandas, seaborn, and Matplotlib. The results, documented in a PDF report, include reducing cancellations by 30% and releasing discounts for 10 days a month.

data-analysis ipynb-notebook matplotlib pandas python seaborn

Last synced: 08 Feb 2026

https://github.com/michalspano/maturitna-skuska-proj

Maturita (school-leaving) exam 2021/2022: objective processing and analysis of data.

data-analysis

Last synced: 19 Mar 2026

https://github.com/shubham200137/spotify-listening-habits-analytics

Spotify Listening Habits Analytics is a project aimed at analyzing personalized Spotify listening habits and music trends. It involves exploratory data analysis (EDA) with Python pandas, data processing with SQL Server, and visualizations in Power BI. The goal is to uncover insights into listening patterns, track popularity, and artists.

data-analysis data-visualization exploratory-data-analysis jupyter-notebook pandas power-bi-dashboard sqlserver

Last synced: 18 Mar 2026

https://github.com/ninadpatil09/heart_disease_detection_analysis

The Heart Disease Detection Analysis aims to create a predictive model for identifying individuals at risk of heart disease. Using a dataset with attributes like age, sex, and health metrics, the project focuses on distinguishing patients with and without heart disease.

data-analysis data-cleaning data-science data-visualization machine-learning

Last synced: 15 Apr 2026

https://github.com/haroontrailblazer/machine_learning

A curated resource hub for learning machine learning, featuring tutorials, code examples, datasets, and hands-on projects to build foundational skills and explore real-world applications.

data data-analysis data-visualization database dataset gradient-descent machine-learning pandas python3 random-forest sklearn statistics

Last synced: 16 Apr 2026

https://github.com/gnneto/nf-analyzer

Python script that extracts data from Brazilian electronic invoices (Notas Fiscais Eletrônicas, in XML) and generates a consolidated Excel file, focusing on financial information such as due dates and amounts for more detailed and efficient analysis, while preserving numeric formatting.

data-analysis excel finance nf-analyzer pandas python xlm

Last synced: 16 Apr 2026

https://github.com/sreekar0101/bank-financial-loan-performance-trend-analysis

This project analyzes the performance trends of financial loans using SQL for data extraction and Tableau for visualization. The goal was to perform exploratory data analysis (EDA) to understand key metrics such as loan applications, funded amounts, interest rates, and debt-to-income ratios.

data-analysis data-visualization sql tableau

Last synced: 27 Feb 2026

https://github.com/multitagging/benchmarks

Provides benchmarks to test the MultiTagging framework

benchmarks data-analysis ethereum smart-contracts vulnerabilities

Last synced: 11 Feb 2026

https://github.com/praveen-devknight/event-registration-analytics-dashboard

This project presents an interactive, visually rich Power BI dashboard that analyzes registration data from Teciton, a college-level technical and non-technical event. The dashboard provides comprehensive insights into participant demographics, event preferences, food choices, and time-based trends.

data-analysis data-visualization excel powerbi sql

Last synced: 11 Feb 2026

https://github.com/rodrigojunqueiradev/python-exercises

Repository for storing and organizing exercises completed in the Python language.

data-analysis data-science data-structures data-visualization database math pandas pandas-python python python-3 python3 sql statistics

Last synced: 16 Apr 2026

https://github.com/koldlight/bluetab-data-science-2017

Repository for sharing material and publishing the challenges.

course data-analysis data-science exercises

Last synced: 12 Feb 2026

https://github.com/ankit21111/carpredict

This project predicts car prices using machine learning models, including Simple and Multiple Linear Regression. It covers data acquisition, feature selection, and optimization techniques like Ridge Regression. The best model, Multiple Linear Regression, achieved an R² score of 0.84. Check out the full analysis in the repository!

data-analysis data-visualization matplotlib numpy pandas pyhton scipy seaborn sklearn

Last synced: 16 Apr 2026

https://github.com/mananabbasi/dashboard-power-bi

This repository showcases Power BI projects focused on data visualization and business intelligence. Each project transforms raw data into interactive dashboards and reports, providing actionable insights for decision-making. The repository includes Power BI files, datasets, and documentation for each project.

data-analysis data-science data-visualization powerbi

Last synced: 13 Feb 2026

https://github.com/kambleakash0/mubi_eda

Mini Project #1 for EAS503 course at SUNY Buffalo

data-analysis data-visualization eda

Last synced: 16 Apr 2026

https://github.com/fhdsl/seattlestatsummer_r

A 4-day introduction to R programming for Fred Hutch research interns.

beginner beginner-friendly course data-analysis data-science introduction-to-programming r-programming tidyverse

Last synced: 19 Mar 2026