An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/samwhaaa/da_portfolio

Showcasing some of my Data Analytics projects

data-analysis data-analytics data-visualization jupyter jupyter-notebook python

Last synced: 01 Mar 2025

https://github.com/juanmerino89/data-job-market-analysis-project

A complete analysis of the job market using open data, scraping, and visualizations. The project is explained step by step on my YouTube channel.

career-insights data-analysis data-science job-data job-market jupyter-notebook machine-learning market-trends open-data portfolio-project python salary-analysis visualization web-scraping youtube-project

Last synced: 18 May 2026

https://github.com/ryanbbrown/volleyball-analysis-project

Analyzes 10 years of self-collected men's NCAA volleyball player height and team wins data to determine the importance of height for success.

data-analysis data-visualization python volleyball

Last synced: 31 May 2026

https://github.com/firetyrant/sql-portfolio-projects

Documenting my SQL learning journey with hands-on projects focused on data cleaning, analysis, and optimization.

bigquery data-analysis databases etl learning portfolio query-optimization sql

Last synced: 19 Apr 2026

https://github.com/samruddhi3012/rfm-sales-analysis

Hi there! In this project I have performed Sales Analysis (RFM Analysis) using SQL and Tableau.

data-analysis data-visualization mssqlserver rfm-analysis segmentation tableau

Last synced: 12 Mar 2025

https://github.com/chen0040/spark-tabular-analytics

Spark statistical inference framework for performing pairwise column analytics on large data tables

anova chi-square-test confidence-intervals data-analysis hypothesis-testing spark statistical-inference tabular-data

Last synced: 07 Jul 2025

https://github.com/masamallow/jupyterlab-my-local

Configuration to run my personal JupyterLab on my local machine.

data-analysis jupyter jupyter-notebook jupyterlab

Last synced: 26 Mar 2025

https://github.com/deliprofesor/k-means-clustering-for-retail-data-analysis

This project uses K-Means clustering to segment wholesale customers based on their spending habits. The data is preprocessed, scaled, and clustered into four groups. The Elbow and Silhouette methods determine the optimal number of clusters, and results are visualized using boxplots and scatter plots to uncover spending patterns.

clustering-visualisation data-analysis elbow-method k-means k-means-clustering r silhouette-score

Last synced: 10 Apr 2025

https://github.com/edanur-y/airline-customer-satisfaction-prediction-with-multiple-logistic-regression

Performing multiple logistic regression analysis on airline and customer data to predict customer satisfaction. 🔵R

data-analysis missing-values-analysis multiple-logistic-regression optimal-cut-off-points r

Last synced: 09 Jun 2026

https://github.com/codesaadumair/pandas_exercises_personal

Personalized enhancements to pandas exercises with comprehensive solutions and practical insights for mastering data analysis in Python.

data-analysis data-science pandas python

Last synced: 09 May 2026

https://github.com/pranavsp108/time-series-forcasting

A time-series forecasting project to predict hourly energy consumption using Python, Pandas, and an XGBoost regression model.

data-analysis data-science energy-consumption forecasting matplotlib numpy pandas python scikit-learn sustainability time-series xgboost

Last synced: 10 Apr 2026

https://github.com/vishal786-commits/target-businesscasestudy-sql

This project analyzes Target’s e-commerce transactions in Brazil between 2016 and 2018 using SQL. The goal was to explore customer behavior, order patterns, payments, delivery times, and freight costs to generate actionable business insights.

bigquery data-analysis sql

Last synced: 05 Oct 2025

https://github.com/josepablodmg/python--linear-regression---housing-exercise

A predictive analysis exploring the relationship between household characteristics and median income in California. Using linear regression, the project investigates whether blocks with fewer households correspond to higher median incomes.

california data-analysis data-science exploratory-data-analysis housing-data linear-regression machine-learning python regression scikit-learn statistics visualization

Last synced: 05 Oct 2025

https://github.com/marielachirinosr/analysis-urgencias-hospital-pitalito

This project involves analyzing emergency room admission data from the E.S.E Hospital Departamental de Pitalito using a star schema model.

bigquery data data-analysis etl-pipeline tableau

Last synced: 21 Jan 2026

https://github.com/data-edd/mastering_sql

A repo documenting my progress in mastering SQL

data-analysis mysql mysql-database sql

Last synced: 06 Oct 2025

https://github.com/nuriadevs/informes-powerbi

This repository contains reports built with Power BI.

data-analysis powerbi

Last synced: 18 Feb 2026

https://github.com/prarthana-singh/bangalore-house-price-predictor

🏡 Bangalore House Price Prediction – A Machine Learning model to predict house prices in Bangalore using real estate data. Built with Linear Regression, Python, Pandas, NumPy, and Scikit-Learn.

data-analysis eda house-price-prediction linear-regression machine-learning numpy pandas python real-estate regression scikit-learn

Last synced: 19 Apr 2026

https://github.com/gabboraron/biostatisztika_es_alkalmazasai

"Statistics is the branch of mathematics whose task is to put a tool in the hands of politicians with which any claim, and its opposite as well, can be justified on a scientific basis"

biostatistics data-analysis data-visualization r statistics statistics-course

Last synced: 24 Oct 2025

https://github.com/omarsolieman/socialgiveawaydataanalysis

This project involved cleaning, analyzing, and processing data from an Instagram giveaway to ensure a fair and data-driven winner selection process. The primary goal was to automate the process of identifying valid entries, weighting them based on engagement (likes and multiple entries), and performing a post-giveaway analysis.

data-analysis data-science data-visualization instagram scraping threejs

Last synced: 14 May 2026

https://github.com/aymanmomin/excel-coffee-data-analytics-exploring-coffee-orders-dataset

This project utilizes a coffee orders dataset to perform comprehensive data analytics and gain insights into customer preferences, popular items, and sales trends. The analysis aims to provide valuable information for coffee shop owners and enthusiasts, facilitating data-driven decision-making and improved customer satisfaction.

data-analysis data-visualization excel project

Last synced: 18 Jan 2026

https://github.com/pranavsp108/market_basket_analysis-instacart

Customer segmentation and market basket analysis using the Instacart dataset with Python, Pandas, and K-Means clustering.

customer-segmentation-and-buying-behavior data-analysis data-visualization instacart jupyter-notebook kmeans-clustering market-basket-analysis pandas python scikit-learn

Last synced: 05 May 2026

https://github.com/kmranrg/bikeshare

A project based on data analysis

data-analysis python

Last synced: 08 Oct 2025

https://github.com/dcs-training/exploratory-data-analysis-and-visualisation-with-observable-plot

This two-hour workshop will teach you how to follow an exploratory data analysis pipeline with Observable Plot, a new JavaScript library based on the Grammar of Graphics, that proposes a simple yet expressive interface to create powerful graphics easily shareable on the web. Go to the Readme file

d3 data-analysis data-visualisation javascript observable-notebook

Last synced: 17 May 2026

https://github.com/tyriek-cloud/power-bi-nyc-housing-financial-report

This report was conducted to provide a comprehensive analysis of various NYC housing and financial data.

dashboard data-analysis data-visualization financial-analysis powerbi statistics

Last synced: 21 Jan 2026

https://github.com/sarvesh2304/stellarator_simulation

A comprehensive Julia package for stellarator fusion reactor physics analysis featuring 3D magnetic field calculations, neoclassical transport modelling, quasi-isodynamic optimisation algorithms, and interactive 3D visualisations. Includes tokamak comparison framework and high-resolution plotting capabilities for fusion research.

3d-visualisation data-analysis field-line-tracing fusion-physics fusion-research interactive-3d julia magnetic-confinement magnetic-field-calculations magnetic-surfaces matplotlib neoclassical-transport numerical-methods optimisations physics-simulation plasma-physics plotly quasi-isodynamic stellarator stellarator-optimization

Last synced: 09 Oct 2025

https://github.com/amish5ingh/cricket-data-analytics-ipl

Data analysis and visualization of IPL 2022 matches using Python, Pandas, Matplotlib, and Seaborn. Includes insights on match outcomes, player performances, toss trends, and venue stats with 12+ charts.

data-analysis data-visualization ipl-data-analysis ipl-data-visualization jupiter-notebook matplotlib-pyplot numpy pandas python seaborn

Last synced: 09 May 2026

https://github.com/debjyotisaha/sql-projects

Designed and implemented SQL-based projects to analyse and manage datasets efficiently. Demonstrated expertise in writing complex queries, optimizing database performance, and performing data extraction, transformation, and loading (ETL) processes.

data-analysis database sql

Last synced: 09 Oct 2025

https://github.com/mirwais-farahi/data-visualization-with-tableau-specialization

The Specialization covers Tableau for data visualization and business intelligence. The series teaches skills like assessing data quality, designing visualizations and dashboards, and combining data sources to create compelling, data-driven stories.

dashboard data-analysis geospatial map tableau visualization

Last synced: 16 Feb 2026

https://github.com/sabdikay/analysis-of-biodiversity

This project analyzes biodiversity data from the National Parks Service, focusing on species in various park locations. Conducted in Jupyter Notebook, it uses pandas, matplotlib, NumPy, seaborn, and chi2_contingency for analysis and visualization.

data-analysis data-analysis-python data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 14 Apr 2026

https://github.com/brooks-code/toulouse-biblio-chronicle

Snapshot of Toulouse public library customer habits — cleaning raw, messy datasets of musical, cinematic, and literary checkouts; includes data-cleaning steps, analysis notebook revealing cultural tastes in the Pink City.

data-analysis data-cleaning data-cleaning-and-preprocessing data-quality exploratory-data-analysis jupyter-notebook library-data misaligned-data mojibake tutorial

Last synced: 10 Oct 2025

https://github.com/sharvesh1401/battsense

BattSense is a machine learning project focused on predicting the State of Health (SOH) of lithium-ion batteries using operational parameters such as voltage, current, temperature, and capacity. The model enables accurate, data-driven diagnostics for battery performance monitoring in electric vehicles and portable devices.

battery-diagnostics battery-health battery-health-prediction battery-soh data-analysis electric-vehicles energy-storage machine-learning predictive-maintenance python regression scikit-learn

Last synced: 07 May 2026

https://github.com/salma-mamdoh/the-android-app-market-on-google-play-project

My project aims to practice Data Analysis and Data Visualization on DataCamp

data-analysis data-visualization datacamp jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 12 Apr 2026

https://github.com/pranav016/exploratory-data-analysis-of-google-app-store-dataset

This is a data analysis done on the Google app store dataset to answer a few questions related to the data through data visualization techniques.

data-analysis

Last synced: 11 Oct 2025

https://github.com/montanaz0r/testing-if-mma-math-deduction-works-using-ufc-fighters-data

Probabilistic reasoning about the phenomenon known as MMA math, using UFC fighter data and Python.

bayesian-inference data-analysis data-science graphviz jupyter-notebook pandas python scipy statistics

Last synced: 14 Apr 2026

https://github.com/vinay-jose/territorial-sales-dashboard

EDA was carried out on the sales data of Atliq Technologies, and a dashboard was created in Power BI to draw insights.

data-analysis data-visualization powerbi-desktop sql

Last synced: 11 Oct 2025

https://github.com/ahsankhizar5/titanic-eda-visualization

Exploratory Data Analysis and Visualization on the Titanic Dataset using Python, Pandas, Matplotlib, and Seaborn to uncover survival patterns.

data-analysis data-science data-visualization eda kaggle machine-learning matplotlib pandas python seaborn titanic-dataset

Last synced: 31 May 2026

https://github.com/thinzarhninyu/dap

Notes and Projects for Data Analysis with Python course from FreeCodeCamp.org

data-analysis data-analysis-python ipynb jupyter-notebook python

Last synced: 18 Feb 2026

https://github.com/dzakwanalifi/stadata-x

Terminal UI for interactively exploring and downloading data from BPS Indonesia

bps-api cli-app data-analysis data-visualization indonesia-statistics indonesian-data open-data python statistics terminal-ui textual tui

Last synced: 20 Jan 2026

https://github.com/tzerk/esr

R package 'ESR' for plotting and analysing ESR spectra in dating applications

data-analysis data-visualization electron-spin-resonance geochronology r

Last synced: 13 Mar 2026

https://github.com/sunsided/esc2024

Exploratory Data Analysis on the ESC 2024 results

csv data-analysis eurovision-song-contest scraping

Last synced: 18 Feb 2026

https://github.com/jsimell/sleepanalysis

A Python data analysis project examining the factors that affect sleep quality and the temporal patterns in a single subject's sleep data.

data-analysis matplotlib numpy pandas python scikit-learn seaborn

Last synced: 14 Apr 2026

https://github.com/samkazan/business-analysis-tableau

Business Analysis on Global/Superstore data using Tableau.

analysis data-analysis tableau visualization

Last synced: 08 Feb 2026

https://github.com/ayorick23/python-data-science-cheat-sheet

A quick, practical guide to the essential Python syntax, commands, and functions for data science. Perfect for recalling how to use the most common libraries, such as NumPy, Pandas, Matplotlib, and Scikit-learn, in your day-to-day analyses.

cheat-sheet data-analysis data-science data-visualization deep-learning jupyter-notebook machine-learning matplotlib ml numpy pandas python scikit-learn scipy seaborn statistics sympy tensorflow

Last synced: 07 Apr 2026

https://github.com/ankitpoddar07/excel-project_back-office

📊 Coffee Sales Analytics – Back Office Excel Project

data-analysis ms-excel

Last synced: 05 Feb 2026

https://github.com/saisurajmatta/e-commerce-sales-advanced-data-analysis

Excel-based e-commerce analytics for FNP, a gift company. It covers data extraction, modeling, and visualization, providing actionable insights on revenue, customer behavior, and operations. Key skills include Excel, Power Query, Power Pivot, and DAX. The analysis culminates in data-driven business recommendations.

data-analysis data-visualization dax excel power-pivot power-query

Last synced: 22 Jan 2026

https://github.com/sanjayankur31/20181206-neurofedora

Slides for my NeuroFedora talk at the UH Biocomputation group's weekly seminar

computational-neuroscience data-analysis neurofedora neuroimaging neuroscience open-science

Last synced: 19 Feb 2026

https://github.com/hase3b/flask-dash-interactive-dashboard

An interactive data visualization dashboard created using Flask and Dash. This project includes comprehensive data preparation, exploratory data analysis (EDA), and dynamic visualizations with Seaborn and Plotly. Explore the multi-page Dash app with features like dropdowns and callbacks for updated plots.

callbacks dash dashboard data-analysis data-visualization dropdown eda flask interactive plotly seaborn web-app

Last synced: 19 May 2026

https://github.com/mattdelaune/excel_sales_dashboard

Interactive Excel Dashboard for Coffee Sales Analysis: This project leverages Excel to analyze sales data, uncover seasonal trends, regional preferences, and customer behaviors, providing actionable insights for optimizing inventory and marketing strategies.

data-analysis excel pivot-tables sales-dashboard sales-data

Last synced: 27 Jan 2026

https://github.com/casassg/ms_thesis

Social Media Analysis for Crisis Informatics in the Cloud

casassg-thesis data-analysis google-cloud kubernetes

Last synced: 19 Oct 2025

https://github.com/abishek0103/olist-ecommerce-sql-project

SQL Project using Olist Dataset – E-commerce analysis with MS SQL Server to extract business insights.

business-insights data-analysis sql-server

Last synced: 19 Oct 2025

https://github.com/Kaushik-Puttaswamy/Airline-Passenger-Referral-Prediction-Using-Machine-Learning

This project uses a machine learning model to predict if passengers referred by existing customers will book a flight, helping airlines target likely customers. Key factors like service ratings and value for money drive predictions, achieving over 90% accuracy.

airline-marketing customer-referral-prediction customer-satisfaction data-analysis feature-engineering hyperparameter-tuning machine-learning model-evaluation predictive-analytics

Last synced: 20 Oct 2025

https://github.com/jigyasag18/bird-strikes-in-aviation-project

This project analyzes over a decade of U.S. bird strike data (2000–2011) to evaluate safety risks, damage trends, and cost implications in aviation. Using PostgreSQL for database management and Power BI for dashboard visualization, it uncovers critical insights into when, where, and how wildlife impacts aircraft. Key findings inform strategically.

bird-strike-prevention bird-strike-prevention-in-real-airport data data-analysis data-analysis-project data-visualisation data-visualization data-visualization-project data-visualizations database dataset dax-query postgresql postgresql-database powerbi powerbi-desktop powerbi-report powerbi-visuals sql sql-database

Last synced: 09 May 2026

https://github.com/gunifiri/duckdb-ghw

🦆 Accelerate analytics with DuckDB's integration for GitHub workflows, enabling efficient data handling and processing directly within your repositories.

analytics analytics-engine big-data columnar-storage data-analysis data-science database duckdb in-memory-database open-source parquet python query-planner r sql

Last synced: 29 Apr 2026

https://github.com/albertobarrago/sentinel

A contribution to the research of Corrado Malanga and Filippo Biondi

data-analysis sar

Last synced: 24 Oct 2025

https://github.com/sugumarsrinivasan/sql-datawarehouse-project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and data analytics.

data-analysis data-analytics data-engineering data-lake data-science data-warehouse datawarehousing etl etl-pipeline medallion-architecture sql sql-query sql-server

Last synced: 19 Jun 2026

https://github.com/brianlesko/r_data_science_stat5730

Written by Brian Lesko, this repository contains R scripts demonstrating data science topics, largely originating from study at Ohio State. Contents are written in RStudio using R Markdown files. As of 1/21/23, future projects concerning data science, statistics, and machine learning will be in Python in my machine learning repository.

data data-analysis flight-data ggplot2 olympics-data r-markdown tidyverse

Last synced: 23 Jan 2026

https://github.com/alessandroryo/bike-rental-data-analysis

A data analysis project focused on understanding and predicting bike rental patterns. This project utilizes data processing, visualization, and predictive modeling techniques to gain insights into bike rental usage, fulfilling the final submission requirement for Dicoding Indonesia's Data Analysis course.

bike-rental data-analysis data-visualization jupyter-notebook machine-learning python streamlit

Last synced: 09 Apr 2026

https://github.com/nishumehta/coffee-beans-sales-analysis

An in-depth analysis of coffee bean sales using an interactive Excel dashboard, which highlights trends and customer insights

dashboard data-analysis data-visualization excel

Last synced: 28 Jan 2026

https://github.com/janiavdv/data-spirits

Analysis of alcohol and sports betting data, including a correlation investigation.

correlation data-analysis data-science machine-learning

Last synced: 11 Nov 2025

https://github.com/a26nine/kortext-usage-dashboard

An interactive data visualisation dashboard built using Tableau software to understand the value of digital resources issued on Kortext platform at Middlesex University, London.

data-analysis data-science data-visualization knime tableau

Last synced: 01 Feb 2026

https://github.com/sehgal-vishal/sql-nyc-collision-analysis

This analysis is based on collisions (accidents) that happened in New York City. I used SQL Server for EDA (exploratory data analysis).

data-analysis database eda sql-server

Last synced: 06 Feb 2026

https://github.com/shrutiijoshi/apple_greenhouse_gas_emissions

A breakdown of Apple's greenhouse gas emissions from 2015 to 2022 as they aim to reach net zero emissions by 2030.

dashboard data-analysis data-visualization powerbi

Last synced: 06 Feb 2026

https://github.com/ljadhav25/linear_regression_data_science

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

data-analysis data-science linear-regression machine-learning

Last synced: 26 Oct 2025

https://github.com/vishalsiingh/deloitte-virtual-internship

Submission for the STEM Virtual Program by Deloitte via Forage.

coding cyber-security data-analysis deloitte development forage forensics

Last synced: 23 Jan 2026

https://github.com/garcane/unicorn-companies-analysis

Tracking unicorn startups (valued at $1B+) provides valuable insights for investors and analysts to identify high-growth industries and emerging trends.

data-analysis exploratory-data-analysis financial-analysis investor postgresql sql

Last synced: 24 Jan 2026

https://github.com/rahulchouhan1/sql-data-warehouse-project

Building a modern data warehouse with SQL Server, including ETL Processes, data modeling, and analytics.

data-analysis data-cleaning data-engineering data-science data-warehouse datascience etl etl-pipeline sql sql-query sql-server

Last synced: 24 Jan 2026

https://github.com/yash1882/music-store-data-analysis

A project focused on analyzing music store data using SQL ♬

begineer-friendly data-analysis music music-store-data music-store-data-analysis sql-project

Last synced: 28 Jan 2026

https://github.com/anurag-ghosh-12/library_management_system_sql

This project showcases the development of a comprehensive Library Management System utilizing Structured Query Language (SQL). It demonstrates a practical application of relational database principles to efficiently manage library resources, member information, and borrowing/returning transactions.

data-analysis data-visualisation dbms-project sql

Last synced: 29 Jan 2026

https://github.com/andreicirciumaru/best-of-breed

CSV fundamentals screener: schema validation + market-cap weights

csv data-analysis finance pandas python screener

Last synced: 15 Apr 2026

https://github.com/engineertolulope/us_states_living_ranking_analysis

Python script for analyzing and ranking U.S. states based on factors like cost of living, tax burden, diversity, crime rates, and climate. Uses weighted criteria to identify the best states to live in according to these metrics. Ideal for decision-making on relocation.

data-analysis data-science linear-regression machine-learning python scikit-learn

Last synced: 29 Jan 2026

https://github.com/smahala02/magnetism-lab

This repository contains Python scripts and data for analyzing inductance in toroidal coils to calculate the magnetic permeability of ferrite materials. The project helps classify materials as soft or hard magnets based on experimental data.

data-analysis inductance jupyter-notebook magnetism python toroids

Last synced: 29 Jan 2026

https://github.com/shrutiijoshi/marketing-campaign-report

The dataset includes information on campaign types, recipient segments, interactions (clicks, opens, bounces, etc.), and conversion metrics.

dashboard data-analysis data-visualization tableau-public

Last synced: 25 Feb 2026

https://github.com/joannescode/regex_with_py

Learning by practicing with Regex (Python)

data-analysis python3 regex

Last synced: 30 Jan 2026

https://github.com/mfakhriazhar/us-companies-revenue-dashboard

This project is a data visualization dashboard built using Power BI that highlights lists of the largest companies in the United States by revenue. The goal is to provide an interactive overview of company performance across industries, focusing on revenue, employee metrics, and industry trends.

dashboard data-analysis data-visualization largest-companies-us powerbi revenue united-states

Last synced: 30 Jan 2026

https://github.com/touchesir/twitter_physicalactivity

Companion Data / Analysis for "Monitoring Physical Activity Levels using Social Media Data"

data-analysis twitter

Last synced: 30 Jan 2026

https://github.com/jcaperella29/jc_bioinformatics_hub

A personal hub to showcase my bioinformatics applications including RNA-Seq, ATAC-Seq, and miRNA-Seq analysis tools. Powered by simple HTML, CSS, and JavaScript with a biotech-themed design.

atac-seq bioinformatics biotech data-analysis github-pages portal rna-seq webapp

Last synced: 25 Feb 2026

https://github.com/gurpreet17/uc-davis-sql-for-data-science-specialization

Completed the SQL Basics for Data Science Specialization from the University of California, Davis, gaining proficiency in Data Analysis, SQL, Apache Spark, and Delta Lake.

apache-spark bigdata data-analysis data-science delta-lake sqlite

Last synced: 15 Apr 2026

https://github.com/luminati-io/indeed-dataset-samples

A sample dataset of over 1000 Indeed job listings, extracted using the Bright Data API, ideal for market analysis and growth.

api data-analysis datasets indeed jobs web-scraping

Last synced: 07 Feb 2026

https://github.com/allanotieno254/bank-loan-analysis-dashboard-power-bi

An interactive Power BI dashboard that analyzes bank loan data to provide insights into approval trends, default risks, and customer profiles. Designed to assist financial institutions in making data-driven lending decisions.

bank-loans business-intelligence dashboard data-analysis financial-analysis power-bi risk-assessment

Last synced: 31 Jan 2026

https://github.com/malthejorgensen/repx

Python regular expression file transformer

command-line-tool data-analysis text-processing

Last synced: 31 Jan 2026