An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/souravsuvarna/whatsapp-chat-analyzer-and-visualizer-web-application

The WhatsApp chat analyzer and visualizer uses NLP algorithms to analyze chat data, tracking usage patterns and presenting insights through visually appealing charts and graphs. It helps users understand communication patterns and behaviors on WhatsApp.

data-analysis data-science data-visualization python python3 streamlit

Last synced: 18 Apr 2026

https://github.com/nikitalpopov/evotor_champ

solution for evotor data challenge

data-analysis data-science python scikit-learn

Last synced: 15 Apr 2026

https://github.com/devexpress-examples/wpf-pivot-grid-provide-custom-summary-values

This example demonstrates how to determine the value type when you calculate custom summary values in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 01 May 2026

https://github.com/tusharpandey003/chat_analysis

Analysis of group chat with respect to individual member of group

chat-analysis chat-analyzer data-analysis data-science streamlit whatsapp whatsapp-chat whatsapp-web

Last synced: 01 Feb 2026

https://github.com/axsk/geekgraph

parse, cluster and visualize boardgamegeek.com user profiles

data-analysis scraper

Last synced: 01 Feb 2026

https://github.com/bineet-ratna-shakya/data-science-salary-analysis

analyzing a dataset containing salaries of data science professionals from 2020 to 2023.

data-analysis data-science data-visualization jupyter numpy pandas python

Last synced: 01 Feb 2026

https://github.com/nevermendel/revolut-analysis

Python script to analyse Revolut transactions

data-analysis revolut revolut-analysis

Last synced: 12 Apr 2025

https://github.com/ludreinsalvador/life-expectancy-data-analysis

Contains Power BI dashboards analyzing global life expectancy trends, mortality rates, and health expenditures. Using a dataset sourced from Google Sheets, the project explores the impact of economic and healthcare factors on longevity.

dashboard data-analysis data-visualization healthcare-analysis life-expectancy powerbi

Last synced: 25 Feb 2026

https://github.com/bibymaths/python_snippets

A collection of Python scripts for bioinformatics data analysis, including tools for transcription counts, nucleotide composition, and protein sequence evaluation.

amino-acid-scoring bioinformatics data-analysis fasta-generation mathematical-evaluation nucleotide-analysis protein-sequence-analysis transcription-counts

Last synced: 29 Jul 2025

https://github.com/rissh/titanicsurvivalpredictionusingml

Predicting Titanic passenger survival through machine learning. This project includes data preprocessing, exploratory data analysis, feature engineering, and model training using Python. 🚢

data data-analysis data-science data-visualization dataanalysis jupiter-notebook machine-learning machine-learning-algorithms machinelearning matplotlib numpy pandas prediction prediction-model python python3 seaborn tenserflow tflearn titanic

Last synced: 01 Feb 2026

https://github.com/rohitdusane/healthcare-analytics

𝐏𝐨𝐰𝐞𝐫 𝐁𝐈 𝐃𝐚𝐬𝐡𝐛𝐨𝐚𝐫𝐝 is designed to provide valuable insights into patient waiting times across outpatient and inpatient healthcare services. It offers a comprehensive analysis of key factors influencing wait lists, including Age Profile, Specialty, Time Bands, and Patient Case Types.

data-analysis data-visualization dax dax-query healthcare-analysis powerbi-report

Last synced: 01 Feb 2026

https://github.com/sanchittechnogeek/overscripted-analysis

Geolocation and user language extraction analysis from Mozilla Overscripted dataset

analysis data data-analysis mozilla

Last synced: 23 Mar 2025

https://github.com/azaz9026/data_cleaning

Welcome to the Data Cleaning repository! This collection is dedicated to showcasing techniques and methods for cleaning and preparing datasets for analysis.

data-analysis data-engineering data-structures data-visualization eda feature-engineering machine-learning numpy outliers pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/vishnu-vamshii/data-science-jobs-salaries

Created an interactive dashboard to analyze data science jobs salaries in different regions of the world, experience levels, average salaries in USD and type of employment along with a geographical visual.

data-analysis data-science data-visualization tableau tableau-dashboard

Last synced: 01 Feb 2026

https://github.com/rtgrt5645/numpy-lab

🧮 Explore, manipulate, and visualize data with NumPy to enhance your Python skills in scientific computing and data analysis.

array-operations data-analysis data-science jupyter-notebook machine-learning numerical-computing numpy numpy-arrays numpy-library numpy-python python python3 scientific-computing

Last synced: 04 May 2026

https://github.com/tameronline/ai-financial-analyst

AI-driven financial analyst system utilizing LangChain and Ollama for real-time stock analysis, market trends, and financial insights.

ai data-analysis finance financial-analysis langchain machine-learning nlp ollama stock-market

Last synced: 02 Feb 2026

https://github.com/prince-pastakiya/human-resources-tableau-project

👥 Interactive Tableau dashboard for HR analytics — includes workforce overview, demographics, income analysis, and detailed employee records with full filtering.

chatgpt data-analysis data-visualization human-resources numpy python python-faker tableau-dashboards tableau-public

Last synced: 18 Apr 2026

https://github.com/khanovico/python-stock-analyzer

This is a Webapp implemented by python and several data science frameworks, enabling online stock trend analyzing.

amcharts-js-charts data-analysis data-visualization flask javascript pandas python scikit-learn

Last synced: 02 Feb 2026

https://github.com/atanikan/data-mining-projects

Data Mining Homework

data-analysis iub

Last synced: 14 Mar 2025

https://github.com/devbigboy/excel-power-query-get-transform

Power Query is a feature in Excel that allows you to quickly import data from multiple sources and easily clean, transform, and reshape it to suit your needs.

data-analysis data-science excel

Last synced: 08 Feb 2026

https://github.com/suhail25/hotel-booking-analysis

Analyzed the cancelling of booking of hotels and summarized insights to the Hotel Manager to increase profit by 30%. Demonstrated data exploration, cleaning, analysis using Python and its libraries: pandas, seaborn, matplot. Documented the results in PDF report: reduced cancellation by 30% and releasing discounts for 10 days in a month.

data-analysis ipynb-notebook matplotlib pandas python seaborn

Last synced: 08 Feb 2026

https://github.com/farzeen-2001/superstore_analysis_sql

Anaylsed the superstore Data using SQl

data-analysis mysql sql

Last synced: 15 Apr 2025

https://github.com/devag2004/electricity-analysis-using-spark

electricity analysis project made using spark

data-analysis spark spark-mllib

Last synced: 01 May 2026

https://github.com/grindelfp/datasets-analysis

The Machine Learning and Data Analysis course task dedicated to training skills of data normalizing and preprocessing.

data-analysis datasets ipynb mlda

Last synced: 05 Mar 2026

https://github.com/mmfava/lonomia-host-plants-2024

This project investigates the relationship between Lonomia achelous and Lonomia obliqua caterpillars and their host plants. The project uses Docker for a consistent environment and R for statistical analysis, with detailed processes documented in Jupyter notebooks.

data-analysis host-plants lonomia lonomism r

Last synced: 01 May 2026

https://github.com/djm158/learning-microsoft-r

Working through https://www.gitbook.com/book/smott/introduction-to-microsoft-r-server/details and creating samples

data-analysis data-science microsoft microsoft-sql-server r

Last synced: 15 Apr 2026

https://github.com/josericodata/statisticsapp

Interactive statistics analysis app using Python and Streamlit. Perform key statistical tests, visualise distributions, and explore data with ease.

alpha-value chi-square-test confidence-intervals data-analysis dublin dublin-ireland europe hyphotesis-tests ireland normal-distribution null-hypothesis p-value portfolio python statistics streamlit t-test tech ubuntu z-test

Last synced: 26 Feb 2026

https://github.com/jweinst1/xenon

A processing based language

data-analysis interpreter reactive-programming

Last synced: 15 Apr 2026

https://github.com/cdeweyx/bryce-harper-2016-analysis

Notebook analyzing Bryce Harper's disappointing 2016 campaign in historical context through data analytics.

data-analysis data-visualization python

Last synced: 01 May 2026

https://github.com/badranalyst/titanic-survival-prediction-full-data-science-project-classification

This project predicts Titanic survivors using classification models. It includes data cleaning, pre-processing, exploratory data analysis (EDA), categorical feature conversion, model building, and evaluation. Python libraries like Pandas, NumPy, Matplotlib, and Seaborn are used to analyze and predict survival outcomes.

classification data-analysis data-science eda exploratory-data-analysis machine-learning matplo matplotlib-pyplot ml model numpy pandas predictive-modeling python seaborn

Last synced: 06 May 2026

https://github.com/okwilkins/retailanalysis

A comprehensive exploratory analysis and implementation of kmeans/hierarchical clustering on online retail data.

data-analysis data-science machine-learning statistics

Last synced: 18 Oct 2025

https://github.com/themihirmathur/uber-data-analytics

The goal of this project is to perform comprehensive data analytics on Uber trip data using a modern data engineering stack on Google Cloud Platform (GCP).

bigquery data-analysis data-engineering etl-pipeline google-cloud-platform looker python

Last synced: 09 Feb 2026

https://github.com/shubham200137/spotify-listening-habits-analytics

Spotify Listening Habits Analytics is a project aimed at analyzing personalized Spotify listening habits and music trends. It involves Exploratory Data Analysis (EDA) with Python Pandas, data processing using SQL Server, and creating visualizations with Power BI. The goal is to uncover insights into listening patterns, track popularity, and artist.

data-analysis data-visualization exploratory-data-analysis jupyter-notebook pandas power-bi-dashboard sqlserver

Last synced: 18 Mar 2026

https://github.com/barraharrison/airbnb-price-trends

Looking at how Airbnbs differ in price when it comes to location, room type and host activity

data-analysis data-science pandas plotly python streamlit

Last synced: 09 Feb 2026

https://github.com/rahul-jha98/restauranttrends.stats-backend

Application that scrapes the Zomato Dataset and enables the user to visualise the results.

data-analysis data-extraction firebase-storage web-scraping zomato-api

Last synced: 16 Mar 2026

https://github.com/naninsv/apple-retail-sales-warranty-analysis

An advanced SQL project analyzing over 1 million rows of Apple retail sales data to solve real-world business problems, optimize query performance, and extract actionable insights. The analysis includes sales trends, warranty claims, product performance, and year-over-year growth

business-intelligence data-analysis data-science etl insights retailanalytics sql sqladvance

Last synced: 26 Feb 2026

https://github.com/ninadpatil09/heart_disease_detection_analysis

The Heart Disease Detection Analysis aims to create a predictive model for identifying individuals at risk of heart disease. Using a dataset with attributes like age, sex, and health metrics, the project focuses on distinguishing patients with and without heart disease.

data-analysis data-cleaning data-science data-visualization machine-learning

Last synced: 15 Apr 2026

https://github.com/nulltea/kicksware-scrapebot

Web scraping tool to retrieve sneaker details & images from web store sites

bot data-analysis pandas python sneakers web-scraping

Last synced: 15 Apr 2026

https://github.com/andrii04/ga4-gcs-to-bigquery-etl

Automated Data Pipeline that ingests daily GA4-formatted CSV files from a private Google Cloud Storage bucket, validates and loads them into BigQuery, and prepares analysis-ready views. The solution is built for deployment as a Cloud Function triggered by Cloud Scheduler and uses Python with the Google Cloud Storage and BigQuery client libraries.

automation bigquery cloud cloudfunctions data data-analysis data-engineering etl etlpipeline gcp google googlecloudplatform pipeline python sql

Last synced: 18 May 2026

https://github.com/haroontrailblazer/machine_learning

About This Repository A curated resource hub for learning machine learning, featuring tutorials, code examples, datasets, and hands-on projects to build foundational skills and explore real-world applications.

data data-analysis data-visualization database dataset gradient-descent machine-learning pandas python3 random-forest sklearn statistics

Last synced: 16 Apr 2026

https://github.com/purushothamadluru/kpi-driven-insights-dashboard-customer-churn-analysis

This repository features a Power BI project designed to deliver KPI-driven insights into customer churn patterns. Leveraging a robust dataset and advanced data modeling techniques, this project uncovers trends, identifies key drivers of churn, and enables businesses to make data-driven decisions.

customer-churn-analysis data-analysis insights-dashboard kpi powerbi

Last synced: 09 Feb 2026

https://github.com/ronitjariwala/prodigy_ds_02

Prodigy InfoTech Data Science Internship Task-2

data-analysis python

Last synced: 28 Apr 2026

https://github.com/animesh-chourey/power-bi

Various projects at my attempt to learn Power BI

business-analytics data-analysis data-visualization powerbi

Last synced: 10 Feb 2026

https://github.com/tushar2704/imdb-movie-analysis

This project extracts meaningful insights, trends, and patterns from the data, shedding light on various aspects of the movie industry. By leveraging this analysis, filmmakers, studios, and enthusiasts can gain valuable information to inform decision-making, understand audience preferences, and contribute to the creation of successful movies.

artificial-intelligence data-analysis data-science imdb project tushar2704

Last synced: 10 Feb 2026

https://github.com/marknature/machine-learning-intern

Machine Learning tasks involving the Titanic Dataset and Breast Cancer Wisconsin (Diagnostic) dataset

data-analysis github jupiter-notebook machine-learning matplotlib numpy pandas python scikit-learn sklearn

Last synced: 10 Apr 2026

https://github.com/ggarciajavier/udacity-dalf-project4-identify-fraud-enron-email

Work performed for the 4th project of the Udacity Data Analyst Nanodegree: machine learning classifier for identifying fraud in Enron email corpus.

data-analysis data-science machine-learning nlp-machine-learning python python27

Last synced: 03 May 2026

https://github.com/skysign/dat

데이터분석을 함께 공부하는 스터디입니다.

data data-analysis data-science

Last synced: 02 Jan 2026

https://github.com/serlo/data-pipeline-interactive-exercises

processing pipeline for exercise dashboards

data-analysis serlo

Last synced: 26 Feb 2025

https://github.com/aneeshmurali-n/project-ml-data-preprocessing

The main objective of this project is to design and implement a robust data preprocessing system that addresses common challenges such as missing values, outliers, inconsistent formatting, and noise. By performing effective data preprocessing, the project aims to enhance the quality, reliability, and usefulness of the data for machine learning.

data-analysis data-cleaning data-encoding data-exploration feature-scaling label-encoding matplotlib minmaxscaler numpy one-hot-encoding outlier-detection pandas standardscaler

Last synced: 02 May 2026

https://github.com/nickenshidqia/startup-venture-funding-dashboard-data-analysis

The Startup Venture Funding Dashboard is a comprehensive visual representation of the dynamic landscape of startup funding, providing valuable insights into the top startups, funding round types, markets, startup statuses, and investor details.

dashboard data-analysis tableau tableau-dashboards

Last synced: 11 Feb 2026

https://github.com/haonamnguyen/data-science-job-analysis

Evaluate the factors influencing salary trends in the data science industry, including experience levels, job titles, employment types, company sizes, and remote work arrangements, to help HR teams and hiring managers make data-driven decisions regarding compensation packages and recruitment strategies.

data-analysis data-science data-visualization jupyter-notebook python

Last synced: 16 Apr 2026

https://github.com/multitagging/benchmarks

Provides benchmarks to test the MultiTagging framework

benchmarks data-analysis ethereum smart-contracts vulnerabilities

Last synced: 11 Feb 2026

https://github.com/praveen-devknight/event-registration-analytics-dashboard

This project presents an interactive and visually-rich Power BI dashboard that analyzes registration data from a college-level technical and non-technical event, Teciton. The dashboard provides comprehensive insights into participant demographics, event preferences, food choices, and time-based trends.

data-analysis data-visualization excel powerbi sql

Last synced: 11 Feb 2026

https://github.com/joemull/pyjade

A data curation script for the Jane Addams Digital Edition

data-analysis digital-humanities

Last synced: 11 Feb 2026

https://github.com/rodrigojunqueiradev/python-exercises

Repositório para armazenar exercícios realizados na linguagem Python / Repository to organize exercises with Python language

data-analysis data-science data-structures data-visualization database math pandas pandas-python python python-3 python3 sql statistics

Last synced: 16 Apr 2026

https://github.com/satyam4229/omnify-dataanalysis

Our assessment of Omnify focused on data-driven strategies to maximize profitability. We identified "Product X" as the most profitable product and recommended leveraging the "Wellness Solutions" keyword category for optimal keyword strategy.

data-analysis data-science data-visualization excel omnify

Last synced: 04 Jan 2026

https://github.com/karlyndiary/spotify-excel-dashboard

Data Analysis on the Spotify Dataset using Microsoft Excel and VBA.

charts data-analysis data-cleaning data-visualization excel excel-export excel-vba pivot-tables

Last synced: 04 Jan 2026

https://github.com/mstovarh/analisis-de-bebidas-de-starbucks

En este repositorio se encuentran unas gráficas basadas en diversas características de las bebidas de Starbucks, usé tecnologías como la herramienta de Data Analysis de ChatGPT, Excel y PowerQuery.

chatgpt data-analysis excel powerquery

Last synced: 15 Apr 2025

https://github.com/mnkanout/patients_medication_prediction

The aim of the project is to create a model that can help medical professionals select the proper medication for patients based on their symptoms. The model uses historical data of other patients to predict what could be the most suitable medication based on the patient's symptoms.

data data-analysis data-science data-visualization decision-tree-classifier machine-learning python3

Last synced: 29 Jun 2025

https://github.com/bala-1409/sql-projects

The repository contains Structured Query Language (SQL) Scripts. The Multiple SQL scripts for various projects which includes data cleaning, data pre-processing, data processing, data transformation and insights gaining through Query Language.

data-analysis data-mining data-science data-transformation database eda etl-framework exploratory-data-analysis microsoft-sql-server query-language sql sql-server sql-server-database sql-server-management-studio

Last synced: 27 Feb 2026

https://github.com/thlindustries/mortalidade_neonatal_python_react

Uma plataforma de visualização de dados montada utilizando Python e React com a library de visualização do Plotly

data-analysis data-visualization plotly python python3 react reactjs

Last synced: 16 Apr 2026

https://github.com/thanaraklee/exploring-and-analyzing-data-in-oracle-database

This project focuses on data analysis using SQL with Oracle Database 21c. It aims to familiarize with data management and data analysis using SQL commands and Oracle Database 21c.

data-analysis oracle-database sql sql-developer

Last synced: 12 Feb 2026

https://github.com/jatin-mehra119/flight-price-prediction

This study aims to analyze flight booking data from "Ease My Trip" website, using statistical tests and linear regression to extract insights. By understanding this data, valuable information can be gained to benefit passengers using the platform.

data-analysis datacleaning datavisualization machine-learning preprocessing-data python sklearn-pipeline sklearn-regression-algorithm streamlit-webapp

Last synced: 04 May 2026

https://github.com/dnut/associations

Python 3 library to identify high-dimensional statistical relationships in any data set.

analytics arch-linux association-rules data data-analysis data-mining data-science machine-learning python-modules

Last synced: 01 May 2026

https://github.com/rohitblaze10/-excel-_seller_store_analysis

A collection of data analysis projects showcasing data cleaning, exploration, visualization, and machine learning. Using "Excel" and more to uncover insights and drive data-driven decision-making. Feel free to explore, contribute, or collaborate!

data-analysis data-visualization excel excel-export

Last synced: 12 Feb 2026

https://github.com/andimashkulli/vpms

Vehicle Parking Management System for Gjon Buzuku Gymnasium

backend-api data-analysis databases frontend-react mongodb nodejs software

Last synced: 12 Feb 2026

https://github.com/yalai92/alfalfa_imp_exp_analysis

This repository covers data cleaning, analysis, and visualization of global alfalfa and pellet imports, focusing on trends from 2003 to 2023. It also includes a predictive analysis of global alfalfa demand for 2024-2029, using data science techniques to provide insights for stakeholders in the alfalfa industry.

data-analysis data-cleaning data-visualization matplotlib numpy pandas python sckiit-learn tableau

Last synced: 12 Feb 2026

https://github.com/ankit21111/carpredict

This project predicts car prices using machine learning models, including Simple and Multiple Linear Regression. It covers data acquisition, feature selection, and optimization techniques like Ridge Regression. The best model, Multiple Linear Regression, achieved an R² score of 0.84. Check out the full analysis in the repository!

data-analysis data-visualization matplotlib numpy pandas pyhton scipy seaborn sklearn

Last synced: 16 Apr 2026

https://github.com/martachesnova/big-data

Finding out whether reviews from Amazon's Vine program are trustworthy. Performed ETL process in the Cloud and uploaded a DataFrame to an RDS instance. Used PySpark and Spark SQL to perform a statistical analysis and uncover "hidden" insights.

big-data data-analysis dataset python spark sql

Last synced: 16 Apr 2026

https://github.com/edoaltamura/rotational-ksz-macsis

Repository for suppelementary material from my publication on the rotational kinetic SZ effect in MACSIS

cosmology data-analysis galaxy-clusters high-performance-computing hydrodynamics

Last synced: 28 Feb 2026

https://github.com/rahulsm20/storedata

A data analysis project aimed at analyzing the sales data of the super store and providing useful insight into customer preferences.

data-analysis matplotlib numpy pandas python streamlit

Last synced: 16 Apr 2026

https://github.com/farhad-here/median-performance-comparison

Benchmarking the performance of median calculation using vanilla Python vs NumPy.

data-analysis matplotlib numpy python

Last synced: 18 Apr 2026

https://github.com/shrutiijoshi/coffee_sales

This project aims to analyze coffee sales data to identify key trends, patterns, and factors influencing sales performance.

data-analysis microsoft-excel

Last synced: 28 Feb 2026

https://github.com/kariemseiam/geoegy

An innovative and responsive dashboard to discover, filter, and analyze places across Egypt. Featuring advanced search, interactive maps with Leaflet.js, real-time analytics, dark mode, and seamless data export—all wrapped in a sleek, modern design with RTL support.

accessibility data-analysis data-visualization es6-modules geojson javascript leaflet mapping openstreetmap places-data responsive-design web-development

Last synced: 13 Feb 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/ronylpatil/whatsapp-group-chat-analysis

This project is totally based on data analysis where our college official Whatsapp group is used to extract useful information from the chat. Some of the useful extracted features are most active members of the group, most active day of the week, top-10 media contributors in the Group, and many more...

data-analysis data-preprocessing data-wrangling feature-engineering

Last synced: 14 Jun 2025

https://github.com/haideratgh/sql-data-analytics-project

This repository contains a collection of SQL scripts demonstrating various analytical techniques, such as changes over time, cumulative, performance, data segmentation, part-to-whole analysis

analytics business-analytics business-intelligence data data-analysis data-analyst data-analytics data-engineering data-science data-scientist database datascience query reporting sql sql-query sql-server window-functions-in-sql

Last synced: 29 Jun 2025