An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/sinsunsan/earth-survival-kit

Global warming data visualisation app to help everyone understand global warming and take actions that matter

angular angular7 d3 data-analysis data-visualization ecology global-warning ngx-charts

Last synced: 05 May 2026

https://github.com/nathadriele/transaction_fraud_prevention_pipeline

A fraud detection and prevention solution for financial transactions, combining machine learning, business rules, and advanced statistical analysis. The system offers an interactive dashboard for real-time monitoring, data analysis, and fraud alert management.

data-analysis data-visualization docker fraud-prevention machine-learning matplotlib numpy pandas pipeline pytest python scikit-learn scipy seaborn streamlit tensorflow transaction xgboost

Last synced: 10 Apr 2026

https://github.com/rijul007/market-basket-analysis-using-r

Market Basket Analysis using association rules, leveraging R’s powerful tools for data-driven retail strategies.

data-analysis data-science r

Last synced: 02 Apr 2025

https://github.com/annnieglez/computer-vision-parking-lot

This project leverages computer vision techniques to analyze parking lot occupancy. The goal is to detect available parking spaces in real-time using image and video input.

computer-vision data-analysis data-science data-visualization google-colab image-classification image-processing machine-learning python transfer-learning

Last synced: 15 May 2026

https://github.com/karencofre/riesgorelativo-lookerstudio

Data analysis and predictive analysis project in Looker Studio and Google Colab

bigquery data-analysis data-science machine-learning matplotlib python sklearn sql

Last synced: 03 Jan 2026

https://github.com/kathisnehith/realestate-sales-analysis

Investigating real estate sales trends to understand market dynamics and inform investment decisions.

data-analysis excel realestate sales sql stastical-analysis-tools tableau

Last synced: 12 Feb 2026

https://github.com/madhursinghbhadoriya/data_analysis_sales_insights_using_tableau

Performed data cleaning using MySQL, data analysis and ETL in Tableau, and created an interactive dashboard with key information on sales insights, profit, and revenue analysis.

data-analysis data-visualization dataanalysis etl mysql tableau-dashboards tableau-desktop

Last synced: 09 Apr 2025

https://github.com/kartikey2807/bike-classification-1rt700

Binary classification problem involving Logistic regression, SMOTE and feature expansion.

data-analysis data-engineering data-visualization logistic-regression

Last synced: 30 Jul 2025

https://github.com/nishumehta/retail-sales-analysis

Retail sales performance analysis using Python and Power BI.

data-analysis ipynb-notebook jupyter-notebook powerbi python

Last synced: 15 May 2026

https://github.com/prakashjha1/whatsapp-chat-analyzer

WhatsApp Analyzer analyzes WhatsApp group activity. It tracks conversations and measures how much time is spent (or “wasted”) on WhatsApp.

data-analysis data-science natural-language-processing pandas pyhton regular-expression

Last synced: 15 May 2026

https://github.com/sanveed-adnan/supermarket-sales-sql-project

SQL-based data analysis project on supermarket sales performance using SQLite and Power BI.

business-intelligence data-analysis data-science data-science-projects data-visualization power-bi sales-data sql sqlite

Last synced: 08 Nov 2025

https://github.com/rachkat/random-foresst-analysis-r-studio-plotting-classification-tree

Classification analysis in R using the birthwt dataset. Built and compared Decision Tree and Random Forest models to predict low birth weight. Both achieved 71.05% accuracy, with Random Forest reducing overfitting and confirming maternal weight and age as key predictors.

classification data-analysis decision-trees machine-learning predictive-modeling r random-forest

Last synced: 04 Oct 2025

https://github.com/alanmenchaca/getting-and-cleaning-data-course-project

The purpose of this project is to demonstrate how to collect, work with, and clean a data set.

data-analysis getting-and-cleaning-data rstudio tidy-data

Last synced: 31 Jul 2025

https://github.com/teamtigers/echartify

A web application built with .NET Core 2.2 that reads the national election dataset of Bangladesh as quickly as possible and presents it with different statistical charts.

bangladesh chartjs code-first-migration cross-platform data-analysis data-structures data-visualization dotnet-core election-analysis election-data entity-framework-core materializecss mvc npoi razor-pages

Last synced: 16 Apr 2026

https://github.com/vara-co/solar-eclipse-2024

Group Project on the 2024 Solar Eclipse's Path over the US with an interactive map and a couple of visualizations on the data gathered.

data-analysis data-visualizations html-css-javascript interactive-map javascript map solar-eclipse

Last synced: 15 May 2026

https://github.com/k31ner/inmopipeline

Comprehensive project for analysis and predictive modeling of real estate data, covering collection, transformation, visualization, and machine learning using Python and modern data engineering and data science tools.

data-analysis data-engineering data-science fastapi python streamlit

Last synced: 08 May 2026

https://github.com/anas436/data-science-projects

Explore my diverse collection of projects showcasing machine learning, data analysis, and more. Organized by project, each directory contains code, datasets, documentation, and resources. Dive in to discover insights and techniques in data science. Reach out for collaborations and feedback.

data-analysis data-science machine-learning

Last synced: 27 Mar 2025

https://github.com/jovicdev97/Financial-Loan-DataScience-Notebook

Using NumPy and pandas to analyze a synthetic loan dataset with Python

data-analysis matlabplot numpy pandas plotting python seaborn

Last synced: 12 Mar 2025

https://github.com/cyberoctane29/epa-air-quality-aqi-analysis

This project involved analyzing air quality data from the EPA, focusing on the Air Quality Index (AQI). I used Python data structures like dictionaries and sets to manage and process the data, simulating real-world data analysis to assess pollution levels and their health implications.

data-analysis numpy pandas python statistics

Last synced: 10 Apr 2026

https://github.com/alrza2003/google-data-analysis-case-study-cyclistic

This project analyzes Cyclistic’s trip data to identify patterns in bike usage between casual riders and annual members. The findings help optimize marketing strategies and membership conversions.

business-task cyclistic-bike-share-analysis-case-study data-analysis data-science data-visualization google-data-analytics google-data-analytics-capstone-project google-data-analytics-professional jupyter-notebook python rmarkdown tableau

Last synced: 09 May 2026

https://github.com/ayeshathoi/simulation-sessional-412

Simulation of SSQS, Inventory System, Transient State, PERT, Monte Carlo Algo, etc.

data-analysis excel inventory-system monte-carlo python simulation ssqs triangle-distributions

Last synced: 31 Jul 2025

https://github.com/aalkiyumi/project-3-docker-container-for-data-processing-script

This Dockerized Python application analyzes two text files (IF.txt and AlwaysRememberUsThisWay.txt). It counts total words, identifies the largest file, and finds the top three most frequent words in each. Results are saved to an output file and printed to the console.

cs5165 data-analysis data-engineering data-science docker introduction-to-cloud-computing statistical-analysis text-processing uc uc2026 university-of-cincinnati

Last synced: 17 May 2026

https://github.com/jofaval/iris-flowers

Multiclass classification of the famous Iris flowers dataset from Ronald Aylmer Fisher in 1936

classification data-analysis data-science data-visualization google-colab iris-flowers kaggle machine-learning python scikit-learn xgboost

Last synced: 05 Apr 2026

https://github.com/mainak-97/netflix-content-analysis-project

SQL-based analysis of Netflix’s movies and TV shows dataset to uncover content trends, popular genres, geographical insights, and audience preferences. Includes data queries, findings, and a presentation of key insights.

data-analysis mysql mysql-workbench powerpoint presentation-slides sql

Last synced: 23 Sep 2025

https://github.com/remram44/apex-legends-ocr-data

Get data from Apex Legends streams using OCR

apex-legends data-analysis video-games

Last synced: 31 Jul 2025

https://github.com/chandkund/loan-eligibility-prediction

This project is designed to predict the eligibility of loan applicants based on various factors such as income, credit history, and marital status. By analyzing historical loan application data, the model helps to determine whether a loan application should be approved or not.

data-analysis data-science data-visualization machine-learning-algorithms matplotlib numpy pandas python seaborn

Last synced: 09 Apr 2026

https://github.com/farrelfaricaf/exploratorydataanalyst---titanic

This project analyzes the Titanic dataset using exploratory data analysis (EDA) and visualization techniques to identify survival patterns. The goal is to understand how demographic factors like gender and age influenced survival rates during the 1912 disaster.

data data-analysis data-science data-visualization eda python titanic-dataset

Last synced: 31 Jul 2025

https://github.com/jpgiant/training_project

Analyzing whether there is a difference between the average age at death of left-handers and right-handers using Bayes' theorem on conditional probability.

bayesian-statistics data-analysis data-visualization numpy pandas-dataframe python

Last synced: 30 Apr 2026

https://github.com/pauliorandall/airline-passenger-satisfaction-r

Analysing the Airline Passenger Satisfaction dataset from Maven Analytics

data-analysis data-analytics r

Last synced: 01 Aug 2025

https://github.com/computingvictor/mercadona_agent

Web app to explore supermarket products with advanced filters, search, favorites, and nutritional info. Includes data analysis notebooks for deeper insights.

css data-analysis data-science data-visualization filtering html interactive-ui javascript notebooks nutritional-info pandas product-catalog python supermarket webapp

Last synced: 09 Apr 2026

https://github.com/darkdk123/handwashing-discovery-analysis

A Guided Project in a Boot camp to Analyse the Original Data used in the Discovery of Viruses & Hand Washing By Dr. Ignaz Semmelweis in Vienna General Hospital in the 1840s.

data-analysis data-science data-visualization matplotlib-pyplot numpy pandas plotly-python python seaborn-plots

Last synced: 09 Apr 2026

https://github.com/kailenroa/sleep-efficiency-project

This project focuses on analyzing sleep efficiency using wearable technology data. It explores patterns in sleep behavior and key factors impacting sleep quality. A dashboard was created using Python and data visualization tools to provide actionable insights and recommendations for improving sleep health.

dashboard data-analysis html phyton sleep-efficiency

Last synced: 06 Jan 2026

https://github.com/hevalhazalkurt/word_analyser

A web app developed in Python and Django that analyzes given text mathematically and sentimentally.

analyzer analyzes content data-analysis django emotion python python3 sentiment sentiment-analyser sentiment-analysis text text-analysis

Last synced: 19 May 2026

https://github.com/aygp-dr/claude-log-stream

Advanced analytics engine for Claude Code logs with real-time processing capabilities

claude-api clojure data-analysis monitoring

Last synced: 24 Sep 2025

https://github.com/palwisha-18/time_series_analysis_lex_vs_gdp

Analyzes how a country’s GDP per capita correlates with the life expectancy of its citizens over a period of more than 100 years

data-analysis data-visualization pandas plotl time

Last synced: 19 May 2026

https://github.com/vipulbunny/ml-learning_projects

A collection of machine learning projects implemented in Python, showcasing core concepts like regression, classification, clustering, and model evaluation techniques. Ideal for learners and data science enthusiasts.

classification clustering data-analysis data-science data-visualization decision-trees jupyter-notebook machine-learning model-evaluation random-forest regression supervised-learning unsupervised-learning

Last synced: 23 Jul 2025

https://github.com/aravind2060/employee_engagement_analysis_spark

Using Spark Structured APIs to analyze employee data and extract insights related to employee satisfaction, engagement, concerns, and job titles within an organization.

apache-spark data-analysis data-preprocessing docker docker-compose python

Last synced: 09 Apr 2026

https://github.com/i-e-b/dynamictimewarp

A quick C# implementation of https://jeremykun.com/2012/07/25/dynamic-time-warping/

data-analysis pattern-matching working

Last synced: 17 Aug 2025

https://github.com/jasoncobra3/finops-copilot

An end-to-end AI-powered FinOps platform that ingests cloud billing data, analyzes cost trends, answers natural-language questions using a RAG pipeline (LangChain + FAISS + sentence-transformers + Groq), and provides actionable cost optimization recommendations. Includes a FastAPI backend and Streamlit dashboard UI - fully containerized with Docker

ai-assistant cloud-cost-optimization cloud-enginee cost-analytics data-analysis devops docker faiss faiss-vector-database fastapi finops groq langchain llm pandas rag rag-pipeline sentence-transformers sqlite3 streamlit

Last synced: 13 Apr 2026

https://github.com/wwgolay/hr1099-timelapse-vlbi

The repository for HR1099 timelapse VLBI.

astronomy astrophysics data-analysis website

Last synced: 03 Apr 2025

https://github.com/galal-pic/advanced_regression

A project to predict house prices using different machine learning techniques

data-analysis data-science deep-learning feature-engineering flask machine-learning python regression

Last synced: 08 Jul 2025

https://github.com/rodolfo-brandao/pos-graduacao

[pt-BR] Repository for storing materials and projects from each module of my specialization in Data Science (2025–2027)

artificial-intelligence data-analysis data-science data-visualization databases deep-learning jupyter linear-algebra machine-learning python r statistics

Last synced: 09 Apr 2026

https://github.com/0xunkn0wn4m1r/data_engineering_banking_project

🏦 Build a complete data engineering workflow for a banking system, showcasing ETL processes, data transformations, and an interactive financial dashboard.

automation data-analysis data-cleaning data-science feature-engineering fintech-bank flask-api loan-default-prediction machine-learning mlops model-explainability numpy postgresql scikit-learn segmentation shap sql unsupervised-learning

Last synced: 09 Apr 2026

https://github.com/puspacempaka/superstore-analysis-with-sql

This repository showcases various data analyses on the popular Superstore dataset using SQL queries. The analyses cover a range of business insights, including sales performance, customer segmentation, and product profitability. Each analysis is documented with the SQL queries used and explanations of the steps involved.

business-intelligence data-analysis sales-analysis sql superstore-dataset

Last synced: 09 Mar 2026

https://github.com/analyst-lochan/flight-delay-and-cancellation-dataset-2019-2023-

This project demonstrates a complete data analytics pipeline starting from raw real-world flight data to professional visual dashboards using SQL Server and Power BI. It showcases data import, cleaning, optimization, transformation, and dynamic DAX-based visual reporting.

airline-performance business-intelligence data-analysis data-cleaning data-modeling data-visualization dax etl flight-data kaggle-dataset portfolio-project powerbi powerbi-dashboard sql sql-server

Last synced: 09 Sep 2025

https://github.com/vishal-bhandary/sql-data-analytics

This repository contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

analytics business-intelligence customer-segmentation dashboarding data-analysis data-reporting data-visualization data-warehouse etl kpi product-analysis sql sql-server star-schema t-sql

Last synced: 02 Aug 2025

https://github.com/ashwin331133/powerbi-data_professional_survey_breakdown

This project analyzes survey data from individuals interested in transitioning to the data field. The survey aims to understand their backgrounds, motivations, and the challenges they face. Using Power BI for data visualization, the project provides insights into the demographics and preferences of these aspirants.

data-analysis data-visualization powerbi

Last synced: 03 Jan 2026

https://github.com/tharun2806/end-to-end-internship-data-analysis

Internship Dataset Analysis is an end-to-end project analyzing an internship dataset obtained from Kaggle. The project involves cleaning and preprocessing the data using Excel and SQL, followed by exploratory data analysis (EDA). The analysis includes statistical, sectoral and geospatial insights, visualized through an interactive Tableau dashboard

bigquery data-analysis data-cleaning data-preprocessing data-visualization exploratory-data-analysis geospatial-analysis microsoft-excel reporting sectoral-analysis statistical-analysis tableau-public

Last synced: 01 Apr 2025

https://github.com/jigyasag18/ai-ml-salaries-and-ai-tools-usage-trends

This repository presents an in-depth Power BI analytics report on AI job market trends and student AI tool usage from 2020 to 2025. It combines structured datasets (job postings, salaries, surveys) with custom DAX measures to uncover key patterns in salaries, remote work, industry demand, and student engagement. Includes 5 interactive dashboards.

analysis data data-analysis data-visualization dataanalysis dataanalytics dataset datavisualization power-bi powerbi powerbi-dashboards powerbi-desktop powerbi-report powerbi-visuals powerbidashboard visualization

Last synced: 16 Feb 2026

https://github.com/jigyasag18/global-terrorism-1970-2017-analysis-using-big-data

This repository explores over 180,000 terrorist incidents across 205 countries using Hadoop and Power BI. The project identifies global and regional patterns in terrorism, analyzes the impact on civilians, and highlights high-risk areas. Key insights include attack trends, weapon usage, top terror groups, and country-specific risks like those in India.

big-data big-data-analytics data data-analysis data-visualization dataanalytics dataset hadoop hive hive-database hive-db hivedb power-bi powerbi powerbi-dashboards powerbi-desktop powerbi-report powerbi-report-validation powerbi-visuals powerbidashboard

Last synced: 19 Feb 2026

https://github.com/jigyasag18/airline-performance-and-passenger-satisfaction-project-using-big-data-analytics

This project analyzes 10 years of U.S. domestic airline data (~3GB) using Hadoop (Cloudera) and Hive for data processing. Power BI dashboards visualize key metrics like delays, on-time rates, air time, and diversions. The solution includes Hive queries, DAX measures, HDFS ingestion scripts, and year-wise insights with recommendations.

big-data big-data-analytics bigdata cloudera cloudera-hadoop cloudera-hadoop-framework data data-analysis data-visualization database hadoop hive power-bi powerbi powerbi-dashboard powerbi-dashboards powerbi-report powerbi-visuals powerbi-visuals-tools powerbidashboard

Last synced: 01 Aug 2025

https://github.com/alexzalox/us_stocks

Read US stock tickers and their costs from a CSV and display the formatted DataFrame in the terminal using pandas.

data-analysis finance pandas python python3 stocks yfinance

Last synced: 15 May 2026

https://github.com/juliargubolin/sql-for-data-analysis

This repository was created in order to insert all the documents, files and notes I took while learning SQL and data analysis through "SQL for Data Analysis: Advanced Techniques for Transforming Data Into Insights" by Cathy Tanimura (O'Reilly).

advanced data-analysis data-science sql

Last synced: 11 Jan 2026

https://github.com/darkdk123/house-valuation-model

A bootcamp challenge project to create an ML model that predicts house prices in Boston, Massachusetts from multiple parameters using multivariable regression.

data-analysis data-science data-visualization matplotlib-pyplot multivariate-regression predictive-modeling statistics

Last synced: 07 Jul 2025

https://github.com/ryanga09/digitalent_fundamentaldatascience-selfpractice

A repository of hands-on projects from DigiTalent’s Fundamental Data Science training, covering web scraping, data exploration, data cleaning, and data annotation. Includes Jupyter notebooks and example code for practical learning.

data data-analysis data-science data-visualization dataset digitalent komdigi notebook-jupyter notebooks

Last synced: 02 Aug 2025

https://github.com/dlozeve/topological-persistence

Topological persistence diagram (barcode) of a triangulation

data-analysis persistence topology

Last synced: 02 Aug 2025

https://github.com/ausaaf-rh/movie-recommendation-system-collaborative-filtering

🎬 A comprehensive movie recommendation system implementing item-based collaborative filtering with cosine similarity. Features real-time recommendations, performance evaluation metrics (Precision@K, Recall@K), and interactive user interface. Built with Python, scikit-learn, and MovieLens dataset for academic research and learning purposes.

agents data-analysis jupyter-notebook python python3

Last synced: 17 Apr 2026

https://github.com/s1m0n38/cr-analysis

An exercise in data collection/analysis

clash-royale data-analysis data-collection data-science

Last synced: 08 Jul 2025

https://github.com/takshak26/predict_blood_donations-

The project is titled “Predict Blood Donations”. It uses Python as the language, data science and machine learning as the field of operation, the TPOT library for model selection, logistic regression for model building, and Jupyter Notebook as the code editor.

data-analysis data-visualization datascience machine-learning python3

Last synced: 16 May 2026

https://github.com/muneeb706/r-programming

R-Programming examples for data analysis.

data-analysis r-programming

Last synced: 26 Mar 2025

https://github.com/jotstolu/netflix-sql-data-analysis-project

This project explores the Netflix dataset using SQL queries to uncover trends, patterns, and business insights that could help stakeholders understand content distribution, viewer preferences, and platform optimization

data-analysis sql sql-server tsql

Last synced: 02 Aug 2025

https://github.com/edoardotosin/january-2025-southern-california-wildfires-burn-severity-sentinel2

Scripts and data for analyzing burn severity of the January 2025 Southern California wildfires using Sentinel-2 satellite imagery. This project explores the use of the Differenced Normalized Burn Ratio (dNBR) and Relativized Burn Ratio (RBR) to classify burn severity, leveraging publicly available satellite data.

burn-severity copernicus data-analysis earth-observation satellite-imagery sentinel-2 wildfire wildfire-detection wildfires

Last synced: 09 Feb 2026

https://github.com/harshindcoder/online_retail_data_clustering_project

This marketing analytics project uses RFM (Recency, Frequency, Monetary) features for customer classification, inspired by the online retail mining paper. The RFM model helps segment customers, identify high-value ones, and optimize marketing strategies.

customer-segmentation data-analysis data-visualization market-analytics

Last synced: 17 Aug 2025

https://github.com/nushratjabenaurnima/cse_477_data_mining

A collection of labs, reports, Jupyter notebooks, and project outputs for the CSE 477 Data Mining course. This repository tracks my learning journey through data preprocessing, association rules, clustering, classification, and real-world data analysis with Python.

data data-analysis data-mining data-science google-colab-notebook jupyter-notebook machine-learning python python-3

Last synced: 09 Apr 2026

https://github.com/jibbs1703/airline-data-analysis

This repository contains the Exploratory Data Analysis of the flight delay and cancellation for airline flights in the United States in the year 2015. With this EDA, insights and solutions are suggested for business owners and airport managers.

business-insights business-solution data-analysis data-visualization

Last synced: 20 Mar 2025

https://github.com/kavicastelo/colab

This repository includes practical Jupyter notebooks for data analysis and model training using a soil fertilizer dataset. (use 4th edition)

data-analysis jupyter-notebook python

Last synced: 26 Mar 2025

https://github.com/nahiyanhkhan/stock-market-data-analysis_capstone-project

In this course, I learned and completed assignments on SQL and Python. The final capstone project analyzed "Stock Market Data". Achieved a 100% score in every assignment.

data-analysis data-analytics matplotlib mysql mysql-database numpy pandas python sql

Last synced: 09 Apr 2026

https://github.com/codeonthespectrum/web-scrap

This project scrapes Wikipedia to obtain data on the most populous municipalities in the state of Rio de Janeiro.

data-analysis data-visualization webscraping

Last synced: 16 Feb 2026

https://github.com/syed-amjad-ali/restaurant-sales-sql-project

This was a simple SQL project where I analyzed restaurant sales data, showcasing skills in data creation and querying. The project explores menu performance, order trends, and customer insights.

aggregations business-intelligence data-analysis guided-project joins maven-analytics querying restaurant-sales sales-data sql subqueries

Last synced: 03 Jan 2026

https://github.com/davidzajac1/four-percent-rule-pandas-analysis

Analysis of the 4% Personal Finance Rule of Thumb

data-analysis data-visualization pandas python

Last synced: 20 Apr 2026

https://github.com/quesocosteno03/data-analysis-projects

This repository serves as a collection of all my projects.

data-analysis jupyter-notebook powerbi

Last synced: 02 Aug 2025

https://github.com/lulloooo/bizdata-nexus

Collection of my Business & Data Analysis projects, from professional/academic endeavors to passion-driven explorations 📊

business-analysis data-analysis economics etl excel finance mysql python r risk-analysis

Last synced: 05 Apr 2026

https://github.com/maazie-khan/olympics-data-enigeering

Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.

azure big-data data-analysis dataengineering devops pipeline

Last synced: 13 May 2026

https://github.com/yamslam/contentsunderpressure_processing

A repository for data processing and analysis for Contents Under Pressure.

data-analysis data-processing data-visualization game-based-learning judgments process-safety

Last synced: 07 Sep 2025