An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/neuralsignal/loris

Loris: Database and Analysis application for a Drosophila Lab (or any lab)

data-analysis data-structures database datajoint flask neuroscience

Last synced: 12 Mar 2026

https://github.com/falakrana/data-analysis-visualization

This repository showcases data analysis and visualization projects using Python and Tableau. It includes exploratory data analysis, interactive dashboards, and insightful visual stories derived from real-world datasets.

data-analysis data-visualization python tableau-public

Last synced: 01 May 2026

https://github.com/devag2004/electricity-analysis-using-spark

Electricity analysis project built using Spark.

data-analysis spark spark-mllib

Last synced: 01 May 2026

https://github.com/ronitjariwala/prodigy_ds_02

Prodigy InfoTech Data Science Internship Task-2

data-analysis python

Last synced: 28 Apr 2026

https://github.com/shruti-h/netflix-eda

Exploratory Data Analysis on Netflix Movies & TV Shows dataset using Python, Pandas, Matplotlib, and Seaborn

data-analysis data-science eda matplotlib netflix pandas-library python seaborn

Last synced: 01 May 2026

https://github.com/skysign/dat

A study group for learning data analysis together.

data data-analysis data-science

Last synced: 02 Jan 2026

https://github.com/aneeshmurali-n/project-ml-data-preprocessing

The main objective of this project is to design and implement a robust data preprocessing system that addresses common challenges such as missing values, outliers, inconsistent formatting, and noise. By performing effective data preprocessing, the project aims to enhance the quality, reliability, and usefulness of the data for machine learning.

data-analysis data-cleaning data-encoding data-exploration feature-scaling label-encoding matplotlib minmaxscaler numpy one-hot-encoding outlier-detection pandas standardscaler

Last synced: 02 May 2026

https://github.com/karlyndiary/spotify-excel-dashboard

Data Analysis on the Spotify Dataset using Microsoft Excel and VBA.

charts data-analysis data-cleaning data-visualization excel excel-export excel-vba pivot-tables

Last synced: 04 Jan 2026

https://github.com/mstovarh/analisis-de-bebidas-de-starbucks

This repository contains charts based on various characteristics of Starbucks beverages, created with ChatGPT's Data Analysis tool, Excel, and PowerQuery.

chatgpt data-analysis excel powerquery

Last synced: 15 Apr 2025

https://github.com/mmfava/lonomia-host-plants-2024

This project investigates the relationship between Lonomia achelous and Lonomia obliqua caterpillars and their host plants. The project uses Docker for a consistent environment and R for statistical analysis, with detailed processes documented in Jupyter notebooks.

data-analysis host-plants lonomia lonomism r

Last synced: 01 May 2026

https://github.com/beyzabasarir/brazilian-e-commerce-analysis

PostgreSQL analysis of the Brazilian E-Commerce dataset by Olist.

data-analysis data-visualization sql

Last synced: 08 Jan 2026

https://github.com/haideratgh/sql-data-analytics-project

This repository contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data-segmentation, and part-to-whole analysis.

analytics business-analytics business-intelligence data data-analysis data-analyst data-analytics data-engineering data-science data-scientist database datascience query reporting sql sql-query sql-server window-functions-in-sql

Last synced: 29 Jun 2025

https://github.com/deypadma2020/sql_project

✏️ A collection of practical SQL case studies and solutions exploring real-world business scenarios: car showroom analysis, esports tournament, customer insights, finance analysis, pricing strategy, and marketing analytics.

business-intelligence case-study data-analysis database mysql queries sql

Last synced: 30 May 2026

https://github.com/akmj1011/hill-and-valley-prediction-using-logistic-regression

Created a prediction system that uses logistic regression to distinguish hills from valleys in the given datasets.

cloud-computing data-analysis data-manipulation data-preprocessing data-transformation data-visualization google-colab

Last synced: 13 May 2026

https://github.com/deypadma2020/dataanalysis-mlalgo

Practice repository for data analysis, feature engineering, statistics, web scraping, and building ML model pipelines in Python.

data-analysis eda feature-engineering machine-learning-algorithms ml-pipeline statistics web-scraping

Last synced: 30 May 2026

https://github.com/abhijeet107/final-project

Final project summary: internship projects (2 weeks).

data-analysis data-cleaning-and-preprocessing excel mysql-database python tableau-public

Last synced: 23 Feb 2026

https://github.com/mahmoudwal27/sql-data-analysis

This project demonstrates SQL operations for managing student enrollments, including creating tables, inserting data, updating records, and running queries to analyze student and course information. It showcases skills in data manipulation, aggregation, and advanced query formulation.

analysis data-analysis sql sql-data-analysis sql-queries

Last synced: 13 Feb 2026

https://github.com/idb-devs/dataanalyticsairbnb

Build a price-prediction model that lets an ordinary property owner know how much to charge per night for their property.

data-analysis data-science jupyter python

Last synced: 18 Apr 2026

https://github.com/hi-jin2/data-analysis-basics

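A collection of source code written during the Data Analysis Fundamentals (R) course. I learned the R language with the textbook 『모두를 위한 R 데이터 분석 입문』 (Introduction to R Data Analysis for Everyone).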

data-analysis r r-studio

Last synced: 19 Jul 2025

https://github.com/greatwoman23/hotel_reservation_analysis

In this project, we delve into the intricate world of hotel reservations, utilizing a multifaceted analytical approach to uncover valuable insights. Through a combination of SQL queries and Tableau visualizations, we meticulously dissect a rich dataset comprising booking details, customer demographics, and reservation statuses.

data-analysis data-science data-visualization hotel hotel-reservation publications sql sql-query sqlite3 tableau

Last synced: 15 May 2026

https://github.com/anuppm9917/super-store-sales-analysis-power-bi-project

Driven by a wish to know which products, regions, categories, and customer segments a company should target or avoid, I searched for and selected a Kaggle dataset that matches a standard superstore's requirements.

data data-analysis data-visualization datacleansing excel exploratory-data-analysis jupyter-notebook numpy pandas plotly powerbi python3

Last synced: 10 Apr 2026

https://github.com/dimits-ts/college_analysis

A statistical study about US college admissions, featuring a full report in LaTeX.

anova data-analysis exploratory-data-analysis linear-regression statistics

Last synced: 25 Jan 2026

https://github.com/lanzafame/polycarp

[WIP] Subset operations on latlon data read from CSVs

data-analysis geospatial wip

Last synced: 12 Jan 2026

https://github.com/cdeweyx/bryce-harper-2016-analysis

Notebook analyzing Bryce Harper's disappointing 2016 campaign in historical context through data analytics.

data-analysis data-visualization python

Last synced: 01 May 2026

https://github.com/mikma03/datascience_python_datacamp

DataScience with Python. Code and examples. Python libraries, including pandas, NumPy, Matplotlib, and many more.

data-analysis data-science datacamp datascience numpy pandas python

Last synced: 06 May 2026

https://github.com/virajbhutada/diamond-price-estimator

This project develops a predictive model to estimate diamond prices based on characteristics like carat, cut, color, and clarity. It covers data preprocessing, feature engineering, model selection, training, and evaluation. The final product is a web app where users can input diamond attributes to get accurate and instant price predictions.

cross-validation css data-analysis data-science-projects data-visualization eda feature-engineering html hyperparameter-tuning jupyter-notebooks machine-learning ml-algorithms model-deployment model-selection performance-optimization predictive-modeling python python-app user-interface

Last synced: 14 Apr 2026

https://github.com/codeslash21/wrangle-twitter-archive

Wrangle Twitter Archive WeRateDogs. WeRateDogs has 8M followers and rates dogs with funny comments and a unique rating system. The project also uses a dog-breed classifier to predict the breed of the dog in each tweet.

data-analysis data-wrangling neural-networkt twitter-api twitter-archive

Last synced: 10 Apr 2025

https://github.com/codeslash21/wrangle_twitter_archive

Wrangle Twitter Archive WeRateDogs. WeRateDogs has 8M followers and rates dogs with funny comments and a unique rating system. The project also uses a dog-breed classifier to predict the breed of the dog in each tweet.

data-analysis data-wrangling nanodegree-project neural-network twitter-api twitter-archive

Last synced: 10 Apr 2025

https://github.com/spacebakery/nba-trends-project

Data Science Foundations I | Exploratory Data Analysis in Python | Summarizing Relationship Between Two Features

categorical-variables data-analysis data-visualization matplotlib nba-dataset quantitative-variables scipy seaborn subset summary-statistics

Last synced: 11 Mar 2025

https://github.com/virajbhutada/music-store-data-analysis-sql

Hands-on SQL data analysis project for music store. Enhance proficiency with database queries. Ideal for practitioners seeking real-world analytics experience. Gain insights into customer behavior, revenue trends, and genre preferences, empowering strategic decision-making in the music industry. Explore the project for a rich learning experience.

data-analysis data-insights data-science database genre-prediction music-industry music-store postgresql postgresql-database query-optimization revenue-trends sql sql-queries

Last synced: 01 May 2026

https://github.com/evanwporter/sloth

Faster Pandas Dataframe

cython data-analysis dataframe pandas

Last synced: 14 Mar 2025

https://github.com/samruddhi3012/public-health-data-analysis

Hi! This repo analyzes healthcare data using advanced Microsoft Excel features.

dashboard data-analysis data-visualization healthcare microsoft-excel pivot-chart pivot-tables vlookup

Last synced: 05 Feb 2026

https://github.com/namratagulati/fraud_detection

A fraud detection model built on linear regression, using feature scaling and feature engineering, and evaluated with the AUC-ROC curve and other metrics.

data-analysis data-visualization machine-learning machine-learning-algorithms machinelearning-python

Last synced: 04 Jun 2026

https://github.com/sarincr/training-on-artificial-intelligence

Entree Academy's free 10-day training on artificial intelligence. The course is delivered in a blended-learning format, with daily one-hour online sessions and three-hour project-based training.

artificial-intelligence artificial-intelligence-algorithms data-analysis data-science data-visualization decision-trees deep-learning deeplearning logistic-regression machine-learning machine-learning-algorithms machinelearning num numpy pandas regression scikit-learn scipy sklearn

Last synced: 10 Apr 2026

https://github.com/shellynagar27/merchandise-sales-analysis

Merchandise Sales Analysis explores the sales trends of influencer Lee Chatmen’s merchandise using Power BI and Power Query. The project uncovers key insights on revenue, product performance, location impact, shipping trends, and customer reviews.

critical-thinking data-analysis data-visualization figma powerbi powerquery problem-solving

Last synced: 07 Apr 2025

https://github.com/ankitpoddar07/sqlpizzas-saleproject

🍕 Pizza Sales Analysis with SQL

data-analysis database excel mysql powerbi ppt python

Last synced: 09 May 2026

https://github.com/akashprak/socialnetworkads

Predicting customer purchase behavior from the Social Network Ads dataset.

data-analysis machine-learning mlflow pandas python scikit-learn seaborn xgboost

Last synced: 30 Mar 2025

https://github.com/nmelgar/healthy_child_dataviz

Data visualization project to analyze what a healthy child is.

analysis data data-analysis data-science data-visualization dataviz research tableau visualization

Last synced: 23 Feb 2026

https://github.com/samwhaaa/da_portfolio

Showcasing some of my Data Analytics projects

data-analysis data-analytics data-visualization jupyter jupyter-notebook python

Last synced: 01 Mar 2025

https://github.com/satyam4229/prediction-of-cement-compressive-strength

A regression model that predicts the compressive strength of cement for various mixtures of its components.

data-analysis data-science data-visualization jupyter-notebook kaggle python

Last synced: 13 Apr 2026

https://github.com/sadratehranian/data-collection-and-machine-learning

First, creates a logistic regression model to predict whether a smoke detector's fire alarm should sound. Second, predicts whether an electric drive in a production plant may be faulty.

data data-analysis data-science datacollection logistic-regression machine-learning ml nn

Last synced: 05 Jan 2026

https://github.com/wilfordaf/dataanalyst-test

Test task for Junior Data Analyst position

data-analysis pandas python trading-data

Last synced: 28 Feb 2025

https://github.com/purposeachiever6/discovering_hidden_pattern

Discovering Hidden Patterns in Sequential and Numerical Data

data-analysis r statistical-analysis

Last synced: 28 Feb 2025

https://github.com/pratik-khose/realtime-sales-simulation

Power BI: Realtime Sales Simulation using SQL Server and DirectQuery

data-analysis data-analytics data-visualization dax-query powerbi sql sql-server sqlserver

Last synced: 10 Jun 2026

https://github.com/robinmillford/cardiac-care-performance-dashboard

This project presents a comprehensive data analysis and interactive dashboard focused on Cardiac Surgery and Percutaneous Coronary Interventions (PCI) performance by hospital, spanning from 2008 onwards.

cardiac data-analysis data-visualization plotly-express streamlit-dashboard tableau tableau-public

Last synced: 07 Sep 2025

https://github.com/iness000/online-retail-customer-segmentation

This project performs comprehensive customer segmentation analysis on an online retail dataset using machine learning clustering techniques and RFM (Recency, Frequency, Monetary) analysis. The goal is to identify distinct customer segments to drive better customer relationship management strategies and business insights.

customer-segmentation data-analysis k-means

Last synced: 31 Aug 2025

https://github.com/als8446/tripleten-data-science-projects

Projects Overview: projects made in the Data Scientist course from TripleTen LatAm.

data data-analysis hypothesis-tests machine matplotlib numpy pandas python scipy sklearn

Last synced: 10 Apr 2026

https://github.com/kalyan4636/chocos-sales-analysis-report-and-dashboard.-

📊 Built using Power BI, this dashboard delivers actionable insights to boost strategic decision-making.

bussiness-analyst data-analysis data-visualization dataanalyst microsoft-power-bi powerbi

Last synced: 26 Jan 2026

https://github.com/ariyaarka/result-analysis

A simple analysis of results based on different factors, shown in figures.

data-analysis jupyter-notebook matplotlib numpy-library pandas-dataframe python seaborn

Last synced: 01 May 2026

https://github.com/bpkaur/a-network-analysis-of-game-of-thrones

A network analysis of Game of Thrones: analyzing the co-occurrence network of the characters in the Game of Thrones books.

data-analysis data-science machine-learning networkx python3

Last synced: 01 May 2026

https://github.com/mysftz/statistical-analysis

An in-depth review of statistical analysis of datasets in Python.

data-analysis python python3 statistics university university-project

Last synced: 14 May 2025

https://github.com/dnut/associations

Python 3 library to identify high-dimensional statistical relationships in any data set.

analytics arch-linux association-rules data data-analysis data-mining data-science machine-learning python-modules

Last synced: 01 May 2026

https://github.com/dcs-training/introtostatistics

This repository contains all the materials for the introduction to statistics course. See the README file.

data-analysis r rmarkdown statistics

Last synced: 26 Mar 2025

https://github.com/auliannee/customer-analysis-with-tableau

This repository contains the data source and the tableau workbook.

data-analysis data-visualization tableau

Last synced: 12 Mar 2026

https://github.com/as16082023/motor-vehicle-thefts

Using SQL to analyze vehicle theft patterns across New Zealand, focusing on trends related to specific times and locations.

data-analysis mysql sql

Last synced: 10 Apr 2025

https://github.com/leandrocollares/home-team-advantage-in-epl

Home team advantage in the English Premier League: an exploratory data analysis

data-analysis matplotlib pandas plotly

Last synced: 11 Jun 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/myounesdev/authorgraphanalyzer

A web-based visualization tool for analyzing and exploring author collaboration networks.

algorithms binary-tree bts d3js data-analysis dijkstra-algorithm django exception-handling pandas python scss

Last synced: 08 Jun 2026

https://github.com/caesaredia/la-cafe-market-analysis

A data-driven feasibility study exploring the potential of launching a robot-staffed café in Los Angeles, based on real F&B business data.

business-intelligence cafe data-analysis data-visualization food-industry franchise los-angeles market-research pandas python

Last synced: 01 May 2026

https://github.com/sairupeshl/leo-orbital-congestion-analysis

Geospatial data analysis of the UCS Satellite Database using Python to map active LEO space assets, validate orbital parameters, and isolate mega-constellation traffic bottlenecks.

aerospace-engineering data-analysis geospatial-analysis orbital-mechanics pandas python satellite-data seaborn

Last synced: 08 Jun 2026

https://github.com/mysftz/numerical-methods-in-matlab

Multiple MATLAB scripts for several data analysis assignments.

data-analysis data-science matlab university university-assignment

Last synced: 14 May 2025

https://github.com/manjit-baishya-datascience/flipkart-laptop-listing-eda

This project analyzes laptop price data from Flipkart using AutoScraper for web scraping. It includes data loading, EDA, cleaning, statistical analysis, and visualization. The goal is to derive insights for pricing strategies and market positioning. Explore the repository for detailed documentation and code.

data-analysis ecommerce-platform flipkart laptop python

Last synced: 08 Jun 2026

https://github.com/juanmerino89/data-job-market-analysis-project

A complete analysis of the job market using open data, scraping, and visualizations. The project is explained step by step on my YouTube channel.

career-insights data-analysis data-science job-data job-market jupyter-notebook machine-learning market-trends open-data portfolio-project python salary-analysis visualization web-scraping youtube-project

Last synced: 18 May 2026

https://github.com/deanlogan/data-analysis-course

Code created when completing the Data Analysis with Python Course on freecodecamp.org

course data-analysis numpy pandas python python3

Last synced: 06 May 2026

https://github.com/singhs05/global-youtube-trends

Understand the impact of likes, comments, and dislikes on video consumption for trending videos.

data-analysis mssqlserver query sql

Last synced: 18 Mar 2026

https://github.com/pratanup/solar-power-generation-prediction

A solar power generation company wants to optimize solar power production and needs a prediction model for ‘Clearsky DHI’, ‘Clearsky DNI’, and ‘Clearsky GHI’.

anaconda data-analysis data-science google-colab jupiter-notebook machine-learning machine-learning-algorithms machinelearning-python prediction prediction-model python

Last synced: 01 May 2026

https://github.com/soypete/example-go-dataframes-parser

Example of using the gota dataframe package: https://godoc.org/github.com/kniren/gota/dataframe

data-analysis data-science datastructures golang-examples ml

Last synced: 12 Sep 2025

https://github.com/ryanbbrown/volleyball-analysis-project

Analyzes 10 years of self-collected men's NCAA volleyball player height and team wins data to determine the importance of height for success.

data-analysis data-visualization python volleyball

Last synced: 31 May 2026

https://github.com/codesaadumair/data-science-monorepo

Comprehensive Data Science monorepo featuring EDA, Machine Learning, Preprocessing, Feature Engineering, and Visualization projects with Jupyter notebooks and Python.

data-analysis data-science data-science-projects data-visualization eda jupyter-notebook jupyterlab machine-learning python

Last synced: 01 May 2026

https://github.com/linguini1/edueval

The BorealisAI Let's Solve It mentorship project: summarizing student feedback submissions on their professor into one cohesive paragraph for faculty consideration during performance reviews.

ai data data-analysis data-science machine-learning machinelearning nlp python pytorch sentiment-analysis

Last synced: 01 May 2026

https://github.com/devanshsahu47/talentscape-glassdoor-analysis

TalentScape is an end-to-end Python project that cleans and analyzes a comprehensive Glassdoor Jobs dataset. It features robust data wrangling and 20 insightful visualizations to uncover trends in job titles, salary ranges, company ratings, and more, providing actionable recommendations to optimize recruitment and compensation strategies.

business-intelligence data-analysis data-vizualisation jupyter-notebook python3

Last synced: 15 May 2026

https://github.com/firetyrant/sql-portfolio-projects

Documenting my SQL learning journey with hands-on projects focused on data cleaning, analysis, and optimization.

bigquery data-analysis databases etl learning portfolio query-optimization sql

Last synced: 19 Apr 2026

https://github.com/xre22zax/airline-analysis

For travel agencies that need to know the ins and outs of airline prices for their clients.

data-analysis data-visualization python python3 visualization

Last synced: 13 Apr 2026

https://github.com/esther-poniatowski/multitask-context-dependent-behavior

Data analysis of neuronal recordings in naive and trained animals performing multiple tasks in active and passive attentional states

cognitive-neuroscience computational-neuroscience data-analysis data-visualization information-processing

Last synced: 26 Mar 2025

https://github.com/poglolopez/prueba_tecnica_inlaze

This repository showcases my data analysis skills through a technical test for Inlaze. It includes workflows with Python, SQLite, and Power BI to analyze player behavior, deposits, and traffic-source performance, highlighting operational efficiency and strategic insights.

data-analysis data-v etl jupyter powerbi python sqlite

Last synced: 26 Feb 2025

https://github.com/luminati-io/walmart-dataset-samples

A sample dataset of over 1000 Walmart products, extracted using the Bright Data API, ideal for consumer market insights and competitor analysis.

api data-analysis dataset walmart walmart-scraper web-scraping

Last synced: 04 Jan 2026

https://github.com/satyam4229/prediction-of-different-diseases

Predicts different diseases from the symptoms they present in real time. The model is trained on a dataset of 132+ different symptoms to identify the most likely disease.

data-analysis data-science data-visualization jupyter-notebook kaggle python

Last synced: 13 Apr 2026

https://github.com/ragedunicorn/mantisx-notebook

A repository for Jupyter notebooks analysing mantisx data

data-analysis data-visualization mantis mantisx shooting training

Last synced: 24 Jul 2025

https://github.com/roshaka/samplr

Samplr is a Python decorator for selecting a subset of items from a list, with options for customisation and informative console printouts.

data data-analysis data-engineering decorators list python sampling

Last synced: 14 Jan 2026

https://github.com/agdturner/ccg-data

A modularised Java library for processing data sets with classes for: data records; collections of data records; and identifiers.

data data-analysis

Last synced: 12 Jan 2026

https://github.com/rainbowatcher/simple

Makes data work easier and saves you working time.

bigdata data-analysis etl

Last synced: 10 Apr 2025

https://github.com/mattholy/haka

HaKa is an out-of-the-box tool system designed for data engineers and data analysts in medium-sized enterprises. It is easy to deploy and scale.

celery data-analysis data-engineering fastapi python uvicorn-gunicorn

Last synced: 19 May 2026

https://github.com/obirikan/u.s.-county-commute-data-analysis

This project extracts and analyzes U.S. county-level commuting data from the 2020 American Community Survey (ACS 5-Year Estimates) via the U.S. Census Bureau API.

data-analysis

Last synced: 28 Jun 2025

https://github.com/karlyndiary/adidas-sales-analysis

Analyzed Adidas' product sales performance, top retailers, monthly trends, yearly growth, regional distribution, and pricing insights. Performed ETL from Python (Pandas) to SQL Server, extracted data with SQL, and visualized key insights in Excel.

adidas-sales-analysis adidas-sales-dashboard dashboard data-analysis data-cleaning data-pipeline data-visualization etl excel-dashboard microsoft-excel microsoft-sql-server python

Last synced: 10 Feb 2026

https://github.com/parthshah02/customer_churn_dashboard

This repository features a comprehensive project showcasing data analysis and an interactive dashboard built with Python.

data-analysis matplotlib numpy pandas python

Last synced: 13 Apr 2026