An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/robinmillford/sales-metrics-dashboard-streamlit

This Streamlit dashboard provides an interactive and comprehensive analysis of customer behavior, regional sales trends, and revenue insights. The dashboard enables businesses to identify key performance metrics, customer segments, and revenue drivers, supporting data-driven decision-making.

dashboard data-analysis data-visualization duckdb sales-analysis sales-dashboard streamlit-dashboard

Last synced: 19 Apr 2026

https://github.com/decepticon-ts/cap-ai-studio

A modern, powerful web application for advanced image analysis and batch processing, featuring real-time AI-powered image captioning, comprehensive reporting, and an intuitive user interface. Built with Streamlit and Google's Gemini API.

artificial-intelligence batch-processing computer-vision data-analysis gemini-api image-processing image-processing-python python streamlit streamlit-webapp threading

Last synced: 19 Apr 2026

https://github.com/samwhaaa/superfoodsmax

A customer demographic & spending trend analysis on the fictional SuperFoodsMax grocery chain

data-analysis data-analytics data-visualization jupyter jupyter-notebook python

Last synced: 20 Apr 2026

https://github.com/jerinpious/movie-recommendation-system

A content-based movie recommendation system built using Python. The system processes movie data, extracts relevant features, and provides recommendations based on user preferences

content-based-recommendation data-analysis jupyter-notebook machine-learning pandas python streamlit

Last synced: 20 Apr 2026

https://github.com/xre22zax/roller-coaster

Explore award-winning wood and steel coasters from 2013-2018 Golden Ticket Awards & Captain Coaster, all powered by Python and interactive visualizations.

analytics data-analysis data-visualization pandas python python-lambda python3 visualization

Last synced: 20 Apr 2026

https://github.com/abinashsahoo007/project-bankruptcy-prevention

This project creates a classification model that predicts the chance of a business facing bankruptcy based on key features such as Industrial Risk, Management Risk, Financial Flexibility, Credibility, Competitiveness, and Operating Risk.

data-analysis data-mining data-visualization deployments eda machine-learning pickle python statistics streamlit

Last synced: 20 Apr 2026

https://github.com/mahmoudwal27/e-commerce-data-analysis

A collection of data analysis and visualization projects focused on ecommerce datasets. Using Python in Google Colab for analysis and Excel for exploration, these projects uncover key insights and trends, showcasing expertise in data manipulation and visualization to inform business decisions.

analytics data-analysis data-analysis-python data-set google-cloud python

Last synced: 21 Apr 2026

https://github.com/meerantajalli/networksecuritydefense

This network security defense system acts as an indicator against SMP floods, UDP floods, and ICMP floods. The model is trained on packets captured with Wireshark and distinguishes normal network traffic from traffic an attacker has targeted at the machine, using the packet transfer rate and the source IP.

anomaly-detection classification cyber-security data-analysis ddos-detection icmp-flood intrusion-detection machine-learning network-security packet-analysis python random-forest security smp-flood udp-flood wireshark

Last synced: 21 Apr 2026

https://github.com/tmmvn/analytics-notebooks

A collection of data analytics notebooks created while testing out JetBrains Datalore.

ai algorithms data-analysis datalore elements-of-ai helsinki-university-mooc python

Last synced: 22 Apr 2026

https://github.com/thinogueiras/jornada-python

Jornada Python - Hashtag Programação.

data-analysis data-science inteligencia-artificial python rpa

Last synced: 22 Apr 2026

https://github.com/rorrell/lifeexpectancy

A Jupyter Notebook where I create a chart with two line plots on it to check out the life expectancy of men vs. women from 1900-2018

data-analysis data-visualization jupyter-notebook python3

Last synced: 22 Apr 2026

https://github.com/ayushi-gajendra/restaurant-order-analysis-sql

End-to-end SQL analysis of 12,266 restaurant transactions to identify high-performing menu items, revenue concentration, bulk ordering behavior, and strategic growth opportunities.

analytics-portfolio business-intelligence case-study customer-segmentation data-analysis data-analytics database-analysis menu-engineering mysql revenue-analysis sql sql-project

Last synced: 05 Jun 2026

https://github.com/floffah/my-listening

Various ways to analyse your Spotify extended streaming history data

convex data-analysis listening-history spotify

Last synced: 23 Apr 2026

https://github.com/syed-nihaal/car-price-prediction-and-performance-analysis

A data science notebook project focused on analyzing car features and building a model for car price prediction.

data data-analysis data-visualization jupyter-notebook python

Last synced: 23 Apr 2026

https://github.com/datalopes1/bank_marketing

This project will be based on the Bank Marketing dataset from the UC Irvine Machine Learning Repository, made available by S. Moro, R. Laureano, and P. Cortez.

data-analysis data-science data-visualization eda python

Last synced: 24 Apr 2026

https://github.com/voidnire/redditviralmysteryposts

Analysis of posts from mystery subreddits. What defines a viral post in this kind of sub?

data-analysis data-visualization mysteries mystery nlms python-3 reddit

Last synced: 24 Apr 2026

https://github.com/muthukumar0908/youtube-data-harvesting-and-warehousing-using-sql-mongodb-and-streamlit

A simple, intuitive Streamlit user interface that retrieves and extracts data from YouTube using an API key and stores that data in a database.

data-analysis mongodb-atlas python sqldatabase streamlit-webapp youtube-api

Last synced: 24 Apr 2026

https://github.com/cyberoctane29/python-for-data-analysis

A repository dedicated to learning Python for data analysis, data science, and data analytics. This collection of Jupyter notebooks covers practical exercises and concepts from the Google Advanced Data Analytics Professional Certificate program.

data data-analysis data-analytics data-science python

Last synced: 24 Apr 2026

https://github.com/gnodux/adb-link

An MCP server that connects to multiple databases. Supports access control and dynamic SQL query tool registration and invocation.

agent ai-tools data-analysis database-gateway go mcp mcp-server

Last synced: 06 Jun 2026

https://github.com/ismielabir/pycsvsummarizer

A lightweight tool to summarize CSV files using various features.

csv data-analysis data-summary python

Last synced: 25 Apr 2026

https://github.com/fbarffmann/belly-button-challenge

Built an interactive JavaScript dashboard to visualize bacterial biodiversity from belly button samples. Analyzed data from 153 participants and identified OTU 1167 as the most common bacteria.

biodiversity dashboard data-analysis data-visualization interactive-charts javascript json plotly

Last synced: 25 Apr 2026

https://github.com/m-biriulova/python-job-market-analysis

Web scraping, data analysis, and visualization of Python developer vacancies in the Czech Republic.

automation beautifulsoup data-analysis data-visualization portfolio-project python selenium web-scraping

Last synced: 25 Apr 2026

https://github.com/aastopher/mma_outcome

Simple exploratory analysis of UFC Fights and Vegas fight odds from 1993 to 2021

data-analysis data-visualization

Last synced: 06 Jun 2026

https://github.com/devexpress-examples/winforms-create-a-custom-exporter-for-pivotgridcontrol-with-xtrareport

This example illustrates how to dynamically create a custom report based on PivotGridControl content in WinForms.

data-analysis dotnet pivot-grid pivot-grid-for-winforms winforms

Last synced: 26 Apr 2026

https://github.com/dcs-training/2023-10-22-carpentry-social-science

Go to https://dcs-training.github.io/2023-10-22-Carpentry-Social-Science/ to follow along with the material.

data-analysis data-visualisation data-wrangling intro-to-programming r

Last synced: 06 Jun 2026

https://github.com/rociobenitez/happiness-index-data-processing

Repository for Big Data Processing - Contains Jupyter Notebooks and Datasets for data analysis and processing tasks related to Big Data.

big-data big-data-processing data-analysis data-processing happiness-index happiness-report jupyter-notebook matplotlib pandas seaborn

Last synced: 15 May 2026

https://github.com/arush-codes/paris-olympic-de

Data engineering project on the Paris 2024 Olympics.

azure data-analysis data-engineering microsoft-azure olympics2024 pipeline

Last synced: 27 Apr 2026

https://github.com/airdac/ml-palmerpenguins

Classification and analysis of the palmerpenguins dataset in Python. Team project from UPC's Master's Degree in Data Science

classification data-analysis data-science machine-learning palmer-penguin python upc

Last synced: 07 Jun 2026

https://github.com/l2nce/datamining-study

Introduction to data mining

data-analysis data-mining matplotlib numpy panda

Last synced: 28 Apr 2026

https://github.com/elmezianech/autoinventory

This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.

automation dashboard data-analysis data-engineering-pipeline docker etl glue-job inventory-management kafka kpis lambda-functions lstm ml-pipeline mlflow power-bi pytorch redshift s3 streamlit warehouse-management

Last synced: 28 Apr 2026

https://github.com/dcs-training/decode-winterschool

Material on cluster analysis, data wrangling, and network analysis. See the README file for more information.

data-analysis data-visualisation data-wrangling gephi network-analysis python r statistics

Last synced: 28 Apr 2026

https://github.com/ericdataplus/kaggle-airbnb-nyc

NYC Airbnb market analysis: a multi-source study drawing on 2 Kaggle datasets (151K listings).

airbnb data-analysis kaggle nyc python visualization

Last synced: 28 Apr 2026

https://github.com/prady2309/sales-prediction-using-python

Sales prediction implemented using multiple linear regression.

data-analysis data-science machine-learning python

Last synced: 29 Apr 2026

https://github.com/thanaraklee/pyspark-dataframe-operations

This project focuses on utilizing PySpark DataFrames to analyze and visualize data sourced from external datasets, such as CSV files. It provides a practical example of how to manipulate, transform, and gain insights from large datasets using the PySpark framework.

data-analysis dataframe pyspark python

Last synced: 29 Apr 2026

https://github.com/marcinz20/anomaly-detection-in-credo-dataset

A university project whose goal is to build a system that detects anomalies in the CREDO dataset.

credo data-analysis data-science encoder-decoder-model jupiter-notebook pca-analysis python3

Last synced: 29 Apr 2026

https://github.com/vanshuchaudhary/zomato

This Jupyter Notebook contains an exploratory data analysis (EDA) of Zomato restaurant data. It includes data cleaning, visualization, and insights into restaurant ratings, pricing, cuisine distribution, and location-based trends.

business-analytics data-analysis data-mining data-science data-visualization datascience matplotlib pandas-dataframe pandas-python python python-3 python-library

Last synced: 29 Apr 2026

https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the JSON files scraped by the scraper medium_scrapper_post.py. To extract information from JSON files scraped by medium_scrapper_tag_archive.py (which scrapes the tags archive), use Data_Extraction_Archive_Tags.py.

data data-analysis data-analytics data-extraction data-preprocessing data-science data-scraping deep-learning machine-learning python

Last synced: 29 Apr 2026

https://github.com/findmyway/dataframe-in-julia

A quick introduction to DataFrames in Julia for users coming from Python.

data-analysis dataframe julia jupyter-notebook

Last synced: 29 Apr 2026

https://github.com/carlos-edulira/mbabigdata-projeto

Project submission for the Unipe Big Data BI MBA.

data-analysis delta minio python spark

Last synced: 29 Apr 2026

https://github.com/jofaval/melbourne-temperature-timeseries

Time series data analysis and forecasting of the daily minimum temperature in Melbourne from 1981 to 1990.

data-analysis data-science data-visualization deep-learning google-colab melbourne python temperature tensorflow timeseries timeseries-analysis

Last synced: 29 Apr 2026

https://github.com/valikmorinko/ecommerce-sales-analysis

E-commerce sales analysis: data, visualizations, and analytical conclusions.

data-analysis e-commerce jupyter matplotlib pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/saikumar787/car_price_prediction_using_linear-regression

A machine learning project to predict the selling price of used cars using regression techniques. Includes data preprocessing, model training, evaluation, and testing on new data.

car-price-prediction-with-machine-learning data-analysis joblib jupiter-notebook linear-regression-models model-deployment python scikit-learn standardscaler

Last synced: 29 Apr 2026

https://github.com/istinnew/eniac_ab_insight

Dive into a comprehensive analysis aimed at boosting iPhone 13 sales by optimizing the Click-Through Rate (CTR) of the “SHOP NOW” button. The analysis compares different button designs and determines the most effective strategy for increasing engagement.

ab-testing data data-analysis data-engineering data-science data-visualization google googlecolab libraries python testing testing-tools visual-studio-code

Last synced: 29 Apr 2026

https://github.com/prithviraj-2003/cognifyz-data-science-internship

🎓 Data Science Internship at Cognifyz Technologies 📅 Duration: 2 Months 🧠 Worked on real-world restaurant data 🗂️ Completed structured tasks across 3 levels 📌 Tasks focused on EDA, data preprocessing, visualization, and analysis 📎 Task descriptions provided in an attached PDF

data-analysis data-science data-visualization matplotlib numpy pandas python3

Last synced: 29 Apr 2026

https://github.com/farhad-here/student_performance_analyzer

Student Performance Analyzer in Python, one of my data analysis course projects. It teaches filter(), lambda, and map() in Python.

data-analysis data-visualization filter kaggle kaggle-dataset lambda map pandas python python-tutorial streamlit

Last synced: 29 Apr 2026

https://github.com/avazasgarov/soccer-hypothesis-testing

Statistical analysis comparing goal-scoring patterns in Men’s vs. Women’s FIFA World Cups using hypothesis testing.

data-analysis eda hypothesis-testing matplotlib-pyplot pandas pingouin python scipy

Last synced: 30 Apr 2026

https://github.com/mxagar/eda_fe_summary

An 80/20 guide for Data Processing: Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection.

data-analysis data-cleaning data-modeling data-science data-visualization eda exploratory-data-analysis feature-engineering feature-selection machine-learning pandas

Last synced: 30 Apr 2026

https://github.com/farhad-here/id_validator

Iranian National ID Validator. This was one of my data analysis projects for a course I took.

data-analysis identity idverification object-oriented-programming oop oops-in-python python streamlit

Last synced: 30 Apr 2026

https://github.com/mfakhriazhar/nlp-movie-recommender-system

This project is a content-based movie recommender system built using Natural Language Processing (NLP) techniques. By extracting and combining important text features from movie metadata, this system suggests movies that are similar to a user's selected title.

data-analysis data-science deep-learning machine-learning natural-language-processing python recommender-system

Last synced: 30 Apr 2026

https://github.com/busra-deveci/kaggle-iris_data_analysis

Exploratory data analysis and visualization of the Iris dataset using Python.

data-analysis iris-dataset kaggle pandas python seaborn visualization

Last synced: 30 Apr 2026

https://github.com/fazatholomew/marlboroplan

This project aims to contribute to a more inclusive sustainable energy program in Massachusetts. It is part of my work for the nonprofit organization All In Energy and of the undergraduate thesis for my degree.

data-analysis data-visualization energy jupyter-notebook massachusetts python

Last synced: 01 May 2026

https://github.com/shruti-h/netflix-eda

Exploratory Data Analysis on Netflix Movies & TV Shows dataset using Python, Pandas, Matplotlib, and Seaborn

data-analysis data-science eda matplotlib netflix pandas-library python seaborn

Last synced: 01 May 2026

https://github.com/bpkaur/a-network-analysis-of-game-of-thrones

A network analysis of Game of Thrones: analyzing the co-occurrence network of the characters in the Game of Thrones books.

data-analysis data-science machine-learning networkx python3

Last synced: 01 May 2026

https://github.com/myounesdev/authorgraphanalyzer

A web-based visualization tool for analyzing and exploring author collaboration networks.

algorithms binary-tree bts d3js data-analysis dijkstra-algorithm django exception-handling pandas python scss

Last synced: 08 Jun 2026

https://github.com/manjit-baishya-datascience/flipkart-laptop-listing-eda

This project analyzes laptop price data from Flipkart using AutoScraper for web scraping. It includes data loading, EDA, cleaning, statistical analysis, and visualization. The goal is to derive insights for pricing strategies and market positioning. Explore the repository for detailed documentation and code.

data-analysis ecommerce-platform flipkart laptop python

Last synced: 08 Jun 2026

https://github.com/fbarffmann/project1

Analyzed factors influencing movie profitability using Python. Cleaned and visualized film industry data to uncover trends in budgets, sales, genres, and ratings.

box-office-analysis data-analysis data-visualization matplotlib movie-industry pandas python regression seaborn

Last synced: 01 May 2026

https://github.com/linguini1/edueval

The BorealisAI Let's Solve It mentorship project: summarizing student feedback submissions on their professor into one cohesive paragraph for faculty consideration during performance reviews.

ai data data-analysis data-science machine-learning machinelearning nlp python pytorch sentiment-analysis

Last synced: 01 May 2026

https://github.com/rafath0ssain/predihome

Data analysis using economic factors affecting living conditions across Canadian provinces.

data-analysis data-visualization dplyr ggplot2 graph kaggle linear-regression prediction-model r shiny tidyr

Last synced: 01 May 2026

https://github.com/kheriberto/pandas_and_seabron_project

In this project I showcase my ability to use pandas and seaborn to shape, transform, and plot data.

data-analysis pandas python seaborn

Last synced: 01 May 2026

https://github.com/guptakushal03/whatsapp-chat-analyser

The WhatsApp Chat Analyzer is a Python-based tool built with Streamlit for analyzing WhatsApp chat data. It provides insights such as total messages, word count, media shared, links shared, monthly activity timeline, most active users, activity maps, and word clouds.

chat-analysis data-analysis data-visualization python streamlit text-processing whatsapp word-cloud

Last synced: 01 May 2026

https://github.com/ujjwalll/get-that-flair

A repository for a project that detects the flair of Reddit posts from their links. A working model is available at https://get-that-flair.herokuapp.com/

data-analysis data-visualization django-application herokuapp machine-learning naive-bayes-classifier praw-reddit python3 random-forest reddit-api sentiment-analysis topic-modeling

Last synced: 01 May 2026

https://github.com/harshindcoder/salifort_motors_project

This people analytics project analyzes factors influencing employee turnover and predicts whether an employee is likely to leave. It aims to uncover patterns behind departures, helping Salifort improve retention, workplace culture, and professional growth strategies.

data-analysis data-science data-visualization hr-analytics machine-learning tree-models

Last synced: 02 May 2026

https://github.com/shreeparab1890/unicorns-of-india-till-sep-2022-analysis-eda

This IPython notebook is an exploratory data analysis (EDA) of the unicorns of India through September 2022.

analysis data-analysis eda exploratory-data-analysis matplotlib-pyplot numpy pandas plotly

Last synced: 02 May 2026

https://github.com/bhawnagoyal18/ai-doctor-a-symptom-checker-disease-predictor

AI Doctor is an intelligent healthcare application that utilizes machine learning (ML) and Python to predict potential diseases based on user-input symptoms. The project integrates data from multiple medical datasets and provides an interactive web-based UI for an intuitive user experience.

data-analysis data-engineering data-visualization dataset flask html5 machine-learning python sql stacking statistics

Last synced: 02 May 2026

https://github.com/isaqueiros/motorpremium-predictions-mlpclassifier

This Jupyter Notebook is an initial study of applying the sklearn neural network MLPClassifier model. The model is applied to the MotorPremiums dataset, which is supplied separately in .csv format.

data-analysis data-science machine-learning neural-network python sklearn-library

Last synced: 02 May 2026

https://github.com/benzerinsio/breastcancer-eda

📊 Exploratory Data Analysis (EDA) - Breast Cancer | Exploring clinical features to identify patterns and relationships in breast cancer diagnosis.

analise-de-dados analise-exploratoria analise-exploratoria-de-dados data-analysis data-visualization diagnosis eda exploratory-data-analysis health-care medical-data python seaborn

Last synced: 02 May 2026

https://github.com/bhavna-kale/cars-eda-project

Project analyzing used car market data to identify high-impact price drivers and depreciation curves, presented through an interactive web application.

data-analysis excel matplotlib numpy pandas python3 searborn streamlit

Last synced: 03 May 2026

https://github.com/ahmedhosssam/lesser_pandas

Pandas-like Data Analysis library in C++

cpp data-analysis data-science pandas

Last synced: 03 May 2026

https://github.com/zients/tw-lottery-recommandation

Taiwan lottery draw analyzer & number recommender with Transformer ML model. Supports 539, 649, 638, 3D, and 4D lotteries.

cli data-analysis lottery machine-learning python pytorch taiwan transformer

Last synced: 03 May 2026

https://github.com/emredemirbas/movie-ratings-analysis

A data analysis project investigating potential bias in movie ratings from 2015, comparing them with ratings from other platforms using Python, pandas, and visualization libraries.

data-analysis matplotlib pandas python seaborn

Last synced: 03 May 2026

https://github.com/devlucho/modelos-predictivos

Predictive models using the Linear Regression, Logistic Regression, and Decision Tree algorithms.

data-analysis jupyter-notebook python3

Last synced: 03 May 2026

https://github.com/salma-mamdoh/project-writing-functions-for-product-analysis

My project for learning the basics of analysis on DataCamp.

data-analysis data-camp pandas python

Last synced: 03 May 2026

https://github.com/ljadhav25/swiggy-restaurant-analysis

This repository contains data and analysis related to restaurants listed on Swiggy, one of India's largest online food ordering and delivery platforms. The objective is to explore restaurant trends, customer reviews, pricing strategies, and delivery metrics to gain insights into the food delivery industry.

data-analysis data-visualization matplotlib-pyplot numpy-library pandas-library python seaborn-plots

Last synced: 03 May 2026

https://github.com/syarwinaaa09/analyzing-crime-in-los-angeles

Exploratory data analysis of Los Angeles crime data with insights on temporal patterns, locations, and age demographics.

crime-data data-analysis eda los-angeles pandas public-safety python visualization

Last synced: 03 May 2026

https://github.com/bpkaur/whats-in-a-name

Exploring a dataset of first names of babies born in the US to uncover interesting stories.

data-analysis datacamp numpy pandas python3

Last synced: 04 May 2026

https://github.com/r13i/cheapest-phone-call

Small challenge to find the best phone operator to use based on call price

big-data big-data-analytics cheapest data-analysis data-cruncher pandas phone-number pricelist

Last synced: 04 May 2026

https://github.com/soham7998/data-analysis-projects

My completed data analysis projects, each of which gave me hands-on experience. The projects showcase different concepts, visualizations, and more.

data data-analysis data-science machine-learning nlp python soham visualization

Last synced: 04 May 2026

https://github.com/mchenryspagg/investigate_a_dataset

This is a data analysis project that demonstrates the student's ability to use python data analysis libraries such as pandas, numpy and pyplot in matplotlib to investigate a dataset and answer specific questions from the dataset, thus demonstrating skills in data cleaning, data wrangling, and exploratory data analysis.

data-analysis datetime descriptive-analysis descriptive-statistics exploratory-data-analysis numpy pandas pyplot python visualization

Last synced: 04 May 2026

https://github.com/fatihilhan42/book-recommendation-system-with-python

In this project, we are making a book recommendation system that recommends similar books according to the genres or ratings that the user enters, using a large book dataset. The link of the dataset is given below. Happy reading...

books data-analysis data-science data-visualization kaggle python recommendation-engine recommendation-system

Last synced: 04 May 2026

https://github.com/jatin-mehra119/flight-price-prediction

This study analyzes flight booking data from the "Ease My Trip" website, using statistical tests and linear regression to extract insights that can benefit passengers using the platform.

data-analysis datacleaning datavisualization machine-learning preprocessing-data python sklearn-pipeline sklearn-regression-algorithm streamlit-webapp

Last synced: 04 May 2026