An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/ahshah322/world-happiness-report-2025

Data analysis and visualization of the World Happiness Report 2025 using Python (pandas, seaborn, matplotlib). Explores how GDP, health, freedom, generosity, and corruption perception influence global happiness.

data-analysis data-science matplotlib numpy pandas python seaborn worldhappiness

Last synced: 29 Apr 2026

https://github.com/farhad-here/student_performance_analyzer

A student performance analyzer written in Python, built as one of my data analysis course projects. It teaches the use of filter(), lambda, and map() in Python.

data-analysis data-visualization filter kaggle kaggle-dataset lambda map pandas python python-tutorial streamlit

Last synced: 29 Apr 2026

https://github.com/alam025/invoice-generator

Processed 500+ invoices with automated payment reminders and multi-currency PDF generation

api data-analysis finance fintech nextjs pdfkit prisma python stripe

Last synced: 08 Jun 2026

https://github.com/jayita11/customer-engagement-insights-for-yelp-restaurant-business-success

This project analyzes Yelp restaurant data using SQLite, Python, and Tableau to explore user engagement, reviews, and ratings. It provides insights into restaurant success across cities, regions, and user behavior.

customer-engagement data-analysis interactive-visualizations json python ratings review sqlite3 tableau-dashboards-for-data-visualization yelp-restaurants

Last synced: 12 May 2026

https://github.com/angchekar28/air-quality-index-analysis

This project analyzes Air Quality Index (AQI) data to identify pollution trends, seasonal variations, and the impact of different pollutants. It includes data visualization, correlation analysis, and insights into air quality variations over time.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook machine-learning python

Last synced: 30 Apr 2026

https://github.com/avazasgarov/soccer-hypothesis-testing

Statistical analysis comparing goal-scoring patterns in Men’s vs. Women’s FIFA World Cups using hypothesis testing.

data-analysis eda hypothesis-testing matplotlib-pyplot pandas pingouin python scipy

Last synced: 30 Apr 2026

https://github.com/aishwaryagade02/loan-funnel-optimization-analysis

Tracks how loan applications move through each stage, helps spot where people drop off, and gives clear insights to improve approval strategies and overall performance.

ab-testing data-analysis data-creation hypothesis-testing python reporting sql statistical-methods streamlit

Last synced: 30 Apr 2026

https://github.com/mxagar/eda_fe_summary

An 80/20 guide for Data Processing: Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection.

data-analysis data-cleaning data-modeling data-science data-visualization eda exploratory-data-analysis feature-engineering feature-selection machine-learning pandas

Last synced: 30 Apr 2026

https://github.com/ygalvao/bra_scraper_2022

A web scraper bot for the 2nd round of the 2022 Brazilian Federal Elections.

data-analysis data-analytics selenium web-scraper webscraper

Last synced: 12 May 2026

https://github.com/samuelpillai/machine-learning-classification-regression-nlp

A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

classification data-analysis data-science data-visualization feature-engineering jupyter-notebook machine-learning ml-pipeline model-evaluation nlp python regression-models scikit-learn supervised-learning text-mining

Last synced: 30 Apr 2026

https://github.com/mfakhriazhar/nlp-movie-recommender-system

This project is a content-based movie recommender system built using Natural Language Processing (NLP) techniques. By extracting and combining important text features from movie metadata, this system suggests movies that are similar to a user's selected title.

data-analysis data-science deep-learning machine-learning natural-language-processing python recommender-system

Last synced: 30 Apr 2026

https://github.com/min-thway-htut/r-programming

Repository for R-Programming

data-analysis r-programming

Last synced: 10 Jun 2026

https://github.com/ahmedtaher10/covid-19-cases

The dataset covers COVID-19 cases and their impact on GDP from December 31, 2019, to October 10, 2020.

data-analysis python visualization

Last synced: 30 Apr 2026

https://github.com/abhi227070/ipl-2024-sold-player-data-analysis

This project analyzes IPL 2024 auctioned players' data, including name, team, cricket type, nationality, and price. Users input a player's name to access team, style, nationality, and auction price, aiding research and fantasy leagues. It offers insights into player dynamics, serving cricket enthusiasts with comprehensive data exploration.

data-analysis data-visualization dataanalytics machine-learning machine-learning-algorithms python3

Last synced: 30 Apr 2026

https://github.com/fbarffmann/credit-risk-classification

Classified 19,000+ loans as high-risk or healthy using logistic regression. Achieved 100% precision for healthy loans and 84% precision for high-risk loans.

classification credit-risk data-analysis logistic-regression machine-learning model-evaluation pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/sakan811/honkai-star-rail-a-few-fun-insights-with-data-analysis

The project offers insights into the stats of all Honkai: Star Rail characters available as of a given date.

data data-analysis data-science data-visualization docker flask game honkai honkai-star-rail honkai-starrail seaborn webscraping webscraping-data webscraping-selenium

Last synced: 10 Jun 2026

https://github.com/syarwinaaa09/investigating-netflix-movies

🎬 Investigating Netflix movie trends using Python and pandas 📊

csv data-analysis matplotlib netflix pandas visualization

Last synced: 01 May 2026

https://github.com/fazatholomew/marlboroplan

This project, part of my work for the nonprofit organization All In Energy and of my undergraduate thesis, aims to contribute to a more inclusive sustainable energy program in Massachusetts.

data-analysis data-visualization energy jupyter-notebook massachusetts python

Last synced: 01 May 2026

https://github.com/filip-kustura/python-covid-19-behaviors-analysis

This university project uses Jupyter Notebook to analyze attitudes and behaviors related to the COVID-19 pandemic, drawing on a two-year survey by Imperial College London and the research company YouGov. Using pandas, NumPy, and Matplotlib, the analysis focuses on three countries and explores trends and insights throughout the pandemic.

covid-19 data-analysis data-visualization jupyter-notebook matplotlib numpy pandas python university-project

Last synced: 12 Apr 2026

https://github.com/shruti-h/netflix-eda

Exploratory Data Analysis on Netflix Movies & TV Shows dataset using Python, Pandas, Matplotlib, and Seaborn

data-analysis data-science eda matplotlib netflix pandas-library python seaborn

Last synced: 01 May 2026

https://github.com/yash-3-bit/online-sales-analysis

Merges the datasets for different months and performs data cleaning, analysis, and visualization.

data-analysis data-visualization pandas-library

Last synced: 27 Mar 2025

https://github.com/elishah-john/happiness-report-2019

Analysis of "Happiness Report 2019" using python.

data-analysis data-visualization educational jupyter-notebook python

Last synced: 12 May 2026

https://github.com/virajbhutada/music-store-data-analysis-sql

Hands-on SQL data analysis project for a music store, aimed at building proficiency with database queries. Ideal for practitioners seeking real-world analytics experience. Gain insights into customer behavior, revenue trends, and genre preferences that support strategic decision-making in the music industry.

data-analysis data-insights data-science database genre-prediction music-industry music-store postgresql postgresql-database query-optimization revenue-trends sql sql-queries

Last synced: 01 May 2026

https://github.com/johannaschmidle/amazon-cat-couch

Customer product reviews + ratings analysis and visualization [Python, Excel, Tableau, R]

data-analysis data-visualization jupyter-notebook python-notebook r-markdown sentiment-analysis text-analysis web-scraping

Last synced: 11 Jun 2026

https://github.com/ariyaarka/result-analysis

A simple analysis of results based on different factors, presented in figures.

data-analysis jupyter-notebook matplotlib numpy-library pandas-dataframe python seaborn

Last synced: 01 May 2026

https://github.com/dnut/associations

Python 3 library to identify high-dimensional statistical relationships in any data set.

analytics arch-linux association-rules data data-analysis data-mining data-science machine-learning python-modules

Last synced: 01 May 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/caesaredia/la-cafe-market-analysis

A data-driven feasibility study exploring the potential of launching a robot-staffed café in Los Angeles, based on real F&B business data.

business-intelligence cafe data-analysis data-visualization food-industry franchise los-angeles market-research pandas python

Last synced: 01 May 2026

https://github.com/pablo1785/receipt-rs

Receipt processing backend built with Shuttle.rs, Axum and Azure Form Recognizer API

api-rest axum azure backend cognitive-services computer-vision data-analysis rust shuttle-rs sqlx

Last synced: 01 May 2026

https://github.com/nel-zi/city_logistics

Built an automated, scalable Azure cloud data infrastructure for City Logistics, integrating market trends to optimize operations and enhance decision-making.

azure azure-cloud-services data-analysis data-automation data-cleaning data-engineering data-transformation

Last synced: 01 May 2026

https://github.com/fbarffmann/project1

Analyzed factors influencing movie profitability using Python. Cleaned and visualized film industry data to uncover trends in budgets, sales, genres, and ratings.

box-office-analysis data-analysis data-visualization matplotlib movie-industry pandas python regression seaborn

Last synced: 01 May 2026

https://github.com/codesaadumair/data-science-monorepo

Comprehensive Data Science monorepo featuring EDA, Machine Learning, Preprocessing, Feature Engineering, and Visualization projects with Jupyter notebooks and Python.

data-analysis data-science data-science-projects data-visualization eda jupyter-notebook jupyterlab machine-learning python

Last synced: 01 May 2026

https://github.com/abdoomohamedd/python-data-analysis-projects

A collection of data analysis projects that I have worked on. Each project involves cleaning data, performing exploratory data analysis (EDA), and creating visualizations to extract insights. The projects utilize various Python libraries, including pandas, numpy, matp

data-analysis data-analysis-python data-cleaning data-visualization matplotlib matplotlib-pyplot numpy pandas python

Last synced: 01 May 2026

https://github.com/rafath0ssain/predihome

Data analysis of economic factors affecting living conditions across Canadian provinces.

data-analysis data-visualization dplyr ggplot2 graph kaggle linear-regression prediction-model r shiny tidyr

Last synced: 01 May 2026

https://github.com/kheriberto/pandas_and_seabron_project

In this project I showcase my ability using pandas and seaborn to mold, transform and plot data.

data-analysis pandas python seaborn

Last synced: 01 May 2026

https://github.com/parthds02/-daily-calorie-count-meal-plan-generator-

Welcome to the Daily Calorie Count Meal Plan Generator project! This Streamlit web application is designed to create personalized meal plans based on user inputs such as age, weight, gender, and calorie goals. It also allows users to download their customized meal plans as PDFs.

calories-tracker data-analysis data-science pdf-generation streamlit vscode

Last synced: 13 May 2026

https://github.com/guptakushal03/whatsapp-chat-analyser

The WhatsApp Chat Analyzer is a Python-based tool built with Streamlit for analyzing WhatsApp chat data. It provides insights such as total messages, word count, media shared, links shared, monthly activity timeline, most active users, activity maps, and word clouds.

chat-analysis data-analysis data-visualization python streamlit text-processing whatsapp word-cloud

Last synced: 01 May 2026

https://github.com/bhoyarapurva23399/mini-erp-inventory-billing

Lightweight ERP inventory and billing web app built using Python Flask and SQLite — featuring product, customer, and dashboard management.

backend data-analysis erp flask inventory-billing mini-project python sqlite

Last synced: 01 May 2026

https://github.com/stoll-jonathan/sorting_algorithm_analyzer

C++ program which analyses the performance of different sorting algorithms on a dataset of random numbers

bubble-sort data-analysis insertion-sort merge-sort sorting-algorithms

Last synced: 01 Apr 2025

https://github.com/faithererer/haokanvideo_spider

Scraping and data analysis of Haokan Video (好看视频)

data-analysis data-visualization python spider

Last synced: 02 May 2026

https://github.com/shridhar1504/milk-production-time-series-forecasting-datascience-project

This project uses time series forecasting to predict future milk production, based on monthly milk production data from January 1962 to December 1975. An ARIMA (autoregressive integrated moving average) model is used for the forecast, and the model is evaluated using various metrics.

adf arima-model augmented-dickey-fuller-test data-analysis data-analytics data-science data-visualization eda exploratory-data-analysis machine-learning machine-learning-algorithms python python3 residuals sarimax seasonality time-series time-series-forecasting trends

Last synced: 02 May 2026

https://github.com/shreeparab1890/unicorns-of-india-till-sep-2022-analysis-eda

This IPython notebook is an exploratory data analysis (EDA) of Indian unicorn startups up to September 2022.

analysis data-analysis eda exploratory-data-analysis matplotlib-pyplot numpy pandas plotly

Last synced: 02 May 2026

https://github.com/suma-aljudaia/my-portfolio

Suma Aljudaia | Portfolio – AI & Data Analysis Enthusiast

ai css data-analysis html machine-learning portfolio

Last synced: 02 May 2026

https://github.com/dhanyasri20/credit-risk-prediction

Credit Risk Prediction using Python, SQL, and Flask. Trained ML models (Random Forest) to identify high-risk loan applicants with 86% accuracy, automated SQL reporting, and deployed a Flask web app for real-time predictions.

classification credit-risk data-analysis financial-data flask loan-prediction machine-learning python random-forest sql

Last synced: 28 Apr 2026

https://github.com/bhawnagoyal18/ai-doctor-a-symptom-checker-disease-predictor

AI Doctor is an intelligent healthcare application that utilizes machine learning (ML) and Python to predict potential diseases based on user-input symptoms. The project integrates data from multiple medical datasets and provides an interactive web-based UI for an intuitive user experience.

data-analysis data-engineering data-visualization dataset flask html5 machine-learning python sql stacking statistics

Last synced: 02 May 2026

https://github.com/manukot/sturdy-engine-python-

Theoretical concepts and practical projects I've learned in my master's coursework.

data-analysis data-visualization python3

Last synced: 13 May 2026

https://github.com/isaqueiros/motorpremium-predictions-mlpclassifier

This Jupyter notebook is an initial study of applying scikit-learn's neural network MLPClassifier model. The model is applied to the MotorPremiums dataset, which is supplied separately in .csv format.

data-analysis data-science machine-learning neural-network python sklearn-library

Last synced: 02 May 2026

https://github.com/fatihilhan42/spotify-songs-recommendations-system_with_python

We developed a song recommendation system using data from a Spotify songs dataset. The dataset and other applications are given in the description.

data-analysis data-science data-visualization jupyter-notebook python recommendation-engine recommendation-system

Last synced: 02 May 2026

https://github.com/mituskillologies/dkte-da-mar25

Programs from the Python Data Analytics training conducted at DKTE's Engineering Institute, Ichalkaranji, in March 2025.

data-analysis matplotlib numpy pandas python-programming tkinter-python

Last synced: 13 May 2026

https://github.com/m0saan/python-for-data-analysis

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney,

data-analysis data-science ipython-notebook machine-learning matplotlib numpy pandas python

Last synced: 02 May 2026

https://github.com/jayavarshini-jayakumaran/nba-exploratory-data-analysis

A data analytics project that explores NBA game and player data using Python and Power BI. Features data preprocessing, EDA, feature engineering, and an interactive dashboard for visualizing team and player performance trends.

data-analysis data-visualization exploratory-data-analysis powerbi python3

Last synced: 20 Jun 2026

https://github.com/se7en69/rna-seq-data-processing-and-analysis-pipeline

This pipeline automates essential steps for RNA-Seq data analysis, including quality control, read trimming, alignment to a reference genome, and coverage quantification. It leverages tools like FastQC, fastp, STAR, and bedtools to ensure high-quality results, with MultiQC reports providing an overview at each stage.

bioinformaitcs-scripting bioinformatics bioinformatics-pipeline data-analysis linux scripts shell

Last synced: 02 May 2026

https://github.com/benzerinsio/breastcancer-eda

📊 Exploratory Data Analysis (EDA) - Breast Cancer | Exploration of clinical features to identify patterns and relationships in breast cancer diagnosis.

analise-de-dados analise-exploratoria analise-exploratoria-de-dados data-analysis data-visualization diagnosis eda exploratory-data-analysis health-care medical-data python seaborn

Last synced: 02 May 2026

https://github.com/mostafa-ghorab/global-happiness-analysis

An analysis of global happiness rankings based on various factors like GDP, family support, health, and freedom from the World Happiness Report (2015-2017). This project provides data visualizations and statistical insights into how these factors influence happiness scores in different regions.

business-analysis data-analysis data-visualization matplotlib numpy pandas python seaborn

Last synced: 12 Apr 2026

https://github.com/badranalyst/movie-correlation-analysis-in-python

This project analyzes movie data correlations using Python libraries like Pandas, NumPy, Seaborn, and Matplotlib. It examines relationships between attributes such as ratings, genres, and box office performance to uncover trends that inform recommendations and enhance understanding of movie success factors.

data data-analysis dataset jupyter jupyter-notebook matplotlib matplotlib-pyplot numpy pandas python seaborn

Last synced: 03 May 2026

https://github.com/bhavna-kale/cars-eda-project

Project analyzing used car market data to identify high-impact price drivers and depreciation curves, presented through an interactive web application.

data-analysis excel matplotlib numpy pandas python3 searborn streamlit

Last synced: 03 May 2026

https://github.com/mahmoudnamnam/superstore-analysis

This project explores the SuperStore dataset to uncover insights into sales, profit, and customer behavior. It identifies key trends, regional variations, and product performance, using data analysis and machine learning techniques to guide business strategy and optimize performance.

clustering data-analysis data-science data-visualization geopandas jupyter-notebook machine-learning numpy pandas plotly regression seaborn sklearn

Last synced: 12 Apr 2026

https://github.com/stepankuzmin/machine-learning-data-analysis

My homework for the Coursera Machine Learning and Data Analysis specialization.

coursera data-analysis jupiter machine-learning python

Last synced: 03 May 2026

https://github.com/fatihilhan42/tourist_analysis_in_turkey_with_python

In this project, the number of tourists visiting Turkey between 2008 and 2021 was analyzed. The dataset, which can be found in the repository, was first cleaned using data cleaning techniques, and the cleaned data were then presented graphically using data visualization.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 03 May 2026

https://github.com/zients/tw-lottery-recommandation

Taiwan lottery draw analyzer & number recommender with Transformer ML model. Supports 539, 649, 638, 3D, and 4D lotteries.

cli data-analysis lottery machine-learning python pytorch taiwan transformer

Last synced: 03 May 2026

https://github.com/alexgenovese/react-charts-covid-19-data

Examples of COVID-19 data visualization using different chart libraries: G2, G2Plot, Plotly, and ApexCharts.

data-analysis data-science data-visualization react reactjs

Last synced: 13 May 2026

https://github.com/ibrahimhabibeg/national-university-of-singapore-sms-analysis

Analysis of SMS messages collected by the National University of Singapore

analytics data-analysis data-science nlp python

Last synced: 13 May 2026

https://github.com/emredemirbas/movie-ratings-analysis

A data analysis project investigating potential bias in movie ratings from 2015, comparing them with ratings from other platforms using Python, pandas, and visualization libraries.

data-analysis matplotlib pandas python seaborn

Last synced: 03 May 2026

https://github.com/maddieemihle/python-challenge

A Python script that analyzes financial records and election results.

data-analysis python

Last synced: 09 Jun 2026

https://github.com/vipulbunny/restaurant-insight-analysis

A comprehensive data analysis project exploring restaurant ratings, locations, and customer sentiments. This project includes data preprocessing, descriptive analysis, geospatial mapping, sentiment analysis, and price-rating correlations using Python and visualization tools.

data-analysis data-preprocessing data-visualization folium geospatial geospatial-analysis geospatial-visualization machine-learning nlp pandas python restaurant-insights seaborn sentiment-analysis

Last synced: 03 May 2026

https://github.com/devlucho/modelos-predictivos

Predictive models using linear regression, logistic regression, and decision tree algorithms.

data-analysis jupyter-notebook python3

Last synced: 03 May 2026

https://github.com/nathadriele/diabetes-clinical-etl-pipeline

This public health data engineering project implements a complete pipeline to collect, process, standardize, validate, integrate, and visualize public SUS data related to diabetes mellitus in Brazil, filtering by ICD-10 (CID-10) codes E10 to E14.

cid data-analysis data-extraction data-pipeline data-science data-structures data-visualization datasus diabetes-detection diabetes-prediction epidemiology-analysis etl-pipeline healthcare-analytics ibge logger pytest sih streamlit sus

Last synced: 09 Jun 2026

https://github.com/salma-mamdoh/project-writing-functions-for-product-analysis

My project for learning the basics of analysis on DataCamp.

data-analysis data-camp pandas python

Last synced: 03 May 2026

https://github.com/shellynagar27/transportation-and-logistics-challenge

Analyzing logistics data to optimize shipment efficiency, reduce delays, and enhance supply chain visibility using Power BI. Insights include top routes, delays, supplier trends, and peak shipments.

cleaning-data critical-thinking data-analysis data-visualization exploratory-data-analysis feature-engineering powerbi preprocessing-data problem-solving python

Last synced: 16 May 2026

https://github.com/ankitgmishra/machinelearning

Continuously deepening my understanding of and expertise in machine learning through ongoing education and hands-on practical experience.

artificial-intelligence data-analysis data-cleaning data-gathering machine-learning machinel-learning-algorithms matplotlib numpy pandas python seaborn

Last synced: 03 May 2026

https://github.com/ggarciajavier/udacity-dalf-project4-identify-fraud-enron-email

Work performed for the 4th project of the Udacity Data Analyst Nanodegree: machine learning classifier for identifying fraud in Enron email corpus.

data-analysis data-science machine-learning nlp-machine-learning python python27

Last synced: 03 May 2026

https://github.com/joelfaldin/data-analysis

A collection of data-analysis projects I've built over time! ✨⛏️

data-analysis python r

Last synced: 03 May 2026

https://github.com/ljadhav25/swiggy-restaurant-analysis

This repository contains data and analysis related to restaurants listed on Swiggy, one of India's largest online food ordering and delivery platforms. The objective is to explore restaurant trends, customer reviews, pricing strategies, and delivery metrics to gain insights into the food delivery industry.

data-analysis data-visualization matplotlib-pyplot numpy-library pandas-library python seaborn-plots

Last synced: 03 May 2026

https://github.com/devesh8423/machine_learning

Machine Learning practice projects, Jupyter notebooks, and datasets for learning regression, classification, and data analysis.

classification data-analysis data-science data-visualization jupyter-notebook machine-learning matplotlib ml-project numpy-library pandas python regression sckit-learn seaborn

Last synced: 03 May 2026

https://github.com/donmaruko/flask-data-analysis

Flask API for statistical calculations. Data analysis, cleansing, visualization, and manipulation. Documented by Swagger.

api api-rest data-analysis data-science data-visualization datascience flasgger matplotlib pandas seaborn sqlite wordcloud

Last synced: 03 May 2026

https://github.com/parth-jatav/super-store-analysis-project

The Super Store Analysis project uses Python libraries such as pandas, matplotlib, and numpy to perform a comprehensive analysis of a retail store's data. The project includes data cleaning, visualization, and statistical analysis to identify key trends, optimize inventory, and enhance decision-making for improved business performance.

data-analysis matplotlib numpy pandas python super-store

Last synced: 12 Apr 2026