An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/dindagustiayu/data-processing

A digital textbook on interpreting characterisation results.

characterisation data-analysis gitbook latex-package myst qualitative-analysis quantitative-analysis

Last synced: 08 Jun 2026

https://github.com/yimethan/basics-of-data-analysis

2023-2 Basics of Data Analysis

data-analysis numpy pandas python

Last synced: 29 Apr 2026

https://github.com/theoplayz2/eda-explorer

A Python tool for exploratory data analysis (EDA) and visualization that supports loading CSV and JSON data, with a modular OOP architecture. Practical assignment on the topic "Discovering and visualizing data to understand its nature" for the course "MDK 13.01: Fundamentals of Applying Artificial Intelligence Methods in Programming".

analysis battery-life cqrs csharp data-analysis eeg-analysis exploratorydataanalysis json-visualization matplotlib messaging profile-report python verilog visualization

Last synced: 29 Apr 2026

https://github.com/monddavila/online-retail-data-analysis

Online Retail Exploratory Data Analysis with Python

data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/alam025/algo-trading-bot

Backtested 20+ strategies achieving 18% annualised returns on historical S&P 500 data

api ccxt data-analysis finance fintech pandas postgresql python

Last synced: 08 Jun 2026

https://github.com/angchekar28/air-quality-index-analysis

This project analyzes Air Quality Index (AQI) data to identify pollution trends, seasonal variations, and the impact of different pollutants. It includes data visualization, correlation analysis, and insights into air quality variations over time.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook machine-learning python

Last synced: 30 Apr 2026

https://github.com/shruti-h/sales_data_analysis

Sales Data Analysis | Pandas & Matplotlib

data-analysis data-science data-vi matplotlib pandas-library python

Last synced: 30 Apr 2026

https://github.com/edgarhtt/uber_freight_data_analysis

Uber Freight interview homework. It consisted of solving a two-warehouse problem and an ETL task.

data-analysis data-science data-visualization python

Last synced: 30 Apr 2026

https://github.com/aishwaryagade02/loan-funnel-optimization-analysis

Tracks how loan applications move through each stage, helps spot where people drop off, and gives clear insights to improve approval strategies and overall performance.

ab-testing data-analysis data-creation hypothesis-testing python reporting sql statistical-methods streamlit

Last synced: 30 Apr 2026

https://github.com/mxagar/eda_fe_summary

An 80/20 guide for Data Processing: Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection.

data-analysis data-cleaning data-modeling data-science data-visualization eda exploratory-data-analysis feature-engineering feature-selection machine-learning pandas

Last synced: 30 Apr 2026

https://github.com/farhad-here/id_validator

Iranian National ID Validator. This was one of my data analysis projects for a course I took.

data-analysis identity idverification object-oriented-programming oop oops-in-python python streamlit

Last synced: 30 Apr 2026

https://github.com/mfakhriazhar/nlp-movie-recommender-system

This project is a content-based movie recommender system built using Natural Language Processing (NLP) techniques. By extracting and combining important text features from movie metadata, this system suggests movies that are similar to a user's selected title.

data-analysis data-science deep-learning machine-learning natural-language-processing python recommender-system

Last synced: 30 Apr 2026

https://github.com/mitchellharrison/mitchellharrison.github.io

Welcome to my slice of the internet, where I share the knowledge that Duke gave me, so you don't have to spend the mortgage-sized amount to access it. Built with R, Python, Quarto, and love.

ai algorithms-and-data-structures blog data-analysis data-science data-visualization educational machine-learning portfolio portfolio-website quarto r r-language statistics tutorials

Last synced: 30 Apr 2026

https://github.com/ahmedtaher10/covid-19-cases

The dataset covers COVID-19 cases and their impact on GDP from December 31, 2019, to October 10, 2020.

data-analysis python visualization

Last synced: 30 Apr 2026

https://github.com/fbarffmann/credit-risk-classification

Classified 19,000+ loans as high-risk or healthy using logistic regression. Achieved 100% precision for healthy loans and 84% precision for high-risk loans.

classification credit-risk data-analysis logistic-regression machine-learning model-evaluation pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/syarwinaaa09/investigating-netflix-movies

🎬 investigating netflix movie trends using python and pandas 📊

csv data-analysis matplotlib netflix pandas visualization

Last synced: 01 May 2026

https://github.com/falakrana/data-analysis-visualization

This repository showcases data analysis and visualization projects using Python and Tableau. It includes exploratory data analysis, interactive dashboards, and insightful visual stories derived from real-world datasets.

data-analysis data-visualization python tableau-public

Last synced: 01 May 2026

https://github.com/devag2004/electricity-analysis-using-spark

Electricity analysis project built with Spark.

data-analysis spark spark-mllib

Last synced: 01 May 2026

https://github.com/bpkaur/a-network-analysis-of-game-of-thrones

A network analysis of Game of Thrones: analyzing the co-occurrence network of the characters in the Game of Thrones books.

data-analysis data-science machine-learning networkx python3

Last synced: 01 May 2026

https://github.com/myounesdev/authorgraphanalyzer

A web-based visualization tool for analyzing and exploring author collaboration networks.

algorithms binary-tree bts d3js data-analysis dijkstra-algorithm django exception-handling pandas python scss

Last synced: 08 Jun 2026

https://github.com/sairupeshl/leo-orbital-congestion-analysis

Geospatial data analysis of the UCS Satellite Database using Python to map active LEO space assets, validate orbital parameters, and isolate mega-constellation traffic bottlenecks.

aerospace-engineering data-analysis geospatial-analysis orbital-mechanics pandas python satellite-data seaborn

Last synced: 08 Jun 2026

https://github.com/pablo1785/receipt-rs

Receipt processing backend built with Shuttle.rs, Axum and Azure Form Recognizer API

api-rest axum azure backend cognitive-services computer-vision data-analysis rust shuttle-rs sqlx

Last synced: 01 May 2026

https://github.com/fbarffmann/project1

Analyzed factors influencing movie profitability using Python. Cleaned and visualized film industry data to uncover trends in budgets, sales, genres, and ratings.

box-office-analysis data-analysis data-visualization matplotlib movie-industry pandas python regression seaborn

Last synced: 01 May 2026

https://github.com/codesaadumair/data-science-monorepo

Comprehensive Data Science monorepo featuring EDA, Machine Learning, Preprocessing, Feature Engineering, and Visualization projects with Jupyter notebooks and Python.

data-analysis data-science data-science-projects data-visualization eda jupyter-notebook jupyterlab machine-learning python

Last synced: 01 May 2026

https://github.com/linguini1/edueval

The BorealisAI Let's Solve It mentorship project: summarizing student feedback submissions on their professor into one cohesive paragraph for faculty consideration during performance reviews.

ai data data-analysis data-science machine-learning machinelearning nlp python pytorch sentiment-analysis

Last synced: 01 May 2026

https://github.com/nurulashraf/hierarchical-clustering-customer-segmentation

A customer segmentation project using hierarchical clustering to group customers based on their spending behaviour and demographics. This helps businesses identify patterns and create targeted marketing strategies.

business-analytics clustering-algorithm customer-segmentation data-analysis hierarchical-clustering machine-learning python unsupervised-learning

Last synced: 18 May 2026

https://github.com/bhoyarapurva23399/mini-erp-inventory-billing

Lightweight ERP inventory and billing web app built using Python Flask and SQLite — featuring product, customer, and dashboard management.

backend data-analysis erp flask inventory-billing mini-project python sqlite

Last synced: 01 May 2026

https://github.com/mateusoliveira30/top-intelligent-people

This project performs an exploratory analysis of the top_intelligent_people_in_the_world_5000.csv dataset, featuring some of the world's most intelligent individuals. Using pandas and matplotlib, the analysis includes checking for missing values, describing variables, and visualizing data.

data-analysis graphics kaggle-dataset python3

Last synced: 03 May 2026

https://github.com/more-joao/color-distance-luminance

Data analysis project that aims to establish a relation between the Canberra distance between white and any given color in the RGB colorspace and its luminance.

canberra-distance data-analysis luminance python r rgb

Last synced: 02 May 2026
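
The relation this project studies can be illustrated with a minimal sketch. The Canberra distance is the standard definition; the luminance formula (Rec. 709 weights on channels scaled to [0, 1], with no gamma correction) and the function names are assumptions for illustration, not necessarily what the repository uses.

```python
WHITE = (255, 255, 255)

def canberra_distance(p, q):
    """Canberra distance: sum of |p_i - q_i| / (|p_i| + |q_i|)."""
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom:  # a 0/0 term is conventionally counted as 0
            total += abs(a - b) / denom
    return total

def luminance(rgb):
    """Assumed luminance: Rec. 709 weights, channels scaled to [0, 1], no gamma."""
    r, g, b = (c / 255 for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def distance_from_white(rgb):
    """Canberra distance between white and the given RGB color."""
    return canberra_distance(WHITE, rgb)

if __name__ == "__main__":
    for color in [(255, 255, 255), (128, 128, 128), (255, 0, 0), (0, 0, 0)]:
        print(color, distance_from_white(color), luminance(color))
```

In this sketch, darker colors sit farther from white (black reaches the maximum distance of 3) while their luminance falls toward 0.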

https://github.com/faithererer/haokanvideo_spider

Haokan Video (好看视频) scraping and data analysis.

data-analysis data-visualization python spider

Last synced: 02 May 2026

https://github.com/teja-1403/ignosis-tech-ml-assignment

Analysis of transaction data to identify the most profitable products and key customer segments, providing insights for targeted marketing strategies.

customer-segmentation data-analysis data-visualization machine-learning marketing-strategy python

Last synced: 02 May 2026

https://github.com/isaqueiros/motorpremium-predictions-mlpclassifier

This Jupyter notebook is an initial study of applying the scikit-learn MLP Classifier neural network model. The model is applied to the MotorPremiums dataset, which is supplied separately in .csv format.

data-analysis data-science machine-learning neural-network python sklearn-library

Last synced: 02 May 2026

https://github.com/m0saan/python-for-data-analysis

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney.

data-analysis data-science ipython-notebook machine-learning matplotlib numpy pandas python

Last synced: 02 May 2026

https://github.com/dissorial/prx21_erikz

Analysis of self-tracked data: interactive visualizations & predictive algorithms

analytics data-analysis data-science data-visualization machine-learning matplotlib pandas python python3 visualization

Last synced: 02 May 2026

https://github.com/inevolin/multivariate-data-analysis

Showcases of modern multivariate & multidimensional data analysis in industrial and high-tech settings.

analytics data-analysis data-science data-visualization javascript

Last synced: 09 Jun 2026

https://github.com/sarah-marion/sovereign-osint-toolkit

Sovereign OSINT Toolkit - Advanced, self-hosted intelligence platform for security researchers and investigators. Ethical, private and production-ready.

correlation-engine cybersecurity data-analysis docker fastapi infosec intelligence investigation open-source osint privacy python3 security-research security-tools threat-intelligence

Last synced: 02 May 2026

https://github.com/badranalyst/movie-correlation-analysis-in-python

This project analyzes movie data correlations using Python libraries like Pandas, NumPy, Seaborn, and Matplotlib. It examines relationships between attributes such as ratings, genres, and box office performance to uncover trends that inform recommendations and enhance understanding of movie success factors.

data data-analysis dataset jupyter jupyter-notebook matplotlib matplotlib-pyplot numpy pandas python seaborn

Last synced: 03 May 2026

https://github.com/bhavna-kale/cars-eda-project

Project analyzing used car market data to identify high-impact price drivers and depreciation curves, presented through an interactive web application.

data-analysis excel matplotlib numpy pandas python3 searborn streamlit

Last synced: 03 May 2026

https://github.com/stepankuzmin/machine-learning-data-analysis

My homework for the Coursera Machine Learning and Data Analysis specialization.

coursera data-analysis jupiter machine-learning python

Last synced: 03 May 2026

https://github.com/fatihilhan42/tourist_analysis_in_turkey_with_python

This project analyzes the number of tourists who came to Turkey between 2008 and 2021. The dataset, which you can find in the repository, was first cleaned; the cleaned data were then plotted using data visualization techniques.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 03 May 2026

https://github.com/zients/tw-lottery-recommandation

Taiwan lottery draw analyzer & number recommender with Transformer ML model. Supports 539, 649, 638, 3D, and 4D lotteries.

cli data-analysis lottery machine-learning python pytorch taiwan transformer

Last synced: 03 May 2026

https://github.com/emredemirbas/movie-ratings-analysis

A data analysis project investigating potential bias in movie ratings from 2015, comparing them with ratings from other platforms using Python, pandas, and visualization libraries.

data-analysis matplotlib pandas python seaborn

Last synced: 03 May 2026

https://github.com/vipulbunny/restaurant-insight-analysis

A comprehensive data analysis project exploring restaurant ratings, locations, and customer sentiments. This project includes data preprocessing, descriptive analysis, geospatial mapping, sentiment analysis, and price-rating correlations using Python and visualization tools.

data-analysis data-preprocessing data-visualization folium geospatial geospatial-analysis geospatial-visualization machine-learning nlp pandas python restaurant-insights seaborn sentiment-analysis

Last synced: 03 May 2026

https://github.com/devlucho/modelos-predictivos

Predictive models using the Linear Regression, Logistic Regression, and Decision Tree algorithms.

data-analysis jupyter-notebook python3

Last synced: 03 May 2026

https://github.com/nathadriele/diabetes-clinical-etl-pipeline

This public-health data engineering project implements a complete pipeline to collect, clean, standardize, validate, integrate, and visualize public SUS data related to diabetes mellitus in Brazil, filtering by ICD-10 codes E10 to E14.

cid data-analysis data-extraction data-pipeline data-science data-structures data-visualization datasus diabetes-detection diabetes-prediction epidemiology-analysis etl-pipeline healthcare-analytics ibge logger pytest sih streamlit sus

Last synced: 09 Jun 2026

https://github.com/codeslash21/tmdb_data_analysis

We analyzed the TMDB dataset, which contains details of around 11,000 movies, to find some interesting facts about it.

data-analysis data-visualization matplotlib nanodegree-project numpy pandas python tmdb-movie

Last synced: 03 May 2026

https://github.com/muskanmi/data_analysis_python

Data analysis of a student results dataset using Python libraries.

boxplot countplots data-analysis numpy pandas pie-chart python3 seaborn

Last synced: 03 May 2026

https://github.com/ggarciajavier/udacity-dalf-project4-identify-fraud-enron-email

Work performed for the 4th project of the Udacity Data Analyst Nanodegree: machine learning classifier for identifying fraud in Enron email corpus.

data-analysis data-science machine-learning nlp-machine-learning python python27

Last synced: 03 May 2026

https://github.com/nurulashraf/logistic-regression-loan-prediction

Loan approval prediction using logistic regression based on applicant data, including income, credit history, and property details, after data preparation and feature engineering.

data-analysis data-science loan-prediction logistic-regression machine-learning predictive-modeling python sklearn

Last synced: 03 May 2026

https://github.com/ljadhav25/swiggy-restaurant-analysis

This repository contains data and analysis related to restaurants listed on Swiggy, one of India's largest online food ordering and delivery platforms. The objective is to explore restaurant trends, customer reviews, pricing strategies, and delivery metrics to gain insights into the food delivery industry.

data-analysis data-visualization matplotlib-pyplot numpy-library pandas-library python seaborn-plots

Last synced: 03 May 2026

https://github.com/iguptashubham/ev-market-exploration

Market size analysis is a crucial aspect of market research that determines the potential sales volume within a given market.

data-analysis data-analysis-projects data-science-project forecast projects python

Last synced: 03 May 2026

https://github.com/syarwinaaa09/analyzing-crime-in-los-angeles

Exploratory data analysis of Los Angeles crime data with insights on temporal patterns, locations, and age demographics.

crime-data data-analysis eda los-angeles pandas public-safety python visualization

Last synced: 03 May 2026

https://github.com/bpkaur/whats-in-a-name

Exploring a dataset of first names of babies born in the US to uncover interesting stories.

data-analysis datacamp numpy pandas python3

Last synced: 04 May 2026

https://github.com/r13i/cheapest-phone-call

Small challenge to find the best phone operator to use based on call price

big-data big-data-analytics cheapest data-analysis data-cruncher pandas phone-number pricelist

Last synced: 04 May 2026

https://github.com/sanchittechnogeek/rental-data-visualization_python

Statistics and visualization of rental data with Python.

data-analysis data-science data-visualization statistics

Last synced: 04 May 2026

https://github.com/soham7998/data-analysis-projects

Data analysis projects I have completed, each of which gave me hands-on experience. The projects showcase different concepts, visualizations, and more.

data data-analysis data-science machine-learning nlp python soham visualization

Last synced: 04 May 2026

https://github.com/ibrahimm7004/supermarket-sales-analysis

This project applies data mining techniques to gather insights about customer behaviour in supermarket sales. Includes: Association Rule Mining, Temporal Patterns in customer behaviour, Sequential Pattern Mining, Classification, Regression, and Outlier Detection.

apriori association-rules data-analysis data-mining data-science data-visualization fpgrowth python sales-analysis supermarket-sales

Last synced: 04 May 2026

https://github.com/mr-chang95/sf_data_visualization

In this personal project, I am interested in examining all of the active businesses in the San Francisco Bay Area while performing some simple data visualizations, mainly on categorical variables.

business data-analysis data-visualization jupyter-notebook pandas python san-francisco

Last synced: 04 May 2026

https://github.com/hyperplasma/olympic-visualization-analysis

Multidimensional analysis and visualization of Olympic medals, economy, and happiness index.

data-analysis data-visualization matplotlib numpy pandas python wordcloud

Last synced: 04 May 2026

https://github.com/halyusa16/e-commerce-analysis

This project analyzes a public e-commerce dataset to uncover valuable insights and answer critical business questions. The dataset contains customer, product, order, and transaction details, providing a comprehensive view of the e-commerce platform's operations.

data-analysis data-cleaning data-exploration data-visualization self-project

Last synced: 09 Jun 2026

https://github.com/shubhamgoyal575/credit-card-fraud-detection

📌 Credit Card Fraud Detection using Machine Learning. This project focuses on detecting fraudulent credit card transactions using machine learning models like Random Forest, XGBoost, and Deep Learning. The dataset is preprocessed to handle class imbalance, and multiple models are evaluated based on ROC AUC Score and F1 Score.

adaboost-classifier artificial-neural-networks credit-card-fraud data-analysis data-cleaning data-preprocessing data-science data-visualization deep-learning exploratory-data-analysis lightgbm machine-learning machine-learning-algorithms random-forest-classifer scikit-learn tensorflow xgboost

Last synced: 08 Feb 2026

https://github.com/swatisinghit/e-commerce-trend-analysis-for-target

An exploratory and in-depth study of the E-Commerce sales data for a Brazilian store using SQL.

bigquery data-analysis mysql sql

Last synced: 19 May 2026

https://github.com/amarlearning/exploring-the-evolution-of-linux

Data analysis of the development of the Linux operating system by exploring its Git repository history.

cleaning-data data data-analysis data-wrangling datacamp first-commit git-history linux

Last synced: 12 May 2026

https://github.com/imnotamr/datasets-used

A comprehensive collection of datasets for machine learning and data science projects, covering topics from advertising and sales to health and sports analytics

ai classification data-analysis data-science data-visualization deep-learning jupyter-notebook machine-learning models python regression-models

Last synced: 19 May 2026

https://github.com/mulukensholaye/spark_kafka_streaming_csv

Real-time streaming data analysis pipeline that integrates Apache Spark's streaming library to read records from a Kafka topic.

apache-kafka apache-spark data-analysis python3 realtime-messaging

Last synced: 19 May 2026

https://github.com/syed-amjad-ali/airbnb-listing-analysis

Analyzing AirBnB listings in Paris to determine the impact of recent regulations

business-intelligence data-analysis jupyter-notebook maven-analytics python

Last synced: 19 May 2026

https://github.com/hawmex/aut_data_and_information_analysis_project

This repository contains the files of my project for the "Data & Information Analysis" course at AUT (Tehran Polytechnic).

data-analysis data-science k-means outlier-detection python

Last synced: 19 May 2026

https://github.com/devexpress-examples/wpf-pivotgrid-how-to-display-underlying-data

This example demonstrates how to obtain the records from the control's underlying data source for a selected cell or multiple selected cells.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 19 May 2026

https://github.com/samir-atra/share-lm_dataset_analysis

Analysis, studies and optimizations on the ShareLM extension dataset

data-analysis data-visualization gemma3n huggingface huggingface-transformers pandas

Last synced: 19 May 2026

https://github.com/advestis/adadjust

Package for fitting any mathematical function to data (1-D only for now).

data-analysis fit python

Last synced: 17 May 2026

https://github.com/prakshal0809/sql-data-analysis-project

This project analyzes pizza sales data using SQL to answer a range of data analysis questions, covering foundational to advanced SQL techniques.

data-analysis sql

Last synced: 26 Jun 2025

https://github.com/borjamome/radiografia-madrid

Analysis of the population, economy, and society of Madrid with R.

data-analysis data-visualization madrid r

Last synced: 17 Jun 2025

https://github.com/singingsandhill/data_analysis

Data analysis: summary of personal projects.

data-analysis python

Last synced: 19 May 2026

https://github.com/ccoolbaugh/individualized_cooling_data_analysis

Matlab code to analyze data collected during a brown adipose tissue individualized cooling protocol.

brown-adipose-tissue cold-exposure data-analysis ibutton matlab skin-temperature thermoregulation

Last synced: 18 Aug 2025

https://github.com/oubiche-ishak19/stock_evaluation_python

A Python script to classify companies based on financial metrics like Piotroski F-Score and Stock Valuation, using CSV financial data for analysis and output.

backtesting-frameworks classification csv-processing data-analysis expert-system finance financial-analysis-tools python rule-based-classifier stock stock-market streamlit tkinter-gui yahoo-finance

Last synced: 15 May 2026

https://github.com/sukhitashvili/pca_tutorial

PCA algorithm from scratch, using only matrix-vector multiplications.

data-analysis data-science data-visualization machine-learning-algorithms pca

Last synced: 29 Mar 2025
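
One common way to compute a principal component with nothing but matrix-vector products is power iteration on the covariance matrix. The sketch below assumes that reading of the description; it is not confirmed to be the repository's method, and all function names are hypothetical.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def covariance(data):
    """Sample covariance matrix of row-wise observations."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(data, iterations=200):
    """Approximate the leading principal direction by power iteration."""
    cov = covariance(data)
    v = [1.0] * len(cov)  # assumed start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

if __name__ == "__main__":
    # Points lying on the line y = 2x, so the leading direction is (1, 2) normalized.
    points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(first_component(points))
```

Further components could be obtained by deflating the covariance matrix and repeating the iteration; the sketch stops at the first one to stay short.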

https://github.com/samukiszhsd/alteryx-analytics

You are working with Itaú bank transaction data and need to run some analyses to help the audit team detect unusual patterns and possibly suspicious transactions.

alteryx data-analysis data-structures data-visualization etl workflow

Last synced: 18 Feb 2026

https://github.com/prady2309/stock-analysis

Analysis of the stock prices of Apple, Google, Microsoft, and Amazon.

data-analysis data-science data-visualization python stock-market

Last synced: 19 May 2026

https://github.com/eve-ning/ppshift

Analyzes maps and scores from 2015

data-analysis data-mining osu osugame

Last synced: 13 Feb 2026