An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/fburic/panda-grove

A lightweight package to manage multiple Pandas DataFrames

data-analysis data-science data-wrangling pandas

Last synced: 27 Jun 2026

https://github.com/xsolla/data-fast-insights

Xsolla data analytics tool for fast business insights and reporting.

analytics data data-analysis data-science python reporting xsolla

Last synced: 29 Jun 2026

https://github.com/cdilga/knn-c

C implementation of a K-Nearest Neighbour algorithm

data-analysis knn

Last synced: 04 Apr 2026

https://github.com/adnanrahin/apache-spark-complete-reference

This repository reflects on all the necessary steps to take before jumping into Big Data.

big-data data-analysis data-science kaggle-dataset machine-learning rdd scala spark

Last synced: 29 Apr 2026

https://github.com/manwithacap/by-the-metric-match

🎲🃏 A game data tracker for your board/card/video games!

data-analysis data-visualization games jupyter-notebook python utility

Last synced: 29 Apr 2026

https://github.com/faris771/investigate_a_dataset

This repository contains a Jupyter Notebook that investigates a dataset using data analysis techniques.

data-analysis

Last synced: 29 Apr 2026

https://github.com/bassamn/titanic-data-analysis

Exploratory data analysis (EDA) of the Titanic dataset using Python. Analyzed survival patterns by age, gender, and class with visualizations (seaborn/matplotlib). Non-ML focus—highlighting insights with statistics and plots.

data-analysis eda pandas python seaborn titanic visualization

Last synced: 08 May 2026

https://github.com/lebrancconvas/how-much-love-in-thai-song

How many love songs are there among Thai songs?

data-analysis side-project web-scraping

Last synced: 19 Jun 2026

https://github.com/kiranmayi5/python-projects

A collection of Python projects showcasing skills in data analysis and visualization.

data-analysis data-visualization machine-learning nlp python

Last synced: 05 May 2026

https://github.com/timmymatten/spikeball-stat-tracker

Spikeball stat tracking web app built with Streamlit and Python, designed to easily log and analyze player performance over multiple games.

data data-analysis data-visualization dataset matplotlib-pyplot multipage python spikeball statistics streamlit

Last synced: 18 Apr 2026

https://github.com/mikasenghaas/covid19-analysis

Analysis of the correlation between COVID-19 infection numbers and weather data from the beginning of the pandemic until April 2021

data-analysis statistical-analysis

Last synced: 14 Feb 2026

https://github.com/jongan69/potion-leaderboard

Start of Entry for potion leaderboard contest

data-analysis leaderboard potion trading

Last synced: 11 Jun 2026

https://github.com/daniel1kp/openrtb-dashboard

This is a demo project designed to illustrate using Rill to analyze programmatic bid logs using the canonical open RTB framework.

data-analysis openrtb real-time-bidding rill

Last synced: 19 Mar 2026

https://github.com/geo-y20/loan-approval-automation-using-mongodb-and-pymongo

This project demonstrates the implementation of a loan approval system that utilizes MongoDB for distributed data storage and management, and PyMongo for database operations. The project aims to automate the assessment of loan eligibility using customer details from online applications.

crud-application data data-analysis data-science data-visualization deployment jupyter-notebook loan-default-prediction loan-prediction-analysis machine-learning machine-learning-algorithms matplotlib mongodb pymongo streamlit web

Last synced: 08 May 2026

https://github.com/kirkalyn13/open-signal-report-generator

Script used to generate results/summary, including the trends of flagged provinces, from the raw excel data file,

data-analysis data-science data-visualization matplotlib numpy pandas python

Last synced: 19 Jun 2026

https://github.com/jhrcook/wagenmaker-data-analysis

Analysis of Registered Replication Report: Strack, Martin, & Stepper (1988) by Wagenmakers et al.

data-analysis r r-project statistics

Last synced: 08 Jun 2026

https://github.com/sathyasris27/time-series-and-spectral-analysis-

This project aims to analyze the data, remove trends and seasonal effects, identify the underlying process, understand the dominant frequencies, and use the residuals to make predictions.

data-analysis data-visualization forecasting r spectral-analysis time-series-analysis

Last synced: 07 Jun 2026

https://github.com/thevinh-ha-1710/rstudio-statistics

This project studies two datasets in depth using applied statistics techniques.

applied-statistics data-analysis data-science data-visualization rmarkdown rstudio

Last synced: 31 Jan 2026

https://github.com/hayatiyrtgl/data_analysis_project

Financial data analysis: preprocess, visualize, calculate technical indicators.

data-analysis data-analysis-python data-science dataframe numpy pandas python python3 stock-price-prediction talib trade-analysis

Last synced: 04 Apr 2026

https://github.com/phammings/sales-management-analysis

Sales management analysis and Power BI dashboard for sample business request and user stories

data-analysis excel powerbi sql

Last synced: 01 Feb 2026

https://github.com/ferrangarciarovira/premier-league-betting-analysis

Comprehensive Python analysis of Premier League betting market inefficiencies (2005–2024). Evaluates bookmaker biases, betting strategies, and market efficiency using statistical methods and Monte Carlo simulations.

betting-strategies bias-detection data-analysis market-efficiency monte-carlo-simulation premier-league python sports-analytics

Last synced: 03 May 2026

https://github.com/shridhar1504/loan-clustering-datascience-project

This project uses machine learning to cluster loans based on their similarities. It uses a dataset of loan applications that includes information about the loan amount and balance, then applies a clustering algorithm to group similar loans together.

clustering-algorithm data-analysis data-science data-visualization datanalysis eda kmeans-clustering machine-learning python sql sql-server unsupervised-learning

Last synced: 08 May 2026

https://github.com/varshithdupati/yelp-business-analysis

Big Data analysis on Yelp reviews/businesses for Arizona. Using Hadoop, Spark, PySpark.

arizona-state-university big-data big-data-analytics data-analysis hadoop pyspark spark yelp

Last synced: 04 May 2026

https://github.com/tynoee/record_company-database

A record company database with multiple query commands using SQL

data-analysis sql

Last synced: 31 Jan 2026

https://github.com/wizardoftrap/football-team-analytics

This Jupyter notebook, created on Kaggle, analyzes football player and team statistics for the 2024-2025 season. It provides insights into player performance, team metrics, and playing styles across major European leagues using data from the dataset players_data-2024_2025.csv.

data-analysis data-visualization jupyter-notebook pandas python

Last synced: 05 May 2026

https://github.com/dina-hosny/telco-customer-churn-analysis-using-power-bi

An interactive Microsoft Power BI dashboard presenting an analysis of the "Telco customer churn" data and the reasons customers churned.

business-intelligence data-analysis data-modeling data-visualization power-bi powerbi

Last synced: 19 Mar 2026

https://github.com/iguptashubham/ott-churn-eda-ml

Understanding why customers discontinue their subscriptions is crucial to optimizing the user experience, reducing churn, and maximizing customer lifetime value. This project uses a machine learning model to predict customer churn.

data-analysis data-analysis-project data-science data-science-portfolio data-science-projects data-visualization machine-learning python

Last synced: 08 May 2026

https://github.com/jethronap/jstat-gui

Web-based GUI application for data analysis

data-analysis data-visualization java jstat mongodb

Last synced: 08 May 2026

https://github.com/luminati-io/shopee-dataset-samples

A sample dataset of over 1000 Shopee products, extracted using the Bright Data API, ideal for pricing optimization, gap analysis, and market strategy refinement.

api data-analysis data-mining datasets products shopee web-scraping

Last synced: 12 Feb 2026

https://github.com/msthamizh/phonepe-pulse-data-visualization-and-exploration

Developing a Streamlit application that allows users to explore and analyze transaction data from the PhonePe Pulse dataset. The project aims to provide insights into digital payment trends across India.

data-analysis data-visualization dataframe mysql pandas plotly python streamlit

Last synced: 02 May 2026

https://github.com/miroslav-reiter/kurz_jazyk_sql_analytici_datovi_vedci

Materials for the course SQL Language 1 for Analysts and Data Scientists

analysis analytics data data-analysis data-science database mysql reiter sql

Last synced: 08 May 2026

https://github.com/cagandemirmr/google-play-yorum-analizi

I gathered insights by analyzing variables such as the sentiment of reviews, the number of likes on reviews, the company's responses, and user nicknames for My Supermarket Simulator 3D, the most liked game in Turkey in 2024.

bert data-analysis data-science nlp

Last synced: 10 Jun 2026

https://github.com/allanotieno254/us-largest-companies-by-revenue-web-scraping

A Python project for web scraping and analyzing the largest companies in the United States by revenue from Wikipedia

automation beautifulsoup csv data-analysis data-cleaning data-execution data-extraction pandas python web-scraping

Last synced: 08 May 2026

https://github.com/framebuffers/mindhunter

Wrappers for Pandas DataFrames to add quicker access for common statistical values, utilities and functionality.

data-analysis data-science numpy pandas python utilities-python

Last synced: 08 May 2026

https://github.com/aekanshd/crazytics-suicidesindia

Basic interpretation of the Suicides in India data-set using R.

data-analysis data-science graph india r suicides

Last synced: 10 Jun 2026

https://github.com/md-emon-hasan/data_analytics_project

Data analytics tasks and solutions, featuring hands-on exercises for data cleaning, visualization, and analysis using Python libraries.

cars-dataset census-data covid19-data data-analysis london-house-price police-data weather-data

Last synced: 08 May 2026

https://github.com/victor-antoniassi/junior_data_analyst_test_01

Solution developed for a technical assessment that analyzed video game sales data to support gaming partnership decisions.

asses assessment-project data-analysis data-analysis-project data-analyst duckdb etl prefect python

Last synced: 01 Jun 2026

https://github.com/denisecase/nlp-03-text-exploration

Exploratory analysis of text corpora using tokenization, frequency, co-occurrence, and bigrams to reveal structure in text.

bigrams co-occurence corpus-analysis data-analysis nlp python text-analysis text-exploration tokenization

Last synced: 02 Jun 2026

https://github.com/mengyaohuang/data-manipulation-and-analysis

Data processing implementation with tools in Python

data-analysis nlp-machine-learning pandas-dataframe python

Last synced: 27 Apr 2026

https://github.com/datalopes1/ds_salaries2024_eda

This project performs EDA (Exploratory Data Analysis) on the Data Science Salaries 2024 dataset, which is available on Kaggle under the Database: Open Database license and was uploaded by Sazidul Islam.

data-analysis data-visualization eda exploratory-data-analysis jupyter-notebook python

Last synced: 29 Apr 2026

https://github.com/flyingfathead/neurograph-framework

A versatile tool for visualizing entropy loss in TensorFlow-based neural network training, providing insightful scatter plots with annotations.

data-analysis data-analysis-python data-visualization entropy graph graphs neural-network neural-networks neural-networks-visualization nn python python3 tensorflow tensorflow2 training visualization visualization-tools

Last synced: 24 Apr 2026

https://github.com/souravsuvarna/whatsapp-chat-analyzer-api

The WhatsApp Chat Analyzer API is a public API designed for frontend enthusiasts who want to build a WhatsApp Chat Data Visualizer project. Built on FastAPI, it processes chat data and returns the results in JSON format.

api data-analysis data-science fastapi publicapi python

Last synced: 20 Jun 2026

https://github.com/gonzalo123/pivot.pandas

Data Analysis with Python. Pivot tables with Pandas

data-analysis jupyter-notebook pandas pivot-tables python

Last synced: 05 May 2026

https://github.com/elcaiseri/udacity-advanced-data-analysis

UDACITY - Advanced-Data-Analysis Track Project

data-analysis python

Last synced: 05 May 2026

https://github.com/shuddha2021/stellar-candidate-selector

A sophisticated candidate selection algorithm leveraging multi-criteria analysis and machine learning to identify top software engineering candidates. This tool features flexible filtering, score adjustment, and detailed visualizations to streamline the recruitment process.

candidate-selection data-analysis data-visualization machine-learning pandas plotting-in-python python python-data-analysis recruitment scikit-learn

Last synced: 05 May 2026

https://github.com/keganedwards/housing-prices-exploration

Using machine learning algorithms to explore housing prices

data-analysis data-science python school-project

Last synced: 24 Apr 2026

https://github.com/vara-co/python-api-challenge

Weather and Perfect Vacationing Spots Worldwide, by using APIs

api apis data-analysis data-visualization hvplot jupyter-notebook matplotlib pandas vacation weather

Last synced: 05 May 2026

https://github.com/sadia-khan13/supervised_machine_learning

This repository is meant to document my hands-on experience with supervised learning algorithms and techniques. It includes a variety of exercises, and experiments using different types of data and tools. Each file represents a step forward in building my machine learning skills.

data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms python sciket-learn supervised-machine-learning

Last synced: 06 Mar 2026

https://github.com/spaghettifunk/gvb

Analysis of GVB in Amsterdam

data-analysis public-transportation

Last synced: 28 Feb 2026

https://github.com/jamiemagee/rhi

Collating the data on the Renewable Heat Incentive scheme, and presenting it in a more readable format.

data-analysis open-data open-government rhi

Last synced: 25 Feb 2026

https://github.com/antonio-f/big-data-analysis-with-scala-and-spark

Coding assignments from the course "Big Data Analysis with Scala and Spark" (Coursera).

big-data bigdata coursera data-analysis scala spark

Last synced: 27 Apr 2026

https://github.com/akshat0427/python_youtube_history

A set of data science operations performed on YouTube history data

data-analysis data-science extracting-features

Last synced: 10 Jun 2026

https://github.com/avijit-jana/redbus-data-scraper-dashboard

A Streamlit-based application leveraging Selenium to automate data scraping from Redbus, enabling efficient collection, analysis, and visualization of bus travel data for improved operational efficiency and strategic planning in the transportation industry.

automation dashboard data-analysis data-visualisation data-visualization datadrivendecisions filtering python3 redbus selenium selenium-python streamlit streamlit-application travel web-scraping webscrapping

Last synced: 09 May 2026

https://github.com/emso-exe/reclamacoes_de_consumidores_com_empresa_de_telecomunicacoes

Analysis of consumer complaints against a telecommunications company in the first half of 2021, based on data from the consumidor.gov.br website.

analise-de-dados ciencia-de-dados data-analysis data-science datascience python python-3 python3

Last synced: 02 May 2026

https://github.com/kalfasyan/filoma

Profiling of files, directories, and image data

data-analysis profiler validation

Last synced: 05 Apr 2026

https://github.com/sedatdikbas/aefes-time-series-forecasting

This project uses deep learning methods (LSTM, BiLSTM, CNN+LSTM) to predict future closing prices from market data for Anadolu Efes Biracılık ve Malt Sanayii A.Ş. (AEFES). The data preprocessing, model training, and evaluation steps are described in detail.

bilstm cnn-lstm data-analysis deep-learning financial-forecasting lstm machine-learning python stock-price-prediction tensorflow

Last synced: 09 May 2026

https://github.com/alemalvarez/data-analysis-web-project

Web-app providing a simple interface for data storage,

data-analysis data-science javascript react webapp

Last synced: 29 Apr 2026

https://github.com/leosimoes/uerj-tcc-analisador-dados

Final course project (TCC) in Computer Engineering. A web application for data preparation and analysis, chart creation, and linear and logistic regression models.

computer-engineer data-analysis data-science data-visualization linear-logistic linear-regression python streamlit

Last synced: 24 Apr 2026

https://github.com/rubinlake/rl-academy-data-analytics

Educational data analysis project demonstrating BMW sales data analysis with AI-powered code assistance using Cursor IDE and Jupyter notebooks

cursor-ide data-analysis educational-project jupyter langchain matplotlib numpy pandas python scipy seaborn

Last synced: 09 May 2026

https://github.com/chahiriabderrahmane/carpricepredictor

🚗 Cars Exploration & Price Prediction | Analyzing Cars.com Listings

data-analysis data-science data-visualization machine-learning python streamlit web-scraping

Last synced: 08 Feb 2026

https://github.com/dangerousfish/uk-climate-trends-dashboard-metoffice

A data pipeline and Streamlit dashboard that aggregates, cleans and visualises historical UK Met Office station data - interactive charts, heatmaps and maps for temperature, rainfall and sunshine.

climate climate-analysis climate-change climate-data climate-science data-analysis data-visualization metoffice metofficeweather streamlit temperature weather

Last synced: 02 May 2026

https://github.com/dina-hosny/explore-us-bike-share-data-project

Explore US Bike Share Data project - FWD Data Analysis Professional Track. In this project, I used Python to explore data related to bike share systems for three major cities in the United States and answer questions about it by computing descriptive statistics.

data-analysis data-science numpy pandas python

Last synced: 09 May 2026

https://github.com/markmusic27/data-statistics-calculator

💣 This method (made in JavaScript / Python) can find the mean, median, mode, range, and standard deviation.

data-analysis standard-deviation statistics statistics-calculator

Last synced: 20 Jun 2026

https://github.com/snehilk1312/data_science

This repository contains my recent data science work, including visualization, cleaning, models, statistics, courses, and datasets. :=)

data-analysis data-science glove natural-language-processing nlp nltk statistics word2vec

Last synced: 02 Apr 2026

https://github.com/dogan-the-analyst/developer_survey_analysis

Analysis of the 2024 Stack Overflow developer survey. Tools used include Python, Pandas, Matplotlib, and IBM Cognos.

data-analysis data-visualization ibm-cognos-analytics matplotlib pandas python

Last synced: 09 May 2026

https://github.com/kimtth/agent-data-analyst-stream-chainlit

⚡️Chainlit-based Data Analyst Chat Agent (Responses API, Server Sent Events) 📈

agent azure-openai chainlit code-interpreter data-analysis server-sent-events stream-response

Last synced: 09 Jun 2026

https://github.com/dcs-training/intromachinelearning

This course provides an introduction to machine learning for those with beginner-level Python skills. Go to the README file

data-analysis data-wrangling machine-learning python statistics

Last synced: 06 Mar 2026

https://github.com/scarblase/sales_insights

A data-driven analysis of 15,000 sales records using Python, Pandas, and visualizations to uncover trends, optimize strategies, and enhance business performance. 🚀📊

data-analysis data-visualization dataset matplotlib-pyplot pandas python3 sales-analysis seaborn

Last synced: 05 May 2026

https://github.com/pngo1997/axa-xl-insurance-bi-dashboard

Provides a comprehensive analysis of insurance submissions, approvals, compliance rates, and profitability for AXA XL Insurance.

bi-analytics bi-dashboard business-analytics data-analysis filtering performance-analysis powerbi segmentation visualization

Last synced: 08 Feb 2026

https://github.com/obirikan/ad-performance-analysis

This project compares ad effectiveness using A/B tests, analyzing ad performance with user interaction data, advertisement metadata, and device data. The goal is to evaluate click-through rates (CTR) across various ad versions, platforms, and devices.

data-analysis pandas

Last synced: 27 Apr 2026

https://github.com/adrija-debnath/ideas-isi-data-science-internship

Topic of the Project - Predictive Maintenance Analysis, Data Science Internship at IDEAS - Institute of Data Engineering, Analytics and Science Foundation Technology Innovation Hub at Indian Statistical Institute, Kolkata.

data-analysis data-science predictive-analytics predictive-maintenance streamlit

Last synced: 27 Apr 2026

https://github.com/ehtisham-sadiq/building-an-ml-based-heart-disease-diagnosis-system-with-flask

An end-to-end project that uses machine learning to create a user-friendly heart disease diagnosis system, powered by Flask.

data-analysis exploratory-data-analysis feature-engineering flask machine-learning model-building model-evaluation pipelines python3 rest-api

Last synced: 04 May 2026

https://github.com/sejalmankar1012/yuvaco_data_analysis_assessment

This assignment involves writing a Python script to calculate the cost of package deliveries based on provided data and a cost grid. The script takes package details such as weight, distance, and delivery type, applies the cost calculation rules, and saves the results in an output file. You can also run the script in Google Colab for convenience.

csv-file-handling data-analysis google-colab package-delivery python python-scripting

Last synced: 29 Apr 2026

https://github.com/mg380/ibm-applied-data-science-capstone

This capstone is the 10th (final) course in the IBM Data Science Professional Certificate specialization. It summarizes, in the form of a project, all the material learned during the specialization.

capstone data data-analysis data-science datascience ibm machine-learning plotly python scikit-learn sql

Last synced: 05 Mar 2026