An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/dogan-the-analyst/developer_survey_analysis

Analysis of the 2024 Stack Overflow developer survey. Tools used include Python, Pandas, Matplotlib, and IBM Cognos.

data-analysis data-visualization ibm-cognos-analytics matplotlib pandas python

Last synced: 09 May 2026

https://github.com/talha-1010/imdb-data-analysis

A data analysis project made with Python using pandas.

data-analysis data-visualization jupyter-notebook pandas pandas-dataframe

Last synced: 09 May 2026

https://github.com/mariam-badr-mb/gtc-ml-project2-diabetes-prediction

This project is part of the GTC Machine Learning Program. It demonstrates the end-to-end ML workflow by building a predictive model for diabetes detection

classification-algorithm data-analysis data-visualization diabetes-prediction gridsearchcv hyperparameter-tuning machine-learning python

Last synced: 09 May 2026

https://github.com/sathyasris27/data-analysis-on-adult-smoking-patterns-in-the-uk

The aim of this analysis is to understand the smoking patterns among adults in the UK.

data data-analysis data-visualization python3

Last synced: 09 May 2026

https://github.com/musaibnagani/fraud-detection

End-to-end fraud detection simulation using Python — Phase 1 (SQLite + Rules) and Phase 2 (MSSQL + Velocity/Behavioral Features) with synthetic banking data.

data-analysis fraud-detection fraudulent-transactions mssql mssql-database pandas python sqlite3 time-series

Last synced: 10 May 2026

https://github.com/christos99/scraping-project

This project is a Python-based tool for web scraping with a user-friendly GUI. Built with PyQt5 and Selenium, it allows users to scrape online listings by specifying keywords, price ranges, and exclusions. Results are displayed in a table and can be exported to an Excel file.

automation data-analysis excel gui openpyxl pandas pyqt5 python selenium web-scraping

Last synced: 10 May 2026

https://github.com/gabrielmpinho/cs50-sql

Solutions and notes from CS50’s Introduction to Databases with SQL. Covers CRUD operations, data modeling, normalization, joins, views, indexes, and connecting SQL with Python and Java. Begins with SQLite for portability and introduces PostgreSQL and MySQL for scalability.

data-analysis data-structures data-visualization database databases javascript python sql

Last synced: 10 May 2026

https://github.com/pratik-khose/data-analysis-with-pandasai

PandasAI with Llama3 for Interactive Data Analysis

data-analysis llama3 llma pandasai streamlit visualization

Last synced: 11 May 2026

https://github.com/chayandatta/got_script_manipulation

Game of Thrones Script - String & file manipulation

data-analysis data-science pandas python3

Last synced: 11 May 2026

https://github.com/easycris-software/easycris

Professional statistical analysis and RNA-seq for researchers — no coding required

anova bioinformatics data-analysis desktop-app genomics pharmacology research-tools rna-seq statistics tauri

Last synced: 11 May 2026

https://github.com/is-leeroy-jenkins/sherpa

A budget execution and data analysis tool for EPA analysts, built on WinForms and .NET 6 and written in C#.

budget-management data-analysis data-science data-visualization federal-government

Last synced: 13 May 2026

https://github.com/zpreisler/modules

Python libraries and modules for processing simulation outputs

data-analysis python scripts tensorflow

Last synced: 13 May 2026

https://github.com/iguptashubham/pizzahut-analysis-sql

A dataset for practicing data analysis. Pizza Hut data analysis done by Shubham Gupta in MySQL. The dataset was provided by a friend who interned at Pizza Hut, where it was used for training and practice questions. The data does not reveal anything about Pizza Hut and is safe to share.

data-analysis data-analytics database dataset datasets mysql mysql-database pizzahut

Last synced: 14 May 2026

https://github.com/pferreirafabricio/data-immersion

🏊🏻‍♂️ Activities and exercises from 'Imersão Dados' event

data data-analysis data-science dataset jupiter-notebook python

Last synced: 14 May 2026

https://github.com/sunnybibyan/random_data_generation

A project that generates a dataset using various statistical distributions (Normal, Uniform, Exponential, Random Integers, and Binomial) and performs data analysis. Includes visualizations and an option to export the data as a CSV file.

data-analysis data-visualization python random-data-generation statistics streamlit-webapp

Last synced: 13 Jun 2026

https://github.com/abhi18av/innovation-competition

Submission for a programming challenge

clojure clojurescript data-analysis

Last synced: 13 Jun 2026

https://github.com/reinmagine/eliminating-no-sensor

Contains my project that analyzes air quality sensor data to determine if the NO (Nitric Oxide) sensor in N. Mai, Los Angeles, CA can be removed without affecting data accuracy.

air-quality-sensor colab-notebook cost-optimization data-analysis data-optimization matplotlib-python nitric-oxide pyspark-python python sql

Last synced: 14 Jun 2026

https://github.com/soufianboukir/ecom-analytics-platform

End-to-end data science project on an Amazon sales dataset, including data preprocessing, analysis, modeling, and a Streamlit dashboard for insights and decision-making.

data-analysis data-science data-visualization data-visualization-dashboard forecasting-models timeseries

Last synced: 14 Jun 2026

https://github.com/kaz-yos/distributed

Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulation Study (Pharmacoepidemiol Drug Saf 2018)

data-analysis epidemiology statistics

Last synced: 15 Jun 2026

https://github.com/ensinho/data-analysis

My repository for data analysis studies in Python.

csv data-analysis graphs python python-documentation

Last synced: 15 Jun 2026

https://github.com/chetanmalviya513/Firm-Financial-Transaction-Analysis

📊 Financial Analysis & Forecasting: processed large-scale financial data using Python for trend analysis and insights. Developed interactive Tableau dashboards to improve forecasting accuracy and reduce costs by 25%.

data-analysis financial-data forecasting insights msexcel pandas python reporting tableau-dashboards

Last synced: 15 Jun 2026

https://github.com/kaushik0911/jubilant-guide

A Streamlit application for advanced route planning and accessibility analysis using OpenRouteService (ORS). Explore optimal routes while avoiding roadblocks and discover points of interest (POIs) within travel time ranges.

data-analysis data-visualization geospatial-analysis python streamlit

Last synced: 16 Jun 2026

https://github.com/mindgamesnl/yanderestats

https://mindgamesnl.github.io/YandereStats/

data-analysis reporting-pipeline yandere yandere-sim

Last synced: 18 Jun 2026

https://github.com/duoan/machine-learning-notebook

A notebook repository for tracking progress in learning machine learning.

data-analysis decision-tree ensemble-model gbdt machine-learning numpy pandas xgboost

Last synced: 18 Jun 2026

https://github.com/jmssnr/shuffle-kit

shuffle-kit: model and analyze playing card shuffles in Python

data-analysis playing-cards python shuffle statistics

Last synced: 19 Jun 2026

https://github.com/lebrancconvas/how-much-love-in-thai-song

How many love songs are there among Thai songs?

data-analysis side-project web-scraping

Last synced: 19 Jun 2026

https://github.com/kirkalyn13/open-signal-report-generator

Script used to generate results and a summary, including the trends of flagged provinces, from the raw Excel data file.

data-analysis data-science data-visualization matplotlib numpy pandas python

Last synced: 19 Jun 2026

https://github.com/souravsuvarna/whatsapp-chat-analyzer-api

The WhatsApp Chat Analyzer API is a public API designed for frontend enthusiasts who are interested in building a WhatsApp Chat Data Visualizer project. Built on FastAPI, it processes chat data and returns the processed results in JSON format.

api data-analysis data-science fastapi publicapi python

Last synced: 20 Jun 2026

https://github.com/markmusic27/data-statistics-calculator

💣 This method (made in JavaScript / Python) can find the mean, median, mode, range, and standard deviation.

data-analysis standard-deviation statistics statistics-calculator

Last synced: 20 Jun 2026
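
The five statistics named in the data-statistics-calculator entry above can be computed with Python's standard library alone. The following is a minimal sketch, not the repository's code: the function name `describe`, the returned dictionary keys, and the choice of population (rather than sample) standard deviation are all assumptions made for illustration.

```python
import statistics


def describe(values):
    """Return mean, median, mode(s), range, and standard deviation.

    Hypothetical helper for illustration; not taken from the listed repository.
    """
    if not values:
        raise ValueError("values must not be empty")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # multimode returns every most-common value, so ties are not hidden
        "modes": statistics.multimode(values),
        "range": max(values) - min(values),
        # Assumption: population standard deviation; use statistics.stdev
        # instead if the data is a sample.
        "stdev": statistics.pstdev(values),
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

Returning a list of modes rather than a single value is a design choice: it keeps the result well defined when several values are equally common.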

https://github.com/alicankaya192/world-happiness-report-2025

Comprehensive exploratory data analysis (EDA) and visualization of the World Happiness Report 2025. Analyzes global rankings, regional distributions, key happiness factors, and detects wealth-happiness paradox outliers using Python (Pandas, Matplotlib, SciPy).

correlation-analysis data-analysis data-science data-visualization eda exploratory-data-analysis global-happiness happiness-index matplotlib pandas python scipy statistics whr-2025 world-happiness-report

Last synced: 21 Jun 2026

https://github.com/alicankaya192/ai-jobs-market-2025-2026-salaries

🤖 Global AI & LLM jobs market analysis (2025–2026). Salary trends, remote work premiums, top paying skills, and LLM engineering vs traditional AI comparisons. 📈

ai-jobs data-analysis data-science data-visualization eda exploratory-data-analysis generative-ai jobs jupyter-notebook llm-learning market-analysis matplotlib pandas salary-analysis statistics

Last synced: 21 Jun 2026

https://github.com/ryanfranklin237/data-cleansing

A group of python scripts that clean large data sets by removing duplicate data, putting data in correct formats, and removing redundant cells

data-analysis data-cleaning data-science extract-transform-load pandas-dataframe python

Last synced: 23 Jun 2026

https://github.com/rogernet/desafio-profissional-produto-data-driven

Helping to train Product Analysts, PMs, and Business Managers capable of making strategic, data-driven decisions.

data-analysis data-science data-visualization product

Last synced: 23 Jun 2026

https://github.com/abhik1711/material-classification-and-energy-band-prediction---excavate-25

A Two-Stage Machine Learning Pipeline: A Binary Classifier to identify insulators with high accuracy and a Stacking Regressor to predict precise band gap values for insulators by leveraging advanced feature engineering techniques and ensemble learning methods

data-analysis machine-learning python

Last synced: 23 Jun 2026

https://github.com/rudra-g-23/find-my-joint

A utility to find potential join keys (matching columns) across multiple DataFrames.

data-analysis data-visualization join network-graph pandas pandas-dataframe

Last synced: 24 Jun 2026

https://github.com/infinitode/duplipy

DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.

ai augmentation data-analysis data-preprocessing data-science images language-models nlp preprocessing text-data text-datasets text-formatting

Last synced: 28 Jun 2026

https://github.com/amruthadevops/suspicious_web_threat_interactions

To detect and analyze patterns in web interactions for identifying suspicious or potentially harmful activities

cyber-security data-analysis data-science data-visualization jupyter-notebook machine-learning powerbi python

Last synced: 29 Jun 2026

https://github.com/savinrazvan/heredity

An AI that assesses the likelihood of genetic traits in individuals using a Bayesian Network to analyze family genetic data, modeling genetic inheritance and mutations to infer probabilities of gene presence and trait expression.

ai bayesian-network biological-data-analysis data-analysis educational-project family-genetics genetic-inheritance genetic-traits heredity mutation-modeling probability-calculation python

Last synced: 29 Jun 2026

https://github.com/selcuk05/forbes_top_100_celebrities_data_analysis

Forbes Top 100 Celebrities since 2005 Data Analysis and Visualization

data-analysis data-science

Last synced: 11 Oct 2025

https://github.com/agungbudiwirawan/sales-analysis-using-excel-formulas

The objective of this project is to analyze supermarket sales data using formulas in Microsoft Excel.

data-analysis excel excel-formulas microsoft-excel spreadsheet

Last synced: 08 Jan 2026

https://github.com/cyberoctane29/unicorn-companies-analysis

This project explores unicorn companies, private startups valued at over $1 billion, using Python for data analysis. It covers industry trends, geographic distribution, and investment patterns through EDA, including data cleaning, handling missing values, datetime transformations, and visualizations to uncover key insights.

data-analysis eda numpy pandas python

Last synced: 02 May 2026

https://github.com/cworld1/novel-analysis

A simple project for analyzing Chinese novels

data-analysis novel

Last synced: 17 Mar 2025

https://github.com/hifza-khalid/book-management-system-sql

A Book Management System SQL project 📚 featuring tables for Authors ✍️, Books 📖, Customers 👤, and Orders 🛒. Includes sample queries for tracking book sales 💰, pricing by genre 🎭, and customer order history 📅.

book-management data-analysis database-management sql sql-queries

Last synced: 03 Feb 2026

https://github.com/nafisalawalidris/springforth-university-foodbank

Springforth University Food Bank: A collaborative initiative with UNESCO to address student food insecurity. Contains code and resources for the web application, data analysis, and insights into the prevalence and impact of food insecurity on academic performance.

academic-performance collaborative-initiative data-analysis data-visualization excel pivot-tables powerbi springforth-university-food-bank student-food-insecurity unesco

Last synced: 17 Feb 2026

https://github.com/ansh420/mcdonald_case-study

A McDonald's case study based on market segmentation analysis.

algorithms-implemented data-analysis python3 segmentation

Last synced: 12 Apr 2026

https://github.com/geetisha/sales_insight_data_analysis_using_sql_and_tableau-etl-

Sales Insights: a data analysis project performed with Tableau and SQL.

dashboard data-analysis data-visualization mysql project sales-analysis sql tableau

Last synced: 07 Jan 2026

https://github.com/azmainadel/twitter-data-neo4j

Playing with a graph database on a large dataset of Twitter data.

data-analysis data-visualization neo4j-database snap

Last synced: 06 Apr 2025

https://github.com/dcs-training/intronetworkanalysis

This is a repository for the Introduction to Network Analysis course provided by Brian Wong for the CDCS. Within the repository there are files with sample datasets and a guide to building datasets. It will be updated before each section. Go to the Readme file

data-analysis data-visualisation gephi network-analysis text-analysis

Last synced: 27 Jan 2026

https://github.com/cosmoduende/r-twitter

Explore your Twitter activity with R: Sentiment Analysis and Data Visualization. How to analyze your Twitter account (or any account), discover your habits and sentiments with the "rtweet" package and NLP.

data-analysis data-visualization lemmatization nlp nlp-library nlp-resources nltk nltk-library r-package r-programming r-studio rtweet stemming twitter twitter-api twitter-data twitter-data-analysis twitter-data-extraction twitter-sentiment-analysis udpipe

Last synced: 10 Oct 2025

https://github.com/moindalvs/learn_eda_house_price_dataset

Data set: House Prices: Advanced Regression Techniques. Exploratory data analysis on more than 80 features.

cardinality data-analysis data-science data-structures data-visualization missing-values

Last synced: 10 Oct 2025

https://github.com/kamanhang/sqldatawarehousedataengineeringproject

This project delivers a modern data warehouse, focusing on building a clean, organized data pipeline that covers ETL pipeline development, data cleaning, data modelling, and data analytics.

customer-analytics data-analysis data-cleaning data-engineering data-modeling data-pipeline data-visualization datascience etl-pipeline postgresql powerbi powerbidashboard sales-analysis sql

Last synced: 10 Oct 2025

https://github.com/ryanfranklin237/data-visualization-python

A tool that lets you visualize data from a CSV or Excel file as graphs or charts.

data-analysis data-science data-visualization matplotlib pandas-dataframe python

Last synced: 11 Jun 2026

https://github.com/saeun-park/data-analysis

Data analysis projects and competitions

anova-test data-analysis data-visualization statistics

Last synced: 21 Jan 2026

https://github.com/gher-uliege/bluecloud-plankton

Spatial interpolation of plankton data using a neural network

data data-analysis data-visualization neural-network oceanography

Last synced: 30 Mar 2025

https://github.com/abhi227070/whatsapp-chat-analyzer

The WhatsApp Chat Analyzer is a project that leverages machine learning and natural language processing techniques to analyze chat data from WhatsApp conversations. It provides insights such as message statistics, sentiment analysis, word clouds, and more.

artificial-intelligence data-analysis data-visualization machine-learning machine-learning-algorithms python-3 python-programming

Last synced: 29 Jun 2026

https://github.com/viztruth/google-play-store-data-analysis

This repository contains all the materials of my final project 'Google Play store Data Analysis' for the 'Telling Stories with Data' course at PES University.

data-analysis data-visualization

Last synced: 21 Aug 2025

https://github.com/harshals499/ecosecure-visualization

Data visualization project using Qlik to analyze sales performance for EcoSecure Systems.

business-intelligence data-analysis data-visualization qlik-sense sales-analysis

Last synced: 12 Jun 2026

https://github.com/sing-group/bew

Public repository for Biofilms Experiment Workbench (BEW).

aibench data-analysis data-management java jfreechart workbench

Last synced: 03 Jul 2025

https://github.com/macdon112/layoff-analysis

SQL data cleaning & analysis of global layoffs

data-analysis data-cleaning data-exploration sql

Last synced: 21 Feb 2026

https://github.com/PatriLoto/Intro_R_para_reinventarTEC_2021

Material for the workshop "First Steps in R for Data Analysis"

data-analysis rstats

Last synced: 10 Oct 2025

https://github.com/smusab9152/pokemon_data_analysis

This repo explores and analyzes a dataset of Pokémon attributes. The analysis includes data cleaning, exploratory data analysis (EDA), and visualizations.

analytics data-analysis data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas pokemon python seaborn statistical-analysis

Last synced: 02 May 2026

https://github.com/narenkhatwani/arkouda-projects

This repository contains the source code of projects done using Arkouda, a software package that allows a user to interactively issue massively parallel computations on distributed data using functions and syntax that mimic NumPy, the underlying computational library used in most Python data science workflows.

arkouda data-analysis data-analytics data-science high-performance high-performance-computing highperformancecomputing numpy pandas parallel-computing parallel-processing parallelization python

Last synced: 17 Apr 2026

https://github.com/vruddhi18/e-commerce_data_analysis_powerbi_dashboard

The E-Commerce Data Analysis project leverages Power BI to analyze sales and customer insights from Blinkit, Zepto, Myntra, and Flipkart, providing interactive dashboards to enhance e-commerce strategies.

data-analysis powerbi

Last synced: 27 Feb 2026

https://github.com/shadan100/stroke-prediction-analysis

A web-based application that predicts whether a patient is likely to have a stroke, based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about the patient.

artificial-intelligence data-analysis data-science django django-framework jupyter-notebook machine-learning matplotlib pandas predictive-modeling python stroke-prediction web-application

Last synced: 08 Mar 2026

https://github.com/happybono/sonatasmooth

Provides three different noise reduction algorithms for smoothing out data: Rectangular Averaging, Binomial Median Filtering, and Binomial Averaging. It processes data from a list and displays the results in another list.

algorithms average binomial binomial-coefficient binomial-theorem calibration csharp data-analysis data-calibration dynamic-noise-reduction median noise-algorithms noise-reduction noise-reduction-kernel outliers rectangular-averaging windows-desktop windows-desktop-application windows-forms winforms

Last synced: 30 Oct 2025
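
Two of the smoothing methods named in the sonatasmooth entry above, rectangular averaging and binomial averaging, are both weighted moving averages that differ only in their kernel. The sketch below illustrates that idea in Python; it is not the project's C# implementation. The function names, the odd-width kernel requirement, and the edge handling (renormalizing over the part of the window that fits) are assumptions chosen for illustration. Binomial median filtering is left out because the entry does not define it.

```python
from math import comb


def rectangular_weights(width):
    """Equal weights: a plain moving average."""
    return [1] * width


def binomial_weights(width):
    """Row of Pascal's triangle, e.g. width 3 gives [1, 2, 1]."""
    n = width - 1
    return [comb(n, k) for k in range(width)]


def smooth(data, weights):
    """Weighted moving average with an odd-length kernel.

    Assumption: near the edges, only the weights that overlap the data
    are used, and the result is renormalized by their sum.
    """
    if len(weights) % 2 == 0:
        raise ValueError("kernel width must be odd")
    half = len(weights) // 2
    result = []
    for i in range(len(data)):
        total = 0.0
        weight_sum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half  # index of the sample under weight k
            if 0 <= j < len(data):
                total += w * data[j]
                weight_sum += w
        result.append(total / weight_sum)
    return result


noisy = [1, 1, 10, 1, 1]
rect = smooth(noisy, rectangular_weights(3))
binom = smooth(noisy, binomial_weights(3))
print(rect)   # [1.0, 4.0, 4.0, 4.0, 1.0]
print(binom)  # [1.0, 3.25, 5.5, 3.25, 1.0]
```

The rectangular kernel spreads the spike evenly across its neighbours, while the binomial kernel keeps more weight on the centre sample, so the peak stays localized.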

https://github.com/priyanshubiswas-tech/aws-mwaa-elt-airflow-sql-dbt-superset-project

This project was created as part of an assessment for DigitalXC AI. It demonstrates a cloud-based ELT pipeline using AWS MWAA, Airflow, dbt, PostgreSQL, and Superset. The pipeline automates data ingestion from S3, transformation with dbt, and visualization through Superset, following modern data engineering practices on a scalable AWS architecture.

apache-airflow apache-superset aws-s3 dag data-analysis data-engineering-pipeline data-visualization dbt elt-pipeline python rds-postgres

Last synced: 03 Jul 2025

https://github.com/dwidevelopes/database-input-pelanggran-mahasiswa

Records data on students who have committed violations, so that they can be logged, disciplined, and sanctioned.

aplikasi aplikasi-sekolah data data-analysis database input-method mahasiswa sekolah siswa siswi website

Last synced: 02 May 2026

https://github.com/ifibla/adsdb-project

Algorithms, Data Structures and Databases Project

data-analysis data-engineering python

Last synced: 12 Apr 2026

https://github.com/chen0040/pyspark-advanced-algorithms

Samples of Advanced Algorithms and Data Analysis implemented in pyspark

advanced-algorithms data-analysis map-reduce pyspark

Last synced: 12 Jan 2026

https://github.com/gholamrezadar/most-profitable-actors

Finds the list of actors with the most box-office profit using the TMDB API.

crawling data-analysis tmdb

Last synced: 16 Jan 2026

https://github.com/shuklayash02/data_analysis_using_r

COVID-19 data cleaning and analysis, covering age at death and deaths by gender.

analysis cleaning-data data-analysis data-visualization rprogramming

Last synced: 09 Oct 2025

https://github.com/jfjlaros/spreadscript

SpreadScript: Use a spreadsheet as a function.

automation command-line data-analysis evaluation function interface spreadsheet

Last synced: 16 Oct 2025

https://github.com/dual-points/dplearn

A Python package for data analysis.

data-analysis data-science python python-package

Last synced: 16 Oct 2025

https://github.com/john-science/data_science_by_example

Examples of Data Science Tools & Libraries

data-analysis data-science ipython pandas

Last synced: 12 May 2025