An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/duoan/machine-learning-notebook

A repository of notebooks for tracking my machine learning studies.

data-analysis decision-tree ensemble-model gbdt machine-learning numpy pandas xgboost

Last synced: 18 Jun 2026

https://github.com/jmssnr/shuffle-kit

shuffle-kit: model and analyze playing card shuffles in Python

data-analysis playing-cards python shuffle statistics

Last synced: 19 Jun 2026

https://github.com/alicankaya192/world-happiness-report-2025

Comprehensive exploratory data analysis (EDA) and visualization of the World Happiness Report 2025. Analyzes global rankings, regional distributions, key happiness factors, and detects wealth-happiness paradox outliers using Python (Pandas, Matplotlib, SciPy).

correlation-analysis data-analysis data-science data-visualization eda exploratory-data-analysis global-happiness happiness-index matplotlib pandas python scipy statistics whr-2025 world-happiness-report

Last synced: 21 Jun 2026

https://github.com/abhik1711/material-classification-and-energy-band-prediction---excavate-25

A Two-Stage Machine Learning Pipeline: A Binary Classifier to identify insulators with high accuracy and a Stacking Regressor to predict precise band gap values for insulators by leveraging advanced feature engineering techniques and ensemble learning methods

data-analysis machine-learning python

Last synced: 23 Jun 2026

https://github.com/jasontanx/capstone-project-machine-learning

A final semester project from my MSc Data Science course

data-analysis datascience machinelearningprojects tourism-data

Last synced: 26 Mar 2025

https://github.com/pinedah/loan-approval-predictor-excercise

Machine Learning project to predict credit card approval using two datasets. Includes cleaning, exploratory analysis, synthetic data imputation, and modeling with algorithms such as Random Forest, Gradient Boosting, and Decision Trees.

data-analysis data-science decision-tree escom gradient-boosting machine-learning predictor random-forest school-project

Last synced: 11 Oct 2025

https://github.com/tnickster/ai-analyst-agent

Ask questions about your business data in plain English, get automatic SQL queries and visualizations, and receive AI-powered insights and recommendations. No SQL knowledge required.

ai-assistant business-analytics business-intelligence data-analysis data-analyst data-visualization database-query gpt-4 langchain llm mysql natural-language-processing openai plotly python sql-generation streamlit

Last synced: 08 Apr 2026

https://github.com/steciuk/ium-recommendation-system

Evaluation and comparison of three recommendation models for a simulated web shopping service.

data-analysis model-evaluation recomendation-system

Last synced: 29 Oct 2025

https://github.com/vimal0156/ruaroa-ai

🧙‍♂️ Zero-Code Machine Learning Wizard - Transform ideas into intelligent solutions without writing code. AI-powered ML pipeline automation with interactive web interface.

ai-agents ai-assistant artificial-intelligence automated-machine-learning code-generation data-analysis data-science deep-learning jupyter machine-learning machine-learning-pipeline neural-networks no-code openai python scikit-learn streamlit visualization

Last synced: 09 Apr 2026

https://github.com/devandrenicolas/analise-de-vendas

This project is a comprehensive data analysis tool designed to analyze sales performance data. It includes modules for generating fake sales data, cleaning and preprocessing the data, and performing exploratory data analysis (EDA) with advanced visualizations.

data-analysis data-visualization faker-generator matplotlib pandas python

Last synced: 07 May 2026

https://github.com/al-ghaly/power-bi-dashboard

A dashboard for analyzing the job market for data specializations.

dashboard data-analysis powerbi

Last synced: 02 Feb 2026

https://github.com/adityakumarsingh01/customer-purchase-behaviour-analysis

A data analysis project exploring online consumer behavior and FOMO effects using EDA on survey data.

consumer-behavior data-analysis eda fomo online-shopping python survey-data

Last synced: 25 Apr 2026

https://github.com/pranabdas/suvtools

Python library for analyzing and visualizing SSLS SUV Beamline data.

data-analysis data-visualisation python

Last synced: 07 May 2025

https://github.com/ganesh2409/cricket-player-performance

This repository contains a comprehensive project focused on analyzing cricket player performance using various datasets, including batting, bowling, and match results. The project involves data preprocessing, feature engineering, and model training to predict and evaluate player performance scores. It includes detailed scripts for data analysis

cricket-performance-analysis data-analysis machine-learning sports-analytics

Last synced: 05 Aug 2025

https://github.com/khuyentran1401/sample_datapane_script

This repo shows how to use Datapane to create a simple script that ranks authors or publications by publishing frequency.

data-analysis data-science datapane python

Last synced: 21 May 2026

https://github.com/cuadernin/coffeeanalysis

Data analysis for the third stage of the DataCamp certification.

coffee data-analysis datacamp python

Last synced: 07 Aug 2025

https://github.com/roberto-butti/fit_explorer

FIT File Explorer, written in Go

data-analysis fitness geospatial golang

Last synced: 12 Apr 2025

https://github.com/giordano-lucas/tesco-extension

Product clustering and interactive visualization

clustering data-analysis data-visualization tesco

Last synced: 17 Jun 2026

https://github.com/nafisalawalidris/northwind-traders-sales-analysis

The Northwind Traders Sales Analysis project analyses sales data for a fictitious company. It utilises the Northwind Database and includes SQL queries to provide insights on employees, products, suppliers and revenue. The project aims to help the company gain valuable information for business decision-making.

business-insights data-analysis database northwind-traders sales sql

Last synced: 07 Aug 2025

https://github.com/navdeep-g/data-quality-checker

A comprehensive Python tool for data analysis and data quality

data-analysis data-science pandas python

Last synced: 16 May 2026

https://github.com/garcane/nike_web_crawler

This project involves web scraping Nike's product pages to extract product names, prices and links. The project showcases three different implementations of the web crawler using Selenium and BeautifulSoup. It also includes visualisation of the scraped data using Matplotlib and Seaborn.

beautifulsoup data-analysis data-visualization python selenium web-crawler web-scraper webcrawler webscraper webscraping webscraping-beautifulsoup

Last synced: 18 Apr 2026

https://github.com/turquetti/projeto5-vamoai

Final project for Resilia + iFood <3

data-analysis python tableau

Last synced: 14 May 2026

https://github.com/sarathchandranpm/cleaning-and-exploratory-analysis-of-global-layoff-data

This project involves a thorough data analysis and cleaning process centered on global layoff data. It showcases advanced data management abilities by integrating data cleaning methods with a detailed exploration of workforce reduction patterns across various companies, industries, and countries.

data-analysis data-cleaning mysql sql

Last synced: 22 Sep 2025

https://github.com/jen-uis/loan-status-prediction

This repository contains project materials for the Winter STAT 206 class, University of California, Riverside, A. Gary Anderson School of Management.

data data-analysis data-analytics data-cleaning data-visualization descriptive-analytics julia julia-language jupyter-notebook predictive-analytics predictive-modeling team-collaboration

Last synced: 02 Jan 2026

https://github.com/draym/swmanager

Web app to help with your daily raids in SpacesWars using game statistics and data management

dashboard-application data-analysis data-visualization game-data game-utility

Last synced: 19 Jun 2025

https://github.com/the-tech-idea/beepdm

A library for managing your connections to different data sources. Still in alpha; please be patient.

data-analysis data-management data-management-platform data-science database dataset information

Last synced: 08 Aug 2025

https://github.com/simranjeet97/ipl-dataanalysis

Data Analysis performed on IPL Dataset with Data Profiling, Data Pre-Processing, Data Manipulation, and Data Visualization.

artificial-intelligence data-analysis data-manipulation data-mining data-preprocessing data-science data-visualization indian-premier-league-2008-2018 ipl ipl-dataset iplayer python

Last synced: 08 May 2026

https://github.com/messi10tom/ai-based-grade-prediction

GDSC task-1: Build a model to predict a student’s final grade based on features such as attendance, participation, assignment scores, and exam marks.

ai data-analysis data-science regression streamlit

Last synced: 02 May 2026

https://github.com/ebowwa/chatgpt-export-processor

🤖 Extract, analyze & search your ChatGPT conversations locally | Privacy-first tool for OpenAI ChatGPT data export processing | Python CLI with embeddings support

ai-tools chatgpt chatgpt-export chatgpt-tools cli conversation-analysis data-analysis data-extraction embeddings local-first nlp openai openai-api privacy python

Last synced: 19 May 2026

https://github.com/revan-alqahmi/summarize-talabat-company-reviews

A natural language processing project that analyzes Arabic comments about the Talabat company and classifies them as positive, negative, or neutral using machine learning algorithms and natural language processing techniques.

artificial-intelligence data-analysis machine-learning-algorithms natural-language-processing python

Last synced: 11 Jan 2026

https://github.com/zachpinto/real-time-indicators

Streamlit-based analytics dashboard visualizing real-time economic indicators. This project uses cron jobs to provide real-time updates of common economic indicators

analytics-engineering data-analysis plotly streamlit visualization

Last synced: 15 May 2026

https://github.com/salman-khan-mohammed/predicting-the-intent-of-online-shoppers

This project aims to predict online shoppers' purchase intentions using browsing history and user data from e-commerce sites. By analyzing clickstream and session information, the goal is to create a machine learning model that accurately forecasts customers' likelihood of making a purchase.

cluster-analysis data-analysis data-pre eda outliers prediction

Last synced: 31 Oct 2025

https://github.com/enamhasan/analyzing-the-impact-of-recession-on-automobile-sales

Data Analysis and Visualization Dashboard of the Impact of Recession on Automobile Sales

dashboard data-analysis data-science data-visualization pandas plotly plotly-dash python

Last synced: 05 May 2026

https://github.com/jayita11/atliqo-bank-credit-card-launch-eda

This project involves exploratory data analysis and statistical testing for AtliQo Bank's new credit card launch. Key insights include targeting high-income occupations and the 18-25 age group. Recommendations focus on tailored marketing campaigns, education, and incentives to enhance credit card adoption and usage among young adults.

data-analysis hypothesis-testing matplotlib p-value pandas python seaborn statistics z-test

Last synced: 09 Apr 2026

https://github.com/JovaniPink/excel-powerbi

The folder of my work with Excel, VBA, and PowerBI for Data Analysis & Visualization.

data-analysis data-visualization dax excel excel-vba power-pivot power-query powerbi vba-macros

Last synced: 20 Jul 2025

https://github.com/leocornus/leocornus-visualdata

JavaScript libraries to make data visualization simpler and easier.

data-analysis data-mining data-visualization data-visualization-simpler javascript-library

Last synced: 10 Aug 2025

https://github.com/abhaysingh71/laptop-price-predictor

Laptop Price Predictor is a Dockerized machine learning project that predicts laptop prices based on specs using ensemble models like Random Forest, XGBoost, and Gradient Boosting. Includes a Streamlit UI and full Docker support.

data-analysis data-science deployment docker docker-image ensemble-learning laptop-price-prediction machine-learning-algorithms streamlit xgboost

Last synced: 05 May 2026

https://github.com/deepanshkhurana/cloudsimplifier

Simple helper functions to fetch and read data in various formats stored in Amazon AWS S3 buckets. Most functions are essentially wrappers around cloudyR.

amazon aws cloudyr data-analysis data-fetching data-science package r rpackage s3

Last synced: 20 May 2026

https://github.com/ahmednurabdii/data-analytics-portfolio-superstore

My first portfolio project showcasing data cleaning, analysis, and visualization of Superstore sales data.

data-analysis data-visualization jupyter-notebook matplotlib numpy pandas portfolio-project python sales-analysis scipy seaborn superstore-dataset

Last synced: 07 Apr 2026

https://github.com/devexpress-examples/web-forms-pivot-grid-implement-editable-aspxpivotgrid

This example demonstrates how to allow end-users to modify data cell values in Pivot Grid for Web Forms.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 09 Mar 2026

https://github.com/sayantanidalui/student-mental-health-analysis

A SQL-based analysis project exploring student mental health, stress, and lifestyle patterns. Uncovers key insights using joins, CTEs, and window functions — no other tools used.

data-analysis mental-health mysql sql studentdata

Last synced: 07 Jul 2025

https://github.com/jcbritobr/iris

Iris dataset and data analysis with the Julia language.

data-analysis data-science data-visualization iris-dataset julia-language

Last synced: 06 Apr 2025

https://github.com/atxtechbro/glassdoorwebscraping

"Scraping Glassdoor: A GraphQL Journey" is an advanced data harvesting tool leveraging GraphQL and an API-first strategy to extract and analyze Glassdoor data for business intelligence and predictive analytics.

api-first-approach business-intelligence data-analysis data-harvesting data-mining data-science glassdoor-scraper graphql html machine-learning performance-optimization predictive-analytics python requests-library-python scaleability scraper system-design web-scraping

Last synced: 16 May 2026

https://github.com/filiplangiewicz/businessintelligence

🏭 Data warehouses and business intelligence project

airbnb business-intelligence data-analysis data-warehouse

Last synced: 09 Mar 2026

https://github.com/anurag-kumar-molankala/anurag-kumar-molankala

👋 About me: I'm a Power BI Developer with a passion for data visualization and UI/UX design. I create interactive dashboards that turn data into clear, actionable insights for smarter decision-making.

business-intelligence dashboards data-analysis data-visualization dax-query mlanguage powerbi sqlserver uiuxdesigner

Last synced: 25 Jan 2026

https://github.com/sumidcyber/dataviz-master

This Python application provides a user-friendly interface to load and visualize the contents of a CSV file. Users can choose from various types of graphs and perform analyses on the dataset.

data-analysis data-analysis-project data-analysis-python database databases python python3

Last synced: 02 Jan 2026

https://github.com/thanaphongk37/data-science-and-data-analyst-project

Portfolio of data analysis, data science, and data engineering projects built using Azure services, SQL, and Python.

apache-superset azure-storage dashboards data-analysis data-science databricks dataengineering datafactory datapipeline powerbi python sisense sql sql-server visualization

Last synced: 11 May 2026

https://github.com/pymarcus/tcc_sistemasdeinformacao2025

This application is part of a research project that aims to use a Gemini AI agent to identify "atoms of confusion" -- minimal code elements that cause misunderstandings -- in the context of Software Engineering.

atoms-of-code ci-cd clean-architecture concurrent-programming data-analysis design-patterns gemini-api golang ifmg inteligencia-artificial postgresql software-engineering solid tcc tdd workerpool

Last synced: 14 May 2026

https://github.com/denko5/sales-analysis

A complete SQL-based sales analysis project covering Africa, showcasing data cleaning, exploratory analysis, insights, and lessons learned. The project highlights sales trends, regional performances, and marketing effectiveness across multiple platforms.

africa data data-analysis data-science exploratory-data-analysis insights kenya sales sql

Last synced: 24 Jan 2026

https://github.com/theanujsinha01/rainfall-prediction-using-machine-learning

This project predicts whether it will rain or not based on weather features like pressure, humidity, dew point, cloud cover, sunshine, wind direction, and wind speed. We use a Random Forest Classifier, a popular ML algorithm, trained on historical weather data. The model learns patterns and helps us forecast rain chances.

classification data-analysis eda machine-learning-algorithms matplotlib numpy pandas python scikit-learn seaborn supervised-learning

Last synced: 11 Apr 2026

https://github.com/shriram-vibhute/digit_classification

This project demonstrates various machine learning techniques for classifying handwritten digits from the MNIST dataset. It covers data preprocessing, model training, evaluation, and advanced classification strategies.

classification data-analysis data-visualization machine-learning matplotlib numpy pandas sk-learn

Last synced: 28 Oct 2025

https://github.com/ghackenberg/kurs-datenanalyse

This repository contains material for my data analysis course. In this course we first introduce the concept of databases and SQL, before diving into OLAP and other data analysis tools.

data-analysis data-structures data-warehouse entity-relationship-diagram etl graph list olap relational-algebra relational-database sql tree

Last synced: 17 Feb 2026

https://github.com/adolbyb/data-science-python

An Introduction to Data Science and Data Visualization with the FAU Data Science and Machine Learning Club

data-analysis data-science data-visualization jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/ehopperdietzel/billionaires-analysis

Analysis of the number of billionaires per country. Inspired by the article "Russian Billionaires".

bootstrap data-analysis poisson-distribution prediction

Last synced: 18 May 2026

https://github.com/fatihilhan42/web_scraping_football_statistics_per_game_data-main

In this notebook I describe the process of scraping data from the web portal understat.com, which has a lot of statistical information about all games in the top 5 European football leagues.

data-analysis data-manipulation data-science data-scraping data-visualization jupyter-notebook python

Last synced: 19 May 2026

https://github.com/arv-anshul/easy-analysis

A Python package to perform data analysis easily. (Not recommended)

arv-dumped data-analysis data-science easy-analysis eda pypi pypi-package python3

Last synced: 14 May 2025

https://github.com/virajbhutada/walmart-retail-analyzer

Gain valuable insights into retail sales with the "Walmart Retail Performance Dashboard" in MS Excel. This user-friendly tool facilitates an in-depth analysis of key sales metrics, providing a comprehensive view of Walmart's performance. Make data-driven decisions for informed and strategic business outcomes.

analytics data-analysis data-science data-visualization excel insights interactive-visualizations performance-analysis retail-sales walmart

Last synced: 04 Mar 2026

https://github.com/saksham-jain177/automated-data-analysis-and-visualization

Automated Data Analysis and Visualization is a Streamlit web application designed for quick and insightful data analysis. Users can easily upload CSV files, perform automated preprocessing, and generate interactive visualizations such as histograms, scatter plots, and heatmaps.

automated-reporting data-analysis data-preprocessing data-science data-visualization datasets exploratory-data-analysis interactive-visualizations machine-learning python streamlit

Last synced: 15 May 2026

https://github.com/as16082023/nashville-housing-data-cleaning-project

This project involved using MySQL to clean and optimize a Nashville housing dataset, addressing key data quality issues to ensure it was ready for accurate analysis.

data-analysis data-cleaning mysql nashville-housing-data

Last synced: 10 Apr 2025

https://github.com/helosantosdesousa/analise-previsao-de-rotatividade-ml

Final project of the Data Girls 2025 Bootcamp, analyzing employee turnover using Machine Learning. Based on the IBM HR Analytics Attrition dataset, the project identifies the main risk factors and builds predictive models (SVC and Random Forest) with up to 89% accuracy to anticipate departures and support strategic HR decisions.

analise-de-dados analise-exploratoria bootcamp ciencia-de-dados colab-notebook dados data data-analysis data-science dataanalytics dataframe eda machine-learning machine-learning-algorithms pandas python random-forest svc

Last synced: 16 Apr 2026

https://github.com/idhs-song/resume-matcher-agent-cn

🤖 Enhance your job applications with this AI-driven resume matcher that analyzes job descriptions to optimize your resume for better chances of success.

api-integration automation backend-development data-analysis data-visualization github-actions job-search machine-learning natural-language-processing open-source-tools python recommendation-system resume-matching user-interface web-app

Last synced: 18 May 2026

https://github.com/madhuresh2011/amazon-sales-report-analysis-using-python

This project focuses on analyzing Amazon sales data using Python to uncover insights into sales performance, customer behavior, and product trends

charts cleaning-data data-analysis jupyter-notebook matplotlib numpy pandas python seaborn visualization

Last synced: 17 Apr 2026

https://github.com/nhsdigital/sde_summary_notebooks

Notebooks provided by the Wranglers for users to quickly gain insights on datasets inside the Secure Data Environment (SDE)

data-analysis data-linkage data-quality data-summary metrics statistics

Last synced: 12 Aug 2025

https://github.com/agustinmusanti/sqlchallenge-4

Challenge to create a SQL database for a streaming platform. Includes DDL, DML, and advanced queries.

data-analysis database mysql sql streaming

Last synced: 18 May 2026

https://github.com/Narius2030/Hive-DataWarehouse-Analysis

Implements a Hive data warehouse to store meaningful data and applies machine learning techniques such as clustering or regression to business problems

apache-hadoop apache-hive data-analysis etl-pipeline hiveql machine-learning statistics

Last synced: 12 Aug 2025

https://github.com/jen-uis/la-crime-data-analysis

This repository contains project materials for the Fall 2023 MGT 256 class. This project was completed with assistance from Professor Adem Orsdemir.

business-analytics crime-data crime-data-analysis data-analysis knn la-crimes-from-2020 la-safe r r-markdown r-studio report-generation rmd united-states visualization

Last synced: 14 Mar 2025

https://github.com/mahdi-eth/covid-analysis

COVID-19 data analysis project using Python, NumPy, pandas, and Matplotlib

data-analysis data-science python

Last synced: 13 Aug 2025

https://github.com/karatechop/noaa-storm-database-data-analysis

Analysis of population health and economic consequences of events documented in the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

data-analysis knitr r rmarkdown

Last synced: 14 Mar 2025

https://github.com/bcko/ud-da-eda-whitewinequality

Udacity Data Analyst Nanodegree Project : Exploratory Data Analysis : White Wine Quality dataset

data-analysis exploratory-data-analysis rmarkdown rstudio udacity udacity-data-analyst-nanodegree

Last synced: 03 Jan 2026

https://github.com/drcbeatz/aynm-data

Python scripts for data cleaning and processing for AYNM (Pandas/NumPy/Selenium/AWS Textract)

automation aws-textract csv data-analysis data-cleaning ipynb numpy ocr pandas python reverb selenium shopify webscraping xml

Last synced: 07 Mar 2026

https://github.com/misaghmomenib/stock-momentum-analysis

A Python-based Data Analysis Tool Designed to Evaluate Stock Momentum. Leverages Historical Market Data to Identify Trends, Predict Price Movements, and Assist in Making Informed Investment Decisions.

data-analysis data-analysis-python data-visualization git open-source python

Last synced: 10 Apr 2025

https://github.com/shibam120302/heart-disease-data-analysis-by-shibam

Read more about heart disease statistics and causes for your own understanding. This project covers manual exploratory data analysis.

analysis data-analysis scraper

Last synced: 13 Aug 2025

https://github.com/subhojit45/python3-iphones-x-flipkart-sales-analysis

Six simple questions and the insights derived from a dataset of iPhone sales on Flipkart.

data-analysis jupyter-notebook python3 visual-studio-code visualization

Last synced: 19 May 2026

https://github.com/x1ao4/doc-merger

Merge two relatively incomplete documents into one complete document via a Python script

data-analysis data-merging document-analysis document-comparison document-processing documents filtering filtering-data merge merge-documents

Last synced: 28 Jun 2025

https://github.com/geobatpo07/office-hours-bootcamp

Practical case studies and labs from the Akademi 2025 Data Science & AI Bootcamp office hours.

artificial-intelligence data-analysis data-science data-visualization database deep-learning learning learning-by-doing machine-learning statistics

Last synced: 07 Mar 2026

https://github.com/naruaika/eruo-data-studio

A powerful yet friendly ETL tool powered by a Polars backend

data-analysis data-science desktop-app gnome-desktop gtk4 proof-of-concept python spreadsheet

Last synced: 18 Jul 2025

https://github.com/anurag-kumar-molankala/data-professional-survey

This Power BI dashboard analyzes survey responses from data professionals, covering key aspects such as salary distribution, job satisfaction, and preferred programming languages. The insights help understand trends in the data industry and what matters most to professionals.

dashboard data-analysis data-visualization dax-measures dax-query demographics etl-process excel-import power-bi salary-analysis sql-server survey-analysis trend-analysis

Last synced: 02 Feb 2026

https://github.com/nelsonkariuki/dataanalysis

This project involves data analysis of video game sales from https://www.kaggle.com/gregorut/videogamesales/download

data-analysis data-visualization python

Last synced: 11 Jun 2026