An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/shibbir24/a-data-driven-approach-to-food-security-and-supermarket-accessibility

A Data-Driven Approach to Food Security and Supermarket Accessibility

data-analysis matplotlib numpy pandas python3 seaborn

Last synced: 13 Apr 2026

https://github.com/rishitabansal9/adult-census-income-prediction

A data analysis and income prediction project using a random forest classifier with 91% accuracy.

data data-analysis data-science feature-engineering random-forest-classifier

Last synced: 25 Mar 2025

https://github.com/saro0307/exploratory-data-analysis-terrorism

Phase 1 of a data science project (program) performing exploratory data analysis on terrorism using Python on Google Colab, for the Coderscave internship, Sept 2023.

colaboratory data-analysis datascience machine-learning numpy pandas python seaborn skit-learn visualization

Last synced: 13 Apr 2026

https://github.com/1401dev/customer-lifetime-value-prediction

A data science project leveraging Python and Scikit-Learn to build predictive models that estimate customer lifetime value (CLV). Includes data cleaning, feature engineering, and model selection to identify key drivers of CLV, supporting strategic decision-making in customer retention and marketing.

clv clv-analysis customer-retention data-analysis dataprocessing feature-engineering machine-learning marketing-analytics predictive-modeling python regression-analysis scikit-learn

Last synced: 06 May 2026

https://github.com/tatilimongi/first_python_project

This repository contains a case study of spreadsheet automation in Python for analyzing car sales by manufacturer over the years.

data-analysis email-sending file-manipulation graphical-visualization spreadsheet-automation

Last synced: 26 Mar 2025

https://github.com/sadia-khan13/data-preprocessing

Welcome to the Data Preprocessing repository! This repository showcases comprehensive resources and implementations related to data preprocessing using Python and Jupyter Notebook.

artificial-intelligence data-analysis data-mining data-preprocessing data-science jupyter-notebook matplotlib numpy pandas python seaborn-python sklearn

Last synced: 11 Apr 2026

https://github.com/weisswuerste/polars-eurovision-analytics

Analytics example using both the Pandas and Polars libraries

data-analysis data-analytics pandas polars python python-3 python3

Last synced: 08 May 2026

https://github.com/jkaardal/csvnav

A memory-efficient Python class for navigating large CSV/text files.

csv data-analysis data-science machine-learning memory-management

Last synced: 14 Jan 2026

https://github.com/isaqueiros/newspapersales-predictions-linearregression_and_regularisation

This notebook studies newspaper sales at a local stand, with the aim of predicting sales performance from the available features. Four sklearn models are applied: Linear Regression, Lasso Regression, Ridge Regression, and Elastic Net Regression.

data-analysis data-science linear-regression machine-learning python regularization-methods sklearn-library sklearn-linear-regression

Last synced: 02 May 2026

https://github.com/jakubteichman/bullbozer_price_prediction_ml_project

A bulldozer price estimator built from a Kaggle competition dataset.

data-analysis data-science estimation machine-learning prediction

Last synced: 06 Sep 2025

https://github.com/lucalullo/monitoring-healthcare-waiting-times-puglia

Monitoring and analysis of public healthcare waiting times in Puglia (Italy), 2024 — based on official open data

data-analysis healthcare italy jupyter-notebook kaggle open-data pandas public-data puglia time-series waiting-times

Last synced: 08 Jan 2026

https://github.com/wsu-carbon-lab/ezfit

Fitting in Python made dead simple

data-analysis experimental-physics fitting pandas-accessor

Last synced: 14 Jun 2025

https://github.com/hyperentangledqubit/shellplot

shellplot -- Generate plot(s) directly from terminal via matplotlib or ggplot2 (plotnine)!

data-analysis ggplot2 graphics matplotlib plotnine plotting pyplot terminal

Last synced: 10 May 2026

https://github.com/bertiewooster/ipywidgets

Interactive data visualizations in a Jupyter Notebook per tutorial https://python.plainenglish.io/interactive-visualizations-with-pandas-seaborn-and-ipywidgets-173e5d7d6a5e

data-analysis data-science data-visualization ipython-notebook ipywidgets juypter-notebook python

Last synced: 06 Mar 2026

https://github.com/grandechowhiskey/fcc-data_analysis-projects

A collection of projects completed as part of the FreeCodeCamp "Data Analysis with Python" certification. These projects cover statistical calculations, data visualization, and trend analysis using real-world datasets.

data-analysis data-visualization matplotlib pandas python3 scikit-learn seaborn

Last synced: 01 May 2026

https://github.com/giorgossideris/athens_weather_analysis

An analysis of Athens weather data.

data-analysis visualization

Last synced: 16 Mar 2025

https://github.com/karishmagupta05/udemy-course-analysis

This project analyzes Udemy courses using Exploratory Data Analysis (EDA) techniques to uncover insights about course trends, pricing, subscriber counts, and popularity. By leveraging Python, Pandas, and data visualization libraries, we extract meaningful information from the dataset.

data-analysis data-visualization eda jupiter-notebook pandas python

Last synced: 13 Apr 2026

https://github.com/srinibas-masanta/electric-vehicle-analysis-dashboard

This repository features an interactive Tableau dashboard that visualizes electric vehicle (EV) adoption trends in the U.S. 🚗⚡ Explore EV growth, top manufacturers, regional distribution, and the impact of incentives—all in one dynamic view. 📊 Use filters to dive deeper into the data and uncover key insights! 🚀

dashboards data-analysis data-visualization tableau

Last synced: 15 Jan 2026

https://github.com/srinibas-masanta/zomato-customer-and-restaurant-analysis

This repository contains a comprehensive analysis of Zomato's platform, focusing on various aspects of customer behavior, restaurant performance, and market trends. The analysis leverages data-driven insights to answer key questions that can guide business strategies, enhance customer satisfaction, and optimize operational efficiency.

business-analytics data-analysis data-science data-visualization

Last synced: 02 Apr 2025

https://github.com/devexpress-examples/web-forms-pivot-grid-custom-summary-values

This example demonstrates how to determine the value type when you calculate custom summary values in Pivot Grid for Web Forms.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 06 Jul 2025

https://github.com/abhijeet107/final-project

Final project summation INTERNSHIP PROJECTS (2 WEEKS)

data-analysis data-cleaning-and-preprocessing excel mysql-database python tableau-public

Last synced: 23 Feb 2026

https://github.com/spacebakery/nba-trends-project

Data Science Foundations I | Exploratory Data Analysis in Python | Summarizing Relationship Between Two Features

categorical-variables data-analysis data-visualization matplotlib nba-dataset quantitative-variables scipy seaborn subset summary-statistics

Last synced: 11 Mar 2025

https://github.com/ankitpoddar07/sqlpizzas-saleproject

🍕 Pizza Sales Analysis with SQL

data-analysis database excel mysql powerbi ppt python

Last synced: 09 May 2026

https://github.com/samwhaaa/da_portfolio

Showcasing some of my Data Analytics projects

data-analysis data-analytics data-visualization jupyter jupyter-notebook python

Last synced: 01 Mar 2025

https://github.com/sadratehranian/data-collection-and-machine-learning

First, creates a logistic regression model to predict whether a smoke detector's fire alarm should sound. Second, predicts whether an electric drive in a production plant may be faulty.

data data-analysis data-science datacollection logistic-regression machine-learning ml nn

Last synced: 05 Jan 2026

https://github.com/pratik-khose/realtime-sales-simulation

Power BI: Realtime Sales Simulation using SQL Server and Direct Query

data-analysis data-analytics data-visualization dax-query powerbi sql sql-server sqlserver

Last synced: 10 Jun 2026

https://github.com/auliannee/customer-analysis-with-tableau

This repository contains the data source and the Tableau workbook.

data-analysis data-visualization tableau

Last synced: 12 Mar 2026

https://github.com/as16082023/motor-vehicle-thefts

Using SQL to analyze vehicle theft patterns across New Zealand, focusing on trends related to specific times and locations.

data-analysis mysql sql

Last synced: 10 Apr 2025

https://github.com/firetyrant/sql-portfolio-projects

Documenting my SQL learning journey with hands-on projects focused on data cleaning, analysis, and optimization.

bigquery data-analysis databases etl learning portfolio query-optimization sql

Last synced: 19 Apr 2026

https://github.com/satyam4229/prediction-of-different-diseases

Prediction of different diseases in real time from the symptoms they present. The dataset contains 132+ different symptoms, on which the model is trained to give the best disease prediction.

data-analysis data-science data-visualization jupyter-notebook kaggle python

Last synced: 13 Apr 2026

https://github.com/roshaka/samplr

Samplr is a Python decorator for selecting a subset of items from a list, with options for customisation and informative console printouts.

data data-analysis data-engineering decorators list python sampling

Last synced: 14 Jan 2026

https://github.com/rainbowatcher/simple

Make data work easier, saving your working time

bigdata data-analysis etl

Last synced: 10 Apr 2025

https://github.com/mattholy/haka

HaKa is an out-of-the-box tool system designed for data engineers and data analysts in medium-sized enterprises. It is easy to deploy and scale.

celery data-analysis data-engineering fastapi python uvicorn-gunicorn

Last synced: 19 May 2026

https://github.com/parthshah02/customer_churn_dashboard

This repository features a comprehensive project showcasing data analysis and an interactive dashboard built with Python.

data-analysis matplotlib numpy pandas python

Last synced: 13 Apr 2026

https://github.com/samruddhi3012/rfm-sales-analysis

Hi there! In this project I have performed Sales Analysis (RFM Analysis) using SQL and Tableau.

data-analysis data-visualization mssqlserver rfm-analysis segmentation tableau

Last synced: 12 Mar 2025

https://github.com/mainak-97/weather-data-analysis-using-python

A comprehensive analysis of time-series weather data using Python and Pandas, focusing on data exploration, cleaning, and uncovering insights.

data-analysis jupyter-notebook pandas pandas-dataframe python python3 time-series-analysis

Last synced: 08 May 2026

https://github.com/manditacaos/hypefemme-analise-vendas

Data analysis and visualization project in Power BI for the fictional store Hype Femme.

data-analysis jupyter-notebook portfolio powerbi python

Last synced: 10 Apr 2025

https://github.com/nimomach/cafe-sales

This analysis focuses on evaluating the sales performance of a cafe by examining key metrics such as total revenue, sales by product category, peak sales times, and many more.

cafe data-analysis data-visualization sales

Last synced: 12 Mar 2026

https://github.com/subratamondal1/heart-attack-prediction

Heart Attack Prediction of patients based on the required data. Data Ingestion - Data Preparation - Exploratory Data Analysis (EDA) - Modelling - Evaluation.

data-analysis data-science data-visualization kaggle-dataset machine-learning matplotlib-pyplot numpy pandas python3 scikit-learn seaborn

Last synced: 09 Apr 2026

https://github.com/aalekhpatel07/statcan

StatCAN dataset fetcher and cleaner.

census data-analysis data-science statcan

Last synced: 02 Apr 2025

https://github.com/deliprofesor/behavioral-insights-and-data-exploration

This project analyzes Spanish speech data, focusing on acoustic features and demographics. It includes data cleaning, outlier detection, clustering, and time series modeling (ARIMA, Holt-Winters) to uncover patterns in speech duration and word frequency.

acoustic-features arima clustering data-analysis holt-winters k-means machine-learning speech-analysis time-series-analysis

Last synced: 10 Apr 2025

https://github.com/abhipatel35/svm-hyperparameter-optimization-for-breast-cancer

Utilizing SVM for breast cancer classification, this project compares model performance before and after hyperparameter tuning using GridSearchCV. Evaluation metrics like classification report showcase the effectiveness of the optimized model.

breast-cancer cancer-diagnosis classification data-analysis data-science gridsearchcv healthcare hyperparameter-tuning jupyter-notebook machine-learning medical-imaging pycharm python scikit-learn support-vector-machine svm

Last synced: 05 Feb 2026

https://github.com/sabelomkhwanzi/data-alchemist-boot-camp

Built on Covalent's Unified API, Increment has the full historical data set for 40+ chains including every smart contract, event, transaction, address, etc. With access to all this data you can find:

covalent data-analysis increment

Last synced: 11 Mar 2026

https://github.com/edanur-y/airline-customer-satisfaction-prediction-with-multiple-logistic-regression

Multiple logistic regression analysis on airline and customer data to predict customer satisfaction. 🔵R

data-analysis missing-values-analysis multiple-logistic-regression optimal-cut-off-points r

Last synced: 09 Jun 2026

https://github.com/prajjwol09/data-cleaning-project

This project cleans and standardizes a dataset from a CSV file named "layoffs" and handles its null values using MySQL, with MySQL Workbench as the workspace environment. The goal is to prepare the data for analysis.

cleaning-data columns data-analysis database duplicates mysql rows standard

Last synced: 20 Apr 2026

https://github.com/codesaadumair/pandas_exercises_personal

Personalized enhancements to pandas exercises with comprehensive solutions and practical insights for mastering data analysis in Python.

data-analysis data-science pandas python

Last synced: 09 May 2026

https://github.com/bitcoin-apps-suite/bitcoin-spreadsheet

Open source Bitcoin-powered spreadsheet application with blockchain data integration, smart contract calculations, and collaborative financial modeling | By THE BITCOIN CORPORATION LTD

bitcoin bitcoin-sv blockchain bsv cryptocurrency dapp data-analysis decentralized excel-alternative nextjs spreadsheet typescript web3-spreadsheet

Last synced: 05 May 2026

https://github.com/jianxi-erin/bigdata-machinelearning-lab

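This project is a comprehensive big data and machine learning lab platform comprising two main tasks, each covering three key technical modules: big data processing, data analysis, and machine learning. Based on real competition designs, the project provides complete data processing simulations and modeling practice.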

data-analysis data-visualization hadoop machine-learning python spark sql

Last synced: 03 May 2026

https://github.com/pranavsp108/time-series-forcasting

A time-series forecasting project to predict hourly energy consumption using Python, Pandas, and an XGBoost regression model.

data-analysis data-science energy-consumption forecasting matplotlib numpy pandas python scikit-learn sustainability time-series xgboost

Last synced: 10 Apr 2026

https://github.com/vishal786-commits/target-businesscasestudy-sql

This project analyzes Target’s e-commerce transactions in Brazil between 2016 and 2018 using SQL. The goal was to explore customer behavior, order patterns, payments, delivery times, and freight costs to generate actionable business insights.

bigquery data-analysis sql

Last synced: 05 Oct 2025

https://github.com/ankitwalimbe/ecommerce-funnel-analysis

SQL-based analysis of the Olist e-commerce dataset — building an order funnel (purchase → approval → delivery) with breakdowns by payment type, product category, region, and monthly trend. Includes insights, CSV exports, and Tableau dashboard.

bigquery business-intelligence data-analysis ecommerce funnel-analysis sql tableau-public

Last synced: 05 Oct 2025

https://github.com/josepablodmg/python--linear-regression---housing-exercise

A predictive analysis exploring the relationship between household characteristics and median income in California. Using linear regression, the project investigates whether blocks with fewer households correspond to higher median incomes.

california data-analysis data-science exploratory-data-analysis housing-data linear-regression machine-learning python regression scikit-learn statistics visualization

Last synced: 05 Oct 2025

https://github.com/chdre/data-analyzer

A small package to analyze and preprocess data.

data-analysis python

Last synced: 06 Oct 2025

https://github.com/data-edd/mastering_sql

A repo documenting my progress in mastering SQL.

data-analysis mysql mysql-database sql

Last synced: 06 Oct 2025

https://github.com/swatisinghit/treadmill-customer-profiling-for-aerofit

Create comprehensive customer profiles for each AeroFit treadmill product through descriptive analytics. Develop two-way contingency tables and analyze conditional and marginal probabilities to discern customer characteristics, facilitating improved product recommendations and informed business decisions.

analytics conditional-probability data-analysis data-science data-visualization eda numpy pandas probability statistics

Last synced: 08 May 2026

https://github.com/ilaxi/lomicontadores

A data management tool for tracking the number of actions per day over a year.

data-analysis gdscript godot godot4 python

Last synced: 19 Apr 2026

https://github.com/dacq-trap/dacq

DacQ Score Server

data-analysis go nextjs

Last synced: 14 Jan 2026

https://github.com/nuriadevs/informes-powerbi

This repository contains reports built with Power BI.

data-analysis powerbi

Last synced: 18 Feb 2026

https://github.com/michaelcurrin/yahoo-finance-reports

Use the Yahoo Finance API to get info on shares of interest and report on them

data-analysis data-science python reporting shares stock-market yahoo-finance yahoo-finance-api

Last synced: 07 Oct 2025

https://github.com/prarthana-singh/bangalore-house-price-predictor

🏡 Bangalore House Price Prediction – A Machine Learning model to predict house prices in Bangalore using real estate data. Built with Linear Regression, Python, Pandas, NumPy, and Scikit-Learn.

data-analysis eda house-price-prediction linear-regression machine-learning numpy pandas python real-estate regression scikit-learn

Last synced: 19 Apr 2026

https://github.com/roydevashish/algo8.ai-data-manipulation-assignment

This assignment performs transaction-level sales data analysis and generates reports using Pandas / SQL / Spark inside a containerized environment. The dataset contains sales transaction records and is used to analyze SKUs, customers, and sales representative performance.

data-analysis duckdb python3 sql uv

Last synced: 15 May 2026

https://github.com/gabboraron/biostatisztika_es_alkalmazasai

"Statistics is the branch of mathematics whose task is to give politicians a tool with which any statement, as well as its opposite, can be justified on a scientific basis"

biostatistics data-analysis data-visualization r statistics statistics-course

Last synced: 24 Oct 2025

https://github.com/npodlozhniy/podlozhnyy-module

One place for the most useful methods for work

data-analysis data-science pypi

Last synced: 21 Jan 2026

https://github.com/omarsolieman/socialgiveawaydataanalysis

This project involved cleaning, analyzing, and processing data from an Instagram giveaway to ensure a fair and data-driven winner selection process. The primary goal was to automate the process of identifying valid entries, weighting them based on engagement (likes and multiple entries), and performing a post-giveaway analysis.

data-analysis data-science data-visualization instagram scraping threejs

Last synced: 14 May 2026

https://github.com/maccccd/sql-proficiency-journey

A technical journey of my SQL understanding.

data-analysis sql systems-analysis-and-design uml-class-diagram

Last synced: 15 Feb 2026

https://github.com/ndomah1/learning-probability-and-statistics

This repo is a comprehensive learning resource that covers fundamental to advanced topics in probability and statistics, including probability theory, descriptive and inferential statistics, probability distributions, regression analysis, and data exploration techniques.

correlation-analysis data-analysis descriptive-statistics exploratory-data-analysis hypothesis-testing inferential-statistics probability regression statistics

Last synced: 18 Jan 2026

https://github.com/kmranrg/bikeshare

A project based on data analysis.

data-analysis python

Last synced: 08 Oct 2025

https://github.com/alexquilis1/spanish-fuel-stations-analysis

Real-time analysis of Spanish fuel prices using government API data with interactive maps and regional comparisons

data-analysis data-visualization fuel-prices geospatial-analysis ggplot2 government-data leaflet open-data r shiny spain tidyverse

Last synced: 08 Oct 2025

https://github.com/sorebit/pdrpy-pd-2

Data analysis of various stackexchange.com archives.

data-analysis stackexchange time-travel university-project

Last synced: 08 Oct 2025

https://github.com/debjyotisaha/power-bi-projects-phase-2

Created interactive dashboards and reports using Power BI to visualize complex datasets. Demonstrated proficiency in data modelling, DAX calculations, and storytelling through data to provide actionable insights.

dashboards data-analysis data-modeling data-visualisation power-query powerbi

Last synced: 18 Jan 2026

https://github.com/jlee9503/telecommunication-churn

Analyzes key factors influencing customer churn using Python data analytics techniques. Explores these factors through data preprocessing, exploratory data analysis (EDA), and predictive modeling.

data-analysis data-visualization matplotlib pandas python scikit-learn

Last synced: 18 Jan 2026

https://github.com/faisal-khann/ipl-analysis

The IPL Analysis project is a comprehensive data-driven exploration of the Indian Premier League (IPL), analyzing historical match data to uncover patterns in team performance, player statistics, and match outcomes.

data-analysis exploratory-data-analysis jupyter-notebook matplotlib numpy pandas seaborn

Last synced: 08 May 2026

https://github.com/l1ght14/tradersentiment_primetrade

Analyzes Bitcoin market sentiment's impact on Hyperliquid trader PnL & behavior. Uncovers patterns using Python (Pandas, Seaborn) to derive actionable trading insights. Junior Data Scientist assignment for PrimeTrade

bitcoin crypto-trading cryptocurrency data-analysis financial-data-analysis jupyter-notebook market-sentiment pandas python trader-behavior web3

Last synced: 20 Oct 2025

https://github.com/debjyotisaha/sql-projects

Designed and implemented SQL-based projects to analyse and manage datasets efficiently. Demonstrated expertise in writing complex queries, optimizing database performance, and performing data extraction, transformation, and loading (ETL) processes.

data-analysis database sql

Last synced: 09 Oct 2025

https://github.com/alokthedataguy/financial-friend-web-app

Financial Friend is a privacy-first web app that takes a user’s payment statement (PhonePe, GPay, bank CSV/PDF), cleans and understands it, and then talks back like a friend—giving simple, human answers (plus a few tiny visuals) to questions people actually care about.

data-analysis data-science data-visualization fastapi finance-management financial-analysis financial-data insights personal-finance-and-data-anlaysis python react

Last synced: 14 Apr 2026

https://github.com/sillyash/untappd-viz

A data visualisation page using public datasets and HTML/CSS/JS with D3.js.

beer beer-statistics data data-analysis data-visualization kaggle kaggle-dataset public-dataset school-project

Last synced: 18 May 2026

https://github.com/samuelsoaress/wkd-default-reduction

Reducing defaults from 35% to 25% or less with machine learning techniques.

data-analysis data-exploration data-science machine-learning-algorithms

Last synced: 10 Oct 2025

https://github.com/filipe-rds/bi-atividade-1

Data analysis assignment for the Business Intelligence course.

data-analysis jupyter-notebook python

Last synced: 15 May 2026

https://github.com/badranalyst/time-series-analysis-of-global-trends-in-diet-gym-and-finance

This project analyzes global trends in diet, gym, and finance over time using time series data. The analysis is performed using Python libraries like Pandas, Matplotlib, and Seaborn to visualize trends and identify patterns in these sectors across various countries.

data-analysis dataset matplotlib-pyplot numpy pandas python seaborn time-series

Last synced: 14 Apr 2026

https://github.com/imrandil/excel_learning_dir

Hands-on Excel learning practice with some sample data.

data-analysis datasets excel

Last synced: 27 Jan 2026