An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/ahsankhizar5/titanic-eda-visualization

Exploratory Data Analysis and Visualization on the Titanic Dataset using Python, Pandas, Matplotlib, and Seaborn to uncover survival patterns.

data-analysis data-science data-visualization eda kaggle machine-learning matplotlib pandas python seaborn titanic-dataset

Last synced: 31 May 2026

https://github.com/kavicastelo/colab

This repository includes a data analysis and model training practical Jupyter notebooks using a soil fertilizer dataset. (use 4th edition)

data-analysis jupyter-notebook python

Last synced: 26 Mar 2025

https://github.com/bakulwani/data-mart-weekly-sales

Cleaned and analyzed weekly sales data using SQL to build a business-focused data mart with KPIs, customer segmentation, and platform insights.

customer-segmentation data-analysis data-cleaning etl kpi-analysis mysql sales-analysis sql

Last synced: 21 Feb 2026

https://github.com/codeonthespectrum/web-scrap

Este projeto realiza o web scraping da Wikipédia para obter dados sobre os municípios mais populosos do estado do Rio de Janeiro.

data-analysis data-visualization webscraping

Last synced: 16 Feb 2026

https://github.com/thinzarhninyu/dap

Notes and Projects for Data Analysis with Python course from FreeCodeCamp.org

data-analysis data-analysis-python ipynb jupyter-notebook python

Last synced: 18 Feb 2026

https://github.com/navp7/pizzasales_powerbi

This project involves creating a comprehensive sales performance dashboard using Power BI to visualize and analyze the sales data of an Italian pizza company.

data-analysis ms-sql-server ms-word powerbi visualization

Last synced: 13 Mar 2026

https://github.com/maazie-khan/olympics-data-enigeering

Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.

azure big-data data-analysis dataengineering devops pipeline

Last synced: 13 May 2026

https://github.com/dzakwanalifi/stadata-x

Terminal UI untuk menjelajahi dan mengunduh data BPS Indonesia secara interaktif

bps-api cli-app data-analysis data-visualization indonesia-statistics indonesian-data open-data python statistics terminal-ui textual tui

Last synced: 20 Jan 2026

https://github.com/who-else-but-arjun/isro_xrf_sr

Source Codes for super resolution of the lunar elemental abundance map using a semi-supervised deep spatial interpolation model. This hybrid approach combined ResNet50 for spatial feature extraction with Graph Neural Network (GATv2Conv) layers and Convolutional Neural Networks (CNNs), followed by fusion layers.

cnn data-analysis graph-neural-networks pytorch semi-supervised-learning spatial-interpolation super-resolution

Last synced: 30 Apr 2026

https://github.com/NurFakhri/scraping-and-analysis-skincare

Scraping and data analysis of Indonesian skincare reviews.

beutifulsoup data-analysis data-scraping python requests review scraping-websites

Last synced: 12 Oct 2025

https://github.com/dadvaiahpavan/ai-data-scientist-

AI-powered tool for dataset analysis, featuring data preprocessing, classification, regression, anomaly detection, and text analysis. Built with scikit-learn, pandas, and Plotly for visualization. Includes an interactive Streamlit web interface for real-time data analysis.

ai anomaly-detection classification data-analysis data-science machine-learning panda plotu regression scikit-learn sentiment-analysis streamlit

Last synced: 03 May 2026

https://github.com/jeffbrennan/analysis-templates

Templates of commonly used graphics/functions/settings to help focus on the bigger picture

data-analysis r rmd

Last synced: 12 Oct 2025

https://github.com/alexondata/daan_eda-exploratory-data-analysis_ecommerce

This project presents an Exploratory Data Analysis (EDA) pipeline for an eCommerce dataset, integrating Python, SQL Server, and Power BI to transform raw transactional data into meaningful business insights. The project was developed as part of an academic assignment at Transilvania University of Brașov, Faculty of Mathematics and Computer Science.

data-analysis data-visualization ecommerce microsoft-sql-server powerbi python

Last synced: 18 May 2026

https://github.com/sandergi/ekichabi

A digital phonebook to connect sustenance farmers in Tanzania. Works via USSD so farmers without an internet connection can use it (via their Telecom). Build with Django in Python and a MySQL database. This is a public copy of the private repo with user information stripped.

android data-analysis ict4d research ussd

Last synced: 14 May 2026

https://github.com/hemangsharma/bookingdataanalysisreport

The report helps understand key trends and insights around customer bookings, pricing, and other related attributes.

analysis data data-analysis data-analytics data-visualization streamlit streamlit-dashboard

Last synced: 14 May 2026

https://github.com/zulfachafidz/green_horizon_forecasting_peak_organic_avocado_sales_with_the_prophet_algorithm

The Green Horizon Project leverages the Prophet algorithm to predict peak sales of organic avocados, supporting the campaign "APEAM GO ORGANIC." Using Python and Looker Studio, this analysis aims to provide deep insight into sales trends and potential, forming the basis of smarter marketing strategies.

algorithm algorithms analytics data data-analysis data-engineering data-mining data-science data-visualization forecasting machine-learning machine-learning-algorithms prophet-model python python-script

Last synced: 17 May 2026

https://github.com/bala-1409/sql-projects

The repository contains Structured Query Language (SQL) Scripts. The Multiple SQL scripts for various projects which includes data cleaning, data pre-processing, data processing, data transformation and insights gaining through Query Language.

data-analysis data-mining data-science data-transformation database eda etl-framework exploratory-data-analysis microsoft-sql-server query-language sql sql-server sql-server-database sql-server-management-studio

Last synced: 27 Feb 2026

https://github.com/benmar2406/rent-in-germany

Interactive visualizations and maps depicting topics around rent prices and income in Germany built with Svelte.

charts d3 d3-visualization d3js data-analysis data-visualization gis gis-data infographic infographics map mapbox mapbox-gl mapbox-gl-js mapboxgl svelte

Last synced: 26 Mar 2025

https://github.com/sadratehranian/pem-fuel-cell

The methodology section details the use of Python for data processing and analysis, employing statistical and machine learning-based anomaly detection techniques to identify potential issues in fuel cell stacks. It emphasizes data preprocessing, feature engineering, exploratory data analysis (EDA), and anomaly detection.

anomaly-detection data-analysis data-science data-visualization exploratory-data-analysis feature-engineering fuel-cell machine-learning preprocessing python statistical-analysis visual-studio-code

Last synced: 26 Mar 2025

https://github.com/thlindustries/mortalidade_neonatal_python_react

Uma plataforma de visualização de dados montada utilizando Python e React com a library de visualização do Plotly

data-analysis data-visualization plotly python python3 react reactjs

Last synced: 16 Apr 2026

https://github.com/louisfernando1204/websocket-benchmark

A comprehensive performance testing and analysis suite designed to evaluate and compare different WebSocket server implementations across various programming languages and libraries.

benchmarking broadcast-test coder-websocket csv data-analysis data-visualization echo-test golang gorilla-websocket nodejs python3 socket-io websocket-client websocket-server ws

Last synced: 09 Apr 2026

https://github.com/sidsin0809/hmdb-endo-flagger

A Python toolkit to identify and score endogenous human metabolites from HMDB XML metadata

data-analysis hmdb metabolomics ontology pipeline python-3 streaming-parser xml-parsing

Last synced: 06 Jul 2025

https://github.com/amoghkori/deeplabcut-package-for-animal-pose-estimation

DeepLabCut Mouse Location Prediction: Training a deep neural network to predict the location of a mouse using annotated joint positions.

data-analysis data-annotations data-preprocessing deep-learning machine-learning model-evaluation python-programming research research-project

Last synced: 17 Mar 2025

https://github.com/l1ght14/e-commerce-sales-analysis

Interactive Power BI dashboard analyzing e-commerce sales, profit trends, top products, and customer segments using the Sample Superstore dataset.

dashboard data-analysis powerbi

Last synced: 12 Feb 2026

https://github.com/gmalbert/supreme-court

Data Analysis of the US Supreme Court from 1790 to present

data-analysis data-science supreme-court

Last synced: 31 May 2026

https://github.com/angelalim88/jakarta-air-quality-index-classification

This project classifies Jakarta's Air Quality Index (AQI) from 2010 to 2023 using machine learning models (Random Forest, MLP, SVM) based on pollutant concentrations.

data-analysis data-visua machine-learning scikit-learn tensorflow

Last synced: 13 Oct 2025

https://github.com/gmalbert/rugby

Rugby Data Analysis and Sports Betting

data-analysis rugby sports-betting

Last synced: 31 May 2026

https://github.com/jackmnob/python-tableau-eda-stockdash

Data cleaning, preparation, and manipulation (EDA) for an interactive stock market dashboard with Tableau - using pandas (Python) via JupyterLab

cleaning-data dashboard data-analysis data-preparation eda jupyter-notebook jupyterlab python tableau-public

Last synced: 14 May 2026

https://github.com/sumit9000/submission-of-web-server-log-analysis-assessment

This project analyzes one year of real-world HTTP access logs from the University of Calgary’s computer science server. Using Python, pandas, and regular expressions, we clean and parse the data to extract meaningful insights and answer 10 analytical questions.

data-analysis data-cleaning eda jupyter-notebook log-parsing pandas python realworld-data regex web-log-analysis

Last synced: 14 Apr 2026

https://github.com/inddrsingh/restaurant_orders_mysql

Complex SQL queries on restaurant data for better and precise insights

data-analysis insights mysql

Last synced: 28 Jan 2026

https://github.com/derogative404/google_data_analytics_capstone

Capstone project part of the Google Data Analytics Certificate Program

data-analysis excel r tableau

Last synced: 26 Mar 2025

https://github.com/samkazan/business-analysis-tableau

Business Analysis on Global/Superstore data using Tableau.

analysis data-analysis tableau visualization

Last synced: 08 Feb 2026

https://github.com/anushkundu/london-housing-market-analysis

London Housing Market Analysis: An Insightful Power BI Dashboard"

data-analysis data-visualization powerbi transformation

Last synced: 27 Jan 2026

https://github.com/achrefbenammar404/quasi-patterned-conversations-analysis

Official Implementation of the IEEE EUROCON 2025 Paper A Computational Approach to Modeling Conversational Systems Analyzing Large-Scale Quasi-Patterned Dialogue Flows Mohamed Achref Ben Ammar – National Institute of Applied Science and Technology (INSAT), University of Carthage, Tunisia Mohamed Taha Bennani – University of Tunis El Manar (FST)

ai computational-linguistics conversational-agent conversational-ai data-analysis graph-algorithms nlp research research-paper

Last synced: 14 Oct 2025

https://github.com/ayorick23/python-data-science-cheat-sheet

Guía rápida y práctica de sintaxis, comandos y funciones esenciales de Python para Ciencia de Datos. Perfecta para recordar cómo usar las librerías más comunes como NumPy, Pandas, Matplotlib y Scikit-learn en tus análisis diarios.

cheat-sheet data-analysis data-science data-visualization deep-learning jupyter-notebook machine-learning matplotlib ml numpy pandas python scikit-learn scipy seaborn statistics sympy tensorflow

Last synced: 07 Apr 2026

https://github.com/sundar2k22/agriculture-decision-management-system

Geospatial Analysis Using QGIS: Comparative Study of Yelahanka and Sarjapur

data-analysis qgis

Last synced: 22 Jan 2026

https://github.com/ankitpoddar07/excel-project_back-office

📊 Coffee Sales Analytics – Back Office Excel Project

data-analysis ms-excel

Last synced: 05 Feb 2026

https://github.com/saisurajmatta/healthcare-data-analytics

Power BI project analyzing Emergency Department data, demonstrating skills in data transformation, DAX, and visualization. It focuses on patient flow, wait times, demographics, and satisfaction, providing actionable insights for healthcare improvement. Includes documentation, data dictionary, and code samples.

data-analysis data-modeling data-visualization dax power-bi powerbi-visuals powerquery

Last synced: 22 Jan 2026

https://github.com/saisurajmatta/e-commerce-sales-advanced-data-analysis

Excel-based e-commerce analytics for FNP, a gift company. It covers data extraction, modeling, and visualization, providing actionable insights on revenue, customer behavior, and operations. Key skills include Excel, Power Query, Power Pivot, and DAX. The analysis culminates in data-driven business recommendations.

data-analysis data-visualization dax excel power-pivot power-query

Last synced: 22 Jan 2026

https://github.com/rohitblaze10/-excel-_seller_store_analysis

A collection of data analysis projects showcasing data cleaning, exploration, visualization, and machine learning. Using "Excel" and more to uncover insights and drive data-driven decision-making. Feel free to explore, contribute, or collaborate!

data-analysis data-visualization excel excel-export

Last synced: 12 Feb 2026

https://github.com/rohanrony19/movie-recommendation-system

This is a python project where using Pandas library we will find correlation and give the best recommendation for movies.

data-analysis deep-learning knn-algorithm numpy pandas python recommendation-system

Last synced: 14 Apr 2026

https://github.com/pedrosfaria2/analisetitulosnetflix

Estudo de popularidade dos filmes da Netflix no IMDB.

analise-de-dados data-analysis jupyter-notebook matplotlib numpy pandas python

Last synced: 14 Apr 2026

https://github.com/virajbhutada/hollywood-insights-tableau

Strategic cinematic insights through Hollywood's data landscape. Tableau-driven analytics for genre, studio profitability, and audience dynamics. Uncover trends, assess audience reception, and navigate through years of film data, elevating your understanding of the cinematic world.

analystics business-intelligence dashboard data-analysis data-visualization entertainment hollywood storytelling tableau tableau-desktop visualization

Last synced: 05 Feb 2026

https://github.com/kunalpisolkar24/winequalityprediction

Predicting wine quality using machine learning with matplotlib, numpy, pandas, and seaborn for insightful data analysis. 🍇🤖📊

data-analysis data-science data-visualization machine-learning prediction-model

Last synced: 16 Oct 2025

https://github.com/ashithapallath/r-lab

This repository offers a collection of exercises, assignments, and projects designed for the R Programming course. It focuses on utilizing R for data analysis, statistical modeling, and visualization tasks.

data-analysis exploratory-data-analysis machine-learning r-language visualization

Last synced: 16 Oct 2025

https://github.com/xenon1919/credit-card-fraud-detection

Credit Card Fraud Detection is a machine learning project to predict fraudulent credit card transactions. It handles imbalanced data using undersampling and applies Logistic Regression and XGBoost models. With an AUC of 0.98, it offers robust fraud detection. Includes a Streamlit app for real-time predictions.

data-analysis machine-learning python

Last synced: 14 May 2026

https://github.com/hase3b/flask-dash-interactive-dashboard

An interactive data visualization dashboard created using Flask and Dash. This project includes comprehensive data preparation, exploratory data analysis (EDA), and dynamic visualizations with Seaborn and Plotly. Explore the multi-page Dash app with features like dropdowns and callbacks for updated plots.

callbacks dash dashboard data-analysis data-visualization dropdown eda flask interactive plotly seaborn web-app

Last synced: 19 May 2026

https://github.com/kaushik-puttaswamy/amazon-sales-dashboard-using-tableau

The Amazon Sales Data Analysis Dashboard provides insights into key sales metrics like profit, revenue, shipment days, and units sold. It includes visualizations to assess performance by region, country, and sales channel. The dashboard helps stakeholders optimize strategies and improve profitability through data-driven analysis.

dashboard data-analysis data-visualization tableau

Last synced: 11 Jan 2026

https://github.com/supertetelman/coursera-exdata-09

This repo contains several R scripts that were used to analyze, plot, and clean data from various datasets. These projects were part of the Coursera course, Exploratory Data Analysis. The end results of the analysis are included.

big-data course coursera data-analysis r

Last synced: 16 Oct 2025

https://github.com/ritap03/neuralnetwork-shapeclassifier

Feedforward neural network system in MATLAB for geometric shape classification. Includes data preprocessing, network training and evaluation, confusion matrix analysis, and a graphical interface for user interaction and model testing.

ai data-analysis deep-learning feedforward-network gui image-classification machine-learning matlab neural-network pattern-recognition

Last synced: 14 May 2026

https://github.com/lucashomuniz/project-15

[Dashboard] Enhancing Business Intelligence: Leveraging SQL, Python, and DAX for Strategic Insights in Sales Analysis

business-analytics business-intelligence data-analysis data-science data-visualization dax-languague machine-learning powerbi python

Last synced: 12 Jul 2025

https://github.com/koldlight/bluetab-data-science-2017

Repositorio para compartir material y publicar los retos

course data-analysis data-science exercises

Last synced: 12 Feb 2026

https://github.com/mattdelaune/excel_sales_dashboard

Interactive Excel Dashboard for Coffee Sales Analysis: This project leverages Excel to analyze sales data, uncover seasonal trends, regional preferences, and customer behaviors, providing actionable insights for optimizing inventory and marketing strategies.

data-analysis excel pivot-tables sales-dashboard sales-data

Last synced: 27 Jan 2026

https://github.com/pooja-manjunatha/nyc_parking_violations_dbt

This project uses dbt to transform NYC parking violations data through a layered architecture: Bronze: Raw ingested data Silver: Cleaned and enriched data Gold: Aggregated tables for analytics Using DuckDB as the warehouse backend, it ensures data quality with tests and documentation. The project enables reliable analysis of parking violations

data data-analysis data-engineering dbt duckdb python sql

Last synced: 14 May 2026

https://github.com/aakk23/netflix_sql_project

This SQL project provides an analytical overview of Netflix's movies and TV shows dataset, uncovering key insights related to content types, ratings, release trends, and geographic distribution. It helps explore patterns in content availability, audience targeting, and regional preferences to support data-driven decisions.

data-analysis netflix-data-analysis postgresql sql

Last synced: 10 Apr 2025

https://github.com/valyaevgeorgiy/r_basic

Работа с основами среды R и тем самым изучения нового языка программирования, связанного непосредственно с анализом данных и построением графиков и диаграмм.

coding data data-analysis r rstudio

Last synced: 12 Dec 2025

https://github.com/dcs-training/much-ado-about-nothing-missing-data-in-research

Repo for the Much ado about nothing workshop. Go to the Readme file

data-analysis data-cleaning data-wrangling r

Last synced: 15 Jun 2025

https://github.com/prateek5525/online-shopping-analytics-project

The Online Shopping Analytics Project analyzed product trends, and regional sales using SQL and Tableau. Insights from the Sales and Location Dashboards highlighted key trends in demographics, product popularity, and regional performance. These findings empower businesses to optimize strategies, enhance marketing, and improve inventory management.

data-analysis excel kaggle-dataset sql tableau

Last synced: 20 Feb 2026

https://github.com/codeslash21/communicate_data_findings

Analyze and visualize Bay Wheel system data which contains 2.5M individual trips data. And communicate the data findings from the dataset in the form notebook slide.

bay-wheel data-analysis data-visualization explanatory-data-visualization exploratory-data-analysis

Last synced: 22 Jan 2026

https://github.com/abhijeet107/task-4

Design an interactive dashboard for business stakeholders.

data-analysis excel-csv tableau-dashboards tableau-public

Last synced: 22 Jan 2026

https://github.com/codeslash21/analyze-a-b-test-results

Analyze results of an A/B test run by an e-commerce website.

ab-test data-analysis

Last synced: 22 Jan 2026

https://github.com/casassg/ms_thesis

Social Media Analysis for Crisis Informatics in the Cloud

casassg-thesis data-analysis google-cloud kubernetes

Last synced: 19 Oct 2025

https://github.com/leabrodyheine/ml-kaggle-cirrhosis-data

This project showcases skills in machine learning, data preprocessing, and model evaluation using Python libraries such as scikit-learn, XGBoost, and Optuna. It involves implementing various machine learning models, handling imbalanced data, and employing imputation techniques to enhance model performance for predicting cirrhosis outcomes.

data-analysis data-pre imbalanced-data imputation machine-learning optuna pipeline scikit-learn xgboost

Last synced: 14 May 2026

https://github.com/rajesh9943/visualizing-global-development-trends-an-animated-analysis-of-life-expectancy-and-fertility-rates

To clean and analyze data to find trends in global population, fertility, and life expectancy from 1960 to 2016. This idea was inspired by hans rosling . To analyze the data, I used a scatter bubble chart, which clearly shows how's the population increased and the fertility rate decreased from 1960 to 2016.

data-analysis data-cleaning-and-preprocessing data-exploration expolatory-data-analysis identify-patterns reporting vizualisation

Last synced: 08 Oct 2025

https://github.com/Kaushik-Puttaswamy/Airline-Passenger-Referral-Prediction-Using-Machine-Learning

This project uses a machine learning model to predict if passengers referred by existing customers will book a flight, helping airlines target likely customers. Key factors like service ratings and value for money drive predictions, achieving over 90% accuracy.

airline-marketing customer-referral-prediction customer-satisfaction data-analysis feature-engineering hyperparameter-tuning machine-learning model-evaluation predictive-analytics

Last synced: 20 Oct 2025

https://github.com/ashwin331133/sql-pizza-outlet-sales-analysis

This project analyzes pizza sales data to gain insights into customer behavior and revenue patterns. Key analyses include customer insights, popular pizza types and sizes, revenue generation, and order trends. The findings help optimize menu offerings, staffing, and marketing strategies to boost overall business performance.

data-analysis sql

Last synced: 24 Feb 2026

https://github.com/mothraa/etl-marketanalysis-webscraping-poo

OC project 2 refactoring (POO version not yet completed)

data-analysis etl poo python web-scraping

Last synced: 20 Oct 2025

https://github.com/tomkyle/binning

Determine optimal number of bins 𝒌 for histogram creation and optimal bin width 𝒉 using various statistical methods.

binning data-analysis distributions doanes-rule freedman-diaconis histogram histogram-binning math php-math rice-rule scotts-rule square-root statistics sturges-rule terrell-scotts-rule

Last synced: 21 Oct 2025

https://github.com/rajesh9943/web-scraping-analysis-of-top-us-company-revenue-growth-in-2023

Explore the landscape of US business growth in 2023 with our dynamic project, 'Web Scraping for US 2023 Revenue Growth.' Utilizing advanced web scraping techniques, we unveil insights into the top companies driving economic expansion.

cleaning-data data data-analysis data-visualization manipulation numpy pandas pre-fill

Last synced: 16 Aug 2025

https://github.com/manish506/loan-approval-prediction

Explore predictive modeling in this project by applying classification techniques to a loan approval dataset. Analyze and preprocess the data, then use models like K-Nearest Neighbors, Random Forest, SVC, and Logistic Regression to predict loan outcomes. Gain insights into approval factors and enhance prediction accuracy.

classification classification-models data-analysis data-science jupyter-notebook loan-approval-prediction machine-learning predictive-analytics predictive-modeling project python

Last synced: 19 Jan 2026

https://github.com/borjamome/explorando-madrid

Exploring Madrid: A Data-driven Analysis with R 🐻🌳

data-analysis data-visualization madrid r

Last synced: 26 Mar 2025

https://github.com/borjamome/top-goleadores

Mejores delanteros en Europa según los datos

data-analysis data-visualization football-analytics r

Last synced: 26 Mar 2025