An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/mafda/seattle_airbnb_data_analysis

This repository contains a comprehensive analysis of the Seattle Airbnb dataset, conducted using the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology.

crisp-dm data-analysis data-science jupyter-notebook pandas-python seattle-data

Last synced: 29 May 2026

https://github.com/angelgardt/wlm-sdarp-old

World of Linear Models: Statistics & Data Analysis in R for Psychologists

data-analysis data-visualization gh-pages manim-animations quarto r rstudio statistics

Last synced: 04 May 2026

https://github.com/ehtisham-sadiq/building-an-ml-based-heart-disease-diagnosis-system-with-flask

It is an end-to-end project that combines machine learning to create a user-friendly Heart Disease Diagnosis System, powered by Flask.

data-analysis exploratory-data-analysis feature-engineering flask machine-learning model-building model-evaluation pipelines python3 rest-api

Last synced: 04 May 2026

https://github.com/kimtth/agent-data-analyst-stream-chainlit

⚡️Chainlit-based Data Analyst Chat Agent (Responses API, Server Sent Events) 📈

agent azure-openai chainlit code-interpreter data-analysis server-sent-events stream-response

Last synced: 09 Jun 2026

https://github.com/alejo1630/chicago_crimes

A Jupyter Notebook with the data analysis and data visualization of crimes in Chicago from 2017 to 2023 using libraries such as seaborn and folium

data-analysis data-visualization folium pandas python seaborn

Last synced: 12 Apr 2026

https://github.com/githubuseraccountamazing/the-amari-project

a project in which I attempted to push some of the limits of stable-diffusion while taking some data along the way

ai ai-generated-images bash data-analysis machine-learning stable-diffusion textual-inversion

Last synced: 05 May 2026

https://github.com/mirokeimioniemi/classifying-software-pirates

Exploring the factors driving people into software piracy by training two machine learning models to predict whether a person with certain characteristics and sentiments is likely to possess any pirated software or not using a dataset collected via a survey targeting users of music production software.

data-analysis data-science decision-tree-classifier logistic-regression machine-learning piracy python software-piracy survey

Last synced: 06 May 2026

https://github.com/bcko/ud-da-stroopeffect

Udacity Data Analyst Nanodegree Project : Test a Perceptual Phenomenon (Stroop Effect)

data-analysis data-analyst-nanodegree stroop-effect udacity udacity-data-analyst-nanodegree

Last synced: 04 Jul 2025

https://github.com/kirkalyn13/opensignal_autogenerate_report

Script used to generate results/summary, including the trends of flagged provinces, from the raw excel data file,

data-analysis data-science data-visualization matplotlib numpy pandas python

Last synced: 06 May 2026

https://github.com/namratha2301/best-selling-books

Comprehensive examination of best-selling books, focusing on understanding sales patterns, genre distributions, and the impact of various features on book performance.This project aims to predict book sales and classify genres, providing valuable insights for authors, publishers, and readers.

data-analysis data-visualization matplotlib pandas sckiit-learn seaborn

Last synced: 06 May 2026

https://github.com/mrankitgupta/titanic-survival-prediction-93-xgboost

Titanic Survival Prediction Project (93% Accuracy)🛳️ In this notebook, The goal is to correctly predict if someone survived the Titanic shipwreck using different Machine Learning Model & Hyperparameter tunning.

classification data-analysis data-science data-visualization gradient-boosting kaggle-competition linear-regression logistic-regression machine-learning machine-learning-algorithms ml ml-models nlp prediction predictive-modeling random-forest titanic titanic-kaggle titanic-survival-prediction xgboost

Last synced: 06 May 2026

https://github.com/sivas-2/coffee-sales-visualization

This repository contains data visualization scripts and notebooks analyzing coffee sales data from a vending machine, sourced from Kaggle. The visualizations explore sales trends, customer preferences, and product popularity over time.

data data-analysis data-science data-visualization python visualization

Last synced: 07 May 2026

https://github.com/gallillio/unsupervised_clustering_music_recommendation_system

Music Recommendation System using Unsupervised Machine Learning Clustering Methods using K-Means, Fuzzy C Mean DBSCAN, Gaussian Mixture Model, BIRCH and Agglomerative Clustering

affinity-propagation agglomerative-clustering birch-clustering data-analysis data-visualization dbscan-clustering fuzzy-cmeans-clustering gaussian-mixture-models k-means-clustering pca unsupervised-machine-learning

Last synced: 19 Oct 2025

https://github.com/md-emon-hasan/data-science

Data science tutorials, including data preprocessing, analysis, visualization, project deployment, machine learning and deep learning algorithms.

artificial-intelligence data-analysis data-engineering data-science deep-learning machine-learning-algorithms python

Last synced: 07 May 2026

https://github.com/moscarde/pyproductivity

Application uptime tracker that monitors active windows, automatically generating daily usage reports.

daily-report data-analysis python tracker

Last synced: 19 Oct 2025

https://github.com/narenkhatwani/arkouda-projects

This repository contains the source codes of the projects done using Arkouda (a software package that allows a user to interactively issue massive parallel computations on distributed data using functions and syntax that mimic NumPy, the underlying computational library used in most Python data science workflows.)

arkouda data-analysis data-analytics data-science high-performance high-performance-computing highperformancecomputing numpy pandas parallel-computing parallel-processing parallelization python

Last synced: 17 Apr 2026

https://github.com/whitehathackerpr/data-visualization-tool

This is a Python-based web application that allows users to upload datasets, analyze data, and create visualizations interactively. The tool is designed for ease of use and provides a simple interface to perform basic data analysis and generate visualizations

data data-analysis data-visualization python python3

Last synced: 05 Sep 2025

https://github.com/renanmoliveir/analise_de_dados_bikestore_power-bi_atualizan-o

Projeto de análise de dados do banco de dados Bike Store com Power BI.

data-analysis dax-languague powerbi query

Last synced: 15 Mar 2026

https://github.com/miroslav-reiter/kurz_jazyk_sql_analytici_datovi_vedci

Materiály ku kurzu Jazyk SQL 1 pre Analytikov a Dátových Vedcov

analysis analytics data data-analysis data-science database mysql reiter sql

Last synced: 08 May 2026

https://github.com/aekanshd/crazytics-suicidesindia

Basic interpretation of the Suicides in India data-set using R.

data-analysis data-science graph india r suicides

Last synced: 10 Jun 2026

https://github.com/mariam-badr-mb/gtc-ml-project2-diabetes-prediction

This project is part of the GTC Machine Learning Program. It demonstrates the end-to-end ML workflow by building a predictive model for diabetes detection

classification-algorithm data-analysis data-visualization diabetes-prediction gridsearchcv hyperparameter-tuning machine-learning python

Last synced: 09 May 2026

https://github.com/santiagohermo/data.tools

Convenience functions for data manipulation

data-analysis data-cleaning data-science data-wrangling

Last synced: 24 Oct 2025

https://github.com/dcs-training/null-hypothesis-testing-with-r

This two-class course will focus on developing theoretical and practical skills for null hypothesis testing in R. Go to the readme file

data-analysis data-wrangling r statistics

Last synced: 24 Oct 2025

https://github.com/chayandatta/got_script_manipulation

Game of Thrones Script - String & file manipulation

data-analysis data-science pandas python3

Last synced: 11 May 2026

https://github.com/beallio/wherewolf

Wherewolf is a production-grade, local SQL workbench designed for data engineers and analysts to query local files (CSV, Parquet, JSON) with ease. Built with Streamlit, it provides a unified interface to execute SQL against either DuckDB or PySpark engines without requiring complex setup.

big-data data-analysis data-engineering etl parquet performance pyspark python spark-sql sql uv

Last synced: 28 Apr 2026

https://github.com/mkk-1817/adhd-prediction

This project focuses on leveraging machine learning techniques to predict Attention-Deficit/Hyperactivity Disorder (ADHD) in children. Accurate and early diagnosis is crucial for effective intervention and support.

adhd data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms prediction python

Last synced: 09 May 2026

https://github.com/foggy-projects/foggy-data-mcp-bridge

MCP Data Bridge for Java. Enabling safe Text-to-Query via a semantic layer, making enterprise data accessible to AI Agents.

agent data-analysis java llm mcp semantic-layer spring-boot text-to-sql

Last synced: 16 Mar 2026

https://github.com/shadowk29/cusumtools

An eclectic collection of python scripts I have found to be useful in processing nanopore data

data-analysis data-visualization time-series-analysis

Last synced: 16 Mar 2026

https://github.com/zpreisler/modules

Python libraries and modules for processing simulation outputs

data-analysis python scripts tensorflow

Last synced: 13 May 2026

https://github.com/sunnybibyan/random_data_generation

A project that generates a dataset using various statistical distributions (Normal, Uniform, Exponential, Random Integers, and Binomial) and performs data analysis. Includes visualizations and an option to export the data as a CSV file.

data-analysis data-visualization python random-data-generation statistics streamlit-webapp

Last synced: 13 Jun 2026

https://github.com/sujitmahapatra/blinkit-analysis-dashboard-powerbi

Power BI dashboard analyzing Blinkit sales data to address challenges in Quick Commerce, examining outlet types, establishment years, and customer demographics to uncover insights and trends.

data-analysis powerbi powerbi-dashboards

Last synced: 28 Jan 2026

https://github.com/geo-y20/uber-rides-data-analysis

This project aims to analyze Uber ride data to understand various aspects of ride usage, such as the distribution of rides across different categories, purposes, months, days, and times.

dashboard dashboard-templates data data-analysis data-analysis-python data-analytics data-visualization pandas powerbi python recommendation-system rides uber

Last synced: 13 Apr 2026

https://github.com/kaz-yos/distributed

Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulation Study (Pharmacoepidemiol Drug Saf 2018)

data-analysis epidemiology statistics

Last synced: 15 Jun 2026

https://github.com/chetanmalviya513/Firm-Financial-Transaction-Analysis

📊 Financial Analysis & Forecasting Processed large-scale financial data using Python for trend analysis and insights. Developed interactive Tableau dashboards to improve forecasting accuracy and reduce costs by 25%.

data-analysis financial-data forecasting insights msexcel pandas python reporting tableau-dashboards

Last synced: 15 Jun 2026

https://github.com/titanscouting/tra-analysis

Titan Robotics 2022 Strategy Team Analysis Repository

data-analysis frc frc-scouting hacktoberfest python

Last synced: 29 Jan 2026

https://github.com/mindgamesnl/yanderestats

https://mindgamesnl.github.io/YandereStats/

data-analysis reporting-pipeline yandere yandere-sim

Last synced: 18 Jun 2026

https://github.com/seekinginfiniteloop/fedcal

A feature-rich Python calendar that enables time series analyses of changes in federal workforce schedules and shifts in executive department funding status.

data-analysis data-science econometrics economic-data economics federal federal-government hr pandas pandas-library pandas-python pydata python

Last synced: 15 Apr 2026

https://github.com/lebrancconvas/how-much-love-in-thai-song

How much Love song among the Thai Songs?

data-analysis side-project web-scraping

Last synced: 19 Jun 2026

https://github.com/jamiemagee/rhi

Collating the data on the Renewable Heat Incentive scheme, and presenting it in a more readable format.

data-analysis open-data open-government rhi

Last synced: 25 Feb 2026

https://github.com/faisal-khann/diwali-sales-analysis

The "Diwali Sales Analysis" project aims to analyze the sales data during the Diwali festival period to uncover insights and trends that can help improve marketing strategies and sales performance in the future

csv data-analysis eda jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 11 Apr 2026

https://github.com/rogernet/desafio-profissional-produto-data-driven

Ajudar a formar Analistas de Produto, PMs e Gestores de Negócio capazes de tomar decisões estratégicas baseadas em dados.

data-analysis data-science data-visualization product

Last synced: 23 Jun 2026

https://github.com/nakshjainsonigara/vba-canteenmanagementsystem

The Canteen Management System is a comprehensive software solution designed to modernize and optimize canteen operations. It aims to simplify the complexities of managing a canteen by automating key processes such as order management, payment processing, and report generation.

canteen canteen-mangement-system charts data-analysis email excel microsoft payment-gateway vba vba-excel vba-macros word

Last synced: 30 Jan 2026

https://github.com/nafisalawalidris/international-breweries

This GitHub readme provides an overview of data analysis using SQL on the International Breweries dataset, including dataset description, analysis questions, example SQL queries, and key insights derived from the analysis.

data-analysis insights international-breweries-dataset queries sql

Last synced: 31 Jan 2026

https://github.com/tynoee/record_company-database

A record company database with multiple query commands using SQL

data-analysis sql

Last synced: 31 Jan 2026

https://github.com/luabagg/worldwide-trends

Worldwide Google Trends visualization and classification

data-analysis data-visualization google-trends trends

Last synced: 03 Feb 2026

https://github.com/jendives2000/regressions

Performing of a Linear Regression analysis to determine the strength of the relationship between the number of reviews and sales for a retail company.

data-analysis linear-regression pearson-correlation-coefficient regression

Last synced: 04 May 2026

https://github.com/codewithmayank-py/box-office-analysis-with-seaborn-and-python

This repository contains Python code and datasets for analyzing box office data. Explore trends, patterns, and factors influencing movie performance.

analysis box-office-data-analysis data-analysis data-visualization dataset jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 05 May 2026

https://github.com/pcanadas/weather_scraper

Este proyecto automatiza la recopilación y el procesamiento de datos meteorológicos históricos y previsionales. Utiliza Selenium para extraer información de sitios web de clima, procesa los datos con Pandas y los almacena en archivos CSV limpios. Es ideal para análisis climáticos, visualización de datos o integración en otros sistemas.

beautifulsoup data-analysis pandas python selenium

Last synced: 05 May 2026

https://github.com/mohitsai/boston-housing-data-analysis

Data Analysis Project for the City of Boston Government for insights into effect of property rennovations and remodelling on housing availability in the city

data-analysis data-science matplotlib numpy pandas python

Last synced: 05 May 2026

https://github.com/anushkundu/crime-pattern-analysis

Analyzing Crime Patterns in Montgomery County, USA: An Inclusive Study Based on NIBRS Data (2016-2022)

data-analysis data-visualization descriptive-statistics matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/iamrajmani/sentimental-analysis

Sentimental Analysis - Final Year College Project

data-analysis data-visualization machine-learning python pytorch

Last synced: 06 May 2026

https://github.com/sankaran-s2001/us-traffic-accidents-analysis-python-eda

Exploratory data analysis of US traffic accidents from 2016-2023, analyzing patterns by time, location, weather, and severity using Python data science libraries.

data-analysis data-science data-visualization eda matplolib numpy pandas python

Last synced: 06 May 2026

https://github.com/josepablodmg/python--linear-regression-advertising

A linear regression analysis to predict sales based on advertising spending across TV, radio, and newspaper channels. The project includes exploratory data analysis, model training, coefficient visualization, and residual analysis.

advertising data-analysis exploratory-data-analysis linear-regression machine-learning python regression scikit-learn visualization

Last synced: 06 May 2026

https://github.com/vimlesh-gupta/blinkit_data_analytics_project

End-to-end Blinkit data analytics project using Python, SQL Server & Power BI

blinkit data-analysis eda pandas powerbi python sql-server

Last synced: 06 May 2026

https://github.com/rlalpha49/anisearch-model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2026

https://github.com/urbanekda/upwork_dashboard

A data analysis project examining trends and patterns in the data science job market on Upwork. This project analyzes job postings, requirements, and market demands to provide insights into the freelance data science ecosystem.

data-analysis data-science data-science-projects data-visualization freelance jupyter-notebook python streamlit

Last synced: 07 May 2026

https://github.com/jjkay03/discord-call-extractor

Collect HTML data from Discord group/DM to create database of calls

data-analysis database discord discord-tool

Last synced: 07 May 2026

https://github.com/mahmoudnamnam/fc-barcelona-reports

FC Barcelona Reports: An interactive web application to analyze and visualize FC Barcelona's match data. Built with Streamlit, it scrapes match data from WhoScored, stores it in MongoDB, and presents insights through interactive visualizations like pass networks, shot maps, and player statistics.

data-analysis data-visualization football-analytics mplsoccer pandas streamlit web-scraping

Last synced: 07 May 2026

https://github.com/joseph-pabian/life-expectancy-

Statistical analysis of life expectancy in developed vs developing countries using SQL and Python

data-analysis duckdb public-health python sql statistics

Last synced: 07 May 2026

https://github.com/ddihora1604/advanced_business_analytics_on_world_bank_global_financial_inclusion_data_2021

Bridging the Gaps in Financial Inclusion: Understanding the Cash-Credit Paradox, Divide between Cash and Digital Payments, and Financial Resilience.

advanced-excel business-analytics data-analysis data-engineering data-mining data-visualization database exploratory-data-analysis machine-learning preprocessing-data python

Last synced: 07 May 2026

https://github.com/biginformatics/git-basics

Hands-on Git and GitHub lessons for analysts and statisticians

data-analysis git github public-health training

Last synced: 10 Jun 2026

https://github.com/aidan-zamfir/the-iliad

Data analysis & relationship network for the characters of Homers Iliad

data data-analysis dataframes networks networkx python selenium spacy webscraping

Last synced: 08 May 2026

https://github.com/devexpress-examples/wpf-pivot-grid-group-date-time-values

This example shows how to group date-time values in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 08 May 2026

https://github.com/shelton-beep/trading-algorithm

A simple trading algorithm for SPY ETF using a moving average crossover strategy. This project analyzes SPY weekly price data, implements a buy/sell algorithm, and tracks performance metrics to evaluate profitability and risk. Ideal for learning algorithmic trading basics and financial data analysis.

data-analysis financial-analysis investment-strategy jupyter-notebook pandas python quantitative-finance technical-analysis time-series-analysis trading-strategies

Last synced: 08 May 2026

https://github.com/sabaasif2501/netflix-data-analysis

Exploratory data analysis of Netflix content using Python and pandas. Content types, genres, countries, and release years.

data-analysis netflix pandas portfolio-project python

Last synced: 08 May 2026

https://github.com/urmesthamondal/data_analysis_projects

Portfolio Data analysis projects built using Excel, Python, SQL and for visualization used Power bi .

data-analysis pivot-tables powerbi python sql sql-server visualisation

Last synced: 09 May 2026

https://github.com/magnus0969/black-friday-sales-analysis

An in-depth analysis of Black Friday sales data to uncover trends, customer behavior, and product insights. Utilizing Python, data visualization, and machine learning techniques, this project provides key business intelligence to optimize sales strategies.

analysis data-analysis data-science python sales-analysis

Last synced: 09 May 2026

https://github.com/master-helix/ibm-data-analyst-certification-stock-analysis-project

This is a mini project repository of my IBM Certification involving stock analysis and plotting of Tesla and GameStop

analytics data data-analysis data-visualization ibm matplotlib pandas python web-scraping

Last synced: 09 May 2026

https://github.com/marvinmarnold/oipm_stop_search

OIPM's analysis on Stop & Search (frisk) activity by the New Orleans Police Department.

data-analysis frisk new-orleans oipm police search stop

Last synced: 22 Jul 2025

https://github.com/yeopster/datascience_notebook

Compilation of my Notebook based on Kaggle Dataset

data-analysis data-science kaggle notebook python

Last synced: 10 May 2026

https://github.com/andersoncrs/aprendizaje_no_supervisado_kmeans_customers

Este repositorio contiene un análisis de datos de clientes de un centro comercial utilizando técnicas de aprendizaje no supervisado, específicamente K Means y clustering jerárquico. El objetivo del proyecto es segmentar a los clientes en grupos homogéneos para entender mejor sus comportamientos y características.

data-analysis kmeans-clustering matplotlib numpy seaborn visualization

Last synced: 10 May 2026

https://github.com/crazy-dot/covid-19-analysis

This project performs an in-depth analysis and visualization of COVID-19 data, focusing on India and its states/union territories.

covid-19-india data-analysis jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 10 May 2026

https://github.com/pipe199x/end-to-end-prediction-california

End-to-end prediction project using various technologies to predict housing prices in California.

california-housing data-analysis machine-learning python

Last synced: 11 May 2026

https://github.com/ceia-prefeitura/urban-lit-tracker-etl

UrbanLitTracker coleta artigos acadêmicos sobre mudanças urbanas via OpenAlex API, processa e armazena em MongoDB. Oferece dashboard interativo com Dash, exibindo dados como trabalhos mais relevantes, autores e palavras-chave frequentes, facilitando a análise e visualização da literatura urbana.

academic-research bibliometrics data-analysis data-pipeline data-visualization etl openalex-api urban-studies

Last synced: 11 May 2026