An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/aroramrinaal/spotistats

Spotistats is a data analysis and visualization project based on your Spotify streaming history.

data-analysis numbers spotify spotify-history visualization

Last synced: 15 Mar 2025

https://github.com/abhiram-kandiyana/us-bikeshare-analysis

Exploratory analysis of a bike-share system (Motivate) to understand its pain points.

data-analysis data-visualization

Last synced: 26 Mar 2025

https://github.com/abhipatel35/diabetes_ml_classification

Predict diabetes using machine learning models. Experiment with logistic regression, decision trees, and random forests to achieve accurate predictions based on health indicators. The complete lifecycle of an ML project is included.

classification data-analysis data-science data-visualization descision-tree diabetes-prediction jupiter-notebook logistic-regression machine-learning model-evaluation open-source pandas pycharm-ide python random-forest scikit-learn

Last synced: 20 Jan 2026

https://github.com/danielrosehill/data-projects-index

Data apps and datasets deployed to Streamlit Community Cloud, Hugging Face, and elsewhere.

data-analysis data-science data-visualization

Last synced: 16 Mar 2026

https://github.com/the-pinbo/dimensionalityredux-pca-vs-autoencoders

Comparative study of PCA and Autoencoders for effective dimensionality reduction, assessed through PSNR and SSIM metrics.

autoencoder-mnist autoencoders data-analysis dimensionality-reduction image-compression mnist neural-networks pca psnr ssim

Last synced: 13 May 2025
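
PSNR is one of the two metrics this comparison uses. It is defined as PSNR = 10·log10(MAX² / MSE), where MSE is the mean squared error between the original and reconstructed images. The sketch below is not the repository's code. It assumes 8-bit grayscale images flattened into equal-length Python lists (so MAX = 255), and the function name `psnr` is hypothetical.

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Hypothetical helper for illustration; assumes 8-bit pixels (max_value=255).
    """
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no reconstruction error
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher values mean a closer reconstruction, so PCA and autoencoder outputs for the same images can be compared directly by their PSNR. SSIM compares local luminance, contrast, and structure over sliding windows, so it is left out of this sketch.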

https://github.com/mansiikumarii/mysql

A curated collection of MySQL scripts covering DDL, DML, and DRL operations. Ideal for beginners to practice and understand core SQL concepts.

backend data-analysis data-modeling database database-integration database-management database-performance database-schema mysql mysql-admin mysql-database orm php-mysql query-optimization rdbms sql sql-query sql-script stored-procedure

Last synced: 19 May 2026

https://github.com/joe-stifler/llm-sig-playground

This repository is a collaborative space for MSc Earth Science students at Imperial College London to experiment with and apply Large Language Models (LLMs) to real-world Earth Science problems. The persona playground link follows below.

data-analysis earth-science llms machine-learning research-automation

Last synced: 29 Mar 2025

https://github.com/ginga1402/youtube_analysis

Exploratory Data Analysis on YouTube data

college-project data-analysis pandas-python

Last synced: 30 Mar 2025

https://github.com/jigyasag18/aircraft-data-management

This repository offers a comprehensive simulation of global military air deployments involving 10 countries, aircraft models, mission types, and strategic zones. It analyzes air power distribution, mission intent (offensive, defensive, support), and geopolitical positioning. The project provides structured insights into regional & zone level threat

aircraft-data aircraft-performance data data-analysis data-visualization database database-management dataset datavisualisation mysql powerbi powerbi-report powerbi-visuals sql

Last synced: 04 Feb 2026

https://github.com/windjammer6/9.-employee-exit-data-analysis-python

A personal project to analyse data from an employee exit survey from DETE and TAFE. Python libraries used: NumPy, pandas, Matplotlib.

data-analysis python

Last synced: 24 Mar 2025

https://github.com/arkww/matmap

Generates maps from a database and asks the user to guess which map is displayed.

data-analysis data-science javascript python

Last synced: 24 Apr 2026

https://github.com/balajimohan18/loan-classification-datascience-project

This project uses machine learning algorithms to predict the classification of loan status. The dataset is loaded and transformed with SQL to produce a clean dataset containing valid information.

classification data-analysis data-cleaning data-science data-visualization loan-prediction loan-status machine-learning sql supervised-learning

Last synced: 03 Sep 2025

https://github.com/alchemine/analysis-tools

Analysis tools for machine learning projects

data-analysis explanatory-data-analysis machine-learning python

Last synced: 06 Aug 2025

https://github.com/johannaschmidle/bookauthors

Explored a book sales database. Cleaned data using Excel and created an interactive dashboard to analyze author popularity, ratings, and sales trends. The project highlighted key insights such as sales performance and rating distributions [Excel]

author-sales book-sales books data-analysis data-visualization excel

Last synced: 04 Feb 2026

https://github.com/misaghmomenib/shop-revenue-analysis

A data analysis project aimed at analyzing and forecasting shop revenue based on sales and other business metrics. It helps to identify trends, patterns, and key factors influencing revenue to make data-driven decisions for business growth.

data-analysis data-visualization python

Last synced: 24 Mar 2025

https://github.com/lmuffato/jiboia

Jiboia is a Python package for automatically normalizing and optimizing DataFrames efficiently.

data-analysis data-science dataframe normalization pandas python

Last synced: 19 May 2026

https://github.com/analysisbyvivek/road-accident

Analyzes road accident patterns, exploring factors like lighting, weather, speed limits, time of day, and road conditions to uncover trends in severity and frequency.

data-analysis data-visualization eda jupyter-notebook kaggle tableau-public

Last synced: 19 Jun 2026

https://github.com/al-ogr/sf_pr2_job_analysis_hh_sql

SkillFactory DataScience PROJECT-2. Analysis of job vacancies from HeadHunter.

data-analysis data-science ipynb plotly python sql

Last synced: 19 May 2026

https://github.com/noturlee/iris-dataanalyis

This project aims to classify Iris flowers into three species (setosa, versicolor, and virginica) based on their sepal and petal measurements using machine learning techniques. The dataset comprises 150 samples evenly distributed among these species.

data-analysis data-modeling data-science data-structures-and-algorithms data-visualization

Last synced: 08 Apr 2025

https://github.com/sweta-kaundilya/sql_projects_data_analytics

This repository contains SQL portfolio projects.

data-analysis mysql-database mysql-workbench

Last synced: 10 Sep 2025

https://github.com/samaalharbi2/project-data-science-blog-post

A data science project from Udacity's Nanodegree that explores what drives developer success.

crisp-dm data-analysis data-science data-visualization nanodegree udacity

Last synced: 26 Jan 2026

https://github.com/juliuspinsker/bioconductor-learning-container

🧬 Containerized development environment for Harvard's Professional Certificate in Data Analysis for Genomics (PH525.x series). Streamlined setup for Bioconductor, R, and genomic data analysis with RStudio and DevContainer support.

bioconductor bioinformatics chip-seq data-analysis data-science devcontainer dna-methylation docker edx functional-genomics genomics harvard harvardx ph525 ph525x r reproducible-research rna-seq rstudio single-cell-rna-seq

Last synced: 14 May 2026

https://github.com/codesaadumair/exploratory-data-analysis

A centralized repository showcasing various Exploratory Data Analysis (EDA) projects using Jupyter notebooks, visualizations, and accompanying documentation.

data-analysis data-science data-visualization eda jupyter-notebook jupyterlab python

Last synced: 24 Mar 2025

https://github.com/odessaz/portfolio-projects

A repository I created to showcase my skills, share projects, and track my progress in data analytics and data science.

applied-mathematics data-analysis data-science excel jupyter-notebook matplotlib-pyplot pandas portfolio python r r-studio seaborn sql statistics

Last synced: 12 Apr 2026

https://github.com/ayushbaid/football_stats

Analysing the competitiveness in different European football leagues

data-analysis football

Last synced: 03 Apr 2025

https://github.com/sweta-kaundilya/911-calls-capstone-project

This capstone project analyzes 911 call data from Kaggle.

data data-analysis data-visualization jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 28 Apr 2026

https://github.com/adrianlardies/from-data-to-insight

This project creates and manages a MySQL database to analyze the performance of Bitcoin, Gold, and the S&P 500 in response to economic factors. It integrates historical data, executes advanced SQL queries, and visualizes key insights, showcasing the power of SQL and Python in financial analysis.

data-analysis data-science matplotlib pandas python seaborn sql

Last synced: 12 Apr 2026

https://github.com/fatihilhan42/hollywood-theatrical-market-synopsis-1995-to-2021

This project examines data on Hollywood film production companies from 1995 to 2021. Tables and graphs were created using data visualization techniques, with the tickets sold divided into categories.

data data-analysis data-science data-visualization

Last synced: 23 Mar 2025

https://github.com/ujjwalll/econometrics_analysis_of_india_gdp_misestimation

An econometric analysis of India's GDP to determine whether there is any flaw in India's GDP estimates, as claimed by Dr. Arvind Subramanian.

coefficient-estimates data-analysis econometrics economics gdp india r statistics

Last synced: 31 Oct 2025

https://github.com/chaganti-reddy/ai-prototype-customer-segmentation

A product-based artificial intelligence prototype model for customer segmentation in the e-commerce industry.

artificial-intelligence cluster-analysis customer-segmentation data-analysis machine-learning product-based prototype

Last synced: 13 Mar 2025

https://github.com/prasad-chavan1/bank_data_analysis_r

Bank data analysis in R language

data data-analysis data-science r

Last synced: 24 Feb 2025

https://github.com/1401dev/iowa-liquor-retail-sales-analysis

This repository contains the analysis of Iowa liquor retail sales data, aimed at uncovering sales trends and forecasting future sales patterns. The project involves data cleaning, preparation, and advanced time series analysis using Microsoft SQL Server and Google Colab.

customer-behavior data-analysis data-cleaning data-science data-visualization exploratory-data-analysis forecasting google-colab machine-learning microsoft-sql-server pandas prophet python retail-analytics retail-sales sales-forecasting sales-performance sql statsmodels time-series-analysis

Last synced: 16 Feb 2026

https://github.com/lucashomuniz/Project-05

Statistical Analysis of Hospitalization Costs: Leveraging SQL and R for Insights

anova-analysis anova-test data-analysis finance-analysis-data language-r linear-regression sql stastistical-model statistical-analysis

Last synced: 20 Oct 2025

https://github.com/carvalhoandre/coletor-tweets

Created to collect and store tweets using the Twitter API. Originally inspired by the use case in the book Um Voluntário na Campanha de Obama (A Volunteer in the Obama Campaign), this project aims to demonstrate the importance of monitoring on X. The collector can search for tweets on any desired term.

data-analysis mongodb python twiter-analysis twitter

Last synced: 19 May 2026

https://github.com/leosimoes/nexoseducacao-imersao-powerbi

Activities completed during the PowerBI Immersion (Imersão PowerBI) by Nexos Educação with Karine Lago and Leticia Smirelli in September 2023.

business-intelligence dashboards data-analysis microsoft-power-bi

Last synced: 06 Jan 2026

https://github.com/josephbarbierdarnal/matoolkit

matoolkit is a Python package providing a toolbox for creating visually appealing graphs and annotations in Matplotlib.

data-analysis data-visualization matplotlib

Last synced: 31 Mar 2025

https://github.com/adarshpheonix2810/fake-job-post-detection

This project focuses on detecting fake job posts using machine learning. Fake job advertisements are often created to scam individuals by stealing personal information or money.

data-analysis deep-learning joblib machine-learning nlp-machine-learning numpy pandas python scikit-learn tkinter

Last synced: 12 Apr 2026

https://github.com/mumtaz4118/nlp-course

Programming Assignments and Lectures for Stanford's CS 224: Natural Language Processing with Deep Learning

course data data-analysis data-analytics data-science data-visualization deep-learning education machine-learning natural-language-processing neural-network transfer-learning

Last synced: 24 Nov 2025

https://github.com/mindlessmuse666/train-test-splitter

Analysis of Titanic passenger data and its split into training and test sets. A practical assignment for the course "Fundamentals of Applying Artificial Intelligence Methods in Programming".

data-analysis data-preprocessing data-visualization machine-learning pandas python scikit-learn seaborn titanic train-test-split

Last synced: 12 Apr 2026
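
Splitting a dataset into training and test sets, as this entry does with the Titanic data, amounts to shuffling the rows and holding out a fraction of them. The following is a minimal standard-library sketch, not the repository's implementation. The 20% test fraction, the fixed seed, and the name `split_train_test` are assumptions.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists.

    Hypothetical helper; test_fraction and seed defaults are illustrative.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded RNG for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

scikit-learn's `train_test_split` appears in the entry's topics. It covers the same ground with extra options, such as stratified splitting through its `stratify` parameter.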

https://github.com/andersoncrs/prediccion-del-precio-de-vehiculos-un-enfoque-con-regresion-lineal-y-regularizacion

This project aims to predict the prices of used vehicles using linear regression and Lasso regularization techniques. Through data analysis and processing, it builds an accurate and interpretable predictive model based on the most relevant features of each vehicle.

data-analysis data-exploration lasso-regression machine-learning polinomial-regression regularization-methods

Last synced: 03 Jul 2025

https://github.com/shrunga92/restaurant_order_analysis_sql

This project is a structured SQL-based analysis of restaurant orders, aimed at deriving key insights from transactional data.

data-analysis sql

Last synced: 03 Jul 2025

https://github.com/krzysikd/apartment-prices-in-poland-analysis-and-visualization

Data Analyst portfolio project that involves cleaning, transforming, and visualizing data to create an insightful dashboard. The project uses SSIS for ETL processes, SSMS for database management and queries, and Power BI for data visualization, focusing on the analysis of rental and sales apartment prices in Poland.

data-analysis data-cleaning data-visualizations powerbi sql sqlserver ssis

Last synced: 04 Feb 2026

https://github.com/2013xile/sheethub

Organize, import, export, and concatenate spreadsheet files in a web application.

data-analysis data-wrangler excel sheets

Last synced: 08 Apr 2025

https://github.com/rauhanahmed/auto-data-analyzer

AutoDataAnalyzer: Automate data ingestion, analysis, and visualization with AI/ML-powered pipelines. Features natural language query processing, interactive Plotly visualizations, and seamless deployment via Docker.

ai-powered-analysis automated-pipeline cicd data-analysis data-visualization docker end-to-end-project flask generative-ai langchain llama3-1 machine-learning natural-language-processing plotly python3 pywebio

Last synced: 12 Apr 2026

https://github.com/jabulente/kruskall-wallis-test

This repository contains a project that provides a reusable Python function to perform the Kruskal-Wallis H-test across multiple continuous variables, grouped by a categorical feature.

data-analysis data-science eda hypothesis-tests kruskal-wallis kruskals-algorithm scipy-stats statistics

Last synced: 22 Jul 2025
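
The Kruskal-Wallis H-test compares the rank distributions of a continuous variable across the groups of a categorical feature. The statistic is H = 12 / (N(N+1)) · Σ R_i² / n_i - 3(N+1), where R_i is the rank sum of group i and n_i is the size of that group. H is then divided by a correction for tied values.

The pure-Python sketch below applies the test to several columns at once, in the spirit of the function the entry describes. It computes only the statistic, not the p-value. All names (`rank_with_ties`, `kruskal_wallis_h`, `kruskal_by_group`) and the list-of-dicts input format are hypothetical.

```python
from collections import defaultdict

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = rank_with_ties(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups
    counts = defaultdict(int)
    for x in pooled:
        counts[x] += 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))

def kruskal_by_group(rows, group_col, value_cols):
    """Hypothetical wrapper: run the H-test for each value column, grouped by group_col."""
    results = {}
    for col in value_cols:
        buckets = defaultdict(list)
        for row in rows:
            buckets[row[group_col]].append(row[col])
        results[col] = kruskal_wallis_h(list(buckets.values()))
    return results
```

For a p-value, H is commonly compared against a chi-squared distribution with k - 1 degrees of freedom, where k is the number of groups. `scipy.stats.kruskal` returns both the statistic and the p-value. This sketch divides by zero when every pooled value is identical, a case a real implementation should guard against.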

https://github.com/3rd-son/movie-streaming-service-analysis

Exploratory data analysis of streaming services such as Netflix, Hulu, and Disney+.

data-analysis exploratory-data-analysis jupyter-notebook matplotlib numpy pandas plotly python seaborn

Last synced: 18 Apr 2026

https://github.com/madi-s/tennispredictor

Program to predict outcomes of major tennis matches.

data-analysis prediction-algorithm python scraper tennis webdriver

Last synced: 06 Jul 2025

https://github.com/jatin-s16/digital-marketing

This repository contains raw data for Marketing analysis along with key business questions. I performed data cleaning using Python and its libraries and extracted meaningful insights. The results were then visualised using Tableau to enhance business understanding.

data-analysis data-science python3 tableau

Last synced: 16 Mar 2025

https://github.com/prgermux/defect-finder

Defect Finder is an interactive Python-based GUI application for detecting and analyzing mechanical and non-mechanical defects in data. It provides defect visualization, periodicity analysis, and statistical insights, making it ideal for research and quality control workflows.

data-analysis defect-detection gui pyqt5 python quality-control statistics visualization

Last synced: 24 Mar 2025

https://github.com/dzakwanalifi/stadata-x

A terminal UI for interactively exploring and downloading data from BPS Indonesia (Statistics Indonesia).

bps-api cli-app data-analysis data-visualization indonesia-statistics indonesian-data open-data python statistics terminal-ui textual tui

Last synced: 20 Jan 2026

https://github.com/srvcl/lung-cancer-survival-analysis

Data cleaning of a dataset and survival analysis in R.

data-analysis data-science data-visualization r survival-analysis

Last synced: 11 May 2026

https://github.com/iamsainikhil/data-visualization

Visualization of Web data using Python

data-analysis data-visualization python webscraping

Last synced: 13 Jun 2026

https://github.com/leosimoes/datascienceacademy-python

Activities from the DataScienceAcademy course "Fundamentos de Linguagem Python Para Análise de Dados e Data Science (Com ChatGPT)" (Python Language Fundamentals for Data Analysis and Data Science, with ChatGPT).

chatgpt data-analysis data-science python

Last synced: 02 May 2026

https://github.com/leosimoes/digitalinnovationone-analise-datasets

Hands-on project "Análise de dados com Python e Pandas" (Data Analysis with Python and Pandas) from the "Banco Carrefour Data Engineer" bootcamp by Digital Innovation One.

data-analysis data-science python

Last synced: 24 Mar 2025

https://github.com/kathkoeh/pimaindian-kk

Logistic regression analysis of diabetes risk using the Pima Indians dataset. Includes prevalence analysis, modeling, ROC/AUC evaluation, and patient testing in Python.

data-analysis diabetes epidemiology logistic-regression machine-learning public-health python

Last synced: 28 Apr 2026
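
The ROC/AUC evaluation listed for this logistic regression project can be computed without drawing the curve. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting half. This minimal pure-Python sketch is not the repository's code. It assumes binary labels (1 = diabetic, 0 = not) and predicted probabilities as scores, and `roc_auc` is a hypothetical name.

```python
def roc_auc(labels, scores):
    """AUC as P(random positive outscores random negative), ties counting 0.5.

    Hypothetical helper; labels are 0/1, scores are model outputs (e.g. probabilities).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive score with every negative score (O(P*N), for clarity)
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is written for clarity rather than speed. scikit-learn's `roc_auc_score` computes the same quantity more efficiently.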

https://github.com/abishekaditya/machinelearningintro

Simple experiments with pandas and SciPy.

data-analysis ipython machine-learning pandas python scipy

Last synced: 12 Apr 2026

https://github.com/ot-code/sql-sabor-y-tradicion

A SQL-driven project that integrates menu and order data to reveal insights on dish performance, customer preferences, and spending trends. It informs pricing strategies, menu adjustments, and targeted promotions, ultimately enhancing the overall customer experience and driving business growth.

analytical-queries data data-aggregation data-analysis database-design join-queries mysql order-analytics relational-databases restaurant-data sql sql-script

Last synced: 08 Apr 2025

https://github.com/hadeel-13/new_home

New Home is a website for buying and selling real estate based on user preferences. It is my graduation project, which received a grade of 93%.

bootstrap5 chartjs css css3 data-analysis data-mining google-maps html html5 javascript jquery

Last synced: 12 Apr 2026

https://github.com/soyuid/bakery-data-analyst

This Bakery Data Analysis project was created to help bakery owners understand their sales patterns. With in-depth data analysis, it is expected to provide useful insights to improve sales and operational strategies.

bakery data-analysis python sales visualization

Last synced: 24 Mar 2025

https://github.com/galahad20/b244006e_analisis_data

Data analysis project from the Dicoding course "Belajar Analisis Data dengan Python" (Learn Data Analysis with Python). I learned to analyze data and visualize it to gain meaningful insights.

data-analysis data-analytics python streamlit

Last synced: 06 Apr 2026

https://github.com/vikpires/ds_tips-dataset

Individual project for the Avanti 2024.2 data science bootcamp, aimed at analyzing and observing patterns in the "Tips" dataset.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas python seaborn tips

Last synced: 17 Sep 2025

https://github.com/mysftz/statistics-analysis

A statistical and probability analysis of a dataset in Python.

data-analysis matplotlib python python3 statistical-analysis

Last synced: 29 Jun 2025

https://github.com/wardenkenny/data-analyst-portfolio

A repository I created to showcase and explore data analytics.

data-analysis excel r spreadsheets sql tableau

Last synced: 02 Apr 2025

https://github.com/datastalker/survival-cox

This repository contains an R script for performing survival analysis on breast cancer surgery data from the University of Chicago's Billings Hospital. The analysis includes Kaplan-Meier estimation and Cox Proportional Hazards modeling to assess patient survival.

breast-cancer-prediction cox-model data-analysis data-science data-visualization epidemiology kaplan-meier r survival-analysis

Last synced: 02 Apr 2025
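
Kaplan-Meier estimation, which this entry names, multiplies together the conditional survival probabilities 1 - d_i / n_i at each event time. Here d_i is the number of deaths at that time and n_i is the number of patients still at risk. The sketch below is written in Python for illustration even though the repository uses R. The function name and the input format are assumptions: parallel lists of follow-up times and event flags, with 1 for an observed death and 0 for censoring.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    Hypothetical helper: times are follow-up durations, events are 1 (death) or 0 (censored).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve
```

Patients censored at an event time are still counted in that time's risk set, which is the usual convention. The Cox Proportional Hazards model mentioned alongside it needs iterative fitting and is not sketched here.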

https://github.com/eubrunoo/beer-consumption-predictor

An R project analyzing the impact of environmental factors on beer consumption in São Paulo, with a predictive linear regression model.

data-analysis data-science data-visualization machine-learning r statistical-analysis statistics

Last synced: 02 Apr 2025

https://github.com/jabulente/tanzania-geographical-zones

This project provides a geospatial visualization of Tanzania's geographical zones and regions. It uses geospatial data to map each zone, display regions, and annotate them for easy identification. The visualizations include simulated data to demonstrate thematic mapping techniques.

ai data-analysis data-science data-visualization geopandas geospatial location matplotlib ml python tanzania tanzania-geographic tanzania-locations

Last synced: 19 May 2026

https://github.com/hari7261/data-visualization

Python-based application built using CustomTkinter for the graphical user interface (GUI) and Matplotlib for data visualization. It allows users to import datasets, perform real-time data visualization, and analyze data using various chart types and machine learning techniques.

data-analysis data-visualization export hari7261 import python realtime-visualization

Last synced: 17 Jun 2025

https://github.com/mimi-netizen/python-and-machine-learning-in-financial-analysis

This comprehensive repository covers financial data analysis using Python and machine learning techniques, including time series modeling, portfolio optimization, risk assessment, credit risk prediction, and deep learning applications in finance.

data-analysis data-science data-visualization finance financial-analysis financial-data financial-modeling

Last synced: 19 May 2026

https://github.com/gaboelc/analysis-of-the-employment-situation-in-costa-rica-2018-2022

An analysis of data from INEC (Costa Rica's National Institute of Statistics and Censuses) to identify the changes in the Costa Rican labor market before, during, and after the COVID-19 pandemic.

costa-rica data-analysis empleo employment

Last synced: 24 Mar 2025

https://github.com/shreeparab1890/chat-analyzer

A data analysis project for analyzing WhatsApp chats.

data-analysis numpy pandas python

Last synced: 12 Apr 2026

https://github.com/eco786786/salaries

This analysis explores the factors influencing salaries for data professionals from 2020 to 2024, including job titles, experience levels, remote work ratios, employment types, company locations and sizes. Using data from Kaggle, the project uncovers trends and insights to guide both companies and professionals in the tech industry.

data-analysis git postgresql powerbi

Last synced: 19 May 2026

https://github.com/m4tice/qm_project

Bicycle project crowd evaluation.

data-analysis data-engineering data-visualization

Last synced: 16 Mar 2025

https://github.com/alan-oliveir/state-of-data-2022

In this project, I analyze the distribution of salary ranges for junior-level data analysts, data scientists, and data engineers.

data-analysis jupyter-notebook pandas-python seaborn-python

Last synced: 03 Oct 2025

https://github.com/parthds02/pizza_sales_sql

SQL project analyzing pizza sales data. Includes creating tables, executing queries, and solving basic to advanced analytical questions to derive insights from sales data.

analytics data-analysis data-science pizza-sales sql sql-query

Last synced: 04 Mar 2026

https://github.com/zenithclown/finfolio

A personal finance management tool for developers, by a developer.

data-analysis data-science finance finance-application finance-management good-habits personal-finance portfolio

Last synced: 04 Feb 2026

https://github.com/treasarose/us_candy_distribution_analysis_project

This project focuses on advanced data analysis and optimization using SQL. It includes queries for analyzing sales, product margins, and shipping efficiency for a US candy distributor.

data-analysis entity-relationship mssql optimization query sql-server sqlproject us-candy-distributor

Last synced: 12 Oct 2025

https://github.com/ashwin331133/hospital_allpatients_waitinglist_data

This Power BI project analyzes patient waiting lists across various medical specialties and case types (Day Case, Inpatient, Outpatient). The goal is to gain insights to improve healthcare management and resource allocation.

data-analysis data-visualization powerbi

Last synced: 03 Jan 2026