An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/sakan811/stress-pattern-occurrence-in-english-words

This project is intended to provide English learners with data that allows them to make a data-driven guess when encountering words that they aren't sure where to stress

data-analysis data-visualization english english-language english-learning language powerbi powerbi-report powerbi-visuals

Last synced: 20 Jun 2026

https://github.com/aonurakman/data-analysis-and-ml-algorithms

An exploration of data analysis techniques and standard ML algorithms on QSAR oral toxicity dataset. - 2021 - Yıldız Technical University

classification clustering data-analysis data-mining isolation-forest python regression

Last synced: 20 Jun 2026

https://github.com/evanmathew/northwind-traders

SQL-powered analysis of sales, employee performance, and customer behavior using PostgreSQL window functions. This project uncovers key business insights to optimize decision-making.

case-study data-analysis jupyter-notebook northwind-traders postgresql python-postgresql sql

Last synced: 20 Jun 2026

https://github.com/haseebn19/urban-housing-demand

A full-stack web application for visualizing housing and labour market data

data-analysis data-visualization docker full-stack gradle statistics web webapp

Last synced: 22 Jun 2026

https://github.com/engusseus/warframe-market-set-profit-analyzer

Python tool that analyzes Warframe Market data to find profitable item sets to trade

api data-analysis python trading waframe

Last synced: 23 Jun 2026

https://github.com/emaleckova/emaleckova.github.io

My personal website created with Quarto

biology data-analysis data-viz quarto r

Last synced: 23 Jun 2026

https://github.com/ladaegorova18/data_analysis

Learning the basics of data analysis in Python

analytics data-analysis data-visualization steam-games

Last synced: 24 Jun 2026

https://github.com/parsabordbar/ctx3docs

The Documentation for context Tree Project.

ai-tools context ctx3 ctx3-docs data-analysis documentation tree workflow

Last synced: 25 Jun 2026

https://github.com/chdre/data-analyzer

A small package to analyze and preprocess data.

data-analysis python

Last synced: 28 Jun 2026

https://github.com/imgabreuw/minicurso-python-para-financas

Mini curso de Python para finanças, disponibilizado por Varos.

data-analysis financial-analysis python

Last synced: 29 Jun 2026

https://github.com/mikkelrask/henryrollins-scraper

FANATIC! A dataset of Henry Rollins' listens on his KRCW radio show, with data dating back to 2017 - 496 episodes of weird and rare finds, fast paced punk and frog sounds. Includes a scraper that keeps the data up-to-date with henryrollins.com

archive data-analysis data-visualization music

Last synced: 29 Jun 2026

https://github.com/poglolopez/prueba_tecnica_inlaze

Este repositorio muestra mis habilidades en análisis de datos a través de una prueba técnica para Inlaze. Incluye flujos de trabajo con Python, SQLite y Power BI para analizar el comportamiento de jugadores, depósitos y rendimiento de fuentes de tráfico, destacando eficiencia operativa e información estratégica.

data-analysis data-v etl jupyter powerbi python sqlite

Last synced: 26 Feb 2025

https://github.com/mysftz/numerical-methods-in-matlab

Multiple MatLab scripts over multiple data analysis assignments.

data-analysis data-science matlab university university-assignment

Last synced: 14 May 2025

https://github.com/nmelgar/birthday_sports_dataviz

We will analyze how the Matthew Effect has influenced in professional sports players.

analysis csv data data-analysis data-science data-visualization datavisualization dataviz probability research tableau

Last synced: 08 Jan 2026

https://github.com/iness000/online-retail-customer-segmentation

This project performs comprehensive customer segmentation analysis on an online retail dataset using machine learning clustering techniques and RFM (Recency, Frequency, Monetary) analysis. The goal is to identify distinct customer segments to drive better customer relationship management strategies and business insights.

customer-segmentation data-analysis k-means

Last synced: 31 Aug 2025

https://github.com/pranjalya/hand-washing-data-visualisation

A small project of Data Visualization, where we analyze the effect of hand washing after introduced by Dr. Semmelweis to the nurses and midwives after giving birth.

data-analysis data-visualization jupyter-notebook pandas python3

Last synced: 06 May 2026

https://github.com/zimmi48/nixpkgs-issues

Analysis on nixpkgs issue lifetime.

data-analysis github-api nixpkgs

Last synced: 10 May 2026

https://github.com/virajbhutada/hr-analytics-excel-sql-tableau-powerbi

Explore a comprehensive HR Analytics portfolio showcasing data analysis and visualization skills. Featuring dashboards in Power BI, Excel, and Tableau, along with SQL queries for deeper insights. A holistic view of expertise in HR analytics, data visualization, and database management. Let's dive into the game of data insights!

data-analysis data-management data-visualization excel hr-analytics interactive-dashboards portfolio-project postgresql powerbi powerbi-visuals sql sql-queries tableau tableau-public

Last synced: 02 Aug 2025

https://github.com/virajbhutada/diamond-price-estimator

This project develops a predictive model to estimate diamond prices based on characteristics like carat, cut, color, and clarity. It covers data preprocessing, feature engineering, model selection, training, and evaluation. The final product is a web app where users can input diamond attributes to get accurate and instant price predictions.

cross-validation css data-analysis data-science-projects data-visualization eda feature-engineering html hyperparameter-tuning jupyter-notebooks machine-learning ml-algorithms model-deployment model-selection performance-optimization predictive-modeling python python-app user-interface

Last synced: 14 Apr 2026

https://github.com/satvikpraveen/pandasplayground

📊 A comprehensive pandas mastery project with 10 modular Jupyter notebooks covering data loading, cleaning, grouping, merging, time series, visualization, and performance profiling. Includes real-world workflows, Docker, Streamlit, and reusable utils. Ideal for data scientists and analysts to learn, practice, and refer. Practice-ready and modular.

analytics cheatsheet data-analysis data-cleaning data-pipeline data-science data-visualization docker etl exploratory-data-analysis jupyter-notebook jupyterlab learning-resource memory-profiling open-source pandas performance-tuning python streamlit time-series

Last synced: 10 Apr 2026

https://github.com/hi-jin2/data-analysis-basics

데이터분석기초(R) 수업 중에 작성한 소스코드 모음입니다. 『모두를 위한 R 데이터 분석 입문』 교재를 통해 R언어를 학습하였습니다.

data-analysis r r-studio

Last synced: 19 Jul 2025

https://github.com/barraharrison/spotify-listening-trends

Using EDA to look at song longevity, regional preferences, and streaming behavior in the charts and on Spotify.

data-analysis data-visualization jupyter-notebook kaggle-dataset

Last synced: 03 Feb 2026

https://github.com/farhad-here/median-performance-comparison

Benchmarking the performance of median calculation using vanilla Python vs NumPy.

data-analysis matplotlib numpy python

Last synced: 18 Apr 2026

https://github.com/karlyndiary/spotify-excel-dashboard

Data Analysis on the Spotify Dataset using Microsoft Excel and VBA.

charts data-analysis data-cleaning data-visualization excel excel-export excel-vba pivot-tables

Last synced: 04 Jan 2026

https://github.com/serlo/data-pipeline-interactive-exercises

processing pipeline for exercise dashboards

data-analysis serlo

Last synced: 26 Feb 2025

https://github.com/okwilkins/retailanalysis

A comprehensive exploratory analysis and implementation of kmeans/hierarchical clustering on online retail data.

data-analysis data-science machine-learning statistics

Last synced: 18 Oct 2025

https://github.com/azaz9026/data_cleaning

Welcome to the Data Cleaning repository! This collection is dedicated to showcasing techniques and methods for cleaning and preparing datasets for analysis.

data-analysis data-engineering data-structures data-visualization eda feature-engineering machine-learning numpy outliers pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/bibymaths/python_snippets

A collection of Python scripts for bioinformatics data analysis, including tools for transcription counts, nucleotide composition, and protein sequence evaluation.

amino-acid-scoring bioinformatics data-analysis fasta-generation mathematical-evaluation nucleotide-analysis protein-sequence-analysis transcription-counts

Last synced: 29 Jul 2025

https://github.com/ssoehdata/sql_for_data_science_specialization_course

Materials and Certifications from the SQL for DataScience Course

data-analysis data-science database databricks postgresql sql sqlite

Last synced: 10 Apr 2026

https://github.com/laudebugs/fec-data-analysis-2020

The project aimed to determine the total sum of contributions to the candidate committees as well as the number of contributions made by individuals.

data-analysis fec presidential-candidates

Last synced: 16 May 2026

https://github.com/itrauco/data-dirtying-tool

a simple command line tool to generate dirty data and do common data things in google cloud

data data-analysis data-engineering data-ops data-pipeline data-science data-visualization data-wrangling dirty-data google-cloud machine-learning

Last synced: 24 Feb 2025

https://github.com/dsrodrigovieira/favoritasales

Este repositório contém o projeto desenvolvido para o desafio do kaggle "Store Sales - Time Series Forecasting. Use machine learning to predict grocery sales"

data-analysis data-science kaggle-competition machine-learning python telegram-bot xgboost-regression

Last synced: 05 May 2026

https://github.com/shubham200137/cyclistic-case-study

This repository contains a case study for Google's Data Analytics Professional Certificate, focusing on Cyclistic, a fictional bike sharing company in Chicago. The case study aims to drive growth by converting casual riders into members through a marketing strategy.

data-analysis data-visualization numpy-python pandas-python presentation-slides sql tableau

Last synced: 11 Jun 2026

https://github.com/elakkiya-u/digital-marketing-campaign

A machine learning project to predict whether a customer will convert based on digital marketing campaign data.

campaigns data-analysis deployment digital-marketing machine-learning predictive-modeling python

Last synced: 30 Jun 2025

https://github.com/diem0n/100daysofdatascience

This repository is a collection of things i do on as a data scientist each day as i am hired at a fictional company called keko corp

data-analysis data-engineering data-science data-science-from-scratch data-warehousing machine-learning python

Last synced: 09 Apr 2026

https://github.com/tolumie/loan-approval-prediction

Loan Approval Prediction using Machine Learning | EDA + Decision Tree, Random Forest & Logistic Regression | Automating loan eligibility for Dream Housing Finance by analyzing customer data and predicting loan approvals.

classification credit-risk-analysis data-analysis decision-tree-classifier finance-analytics loan-approval logistic-regression-algorithm machine-learning predictive-modeling-techniques random-forest

Last synced: 30 Jun 2025

https://github.com/tashi-2004/apache-hadoop-spark-hive-cyberanalytics

This project utilizes Apache Hadoop, Hive, and PySpark to process and analyze the UNSW-NB15 dataset, enabling advanced query analysis, machine learning modeling, and visualization. The project demonstrates efficient data ingestion, processing, and predictive analytics for network security insights.

ai apache-hadoop apache-hive big-data-analytics big-data-processing data-analysis data-engineering data-science data-security data-visualization hdfs machine-learning network-analysis network-security pyspark python3 threat-detection unsw-nb15-dataset

Last synced: 02 May 2026

https://github.com/tenifayo/analysis-of-fordgobike-trip-data

Data Visualization using Ford GoBike Trip Data

data-analysis matplotlib pandas

Last synced: 11 Jul 2025

https://github.com/ved-coder-king/wheat_ai_project

This project, Smart Wheat Farming AI System, was developed as part of the coursework for the Artificial Intelligence program at Esprit School of Engineering.

agriculture data-analysis data-visualization deep-learning image-classification machine-learning object-detection python wheat

Last synced: 15 Apr 2025

https://github.com/bala-1409/power-bi-visualization-project

This repository contains Visualization Projects which is visualized through Power BI Software, by using the visualization we can gain multiple insights and strategies which helps to develop the business for gaining high profit margins and by the insights we can reduce the damages by accidents & calamities.

dashboard data-analysis data-science data-visualization exploratory-data-analysis microsoft-excel microsoft-power-bi microsoft-powerpoint power-bi powerbi powerbi-reports powerbi-visuals visualization

Last synced: 04 Jan 2026

https://github.com/emcramer/clockplot

Plotting utility for a "clockplot" that puts groups into a time-ordered heterogeneity visualization

biology data-analysis data-visualization heterogeneity pseudotemporal-ordering

Last synced: 10 Mar 2026

https://github.com/regmibijay/opencarp-analyzer

Reads Trace Files created by OpenCARP Models and exports data for easy plotting with inbuilt plotter script.

bioinformatics data-analysis opencarp

Last synced: 16 Jan 2026

https://github.com/felinjob/ibm-applied-data-science-capstone

Este projeto, parte da especialização IBM Data Science Professional Certificate, prevê o sucesso do pouso do Falcon 9 da SpaceX. Usando dados da API da SpaceX e Web Scraping, o projeto inclui análise de dados e Machine Learning para gerar insights sobre os lançamentos.

data-analysis data-science data-visualization ibm jupyter-notebook machine-learning numpy pandas python scikit-learn seaborn sql

Last synced: 11 Apr 2026

https://github.com/27ahmad/netflix_sql_project

The Netflix SQL Project analyzes the Netflix dataset using SQL queries to gain insights into its content, identify trends, and address business problems related to movies and TV shows.

data-analysis postgresql-database sql

Last synced: 03 Feb 2026

https://github.com/27ahmad/ibm-data-science-capstone

The Capstone is the final course in the IBM Data Science Professional Certificate program. It's a project that combines all the skills and knowledge you've gained throughout the specialization.

data-analysis data-science folium-maps machine-learning plotly-dash python sql

Last synced: 26 May 2026

https://github.com/aksoni07/movie-recommendation

A hybrid movie recommendation system designed to deliver personalized and accurate suggestions by combining user preferences, item attributes, and collaborative patterns, ensuring a seamless and engaging experience.

clustering content-based-filtering data-analysis embeddings jupyter-notebook numpy ollaborative-filtering pandas personalization python recommendation-systems scikit-learn user-item-interactions

Last synced: 11 Apr 2026

https://github.com/mudassir-a/vendor-performance-analysis

vendor performance data analysis project using sql, python and power bi

data-analysis powerbi python sql

Last synced: 18 May 2026

https://github.com/badranalyst/student-tests-data-analysis-application

Python-based analysis of student test scores in math, reading, and writing, examining correlations with parental education, lunch type, and test preparation. Includes data cleaning, visualization, and statistical insights into factors influencing academic performance.

data-analysis data-visualization dataset matplotlib numpy pandas python sklearn

Last synced: 05 May 2026

https://github.com/ianfelps/jornada_python

Projetos realizados durante a Jornada Python da Hashtag Treinamentos em maio de 2024.

artificial-intelligence automation data-analysis python

Last synced: 28 Apr 2026

https://github.com/zulfachafidz/titanic_explorer_predicting_survival_with_classification_using_knn_algorithm

Tracking Life Safety with the KNN Predictive Analysis Approach. Leveraging the Titanic Dataset, we apply classification analysis to predict the fate of passengers based on a variety of features.

algorithm algorithms data data-analysis data-mining data-science datamodeling datapreprocessing dataset knn-algorithm knn-classification machine-learning machine-learning-algorithms prediction-model

Last synced: 01 Sep 2025

https://github.com/andersoncrs/arboles_de_decision_calidad_del_vino

Contiene un análisis detallado de la calidad del vino utilizando un modelo de clasificación basado en árboles de decisión. Incluye la exploración de datos, detección y manejo de valores atípicos, análisis Univariado y Bivariado, y la creación y evaluación de un modelo predictivo. El objetivo principal es predecir la calidad del vino.

data-analysis data-science data-visualization machine-learning matplotlib seaborn sklearn tree-decision

Last synced: 20 May 2026

https://github.com/jatin-s16/hr_mysql_powerbi

This repository contains raw HR data along with key business questions. I performed data cleaning using MySQL queries and wrote analytical queries to extract meaningful insights. The results were then visualised using Power BI to enhance business understanding.

data-analysis data-science data-visualization mysql powerbi

Last synced: 29 May 2026

https://github.com/haonamnguyen/costumer-shopping-trends-analysis

This project analyzes a synthetic dataset of customer shopping behavior to see key trends and insights. Using SQL and Tableau, the analysis focuses on customer demographics, purchase patterns, and preferences, including age distribution, payment methods, shipping types, and top product categories.

data-analysis data-visualization sql tableau

Last synced: 05 Jan 2026

https://github.com/arianarmw/da01-bike-sharing-analysis

🚴‍♀️ Data analysis project on bike-sharing systems. Includes data wrangling, exploratory data analysis (EDA), visualization, and interactive dashboards built with Streamlit. Explore patterns in bike usage and rental data!

bike-sharing-analysis data-analysis exploratory-data-analysis python streamlit visualization

Last synced: 11 Apr 2026

https://github.com/chanmeng666/douban-review-scraper

【One star = One happy developer doing a little dance 💃⭐️】A robust Python scraper for collecting and analyzing movie reviews from Douban.com, featuring comprehensive data processing and analysis capabilities.

beautifulsoup4 data-analysis data-processing douban movie-reviews pandas python sentiment-analysis text-mining web-scraping

Last synced: 02 May 2026

https://github.com/takshshah-16/spotify_eda

Spotify data analytics and advanced querying

data-analysis eda pgadmin4 postgresql

Last synced: 03 Jul 2026

https://github.com/vedantshi/coffee-sales-dashboard

This project analyzes coffee sales data using Excel, featuring data cleaning, trend analysis, and an interactive dashboard. Key insights highlight top-performing products, regional sales trends, and seasonal patterns. Recommendations focus on marketing strategies and inventory optimization. Future plans include Power BI integration for visuals.

business-insights data-analysis data-visualization excel-dashboard pivot-tables sales-trends

Last synced: 05 Jan 2026

https://github.com/benami171/ml_knn_decision-trees

A ml implementation comparing Decision Trees and k-Nearest Neighbors (k-NN) algorithms for Iris flower classification. Features comprehensive analysis of different approaches including brute-force and entropy-based decision trees, along with k-NN using multiple distance metrics.

classification cross-validation data-analysis decision-trees iris-dataset k-nearest-neighbours machine-learning nearest-neighbors python

Last synced: 30 Jun 2025

https://github.com/mahmoudwal27/powerbi-projects-for-data-analysis

This project leverages Power BI for data visualization, DAX for custom calculations, and integrates SQL and Excel for data preprocessing, analysis, and reporting, enabling dynamic and interactive insights.

data-analysis data-analysis-project data-analytics-project project

Last synced: 07 Mar 2026

https://github.com/andrii-zapukhlyi/otomoto_visualization

Scraping, data visualization, and building a price prediction model with data from the car classifieds website otomoto.pl

data-analysis machine-learning r scraping statistics visualization

Last synced: 26 Jul 2025

https://github.com/sun-lab-nbb/sl-shared-assets

A Python library that stores assets shared between multiple Sun (NeuroAI) lab data acquisition and processing repositories.

data-analysis data-collection data-processing experiment sunlab

Last synced: 10 Mar 2026

https://github.com/vidyadnina/cyclistic-sql-tableau-project

Trip data analysis for a bike-sharing service company using SQL and Tableau.

bigquery dashboard data-analysis data-analytics-sql data-cleaning data-visualization sql

Last synced: 02 Jan 2026

https://github.com/lvmalware/lsm-module

A simple statistics module, which provides 4 basic types of regressions using the Least Squares Method (LSM)

data-analysis least-square-regression regression regression-analysis statistics

Last synced: 31 Mar 2025

https://github.com/hanzopgp/lolanalysis

League Of Legends game data engineering, analysis, visualization and machine learning. Business intelligence project.

data-analysis data-cleaning data-engineering data-visualization dataiku deep-learning etl machine-learning scraping university

Last synced: 27 May 2026

https://github.com/masum184e/exploratory_data_analysis_projects

This space to showcase my journey in exploring various datasets, uncovering patterns, and extracting meaningful insights. Each project highlights different aspects of EDA, demonstrating techniques and tools that are essential for making sense of data.

data-analysis data-analysis-projects data-science data-science-projects eda eda-projects exploratory-data-analysis exploratory-data-analysis-projects

Last synced: 31 Mar 2025

https://github.com/matheusafonseca/c111

Este repositório é dedicado ao armazenamento e organização dos códigos desenvolvidos na disciplina C111 - Análise de Dados, oferecida pelo Instituto Nacional de Telecomunicações (INATEL).

data-analysis matplotlib numpy pandas python

Last synced: 06 May 2026

https://github.com/bhavinpatel4199/artificial-intelligence---ai-for-decision-making

Artificial Intelligence for Decision Making is a collection of projects focused on applying AI and machine learning techniques to solve decision-making challenges. It includes projects on wine quality prediction, Cassandra data modeling, and text classification, showcasing a range of data science and machine learning applications.

artificial-intelligence cassandra-cql data-analysis data-engineering data-preprocessing data-structures decision-making deep-learning feature-selection machine-learning-algorithms sentiment-analysis text-classification

Last synced: 20 Jun 2026

https://github.com/leosimoes/datascienceacademy-powerbi-clinicadebi

Atividades do curso Análise de Dados com Microsoft Power BI e Clínica de BI da Data Science Academy.

dashboards data-analysis data-visualization microsoft-power-bi power-bi

Last synced: 05 Jan 2026

https://github.com/tolumie/exploratory-data-analytics-projects

Exploratory Data Analytics – A collection of projects covering data exploration, feature engineering, hypothesis testing, and predictive modeling across diverse datasets, including insurance, real estate, laptops, cars, COVID-19, and the Olympics.

data-analysis data-visualization data-wrangling exploratory-data-analysis-eda feature-engineering hypothesis-testing machine-learning matplotlib numpy pandas predictive-modeling python seaborn statistical-analysis

Last synced: 11 Apr 2026

https://github.com/thenorthkun/movies-dataset-analysis

Analysis & categorizing of Movies based on Actors, Genres, Gross covered etc 🦸🏼🧜🏼‍♀️🎧

data-analysis data-visualization filtering

Last synced: 23 Mar 2025

https://github.com/anurag-kumar-molankala/sales-performance-dashboard

A Power BI dashboard that analyzes sales trends, product performance, customer segmentation, and payment distribution. It uses DAX, time intelligence, and interactive visuals for data-driven insights. The model includes Sales, Product, and Customer tables for in-depth analysis.

dashboards data-analysis data-visualization dax dax-functions dax-measures dax-query etl-process powerbi powerbi-visuals powerquery sql-query sql-server

Last synced: 03 Apr 2025

https://github.com/debjyotisaha/power-bi-projects-phase-1

Portfolio projects related to data visualisation in Power BI

data-analysis data-visualization dax-expression powerbi powerquery

Last synced: 18 Jan 2026

https://github.com/chitranjan806/predicting-on-time-premium-deposits

A Predictive analysis project to predict the success rate of On-Time deposits of Premiums by Policy Holders.

analytics-vidhya analytics-vidhya-competition catboostregressor data-analysis data-science linear-regression logistic-regression python3

Last synced: 16 May 2026

https://github.com/jbalooshie/movies-etl

Exercise working with movie datasets from Kaggle and Wikipedia. Python is used to extract, clean, and combine the data, and then it is loaded into a postgreSQL database.

data-analysis data-science jupyter-notebook numpy pandas postgresql postgresql-database python sqlalchemy

Last synced: 11 Apr 2026

https://github.com/shru924/ecommerce_customer_behavior_analysis

A machine learning project that analyzes and segments e-commerce customers based on behavior patterns using Python, Random Forest, and data visualization.

customer-segmentation data-analysis jupyter-notebook machine-learning matplotlib pandas python scikit-learn

Last synced: 11 Apr 2026

https://github.com/geoninja/reddit_data_analysis

Data analysis application presented at the 2016 NTC (Non-profit Technology Conference) in San Jose, CA.

data-analysis python reddit-data-analysis text-analysis

Last synced: 03 May 2026

https://github.com/anderson-andre-p/exploratory-data-analysis.roller-coaster

This repository contains an exploratory data analysis (EDA) project focused on roller coasters. The project involved organizing, cleaning, and visualizing the data to gain insights into roller coasters' characteristics and performance.

data-analysis eda exploratory-data-analysis exploratory-data-visualizations notebook

Last synced: 15 Mar 2025

https://github.com/agrdatasci/climmob-analysis

Workflow for data analysis applied on ClimMob.net

citizen-science data-analysis workflow

Last synced: 24 Jun 2025