An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/deliprofesor/customerseg-customer-segmentation-and-shopping-analysis

This project performs data exploration, segmentation, and modeling of wholesale customer data using clustering algorithms, PCA, and decision trees to analyze purchasing behavior and predict customer channel preferences.

clustering customer-segmentation data-analysis data-visualization dbscan decision-tree gmm kmeans machine-learning pca

Last synced: 24 Jun 2025

https://github.com/scailfin/rob-webapi-flask

Default RESTful Web API implementation for the Reproducible Open Benchmarks for Data Analysis Platform (ROB) using the Flask web framework.

benchmarks data-analysis reproducibility webapi

Last synced: 17 Mar 2026

https://github.com/iwasakiyuuki/data-analysis-platform-airflow-dag

A collection of Airflow DAGs for automating data collection into our on-premises data analysis platform.

airflow airflow-dags data-analysis data-collection

Last synced: 13 May 2025

https://github.com/steviecurran/prediction-plot

Code to performs machine learning (k-nearest neighbours regression) and plot the predicted versus measured values

astrophysics c data-analysis high-redshift machine-learning pgplot python statistics tensorflow visualization

Last synced: 20 May 2026

https://github.com/ahnaf19/clean_bankingdata

Here I tried to practice simple ETL tasks. I know how to perform these tasks in SQL, here just explored my way around using pandas as well.

data-analysis data-cleaning pandas python

Last synced: 19 Apr 2026

https://github.com/lunarwhite/lake-george-viz

Geroge Lake data analysis and visualization, ANU COMP1730/6730

data-analysis python

Last synced: 01 Nov 2025

https://github.com/nemat-al/multivariate_data_analysis

Tasks for Multivariate Data Analysis Course @ ITMO University

data-analysis multivariate-analysis python

Last synced: 20 May 2026

https://github.com/silasberger/charts-analysis

Data set collection, preprocessing and analysis of singles- and album charts

charts data-analysis data-mining data-science dataset music

Last synced: 14 Sep 2025

https://github.com/ranxi2001/predicting-mental-health-risk

数据分析案例-精神健康预测(数据来源kaggle)

data-analysis data-visualization eda

Last synced: 27 Jun 2025

https://github.com/fer-aguirre/cookiecutter-data-analysis-extensive

A cookiecutter template for data analysis projects using Python.

cookiecutter data-analysis project-template python

Last synced: 09 Apr 2025

https://github.com/samruddhi3012/rfm-analysis

Hi there! In this project I have performed Sales Analysis (RFM Analysis) using SQL and Tableau.

data-analysis data-visualization mssqlserver rfm-analysis segmentation tableau

Last synced: 27 Jun 2025

https://github.com/vetrivel07/flight-price-prediction

Developed a flight price prediction model using Python, analyzing historical data to forecast airfare prices and help travelers make informed booking decisions

data-analysis data-visualization jupyter-notebook numpy pandas python

Last synced: 15 Jun 2025

https://github.com/buildwithlal/introduction-to-data-science-in-python-coursera

introduction to data science in python, part of Applied Data Science using Python Specialization from University of Michigan offered by Coursera

data-analysis matplotlib numpy pandas

Last synced: 03 May 2026

https://github.com/cassiofb-dev/fide-rating-analysis

The plot speaks for itself

chess data-analysis fide hans rating

Last synced: 15 Jun 2025

https://github.com/245839/automobile-analysis

Analysis of data on imported cars to the USA performed in Python using libraries for data analysis in the Jupyter environment.

data-analysis jupyter-notebook python

Last synced: 20 May 2026

https://github.com/vlad1343/data-visualisation

Python project showcasing interactive and static visualizations using Plotly and Matplotlib. It includes analysis of CSV, JSON, and API data, turning complex datasets into clear, insightful charts.

anova api csv-files data-analysis data-visualization json matplotlib matplotlib-pyplot pandas pandas-python plotly python3 seaborn seaborn-python

Last synced: 08 Apr 2026

https://github.com/faizantkhan/python_matplotlib

Matplotlib is a powerful Python library for creating visualizations and plots. It’s widely used for data representation, making complex information more accessible and interpretable. It offers various types of plots, including line graphs, scatter plots, bar charts, histograms, and more

data-analysis data-analytics data-engineering data-science data-visualization deep-learning graphs line machine-learning machine-learning-algorithms matplotlib matplotlib-pyplot matplotlib-python python

Last synced: 20 May 2026

https://github.com/vzamboulingame/data-portfolio

This repository showcases my projects in Python and SQL, highlighting my skills in data analysis & visualization.

data-analysis data-portfolio data-science data-science-portfolio data-science-projects data-visualization jupyter-notebook portfolio python sql

Last synced: 20 May 2026

https://github.com/farhad-here/tegenx

TeGenX: Multilingual Text Generation App.TeGenX is a lightweight, interactive text generation application built with Streamlit. It leverages multiple pre-trained transformer models to generate text in both English and Persian.

data-analysis data-science deep-learning happytransformer huggingface nlp python stream text-generation text-generator textgeneration transformer web-application

Last synced: 25 Jan 2026

https://github.com/gabrielramirezv/rnaseq_2025_notas

Repository for RNA-seq class from the Undergraduate Program in Genomic Sciences.

data-analysis r rna

Last synced: 29 Mar 2025

https://github.com/samruddhi3012/health-care-analytics

Hi! This repo involves analyzing the Healthcare analytics using Advanced Microsoft Excel.

dashboard data-analysis data-visualization healthcare microsoft-excel pivot-chart pivot-tables vlookup

Last synced: 29 Mar 2025

https://github.com/saravanansuriya/energy-consumption-analysis

Project will analyze energy usage and greenhouse gas (GHG) emissions of Ontario's Broader Public Sector (BPS) organizations, leveraging a comprehensive database of reported data in Power Bi

data-analysis data-cleaning powerbi python-script

Last synced: 22 Mar 2025

https://github.com/kineticloom/plydb-fun-nfl-analyst

Analyze NFL data with your AI agent

data-analysis football-analytics nfl

Last synced: 15 May 2026

https://github.com/srummanf/elnino-anomaly-study

Study on El Niño’s impact on Chennai groundwater sustainability

data-analysis machine-learning python satellite-imagery-analysis

Last synced: 15 May 2026

https://github.com/s-narasimman/zepto_inventory_sql_data_analysis

This project focuses on data cleaning, exploration, and analysis of product information from the Zepto dataset using SQL. It provides actionable insights into pricing, stock availability, discounts, and category-level performance.

aggregation categorization csv data-analysis data-cleaning kaggle postgresql sql zepto

Last synced: 16 May 2026

https://github.com/deborangueira/campeonado_kaggle_2025

Desenvolvimento de um modelo de machine learning para prever o sucesso de startups. O objetivo é identificar quais empresas têm maior probabilidade de se tornarem casos de sucesso no mercado.

computacao data-analysis desafio kaggle modulo3 ponderada

Last synced: 16 May 2026

https://github.com/pabi1234810/data_analysis_zepto

A comprehensive SQL-based business intelligence solution for analyzing grocery store product data, inventory management, and pricing strategies. This project demonstrates end-to-end data analysis workflow from raw data exploration to actionable business insights.

analytics csv data-analysis data-science database excel kaggle kaggle-dataset mathematics pgadmin4 sql utf-8 zepto

Last synced: 01 Nov 2025

https://github.com/alinababer/data-science-and-insight-agent-rag-llama3-lava-llm

Data-Science-and-Insight-Agent-RAG-LLama3-Lava-LLM-Django-WebApplication is an advanced AI-driven chatbot designed to assist in data science, document analysis, and image interpretation. This repository contain the Datascience Agent of this project.

artificial-neural-networks classifcation data-analysis data-engineering data-visualization datascience large-language-models llama2 lstm machine-learning python random-forest regression

Last synced: 01 Jan 2026

https://github.com/an0n1mity/spamclassifiereval

A repository for evaluating the misclassification rate of spam classification models using a threshold-based approach.

data-analysis machine-learning natural-language-processing python-programming spam-classification text-classification

Last synced: 02 Nov 2025

https://github.com/jakobzmrzlikar/pca-on-genomes

An analysis of human genome mutations from different populations.

data-analysis genome-analysis pca-analysis

Last synced: 16 May 2025

https://github.com/edanur-y/abalone-age-prediction-with-regression-models

Comparing the performances of simple linear, multiple linear, multi-layer perceptron and k-nearest neighbors regressions on abalone data to predict the age.

data-analysis hyperparameter-tuning missing-values-analysis outlier-analysis python recursive-feature-elimination

Last synced: 20 May 2026

https://github.com/ksharma67/eda-on-ipl

In this python notebook, analysis of IPL matches from 2008 to 2020 is done using python packages like pandas, matplotlib and seaborn.

data-analysis data-science eda matplotlib numpy pandas python seaborn

Last synced: 07 May 2026

https://github.com/12danielll/neurogenomics_project

This project focuses on analyzing sequencing data to understand molecular mechanisms of neurological diseases and predict the effectiveness of immunotherapy in breast cancer patients. It integrates Python and R scripts for data processing, statistical analysis, and visualization, alongside a comprehensive report detailing methods and findings.

bioinformatics biostatistics clustering clustering-algorithms data-analysis data-visualization deseq2 differential-gene-expression functional-analysis immune-therapy machine-learning neurological-disease neuroscience pca-analysis python r seurat single-cell-analysis

Last synced: 06 Apr 2026

https://github.com/adrianlardies/multi-asset-financial-analysis

Comparative analysis of bitcoin, gold and S&P 500 in relation to macroeconomic indicators (VIX, interest rate, CPI). We explore the evolution of a $100 monthly investment in these assets, presenting visualizations to evaluate their performance and potential as financial diversification tools.

data-analysis data-science matplotlib pandas python seaborn

Last synced: 09 May 2026

https://github.com/hemant-kumar786/heart-disease-prediction

Heart Disease Analysis project in RStudio using statistical methods and data visualization. Includes data cleaning, exploratory data analysis (EDA), correlation study, and insights on key health indicators influencing heart disease.

correlation-study data-analysis data-visualization eda healthcare heart-disease r rstudio statical-analysis

Last synced: 02 Nov 2025

https://github.com/pentalpha/eu-car-emissions-analysis-2015

Analysis of CO² Emissions on Passenger Cars at the E.U. Contries, Year 2015.

data-analysis data-science dataset jupyter-notebook python python3

Last synced: 15 May 2026

https://github.com/pentalpha/bti-performance-study

A series of analysis on a large amount of data about the grades of students in the Technology Information course at UFRN

analysis big-data clustering data-analysis data-science data-visualization ipynb ipython jupyter-notebook performance-analysis plot python python3

Last synced: 15 May 2026

https://github.com/mtholahan/advanced-mysqlquery-tuning-mini-project

Analyzed EuroCup 2016 data with advanced SQL queries. Imported CSV datasets into MySQL, designed schema with match, player, and referee details, and implemented queries covering match outcomes, penalty shootouts, player stats, bookings, substitutions, and referee activity to explore tournament dynamics.

bootcamp data-analysis data-engineering data-modeling database eurocup football mysql queries soccer sports springboard sql

Last synced: 15 May 2026

https://github.com/erseco/ugr_tratamiento_inteligente_datos

Repositorio de trabajo de la asignatura Tratamiento Inteligente de Datos del Máster en Ingeniería Informática de la Universidad de Granada (UGR)

data-analysis data-mining ugr

Last synced: 26 Apr 2026

https://github.com/errea/vet_clinic_database

For this project you need special preparation. As the goal of this project is to solve some performance issue, first we need to introduce those issues. In order to do that, you will populate your database with a significant number of data.

data data-analysis data-structures data-visualization database

Last synced: 21 May 2026

https://github.com/kashirin-alex/thither.direct-onamove

an android skeleton-example application for using data from Thither.Direct platform on mobile applications

android-application data data-analysis data-structures data-visualization mobile-development mobility query research-data-management

Last synced: 27 Apr 2026

https://github.com/habiburrahman-mu/data-wrangling

Data Wrangling is the process of converting data from the initial format to a format that may be better for analysis.

data-analysis data-mining data-science

Last synced: 21 May 2026

https://github.com/andersoncrs/regularizacion_lasso_en_modelos_de_regresion_lineal

Este repositorio contiene un análisis detallado sobre la implementación de la regularización Lasso en modelos de regresión lineal para predecir el precio de vehículos. Se parte de un conjunto de datos limpio y se aplican diversas transformaciones y modelados para mejorar la precisión de las predicciones.

data-analysis data-science data-visualization jupyter-notebook linear-regression regularization-methods seaborn sklearn

Last synced: 16 May 2026

https://github.com/dcs-training/regressionandmixedeffectsmodelling

This course will introduce you to regression and linear mixed-effects models (LMMs). It will help to develop your theoretical understanding and practical skills for running such models in R. Go to the readme file

data-analysis r rmarkdown statistics

Last synced: 25 Feb 2025

https://github.com/dcs-training/introtodatabases

This repository host the material connected to a training developed by Dave Elsmore (Edina) for CDCS. Go to the readme file

data-analysis data-wrangling databases sql

Last synced: 10 Jun 2026

https://github.com/rahmamohammad/retail_project

Retail & Data analytics: KPIs, sales trends, Excel planning pack, forecasting & inventory tracking.

data-analysis data-visualization ecommerce excel jupyter-notebook matplotlib python retail-analytics storytelling

Last synced: 17 May 2026

https://github.com/teja-1403/forage-tata-data-visualisation-empowering-business-with-effective-insights

This repository contains solutions to the 4 different tasks that must be performed during the Data Visualisation: Empowering Business with Effective Insights virtual internship provided by TATA via Forage.

analysis-and-reporting analytics analytics-and-decision-science charts communications dashboards data-analysis data-cleanup data-interpretation data-storytelling data-visualizations graph insights power-bi visual-basic visualizations

Last synced: 18 Feb 2026

https://github.com/daniel-jcvv/daniel-jcvv

👨‍💻 Data Engineer | 3+ years enterprise experience with Telcel & Citi Banamex Develop ETL pipelines, data governance, and cloud solutions. Building scalable data architectures and automated workflows for Fortune 500 clients. Tech Stack: Python, SQL Server, Oracle, Apache Airflow, PySpark

agentic-ai apache-airflow apache-kafka apache-spark automation business-intelligence citi-bank-apis data-analysis data-engineering data-lake data-warehouse etl-pipeline medallion-architecture mlops n8n-workflow python rag sql-server

Last synced: 15 Apr 2026

https://github.com/mvharsh/blinkit-sales-dashboard

An interactive Power BI dashboard visualizing Blinkit's sales performance across outlets, item types, and customer ratings for strategic insights.

blinkitdashboard data-analysis data-visualization powerbi

Last synced: 25 Jan 2026

https://github.com/shafaq-aslam/data-gathering

A hands on collection of notebooks exploring multiple techniques of data gathering, from reading CSV, Excel, JSON, and SQL files to exporting data in various formats and fetching real time data through APIs. This repository documents my complete learning journey of data ingestion, preparation, and extraction for data analysis workflows.

api data-analysis data-export data-gathering data-import data-science jupyter-notebook machine-learning pandas python python3

Last synced: 21 May 2026

https://github.com/j-wu1/analyse_ventes_jeuxvideo_python

Analyse Exploratoire de Données (EDA) sur les ventes de jeux vidéo avec Python, Pandas, Matplotlib et Seaborn dans un Jupyter Notebook.

data-analysis eda jupyter-notebook matplotlib pandas python seaborn

Last synced: 19 Aug 2025

https://github.com/as16082023/global-electronics-retailer

Analyzed Maven Electronics' performance data to identify factors driving revenue decline since 2020.

advanced-excel data-analysis data-visualization

Last synced: 03 Feb 2026

https://github.com/anonymo2239/big-data-churn-analyzer

Scalable customer churn prediction using PySpark. Includes EDA, feature engineering, modeling, and real-time inference on new data.

big-data churn-analysis churn-prediction classification-algorithm data-analysis data-science data-visualization modeling pyspark

Last synced: 21 May 2026

https://github.com/gui-sitton/bank-loans

In this project I will prepare a report for a bank's loan division. I find out whether a customer's marital status and number of children have an impact on loan default, as well as other factors

data data-analysis data-analysis-python data-science data-visualization python

Last synced: 21 May 2026

https://github.com/abhipatel35/moviematcher-movie-recommender-system

A robust movie recommendation system using the MovieLens dataset, employing Collaborative Filtering, Matrix Factorization, and Hybrid Models to enhance recommendation accuracy and diversity.

collaborative-filtering content-based-filtering data-analysis eda hybrid-models machine-learning matrix-factorization movie-recommendations movielens-dataset python recommender-system surprise-library

Last synced: 21 May 2026

https://github.com/eesunmoon/genai_cor-recom

[Project] Outfit Coordination Recommender System using KoAlpaca

data-analysis fine-tuning generative-ai huggingface keyword llm numpy pandas python python3 selenium

Last synced: 06 Apr 2026

https://github.com/tapas-gope/pizza-sales

This project analyzes Pizza Sales Data to provide insights into customer preferences and sales performance. Key metrics include total revenue, orders, and average order value, with a breakdown by pizza category and size. The dashboard identifies peak sales periods and top-selling items, supporting data-driven business decisions.

business-intelligence dashboard data-analysis data-visualization dax powerbi sales-analysis

Last synced: 02 Jan 2026

https://github.com/kaushik-puttaswamy/food-delivery-time-prediction-using-machine-learning

The Food Delivery Time Prediction Model estimates delivery times using regression algorithms, with XGBoost as the best performer, and is deployed as a real-time application via Streamlit.

data-analysis data-science delivery food-delivery geolocation machine-learning modeldeployment predictive-modeling python realtimeproject regression-models streamlit xgboost

Last synced: 16 Apr 2026

https://github.com/shivani8136/bellabeat-smart-device-data-analysis

This project analyzes smart device fitness data to uncover insights into user behavior, engagement, and wellness patterns. Conducted for Bellabeat, a high-tech company specializing in health-focused smart products for women, this analysis supports strategic decisions around product development and feature prioritization.

data-analysis data-visualization r-programming-language

Last synced: 08 Feb 2026

https://github.com/touradbaba/multi-page_dash_application

This repository contains a Multi-Page Dash Application designed to provide interactive visualizations of geo-spatial data, focusing on population and GDP. The app offers insights into demographic and economic trends through interactive maps and various types of charts. It is built with Python, using Plotly and Dash, and is deployed on Heroku.

dash dashboard data-analysis data-visualization exploratory-data-analysis heroku-deployment plotly pythonanywhere

Last synced: 27 Jul 2025

https://github.com/balajimohan18/power-bi-visualization-project

This repository contains Visualization Projects which is visualized through Power BI Software, by using the visualization we can gain multiple insights and strategies which helps to develop the business for gaining high profit margins and by the insights we can reduce damages by accidents & calamities.

data-analysis data-cleaning data-science data-visualization exploratory-data-analysis microsoft-excel microsoft-power-bi microsoft-powerpoint powerbi powerbi-visuals powerpoint-slides

Last synced: 08 Mar 2026

https://github.com/kheriberto/logistic_regression_project

A project that analyses dummie data from an advertising company using logistic regression

data-analysis logistic-regression pandas python scikit-learn seaborn

Last synced: 08 Apr 2026

https://github.com/jailsonsb2/kit-analise-de-dados

🚀 Um kit de ferramentas Python para acelerar a análise de dados. Carregue arquivos de forma inteligente (CSV, Excel, etc.) e converta notebooks Jupyter para scripts de produção sem esforço.

analise-de-dados analise-exploratoria analise-exploratoria-de-dados automation automations dados data-analysis data-cleaning etl etl-automation jupyter-notebook pandas powerquery python toolkit

Last synced: 29 Apr 2026

https://github.com/hayatiyrtgl/cryptocurrency_time_series_rnn

Python script for training a Simple RNN model on cryptocurrency price data to predict future prices, including data exploration and evaluation

data-analysis data-science data-visualization keras pandas pandas-python prediction predictive-modeling python python-script rnn rnn-tensorflow tensorflow time-series time-series-analysis

Last synced: 08 Apr 2026

https://github.com/l0rd-inquisit0r/data-analytics

A repository of data analytics implementations in Python

ai data-analysis data-analysis-python data-analytics

Last synced: 18 Jun 2025

https://github.com/ejw-data/tableau-drug-study

Brief analysis of drug treatments that were also analyzed with pandas

data-analysis tableau

Last synced: 02 Jan 2026

https://github.com/saidulalimallick04/smart-traffic-violation-pattern-detector-dashboard

This project is a Streamlit web application designed to analyze traffic violation data. It provides a user-friendly interface to explore, visualize, and gain insights from traffic violation datasets. Users can upload their own data, perform analysis, and view summaries and trends.

dashboard data-analysis data-visualization internship-project pandas python smart-traffic streamlit

Last synced: 18 Apr 2026

https://github.com/faizantkhan/automated-eda

This repository showcases tools for automatic Exploratory Data Analysis (EDA) in Python. These tools help you quickly understand your datasets and generate insightful reports.

automatic automation autoviz data-analysis data-analysis-python data-science data-visualization dtale dtale-library eda exploratory-data-analysis ml pandas pandas-profiling python python-library sweetviz

Last synced: 18 Apr 2026

https://github.com/anamakarevich/suicide_rates_factors

Female suicide rates analysis for Udacity Hacathon

data-analysis data-cleaning linear-regression suicide

Last synced: 21 May 2026

https://github.com/danicaalana/sales-review-sentiment-analysis

This project is a sentiment analysis project using a machine learning model. It analyzes Amazon product reviews to determine whether the sentiment expressed is positive, negative, or neutral using Multinomial Naive Bayes Method.

amazon data-analysis data-science machine-learning naive-bayes python sales-review sentiment-analysis

Last synced: 15 May 2026

https://github.com/puspacempaka/hackerrank-sql-challenges-intermediate

This repository features solutions to various intermediate-level SQL challenges from HackerRank. It includes efficient SQL queries, problem-solving techniques, and well-documented scripts. Explore these solutions to understand different SQL problems and enhance your skills.

challenges data-analysis database hackerrank-solutions queries sql sql-intermediate-level

Last synced: 02 Jan 2026

https://github.com/simranshaikh20/credit-card-dashboard

A Data Visualization Project using Microsoft Power bi

data-analysis data-visualization powerbi

Last synced: 02 Jan 2026

https://github.com/mmzong/gee_lifestyleeffectsonhypertension

Generalized Estimating Equations (GEE), Quasi-likelihood under the Independence Model Criterion (QIC), Longitudinal data, Embedded box plots within violin plots with hypertension risk categories, spaghetti plots, aggregate line plots, histograms, faceted-area plots, box and jitter plots. Investigating the impact of lifestyle on health.

aggregate-line-plot area-faceted-plots box-plots data-analysis data-manipulation data-science data-visualization generalized-estimating-equations histograms jitter-plots longitudinal-data qic quasi-likelihoods r spaghetti-plots violin-plots

Last synced: 29 Jul 2025

https://github.com/iliyasalve/cyclistic_case_study

Analysis of the Bike-Sharing System for the following question: "How do annual members and casual riders use Cyclistic bikes differently?"

bike-sharing data data-analysis data-visualisation r

Last synced: 06 Apr 2025

https://github.com/ginga1402/car_price_prediction

Predict the price of a car using MS Excel.

college-project data-analysis excel linear-regression

Last synced: 30 Mar 2025

https://github.com/abhishekyadav915/diwali_sales_analysis

This project aims to analyze sales data during the Diwali festival using Python. The analysis focuses on identifying key trends, customer purchasing behavior, and sales performance across different segments. By leveraging data visualization and statistical analysis, we uncover insights.

data-analysis data-visualization matplotlib-pyplot numpy-library pandas-dataframe seaborn-python

Last synced: 05 Apr 2025

https://github.com/tashi-2004/data-visualization-tableau-traffic-collision-insights

Analysis of traffic collision data using Tableau, featuring interactive visualizations that highlight trends in injuries and fatalities, contributing factors, and geographic distributions. It includes various sheets and dashboards, with recommendations for enhancing road safety. The dataset is available for further exploration.

data-analysis data-visualization eda geospatial-analysis machine-learning predictive-modeling statistics tableau traffic-analysis

Last synced: 19 Mar 2026

https://github.com/nishumehta/uber-rides-data-analysis

An in-depth analysis of Uber ride data for the year 2016, to uncover patterns in ride behavior, mileage trends, and frequent start locations to generate actionable insights for business decisions.

data-analysis jupyter-notebook matplotlib-pyplot pandas python tableau-dashboards

Last synced: 09 May 2026