An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/dzakwanalifi/reglins

regLins is an R package designed for performing linear regression analysis using various optimization methods. It also provides an interactive Shiny application for a more dynamic analysis experience.

data-analysis linear-regression optimization r shiny-app

Last synced: 09 Jul 2025

https://github.com/cosmoduende/r-earthquakes

Analysis and visualization of seismic activity data in Mexico with R. How to analyze and visualize Mexico's seismic history with data from the SSN (Servicio Sismológico Nacional).

data-analysis data-analytics data-science dataviz earthquakes r-code r-programming r-studio rstudio sismo sismologia sismos ssn ssnmx terremoto terremotos

Last synced: 24 Jan 2026

https://github.com/bhiogade/customer-purchase-analysis

Comprehensive Customer Purchase Analysis Across Multiple Dimensions

data-analysis data-visualization tableau tableau-desktop

Last synced: 02 Feb 2026

https://github.com/tolumie/web-scraping-rest-api-stock-data-operations

Web Scraping, REST API & Stock Data Operations is a data-driven project that explores the power of web scraping, API interactions, and stock market analysis using Python. From extracting stock data and public records to analyzing real-world financial trends, this repository is a one-stop resource for data enthusiasts, traders, and analysts.

api-integration data-analysis data-cleaning data-visualization financial-data python rest-api sql-databases stock-data web-scraping

Last synced: 19 May 2026

https://github.com/mrham17/spotify_streaming_analytics

Project is stable & documentation will be completed soon. Thank you for your understanding and patience.

big-data-analytics data-analysis google-colab music-data r-programming spotify streaming-analytics

Last synced: 24 Jul 2025

https://github.com/rahmamohammad/retail_project

Retail & Data analytics: KPIs, sales trends, Excel planning pack, forecasting & inventory tracking.

data-analysis data-visualization ecommerce excel jupyter-notebook matplotlib python retail-analytics storytelling

Last synced: 17 May 2026

https://github.com/ahmedkhaled404/data-cleaning-and-eda-layoffs-mysql

This project involves cleaning a dataset containing information about layoffs from companies around the world.

data data-analysis data-cleaning data-preprocessing datacleaning eda exploratory-data-analysis mysql sql

Last synced: 08 Jun 2026

https://github.com/bris0yzbekaye/json-to-excel-converter

This repository provides a tool to convert JSON data to Excel format (.xlsx). It allows you to easily transform structured JSON data into a well-organized spreadsheet for better analysis and visualization.

automation-script automation-tools data-analysis data-converter data-export data-formatting data-tools data-visualization excel excel-automation excel-converter excel-tools json json-exporter json-parser json-processing json-to-csv json-to-excel programming-tools spreadsheet-tools

Last synced: 25 Jul 2025

https://github.com/anastasius21/creditcardfrauddetection

This repository contains a Jupyter Notebook for a credit card fraud detection model and the CSV dataset on which it is trained.

credit-card-fraud data-analysis data-science data-visualization fraud-detection logistic-regression machine-learning

Last synced: 16 Jun 2025

https://github.com/leandrocollares/street-cherry-trees-in-vancouver

Street cherry trees in Vancouver: an exploratory data analysis

data-analysis data-visualization folium pandas plotly-express

Last synced: 17 Sep 2025

https://github.com/netesf13d/expt-sequence-analysis

Data processing, analysis and visualization package for atomic physics experiments in the single-atom regime.

cold-atoms data-analysis data-visualization optical-tweezers

Last synced: 24 Jul 2025

https://github.com/matte34/auto-insurance-analysis

Conducted a comprehensive exploratory data analysis (EDA) on an auto insurance dataset found on Kaggle. I performed a permutation test and generated data visualizations.

data-analysis data-visualization permutation-test python3 scipy seaborn

Last synced: 06 May 2026

https://github.com/baguilar6174/python-jupyter-notebooks

Explore data analysis projects with Python, Jupyter and more tools. Discover stunning visualizations and reveal meaningful information in datasets to make informed decisions.

data-analysis jupyter-notebook kaggle pandas python

Last synced: 09 Apr 2026

https://github.com/shreeparab1890/india-gdp-rate-1960-to-2021-data-analysis

This IPython notebook contains an exploratory data analysis (EDA) of the India GDP rate from 1960 to 2021.

analysis data-analysis eda exploratory-data-analysis ipython-notebook jyputer-notebook matplotlib matplotlib-pyplot pandas python

Last synced: 06 Mar 2026

https://github.com/swethajoseph/netflix-powerbi-interactive-dashboard

Created an interactive Netflix Power BI dashboard to analyze and visualize Netflix's content library, uncovering trends in content type, genre distribution, and global reach

data-analysis data-visualization interactive-visualizations powerbi powerbi-dashboards powerbi-report

Last synced: 03 Jan 2026

https://github.com/emmarhoffmann/analysis-of-sleep-patterns-and-psychological-well-being-among-college-students

Explores the relationship between sleep patterns, psychological well-being, and lifestyle choices among college students using statistical analysis on 253 observations.

college-students data-analysis r statistical-models

Last synced: 04 Oct 2025

https://github.com/lashawnfofung/super-heroes-analysis-project

This portfolio project involves a detailed analysis of 732 superhero records from the heroes_information.csv dataset, comprising 11 columns of unique characteristics for each hero. The primary goal is to showcase key insights derived from this rich dataset, demonstrating proficiency in data analysis using SQL.

data-analysis datasets mysql-database mysql-server mysql-workbench sql

Last synced: 07 Jul 2025

https://github.com/tomy-jr98/air-quality-sql-project

Air pollution analysis using BigQuery and Tableau, with data cleaning, aggregation, and visualization.

air-pollution bigquery data-analysis portfolio sql tableau

Last synced: 25 Jul 2025

https://github.com/dimits-ts/sport-repression-repl-study

A replication study of the recent paper "International Sports Events and Repression in Autocracies: Evidence from the 1978 FIFA World Cup".

data-analysis jupyter regression-models replication-study statistical-analysis

Last synced: 25 Jul 2025

https://github.com/cescedes/medical-insurance-costs-with-python

Investigate how different factors affect the prediction of medical insurance costs by practicing many python concepts.

codecademy data-analysis python python-dictionaries python-functions python-lists python-loops python-strings

Last synced: 19 May 2026

https://github.com/vishal-verma-96/Pre-Owned-Car-Price-prediction-using-Streamlit-App

Capstone project by Skill Academy: exploratory analysis, visualization, and prediction of used car prices. The highest-scoring model is deployed with a Streamlit web app.

data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms matplotlib numpy pandas python3 regression-algorithms scikit-learn seaborn streamlit

Last synced: 02 Mar 2025

https://github.com/andersoncrs/clasificacion-propina-restaurante

This report develops, clearly and practically, a complete analysis of the well-known tips dataset, showing step by step how to transform raw information into useful predictive models.

clasification data-analysis data-visualization tips

Last synced: 26 Jul 2025

https://github.com/sanafagal/wsp-msg-automation

An intuitive application for managing and analyzing customer and reseller data stored in Google Sheets, providing insights and streamlined data organization.

automation cloud-credentials data-analysis google-sheets-api python

Last synced: 16 Jun 2025

https://github.com/chahelgupta/hospital-readmission-prediction-and-analysis

The Hospital Readmission Prediction project uses clinical data to predict diabetic readmissions. SVM + SMOTE achieved 61.16% accuracy, with key predictors including hospital stay, lab tests, and medications.

data-analysis knn-classification logistic-regression machine-learning prediction prediction-model python random-forest-classifier smote svm-classifier

Last synced: 15 May 2026

https://github.com/riyajain255/customer-segmentation-for-e-commerce

This project analyzes online retail data to segment customers using K-Means clustering and build classification models to predict those segments based on purchasing behavior.

customer-segmentation data-analysis kmeans-clustering logistic-regression machine-learning matplotlib numpy pandas python random-forest scikit-learn seaborn-plots

Last synced: 02 Apr 2026

https://github.com/kushalagarwalla/netflix-movie-data-analysis

🚀 Netflix Data Analytics Project 🎬📊 | Analyzed 9K+ movies to uncover insights on genres, popularity, votes & release trends. Includes EDA, KPIs & visualizations using Python (Pandas, NumPy, Matplotlib, Seaborn). Supports data-driven content & engagement strategy.

data-analysis data-visualization jupyter-notebook numpy pandas python seaborn

Last synced: 06 May 2026

https://github.com/samiksha29-patil/flipkart-mobiles-data-analysis-visualization-in-python

This project analyzes Flipkart Mobiles Dataset to extract useful insights about mobile phones, their pricing, ratings, discounts, and customer reviews. The analysis and visualization are done using Python to understand market trends and customer preferences.

data-analysis data-visualization matplotlib numpy pandas python seaborn

Last synced: 04 May 2026

https://github.com/labex-labs/numpy-for-beginners

This comprehensive course covers the fundamental concepts and practical techniques of NumPy, the essential library for numerical computing in Python. Learn to create, manipulate, and analyze arrays efficiently.

array-manipulation array-slicing beginner-friendly course data-analysis data-science data-structures fast-computation hands-on labex labs linear-algebra matrix-operations numerical-computing numpy programming python python-programming scientific-computing vectorized-operations

Last synced: 20 Jun 2026

https://github.com/vitor-ace/sunspots-data-analysis

A Jupyter Notebook that implements data analysis logic using Python libraries.

data-analysis data-visualization debbuging error-handling file-handling matplotlib-pyplot numpy pandas python

Last synced: 06 May 2026

https://github.com/applicativesystem/numpy-builder

Code getter and NumPy operator for NumPy operations

data-analysis numpy numpy-python shell-script

Last synced: 15 Aug 2025

https://github.com/amruthadevops/stock-market-analysis

To analyze market trends and predict future market behavior using machine learning techniques

data-analysis data-science jupyter-notebook machine-learning powerbi-desktop python stock-market

Last synced: 15 May 2026

https://github.com/hecatops/ad_libs

A real-time advertisement data analytics platform that displays important metrics in easy-to-understand language.

dashboard data-analysis data-visualization kpi plotly-dash python

Last synced: 07 Nov 2025

https://github.com/azaz9026/loan_approval_prediction

Welcome to the Loan Approval Prediction repository! This project aims to build a predictive model that can determine whether a loan application should be approved or denied based on various features. Purpose The goal of this repository is to develop a machine learning model that can accurately predict loan approval decisio

data data-analysis data-visualization eda machine-learning numpy pandas python statistics

Last synced: 06 Apr 2026

https://github.com/zwelz3/unofficial-survivor-knowledge-graph

A comprehensive RDF knowledge graph covering all 50 seasons of Survivor (US), with 23,000+ triples across 749 named graphs.

data-analysis rdf survivor

Last synced: 23 May 2026

https://github.com/ddjain/jsonl-visualizer

A beautiful web tool for visualizing JSONL files with syntax highlighting and multiple view modes

data-analysis json jsonl viusal

Last synced: 18 Sep 2025

https://github.com/karsterr/repeated-measurement

An R-based workflow for conducting repeated measures ANOVA using the ez package, with data wrangling via tidyverse and visualization through ggplot2. Includes data import, transformation to long format, statistical analysis, and graphical summary.

anove data-analysis experimental-design ezanove ggplot2 r repeated-measurements rstats statistics tidyverse

Last synced: 18 Sep 2025

https://github.com/m-coder-umer/sales-dashboard-power-bi-project

An interactive Sales Dashboard built with Power BI using MySQL data, showcasing monthly trends, top-performing products, and key sales KPIs (Key Performance Indicators).

business-intelligence data-analysis data-cleaning data-modeling data-visualization dax interactive-dashboard mysql power-query powerbi sales-dashboard sql time-series-analysis

Last synced: 07 Jul 2025

https://github.com/ariyaarka/sales-analysis

A simple analysis of a random pizza sales dataset using SQL

data-analysis presentation-slides sql

Last synced: 17 Jan 2026

https://github.com/rh01/data-analysis-with-r

Duke University - Data Analysis With R

data-analysis r r-language r-studio rmarkdown

Last synced: 23 May 2026

https://github.com/Solrikk/PicTrace-Web

PicTraceV2 is a highly efficient image matching platform that leverages computer vision using OpenCV, deep learning with TensorFlow and the ResNet50 model, asynchronous processing with aiohttp, and Selenium for browser automation. PicTraceV2 allows users to upload images directly or provide URLs, quickly scanning a vast database to find image

automation computer-vision data-analysis data-extraction deep-learning image-processing image-search machine-learning natural-language-processing opencv openpyxl pandas python selenium tensorflow web-scraping yandex yandex-api

Last synced: 15 Aug 2025

https://github.com/lewismakau/portfolio-projects

This repository contains file data and SQL files for projects used for my Portfolio.

data-analysis data-cleaning data-structures data-visualization database google-analytics microsoft-sql-server mysql powerbi tableau

Last synced: 02 Apr 2026

https://github.com/ljadhav25/healthcare-data-collection-and-analysis

This repository contains a project focused on collecting healthcare data from the web, storing it in a structured format, and performing comprehensive analysis. The objective is to gather valuable health-related information, process and clean the data, and derive insights to support healthcare research and decision-making.

data-analysis data-visualization flask-application flask-backend html-css-javascript pycharm-ide python

Last synced: 09 Apr 2026

https://github.com/ramapinnimty/udacity-mlfoundation-nanodegree

This is a repository containing solutions to the assignments that are a part of the Udacity Machine Learning Foundation Nanodegree program.

assignments data-analysis python3 statistics udacity-machine-learning-nanodegree

Last synced: 26 Jul 2025

https://github.com/sharoonjoseph321/indian-liver-diseases

Indian Liver Disease Analysis and Prediction. This project leverages the Indian Liver Patient Dataset (ILPD) to analyze liver disease trends and develop predictive models for early diagnosis. Through data preprocessing, exploratory analysis, and machine learning, it identifies key risk factors and builds classification models

data-analysis data-science data-visualization logistic-regression machine-learning pandas python seaborn

Last synced: 27 Jul 2025

https://github.com/hfzdzakii/dicoding-shipclusteringanalysisdataandmodelling

This repo is a master submission for my Dicoding Final Project. The Ship Performance Clustering Dataset was used for the submission. Feel free to explore; I hope my work gives you some insight!

clustering data-analysis machine-learning

Last synced: 27 Jul 2025

https://github.com/susshiii/sql-layoffs-data-cleaning-and-eda

Full SQL project using MySQL to clean and analyze a real-world tech layoff dataset from 2020–2023.

data-analysis data-analytics-project data-cleaning eda layoffs mysql sql

Last synced: 07 Jul 2025

https://github.com/souravxbera/credit-card-approval-predictor

End-to-end Machine Learning project to predict credit card approval decisions using real-world financial features. Includes EDA, model training, and deployment-ready architecture

credit-card-approval-prediction data-analysis machine-learning python scikit-learn streamlit

Last synced: 15 May 2026

https://github.com/danymukesha/bioga

Apply multi-objective genetic algorithms to genomic data for biologically informed feature selection and pattern discovery.

data-analysis gene-expression genetic-algorithms genomics optimization-algorithms

Last synced: 18 Sep 2025

https://github.com/dmvianna/python-nix

Trivial Nix environment with pandas and postgresql

data-analysis nix

Last synced: 27 Jul 2025

https://github.com/grindelfp/data-analysis-example

One of my UNI Artificial Intelligence Systems course's projects.

data-analysis data-preprocessing ipynb

Last synced: 19 Sep 2025

https://github.com/tbep-tech/tbeploads

R Package for estimating nutrient loading to Tampa Bay

data-analysis loads package tampa-bay tbep tbnmc water-quality

Last synced: 19 Feb 2026

https://github.com/jofaval/ionosphere

Binary Classification of Ionosphere signals at Goose Bay, Labrador in 1988

data-analysis data-science data-visualization deep-learning google-colab keras machine-learning python scikit-learn tensorflow uci xgboost

Last synced: 09 Apr 2026

https://github.com/dina-hosny/retail-store-data-modeling-and-analysis-using-datastage

The project implements a star-schema data warehousing flow, then uses IBM InfoSphere DataStage to develop efficient ETL pipelines that create data marts and perform some analysis on them.

data-analysis datastage datawarehousing etl extract ibm load transform

Last synced: 06 Mar 2026

https://github.com/sevilaymuni/project-no.2-pandas-tableau-student-mobility

Pandas-assisted feature engineering on study mobility: Tableau dashboards on students' preferences

data-analysis data-extraction data-visualization feature-engineering pandas python tableau-dashboards tableau-desktop tableau-public

Last synced: 03 May 2026

https://github.com/tbep-tech/tbep-r-training

Repository for miscellaneous R training materials

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/tbep-tech/pep-graphics

Materials for generating PEP graphics

data-analysis pep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/tberf-oyster

Materials for evaluating TBERF oyster restoration success

ccmp-bh4 ccmp-bh6 data-analysis tampa-bay tbep tberf

Last synced: 19 Feb 2026

https://github.com/tbep-tech/pep-r-training

Materials for PEP R training

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/r8vnhill/hdp

Hentai data processing

data-analysis e-hentai hentai kotlin

Last synced: 02 Apr 2025

https://github.com/revtpark/teamseas_scrapper

Scraping Team Seas for data analysis and visualization.

chartjs data-analysis python webscraping

Last synced: 28 Mar 2025

https://github.com/shubhamgoyal575/tableau-visualization-dashboard

This repository features interactive Tableau dashboards for sales performance and healthcare analysis. It includes insights on revenue trends, regional sales, patient demographics, and hospital occupancy for data-driven decision-making. 🚀

dashborad data-analysis data-cleaning-and-preprocessing healthcare-analysis healthcare-dashboard sales-dashboard sales-data-analysis-project tableau tableau-dashboards tableau-public visualization visualization-tools

Last synced: 20 Feb 2026

https://github.com/shafaq-aslam/predicting-heart-disease-risk-with-logistic-regression-techniques

Develop a predictive model using logistic regression techniques to assess heart disease risk based on patient health metrics and data analysis.

data-analysis heart-disease logistic-regression machine-learning machine-learning-models matplotlib numpy pandas python scikit-learn seaborn

Last synced: 09 Apr 2026

https://github.com/zeh237/superstore-data-analytics

A Flask-based data analytics project built on the Superstore dataset using Flask, pandas, SQL, and Python.

analytics data data-analysis data-science data-visualization flask python superstore

Last synced: 04 May 2025

https://github.com/deliprofesor/customerseg-customer-segmentation-and-shopping-analysis

This project performs data exploration, segmentation, and modeling of wholesale customer data using clustering algorithms, PCA, and decision trees to analyze purchasing behavior and predict customer channel preferences.

clustering customer-segmentation data-analysis data-visualization dbscan decision-tree gmm kmeans machine-learning pca

Last synced: 24 Jun 2025

https://github.com/lucas-mazzolim/superstore-bi

Project in which I prepared two data sources for querying and created a BI visualization in Data Studio. Tools used: MySQL, Looker Studio, Google Spreadsheet, and Python.

business-intelligence data-analysis data-visualization google-looker-studio mysql spreadsheet

Last synced: 27 Jul 2025

https://github.com/scailfin/rob-webapi-flask

Default RESTful Web API implementation for the Reproducible Open Benchmarks for Data Analysis Platform (ROB) using the Flask web framework.

benchmarks data-analysis reproducibility webapi

Last synced: 17 Mar 2026

https://github.com/nandit123/python_on_excel

Data analysis of Excel data using Python libraries

csv data-analysis data-science fill fluctuations graph numpy python python-library

Last synced: 16 May 2026

https://github.com/tkhoa2711/twitter-hate-speech

Hate speech detection on Twitter

data-analysis python twitter

Last synced: 28 Jul 2025

https://github.com/antononcube/java-tilestats

Java package for statistics over 2D tilings (tile binning, application of aggregation functions, etc.).

data-analysis hexagonal-grids

Last synced: 02 Apr 2025

https://github.com/fer-aguirre/cookiecutter-data-analysis-extensive

A cookiecutter template for data analysis projects using Python.

cookiecutter data-analysis project-template python

Last synced: 09 Apr 2025

https://github.com/tbep-tech/fim-seagrass

Materials for analysis of FIM data, seagrass, and other datasets

data-analysis fim seagrass tampa-bay

Last synced: 19 Feb 2026

https://github.com/tbep-tech/seagrass-analysis

Materials for assessing coverage changes and analysis of drivers of change for Tampa Bay seagrass

dashboard data-analysis seagrass tampa-bay water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/piney-point-analysis

Materials for analysis of Piney Point monitoring data

data-analysis open-science piney-point tampa-bay tbep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/peptools

Materials for wrangling and summarizing data from the Peconic Estuary

data-analysis package pep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/rookery-bay-training

Materials for R training at Rookery Bay Monitoring Workshop 2020

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/ggarciajavier/udacity-dalf-project1-investigate-dataset

Work performed for the 1st project of Udacity Data Analyst Nanodegree: exploratory data analysis of a football dataset.

data-analysis football-analytics python python36 udacity-data-analyst-nanodegree

Last synced: 15 May 2026

https://github.com/rohitblaze10/survey_monkey_analysis--using-ipython

This data analysis project focuses on extracting insights from survey responses. It involves data cleaning, merging, and transformation using IPython (pandas, os) and SQL. The goal is to identify trends and patterns in survey data for better decision-making.

data-analysis ipynb ipython-notebook

Last synced: 28 Jul 2025

https://github.com/vetrivel07/flight-price-prediction

Developed a flight price prediction model using Python, analyzing historical data to forecast airfare prices and help travelers make informed booking decisions

data-analysis data-visualization jupyter-notebook numpy pandas python

Last synced: 15 Jun 2025

https://github.com/gui-sitton/prepaid

In this project I work as an analyst for the telecommunications company Megaline. The company offers its customers prepaid plans, Surf and Ultimate. The sales department wants to know which plans bring in the most revenue in order to adjust the advertising budget

data data-analysis data-analysis-python data-science data-visualization python

Last synced: 22 May 2026

https://github.com/buildwithlal/introduction-to-data-science-in-python-coursera

Introduction to Data Science in Python, part of the Applied Data Science using Python Specialization from the University of Michigan, offered by Coursera

data-analysis matplotlib numpy pandas

Last synced: 03 May 2026

https://github.com/ashwin331133/sql-project--sales-data-analysis--walmart

This SQL-based Walmart data analysis project aims to identify top-performing branches and products and to optimize sales strategies, using Kaggle's Walmart Sales Forecasting Competition dataset.

data-analysis eda sql

Last synced: 03 Jan 2026

https://github.com/j-wu1/analyse_ventes_jeuxvideo_python

Exploratory data analysis (EDA) of video game sales with Python, Pandas, Matplotlib, and Seaborn in a Jupyter Notebook.

data-analysis eda jupyter-notebook matplotlib pandas python seaborn

Last synced: 19 Aug 2025

https://github.com/labex-labs/sqlite-intermediate-to-advanced

In this course, delve into advanced SQLite techniques. Master constraints, indexing, joins, subqueries, transactions, triggers, views, full-text search, JSON, backups, PRAGMA tuning, CTEs, window functions, and more!

advanced-sql course data-analysis data-integrity data-manipulation data-modeling database database-design hands-on labex labs performance-tuning programming query-optimization relational-database schema-management sql sqlite stored-procedures transaction-management

Last synced: 18 May 2026