An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/vara-co/pandas-challenge

PyCitySchools - Analysis between budget and academic performance in schools

budget-analysis data-analysis jupiter-notebook pandas-dataframe python school-performances

Last synced: 17 May 2026

https://github.com/vhtua/group4_data_analysis

Hierarchical Cluster Analysis: Movie Genres Preferences

data-analysis hierarchical-clustering r unsupervised-learning

Last synced: 29 Mar 2025

https://github.com/stevapple/elasticsearch-utils

Asynchronous data processing and import/export for Elasticsearch, written in Python.

data-analysis data-processing elasticsearch python

Last synced: 15 May 2026

https://github.com/nafisalawalidris/investigating-netflix-movies-and-guest-stars-in-the-office

Dive into the world of Netflix and explore the average duration of movies. Netflix, being the largest entertainment company, offers a wide range of movies for its viewers. In this project, we analyse movie durations using pandas, create a DataFrame from a dictionary, and examine average durations from 2011 to 2020.

average-duration csv-files data-analysis data-visualization dataframe filtering movie-durations movie-length-distribution netflix pandas python trends

Last synced: 20 May 2026

https://github.com/kiranmayi5/r-projects

This repository showcases R projects designed to tackle real-world problems through data-driven solutions.

data-analysis exploratory-data-analysis predictive-modeling r statistical-analysis

Last synced: 25 Jun 2025

https://github.com/sourabh-kumar04/numpy-basic

Numpy-Basic is a structured learning repo covering NumPy from basics to advanced. It includes arrays, indexing, reshaping, filtering, vector ops, angle functions, stats, and .npy file handling. Each concept is explained with code, examples, and Matplotlib visualizations in both light and dark modes. Ideal for students and data learners.

data-analysis data-science data-visualization learning learning-resources machine-learning matplotlib numerical-computing numpy python python-library python-programming

Last synced: 10 May 2026

https://github.com/nabilshadman/r-data-analysis

A modular R framework for data analysis, with emphasis on data processing and reproducible workflows.

data-analysis data-cleaning data-manipulation data-science descriptive-statistics programming r r-studio statistical-analysis statistical-computing t-test

Last synced: 04 Apr 2025

https://github.com/oguzgn/budget-checker-for-campaign-budget-allocation

This project focuses on modeling campaign performance data for Looker, helping determine which campaigns to scale up or cut back. It aggregates metrics over the last 7 and 30 days, providing actionable insights for budget optimization and performance improvement.

budget-allocation budget-controller budget-management calculated-fields campaign-analytics data-analysis data-modeling looker-studio sql

Last synced: 17 Feb 2026

https://github.com/morphclue/godot-trend

R-Code and data for game engines on itch.io

data-analysis game-engines trends

Last synced: 05 Apr 2025

https://github.com/mksingh431/python-project

Learn Pandas with exercises and sample projects

data-analysis data-science data-visualization project projects python

Last synced: 03 May 2026

https://github.com/luminati-io/amazon-dataset-samples

A sample dataset of over 1,000 Amazon product listings, extracted using the Bright Data API, perfect for competitive analysis, market trends, and eCommerce insights.

amazon api data-analysis data-science dataset ecommerce products web-scraping

Last synced: 03 Jan 2026

https://github.com/nafiealhilaly/analyzing-sa-schools-data

A simple Python Streamlit app to explore and analyze the Saudi Arabia schools dataset from data.gov.sa

data-analysis data-visualization eda python streamlit

Last synced: 16 May 2026

https://github.com/muneeb1030/eda-of-physionets-ecg

EDA of the PhysioNet dataset "A Large Scale 12 Lead Electrocardiogram Database for Arrhythmia Study 1.0.0". This project focuses on the preprocessing of electrocardiogram (ECG) signals and uses Principal Component Analysis (PCA) for dimensionality reduction.

12-lead-ecg data-analysis ecg-signal eda pca python3 wfdb

Last synced: 25 Jul 2025

https://github.com/jabhij/tableau_dashboards

Brief information about all of my Tableau dashboards, the insights I got out of them, and the outcomes I reached after analyzing those visualizations.

data-analysis data-analytics data-science data-visualization tableau visualisation

Last synced: 07 Mar 2026

https://github.com/balapriyac/python-data-analysis

Code along to simple data science and analysis projects and tutorials

data-analysis data-science python

Last synced: 25 Jul 2025

https://github.com/akashparley/stocklyzer

Stocklyzer is a real-time stock analysis web app built with Streamlit. It features stock performance tracking, technical indicators, CAPM-based risk-return insights, and ARIMA-based price prediction. Ideal for finance enthusiasts, analysts, and learners exploring data-driven investing tools.

arima-forecasting data-analysis financial-analysis machine-learning stock-price-prediction

Last synced: 16 May 2026

https://github.com/lijesh010/netflix_dataset_exploratory_data_analysis_python_project

This repository contains an Exploratory Data Analysis (EDA) Python project on the Netflix dataset. The purpose of this project is to gain insights and better understand the characteristics of the content available on Netflix, including movies and TV shows.

data-analysis data-exploration data-visualization exploratory-data-analysis jupyter-notebook python

Last synced: 20 May 2026

https://github.com/gursv/stocksage

Predict the next day's closing price for a stock such as NSEI, NYA, HSI, IXIC, or TWII.

data-analysis data-preprocessing data-science gridsearchcv machine-learning python3 random-forest-regressor stock-data stock-price-prediction streamlit

Last synced: 18 Apr 2026

https://github.com/vara-co/space-missions

Space Missions Over Time (1957-2022): Successes vs Failures, and Rocket Usage

data-analysis data-analysis-python history matplotlib pandas pandas-python space space-race spaceships team-project

Last synced: 18 May 2026

https://github.com/listiangr/product_sales_data_analysis

This project analyzes sales data to provide insights into sales trends, profitability, and product demand, helping companies plan more effective pricing, promotion, and inventory management strategies.

corrplot data-analysis data-preprocessing data-visualization dplyr ggcorrplot ggplot2 product-sales r-language rstudio

Last synced: 03 Apr 2025

https://github.com/olekscode/covidanalysis

A setup for COVID-19 data analysis in Pharo

coronavirus covid-19 data-analysis pharo

Last synced: 05 Apr 2025

https://github.com/priyanshubiswas-tech/data-analysis-with-python

This repository showcases Python projects completed for a Data Analysis with Python certification, demonstrating skills in data manipulation, visualization, and statistical analysis using libraries like NumPy, Pandas, Matplotlib, Seaborn, and SciPy.

data-analysis demographic-data-analyzer mean-variance-standard-deviation-calculator medical-data-visualizer page-view-time-series-visualizer python scipy-stats sea-level-predictor seaborn

Last synced: 07 May 2025

https://github.com/hariyebk/eplinsights

English Premier League 2018/2019 Data Analysis

class-composition data-analysis filesystem-library

Last synced: 26 Jul 2025

https://github.com/hoangsonww/fred-banking-data-analysis

💸 AI-powered banking data explorer that combines FRED API insights with vector search, regression analysis, and interactive chat via OpenAI, Claude, and Gemini. Built with TypeScript, React, and Express for seamless full-stack performance.

anthropic chartjs claude-ai data data-analysis data-analytics data-science data-visualization fred fred-api gemini google-generative-ai logistic-regression multiple-regression openai pinecone react regression typescript vector-database

Last synced: 09 Apr 2025

https://github.com/yash-kavaiya/ai-analytics

This is a Streamlit app that uses Pandas and AI to perform data analytics on uploaded CSV files.

data-analysis generative-ai pandas streamlit

Last synced: 20 Jul 2025

https://github.com/gurpreetkaurjethra/ai-data-visualization-agent

This Streamlit application creates an interactive Data Visualization Assistant that can understand Natural Language Queries and generate appropriate Visualizations using LLMs.

aiagents aichatbot aidevelopment artificial-intelligence data-analysis data-visualization generative-ai llms

Last synced: 25 Jun 2025

https://github.com/priyanshubiswas-tech/airflow_dbt_superset_project

End-to-end ITSM data engineering pipeline using PostgreSQL, DBT, Airflow, and Superset. Covers ingestion, cleaning, transformation, orchestration, and visualization, validated across Docker Toolbox and Docker Desktop environments.

apache-airflow apache-superset dags data-analysis dbt docker etl etl-automation etl-pipeline postgresql

Last synced: 07 May 2025

https://github.com/rayyan9477/diamond-price-forecasting

This is a comprehensive machine learning project focused on predicting diamond prices. Using a dataset of diamond attributes, the project implements various machine learning models to forecast prices. Key features include data preprocessing, exploratory data analysis (EDA), and model training with algorithms such as Linear Regression, Decision Tree

data-analysis data-science decision-trees eda linear-regression machine-learning

Last synced: 26 Jul 2025

https://github.com/defrecord/value-alignment-toolkit

A comprehensive toolkit for implementing, analyzing, and validating AI value alignment based on Anthropic's 'Values in the Wild' research.

ai anthropic data-analysis ethics privacy python simulation value-alignment

Last synced: 20 Jul 2025

https://github.com/incubrain/awesome-maharashtra-data

A collection of datasets specific to Maharashtra, India. WIP

ai artificial-intelligence data data-analysis data-science datasets maharashtra marathi

Last synced: 23 May 2026

https://github.com/pawlo77/kaggle-project

Repository for 'kaggle' project of Data Science Scientific Circle at Faculty of Mathematics and Information Science, Warsaw University of Technology

data-analysis data-science eda maschine-learning

Last synced: 20 Mar 2025

https://github.com/li-pearl/gene-count-normalizer

First step of data wrangling in MERFISH data project

data-analysis merfish merscope python

Last synced: 13 May 2026

https://github.com/windjammer6/8.-star-wars-data-analysis-python

A personal project to analyse data from a Star Wars survey. Python libraries used: Pandas, Matplotlib

data-analysis python

Last synced: 27 Jul 2025

https://github.com/dimits-ts/synthetic_moderation_experiments

Experiments relating to synthetic LLM user-agents and LLM facilitators in online discussions

data-analysis dataset-generation llms llms-reasoning nlp

Last synced: 06 Mar 2026

https://github.com/adrianycmc/introducaoadatascience

Exploring data using Python, Pandas, and Google Colaboratory.

data-analysis data-science jupyter pandas python

Last synced: 30 Apr 2026

https://github.com/riju18/advanced-data-analysis-and-visualization

Advanced data preparation, level-of-detail calculations, animation, table calculations, etc. for data analysis and visualization.

data-analysis data-science data-visualization tableau

Last synced: 18 Feb 2026

https://github.com/rishabhraj43/diwali-sales-analysis

A Data Analysis project made in Python

data-analysis python

Last synced: 01 May 2026

https://github.com/tbep-tech/ecometab-r-training

Website materials for R training on ecosystem metabolism

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/tbep-tech/verified-wbids

Materials for evaluation of verified WBIDs in the Tampa Bay watershed

data-analysis open-science tampa-bay tbep water-quality

Last synced: 19 Feb 2026

https://github.com/denko5/valentine-s-analysis

This repository dives into Valentine's trends and behaviors using SQL, focusing on exploratory data analysis to uncover patterns and insights. It features SQL queries, datasets, and documentation to guide readers through the process. Designed for collaboration and educational use.

africa analytics data-analysis eda exploratory-data-analysis insights kenya sql sql-server sqlworkbench trends valentines-day

Last synced: 04 May 2025

https://github.com/arnabsaha7/customer-churn_prediction---analysis

Predict customer churn using machine learning. This project employs a RandomForestClassifier to analyze customer data and determine the likelihood of churn. Explore the Jupyter Notebook for insights into the data and model, and contribute to the project's development.

customer-churn-prediction data-analysis machine-learning

Last synced: 02 Mar 2025

https://github.com/birkkarlsen/beam_dynamics_tools

Repository filled with functions related to the analysis of longitudinal beam dynamics measurements and simulations

accelerator-physics beam-dynamics data-analysis

Last synced: 19 Sep 2025

https://github.com/pinedah/escom_development-of-applications-for-data-analysis

This repository is a personal collection of programs, exercises, and notes from the Development of Applications for Data Analysis course at Instituto Politécnico Nacional (IPN). As part of the Bachelor's in Data Science, the course focuses on developing practical skills in Python for data analysis.

data-analysis data-science data-visualization jupyter-notebook python python-data-analysis

Last synced: 20 Jan 2026

https://github.com/gappeah/british-airways-analysis

This project focuses on analyzing and visualising travel data from British Airways using Tableau. The goal is to extract insights and present them in an interactive and visually appealing manner.

data data-analysis data-visualization tableau

Last synced: 11 Jun 2025

https://github.com/tbep-tech/red-tide-twitter

Supplementary materials to accompany Skripnikov et al. red tide Twitter analysis

ccmp-li1 ccmp-wq3 data-analysis open-science tampa-bay tberf water-quality

Last synced: 19 Feb 2026

https://github.com/nafisalawalidris/logistic-regression-model-for-breast-cancer-recurrence-prediction

Predicting Breast Cancer Recurrence - A logistic regression model using patient attributes to classify recurrence risk. Dataset analysis and model evaluation. Contributions welcome.

breast-cancer classification-model data-analysis data-science healthcare logistic-regression machine-learning python recurrence-prediction scikit-learn

Last synced: 17 May 2026

https://github.com/aliciagilmatute/analisis-multinivel-bayesiano

This study explores multilevel analysis from a Bayesian approach to evaluate the variability in mathematics performance across 10 schools.

bayesian-statistics cmdstanr data-analysis hierarchical-models multilevel-models rstats rstudio stan

Last synced: 30 Oct 2025

https://github.com/kushagrakumar04/traffic-accident-analysis

This project analyzes traffic accident data to identify patterns based on road conditions, weather, and time of day. Visual representations of accident hotspots and contributing factors are created to offer a comprehensive understanding of the dynamics involved. The insights from this analysis aim to support targeted strategies to improve safety.

data-analysis matplotlib pandas visualization

Last synced: 15 May 2026

https://github.com/codingprivacy/feedback-portal-system

AI-based Feedback Portal System that collects periodic feedback from users via a human-friendly chatbot, analyses the responses through NLP and sentiment analysis, and visualizes the results on the portal website.

artificial-intelligence bokeh chatbot data-analysis flask mysql-database nlp portal python sentiment-analysis visualization website

Last synced: 19 Sep 2025

https://github.com/ajwad-shaikh/sristi-sanshodh-collect

SRISTI Sanshodh Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments. Contribute and make the world a better place! ✨📋✨ https://docs.opendatakit.org/collect-…

collect data-analysis data-collection javarosa odk opendatakit

Last synced: 04 Apr 2025

https://github.com/jcaperella29/financial-data-scraper

Financial Data Scraper is a Python-based web scraping tool using Selenium to extract financial data from Stock Analysis. It scrapes Income Statement, Balance Sheet, Cash Flow, and Ratios for multiple companies and saves them as CSV files.

automation data-analysis finance financial-statements investment python selenium stock-market web-scraping

Last synced: 28 Jul 2025

https://github.com/kumaranand05/suicide-rate-analysis

Analysis of WHO mortality data and visualization using Power BI

analytics data-analysis data-visualization mortality-rates powerbi python suicide-dataset suicide-rate

Last synced: 04 May 2026

https://github.com/samruddhi3012/shopping-habits-customer-behavior-analysis

Hello there! This repo contains a Python project based on e-commerce customer behavior analysis.

customer-segmentation customerbehavior data-analysis ecommerce python

Last synced: 29 Mar 2025

https://github.com/andystmc/nextflownyc

Developed a machine learning model (Bidirectional LSTM) to forecast NYC traffic volumes using 10 years of automated traffic count data. Achieved strong predictive accuracy, demonstrating the power of deep learning for urban traffic analysis.

data-analysis data-cleaning data-science data-visualization exploratory-data-analysis feature-engineering hyperparameter-tuning jupyter-notebook lstm-neural-networks machine-learning numpy pandas predictive-modeling python3 scikit-learn tensorflow-keras traffic-flow-forecasting

Last synced: 07 Apr 2026

https://github.com/p2-718na/alice-simulation

Code for my Lab-2 course.

cern-root data-analysis

Last synced: 13 Mar 2025

https://github.com/akash1070/data-science-virtual-internship-by-accenture

Data merging and data cleaning in Python, as well as data visualisation with a dashboard in Tableau.

data-analysis data-cleaning data-science python3 tableau visualization

Last synced: 15 May 2026

https://github.com/unrndm/dataanalysis

Artifacts and solutions of homework for the "Data Analysis" course in the master's programme at HSE during 2023-2024

2023-2024 data-analysis hse

Last synced: 27 Mar 2025

https://github.com/nafisalawalidris/dr.-semmelweis-and-the-discovery-of-handwashing

Uncover the revolutionary impact of handwashing on mortality rates in healthcare. Explore the story of Dr. Semmelweis and his groundbreaking findings.

data data-analysis handwashing healthcare-analysis medical-breakthrough mortality-rates

Last synced: 13 Jul 2025

https://github.com/thecoderpinar/customer-segmentation-clv-analysis

Optimize marketing strategies and enhance decision-making. Explore customer data, segment behavior, calculate CLV, analyze demographics, and visualize insights. 🚀

clv-analysis customer-segmentation data-analysis data-science data-visualization jupyter-notebook machine-learning marketing-strategy python

Last synced: 03 Apr 2025

https://github.com/swarchal/morar

Processing phenotypic screening data

biology data data-analysis drug-discovery hts phenotypic

Last synced: 19 Jun 2025

https://github.com/nysportsfan/Gun-Violence-in-the-US

This repository contains all the relevant files for my first capstone project as part of the Springboard Data Science Career Track.

data-analysis data-science data-visualization machine-learning python3 statistics

Last synced: 10 May 2025

https://github.com/sufiyanahmed4566/sql-musicmaven

"This Music Store Database Project showcases SQL skills through comprehensive database design, query optimization, and data analysis. Includes ER diagram, database file, query questions (Easy, Medium, Hard), answered queries, and CSV table data. Ideal for recruiters seeking skilled SQL developers for music store management and data analysis.

data-analysis database insights mysql-database oracle-database relational-databases sql

Last synced: 18 May 2026

https://github.com/pavelgrigoryevds/olist-deep-dive

🌊 Deep Sales Analysis of Olist E-Commerce: EDA | Time Series | Viz | RFM | NLP | Geospatial | Segmentation & Actionable Business Recommendations.

business-recommendations clusterization data-analysis data-analytics data-science deep-analysis e-commerce eda feature-engineering geospatial jupyter-notebook nlp pandas plotly preprocessing python rfm statistics time-series visualization

Last synced: 07 May 2026

https://github.com/numbersprotocol/dyda

Dynamic data pipeline framework

ai artificial-neural-networks data-analysis data-science

Last synced: 07 Nov 2025

https://github.com/archived-blueprints/amazonathena-blueprints

Simplified blueprints for building data pipelines with Amazon Athena.

amazon-athena athena cli data-analysis data-engineering data-science elt etl

Last synced: 29 Jul 2025

https://github.com/mijisu0103/ukhsa-dashboard-project

Simple dashboard that downloads and displays the data about infectious diseases (Influenza, Rhinovirus and COVID-19) from the UK Health Security Agency (UKHSA) dashboard.

data-analysis data-visualisation ipywidgets python voila-dashboard

Last synced: 17 Jun 2025

https://github.com/rayyan9477/youtube-spam-detection-with-flask-and-machine-learning

This is a web application built using Flask that detects spam comments on YouTube using a Naive Bayes classifier. It leverages techniques such as CountVectorizer for feature extraction and scikit-learn for machine learning. The application reads data from a CSV file and predicts whether a comment is spam or not.

data-analysis data-science machine-learning nlp-machine-learning spam-detection

Last synced: 21 Sep 2025

https://github.com/atharvbyadav/expensemate

A simple, lightweight personal finance tracker built with Streamlit and SQLite. Log expenses, visualize spending habits, manage budgets, and download reports – all through an interactive web interface.

budgeting data-analysis data-visualization expense-tracker finance-app open-source pandas personal-finance plotly python sqlite streamlit streamlit-webapp

Last synced: 28 Apr 2026

https://github.com/hemanthkumarsunkari27/pmay_analysis_project

Built for the 1st AI for Good Hackathon by Snowflake, this project uses data analytics and AI to explore housing and sanitation trends in India under PMAY. Using Snowflake and Streamlit, it provides interactive insights into regional disparities, helping guide sustainable infrastructure development.

data-analysis data-visualization pmay-analysis sanitation-coverage snowflake-integration streamlit-dashboard sustainable-development

Last synced: 26 Mar 2025

https://github.com/tynoee/covid19_data_analysis

This is an analysis of a COVID-19 dataset using multiple SQL queries. The dataset includes information on COVID-19 cases such as confirmed cases, deaths, and recoveries, segmented by geographical location and time period.

data-analysis excel sql sqlserver-2019 tableau tableau-public

Last synced: 16 Feb 2026

https://github.com/vkbo/osirisanalysis

Matlab toolbox for analysing simulation results from Osiris 3

data-analysis matlab matlab-gui physics-simulation

Last synced: 10 May 2025

https://github.com/makosai/covid19datachart

A basic chart for checking corona data. Written in a single HTML file for convenience. Grab the single file and run it anywhere. Or visit the webpage.

chart chartjs corona coronavirus coronavirus-analysis covid-19 covid-2019 covid19 covid19-data data data-analysis datasets

Last synced: 23 Feb 2026

https://github.com/gad-dimnt-cptec/scanplot

A simple plotting system for SCANTEC

data-analysis jupyter-notebook pandas python scantec

Last synced: 17 Jan 2026

https://github.com/patilni3/seaborn-in-depth

Python's Seaborn Library for Data Analysis, Machine Learning, Data Science and many more...

data-analysis data-reporting data-representation data-science data-visualization plots-in-python powerbi seaborn sns

Last synced: 03 Apr 2025

https://github.com/patilni3/numpy-in-depth

Python's NumPy Library for Data Analysis, Machine Learning, Data Science and many more...

data-analysis data-engineering data-science machine-learning numpy pandas

Last synced: 10 May 2026

https://github.com/docuvesta/la-prairie-luxury-skincare-makeup-analysis

Web scraping La Prairie skincare regional websites for brand and product insights 🛍️

cosmetics data-analysis data-analytics data-visualization jupyter-notebook luxury python science skincare

Last synced: 19 Apr 2026

https://github.com/nishchal-kansara/loan_eligibility_prediction

This project aims to create a robust machine learning model that accurately predicts an applicant's eligibility for a loan based on various features such as income, credit history, and marital status.

data-analysis data-cleaning data-science data-visualization datascience dataset loan-eligibility

Last synced: 23 Jun 2026

https://github.com/gappeah/london-housing-price-dashboard

This Excel-based Housing Visual Dashboard provides a comprehensive view of average house prices across various boroughs in London from 1996 to 2013. The dashboard is designed to offer insights into housing market trends and price variations across different areas of London over time.

data data-analysis data-visualization excel visual

Last synced: 31 Jul 2025