An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/zwelz3/unofficial-survivor-knowledge-graph

A comprehensive RDF knowledge graph covering all 50 seasons of Survivor (US), with 23,000+ triples across 749 named graphs.

data-analysis rdf survivor

Last synced: 23 May 2026

https://github.com/ddjain/jsonl-visualizer

A beautiful web tool for visualizing JSONL files with syntax highlighting and multiple view modes

data-analysis json jsonl viusal

Last synced: 18 Sep 2025

https://github.com/karsterr/repeated-measurement

An R-based workflow for conducting repeated measures ANOVA using the ez package, with data wrangling via tidyverse and visualization through ggplot2. Includes data import, transformation to long format, statistical analysis, and graphical summary.

anove data-analysis experimental-design ezanove ggplot2 r repeated-measurements rstats statistics tidyverse

Last synced: 18 Sep 2025

https://github.com/ariyaarka/sales-analysis

A simple SQL analysis of a random pizza sales dataset

data-analysis presentation-slides sql

Last synced: 17 Jan 2026

https://github.com/rh01/data-analysis-with-r

Duke University - Data Analysis With R

data-analysis r r-language r-studio rmarkdown

Last synced: 23 May 2026

https://github.com/ljadhav25/healthcare-data-collection-and-analysis

This repository contains a project focused on collecting healthcare data from the web, storing it in a structured format, and performing comprehensive analysis. The objective is to gather valuable health-related information, process and clean the data, and derive insights to support healthcare research and decision-making.

data-analysis data-visualization flask-application flask-backend html-css-javascript pycharm-ide python

Last synced: 09 Apr 2026

https://github.com/ramapinnimty/udacity-mlfoundation-nanodegree

This is a repository containing solutions to the assignments that are a part of the Udacity Machine Learning Foundation Nanodegree program.

assignments data-analysis python3 statistics udacity-machine-learning-nanodegree

Last synced: 26 Jul 2025

https://github.com/sharoonjoseph321/indian-liver-diseases

Indian Liver Disease Analysis and Prediction. This project leverages the Indian Liver Patient Dataset (ILPD) to analyze liver disease trends and develop predictive models for early diagnosis. Through data preprocessing, exploratory analysis, and machine learning, it identifies key risk factors and builds classification models.

data-analysis data-science data-visualization logistic-regression machine-learning pandas python seaborn

Last synced: 27 Jul 2025

https://github.com/hfzdzakii/dicoding-shipclusteringanalysisdataandmodelling

This repo is a master submission for my Dicoding Final Project, which uses the Ship Performance Clustering Dataset. Feel free to explore; I hope my work gives you some insight!

clustering data-analysis machine-learning

Last synced: 27 Jul 2025

https://github.com/codeonthespectrum/web-scrap

This project scrapes Wikipedia to obtain data on the most populous municipalities in the state of Rio de Janeiro.

data-analysis data-visualization webscraping

Last synced: 16 Feb 2026

https://github.com/danymukesha/bioga

Apply multi-objective genetic algorithms to genomic data for biologically informed feature selection and pattern discovery.

data-analysis gene-expression genetic-algorithms genomics optimization-algorithms

Last synced: 18 Sep 2025

https://github.com/dmvianna/python-nix

Trivial Nix environment with pandas and postgresql

data-analysis nix

Last synced: 27 Jul 2025

https://github.com/grindelfp/data-analysis-example

One of my UNI Artificial Intelligence Systems course's projects.

data-analysis data-preprocessing ipynb

Last synced: 19 Sep 2025

https://github.com/tbep-tech/tbeploads

R Package for estimating nutrient loading to Tampa Bay

data-analysis loads package tampa-bay tbep tbnmc water-quality

Last synced: 19 Feb 2026

https://github.com/jofaval/ionosphere

Binary Classification of Ionosphere signals at Goose Bay, Labrador in 1988

data-analysis data-science data-visualization deep-learning google-colab keras machine-learning python scikit-learn tensorflow uci xgboost

Last synced: 09 Apr 2026

https://github.com/maazie-khan/olympics-data-enigeering

Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.

azure big-data data-analysis dataengineering devops pipeline

Last synced: 13 May 2026

https://github.com/tbep-tech/tbep-r-training

Repository for miscellaneous R training materials

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/tbep-tech/pep-graphics

Materials for generating PEP graphics

data-analysis pep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/tberf-oyster

Materials for evaluating TBERF oyster restoration success

ccmp-bh4 ccmp-bh6 data-analysis tampa-bay tbep tberf

Last synced: 19 Feb 2026

https://github.com/tbep-tech/pep-r-training

Materials for PEP R training

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/shafaq-aslam/predicting-heart-disease-risk-with-logistic-regression-techniques

Develop a predictive model using logistic regression techniques to assess heart disease risk based on patient health metrics and data analysis.

data-analysis heart-disease logistic-regression machine-learning machine-learning-models matplotlib numpy pandas python scikit-learn seaborn

Last synced: 09 Apr 2026

https://github.com/lucas-mazzolim/superstore-bi

Project where I prepared two data sources for querying and created a BI visualization in Data Studio. Tools used include MySQL, Looker Studio, Google Spreadsheet, and Python.

business-intelligence data-analysis data-visualization google-looker-studio mysql spreadsheet

Last synced: 27 Jul 2025

https://github.com/nandit123/python_on_excel

Data analysis of Excel data using Python libraries

csv data-analysis data-science fill fluctuations graph numpy python python-library

Last synced: 16 May 2026

https://github.com/tkhoa2711/twitter-hate-speech

Hate speech detection on Twitter

data-analysis python twitter

Last synced: 28 Jul 2025

https://github.com/dadvaiahpavan/ai-data-scientist-

AI-powered tool for dataset analysis, featuring data preprocessing, classification, regression, anomaly detection, and text analysis. Built with scikit-learn, pandas, and Plotly for visualization. Includes an interactive Streamlit web interface for real-time data analysis.

ai anomaly-detection classification data-analysis data-science machine-learning panda plotu regression scikit-learn sentiment-analysis streamlit

Last synced: 03 May 2026

https://github.com/tbep-tech/fim-seagrass

Materials for analysis of FIM data, seagrass, and other datasets

data-analysis fim seagrass tampa-bay

Last synced: 19 Feb 2026

https://github.com/tbep-tech/seagrass-analysis

Materials for assessing coverage changes and analysis of drivers of change for Tampa Bay seagrass

dashboard data-analysis seagrass tampa-bay water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/piney-point-analysis

Materials for analysis of Piney Point monitoring data

data-analysis open-science piney-point tampa-bay tbep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/peptools

Materials for wrangling and summarizing data from the Peconic Estuary

data-analysis package pep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/rookery-bay-training

Materials for R training at Rookery Bay Monitoring Workshop 2020

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/rohitblaze10/survey_monkey_analysis--using-ipython

This data analysis project focuses on extracting insights from survey responses. It involves data cleaning, merging, and transformation using IPython (pandas, os) and SQL. The goal is to identify trends and patterns in survey data for better decision-making.

data-analysis ipynb ipython-notebook

Last synced: 28 Jul 2025

https://github.com/ashwin331133/sql-project--sales-data-analysis--walmart

This SQL-based Walmart data analysis project aims to identify top-performing branches and products and to optimize sales strategies, using Kaggle's Walmart Sales Forecasting Competition dataset.

data-analysis eda sql

Last synced: 03 Jan 2026

https://github.com/labex-labs/sqlite-intermediate-to-advanced

In this course, delve into advanced SQLite techniques. Master constraints, indexing, joins, subqueries, transactions, triggers, views, full-text search, JSON, backups, PRAGMA tuning, CTEs, window functions, and more!

advanced-sql course data-analysis data-integrity data-manipulation data-modeling database database-design hands-on labex labs performance-tuning programming query-optimization relational-database schema-management sql sqlite stored-procedures transaction-management

Last synced: 18 May 2026

https://github.com/lit26/data_jobs_analyzing

Data analysis for data jobs

data-analysis topic-modeling

Last synced: 26 Mar 2025

https://github.com/archanakokate/eda_amazon_products_and_discounts_2023

Exploratory Data Analysis (EDA) on Amazon's 2023 Products and Discounts data

data-analysis data-mining data-visualization exploratory-data-analysis

Last synced: 03 Jan 2026

https://github.com/swethajoseph/statistical-stock-performance-analysis

Conducted a statistical analysis of Microsoft, Tesla, and Apple stock performance compared to the S&P 500, examining price trends, volatility, and correlations to derive investment insights.

advancedexcel comparative-analysis data-analysis data-visualization datapreparation descriptive-statistics moving-average msexcel performance-analysis performance-metrics regression-analysis statistical-analysis

Last synced: 03 Jan 2026

https://github.com/prateek5525/retail-sales-analysis-project

This project involves analyzing retail sales data using SQL to uncover insights into sales patterns, customer behavior, and product performance. It serves as an exercise to develop foundational SQL skills in data exploration, cleaning, and analysis.

data-analysis data-cleaning retail-sales-data sql

Last synced: 03 Jan 2026

https://github.com/hasinii12/-chocolate-analysis-dashboard

This Power BI report provides a comprehensive analysis of chocolate ratings and related attributes.

data-analysis data-visualization powerbi

Last synced: 09 Feb 2026

https://github.com/noorulhudaajmal/business-performance-analytics

Python-Streamlit based interactive dashboard to analyze and visualize key business metrics for an online store.

business-analytics dashboard data-analysis python-streamlit

Last synced: 29 Jul 2025

https://github.com/malakasupun/crime-data-analysis-of-lapd

This project aims to explore and analyse crime patterns in Los Angeles using a dataset spanning from 2020 to the present. The primary focus is to extract meaningful insights by integrating structured data analysis and advanced techniques in SQL and Natural Language Processing (NLP).

data-analysis data-visualization llm nlp sql

Last synced: 29 Jul 2025

https://github.com/yash22222/literacy-exploration-analysis

Delve into India's literacy landscape through data analysis. Uncover regional disparities, high/low literacy states & gender imbalances.

csv data-analysis data-visualization government-data india literacy literacy-analysis states

Last synced: 29 Jul 2025

https://github.com/nguyenda18/ppp-data-tool

Command line tool (could later be used as lambda function) to download CSV files from SBA and generate JSON

data-analysis nodejs-server ppp-files ppp-loans

Last synced: 29 Jul 2025

https://github.com/cyprianfusi/data-scientist-technical-exercise-10ds

Recommendations to the UK Department for Education on 10 Local Authorities where the National Tutoring Programme (NTP) should be intensified, and a response to the UK Secretary of Health regarding a 76% Accident and Emergency (A&E) performance target that seems far-fetched.

data-analysis data-cleaning data-visualization hypothesis-testing pandas-python policy statistics

Last synced: 21 Sep 2025

https://github.com/sandergi/ekichabi

A digital phonebook to connect subsistence farmers in Tanzania. Works via USSD so farmers without an internet connection can use it (via their telecom provider). Built with Django in Python and a MySQL database. This is a public copy of the private repo with user information stripped.

android data-analysis ict4d research ussd

Last synced: 14 May 2026

https://github.com/naso7y/twitter-sentiment-analysis

Classifies airline-related tweets as positive, negative, or neutral using machine learning and NLP.

data-analysis machine-learning nlp sentiment-analysis

Last synced: 29 Jul 2025

https://github.com/hemangsharma/bookingdataanalysisreport

The report helps understand key trends and insights around customer bookings, pricing, and other related attributes.

analysis data data-analysis data-analytics data-visualization streamlit streamlit-dashboard

Last synced: 14 May 2026

https://github.com/zulfachafidz/green_horizon_forecasting_peak_organic_avocado_sales_with_the_prophet_algorithm

The Green Horizon Project leverages the Prophet algorithm to predict peak sales of organic avocados, supporting the campaign "APEAM GO ORGANIC." Using Python and Looker Studio, this analysis aims to provide deep insight into sales trends and potential, forming the basis of smarter marketing strategies.

algorithm algorithms analytics data data-analysis data-engineering data-mining data-science data-visualization forecasting machine-learning machine-learning-algorithms prophet-model python python-script

Last synced: 17 May 2026

https://github.com/benmar2406/rent-in-germany

Interactive visualizations and maps depicting topics around rent prices and income in Germany built with Svelte.

charts d3 d3-visualization d3js data-analysis data-visualization gis gis-data infographic infographics map mapbox mapbox-gl mapbox-gl-js mapboxgl svelte

Last synced: 26 Mar 2025

https://github.com/ozep/genshincharacteranalysis

Uses a spreadsheet with Character Data and organizes it into readable graphs.

data-analysis jypyternotebook python

Last synced: 18 Apr 2026

https://github.com/sadratehranian/pem-fuel-cell

The methodology section details the use of Python for data processing and analysis, employing statistical and machine learning-based anomaly detection techniques to identify potential issues in fuel cell stacks. It emphasizes data preprocessing, feature engineering, exploratory data analysis (EDA), and anomaly detection.

anomaly-detection data-analysis data-science data-visualization exploratory-data-analysis feature-engineering fuel-cell machine-learning preprocessing python statistical-analysis visual-studio-code

Last synced: 26 Mar 2025

https://github.com/antrita/stroke_prediction_model

A model that combines Kaggle's Stroke Prediction Dataset with live weather and air quality data to implement an FDA-compliant MLOps pipeline, demonstrating expertise in healthcare regulations and real-time inference.

ai data-analysis deep-learning kaggle-dataset machine-learning prediction-model random-forest real-time scikit-learn streamlit weather-api xgboost

Last synced: 07 May 2026

https://github.com/sinsunsan/earth-survival-kit

Global warming data visualisation app to help everyone understand global warming and take actions that matter

angular angular7 d3 data-analysis data-visualization ecology global-warning ngx-charts

Last synced: 05 May 2026

https://github.com/nathadriele/transaction_fraud_prevention_pipeline

A solution for detecting and preventing fraud in financial transactions, combining machine learning, business rules, and advanced statistical analysis. The system offers an interactive dashboard for real-time monitoring, data analysis, and fraud alert management.

data-analysis data-visualization docker fraud-prevention machine-learning matplotlib numpy pandas pipeline pytest python scikit-learn scipy seaborn streamlit tensorflow transaction xgboost

Last synced: 10 Apr 2026

https://github.com/sidsin0809/hmdb-endo-flagger

A Python toolkit to identify and score endogenous human metabolites from HMDB XML metadata

data-analysis hmdb metabolomics ontology pipeline python-3 streaming-parser xml-parsing

Last synced: 06 Jul 2025

https://github.com/amoghkori/deeplabcut-package-for-animal-pose-estimation

DeepLabCut Mouse Location Prediction: Training a deep neural network to predict the location of a mouse using annotated joint positions.

data-analysis data-annotations data-preprocessing deep-learning machine-learning model-evaluation python-programming research research-project

Last synced: 17 Mar 2025

https://github.com/karencofre/riesgorelativo-lookerstudio

Data analysis and predictive analysis project in Looker Studio and Google Colab

bigquery data-analysis data-science machine-learning matplotlib python sklearn sql

Last synced: 03 Jan 2026

https://github.com/jackmnob/python-tableau-eda-stockdash

Data cleaning, preparation, and manipulation (EDA) for an interactive stock market dashboard with Tableau - using pandas (Python) via JupyterLab

cleaning-data dashboard data-analysis data-preparation eda jupyter-notebook jupyterlab python tableau-public

Last synced: 14 May 2026

https://github.com/kartikey2807/bike-classification-1rt700

Binary classification problem involving Logistic regression, SMOTE and feature expansion.

data-analysis data-engineering data-visualization logistic-regression

Last synced: 30 Jul 2025

https://github.com/sanveed-adnan/supermarket-sales-sql-project

SQL-based data analysis project on supermarket sales performance using SQLite and Power BI.

business-intelligence data-analysis data-science data-science-projects data-visualization power-bi sales-data sql sqlite

Last synced: 08 Nov 2025

https://github.com/alanmenchaca/getting-and-cleaning-data-course-project

The purpose of this project is to demonstrate how to collect, work with, and clean a data set.

data-analysis getting-and-cleaning-data rstudio tidy-data

Last synced: 31 Jul 2025

https://github.com/teamtigers/echartify

A web application built with .NET Core 2.2 that reads the national election dataset of Bangladesh as quickly as possible and presents it in various statistical charts.

bangladesh chartjs code-first-migration cross-platform data-analysis data-structures data-visualization dotnet-core election-analysis election-data entity-framework-core materializecss mvc npoi razor-pages

Last synced: 16 Apr 2026

https://github.com/derogative404/google_data_analytics_capstone

Capstone project part of the Google Data Analytics Certificate Program

data-analysis excel r tableau

Last synced: 26 Mar 2025

https://github.com/alrza2003/google-data-analysis-case-study-cyclistic

This project analyzes Cyclistic’s trip data to identify patterns in bike usage between casual riders and annual members. The findings help optimize marketing strategies and membership conversions.

business-task cyclistic-bike-share-analysis-case-study data-analysis data-science data-visualization google-data-analytics google-data-analytics-capstone-project google-data-analytics-professional jupyter-notebook python rmarkdown tableau

Last synced: 09 May 2026

https://github.com/ayeshathoi/simulation-sessional-412

Simulation of SSQS, Inventory System, Transient State, PERT, Monte Carlo algorithm, etc.

data-analysis excel inventory-system monte-carlo python simulation ssqs triangle-distributions

Last synced: 31 Jul 2025

https://github.com/mainak-97/netflix-content-analysis-project

SQL-based analysis of Netflix’s movies and TV shows dataset to uncover content trends, popular genres, geographical insights, and audience preferences. Includes data queries, findings, and a presentation of key insights.

data-analysis mysql mysql-workbench powerpoint presentation-slides sql

Last synced: 23 Sep 2025

https://github.com/xenon1919/credit-card-fraud-detection

Credit Card Fraud Detection is a machine learning project to predict fraudulent credit card transactions. It handles imbalanced data using undersampling and applies Logistic Regression and XGBoost models. With an AUC of 0.98, it offers robust fraud detection. Includes a Streamlit app for real-time predictions.

data-analysis machine-learning python

Last synced: 14 May 2026

https://github.com/remram44/apex-legends-ocr-data

Get data from Apex Legends streams using OCR

apex-legends data-analysis video-games

Last synced: 31 Jul 2025

https://github.com/farrelfaricaf/exploratorydataanalyst---titanic

This project analyzes the Titanic dataset using exploratory data analysis (EDA) and visualization techniques to identify survival patterns. The goal is to understand how demographic factors like gender and age influenced survival rates during the 1912 disaster.

data data-analysis data-science data-visualization eda python titanic-dataset

Last synced: 31 Jul 2025

https://github.com/pauliorandall/airline-passenger-satisfaction-r

Analysing the Airline Passenger Satisfaction dataset from Maven Analytics

data-analysis data-analytics r

Last synced: 01 Aug 2025

https://github.com/computingvictor/mercadona_agent

Web app to explore supermarket products with advanced filters, search, favorites, and nutritional info. Includes data analysis notebooks for deeper insights.

css data-analysis data-science data-visualization filtering html interactive-ui javascript notebooks nutritional-info pandas product-catalog python supermarket webapp

Last synced: 09 Apr 2026

https://github.com/darkdk123/handwashing-discovery-analysis

A guided bootcamp project analysing the original data behind Dr. Ignaz Semmelweis's discovery of the benefits of hand washing at Vienna General Hospital in the 1840s.

data-analysis data-science data-visualization matplotlib-pyplot numpy pandas plotly-python python seaborn-plots

Last synced: 09 Apr 2026

https://github.com/kaushik-puttaswamy/amazon-sales-dashboard-using-tableau

The Amazon Sales Data Analysis Dashboard provides insights into key sales metrics like profit, revenue, shipment days, and units sold. It includes visualizations to assess performance by region, country, and sales channel. The dashboard helps stakeholders optimize strategies and improve profitability through data-driven analysis.

dashboard data-analysis data-visualization tableau

Last synced: 11 Jan 2026

https://github.com/mysftz/numerical-methods-in-matlab

MATLAB scripts from several data analysis assignments.

data-analysis data-science matlab university university-assignment

Last synced: 14 May 2025

https://github.com/aygp-dr/claude-log-stream

Advanced analytics engine for Claude Code logs with real-time processing capabilities

claude-api clojure data-analysis monitoring

Last synced: 24 Sep 2025

https://github.com/palwisha-18/time_series_analysis_lex_vs_gdp

Analyzes how a country’s GDP per capita correlates with the life expectancy of its citizens over a period of more than 100 years

data-analysis data-visualization pandas plotl time

Last synced: 19 May 2026

https://github.com/aravind2060/employee_engagement_analysis_spark

Using Spark Structured APIs to analyze employee data and extract insights related to employee satisfaction, engagement, concerns, and job titles within an organization.

apache-spark data-analysis data-preprocessing docker docker-compose python

Last synced: 09 Apr 2026

https://github.com/jasoncobra3/finops-copilot

An end-to-end AI-powered FinOps platform that ingests cloud billing data, analyzes cost trends, answers natural-language questions using a RAG pipeline (LangChain + FAISS + sentence-transformers + Groq), and provides actionable cost optimization recommendations. Includes a FastAPI backend and Streamlit dashboard UI - fully containerized with Docker

ai-assistant cloud-cost-optimization cloud-enginee cost-analytics data-analysis devops docker faiss faiss-vector-database fastapi finops groq langchain llm pandas rag rag-pipeline sentence-transformers sqlite3 streamlit

Last synced: 13 Apr 2026