An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/sujata-adhikari/data-analysis

Data analysis of Market sales data using PowerBi, created dashboard to show analysis.

data-analysis excel pandas powerbi

Last synced: 12 Jun 2026

https://github.com/chaganti-reddy/weather-prediction-australia

Creating a fully-automated system that can use today's weather data for a given location to predict whether it will rain at the location tomorrow.

data-analysis logistic-regression machine-learning prediction-model python3

Last synced: 13 Apr 2026

https://github.com/datalopes1/warehouse_rfv

Neste projeto será realizada uma análise do tipo RFV (Recência, Frequência e Valor) com dados que encontrei neste video no Youtube do canal Jie Jenn.

analise-rfv data-analysis data-science kmeans python rfm-analysis

Last synced: 28 Apr 2026

https://github.com/emmanuelletocs/steam-game-recommender

A powerful recommendation system for Steam games, combining Content-Based and Collaborative Filtering techniques. Built with Python, Scikit-learn, and Streamlit to deliver accurate, real-time game recommendations. Perfect for gamers and data scientists interested in building intelligent recommendation engines.

als-algorithm data-analysis gaming-industry knn machine-learning mds mysql ncf neural-network pyspark recommendation-engine recommendation-system scikit-learn spark

Last synced: 28 Apr 2026

https://github.com/jovicdev97/financial-loan-datascience-notebook

using numpy and pandas to analyze a synthetic loan dataset with python

data-analysis matlabplot numpy pandas plotting python seaborn

Last synced: 28 Apr 2026

https://github.com/ovuiproduction/tabletalk

TableTalk - Your Data, Your Language Query tabular data using natural language—no SQL required! Upload your data, ask questions, and get instant insights. 🔹 Convert Natural Language to SQL 🔹 Handle Complex Queries & Aggregations 🔹 Upload CSVs for Easy Analysis 🔹 React + Flask + SQLite3 Backend 🔹 Powered by LLMs for Accuracy

ai data-analysis flask llm machine-learning natural-language-processing prompt-engineering react sql sqlite

Last synced: 28 Apr 2026

https://github.com/george-njuguna/spotify-etl-pipeline

This is an ETL pipeline that uses Spotify API , Docker and Airflow

apache-airflow data-analysis docker pipelines python

Last synced: 28 Apr 2026

https://github.com/rorrell/coviddeaths

A Jupyter Notebook where I create several visualizations based on data about COVID-19 deaths from 2020 to 2024

data-analysis data-visualization jupyter-notebook python3

Last synced: 28 Apr 2026

https://github.com/luminati-io/Target-dataset-samples

A sample dataset of over 1000 target products, extracted using the Bright Data API, ideal for brand reputation, tracking inventory, and optimizing prices.

api data-analysis data-mining datasets target web-scraper web-scraping

Last synced: 09 Apr 2025

https://github.com/buabaj/fortran-assignment

code repository for fortran and python climatology assignment.

big-data climatology data-analysis data-visualization fortran90 python

Last synced: 28 Apr 2026

https://github.com/priyanshubiswas-tech/e-commerce_data_analysis

Analyzes 9,994 e-commerce transactions to uncover insights on sales trends, customer behavior, profitability, and logistics using EDA and visualization. Identifies top products, customer segments, and shipping efficiencies to optimize marketing, inventory, and operations, making it valuable for retail, finance, and logistics.

data data-analysis data-visualization pandas pandas-dataframe plotly-analytics-projects plotly-express python

Last synced: 28 Apr 2026

https://github.com/nlink-jp/shell-agent-v2

macOS local-first chat & agent tool with interactive data analysis (Wails v2 + React)

data-analysis duckdb golang llm macos react wails

Last synced: 13 May 2026

https://github.com/josedanielchg/efficient-data-storage-for-predictive-modeling

DataCamp project from the Associate Data Scientist track, focusing on optimizing dataset storage by transforming data types and filtering. Prepares data for efficient machine learning workflows

cleaning-dataset data-analysis jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/kisaa-fatima/data-visualization-with-tableauleu

Conducted Exploratory Data Analysis (EDA) on the Berkeley Earth Dataset (large scale dataset), which features high-resolution land and ocean time series data. Created interactive dashboards using Tableau to effectively visualize and highlight trends and patterns within the data.

data-analysis data-science exploratory-data-analysis insights python tableau visualizations

Last synced: 29 Apr 2026

https://github.com/chrispsang/healthcare-dataanalysis

Analyze synthetic patient data to identify trends, improve healthcare delivery, and predict patient outcomes using machine learning models. Includes data exploration, preprocessing, model building, and visualizations.

data-analysis data-science data-visualization healthcare jupyter-notebook machine-learning python

Last synced: 29 Apr 2026

https://github.com/luminati-io/Indeed-dataset-samples

A sample dataset of over 1000 Indeed job listings, extracted using the Bright Data API, ideal for market analysis and growth.

api data-analysis datasets indeed jobs web-scraping

Last synced: 09 Apr 2025

https://github.com/devexpress-examples/winforms-visualize-pivot-grid-data-in-chart

The following example shows how to integrate the Pivot Grid with the Chart control.

charting data-analysis dotnet pivot-grid-for-winforms winforms

Last synced: 29 Apr 2026

https://github.com/vanshuchaudhary/zomato

This Jupyter Notebook contains an exploratory data analysis (EDA) of Zomato restaurant data. It includes data cleaning, visualization, and insights into restaurant ratings, pricing, cuisine distribution, and location-based trends.

business-analytics data-analysis data-mining data-science data-visualization datascience matplotlib pandas-dataframe pandas-python python python-3 python-library

Last synced: 29 Apr 2026

https://github.com/satyacoder29/crowdfunding-in-sql

Crowdfunding is a method of raising funds for projects or causes by collecting small contributions from a large group of people, usually through online platforms. It enables individuals, startups, and nonprofits to secure funding, offering rewards or recognition in exchange, and helps bring ideas to life without traditional financing.

data-analysis data-cleaning database-management mysql-database quries sql sql-functions sql-server views

Last synced: 29 Apr 2026

https://github.com/anilyigitsel/istanbul-rental-apartments-analysis

This project analyzes the Istanbul Rental Apartments Dataset (2025), which includes rental apartment listings from Istanbul, Turkey.

data-analysis data-visualization jupyter-notebook matplotlib pandas python rental-housing

Last synced: 29 Apr 2026

https://github.com/alexgenovese/react-charts-covid-19-data

Examples on COVID-19 data using different library charts: G2, G2Plot, Plotly, ApexCharts

data-analysis data-science data-visualization react reactjs

Last synced: 13 May 2026

https://github.com/ibrahimhabibeg/national-university-of-singapore-sms-analysis

Analysis of SMS messages collected by the National University of Singapore

analytics data-analysis data-science nlp python

Last synced: 13 May 2026

https://github.com/rafgpereira/obmep-analise

Código que analisa a retrospectiva das premiações da Obmep em determinada localidade e escola

data-analysis excel pandas python

Last synced: 29 Apr 2026

https://github.com/vedanty3/supermarket-sales-data-analysis

This project contains data visualization techniques (using pandas and matplotlib) to explore different aspects of supermarket sales data of 3 months.

data-analysis data-science jupyter-notebook matplotlib numpy pandas python

Last synced: 08 May 2026

https://github.com/taljindergill78/yelp-arizona-analysis

This project analyzes the Yelp dataset for the state of Arizona to extract insights about restaurant businesses and user behavior. Using Apache Spark and PySpark for distributed data processing, the project demonstrates how big data tools can be used to uncover patterns in customer reviews, business performance, and user engagement.

big-data data-analysis data-engineering distributed-computing pyspark spark sql yelp-dataset

Last synced: 29 Apr 2026

https://github.com/marina-gal/elderly-care-ranking

Data analysis and scoring model for elderly care homes, including data cleaning, transformation, 0–100 scoring, and ranking across multiple quality dimensions.

data-analysis excel ranking

Last synced: 30 May 2026

https://github.com/farhad-here/textprepx

A Multilingual Text Preprocessing Tool for English and Persian.

cleantext contractions data-analysis deep-learning emoji nlp nltk opp parsivar regex streamlit text-preprocessing textblob

Last synced: 29 Apr 2026

https://github.com/sharoonjoseph11/indian-liver-diseases

Indian Liver Disease Analysis and Prediction This project leverages the Indian Liver Patient Dataset (ILPD) to analyze liver disease trends and develop predictive models for early diagnosis. Through data preprocessing, exploratory analysis, and machine learning, it identifies key risk factors and builds classification models

data-analysis data-science data-visualization logistic-regression machine-learning pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/sdley/cas_pratique-del_annuel

Del-Annuel est logiciel de deliberation annuelle des ecoles superieures ou universités

data-analysis pandas python tkinter-gui

Last synced: 29 Apr 2026

https://github.com/varshan1123/sql-tableau-project

We analyze key indicators for our pizza sales data to gain insights into our business performance - A Data Analysis Project performed on Tableau & SQL.

analysis data-analysis data-science data-visualization excel mysql powerbi sql sql-server tableau tableau-dashboards

Last synced: 29 Apr 2026

https://github.com/brevex/code-complexity-data-analisis

Data collection that shows different complexity scores in an algorithmic dataframe.

code-analysis data-analysis data-science python

Last synced: 29 Apr 2026

https://github.com/dindagustiayu/data-processing

The digital text book to interpreting characterisation results.

characterisation data-analysis gitbook latex-package myst qualitative-analysis quantitative-analysis

Last synced: 08 Jun 2026

https://github.com/supertetelman/frc-data-analysis

A Collection of R, Matlab, and Bash scripts that were developed in real-time from the stands of a FRC competition. Gathered data from various online sources, parsed it, and ran some basic analysis on it to calculate ratings and make basic match predictions. Results were mad public and hosted live via AWS. Developed as a student teaching tool under poor Internet Connectivity with minimal access to real-time match data.

bash data-analysis matlab r teaching

Last synced: 29 Apr 2026

https://github.com/yimethan/basics-of-data-analysis

2023-2 Basics of Data Analysis

data-analysis numpy pandas python

Last synced: 29 Apr 2026

https://github.com/ahshah322/world-happiness-report-2025

Data analysis and visualization of the World Happiness Report 2025 using Python (pandas, seaborn, matplotlib). Explores how GDP, health, freedom, generosity, and corruption perception influence global happiness.

data-analysis data-science matplotlib numpy pandas python seaborn worldhappiness

Last synced: 29 Apr 2026

https://github.com/avazasgarov/soccer-hypothesis-testing

Statistical analysis comparing goal-scoring patterns in Men’s vs. Women’s FIFA World Cups using hypothesis testing.

data-analysis eda hypothesis-testing matplotlib-pyplot pandas pingouin python scipy

Last synced: 30 Apr 2026

https://github.com/mxagar/eda_fe_summary

An 80/20 guide for Data Processing: Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection.

data-analysis data-cleaning data-modeling data-science data-visualization eda exploratory-data-analysis feature-engineering feature-selection machine-learning pandas

Last synced: 30 Apr 2026

https://github.com/rijul007/smartwatch-data-analysis-using-python

Smartwatch Data Analysis to uncover insights into health and activity patterns using Python for data cleaning, exploratory analysis, and interactive visualizations.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas plotly python

Last synced: 30 Apr 2026

https://github.com/mfakhriazhar/nlp-movie-recommender-system

This project is a content-based movie recommender system built using Natural Language Processing (NLP) techniques. By extracting and combining important text features from movie metadata, this system suggests movies that are similar to a user's selected title.

data-analysis data-science deep-learning machine-learning natural-language-processing python recommender-system

Last synced: 30 Apr 2026

https://github.com/deliprofesor/joblocationmapper

JobLocationMapper is a Python tool that visualizes job listings on an interactive map. It uses city and state data to place job markers accurately and color-codes them by occupation (Software, Marketing, Design). The map clusters markers for better organization, and users can click on them to view job details.

clustrered-markers data-analysis data-visualization folium geocoding geographical-visualization interactive-map job-listings map-visualization pandas python

Last synced: 14 May 2026

https://github.com/aniketmondal/dataanalysis

Contains cleaning, transformation, and exploratory analysis of various data sets using Python Pandas, NumPy, re, random, etc.

analysis data-analysis data-science pandas python

Last synced: 30 Apr 2026

https://github.com/busra-deveci/kaggle-iris_data_analysis

Exploratory data analysis and visualization of the Iris dataset using Python.

data-analysis iris-dataset kaggle pandas python seaborn visualization

Last synced: 30 Apr 2026

https://github.com/badranalyst/e-commerce-customer-analysis-data-science-foundations-case-study

This case study explores e-commerce customer data through data exploration, pre-processing, and splitting. It includes model building and training to analyze customer behavior. Python libraries like Pandas, NumPy, Matplotlib, and Seaborn are used for the analysis and model development.

data-analysis data-science dataset eda exploratory-data-analysis machine-learning matplotlib ml model-building model-training numpy pandas pre-processing python seaborn

Last synced: 01 May 2026

https://github.com/syarwinaaa09/investigating-netflix-movies

🎬 investigating netflix movie trends using python and pandas 📊

csv data-analysis matplotlib netflix pandas visualization

Last synced: 01 May 2026

https://github.com/devag2004/electricity-analysis-using-spark

electricity analysis project made using spark

data-analysis spark spark-mllib

Last synced: 01 May 2026

https://github.com/manganite/vibespin

VibeSpin is a Python framework for simulating and analyzing 2D lattice spin systems (Ising, XY, and q-state Clock models) with Numba-accelerated Monte Carlo dynamics, correlation/structure diagnostics, and reproducible benchmarking workflows.

clock-model critical-phenomena data-analysis ising-model lattice-models monte-carlo-simulation phase-transitions physics-simulation python scientific-computing spin-models spin-systems statistical-mechanics xy-model

Last synced: 29 Jun 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/caesaredia/la-cafe-market-analysis

A data-driven feasibility study exploring the potential of launching a robot-staffed café in Los Angeles, based on real F&B business data.

business-intelligence cafe data-analysis data-visualization food-industry franchise los-angeles market-research pandas python

Last synced: 01 May 2026

https://github.com/monish-nallagondalla/sensor_fault_detection

This repo contains sensor data for analysis, focusing on sensor readings, their attributes, and classification (Good/Bad). It includes 500+ sensors with features for predictive modeling, anomaly detection, and sensor failure prediction.

anomaly-detection classification data-analysis data-science machine-learning predictive-modeling python sensor-data

Last synced: 01 May 2026

https://github.com/parth-jatav/super-store-analysis-project

The Super Store Analysis project leverages Python libraries such as pandas, matplotlib, and numpy to perform a comprehensive analysis of a retail store's data. This project includes data cleaning, visualization, and statistical analysis to identify key trends, optimize inventory, enhance decision-making processes for improved business performance.

data-analysis matplotlib numpy pandas python super-store

Last synced: 12 Apr 2026

https://github.com/kavicastelo/soil-fertilizer-analysis-colab

This repository includes a data analysis and model training practical Jupyter notebooks using a soil fertilizer dataset. (use 4th edition)

data-analysis jupyter-notebook python

Last synced: 01 May 2026

https://github.com/nurulashraf/hierarchical-clustering-customer-segmentation

A customer segmentation project using hierarchical clustering to group customers based on their spending behaviour and demographics. This helps businesses identify patterns and create targeted marketing strategies.

business-analytics clustering-algorithm customer-segmentation data-analysis hierarchical-clustering machine-learning python unsupervised-learning

Last synced: 18 May 2026

https://github.com/kheriberto/pandas_and_seabron_project

In this project I showcase my ability using pandas and seaborn to mold, transform and plot data.

data-analysis pandas python seaborn

Last synced: 01 May 2026

https://github.com/balajimohan18/foreign-exchange-rate-time-series-datascience-project

This project will use time series analysis to forecast the exchange rate between the euro and the US dollar. The project will use a variety of statistical techniques, such as ARIMA to model the data and forecast the exchange rate.

data-analysis data-analytics data-preprocessing data-science data-transformation data-visualization eda exploratory-data-analysis foreign-exchange-rates machine-learning model-fitting predictive-modeling python3 time-series time-series-analysis

Last synced: 14 May 2026

https://github.com/guptakushal03/whatsapp-chat-analyser

The WhatsApp Chat Analyzer is a Python-based tool built with Streamlit for analyzing WhatsApp chat data. It provides insights such as total messages, word count, media shared, links shared, monthly activity timeline, most active users, activity maps, and word clouds.

chat-analysis data-analysis data-visualization python streamlit text-processing whatsapp word-cloud

Last synced: 01 May 2026

https://github.com/mxagar/data_science_udacity

My personal notes, code and projects of the Udacity Data Science Nanodegree.

dashboard data-analysis data-engineering data-science machine-learning-pipelines

Last synced: 09 Apr 2025

https://github.com/faithererer/haokanvideo_spider

好看视频爬取与数据分析

data-analysis data-visualization python spider

Last synced: 02 May 2026

https://github.com/yashsingh43/lung-cancer-biomarker-analysis

Gene expression analysis to identify biomarkers for early lung cancer detection (SCLC & NSCLC)

bioinformatics biomarkers cancer cytoscape data-analysis gene-expression gsea nsclc r sclc

Last synced: 11 Jun 2026

https://github.com/ronitjariwala/prodigy_ds_04

Prodigy InfoTech Data Science Internship Task-4

data-analysis data-science data-visualization python

Last synced: 02 May 2026

https://github.com/bhawnagoyal18/ai-doctor-a-symptom-checker-disease-predictor

AI Doctor is an intelligent healthcare application that utilizes machine learning (ML) and Python to predict potential diseases based on user-input symptoms. The project integrates data from multiple medical datasets and provides an interactive web-based UI for an intuitive user experience.

data-analysis data-engineering data-visualization dataset flask html5 machine-learning python sql stacking statistics

Last synced: 02 May 2026

https://github.com/isaqueiros/motorpremium-predictions-mlpclassifier

This Jupyter Notebooks is an initial study of the application of sklearn neural network MLP Classifier model. The model is applied to dataset MotorPremiums, which is supplied separately in .csv format.

data-analysis data-science machine-learning neural-network python sklearn-library

Last synced: 02 May 2026

https://github.com/fatihilhan42/spotify-songs-recommendations-system_with_python

We developed a song recommendation system for the user with the data we received from our Spotify song dataset. Data set and other applications are given in the description. Have a nice day.

data-analysis data-science data-visualization jupyter-notebook python recommendation-engine recommendation-system

Last synced: 02 May 2026

https://github.com/m0saan/python-for-data-analysis

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney,

data-analysis data-science ipython-notebook machine-learning matplotlib numpy pandas python

Last synced: 02 May 2026

https://github.com/iamsainikhil/web-data-scraping

Data scraping from a webpage using Python

beautiful-soup data-analysis data-scraping python

Last synced: 11 Jun 2026

https://github.com/skuschel/postexperiment

postprocessor for experimental (event based) data.

data-analysis eventstore hacktoberfest postprocessing

Last synced: 12 Jun 2026

https://github.com/inevolin/multivariate-data-analysis

Showcases of modern multivariate & multidimensional data analysis in industrial and high-tech settings.

analytics data-analysis data-science data-visualization javascript

Last synced: 09 Jun 2026

https://github.com/maddieemihle/pandas-challenge

Python analysis to create and manipulate school and standardized test data. Scores are calculated, grouped, aggregated, summarized, and organized using pandas.

data-analysis pandas-python

Last synced: 09 Jun 2026

https://github.com/bhavna-kale/cars-eda-project

Project analyzing used car market data to identify high-impact price drivers and depreciation curves, presented through an interactive web application.

data-analysis excel matplotlib numpy pandas python3 searborn streamlit

Last synced: 03 May 2026

https://github.com/monteirooscar98/tarifas-publicas-sp-dieese

Extração de dados através de WebScraping no site do Dieese e Analise em relação as Tarifas Públicas do Município de São Paulo.

data-analysis data-visualization python webscraping

Last synced: 03 May 2026

https://github.com/chaedoll/analysis-python-foreignerinfra

국내 외국인 대상 인프라 개선을 위한 보고서 (Report on improving infrastructure for foreigners)

data-analysis python team-project

Last synced: 03 May 2026

https://github.com/sambit-mondal/stockx

StockX is a full-stack application designed to help store owners efficiently manage their inventory, track purchases, and analyze stock levels. The system integrates MongoDB, Express, React, and Flask (Python) to provide a seamless experience.

artificial-intelligence data-analysis inventory-management-system machine-learning mern-stack

Last synced: 12 Jun 2026

https://github.com/vipulbunny/restaurant-insight-analysis

A comprehensive data analysis project exploring restaurant ratings, locations, and customer sentiments. This project includes data preprocessing, descriptive analysis, geospatial mapping, sentiment analysis, and price-rating correlations using Python and visualization tools.

data-analysis data-preprocessing data-visualization folium geospatial geospatial-analysis geospatial-visualization machine-learning nlp pandas python restaurant-insights seaborn sentiment-analysis

Last synced: 03 May 2026

https://github.com/muskanmi/data_analysis_python

Data analysis on students result dataset using python libraries.

boxplot countplots data-analysis numpy pandas pie-chart python3 seaborn

Last synced: 03 May 2026

https://github.com/ankitgmishra/machinelearning

Continuously deep diving in understanding & advancing my expertise in Machine Learning through ongoing education and hands on experience with practical learning.

artificial-intelligence data-analysis data-cleaning data-gathering machine-learning machinel-learning-algorithms matplotlib numpy pandas python seaborn

Last synced: 03 May 2026

https://github.com/ggarciajavier/udacity-dalf-project4-identify-fraud-enron-email

Work performed for the 4th project of the Udacity Data Analyst Nanodegree: machine learning classifier for identifying fraud in Enron email corpus.

data-analysis data-science machine-learning nlp-machine-learning python python27

Last synced: 03 May 2026

https://github.com/busradeveci/student-performance-prediction

A machine learning project to predict student exam performance based on academic, social, and personal features. Built with Python and scikit-learn.

data-analysis kaggle linear-regression machine-learning predictive-modeling python scikit-learn student-performance

Last synced: 25 Apr 2025

https://github.com/saksham-jain177/cryptodataanalysis

A Python powered project that fetches live cryptocurrency data from the CoinMarketCap API, analyzes it, and updates a live Excel sheet every 5 minutes.

api-integration coinmarketcap cryptocurrency data-analysis excel live-data python

Last synced: 12 Jun 2026

https://github.com/nurulashraf/logistic-regression-loan-prediction

Loan approval prediction using logistic regression based on applicant data, including income, credit history, and property details, after data preparation and feature engineering.

data-analysis data-science loan-prediction logistic-regression machine-learning predictive-modeling python sklearn

Last synced: 03 May 2026