An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/brunomontezano/benzocovid

💊 Data Analysis Project of Benzodiazepines during COVID-19 Pandemic.

benzodiazepines covid-19 data-analysis

Last synced: 28 Feb 2025

https://github.com/as16082023/hotel-booking-analysis-eda-

Exploratory Data Analysis on hotel booking data using Python

data-analysis data-visualization exploratory-data-analysis jupyter-notebook python

Last synced: 29 Apr 2026

https://github.com/mafda/seattle_airbnb_data_analysis

This repository contains a comprehensive analysis of the Seattle Airbnb dataset, conducted using the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology.

crisp-dm data-analysis data-science jupyter-notebook pandas-python seattle-data

Last synced: 29 May 2026

https://github.com/jiachengwang-punch/predictive-analytics-skill

A reusable, multi-model, language-adaptive methodology for end-to-end machine learning analysis of tabular data.

claude-skill codex-skill data-analysis data-science deepseek feature-engineering lightgbm llm machine-learning methodology prompt-engineering tabular-data

Last synced: 30 May 2026

https://github.com/prernarohra/quakeguard

QuakeGuard is an innovative project for reducing earthquake intensity and structural damage. It takes a proactive approach to seismic activity, by using complex algorithms and real-time data to improve safety and resilience for people in earthquake-prone areas.

artificial-intelligence backend data-analysis data-science earthquake-intensity final-year-project front-end geology machine-learning open-source python visualization

Last synced: 21 May 2026

https://github.com/randomshek/Working-With-Excel

Reorganises data into a star schema using Excel Power Query and PowerPivot, and showcases reports that data analysts can create with DAX formulae and PowerPivot.

data-analysis excel power-pivot power-query

Last synced: 20 Jul 2025

https://github.com/kishlayjeet/zomato-data-exploration

In this project, we will be exploring a dataset containing information on various restaurants and their ratings, location, and other attributes.

data-analysis eda matplotlib numpy pandas zomato-data-exploration

Last synced: 10 Apr 2026

https://github.com/prernarohra/mental-health-prediction

This project focuses on predicting mental health outcomes using machine learning algorithms. By analyzing various psychological, social, and lifestyle factors, the model aims to identify individuals at risk, enabling early intervention and support.

data-analysis data-science data-visualization machine-learning mental-health python

Last synced: 20 May 2026

https://github.com/saeun-park/lg-aimers-4th

Development of a prediction model for B2B sales opportunity creation based on MQL data

b2b data-analysis data-science machine-learning mql

Last synced: 08 Apr 2025

https://github.com/zen204/airbnb-availability

A machine learning model that predicts Airbnb listing availability, utilizing feature engineering and supervised learning techniques to improve guest experience and optimize host management.

binary-classification data-analysis data-preprocessing data-visualization feature-engineering machine-learning matplotlib model-evaluation nlp pandas predictive-modeling python scikit-learn seaborn supervised-learning

Last synced: 21 Jan 2026

https://github.com/robthepcguy/ahk-mouse-heatmap

An AutoHotkey script that records left, right, and middle mouse clicks, logging the date, time, and x and y coordinates. It features automatic GUI updates and generates a visual heatmap via a Python script, accessible from the system tray. This tool is ideal for analyzing user interaction and creating detailed mouse activity maps.

autohotkey-script click-counter click-log click-map data-analysis data-visualization heatmap heatmap-visualization mouse-events mouse-tracking python

Last synced: 01 Apr 2025

https://github.com/m-faizan-mahmood/detailed-exploratory-data-analysis-eda-marketing-recomendations.

This project focuses on cleaning, preprocessing, and analyzing data using Pandas and NumPy. Key steps include handling missing values, removing outliers, feature engineering, and exploratory data analysis (EDA). Visualizations with Matplotlib and Seaborn highlight trends in customer spending, campaign performance, and product sales.

big-data data-analysis data-processing data-science eda exploratory-data-analysis numpy pandas python

Last synced: 11 Apr 2026

https://github.com/ondrejhruby/countries-of-the-world

Explore global data with this repository, featuring insights, visualizations, and Python code examples on countries worldwide—perfect for enhancing your data analysis and visualization skills.

data-analysis data-science data-visualization geography jupyter-notebook machine-learning matplotlib pandas python statistics

Last synced: 16 Apr 2026

https://github.com/prernarohra/heart-disease-prediction

This project develops a machine learning model to predict heart disease risk based on symptoms and medical history. The model achieved the best accuracy with Logistic Regression, as it works well for binary classification problems.

artificial-intelligence data-analysis data-science dataset heartdisease-prediction machine-learning models

Last synced: 06 Nov 2025

https://github.com/sandergi/designbuildfly

Useful tools made for Design Build Fly at UW, hosted on Glitch so teammates can easily access. Check out our optimization tools here: https://github.com/JPaonaskar/DBF-Optimization

data-analysis inav-blackbox webapp

Last synced: 01 Apr 2025

https://github.com/aaryan-agr/canadian-energy

This project analyzes Canada's energy trade, focusing on imports, exports, and market trends in the energy sector.

data-analysis data-cleaning data-manipulation data-processing data-science data-vizualisation energy-sector time-series-analysis

Last synced: 10 Jun 2025

https://github.com/solrikk/pictrace-web

PicTraceV2 is a highly efficient image matching platform that leverages computer vision using OpenCV, deep learning with TensorFlow and the ResNet50 model, asynchronous processing with aiohttp, and Selenium for browser automation. PicTraceV2 allows users to upload images directly or provide URLs, quickly scanning a vast database to find image

automation computer-vision data-analysis data-extraction deep-learning image-processing image-search machine-learning natural-language-processing opencv openpyxl pandas python selenium tensorflow web-scraping yandex yandex-api

Last synced: 12 Apr 2026

https://github.com/whitehathackerpr/data-visualization-tool

This is a Python-based web application that allows users to upload datasets, analyze data, and create visualizations interactively. The tool is designed for ease of use and provides a simple interface to perform basic data analysis and generate visualizations

data data-analysis data-visualization python python3

Last synced: 05 Sep 2025

https://github.com/jerela/mola

A Python library for matrix algebra

data-analysis linear-algebra-library matrix-algebra python

Last synced: 14 Jan 2026

https://github.com/asifdotexe/flipkart-electric-scooter-data-analysis

In this project, I web-scraped electric scooter data from Flipkart and turned it into a CSV file for further analysis.

beautifulsoup4 data-analysis data-science flipkart webscraping

Last synced: 29 May 2026

https://github.com/greed2411/ndl

Numbers Don't Lie: an attempt at data analysis using pandas and matplotlib.

cities data-analysis data-science data-visualization india kaggle

Last synced: 19 Apr 2026

https://github.com/viztruth/google-play-store-data-analysis

This repository contains all the materials of my final project 'Google Play store Data Analysis' for the 'Telling Stories with Data' course at PES University.

data-analysis data-visualization

Last synced: 21 Aug 2025

https://github.com/shakhthi/deep-learning

All materials, practice code, and projects related to ML & DL

data-analysis deep-learning machine-learning

Last synced: 09 Apr 2025

https://github.com/rahul-404/full_stack_data_science_with_generative_ai

Welcome to the repository for the course "Full Stack Data Science with Generative AI". This repository is designed to accompany the course and provide resources, exercises, and projects related to the study of data science and generative AI techniques.

data-analysis data-science data-visualization database deep-learning exploratory-data-analysis feature-engineering generative-ai machine-learning nlp python statistics

Last synced: 12 Apr 2026

https://github.com/avijit-jana/redbus-data_scraping_and_filtering_with_streamlit_app

A Streamlit-based application leveraging Selenium to automate data scraping from Redbus, enabling efficient collection, analysis, and visualization of bus travel data for improved operational efficiency and strategic planning in the transportation industry.

automation dashboard data-analysis data-visualization datadrivendecisions python3 redbus selenium-python streamlit-application webscrapping

Last synced: 15 Mar 2025

https://github.com/geetisha/sales_insight_data_analysis_using_sql_and_tableau-etl-

Sales Insights: a data analysis project performed with Tableau and SQL

dashboard data-analysis data-visualization mysql project sales-analysis sql tableau

Last synced: 07 Jan 2026

https://github.com/nouman6093/advanced-statistical-models

In this repository I will upload everything I have learned about advanced statistical models in data science. There are over 42 statistical models, each of which works on algorithms, and there are over 32 algorithms. Each library has its own way of writing such statistical models. After learning, I will try to upload as many statistical models as possible

data data-analysis data-science data-visualization

Last synced: 11 Jun 2026

https://github.com/sijuswamy/data-analytics-using-r

Course repository for the add-on course Data Analysis using R

data-analysis

Last synced: 12 Apr 2025

https://github.com/ryannapp12/quant_trading_engine

A modular and scalable quantitative trading engine built in Python. This project demonstrates efficient data caching with SQLite, concurrent backtesting, and advanced risk analytics, showcasing best practices in clean code architecture and performance optimization.

algorithmic-trading backtesting dash data-analysis data-visualization fintech lstm machine-learning numpy pandas plotly python quantitative-finance real-time risk-management sqlite technical-analysis tensorflow time-series-analysis trading-strategies

Last synced: 11 Apr 2026

https://github.com/danhenriquex/data-science-project

The main goal of this project was to apply the concepts of data visualization and analysis.

data-analysis data-science numpy pandas python

Last synced: 12 Apr 2026

https://github.com/allanotieno254/codsoft

This repository showcases a series of data science projects completed during an internship with CODESOFT. Each project utilizes Python and various machine learning techniques to solve specific problems in data analysis, classification, regression, and predictive modeling.

classification data-analysis data-science feature-engineering machine-learning model-evaluation predictive-modeling python-programming regression

Last synced: 15 May 2025

https://github.com/asifdotexe/air-quality-analysis-aqa

AQA is a data-driven project focused on analyzing air quality data sourced from data.gov.in. The project encompasses data preprocessing, analysis, and visualization to gain insights into air pollution levels across various locations in India. By examining six key pollutants, the project aims to raise awareness about the environmental issues

aqi-analysis data-analysis data-preprocessing data-science data-visualization presentation

Last synced: 07 Jun 2026

https://github.com/jakubkorytko/data-graphs

Transform raw data into captivating visual stories with this app, and effortlessly craft stunning data charts that unveil insights and trends.

charts data-analysis mit-license open-source

Last synced: 14 May 2026

https://github.com/ssreeramj/youtube_channels_analysis

This web app gives a detailed analysis of the videos uploaded to a particular YouTube channel.

data-analysis heroku pandas python streamlit youtube

Last synced: 29 Apr 2026

https://github.com/sunnybibyan/exploratory-data-analysis-eda

Welcome to the Titanic Dataset - Exploratory Data Analysis (EDA) project repository! This project aims to uncover insights from the Titanic dataset using Python and Jupyter Notebook. By analyzing key variables such as age, gender, and class, we aim to visualize relationships between passenger characteristics and survival rates.

data-analysis data-visualization jupyter-notebook python titanic-dataset

Last synced: 18 Jan 2026

https://github.com/carolinedotxyz/dp_sgd_classification

A hands-on educational walkthrough of training a CelebA (Eyeglasses) image classifier with Differentially Private SGD using PyTorch and Opacus. The focus of this repo is on clarity and reproducibility through balanced subsets, deterministic preprocessing, and side-by-side baseline vs. DP training, while acknowledging real trade-offs.

celeba-dataset classification data-analysis dp-sgd machine-learning opacus python pytorch

Last synced: 16 May 2026

https://github.com/ahammadshawki8/playing-with-pandas

🐼 Pandas is one of my favourite libraries in Python. It is well known for "analyzing" data. Learn the basics of Pandas and beyond from this repository. 🤍🖤

beginner-friendly data-analysis favourite-library pandas python

Last synced: 17 Apr 2026

https://github.com/chelseammatta/nopd-cad-data-analysis

Analysis of 911 call data from New Orleans' 3rd & 4th police districts (2019-2022) using BigQuery

911-calls 911-data bigquery cad-data crime-analysis data-analysis emergency-response new-orleans public-safety sql

Last synced: 01 Jul 2025

https://github.com/luminati-io/airbnb-dataset-samples

A sample dataset of over 1000 Airbnb listings, extracted using the Bright Data API, ideal for competitor tracking, brand reputation, and market analysis.

airbnb airbnb-listings api data-analysis datasets web-scraper web-scraper-api web-scraping

Last synced: 04 Jan 2026

https://github.com/ultrasage-danz/weather-data-analysis

Weather Data Analysis notebook project. Created using Google Colab.

collaboration data-analysis data-science dataset google google-colab-notebook project

Last synced: 24 Mar 2025

https://github.com/leosimoes/uerj-tcc-analisador-dados-texto

Text of the final-year undergraduate thesis (TCC, trabalho de conclusão de curso) in computer engineering. Web application for data analysis.

data-analysis data-science data-visualization python streamlit

Last synced: 24 Mar 2025

https://github.com/targetta/ankaflow

YAML-based data pipeline framework that runs both locally and fully in-browser, designed for data engineers, ML teams, and SaaS developers who need flexible, SQL-powered pipelines.

bigquery clickhouse data-analysis dataops deltalake duckdb elt-pipeline etl etl-automation motherduck parquet python sql

Last synced: 09 Oct 2025

https://github.com/leosimoes/udacity-starbucks

Project 3 of the Udacity Machine Learning Engineer Nanodegree Program. Data analysis and machine learning applied to Starbucks data.

aws-iam aws-s3 aws-sagemaker data-analysis data-science machine-learning python

Last synced: 24 Mar 2025

https://github.com/zeinhasan/eksploration-and-data-visualization-course-material

Exploratory Data Analysis (EDA) Laboratory Assistant Teaching Materials

data-analysis data-visualization statistics

Last synced: 11 May 2026

https://github.com/sanam2405/chatinfo

Analysing the WhatsApp chat with my crush over a 6-month period

data-analysis data-visualization python

Last synced: 27 Apr 2026

https://github.com/jabhij/eda_experiments

In this repo I'll use different types of datasets to explore and implement various Exploratory Data Analysis (EDA) approaches.

ames-housing analysis battery-life blackfriday-analysis data-analysis data-science data-visualization eda matplotlib-pyplot numpy pandas python seaborn visualization zomato-data-analysis

Last synced: 14 Apr 2026

https://github.com/muneeb1030/webscrapper_mastodon

The Mastodon Social Platform Scraper is a Python-based web scraping tool designed to explore and extract valuable data from the Mastodon social platform.

data-analysis data-collection mastodon python3 scrapy scrapy-spider selenium-python webscraping

Last synced: 09 Oct 2025

https://github.com/antonijn/polyfit

Fits a polygon to a given data input

c data-analysis linear-algebra toy

Last synced: 16 Jul 2025

https://github.com/aritrakar/statpy

A simple package containing some functions for analysing Gaussian and Binomial distributions. Created for the Udacity AWS MLE Foundations 2021 course.

data-analysis python statistics

Last synced: 24 Oct 2025

https://github.com/olob0/badwords-pt-br

💬 Wordlist of profanity in pt-BR for data analysis, filters, or text considered "avoidable"

badword-filter badwords brasil data-analysis filter filter-lists filterlist portugues portuguese text-analysis wordlist

Last synced: 06 Jan 2026

https://github.com/ankit21111/filmilytics

This repository contains data and analysis on RSVP Movie House Production, focusing on past performance metrics and audience trends. Our goal is to derive actionable insights that can guide future productions for greater success. Explore the data, analysis scripts, and recommendations to understand how RSVP can thrive in the film industry.

data-analysis database database-design database-schema erdiagram sql

Last synced: 13 Jun 2025

https://github.com/john-science/data_science_by_example

Examples of Data Science Tools & Libraries

data-analysis data-science ipython pandas

Last synced: 12 May 2025

https://github.com/shuklayash02/data_analysis_using_r

COVID-19 data cleaning and analysis, in which age at death and deaths by gender are cleaned and analysed

analysis cleaning-data data-analysis data-visualization rprogramming

Last synced: 09 Oct 2025

https://github.com/nafisalawalidris/buybuy-e-commerce-company

The BuyBuy E-commerce Company repository is a comprehensive hub for the company's e-commerce platform. It includes source code, documentation, and data analysis insights, providing a data-driven approach to improve customer experience, drive revenue, and inform decision-making.

buybuy cleaning-data company customer-experience data data-analysis decision-making documentation e-commerce excel insights postgresql repository revenue source-code sql

Last synced: 16 Mar 2025

https://github.com/raccoon-hero/gender-equality-tracker

A web application visualizing gender equality metrics with a focus on Ukraine. Built with Flask, it's powered by live data from global open sources, with dynamic research insights and analysis.

chartjs css dashboard data-analysis data-visualization flask frontend gender-equality global-metrics html linked-data openalex opendata python representation semantic-web ukraine webapp wikidata world-bank-api

Last synced: 07 May 2026

https://github.com/nafisalawalidris/tools-for-data-science

It covers popular languages (Python, R, SQL) and libraries (NumPy, Pandas) used in the field. The author shares their objectives of teaching data analysis, web development, and critical thinking skills. The repository also includes code examples, explanations of arithmetic expressions, and contact information for the author.

arithmetic-expressions data-analysis data-science data-visualization languages libraries matplotlib numpy pandas programming python r sql tools web-development

Last synced: 11 Apr 2026

https://github.com/sevdanurgenc/python-for-data-science-lecture-notes

In this repo, I have the course contents of Python for Data Science training, which will be given to Siemens by the cooperation of Academy Peak Information Technologies Training and Consultancy between 28 June - 1 July 2022.

data-analysis data-mining data-modeling data-science data-structure data-visualization matplotlib-tutorial numpy-tutorial pandas-tutorial

Last synced: 23 Mar 2025

https://github.com/odeyiany2/flit-apprenticeship-data-science-projects

This repo contains all my projects for my FLiT Apprenticeship

data-analysis data-science data-visualization machine-learning sql

Last synced: 17 May 2026

https://github.com/ifibla/adsdb-project

Algorithms, Data Structures and Databases Project

data-analysis data-engineering python

Last synced: 12 Apr 2026

https://github.com/dwidevelopes/database-input-pelanggran-mahasiswa

Records data on students who commit violations, so that they can be logged, disciplined, and made subject to sanctions

aplikasi aplikasi-sekolah data data-analysis database input-method mahasiswa sekolah siswa siswi website

Last synced: 02 May 2026

https://github.com/tqhungdev0605/crawl_200_jd_dataanalyst

Automate job data scraping for 200 Data Analyst postings on https://vn.indeed.com using Python

data-analysis jupyter-notebook python3 scraping selenium

Last synced: 11 Apr 2026

https://github.com/priyanshubiswas-tech/aws-mwaa-elt-airflow-sql-dbt-superset-project

This project was created as part of an assessment for DigitalXC AI. It demonstrates a cloud-based ELT pipeline using AWS MWAA, Airflow, dbt, PostgreSQL, and Superset. The pipeline automates data ingestion from S3, transformation with dbt, and visualization through Superset, following modern data engineering practices on a scalable AWS architecture.

apache-airflow apache-superset aws-s3 dag data-analysis data-engineering-pipeline data-visualization dbt elt-pipeline python rds-postgres

Last synced: 03 Jul 2025

https://github.com/luochang212/weibo-analysis

Data analysis based on Sina Weibo.

data-analysis weibo

Last synced: 03 Apr 2026

https://github.com/smusab9152/pokemon_data_analysis

This repo explores and analyzes a dataset of Pokémon attributes. The analysis includes data cleaning, exploratory data analysis (EDA), and visualizations.

analytics data-analysis data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas pokemon python seaborn statistical-analysis

Last synced: 02 May 2026

https://github.com/PatriLoto/Intro_R_para_reinventarTEC_2021

Material for the workshop "First Steps in R for Data Analysis"

data-analysis rstats

Last synced: 10 Oct 2025

https://github.com/siddharthbadal/sql-case-studies-data-analysis

Data analysis case studies on various databases using SQL, demonstrating proficiency in solving diverse business problems. Projects cover sales, orders, products, finance, healthcare and other sectors, and highlight my ability to analyze complex datasets through SQL queries, data manipulation, and visualization techniques.

data-analysis sql sql-query sql-server sqlserver

Last synced: 08 Jan 2026

https://github.com/shadan100/stroke-prediction-analysis

A web-based application to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about the patient.

artificial-intelligence data-analysis data-science django django-framework jupyter-notebook machine-learning matplotlib pandas predictive-modeling python stroke-prediction web-application

Last synced: 08 Mar 2026

https://github.com/saeun-park/data-analysis

Data analysis projects and competitions

anova-test data-analysis data-visualization statistics

Last synced: 21 Jan 2026

https://github.com/kamanhang/sqldatawarehousedataengineeringproject

This project delivers a modern data warehouse, focusing on building a clean, organized data pipeline that covers ETL pipeline development, data cleaning, data modelling, and data analytics.

customer-analytics data-analysis data-cleaning data-engineering data-modeling data-pipeline data-visualization datascience etl-pipeline postgresql powerbi powerbidashboard sales-analysis sql

Last synced: 10 Oct 2025

https://github.com/sing-group/bew

Public repository for Biofilms Experiment Workbench (BEW).

aibench data-analysis data-management java jfreechart workbench

Last synced: 03 Jul 2025

https://github.com/gher-uliege/bluecloud-plankton

Spatial interpolation of plankton data using a neural network

data data-analysis data-visualization neural-network oceanography

Last synced: 30 Mar 2025

https://github.com/moindalvs/learn_eda_house_price_dataset

Dataset: House Prices: Advanced Regression Techniques. Exploratory data analysis on more than 80 features.

cardinality data-analysis data-science data-structures data-visualization missing-values

Last synced: 10 Oct 2025

https://github.com/cosmoduende/r-twitter

Explore your Twitter activity with R: Sentiment Analysis and Data Visualization. How to analyze your Twitter account (or any account), discover your habits and sentiments with the "rtweet" package and NLP.

data-analysis data-visualization lemmatization nlp nlp-library nlp-resources nltk nltk-library r-package r-programming r-studio rtweet stemming twitter twitter-api twitter-data twitter-data-analysis twitter-data-extraction twitter-sentiment-analysis udpipe

Last synced: 10 Oct 2025

https://github.com/willie-conway/datavista

DataVista is a comprehensive, production-grade data analysis and machine learning platform that combines real-time data ingestion from live APIs, interactive visualizations, statistical analysis, hypothesis testing, and machine learning model training — all in a unified, professional-grade interface. Built with React and Recharts.

analytics-platform api-integration classification coingecko-api csv-import data-analysis data-cleaning-and-preprocessing data-pipeline data-science data-visualizations etl hypothesis-testing json-export machine-learning-models open-meteo react recharts regression statistics world-bank

Last synced: 30 May 2026

https://github.com/nafisalawalidris/building-a-clustering-model-for-customer-segmentation

Customer Segmentation Using Clustering: This repo applies clustering algorithms to a customer transaction dataset, grouping similar customers together based on their purchasing behavior. Targeted marketing strategies can be developed by analyzing distinct customer segments.

clustering customer-segmentation data-analysis data-visualization k-means machine-learning marketing-analytics unsupervised-learning

Last synced: 16 Mar 2025

https://github.com/kurosawaxyz/covid4eu-sorbonne

Economy: “Analysis of Labor Market decisions of men and women during the COVID-19 pandemic in the 4EU+ countries”.

covid-19 data-analysis data-science data-visualization pandas

Last synced: 04 Jul 2025

https://github.com/cyberoctane29/unicorn-companies-analysis

This project explores unicorn companies, private startups valued at over $1 billion, using Python for data analysis. It covers industry trends, geographic distribution, and investment patterns through EDA, including data cleaning, handling missing values, datetime transformations, and visualizations to uncover key insights.

data-analysis eda numpy pandas python

Last synced: 02 May 2026