An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/vishnu-vamshii/layoffs-data-analysis-in-sql

This project focuses on the cleaning and exploratory analysis of a dataset containing layoff information. It includes data deduplication, standardization of columns, handling null and blank values, and analyzing layoffs by company, industry, country, and date. Various SQL queries are used to explore trends and patterns in layoffs over time.

data-analysis eda mysql

Last synced: 15 Jul 2025

https://github.com/viper373/lol-dataanalytics

Tencent Games: comprehensive data analysis and prediction for League of Legends tournaments, 2020/2021/2022

crawler-python data-analysis jupyter-notebook lol python spider

Last synced: 15 Jul 2025

https://github.com/shrutiijoshi/airbnb-listing-reviews

Airbnb is an online marketplace that connects people who want to rent out their homes with travelers seeking accommodations.

data-analysis matplotlib-pyplot pandas-python python seaborn

Last synced: 17 May 2026

https://github.com/gagan8605/zepto_sql_analysis

This project explores and analyzes the inventory data of Zepto, a rapidly growing 10-minute grocery delivery platform in India. The dataset contains more than 3,000 SKUs across key product categories such as Fruits & Vegetables, Dairy, Beverages, Packaged Foods, and more. The analysis was performed using PostgreSQL, covering both data cleaning and bus

cleaning-data data-analysis database-management postgresql sql

Last synced: 16 Jul 2025

https://github.com/amr-yasser226/interactive-sales-analytics-dashboard

An interactive web-based dashboard for visualizing multinational electronics sales data. This project for the DSAI 203 course integrates a Python/Flask backend with an amCharts frontend to provide dynamic insights into product revenues, sales distribution, and employee statistics across different countries.

am5charts amcharts business-intelligence css dashboard data-analysis data-analytics data-visualization flask html javascript python sqlalchemy sqlite web-application

Last synced: 13 Apr 2026

https://github.com/maheera421/pandas

Implementation of essential Pandas functions.

data-analysis data-manipulation pandas-dataframes pandas-datareader pandas-python

Last synced: 17 Jul 2025

https://github.com/venkat-023/thyroid-cancer-prediction

This project aims to develop a machine learning pipeline to predict thyroid cancer based on patient data. The dataset was sourced from multiple public repositories, cleaned, and merged to create a comprehensive dataset for modeling. Various classification algorithms were implemented, including Random Forest, Logistic Regression, K-Nearest Neighbors

data-analysis data-cleaning deep-learning ensembling-methods hyperparameter-tuning machine-learning-algorithms nueral-networks

Last synced: 17 May 2026

https://github.com/ofir-frd/predict-success-of-a-restaurant

Apply machine learning to a restaurant database. Study and analyse the data to predict whether a restaurant will be successful.

data-analysis data-science machine-learning visualization

Last synced: 11 Jun 2026

https://github.com/nikbarb810/covid_growth_rate_390.51

Exploring Covid Growth Rate of European Population using genetic data analysis

bioinformatics data-analysis r rcpp

Last synced: 07 Apr 2026

https://github.com/ahmeddhus/exploring-football-data-analysis

Learning and exploring data analysis through real-world datasets using Python, the StatsBomb APIs, and the mplsoccer library

data-analysis jupyter-notebook mplsoccer python statsbomb

Last synced: 17 May 2026

https://github.com/cyblx/clustering

This project explores clustering techniques and supervised learning applied to World Cup team performance analysis. The methodologies include K-Means, DBSCAN, K-Nearest Neighbors, Gaussian Mixture Models (GMM), and Agglomerative Clustering.

clustering data-analysis dbscan gmm kmeans supervised-learning unsupervised-learning world-cup

Last synced: 18 Jul 2025

https://github.com/theveryhim/basic-data-analysis

Working with basic Python tools frequently used in data science

data-analysis data-processing visualization

Last synced: 18 Jul 2025

https://github.com/theveryhim/web-scraping-and-statistical-tests

Crawling the web for data and performing statistical tests to verify judgments

data-analysis hypothesis-testing web-scraping

Last synced: 18 Jul 2025

https://github.com/theveryhim/massive-text-processing

Cleaning, processing, and analysis of a dataset of papers using the PySpark (RDD) framework

big-data data-analysis frequent-itemsets massive-datasets pyspark text-preprocessing

Last synced: 18 Jul 2025

https://github.com/abhinavhariyal/diwali-sales-analysis

This project is based on data visualization and analysis of Diwali sales data using Python and Jupyter Notebook.

data-analysis data-visualization jupyter python

Last synced: 19 May 2026

https://github.com/amyanchen/sf-airbnb

Exploratory Data Analysis of San Francisco Airbnb listings

data-analysis data-science data-visualization r rmarkdown statistics

Last synced: 18 Jul 2025

https://github.com/fer-aguirre/cookiecutter-data-analysis-extensive

A cookiecutter template for data analysis projects using Python.

cookiecutter data-analysis project-template python

Last synced: 09 Apr 2025

https://github.com/alexjackson1/commons-indicative-votes

A cluster analysis of the House of Commons' Indicative Brexit Voting Process on 27 March 2019

data-analysis politics r

Last synced: 19 Jul 2025

https://github.com/vetrivel07/flight-price-prediction

Developed a flight price prediction model using Python, analyzing historical data to forecast airfare prices and help travelers make informed booking decisions

data-analysis data-visualization jupyter-notebook numpy pandas python

Last synced: 15 Jun 2025

https://github.com/mh0386/motorcycle_data_analysis

Data analysis applied to motorcycle dataset.

data-analysis

Last synced: 19 Jul 2025

https://github.com/buildwithlal/introduction-to-data-science-in-python-coursera

Introduction to Data Science in Python, part of the Applied Data Science using Python Specialization from the University of Michigan, offered by Coursera

data-analysis matplotlib numpy pandas

Last synced: 03 May 2026

https://github.com/malexandersalazar/tools-python-mssql-statistics-descriptor

A lightweight tool based on sweetviz that generates high-density visualizations to kickstart Exploratory Data Analysis within Microsoft Azure SQL Database using ODBC with just one line of code

azure-sql-database data-analysis data-visualization eda python

Last synced: 16 May 2026

https://github.com/preciousclement/maternal-experiences-in-nigeria

This repository contains a Python-based project that generates realistic synthetic data simulating the maternal health journey of 5,000 women in Nigeria.

data-analysis data-generation maternal-health nigeria public-health python

Last synced: 08 May 2025

https://github.com/jm199504/data-analysis-practice

Data analysis practice (Titanic / BankCustomers)

data-analysis python

Last synced: 02 May 2026

https://github.com/sharoonjoseph321/social_media_eda

Data analysis of social media apps using pandas, Python, and Matplotlib.

data data-analysis data-science data-visualization matplotlib programming-language project python pythonprojects

Last synced: 03 Mar 2025

https://github.com/josafary-ds/curso_dnc

Repository for storing study files and projects for DNC - Data Scientist

data-analysis data-science data-visualization machine-learning powerbi python

Last synced: 13 Mar 2025

https://github.com/dsite42/simple_data_visualizer

This is a simple tool to visualize data for a quick Exploratory Data Analysis (EDA). You can create various plot types as seaborn or Plotly plots via a GUI in multiple windows (RelPlot, PairPlot, JointPlot, DisPlot, CatPlot, LmPlot, 3DPlot).

data-analysis data-science data-visualisation data-visualization eda exploratory-data-analysis plotly seaborn

Last synced: 12 May 2026

https://github.com/cassiofb-dev/fide-rating-analysis

The plot speaks for itself

chess data-analysis fide hans rating

Last synced: 15 Jun 2025

https://github.com/kineticloom/plydb-fun-nfl-analyst

Analyze NFL data with your AI agent

data-analysis football-analytics nfl

Last synced: 15 May 2026

https://github.com/iamber12/stack-overflow-analysis-using-stack-exchange-api

This Python-based project utilizes the Stack Exchange API to analyze StackOverflow data, focusing on the 'R' and 'Dot Net' programming tags.

data-analysis data-visualization python stack-exchange-api

Last synced: 20 Jul 2025

https://github.com/srummanf/elnino-anomaly-study

Study on El Niño’s impact on Chennai groundwater sustainability

data-analysis machine-learning python satellite-imagery-analysis

Last synced: 15 May 2026

https://github.com/leoz0214/foodhygieneanalysis

Data analysis regarding Food Hygiene Ratings in England, Wales and Northern Ireland.

data-analysis food-hygiene-ratings pandas python

Last synced: 17 May 2026

https://github.com/dhruvil-26/sql-projects

This repository contains SQL projects focusing on data analysis and insights. Currently, it includes: 1. RSVP Movies Analysis - SQL queries to analyze movie trends, ratings, and genres. 2. Pizza Sales Analysis - SQL queries to explore sales patterns, customer behavior, and profitability in a pizza business.

analysis data-analysis database mysql pizza-sales-analysis rdbms rsvp sql

Last synced: 17 May 2026

https://github.com/fatihilhan42/wnba-draft-player-dataanalysis-1997-2022-with-python

In this project, the statistics of the players in the WNBA drafts from 1997 to 2022 were examined. The data in the dataset, which you can find in the repo, was first organized using data cleaning algorithms. These cleaned data were then graphically extracted using data visualization algorithms.

data-analysis data-analysis-python data-visualization jupyter-notebook python

Last synced: 17 May 2026

https://github.com/aelmah/ibm-applied-ds

Find here: a collection of projects I've done through the Applied DS Specialization!

applied-data-science-capstone beautifulsoup data-analysis data-visualization machine-learning python-for-ai-and-data-science web-scraping

Last synced: 11 Sep 2025

https://github.com/marcosvbras/udacity-nd109-project-titanic

Data analysis project for the Udacity Nanodegree course: Artificial Intelligence Programming with Python.

data-analysis data-analyst-nanodegree data-science jupyter-notebook machine-learning python udacity

Last synced: 19 May 2026

https://github.com/deliprofesor/2024-salary-analysis-for-machine-learning-engineers

This project analyzes a salary dataset to explore factors like experience, company size, remote work ratio, and country. It includes data cleaning, group analysis, visualizations, and machine learning models (linear regression and Random Forest) to predict salaries and identify key features.

data-analysis data-cleaning data-visualization ggplot2 linear-regression machine-learning plotly r-programming random-forest salary-prediction salary-trends

Last synced: 07 Mar 2026

https://github.com/wesleych3n/my-work-log

A personal project to record and analyze work check-in/check-out times in Google Sheets with a Telegram bot.

data-analysis telegram-bot worklog

Last synced: 20 Jul 2025

https://github.com/sangampaudel530/bhutan-rainfall-explorer

Interactive dashboard to explore, analyze, and forecast rainfall trends in Bhutan (2021–2025) using Streamlit, Plotly, and Prophet.

bhutan climate-change data-analysis prophet-facebook rainfall-prediction streamlit visualization

Last synced: 17 May 2026

https://github.com/saksham-jain177/data-analysis

A collection of data analysis and machine learning projects across various datasets. Explore predictive modeling, data visualization, and insights from real-world data. Projects include sales predictions, disease detection, customer segmentation, and more.

api data data-analysis data-cleaning data-science data-visualization datamodeling dataset datasets exploratory-data-analysis python python3 web-scraping youtube-api

Last synced: 01 May 2026

https://github.com/12danielll/neurogenomics_project

This project focuses on analyzing sequencing data to understand molecular mechanisms of neurological diseases and predict the effectiveness of immunotherapy in breast cancer patients. It integrates Python and R scripts for data processing, statistical analysis, and visualization, alongside a comprehensive report detailing methods and findings.

bioinformatics biostatistics clustering clustering-algorithms data-analysis data-visualization deseq2 differential-gene-expression functional-analysis immune-therapy machine-learning neurological-disease neuroscience pca-analysis python r seurat single-cell-analysis

Last synced: 06 Apr 2026

https://github.com/arv-anshul/pw-experience-portal

Data Analysis on PW Skills and Ineuron.ai experience/internship portal.

data-analysis experience ineuron-ai internship physics-wallah portal pw-skills python3

Last synced: 16 Apr 2026

https://github.com/pentalpha/eu-car-emissions-analysis-2015

Analysis of CO₂ emissions from passenger cars in E.U. countries, 2015.

data-analysis data-science dataset jupyter-notebook python python3

Last synced: 15 May 2026

https://github.com/surajsanap/employee-resigning-analysis-powerbi-dashboard-data-analytics

Effortlessly analyze employee resignations with our concise Power BI dashboard. Download the XML file, open the dashboard, and gain quick insights into resignation trends and reasons for departure. Streamlined and effective

dashboard data-analysis data-analytics powerbi python xml-dataset

Last synced: 08 May 2025

https://github.com/pentalpha/bti-performance-study

A series of analyses of a large amount of data about the grades of students in the Information Technology course at UFRN

analysis big-data clustering data-analysis data-science data-visualization ipynb ipython jupyter-notebook performance-analysis plot python python3

Last synced: 15 May 2026

https://github.com/mtholahan/advanced-mysqlquery-tuning-mini-project

Analyzed EuroCup 2016 data with advanced SQL queries. Imported CSV datasets into MySQL, designed schema with match, player, and referee details, and implemented queries covering match outcomes, penalty shootouts, player stats, bookings, substitutions, and referee activity to explore tournament dynamics.

bootcamp data-analysis data-engineering data-modeling database eurocup football mysql queries soccer sports springboard sql

Last synced: 15 May 2026

https://github.com/dionixius7/titanic-disaster-ml-model

This project predicts the survival of passengers on the Titanic by using Kaggle Titanic Disaster Dataset. The dataset contains information related to passengers, such as age, gender, and class. Different machine learning algorithms have been applied for this predictive model to accomplish an accurate prediction that will define the survival chances

data-analysis data-science data-visualization eda knn-classifier machine-learning neural-network python scikit-learn svm tensorflow titanic-kaggle titanic-survival-prediction

Last synced: 07 Feb 2026

https://github.com/anandanraju/sql_data_analysis_projects

These two projects involve analyzing pizza data and Walmart sales data using SQL to identify insights and trends. The aim is to use data-driven approaches to understand sales performance, identify key factors influencing sales, and provide actionable recommendations for business improvement.

csv data-analysis data-management mysql pizza sql sql-schema walmart

Last synced: 24 Jun 2025

https://github.com/monish-nallagondalla/cement_strength_prediction

The Cement Strength Prediction project uses machine learning to predict the compressive strength of cement based on its components, such as Cement, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, and Age. The goal is to forecast compressive strength (MPa) for optimized cement production and quality control.

cement-strength-prediction construction-industry data-analysis data-preprocessing data-science data-visualization feature-engineering machine-learning predictive-modeling python regression-analysis scikit-learn

Last synced: 11 May 2026

https://github.com/nitins17/tableauvisualizations

Visualizations I created while learning to work with Tableau

data-analysis data-science data-visualization tableau visualization

Last synced: 01 Mar 2026

https://github.com/edumoraes1/journey_active_users

Customer-base segmentation via SQL for the active sellers journey

bq data-analysis salesforce sql

Last synced: 02 Feb 2026

https://github.com/brownred/python-and-sql

Python and SQL (PostgreSQL & MySQL) for data analysis.

data-analysis databases python3 sql

Last synced: 11 May 2026

https://github.com/collins-kimotho/communicate-data-findings

Data Analysis Project: Investigating Factors Contributing to No-Show Appointments in Medical Records

data-analysis data-science data-visualization dataset pandas python

Last synced: 17 May 2026

https://github.com/gemaquejr/restaurant-orders

Project aimed at applying OOP concepts and working with Set, Hashmap, and Dict. It was created for the final assessment of section 06 of the computer science module of the Web Development Course at Trybe.

data-analysis dict hashmap poo python set

Last synced: 30 Oct 2025

https://github.com/eesunmoon/genai_cor-recom

[Project] Outfit Coordination Recommender System using KoAlpaca

data-analysis fine-tuning generative-ai huggingface keyword llm numpy pandas python python3 selenium

Last synced: 06 Apr 2026

https://github.com/satvikpraveen/pcc-vizforge

🎨 Personal data visualization toolkit generating synthetic datasets across multiple domains (random walks, dice simulations, weather patterns, earthquakes, GitHub analytics) with beautiful Matplotlib & Plotly visualizations. Includes Jupyter notebooks, interactive dashboards & statistical analysis. Perfect for learning data science! 🚀📊

analytics dashboard data-analysis data-generation data-science data-visualization github-analytics interactive-visualization jupyter-notebook matplotlib plotly probability python random-walk scientific-computing seismology statistical-analysis synthetic-data time-series weather-data

Last synced: 17 May 2026

https://github.com/tinaland101/python-api-challenge

This project involves analyzing weather data from cities around the world using the OpenWeatherMap API and creating visualizations to explore the relationship between weather variables and latitude.

api-integration-and-data-retrieval data-analysis data-collection-and-geospatial-analysis problem-solving-and-decision-making statistical-analysis

Last synced: 03 Mar 2025

https://github.com/nishumehta/house-sales-analysis

House Sales Analysis Dashboard for King County, Washington, built with Tableau. Features interactive charts and maps to explore sales patterns, price distributions, and property conditions.

dashboard data-analysis data-visualization tableau tableau-dashboards tableau-public

Last synced: 11 Jan 2026

https://github.com/arv-anshul/pw-api

Perform data analysis on PW Skills APIs. Made a web app using streamlit. See any course syllabus, analytics, quizzes and assignments.

api course data-analysis ineuron-ai physics-wallah project pw-skills python3 streamlit

Last synced: 18 Apr 2026

https://github.com/danicaalana/sales-review-sentiment-analysis

This project is a sentiment analysis project using a machine learning model. It analyzes Amazon product reviews to determine whether the sentiment expressed is positive, negative, or neutral using Multinomial Naive Bayes Method.

amazon data-analysis data-science machine-learning naive-bayes python sales-review sentiment-analysis

Last synced: 15 May 2026

https://github.com/lauratrigo/codigo_roti

Análise de ROTI (ROTI Analysis) is a MATLAB tool for processing and visualizing ionospheric data (ROTI) from multiple GNSS stations. Developed for space geophysics research, the script generates comparative time-series plots with quality filters and handling of missing data. 📡

data-analysis geophysics image-processing matlab roti scientific-initiation

Last synced: 24 Jun 2025

https://github.com/sadratehranian/prediction-of-covid-19-diagnosis

Build an algorithm in MATLAB using ML techniques to predict whether a person has COVID-19 based on existing medical conditions. Further research has been conducted to identify the most suitable machine learning techniques and increase their prediction accuracy.

covid-19 data-analysis data-science data-visualization machine-learning matlab prediction visualization

Last synced: 11 Sep 2025

https://github.com/mahdikh03/custumers_clustering_rmf

A data analysis project to implement RFM (Recency, Frequency, Monetary) analysis for customer segmentation and behavior analysis using the K-Means algorithm.

customer-segmentation data-analysis k-means-clustering unsupervised-learning

Last synced: 09 May 2025

https://github.com/nishumehta/uber-rides-data-analysis

An in-depth analysis of Uber ride data for the year 2016, to uncover patterns in ride behavior, mileage trends, and frequent start locations to generate actionable insights for business decisions.

data-analysis jupyter-notebook matplotlib-pyplot pandas python tableau-dashboards

Last synced: 09 May 2026

https://github.com/niaid/genetic-linkage-analysis

Materials for ACE course on Genetic Linkage Analysis.

ace ace-uganda2020 analysis bcbb-training clinical data-analysis genetics ngs ngs-analysis

Last synced: 24 Jun 2025

https://github.com/syarwinaaa09/exploring-airbnb-market-trends

A data analysis project exploring NYC Airbnb listings, using data visualization and pandas to examine price trends, room types, and reviews.

airbnb data-analysis data-science data-visualization jupyter-notebook new-york-city nyc pandas price-analysis reviews room-types

Last synced: 30 Apr 2026

https://github.com/muneeb1030/webscrapper_altnews

The project utilizes a combination of Python, Scrapy, and Selenium to navigate through the dynamic content of AltNews.in and collect valuable information for analysis and verification.

data-analysis data-collection python3 scrapy scrapy-spider selenium selenium-python

Last synced: 17 May 2026

https://github.com/macorisd/instagram-fake-account-analysis

A project in R focused on detecting fake Instagram accounts. It includes exploratory data analysis, data visualization, and analysis using three techniques: association rules, formal concept analysis, and regression. The results are presented in an interactive Quarto book.

data-analysis data-science data-visualization r

Last synced: 10 Jun 2025

https://github.com/eslamdyab21/apara-data-gui

Custom application for Apara's data wrangling scripts. Technologies used are Qt Designer and PyQt5 for the GUI, and pandas and NumPy for the data work.

csv data data-analysis data-wrangling gui pandas pyqt5-desktop-application qt5-gui

Last synced: 17 May 2026

https://github.com/smsraj2001/sds-datathon

A simple data science project/hackathon done as part of the SDS course

data-analysis data-analysis-python data-cleaning data-science statistics statistics-for-data-science

Last synced: 16 Jul 2025

https://github.com/shaikh-raj/data-science-portfolio

Data Science Portfolio of Raj Shaikh including Case Studies and Articles that I have completed that solve various business problems.

articles case-study data-analysis deep-learning machine-learning nlp statistics

Last synced: 20 Jul 2025

https://github.com/arction/lcjs-example-0507-dashboardfiberanalysis

A demo application showcasing the use of LightningChart JS to visualize fiber analysis data.

area-plot area-series chart charts dashboard data-analysis demo heatmap javascript lcjs lightningchart-js performance visualization webgl

Last synced: 12 Mar 2025

https://github.com/soumasish2005/ai-chatbot-using-snowflake

This project is a Streamlit application that allows users to upload a CSV file and ask questions about their data in natural language.

cloud data-analysis data-science data-visualization python snowflake streamlit

Last synced: 17 May 2026