An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/sweta-kaundilya/911-calls-capstone-project

For this capstone project we will be analyzing some 911 call data from Kaggle.

data data-analysis data-visualization jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 28 Apr 2026

https://github.com/sweta-kaundilya/sql_projects_data_analytics

This repository contains SQL portfolio projects.

data-analysis mysql-database mysql-workbench

Last synced: 10 Sep 2025

https://github.com/al-ogr/sf_pr2_job_analysis_hh_sql

SkillFactory DataScience PROJECT-2. Analysis of job vacancies from HeadHunter

data-analysis data-science ipynb plotly python sql

Last synced: 19 May 2026

https://github.com/lmuffato/jiboia

Jiboia is a Python package for automatically normalizing and optimizing DataFrames efficiently.

data-analysis data-science dataframe normalization pandas python

Last synced: 19 May 2026

https://github.com/mjshubham21/ny_yellow_taxi_python_da_project

A data analysis project on New York Yellow Taxi data (February 2025) using Python and its analytics libraries: NumPy, Matplotlib, Pandas, and Seaborn.

data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 04 May 2026

https://github.com/jedrzej-wydra/competition-cooperation

Competition, cooperation, and parental effects in larval aggregations formed on carrion by communally breeding beetles Necrodes littoralis (Staphylinidae: Silphinae)

data-analysis non-linear-regression r

Last synced: 20 Aug 2025

https://github.com/abidshafee/google.colaboratory_projects

This repository contains a collection of interactive Python notebooks (ipynb) from some of my projects on Data Science, Machine Learning (ML), and Natural Language Processing (NLP).

colaboratory data-analysis data-science lstm machine-learning nlp statistics time-series

Last synced: 09 Jul 2025

https://github.com/joe-stifler/llm-sig-playground

This repository is a collaborative space for MSc Earth Science students at Imperial College London to experiment with and apply Large Language Models (LLMs) to real-world Earth Science problems. The persona playground link follows below.

data-analysis earth-science llms machine-learning research-automation

Last synced: 29 Mar 2025

https://github.com/mansiikumarii/mysql

A curated collection of MySQL scripts covering DDL, DML, and DRL operations. Ideal for beginners to practice and understand core SQL concepts.

backend data-analysis data-modeling database database-integration database-management database-performance database-schema mysql mysql-admin mysql-database orm php-mysql query-optimization rdbms sql sql-query sql-script stored-procedure

Last synced: 19 May 2026

https://github.com/betkh/datascieneinpython

Jupyter Notebook files

data-analysis data-visualization

Last synced: 16 Jun 2025

https://github.com/the-pinbo/dimensionalityredux-pca-vs-autoencoders

Comparative study of PCA and Autoencoders for effective dimensionality reduction, assessed through PSNR and SSIM metrics.

autoencoder-mnist autoencoders data-analysis dimensionality-reduction image-compression mnist neural-networks pca psnr ssim

Last synced: 13 May 2025

https://github.com/julie-fliorko/rockbuster-insights-sql-project

Data analysis using PostgreSQL to help Rockbuster Stealth LLC identify revenue trends, customer insights, and rental behavior patterns.

data-analysis postgresql sql

Last synced: 22 Jul 2025

https://github.com/derogative404/google_data_analytics_capstone

Capstone project part of the Google Data Analytics Certificate Program

data-analysis excel r tableau

Last synced: 26 Mar 2025

https://github.com/atharvkadammm/suicide-prediction-system

A machine learning project predicting suicide risk based on multiple socio-economic and environmental factors using data mining techniques.

csv data-analysis data-science data-visualization datamining exploratory-data-analysis feature-engineering machine-learnin matplotlib mental-health numpy pandas riskassesment seaborn sklearn suicide-prediction supervised-

Last synced: 01 Jul 2025

https://github.com/qorah/vic-edu-housing-insights

Analysis of education outcomes and housing affordability in Victoria, Australia.

data-analysis jupyter-notebook

Last synced: 18 Mar 2025

https://github.com/jhrcook/protein-language-models

Experimenting with protein language model predictions

data-analysis protein-language-model variant-effect-prediction

Last synced: 28 May 2026

https://github.com/amishidesai04/interactive-data-visualisation-tool

A Java-based application leveraging JavaFX to create dynamic and interactive charts, including pie charts, bar charts, and line graphs. Ideal for visualizing various datasets, this tool offers customizable features and a user-friendly interface. Easily input and manage data, customize chart styles, and observe trends and patterns effectively.

charts data-analysis data-visualisation data-visualization-project gui java javafx visualization-tools

Last synced: 17 Apr 2026

https://github.com/ebrizzzz/data-visualization-project-using-tableau

A data visualization project for the Visual Data Analysis course (Spring Term 2025) at the University of Skövde. This project explores the factors influencing national happiness scores across different global regions from 2005 to 2022.

analytics data data-analysis data-science data-visualization python regression tableau

Last synced: 16 Jun 2025

https://github.com/iamsainikhil/us-births-analysis

Analysis of US births during 1994-2003 based on the CDC-NCHS data set.

data-analysis python

Last synced: 16 May 2026

https://github.com/andrewzgheib/football-database-analysis

Football database utilizing PostgreSQL and Pandas for data management, with PowerBI for intuitive KPI visualization

data-analysis data-visualization database pandas pgsql postgr powerbi sql

Last synced: 04 Apr 2025

https://github.com/nerooc/device-downtime-detection

Repository for a project in the course "Sztuczne Sieci Neuronowe" (Artificial Neural Networks)

data-analysis detection-model recurrent-neural-networks

Last synced: 22 Mar 2025

https://github.com/timkong21/siemens-mobility-operations-industrial-engineer-simulation

Operations Industrial Engineer job simulation with Siemens Mobility. Includes time study analysis to identify assembly bottlenecks (Task 1) and a proposed layout redesign to improve efficiency without automation (Task 2).

data-analysis forage industrial-engineering job-simulation manufacturing process-improvement production-engineering python siemens time-analysis

Last synced: 19 May 2026

https://github.com/puspacempaka/superstore-analysis-with-sql

This repository showcases various data analyses on the popular Superstore dataset using SQL queries. The analyses cover a range of business insights, including sales performance, customer segmentation, and product profitability. Each analysis is documented with the SQL queries used and explanations of the steps involved.

business-intelligence data-analysis sales-analysis sql superstore-dataset

Last synced: 09 Mar 2026

https://github.com/lopez86/datascienceexamples

Examples of various data science & data analysis topics using various sources of data.

data-analysis data-science pandas scikit-learn tutorial visualization

Last synced: 13 Apr 2026

https://github.com/sharduljunagade/human-activity-recognition

This repository contains the code for Assignment 1 of the course ES 335: Machine Learning 2024 at IIT Gandhinagar, taught by Prof. Nipun Batra.

data-analysis data-collection decision-trees groq-api human-activity-recognition jupyter langchain-python machine-learning pandas prompt-engineering python sklearn tsfel

Last synced: 08 Apr 2026

https://github.com/tknishh/investing-platform

An investing platform application to help users get information and analyze various foreign currency assets. The investing platform uses an ETL pipeline to insert new batches of Forex data once a day.

data-analysis investing-platform pipeline

Last synced: 18 Mar 2025

https://github.com/nagar2nd/zomato-bangalore-analysis-tableau

Analysing restaurant data in Bengaluru to enhance customer satisfaction by optimizing the restaurant experience. The focus is on improving the popularity of different cuisines, enhancing delivery times, and boosting restaurant ratings. An interactive Tableau dashboard has been developed to help Zomato identify key areas for improvements.

data-analysis data-visualization tableau

Last synced: 05 Mar 2026

https://github.com/shubhamgoyal575/credit-card-fraud-detection

📌 Credit Card Fraud Detection using Machine Learning: this project focuses on detecting fraudulent credit card transactions using machine learning models like Random Forest, XGBoost, and Deep Learning. The dataset is preprocessed to handle class imbalance, and multiple models are evaluated based on ROC AUC Score and F1 Score.

adaboost-classifier artificial-neural-networks credit-card-fraud data-analysis data-cleaning data-preprocessing data-science data-visualization deep-learning exploratory-data-analysis lightgbm machine-learning machine-learning-algorithms random-forest-classifer scikit-learn tensorflow xgboost

Last synced: 08 Feb 2026

https://github.com/gagan8605/zepto_sql_analysis

This project explores and analyzes the inventory data of Zepto, a rapidly growing 10-minute grocery delivery platform in India. The dataset contains over 3,000 SKUs across key product categories such as Fruits & Vegetables, Dairy, Beverages, Packaged Foods, and more. The analysis was performed using PostgreSQL, covering both data cleaning and bus

cleaning-data data-analysis database-management postgresql sql

Last synced: 16 Jul 2025

https://github.com/swatisinghit/e-commerce-trend-analysis-for-target

An exploratory and in-depth study of the E-Commerce sales data for a Brazilian store using SQL.

bigquery data-analysis mysql sql

Last synced: 19 May 2026

https://github.com/amarlearning/exploring-the-evolution-of-linux

Data Analysis about the development of the Linux operating system by exploring its Git repository history.

cleaning-data data data-analysis data-wrangling datacamp first-commit git-history linux

Last synced: 12 May 2026

https://github.com/imnotamr/datasets-used

A comprehensive collection of datasets for machine learning and data science projects, covering topics from advertising and sales to health and sports analytics

ai classification data-analysis data-science data-visualization deep-learning jupyter-notebook machine-learning models python regression-models

Last synced: 19 May 2026

https://github.com/mulukensholaye/spark_kafka_streaming_csv

Real-time streaming data analysis pipeline that integrates Apache Spark's streaming library to read records from a Kafka topic

apache-kafka apache-spark data-analysis python3 realtime-messaging

Last synced: 19 May 2026

https://github.com/syed-amjad-ali/airbnb-listing-analysis

Analyzing AirBnB listings in Paris to determine the impact of recent regulations

business-intelligence data-analysis jupyter-notebook maven-analytics python

Last synced: 19 May 2026

https://github.com/hawmex/aut_data_and_information_analysis_project

This repository contains the files of my project for the "Data & Information Analysis" course at AUT (Tehran Polytechnic).

data-analysis data-science k-means outlier-detection python

Last synced: 19 May 2026

https://github.com/brevex/hotel-booking-demand-data-analysis

Data analysis in Python of demand for urban hotels and resorts showing their causes and relationships

data-analysis data-science hotel-booking-analysis kaggle python

Last synced: 08 May 2026

https://github.com/devexpress-examples/wpf-pivotgrid-how-to-display-underlying-data

This example demonstrates how to obtain the records from the control's underlying data source for a selected cell or multiple selected cells.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 19 May 2026

https://github.com/samir-atra/share-lm_dataset_analysis

Analysis, studies and optimizations on the ShareLM extension dataset

data-analysis data-visualization gemma3n huggingface huggingface-transformers pandas

Last synced: 19 May 2026

https://github.com/nehul1149/olympic-data-analysis

This project is an interactive data visualization and analytics platform for exploring historical Olympic Games data. Built with Python and Streamlit, it offers an in-depth analysis of medal tallies, athlete statistics, and country-wise performance trends, providing users with powerful insights into the world's biggest sporting event.

analysis data-analysis data-science data-visualization matplotlib python streamlit

Last synced: 18 May 2026

https://github.com/prakshal0809/sql-data-analysis-project

This project analyzes pizza sales data using SQL to answer a range of data analysis questions, covering foundational to advanced SQL techniques.

data-analysis sql

Last synced: 26 Jun 2025

https://github.com/borjamome/radiografia-madrid

Analysis of the population, economy, and society of Madrid with R.

data-analysis data-visualization madrid r

Last synced: 17 Jun 2025

https://github.com/singingsandhill/data_analysis

Data analysis: summary of personal projects

data-analysis python

Last synced: 19 May 2026

https://github.com/shriansh8619/sql_eda

Explored relational databases using SQL to perform comprehensive Exploratory Data Analysis (EDA), covering database exploration, segmentation, trend analysis, and performance ranking. Developed reusable SQL scripts to analyze dimensions, measures, and time-based metrics, helping uncover key business insights.

data-analysis exploratory-data-analysis mysql

Last synced: 20 Aug 2025

https://github.com/thecoderpinar/globalwarmingforecast

🌍 Global Warming Forecast Tool: an advanced tool for analyzing and forecasting climate trends using ARIMA and Prophet models, with interactive visualizations and scenario simulations.

arima climate-change data-analysis environmental-science forecasting global-warming machine-learning prophet streamlit time-series-analysis visualization

Last synced: 27 Mar 2025

https://github.com/kushagrakumar04/visual-age-distribution

A bar chart or histogram that visually depicts the distribution of a categorical or continuous variable, such as the age distribution or gender composition within a population. This graphical representation provides a clear overview of the data's patterns and trends.

data-analysis data-science google-colab

Last synced: 21 Jun 2025

https://github.com/myriamba/neuraview

AI-Powered Data Insights and Visualization Generator

data-analysis data-engineering data-insights data-visualization generative-ai llm

Last synced: 21 Aug 2025

https://github.com/sukhitashvili/pca_tutorial

PCA algorithm from scratch, using only matrix-vector multiplications

data-analysis data-science data-visualization machine-learning-algorithms pca

Last synced: 29 Mar 2025

https://github.com/samukiszhsd/alteryx-analytics

You are working with Itaú bank transaction data and need to run some analyses to help the audit team detect unusual patterns and possibly suspicious transactions.

alteryx data-analysis data-structures data-visualization etl workflow

Last synced: 18 Feb 2026

https://github.com/prady2309/stock-analysis

Analysis on the stock prices of Apple, Google, Microsoft and Amazon

data-analysis data-science data-visualization python stock-market

Last synced: 19 May 2026

https://github.com/eve-ning/ppshift

Analyzes maps and scores from 2015

data-analysis data-mining osu osugame

Last synced: 13 Feb 2026

https://github.com/saroshfarhan/irish_hospital_data_anaysis

Analysis of Irish hospital patient discharge data for four counties

data-analysis data-science data-visualization healthcare irish-data r-programming-language

Last synced: 18 Feb 2026

https://github.com/sebastianurdaneguibisalaya/colocaciones-de-credito-fondo-mivivienda-peru

I explore the credit placements of Fondo MIVIVIENDA S.A. between 2018 and 2022, using a dataset downloaded from Peru's National Open Data Portal (Portal Nacional de Datos Abiertos). 🏠

data-analysis jupyter-notebook python

Last synced: 24 Feb 2025

https://github.com/darshan1924/house-price-pridiction

This repository contains a machine learning project for predicting house prices based on various features, including geographical coordinates. The project includes data preprocessing steps.

data-analysis data-preprocessing house-prices jupyter-notebook machine-learning prediction

Last synced: 27 Mar 2025

https://github.com/mosalem149/pythonutilities

A collection of Python scripts for common utility tasks including file manipulation, word counting, longest word detection, and grade categorization. Perfect for quick and easy solutions to everyday programming problems.

data-analysis educational-tools file-io file-manipulation grade-calculation python text-analysis text-processing utility word-counting

Last synced: 15 May 2026

https://github.com/beyzabasarir/northwind-traders-analysis

Northwind dataset analysis using PostgreSQL, Python, and Power BI. Focused on sales, customers, shipping, and performance insights.

dashboard data-analysis data-visualization jupyter-notebook matplotlib numpy pandas postgresql powerbi python seaborn

Last synced: 10 Apr 2026

https://github.com/parthkumarmpatel/sql-exploratory-data-analysis

SQL EDA scripts for sales data warehouse — metrics, insights, and rankings from my data warehouse project.

data-analysis exploratory-data-analysis sql-server

Last synced: 26 Jun 2025

https://github.com/amr-yasser226/interactive-sales-analytics-dashboard

An interactive web-based dashboard for visualizing multinational electronics sales data. This project for the DSAI 203 course integrates a Python/Flask backend with an amCharts frontend to provide dynamic insights into product revenues, sales distribution, and employee statistics across different countries.

am5charts amcharts business-intelligence css dashboard data-analysis data-analytics data-visualization flask html javascript python sqlalchemy sqlite web-application

Last synced: 13 Apr 2026

https://github.com/xenon1919/credit-card-fraud-detection

Credit Card Fraud Detection is a machine learning project to predict fraudulent credit card transactions. It handles imbalanced data using undersampling and applies Logistic Regression and XGBoost models. With an AUC of 0.98, it offers robust fraud detection. Includes a Streamlit app for real-time predictions.

data-analysis machine-learning python

Last synced: 14 May 2026

https://github.com/spshah1701/world-development-indicators

Analysis of World Development Indicators (WDI) using big data technologies, specifically Databricks, Apache Spark, and Scala.

apache-spark big-data data-analysis spark-sql

Last synced: 17 Mar 2025

https://github.com/adeebkhan25/dataset_suicide_susceptible

The "Student Suicide Risk Factors Dataset" is a comprehensive collection of data aimed at understanding and mitigating the factors contributing to student suicides.

data-analysis dataset machine-learning supervised-learning

Last synced: 24 Dec 2025

https://github.com/alimiheb/advwokcube-analysis

A comprehensive SSAS cube project based on AdventureWorksDW2019, featuring data cleaning, multidimensional modeling, and visualizations in Power BI and Excel.

adventureworks data-analysis excel powerbi sql-server ssas-multidimensional visualization

Last synced: 26 Jun 2025

https://github.com/kaushik-puttaswamy/amazon-sales-dashboard-using-tableau

The Amazon Sales Data Analysis Dashboard provides insights into key sales metrics like profit, revenue, shipment days, and units sold. It includes visualizations to assess performance by region, country, and sales channel. The dashboard helps stakeholders optimize strategies and improve profitability through data-driven analysis.

dashboard data-analysis data-visualization tableau

Last synced: 11 Jan 2026

https://github.com/nivasharmaa/friskwatch

A Java program for analyzing stop-and-frisk data from the NYPD. Features data import, organization, and statistical analysis to compare occurrences during and after policy implementation.

data-analysis data-visualization dataprocessing datascience file-io java java-oop nypd-data

Last synced: 19 May 2026

https://github.com/pramodkondur/dataspark-end-to-end-dataanalytics

Cleaned the data, performed EDA, and stored it in MySQL. Queried and analyzed the data, uncovering opportunities to drive revenue growth and optimize operations, with a potential revenue growth of $30.03 million. Reported key insights using Power BI.

data-analysis data-visualization eda powerbi python sql

Last synced: 21 May 2026

https://github.com/shellynagar27/marketing-content-performance-analysis

Analyzed 2024 social media campaign data from TikTok, Instagram, LinkedIn, and X.com using Power BI to uncover performance trends across platforms, content types, and regions. Built an interactive dashboard to drive insights on engagement, optimal posting times, and content strategy.

data-analysis data-modelling data-visualization excel figma marketing-analytics powerbi powerquery wireframing

Last synced: 26 Jun 2025

https://github.com/aidan-zamfir/advt-analysis

Web scraping project. Will eventually use character/episode data for NLP and networking/data analysis.

data-analysis nlp python selen webscraping

Last synced: 23 Aug 2025

https://github.com/ritap03/neuralnetwork-shapeclassifier

Feedforward neural network system in MATLAB for geometric shape classification. Includes data preprocessing, network training and evaluation, confusion matrix analysis, and a graphical interface for user interaction and model testing.

ai data-analysis deep-learning feedforward-network gui image-classification machine-learning matlab neural-network pattern-recognition

Last synced: 14 May 2026

https://github.com/kevingastelum/mydataanalysis

My DataAnalyst Projects | Python, SQL, Excel, PowerBI & Tableau

data-analysis python sql visualization

Last synced: 20 May 2026

https://github.com/kevin-rsj/sectores_economicos_covid-19

Exploratory Data Analysis (EDA): behavior of economic sectors before, during, and after the COVID-19 pandemic (2019-2022)

data-analysis financial-analysis pandemic-analysis python stock-market time-series visualization yahoo-finance

Last synced: 20 May 2026

https://github.com/ryuzen6/kaggle-series

This is a series of Machine Learning/Deep Learning Models made for practice.

artificial-intelligence data-analysis data-science deep-learning machine-learning python3

Last synced: 20 May 2026

https://github.com/prince-pastakiya/human-resources-tableau-project

👥 Interactive Tableau dashboard for HR analytics — includes workforce overview, demographics, income analysis, and detailed employee records with full filtering.

chatgpt data-analysis data-visualization human-resources numpy python python-faker tableau-dashboards tableau-public

Last synced: 18 Apr 2026

https://github.com/ljadhav25/knn-algorithm-data-science-

This repository contains a project demonstrating the implementation and application of the K-Nearest Neighbors (K-NN) algorithm in Data Science. The objective is to provide a comprehensive understanding of the K-NN algorithm, including data preprocessing, model training, evaluation, and visualization of results. This project is ideal for beginners

data-analysis data-science knn-classification machine-learning matplotlib-pyplot numpy pandas-library seaborn

Last synced: 16 Apr 2026

https://github.com/who-else-but-arjun/isro_xrf_sr

Source code for super-resolution of the lunar elemental abundance map using a semi-supervised deep spatial interpolation model. This hybrid approach combines ResNet50 for spatial feature extraction with Graph Neural Network (GATv2Conv) layers and Convolutional Neural Networks (CNNs), followed by fusion layers.

cnn data-analysis graph-neural-networks pytorch semi-supervised-learning spatial-interpolation super-resolution

Last synced: 30 Apr 2026