An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/nxion/sql-data-warehouse-project

Building a modern data warehouse with MS SQL Server, ETL processes, data modeling, and analytics.

data data-analysis data-analytics data-engineering data-lakehouse data-warehouse datalake datascience etl etl-job medallion-architecture ms mssql sql sql-query sql-server

Last synced: 05 Jun 2026

https://github.com/maddieemihle/home_sales

A PySpark-powered analysis of real estate trends using home sales data. This project explores average prices by year, room configuration, and property features, while demonstrating SparkSQL, caching, and partitioning techniques in a scalable data pipeline, all within Google Colab.

apache-spark caching data-analysis googlecolab parquet pyspark sparksql

Last synced: 21 Apr 2026

https://github.com/mhuwaimel/data-analysis-of-students-results-in-qiyas

Analysis of student performance data from Qiyas (قياس), the Saudi Arabian National Center for Assessment

data-analysis jupyter-notebook python

Last synced: 22 Apr 2026

https://github.com/kgelli/apple-data-analysis---apache-spark

Modular ETL pipeline for analyzing Apple product purchase patterns using Apache Spark on Databricks with factory design patterns.

apache-spark data-analysis databricks delta-lake etl-pipeline factory-pattern pyspark

Last synced: 22 Apr 2026

https://github.com/thinogueiras/jornada-python

Jornada Python (Python Journey) by Hashtag Programação.

data-analysis data-science inteligencia-artificial python rpa

Last synced: 22 Apr 2026

https://github.com/leabrodyheine/california-schools-data-visualization

This front-end project provides interactive visualizations of learning models adopted by California schools during the pandemic. Using D3.js and Mapbox, it dynamically presents data through bar charts, bubble charts, heatmaps, and geographic maps, allowing users to explore trends across school types, sizes, and districts.

d3-visualization d3js data-analysis data-visualization mapbox openai plotly

Last synced: 22 Apr 2026

https://github.com/ayushi-gajendra/buenos-aires-subway-statistics

A comprehensive data analysis of the Buenos Aires subway system ridership using Python and Pandas. This project identifies peak-hour congestion patterns, explores hourly passenger distributions, and utilizes the 95th percentile to isolate extreme traffic conditions for urban mobility insights.

95th-percentile buenos-aires data-analysis data-science-portfolio data-visualization matplotlib pandas python statistical-analysis subway-ridership transit-data urban-mobility

Last synced: 05 Jun 2026
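
The 95th-percentile idea in the Buenos Aires entry above can be sketched in a few lines. This is a minimal illustration, not code from that repository: it assumes hourly ridership has already been aggregated into a dict mapping hour to passenger count, the function name `extreme_hours` is hypothetical, and it uses the standard library's `statistics.quantiles` instead of pandas.

```python
from statistics import quantiles

def extreme_hours(hourly_counts):
    """Return the hours whose passenger count is at or above the 95th percentile.

    hourly_counts: dict mapping hour (0-23) to passenger count (hypothetical layout).
    """
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%; the last one is the 95th percentile.
    # method="inclusive" treats the min and max as the 0th and 100th percentiles,
    # i.e. linear interpolation, the same convention as NumPy's default percentile.
    p95 = quantiles(hourly_counts.values(), n=20, method="inclusive")[-1]
    return {hour: count for hour, count in hourly_counts.items() if count >= p95}
```

With 22 quiet hours at 100 riders plus peaks of 1000 at 08:00 and 900 at 18:00, the threshold interpolates to 780, so only those two peak hours are flagged.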

https://github.com/floffah/my-listening

Various ways to analyse your Spotify extended streaming history data

convex data-analysis listening-history spotify

Last synced: 23 Apr 2026

https://github.com/al-ogr/sf_pr1_job_analysis_hh

SkillFactory DataScience PROJECT-1. Analysis of resumes from HeadHunter.

data-analysis data-science ipynb plotly python

Last synced: 23 Apr 2026

https://github.com/thc1006/nycu_timtable_crawler

🎓 NYCU Course Data Crawler & Timetable System (National Yang Ming Chiao Tung University course crawler and course-selection system): a Python web scraper for course schedules, syllabi, and educational data analysis. Crawls 18K+ courses with a 98% success rate. Features: interactive timetable, JSON API, Google Colab support, batch processing, resume capability.

academic course course-selection crawler data-analysis education educational-data google-colab json-api nycu open-data python schedule student-tools syllabus taiwan timetable university web-automation web-scraping

Last synced: 24 Apr 2026

https://github.com/shudhanshurp/adidas-us-data-analysis

This Power BI project analyzes Adidas sales data across different regions, retailers, and product categories in the U.S. The dashboards provide insights into sales performance, operational metrics, and future forecasts to support data-driven decision-making.

data-analysis data-transformation data-visualization forecasting powerbi python retail-analytics

Last synced: 24 Apr 2026

https://github.com/henriquetourinho/s.i.g.m.a

File search and analysis platform for Linux, with an advanced PySide6 GUI and a focus on rich metadata for in-depth investigations.

data-analysis developer-tools file-search metadata open-source pyqt pyside6 python python-brasil qt6 sysadmin-tools

Last synced: 24 Apr 2026

https://github.com/voidnire/redditviralmysteryposts

Analysis of posts from mystery subreddits. What makes a post go viral in this kind of subreddit?

data-analysis data-visualization mysteries mystery nlms python-3 reddit

Last synced: 24 Apr 2026

https://github.com/muthukumar0908/youtube-data-harvesting-and-warehousing-using-sql-mongodb-and-streamlit

Provides a simple, intuitive Streamlit user interface for retrieving and extracting YouTube data through the YouTube API with an API key; the extracted data is then stored in a database.

data-analysis mongodb-atlas python sqldatabase streamlit-webapp youtube-api

Last synced: 24 Apr 2026

https://github.com/edwinrlambert/emomap-sentiment-analysis

Analyzes public sentiment related to specific locations in a city (e.g., parks, transit stations, restaurants, neighborhoods) using geo-tagged social media posts, reviews, and comments. The goal is to visualize how people feel across different areas and times.

data-analysis jupyter-notebook python sentiment-analysis

Last synced: 24 Apr 2026

https://github.com/amlanmohanty1/zepto-sql-data-analysis-project

Complete Data Analysis on Zepto Inventory data using SQL

data-analysis database inventory-management postgresql sql zepto

Last synced: 24 Apr 2026

https://github.com/pedrohdosanjos/economic-data-analysis

This project aims to analyze the export data from various states in the United States to Brazil over time. The data is sourced from the FRED (Federal Reserve Economic Data) API and processed to identify the top 5 exporting states for each year, as well as the states with the highest total export value across all years.

api data-analysis data-visualization jupyter-notebook python

Last synced: 24 Apr 2026
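
The two rankings in the FRED export entry above (top 5 states per year, and top states by total across all years) reduce to a group-by followed by a top-n selection. A minimal sketch, assuming the FRED series have already been downloaded and flattened into `(state, year, export_value)` tuples; the record layout and the function name `top_exporters` are hypothetical, and the API call itself is not shown.

```python
import heapq
from collections import defaultdict

def top_exporters(records, n=5):
    """records: iterable of (state, year, export_value) tuples (hypothetical layout).

    Returns ({year: [(state, value), ...]}, [(state, total), ...]), each list
    sorted from largest to smallest and truncated to n entries.
    """
    by_year = defaultdict(dict)   # year -> {state: export value in that year}
    totals = defaultdict(float)   # state -> export value summed over all years
    for state, year, value in records:
        by_year[year][state] = by_year[year].get(state, 0.0) + value
        totals[state] += value
    top_by_year = {
        year: heapq.nlargest(n, states.items(), key=lambda kv: kv[1])
        for year, states in sorted(by_year.items())
    }
    top_overall = heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])
    return top_by_year, top_overall
```

`heapq.nlargest` avoids sorting every state when only the top few are needed; with pandas the same result would come from a `groupby` followed by `nlargest`.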

https://github.com/fbarffmann/belly-button-challenge

Built an interactive JavaScript dashboard to visualize bacterial biodiversity from belly button samples. Analyzed data from 153 participants and identified OTU 1167 as the most common bacteria.

biodiversity dashboard data-analysis data-visualization interactive-charts javascript json plotly

Last synced: 25 Apr 2026

https://github.com/m-biriulova/python-job-market-analysis

Web scraping, data analysis, and visualization of Python developer vacancies in the Czech Republic.

automation beautifulsoup data-analysis data-visualization portfolio-project python selenium web-scraping

Last synced: 25 Apr 2026

https://github.com/sarangs1621/weather-prediction

Weather Prediction Using Machine Learning is a project that leverages machine learning algorithms to predict weather conditions based on historical data. It evaluates three popular ML models (Decision Tree, KNN, and Logistic Regression) and provides performance insights through metrics and visualizations.

data-analysis decision-tree jupyter-notebook knn logistic-regression machine-learning predictive-modeling python scikit-learn weather-prediction

Last synced: 25 Apr 2026

https://github.com/aastopher/mma_outcome

Simple exploratory analysis of UFC Fights and Vegas fight odds from 1993 to 2021

data-analysis data-visualization

Last synced: 06 Jun 2026

https://github.com/edwinrlambert/investigating-netflix-movies

Demonstrates data analysis and visualization techniques for Netflix movies using Python in a Jupyter notebook. This is a DataCamp project.

data-analysis data-analysis-python netflix python

Last synced: 25 Apr 2026

https://github.com/devexpress-examples/wpf-pivotgrid-customize-the-cell-template

This example demonstrates how to customize the cell appearance in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 26 Apr 2026

https://github.com/dcs-training/2023-10-22-carpentry-social-science

Go to https://dcs-training.github.io/2023-10-22-Carpentry-Social-Science/ to follow along with the material.

data-analysis data-visualisation data-wrangling intro-to-programming r

Last synced: 06 Jun 2026

https://github.com/pararang/nams-thesis-fuzzy

A specialized data processing tool designed to help with Fuzzy Delphi Method calculations for thesis research data analysis. It was later extended with new features for data processing using other methods.

data-analysis dematel hacktoberfest hacktoberfest-accepted house-of-quality python sustainability vibecoding

Last synced: 27 Apr 2026
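
For readers new to the Fuzzy Delphi Method mentioned above, one commonly described formulation maps each expert's linguistic rating to a triangular fuzzy number (l, m, u), aggregates an item's ratings by taking the minimum of l, the geometric mean of m, and the maximum of u, defuzzifies the result, and keeps the item if the score meets a threshold. This is a sketch of that textbook variant, not the repository's code: the aggregation rule, the simple-average defuzzification, and the 0.7 threshold are assumptions, and published studies differ on all three.

```python
import math

def aggregate(ratings):
    """Combine experts' triangular fuzzy numbers (l, m, u) into one (assumed rule)."""
    lows, mids, highs = zip(*ratings)
    l = min(lows)
    m = math.prod(mids) ** (1 / len(mids))  # geometric mean of the middle values
    u = max(highs)
    return l, m, u

def defuzzify(tfn):
    """Simple average of a triangular fuzzy number (one of several defuzzification choices)."""
    l, m, u = tfn
    return (l + m + u) / 3

def screen(items, threshold=0.7):
    """items: {item_name: [(l, m, u), ...]} -> {item_name: (score, accepted)}.

    Each (l, m, u) would come from a linguistic scale, e.g. mapping
    'strongly agree' to (0.7, 0.9, 1.0); that scale is an assumption here.
    """
    result = {}
    for item, ratings in items.items():
        score = defuzzify(aggregate(ratings))
        result[item] = (score, score >= threshold)
    return result
```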

https://github.com/mumtaz4118/amazon-iphone-12-data-scrapped

Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages.

data-analysis data-extraction data-science data-scraping html mark-up python

Last synced: 27 Apr 2026

https://github.com/busesimsek/sql-projects

A collection of my SQL projects with insights into real-world datasets.

data-analysis data-analytics mysql sql

Last synced: 07 Jun 2026

https://github.com/edanur-y/laptop-price-prediction-with-regression-models

Compares the performance of multi-layer perceptron, k-nearest neighbors, random forest, gradient boosting, and extreme gradient boosting regression models on laptop data to predict prices.

data-analysis data-transformation feature-importance hyperparameter-tuning python

Last synced: 27 Apr 2026

https://github.com/banyc/dfplot

Summarize a data frame by plotting. `cargo install --git https://github.com/Banyc/dfplot.git`.

csv data-analysis plotly plotting statistics

Last synced: 27 Apr 2026

https://github.com/lotfiferaga/hotel-reviews-sentiment-analysis

Efficient Python-driven sentiment analysis for hotel reviews, providing insightful evaluations.

data-analysis data-visualization nlp python

Last synced: 07 Jun 2026

https://github.com/tillscode/personal-finance-ml-analysis

Machine learning analysis of personal financial data with predictive modeling and an interactive dashboard.

dashboard data-analysis finance machine-learning python scikit-learn

Last synced: 28 Apr 2026

https://github.com/sujata-adhikari/data-analysis

Data analysis of market sales data using Power BI, with a dashboard that presents the analysis.

data-analysis excel pandas powerbi

Last synced: 12 Jun 2026

https://github.com/simranshaikh20/diwali-sales-analysis-for-business-insights

A data analysis project on Diwali sales that shows how much was sold by state, gender, and age.

data-analysis data-visualization python

Last synced: 28 Apr 2026

https://github.com/elmezianech/autoinventory

This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.

automation dashboard data-analysis data-engineering-pipeline docker etl glue-job inventory-management kafka kpis lambda-functions lstm ml-pipeline mlflow power-bi pytorch redshift s3 streamlit warehouse-management

Last synced: 28 Apr 2026

https://github.com/george-njuguna/spotify-etl-pipeline

An ETL pipeline that uses the Spotify API, Docker, and Airflow.

apache-airflow data-analysis docker pipelines python

Last synced: 28 Apr 2026

https://github.com/dcs-training/decode-winterschool

Material on cluster analysis, data wrangling, and network analysis. See the README for more information.

data-analysis data-visualisation data-wrangling gephi network-analysis python r statistics

Last synced: 28 Apr 2026

https://github.com/rorrell/coviddeaths

A Jupyter Notebook where I create several visualizations based on data about COVID-19 deaths from 2020 to 2024

data-analysis data-visualization jupyter-notebook python3

Last synced: 28 Apr 2026

https://github.com/buabaj/fortran-assignment

Code repository for a Fortran and Python climatology assignment.

big-data climatology data-analysis data-visualization fortran90 python

Last synced: 28 Apr 2026

https://github.com/priyanshubiswas-tech/e-commerce_data_analysis

Analyzes 9,994 e-commerce transactions to uncover insights on sales trends, customer behavior, profitability, and logistics using EDA and visualization. Identifies top products, customer segments, and shipping efficiencies to optimize marketing, inventory, and operations, making it valuable for retail, finance, and logistics.

data data-analysis data-visualization pandas pandas-dataframe plotly-analytics-projects plotly-express python

Last synced: 28 Apr 2026

https://github.com/ericdataplus/kaggle-airbnb-nyc

NYC Airbnb Market Analysis: combines two Kaggle datasets (151K listings).

airbnb data-analysis kaggle nyc python visualization

Last synced: 28 Apr 2026

https://github.com/wei-rongrong2/openfoodfactclustering

A project that explores clustering food products based on nutritional attributes using K-Means, Fuzzy C-Means, and DBSCAN algorithms, with a Streamlit dashboard for visualizing results.

clustering dashboard data-analysis dbscan food-products fuzzy-cmeans k-means machine-learning nutrition nutrition-clustering open-food-facts streamlit

Last synced: 28 Apr 2026
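
K-Means, the first of the three clustering algorithms named in the food-products entry above, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal standard-library sketch of that loop (Lloyd's algorithm with random initialization), not the repository's code; it assumes each product is already a tuple of numeric nutrition features, and the name `kmeans` is hypothetical.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids (distinct input points)

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]
```

Real nutrition features live on very different scales, so they would normally be standardized before clustering; library implementations also use smarter initialization (such as k-means++) than the random choice here.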

https://github.com/josedanielchg/efficient-data-storage-for-predictive-modeling

DataCamp project from the Associate Data Scientist track, focusing on optimizing dataset storage by transforming data types and filtering. Prepares data for efficient machine learning workflows

cleaning-dataset data-analysis jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/ricram2/column-name-extractor

Jupyter Notebook. Takes a folder containing one or more CSV files and returns a single CSV listing every column name with 3 example values (the first value, then two random ones).

data-analysis pandas

Last synced: 29 Apr 2026
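
The column-name compendium described above can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's pandas code: the function name, the output header names, and the UTF-8 encoding are hypothetical choices.

```python
import csv
import random
from pathlib import Path
from typing import Optional

def column_compendium(folder: str, out_path: str, seed: Optional[int] = None) -> list:
    """Write one CSV listing every column of every CSV in `folder`, with the first
    value and two randomly chosen values as examples. Returns the rows written."""
    rng = random.Random(seed)
    summary = []
    for csv_file in sorted(Path(folder).glob("*.csv")):
        with csv_file.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            columns = reader.fieldnames or []  # read the header while the file is open
            rows = list(reader)
        for column in columns:
            values = [row[column] for row in rows]
            # first value, then two random picks (which may repeat the first)
            examples = [values[0], rng.choice(values), rng.choice(values)] if values else ["", "", ""]
            summary.append([csv_file.name, column, *examples])
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "column", "first", "random_1", "random_2"])
        writer.writerows(summary)
    return summary
```

Writing the output outside the input folder keeps it from being picked up as an input on the next run.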

https://github.com/devexpress-examples/web-forms-pivot-grid-change-summary-display-mode

This example shows how to use different summary display modes in Pivot Grid for Web Forms.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 29 Apr 2026

https://github.com/thanaraklee/pyspark-dataframe-operations

This project focuses on utilizing PySpark DataFrames to analyze and visualize data sourced from external datasets, such as CSV files. It provides a practical example of how to manipulate, transform, and gain insights from large datasets using the PySpark framework.

data-analysis dataframe pyspark python

Last synced: 29 Apr 2026

https://github.com/kawshik-khan/fake-news-analysis

A fake news detection ML model. It utilizes the Bag of Words model for text vectorization and a Multinomial Naive Bayes classifier to predict whether news articles are real or fake. The project covers data preprocessing, model training, and performance evaluation with accuracy metrics and a confusion matrix.

data-analysis data-science machine-learning ml python3

Last synced: 08 Jun 2026
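
The fake-news entry above pairs Bag of Words vectorization with a Multinomial Naive Bayes classifier. Libraries such as scikit-learn provide this as `CountVectorizer` plus `MultinomialNB`; the sketch below implements the same model from scratch with the standard library so the mechanics are visible. It is not the repository's code, and the tokenizer regex and class layout are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Bag of words: lowercase word tokens (word order is discarded by counting)."""
    return re.findall(r"[a-z']+", text.lower())

class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing; alpha=1 is Laplace (add-one) smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # ignore unseen words
        v = len(self.vocab)

        def log_posterior(c):
            # unnormalized log P(c) + sum of log P(word | c) with smoothed word frequencies
            denom = self.total_words[c] + self.alpha * v
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][t] + self.alpha) / denom) for t in tokens
            )

        return max(self.classes, key=log_posterior)
```

Working in log space avoids underflow when many small word probabilities are multiplied together.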

https://github.com/marcinz20/anomaly-detection-in-credo-dataset

University project whose goal is to build a system that detects anomalies in the CREDO dataset.

credo data-analysis data-science encoder-decoder-model jupiter-notebook pca-analysis python3

Last synced: 29 Apr 2026

https://github.com/nivasharmaa/spiderverse

A comprehensive Java program for analyzing and managing events and data points within a fictional spiderverse. Features event handling, anomaly detection, cluster management, and robust file I/O operations.

advanced-algorithms anomaly-detection clustering data-analysis file-io object-oriented-programming

Last synced: 29 Apr 2026

https://github.com/kasraskari/learn-r-codes

A learning repository for R programming, covering data manipulation, visualization, and statistical analysis. (Work in progress!) 🚧

data-analysis data-analysis-r data-visualization r r-examples r-graphics r-statistics statistics

Last synced: 08 Jun 2026

https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the JSON files scraped by medium_scrapper_post.py. To extract information from the JSON files scraped by medium_scrapper_tag_archive.py (which scrapes the tag archive), use Data_Extraction_Archive_Tags.py.

data data-analysis data-analytics data-extraction data-preprocessing data-science data-scraping deep-learning machine-learning python

Last synced: 29 Apr 2026

https://github.com/anilyigitsel/istanbul-rental-apartments-analysis

This project analyzes the Istanbul Rental Apartments Dataset (2025), which includes rental apartment listings from Istanbul, Turkey.

data-analysis data-visualization jupyter-notebook matplotlib pandas python rental-housing

Last synced: 29 Apr 2026

https://github.com/eco786786/restaurant_orders

This analysis seeks to uncover patterns in customer behaviour by examining restaurant order data.

data-analysis git postgresql tableau

Last synced: 29 Apr 2026

https://github.com/findmyway/dataframe-in-julia

A quick introduction to DataFrames in Julia for users coming from Python.

data-analysis dataframe julia jupyter-notebook

Last synced: 29 Apr 2026

https://github.com/fatihilhan42/starbucks_analysis_turkey_and_world_with_python

In this project, coffee brands were examined first worldwide and then in Turkey. The dataset, which is included in the repo, was first organized through data cleaning, and the cleaned data was then presented graphically using data visualization techniques.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 29 Apr 2026

https://github.com/mr-dhan/eda-sales-customer-transactions

In the competitive world of retail, a deep understanding of customer behavior is an important foundation for strategic decision-making. However, customer transaction data is often large and complex, so an effective analysis process is needed to uncover valuable insights.

dashboard data data-analysis data-analysis-python data-science data-visualization eda python

Last synced: 29 Apr 2026

https://github.com/mfakhriazhar/python-data-analyst-tutorial

A collection of my Python learning files for data analyst purposes. Covers fundamental to advanced topics such as data exploration, visualization, statistical analysis, and the use of popular libraries like Pandas, NumPy, Matplotlib, and Seaborn. Suitable for personal documentation or shared learning references.

data-analysis data-science data-visualization exploratory-data-analysis portfolio python

Last synced: 29 Apr 2026

https://github.com/teja-1403/forage-standard-bank-data-science

This repository contains solutions to the 4 different tasks that must be performed during the Data Science virtual internship provided by Standard Bank via Forage.

automl communication-skills data-analysis data-science machine-learning python sql

Last synced: 29 Apr 2026

https://github.com/farhad-here/textprepx

A Multilingual Text Preprocessing Tool for English and Persian.

cleantext contractions data-analysis deep-learning emoji nlp nltk opp parsivar regex streamlit text-preprocessing textblob

Last synced: 29 Apr 2026

https://github.com/srinibas-masanta/yelp-business-reviews-analysis

This project analyzes Yelp business reviews using Python, Snowflake, and SQL, focusing on efficient data ingestion, transformation, and analysis. We preprocess JSON data, optimize ingestion via Amazon S3, classify sentiments with Python UDFs, and extract insights using SQL queries, showcasing a streamlined end-to-end workflow.

amazon-s3 data-analysis json python snowflake sql

Last synced: 29 Apr 2026

https://github.com/chandantech2023/sales-trend-analysis

This repository features the Superstore Sales Analysis project, demonstrating data cleaning and analysis using Python and SQL, along with interactive visualization in Power BI.

data-analysis data-science dax kaggle powerbi-desktop python3 sql

Last synced: 29 Apr 2026

https://github.com/sharoonjoseph11/indian-liver-diseases

Indian Liver Disease Analysis and Prediction. This project leverages the Indian Liver Patient Dataset (ILPD) to analyze liver disease trends and develop predictive models for early diagnosis. Through data preprocessing, exploratory analysis, and machine learning, it identifies key risk factors and builds classification models.

data-analysis data-science data-visualization logistic-regression machine-learning pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/valikmorinko/ecommerce-sales-analysis

E-commerce sales analysis: data, visualizations, and analytical conclusions.

data-analysis e-commerce jupyter matplotlib pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/sdley/cas_pratique-del_annuel

Del-Annuel is annual deliberation software for higher-education schools and universities.

data-analysis pandas python tkinter-gui

Last synced: 29 Apr 2026

https://github.com/alunera-data/sql-use-cases

Practical SQL use cases for Business Intelligence and IT Service Management (BI & ITSM)

business-intelligence dashboards data-analysis data-quality eda itsm kpis postgresql process-monitoring query reporting sql sqlserver

Last synced: 29 Apr 2026

https://github.com/meinhere/ta-pendat

Final project for the Data Mining course: classifying patient trauma using the Naive Bayes method.

data-analysis data-mining naive-bayes-classifier python trauma

Last synced: 29 Apr 2026

https://github.com/varshan1123/sql-tableau-project

We analyze key indicators for our pizza sales data to gain insights into our business performance - A Data Analysis Project performed on Tableau & SQL.

analysis data-analysis data-science data-visualization excel mysql powerbi sql sql-server tableau tableau-dashboards

Last synced: 29 Apr 2026

https://github.com/al-ghaly/e-commerce-a-b-testing

A statistical analysis project in which I performed an A/B test to analyze the effect of changing the user interface of an e-commerce company's website.

data-analysis matplotlib numpy pandas python python-data-analysis seaborn statistical-analysis statistics

Last synced: 29 Apr 2026
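
The A/B-testing entry above does not say which test was used. A common choice for comparing conversion rates between a control page (A) and a changed page (B) is the two-proportion z-test with a pooled standard error; the sketch below implements it with the standard library. It is an illustration, not the project's code, and the function name and example counts are hypothetical.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for H0: conversion rate A == conversion rate B.

    conv_*: number of converted users; n_*: number of users shown each version.
    Returns (z, p_value); a positive z means B converts better than A.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # rate under H0 (no difference)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

For example, 100 conversions out of 1000 on the old page versus 150 out of 1000 on the new one gives z of about 3.38 and a p-value well below 0.01.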

https://github.com/supertetelman/frc-data-analysis

A collection of R, Matlab, and Bash scripts that were developed in real time from the stands of an FRC competition. They gathered data from various online sources, parsed it, and ran some basic analysis on it to calculate ratings and make basic match predictions. Results were made public and hosted live via AWS. Developed as a student teaching tool under poor Internet connectivity with minimal access to real-time match data.

bash data-analysis matlab r teaching

Last synced: 29 Apr 2026

https://github.com/yimethan/basics-of-data-analysis

2023-2 Basics of Data Analysis

data-analysis numpy pandas python

Last synced: 29 Apr 2026

https://github.com/mominurr/amazon-best-sellers-data-analysis

Exploring trends and product insights in Amazon Best Sellers data.

data-analysis data-visualization python scraping selenium tableau

Last synced: 29 Apr 2026

https://github.com/monddavila/online-retail-data-analysis

Online Retail Exploratory Data Analysis with Python

data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/farhad-here/student_performance_analyzer

Student Performance Analyzer in Python, one of my data analysis course projects. It teaches the use of filter(), lambda, and map() in Python.

data-analysis data-visualization filter kaggle kaggle-dataset lambda map pandas python python-tutorial streamlit

Last synced: 29 Apr 2026

https://github.com/alam025/algo-trading-bot

Backtested 20+ strategies achieving 18% annualised returns on historical S&P 500 data

api ccxt data-analysis finance fintech pandas postgresql python

Last synced: 08 Jun 2026

https://github.com/angchekar28/air-quality-index-analysis

This project analyzes Air Quality Index (AQI) data to identify pollution trends, seasonal variations, and the impact of different pollutants. It includes data visualization, correlation analysis, and insights into air quality variations over time.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook machine-learning python

Last synced: 30 Apr 2026

https://github.com/shruti-h/sales_data_analysis

Sales Data Analysis | Pandas & Matplotlib

data-analysis data-science data-vi matplotlib pandas-library python

Last synced: 30 Apr 2026

https://github.com/avazasgarov/soccer-hypothesis-testing

Statistical analysis comparing goal-scoring patterns in Men’s vs. Women’s FIFA World Cups using hypothesis testing.

data-analysis eda hypothesis-testing matplotlib-pyplot pandas pingouin python scipy

Last synced: 30 Apr 2026
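
The World Cup entry above does not name its test, and its tags point to scipy and pingouin. As a library-free illustration of the same hypothesis-testing idea, here is a one-sided permutation test on mean goals per match. This is a swapped-in technique rather than the project's method, and the function name and inputs are hypothetical.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test of H1: mean(a) > mean(b).

    a, b: goals per match for the two groups (hypothetical inputs).
    Returns (observed difference in means, p-value).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # under H0 the group labels are exchangeable
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    # add-one correction so the estimated p-value is never exactly zero
    return observed, (hits + 1) / (n_resamples + 1)
```

A permutation test makes no normality assumption, which suits skewed count data such as goals per match.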

https://github.com/aishwaryagade02/loan-funnel-optimization-analysis

Tracks how loan applications move through each stage, helps spot where people drop off, and gives clear insights to improve approval strategies and overall performance.

ab-testing data-analysis data-creation hypothesis-testing python reporting sql statistical-methods streamlit

Last synced: 30 Apr 2026
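
The stage-by-stage drop-off tracking in the loan-funnel entry above comes down to two ratios per stage: the share of the previous stage that continued, and the share of all applicants that reached the stage. A minimal sketch, not the repository's code; the function names, stage names, and counts are hypothetical.

```python
def funnel_report(stages):
    """stages: ordered list of (stage_name, applications_reaching_stage)."""
    top = stages[0][1]
    rows = []
    prev = top
    for name, count in stages:
        rows.append({
            "stage": name,
            "count": count,
            # share of the previous stage that made it to this one (1.0 for the first stage)
            "step_conversion": count / prev if prev else 0.0,
            # share of all applicants that reached this stage
            "overall_conversion": count / top if top else 0.0,
        })
        prev = count
    return rows

def biggest_drop_off(rows):
    """Name of the stage (after the first) with the lowest step conversion."""
    return min(rows[1:], key=lambda r: r["step_conversion"])["stage"]
```

For 1000 applications, 800 verified, 400 underwritten, and 300 approved, the underwriting step keeps only 50% of its input, so it is the first place to look for approval-strategy improvements.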

https://github.com/mxagar/eda_fe_summary

An 80/20 guide for Data Processing: Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection.

data-analysis data-cleaning data-modeling data-science data-visualization eda exploratory-data-analysis feature-engineering feature-selection machine-learning pandas

Last synced: 30 Apr 2026

https://github.com/diogojorgebasso/dataanalysis_r_minesnancy

Code and materials for the data analysis in R courses at Mines de Nancy. You will also find R scripts, notebooks, and other resources for each lesson.

analyse-data data-analysis data-science data-visualization estatistics r statistiques statistiques-descriptives

Last synced: 30 Apr 2026

https://github.com/samuelpillai/machine-learning-classification-regression-nlp

A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

classification data-analysis data-science data-visualization feature-engineering jupyter-notebook machine-learning ml-pipeline model-evaluation nlp python regression-models scikit-learn supervised-learning text-mining

Last synced: 30 Apr 2026