Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/kmbuki/uk_police_data

R programming - Using open data about crime and policing in England, Wales and Northern Ireland.

data-analysis data-visualization r

Last synced: 04 Jun 2026

https://github.com/misaghmomenib/soccer-match-analysis

This project predicts football match outcomes (home win, away win, or draw) using historical match data. It involves data preprocessing, exploratory analysis, and training a random forest model to predict results based on features like shots, possession, and passes.

data-analysis git open-source python

Last synced: 20 Apr 2026

https://github.com/anjaliwork20/moodify

A mood-based music recommendation system that considers a user's emotional state to recommend songs, genres, artists, and playlists using machine learning.

artificial-intelligence cnn-keras cnn-model convolutional-neural-networks data data-analysis data-science data-structures data-visualization database deep-learning machine-learning machine-learning-algorithms python recommended song songs

Last synced: 20 Apr 2026

https://github.com/xre22zax/roller-coaster

Explore award-winning wood and steel coasters from 2013-2018 Golden Ticket Awards & Captain Coaster, all powered by Python and interactive visualizations.

analytics data-analysis data-visualization pandas python python-lambda python3 visualization

Last synced: 20 Apr 2026

https://github.com/abinashsahoo007/project-bankruptcy-prevention

This project builds a classification model that predicts the likelihood of a business facing bankruptcy based on key features such as industrial risk, management risk, financial flexibility, credibility, competitiveness, and operating risk.

data-analysis data-mining data-visualization deployments eda machine-learning pickle python statistics streamlit

Last synced: 20 Apr 2026

https://github.com/mozeel-v/spam-detection

An ML-powered SMS spam classifier using NLP and scikit-learn. It detects and filters spam messages through an interactive Streamlit UI.

classification data-analysis mnb streamlit

Last synced: 10 May 2026

https://github.com/profasem/logistics-performance-analysis

Power BI dashboard analyzing logistics performance, delivery delays, carrier efficiency, and regional risk.

business-intelligence dashboard data-analysis logistics powerbi python supply-chain

Last synced: 21 Apr 2026

https://github.com/jeffbrennan/analysis-templates

Templates of commonly used graphics, functions, and settings to help focus on the bigger picture.

data-analysis r rmd

Last synced: 12 Oct 2025

https://github.com/tmmvn/analytics-notebooks

A collection of data analytics notebooks created while testing out JetBrains DataLore.

ai algorithms data-analysis datalore elements-of-ai helsinki-university-mooc python

Last synced: 22 Apr 2026

https://github.com/rajesh9943/sentiment-analysis-of-consumer-opinions-on-amazon-products

A sentiment analysis system that classifies Amazon product reviews as positive, neutral, or negative. The project combines natural language processing (NLP) techniques with machine learning algorithms to extract actionable insights from customer feedback.

amazon data-analysis data-manipulation data-preprocessing data-presentation data-visualization machine-learning nlp nlp-library nltk product-reviews-analysis sentiment-analysis sklearn-library word-cloud-generator-in-python-3

Last synced: 05 Jun 2026

https://github.com/kgotsosm/epl-analysis

Preparing data for machine learning algorithms to predict English Premier League match winners.

data-analysis data-cleaning data-modeling

Last synced: 22 Apr 2026

https://github.com/ayushi-gajendra/buenos-aires-subway-statistics

A comprehensive data analysis of the Buenos Aires subway system ridership using Python and Pandas. This project identifies peak-hour congestion patterns, explores hourly passenger distributions, and utilizes the 95th percentile to isolate extreme traffic conditions for urban mobility insights.

95th-percentile buenos-aires data-analysis data-science-portfolio data-visualization matplotlib pandas python statistical-analysis subway-ridership transit-data urban-mobility

Last synced: 05 Jun 2026

https://github.com/thc1006/nycu_timtable_crawler

🎓 NYCU Course Data Crawler & Timetable System | National Yang Ming Chiao Tung University course crawler and course-selection system - Python web scraper for course schedules, syllabi, and educational data analysis. Crawls 18K+ courses with a 98% success rate. Features: interactive timetable, JSON API, Google Colab support, batch processing, and resume capability.

academic course course-selection crawler data-analysis education educational-data google-colab json-api nycu open-data python schedule student-tools syllabus taiwan timetable university web-automation web-scraping

Last synced: 24 Apr 2026

https://github.com/arunabhagit/inventory-misalignment-and-revenue-loss-in-multi-store-bike-retail

This project focuses on identifying the inventory and demand mismatch causing stagnant sales and lost revenue in a bike retail chain. By analyzing store-level performance and regional customer preferences, the project aims to detect underperforming products.

data-analysis data-visualization powerbi python

Last synced: 24 Apr 2026

https://github.com/muthukumar0908/youtube-data-harvesting-and-warehousing-using-sql-mongodb-and-streamlit

A simple and intuitive Streamlit user interface that retrieves and extracts YouTube data using an API key and stores it in a database.

data-analysis mongodb-atlas python sqldatabase streamlit-webapp youtube-api

Last synced: 24 Apr 2026

https://github.com/cyberoctane29/python-for-data-analysis

A repository dedicated to learning Python for data analysis, data science, and data analytics. This collection of Jupyter notebooks covers practical exercises and concepts from the Google Advanced Data Analytics Professional Certificate program.

data data-analysis data-analytics data-science python

Last synced: 24 Apr 2026

https://github.com/gnodux/adb-link

An MCP server that connects to multiple databases. Supports access control and dynamic SQL query tool registration and invocation.

agent ai-tools data-analysis database-gateway go mcp mcp-server

Last synced: 06 Jun 2026

https://github.com/gunifiri/duckdb-ghw

🦆 Accelerate analytics with DuckDB's integration for GitHub workflows, enabling efficient data handling and processing directly within your repositories.

analytics analytics-engine big-data columnar-storage data-analysis data-science database duckdb in-memory-database open-source parquet python query-planner r sql

Last synced: 29 Apr 2026

https://github.com/jigyasag18/bird-strikes-in-aviation-project

This project analyzes over a decade of U.S. bird strike data (2000–2011) to evaluate safety risks, damage trends, and cost implications in aviation. Using PostgreSQL for database management and Power BI for dashboard visualization, it uncovers insights into when, where, and how wildlife impacts aircraft. The key findings are intended to inform strategy.

bird-strike-prevention bird-strike-prevention-in-real-airport data data-analysis data-analysis-project data-visualisation data-visualization data-visualization-project data-visualizations database dataset dax-query postgresql postgresql-database powerbi powerbi-desktop powerbi-report powerbi-visuals sql sql-database

Last synced: 09 May 2026

https://github.com/xjwllmsx/hacker-news-engagement

Analyze Hacker News data to reveal which post types and posting hours spark the most discussion, using Python and a reproducible Jupyter notebook.

data data-analysis jupyter python

Last synced: 25 Apr 2026

https://github.com/m-biriulova/python-job-market-analysis

Web scraping, data analysis, and visualization of Python developer vacancies in the Czech Republic.

automation beautifulsoup data-analysis data-visualization portfolio-project python selenium web-scraping

Last synced: 25 Apr 2026

https://github.com/mahdi-meyghani/movie-recommendation-system

A Python-based movie recommendation system utilizing popularity-based, content-based, and collaborative filtering models with data science and machine learning techniques.

data-analysis data-science machine-learning recommendation-system scikit-learn scikitlearn-machine-learning

Last synced: 23 Jan 2026

https://github.com/marielachirinosr/hotel-data-analysis

Pandas & Matplotlib learning analysis. A repository featuring data analysis projects using the Pandas and Matplotlib libraries.

data data-analysis matplotlib pandas python

Last synced: 25 Apr 2026

https://github.com/devexpress-examples/winforms-create-a-custom-exporter-for-pivotgridcontrol-with-xtrareport

This example illustrates how to dynamically create a custom report based on PivotGridControl content in WinForms.

data-analysis dotnet pivot-grid pivot-grid-for-winforms winforms

Last synced: 26 Apr 2026

https://github.com/dcs-training/2023-10-22-carpentry-social-science

Go to https://dcs-training.github.io/2023-10-22-Carpentry-Social-Science/ to follow along with the material.

data-analysis data-visualisation data-wrangling intro-to-programming r

Last synced: 06 Jun 2026

https://github.com/rociobenitez/happiness-index-data-processing

Repository for Big Data Processing. Contains Jupyter notebooks and datasets for data analysis and processing tasks related to big data.

big-data big-data-processing data-analysis data-processing happiness-index happiness-report jupyter-notebook matplotlib pandas seaborn

Last synced: 15 May 2026

https://github.com/moshora99/sql-data-warehouse-project

Building a modern data warehouse with MySQL, including ETL processes, data modeling, and analytics.

data-analysis data-engineering data-science database datawarehouse datawarehousing etl scheme sql sql-query sql-server

Last synced: 27 Apr 2026

https://github.com/mothraa/etl-marketanalysis-webscraping-poo

OC project 2 refactoring (OOP version not yet completed).

data-analysis etl poo python web-scraping

Last synced: 20 Oct 2025

https://github.com/hrosicka/czechpopulationestimation

This GitHub repository contains Python code for data analysis and population prediction in the Czech Republic up to the year 2050, using the Pandas and Matplotlib libraries.

data-analysis data-visualization matplotlib matplotlib-figures matplotlib-pyplot pandas pandas-dataframe pandas-library pandas-python python python3

Last synced: 11 May 2026

https://github.com/hfzdzakii/dicoding-airqualityanalysisdata

This repo is the main submission for my Dicoding final project, built on an air quality dataset. Feel free to explore; I hope my work gives you some insight!

data-analysis data-visualization streamlit

Last synced: 27 Apr 2026

https://github.com/sferez/gradient_descent

Multiple Linear Regression, Gradient Descent with Python

data-analysis data-science gradient-descent linear-regression python

Last synced: 12 May 2026

https://github.com/lotfiferaga/hotel-reviews-sentiment-analysis

Efficient Python-driven sentiment analysis for hotel reviews, providing insightful evaluations.

data-analysis data-visualization nlp python

Last synced: 07 Jun 2026

https://github.com/sweta2501/netflix_dataanalysis

Data analysis performed on a Netflix dataset.

data-analysis data-science jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/sujata-adhikari/data-analysis

Data analysis of market sales data using Power BI, with a dashboard presenting the results.

data-analysis excel pandas powerbi

Last synced: 12 Jun 2026

https://github.com/leticia-ducatti/sales-dashboard-project

Interactive sales dashboard built with Python and Streamlit. It shows KPIs, supports filtering, and visualizes sales data.

data-analysis pandas plotly python streamlit

Last synced: 12 May 2026

https://github.com/emmanuelletocs/steam-game-recommender

A recommendation system for Steam games that combines content-based and collaborative filtering techniques. Built with Python, scikit-learn, and Streamlit to deliver real-time game recommendations, it is aimed at gamers and data scientists interested in building recommendation engines.

als-algorithm data-analysis gaming-industry knn machine-learning mds mysql ncf neural-network pyspark recommendation-engine recommendation-system scikit-learn spark

Last synced: 28 Apr 2026

https://github.com/ggarciajavier/udacity-dalf-project2-wrangle-openstreetmap-data

Work performed for the 2nd project of Udacity Data Analyst Nanodegree: OpenStreetMap data wrangling and analysis.

data-analysis openstreetmap python sql

Last synced: 12 May 2026

https://github.com/dcs-training/decode-winterschool

This repository contains material on cluster analysis, data wrangling, and network analysis. See the README file for more information.

data-analysis data-visualisation data-wrangling gephi network-analysis python r statistics

Last synced: 28 Apr 2026

https://github.com/matheusafonseca/python-data-visualization-matplotlib-seaborn-masterclass-udemy

This repository is dedicated to storing the code developed during the "Python Data Visualization: Matplotlib & Seaborn Masterclass" course on Udemy.

charts data-analysis data-analysis-python data-science data-visualization database graphics graphics-programming jupyter-notebook matplotlib matplotlib-plots python python3 seaborn seaborn-plots

Last synced: 28 Apr 2026

https://github.com/Kaushik-Puttaswamy/Airline-Passenger-Referral-Prediction-Using-Machine-Learning

This project uses a machine learning model to predict if passengers referred by existing customers will book a flight, helping airlines target likely customers. Key factors like service ratings and value for money drive predictions, achieving over 90% accuracy.

airline-marketing customer-referral-prediction customer-satisfaction data-analysis feature-engineering hyperparameter-tuning machine-learning model-evaluation predictive-analytics

Last synced: 20 Oct 2025

https://github.com/buabaj/fortran-assignment

Code repository for a Fortran and Python climatology assignment.

big-data climatology data-analysis data-visualization fortran90 python

Last synced: 28 Apr 2026

https://github.com/priyanshubiswas-tech/e-commerce_data_analysis

Analyzes 9,994 e-commerce transactions to uncover insights on sales trends, customer behavior, profitability, and logistics using EDA and visualization. Identifies top products, customer segments, and shipping efficiencies to optimize marketing, inventory, and operations, making it valuable for retail, finance, and logistics.

data data-analysis data-visualization pandas pandas-dataframe plotly-analytics-projects plotly-express python

Last synced: 28 Apr 2026

https://github.com/leopeng1995/neuralsql

Make DataStore More Intelligent

data-analysis mongodb sql

Last synced: 12 May 2026

https://github.com/leosimoes/alura-7daysofcode-dados

Challenges from the Data Tracks: Data Science, Machine Learning, and Python Pandas.

data-analysis data-science jupyter-notebook machine-learning python

Last synced: 28 Apr 2026

https://github.com/prady2309/sales-prediction-using-python

Sales prediction implemented using multiple linear regression.

data-analysis data-science machine-learning python

Last synced: 29 Apr 2026

https://github.com/thanaraklee/pyspark-dataframe-operations

This project focuses on utilizing PySpark DataFrames to analyze and visualize data sourced from external datasets, such as CSV files. It provides a practical example of how to manipulate, transform, and gain insights from large datasets using the PySpark framework.

data-analysis dataframe pyspark python

Last synced: 29 Apr 2026

https://github.com/marcinz20/anomaly-detection-in-credo-dataset

A university project whose goal is to build a system that detects anomalies in the CREDO dataset.

credo data-analysis data-science encoder-decoder-model jupiter-notebook pca-analysis python3

Last synced: 29 Apr 2026

https://github.com/vanshuchaudhary/zomato

This Jupyter Notebook contains an exploratory data analysis (EDA) of Zomato restaurant data. It includes data cleaning, visualization, and insights into restaurant ratings, pricing, cuisine distribution, and location-based trends.

business-analytics data-analysis data-mining data-science data-visualization datascience matplotlib pandas-dataframe pandas-python python python-3 python-library

Last synced: 29 Apr 2026

https://github.com/roland045/smart_fluid_sedimentation_tester

Control program for a custom-developed smart fluid sedimentation tester system.

arduino data-analysis instrumentation measurement sensor

Last synced: 13 May 2026

https://github.com/dcs-training/network-analyisis-python

Course material introducing data visualization with Altair and network analysis with NetworkX (in Python). See the README file for more information.

data-analysis data-visualisation network-analysis python text-analysis

Last synced: 29 Apr 2026

https://github.com/yulia-momotyuk/dla-data-analysis-practice

This repository contains my homework assignments completed during the "Data Analyst in IT" course at Data Loves Academy.

analytics data-analysis data-visualization excel mysql numpy pandas postgres powerbi python seaborn sql tableau

Last synced: 14 Apr 2026

https://github.com/khushi-sabarad/web_scraping

This project is a Python-based web scraper that extracts the menu from a cafe and saves it to an Excel file. It was created to automate the process of retrieving and updating menu prices, a task that was observed to be done manually at the hostel.

beautifulsoup data-analysis data-visualization market-analysis pandas python requests web-scraping wordcloud

Last synced: 29 Apr 2026

https://github.com/carlos-edulira/mbabigdata-projeto

Submission of the Unipe MBA Big Data BI project.

data-analysis delta minio python spark

Last synced: 29 Apr 2026

https://github.com/manukot/sturdy-engine-python-

I've learnt not only various theoretical concepts but also worked on practical projects in my master's coursework.

data-analysis data-visualization python3

Last synced: 13 May 2026

https://github.com/srinibas-masanta/yelp-business-reviews-analysis

This project analyzes Yelp business reviews using Python, Snowflake, and SQL, focusing on efficient data ingestion, transformation, and analysis. We preprocess JSON data, optimize ingestion via Amazon S3, classify sentiments with Python UDFs, and extract insights using SQL queries, showcasing a streamlined end-to-end workflow.

amazon-s3 data-analysis json python snowflake sql

Last synced: 29 Apr 2026

https://github.com/istinnew/eniac_ab_insight

An analysis aimed at boosting iPhone 13 sales by optimizing the click-through rate (CTR) of the “SHOP NOW” button. It compares different button designs to determine the most effective strategy for increasing engagement.

ab-testing data data-analysis data-engineering data-science data-visualization google googlecolab libraries python testing testing-tools visual-studio-code

Last synced: 29 Apr 2026

https://github.com/brevex/code-complexity-data-analisis

Data collection that shows different complexity scores in an algorithmic dataframe.

code-analysis data-analysis data-science python

Last synced: 29 Apr 2026

https://github.com/dindagustiayu/data-processing

A digital textbook for interpreting characterisation results.

characterisation data-analysis gitbook latex-package myst qualitative-analysis quantitative-analysis

Last synced: 08 Jun 2026

https://github.com/supertetelman/frc-data-analysis

A collection of R, Matlab, and Bash scripts that were developed in real time from the stands of an FRC competition. The scripts gathered data from various online sources, parsed it, and ran basic analysis on it to calculate ratings and make basic match predictions. Results were made public and hosted live via AWS. Developed as a student teaching tool under poor internet connectivity with minimal access to real-time match data.

bash data-analysis matlab r teaching

Last synced: 29 Apr 2026

https://github.com/mominurr/amazon-best-sellers-data-analysis

Exploring trends and product insights in Amazon Best Sellers data.

data-analysis data-visualization python scraping selenium tableau

Last synced: 29 Apr 2026

https://github.com/monddavila/online-retail-data-analysis

Online Retail Exploratory Data Analysis with Python

data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/alam025/invoice-generator

Processed 500+ invoices with automated payment reminders and multi-currency PDF generation

api data-analysis finance fintech nextjs pdfkit prisma python stripe

Last synced: 08 Jun 2026

https://github.com/casassg/ms_thesis

Social Media Analysis for Crisis Informatics in the Cloud

casassg-thesis data-analysis google-cloud kubernetes

Last synced: 19 Oct 2025

https://github.com/aishwaryagade02/loan-funnel-optimization-analysis

Tracks how loan applications move through each stage, helps spot where people drop off, and gives clear insights to improve approval strategies and overall performance.

ab-testing data-analysis data-creation hypothesis-testing python reporting sql statistical-methods streamlit

Last synced: 30 Apr 2026

https://github.com/diogojorgebasso/dataanalysis_r_minesnancy

The code and materials for the data analysis courses in R at Mines de Nancy. You will also find R scripts, notebooks, and other resources for each lesson.

analyse-data data-analysis data-science data-visualization estatistics r statistiques statistiques-descriptives

Last synced: 30 Apr 2026

https://github.com/madhurragarwal/advertising-data-set---eda-and-ml

Logistic regression and EDA performed on an advertising dataset.

data-analysis machine-learning

Last synced: 13 May 2026

https://github.com/mitchellharrison/mitchellharrison.github.io

Welcome to my slice of the internet, where I share the knowledge that Duke gave me, so you don't have to spend the mortgage-sized amount to access it. Built with R, Python, Quarto, and love.

ai algorithms-and-data-structures blog data-analysis data-science data-visualization educational machine-learning portfolio portfolio-website quarto r r-language statistics tutorials

Last synced: 30 Apr 2026

https://github.com/ladaegorova18/data_analysis

Learning the basics of data analysis in Python

analytics data-analysis data-visualization steam-games

Last synced: 24 Jun 2026

https://github.com/gitchaell/computer-scrapping

A tool that extracts data from the websites of companies that sell computers in Trujillo, Peru, exports it to an XLSX file following a relational data model, and displays it in a Power BI dashboard.

data-analysis data-structures data-visualization database dbdiagram export-excel powerbi scrapper-script scrapping xlsx

Last synced: 01 May 2026

https://github.com/satvikpraveen/matplotlibmasterpro

📷 MatplotlibMasterPro is a complete, portfolio-ready project to master data visualization using matplotlib. Includes 16 notebooks, real datasets, exportable plots, custom themes, Streamlit dashboard, and Docker support. Ideal for learners and data professionals.

charts custom-plots dashboarding data-analysis data-science data-visualization educational-project interactive-visualizations jupyter-notebook matplotlib notebooks open-source plotting portfolio-project python python-utilities reproducible-research subplots time-series-analysis visualization-tools

Last synced: 14 May 2026

https://github.com/shruti-h/netflix-eda

Exploratory Data Analysis on Netflix Movies & TV Shows dataset using Python, Pandas, Matplotlib, and Seaborn

data-analysis data-science eda matplotlib netflix pandas-library python seaborn

Last synced: 01 May 2026

https://github.com/cdeweyx/bryce-harper-2016-analysis

Notebook analyzing Bryce Harper's disappointing 2016 campaign in historical context through data analytics.

data-analysis data-visualization python

Last synced: 01 May 2026

https://github.com/yashsingh43/cdc-sleep-duration-health-analysis

Analysis of CDC BRFSS 2022 data exploring how sleep duration relates to mental and physical health outcomes.

beautifulsoup brfss cdc data-analysis data-visualization matplotlib pandas plotly public-health python

Last synced: 11 Jun 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/manjit-baishya-datascience/flipkart-laptop-listing-eda

This project analyzes laptop price data from Flipkart using AutoScraper for web scraping. It includes data loading, EDA, cleaning, statistical analysis, and visualization. The goal is to derive insights for pricing strategies and market positioning. Explore the repository for detailed documentation and code.

data-analysis ecommerce-platform flipkart laptop python

Last synced: 08 Jun 2026

https://github.com/fbarffmann/project1

Analyzed factors influencing movie profitability using Python. Cleaned and visualized film industry data to uncover trends in budgets, sales, genres, and ratings.

box-office-analysis data-analysis data-visualization matplotlib movie-industry pandas python regression seaborn

Last synced: 01 May 2026

https://github.com/kavicastelo/soil-fertilizer-analysis-colab

This repository includes practical Jupyter notebooks for data analysis and model training using a soil fertilizer dataset (use the 4th edition).

data-analysis jupyter-notebook python

Last synced: 01 May 2026

https://github.com/kheriberto/pandas_and_seabron_project

In this project, I showcase my ability to use pandas and seaborn to reshape, transform, and plot data.

data-analysis pandas python seaborn

Last synced: 01 May 2026