An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/voidnire/redditviralmysteryposts

Analysis of posts from mystery subreddits. What defines a viral post in this type of sub?

data-analysis data-visualization mysteries mystery nlms python-3 reddit

Last synced: 24 Apr 2026

https://github.com/mariann95/sql_data_warehouse_and_analytics_project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

data-analysis data-analytics data-cleaning data-engineering data-lakehouse data-science data-science-portfolio data-warehouse data-warehousing datalake datawarehouse datawarehousing etl etl-job etl-pipeline medallion-architecture sql sql-query sql-server sqlserver

Last synced: 06 Jun 2026

https://github.com/fbarffmann/belly-button-challenge

Built an interactive JavaScript dashboard to visualize bacterial biodiversity from belly button samples. Analyzed data from 153 participants and identified OTU 1167 as the most common bacteria.

biodiversity dashboard data-analysis data-visualization interactive-charts javascript json plotly

Last synced: 25 Apr 2026

https://github.com/m-biriulova/python-job-market-analysis

Web scraping, data analysis, and visualization of Python developer vacancies in the Czech Republic.

automation beautifulsoup data-analysis data-visualization portfolio-project python selenium web-scraping

Last synced: 25 Apr 2026

https://github.com/aastopher/mma_outcome

Simple exploratory analysis of UFC Fights and Vegas fight odds from 1993 to 2021

data-analysis data-visualization

Last synced: 06 Jun 2026

https://github.com/devexpress-examples/wpf-pivotgrid-customize-the-cell-template

This example demonstrates how to customize the cell appearance in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 26 Apr 2026

https://github.com/moshora99/sql-data-warehouse-project

Build a modern data warehouse with MySQL, including ETL processes, data modeling, and analytics.

data-analysis data-engineering data-science database datawarehouse datawarehousing etl scheme sql sql-query sql-server

Last synced: 27 Apr 2026

https://github.com/arush-codes/paris-olympic-de

Data engineering project on the Paris 2024 Olympics.

azure data-analysis data-engineering microsoft-azure olympics2024 pipeline

Last synced: 27 Apr 2026

https://github.com/banyc/dfplot

Summarize a data frame by plotting. `cargo install --git https://github.com/Banyc/dfplot.git`.

csv data-analysis plotly plotting statistics

Last synced: 27 Apr 2026

https://github.com/sujata-adhikari/data-analysis

Data analysis of market sales data using Power BI, with a dashboard presenting the results.

data-analysis excel pandas powerbi

Last synced: 12 Jun 2026

https://github.com/elmezianech/autoinventory

This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.

automation dashboard data-analysis data-engineering-pipeline docker etl glue-job inventory-management kafka kpis lambda-functions lstm ml-pipeline mlflow power-bi pytorch redshift s3 streamlit warehouse-management

Last synced: 28 Apr 2026

https://github.com/shreeparab1890/indian-elections-2019-analysis-eda

This ipython notebook is the Exploratory data analysis (EDA) of the Indian Lok Sabha Elections 2019.

data data-analysis data-science data-visualization eda exploratory-data-analysis matplotlib numpy pandas plotly python python3 visualization

Last synced: 28 Apr 2026

https://github.com/manalisbhavsar/stock-price-prediction

Stock Price Prediction model using Machine Learning and LSTM to forecast future stock prices based on historical data. Achieved a low error rate of 3.2% by leveraging moving averages and deep learning techniques, ensuring accurate predictions.

data-analysis deep-learning lstm machine-learning matplotlib numpy pandas python

Last synced: 28 Apr 2026

https://github.com/abhi227070/car-price-prediction

This project implements a machine learning model to predict the price of cars based on various features such as mileage, manufacturing date, fuel type, and more. Users can input car information, and the model will estimate the price of the car based on the provided data. This tool can be useful for both car buyers and sellers to estimate car price.

data-analysis machine-learning machine-learning-algorithms machinelearning python3 regression regression-models scikit-learn scikitlearn-machine-learning

Last synced: 28 Apr 2026

https://github.com/ericdataplus/kaggle-airbnb-nyc

NYC Airbnb Market Analysis: Multi-source from 2 Kaggle datasets (151K listings)

airbnb data-analysis kaggle nyc python visualization

Last synced: 28 Apr 2026

https://github.com/emircanakyuzz/veri_gorsellestirilmesi_ve_analizi-analysis_and_visualization_of_dataset

In this work, I performed analysis and visualization using modules well known in data science, such as numpy, pandas, seaborn, and matplotlib.

data-analysis data-science data-visualization jupyter-notebook python

Last synced: 29 Apr 2026

https://github.com/marcinz20/anomaly-detection-in-credo-dataset

University project whose goal is to build a system that detects anomalies in the CREDO dataset.

credo data-analysis data-science encoder-decoder-model jupiter-notebook pca-analysis python3

Last synced: 29 Apr 2026

https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the JSON files scraped by the scraper medium_scrapper_post.py. To extract information from JSON files scraped by medium_scrapper_tag_archive.py (scraping from the tags archive), use Data_Extraction_Archive_Tags.py.

data data-analysis data-analytics data-extraction data-preprocessing data-science data-scraping deep-learning machine-learning python

Last synced: 29 Apr 2026

https://github.com/anilyigitsel/istanbul-rental-apartments-analysis

This project analyzes the Istanbul Rental Apartments Dataset (2025), which includes rental apartment listings from Istanbul, Turkey.

data-analysis data-visualization jupyter-notebook matplotlib pandas python rental-housing

Last synced: 29 Apr 2026

https://github.com/i7t5/sentimentnlp

Sentiment analysis for COMP 435 Introduction to Machine Learning, Spring 2025

data-analysis jupyter-notebook machine-learning nlp python sentiment-analysis

Last synced: 29 Apr 2026

https://github.com/fatihilhan42/starbucks_analysis_turkey_and_world_with_python

This project first examines coffee brands worldwide and then the same brands in Turkey. The data from the dataset, which you can find in the repo, was first cleaned and then presented graphically using data visualization techniques.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 29 Apr 2026

https://github.com/mfakhriazhar/python-data-analyst-tutorial

A collection of my Python learning files for data analyst work. Covers fundamental to advanced topics such as data exploration, visualization, statistical analysis, and the use of popular libraries like Pandas, NumPy, Matplotlib, and Seaborn. Suitable for personal documentation or shared learning references.

data-analysis data-science data-visualization exploratory-data-analysis portfolio python

Last synced: 29 Apr 2026

https://github.com/jofaval/melbourne-temperature-timeseries

Timeseries Data Analysis and Forecasting of the daily min temperature in Melbourne from 1981 to 1990

data-analysis data-science data-visualization deep-learning google-colab melbourne python temperature tensorflow timeseries timeseries-analysis

Last synced: 29 Apr 2026

https://github.com/sharoonjoseph11/indian-liver-diseases

Indian Liver Disease Analysis and Prediction. This project leverages the Indian Liver Patient Dataset (ILPD) to analyze liver disease trends and develop predictive models for early diagnosis. Through data preprocessing, exploratory analysis, and machine learning, it identifies key risk factors and builds classification models.

data-analysis data-science data-visualization logistic-regression machine-learning pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/alunera-data/sql-use-cases

Practical SQL use cases for Business Intelligence and IT Service Management (BI & ITSM)

business-intelligence dashboards data-analysis data-quality eda itsm kpis postgresql process-monitoring query reporting sql sqlserver

Last synced: 29 Apr 2026

https://github.com/varshan1123/sql-tableau-project

We analyze key indicators for our pizza sales data to gain insights into our business performance - A Data Analysis Project performed on Tableau & SQL.

analysis data-analysis data-science data-visualization excel mysql powerbi sql sql-server tableau tableau-dashboards

Last synced: 29 Apr 2026

https://github.com/prithviraj-2003/cognifyz-data-science-internship

🎓 Data Science Internship at Cognifyz Technologies 📅 Duration: 2 Months 🧠 Worked on real-world restaurant data 🗂️ Completed structured tasks across 3 levels 📌 Tasks focused on EDA, data preprocessing, visualization, and analysis 📎 Task descriptions provided in an attached PDF

data-analysis data-science data-visualization matplotlib numpy pandas python3

Last synced: 29 Apr 2026

https://github.com/theoplayz2/eda-explorer

A Python tool for exploratory data analysis (EDA) and visualization that supports loading CSV and JSON data, with a modular OOP architecture. Practical work on the topic "Discovering and visualizing data to understand its nature" for the course "MDK 13.01: Fundamentals of applying artificial intelligence methods in programming".

analysis battery-life cqrs csharp data-analysis eeg-analysis exploratorydataanalysis json-visualization matplotlib messaging profile-report python verilog visualization

Last synced: 29 Apr 2026

https://github.com/monddavila/online-retail-data-analysis

Online Retail Exploratory Data Analysis with Python

data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/muhammadusman-khan/e-commerce-store-eda

Exploratory Data Analysis on E-commerce store data to uncover insights about sales trends, customer behavior, and product performance using Python libraries like Pandas, NumPy, and Matplotlib/Seaborn.

data-analysis data-science data-visualization e-commerce eda exploratory-data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/alam025/invoice-generator

Processed 500+ invoices with automated payment reminders and multi-currency PDF generation

api data-analysis finance fintech nextjs pdfkit prisma python stripe

Last synced: 08 Jun 2026

https://github.com/bachtiarashidiqy/ecommercedashboard

An interactive e-commerce analytics dashboard built with Streamlit, providing visualizations for sales performance, product analysis, geographic insights, and delivery status. Includes date filtering, company branding, and comprehensive documentation.

analytics dashboard data-analysis data-visualization e-commerce matplotlib pandas python seaborn streamlit

Last synced: 30 Apr 2026

https://github.com/farhad-here/id_validator

Iranian National ID Validator. This was one of my data analysis projects for a course I took.

data-analysis identity idverification object-oriented-programming oop oops-in-python python streamlit

Last synced: 30 Apr 2026

https://github.com/mfakhriazhar/nlp-movie-recommender-system

This project is a content-based movie recommender system built using Natural Language Processing (NLP) techniques. By extracting and combining important text features from movie metadata, this system suggests movies that are similar to a user's selected title.

data-analysis data-science deep-learning machine-learning natural-language-processing python recommender-system

Last synced: 30 Apr 2026

https://github.com/mitchellharrison/mitchellharrison.github.io

Welcome to my slice of the internet, where I share the knowledge that Duke gave me, so you don't have to spend the mortgage-sized amount to access it. Built with R, Python, Quarto, and love.

ai algorithms-and-data-structures blog data-analysis data-science data-visualization educational machine-learning portfolio portfolio-website quarto r r-language statistics tutorials

Last synced: 30 Apr 2026

https://github.com/beolawork-art/novabank-churn-analysis

NovaBank has noticed that customers are closing accounts or going inactive, and they want to understand why.

data-analysis data-science-projects data-visualization eda machine-learning numpy pandas python scikit-learn sql

Last synced: 08 Apr 2026

https://github.com/busra-deveci/kaggle-iris_data_analysis

Exploratory data analysis and visualization of the Iris dataset using Python.

data-analysis iris-dataset kaggle pandas python seaborn visualization

Last synced: 30 Apr 2026

https://github.com/badranalyst/e-commerce-customer-analysis-data-science-foundations-case-study

This case study explores e-commerce customer data through data exploration, pre-processing, and splitting. It includes model building and training to analyze customer behavior. Python libraries like Pandas, NumPy, Matplotlib, and Seaborn are used for the analysis and model development.

data-analysis data-science dataset eda exploratory-data-analysis machine-learning matplotlib ml model-building model-training numpy pandas pre-processing python seaborn

Last synced: 01 May 2026

https://github.com/devag2004/electricity-analysis-using-spark

Electricity analysis project built with Spark.

data-analysis spark spark-mllib

Last synced: 01 May 2026

https://github.com/cdeweyx/bryce-harper-2016-analysis

Notebook analyzing Bryce Harper's disappointing 2016 campaign in historical context through data analytics.

data-analysis data-visualization python

Last synced: 01 May 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/sairupeshl/leo-orbital-congestion-analysis

Geospatial data analysis of the UCS Satellite Database using Python to map active LEO space assets, validate orbital parameters, and isolate mega-constellation traffic bottlenecks.

aerospace-engineering data-analysis geospatial-analysis orbital-mechanics pandas python satellite-data seaborn

Last synced: 08 Jun 2026

https://github.com/rafath0ssain/predihome

Data analysis using economic factors affecting living conditions across Canadian provinces.

data-analysis data-visualization dplyr ggplot2 graph kaggle linear-regression prediction-model r shiny tidyr

Last synced: 01 May 2026

https://github.com/maxwelllzh/linearizer

Linearizing parameters for linear regression

data-analysis machine-learning scikit-learn

Last synced: 02 May 2026

https://github.com/adithya17-star/ai-powered-fraud-detection

An AI-powered fraud detection system using machine learning algorithms to identify suspicious transactions and provide interactive visualizations for financial security.

dashboard-visualization data-analysis finance-technology fintech flask fraud-detection machine-learning python security transaction-monitoring

Last synced: 02 May 2026

https://github.com/teja-1403/ignosis-tech-ml-assignment

Analysis of transaction data to identify the most profitable products and key customer segments, providing insights for targeted marketing strategies.

customer-segmentation data-analysis data-visualization machine-learning marketing-strategy python

Last synced: 02 May 2026

https://github.com/rorrell/employmentdata

A Jupyter Notebook where I use group by to analyze the average unemployment rate by year

data-analysis data-visualization jupyter-notebook python3

Last synced: 02 May 2026

https://github.com/dissorial/prx21_erikz

Analysis of self-tracked data: interactive visualizations & predictive algorithms

analytics data-analysis data-science data-visualization machine-learning matplotlib pandas python python3 visualization

Last synced: 02 May 2026

https://github.com/helenaden/data-science-fundamentals

This project delves into fundamental data science concepts using Python libraries like NumPy and Pandas

data-analysis datascience datasets datavisualization datawrangling heatmap numpy pandas patterns python

Last synced: 03 May 2026

https://github.com/monteirooscar98/tarifas-publicas-sp-dieese

Data extraction via web scraping from the Dieese website, and analysis of public tariffs in the Municipality of São Paulo.

data-analysis data-visualization python webscraping

Last synced: 03 May 2026

https://github.com/zients/tw-lottery-recommandation

Taiwan lottery draw analyzer & number recommender with Transformer ML model. Supports 539, 649, 638, 3D, and 4D lotteries.

cli data-analysis lottery machine-learning python pytorch taiwan transformer

Last synced: 03 May 2026

https://github.com/rohitinu6/tesla-price-prediction

A machine learning project that predicts future stock price movements using Logistic Regression, SVC, and XGBoost with engineered financial features.

data-analysis data-visualization feature-engineering financial-analysis logistic-regression machine-learning matplotlib python scikit-learn seaborn stock-market stock-price-prediction support-vector-machine time-series xgboost

Last synced: 03 May 2026

https://github.com/mohnish88/e-commerce-data-analysis

I analyzed sales data to identify trends and patterns, which significantly enhanced decision-making processes. Additionally, I created interactive visualizations to present these insights clearly and effectively, facilitating better understanding and communication of the data's implications.

data-analysis data-cleaning jupyter-notebook pandas plotly python python-library sales sales-analysis visulaization

Last synced: 03 May 2026

https://github.com/devlucho/modelos-predictivos

Predictive models using the Linear Regression, Logistic Regression, and Decision Tree algorithms.

data-analysis jupyter-notebook python3

Last synced: 03 May 2026

https://github.com/salma-mamdoh/project-writing-functions-for-product-analysis

My project for learning the basics of analysis on DataCamp.

data-analysis data-camp pandas python

Last synced: 03 May 2026

https://github.com/ankitgmishra/machinelearning

Continuously deepening my understanding of and expertise in machine learning through ongoing education and hands-on practice.

artificial-intelligence data-analysis data-cleaning data-gathering machine-learning machinel-learning-algorithms matplotlib numpy pandas python seaborn

Last synced: 03 May 2026

https://github.com/nurulashraf/logistic-regression-loan-prediction

Loan approval prediction using logistic regression based on applicant data, including income, credit history, and property details, after data preparation and feature engineering.

data-analysis data-science loan-prediction logistic-regression machine-learning predictive-modeling python sklearn

Last synced: 03 May 2026

https://github.com/devesh8423/machine_learning

Machine Learning practice projects, Jupyter notebooks, and datasets for learning regression, classification, and data analysis.

classification data-analysis data-science data-visualization jupyter-notebook machine-learning matplotlib ml-project numpy-library pandas python regression sckit-learn seaborn

Last synced: 03 May 2026

https://github.com/donmaruko/flask-data-analysis

Flask API for statistical calculations. Data analysis, cleansing, visualization, and manipulation. Documented by Swagger.

api api-rest data-analysis data-science data-visualization datascience flasgger matplotlib pandas seaborn sqlite wordcloud

Last synced: 03 May 2026

https://github.com/nathadriele/world-marathon-run-majors-analytics-challenge

This project presents a complete data engineering, analytics, machine learning, and Streamlit dashboard pipeline focused on the Abbott World Marathon Majors: Tokyo, Boston, London, Berlin, Chicago, and New York City. Covering the 2018 to 2025 seasons, it analyzes more than 628,000 runner records and 86 verified winner entries.

challenge data-analysis data-pipeline gradient-boosting lasso-regression linear-regression machine-learning models predictive-modeling python random-forest ridge-regression run-analytics world-marathon

Last synced: 09 Jun 2026

https://github.com/xiaohan2012/myunisport

Visualize your Unisport annual training records

data-analysis data-visualization pandas pygal sports-stats tikzposter

Last synced: 04 May 2026

https://github.com/fatihilhan42/the-office-eda

Data analysis study of my favorite sitcom, The Office (US).

data-analysis data-science data-visualization fatihilhan office python sitcom

Last synced: 04 May 2026

https://github.com/ibrahimm7004/supermarket-sales-analysis

This project applies data mining techniques to gather insights into customer behavior in supermarket sales. Includes: Association Rule Mining, Temporal Patterns in customer behavior, Sequential Pattern Mining, Classification, Regression, and Outlier Detection.

apriori association-rules data-analysis data-mining data-science data-visualization fpgrowth python sales-analysis supermarket-sales

Last synced: 04 May 2026

https://github.com/hyperplasma/olympic-visualization-analysis

Multidimensional analysis and visualization of Olympic medals, economy, and happiness index.

data-analysis data-visualization matplotlib numpy pandas python wordcloud

Last synced: 04 May 2026

https://github.com/ljadhav25/logistic-regression-data-science-

Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote, based on a given data set of independent variables.

data-analysis data-science data-visualization logestic-regression machine-learning

Last synced: 04 May 2026

https://github.com/youssefyaser/scrape-the-imdb-site-for-the-top-250-movies

Web scraping the top 250 movies on the IMDb site.

data-analysis numpy pandas python

Last synced: 04 May 2026

https://github.com/drod75/nyc-arrests-analysis

This is a simple Data Science Project made to analyze and display data and trends found within the NYC Arrests Year to Date Dataset.

data-analysis data-visualization folium jupyter-notebook matplotlib-pyplot nyc-opendata nypd python scikit-learn seaborn

Last synced: 04 May 2026

https://github.com/vara-co/crowdfunding_etl

ETL Mini Project based on a Crowdfunding Database, using CRUD operations. SQL, Postgres, and an ERD.

data-analysis database datacleaning erd erdiagram etl jupyter-notebook postgres postgresql regex schema sql

Last synced: 04 May 2026

https://github.com/matt-ags/jornada-python

Repository with the projects completed during the "Jornada Python" week - 01/2025

artificial-intelligence automation data-analysis jupyter-notebook machine-learning python

Last synced: 05 May 2026

https://github.com/rtlich/sap-sustainable-management

Project for the ERP & BI course at Esprit School of Engineering. It optimizes resource and operations management in an agri-food company using SAP MM & PM, focusing on sustainability, CO₂ reduction, and predictive maintenance.

angular business-intelligence data-analysis flask machine-learning ocr powerbi python sql-server talend

Last synced: 05 May 2026

https://github.com/13anush/python-libraries-

A collection of essential Python libraries—NumPy, Pandas, Matplotlib, and Seaborn—perfect for anyone starting out in data analysis.

data-analysis matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/sajjad425/edaipl

The dataset covers the Indian Premier League (IPL) with details on matches (date, teams, venue, results), player stats (runs, wickets), team stats (wins, losses), season summaries, and umpire info. The EDA reveals patterns and insights, highlighting dominant teams, star players, and trends across seasons.

data-analysis eda exploratory-data-analysis ipl python

Last synced: 05 May 2026

https://github.com/pcanadas/weather_scraper

This project automates the collection and processing of historical and forecast weather data. It uses Selenium to extract information from weather websites, processes the data with Pandas, and stores it in clean CSV files. It is well suited to climate analysis, data visualization, or integration into other systems.

beautifulsoup data-analysis pandas python selenium

Last synced: 05 May 2026

https://github.com/ayaatmohammed/amazon-sales-analysis-pyspark

In-depth analysis of the Olist E-commerce dataset from Kaggle using PySpark for customer segmentation (RFM) and market basket analysis.

big-data big-data-analytics customer-segmentation data-analysis data-science ecommerce jupyter-notebook kaggle pyspark python rfm-analysis

Last synced: 05 May 2026

https://github.com/caesaredia/ymusic-project

Exploratory data analysis (EDA) of music streaming behavior in two fictional cities using Python, Pandas, and Jupyter Notebook. It explores user behavior, genre preferences, and listening patterns throughout the week.

data-analysis eda pandas python

Last synced: 05 May 2026

https://github.com/ibrahimceyisakar/hotel-finder

Hotel finder system in Python, including data gathering, analysis, and visualization.

data-analysis data-gathering data-visualization pandas plotly python selenium streamlit

Last synced: 06 May 2026