An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/mumtaz4118/amazon-iphone-12-data-scrapped

Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages.

data-analysis data-extraction data-science data-scraping html mark-up python

Last synced: 27 Apr 2026

https://github.com/busesimsek/sql-projects

A collection of my SQL projects with insights into real-world datasets.

data-analysis data-analytics mysql sql

Last synced: 07 Jun 2026

https://github.com/mehrab-kalantari/olympics-data-analysis

A streamlit application to analyze the Olympics dataset from several views

data-analysis streamlit-dashboard streamlit-webapp

Last synced: 20 Apr 2026

https://github.com/hfzdzakii/dicoding-airqualityanalysisdata

This repo is a master submission for my Dicoding Final Project. Air Quality Dataset is being used to fulfill the submission. Feel free to explore and I hope my work give you some insight!

data-analysis data-visualization streamlit

Last synced: 27 Apr 2026

https://github.com/airdac/ml-palmerpenguins

Classification and analysis of the palmerpenguins dataset in Python. Team project from UPC's Master's Degree in Data Science

classification data-analysis data-science machine-learning palmer-penguin python upc

Last synced: 07 Jun 2026

https://github.com/josedanielchg/1990s-netflix-movie-insight

Small exploratory analysis of Netflix movie data from the 1990s. This project is part of the DataCamp Associate Data Scientist in Python program and focuses on filtering, visualizing, and extracting insights from a dataset using Python. Analyze trends in movie durations and count short action films to practice key data science skills!

beginner data-analysis python

Last synced: 27 Apr 2026

https://github.com/sweta2501/netflix_dataanalysis

With the help of Netflix Data, I have done some Data Analysis.

data-analysis data-science jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/datalopes1/warehouse_rfv

Neste projeto será realizada uma análise do tipo RFV (Recência, Frequência e Valor) com dados que encontrei neste video no Youtube do canal Jie Jenn.

analise-rfv data-analysis data-science kmeans python rfm-analysis

Last synced: 28 Apr 2026

https://github.com/rajivaleaakash/customer-churn-prediction

A machine learning project focused on predicting customer churn using various data analysis and modeling techniques. The repository includes data preprocessing, feature engineering, exploratory data analysis (EDA), model training, evaluation, and visualization to help businesses identify customers at risk of leaving.

churn-prediction classification customer-churn data-analysis data-science gridsearchcv imblearn machine-learning numpy pandas pyhton randomsearchcv scikit-learn

Last synced: 28 Apr 2026

https://github.com/dcs-training/decode-winterschool

In here you can find material on cluster analysis, data wrangling, and network analysis. Go to the readme file for more info

data-analysis data-visualisation data-wrangling gephi network-analysis python r statistics

Last synced: 28 Apr 2026

https://github.com/matheusafonseca/python-data-visualization-matplotlib-seaborn-masterclass-udemy

This repository is dedicated to storing the code developed during the "Python Data Visualization: Matplotlib & Seaborn Masterclass" course on Udemy.

charts data-analysis data-analysis-python data-science data-visualization database graphics graphics-programming jupyter-notebook matplotlib matplotlib-plots python python3 seaborn seaborn-plots

Last synced: 28 Apr 2026

https://github.com/abhi227070/car-price-prediction

This project implements a machine learning model to predict the price of cars based on various features such as mileage, manufacturing date, fuel type, and more. Users can input car information, and the model will estimate the price of the car based on the provided data. This tool can be useful for both car buyers and sellers to estimate car price.

data-analysis machine-learning machine-learning-algorithms machinelearning python3 regression regression-models scikit-learn scikitlearn-machine-learning

Last synced: 28 Apr 2026

https://github.com/ssoehdata/sql_for_data_science_specialization_course

Materials and Certifications from the SQL for DataScience Course

data-analysis data-science database databricks postgresql sql sqlite

Last synced: 10 Apr 2026

https://github.com/dcs-training/intro-to-statistics

Intro to Statistics workshop. In this repo, you are going to find the code and files we are going to use for the practical part of the workshop, together with the ppt associated with this training. Go to the readme file

data-analysis data-visualisation data-wrangling r statistics

Last synced: 20 Jun 2026

https://github.com/gaurav-van/optimizing-rate-of-penetration-in-geothermal-drilling-a-digital-twin-approach

Let’s explore something interesting together. In this project, we developed a machine learning digital twin using Intel-optimized XGBoost and daal4py to simulate and optimize the Rate of Penetration (ROP) in geothermal drilling. We leveraged SHAP for Explainable AI (XAI) to interpret model predictions.

data-analysis data-science digital-twin explainable-ai geothermal geothermal-energy jupyter-notebook machine-learning python shap xai xgboost

Last synced: 28 Apr 2026

https://github.com/leosimoes/alura-7daysofcode-dados

Desafios das Trilhas de Dados - Ciência de Dados, Machine Learning e Python Pandas.

data-analysis data-science jupyter-notebook machine-learning python

Last synced: 28 Apr 2026

https://github.com/i-am-uchenna/sql-data-warehouse-project

The Data Warehouse and Analytics Project is a comprehensive initiative designed to demonstrate the end-to-end process of building a modern data warehouse and deriving actionable insights through SQL-based analytics.

architecture business-intelligence crm data data-analysis database database-management datawarehouse erp etl etl-pipeline model sql sqlserver

Last synced: 15 May 2026

https://github.com/chrispsang/healthcare-dataanalysis

Analyze synthetic patient data to identify trends, improve healthcare delivery, and predict patient outcomes using machine learning models. Includes data exploration, preprocessing, model building, and visualizations.

data-analysis data-science data-visualization healthcare jupyter-notebook machine-learning python

Last synced: 29 Apr 2026

https://github.com/marcinz20/anomaly-detection-in-credo-dataset

University project, which goal is to build a system, that detects anomalies in CREDO dataset

credo data-analysis data-science encoder-decoder-model jupiter-notebook pca-analysis python3

Last synced: 29 Apr 2026

https://github.com/nevermendel/revolut-analysis

Python script to analyse Revolut transactions

data-analysis revolut revolut-analysis

Last synced: 12 Apr 2025

https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the json files scrapped by the scrapper medium_scrapper_post.py. To extract information from json files scrapped by medium_scrapper_tag_archive.py (scrapping from tags archive) then use Data_Extraction_Archive_Tags.py

data data-analysis data-analytics data-extraction data-preprocessing data-science data-scraping deep-learning machine-learning python

Last synced: 29 Apr 2026

https://github.com/eco786786/restaurant_orders

This analysis seeks to uncover patterns in customer behaviour by examining restaurant order data.

data-analysis git postgresql tableau

Last synced: 29 Apr 2026

https://github.com/i7t5/sentimentnlp

Sentiment analysis for COMP 435 Introduction to Machine Learning, Spring 2025

data-analysis jupyter-notebook machine-learning nlp python sentiment-analysis

Last synced: 29 Apr 2026

https://github.com/lankesathwik7/sql-query-assistant

Natural language to SQL query converter using Groq LLM. Ask questions in plain English and get SQL queries, visualized results, and natural language explanations. Built with Streamlit and PostgreSQL.

data-analysis database groq llm natural-language-processing python sql

Last synced: 29 Apr 2026

https://github.com/carlos-edulira/mbabigdata-projeto

Entrega do projeto MBA Unipe Big Data BI

data-analysis delta minio python spark

Last synced: 29 Apr 2026

https://github.com/farhad-here/textprepx

A Multilingual Text Preprocessing Tool for English and Persian.

cleantext contractions data-analysis deep-learning emoji nlp nltk opp parsivar regex streamlit text-preprocessing textblob

Last synced: 29 Apr 2026

https://github.com/srinibas-masanta/yelp-business-reviews-analysis

This project analyzes Yelp business reviews using Python, Snowflake, and SQL, focusing on efficient data ingestion, transformation, and analysis. We preprocess JSON data, optimize ingestion via Amazon S3, classify sentiments with Python UDFs, and extract insights using SQL queries—showcasing a streamlined end-to-end workflow.

amazon-s3 data-analysis json python snowflake sql

Last synced: 29 Apr 2026

https://github.com/jayavarshini-jayakumaran/nba-exploratory-data-analysis

A data analytics project that explores NBA game and player data using Python and Power BI. Features data preprocessing, EDA, feature engineering, and an interactive dashboard for visualizing team and player performance trends.

data-analysis data-visualization exploratory-data-analysis powerbi python3

Last synced: 20 Jun 2026

https://github.com/meinhere/ta-pendat

Proyek Akhir Mata Kuliah Penambangan Data - Klasifikasi Trauma Pasien Menggunakan Metode Naive Bayes

data-analysis data-mining naive-bayes-classifier python trauma

Last synced: 29 Apr 2026

https://github.com/ceia-prefeitura/urban-lit-tracker-etl

UrbanLitTracker coleta artigos acadêmicos sobre mudanças urbanas via OpenAlex API, processa e armazena em MongoDB. Oferece dashboard interativo com Dash, exibindo dados como trabalhos mais relevantes, autores e palavras-chave frequentes, facilitando a análise e visualização da literatura urbana.

academic-research bibliometrics data-analysis data-pipeline data-visualization etl openalex-api urban-studies

Last synced: 11 May 2026

https://github.com/al-ghaly/e-commerce-a-b-testing

A Statistical Analysis project in which I Performed an A/B test to analyze the effect of changing the user interface for an E-Commerce company's Website.

data-analysis matplotlib numpy pandas python python-data-analysis seaborn statistical-analysis statistics

Last synced: 29 Apr 2026

https://github.com/yimethan/basics-of-data-analysis

2023-2 Basics of Data Analysis

data-analysis numpy pandas python

Last synced: 29 Apr 2026

https://github.com/theoplayz2/eda-explorer

Инструмент на Python для разведочного анализа данных (EDA) и визуализации, поддерживающий загрузку данных CSV и JSON, с модульной архитектурой ООП. Практическая работа по теме: "Обнаружение и визуализация данных для понимания их сущности" дисциплины "МДК 13.01: Основы применения методов искусственного интеллекта в программировании".

analysis battery-life cqrs csharp data-analysis eeg-analysis exploratorydataanalysis json-visualization matplotlib messaging profile-report python verilog visualization

Last synced: 29 Apr 2026

https://github.com/muhammadusman-khan/e-commerce-store-eda

Exploratory Data Analysis on E-commerce store data to uncover insights about sales trends, customer behavior, and product performance using Python libraries like Pandas, NumPy, and Matplotlib/Seaborn.

data-analysis data-science data-visualization e-commerce eda exploratory-data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/leticia-ducatti/sales-dashboard-project

Interactive sales dashboard built with Python and Streamlit — shows KPIs, allows filtering, and visualizes sales data.

data-analysis pandas plotly python streamlit

Last synced: 12 May 2026

https://github.com/edgarhtt/uber_freight_data_analysis

Uber Freight interview homework. It consisted of solving a 2 warehouse problem and an ETL task

data-analysis data-science data-visualization python

Last synced: 30 Apr 2026

https://github.com/mxagar/eda_fe_summary

An 80/20 guide for Data Processing: Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection.

data-analysis data-cleaning data-modeling data-science data-visualization eda exploratory-data-analysis feature-engineering feature-selection machine-learning pandas

Last synced: 30 Apr 2026

https://github.com/min-thway-htut/r-programming

Repository for R-Programming

data-analysis r-programming

Last synced: 10 Jun 2026

https://github.com/farhad-here/id_validator

Iranian National ID Validator. This was one of my data analysis project for the course i had.

data-analysis identity idverification object-oriented-programming oop oops-in-python python streamlit

Last synced: 30 Apr 2026

https://github.com/yankh764/revenue-data-analysis

A take home assignment of improving a revenue data pipeline

data-analysis docker python sql take-home-assignment

Last synced: 30 Apr 2026

https://github.com/ahmedtaher10/covid-19-cases

The data we are using contains the data on covid-19 cases and their impact on GDP from December 31, 2019, to October 10, 2020.

data-analysis python visualization

Last synced: 30 Apr 2026

https://github.com/busra-deveci/kaggle-iris_data_analysis

Exploratory data analysis and visualization of the Iris dataset using Python.

data-analysis iris-dataset kaggle pandas python seaborn visualization

Last synced: 30 Apr 2026

https://github.com/devexpress-examples/wpf-pivot-grid-provide-custom-summary-values

This example demonstrates how to determine the value type when you calculate custom summary values in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 01 May 2026

https://github.com/okwilkins/retailanalysis

A comprehensive exploratory analysis and implementation of kmeans/hierarchical clustering on online retail data.

data-analysis data-science machine-learning statistics

Last synced: 18 Oct 2025

https://github.com/skysign/dat

데이터분석을 함께 공부하는 스터디입니다.

data data-analysis data-science

Last synced: 02 Jan 2026

https://github.com/satyam4229/omnify-dataanalysis

Our assessment of Omnify focused on data-driven strategies to maximize profitability. We identified "Product X" as the most profitable product and recommended leveraging the "Wellness Solutions" keyword category for optimal keyword strategy.

data-analysis data-science data-visualization excel omnify

Last synced: 04 Jan 2026

https://github.com/dnut/associations

Python 3 library to identify high-dimensional statistical relationships in any data set.

analytics arch-linux association-rules data data-analysis data-mining data-science machine-learning python-modules

Last synced: 01 May 2026

https://github.com/caesaredia/la-cafe-market-analysis

A data-driven feasibility study exploring the potential of launching a robot-staffed café in Los Angeles, based on real F&B business data.

business-intelligence cafe data-analysis data-visualization food-industry franchise los-angeles market-research pandas python

Last synced: 01 May 2026

https://github.com/pablo1785/receipt-rs

Receipt processing backend built with Shuttle.rs, Axum and Azure Form Recognizer API

api-rest axum azure backend cognitive-services computer-vision data-analysis rust shuttle-rs sqlx

Last synced: 01 May 2026

https://github.com/sebastian-diaz-berdecia/analisis-popularidad-de-series-y-generos-de-series

Consultas SQL para el análisis de la popularidad de series y géneros series de la base de datos NetflixDB.

business-analytics bussiness-intelligence data data-analysis database mysql mysql-database sql

Last synced: 12 May 2026

https://github.com/codesaadumair/data-science-monorepo

Comprehensive Data Science monorepo featuring EDA, Machine Learning, Preprocessing, Feature Engineering, and Visualization projects with Jupyter notebooks and Python.

data-analysis data-science data-science-projects data-visualization eda jupyter-notebook jupyterlab machine-learning python

Last synced: 01 May 2026

https://github.com/priyanshu7639/data_visualization_dashboard

An Interactive data visualization tool that combines traditional plotting capabilities with modern AI assistance. It allows users to create and modify visualizations through natural language commands, making data exploration accessible to users of all skill levels.

business-analytics data-analysis data-engineering data-exploration data-science data-visualization datapreprocessing datascience interactive-visualizations matplotlib plotly plotting python research-tool streamlit

Last synced: 12 May 2026

https://github.com/idb-devs/dataanalyticsairbnb

Construir um modelo de previsão de preço que permita uma pessoa comum que possui um imóvel possa saber quanto deve cobrar pela diária do seu imóvel.

data-analysis data-science jupyter python

Last synced: 18 Apr 2026

https://github.com/luminati-io/target-dataset-samples

A sample dataset of over 1000 target products, extracted using the Bright Data API, ideal for brand reputation, tracking inventory, and optimizing prices.

api data-analysis data-mining datasets target web-scraper web-scraping

Last synced: 04 Jan 2026

https://github.com/bhoyarapurva23399/mini-erp-inventory-billing

Lightweight ERP inventory and billing web app built using Python Flask and SQLite — featuring product, customer, and dashboard management.

backend data-analysis erp flask inventory-billing mini-project python sqlite

Last synced: 01 May 2026

https://github.com/teja-1403/ignosis-tech-ml-assignment

Analysis of transaction data to identify the most profitable products and key customer segments, providing insights for targeted marketing strategies.

customer-segmentation data-analysis data-visualization machine-learning marketing-strategy python

Last synced: 02 May 2026

https://github.com/parthds02/-daily-calorie-count-meal-plan-generator-

Welcome to the Daily Calorie Count Meal Plan Generator project! This Streamlit web application is designed to create personalized meal plans based on user inputs such as age, weight, gender, and calorie goals. It also allows users to download their customized meal plans as PDFs.

calories-tracker data-analysis data-science pdf-generation streamlit vscode

Last synced: 13 May 2026

https://github.com/ani717/pneumonia_detection_effecientnet_b7

Pneumonia Detection in Chest X-ray Image with EfficientNet-B7. Accuracy = 87.98%, Precision = 100%, Recall = 83.87%, F1 Score = 91.23.

cnn computer-vision data-analysis data-augmentation efficientnet image-classification image-processing machine-learning

Last synced: 13 May 2026

https://github.com/mituskillologies/dkte-da-mar25

Programs conducted at DKTE's Engineering Institute, Ichalkaranji in training on Python Data Analytics March 2025.

data-analysis matplotlib numpy pandas python-programming tkinter-python

Last synced: 13 May 2026

https://github.com/se7en69/rna-seq-data-processing-and-analysis-pipeline

This pipeline automates essential steps for RNA-Seq data analysis, including quality control, read trimming, alignment to a reference genome, and coverage quantification. It leverages tools like FastQC, fastp, STAR, and bedtools to ensure high-quality results, with MultiQC reports providing an overview at each stage.

bioinformaitcs-scripting bioinformatics bioinformatics-pipeline data-analysis linux scripts shell

Last synced: 02 May 2026

https://github.com/inevolin/multivariate-data-analysis

Showcases of modern multivariate & multidimensional data analysis in industrial and high-tech settings.

analytics data-analysis data-science data-visualization javascript

Last synced: 09 Jun 2026

https://github.com/sarah-marion/sovereign-osint-toolkit

Sovereign OSINT Toolkit - Advanced, self-hosted intelligence platform for security researchers and investigators. Ethical, private and production-ready.

correlation-engine cybersecurity data-analysis docker fastapi infosec intelligence investigation open-source osint privacy python3 security-research security-tools threat-intelligence

Last synced: 02 May 2026

https://github.com/dimamirana/finding-correlation-among-social-media-usage-depression-sleep

In our project we tried to analysis whether there is a link between depression and social media usage time

anaconda data-analysis jupiter-notebook matplotlib-pyplot patternlab python

Last synced: 03 May 2026

https://github.com/fatihilhan42/tourist_analysis_in_turkey_with_python

In this project, the number of tourists coming to Turkey between 2008-2021 was analyzed. The data from the data set you can find in the warehouse was first organized using data cleaning algorithms. These cleaned data were then output graphically using data visualization algorithms.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 03 May 2026

https://github.com/maddieemihle/python-challenge

Creating a Python script that analyzes financial records and election results

data-analysis python

Last synced: 09 Jun 2026

https://github.com/mohnish88/e-commerce-data-analysis

I analyzed sales data to identify trends and patterns, which significantly enhanced decision-making processes. Additionally, I created interactive visualizations to present these insights clearly and effectively, facilitating better understanding and communication of the data's implications.

data-analysis data-cleaning jupyter-notebook pandas plotly python python-library sales sales-analysis visulaization

Last synced: 03 May 2026

https://github.com/balajimohan18/foreign-exchange-rate-time-series-datascience-project

This project will use time series analysis to forecast the exchange rate between the euro and the US dollar. The project will use a variety of statistical techniques, such as ARIMA to model the data and forecast the exchange rate.

data-analysis data-analytics data-preprocessing data-science data-transformation data-visualization eda exploratory-data-analysis foreign-exchange-rates machine-learning model-fitting predictive-modeling python3 time-series time-series-analysis

Last synced: 14 May 2026

https://github.com/filip-kustura/python-covid-19-behaviors-analysis

Using Jupyter Notebook, this university project analyzes attitudes and behaviors related to the COVID-19 pandemic using a two-year survey from Imperial College London and YouGov research company. Utilizing Pandas, NumPy and Matplotlib, the data analysis focuses on three countries, exploring trends and insights throughout the pandemic.

covid-19 data-analysis data-visualization jupyter-notebook matplotlib numpy pandas python university-project

Last synced: 12 Apr 2026

https://github.com/pranav016/exploratory-data-analysis-of-sp500-dataset

This a data-analysis that I performed on the S&P 500 dataset and answered a few questions through data visualization techniques.

data-analysis

Last synced: 03 Jul 2026

https://github.com/rahulsm20/insurance-data

A data analytics project dealing with risk assessment and it's effects in health insurance.

data-analysis data-analytics machine-learning matplotlib numpy pandas python scikit-learn

Last synced: 12 Apr 2026

https://github.com/noodleslove/house-of-representatives-analysis-ii

In this project, we want to estimate if a transaction will have capital gains exceeding $200 using the provided dataset.

coursework data-analysis data-science eda feature-engineering pandas python3

Last synced: 12 Apr 2026

https://github.com/brianrscode/delitos-cdmx

Página simple que muestra estadísticas sobre los delitos ocurridos en CDMX

analisis-de-datos data-analysis django pandas plotly python python3

Last synced: 18 Apr 2026

https://github.com/apoorvalal/misc_stata_ados

Misc Utility programs in Stata.

data-analysis stata stata-command

Last synced: 04 Feb 2026

https://github.com/moenessgannouni/englandweather

A mini-project that analyzes weather data in England usingLinear Regression and Multiple Linear Regression. Ideal for learning and applying statistical analysis and predictive modeling.

data-analysis data-visualization linear-regression multiple-linear-regression rprogramming

Last synced: 22 Mar 2025

https://github.com/borjamome/soho_cholera

Cholera deaths in the Soho District (London)

data-analysis data-visualization london r

Last synced: 04 Sep 2025

https://github.com/vasulab/knightshock

Shock tube experiment planning and data analysis package.

cantera data-analysis matplotlib numpy shock-tube

Last synced: 18 Jul 2025

https://github.com/aldrinjenson/smart-qa

Query any structured data and find relations using natural language

data-analysis llm nlp sql

Last synced: 06 Apr 2025

https://github.com/ajay1214/credit-card-transaction-dashboard

Credit Card weekly dashboard that provides real-time insights into key performance metrics and trends

data-analysis powerbi sql

Last synced: 04 Feb 2026