An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/i7t5/sentimentnlp

Sentiment analysis for COMP 435 Introduction to Machine Learning, Spring 2025

data-analysis jupyter-notebook machine-learning nlp python sentiment-analysis

Last synced: 29 Apr 2026

https://github.com/findmyway/dataframe-in-julia

A quick introduction of DataFrame in Julia for users from Python

data-analysis dataframe julia jupyter-notebook

Last synced: 29 Apr 2026

https://github.com/fatihilhan42/starbucks_analysis_turkey_and_world_with_python

In this project, firstly the brands for coffee in the world and then these brands in Turkey were examined. The data from the dataset, which you can find in the repo, was first organized using data cleaning algorithms. These cleaned data were then graphically extracted using data visualization algorithms.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 29 Apr 2026

https://github.com/carlos-edulira/mbabigdata-projeto

Entrega do projeto MBA Unipe Big Data BI

data-analysis delta minio python spark

Last synced: 29 Apr 2026

https://github.com/mfakhriazhar/python-data-analyst-tutorial

A collection of My Python learning files for Data Analyst purposes. Covers fundamental to advanced topics such as data exploration, visualization, statistical analysis, and the use of popular libraries like Pandas, NumPy, Matplotlib, and Seaborn. Suitable for personal documentation or shared learning references.

data-analysis data-science data-visualization exploratory-data-analysis portfolio python

Last synced: 29 Apr 2026

https://github.com/farhad-here/textprepx

A Multilingual Text Preprocessing Tool for English and Persian.

cleantext contractions data-analysis deep-learning emoji nlp nltk opp parsivar regex streamlit text-preprocessing textblob

Last synced: 29 Apr 2026

https://github.com/srinibas-masanta/yelp-business-reviews-analysis

This project analyzes Yelp business reviews using Python, Snowflake, and SQL, focusing on efficient data ingestion, transformation, and analysis. We preprocess JSON data, optimize ingestion via Amazon S3, classify sentiments with Python UDFs, and extract insights using SQL queries—showcasing a streamlined end-to-end workflow.

amazon-s3 data-analysis json python snowflake sql

Last synced: 29 Apr 2026

https://github.com/sharoonjoseph11/indian-liver-diseases

Indian Liver Disease Analysis and Prediction This project leverages the Indian Liver Patient Dataset (ILPD) to analyze liver disease trends and develop predictive models for early diagnosis. Through data preprocessing, exploratory analysis, and machine learning, it identifies key risk factors and builds classification models

data-analysis data-science data-visualization logistic-regression machine-learning pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/saikumar787/car_price_prediction_using_linear-regression

A machine learning project to predict the selling price of used cars using regression techniques. Includes data preprocessing, model training, evaluation, and testing on new data.

car-price-prediction-with-machine-learning data-analysis joblib jupiter-notebook linear-regression-models model-deployment python scikit-learn standardscaler

Last synced: 29 Apr 2026

https://github.com/meinhere/ta-pendat

Proyek Akhir Mata Kuliah Penambangan Data - Klasifikasi Trauma Pasien Menggunakan Metode Naive Bayes

data-analysis data-mining naive-bayes-classifier python trauma

Last synced: 29 Apr 2026

https://github.com/shimaa83/eda-repo

Exploratory data analysis for Police and retail dataset in kaggle

data-analysis python

Last synced: 29 Apr 2026

https://github.com/al-ghaly/e-commerce-a-b-testing

A Statistical Analysis project in which I Performed an A/B test to analyze the effect of changing the user interface for an E-Commerce company's Website.

data-analysis matplotlib numpy pandas python python-data-analysis seaborn statistical-analysis statistics

Last synced: 29 Apr 2026

https://github.com/prithviraj-2003/cognifyz-data-science-internship

🎓 Data Science Internship at Cognifyz Technologies 📅 Duration: 2 Months 🧠 Worked on real-world restaurant data 🗂️ Completed structured tasks across 3 levels 📌 Tasks focused on EDA, data preprocessing, visualization, and analysis 📎 Task descriptions provided in an attached PDF

data-analysis data-science data-visualization matplotlib numpy pandas python3

Last synced: 29 Apr 2026

https://github.com/yimethan/basics-of-data-analysis

2023-2 Basics of Data Analysis

data-analysis numpy pandas python

Last synced: 29 Apr 2026

https://github.com/theoplayz2/eda-explorer

Инструмент на Python для разведочного анализа данных (EDA) и визуализации, поддерживающий загрузку данных CSV и JSON, с модульной архитектурой ООП. Практическая работа по теме: "Обнаружение и визуализация данных для понимания их сущности" дисциплины "МДК 13.01: Основы применения методов искусственного интеллекта в программировании".

analysis battery-life cqrs csharp data-analysis eeg-analysis exploratorydataanalysis json-visualization matplotlib messaging profile-report python verilog visualization

Last synced: 29 Apr 2026

https://github.com/farhad-here/student_performance_analyzer

Student Performance Analyzer with python, it is on of my data analysis course project. I teach you about filter(),lambda,map() in python

data-analysis data-visualization filter kaggle kaggle-dataset lambda map pandas python python-tutorial streamlit

Last synced: 29 Apr 2026

https://github.com/muhammadusman-khan/e-commerce-store-eda

Exploratory Data Analysis on E-commerce store data to uncover insights about sales trends, customer behavior, and product performance using Python libraries like Pandas, NumPy, and Matplotlib/Seaborn.

data-analysis data-science data-visualization e-commerce eda exploratory-data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/alam025/invoice-generator

Processed 500+ invoices with automated payment reminders and multi-currency PDF generation

api data-analysis finance fintech nextjs pdfkit prisma python stripe

Last synced: 08 Jun 2026

https://github.com/avazasgarov/soccer-hypothesis-testing

Statistical analysis comparing goal-scoring patterns in Men’s vs. Women’s FIFA World Cups using hypothesis testing.

data-analysis eda hypothesis-testing matplotlib-pyplot pandas pingouin python scipy

Last synced: 30 Apr 2026

https://github.com/aishwaryagade02/loan-funnel-optimization-analysis

Tracks how loan applications move through each stage, helps spot where people drop off, and gives clear insights to improve approval strategies and overall performance.

ab-testing data-analysis data-creation hypothesis-testing python reporting sql statistical-methods streamlit

Last synced: 30 Apr 2026

https://github.com/mxagar/eda_fe_summary

An 80/20 guide for Data Processing: Data Cleaning, Exploratory Data Analysis, Feature Engineering, Feature Selection.

data-analysis data-cleaning data-modeling data-science data-visualization eda exploratory-data-analysis feature-engineering feature-selection machine-learning pandas

Last synced: 30 Apr 2026

https://github.com/srinibas-masanta/ibm-applied-data-science-capstone

This repository contains the work completed for the Applied Data Science Capstone Project offered by IBM on Coursera. The capstone project is the final course in the IBM Data Science Professional Certificate series and serves as an opportunity to apply the skills and knowledge gained throughout the series to a real-world data science problem.

capstone-project data-analysis data-science data-visualization machine-learning python web-scraping

Last synced: 30 Apr 2026

https://github.com/samuelpillai/machine-learning-classification-regression-nlp

A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

classification data-analysis data-science data-visualization feature-engineering jupyter-notebook machine-learning ml-pipeline model-evaluation nlp python regression-models scikit-learn supervised-learning text-mining

Last synced: 30 Apr 2026

https://github.com/mitchellharrison/mitchellharrison.github.io

Welcome to my slice of the internet, where I share the knowledge that Duke gave me, so you don't have to spend the mortgage-sized amount to access it. Built with R, Python, Quarto, and love.

ai algorithms-and-data-structures blog data-analysis data-science data-visualization educational machine-learning portfolio portfolio-website quarto r r-language statistics tutorials

Last synced: 30 Apr 2026

https://github.com/abhi227070/ipl-2024-sold-player-data-analysis

This project analyzes IPL 2024 auctioned players' data, including name, team, cricket type, nationality, and price. Users input a player's name to access team, style, nationality, and auction price, aiding research and fantasy leagues. It offers insights into player dynamics, serving cricket enthusiasts with comprehensive data exploration.

data-analysis data-visualization dataanalytics machine-learning machine-learning-algorithms python3

Last synced: 30 Apr 2026

https://github.com/gitchaell/computer-scrapping

Tool that extracts data from the pages of companies that sell computers in the city of Trujillo - Peru, exports them in an XLSX file according to a relational data model, and displays them on a Power BI dashboard.

data-analysis data-structures data-visualization database dbdiagram export-excel powerbi scrapper-script scrapping xlsx

Last synced: 01 May 2026

https://github.com/badranalyst/e-commerce-customer-analysis-data-science-foundations-case-study

This case study explores e-commerce customer data through data exploration, pre-processing, and splitting. It includes model building and training to analyze customer behavior. Python libraries like Pandas, NumPy, Matplotlib, and Seaborn are used for the analysis and model development.

data-analysis data-science dataset eda exploratory-data-analysis machine-learning matplotlib ml model-building model-training numpy pandas pre-processing python seaborn

Last synced: 01 May 2026

https://github.com/syarwinaaa09/investigating-netflix-movies

🎬 investigating netflix movie trends using python and pandas 📊

csv data-analysis matplotlib netflix pandas visualization

Last synced: 01 May 2026

https://github.com/falakrana/data-analysis-visualization

This repository showcases data analysis and visualization projects using Python and Tableau. It includes exploratory data analysis, interactive dashboards, and insightful visual stories derived from real-world datasets.

data-analysis data-visualization python tableau-public

Last synced: 01 May 2026

https://github.com/devag2004/electricity-analysis-using-spark

electricity analysis project made using spark

data-analysis spark spark-mllib

Last synced: 01 May 2026

https://github.com/mmfava/lonomia-host-plants-2024

This project investigates the relationship between Lonomia achelous and Lonomia obliqua caterpillars and their host plants. The project uses Docker for a consistent environment and R for statistical analysis, with detailed processes documented in Jupyter notebooks.

data-analysis host-plants lonomia lonomism r

Last synced: 01 May 2026

https://github.com/ariyaarka/result-analysis

A simple analysis of result based on different factors shown in figures

data-analysis jupyter-notebook matplotlib numpy-library pandas-dataframe python seaborn

Last synced: 01 May 2026

https://github.com/bpkaur/a-network-analysis-of-game-of-thrones

A Network analysis of Game of Thrones: To analyze the co-occurrence network of the characters in the Game of Thrones books

data-analysis data-science machine-learning networkx python3

Last synced: 01 May 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/sairupeshl/leo-orbital-congestion-analysis

Geospatial data analysis of the UCS Satellite Database using Python to map active LEO space assets, validate orbital parameters, and isolate mega-constellation traffic bottlenecks.

aerospace-engineering data-analysis geospatial-analysis orbital-mechanics pandas python satellite-data seaborn

Last synced: 08 Jun 2026

https://github.com/pablo1785/receipt-rs

Receipt processing backend built with Shuttle.rs, Axum and Azure Form Recognizer API

api-rest axum azure backend cognitive-services computer-vision data-analysis rust shuttle-rs sqlx

Last synced: 01 May 2026

https://github.com/monish-nallagondalla/sensor_fault_detection

This repo contains sensor data for analysis, focusing on sensor readings, their attributes, and classification (Good/Bad). It includes 500+ sensors with features for predictive modeling, anomaly detection, and sensor failure prediction.

anomaly-detection classification data-analysis data-science machine-learning predictive-modeling python sensor-data

Last synced: 01 May 2026

https://github.com/fbarffmann/project1

Analyzed factors influencing movie profitability using Python. Cleaned and visualized film industry data to uncover trends in budgets, sales, genres, and ratings.

box-office-analysis data-analysis data-visualization matplotlib movie-industry pandas python regression seaborn

Last synced: 01 May 2026

https://github.com/codesaadumair/data-science-monorepo

Comprehensive Data Science monorepo featuring EDA, Machine Learning, Preprocessing, Feature Engineering, and Visualization projects with Jupyter notebooks and Python.

data-analysis data-science data-science-projects data-visualization eda jupyter-notebook jupyterlab machine-learning python

Last synced: 01 May 2026

https://github.com/abdoomohamedd/python-data-analysis-projects

A collection of data analysis projects that I have worked on. Each project involves cleaning data, performing exploratory data analysis (EDA), and creating visualizations to extract insights. The projects utilize various Python libraries, including pandas, numpy, matp

data-analysis data-analysis-python data-cleaning data-visualization matplotlib matplotlib-pyplot numpy pandas python

Last synced: 01 May 2026

https://github.com/rafath0ssain/predihome

Data analysis using economic factors affecting living conditions across Canadian provinces.

data-analysis data-visualization dplyr ggplot2 graph kaggle linear-regression prediction-model r shiny tidyr

Last synced: 01 May 2026

https://github.com/bhoyarapurva23399/mini-erp-inventory-billing

Lightweight ERP inventory and billing web app built using Python Flask and SQLite — featuring product, customer, and dashboard management.

backend data-analysis erp flask inventory-billing mini-project python sqlite

Last synced: 01 May 2026

https://github.com/mateusoliveira30/top-intelligent-people

This project performs an exploratory analysis of the top_intelligent_people_in_the_world_5000.csv dataset, featuring some of the world's most intelligent individuals. Using pandas and matplotlib, the analysis includes checking for missing values, describing variables, and visualizing data.

data-analysis graphics kaggle-dataset python3

Last synced: 03 May 2026

https://github.com/more-joao/color-distance-luminance

Data analysis project that aims to establish a relation between the Canberra distance between white and any given color in the RGB colorspace and its luminance.

canberra-distance data-analysis luminance python r rgb

Last synced: 02 May 2026

https://github.com/faithererer/haokanvideo_spider

好看视频爬取与数据分析

data-analysis data-visualization python spider

Last synced: 02 May 2026

https://github.com/shridhar1504/milk-production-time-series-forecasting-datascience-project

This project uses time series forecasting to predict future milk production. The data used in this project is monthly milk production data from January 1962 to December 1975. The ARIMA (autoregressive integrated moving average) model is used to forecast the milk production. The model is evaluated using various metric.

adf arima-model augmented-dickey-fuller-test data-analysis data-analytics data-science data-visualization eda exploratory-data-analysis machine-learning machine-learning-algorithms python python3 residuals sarimax seasonality time-series time-series-forecasting trends

Last synced: 02 May 2026

https://github.com/shreeparab1890/unicorns-of-india-till-sep-2022-analysis-eda

This ipython notebook is the Exploratory data analysis (EDA) of the Unicorns of India till Sep 2022.

analysis data-analysis eda exploratory-data-analysis matplotlib-pyplot numpy pandas plotly

Last synced: 02 May 2026

https://github.com/suma-aljudaia/my-portfolio

Suma Aljudaia | Portfolio – AI & Data Analysis Enthusiast

ai css data-analysis html machine-learning portfolio

Last synced: 02 May 2026

https://github.com/amishidesai04/emergency-calls-data-analysis-project

Welcome to the Emergency Calls Data Analysis project repository. This project is dedicated to extracting, processing, and visualizing data from the "Emergency – 911 Calls, Montgomery County" dataset, sourced from Kaggle. The main objective is to analyze trends in emergency calls in Montgomery County, Pennsylvania, spanning multiple years.

analysis data-analysis data-extraction data-processing data-science data-visualization numpy pandas python seaborn

Last synced: 02 May 2026

https://github.com/isaqueiros/motorpremium-predictions-mlpclassifier

This Jupyter Notebooks is an initial study of the application of sklearn neural network MLP Classifier model. The model is applied to dataset MotorPremiums, which is supplied separately in .csv format.

data-analysis data-science machine-learning neural-network python sklearn-library

Last synced: 02 May 2026

https://github.com/chuxinh/our-data-manual

All in one place for our data science learning journey by Chuxin and Melody

data-analysis data-science machine-learning python

Last synced: 09 Jun 2026

https://github.com/m0saan/python-for-data-analysis

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney,

data-analysis data-science ipython-notebook machine-learning matplotlib numpy pandas python

Last synced: 02 May 2026

https://github.com/benzerinsio/breastcancer-eda

📊 Análise Exploratória de Dados (EDA) - Câncer de Mama | Exploração de características clínicas para identificar padrões e relações no diagnóstico de câncer de mama.

analise-de-dados analise-exploratoria analise-exploratoria-de-dados data-analysis data-visualization diagnosis eda exploratory-data-analysis health-care medical-data python seaborn

Last synced: 02 May 2026

https://github.com/sarah-marion/sovereign-osint-toolkit

Sovereign OSINT Toolkit - Advanced, self-hosted intelligence platform for security researchers and investigators. Ethical, private and production-ready.

correlation-engine cybersecurity data-analysis docker fastapi infosec intelligence investigation open-source osint privacy python3 security-research security-tools threat-intelligence

Last synced: 02 May 2026

https://github.com/abhijeet107/task-4

Design an interactive dashboard for business stakeholders.

data-analysis excel-csv tableau-dashboards tableau-public

Last synced: 22 Jan 2026

https://github.com/lit26/data_jobs_analyzing

Data analysis for data jobs

data-analysis topic-modeling

Last synced: 26 Mar 2025

https://github.com/sgb31/covid-19-data-analysis

"In this project, I analyzed COVID-19 data to explore trends, case growth, and key patterns. I worked on cleaning the data, performing exploratory analysis, and visualizing infection rates, recoveries, and fatalities. The goal was to gain insights into how the pandemic evolved and its overall impact.

data-analysis data-visualization matplotlib pandas python seaborn

Last synced: 13 May 2026

https://github.com/who-else-but-arjun/isro_xrf_sr

Source Codes for super resolution of the lunar elemental abundance map using a semi-supervised deep spatial interpolation model. This hybrid approach combined ResNet50 for spatial feature extraction with Graph Neural Network (GATv2Conv) layers and Convolutional Neural Networks (CNNs), followed by fusion layers.

cnn data-analysis graph-neural-networks pytorch semi-supervised-learning spatial-interpolation super-resolution

Last synced: 30 Apr 2026

https://github.com/kevingastelum/mydataanalysis

My DataAnalyst Projects | Python, SQL, Excel, PowerBI & Tableau

data-analysis python sql visualization

Last synced: 20 May 2026

https://github.com/jofaval/iris-flowers

Multilabel Classification of the famous Iris Flowers Dataset from Ronald Aylmer Fisher in 1936

classification data-analysis data-science data-visualization google-colab iris-flowers kaggle machine-learning python scikit-learn xgboost

Last synced: 05 Apr 2026

https://github.com/cyberoctane29/noaa-lightning-analysis

This project explores lightning strike data from the National Oceanic and Atmospheric Administration (NOAA) to identify seasonal trends and analyze strike frequency across months. It demonstrates data manipulation, aggregation, and visualization using Python, providing insights into lightning activity patterns.

data-analysis data-science data-visualization eda python

Last synced: 20 Apr 2026

https://github.com/zen204/accenture-tech-news-summarization-engine

A tool developed to analyze knowledge graphs from technology news articles, uncovering insights and trends about technology products, platforms, services, and their industry impact. Built during an internship at Accenture to inform decision-making in the tech landscape.

data-analysis decision-making graph-visualization industry-insights jupyter-notebook knowledge-graph machine-learning python tech-news tech-trends

Last synced: 29 Apr 2026

https://github.com/clchinkc/zombie

Personal project, Python, NumPy, Matplotlib, Pygame, Scikit-learn, TensorFlow, Docker

algorithms data-analysis docker machine-learning matplotlib numpy pygame python sklearn tensorflow zombie-simulation

Last synced: 05 Apr 2026

https://github.com/lulloooo/bizdata-nexus

Collection of my Business & Data Analysis projects, from professional/academic endeavors to passion-driven explorations 📊

business-analysis data-analysis economics etl excel finance mysql python r risk-analysis

Last synced: 05 Apr 2026

https://github.com/darkdk123/house-valuation-model

A Challenge Project in a Boot-Camp to create a ML Model to predict the prices of houses in Boston Massachusetts from multiple parameters Using Multivariable Regression.

data-analysis data-science data-visualization matplotlib-pyplot multivariate-regression predictive-modeling statistics

Last synced: 07 Jul 2025

https://github.com/jovicdev97/Financial-Loan-DataScience-Notebook

using numpy and pandas to analyze a synthetic loan dataset with python

data-analysis matlabplot numpy pandas plotting python seaborn

Last synced: 12 Mar 2025

https://github.com/antononcube/wl-tilestats-paclet

Wolfram Language (aka Mathematica) paclet for statistics over 2D tillings. (Tile binning, aggregation functions application, etc.)

2d-data data-analysis geospatial-data mathematica wolfram-language

Last synced: 20 Mar 2026

https://github.com/gui-sitton/prepaid

In this project I work as an analyst for the telecommunications company Megaline. The company offers its customers prepaid plans, Surf and Ultimate. The sales department wants to know which plans bring in the most revenue in order to adjust the advertising budget

data data-analysis data-analysis-python data-science data-visualization python

Last synced: 22 May 2026

https://github.com/antononcube/java-tilestats

Java package for statistics over 2D tillings. (Tile binning, aggregation functions application, etc.)

data-analysis hexagonal-grids

Last synced: 02 Apr 2025

https://github.com/r8vnhill/hdp

Hentai data processing

data-analysis e-hentai hentai kotlin

Last synced: 02 Apr 2025

https://github.com/susshiii/sql-layoffs-data-cleaning-and-eda

Full SQL project using MySQL to clean and analyze a real-world tech layoff dataset from 2020–2023.

data-analysis data-analytics-project data-cleaning eda layoffs mysql sql

Last synced: 07 Jul 2025

https://github.com/m-coder-umer/sales-dashboard-power-bi-project

An interactive Sales Dashboard built with Power BI using MySQL data, showcasing monthly trends, top-performing products, and key sales KPIs (Key Performance Indicators).

business-intelligence data-analysis data-cleaning data-modeling data-visualization dax interactive-dashboard mysql power-query powerbi sales-dashboard sql time-series-analysis

Last synced: 07 Jul 2025

https://github.com/lashawnfofung/super-heroes-analysis-project

This portfolio project involves a detailed analysis of 732 superhero records from the heroes_information.csv dataset, comprising 11 columns of unique characteristics for each hero. The primary goal is to showcase key insights derived from this rich dataset, demonstrating proficiency in data analysis using SQL.

data-analysis datasets mysql-database mysql-server mysql-workbench sql

Last synced: 07 Jul 2025

https://github.com/mikhaelmounay/salty-med

Salty Mediterranean - Grade 12 Data Analysis & Visualization Capstone Project

data-analysis data-visualization

Last synced: 02 Feb 2026

https://github.com/borjamome/top-goleadores

Mejores delanteros en Europa según los datos

data-analysis data-visualization football-analytics r

Last synced: 26 Mar 2025

https://github.com/borjamome/explorando-madrid

Exploring Madrid: A Data-driven Analysis with R 🐻🌳

data-analysis data-visualization madrid r

Last synced: 26 Mar 2025

https://github.com/manish506/loan-approval-prediction

Explore predictive modeling in this project by applying classification techniques to a loan approval dataset. Analyze and preprocess the data, then use models like K-Nearest Neighbors, Random Forest, SVC, and Logistic Regression to predict loan outcomes. Gain insights into approval factors and enhance prediction accuracy.

classification classification-models data-analysis data-science jupyter-notebook loan-approval-prediction machine-learning predictive-analytics predictive-modeling project python

Last synced: 19 Jan 2026