An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/saikumar787/car_price_prediction_using_linear-regression

A machine learning project to predict the selling price of used cars using regression techniques. Includes data preprocessing, model training, evaluation, and testing on new data.

car-price-prediction-with-machine-learning data-analysis joblib jupiter-notebook linear-regression-models model-deployment python scikit-learn standardscaler

Last synced: 29 Apr 2026

https://github.com/istinnew/eniac_ab_insight

A comprehensive analysis aimed at boosting iPhone 13 sales by optimizing the Click-Through Rate (CTR) of the “SHOP NOW” button. It compares different button designs to determine the most effective strategy for increasing engagement.

ab-testing data data-analysis data-engineering data-science data-visualization google googlecolab libraries python testing testing-tools visual-studio-code

Last synced: 29 Apr 2026
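
One common way to compare the CTR of two button designs, as in the entry above, is a two-proportion z-test. The listing does not say which test the project uses, so the sketch below is an assumption; the function names and the click and impression counts are hypothetical.

```python
import math


def ctr(clicks, impressions):
    """Click-through rate as a fraction of impressions."""
    return clicks / impressions


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two-sided p-value) for the difference in CTR between B and A."""
    p_a, p_b = ctr(clicks_a, n_a), ctr(clicks_b, n_b)
    # Pooled CTR under the null hypothesis that both designs perform the same.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, not taken from the project.
z, p_value = two_proportion_z_test(clicks_a=100, n_a=2000, clicks_b=150, n_b=2000)
```

A small p-value suggests the CTR difference is unlikely to be due to chance alone. It says nothing about whether the difference is large enough to matter commercially.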

https://github.com/cicku/en.650.672

Homework for EN.650.672

analytics data-analysis numpy pandas

Last synced: 05 May 2026

https://github.com/monish-nallagondalla/universal-bank

Credit Card Ownership Prediction: a machine learning project that predicts credit card ownership using features like age and income, balancing class distributions for improved accuracy.

classification-models credit-card-prediction data-analysis data-classification decision-tree-classifier imbalanced-datasets machine-learning model-evaluation python scikit-learn

Last synced: 05 May 2026

https://github.com/akash-47-tank/personalized-e-commerce-review-summarizer

Personalized E-commerce Product Review Summarizer: A Streamlit app that summarizes product reviews (e.g., from a CSV) using T5-small and tailors summaries to user preferences (price, durability, etc.) with NLP and lightweight ML.

data-analysis e-commerce machine-learning nlp personalization portfolio python scikit-learn sentiment-analysis streamlit t5 transformers web-app

Last synced: 05 May 2026

https://github.com/aryar-06/linear-regression

A Python project demonstrating basic linear regression with gradient descent and matrix operations, alongside a scikit-learn comparison.

data-analysis data-preprocessing educational-project gradient-descent linear-regression machine-learning python regression-algorithms scikit-learn

Last synced: 05 May 2026
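
As a rough illustration of what the entry above describes, here is a minimal sketch of single-feature linear regression fitted by batch gradient descent, checked against the closed-form least-squares answer. It is not the repository's code; the function names, learning rate, epoch count, and toy data are all assumptions.

```python
def fit_line_gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current fit.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(errors^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_line_closed_form(xs, ys):
    """Ordinary least squares for one feature, used as a reference answer."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w_gd, b_gd = fit_line_gradient_descent(xs, ys)
w_cf, b_cf = fit_line_closed_form(xs, ys)
```

On well-scaled data the two fits should agree closely. A learning rate that is too large for the feature scale makes the gradient-descent version diverge.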

https://github.com/nkamilla/titanic-eda

Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.

data-analysis eda jupyter-notebook matplotlib numpy pandas python titanic-dataset

Last synced: 05 May 2026

https://github.com/caesaredia/ymusic-project

Exploratory data analysis (EDA) of music streaming behavior in two fictional cities using Python, Pandas, and Jupyter Notebook. It explores user behavior, genre preferences, and listening patterns throughout the week.

data-analysis eda pandas python

Last synced: 05 May 2026

https://github.com/iamrajmani/sentimental-analysis

Sentiment Analysis - Final Year College Project

data-analysis data-visualization machine-learning python pytorch

Last synced: 06 May 2026

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 analyzing NYC school test scores with python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/yashpaneliya/bank-loan-default-analysis

Analyze and understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default.

data-analysis loan-default-analysis matplotlib numpy pandas python

Last synced: 06 May 2026

https://github.com/ankitwalimbe/sentiment-analysis

Sentiment analysis of Amazon Fashion reviews using VADER and a baseline ML model (TF-IDF + SGDClassifier). Includes visualizations, reproducible notebook, and recruiter-ready documentation.

data-analysis machine-learning matplotlib nlp pandas python seaborn sentiment-analysis sklearn

Last synced: 06 May 2026

https://github.com/drill-n-bass/dealavo-project

Cartesian product from dictionary to list of dictionaries and faster methods for finding index than the `index` method.

data-analysis data-analysis-python matplotlib pandas python python3 random timeit

Last synced: 06 May 2026
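
The two ideas named in the entry above can be sketched with the standard library alone. The listing does not say which faster lookup methods the repository compares, so the dictionary-based index here is an assumption, and the function names and sample data are hypothetical.

```python
from itertools import product


def dict_product(options):
    """Expand {key: [values]} into a list of dicts, one per combination."""
    keys = list(options)
    value_lists = [options[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def build_index(items):
    """Map each item to its first position so lookups avoid a linear scan."""
    positions = {}
    for i, item in enumerate(items):
        # Keep the first occurrence, matching what list.index returns.
        positions.setdefault(item, i)
    return positions


# Hypothetical inputs, not taken from the repository.
grid = dict_product({"color": ["red", "blue"], "size": ["S", "M", "L"]})
names = ["a", "b", "c", "b"]
lookup = build_index(names)
```

Building the index costs one pass over the list, so it pays off only when many lookups follow. A single lookup is cheaper with `list.index`.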

https://github.com/vimlesh-gupta/blinkit_data_analytics_project

End-to-end Blinkit data analytics project using Python, SQL Server & Power BI

blinkit data-analysis eda pandas powerbi python sql-server

Last synced: 06 May 2026

https://github.com/fbarffmann/home_sales

Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.

aws big-data data-analysis home-sales parquet pyspark python spark spark-sql sql

Last synced: 06 May 2026

https://github.com/joseph-pabian/life-expectancy-

Statistical analysis of life expectancy in developed vs developing countries using SQL and Python

data-analysis duckdb public-health python sql statistics

Last synced: 07 May 2026

https://github.com/ddihora1604/advanced_business_analytics_on_world_bank_global_financial_inclusion_data_2021

Bridging the Gaps in Financial Inclusion: Understanding the Cash-Credit Paradox, Divide between Cash and Digital Payments, and Financial Resilience.

advanced-excel business-analytics data-analysis data-engineering data-mining data-visualization database exploratory-data-analysis machine-learning preprocessing-data python

Last synced: 07 May 2026

https://github.com/jpgiant/gujaratrainfallanalysis_2021

Analysis of the rainfall in the districts of Gujarat state in 2021

data-analysis exploratory-data-analysis exploratory-data-visualizations matplotlib numpy pandas-python python

Last synced: 07 May 2026

https://github.com/biginformatics/git-basics

Hands-on Git and GitHub lessons for analysts and statisticians

data-analysis git github public-health training

Last synced: 10 Jun 2026

https://github.com/vyjayanthipolapragada/genai_smart_retail_recommendation

GenAI Smart Retail is a recommendation system designed for retail environments. It provides personalized product recommendations to users based on product descriptions using a content-based filtering approach. The system leverages FastAPI for backend integration, allowing users to interact with the recommendation engine via an API. This project aim

content-based-recommendation data-analysis data-science data-visualization fastapi gen-ai instacart-data jupyter-notebook open-ai python3 retail scikitlearn-machine-learning stream

Last synced: 07 May 2026

https://github.com/bnvulpe/regression-and-time-series

This work centers on assessing and comparing predictive models for regression and time series prediction using specific datasets, with the goal of selecting the most effective methodology for unseen test data.

colab data-analysis data-analysis-python data-science data-visualization forecasting jupyter-notebook machine-learning model-evaluation predictive-modeling python regression sarima sarimax time-series-analysis time-series-analysis-and-forecasting

Last synced: 08 May 2026

https://github.com/blladerunner/customer-churn-dashboard

Customer Churn Dashboard — SQL + Python analytics project exploring customer retention patterns, churn rate by demographics and services, and key insights for telecom business strategy.

business-intelligence churn-analysis customer-retention dashboard data-analysis data-analytics data-science pandas powerbi python sql sqlite telecom

Last synced: 08 May 2026

https://github.com/devexpress-examples/wpf-pivot-grid-group-date-time-values

This example shows how to group date-time values in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 08 May 2026

https://github.com/sabaasif2501/netflix-data-analysis

Exploratory data analysis of Netflix content using Python and pandas, covering content types, genres, countries, and release years.

data-analysis netflix pandas portfolio-project python

Last synced: 08 May 2026

https://github.com/mrunmayee3108/financial-chatbot

A Python chatbot for analyzing companies' financial data, answering queries about revenue, income, assets, cash flow, and debt ratio

chatbot data-analysis jupyter-notebook pandas python python3

Last synced: 09 May 2026

https://github.com/rizkipragustono/data_analysis_spark

Exploration: Data Analysis using Spark

apache-spark data-analysis pyspark python spark-sql sql

Last synced: 09 May 2026

https://github.com/tsbarr/toronto-open-data

Analysis of Toronto's open data initiatives. 🌆 Exploring Toronto's urban systems through data science 📊 Python-based analyses of public datasets 🔍 Focus on community impact and urban patterns 🎓 Academic rigour meets practical insights 🔄 Regularly updated with new analyses

api-integration civic-tech ckan-api data-analysis data-cleaning data-science data-visualization exploratory-data-analysis jupyter-notebook open-data pandas public-data python tableau toronto urban-analytics

Last synced: 09 May 2026

https://github.com/magnus0969/black-friday-sales-analysis

An in-depth analysis of Black Friday sales data to uncover trends, customer behavior, and product insights. Utilizing Python, data visualization, and machine learning techniques, this project provides key business intelligence to optimize sales strategies.

analysis data-analysis data-science python sales-analysis

Last synced: 09 May 2026

https://github.com/master-helix/ibm-data-analyst-certification-stock-analysis-project

A mini-project repository for my IBM certification, involving stock analysis and plotting for Tesla and GameStop

analytics data data-analysis data-visualization ibm matplotlib pandas python web-scraping

Last synced: 09 May 2026

https://github.com/mmfava/qualesuapergunta-scripts-base-2015-2018

This repository contains R scripts used during my biostatistics consulting work. The scripts cover various statistical analyses and served as a basis for the analyses that were carried out. They are not the scripts from the consulting or advisory engagements themselves.

analytics data-analysis r

Last synced: 20 May 2026

https://github.com/salma-mamdoh/exploring-the-evolution-of-linux-project

My project for learning the basics of analysis on DataCamp

data-analysis datacamp pandas python time-series-analysis

Last synced: 09 May 2026

https://github.com/sdley/cas_pratiques_a_rendre

Practical data-processing exercises with Python.

data-analysis pandas python

Last synced: 09 May 2026

https://github.com/vasishta03/econovisionai

A simple Python desktop app to search and explore OECD economic data (CSV) and report summaries (TXT/JSON) using a modern CustomTkinter GUI—no SQL or web frameworks needed.

csv customtkinter data-analysis desktop-app economic-data gui json local-app oecd pandas python search tkinter

Last synced: 10 May 2026

https://github.com/macdon112/credit-card-fraud-detection

Comparing ML models (Random Forest, KNN, Decision Tree) for credit card fraud detection using SMOTE and stratified cross-validation.

classification data-analysis fraud-detection imbalanced-data machine-learning python scikit-learn

Last synced: 10 May 2026

https://github.com/mozeel-v/spam-detection

ML-powered SMS spam classifier using NLP and scikit-learn. Detects and filters spam messages with an interactive Streamlit UI.

classification data-analysis mnb streamlit

Last synced: 10 May 2026

https://github.com/crazy-dot/covid-19-analysis

This project performs an in-depth analysis and visualization of COVID-19 data, focusing on India and its states/union territories.

covid-19-india data-analysis jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 10 May 2026

https://github.com/sdley/tp2_datascience

Practical data-processing exercise with Python

data-analysis pandas python

Last synced: 11 May 2026

https://github.com/hrosicka/czechpopulationestimation

Python code for data analysis and population prediction in the Czech Republic up to the year 2050, using the Pandas and Matplotlib libraries.

data-analysis data-visualization matplotlib matplotlib-figures matplotlib-pyplot pandas pandas-dataframe pandas-library pandas-python python python3

Last synced: 11 May 2026

https://github.com/leticia-ducatti/sales-dashboard-project

Interactive sales dashboard built with Python and Streamlit — shows KPIs, allows filtering, and visualizes sales data.

data-analysis pandas plotly python streamlit

Last synced: 12 May 2026

https://github.com/ggarciajavier/udacity-dalf-project2-wrangle-openstreetmap-data

Work performed for the 2nd project of Udacity Data Analyst Nanodegree: OpenStreetMap data wrangling and analysis.

data-analysis openstreetmap python sql

Last synced: 12 May 2026

https://github.com/krypten/playingcardsstatisticalanalysis

Statistical Analysis of Playing Cards (Descriptive Statistics: Final Project)

data-analysis machine-learning machinelearning python statistics udacity

Last synced: 12 May 2026

https://github.com/elishah-john/happiness-report-2019

Analysis of the "Happiness Report 2019" using Python.

data-analysis data-visualization educational jupyter-notebook python

Last synced: 12 May 2026

https://github.com/roland045/smart_fluid_sedimentation_tester

Control program for a custom-developed smart fluid sedimentation tester system

arduino data-analysis instrumentation measurement sensor

Last synced: 13 May 2026

https://github.com/ani717/pneumonia_detection_effecientnet_b7

Pneumonia detection in chest X-ray images with EfficientNet-B7. Accuracy = 87.98%, Precision = 100%, Recall = 83.87%, F1 Score = 91.23%.

cnn computer-vision data-analysis data-augmentation efficientnet image-classification image-processing machine-learning

Last synced: 13 May 2026
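
The F1 score quoted in the entry above follows from the quoted precision and recall, since F1 is their harmonic mean. A short check, using only the figures from the listing (the function name is an illustrative choice):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision = 100% and Recall = 83.87%, converted from percentages to fractions.
f1 = f1_score(1.0, 0.8387)
f1_percent = round(f1 * 100, 2)
```

With these inputs the result rounds to 91.23, matching the listed F1 score when read as a percentage.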

https://github.com/nlink-jp/shell-agent-v2

macOS local-first chat & agent tool with interactive data analysis (Wails v2 + React)

data-analysis duckdb golang llm macos react wails

Last synced: 13 May 2026

https://github.com/deliprofesor/joblocationmapper

JobLocationMapper is a Python tool that visualizes job listings on an interactive map. It uses city and state data to place job markers accurately and color-codes them by occupation (Software, Marketing, Design). The map clusters markers for better organization, and users can click on them to view job details.

clustrered-markers data-analysis data-visualization folium geocoding geographical-visualization interactive-map job-listings map-visualization pandas python

Last synced: 14 May 2026

https://github.com/satvikpraveen/matplotlibmasterpro

📷 MatplotlibMasterPro is a complete, portfolio-ready project to master data visualization using matplotlib. Includes 16 notebooks, real datasets, exportable plots, custom themes, Streamlit dashboard, and Docker support. Ideal for learners and data professionals.

charts custom-plots dashboarding data-analysis data-science data-visualization educational-project interactive-visualizations jupyter-notebook matplotlib notebooks open-source plotting portfolio-project python python-utilities reproducible-research subplots time-series-analysis visualization-tools

Last synced: 14 May 2026

https://github.com/yashsingh43/cdc-sleep-duration-health-analysis

Analysis of CDC BRFSS 2022 data exploring how sleep duration relates to mental and physical health outcomes.

beautifulsoup brfss cdc data-analysis data-visualization matplotlib pandas plotly public-health python

Last synced: 11 Jun 2026

https://github.com/prgermux/data-plotter

This Python application provides a graphical user interface (GUI) for analyzing and visualizing data from various sources. It uses the PyQt5 framework for the GUI and Matplotlib for plotting data. The application supports multiple file formats, allows users to select any columns for the X and Y axes, and provides dynamic plots.

automation data-analysis plott python

Last synced: 12 Jun 2026

https://github.com/luizassimoes/q5ga-latency-and-throughput

Quick 5G Analyser: PyQt5 software developed to help with simple graphical analysis and chart generation for ping and iperf3 tests.

data-analysis data-visualization pyqt5 python

Last synced: 13 Jun 2026

https://github.com/gmalbert/immigration

Immigration Data Analysis

data-analysis immigration

Last synced: 14 Jun 2026

https://github.com/jkazari/rollercoaster-eda

Repository for a small data-analysis project in R for the Mathematical Software class, taken in the 3rd semester of Mathematics studies at Gdańsk University of Technology

data-analysis r

Last synced: 14 Jun 2026

https://github.com/prathmesh2507/global-stock-intelligence-dashboard

Interactive Global Stock Market Analytics Dashboard built using Python, YFinance, Pandas, Streamlit, and Plotly. Analyze 20+ countries and 400+ top stocks with advanced visualizations and financial insights.

dashboard data-analysis data-visualization python stock-analysis streamlit

Last synced: 15 Jun 2026

https://github.com/dcs-training/data-wrangling-and-vis-pandas

Introduction to analyzing structured data with the Python libraries pandas (for CSV and TSV data) and ElementTree (for XML data). See the README file.

data-analysis data-visualisation data-wrangling python

Last synced: 16 Jun 2026

https://github.com/kheriberto/bedu_dc

Exercises from the "python desde 0" (Python from scratch) course on the BEDU platform

data-analysis python

Last synced: 18 Jun 2026

https://github.com/preetesh21/spotme

This repository uses the web-based API provided by Spotify to retrieve data and then analyses it.

api data-analysis

Last synced: 18 Jun 2026

https://github.com/httpsnooow/graphs-analysis-neo4j

Challenges from the "Neo4J - Data Analysis with Graphs" course by Digital Innovation One (DIO).

challenge data-analysis data-engineering data-science graph neo4j neo4j-database neo4j-graph

Last synced: 18 Jun 2026

https://github.com/shahaf-f-s/feature-space

A modular framework for combining pandas series features

data-analysis data-science feature-engineering

Last synced: 19 Jun 2026

https://github.com/angelmtenor/idafc

Udacity's Intro to Data Analysis

data-analysis

Last synced: 20 Jun 2026

https://github.com/dcs-training/intro-to-statistics

Intro to Statistics workshop. This repo contains the code and files used in the practical part of the workshop, together with the slides for the training. See the README file.

data-analysis data-visualisation data-wrangling r statistics

Last synced: 20 Jun 2026

https://github.com/evanmathew/northwind-traders

SQL-powered analysis of sales, employee performance, and customer behavior using PostgreSQL window functions. This project uncovers key business insights to optimize decision-making.

case-study data-analysis jupyter-notebook northwind-traders postgresql python-postgresql sql

Last synced: 20 Jun 2026

https://github.com/anburocky3/cbse-schools-data

Fetch CBSE school data in seconds and use it in your data projects

cbse data data-analysis data-science grabber nextjs

Last synced: 24 Jun 2026

https://github.com/imosudi/unsupervised-ml-kmeans-analysis

K-Means clustering analysis using synthetic datasets generated with scikit-learn, including meshgrid visualisation, silhouette score evaluation, and investigation of cluster count and random seed effects.

clustering data-analysis jupyter-notebook kmeans kmeans-clustering machine-learning matplotlib python3 scikit-learn silhouette-score unsupervised-learning

Last synced: 25 Jun 2026

https://github.com/vevdokimovm/python-course-notebooks

Python course practice scripts, Jupyter notebooks and deep learning exercises from Grokking Deep Learning

data-analysis deep-learning jupyter python

Last synced: 27 Jun 2026

https://github.com/manganite/vibespin

VibeSpin is a Python framework for simulating and analyzing 2D lattice spin systems (Ising, XY, and q-state Clock models) with Numba-accelerated Monte Carlo dynamics, correlation/structure diagnostics, and reproducible benchmarking workflows.

clock-model critical-phenomena data-analysis ising-model lattice-models monte-carlo-simulation phase-transitions physics-simulation python scientific-computing spin-models spin-systems statistical-mechanics xy-model

Last synced: 29 Jun 2026

https://github.com/jedrzej-wydra/competition-cooperation

Competition, cooperation, and parental effects in larval aggregations formed on carrion by communally breeding beetles Necrodes littoralis (Staphylinidae: Silphinae)

data-analysis non-linear-regression r

Last synced: 20 Aug 2025

https://github.com/mikkelrask/henryrollins-scraper

FANATIC! A dataset of Henry Rollins' listens on his KCRW radio show, with data dating back to 2017: 496 episodes of weird and rare finds, fast-paced punk, and frog sounds. Includes a scraper that keeps the data up to date with henryrollins.com

archive data-analysis data-visualization music

Last synced: 29 Jun 2026