An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/stoll-jonathan/sorting_algorithm_analyzer

C++ program which analyses the performance of different sorting algorithms on a dataset of random numbers

bubble-sort data-analysis insertion-sort merge-sort sorting-algorithms

Last synced: 01 Apr 2025

https://github.com/yash-3-bit/online-sales-analysis

Project merging the datasets from different months and performing data cleaning, analysis, and visualization.

data-analysis data-visualization pandas-library

Last synced: 27 Mar 2025

https://github.com/noodleslove/house-of-representatives-analysis-ii

In this project, we want to estimate if a transaction will have capital gains exceeding $200 using the provided dataset.

coursework data-analysis data-science eda feature-engineering pandas python3

Last synced: 12 Apr 2026

https://github.com/ernanej/data-science-dca0131

Files developed throughout the 2024.1 semester of the Data Science course taught at the Federal University of Rio Grande do Norte by the Department of Computer Engineering and Automation (DCA). 📚

big-data data-analysis data-science ia

Last synced: 30 Mar 2025

https://github.com/apoorvalal/misc_stata_ados

Miscellaneous utility programs in Stata.

data-analysis stata stata-command

Last synced: 04 Feb 2026

https://github.com/theveryhim/frequent-item-sets-and-lsh

A practice exercise on finding frequent item sets and similar items in the PySpark framework

big-data data-analysis frequent-itemset-mining locality-sensitive-hashing pyspark text-processing

Last synced: 03 Jul 2025

https://github.com/mpoojithavigneswari/bangalore-house-price-prediction

This project involves creating a website that predicts Bangalore house prices with 94.65% accuracy using a machine learning algorithm.

data-analysis data-science flask-server machine-learning matplotlib numpy pandas python scikit-learn seaborn

Last synced: 12 Apr 2026

https://github.com/sarveshdhond/top_25_cad_stocks

In this project I used Python, JupyterLab, and Pandas to import a dataset from the Yahoo stocks website. I imported the top 25 most active Canadian stocks on 12th July 2024. This project demonstrates skills such as Python, web scraping, and Pandas.

data-analysis pandas-dataframe python webscraping

Last synced: 01 Apr 2025

https://github.com/quantumudit/groceries-basket-analysis

This project performs market basket analysis using Power BI and Python to reveal associations between grocery items. It involves transforming raw transaction data into a processed dataset, creating interactive Power BI reports, and generating key insights through Python, enabling data-driven decision-making.

data-analysis data-visualization pandas powerbi python

Last synced: 12 Apr 2026

https://github.com/abdelrahmanbayoumi/titanic-machine-learning-from-disasters

Given a training set of samples listing passengers who survived or did not survive the Titanic disaster, can our model determine, from a test dataset that does not contain the survival information, whether the passengers in the test dataset survived?

data-analysis data-science data-visualization machine-learning pandas

Last synced: 09 Apr 2025

https://github.com/charlescro/reddit-classification-nlp

Analyzing subreddit language via Reddit API and NLP techniques.

data-analysis data-science data-visualization nlp-machine-learning reddit-api scikit-learn

Last synced: 03 Apr 2025

https://github.com/sreekar0101/electric-vehicle-market-growth-and-incentive-impact-analysis-dashboard

This project involves the development of a comprehensive Tableau dashboard to analyze the growth and market dynamics of electric vehicles (EVs). The dashboard reveals key insights, including a 20% increase in EV adoption over five years, the dominance of Battery Electric Vehicles (BEVs), which make up 60% of the market

data-analysis data-visualization tableau-desktop

Last synced: 07 Jan 2026

https://github.com/jnyambok/epl_dashboard

English Premier League Dashboard summarizing match data from 2009-2024

data-analysis data-science gcp powerbi

Last synced: 04 Sep 2025

https://github.com/cassandrajm/reddit-dashboard

INTERACTIVE DASHBOARD: Analyzing Political Discourse on Reddit: A Multi-Faceted NLP Approach to Toxicity, Bias, and Political Stance

capstone data data-analysis data-science politics python reddit

Last synced: 09 Apr 2025

https://github.com/noorulhudaajmal/customer-segmentation-analysis

Customer segmentation and analysis of purchasing behaviour

cluster-analysis customer-segmentation data-analysis

Last synced: 07 Oct 2025

https://github.com/muthukumar0908/imdb_movie_analysis_with_powerbi

The project aims to analyze a dataset of IMDB movies using Power BI.

data-analysis data-visualization powerbi

Last synced: 12 Jun 2025

https://github.com/wisdom-osborn/data-analytics-course-online-

🔍 Data Analytics with Python: Hands-on Course Materials. Jupyter notebooks, projects, and datasets based on the freeCodeCamp Data Analysis with Python certification. Learn NumPy, Pandas, data cleaning, and visualization through real-world examples.

data data-analysis data-science data-visualization freecodecamp numpy pandas pandas-dataframe project python

Last synced: 19 Apr 2026

https://github.com/jcm-ai/quantium-data-analytics-virtual-experience-program

This repository contains my proposed solutions to the assignments I was required to complete as part of the Quantium Data Analytics Virtual Experience Program. 📊📈📉👨‍💻

commercial-thinking communication-skills data-analysis data-validation data-visualisation data-wrangling jupyter-notebook matplotlib-pyplot numpy-library pandas-python presentation-skills programming python3 scipy-stats seaborn statistical-testing

Last synced: 16 May 2026

https://github.com/pngo1997/life-expectancy-logistic-regression

Life expectancy analysis project using logistic regression.

data-analysis logistic-regression r rmarkdown

Last synced: 10 Jun 2026

https://github.com/victorlcastro-dsa/pbl-datacamp

This repository features projects from DataCamp's Project-Based Learning (PBL) courses, showcasing practical applications of data analysis, machine learning, and visualization. Explore real-world datasets and interactive results that highlight the skills gained through hands-on learning.

data-analysis data-science data-visualization datacamp-projects hypothesis-testing machine-learning project-based-learning

Last synced: 30 Jun 2026

https://github.com/fbarffmann/citibike-covid-analysis

Analyzed NYC CitiBike usage during March 2020 to assess the impact of COVID-19 using Python and Tableau. Includes ridership breakdowns, user type trends, and interactive dashboard.

citibike covid19 data-analysis data-visualization exploratory-data-analysis pandas python tableau transportation

Last synced: 12 Apr 2026

https://github.com/francois-lenne/eletric_vehicle_usa

This project is purely educational; the main goal is to use Fabric.

data-analysis data-engineering delta-lake fabric jupyter-notebook pyspark python spark

Last synced: 12 Apr 2026

https://github.com/soajala/shopify-sales-analysis-powerbi

End-to-end Power BI dashboard project analyzing Shopify sales data with real-time metrics, DAX, and business insights.

business-intelligence data-analysis data-visualization dax interactive-dashboard powerbi sales-analysis shopify

Last synced: 05 Sep 2025

https://github.com/shridhar1504/loan-classification-datascience-project

This project uses machine learning algorithms to predict the classification of loan status. The dataset is loaded and transformed using SQL to obtain a proper dataset with valid information.

classification data-analysis data-cleaning data-science data-visualization eda loan-prediction loan-status machine-learning predictive-modeling sql supervised-learning

Last synced: 09 Apr 2025

https://github.com/noeldevelops/stem-degrees-analysis-cpp

C++ data analysis and I/O: takes an external data file for processing, performs some statistical analysis, and displays the results in the console

cpp data-analysis

Last synced: 29 May 2026

https://github.com/wo0fle/sfrcp

The program used for a research study I conducted: "Comparison of Star Formation Rate in Spiral versus Elliptical Galaxies."

astronomy astropy data-analysis galaxy jupyter-notebook python research research-project

Last synced: 03 Apr 2025

https://github.com/ymorsi7/caliwageanalysis

California employment and wage analysis on data from the past decade.

data-analysis data-science ipynb jupyter-notebook

Last synced: 21 Jan 2026

https://github.com/wtmcgrew/sql-credit-risk-analysis

Credit Risk Analysis using SQL & Excel – Approval trends by FICO, DTI, PTI, LTV, and delinquencies.

case-study credit-risk data-analysis financial-analysis loan-applications portfolio-project sql sqlite underwriting

Last synced: 04 Jul 2025

https://github.com/camara94/data_analyse_series_temporelles

In this tutorial, we will answer the following questions: 1. Read the Microsoft data using the **Pandas Data reader** package 2. Get the **maximum price** of the stock from **2017 to 2022** 3. What is the **date of the highest price** of the stock? 4. What is the **date of the lowest price** of the stock?

data-analysis data-analysis-python data-science data-structures-and-algorithms data-visualization serie series-forecasting

Last synced: 09 Apr 2025

https://github.com/dmdlgg/spotify-analysis

An interactive data analysis app built with Python, Pandas, Plotly, and Streamlit, showcasing insights about the top 1000 most played songs on Spotify. Dataset sourced from Kaggle. Users can explore the frequency, popularity, and most played songs by artist in a clean and intuitive interface.

data-analysis data-visualization pandas plotly python streamlit

Last synced: 11 May 2026

https://github.com/shrinidhi857/simpledataanalysisonstartups

The Indian startup ecosystem has experienced remarkable growth over the past decade, becoming a hotbed of innovation and entrepreneurship. In this data analysis, we segregate fields and find new insights.

data-analysis data-science data-visualization indian-startups

Last synced: 17 Sep 2025

https://github.com/xza85hrf/excel-comparison-app

Excel Comparison Application is a Python-based tool that compares two Excel files and generates a new Excel file with the differences. It's primarily designed to help in database updating by identifying new clients. The app also has a graphical user interface for easier use and logs operations for potential troubleshooting.

case-sensitive-comparison data-analysis data-difference database-comparison database-updates excel-comparison file-merging file-processing gui-application new-client-detection python

Last synced: 25 Mar 2025

https://github.com/nsandoya/python_scrp_project

This is a tool made specifically for the Dipaso e-commerce website. You can extract data from it, analyze it, and see the frequency of keywords, brands, and categories, price distributions, and other market trends, all in a set of friendly statistical tables and graphics (exported from a Jupyter notebook) :)

beautifulsoup4 data data-analysis jupyter-notebook pandas python3

Last synced: 28 Apr 2026

https://github.com/sumit0ubey/internship

This repository showcases the tasks and projects I completed during various internships. It includes work across diverse domains such as: Data Analysis: Exploratory data analysis, data visualization, and insights generation using Python and libraries like Pandas, Matplotlib, and Seaborn. Backend Development: Designing and implementing RESTful API

backend-development data-analysis python-developer

Last synced: 05 Sep 2025

https://github.com/fbarffmann/nosql-challenge

Analyzed 28,000+ UK restaurant records using MongoDB and PyMongo. Queried hygiene scores, location data, and customer ratings.

data-analysis data-cleaning database-analysis json mongodb nosql pymongo python restaurant-data

Last synced: 13 Apr 2026

https://github.com/avratanubiswas/fluorpenplugin

A MATLAB user interface for analysing OJIP curve datasets from the FluorPen instrument. It serves as an additional plug-in for "quick categorical analysis".

data-analysis fluorpen ojip-curve

Last synced: 18 Mar 2026

https://github.com/hazim-hf/data-science

This course covers basic data science principles, Python programming, and the concept of big data and its types. It explores algorithms, methods, and analyses in data science with practical Python examples. Additionally, it highlights current data technologies for storing and archiving.

data-analysis data-wrangling time-series

Last synced: 04 Jul 2025

https://github.com/nafiealhilaly/first-dash-app

A simple Plotly Dash app to explore and analyze an imaginary student assessment dataset

data-analysis data-analytics data-visualization eda plotly-dash python

Last synced: 02 Apr 2025

https://github.com/lexiortiz/advanced-data-analytics

Structured learning notes, code snippets, and key takeaways from the Google Advanced Data Analytics Professional Certificate. Serves as a personal reference for reinforcing concepts and as a resource for others on a similar learning journey.

data data-analysis data-engineering google python-3 sql

Last synced: 29 May 2026

https://github.com/nurulashraf/polynomial-regression-manufacturing

A Python project implementing polynomial regression to analyse and predict manufacturing-related data. Features include data preprocessing, model training, and visualisation of results. Ideal for exploring machine learning applications in manufacturing process optimisation.

data-analysis data-visualization machine-learning manufacturing polynomial-regression predictive-modeling process-optimization python regression-models scikit-learn

Last synced: 16 Apr 2026

https://github.com/analysisbyvivek/Road-Accident

Analyzes road accident patterns, exploring factors like lighting, weather, speed limits, time of day, and road conditions to uncover trends in severity and frequency.

data-analysis data-visualization eda jupyter-notebook kaggle tableau-public

Last synced: 29 Jan 2026

https://github.com/zen204/renewable-energy-usage-v-electricity-access

Interactive data visualization project created for COSI 116A: Introduction to Information Visualization at Brandeis University (Fall 2024). The project showcases data-driven insights using advanced visualization techniques and user interactivity. Hosted on GitHub Pages.

d3js data-analysis data-visualization electricity github-pages html-css-javascript information-visualization interactive python renewable-energy tableau web-development

Last synced: 08 Feb 2026

https://github.com/cezlul/analyse-ventes-immobilier

ML solution for Paris real-estate analysis: automatic classification of apartments vs. commercial premises (K-means, 91%) and price prediction (regression, R²=0.98) on 26K transactions. Values a €169M portfolio with data-driven strategic recommendations.

data-analysis jupyter-notebook machine-learning matplotlib numpy pandas python sklearn

Last synced: 13 Apr 2026

https://github.com/jameswrigley/laph

A node-based data analysis program.

cpp data-analysis nodes qml

Last synced: 05 Jun 2026

https://github.com/abhisek-13/fake_news_classifier

The Fake News Classifier is a TensorFlow-based machine learning project that detects and classifies fake news with 97% accuracy. The repository includes a single Python file with complete code for building and training the model, which you can use to create and deploy your own model.

colab-notebook data-analysis data-engineering deep-learning eda kaggle keras machine-learning nlp pandas python tensorflow

Last synced: 13 Apr 2026

https://github.com/busradeveci/student-performance-prediction

A machine learning project to predict student exam performance based on academic, social, and personal features. Built with Python and scikit-learn.

data-analysis kaggle linear-regression machine-learning predictive-modeling python scikit-learn student-performance

Last synced: 25 Apr 2025

https://github.com/shivamsharma32/customer-churn-analysis-power-bi-

This project is about analyzing and visualizing customer churn data using Power BI. Customer churn is the percentage of customers who stop doing business with a company over a given period of time. It is an important metric for businesses to understand why customers leave and how to retain them.

data-analysis dataanalytics datavisualization powerbi

Last synced: 15 Jan 2026

https://github.com/gintuvedula/crime-data-analysis-with-mysql-and-python

This project aims to analyze crime data using MySQL for database management and Python for data analysis and visualization. The objective is to uncover crime trends, hotspots, and patterns to support law enforcement and urban planning efforts.

data-analysis data-exploration database mysql python

Last synced: 05 May 2026

https://github.com/hemangsharma/streamingcontentanalyzer

This Streamlit application provides an interactive dashboard for analyzing streaming content data. It allows users to explore movie and TV show ratings, distributions, temporal trends, and genre breakdowns through various visualizations and filters.

dashboard data-analysis data-science data-visualization python streamlit-dashboard streamlit-webapp

Last synced: 02 Apr 2025

https://github.com/marianamartiyns/inep-educationperfomance

Data collection, processing, exploratory analysis, and predictive modeling of school performance rates using datasets from INEP.

data-analysis data-cleaning data-science inep predictive-modeling pyhton web-scraping

Last synced: 16 Mar 2025

https://github.com/vedanty3/supermarket-sales-data-analysis

This project uses data visualization techniques (with pandas and matplotlib) to explore different aspects of three months of supermarket sales data.

data-analysis data-science jupyter-notebook matplotlib numpy pandas python

Last synced: 08 May 2026

https://github.com/luminati-io/Shopee-dataset-samples

A sample dataset of over 1000 Shopee products, extracted using the Bright Data API, ideal for pricing optimization, gap analysis, and market strategy refinement.

api data-analysis data-mining datasets products shopee web-scraping

Last synced: 09 Apr 2025

https://github.com/chaganti-reddy/weather-prediction-australia

Creating a fully-automated system that can use today's weather data for a given location to predict whether it will rain at the location tomorrow.

data-analysis logistic-regression machine-learning prediction-model python3

Last synced: 13 Apr 2026

https://github.com/khushi-sabarad/8-week-sql-challenge

Case study solutions for the #8WeekSQLChallenge by Danny Ma

8weeksqlchallenge case-study data-analysis mysql sql

Last synced: 06 Sep 2025

https://github.com/saifalibaig/covid-19-infection-rate-analysis-using-python

Analysis of the Covid-19 infection rate and the World Happiness Report to identify whether there is any relationship between infection rate and happiness

data-analysis data-visualization jupyter-notebook numpy pandas python3 sns

Last synced: 18 Apr 2026

https://github.com/shellynagar27/good-cabs-data-analysis-project

This project is part of CodeBasics Challenge #13, where the goal was to provide actionable insights to the Chief of Operations at Goodcabs, a cab service provider in tier-2 cities of India. The project focused on analyzing key metrics like trip volume, repeat passenger rate, and passenger satisfaction.

critical-thinking data-analysis data-visualization excel exploratory-data-analysis power-bi presentation problem-solving sql storytelling

Last synced: 25 Jan 2026

https://github.com/shellynagar27/business-insights-360-project

A comprehensive dashboard that provides a better understanding of the business's market standing, key focus areas for optimization, underperforming customers, and year-wise financial insights, aiding inventory planning and performance tracking. It can also be used to answer any number of "why" questions, depending on the situation.

dashboard data-analysis data-visualization dax-languague dax-studio excel performance-optimization power-bi reporting sql storage-manager

Last synced: 27 Jan 2026

https://github.com/shibbir24/a-data-driven-approach-to-food-security-and-supermarket-accessibility

A Data-Driven Approach to Food Security and Supermarket Accessibility

data-analysis matplotlib numpy pandas python3 seaborn

Last synced: 13 Apr 2026

https://github.com/giyanellow/time-series-analysis-on-philippine-debt-and-inflation

A Time Series Analysis on the Philippine Inflation Rate with some predictions using RandomForest.

data-analysis data-analysis-python machine-learning python random-forest

Last synced: 18 Mar 2026

https://github.com/saro0307/exploratory-data-analysis-terrorism

Phase 1 of a data science project (program) performing exploratory data analysis on terrorism using Python on Google Colab for the Coderscave internship, Sept 2023

colaboratory data-analysis datascience machine-learning numpy pandas python seaborn skit-learn visualization

Last synced: 13 Apr 2026

https://github.com/1401dev/customer-lifetime-value-prediction

A data science project leveraging Python and Scikit-Learn to build predictive models that estimate customer lifetime value (CLV). Includes data cleaning, feature engineering, and model selection to identify key drivers of CLV, supporting strategic decision-making in customer retention and marketing.

clv clv-analysis customer-retention data-analysis dataprocessing feature-engineering machine-learning marketing-analytics predictive-modeling python regression-analysis scikit-learn

Last synced: 06 May 2026

https://github.com/tatilimongi/first_python_project

This repository contains a case study on spreadsheet automation in Python for analyzing car sales by manufacturer over the years

data-analysis email-sending file-manipulation graphical-visualization spreadsheet-automation

Last synced: 26 Mar 2025

https://github.com/fer-aguirre/covid19-venezuela

Data analysis of covid-19 deaths in Venezuela

covid-19 data-analysis dataviz line-chart

Last synced: 09 Apr 2025

https://github.com/weisswuerste/polars-eurovision-analytics

Analytics example using both the Pandas and Polars libraries

data-analysis data-analytics pandas polars python python-3 python3

Last synced: 08 May 2026

https://github.com/fer-aguirre/cookiecutter-data-analysis-lite

A cookiecutter template for data journalism projects that offers a simplified and beginner-friendly structure.

cookiecutter data-analysis data-journalism project-template python

Last synced: 14 Jun 2025

https://github.com/marina-gal/sql-business-questions

A collection of SQL queries designed to strengthen analytical problem-solving skills using the AdventureWorks2019 sample database. Tested and optimized in SQL Server Management Studio (SSMS).

adventureworks data-analysis data-analyst interview-preparation learning microsoft-sql-server practice sql sql-queries

Last synced: 30 May 2026

https://github.com/bertiewooster/ipywidgets

Interactive data visualizations in a Jupyter Notebook per tutorial https://python.plainenglish.io/interactive-visualizations-with-pandas-seaborn-and-ipywidgets-173e5d7d6a5e

data-analysis data-science data-visualization ipython-notebook ipywidgets juypter-notebook python

Last synced: 06 Mar 2026

https://github.com/tiagocavalcante/nesfit

NES 2024 Practical and Research Work - Group 2

data-analysis fitness

Last synced: 09 Jun 2026

https://github.com/srinibas-masanta/electric-vehicle-analysis-dashboard

This repository features an interactive Tableau dashboard that visualizes electric vehicle (EV) adoption trends in the U.S. 🚗⚡ Explore EV growth, top manufacturers, regional distribution, and the impact of incentives—all in one dynamic view. 📊 Use filters to dive deeper into the data and uncover key insights! 🚀

dashboards data-analysis data-visualization tableau

Last synced: 15 Jan 2026

https://github.com/satyam4229/prediction-of-cement-compressive-strength

Prediction of cement compressive strength is a regression-based model. It predicts the compressive strength of a particular cement for a variety of mixtures of its components.

data-analysis data-science data-visualization jupyter-notebook kaggle python

Last synced: 13 Apr 2026

https://github.com/pratik-khose/realtime-sales-simulation

Power BI: Realtime Sales Simulation using SQL Server and Direct Query

data-analysis data-analytics data-visualization dax-query powerbi sql sql-server sqlserver

Last synced: 10 Jun 2026

https://github.com/auliannee/customer-analysis-with-tableau

This repository contains the data source and the tableau workbook.

data-analysis data-visualization tableau

Last synced: 12 Mar 2026

https://github.com/firetyrant/sql-portfolio-projects

Documenting my SQL learning journey with hands-on projects focused on data cleaning, analysis, and optimization.

bigquery data-analysis databases etl learning portfolio query-optimization sql

Last synced: 19 Apr 2026

https://github.com/rainbowatcher/simple

Make data work easier, saving your working time

bigdata data-analysis etl

Last synced: 10 Apr 2025

https://github.com/samruddhi3012/rfm-sales-analysis

Hi there! In this project I have performed Sales Analysis (RFM Analysis) using SQL and Tableau.

data-analysis data-visualization mssqlserver rfm-analysis segmentation tableau

Last synced: 12 Mar 2025

https://github.com/manditacaos/hypefemme-analise-vendas

Data analysis and visualization project in Power BI for the fictional store Hype Femme.

data-analysis jupyter-notebook portfolio powerbi python

Last synced: 10 Apr 2025

https://github.com/nimomach/cafe-sales

This analysis focuses on evaluating the sales performance of a cafe by examining key metrics such as total revenue, sales by product category, peak sales times, and many more.

cafe data-analysis data-visualization sales

Last synced: 12 Mar 2026

https://github.com/aalekhpatel07/statcan

StatCAN dataset fetcher and cleaner.

census data-analysis data-science statcan

Last synced: 02 Apr 2025

https://github.com/deliprofesor/k-means-clustering-for-retail-data-analysis

This project uses K-Means clustering to segment wholesale customers based on their spending habits. The data is preprocessed, scaled, and clustered into four groups. The Elbow and Silhouette methods determine the optimal number of clusters, and results are visualized using boxplots and scatter plots to uncover spending patterns.

clustering-visualisation data-analysis elbow-method k-means k-means-clustering r silhouette-score

Last synced: 10 Apr 2025

https://github.com/sabelomkhwanzi/data-alchemist-boot-camp

Built on Covalent's Unified API, Increment has the full historical data set for 40+ chains including every smart contract, event, transaction, address, etc. With access to all this data you can find:

covalent data-analysis increment

Last synced: 11 Mar 2026