An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/mahmoudnamnam/superstore-analysis

This project explores the SuperStore dataset to uncover insights into sales, profit, and customer behavior. It identifies key trends, regional variations, and product performance, using data analysis and machine learning techniques to guide business strategy and optimize performance.

clustering data-analysis data-science data-visualization geopandas jupyter-notebook machine-learning numpy pandas plotly regression seaborn sklearn

Last synced: 12 Apr 2026

https://github.com/mostafa-ghorab/global-happiness-analysis

An analysis of global happiness rankings based on various factors like GDP, family support, health, and freedom from the World Happiness Report (2015-2017). This project provides data visualizations and statistical insights into how these factors influence happiness scores in different regions.

business-analysis data-analysis data-visualization matplotlib numpy pandas python seaborn

Last synced: 12 Apr 2026

https://github.com/soumya-thoutam/covid-19-impact-on-u.s.-states-and-colleges

Covid-19 analysis and impact on United States Colleges and States using SQL and Tableau.

covid-19 dashboard data-analysis data-visualization dataset sql sql-server tableau

Last synced: 04 Sep 2025

https://github.com/stoll-jonathan/sorting_algorithm_analyzer

C++ program which analyses the performance of different sorting algorithms on a dataset of random numbers

bubble-sort data-analysis insertion-sort merge-sort sorting-algorithms

Last synced: 01 Apr 2025

https://github.com/yash-3-bit/online-sales-analysis

Project: merging datasets from different months and performing data cleaning, analysis, and visualization

data-analysis data-visualization pandas-library

Last synced: 27 Mar 2025

https://github.com/ernanej/data-science-dca0131

Files developed throughout the 2024.1 semester of the Data Science course taught at the Federal University of Rio Grande do Norte by the Department of Computer Engineering and Automation (DCA). 📚

big-data data-analysis data-science ia

Last synced: 30 Mar 2025

https://github.com/apoorvalal/misc_stata_ados

Miscellaneous utility programs in Stata.

data-analysis stata stata-command

Last synced: 04 Feb 2026

https://github.com/hemangsharma/breast-cancer-patient-dashboard

This interactive Streamlit dashboard visualizes insights from the SEER Breast Cancer Dataset (2006-2010)

data-analysis streamlit streamlit-dashboard streamlit-webapp

Last synced: 05 May 2026

https://github.com/tanaybhadula/twitter-trends-dashboard

An interactive dashboard that visualizes data on current Twitter trends by country and globally. It collects data from over 60 countries using the Python Tweepy library, processes it, and visualizes it as bar charts and pie charts using the Plotly Dash framework.

dash dashboard data-analysis data-visualization plotly python trends twitter

Last synced: 31 May 2026

https://github.com/mpoojithavigneswari/bangalore-house-price-prediction

This project involves creating a website that predicts Bangalore house prices with 94.65% accuracy using a machine learning algorithm.

data-analysis data-science flask-server machine-learning matplotlib numpy pandas python scikit-learn seaborn

Last synced: 12 Apr 2026

https://github.com/skivhisink/econometricnsu

A one-semester master's-level econometrics course for first-year master's students at the Faculty of Economics of Novosibirsk State University (NSU)

data-analysis econometrics economics education nsu r

Last synced: 09 Apr 2025

https://github.com/quantumudit/groceries-basket-analysis

This project performs market basket analysis using Power BI and Python to reveal associations between grocery items. It involves transforming raw transaction data into a processed dataset, creating interactive Power BI reports, and generating key insights through Python, enabling data-driven decision-making.

data-analysis data-visualization pandas powerbi python

Last synced: 12 Apr 2026

https://github.com/bpkaur/exploring-the-evolution-of-linux

This project explores the evolution of the Linux kernel by finding the top 10 contributors and visualizing commits over the years.

data-analysis data-science datacamp ipynb-jupyter-notebook python3

Last synced: 21 Feb 2026

https://github.com/abdelrahmanbayoumi/titanic-machine-learning-from-disasters

Given a training set of passengers labeled by whether they survived the Titanic disaster, can our model determine, for a test dataset that does not contain survival information, whether each passenger survived?

data-analysis data-science data-visualization machine-learning pandas

Last synced: 09 Apr 2025

https://github.com/borjamome/soho_cholera

Cholera deaths in the Soho District (London)

data-analysis data-visualization london r

Last synced: 04 Sep 2025

https://github.com/grooviter/tablesaw

Java dataframe and visualization library

data-analysis dataframe java visualization

Last synced: 28 Mar 2025

https://github.com/sreekar0101/electric-vehicle-market-growth-and-incentive-impact-analysis-dashboard

This project involves the development of a comprehensive Tableau dashboard to analyze the growth and market dynamics of electric vehicles (EVs). The dashboard reveals key insights, including a 20% increase in EV adoption over five years, the dominance of Battery Electric Vehicles (BEVs), which make up 60% of the market

data-analysis data-visualization tableau-desktop

Last synced: 07 Jan 2026

https://github.com/kernix13/github-readme-seo-analysis

A Jupyter Notebook analysis of GitHub README and repo SEO to determine what makes a repo rank in the SERPs

accessibility data-analysis readme seo seo-analysis

Last synced: 29 May 2026

https://github.com/yanny-alt/competitor-sales-analysis-in-power-bi

This project aims to analyze competitor sales for a fictional manufacturing company, Sintec, using Power BI. The focus is on integrating, cleaning, and modeling data from multiple sources to generate insightful reports on company and competitor performance.

data-analysis powerbi sales-analysis

Last synced: 07 Jan 2026

https://github.com/edprice25/us-states-analysis

Presents a series of visualizations for folks looking to relocate to more affordable areas in the US. Click on my link below to see a full analysis.

data-analysis jupyter-notebook matplotlib pandas python us-states

Last synced: 04 Jul 2025

https://github.com/kseniatyschuk/excel-data-matcher

Compare and match Excel files via a simple Python GUI

automation data-analysis etl excel gui pandas python3 tkinter

Last synced: 23 Apr 2025

https://github.com/cassandrajm/reddit-dashboard

INTERACTIVE DASHBOARD: Analyzing Political Discourse on Reddit: A Multi-Faceted NLP Approach to Toxicity, Bias, and Political Stance

capstone data data-analysis data-science politics python reddit

Last synced: 09 Apr 2025

https://github.com/ehsan-behzadi/online-retail-data-analysis-and-preprocessing

This project analyzes and preprocesses the Online Retail dataset to uncover insights into customer purchasing behaviors, sales trends, and product performance. It includes data cleaning, exploration, and visualization, with the goal of enhancing understanding of online retail dynamics.

cohort-analysis data-analysis data-cleaning data-exploration duplicate-detection exploratory-data-analysis-eda feature-encoding feature-engineering handling-missing-values online-retail outlier-detection preprocessing trends-visualization visualization z-score-method

Last synced: 16 Apr 2026

https://github.com/nilayhangarge/data-analysis-with-python

This repository provides a practical introduction to data acquisition and analysis using Pandas. It covers loading datasets, exploring data, manipulating data, and gaining insights through statistical summaries. Ideal for beginners, it offers code examples and explanations to enhance your data manipulation skills using Pandas for Python.

data-acquisition data-analysis data-analytics data-binning data-cleaning data-engineering data-fundamentals data-insights data-integration data-preprocessing data-science data-wrangling numpy pandas python

Last synced: 12 Apr 2026

https://github.com/noorulhudaajmal/customer-segmentation-analysis

Customer segmentation and analysis of purchasing behaviour

cluster-analysis customer-segmentation data-analysis

Last synced: 07 Oct 2025

https://github.com/chinmayee4/vrinda_store_data_analysis

Analyzed data by creating an interactive dashboard in MS Excel

data-analysis data-cleaning data-visualization excel-dashboard pivot-tables power-query

Last synced: 07 Jan 2026

https://github.com/jbalooshie/election_analysis

A Python script built to analyze a specific election's results that can be repurposed to analyze the results of other elections. The script provides different breakdowns of the vote by candidate and county.

data-analysis data-science elections python

Last synced: 09 Apr 2025

https://github.com/wisdom-osborn/data-analytics-course-online-

🔍 Data Analytics with Python: Hands-on Course Materials. Jupyter notebooks, projects, and datasets based on the freeCodeCamp Data Analysis with Python certification. Learn NumPy, Pandas, data cleaning, and visualization through real-world examples

data data-analysis data-science data-visualization freecodecamp numpy pandas pandas-dataframe project python

Last synced: 19 Apr 2026

https://github.com/bkataru/physics-e.e

Project repository for IB physics extended essay. Topic: Predictive data modeling of a variable binary star’s brightness over a period of time using astrostatistics.

astrometry astronomical-algorithms astronomical-images astronomy astrophotography astrostatistics data-analysis data-science data-visualization modeling physics polynomial-regression regression-analysis

Last synced: 09 Apr 2025

https://github.com/saroshfarhan/dublin_pedestrian_data_analysis

Pedestrian footfall data analysis for the city of Dublin

data-analysis data-visualization r-programming

Last synced: 07 Jan 2026

https://github.com/rohitblaze10/netflix_analysis_using_tableau

The Netflix dashboard in Tableau provides a professional and visually captivating interface for users to explore a vast collection of TV shows and series. With seamless navigation and interactive filters, users can easily personalize their recommendations based on release year, genre, duration, and rating.

data data-analysis data-science data-visualization netflix tableau

Last synced: 04 Feb 2026

https://github.com/saigeethika05/global-connect

International Student Engagement Platform

data-analysis figma prototyping ui-design ux-design wireframes

Last synced: 04 Jul 2025

https://github.com/astropenguin/optimap

Optimized integrated intensity map method for spectral cubes

astronomy data-analysis data-science python python3 radio-astronomy spectral-cubes

Last synced: 09 Apr 2025

https://github.com/bhaskarbharati/ibm-datascience-hands-on-lab

A basic hands-on exercise using Jupyter Notebook, completed as part of the IBM course Tools for Data Science

data-analysis data-science data-visualization datawrangling eda machine-learning

Last synced: 23 Apr 2025

https://github.com/giog97/find_similar_tables_on_pubtables-1m

Find similar tables on the PubTables-1M dataset

data-analysis data-visualization datamining dm tables

Last synced: 09 Apr 2025

https://github.com/victorlcastro-dsa/pbl-datacamp

This repository features projects from DataCamp's Project-Based Learning (PBL) courses, showcasing practical applications of data analysis, machine learning, and visualization. Explore real-world datasets and interactive results that highlight the skills gained through hands-on learning.

data-analysis data-science data-visualization datacamp-projects hypothesis-testing machine-learning project-based-learning

Last synced: 30 Jun 2026

https://github.com/ankitmishralive/machinelearning

Continuously deepening my understanding of and expertise in machine learning through ongoing education and hands-on practice.

artificial-intelligence data-analysis data-cleaning data-gathering machine-learning machinel-learning-algorithms matplotlib numpy pandas python seaborn

Last synced: 22 Mar 2025

https://github.com/rachit1084/sql-practice-ankit-bansal

Personal SQL problem-solving practice based on Ankit Bansal's YouTube series, with logic-driven solutions for analyst prep.

analytics data-analysis data-analyst interview-preparation logical-reasoning postgresql sql sql-practice

Last synced: 04 Jul 2025

https://github.com/francois-lenne/eletric_vehicle_usa

The project is purely educational; the main goal is to use Fabric.

data-analysis data-engineering delta-lake fabric jupyter-notebook pyspark python spark

Last synced: 12 Apr 2026

https://github.com/faysalalmahmud/bd-med-professional-analysis

Analysis of healthcare professionals in Bangladesh through web scraping, data processing, and interactive visualization.

data-analysis data-visualization jupyter-notebook python scraper selenium selenium-webdriver tableau

Last synced: 04 Sep 2025

https://github.com/soajala/shopify-sales-analysis-powerbi

End-to-end Power BI dashboard project analyzing Shopify sales data with real-time metrics, DAX, and business insights.

business-intelligence data-analysis data-visualization dax interactive-dashboard powerbi sales-analysis shopify

Last synced: 05 Sep 2025

https://github.com/muhammed-fazal/student-success-and-early-intervention-analytics-system

Consolidates scattered student performance records into a unified data warehouse in SQL Server, builds interactive Power BI dashboards that visualize academic trends and student performance, and implements predictive analytics.

analysis analytics dashboard data data-analysis data-engineering data-science data-visualization database etl etl-pipeline power-bi powerbi python sql sql-server

Last synced: 29 May 2026

https://github.com/shridhar1504/loan-classification-datascience-project

This project uses machine learning algorithms to predict the classification of loan status. The dataset is loaded and transformed using SQL to produce a clean dataset with valid information.

classification data-analysis data-cleaning data-science data-visualization eda loan-prediction loan-status machine-learning predictive-modeling sql supervised-learning

Last synced: 09 Apr 2025

https://github.com/noeldevelops/stem-degrees-analysis-cpp

C++ Data Analysis, I/O - takes an external data file for processing, performs some statistical analysis, and displays the results in the console

cpp data-analysis

Last synced: 29 May 2026

https://github.com/wo0fle/sfrcp

The program used for a research study I conducted: "Comparison of Star Formation Rate in Spiral versus Elliptical Galaxies."

astronomy astropy data-analysis galaxy jupyter-notebook python research research-project

Last synced: 03 Apr 2025

https://github.com/ymorsi7/caliwageanalysis

California employment and wage analysis on data from the past decade.

data-analysis data-science ipynb jupyter-notebook

Last synced: 21 Jan 2026

https://github.com/angchekar28/valorant-gameplay-analysis

This project analyzes Valorant gameplay data to understand key factors affecting match outcomes. It compares various machine learning models to predict player performance, rank classification, and match success.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook machine-learning python

Last synced: 12 Apr 2026

https://github.com/wtmcgrew/sql-credit-risk-analysis

Credit Risk Analysis using SQL & Excel – Approval trends by FICO, DTI, PTI, LTV, and delinquencies.

case-study credit-risk data-analysis financial-analysis loan-applications portfolio-project sql sqlite underwriting

Last synced: 04 Jul 2025

https://github.com/sarthakagg29/sql-share-trading-analysis

Analysis of share trading transactions using SQL. Includes table setup, sample data, and a variety of queries to answer typical business questions about stocks and trading.

data-analysis dbeaver portfolio postgresql share-market sql

Last synced: 04 Jul 2025

https://github.com/camara94/data_analyse_series_temporelles

In this tutorial, we answer the following questions: 1. Read the Microsoft data using the **Pandas Data reader** package 2. Get the **maximum price** of the stock from **2017 to 2022** 3. What is the **date of the highest price** of the stock? 4. What is the **date of the lowest price** of the stock?

data-analysis data-analysis-python data-science data-structures-and-algorithms data-visualization serie series-forecasting

Last synced: 09 Apr 2025

https://github.com/smoeding/jmeterplugin-datasketches

A JMeter listener using DataSketches to estimate response time quantiles and histograms

data-analysis jmeter jmeter-listeners jmeter-plugin

Last synced: 06 Mar 2025

https://github.com/shrinidhi857/simpledataanalysisonstartups

The Indian startup ecosystem has experienced remarkable growth over the past decade, becoming a hotbed of innovation and entrepreneurship. In this data analysis, we segregate fields and look for new insights.

data-analysis data-science data-visualization indian-startups

Last synced: 17 Sep 2025

https://github.com/doughtnerd/pod-old

Read and write Excel data

data data-analysis excel poi-library workbook

Last synced: 21 Jan 2026

https://github.com/anudeepkaddala/bankds

This repository contains a Python-based solution for cleaning, matching, and formatting bank data. The primary goal is to match banks from two datasets based on their names and associate each bank with its respective asset size. The final output is a cleaned dataset with asset sizes in Indian-style currency format.

data-analysis data-science fuzzy-matching pandas python

Last synced: 12 Apr 2026

https://github.com/nsandoya/python_scrp_project

A tool made specifically for the Dipaso e-commerce website. You can extract data from it, analyze it, and see the frequency of keywords, brands, and categories, the price distribution, and other market trends, all in a set of friendly statistical tables and charts (exported from a Jupyter notebook) :)

beautifulsoup4 data data-analysis jupyter-notebook pandas python3

Last synced: 28 Apr 2026

https://github.com/fbarffmann/python-challenge

Automated financial and election data analysis using Python. Cleaned and transformed large CSV datasets, calculated key business metrics, and generated automated reports for stakeholders.

automation csv data-analysis data-cleaning election-analysis financial-analysis python reporting

Last synced: 24 Apr 2025

https://github.com/sumit0ubey/internship

This repository showcases the tasks and projects I completed during various internships. It includes work across diverse domains such as: Data Analysis: Exploratory data analysis, data visualization, and insights generation using Python and libraries like Pandas, Matplotlib, and Seaborn. Backend Development: Designing and implementing RESTful API

backend-development data-analysis python-developer

Last synced: 05 Sep 2025

https://github.com/itskshitija/lego-set-explorer

As a part of the Maven Analytics Lego challenge, I developed an interactive Power BI dashboard exploring the evolution of LEGO sets from 1970 to 2022.

data-analysis data-science data-visualization dataanalysis dataset powerbi powerbi-desktop powerbi-report

Last synced: 12 Jun 2025

https://github.com/fbarffmann/nosql-challenge

Analyzed 28,000+ UK restaurant records using MongoDB and PyMongo. Queried hygiene scores, location data, and customer ratings.

data-analysis data-cleaning database-analysis json mongodb nosql pymongo python restaurant-data

Last synced: 13 Apr 2026

https://github.com/avratanubiswas/fluorpenplugin

A MATLAB user interface for analysing OJIP curve datasets from the FluorPen instrument, serving as an additional plug-in for "quick categorical analysis".

data-analysis fluorpen ojip-curve

Last synced: 18 Mar 2026

https://github.com/fbarffmann/sqlalchemy-challenge

Built a Flask API with SQLAlchemy to analyze and visualize Hawaii climate data. Automated data extraction and developed database queries for temperature and precipitation insights.

api climate-data data-analysis data-visualization flask orm python sql sqlalchemy sqlite

Last synced: 13 Apr 2026

https://github.com/alinenog/desenvolve_gb_2022

Grupo Boticário's Desenvolve 2022 training program in the data field

data-analysis data-science googlesheet machine-learning numpy pandas python

Last synced: 13 Apr 2026

https://github.com/hazim-hf/data-science

This course covers basic data science principles, Python programming, and the concept of big data and its types. It explores algorithms, methods, and analyses in data science with practical Python examples. Additionally, it highlights current data technologies for storing and archiving.

data-analysis data-wrangling time-series

Last synced: 04 Jul 2025

https://github.com/nullthefirst/py-notebooks

Jupyter Notebooks holding Data Science projects

data-analysis data-science data-visualization datasets jupyter-notebooks python

Last synced: 26 Apr 2026

https://github.com/ironlegion88/media_bias

An end-to-end NLP pipeline to analyze ideological bias in online news media during elections. Uses sentiment analysis, topic modeling (LDA/NMF), and NER to quantify media framing.

data-analysis machine-learning media-bias nlp nltk political-science python scikit-learn sentiment-analysis spacy topic-modeling

Last synced: 13 Apr 2026

https://github.com/lexiortiz/advanced-data-analytics

Structured learning notes, code snippets, and key takeaways from the Google Advanced Data Analytics Professional Certificate. Serves as a personal reference for reinforcing concepts and as a resource for others on a similar learning journey.

data data-analysis data-engineering google python-3 sql

Last synced: 29 May 2026

https://github.com/darrenjolson/pba-analysis-app

Data analysis and visualization tool for professional bowling tournaments, predicting performance across different oil patterns and venues.

bowling data-analysis data-visualization flask pba predictive-analytics python reactjs sports-analytics

Last synced: 13 Apr 2026

https://github.com/nurulashraf/polynomial-regression-manufacturing

A Python project implementing polynomial regression to analyse and predict manufacturing-related data. Features include data preprocessing, model training, and visualisation of results. Ideal for exploring machine learning applications in manufacturing process optimisation.

data-analysis data-visualization machine-learning manufacturing polynomial-regression predictive-modeling process-optimization python regression-models scikit-learn

Last synced: 16 Apr 2026

https://github.com/analysisbyvivek/Road-Accident

Analyzes road accident patterns, exploring factors like lighting, weather, speed limits, time of day, and road conditions to uncover trends in severity and frequency.

data-analysis data-visualization eda jupyter-notebook kaggle tableau-public

Last synced: 29 Jan 2026

https://github.com/analysisbyvivek/Crime-data

Analyzes crime patterns across different areas, exploring factors such as crime type, weapon usage, demographic influences, and geographic distribution to uncover trends in frequency, correlations, and hotspots.

apache-superset data-analysis eda jupyter-notebook python

Last synced: 29 Jan 2026

https://github.com/borjamome/accidentes_madrid

Analysis of accidents in Madrid in SQL (2023)

accidentes-coche data-analysis madrid sql

Last synced: 17 Jan 2026

https://github.com/parthds02/e-commerce-data-analysis-with-python

This project focuses on analyzing an e-commerce dataset using Python. The goal is to derive meaningful insights through exploratory data analysis (EDA) and uncover trends and patterns that can drive business decisions.

data-analysis ecommerce exploratory-data-analysis jupyter-notebook pytho sales-analysis visualization

Last synced: 13 Jun 2025

https://github.com/amoghkori/working-with-apache-spark-mllib

Used Apache Spark MLlib to analyze a large car dataset, predict car selling prices, and gain insights into the car market.

amazon-web-services data-analysis data-visualization exploratory-data-analysis linear-regression machine-learning model-selection pyspark python random-forest sagemaker spark

Last synced: 13 Apr 2026

https://github.com/nature40/casestudies

Case studies for testing the functionality of database systems, sensors, etc

casestudies data-analysis data-visualization database

Last synced: 02 May 2026

https://github.com/ireneflorez/nypd-mvc

Analysis of NYPD Motor Vehicle Collisions

basemap data-analysis folium jupyter-notebook matplot pandas python

Last synced: 08 May 2026

https://github.com/hassanislam463/data-cleaning-and-modelling-top-5-categories-analysis-forage

This project involves cleaning, merging, and analyzing datasets to identify the top 5 performing categories based on aggregate popularity scores. It includes cleaned datasets, a final merged dataset, visualizations, and a presentation summarizing the tasks and results. Tools used: Microsoft Excel, Python, and PowerPoint.

data-analysis data-visualization microsoft-excel

Last synced: 07 Jan 2026

https://github.com/sco1/xbmini-py

Python Toolkit for the GCDC HAM

data-analysis data-visualization python python3

Last synced: 07 May 2025

https://github.com/ljadhav25/data-engineering-poc

This repository contains a beginner-level Data Engineering Proof of Concept (POC) project designed for practice. The objective is to provide hands-on experience with data engineering concepts, including data extraction, transformation, loading (ETL), and basic data analysis. This project is ideal for those looking to build foundational skills in da

data-analysis etl matplotlib numpy pandas python

Last synced: 13 Apr 2026

https://github.com/cezlul/analyse-ventes-immobilier

ML solution for Parisian real estate analysis: automatic classification of apartments vs. commercial premises (K-means, 91%) and price prediction (regression, R²=0.98) on 26K transactions. Values a €169M portfolio with data-driven strategic recommendations.

data-analysis jupyter-notebook machine-learning matplotlib numpy pandas python sklearn

Last synced: 13 Apr 2026

https://github.com/extwiii/datascience-jhu

Ask the right questions, manipulate data sets, and create visualizations to communicate results - Coursera

biostatistics data-analysis data-science linear-regression multivariate-regression r r-programming toolbox visualization

Last synced: 05 Jul 2025

https://github.com/jameswrigley/laph

A node-based data analysis program.

cpp data-analysis nodes qml

Last synced: 05 Jun 2026