An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/v-mayya/quantitative-analysis-data-dashboard

Quantitative survey data analysis using R

data data-analysis data-visualization flourish r

Last synced: 01 Apr 2025

https://github.com/ibotsh/predicting-cancer-diagnosis-using-life-status-variables---brfss-data-analysis

This project explores whether certain life status factors are associated with a cancer diagnosis (excluding skin cancer) using the Behavioral Risk Factor Surveillance System (BRFSS) 2021 dataset from the Centers for Disease Control and Prevention (CDC).

data-analysis r regression-models

Last synced: 18 Jun 2025

https://github.com/dhruvil-26/powerbi-projects

This repository contains Power BI projects showcasing data analysis and interactive dashboards. Each project includes detailed visualizations and insights on diverse topics such as loan analysis, sales performance, and customer behavior.

customer-behavior-analysis data data-analysis interactive-dashboards loan-analysis powerbi sales-performance visualization

Last synced: 04 Feb 2026

https://github.com/danielafishwickinacap/coderhouse_da

Data analyst Final Project files

data-analysis

Last synced: 18 Jan 2026

https://github.com/azaz9026/car_price_prediction_model

This repository contains a machine learning model designed to predict car prices based on various features. Using historical data on car attributes such as make, model, year, mileage, and other relevant factors, the model aims to provide accurate and reliable price estimates for used cars.

data-analysis data-engineering liner-regestion machine-learning modeling numpy pandas python3 rendering

Last synced: 09 Apr 2026

https://github.com/ajay1214/credit-card-transaction-dashboard

Credit Card weekly dashboard that provides real-time insights into key performance metrics and trends

data-analysis powerbi sql

Last synced: 04 Feb 2026

https://github.com/shridhar1504/tableau-visualization-viz.-project-

This repository contains visualization projects built with Tableau. The visualizations yield insights and strategies that can help a business grow and achieve higher profit margins and, in some cases, provide social value by helping to estimate the damage and intensity of calamities.

dashboards data-analysis data-science data-visualization exploratory-data-analysis tableau tableau-dashboards tableau-public tableau-workbooks visualization

Last synced: 04 Feb 2026

https://github.com/tomijuarez/lemmatisation

Lemmatisation fully implemented in Java.

algorithms data-analysis data-science java-8 lemmatization oop

Last synced: 08 Apr 2025

https://github.com/aldrinjenson/smart-qa

Query any structured data and find relations using natural language

data-analysis llm nlp sql

Last synced: 06 Apr 2025

https://github.com/brianrscode/delitos-cdmx

A simple page that shows statistics on crimes committed in CDMX (Mexico City)

analisis-de-datos data-analysis django pandas plotly python python3

Last synced: 18 Apr 2026

https://github.com/bryanfks-dev/klempoken-analysis

Analysis and forecasting model for Klempoken MSMEs

big-data-analytics data-analysis data-forecast data-visualization

Last synced: 01 Apr 2025

https://github.com/rahulsm20/insurance-data

A data analytics project dealing with risk assessment and its effects in health insurance.

data-analysis data-analytics machine-learning matplotlib numpy pandas python scikit-learn

Last synced: 12 Apr 2026

https://github.com/parth-jatav/super-store-analysis-project

The Super Store Analysis project leverages Python libraries such as pandas, matplotlib, and numpy to perform a comprehensive analysis of a retail store's data. This project includes data cleaning, visualization, and statistical analysis to identify key trends, optimize inventory, and enhance decision-making processes for improved business performance.

data-analysis matplotlib numpy pandas python super-store

Last synced: 12 Apr 2026

https://github.com/pranav016/exploratory-data-analysis-of-sp500-dataset

This is a data analysis that I performed on the S&P 500 dataset, answering a few questions through data visualization techniques.

data-analysis

Last synced: 30 Oct 2025

https://github.com/mostafa-ghorab/global-happiness-analysis

An analysis of global happiness rankings based on various factors like GDP, family support, health, and freedom from the World Happiness Report (2015-2017). This project provides data visualizations and statistical insights into how these factors influence happiness scores in different regions.

business-analysis data-analysis data-visualization matplotlib numpy pandas python seaborn

Last synced: 12 Apr 2026

https://github.com/dhanyasri20/credit-risk-prediction

Credit Risk Prediction using Python, SQL, and Flask. Trained ML models (Random Forest) to identify high-risk loan applicants with 86% accuracy, automated SQL reporting, and deployed a Flask web app for real-time predictions.

classification credit-risk data-analysis financial-data flask loan-prediction machine-learning python random-forest sql

Last synced: 28 Apr 2026

https://github.com/satvikpraveen/rsvp_case_study

A comprehensive IMDB dataset analysis using SQL. Includes database setup, advanced queries, and actionable insights. Organized with files for database creation, queries, and solutions. Features an Entity-Relationship Diagram (ERD), executive summary, and SQL scripts. Perfect for SQL workflows and business intelligence in the film industry.

aggregate-functions business-intelligence common-table-expressions data-analysis data-driven-decisions data-querying database-design entity-relationship-diagram imdb-dataset relational-database sql subqueries-and-joins

Last synced: 11 Jan 2026

https://github.com/yash-3-bit/online-sales-analysis

Project: merging the datasets for different months and performing data cleaning, analysis, and visualization

data-analysis data-visualization pandas-library

Last synced: 27 Mar 2025

https://github.com/quocduyenanhnguyen/roi-modeling-and-analysis-of-sports-dataset

This project contains my ROI model for retirement savings and a PowerPoint presentation of it, as well as my data analysis and visualization of a sports ticket sales dataset, concluded with a group-written PDF report

data-analysis data-visualization microsoft-excel rate-of-return-modeling sports-ticket-sales-dataset

Last synced: 08 Feb 2026

https://github.com/noodleslove/house-of-representatives-analysis-ii

In this project, we want to estimate if a transaction will have capital gains exceeding $200 using the provided dataset.

coursework data-analysis data-science eda feature-engineering pandas python3

Last synced: 12 Apr 2026

https://github.com/hemangsharma/breast-cancer-patient-dashboard

This interactive Streamlit dashboard visualizes insights from the SEER Breast Cancer Dataset (2006-2010)

data-analysis streamlit streamlit-dashboard streamlit-webapp

Last synced: 05 May 2026

https://github.com/dulajkavinda/pandas-exploring-data-ml

🐼 Exploring data with pandas library.

data-analysis machine-learning pandas python

Last synced: 09 May 2026

https://github.com/tanaybhadula/twitter-trends-dashboard

An interactive dashboard that visualizes data on current Twitter trends by country and globally. It collects data from over 60 countries using the Python Tweepy library, processes it, and visualizes it as bar charts and pie charts using the Plotly Dash framework.

dash dashboard data-analysis data-visualization plotly python trends twitter

Last synced: 31 May 2026

https://github.com/theveryhim/massive-text-processing-1

Cleaning, processing, and analysis of a papers dataset using the PySpark (RDD) framework

big-data data-analysis frequent-itemsets massive-datasets pyspark text-preprocessing

Last synced: 03 Jul 2025

https://github.com/theveryhim/frequent-item-sets-and-lsh

An exercise in finding frequent item sets and similar items using the PySpark framework

big-data data-analysis frequent-itemset-mining locality-sensitive-hashing pyspark text-processing

Last synced: 03 Jul 2025

https://github.com/mpoojithavigneswari/bangalore-house-price-prediction

This project involves creating a website that predicts Bangalore house prices with 94.65% accuracy using a machine learning algorithm.

data-analysis data-science flask-server machine-learning matplotlib numpy pandas python scikit-learn seaborn

Last synced: 12 Apr 2026

https://github.com/skivhisink/econometricnsu

A one-semester econometrics course for first-year master's students at the Faculty of Economics, Novosibirsk State University (NSU)

data-analysis econometrics economics education nsu r

Last synced: 09 Apr 2025

https://github.com/quantumudit/groceries-basket-analysis

This project performs market basket analysis using Power BI and Python to reveal associations between grocery items. It involves transforming raw transaction data into a processed dataset, creating interactive Power BI reports, and generating key insights through Python, enabling data-driven decision-making.

data-analysis data-visualization pandas powerbi python

Last synced: 12 Apr 2026

https://github.com/bpkaur/exploring-the-evolution-of-linux

This project explores the evolution of the Linux kernel by identifying the top 10 contributors and visualizing commits over the years.

data-analysis data-science datacamp ipynb-jupyter-notebook python3

Last synced: 21 Feb 2026

https://github.com/borjamome/soho_cholera

Cholera deaths in the Soho District (London)

data-analysis data-visualization london r

Last synced: 04 Sep 2025

https://github.com/grooviter/tablesaw

Java dataframe and visualization library

data-analysis dataframe java visualization

Last synced: 28 Mar 2025

https://github.com/vasulab/knightshock

Shock tube experiment planning and data analysis package.

cantera data-analysis matplotlib numpy shock-tube

Last synced: 18 Jul 2025

https://github.com/sreekar0101/electric-vehicle-market-growth-and-incentive-impact-analysis-dashboard

This project involves the development of a comprehensive Tableau dashboard to analyze the growth and market dynamics of electric vehicles (EVs). The dashboard reveals key insights, including a 20% increase in EV adoption over five years, the dominance of Battery Electric Vehicles (BEVs), which make up 60% of the market

data-analysis data-visualization tableau-desktop

Last synced: 07 Jan 2026

https://github.com/kernix13/github-readme-seo-analysis

A Jupyter Notebook analysis of GitHub README and repo SEO to determine what makes a repo rank in the SERPs

accessibility data-analysis readme seo seo-analysis

Last synced: 29 May 2026

https://github.com/kseniatyschuk/excel-data-matcher

Compare and match Excel files via a simple Python GUI

automation data-analysis etl excel gui pandas python3 tkinter

Last synced: 23 Apr 2025

https://github.com/cassandrajm/reddit-dashboard

INTERACTIVE DASHBOARD: Analyzing Political Discourse on Reddit: A Multi-Faceted NLP Approach to Toxicity, Bias, and Political Stance

capstone data data-analysis data-science politics python reddit

Last synced: 09 Apr 2025

https://github.com/nilayhangarge/data-analysis-with-python

This repository provides a practical introduction to data acquisition and analysis using Pandas. It covers loading datasets, exploring data, manipulating data, and gaining insights through statistical summaries. Ideal for beginners, it offers code examples and explanations to enhance your data manipulation skills using Pandas for Python.

data-acquisition data-analysis data-analytics data-binning data-cleaning data-engineering data-fundamentals data-insights data-integration data-preprocessing data-science data-wrangling numpy pandas python

Last synced: 12 Apr 2026

https://github.com/noorulhudaajmal/customer-segmentation-analysis

Customer segmentation and analysis of purchasing behaviour

cluster-analysis customer-segmentation data-analysis

Last synced: 07 Oct 2025

https://github.com/chinmayee4/vrinda_store_data_analysis

Analyzed Data By Creating Interactive Dashboard Using MS Excel

data-analysis data-cleaning data-visualization excel-dashboard pivot-tables power-query

Last synced: 07 Jan 2026

https://github.com/jcm-ai/quantium-data-analytics-virtual-experience-program

This repository contains my proposed solutions to the assignments that I was required to complete as part of the Quantium Data Analytics Virtual Experience Program. 📊📈📉👨‍💻

commercial-thinking communication-skills data-analysis data-validation data-visualisation data-wrangling jupyter-notebook matplotlib-pyplot numpy-library pandas-python presentation-skills programming python3 scipy-stats seaborn statistical-testing

Last synced: 16 May 2026

https://github.com/pngo1997/life-expectancy-logistic-regression

Life expectancy analysis project using logistic regression.

data-analysis logistic-regression r rmarkdown

Last synced: 10 Jun 2026

https://github.com/saigeethika05/global-connect

International Student Engagement Platform

data-analysis figma prototyping ui-design ux-design wireframes

Last synced: 04 Jul 2025

https://github.com/giog97/find_similar_tables_on_pubtables-1m

Find similar tables on the PubTables-1M dataset

data-analysis data-visualization datamining dm tables

Last synced: 09 Apr 2025

https://github.com/victorlcastro-dsa/pbl-datacamp

This repository features projects from DataCamp's Project-Based Learning (PBL) courses, showcasing practical applications of data analysis, machine learning, and visualization. Explore real-world datasets and interactive results that highlight the skills gained through hands-on learning.

data-analysis data-science data-visualization datacamp-projects hypothesis-testing machine-learning project-based-learning

Last synced: 29 Nov 2025

https://github.com/francois-lenne/eletric_vehicle_usa

This project is purely educational; the main goal is to use Fabric.

data-analysis data-engineering delta-lake fabric jupyter-notebook pyspark python spark

Last synced: 12 Apr 2026

https://github.com/shridhar1504/loan-classification-datascience-project

This project uses machine learning algorithms to predict the classification of loan status. The dataset is loaded and transformed using SQL to produce a clean dataset containing valid information.

classification data-analysis data-cleaning data-science data-visualization eda loan-prediction loan-status machine-learning predictive-modeling sql supervised-learning

Last synced: 09 Apr 2025

https://github.com/angchekar28/valorant-gameplay-analysis

This project analyzes Valorant gameplay data to understand key factors affecting match outcomes. It compares various machine learning models to predict player performance, rank classification, and match success.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook machine-learning python

Last synced: 12 Apr 2026

https://github.com/xza85hrf/excel-comparison-app

Excel Comparison Application is a Python-based tool that compares two Excel files and generates a new Excel file with the differences. It's primarily designed to help in database updating by identifying new clients. The app also has a graphical user interface for easier use and logs operations for potential troubleshooting.

case-sensitive-comparison data-analysis data-difference database-comparison database-updates excel-comparison file-merging file-processing gui-application new-client-detection python

Last synced: 25 Mar 2025

https://github.com/fbarffmann/python-challenge

Automated financial and election data analysis using Python. Cleaned and transformed large CSV datasets, calculated key business metrics, and generated automated reports for stakeholders.

automation csv data-analysis data-cleaning election-analysis financial-analysis python reporting

Last synced: 24 Apr 2025

https://github.com/fbarffmann/sqlalchemy-challenge

Built a Flask API with SQLAlchemy to analyze and visualize Hawaii climate data. Automated data extraction and developed database queries for temperature and precipitation insights.

api climate-data data-analysis data-visualization flask orm python sql sqlalchemy sqlite

Last synced: 13 Apr 2026

https://github.com/darrenjolson/pba-analysis-app

Data analysis and visualization tool for professional bowling tournaments, predicting performance across different oil patterns and venues.

bowling data-analysis data-visualization flask pba predictive-analytics python reactjs sports-analytics

Last synced: 13 Apr 2026

https://github.com/aimin-nur/visualisasi_bikestore

Data Analyst - Dashboard Bike Store

data-analysis sql visualization

Last synced: 29 Jan 2026

https://github.com/analysisbyvivek/Road-Accident

Analyzes road accident patterns, exploring factors like lighting, weather, speed limits, time of day, and road conditions to uncover trends in severity and frequency.

data-analysis data-visualization eda jupyter-notebook kaggle tableau-public

Last synced: 29 Jan 2026

https://github.com/ireneflorez/nypd-mvc

Analysis of NYPD Motor Vehicle Collisions

basemap data-analysis folium jupyter-notebook matplot pandas python

Last synced: 08 May 2026

https://github.com/jameswrigley/laph

A node-based data analysis program.

cpp data-analysis nodes qml

Last synced: 05 Jun 2026

https://github.com/athari22/investigating-netflix-movies-and-guest-stars-in-the-office

Apply basic Python skills from Introduction to Python and Intermediate Python by processing and visualizing film and television data.

data-analysis data-science data-visualization loop loops matplotlib matplotlib-pyplot netflix numpy office pandas python

Last synced: 11 Apr 2026

https://github.com/mr-chang95/udacity_movie_project

Movie Data Analysis and Visualization Project for Udacity's Data Analyst Program. Using Python in Jupyter Notebook.

data-analysis data-visualization jupyter-notebook movie python

Last synced: 13 Apr 2026

https://github.com/pinedah/sleep-data-analysis-exercise

Analysis of a medical dataset on sleep, exploring duration, quality, and related factors. Includes data cleaning, EDA, and visualizations with Python (pandas, numpy, matplotlib, seaborn, scipy).

data-analysis data-science escom numpy pandas python school-project scipy

Last synced: 13 Apr 2026

https://github.com/abhisek-13/fake_news_classifier

The Fake News Classifier is a TensorFlow-based machine learning project that detects and classifies fake news with 97% accuracy. The repository includes a single Python file with complete code for building and training the model, which you can use to create and deploy your own model.

colab-notebook data-analysis data-engineering deep-learning eda kaggle keras machine-learning nlp pandas python tensorflow

Last synced: 13 Apr 2026

https://github.com/bala-1409/tableau-visualization-viz.-project

This repository contains visualization projects built with Tableau. The visualizations yield insights and strategies that can help a business grow and achieve higher profit margins and, in some cases, provide social value by helping to estimate the damage and intensity caused by calamities.

dashboard data-analysis data-science data-visualization exploratory-data-analysis tableau tableau-dashboards tableau-public visualization

Last synced: 04 Feb 2026

https://github.com/lijesh010/covid-19_global_analytics_power_bi_project

This repository is a data visualization project that offers an in-depth analysis of the Covid-19 pandemic using Microsoft Power BI. This interactive dashboard provides valuable insights into key metrics related to Covid-19 cases, deaths, recoveries, and more, helping users understand the global impact of the pandemic.

dashboard data-analysis data-visualization powerbi report

Last synced: 08 Jan 2026

https://github.com/hemangsharma/streamingcontentanalyzer

This Streamlit application provides an interactive dashboard for analyzing streaming content data. It allows users to explore movie and TV show ratings, distributions, temporal trends, and genre breakdowns through various visualizations and filters.

dashboard data-analysis data-science data-visualization python streamlit-dashboard streamlit-webapp

Last synced: 02 Apr 2025

https://github.com/mishaa931/amazon-sales-dashboard-power-bi

This project features a dynamic Power BI dashboard built on dummy Amazon sales data. It visualizes key business metrics such as revenue trends, top-selling categories, discount impact, and geographic performance. The dashboard is designed to help stakeholders make data-driven decisions through clear, interactive visuals.

data-analysis data-quality data-visualization microsoftpowerbi

Last synced: 05 Feb 2026

https://github.com/joaquinmoron/airbnb-eda-python

Airbnb EDA: cleaning, exploration, and visualization in Python (pandas, matplotlib, seaborn).

airbnb data-analysis eda matplotlib pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/marianamartiyns/inep-educationperfomance

Data collection, processing, exploratory analysis, and predictive modeling of school performance rates using datasets from INEP.

data-analysis data-cleaning data-science inep predictive-modeling pyhton web-scraping

Last synced: 16 Mar 2025

https://github.com/marina-gal/elderly-care-ranking

Data analysis and scoring model for elderly care homes, including data cleaning, transformation, 0–100 scoring, and ranking across multiple quality dimensions.

data-analysis excel ranking

Last synced: 30 May 2026

https://github.com/luminati-io/Indeed-dataset-samples

A sample dataset of over 1000 Indeed job listings, extracted using the Bright Data API, ideal for market analysis and growth.

api data-analysis datasets indeed jobs web-scraping

Last synced: 09 Apr 2025

https://github.com/luminati-io/Target-dataset-samples

A sample dataset of over 1000 Target products, extracted using the Bright Data API, ideal for brand reputation, tracking inventory, and optimizing prices.

api data-analysis data-mining datasets target web-scraper web-scraping

Last synced: 09 Apr 2025

https://github.com/chaganti-reddy/weather-prediction-australia

Creating a fully-automated system that can use today's weather data for a given location to predict whether it will rain at the location tomorrow.

data-analysis logistic-regression machine-learning prediction-model python3

Last synced: 13 Apr 2026

https://github.com/leandrocollares/nyc-film-permits

NYC film permits: an exploratory data analysis

data-analysis data-visualization pandas plotly

Last synced: 05 Jul 2025

https://github.com/saifalibaig/covid-19-infection-rate-analysis-using-python

Analysis of the Covid-19 infection rate and the World Happiness Report to identify whether there is any relationship between infection rate and happiness

data-analysis data-visualization jupyter-notebook numpy pandas python3 sns

Last synced: 18 Apr 2026

https://github.com/shellynagar27/good-cabs-data-analysis-project

This project is part of CodeBasics Challenge #13, where the goal was to provide actionable insights to the Chief of Operations at Goodcabs, a cab service provider in tier-2 cities of India. The project focused on analyzing key metrics like trip volume, repeat passenger rate, and passenger satisfaction.

critical-thinking data-analysis data-visualization excel exploratory-data-analysis power-bi presentation problem-solving sql storytelling

Last synced: 25 Jan 2026

https://github.com/shellynagar27/candy-market-share-analysis

Candy Market Share Analysis explores confectionery sales data using Power BI, Python, and Power Query. It uncovers key market trends, top-selling candies, manufacturer performance, and packaging preferences to support data-driven decision-making for industry researchers.

critical-thinking data-analysis data-visualization exploratory-data-analysis powerbi powerquery problem-solving sales-analysis

Last synced: 03 Feb 2026