An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/ironlegion88/media_bias

An end-to-end NLP pipeline to analyze ideological bias in online news media during elections. Uses sentiment analysis, topic modeling (LDA/NMF), and NER to quantify media framing.

data-analysis machine-learning media-bias nlp nltk political-science python scikit-learn sentiment-analysis spacy topic-modeling

Last synced: 13 Apr 2026

https://github.com/lexiortiz/advanced-data-analytics

Structured learning notes, code snippets, and key takeaways from the Google Advanced Data Analytics Professional Certificate. Serves as a personal reference for reinforcing concepts and as a resource for others on a similar learning journey.

data data-analysis data-engineering google python-3 sql

Last synced: 29 May 2026

https://github.com/nurulashraf/polynomial-regression-manufacturing

A Python project implementing polynomial regression to analyse and predict manufacturing-related data. Features include data preprocessing, model training, and visualisation of results. Ideal for exploring machine learning applications in manufacturing process optimisation.

data-analysis data-visualization machine-learning manufacturing polynomial-regression predictive-modeling process-optimization python regression-models scikit-learn

Last synced: 16 Apr 2026

https://github.com/aimin-nur/visualisasi_bikestore

Data Analyst - Dashboard Bike Store

data-analysis sql visualization

Last synced: 29 Jan 2026

https://github.com/analysisbyvivek/Crime-data

Analyzes crime patterns across different areas, exploring factors such as crime type, weapon usage, demographic influences, and geographic distribution to uncover trends in frequency, correlations, and hotspots.

apache-superset data-analysis eda jupyter-notebook python

Last synced: 29 Jan 2026

https://github.com/borjamome/accidentes_madrid

Analysis of Accidents in Madrid in SQL (2023)

accidentes-coche data-analysis madrid sql

Last synced: 17 Jan 2026

https://github.com/amoghkori/working-with-apache-spark-mllib

Uses Apache Spark MLlib to analyze a large car dataset, predict car selling prices, and gain insights into the car market.

amazon-web-services data-analysis data-visualization exploratory-data-analysis linear-regression machine-learning model-selection pyspark python random-forest sagemaker spark

Last synced: 13 Apr 2026

https://github.com/nature40/casestudies

Case studies for testing the functionality of database systems, sensors, etc.

casestudies data-analysis data-visualization database

Last synced: 02 May 2026

https://github.com/ireneflorez/nypd-mvc

Analysis of NYPD Motor Vehicle Collisions

basemap data-analysis folium jupyter-notebook matplot pandas python

Last synced: 08 May 2026

https://github.com/zen204/renewable-energy-usage-v-electricity-access

Interactive data visualization project created for COSI 116A: Introduction to Information Visualization at Brandeis University (Fall 2024). The project showcases data-driven insights using advanced visualization techniques and user interactivity. Hosted on GitHub Pages.

d3js data-analysis data-visualization electricity github-pages html-css-javascript information-visualization interactive python renewable-energy tableau web-development

Last synced: 08 Feb 2026

https://github.com/himanshubhosale25/ai-insightful-quiz-analytics

This project analyzes student quiz performance data, providing visualizations and AI-generated feedback. It uses FastAPI for the backend, React for the frontend, and OpenAI LLMs to deliver personalized insights and actionable recommendations for students.

data-analysis fastapi openai-api react student-performance

Last synced: 11 Mar 2025

https://github.com/ljadhav25/data-engineering-poc

This repository contains a beginner-level Data Engineering Proof of Concept (POC) project designed for practice. The objective is to provide hands-on experience with data engineering concepts, including data extraction, transformation, loading (ETL), and basic data analysis. This project is ideal for those looking to build foundational skills in da

data-analysis etl matplotlib numpy pandas python

Last synced: 13 Apr 2026

https://github.com/cezlul/analyse-ventes-immobilier

ML solution for Parisian real estate analysis: automatic classification of apartments vs. commercial premises (K-means, 91%) and price prediction (regression, R²=0.98) on 26K transactions. Values a €169M portfolio with data-driven strategic recommendations.

data-analysis jupyter-notebook machine-learning matplotlib numpy pandas python sklearn

Last synced: 13 Apr 2026

https://github.com/spacebakery/variance-in-weather-project

Statistics for Data Analysis | Variance and Standard Deviation

data-analysis python standard-deviation statistics variance

Last synced: 05 Jul 2025

https://github.com/crazy-dot/instagram_user_analytics

Analysis of Popular Social Media Network - Instagram

data-analysis instagram-analytics project-repository trainity

Last synced: 07 Jan 2026

https://github.com/athari22/investigating-netflix-movies-and-guest-stars-in-the-office

Applies basic Python skills from Introduction to Python and Intermediate Python by processing and visualizing film and television data.

data-analysis data-science data-visualization loop loops matplotlib matplotlib-pyplot netflix numpy office pandas python

Last synced: 11 Apr 2026

https://github.com/abhisek-13/fake_news_classifier

The Fake News Classifier is a TensorFlow-based machine learning project that detects and classifies fake news with 97% accuracy. The repository includes a single Python file with complete code for building and training the model, which you can use to create and deploy your own model.

colab-notebook data-analysis data-engineering deep-learning eda kaggle keras machine-learning nlp pandas python tensorflow

Last synced: 13 Apr 2026

https://github.com/bala-1409/tableau-visualization-viz.-project

This repository contains visualization projects built with Tableau. The visualizations yield insights and strategies that can help a business grow its profit margins and, in some cases, provide social value by estimating the damage and intensity of calamities.

dashboard data-analysis data-science data-visualization exploratory-data-analysis tableau tableau-dashboards tableau-public visualization

Last synced: 04 Feb 2026

https://github.com/ascender1729/leetcode_scraper

Extract topic tags from LeetCode problems to streamline interview preparation.

beautifulsoup coding-interview data-analysis graphql leetcode python scraper web-scraping

Last synced: 20 Jun 2026

https://github.com/busradeveci/student-performance-prediction

A machine learning project to predict student exam performance based on academic, social, and personal features. Built with Python and scikit-learn.

data-analysis kaggle linear-regression machine-learning predictive-modeling python scikit-learn student-performance

Last synced: 25 Apr 2025

https://github.com/kittonn/data-analysis-freecodecamp

freecodecamp - data analysis projects.

data-analysis freecodecamp

Last synced: 05 Apr 2025

https://github.com/anthonytlei/graphsql

Lightweight SQL-to-GraphQL connector for querying GraphQL endpoints using SQL syntax.

connector data-analysis dbapi graphql graphsql python sql sqlalchemy superset

Last synced: 09 Apr 2026

https://github.com/hemangsharma/streamingcontentanalyzer

This Streamlit application provides an interactive dashboard for analyzing streaming content data. It allows users to explore movie and TV show ratings, distributions, temporal trends, and genre breakdowns through various visualizations and filters.

dashboard data-analysis data-science data-visualization python streamlit-dashboard streamlit-webapp

Last synced: 02 Apr 2025

https://github.com/pkjjoshi/behind-the-menu-uncovering-insights-from-restaurant-data

Discover hidden patterns in dining data, from popular cuisine pairings to geographic restaurant clusters.

data-analysis data-visualization insights jupyter-notebook pandas python restaurant-data

Last synced: 05 Jul 2025

https://github.com/mishaa931/amazon-sales-dashboard-power-bi

This project features a dynamic Power BI dashboard built on dummy Amazon sales data. It visualizes key business metrics such as revenue trends, top-selling categories, discount impact, and geographic performance. The dashboard is designed to help stakeholders make data-driven decisions through clear, interactive visuals.

data-analysis data-quality data-visualization microsoftpowerbi

Last synced: 05 Feb 2026

https://github.com/joaquinmoron/airbnb-eda-python

Airbnb EDA: cleaning, exploration, and visualization in Python (pandas, matplotlib, seaborn).

airbnb data-analysis eda matplotlib pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/manojrathod0777/loan-prediction

Predict loan approval status using machine learning techniques. This project demonstrates data preprocessing, feature engineering, model training, and evaluation, along with an interactive Streamlit app for real-time predictions. Ideal for financial decision-making.

classification-models data-analysis data-science financial-analytics jupyter-notebook loan-prediction machine-learning predictive-modeling python streamlit-app

Last synced: 13 Apr 2026

https://github.com/marianamartiyns/rfm-cluster-analysis

Customer behavior and sales analysis, including data cleaning, RFM calculation, churn analysis and customer clustering.

cluster-analysis data-analysis data-cleaning data-visualization pyhton

Last synced: 16 Mar 2025

https://github.com/marina-gal/elderly-care-ranking

Data analysis and scoring model for elderly care homes, including data cleaning, transformation, 0–100 scoring, and ranking across multiple quality dimensions.

data-analysis excel ranking

Last synced: 30 May 2026

https://github.com/luminati-io/Walmart-dataset-samples

A sample dataset of over 1000 Walmart products, extracted using the Bright Data API, ideal for consumer market insights and competitor analysis.

api data-analysis dataset walmart walmart-scraper web-scraping

Last synced: 09 Apr 2025

https://github.com/luminati-io/Target-dataset-samples

A sample dataset of over 1000 Target products, extracted using the Bright Data API, ideal for brand reputation, tracking inventory, and optimizing prices.

api data-analysis data-mining datasets target web-scraper web-scraping

Last synced: 09 Apr 2025

https://github.com/chaganti-reddy/weather-prediction-australia

Creating a fully-automated system that can use today's weather data for a given location to predict whether it will rain at the location tomorrow.

data-analysis logistic-regression machine-learning prediction-model python3

Last synced: 13 Apr 2026

https://github.com/deliprofesor/virtual-reality-in-education-impact-analysis-and-insights

This project examines the impact of Virtual Reality (VR) on education, focusing on its effects on student engagement, learning outcomes, and creativity. It uses data analysis techniques like descriptive statistics, correlation analysis, and clustering to assess VR's effectiveness in enhancing learning.

clustering data data-analysis data-science data-visualization exploratory-data-analysis hypothesis-testing machine-learning python regression-analysis virtual-reality

Last synced: 14 Jun 2025

https://github.com/khushi-sabarad/8-week-sql-challenge

Case study solutions for the #8WeekSQLChallenge by Danny Ma

8weeksqlchallenge case-study data-analysis mysql sql

Last synced: 06 Sep 2025

https://github.com/saifalibaig/covid-19-infection-rate-analysis-using-python

Analysis of the COVID-19 infection rate and the World Happiness Report to identify whether there is any relationship between infection rate and happiness.

data-analysis data-visualization jupyter-notebook numpy pandas python3 sns

Last synced: 18 Apr 2026

https://github.com/hari7261/playwithdata-python

This is one of the repositories where I have put a lot of data science and machine learning questions and their solutions. I hope you will find something better here than on some other platforms. Thank you, and happy exploring.

data-analysis data-science data-science-learning machienlearning matplotlib matplotlib-python ml numpy numpy-arrays numpy-library pandas pandas-dataframe pandas-library python python-script sklearn

Last synced: 13 Apr 2026

https://github.com/tj2904/lfb-callout-analysis

An investigation into London Fire Brigade's callout data.

data-analysis decsion-tree kmeans lfb-incidents london-fire-brigade pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/shellynagar27/business-insights-360-project

A comprehensive dashboard that provides a better understanding of the business's market standing, key focus areas for optimization, underperforming customers, and year-wise financial insights, aiding inventory planning and performance tracking. It can also be used to answer any number of "why" questions as situations arise.

dashboard data-analysis data-visualization dax-languague dax-studio excel performance-optimization power-bi reporting sql storage-manager

Last synced: 27 Jan 2026

https://github.com/mehedi-hassan81/mastercourse

Data analysis project analysing renewable energy production across 212 countries, visualizing trends with Tableau. Highlights China's dominance (2,894 TWh) and Paraguay's 100% renewable share.

data-analysis pandas python renewable-energy selenium tableau-dashboards tableau-public web-scraping

Last synced: 08 May 2026

https://github.com/rupashi03/fitbit-user-eda-case-study

Performed Exploratory Data Analysis (EDA) on Fitbit users' data to uncover trends in activity and health metrics.

business-analysis case-study consumer-insights data-analysis exploratory-data-analysis health-data r user-behavior-analytics

Last synced: 25 Mar 2025

https://github.com/wadeChriestenson/Main_Application

A Django application to host my personal resume.

data-analysis data-visualization django plotly python ui-design

Last synced: 11 Mar 2025

https://github.com/allanotieno254/powerbi-dax-filter-context

This repository contains a Power BI project that explores DAX Filter Context, a crucial concept in DAX calculations. The project focuses on Bank Loan Analysis, demonstrating how different filter contexts affect DAX formulas.

business-intelligence data data-analysis dax dax-functions powerbi powerbi-visuals visualization

Last synced: 08 Jan 2026

https://github.com/pawlo77/smarty

End-to-End Data Science tool

data-analysis data-processing pandas pipeline

Last synced: 08 May 2026

https://github.com/rishitabansal9/adult-census-income-prediction

A project for data analysis and income prediction using a random forest classifier, with 91% accuracy.

data data-analysis data-science feature-engineering random-forest-classifier

Last synced: 25 Mar 2025

https://github.com/diligencefrozen/dcinside-data

Analyzing the Dcinside Frozen Gallery Dataset. #디시

data-analysis dataset

Last synced: 30 May 2026

https://github.com/saro0307/exploratory-data-analysis-terrorism

Phase 1 of a data science project (program) performing exploratory data analysis on terrorism using Python on Google Colab, for the Coderscave internship, Sept 2023.

colaboratory data-analysis datascience machine-learning numpy pandas python seaborn skit-learn visualization

Last synced: 13 Apr 2026

https://github.com/tatilimongi/first_python_project

This repository contains a case study of spreadsheet automation in Python for analyzing car sales by manufacturer over the years.

data-analysis email-sending file-manipulation graphical-visualization spreadsheet-automation

Last synced: 26 Mar 2025

https://github.com/jooapa/bytebrother

Byte Brother is watching YOU

data data-analysis security

Last synced: 26 Jan 2026

https://github.com/weisswuerste/polars-eurovision-analytics

Analytics example using both the Pandas and Polars libraries

data-analysis data-analytics pandas polars python python-3 python3

Last synced: 08 May 2026

https://github.com/jkaardal/csvnav

A memory-efficient Python class for navigating large CSV/text files.

csv data-analysis data-science machine-learning memory-management

Last synced: 14 Jan 2026

https://github.com/marielachirinosr/pandas-weather-project

Pandas Weather Data. Explore straightforward Python scripts for weather information analysis.

data-analysis pandas python

Last synced: 29 Apr 2026

https://github.com/isaqueiros/newspapersales-predictions-linearregression_and_regularisation

This notebook studies newspaper sales at a local stand, with the aim of predicting sales performance from the available features. Four sklearn models are applied: Linear Regression, Lasso Regression, Ridge Regression, and Elastic Net Regression.

data-analysis data-science linear-regression machine-learning python regularization-methods sklearn-library sklearn-linear-regression

Last synced: 02 May 2026

https://github.com/bertiewooster/ipywidgets

Interactive data visualizations in a Jupyter Notebook, following the tutorial at https://python.plainenglish.io/interactive-visualizations-with-pandas-seaborn-and-ipywidgets-173e5d7d6a5e

data-analysis data-science data-visualization ipython-notebook ipywidgets juypter-notebook python

Last synced: 06 Mar 2026

https://github.com/1adityakadam/carnegie_classifications_website

A comprehensive data analytics platform analyzing 50+ years of U.S. higher education trends through interactive visualizations and historical institution tracking.

css data-analysis html javascript python ui-design web-development

Last synced: 13 Apr 2026

https://github.com/tiagocavalcante/nesfit

NES 2024 Practical and Research Work - Group 2

data-analysis fitness

Last synced: 09 Jun 2026

https://github.com/grandechowhiskey/fcc-data_analysis-projects

A collection of projects completed as part of the FreeCodeCamp "Data Analysis with Python" certification. These projects cover statistical calculations, data visualization, and trend analysis using real-world datasets.

data-analysis data-visualization matplotlib pandas python3 scikit-learn seaborn

Last synced: 01 May 2026

https://github.com/balajimohan18/milk-production-time-series-forecasting-datascience-project

This project uses time series forecasting to predict future milk production. The data is monthly milk production from January 1962 to December 1975. An ARIMA (autoregressive integrated moving average) model is used to forecast milk production, and the model is evaluated using various metrics.

acf adf data-analysis data-cleaning data-science data-visualization eda exploratory-data-analysis machine-learning pacf seasonality time-series trends

Last synced: 30 May 2026

https://github.com/giorgossideris/athens_weather_analysis

Analyse the data of Athens' weather.

data-analysis visualization

Last synced: 16 Mar 2025

https://github.com/srinibas-masanta/electric-vehicle-analysis-dashboard

This repository features an interactive Tableau dashboard that visualizes electric vehicle (EV) adoption trends in the U.S. 🚗⚡ Explore EV growth, top manufacturers, regional distribution, and the impact of incentives—all in one dynamic view. 📊 Use filters to dive deeper into the data and uncover key insights! 🚀

dashboards data-analysis data-visualization tableau

Last synced: 15 Jan 2026

https://github.com/srinibas-masanta/zomato-customer-and-restaurant-analysis

This repository contains a comprehensive analysis of Zomato's platform, focusing on various aspects of customer behavior, restaurant performance, and market trends. The analysis leverages data-driven insights to answer key questions that can guide business strategies, enhance customer satisfaction, and optimize operational efficiency.

business-analytics data-analysis data-science data-visualization

Last synced: 02 Apr 2025

https://github.com/devexpress-examples/web-forms-pivot-grid-custom-summary-values

This example demonstrates how to determine the value type when you calculate custom summary values in Pivot Grid for Web Forms.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 06 Jul 2025

https://github.com/neuralsignal/loris

Loris: Database and Analysis application for a Drosophila Lab (or any lab)

data-analysis data-structures database datajoint flask neuroscience

Last synced: 12 Mar 2026

https://github.com/abhijeet107/final-project

Final project summation INTERNSHIP PROJECTS (2 WEEKS)

data-analysis data-cleaning-and-preprocessing excel mysql-database python tableau-public

Last synced: 23 Feb 2026

https://github.com/ankitpoddar07/sqlpizzas-saleproject

🍕 Pizza Sales Analysis with SQL

data-analysis database excel mysql powerbi ppt python

Last synced: 09 May 2026

https://github.com/nmelgar/healthy_child_dataviz

Data visualization project to analyze what a healthy child is.

analysis data data-analysis data-science data-visualization dataviz research tableau visualization

Last synced: 23 Feb 2026

https://github.com/satyam4229/prediction-of-cement-compressive-strength

A regression-based model that predicts the compressive strength of a given cement for various mixtures of its components.

data-analysis data-science data-visualization jupyter-notebook kaggle python

Last synced: 13 Apr 2026

https://github.com/wilfordaf/dataanalyst-test

Test task for Junior Data Analyst position

data-analysis pandas python trading-data

Last synced: 28 Feb 2025

https://github.com/purposeachiever6/discovering_hidden_pattern

Discovering Hidden Patterns in Sequential and Numerical Data

data-analysis r statistical-analysis

Last synced: 28 Feb 2025

https://github.com/robinmillford/cardiac-care-performance-dashboard

This project presents a comprehensive data analysis and interactive dashboard focused on Cardiac Surgery and Percutaneous Coronary Interventions (PCI) performance by hospital, spanning from 2008 onwards.

cardiac data-analysis data-visualization plotly-express streamlit-dashboard tableau tableau-public

Last synced: 07 Sep 2025

https://github.com/juanmerino89/data-job-market-analysis-project

Complete analysis of the job market using open data, scraping, and visualizations. Project explained step by step on my YouTube channel.

career-insights data-analysis data-science job-data job-market jupyter-notebook machine-learning market-trends open-data portfolio-project python salary-analysis visualization web-scraping youtube-project

Last synced: 18 May 2026

https://github.com/ryanbbrown/volleyball-analysis-project

Analyzes 10 years of self-collected men's NCAA volleyball player height and team wins data to determine the importance of height for success.

data-analysis data-visualization python volleyball

Last synced: 31 May 2026

https://github.com/firetyrant/sql-portfolio-projects

Documenting my SQL learning journey with hands-on projects focused on data cleaning, analysis, and optimization.

bigquery data-analysis databases etl learning portfolio query-optimization sql

Last synced: 19 Apr 2026

https://github.com/satyam4229/prediction-of-different-diseases

Predicts diseases in real time from the symptoms they present. The dataset contains 132+ different symptoms, on which the model is trained to give the best disease prediction.

data-analysis data-science data-visualization jupyter-notebook kaggle python

Last synced: 13 Apr 2026

https://github.com/rainbowatcher/simple

Make data work easier, saving your working time

bigdata data-analysis etl

Last synced: 10 Apr 2025

https://github.com/mattholy/haka

HaKa is an out-of-the-box tool system designed for data engineers and data analysts in medium-sized enterprises. It is easy to deploy and scale.

celery data-analysis data-engineering fastapi python uvicorn-gunicorn

Last synced: 19 May 2026

https://github.com/scailfin/rob-client

Command line user interface for the Reproducible Open Benchmarks for Data Analysis Platform (ROB)

benchmarks data-analysis reproducibility

Last synced: 14 Jan 2026

https://github.com/samruddhi3012/rfm-sales-analysis

Hi there! In this project I have performed Sales Analysis (RFM Analysis) using SQL and Tableau.

data-analysis data-visualization mssqlserver rfm-analysis segmentation tableau

Last synced: 12 Mar 2025

https://github.com/nimomach/cafe-sales

This analysis evaluates the sales performance of a cafe by examining key metrics such as total revenue, sales by product category, peak sales times, and more.

cafe data-analysis data-visualization sales

Last synced: 12 Mar 2026

https://github.com/masamallow/jupyterlab-my-local

Configuration to run my personal JupyterLab on my local machine.

data-analysis jupyter jupyter-notebook jupyterlab

Last synced: 26 Mar 2025

https://github.com/deliprofesor/behavioral-insights-and-data-exploration

This project analyzes Spanish speech data, focusing on acoustic features and demographics. It includes data cleaning, outlier detection, clustering, and time series modeling (ARIMA, Holt-Winters) to uncover patterns in speech duration and word frequency.

acoustic-features arima clustering data-analysis holt-winters k-means machine-learning speech-analysis time-series-analysis

Last synced: 10 Apr 2025

https://github.com/deliprofesor/k-means-clustering-for-retail-data-analysis

This project uses K-Means clustering to segment wholesale customers based on their spending habits. The data is preprocessed, scaled, and clustered into four groups. The Elbow and Silhouette methods determine the optimal number of clusters, and results are visualized using boxplots and scatter plots to uncover spending patterns.

clustering-visualisation data-analysis elbow-method k-means k-means-clustering r silhouette-score

Last synced: 10 Apr 2025