An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/sevilaymuni/project-no.2-pandas-tableau-student-mobility

Pandas-assisted Feature Engineering on Study Mobility: Tableau Dashboards on Students' Preferences

data-analysis data-extraction data-visualization feature-engineering pandas python tableau-dashboards tableau-desktop tableau-public

Last synced: 03 May 2026

https://github.com/annnieglez/computer-vision-parking-lot

This project leverages computer vision techniques to analyze parking lot occupancy. The goal is to detect available parking spaces in real-time using image and video input.

computer-vision data-analysis data-science data-visualization google-colab image-classification image-processing machine-learning python transfer-learning

Last synced: 15 May 2026

https://github.com/dina-hosny/retail-store-data-modeling-and-analysis-using-datastage

The project implements a star-schema data warehousing flow, then uses IBM InfoSphere DataStage to develop efficient ETL pipelines that create data marts and support analysis on them.

data-analysis datastage datawarehousing etl extract ibm load transform

Last synced: 06 Mar 2026

https://github.com/souravxbera/credit-card-approval-predictor

End-to-end Machine Learning project to predict credit card approval decisions using real-world financial features. Includes EDA, model training, and deployment-ready architecture

credit-card-approval-prediction data-analysis machine-learning python scikit-learn streamlit

Last synced: 15 May 2026

https://github.com/lewismakau/portfolio-projects

This repository contains data files and SQL files for the projects in my portfolio.

data-analysis data-cleaning data-structures data-visualization database google-analytics microsoft-sql-server mysql powerbi tableau

Last synced: 02 Apr 2026

https://github.com/azaz9026/loan_approval_prediction

Welcome to the Loan Approval Prediction repository! This project aims to build a predictive model that can determine whether a loan application should be approved or denied based on various features. Purpose: The goal of this repository is to develop a machine learning model that can accurately predict loan approval decisio

data data-analysis data-visualization eda machine-learning numpy pandas python statistics

Last synced: 06 Apr 2026

https://github.com/amruthadevops/stock-market-analysis

Analyzes market trends and predicts future market behavior using machine learning techniques.

data-analysis data-science jupyter-notebook machine-learning powerbi-desktop python stock-market

Last synced: 15 May 2026

https://github.com/kathisnehith/realestate-sales-analysis

Investigating real estate sales trends to understand market dynamics and inform investment decisions.

data-analysis excel realestate sales sql stastical-analysis-tools tableau

Last synced: 12 Feb 2026

https://github.com/chahelgupta/hospital-readmission-prediction-and-analysis

The Hospital Readmission Prediction project uses clinical data to predict diabetic readmissions. SVM + SMOTE achieved 61.16% accuracy, with key predictors including hospital stay, lab tests, and medications.

data-analysis knn-classification logistic-regression machine-learning prediction prediction-model python random-forest-classifier smote svm-classifier

Last synced: 15 May 2026

https://github.com/sanafagal/wsp-msg-automation

An intuitive application for managing and analyzing customer and reseller data stored in Google Sheets, providing insights and streamlined data organization.

automation cloud-credentials data-analysis google-sheets-api python

Last synced: 16 Jun 2025

https://github.com/vishal-verma-96/Pre-Owned-Car-Price-prediction-using-Streamlit-App

Capstone project by skill Academy: exploratory analysis, visualization, and prediction of used car prices. Deploys the highest-scoring model as a Streamlit web app.

data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms matplotlib numpy pandas python3 regression-algorithms scikit-learn seaborn streamlit

Last synced: 02 Mar 2025

https://github.com/shreeparab1890/india-gdp-rate-1960-to-2021-data-analysis

This IPython notebook contains an exploratory data analysis (EDA) of the India GDP rate from 1960 to 2021.

analysis data-analysis eda exploratory-data-analysis ipython-notebook jyputer-notebook matplotlib matplotlib-pyplot pandas python

Last synced: 06 Mar 2026

https://github.com/anastasius21/creditcardfrauddetection

This repository contains a Jupyter Notebook for a credit card fraud detection model and the CSV dataset on which it is trained.

credit-card-fraud data-analysis data-science data-visualization fraud-detection logistic-regression machine-learning

Last synced: 16 Jun 2025

https://github.com/ahmedkhaled404/data-cleaning-and-eda-layoffs-mysql

This project involves cleaning a dataset containing information about layoffs from companies around the world.

data data-analysis data-cleaning data-preprocessing datacleaning eda exploratory-data-analysis mysql sql

Last synced: 08 Jun 2026

https://github.com/bhiogade/customer-purchase-analysis

Comprehensive Customer Purchase Analysis Across Multiple Dimensions

data-analysis data-visualization tableau tableau-desktop

Last synced: 02 Feb 2026

https://github.com/dzakwanalifi/reglins

regLins is an R package designed for performing linear regression analysis using various optimization methods. It also provides an interactive Shiny application for a more dynamic analysis experience.

data-analysis linear-regression optimization r shiny-app

Last synced: 09 Jul 2025

https://github.com/faith99/water_pollution_dashboard

A data visualization project exploring water access, contamination and health outcomes

data-analysis data-visualization powerbi public-health publichealth

Last synced: 02 Feb 2026

https://github.com/gonzalofuentes28/dpeek

Interactive terminal data viewer for CSV, TSV, JSON, and JSONL files

bubbletea cli csv csv-viewer data-analysis data-viewer golang json json-viewer sqlite terminal tui

Last synced: 06 Apr 2026

https://github.com/marcomadera/test-for-random-numbers

Test for random numbers between 0 and 1

data-analysis statistics

Last synced: 09 Jul 2025

https://github.com/madhursinghbhadoriya/data_analysis_sales_insights_using_tableau

Performed data cleaning using MySQL; carried out data analysis and ETL in Tableau; created an interactive dashboard with significant information about sales insights and profit and revenue analysis.

data-analysis data-visualization dataanalysis etl mysql tableau-dashboards tableau-desktop

Last synced: 09 Apr 2025

https://github.com/lucaspadoni/9-11-hijackers-social-network-analysis

Social Network Analysis focused on the events of 9/11/2001. By examining publicly available data through SNA techniques, we gain insights into the organizational structure of the terrorist network, offering valuable perspectives on key relationships and connections.

9-11 data-analysis data-analytics graph-theory hijacking network-analysis sna social-network-analysis terrorism terrorist-attacks

Last synced: 19 May 2026

https://github.com/brunomontezano/sleep-cognition-and-functioning

💤 Data analysis of a brief communication published in the journal Psychiatry Research Communications by Montezano et al. (2023).

bipolar-disorder cognition data-analysis data-visualization data-viz depression ggplot2 pelotasrs psychiatry psychology published-article r sleep ucpel

Last synced: 13 Jun 2026

https://github.com/silianpan/python-data-analysis-course

python data analysis course of drotion-lega

data-analysis jupyter-notebook panda

Last synced: 11 Apr 2025

https://github.com/saob007/tablero_subsidios_servicio_agua

A dashboard for analyzing the distribution and allocation of drinking water and sewerage subsidies granted by the Secretaría de Planeación of the Alcaldía de Sincelejo in 2020, with the goal of identifying patterns in coverage, consumption, billing, and subsidies, and of supporting public policy decision-making.

dashboard data-analysis data-visualization looker-studio

Last synced: 31 Jan 2026

https://github.com/oubiche-ishak19/stock_evaluation_python

A Python script to classify companies based on financial metrics like Piotroski F-Score and Stock Valuation, using CSV financial data for analysis and output.

backtesting-frameworks classification csv-processing data-analysis expert-system finance financial-analysis-tools python rule-based-classifier stock stock-market streamlit tkinter-gui yahoo-finance

Last synced: 15 May 2026

https://github.com/advestis/adadjust

Package for fitting any mathematical function to data (1-D only for now).

data-analysis fit python

Last synced: 17 May 2026

https://github.com/cadedupont/mlb-data-analysis

Analysis of a dataset of active MLB players in R

baseball-analytics data-analysis data-science mlb-stats-api r

Last synced: 23 Jun 2026

https://github.com/diliprk/smartcityvisualization

Data wrangling and data visualization work done for the Smart City project at HBK Saar

bokeh data-analysis data-visualization python3

Last synced: 15 May 2026

https://github.com/rohithay/titanic-data-analysis

Predict survival outcomes from the 1912 Titanic disaster based on each passenger's features, such as sex and age.

data-analysis machine-learning matplotlib pandas scipy-stats statistical-models

Last synced: 15 May 2026

https://github.com/ansh-info/literaturesurvey

Literature Survey Engine leverages Semantic Scholar's Recommendation API to provide highly relevant research article recommendations based on your curated lists of articles.

api api-integration automation data-analysis data-visualization docker docker-compose literature-survey machine-learning mysql paper-recommendations python recommendation-system research-tools semantic-scholar streamlit zotero

Last synced: 10 Apr 2026

https://github.com/hrolive/patc-big-data-analytics-bsc

Introduction to the main concepts and technologies related to Big Data and Data Analytics and their applications to real projects.

analytics bias big-data data-analysis hadoop hpc machine-learning mapreduce nosql python spark spark-streaming visualization

Last synced: 12 Apr 2026

https://github.com/jakebrehm/lemons

🍋 A Python package which makes building GUIs easy peasy lemon squeezy.

data-analysis data-science gui python python3 python37 tkinter tkinter-gui tkinter-python

Last synced: 27 Mar 2025

https://github.com/jakebrehm/ezpz-reducer

🪓 Concatenates and then decimates one or more csv files.

data-analysis data-manipulation data-science python python3

Last synced: 27 Mar 2025

https://github.com/dylanbk/exploring-data

A collection of programs that explore data engineering and analysis.

data-analysis data-engineering matplotlib pandas python

Last synced: 02 Mar 2025

https://github.com/kfrural/dashboard_agro

Dashboard Agro is a technological platform that integrates several components to support Brazilian agribusiness through data analysis, visualization and forecasts. This innovative solution was developed to serve three main groups: farmers, researchers and public managers.

big-data data-analysis predictive-analytics python

Last synced: 15 May 2026

https://github.com/cyberoctane29/deutsche-bank-customer-churn-prediction-end-to-end-analysis-and-modeling

In this project, I aim to predict customer churn for Deutsche Bank using supervised machine learning. It involves data exploration, feature engineering, and building Naive Bayes, Decision Tree, Random Forest, and XGBoost models. Models are tuned, evaluated, and compared to identify the best approach for churn prediction.

bank-customer-churn churn-analysis churn-prediction customer-churn-analytics data-analysis data-analytics data-visualization decision-tree eda gaussian-naive-bayes machine-learning random-forest supervised-learning xgboost

Last synced: 11 Oct 2025

https://github.com/aalkiyumi/project-4-big-data-analysis-with-pyspark-on-weather-data

In this project, I analyzed weather data from the NCEI Global Surface Summary of Day dataset using PySpark in Jupyter Notebook. Tasks included data cleaning, statistical analysis, and forecasting for temperature, wind speed, precipitation, and extreme weather events. The project also predicts future weather patterns for Cincinnati and Florida.

big-data-analytics cs5165 data-analysis data-cleaning data-engineering data-science introduction-to-cloud-computing jupyter-notebook machine-learning precipitation-analysis predictive-modeling pyspark statistical-analysis temperature-forecasting time-series-forecasting uc uc2026 university-of-cincinnati wind-speed-data

Last synced: 17 Mar 2025

https://github.com/oshinrathor/data-science-systems-and-analytics-projects

Dive into my Data Science Projects Repository, featuring a Spam SMS Classifier, NIA Dashboard, H1N1 Vaccine Prediction, and NYC Taxi Fare Prediction. Each project showcases my skills in data cleaning, exploratory analysis, modeling, and visualization, offering valuable insights and methodologies for data enthusiasts and practitioners.

dashboard data-analysis data-driven-decisions data-presentation data-science data-visualization dataexploration eda insights nia webanalytics

Last synced: 02 Mar 2025

https://github.com/nishumehta/retail-sales-analysis

Retail sales performance analysis using Python and Power BI.

data-analysis ipynb-notebook jupyter-notebook powerbi python

Last synced: 15 May 2026

https://github.com/satyacoder29/crm-analytics

CRM Analytics Dashboard – An interactive dashboard using Tableau, SQL, and Salesforce CRM Analytics (CRMA) to analyze sales performance, customer segmentation, and churn prediction. Features automated ETL pipelines, predictive analytics, and real-time insights for data-driven decision-making. 🚀📊

advanced-excel data-analysis data-cleaning data-collection data-transformation data-visualization matplotlib numpy pandas powerbi python seaborn sql tableau

Last synced: 03 Mar 2025

https://github.com/prakashjha1/whatsapp-chat-analyzer

WhatsApp Analyzer analyzes WhatsApp group activity. It tracks conversations and measures how much time we spend, or rather “waste”, on WhatsApp.

data-analysis data-science natural-language-processing pandas pyhton regular-expression

Last synced: 15 May 2026

https://github.com/vavarm/data-analysis-french-electric-automobile-infrastructure

Data analysis of the French electric vehicle and charging station infrastructure, built with R Shiny and Python

data-analysis data-science data-visualization factominer geojson ggplot2 plotly python r rshiny

Last synced: 15 May 2026

https://github.com/lord3008/instances-of-data-analysis

This repository shows my data analysis work across various projects. I feel data analysis is the key to investigating a solution. Furthermore, it points the way toward model building.

data data-analysis

Last synced: 03 Mar 2025

https://github.com/jonek/pv-city-mastr

Extract and analyze data about photovoltaic systems in Germany

data-analysis germany jupyter-notebook pandas photovolatic-power photovoltaic

Last synced: 11 May 2026

https://github.com/sciencesar-labs/py485-final-project

ROOT-based muon data analysis using Python & Jupyter – final project for PY485E @ CERN

cern computational-physics data-analysis jupyter-notebook muons python root uproot

Last synced: 15 May 2026

https://github.com/dina-hosny/sparkify---data-modeling-with-cassandra

Sparkify - Data Modeling with Cassandra - Udacity Data Engineering Expert Track.

cassandra cql data-analysis data-engineering data-modeling data-warehousing etl python

Last synced: 11 Apr 2026

https://github.com/satyacoder29/comparison-of-region-based-sales-tableau

The region-based sales comparison analyzes sales performance across different regions. It identifies trends, top-performing regions, and areas needing improvement by comparing metrics like revenue, growth rate, and product demand. This analysis helps optimize sales strategies and resource allocation for better performance.

data-analysis data-cleaning data-collection data-visualization powerquerym relationships tableau tableau-desktop unions

Last synced: 02 Feb 2026

https://github.com/chingu-voyages/v47-tier3-team-30

An easily accessible tool for calculating electricity-related carbon emissions, along with insights for reducing environmental impact. | Voyage-47 | https://chingu.io/ | Twitter: https://twitter.com/ChinguCollabs

carbon-emissions carbon-footprint data-analysis data-engineering data-science

Last synced: 10 May 2026

https://github.com/janashanaa/flightanalysis

This Jupyter Notebook presents an exploratory data analysis of data derived from a flight booking website.

data-analysis data-visualization exploratory-data-analysis jupyter-notebook python

Last synced: 15 May 2026

https://github.com/emmarhoffmann/analysis-of-california-real-estate-market-factors-influencing-home-prices

Investigates how home size, number of bedrooms, and bathrooms influence home prices, with comparisons across California, New York, New Jersey, and Pennsylvania.

data-analysis r real-estate statistical-models

Last synced: 17 Mar 2025

https://github.com/emmarhoffmann/analysis-of-student-debt-among-first-generation-college-students

Explores the financial landscape of first-generation college students, analyzing patterns in student debt based on factors like median income, net price of attendance, and enrollment size.

data-analysis first-generation-college-students r statistical-models

Last synced: 17 Mar 2025

https://github.com/mindlessmuse666/apartment-price-predictor

Python project for predicting apartment rental prices using linear regression. Practical assignment on the topic "Fundamentals of Machine Learning" for the course "MDK 13.01: Fundamentals of Applying Artificial Intelligence Methods in Programming".

apartment-price-prediction data-analysis data-science linear-regression linear-regression-models machine-learning matplotlib python regression sklearn unit-testing

Last synced: 11 Apr 2026

https://github.com/mindlessmuse666/iris-knn

The project demonstrates the use of the k-nearest neighbors (KNN) algorithm to classify the Iris dataset. It includes data loading, model training, performance evaluation, and visualization of results using the Pandas, Scikit-learn, Matplotlib, Seaborn, and Plotly libraries.

algorithm classification data-analysis data-visualization iris-dataset knn lazy-learning machine-learning python scikit-learn

Last synced: 17 Aug 2025

https://github.com/ljadhav25/knn-algorithm-data-science-

This repository contains a project demonstrating the implementation and application of the K-Nearest Neighbors (K-NN) algorithm in Data Science. The objective is to provide a comprehensive understanding of the K-NN algorithm, including data preprocessing, model training, evaluation, and visualization of results. This project is ideal for beginners

data-analysis data-science knn-classification machine-learning matplotlib-pyplot numpy pandas-library seaborn

Last synced: 16 Apr 2026

https://github.com/pramodkondur/dataspark-end-to-end-dataanalytics

Cleaned the data, performed EDA, and stored it in MySQL. Queried and analyzed the data, uncovering opportunities to drive revenue growth and optimize operations, with a potential revenue growth of $30.03 million. Reported key insights using Power BI.

data-analysis data-visualization eda powerbi python sql

Last synced: 21 May 2026

https://github.com/spshah1701/world-development-indicators

Analysis of World Development Indicators (WDI) using big data technologies, specifically Databricks, Apache Spark, and Scala.

apache-spark big-data data-analysis spark-sql

Last synced: 17 Mar 2025

https://github.com/vara-co/solar-eclipse-2024

Group Project on the 2024 Solar Eclipse's Path over the US with an interactive map and a couple of visualizations on the data gathered.

data-analysis data-visualizations html-css-javascript interactive-map javascript map solar-eclipse

Last synced: 15 May 2026

https://github.com/mosalem149/pythonutilities

A collection of Python scripts for common utility tasks including file manipulation, word counting, longest word detection, and grade categorization. Perfect for quick and easy solutions to everyday programming problems.

data-analysis educational-tools file-io file-manipulation grade-calculation python text-analysis text-processing utility word-counting

Last synced: 15 May 2026

https://github.com/darshan1924/house-price-pridiction

This repository contains a machine learning project for predicting house prices based on various features, including geographical coordinates. The project includes data preprocessing steps to handle

data-analysis data-preprocessing house-prices jupyter-notebook machine-learning prediction

Last synced: 27 Mar 2025

https://github.com/thecoderpinar/globalwarmingforecast

🌍 Global Warming Forecast Tool: an advanced tool for analyzing and forecasting climate trends using ARIMA and Prophet models, with interactive visualizations and scenario simulations.

arima climate-change data-analysis environmental-science forecasting global-warming machine-learning prophet streamlit time-series-analysis visualization

Last synced: 27 Mar 2025

https://github.com/nehul1149/olympic-data-analysis

This project is an interactive data visualization and analytics platform for exploring historical Olympic Games data. Built with Python and Streamlit, it offers an in-depth analysis of medal tallies, athlete statistics, and country-wise performance trends, providing users with powerful insights into the world's biggest sporting event.

analysis data-analysis data-science data-visualization matplotlib python streamlit

Last synced: 18 May 2026

https://github.com/brevex/hotel-booking-demand-data-analysis

Data analysis in Python of demand for urban hotels and resorts showing their causes and relationships

data-analysis data-science hotel-booking-analysis kaggle python

Last synced: 08 May 2026

https://github.com/tknishh/investing-platform

An investing platform application to help users get information and analyze various foreign currency assets. The investing platform uses an ETL pipeline to insert new batches of Forex data once a day.

data-analysis investing-platform pipeline

Last synced: 18 Mar 2025

https://github.com/k31ner/inmopipeline

End-to-end project for the analysis and predictive modeling of real estate data, covering collection, transformation, visualization, and machine learning using Python and modern data engineering and data science tools.

data-analysis data-engineering data-science fastapi python streamlit

Last synced: 08 May 2026

https://github.com/iamsainikhil/us-births-analysis

Analysis of US births during 1994-2003 based on the CDC-NCHS data set.

data-analysis python

Last synced: 16 May 2026

https://github.com/ebrizzzz/data-visualization-project-using-tableau

A data visualization project for the Visual Data Analysis course (Spring Term 2025) at the University of Skövde. This project explores the factors influencing national happiness scores across different global regions from 2005 to 2022.

analytics data data-analysis data-science data-visualization python regression tableau

Last synced: 16 Jun 2025

https://github.com/qorah/vic-edu-housing-insights

Analysis of education outcomes and housing affordability in Victoria, Australia.

data-analysis jupyter-notebook

Last synced: 18 Mar 2025

https://github.com/betkh/datascieneinpython

Jupyter Notebook files

data-analysis data-visualization

Last synced: 16 Jun 2025

https://github.com/abidshafee/google.colaboratory_projects

This repository contains a collection of interactive Python notebooks (ipynb) from some of my projects on Data Science, Machine Learning (ML), and Natural Language Processing (NLP).

colaboratory data-analysis data-science lstm machine-learning nlp statistics time-series

Last synced: 09 Jul 2025

https://github.com/anas436/data-science-projects

Explore my diverse collection of projects showcasing machine learning, data analysis, and more. Organized by project, each directory contains code, datasets, documentation, and resources. Dive in to discover insights and techniques in data science. Reach out for collaborations and feedback.

data-analysis data-science machine-learning

Last synced: 27 Mar 2025

https://github.com/felipe-veas/visor-sueldos-publicos

Interactive tool for visualizing and analyzing public-sector salaries in Chile, built with Streamlit.

audit chile data-analysis python streamlit transparency

Last synced: 16 May 2026

https://github.com/czesctuklap/sustainable-fashion-database-analysis

This project analyzes a dataset of sustainable fashion trends for 2024. It includes data preprocessing, exploration, visualization, and insights on environmental impact factors such as carbon footprint, water usage, waste production, and sustainability practices.

data-analysis data-visualization database dataset keggle sustainable-fashion

Last synced: 30 Apr 2026

https://github.com/fmind/malpop

Rank the popularity of malware applications by their occurrence on VirusTotal

data-analysis malware popularity ranking virustotal

Last synced: 11 Apr 2025

https://github.com/estevan-ulian/py-agent-voice

A project for handling voice interactions between a human and an AI agent, enabling the reading and analysis of data from a CSV file.

agent-based-modeling data-analysis python3 whisper-ai

Last synced: 11 Apr 2025

https://github.com/badranalyst/startup-expansion-analysis-with-pandas-matplotlib-and-power-bi

Analyzes startup growth and expansion factors using Pandas for data analysis and Matplotlib for visualizations. Complements findings with data visualizations in Power BI, providing actionable insights into funding and market trends.

dashboard data-analysis data-visualization dataset matplotlib matplotlib-pyplot pandas power-bi powerbi

Last synced: 16 May 2026

https://github.com/josedanielchg/nyc-schools-test-scores-exploration

DataCamp project analyzing NYC public school test scores to identify top math-performing schools, the best overall SAT scores, and borough-level variability using Python and pandas

data-analysis jupyter-notebook python

Last synced: 19 Mar 2025

https://github.com/swat1563/recommendation-system

This repository features a recommendation system and analytics engine using datasets on users, organizations, contents, contacts, events, and recommendations. It includes data preprocessing, building a recommendation system, and creating visual reports with Power BI.

analytics data-analysis data-visualization engine kaggle numpy pandas powerbi powerbi-dashboards powerbi-desktop powerbi-reports python recommendation-engine recommendation-system recommender-systems scikit-learn scipy

Last synced: 07 Jan 2026