An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/dadvaiahpavan/ai-data-scientist-

AI-powered tool for dataset analysis, featuring data preprocessing, classification, regression, anomaly detection, and text analysis. Built with scikit-learn, pandas, and Plotly for visualization. Includes an interactive Streamlit web interface for real-time data analysis.

ai anomaly-detection classification data-analysis data-science machine-learning panda plotu regression scikit-learn sentiment-analysis streamlit

Last synced: 03 May 2026

https://github.com/maazie-khan/olympics-data-enigeering

Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.

azure big-data data-analysis dataengineering devops pipeline

Last synced: 13 May 2026

https://github.com/codeonthespectrum/web-scrap

This project scrapes Wikipedia to obtain data on the most populous municipalities in the state of Rio de Janeiro.

data-analysis data-visualization webscraping

Last synced: 16 Feb 2026

https://github.com/kavicastelo/colab

This repository includes practical Jupyter notebooks for data analysis and model training using a soil fertilizer dataset. (use 4th edition)

data-analysis jupyter-notebook python

Last synced: 26 Mar 2025

https://github.com/jibbs1703/airline-data-analysis

This repository contains an exploratory data analysis (EDA) of flight delays and cancellations for airline flights in the United States in 2015. Based on this EDA, insights and solutions are suggested for business owners and airport managers.

business-insights business-solution data-analysis data-visualization

Last synced: 20 Mar 2025

https://github.com/muneeb706/r-programming

R-Programming examples for data analysis.

data-analysis r-programming

Last synced: 26 Mar 2025

https://github.com/s1m0n38/cr-analysis

An exercise in data collection/analysis

clash-royale data-analysis data-collection data-science

Last synced: 08 Jul 2025

https://github.com/juliargubolin/sql-for-data-analysis

This repository holds all the documents, files, and notes I took while learning SQL and data analysis through "SQL for Data Analysis: Advanced Techniques for Transforming Data Into Insights" by Cathy Tanimura (O'Reilly).

advanced data-analysis data-science sql

Last synced: 11 Jan 2026

https://github.com/alexzalox/us_stocks

Read US stock tickers and their costs from a CSV and display the formatted DataFrame in the terminal using pandas.

data-analysis finance pandas python python3 stocks yfinance

Last synced: 15 May 2026

https://github.com/tharun2806/end-to-end-internship-data-analysis

Internship Dataset Analysis is an end-to-end project analyzing an internship dataset obtained from Kaggle. The project involves cleaning and preprocessing the data using Excel and SQL, followed by exploratory data analysis (EDA). The analysis includes statistical, sectoral and geospatial insights, visualized through an interactive Tableau dashboard

bigquery data-analysis data-cleaning data-preprocessing data-visualization exploratory-data-analysis geospatial-analysis microsoft-excel reporting sectoral-analysis statistical-analysis tableau-public

Last synced: 01 Apr 2025

https://github.com/galal-pic/advanced_regression

A project to predict house prices using different machine learning techniques.

data-analysis data-science deep-learning feature-engineering flask machine-learning python regression

Last synced: 08 Jul 2025

https://github.com/wwgolay/hr1099-timelapse-vlbi

The repository for HR1099 timelapse VLBI.

astronomy astrophysics data-analysis website

Last synced: 03 Apr 2025

https://github.com/vipulbunny/ml-learning_projects

A collection of machine learning projects implemented in Python, showcasing core concepts like regression, classification, clustering, and model evaluation techniques. Ideal for learners and data science enthusiasts.

classification clustering data-analysis data-science data-visualization decision-trees jupyter-notebook machine-learning model-evaluation random-forest regression supervised-learning unsupervised-learning

Last synced: 23 Jul 2025

https://github.com/hevalhazalkurt/word_analyser

A web app developed in Python and Django that analyzes given text mathematically and sentimentally.

analyzer analyzes content data-analysis django emotion python python3 sentiment sentiment-analyser sentiment-analysis text text-analysis

Last synced: 19 May 2026

https://github.com/kailenroa/sleep-efficiency-project

This project focuses on analyzing sleep efficiency using wearable technology data. It explores patterns in sleep behavior and key factors impacting sleep quality. A dashboard was created using Python and data visualization tools to provide actionable insights and recommendations for improving sleep health.

dashboard data-analysis html phyton sleep-efficiency

Last synced: 06 Jan 2026

https://github.com/jpgiant/training_project

Analyzing whether left-handers and right-handers differ in average age at death, using Bayes' theorem for conditional probability.

bayesian-statistics data-analysis data-visualization numpy pandas-dataframe python

Last synced: 30 Apr 2026

https://github.com/aalkiyumi/project-3-docker-container-for-data-processing-script

This Dockerized Python application analyzes two text files (IF.txt and AlwaysRememberUsThisWay.txt). It counts total words, identifies the largest file, and finds the top three most frequent words in each. Results are saved to an output file and printed to the console.

cs5165 data-analysis data-engineering data-science docker introduction-to-cloud-computing statistical-analysis text-processing uc uc2026 university-of-cincinnati

Last synced: 17 May 2026

https://github.com/anas436/data-science-projects

Explore my diverse collection of projects showcasing machine learning, data analysis, and more. Organized by project, each directory contains code, datasets, documentation, and resources. Dive in to discover insights and techniques in data science. Reach out for collaborations and feedback.

data-analysis data-science machine-learning

Last synced: 27 Mar 2025

https://github.com/k31ner/inmopipeline

End-to-end project for the analysis and predictive modeling of real estate data, covering collection, transformation, visualization, and machine learning using Python and modern data engineering and data science tools.

data-analysis data-engineering data-science fastapi python streamlit

Last synced: 08 May 2026

https://github.com/vara-co/solar-eclipse-2024

Group Project on the 2024 Solar Eclipse's Path over the US with an interactive map and a couple of visualizations on the data gathered.

data-analysis data-visualizations html-css-javascript interactive-map javascript map solar-eclipse

Last synced: 15 May 2026

https://github.com/prakashjha1/whatsapp-chat-analyzer

WhatsApp Analyzer analyzes WhatsApp group activity. It tracks conversations and measures how much time we spend (or, put another way, “waste”) on WhatsApp.

data-analysis data-science natural-language-processing pandas pyhton regular-expression

Last synced: 15 May 2026

https://github.com/nishumehta/retail-sales-analysis

Retail sales performance analysis using Python and Power BI.

data-analysis ipynb-notebook jupyter-notebook powerbi python

Last synced: 15 May 2026

https://github.com/madhursinghbhadoriya/data_analysis_sales_insights_using_tableau

• Performed Data Cleaning using MySQL. • Data analysis and ETL in Tableau. • Created an Interactive Dashboard with significant information about the Sales Insights, Profit and Revenue Analysis.

data-analysis data-visualization dataanalysis etl mysql tableau-dashboards tableau-desktop

Last synced: 09 Apr 2025

https://github.com/kathisnehith/realestate-sales-analysis

Investigating real estate sales trends to understand market dynamics and inform investment decisions.

data-analysis excel realestate sales sql stastical-analysis-tools tableau

Last synced: 12 Feb 2026

https://github.com/annnieglez/computer-vision-parking-lot

This project leverages computer vision techniques to analyze parking lot occupancy. The goal is to detect available parking spaces in real-time using image and video input.

computer-vision data-analysis data-science data-visualization google-colab image-classification image-processing machine-learning python transfer-learning

Last synced: 15 May 2026

https://github.com/rijul007/market-basket-analysis-using-r

Market Basket Analysis using association rules, leveraging R’s powerful tools for data-driven retail strategies.

data-analysis data-science r

Last synced: 02 Apr 2025

https://github.com/smsraj2001/sds-datathon

A simple data science project/hackathon done as part of the SDS course.

data-analysis data-analysis-python data-cleaning data-science statistics statistics-for-data-science

Last synced: 16 Jul 2025

https://github.com/syarwinaaa09/exploring-airbnb-market-trends

A data analysis project exploring NYC Airbnb listings, using pandas and data visualization to examine price trends, room types, and reviews.

airbnb data-analysis data-science data-visualization jupyter-notebook new-york-city nyc pandas price-analysis reviews room-types

Last synced: 30 Apr 2026

https://github.com/nishumehta/uber-rides-data-analysis

An in-depth analysis of Uber ride data for the year 2016, to uncover patterns in ride behavior, mileage trends, and frequent start locations to generate actionable insights for business decisions.

data-analysis jupyter-notebook matplotlib-pyplot pandas python tableau-dashboards

Last synced: 09 May 2026

https://github.com/danicaalana/sales-review-sentiment-analysis

This sentiment analysis project uses a machine learning model to analyze Amazon product reviews and determine whether the sentiment expressed is positive, negative, or neutral, using the Multinomial Naive Bayes method.

amazon data-analysis data-science machine-learning naive-bayes python sales-review sentiment-analysis

Last synced: 15 May 2026

https://github.com/eesunmoon/genai_cor-recom

[Project] Outfit Coordination Recommender System using KoAlpaca

data-analysis fine-tuning generative-ai huggingface keyword llm numpy pandas python python3 selenium

Last synced: 06 Apr 2026

https://github.com/mtholahan/advanced-mysqlquery-tuning-mini-project

Analyzed EuroCup 2016 data with advanced SQL queries. Imported CSV datasets into MySQL, designed schema with match, player, and referee details, and implemented queries covering match outcomes, penalty shootouts, player stats, bookings, substitutions, and referee activity to explore tournament dynamics.

bootcamp data-analysis data-engineering data-modeling database eurocup football mysql queries soccer sports springboard sql

Last synced: 15 May 2026

https://github.com/pentalpha/bti-performance-study

A series of analyses of a large dataset of student grades in the Information Technology course at UFRN.

analysis big-data clustering data-analysis data-science data-visualization ipynb ipython jupyter-notebook performance-analysis plot python python3

Last synced: 15 May 2026

https://github.com/pentalpha/eu-car-emissions-analysis-2015

Analysis of CO₂ emissions from passenger cars in E.U. countries, 2015.

data-analysis data-science dataset jupyter-notebook python python3

Last synced: 15 May 2026

https://github.com/12danielll/neurogenomics_project

This project focuses on analyzing sequencing data to understand molecular mechanisms of neurological diseases and predict the effectiveness of immunotherapy in breast cancer patients. It integrates Python and R scripts for data processing, statistical analysis, and visualization, alongside a comprehensive report detailing methods and findings.

bioinformatics biostatistics clustering clustering-algorithms data-analysis data-visualization deseq2 differential-gene-expression functional-analysis immune-therapy machine-learning neurological-disease neuroscience pca-analysis python r seurat single-cell-analysis

Last synced: 06 Apr 2026

https://github.com/srummanf/elnino-anomaly-study

Study on El Niño’s impact on Chennai groundwater sustainability

data-analysis machine-learning python satellite-imagery-analysis

Last synced: 15 May 2026

https://github.com/kineticloom/plydb-fun-nfl-analyst

Analyze NFL data with your AI agent

data-analysis football-analytics nfl

Last synced: 15 May 2026

https://github.com/cassiofb-dev/fide-rating-analysis

The plot speaks for itself

chess data-analysis fide hans rating

Last synced: 15 Jun 2025

https://github.com/buildwithlal/introduction-to-data-science-in-python-coursera

Introduction to Data Science in Python, part of the Applied Data Science with Python Specialization from the University of Michigan, offered on Coursera.

data-analysis matplotlib numpy pandas

Last synced: 03 May 2026

https://github.com/vetrivel07/flight-price-prediction

Developed a flight price prediction model using Python, analyzing historical data to forecast airfare prices and help travelers make informed booking decisions

data-analysis data-visualization jupyter-notebook numpy pandas python

Last synced: 15 Jun 2025

https://github.com/fer-aguirre/cookiecutter-data-analysis-extensive

A cookiecutter template for data analysis projects using Python.

cookiecutter data-analysis project-template python

Last synced: 09 Apr 2025

https://github.com/scailfin/rob-webapi-flask

Default RESTful Web API implementation for the Reproducible Open Benchmarks for Data Analysis Platform (ROB) using the Flask web framework.

benchmarks data-analysis reproducibility webapi

Last synced: 17 Mar 2026

https://github.com/deliprofesor/customerseg-customer-segmentation-and-shopping-analysis

This project performs data exploration, segmentation, and modeling of wholesale customer data using clustering algorithms, PCA, and decision trees to analyze purchasing behavior and predict customer channel preferences.

clustering customer-segmentation data-analysis data-visualization dbscan decision-tree gmm kmeans machine-learning pca

Last synced: 24 Jun 2025

https://github.com/zeh237/superstore-data-analytics

A Flask-based data analytics project built on the Superstore dataset using Flask, pandas, SQL, and Python.

analytics data data-analysis data-science data-visualization flask python superstore

Last synced: 04 May 2025

https://github.com/revtpark/teamseas_scrapper

Scraping Team Seas for data analysis and visualization.

chartjs data-analysis python webscraping

Last synced: 28 Mar 2025

https://github.com/sevilaymuni/project-no.2-pandas-tableau-student-mobility

Pandas assisted Feature Engineering on Study Mobility: Tableau Dashboards on Students' Preferences

data-analysis data-extraction data-visualization feature-engineering pandas python tableau-dashboards tableau-desktop tableau-public

Last synced: 03 May 2026

https://github.com/dina-hosny/retail-store-data-modeling-and-analysis-using-datastage

The project implements a star-schema data warehousing flow, then uses IBM InfoSphere DataStage to develop efficient ETL pipelines that create data marts and support analysis on them.

data-analysis datastage datawarehousing etl extract ibm load transform

Last synced: 06 Mar 2026

https://github.com/souravxbera/credit-card-approval-predictor

End-to-end Machine Learning project to predict credit card approval decisions using real-world financial features. Includes EDA, model training, and deployment-ready architecture

credit-card-approval-prediction data-analysis machine-learning python scikit-learn streamlit

Last synced: 15 May 2026

https://github.com/lewismakau/portfolio-projects

This repository contains data files and SQL files for the projects in my portfolio.

data-analysis data-cleaning data-structures data-visualization database google-analytics microsoft-sql-server mysql powerbi tableau

Last synced: 02 Apr 2026

https://github.com/azaz9026/loan_approval_prediction

Welcome to the Loan Approval Prediction repository! This project aims to build a predictive model that can determine whether a loan application should be approved or denied based on various features. Purpose The goal of this repository is to develop a machine learning model that can accurately predict loan approval decisio

data data-analysis data-visualization eda machine-learning numpy pandas python statistics

Last synced: 06 Apr 2026

https://github.com/amruthadevops/stock-market-analysis

To analyze market trends and predict future market behavior using machine learning techniques

data-analysis data-science jupyter-notebook machine-learning powerbi-desktop python stock-market

Last synced: 15 May 2026

https://github.com/chahelgupta/hospital-readmission-prediction-and-analysis

The Hospital Readmission Prediction project uses clinical data to predict diabetic readmissions. SVM + SMOTE achieved 61.16% accuracy, with key predictors including hospital stay, lab tests, and medications.

data-analysis knn-classification logistic-regression machine-learning prediction prediction-model python random-forest-classifier smote svm-classifier

Last synced: 15 May 2026

https://github.com/sanafagal/wsp-msg-automation

An intuitive application for managing and analyzing customer and reseller data stored in Google Sheets, providing insights and streamlined data organization.

automation cloud-credentials data-analysis google-sheets-api python

Last synced: 16 Jun 2025

https://github.com/vishal-verma-96/Pre-Owned-Car-Price-prediction-using-Streamlit-App

Capstone project by Skill Academy: exploratory analysis, visualization, and prediction of used car prices. Deploys the highest-scoring model as a Streamlit web app.

data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms matplotlib numpy pandas python3 regression-algorithms scikit-learn seaborn streamlit

Last synced: 02 Mar 2025

https://github.com/shreeparab1890/india-gdp-rate-1960-to-2021-data-analysis

This IPython notebook contains an exploratory data analysis (EDA) of India's GDP rate from 1960 to 2021.

analysis data-analysis eda exploratory-data-analysis ipython-notebook jyputer-notebook matplotlib matplotlib-pyplot pandas python

Last synced: 06 Mar 2026

https://github.com/anastasius21/creditcardfrauddetection

This repository contains a Jupyter notebook for a credit card fraud detection model and the CSV dataset on which it is trained.

credit-card-fraud data-analysis data-science data-visualization fraud-detection logistic-regression machine-learning

Last synced: 16 Jun 2025

https://github.com/ahmedkhaled404/data-cleaning-and-eda-layoffs-mysql

This project involves cleaning a dataset containing information about layoffs from companies around the world.

data data-analysis data-cleaning data-preprocessing datacleaning eda exploratory-data-analysis mysql sql

Last synced: 08 Jun 2026

https://github.com/bhiogade/customer-purchase-analysis

Comprehensive Customer Purchase Analysis Across Multiple Dimensions

data-analysis data-visualization tableau tableau-desktop

Last synced: 02 Feb 2026

https://github.com/dzakwanalifi/reglins

regLins is an R package designed for performing linear regression analysis using various optimization methods. It also provides an interactive Shiny application for a more dynamic analysis experience.

data-analysis linear-regression optimization r shiny-app

Last synced: 09 Jul 2025

https://github.com/faith99/water_pollution_dashboard

A data visualization project exploring water access, contamination and health outcomes

data-analysis data-visualization powerbi public-health publichealth

Last synced: 02 Feb 2026

https://github.com/gonzalofuentes28/dpeek

Interactive terminal data viewer for CSV, TSV, JSON, and JSONL files

bubbletea cli csv csv-viewer data-analysis data-viewer golang json json-viewer sqlite terminal tui

Last synced: 06 Apr 2026

https://github.com/marcomadera/test-for-random-numbers

Test for random number between 0 and 1

data-analysis statistics

Last synced: 09 Jul 2025

https://github.com/lucaspadoni/9-11-hijackers-social-network-analysis

Social Network Analysis focused on the events of 9/11/2001. By examining publicly available data through SNA techniques, we gain insights into the organizational structure of the terrorist network, offering valuable perspectives on key relationships and connections.

9-11 data-analysis data-analytics graph-theory hijacking network-analysis sna social-network-analysis terrorism terrorist-attacks

Last synced: 19 May 2026

https://github.com/brunomontezano/sleep-cognition-and-functioning

💤 Data analysis of a brief communication published in Psychiatry Research Communications journal by Montezano et al (2023).

bipolar-disorder cognition data-analysis data-visualization data-viz depression ggplot2 pelotasrs psychiatry psychology published-article r sleep ucpel

Last synced: 13 Jun 2026

https://github.com/silianpan/python-data-analysis-course

python data analysis course of drotion-lega

data-analysis jupyter-notebook panda

Last synced: 11 Apr 2025

https://github.com/saob007/tablero_subsidios_servicio_agua

A dashboard built to analyze the distribution and allocation of drinking water and sewerage subsidies granted by the Secretaría de Planeación de la Alcaldía de Sincelejo in 2020, with the aim of identifying patterns in coverage, consumption, billing, and subsidies, and supporting public policy decision-making.

dashboard data-analysis data-visualization looker-studio

Last synced: 31 Jan 2026

https://github.com/oubiche-ishak19/stock_evaluation_python

A Python script to classify companies based on financial metrics like Piotroski F-Score and Stock Valuation, using CSV financial data for analysis and output.

backtesting-frameworks classification csv-processing data-analysis expert-system finance financial-analysis-tools python rule-based-classifier stock stock-market streamlit tkinter-gui yahoo-finance

Last synced: 15 May 2026

https://github.com/advestis/adadjust

A package for fitting any mathematical function to data (1-D only for now).

data-analysis fit python

Last synced: 17 May 2026

https://github.com/cadedupont/mlb-data-analysis

Analysis of a dataset of active MLB players in R.

baseball-analytics data-analysis data-science mlb-stats-api r

Last synced: 18 Jun 2026

https://github.com/diliprk/smartcityvisualization

Data wrangling and data visualization work done for the Smart City project at HBK Saar.

bokeh data-analysis data-visualization python3

Last synced: 15 May 2026

https://github.com/rohithay/titanic-data-analysis

Predict Survival Outcomes from the 1912 Titanic disaster based on each passenger's features, such as sex and age.

data-analysis machine-learning matplotlib pandas scipy-stats statistical-models

Last synced: 15 May 2026

https://github.com/ansh-info/literaturesurvey

Literature Survey Engine uses Semantic Scholar's Recommendation API to provide relevant research article recommendations based on your curated lists of articles.

api api-integration automation data-analysis data-visualization docker docker-compose literature-survey machine-learning mysql paper-recommendations python recommendation-system research-tools semantic-scholar streamlit zotero

Last synced: 10 Apr 2026

https://github.com/hrolive/patc-big-data-analytics-bsc

Introduction to the main concepts and technologies related to Big Data and Data Analytics and its applications to real projects.

analytics bias big-data data-analysis hadoop hpc machine-learning mapreduce nosql python spark spark-streaming visualization

Last synced: 12 Apr 2026

https://github.com/jakebrehm/lemons

🍋 A Python package which makes building GUIs easy peasy lemon squeezy.

data-analysis data-science gui python python3 python37 tkinter tkinter-gui tkinter-python

Last synced: 27 Mar 2025

https://github.com/jakebrehm/ezpz-reducer

🪓 Concatenates and then decimates one or more csv files.

data-analysis data-manipulation data-science python python3

Last synced: 27 Mar 2025