An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/victoryfanfare/car-price-prediction

ML model for estimating the market value of used cars. The project includes data analysis, feature engineering, and a comparison of various machine learning algorithms.

catboost data-analysis jupyter-notebook lightgbm machine-learning pandas python regression

Last synced: 15 Jun 2026

https://github.com/dcs-training/data-wrangling-and-vis-pandas

Introduction to analyzing structured data with the Python libraries pandas, for CSV and TSV data, and ElementTree, for XML data. Go to the readme file

data-analysis data-visualisation data-wrangling python

Last synced: 16 Jun 2026

https://github.com/llnl/cap

HPC workflow that automates the tedious actions of compiling, analyzing, and parsing with bincfg

data-analysis hpc python workflows

Last synced: 17 Jun 2026

https://github.com/juanse0330/registro-pacientes-terapia-python

Python project to automate the registration and analysis of patients in home-based occupational therapy. A tool aimed at the healthcare sector.

automatizacion data-analysis python salud terapia-ocupacional

Last synced: 17 Jun 2026

https://github.com/kheriberto/bedu_dc

Exercises from the "python desde 0" course on the BEDU platform

data-analysis python

Last synced: 18 Jun 2026

https://github.com/preetesh21/spotme

This repository uses the web-based API provided by Spotify to retrieve data and then analyse it.

api data-analysis

Last synced: 18 Jun 2026

https://github.com/ibttf/bayborhood

Interactive map to find the ideal neighborhood in San Francisco based on data.

data data-analysis data-visualization gis mapbox react

Last synced: 18 Jun 2026

https://github.com/httpsnooow/graphs-analysis-neo4j

Challenges from the "Neo4J - Data Analysis with Graphs" course by Digital Innovation One (DIO).

challenge data-analysis data-engineering data-science graph neo4j neo4j-database neo4j-graph

Last synced: 18 Jun 2026

https://github.com/ilhanseyhanx/car-price-prediction-with-machine-learning

🚗 ML-powered car price prediction model with 95.88% accuracy using Random Forest and comprehensive data preprocessing

car-price-prediction data-analysis data-science machine-learning pandas python random-forest regression sklearn

Last synced: 19 Jun 2026

https://github.com/shahaf-f-s/feature-space

A modular framework for combining pandas series features

data-analysis data-science feature-engineering

Last synced: 19 Jun 2026

https://github.com/alinababer/covid19-timeseries-cases-and-deaths-forecasting-

This study is based on confirmed cases and deaths collected from Pakistan. We apply AI-based forecasting models such as ARIMA, LSTM, Prophet, and VAR. Results demonstrate the promising potential of the time series model in forecasting COVID-19 cases and highlight the superior performance of the time series model compared to the LSTM.

arima covid-19 data-analysis data-science data-visualization fbprophet forecasting lstm rnn time-series var vectorautoregression

Last synced: 19 Jun 2026

https://github.com/mahapeth/invest-track

Implementation of a tool for monitoring user activity in the "Invest" information system, for a graduation thesis in the field 01.03.02 Applied Mathematics and Informatics

analitycs app data-analysis data-visualization jupyter-notebook python sites

Last synced: 20 Jun 2026

https://github.com/an4pdm/relatorio-de-vendas

This project was built with the tools offered by Power BI in order to improve my knowledge of ETL. The data used came from the "Kaggle" website.

data-analysis data-visualization database etl powerbi

Last synced: 20 Jun 2026

https://github.com/sakan811/stress-pattern-occurrence-in-english-words

This project is intended to provide English learners with data that allows them to make a data-driven guess when they encounter words whose stress placement they are unsure of

data-analysis data-visualization english english-language english-learning language powerbi powerbi-report powerbi-visuals

Last synced: 20 Jun 2026

https://github.com/dcs-training/r-visualisation-and-stats

This repository contains material from an 8-class course on data visualisation and statistics with R

data-analysis data-visualisation data-wrangling intro-to-programming r statistics

Last synced: 20 Jun 2026

https://github.com/aonurakman/data-analysis-and-ml-algorithms

An exploration of data analysis techniques and standard ML algorithms on QSAR oral toxicity dataset. - 2021 - Yıldız Technical University

classification clustering data-analysis data-mining isolation-forest python regression

Last synced: 20 Jun 2026

https://github.com/evanmathew/northwind-traders

SQL-powered analysis of sales, employee performance, and customer behavior using PostgreSQL window functions. This project uncovers key business insights to optimize decision-making.

case-study data-analysis jupyter-notebook northwind-traders postgresql python-postgresql sql

Last synced: 20 Jun 2026

https://github.com/haseebn19/urban-housing-demand

A full-stack web application for visualizing housing and labour market data

data-analysis data-visualization docker full-stack gradle statistics web webapp

Last synced: 22 Jun 2026

https://github.com/dcs-training/datavisualisationwithr2021

Data Visualisation with R Course (delivered by the Centre in October/November 2021). This workshop focuses on good practice in creating graphs with R and RStudio. Go to the readme file

data-analysis data-visualisation data-wrangling r

Last synced: 23 Jun 2026

https://github.com/emaleckova/emaleckova.github.io

My personal website created with Quarto

biology data-analysis data-viz quarto r

Last synced: 23 Jun 2026

https://github.com/anburocky3/cbse-schools-data

Fetch CBSE Schools in seconds and use it for your data projects

cbse data data-analysis data-science grabber nextjs

Last synced: 24 Jun 2026

https://github.com/lu-m-dev/biostatistics-eda

Exploratory data analysis and visualization system for biostatistical research

biostatistics data-analysis data-visualization eda

Last synced: 25 Jun 2026

https://github.com/imosudi/unsupervised-ml-kmeans-analysis

K-Means clustering analysis using synthetic datasets generated with scikit-learn, including meshgrid visualisation, silhouette score evaluation, and investigation of cluster count and random seed effects.

clustering data-analysis jupyter-notebook kmeans kmeans-clustering machine-learning matplotlib python3 scikit-learn silhouette-score unsupervised-learning

Last synced: 25 Jun 2026

https://github.com/parsabordbar/ctx3docs

The documentation for the Context Tree project.

ai-tools context ctx3 ctx3-docs data-analysis documentation tree workflow

Last synced: 25 Jun 2026

https://github.com/vevdokimovm/python-course-notebooks

Python course practice scripts, Jupyter notebooks and deep learning exercises from Grokking Deep Learning

data-analysis deep-learning jupyter python

Last synced: 27 Jun 2026

https://github.com/chdre/data-analyzer

A small package to analyze and preprocess data.

data-analysis python

Last synced: 28 Jun 2026

https://github.com/imgabreuw/minicurso-python-para-financas

Mini course on Python for finance, provided by Varos.

data-analysis financial-analysis python

Last synced: 29 Jun 2026

https://github.com/manganite/vibespin

VibeSpin is a Python framework for simulating and analyzing 2D lattice spin systems (Ising, XY, and q-state Clock models) with Numba-accelerated Monte Carlo dynamics, correlation/structure diagnostics, and reproducible benchmarking workflows.

clock-model critical-phenomena data-analysis ising-model lattice-models monte-carlo-simulation phase-transitions physics-simulation python scientific-computing spin-models spin-systems statistical-mechanics xy-model

Last synced: 29 Jun 2026

https://github.com/mikkelrask/henryrollins-scraper

FANATIC! A dataset of Henry Rollins' listens on his KCRW radio show, with data dating back to 2017: 496 episodes of weird and rare finds, fast-paced punk, and frog sounds. Includes a scraper that keeps the data up to date with henryrollins.com

archive data-analysis data-visualization music

Last synced: 29 Jun 2026

https://github.com/yash1882/music-store-data-analysis

A project focused on analyzing music store data using SQL ♬

begineer-friendly data-analysis music music-store-data music-store-data-analysis sql-project

Last synced: 28 Jan 2026

https://github.com/tasosfotiadis/time-series-forecasting-for-bitcoin

This project forecasts Bitcoin’s daily closing price using time series models. Data from Jan 2021 to Mar 2022 is processed by converting timestamps, resampling, and handling missing values. LSTM and ARIMA models are evaluated on MAE, RMSE, and MAPE, with LSTM achieving better accuracy while ARIMA is faster in training and inference.

arima bitcoin data data-analysis data-science deep-learning forecasting jupyter-notebook neural-networks python time-series

Last synced: 06 May 2026

https://github.com/anurag-ghosh-12/library_management_system_sql

This project showcases the development of a comprehensive Library Management System utilizing Structured Query Language (SQL). It demonstrates a practical application of relational database principles to efficiently manage library resources, member information, and borrowing/returning transactions.

data-analysis data-visualisation dbms-project sql

Last synced: 29 Jan 2026

https://github.com/srimantapal205/dataengineerwireframedesigns

Data Engineer Wireframe Designs are essential for planning and visualizing data pipelines, architecture, and workflows before implementation.

data-analysis data-engineering dataflow dataflow-programming datapipeline dataprocessing development visualization

Last synced: 29 Jan 2026

https://github.com/andreicirciumaru/best-of-breed

CSV fundamentals screener: schema validation + market-cap weights

csv data-analysis finance pandas python screener

Last synced: 15 Apr 2026

https://github.com/angchekar28/sales-report-power-bi

A Power BI sales report analyzing country-wise and product-wise sales trends. Includes dashboards, decomposition trees, and key influencers analysis for business insights.

dashboard data-analysis data-cleaning data-visualization powerbi sales-report

Last synced: 16 Mar 2026

https://github.com/engineertolulope/us_states_living_ranking_analysis

Python script for analyzing and ranking U.S. states based on factors like cost of living, tax burden, diversity, crime rates, and climate. Uses weighted criteria to identify the best states to live in according to these metrics. Ideal for decision-making on relocation.

data-analysis data-science linear-regression machine-learning python scikit-learn

Last synced: 29 Jan 2026

https://github.com/wareflowx/excel-toolkit

A powerful command-line toolkit for Excel and CSV data manipulation, analysis, and transformation.

data-analysis data-wrangling excel pandas python uv

Last synced: 29 Jan 2026

https://github.com/mattdelaune/powerbi_healthcare_dashboard

Interactive Hospital Insights Dashboard built with Power BI, showcasing comprehensive analysis of patient demographics, treatment outcomes, and hospital performance.

data-analysis healthcare power-bi visualization

Last synced: 29 Jan 2026

https://github.com/abhi227070/medical-insurance-predictor

This project implements a machine learning regression model to predict medical insurance charges based on user-provided details such as smoking status, number of children, gender, and age. The user-friendly interface allows individuals to estimate their average insurance price before purchasing medical insurance.

data-analysis machine-learning machine-learning-algorithms machinelearning python3 regression-models

Last synced: 04 May 2026

https://github.com/isaqueiros/newspapersoldout-predictions-logistic_regression

This notebook is a study of the application of sklearn Logistic Regression model and analysis of metric quality with a focus on the impact of imbalanced data. The problem presented is the analysis of sales of newspapers of a local stand in order to classify the probability of the newspaper being Sold Out or Not, given a set of features.

data-analysis data-imbalance data-science logistic-regression machine-learning python sklearn-library sklearn-logistic-regression

Last synced: 18 Apr 2026

https://github.com/smahala02/magnetism-lab

This repository contains Python scripts and data for analyzing inductance in toroidal coils to calculate the magnetic permeability of ferrite materials. The project helps classify materials as soft or hard magnets based on experimental data.

data-analysis inductance jupyter-notebook magnetism python toroids

Last synced: 29 Jan 2026

https://github.com/shrutiijoshi/marketing-campaign-report

The dataset includes information on campaign types, recipient segments, interactions (clicks, opens, bounces, etc.), and conversion metrics.

dashboard data-analysis data-visualization tableau-public

Last synced: 25 Feb 2026

https://github.com/edumoraes1/comissao-reduzida

Creation of audience segmentation via SQL for enjoei's new reduced-commission feature

bq data-analysis salesforce sql

Last synced: 06 Feb 2026

https://github.com/joannescode/regex_with_py

Learning by practicing with Regex (Python)

data-analysis python3 regex

Last synced: 30 Jan 2026

https://github.com/surajwate/datalab

DataLab is a versatile toolkit designed to simplify data exploration, analysis, and visualization for data scientists.

data-analysis data-science python visualization

Last synced: 30 Jan 2026

https://github.com/mfakhriazhar/us-companies-revenue-dashboard

This project is a data visualization dashboard built using Power BI that highlights lists of the largest companies in the United States by revenue. The goal is to provide an interactive overview of company performance across industries, focusing on revenue, employee metrics, and industry trends.

dashboard data-analysis data-visualization largest-companies-us powerbi revenue united-states

Last synced: 30 Jan 2026

https://github.com/mfakhriazhar/healthcare-dashboard-project

This project is a comprehensive data analysis and visualization of healthcare data using Power BI. It focuses on understanding patient distribution, billing trends, and hospital performance through a clean and interactive dashboard.

dashboard dashboardreporting data-analysis datacleaning excel powerbi powerquery

Last synced: 30 Jan 2026

https://github.com/touchesir/twitter_physicalactivity

Companion Data / Analysis for "Monitoring Physical Activity Levels using Social Media Data"

data-analysis twitter

Last synced: 30 Jan 2026

https://github.com/ljadhav25/decision-tree-random-forest-algorithm-data-science-

This repository contains an implementation of decision tree and random forest algorithms from scratch in Python. Decision trees and random forests are popular machine learning algorithms used for classification and regression tasks. The goal of this project is to provide a clear and understandable implementation of these algorithms

data-analysis data-science decision-trees machine-learning-algorithms matplotlib numpy pandas python random-forest-classifier

Last synced: 15 Apr 2026

https://github.com/aygp-dr/values-compass

Tools for exploring and analyzing Anthropic's Values-in-the-Wild dataset for AI ethics research

ai-ethics anthropic-claude data-analysis nlp values

Last synced: 25 Feb 2026

https://github.com/manishabarse/hr_data_analysis

Used Microsoft SQL Server Management Studio and Power BI

data-analysis powerbi sql ssms

Last synced: 30 Jan 2026

https://github.com/nehar-2404/airbnb-nyc-eda-ml

This project analyzes Airbnb listings in New York City to uncover key insights about pricing, host activity, and neighborhood trends. It covers data cleaning, EDA, and basic machine learning to predict listing prices.

airbnb data-analysis eda machine-learning matplotlib pandas pyhton seaborn visualization

Last synced: 15 Apr 2026

https://github.com/jcaperella29/jc_bioinformatics_hub

A personal hub to showcase my bioinformatics applications including RNA-Seq, ATAC-Seq, and miRNA-Seq analysis tools. Powered by simple HTML, CSS, and JavaScript with a biotech-themed design.

atac-seq bioinformatics biotech data-analysis github-pages portal rna-seq webapp

Last synced: 25 Feb 2026

https://github.com/jaseel342/ecommerce_sales_dashboard

The E-commerce Sales Dashboard project offers a comprehensive view of e-commerce sales performance using interactive Power BI dashboards. It focuses on key metrics like YTD Sales, YTD Profit, YTD Profit Margin, and Quantity of Products sold, analyzing data by product categories, states, and regions.

data-analysis data-modelling dax-expression excel power-query powerbi visualization

Last synced: 07 Feb 2026

https://github.com/aavishkarmahajan/sql

SQL code assignments and practice questions from SQL courses, SQL data analysis

data-analysis sql sql-server

Last synced: 07 Feb 2026

https://github.com/gurpreet17/uc-davis-sql-for-data-science-specialization

Completed the SQL Basics for Data Science Specialization from the University of California, Davis, gaining proficiency in Data Analysis, SQL, Apache Spark, and Delta Lake.

apache-spark bigdata data-analysis data-science delta-lake sqlite

Last synced: 15 Apr 2026

https://github.com/auliannee/new-york-uber-pickups-analysis

This repository contains projects related to data collection, quality checking, manipulation, analysis, and visualization.

data-analysis data-science ipython-notebook jupyter-notebook python

Last synced: 07 Feb 2026

https://github.com/luminati-io/indeed-dataset-samples

A sample dataset of over 1000 Indeed job listings, extracted using the Bright Data API, ideal for market analysis and growth.

api data-analysis datasets indeed jobs web-scraping

Last synced: 07 Feb 2026

https://github.com/tralahm/parliament-2017-dataset

Concise, Clean data sets of the 2017 Kenyan General Election results for the Members of the Senate and National Assembly Composition

csv-parsing data-analysis data-visualization datasets election-data ipynb-jupyter-notebook kaggle-dataset kenya-constituencies kenya-counties matplotlib python3 tralahtek

Last synced: 31 Jan 2026

https://github.com/jofaval/titanic-disaster

Data Analysis of the famous Titanic Disaster in 1912 with Machine Learning

classification data-analysis data-science data-visualization google-colab kaggle machine-learning python scikit-learn

Last synced: 15 Apr 2026

https://github.com/jujulis18/olympicsmedalsdashboard

Olympic Dashboard – Paris 2024 is an interactive dashboard for exploring the performances of medal-winning athletes at the Paris 2024 Summer Olympic Games.

dashboard data-analysis data-visualization eda olympic python streamlit

Last synced: 31 Jan 2026

https://github.com/amishidesai04/flipkart-mobile-sales-analysis

Flipkart Mobile Sales Analysis is a Tableau project that visualizes mobile sales data from Flipkart. It highlights trends in brand performance, pricing, ratings, and customer preferences. The interactive dashboard helps users explore key insights for data-driven decisions in e-commerce and retail.

dashboard data-analysis data-visualization storyboard tableau

Last synced: 31 Jan 2026

https://github.com/traore-07/fedex-sales-analysis

Analysis of the FedEx Sales Transaction

data-analysis data-visualization sales-analysis tabeau

Last synced: 31 Jan 2026

https://github.com/cca/panopto-session-data

analyzing Panopto session data for retention purposes

data-analysis ipython-notebook video

Last synced: 07 Feb 2026

https://github.com/shafaq-aslam/pandas-lab

A comprehensive collection of Jupyter notebooks exploring Pandas, from Series and DataFrames to data cleaning, aggregation, merging, and visualization. A complete hands-on guide for mastering data manipulation and analysis with Python.

analytics data-analysis data-cleaning data-science data-visualization dataframe jupyter-notebook machine-learning pandas pandas-dataframe pandas-library pandas-series python python3 series

Last synced: 15 Apr 2026

https://github.com/allanotieno254/bank-loan-analysis-dashboard-power-bi

An interactive Power BI dashboard that analyzes bank loan data to provide insights into approval trends, default risks, and customer profiles. Designed to assist financial institutions in making data-driven lending decisions.

bank-loans business-intelligence dashboard data-analysis financial-analysis power-bi risk-assessment

Last synced: 31 Jan 2026

https://github.com/ginanti-riski/streamlit_datapenyewaansepeda

Bike Sharing Analysis is a project that aims to understand bike rental patterns based on various factors such as weather, season, and day. The project uses data analysis techniques to gain deeper insight into bike rental trends.

data-analysis data-analysis-python data-science data-visualization python streamlit

Last synced: 15 Apr 2026

https://github.com/malthejorgensen/repx

Python regular expression file transformer

command-line-tool data-analysis text-processing

Last synced: 31 Jan 2026

https://github.com/gastonstat/stat133

STAT 133: Concepts in Computing with Data

data-analysis data-science data-visualization r-programming syllabus

Last synced: 25 Feb 2026

https://github.com/steviecurran/gbt-scripts

IDL scripts for the reduction of Green Bank Telescope data

data-analysis data-compression data-visualization radio-astronomy spectroscopy

Last synced: 31 Jan 2026

https://github.com/nikitalpopov/evotor_champ

solution for evotor data challenge

data-analysis data-science python scikit-learn

Last synced: 15 Apr 2026

https://github.com/alex-pierron/ekip-enedis-genai

Repository for the team "Ekip" during the H-GenAI Hackathon 2025 organized at SIA Partners, Paris, France

amazon-nova artificial-intelligence aws aws-lambda data-analysis database generative-ai mistral nlp

Last synced: 15 Apr 2026

https://github.com/ajmannust41288/data-analyst

Data Analyst, Microsoft Professional expert, Power BI Desktop, Tableau, and dashboards with ChatGPT-4 AI use

business-analytics data-analysis data-analyst data-analytics eda

Last synced: 01 Feb 2026

https://github.com/tusharpandey003/chat_analysis

Analysis of a group chat with respect to individual members of the group

chat-analysis chat-analyzer data-analysis data-science streamlit whatsapp whatsapp-chat whatsapp-web

Last synced: 01 Feb 2026

https://github.com/axsk/geekgraph

parse, cluster and visualize boardgamegeek.com user profiles

data-analysis scraper

Last synced: 01 Feb 2026

https://github.com/emediongfrancis/unified-data-lake-implementation-gcp-kafka-airflow-snowflake

This project demonstrates the integration of data from multiple sources into a unified data lake. The project showcases the use of Apache Airflow for ETL tasks, Google Cloud Storage as a data lake, Apache Kafka for data movement automation, Snowflake for data warehousing, and Google BigQuery for analysis.

airflow data-analysis data-warehousing etl etl-pipeline gcp-storage kafka snowflake value variety

Last synced: 07 Feb 2026

https://github.com/bineet-ratna-shakya/data-science-salary-analysis

analyzing a dataset containing salaries of data science professionals from 2020 to 2023.

data-analysis data-science data-visualization jupyter numpy pandas python

Last synced: 01 Feb 2026

https://github.com/tapas-gope/global-superstore-sales

This repository contains a Power BI dashboard designed to provide comprehensive insights into sales performance across various regions, segments, and products. The dashboard utilizes a variety of visualizations, including bar charts, line charts, maps, and tables, to effectively communicate key metrics and trends.

business-intelligence data-analysis data-modeling data-visualization financial-reporting powerbi sales-analysis

Last synced: 07 Feb 2026

https://github.com/ludreinsalvador/life-expectancy-data-analysis

Contains Power BI dashboards analyzing global life expectancy trends, mortality rates, and health expenditures. Using a dataset sourced from Google Sheets, the project explores the impact of economic and healthcare factors on longevity.

dashboard data-analysis data-visualization healthcare-analysis life-expectancy powerbi

Last synced: 25 Feb 2026

https://github.com/asghar-rizvi/world-energy-consumption-analysis-1965-2023-

An in-depth analysis of global energy consumption trends from 1965 to 2023, using data from various countries and regions.

data-analysis data-analysis-python data-science python real-world-data real-world-data-analysis real-world-problem-solving real-world-project visulaization

Last synced: 15 Apr 2026

https://github.com/rissh/titanicsurvivalpredictionusingml

Predicting Titanic passenger survival through machine learning. This project includes data preprocessing, exploratory data analysis, feature engineering, and model training using Python. 🚢

data data-analysis data-science data-visualization dataanalysis jupiter-notebook machine-learning machine-learning-algorithms machinelearning matplotlib numpy pandas prediction prediction-model python python3 seaborn tenserflow tflearn titanic

Last synced: 01 Feb 2026