An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/touppercase78/salary-prediction-collection

Salary predictions with ML models and analyses on datasets from several other GitHub repos

data-analysis data-visualization datasets machine-learning python3 regression-models

Last synced: 02 May 2026

https://github.com/ramonanf/tc1002s_semanatec

Computational tools: The art of analytics

data-analysis data-visualization jupiter-notebook pandas-python

Last synced: 15 Jun 2025

https://github.com/madrury/hot-sauce

Simulation of a Hot Sauce Spiciness Dataset

data-analysis data-science data-visualization dataset machine-learning

Last synced: 16 May 2026

https://github.com/chanupadeshan/atliq-bank-insights

A complete data analytics and A/B testing project for Atliq Bank using synthetic customer and transaction data. Includes data cleaning, EDA, and statistical evaluation of a targeted marketing campaign.

ab-testing data-analysis data-visualization eda python3 statistics

Last synced: 04 Jul 2026

https://github.com/eco786786/salaries

This analysis explores the factors influencing salaries for data professionals from 2020 to 2024, including job titles, experience levels, remote work ratios, employment types, company locations and sizes. Using data from Kaggle, the project uncovers trends and insights to guide both companies and professionals in the tech industry.

data-analysis git postgresql powerbi

Last synced: 19 May 2026

https://github.com/whisplnspace/insightgenie

InsightGenie is an AI-powered data analyst that lets you upload files, ask questions, and get insights with visualizations

data-analysis data-science data-visualization deployment gemini-api huggingface nlp

Last synced: 19 Jun 2025

https://github.com/mimi-netizen/python-and-machine-learning-in-financial-analysis

This comprehensive repository covers financial data analysis using Python and machine learning techniques, including time series modeling, portfolio optimization, risk assessment, credit risk prediction, and deep learning applications in finance.

data-analysis data-science data-visualization finance financial-analysis financial-data financial-modeling

Last synced: 19 May 2026

https://github.com/lucasfloresc/final_project

This is the final project of the Ironhack Bootcamp. In this project I applied all the methods and techniques learned in the bootcamp, such as web scraping and API extraction, data cleaning and processing with Python, Python logic, machine learning, and data visualization, all displayed in Streamlit for a more user-friendly interface.

data-analysis data-visualization machine-learning python streamlit webscraping

Last synced: 08 May 2026

https://github.com/jabulente/tanzania-geographical-zones

This project provides a geospatial visualization of Tanzania's geographical zones and regions. It uses geospatial data to map each zone, display regions, and annotate them for easy identification. The visualizations include simulated data to demonstrate thematic mapping techniques.

ai data-analysis data-science data-visualization geopandas geospatial location matplotlib ml python tanzania tanzania-geographic tanzania-locations

Last synced: 19 May 2026

https://github.com/mysftz/statistics-analysis

A statistical analysis of a dataset in Python, covering probability.

data-analysis matplotlib python python3 statistical-analysis

Last synced: 29 Jun 2025

https://github.com/galahad20/b244006e_analisis_data

Data analysis project for the Dicoding course "Belajar Analisis Data dengan Python" (Learning Data Analysis with Python). I learned to analyze data and visualize it to extract meaningful insights.

data-analysis data-analytics python streamlit

Last synced: 06 Apr 2026

https://github.com/mkk-1817/cvip-ds-exploratory_data_analysis-terrorism

This repository explores global terrorism trends by analyzing the Global Terrorism Database to uncover temporal patterns, identify top terrorist groups, examine attack types, and gain insights into geographical and success/failure dynamics.

coderscave data-analysis data-science data-visualization eda exploratory-data-analysis python terrorism-analysis

Last synced: 19 Jun 2025

https://github.com/namratagulati/tweets_analysis

This repository focuses on sentiment analysis of Twitter data using Python, Natural Language Processing (NLP), and the Natural Language Toolkit (NLTK). The goal is to extract valuable insights from social media discussions, such as word frequency, hashtag trends, and sentiment patterns.

analysis data-analysis natural-language-processing nlp-machine-learning nltk-corpus nltk-python sentiment-analysis twitter-sentiment-analysis

Last synced: 07 Aug 2025

https://github.com/celineboutinon/lafleche-et-associes

OpenClassrooms Data Analyst 2022-2023 - Project 7 using KNIME Analytics Platform

data-analysis data-analytics data-visualisation knime-analytics-platform no-code rgpd

Last synced: 08 Feb 2026

https://github.com/iamsainikhil/data-visualization

Visualization of Web data using Python

data-analysis data-visualization python webscraping

Last synced: 13 Jun 2026

https://github.com/srvcl/lung-cancer-survival-analysis

Data cleaning of a dataset and survival analysis in R

data-analysis data-science data-visualization r survival-analysis

Last synced: 11 May 2026

https://github.com/berkekaragoz/media-investments-data-analysis

Advertisement Investments Distribution of Turkey by Medium

data-analysis r

Last synced: 19 Aug 2025

https://github.com/lorinczakos/sql-projects

A collection of SQL scripts I wrote that were approved during the GoIT Romania Data Analyst course

bigquery cte data data-analysis dbeaver marketing-analytics postgresql project-repository sql vscode

Last synced: 16 May 2026

https://github.com/nferno55/mock-data-governance

Working with messy data: applying data quality practices to clean it up and practicing SQL/Python automation. YAML will soon be used for metadata validation.

data-analysis database-management metadata python sql sqlite3 yaml

Last synced: 16 May 2026

https://github.com/tabibyte/aoty-highest-rated-albums-data-analysis

Data Analysis of AOTY Highest Rated Albums

albums aoty data-analysis music

Last synced: 10 Sep 2025

https://github.com/madi-s/tennispredictor

Program to predict outcomes of major tennis matches.

data-analysis prediction-algorithm python scraper tennis webdriver

Last synced: 06 Jul 2025

https://github.com/athari22/multivariable_regression_and_valuation_model_

Multivariable regression model using Python to analyze and predict Boston housing prices based on various socioeconomic and environmental features.

data-analysis data-analysis-python housing-prices housing-prices-competition machine-learning pandas pandas-python plotly python regression-models seaborn seaborn-python sklearn

Last synced: 17 Jun 2025

https://github.com/jabulente/kruskall-wallis-test

This repository contains a project that provides a reusable Python function to perform the Kruskal-Wallis H-test across multiple continuous variables, grouped by a categorical feature.

data-analysis data-science eda hypothesis-tests kruskal-wallis kruskals-algorithm scipy-stats statistics

Last synced: 22 Jul 2025
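The Kruskal-Wallis entry above describes a reusable function that runs the H-test over several continuous columns grouped by one categorical feature. A minimal pure-Python sketch of that idea follows; it is not the repository's code, and the function names (`average_ranks`, `kruskal_h`, `kruskal_by_category`) and the list-of-dicts row format are hypothetical. It computes only the tie-corrected H statistic; in practice SciPy's `scipy.stats.kruskal` also returns the p-value.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least two observations")
    ranks = average_ranks(pooled)
    # Sum over groups of (rank total)^2 / group size.
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = defaultdict(int)
    for x in pooled:
        counts[x] += 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else float("nan")


def kruskal_by_category(rows, category, columns):
    """Run the H-test on each column, grouping the rows by `category`."""
    results = {}
    for col in columns:
        buckets = defaultdict(list)
        for row in rows:
            buckets[row[category]].append(row[col])
        results[col] = kruskal_h(list(buckets.values()))
    return results
```

For two groups `[1, 2, 3]` and `[4, 5, 6]` with no ties, H works out to 27/7 (about 3.857). Looping over columns this way mirrors the "multiple continuous variables, one categorical feature" shape the entry describes.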

https://github.com/nafisrayan/crypto-trading-platform

This React Crypto Exchange Template is designed to provide a solid foundation for building a comprehensive cryptocurrency exchange platform. With its sleek and modern design, this template is perfect for anyone looking to create a user-friendly and intuitive trading experience.

crypto dashboard data-analysis data-visualization react template

Last synced: 16 May 2026

https://github.com/j-faria/bicerin

Working on the RV challenge in Torino

data-analysis gp radial-velocity rv-challenge

Last synced: 07 Apr 2026

https://github.com/carvalhoandre/coletor-tweets

Created to collect and store tweets using the Twitter API. Originally inspired by the use case in the book Um Voluntário na Campanha de Obama (A Volunteer in the Obama Campaign), this project aims to demonstrate the importance of monitoring on X. The collector lets you search for tweets about any term you choose.

data-analysis mongodb python twiter-analysis twitter

Last synced: 19 May 2026

https://github.com/prasad-chavan1/bank_data_analysis_r

Bank data analysis in R language

data data-analysis data-science r

Last synced: 24 Feb 2025

https://github.com/danitilahun/exploratory-data-analysis-projects

This repository contains a collection of my personal Exploratory Data Analysis (EDA) projects. Each project involves exploring various datasets to gain insights, uncover patterns, and visualize trends.

data-analysis data-science data-visualization exploratory-data-analysis python

Last synced: 16 May 2026

https://github.com/v41bh4vr4jput/data-analysis-with-python

This repository is a comprehensive collection of data analysis projects and tutorials using Python's most powerful libraries: NumPy, Pandas, Seaborn, and Matplotlib. It is designed to help you explore, clean, visualize, and analyze data efficiently.

api data data-analysis data-visualization matplotlib numpy pandas python sakila-db seaborn

Last synced: 09 Apr 2026

https://github.com/chaganti-reddy/ai-prototype-customer-segmentation

An artificial intelligence prototype: a product-based model for customer segmentation in the e-commerce industry.

artificial-intelligence cluster-analysis customer-segmentation data-analysis machine-learning product-based prototype

Last synced: 13 Mar 2025

https://github.com/gmasson/datadash

DataDash is a JavaScript and CSS library for building interactive dashboards that display dynamic data on web pages.

dashboard dashboard-application dashboards data-analysis data-science data-visualization javascript

Last synced: 08 Aug 2025

https://github.com/sweta-kaundilya/911-calls-capstone-project

For this capstone project we will be analyzing some 911 call data from Kaggle.

data data-analysis data-visualization jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 28 Apr 2026

https://github.com/sweta-kaundilya/sql_projects_data_analytics

This repository contains SQL portfolio projects

data-analysis mysql-database mysql-workbench

Last synced: 10 Sep 2025

https://github.com/al-ogr/sf_pr2_job_analysis_hh_sql

SkillFactory DataScience PROJECT-2. Analysis of job vacancies from HeadHunter

data-analysis data-science ipynb plotly python sql

Last synced: 19 May 2026

https://github.com/lmuffato/jiboia

Jiboia is a Python package for automatically normalizing and optimizing DataFrames efficiently.

data-analysis data-science dataframe normalization pandas python

Last synced: 19 May 2026

https://github.com/kakri787/alcoholism-and-grade-analysis

A mini project for a university data science module in which we analyzed the relationship between alcohol consumption in students and their academic performance, using exploratory data analysis and machine learning techniques to see whether we could predict students' grades.

data-analysis data-science data-vizualisation lasso-regression machine-learning neural-network

Last synced: 12 Apr 2025

https://github.com/xenon1919/credit-card-fraud-detection

Credit Card Fraud Detection is a machine learning project to predict fraudulent credit card transactions. It handles imbalanced data using undersampling and applies Logistic Regression and XGBoost models. With an AUC of 0.98, it offers robust fraud detection. Includes a Streamlit app for real-time predictions.

data-analysis machine-learning python

Last synced: 14 May 2026
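The credit card fraud entry above mentions handling imbalanced data with undersampling. As an illustration only (not the project's code), random undersampling keeps every row of the smallest class and draws an equal-sized random subset from each larger class. The `random_undersample` helper and its `seed` parameter are hypothetical names.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest one."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = min(len(v) for v in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # sample() draws without replacement; the smallest class is kept whole.
        for x in rng.sample(xs, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

With 95 legitimate and 5 fraudulent rows, the result holds 5 of each. The trade-off is that most majority-class data is discarded, which is why it is usually applied only to the training split.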

https://github.com/nurulashraf/linear-regression-insurance-premium

This analysis applies simple linear regression to explore the relationship between age and insurance premium. It includes model training, visualisation, and evaluation using MSE and RMSE to assess prediction accuracy.

beginner-project data-analysis insurance-data linear-regression machine-learning matplotlib predictive-modeling python regression-models scikit-learn

Last synced: 05 May 2026
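The insurance premium entry above fits a simple linear regression of premium on age and evaluates it with MSE and RMSE. A minimal sketch of those steps, assuming plain lists of numbers rather than the project's actual data or scikit-learn pipeline; `fit_simple_linear` and `mse_rmse` are hypothetical helper names. The slope is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x)^2, and RMSE is the square root of MSE.

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse_rmse(y_true, y_pred):
    """Mean squared error and its square root (same units as y)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse, math.sqrt(mse)
```

Fitting `[1, 2, 3, 4]` against `[3, 5, 7, 9]` gives intercept 1.0 and slope 2.0. RMSE is often reported alongside MSE because it is expressed in the same units as the premium, which makes the error easier to interpret.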

https://github.com/prathmesh2507/ctc-hackthon

A data-driven system designed to reduce overcrowding and optimize urban public transport using real-world geospatial data and intelligent simulation.

dashboard data-analysis data-visualization python streamlit

Last synced: 16 May 2026

https://github.com/datalopes1/desafio_delivery

A challenge from the Universidade dos Dados subscription club (Clube de Assinaturas) that simulates the real-world demands placed on a data analyst

data-analysis jupyter python

Last synced: 06 Mar 2026

https://github.com/joe-stifler/llm-sig-playground

This repository is a collaborative space for MSc Earth Science students at Imperial College London to experiment with and apply Large Language Models (LLMs) to real-world Earth Science problems.

data-analysis earth-science llms machine-learning research-automation

Last synced: 29 Mar 2025

https://github.com/mansiikumarii/mysql

A curated collection of MySQL scripts covering DDL, DML, and DRL operations. Ideal for beginners to practice and understand core SQL concepts.

backend data-analysis data-modeling database database-integration database-management database-performance database-schema mysql mysql-admin mysql-database orm php-mysql query-optimization rdbms sql sql-query sql-script stored-procedure

Last synced: 19 May 2026

https://github.com/the-pinbo/dimensionalityredux-pca-vs-autoencoders

Comparative study of PCA and Autoencoders for effective dimensionality reduction, assessed through PSNR and SSIM metrics.

autoencoder-mnist autoencoders data-analysis dimensionality-reduction image-compression mnist neural-networks pca psnr ssim

Last synced: 13 May 2025
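The PCA vs. autoencoder entry above assesses reconstruction quality with PSNR and SSIM. PSNR is defined as 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value (255 for 8-bit images). A small sketch over flattened pixel lists, assuming 8-bit grayscale input; the `psnr` helper is hypothetical, and SSIM is omitted because it needs windowed local statistics.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means a closer reconstruction, so comparing PCA and an autoencoder at the same compressed size amounts to comparing their average PSNR on held-out images.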

https://github.com/julie-fliorko/rockbuster-insights-sql-project

Data analysis using PostgreSQL to help Rockbuster Stealth LLC identify revenue trends, customer insights, and rental behavior patterns.

data-analysis postgresql sql

Last synced: 22 Jul 2025

https://github.com/georgiifirsov/educational-research-work

Educational research project from the 3rd year (6th semester). Topic: ARMA models in time series analysis

arma data-analysis jupyter-notebook python time-series time-series-analysis tsa

Last synced: 27 Apr 2026

https://github.com/jagoda11/elastic-vision

This repository contains a full-stack application for exploring data from Elasticsearch indices and visualizing it with charts and graphs. The backend is built with Node.js, and the frontend is powered by React.

backend chartjs dashboard-development data-analysis data-visualization docker elasticsearch frontend fullstack javascript material-ui monorepo mui-x node pie-chart react restful-api tables

Last synced: 09 Apr 2026

https://github.com/clchinkc/zombie

Personal project, Python, NumPy, Matplotlib, Pygame, Scikit-learn, TensorFlow, Docker

algorithms data-analysis docker machine-learning matplotlib numpy pygame python sklearn tensorflow zombie-simulation

Last synced: 05 Apr 2026

https://github.com/beatrice-b-m/bea-tools

🐝 tools made by, and for, bea 🐝 A Python package of random functions and tools that I use regularly. Data science / analysis focused since, ya know, I'm a data scientist c:

data-analysis data-science data-visualization

Last synced: 15 Jan 2026

https://github.com/mindlessmuse666/missing-data-processing

A project on handling missing values in Titanic passenger data using the Python libraries Matplotlib and Seaborn.

data-analysis data-visualization matplotlib missing-values-analysis missing-values-handling pandas python seaborn titanic

Last synced: 16 May 2026

https://github.com/jhrcook/protein-language-models

Experimenting with protein language model predictions

data-analysis protein-language-model variant-effect-prediction

Last synced: 28 May 2026

https://github.com/amishidesai04/interactive-data-visualisation-tool

A Java-based application leveraging JavaFX to create dynamic and interactive charts, including pie charts, bar charts, and line graphs. Ideal for visualizing various datasets, this tool offers customizable features and a user-friendly interface. Easily input and manage data, customize chart styles, and observe trends and patterns effectively.

charts data-analysis data-visualisation data-visualization-project gui java javafx visualization-tools

Last synced: 17 Apr 2026

https://github.com/sukitsubaki/screen-time-tracker

A minimalist Python tracker that records the usage time of various applications and provides insights into your computer usage habits.

application-usage data-analysis monitoring productivity python python-cli screen-time time-tracking

Last synced: 12 Apr 2025

https://github.com/andrewzgheib/football-database-analysis

Football database utilizing PostgreSQL and Pandas for data management, with PowerBI for intuitive KPI visualization

data-analysis data-visualization database pandas pgsql postgr powerbi sql

Last synced: 04 Apr 2025

https://github.com/nerooc/device-downtime-detection

Repository for the project in the "Sztuczne Sieci Neuronowe" (Artificial Neural Networks) course

data-analysis detection-model recurrent-neural-networks

Last synced: 22 Mar 2025

https://github.com/timkong21/siemens-mobility-operations-industrial-engineer-simulation

Operations Industrial Engineer job simulation with Siemens Mobility. Includes time study analysis to identify assembly bottlenecks (Task 1) and a proposed layout redesign to improve efficiency without automation (Task 2).

data-analysis forage industrial-engineering job-simulation manufacturing process-improvement production-engineering python siemens time-analysis

Last synced: 19 May 2026

https://github.com/lopez86/datascienceexamples

Examples of data science and data analysis topics using data from a range of sources.

data-analysis data-science pandas scikit-learn tutorial visualization

Last synced: 13 Apr 2026

https://github.com/sharduljunagade/human-activity-recognition

This repository contains the code for the Assignment-1 of the course ES 335: Machine Learning 2024 at IIT Gandhinagar taught by Prof. Nipun Batra.

data-analysis data-collection decision-trees groq-api human-activity-recognition jupyter langchain-python machine-learning pandas prompt-engineering python sklearn tsfel

Last synced: 08 Apr 2026

https://github.com/debjyotisaha/data-analytics-projects-phase-2

Developed and showcased various data analytics projects, including data preprocessing, exploratory data analysis, and visualization. Utilized tools such as Python, Pandas, NumPy, and Matplotlib to derive actionable insights and demonstrate problem-solving capabilities.

data-analysis data-preprocessing eda matplotlib numpy pandas python seaborn

Last synced: 09 Apr 2026

https://github.com/nagar2nd/zomato-bangalore-analysis-tableau

Analysing restaurant data in Bengaluru to enhance customer satisfaction by optimizing the restaurant experience. The focus is on improving the popularity of different cuisines, reducing delivery times, and boosting restaurant ratings. An interactive Tableau dashboard has been developed to help Zomato identify key areas for improvement.

data-analysis data-visualization tableau

Last synced: 05 Mar 2026

https://github.com/shubhamgoyal575/credit-card-fraud-detection

📌 Credit Card Fraud Detection using Machine Learning. This project focuses on detecting fraudulent credit card transactions using machine learning models like Random Forest, XGBoost, and Deep Learning. The dataset is preprocessed to handle class imbalance, and multiple models are evaluated based on ROC AUC Score and F1 Score.

adaboost-classifier artificial-neural-networks credit-card-fraud data-analysis data-cleaning data-preprocessing data-science data-visualization deep-learning exploratory-data-analysis lightgbm machine-learning machine-learning-algorithms random-forest-classifer scikit-learn tensorflow xgboost

Last synced: 08 Feb 2026

https://github.com/swatisinghit/e-commerce-trend-analysis-for-target

An exploratory and in-depth study of the E-Commerce sales data for a Brazilian store using SQL.

bigquery data-analysis mysql sql

Last synced: 19 May 2026

https://github.com/amarlearning/exploring-the-evolution-of-linux

Data Analysis about the development of the Linux operating system by exploring its Git repository history.

cleaning-data data data-analysis data-wrangling datacamp first-commit git-history linux

Last synced: 12 May 2026

https://github.com/imnotamr/datasets-used

A comprehensive collection of datasets for machine learning and data science projects, covering topics from advertising and sales to health and sports analytics

ai classification data-analysis data-science data-visualization deep-learning jupyter-notebook machine-learning models python regression-models

Last synced: 19 May 2026

https://github.com/mulukensholaye/spark_kafka_streaming_csv

Real-time streaming data analysis pipeline that integrates Apache Spark's streaming library to read records from a Kafka topic

apache-kafka apache-spark data-analysis python3 realtime-messaging

Last synced: 19 May 2026

https://github.com/syed-amjad-ali/airbnb-listing-analysis

Analyzing AirBnB listings in Paris to determine the impact of recent regulations

business-intelligence data-analysis jupyter-notebook maven-analytics python

Last synced: 19 May 2026

https://github.com/hawmex/aut_data_and_information_analysis_project

This repository contains the files of my project for the "Data & Information Analysis" course at AUT (Tehran Polytechnic).

data-analysis data-science k-means outlier-detection python

Last synced: 19 May 2026

https://github.com/halyusa16/sql-employee-insights

This project dives into employee data to uncover actionable insights using SQL. It mimics real-world HR and business analysis tasks, from salary comparisons to workforce demographics and potential cost-cutting strategies.

data-analysis mysql sql

Last synced: 11 Apr 2025

https://github.com/devexpress-examples/wpf-pivotgrid-how-to-display-underlying-data

This example demonstrates how to obtain the records from the control's underlying data source for a selected cell or multiple selected cells.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 19 May 2026

https://github.com/samir-atra/share-lm_dataset_analysis

Analysis, studies and optimizations on the ShareLM extension dataset

data-analysis data-visualization gemma3n huggingface huggingface-transformers pandas

Last synced: 19 May 2026

https://github.com/rita94105/smart_contract_vulnerability_detector

Smart contracts are pivotal in blockchain applications but are prone to vulnerabilities that can lead to significant losses. SmartGuard: Multi-Stage Smart Contract Vulnerability Detection tackles this issue by developing a machine learning framework to identify eight vulnerability types using datasets from Kaggle and Hugging Face.

data-analysis machine-learning smart-contracts streamlit vulnerability-detection

Last synced: 01 Aug 2025

https://github.com/prakshal0809/sql-data-analysis-project

This project involves analyzing pizza sales data using SQL to answer various data analysis questions, covering SQL skills from foundational to advanced.

data-analysis sql

Last synced: 26 Jun 2025

https://github.com/borjamome/radiografia-madrid

Analysis of the population, economy, and society of Madrid using R.

data-analysis data-visualization madrid r

Last synced: 17 Jun 2025

https://github.com/singingsandhill/data_analysis

Data analysis: a collection of personal projects

data-analysis python

Last synced: 19 May 2026

https://github.com/rorrell/spotifyhistory

A Jupyter Notebook where I wrangle some data and plot a chart to draw some conclusions about a user's Spotify history

data-analysis data-visualisation data-wrangling jupyter-notebook python3

Last synced: 19 May 2026

https://github.com/ygalvao/uow_ai_final_project

This was my Final Project for the Artificial Intelligence Diploma program of The University of Winnipeg - Professional, Applied and Continuing Education (PACE).

data-analysis data-analytics dbscan elections k-means k-means-clustering machine-learning som som-clustering

Last synced: 10 Jul 2025

https://github.com/riborings/uranouchi42microdiversity

This repository contains the Bash, R, and Julia scripts used to explore the microdiversity of the prokaryotic community at Uranouchi Inlet (a 42-sample time series) by means of shotgun metagenomic sequencing, carried out under the supervision of the Ogata Lab.

big-data data-analysis data-visualisation diversity-analysis marine-ecology marine-ecosystem metagenomics microbiome-analysis prokaryotic-genomes

Last synced: 29 Oct 2025