An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/brunomontezano/sleep-cognition-and-functioning

💤 Data analysis for a brief communication published in the journal Psychiatry Research Communications by Montezano et al. (2023).

bipolar-disorder cognition data-analysis data-visualization data-viz depression ggplot2 pelotasrs psychiatry psychology published-article r sleep ucpel

Last synced: 13 Jun 2026

https://github.com/lucaspadoni/9-11-hijackers-social-network-analysis

Social Network Analysis focused on the events of 9/11/2001. By examining publicly available data through SNA techniques, we gain insights into the organizational structure of the terrorist network, offering valuable perspectives on key relationships and connections.

9-11 data-analysis data-analytics graph-theory hijacking network-analysis sna social-network-analysis terrorism terrorist-attacks

Last synced: 19 May 2026

https://github.com/jyrki69pro/pdf-insight-agent

📄 Extract insights from PDFs effortlessly with this AI-powered summarizer, transforming documents into structured, actionable points.

agent-based-model agentic-ai agentic-workflow agents ai-agent data-analysis finance-management financial-analysis generative-ai langchain langgraph llama3 llm multiagent-systems pdf phidata python toolcalling

Last synced: 11 Apr 2026

https://github.com/idaraabasiudoh/drug_prescribtion_decision_tree_model

This repository contains a machine learning project focused on classifying drugs based on patient characteristics using a Decision Tree classifier. The project uses Python and popular data science libraries such as scikit-learn, pandas, and matplotlib.

data-analysis jupyter-notebook machine-learning python3 scikit-learn

Last synced: 10 Apr 2026

https://github.com/touppercase78/salary-prediction-collection

Salary predictions with ML models and analyses on datasets from several other GitHub repos

data-analysis data-visualization datasets machine-learning python3 regression-models

Last synced: 02 May 2026

https://github.com/ramonanf/tc1002s_semanatec

Computational tools: The art of analytics

data-analysis data-visualization jupiter-notebook pandas-python

Last synced: 15 Jun 2025

https://github.com/eco786786/salaries

This analysis explores the factors influencing salaries for data professionals from 2020 to 2024, including job titles, experience levels, remote work ratios, employment types, company locations and sizes. Using data from Kaggle, the project uncovers trends and insights to guide both companies and professionals in the tech industry.

data-analysis git postgresql powerbi

Last synced: 19 May 2026

https://github.com/mimi-netizen/python-and-machine-learning-in-financial-analysis

This comprehensive repository covers financial data analysis using Python and machine learning techniques, including time series modeling, portfolio optimization, risk assessment, credit risk prediction, and deep learning applications in finance.

data-analysis data-science data-visualization finance financial-analysis financial-data financial-modeling

Last synced: 19 May 2026

https://github.com/jabulente/tanzania-geographical-zones

This project provides a geospatial visualization of Tanzania's geographical zones and regions. It uses geospatial data to map each zone, display regions, and annotate them for easy identification. The visualizations include simulated data to demonstrate thematic mapping techniques.

ai data-analysis data-science data-visualization geopandas geospatial location matplotlib ml python tanzania tanzania-geographic tanzania-locations

Last synced: 19 May 2026

https://github.com/mysftz/statistics-analysis

A statistical analysis of a dataset and of probability, written in Python.

data-analysis matplotlib python python3 statistical-analysis

Last synced: 29 Jun 2025

https://github.com/galahad20/b244006e_analisis_data

Data analysis project for the Dicoding course "Belajar Analisis Data dengan Python" (Learn Data Analysis with Python). I learn to analyze data and visualize it to extract meaningful insights.

data-analysis data-analytics python streamlit

Last synced: 06 Apr 2026

https://github.com/marcomadera/test-for-random-numbers

Tests for random numbers between 0 and 1

data-analysis statistics

Last synced: 09 Jul 2025
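
The entry above gives no detail on which tests it runs. As a minimal sketch of one common check for numbers between 0 and 1, a chi-square uniformity test can be written with the standard library alone. The function name `chi_square_uniformity`, the bin count, and the seed are illustrative assumptions, not taken from the repository.

```python
import random


def chi_square_uniformity(samples, bins=10):
    """Chi-square statistic of samples in [0, 1] against a uniform distribution."""
    counts = [0] * bins
    for x in samples:
        if not 0.0 <= x <= 1.0:
            raise ValueError("samples must lie in [0, 1]")
        # Clamp x == 1.0 into the last bin.
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical usage: draw samples from Python's own generator.
rng = random.Random(42)
samples = [rng.random() for _ in range(10_000)]
statistic = chi_square_uniformity(samples)

# 16.919 is the chi-square critical value for 9 degrees of freedom at the 5% level.
print(statistic, "looks uniform" if statistic < 16.919 else "uniformity rejected")
```

A statistic below the critical value means the sample gives no evidence against uniformity at that level; it does not prove the generator is good.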

https://github.com/iamsainikhil/data-visualization

Visualization of Web data using Python

data-analysis data-visualization python webscraping

Last synced: 13 Jun 2026

https://github.com/srvcl/lung-cancer-survival-analysis

Data Cleaning of a dataset and Survival Analysis in R Language

data-analysis data-science data-visualization r survival-analysis

Last synced: 11 May 2026

https://github.com/madi-s/tennispredictor

Program to predict outcomes of major tennis matches.

data-analysis prediction-algorithm python scraper tennis webdriver

Last synced: 06 Jul 2025

https://github.com/gonzalofuentes28/dpeek

Interactive terminal data viewer for CSV, TSV, JSON, and JSONL files

bubbletea cli csv csv-viewer data-analysis data-viewer golang json json-viewer sqlite terminal tui

Last synced: 06 Apr 2026

https://github.com/jabulente/kruskall-wallis-test

This repository contains a project that provides a reusable Python function to perform the Kruskal-Wallis H-test across multiple continuous variables, grouped by a categorical feature.

data-analysis data-science eda hypothesis-tests kruskal-wallis kruskals-algorithm scipy-stats statistics

Last synced: 22 Jul 2025
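
To make the Kruskal-Wallis entry above concrete, here is a rough standard-library sketch of the idea: compute the H statistic for each continuous variable after grouping rows by a categorical key. This is not the repository's function; the names `kruskal_wallis_h` and `kruskal_by_group` and the sample rows are hypothetical. In practice, `scipy.stats.kruskal` also returns a p-value, which this sketch omits.

```python
from collections import defaultdict


def kruskal_wallis_h(groups):
    """H statistic for a list of numeric groups, using average ranks and tie correction."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Tied values at sorted positions i..j-1 share the average of ranks i+1..j.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_part = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_part - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


def kruskal_by_group(rows, group_key, value_keys):
    """Return {variable: H} for each variable in value_keys, grouped by group_key."""
    results = {}
    for key in value_keys:
        buckets = defaultdict(list)
        for row in rows:
            buckets[row[group_key]].append(row[key])
        results[key] = kruskal_wallis_h(list(buckets.values()))
    return results


# Hypothetical data: three groups, two continuous variables.
rows = [
    {"group": "a", "length": 1.0, "width": 2.0},
    {"group": "a", "length": 2.0, "width": 9.0},
    {"group": "a", "length": 3.0, "width": 4.0},
    {"group": "b", "length": 4.0, "width": 1.0},
    {"group": "b", "length": 5.0, "width": 8.0},
    {"group": "b", "length": 6.0, "width": 5.0},
    {"group": "c", "length": 7.0, "width": 3.0},
    {"group": "c", "length": 8.0, "width": 7.0},
    {"group": "c", "length": 9.0, "width": 6.0},
]
results = kruskal_by_group(rows, "group", ["length", "width"])
print(results)
```

In this sample the `length` values are perfectly separated by group, so their H statistic (7.2) is the largest possible for three groups of three.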

https://github.com/faith99/water_pollution_dashboard

A data visualization project exploring water access, contamination and health outcomes

data-analysis data-visualization powerbi public-health publichealth

Last synced: 02 Feb 2026

https://github.com/dzakwanalifi/reglins

regLins is an R package designed for performing linear regression analysis using various optimization methods. It also provides an interactive Shiny application for a more dynamic analysis experience.

data-analysis linear-regression optimization r shiny-app

Last synced: 09 Jul 2025

https://github.com/carvalhoandre/coletor-tweets

Created to collect and store tweets using the Twitter API. Initially inspired by the use case in the book Um Voluntário na Campanha de Obama, this project aims to demonstrate the importance of monitoring on X. The collector lets you search for tweets about any desired term.

data-analysis mongodb python twiter-analysis twitter

Last synced: 19 May 2026

https://github.com/prasad-chavan1/bank_data_analysis_r

Bank data analysis in R language

data data-analysis data-science r

Last synced: 24 Feb 2025

https://github.com/chaganti-reddy/ai-prototype-customer-segmentation

Prototype of a product-based artificial intelligence model for customer segmentation in the e-commerce industry.

artificial-intelligence cluster-analysis customer-segmentation data-analysis machine-learning product-based prototype

Last synced: 13 Mar 2025

https://github.com/ujjwalll/econometrics_analysis_of_india_gdp_misestimation

An econometric analysis of India's GDP to determine whether there is any flaw in India's GDP estimation, as claimed by Dr. Arvind Subramanian.

coefficient-estimates data-analysis econometrics economics gdp india r statistics

Last synced: 31 Oct 2025

https://github.com/bhiogade/customer-purchase-analysis

Comprehensive Customer Purchase Analysis Across Multiple Dimensions

data-analysis data-visualization tableau tableau-desktop

Last synced: 02 Feb 2026

https://github.com/sweta-kaundilya/911-calls-capstone-project

For this capstone project we will be analyzing some 911 call data from Kaggle.

data data-analysis data-visualization jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 28 Apr 2026

https://github.com/sweta-kaundilya/sql_projects_data_analytics

This repository contains SQL portfolio projects

data-analysis mysql-database mysql-workbench

Last synced: 10 Sep 2025

https://github.com/al-ogr/sf_pr2_job_analysis_hh_sql

SkillFactory DataScience PROJECT-2. Analysis of job vacancies from HeadHunter

data-analysis data-science ipynb plotly python sql

Last synced: 19 May 2026

https://github.com/lmuffato/jiboia

Jiboia is a Python package for automatically normalizing and optimizing DataFrames efficiently.

data-analysis data-science dataframe normalization pandas python

Last synced: 19 May 2026

https://github.com/ahmedkhaled404/data-cleaning-and-eda-layoffs-mysql

This project involves cleaning a dataset containing information about layoffs from companies around the world.

data data-analysis data-cleaning data-preprocessing datacleaning eda exploratory-data-analysis mysql sql

Last synced: 08 Jun 2026

https://github.com/joe-stifler/llm-sig-playground

This repository is a collaborative space for MSc Earth Science students at Imperial College London to experiment with and apply Large Language Models (LLMs) to real-world Earth Science problems. The persona playground link follows below.

data-analysis earth-science llms machine-learning research-automation

Last synced: 29 Mar 2025

https://github.com/mansiikumarii/mysql

A curated collection of MySQL scripts covering DDL, DML, and DRL operations. Ideal for beginners to practice and understand core SQL concepts.

backend data-analysis data-modeling database database-integration database-management database-performance database-schema mysql mysql-admin mysql-database orm php-mysql query-optimization rdbms sql sql-query sql-script stored-procedure

Last synced: 19 May 2026

https://github.com/anastasius21/creditcardfrauddetection

This repository contains a Jupyter Notebook with a credit card fraud detection model and the CSV dataset on which it is trained

credit-card-fraud data-analysis data-science data-visualization fraud-detection logistic-regression machine-learning

Last synced: 16 Jun 2025

https://github.com/the-pinbo/dimensionalityredux-pca-vs-autoencoders

Comparative study of PCA and Autoencoders for effective dimensionality reduction, assessed through PSNR and SSIM metrics.

autoencoder-mnist autoencoders data-analysis dimensionality-reduction image-compression mnist neural-networks pca psnr ssim

Last synced: 13 May 2025
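
For readers unfamiliar with the PSNR metric named in the entry above, the following is a small sketch of its standard definition, 10 * log10(MAX^2 / MSE), applied to two flattened pixel lists. The function name `psnr` and the 8-bit default `max_value=255.0` are assumptions made for illustration; the repository's own evaluation code is not shown here.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no reconstruction error
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 grayscale image, flattened, and a slightly distorted copy.
original = [52, 55, 61, 59]
reconstructed = [53, 54, 62, 58]
print(psnr(original, reconstructed))
```

Higher values mean the reconstruction is closer to the original, which is why PSNR is a natural score for comparing dimensionality-reduction methods on images.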

https://github.com/julie-fliorko/rockbuster-insights-sql-project

Data analysis using PostgreSQL to help Rockbuster Stealth LLC identify revenue trends, customer insights, and rental behavior patterns.

data-analysis postgresql sql

Last synced: 22 Jul 2025

https://github.com/shreeparab1890/india-gdp-rate-1960-to-2021-data-analysis

This IPython notebook contains an exploratory data analysis (EDA) of India's GDP rate from 1960 to 2021.

analysis data-analysis eda exploratory-data-analysis ipython-notebook jyputer-notebook matplotlib matplotlib-pyplot pandas python

Last synced: 06 Mar 2026

https://github.com/jhrcook/protein-language-models

Experimenting with protein language model predictions

data-analysis protein-language-model variant-effect-prediction

Last synced: 28 May 2026

https://github.com/amishidesai04/interactive-data-visualisation-tool

A Java-based application leveraging JavaFX to create dynamic and interactive charts, including pie charts, bar charts, and line graphs. Ideal for visualizing various datasets, this tool offers customizable features and a user-friendly interface. Easily input and manage data, customize chart styles, and observe trends and patterns effectively.

charts data-analysis data-visualisation data-visualization-project gui java javafx visualization-tools

Last synced: 17 Apr 2026

https://github.com/andrewzgheib/football-database-analysis

Football database utilizing PostgreSQL and Pandas for data management, with PowerBI for intuitive KPI visualization

data-analysis data-visualization database pandas pgsql postgr powerbi sql

Last synced: 04 Apr 2025

https://github.com/nerooc/device-downtime-detection

Repository for a project from the course "Sztuczne Sieci Neuronowe" (Artificial Neural Networks)

data-analysis detection-model recurrent-neural-networks

Last synced: 22 Mar 2025

https://github.com/timkong21/siemens-mobility-operations-industrial-engineer-simulation

Operations Industrial Engineer job simulation with Siemens Mobility. Includes time study analysis to identify assembly bottlenecks (Task 1) and a proposed layout redesign to improve efficiency without automation (Task 2).

data-analysis forage industrial-engineering job-simulation manufacturing process-improvement production-engineering python siemens time-analysis

Last synced: 19 May 2026

https://github.com/lopez86/datascienceexamples

Examples of various data science & data analysis topics using various sources of data.

data-analysis data-science pandas scikit-learn tutorial visualization

Last synced: 13 Apr 2026

https://github.com/sharduljunagade/human-activity-recognition

This repository contains the code for the Assignment-1 of the course ES 335: Machine Learning 2024 at IIT Gandhinagar taught by Prof. Nipun Batra.

data-analysis data-collection decision-trees groq-api human-activity-recognition jupyter langchain-python machine-learning pandas prompt-engineering python sklearn tsfel

Last synced: 08 Apr 2026

https://github.com/vishal-verma-96/Pre-Owned-Car-Price-prediction-using-Streamlit-App

Capstone project by Skill Academy: exploratory analysis, visualization, and prediction of used car prices. The highest-scoring model is deployed as a Streamlit web app.

data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms matplotlib numpy pandas python3 regression-algorithms scikit-learn seaborn streamlit

Last synced: 02 Mar 2025

https://github.com/nagar2nd/zomato-bangalore-analysis-tableau

Analysing restaurant data in Bengaluru to enhance customer satisfaction by optimizing the restaurant experience. The focus is on improving the popularity of different cuisines, enhancing delivery times, and boosting restaurant ratings. An interactive Tableau dashboard has been developed to help Zomato identify key areas for improvements.

data-analysis data-visualization tableau

Last synced: 05 Mar 2026

https://github.com/shubhamgoyal575/credit-card-fraud-detection

📌 Credit Card Fraud Detection using Machine Learning. This project focuses on detecting fraudulent credit card transactions using machine learning models like Random Forest, XGBoost, and Deep Learning. The dataset is preprocessed to handle class imbalance, and multiple models are evaluated based on ROC AUC Score and F1 Score.

adaboost-classifier artificial-neural-networks credit-card-fraud data-analysis data-cleaning data-preprocessing data-science data-visualization deep-learning exploratory-data-analysis lightgbm machine-learning machine-learning-algorithms random-forest-classifer scikit-learn tensorflow xgboost

Last synced: 08 Feb 2026

https://github.com/swatisinghit/e-commerce-trend-analysis-for-target

An exploratory and in-depth study of the E-Commerce sales data for a Brazilian store using SQL.

bigquery data-analysis mysql sql

Last synced: 19 May 2026

https://github.com/amarlearning/exploring-the-evolution-of-linux

Data Analysis about the development of the Linux operating system by exploring its Git repository history.

cleaning-data data data-analysis data-wrangling datacamp first-commit git-history linux

Last synced: 12 May 2026

https://github.com/imnotamr/datasets-used

A comprehensive collection of datasets for machine learning and data science projects, covering topics from advertising and sales to health and sports analytics

ai classification data-analysis data-science data-visualization deep-learning jupyter-notebook machine-learning models python regression-models

Last synced: 19 May 2026

https://github.com/mulukensholaye/spark_kafka_streaming_csv

Real-time streaming data analysis pipeline that integrates Apache Spark's streaming library to read records from a Kafka topic

apache-kafka apache-spark data-analysis python3 realtime-messaging

Last synced: 19 May 2026

https://github.com/syed-amjad-ali/airbnb-listing-analysis

Analyzing AirBnB listings in Paris to determine the impact of recent regulations

business-intelligence data-analysis jupyter-notebook maven-analytics python

Last synced: 19 May 2026

https://github.com/hawmex/aut_data_and_information_analysis_project

This repository contains the files of my project for the "Data & Information Analysis" course at AUT (Tehran Polytechnic).

data-analysis data-science k-means outlier-detection python

Last synced: 19 May 2026

https://github.com/devexpress-examples/wpf-pivotgrid-how-to-display-underlying-data

This example demonstrates how to obtain the records from the control's underlying data source for a selected cell or multiple selected cells.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 19 May 2026

https://github.com/samir-atra/share-lm_dataset_analysis

Analysis, studies and optimizations on the ShareLM extension dataset

data-analysis data-visualization gemma3n huggingface huggingface-transformers pandas

Last synced: 19 May 2026

https://github.com/sanafagal/wsp-msg-automation

An intuitive application for managing and analyzing customer and reseller data stored in Google Sheets, providing insights and streamlined data organization.

automation cloud-credentials data-analysis google-sheets-api python

Last synced: 16 Jun 2025

https://github.com/prakshal0809/sql-data-analysis-project

This project involves analyzing pizza sales data using SQL to address various data analysis questions, providing essential foundational to advanced SQL knowledge.

data-analysis sql

Last synced: 26 Jun 2025

https://github.com/borjamome/radiografia-madrid

Analysis of the population, economy, and society of Madrid with R.

data-analysis data-visualization madrid r

Last synced: 17 Jun 2025

https://github.com/singingsandhill/data_analysis

Data analysis: summary of personal projects

data-analysis python

Last synced: 19 May 2026

https://github.com/chahelgupta/hospital-readmission-prediction-and-analysis

The Hospital Readmission Prediction project uses clinical data to predict diabetic readmissions. SVM + SMOTE achieved 61.16% accuracy, with key predictors including hospital stay, lab tests, and medications.

data-analysis knn-classification logistic-regression machine-learning prediction prediction-model python random-forest-classifier smote svm-classifier

Last synced: 15 May 2026

https://github.com/jcm-ai/Quantium-Data-Analytics-Virtual-Experience-Program

This repository contains my proposed solutions to the assignments I was required to complete as part of the Quantium Data Analytics Virtual Experience Program. 📊📈📉👨‍💻

commercial-thinking communication-skills data-analysis data-validation data-visualisation data-wrangling jupyter-notebook matplotlib-pyplot numpy-library pandas-python presentation-skills programming python3 scipy-stats seaborn statistical-testing

Last synced: 19 Aug 2025

https://github.com/amruthadevops/stock-market-analysis

To analyze market trends and predict future market behavior using machine learning techniques

data-analysis data-science jupyter-notebook machine-learning powerbi-desktop python stock-market

Last synced: 15 May 2026

https://github.com/sukhitashvili/pca_tutorial

PCA algorithm from scratch, using only matrix-vector multiplications

data-analysis data-science data-visualization machine-learning-algorithms pca

Last synced: 29 Mar 2025
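
One way to obtain a principal component with nothing but matrix-vector products is power iteration on the covariance matrix. The sketch below illustrates that technique under stated assumptions: it may not be the approach the repository above takes, and the helper names (`mat_vec`, `covariance_matrix`, `first_principal_component`) and the sample data are hypothetical.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def covariance_matrix(data):
    """Sample covariance matrix of rows of observations."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(data, iterations=200):
    """Approximate the leading eigenvector and eigenvalue of the covariance matrix."""
    cov = covariance_matrix(data)
    # Assumption: the all-ones start vector is not orthogonal to the leading eigenvector.
    vector = [1.0] * len(cov)
    for _ in range(iterations):
        vector = mat_vec(cov, vector)
        norm = math.sqrt(sum(v * v for v in vector))
        vector = [v / norm for v in vector]
    # Rayleigh quotient: the variance explained along the component.
    eigenvalue = sum(v * w for v, w in zip(vector, mat_vec(cov, vector)))
    return vector, eigenvalue


# Hypothetical data lying exactly on the line y = 2x.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
component, variance = first_principal_component(points)
print(component, variance)
```

For this sample the component points along the direction (1, 2), normalized to unit length, because all of the variance lies on that line. Further components could be found by deflating the covariance matrix and repeating the iteration.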

https://github.com/samukiszhsd/alteryx-analytics

You are working with bank transaction data from Itaú and need to run some analyses to help the audit team detect unusual patterns and possibly suspicious transactions.

alteryx data-analysis data-structures data-visualization etl workflow

Last synced: 18 Feb 2026

https://github.com/prady2309/stock-analysis

Analysis on the stock prices of Apple, Google, Microsoft and Amazon

data-analysis data-science data-visualization python stock-market

Last synced: 19 May 2026

https://github.com/shadz23/smart-energy-dashboard

Power BI dashboard analyzing household electricity consumption to reveal usage patterns, peak hours, and estimated costs for smarter energy management and reduced bills. 🐙

chart data-analysis data-visualization dax energy-consumption hs110 hs300 ibm ibm-cloud influxdb jupyter-notebook kasa kp115 linuxone observability photovoltaics-dashboard plotly sense

Last synced: 19 Aug 2025

https://github.com/eve-ning/ppshift

Analyzes maps and scores from 2015

data-analysis data-mining osu osugame

Last synced: 13 Feb 2026

https://github.com/saroshfarhan/irish_hospital_data_anaysis

Analysis of Irish hospital patient discharge data for four counties

data-analysis data-science data-visualization healthcare irish-data r-programming-language

Last synced: 18 Feb 2026

https://github.com/azaz9026/loan_approval_prediction

Welcome to the Loan Approval Prediction repository! This project aims to build a predictive model that can determine whether a loan application should be approved or denied based on various features. Purpose: The goal of this repository is to develop a machine learning model that can accurately predict loan approval decisio

data data-analysis data-visualization eda machine-learning numpy pandas python statistics

Last synced: 06 Apr 2026

https://github.com/sebastianurdaneguibisalaya/colocaciones-de-credito-fondo-mivivienda-peru

I explore the credit placements of Fondo MIVIVIENDA S.A. between 2018 and 2022, using a dataset downloaded from Peru's National Open Data Portal (Portal Nacional de Datos Abiertos). 🏠

data-analysis jupyter-notebook python

Last synced: 24 Feb 2025

https://github.com/parthkumarmpatel/sql-exploratory-data-analysis

SQL EDA scripts for sales data warehouse — metrics, insights, and rankings from my data warehouse project.

data-analysis exploratory-data-analysis sql-server

Last synced: 26 Jun 2025

https://github.com/lewismakau/portfolio-projects

This repository contains data files and SQL files for the projects in my portfolio.

data-analysis data-cleaning data-structures data-visualization database google-analytics microsoft-sql-server mysql powerbi tableau

Last synced: 02 Apr 2026

https://github.com/adeebkhan25/dataset_suicide_susceptible

The "Student Suicide Risk Factors Dataset" is a comprehensive collection of data aimed at understanding and mitigating the factors contributing to student suicides.

data-analysis dataset machine-learning supervised-learning

Last synced: 24 Dec 2025

https://github.com/alimiheb/advwokcube-analysis

A comprehensive SSAS cube project based on AdventureWorksDW2019, featuring data cleaning, multidimensional modeling, and visualizations in Power BI and Excel.

adventureworks data-analysis excel powerbi sql-server ssas-multidimensional visualization

Last synced: 26 Jun 2025

https://github.com/nivasharmaa/friskwatch

A Java program for analyzing stop-and-frisk data from the NYPD. Features data import, organization, and statistical analysis to compare occurrences during and after policy implementation.

data-analysis data-visualization dataprocessing datascience file-io java java-oop nypd-data

Last synced: 19 May 2026