An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/borjamome/radiografia-madrid

Analysis of the population, economy, and society of Madrid with R.

data-analysis data-visualization madrid r

Last synced: 17 Jun 2025

https://github.com/singingsandhill/data_analysis

Data analysis: a summary of personal projects.

data-analysis python

Last synced: 19 May 2026

https://github.com/ccoolbaugh/individualized_cooling_data_analysis

Matlab code to analyze data collected during a brown adipose tissue individualized cooling protocol.

brown-adipose-tissue cold-exposure data-analysis ibutton matlab skin-temperature thermoregulation

Last synced: 18 Aug 2025

https://github.com/beatrice-b-m/bea-tools

🐝 tools made by, and for, bea 🐝 A Python package of random functions and tools that I use regularly. Data science / analysis focused since, ya know, I'm a data scientist c:

data-analysis data-science data-visualization

Last synced: 15 Jan 2026

https://github.com/mindlessmuse666/missing-data-processing

A project on handling missing values in Titanic passenger data using the Python libraries Matplotlib and Seaborn.

data-analysis data-visualization matplotlib missing-values-analysis missing-values-handling pandas python seaborn titanic

Last synced: 16 May 2026

https://github.com/sukitsubaki/screen-time-tracker

A minimalist Python tracker that records the usage time of various applications and provides insights into your computer usage habits.

application-usage data-analysis monitoring productivity python python-cli screen-time time-tracking

Last synced: 12 Apr 2025

https://github.com/yash22222/data-analysis-on-real-time-social-media-comments

EngageInsight analyzes user interactions in comment data. It provides insights through visualizations created using Python libraries like Pandas and Matplotlib. The project aims to uncover patterns and trends in user engagement. The visualizations provide an overview of comment lengths and the frequency of different types of replies.

data-analysis data-cleaning-and-preprocessing data-visualization matplotlib pandas pattern-recognition real-time-social-media-data seaborn trend-analysis

Last synced: 14 May 2026

https://github.com/sukhitashvili/pca_tutorial

PCA algorithm from scratch, using only matrix-vector multiplications

data-analysis data-science data-visualization machine-learning-algorithms pca

Last synced: 29 Mar 2025

https://github.com/svetlanam/pycon-workshop

PyCon CZ workshop: Better data analyses and product recommendations with Instagram data

data-analysis data-science martinus matplotlib pandas pycon2016 pyconcz python scikit-learn workshop

Last synced: 09 Apr 2026

https://github.com/samukiszhsd/alteryx-analytics

You are working with Itaú bank transaction data and need to run some analyses to help the audit team detect unusual patterns and possibly suspicious transactions.

alteryx data-analysis data-structures data-visualization etl workflow

Last synced: 18 Feb 2026

https://github.com/prady2309/stock-analysis

Analysis of the stock prices of Apple, Google, Microsoft, and Amazon

data-analysis data-science data-visualization python stock-market

Last synced: 19 May 2026

https://github.com/eve-ning/ppshift

Analyzes maps and scores from 2015

data-analysis data-mining osu osugame

Last synced: 13 Feb 2026

https://github.com/saroshfarhan/irish_hospital_data_anaysis

Analysis of Irish hospital patient discharge data for four counties

data-analysis data-science data-visualization healthcare irish-data r-programming-language

Last synced: 18 Feb 2026

https://github.com/sebastianurdaneguibisalaya/colocaciones-de-credito-fondo-mivivienda-peru

I explore the credit placements of Fondo MIVIVIENDA S.A. between 2018 and 2022, using a dataset downloaded from Peru's Portal Nacional de Datos Abiertos (National Open Data Portal). 🏠

data-analysis jupyter-notebook python

Last synced: 24 Feb 2025

https://github.com/abhigyan126/prompt2query

A Python desktop application for streamlined data analysis, enabling users to generate and execute Pandas and SQL queries with ease. Focuses on reducing analysis time through an intuitive interface and efficient workflows.

data-analysis data-science data-visualization database gemini generative-ai ide llm pandas pandas-interface python sql-interface

Last synced: 13 Feb 2026

https://github.com/halyusa16/sql-employee-insights

This project dives into employee data to uncover actionable insights using SQL. It mimics real-world HR and business analysis tasks, from salary comparisons to workforce demographics and potential cost-cutting strategies.

data-analysis mysql sql

Last synced: 11 Apr 2025

https://github.com/zen204/accenture-tech-news-summarization-engine

A tool developed to analyze knowledge graphs from technology news articles, uncovering insights and trends about technology products, platforms, services, and their industry impact. Built during an internship at Accenture to inform decision-making in the tech landscape.

data-analysis decision-making graph-visualization industry-insights jupyter-notebook knowledge-graph machine-learning python tech-news tech-trends

Last synced: 29 Apr 2026

https://github.com/parthkumarmpatel/sql-exploratory-data-analysis

SQL EDA scripts for sales data warehouse — metrics, insights, and rankings from my data warehouse project.

data-analysis exploratory-data-analysis sql-server

Last synced: 26 Jun 2025

https://github.com/who-else-but-arjun/isro_xrf_sr

Source code for super-resolution of the lunar elemental abundance map using a semi-supervised deep spatial interpolation model. This hybrid approach combines ResNet50 for spatial feature extraction with Graph Neural Network (GATv2Conv) layers and Convolutional Neural Networks (CNNs), followed by fusion layers.

cnn data-analysis graph-neural-networks pytorch semi-supervised-learning spatial-interpolation super-resolution

Last synced: 30 Apr 2026

https://github.com/rita94105/smart_contract_vulnerability_detector

Smart contracts are pivotal in blockchain applications but are prone to vulnerabilities that can lead to significant losses. SmartGuard: Multi-Stage Smart Contract Vulnerability Detection tackles this issue by developing a machine learning framework to identify eight vulnerability types using datasets from Kaggle and Hugging Face.

data-analysis machine-learning smart-contracts streamlit vulnerability-detection

Last synced: 01 Aug 2025

https://github.com/adeebkhan25/dataset_suicide_susceptible

The "Student Suicide Risk Factors Dataset" is a comprehensive collection of data aimed at understanding and mitigating the factors contributing to student suicides.

data-analysis dataset machine-learning supervised-learning

Last synced: 24 Dec 2025

https://github.com/rorrell/spotifyhistory

A Jupyter Notebook where I wrangle some data and plot a chart to draw some conclusions about a user's Spotify history

data-analysis data-visualisation data-wrangling jupyter-notebook python3

Last synced: 19 May 2026

https://github.com/alimiheb/advwokcube-analysis

A comprehensive SSAS cube project based on AdventureWorksDW2019, featuring data cleaning, multidimensional modeling, and visualizations in Power BI and Excel.

adventureworks data-analysis excel powerbi sql-server ssas-multidimensional visualization

Last synced: 26 Jun 2025

https://github.com/ygalvao/uow_ai_final_project

This was my Final Project for the Artificial Intelligence Diploma program of The University of Winnipeg - Professional, Applied and Continuing Education (PACE).

data-analysis data-analytics dbscan elections k-means k-means-clustering machine-learning som som-clustering

Last synced: 10 Jul 2025

https://github.com/nivasharmaa/friskwatch

A Java program for analyzing stop-and-frisk data from the NYPD. Features data import, organization, and statistical analysis to compare occurrences during and after policy implementation.

data-analysis data-visualization dataprocessing datascience file-io java java-oop nypd-data

Last synced: 19 May 2026

https://github.com/riborings/uranouchi42microdiversity

In this repository live the bash, R and Julia scripts used to explore the microdiversity of the prokaryotic community at Uranouchi Inlet (42-sample time-series) by means of metagenomic shotgun sequencing under the supervision of the Ogata Lab.

big-data data-analysis data-visualisation diversity-analysis marine-ecology marine-ecosystem metagenomics microbiome-analysis prokaryotic-genomes

Last synced: 29 Oct 2025

https://github.com/shellynagar27/marketing-content-performance-analysis

Analyzed 2024 social media campaign data from TikTok, Instagram, LinkedIn, and X.com using Power BI to uncover performance trends across platforms, content types, and regions. Built an interactive dashboard to drive insights on engagement, optimal posting times, and content strategy.

data-analysis data-modelling data-visualization excel figma marketing-analytics powerbi powerquery wireframing

Last synced: 26 Jun 2025

https://github.com/brunomontezano/digital-interventions-for-depression

📱 "Digital interventions for depressive symptoms: a randomized clinical trial" code

academia clinical-trials cognitive-behavioral-therapy data-analysis digital-health open-science smartphone-app

Last synced: 03 Oct 2025

https://github.com/blackcub3s/msc-finalthesis

The most important programming files, code functions and data processing pipelines for the Machine learning final thesis of my Master's degree. Also, the LaTeX code of the thesis.

data-analysis latex machine-learning numpy python sklearn

Last synced: 09 Apr 2026

https://github.com/kevin-rsj/sectores_economicos_covid-19

Exploratory Data Analysis (EDA): behavior of economic sectors before, during, and after the COVID-19 pandemic (2019-2022)

data-analysis financial-analysis pandemic-analysis python stock-market time-series visualization yahoo-finance

Last synced: 20 May 2026

https://github.com/ryuzen6/kaggle-series

This is a series of Machine Learning/Deep Learning Models made for practice.

artificial-intelligence data-analysis data-science deep-learning machine-learning python3

Last synced: 20 May 2026

https://github.com/dcostachar/telco-customer-churn-dashboard

An interactive Tableau dashboard using the Telco Customer Churn dataset to analyze key drivers of customer churn and develop data-driven retention strategies for the telecommunications industry.

business-intelligence customer-churn-analysis data-analysis data-visualization marketing-analytics tableau

Last synced: 09 Mar 2026

https://github.com/abhishekyadav915/data-analytics-projects

This project focuses on performing comprehensive data analysis to extract valuable insights from a given dataset. By leveraging various data manipulation, cleaning, and visualization techniques, the project aims to uncover patterns, trends, and correlations that can inform decision-making and strategy.

data-analysis data-visualization dataset

Last synced: 05 Apr 2025

https://github.com/alan-oliveir/state-of-data-2022

In this project I analyze the distribution of salary ranges for junior-level professionals in the roles of data analyst, data scientist, and data engineer.

data-analysis jupyter-notebook pandas-python seaborn-python

Last synced: 03 Oct 2025

https://github.com/badranalyst/restaurant-reviews-sentiment-analysis-nlp-case-study

This project analyzes restaurant reviews using Natural Language Processing (NLP) for sentiment analysis. It covers data exploration, pre-processing (NLTK text cleaning), model building, prediction, and deployment. The goal is to predict sentiment from reviews using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn.

data-analysis data-science eda exploratory-data-analysis matplotlib-pyplot model model-building numpy pandas pre-processing predictive-modeling python seaborn

Last synced: 13 Apr 2026

https://github.com/rohitha-tata/churn-predict

Churn Predict uses Machine Learning to analyze customer behavior and identify those likely to leave. It involves data preprocessing, feature selection, model training (Logistic Regression, Random Forest, XGBoost), and evaluation using accuracy and ROC-AUC. The model provides actionable insights to help businesses reduce churn and improve retention

data-analysis logistic-regression machine-learning python

Last synced: 16 May 2026

https://github.com/techshot25/graduateadmissions

Looking at the probability of being accepted in a graduate program using a machine learning model

bayesian-regression correlation-matrices data-analysis data-science linear-regression machie-learning random-forest-regression regression ridge-regression

Last synced: 25 Feb 2025

https://github.com/srinibas-masanta/hotel-revenue-analysis-dashboard

This project focuses on analyzing hotel booking data to uncover key metrics and insights that drive revenue management decisions. By creating an interactive Power BI dashboard, the project aims to improve strategic decision-making, optimize occupancy rates, and enhance overall financial performance within the hospitality industry.

business-analytics data-analysis data-science data-visualization dax-functions hospitality powerbi

Last synced: 12 Jan 2026

https://github.com/coditheck/data_analysis

Data analysis is the process of inspecting, cleaning, transforming, and modeling data in order to discover useful information, draw conclusions, and support decision making.

data-analysis python

Last synced: 17 Jun 2025

https://github.com/kaushik-puttaswamy/amazon-sales-dashboard-using-tableau

The Amazon Sales Data Analysis Dashboard provides insights into key sales metrics like profit, revenue, shipment days, and units sold. It includes visualizations to assess performance by region, country, and sales channel. The dashboard helps stakeholders optimize strategies and improve profitability through data-driven analysis.

dashboard data-analysis data-visualization tableau

Last synced: 11 Jan 2026

https://github.com/kiran-kumar-k3/sales-performance-dashboard

The Sales Performance Dashboard is an interactive Python-based web application that visualizes and analyzes sales data, providing actionable insights through dynamic charts and metrics.

data-analysis python streamlit

Last synced: 20 May 2026

https://github.com/ritap03/neuralnetwork-shapeclassifier

Feedforward neural network system in MATLAB for geometric shape classification. Includes data preprocessing, network training and evaluation, confusion matrix analysis, and a graphical interface for user interaction and model testing.

ai data-analysis deep-learning feedforward-network gui image-classification machine-learning matlab neural-network pattern-recognition

Last synced: 14 May 2026

https://github.com/lucashomuniz/project-15

[Dashboard] Enhancing Business Intelligence: Leveraging SQL, Python, and DAX for Strategic Insights in Sales Analysis

business-analytics business-intelligence data-analysis data-science data-visualization dax-languague machine-learning powerbi python

Last synced: 12 Jul 2025

https://github.com/archanakokate/bank_term_deposit_prediction

Build a Decision Tree classifier to predict if the client will subscribe to a Term Deposit based on their demographic and behavioral data.

data-analysis data-visualization exploratory-data-analysis machine-learning

Last synced: 14 Sep 2025

https://github.com/arkww/matmap

Making maps from a Database and making the user guess which map is displayed

data-analysis data-science javascript python

Last synced: 24 Apr 2026

https://github.com/gui-sitton/carsells

In this project I am an analyst on the Crankshaft List. Hundreds of free vehicle advertisements are published on the site every day. I need to study the data collected over the last few years and determine which factors influence the price of a vehicle.

data data-analysis data-analysis-python data-science data-visualization python

Last synced: 20 May 2026

https://github.com/karanch10/fraudshield

FraudShield is a machine learning credit card fraud detection system that analyzes transaction attributes to identify suspicious activities in real time. Built with Python, SQL, and Django, it provides a user-friendly interface for fraud prediction using OpenBanking APIs and advanced detection techniques. Ideal for businesses and individuals.

data-analysis data-science data-visualization machine-learning python3

Last synced: 20 May 2026

https://github.com/jiteshshelke/codsoft

A repository showcasing three machine learning projects—Titanic Survival Prediction, Movie Rating Prediction, and Iris Flower Classification—completed during CodSoft's Data Science Internship. 🚀

codsoft codsoftinternship data-analysis data-science linear-regression logistic-regression machine-learning machine-learning-algorithms python

Last synced: 20 May 2026

https://github.com/tabibyte/azerbaijani-rapper-lyrics-data-analysis

Lyrics Data Analysis of Azerbaijani Rappers

azerbaijan data-analysis rappers

Last synced: 22 Jul 2025

https://github.com/htsandaruvan/attrition-analytics-suite-by-hello-green

I have created a comprehensive data analytics dashboard to identify factors contributing to attrition,

data-analysis data-analytics data-visualization powerbi

Last synced: 20 Jan 2026

https://github.com/patricksferraz/aqw-madrid-data-analysis

Interactive analysis and visualization of Madrid's air quality and weather data (2001-2016) using Python, Dash, and Jupyter. Features interactive maps, statistical analysis, and data visualization tools.

air-quality dash data-analysis data-engineering data-science data-visualization data-wrangling environmental-data environmental-science interactive-dashboard jupyter jupyter-notebook madrid open-data pandas plotly python statistical-analysis time-series weather-data

Last synced: 30 Jan 2026

https://github.com/arkww/chinesenewspaperwordcount

Analyzes the word count of Chinese characters in Simplified and Traditional Chinese and compares the results

chinese-language data-analysis data-science python

Last synced: 16 May 2026

https://github.com/iwasakiyuuki/data-analysis-platform-airflow-dag

A collection of Airflow DAGs for automating data collection into our on-premises data analysis platform.

airflow airflow-dags data-analysis data-collection

Last synced: 13 May 2025

https://github.com/steviecurran/prediction-plot

Code that performs machine learning (k-nearest neighbours regression) and plots the predicted versus measured values

astrophysics c data-analysis high-redshift machine-learning pgplot python statistics tensorflow visualization

Last synced: 20 May 2026

https://github.com/ahnaf19/clean_bankingdata

Here I tried to practice simple ETL tasks. I know how to perform these tasks in SQL; here I just explored how to do them using pandas as well.

data-analysis data-cleaning pandas python

Last synced: 19 Apr 2026

https://github.com/lunarwhite/lake-george-viz

Lake George data analysis and visualization, ANU COMP1730/6730

data-analysis python

Last synced: 01 Nov 2025

https://github.com/jatin-mehra119/sales-analysis

Sales analysis of a supermarket

data-analysis salesanalysis visualization

Last synced: 29 Oct 2025

https://github.com/ifigeneiatsiflidou/popular-items-sales-analysis

Two data tasks in Python: popular items by ZIP & store sales breakdown with plots.

data-analysis matplotlib pandas

Last synced: 16 May 2026

https://github.com/RLAlpha49/AniSearch-Model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2025

https://github.com/nemat-al/multivariate_data_analysis

Tasks for Multivariate Data Analysis Course @ ITMO University

data-analysis multivariate-analysis python

Last synced: 20 May 2026

https://github.com/pooja-manjunatha/nyc_parking_violations_dbt

This project uses dbt to transform NYC parking violations data through a layered architecture: Bronze (raw ingested data), Silver (cleaned and enriched data), and Gold (aggregated tables for analytics). Using DuckDB as the warehouse backend, it ensures data quality with tests and documentation. The project enables reliable analysis of parking violations

data data-analysis data-engineering dbt duckdb python sql

Last synced: 14 May 2026

https://github.com/silasberger/charts-analysis

Data set collection, preprocessing, and analysis of singles and album charts

charts data-analysis data-mining data-science dataset music

Last synced: 14 Sep 2025

https://github.com/aakk23/netflix_sql_project

This SQL project provides an analytical overview of Netflix's movies and TV shows dataset, uncovering key insights related to content types, ratings, release trends, and geographic distribution. It helps explore patterns in content availability, audience targeting, and regional preferences to support data-driven decisions.

data-analysis netflix-data-analysis postgresql sql

Last synced: 10 Apr 2025

https://github.com/mkoeppe/jiawei-computations

Computations supporting Chapters 2 and 3 of Jiawei Wang's dissertation "Subadditivity of Piecewise Linear Functions", UC Davis, Ph.D. program in Mathematics, 2020

benchmark-framework branch-and-bound cluster cutting-planes data-analysis hpc integer-programming reproducible-research sagemath

Last synced: 10 Aug 2025

https://github.com/alfioma/ada-xtq

🔗 Simplify data transfer with ada-xtq, a lightweight tool for seamless integration and efficient handling of data between platforms.

ada algorithms api-development artificial-intelligence automation data-analysis data-visualization docker machine-learning neural-networks open-source programming python software-development xtq

Last synced: 01 May 2026

https://github.com/ranxi2001/predicting-mental-health-risk

Data analysis case study: mental health prediction (data source: Kaggle)

data-analysis data-visualization eda

Last synced: 27 Jun 2025

https://github.com/panoschatzi/erythrocyte_study_statistical_analyses

R code for data transformation, analysis and visualization of experimental data, as well as for statistical analyses and quantitative simulations.

afex data-analysis emmeans ggplot2 lme4 purrr r rprogramming rstats rstudio statistics tidyverse visualization

Last synced: 04 Apr 2025

https://github.com/samruddhi3012/rfm-analysis

Hi there! In this project I have performed Sales Analysis (RFM Analysis) using SQL and Tableau.

data-analysis data-visualization mssqlserver rfm-analysis segmentation tableau

Last synced: 27 Jun 2025

https://github.com/jelhamm/internode-hellinger-distance-based-decision-tree

Simulations for the paper "Inter node Hellinger Distance based Decision Tree" by Pritom Saha Akash, Md. Eusha Kadir, Amin Ahsan Ali, and Mohammad Shoyaib

articles data-analysis data-mining decision-tree decision-tree-classifier hddt hellinger-distance-criterion machine-learning numpy-library paper-implementations python scipy-library simulation tree-node

Last synced: 04 Apr 2025