An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/ygalvao/uow_ai_final_project

This was my Final Project for the Artificial Intelligence Diploma program of The University of Winnipeg - Professional, Applied and Continuing Education (PACE).

data-analysis data-analytics dbscan elections k-means k-means-clustering machine-learning som som-clustering

Last synced: 10 Jul 2025

https://github.com/vipulbunny/web-tech-scanner

A Python-based web scraping tool that detects technologies used on a website by analyzing its scripts, meta tags, and HTML content.

beautifulsoup beautifulsoup4 data-analysis data-science python requests technology-detection web-scraping

Last synced: 22 May 2026

https://github.com/bocchio01/skyward_recruitment_assignment

Assignment to join the PoliMi SkyWard software team

data-analysis kalman-filter model-rocket

Last synced: 15 Mar 2025

https://github.com/riborings/uranouchi42microdiversity

In this repository live the bash, R and Julia scripts used to explore the microdiversity of the prokaryotic community at Uranouchi Inlet (42-sample time-series) by means of metagenomic shotgun sequencing under the supervision of the Ogata Lab.

big-data data-analysis data-visualisation diversity-analysis marine-ecology marine-ecosystem metagenomics microbiome-analysis prokaryotic-genomes

Last synced: 29 Oct 2025

https://github.com/nymarya/analise-correlacao-sifilis

Code for the correlation analysis between syphilis case notifications and the availability of tests and medications

data-analysis healthcare pandas

Last synced: 03 Jan 2026

https://github.com/abhishekyadav915/data-analytics-projects

This project focuses on performing comprehensive data analysis to extract valuable insights from a given dataset. By leveraging various data manipulation, cleaning, and visualization techniques, the project aims to uncover patterns, trends, and correlations that can inform decision-making and strategy.

data-analysis data-visualization dataset

Last synced: 05 Apr 2025

https://github.com/junpenglao/spafv

SPAFV - Surface Profile Analysis for Free Viewing eye movement experiment in 2AFC task

data-analysis statistics temporal-logic

Last synced: 31 Mar 2025

https://github.com/sanjana-bongale/cancer_survival_data_analysis_and_prediction_using_logistic_regression

This project performs data analysis using Python to predict cancer patient survival outcomes. It involves data cleaning, exploratory analysis, and visualizations to explore factors like cancer type, stage, and treatments. A logistic regression model is built to predict patient survival based on demographic and medical data.

data-analysis data-cleaning data-science data-visualization eda jupyter-notebook kaggle logistic-regression machine-learning matplotlib numpy pandas predictive-modeling python scikit-learn seaborn

Last synced: 08 Apr 2026

https://github.com/rohitha-tata/churn-predict

Churn Predict uses Machine Learning to analyze customer behavior and identify those likely to leave. It involves data preprocessing, feature selection, model training (Logistic Regression, Random Forest, XGBoost), and evaluation using accuracy and ROC-AUC. The model provides actionable insights to help businesses reduce churn and improve retention.

data-analysis logistic-regression machine-learning python

Last synced: 16 May 2026

https://github.com/derogative404/google_data_analytics_capstone

Capstone project part of the Google Data Analytics Certificate Program

data-analysis excel r tableau

Last synced: 26 Mar 2025

https://github.com/coditheck/data_analysis

Data analysis is the process of inspecting, cleaning, transforming, and modeling data in order to discover useful information, draw conclusions, and support decision making.

data-analysis python

Last synced: 17 Jun 2025

https://github.com/xenon1919/credit-card-fraud-detection

Credit Card Fraud Detection is a machine learning project to predict fraudulent credit card transactions. It handles imbalanced data using undersampling and applies Logistic Regression and XGBoost models. With an AUC of 0.98, it offers robust fraud detection. Includes a Streamlit app for real-time predictions.

data-analysis machine-learning python

Last synced: 14 May 2026

https://github.com/kaushik-puttaswamy/amazon-sales-dashboard-using-tableau

The Amazon Sales Data Analysis Dashboard provides insights into key sales metrics like profit, revenue, shipment days, and units sold. It includes visualizations to assess performance by region, country, and sales channel. The dashboard helps stakeholders optimize strategies and improve profitability through data-driven analysis.

dashboard data-analysis data-visualization tableau

Last synced: 11 Jan 2026

https://github.com/tushar2704/loan-limits-by-country

This project aims to leverage a diverse dataset encompassing economic indicators, demographic factors, and credit history to establish a predictive model. By establishing appropriate loan limits, financial institutions can enhance risk management, ensure responsible lending, and promote financial inclusivity.

artificial-intelligence data-analysis data-science loan project tushar2704

Last synced: 30 Oct 2025

https://github.com/thesfinox/fit-the-data

Data analysis using Wolfram Mathematica

analysis data data-analysis lab mathematica wolfram wolfram-mathematica

Last synced: 24 Jan 2026

https://github.com/anjalikumari021/sports_data_analysis_using_excel

Analyzed sports data and prepared an advanced dashboard using MS Excel.

data-analysis data-cleaning excel-dashboard ms-excel pivot-tables reporting

Last synced: 08 Mar 2026

https://github.com/mmfava/significados-aulas-biologia-quasiexp-2019

Repository of the analyses performed for the paper "Construção de significados em aulas práticas de laboratório de biologia: uma avaliação por delineamento quase-experimental".

data-analysis r statistics

Last synced: 28 Jun 2025

https://github.com/bala-1409/sales-forecasting-datascience-project

Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.

data data-analysis data-science data-visualization datacleaning exploratory-data-analysis machine-learning-algorithms modelfitting prediction predictive-analytics predictive-modeling python3 regression-models salesforecast supervised-learning

Last synced: 26 Apr 2026

https://github.com/bala-1409/rafik-s-kitchen-data-analysis

This project analyzes the sales and expenses data of a famous fast-food restaurant. It focuses on gaining insights that will boost future sales and on business strategies to improve profit margins. Tools used: SQL, Python, Power BI, and MS Office tools.

business-analytics business-intelligence data-analysis data-analytics data-visualization eda exploratory-data-analysis ms-office powerbi-report powerpoint-presentations python sql-server

Last synced: 06 May 2026

https://github.com/beolawork-art/novabank-churn-analysis

NovaBank has noticed that customers are closing accounts or going inactive, and they want to understand why.

data-analysis data-science-projects data-visualization eda machine-learning numpy pandas python scikit-learn sql

Last synced: 08 Apr 2026

https://github.com/arkww/matmap

Making maps from a database and having the user guess which map is displayed

data-analysis data-science javascript python

Last synced: 24 Apr 2026

https://github.com/smohanta23/ev-trendanalytics-24

This Tableau project analyzes EV adoption trends using data up to May 2024. Visualizations cover growth, geography, market share, CAFV eligibility, and consumer preferences, supporting data-driven decisions with detailed drill-downs. Data is meticulously cleaned, offering stakeholders valuable insights into EV market dynamics and trends for the future.

business-intelligence data-analysis data-engineering electric-vehicles feature-engineering kpianalysis predictive-analytics tableau trendanalysis

Last synced: 27 Mar 2026

https://github.com/vladstudennikov/diabetes-prediction-app

ML-powered web app built with Laravel and Vue.js to predict diabetes risk based on users' daily habits and behavior

cypress data-analysis diabetes-prediction fastapi inertiajs laravel matplotlib medicine ml pandas php scikit-learn seaborn vuejs

Last synced: 08 Apr 2026

https://github.com/htsandaruvan/attrition-analytics-suite-by-hello-green

I have created a comprehensive data analytics dashboard to identify factors contributing to attrition,

data-analysis data-analytics data-visualization powerbi

Last synced: 20 Jan 2026

https://github.com/faris771/identify_customer_segments

This project is part of the Palestine Launchpad by Spark, and Udacity with Google. It uses unsupervised learning to identify customer segments for a mail-order company in Germany. The goal is to direct marketing campaigns towards the most promising audiences. The data is provided by Bertelsmann Arvato Analytics.

clustering data-analysis decomposition feature-engineering machine-learning unsupervised-learning

Last synced: 08 Aug 2025

https://github.com/thenazar9/user-behavior-email-campaign-analysis-sql

Analysis of user behavior and email campaign performance using BigQuery and Looker Studio, focusing on account creation trends, email engagement, and user segmentation.

analytics bigquery data-analysis data-visualization etl looker-studio sql structured-query-language

Last synced: 16 Oct 2025

https://github.com/kingflow-23/association-matching

Research and Structuring of Funding Opportunities for Associations

association data-analysis data-engineering excel fondation pyqt5 python webscraping

Last synced: 07 Apr 2025

https://github.com/arkww/chinesenewspaperwordcount

Analyzing the word count of Chinese characters in Simplified and Traditional Chinese and comparing the results

chinese-language data-analysis data-science python

Last synced: 16 May 2026

https://github.com/noor188/preswald-data-app

A data app to visualize and manipulate the graduate admission dataset

data-analysis data-visualization open-source

Last synced: 04 Jul 2025

https://github.com/kefilweditse/awesome-matchem-datasets

Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.

awesome awesome-dataset awesome-dataset-collection awesome-match-data awesome-matchem data-analysis data-matching dataset dataset-collection dataset-research dataset-samples match match-data match-dataset-analysis match-examples

Last synced: 07 Apr 2025

https://github.com/who-else-but-arjun/isro_xrf_sr

Source code for super-resolution of the lunar elemental abundance map using a semi-supervised deep spatial interpolation model. This hybrid approach combines ResNet50 for spatial feature extraction with Graph Neural Network (GATv2Conv) layers and Convolutional Neural Networks (CNNs), followed by fusion layers.

cnn data-analysis graph-neural-networks pytorch semi-supervised-learning spatial-interpolation super-resolution

Last synced: 30 Apr 2026

https://github.com/divyanshugit/indian-judiciary-analysis

Analysis of Indian district court data across states.

classification data-analysis

Last synced: 02 Jul 2025

https://github.com/neerajcodes888/data-science

This repository is a hub for data science enthusiasts, offering a diverse collection of projects, notebooks, and resources covering topics such as data analysis, machine learning, deep learning, and generative AI. Explore innovative ideas, contribute to cutting-edge research, and enhance your skills in the dynamic field of data science.

data-analysis data-science data-visualization deep-learning deep-learning-algorithms eda genai jupyter-notebook machine-learning machine-learning-algorithms openai-api pandas plotting python3 sklearn-library streamlit

Last synced: 01 May 2026

https://github.com/jatin-mehra119/sales-analysis

Sales analysis of a supermarket

data-analysis salesanalysis visualization

Last synced: 29 Oct 2025

https://github.com/ifigeneiatsiflidou/popular-items-sales-analysis

Two data tasks in Python: popular items by ZIP & store sales breakdown with plots.

data-analysis matplotlib pandas

Last synced: 16 May 2026

https://github.com/sivkri/rnaseq-analysis-junctionseq-qorts

This repository provides scripts for RNA-Seq data analysis using JunctionSeq and QoRTs, enabling quality control, differential splicing analysis, and generation of browser tracks.

bioinformatics data-analysis differential-splicing genomics junctionseq qorts quality-control rna-seq rna-seq-analysis splice-junctions splice-variants spliced-alignment transcriptomics

Last synced: 22 Mar 2025

https://github.com/rohitha-tata/bike-sales

This project focuses on data cleaning, transformation, and dashboard creation using a bike buyers dataset. It includes Pivot Tables, slicers, visualizations, and statistical insights to analyze trends based on income, age, occupation, and other key factors. Insights help understand customer behavior, purchasing patterns, and decision-making trends.

data-analysis data-cleaning excel-dashboards interactive-slicers pivot-charts pivot-tables

Last synced: 08 Mar 2026

https://github.com/RLAlpha49/AniSearch-Model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2025

https://github.com/cosmoduende/r-earthquakes

Analysis and visualization of seismic activity data in Mexico with R. How to analyze and visualize Mexico's seismic history with data from the SSN (Servicio Sismológico Nacional)

data-analysis data-analytics data-science dataviz earthquakes r-code r-programming r-studio rstudio sismo sismologia sismos ssn ssnmx terremoto terremotos

Last synced: 24 Jan 2026

https://github.com/alfioma/ada-xtq

🔗 Simplify data transfer with ada-xtq, a lightweight tool for seamless integration and efficient handling of data between platforms.

ada algorithms api-development artificial-intelligence automation data-analysis data-visualization docker machine-learning neural-networks open-source programming python software-development xtq

Last synced: 01 May 2026

https://github.com/mrham17/spotify_streaming_analytics

Project is stable & documentation will be completed soon. Thank you for your understanding and patience.

big-data-analytics data-analysis google-colab music-data r-programming spotify streaming-analytics

Last synced: 24 Jul 2025

https://github.com/panoschatzi/erythrocyte_study_statistical_analyses

R code for data transformation, analysis and visualization of experimental data, as well as for statistical analyses and quantitative simulations.

afex data-analysis emmeans ggplot2 lme4 purrr r rprogramming rstats rstudio statistics tidyverse visualization

Last synced: 04 Apr 2025

https://github.com/jelhamm/internode-hellinger-distance-based-decision-tree

Simulations for the paper "Inter node Hellinger Distance based Decision Tree" by Pritom Saha Akash, Md. Eusha Kadir, Amin Ahsan Ali, and Mohammad Shoyaib

articles data-analysis data-mining decision-tree decision-tree-classifier hddt hellinger-distance-criterion machine-learning numpy-library paper-implementations python scipy-library simulation tree-node

Last synced: 04 Apr 2025

https://github.com/matte34/auto-insurance-analysis

Conducted a comprehensive exploratory data analysis (EDA) on an auto insurance dataset that I found on Kaggle. I performed a permutation test and generated data visualizations.

data-analysis data-visualization permutation-test python3 scipy seaborn

Last synced: 06 May 2026

https://github.com/ritap03/neuralnetwork-shapeclassifier

Feedforward neural network system in MATLAB for geometric shape classification. Includes data preprocessing, network training and evaluation, confusion matrix analysis, and a graphical interface for user interaction and model testing.

ai data-analysis deep-learning feedforward-network gui image-classification machine-learning matlab neural-network pattern-recognition

Last synced: 14 May 2026

https://github.com/swethajoseph/netflix-powerbi-interactive-dashboard

Created an interactive Netflix Power BI dashboard to analyze and visualize Netflix's content library, uncovering trends in content type, genre distribution, and global reach

data-analysis data-visualization interactive-visualizations powerbi powerbi-dashboards powerbi-report

Last synced: 03 Jan 2026

https://github.com/eea/eea.reveal

Reveal hidden knowledge by visualizing network structure in your data.

data-analysis data-visualization graphviz network-visualization

Last synced: 18 Mar 2025

https://github.com/cescedes/medical-insurance-costs-with-python

Investigate how different factors affect the prediction of medical insurance costs by practicing many Python concepts.

codecademy data-analysis python python-dictionaries python-functions python-lists python-loops python-strings

Last synced: 19 May 2026

https://github.com/ashwin331133/sql-healthcare-data

This repository contains SQL queries designed to analyze health care data. The queries focus on patient demographics, encounter costs, and flu shot statistics, aiming to provide insights into patient behavior and financial impacts. The datasets include information on patient encounters, flu shots, and hospital admissions.

data-analysis mysql sql

Last synced: 29 Oct 2025

https://github.com/mfakhriazhar/housing-price-analysis

Determining the price of a house also depends on various factors such as building area, exterior quality, and amenities. This dataset provides information on properties for sale, and through Exploratory Data Analysis (EDA), patterns and key factors affecting house prices can be identified.

data-analysis data-science data-visualization eda exploratory-data-analysis python

Last synced: 16 May 2026

https://github.com/stkisengese/numpy-data-fundamentals

A comprehensive collection of NumPy exercises covering array manipulation, slicing, broadcasting, random data generation, and real-world data analysis applications.

data data-analysis numpy pre-processing

Last synced: 16 May 2026

https://github.com/kushalagarwalla/netflix-movie-data-analysis

🚀 Netflix Data Analytics Project 🎬📊 | Analyzed 9K+ movies to uncover insights on genres, popularity, votes & release trends. Includes EDA, KPIs & visualizations using Python (Pandas, NumPy, Matplotlib, Seaborn). Supports data-driven content & engagement strategy.

data-analysis data-visualization jupyter-notebook numpy pandas python seaborn

Last synced: 06 May 2026

https://github.com/labex-labs/numpy-for-beginners

This comprehensive course covers the fundamental concepts and practical techniques of NumPy, the essential library for numerical computing in Python. Learn to create, manipulate, and analyze arrays efficiently.

array-manipulation array-slicing beginner-friendly course data-analysis data-science data-structures fast-computation hands-on labex labs linear-algebra matrix-operations numerical-computing numpy programming python python-programming scientific-computing vectorized-operations

Last synced: 20 Jun 2026

https://github.com/carlosvinimsouza/jupyter-notebook-basic

Stores all work related to Data Science.

data-analysis data-science programas-jupyter-notebook python

Last synced: 11 May 2026

https://github.com/mboula/mboula.github.io

GitHub portfolio + interactive resume | Showcasing data projects in civil rights (housing), cannabis, and analytics

cannabis case-study civil-rights compliance dashboards data-analysis data-cleaning data-vizualization excel google-data-analytics housing open-data pattern-analysis portfolio pro-se public-data r sql tableau

Last synced: 10 Jul 2025

https://github.com/zwelz3/unofficial-survivor-knowledge-graph

A comprehensive RDF knowledge graph covering all 50 seasons of Survivor (US), with 23,000+ triples across 749 named graphs.

data-analysis rdf survivor

Last synced: 23 May 2026

https://github.com/katarinatmb/serbia-protest-analysis

This project analyzes the frequency, regional distribution, and group characteristics of protests that emerged across Serbia following the fatal collapse of the Novi Sad train station roof in November 2024. The analysis explores how different communities responded in the aftermath of the disaster, using data visualization in RStudio.

data-analysis data-visualization r r-mark rstudio

Last synced: 10 Jul 2025

https://github.com/karsterr/repeated-measurement

An R-based workflow for conducting repeated measures ANOVA using the ez package, with data wrangling via tidyverse and visualization through ggplot2. Includes data import, transformation to long format, statistical analysis, and graphical summary.

anove data-analysis experimental-design ezanove ggplot2 r repeated-measurements rstats statistics tidyverse

Last synced: 18 Sep 2025

https://github.com/ariyaarka/sales-analysis

A simple analysis on random dataset of pizza sales using SQL

data-analysis presentation-slides sql

Last synced: 17 Jan 2026

https://github.com/victorlcastro-dsa/coping_struggles_prediction

Repository for predicting coping struggles based on mental health data. Includes analysis, visualization, and modeling using machine learning. Results reach 86.58% accuracy with a Voting Classifier.

classification-algorithm data-analysis data-science data-visualization machine-learning-algorithms problem-solving project-based-learning python

Last synced: 19 Apr 2025

https://github.com/ramapinnimty/udacity-mlfoundation-nanodegree

This is a repository containing solutions to the assignments that are a part of the Udacity Machine Learning Foundation Nanodegree program.

assignments data-analysis python3 statistics udacity-machine-learning-nanodegree

Last synced: 26 Jul 2025

https://github.com/colindean/allegheny_voter_reg_analysis

Allegheny County Voter Registration Analysis Tools

data-analysis data-science elections pandas polars python voting

Last synced: 16 May 2026

https://github.com/gaurav-van/data-analysis-projects

Collection of projects that involve data analysis and informed decision-making

data-analysis database powerbi sql

Last synced: 06 Sep 2025

https://github.com/dmvianna/python-nix

Trivial Nix environment with pandas and postgresql

data-analysis nix

Last synced: 27 Jul 2025

https://github.com/grindelfp/data-analysis-example

One of my UNI Artificial Intelligence Systems course's projects.

data-analysis data-preprocessing ipynb

Last synced: 19 Sep 2025

https://github.com/jofaval/ionosphere

Binary Classification of Ionosphere signals at Goose Bay, Labrador in 1988

data-analysis data-science data-visualization deep-learning google-colab keras machine-learning python scikit-learn tensorflow uci xgboost

Last synced: 09 Apr 2026

https://github.com/tbep-tech/tbep-r-training

Repository for miscellaneous R training materials

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/ashwin331133/gorkha_earthquake_damage_prediction

The main objective is to predict the level of damage to buildings caused by the 2015 Gorkha earthquake in Nepal.

data-analysis data-visualization machine-learning python

Last synced: 29 Apr 2026

https://github.com/shafaq-aslam/predicting-heart-disease-risk-with-logistic-regression-techniques

Develop a predictive model using logistic regression techniques to assess heart disease risk based on patient health metrics and data analysis.

data-analysis heart-disease logistic-regression machine-learning machine-learning-models matplotlib numpy pandas python scikit-learn seaborn

Last synced: 09 Apr 2026

https://github.com/engraulleite/local-data-warehousing-with-docker

Creating a data warehouse (DW) from zero to hero, from logical and physical modeling through to valuable reports.

airbyte data-analysis datawarehouse docker etl-pipeline metabase pgadmin4 postgresql

Last synced: 01 May 2026

https://github.com/nandit123/python_on_excel

Data analysis using Python libraries on Excel data

csv data-analysis data-science fill fluctuations graph numpy python python-library

Last synced: 16 May 2026

https://github.com/datalopes1/fifa21_datacleaning

This project carries out the data cleaning and manipulation process on the dataset "FIFA 21 messy, raw dataset for cleaning/ exploring", which can be found on Kaggle under the CC0: Public Domain license and was uploaded by Rachit Toshniwal.

data-analysis data-cleaning python

Last synced: 30 Apr 2026

https://github.com/tbep-tech/piney-point-analysis

Materials for analysis of Piney Point monitoring data

data-analysis open-science piney-point tampa-bay tbep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/rookery-bay-training

Materials for R training at Rookery Bay Monitoring Workshop 2020

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/martachesnova/sql

Performing data modeling (ERD) and data engineering, then writing a series of SQL queries to analyze a company's employee database.

data-analysis data-engineering data-modeling erd postgresql sql

Last synced: 16 May 2026

https://github.com/aroramrinaal/spotistats

Spotistats is a data analysis and visualization project based on your Spotify streaming history.

data-analysis numbers spotify spotify-history visualization

Last synced: 15 Mar 2025

https://github.com/hassanislam463/british-airways-data-science

Analyze Skytrax reviews to uncover customer sentiments and key themes while predicting booking behavior using machine learning. This repository includes data collection, analysis, and modeling scripts alongside concise, visualized insights to improve customer experience and operational efficiency.

data-analysis data-science data-visualization

Last synced: 28 Mar 2025