An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/blackcub3s/msc-finalthesis

The most important programming files, code functions and data processing pipelines for the machine learning final thesis of my Master's degree, along with the LaTeX code of the thesis.

data-analysis latex machine-learning numpy python sklearn

Last synced: 09 Apr 2026

https://github.com/badranalyst/restaurant-reviews-sentiment-analysis-nlp-case-study

This project analyzes restaurant reviews using Natural Language Processing (NLP) for sentiment analysis. It covers data exploration, pre-processing (NLTK text cleaning), model building, prediction, and deployment. The goal is to predict sentiment from reviews using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn.

data-analysis data-science eda exploratory-data-analysis matplotlib-pyplot model model-building numpy pandas pre-processing predictive-modeling python seaborn

Last synced: 13 Apr 2026

https://github.com/arkww/matmap

Making maps from a database and having the user guess which map is displayed

data-analysis data-science javascript python

Last synced: 24 Apr 2026

https://github.com/techshot25/graduateadmissions

Looking at the probability of being accepted into a graduate program using a machine learning model

bayesian-regression correlation-matrices data-analysis data-science linear-regression machie-learning random-forest-regression regression ridge-regression

Last synced: 25 Feb 2025

https://github.com/srinibas-masanta/hotel-revenue-analysis-dashboard

This project focuses on analyzing hotel booking data to uncover key metrics and insights that drive revenue management decisions. By creating an interactive Power BI dashboard, the project aims to improve strategic decision-making, optimize occupancy rates, and enhance overall financial performance within the hospitality industry.

business-analytics data-analysis data-science data-visualization dax-functions hospitality powerbi

Last synced: 12 Jan 2026

https://github.com/htsandaruvan/attrition-analytics-suite-by-hello-green

I have created a comprehensive data analytics dashboard to identify factors contributing to attrition,

data-analysis data-analytics data-visualization powerbi

Last synced: 20 Jan 2026

https://github.com/dcostachar/telco-customer-churn-dashboard

An interactive Tableau dashboard using the Telco Customer Churn dataset to analyze key drivers of customer churn and develop data-driven retention strategies for the telecommunications industry.

business-intelligence customer-churn-analysis data-analysis data-visualization marketing-analytics tableau

Last synced: 09 Mar 2026

https://github.com/kiran-kumar-k3/sales-performance-dashboard

The Sales Performance Dashboard is an interactive Python-based web application that visualizes and analyzes sales data, providing actionable insights through dynamic charts and metrics.

data-analysis python streamlit

Last synced: 20 May 2026

https://github.com/alan-oliveir/state-of-data-2022

In this project I analyze the distribution of salary ranges for junior-level professionals in the roles of data analyst, data scientist, and data engineer.

data-analysis jupyter-notebook pandas-python seaborn-python

Last synced: 03 Oct 2025

https://github.com/archanakokate/bank_term_deposit_prediction

Build a Decision Tree classifier to predict if the client will subscribe to a Term Deposit based on their demographic and behavioral data.

data-analysis data-visualization exploratory-data-analysis machine-learning

Last synced: 14 Sep 2025

https://github.com/arkww/chinesenewspaperwordcount

Analyzing the word count of Chinese characters in Simplified and Traditional Chinese and comparing the results

chinese-language data-analysis data-science python

Last synced: 16 May 2026

https://github.com/gui-sitton/carsells

In this project I am an analyst on the Crankshaft List. Hundreds of free vehicle advertisements are published on the site every day. I need to study the data collected over the last few years and determine which factors influence the price of a vehicle.

data data-analysis data-analysis-python data-science data-visualization python

Last synced: 20 May 2026

https://github.com/karanch10/fraudshield

FraudShield is a machine learning credit card fraud detection system that analyzes transaction attributes to identify suspicious activities in real time. Built with Python, SQL, and Django, it provides a user-friendly interface for fraud prediction using OpenBanking APIs and advanced detection techniques. Ideal for businesses and individuals.

data-analysis data-science data-visualization machine-learning python3

Last synced: 20 May 2026

https://github.com/jiteshshelke/codsoft

A repository showcasing three machine learning projects—Titanic Survival Prediction, Movie Rating Prediction, and Iris Flower Classification—completed during CodSoft's Data Science Internship. 🚀

codsoft codsoftinternship data-analysis data-science linear-regression logistic-regression machine-learning machine-learning-algorithms python

Last synced: 20 May 2026

https://github.com/tabibyte/azerbaijani-rapper-lyrics-data-analysis

Lyrics Data Analysis of Azerbaijani Rappers

azerbaijan data-analysis rappers

Last synced: 22 Jul 2025

https://github.com/patricksferraz/aqw-madrid-data-analysis

Interactive analysis and visualization of Madrid's air quality and weather data (2001-2016) using Python, Dash, and Jupyter. Features interactive maps, statistical analysis, and data visualization tools.

air-quality dash data-analysis data-engineering data-science data-visualization data-wrangling environmental-data environmental-science interactive-dashboard jupyter jupyter-notebook madrid open-data pandas plotly python statistical-analysis time-series weather-data

Last synced: 30 Jan 2026

https://github.com/ifigeneiatsiflidou/popular-items-sales-analysis

Two data tasks in Python: popular items by ZIP & store sales breakdown with plots.

data-analysis matplotlib pandas

Last synced: 16 May 2026

https://github.com/iwasakiyuuki/data-analysis-platform-airflow-dag

A collection of Airflow DAGs for automating data collection into our on-premises data analysis platform.

airflow airflow-dags data-analysis data-collection

Last synced: 13 May 2025

https://github.com/steviecurran/prediction-plot

Code that performs machine learning (k-nearest neighbours regression) and plots the predicted versus measured values

astrophysics c data-analysis high-redshift machine-learning pgplot python statistics tensorflow visualization

Last synced: 20 May 2026

https://github.com/RLAlpha49/AniSearch-Model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2025

https://github.com/alfioma/ada-xtq

🔗 Simplify data transfer with ada-xtq, a lightweight tool for seamless integration and efficient handling of data between platforms.

ada algorithms api-development artificial-intelligence automation data-analysis data-visualization docker machine-learning neural-networks open-source programming python software-development xtq

Last synced: 01 May 2026

https://github.com/panoschatzi/erythrocyte_study_statistical_analyses

R code for data transformation, analysis and visualization of experimental data, as well as for statistical analyses and quantitative simulations.

afex data-analysis emmeans ggplot2 lme4 purrr r rprogramming rstats rstudio statistics tidyverse visualization

Last synced: 04 Apr 2025

https://github.com/ahnaf19/clean_bankingdata

Here I tried to practice simple ETL tasks. I know how to perform these tasks in SQL, here just explored my way around using pandas as well.

data-analysis data-cleaning pandas python

Last synced: 19 Apr 2026

https://github.com/lunarwhite/lake-george-viz

Lake George data analysis and visualization, ANU COMP1730/6730

data-analysis python

Last synced: 01 Nov 2025

https://github.com/jelhamm/internode-hellinger-distance-based-decision-tree

Simulations for the paper "Inter node Hellinger Distance based Decision Tree" by Pritom Saha Akash, Md. Eusha Kadir, Amin Ahsan Ali, Mohammad Shoyaib

articles data-analysis data-mining decision-tree decision-tree-classifier hddt hellinger-distance-criterion machine-learning numpy-library paper-implementations python scipy-library simulation tree-node

Last synced: 04 Apr 2025

https://github.com/ashwin331133/sql-healthcare-data

This repository contains SQL queries designed to analyze health care data. The queries focus on patient demographics, encounter costs, and flu shot statistics, aiming to provide insights into patient behavior and financial impacts. The datasets include information on patient encounters, flu shots, and hospital admissions.

data-analysis mysql sql

Last synced: 29 Oct 2025

https://github.com/nemat-al/multivariate_data_analysis

Tasks for Multivariate Data Analysis Course @ ITMO University

data-analysis multivariate-analysis python

Last synced: 20 May 2026

https://github.com/kaushik-puttaswamy/amazon-sales-dashboard-using-tableau

The Amazon Sales Data Analysis Dashboard provides insights into key sales metrics like profit, revenue, shipment days, and units sold. It includes visualizations to assess performance by region, country, and sales channel. The dashboard helps stakeholders optimize strategies and improve profitability through data-driven analysis.

dashboard data-analysis data-visualization tableau

Last synced: 11 Jan 2026

https://github.com/silasberger/charts-analysis

Dataset collection, preprocessing and analysis of singles and album charts

charts data-analysis data-mining data-science dataset music

Last synced: 14 Sep 2025

https://github.com/ritap03/neuralnetwork-shapeclassifier

Feedforward neural network system in MATLAB for geometric shape classification. Includes data preprocessing, network training and evaluation, confusion matrix analysis, and a graphical interface for user interaction and model testing.

ai data-analysis deep-learning feedforward-network gui image-classification machine-learning matlab neural-network pattern-recognition

Last synced: 14 May 2026

https://github.com/lucashomuniz/project-15

[Dashboard] Enhancing Business Intelligence: Leveraging SQL, Python, and DAX for Strategic Insights in Sales Analysis

business-analytics business-intelligence data-analysis data-science data-visualization dax-languague machine-learning powerbi python

Last synced: 12 Jul 2025

https://github.com/mfakhriazhar/housing-price-analysis

The price of a house depends on various factors such as building area, exterior quality, and amenities. This dataset provides information on properties for sale, and through Exploratory Data Analysis (EDA), patterns and key factors affecting house prices can be identified.

data-analysis data-science data-visualization eda exploratory-data-analysis python

Last synced: 16 May 2026

https://github.com/ranxi2001/predicting-mental-health-risk

Data analysis case study: mental health prediction (data source: Kaggle)

data-analysis data-visualization eda

Last synced: 27 Jun 2025

https://github.com/stkisengese/numpy-data-fundamentals

A comprehensive collection of NumPy exercises covering array manipulation, slicing, broadcasting, random data generation, and real-world data analysis applications.

data data-analysis numpy pre-processing

Last synced: 16 May 2026

https://github.com/samruddhi3012/rfm-analysis

Hi there! In this project I have performed Sales Analysis (RFM Analysis) using SQL and Tableau.

data-analysis data-visualization mssqlserver rfm-analysis segmentation tableau

Last synced: 27 Jun 2025

https://github.com/carlosvinimsouza/jupyter-notebook-basic

Stores all of my work related to Data Science.

data-analysis data-science programas-jupyter-notebook python

Last synced: 11 May 2026

https://github.com/mboula/mboula.github.io

GitHub portfolio + interactive resume | Showcasing data projects in civil rights (housing), cannabis, and analytics

cannabis case-study civil-rights compliance dashboards data-analysis data-cleaning data-vizualization excel google-data-analytics housing open-data pattern-analysis portfolio pro-se public-data r sql tableau

Last synced: 10 Jul 2025

https://github.com/katarinatmb/serbia-protest-analysis

This project analyzes the frequency, regional distribution, and group characteristics of protests that emerged across Serbia following the fatal collapse of the Novi Sad train station roof in November 2024. The analysis explores how different communities responded in the aftermath of the disaster, using data visualization in RStudio.

data-analysis data-visualization r r-mark rstudio

Last synced: 10 Jul 2025

https://github.com/colindean/allegheny_voter_reg_analysis

Allegheny County Voter Registration Analysis Tools

data-analysis data-science elections pandas polars python voting

Last synced: 16 May 2026

https://github.com/gaurav-van/data-analysis-projects

A collection of projects involving data analysis and informed decision-making

data-analysis database powerbi sql

Last synced: 06 Sep 2025

https://github.com/245839/automobile-analysis

Analysis of data on cars imported to the USA, performed in Python using data analysis libraries in the Jupyter environment.

data-analysis jupyter-notebook python

Last synced: 20 May 2026

https://github.com/pooja-manjunatha/nyc_parking_violations_dbt

This project uses dbt to transform NYC parking violations data through a layered architecture: Bronze (raw ingested data), Silver (cleaned and enriched data), and Gold (aggregated tables for analytics). Using DuckDB as the warehouse backend, it ensures data quality with tests and documentation. The project enables reliable analysis of parking violations.

data data-analysis data-engineering dbt duckdb python sql

Last synced: 14 May 2026

https://github.com/vlad1343/data-visualisation

Python project showcasing interactive and static visualizations using Plotly and Matplotlib. It includes analysis of CSV, JSON, and API data, turning complex datasets into clear, insightful charts.

anova api csv-files data-analysis data-visualization json matplotlib matplotlib-pyplot pandas pandas-python plotly python3 seaborn seaborn-python

Last synced: 08 Apr 2026

https://github.com/faizantkhan/python_matplotlib

Matplotlib is a powerful Python library for creating visualizations and plots. It’s widely used for data representation, making complex information more accessible and interpretable. It offers various types of plots, including line graphs, scatter plots, bar charts, histograms, and more

data-analysis data-analytics data-engineering data-science data-visualization deep-learning graphs line machine-learning machine-learning-algorithms matplotlib matplotlib-pyplot matplotlib-python python

Last synced: 20 May 2026

https://github.com/vzamboulingame/data-portfolio

This repository showcases my projects in Python and SQL, highlighting my skills in data analysis & visualization.

data-analysis data-portfolio data-science data-science-portfolio data-science-projects data-visualization jupyter-notebook portfolio python sql

Last synced: 20 May 2026

https://github.com/farhad-here/tegenx

TeGenX: Multilingual Text Generation App. TeGenX is a lightweight, interactive text generation application built with Streamlit. It leverages multiple pre-trained transformer models to generate text in both English and Persian.

data-analysis data-science deep-learning happytransformer huggingface nlp python stream text-generation text-generator textgeneration transformer web-application

Last synced: 25 Jan 2026

https://github.com/gabrielramirezv/rnaseq_2025_notas

Repository for RNA-seq class from the Undergraduate Program in Genomic Sciences.

data-analysis r rna

Last synced: 29 Mar 2025

https://github.com/aakk23/netflix_sql_project

This SQL project provides an analytical overview of Netflix's movies and TV shows dataset, uncovering key insights related to content types, ratings, release trends, and geographic distribution. It helps explore patterns in content availability, audience targeting, and regional preferences to support data-driven decisions.

data-analysis netflix-data-analysis postgresql sql

Last synced: 10 Apr 2025

https://github.com/samruddhi3012/health-care-analytics

Hi! This repo performs healthcare analytics using advanced Microsoft Excel.

dashboard data-analysis data-visualization healthcare microsoft-excel pivot-chart pivot-tables vlookup

Last synced: 29 Mar 2025

https://github.com/saravanansuriya/energy-consumption-analysis

This project will analyze the energy usage and greenhouse gas (GHG) emissions of Ontario's Broader Public Sector (BPS) organizations, leveraging a comprehensive database of reported data in Power BI.

data-analysis data-cleaning powerbi python-script

Last synced: 22 Mar 2025

https://github.com/mkoeppe/jiawei-computations

Computations supporting Chapters 2 and 3 of Jiawei Wang's dissertation "Subadditivity of Piecewise Linear Functions", UC Davis, Ph.D. program in Mathematics, 2020

benchmark-framework branch-and-bound cluster cutting-planes data-analysis hpc integer-programming reproducible-research sagemath

Last synced: 10 Aug 2025

https://github.com/engraulleite/local-data-warehousing-with-docker

Creating a data warehouse (DW) from zero to hero, starting with logical and physical modeling and ending with valuable reports.

airbyte data-analysis datawarehouse docker etl-pipeline metabase pgadmin4 postgresql

Last synced: 01 May 2026

https://github.com/s-narasimman/zepto_inventory_sql_data_analysis

This project focuses on data cleaning, exploration, and analysis of product information from the Zepto dataset using SQL. It provides actionable insights into pricing, stock availability, discounts, and category-level performance.

aggregation categorization csv data-analysis data-cleaning kaggle postgresql sql zepto

Last synced: 16 May 2026

https://github.com/deborangueira/campeonado_kaggle_2025

Development of a machine learning model to predict the success of startups. The goal is to identify which companies are most likely to become success stories in the market.

computacao data-analysis desafio kaggle modulo3 ponderada

Last synced: 16 May 2026

https://github.com/pabi1234810/data_analysis_zepto

A comprehensive SQL-based business intelligence solution for analyzing grocery store product data, inventory management, and pricing strategies. This project demonstrates end-to-end data analysis workflow from raw data exploration to actionable business insights.

analytics csv data-analysis data-science database excel kaggle kaggle-dataset mathematics pgadmin4 sql utf-8 zepto

Last synced: 01 Nov 2025

https://github.com/alinababer/data-science-and-insight-agent-rag-llama3-lava-llm

Data-Science-and-Insight-Agent-RAG-LLama3-Lava-LLM-Django-WebApplication is an advanced AI-driven chatbot designed to assist in data science, document analysis, and image interpretation. This repository contains the data science agent of this project.

artificial-neural-networks classifcation data-analysis data-engineering data-visualization datascience large-language-models llama2 lstm machine-learning python random-forest regression

Last synced: 01 Jan 2026

https://github.com/an0n1mity/spamclassifiereval

A repository for evaluating the misclassification rate of spam classification models using a threshold-based approach.

data-analysis machine-learning natural-language-processing python-programming spam-classification text-classification

Last synced: 02 Nov 2025

https://github.com/hemangsharma/hotel-revenue-booking-analysis

This project provides a comprehensive revenue and reservation analysis for Highfield Hotel using historical data exported from booking systems and internal revenue reports. The goal is to derive actionable insights to improve room profitability, understand booking patterns, and support data-driven decision-making.

analysis data-analysis data-visualization hotel

Last synced: 10 Aug 2025

https://github.com/datalopes1/fifa21_datacleaning

This project cleans and manipulates the dataset "FIFA 21 messy, raw dataset for cleaning/ exploring", which can be found on Kaggle under the CC0: Public Domain license and was uploaded by Rachit Toshniwal.

data-analysis data-cleaning python

Last synced: 30 Apr 2026

https://github.com/jakobzmrzlikar/pca-on-genomes

An analysis of human genome mutations from different populations.

data-analysis genome-analysis pca-analysis

Last synced: 16 May 2025

https://github.com/edanur-y/abalone-age-prediction-with-regression-models

Comparing the performances of simple linear, multiple linear, multi-layer perceptron and k-nearest neighbors regressions on abalone data to predict the age.

data-analysis hyperparameter-tuning missing-values-analysis outlier-analysis python recursive-feature-elimination

Last synced: 20 May 2026

https://github.com/ksharma67/eda-on-ipl

In this Python notebook, IPL matches from 2008 to 2020 are analyzed using Python packages such as pandas, matplotlib and seaborn.

data-analysis data-science eda matplotlib numpy pandas python seaborn

Last synced: 07 May 2026

https://github.com/nafisrayan/decentai

A comprehensive platform built using ReactJS and Flask, combining blockchain technology with AI to create a secure and intelligent space for community engagement and policy discussions. Leverages NLP and LLM for meaningful interactions and sentiment analysis while ensuring data security and user privacy.

chatbot data-analysis data-visualization flask gemini gemini-ai gemini-ai-chatbot gemini-api government government-tech llm mongodb nlp polls python react tailwind voting-systems winknlp

Last synced: 12 Apr 2026

https://github.com/martachesnova/sql

Performing data modeling (ERD) and data engineering, then writing a series of SQL queries to analyze a company's employee database.

data-analysis data-engineering data-modeling erd postgresql sql

Last synced: 16 May 2026

https://github.com/hassanislam463/british-airways-data-science

Analyze Skytrax reviews to uncover customer sentiments and key themes while predicting booking behavior using machine learning. This repository includes data collection, analysis, and modeling scripts alongside concise, visualized insights to improve customer experience and operational efficiency.

data-analysis data-science data-visualization

Last synced: 28 Mar 2025

https://github.com/hassanislam463/sentiment_analysis_of_financial_news_headlines_and_affect_on_stock_price_prediction

This project analyzes financial news sentiment using a fine-tuned RoBERTa model and integrates it with stock data to predict price movements using LSTM and GRU. It highlights the role of sentiment in enhancing stock market forecasting.

data-analysis data-science data-visualization deep-learning lstm-neural-networks nlp-machine-learning

Last synced: 28 Mar 2025

https://github.com/adrianlardies/multi-asset-financial-analysis

Comparative analysis of bitcoin, gold and S&P 500 in relation to macroeconomic indicators (VIX, interest rate, CPI). We explore the evolution of a $100 monthly investment in these assets, presenting visualizations to evaluate their performance and potential as financial diversification tools.

data-analysis data-science matplotlib pandas python seaborn

Last synced: 09 May 2026

https://github.com/hemant-kumar786/heart-disease-prediction

Heart Disease Analysis project in RStudio using statistical methods and data visualization. Includes data cleaning, exploratory data analysis (EDA), correlation study, and insights on key health indicators influencing heart disease.

correlation-study data-analysis data-visualization eda healthcare heart-disease r rstudio statical-analysis

Last synced: 02 Nov 2025

https://github.com/chrisrobertsjr/chrisrobertsjr

Welcome to my Github Profile!

data data-analysis java r sql statistics

Last synced: 03 May 2026

https://github.com/satyacoder29/smartfinance-dynamic-financial-dashboard

SmartFinance: Dynamic Financial Dashboard is an interactive tool designed to visualize key financial metrics like revenue, expenses, and profit. It features real-time data updates, charts, slicers, and navigation for easy analysis. This dashboard helps businesses make data-driven decisions and optimize financial performance.

data-analysis data-cleaning data-modeling data-visualization powerbi powerbi-desktop powerbi-visuals powerquerym

Last synced: 13 Feb 2026

https://github.com/nick-peter-marcus/chocolate-bar-analysis

Analyzing Chocolate Bar Features and Ratings - Data Visualization, Decision Trees, Random Forest

data-analysis data-visualization decision-trees python random-forest seaborn sklearn

Last synced: 10 May 2026

https://github.com/karishmagupta05/e-commerce-sales-dashboard

This project is an interactive E-Commerce Sales Dashboard built using Power BI. It provides key insights into sales, profit, and customer behavior through visually engaging charts and graphs.

data-analysis data-visualization powerbi

Last synced: 09 Feb 2026

https://github.com/erseco/ugr_tratamiento_inteligente_datos

Working repository for the Intelligent Data Processing (Tratamiento Inteligente de Datos) course of the Master's in Computer Engineering at the University of Granada (UGR)

data-analysis data-mining ugr

Last synced: 26 Apr 2026