An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/alpkanoz/ibm_data_science_professional_certificate

This repository contains projects and training materials completed during the IBM Data Science Professional Certificate course.

classification clustering data-analysis data-science data-visualization dataframe ibm ibm-watson machine-learning mathplotlib pandas predictive-modeling python scikit-learn

Last synced: 07 Mar 2026

https://github.com/kushagrakumar04/visual-age-distribution

A bar chart or histogram that visually depicts the distribution of a categorical or continuous variable, such as the age distribution or gender composition within a population. This graphical representation provides a clear overview of the data's patterns and trends.

data-analysis data-science google-colab

Last synced: 21 Jun 2025

https://github.com/rezowanrahat/netflix_analysis

Data analysis of Netflix content using Python, Pandas, and Seaborn

data-analysis data-visualization netflix pandas python

Last synced: 07 May 2026

https://github.com/atharvkadammm/calmlytic

An end-to-end machine learning project that predicts anxiety severity using classification models (Naive Bayes, Decision Tree, SVM, Logistic Regression, XGBoost), based on lifestyle, health, and behavioral features.

anxiety-prediction classification csv data-analysis data-preprocessing-and-cleaning data-science data-visualization ensemble-learning logistic-regression machine-learning-algorithms matplotlib mental-health numpy pandas python sci-kit-learn seaborn supervised-learning svm xgboost

Last synced: 21 Jun 2025

https://github.com/atharvkadammm/suicide-prediction-system

A machine learning project predicting suicide risk based on multiple socio-economic and environmental factors using data mining techniques.

csv data-analysis data-science data-visualization datamining exploratory-data-analysis feature-engineering machine-learnin matplotlib mental-health numpy pandas riskassesment seaborn sklearn suicide-prediction supervised-

Last synced: 01 Jul 2025

https://github.com/teditae/data-analysis-with-pandas

Mini data science projects focused on Pandas-powered analysis.

data-analysis data-manipulation pandas python

Last synced: 30 Apr 2026

https://github.com/pkjjoshi/restaurants-analysis

Performed beginner-level EDA on a restaurant dataset using Python. Analyzed top cuisines, city-wise ratings, price ranges, and online delivery impact using Pandas and Matplotlib. Includes 4 well-structured notebooks with visual insights.

beginner-project data-analysis data-visualization exploratory-data-analysis jupyter-notebook pandas python restaurant-data seaborn

Last synced: 21 Jun 2025

https://github.com/adnanrahin/nlp-with-disaster-tweets

Kaggle Competition: Predict which Tweets are about real disasters and which ones are not. Natural Language Processing.

data-analysis data-science data-visualization kaggle-competition machine-learning natural-language-processing regular-expression tweets

Last synced: 21 Jun 2025

https://github.com/sakan811/gachascope

Evaluate the cost-effectiveness of various in-app purchase bundles available in gacha games.

data data-analysis data-visualization game honkai honkai-star-rail honkai-starrail hoyoverse javascript nextjs tableau tableau-public typescript wutheringwaves

Last synced: 04 May 2026

https://github.com/engraulleite/local-data-warehousing-with-docker

Creating a data warehouse (DW) from zero to hero, starting with logical and physical modeling and ending with valuable reports.

airbyte data-analysis datawarehouse docker etl-pipeline metabase pgadmin4 postgresql

Last synced: 01 May 2026

https://github.com/maxbiostat/diehl_ebola_cell_2016

Supplementary code and data for Diehl et al., 2016 (Cell)

data-analysis data-visualization disease-spread ebola mutation

Last synced: 11 Jul 2025

https://github.com/vedantshi/tableau-bike-data-dashboard

London Bike Rides Analysis explores bike usage patterns using data visualization and machine learning. It identifies trends through a dynamic moving average, analyzes weather impact with heatmaps, and provides actionable insights via an interactive Tableau dashboard. Tools: Python, Tableau.

data-analysis data-visualization python tableau weather-data

Last synced: 16 May 2026

https://github.com/balajimohan18/loan-clustering-datascience-project

This project uses machine learning to cluster loans based on their similarities. It uses a dataset of loan applications that includes information about the loan amount and balance, then applies a clustering algorithm to group similar loans together.

clustering-algorithm data-analysis data-science data-visualization eda kmeans-clustering machine-learning sql unsupervised-learning

Last synced: 27 Jul 2025

https://github.com/datalopes1/fifa21_datacleaning

This project carries out the cleaning and manipulation of the dataset "FIFA 21 messy, raw dataset for cleaning/ exploring", which can be found on Kaggle under a CC0: Public Domain license and was uploaded by Rachit Toshniwal.

data-analysis data-cleaning python

Last synced: 30 Apr 2026

https://github.com/jayita11/eda-student-exam-performance

This project performs Exploratory Data Analysis (EDA) and hypothesis testing on student performance data. It explores trends based on attributes like gender, race/ethnicity, parental education, lunch type, and test preparation course completion.

data-analysis eda hypothesis-testing matplotlib pandas python seaborn statsmodels student-performance-analysis

Last synced: 11 Jul 2025

https://github.com/progati00/marketing-mix-modeling-mmm-for-marketing-budget-optimization

A Marketing Mix Modeling (MMM) project using Python to analyze channel performance, calculate ROI, and simulate marketing budget changes for better business decisions. Includes a trained Linear Regression model, ROI analytics, and a Flask API for revenue prediction.

api budget-optimization data data-analysis data-science ecommerce eda flask jupyter-notebook linear-regression machine-learning marketing-analytics marketing-mix-modeling python roi-analysis vscode

Last synced: 14 Apr 2026

https://github.com/palakjainanalyst/ecommerce-customer-spending-analysis

An end-to-end ecommerce analytics project uncovering customer spending trends using Excel, Python, SQL, and Power BI. From raw data to interactive dashboards, this project delivers insights on spending patterns and high-value customer segments, showcasing a complete data-to-decisions workflow.

data-analysis data-visualization database ecommerce excel jupyter-notebook powerbi python spending sql

Last synced: 06 May 2026

https://github.com/rociobenitez/airbnb-data-mining

Detailed analysis and predictive modeling of accommodations in Madrid using Big Data techniques and statistics in R, focused on data optimization and the prediction of property characteristics.

airbnb data-analysis data-mining estadistica prediction-model predictive-analytics predictive-modeling qmd r rstudio

Last synced: 23 Jun 2025

https://github.com/martachesnova/sql

Performing data modeling (ERD) and data engineering, then writing a series of SQL queries to analyze a company's employee database.

data-analysis data-engineering data-modeling erd postgresql sql

Last synced: 16 May 2026

https://github.com/nikbarb810/motif_detection_in_r

Motif Detection for TFBS in Glycolysis and Glyconeogenesis pathways

bioinformatics data-analysis null-hypothesis pwm r

Last synced: 23 Jun 2025

https://github.com/jofaval/boston-housing

Regression Analysis into the Boston Housing in-demand pricing in 1978

boston-housing data-analysis data-science data-visualization machine-learning python regression

Last synced: 16 May 2026

https://github.com/jwt218/sinc

MATLAB Standardization and Isotope Normalization for CSIA (with integrated correction and uncertainty quantification)

data-analysis geochemistry isotopes matlab

Last synced: 23 Jun 2025

https://github.com/hassanislam463/british-airways-data-science

Analyze Skytrax reviews to uncover customer sentiments and key themes while predicting booking behavior using machine learning. This repository includes data collection, analysis, and modeling scripts alongside concise, visualized insights to improve customer experience and operational efficiency.

data-analysis data-science data-visualization

Last synced: 28 Mar 2025

https://github.com/farzeennimran/fashion-mnist-dataset-classification-using-neural-network

Implementation of a Multi-layer Perceptron classifier with hyperparameter tuning and k-fold cross-validation employing GridSearchCV for classifying images on the Fashion MNIST dataset 👗👚👖

artificial-intelligence data-analysis data-mining data-science dataset deep-learning fashion-mnist-dataset gridsearchcv hyperparameter-tuning kfold-cross-validation machine-learning multilayer-perceptron-network neural-network numpy pandas python sklearn

Last synced: 03 Apr 2026

https://github.com/hassanislam463/sentiment_analysis_of_financial_news_headlines_and_affect_on_stock_price_prediction

This project analyzes financial news sentiment using a fine-tuned RoBERTa model and integrates it with stock data to predict price movements using LSTM and GRU. It highlights the role of sentiment in enhancing stock market forecasting.

data-analysis data-science data-visualization deep-learning lstm-neural-networks nlp-machine-learning

Last synced: 28 Mar 2025

https://github.com/gappeah/credit-card-transactions-fraud-detection-project

The Credit Card Transactions Fraud Detection Project repository is designed to analyse and detect fraudulent transactions in credit card data.

data-analysis postgresql sql

Last synced: 12 Jul 2025

https://github.com/myktorijus/retention-cohort

Extracted cohort data using SQL in BigQuery focusing on weekly retention from week 0 to week 6

bigquery data-analysis data-visualization powerbi sql

Last synced: 13 Jul 2025

https://github.com/jhermienpaul/google-data-analytics-program

Hands-on learning materials from the 8-course Google Data Analytics Professional Certificate program, covering foundational data skills, tools, and real-world business problem-solving

bigquery dashboard data-analysis data-analytics data-modeling data-storytelling data-visualization data-wrangling descriptive-analytics diagnostic-analytics etl-pipeline r-programming rstudio sql tableau

Last synced: 13 Jul 2025

https://github.com/guilherme-marcello/r-data-analysis-piechart

Reading RDS files, processing and presentation in pie charts

data-analysis data-visualization pie-chart r

Last synced: 13 Jul 2025

https://github.com/chrisrobertsjr/chrisrobertsjr

Welcome to my Github Profile!

data data-analysis java r sql statistics

Last synced: 03 May 2026

https://github.com/satyacoder29/smartfinance-dynamic-financial-dashboard

SmartFinance: Dynamic Financial Dashboard is an interactive tool designed to visualize key financial metrics like revenue, expenses, and profit. It features real-time data updates, charts, slicers, and navigation for easy analysis. This dashboard helps businesses make data-driven decisions and optimize financial performance.

data-analysis data-cleaning data-modeling data-visualization powerbi powerbi-desktop powerbi-visuals powerquerym

Last synced: 13 Feb 2026

https://github.com/simranrayait51/internshala-ds-projects

Projects from the Internshala Data Science course, showcasing my skills in Excel, SQL, Python, and Tableau for data manipulation, analysis, and visualization.

data-analysis data-science data-visualization excel internshala-project pgc postgresql python sql tableau

Last synced: 17 May 2026

https://github.com/nick-peter-marcus/chocolate-bar-analysis

Analyzing Chocolate Bar Features and Ratings - Data Visualization, Decision Trees, Random Forest

data-analysis data-visualization decision-trees python random-forest seaborn sklearn

Last synced: 10 May 2026

https://github.com/percival33/machine-learning-engineering

University project on enhancing a fictional music streaming service by developing machine learning models to generate popular playlists.

data-analysis data-science machine-learning python

Last synced: 14 Jul 2025

https://github.com/bonelesswater/tradingbot

This project is a web application for a trading bot that displays financial data and indicators. It includes functionality for researching financial data, displaying market indicators, and more.

ai azure css d3 data-analysis django html javascript jquery materializecss python stock-market

Last synced: 30 Dec 2025

https://github.com/karishmagupta05/e-commerce-sales-dashboard

This project is an interactive E-Commerce Sales Dashboard built using Power BI. It provides key insights into sales, profit, and customer behavior through visually engaging charts and graphs.

data-analysis data-visualization powerbi

Last synced: 09 Feb 2026

https://github.com/ishansurdi/data-visualisation-empowering-business-with-effective-insights

The following tasks are completed for Data Visualization: Empowering Business with Effective Insights on Forage in October 2024. It is important to note that this should not be interpreted as an endorsement.

chart communicating-insights-and-analysis dashboard data data-analysis forage powerbi powerbi-visuals tableau tata tata-group virtual-internship visual visualization

Last synced: 17 Feb 2026

https://github.com/caesaredia/chicago-taxi-data-insights

Exploratory data analysis and hypothesis testing on Chicago taxi trip data to uncover patterns in demand and the effects of rainy weather on travel time.

chicago data-analysis data-visualization exploratory-data-analysis hypothesis-testing python statistical-analysis taxi-trips weather-analysis

Last synced: 17 May 2026

https://github.com/ccoolbaugh/individualized_cooling_data_analysis

MATLAB code to analyze data collected during a brown adipose tissue individualized cooling protocol.

brown-adipose-tissue cold-exposure data-analysis ibutton matlab skin-temperature thermoregulation

Last synced: 18 Aug 2025

https://github.com/theo-jenkins/fmri-brain-scan-analyser

MATLAB toolkit for reading, analysing and simulating rs-fMRI brain scans in .nii format.

algorithms data-analysis data-visualization fmri-data-analysis matlab neuroimaging

Last synced: 15 Jul 2025

https://github.com/mahmoudwal27/brazilian_ecommerce

This project explores and cleans the Olist Brazilian E-Commerce dataset using Python (Pandas) to prepare it for Power BI visualization. The process includes loading data, performing exploratory analysis, handling missing values and duplicates, formatting key columns, and exporting clean datasets.

analytics data-analysis data-analysis-python google-cloud python

Last synced: 16 May 2026

https://github.com/priyanshubiswas-tech/farmlab-report-and-case-study-iot

This project was developed through live interviews and case studies with farmers in the year 2023 to address key agricultural challenges. The device provides real-time farm insights for better decision-making. Future plans include a digital portal, increased range, more sensors, and improved design. Open to collaboration!

arduino-ide c case case-study data data-analysis iot iot-device serialization

Last synced: 15 Jul 2025

https://github.com/miusarname2/proyectos-final-analitica-de-datos

Welcome to the repository where the magic of data analytics comes to life! This is the result of our effort and creativity in the subject of data analysis at the Universidad Cooperativa de Colombia (UCC). Here we keep everything we did to analyse data, draw cool conclusions and solve the workshop we were given. 🎯📊

data-analysis data-science data-visualization pip python

Last synced: 15 Jul 2025

https://github.com/jarrarshahid/nutrition-calculator

Simple Python app to calculate the nutritional content of everyday meals.

data-analysis health json jupyter-notebook logic-programming python

Last synced: 15 Jul 2025

https://github.com/hosseinkarimi128/zed-one

An AI-powered assistant that analyzes CSV data using natural language queries to generate pandas code and visualizations.

ai-data-analysis automated-pandas automated-pandas-queries csv data-analysis fastapi langchain machine-learning matplotlib nlp openai pandas restful-api summarization visualization-tools

Last synced: 07 Apr 2026

https://github.com/vishnu-vamshii/layoffs-data-analysis-in-sql

This project focuses on the cleaning and exploratory analysis of a dataset containing layoff information. It includes data deduplication, standardization of columns, handling null and blank values, and analyzing layoffs by company, industry, country, and date. Various SQL queries are used to explore trends and patterns in layoffs over time.

data-analysis eda mysql

Last synced: 15 Jul 2025

https://github.com/viper373/lol-dataanalytics

Tencent Games, League of Legends tournaments: comprehensive data analysis and prediction for 2020/21/22

crawler-python data-analysis jupyter-notebook lol python spider

Last synced: 15 Jul 2025

https://github.com/shrutiijoshi/airbnb-listing-reviews

Airbnb is an online marketplace that connects people who want to rent out their homes with travelers seeking accommodations.

data-analysis matplotlib-pyplot pandas-python python seaborn

Last synced: 17 May 2026

https://github.com/gagan8605/zepto_sql_analysis

This project explores and analyzes the inventory data of Zepto, a rapidly growing 10-minute grocery delivery platform in India. The dataset contains over 3,000+ SKUs across key product categories such as Fruits & Vegetables, Dairy, Beverages, Packaged Foods, and more. The analysis was performed using PostgreSQL, covering both data cleaning and bus

cleaning-data data-analysis database-management postgresql sql

Last synced: 16 Jul 2025

https://github.com/amr-yasser226/interactive-sales-analytics-dashboard

An interactive web-based dashboard for visualizing multinational electronics sales data. This project for the DSAI 203 course integrates a Python/Flask backend with an amCharts frontend to provide dynamic insights into product revenues, sales distribution, and employee statistics across different countries.

am5charts amcharts business-intelligence css dashboard data-analysis data-analytics data-visualization flask html javascript python sqlalchemy sqlite web-application

Last synced: 13 Apr 2026

https://github.com/harmanveer-2546/movie-industry

Investigate the film industry to understand what contributes to success, and use this analysis to create actionable recommendations for companies entering the industry.

business business-analytics data-analysis datatime film-industry graphs matplotlib movie-database numpy pandas python scraping-websites seaborn visualization web-scraping-python

Last synced: 10 Apr 2026

https://github.com/maheera421/pandas

Implementation of essential Pandas functions.

data-analysis data-manipulation pandas-dataframes pandas-datareader pandas-python

Last synced: 17 Jul 2025

https://github.com/venkat-023/thyroid-cancer-prediction

This project aims to develop a machine learning pipeline to predict thyroid cancer based on patient data. The dataset was sourced from multiple public repositories, cleaned, and merged to create a comprehensive dataset for modeling. Various classification algorithms were implemented, including Random Forest, Logistic Regression, K-Nearest Neighbors

data-analysis data-cleaning deep-learning ensembling-methods hyperparameter-tuning machine-learning-algorithms nueral-networks

Last synced: 17 May 2026

https://github.com/ofir-frd/predict-success-of-a-restaurant

Apply machine learning to a restaurant database. Study and analyse the data to predict whether a restaurant will be successful.

data-analysis data-science machine-learning visualization

Last synced: 11 Jun 2026

https://github.com/nikbarb810/covid_growth_rate_390.51

Exploring Covid Growth Rate of European Population using genetic data analysis

bioinformatics data-analysis r rcpp

Last synced: 07 Apr 2026

https://github.com/grindelfp/two-data-manipulative-tasks

Two simple tasks on data analysis and processing.

data-analysis ipynb mlda

Last synced: 17 Feb 2026

https://github.com/ahmeddhus/exploring-football-data-analysis

Learning and exploring data analysis through real-world datasets using Python, the StatsBomb APIs, and the mplsoccer library.

data-analysis jupyter-notebook mplsoccer python statsbomb

Last synced: 17 May 2026

https://github.com/cyblx/clustering

This project explores clustering techniques and supervised learning applied to World Cup team performance analysis. The methodologies include K-Means, DBSCAN, K-Nearest Neighbors, Gaussian Mixture Models (GMM), and Agglomerative Clustering.

clustering data-analysis dbscan gmm kmeans supervised-learning unsupervised-learning world-cup

Last synced: 18 Jul 2025

https://github.com/theveryhim/basic-data-analysis

Working with basic Python tools frequently used in data science

data-analysis data-processing visualization

Last synced: 18 Jul 2025

https://github.com/theveryhim/web-scraping-and-statistical-tests

Crawling the web for data and performing statistical tests to verify judgments

data-analysis hypothesis-testing web-scraping

Last synced: 18 Jul 2025

https://github.com/theveryhim/massive-text-processing

Cleaning, processing, and analysis of a papers dataset using the PySpark (RDD) framework

big-data data-analysis frequent-itemsets massive-datasets pyspark text-preprocessing

Last synced: 18 Jul 2025

https://github.com/abhinavhariyal/diwali-sales-analysis

This project covers data visualization and analysis of Diwali sales data using Python and Jupyter Notebook.

data-analysis data-visualization jupyter python

Last synced: 19 May 2026

https://github.com/amyanchen/sf-airbnb

Exploratory Data Analysis of San Francisco Airbnb listings

data-analysis data-science data-visualization r rmarkdown statistics

Last synced: 18 Jul 2025

https://github.com/michael-angelo-mootoo/quanta-app

Quanta is an open source statistical package app / toolkit for neuroscience and general computational descriptive and inferential statistics.

computational-statistics customtkinter data-analysis descriptive-statistics gui-application inferential-statistics neuroscience python r statistical-analysis statistics tkinter-python

Last synced: 16 May 2026

https://github.com/pdiegel/currencytracker

A Python application that fetches real-time currency exchange rates from an API, securely stores the data in an SQLite database, and includes error handling, logging, and good programming practices for reliable and periodic data capturing.

analysis api currency data-analysis data-capture logging python python3 sqlite3 tracker

Last synced: 09 Sep 2025

https://github.com/alexjackson1/commons-indicative-votes

A cluster analysis of the House of Commons' Indicative Brexit Voting Process on 27 March 2019

data-analysis politics r

Last synced: 19 Jul 2025

https://github.com/ggarciajavier/udacity-dalf-project3-test-perceptual-phenomenom

Work performed for the 3rd project of the Udacity Data Analyst Nanodegree: statistical testing of a perceptual phenomenon (Stroop task).

data-analysis python statistical-inference udacity-data-analyst-nanodegree

Last synced: 18 May 2026

https://github.com/mh0386/motorcycle_data_analysis

Data analysis applied to motorcycle dataset.

data-analysis

Last synced: 19 Jul 2025

https://github.com/yasir-arafah/nyc-trip-fare-prediction-using-tcn

"NYC Trip Fare Prediction Using Temporal Convolutional Networks (TCN)" is a data analytics project in which NYC taxi trip and fare data are combined, analyzed using PySpark, and visualized using the Matplotlib library. The project predicts fares using a Temporal Convolutional Network.

colab data-analysis matplotlib nyc-taxi-dataset pyspark python

Last synced: 29 Apr 2026

https://github.com/malexandersalazar/tools-python-mssql-statistics-descriptor

A lightweight tool based on sweetviz that generates high-density visualizations to kickstart Exploratory Data Analysis within Microsoft Azure SQL Database using ODBC with just one line of code

azure-sql-database data-analysis data-visualization eda python

Last synced: 16 May 2026

https://github.com/preciousclement/maternal-experiences-in-nigeria

This repository contains a Python-based project that generates realistic synthetic data simulating the maternal health journey of 5,000 women in Nigeria.

data-analysis data-generation maternal-health nigeria public-health python

Last synced: 08 May 2025

https://github.com/jm199504/data-analysis-practice

Data analysis practice (Titanic / BankCustomers)

data-analysis python

Last synced: 02 May 2026

https://github.com/jwt218/isonq

MATLAB package for Qtegra-generated data file processing.

data-analysis geochemistry isotopes matlab

Last synced: 03 Apr 2025

https://github.com/sharoonjoseph321/social_media_eda

Data analysis of social media apps using pandas, Python, and Matplotlib.

data data-analysis data-science data-visualization matplotlib programming-language project python pythonprojects

Last synced: 03 Mar 2025

https://github.com/alejandrolara11/desafio_latam_introduccion_analisis_de_datos

Repository for the "Introducción al Análisis de Datos" (Introduction to Data Analysis) course from Desafío Latam. Practical exercises completed during the course, focused on data analysis with Python and Pandas and on basic visualization.

data-analysis data-science data-visualization matplotlib numpy pandas python seaborn statsmodels

Last synced: 29 Apr 2026