An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/chitranjan806/greyatom_learning_repo

A collection of projects, tasks, and challenges completed as part of the Data Science Masters - Transition Program at GreyAtom.

data-analysis data-science greyatom python3

Last synced: 29 Aug 2025

https://github.com/akansharajput280799/strategic-analysis-of-retail-brand-in-south-america-using-sql

Leveraged BigQuery and MySQL to analyze 100K records for a retail brand in South America, covering sales optimization, trend identification, and customer satisfaction, and to provide insights and recommendations for growing its user base and improving its services.

bigquery data-analysis data-science database database-schema google-bigquery mysql-server sql

Last synced: 19 May 2026

https://github.com/mizzy/tweetduck

Twitter Archive to DuckDB Importer - Extract and import Twitter archive data (2025 format) into DuckDB for analysis

archive cli data-analysis duckdb golang twitter

Last synced: 02 Sep 2025

https://github.com/syarwinaaa09/hypothesis-testing-with-mens-and-womens-soccer-matches

A data-driven exploration of international men's and women's football (soccer) match results using Python

data-analysis data-visualization football jupyter-notebook men-vs-women pandas python soccer sports-analytics visualization

Last synced: 05 May 2026

https://github.com/dacrol/filterdataset

Filters a dataset based on attributes

data-analysis dataset deep-learning machine-learning python python3

Last synced: 25 Jul 2025

https://github.com/masum184e/exploratory_data_analysis_projects

This space showcases my journey in exploring various datasets, uncovering patterns, and extracting meaningful insights. Each project highlights different aspects of EDA, demonstrating techniques and tools that are essential for making sense of data.

data-analysis data-analysis-projects data-science data-science-projects eda eda-projects exploratory-data-analysis exploratory-data-analysis-projects

Last synced: 31 Mar 2025

https://github.com/kislerdm/github-repo-details

Application and library to fetch details of open source libraries from GitHub for due diligence

data-analysis golang opensource

Last synced: 01 Jul 2025

https://github.com/matheusafonseca/c111

This repository is dedicated to storing and organizing the code developed in the course C111 - Análise de Dados (Data Analysis), offered by the Instituto Nacional de Telecomunicações (INATEL).

data-analysis matplotlib numpy pandas python

Last synced: 06 May 2026

https://github.com/jiyanshgarg/delhivery-logistics-data-analysis

This project analyzes Delhivery's logistics delivery dataset to understand delivery performance, route efficiency, and operational patterns using data analytics techniques. The analysis focuses on transforming raw segment-level logistics data into meaningful trip-level insights that can help improve delivery efficiency and route planning.

business-insights-and-recommendations data-analysis data-cleaning-and-preprocessing data-visualization exploratory-data-analysis feature-engineering feature-extraction feature-selection hypothesis-testing outlier-detection outlier-treatment

Last synced: 12 Jun 2026

https://github.com/vyjayanthipolapragada/marketing_statistical_analysis

Statistical analysis of customer data and their impact on the sales of products based on marketing campaigns

customer-data data-analysis dataframes marketing matplotlib numpy pandas python seaborn statistical-analysis

Last synced: 11 Apr 2026

https://github.com/vanshuchaudhary/retail-sale

This project uses MySQL to analyze retail sales data, focusing on customer behavior, sales trends, and product performance. The dataset includes transactions, customer demographics, and purchase details, helping businesses optimize strategies. Key Insights: 📊 Revenue Analysis – Total sales, top-spending customers 📅 Sales Trends

business-intelligence customer-behavior customer-behavior-analysis data-analysis mysql predictive-analytics retail-analytics sales-analysis sql-queries

Last synced: 23 Mar 2025

https://github.com/ilovenooodles/probstat-water-potability

Major Assignment for Probability and Statistics 1

csv data-analysis jupyter-notebooks python

Last synced: 03 May 2026

https://github.com/geoninja/reddit_data_analysis

Data analysis application presented at the 2016 NTC (Non-profit Technology Conference) in San Jose, CA.

data-analysis python reddit-data-analysis text-analysis

Last synced: 03 May 2026

https://github.com/anderson-andre-p/exploratory-data-analysis.roller-coaster

This repository contains an exploratory data analysis (EDA) project focused on roller coasters. The project involved organizing, cleaning, and visualizing the data to gain insights into roller coasters' characteristics and performance.

data-analysis eda exploratory-data-analysis exploratory-data-visualizations notebook

Last synced: 15 Mar 2025

https://github.com/shafaq-aslam/data-analytics-dairy

A comprehensive repository for Data Analytics learning and projects. It includes MySQL, Python, Power BI, Tableau, and Excel. The goal is to analyze data, generate insights, and create compelling visualizations for real-world datasets.

data-analysis data-visualization excel excel-based-data-analysis powerbi python-scripts sql sql-queries sql-queries-for-data-manipulation sql-query-for-data-visualization tableau

Last synced: 20 Jan 2026

https://github.com/agrdatasci/climmob-analysis

Workflow for data analysis applied on ClimMob.net

citizen-science data-analysis workflow

Last synced: 24 Jun 2025

https://github.com/vriv06/btk-trials-data-analysis

Data analysis of Bioteksa plant nutrition trials to measure nutrient efficacy, resistance to biotic and abiotic factors, etc.

agriculture-research confluence crops data-analysis quarto r

Last synced: 23 Mar 2025

https://github.com/82luli02/sakila_dvd_rental_database_analysis

Analysis of the Sakila DVD Rental database using SQL

data data-analysis data-science data-visualization sql

Last synced: 10 Mar 2026

https://github.com/fatihilhan42/eda-spacex-launches-falcon9-and-falcon-heavy

In this project, we analyze space flight data for SpaceX's Falcon 9 rocket.

data-analysis data-science data-visualization eda elonmusk spacex

Last synced: 23 Mar 2025

https://github.com/BAMresearch/Utah-SAXS-Tools

The Utah SAXS Tools (USToo), adapted for Python 3, originally by David P. Goldenberg, 2009-2012

data-analysis saxs small-angle-scattering small-angle-xray-scattering

Last synced: 16 Jan 2026

https://github.com/nikhil-donthusaram/heartdiseaseprediction

Heart Disease Prediction App is a machine learning web application that predicts the likelihood of heart disease based on user medical inputs. Built using a Decision Tree Classifier and deployed with Streamlit for an interactive, user-friendly interface.

data-analysis descision-tree joblib jupyter-notebook machine-learning matplotlib numpy pandas python3 seaborn sklearn streamlit vscode

Last synced: 11 Apr 2026

https://github.com/steviecurran/dashboards

Compilation of links to the dashboards in the other repositories

dashboard data-analysis data-science data-visualization pandas powerbi python-dash tableau

Last synced: 21 Feb 2026

https://github.com/walid0912/rfm_analysis

RFM Analysis is employed to comprehend and categorize customers according to their purchasing patterns. RFM, an acronym for recency, frequency, and monetary value, comprises three essential metrics that offer insights into customer involvement, allegiance, and significance to a business.

data-analysis data-visualization python rfm-analysis

Last synced: 02 Sep 2025

https://github.com/deller23/hotel_booking_data_cleaning

Efficiently transforming raw hotel booking data into actionable insights! This project leverages Python and Pandas for advanced data cleaning—handling missing values, detecting outliers, and optimizing features—ensuring a high-quality dataset ready for analysis and modeling.

data-analysis data-cleaning data-preprocessing data-visualization data-wrangling pandas python

Last synced: 31 Mar 2025

https://github.com/nurulashraf/customer-segmentation-hierarchical-clustering

A customer segmentation project using hierarchical clustering to group customers based on their spending behaviour and demographics. This helps businesses identify patterns and create targeted marketing strategies.

business-analytics clustering-algorithm customer-segmentation data-analysis hierarchical-clustering machine-learning python unsupervised-learning

Last synced: 18 Apr 2025

https://github.com/adilshamim8/eda-on-health-and-sleep-data

Exploratory Data Analysis (EDA) on health and sleep data, uncovering patterns and insights using Python and visualization tools.

data-analysis data-visualization eda health healthcare sleep sleep-analysis

Last synced: 15 Mar 2025

https://github.com/mnoalett/cscrawler

BSc degree thesis - crawler for www.couchsurfing.org

bsc-thesis couchsurfing crawler data-analysis database python

Last synced: 02 May 2026

https://github.com/reddyprasade/r-program

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

data-analysis data-science r-programming

Last synced: 11 Apr 2026

https://github.com/sasanthns/sql_data_warehouse_project

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

data data-analysis data-science data-warehouse datacleaning etl etlpipeline sql sqlserver

Last synced: 24 Mar 2025

https://github.com/thbaylson/datascience

All of my past data science assignments collected into a single notebook. Most of this comes from my Machine Learning course.

data-analysis data-science data-visualization decision-tree jupyter-notebook k-nearest-neighbors linear-regression machine-learning neural-network pandas-library python3 scikit-learn

Last synced: 09 May 2026

https://github.com/bablukumarjha/startup-funding-revenue-analysis-by-sql-and-pandas

SQL project analyzing startup funding, revenue, and founder data to extract business insights using Python and MySQL.

data data-analysis data-platform data-science dataanalysisusingpython dataanalytics pandas-dataframe pandas-library python sql sql-server sqlalchemy sqldatabase

Last synced: 18 May 2026

https://github.com/manel15279/datamining-project

A university project that aims to explore various data mining techniques like Data Exploration, Association Rule Mining, Supervised and Unsupervised Learning, applied to real-world datasets, focusing on soil fertility analysis and COVID-19 cases evolution over time.

covid-19 data-analysis data-mining data-visualization datascience gradio machine-learning python soil-properties

Last synced: 10 Jun 2025

https://github.com/gabrieladados/analise-ecommerce

SQL Analysis for E-commerce: Growth Strategies to Boost Sales

bigquery data-analysis ecommerce sql

Last synced: 31 Mar 2025

https://github.com/mansogf/datascience_introduction

Introductory data science practice exercises

data-analysis data-science data-visualization graph

Last synced: 04 Apr 2025

https://github.com/junpenglao/spafv

SPAFV - Surface Profile Analysis for Free Viewing eye movement experiment in 2AFC task

data-analysis statistics temporal-logic

Last synced: 31 Mar 2025

https://github.com/vinitgurjar/r_lang_exp

This is a collection of my college Data Analytics lab work and assignments; the files here contain programs written in R.

data-analysis data-visualization r

Last synced: 02 Jul 2025

https://github.com/chrispsang/customerchurnanalysis

Predicting customer churn using a RandomForestClassifier with detailed EDA, model evaluation, and visualization. Includes a Tableau dashboard for interactive insights.

customerchurn data-analysis data-visualization datapreprocessing machine-learning python scikit-learn tableau

Last synced: 31 Jan 2026

https://github.com/katiesaund/tidy_tuesday

A weekly data project in R from the R4DS online learning community

data-analysis data-visualization datascience plot r rstats tidytuesday

Last synced: 24 Mar 2025

https://github.com/abhiram-kandiyana/us-bikeshare-analysis

Exploratory analysis of a bike-share system (Motivate) to understand its pain points

data-analysis data-visualization

Last synced: 26 Mar 2025

https://github.com/windjammer6/9.-employee-exit-data-analysis-python

A personal project to analyse data from an employee exit survey from DETE and TAFE. Python libraries used: NumPy, pandas, Matplotlib

data-analysis python

Last synced: 24 Mar 2025

https://github.com/johannaschmidle/bookauthors

Explored a book sales database. Cleaned data using Excel and created an interactive dashboard to analyze author popularity, ratings, and sales trends. The project highlighted key insights such as sales performance and rating distributions [Excel]

author-sales book-sales books data-analysis data-visualization excel

Last synced: 04 Feb 2026

https://github.com/analysisbyvivek/road-accident

Analyzes road accident patterns, exploring factors like lighting, weather, speed limits, time of day, and road conditions to uncover trends in severity and frequency.

data-analysis data-visualization eda jupyter-notebook kaggle tableau-public

Last synced: 19 Jun 2026

https://github.com/noturlee/iris-dataanalyis

This project aims to classify Iris flowers into three species—setosa, versicolor, and virginica—based on their sepal and petal measurements using machine learning techniques. The dataset comprises 150 samples evenly distributed among these species.

data-analysis data-modeling data-science data-structures-and-algorithms data-visualization

Last synced: 08 Apr 2025

https://github.com/codesaadumair/exploratory-data-analysis

A centralized repository showcasing various Exploratory Data Analysis (EDA) projects using Jupyter notebooks, visualizations, and accompanying documentation.

data-analysis data-science data-visualization eda jupyter-notebook jupyterlab python

Last synced: 24 Mar 2025

https://github.com/adrianlardies/from-data-to-insight

This project creates and manages a MySQL database to analyze the performance of Bitcoin, Gold, and the S&P 500 in response to economic factors. It integrates historical data, executes advanced SQL queries, and visualizes key insights, showcasing the power of SQL and Python in financial analysis.

data-analysis data-science matplotlib pandas python seaborn sql

Last synced: 12 Apr 2026

https://github.com/1401dev/iowa-liquor-retail-sales-analysis

This repository contains the analysis of Iowa liquor retail sales data, aimed at uncovering sales trends and forecasting future sales patterns. The project involves data cleaning, preparation, and advanced time series analysis using Microsoft SQL Server and Google Colab.

customer-behavior data-analysis data-cleaning data-science data-visualization exploratory-data-analysis forecasting google-colab machine-learning microsoft-sql-server pandas prophet python retail-analytics retail-sales sales-forecasting sales-performance sql statsmodels time-series-analysis

Last synced: 16 Feb 2026

https://github.com/leosimoes/nexoseducacao-imersao-powerbi

Activities completed during the Imersão PowerBI (Power BI Immersion) by Nexos Educação with Karine Lago and Leticia Smirelli in September 2023.

business-intelligence dashboards data-analysis microsoft-power-bi

Last synced: 06 Jan 2026

https://github.com/adarshpheonix2810/fake-job-post-detection

This project focuses on detecting fake job posts using machine learning. Fake job advertisements are often created to scam individuals by stealing personal information or money.

data-analysis deep-learning joblib machine-learning nlp-machine-learning numpy pandas python scikit-learn tkinter

Last synced: 12 Apr 2026

https://github.com/mindlessmuse666/train-test-splitter

Analysis of Titanic passenger data and splitting it into training and test sets. A practical assignment for the course "Fundamentals of Applying Artificial Intelligence Methods in Programming".

data-analysis data-preprocessing data-visualization machine-learning pandas python scikit-learn seaborn titanic train-test-split

Last synced: 12 Apr 2026

https://github.com/shrunga92/restaurant_order_analysis_sql

This project is a structured SQL-based analysis of restaurant orders, aimed at deriving key insights from transactional data.

data-analysis sql

Last synced: 03 Jul 2025

https://github.com/abishekaditya/machinelearningintro

Some simple stuff with pandas and SciPy

data-analysis ipython machine-learning pandas python scipy

Last synced: 12 Apr 2026

https://github.com/vikpires/ds_tips-dataset

Individual project for the Avanti 2024.2 data science bootcamp, aimed at analyzing and observing patterns in the "Tips" dataset.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas python seaborn tips

Last synced: 17 Sep 2025

https://github.com/datastalker/survival-cox

This repository contains an R script for performing survival analysis on breast cancer surgery data from the University of Chicago's Billings Hospital. The analysis includes Kaplan-Meier estimation and Cox Proportional Hazards modeling to assess patient survival.

breast-cancer-prediction cox-model data-analysis data-science data-visualization epidemiology kaplan-meier r survival-analysis

Last synced: 02 Apr 2025

https://github.com/gaboelc/analysis-of-the-employment-situation-in-costa-rica-2018-2022

This is an analysis with data extracted from the INEC in order to identify the changes that occurred in the Costa Rican labor market before, during and after the COVID-19 pandemic.

costa-rica data-analysis empleo employment

Last synced: 24 Mar 2025

https://github.com/suchi25sathavara/r-projects

R projects in real-world scenarios for data analysis

data data-analysis datavisualization r

Last synced: 01 Apr 2025

https://github.com/ajay1214/credit-card-transaction-dashboard

Credit Card weekly dashboard that provides real-time insights into key performance metrics and trends

data-analysis powerbi sql

Last synced: 04 Feb 2026

https://github.com/shridhar1504/tableau-visualization-viz.-project-

This repository contains visualization projects built with Tableau. The visualizations yield insights and strategies that help businesses grow and achieve higher profit margins, and in some cases they provide social value by helping estimate the damage and intensity of calamities.

dashboards data-analysis data-science data-visualization exploratory-data-analysis tableau tableau-dashboards tableau-public tableau-workbooks visualization

Last synced: 04 Feb 2026

https://github.com/bryanfks-dev/klempoken-analysis

Analysis and forecasting model for Klempoken MSMEs

big-data-analytics data-analysis data-forecast data-visualization

Last synced: 01 Apr 2025

https://github.com/mahmoudnamnam/superstore-analysis

This project explores the SuperStore dataset to uncover insights into sales, profit, and customer behavior. It identifies key trends, regional variations, and product performance, using data analysis and machine learning techniques to guide business strategy and optimize performance.

clustering data-analysis data-science data-visualization geopandas jupyter-notebook machine-learning numpy pandas plotly regression seaborn sklearn

Last synced: 12 Apr 2026

https://github.com/dhanyasri20/credit-risk-prediction

Credit Risk Prediction using Python, SQL, and Flask. Trained ML models (Random Forest) to identify high-risk loan applicants with 86% accuracy, automated SQL reporting, and deployed a Flask web app for real-time predictions.

classification credit-risk data-analysis financial-data flask loan-prediction machine-learning python random-forest sql

Last synced: 28 Apr 2026

https://github.com/satvikpraveen/rsvp_case_study

A comprehensive IMDB dataset analysis using SQL. Includes database setup, advanced queries, and actionable insights. Organized with files for database creation, queries, and solutions. Features an Entity-Relationship Diagram (ERD), executive summary, and SQL scripts. Perfect for SQL workflows and business intelligence in the film industry.

aggregate-functions business-intelligence common-table-expressions data-analysis data-driven-decisions data-querying database-design entity-relationship-diagram imdb-dataset relational-database sql subqueries-and-joins

Last synced: 11 Jan 2026

https://github.com/ernanej/data-science-dca0131

Files developed throughout the 2024.1 semester of the Data Science course taught at the Federal University of Rio Grande do Norte by the Department of Computer Engineering and Automation (DCA). 📚

big-data data-analysis data-science ia

Last synced: 30 Mar 2025

https://github.com/hemangsharma/breast-cancer-patient-dashboard

This interactive Streamlit dashboard visualizes insights from the SEER Breast Cancer Dataset (2006-2010)

data-analysis streamlit streamlit-dashboard streamlit-webapp

Last synced: 05 May 2026

https://github.com/theveryhim/massive-text-processing-1

Cleaning, processing, and analysis of a papers dataset using the PySpark (RDD) framework

big-data data-analysis frequent-itemsets massive-datasets pyspark text-preprocessing

Last synced: 03 Jul 2025

https://github.com/sreekar0101/electric-vehicle-market-growth-and-incentive-impact-analysis-dashboard

This project involves the development of a comprehensive Tableau dashboard to analyze the growth and market dynamics of electric vehicles (EVs). The dashboard reveals key insights, including a 20% increase in EV adoption over five years, the dominance of Battery Electric Vehicles (BEVs), which make up 60% of the market

data-analysis data-visualization tableau-desktop

Last synced: 07 Jan 2026

https://github.com/kernix13/github-readme-seo-analysis

A Jupyter Notebook analysis of GitHub README and repo SEO to determine what makes a repo rank in the SERPs

accessibility data-analysis readme seo seo-analysis

Last synced: 29 May 2026

https://github.com/ehsan-behzadi/online-retail-data-analysis-and-preprocessing

This project analyzes and preprocesses the Online Retail dataset to uncover insights into customer purchasing behaviors, sales trends, and product performance. It includes data cleaning, exploration, and visualization, with the goal of enhancing understanding of online retail dynamics.

cohort-analysis data-analysis data-cleaning data-exploration duplicate-detection exploratory-data-analysis-eda feature-encoding feature-engineering handling-missing-values online-retail outlier-detection preprocessing trends-visualization visualization z-score-method

Last synced: 16 Apr 2026

https://github.com/nilayhangarge/data-analysis-with-python

This repository provides a practical introduction to data acquisition and analysis using Pandas. It covers loading datasets, exploring data, manipulating data, and gaining insights through statistical summaries. Ideal for beginners, it offers code examples and explanations to enhance your data manipulation skills using Pandas for Python.

data-acquisition data-analysis data-analytics data-binning data-cleaning data-engineering data-fundamentals data-insights data-integration data-preprocessing data-science data-wrangling numpy pandas python

Last synced: 12 Apr 2026

https://github.com/jcm-ai/quantium-data-analytics-virtual-experience-program

This repository contains my proposed solutions to the assignments I was required to complete as part of the Quantium Data Analytics Virtual Experience Program. 📊📈📉👨‍💻

commercial-thinking communication-skills data-analysis data-validation data-visualisation data-wrangling jupyter-notebook matplotlib-pyplot numpy-library pandas-python presentation-skills programming python3 scipy-stats seaborn statistical-testing

Last synced: 16 May 2026

https://github.com/pngo1997/life-expectancy-logistic-regression

Life expectancy analysis project using logistic regression.

data-analysis logistic-regression r rmarkdown

Last synced: 10 Jun 2026

https://github.com/bhaskarbharati/ibm-datascience-hands-on-lab

This is a basic hands-on exercise using Jupyter Notebook, completed as part of the IBM course Tools for Data Science.

data-analysis data-science data-visualization datawrangling eda machine-learning

Last synced: 23 Apr 2025

https://github.com/rachit1084/sql-practice-ankit-bansal

Personal SQL problem-solving practice based on Ankit Bansal's YouTube series, with logic-driven solutions for analyst prep.

analytics data-analysis data-analyst interview-preparation logical-reasoning postgresql sql sql-practice

Last synced: 04 Jul 2025

https://github.com/francois-lenne/eletric_vehicle_usa

This project is purely educational; the main goal is to use Fabric.

data-analysis data-engineering delta-lake fabric jupyter-notebook pyspark python spark

Last synced: 12 Apr 2026

https://github.com/muhammed-fazal/student-success-and-early-intervention-analytics-system

Consolidates scattered student performance records into a unified data warehouse in SQL Server, builds interactive Power BI dashboards that visualize academic trends and student performance, and implements predictive analytics.

analysis analytics dashboard data data-analysis data-engineering data-science data-visualization database etl etl-pipeline power-bi powerbi python sql sql-server

Last synced: 29 May 2026