An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/faris771/identify_customer_segments

This project is part of the Palestine Launchpad program by Spark and Udacity with Google. It uses unsupervised learning to identify customer segments for a mail-order company in Germany. The goal is to direct marketing campaigns toward the most promising audiences. The data is provided by Bertelsmann Arvato Analytics.

clustering data-analysis decomposition feature-engineering machine-learning unsupervised-learning

Last synced: 08 Aug 2025

https://github.com/kingflow-23/association-matching

Research and structuring of funding opportunities for associations (nonprofit organizations)

association data-analysis data-engineering excel fondation pyqt5 python webscraping

Last synced: 07 Apr 2025

https://github.com/hatamiarash7/ir-system

IR System for Reuters DB

data-analysis data-mining ir python

Last synced: 29 Mar 2025

https://github.com/mindlessmuse666/eda-explorer

A Python tool for exploratory data analysis (EDA) and visualization that supports loading CSV and JSON data and has a modular OOP architecture. Practical assignment on the topic "Discovering and visualizing data to understand its essence" for the course "MDK 13.01: Fundamentals of Applying Artificial Intelligence Methods in Programming".

csv-visualization data-analysis data-science data-visualization exploratory-data-analysis json-visualization matplotlib oop pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/s1m0n38/cr-analysis

An exercise in data collection/analysis

clash-royale data-analysis data-collection data-science

Last synced: 08 Jul 2025

https://github.com/dimits-ts/visualization-assignments

Visualizing and analyzing results from the PISA 2018 assessment with regard to Greek performance and the gender gap.

data-analysis data-visualization interactive-graphs presentation-slides r-language tableau

Last synced: 06 Nov 2025

https://github.com/noor188/preswald-data-app

A data app to visualize and manipulate the graduate admission dataset

data-analysis data-visualization open-source

Last synced: 04 Jul 2025

https://github.com/kefilweditse/awesome-matchem-datasets

Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.

awesome awesome-dataset awesome-dataset-collection awesome-match-data awesome-matchem data-analysis data-matching dataset dataset-collection dataset-research dataset-samples match match-data match-dataset-analysis match-examples

Last synced: 07 Apr 2025

https://github.com/tejaswirupa/data-analysis-of-departure-delays-at-united-airlines

Explored how weather and time factors influence delays in 58,000+ UA flights. Used permutation testing and visual analytics to show how temperature, visibility, and time of day affect departure punctuality.

data-analysis r statistics

Last synced: 25 Jan 2026

https://github.com/leosimoes/datascienceacademy-python-analisededados

Activities from the DataScienceAcademy course "Análise de Dados com Linguagem Python" (Data Analysis with the Python Language).

data-analysis data-science jupyter-notebook python sql

Last synced: 29 Apr 2026

https://github.com/muneeb706/r-programming

R-Programming examples for data analysis.

data-analysis r-programming

Last synced: 26 Mar 2025

https://github.com/jibbs1703/airline-data-analysis

This repository contains an exploratory data analysis (EDA) of flight delays and cancellations for airline flights in the United States in 2015. Based on this EDA, it suggests insights and solutions for business owners and airport managers.

business-insights business-solution data-analysis data-visualization

Last synced: 20 Mar 2025

https://github.com/shz-code/diwali_sales_data_analysis

Customer Product Purchase Behavior Analysis

behavior-analysis data-analysis matplotlib ml sales seaborn

Last synced: 14 Mar 2025

https://github.com/muthukumar0908/cardekho_used_car_price_prediction

The project aims to build a machine learning model that lets users find current valuations for used cars.

data-analysis data-visualization datacleaning eda machine-learning python streamlit

Last synced: 30 Mar 2025

https://github.com/apostolis-bloutsos-data/employee-data-eda

Mini EDA project on synthetic employee records using Python, pandas, and matplotlib

data-analysis eda jupyter-notebook matplotlib pandas python seaborn

Last synced: 09 May 2026

https://github.com/sivkri/shiny-scatter-plot-app

This repository contains a Shiny app that allows users to create interactive scatter plots by selecting the X and Y axes and customizing the point color. The app utilizes the shiny package in R to provide a user-friendly interface and the ggplot2 package for creating visually appealing plots.

data-analysis data-visualization ggplot2 interactive-web-application r rprogramming scatter-plot shiny

Last synced: 22 Mar 2025

https://github.com/sivkri/rnaseq-analysis-junctionseq-qorts

This repository provides scripts for RNA-Seq data analysis using JunctionSeq and QoRTs, enabling quality control, differential splicing analysis, and generation of browser tracks.

bioinformatics data-analysis differential-splicing genomics junctionseq qorts quality-control rna-seq rna-seq-analysis splice-junctions splice-variants spliced-alignment transcriptomics

Last synced: 22 Mar 2025

https://github.com/habiburrahman-mu/exploratory-data-analysis

Methods to see if certain characteristics or features can be used to predict.

data-analysis data-mining data-science data-visualization

Last synced: 20 Jan 2026

https://github.com/rohitha-tata/bike-sales

This project focuses on data cleaning, transformation, and dashboard creation using a bike buyers dataset. It includes Pivot Tables, slicers, visualizations, and statistical insights to analyze trends based on income, age, occupation, and other key factors. Insights help understand customer behavior, purchasing patterns, and decision-making trends.

data-analysis data-cleaning excel-dashboards interactive-slicers pivot-charts pivot-tables

Last synced: 08 Mar 2026

https://github.com/hasnathjami/data-analysis-of-covid-19

An Oracle PL/SQL-based project on COVID-19 data analysis. It is my CSE 4.1 project for the Distributive Database Management System Lab.

data-analysis naive-bayes-classifier oracle-database probability-statistics sqlplus

Last synced: 08 Mar 2026

https://github.com/nimomach/amazon-sales-data

This is a small dataset with an analysis of Amazon sales data for a few regions.

dashboards data data-analysis data-visualization

Last synced: 08 Mar 2026

https://github.com/rugwiroparfait/alx_sql

This repo is where I save my queries and learning materials from the ALX Data Science program.

anaconda data data-analysis jupyter-notebook sql

Last synced: 19 Aug 2025

https://github.com/cosmoduende/r-earthquakes

Analysis and visualization of seismic activity data in Mexico with R: how to analyze and visualize Mexico's seismic history using data from the SSN (Servicio Sismológico Nacional, Mexico's National Seismological Service).

data-analysis data-analytics data-science dataviz earthquakes r-code r-programming r-studio rstudio sismo sismologia sismos ssn ssnmx terremoto terremotos

Last synced: 24 Jan 2026

https://github.com/kavicastelo/colab

This repository includes practical Jupyter notebooks for data analysis and model training using a soil fertilizer dataset (use the 4th edition).

data-analysis jupyter-notebook python

Last synced: 26 Mar 2025

https://github.com/tolumie/web-scraping-rest-api-stock-data-operations

Web Scraping, REST API & Stock Data Operations is a data-driven project that explores the power of web scraping, API interactions, and stock market analysis using Python. From extracting stock data and public records to analyzing real-world financial trends, this repository is a one-stop resource for data enthusiasts, traders, and analysts.

api-integration data-analysis data-cleaning data-visualization financial-data python rest-api sql-databases stock-data web-scraping

Last synced: 19 May 2026

https://github.com/mrham17/spotify_streaming_analytics

The project is stable, and documentation will be completed soon. Thank you for your understanding and patience.

big-data-analytics data-analysis google-colab music-data r-programming spotify streaming-analytics

Last synced: 24 Jul 2025

https://github.com/bris0yzbekaye/json-to-excel-converter

This repository provides a tool to convert JSON data to Excel format (.xlsx). It allows you to easily transform structured JSON data into a well-organized spreadsheet for better analysis and visualization.

automation-script automation-tools data-analysis data-converter data-export data-formatting data-tools data-visualization excel excel-automation excel-converter excel-tools json json-exporter json-parser json-processing json-to-csv json-to-excel programming-tools spreadsheet-tools

Last synced: 25 Jul 2025

https://github.com/codeonthespectrum/web-scrap

This project scrapes Wikipedia to obtain data on the most populous municipalities in the state of Rio de Janeiro.

data-analysis data-visualization webscraping

Last synced: 16 Feb 2026

https://github.com/leandrocollares/street-cherry-trees-in-vancouver

Street cherry trees in Vancouver: an exploratory data analysis

data-analysis data-visualization folium pandas plotly-express

Last synced: 17 Sep 2025

https://github.com/netesf13d/expt-sequence-analysis

Data processing, analysis and visualization package for atomic physics experiments in the single-atom regime.

cold-atoms data-analysis data-visualization optical-tweezers

Last synced: 24 Jul 2025

https://github.com/matte34/auto-insurance-analysis

Conducted a comprehensive exploratory data analysis (EDA) on an auto insurance dataset from Kaggle. I performed a permutation test and generated data visualizations.

data-analysis data-visualization permutation-test python3 scipy seaborn

Last synced: 06 May 2026

https://github.com/maazie-khan/olympics-data-enigeering

Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.

azure big-data data-analysis dataengineering devops pipeline

Last synced: 13 May 2026

https://github.com/swethajoseph/netflix-powerbi-interactive-dashboard

Created an interactive Power BI dashboard to analyze and visualize Netflix's content library, uncovering trends in content type, genre distribution, and global reach.

data-analysis data-visualization interactive-visualizations powerbi powerbi-dashboards powerbi-report

Last synced: 03 Jan 2026

https://github.com/tomy-jr98/air-quality-sql-project

Air pollution analysis using BigQuery and Tableau, with data cleaning, aggregation, and visualization.

air-pollution bigquery data-analysis portfolio sql tableau

Last synced: 25 Jul 2025

https://github.com/cescedes/medical-insurance-costs-with-python

Investigates how different factors affect the prediction of medical insurance costs while practicing many Python concepts.

codecademy data-analysis python python-dictionaries python-functions python-lists python-loops python-strings

Last synced: 19 May 2026

https://github.com/andersoncrs/clasificacion-propina-restaurante

This report presents, in a clear and practical way, a complete analysis of the well-known tips dataset, showing step by step how to turn raw data into useful predictive models.

clasification data-analysis data-visualization tips

Last synced: 26 Jul 2025

https://github.com/dadvaiahpavan/ai-data-scientist-

AI-powered tool for dataset analysis, featuring data preprocessing, classification, regression, anomaly detection, and text analysis. Built with scikit-learn, pandas, and Plotly for visualization. Includes an interactive Streamlit web interface for real-time data analysis.

ai anomaly-detection classification data-analysis data-science machine-learning panda plotu regression scikit-learn sentiment-analysis streamlit

Last synced: 03 May 2026

https://github.com/sandergi/ekichabi

A digital phonebook to connect subsistence farmers in Tanzania. It works via USSD, so farmers without an internet connection can use it through their telecom provider. Built with Django in Python and a MySQL database. This is a public copy of the private repo with user information stripped.

android data-analysis ict4d research ussd

Last synced: 14 May 2026

https://github.com/riyajain255/customer-segmentation-for-e-commerce

This project analyzes online retail data to segment customers using K-Means clustering and build classification models to predict those segments based on purchasing behavior.

customer-segmentation data-analysis kmeans-clustering logistic-regression machine-learning matplotlib numpy pandas python random-forest scikit-learn seaborn-plots

Last synced: 02 Apr 2026

https://github.com/kushalagarwalla/netflix-movie-data-analysis

🚀 Netflix Data Analytics Project 🎬📊 | Analyzed 9K+ movies to uncover insights on genres, popularity, votes & release trends. Includes EDA, KPIs & visualizations using Python (Pandas, NumPy, Matplotlib, Seaborn). Supports data-driven content & engagement strategy.

data-analysis data-visualization jupyter-notebook numpy pandas python seaborn

Last synced: 06 May 2026

https://github.com/samiksha29-patil/flipkart-mobiles-data-analysis-visualization-in-python

This project analyzes the Flipkart Mobiles dataset to extract useful insights about mobile phones, their pricing, ratings, discounts, and customer reviews. The analysis and visualization are done in Python to understand market trends and customer preferences.

data-analysis data-visualization matplotlib numpy pandas python seaborn

Last synced: 04 May 2026

https://github.com/labex-labs/numpy-for-beginners

This comprehensive course covers the fundamental concepts and practical techniques of NumPy, the essential library for numerical computing in Python. Learn to create, manipulate, and analyze arrays efficiently.

array-manipulation array-slicing beginner-friendly course data-analysis data-science data-structures fast-computation hands-on labex labs linear-algebra matrix-operations numerical-computing numpy programming python python-programming scientific-computing vectorized-operations

Last synced: 20 Jun 2026

https://github.com/vitor-ace/sunspots-data-analysis

This is a Jupyter Notebook that applies data analysis logic and libraries in Python.

data-analysis data-visualization debbuging error-handling file-handling matplotlib-pyplot numpy pandas python

Last synced: 06 May 2026

https://github.com/hecatops/ad_libs

A real-time advertisement data analytics platform that displays important metrics in easy-to-understand language.

dashboard data-analysis data-visualization kpi plotly-dash python

Last synced: 07 Nov 2025

https://github.com/hemangsharma/bookingdataanalysisreport

The report helps understand key trends and insights around customer bookings, pricing, and other related attributes.

analysis data data-analysis data-analytics data-visualization streamlit streamlit-dashboard

Last synced: 14 May 2026

https://github.com/zwelz3/unofficial-survivor-knowledge-graph

A comprehensive RDF knowledge graph covering all 50 seasons of Survivor (US), with 23,000+ triples across 749 named graphs.

data-analysis rdf survivor

Last synced: 23 May 2026

https://github.com/karsterr/repeated-measurement

An R-based workflow for conducting repeated measures ANOVA using the ez package, with data wrangling via tidyverse and visualization through ggplot2. Includes data import, transformation to long format, statistical analysis, and graphical summary.

anove data-analysis experimental-design ezanove ggplot2 r repeated-measurements rstats statistics tidyverse

Last synced: 18 Sep 2025

https://github.com/ariyaarka/sales-analysis

A simple analysis of a random pizza sales dataset using SQL

data-analysis presentation-slides sql

Last synced: 17 Jan 2026

https://github.com/rh01/data-analysis-with-r

Duke University - Data Analysis With R

data-analysis r r-language r-studio rmarkdown

Last synced: 23 May 2026

https://github.com/zulfachafidz/green_horizon_forecasting_peak_organic_avocado_sales_with_the_prophet_algorithm

The Green Horizon Project leverages the Prophet algorithm to predict peak sales of organic avocados, supporting the campaign "APEAM GO ORGANIC." Using Python and Looker Studio, this analysis aims to provide deep insight into sales trends and potential, forming the basis of smarter marketing strategies.

algorithm algorithms analytics data data-analysis data-engineering data-mining data-science data-visualization forecasting machine-learning machine-learning-algorithms prophet-model python python-script

Last synced: 17 May 2026

https://github.com/ljadhav25/healthcare-data-collection-and-analysis

This repository contains a project focused on collecting healthcare data from the web, storing it in a structured format, and performing comprehensive analysis. The objective is to gather valuable health-related information, process and clean the data, and derive insights to support healthcare research and decision-making.

data-analysis data-visualization flask-application flask-backend html-css-javascript pycharm-ide python

Last synced: 09 Apr 2026

https://github.com/ramapinnimty/udacity-mlfoundation-nanodegree

This is a repository containing solutions to the assignments that are a part of the Udacity Machine Learning Foundation Nanodegree program.

assignments data-analysis python3 statistics udacity-machine-learning-nanodegree

Last synced: 26 Jul 2025

https://github.com/sharoonjoseph321/indian-liver-diseases

Indian Liver Disease Analysis and Prediction: this project leverages the Indian Liver Patient Dataset (ILPD) to analyze liver disease trends and develop predictive models for early diagnosis. Through data preprocessing, exploratory analysis, and machine learning, it identifies key risk factors and builds classification models.

data-analysis data-science data-visualization logistic-regression machine-learning pandas python seaborn

Last synced: 27 Jul 2025

https://github.com/hfzdzakii/dicoding-shipclusteringanalysisdataandmodelling

This repo is a master submission for my Dicoding Final Project. The Ship Performance Clustering Dataset was used to fulfill the submission. Feel free to explore, and I hope my work gives you some insight!

clustering data-analysis machine-learning

Last synced: 27 Jul 2025

https://github.com/benmar2406/rent-in-germany

Interactive visualizations and maps depicting topics around rent prices and income in Germany built with Svelte.

charts d3 d3-visualization d3js data-analysis data-visualization gis gis-data infographic infographics map mapbox mapbox-gl mapbox-gl-js mapboxgl svelte

Last synced: 26 Mar 2025

https://github.com/danymukesha/bioga

Apply multi-objective genetic algorithms to genomic data for biologically informed feature selection and pattern discovery.

data-analysis gene-expression genetic-algorithms genomics optimization-algorithms

Last synced: 18 Sep 2025

https://github.com/dmvianna/python-nix

Trivial Nix environment with pandas and postgresql

data-analysis nix

Last synced: 27 Jul 2025

https://github.com/grindelfp/data-analysis-example

One of the projects from my university Artificial Intelligence Systems course.

data-analysis data-preprocessing ipynb

Last synced: 19 Sep 2025

https://github.com/tbep-tech/tbeploads

R Package for estimating nutrient loading to Tampa Bay

data-analysis loads package tampa-bay tbep tbnmc water-quality

Last synced: 19 Feb 2026

https://github.com/jofaval/ionosphere

Binary Classification of Ionosphere signals at Goose Bay, Labrador in 1988

data-analysis data-science data-visualization deep-learning google-colab keras machine-learning python scikit-learn tensorflow uci xgboost

Last synced: 09 Apr 2026

https://github.com/sadratehranian/pem-fuel-cell

The methodology section details the use of Python for data processing and analysis, employing statistical and machine learning-based anomaly detection techniques to identify potential issues in fuel cell stacks. It emphasizes data preprocessing, feature engineering, exploratory data analysis (EDA), and anomaly detection.

anomaly-detection data-analysis data-science data-visualization exploratory-data-analysis feature-engineering fuel-cell machine-learning preprocessing python statistical-analysis visual-studio-code

Last synced: 26 Mar 2025

https://github.com/tbep-tech/tbep-r-training

Repository for miscellaneous R training materials

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/tbep-tech/pep-graphics

Materials for generating PEP graphics

data-analysis pep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/tberf-oyster

Materials for evaluating TBERF oyster restoration success

ccmp-bh4 ccmp-bh6 data-analysis tampa-bay tbep tberf

Last synced: 19 Feb 2026

https://github.com/tbep-tech/pep-r-training

Materials for PEP R training

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/shafaq-aslam/predicting-heart-disease-risk-with-logistic-regression-techniques

Develop a predictive model using logistic regression techniques to assess heart disease risk based on patient health metrics and data analysis.

data-analysis heart-disease logistic-regression machine-learning machine-learning-models matplotlib numpy pandas python scikit-learn seaborn

Last synced: 09 Apr 2026

https://github.com/lucas-mazzolim/superstore-bi

Project where I prepared two data sources for querying and created a BI visualization in Data Studio. Tools used: MySQL, Looker Studio, Google Spreadsheet, and Python.

business-intelligence data-analysis data-visualization google-looker-studio mysql spreadsheet

Last synced: 27 Jul 2025

https://github.com/nandit123/python_on_excel

Data analysis of Excel data using Python libraries

csv data-analysis data-science fill fluctuations graph numpy python python-library

Last synced: 16 May 2026

https://github.com/tkhoa2711/twitter-hate-speech

Hate speech detection on Twitter

data-analysis python twitter

Last synced: 28 Jul 2025

https://github.com/sidsin0809/hmdb-endo-flagger

A Python toolkit to identify and score endogenous human metabolites from HMDB XML metadata

data-analysis hmdb metabolomics ontology pipeline python-3 streaming-parser xml-parsing

Last synced: 06 Jul 2025

https://github.com/tbep-tech/fim-seagrass

Materials for analysis of FIM data, seagrass, and other datasets

data-analysis fim seagrass tampa-bay

Last synced: 19 Feb 2026

https://github.com/tbep-tech/seagrass-analysis

Materials for assessing coverage changes and analysis of drivers of change for Tampa Bay seagrass

dashboard data-analysis seagrass tampa-bay water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/piney-point-analysis

Materials for analysis of Piney Point monitoring data

data-analysis open-science piney-point tampa-bay tbep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/peptools

Materials for wrangling and summarizing data from the Peconic Estuary

data-analysis package pep water-quality

Last synced: 19 Feb 2026

https://github.com/tbep-tech/rookery-bay-training

Materials for R training at Rookery Bay Monitoring Workshop 2020

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/rohitblaze10/survey_monkey_analysis--using-ipython

This data analysis project focuses on extracting insights from survey responses. It involves data cleaning, merging, and transformation using IPython (Pandas, OS) and SQL. The goal is to identify trends and patterns in survey data for better decision-making.

data-analysis ipynb ipython-notebook

Last synced: 28 Jul 2025

https://github.com/ashwin331133/sql-project--sales-data-analysis--walmart

This SQL-based Walmart data analysis project aims to identify top-performing branches and products and to optimize sales strategies, using the dataset from Kaggle's Walmart Sales Forecasting Competition.

data-analysis eda sql

Last synced: 03 Jan 2026

https://github.com/labex-labs/sqlite-intermediate-to-advanced

In this course, delve into advanced SQLite techniques. Master constraints, indexing, joins, subqueries, transactions, triggers, views, full-text search, JSON, backups, PRAGMA tuning, CTEs, window functions, and more!

advanced-sql course data-analysis data-integrity data-manipulation data-modeling database database-design hands-on labex labs performance-tuning programming query-optimization relational-database schema-management sql sqlite stored-procedures transaction-management

Last synced: 18 May 2026

https://github.com/archanakokate/eda_amazon_products_and_discounts_2023

Exploratory Data Analysis (EDA) on Amazon's 2023 Products and Discounts data

data-analysis data-mining data-visualization exploratory-data-analysis

Last synced: 03 Jan 2026

https://github.com/swethajoseph/statistical-stock-performance-analysis

Conducted a statistical analysis of Microsoft, Tesla, and Apple stock performance compared to the S&P 500, examining price trends, volatility, and correlations to derive investment insights.

advancedexcel comparative-analysis data-analysis data-visualization datapreparation descriptive-statistics moving-average msexcel performance-analysis performance-metrics regression-analysis statistical-analysis

Last synced: 03 Jan 2026

https://github.com/prateek5525/retail-sales-analysis-project

This project involves analyzing retail sales data using SQL to uncover insights into sales patterns, customer behavior, and product performance. It serves as an exercise to develop foundational SQL skills in data exploration, cleaning, and analysis.

data-analysis data-cleaning retail-sales-data sql

Last synced: 03 Jan 2026

https://github.com/hasinii12/-chocolate-analysis-dashboard

This Power BI report provides a comprehensive analysis of chocolate ratings and related attributes.

data-analysis data-visualization powerbi

Last synced: 09 Feb 2026

https://github.com/amoghkori/deeplabcut-package-for-animal-pose-estimation

DeepLabCut Mouse Location Prediction: Training a deep neural network to predict the location of a mouse using annotated joint positions.

data-analysis data-annotations data-preprocessing deep-learning machine-learning model-evaluation python-programming research research-project

Last synced: 17 Mar 2025