An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/engusseus/warframe-market-set-profit-analyzer

Python tool that analyzes Warframe Market data to find profitable item sets to trade

api data-analysis python trading waframe

Last synced: 23 Mar 2025

https://github.com/mahdi-meyghani/movie-recommendation-system

A Python-based movie recommendation system utilizing popularity-based, content-based, and collaborative filtering models with data science and machine learning techniques.

data-analysis data-science machine-learning recommendation-system scikit-learn scikitlearn-machine-learning

Last synced: 23 Jan 2026

https://github.com/dcs-training/spatial_dynamics

Uses QGIS and R to analyse first- and second-order geospatial effects. See the README for details.

data-analysis geographical-data gis qgis r statistics

Last synced: 23 Oct 2025

https://github.com/changyeop-yang/study-datasciencefoundation

Big data science and analytics play a major role in this decade. Cleaning and preparing data for analysis is still a challenge, as are performing basic visualization, modeling the data, curve-fitting it, and, finally, presenting your findings in a way that wows the audience.

data-analysis ios kyungpook-national-university swift

Last synced: 23 Oct 2025

https://github.com/gunifiri/duckdb-ghw

🦆 Accelerate analytics with DuckDB's integration for GitHub workflows, enabling efficient data handling and processing directly within your repositories.

analytics analytics-engine big-data columnar-storage data-analysis data-science database duckdb in-memory-database open-source parquet python query-planner r sql

Last synced: 29 Apr 2026

https://github.com/sugumarsrinivasan/sql-datawarehouse-project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and data analytics.

data-analysis data-analytics data-engineering data-lake data-science data-warehouse datawarehousing etl etl-pipeline medallion-architecture sql sql-query sql-server

Last synced: 19 Jun 2026

https://github.com/nkamilla/titanic-eda

Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.

data-analysis eda jupyter-notebook matplotlib numpy pandas python titanic-dataset

Last synced: 05 May 2026

https://github.com/shubham200137/cyclistic-case-study

This repository contains a case study for Google's Data Analytics Professional Certificate, focusing on Cyclistic, a fictional bike sharing company in Chicago. The case study aims to drive growth by converting casual riders into members through a marketing strategy.

data-analysis data-visualization numpy-python pandas-python presentation-slides sql tableau

Last synced: 11 Jun 2026

https://github.com/dsrodrigovieira/favoritasales

This repository contains the project developed for the Kaggle challenge "Store Sales - Time Series Forecasting. Use machine learning to predict grocery sales".

data-analysis data-science kaggle-competition machine-learning python telegram-bot xgboost-regression

Last synced: 05 May 2026

https://github.com/a26nine/kortext-usage-dashboard

An interactive data visualisation dashboard built using Tableau software to understand the value of digital resources issued on Kortext platform at Middlesex University, London.

data-analysis data-science data-visualization knime tableau

Last synced: 01 Feb 2026

https://github.com/caesaredia/ymusic-project

Exploratory data analysis (EDA) of music streaming behavior in two fictional cities using Python, Pandas, and Jupyter Notebook. It explores user behavior, genre preferences, and listening patterns throughout the week.

data-analysis eda pandas python

Last synced: 05 May 2026

https://github.com/ljadhav25/linear_regression_data_science

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

data-analysis data-science linear-regression machine-learning

Last synced: 26 Oct 2025
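The linear regression entry above describes predicting a dependent variable from a single independent variable. A minimal sketch of simple (one-predictor) ordinary least squares in plain Python follows; it is not taken from that repository, and the hours/score data are hypothetical.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Slope = sum of co-deviations of x and y / sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    # The fitted line always passes through the point of means (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predict the dependent variable for a new independent-variable value."""
    return intercept + slope * x_new


# Hypothetical data: hours studied (independent) vs. exam score (dependent)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
b0, b1 = fit_simple_linear_regression(hours, scores)
print(f"score = {b0:.2f} + {b1:.2f} * hours")
print(f"predicted score for 6 hours: {predict(b0, b1, 6):.2f}")
```

With more than one predictor the same idea generalizes to multiple linear regression, usually solved with a linear-algebra library rather than by hand.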

https://github.com/jedrzej-wydra/competition-cooperation

Competition, cooperation, and parental effects in larval aggregations formed on carrion by communally breeding beetles Necrodes littoralis (Staphylinidae: Silphinae)

data-analysis non-linear-regression r

Last synced: 20 Aug 2025

https://github.com/panoschatzi/healthcare_and_bioinformatics_analyses

Healthcare and Bioinformatics data analysis projects with Python and SQL.

data-analysis data-cleaning data-visualisation jupyter matplotlib mysql pandas plotly python seaborn sql

Last synced: 06 May 2026

https://github.com/vishalsiingh/deloitte-virtual-internship

Submission for the STEM Virtual Program by Deloitte via Forage.

coding cyber-security data-analysis deloitte development forage forensics

Last synced: 23 Jan 2026

https://github.com/itrauco/data-dirtying-tool

A simple command-line tool to generate dirty data and perform common data tasks in Google Cloud.

data data-analysis data-engineering data-ops data-pipeline data-science data-visualization data-wrangling dirty-data google-cloud machine-learning

Last synced: 24 Feb 2025

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 Analyzing NYC school test scores with Python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/gaurabkundu1/road-accident-data-analysis

An Excel project analyzing road accident data through an interactive dashboard.

dashboard data-analysis data-vizualisation excel road-accidents

Last synced: 24 Jan 2026

https://github.com/valentinoli/swiss-foodprint

Project in Applied Data Analysis, EPFL 2019

carbon-emissions data-analysis diet foodprint swiss switzerland

Last synced: 24 Jan 2026

https://github.com/erick957/saleprice-prediction-dataset-analysis-and-cleaning-advance-regression

🏠 Predict house prices using advanced regression techniques with this comprehensive analysis and cleaning project, from data loading to model deployment.

data-analysis data-science eda google-colab machine-learning numpy pandas python scikit-learn scikit-learn-python

Last synced: 06 May 2026

https://github.com/lopez86/rust-mlearn

Machine Learning Tools in Rust

data-analysis data-science machine-learning rust

Last synced: 15 May 2025

https://github.com/wassimhd/pwc-switzerland-power-bi-in-data-analytics-virtual-case-experience

This project builds a foundation in data analysis and Power BI through the PwC virtual internship.

data-analysis data-visualization datastorytelling powerbi

Last synced: 28 Jan 2026

https://github.com/ssoehdata/sql_for_data_science_specialization_course

Materials and certifications from the SQL for Data Science course.

data-analysis data-science database databricks postgresql sql sqlite

Last synced: 10 Apr 2026

https://github.com/anurag-ghosh-12/library_management_system_sql

This project showcases the development of a comprehensive Library Management System utilizing Structured Query Language (SQL). It demonstrates a practical application of relational database principles to efficiently manage library resources, member information, and borrowing/returning transactions.

data-analysis data-visualisation dbms-project sql

Last synced: 29 Jan 2026

https://github.com/mikma03/datascience_python_datacamp

Data science with Python: code and examples using Python libraries including pandas, NumPy, Matplotlib, and many more.

data-analysis data-science datacamp datascience numpy pandas python

Last synced: 06 May 2026

https://github.com/angchekar28/sales-report-power-bi

A Power BI sales report analyzing country-wise and product-wise sales trends. Includes dashboards, decomposition trees, and key influencers analysis for business insights.

dashboard data-analysis data-cleaning data-visualization powerbi sales-report

Last synced: 16 Mar 2026

https://github.com/cnoret/retail-data-analysis

Analyzes historical sales data from a large retail chain and predicts weekly sales using machine learning in a Streamlit web app.

data-analysis data-analyst data-science data-vizualisation pandas python streamlit streamlit-webapp

Last synced: 10 Apr 2026

https://github.com/ksm26/ml-ai-data-science-jobs-in-canada

Explore the latest machine learning, artificial intelligence, and data science job opportunities in Canada. Stay informed about Canadian tech job market trends and find your next career move.

ai-canada ai-careers canada canadian-tech-companies canadian-tech-job-market data data-analysis data-engineering data-science data-science-careers machine-learning prompt-engineering robotics

Last synced: 06 May 2026

https://github.com/abhi227070/medical-insurance-predictor

This project implements a machine learning regression model to predict medical insurance charges based on user-provided details such as smoking status, number of children, gender, and age. The user-friendly interface allows individuals to estimate their average insurance price before purchasing medical insurance.

data-analysis machine-learning machine-learning-algorithms machinelearning python3 regression-models

Last synced: 04 May 2026

https://github.com/sanchittechnogeek/overscripted-analysis

Analysis of geolocation and user-language data extracted from the Mozilla Overscripted dataset.

analysis data data-analysis mozilla

Last synced: 23 Mar 2025

https://github.com/joannescode/regex_with_py

Learning by practicing with Regex (Python)

data-analysis python3 regex

Last synced: 30 Jan 2026

https://github.com/mfakhriazhar/us-companies-revenue-dashboard

This project is a Power BI data visualization dashboard highlighting the largest companies in the United States by revenue. The goal is to provide an interactive overview of company performance across industries, focusing on revenue, employee metrics, and industry trends.

dashboard data-analysis data-visualization largest-companies-us powerbi revenue united-states

Last synced: 30 Jan 2026

https://github.com/azaz9026/data_cleaning

Welcome to the Data Cleaning repository! This collection is dedicated to showcasing techniques and methods for cleaning and preparing datasets for analysis.

data-analysis data-engineering data-structures data-visualization eda feature-engineering machine-learning numpy outliers pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/manishabarse/hr_data_analysis

HR data analysis using Microsoft SQL Server Management Studio and Power BI.

data-analysis powerbi sql ssms

Last synced: 30 Jan 2026

https://github.com/jaseel342/ecommerce_sales_dashboard

The E-commerce Sales Dashboard project offers a comprehensive view of e-commerce sales performance using interactive Power BI dashboards. It focuses on key metrics like YTD Sales, YTD Profit, YTD Profit Margin, and Quantity of Products sold, analyzing data by product categories, states, and regions.

data-analysis data-modelling dax-expression excel power-query powerbi visualization

Last synced: 07 Feb 2026

https://github.com/deanlogan/data-analysis-course

Code created when completing the Data Analysis with Python Course on freecodecamp.org

course data-analysis numpy pandas python python3

Last synced: 06 May 2026

https://github.com/luminati-io/indeed-dataset-samples

A sample dataset of over 1,000 Indeed job listings, extracted using the Bright Data API, suited to job-market analysis.

api data-analysis datasets indeed jobs web-scraping

Last synced: 07 Feb 2026

https://github.com/cca/panopto-session-data

Analyzing Panopto session data for retention purposes.

data-analysis ipython-notebook video

Last synced: 07 Feb 2026

https://github.com/badranalyst/titanic-survival-prediction-full-data-science-project-classification

This project predicts Titanic survivors using classification models. It includes data cleaning, pre-processing, exploratory data analysis (EDA), categorical feature conversion, model building, and evaluation. Python libraries like Pandas, NumPy, Matplotlib, and Seaborn are used to analyze and predict survival outcomes.

classification data-analysis data-science eda exploratory-data-analysis machine-learning matplo matplotlib-pyplot ml model numpy pandas predictive-modeling python seaborn

Last synced: 06 May 2026

https://github.com/gastonstat/stat133

STAT 133: Concepts in Computing with Data

data-analysis data-science data-visualization r-programming syllabus

Last synced: 25 Feb 2026

https://github.com/nikitalpopov/evotor_champ

Solution for the Evotor data challenge.

data-analysis data-science python scikit-learn

Last synced: 15 Apr 2026

https://github.com/tusharpandey003/chat_analysis

Analysis of a group chat, broken down by individual group member.

chat-analysis chat-analyzer data-analysis data-science streamlit whatsapp whatsapp-chat whatsapp-web

Last synced: 01 Feb 2026

https://github.com/tapas-gope/global-superstore-sales

This repository contains a Power BI dashboard designed to provide comprehensive insights into sales performance across various regions, segments, and products. The dashboard utilizes a variety of visualizations, including bar charts, line charts, maps, and tables, to effectively communicate key metrics and trends.

business-intelligence data-analysis data-modeling data-visualization financial-reporting powerbi sales-analysis

Last synced: 07 Feb 2026

https://github.com/rohitdusane/healthcare-analytics

This Power BI dashboard is designed to provide valuable insights into patient waiting times across outpatient and inpatient healthcare services. It offers a comprehensive analysis of key factors influencing wait lists, including Age Profile, Specialty, Time Bands, and Patient Case Types.

data-analysis data-visualization dax dax-query healthcare-analysis powerbi-report

Last synced: 01 Feb 2026

https://github.com/okwilkins/retailanalysis

A comprehensive exploratory analysis and implementation of k-means and hierarchical clustering on online retail data.

data-analysis data-science machine-learning statistics

Last synced: 18 Oct 2025

https://github.com/andrii04/ga4-gcs-to-bigquery-etl

Automated Data Pipeline that ingests daily GA4-formatted CSV files from a private Google Cloud Storage bucket, validates and loads them into BigQuery, and prepares analysis-ready views. The solution is built for deployment as a Cloud Function triggered by Cloud Scheduler and uses Python with the Google Cloud Storage and BigQuery client libraries.

automation bigquery cloud cloudfunctions data data-analysis data-engineering etl etlpipeline gcp google googlecloudplatform pipeline python sql

Last synced: 18 May 2026

https://github.com/devbigboy/excel-power-query-get-transform

Power Query is a feature in Excel that allows you to quickly import data from multiple sources and easily clean, transform, and reshape it to suit your needs.

data-analysis data-science excel

Last synced: 08 Feb 2026

https://github.com/ronitjariwala/prodigy_ds_02

Prodigy InfoTech Data Science Internship Task-2

data-analysis python

Last synced: 28 Apr 2026

https://github.com/sroman0/data-analytics

Data Analytics Exercises is a collection of comprehensive university-level exercises aimed at enhancing skills in data analytics. The repository includes practical notebooks covering data manipulation, exploratory data analysis (EDA), statistical analysis, data visualization, and machine learning fundamentals.

data-analysis data-analytics data-science data-visualization education exercises exploratory-data-analysis hands-on-practice jupyter-notebook machine-learning python statistics

Last synced: 15 Apr 2026

https://github.com/grindelfp/datasets-analysis

A Machine Learning and Data Analysis course assignment for practicing data normalization and preprocessing.

data-analysis datasets ipynb mlda

Last synced: 05 Mar 2026

https://github.com/siddhant2105s/airline-performance-analysis-dashboard

Enhancing Airline Performance Analysis for the Department of Transport

data-analysis data-visualization tableau

Last synced: 08 Feb 2026

https://github.com/djm158/learning-microsoft-r

Working through https://www.gitbook.com/book/smott/introduction-to-microsoft-r-server/details and creating samples

data-analysis data-science microsoft microsoft-sql-server r

Last synced: 15 Apr 2026

https://github.com/themihirmathur/uber-data-analytics

The goal of this project is to perform comprehensive data analytics on Uber trip data using a modern data engineering stack on Google Cloud Platform (GCP).

bigquery data-analysis data-engineering etl-pipeline google-cloud-platform looker python

Last synced: 09 Feb 2026

https://github.com/shubham200137/spotify-listening-habits-analytics

Spotify Listening Habits Analytics analyzes personalized Spotify listening habits and music trends. It involves exploratory data analysis (EDA) with Python and Pandas, data processing with SQL Server, and visualizations built in Power BI. The goal is to uncover insights into listening patterns, track popularity, and artists.

data-analysis data-visualization exploratory-data-analysis jupyter-notebook pandas power-bi-dashboard sqlserver

Last synced: 18 Mar 2026

https://github.com/naninsv/apple-retail-sales-warranty-analysis

An advanced SQL project analyzing over 1 million rows of Apple retail sales data to solve real-world business problems, optimize query performance, and extract actionable insights. The analysis includes sales trends, warranty claims, product performance, and year-over-year growth.

business-intelligence data-analysis data-science etl insights retailanalytics sql sqladvance

Last synced: 26 Feb 2026

https://github.com/rlalpha49/anisearch-model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2026
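The AniSearchModel entry above describes finding the most similar titles by comparing SBERT embeddings of their descriptions. The sketch below covers only the ranking step, using cosine similarity, a common choice for sentence embeddings (the repository's exact metric is an assumption here). The titles and three-dimensional vectors are hypothetical stand-ins for real SBERT embeddings, which have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query_vec, catalog, top_k=2):
    """Rank catalog entries by similarity to the query embedding."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical precomputed embeddings (real ones come from an SBERT model)
catalog = {
    "Title A": [0.9, 0.1, 0.0],
    "Title B": [0.1, 0.8, 0.3],
    "Title C": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of the user's description

for title, score in most_similar(query, catalog):
    print(f"{title}: {score:.3f}")
```

For large catalogs, embeddings are typically normalized once so that cosine similarity reduces to a dot product, and a vectorized library or nearest-neighbour index replaces the Python loop.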

https://github.com/ludreinsalvador/global-covid-19-data-analysis

Contains Power BI dashboards that visualize and analyze global COVID-19 cases, deaths, and vaccination trends using data from the World Health Organization (WHO). The project aims to provide insights into the pandemic’s impact and vaccination progress worldwide through dynamic reports and advanced analytics.

analytics covid-19 covid19-data data data-analysis data-collection data-transformation data-visualization

Last synced: 26 Feb 2026

https://github.com/haroontrailblazer/machine_learning

A curated resource hub for learning machine learning, featuring tutorials, code examples, datasets, and hands-on projects to build foundational skills and explore real-world applications.

data data-analysis data-visualization database dataset gradient-descent machine-learning pandas python3 random-forest sklearn statistics

Last synced: 16 Apr 2026

https://github.com/serlo/data-pipeline-interactive-exercises

Processing pipeline for exercise dashboards.

data-analysis serlo

Last synced: 26 Feb 2025

https://github.com/skuschel/postexperiment

Postprocessor for experimental (event-based) data.

data-analysis eventstore hacktoberfest postprocessing

Last synced: 12 Jun 2026

https://github.com/aneeshmurali-n/project-ml-data-preprocessing

The main objective of this project is to design and implement a robust data preprocessing system that addresses common challenges such as missing values, outliers, inconsistent formatting, and noise. By performing effective data preprocessing, the project aims to enhance the quality, reliability, and usefulness of the data for machine learning.

data-analysis data-cleaning data-encoding data-exploration feature-scaling label-encoding matplotlib minmaxscaler numpy one-hot-encoding outlier-detection pandas standardscaler

Last synced: 02 May 2026

https://github.com/bcko/ud-da-eda-redwinequality

Udacity Data Analyst Nanodegree project: exploratory data analysis of the Red Wine Quality dataset.

data-analysis data-analyst-nanodegree exploratory-data-analysis r-markdown rstudio udacity udacity-data-analyst-nanodegree udacity-nanodegree

Last synced: 10 Feb 2026

https://github.com/chinmayee4/sales-analysis-for-ferns-n-petals

Analyzed data by creating an interactive dashboard in MS Excel.

data-analysis data-cleaning data-visualization excel pivot-tables powerquery

Last synced: 11 Feb 2026

https://github.com/vikktor93/proyecto-final-python-datascience

Dataset analysis of worldwide sales of video games on different platforms in 2020

data-analysis data-science jupyter-notebook kaggle matplotlib pandas python seaborn

Last synced: 16 Apr 2026

https://github.com/praveen-devknight/event-registration-analytics-dashboard

This project presents an interactive, visually rich Power BI dashboard that analyzes registration data from Teciton, a college-level technical and non-technical event. The dashboard provides comprehensive insights into participant demographics, event preferences, food choices, and time-based trends.

data-analysis data-visualization excel powerbi sql

Last synced: 11 Feb 2026

https://github.com/joemull/pyjade

A data curation script for the Jane Addams Digital Edition

data-analysis digital-humanities

Last synced: 11 Feb 2026

https://github.com/karlyndiary/spotify-excel-dashboard

Data Analysis on the Spotify Dataset using Microsoft Excel and VBA.

charts data-analysis data-cleaning data-visualization excel excel-export excel-vba pivot-tables

Last synced: 04 Jan 2026

https://github.com/karlyndiary/coffee-shop-sales-analysis

Comprehensive analysis of coffee shop sales utilizing Pandas for data cleaning and exploratory data analysis (EDA), complemented by Streamlit for creating interactive data visualization dashboards.

data-analysis data-cleaning data-preprocessing data-visualization eda pandas streamlit streamlit-dashboard

Last synced: 07 May 2026

https://github.com/mstovarh/analisis-de-bebidas-de-starbucks

This repository contains charts based on various characteristics of Starbucks drinks. I used tools such as ChatGPT's Data Analysis tool, Excel, and PowerQuery.

chatgpt data-analysis excel powerquery

Last synced: 15 Apr 2025

https://github.com/thlindustries/mortalidade_neonatal_python_react

A data visualization platform built with Python and React using the Plotly visualization library.

data-analysis data-visualization plotly python python3 react reactjs

Last synced: 16 Apr 2026

https://github.com/mlund2k/project-1-baseball-performance-vs.-attendance

Project assets for my first exploratory data analysis: Baseball Performance vs. Attendance.

bigquery data-analysis data-cleaning data-visualization excel rstudio sql tableau tidyverse

Last synced: 12 Feb 2026

https://github.com/rohitblaze10/-excel-_seller_store_analysis

A collection of data analysis projects showcasing data cleaning, exploration, visualization, and machine learning, using Excel and other tools to uncover insights and drive data-driven decision-making. Feel free to explore, contribute, or collaborate!

data-analysis data-visualization excel excel-export

Last synced: 12 Feb 2026

https://github.com/ankit21111/carpredict

This project predicts car prices using machine learning models, including Simple and Multiple Linear Regression. It covers data acquisition, feature selection, and optimization techniques like Ridge Regression. The best model, Multiple Linear Regression, achieved an R² score of 0.84. Check out the full analysis in the repository!

data-analysis data-visualization matplotlib numpy pandas pyhton scipy seaborn sklearn

Last synced: 16 Apr 2026

https://github.com/mnkanout/patients_medication_prediction

The aim of the project is to create a model that can help medical professionals select the proper medication for patients based on their symptoms. The model uses historical data of other patients to predict what could be the most suitable medication based on the patient's symptoms.

data data-analysis data-science data-visualization decision-tree-classifier machine-learning python3

Last synced: 29 Jun 2025

https://github.com/sambit-mondal/stockx

StockX is a full-stack application designed to help store owners efficiently manage their inventory, track purchases, and analyze stock levels. The system integrates MongoDB, Express, React, and Flask (Python) to provide a seamless experience.

artificial-intelligence data-analysis inventory-management-system machine-learning mern-stack

Last synced: 12 Jun 2026

https://github.com/rohansoni45/whatsapp-chat-analysis

This project involves analyzing WhatsApp chat data to extract valuable insights. Using Python and various libraries like Pandas and Matplotlib, the project processes and visualizes chat statistics such as message frequency, most active participants, and sentiment analysis.

chat-analysis data-analysis data-science matplotlib pandas python sentiment-analysis streamlit visualization web-app word-cloud

Last synced: 07 May 2026

https://github.com/farhad-here/median-performance-comparison

Benchmarking the performance of median calculation using vanilla Python vs NumPy.

data-analysis matplotlib numpy python

Last synced: 18 Apr 2026
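The median benchmarking entry above compares vanilla Python with NumPy. A sketch of such a comparison follows, assuming NumPy is installed; the helper name `median_pure_python`, the input size, and the run count are illustrative choices, not that repository's code. Timings depend on the machine, so no figures are given.

```python
import random
import statistics
import timeit

import numpy as np


def median_pure_python(values):
    """Median via sorting: the middle element, or the mean of the two middle ones."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


random.seed(0)  # reproducible synthetic data
data = [random.random() for _ in range(100_000)]  # hypothetical input size
arr = np.asarray(data)  # list-to-array conversion happens once, outside the timed code

# Check that all three approaches agree before comparing speed
assert median_pure_python(data) == statistics.median(data)
assert abs(median_pure_python(data) - float(np.median(arr))) < 1e-12

runs = 10
t_pure = timeit.timeit(lambda: median_pure_python(data), number=runs)
t_stats = timeit.timeit(lambda: statistics.median(data), number=runs)
t_numpy = timeit.timeit(lambda: np.median(arr), number=runs)

print(f"hand-written : {t_pure / runs * 1e3:.2f} ms per call")
print(f"statistics   : {t_stats / runs * 1e3:.2f} ms per call")
print(f"numpy.median : {t_numpy / runs * 1e3:.2f} ms per call")
```

Excluding the array conversion from the timed region measures only the median computation; including it would give a fairer picture when the data starts out as a Python list.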

https://github.com/ronylpatil/whatsapp-group-chat-analysis

This project analyzes our college's official WhatsApp group chat to extract useful information. Extracted features include the most active members of the group, the most active day of the week, the top 10 media contributors in the group, and more.

data-analysis data-preprocessing data-wrangling feature-engineering

Last synced: 14 Jun 2025