An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/mnkanout/patients_medication_prediction

The aim of the project is to create a model that can help medical professionals select the proper medication for patients based on their symptoms. The model uses historical data of other patients to predict what could be the most suitable medication based on the patient's symptoms.

data data-analysis data-science data-visualization decision-tree-classifier machine-learning python3

Last synced: 29 Jun 2025

https://github.com/mstovarh/analisis-de-bebidas-de-starbucks

This repository contains charts based on various characteristics of Starbucks drinks. I used technologies such as ChatGPT's Data Analysis tool, Excel, and PowerQuery.

chatgpt data-analysis excel powerquery

Last synced: 15 Apr 2025

https://github.com/karlyndiary/spotify-excel-dashboard

Data Analysis on the Spotify Dataset using Microsoft Excel and VBA.

charts data-analysis data-cleaning data-visualization excel excel-export excel-vba pivot-tables

Last synced: 04 Jan 2026

https://github.com/satyam4229/omnify-dataanalysis

Our assessment of Omnify focused on data-driven strategies to maximize profitability. We identified "Product X" as the most profitable product and recommended leveraging the "Wellness Solutions" keyword category for optimal keyword strategy.

data-analysis data-science data-visualization excel omnify

Last synced: 04 Jan 2026

https://github.com/aneeshmurali-n/project-ml-data-preprocessing

The main objective of this project is to design and implement a robust data preprocessing system that addresses common challenges such as missing values, outliers, inconsistent formatting, and noise. By performing effective data preprocessing, the project aims to enhance the quality, reliability, and usefulness of the data for machine learning.

data-analysis data-cleaning data-encoding data-exploration feature-scaling label-encoding matplotlib minmaxscaler numpy one-hot-encoding outlier-detection pandas standardscaler

Last synced: 02 May 2026

https://github.com/serlo/data-pipeline-interactive-exercises

processing pipeline for exercise dashboards

data-analysis serlo

Last synced: 26 Feb 2025

https://github.com/skysign/dat

A study group for learning data analysis together.

data data-analysis data-science

Last synced: 02 Jan 2026

https://github.com/ronitjariwala/prodigy_ds_02

Prodigy InfoTech Data Science Internship Task-2

data-analysis python

Last synced: 28 Apr 2026

https://github.com/andrii04/ga4-gcs-to-bigquery-etl

Automated Data Pipeline that ingests daily GA4-formatted CSV files from a private Google Cloud Storage bucket, validates and loads them into BigQuery, and prepares analysis-ready views. The solution is built for deployment as a Cloud Function triggered by Cloud Scheduler and uses Python with the Google Cloud Storage and BigQuery client libraries.

automation bigquery cloud cloudfunctions data data-analysis data-engineering etl etlpipeline gcp google googlecloudplatform pipeline python sql

Last synced: 18 May 2026

https://github.com/rahul-jha98/restauranttrends.stats-backend

Application that scrapes the Zomato Dataset and enables the user to visualise the results.

data-analysis data-extraction firebase-storage web-scraping zomato-api

Last synced: 16 Mar 2026

https://github.com/okwilkins/retailanalysis

A comprehensive exploratory analysis and implementation of kmeans/hierarchical clustering on online retail data.

data-analysis data-science machine-learning statistics

Last synced: 18 Oct 2025

https://github.com/badranalyst/titanic-survival-prediction-full-data-science-project-classification

This project predicts Titanic survivors using classification models. It includes data cleaning, pre-processing, exploratory data analysis (EDA), categorical feature conversion, model building, and evaluation. Python libraries like Pandas, NumPy, Matplotlib, and Seaborn are used to analyze and predict survival outcomes.

classification data-analysis data-science eda exploratory-data-analysis machine-learning matplo matplotlib-pyplot ml model numpy pandas predictive-modeling python seaborn

Last synced: 06 May 2026

https://github.com/farzeen-2001/superstore_analysis_sql

Analysed the Superstore data using SQL.

data-analysis mysql sql

Last synced: 15 Apr 2025

https://github.com/atanikan/data-mining-projects

Data Mining Homework

data-analysis iub

Last synced: 14 Mar 2025

https://github.com/azaz9026/data_cleaning

Welcome to the Data Cleaning repository! This collection is dedicated to showcasing techniques and methods for cleaning and preparing datasets for analysis.

data-analysis data-engineering data-structures data-visualization eda feature-engineering machine-learning numpy outliers pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/sanchittechnogeek/overscripted-analysis

Geolocation and user language extraction analysis from Mozilla Overscripted dataset

analysis data data-analysis mozilla

Last synced: 23 Mar 2025

https://github.com/bibymaths/python_snippets

A collection of Python scripts for bioinformatics data analysis, including tools for transcription counts, nucleotide composition, and protein sequence evaluation.

amino-acid-scoring bioinformatics data-analysis fasta-generation mathematical-evaluation nucleotide-analysis protein-sequence-analysis transcription-counts

Last synced: 29 Jul 2025

https://github.com/nevermendel/revolut-analysis

Python script to analyse Revolut transactions

data-analysis revolut revolut-analysis

Last synced: 12 Apr 2025

https://github.com/asghar-rizvi/youtube-statistics-project

This project analyzes a dataset of global YouTube statistics to uncover insights about YouTube channels, their ranks, and other attributes. The dataset used for this analysis was obtained from Kaggle.

data-analysis data-analysis-python data-science data-science-projects matplotlib numpy pandas pycharm-ide python seaborn

Last synced: 13 Jun 2026

https://github.com/cnoret/retail-data-analysis

Let's analyze historical sales data from a large retail chain and predict weekly sales using machine learning on a Streamlit web app

data-analysis data-analyst data-science data-vizualisation pandas python streamlit streamlit-webapp

Last synced: 10 Apr 2026

https://github.com/ssoehdata/sql_for_data_science_specialization_course

Materials and Certifications from the SQL for DataScience Course

data-analysis data-science database databricks postgresql sql sqlite

Last synced: 10 Apr 2026

https://github.com/laudebugs/fec-data-analysis-2020

The project aimed to determine the total sum of contributions to the candidate committees as well as the number of contributions made by individuals.

data-analysis fec presidential-candidates

Last synced: 16 May 2026

https://github.com/lopez86/rust-mlearn

Machine Learning Tools in Rust

data-analysis data-science machine-learning rust

Last synced: 15 May 2025

https://github.com/farhad-here/data-visualization-analysis-dva

This is my data analysis project. Users can use it to clean and preprocess data or to visualize it. They can also impute or encode their dataset.

altair bokeh data-analysis data-analysis-python io matplotlib numpy pandas plotly python sklearn streamlit

Last synced: 11 Apr 2026

https://github.com/itrauco/data-dirtying-tool

A simple command-line tool to generate dirty data and perform common data tasks in Google Cloud.

data data-analysis data-engineering data-ops data-pipeline data-science data-visualization data-wrangling dirty-data google-cloud machine-learning

Last synced: 24 Feb 2025

https://github.com/antoniszks/music-category-identifier

A 'Data-Science & Machine Learning' project where we are training a neural network to identify what kind of music we give to it. Based on a university project.

ai artificial-intelligence data-analysis data-science jupyter-notebook machine-learning ml notebook python

Last synced: 25 Feb 2025

https://github.com/dsrodrigovieira/favoritasales

This repository contains the project developed for the Kaggle challenge "Store Sales - Time Series Forecasting. Use machine learning to predict grocery sales".

data-analysis data-science kaggle-competition machine-learning python telegram-bot xgboost-regression

Last synced: 05 May 2026

https://github.com/motapinto/agent-based-simulation-conquest

Agent-based simulation modeling of the Battlefield Conquest game mode

agent-based-simulation data-analysis jade java sajas swing

Last synced: 24 Jan 2026

https://github.com/shivam5509/power-bi-project

Expert in creating interactive dashboards and reports using Power BI, utilizing 10+ visual tools like cards, slicers, and charts. Skilled in cleaning and transforming large datasets with Power Query Editor. Proficient in advanced DAX functions (SUMX, FILTER, CALCULATE) to derive insights and drive data-driven decisions.

advanced-excel computer-science data-analysis data-mining data-visualization engineering mysql numpy pandas powerbi pyhton3 sql sql-server

Last synced: 11 Apr 2026

https://github.com/shubham200137/icc-women-s-t20-world-cup-data-analytics

Created a Power BI report to identify top 11 players for a T20 cricket team by scraping data from espncricinfo with Python, cleaning and transforming the data with pandas, and evaluating various player performance metrics.

beautifulsoup4 data-analysis data-visualization numpy-python pandas-python powerbi web-scraping

Last synced: 25 Feb 2025

https://github.com/shubham200137/cyclistic-case-study

This repository contains a case study for Google's Data Analytics Professional Certificate, focusing on Cyclistic, a fictional bike sharing company in Chicago. The case study aims to drive growth by converting casual riders into members through a marketing strategy.

data-analysis data-visualization numpy-python pandas-python presentation-slides sql tableau

Last synced: 11 Jun 2026

https://github.com/faisal-fida/box-office-mojo-analysis

Analyzed box office data from Box Office Mojo, exploring relationships between worldwide revenue, release year, and a combined score that considers both factors. It includes visualizations such as scatter plots and bar charts, and identifies the top- and bottom-performing movies.

box-office data-analysis data-science python revenue-prediction visualization

Last synced: 06 May 2026

https://github.com/lotfiferaga/google-play-store-sentiment-analysis

Perform sentiment analysis on Google Play Store reviews using Python. Analyze user feedback to determine the overall sentiment (positive, negative, or neutral) towards various apps. Gain insights to aid developers and businesses in understanding user satisfaction levels and improving their products.

data-analysis data-visualization googleplayservices python reviewsanalysis-nlp

Last synced: 26 Feb 2025

https://github.com/grlyntng/rpims

Django Code and documentation for the Retail Pharmacy Inventory Management System (best final year project award)

data-analysis django erp forecasting-models lstm-neural-networks reporting

Last synced: 26 May 2026

https://github.com/zborovskaanna/dou-salary-analysis

Python data analysis project focused on improving data manipulation skills using Pandas

data-analysis pandas python

Last synced: 26 Feb 2025

https://github.com/weybsonalves/prevendo-o-atrito-de-clientes

A project in which I go through the stages of the data science life cycle in order to predict customer attrition for a bank's credit card service.

data-analysis data-science data-visualization machine-learning python

Last synced: 06 May 2026

https://github.com/elakkiya-u/digital-marketing-campaign

A machine learning project to predict whether a customer will convert based on digital marketing campaign data.

campaigns data-analysis deployment digital-marketing machine-learning predictive-modeling python

Last synced: 30 Jun 2025

https://github.com/apsinghanalytics/hranalytics_myersbriggspersonalityinsights

An Excel analytics study exploring the correlation between personality traits and key HR-relevant parameters, including tenure and performance

data-analysis data-visualization excel pivot-tables

Last synced: 30 Jan 2026

https://github.com/jayita11/healthcare-management-optimization-analysis-and-visualization

This project analyzes healthcare data from 2019 to May 2024, optimizing patient care, resource allocation, and financial management. Insights include billing trends, blood bank management, doctor performance, and medication demand, supported by Excel, interactive Tableau dashboards, and SQL analysis.

data-analysis excel healthcare interactive-dashboards mysql sql tableau-dashboards

Last synced: 23 Mar 2025

https://github.com/shimaa83/eda_v2

Automatic EDA library

data-analysis data-science python

Last synced: 20 Apr 2026

https://github.com/diem0n/100daysofdatascience

This repository is a collection of things I do each day as a data scientist hired at a fictional company called keko corp

data-analysis data-engineering data-science data-science-from-scratch data-warehousing machine-learning python

Last synced: 09 Apr 2026

https://github.com/tolumie/loan-approval-prediction

Loan Approval Prediction using Machine Learning | EDA + Decision Tree, Random Forest & Logistic Regression | Automating loan eligibility for Dream Housing Finance by analyzing customer data and predicting loan approvals.

classification credit-risk-analysis data-analysis decision-tree-classifier finance-analytics loan-approval logistic-regression-algorithm machine-learning predictive-modeling-techniques random-forest

Last synced: 30 Jun 2025

https://github.com/tashi-2004/apache-hadoop-spark-hive-cyberanalytics

This project utilizes Apache Hadoop, Hive, and PySpark to process and analyze the UNSW-NB15 dataset, enabling advanced query analysis, machine learning modeling, and visualization. The project demonstrates efficient data ingestion, processing, and predictive analytics for network security insights.

ai apache-hadoop apache-hive big-data-analytics big-data-processing data-analysis data-engineering data-science data-security data-visualization hdfs machine-learning network-analysis network-security pyspark python3 threat-detection unsw-nb15-dataset

Last synced: 02 May 2026

https://github.com/aphp/jupyter-eds-notebooks

jupyter-eds-notebooks provides Docker images with preconfigured Jupyter environments for clinical and health data analysis, tailored for AP‑HP Datalabs and the HELIX platform.

data-analysis data-science data-visualization healthcare lab

Last synced: 13 Jan 2026

https://github.com/ttwag/p9_pandas

Problems that Introduce the DataFrame Object in Python's Pandas Library

data-analysis pandas-dataframe python

Last synced: 10 Jun 2025

https://github.com/tenifayo/analysis-of-fordgobike-trip-data

Data Visualization using Ford GoBike Trip Data

data-analysis matplotlib pandas

Last synced: 11 Jul 2025

https://github.com/jasontan22/aefes-time-series-forecasting

This project uses deep learning methods (LSTM, BiLSTM, CNN+LSTM) to forecast future closing prices from market data for Anadolu Efes Biracılık ve Malt Sanayii A.Ş. (AEFES). The project details the data preprocessing, model training, and evaluation steps.

bilstm cnn-lstm data-analysis deep-learning financial-forecasting lstm machine-learning python stock-price-prediction tensorflow

Last synced: 09 Aug 2025

https://github.com/ved-coder-king/wheat_ai_project

This project, Smart Wheat Farming AI System, was developed as part of the coursework for the Artificial Intelligence program at Esprit School of Engineering.

agriculture data-analysis data-visualization deep-learning image-classification machine-learning object-detection python wheat

Last synced: 15 Apr 2025

https://github.com/dug22/jjournal

Jupyter-like notebook software for Java

data data-analysis data-science java jshell jshell-repl notebook swing swing-application

Last synced: 11 Apr 2026

https://github.com/bala-1409/power-bi-visualization-project

This repository contains visualization projects built with Power BI. The visualizations yield insights and strategies that help grow the business toward higher profit margins, and the insights can help reduce damage from accidents and calamities.

dashboard data-analysis data-science data-visualization exploratory-data-analysis microsoft-excel microsoft-power-bi microsoft-powerpoint power-bi powerbi powerbi-reports powerbi-visuals visualization

Last synced: 04 Jan 2026

https://github.com/samanhur/data_visualization_pcc

First experiences in data visualization with python

data-analysis data-science data-visualization python3

Last synced: 23 Mar 2025

https://github.com/neha-adnani/sql_music-store-analysis

SQL-based data analysis of a digital music store's sales and customer data.

business-analysis data data-analysis database follow-along-projects pgadmin4 portfolio-project postgres queries sql

Last synced: 18 Jun 2025

https://github.com/abhay-sinha-0/carpricepredictionproject

A machine learning project that predicts the selling price of a car based on its features such as year, mileage, fuel type, transmission, and more. This model can assist individuals and dealerships in estimating fair market prices for used cars.

artificial-intelligence data-analysis data-science data-visualization exploratory-data-analysis machine-learning-algorithms matplotlib-pyplot mysql-database numpy-library pandas-library python skit-learn sklearn-library

Last synced: 15 May 2025

https://github.com/danpoynor/python-number-guessing-game-with-stats

A number guessing game written in Python 3 that presents median, mode, and mean statistics

console-game data-analysis number-guessing-game python3 statistics

Last synced: 26 May 2026

https://github.com/emcramer/clockplot

Plotting utility for a "clockplot" that puts groups into a time-ordered heterogeneity visualization

biology data-analysis data-visualization heterogeneity pseudotemporal-ordering

Last synced: 10 Mar 2026

https://github.com/ak-alien/combobullet

ComboBullet is a versatile log processing and credential extraction toolkit for Windows. It offers multiple features to filter, extract, and manage credentials and cookie data from raw .txt files. This tool is particularly useful for combo scrapers, data analysts, and penetration testers.

combo-extraction cookie-extraction credential-management data-analysis log-processing penetration-testing

Last synced: 30 Jun 2025

https://github.com/regmibijay/opencarp-analyzer

Reads Trace Files created by OpenCARP Models and exports data for easy plotting with inbuilt plotter script.

bioinformatics data-analysis opencarp

Last synced: 16 Jan 2026

https://github.com/felinjob/ibm-applied-data-science-capstone

This project, part of the IBM Data Science Professional Certificate specialization, predicts the landing success of SpaceX's Falcon 9. Using data from the SpaceX API and web scraping, the project includes data analysis and machine learning to generate insights about the launches.

data-analysis data-science data-visualization ibm jupyter-notebook machine-learning numpy pandas python scikit-learn seaborn sql

Last synced: 11 Apr 2026

https://github.com/27ahmad/netflix_sql_project

The Netflix SQL Project analyzes the Netflix dataset using SQL queries to gain insights into its content, identify trends, and address business problems related to movies and TV shows.

data-analysis postgresql-database sql

Last synced: 03 Feb 2026

https://github.com/stas1f1/methods-and-models-for-multivariate-data-analysis

Completed tasks for the course on methods of multivariate data analysis, 1st year of master's, FDT ITMO

data-analysis multivariate-analysis python

Last synced: 10 Mar 2026

https://github.com/27ahmad/ibm-data-science-capstone

The Capstone is the final course in the IBM Data Science Professional Certificate program. It's a project that combines all the skills and knowledge you've gained throughout the specialization.

data-analysis data-science folium-maps machine-learning plotly-dash python sql

Last synced: 26 May 2026

https://github.com/audy21/datacamp

Learning portfolio documenting my progress while taking Data Analyst & Data Science certifications from DataCamp.

data-analysis data-science machine-learning matplotlib numpy pandas python scikit-learn seaborn

Last synced: 11 Apr 2026

https://github.com/amanyadav-07/customer-churn-prediction

Machine Learning project to predict customer churn using Logistic Regression, Random Forest, and XGBoost. Includes data preprocessing, feature engineering, SMOTE balancing, model training, evaluation, and business insights.

accuracy-metrics data-analysis data-visualization logistic-regression machine-learning matplotlib numpy pandas python3 random-forest-classifier seaborn sklearn xgboost-classifier

Last synced: 11 Apr 2026

https://github.com/aksoni07/movie-recommendation

A hybrid movie recommendation system designed to deliver personalized and accurate suggestions by combining user preferences, item attributes, and collaborative patterns, ensuring a seamless and engaging experience.

clustering content-based-filtering data-analysis embeddings jupyter-notebook numpy ollaborative-filtering pandas personalization python recommendation-systems scikit-learn user-item-interactions

Last synced: 11 Apr 2026

https://github.com/shreyaamenon/data-analysis-aiml-mini-projects

Mini projects to help me grow my skills in data analysis, artificial intelligence, and machine learning.

ai data-analysis jupyter-notebook machine-learning python

Last synced: 11 Apr 2026

https://github.com/mudassir-a/vendor-performance-analysis

Vendor performance data analysis project using SQL, Python, and Power BI

data-analysis powerbi python sql

Last synced: 18 May 2026

https://github.com/badranalyst/student-tests-data-analysis-application

Python-based analysis of student test scores in math, reading, and writing, examining correlations with parental education, lunch type, and test preparation. Includes data cleaning, visualization, and statistical insights into factors influencing academic performance.

data-analysis data-visualization dataset matplotlib numpy pandas python sklearn

Last synced: 05 May 2026

https://github.com/bhavanachitragar/data-analysis-using-pyspark

Working with the pyspark module in Python in the Google Colab environment to run queries against the dataset. The dataset consists of two CSV files, listening.csv and genre.csv. Query results are visualized using matplotlib.

data-analysis google-colab pyspark-sql

Last synced: 30 Jun 2025

https://github.com/ianfelps/jornada_python

Projects completed during the Jornada Python by Hashtag Treinamentos in May 2024.

artificial-intelligence automation data-analysis python

Last synced: 28 Apr 2026

https://github.com/zulfachafidz/titanic_explorer_predicting_survival_with_classification_using_knn_algorithm

Tracking Life Safety with the KNN Predictive Analysis Approach. Leveraging the Titanic Dataset, we apply classification analysis to predict the fate of passengers based on a variety of features.

algorithm algorithms data data-analysis data-mining data-science datamodeling datapreprocessing dataset knn-algorithm knn-classification machine-learning machine-learning-algorithms prediction-model

Last synced: 01 Sep 2025

https://github.com/yandexdataschool/ml-sweights-experiments

Experiments for the "Machine Learning on data with sPlot background subtraction" paper

data-analysis high-energy-physics machine-learning statistics

Last synced: 15 May 2025

https://github.com/27ahmad/heart-disease-diagnostic-eda

This project conducts Exploratory Data Analysis on a dataset related to heart diagnostic disease, aiming to derive valuable insights from the analysis.

data-analysis data-visualization pandas python

Last synced: 06 May 2026

https://github.com/andersoncrs/analisis_exploratorio_de_datos-eda-_rendimiento_estudiantil

This exploratory data analysis (EDA) of the student performance dataset aims to identify and understand the factors that influence students' academic performance. Through data cleaning, transformation, and visualization, it seeks to uncover meaningful patterns and relationships.

data-analysis data-exploration data-exploration-and-preprocessing data-visualization seaborn

Last synced: 30 Mar 2025

https://github.com/andersoncrs/arboles_de_decision_calidad_del_vino

Contains a detailed analysis of wine quality using a classification model based on decision trees. It includes data exploration, outlier detection and handling, univariate and bivariate analysis, and the creation and evaluation of a predictive model. The main objective is to predict wine quality.

data-analysis data-science data-visualization machine-learning matplotlib seaborn sklearn tree-decision

Last synced: 20 May 2026

https://github.com/malucor/livros

A Python program that analyzes data about books from an Excel file.

analise-de-dados book books bookshelf data-analysis ipynb jupyter-notebook livro livros python

Last synced: 16 May 2026

https://github.com/gaaniruddha/mphil

This repository contains a copy of my final MPhil presentation and panel report.

data-analysis gpu-imager radio-astronomy

Last synced: 03 Mar 2026

https://github.com/rita94105/ethereum-fraud-detection

This project focuses on detecting fraudulent transactions in the Ethereum network using both traditional machine learning models and deep learning techniques. By analyzing transaction attributes and interaction patterns, we aim to develop an effective fraud detection model.

data-analysis deep-learning ethereum fraud-detection machine-learning

Last synced: 01 May 2026

https://github.com/omnipotence-eth/manufacturing-quality-analytics

SQL + Python pipeline for semiconductor NCR analysis — supplier performance, defect Pareto, yield trends

analytics data-analysis etl manufacturing matplotlib pandas postgresql python quality sql

Last synced: 11 Apr 2026