An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/idaraabasiudoh/knn-customer-classification

Classifies a telecommunications customer base into groups to determine the service type each customer requires.

data-analysis jupyter-notebook machine-learning pyhton3 scikit-learn

Last synced: 07 May 2026

https://github.com/md-emon-hasan/data-science

Data science tutorials, including data preprocessing, analysis, visualization, project deployment, machine learning and deep learning algorithms.

artificial-intelligence data-analysis data-engineering data-science deep-learning machine-learning-algorithms python

Last synced: 07 May 2026

https://github.com/y-india/retail-sales-analysis-project

Analysis and preprocessing of retail store sales data. Includes data loading, merging, and initial inspection. 📌 Recommended: See README.md for detailed project progress and dataset information.

ai dashboard data-analysis data-science data-visualization jupiter-notebook machine-learning matplotlib python real-world-problem-solving real-world-project retail-analytics sales-analysis seaborn sklearn-library streamlit

Last synced: 07 May 2026

https://github.com/shadan100/sales-prediction-analysis

The aim is to build a predictive model and find out the sales of each product at a particular store. Using this model, BigMart will try to understand the properties of products and stores which play a key role in increasing sales.

artificial-intelligence data-analysis data-science django django-framework jupyter-notebook machine-learning matplotlib pandas predictive-modeling python sales-prediction

Last synced: 01 Mar 2026

https://github.com/akarshankapoor7/tensorflow_tutorial

An easy, fast tutorial for TensorFlow, an open-source machine learning framework by Google used for building and training machine learning and deep learning models.

data-analysis data-science deep-learning machine-learning tensorflow

Last synced: 27 Apr 2026

https://github.com/bassamn/titanic-data-analysis

Exploratory data analysis (EDA) of the Titanic dataset using Python. Analyzed survival patterns by age, gender, and class with visualizations (seaborn/matplotlib). Non-ML focus—highlighting insights with statistics and plots.

data-analysis eda pandas python seaborn titanic visualization

Last synced: 08 May 2026

https://github.com/antononcube/wl-quantileregression-paclet

Wolfram Language (aka Mathematica) paclet that provides various Quantile Regression functions.

data-analysis machine-learning quantile-regression time-series time-series-analysis

Last synced: 20 Mar 2026

https://github.com/adrija-debnath/ideas-isi-data-science-internship

Topic of the Project - Predictive Maintenance Analysis, Data Science Internship at IDEAS - Institute of Data Engineering, Analytics and Science Foundation Technology Innovation Hub at Indian Statistical Institute, Kolkata.

data-analysis data-science predictive-analytics predictive-maintenance streamlit

Last synced: 27 Apr 2026

https://github.com/obirikan/ad-performance-analysis

This project compares ad effectiveness using A/B tests, analyzing ad performance from user interaction data, advertisement metadata, and device data. The goal is to evaluate click-through rates (CTR) across various ad versions, platforms, and devices.

data-analysis pandas

Last synced: 27 Apr 2026

https://github.com/tnleite/projeto_king_lift

This project presents a detailed analysis of the financial data of King Lift, a forklift rental company. Using Microsoft Excel, Power Query, and Power Pivot, I built an interactive dashboard, also in Excel, that helps the company gain valuable insights to improve operational efficiency and increase revenue.

data-analysis data-science data-visualization excel

Last synced: 19 Mar 2026

https://github.com/miroslav-reiter/kurz_jazyk_sql_analytici_datovi_vedci

Materials for the course SQL Language 1 for Analysts and Data Scientists

analysis analytics data data-analysis data-science database mysql reiter sql

Last synced: 08 May 2026

https://github.com/framebuffers/mindhunter

Wrappers for Pandas DataFrames to add quicker access for common statistical values, utilities and functionality.

data-analysis data-science numpy pandas python utilities-python

Last synced: 08 May 2026

https://github.com/md-emon-hasan/data_analytics_project

Data analytics tasks and solutions, featuring hands-on exercises for data cleaning, visualization, and analysis using Python libraries.

cars-dataset census-data covid19-data data-analysis london-house-price police-data weather-data

Last synced: 08 May 2026

https://github.com/swarnim1812/crime_project

AI-Driven Crime Forecasting Across Indian States — A pioneering machine learning project that harnesses time series modeling (SARIMAX, Ridge Regression) to uncover patterns and forecast crime trends using real-world multi-state temporal and socio-economic data.

analytics crime-locator crime-prediction data-analysis deep-learning machine-learning prophet-facebook sarimax-model time-series-forecasting

Last synced: 31 Jan 2026

https://github.com/sathyasris27/time-series-and-spectral-analysis-

This project aims to analyze the data, remove trends and seasonal effects, identify the underlying process, understand the dominant frequencies, and use the residuals to make predictions.

data-analysis data-visualization forecasting r spectral-analysis time-series-analysis

Last synced: 07 Jun 2026

https://github.com/jongan69/potion-leaderboard

Start of Entry for potion leaderboard contest

data-analysis leaderboard potion trading

Last synced: 11 Jun 2026

https://github.com/alxrm/scent-of-literature

Sentiment analysis of Russian literature using a very small dataset

classification data-analysis sentiment-analysis sklearn tf-idf

Last synced: 28 Apr 2026

https://github.com/sedatdikbas/aefes-time-series-forecasting

This project uses deep learning methods (LSTM, BiLSTM, CNN+LSTM) to predict future closing prices from market data for Anadolu Efes Biracılık ve Malt Sanayii A.Ş. (AEFES). The project details the data preprocessing, model training, and evaluation steps.

bilstm cnn-lstm data-analysis deep-learning financial-forecasting lstm machine-learning python stock-price-prediction tensorflow

Last synced: 09 May 2026

https://github.com/rubinlake/rl-academy-data-analytics

Educational data analysis project demonstrating BMW sales data analysis with AI-powered code assistance using Cursor IDE and Jupyter notebooks

cursor-ide data-analysis educational-project jupyter langchain matplotlib numpy pandas python scipy seaborn

Last synced: 09 May 2026

https://github.com/bhavik444/techistanbul_python_bootcamp

👨‍💻 Master Python programming through practical exercises in this 80-hour bootcamp, designed for beginners to advanced learners.

algorithms api-development automation coding-bootcamp data-analysis data-visualization django flask git machine-learning python software-engineering testing web-development web-scraping

Last synced: 28 Apr 2026

https://github.com/mkk-1817/adhd-prediction

This project focuses on leveraging machine learning techniques to predict Attention-Deficit/Hyperactivity Disorder (ADHD) in children. Accurate and early diagnosis is crucial for effective intervention and support.

adhd data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms prediction python

Last synced: 09 May 2026

https://github.com/thevinh-ha-1710/rstudio-statistics

An in-depth study of two datasets using applied statistics techniques.

applied-statistics data-analysis data-science data-visualization rmarkdown rstudio

Last synced: 31 Jan 2026

https://github.com/mariam-badr-mb/gtc-ml-project2-diabetes-prediction

This project is part of the GTC Machine Learning Program. It demonstrates the end-to-end ML workflow by building a predictive model for diabetes detection

classification-algorithm data-analysis data-visualization diabetes-prediction gridsearchcv hyperparameter-tuning machine-learning python

Last synced: 09 May 2026

https://github.com/billy-enrizky/yelpfusion

Finding all restaurants in the Maryland area using the Yelp Fusion API

data-analysis pandas yelp-api yelpfusion

Last synced: 28 Apr 2026

https://github.com/gabrielmpinho/cs50-sql

Solutions and notes from CS50’s Introduction to Databases with SQL. Covers CRUD operations, data modeling, normalization, joins, views, indexes, and connecting SQL with Python and Java. Begins with SQLite for portability and introduces PostgreSQL and MySQL for scalability.

data-analysis data-structures data-visualization database databases javascript python sql

Last synced: 10 May 2026

https://github.com/pratik-khose/data-analysis-with-pandasai

PandasAI with Llama3 for Interactive Data Analysis

data-analysis llama3 llma pandasai streamlit visualization

Last synced: 11 May 2026

https://github.com/easycris-software/easycris

Professional statistical analysis and RNA-seq for researchers — no coding required

anova bioinformatics data-analysis desktop-app genomics pharmacology research-tools rna-seq statistics tauri

Last synced: 11 May 2026

https://github.com/affec-ds/netflix-recommender-system

Content-based recommender system for Netflix titles. Includes filters by title, genre, and content type (movies or series) with an interactive interface in Jupyter Notebook.

content-based-recommendation data-analysis eda ipywidgets jupyter-notebook machine-learning movies netflix portfolio-project python recommender-system

Last synced: 28 Apr 2026

https://github.com/mohamedhany99/human-voice-identifier-counter

An application developed in Kivy that identifies users imported into the dataset using a trained support vector machine model. It has two features: importing a new voice, and detection, which detects human voices and counts them.

android android-app android-application automation automation-framework data data-analysis data-mining data-science data-visualization datascience kivy kivy-framework machine-learning python

Last synced: 27 Mar 2026

https://github.com/dsrodrigovieira/houserocketsales

This repository contains a project developed to practice data analysis skills using Python.

data-analysis data-visualization heroku kaggle-dataset python

Last synced: 29 Apr 2026

https://github.com/mayankyadav23/air-bnb-data-analysis

Data analysis and insights from NYC Airbnb listings, focusing on key metrics such as host performance, neighborhood trends, pricing, and customer reviews. Comprehensive documentation of ETL processes and analytical methodologies is provided. Perfect for understanding Airbnb dynamics and decision-making in the NYC market.

advanced-excel business-intelligence data-analysis data-analytics data-visualization power-bi ppt

Last synced: 19 Mar 2026

https://github.com/santiagortiiz/snowflake-data-warehousing

Snowflake University. Snowflake Data Warehousing. Fundamentals

big-data data-analysis data-warehouse olap snowflake

Last synced: 19 Mar 2026

https://github.com/is-leeroy-jenkins/sherpa

A budget execution and data analysis tool for EPA analysts, built on WinForms and .NET 6 and written in C#

budget-management data-analysis data-science data-visualization federal-government

Last synced: 13 May 2026

https://github.com/manwithacap/by-the-metric-match

🎲🃏 A game data tracker for your board/card/video games!

data-analysis data-visualization games jupyter-notebook python utility

Last synced: 29 Apr 2026

https://github.com/iguptashubham/pizzahut-analysis-sql

Best dataset for data analysis. Pizza Hut data analysis done by Shubham Gupta in MySQL. The dataset was provided by a friend who interned at Pizza Hut, where it was used for training and practice questions. The data does not reveal anything about Pizza Hut and is safe to share.

data-analysis data-analytics database dataset datasets mysql mysql-database pizzahut

Last synced: 14 May 2026

https://github.com/phammings/sales-management-analysis

Sales management analysis and Power BI dashboard for sample business request and user stories

data-analysis excel powerbi sql

Last synced: 01 Feb 2026

https://github.com/jhrcook/wagenmaker-data-analysis

Analysis of Registered Replication Report: Strack, Martin, & Stepper (1988) by Wagenmaker et al.

data-analysis r r-project statistics

Last synced: 08 Jun 2026

https://github.com/varshithdupati/yelp-business-analysis

Big Data analysis on Yelp reviews/businesses for Arizona. Using Hadoop, Spark, PySpark.

arizona-state-university big-data big-data-analytics data-analysis hadoop pyspark spark yelp

Last synced: 04 May 2026

https://github.com/sunnybibyan/random_data_generation

A project that generates a dataset using various statistical distributions (Normal, Uniform, Exponential, Random Integers, and Binomial) and performs data analysis. Includes visualizations and an option to export the data as a CSV file.

data-analysis data-visualization python random-data-generation statistics streamlit-webapp

Last synced: 13 Jun 2026

https://github.com/abhi18av/innovation-competition

Submission for a programming challenge

clojure clojurescript data-analysis

Last synced: 13 Jun 2026

https://github.com/reinmagine/eliminating-no-sensor

Contains my project that analyzes air quality sensor data to determine if the NO (Nitric Oxide) sensor in N. Mai, Los Angeles, CA can be removed without affecting data accuracy.

air-quality-sensor colab-notebook cost-optimization data-analysis data-optimization matplotlib-python nitric-oxide pyspark-python python sql

Last synced: 14 Jun 2026

https://github.com/soufianboukir/ecom-analytics-platform

End-to-end data science project on an Amazon sales dataset, including data preprocessing, analysis, modeling, and a Streamlit dashboard for insights and decision-making.

data-analysis data-science data-visualization data-visualization-dashboard forecasting-models timeseries

Last synced: 14 Jun 2026

https://github.com/adithya17-star/ai-powered-fraud-detection

An AI-powered fraud detection system using machine learning algorithms to identify suspicious transactions and provide interactive visualizations for financial security.

dashboard-visualization data-analysis finance-technology fintech flask fraud-detection machine-learning python security transaction-monitoring

Last synced: 02 May 2026

https://github.com/shridhar1504/milk-production-time-series-forecasting-datascience-project

This project uses time series forecasting to predict future milk production. The data is monthly milk production from January 1962 to December 1975. An ARIMA (autoregressive integrated moving average) model is used to forecast milk production, and the model is evaluated using various metrics.

adf arima-model augmented-dickey-fuller-test data-analysis data-analytics data-science data-visualization eda exploratory-data-analysis machine-learning machine-learning-algorithms python python3 residuals sarimax seasonality time-series time-series-forecasting trends

Last synced: 02 May 2026

https://github.com/suma-aljudaia/my-portfolio

Suma Aljudaia | Portfolio – AI & Data Analysis Enthusiast

ai css data-analysis html machine-learning portfolio

Last synced: 02 May 2026

https://github.com/ronitjariwala/prodigy_ds_04

Prodigy InfoTech Data Science Internship Task-4

data-analysis data-science data-visualization python

Last synced: 02 May 2026

https://github.com/se7en69/rna-seq-data-processing-and-analysis-pipeline

This pipeline automates essential steps for RNA-Seq data analysis, including quality control, read trimming, alignment to a reference genome, and coverage quantification. It leverages tools like FastQC, fastp, STAR, and bedtools to ensure high-quality results, with MultiQC reports providing an overview at each stage.

bioinformaitcs-scripting bioinformatics bioinformatics-pipeline data-analysis linux scripts shell

Last synced: 02 May 2026

https://github.com/benzerinsio/breastcancer-eda

📊 Exploratory Data Analysis (EDA) - Breast Cancer | Exploration of clinical features to identify patterns and relationships in breast cancer diagnosis.

analise-de-dados analise-exploratoria analise-exploratoria-de-dados data-analysis data-visualization diagnosis eda exploratory-data-analysis health-care medical-data python seaborn

Last synced: 02 May 2026

https://github.com/sarah-marion/sovereign-osint-toolkit

Sovereign OSINT Toolkit - Advanced, self-hosted intelligence platform for security researchers and investigators. Ethical, private and production-ready.

correlation-engine cybersecurity data-analysis docker fastapi infosec intelligence investigation open-source osint privacy python3 security-research security-tools threat-intelligence

Last synced: 02 May 2026

https://github.com/maddieemihle/pandas-challenge

Python analysis to create and manipulate school and standardized test data. Scores are calculated, grouped, aggregated, summarized, and organized using pandas.

data-analysis pandas-python

Last synced: 09 Jun 2026

https://github.com/dimamirana/finding-correlation-among-social-media-usage-depression-sleep

In this project we tried to analyze whether there is a link between depression and time spent on social media.

anaconda data-analysis jupiter-notebook matplotlib-pyplot patternlab python

Last synced: 03 May 2026

https://github.com/fatihilhan42/tourist_analysis_in_turkey_with_python

In this project, the number of tourists coming to Turkey between 2008 and 2021 was analyzed. The dataset, available in the repository, was first organized using data cleaning techniques. The cleaned data were then plotted using data visualization techniques.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 03 May 2026

https://github.com/chaedoll/analysis-python-foreignerinfra

Report on improving infrastructure for foreigners in Korea

data-analysis python team-project

Last synced: 03 May 2026

https://github.com/rohitinu6/tesla-price-prediction

A machine learning project that predicts future stock price movements using Logistic Regression, SVC, and XGBoost with engineered financial features.

data-analysis data-visualization feature-engineering financial-analysis logistic-regression machine-learning matplotlib python scikit-learn seaborn stock-market stock-price-prediction support-vector-machine time-series xgboost

Last synced: 03 May 2026

https://github.com/vipulbunny/restaurant-insight-analysis

A comprehensive data analysis project exploring restaurant ratings, locations, and customer sentiments. This project includes data preprocessing, descriptive analysis, geospatial mapping, sentiment analysis, and price-rating correlations using Python and visualization tools.

data-analysis data-preprocessing data-visualization folium geospatial geospatial-analysis geospatial-visualization machine-learning nlp pandas python restaurant-insights seaborn sentiment-analysis

Last synced: 03 May 2026

https://github.com/devlucho/modelos-predictivos

Predictive models using the Linear Regression, Logistic Regression, and Decision Tree algorithms.

data-analysis jupyter-notebook python3

Last synced: 03 May 2026

https://github.com/ababic/dumpling

Fast, flexible, powerful static data anonymisation for SQL dumps

anonymisation cli data-analysis data-science pii pii-redaction postgres privacy rust rust-lang scrubber scrubbing security tooling

Last synced: 03 May 2026

https://github.com/salma-mamdoh/project-writing-functions-for-product-analysis

My project to learn the basics of analysis on DataCamp

data-analysis data-camp pandas python

Last synced: 03 May 2026

https://github.com/ggarciajavier/udacity-dalf-project4-identify-fraud-enron-email

Work performed for the 4th project of the Udacity Data Analyst Nanodegree: a machine learning classifier for identifying fraud in the Enron email corpus.

data-analysis data-science machine-learning nlp-machine-learning python python27

Last synced: 03 May 2026

https://github.com/nurulashraf/logistic-regression-loan-prediction

Loan approval prediction using logistic regression based on applicant data, including income, credit history, and property details, after data preparation and feature engineering.

data-analysis data-science loan-prediction logistic-regression machine-learning predictive-modeling python sklearn

Last synced: 03 May 2026

https://github.com/ljadhav25/swiggy-restaurant-analysis

This repository contains data and analysis related to restaurants listed on Swiggy, one of India's largest online food ordering and delivery platforms. The objective is to explore restaurant trends, customer reviews, pricing strategies, and delivery metrics to gain insights into the food delivery industry.

data-analysis data-visualization matplotlib-pyplot numpy-library pandas-library python seaborn-plots

Last synced: 03 May 2026

https://github.com/devesh8423/machine_learning

Machine Learning practice projects, Jupyter notebooks, and datasets for learning regression, classification, and data analysis.

classification data-analysis data-science data-visualization jupyter-notebook machine-learning matplotlib ml-project numpy-library pandas python regression sckit-learn seaborn

Last synced: 03 May 2026

https://github.com/donmaruko/flask-data-analysis

Flask API for statistical calculations. Data analysis, cleansing, visualization, and manipulation. Documented by Swagger.

api api-rest data-analysis data-science data-visualization datascience flasgger matplotlib pandas seaborn sqlite wordcloud

Last synced: 03 May 2026

https://github.com/bpkaur/whats-in-a-name

Exploring a dataset of first names of babies born in the US to uncover interesting stories

data-analysis datacamp numpy pandas python3

Last synced: 04 May 2026

https://github.com/mindlessmuse666/titanic-data-visualization

A project visualizing Titanic passenger data using the Python libraries Matplotlib, Seaborn, and Plotly.

data-analysis data-visualization matplotlib pandas plotly python seaborn titanic

Last synced: 04 May 2026

https://github.com/sanchittechnogeek/rental-data-visualization_python

Statistics and visualization of rental data with Python

data-analysis data-science data-visualization statistics

Last synced: 04 May 2026

https://github.com/soham7998/data-analysis-projects

Data analysis projects I completed to gain hands-on experience. The projects showcase different concepts, visualizations, and more.

data data-analysis data-science machine-learning nlp python soham visualization

Last synced: 04 May 2026

https://github.com/mr-chang95/sf_data_visualization

In this personal project, I am interested in examining all of the active businesses in the San Francisco Bay Area while performing some simple data visualizations, mainly on categorical variables.

business data-analysis data-visualization jupyter-notebook pandas python san-francisco

Last synced: 04 May 2026

https://github.com/sweta-kaundilya/python_for_data_analysis

Learning Python and the relevant Python libraries for the data field.

cufflinks data-analysis data-science matplotlib numpy pandas plotly python seaborn

Last synced: 04 May 2026

https://github.com/halyusa16/e-commerce-analysis

This project analyzes a public e-commerce dataset to uncover valuable insights and answer critical business questions. The dataset contains customer, product, order, and transaction details, providing a comprehensive view of the e-commerce platform's operations.

data-analysis data-cleaning data-exploration data-visualization self-project

Last synced: 09 Jun 2026

https://github.com/abhinav330/911-emergency-calls-analysis

This Python Notebook analyzes emergency call data from the '911.csv' dataset. It uses various data visualization techniques to explore and gain insights into the emergency call data, including the types of calls, reasons for calls, and call patterns over time.

data-analysis data-science data-visualization eda exploratory-data-analysis exploratory-data-visualizations numpy pandas python

Last synced: 09 Jun 2026

https://github.com/ljadhav25/logistic-regression-data-science-

Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote, based on a given data set of independent variables.

data-analysis data-science data-visualization logestic-regression machine-learning

Last synced: 04 May 2026

https://github.com/bishopce16/pyber_analysis

The purpose of this project was to complete an exploratory analysis and create visualizations of the 2019 ride sharing data from PyBer.

data-analysis data-visualization jupyter-notebook matplotlib pandas python

Last synced: 04 May 2026

https://github.com/drod75/nyc-arrests-analysis

This is a simple Data Science Project made to analyze and display data and trends found within the NYC Arrests Year to Date Dataset.

data-analysis data-visualization folium jupyter-notebook matplotlib-pyplot nyc-opendata nypd python scikit-learn seaborn

Last synced: 04 May 2026