An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/edanur-y/bank-customer-churn-prediction-with-classification-models

Comparing the performance of multi-layer perceptron, decision tree, random forest, gradient boosting, and extreme gradient boosting classifiers on customer data to predict whether customers will exit the bank.

data-analysis data-transformation hyperparameter-tuning python

Last synced: 16 Apr 2026

https://github.com/ronaessi-28/sales-data-analysis-visualization-project

A comprehensive data analysis and visualization project using Python, Pandas, Matplotlib, Seaborn, and Streamlit. The project explores Superstore sales data to uncover trends, region-wise performance, product category insights, and builds an interactive dashboard.

data-analysis data-visualization eda matplotlib pandas plotly python-project sales-dashboard seaborn streamlit

Last synced: 16 Apr 2026

https://github.com/danpoynor/omdb-api-data-analysis

Gathers data for Oscar-winning movies using their IMDb IDs, saves the information to a CSV file, and answers a few data analysis questions about the movies using JupyterLab.

analytics csv data-analysis jupyter-notebook matplotlib omdb-api pandas-dataframe python-dotenv python3 seaborn-plots

Last synced: 16 Apr 2026

https://github.com/ribin-baby/the-sparks-foundation-data-science-internship

This repository contains tasks and solutions assigned as part of an internship program, including workbooks covering data analysis and model building.

data-analysis eda python3

Last synced: 16 Apr 2026

https://github.com/pizofreude/divvybikes-share-success

Developing a data-driven marketing campaign for Divvy to convert casual riders into annual members. Divvy is a bike-share program of the Chicago Department of Transportation (CDOT).

airflow bi-analytics data-analysis data-engineering data-visualization database dbt docker etl jupyterlab python r redshift s3

Last synced: 17 Apr 2026

https://github.com/jabercrombia/video-game-data

This project integrates FastAPI as the backend and Next.js as the frontend to create a full-stack web application. It processes and displays video game sales data, enabling seamless API communication while maintaining a scalable and efficient architecture.

data-analysis nextjs nintendo playstation python typescript video-game

Last synced: 02 Apr 2026

https://github.com/eliasdehondt/learn-r

Welcome to the Learn-R repository! This is your go-to resource for learning the R programming language, whether you're a beginner or looking to enhance your skills.

data-analysis data-visualization education machine-learning programming r statistics tutorials

Last synced: 03 Apr 2026

https://github.com/ridemountainpig/education-level-data-analysis

An analysis of the relationship between education levels, unemployment rates, and credit card spending in Taiwan's six major cities.

data-analysis matplotlib pandas-python

Last synced: 17 Apr 2026

https://github.com/nathaliacosim/migration-patrim

Automation for extracting, converting, and migrating asset-management data into Betha Sistemas' Patrimônio Cloud system. The project ensures a structured and secure information transfer flow, using C# (.NET Framework), PostgreSQL, and API integration.

conversion-tool data-analysis data-conversion data-transformation dotnet dotnet-code dotnet-console-app migration-tool

Last synced: 17 Apr 2026

https://github.com/victoorv/criminalite_us

An analysis of crime as a function of socio-economic variables, including the selection and comparison of multiple regression models as well as hypothesis tests on the coefficients and on the significance of the models.

data-analysis data-science r regression regression-analysis regression-models statistical-analysis statistical-tests statistics

Last synced: 04 Apr 2026

https://github.com/vitornegromonte/eda_stroke

Exploratory data analysis of the stroke prediction dataset

data-analysis data-science exploratory-data-analysis kaggle-dataset visualization

Last synced: 17 Apr 2026

https://github.com/santos-k/fashion-recommender-dashboard

The project is a neural network-based fashion recommendation system built in Python. It uses ResNet50, a deep learning model for image recognition. The training data, 65,000 images in total, was scraped from Flipkart.

ann cnn dash dashboard data-analysis data-science deep-learning eda gcp heroku kera machine-learning nueral-networks plolty python tensorflow

Last synced: 04 Apr 2026

https://github.com/q-viper/blog-notebooks

This repo stores most of my blog posts from dataqoil.com and q-viper.github.io.

data-analysis data-science machine-learning-algorithms timeseries

Last synced: 04 Apr 2026

https://github.com/sanam2405/ahs

This contains an analysis of the results of the AHS Madhyamik Examination 2022.

data-analysis data-visualization jupyter-notebook python

Last synced: 18 Apr 2026

https://github.com/yuvrajsaraogi/sales-prediction-using-python

Sales prediction involves estimating future product sales based on factors like advertising spend, target audience, and platform. Businesses rely on data scientists to forecast sales and optimize advertising costs. Machine learning in Python can be used for this task.

data data-analysis data-science data-visualization machine-learning matplotlib natural-language-processing numpy pandas prediction python sales-prediction-using-python sql

Last synced: 19 Apr 2026

https://github.com/prangonghose/wikipedia-blocking-policies

This study investigates the relationship between editors’ disruptive behavior and regulation policies on English Wikipedia, focusing on the Blocking Policy page. The study collects and analyzes data from 2004 to 2022 using the Wikipedia API, page statistics, and keyword extraction.

data-analysis data-visualization matplotlib open-source pandas python3 seaborn

Last synced: 18 Apr 2026

https://github.com/rajeev2806/retail-order-data-analysis

The dataset is downloaded via the Kaggle API, then cleaned and analyzed.

data-analysis data-cleaning postgresql

Last synced: 18 Apr 2026

https://github.com/vansh-py04/data-analysis-questions-pandas-numpy-sql

Solutions to 450+ data science tech stack questions essential for data analysts and scientists!

data-analysis data-science deepnote machine-learning numpy pandas python sql

Last synced: 18 Apr 2026

https://github.com/vl1507/data_science_pro_course

Course "Data Analyst PRO (PRO DA-6)"

da data-analysis data-science ds jupyter-notebook machine-learning ml pro-da python

Last synced: 18 Apr 2026

https://github.com/bolshovaelizaveta/covid19_spark_analysis

Coursework project for the course 'Databases for Computer Vision'. Development of an analytics platform for epidemiological monitoring of COVID-19 using Apache Hadoop and Spark.

apache-hadoop apache-spark covid-19 data-analysis jupyter-notebook machine-learning medical-imaging pyspark sql

Last synced: 18 Apr 2026

https://github.com/mksingh431/free-data-science-courses

Data science is a rapidly growing tech field that’s transforming business decision-making. To break into this field, you need the right skills. Fortunately, top institutions like Harvard and IBM offer free online courses. These courses cover everything from basic programming to advanced machine learning.

course data data-analysis data-science data-visualization free freecou python

Last synced: 19 Apr 2026

https://github.com/rodriguesl1/analise-ibovespa-fiap

Forecasting model for the IBOVESPA index using time-series techniques. The project includes exploratory analysis, seasonal decomposition, stationarity tests, and modeling with Prophet, AutoARIMA, and other statistical models to support investment decisions.

autoarima b3 brasil data-analysis economia finance forecasting ibovespa pandas prophet python statsmodels time-series

Last synced: 19 Apr 2026

https://github.com/yuvrajsaraogi/unemployment-analysis-with-python

Unemployment is measured by the unemployment rate, which is the number of unemployed people as a percentage of the total labour force. The unemployment rate rose sharply during COVID-19, so analyzing it makes a good data science project.

big-data big-data-analytics data-analysis data-science data-visualization engineering excel jupyter-notebook machine-learning mini-project natural-language-processing nlp project python3 sql

Last synced: 19 Apr 2026
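
The unemployment-rate definition in the entry above reduces to a single ratio. A minimal Python sketch, assuming the labour force is already given as a head count (employed plus unemployed); the function name and the figures are hypothetical and not taken from the repository:

```python
def unemployment_rate(unemployed: int, labour_force: int) -> float:
    """Return the unemployed as a percentage of the total labour force."""
    if labour_force <= 0:
        raise ValueError("labour_force must be positive")
    if not 0 <= unemployed <= labour_force:
        raise ValueError("unemployed must lie between 0 and labour_force")
    return 100.0 * unemployed / labour_force


# Hypothetical head counts, for illustration only.
print(unemployment_rate(7_500, 100_000))  # 7.5
```

Validating the inputs up front keeps a data-entry error (for example, more unemployed than the whole labour force) from silently producing a rate above 100%.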

https://github.com/kheriberto/linear_regression_ecommerce

Simple project showcasing how to build a linear regression model with scikit-learn

data-analysis jupyter-notebook linear-regression pandas python scikit-learn seaborn

Last synced: 19 Apr 2026
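
For a single feature, the model that scikit-learn's `LinearRegression` fits has a closed-form ordinary least squares solution. The sketch below computes it in plain Python so the arithmetic is visible. It is an illustration under that single-feature assumption, not the repository's code, and the function name and data are hypothetical:

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

With scikit-learn, the equivalent fit is `LinearRegression().fit(X, y)`, after which the estimates are available as `intercept_` and `coef_`; `X` must be two-dimensional, with one column per feature.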

https://github.com/edwinrlambert/exploring-airbnb-market-trends

Dive into NYC's Airbnb market trends through detailed analysis of listings data, including prices, types, and review dates. This is a DataCamp project.

airbnb data-analysis jupyter-notebook market-trends python

Last synced: 19 Apr 2026

https://github.com/namratha2301/carprice_analysisandprediction

This project analyzes factors influencing vehicle prices using a dataset of various attributes, including engine capacity, power, mileage, and seating capacity.

data-analysis data-visualization exploratory-data-analysis machine-learning pandas predictive-modeling random-forest-classifier regression scikit-learn seaborn

Last synced: 20 Apr 2026

https://github.com/jbalooshie/school_district_analysis

Analysis of standardized testing results using NumPy and Pandas, executed in Jupyter Notebook. Summaries of the testing results are provided based on school, test type, and grade level.

data-analysis data-science dataframes jupyter-notebook numpy pandas python

Last synced: 20 Apr 2026

https://github.com/sarthakmishraa/bike_rental_predictor

Bike Sharing Dataset: this dataset contains the hourly and daily counts of rental bikes between 2011 and 2012 in the Capital Bikeshare system, with the corresponding weather and seasonal information.

data-analysis machine-learning python xgboost

Last synced: 20 Apr 2026

https://github.com/robinmillford/hr-analytics-employee-performance-analysis

HR Analytics: Unveiling Employee Performance - A comprehensive exploration of employee data using SQL and Power BI, uncovering key insights for strategic HR decision-making.

data-analysis data-visualization jupyter-notebook powerbi python3 sql

Last synced: 20 Apr 2026

https://github.com/profasem/logistics-performance-analysis

Power BI dashboard analyzing logistics performance, delivery delays, carrier efficiency, and regional risk.

business-intelligence dashboard data-analysis logistics powerbi python supply-chain

Last synced: 21 Apr 2026

https://github.com/danpoynor/pet-shelter-data-analysis-notebook

Demonstration of skills analyzing data from a pet shelter. The CSV data contains tables detailing the incoming and outgoing animals, and I use my knowledge of Pandas to gather and present the requested information.

csv data-analysis data-cleaning data-science jupyter-notebook matplotlib numpy pandas pet-shelter tabular-data

Last synced: 21 Apr 2026

https://github.com/meerantajalli/networksecuritydefense

This network security defense system detects SMP floods, UDP floods, and ICMP floods. The model is trained on packets captured with Wireshark and distinguishes normal network traffic from attack traffic targeting the machine, using the packet transfer rate and the source IP.

anomaly-detection classification cyber-security data-analysis ddos-detection icmp-flood intrusion-detection machine-learning network-security packet-analysis python random-forest security smp-flood udp-flood wireshark

Last synced: 21 Apr 2026

https://github.com/mhuwaimel/data-analysis-of-students-results-in-qiyas

Analysis of student performance data from Qiyas (قياس), the Saudi Arabian National Center for Assessment

data-analysis jupyter-notebook python

Last synced: 22 Apr 2026

https://github.com/kgelli/apple-data-analysis---apache-spark

Modular ETL pipeline for analyzing Apple product purchase patterns using Apache Spark on Databricks with factory design patterns.

apache-spark data-analysis databricks delta-lake etl-pipeline factory-pattern pyspark

Last synced: 22 Apr 2026

https://github.com/rajesh9943/sentiment-analysis-of-consumer-opinions-on-amazon-products

Developed a comprehensive sentiment analysis system that classifies Amazon product reviews as positive, neutral, or negative. The project combines Natural Language Processing (NLP) techniques with machine learning algorithms to deliver accurate and actionable insights from customer feedback.

amazon data-analysis data-manipulation data-preprocessing data-presentation data-visualization machine-learning nlp nlp-library nltk product-reviews-analysis sentiment-analysis sklearn-library word-cloud-generator-in-python-3

Last synced: 05 Jun 2026

https://github.com/strixion/demoversion_ai

A demo version of StrixionAI

ai csv data-analysis data-analytics json python txt

Last synced: 24 Apr 2026

https://github.com/datalopes1/bank_marketing

This project will be based on the Bank Marketing dataset from the UC Irvine Machine Learning Repository, made available by S. Moro, R. Laureano, and P. Cortez.

data-analysis data-science data-visualization eda python

Last synced: 24 Apr 2026

https://github.com/voidnire/redditviralmysteryposts

Analysis of posts from mystery subreddits. What makes a post go viral in this kind of sub?

data-analysis data-visualization mysteries mystery nlms python-3 reddit

Last synced: 24 Apr 2026

https://github.com/cyberoctane29/python-for-data-analysis

A repository dedicated to learning Python for data analysis, data science, and data analytics. This collection of Jupyter notebooks covers practical exercises and concepts from the Google Advanced Data Analytics Professional Certificate program.

data data-analysis data-analytics data-science python

Last synced: 24 Apr 2026

https://github.com/amlanmohanty1/zepto-sql-data-analysis-project

Complete Data Analysis on Zepto Inventory data using SQL

data-analysis database inventory-management postgresql sql zepto

Last synced: 24 Apr 2026

https://github.com/gnodux/adb-link

An MCP server that connects to multiple databases. Supports access control and dynamic SQL query tool registration and invocation.

agent ai-tools data-analysis database-gateway go mcp mcp-server

Last synced: 06 Jun 2026

https://github.com/mehmetkahya0/gallstone_dataset_analysis_project

Analysis of the Gallstone Disease (Gallstone-1) Dataset (https://archive.ics.uci.edu/dataset/1150/gallstone-1)

analysis analytics data data-analysis data-science data-visualization database graph matplotlib python

Last synced: 25 Apr 2026

https://github.com/rubix982/product-quality-classification

This is an implementation for the CIKM AnalytiCup 2017, on the topic of "Product Title Quality". The goal is to take SKUs and rank their titles' clarity and conciseness. Referenced papers are attached to this repository, and the aim is to build ensemble models that either replicate their results or find new classification methods.

data data-analysis information-retrieval jupyter-notebook machine-learning nlp python spacy-nlp

Last synced: 25 Apr 2026

https://github.com/marielachirinosr/bellabeat-wellness-data-trends

Analyzing smart device data for insights on user activity patterns to optimize interventions for better health outcomes.

data data-analysis data-visualization pandas python python3 tableau tableau-public

Last synced: 25 Apr 2026

https://github.com/edwinrlambert/investigating-netflix-movies

Demonstrates data analysis and visualization techniques for Netflix movies using Python in a Jupyter notebook. This is a DataCamp project.

data-analysis data-analysis-python netflix python

Last synced: 25 Apr 2026

https://github.com/devexpress-examples/winforms-create-a-custom-exporter-for-pivotgridcontrol-with-xtrareport

This example illustrates how to dynamically create a custom report based on PivotGridControl content in WinForms.

data-analysis dotnet pivot-grid pivot-grid-for-winforms winforms

Last synced: 26 Apr 2026

https://github.com/moshora99/sql-data-warehouse-project

Building a modern data warehouse with MySQL, including ETL processes, data modeling, and analytics

data-analysis data-engineering data-science database datawarehouse datawarehousing etl scheme sql sql-query sql-server

Last synced: 27 Apr 2026

https://github.com/sohamb21/analysis-of-superstore-dataset

I completed the IBM SkillsBuild Data Analytics Internship Program to develop my Data Analytics skills and apply them to a real-world problem by working on this project.

data-analysis python

Last synced: 27 Apr 2026

https://github.com/edanur-y/laptop-price-prediction-with-regression-models

Comparing the performance of multi-layer perceptron, k-nearest neighbors, random forest, gradient boosting, and extreme gradient boosting regression models on laptop data to predict the price.

data-analysis data-transformation feature-importance hyperparameter-tuning python

Last synced: 27 Apr 2026

https://github.com/mango606/da__

Data analysis programming assignment, September 2021

data-analysis python task

Last synced: 27 Apr 2026

https://github.com/sweta2501/netflix_dataanalysis

I performed some data analysis using Netflix data.

data-analysis data-science jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/sujata-adhikari/data-analysis

Data analysis of market sales data using Power BI, with a dashboard presenting the analysis.

data-analysis excel pandas powerbi

Last synced: 12 Jun 2026

https://github.com/simranshaikh20/diwali-sales-analysis-for-business-insights

A data analysis project on Diwali sales that breaks down how much was sold by state, gender, and age.

data-analysis data-visualization python

Last synced: 28 Apr 2026

https://github.com/elmezianech/autoinventory

This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.

automation dashboard data-analysis data-engineering-pipeline docker etl glue-job inventory-management kafka kpis lambda-functions lstm ml-pipeline mlflow power-bi pytorch redshift s3 streamlit warehouse-management

Last synced: 28 Apr 2026

https://github.com/sufyan14/weather-data-analysis

A Streamlit dashboard that forecasts 30-day weather trends using uploaded CSV data and Facebook Prophet.

data-analysis python streamlit

Last synced: 28 Apr 2026

https://github.com/szapp/candyanalysis

Case study: Analyze the candy power ranking to identify and recommend popular candy characteristics

data-analysis data-visualization feature-selection interaction-terms

Last synced: 28 Apr 2026

https://github.com/kisaa-fatima/data-visualization-with-tableauleu

Conducted Exploratory Data Analysis (EDA) on the Berkeley Earth Dataset (large scale dataset), which features high-resolution land and ocean time series data. Created interactive dashboards using Tableau to effectively visualize and highlight trends and patterns within the data.

data-analysis data-science exploratory-data-analysis insights python tableau visualizations

Last synced: 29 Apr 2026

https://github.com/devexpress-examples/web-forms-pivot-grid-change-summary-display-mode

This example shows how to use different summary display modes in Pivot Grid for Web Forms.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 29 Apr 2026

https://github.com/kawshik-khan/fake-news-analysis

A fake news detection ML model. It utilizes the Bag of Words model for text vectorization and a Multinomial Naive Bayes classifier to predict whether news articles are real or fake. The project covers data preprocessing, model training, and performance evaluation with accuracy metrics and a confusion matrix.

data-analysis data-science machine-learning ml python3

Last synced: 08 Jun 2026
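
The Bag of Words plus Multinomial Naive Bayes approach described in the fake news entry above can be sketched without dependencies. This is an illustrative version, not the project's implementation: it assumes lowercase whitespace tokenization, Laplace smoothing with `alpha=1.0`, skips words never seen in training, and uses a hypothetical four-document training set. The class name `NaiveBayesTextClassifier` is made up for this sketch; in practice, scikit-learn's `CountVectorizer` and `MultinomialNB` cover the same steps.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Count words after lowercase whitespace tokenization (a simplifying assumption)."""
    return Counter(text.lower().split())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over bag-of-words counts, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing when alpha == 1.0

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            bow = bag_of_words(doc)
            self.word_counts[label].update(bow)
            self.vocab.update(bow)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        bow = bag_of_words(doc)
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus log likelihood of each word, weighted by its count.
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * vocab_size
            for word, count in bow.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                score += count * math.log((self.word_counts[c][word] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy training set, for illustration only.
train_docs = [
    "shocking secret cure doctors hate",
    "miracle cure shocking claim",
    "government publishes annual budget report",
    "council approves new budget",
]
train_labels = ["fake", "fake", "real", "real"]
clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("shocking miracle cure"))  # fake
print(clf.predict("new budget report"))  # real
```

Working in log space avoids the numerical underflow that multiplying many small word probabilities would cause on real-length articles.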

https://github.com/nivasharmaa/spiderverse

A comprehensive Java program for analyzing and managing events and data points within a fictional spiderverse. Features event handling, anomaly detection, cluster management, and robust file I/O operations.

advanced-algorithms anomaly-detection clustering data-analysis file-io object-oriented-programming

Last synced: 29 Apr 2026

https://github.com/satyacoder29/crowdfunding-in-sql

Crowdfunding is a method of raising funds for projects or causes by collecting small contributions from a large group of people, usually through online platforms. It enables individuals, startups, and nonprofits to secure funding, offering rewards or recognition in exchange, and helps bring ideas to life without traditional financing.

data-analysis data-cleaning database-management mysql-database quries sql sql-functions sql-server views

Last synced: 29 Apr 2026

https://github.com/mdaffailhami/king_county_home_sales_analysis

This repository contains code and analysis for exploring home sales data in King County, featuring geospatial mapping to visualize trends and factors influencing housing prices, including location, size, and various property features, using Python and popular data analysis libraries.

data-analysis data-science folium-maps geospatial python

Last synced: 29 Apr 2026

https://github.com/dcs-training/network-analyisis-python

Course material introducing data visualization with Altair and network analysis with NetworkX (in Python). See the README file for details.

data-analysis data-visualisation network-analysis python text-analysis

Last synced: 29 Apr 2026

https://github.com/teja-1403/forage-standard-bank-data-science

This repository contains solutions to the 4 different tasks that must be performed during the Data Science virtual internship provided by Standard Bank via Forage.

automl communication-skills data-analysis data-science machine-learning python sql

Last synced: 29 Apr 2026

https://github.com/srinibas-masanta/yelp-business-reviews-analysis

This project analyzes Yelp business reviews using Python, Snowflake, and SQL, focusing on efficient data ingestion, transformation, and analysis. We preprocess JSON data, optimize ingestion via Amazon S3, classify sentiments with Python UDFs, and extract insights using SQL queries—showcasing a streamlined end-to-end workflow.

amazon-s3 data-analysis json python snowflake sql

Last synced: 29 Apr 2026

https://github.com/varshan1123/sql-tableau-project

We analyze key indicators for our pizza sales data to gain insights into our business performance - A Data Analysis Project performed on Tableau & SQL.

analysis data-analysis data-science data-visualization excel mysql powerbi sql sql-server tableau tableau-dashboards

Last synced: 29 Apr 2026

https://github.com/prithviraj-2003/cognifyz-data-science-internship

🎓 Data Science Internship at Cognifyz Technologies 📅 Duration: 2 Months 🧠 Worked on real-world restaurant data 🗂️ Completed structured tasks across 3 levels 📌 Tasks focused on EDA, data preprocessing, visualization, and analysis 📎 Task descriptions provided in an attached PDF

data-analysis data-science data-visualization matplotlib numpy pandas python3

Last synced: 29 Apr 2026

https://github.com/monddavila/online-retail-data-analysis

Online Retail Exploratory Data Analysis with Python

data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/angchekar28/air-quality-index-analysis

This project analyzes Air Quality Index (AQI) data to identify pollution trends, seasonal variations, and the impact of different pollutants. It includes data visualization, correlation analysis, and insights into air quality variations over time.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook machine-learning python

Last synced: 30 Apr 2026

https://github.com/bachtiarashidiqy/ecommercedashboard

An interactive e-commerce analytics dashboard built with Streamlit, providing visualizations for sales performance, product analysis, geographic insights, and delivery status. Includes date filtering, company branding, and comprehensive documentation.

analytics dashboard data-analysis data-visualization e-commerce matplotlib pandas python seaborn streamlit

Last synced: 30 Apr 2026

https://github.com/farhad-here/id_validator

Iranian National ID Validator. This was one of my data analysis projects for a course I took.

data-analysis identity idverification object-oriented-programming oop oops-in-python python streamlit

Last synced: 30 Apr 2026
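
The ID validator entry above does not say which rules it applies. The checksum commonly documented for Iranian national IDs weights the first nine digits by 10 down to 2, takes the sum modulo 11, and compares the tenth digit with the remainder (if the remainder is below 2) or with 11 minus the remainder. A minimal sketch, not the repository's code, assuming strict 10-digit input and the frequently added rule that rejects codes made of a single repeated digit; the function name is hypothetical and the sample codes are synthetic:

```python
def is_valid_iranian_national_id(code: str) -> bool:
    # Require exactly 10 ASCII digits; some validators also left-pad
    # shorter input with zeros, which this sketch does not do.
    if len(code) != 10 or not code.isascii() or not code.isdigit():
        return False
    # Assumption: reject codes made of one repeated digit, a common extra rule.
    if len(set(code)) == 1:
        return False
    digits = [int(ch) for ch in code]
    # Weight the first nine digits by 10, 9, ..., 2 and sum them.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    remainder = total % 11
    expected = remainder if remainder < 2 else 11 - remainder
    return digits[9] == expected


# Synthetic codes, for illustration only.
print(is_valid_iranian_national_id("1234567891"))  # True
print(is_valid_iranian_national_id("1234567890"))  # False
```

The `isascii()` check matters because `str.isdigit()` also accepts non-ASCII digit characters, such as Persian numerals, which `int()` would otherwise convert silently.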

https://github.com/mitchellharrison/mitchellharrison.github.io

Welcome to my slice of the internet, where I share the knowledge that Duke gave me, so you don't have to spend the mortgage-sized amount to access it. Built with R, Python, Quarto, and love.

ai algorithms-and-data-structures blog data-analysis data-science data-visualization educational machine-learning portfolio portfolio-website quarto r r-language statistics tutorials

Last synced: 30 Apr 2026

https://github.com/abhi227070/ipl-2024-sold-player-data-analysis

This project analyzes IPL 2024 auctioned players' data, including name, team, cricket type, nationality, and price. Users input a player's name to access team, style, nationality, and auction price, aiding research and fantasy leagues. It offers insights into player dynamics, serving cricket enthusiasts with comprehensive data exploration.

data-analysis data-visualization dataanalytics machine-learning machine-learning-algorithms python3

Last synced: 30 Apr 2026

https://github.com/gitchaell/computer-scrapping

Tool that extracts data from the websites of companies that sell computers in Trujillo, Peru, exports it to an XLSX file following a relational data model, and displays it in a Power BI dashboard.

data-analysis data-structures data-visualization database dbdiagram export-excel powerbi scrapper-script scrapping xlsx

Last synced: 01 May 2026

https://github.com/fazatholomew/marlboroplan

This project, part of my work for the nonprofit All In Energy and my undergraduate thesis, aims to contribute to a more inclusive sustainable energy program in Massachusetts.

data-analysis data-visualization energy jupyter-notebook massachusetts python

Last synced: 01 May 2026

https://github.com/devag2004/electricity-analysis-using-spark

An electricity analysis project built with Spark

data-analysis spark spark-mllib

Last synced: 01 May 2026