An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/nathadriele/ifood-data-governance-pipeline

This project demonstrates a complete Data Governance solution focused on quality, traceability, security, and LGPD compliance. It uses modern technologies such as Streamlit, Airflow, dbt, and Pydantic to implement a functional, interactive ecosystem with a data governance dashboard.

airflow dashboard data-analysis data-catalog data-engineering data-governance data-quality data-visualization dbt ifood lgpd matplotlib numpy observability-data pandas pipeline pyspark redis seaborn streamlit

Last synced: 02 Apr 2026

https://github.com/ruajean/netflixmoviescraper

🎬 A powerful tool for gathering movie data and user reviews from FilmAffinity's Netflix category. This script scrapes movie details and iterates through user reviews, saving structured information to a CSV file for analysis. Ideal for insights into user sentiments and movie popularity on FilmAffinity.

data-analysis data-visualization dataset jupyter-notebook python scraping

Last synced: 17 Apr 2026

https://github.com/humayun-raza-030/restaurant-recommendation-system

This project is a Restaurant Recommendation System that helps users find restaurants in Lahore based on their location, customer reviews, and ratings. The system scrapes restaurant data from Google Maps, analyzes user reviews for sentiment, and provides a visualization dashboard using Tableau.

data-analysis data-science data-visualization python

Last synced: 17 Apr 2026

https://github.com/kgotsosm/fcc-data-analysis

Notebooks created for the Data Analysis Course on freeCodeCamp

data-analysis data-visualization matplotlib pandas seaborn

Last synced: 17 Apr 2026

https://github.com/victoorv/criminalite_us

An analysis of crime as a function of socio-economic variables, including the selection and comparison of multiple regression models as well as hypothesis tests on the coefficients and on the significance of the models.

data-analysis data-science r regression regression-analysis regression-models statistical-analysis statistical-tests statistics

Last synced: 04 Apr 2026

https://github.com/ahmad-ali-rafique/decision-tree-regressor-modeling

Comprehensive exploration of decision tree regressors, including data cleaning, model building, and performance evaluation on various datasets.

artificial-intelligence data data-analysis dataanalytics decision-trees decisiontreeregressor modeling models regression-models

Last synced: 17 Apr 2026

https://github.com/sevilaymuni/project-no.3-seaborn-plots

Comprehensive analysis of differentiated thyroid cancer data using pandas and Seaborn

data-analysis data-structures data-visualization mathplotlib pandas python seaborn

Last synced: 18 Apr 2026

https://github.com/manalisbhavsar/mall-customers-clustering

Applies K-Means clustering to mall customer data, segmenting customers by annual income and spending score to identify patterns and group customers for targeted marketing.

data-analysis data-visualization matplotlib numpy pandas python scikit-learn

Last synced: 18 Apr 2026

https://github.com/al-ghaly/prosper-loans-analysis

A statistical analysis project that analyzes a finance company's loan data using Python packages (pandas, NumPy, seaborn, matplotlib).

data-analysis matplotlib numpy pandas python python-data-analysis seaborn statistical-analysis statistics

Last synced: 18 Apr 2026

https://github.com/mi7773/advanced_sql_data_analytics_project

A hands-on SQL project simulating data analysis using fact and dimension tables, covering trends over time, cumulative metrics, performance breakdowns, segmentation, and reporting via SQL.

analytics business-analytics business-intelligence data data-analysis data-analyst data-analytics database query reporting sql sql-queries sql-query sql-server window-functions window-functions-in-sql

Last synced: 18 Apr 2026

https://github.com/rodriguesl1/analise-ibovespa-fiap

Forecasting model for the IBOVESPA index using time-series techniques. The project includes exploratory analysis, seasonal decomposition, stationarity tests, and modeling with Prophet, AutoARIMA, and other statistical models to support investment decisions.

autoarima b3 brasil data-analysis economia finance forecasting ibovespa pandas prophet python statsmodels time-series

Last synced: 19 Apr 2026

https://github.com/tsffarias/my-books

Exploratory analysis of my Dataset 'All_the_Books_I_read' which contains all the books I've read

books data-analysis python tableau

Last synced: 19 Apr 2026

https://github.com/decepticon-ts/cap-ai-studio

A modern, powerful web application for advanced image analysis and batch processing, featuring real-time AI-powered image captioning, comprehensive reporting, and an intuitive user interface. Built with Streamlit and Google's Gemini API.

artificial-intelligence batch-processing computer-vision data-analysis gemini-api image-processing image-processing-python python streamlit streamlit-webapp threading

Last synced: 19 Apr 2026

https://github.com/akash-v7/telecom_customer_churn_prediction

A machine learning project to predict customer churn in the telecom industry using data analysis and classification models. The project includes data preprocessing, exploratory data analysis (EDA), model building, and insights to help telecom companies improve customer retention strategies.

data-analysis data-science data-visualization jupyter-notebook machine-learning predictive-modeling python

Last synced: 20 Apr 2026

https://github.com/nikolaos-mavromatis/etf-data-analysis-dashboard

Insights into SPY ETF performance with an interactive Streamlit dashboard powered by Alpha Vantage data.

api data-analysis data-visualization financial-analysis pandas plotly python streamlit

Last synced: 20 Apr 2026

https://github.com/anjaliwork20/moodify

Mood-based music recommendation system that uses machine learning to recommend songs, genres, artists, and playlists based on a user's emotional state.

artificial-intelligence cnn-keras cnn-model convolutional-neural-networks data data-analysis data-science data-structures data-visualization database deep-learning machine-learning machine-learning-algorithms python recommended song songs

Last synced: 20 Apr 2026

https://github.com/robinmillford/hr-analytics-employee-performance-analysis

HR Analytics: Unveiling Employee Performance - A comprehensive exploration of employee data using SQL and Power BI, uncovering key insights for strategic HR decision-making.

data-analysis data-visualization jupyter-notebook powerbi python3 sql

Last synced: 20 Apr 2026

https://github.com/profasem/logistics-performance-analysis

Power BI dashboard analyzing logistics performance, delivery delays, carrier efficiency, and regional risk.

business-intelligence dashboard data-analysis logistics powerbi python supply-chain

Last synced: 21 Apr 2026

https://github.com/meerantajalli/networksecuritydefense

This network security defense system acts as an indicator against SMP floods, UDP floods, and ICMP floods. The model is trained on packets captured with Wireshark and can differentiate between normal network traffic and traffic targeted at the machine by an attacker, using the packet transfer rate and the source IP.

anomaly-detection classification cyber-security data-analysis ddos-detection icmp-flood intrusion-detection machine-learning network-security packet-analysis python random-forest security smp-flood udp-flood wireshark

Last synced: 21 Apr 2026

https://github.com/tmmvn/analytics-notebooks

A collection of data analytics notebooks made while testing out JetBrains DataLore

ai algorithms data-analysis datalore elements-of-ai helsinki-university-mooc python

Last synced: 22 Apr 2026

https://github.com/thinogueiras/jornada-python

Jornada Python - Hashtag Programação.

data-analysis data-science inteligencia-artificial python rpa

Last synced: 22 Apr 2026

https://github.com/kgotsosm/epl-analysis

Preparing data for machine learning algorithms to predict English Premier League match winners.

data-analysis data-cleaning data-modeling

Last synced: 22 Apr 2026

https://github.com/devexpress-examples/web-forms-pivot-grid-export-additional-captions-header-or-footer

This example illustrates how to add a custom header to the document exported to PDF in Pivot Grid for Web Forms.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 22 Apr 2026

https://github.com/syed-nihaal/car-price-prediction-and-performance-analysis

A data science notebook project focused on analyzing car features and building a model for car price prediction.

data data-analysis data-visualization jupyter-notebook python

Last synced: 23 Apr 2026

https://github.com/strixion/demoversion_ai

The demoversion of StrixionAI

ai csv data-analysis data-analytics json python txt

Last synced: 24 Apr 2026

https://github.com/henriquetourinho/s.i.g.m.a

File search and analysis platform for Linux, with an advanced PySide6 GUI and a focus on rich metadata for in-depth investigations.

data-analysis developer-tools file-search metadata open-source pyqt pyside6 python python-brasil qt6 sysadmin-tools

Last synced: 24 Apr 2026

https://github.com/muthukumar0908/youtube-data-harvesting-and-warehousing-using-sql-mongodb-and-streamlit

Provides a simple and intuitive user interface built with Streamlit. Data is retrieved from YouTube using an API key and stored in a database.

data-analysis mongodb-atlas python sqldatabase streamlit-webapp youtube-api

Last synced: 24 Apr 2026

https://github.com/cyberoctane29/python-for-data-analysis

A repository dedicated to learning Python for data analysis, data science, and data analytics. This collection of Jupyter notebooks covers practical exercises and concepts from the Google Advanced Data Analytics Professional Certificate program.

data data-analysis data-analytics data-science python

Last synced: 24 Apr 2026

https://github.com/mariann95/sql_data_warehouse_and_analytics_project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

data-analysis data-analytics data-cleaning data-engineering data-lakehouse data-science data-science-portfolio data-warehouse data-warehousing datalake datawarehouse datawarehousing etl etl-job etl-pipeline medallion-architecture sql sql-query sql-server sqlserver

Last synced: 06 Jun 2026

https://github.com/tmoulik/bikeshare-python

Analysis of Bikeshare data from three major cities

data-analysis data-visualization python udacity-nanodegree

Last synced: 25 Apr 2026

https://github.com/sarangs1621/weather-prediction

Weather Prediction Using Machine Learning is a project that leverages machine learning algorithms to predict weather conditions based on historical data. It evaluates three popular ML models (Decision Tree, KNN, and Logistic Regression) and provides performance insights through metrics and visualizations.

data-analysis decision-tree jupyter-notebook knn logistic-regression machine-learning predictive-modeling python scikit-learn weather-prediction

Last synced: 25 Apr 2026

https://github.com/dcs-training/2023-10-22-carpentry-social-science

Go to https://dcs-training.github.io/2023-10-22-Carpentry-Social-Science/ to follow along with the material

data-analysis data-visualisation data-wrangling intro-to-programming r

Last synced: 06 Jun 2026

https://github.com/pararang/nams-thesis-fuzzy

A specialized data processing tool designed to help with Fuzzy Delphi Method calculations for thesis research data analysis. It was later extended with new features for data processing with other methods.

data-analysis dematel hacktoberfest hacktoberfest-accepted house-of-quality python sustainability vibecoding

Last synced: 27 Apr 2026

https://github.com/sohamb21/analysis-of-superstore-dataset

I completed the IBM SkillsBuild Data Analytics Internship Program to develop my Data Analytics skills and apply them to a real-world problem by working on this project.

data-analysis python

Last synced: 27 Apr 2026

https://github.com/busesimsek/sql-projects

A collection of my SQL projects with insights into real-world datasets.

data-analysis data-analytics mysql sql

Last synced: 07 Jun 2026

https://github.com/edanur-y/laptop-price-prediction-with-regression-models

Compares the performance of multi-layer perceptron, k-nearest neighbors, random forest, gradient boosting, and extreme gradient boosting regression on laptop data to predict price.

data-analysis data-transformation feature-importance hyperparameter-tuning python

Last synced: 27 Apr 2026

https://github.com/l2nce/datamining-study

Introduction to data mining

data-analysis data-mining matplotlib numpy panda

Last synced: 28 Apr 2026

https://github.com/abdeldjalilchafai/us-flight-delay-eda

Structured EDA on 2015 US flight delay data. Clean, reproducible notebook using a 6-step data analysis framework for real-world datasets.

data-analysis data-cleaning eda exploratory-data-analysis flight-delays kaggle matplotlib numpy pandas python seaborn

Last synced: 28 Apr 2026

https://github.com/dcs-training/decode-winterschool

Here you can find material on cluster analysis, data wrangling, and network analysis. See the README file for more information.

data-analysis data-visualisation data-wrangling gephi network-analysis python r statistics

Last synced: 28 Apr 2026

https://github.com/buabaj/fortran-assignment

Code repository for a Fortran and Python climatology assignment.

big-data climatology data-analysis data-visualization fortran90 python

Last synced: 28 Apr 2026

https://github.com/priyanshubiswas-tech/e-commerce_data_analysis

Analyzes 9,994 e-commerce transactions to uncover insights on sales trends, customer behavior, profitability, and logistics using EDA and visualization. Identifies top products, customer segments, and shipping efficiencies to optimize marketing, inventory, and operations, making it valuable for retail, finance, and logistics.

data data-analysis data-visualization pandas pandas-dataframe plotly-analytics-projects plotly-express python

Last synced: 28 Apr 2026

https://github.com/josedanielchg/efficient-data-storage-for-predictive-modeling

DataCamp project from the Associate Data Scientist track, focusing on optimizing dataset storage by transforming data types and filtering. Prepares data for efficient machine learning workflows.

cleaning-dataset data-analysis jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/devexpress-examples/web-forms-pivot-grid-change-summary-display-mode

This example shows how to use different summary display modes in Pivot Grid for Web Forms.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 29 Apr 2026

https://github.com/thanaraklee/pyspark-dataframe-operations

This project focuses on utilizing PySpark DataFrames to analyze and visualize data sourced from external datasets, such as CSV files. It provides a practical example of how to manipulate, transform, and gain insights from large datasets using the PySpark framework.

data-analysis dataframe pyspark python

Last synced: 29 Apr 2026

https://github.com/devexpress-examples/winforms-visualize-pivot-grid-data-in-chart

The following example shows how to integrate the Pivot Grid with the Chart control.

charting data-analysis dotnet pivot-grid-for-winforms winforms

Last synced: 29 Apr 2026

https://github.com/kasraskari/learn-r-codes

A learning repository for R programming, covering data manipulation, visualization, and statistical analysis. (Work in progress!) 🚧

data-analysis data-analysis-r data-visualization r r-examples r-graphics r-statistics statistics

Last synced: 08 Jun 2026

https://github.com/mdaffailhami/king_county_home_sales_analysis

This repository contains code and analysis for exploring home sales data in King County, featuring geospatial mapping to visualize trends and factors influencing housing prices, including location, size, and various property features, using Python and popular data analysis libraries.

data-analysis data-science folium-maps geospatial python

Last synced: 29 Apr 2026

https://github.com/rafgpereira/obmep-analise

Code that analyzes the history of Obmep awards for a given locality and school

data-analysis excel pandas python

Last synced: 29 Apr 2026

https://github.com/findmyway/dataframe-in-julia

A quick introduction to DataFrames in Julia for users coming from Python

data-analysis dataframe julia jupyter-notebook

Last synced: 29 Apr 2026

https://github.com/teja-1403/forage-standard-bank-data-science

This repository contains solutions to the 4 different tasks that must be performed during the Data Science virtual internship provided by Standard Bank via Forage.

automl communication-skills data-analysis data-science machine-learning python sql

Last synced: 29 Apr 2026

https://github.com/sharoonjoseph11/indian-liver-diseases

Indian Liver Disease Analysis and Prediction. This project leverages the Indian Liver Patient Dataset (ILPD) to analyze liver disease trends and develop predictive models for early diagnosis. Through data preprocessing, exploratory analysis, and machine learning, it identifies key risk factors and builds classification models.

data-analysis data-science data-visualization logistic-regression machine-learning pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/al-ghaly/e-commerce-a-b-testing

A statistical analysis project in which I performed an A/B test to analyze the effect of changing the user interface of an e-commerce company's website.

data-analysis matplotlib numpy pandas python python-data-analysis seaborn statistical-analysis statistics

Last synced: 29 Apr 2026

https://github.com/yimethan/basics-of-data-analysis

2023-2 Basics of Data Analysis

data-analysis numpy pandas python

Last synced: 29 Apr 2026

https://github.com/monddavila/online-retail-data-analysis

Online Retail Exploratory Data Analysis with Python

data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/alam025/invoice-generator

Processed 500+ invoices with automated payment reminders and multi-currency PDF generation

api data-analysis finance fintech nextjs pdfkit prisma python stripe

Last synced: 08 Jun 2026

https://github.com/srinibas-masanta/ibm-applied-data-science-capstone

This repository contains the work completed for the Applied Data Science Capstone Project offered by IBM on Coursera. The capstone project is the final course in the IBM Data Science Professional Certificate series and serves as an opportunity to apply the skills and knowledge gained throughout the series to a real-world data science problem.

capstone-project data-analysis data-science data-visualization machine-learning python web-scraping

Last synced: 30 Apr 2026

https://github.com/samuelpillai/machine-learning-classification-regression-nlp

A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

classification data-analysis data-science data-visualization feature-engineering jupyter-notebook machine-learning ml-pipeline model-evaluation nlp python regression-models scikit-learn supervised-learning text-mining

Last synced: 30 Apr 2026

https://github.com/abhi227070/ipl-2024-sold-player-data-analysis

This project analyzes IPL 2024 auctioned players' data, including name, team, cricket type, nationality, and price. Users input a player's name to access team, style, nationality, and auction price, aiding research and fantasy leagues. It offers insights into player dynamics, serving cricket enthusiasts with comprehensive data exploration.

data-analysis data-visualization dataanalytics machine-learning machine-learning-algorithms python3

Last synced: 30 Apr 2026

https://github.com/gitchaell/computer-scrapping

Tool that extracts data from the pages of companies that sell computers in the city of Trujillo - Peru, exports them in an XLSX file according to a relational data model, and displays them on a Power BI dashboard.

data-analysis data-structures data-visualization database dbdiagram export-excel powerbi scrapper-script scrapping xlsx

Last synced: 01 May 2026

https://github.com/devexpress-examples/wpf-pivot-grid-provide-custom-summary-values

This example demonstrates how to determine the value type when you calculate custom summary values in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 01 May 2026

https://github.com/falakrana/data-analysis-visualization

This repository showcases data analysis and visualization projects using Python and Tableau. It includes exploratory data analysis, interactive dashboards, and insightful visual stories derived from real-world datasets.

data-analysis data-visualization python tableau-public

Last synced: 01 May 2026

https://github.com/mmfava/lonomia-host-plants-2024

This project investigates the relationship between Lonomia achelous and Lonomia obliqua caterpillars and their host plants. The project uses Docker for a consistent environment and R for statistical analysis, with detailed processes documented in Jupyter notebooks.

data-analysis host-plants lonomia lonomism r

Last synced: 01 May 2026

https://github.com/ariyaarka/result-analysis

A simple analysis of results based on different factors, shown in figures

data-analysis jupyter-notebook matplotlib numpy-library pandas-dataframe python seaborn

Last synced: 01 May 2026

https://github.com/myounesdev/authorgraphanalyzer

A web-based visualization tool for analyzing and exploring author collaboration networks

algorithms binary-tree bts d3js data-analysis dijkstra-algorithm django exception-handling pandas python scss

Last synced: 08 Jun 2026

https://github.com/pablo1785/receipt-rs

Receipt processing backend built with Shuttle.rs, Axum and Azure Form Recognizer API

api-rest axum azure backend cognitive-services computer-vision data-analysis rust shuttle-rs sqlx

Last synced: 01 May 2026

https://github.com/kevingastelum/mydataanalysis

My DataAnalyst Projects | Python, SQL, Excel, PowerBI & Tableau

data-analysis python sql visualization

Last synced: 20 May 2026

https://github.com/kavicastelo/soil-fertilizer-analysis-colab

This repository includes practical Jupyter notebooks for data analysis and model training using a soil fertilizer dataset. (use 4th edition)

data-analysis jupyter-notebook python

Last synced: 01 May 2026

https://github.com/more-joao/color-distance-luminance

Data analysis project that aims to establish a relation between the Canberra distance between white and any given color in the RGB colorspace and its luminance.

canberra-distance data-analysis luminance python r rgb

Last synced: 02 May 2026

https://github.com/faithererer/haokanvideo_spider

Scraping and data analysis of Haokan Video (好看视频)

data-analysis data-visualization python spider

Last synced: 02 May 2026

https://github.com/suma-aljudaia/my-portfolio

Suma Aljudaia | Portfolio – AI & Data Analysis Enthusiast

ai css data-analysis html machine-learning portfolio

Last synced: 02 May 2026

https://github.com/chuxinh/our-data-manual

All in one place for our data science learning journey by Chuxin and Melody

data-analysis data-science machine-learning python

Last synced: 09 Jun 2026

https://github.com/m0saan/python-for-data-analysis

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney,

data-analysis data-science ipython-notebook machine-learning matplotlib numpy pandas python

Last synced: 02 May 2026

https://github.com/se7en69/rna-seq-data-processing-and-analysis-pipeline

This pipeline automates essential steps for RNA-Seq data analysis, including quality control, read trimming, alignment to a reference genome, and coverage quantification. It leverages tools like FastQC, fastp, STAR, and bedtools to ensure high-quality results, with MultiQC reports providing an overview at each stage.

bioinformaitcs-scripting bioinformatics bioinformatics-pipeline data-analysis linux scripts shell

Last synced: 02 May 2026

https://github.com/maddieemihle/pandas-challenge

Python analysis to create and manipulate school and standardized test data. Scores are calculated, grouped, aggregated, summarized, and organized using pandas.

data-analysis pandas-python

Last synced: 09 Jun 2026

https://github.com/helenaden/data-science-fundamentals

This project delves into fundamental data science concepts using Python libraries like NumPy and Pandas

data-analysis datascience datasets datavisualization datawrangling heatmap numpy pandas patterns python

Last synced: 03 May 2026

https://github.com/dimamirana/finding-correlation-among-social-media-usage-depression-sleep

In our project we tried to analyze whether there is a link between depression and social media usage time

anaconda data-analysis jupiter-notebook matplotlib-pyplot patternlab python

Last synced: 03 May 2026

https://github.com/fatihilhan42/tourist_analysis_in_turkey_with_python

In this project, the number of tourists coming to Turkey between 2008 and 2021 was analyzed. The data set, which you can find in the repository, was first organized using data cleaning algorithms. The cleaned data were then plotted using data visualization algorithms.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 03 May 2026

https://github.com/nathadriele/diabetes-clinical-etl-pipeline

This Public Health Data Engineering project implements a complete pipeline to collect, clean, standardize, validate, integrate, and visualize public SUS data related to Diabetes Mellitus in Brazil, filtering by ICD-10 codes E10 to E14.

cid data-analysis data-extraction data-pipeline data-science data-structures data-visualization datasus diabetes-detection diabetes-prediction epidemiology-analysis etl-pipeline healthcare-analytics ibge logger pytest sih streamlit sus

Last synced: 09 Jun 2026

https://github.com/matteospanio/speed-analysis

A project to analyze internet speed

bash-script data-analysis

Last synced: 03 May 2026

https://github.com/nathadriele/world-marathon-run-majors-analytics-challenge

This project presents a complete data engineering, analytics, machine learning, and Streamlit dashboard pipeline focused on the Abbott World Marathon Majors: Tokyo, Boston, London, Berlin, Chicago, and New York City. Covering the 2018 to 2025 seasons, it analyzes more than 628,000 runner records and 86 verified winner entries.

challenge data-analysis data-pipeline gradient-boosting lasso-regression linear-regression machine-learning models predictive-modeling python random-forest ridge-regression run-analytics world-marathon

Last synced: 09 Jun 2026