An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/sweta-kaundilya/python_for_data_analysis

Learning Python and the relevant Python libraries for the data field.

cufflinks data-analysis data-science matplotlib numpy pandas plotly python seaborn

Last synced: 04 May 2026

https://github.com/halyusa16/e-commerce-analysis

This project analyzes a public e-commerce dataset to uncover valuable insights and answer critical business questions. The dataset contains customer, product, order, and transaction details, providing a comprehensive view of the e-commerce platform's operations.

data-analysis data-cleaning data-exploration data-visualization self-project

Last synced: 09 Jun 2026

https://github.com/jatin-mehra119/flight-price-prediction

This study analyzes flight booking data from the "Ease My Trip" website, using statistical tests and linear regression to extract insights that can benefit passengers using the platform.

data-analysis datacleaning datavisualization machine-learning preprocessing-data python sklearn-pipeline sklearn-regression-algorithm streamlit-webapp

Last synced: 04 May 2026

https://github.com/drod75/nyc-arrests-analysis

This is a simple Data Science Project made to analyze and display data and trends found within the NYC Arrests Year to Date Dataset.

data-analysis data-visualization folium jupyter-notebook matplotlib-pyplot nyc-opendata nypd python scikit-learn seaborn

Last synced: 04 May 2026

https://github.com/flytomarsz/bike-sharing-system-analysis

This analysis project aims to identify bike rental behavior in 2012 in the Capital Bikeshare system, Washington, D.C., USA. This project is part of my Data Analysis study at Dicoding.

data-analysis data-visualization jupyter-notebook python streamlit

Last synced: 04 May 2026

https://github.com/dhruvsrikanth/basic-data-science

A short Data Science Project I took up for fun! This is a data analysis based on a dataset I created to predict the distribution of wealth within an economy as well as several characteristics of each class within society!

analysis data-analysis data-pipeline data-science data-visualization machine-learning matplotlib pandas python seaborn sklearn

Last synced: 05 May 2026

https://github.com/zafir100100/cancer-stage-prediction

This code predicts cancer data using various regression models, calculates their average R-squared scores, and prints the best model.

cross-validation data-analysis data-preprocessing decision-trees gradient-boosting linear-regression machine-learning-algorithms numpy pandas random-forest regression scikit-learn

Last synced: 05 May 2026

https://github.com/codewithmayank-py/box-office-analysis-with-seaborn-and-python

This repository contains Python code and datasets for analyzing box office data. Explore trends, patterns, and factors influencing movie performance.

analysis box-office-data-analysis data-analysis data-visualization dataset jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 05 May 2026

https://github.com/monish-nallagondalla/universal-bank

Credit Card Ownership Prediction: a machine learning project that predicts credit card ownership using features like age and income, balancing class distributions for improved accuracy.

classification-models credit-card-prediction data-analysis data-classification decision-tree-classifier imbalanced-datasets machine-learning model-evaluation python scikit-learn

Last synced: 05 May 2026

https://github.com/ayaatmohammed/amazon-sales-analysis-pyspark

In-depth analysis of the Olist E-commerce dataset from Kaggle using PySpark for customer segmentation (RFM) and market basket analysis.

big-data big-data-analytics customer-segmentation data-analysis data-science ecommerce jupyter-notebook kaggle pyspark python rfm-analysis

Last synced: 05 May 2026

https://github.com/kammarah/data-sample

I designed a database website 🌐 that can be uploaded easily for use 📤. You can check my website 👀.

data-analysis data-visualization database deploy deployment library-management-system panaversity streamlit webapp

Last synced: 05 May 2026

https://github.com/nkamilla/titanic-eda

Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.

data-analysis eda jupyter-notebook matplotlib numpy pandas python titanic-dataset

Last synced: 05 May 2026

https://github.com/meinhere/dicoding-analisis-data

Data Analysis course submission on the theme of e-commerce, with a Streamlit app.

data-analysis data-mining e-commerce python streamlit

Last synced: 05 May 2026

https://github.com/ibrahimceyisakar/hotel-finder

Hotel finder system with Python includes data gathering, analyzing, and visualization.

data-analysis data-gathering data-visualization pandas plotly python selenium streamlit

Last synced: 06 May 2026

https://github.com/sankaran-s2001/us-traffic-accidents-analysis-python-eda

Exploratory data analysis of US traffic accidents from 2016-2023, analyzing patterns by time, location, weather, and severity using Python data science libraries.

data-analysis data-science data-visualization eda matplolib numpy pandas python

Last synced: 06 May 2026

https://github.com/harryrlk/data_analysis_showcase

This repository showcases my data analysis and visualization projects using Excel, Python, R, and Tableau. Some projects are under NDA, so key figures and specific numbers are not included, but brief overviews and methodologies are provided. Feel free to explore and contact me for further details.

data-analysis data-science data-visualization excel portfolio python r tableau

Last synced: 06 May 2026

https://github.com/kishorep26/school-recommendation-system

Intelligent school recommendation system that matches students with suitable educational institutions based on preferences and performance metrics

bootstrap data-analysis decision-support edtech education education-technology flask matching-algorithm python recommendation-system school-finder school-search student-portal web-application

Last synced: 06 May 2026

https://github.com/josepablodmg/python--linear-regression-advertising

A linear regression analysis to predict sales based on advertising spending across TV, radio, and newspaper channels. The project includes exploratory data analysis, model training, coefficient visualization, and residual analysis.

advertising data-analysis exploratory-data-analysis linear-regression machine-learning python regression scikit-learn visualization

Last synced: 06 May 2026

https://github.com/suhas-005/jovian-data-analysis-course-assignment

These are my assignments for the Data Analysis: Zero to Pandas course by Jovian.ai.

data-analysis data-analytics numpy pandas python

Last synced: 07 May 2026

https://github.com/rohansoni45/whatsapp-chat-analysis

This project involves analyzing WhatsApp chat data to extract valuable insights. Using Python and various libraries like Pandas and Matplotlib, the project processes and visualizes chat statistics such as message frequency, most active participants, and sentiment analysis.

chat-analysis data-analysis data-science matplotlib pandas python sentiment-analysis streamlit visualization web-app word-cloud

Last synced: 07 May 2026

https://github.com/badranalyst/residential-unit-prices-data-analysis-application

Python-based analysis of residential unit prices, focusing on data cleaning, visualization, and exploratory data analysis (EDA). Key features include price distribution and correlation analysis between factors like size, location, and pricing.

data-analysis data-visualization dataset matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/13anush/python-libraries-

A collection of essential Python libraries—NumPy, Pandas, Matplotlib, and Seaborn—perfect for anyone starting out in data analysis.

data-analysis matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/sajjad425/edaipl

The dataset covers the Indian Premier League (IPL) with details on matches (date, teams, venue, results), player stats (runs, wickets), team stats (wins, losses), season summaries, and umpire info. The EDA reveals patterns and insights, highlighting dominant teams, star players, and trends across seasons.

data-analysis eda exploratory-data-analysis ipl python

Last synced: 05 May 2026

https://github.com/cicku/en.650.672

HW of EN.650.672

analytics data-analysis numpy pandas

Last synced: 05 May 2026

https://github.com/pcanadas/weather_scraper

This project automates the collection and processing of historical and forecast weather data. It uses Selenium to extract information from weather websites, processes the data with Pandas, and stores it in clean CSV files. It is well suited to climate analysis, data visualization, or integration into other systems.

beautifulsoup data-analysis pandas python selenium

Last synced: 05 May 2026

https://github.com/iamrajmani/sentimental-analysis

Sentiment Analysis - Final Year College Project

data-analysis data-visualization machine-learning python pytorch

Last synced: 06 May 2026

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 analyzing NYC school test scores with python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/erick957/saleprice-prediction-dataset-analysis-and-cleaning-advance-regression

🏠 Predict house prices using advanced regression techniques with this comprehensive analysis and cleaning project, from data loading to model deployment.

data-analysis data-science eda google-colab machine-learning numpy pandas python scikit-learn scikit-learn-python

Last synced: 06 May 2026

https://github.com/mikma03/datascience_python_datacamp

Data science with Python: code and examples using Python libraries, including pandas, NumPy, Matplotlib, and many more.

data-analysis data-science datacamp datascience numpy pandas python

Last synced: 06 May 2026

https://github.com/drill-n-bass/dealavo-project

Building the Cartesian product of a dictionary's values as a list of dictionaries, and faster ways to find an element's index than the `index` method.

data-analysis data-analysis-python matplotlib pandas python python3 random timeit

Last synced: 06 May 2026
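
The two ideas named in the dealavo-project description can be sketched with the standard library alone. This is a minimal illustration and not the repository's code: the function names `dict_product` and `build_index` and the sample data are hypothetical. It assumes the dictionary maps each key to a list of candidate values, and that the "faster" lookup means trading a linear `list.index` scan for a precomputed dictionary of positions.

```python
from itertools import product


def dict_product(options):
    """Expand {key: [values]} into a list of dicts, one per combination."""
    keys = list(options)
    value_lists = (options[key] for key in keys)
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def build_index(items):
    """Map each item to its first position, so later lookups avoid a scan."""
    positions = {}
    for i, item in enumerate(items):
        positions.setdefault(item, i)  # keep the first occurrence, like list.index
    return positions


# Hypothetical sample data for illustration only.
grid = dict_product({"color": ["red", "blue"], "size": ["S", "M"]})
lookup = build_index(["a", "b", "c", "b"])
```

Building the position dictionary costs one pass over the list, so it pays off only when many lookups are made against the same list.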

https://github.com/vimlesh-gupta/blinkit_data_analytics_project

End-to-end Blinkit data analytics project using Python, SQL Server & Power BI

blinkit data-analysis eda pandas powerbi python sql-server

Last synced: 06 May 2026

https://github.com/fbarffmann/home_sales

Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.

aws big-data data-analysis home-sales parquet pyspark python spark spark-sql sql

Last synced: 06 May 2026

https://github.com/badranalyst/exploratory-data-analysis-on-salaries-dataset

Performing EDA on a dataset related to salaries, exploring relationships between factors like job titles, industries, and locations. Insights are visualized with plots to identify trends and disparities in salary data.

data-analysis dataset eda exploratory-data-analysis pandas python

Last synced: 07 May 2026

https://github.com/mahmoudnamnam/fc-barcelona-reports

FC Barcelona Reports: An interactive web application to analyze and visualize FC Barcelona's match data. Built with Streamlit, it scrapes match data from WhoScored, stores it in MongoDB, and presents insights through interactive visualizations like pass networks, shot maps, and player statistics.

data-analysis data-visualization football-analytics mplsoccer pandas streamlit web-scraping

Last synced: 07 May 2026

https://github.com/biginformatics/git-basics

Hands-on Git and GitHub lessons for analysts and statisticians

data-analysis git github public-health training

Last synced: 10 Jun 2026

https://github.com/riborings/python_projects

Python projects and other programming experiences

data-analysis machine-learning project python regression-analysis

Last synced: 08 May 2026

https://github.com/otonomee/against-the-clock-transcript-analysis

This repository contains code and analysis for exploring the transcripts of the various "Against The Clock" videos featured on the FACT Magazine YouTube channel. The goal is to uncover insights, patterns, and trends across the different artists and their creative process under time constraints.

against-the-clock ai-analysis audio-processing creative-ai creative-process data-analysis fact-magazine machine-learning music-production natural-language-processing nlp text-mining yt-dlp

Last synced: 08 May 2026

https://github.com/danmadeira/algoritmos-estatistica-python

Demonstration of statistics algorithms in Python

algorithms data-analysis data-science python statistics

Last synced: 08 May 2026

https://github.com/samjoesilvano/password_strength_prediction_using_nlp

Developed a predictive model to categorize passwords as Strong, Good, or Weak, enhancing security and reducing breach risks. The project involves cleaning and analyzing data from an SQL database, using the TF-IDF technique for transformation, and implementing a Logistic Regression model to achieve accurate classifications.

data-analysis data-classification data-cleaning data-visualization logistic-regression machine-learning natural-language-processing pandas password-security password-strength python scikit-learn sql tf-idf

Last synced: 08 May 2026

https://github.com/deepanshkhurana/udacityproject-prediciting-boston-housing-prices

This is a Udacity Project for the Machine Learning Nanodegree. Here, we are trying to predict Boston Housing Prices using sklearn.

data-analysis data-science machine-learning python scikit-learn udacity

Last synced: 08 May 2026

https://github.com/mrunmayee3108/financial-chatbot

A Python chatbot for analyzing financial data of companies with revenue, income, assets, cash flow, and debt ratio queries

chatbot data-analysis jupyter-notebook pandas python python3

Last synced: 09 May 2026

https://github.com/tsbarr/toronto-open-data

Analysis of Toronto's open data initiatives. 🌆 Exploring Toronto's urban systems through data science 📊 Python-based analyses of public datasets 🔍 Focus on community impact and urban patterns 🎓 Academic rigour meets practical insights 🔄 Regularly updated with new analyses

api-integration civic-tech ckan-api data-analysis data-cleaning data-science data-visualization exploratory-data-analysis jupyter-notebook open-data pandas public-data python tableau toronto urban-analytics

Last synced: 09 May 2026

https://github.com/mmfava/qualesuapergunta-scripts-base-2015-2018

This repository contains R scripts used during my biostatistics consulting work. The scripts cover various statistical analyses and served as a basis for analyses that were carried out. They are not the scripts from the consulting or advisory engagements themselves.

analytics data-analysis r

Last synced: 20 May 2026

https://github.com/tyriek-cloud/nyc-mobility-survey-analysis

An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.

aws aws-athena aws-glue aws-glue-crawler aws-quicksight aws-s3 data-analysis data-engineering etl-pipeline json python

Last synced: 09 May 2026

https://github.com/macdon112/credit-card-fraud-detection

Comparing ML models (Random Forest, KNN, Decision Tree) for credit card fraud detection using SMOTE and stratified cross-validation.

classification data-analysis fraud-detection imbalanced-data machine-learning python scikit-learn

Last synced: 10 May 2026

https://github.com/datasqlsantosh/global-energy-consumption-renewable-generation-python-data-analysis-portfolio

This project analyzes global energy consumption patterns and trends in renewable energy generation using Python data analysis libraries such as Seaborn and NumPy. The analysis explores energy consumption data from various regions worldwide and examines the contribution of renewable energy sources over time.

data data-analysis data-visualization pandas seaborn

Last synced: 10 May 2026

https://github.com/luca-02/credit-card-fraud-detection

This is a small master's degree project for New Generation Data Models and DBMSs course (academic year 2024/25).

data-analysis database nosql python

Last synced: 10 Jun 2026

https://github.com/parthds02/customer-segmentation-with-kmeans-clustering

Analyze customer behavior using Python and KMeans Clustering on transactional data. Features RFM analysis, data cleaning, clustering insights, and actionable visualizations to support business decision-making.

data-analysis data-visualization feature-engineering kmeans-clustering numpy pandas vscode

Last synced: 11 May 2026
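
RFM analysis, mentioned in the description above, summarizes each customer by recency (days since the last purchase), frequency (number of purchases), and monetary value (total spend). The sketch below shows only that aggregation step in plain Python. It is not taken from the repository: the function name `rfm_table`, the `(customer, date, amount)` tuple layout, and the sample transactions are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer, purchase_date, amount) tuples.
    as_of: the reference date used to measure recency.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day  # track the most recent purchase
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: ((as_of - last_seen[customer]).days,
                   frequency[customer],
                   monetary[customer])
        for customer in last_seen
    }


# Hypothetical transactions for illustration only.
sample = [
    ("alice", date(2024, 1, 5), 20.0),
    ("alice", date(2024, 1, 20), 35.0),
    ("bob", date(2023, 12, 1), 10.0),
]
table = rfm_table(sample, as_of=date(2024, 2, 1))
```

The resulting three values per customer are the kind of features that a clustering step such as KMeans would typically receive after scaling.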

https://github.com/hrosicka/czechpopulationestimation

This GitHub repository contains Python code for data analysis and population prediction in the Czech Republic up to the year 2050. The code is written in Python and utilizes the Pandas and Matplotlib libraries.

data-analysis data-visualization matplotlib matplotlib-figures matplotlib-pyplot pandas pandas-dataframe pandas-library pandas-python python python3

Last synced: 11 May 2026

https://github.com/deliprofesor/amazon-movie-analysis-and-visualization

"Amazon Movie Analysis and Visualization" is a Python project that analyzes and visualizes movie data from Amazon.com, including ratings, directors, actors, release years, MPAA ratings, and pricing. The project provides insights into movie trends and popular films, helping users explore key patterns through interactive visualizations.

data-analysis data-visualization matplotlib pandas python

Last synced: 12 May 2026

https://github.com/ggarciajavier/udacity-dalf-project2-wrangle-openstreetmap-data

Work performed for the 2nd project of Udacity Data Analyst Nanodegree: OpenStreetMap data wrangling and analysis.

data-analysis openstreetmap python sql

Last synced: 12 May 2026

https://github.com/sakan811/honkai-star-rail-a-few-fun-insights-with-data-analysis

The project offers insights into the stats of all Honkai Star Rail characters available as of the given date.

data data-analysis data-science data-visualization docker flask game honkai honkai-star-rail honkai-starrail seaborn webscraping webscraping-data webscraping-selenium

Last synced: 10 Jun 2026

https://github.com/sricasea/fundraising-insights-mwpccc

Data storytelling meets impact strategy — a nonprofit fundraising analysis project combining SQL, Python, and Deepnote to uncover donor trends and guide smarter decisions.

data-analysis data-storytelling data-visualization deepnote fundraising nonprofit portfolio-project python sql

Last synced: 12 May 2026

https://github.com/manukot/sturdy-engine-python-

I've learnt various theoretical concepts and worked on practical projects in my Master's coursework.

data-analysis data-visualization python3

Last synced: 13 May 2026

https://github.com/lucs1590/agidatatest

This is a repository with data analysis and data science tests.

data-analysis data-science python test

Last synced: 13 May 2026

https://github.com/nlink-jp/shell-agent-v2

macOS local-first chat & agent tool with interactive data analysis (Wails v2 + React)

data-analysis duckdb golang llm macos react wails

Last synced: 13 May 2026

https://github.com/madhurragarwal/advertising-data-set---eda-and-ml

Logistic regression and EDA on the Advertising dataset

data-analysis machine-learning

Last synced: 13 May 2026

https://github.com/satvikpraveen/matplotlibmasterpro

📷 MatplotlibMasterPro is a complete, portfolio-ready project to master data visualization using matplotlib. Includes 16 notebooks, real datasets, exportable plots, custom themes, Streamlit dashboard, and Docker support. Ideal for learners and data professionals.

charts custom-plots dashboarding data-analysis data-science data-visualization educational-project interactive-visualizations jupyter-notebook matplotlib notebooks open-source plotting portfolio-project python python-utilities reproducible-research subplots time-series-analysis visualization-tools

Last synced: 14 May 2026

https://github.com/saksham-jain177/cryptodataanalysis

A Python-powered project that fetches live cryptocurrency data from the CoinMarketCap API, analyzes it, and updates a live Excel sheet every 5 minutes.

api-integration coinmarketcap cryptocurrency data-analysis excel live-data python

Last synced: 12 Jun 2026

https://github.com/prathmesh2507/global-stock-intelligence-dashboard

Interactive Global Stock Market Analytics Dashboard built using Python, YFinance, Pandas, Streamlit, and Plotly. Analyze 20+ countries and 400+ top stocks with advanced visualizations and financial insights.

dashboard data-analysis data-visualization python stock-analysis streamlit

Last synced: 15 Jun 2026

https://github.com/anderson-andre-p/uber-data-analysis

This repository contains a comprehensive data analysis project focused on Uber rides. The dataset used in this project is a spreadsheet obtained from Uber, containing data related to ride details, such as pick-up and drop-off locations, date and time of the ride, and the fare amount.

data-analysis data-science data-visualization python

Last synced: 15 Jun 2026

https://github.com/dcs-training/data-wrangling-and-vis-pandas

Introduction to analyzing structured data with two Python libraries: pandas, for CSV and TSV data, and ElementTree, for XML data. See the README file.

data-analysis data-visualisation data-wrangling python

Last synced: 16 Jun 2026

https://github.com/lotfiferaga/amazon-alexa-reviews-sentiment-analysis

Amazon Alexa, developed by Amazon, allows users to interact with technology through voice commands. Analyzing user sentiments about Alexa, with over 40 million users worldwide, is an intriguing data project.

classification data-analysis python sentiment-analysis

Last synced: 18 Jun 2026

https://github.com/ilhanseyhanx/car-price-prediction-with-machine-learning

🚗 ML-powered car price prediction model with 95.88% accuracy using Random Forest and comprehensive data preprocessing

car-price-prediction data-analysis data-science machine-learning pandas python random-forest regression sklearn

Last synced: 19 Jun 2026

https://github.com/alinababer/covid19-timeseries-cases-and-deaths-forecasting-

This study is based on confirmed COVID-19 cases and deaths collected from Pakistan. We apply AI-based forecasting models such as ARIMA, LSTM, Prophet, and VAR. Results demonstrate the promising potential of time series models in forecasting COVID-19 cases and highlight their superior performance compared to the LSTM.

arima covid-19 data-analysis data-science data-visualization fbprophet forecasting lstm rnn time-series var vectorautoregression

Last synced: 19 Jun 2026

https://github.com/evanmathew/northwind-traders

SQL-powered analysis of sales, employee performance, and customer behavior using PostgreSQL window functions. This project uncovers key business insights to optimize decision-making.

case-study data-analysis jupyter-notebook northwind-traders postgresql python-postgresql sql

Last synced: 20 Jun 2026

https://github.com/ladaegorova18/data_analysis

Learning the basics of data analysis in Python

analytics data-analysis data-visualization steam-games

Last synced: 24 Jun 2026

https://github.com/imosudi/unsupervised-ml-kmeans-analysis

K-Means clustering analysis using synthetic datasets generated with scikit-learn, including meshgrid visualisation, silhouette score evaluation, and investigation of cluster count and random seed effects.

clustering data-analysis jupyter-notebook kmeans kmeans-clustering machine-learning matplotlib python3 scikit-learn silhouette-score unsupervised-learning

Last synced: 25 Jun 2026

https://github.com/prakshal0809/power-bi-analytics-dashboard

I have developed a dashboard in Power BI utilizing data from an Excel file. The dashboard effectively visualizes and analyzes the given data.

data-analysis powerbi

Last synced: 22 Feb 2026

https://github.com/jasoncobra3/finops-copilot

An end-to-end AI-powered FinOps platform that ingests cloud billing data, analyzes cost trends, answers natural-language questions using a RAG pipeline (LangChain + FAISS + sentence-transformers + Groq), and provides actionable cost optimization recommendations. Includes a FastAPI backend and Streamlit dashboard UI - fully containerized with Docker

ai-assistant cloud-cost-optimization cloud-enginee cost-analytics data-analysis devops docker faiss faiss-vector-database fastapi finops groq langchain llm pandas rag rag-pipeline sentence-transformers sqlite3 streamlit

Last synced: 13 Apr 2026

https://github.com/0xunkn0wn4m1r/data_engineering_banking_project

🏦 Build a complete data engineering workflow for a banking system, showcasing ETL processes, data transformations, and an interactive financial dashboard.

automation data-analysis data-cleaning data-science feature-engineering fintech-bank flask-api loan-default-prediction machine-learning mlops model-explainability numpy postgresql scikit-learn segmentation shap sql unsupervised-learning

Last synced: 09 Apr 2026