An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/prakashjha1/new-analysis-using-llm-locally

An interactive news analysis tool built with Streamlit and local LLMs. This app allows users to analyze and gain insights from the latest news articles using advanced language models, all running locally. Explore trends, sentiment, and key topics with an intuitive interface.

artificial-intelligence data-analysis data-science llms ollama python streamlit

Last synced: 14 Mar 2025

https://github.com/yanny-alt/competitor-sales-analysis-in-power-bi

This project aims to analyze competitor sales for a fictional manufacturing company, Sintec, using Power BI. The focus is on integrating, cleaning, and modeling data from multiple sources to generate insightful reports on company and competitor performance.

data-analysis powerbi sales-analysis

Last synced: 07 Jan 2026

https://github.com/thbaylson/datascience

All of my past data science assignments combined into a single notebook. Most of this comes from my Machine Learning course.

data-analysis data-science data-visualization decision-tree jupyter-notebook k-nearest-neighbors linear-regression machine-learning neural-network pandas-library python3 scikit-learn

Last synced: 09 May 2026

https://github.com/sasanthns/sql_data_warehouse_project

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

data data-analysis data-science data-warehouse datacleaning etl etlpipeline sql sqlserver

Last synced: 24 Mar 2025

https://github.com/syed-amjad-ali/-bank-churn-ml

Predicting bank customer churn using machine learning. This project includes exploratory data analysis (EDA), feature engineering, classification models (Logistic Regression, Random Forest), and customer segmentation using K-Means clustering.

classification data-analysis data-science eda jupyter-notebook k-means-clustering machine-learning ml python segmentation

Last synced: 09 Mar 2025

https://github.com/jnyambok/epl_dashboard

English Premier League Dashboard summarizing match data from 2009-2024

data-analysis data-science gcp powerbi

Last synced: 04 Sep 2025

https://github.com/hyperplasma/olympic-visualization-analysis

Multidimensional analysis and visualization of Olympic medals, economy, and happiness index.

data-analysis data-visualization matplotlib numpy pandas python wordcloud

Last synced: 04 May 2026

https://github.com/saksham-jain177/cryptodataanalysis

A Python-powered project that fetches live cryptocurrency data from the CoinMarketCap API, analyzes it, and updates a live Excel sheet every 5 minutes.

api-integration coinmarketcap cryptocurrency data-analysis excel live-data python

Last synced: 12 Jun 2026

https://github.com/chinmayee4/vrinda_store_data_analysis

Analyzed data by creating an interactive dashboard in MS Excel.

data-analysis data-cleaning data-visualization excel-dashboard pivot-tables power-query

Last synced: 07 Jan 2026

https://github.com/shahriarha/sql

Structured query language

data-analysis mysql mysql-database sql

Last synced: 02 Sep 2025

https://github.com/muthukumar0908/imdb_movie_analysis_with_powerbi

This project analyzes an IMDb movies dataset using Power BI.

data-analysis data-visualization powerbi

Last synced: 12 Jun 2025

https://github.com/jcm-ai/quantium-data-analytics-virtual-experience-program

This repository contains my proposed solutions to the assignments I was required to complete as part of the Quantium Data Analytics Virtual Experience Program. 📊📈📉👨‍💻

commercial-thinking communication-skills data-analysis data-validation data-visualisation data-wrangling jupyter-notebook matplotlib-pyplot numpy-library pandas-python presentation-skills programming python3 scipy-stats seaborn statistical-testing

Last synced: 16 May 2026

https://github.com/adilshamim8/eda-on-health-and-sleep-data

Exploratory Data Analysis (EDA) on health and sleep data, uncovering patterns and insights using Python and visualization tools.

data-analysis data-visualization eda health healthcare sleep sleep-analysis

Last synced: 15 Mar 2025

https://github.com/pngo1997/life-expectancy-logistic-regression

Life expectancy analysis project using logistic regression.

data-analysis logistic-regression r rmarkdown

Last synced: 10 Jun 2026

https://github.com/nurulashraf/customer-segmentation-hierarchical-clustering

A customer segmentation project using hierarchical clustering to group customers based on their spending behaviour and demographics. This helps businesses identify patterns and create targeted marketing strategies.

business-analytics clustering-algorithm customer-segmentation data-analysis hierarchical-clustering machine-learning python unsupervised-learning

Last synced: 18 Apr 2025

https://github.com/giog97/find_similar_tables_on_pubtables-1m

Find similar tables on the PubTables-1M dataset

data-analysis data-visualization datamining dm tables

Last synced: 09 Apr 2025

https://github.com/ankitmishralive/machinelearning

An ongoing deep dive into Machine Learning, advancing my expertise through continued education and hands-on practical experience.

artificial-intelligence data-analysis data-cleaning data-gathering machine-learning machinel-learning-algorithms matplotlib numpy pandas python seaborn

Last synced: 22 Mar 2025

https://github.com/rachit1084/sql-practice-ankit-bansal

Personal SQL problem-solving practice based on Ankit Bansal's YouTube series, with logic-driven solutions for analyst prep.

analytics data-analysis data-analyst interview-preparation logical-reasoning postgresql sql sql-practice

Last synced: 04 Jul 2025

https://github.com/farrelfaricaf/exploratorydataanalyst---titanic

This project analyzes the Titanic dataset using exploratory data analysis (EDA) and visualization techniques to identify survival patterns. The goal is to understand how demographic factors like gender and age influenced survival rates during the 1912 disaster.

data data-analysis data-science data-visualization eda python titanic-dataset

Last synced: 31 Jul 2025

https://github.com/saitoxu/data-analysis-workspace

Docker image for data analysis

data-analysis docker python

Last synced: 04 May 2026

https://github.com/abdullahashfaqvirk/powerbi-dashboards

A collection of Microsoft Power BI dashboards and reports designed to address business challenges and support data-driven decision-making.

dashboards data-analysis data-driven data-science microsoft powerbi reports visualization

Last synced: 10 Mar 2026

https://github.com/muhammed-fazal/student-success-and-early-intervention-analytics-system

Consolidates scattered student performance records into a unified data warehouse in SQL Server, and builds interactive Power BI dashboards that visualize academic trends, identify student performance, and implement predictive analytics.

analysis analytics dashboard data data-analysis data-engineering data-science data-visualization database etl etl-pipeline power-bi powerbi python sql sql-server

Last synced: 29 May 2026

https://github.com/wo0fle/sfrcp

The program used for a research study I conducted: "Comparison of Star Formation Rate in Spiral versus Elliptical Galaxies."

astronomy astropy data-analysis galaxy jupyter-notebook python research research-project

Last synced: 03 Apr 2025

https://github.com/ymorsi7/caliwageanalysis

California employment and wage analysis on data from the past decade.

data-analysis data-science ipynb jupyter-notebook

Last synced: 21 Jan 2026

https://github.com/deller23/hotel_booking_data_cleaning

Efficiently transforming raw hotel booking data into actionable insights! This project leverages Python and Pandas for advanced data cleaning (handling missing values, detecting outliers, and optimizing features), ensuring a high-quality dataset ready for analysis and modeling.

data-analysis data-cleaning data-preprocessing data-visualization data-wrangling pandas python

Last synced: 31 Mar 2025

https://github.com/dmdlgg/spotify-analysis

An interactive data analysis app built with Python, Pandas, Plotly, and Streamlit, showcasing insights about the top 1000 most played songs on Spotify. Dataset sourced from Kaggle. Users can explore the frequency, popularity, and most played songs by artist in a clean and intuitive interface.

data-analysis data-visualization pandas plotly python streamlit

Last synced: 11 May 2026

https://github.com/anudeepkaddala/bankds

This repository contains a Python-based solution for cleaning, matching, and formatting bank data. The primary goal is to match banks from two datasets based on their names and associate each bank with its respective asset size. The final output is a cleaned dataset with asset sizes in Indian-style currency format.

data-analysis data-science fuzzy-matching pandas python

Last synced: 12 Apr 2026

https://github.com/ejw-data/pandas-school

Analysis of school data with Pandas

data-analysis pandas python

Last synced: 08 May 2026

https://github.com/loginchik/mid_contracts

Analysis of public procurement contracts of the Russian Ministry of Foreign Affairs (MID).

data-analysis dataset pandas python

Last synced: 17 Apr 2025

https://github.com/nikhil-donthusaram/heartdiseaseprediction

Heart Disease Prediction App is a machine learning web application that predicts the likelihood of heart disease based on user medical inputs. Built using a Decision Tree Classifier and deployed with Streamlit for an interactive, user-friendly interface.

data-analysis descision-tree joblib jupyter-notebook machine-learning matplotlib numpy pandas python3 seaborn sklearn streamlit vscode

Last synced: 11 Apr 2026

https://github.com/kernelshreyak/kaggle-notebooks

Collection of my Kaggle notebooks for data analysis and machine learning on a variety of datasets

data-analysis data-science data-visualization kaggle kaggle-competition machine-learning

Last synced: 27 Apr 2026

https://github.com/manisharora96/instagram-reach-analysis

This project provides a detailed approach to analyzing Instagram reach and engagement metrics. By leveraging the code and tools shared here, you can gain valuable insights into your Instagram content's performance and optimize your strategy to grow your audience effectively.

data-analysis data-visualization instagram-reach python-tools

Last synced: 23 Mar 2025

https://github.com/nafiealhilaly/first-dash-app

A simple Plotly Dash app to explore and analyze a fictional student assessment dataset.

data-analysis data-analytics data-visualization eda plotly-dash python

Last synced: 02 Apr 2025

https://github.com/fatihilhan42/eda-spacex-launches-falcon9-and-falcon-heavy

In this project, we analyze the flight data of the Falcon 9 rocket from the space company SpaceX.

data-analysis data-science data-visualization eda elonmusk spacex

Last synced: 23 Mar 2025

https://github.com/fatihilhan42/turkey_earthquake_analysis_1915-2021_python

In this project, earthquakes in Turkey from 1915 to 2021 were analyzed. The data, taken from the dataset included in the repo, was first cleaned and organized. The cleaned data was then rendered as charts and animations using data visualization techniques.

data-analysis data-cleaning data-visualization jupyter-notebook

Last synced: 23 Mar 2025

https://github.com/analysisbyvivek/Crime-data

Analyzes crime patterns across different areas, exploring factors such as crime type, weapon usage, demographic influences, and geographic distribution to uncover trends in frequency, correlations, and hotspots.

apache-superset data-analysis eda jupyter-notebook python

Last synced: 29 Jan 2026

https://github.com/mchenryspagg/investigate_a_dataset

This data analysis project demonstrates the use of Python data analysis libraries such as pandas, NumPy, and Matplotlib's pyplot to investigate a dataset and answer specific questions about it, exercising skills in data cleaning, data wrangling, and exploratory data analysis.

data-analysis datetime descriptive-analysis descriptive-statistics exploratory-data-analysis numpy pandas pyplot python visualization

Last synced: 04 May 2026

https://github.com/fatihilhan42/book-recommendation-system-with-python

In this project, we are making a book recommendation system that recommends similar books according to the genres or ratings that the user enters, using a large book dataset. The link of the dataset is given below. Happy reading...

books data-analysis data-science data-visualization kaggle python recommendation-engine recommendation-system

Last synced: 04 May 2026

https://github.com/sagarprajapat2004/data-analysis-visualization

Downloaded and analyzed a dataset from Kaggle using NumPy and Pandas, created visualizations with Matplotlib and Seaborn, and developed a Flask web application to showcase data insights and conclusions.

data-analysis data-modeling data-visualization exploratory-data-analysis flask python statical-analysis

Last synced: 04 May 2026

https://github.com/halyusa16/e-commerce-analysis

This project analyzes a public e-commerce dataset to uncover valuable insights and answer critical business questions. The dataset contains customer, product, order, and transaction details, providing a comprehensive view of the e-commerce platform's operations.

data-analysis data-cleaning data-exploration data-visualization self-project

Last synced: 09 Jun 2026

https://github.com/rtgrt5645/numpy-lab

🧮 Explore, manipulate, and visualize data with NumPy to enhance your Python skills in scientific computing and data analysis.

array-operations data-analysis data-science jupyter-notebook machine-learning numerical-computing numpy numpy-arrays numpy-library numpy-python python python3 scientific-computing

Last synced: 04 May 2026

https://github.com/jatin-mehra119/flight-price-prediction

This study aims to analyze flight booking data from the "Ease My Trip" website, using statistical tests and linear regression to extract insights that can benefit passengers using the platform.

data-analysis datacleaning datavisualization machine-learning preprocessing-data python sklearn-pipeline sklearn-regression-algorithm streamlit-webapp

Last synced: 04 May 2026

https://github.com/ljadhav25/logistic-regression-data-science-

Logistic regression estimates the probability of an event occurring, such as voted or didn't vote, based on a given data set of independent variables.

data-analysis data-science data-visualization logestic-regression machine-learning

Last synced: 04 May 2026

https://github.com/mugilan1309/csv_analyzer

📊 A simple Streamlit-based CSV Analysis & Preprocessing Tool for quick data insights.

csv-processing data-analysis data-visualization machine-learning python streamlit

Last synced: 04 May 2026

https://github.com/youssefyaser/scrape-the-imdb-site-for-the-top-250-movies

Web scraping the top 250 movies from the IMDb site.

data-analysis numpy pandas python

Last synced: 04 May 2026

https://github.com/jendives2000/regressions

A linear regression analysis to determine the strength of the relationship between the number of reviews and sales for a retail company.

data-analysis linear-regression pearson-correlation-coefficient regression

Last synced: 04 May 2026

https://github.com/drod75/nyc-arrests-analysis

This is a simple Data Science Project made to analyze and display data and trends found within the NYC Arrests Year to Date Dataset.

data-analysis data-visualization folium jupyter-notebook matplotlib-pyplot nyc-opendata nypd python scikit-learn seaborn

Last synced: 04 May 2026

https://github.com/flytomarsz/bike-sharing-system-analysis

This analysis project aims to identify bike rental behavior in 2012 using data from the Capital Bikeshare system, Washington, D.C., USA. This project is part of my Data Analysis study at Dicoding.

data-analysis data-visualization jupyter-notebook python streamlit

Last synced: 04 May 2026

https://github.com/tasosfotiadis/time-series-analysis-and-forecasting-of-cryptocurrency-prices

Forecasted Cardano (ADA) cryptocurrency prices using time series analysis. The project involved data preprocessing, trend and seasonality analysis, and model building with ARIMA, SARIMA, and LSTM. Models were evaluated using metrics like MAE and MAPE, providing insights for financial decision-making.

applied-st classical-statistical-models data-analysis deep-learning lstm machine-learning neural-network python r time-series

Last synced: 05 May 2026

https://github.com/rtlich/sap-sustainable-management

Project for the ERP & BI course at Esprit School of Engineering. It optimizes resource and operations management in an agri-food company using SAP MM & PM, focusing on sustainability, CO₂ reduction, and predictive maintenance.

angular business-intelligence data-analysis flask machine-learning ocr powerbi python sql-server talend

Last synced: 05 May 2026

https://github.com/badranalyst/residential-unit-prices-data-analysis-application

Python-based analysis of residential unit prices, focusing on data cleaning, visualization, and exploratory data analysis (EDA). Key features include price distribution, and correlation analysis between factors like size, location, and pricing.

data-analysis data-visualization dataset matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/jacktheprogrammer/time-series-forecasting-and-analysis

A personal project consisting of notebooks I created for time series forecasting and analysis. These projects use deep learning with TensorFlow, along with the XGBoost, statsmodels, and SciPy libraries for Python. The series cover weather, energy consumption, and stocks.

data-analysis data-science deep-neural-networks energy-consumption machine-learning portfolio prophet-facebook prophet-model python python3 scipy statsmodels stocks tensorflow time-series time-series-analysis timeseries-forecasting weather xgboost

Last synced: 05 May 2026

https://github.com/codewithmayank-py/box-office-analysis-with-seaborn-and-python

This repository contains Python code and datasets for analyzing box office data. Explore trends, patterns, and factors influencing movie performance.

analysis box-office-data-analysis data-analysis data-visualization dataset jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 05 May 2026

https://github.com/pcanadas/weather_scraper

This project automates the collection and processing of historical and forecast weather data. It uses Selenium to extract information from weather websites, processes the data with Pandas, and stores it in clean CSV files. It is well suited to climate analysis, data visualization, or integration into other systems.

beautifulsoup data-analysis pandas python selenium

Last synced: 05 May 2026

https://github.com/monish-nallagondalla/universal-bank

Credit Card Ownership Prediction: a machine learning project that predicts credit card ownership using features like age and income, balancing class distributions for improved accuracy.

classification-models credit-card-prediction data-analysis data-classification decision-tree-classifier imbalanced-datasets machine-learning model-evaluation python scikit-learn

Last synced: 05 May 2026

https://github.com/akash-47-tank/personalized-e-commerce-review-summarizer

Personalized E-commerce Product Review Summarizer: A Streamlit app that summarizes product reviews (e.g., from a CSV) using T5-small and tailors summaries to user preferences (price, durability, etc.) with NLP and lightweight ML.

data-analysis e-commerce machine-learning nlp personalization portfolio python scikit-learn sentiment-analysis streamlit t5 transformers web-app

Last synced: 05 May 2026

https://github.com/anushkundu/crime-pattern-analysis

Analyzing Crime Patterns in Montgomery County, USA: An Inclusive Study Based on NIBRS Data (2016-2022)

data-analysis data-visualization descriptive-statistics matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/kammarah/data-sample

I designed a database website 🌐 that can be uploaded easily for use 📤. You can check my website 👀.

data-analysis data-visualization database deploy deployment library-management-system panaversity streamlit webapp

Last synced: 05 May 2026

https://github.com/nkamilla/titanic-eda

Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.

data-analysis eda jupyter-notebook matplotlib numpy pandas python titanic-dataset

Last synced: 05 May 2026

https://github.com/caesaredia/ymusic-project

Exploratory data analysis (EDA) of music streaming behavior in two fictional cities using Python, Pandas, and Jupyter Notebook. It explores user behavior, genre preferences, and listening patterns throughout the week.

data-analysis eda pandas python

Last synced: 05 May 2026

https://github.com/hms75/movie_rating_analysis

A movie rating analysis which identifies trends amongst a dataset of 5000 movies.

data-analysis data-visualization matplotlib-pyplot numpy pandas python

Last synced: 05 May 2026

https://github.com/benjaminrose/data-analysis-book

A Jupyter Book for my Spring 2025 PHY 5381 class on Data Analysis

book data-analysis data-science data-visualization jupyter-book open-book python r statistics-course

Last synced: 06 May 2026

https://github.com/ryuzen6/bangalore-real-estate-price-prediction

This is a data science project that predicts the cost of real estate in Bangalore. Requirements: Jupyter Notebook (for data cleaning and creating the linear regression model using various Python libraries), PyCharm (Python IDE for creating the Python Flask server), and Visual Studio Code (to create the UI with HTML, CSS, and JavaScript).

css3 data-analysis data-science html5 javascript jupyter-notebook machine-learning python3

Last synced: 06 May 2026

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 Analyzing NYC school test scores with Python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/erick957/saleprice-prediction-dataset-analysis-and-cleaning-advance-regression

🏠 Predict house prices using advanced regression techniques with this comprehensive analysis and cleaning project, from data loading to model deployment.

data-analysis data-science eda google-colab machine-learning numpy pandas python scikit-learn scikit-learn-python

Last synced: 06 May 2026

https://github.com/ankitwalimbe/sentiment-analysis

Sentiment analysis of Amazon Fashion reviews using VADER and a baseline ML model (TF-IDF + SGDClassifier). Includes visualizations, reproducible notebook, and recruiter-ready documentation.

data-analysis machine-learning matplotlib nlp pandas python seaborn sentiment-analysis sklearn

Last synced: 06 May 2026

https://github.com/mikma03/datascience_python_datacamp

DataScience with Python. Code and examples. Python libraries, including pandas, NumPy, Matplotlib, and many more.

data-analysis data-science datacamp datascience numpy pandas python

Last synced: 06 May 2026

https://github.com/harryrlk/data_analysis_showcase

This repository showcases my data analysis and visualization projects using Excel, Python, R, and Tableau. Some projects are under NDA, so key figures and specific numbers are not included, but brief overviews and methodologies are provided. Feel free to explore and contact me for further details.

data-analysis data-science data-visualization excel portfolio python r tableau

Last synced: 06 May 2026

https://github.com/kishorep26/school-recommendation-system

Intelligent school recommendation system that matches students with suitable educational institutions based on preferences and performance metrics

bootstrap data-analysis decision-support edtech education education-technology flask matching-algorithm python recommendation-system school-finder school-search student-portal web-application

Last synced: 06 May 2026

https://github.com/josepablodmg/python--linear-regression-advertising

A linear regression analysis to predict sales based on advertising spending across TV, radio, and newspaper channels. The project includes exploratory data analysis, model training, coefficient visualization, and residual analysis.

advertising data-analysis exploratory-data-analysis linear-regression machine-learning python regression scikit-learn visualization

Last synced: 06 May 2026

https://github.com/jbn/vaquero

A Python library for iterative and interactive data wrangling at laptop-scale.

data data-analysis data-cleaning data-mining dirty-data elt etl etl-framework

Last synced: 10 Jun 2026

https://github.com/vimlesh-gupta/blinkit_data_analytics_project

End-to-end Blinkit data analytics project using Python, SQL Server & Power BI

blinkit data-analysis eda pandas powerbi python sql-server

Last synced: 06 May 2026

https://github.com/korniichuk/pydatan-homework

Python Data Analysis course homework

course data-analysis data-analysis-python python python3

Last synced: 06 May 2026

https://github.com/fbarffmann/home_sales

Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.

aws big-data data-analysis home-sales parquet pyspark python spark spark-sql sql

Last synced: 06 May 2026