An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/sajjad425/edaipl

The dataset covers the Indian Premier League (IPL) with details on matches (date, teams, venue, results), player stats (runs, wickets), team stats (wins, losses), season summaries, and umpire info. The EDA reveals patterns and insights, highlighting dominant teams, star players, and trends across seasons.

data-analysis eda exploratory-data-analysis ipl python

Last synced: 05 May 2026

https://github.com/cicku/en.650.672

HW of EN.650.672

analytics data-analysis numpy pandas

Last synced: 05 May 2026

https://github.com/aryar-06/linear-regression

A Python project demonstrating basic linear regression with gradient descent and matrix operations, alongside scikit-learn comparison.

data-analysis data-preprocessing educational-project gradient-descent linear-regression machine-learning python regression-algorithms scikit-learn

Last synced: 05 May 2026

https://github.com/nkamilla/titanic-eda

Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.

data-analysis eda jupyter-notebook matplotlib numpy pandas python titanic-dataset

Last synced: 05 May 2026

https://github.com/caesaredia/ymusic-project

Exploratory data analysis (EDA) of music streaming behavior in two fictional cities using Python, Pandas, and Jupyter Notebook. It explores user behavior, genre preferences, and listening patterns throughout the week.

data-analysis eda pandas python

Last synced: 05 May 2026

https://github.com/donmaruko/python-eda-toolkit

CLI-runned EDA with 30 commands utilizing text-related functions, statistical calculations, data visualization, and data manipulation.

data data-analysis data-science data-visualization matplotlib pandas scipy seaborn statistical-analysis statistics wordcloud

Last synced: 06 May 2026

https://github.com/ryuzen6/bangalore-real-estate-price-prediction

This is a Data Science Project which predicts the cost of Real Estate in Bangalore. Requirements: Jupyter Notebook (for Data Cleaning and creating the Linear Regression using various python libraries) , Pycharm (python IDE for creating Python Flask Server), Visual Studio Code (to create the UI with HTML, CSS and Javascript).

css3 data-analysis data-science html5 javascript jupyter-notebook machine-learning python3

Last synced: 06 May 2026

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 analyzing NYC school test scores with python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/ankitwalimbe/sentiment-analysis

Sentiment analysis of Amazon Fashion reviews using VADER and a baseline ML model (TF-IDF + SGDClassifier). Includes visualizations, reproducible notebook, and recruiter-ready documentation.

data-analysis machine-learning matplotlib nlp pandas python seaborn sentiment-analysis sklearn

Last synced: 06 May 2026

https://github.com/ksm26/ml-ai-data-science-jobs-in-canada

Explore the latest machine learning, artificial intelligence, and data science job opportunities in Canada. Stay informed about Canadian tech job market trends and find your next career move.

ai-canada ai-careers canada canadian-tech-companies canadian-tech-job-market data data-analysis data-engineering data-science data-science-careers machine-learning prompt-engineering robotics

Last synced: 06 May 2026

https://github.com/devesh8423/machine_learning

Machine Learning practice projects, Jupyter notebooks, and datasets for learning regression, classification, and data analysis.

classification data-analysis data-science data-visualization jupyter-notebook machine-learning matplotlib ml-project numpy-library pandas python regression sckit-learn seaborn

Last synced: 03 May 2026

https://github.com/donmaruko/flask-data-analysis

Flask API for statistical calculations. Data analysis, cleansing, visualization, and manipulation. Documented by Swagger.

api api-rest data-analysis data-science data-visualization datascience flasgger matplotlib pandas seaborn sqlite wordcloud

Last synced: 03 May 2026

https://github.com/samruddhi3012/screen-time-analysis

Hi! This repo demonstrates a python project on Screen Time Analysis.

data-analysis data-visualization python

Last synced: 04 May 2026

https://github.com/mindlessmuse666/titanic-data-visualization

Проект по визуализации данных о пассажирах Титаника с использованием библиотек Python Matplotlib, Seaborn и Plotly.

data-analysis data-visualization matplotlib pandas plotly python seaborn titanic

Last synced: 04 May 2026

https://github.com/fatihilhan42/the-office-eda

Data analysis study of my favorite sitcom, The Office (US).

data-analysis data-science data-visualization fatihilhan office python sitcom

Last synced: 04 May 2026

https://github.com/sanchittechnogeek/rental-data-visualization_python

Statistics and visualization of rental data with python

data-analysis data-science data-visualization statistics

Last synced: 04 May 2026

https://github.com/soham7998/data-analysis-projects

My Data Analysis Projects which are completed by me and gain a hands on Experience from each project. the project showcase different Concepts , Visualization and many things.

data data-analysis data-science machine-learning nlp python soham visualization

Last synced: 04 May 2026

https://github.com/damisparks/become_data_analyst

Are you new to Data Analysis ? Here you will find simple notebook that will help through your journey. These are personal projects I work on and still working.

data data-analysis data-visualization matplotlib numpy pandas-tutorial

Last synced: 04 May 2026

https://github.com/mr-chang95/sf_data_visualization

In this personal project, I am interested in examining all of the active businesses in the San Francisco Bay Area while performing some simple data visualizations, mainly on categorical variables.

business data-analysis data-visualization jupyter-notebook pandas python san-francisco

Last synced: 04 May 2026

https://github.com/sweta-kaundilya/python_for_data_analysis

Learning Python and all the relevant libraries in python for Data field.

cufflinks data-analysis data-science matplotlib numpy pandas plotly python seaborn

Last synced: 04 May 2026

https://github.com/fatihilhan42/book-recommendation-system-with-python

In this project, we are making a book recommendation system that recommends similar books according to the genres or ratings that the user enters, using a large book dataset. The link of the dataset is given below. Happy reading...

books data-analysis data-science data-visualization kaggle python recommendation-engine recommendation-system

Last synced: 04 May 2026

https://github.com/hyperplasma/olympic-visualization-analysis

Multidimensional analysis and visualization of Olympic medals, economy, and happiness index.

data-analysis data-visualization matplotlib numpy pandas python wordcloud

Last synced: 04 May 2026

https://github.com/rtgrt5645/numpy-lab

🧮 Explore, manipulate, and visualize data with NumPy to enhance your Python skills in scientific computing and data analysis.

array-operations data-analysis data-science jupyter-notebook machine-learning numerical-computing numpy numpy-arrays numpy-library numpy-python python python3 scientific-computing

Last synced: 04 May 2026

https://github.com/jatin-mehra119/flight-price-prediction

This study aims to analyze flight booking data from "Ease My Trip" website, using statistical tests and linear regression to extract insights. By understanding this data, valuable information can be gained to benefit passengers using the platform.

data-analysis datacleaning datavisualization machine-learning preprocessing-data python sklearn-pipeline sklearn-regression-algorithm streamlit-webapp

Last synced: 04 May 2026

https://github.com/mugilan1309/csv_analyzer

📊 A simple Streamlit-based CSV Analysis & Preprocessing Tool for quick data insights.

csv-processing data-analysis data-visualization machine-learning python streamlit

Last synced: 04 May 2026

https://github.com/saitoxu/data-analysis-workspace

Docker image for data analysis

data-analysis docker python

Last synced: 04 May 2026

https://github.com/youssefyaser/scrape-the-imdb-site-for-the-top-250-movies

Web scraping the top 250 movies in IMDB site.

data-analysis numpy pandas python

Last synced: 04 May 2026

https://github.com/drod75/nyc-arrests-analysis

This is a simple Data Science Project made to analyze and display data and trends found within the NYC Arrests Year to Date Dataset.

data-analysis data-visualization folium jupyter-notebook matplotlib-pyplot nyc-opendata nypd python scikit-learn seaborn

Last synced: 04 May 2026

https://github.com/tasosfotiadis/time-series-analysis-and-forecasting-of-cryptocurrency-prices

Forecasted Cardano (ADA) cryptocurrency prices using time series analysis. The project involved data preprocessing, trend and seasonality analysis, and model building with ARIMA, SARIMA, and LSTM. Models were evaluated using metrics like MAE and MAPE, providing insights for financial decision-making.

applied-st classical-statistical-models data-analysis deep-learning lstm machine-learning neural-network python r time-series

Last synced: 05 May 2026

https://github.com/matt-ags/jornada-python

Repositório com os projetos realizados durante a semana "Jornada Python" - 01/2025

artificial-intelligence automation data-analysis jupyter-notebook machine-learning python

Last synced: 05 May 2026

https://github.com/rtlich/sap-sustainable-management

Project for the ERP & BI course at Esprit School of Engineering. It optimizes resource and operations management in an agri-food company using SAP MM & PM, focusing on sustainability, CO₂ reduction, and predictive maintenance.

angular business-intelligence data-analysis flask machine-learning ocr powerbi python sql-server talend

Last synced: 05 May 2026

https://github.com/demon-2-angel/product-customer-acquisition-analysis-using-behaviour

The database encompasses eight tables with varied attributes and rows. Key analyses include product restocking needs, top VIP customers' contributions, and an average customer profit of $39,039.59. Recommendations emphasize strategic marketing to new customers and incentives for existing VIP clients based on acquisition costs and profit insights.

customer-products customer-segmentation data-analysis database sqlite

Last synced: 05 May 2026

https://github.com/badranalyst/residential-unit-prices-data-analysis-application

Python-based analysis of residential unit prices, focusing on data cleaning, visualization, and exploratory data analysis (EDA). Key features include price distribution, and correlation analysis between factors like size, location, and pricing.

data-analysis data-visualization dataset matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/13anush/python-libraries-

A collection of essential Python libraries—NumPy, Pandas, Matplotlib, and Seaborn—perfect for anyone starting out in data analysis.

data-analysis matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/zafir100100/cancer-stage-prediction

This code predicts cancer data using various regression models, calculates their average R-squared scores, and prints the best model.

cross-validation data-analysis data-preprocessing decision-trees gradient-boosting linear-regression machine-learning-algorithms numpy pandas random-forest regression scikit-learn

Last synced: 05 May 2026

https://github.com/codewithmayank-py/box-office-analysis-with-seaborn-and-python

This repository contains Python code and datasets for analyzing box office data. Explore trends, patterns, and factors influencing movie performance.

analysis box-office-data-analysis data-analysis data-visualization dataset jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 05 May 2026

https://github.com/pcanadas/weather_scraper

Este proyecto automatiza la recopilación y el procesamiento de datos meteorológicos históricos y previsionales. Utiliza Selenium para extraer información de sitios web de clima, procesa los datos con Pandas y los almacena en archivos CSV limpios. Es ideal para análisis climáticos, visualización de datos o integración en otros sistemas.

beautifulsoup data-analysis pandas python selenium

Last synced: 05 May 2026

https://github.com/mohitsai/boston-housing-data-analysis

Data Analysis Project for the City of Boston Government for insights into effect of property rennovations and remodelling on housing availability in the city

data-analysis data-science matplotlib numpy pandas python

Last synced: 05 May 2026

https://github.com/akash-47-tank/personalized-e-commerce-review-summarizer

Personalized E-commerce Product Review Summarizer: A Streamlit app that summarizes product reviews (e.g., from a CSV) using T5-small and tailors summaries to user preferences (price, durability, etc.) with NLP and lightweight ML.

data-analysis e-commerce machine-learning nlp personalization portfolio python scikit-learn sentiment-analysis streamlit t5 transformers web-app

Last synced: 05 May 2026

https://github.com/anushkundu/crime-pattern-analysis

Analyzing Crime Patterns in Montgomery County, USA: An Inclusive Study Based on NIBRS Data (2016-2022)

data-analysis data-visualization descriptive-statistics matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/kammarah/data-sample

I designed a database website 🌐 that can be uploaded easily for use 📤. You can check my website 👀.

data-analysis data-visualization database deploy deployment library-management-system panaversity streamlit webapp

Last synced: 05 May 2026

https://github.com/nimbostratos/titanic-survival-prediction

Machine learning project predicting Titanic survival using AdaBoost with feature engineering and hyperparameter optimization

data-analysis data-science data-science-projects kaggle machine-learning machine-learning-models python scikit-learn

Last synced: 05 May 2026

https://github.com/meinhere/dicoding-analisis-data

Submission Analisis Data dengan tema E-Commerce Streamlit App

data-analysis data-mining e-commerce python streamlit

Last synced: 05 May 2026

https://github.com/panoschatzi/healthcare_and_bioinformatics_analyses

Healthcare and Bioinformatics data analysis projects with Python and SQL.

data-analysis data-cleaning data-visualisation jupyter matplotlib mysql pandas plotly python seaborn sql

Last synced: 06 May 2026

https://github.com/ibrahimceyisakar/hotel-finder

Hotel finder system with Python includes data gathering, analyzing, and visualization.

data-analysis data-gathering data-visualization pandas plotly python selenium streamlit

Last synced: 06 May 2026

https://github.com/yashpaneliya/bank-loan-default-analysis

Analyze and understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default.

data-analysis loan-default-analysis matplotlib numpy pandas python

Last synced: 06 May 2026

https://github.com/erick957/saleprice-prediction-dataset-analysis-and-cleaning-advance-regression

🏠 Predict house prices using advanced regression techniques with this comprehensive analysis and cleaning project, from data loading to model deployment.

data-analysis data-science eda google-colab machine-learning numpy pandas python scikit-learn scikit-learn-python

Last synced: 06 May 2026

https://github.com/abhinav330/customer-behavior-analysis-linear-regression

This repository explores customer behavior data for an NYC clothing company with both a mobile app and website. They want to understand which platform drives higher sales.

data-analysis data-science data-visualization eda exploratory-data-analysis jupyter jupyter-notebook linear-regression machine-learning machine-learning-algorithms machinelearning-python numpy pandas python regression-analysis

Last synced: 06 May 2026

https://github.com/mikma03/datascience_python_datacamp

DataScience with Python. Code and examples. Python libraries, including pandas, NumPy, Matplotlib, and many more.

data-analysis data-science datacamp datascience numpy pandas python

Last synced: 06 May 2026

https://github.com/harryrlk/data_analysis_showcase

This repository showcases my data analysis and visualization projects using Excel, Python, R, and Tableau. Some projects are under NDA, so key figures and specific numbers are not included, but brief overviews and methodologies are provided. Feel free to explore and contact me for further details.

data-analysis data-science data-visualization excel portfolio python r tableau

Last synced: 06 May 2026

https://github.com/kishorep26/school-recommendation-system

Intelligent school recommendation system that matches students with suitable educational institutions based on preferences and performance metrics

bootstrap data-analysis decision-support edtech education education-technology flask matching-algorithm python recommendation-system school-finder school-search student-portal web-application

Last synced: 06 May 2026

https://github.com/jbn/vaquero

A Python library for iterative and interactive data wrangling at laptop-scale.

data data-analysis data-cleaning data-mining dirty-data elt etl etl-framework

Last synced: 10 Jun 2026

https://github.com/edanur-y/variable-analysis-of-banks-ratio-data

Testing variables for multicollinearity, multivariate normality and analyzing outliers and missing values. ⭕SPSS 🔵R

data-analysis log-transformation missing-values-analysis multicollinearity normality-test r spss

Last synced: 10 Jun 2026

https://github.com/korniichuk/pydatan-homework

Python Data Analysis course homework

course data-analysis data-analysis-python python python3

Last synced: 06 May 2026

https://github.com/fbarffmann/home_sales

Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.

aws big-data data-analysis home-sales parquet pyspark python spark spark-sql sql

Last synced: 06 May 2026

https://github.com/jpgiant/gujaratrainfallanalysis_2021

Analysis about the rainfall that occurred in the districts of Gujarat state in 2021

data-analysis exploratory-data-analysis exploratory-data-visualizations matplotlib numpy pandas-python python

Last synced: 07 May 2026

https://github.com/pedrosfaria2/fugascomhelicoptero

Meu primeiro uso do Jupyter Notebook em um projeto

analise-de-dados data-analysis jupyter-notebook matplotlib pandas python

Last synced: 07 May 2026

https://github.com/lucalullo/global-emissions-and-temperature-1950-2024

Global climate analysis covering 75 years of CO₂, greenhouse gas emissions and mean surface temperatures across countries (1950–2024). Built with Pandas, Matplotlib, Seaborn and Plotly.

climate-change co2-emissions data-analysis greenhouse-gas temperature

Last synced: 10 Jun 2026

https://github.com/satyam4229/identify-employee-attrition

This is the model where we predict the attrition of the employees of the company by checking there records and all. In the given dataset, we have the features like salary, environment, age, gender and their experience.

data-analysis data-science data-visualization jupyter-notebook kaggle python

Last synced: 08 May 2026

https://github.com/riborings/python_projects

Python projects and other programming experiences

data-analysis machine-learning project python regression-analysis

Last synced: 08 May 2026

https://github.com/blladerunner/customer-churn-dashboard

Customer Churn Dashboard — SQL + Python analytics project exploring customer retention patterns, churn rate by demographics and services, and key insights for telecom business strategy.

business-intelligence churn-analysis customer-retention dashboard data-analysis data-analytics data-science pandas powerbi python sql sqlite telecom

Last synced: 08 May 2026

https://github.com/devexpress-examples/wpf-pivot-grid-group-date-time-values

This example shows how to group date-time values in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 08 May 2026

https://github.com/danmadeira/algoritmos-estatistica-python

Demonstração de Algoritmos de Estatística em Python

algorithms data-analysis data-science python statistics

Last synced: 08 May 2026

https://github.com/0290192029/apartment-price-predictor

Python-проект по прогнозированию стоимости аренды квартир с помощью линейной регрессии. Практическая работа по теме: "Основы машинного обучения" дисциплины "МДК 13.01: Основы применения методов искусственного интеллекта в программировании".

apartment-price-prediction apartments-for-rent api correios-api data-analysis feature-engineering feature-enginering linear-regression linear-regression-models mlops numpy prediction-model r seaborn

Last synced: 08 May 2026

https://github.com/shelton-beep/trading-algorithm

A simple trading algorithm for SPY ETF using a moving average crossover strategy. This project analyzes SPY weekly price data, implements a buy/sell algorithm, and tracks performance metrics to evaluate profitability and risk. Ideal for learning algorithmic trading basics and financial data analysis.

data-analysis financial-analysis investment-strategy jupyter-notebook pandas python quantitative-finance technical-analysis time-series-analysis trading-strategies

Last synced: 08 May 2026

https://github.com/sumit-sinha9/sales-analysis

Analyzing 12 months worth fo Sales data

data-analysis pandas python visualization

Last synced: 08 May 2026

https://github.com/vanshuchaudhary/flightpriceanalysis-

The uploaded file is a Jupyter Notebook titled "Flight Analysis". It likely involves analyzing flight-related data, potentially exploring trends, patterns, or insights using data science techniques. The analysis might include data visualization, statistical analysis, or predictive modeling.

business-analytics data data-analysis data-visualization datainsights datascience matplotlib-pyplot python seaborn seaborn-plots seaborn-python sns statistical-analysis

Last synced: 08 May 2026

https://github.com/mrunmayee3108/financial-chatbot

A Python chatbot for analyzing financial data of companies with revenue, income, assets, cash flow, and debt ratio queries

chatbot data-analysis jupyter-notebook pandas python python3

Last synced: 09 May 2026

https://github.com/drod75/burger_king_analysis

A simple analysis on a burger king dataset.

data-analysis data-visualization jupyter-notebook pandas python seaborn

Last synced: 09 May 2026

https://github.com/emanoelcampos/python-onemonth

This repository contains educational materials and projects developed during a Python course offered by OneMonth. It covers Python basics, intermediate concepts, web development with Flask, and data analysis with pandas. The course is structured into weeks, each focusing on a different aspect of Python programming and its applications.

data-analysis flask jupyter-notebook onemonth python python3

Last synced: 09 May 2026

https://github.com/zxjahid/matplotlib

A comprehensive guide to mastering data visualization with Matplotlib through hands-on examples and advanced techniques. 🚀📊

candlestick candlestick-chart cheatsheet data-analysis data-visualization gtk jupyter-notebook maps matplotlib-python pandas thesis-template tk tutorial wx

Last synced: 09 May 2026

https://github.com/master-helix/ibm-data-analyst-certification-stock-analysis-project

This is a mini project repository of my IBM Certification involving stock analysis and plotting of Tesla and GameStop

analytics data data-analysis data-visualization ibm matplotlib pandas python web-scraping

Last synced: 09 May 2026