An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/badranalyst/residential-unit-prices-data-analysis-application

Python-based analysis of residential unit prices, focusing on data cleaning, visualization, and exploratory data analysis (EDA). Key features include price distribution, and correlation analysis between factors like size, location, and pricing.

data-analysis data-visualization dataset matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/jacktheprogrammer/time-series-forecasting-and-analysis

My personal project consisting of my personally created notebooks to work with time series forecasting and analysis. In these projects, I've used deep learning using tensorflow, xgboost, statsmodels and scipy libraries of python. The series were of weather, energy consumption and that of stocks.

data-analysis data-science deep-neural-networks energy-consumption machine-learning portfolio prophet-facebook prophet-model python python3 scipy statsmodels stocks tensorflow time-series time-series-analysis timeseries-forecasting weather xgboost

Last synced: 05 May 2026

https://github.com/sajjad425/edaipl

The dataset covers the Indian Premier League (IPL) with details on matches (date, teams, venue, results), player stats (runs, wickets), team stats (wins, losses), season summaries, and umpire info. The EDA reveals patterns and insights, highlighting dominant teams, star players, and trends across seasons.

data-analysis eda exploratory-data-analysis ipl python

Last synced: 05 May 2026

https://github.com/codewithmayank-py/box-office-analysis-with-seaborn-and-python

This repository contains Python code and datasets for analyzing box office data. Explore trends, patterns, and factors influencing movie performance.

analysis box-office-data-analysis data-analysis data-visualization dataset jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 05 May 2026

https://github.com/monish-nallagondalla/universal-bank

Credit Card Ownership Prediction A machine learning project that predicts credit card ownership using features like age and income, balancing class distributions for improved accuracy.

classification-models credit-card-prediction data-analysis data-classification decision-tree-classifier imbalanced-datasets machine-learning model-evaluation python scikit-learn

Last synced: 05 May 2026

https://github.com/ayaatmohammed/amazon-sales-analysis-pyspark

In-depth analysis of the Olist E-commerce dataset from Kaggle using PySpark for customer segmentation (RFM) and market basket analysis.

big-data big-data-analytics customer-segmentation data-analysis data-science ecommerce jupyter-notebook kaggle pyspark python rfm-analysis

Last synced: 05 May 2026

https://github.com/aryar-06/linear-regression

A Python project demonstrating basic linear regression with gradient descent and matrix operations, alongside scikit-learn comparison.

data-analysis data-preprocessing educational-project gradient-descent linear-regression machine-learning python regression-algorithms scikit-learn

Last synced: 05 May 2026

https://github.com/nkamilla/titanic-eda

Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.

data-analysis eda jupyter-notebook matplotlib numpy pandas python titanic-dataset

Last synced: 05 May 2026

https://github.com/caesaredia/ymusic-project

Exploratory data analysis (EDA) of music streaming behavior in two fictional cities using Python, Pandas, and Jupyter Notebook. It explores user behavior, genre preferences, and listening patterns throughout the week.

data-analysis eda pandas python

Last synced: 05 May 2026

https://github.com/hms75/movie_rating_analysis

A movie rating analysis which identifies trends amongst a dataset of 5000 movies.

data-analysis data-visualization matplotlib-pyplot numpy pandas python

Last synced: 05 May 2026

https://github.com/donmaruko/python-eda-toolkit

CLI-runned EDA with 30 commands utilizing text-related functions, statistical calculations, data visualization, and data manipulation.

data data-analysis data-science data-visualization matplotlib pandas scipy seaborn statistical-analysis statistics wordcloud

Last synced: 06 May 2026

https://github.com/panoschatzi/healthcare_and_bioinformatics_analyses

Healthcare and Bioinformatics data analysis projects with Python and SQL.

data-analysis data-cleaning data-visualisation jupyter matplotlib mysql pandas plotly python seaborn sql

Last synced: 06 May 2026

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 analyzing NYC school test scores with python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/sankaran-s2001/us-traffic-accidents-analysis-python-eda

Exploratory data analysis of US traffic accidents from 2016-2023, analyzing patterns by time, location, weather, and severity using Python data science libraries.

data-analysis data-science data-visualization eda matplolib numpy pandas python

Last synced: 06 May 2026

https://github.com/ankitwalimbe/sentiment-analysis

Sentiment analysis of Amazon Fashion reviews using VADER and a baseline ML model (TF-IDF + SGDClassifier). Includes visualizations, reproducible notebook, and recruiter-ready documentation.

data-analysis machine-learning matplotlib nlp pandas python seaborn sentiment-analysis sklearn

Last synced: 06 May 2026

https://github.com/deanlogan/data-analysis-course

Code created when completing the Data Analysis with Python Course on freecodecamp.org

course data-analysis numpy pandas python python3

Last synced: 06 May 2026

https://github.com/kishorep26/school-recommendation-system

Intelligent school recommendation system that matches students with suitable educational institutions based on preferences and performance metrics

bootstrap data-analysis decision-support edtech education education-technology flask matching-algorithm python recommendation-system school-finder school-search student-portal web-application

Last synced: 06 May 2026

https://github.com/drill-n-bass/dealavo-project

Cartesian product from dictionary to list of dictionaries and faster methods for finding index than the `index` method.

data-analysis data-analysis-python matplotlib pandas python python3 random timeit

Last synced: 06 May 2026

https://github.com/jbn/vaquero

A Python library for iterative and interactive data wrangling at laptop-scale.

data data-analysis data-cleaning data-mining dirty-data elt etl etl-framework

Last synced: 10 Jun 2026

https://github.com/vimlesh-gupta/blinkit_data_analytics_project

End-to-end Blinkit data analytics project using Python, SQL Server & Power BI

blinkit data-analysis eda pandas powerbi python sql-server

Last synced: 06 May 2026

https://github.com/korniichuk/pydatan-homework

Python Data Analysis course homework

course data-analysis data-analysis-python python python3

Last synced: 06 May 2026

https://github.com/fbarffmann/home_sales

Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.

aws big-data data-analysis home-sales parquet pyspark python spark spark-sql sql

Last synced: 06 May 2026

https://github.com/rlalpha49/anisearch-model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2026

https://github.com/urbanekda/upwork_dashboard

A data analysis project examining trends and patterns in the data science job market on Upwork. This project analyzes job postings, requirements, and market demands to provide insights into the freelance data science ecosystem.

data-analysis data-science data-science-projects data-visualization freelance jupyter-notebook python streamlit

Last synced: 07 May 2026

https://github.com/karlyndiary/coffee-shop-sales-analysis

Comprehensive analysis of coffee shop sales utilizing Pandas for data cleaning and exploratory data analysis (EDA), complemented by Streamlit for creating interactive data visualization dashboards.

data-analysis data-cleaning data-preprocessing data-visualization eda pandas streamlit streamlit-dashboard

Last synced: 07 May 2026

https://github.com/badranalyst/exploratory-data-analysis-on-salaries-dataset

Performing EDA on a dataset related to salaries, exploring relationships between factors like job titles, industries, and locations. Insights are visualized with plots to identify trends and disparities in salary data.

data-analysis dataset eda exploratory-data-analysis pandas python

Last synced: 07 May 2026

https://github.com/rohansoni45/whatsapp-chat-analysis

This project involves analyzing WhatsApp chat data to extract valuable insights. Using Python and various libraries like Pandas and Matplotlib, the project processes and visualizes chat statistics such as message frequency, most active participants, and sentiment analysis.

chat-analysis data-analysis data-science matplotlib pandas python sentiment-analysis streamlit visualization web-app word-cloud

Last synced: 07 May 2026

https://github.com/warazkhan/airplane-crashes-and-fatalities-since-1908-

This project analyzes airplane crash data (1908 - 2008)✈️📊 to uncover trends in aviation accidents, fatalities, and safety improvements. Using exploratory data analysis (EDA) and data visualization, we examine key factors influencing crashes, identify high-risk regions, and explore advancements in aviation safety.

data-analysis data-visualization exploratory-data-analysis

Last synced: 10 Jun 2026

https://github.com/jjkay03/discord-call-extractor

Collect HTML data from Discord group/DM to create database of calls

data-analysis database discord discord-tool

Last synced: 07 May 2026

https://github.com/bryanhe24/data_analysis_app

A full-stack web application that allows users to upload CSV datasets, analyze the data with statistical summaries and visualizations, and interact with an AI-powered assistant for querying the dataset.

ai data data-analysis data-visualization fullstack-development javascript math python reactjs

Last synced: 07 May 2026

https://github.com/joseph-pabian/life-expectancy-

Statistical analysis of life expectancy in developed vs developing countries using SQL and Python

data-analysis duckdb public-health python sql statistics

Last synced: 07 May 2026

https://github.com/jpgiant/gujaratrainfallanalysis_2021

Analysis about the rainfall that occurred in the districts of Gujarat state in 2021

data-analysis exploratory-data-analysis exploratory-data-visualizations matplotlib numpy pandas-python python

Last synced: 07 May 2026

https://github.com/muthukumar0908/-singapore-resale-flat-prices-predicting

This project is to develop a machine learning model and deploy it as a user-friendly web application that predicts the resale prices of flats in Singapore.

data-analysis data-visualization mechine-learing plotly python streamlit

Last synced: 07 May 2026

https://github.com/pedrosfaria2/fugascomhelicoptero

Meu primeiro uso do Jupyter Notebook em um projeto

analise-de-dados data-analysis jupyter-notebook matplotlib pandas python

Last synced: 07 May 2026

https://github.com/biginformatics/git-basics

Hands-on Git and GitHub lessons for analysts and statisticians

data-analysis git github public-health training

Last synced: 10 Jun 2026

https://github.com/vyjayanthipolapragada/genai_smart_retail_recommendation

GenAI Smart Retail is a recommendation system designed for retail environments. It provides personalized product recommendations to users based on product descriptions using a content-based filtering approach. The system leverages FastAPI for backend integration, allowing users to interact with the recommendation engine via an API. This project aim

content-based-recommendation data-analysis data-science data-visualization fastapi gen-ai instacart-data jupyter-notebook open-ai python3 retail scikitlearn-machine-learning stream

Last synced: 07 May 2026

https://github.com/riborings/python_projects

Python projects and other programming experiences

data-analysis machine-learning project python regression-analysis

Last synced: 08 May 2026

https://github.com/bnvulpe/regression-and-time-series

This work centers on assessing and comparing predictive models for regression and time series prediction using specific datasets, with the goal of selecting the most effective methodology for unseen test data.

colab data-analysis data-analysis-python data-science data-visualization forecasting jupyter-notebook machine-learning model-evaluation predictive-modeling python regression sarima sarimax time-series-analysis time-series-analysis-and-forecasting

Last synced: 08 May 2026

https://github.com/aidan-zamfir/the-iliad

Data analysis & relationship network for the characters of Homers Iliad

data data-analysis dataframes networks networkx python selenium spacy webscraping

Last synced: 08 May 2026

https://github.com/otonomee/against-the-clock-transcript-analysis

This repository contains code and analysis for exploring the transcripts of the various "Against The Clock" videos featured on the FACT Magazine YouTube channel. The goal is to uncover insights, patterns, and trends across the different artists and their creative process under time constraints.

against-the-clock ai-analysis audio-processing creative-ai creative-process data-analysis fact-magazine machine-learning music-production natural-language-processing nlp text-mining yt-dlp

Last synced: 08 May 2026

https://github.com/fahamidur/cuisine-analysis

This project analyzes recipes from AllRecipes.com to reveal global cooking patterns, nutritional trends, and cultural food differences, offering data-driven insights for food enthusiasts and researchers.

beautifulsoup data-analysis datavisualization pandas selenium tableau-public webscraping

Last synced: 08 May 2026

https://github.com/devexpress-examples/wpf-pivot-grid-group-date-time-values

This example shows how to group date-time values in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 08 May 2026

https://github.com/danmadeira/algoritmos-estatistica-python

Demonstração de Algoritmos de Estatística em Python

algorithms data-analysis data-science python statistics

Last synced: 08 May 2026

https://github.com/samjoesilvano/password_strength_prediction_using_nlp

Developed a predictive model to categorize passwords as Strong, Good, or Weak, enhancing security and reducing breach risks. The project involves cleaning and analyzing data from an SQL database, using the TF-IDF technique for transformation, and implementing a Logistic Regression model to achieve accurate classifications.

data-analysis data-classification data-cleaning data-visualization logistic-regression machine-learning natural-language-processing pandas password-security password-strength python scikit-learn sql tf-idf

Last synced: 08 May 2026

https://github.com/0290192029/apartment-price-predictor

Python-проект по прогнозированию стоимости аренды квартир с помощью линейной регрессии. Практическая работа по теме: "Основы машинного обучения" дисциплины "МДК 13.01: Основы применения методов искусственного интеллекта в программировании".

apartment-price-prediction apartments-for-rent api correios-api data-analysis feature-engineering feature-enginering linear-regression linear-regression-models mlops numpy prediction-model r seaborn

Last synced: 08 May 2026

https://github.com/sumit-sinha9/sales-analysis

Analyzing 12 months worth fo Sales data

data-analysis pandas python visualization

Last synced: 08 May 2026

https://github.com/mrunmayee3108/financial-chatbot

A Python chatbot for analyzing financial data of companies with revenue, income, assets, cash flow, and debt ratio queries

chatbot data-analysis jupyter-notebook pandas python python3

Last synced: 09 May 2026

https://github.com/drod75/burger_king_analysis

A simple analysis on a burger king dataset.

data-analysis data-visualization jupyter-notebook pandas python seaborn

Last synced: 09 May 2026

https://github.com/emanoelcampos/python-onemonth

This repository contains educational materials and projects developed during a Python course offered by OneMonth. It covers Python basics, intermediate concepts, web development with Flask, and data analysis with pandas. The course is structured into weeks, each focusing on a different aspect of Python programming and its applications.

data-analysis flask jupyter-notebook onemonth python python3

Last synced: 09 May 2026

https://github.com/abhroroy365/market_analysis

This project explores customer segmentation and market analysis in the context of online retail using an online retail dataset. By applying advanced analytics, we aim to uncover insights that can drive strategic decisions and enhance business performance.

clustering data data-analysis data-visualization kmeans-clustering machine-learning market-analysis python silhouette-analysis

Last synced: 09 May 2026

https://github.com/magnus0969/black-friday-sales-analysis

An in-depth analysis of Black Friday sales data to uncover trends, customer behavior, and product insights. Utilizing Python, data visualization, and machine learning techniques, this project provides key business intelligence to optimize sales strategies.

analysis data-analysis data-science python sales-analysis

Last synced: 09 May 2026

https://github.com/master-helix/ibm-data-analyst-certification-stock-analysis-project

This is a mini project repository of my IBM Certification involving stock analysis and plotting of Tesla and GameStop

analytics data data-analysis data-visualization ibm matplotlib pandas python web-scraping

Last synced: 09 May 2026

https://github.com/marvinmarnold/oipm_stop_search

OIPM's analysis on Stop & Search (frisk) activity by the New Orleans Police Department.

data-analysis frisk new-orleans oipm police search stop

Last synced: 22 Jul 2025

https://github.com/salma-mamdoh/exploring-the-evolution-of-linux-project

My Project to learn the Basics of Analysis on DataCamp

data-analysis datacamp pandas python time-series-analysis

Last synced: 09 May 2026

https://github.com/tyriek-cloud/nyc-mobility-survey-analysis

An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.

aws aws-athena aws-glue aws-glue-crawler aws-quicksight aws-s3 data-analysis data-engineering etl-pipeline json python

Last synced: 09 May 2026

https://github.com/datalopes1/manufacturing_defects

Projeto de EDA utilizando o Manufacturing Defects que pode ser encontrado no Kaggle

data-analysis data-visualization eda exploratory-data-analysis python

Last synced: 09 May 2026

https://github.com/monish-nallagondalla/algerian_forest_fires

This project predicts forest fires in Algeria using machine learning models . The dataset includes various meteorological and environmental features such as temperature, humidity, and wind speed. The app cleans the data and builds models to predict the likelihood of forest fires based on historical data and environmental conditions.

data-analysis data-science datacleaning flask forest-fire-prediction machine-learning meteorological-data python regression-models ridge-regression

Last synced: 09 May 2026

https://github.com/vasishta03/econovisionai

A simple Python desktop app to search and explore OECD economic data (CSV) and report summaries (TXT/JSON) using a modern CustomTkinter GUI—no SQL or web frameworks needed.

csv customtkinter data-analysis desktop-app economic-data gui json local-app oecd pandas python search tkinter

Last synced: 10 May 2026

https://github.com/shridhar1504/rafik-s-kitchen-data-analysis

The Project is about the Analysis of the Sales and Expenses Data of a Famous Fast-food Restaurant. This mainly focuses on gaining Insights that will boost the Future Sales and also Business Strategies it Improve the Profit Margins. Handled Tools are SQL, Python, Power BI, MS Office Tools.

business-analytics business-intelligence data-analysis data-analytics data-visualization eda ms-office powerbi-report powerpoint-presentations python sql-server

Last synced: 10 May 2026

https://github.com/imrandil/sql_practice_with_analysis

SQL practice using postgres db and docker as a tool to setup postgres, loving the sql way

data-analysis docker markdown postgres sql

Last synced: 10 May 2026

https://github.com/mozeel-v/spam-detection

ML-powered SMS Spam Classifier using NLP and Scikit-learn. Detects and filters spam messages with interactive Streamlit UI.

classification data-analysis mnb streamlit

Last synced: 10 May 2026

https://github.com/szuzick/us-immigration-presidential-analysis

Power BI dashboard analyzing 40 years of U.S. immigration data across presidential administrations (1981-2020)

dashboard data-analysis data-visualization government-data immigration powerbi powerbi-dashboards powerbi-visuals presidential-analysis

Last synced: 10 Jun 2026

https://github.com/luca-02/credit-card-fraud-detection

This is a small master's degree project for New Generation Data Models and DBMSs course (academic year 2024/25).

data-analysis database nosql python

Last synced: 10 Jun 2026

https://github.com/sdley/tp2_datascience

Exercice Pratique de traitement de donnees avec python

data-analysis pandas python

Last synced: 11 May 2026

https://github.com/hrosicka/czechpopulationestimation

This GitHub repository contains Python code for data analysis and population prediction in the Czech Republic up to the year 2050. The code is written in Python and utilizes the Pandas and Matplotlib libraries.

data-analysis data-visualization matplotlib matplotlib-figures matplotlib-pyplot pandas pandas-dataframe pandas-library pandas-python python python3

Last synced: 11 May 2026

https://github.com/ceia-prefeitura/urban-lit-tracker-etl

UrbanLitTracker coleta artigos acadêmicos sobre mudanças urbanas via OpenAlex API, processa e armazena em MongoDB. Oferece dashboard interativo com Dash, exibindo dados como trabalhos mais relevantes, autores e palavras-chave frequentes, facilitando a análise e visualização da literatura urbana.

academic-research bibliometrics data-analysis data-pipeline data-visualization etl openalex-api urban-studies

Last synced: 11 May 2026