An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/edwinrlambert/exploring-airbnb-market-trends

Dive into NYC's Airbnb market trends through detailed analysis of listings data, including prices, types, and review dates. This is a DataCamp project.

airbnb data-analysis jupyter-notebook market-trends python

Last synced: 19 Apr 2026

https://github.com/ak-pydev/python_practice

Documenting my learning journey from Python -> ML -> DL -> LLM/GenAI -> Agents, with exercises solved daily from Udemy/Kaggle/YouTube.

data-analysis data-science feature-engineering llms machine-learning mlflow mlops-workflow modeling python3 streamlit uvicorn

Last synced: 20 Apr 2026

https://github.com/mahmoudwal27/e-commerce-data-analysis

A collection of data analysis and visualization projects focused on ecommerce datasets. Using Python in Google Colab for analysis and Excel for exploration, these projects uncover key insights and trends, showcasing expertise in data manipulation and visualization to inform business decisions.

analytics data-analysis data-analysis-python data-set google-cloud python

Last synced: 21 Apr 2026

https://github.com/danpoynor/pet-shelter-data-analysis-notebook

Demonstration of skills analyzing data from a pet shelter. The CSV data contains tables detailing the incoming and outgoing animals and I use my knowledge of Pandas to gather and present the requested information.

csv data-analysis data-cleaning data-science jupyter-notebook matplotlib numpy pandas pet-shelter tabular-data

Last synced: 21 Apr 2026

https://github.com/martinkalema/power-distribution-modelling

Power Distribution Modelling for cea and cel algorithms

data-analysis python synthetic-dataset

Last synced: 21 Apr 2026

https://github.com/nikhilfuke1/a-b-testing-and-regression-analysis-python

Python Statistical Project involves data analysis, visualization, A/B testing, and regression analysis to determine the best-performing platform.

ab-testing data-analysis hypothesis-testing libraries python regression-analysis statistics visualization

Last synced: 21 Apr 2026

https://github.com/tmmvn/analytics-notebooks

A bunch of data analytics notebooks made while testing out JetBrains Datalore.

ai algorithms data-analysis datalore elements-of-ai helsinki-university-mooc python

Last synced: 22 Apr 2026

https://github.com/prgermux/yield-reporter

This Python application provides a graphical user interface (GUI) for analyzing and visualizing production data from various machines. It uses the PyQt5 framework for the GUI and Matplotlib for plotting data.

automation data-analysis python reporting

Last synced: 22 Apr 2026

https://github.com/rorrell/lifeexpectancy

A Jupyter Notebook where I create a chart with two line plots on it to check out the life expectancy of men vs. women from 1900-2018

data-analysis data-visualization jupyter-notebook python3

Last synced: 22 Apr 2026

https://github.com/leabrodyheine/california-schools-data-visualization

This front-end project provides interactive visualizations of learning models adopted by California schools during the pandemic. Using D3.js and Mapbox, it dynamically presents data through bar charts, bubble charts, heatmaps, and geographic maps, allowing users to explore trends across school types, sizes, and districts.

d3-visualization d3js data-analysis data-visualization mapbox openai plotly

Last synced: 22 Apr 2026

https://github.com/ayushi-gajendra/buenos-aires-subway-statistics

A comprehensive data analysis of the Buenos Aires subway system ridership using Python and Pandas. This project identifies peak-hour congestion patterns, explores hourly passenger distributions, and utilizes the 95th percentile to isolate extreme traffic conditions for urban mobility insights.

95th-percentile buenos-aires data-analysis data-science-portfolio data-visualization matplotlib pandas python statistical-analysis subway-ridership transit-data urban-mobility

Last synced: 05 Jun 2026
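
The 95th-percentile cutoff mentioned in the description above can be sketched as follows. This is only an illustration, not the repository's code: the passenger counts are made up, and the project itself uses Pandas, whereas this sketch relies on the standard library's `statistics.quantiles` so that it runs without extra dependencies.

```python
import statistics

# Hypothetical hourly passenger counts (illustrative only, not project data).
hourly_passengers = [120, 340, 560, 980, 1500, 2100, 1800, 900, 650, 400,
                     300, 250, 700, 1600, 2300, 1900, 800, 350, 200, 150]


def percentile_95(values):
    """Return the 95th percentile of values using the 'inclusive' method."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(values, n=100, method="inclusive")[94]


# Hours whose ridership exceeds the cutoff are treated as extreme traffic.
threshold = percentile_95(hourly_passengers)
extreme_hours = [count for count in hourly_passengers if count > threshold]
```

Using a percentile rather than the maximum keeps the cutoff stable when a single hour is an outlier.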

https://github.com/al-ogr/sf_pr1_job_analysis_hh

SkillFactory DataScience PROJECT-1. Analysis of résumés from HeadHunter

data-analysis data-science ipynb plotly python

Last synced: 23 Apr 2026

https://github.com/thc1006/nycu_timtable_crawler

🎓 NYCU Course Data Crawler & Timetable System | National Yang Ming Chiao Tung University course crawler and course selection system - Python web scraper for course schedules, syllabi & educational data analysis. Crawls 18K+ courses with 98% success rate. Features: interactive timetable, JSON API, Google Colab support, batch processing, resume capability.

academic course course-selection crawler data-analysis education educational-data google-colab json-api nycu open-data python schedule student-tools syllabus taiwan timetable university web-automation web-scraping

Last synced: 24 Apr 2026

https://github.com/strixion/demoversion_ai

The demo version of StrixionAI

ai csv data-analysis data-analytics json python txt

Last synced: 24 Apr 2026

https://github.com/muthukumar0908/youtube-data-harvesting-and-warehousing-using-sql-mongodb-and-streamlit

Provides a simple and intuitive Streamlit user interface that fetches and extracts data from YouTube using an API key and stores it in a database.

data-analysis mongodb-atlas python sqldatabase streamlit-webapp youtube-api

Last synced: 24 Apr 2026

https://github.com/gnodux/adb-link

An MCP server that connects to multiple databases. Supports access control and dynamic SQL query tool registration and invocation.

agent ai-tools data-analysis database-gateway go mcp mcp-server

Last synced: 06 Jun 2026

https://github.com/ismielabir/pycsvsummarizer

A lightweight tool to summarize CSV files using various features.

csv data-analysis data-summary python

Last synced: 25 Apr 2026

https://github.com/fbarffmann/belly-button-challenge

Built an interactive JavaScript dashboard to visualize bacterial biodiversity from belly button samples. Analyzed data from 153 participants and identified OTU 1167 as the most common bacteria.

biodiversity dashboard data-analysis data-visualization interactive-charts javascript json plotly

Last synced: 25 Apr 2026

https://github.com/ddihora1604/iit_patna

A multifaceted project involving applying ML models like Ridge Classifier, RNN, RIDOR, Rotation Forest and RUSBoost, integrating SMOTE for class balancing, and handling diverse datasets including those for seating arrangement tasks.

data-analysis data-visualization datamodelling machine-learning-algorithms python

Last synced: 25 Apr 2026

https://github.com/devexpress-examples/winforms-create-a-custom-exporter-for-pivotgridcontrol-with-xtrareport

This example illustrates how to dynamically create a custom report based on PivotGridControl content in WinForms.

data-analysis dotnet pivot-grid pivot-grid-for-winforms winforms

Last synced: 26 Apr 2026

https://github.com/deliprofesor/cinematic-data-analytics-and-recommendation-platform

This project analyzes a movie dataset using machine learning algorithms to predict success, explore revenue-popularity relationships, and develop recommendation systems. It employs techniques like K-Means, DBSCAN, GMM, decision trees, PCA, and NLP for insights and personalized suggestions.

clustering content-based-recommendation data-analysis data-visualization decision-tree gmm k-means machine-learning natural-language-processing nlp pca predictive-modeling python recommendation-system scikit-learn user-based-recommendation

Last synced: 26 Apr 2026

https://github.com/akashvarma26/data-analysis-on-imbd-using-sqlite3

Data Analysis on IMDb dataset using sqlite3 and Pandas in Jupyter notebook.

data-analysis jupyter-notebook pandas-dataframe sqlite

Last synced: 27 Apr 2026

https://github.com/odinleepro/airbnbnewyorkcityanalysis

AirbnbNewYorkCityAnalysis is a comprehensive data analysis and visualization project exploring short-term Airbnb rental trends across New York City (2008–2022). Using open source Airbnb data, the project combines data cleaning, statistical summaries, and Tableau dashboards to uncover pricing patterns, borough level distribution, and insights.

airbnb analytics-project data-analysis data-cleaning data-science data-visualization new-york-city real-estate-analytics tableau urban-analysis

Last synced: 27 Apr 2026

https://github.com/malexandersalazar/covid-19-peru-estimacion-oxigeno-requerido

Technical analysis of confirmed COVID-19 cases in Peru to estimate the medical oxygen required.

covid-19 data-analysis data-science peru python

Last synced: 27 Apr 2026

https://github.com/garcane/exodus_analysis

This project analyses cryptocurrency transaction data exported from the Exodus wallet. The goal is to explore and visualize the inflows and outflows of assets, the types of transactions, and other key metrics over time.

bitcoin btc crypto cryptocurrencies cryptocurrency data-analysis data-visualization eth ethereum pandas seaborn

Last synced: 27 Apr 2026

https://github.com/airdac/ml-palmerpenguins

Classification and analysis of the palmerpenguins dataset in Python. Team project from UPC's Master's Degree in Data Science

classification data-analysis data-science machine-learning palmer-penguin python upc

Last synced: 07 Jun 2026

https://github.com/lotfiferaga/hotel-reviews-sentiment-analysis

Efficient Python-driven sentiment analysis for hotel reviews, providing insightful evaluations.

data-analysis data-visualization nlp python

Last synced: 07 Jun 2026

https://github.com/simranshaikh20/diwali-sales-analysis-for-business-insights

A data analysis project on Diwali sales. It breaks sales down by state, gender, and age to show how much was sold in each group.

data-analysis data-visualization python

Last synced: 28 Apr 2026

https://github.com/datalopes1/warehouse_rfv

In this project, an RFV (Recency, Frequency, and Value) analysis will be carried out using data I found in this YouTube video from the Jie Jenn channel.

analise-rfv data-analysis data-science kmeans python rfm-analysis

Last synced: 28 Apr 2026

https://github.com/emmanuelletocs/steam-game-recommender

A powerful recommendation system for Steam games, combining Content-Based and Collaborative Filtering techniques. Built with Python, Scikit-learn, and Streamlit to deliver accurate, real-time game recommendations. Perfect for gamers and data scientists interested in building intelligent recommendation engines.

als-algorithm data-analysis gaming-industry knn machine-learning mds mysql ncf neural-network pyspark recommendation-engine recommendation-system scikit-learn spark

Last synced: 28 Apr 2026

https://github.com/george-njuguna/spotify-etl-pipeline

An ETL pipeline that uses the Spotify API, Docker, and Airflow.

apache-airflow data-analysis docker pipelines python

Last synced: 28 Apr 2026

https://github.com/manalisbhavsar/stock-price-prediction

Stock Price Prediction model using Machine Learning and LSTM to forecast future stock prices based on historical data. Achieved a low error rate of 3.2% by leveraging moving averages and deep learning techniques, ensuring accurate predictions.

data-analysis deep-learning lstm machine-learning matplotlib numpy pandas python

Last synced: 28 Apr 2026

https://github.com/wei-rongrong2/openfoodfactclustering

A project that explores clustering food products based on nutritional attributes using K-Means, Fuzzy C-Means, and DBSCAN algorithms, with a Streamlit dashboard for visualizing results.

clustering dashboard data-analysis dbscan food-products fuzzy-cmeans k-means machine-learning nutrition nutrition-clustering open-food-facts streamlit

Last synced: 28 Apr 2026

https://github.com/ricram2/column-name-extractor

Jupyter Notebook. Takes a folder with one or more CSV files and returns a single CSV listing the column names with 3 example values each (first, random, random).

data-analysis pandas

Last synced: 29 Apr 2026
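
The behaviour described above (one summary CSV listing each column with a first value and two random values) might look roughly like the sketch below. It is a guess at the approach, not the notebook's actual code: the function name `summarize_columns`, the output column names, and the `seed` parameter are all hypothetical, and the standard `csv` module stands in for Pandas.

```python
import csv
import random
from pathlib import Path


def summarize_columns(folder, out_csv, seed=None):
    """Write one CSV with each input column's name and three example values."""
    rng = random.Random(seed)  # a seed makes the random samples repeatable
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            records = list(csv.DictReader(handle))
        if not records:
            continue  # files without data rows have no example values
        for column in records[0]:
            values = [record[column] for record in records]
            rows.append({
                "file": path.name,
                "column": column,
                "first": values[0],
                "random_1": rng.choice(values),
                "random_2": rng.choice(values),
            })
    fieldnames = ["file", "column", "first", "random_1", "random_2"]
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The two random picks are drawn independently, so they can repeat each other or the first value when a column has few rows.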

https://github.com/chrispsang/healthcare-dataanalysis

Analyze synthetic patient data to identify trends, improve healthcare delivery, and predict patient outcomes using machine learning models. Includes data exploration, preprocessing, model building, and visualizations.

data-analysis data-science data-visualization healthcare jupyter-notebook machine-learning python

Last synced: 29 Apr 2026

https://github.com/prateek5525/yt-analysis-project

This project utilizes the YouTube Data API to analyze channel and video performance, offering insights into subscriber counts, views, video metrics, and monthly trends. It generates visual reports and exports data in CSV format, aiding in effective decision-making and performance tracking.

data-analysis jupyter-notebook python3 seaborn-plots youtube-api

Last synced: 29 Apr 2026

https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the JSON files scraped by the scraper medium_scrapper_post.py. To extract information from JSON files scraped by medium_scrapper_tag_archive.py (scraping from the tags archive), use Data_Extraction_Archive_Tags.py.

data data-analysis data-analytics data-extraction data-preprocessing data-science data-scraping deep-learning machine-learning python

Last synced: 29 Apr 2026

https://github.com/i7t5/sentimentnlp

Sentiment analysis for COMP 435 Introduction to Machine Learning, Spring 2025

data-analysis jupyter-notebook machine-learning nlp python sentiment-analysis

Last synced: 29 Apr 2026

https://github.com/khushi-sabarad/web_scraping

This project is a Python-based web scraper that extracts the menu from a cafe and saves it to an Excel file. It was created to automate the process of retrieving and updating menu prices, a task that was observed to be done manually at the hostel.

beautifulsoup data-analysis data-visualization market-analysis pandas python requests web-scraping wordcloud

Last synced: 29 Apr 2026

https://github.com/hardikk-7/election-analysis-project

A data analytics project exploring the 2024 Indian General Election results using Python. Includes party-wise, state-wise, and vote share analysis with visualizations.

data-analysis data-science election-analysis jupyter-notebook python

Last synced: 29 Apr 2026

https://github.com/srinibas-masanta/yelp-business-reviews-analysis

This project analyzes Yelp business reviews using Python, Snowflake, and SQL, focusing on efficient data ingestion, transformation, and analysis. We preprocess JSON data, optimize ingestion via Amazon S3, classify sentiments with Python UDFs, and extract insights using SQL queries—showcasing a streamlined end-to-end workflow.

amazon-s3 data-analysis json python snowflake sql

Last synced: 29 Apr 2026

https://github.com/sdley/cas_pratique-del_annuel

Del-Annuel is software for the annual deliberations of higher-education schools or universities.

data-analysis pandas python tkinter-gui

Last synced: 29 Apr 2026

https://github.com/dindagustiayu/data-processing

A digital textbook for interpreting characterisation results.

characterisation data-analysis gitbook latex-package myst qualitative-analysis quantitative-analysis

Last synced: 08 Jun 2026

https://github.com/supertetelman/frc-data-analysis

A collection of R, Matlab, and Bash scripts that were developed in real time from the stands of an FRC competition. Gathered data from various online sources, parsed it, and ran some basic analysis on it to calculate ratings and make basic match predictions. Results were made public and hosted live via AWS. Developed as a student teaching tool under poor Internet connectivity with minimal access to real-time match data.

bash data-analysis matlab r teaching

Last synced: 29 Apr 2026

https://github.com/yimethan/basics-of-data-analysis

2023-2 Basics of Data Analysis

data-analysis numpy pandas python

Last synced: 29 Apr 2026

https://github.com/theoplayz2/eda-explorer

A Python tool for exploratory data analysis (EDA) and visualization that supports loading CSV and JSON data and has a modular OOP architecture. Practical assignment on the topic "Discovering and visualizing data to understand its nature" for the course "MDK 13.01: Fundamentals of applying artificial intelligence methods in programming".

analysis battery-life cqrs csharp data-analysis eeg-analysis exploratorydataanalysis json-visualization matplotlib messaging profile-report python verilog visualization

Last synced: 29 Apr 2026

https://github.com/muhammadusman-khan/e-commerce-store-eda

Exploratory Data Analysis on E-commerce store data to uncover insights about sales trends, customer behavior, and product performance using Python libraries like Pandas, NumPy, and Matplotlib/Seaborn.

data-analysis data-science data-visualization e-commerce eda exploratory-data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/shruti-h/sales_data_analysis

Sales Data Analysis | Pandas & Matplotlib

data-analysis data-science data-vi matplotlib pandas-library python

Last synced: 30 Apr 2026

https://github.com/bachtiarashidiqy/ecommercedashboard

An interactive e-commerce analytics dashboard built with Streamlit, providing visualizations for sales performance, product analysis, geographic insights, and delivery status. Includes date filtering, company branding, and comprehensive documentation.

analytics dashboard data-analysis data-visualization e-commerce matplotlib pandas python seaborn streamlit

Last synced: 30 Apr 2026

https://github.com/diogojorgebasso/dataanalysis_r_minesnancy

Code and materials for the data analysis courses in R at Mines de Nancy. You will also find R scripts, notebooks, and other resources for each lesson.

analyse-data data-analysis data-science data-visualization estatistics r statistiques statistiques-descriptives

Last synced: 30 Apr 2026

https://github.com/samuelpillai/machine-learning-classification-regression-nlp

A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

classification data-analysis data-science data-visualization feature-engineering jupyter-notebook machine-learning ml-pipeline model-evaluation nlp python regression-models scikit-learn supervised-learning text-mining

Last synced: 30 Apr 2026

https://github.com/ahmedtaher10/covid-19-cases

The dataset covers COVID-19 cases and their impact on GDP from December 31, 2019, to October 10, 2020.

data-analysis python visualization

Last synced: 30 Apr 2026

https://github.com/busra-deveci/kaggle-iris_data_analysis

Exploratory data analysis and visualization of the Iris dataset using Python.

data-analysis iris-dataset kaggle pandas python seaborn visualization

Last synced: 30 Apr 2026

https://github.com/gitchaell/computer-scrapping

Tool that extracts data from the pages of companies that sell computers in the city of Trujillo - Peru, exports them in an XLSX file according to a relational data model, and displays them on a Power BI dashboard.

data-analysis data-structures data-visualization database dbdiagram export-excel powerbi scrapper-script scrapping xlsx

Last synced: 01 May 2026

https://github.com/bpkaur/whats-in-a-name

Exploring a dataset of first names of babies born in the US to uncover interesting stories.

data-analysis datacamp numpy pandas python3

Last synced: 04 May 2026

https://github.com/r13i/cheapest-phone-call

Small challenge to find the best phone operator to use based on call price

big-data big-data-analytics cheapest data-analysis data-cruncher pandas phone-number pricelist

Last synced: 04 May 2026

https://github.com/fatihilhan42/the-office-eda

Data analysis study of my favorite sitcom, The Office (US).

data-analysis data-science data-visualization fatihilhan office python sitcom

Last synced: 04 May 2026

https://github.com/soham7998/data-analysis-projects

My data analysis projects, each completed to gain hands-on experience. The projects showcase different concepts, visualizations, and more.

data data-analysis data-science machine-learning nlp python soham visualization

Last synced: 04 May 2026

https://github.com/fatihilhan42/book-recommendation-system-with-python

In this project, we are making a book recommendation system that recommends similar books according to the genres or ratings that the user enters, using a large book dataset. The link of the dataset is given below. Happy reading...

books data-analysis data-science data-visualization kaggle python recommendation-engine recommendation-system

Last synced: 04 May 2026

https://github.com/halyusa16/e-commerce-analysis

This project analyzes a public e-commerce dataset to uncover valuable insights and answer critical business questions. The dataset contains customer, product, order, and transaction details, providing a comprehensive view of the e-commerce platform's operations.

data-analysis data-cleaning data-exploration data-visualization self-project

Last synced: 09 Jun 2026

https://github.com/jendives2000/regressions

A linear regression analysis to determine the strength of the relationship between the number of reviews and sales for a retail company.

data-analysis linear-regression pearson-correlation-coefficient regression

Last synced: 04 May 2026

https://github.com/demon-2-angel/product-customer-acquisition-analysis-using-behaviour

The database encompasses eight tables with varied attributes and rows. Key analyses include product restocking needs, top VIP customers' contributions, and an average customer profit of $39,039.59. Recommendations emphasize strategic marketing to new customers and incentives for existing VIP clients based on acquisition costs and profit insights.

customer-products customer-segmentation data-analysis database sqlite

Last synced: 05 May 2026

https://github.com/jacktheprogrammer/time-series-forecasting-and-analysis

A personal project consisting of notebooks I created for time series forecasting and analysis. The projects use deep learning with TensorFlow, along with the XGBoost, statsmodels, and SciPy libraries for Python. The series cover weather, energy consumption, and stocks.

data-analysis data-science deep-neural-networks energy-consumption machine-learning portfolio prophet-facebook prophet-model python python3 scipy statsmodels stocks tensorflow time-series time-series-analysis timeseries-forecasting weather xgboost

Last synced: 05 May 2026

https://github.com/codewithmayank-py/box-office-analysis-with-seaborn-and-python

This repository contains Python code and datasets for analyzing box office data. Explore trends, patterns, and factors influencing movie performance.

analysis box-office-data-analysis data-analysis data-visualization dataset jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 05 May 2026

https://github.com/akash-47-tank/personalized-e-commerce-review-summarizer

Personalized E-commerce Product Review Summarizer: A Streamlit app that summarizes product reviews (e.g., from a CSV) using T5-small and tailors summaries to user preferences (price, durability, etc.) with NLP and lightweight ML.

data-analysis e-commerce machine-learning nlp personalization portfolio python scikit-learn sentiment-analysis streamlit t5 transformers web-app

Last synced: 05 May 2026

https://github.com/anushkundu/crime-pattern-analysis

Analyzing Crime Patterns in Montgomery County, USA: An Inclusive Study Based on NIBRS Data (2016-2022)

data-analysis data-visualization descriptive-statistics matplotlib numpy pandas python seaborn

Last synced: 05 May 2026

https://github.com/kammarah/data-sample

I designed a database website 🌐 that can be uploaded easily for use 📤. You can check my website 👀.

data-analysis data-visualization database deploy deployment library-management-system panaversity streamlit webapp

Last synced: 05 May 2026

https://github.com/nimbostratos/titanic-survival-prediction

Machine learning project predicting Titanic survival using AdaBoost with feature engineering and hyperparameter optimization

data-analysis data-science data-science-projects kaggle machine-learning machine-learning-models python scikit-learn

Last synced: 05 May 2026

https://github.com/ryuzen6/bangalore-real-estate-price-prediction

This is a data science project that predicts the cost of real estate in Bangalore. Requirements: Jupyter Notebook (for data cleaning and building the linear regression model with various Python libraries), PyCharm (Python IDE for creating the Python Flask server), Visual Studio Code (to create the UI with HTML, CSS, and JavaScript).

css3 data-analysis data-science html5 javascript jupyter-notebook machine-learning python3

Last synced: 06 May 2026

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 analyzing NYC school test scores with python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/sankaran-s2001/us-traffic-accidents-analysis-python-eda

Exploratory data analysis of US traffic accidents from 2016-2023, analyzing patterns by time, location, weather, and severity using Python data science libraries.

data-analysis data-science data-visualization eda matplolib numpy pandas python

Last synced: 06 May 2026

https://github.com/abhinav330/customer-behavior-analysis-linear-regression

This repository explores customer behavior data for an NYC clothing company with both a mobile app and website. They want to understand which platform drives higher sales.

data-analysis data-science data-visualization eda exploratory-data-analysis jupyter jupyter-notebook linear-regression machine-learning machine-learning-algorithms machinelearning-python numpy pandas python regression-analysis

Last synced: 06 May 2026

https://github.com/superpandas-ai/superpandas

Adding LLM integration to Pandas library

ai data-analysis llm pandas

Last synced: 06 May 2026

https://github.com/drill-n-bass/dealavo-project

Cartesian product from dictionary to list of dictionaries and faster methods for finding index than the `index` method.

data-analysis data-analysis-python matplotlib pandas python python3 random timeit

Last synced: 06 May 2026
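
The two ideas named in the description above can be sketched as follows. These are assumptions about the general techniques rather than the repository's code: `dict_product` and `build_index` are hypothetical names, and a dictionary lookup is only one common way to avoid repeated calls to `list.index`.

```python
from itertools import product


def dict_product(options):
    """Expand {key: [choices]} into a list of dicts, one per combination."""
    keys = list(options)
    combos = product(*(options[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def build_index(items):
    """Map each item to its first position, like list.index but in O(1) per lookup."""
    positions = {}
    for position, item in enumerate(items):
        positions.setdefault(item, position)  # keep the first occurrence only
    return positions


grid = dict_product({"size": ["S", "M"], "color": ["red", "blue"]})
lookup = build_index(["S", "M", "L", "M"])
```

Building the lookup costs one pass over the list, so it pays off only when many index queries follow; a single query is still cheaper with `list.index`.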

https://github.com/edanur-y/variable-analysis-of-banks-ratio-data

Testing variables for multicollinearity, multivariate normality and analyzing outliers and missing values. ⭕SPSS 🔵R

data-analysis log-transformation missing-values-analysis multicollinearity normality-test r spss

Last synced: 10 Jun 2026

https://github.com/rlalpha49/anisearch-model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2026