An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/kgelli/apple-data-analysis---apache-spark

Modular ETL pipeline for analyzing Apple product purchase patterns using Apache Spark on Databricks with factory design patterns.

apache-spark data-analysis databricks delta-lake etl-pipeline factory-pattern pyspark

Last synced: 22 Apr 2026

https://github.com/rajesh9943/sentiment-analysis-of-consumer-opinions-on-amazon-products

Developed a comprehensive Sentiment Analysis System aimed at classifying Amazon product reviews into positive, neutral, and negative sentiments. The project leveraged advanced Natural Language Processing (NLP) techniques alongside machine learning algorithms to deliver accurate and actionable insights from customer feedback

amazon data-analysis data-manipulation data-preprocessing data-presentation data-visualization machine-learning nlp nlp-library nltk product-reviews-analysis sentiment-analysis sklearn-library word-cloud-generator-in-python-3

Last synced: 05 Jun 2026

https://github.com/ayushi-gajendra/buenos-aires-subway-statistics

A comprehensive data analysis of the Buenos Aires subway system ridership using Python and Pandas. This project identifies peak-hour congestion patterns, explores hourly passenger distributions, and utilizes the 95th percentile to isolate extreme traffic conditions for urban mobility insights.

95th-percentile buenos-aires data-analysis data-science-portfolio data-visualization matplotlib pandas python statistical-analysis subway-ridership transit-data urban-mobility

Last synced: 05 Jun 2026

https://github.com/al-ogr/sf_pr1_job_analysis_hh

SkillFactory DataScience PROJECT-1. Анализ резюме из HeadHunter

data-analysis data-science ipynb plotly python

Last synced: 23 Apr 2026

https://github.com/yuvrajsaraogi/-iris-flower-classification

Iris flower has three species; setosa, versicolor, and virginica, which differs according to their measurements. Now assume that you have the measurements of the iris flowers according to their species, and the task is to train a machine learning model that can learn from the measurements of the iris species and classify them.

classification data data-analysis data-science data-visualization flower flower-classification iris iris-classification iris-flower iris-flower-classification knn knn-classification machine-learning machine-learning-algorithms ml natural-language-processing nlp python

Last synced: 24 Apr 2026

https://github.com/manisharora96/data-analysis-of-smartwatch

The project is structured with sample data, step-by-step Jupyter notebooks, and modular Python scripts for automated analysis

data-analysis data-visualization jupyter-notebook python smartwatch-analysis

Last synced: 24 Apr 2026

https://github.com/edwinrlambert/emomap-sentiment-analysis

To analyze public sentiment related to specific locations in a city (e.g., parks, transit stations, restaurants, neighborhoods) using geo-tagged social media posts, reviews, and comments. The goal is to visualize how people feel across different areas and times.

data-analysis jupyter-notebook python sentiment-analysis

Last synced: 24 Apr 2026

https://github.com/pedrohdosanjos/economic-data-analysis

This project aims to analyze the export data from various states in the United States to Brazil over time. The data is sourced from the FRED (Federal Reserve Economic Data) API and processed to identify the top 5 exporting states for each year, as well as the states with the highest total export value across all years.

api data-analysis data-visualization jupyter-notebook python

Last synced: 24 Apr 2026

https://github.com/ismielabir/pycsvsummarizer

A lightweight tool to summarize CSV files using various features.

csv data-analysis data-summary python

Last synced: 25 Apr 2026

https://github.com/tmoulik/bikeshare-python

Analysis of Bikeshare data from three major cities

data-analysis data-visualization python udacity-nanodegree

Last synced: 25 Apr 2026

https://github.com/m-biriulova/python-job-market-analysis

Web scraping, data analysis, and visualization of Python developer vacancies in Czech Republic.

automation beautifulsoup data-analysis data-visualization portfolio-project python selenium web-scraping

Last synced: 25 Apr 2026

https://github.com/aastopher/mma_outcome

Simple exploratory analysis of UFC Fights and Vegas fight odds from 1993 to 2021

data-analysis data-visualization

Last synced: 06 Jun 2026

https://github.com/dcs-training/2023-10-22-carpentry-social-science

Go to https://dcs-training.github.io/2023-10-22-Carpentry-Social-Science/ to follow along the material

data-analysis data-visualisation data-wrangling intro-to-programming r

Last synced: 06 Jun 2026

https://github.com/rociobenitez/happiness-index-data-processing

Repository for Big Data Processing - Contains Jupyter Notebooks and Datasets for data analysis and processing tasks related to Big Data.

big-data big-data-processing data-analysis data-processing happiness-index happiness-report jupyter-notebook matplotlib pandas seaborn

Last synced: 15 May 2026

https://github.com/deliprofesor/cinematic-data-analytics-and-recommendation-platform

This project analyzes a movie dataset using machine learning algorithms to predict success, explore revenue-popularity relationships, and develop recommendation systems. It employs techniques like K-Means, DBSCAN, GMM, decision trees, PCA, and NLP for insights and personalized suggestions.

clustering content-based-recommendation data-analysis data-visualization decision-tree gmm k-means machine-learning natural-language-processing nlp pca predictive-modeling python recommendation-system scikit-learn user-based-recommendation

Last synced: 26 Apr 2026

https://github.com/pararang/nams-thesis-fuzzy

A specialized data processing tool designed to help with Fuzzy Delphi Method calculations for thesis research data analysis. Then extended with some new features for data processing with different method.

data-analysis dematel hacktoberfest hacktoberfest-accepted house-of-quality python sustainability vibecoding

Last synced: 27 Apr 2026

https://github.com/malexandersalazar/covid-19-peru-estimacion-oxigeno-requerido

Análisis técnico de casos confirmados por COVID-19 en Perú para la estimación de oxígeno medicinal requerido.

covid-19 data-analysis data-science peru python

Last synced: 27 Apr 2026

https://github.com/banyc/dfplot

Summarize a data frame by plotting. `cargo install --git https://github.com/Banyc/dfplot.git`.

csv data-analysis plotly plotting statistics

Last synced: 27 Apr 2026

https://github.com/mango606/da__

2021.9 데이터분석프로그래밍 과제

data-analysis python task

Last synced: 27 Apr 2026

https://github.com/josedanielchg/1990s-netflix-movie-insight

Small exploratory analysis of Netflix movie data from the 1990s. This project is part of the DataCamp Associate Data Scientist in Python program and focuses on filtering, visualizing, and extracting insights from a dataset using Python. Analyze trends in movie durations and count short action films to practice key data science skills!

beginner data-analysis python

Last synced: 27 Apr 2026

https://github.com/tillscode/personal-finance-ml-analysis

Machine learning analysis of personal financial data with predictive modeling and interactive dashboard

dashboard data-analysis finance machine-learning python scikit-learn

Last synced: 28 Apr 2026

https://github.com/rajivaleaakash/customer-churn-prediction

A machine learning project focused on predicting customer churn using various data analysis and modeling techniques. The repository includes data preprocessing, feature engineering, exploratory data analysis (EDA), model training, evaluation, and visualization to help businesses identify customers at risk of leaving.

churn-prediction classification customer-churn data-analysis data-science gridsearchcv imblearn machine-learning numpy pandas pyhton randomsearchcv scikit-learn

Last synced: 28 Apr 2026

https://github.com/george-njuguna/spotify-etl-pipeline

This is an ETL pipeline that uses Spotify API , Docker and Airflow

apache-airflow data-analysis docker pipelines python

Last synced: 28 Apr 2026

https://github.com/matheusafonseca/python-data-visualization-matplotlib-seaborn-masterclass-udemy

This repository is dedicated to storing the code developed during the "Python Data Visualization: Matplotlib & Seaborn Masterclass" course on Udemy.

charts data-analysis data-analysis-python data-science data-visualization database graphics graphics-programming jupyter-notebook matplotlib matplotlib-plots python python3 seaborn seaborn-plots

Last synced: 28 Apr 2026

https://github.com/ericdataplus/kaggle-airbnb-nyc

NYC Airbnb Market Analysis: Multi-source from 2 Kaggle datasets (151K listings)

airbnb data-analysis kaggle nyc python visualization

Last synced: 28 Apr 2026

https://github.com/wei-rongrong2/openfoodfactclustering

A project that explores clustering food products based on nutritional attributes using K-Means, Fuzzy C-Means, and DBSCAN algorithms, with a Streamlit dashboard for visualizing results.

clustering dashboard data-analysis dbscan food-products fuzzy-cmeans k-means machine-learning nutrition nutrition-clustering open-food-facts streamlit

Last synced: 28 Apr 2026

https://github.com/josedanielchg/efficient-data-storage-for-predictive-modeling

DataCamp project from the Associate Data Scientist track, focusing on optimizing dataset storage by transforming data types and filtering. Prepares data for efficient machine learning workflows

cleaning-dataset data-analysis jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/devexpress-examples/web-forms-pivot-grid-change-summary-display-mode

This example shows how to use different summary display modes in Pivot Grid for Web Forms.

asp-net-web-forms data-analysis dotnet pivot-grid pivot-grid-for-web-forms

Last synced: 29 Apr 2026

https://github.com/kawshik-khan/fake-news-analysis

A fake news detection ML model. It utilizes the Bag of Words model for text vectorization and a Multinomial Naive Bayes classifier to predict whether news articles are real or fake. The project covers data preprocessing, model training, and performance evaluation with accuracy metrics and a confusion matrix.

data-analysis data-science machine-learning ml python3

Last synced: 08 Jun 2026

https://github.com/prateek5525/yt-analysis-project

This project utilizes the YouTube Data API to analyze channel and video performance, offering insights into subscriber counts, views, video metrics, and monthly trends. It generates visual reports and exports data in CSV format, aiding in effective decision-making and performance tracking.

data-analysis jupyter-notebook python3 seaborn-plots youtube-api

Last synced: 29 Apr 2026

https://github.com/vanshuchaudhary/zomato

This Jupyter Notebook contains an exploratory data analysis (EDA) of Zomato restaurant data. It includes data cleaning, visualization, and insights into restaurant ratings, pricing, cuisine distribution, and location-based trends.

business-analytics data-analysis data-mining data-science data-visualization datascience matplotlib pandas-dataframe pandas-python python python-3 python-library

Last synced: 29 Apr 2026

https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the json files scrapped by the scrapper medium_scrapper_post.py. To extract information from json files scrapped by medium_scrapper_tag_archive.py (scrapping from tags archive) then use Data_Extraction_Archive_Tags.py

data data-analysis data-analytics data-extraction data-preprocessing data-science data-scraping deep-learning machine-learning python

Last synced: 29 Apr 2026

https://github.com/eco786786/restaurant_orders

This analysis seeks to uncover patterns in customer behaviour by examining restaurant order data.

data-analysis git postgresql tableau

Last synced: 29 Apr 2026

https://github.com/i7t5/sentimentnlp

Sentiment analysis for COMP 435 Introduction to Machine Learning, Spring 2025

data-analysis jupyter-notebook machine-learning nlp python sentiment-analysis

Last synced: 29 Apr 2026

https://github.com/lankesathwik7/sql-query-assistant

Natural language to SQL query converter using Groq LLM. Ask questions in plain English and get SQL queries, visualized results, and natural language explanations. Built with Streamlit and PostgreSQL.

data-analysis database groq llm natural-language-processing python sql

Last synced: 29 Apr 2026

https://github.com/khushi-sabarad/web_scraping

This project is a Python-based web scraper that extracts the menu from a cafe and saves it to an Excel file. It was created to automate the process of retrieving and updating menu prices, a task that was observed to be done manually at the hostel.

beautifulsoup data-analysis data-visualization market-analysis pandas python requests web-scraping wordcloud

Last synced: 29 Apr 2026

https://github.com/farhad-here/textprepx

A Multilingual Text Preprocessing Tool for English and Persian.

cleantext contractions data-analysis deep-learning emoji nlp nltk opp parsivar regex streamlit text-preprocessing textblob

Last synced: 29 Apr 2026

https://github.com/srinibas-masanta/yelp-business-reviews-analysis

This project analyzes Yelp business reviews using Python, Snowflake, and SQL, focusing on efficient data ingestion, transformation, and analysis. We preprocess JSON data, optimize ingestion via Amazon S3, classify sentiments with Python UDFs, and extract insights using SQL queries—showcasing a streamlined end-to-end workflow.

amazon-s3 data-analysis json python snowflake sql

Last synced: 29 Apr 2026

https://github.com/valikmorinko/ecommerce-sales-analysis

Анализ продаж e-commerce: данные, визуализации, аналитические выводы.

data-analysis e-commerce jupyter matplotlib pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/saikumar787/car_price_prediction_using_linear-regression

A machine learning project to predict the selling price of used cars using regression techniques. Includes data preprocessing, model training, evaluation, and testing on new data.

car-price-prediction-with-machine-learning data-analysis joblib jupiter-notebook linear-regression-models model-deployment python scikit-learn standardscaler

Last synced: 29 Apr 2026

https://github.com/brevex/code-complexity-data-analisis

Data collection that shows different complexity scores in an algorithmic dataframe.

code-analysis data-analysis data-science python

Last synced: 29 Apr 2026

https://github.com/supertetelman/frc-data-analysis

A Collection of R, Matlab, and Bash scripts that were developed in real-time from the stands of a FRC competition. Gathered data from various online sources, parsed it, and ran some basic analysis on it to calculate ratings and make basic match predictions. Results were mad public and hosted live via AWS. Developed as a student teaching tool under poor Internet Connectivity with minimal access to real-time match data.

bash data-analysis matlab r teaching

Last synced: 29 Apr 2026

https://github.com/monddavila/online-retail-data-analysis

Online Retail Exploratory Data Analysis with Python

data-analysis jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/edgarhtt/uber_freight_data_analysis

Uber Freight interview homework. It consisted of solving a 2 warehouse problem and an ETL task

data-analysis data-science data-visualization python

Last synced: 30 Apr 2026

https://github.com/srinibas-masanta/ibm-applied-data-science-capstone

This repository contains the work completed for the Applied Data Science Capstone Project offered by IBM on Coursera. The capstone project is the final course in the IBM Data Science Professional Certificate series and serves as an opportunity to apply the skills and knowledge gained throughout the series to a real-world data science problem.

capstone-project data-analysis data-science data-visualization machine-learning python web-scraping

Last synced: 30 Apr 2026

https://github.com/samuelpillai/machine-learning-classification-regression-nlp

A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

classification data-analysis data-science data-visualization feature-engineering jupyter-notebook machine-learning ml-pipeline model-evaluation nlp python regression-models scikit-learn supervised-learning text-mining

Last synced: 30 Apr 2026

https://github.com/busra-deveci/kaggle-iris_data_analysis

Exploratory data analysis and visualization of the Iris dataset using Python.

data-analysis iris-dataset kaggle pandas python seaborn visualization

Last synced: 30 Apr 2026

https://github.com/fazatholomew/marlboroplan

In order to contribute to a more inclusive sustainable energy program in Massachusetts, this project is part of my work for a nonprofit organization called All In Energy and undergraduate thesis for my degree.

data-analysis data-visualization energy jupyter-notebook massachusetts python

Last synced: 01 May 2026

https://github.com/mmfava/lonomia-host-plants-2024

This project investigates the relationship between Lonomia achelous and Lonomia obliqua caterpillars and their host plants. The project uses Docker for a consistent environment and R for statistical analysis, with detailed processes documented in Jupyter notebooks.

data-analysis host-plants lonomia lonomism r

Last synced: 01 May 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/fbarffmann/project1

Analyzed factors influencing movie profitability using Python. Cleaned and visualized film industry data to uncover trends in budgets, sales, genres, and ratings.

box-office-analysis data-analysis data-visualization matplotlib movie-industry pandas python regression seaborn

Last synced: 01 May 2026

https://github.com/linguini1/edueval

The BorealisAI Let's Solve It mentorship project: summarizing student feedback submissions on their professor into one cohesive paragraph for faculty consideration during performance reviews.

ai data data-analysis data-science machine-learning machinelearning nlp python pytorch sentiment-analysis

Last synced: 01 May 2026

https://github.com/vedantshi/stock-price-prediction-for-maang-companies

This project utilizes Long Short-Term Memory (LSTM) networks to forecast stock prices. It includes steps for data preprocessing, model training, and visualization of predictions using Python in Jupyter Notebook. The project demonstrates proficiency in machine learning, data analysis, and Python programming.

data-analysis data-visualization lstm machine-learning python stock-price-prediction

Last synced: 01 May 2026

https://github.com/bhoyarapurva23399/mini-erp-inventory-billing

Lightweight ERP inventory and billing web app built using Python Flask and SQLite — featuring product, customer, and dashboard management.

backend data-analysis erp flask inventory-billing mini-project python sqlite

Last synced: 01 May 2026

https://github.com/mateusoliveira30/top-intelligent-people

This project performs an exploratory analysis of the top_intelligent_people_in_the_world_5000.csv dataset, featuring some of the world's most intelligent individuals. Using pandas and matplotlib, the analysis includes checking for missing values, describing variables, and visualizing data.

data-analysis graphics kaggle-dataset python3

Last synced: 03 May 2026

https://github.com/nishnash54/sidba

CSV to MongoDB with type conversion

csv-converter data-analysis mongodb statistics

Last synced: 02 May 2026

https://github.com/adithya17-star/ai-powered-fraud-detection

An AI-powered fraud detection system using machine learning algorithms to identify suspicious transactions and provide interactive visualizations for financial security.

dashboard-visualization data-analysis finance-technology fintech flask fraud-detection machine-learning python security transaction-monitoring

Last synced: 02 May 2026

https://github.com/shreeparab1890/unicorns-of-india-till-sep-2022-analysis-eda

This ipython notebook is the Exploratory data analysis (EDA) of the Unicorns of India till Sep 2022.

analysis data-analysis eda exploratory-data-analysis matplotlib-pyplot numpy pandas plotly

Last synced: 02 May 2026

https://github.com/bhawnagoyal18/ai-doctor-a-symptom-checker-disease-predictor

AI Doctor is an intelligent healthcare application that utilizes machine learning (ML) and Python to predict potential diseases based on user-input symptoms. The project integrates data from multiple medical datasets and provides an interactive web-based UI for an intuitive user experience.

data-analysis data-engineering data-visualization dataset flask html5 machine-learning python sql stacking statistics

Last synced: 02 May 2026

https://github.com/rorrell/employmentdata

A Jupyter Notebook where I use group by to analyze the average unemployment rate by year

data-analysis data-visualization jupyter-notebook python3

Last synced: 02 May 2026

https://github.com/chuxinh/our-data-manual

All in one place for our data science learning journey by Chuxin and Melody

data-analysis data-science machine-learning python

Last synced: 09 Jun 2026

https://github.com/m0saan/python-for-data-analysis

Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney,

data-analysis data-science ipython-notebook machine-learning matplotlib numpy pandas python

Last synced: 02 May 2026

https://github.com/stepankuzmin/machine-learning-data-analysis

My homeworks on Coursera Machine Learning and Data Analysis specialization

coursera data-analysis jupiter machine-learning python

Last synced: 03 May 2026

https://github.com/fatihilhan42/tourist_analysis_in_turkey_with_python

In this project, the number of tourists coming to Turkey between 2008-2021 was analyzed. The data from the data set you can find in the warehouse was first organized using data cleaning algorithms. These cleaned data were then output graphically using data visualization algorithms.

data-analysis data-cleaning data-science data-visualization jupyter-notebook python

Last synced: 03 May 2026

https://github.com/chaedoll/analysis-python-foreignerinfra

국내 외국인 대상 인프라 개선을 위한 보고서 (Report on improving infrastructure for foreigners)

data-analysis python team-project

Last synced: 03 May 2026

https://github.com/salma-mamdoh/project-writing-functions-for-product-analysis

My Project to learn the Basics of Analysis on DataCamp

data-analysis data-camp pandas python

Last synced: 03 May 2026

https://github.com/matteospanio/speed-analysis

A project to analyze the internet speed

bash-script data-analysis

Last synced: 03 May 2026

https://github.com/nathadriele/world-marathon-run-majors-analytics-challenge

This project presents a complete data engineering, analytics, machine learning, and Streamlit dashboard pipeline focused on the Abbott World Marathon Majors: Tokyo, Boston, London, Berlin, Chicago, and New York City. Covering the 2018 to 2025 seasons, it analyzes more than 628,000 runner records and 86 verified winner entries.

challenge data-analysis data-pipeline gradient-boosting lasso-regression linear-regression machine-learning models predictive-modeling python random-forest ridge-regression run-analytics world-marathon

Last synced: 09 Jun 2026

https://github.com/mindlessmuse666/titanic-data-visualization

Проект по визуализации данных о пассажирах Титаника с использованием библиотек Python Matplotlib, Seaborn и Plotly.

data-analysis data-visualization matplotlib pandas plotly python seaborn titanic

Last synced: 04 May 2026

https://github.com/arv-anshul/ipl-api

IPL API using Flask framework and ipl dataset.

api data-analysis fast-api flask flask-api ipl ipl-api python3

Last synced: 04 May 2026

https://github.com/mugilan1309/csv_analyzer

📊 A simple Streamlit-based CSV Analysis & Preprocessing Tool for quick data insights.

csv-processing data-analysis data-visualization machine-learning python streamlit

Last synced: 04 May 2026

https://github.com/drod75/nyc-arrests-analysis

This is a simple Data Science Project made to analyze and display data and trends found within the NYC Arrests Year to Date Dataset.

data-analysis data-visualization folium jupyter-notebook matplotlib-pyplot nyc-opendata nypd python scikit-learn seaborn

Last synced: 04 May 2026

https://github.com/vara-co/crowdfunding_etl

ETL Mini Project based on a Crowdfunding Database, using CRUD operations. SQL, Postgres, and an ERD.

data-analysis database datacleaning erd erdiagram etl jupyter-notebook postgres postgresql regex schema sql

Last synced: 04 May 2026

https://github.com/jacktheprogrammer/time-series-forecasting-and-analysis

My personal project consisting of my personally created notebooks to work with time series forecasting and analysis. In these projects, I've used deep learning using tensorflow, xgboost, statsmodels and scipy libraries of python. The series were of weather, energy consumption and that of stocks.

data-analysis data-science deep-neural-networks energy-consumption machine-learning portfolio prophet-facebook prophet-model python python3 scipy statsmodels stocks tensorflow time-series time-series-analysis timeseries-forecasting weather xgboost

Last synced: 05 May 2026

https://github.com/codewithmayank-py/box-office-analysis-with-seaborn-and-python

This repository contains Python code and datasets for analyzing box office data. Explore trends, patterns, and factors influencing movie performance.

analysis box-office-data-analysis data-analysis data-visualization dataset jupyter-notebook matplotlib pandas python3 seaborn

Last synced: 05 May 2026