An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/manisharora96/data-analysis-of-smartwatch

The project is structured with sample data, step-by-step Jupyter notebooks, and modular Python scripts for automated analysis

data-analysis data-visualization jupyter-notebook python smartwatch-analysis

Last synced: 24 Apr 2026

https://github.com/cyberoctane29/python-for-data-analysis

A repository dedicated to learning Python for data analysis, data science, and data analytics. This collection of Jupyter notebooks covers practical exercises and concepts from the Google Advanced Data Analytics Professional Certificate program.

data data-analysis data-analytics data-science python

Last synced: 24 Apr 2026

https://github.com/mirwais-farahi/data-visualization-with-tableau-specialization

The Specialization teaches Tableau for data visualization and business intelligence. The series covers skills like assessing data quality, designing visualizations and dashboards, and combining data sources to create compelling, data-driven stories.

dashboard data-analysis geospatial map tableau visualization

Last synced: 16 Feb 2026

https://github.com/edwinrlambert/emomap-sentiment-analysis

Analyzes public sentiment related to specific locations in a city (e.g., parks, transit stations, restaurants, neighborhoods) using geo-tagged social media posts, reviews, and comments. The goal is to visualize how people feel across different areas and times.

data-analysis jupyter-notebook python sentiment-analysis

Last synced: 24 Apr 2026

https://github.com/monarch1108/customerinsights-kmeans

Understanding customers using KMeans and RFM (recency, frequency, and monetary) analysis

data-analysis data-visualization kmeans-clustering machine-learning matplotlib numpy pandas scikit-learn

Last synced: 11 May 2026
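
As a rough illustration of the RFM step mentioned in the entry above, the sketch below derives recency, frequency, and monetary values per customer from a list of transactions. It is not taken from the repository: the function name `rfm_table`, the tuple layout, and the sample data are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
    (a hypothetical layout chosen for this sketch).
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # Recency is measured from the most recent purchase.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1      # number of purchases
        monetary[customer] += amount  # total amount spent
    return {
        c: ((today - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


# Hypothetical sample data, not from any real dataset.
transactions = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2024, 2, 10), 15.0),
]
rfm = rfm_table(transactions, today=date(2024, 3, 11))
```

The resulting three-column table is the kind of input that is typically scaled and then passed to a clustering routine such as scikit-learn's `KMeans` to group customers into segments.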

https://github.com/achrefbenammar404/quasi-patterned-conversations-analysis

Official implementation of the IEEE EUROCON 2025 paper "A Computational Approach to Modeling Conversational Systems: Analyzing Large-Scale Quasi-Patterned Dialogue Flows". Authors: Mohamed Achref Ben Ammar, National Institute of Applied Science and Technology (INSAT), University of Carthage, Tunisia; Mohamed Taha Bennani, University of Tunis El Manar (FST).

ai computational-linguistics conversational-agent conversational-ai data-analysis graph-algorithms nlp research research-paper

Last synced: 14 Oct 2025

https://github.com/mariann95/sql_data_warehouse_and_analytics_project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics. This repository also contains a collection of SQL scripts demonstrating various analytical techniques, such as change-over-time, cumulative, performance, data segmentation, and part-to-whole analysis.

data-analysis data-analytics data-cleaning data-engineering data-lakehouse data-science data-science-portfolio data-warehouse data-warehousing datalake datawarehouse datawarehousing etl etl-job etl-pipeline medallion-architecture sql sql-query sql-server sqlserver

Last synced: 06 Jun 2026

https://github.com/fbarffmann/belly-button-challenge

Built an interactive JavaScript dashboard to visualize bacterial biodiversity from belly button samples. Analyzed data from 153 participants and identified OTU 1167 as the most common bacteria.

biodiversity dashboard data-analysis data-visualization interactive-charts javascript json plotly

Last synced: 25 Apr 2026

https://github.com/tmoulik/bikeshare-python

Analysis of Bikeshare data from three major cities

data-analysis data-visualization python udacity-nanodegree

Last synced: 25 Apr 2026

https://github.com/xjwllmsx/hacker-news-engagement

Analyze Hacker News data to reveal which post types and posting hours spark the most discussion, using Python and a reproducible Jupyter notebook.

data data-analysis jupyter python

Last synced: 25 Apr 2026

https://github.com/anushkundu/london-housing-market-analysis

London Housing Market Analysis: An Insightful Power BI Dashboard

data-analysis data-visualization powerbi transformation

Last synced: 27 Jan 2026

https://github.com/ddihora1604/iit_patna

A multifaceted project that applies ML models such as Ridge Classifier, RNN, RIDOR, Rotation Forest, and RUSBoost, integrates SMOTE for class balancing, and handles diverse datasets, including those for seating arrangement tasks.

data-analysis data-visualization datamodelling machine-learning-algorithms python

Last synced: 25 Apr 2026

https://github.com/marielachirinosr/bellabeat-wellness-data-trends

Analyzing smart device data for insights on user activity patterns to optimize interventions for better health outcomes.

data data-analysis data-visualization pandas python python3 tableau tableau-public

Last synced: 25 Apr 2026

https://github.com/aastopher/mma_outcome

Simple exploratory analysis of UFC Fights and Vegas fight odds from 1993 to 2021

data-analysis data-visualization

Last synced: 06 Jun 2026

https://github.com/dcs-training/datavisualisationwithr2021

Data Visualisation with R Course (delivered by the Centre in October/November 2021). This workshop focuses on good practice for creating graphs with R and RStudio. Go to the readme file

data-analysis data-visualisation data-wrangling r

Last synced: 23 Jun 2026

https://github.com/chandansoren/customer-personality-analysis

Predict how different customer segments will respond to a particular product or service.

data-analysis data-visualization python

Last synced: 26 Apr 2026

https://github.com/devexpress-examples/wpf-pivotgrid-customize-the-cell-template

This example demonstrates how to customize the cell appearance in Pivot Grid for WPF.

data-analysis dotnet dxpivotgrid pivot-grid pivot-grid-for-wpf wpf

Last synced: 26 Apr 2026

https://github.com/emaleckova/emaleckova.github.io

My personal website created with Quarto

biology data-analysis data-viz quarto r

Last synced: 23 Jun 2026

https://github.com/analysisbyvivek/crime-data

Analyzes crime patterns across different areas, exploring factors such as crime type, weapon usage, demographic influences, and geographic distribution to uncover trends in frequency, correlations, and hotspots.

apache-superset data-analysis eda jupyter-notebook python

Last synced: 11 May 2026

https://github.com/sferez/gradient_descent

Multiple Linear Regression, Gradient Descent with Python

data-analysis data-science gradient-descent linear-regression python

Last synced: 12 May 2026

https://github.com/deliprofesor/cinematic-data-analytics-and-recommendation-platform

This project analyzes a movie dataset using machine learning algorithms to predict success, explore revenue-popularity relationships, and develop recommendation systems. It employs techniques like K-Means, DBSCAN, GMM, decision trees, PCA, and NLP for insights and personalized suggestions.

clustering content-based-recommendation data-analysis data-visualization decision-tree gmm k-means machine-learning natural-language-processing nlp pca predictive-modeling python recommendation-system scikit-learn user-based-recommendation

Last synced: 26 Apr 2026

https://github.com/akashvarma26/data-analysis-on-imbd-using-sqlite3

Data Analysis on IMDb dataset using sqlite3 and Pandas in Jupyter notebook.

data-analysis jupyter-notebook pandas-dataframe sqlite

Last synced: 27 Apr 2026

https://github.com/arush-codes/paris-olympic-de

Data engineering project on the Paris Olympics 2024

azure data-analysis data-engineering microsoft-azure olympics2024 pipeline

Last synced: 27 Apr 2026

https://github.com/odinleepro/airbnbnewyorkcityanalysis

AirbnbNewYorkCityAnalysis is a comprehensive data analysis and visualization project exploring short-term Airbnb rental trends across New York City (2008–2022). Using open source Airbnb data, the project combines data cleaning, statistical summaries, and Tableau dashboards to uncover pricing patterns, borough-level distribution, and related insights.

airbnb analytics-project data-analysis data-cleaning data-science data-visualization new-york-city real-estate-analytics tableau urban-analysis

Last synced: 27 Apr 2026

https://github.com/mumtaz4118/amazon-iphone-12-data-scrapped

Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages.

data-analysis data-extraction data-science data-scraping html mark-up python

Last synced: 27 Apr 2026

https://github.com/deliprofesor/amazon-movie-analysis-and-visualization

"Amazon Movie Analysis and Visualization" is a Python project that analyzes and visualizes movie data from Amazon.com, including ratings, directors, actors, release years, MPAA ratings, and pricing. The project provides insights into movie trends and popular films, helping users explore key patterns through interactive visualizations.

data-analysis data-visualization matplotlib pandas python

Last synced: 12 May 2026

https://github.com/sohamb21/analysis-of-superstore-dataset

I completed the IBM SkillsBuild Data Analytics Internship Program to develop my Data Analytics skills and apply them to a real-world problem by working on this project.

data-analysis python

Last synced: 27 Apr 2026

https://github.com/busesimsek/sql-projects

A collection of my SQL projects with insights into real-world datasets.

data-analysis data-analytics mysql sql

Last synced: 07 Jun 2026

https://github.com/leticia-ducatti/sales-dashboard-project

Interactive sales dashboard built with Python and Streamlit that shows KPIs, allows filtering, and visualizes sales data.

data-analysis pandas plotly python streamlit

Last synced: 12 May 2026

https://github.com/hfzdzakii/dicoding-airqualityanalysisdata

This repo is a master submission for my Dicoding Final Project. The Air Quality Dataset is used to fulfill the submission. Feel free to explore, and I hope my work gives you some insight!

data-analysis data-visualization streamlit

Last synced: 27 Apr 2026

https://github.com/caesaredia/food-app-user-behavior-analysis

Analyze user behavior and optimize app experience in a food-tech startup through funnel analysis and A/A/B testing. Includes data prep, visualization, and statistical testing in Python.

a-b-testing chi-square data-analysis data-visualization funnel-analysis python statistical-testing user-behavior

Last synced: 27 Apr 2026
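
To make the statistical testing step in the entry above more concrete, here is a minimal sketch of a Pearson chi-square test on conversion counts for two groups. It is an illustration under assumptions, not the repository's code: the function name `chi_square_2x2` and the counts are hypothetical, and no continuity correction is applied.

```python
def chi_square_2x2(conv_a, total_a, conv_b, total_b):
    """Pearson chi-square statistic (no continuity correction) for two groups.

    conv_*: users who completed the funnel step; total_*: users in the group.
    Assumes every expected cell count is positive.
    """
    observed = [
        [conv_a, total_a - conv_a],
        [conv_b, total_b - conv_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [conv_a + conv_b, grand_total - conv_a - conv_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_5PCT_DF1 = 3.841

# Hypothetical counts: 100 of 1000 users convert in control, 130 of 1000 in test.
stat = chi_square_2x2(100, 1000, 130, 1000)
significant = stat > CRITICAL_5PCT_DF1
```

In an A/A/B design, the same function would usually be applied first to the two control groups, where no significant difference is expected, and only then to control versus test.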

https://github.com/airdac/ml-palmerpenguins

Classification and analysis of the palmerpenguins dataset in Python. Team project from UPC's Master's Degree in Data Science

classification data-analysis data-science machine-learning palmer-penguin python upc

Last synced: 07 Jun 2026

https://github.com/mango606/da__

September 2021 Data Analysis Programming assignment

data-analysis python task

Last synced: 27 Apr 2026

https://github.com/inddrsingh/restaurant_orders_mysql

Complex SQL queries on restaurant data for better, more precise insights

data-analysis insights mysql

Last synced: 28 Jan 2026

https://github.com/sumit9000/submission-of-web-server-log-analysis-assessment

This project analyzes one year of real-world HTTP access logs from the University of Calgary’s computer science server. Using Python, pandas, and regular expressions, we clean and parse the data to extract meaningful insights and answer 10 analytical questions.

data-analysis data-cleaning eda jupyter-notebook log-parsing pandas python realworld-data regex web-log-analysis

Last synced: 14 Apr 2026

https://github.com/jbalooshie/pyber_analysis

Analysis of ride share data using Matplotlib and pandas, executed in Jupyter Notebook. Breakdowns are provided based on the city size, average fare, and number of rides taken.

data-analysis data-science data-visualization jupyter-notebook matplotlib pandas python

Last synced: 12 May 2026

https://github.com/josedanielchg/1990s-netflix-movie-insight

Small exploratory analysis of Netflix movie data from the 1990s. This project is part of the DataCamp Associate Data Scientist in Python program and focuses on filtering, visualizing, and extracting insights from a dataset using Python. Analyze trends in movie durations and count short action films to practice key data science skills!

beginner data-analysis python

Last synced: 27 Apr 2026

https://github.com/tillscode/personal-finance-ml-analysis

Machine learning analysis of personal financial data with predictive modeling and interactive dashboard

dashboard data-analysis finance machine-learning python scikit-learn

Last synced: 28 Apr 2026

https://github.com/sweta2501/netflix_dataanalysis

With the help of Netflix Data, I have done some Data Analysis.

data-analysis data-science jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/szymon-budziak/real_estate_house_prices_prediction

Predicting real estate house prices using various machine learning algorithms, including data exploration, preprocessing, model training, and evaluation.

data-analysis data-preprocessing data-science eda jupyter-notebook machine-learning matplotlib numpy optuna pandas predictive-modeling price-prediction python random-forest regression scikit-learn seaborn

Last synced: 21 Jan 2026

https://github.com/ygalvao/bra_scraper_2022

A web scraper bot for the 2nd round of the 2022 Brazilian Federal Elections.

data-analysis data-analytics selenium web-scraper webscraper

Last synced: 12 May 2026

https://github.com/simranshaikh20/diwali-sales-analysis-for-business-insights

A data analysis project on Diwali sales. It shows how much was sold, broken down by state, gender, and age.

data-analysis data-visualization python

Last synced: 28 Apr 2026

https://github.com/gmalbert/rugby

Rugby Data Analysis and Sports Betting

data-analysis rugby sports-betting

Last synced: 31 May 2026

https://github.com/min-thway-htut/r-programming

Repository for R-Programming

data-analysis r-programming

Last synced: 10 Jun 2026

https://github.com/datalopes1/warehouse_rfv

This project performs an RFV (Recency, Frequency, and Value) analysis using data I found in this YouTube video from the Jie Jenn channel.

analise-rfv data-analysis data-science kmeans python rfm-analysis

Last synced: 28 Apr 2026

https://github.com/krypten/playingcardsstatisticalanalysis

Statistical Analysis of Playing Cards (Descriptive Statistics: Final Project)

data-analysis machine-learning machinelearning python statistics udacity

Last synced: 12 May 2026

https://github.com/rajivaleaakash/customer-churn-prediction

A machine learning project focused on predicting customer churn using various data analysis and modeling techniques. The repository includes data preprocessing, feature engineering, exploratory data analysis (EDA), model training, evaluation, and visualization to help businesses identify customers at risk of leaving.

churn-prediction classification customer-churn data-analysis data-science gridsearchcv imblearn machine-learning numpy pandas pyhton randomsearchcv scikit-learn

Last synced: 28 Apr 2026

https://github.com/elmezianech/autoinventory

This project is an end-to-end, fully automated warehouse management solution designed to tackle real-world inventory challenges in the FMCG sector. From real-time data ingestion and predictive analytics to interactive dashboards, this project combines cutting-edge technologies and an event-driven architecture to simulate a business-ready system.

automation dashboard data-analysis data-engineering-pipeline docker etl glue-job inventory-management kafka kpis lambda-functions lstm ml-pipeline mlflow power-bi pytorch redshift s3 streamlit warehouse-management

Last synced: 28 Apr 2026

https://github.com/sufyan14/weather-data-analysis

A Streamlit dashboard that forecasts 30-day weather trends using uploaded CSV data and Facebook Prophet.

data-analysis python streamlit

Last synced: 28 Apr 2026

https://github.com/dcs-training/decode-winterschool

Here you can find material on cluster analysis, data wrangling, and network analysis. Go to the readme file for more info.

data-analysis data-visualisation data-wrangling gephi network-analysis python r statistics

Last synced: 28 Apr 2026

https://github.com/manalisbhavsar/stock-price-prediction

Stock Price Prediction model using Machine Learning and LSTM to forecast future stock prices based on historical data. Achieved a low error rate of 3.2% by leveraging moving averages and deep learning techniques, ensuring accurate predictions.

data-analysis deep-learning lstm machine-learning matplotlib numpy pandas python

Last synced: 28 Apr 2026

https://github.com/rorrell/coviddeaths

A Jupyter Notebook where I create several visualizations based on data about COVID-19 deaths from 2020 to 2024

data-analysis data-visualization jupyter-notebook python3

Last synced: 28 Apr 2026

https://github.com/abhi227070/car-price-prediction

This project implements a machine learning model to predict the price of cars based on various features such as mileage, manufacturing date, fuel type, and more. Users can input car information, and the model will estimate the price of the car based on the provided data. This tool can be useful for both car buyers and sellers to estimate car price.

data-analysis machine-learning machine-learning-algorithms machinelearning python3 regression regression-models scikit-learn scikitlearn-machine-learning

Last synced: 28 Apr 2026

https://github.com/buabaj/fortran-assignment

Code repository for a Fortran and Python climatology assignment.

big-data climatology data-analysis data-visualization fortran90 python

Last synced: 28 Apr 2026

https://github.com/priyanshubiswas-tech/e-commerce_data_analysis

Analyzes 9,994 e-commerce transactions to uncover insights on sales trends, customer behavior, profitability, and logistics using EDA and visualization. Identifies top products, customer segments, and shipping efficiencies to optimize marketing, inventory, and operations, making it valuable for retail, finance, and logistics.

data data-analysis data-visualization pandas pandas-dataframe plotly-analytics-projects plotly-express python

Last synced: 28 Apr 2026

https://github.com/angelalim88/jakarta-air-quality-index-classification

This project classifies Jakarta's Air Quality Index (AQI) from 2010 to 2023 using machine learning models (Random Forest, MLP, SVM) based on pollutant concentrations.

data-analysis data-visua machine-learning scikit-learn tensorflow

Last synced: 13 Oct 2025

https://github.com/szapp/candyanalysis

Case study: Analyze the candy power ranking to identify and recommend popular candy characteristics

data-analysis data-visualization feature-selection interaction-terms

Last synced: 28 Apr 2026

https://github.com/elishah-john/happiness-report-2019

Analysis of "Happiness Report 2019" using Python.

data-analysis data-visualization educational jupyter-notebook python

Last synced: 12 May 2026

https://github.com/tanzeelgcuf/medical-information-rule-based-prediction-model-with-api

A rule-based system for medical information that learns to make new rules. It takes the first 11 text fields and predicts the last 2 text fields, the diagnosis and disposition. It needs to show the keywords used to make the predicted diagnosis and disposition.

data-analysis django machine-learning-algorithms openpyxl python python3 rest-api

Last synced: 28 Apr 2026

https://github.com/priyanshu7639/data_visualization_dashboard

An interactive data visualization tool that combines traditional plotting capabilities with modern AI assistance. It allows users to create and modify visualizations through natural language commands, making data exploration accessible to users of all skill levels.

business-analytics data-analysis data-engineering data-exploration data-science data-visualization datapreprocessing datascience interactive-visualizations matplotlib plotly plotting python research-tool streamlit

Last synced: 12 May 2026

https://github.com/jsimell/sleepanalysis

A Python data analysis project examining the factors that affect sleep quality and the temporal patterns in the sleep data of a single subject.

data-analysis matplotlib numpy pandas python scikit-learn seaborn

Last synced: 14 Apr 2026

https://github.com/josedanielchg/efficient-data-storage-for-predictive-modeling

DataCamp project from the Associate Data Scientist track, focusing on optimizing dataset storage by transforming data types and filtering. Prepares data for efficient machine learning workflows

cleaning-dataset data-analysis jupyter-notebook python

Last synced: 28 Apr 2026

https://github.com/leosimoes/alura-7daysofcode-dados

Challenges from the Data Tracks: Data Science, Machine Learning, and Python Pandas.

data-analysis data-science jupyter-notebook machine-learning python

Last synced: 28 Apr 2026

https://github.com/ricram2/column-name-extractor

Jupyter Notebook that takes a folder with one or more CSV files and returns one CSV with a compendium of column names and 3 example values (first, random, random).

data-analysis pandas

Last synced: 29 Apr 2026
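
The behaviour described in the entry above can be sketched with the standard library alone. This is an assumption-laden illustration rather than the notebook's actual code (which is tagged as using pandas): the function name `summarize_columns`, the output header names, and the `seed` parameter are all hypothetical.

```python
import csv
import random
from pathlib import Path


def summarize_columns(folder, output_csv, seed=None):
    """Write one CSV listing each column of each CSV in `folder`
    with three example values: the first value and two random ones."""
    rng = random.Random(seed)  # seed is optional, for repeatable samples
    rows_out = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.DictReader(handle))
        if not rows:
            continue  # files with no data rows are skipped in this sketch
        for column in rows[0].keys():
            values = [row[column] for row in rows]
            rows_out.append(
                [path.name, column, values[0], rng.choice(values), rng.choice(values)]
            )
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "column", "first", "random_1", "random_2"])
        writer.writerows(rows_out)
    return rows_out
```

Writing the summary outside the scanned folder avoids picking it up as input on a later run.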

https://github.com/i-am-uchenna/sql-data-warehouse-project

The Data Warehouse and Analytics Project is a comprehensive initiative designed to demonstrate the end-to-end process of building a modern data warehouse and deriving actionable insights through SQL-based analytics.

architecture business-intelligence crm data data-analysis database database-management datawarehouse erp etl etl-pipeline model sql sqlserver

Last synced: 15 May 2026

https://github.com/prady2309/sales-prediction-using-python

Implemented using Multiple Linear Regression

data-analysis data-science machine-learning python

Last synced: 29 Apr 2026

https://github.com/emircanakyuzz/veri_gorsellestirilmesi_ve_analizi-analysis_and_visualization_of_dataset

In this work, I carried out analysis and visualization using modules that are well known in data science, such as numpy, pandas, seaborn, and matplotlib.

data-analysis data-science data-visualization jupyter-notebook python

Last synced: 29 Apr 2026

https://github.com/thanaraklee/pyspark-dataframe-operations

This project focuses on utilizing PySpark DataFrames to analyze and visualize data sourced from external datasets, such as CSV files. It provides a practical example of how to manipulate, transform, and gain insights from large datasets using the PySpark framework.

data-analysis dataframe pyspark python

Last synced: 29 Apr 2026

https://github.com/takshshah-16/pizza_sales_sql

SQL-powered pizza sales analytics project using MySQL Workbench to derive business insights through data exploration and queries.

business-intelligence data-analysis database-management mysql sql

Last synced: 09 Oct 2025

https://github.com/kawshik-khan/fake-news-analysis

A fake news detection ML model. It utilizes the Bag of Words model for text vectorization and a Multinomial Naive Bayes classifier to predict whether news articles are real or fake. The project covers data preprocessing, model training, and performance evaluation with accuracy metrics and a confusion matrix.

data-analysis data-science machine-learning ml python3

Last synced: 08 Jun 2026
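
The combination named in the entry above, Bag of Words features with a Multinomial Naive Bayes classifier, can be sketched in a few lines of plain Python. This is a teaching sketch under assumptions, not the repository's implementation: the function names, the whitespace tokenizer, the add-one smoothing, and the tiny training set are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (text, label). Returns (class_counts, word_counts, vocabulary)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)  # bag of words: order ignored, counts kept
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, far too small for real use.
docs = [
    ("shocking miracle cure doctors hate", "fake"),
    ("you will not believe this miracle", "fake"),
    ("council approves new budget", "real"),
    ("budget report published by council", "real"),
]
model = train(docs)
```

In practice the same pipeline is usually built from library components, for example scikit-learn's `CountVectorizer` and `MultinomialNB`, and evaluated on held-out data with accuracy and a confusion matrix.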

https://github.com/prateek5525/yt-analysis-project

This project utilizes the YouTube Data API to analyze channel and video performance, offering insights into subscriber counts, views, video metrics, and monthly trends. It generates visual reports and exports data in CSV format, aiding in effective decision-making and performance tracking.

data-analysis jupyter-notebook python3 seaborn-plots youtube-api

Last synced: 29 Apr 2026

https://github.com/nivasharmaa/spiderverse

A comprehensive Java program for analyzing and managing events and data points within a fictional spiderverse. Features event handling, anomaly detection, cluster management, and robust file I/O operations.

advanced-algorithms anomaly-detection clustering data-analysis file-io object-oriented-programming

Last synced: 29 Apr 2026

https://github.com/alefrp/properties_dbt

A DBT project for analyzing city property data.

data-analysis data-warehouse dbt python sql

Last synced: 13 Oct 2025

https://github.com/kasraskari/learn-r-codes

A learning repository for R programming, covering data manipulation, visualization, and statistical analysis. (Work in progress!) 🚧

data-analysis data-analysis-r data-visualization r r-examples r-graphics r-statistics statistics

Last synced: 08 Jun 2026

https://github.com/satyacoder29/crowdfunding-in-sql

Crowdfunding is a method of raising funds for projects or causes by collecting small contributions from a large group of people, usually through online platforms. It enables individuals, startups, and nonprofits to secure funding, offering rewards or recognition in exchange, and helps bring ideas to life without traditional financing.

data-analysis data-cleaning database-management mysql-database quries sql sql-functions sql-server views

Last synced: 29 Apr 2026

https://github.com/agailloty/preprocess

preprocess is a fast data analysis preprocessing tool.

cli data-analysis preprocessing-data

Last synced: 12 May 2026

https://github.com/mdaffailhami/king_county_home_sales_analysis

This repository contains code and analysis for exploring home sales data in King County, featuring geospatial mapping to visualize trends and factors influencing housing prices, including location, size, and various property features, using Python and popular data analysis libraries.

data-analysis data-science folium-maps geospatial python

Last synced: 29 Apr 2026

https://github.com/fisseha-estifanos/telecom

A showcase repository for a specific telecommunication company. It analyzes several features of a telecommunication dataset and generates insights from them. The generated insights can be seen at https://github.com/Fisseha-Estifanos/telecom-visualizer or at https://fisseha-estifanos-telecom-visualizer-home-huxgy0.streamlitapp.com/

data-analysis notebooks-jupyter python visual-studio-code visualization

Last synced: 12 May 2026

https://github.com/parthds02/-daily-calorie-count-meal-plan-generator-

Welcome to the Daily Calorie Count Meal Plan Generator project! This Streamlit web application is designed to create personalized meal plans based on user inputs such as age, weight, gender, and calorie goals. It also allows users to download their customized meal plans as PDFs.

calories-tracker data-analysis data-science pdf-generation streamlit vscode

Last synced: 13 May 2026

https://github.com/dcs-training/network-analyisis-python

Course material for introducing data visualization with Altair and network analysis with NetworkX (in Python). Go to the readme file

data-analysis data-visualisation network-analysis python text-analysis

Last synced: 29 Apr 2026

https://github.com/i7t5/sentimentnlp

Sentiment analysis for COMP 435 Introduction to Machine Learning, Spring 2025

data-analysis jupyter-notebook machine-learning nlp python sentiment-analysis

Last synced: 29 Apr 2026

https://github.com/devanshsahu47/prime-content-analytics

Prime Data Explorer analyzes Amazon Prime's content and credits data to uncover trends in release years, genres, and ratings. It cleans, merges, and visualizes the data to provide actionable insights for optimizing content strategy and boosting audience engagement.

data-analysis data-visualization exploratory-data-analysis jupyter-notebook python3

Last synced: 13 May 2026

https://github.com/findmyway/dataframe-in-julia

A quick introduction to DataFrame in Julia for users coming from Python

data-analysis dataframe julia jupyter-notebook

Last synced: 29 Apr 2026