An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/grypesc/graduateadmissions

Visualization, analysis and predictive modeling of a Kaggle graduate admissions dataset.

data-analysis data-mining data-science data-visualization dataset

Last synced: 08 Jul 2025

https://github.com/markoshb/machine-learning-subject

Implementation of multiclass classification problems in R

classification-model data-analysis r

Last synced: 14 Mar 2025

https://github.com/oguzgn/a-case-study-for-a-livestreaming-platform

This project aims to analyze livestream watch times of users across different regions. The goal is to identify the top 5 users with the highest watch time for each region. The analysis involves multiple SQL transformations to extract meaningful insights from the data.

bigquery data data-analysis data-modeling live-streaming sql

Last synced: 23 Jun 2025
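
The "top 5 users per region" ranking described in the entry above can be illustrated outside SQL as well. The following is a minimal Python sketch, not the repository's code: the record layout `(region, user, minutes)`, the function name `top_users_by_region`, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def top_users_by_region(records, n=5):
    """Sum watch time per (region, user) and keep the n largest totals per region."""
    totals = defaultdict(lambda: defaultdict(float))
    for region, user, minutes in records:
        totals[region][user] += minutes
    return {
        region: heapq.nlargest(n, users.items(), key=lambda kv: kv[1])
        for region, users in totals.items()
    }


# Hypothetical sample data: (region, user, watch time in minutes).
sample = [
    ("EU", "ana", 30), ("EU", "ben", 45), ("EU", "ana", 20),
    ("US", "cy", 10), ("US", "dee", 60),
]
result = top_users_by_region(sample, n=1)
```

In SQL the same idea is usually expressed with an aggregate followed by a per-region ranking; the sketch only mirrors that two-step shape.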

https://github.com/hfxbse/dhbw-data-analysis

Exploratory data analysis R notebook for the module T3INF4333 "Grundlagen Data Science" held in 2024 by Lothar B. Blum at the DHBW Stuttgart.

data-analysis data-science dhbw dhbw-stuttgart ggplot2 r r-notebook

Last synced: 04 May 2026

https://github.com/virajbhutada/article-recommendation-system

This project aims to redefine content discovery by delivering personalized article recommendations tailored to individual user preferences. We use advanced machine learning techniques like PCA and K-means clustering to analyze user behavior and article characteristics to provide highly accurate recommendations.

anaconda article-recommendation clustering-algorithm data-analysis data-science keras-tensorflow machine-learning machine-learning-algorithms ml-models numpy pandas plotly python scikit-learn scipy

Last synced: 06 Jan 2026

https://github.com/tnickster/ai-analyst-agent

Ask questions about your business data in plain English, get automatic SQL queries and visualizations, and receive AI-powered insights and recommendations. No SQL knowledge required.

ai-assistant business-analytics business-intelligence data-analysis data-analyst data-visualization database-query gpt-4 langchain llm mysql natural-language-processing openai plotly python sql-generation streamlit

Last synced: 08 Apr 2026

https://github.com/roberto-butti/fit_explorer

FIT file explorer, written in Go.

data-analysis fitness geospatial golang

Last synced: 12 Apr 2025

https://github.com/garcane/nike_web_crawler

This project involves web scraping Nike's product pages to extract product names, prices and links. The project showcases three different implementations of the web crawler using Selenium and BeautifulSoup. It also includes visualisation of the scraped data using Matplotlib and Seaborn.

beautifulsoup data-analysis data-visualization python selenium web-crawler web-scraper webcrawler webscraper webscraping webscraping-beautifulsoup

Last synced: 18 Apr 2026

https://github.com/sarathchandranpm/cleaning-and-exploratory-analysis-of-global-layoff-data

This project involves a thorough data analysis and cleaning process centered on global layoff data. It showcases advanced data management abilities by integrating data cleaning methods with a detailed exploration of workforce reduction patterns across various companies, industries, and countries.

data-analysis data-cleaning mysql sql

Last synced: 22 Sep 2025

https://github.com/jen-uis/loan-status-prediction

This repository contains project materials for the Winter STAT 206 class, University of California, Riverside, A. Gary Anderson School of Management.

data data-analysis data-analytics data-cleaning data-visualization descriptive-analytics julia julia-language jupyter-notebook predictive-analytics predictive-modeling team-collaboration

Last synced: 02 Jan 2026

https://github.com/messi10tom/ai-based-grade-prediction

GDSC task-1: Build a model to predict a student’s final grade based on features such as attendance, participation, assignment scores, and exam marks.

ai data-analysis data-science regression streamlit

Last synced: 02 May 2026

https://github.com/revan-alqahmi/summarize-talabat-company-reviews

A natural language processing project that analyzes Arabic comments about the Talabat company and classifies them as positive, negative, or neutral using machine learning algorithms and natural language processing techniques.

artificial-intelligence data-analysis machine-learning-algorithms natural-language-processing python

Last synced: 11 Jan 2026

https://github.com/jcbritobr/iris

Iris dataset and data analysis with the Julia language.

data-analysis data-science data-visualization iris-dataset julia-language

Last synced: 06 Apr 2025

https://github.com/anurag-kumar-molankala/anurag-kumar-molankala

I'm a Power BI developer with a passion for data visualization and UI/UX design. I create interactive dashboards that turn data into clear, actionable insights for smarter decision-making.

business-intelligence dashboards data-analysis data-visualization dax-query mlanguage powerbi sqlserver uiuxdesigner

Last synced: 25 Jan 2026

https://github.com/sumidcyber/dataviz-master

This Python application provides a user-friendly interface to load and visualize the contents of a CSV file. Users can choose from various types of graphs and perform analyses on the dataset.

data-analysis data-analysis-project data-analysis-python database databases python python3

Last synced: 02 Jan 2026

https://github.com/pymarcus/tcc_sistemasdeinformacao2025

This application is part of a research project that aims to use a Gemini AI agent to identify "atoms of confusion" (minimal code elements that cause misunderstandings) in the context of software engineering.

atoms-of-code ci-cd clean-architecture concurrent-programming data-analysis design-patterns gemini-api golang ifmg inteligencia-artificial postgresql software-engineering solid tcc tdd workerpool

Last synced: 14 May 2026

https://github.com/theanujsinha01/rainfall-prediction-using-machine-learning

This project predicts whether it will rain or not based on weather features like pressure, humidity, dew point, cloud cover, sunshine, wind direction, and wind speed. We use a Random Forest Classifier, a popular ML algorithm, trained on historical weather data. The model learns patterns and helps us forecast rain chances.

classification data-analysis eda machine-learning-algorithms matplotlib numpy pandas python scikit-learn seaborn supervised-learning

Last synced: 11 Apr 2026

https://github.com/arv-anshul/easy-analysis

A Python package to perform data analysis easily. (Not recommended)

arv-dumped data-analysis data-science easy-analysis eda pypi pypi-package python3

Last synced: 14 May 2025

https://github.com/jen-uis/la-crime-data-analysis

This repository contains project materials for the Fall 2023 MGT 256 class. The project was completed with assistance from Professor Adem Orsdemir.

business-analytics crime-data crime-data-analysis data-analysis knn la-crimes-from-2020 la-safe r r-markdown r-studio report-generation rmd united-states visualization

Last synced: 14 Mar 2025

https://github.com/karatechop/noaa-storm-database-data-analysis

Analysis of population health and economic consequences of events documented in the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

data-analysis knitr r rmarkdown

Last synced: 14 Mar 2025

https://github.com/misaghmomenib/stock-momentum-analysis

A Python-based data analysis tool designed to evaluate stock momentum. It leverages historical market data to identify trends, predict price movements, and assist in making informed investment decisions.

data-analysis data-analysis-python data-visualization git open-source python

Last synced: 10 Apr 2025

https://github.com/x1ao4/doc-merger

Merge two relatively incomplete documents into one complete document via a Python script.

data-analysis data-merging document-analysis document-comparison document-processing documents filtering filtering-data merge merge-documents

Last synced: 28 Jun 2025

https://github.com/hafeez-urrehman/mobile-price-classification

In the Mobile Price Classification project, I built a predictive model to categorize mobile phones into different price ranges based on their features by applying machine learning techniques.

data-analysis linear-regression machine-learning mobile-price-prediction model-save-and-load predictive-modeling

Last synced: 15 May 2026

https://github.com/garciparedes/castile-and-leon-crops

Data analysis of crop area in Castile and León over recent years.

castile-and-leon crops data-analysis data-science jupyter jupyter-notebook notebook spain

Last synced: 06 Jun 2026

https://github.com/palwisha-18/weather-api

Weather API built in Flask

data-analysis flask html pandas python

Last synced: 10 May 2026

https://github.com/sarincr/basics-of-julia-programming-language

Julia is a high-level, high-performance, dynamic programming language. While it is a general purpose language and can be used to write any application, many of its features are well-suited for high-performance numerical analysis and computational science.

data data-analysis data-mining data-science data-visualization dataanalysis dataanalytics datascience julia julia-language julia-library julia-package julialang machine-learning

Last synced: 19 May 2026

https://github.com/mijisu0103/sk-ai-data-academy

This repository contains the projects and assignments completed during the SKADA course, which focused on foundational skills in Python, data analysis, and machine learning. Here, you will find various scripts, notebooks, and documentation that showcase my learning journey and the practical applications of the concepts covered in the course.

data-analysis machine-learning python

Last synced: 10 Jul 2025

https://github.com/ronaldkanyepi/python-sreamlit-duplicate-records-finder-remover

A duplicate remover for CSV, Excel, or TXT files that matches rows on a single column or on multiple columns.

css data-analysis data-visualization datascience python streamlit

Last synced: 07 May 2026
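
Removing duplicates "based on single or multi columns", as the entry above puts it, comes down to choosing which columns form the comparison key. The sketch below illustrates that idea with the Python standard library only; it is not the repository's implementation, and the helper name `drop_duplicates`, the column names, and the sample rows are hypothetical.

```python
import csv
import io


def drop_duplicates(rows, key_columns):
    """Keep the first row seen for each distinct combination of key_columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical CSV content, read from a string instead of a file.
raw = "name,city\nAna,Rome\nBen,Oslo\nAna,Rome\nAna,Oslo\n"
rows = list(csv.DictReader(io.StringIO(raw)))

by_name = drop_duplicates(rows, ["name"])          # single-column key
by_both = drop_duplicates(rows, ["name", "city"])  # multi-column key
```

A single-column key is stricter about what counts as a duplicate, so it keeps fewer rows than a key built from several columns.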

https://github.com/gher-uliege/stareso-data-processing

A set of tools to read, plot and process data from STARESO

coastal corsica data-analysis data-processing ocean-sciences oceanography

Last synced: 30 Mar 2025

https://github.com/jasontanx/capstone-project-machine-learning

A final semester project from my MSc Data Science course

data-analysis datascience machinelearningprojects tourism-data

Last synced: 26 Mar 2025

https://github.com/rayyan9477/coin-detection-project

This Coin Detection Project leverages machine learning techniques to identify coins using a dataset from Kaggle. Key libraries utilized include OpenCV for image processing, TensorFlow for model training, and Pandas for data manipulation. The project also employs NumPy for numerical operations and Matplotlib for visualization.

computer-vision data-analysis data-science data-visualization machine-learning notebook python

Last synced: 15 May 2026

https://github.com/manfredhair/wine-analysis-knn

Wine data analysis using KNN with Python, pandas, and scikit-learn.

data-analysis data-science knn wine

Last synced: 16 Sep 2025

https://github.com/udaykumar-dhokia/d8a

d8a is a modern data analytics platform that provides powerful visualization and analysis tools for your data.

data-analysis data-visualization fullstack-development

Last synced: 16 Sep 2025

https://github.com/sd7campeon/yelp-sentiment-analysis-with-python-bs4-and-llm

A scalable pipeline for automated extraction, preprocessing, and sentiment analysis of Yelp reviews. Uses advanced HTTP requests, HTML parsing, and text normalization (tokenization, stopword removal, lemmatization) to enable precise polarity and subjectivity analysis for consumer insights and business analytics.

beautifulsoup beautifulsoup4 business-analytics cuda data-analysis nlp-machine-learning nltk opinion-mining pandas python python3 requests-library-python sentiment-analysis text-preprocessing textblob torch web-scraping yelp-reviews

Last synced: 06 May 2026

https://github.com/jamesnw/wtb-data

Explore beer addition and style info from WhatToBrew.com

data-analysis homebrewing jupyter-notebook python3

Last synced: 18 Apr 2026

https://github.com/atharvbyadav/dark-store-feasibility-analysis

A hackathon project analyzing the feasibility of setting up dark stores using data-driven insights. Focuses on demand clustering, location intelligence and logistics optimization.

business-intelligence dark-store data-analysis geospatial-analysis hackathon hackathon-project location-intelligence logistics pandas python retail-analytics urban-planning visualization

Last synced: 20 Jan 2026

https://github.com/wiseaidev/truth-guard

Analyzing a 79k Dataset of Misinformation and Fake News

data-analysis fastapi lstm machine-learning python supervised-learning

Last synced: 19 Jan 2026

https://github.com/edseldim/FirstRoundElectionsFr

A data visualization spreadsheet in Excel.

data-analysis data-visualization excel pandas python

Last synced: 02 Aug 2025

https://github.com/viseshrp/community_health_indicator

Android app to fetch, organize, and represent NYC health data.

android data-analysis data-visualization health

Last synced: 03 Mar 2025

https://github.com/sourabh-kumar04/numpy-basic

Numpy-Basic is a structured learning repo covering NumPy from basics to advanced. It includes arrays, indexing, reshaping, filtering, vector ops, angle functions, stats, and .npy file handling. Each concept is explained with code, examples, and Matplotlib visualizations in both light and dark modes. Ideal for students and data learners.

data-analysis data-science data-visualization learning learning-resources machine-learning matplotlib numerical-computing numpy python python-library python-programming

Last synced: 10 May 2026

https://github.com/mksingh431/python-project

Learn Pandas with exercises and sample projects

data-analysis data-science data-visualization project projects python

Last synced: 03 May 2026

https://github.com/luminati-io/amazon-dataset-samples

A sample dataset of over 1,000 Amazon product listings, extracted using the Bright Data API, perfect for competitive analysis, market trends, and eCommerce insights.

amazon api data-analysis data-science dataset ecommerce products web-scraping

Last synced: 03 Jan 2026

https://github.com/muneeb1030/eda-of-physionets-ecg

EDA of the PhysioNet dataset "A Large Scale 12 Lead Electrocardiogram Database for Arrhythmia Study 1.0.0". This project focuses on the preprocessing of electrocardiogram (ECG) signals and uses Principal Component Analysis (PCA) for dimensionality reduction.

12-lead-ecg data-analysis ecg-signal eda pca python3 wfdb

Last synced: 25 Jul 2025

https://github.com/balapriyac/python-data-analysis

Code along to simple data science and analysis projects and tutorials

data-analysis data-science python

Last synced: 25 Jul 2025

https://github.com/hariyebk/eplinsights

English Premier League 2018/2019 Data Analysis

class-composition data-analysis filesystem-library

Last synced: 26 Jul 2025

https://github.com/hoangsonww/fred-banking-data-analysis

💸 AI-powered banking data explorer that combines FRED API insights with vector search, regression analysis, and interactive chat via OpenAI, Claude, and Gemini. Built with TypeScript, React, and Express for seamless full-stack performance.

anthropic chartjs claude-ai data data-analysis data-analytics data-science data-visualization fred fred-api gemini google-generative-ai logistic-regression multiple-regression openai pinecone react regression typescript vector-database

Last synced: 09 Apr 2025

https://github.com/rayyan9477/diamond-price-forecasting

This is a comprehensive machine learning project focused on predicting diamond prices. Using a dataset of diamond attributes, the project implements various machine learning models to forecast prices. Key features include data preprocessing, exploratory data analysis (EDA), and model training with algorithms such as Linear Regression, Decision Tree

data-analysis data-science decision-trees eda linear-regression machine-learning

Last synced: 26 Jul 2025

https://github.com/incubrain/awesome-maharashtra-data

A collection of datasets specific to Maharashtra, India. WIP

ai artificial-intelligence data data-analysis data-science datasets maharashtra marathi

Last synced: 23 May 2026

https://github.com/windjammer6/8.-star-wars-data-analysis-python

A personal project to analyse data from a Star Wars survey. Python libraries used: Pandas, Matplotlib

data-analysis python

Last synced: 27 Jul 2025

https://github.com/adrianycmc/introducaoadatascience

Exploring data using Python, Pandas, and Google Colaboratory.

data-analysis data-science jupyter pandas python

Last synced: 30 Apr 2026

https://github.com/tbep-tech/ecometab-r-training

Website materials for R training on ecosystem metabolism

data-analysis open-science workshop

Last synced: 19 Feb 2026

https://github.com/tbep-tech/verified-wbids

Materials for evaluation of verified WBIDs in the Tampa Bay watershed

data-analysis open-science tampa-bay tbep water-quality

Last synced: 19 Feb 2026

https://github.com/birkkarlsen/beam_dynamics_tools

Repository filled with functions related to the analysis of longitudinal beam dynamics measurements and simulations

accelerator-physics beam-dynamics data-analysis

Last synced: 19 Sep 2025

https://github.com/tbep-tech/red-tide-twitter

Supplementary materials to accompany Skripnikov et al. red tide Twitter analysis

ccmp-li1 ccmp-wq3 data-analysis open-science tampa-bay tberf water-quality

Last synced: 19 Feb 2026

https://github.com/codingprivacy/feedback-portal-system

AI-based feedback portal system that collects periodic feedback from users via a human-friendly chatbot, analyzes the responses through NLP and sentiment analysis, and visualizes the analysis on the portal website.

artificial-intelligence bokeh chatbot data-analysis flask mysql-database nlp portal python sentiment-analysis visualization website

Last synced: 19 Sep 2025

https://github.com/jcaperella29/financial-data-scraper

Financial Data Scraper is a Python-based web scraping tool using Selenium to extract financial data from Stock Analysis. It scrapes Income Statement, Balance Sheet, Cash Flow, and Ratios for multiple companies and saves them as CSV files.

automation data-analysis finance financial-statements investment python selenium stock-market web-scraping

Last synced: 28 Jul 2025

https://github.com/andystmc/nextflownyc

Developed a machine learning model (Bidirectional LSTM) to forecast NYC traffic volumes using 10 years of automated traffic count data. Achieved strong predictive accuracy, demonstrating the power of deep learning for urban traffic analysis.

data-analysis data-cleaning data-science data-visualization exploratory-data-analysis feature-engineering hyperparameter-tuning jupyter-notebook lstm-neural-networks machine-learning numpy pandas predictive-modeling python3 scikit-learn tensorflow-keras traffic-flow-forecasting

Last synced: 07 Apr 2026

https://github.com/nafisalawalidris/sales-performance-dashboard

Sales Performance Dashboard: Analyze and visualize sales data using Power BI. Gain insights into trends, customer segments, product performance, and geographic distribution. Make data-driven decisions to optimize sales strategies and maximize revenue.

analytics-revenue dashboard-power-bi data data-analysis intelligence-sales optimization performance sales visualization-business

Last synced: 03 Feb 2026

https://github.com/numbersprotocol/dyda

Dynamic data pipeline framework

ai artificial-neural-networks data-analysis data-science

Last synced: 07 Nov 2025

https://github.com/archived-blueprints/amazonathena-blueprints

Simplified blueprints for building data pipelines with Amazon Athena.

amazon-athena athena cli data-analysis data-engineering data-science elt etl

Last synced: 29 Jul 2025

https://github.com/rayyan9477/youtube-spam-detection-with-flask-and-machine-learning

This is a web application built using Flask that detects spam comments on YouTube using a Naive Bayes classifier. It leverages techniques such as CountVectorizer for feature extraction and scikit-learn for machine learning. The application reads data from a CSV file and predicts whether a comment is spam or not.

data-analysis data-science machine-learning nlp-machine-learning spam-detection

Last synced: 21 Sep 2025

https://github.com/atharvbyadav/expensemate

A simple, lightweight personal finance tracker built with Streamlit and SQLite. Log expenses, visualize spending habits, manage budgets, and download reports – all through an interactive web interface.

budgeting data-analysis data-visualization expense-tracker finance-app open-source pandas personal-finance plotly python sqlite streamlit streamlit-webapp

Last synced: 28 Apr 2026

https://github.com/hemanthkumarsunkari27/pmay_analysis_project

Built for the 1st AI for Good Hackathon by Snowflake, this project uses data analytics and AI to explore housing and sanitation trends in India under PMAY. Using Snowflake and Streamlit, it provides interactive insights into regional disparities, helping guide sustainable infrastructure development.

data-analysis data-visualization pmay-analysis sanitation-coverage snowflake-integration streamlit-dashboard sustainable-development

Last synced: 26 Mar 2025

https://github.com/tynoee/covid19_data_analysis

An analysis of a COVID-19 dataset using multiple SQL queries. The dataset includes information on COVID-19 cases, such as confirmed cases, deaths, and recoveries, segmented by geographical location and time period.

data-analysis excel sql sqlserver-2019 tableau tableau-public

Last synced: 16 Feb 2026

https://github.com/prajakta1321/kaggle-ai-report-2023

A report describing trends in the emergence of AI over the years.

data-analysis data-visualization python3

Last synced: 28 Jun 2025

https://github.com/gad-dimnt-cptec/scanplot

A simple plotting system for SCANTEC.

data-analysis jupyter-notebook pandas python scantec

Last synced: 17 Jan 2026

https://github.com/gappeah/london-housing-price-dashboard

This Excel-based Housing Visual Dashboard provides a comprehensive view of average house prices across various boroughs in London from 1996 to 2013. The dashboard is designed to offer insights into housing market trends and price variations across different areas of London over time.

data data-analysis data-visualization excel visual

Last synced: 31 Jul 2025

https://github.com/dannyben/datamix

DSL for manipulating tabular data

csv data data-analysis data-engineering gem ruby tabular-data

Last synced: 31 Jul 2025

https://github.com/michellepellon/jobx

A modern, powerful job scraper for LinkedIn, Indeed and beyond.

compensation data data-analysis indeed indeed-scraping jobs jobsearch linkedin linkedin-scraper

Last synced: 17 Jan 2026

https://github.com/banyc/dfsql

SQL REPL/lib for Data Frames

cli csv data-analysis jsonl ndjson repl sql

Last synced: 31 Jul 2025

https://github.com/tim-hub/python-course

A new Python course: an attempt to offer MOOC-style learning resources and content for Python learners.

data-analysis learning python

Last synced: 17 Mar 2025

https://github.com/myself-aas/quantium_data_analytics_forage

This project analyzes retail customer chip purchasing behavior using Python, focusing on customer segmentation and key spending drivers to provide data-driven insights for strategic category management recommendations.

data-analysis data-engineering data-science data-visualization feature-engineering forage internship-project matplotlib-pyplot numpy-library pandas-dataframe pearson-correlation python quantium-virtual-experience scipy-stats seaborn

Last synced: 31 Jul 2025

https://github.com/mynenik/xyplot-win32

XYPLOT Plotting and Data Analysis Program for 32-bit Windows

cpp data-analysis data-manipulation data-visualization forth mfc windows-app

Last synced: 18 Mar 2025

https://github.com/Zen204/airbnb-availability

A machine learning model that predicts Airbnb listing availability, utilizing feature engineering and supervised learning techniques to improve guest experience and optimize host management.

binary-classification data-analysis data-preprocessing data-visualization feature-engineering machine-learning matplotlib model-evaluation nlp pandas predictive-modeling python scikit-learn seaborn supervised-learning

Last synced: 02 Apr 2025

https://github.com/shubhamgoyal575/diwali-sankranti-promotion-sales

This Power BI dashboard analyzes sales performance during Diwali and Sankranti festivals. It provides insights into revenue trends, top-selling products, regional sales distribution, and customer purchasing behavior to help optimize festive season sales strategies. 🚀

buisness-intelligence dashboard data-analysis data-visualization diwali-sankranti-sales-analysis excel fast-moving-consumers-goods fmcg microsoft-power-bi mysql power-query powerbi revenue-insights sales-dashboard sales-insights sql

Last synced: 02 Mar 2026

https://github.com/athul64/exploratory-data-analysis

Preprocesses and analyzes an employee dataset, presents the findings graphically, and derives meaningful insights to help better understand the company’s workforce.

colab-notebook data-analysis data-visualization matplotlib numpy pandas python seaborn statistical-analysis

Last synced: 25 Feb 2026