An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/santiagortiiz/snowflake-data-warehousing

Snowflake University. Snowflake Data Warehousing. Fundamentals

big-data data-analysis data-warehouse olap snowflake

Last synced: 19 Mar 2026

https://github.com/mohamed3nan/udacity

Udacity Data Analysis Nanodegree Program

data-analysis data-visualization numpy pandas python

Last synced: 10 Apr 2026

https://github.com/chen0040/pyspark-advanced-algorithms

Samples of Advanced Algorithms and Data Analysis implemented in pyspark

advanced-algorithms data-analysis map-reduce pyspark

Last synced: 12 Jan 2026

https://github.com/anurag-kumar-molankala/data-professional-survey

This Power BI dashboard analyzes survey responses from data professionals, covering key aspects such as salary distribution, job satisfaction, and preferred programming languages. The insights help understand trends in the data industry and what matters most to professionals.

dashboard data-analysis data-visualization dax-measures dax-query demographics etl-process excel-import power-bi salary-analysis sql-server survey-analysis trend-analysis

Last synced: 02 Feb 2026

https://github.com/wewoc/garmin_local_archive

Secure, local-first archive for Garmin Connect health data (HRV, sleep, activities). Private & offline. Structured for local analysis (Excel, HTML-Dashboard, Ollama, Open WebUI, AnythingLLM). Your data stays on your machine.

backup dashboard data-analysis fitness-tracker garmin garmin-connect ollama open-webui privacy privacy-enhancing-technologies privacy-first privacy-focused python self-hosted

Last synced: 16 Apr 2026

https://github.com/snehilk1312/data_science

This repository contains the data science work I have done recently, along with visualization, cleaning, models, statistics, courses, and datasets. :=)

data-analysis data-science glove natural-language-processing nlp nltk statistics word2vec

Last synced: 02 Apr 2026

https://github.com/sadia-khan13/supervised_machine_learning

This repository is meant to document my hands-on experience with supervised learning algorithms and techniques. It includes a variety of exercises, and experiments using different types of data and tools. Each file represents a step forward in building my machine learning skills.

data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms python sciket-learn supervised-machine-learning

Last synced: 06 Mar 2026

https://github.com/michellepellon/jobx

A modern, powerful job scraper for LinkedIn, Indeed and beyond.

compensation data data-analysis indeed indeed-scraping jobs jobsearch linkedin linkedin-scraper

Last synced: 17 Jan 2026

https://github.com/archie-cm/a-b-testing-mobile-games

This project aims to examine what happens when the first gate in the game is moved from level 30 to level 40. When a player installed the game, he or she was randomly assigned to either gate30 or gate40.

abtesting data-analysis python retention-rate

Last synced: 17 Apr 2026

https://github.com/hebaqaisar/movie-recommender-system

AI Recommender System - recommends similar movies based on directors, tags, name, type, actors, genre, etc.

artificial-intelligence data-analysis data-mining data-science jupyter-notebook machine-learning machine-learning-algorithms ml movies-rate pycharm python

Last synced: 17 Apr 2026

https://github.com/revan-alqahmi/summarize-talabat-company-reviews

A natural language processing project that analyzes Arabic comments about the Talabat company and classifies them as positive, negative, or neutral using machine learning algorithms and natural language processing techniques.

artificial-intelligence data-analysis machine-learning-algorithms natural-language-processing python

Last synced: 11 Jan 2026

https://github.com/alfikiafan/air-quality-analysis

This repository contains a comprehensive data analysis project on an Air Quality Dataset, covering the complete data analysis process from data gathering, cleaning, and exploratory data analysis (EDA) to building a fully interactive dashboard using Streamlit.

air-quality data-analysis dicoding

Last synced: 17 Apr 2026

https://github.com/discdiver/new-belgium-ratings

Find the most popular New Belgium beers of all time!

beautifulsoup data-analysis pandas python seaborn webscraping

Last synced: 10 Apr 2026

https://github.com/haloapping/pisangijo

A collection of Python-based libraries and frameworks for data analysis, data science, machine learning, deep learning, and much more 🐍.

belajar data-analysis data-science deep-learning forecasting libraries machine-learning perkakas pustaka python3 recommender-system referensi tools

Last synced: 13 Jun 2026

https://github.com/noodleslove/house-of-representative-analysis-i

This project uses public data about the stock trades made by members of the US House of Representatives.

data-analysis data-science eda kaggle-dataset matplotlib-pyplot pandas python stocks-trading

Last synced: 18 Apr 2026

https://github.com/jossimmar/ensa-ss25

Repository for managing consumption data of the major customers (Clientes Mayores) of ENSA, part of Grupo Distriluz.

data-analysis electrical-engineering python sqlite

Last synced: 30 Mar 2025

https://github.com/prernarohra/heart-disease-prediction

This project develops a machine learning model to predict heart disease risk based on symptoms and medical history. Logistic Regression achieved the best accuracy, as it works well for binary classification problems.

artificial-intelligence data-analysis data-science dataset heartdisease-prediction machine-learning models

Last synced: 06 Nov 2025

https://github.com/mdaffailhami/customer-data-analysis

This repository contains code and analysis for exploring customer data, focusing on profiling and contact preferences. The project includes various stages of data processing, from raw data preparation to final cleaned datasets, and employs Python and popular data analysis libraries to uncover insights and trends.

data-analysis data-cleaning data-science data-visualization jupyter jupyter-notebook pandas plotly python

Last synced: 03 Mar 2026

https://github.com/noeyislearning/sharpe-ratio-amazon-facebook

Explore the Sharpe Ratio and its application to evaluate the performance of two tech giants: Amazon and Facebook.

amazon data-analysis data-science data-visualization facebook python3 sharpe-ratio

Last synced: 27 Mar 2025

https://github.com/djo/data-analysis

Data Analysis course notebooks in R

data-analysis r

Last synced: 29 Mar 2025

https://github.com/jrbourbeau/cr-composition

IceCube cosmic-ray composition analysis

cosmic-rays data-analysis machine-learning physics python

Last synced: 20 Apr 2026

https://github.com/ipanalytics/vpn-provider-overlap-intelligence

Aggregate VPN provider infrastructure overlap analysis: exact-IP overlap, shared /24 prefixes, hosting dependency, and provider relationship clusters. No raw VPN IP lists.

anti-fraud asn cybersecurity data-analysis fraud-detection infrastructure ip ip-intelligence ip-reputation network-analysis network-intelligence osint proxy-detection threat-intelligence vpn vpn-detection

Last synced: 25 May 2026

https://github.com/gorhkdwj/da_portfolio

Kim Jae Chun's DA_Portfolio

data data-analysis python sql

Last synced: 20 Feb 2026

https://github.com/ganeshkumartk/ncov-2019

[EDA] Statistical modelling of the Novel Coronavirus outbreak nCoV-2019

corona data-analysis ncov ncov-2019 statistics wuhan wuhan-coronavirus wuhan-virus

Last synced: 05 Jun 2026

https://github.com/virajbhutada/movie-rental-store-analytics-sql-powerbi-excel

Dive into the DVD rental industry with my Capstone project, Movie Rental Analytics. Analyzing the Sakila DVD Rental Store Database, I extract insights through exploratory data analysis (EDA) and Power BI visualizations. Findings inform strategies for optimizing film inventory, enhancing business operations, and customer experiences.

business-intelligence capstone-project customer-behavior-analysis data-analysis data-science excel exploratory-data-analysis film-ratings mece movie-database movie-rental mysql powerbi powerbi-visuals revenue-analysis sql sql-database

Last synced: 05 Jun 2026

https://github.com/chandansoren/diabetics_prediction

Predicting whether a patient has diabetes based on the features provided to a machine learning model.

data-analysis machine-learning python svm

Last synced: 06 Jun 2026

https://github.com/ibnaleem/cyberchef-discord

A versatile Discord bot that implements CyberChef's features for encoding, decoding, encrypting, compressing, analysing data, and more, directly in your Discord server

compression cti cyberchef cybersecurity data-analysis data-manipulation discord-bot discord-js encoding encryption hashing infosec parsing redteam

Last synced: 28 Jan 2026

https://github.com/anastasius21/imdb-movie-analysis

Analysis of IMDb's Top 1000 Movies dataset using Pandas, Matplotlib, and Seaborn. It provides visualizations and insights into various aspects of movies, such as ratings, genres, directors, and release years.

data-analysis data-exploration data-science data-visualization imdb imdb-dataset jupyter-notebook python

Last synced: 25 Apr 2026

https://github.com/asifdotexe/quickvu

Quick VU: No-code data cleaning, analysis, and visualization tool built on Streamlit. Quickly clean, visualize, explore, and understand data relationships and correlations with ease. Perfect for analysts, business users, and anyone looking to gain data insights without writing a single line of code.

automation data-analysis data-cleaning data-visualization python3 streamlit-application toolkit

Last synced: 06 Jun 2026

https://github.com/cdeweyx/game-of-thrones-s7e1-eda

Exploratory data analysis of scraped tweets related to Game of Thrones S7E1

data-analysis data-visualization python twitter-api

Last synced: 26 Apr 2026

https://github.com/dogoncouch/dhcptranslate

Parses ISC DHCP server config, performs DNS resolution as needed, and outputs lease data in CSV format.

configuration csv-format data-analysis isc-dhcp isc-dhcp-server migration-tool

Last synced: 20 Mar 2025

https://github.com/allanotieno254/codsoft

This repository showcases a series of data science projects completed during an internship with CODESOFT. Each project utilizes Python and various machine learning techniques to solve specific problems in data analysis, classification, regression, and predictive modeling.

classification data-analysis data-science feature-engineering machine-learning model-evaluation predictive-modeling python-programming regression

Last synced: 15 May 2025

https://github.com/gurpreetkaurjethra/ai-data-visualization-agent

This Streamlit application creates an interactive Data Visualization Assistant that can understand Natural Language Queries and generate appropriate Visualizations using LLMs.

aiagents aichatbot aidevelopment artificial-intelligence data-analysis data-visualization generative-ai llms

Last synced: 25 Jun 2025

https://github.com/akarshankapoor7/tensorflow_tutorial

An easy, fast tutorial for TensorFlow, an open-source machine learning framework by Google used for building and training machine learning and deep learning models.

data-analysis data-science deep-learning machine-learning tensorflow

Last synced: 27 Apr 2026

https://github.com/pawlo77/kaggle-project

Repository for 'kaggle' project of Data Science Scientific Circle at Faculty of Mathematics and Information Science, Warsaw University of Technology

data-analysis data-science eda maschine-learning

Last synced: 20 Mar 2025

https://github.com/mengyaohuang/data-manipulation-and-analysis

Data processing implementation with tools in Python

data-analysis nlp-machine-learning pandas-dataframe python

Last synced: 27 Apr 2026

https://github.com/husna-poyraz/titanic-machine-learning

Use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.

data data-analysis data-science data-visualization deep-learning machine-learning missing-data outlier-detection python titanic

Last synced: 10 May 2026

https://github.com/alxrm/scent-of-literature

Sentiment analysis of Russian literature using a very small dataset

classification data-analysis sentiment-analysis sklearn tf-idf

Last synced: 28 Apr 2026

https://github.com/ajwad-shaikh/sristi-sanshodh-collect

SRISTI Sanshodh Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments. Contribute and make the world a better place! ✨📋✨ https://docs.opendatakit.org/collect-…

collect data-analysis data-collection javarosa odk opendatakit

Last synced: 04 Apr 2025

https://github.com/p2-718na/alice-simulation

Code for my Lab-2 course.

cern-root data-analysis

Last synced: 13 Mar 2025

https://github.com/5ekastanx/data-analysis

Extracting data by parsing, in a hacking-like style, using Python and a variety of functions and methods

data-analysis html python

Last synced: 14 Mar 2025

https://github.com/akashparley/stocklyzer

Stocklyzer is a real-time stock analysis web app built with Streamlit. It features stock performance tracking, technical indicators, CAPM-based risk-return insights, and ARIMA-based price prediction. Ideal for finance enthusiasts, analysts, and learners exploring data-driven investing tools.

arima-forecasting data-analysis financial-analysis machine-learning stock-price-prediction

Last synced: 16 May 2026

https://github.com/madhuresh2011/amazon-sales-report-analysis-using-python

This project focuses on analyzing Amazon sales data using Python to uncover insights into sales performance, customer behavior, and product trends

charts cleaning-data data-analysis jupyter-notebook matplotlib numpy pandas python seaborn visualization

Last synced: 17 Apr 2026

https://github.com/mardavsj/weather-prediction

Weather prediction model which mainly focuses on visualization.

data-analysis data-visualization matplotlib numpy pandas pandas-dataframe

Last synced: 10 Apr 2026

https://github.com/walkerdustin/vergleich-von-messmethoden-fuer-punktwolken

When a physical room is surveyed, the result is a point cloud. This point cloud describes selected points in the room, for example on the walls and the ceiling. When these points are measured in two separate measurements, perhaps even by different devices, the goal afterwards is to find out how closely the point clouds agree. There are two fundamentally different methods for this, which are compared here.

3d-models accuracy-metrics data-analysis data-visualization kaggle measure-distance numpy point-cloud pointcloudprocessing punkte python science-research simulation statistics

Last synced: 11 Apr 2026

https://github.com/ivanildobarauna-dev/api-to-dataframe

Python library that simplifies obtaining data from API endpoints by converting them directly into Pandas DataFrames. This library offers robust features, including retry strategies for failed requests.

data-analysis data-analytics data-engineering library pypi-packages python

Last synced: 06 Mar 2025

https://github.com/faris771/investigate_a_dataset

This repository contains a Jupyter Notebook that investigates a dataset using data analysis techniques.

data-analysis

Last synced: 29 Apr 2026

https://github.com/jhrcook/wagenmaker-data-analysis

Analysis of Registered Replication Report: Strack, Martin, & Stepper (1988) by Wagenmakers et al.

data-analysis r r-project statistics

Last synced: 08 Jun 2026

https://github.com/alemalvarez/data-analysis-web-project

Web-app providing a simple interface for data storage,

data-analysis data-science javascript react webapp

Last synced: 29 Apr 2026

https://github.com/happybono/sonatasmooth

Provides three different noise reduction algorithms for smoothing out data: Rectangular Averaging, Binomial Median Filtering, and Binomial Averaging. It processes data from a list and displays the results in another list.

algorithms average binomial binomial-coefficient binomial-theorem calibration csharp data-analysis data-calibration dynamic-noise-reduction median noise-algorithms noise-reduction noise-reduction-kernel outliers rectangular-averaging windows-desktop windows-desktop-application windows-forms winforms

Last synced: 30 Oct 2025

https://github.com/iamjuniorb/data_structures_and_algorithms

I'm taking the Data Structures and Algorithms I (C949) class in school and decided to write up all of these searching algorithms, sorting algorithms, structures, and so on to get a better understanding. These can be used with large datasets to test their space and time complexities.

data data-analysis data-science data-structures datastructures datastructures-algorithms datastructuresandalgorithm math mathematics programming python python-app python-library python3

Last synced: 08 Jun 2026

https://github.com/odeyiany2/flit-apprenticeship-data-science-projects

This repo contains all my projects for my FLiT Apprenticeship

data-analysis data-science data-visualization machine-learning sql

Last synced: 17 May 2026

https://github.com/arielle0222/data_analysis

📊 Data analysis projects for autonomous driving and smart mobility engineering using Python and SQL.

autonomous-driving composite data-analysis electric-vehicles environmental-data python visualizatoin

Last synced: 30 Apr 2026

https://github.com/mkk-1817/hr-attrition

This project, conducted during my internship at MeriSKILL, focuses on HR Attrition Prediction using advanced Machine Learning models. The initiative includes the development of a dynamic Dashboard and in-depth Analysis to offer actionable insights for proactive human resource strategies.

data-analysis data-science data-visualization jupyter-notebook machine-learning-algorithms powerbi python

Last synced: 03 May 2026

https://github.com/hetuvpatel/research-chatgpt

Research and data analysis project evaluating the social, ethical, and educational impacts of ChatGPT using survey-driven insights and Python-powered data analysis. 📚🤖

data-analysis matplotlib pandas python seaborn

Last synced: 01 May 2026

https://github.com/billgewrgoulas/hypothesis-testing-on-37-seasons-of-nba

First assignment for the course Data Mining @CSE.UOI

data-analysis data-science numpy scipy seaborn statistics

Last synced: 01 May 2026

https://github.com/lisa-ho/breadit

Repository for scraping and analysing data from the Reddit/Sourdough community to explore lockdown baking trends.

data-analysis data-viz nltk python reddit-api sentiment-analysis web-scraping

Last synced: 01 May 2026

https://github.com/ndohvich/ibm-data-science-professional-certificate

Kickstart your career in data science & ML. Build data science skills, learn Python & SQL, analyze & visualize data, build machine learning models. No degree or prior experience required.

coursera dash data-analysis data-science html5 ibm ibm-professional-certificate javascript machine-learnng python sql

Last synced: 16 Nov 2025

https://github.com/com-480-data-visualization/project-2023-the-vizards

Lausanne Transportation: a data visualization of the Lausanne Transportation network. Developed by the Vizards team as part of the EPFL Data Visualization course project (COM-480).

buses data-analysis data-science data-visualization epfl lausanne map metro public-transport public-transportation switzerland webgl

Last synced: 01 May 2026

https://github.com/enamhasan/analyzing-the-impact-of-recession-on-automobile-sales

Data Analysis and Visualization Dashboard of the Impact of Recession on Automobile Sales

dashboard data-analysis data-science data-visualization pandas plotly plotly-dash python

Last synced: 05 May 2026

https://github.com/archie-cm/credit_risk_model_vix_id-x_partners

The objective of the project is to decrease the company's losses from bad loans by up to 30% by creating a machine learning system to assist in automating loan assessments

credit-risk data-analysis data-visualization machine-learning scorecard

Last synced: 01 May 2026

https://github.com/ahmednasef3/udemy-courses-full-eda

Exploratory data analysis of the factors that can affect promotions and earnings in Udemy courses, and of how to make a course that sells well on Udemy.

data-analysis data-science data-visualization eda exploratory-data-analysis matplotlib pandas seaborn udemy-course-project

Last synced: 01 May 2026

https://github.com/dangerousfish/uk-climate-trends-dashboard-metoffice

A data pipeline and Streamlit dashboard that aggregates, cleans and visualises historical UK Met Office station data - interactive charts, heatmaps and maps for temperature, rainfall and sunshine.

climate climate-analysis climate-change climate-data climate-science data-analysis data-visualization metoffice metofficeweather streamlit temperature weather

Last synced: 02 May 2026

https://github.com/vitia-fritelle/analise_dieese

Analysis based on data extracted from the site https://www.dieese.org.br/analisecestabasica/salarioMinimo.html

data-analysis economic-data

Last synced: 09 Apr 2025

https://github.com/anandanraju/youtube-data-api-model

The YouTube Analytics API enables you to generate custom reports containing YouTube Analytics data. The API supports reports for channels and for content owners. Report fields are characterized as either dimensions or metrics

analytics data-analysis data-science metrics model python telemetry youtube youtube-api

Last synced: 03 May 2026

https://github.com/zeynepcol/data-analysis-visualization

Data visualization and interactive analytics - Olympics Dataset

data-analysis data-science data-visualization matplotlib pandas plotly python scipy seaborn streamlit

Last synced: 03 May 2026

https://github.com/faisal-khann/diwali-sales-analysis

The "Diwali Sales Analysis" project aims to analyze the sales data during the Diwali festival period to uncover insights and trends that can help improve marketing strategies and sales performance in the future

csv data-analysis eda jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 11 Apr 2026

https://github.com/0xjeremy/me-18-final

Data collection and Analysis tools for IMUs

data-analysis imu raspberry-pi

Last synced: 03 May 2026

https://github.com/tqhungdev0605/crawl_200_jd_dataanalyst

Automate job data scraping for 200 Data Analyst postings on https://vn.indeed.com using Python

data-analysis jupyter-notebook python3 scraping selenium

Last synced: 11 Apr 2026

https://github.com/shadan100/stroke-prediction-analysis

A web-based application to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about the patient.

artificial-intelligence data-analysis data-science django django-framework jupyter-notebook machine-learning matplotlib pandas predictive-modeling python stroke-prediction web-application

Last synced: 08 Mar 2026

https://github.com/narenkhatwani/arkouda-projects

This repository contains the source code of projects done using Arkouda, a software package that allows a user to interactively issue massive parallel computations on distributed data using functions and syntax that mimic NumPy, the underlying computational library used in most Python data science workflows.

arkouda data-analysis data-analytics data-science high-performance high-performance-computing highperformancecomputing numpy pandas parallel-computing parallel-processing parallelization python

Last synced: 17 Apr 2026

https://github.com/angelgardt/wlm-sdarp-old

World of Linear Models: Statistics & Data Analysis in R for Psychologists

data-analysis data-visualization gh-pages manim-animations quarto r rstudio statistics

Last synced: 04 May 2026

https://github.com/gracysapra/r-in-data-science

This repository contains essential guides for data analysis using R, covering topics like data preparation, data reshaping, and data visualization. Each file focuses on fundamental techniques to manipulate, clean, and visualize data effectively using R programming.

data-analysis data-preparation data-reshaping data-science data-visualization data-visualizations ggplot r r-for-data-science

Last synced: 19 Apr 2026

https://github.com/shuddha2021/stellar-candidate-selector

A sophisticated candidate selection algorithm leveraging multi-criteria analysis and machine learning to identify top software engineering candidates. This tool features flexible filtering, score adjustment, and detailed visualizations to streamline the recruitment process.

candidate-selection data-analysis data-visualization machine-learning pandas plotting-in-python python python-data-analysis recruitment scikit-learn

Last synced: 05 May 2026

https://github.com/gonzalo123/pivot.pandas

Data Analysis with Python. Pivot tables with Pandas

data-analysis jupyter-notebook pandas pivot-tables python

Last synced: 05 May 2026

https://github.com/aymane-maghouti/sentiment-analysis-for-jumia-reviews-and-smartphone-price-prediction-system

The project focuses on customer sentiment analysis for Jumia, aiding informed online decisions. It collects and analyzes product comments to determine sentiments and implements a decision-making algorithm. Additionally, it includes a product price prediction system using regression techniques.

beutifulsoup data-analysis data-cleaning data-collection data-preprocessing data-scraping data-visualization eda falsk machine-learning python web-application

Last synced: 18 Apr 2026

https://github.com/subhojit45/python3-iphones-x-flipkart-sales-analysis

Six simple questions and the insights derived from a dataset of iPhone sales on Flipkart.

data-analysis jupyter-notebook python3 visual-studio-code visualization

Last synced: 19 May 2026

https://github.com/mr-vozhyk/karpov.courses-study

A selection of assignments, mini-projects, and the final project from karpov.courses

airflow data-analysis git python sql statistics

Last synced: 05 May 2026

https://github.com/rcv911/lyapunov-indicators

Calculating Lyapunov indicators with multiprocessing in Python

data-analysis lyapunov lyapunov-indicators multiprocessing

Last synced: 18 Jan 2026