An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/vansh-py04/data-extraction-and-text-analysis

The objective of this assignment is to extract textual data articles from the given URL and perform text analysis to compute variables that are explained

data-analysis data-extraction data-science nlp nlp-machine-learning python textanalysis webscraping

Last synced: 24 Apr 2026

https://github.com/shadan100/sales-prediction-analysis

The aim is to build a predictive model and find out the sales of each product at a particular store. Using this model, BigMart will try to understand the properties of products and stores which play a key role in increasing sales.

artificial-intelligence data-analysis data-science django django-framework jupyter-notebook machine-learning matplotlib pandas predictive-modeling python sales-prediction

Last synced: 01 Mar 2026

https://github.com/is-leeroy-jenkins/sherpa

A budget execution & data analysis tool based on Winforms, .NET 6, and written in C# for EPA analysts

budget-management data-analysis data-science data-visualization federal-government

Last synced: 13 May 2026

https://github.com/anandanraju/youtube-data-api-model

The YouTube Analytics API enables you to generate custom reports containing YouTube Analytics data. The API supports reports for channels and for content owners. Report fields are characterized as either dimensions or metrics

analytics data-analysis data-science metrics model python telemetry youtube youtube-api

Last synced: 03 May 2026

https://github.com/seankwarren/water-quality-analysis

An examination of water quality in the Atlanta watershed with a focus on identifying neglected areas and potential strategies for improving water quality monitoring

analytics data-analysis jupyter-notebook python

Last synced: 03 May 2026

https://github.com/stastnypremysl/lsql-csv

lsql-csv is a tool for small CSV file data querying from a shell with short queries. It makes it possible to work with small CSV files like with a read-only relational databases. The tool implements a new language LSQL similar to SQL, specifically designed for working with CSV files in a shell. LSQL aims to be a more lapidary language than SQL.

csv data-analysis data-processing haskell language linux-shell lsql lsql-csv new-language query-language relational-database sql unix-command unix-philosophy unix-shell

Last synced: 25 Feb 2026

https://github.com/zpreisler/modules

Python libraries and modules for processing simulation outputs

data-analysis python scripts tensorflow

Last synced: 13 May 2026

https://github.com/hetuvpatel/research-chatgpt

Research and data analysis project evaluating the social, ethical, and educational impacts of ChatGPT using survey-driven insights and Python-powered data analysis. 📚🤖

data-analysis matplotlib pandas python seaborn

Last synced: 01 May 2026

https://github.com/elissorokin/data-analyst-portfolio-rus

Это репозиторий, в котором я демонстрирую свои навыки, делюсь проектами и отслеживаю прогресс в области анализа данных и Data Science.

ab-testing data data-analysis datalense matplotlib numpy pandas plotly portfolio postgresql python scipy seaborn sql statistical-analysis

Last synced: 25 Feb 2026

https://github.com/ryanfranklin237/data-cleansing

A group of python scripts that clean large data sets by removing duplicate data, putting data in correct formats, and removing redundant cells

data-analysis data-cleaning data-science extract-transform-load pandas-dataframe python

Last synced: 23 Jun 2026

https://github.com/iguptashubham/pizzahut-analysis-sql

best dataset for data analysis. Pizzahut data analysis done by Shubham Gupta in MySql. This dataset is provided by friend of mine intern at pizzahut. In pizzahut, they used this dataset to train and ask question. This data does not reveal anything about the pizzahut. It is safe to share. data

data-analysis data-analytics database dataset datasets mysql mysql-database pizzahut

Last synced: 14 May 2026

https://github.com/antononcube/wl-datareshapers-paclet

Wolfram Language (aka Mathematica) paclet for data reshaping functions, like, long- and wide form, cross tabulation, etc.

contingency-table cross-tabulation data-analysis data-transformation long-form wide-form

Last synced: 20 Mar 2026

https://github.com/madeiradata/microsoft-data-analysts-club

Open-source Repository of Useful Scripts and Solutions for Microsoft Data Analysts

data-analysis data-visualization microsoft-data-analysis powerbi powerbi-report

Last synced: 19 Mar 2026

https://github.com/antononcube/wl-quantileregression-paclet

Wolfram Language (aka Mathematica) paclet that provides various Quantile Regression functions.

data-analysis machine-learning quantile-regression time-series time-series-analysis

Last synced: 20 Mar 2026

https://github.com/pferreirafabricio/data-immersion

🏊🏻‍♂️ Activities and exercises from 'Imersão Dados' event

data data-analysis data-science dataset jupiter-notebook python

Last synced: 14 May 2026

https://github.com/rogernet/desafio-profissional-produto-data-driven

Ajudar a formar Analistas de Produto, PMs e Gestores de Negócio capazes de tomar decisões estratégicas baseadas em dados.

data-analysis data-science data-visualization product

Last synced: 23 Jun 2026

https://github.com/flyingfathead/neurograph-framework

A versatile tool for visualizing entropy loss in TensorFlow-based neural network training, providing insightful scatter plots with annotations.

data-analysis data-analysis-python data-visualization entropy graph graphs neural-network neural-networks neural-networks-visualization nn python python3 tensorflow tensorflow2 training visualization visualization-tools

Last synced: 24 Apr 2026

https://github.com/scarblase/salary-comparison

Submission for the DataCamp Salary Competition(1 level). 🏆

data data-analysis data-science data-visualization engineering python sql structured-data

Last synced: 01 May 2026

https://github.com/abhik1711/material-classification-and-energy-band-prediction---excavate-25

A Two-Stage Machine Learning Pipeline: A Binary Classifier to identify insulators with high accuracy and a Stacking Regressor to predict precise band gap values for insulators by leveraging advanced feature engineering techniques and ensemble learning methods

data-analysis machine-learning python

Last synced: 23 Jun 2026

https://github.com/nmsby/pca-machine-learning-lab

Principal Component Analysis (PCA) implementation and analysis lab for Machine Learning. Features manual PCA implementation, scikit-learn applications, data compression, and feature extraction with detailed visualizations.

data-analysis dimensionality-reduction jupyter-notebook machine-learning numpy pca python scikit-learn visualization

Last synced: 01 May 2026

https://github.com/dangerousfish/uk-climate-trends-dashboard-metoffice

A data pipeline and Streamlit dashboard that aggregates, cleans and visualises historical UK Met Office station data - interactive charts, heatmaps and maps for temperature, rainfall and sunshine.

climate climate-analysis climate-change climate-data climate-science data-analysis data-visualization metoffice metofficeweather streamlit temperature weather

Last synced: 02 May 2026

https://github.com/keganedwards/housing-prices-exploration

Using machine learning algorithms to explore housing prices

data-analysis data-science python school-project

Last synced: 24 Apr 2026

https://github.com/cs-joy/pandasv2.0.3

learn data analysis with pandas

data-analysis pandas pandas-learning

Last synced: 03 May 2026

https://github.com/adagio/ivoox_episodes

iVoox Episodes: Scraping & Analysis

beautifulsoup4 data-analysis ivoox pandas python python3 scraping

Last synced: 20 Apr 2026

https://github.com/lijesh010/globalsuperstoresalesanalysis

The Global Superstore Sales Analysis repository showcases a comprehensive Power BI dashboard that provides valuable insights into sales performance. This project is designed to present key information and trends to stakeholders, enabling informed decision-making.

dashboard data-analysis data-visualization msexcel power-bi sales-analysis

Last synced: 19 Mar 2026

https://github.com/tnleite/projeto_king_lift

Este projeto apresenta uma análise detalhada dos dados financeiros da King Lift, uma empresa de locação de empilhadeiras. Utilizando Microsoft Excel, Power Query e Power Pivot, desenvolvi um dashboard interativo, também em Excel, que ajuda a empresa a obter insights valiosos para melhorar a eficiência operacional e aumentar o faturamento.

data-analysis data-science data-visualization excel

Last synced: 19 Mar 2026

https://github.com/harshmule1/store-sales-analysis

Sales Analysis Using Power Bi

data-analysis powerbi

Last synced: 19 Mar 2026

https://github.com/billgewrgoulas/hypothesis-testing-on-37-seasons-of-nba

First assignment for the course Data Mining @CSE.UOI

data-analysis data-science numpy scipy seaborn statistics

Last synced: 01 May 2026

https://github.com/jrbourbeau/cr-composition

IceCube cosmic-ray composition analysis

cosmic-rays data-analysis machine-learning physics python

Last synced: 20 Apr 2026

https://github.com/sunnybibyan/random_data_generation

A project that generates a dataset using various statistical distributions (Normal, Uniform, Exponential, Random Integers, and Binomial) and performs data analysis. Includes visualizations and an option to export the data as a CSV file.

data-analysis data-visualization python random-data-generation statistics streamlit-webapp

Last synced: 13 Jun 2026

https://github.com/abhi18av/innovation-competition

Submission for a programming challenge

clojure clojurescript data-analysis

Last synced: 13 Jun 2026

https://github.com/msthamizh/phonepe-pulse-data-visualization-and-exploration

Developing a Streamlit application that allows users to explore and analyze transaction data from the PhonePe Pulse dataset. The project aims to provide insights into digital payment trends across India.

data-analysis data-visualization dataframe mysql pandas plotly python streamlit

Last synced: 02 May 2026

https://github.com/ferrangarciarovira/premier-league-betting-analysis

Comprehensive Python analysis of Premier League betting market inefficiencies (2005–2024). Evaluates bookmaker biases, betting strategies, and market efficiency using statistical methods and Monte Carlo simulations.

betting-strategies bias-detection data-analysis market-efficiency monte-carlo-simulation premier-league python sports-analytics

Last synced: 03 May 2026

https://github.com/leosimoes/uerj-tcc-analisador-dados

Trabalho de conclusão de curso (TCC) em Engenharia de Computação. Aplicativo Web para preparação e análise de dados, criação de gráficos e modelos de regressão linear e logistica.

computer-engineer data-analysis data-science data-visualization linear-logistic linear-regression python streamlit

Last synced: 24 Apr 2026

https://github.com/rakumar99/power-bi-projects

This repository contains various power bi projects and dashboards of Humaan Resources , Financial Analysis using Power BI Desktop.

dashboards data-analysis data-visualization databases datacleaning datamodeling etl powerbi powerquery reports

Last synced: 04 Jun 2026

https://github.com/lisa-ho/breadit

Respository for scraping and analysing data from the Reddit/Sourdough community to explore lockdown baking trends.

data-analysis data-viz nltk python reddit-api sentiment-analysis web-scraping

Last synced: 01 May 2026

https://github.com/riddhis2226/titanic-survival-data-analysis

Titanic-Survival-Data-Analysis : Analyze passenger data from the Titanic to predict survival based on features like age, gender, class, and fare.

data-analysis data-mining data-science data-visualization database jupyter-notebook machine-learning-models machinelearning-python plotlyjs python3

Last synced: 01 May 2026

https://github.com/reinmagine/eliminating-no-sensor

Contains my project that analyzes air quality sensor data to determine if the NO (Nitric Oxide) sensor in N. Mai, Los Angeles, CA can be removed without affecting data accuracy.

air-quality-sensor colab-notebook cost-optimization data-analysis data-optimization matplotlib-python nitric-oxide pyspark-python python sql

Last synced: 14 Jun 2026

https://github.com/seekinginfiniteloop/fedcal

A feature-rich Python calendar that enables time series analyses of changes in federal workforce schedules and shifts in executive department funding status.

data-analysis data-science econometrics economic-data economics federal federal-government hr pandas pandas-library pandas-python pydata python

Last synced: 15 Apr 2026

https://github.com/soufianboukir/ecom-analytics-platform

End-to-end data science project on an Amazon sales dataset, including data preprocessing, analysis, modeling, and a Streamlit dashboard for insights and decision-making.

data-analysis data-science data-visualization data-visualization-dashboard forecasting-models timeseries

Last synced: 14 Jun 2026

https://github.com/anthonybench/datapeek

Peek summary of datafile in a succinct, opinionated manner.

cli data data-analysis

Last synced: 02 Mar 2026

https://github.com/denisecase/nlp-03-text-exploration

Exploratory analysis of text corpora using tokenization, frequency, co-occurrence, and bigrams to reveal structure in text.

bigrams co-occurence corpus-analysis data-analysis nlp python text-analysis text-exploration tokenization

Last synced: 02 Jun 2026

https://github.com/kinshuk-code-1729/data-visualisation-using-python

This Repository consists of several python snippets for creating Two-Dimensional (2D) Graphics

data-analysis data-science data-visualization matplotlib visualization

Last synced: 02 Jun 2026

https://github.com/savinrazvan/degrees

A program that computes the "degrees of separation" between two actors by identifying the sequence of movies connecting them, inspired by the Six Degrees of Kevin Bacon game. Uses IMDb-based datasets for actors, movies, and their relationships.

actor-connections ai data-analysis degrees-of-separation educational-project graph-theory imdb movie-database python six-degrees-of-kevin-bacon

Last synced: 24 Apr 2026

https://github.com/kaz-yos/distributed

Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulation Study (Pharmacoepidemiol Drug Saf 2018)

data-analysis epidemiology statistics

Last synced: 15 Jun 2026

https://github.com/ensinho/data-analysis

My repository for data analysis studys in Python.

csv data-analysis graphs python python-documentation

Last synced: 15 Jun 2026

https://github.com/mr-chang95/loan_data_visualization

Data Visualization Project for Udacity's Data Analyst Program. Using Python in Jupyter Notebook.

data-analysis data-visualization jupyter-notebook loans python udacity-data-analyst-nanodegree

Last synced: 24 Apr 2026

https://github.com/chetanmalviya513/Firm-Financial-Transaction-Analysis

📊 Financial Analysis & Forecasting Processed large-scale financial data using Python for trend analysis and insights. Developed interactive Tableau dashboards to improve forecasting accuracy and reduce costs by 25%.

data-analysis financial-data forecasting insights msexcel pandas python reporting tableau-dashboards

Last synced: 15 Jun 2026

https://github.com/virajbhutada/movie-rental-store-analytics-sql-powerbi-excel

Dive into the DVD rental industry with my Capstone project, Movie Rental Analytics. Analyzing the Sakila DVD Rental Store Database, I extract insights through exploratory data analysis (EDA) and Power BI visualizations. Findings inform strategies for optimizing film inventory, enhancing business operations, and customer experiences.

business-intelligence capstone-project customer-behavior-analysis data-analysis data-science excel exploratory-data-analysis film-ratings mece movie-database movie-rental mysql powerbi powerbi-visuals revenue-analysis sql sql-database

Last synced: 05 Jun 2026

https://github.com/mirokeimioniemi/classifying-software-pirates

Exploring the factors driving people into software piracy by training two machine learning models to predict whether a person with certain characteristics and sentiments is likely to possess any pirated software or not using a dataset collected via a survey targeting users of music production software.

data-analysis data-science decision-tree-classifier logistic-regression machine-learning piracy python software-piracy survey

Last synced: 06 May 2026

https://github.com/archie-cm/a-b-testing-mobile-games

This project have objective to examine what happens when the first gate in the game was moved from level 30 to level 40. When a player installed the game, he or she was randomly assigned to either gate30 or gate40.

abtesting data-analysis python retention-rate

Last synced: 17 Apr 2026

https://github.com/mr-vozhyk/karpov.courses-study

Часть заданий, мини-проектов и финальный проект от karpov.courses

airflow data-analysis git python sql statistics

Last synced: 05 May 2026

https://github.com/scarblase/homeless-animals-analysis

A data-driven exploration of homeless animal statistics 🐶🐱. Analyze age distribution, shelter dynamics, and adoption patterns using Python, Pandas, and Seaborn.

animals data-analysis data-mining data-science data-science-projects data-visualization matplotlib matplotlib-pyplot numpy pandas plotly python python3 ukraine

Last synced: 06 May 2026

https://github.com/bhavik444/techistanbul_python_bootcamp

👨💻 Master Python programming through practical exercises in this 80-hour bootcamp, designed for beginners to advanced learners.

algorithms api-development automation coding-bootcamp data-analysis data-visualization django flask git machine-learning python software-engineering testing web-development web-scraping

Last synced: 28 Apr 2026

https://github.com/alxrm/scent-of-literature

Russian literature sentiment analysis in terms of very small dataset

classification data-analysis sentiment-analysis sklearn tf-idf

Last synced: 28 Apr 2026

https://github.com/carmoreno/analisisaccidentalidadbogota

Data Analysis about traffic accidents at Bogotá, Colombia.

data-analysis data-science jupyer-notebook matplotlib numpy pandas scikit-learn

Last synced: 17 Apr 2026

https://github.com/mgobeaalcoba/pandas_y_numpy

Trabajos realizados con estas librerías de Python para manejo de datos.

data-analysis data-science dataset numpy pandas python

Last synced: 28 Apr 2026

https://github.com/com-480-data-visualization/project-2023-the-vizards

Lausanne Transportation : a data visualization of the Lausanne Transportation network. Developed by the Vizards team as part of the EPFL Data Visualization course project (COM-480).

buses data-analysis data-science data-visualization epfl lausanne map metro public-transport public-transportation switzerland webgl

Last synced: 01 May 2026

https://github.com/luminati-io/shopee-dataset-samples

A sample dataset of over 1000 Shopee products, extracted using the Bright Data API, ideal for pricing optimization, gap analysis, and market strategy refinement..

api data-analysis data-mining datasets products shopee web-scraping

Last synced: 12 Feb 2026

https://github.com/thameran/mmar

Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

cli data-analysis datascience finance generator github-config go haskell hurst markdown mmar mmark time-series xml2rfc

Last synced: 06 May 2026

https://github.com/reza-saeedi-coding/netflix-data-analysis

A complete end-to-end Netflix dataset analysis using Python, SQL, and Matplotlib. Explores genres, content ratings, and trends using exploratory data analysis and visualizations.

data-analysis data-cleaning eda matplotlib netflix pandas portfolio-project python sql sqlite

Last synced: 17 Apr 2026

https://github.com/mrsamsonn/monolithic-polylithic-crystal-segmentation

A grid segmentation algorithm for clustering crystal structures using diffraction patterns. Useful in material science and nanotechnology, this code enables detailed analysis of crystals for research and industrial applications.

clustering crystal-structure crystallography data-analysis diffraction-patterns grid-segmentation image-processing k-means machine-learning matertial-science nanotechnology python research-project research-tools scientific-computing

Last synced: 28 Apr 2026

https://github.com/ac12644/fractz-ai-data-analyst

Analyze data and gain insights instantly with FRACTZ's AI Data Analyst. Flexible, fast analytics tailored to your needs.

ai data-analysis data-visualization

Last synced: 01 Feb 2026

https://github.com/githubuseraccountamazing/the-amari-project

a project in which I attempted to push some of the limits of stable-diffusion while taking some data along the way

ai ai-generated-images bash data-analysis machine-learning stable-diffusion textual-inversion

Last synced: 05 May 2026

https://github.com/pngo1997/axa-xl-insurance-bi-dashboard

Provides a comprehensive analysis of insurance submissions, approvals, compliance rates, and profitability for AXA XL Insurance.

bi-analytics bi-dashboard business-analytics data-analysis filtering performance-analysis powerbi segmentation visualization

Last synced: 08 Feb 2026

https://github.com/kirkalyn13/opensignal_autogenerate_report

Script used to generate results/summary, including the trends of flagged provinces, from the raw excel data file,

data-analysis data-science data-visualization matplotlib numpy pandas python

Last synced: 06 May 2026

https://github.com/jongan69/potion-leaderboard

Start of Entry for potion leaderboard contest

data-analysis leaderboard potion trading

Last synced: 11 Jun 2026

https://github.com/amirhosseinhonardoust/customer-sentiment-intelligence-platform

An enterprise-grade NLP + Streamlit + SQL platform for analyzing customer feedback. Performs automated sentiment detection, stores labeled reviews in SQLite, and delivers real-time dashboards with probability insights to support business, marketing, and product optimization decisions.

community-project cost-of-living dashboard data-analysis data-visualization economic-analysis inflation-tracking local-data open-data pandas price-tracker public-insight python sqlite streamlit

Last synced: 06 May 2026

https://github.com/chahiriabderrahmane/carpricepredictor

🚗 Cars Exploration & Price Prediction | Analyzing Cars.com Listings

data-analysis data-science data-visualization machine-learning python streamlit web-scraping

Last synced: 08 Feb 2026

https://github.com/namratha2301/best-selling-books

Comprehensive examination of best-selling books, focusing on understanding sales patterns, genre distributions, and the impact of various features on book performance.This project aims to predict book sales and classify genres, providing valuable insights for authors, publishers, and readers.

data-analysis data-visualization matplotlib pandas sckiit-learn seaborn

Last synced: 06 May 2026

https://github.com/sandk21/detection_faux_billets

Algorithme de détection de faux billets selon leurs dimensions géométriques et application web pour générer les prédictions

data-analysis data-science data-visualization machine-learning pandas python scipy sklearn streamlit

Last synced: 03 Apr 2026

https://github.com/billy-enrizky/yelpfusion

Finding All restaurants in the Maryland area using YelpFusion API

data-analysis pandas yelp-api yelpfusion

Last synced: 28 Apr 2026

https://github.com/mrankitgupta/titanic-survival-prediction-93-xgboost

Titanic Survival Prediction Project (93% Accuracy)🛳️ In this notebook, The goal is to correctly predict if someone survived the Titanic shipwreck using different Machine Learning Model & Hyperparameter tunning.

classification data-analysis data-science data-visualization gradient-boosting kaggle-competition linear-regression logistic-regression machine-learning machine-learning-algorithms ml ml-models nlp prediction predictive-modeling random-forest titanic titanic-kaggle titanic-survival-prediction xgboost

Last synced: 06 May 2026

https://github.com/com-480-data-visualization/project-2023-choo-choo-data-darlings

This repository contains the source code for our data visualization project, an interactive platform designed to explore the intricate Swiss transportation network. Developed by the Choo Choo Data Darlings team at EPFL, the project provides an in-depth view into the vast array of Swiss transportation operations, including trains, buses, and trams.

boats buses data-analysis data-science data-visualisation data-visualization epfl metro public-transport public-transportation switzerland trains trams

Last synced: 01 May 2026

https://github.com/arhcoder/base-hackathon-2022

💸 Sistema que analiza las facturas de compra-venta de una empresa de importaciones y exportaciones, y crea una base de conocimiento con la que crea sugerencias de abastecimiento para las empresas clientes de Banco BASE, con el fin de ahorrarles dinero.

algorithms bank companies data-analysis decision-making exportation hackaton importation javascript mysql python suggestions

Last synced: 16 Apr 2026

https://github.com/kaushik0911/jubilant-guide

A Streamlit application for advanced route planning and accessibility analysis using OpenRouteService (ORS). Explore optimal routes while avoiding roadblocks and discover points of interest (POIs) within travel time ranges.

data-analysis data-visualization geospatial-analysis python streamlit

Last synced: 16 Jun 2026

https://github.com/priyanshubiswas-tech/deloitte-daikibo-forensic-analysis-task-2

Forensic pay equity analyzer for Deloitte. Processes compensation data to classify gender equality scores into Fair/Unfair/Discriminative tiers. Outputs modified Excel with 3-tier evaluation system.

data data-analysis deloitte excel forensic-analysis

Last synced: 06 Feb 2026

https://github.com/mattdelaune/retail_rfm_analysis

Power BI multi-page report leveraging advanced data visualization for RFM analysis. Delivers deep analytical insights into customer behavior, engagement, and spending patterns, driving strategic business decisions.

data-analysis dax powerbi report rfm-analysis sales-data visualization

Last synced: 19 Mar 2026

https://github.com/rakumar99/jp-morgan-chase-virtual-internship

This repository contains the various tasks assigned by JPMorgan Chase & Co. Virtual Internship on Microsoft Excel

conditional-formatting dashboard data-analysis data-visualization hlookup pivot-tables presentation vba-macros vlookup

Last synced: 02 Mar 2026

https://github.com/vyjayanthipolapragada/early_sepsis_detection_ml

An end-to-end project leveraging clinical datasets (PhysioNet, MIMIC-IV, MIMIC-IV-ED) to develop and compare ML and LSTM-based models for early sepsis prediction.

data-analysis data-visualization deep-learning healthcare jupyter-notebook keras-tensorflow lstm-neural-networks machine-learning neural-network python

Last synced: 28 Apr 2026