An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/rajesh9943/web-scraping-analysis-of-top-us-company-revenue-growth-in-2023

Explore the landscape of US business growth in 2023 with our dynamic project, 'Web Scraping for US 2023 Revenue Growth.' Utilizing advanced web scraping techniques, we unveil insights into the top companies driving economic expansion.

cleaning-data data data-analysis data-visualization manipulation numpy pandas pre-fill

Last synced: 16 Aug 2025

https://github.com/dcs-training/much-ado-about-nothing-missing-data-in-research

Repo for the Much ado about nothing workshop. Go to the Readme file

data-analysis data-cleaning data-wrangling r

Last synced: 15 Jun 2025

https://github.com/aakk23/netflix_sql_project

This SQL project provides an analytical overview of Netflix's movies and TV shows dataset, uncovering key insights related to content types, ratings, release trends, and geographic distribution. It helps explore patterns in content availability, audience targeting, and regional preferences to support data-driven decisions.

data-analysis netflix-data-analysis postgresql sql

Last synced: 10 Apr 2025

https://github.com/ritap03/neuralnetwork-shapeclassifier

Feedforward neural network system in MATLAB for geometric shape classification. Includes data preprocessing, network training and evaluation, confusion matrix analysis, and a graphical interface for user interaction and model testing.

ai data-analysis deep-learning feedforward-network gui image-classification machine-learning matlab neural-network pattern-recognition

Last synced: 14 May 2026

https://github.com/jackmnob/python-tableau-eda-stockdash

Data cleaning, preparation, and manipulation (EDA) for an interactive stock market dashboard with Tableau - using pandas (Python) via JupyterLab

cleaning-data dashboard data-analysis data-preparation eda jupyter-notebook jupyterlab python tableau-public

Last synced: 14 May 2026

https://github.com/amoghkori/deeplabcut-package-for-animal-pose-estimation

DeepLabCut Mouse Location Prediction: Training a deep neural network to predict the location of a mouse using annotated joint positions.

data-analysis data-annotations data-preprocessing deep-learning machine-learning model-evaluation python-programming research research-project

Last synced: 17 Mar 2025

https://github.com/sadratehranian/pem-fuel-cell

The methodology section details the use of Python for data processing and analysis, employing statistical and machine learning-based anomaly detection techniques to identify potential issues in fuel cell stacks. It emphasizes data preprocessing, feature engineering, exploratory data analysis (EDA), and anomaly detection.

anomaly-detection data-analysis data-science data-visualization exploratory-data-analysis feature-engineering fuel-cell machine-learning preprocessing python statistical-analysis visual-studio-code

Last synced: 26 Mar 2025

https://github.com/zulfachafidz/green_horizon_forecasting_peak_organic_avocado_sales_with_the_prophet_algorithm

The Green Horizon Project leverages the Prophet algorithm to predict peak sales of organic avocados, supporting the campaign "APEAM GO ORGANIC." Using Python and Looker Studio, this analysis aims to provide deep insight into sales trends and potential, forming the basis of smarter marketing strategies.

algorithm algorithms analytics data data-analysis data-engineering data-mining data-science data-visualization forecasting machine-learning machine-learning-algorithms prophet-model python python-script

Last synced: 17 May 2026

https://github.com/hemangsharma/bookingdataanalysisreport

The report helps understand key trends and insights around customer bookings, pricing, and other related attributes.

analysis data data-analysis data-analytics data-visualization streamlit streamlit-dashboard

Last synced: 14 May 2026

https://github.com/dadvaiahpavan/ai-data-scientist-

AI-powered tool for dataset analysis, featuring data preprocessing, classification, regression, anomaly detection, and text analysis. Built with scikit-learn, pandas, and Plotly for visualization. Includes an interactive Streamlit web interface for real-time data analysis.

ai anomaly-detection classification data-analysis data-science machine-learning panda plotu regression scikit-learn sentiment-analysis streamlit

Last synced: 03 May 2026

https://github.com/maazie-khan/olympics-data-enigeering

Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.

azure big-data data-analysis dataengineering devops pipeline

Last synced: 13 May 2026

https://github.com/codeonthespectrum/web-scrap

Este projeto realiza o web scraping da Wikipédia para obter dados sobre os municípios mais populosos do estado do Rio de Janeiro.

data-analysis data-visualization webscraping

Last synced: 16 Feb 2026

https://github.com/jibbs1703/airline-data-analysis

This repository contains the Exploratory Data Analysis of the flight delay and cancellation for airline flights in the United States in the year 2015. With this EDA, insights and solutions are suggested for business owners and airport managers.

business-insights business-solution data-analysis data-visualization

Last synced: 20 Mar 2025

https://github.com/muneeb706/r-programming

R-Programming examples for data analysis.

data-analysis r-programming

Last synced: 26 Mar 2025

https://github.com/juliargubolin/sql-for-data-analysis

This repository was created in order to insert all the documents, files and notes I took while learning SQL and data analysis through "SQL for Data Analysis: Advanced Techniques for Transforming Data Into Insights" by Cathy Tanimura (O'Reilly).

advanced data-analysis data-science sql

Last synced: 11 Jan 2026

https://github.com/alexzalox/us_stocks

Read US stock tickers and their costs from a CSV and display the formatted DataFrame in the terminal using pandas.

data-analysis finance pandas python python3 stocks yfinance

Last synced: 15 May 2026

https://github.com/wwgolay/hr1099-timelapse-vlbi

The repository for HR1099 timelapse VLBI.

astronomy astrophysics data-analysis website

Last synced: 03 Apr 2025

https://github.com/vipulbunny/ml-learning_projects

A collection of machine learning projects implemented in Python, showcasing core concepts like regression, classification, clustering, and model evaluation techniques. Ideal for learners and data science enthusiasts.

classification clustering data-analysis data-science data-visualization decision-trees jupyter-notebook machine-learning model-evaluation random-forest regression supervised-learning unsupervised-learning

Last synced: 23 Jul 2025

https://github.com/hevalhazalkurt/word_analyser

A web app developed in Python and Django that analyzes given text mathematically and sentimentally.

analyzer analyzes content data-analysis django emotion python python3 sentiment sentiment-analyser sentiment-analysis text text-analysis

Last synced: 19 May 2026

https://github.com/jpgiant/training_project

Analyzing whether there is a difference between the average death ages of left handers and right handers using Bayesian Conditional Probability Theorem.

bayesian-statistics data-analysis data-visualization numpy pandas-dataframe python

Last synced: 30 Apr 2026

https://github.com/aalkiyumi/project-3-docker-container-for-data-processing-script

This Dockerized Python application analyzes two text files (IF.txt and AlwaysRememberUsThisWay.txt). It counts total words, identifies the largest file, and finds the top three most frequent words in each. Results are saved to an output file and printed to the console.

cs5165 data-analysis data-engineering data-science docker introduction-to-cloud-computing statistical-analysis text-processing uc uc2026 university-of-cincinnati

Last synced: 17 May 2026

https://github.com/anas436/data-science-projects

Explore my diverse collection of projects showcasing machine learning, data analysis, and more. Organized by project, each directory contains code, datasets, documentation, and resources. Dive in to discover insights and techniques in data science. Reach out for collaborations and feedback.

data-analysis data-science machine-learning

Last synced: 27 Mar 2025

https://github.com/k31ner/inmopipeline

Proyecto integral de análisis y modelado predictivo de datos inmobiliarios, que abarca recolección, transformación, visualización y machine learning utilizando Python y herramientas modernas de ingeniería y ciencia de datos.

data-analysis data-engineering data-science fastapi python streamlit

Last synced: 08 May 2026

https://github.com/vara-co/solar-eclipse-2024

Group Project on the 2024 Solar Eclipse's Path over the US with an interactive map and a couple of visualizations on the data gathered.

data-analysis data-visualizations html-css-javascript interactive-map javascript map solar-eclipse

Last synced: 15 May 2026

https://github.com/prakashjha1/whatsapp-chat-analyzer

WhatsApp Analyzer means we are analyzing our WhatsApp group activities. It tracks our conversation and analyses how much time we are spending or saying it as “wasting” on WhatsApp.

data-analysis data-science natural-language-processing pandas pyhton regular-expression

Last synced: 15 May 2026

https://github.com/annnieglez/computer-vision-parking-lot

This project leverages computer vision techniques to analyze parking lot occupancy. The goal is to detect available parking spaces in real-time using image and video input.

computer-vision data-analysis data-science data-visualization google-colab image-classification image-processing machine-learning python transfer-learning

Last synced: 15 May 2026

https://github.com/rijul007/market-basket-analysis-using-r

Market Basket Analysis using association rules, leveraging R’s powerful tools for data-driven retail strategies.

data-analysis data-science r

Last synced: 02 Apr 2025

https://github.com/nishumehta/uber-rides-data-analysis

An in-depth analysis of Uber ride data for the year 2016, to uncover patterns in ride behavior, mileage trends, and frequent start locations to generate actionable insights for business decisions.

data-analysis jupyter-notebook matplotlib-pyplot pandas python tableau-dashboards

Last synced: 09 May 2026

https://github.com/danicaalana/sales-review-sentiment-analysis

This project is a sentiment analysis project using a machine learning model. It analyzes Amazon product reviews to determine whether the sentiment expressed is positive, negative, or neutral using Multinomial Naive Bayes Method.

amazon data-analysis data-science machine-learning naive-bayes python sales-review sentiment-analysis

Last synced: 15 May 2026

https://github.com/pentalpha/bti-performance-study

A series of analysis on a large amount of data about the grades of students in the Technology Information course at UFRN

analysis big-data clustering data-analysis data-science data-visualization ipynb ipython jupyter-notebook performance-analysis plot python python3

Last synced: 15 May 2026

https://github.com/12danielll/neurogenomics_project

This project focuses on analyzing sequencing data to understand molecular mechanisms of neurological diseases and predict the effectiveness of immunotherapy in breast cancer patients. It integrates Python and R scripts for data processing, statistical analysis, and visualization, alongside a comprehensive report detailing methods and findings.

bioinformatics biostatistics clustering clustering-algorithms data-analysis data-visualization deseq2 differential-gene-expression functional-analysis immune-therapy machine-learning neurological-disease neuroscience pca-analysis python r seurat single-cell-analysis

Last synced: 06 Apr 2026

https://github.com/srummanf/elnino-anomaly-study

Study on El Niño’s impact on Chennai groundwater sustainability

data-analysis machine-learning python satellite-imagery-analysis

Last synced: 15 May 2026

https://github.com/cassiofb-dev/fide-rating-analysis

The plot speaks for itself

chess data-analysis fide hans rating

Last synced: 15 Jun 2025

https://github.com/fer-aguirre/cookiecutter-data-analysis-extensive

A cookiecutter template for data analysis projects using Python.

cookiecutter data-analysis project-template python

Last synced: 09 Apr 2025

https://github.com/scailfin/rob-webapi-flask

Default RESTful Web API implementation for the Reproducible Open Benchmarks for Data Analysis Platform (ROB) using the Flask web framework.

benchmarks data-analysis reproducibility webapi

Last synced: 17 Mar 2026

https://github.com/deliprofesor/customerseg-customer-segmentation-and-shopping-analysis

This project performs data exploration, segmentation, and modeling of wholesale customer data using clustering algorithms, PCA, and decision trees to analyze purchasing behavior and predict customer channel preferences.

clustering customer-segmentation data-analysis data-visualization dbscan decision-tree gmm kmeans machine-learning pca

Last synced: 24 Jun 2025

https://github.com/betkh/datascieneinpython

Jupiter Notebook files

data-analysis data-visualization

Last synced: 16 Jun 2025

https://github.com/czesctuklap/sustainable-fashion-database-analysis

This project, analyzes a dataset of sustainable fashion trends for 2024. It includes data preprocessing, exploration, visualization, and insights on environmental impact factors such as carbon footprint, water usage, waste production, and sustainability practices.

data-analysis data-visualization database dataset keggle sustainable-fashion

Last synced: 30 Apr 2026

https://github.com/fmind/malpop

Rank the popularity of malware applications by their occurrence on VirusTotal

data-analysis malware popularity ranking virustotal

Last synced: 11 Apr 2025

https://github.com/estevan-ulian/py-agent-voice

Um projeto para lidar com interações de voz entre humano e agente de I.A. permitindo a leitura e análise de dados de um arquivo CSV.

agent-based-modeling data-analysis python3 whisper-ai

Last synced: 11 Apr 2025

https://github.com/josedanielchg/nyc-schools-test-scores-exploration

DataCamp project analyzing NYC public school test scores to identify top math-performing schools, the best overall SAT scores, and borough-level variability using Python and pandas

data-analysis jupyter-notebook python

Last synced: 19 Mar 2025

https://github.com/swat1563/recommendation-system

This repository features a recommendation system and analytics engine using datasets on users, organizations, contents, contacts, events, and recommendations. It includes data preprocessing, building a recommendation system, and creating visual reports with Power BI.

analytics data-analysis data-visualization engine kaggle numpy pandas powerbi powerbi-dashboards powerbi-desktop powerbi-reports python recommendation-engine recommendation-system recommender-systems scikit-learn scipy

Last synced: 07 Jan 2026

https://github.com/damianmarti/big-mac-index

Data analysis from BigMac index

data-analysis data-science

Last synced: 03 Apr 2025

https://github.com/rijul007/diamonds-analysis-using-r

Diamonds data analysis using R, exploring relationships between diamond attributes (such as carat, cut, color, and clarity) and price, with a focus on providing insights for engagement ring selection through various statistical techniques and data visualizations including histograms, boxplots, scatter plots, and bar charts.

data-analysis data-science

Last synced: 25 Jan 2026

https://github.com/istinnew/enaic-s-discount-strategy-analysis

**(Open to Collaboration):** This project evaluates the impact of discounts on sales and customer retention for Eniac. It includes data cleaning, visualization, storytelling, and strategic insights to optimize discount strategies while maintaining brand reputation. 📊🛍️✨

cleaning-data cleaning-data-in-python cost-optimization data-analysis data-science data-visualization library presentation python visualization

Last synced: 03 Apr 2025

https://github.com/elliotone/nl-semantic-kernel-sales-analyzer

A console project showing Microsoft Semantic Kernel examples for sales data analysis using local AI models via LM Studio.

ai csharp data-analysis dotnet lm-studio local-ai machine-learning semantic-kernel

Last synced: 16 May 2026

https://github.com/sadia-khan13/modern_arts_data_cleaning

Welcome to the Data Cleaning project! This repository is dedicated to showcasing best practices and techniques for cleaning data using Pandas within Jupyter Notebook

data-analysis data-analysis-python data-cleaning data-science jupyter-notebook pandas-python

Last synced: 10 May 2026

https://github.com/alejandrolara11/desafio_latam_introduccion_analisis_de_datos

Repositorio del curso "Introducción al Análisis de Datos" de Desafío Latam. Ejercicios prácticos realizados durante el curso, enfocados en análisis de datos con Python, Pandas, y visualización básica.

data-analysis data-science data-visualization matplotlib numpy pandas python seaborn statsmodels

Last synced: 29 Apr 2026

https://github.com/jwt218/isonq

MATLAB package for Qtegra-generated data file processing.

data-analysis geochemistry isotopes matlab

Last synced: 03 Apr 2025

https://github.com/yasir-arafah/nyc-trip-fare-prediction-using-tcn

"NYC Trip Fare Prediction Using Temporal Convolutional Networks (TCN)" is a Data Analytics Project where the trip and fare data of NYC taxi are combined and then analyzed using Pyspark and visualized using Matplotlib library. The project predicts the fare by using Temporal Convolutional Neural Network.

colab data-analysis matplotlib nyc-taxi-dataset pyspark python

Last synced: 29 Apr 2026

https://github.com/pdiegel/currencytracker

A Python application that fetches real-time currency exchange rates from an API, securely stores the data in an SQLite database, and includes error handling, logging, and good programming practices for reliable and periodic data capturing.

analysis api currency data-analysis data-capture logging python python3 sqlite3 tracker

Last synced: 09 Sep 2025

https://github.com/michael-angelo-mootoo/quanta-app

Quanta is an open source statistical package app / toolkit for neuroscience and general computational descriptive and inferential statistics.

computational-statistics customtkinter data-analysis descriptive-statistics gui-application inferential-statistics neuroscience python r statistical-analysis statistics tkinter-python

Last synced: 16 May 2026

https://github.com/grindelfp/two-data-manipulative-tasks

Two simple tasks on data analysis and processing.

data-analysis ipynb mlda

Last synced: 17 Feb 2026

https://github.com/nick-peter-marcus/chocolate-bar-analysis

Analyzing Chocolate Bar Features and Ratings - Data Visualization, Decision Trees, Random Forest

data-analysis data-visualization decision-trees python random-forest seaborn sklearn

Last synced: 10 May 2026

https://github.com/satyacoder29/smartfinance-dynamic-financial-dashboard

SmartFinance: Dynamic Financial Dashboard is an interactive tool designed to visualize key financial metrics like revenue, expenses, and profit. It features real-time data updates, charts, slicers, and navigation for easy analysis. This dashboard helps businesses make data-driven decisions and optimize financial performance.

data-analysis data-cleaning data-modeling data-visualization powerbi powerbi-desktop powerbi-visuals powerquerym

Last synced: 13 Feb 2026

https://github.com/chrisrobertsjr/chrisrobertsjr

Welcome to my Github Profile!

data data-analysis java r sql statistics

Last synced: 03 May 2026

https://github.com/hassanislam463/sentiment_analysis_of_financial_news_headlines_and_affect_on_stock_price_prediction

This project analyzes financial news sentiment using a fine-tuned RoBERTa model and integrates it with stock data to predict price movements using LSTM and GRU. It highlights the role of sentiment in enhancing stock market forecasting.

data-analysis data-science data-visualization deep-learning lstm-neural-networks nlp-machine-learning

Last synced: 28 Mar 2025

https://github.com/hassanislam463/british-airways-data-science

Analyze Skytrax reviews to uncover customer sentiments and key themes while predicting booking behavior using machine learning. This repository includes data collection, analysis, and modeling scripts alongside concise, visualized insights to improve customer experience and operational efficiency.

data-analysis data-science data-visualization

Last synced: 28 Mar 2025

https://github.com/martachesnova/sql

Performing data modeling (ERD) and data engineering. Then, writing series of SQL queries to analyze Employee Database of a company.

data-analysis data-engineering data-modeling erd postgresql sql

Last synced: 16 May 2026

https://github.com/engraulleite/local-data-warehousing-with-docker

Creating a DW from 0 to hero. Starting with logical and physical modeling to valuable reports.

airbyte data-analysis datawarehouse docker etl-pipeline metabase pgadmin4 postgresql

Last synced: 01 May 2026

https://github.com/gaurav-van/data-analysis-projects

Collections of Projects that involves Data Analysis and Informed Decision Making

data-analysis database powerbi sql

Last synced: 06 Sep 2025

https://github.com/colindean/allegheny_voter_reg_analysis

Allegheny County Voter Registration Analysis Tools

data-analysis data-science elections pandas polars python voting

Last synced: 16 May 2026

https://github.com/katarinatmb/serbia-protest-analysis

This project analyzes the frequency, regional distribution, and group characteristics of protests that emerged across Serbia following the fatal collapse of the Novi Sad train station roof in November 2024. The analysis explores how different communities responded in the aftermath of the disaster, using data visualization in RStudio

data-analysis data-visualization r r-mark rstudio

Last synced: 10 Jul 2025

https://github.com/mboula/mboula.github.io

GitHub portfolio + interactive resume | Showcasing data projects in civil rights (housing), cannabis, and analytics

cannabis case-study civil-rights compliance dashboards data-analysis data-cleaning data-vizualization excel google-data-analytics housing open-data pattern-analysis portfolio pro-se public-data r sql tableau

Last synced: 10 Jul 2025

https://github.com/carlosvinimsouza/jupyter-notebook-basic

Armazenado todos os trabalhos referentes a Ciência de Dados.

data-analysis data-science programas-jupyter-notebook python

Last synced: 11 May 2026

https://github.com/stkisengese/numpy-data-fundamentals

A comprehensive collection of NumPy exercises covering array manipulation, slicing, broadcasting, random data generation, and real-world data analysis applications.

data data-analysis numpy pre-processing

Last synced: 16 May 2026

https://github.com/mfakhriazhar/housing-price-analysis

Determining the price of a house also depends on various factors such as building area, exterior quality, and amenities. This dataset provides information on properties for sale, and through Exploratory Data Analysis (EDA), patterns and key factors affecting house prices can be identified.

data-analysis data-science data-visualization eda exploratory-data-analysis python

Last synced: 16 May 2026

https://github.com/ashwin331133/sql-healthcare-data

This repository contains SQL queries designed to analyze health care data. The queries focus on patient demographics, encounter costs, and flu shot statistics, aiming to provide insights into patient behavior and financial impacts. The datasets include information on patient encounters, flu shots, and hospital admissions.

data-analysis mysql sql

Last synced: 29 Oct 2025

https://github.com/panoschatzi/erythrocyte_study_statistical_analyses

R code for data transformation, analysis and visualization of experimental data, as well as for statistical analyses and quantitative simulations.

afex data-analysis emmeans ggplot2 lme4 purrr r rprogramming rstats rstudio statistics tidyverse visualization

Last synced: 04 Apr 2025

https://github.com/arkww/matmap

Making maps from a Database and making the user guess which map is displayed

data-analysis data-science javascript python

Last synced: 24 Apr 2026