An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/pramodkondur/dataspark-end-to-end-dataanalytics

Cleaned the data, performed EDA, and stored it in MySQL. Queried and analyzed the data, uncovering opportunities to drive revenue growth and optimize operations, with a potential revenue growth of $30.03 million. Reported key insights using Power BI.

data-analysis data-visualization eda powerbi python sql

Last synced: 21 May 2026

https://github.com/spshah1701/world-development-indicators

Analysis of World Development Indicators (WDI) using big data technologies, specifically Databricks, Apache Spark, and Scala.

apache-spark big-data data-analysis spark-sql

Last synced: 17 Mar 2025

https://github.com/mosalem149/pythonutilities

A collection of Python scripts for common utility tasks including file manipulation, word counting, longest word detection, and grade categorization. Perfect for quick and easy solutions to everyday programming problems.

data-analysis educational-tools file-io file-manipulation grade-calculation python text-analysis text-processing utility word-counting

Last synced: 15 May 2026

https://github.com/darshan1924/house-price-pridiction

This repository contains a machine learning project for predicting house prices based on various features, including geographical coordinates. The project includes data preprocessing steps to handle…

data-analysis data-preprocessing house-prices jupyter-notebook machine-learning prediction

Last synced: 27 Mar 2025

https://github.com/thecoderpinar/globalwarmingforecast

🌍 Global Warming Forecast Tool: an advanced tool for analyzing and forecasting climate trends using ARIMA and Prophet models, with interactive visualizations and scenario simulations.

arima climate-change data-analysis environmental-science forecasting global-warming machine-learning prophet streamlit time-series-analysis visualization

Last synced: 27 Mar 2025

https://github.com/nehul1149/olympic-data-analysis

This project is an interactive data visualization and analytics platform for exploring historical Olympic Games data. Built with Python and Streamlit, it offers an in-depth analysis of medal tallies, athlete statistics, and country-wise performance trends, providing users with powerful insights into the world's biggest sporting event.

analysis data-analysis data-science data-visualization matplotlib python streamlit

Last synced: 18 May 2026

https://github.com/brevex/hotel-booking-demand-data-analysis

Data analysis in Python of demand for urban hotels and resorts showing their causes and relationships

data-analysis data-science hotel-booking-analysis kaggle python

Last synced: 08 May 2026

https://github.com/tknishh/investing-platform

An investing platform application to help users get information and analyze various foreign currency assets. The investing platform uses an ETL pipeline to insert new batches of Forex data once a day.

data-analysis investing-platform pipeline

Last synced: 18 Mar 2025

https://github.com/iamsainikhil/us-births-analysis

Analysis of US births during 1994-2003, based on the CDC-NCHS data set.

data-analysis python

Last synced: 16 May 2026

https://github.com/ebrizzzz/data-visualization-project-using-tableau

A data visualization project for the Visual Data Analysis course (Spring Term 2025) at the University of Skövde. This project explores the factors influencing national happiness scores across different global regions from 2005 to 2022.

analytics data data-analysis data-science data-visualization python regression tableau

Last synced: 16 Jun 2025

https://github.com/qorah/vic-edu-housing-insights

Analysis of education outcomes and housing affordability in Victoria, Australia.

data-analysis jupyter-notebook

Last synced: 18 Mar 2025

https://github.com/betkh/datascieneinpython

Jupyter Notebook files

data-analysis data-visualization

Last synced: 16 Jun 2025

https://github.com/abidshafee/google.colaboratory_projects

This repository contains a collection of interactive Python notebooks (ipynb) from some of my projects on Data Science, Machine Learning (ML), and Natural Language Processing (NLP).

colaboratory data-analysis data-science lstm machine-learning nlp statistics time-series

Last synced: 09 Jul 2025

https://github.com/felipe-veas/visor-sueldos-publicos

Interactive tool for visualizing and analyzing public-sector salaries in Chile, built with Streamlit.

audit chile data-analysis python streamlit transparency

Last synced: 16 May 2026

https://github.com/czesctuklap/sustainable-fashion-database-analysis

This project analyzes a dataset of sustainable fashion trends for 2024. It includes data preprocessing, exploration, visualization, and insights on environmental impact factors such as carbon footprint, water usage, waste production, and sustainability practices.

data-analysis data-visualization database dataset keggle sustainable-fashion

Last synced: 30 Apr 2026

https://github.com/fmind/malpop

Rank the popularity of malware applications by their occurrence on VirusTotal

data-analysis malware popularity ranking virustotal

Last synced: 11 Apr 2025

https://github.com/estevan-ulian/py-agent-voice

A project for handling voice interactions between a human and an AI agent, enabling the reading and analysis of data from a CSV file.

agent-based-modeling data-analysis python3 whisper-ai

Last synced: 11 Apr 2025

https://github.com/badranalyst/startup-expansion-analysis-with-pandas-matplotlib-and-power-bi

Analyzes startup growth and expansion factors using Pandas for data analysis and Matplotlib for visualizations. Complements findings with data visualizations in Power BI, providing actionable insights into funding and market trends.

dashboard data-analysis data-visualization dataset matplotlib matplotlib-pyplot pandas power-bi powerbi

Last synced: 16 May 2026

https://github.com/josedanielchg/nyc-schools-test-scores-exploration

DataCamp project analyzing NYC public school test scores to identify top math-performing schools, the best overall SAT scores, and borough-level variability using Python and pandas

data-analysis jupyter-notebook python

Last synced: 19 Mar 2025

https://github.com/swat1563/recommendation-system

This repository features a recommendation system and analytics engine using datasets on users, organizations, contents, contacts, events, and recommendations. It includes data preprocessing, building a recommendation system, and creating visual reports with Power BI.

analytics data-analysis data-visualization engine kaggle numpy pandas powerbi powerbi-dashboards powerbi-desktop powerbi-reports python recommendation-engine recommendation-system recommender-systems scikit-learn scipy

Last synced: 07 Jan 2026

https://github.com/damianmarti/big-mac-index

Data analysis of the Big Mac Index

data-analysis data-science

Last synced: 03 Apr 2025

https://github.com/rijul007/diamonds-analysis-using-r

Diamonds data analysis using R, exploring relationships between diamond attributes (such as carat, cut, color, and clarity) and price, with a focus on providing insights for engagement ring selection through various statistical techniques and data visualizations including histograms, boxplots, scatter plots, and bar charts.

data-analysis data-science

Last synced: 25 Jan 2026

https://github.com/rahil-p/nba-hackathon

2018 NBA Hackathon application

data-analysis data-wrangling

Last synced: 16 May 2026

https://github.com/istinnew/enaic-s-discount-strategy-analysis

**(Open to Collaboration):** This project evaluates the impact of discounts on sales and customer retention for Eniac. It includes data cleaning, visualization, storytelling, and strategic insights to optimize discount strategies while maintaining brand reputation. 📊🛍️✨

cleaning-data cleaning-data-in-python cost-optimization data-analysis data-science data-visualization library presentation python visualization

Last synced: 03 Apr 2025

https://github.com/elliotone/nl-semantic-kernel-sales-analyzer

A console project showing Microsoft Semantic Kernel examples for sales data analysis using local AI models via LM Studio.

ai csharp data-analysis dotnet lm-studio local-ai machine-learning semantic-kernel

Last synced: 16 May 2026

https://github.com/sadia-khan13/modern_arts_data_cleaning

Welcome to the Data Cleaning project! This repository is dedicated to showcasing best practices and techniques for cleaning data using Pandas within Jupyter Notebook

data-analysis data-analysis-python data-cleaning data-science jupyter-notebook pandas-python

Last synced: 10 May 2026

https://github.com/as16082023/goodcabs-performance-analysis

Codebasics Resume Challenge 13: analysing Goodcabs' performance in transportation across India from January to June 2024

codebasicsresumeprojectchallenge data-analysis goodcabs mysql sql

Last synced: 03 Apr 2025

https://github.com/alejandrolara11/desafio_latam_introduccion_analisis_de_datos

Repository for the course "Introducción al Análisis de Datos" (Introduction to Data Analysis) from Desafío Latam. Practical exercises completed during the course, focused on data analysis with Python, Pandas, and basic visualization.

data-analysis data-science data-visualization matplotlib numpy pandas python seaborn statsmodels

Last synced: 29 Apr 2026

https://github.com/jwt218/isonq

MATLAB package for Qtegra-generated data file processing.

data-analysis geochemistry isotopes matlab

Last synced: 03 Apr 2025

https://github.com/yasir-arafah/nyc-trip-fare-prediction-using-tcn

"NYC Trip Fare Prediction Using Temporal Convolutional Networks (TCN)" is a data analytics project in which NYC taxi trip and fare data are combined, analyzed using PySpark, and visualized using the Matplotlib library. The project predicts the fare using a Temporal Convolutional Network.

colab data-analysis matplotlib nyc-taxi-dataset pyspark python

Last synced: 29 Apr 2026

https://github.com/ggarciajavier/udacity-dalf-project3-test-perceptual-phenomenom

Work performed for the 3rd project of the Udacity Data Analyst Nanodegree: statistical testing of a perceptual phenomenon (Stroop task).

data-analysis python statistical-inference udacity-data-analyst-nanodegree

Last synced: 18 May 2026

https://github.com/pdiegel/currencytracker

A Python application that fetches real-time currency exchange rates from an API, securely stores the data in an SQLite database, and includes error handling, logging, and good programming practices for reliable and periodic data capturing.

analysis api currency data-analysis data-capture logging python python3 sqlite3 tracker

Last synced: 09 Sep 2025

https://github.com/michael-angelo-mootoo/quanta-app

Quanta is an open source statistical package app / toolkit for neuroscience and general computational descriptive and inferential statistics.

computational-statistics customtkinter data-analysis descriptive-statistics gui-application inferential-statistics neuroscience python r statistical-analysis statistics tkinter-python

Last synced: 16 May 2026

https://github.com/grindelfp/two-data-manipulative-tasks

Two simple tasks on data analysis and processing.

data-analysis ipynb mlda

Last synced: 17 Feb 2026

https://github.com/mahmoudwal27/brazilian_ecommerce

This project explores and cleans the Olist Brazilian E-Commerce dataset using Python (Pandas) to prepare it for Power BI visualization. The process includes loading data, performing exploratory analysis, handling missing values and duplicates, formatting key columns, and exporting clean datasets.

analytics data-analysis data-analysis-python google-cloud python

Last synced: 16 May 2026

https://github.com/ishansurdi/data-visualisation-empowering-business-with-effective-insights

The following tasks are completed for Data Visualization: Empowering Business with Effective Insights on Forage in October 2024. It is important to note that this should not be interpreted as an endorsement.

chart communicating-insights-and-analysis dashboard data data-analysis forage powerbi powerbi-visuals tableau tata tata-group virtual-internship visual visualization

Last synced: 17 Feb 2026

https://github.com/karishmagupta05/e-commerce-sales-dashboard

This project is an interactive E-Commerce Sales Dashboard built using Power BI. It provides key insights into sales, profit, and customer behavior through visually engaging charts and graphs.

data-analysis data-visualization powerbi

Last synced: 09 Feb 2026

https://github.com/nick-peter-marcus/chocolate-bar-analysis

Analyzing Chocolate Bar Features and Ratings - Data Visualization, Decision Trees, Random Forest

data-analysis data-visualization decision-trees python random-forest seaborn sklearn

Last synced: 10 May 2026

https://github.com/satyacoder29/smartfinance-dynamic-financial-dashboard

SmartFinance: Dynamic Financial Dashboard is an interactive tool designed to visualize key financial metrics like revenue, expenses, and profit. It features real-time data updates, charts, slicers, and navigation for easy analysis. This dashboard helps businesses make data-driven decisions and optimize financial performance.

data-analysis data-cleaning data-modeling data-visualization powerbi powerbi-desktop powerbi-visuals powerquerym

Last synced: 13 Feb 2026

https://github.com/chrisrobertsjr/chrisrobertsjr

Welcome to my Github Profile!

data data-analysis java r sql statistics

Last synced: 03 May 2026

https://github.com/hassanislam463/sentiment_analysis_of_financial_news_headlines_and_affect_on_stock_price_prediction

This project analyzes financial news sentiment using a fine-tuned RoBERTa model and integrates it with stock data to predict price movements using LSTM and GRU. It highlights the role of sentiment in enhancing stock market forecasting.

data-analysis data-science data-visualization deep-learning lstm-neural-networks nlp-machine-learning

Last synced: 28 Mar 2025

https://github.com/hassanislam463/british-airways-data-science

Analyze Skytrax reviews to uncover customer sentiments and key themes while predicting booking behavior using machine learning. This repository includes data collection, analysis, and modeling scripts alongside concise, visualized insights to improve customer experience and operational efficiency.

data-analysis data-science data-visualization

Last synced: 28 Mar 2025

https://github.com/martachesnova/sql

Performs data modeling (ERD) and data engineering, then writes a series of SQL queries to analyze a company's employee database.

data-analysis data-engineering data-modeling erd postgresql sql

Last synced: 16 May 2026

https://github.com/datalopes1/fifa21_datacleaning

This project carries out data cleaning and manipulation on the dataset "FIFA 21 messy, raw dataset for cleaning/ exploring", which can be found on Kaggle under a CC0: Public Domain license and was uploaded by Rachit Toshniwal.

data-analysis data-cleaning python

Last synced: 30 Apr 2026

https://github.com/engraulleite/local-data-warehousing-with-docker

Creating a data warehouse (DW) from zero to hero, starting with logical and physical modeling and ending with valuable reports.

airbyte data-analysis datawarehouse docker etl-pipeline metabase pgadmin4 postgresql

Last synced: 01 May 2026

https://github.com/gaurav-van/data-analysis-projects

A collection of projects involving data analysis and informed decision making

data-analysis database powerbi sql

Last synced: 06 Sep 2025

https://github.com/colindean/allegheny_voter_reg_analysis

Allegheny County Voter Registration Analysis Tools

data-analysis data-science elections pandas polars python voting

Last synced: 16 May 2026

https://github.com/katarinatmb/serbia-protest-analysis

This project analyzes the frequency, regional distribution, and group characteristics of protests that emerged across Serbia following the fatal collapse of the Novi Sad train station roof in November 2024. The analysis explores how different communities responded in the aftermath of the disaster, using data visualization in RStudio

data-analysis data-visualization r r-mark rstudio

Last synced: 10 Jul 2025

https://github.com/mboula/mboula.github.io

GitHub portfolio + interactive resume | Showcasing data projects in civil rights (housing), cannabis, and analytics

cannabis case-study civil-rights compliance dashboards data-analysis data-cleaning data-vizualization excel google-data-analytics housing open-data pattern-analysis portfolio pro-se public-data r sql tableau

Last synced: 10 Jul 2025

https://github.com/carlosvinimsouza/jupyter-notebook-basic

Stores all work related to Data Science.

data-analysis data-science programas-jupyter-notebook python

Last synced: 11 May 2026

https://github.com/stkisengese/numpy-data-fundamentals

A comprehensive collection of NumPy exercises covering array manipulation, slicing, broadcasting, random data generation, and real-world data analysis applications.

data data-analysis numpy pre-processing

Last synced: 16 May 2026

https://github.com/mfakhriazhar/housing-price-analysis

The price of a house depends on various factors such as building area, exterior quality, and amenities. This dataset provides information on properties for sale, and through Exploratory Data Analysis (EDA), patterns and key factors affecting house prices can be identified.

data-analysis data-science data-visualization eda exploratory-data-analysis python

Last synced: 16 May 2026

https://github.com/ashwin331133/sql-healthcare-data

This repository contains SQL queries designed to analyze health care data. The queries focus on patient demographics, encounter costs, and flu shot statistics, aiming to provide insights into patient behavior and financial impacts. The datasets include information on patient encounters, flu shots, and hospital admissions.

data-analysis mysql sql

Last synced: 29 Oct 2025

https://github.com/jelhamm/internode-hellinger-distance-based-decision-tree

Simulations for the paper "Inter node Hellinger Distance based Decision Tree" by Pritom Saha Akash, Md. Eusha Kadir, Amin Ahsan Ali, and Mohammad Shoyaib

articles data-analysis data-mining decision-tree decision-tree-classifier hddt hellinger-distance-criterion machine-learning numpy-library paper-implementations python scipy-library simulation tree-node

Last synced: 04 Apr 2025

https://github.com/panoschatzi/erythrocyte_study_statistical_analyses

R code for data transformation, analysis and visualization of experimental data, as well as for statistical analyses and quantitative simulations.

afex data-analysis emmeans ggplot2 lme4 purrr r rprogramming rstats rstudio statistics tidyverse visualization

Last synced: 04 Apr 2025

https://github.com/alfioma/ada-xtq

🔗 Simplify data transfer with ada-xtq, a lightweight tool for seamless integration and efficient handling of data between platforms.

ada algorithms api-development artificial-intelligence automation data-analysis data-visualization docker machine-learning neural-networks open-source programming python software-development xtq

Last synced: 01 May 2026

https://github.com/RLAlpha49/AniSearch-Model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2025

https://github.com/ifigeneiatsiflidou/popular-items-sales-analysis

Two data tasks in Python: popular items by ZIP & store sales breakdown with plots.

data-analysis matplotlib pandas

Last synced: 16 May 2026

https://github.com/arkww/chinesenewspaperwordcount

Analyzes the word count of Simplified and Traditional Chinese characters and compares the results

chinese-language data-analysis data-science python

Last synced: 16 May 2026

https://github.com/htsandaruvan/attrition-analytics-suite-by-hello-green

I have created a comprehensive data analytics dashboard to identify factors contributing to attrition,

data-analysis data-analytics data-visualization powerbi

Last synced: 20 Jan 2026

https://github.com/arkww/matmap

Making maps from a Database and making the user guess which map is displayed

data-analysis data-science javascript python

Last synced: 24 Apr 2026

https://github.com/coditheck/data_analysis

Data analysis is the process of inspecting, cleaning, transforming, and modeling data in order to discover useful information, draw conclusions, and support decision making.

data-analysis python

Last synced: 17 Jun 2025

https://github.com/rohitha-tata/churn-predict

Churn Predict uses Machine Learning to analyze customer behavior and identify those likely to leave. It involves data preprocessing, feature selection, model training (Logistic Regression, Random Forest, XGBoost), and evaluation using accuracy and ROC-AUC. The model provides actionable insights to help businesses reduce churn and improve retention

data-analysis logistic-regression machine-learning python

Last synced: 16 May 2026

https://github.com/abhishekyadav915/data-analytics-projects

This project focuses on performing comprehensive data analysis to extract valuable insights from a given dataset. By leveraging various data manipulation, cleaning, and visualization techniques, the project aims to uncover patterns, trends, and correlations that can inform decision-making and strategy.

data-analysis data-visualization dataset

Last synced: 05 Apr 2025

https://github.com/riborings/uranouchi42microdiversity

In this repository live the bash, R and Julia scripts used to explore the microdiversity of the prokaryotic community at Uranouchi Inlet (42-sample time-series) by means of metagenomic shotgun sequencing under the supervision of the Ogata Lab.

big-data data-analysis data-visualisation diversity-analysis marine-ecology marine-ecosystem metagenomics microbiome-analysis prokaryotic-genomes

Last synced: 29 Oct 2025

https://github.com/ygalvao/uow_ai_final_project

This was my Final Project for the Artificial Intelligence Diploma program of The University of Winnipeg - Professional, Applied and Continuing Education (PACE).

data-analysis data-analytics dbscan elections k-means k-means-clustering machine-learning som som-clustering

Last synced: 10 Jul 2025

https://github.com/rorrell/spotifyhistory

A Jupyter Notebook where I wrangle some data and plot a chart to draw some conclusions about a user's Spotify history

data-analysis data-visualisation data-wrangling jupyter-notebook python3

Last synced: 19 May 2026

https://github.com/rita94105/smart_contract_vulnerability_detector

Smart contracts are pivotal in blockchain applications but are prone to vulnerabilities that can lead to significant losses. SmartGuard: Multi-Stage Smart Contract Vulnerability Detection tackles this issue by developing a machine learning framework to identify eight vulnerability types using datasets from Kaggle and Hugging Face.

data-analysis machine-learning smart-contracts streamlit vulnerability-detection

Last synced: 01 Aug 2025

https://github.com/halyusa16/sql-employee-insights

This project dives into employee data to uncover actionable insights using SQL. It mimics real-world HR and business analysis tasks, from salary comparisons to workforce demographics and potential cost-cutting strategies.

data-analysis mysql sql

Last synced: 11 Apr 2025

https://github.com/sukitsubaki/screen-time-tracker

A minimalist Python tracker that records the usage time of various applications and provides insights into your computer usage habits.

application-usage data-analysis monitoring productivity python python-cli screen-time time-tracking

Last synced: 12 Apr 2025