An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
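
As a rough illustration of the four stages named above, the following minimal Python sketch runs inspection, cleansing, transformation, and modeling over a small hypothetical dataset. The records, the function names, and the choice of a least-squares line as the "model" are assumptions made for illustration only; projects in this list typically rely on libraries such as pandas or scikit-learn instead.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, exam_score) as text, with gaps and bad values
RAW = [("1", "52"), ("2", "58"), ("", "60"), ("3", "63"), ("4", "n/a"), ("5", "71")]


def is_numeric(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def inspect_rows(rows):
    """Inspection: count rows containing an empty or non-numeric field."""
    return sum(1 for row in rows if not all(is_numeric(v) for v in row))


def cleanse_rows(rows):
    """Cleansing: keep only fully numeric rows, converted to floats."""
    return [(float(x), float(y)) for x, y in rows if is_numeric(x) and is_numeric(y)]


def transform_rows(rows):
    """Transformation: split (x, y) pairs into two parallel lists."""
    return [x for x, _ in rows], [y for _, y in rows]


def fit_line(xs, ys):
    """Modeling: ordinary least-squares fit of y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


print("rows with problems:", inspect_rows(RAW))
xs, ys = transform_rows(cleanse_rows(RAW))
intercept, slope = fit_line(xs, ys)
print(f"score ~ {intercept:.2f} + {slope:.2f} * hours")
```

Dropping unparseable rows is only one cleansing policy; imputing missing values is a common alternative when too many rows would otherwise be lost.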

https://github.com/datalopes1/fifa21_datacleaning

This project cleans and manipulates the "FIFA 21 messy, raw dataset for cleaning/ exploring" dataset, which is available on Kaggle under a CC0: Public Domain license and was uploaded by Rachit Toshniwal.

data-analysis data-cleaning python

Last synced: 30 Apr 2026

https://github.com/martachesnova/sql

Performs data modeling (ERD) and data engineering, then uses a series of SQL queries to analyze a company's employee database.

data-analysis data-engineering data-modeling erd postgresql sql

Last synced: 16 May 2026

https://github.com/hassanislam463/british-airways-data-science

Analyze Skytrax reviews to uncover customer sentiments and key themes while predicting booking behavior using machine learning. This repository includes data collection, analysis, and modeling scripts alongside concise, visualized insights to improve customer experience and operational efficiency.

data-analysis data-science data-visualization

Last synced: 28 Mar 2025

https://github.com/hassanislam463/sentiment_analysis_of_financial_news_headlines_and_affect_on_stock_price_prediction

This project analyzes financial news sentiment using a fine-tuned RoBERTa model and integrates it with stock data to predict price movements using LSTM and GRU. It highlights the role of sentiment in enhancing stock market forecasting.

data-analysis data-science data-visualization deep-learning lstm-neural-networks nlp-machine-learning

Last synced: 28 Mar 2025

https://github.com/erseco/ugr_tratamiento_inteligente_datos

Working repository for the Tratamiento Inteligente de Datos (Intelligent Data Processing) course of the Master's in Computer Engineering at the Universidad de Granada (UGR).

data-analysis data-mining ugr

Last synced: 26 Apr 2026

https://github.com/errea/vet_clinic_database

This project requires some special preparation. Because its goal is to solve performance issues, those issues first need to be introduced, so you will populate your database with a large amount of data.

data data-analysis data-structures data-visualization database

Last synced: 21 May 2026

https://github.com/kashirin-alex/thither.direct-onamove

An Android skeleton example application for using data from the Thither.Direct platform in mobile applications.

android-application data data-analysis data-structures data-visualization mobile-development mobility query research-data-management

Last synced: 27 Apr 2026

https://github.com/habiburrahman-mu/data-wrangling

Data Wrangling is the process of converting data from the initial format to a format that may be better for analysis.
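A minimal sketch of one common wrangling step, assuming a hypothetical "wide" CSV with one column per year that is reshaped into a "long" table of (country, year, value) records. The input data and the function name `wide_to_long` are illustrative only and are not taken from the repository.

```python
import csv
import io

# Hypothetical "wide" input: one column per year, with one missing measurement
RAW_CSV = """country,2019,2020
A,10,12
B,7,
"""


def wide_to_long(text):
    """Reshape a wide CSV (one column per year) into long (country, year, value) records."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        country = row.pop("country").strip()
        for year, value in row.items():
            # Skip empty cells instead of inventing a value for them
            if value is None or not value.strip():
                continue
            records.append({"country": country, "year": int(year), "value": float(value)})
    return records


for record in wide_to_long(RAW_CSV):
    print(record)
```

The long layout is usually easier to filter, group, and plot by year than the original one-column-per-year format.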

data-analysis data-mining data-science

Last synced: 21 May 2026

https://github.com/andersoncrs/regularizacion_lasso_en_modelos_de_regresion_lineal

This repository contains a detailed analysis of applying Lasso regularization to linear regression models to predict vehicle prices. Starting from a clean dataset, various transformations and models are applied to improve prediction accuracy.

data-analysis data-science data-visualization jupyter-notebook linear-regression regularization-methods seaborn sklearn

Last synced: 16 May 2026

https://github.com/dcs-training/regressionandmixedeffectsmodelling

This course introduces regression and linear mixed-effects models (LMMs). It will help you develop both your theoretical understanding and the practical skills needed to run such models in R.

data-analysis r rmarkdown statistics

Last synced: 25 Feb 2025

https://github.com/dcs-training/introtodatabases

This repository hosts the material for a training course developed by Dave Elsmore (Edina) for CDCS.

data-analysis data-wrangling databases sql

Last synced: 10 Jun 2026

https://github.com/chrisrobertsjr/chrisrobertsjr

Welcome to my GitHub Profile!

data data-analysis java r sql statistics

Last synced: 03 May 2026

https://github.com/satyacoder29/smartfinance-dynamic-financial-dashboard

SmartFinance: Dynamic Financial Dashboard is an interactive tool designed to visualize key financial metrics like revenue, expenses, and profit. It features real-time data updates, charts, slicers, and navigation for easy analysis. This dashboard helps businesses make data-driven decisions and optimize financial performance.

data-analysis data-cleaning data-modeling data-visualization powerbi powerbi-desktop powerbi-visuals powerquerym

Last synced: 13 Feb 2026

https://github.com/nuraj250/datainsighthub

A Node.js backend application that processes and analyzes personal user data to generate personalized insights and recommendations. It features secure user authentication, data upload and storage, custom algorithms for data analysis, and optional real-time notifications and third-party API integrations. Perfect for showcasing backend development.

api-development backend-development bcrypt data-analysis data-analytics data-insights dotenv express jwt-authentication mongodb nodejs passport secure-api user-authentication

Last synced: 09 Apr 2026

https://github.com/teja-1403/forage-tata-data-visualisation-empowering-business-with-effective-insights

This repository contains solutions to the 4 different tasks that must be performed during the Data Visualisation: Empowering Business with Effective Insights virtual internship provided by TATA via Forage.

analysis-and-reporting analytics analytics-and-decision-science charts communications dashboards data-analysis data-cleanup data-interpretation data-storytelling data-visualizations graph insights power-bi visual-basic visualizations

Last synced: 18 Feb 2026

https://github.com/daniel-jcvv/daniel-jcvv

👨‍💻 Data Engineer | 3+ years of enterprise experience with Telcel and Citi Banamex. Develops ETL pipelines, data governance, and cloud solutions, building scalable data architectures and automated workflows for Fortune 500 clients. Tech Stack: Python, SQL Server, Oracle, Apache Airflow, PySpark

agentic-ai apache-airflow apache-kafka apache-spark automation business-intelligence citi-bank-apis data-analysis data-engineering data-lake data-warehouse etl-pipeline medallion-architecture mlops n8n-workflow python rag sql-server

Last synced: 15 Apr 2026

https://github.com/nick-peter-marcus/chocolate-bar-analysis

Analyzing Chocolate Bar Features and Ratings - Data Visualization, Decision Trees, Random Forest

data-analysis data-visualization decision-trees python random-forest seaborn sklearn

Last synced: 10 May 2026

https://github.com/srikarveluvali/dataanalysis

The "Dataset - Extraction, Analysis, and Visualization" project is a Python-based data analysis venture that focuses on exploring and interpreting the "Video Game Sales Analysis" dataset.

css data-analysis html javascript matplotlib numpy pandas python seaborn tableau

Last synced: 09 Apr 2026

https://github.com/mvharsh/blinkit-sales-dashboard

An interactive Power BI dashboard visualizing Blinkit's sales performance across outlets, item types, and customer ratings for strategic insights.

blinkitdashboard data-analysis data-visualization powerbi

Last synced: 25 Jan 2026

https://github.com/shafaq-aslam/data-gathering

A hands-on collection of notebooks exploring multiple data-gathering techniques, from reading CSV, Excel, JSON, and SQL files to exporting data in various formats and fetching real-time data through APIs. This repository documents my complete learning journey in data ingestion, preparation, and extraction for data analysis workflows.

api data-analysis data-export data-gathering data-import data-science jupyter-notebook machine-learning pandas python python3

Last synced: 21 May 2026

https://github.com/karishmagupta05/e-commerce-sales-dashboard

This project is an interactive E-Commerce Sales Dashboard built using Power BI. It provides key insights into sales, profit, and customer behavior through visually engaging charts and graphs.

data-analysis data-visualization powerbi

Last synced: 09 Feb 2026

https://github.com/as16082023/global-electronics-retailer

Analyzed Maven Electronics' performance data to identify factors driving revenue decline since 2020.

advanced-excel data-analysis data-visualization

Last synced: 03 Feb 2026

https://github.com/anonymo2239/big-data-churn-analyzer

Scalable customer churn prediction using PySpark. Includes EDA, feature engineering, modeling, and real-time inference on new data.

big-data churn-analysis churn-prediction classification-algorithm data-analysis data-science data-visualization modeling pyspark

Last synced: 21 May 2026

https://github.com/ishansurdi/data-visualisation-empowering-business-with-effective-insights

The following tasks were completed for Data Visualization: Empowering Business with Effective Insights on Forage in October 2024. This should not be interpreted as an endorsement.

chart communicating-insights-and-analysis dashboard data data-analysis forage powerbi powerbi-visuals tableau tata tata-group virtual-internship visual visualization

Last synced: 17 Feb 2026

https://github.com/berkekaragoz/media-investments-data-analysis

Distribution of advertising investments in Turkey by medium

data-analysis r

Last synced: 19 Aug 2025

https://github.com/gui-sitton/bank-loans

In this project, I prepare a report for a bank's loan division, investigating whether a customer's marital status and number of children, as well as other factors, affect loan default.

data data-analysis data-analysis-python data-science data-visualization python

Last synced: 21 May 2026

https://github.com/abhipatel35/moviematcher-movie-recommender-system

A robust movie recommendation system using the MovieLens dataset, employing Collaborative Filtering, Matrix Factorization, and Hybrid Models to enhance recommendation accuracy and diversity.

collaborative-filtering content-based-filtering data-analysis eda hybrid-models machine-learning matrix-factorization movie-recommendations movielens-dataset python recommender-system surprise-library

Last synced: 21 May 2026

https://github.com/dcs-training/much-ado-about-nothing-missing-data-in-research

Repository for the Much Ado About Nothing workshop.

data-analysis data-cleaning data-wrangling r

Last synced: 15 Jun 2025

https://github.com/mahmoudwal27/brazilian_ecommerce

This project explores and cleans the Olist Brazilian E-Commerce dataset using Python (Pandas) to prepare it for Power BI visualization. The process includes loading data, performing exploratory analysis, handling missing values and duplicates, formatting key columns, and exporting clean datasets.

analytics data-analysis data-analysis-python google-cloud python

Last synced: 16 May 2026

https://github.com/tapas-gope/pizza-sales

This project analyzes Pizza Sales Data to provide insights into customer preferences and sales performance. Key metrics include total revenue, orders, and average order value, with a breakdown by pizza category and size. The dashboard identifies peak sales periods and top-selling items, supporting data-driven business decisions.

business-intelligence dashboard data-analysis data-visualization dax powerbi sales-analysis

Last synced: 02 Jan 2026

https://github.com/kaushik-puttaswamy/food-delivery-time-prediction-using-machine-learning

The Food Delivery Time Prediction Model estimates delivery times using regression algorithms, with XGBoost as the best performer, and is deployed as a real-time application via Streamlit.

data-analysis data-science delivery food-delivery geolocation machine-learning modeldeployment predictive-modeling python realtimeproject regression-models streamlit xgboost

Last synced: 16 Apr 2026

https://github.com/shivani8136/bellabeat-smart-device-data-analysis

This project analyzes smart device fitness data to uncover insights into user behavior, engagement, and wellness patterns. Conducted for Bellabeat, a high-tech company specializing in health-focused smart products for women, this analysis supports strategic decisions around product development and feature prioritization.

data-analysis data-visualization r-programming-language

Last synced: 08 Feb 2026

https://github.com/grindelfp/two-data-manipulative-tasks

Two simple tasks on data analysis and processing.

data-analysis ipynb mlda

Last synced: 17 Feb 2026

https://github.com/leabrodyheine/ml-kaggle-cirrhosis-data

This project showcases skills in machine learning, data preprocessing, and model evaluation using Python libraries such as scikit-learn, XGBoost, and Optuna. It involves implementing various machine learning models, handling imbalanced data, and employing imputation techniques to enhance model performance for predicting cirrhosis outcomes.

data-analysis data-pre imbalanced-data imputation machine-learning optuna pipeline scikit-learn xgboost

Last synced: 14 May 2026

https://github.com/michael-angelo-mootoo/quanta-app

Quanta is an open source statistical package app / toolkit for neuroscience and general computational descriptive and inferential statistics.

computational-statistics customtkinter data-analysis descriptive-statistics gui-application inferential-statistics neuroscience python r statistical-analysis statistics tkinter-python

Last synced: 16 May 2026

https://github.com/rajesh9943/visualizing-global-development-trends-an-animated-analysis-of-life-expectancy-and-fertility-rates

Cleans and analyzes data to find trends in global population, fertility, and life expectancy from 1960 to 2016. The idea was inspired by Hans Rosling. The analysis uses a scatter bubble chart, which clearly shows how population increased and fertility rates decreased from 1960 to 2016.

data-analysis data-cleaning-and-preprocessing data-exploration expolatory-data-analysis identify-patterns reporting vizualisation

Last synced: 08 Oct 2025

https://github.com/touradbaba/multi-page_dash_application

This repository contains a Multi-Page Dash Application designed to provide interactive visualizations of geo-spatial data, focusing on population and GDP. The app offers insights into demographic and economic trends through interactive maps and various types of charts. It is built with Python, using Plotly and Dash, and is deployed on Heroku.

dash dashboard data-analysis data-visualization exploratory-data-analysis heroku-deployment plotly pythonanywhere

Last synced: 27 Jul 2025

https://github.com/balajimohan18/power-bi-visualization-project

This repository contains visualization projects built with Power BI. The visualizations yield insights and strategies that can help a business achieve higher profit margins and reduce losses from accidents and calamities.

data-analysis data-cleaning data-science data-visualization exploratory-data-analysis microsoft-excel microsoft-power-bi microsoft-powerpoint powerbi powerbi-visuals powerpoint-slides

Last synced: 08 Mar 2026

https://github.com/pdiegel/currencytracker

A Python application that fetches real-time currency exchange rates from an API, securely stores the data in an SQLite database, and includes error handling, logging, and good programming practices for reliable and periodic data capturing.

analysis api currency data-analysis data-capture logging python python3 sqlite3 tracker

Last synced: 09 Sep 2025

https://github.com/kheriberto/logistic_regression_project

A project that analyses dummy data from an advertising company using logistic regression.

data-analysis logistic-regression pandas python scikit-learn seaborn

Last synced: 08 Apr 2026

https://github.com/andrii04/andreamonforte-bi-assignment

Automated Data Pipeline that ingests daily GA4-formatted CSV files from a private Google Cloud Storage bucket, validates and loads them into BigQuery, and prepares analysis-ready views. The solution is built for deployment as a Cloud Function triggered by Cloud Scheduler and uses Python with the Google Cloud Storage and BigQuery client libraries.

automation bigquery cloud cloudfunctions data data-analysis data-engineering etl etlpipeline gcp google googlecloudplatform pipeline python sql

Last synced: 09 Nov 2025

https://github.com/ggarciajavier/udacity-dalf-project3-test-perceptual-phenomenom

Work performed for the 3rd project of the Udacity Data Analyst Nanodegree: statistical testing of a perceptual phenomenon (the Stroop task).

data-analysis python statistical-inference udacity-data-analyst-nanodegree

Last synced: 18 May 2026

https://github.com/hayatiyrtgl/cryptocurrency_time_series_rnn

Python script for training a Simple RNN model on cryptocurrency price data to predict future prices, including data exploration and evaluation

data-analysis data-science data-visualization keras pandas pandas-python prediction predictive-modeling python python-script rnn rnn-tensorflow tensorflow time-series time-series-analysis

Last synced: 08 Apr 2026

https://github.com/yasir-arafah/nyc-trip-fare-prediction-using-tcn

"NYC Trip Fare Prediction Using Temporal Convolutional Networks (TCN)" is a data analytics project in which NYC taxi trip and fare data are combined, analyzed using PySpark, and visualized using the Matplotlib library. The project predicts fares using a Temporal Convolutional Network.

colab data-analysis matplotlib nyc-taxi-dataset pyspark python

Last synced: 29 Apr 2026

https://github.com/l0rd-inquisit0r/data-analytics

A repository of data analytics implementations in Python

ai data-analysis data-analysis-python data-analytics

Last synced: 18 Jun 2025

https://github.com/gutow/langmuir_trough

Code to run a homebuilt Langmuir trough using Jupyter and Python.

data-acquisition data-analysis jupyter langmuir-trough plotting

Last synced: 11 Aug 2025

https://github.com/ejw-data/tableau-drug-study

Brief analysis of drug treatments that were also analyzed with pandas

data-analysis tableau

Last synced: 02 Jan 2026

https://github.com/saidulalimallick04/smart-traffic-violation-pattern-detector-dashboard

This project is a Streamlit web application designed to analyze traffic violation data. It provides a user-friendly interface to explore, visualize, and gain insights from traffic violation datasets. Users can upload their own data, perform analysis, and view summaries and trends.

dashboard data-analysis data-visualization internship-project pandas python smart-traffic streamlit

Last synced: 18 Apr 2026

https://github.com/rajkumargara/bike_rental_data_analysis

Chicago bike rental data analysis for business insights using R programming

data-analysis data-visualization data-wrangling large-dataset machine-learning-algorithms

Last synced: 11 Aug 2025

https://github.com/faizantkhan/automated-eda

This repository showcases tools for automatic Exploratory Data Analysis (EDA) in Python. These tools help you quickly understand your datasets and generate insightful reports.

automatic automation autoviz data-analysis data-analysis-python data-science data-visualization dtale dtale-library eda exploratory-data-analysis ml pandas pandas-profiling python python-library sweetviz

Last synced: 18 Apr 2026

https://github.com/anamakarevich/suicide_rates_factors

Female suicide rates analysis for a Udacity Hackathon

data-analysis data-cleaning linear-regression suicide

Last synced: 21 May 2026

https://github.com/puspacempaka/hackerrank-sql-challenges-intermediate

This repository features solutions to various intermediate-level SQL challenges from HackerRank. It includes efficient SQL queries, problem-solving techniques, and well-documented scripts. Explore these solutions to understand different SQL problems and enhance your skills.

challenges data-analysis database hackerrank-solutions queries sql sql-intermediate-level

Last synced: 02 Jan 2026

https://github.com/simranshaikh20/credit-card-dashboard

A data visualization project using Microsoft Power BI

data-analysis data-visualization powerbi

Last synced: 02 Jan 2026

https://github.com/mmzong/gee_lifestyleeffectsonhypertension

Generalized Estimating Equations (GEE), Quasi-likelihood under the Independence Model Criterion (QIC), Longitudinal data, Embedded box plots within violin plots with hypertension risk categories, spaghetti plots, aggregate line plots, histograms, faceted-area plots, box and jitter plots. Investigating the impact of lifestyle on health.

aggregate-line-plot area-faceted-plots box-plots data-analysis data-manipulation data-science data-visualization generalized-estimating-equations histograms jitter-plots longitudinal-data qic quasi-likelihoods r spaghetti-plots violin-plots

Last synced: 29 Jul 2025

https://github.com/iliyasalve/cyclistic_case_study

Analysis of the Bike-Sharing System for the following question: "How do annual members and casual riders use Cyclistic bikes differently?"

bike-sharing data data-analysis data-visualisation r

Last synced: 06 Apr 2025

https://github.com/erayagdogan/simplecharts

Simple Charts is a chart-making Jetpack Compose app with Material 3 design. Charts are created using the lets-plot-compose library.

android android-app charts data-analysis data-visualization jetpack-compose lets-plot-kotlin material-3 viewmodel

Last synced: 29 Jun 2026

https://github.com/ginga1402/car_price_prediction

Predict the price of a car using MS Excel.

college-project data-analysis excel linear-regression

Last synced: 30 Mar 2025

https://github.com/abhishekyadav915/diwali_sales_analysis

This project aims to analyze sales data during the Diwali festival using Python. The analysis focuses on identifying key trends, customer purchasing behavior, and sales performance across different segments. By leveraging data visualization and statistical analysis, we uncover insights.

data-analysis data-visualization matplotlib-pyplot numpy-library pandas-dataframe seaborn-python

Last synced: 05 Apr 2025

https://github.com/tashi-2004/data-visualization-tableau-traffic-collision-insights

Analysis of traffic collision data using Tableau, featuring interactive visualizations that highlight trends in injuries and fatalities, contributing factors, and geographic distributions. It includes various sheets and dashboards, with recommendations for enhancing road safety. The dataset is available for further exploration.

data-analysis data-visualization eda geospatial-analysis machine-learning predictive-modeling statistics tableau traffic-analysis

Last synced: 19 Mar 2026

https://github.com/rajesh9943/web-scraping-analysis-of-top-us-company-revenue-growth-in-2023

Explore the landscape of US business growth in 2023 with our dynamic project, 'Web Scraping for US 2023 Revenue Growth.' Utilizing advanced web scraping techniques, we unveil insights into the top companies driving economic expansion.

cleaning-data data data-analysis data-visualization manipulation numpy pandas pre-fill

Last synced: 16 Aug 2025

https://github.com/jwt218/isonq

MATLAB package for Qtegra-generated data file processing.

data-analysis geochemistry isotopes matlab

Last synced: 03 Apr 2025

https://github.com/jabulente/t-test-python-implementation

A Python-based implementation of one-sample, two-sample, and paired t-tests for statistical analysis and hypothesis testing.
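For orientation, the test statistics behind these three tests can be sketched with Python's standard library. This is not the repository's code: the function names are hypothetical, the two-sample version assumes equal variances (Student's pooled test rather than Welch's), and p-values are omitted because the standard library has no t-distribution CDF.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


def two_sample_t(a, b):
    """Student's two-sample t statistic, assuming equal population variances (pooled)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def paired_t(before, after):
    """Paired t test: a one-sample test on the differences against zero."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0.0)


print(one_sample_t([2, 4, 6], 2))
print(two_sample_t([1, 2, 3], [4, 5, 6]))
print(paired_t([1, 2, 3], [2, 4, 6]))
```

The sketch divides by zero if a sample (or the paired differences) has no variation. In practice, SciPy's `scipy.stats.ttest_1samp`, `ttest_ind`, and `ttest_rel` compute these tests together with p-values.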

automation data-analysis data-science eda exploratory-data-analysis hypothesis-testing independent-ttest one-sample-t-test python reporting statistics ttest two-sample-t-test

Last synced: 27 Jun 2025

https://github.com/thesfinox/mltools

A collection of simple tools for data science and machine learning projects.

ai data-analysis data-science data-visualization logging machine-learning matplotlib neural-network python toolbox

Last synced: 14 May 2025

https://github.com/chaedoll/teamproject-foreignerreport

Report on improving infrastructure for foreign residents in Korea

data-analysis python

Last synced: 25 Feb 2025

https://github.com/alejandrolara11/desafio_latam_introduccion_analisis_de_datos

Repository for the "Introducción al Análisis de Datos" (Introduction to Data Analysis) course by Desafío Latam. It contains hands-on exercises completed during the course, focused on data analysis with Python, Pandas, and basic visualization.

data-analysis data-science data-visualization matplotlib numpy pandas python seaborn statsmodels

Last synced: 29 Apr 2026

https://github.com/hossein-rahmati/credit-card-fraud-detection

This repository contains the implementation of a machine learning pipeline for detecting fraudulent credit card transactions. The project leverages common data science libraries to preprocess data, train multiple models, and evaluate their performance using appropriate classification metrics.

data-analysis fraud-detection k-fold-cross-validation machine-learning random-forest-classifier

Last synced: 15 Sep 2025

https://github.com/leticiamilan/dashboard-analitico-de-vendas-globais

Global Sales Analytics Dashboard - DSA - Built with Power BI

dashboard dashboard-power-bi data-analysis power-bi powerbi

Last synced: 03 Feb 2026

https://github.com/mr-chang95/datascience_airbnb

Data Science Project for Udacity's Data Scientist Program. Using Python in Jupyter Notebook.

airbnb data-analysis data-science data-visualization jupyter-notebook numpy pandas python sklearn

Last synced: 08 Apr 2026

https://github.com/as16082023/goodcabs-performance-analysis

Codebasics Resume Challenge 13: analysing Goodcabs' transportation performance across India from January to June 2024

codebasicsresumeprojectchallenge data-analysis goodcabs mysql sql

Last synced: 03 Apr 2025

https://github.com/ct83/become-a-data-analyst-udacity

This repository contains all of the code, projects and reports that I wrote as I pursued my Udacity - Data Analyst NanoDegree.

data-analysis data-analysis-python data-analyst data-visualisation data-visualization-project datascience python udacity udacity-data-analyst-nanodegree

Last synced: 12 Aug 2025

https://github.com/sadia-khan13/modern_arts_data_cleaning

Welcome to the Data Cleaning project! This repository showcases best practices and techniques for cleaning data using Pandas within Jupyter Notebook.

data-analysis data-analysis-python data-cleaning data-science jupyter-notebook pandas-python

Last synced: 10 May 2026

https://github.com/elliotone/nl-semantic-kernel-sales-analyzer

A console project showing Microsoft Semantic Kernel examples for sales data analysis using local AI models via LM Studio.

ai csharp data-analysis dotnet lm-studio local-ai machine-learning semantic-kernel

Last synced: 16 May 2026

https://github.com/istinnew/enaic-s-discount-strategy-analysis

**(Open to Collaboration):** This project evaluates the impact of discounts on sales and customer retention for Eniac. It includes data cleaning, visualization, storytelling, and strategic insights to optimize discount strategies while maintaining brand reputation. 📊🛍️✨

cleaning-data cleaning-data-in-python cost-optimization data-analysis data-science data-visualization library presentation python visualization

Last synced: 03 Apr 2025

https://github.com/kunalkumar2001/sales-project-using-excel-and-sql

Comprehensive sales analysis using SQL, Excel, and PowerPoint to uncover insights on top-sellers, peak times, and branch performance.

data-analysis data-analytics excel mssql sql

Last synced: 03 Nov 2025

https://github.com/rahil-p/nba-hackathon

2018 NBA Hackathon application

data-analysis data-wrangling

Last synced: 16 May 2026