An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with big-data-analytics

A curated list of projects in awesome lists tagged with big-data-analytics.

https://github.com/ict-bda/easyml

Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.

big-data-analytics learning-platform machine-learning machine-learning-platform machine-learning-studio

Last synced: 15 May 2025

https://github.com/ICT-BDA/EasyML

Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.

big-data-analytics learning-platform machine-learning machine-learning-platform machine-learning-studio

Last synced: 27 Mar 2025

https://github.com/mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark

big-data big-data-analytics data-algorithms pyspark spark spark-dataframes spark-rdd

Last synced: 14 May 2025

https://github.com/v6d-io/v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)

big-data-analytics cloud-native cncf distributed distributed-comp distributed-systems graph-analytics in-memory-storage shared-memory sig-storage tag-storage

Last synced: 20 Mar 2025

https://github.com/MrXujiang/v6.dooring.public

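A large-screen data visualization dashboard solution that provides a visual editing engine, helping individuals and enterprises easily build their own large-screen visualization applications.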

antv big-data big-data-analytics bigdata dooring low-code lowcode nodejs react webgl2

Last synced: 07 Feb 2026

https://github.com/caioricciuti/ch-ui

Use CH-UI to work with the data in your self-hosted ClickHouse through a user-friendly interface. CH-UI is a modern and feature-rich user interface for ClickHouse databases. It offers an intuitive platform for querying ClickHouse databases, executing queries, and visualizing metrics about your instance.

big-data big-data-analytics big-data-visualization clickhouse-ui

Last synced: 24 Feb 2026

https://github.com/mrxujiang/v6.dooring.public

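A large-screen data visualization dashboard solution that provides a visual editing engine, helping individuals and enterprises easily build their own large-screen visualization applications.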

antv big-data big-data-analytics bigdata dooring low-code lowcode nodejs react webgl2

Last synced: 05 Apr 2025

https://github.com/lithops-cloud/lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀

big-data big-data-analytics cloud-computing data-processing distributed kubernetes multicloud multiprocessing object-storage parallel python serverless serverless-computing serverless-functions

Last synced: 03 Jan 2026

https://github.com/rouyang2017/SISSO

A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.

big-data-analytics compressed-sensing machine-learning material-science symbolic-regression

Last synced: 04 May 2025

https://github.com/Ashish7129/Graph_Sampling

Graph Sampling is a Python package containing various approaches that sample the original graph at different sample sizes.

big-data big-data-analytics breadth-first-search data-mining graphs induction network network-analysis network-science networkx python random-walk sample sampling social-network-analysis subgraph

Last synced: 19 Jul 2025

https://github.com/archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

analysis apache-spark big-data big-data-analytics dataframe digital-humanities hadoop network-graphing pyspark python3 scala spark text-extraction webarchives

Last synced: 13 Apr 2025

https://github.com/u2i/egis

Egis - a handy Ruby interface for AWS Athena

aws aws-athena big-data big-data-analytics ruby ruby-gem

Last synced: 16 Jul 2025

https://github.com/ingef/conquery

Visual, interactive queries against big databases

big-data big-data-analytics java

Last synced: 08 May 2025

https://github.com/wittline/pyspark-on-aws-emr

The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing PySpark code.

aws aws-emr big-data big-data-analytics dataengineering ec2-spot ec2-spot-instances emr-cluster pyspark python spark wordcloud-generator

Last synced: 13 Apr 2025

https://github.com/arakat-community/arakat

ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform

big-data-analytics business-intelligence cloud-native-applications data-pipelines distributed-systems docker docker-swarm predictive-maintenance

Last synced: 07 May 2025

https://github.com/azure/azurekusto

R interface to Azure Data Explorer, aka Kusto

azure azure-data-explorer azure-sdk-r big-data-analytics kusto r

Last synced: 20 Oct 2025

https://github.com/jdvelasq/courses

Supporting material for courses, Facultad de Minas, Universidad Nacional de Colombia

analytics big-data big-data-analytics data-science training-materials

Last synced: 23 Aug 2025

https://github.com/seeratawan01/autocapture.js

Build your own analytics: a single library that grabs every click, touch, page view, and fill, forever.

analytics autocapture big-data-analytics events heatmap user-behavior-analytics user-behaviour user-events user-interaction

Last synced: 30 Apr 2025

https://github.com/srlozano/tinder-big-data-analysis

Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech

big-data big-data-analytics data-science dating-app mongodb python

Last synced: 11 Oct 2025

https://github.com/adityakamble49/loss-ratio-prediction

Predicting Loss Ratios for Auto Insurance Portfolios - ITCS 6100 Big Data Analytics for Competitive Advantage

big-data big-data-analytics data-science insurance jupyter-notebook politics python

Last synced: 04 Apr 2026

https://github.com/fiware/tutorials.big-data-flink

FIWARE 305: Real-time Processing of Context Data using Apache Flink

apache-flink big-data-analytics fiware fiware-cosmos flink orion-flink-connector tutorial

Last synced: 30 Apr 2025

https://github.com/n1ghtf1re/map-of-emergency-incidents

Emergency Map lets you effectively visualize multi-dimensional information through an intuitive interface. The code is easy to modify for use in a variety of areas. Color mixing enhances the perception and analysis of the information.

big-data big-data-analytics big-data-visualization bigdata color-mixing colors data data-analytics data-science data-visualization data-visualization-challenges data-visualization-simpler mysql open-source-project php student-project

Last synced: 18 Mar 2025

https://github.com/asavinov/bistro

A general-purpose data analysis engine radically changing the way batch and stream data is processed

analytics big-data-analytics edge-analytics iot stream-analytics stream-processing

Last synced: 10 Sep 2025

https://github.com/bydevmar/master_masd_fpo

This GitHub repository gathers all the lectures, practical work (TP), tutorial sessions (TD), projects, and exercises from my master's program in applied mathematics for data science. Browse it for a complete view of my academic path and a detailed perspective on what I learned in this field.

acp afc algebra big-data-analytics dashboards data-analysis datascience economics english graph-theory latex linear-algebra non-linear-algebra probability prog python scientific-research software-package statistics

Last synced: 05 May 2025

https://github.com/ren294/covid-data-process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

airflow aws aws-ec2 aws-quicksight big-data big-data-analytics covid19-data docker docker-compose hadoop-hdfs hdfs hive kafka nifi pipeline redpanda spark spark-sql spark-streaming sparksql

Last synced: 29 Oct 2025

https://github.com/msusazureaccelerators/workplace-intelligence-accelerator

The Workplace Intelligence Accelerator leverages machine learning and big data analytics to combine and transform data, allowing customers to easily identify factors that influence how people work in their organization.

accelerator ai artificial-intelligence azure azure-devops big-data-analytics human-resources m365 machine-learning microsoft ml power-bi workplace-analytics

Last synced: 07 Oct 2025

https://github.com/sanketrs/implementation-of-modern-data-engineering-architecture-with-fabric_analytics

Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.

azure azure-data-factory azure-fabric bi-analytics big-data-analytics big-data-projects cloud-data-warehouse cloud-dataflow data-analytics data-engineering data-engineering-pipeline data-engineering-project data-pipeline-monitoring data-science data-visualization data-warehouse etl etl-framework etl-pipeline

Last synced: 14 Apr 2026

https://github.com/ren294/log-analysis-project

This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.

apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming

Last synced: 08 Jul 2025

https://github.com/logannye/emsqrt

Process any data size with a fixed, small memory footprint. EM-√ is an external-memory ETL/log processing engine with hard peak-RAM guarantees. Unlike traditional systems that "try" to stay within memory limits, EM-√ enforces a strict memory cap, enabling you to process arbitrarily large datasets using small memory footprints.

big-data big-data-analytics cloud cloud-computing edge-ai edge-computing efficiency efficient-algorithm memory-allocation rust streaming streaming-algorithms streaming-data

Last synced: 27 Apr 2026

https://github.com/dgkanatsios/gameanalyticseventhubfunctionscosmosdatalake

Big data reference architecture and implementation for an online multiplayer game

big-data big-data-analytics data-lake-analytics event-hubs lambda-architecture

Last synced: 14 Apr 2025

https://github.com/angeligareta/spark-flight-prediction

Assignment for Cloud Computing And Big Data Ecosystems Design subject that aims to predict flight arrival time using Apache Spark and Scala.

apache-spark big-data big-data-analytics cloud-computing scala upm

Last synced: 19 May 2026

https://github.com/harshoza36/movielens_pyspark

MovieLens dataset analysis using Hadoop and PySpark

big-data-analytics hadoop movielens movielens-data-analysis pyspark spark spark-sql

Last synced: 17 May 2026

https://github.com/garystafford/dataproc-java-demo

Demonstration of Google Cloud Dataproc for running Spark jobs with Java

big-data-analytics dataproc gcp google java spark

Last synced: 03 Aug 2025

https://github.com/fiware/tutorials.big-data-spark

FIWARE 306: Real-time Processing of Context Data using Apache Spark

apache-spark big-data-analytics fiware fiware-cosmos orion-spark-connector spark tutorial

Last synced: 27 Feb 2026

https://github.com/bayunova28/sas_visual_data_mining_machine_learning

This repository contains my weekly projects from the Big Data Analytics II course at my college

big-data-analytics big-data-projects data-science machine-learning neural-network

Last synced: 20 Mar 2026

https://github.com/aveek-saha/cricket-score-predictor

A Big data application to predict the outcome of a T20 cricket match.

big-data big-data-analytics clustering pyspark spark spark-mllib

Last synced: 11 Apr 2026

https://github.com/ssiarhei115/customer-classification

Developing an ML model that predicts a bank customer's inclination to open a deposit

big-data big-data-analytics data data-science data-visualization mashine-learning

Last synced: 09 Apr 2025

https://github.com/windi-wulandari/pbi_kimia-farma-x-rakamin

A data-driven analytics project for Kimia Farma to evaluate business performance from 2020-2023 using BigQuery. Focused on transaction data, inventory, branch operations, and product insights. Results were visualized through an interactive dashboard to support strategic decisions and optimizations.

big-data-analytics bigquery datawarehouse googlelooker sql

Last synced: 03 Jan 2026

https://github.com/pdoup/avoulos

Big Data Analytics Project - Fall '21

big-data-analytics spark

Last synced: 15 May 2026

https://github.com/vara-co/home_sales

Module 22 challenge: Using Google Colab to work on Big Data queries with PySpark SQL, parquet, and cache partitions

big-data big-data-analytics cache google-colab google-colaboratory parquet pyspark pyspark-sql

Last synced: 28 Mar 2025

https://github.com/adnanrahin/spark-flights-data-analysis

The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report and in this dataset of 2015 flight delays and cancellations.

apache-spark big-data-analytics docker docker-compose docker-container java maven spark spark-sql spark-streaming

Last synced: 08 Apr 2026

https://github.com/abroniewski/idlecompute-data-management-architecture

Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.

bdma big-data big-data-analytics bigdata dataops hadoop-hdfs machine-learning parquet pipeline pyspark-mllib

Last synced: 08 Aug 2025

https://github.com/jabhij/predictionmodels_heartdiseases

Comparison of various Machine Learning algorithms for Heart Diseases (Heart Attack) prediction.

big-data-analytics bigdata data-visualization datamodeling decision-tree hive knn logistic-regression machine-learning mapreduce naviebayes random-forest svm

Last synced: 04 Jun 2026

https://github.com/nickenshidqia/big_data_analytics_kimia_farma

A Big Data Analytics project whose challenge is to create a data mart design and a dashboard for Kimia Farma

big-data-analytics dashboard data-analyst looker-studio postgresql

Last synced: 10 Apr 2025

https://github.com/theoliverlear/crypto-trader

A Spring Boot web app that buys and sells cryptocurrencies from API data sources. Its quick trading and other features allow users to leverage computer power to outperform the market.

ai ai-models big-data-analytics cryptocurrency data-science financial full-stack hibernate-jpa machine-learning nodejs python sass service spring-boot tensorflow typescript website

Last synced: 13 Apr 2026

https://github.com/smusab9152/pyspark_programs_and_projects

Collection of PySpark programs and projects demonstrating the use of Apache Spark's Python API for big data processing and analysis. It includes practical implementations such as logistic regression classification, data analysis on the Iris dataset, and basic PySpark operations like temperature conversion.

apache-spark big-data big-data-analytics data-engineering distributed-computing etl pyspark spark-dataframes spark-rdd spark-sql

Last synced: 18 May 2026

https://github.com/vinay-ram1999/data-engineer-playground

A fully containerized multi-service environment to prototype end-to-end ETL workflows.

airflow big-data-analytics data-engineering delta-lake docker iceberg lakehouse minio nessie postgresql spark sql trino unitycatalog

Last synced: 14 Apr 2026

https://github.com/rohith-2/big_data_analysis

Performance of Aircraft in the US from 1987 to 2008.

apache-spark big-data big-data-analytics bigdata dashboard scala spark tableau

Last synced: 23 Jan 2026

https://github.com/nathanvilbert/kaiture-agriculture-business-reports-with-power-bi

The project "Kaiture-Agriculture-Business-Reports-with-Power-BI" focuses on utilizing Business Intelligence to optimize agricultural yield and productivity. By integrating Power BI for data analysis, this project provides comprehensive insights into crop production patterns, market trends, and key factors affecting yield.

big-data-analytics data-visualization power-bi sas swot-analysis

Last synced: 19 Mar 2026

https://github.com/bayunova28/sas_viya_programming

This repository contains my weekly projects from the Big Data Analytics II course at my college

big-data-analytics big-data-projects data-science sas sas-viya

Last synced: 20 Mar 2026

https://github.com/sohhamseal/random-topic-ppts

A set of presentations I had created for various seminar work and/or coursework

big-data-analytics blockchain-technology brain-computer-interface

Last synced: 19 Mar 2026

https://github.com/hrolive/big-data-analysis-with-hadoop-and-rhadoop

Foundations of “Big Data” processing by introducing the Hadoop distributed computing architecture and providing an introductory level tutorial for Big Data analysis using Hadoop, Rhadoop, and R libraries parallel, doParallel, foreach and Rmpi.

big-data big-data-analytics hadoop hdfs hpc hpc-clusters jupyter mapreduce mpi python r rstudio unix

Last synced: 19 Apr 2026

https://github.com/varshithdupati/yelp-business-analysis

Big Data analysis on Yelp reviews/businesses for Arizona. Using Hadoop, Spark, PySpark.

arizona-state-university big-data big-data-analytics data-analysis hadoop pyspark spark yelp

Last synced: 04 May 2026

https://github.com/mrham17/spotify_streaming_analytics

Project is stable & documentation will be completed soon. Thank you for your understanding and patience.

big-data-analytics data-analysis google-colab music-data r-programming spotify streaming-analytics

Last synced: 24 Jul 2025

https://github.com/zmyzheng/stack_overflow_qa_assistant

Big Data Analysis project with recommendation, cluster analysis and graph database

big-data-analytics cluster-analysis data-visualization graph-database hadoop mahout recommendation-system

Last synced: 30 Mar 2025

https://github.com/zmyzheng/browserassistant

Big Data & Cloud Computing project for recommendation, cluster analysis, and data visualization with Hadoop and Spark, deployed in an auto-scaling cloud environment, YouTube link:

angular big-data-analytics cloud cluster-analysis data-visualization elasticsearch flask hadoop recommendation-system spark spring-boot

Last synced: 14 Apr 2026

https://github.com/hanif-syazul/analyzing-kimia-farma-sales-performance-with-gcp

This repository contains the final project for the Rakamin Big Data Analytics Internship. It includes a complete dashboard of Kimia Farma's sales performance analysis from 2020 to 2023.

big-data-analytics bigquery internship-project kimia-farma looker-studio rakamin sql

Last synced: 02 Jan 2026

https://github.com/mehwishferoz/bda-project

A Hadoop MapReduce project analyzing the Consumer Complaints dataset with five queries to extract insights like complaints by product, state, company, tags, and timely responses.

big-data-analytics hadoop hadoop-hdfs hadoop-mapreduce java-8

Last synced: 10 May 2026

https://github.com/kaustubh-indulkar/te-it-dsbda-assignmnets

This repository contains the solutions for a series of assignments covering Data Science And Big Data Analytics concepts.

big-data big-data-analytics data-analytics data-science data-visualization sppu-2019-pattern sppu-it-dept

Last synced: 29 Mar 2025

https://github.com/rudra-g-23/rural-financial-inclusion-govt-scheme-recommendation

A project that analyzes a government dataset to understand the financial condition of rural India and builds an ML model for prediction.

big-data big-data-analysis big-data-analytics big-data-and-ml big-data-projects big-data-visualization ml mlops

Last synced: 22 Feb 2026

https://github.com/mituskillologies/bigdata-ait-sep24

Programs conducted at Army Institute of Technology, Pune in training on Big Data Analytics during September 2024.

apache-hadoop apache-spark big-data big-data-analytics hadoop spark

Last synced: 10 Mar 2026

https://github.com/yuvrajsaraogi/unemployment-analysis-with-python

Unemployment is measured by the unemployment rate, which is the number of people who are unemployed as a percentage of the total labour force. We have seen a sharp increase in the unemployment rate during COVID-19, so analyzing the unemployment rate can be a good data science project.

big-data big-data-analytics data-analysis data-science data-visualization engineering excel jupyter-notebook machine-learning mini-project natural-language-processing nlp project python3 sql

Last synced: 19 Apr 2026

https://github.com/aanujkhurana/bigdata-analysis

Social media big data analysis for Eminem (music artist), using RStudio and R

big-data big-data-analytics r-programming-language rstudio

Last synced: 13 Jul 2025

https://github.com/h-fuzzy-logic/technical-writing

Technical writing samples. Includes walkthroughs and tutorials around data engineering and cloud architectures.

big-data big-data-analytics cloud data-engineering

Last synced: 12 Jan 2026

https://github.com/aalkiyumi/project-4-big-data-analysis-with-pyspark-on-weather-data

In this project, I analyzed weather data from the NCEI Global Surface Summary of Day dataset using PySpark in Jupyter Notebook. Tasks included data cleaning, statistical analysis, and forecasting for temperature, wind speed, precipitation, and extreme weather events. The project also predicts future weather patterns for Cincinnati and Florida.

big-data-analytics cs5165 data-analysis data-cleaning data-engineering data-science introduction-to-cloud-computing jupyter-notebook machine-learning precipitation-analysis predictive-modeling pyspark statistical-analysis temperature-forecasting time-series-forecasting uc uc2026 university-of-cincinnati wind-speed-data

Last synced: 17 Mar 2025

https://github.com/jabulente/histogram-visualization-with-matplotlib

This repository showcases how to create visually appealing and customized histograms using Python’s Matplotlib and Seaborn libraries. It includes examples of enhancing default plots with colors, fonts, transparency, and layout adjustments to better communicate data distribution insights.

ai big-data-analytics data-science data-storytelling data-visualization histogram matplotlib

Last synced: 22 Jul 2025

https://github.com/adwaiy2912/bda-lab

Repository contains weekly lab work and assignments for the Big Data Analytics (BDA) course

big-data-analytics hadoop hbase hive mapreduce pig-latin spark

Last synced: 13 May 2026

https://github.com/radhikareddy-chintareddy/big-data-insights-nyc-taxi-trips-2013-

A project showcasing memory-efficient big data processing using Python, focusing on scalable data handling to overcome memory constraints. Includes anomaly detection, efficient visualizations, and actionable insights from the 2013 NYC Taxi Trip dataset.

big-data-analytics csv-reader matplotlib-pyplot python

Last synced: 16 Apr 2026

https://github.com/r13i/cheapest-phone-call

Small challenge to find the best phone operator to use based on call price

big-data big-data-analytics cheapest data-analysis data-cruncher pandas phone-number pricelist

Last synced: 04 May 2026