Projects in Awesome Lists tagged with map-reduce
A curated list of projects in awesome lists tagged with map-reduce.
https://github.com/chrislusf/gleam
Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly.
distributed-computing distributed-systems golang map-reduce
Last synced: 13 May 2025
https://github.com/numaproj/numaflow
Kubernetes-native platform to run massively parallel data/streaming jobs
data-processing hacktoberfest k8s kubernetes map-reduce pipeline stream-processing
Last synced: 14 Mar 2026
https://github.com/qihoo360/poseidon
A search engine which can hold 100 trillion lines of log data.
big-data golang map-reduce poseidon search-engine
Last synced: 08 Apr 2025
https://github.com/juliafolds/transducers.jl
Efficient transducers for Julia
distributed-computing high-performance iterators julia map-reduce parallel transducers
Last synced: 05 Apr 2025
https://github.com/tirthajyoti/spark-with-python
Fundamentals of Spark with Python (using PySpark), code examples
analytics apache apache-spark big-data database dataframe distributed-computing hadoop hdfs machine-learning map-reduce mlib parallel-computing pyspark python spark sql
Last synced: 05 Apr 2025
https://github.com/tkf/threadsx.jl
Parallelized Base functions
high-performance julia map-reduce parallel sorting-algorithms transducers
Last synced: 16 May 2025
https://github.com/xarray-contrib/flox
Fast & furious GroupBy operations for dask.array
Last synced: 12 Dec 2025
https://github.com/asavinov/prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow
Last synced: 11 Apr 2025
https://github.com/daleroberts/pypar
Efficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.
big-data map-reduce mpi python
Last synced: 04 Mar 2026
https://github.com/juliafolds/foldscuda.jl
Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)
cuda gpu high-performance iterators julia map-reduce parallel transducers
Last synced: 13 Apr 2025
https://github.com/rvantonder/hack_parallel
The core parallel and shared memory library used by Hack, Flow, and Pyre
map-reduce ocaml parallel shared-memory
Last synced: 13 Feb 2026
https://github.com/cheng-lin-li/spark
Python 2.7 code and learning notes for Spark 2.1.1
als alternating-least-squares apriori-algorithm apriori-son cosine-similarity kmeans kmeans-clustering map-reduce minhash minhash-lsh-algorithm python27 savasere-omiecinski-and-navathe spark tf-idf uv-decomposition
Last synced: 19 Feb 2026
https://github.com/agynio/claude-map-reduce-memory
Global, unlimited persistent memory for Claude Code agents. Context-activated hints injected automatically via hooks using scatter-gather map-reduce.
agent-memory ai-agents claude claude-code cli cmr-memory llm map-reduce prompt-caching
Last synced: 28 May 2026
https://github.com/captaincodeman/datastore-mapper
Appengine Datastore Mapper in Go
appengine bigquery cloud-storage datastore datastore-entities datastore-mapper go map-reduce shards
Last synced: 26 Apr 2025
https://github.com/shellyln/open-soql
Open source implementation of the SOQL.
dml graph-query javascript library map-reduce object-query query-engine resolvers soql sql typescript
Last synced: 13 Apr 2025
https://github.com/asadiahmad/word-counter
Word Counter with Haskell for Programming Language Design Course
haskell map-reduce recursive-algorithm word-counter
Last synced: 19 Sep 2025
https://github.com/gwr3n/jsdp
A Java Stochastic Dynamic Programming Library
control dynamic inventory java lambda-calculus maintenance map-reduce object-oriented optimal parallel programming stochastic stream uncertainty
Last synced: 12 Jan 2026
https://github.com/futureverse/future.mapreduce
[EXPERIMENTAL] R package: future.mapreduce - Utility Functions for Future Map-Reduce API Packages
Last synced: 10 Apr 2025
https://github.com/mesqueeb/map-anything
Array.map but for objects with good TypeScript support. A small and simple integration.
compose map-object map-reduce mapping object-map object-mapper object-to-object transform
Last synced: 13 Apr 2025
https://github.com/asuiu/streamerate
Iterable Java8 style Streams for Python
java-streams map-reduce mapreduce python python-iterables python-itertools python-mapreduce python-multiprocessing python-multithreading python-streaming python3 streaming
Last synced: 06 Jul 2025
https://github.com/drapegnik/bsuir
🎓Repository for masters labs on FCSN, BSUIR
aws-lambda blockchain bsuir bsuir-labworks digital-signal-processing dsp hyperledger hyperledger-fabric labs machine-learning map-reduce neural-networks oop plc study tcp tcp-chat traffic-light uml
Last synced: 18 Mar 2025
https://github.com/natelalor/ai_report_generator
A tool that converts long audio files into a thorough, summarized report. Leverages OpenAI and its API (ChatGPT backend), Langchain for text processing, and Pinecone for vector database facilitation.
artificial-intelligence chatbot embedding-models langchain map-reduce object-oriented-programming openai openai-api pinecone python vector-database
Last synced: 26 Jul 2025
https://github.com/gregorykogan/yt-framework
Build scalable data pipelines on YTsaurus with automatic stage management, local development simulation, and more.
big-data data-pipeline distributed-computing etl framework map-reduce python yt ytsaurus
Last synced: 24 Feb 2026
https://github.com/frobnitzem/mpi_list
A package for working with lists distributed over MPI
data-science hpc map-reduce mpi4py
Last synced: 18 Mar 2025
https://github.com/cbuschka/aws-scatter-gather
Scatter gather with AWS lambda
aws fork-join lambda map-reduce python scatter-gather step-functions terraform
Last synced: 01 May 2026
https://github.com/shathor/gaia-cluster
Provides a scaffold to easily build a cluster to query the data from ESA's Gaia satellite. Gaia is an ambitious mission to chart a three-dimensional map of our Galaxy, the Milky Way. Gaia will provide unprecedented positional and radial velocity measurements with the accuracies needed to produce a stereoscopic and kinematic census of about one billion stars in our Galaxy and throughout the Local Group. This amounts to about 1 per cent of the Galactic stellar population.
apache-cassandra apache-spark astronomy big-data bigdata cassandra cluster distributed-computing esa hadoop java java-8 machine-learning map-reduce
Last synced: 02 Jan 2026
https://github.com/stefan-schroedl/pigrank
Apache Pig UDFs for ranking (ndcg, mrr, jaccard coefficient, cosine similarity, rank-biased overlap)
cosine-similarity dcg hadoop map-reduce mrr pig ranking
Last synced: 22 Apr 2026
https://github.com/sepandhaghighi/hadoop
Anagram Python Script In Hadoop
anagram anagram-solver linux localhost map-reduce python3 script
Last synced: 02 May 2026
https://github.com/sergey-shandar/purelogic-ts
PureLogic for TypeScript
big-data lazy-evaluation linq map-reduce typescript
Last synced: 23 Jul 2025
https://github.com/mc-cat-tty/complementiprogrammazione
Notes for the Complementi di Programmazione (Programming Complements) course. UniMoRe. 2023-2024.
functional-programming garbage-collection garbage-collector map-reduce object-oriented-programming oop python python3 reference-counting unit-testing
Last synced: 21 Apr 2026
https://github.com/sskender/analysis-of-massive-datasets
Analysis of Massive Datasets FER labs
big-data data-flow data-flows frequency-analysis graph-algorithms graph-theory map-reduce mapreduce minhash node-ranking page-rank page-ranking recommendation-system recommender-system simhash similarity-search
Last synced: 18 Aug 2025
https://github.com/sergey-shandar/purelogic
PureLogic
big-data c-sharp dot-net linq map-reduce
Last synced: 23 Jul 2025
https://github.com/stdlib-js/utils-map-reduce-right
Perform a single-pass map-reduce operation against each element in an array while iterating from right to left and return the accumulated result.
accumulate accumulation accumulator aggregate iterate javascript map map-reduce node node-js nodejs reduce reducer reduction stdlib transform util utilities utility utils
Last synced: 16 Jul 2025
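For readers unfamiliar with the operation described in the entry above, the following is a minimal Python sketch of a single-pass, right-to-left map-reduce. The function name `map_reduce_right` and its parameter order are hypothetical and chosen only for illustration; they are not the stdlib-js API.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")
A = TypeVar("A")


def map_reduce_right(
    items: Sequence[T],
    initial: A,
    mapper: Callable[[T], U],
    reducer: Callable[[A, U], A],
) -> A:
    """Map each element and fold it into the accumulator, last element first.

    Hypothetical helper for illustration only.
    """
    acc = initial
    # reversed() walks the sequence from right to left without copying it.
    for item in reversed(items):
        # Mapping and reducing happen in the same pass; no intermediate list.
        acc = reducer(acc, mapper(item))
    return acc


# Square each number, then append the result to a string accumulator.
result = map_reduce_right(
    [1, 2, 3], "", lambda x: x * x, lambda acc, v: acc + str(v)
)
```

Because elements are visited from the right, `result` is built from 3, then 2, then 1.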
https://github.com/martincastroalvarez/hadoop-hdfs-map-reduce-docker
Running Map Reduce in Hadoop using Docker
big-data bigdata hadoop hdfs map-reduce
Last synced: 08 Apr 2025
https://github.com/vasukalariya/mit-distributed-systems-lab-6.824-6.5840
MIT Distributed Systems CS 6.824/CS 6.5840
distributed-systems go key-value map-reduce raft-consensus-algorithm sharding
Last synced: 14 Mar 2025
https://github.com/adhithadias/map-reduce-word-count-openmp-mpi
This repository contains an implementation of counting words in many files using the map-reduce algorithm. The algorithm is implemented in both OpenMP and MPI. A serial implementation is also available for perf evaluation.
ece563 map-reduce mpi openmp openmpi purdue word-count wordcount
Last synced: 30 Jul 2025
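As a rough illustration of the word-count pattern the entry above refers to, here is a minimal serial sketch in Python. It is not taken from that repository, which uses OpenMP and MPI; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted counts by word."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts to get one total per word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Each document is mapped independently; this is the step a parallel
    # implementation would distribute across threads or processes.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["the quick fox", "The lazy dog", "the end"])
```

Here `counts["the"]` is 3, since the map phase lowercases every word before emitting it.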
https://github.com/anindya-prithvi/map_rizzuse-dscd
A repository for a _real_ project (Map - reduce)
map map-reduce mapreduce reduce
Last synced: 06 Jun 2026
https://github.com/ishaansathaye/csc369-introdistributedcomputing
Cal Poly Fall 2024 CSC 369 Intro to Distributed Computing
distributed-computing hadoop java map-reduce scala spark
Last synced: 02 May 2026
https://github.com/hamidzr/freq-analysis
Python map-reduce frequency analysis with a basic stemmer
Last synced: 14 Mar 2025
https://github.com/jwulf/zeebe-map-reduce
Reusable Map/Reduce workflow in Zeebe
bpmn data-processing map-reduce microservices workflow-engine zeebe
Last synced: 29 Jan 2026
https://github.com/shubhamv108/distributed-systems
Resources for learning distributed systems.
byzantine-fault-tolerance distributed-computing distributed-systems map-reduce
Last synced: 24 Mar 2025
https://github.com/vidhijain/movies
Map Reduce paradigm on movies dataset using PySpark
Last synced: 19 May 2026
https://github.com/vbugaevskii/hadoop-streaming-protoseq
A small library example showing how to work with binary files in Hadoop Streaming.
binary hadoop hadoop-streaming map-reduce sequence
Last synced: 14 Jan 2026
https://github.com/quinlan-lab/constraint-tools
Tools to discover natural selection given multiple evolved DNA sequences (e.g., gnomad cohort, or multiple tumor samples)
axios flask-api kmer-counting map-reduce plotly-js pysam snv spa vue-material vuejs vuex
Last synced: 29 Apr 2026
https://github.com/zoltan-nz/learning-spark
Playing with Apache Spark
apache-spark java map-reduce spark
Last synced: 17 Apr 2026
https://github.com/adelin-info/tp_datacloud
Architecture and development of large-scale distributed systems
hadoop java map-reduce scala spark yarn zookeeper
Last synced: 17 Apr 2026
https://github.com/chen0040/pyspark-advanced-algorithms
Samples of Advanced Algorithms and Data Analysis implemented in pyspark
advanced-algorithms data-analysis map-reduce pyspark
Last synced: 12 Jan 2026
https://github.com/flaviodelgrosso/llm-chain-map-reduce
Simple map reduce LLM chain pattern demo using Ollama and Langchain
langchain llm map-reduce ollama python rag
Last synced: 12 Apr 2026
https://github.com/c3duan/steam-engagement-predictor
Recommender System wrapped with a Binary Classifier
bayesian-personalized-ranking bpr collaborative-filtering latent-factor-model map-reduce pca recommender-system steam steam-games t-sne
Last synced: 18 Aug 2025
https://github.com/f-z/databases
Various database projects (modeling, SQL, R)
data-visualization databases datavisualization map-reduce mapreduce r sql
Last synced: 16 May 2026
https://github.com/aromoh/basic-sentiment-analysis-mrjob-twitter-
Dictionary-based sentiment analysis implemented with MrJob using a map-reduce model. It can be run locally or in HDFS environments (such as Hadoop or AWS).
aws-ec2 hadoop hdfs-enviroments map-reduce mrjob sentiment-analysis twiiter
Last synced: 30 Oct 2025
https://github.com/aryangupta-09/kmeans-using-mapreduce
K-means clustering algorithm using MapReduce.
distributed-systems grpc grpc-python k-means k-means-algorithm k-means-clustering k-means-implementation k-means-implementation-in-python kmeans kmeans-algorithm kmeans-clustering kmeans-clustering-algorithm map-reduce mapreduce mapreduce-algorithm mapreduce-python protobuf-python protobuf3 protocol-buffers remote-communication
Last synced: 18 May 2026
https://github.com/kunalpisolkar24/ir_lab
Collection of practical code for Savitribai Phule Pune University's Information Retrieval Lab (410247).
cosine-similarity information-retrieval map-reduce pagerank sppu-computer-engineering text-preprocessing web-crawling
Last synced: 09 Jun 2026
https://github.com/activestate/recipe-577676-dirt-simple-mapreduce
Dirt simple map/reduce
learning learning-by-doing learning-python map-reduce recipes snippets
Last synced: 14 May 2026
https://github.com/divy9881/distributed_computing
Distributed Computing Protocol based on Map-Reduce Computing Paradigm.
c cpp distributed-computing distributed-systems map-reduce
Last synced: 16 May 2026
https://github.com/h1ghbre4k3r/rust-map-reduce
A small hobby implementation of MapReduce that I hacked together at 2am.
Last synced: 07 May 2025
https://github.com/leohmoraes/sum-of-items-grouped-by-decade
From dates and values, get the sum of values grouped by decade (solution to a question asked on WhatsApp).
Last synced: 13 Jul 2025
https://github.com/susheel-1999/genai-youtubesummarize
App that takes a YouTube video URL, extracts its transcript, and generates a consolidated summary using LangChain’s Map-Reduce strategy and Groq’s Llama model.
genai groq langchain llama map-reduce summarize youtube
Last synced: 18 Apr 2026
https://github.com/ia-programming/websearch_langchain
An implementation of web search using the stuff, map-reduce, refine, and map-rerank strategies
langchain langchain-python map-reduce map-rerank python python3 refine search-engine stuff
Last synced: 02 May 2026
https://github.com/malisha4065/hadoopproject
MapReduce task with Apache Hadoop.
apache apache-hadoop hadoop-yarn map-reduce
Last synced: 22 Jul 2025
https://github.com/dimits-ts/large-scale-data
Distributed computing for data science tasks, executed on a Ubuntu server.
cassandra kafka map-reduce spark vagrant
Last synced: 02 Jan 2026
https://github.com/deepcloudlabs/dcl235-2020-mar-09
DCL-235: Effective Java Programming
effective-java java-10 java-11 java-12 java-13 java-14 java-8 java-9 map-reduce stream-api-java8
Last synced: 07 Jun 2026
https://github.com/miferreiro/cdap-map-reduce
Map/Reduce exercises for the subject of "Computación Distribuída e de Altas Prestacións" in the Master Degree of Computer Engineering of the University of Vigo in 2020
Last synced: 25 Jul 2025
https://github.com/manuparra/tallerh2s
HDFS, Hadoop, and Spark workshop for the Professional Master's in Computer Engineering - Universidad de Granada
hadoop hdfs java map-reduce python spark wordcount
Last synced: 28 Aug 2025
https://github.com/massimostanzione/distgrep
A distributed grep, implemented with the MapReduce model.
distributed-computing grep map-reduce mapreduce shuffleandsort
Last synced: 14 Mar 2025
https://github.com/mirzaim/hadoop-twitter-analysis
Hadoop MapReduce analysis of US Election 2020 Tweets.
hadoop hdfs map-reduce tweet-analysis us-election-2020
Last synced: 26 Feb 2025
https://github.com/Francesco-Biscaccia/BigData_Projects
Assignment repository for the Big Data Computing course at the University of Padova for the academic year 2023-2024.
big-data k-center-problem map-reduce reservoir-sampling spark spark-streaming sticky-sampling
Last synced: 03 Oct 2025
https://github.com/lenss/csce438-spring2025
Machine problems for TAMU CSCE438 Distributed Processing Systems, Spring 2025
arm c computer-architectures course-project cplusplus distributed-systems docker grpc hadoop map-reduce message-queue networked-systems rabbitmq remote-procedure-call system-design tamu virtual-machine x86
Last synced: 19 Apr 2025
https://github.com/dominicluidold/ws21-introductiontobigdataprojects
A collection of mandatory exercises in "Introduction to Big Data Projects" - 1st semester master @ Vorarlberg University of Applied Sciences (FHV)
avro bigdata hadoop java map-reduce
Last synced: 24 Mar 2025
https://github.com/aneeshmurali-n/python-tasks
Learn data science and Python by working through tasks one at a time
data-science data-visualization exploratory-data-analysis filter lambda lambda-functions map-reduce matplotlib numpy oops-in-python pandas python python-fundamentals regex regular-expression
Last synced: 12 Apr 2026
https://github.com/martincastroalvarez/typescript-map-reduce
Map Reduce using Typescript
Last synced: 08 Apr 2025
https://github.com/linkdd/link.parallel
Parallel computing framework
map-reduce parallel-computing pure-python
Last synced: 08 Apr 2025
https://github.com/ranfysvalle02/summwebsite
This repository is not intended to serve as the next 'library' for text summarization. Instead, it is designed to be an educational resource, providing insights into the inner workings of text summarization.
ai azure context-length large-text llm map-reduce openai parallel-computing parallel-processing python summaries summarization summarizer
Last synced: 11 May 2026
https://github.com/fblupi/master_informatica-ccsa
Repository for the course Cloud Computing: Servicios y Aplicaciones (Cloud Computing: Services and Applications) in the Master's in Computer Engineering at UGR
cloud-computing containers data-science docker hadoop mahout map-reduce mapreduce mongodb opennebula virtual-machine
Last synced: 27 Apr 2026
https://github.com/skywardpixel/mit-ds-labs
MIT 6.824 Distributed Systems Engineering labs, Spring 2020.
Last synced: 02 Apr 2025
https://github.com/dhruvsrikanth/mapreducesparsesolver
A Go-native implementation of the Map Reduce parallel framework for a sparse linear solver utilizing conjugate gradient to solve the Poisson equation.
conjugate-gradient-optimization go golang high-performance-computing hpc map-reduce parallel-programming poisson-equation sparse-linear-solver
Last synced: 06 Oct 2025
https://github.com/cloudposse-archives/terraform-aws-spotinst-mrscaler
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS using a Spotinst AWS MrScaler resource
cluster emr emr-cluster hcl2 map-reduce spot-instances spotinst
Last synced: 23 Feb 2026
https://github.com/heracliteanflux/exercises-scala
Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.
apache-hadoop apache-maven apache-spark distributed-computing distributed-file-system distributed-systems hadoop map-reduce mrjob scala spark
Last synced: 14 Mar 2026
https://github.com/christopher-dabrowski/parallel-log-analizer
Program for parallel log analysis, written for lab 7 of the Parallel Programming (Programowanie Równoległe) course
academic-project log-analyzer map-reduce mpi multiprocessing
Last synced: 19 Oct 2025
https://github.com/carlosmorette/luia
🎷 | Distributed file processor (Cluster nodes)
Last synced: 15 Apr 2026
https://github.com/ramitsurana/emr-ml
AWS EMR Info including Hadoop, Map Reduce and Hive along with Machine Learning
Last synced: 10 Feb 2026
https://github.com/ammahmoudi/mapreduce-examples
Map Reduce examples using pure Scala and then using Spark
map-reduce mapreduce scala spark spark-mapreduce
Last synced: 16 Apr 2026
https://github.com/alextanhongpin/node-mongo-native
Simple map-reduce example with native mongodb client
Last synced: 18 Apr 2026
https://github.com/bubustack/map-reduce-adapter-engram
Map-reduce adapter Engram for bobrapet — dynamic fan-out with child StoryRuns and result aggregation.
batch bubustack engram fan-out go kubernetes map-reduce parallel
Last synced: 18 Apr 2026
https://github.com/ericlondon/docker-hadoop-streaming-scala
Docker Hadoop Streaming Scala
docker hadoop hdfs map-reduce scala streaming
Last synced: 03 May 2026
https://github.com/aalekh/mpi-mr
A sample implementation of Map Reduce using MPI
Last synced: 16 Feb 2026