An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with map-reduce

A curated list of projects in awesome lists tagged with map-reduce.

https://github.com/chrislusf/gleam

Fast, efficient, and scalable distributed map/reduce system with DAG execution, in memory or on disk. Written in pure Go; runs standalone or distributed.

distributed-computing distributed-systems golang map-reduce

Last synced: 13 May 2025

https://github.com/numaproj/numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

data-processing hacktoberfest k8s kubernetes map-reduce pipeline stream-processing

Last synced: 14 Mar 2026

https://github.com/Qihoo360/poseidon

A search engine which can hold 100 trillion lines of log data.

big-data golang map-reduce poseidon search-engine

Last synced: 11 Apr 2025

https://github.com/xarray-contrib/flox

Fast & furious GroupBy operations for dask.array

dask map-reduce xarray

Last synced: 12 Dec 2025

https://github.com/asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow

Last synced: 11 Apr 2025

https://github.com/daleroberts/pypar

Efficient and scalable parallelism using the message passing interface (MPI) to handle big data and computationally intensive problems.

big-data map-reduce mpi python

Last synced: 04 Mar 2026

https://github.com/juliafolds/foldscuda.jl

Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)

cuda gpu high-performance iterators julia map-reduce parallel transducers

Last synced: 13 Apr 2025

https://github.com/rvantonder/hack_parallel

The core parallel and shared memory library used by Hack, Flow, and Pyre

map-reduce ocaml parallel shared-memory

Last synced: 13 Feb 2026

https://github.com/agynio/claude-map-reduce-memory

Global, unlimited persistent memory for Claude Code agents. Context-activated hints injected automatically via hooks using scatter-gather map-reduce.

agent-memory ai-agents claude claude-code cli cmr-memory llm map-reduce prompt-caching

Last synced: 28 May 2026

https://github.com/ihor/phadoop

Map/reduce jobs for Hadoop in PHP

hadoop map-reduce php

Last synced: 12 Apr 2025

https://github.com/asadiahmad/word-counter

Word Counter with Haskell for Programming Language Design Course

haskell map-reduce recursive-algorithm word-counter

Last synced: 19 Sep 2025

https://github.com/futureverse/future.mapreduce

[EXPERIMENTAL] R package: future.mapreduce - Utility Functions for Future Map-Reduce API Packages

futures map-reduce package r

Last synced: 10 Apr 2025

https://github.com/mesqueeb/map-anything

Array.map but for objects with good TypeScript support. A small and simple integration.

compose map-object map-reduce mapping object-map object-mapper object-to-object transform

Last synced: 13 Apr 2025

https://github.com/natelalor/ai_report_generator

A tool that converts long audio files into a thorough, summarized report. Leverages OpenAI and its API (ChatGPT backend), Langchain for text processing, and Pinecone for vector database facilitation.

artificial-intelligence chatbot embedding-models langchain map-reduce object-oriented-programming openai openai-api pinecone python vector-database

Last synced: 26 Jul 2025

https://github.com/gregorykogan/yt-framework

Build scalable data pipelines on YTsaurus with automatic stage management, local development simulation, and more.

big-data data-pipeline distributed-computing etl framework map-reduce python yt ytsaurus

Last synced: 24 Feb 2026

https://github.com/frobnitzem/mpi_list

A package for working with lists distributed over MPI

data-science hpc map-reduce mpi4py

Last synced: 18 Mar 2025

https://github.com/shathor/gaia-cluster

Provides a scaffold to easily build a cluster to query the data from ESA's Gaia satellite. Gaia is an ambitious mission to chart a three-dimensional map of our Galaxy, the Milky Way. Gaia will provide unprecedented positional and radial velocity measurements with the accuracies needed to produce a stereoscopic and kinematic census of about one billion stars in our Galaxy and throughout the Local Group. This amounts to about 1 per cent of the Galactic stellar population.

apache-cassandra apache-spark astronomy big-data bigdata cassandra cluster distributed-computing esa hadoop java java-8 machine-learning map-reduce

Last synced: 02 Jan 2026

https://github.com/stefan-schroedl/pigrank

Apache Pig UDFs for ranking (ndcg, mrr, jaccard coefficient, cosine similarity, rank-biased overlap)

cosine-similarity dcg hadoop map-reduce mrr pig ranking

Last synced: 22 Apr 2026

https://github.com/stdlib-js/utils-map-reduce-right

Perform a single-pass map-reduce operation against each element in an array while iterating from right to left and return the accumulated result.

accumulate accumulation accumulator aggregate iterate javascript map map-reduce node node-js nodejs reduce reducer reduction stdlib transform util utilities utility utils

Last synced: 16 Jul 2025

https://github.com/martincastroalvarez/hadoop-hdfs-map-reduce-docker

Running Map Reduce in Hadoop using Docker

big-data bigdata hadoop hdfs map-reduce

Last synced: 08 Apr 2025

https://github.com/adhithadias/map-reduce-word-count-openmp-mpi

This repository contains an implementation of counting words in many files using the map-reduce algorithm. The algorithm is implemented in both OpenMP and MPI. A serial implementation is also available for performance evaluation.

ece563 map-reduce mpi openmp openmpi purdue word-count wordcount

Last synced: 30 Jul 2025

https://github.com/anindya-prithvi/map_rizzuse-dscd

A repository for a _real_ project (Map - reduce)

map map-reduce mapreduce reduce

Last synced: 06 Jun 2026

https://github.com/ishaansathaye/csc369-introdistributedcomputing

Cal Poly Fall 2024 CSC 369 Intro to Distributed Computing

distributed-computing hadoop java map-reduce scala spark

Last synced: 02 May 2026

https://github.com/hamidzr/freq-analysis

Map-reduce frequency analysis in Python with a basic stemmer

frequency-analysis map-reduce

Last synced: 14 Mar 2025

https://github.com/jwulf/zeebe-map-reduce

Reusable Map/Reduce workflow in Zeebe

bpmn data-processing map-reduce microservices workflow-engine zeebe

Last synced: 29 Jan 2026

https://github.com/vidhijain/movies

Map Reduce paradigm on movies dataset using PySpark

apache-spark map-reduce

Last synced: 19 May 2026

https://github.com/vbugaevskii/hadoop-streaming-protoseq

A small example library showing how to work with binary files in Hadoop Streaming.

binary hadoop hadoop-streaming map-reduce sequence

Last synced: 14 Jan 2026

https://github.com/quinlan-lab/constraint-tools

Tools to discover natural selection given multiple evolved DNA sequences (e.g., gnomad cohort, or multiple tumor samples)

axios flask-api kmer-counting map-reduce plotly-js pysam snv spa vue-material vuejs vuex

Last synced: 29 Apr 2026

https://github.com/zoltan-nz/learning-spark

Playing with Apache Spark

apache-spark java map-reduce spark

Last synced: 17 Apr 2026

https://github.com/adelin-info/tp_datacloud

Architecture and development of large-scale distributed systems

hadoop java map-reduce scala spark yarn zookeeper

Last synced: 17 Apr 2026

https://github.com/chen0040/pyspark-advanced-algorithms

Samples of Advanced Algorithms and Data Analysis implemented in pyspark

advanced-algorithms data-analysis map-reduce pyspark

Last synced: 12 Jan 2026

https://github.com/flaviodelgrosso/llm-chain-map-reduce

Simple map reduce LLM chain pattern demo using Ollama and Langchain

langchain llm map-reduce ollama python rag

Last synced: 12 Apr 2026

https://github.com/f-z/databases

Various database projects (modeling, SQL, R)

data-visualization databases datavisualization map-reduce mapreduce r sql

Last synced: 16 May 2026

https://github.com/aromoh/basic-sentiment-analysis-mrjob-twitter-

Project that performs dictionary-based sentiment analysis, implemented with MrJob using a map-reduce model. It can be executed locally or in HDFS environments (such as Hadoop or AWS).

aws-ec2 hadoop hdfs-enviroments map-reduce mrjob sentiment-analysis twiiter

Last synced: 30 Oct 2025

https://github.com/kunalpisolkar24/ir_lab

Collection of practical code for Savitribai Phule Pune University's Information Retrieval Lab (410247).

cosine-similarity information-retrieval map-reduce pagerank sppu-computer-engineering text-preprocessing web-crawling

Last synced: 09 Jun 2026

https://github.com/divy9881/distributed_computing

Distributed Computing Protocol based on Map-Reduce Computing Paradigm.

c cpp distributed-computing distributed-systems map-reduce

Last synced: 16 May 2026

https://github.com/h1ghbre4k3r/rust-map-reduce

A small hobby implementation of MapReduce that I hacked together at 2am.

map-reduce mapreduce rust

Last synced: 07 May 2025

https://github.com/leohmoraes/sum-of-items-grouped-by-decade

From dates and values, get the sum of values grouped by decade (solution to a question asked on WhatsApp)

javascript map-reduce

Last synced: 13 Jul 2025

https://github.com/kuanghuei/map-reduce

A Java implementation of MapReduce

map-reduce

Last synced: 24 Jun 2025

https://github.com/susheel-1999/genai-youtubesummarize

App that takes a YouTube video URL, extracts its transcript, and generates a consolidated summary using LangChain’s Map-Reduce strategy and Groq’s Llama model.

genai groq langchain llama map-reduce summarize youtube

Last synced: 18 Apr 2026

https://github.com/ia-programming/websearch_langchain

An implementation of web search and of the stuff, map-reduce, refine, and map-rerank strategies

langchain langchain-python map-reduce map-rerank python python3 refine search-engine stuff

Last synced: 02 May 2026

https://github.com/malisha4065/hadoopproject

A MapReduce task with Apache Hadoop.

apache apache-hadoop hadoop-yarn map-reduce

Last synced: 22 Jul 2025

https://github.com/dimits-ts/large-scale-data

Distributed computing for data science tasks, executed on an Ubuntu server.

cassandra kafka map-reduce spark vagrant

Last synced: 02 Jan 2026

https://github.com/miferreiro/cdap-map-reduce

Map/Reduce exercises for the course "Computación Distribuída e de Altas Prestacións" in the Master's Degree in Computer Engineering at the University of Vigo, 2020

map-reduce python

Last synced: 25 Jul 2025

https://github.com/manuparra/tallerh2s

HDFS, Hadoop, and Spark workshop for the Professional Master's in Computer Engineering, Universidad de Granada

hadoop hdfs java map-reduce python spark wordcount

Last synced: 28 Aug 2025

https://github.com/massimostanzione/distgrep

A distributed grep, implemented with the MapReduce model.

distributed-computing grep map-reduce mapreduce shuffleandsort

Last synced: 14 Mar 2025

https://github.com/mirzaim/hadoop-twitter-analysis

Hadoop MapReduce analysis of US Election 2020 Tweets.

hadoop hdfs map-reduce tweet-analysis us-election-2020

Last synced: 26 Feb 2025

https://github.com/Francesco-Biscaccia/BigData_Projects

Assignment repository for the Big Data Computing course at the University of Padova for the academic year 2023-2024.

big-data k-center-problem map-reduce reservoir-sampling spark spark-streaming sticky-sampling

Last synced: 03 Oct 2025

https://github.com/dominicluidold/ws21-introductiontobigdataprojects

A collection of mandatory exercises in "Introduction to Big Data Projects" - 1st semester master @ Vorarlberg University of Applied Sciences (FHV)

avro bigdata hadoop java map-reduce

Last synced: 24 Mar 2025

https://github.com/martincastroalvarez/typescript-map-reduce

Map Reduce using Typescript

map-reduce typescript

Last synced: 08 Apr 2025

https://github.com/linkdd/link.parallel

Parallel computing framework

map-reduce parallel-computing pure-python

Last synced: 08 Apr 2025

https://github.com/ranfysvalle02/summwebsite

This repository is not intended to serve as the next 'library' for text summarization. Instead, it is designed to be an educational resource, providing insights into the inner workings of text summarization.

ai azure context-length large-text llm map-reduce openai parallel-computing parallel-processing python summaries summarization summarizer

Last synced: 11 May 2026

https://github.com/fblupi/master_informatica-ccsa

Repository for the course Cloud Computing: Servicios y Aplicaciones (Cloud Computing: Services and Applications) in the Master's in Computer Engineering at the UGR

cloud-computing containers data-science docker hadoop mahout map-reduce mapreduce mongodb opennebula virtual-machine

Last synced: 27 Apr 2026

https://github.com/skywardpixel/mit-ds-labs

MIT 6.824 Distributed Systems Engineering labs, Spring 2020.

map-reduce raft

Last synced: 02 Apr 2025

https://github.com/dhruvsrikanth/mapreducesparsesolver

A Go-native implementation of the Map Reduce parallel framework for a sparse linear solver that uses the conjugate gradient method to solve the Poisson equation.

conjugate-gradient-optimization go golang high-performance-computing hpc map-reduce parallel-programming poisson-equation sparse-linear-solver

Last synced: 06 Oct 2025

https://github.com/cloudposse-archives/terraform-aws-spotinst-mrscaler

Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS using a Spotinst AWS MrScaler resource

cluster emr emr-cluster hcl2 map-reduce spot-instances spotinst

Last synced: 23 Feb 2026

https://github.com/heracliteanflux/exercises-scala

Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.

apache-hadoop apache-maven apache-spark distributed-computing distributed-file-system distributed-systems hadoop map-reduce mrjob scala spark

Last synced: 14 Mar 2026

https://github.com/christopher-dabrowski/parallel-log-analizer

Program for parallel log analysis, written for lab 7 of the Parallel Programming course

academic-project log-analyzer map-reduce mpi multiprocessing

Last synced: 19 Oct 2025

https://github.com/carlosmorette/luia

🎷 | Distributed file processor (Cluster nodes)

erlang erpc map-reduce

Last synced: 15 Apr 2026

https://github.com/ramitsurana/emr-ml

AWS EMR Info including Hadoop, Map Reduce and Hive along with Machine Learning

emr hadoop map-reduce

Last synced: 10 Feb 2026

https://github.com/ammahmoudi/mapreduce-examples

Map Reduce examples using pure Scala and then using Spark

map-reduce mapreduce scala spark spark-mapreduce

Last synced: 16 Apr 2026

https://github.com/alextanhongpin/node-mongo-native

Simple map-reduce example with native mongodb client

koa map-reduce mongodb nodejs

Last synced: 18 Apr 2026

https://github.com/bubustack/map-reduce-adapter-engram

Map-reduce adapter Engram for bobrapet — dynamic fan-out with child StoryRuns and result aggregation.

batch bubustack engram fan-out go kubernetes map-reduce parallel

Last synced: 18 Apr 2026

https://github.com/aalekh/mpi-mr

A sample implementation of Map Reduce using MPI

map-reduce parallelism

Last synced: 16 Feb 2026