Projects in Awesome Lists tagged with mapreduce
A curated list of projects in awesome lists tagged with mapreduce.
https://github.com/donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
aws big-data caffe data-science deep-learning hadoop kaggle keras machine-learning mapreduce matplotlib numpy pandas python scikit-learn scipy spark tensorflow theano
Last synced: 12 May 2025
https://github.com/powerjob/powerjob
Enterprise job scheduling middleware with distributed computing ability.
cron distributed java job job-scheduler mapreduce scheduler workflow
Last synced: 14 May 2025
https://github.com/douban/dpark
Python clone of Spark, a MapReduce-like framework in Python
bigdata dpark mapreduce python spark stream-processing
Last synced: 29 Oct 2025
https://github.com/mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
apache-hadoop apache-spark data-algorithms design-patterns distributed-algorithms distributed-computing hadoop-mapreduce java machine-learning mappers mapreduce partitioning pyspark python reducers scala
Last synced: 14 May 2025
https://github.com/microsoft/mobius
C# and F# language binding and extensions to Apache Spark
apache-spark bigdata csharp dataframe dataset dstream eventhubs fsharp kafka-streaming mapreduce mobius near-real-time rdd spark spark-streaming streaming
Last synced: 14 May 2025
https://github.com/cdapio/cdap
An open source framework for building data analytic applications.
cdap dataset integration java java-8 mapreduce middleware platform python spark spark-streaming unified
Last synced: 13 May 2025
https://github.com/bcongdon/corral
🐎 A serverless MapReduce framework written for AWS Lambda
aws-lambda mapreduce mapreduce-framework serverless
Last synced: 04 Apr 2025
https://github.com/grailbio/bigslice
A serverless cluster computing system for the Go programming language
bigdata cluster computing etl go golang machinelearning mapreduce
Last synced: 21 Apr 2025
https://github.com/apache/uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
mapreduce remote-shuffle-service rss shuffle spark tez
Last synced: 15 May 2025
https://github.com/apache/incubator-uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
mapreduce remote-shuffle-service rss shuffle spark tez
Last synced: 10 Mar 2025
https://github.com/CamDavidsonPilon/tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
distributed-computing estimate mapreduce percentile pyspark python quantile
Last synced: 26 Mar 2025
https://github.com/cwensel/cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Last synced: 04 Oct 2025
https://github.com/tencent/firestorm
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
mapreduce remoteshuffle shuffle spark
Last synced: 18 Oct 2025
https://github.com/mahmoudparsian/data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
algorithms bigdata data data-abstractions data-algorithms data-transformation dataframes design design-patterns machine-learning mappers mapreduce monoid partitioning-algorithms pyspark python rdd reducers spark transformations
Last synced: 07 Apr 2025
https://github.com/lynnlangit/learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
apache-spark dataproc emr hadoop learning-hadoop mapreduce spark wordcount
Last synced: 16 May 2025
https://github.com/kevwan/mapreduce
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
concurrent concurrent-programming go golang mapreduce mapreduce-go
Last synced: 06 Jul 2025
https://github.com/mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
algorithms apache-hadoop apache-spark big-data data-algorithms data-analysis data-engineering data-partition data-transformation glossary mapreduce mapreduce-algorithm mapreduce-python monoid partitioning-algorithms pyspark pyspark-algorithms-book santa-clara-university spark-dataframes spark-rdd
Last synced: 12 Apr 2025
https://github.com/touero/ctenopharyngodon-idella
Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and to learn the basics of HDFS.
fastapi hadoop hadoop-mapreduce java mapreduce maven scraping
Last synced: 08 Oct 2025
https://github.com/mimecast/dtail
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
adhoc devops devops-tools distributed golang log log-management mapreduce mimecast troubleshooting
Last synced: 03 May 2025
https://github.com/cocainecong/tangseng
Tangseng: a search engine and information retrieval system with full-text search and vector search, based on Golang.
boltdb distributed-systems dockcer-compose docker etcd full-text-search gin grpc inverted-index kafka losertree lsm-tree mapreduce search-engine segment vector-search
Last synced: 05 Apr 2025
https://github.com/feng-li/Distributed-Statistical-Computing
Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)
hadoop mapreduce pyspark-tutorial spark spark-teaching statistical-models
Last synced: 26 Mar 2025
https://github.com/Refefer/Dampr
Python Data Processing library
batch-processing dataflow machine-learning mapreduce
Last synced: 19 Jul 2025
https://github.com/kwartile/connected-component
Map Reduce Implementation of Connected Component on Apache Spark
apache-spark connected-components graph-algorithms graphx mapreduce scala union-find
Last synced: 27 Jul 2025
https://github.com/mahmoudparsian/pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
algorithms big-data data data-abstractions data-science dataframe distributed-computing graphframes mapreduce monoid nosql partitioning pyspark pyspark-algorithms python rdd spark transformations
Last synced: 07 Apr 2025
https://github.com/flipkart-incubator/hbase-orm
A production-grade HBase ORM library that makes accessing HBase clean, fast and fun (Can also be used as Bigtable ORM)
bigtable bigtable-orm cloud-bigtable hadoop hbase hbase-orm mapreduce object-mapping orm
Last synced: 24 Jul 2025
https://github.com/groda/big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live—ready to run with a click.
apache-sedona apache-spark big-data bigdata bigtop docker gutenberg-ebooks hadoop hadoop-cluster hadoop-hdfs hadoop-mapreduce jupyter-notebook mapreduce mapreduce-bash mrjob pyspark spark spark-sql testdfsio
Last synced: 06 Apr 2025
https://github.com/nellore/rail
Scalable RNA-seq analysis
alignments emr ipython mapreduce rail-rna rna-seq-analysis
Last synced: 09 Apr 2025
https://github.com/am-kantox/elixir-iteraptor
Handy enumerable operations implementation.
elixir elixir-lang iteration mapreduce
Last synced: 09 Apr 2025
https://github.com/turboway/pybigdata
Working with various big data components using Python
elasticsearch hadoop hbase hive impala kafka mapreduce spark
Last synced: 23 Jul 2025
https://github.com/arindas/mit-6.824-distributed-systems
Template repository to work on the labs from MIT 6.824 Distributed Systems course.
distributed-systems mapreduce raft-consensus-algorithm
Last synced: 31 Aug 2025
https://github.com/asuiu/pyxtension
Pure Python extensions library that includes Scala-like streams, JSON with attribute access syntax, and other commonly used utilities
java-streams mapreduce python python-iterables python-itertools python-json python-mapreduce python-multiprocessing python-multithreading python-streaming streaming
Last synced: 09 Apr 2025
https://github.com/whitfin/efflux
Easy Hadoop Streaming and MapReduce interfaces in Rust
Last synced: 16 Apr 2025
https://github.com/hiejulia/data-pipeline-project
Data pipeline project
amazon-web-services azure bigml classification data-pipeline deployment distributed-systems hadoop java kafka machine-learning mapreduce maven spark streaming
Last synced: 16 Jul 2025
https://github.com/jishnub/parallelutilities.jl
Fast and easy parallel mapreduce on HPC clusters
distributed distributed-computing high-performance-computing hpc hpc-applications hpc-cluster hpc-clusters julia mapreduce parallel parallel-computing reduction
Last synced: 05 Sep 2025
https://github.com/longshilin/hadoop-mapreduce
Application examples based on MapReduce 🌾
Last synced: 05 Oct 2025
https://github.com/d2si-oss/ooso
Java library for running Serverless MapReduce jobs
aws java lambda library mapreduce serverless
Last synced: 13 Oct 2025
https://github.com/deeptiman/offchaindata
Hyperledger Fabric OffChain Storage
blockchain-technology couchdb docker golang grpc-client grpc-go grpc-service hyperledger-fabric mapreduce mapreduce-demo offchain-storage
Last synced: 23 Apr 2025
https://github.com/innofang/subgraph-isomorphism
❄Implement the common subgraph isomorphism algorithms (i.e. Ullmann, VF2) based on MapReduce on Hadoop
isomorphism mapreduce mapreduce-algorithm subgraph-count subgraph-isomorphism ullmann ullmann-algorithm vf2 vf2-algorithm
Last synced: 29 Apr 2025
https://github.com/zunzhuowei/qs-hadoop
Learning the big data ecosystem
bigdata elasticsearch hadoop mapreduce spark spark-streaming storm
Last synced: 22 Jul 2025
https://github.com/yaa110/goterator
Lazy iterator implementation for Golang
golang golang-module golang-package iterator mapreduce
Last synced: 01 May 2025
https://github.com/alash3al/aggrex
A crazy API gateway aggregator using JavaScript as the language and Go as the runtime
aggregator api api-client api-gateway cloud golang javascript mapreduce reverse-proxy
Last synced: 28 Apr 2025
https://github.com/goldmansachs/mrword2vec
A MapReduce / Hadoop implementation of Word2Vec
Last synced: 11 Apr 2025
https://github.com/magicxor/mapreduce
LINQ for Delphi (Object Pascal)
arrays delphi filter flatmap foreach generics high-order-function linq map mapreduce object-pascal pascal reduce selectmany
Last synced: 27 Feb 2025
https://github.com/dayyass/pydfs
Distributed File System written in Python
distributed-systems filesystem hadoop hdfs mapreduce python
Last synced: 13 Apr 2025
https://github.com/gramian/hapod
HAPOD - Hierarchical Approximate Proper Orthogonal Decomposition
data-driven data-reduction data-science datascience dimension-reduction distributed-memory high-performance-computing hpc limited-memory mapreduce mapreduce-algorithm model-order-reduction model-reduction pca pod proper-orthogonal-decomposition svd unsupervised-learning
Last synced: 06 May 2025
https://github.com/usc-isi-i2/pyrallel
Yet another easy-to-use python3 parallel library for humans.
mapreduce multiprocessing multithreading parallel parallel-computing parallel-processing parallel-programming python python3 queue shared-memory
Last synced: 26 Jun 2025
https://github.com/eftec/documentstoreone
A flat document store for PHP that supports concurrent access
bigdata database mapreduce php php-library
Last synced: 20 Jun 2025
https://github.com/ktorzpersonal/purescript-ifrit
An SQL -> NoSQL compiler for data aggregation
aggregation compiler mapreduce mongodb nosql pipeline sql
Last synced: 12 Dec 2025
https://github.com/conradsnicta/armadillo-gmm
gmm_diag and gmm_full: C++ classes for multi-threaded Gaussian mixture models and Expectation-Maximisation
armadillo clustering clustering-algorithm cpp em-algorithm expectation-maximization gaussian-mixture-models gmm k-means k-means-clustering machine-learning mapreduce openmp statistics
Last synced: 13 May 2025
https://github.com/lithops-cloud/airflow-plugin
Plugin for Apache Airflow to execute serverless tasks using Lithops
airflow airflow-plugin big-data bigdata dag faas ibm ibm-cloud ibm-cloud-functions map mapreduce serverless serverless-functions
Last synced: 02 Sep 2025
https://github.com/bugenzhao/6.824-mapreduce
An implementation of "6.824 Lab 1: MapReduce (2021)" in async Rust.
6824 distributed-systems mapreduce mit rpc
Last synced: 11 Apr 2025
https://github.com/vikhyat/stormycloud
Ridiculously simple distributed applications in Ruby.
distributed-systems mapreduce ruby
Last synced: 30 Apr 2025
https://github.com/isislab-unisa/sof
Simulation Optimization and exploration Framework on the cloud: SOF
agent-based-simulation hadoop java mapreduce optimization-process simulation-model simulation-optimization sof
Last synced: 12 Apr 2025
https://github.com/nwjlyons/slice
Elixir's Enum module implemented in Go using generics.
Last synced: 14 Aug 2025
https://github.com/asuiu/streamerate
Iterable Java8 style Streams for Python
java-streams map-reduce mapreduce python python-iterables python-itertools python-mapreduce python-multiprocessing python-multithreading python-streaming python3 streaming
Last synced: 06 Jul 2025
https://github.com/banyc/mapreduce
In C#. Master-Worker. From scratch. No Hadoop. Done. Depend on DFS.
distributed-systems educational from-scratch mapreduce master-slave object-oriented-programming
Last synced: 14 May 2025
https://github.com/perfectlysoft/perfect-hadoop
Perfect Hadoop: WebHDFS, MapReduce & Yarn.
hadoop mapreduce perfect server-side-swift swift webhdfs yarn
Last synced: 05 May 2025
https://github.com/jishnub/mpimapreduce.jl
An MPI-based distributed map-reduce function for Julia
distributed-computing julia mapreduce message-passing mpi parallel parallel-computing
Last synced: 28 Oct 2025
https://github.com/lapets/mr4mp
Thin MapReduce-like layer that wraps the Python multiprocessing library.
library mapreduce multiprocessing multiprocessing-library parallel-programming parallel-python python python-library
Last synced: 14 Jul 2025
https://github.com/timvisee/wrdcntr
💨 A simple yet very fast word counter written in Rust
concurrency mapreduce rayon rust wordcount
Last synced: 12 Apr 2025
https://github.com/bikash/r2time
R connector for OpenTSDB: Analyzing large time-series data in R environment using data-intensive capabilities.
hbase mapreduce opentsdb timeseries
Last synced: 25 Jul 2025
https://github.com/hxndev/hadoop-mapreduce-to-analyze-sentiment-of-keyword
In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.
code hadoop hadoop-hdfs hadoop-mapreduce hdfs java mapreduce mapreduce-java parallel-computing parallel-programming sentiment-analysis sentimental-analysis
Last synced: 11 Oct 2025
https://github.com/edydfang/uw-madison-cs537
Operating System Projects
filesystem mapreduce operating-system rowhammer scheduling shell system-calls xv6
Last synced: 13 Apr 2025
https://github.com/WilliamX1/cse-2021
A distributed file system similar to Google File System (GFS).
distributed-file-system gfs mapreduce raft rpc
Last synced: 14 Apr 2025
https://github.com/chasakisd/distributedsystems
A recommendation Android app with a backend server in Java that supports Map & Reduce.
android distributed-systems mapreduce xamarin xamarin-android
Last synced: 30 Apr 2025
https://github.com/hxndev/finding-average-temperature-of-each-year-using-hadoop-hdfs
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
average-calculator code hadoop hadoop-cluster hadoop-filesystem hadoop-hdfs hadoop-mapreduce java mapreduce mapreduce-java
Last synced: 24 Oct 2025
https://github.com/dedalozzo/eoc-server
A complete CouchDB Query Server written in PHP.
couchdb couchdb-query-server couchdb-server mapreduce php
Last synced: 01 May 2025
https://github.com/hxndev/hadoop-mapreduce-to-find-average-length-of-comments
In this task, we had to find the average length of comments given in the dataset. It was done using Hadoop MapReduce and Hadoop HDFS.
average-length code comments distributed-computing distributed-systems hadoop hadoop-filesystem hadoop-hdfs hadoop-mapreduce hdfs java mapreduce mapreduce-java parallel-computing parallel-programming
Last synced: 31 Mar 2025
https://github.com/jordicorbilla/mapreduce
Data parallel text processing with MapReduce
Last synced: 18 Aug 2025
https://github.com/rezaei121/elasticsearch-mapreduce-wordcount
elasticsearch Map/Reduce integration (word count project)
bigdata elasticsearch elasticsearch-mapreduce-wordcount hdfs java mapreduce wordcount
Last synced: 27 Apr 2025
https://github.com/khinshankhan/nlp-tf-idf-hadoop
NLP analysis of Term Frequency - Inverse Document Frequency using Hadoop
Last synced: 24 Dec 2025
https://github.com/xudong963/distributed-systems
MapReduce, Raft, KV Raft, Shared KV
fault-tolerance golang kvs-server learn-golang mapreduce papers raft
Last synced: 23 Dec 2025
https://github.com/anicolaspp/reactor
Multi-system MapR integration
java mapr mapr-db mapr-streams maprdb mapreduce mapreduce-java scala
Last synced: 06 Mar 2025
https://github.com/zurfyx/cassandra-hadoop-example
Cassandra Hadoop Example
cassandra hadoop mapreduce nodejs
Last synced: 30 Mar 2025