Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with mapreduce
A curated list of projects in awesome lists tagged with mapreduce.
https://github.com/donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
aws big-data caffe data-science deep-learning hadoop kaggle keras machine-learning mapreduce matplotlib numpy pandas python scikit-learn scipy spark tensorflow theano
Last synced: 16 Dec 2024
https://github.com/PowerJob/PowerJob
Enterprise job scheduling middleware with distributed computing ability.
cron distributed java job job-scheduler mapreduce scheduler workflow
Last synced: 30 Oct 2024
https://github.com/powerjob/powerjob
Enterprise job scheduling middleware with distributed computing ability.
cron distributed java job job-scheduler mapreduce scheduler workflow
Last synced: 16 Dec 2024
https://github.com/douban/dpark
Python clone of Spark, a MapReduce-like framework in Python
bigdata dpark mapreduce python spark stream-processing
Last synced: 12 Oct 2024
https://github.com/mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
apache-hadoop apache-spark data-algorithms design-patterns distributed-algorithms distributed-computing hadoop-mapreduce java machine-learning mappers mapreduce partitioning pyspark python reducers scala
Last synced: 19 Dec 2024
https://github.com/Microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
apache-spark bigdata csharp dataframe dataset dstream eventhubs fsharp kafka-streaming mapreduce mobius near-real-time rdd spark spark-streaming streaming
Last synced: 25 Oct 2024
https://github.com/microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
apache-spark bigdata csharp dataframe dataset dstream eventhubs fsharp kafka-streaming mapreduce mobius near-real-time rdd spark spark-streaming streaming
Last synced: 06 Nov 2024
https://github.com/microsoft/mobius
C# and F# language binding and extensions to Apache Spark
apache-spark bigdata csharp dataframe dataset dstream eventhubs fsharp kafka-streaming mapreduce mobius near-real-time rdd spark spark-streaming streaming
Last synced: 21 Dec 2024
https://github.com/cdapio/cdap
An open source framework for building data analytic applications.
cdap dataset integration java java-8 mapreduce middleware platform python spark spark-streaming unified
Last synced: 17 Dec 2024
https://github.com/bcongdon/corral
🐎 A serverless MapReduce framework written for AWS Lambda
aws-lambda mapreduce mapreduce-framework serverless
Last synced: 21 Dec 2024
https://github.com/grailbio/bigslice
A serverless cluster computing system for the Go programming language
bigdata cluster computing etl go golang machinelearning mapreduce
Last synced: 09 Nov 2024
https://github.com/apache/incubator-uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
mapreduce remote-shuffle-service rss shuffle spark tez
Last synced: 20 Dec 2024
https://github.com/camdavidsonpilon/tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
distributed-computing estimate mapreduce percentile pyspark python quantile
Last synced: 21 Dec 2024
https://github.com/CamDavidsonPilon/tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
distributed-computing estimate mapreduce percentile pyspark python quantile
Last synced: 30 Oct 2024
https://github.com/cwensel/cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Last synced: 20 Dec 2024
https://github.com/tencent/firestorm
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
mapreduce remoteshuffle shuffle spark
Last synced: 17 Dec 2024
https://github.com/mahmoudparsian/data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
algorithms bigdata data data-abstractions data-algorithms data-transformation dataframes design design-patterns machine-learning mappers mapreduce monoid partitioning-algorithms pyspark python rdd reducers spark transformations
Last synced: 17 Dec 2024
https://github.com/lynnlangit/learning-hadoop-and-spark
Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
apache-spark dataproc emr hadoop learning-hadoop mapreduce spark wordcount
Last synced: 15 Dec 2024
https://github.com/kevwan/mapreduce
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
concurrent concurrent-programming go golang mapreduce mapreduce-go
Last synced: 20 Dec 2024
https://github.com/mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
algorithms apache-hadoop apache-spark big-data data-algorithms data-analysis data-engineering data-partition data-transformation glossary mapreduce mapreduce-algorithm mapreduce-python monoid partitioning-algorithms pyspark pyspark-algorithms-book santa-clara-university spark-dataframes spark-rdd
Last synced: 16 Dec 2024
https://github.com/touero/ctenopharyngodon-idella
Hadoop, MapReduce Distributed Crawling of Data Information from All Chinese Universities.
fastapi hadoop hadoop-mapreduce java mapreduce maven scraping
Last synced: 15 Dec 2024
https://github.com/mimecast/dtail
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
adhoc devops devops-tools distributed golang log log-management mapreduce mimecast troubleshooting
Last synced: 12 Nov 2024
https://github.com/cocainecong/tangseng
Tangseng search engine, including full-text search and vector search, based on Go. A Go-based search engine and information retrieval system.
boltdb distributed-systems dockcer-compose docker etcd full-text-search gin grpc inverted-index kafka losertree lsm-tree mapreduce search-engine segment vector-search
Last synced: 19 Dec 2024
https://github.com/feng-li/Distributed-Statistical-Computing
Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)
hadoop mapreduce pyspark-tutorial spark spark-teaching statistical-models
Last synced: 30 Oct 2024
https://github.com/Refefer/Dampr
Python Data Processing library
batch-processing dataflow machine-learning mapreduce
Last synced: 27 Nov 2024
https://github.com/kwartile/connected-component
Map Reduce Implementation of Connected Component on Apache Spark
apache-spark connected-components graph-algorithms graphx mapreduce scala union-find
Last synced: 12 Oct 2024
https://github.com/mahmoudparsian/pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
algorithms big-data data data-abstractions data-science dataframe distributed-computing graphframes mapreduce monoid nosql partitioning pyspark pyspark-algorithms python rdd spark transformations
Last synced: 06 Nov 2024
https://github.com/flipkart-incubator/hbase-orm
A production-grade HBase ORM library that makes accessing HBase clean, fast and fun (Can also be used as Bigtable ORM)
bigtable bigtable-orm cloud-bigtable hadoop hbase hbase-orm mapreduce object-mapping orm
Last synced: 16 Nov 2024
https://github.com/nellore/rail
Scalable RNA-seq analysis
alignments emr ipython mapreduce rail-rna rna-seq-analysis
Last synced: 12 Oct 2024
https://github.com/am-kantox/elixir-iteraptor
Handy enumerable operations implementation.
elixir elixir-lang iteration mapreduce
Last synced: 18 Dec 2024
https://github.com/groda/big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
apache-sedona apache-spark big-data bigdata bigtop docker gutenberg-ebooks hadoop hadoop-cluster hadoop-hdfs hadoop-mapreduce jupyter-notebook mapreduce mapreduce-bash mrjob pyspark spark spark-sql testdfsio
Last synced: 17 Dec 2024
https://github.com/turboway/pybigdata
Working with various big data components using Python
elasticsearch hadoop hbase hive impala kafka mapreduce spark
Last synced: 15 Nov 2024
https://github.com/arindas/mit-6.824-distributed-systems
Template repository to work on the labs from MIT 6.824 Distributed Systems course.
distributed-systems mapreduce raft-consensus-algorithm
Last synced: 23 Nov 2024
https://github.com/asuiu/pyxtension
Pure Python extensions library that includes Scala-like streams, Json with attribute access syntax, and other common use stuff
java-streams mapreduce python python-iterables python-itertools python-json python-mapreduce python-multiprocessing python-multithreading python-streaming streaming
Last synced: 16 Dec 2024
https://github.com/whitfin/efflux
Easy Hadoop Streaming and MapReduce interfaces in Rust
Last synced: 16 Nov 2024
https://github.com/jishnub/parallelutilities.jl
Fast and easy parallel mapreduce on HPC clusters
distributed distributed-computing high-performance-computing hpc hpc-applications hpc-cluster hpc-clusters julia mapreduce parallel parallel-computing reduction
Last synced: 28 Oct 2024
https://github.com/deeptiman/offchaindata
Hyperledger Fabric OffChain Storage
blockchain-technology couchdb docker golang grpc-client grpc-go grpc-service hyperledger-fabric mapreduce mapreduce-demo offchain-storage
Last synced: 08 Nov 2024
https://github.com/longshilin/hadoop-mapreduce
Application examples based on MapReduce :ear_of_rice:
Last synced: 10 Nov 2024
https://github.com/innofang/subgraph-isomorphism
❄Implement the common subgraph isomorphism algorithms (i.e. Ullmann, VF2) based on MapReduce on Hadoop
isomorphism mapreduce mapreduce-algorithm subgraph-count subgraph-isomorphism ullmann ullmann-algorithm vf2 vf2-algorithm
Last synced: 11 Nov 2024
https://github.com/zunzhuowei/qs-hadoop
Learning the big data ecosystem
bigdata elasticsearch hadoop mapreduce spark spark-streaming storm
Last synced: 02 Dec 2024
https://github.com/hiejulia/data-pipeline-project
Data pipeline project
amazon-web-services azure bigml classification data-pipeline deployment distributed-systems hadoop java kafka machine-learning mapreduce maven spark streaming
Last synced: 16 Dec 2024
https://github.com/goldmansachs/mrword2vec
A MapReduce / Hadoop implementation of Word2Vec
Last synced: 07 Nov 2024
https://github.com/alash3al/aggrex
a crazy API gateway aggregation using javascript as a language and go as a runtime
aggregator api api-client api-gateway cloud golang javascript mapreduce reverse-proxy
Last synced: 29 Nov 2024
https://github.com/yaa110/goterator
Lazy iterator implementation for Golang
golang golang-module golang-package iterator mapreduce
Last synced: 12 Nov 2024
https://github.com/gramian/hapod
HAPOD - Hierarchical Approximate Proper Orthogonal Decomposition
data-driven data-reduction data-science datascience dimension-reduction distributed-memory high-performance-computing hpc limited-memory mapreduce mapreduce-algorithm model-order-reduction model-reduction pca pod proper-orthogonal-decomposition svd unsupervised-learning
Last synced: 13 Nov 2024
https://github.com/magicxor/mapreduce
LINQ for Delphi (Object Pascal)
arrays delphi filter flatmap foreach generics high-order-function linq map mapreduce object-pascal pascal reduce selectmany
Last synced: 12 Nov 2024
https://github.com/dayyass/pydfs
Distributed File System written in Python
distributed-systems filesystem hadoop hdfs mapreduce python
Last synced: 14 Oct 2024
https://github.com/ktorzpersonal/purescript-ifrit
An SQL -> NoSQL compiler for data aggregation
aggregation compiler mapreduce mongodb nosql pipeline sql
Last synced: 15 Oct 2024
https://github.com/eftec/documentstoreone
A flat document store for PHP that supports multiple concurrent accesses
bigdata database mapreduce php php-library
Last synced: 07 Nov 2024
https://github.com/conradsnicta/armadillo-gmm
gmm_diag and gmm_full: C++ classes for multi-threaded Gaussian mixture models and Expectation-Maximisation
armadillo clustering clustering-algorithm cpp em-algorithm expectation-maximization gaussian-mixture-models gmm k-means k-means-clustering machine-learning mapreduce openmp statistics
Last synced: 24 Oct 2024
https://github.com/isislab-unisa/sof
Simulation Optimization and exploration Framework on the cloud: SOF
agent-based-simulation hadoop java mapreduce optimization-process simulation-model simulation-optimization sof
Last synced: 15 Nov 2024
https://github.com/vikhyat/stormycloud
Ridiculously simple distributed applications in Ruby.
distributed-systems mapreduce ruby
Last synced: 09 Nov 2024
https://github.com/nwjlyons/slice
Elixir's Enum module implemented in Go using generics.
Last synced: 15 Dec 2024
https://github.com/asuiu/streamerate
Iterable Java8 style Streams for Python
java-streams map-reduce mapreduce python python-iterables python-itertools python-mapreduce python-multiprocessing python-multithreading python-streaming python3 streaming
Last synced: 10 Nov 2024
https://github.com/perfectlysoft/perfect-hadoop
Perfect Hadoop: WebHDFS, MapReduce & Yarn.
hadoop mapreduce perfect server-side-swift swift webhdfs yarn
Last synced: 13 Nov 2024
https://github.com/banyc/mapreduce
In C#. Master-Worker. From scratch. No Hadoop. Done. Depend on DFS.
distributed-systems educational from-scratch mapreduce master-slave object-oriented-programming
Last synced: 19 Nov 2024
https://github.com/jishnub/mpimapreduce.jl
An MPI-based distributed map-reduce function for Julia
distributed-computing julia mapreduce message-passing mpi parallel parallel-computing
Last synced: 11 Oct 2024
https://github.com/timvisee/wrdcntr
:dash: A simple yet very fast word counter written in Rust
concurrency mapreduce rayon rust wordcount
Last synced: 15 Nov 2024
https://github.com/lapets/mr4mp
Thin MapReduce-like layer that wraps the Python multiprocessing library.
library mapreduce multiprocessing multiprocessing-library parallel-programming parallel-python python python-library
Last synced: 23 Nov 2024
https://github.com/bugenzhao/6.824-mapreduce
An implementation of "6.824 Lab 1: MapReduce (2021)" in async Rust.
6824 distributed-systems mapreduce mit rpc
Last synced: 13 Oct 2024
https://github.com/WilliamX1/cse-2021
A distributed file system similar to Google File System (GFS).
distributed-file-system gfs mapreduce raft rpc
Last synced: 08 Nov 2024
https://github.com/ChasakisD/DistributedSystems
A recommendations Android App with a Backend Server in Java that supports Map & Reduce.
android distributed-systems mapreduce xamarin xamarin-android
Last synced: 24 Oct 2024
https://github.com/rezaei121/elasticsearch-mapreduce-wordcount
elasticsearch Map/Reduce integration (word count project)
bigdata elasticsearch elasticsearch-mapreduce-wordcount hdfs java mapreduce wordcount
Last synced: 11 Nov 2024
https://github.com/jordicorbilla/mapreduce
Data parallel text processing with MapReduce
Last synced: 06 Nov 2024
https://github.com/xudong963/distributed-systems
MapReduce, Raft, KV Raft, Shared KV
fault-tolerance golang kvs-server learn-golang mapreduce papers raft
Last synced: 07 Nov 2024
https://github.com/dedalozzo/eoc-server
A complete CouchDB Query Server written in PHP.
couchdb couchdb-query-server couchdb-server mapreduce php
Last synced: 12 Nov 2024
https://github.com/samyak2/yacs
YAAAAAAAAAAAAAAAAAAAAAAAAAAAAACS
distributed-systems mapreduce scheduler yarn yet-another
Last synced: 11 Nov 2024
https://github.com/zerefwayne/wordcount
A Go implementation of the MapReduce algorithm to calculate word counts in a text corpus.
concurrency golang goroutine mapreduce
Last synced: 21 Nov 2024
https://github.com/constantiner/fun-ctional
The library brings most of the familiar functional techniques (like functional composition) to the asynchronous world with shining Promises
async asynchronous asynchronous-functions asynchronous-programming client-side functional functional-composition functional-programming javascript javascript-library library mapreduce nodejs promise promise-handling promises server-side
Last synced: 08 Dec 2024
https://github.com/chaokunyang/athena
A task scheduler for spark, flink, mapreduce, java, python, bash
flink hadoop mapreduce spark task-manager task-scheduler
Last synced: 19 Nov 2024
https://github.com/khinshankhan/nlp-tf-idf-hadoop
NLP analysis of Term Frequency - Inverse Document Frequency using Hadoop
Last synced: 18 Nov 2024
https://github.com/anicolaspp/reactor
Multi-system MapR integration
java mapr mapr-db mapr-streams maprdb mapreduce mapreduce-java scala
Last synced: 16 Nov 2024
https://github.com/grycap/marla
MApReduce on AWS LAmbda
aws-lambda lambda mapreduce python serverless
Last synced: 08 Nov 2024
https://github.com/zurfyx/cassandra-hadoop-example
Cassandra Hadoop Example
cassandra hadoop mapreduce nodejs
Last synced: 11 Dec 2024
https://github.com/vschiavoni/SecureStreams-DEBS17
SecureStreams, DEBS'17
lua mapreduce pipeline reactive-programming security sgx sgx-enclave streaming
Last synced: 09 Nov 2024
https://github.com/wtanaka/ansible-role-apache-spark
Ansible role to install Apache Spark
ansible ansible-galaxy ansible-role ansible-roles apache-spark batch galaxy mapreduce spark streaming
Last synced: 22 Nov 2024
https://github.com/anindya-prithvi/map_rizzuse-dscd
A repository for a _real_ project (Map - reduce)
map map-reduce mapreduce reduce
Last synced: 15 Nov 2024
https://github.com/ggcr/go-mapreduce
MapReduce implementation written in Go with heavy use of concurrency and the distributed systems paradigm.
concurrency distributed-systems go goroutines mapreduce mit mit-6824 thread
Last synced: 14 Nov 2024
https://github.com/yoongoing/bigdata_pyspark
⚡️Let's try data mining with Spark, an open MapReduce platform⚡️
bigdata dataminig jupyter-notebook mapreduce mapreduce-python pyspark spark
Last synced: 17 Dec 2024
https://github.com/sandeepkundalwal/advanced-computer-science-practicum
[CS515: Advanced Computer Science Practicum] This repo contains all the assignments of CS515 offered at IIT Mandi by Dr. Sriram Kailasam & Dr. Manas Thakur during Fall Session 2022.
fork-join hadoop java mapreduce scheme-programming-language thread-pool threads
Last synced: 07 Dec 2024