Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with mapreduce

A curated list of projects in awesome lists tagged with mapreduce.

https://github.com/donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

aws big-data caffe data-science deep-learning hadoop kaggle keras machine-learning mapreduce matplotlib numpy pandas python scikit-learn scipy spark tensorflow theano

Last synced: 16 Dec 2024

https://github.com/PowerJob/PowerJob

Enterprise job scheduling middleware with distributed computing ability.

cron distributed java job job-scheduler mapreduce scheduler workflow

Last synced: 30 Oct 2024

https://github.com/powerjob/powerjob

Enterprise job scheduling middleware with distributed computing ability.

cron distributed java job job-scheduler mapreduce scheduler workflow

Last synced: 16 Dec 2024

https://github.com/douban/dpark

Python clone of Spark, a MapReduce-like framework in Python

bigdata dpark mapreduce python spark stream-processing

Last synced: 12 Oct 2024

https://github.com/water8394/bigdata-interview

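:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from the web, with the author's own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.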

bigdata flink hadoop hbase hdfs interview interview-questions kafka mapreduce spark yarn

Last synced: 21 Dec 2024

https://github.com/water8394/BigData-Interview

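:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from the web, with the author's own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.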

bigdata flink hadoop hbase hdfs interview interview-questions kafka mapreduce spark yarn

Last synced: 30 Oct 2024

https://github.com/collabh/bigdata-growth

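A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.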

bigdata bigdatalearning debezium flink hadoop hbase hdfs hive hudi kafka kudu mapreduce olap spark

Last synced: 19 Dec 2024

https://github.com/collabH/bigdata-growth

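A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.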

bigdata bigdatalearning debezium flink hadoop hbase hdfs hive hudi kafka kudu mapreduce olap spark

Last synced: 31 Oct 2024

https://github.com/cdapio/cdap

An open source framework for building data analytic applications.

cdap dataset integration java java-8 mapreduce middleware platform python spark spark-streaming unified

Last synced: 17 Dec 2024

https://github.com/bcongdon/corral

🐎 A serverless MapReduce framework written for AWS Lambda

aws-lambda mapreduce mapreduce-framework serverless

Last synced: 21 Dec 2024

https://github.com/grailbio/bigslice

A serverless cluster computing system for the Go programming language

bigdata cluster computing etl go golang machinelearning mapreduce

Last synced: 09 Nov 2024

https://github.com/apache/incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

mapreduce remote-shuffle-service rss shuffle spark tez

Last synced: 20 Dec 2024

https://github.com/camdavidsonpilon/tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark

distributed-computing estimate mapreduce percentile pyspark python quantile

Last synced: 21 Dec 2024

https://github.com/CamDavidsonPilon/tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark

distributed-computing estimate mapreduce percentile pyspark python quantile

Last synced: 30 Oct 2024

https://github.com/cubefs/compass

Compass is a task diagnosis platform for bigdata

airflow bigdata diagnose dolphinscheduler flink hadoop mapreduce scheduler spark sql

Last synced: 22 Dec 2024

https://github.com/cwensel/cascading

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.

hadoop java mapreduce tez

Last synced: 20 Dec 2024

https://github.com/datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale's Introduction to Big Data Processing tutorial | An opening course for the big data technology track 🎉🎉

bigdata hadoop hbase hdfs hive mapreduce spark

Last synced: 17 Nov 2024

https://github.com/tencent/firestorm

Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers

mapreduce remoteshuffle shuffle spark

Last synced: 17 Dec 2024

https://github.com/lynnlangit/learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

apache-spark dataproc emr hadoop learning-hadoop mapreduce spark wordcount

Last synced: 15 Dec 2024

https://github.com/kevwan/mapreduce

An in-process MapReduce library to help you optimize service response time or concurrent task processing.

concurrent concurrent-programming go golang mapreduce mapreduce-go

Last synced: 20 Dec 2024

https://github.com/touero/ctenopharyngodon-idella

Distributed crawling of information on all Chinese universities with Hadoop and MapReduce.

fastapi hadoop hadoop-mapreduce java mapreduce maven scraping

Last synced: 15 Dec 2024

https://github.com/mimecast/dtail

DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.

adhoc devops devops-tools distributed golang log log-management mapreduce mimecast troubleshooting

Last synced: 12 Nov 2024

https://github.com/cocainecong/tangseng

Tangseng search engine, including full-text search and vector search, based on Golang. A Go-based search engine and information retrieval system.

boltdb distributed-systems dockcer-compose docker etcd full-text-search gin grpc inverted-index kafka losertree lsm-tree mapreduce search-engine segment vector-search

Last synced: 19 Dec 2024

https://github.com/feng-li/Distributed-Statistical-Computing

Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)

hadoop mapreduce pyspark-tutorial spark spark-teaching statistical-models

Last synced: 30 Oct 2024

https://github.com/Refefer/Dampr

Python Data Processing library

batch-processing dataflow machine-learning mapreduce

Last synced: 27 Nov 2024

https://github.com/kwartile/connected-component

Map Reduce Implementation of Connected Component on Apache Spark

apache-spark connected-components graph-algorithms graphx mapreduce scala union-find

Last synced: 12 Oct 2024

https://github.com/flipkart-incubator/hbase-orm

A production-grade HBase ORM library that makes accessing HBase clean, fast and fun (Can also be used as Bigtable ORM)

bigtable bigtable-orm cloud-bigtable hadoop hbase hbase-orm mapreduce object-mapping orm

Last synced: 16 Nov 2024

https://github.com/nellore/rail

Scalable RNA-seq analysis

alignments emr ipython mapreduce rail-rna rna-seq-analysis

Last synced: 12 Oct 2024

https://github.com/am-kantox/elixir-iteraptor

Handy enumerable operations implementation.

elixir elixir-lang iteration mapreduce

Last synced: 18 Dec 2024

https://github.com/turboway/pybigdata

Working with various big data components using Python

elasticsearch hadoop hbase hive impala kafka mapreduce spark

Last synced: 15 Nov 2024

https://github.com/arindas/mit-6.824-distributed-systems

Template repository to work on the labs from MIT 6.824 Distributed Systems course.

distributed-systems mapreduce raft-consensus-algorithm

Last synced: 23 Nov 2024

https://github.com/asuiu/pyxtension

Pure Python extensions library that includes Scala-like streams, Json with attribute access syntax, and other common use stuff

java-streams mapreduce python python-iterables python-itertools python-json python-mapreduce python-multiprocessing python-multithreading python-streaming streaming

Last synced: 16 Dec 2024

https://github.com/jehiah/gomrjob

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

dataproc go hadoop mapreduce mrjob

Last synced: 27 Oct 2024

https://github.com/aikuyun/bigdata-doc

Big data study notes, learning roadmap, and a collection of technical case studies.

bigdata flink hadoop hdfs hive kafka mapreduce

Last synced: 30 Oct 2024

https://github.com/whitfin/efflux

Easy Hadoop Streaming and MapReduce interfaces in Rust

hadoop mapreduce processing

Last synced: 16 Nov 2024

https://github.com/orangedrk/javanotes

Java backend study notes, covering Linux, Maven, Git, internet architecture, the big data ecosystem, and more

flume git hadoop hbase hdfs hive javaee javase kafka linux mapreduce maven mybatis mycat rabbitmq redis spring spring-boot springcloud zookeeper

Last synced: 13 Oct 2024

https://github.com/chucheng92/hadoopdedup

:watermelon: Large-scale deduplication of massive data based on Hadoop and HBase

big-data cdc dedup fsp mapreduce

Last synced: 08 Nov 2024

https://github.com/saleyn/etran

Erlang Parse Transforms Including Fold (MapReduce) comprehension, Elixir-like Pipeline, and default function arguments

arguments default elixir erlang fold function map mapreduce parser pipe pipeline transform

Last synced: 27 Oct 2024

https://github.com/hobbyquaker/mqttdb

JSON Store with MQTT Interface :books::open_file_folder::satellite:

database documents json mapreduce metadata mqtt nosql store views

Last synced: 10 Oct 2024

https://github.com/longshilin/hadoop-mapreduce

Application examples based on MapReduce :ear_of_rice:

hadoop mapreduce mr

Last synced: 10 Nov 2024

https://github.com/innofang/subgraph-isomorphism

❄Implement the common subgraph isomorphism algorithms (i.e. Ullmann, VF2) based on MapReduce on Hadoop

isomorphism mapreduce mapreduce-algorithm subgraph-count subgraph-isomorphism ullmann ullmann-algorithm vf2 vf2-algorithm

Last synced: 11 Nov 2024

https://github.com/goldmansachs/mrword2vec

A MapReduce / Hadoop implementation of Word2Vec

java mapreduce word2vec

Last synced: 07 Nov 2024

https://github.com/alash3al/aggrex

a crazy API gateway aggregation using javascript as a language and go as a runtime

aggregator api api-client api-gateway cloud golang javascript mapreduce reverse-proxy

Last synced: 29 Nov 2024

https://github.com/yaa110/goterator

Lazy iterator implementation for Golang

golang golang-module golang-package iterator mapreduce

Last synced: 12 Nov 2024

https://github.com/singgel/bigdata-skilltree

Demos for Spark, Flink, HBase, Hive, and Flume that integrate some of Hadoop's native APIs (such as HDFS and MapReduce; just these two for now), along with tests of some exception cases

hadoop hbase hdfs hive kylin mapreduce scala spark

Last synced: 14 Oct 2024

https://github.com/sylvainhalle/mrsim

A simple MapReduce framework in Java

hadoop java mapreduce tuples

Last synced: 11 Oct 2024

https://github.com/zenoyang/web-click-flow

Offline log analysis of website clickstreams

etl flume hadoop hive mapreduce sqoop

Last synced: 16 Nov 2024

https://github.com/dayyass/pydfs

Distributed File System written in Python

distributed-systems filesystem hadoop hdfs mapreduce python

Last synced: 14 Oct 2024

https://github.com/ktorzpersonal/purescript-ifrit

An SQL -> NoSQL compiler for data aggregation

aggregation compiler mapreduce mongodb nosql pipeline sql

Last synced: 15 Oct 2024

https://github.com/eftec/documentstoreone

A flat document store for PHP that supports multiple concurrent accesses

bigdata database mapreduce php php-library

Last synced: 07 Nov 2024

https://github.com/conradsnicta/armadillo-gmm

gmm_diag and gmm_full: C++ classes for multi-threaded Gaussian mixture models and Expectation-Maximisation

armadillo clustering clustering-algorithm cpp em-algorithm expectation-maximization gaussian-mixture-models gmm k-means k-means-clustering machine-learning mapreduce openmp statistics

Last synced: 24 Oct 2024

https://github.com/isislab-unisa/sof

Simulation Optimization and exploration Framework on the cloud: SOF

agent-based-simulation hadoop java mapreduce optimization-process simulation-model simulation-optimization sof

Last synced: 15 Nov 2024

https://github.com/vikhyat/stormycloud

Ridiculously simple distributed applications in Ruby.

distributed-systems mapreduce ruby

Last synced: 09 Nov 2024

https://github.com/nwjlyons/slice

Elixir's Enum module implemented in Go using generics.

elixir generics go mapreduce

Last synced: 15 Dec 2024

https://github.com/perfectlysoft/perfect-hadoop

Perfect Hadoop: WebHDFS, MapReduce & Yarn.

hadoop mapreduce perfect server-side-swift swift webhdfs yarn

Last synced: 13 Nov 2024

https://github.com/banyc/mapreduce

In C#. Master-Worker. From scratch. No Hadoop. Done. Depend on DFS.

distributed-systems educational from-scratch mapreduce master-slave object-oriented-programming

Last synced: 19 Nov 2024

https://github.com/jishnub/mpimapreduce.jl

An MPI-based distributed map-reduce function for Julia

distributed-computing julia mapreduce message-passing mpi parallel parallel-computing

Last synced: 11 Oct 2024

https://github.com/timvisee/wrdcntr

:dash: A simple yet very fast word counter written in Rust

concurrency mapreduce rayon rust wordcount

Last synced: 15 Nov 2024

https://github.com/lapets/mr4mp

Thin MapReduce-like layer that wraps the Python multiprocessing library.

library mapreduce multiprocessing multiprocessing-library parallel-programming parallel-python python python-library

Last synced: 23 Nov 2024

https://github.com/bugenzhao/6.824-mapreduce

An implementation of "6.824 Lab 1: MapReduce (2021)" in async Rust.

6824 distributed-systems mapreduce mit rpc

Last synced: 13 Oct 2024

https://github.com/axsaucedo/hadoop-overview

Hands on Hadoop, services, installation

ambari hadoop hdfs hive mapreduce mesos notes pig spark yarn

Last synced: 06 Nov 2024

https://github.com/WilliamX1/cse-2021

A distributed file system similar to Google File System (GFS).

distributed-file-system gfs mapreduce raft rpc

Last synced: 08 Nov 2024

https://github.com/ChasakisD/DistributedSystems

A recommendations Android App with a Backend Server in Java that supports Map & Reduce.

android distributed-systems mapreduce xamarin xamarin-android

Last synced: 24 Oct 2024

https://github.com/jordicorbilla/mapreduce

Data parallel text processing with MapReduce

mapreduce parallel-computing

Last synced: 06 Nov 2024

https://github.com/mynameisvinn/sprite

serverless mapreduce

mapreduce python serverless

Last synced: 07 Nov 2024

https://github.com/dedalozzo/eoc-server

A complete CouchDB Query Server written in PHP.

couchdb couchdb-query-server couchdb-server mapreduce php

Last synced: 12 Nov 2024

https://github.com/samyak2/yacs

YAAAAAAAAAAAAAAAAAAAAAAAAAAAAACS

distributed-systems mapreduce scheduler yarn yet-another

Last synced: 11 Nov 2024

https://github.com/zerefwayne/wordcount

A Go implementation of the MapReduce algorithm to calculate word counts in a text corpus.

concurrency golang goroutine mapreduce

Last synced: 21 Nov 2024

https://github.com/constantiner/fun-ctional

The library brings most of the familiar functional techniques (like functional composition) to the asynchronous world with shining Promises

async asynchronous asynchronous-functions asynchronous-programming client-side functional functional-composition functional-programming javascript javascript-library library mapreduce nodejs promise promise-handling promises server-side

Last synced: 08 Dec 2024

https://github.com/chaokunyang/athena

A task scheduler for spark, flink, mapreduce, java, python, bash

flink hadoop mapreduce spark task-manager task-scheduler

Last synced: 19 Nov 2024

https://github.com/khinshankhan/nlp-tf-idf-hadoop

NLP analysis of Term Frequency - Inverse Document Frequency using Hadoop

hadoop mapreduce nlp tf-idf

Last synced: 18 Nov 2024

https://github.com/grycap/marla

MApReduce on AWS LAmbda

aws-lambda lambda mapreduce python serverless

Last synced: 08 Nov 2024

https://github.com/zurfyx/cassandra-hadoop-example

Cassandra Hadoop Example

cassandra hadoop mapreduce nodejs

Last synced: 11 Dec 2024

https://github.com/anindya-prithvi/map_rizzuse-dscd

A repository for a _real_ project (Map - reduce)

map map-reduce mapreduce reduce

Last synced: 15 Nov 2024

https://github.com/ggcr/go-mapreduce

MapReduce implementation written in Go with heavy use of concurrency and the distributed systems paradigm.

concurrency distributed-systems go goroutines mapreduce mit mit-6824 thread

Last synced: 14 Nov 2024

https://github.com/yoongoing/bigdata_pyspark

⚡️ Let's do data mining using Spark, a publicly available MapReduce platform ⚡️

bigdata dataminig jupyter-notebook mapreduce mapreduce-python pyspark spark

Last synced: 17 Dec 2024

https://github.com/sandeepkundalwal/advanced-computer-science-practicum

[CS515: Advanced Computer Science Practicum] This repo contains all the assignments of CS515, offered at IIT Mandi by Dr. Sriram Kailasam & Dr. Manas Thakur during the Fall 2022 session.

fork-join hadoop java mapreduce scheme-programming-language thread-pool threads

Last synced: 07 Dec 2024