# mini-data-pipeline
A quick way to deploy a mini data pipeline

## Components
- [Hadoop](http://hadoop.apache.org/): An unstructured data store
- [Kafka](https://kafka.apache.org/): A message queue
- [Spark](https://spark.apache.org/): An in-memory job runner
- [Zookeeper](https://zookeeper.apache.org/): A distributed systems coordinator
  - *A necessary Kafka dependency*
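
To make the data flow concrete, here is a minimal, hypothetical round trip through the message queue using `kafka-python`. It is not part of this repo: it assumes the broker is published on `localhost:9092` (see the Ports section below), and the topic name `events` is made up for illustration.

```python
from kafka import KafkaConsumer, KafkaProducer

# Assumes the Kafka container's broker port is published to localhost:9092
# (see the Ports section); the "events" topic name is hypothetical.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", b"hello pipeline")
producer.flush()

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # read from the start of the topic
    consumer_timeout_ms=5000,       # stop iterating after 5s of inactivity
)
for message in consumer:
    print(message.value)
```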

## Docker Images Used
- Hadoop: [sequenceiq](https://hub.docker.com/r/sequenceiq/spark/)
  - Comes with Spark binaries installed, for experimentation
- Kafka: [wurstmeister](https://hub.docker.com/r/wurstmeister/kafka/)
- Spark: [p7hb](https://hub.docker.com/r/p7hb/docker-spark/)
- Zookeeper: [Apache](https://hub.docker.com/_/zookeeper/)
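
The repository presumably wires these images together with a Compose file; as a rough sketch of how two of them relate (Kafka needs a running Zookeeper), the same wiring can be expressed with the Docker SDK for Python. The image names come from the list above; the container names, published ports, and environment variables are assumptions based on the images' documentation.

```python
import docker

# Hypothetical quick start via the Docker SDK for Python; the repo itself
# presumably uses docker-compose rather than this script.
client = docker.from_env()

# Zookeeper first: Kafka needs it for coordination (port 2181).
zookeeper = client.containers.run(
    "zookeeper",
    name="zookeeper",
    ports={"2181/tcp": 2181},
    detach=True,
)

# wurstmeister/kafka expects to be told where Zookeeper lives and which
# hostname to advertise to clients (per that image's documentation).
kafka = client.containers.run(
    "wurstmeister/kafka",
    name="kafka",
    ports={"9092/tcp": 9092},
    environment={
        "KAFKA_ZOOKEEPER_CONNECT": "zookeeper:2181",
        "KAFKA_ADVERTISED_HOST_NAME": "localhost",
    },
    links={"zookeeper": "zookeeper"},  # legacy links keep the sketch short
    detach=True,
)
```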

## Ports
- Hadoop
  - 50010, 50020, 50070, 50075, 50090, 8020, 9000: HDFS
  - 10020, 19888: MapReduce
  - 8030, 8031, 8032, 8033, 8040, 8042, 8088: YARN
  - 49707, 2122: Other
- Kafka
  - 9092: Broker endpoint for producers and consumers
- Spark (Master)
  - 8080: Spark Web UI
  - 7077: Job submission endpoint
- Spark (Slave)
  - 8081: Web UI
- Zookeeper
  - 2181: For clients (e.g. Kafka) to connect to

Reference: [Apache Ambari v1.2.3, Chapter 10. Configuring Ports](https://ambari.apache.org/1.2.3/installing-hadoop-using-ambari/content/reference_chap2.html)
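
As a hypothetical end-to-end check that exercises the ports above, a small PySpark job could be pointed at the Spark master on 7077 and read from HDFS on 9000. This assumes those container ports are published to `localhost`, that the input path exists in HDFS, and that the Spark version in the image supports `SparkSession` (Spark 2.x or later); the app name and file path are made up.

```python
from pyspark.sql import SparkSession

# Assumes the Spark master (7077) and HDFS NameNode (9000) ports listed above
# are published to localhost; the app name and input path are hypothetical.
spark = (
    SparkSession.builder
    .appName("mini-pipeline-wordcount")
    .master("spark://localhost:7077")
    .getOrCreate()
)

# Read a text file from HDFS and count words on the cluster.
lines = spark.read.text("hdfs://localhost:9000/input/sample.txt")
words = lines.rdd.flatMap(lambda row: row.value.split())
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

for word, count in counts.take(10):
    print(word, count)

spark.stop()
```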