https://github.com/getindata/bigdatatutorial
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/getindata/bigdatatutorial
- Owner: getindata
- Created: 2015-02-16T19:07:02.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2017-02-02T11:27:49.000Z (over 9 years ago)
- Last Synced: 2025-04-09T20:11:38.972Z (about 1 year ago)
- Language: Java
- Size: 6.86 MB
- Stars: 7
- Watchers: 6
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
BigDataTutorial
=======
Spark Streaming And Kafka
-------------------------
SSH into one of the slave nodes. The edge node cannot be used because it does not have the Kafka libraries and configuration.

Before the demo, install the following tools:
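```bash
# The original repo file URL (https://bintray.com/sbt/rpm/rpm) stopped working when Bintray shut down in 2021;
# the sbt RPM repository definition is now published at scala-sbt.org
curl -L https://www.scala-sbt.org/sbt-rpm.repo | tee /etc/yum.repos.d/sbt-rpm.repo
yum install -y sbt vim
```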
During the demo:

In terminal (1):
```bash
# build with dependencies
cd kafka
mvn package -Pfull
```
```bash
# export environment variables (6667 is the default Kafka broker port on HDP)
export KAFKA=$(hostname):6667
export ZOOKEEPER=$(grep zookeeper.connect= /etc/kafka/conf/server.properties | cut -d'=' -f 2)
export JAVA_HOME=$(grep JAVA_HOME= /etc/hadoop/conf/hadoop-env.sh | cut -d'=' -f 2)
# print them to check the values
echo $KAFKA
echo $ZOOKEEPER
echo $JAVA_HOME
```
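The `grep | cut` pipeline for `ZOOKEEPER` assumes that `zookeeper.connect` appears on exactly one uncommented line with no space before `=`; a commented-out `#zookeeper.connect=...` line, for example, would also match and yield two values. If you would rather read the setting from Java, `java.util.Properties` applies the standard properties-file rules (comments, whitespace around `=`). The following is a minimal sketch; the class name `ZkConnectReader` and its method are hypothetical and not part of this repository.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of the bigdatatutorial repository.
public class ZkConnectReader {

    /** Returns the trimmed value of zookeeper.connect, or null if the key is absent. */
    static String zookeeperConnect(Reader source) throws IOException {
        Properties props = new Properties();
        // Properties.load skips '#' and '!' comment lines and whitespace around '=' or ':'
        props.load(source);
        String value = props.getProperty("zookeeper.connect");
        // trailing whitespace is kept by Properties.load, so trim it here
        return value == null ? null : value.trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // e.g. /etc/kafka/conf/server.properties
            try (Reader r = Files.newBufferedReader(Paths.get(args[0]))) {
                System.out.println(zookeeperConnect(r));
            }
        } else {
            // Inline sample with made-up host names; the commented line is ignored
            String sample = "# zookeeper.connect=commented:2181\n"
                    + "zookeeper.connect = zk1:2181,zk2:2181\n";
            System.out.println(zookeeperConnect(new StringReader(sample))); // prints zk1:2181,zk2:2181
        }
    }
}
```

On Java 11 or later it can be launched directly with `$JAVA_HOME/bin/java ZkConnectReader.java /etc/kafka/conf/server.properties`; on older JDKs compile it with `javac` first.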
```bash
# create the topic (list the topics before and after to confirm it was created)
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper $ZOOKEEPER
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper $ZOOKEEPER --replication-factor 1 --partitions 1 --topic logevent
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper $ZOOKEEPER
```
```bash
# review the producer source, then produce some data to the topic
vim src/main/java/com/getindata/tutorial/bigdatatutorial/kafka/LogEventTsvProducer.java
$JAVA_HOME/bin/java -cp target/bigdatatutorial-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.getindata.tutorial.bigdatatutorial.kafka.LogEventTsvProducer $KAFKA logevent true
```
In terminal (2):

```bash
export KAFKA=$(hostname):6667
export ZOOKEEPER=$(grep zookeeper.connect= /etc/kafka/conf/server.properties | cut -d'=' -f 2)
# consume data using the Kafka console consumer (it keeps running until you stop it with Ctrl+C)
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --topic logevent --zookeeper $ZOOKEEPER --from-beginning
```
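```bash
# start the Spark Streaming app
# (the path is relative to the kafka/ directory; from the repository root use `cd streaming`)
cd ../streaming
vim src/main/scala/TopSongs.scala
sbt assembly
./bin/start.sh TopSongs $ZOOKEEPER $KAFKA logevent
```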
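The source of `TopSongs.scala` is not reproduced here, so its exact logic is an assumption: judging by the name, the app ranks songs by how often they appear in the `logevent` stream. The per-batch core of such a ranking is sketched below in plain Java, assuming each tab-separated event carries the song title in a known column. `TopSongsSketch`, `topN` and the sample records are hypothetical. In Spark Streaming the same per-key count would typically be expressed with `reduceByKey` or `reduceByKeyAndWindow` on a DStream.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration, not the repository's TopSongs implementation.
public class TopSongsSketch {

    /**
     * Counts the value in column songColumn of each tab-separated line and
     * returns the n most frequent values, ties broken alphabetically.
     * Lines with too few columns are skipped.
     */
    static List<Map.Entry<String, Long>> topN(List<String> tsvLines, int songColumn, int n) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : tsvLines) {
            String[] fields = line.split("\t", -1);
            if (fields.length <= songColumn) {
                continue; // malformed record
            }
            counts.merge(fields[songColumn], 1L, Long::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed layout for illustration only: timestamp \t user \t song
        List<String> batch = Arrays.asList(
                "1422000000\talice\tSong A",
                "1422000001\tbob\tSong B",
                "1422000002\tcarol\tSong A",
                "broken-line");
        System.out.println(topN(batch, 2, 2)); // prints [Song A=2, Song B=1]
    }
}
```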
In terminal (1):

```bash
# produce some data to the topic
$JAVA_HOME/bin/java -cp target/bigdatatutorial-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.getindata.tutorial.bigdatatutorial.kafka.LogEventTsvProducer $KAFKA logevent true
```