{"id":21514944,"url":"https://github.com/getindata/bigdatatutorial","last_synced_at":"2025-04-09T20:11:45.426Z","repository":{"id":27406635,"uuid":"30883455","full_name":"getindata/BigDataTutorial","owner":"getindata","description":null,"archived":false,"fork":false,"pushed_at":"2017-02-02T11:27:49.000Z","size":7190,"stargazers_count":7,"open_issues_count":1,"forks_count":8,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-09T20:11:38.972Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getindata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-02-16T19:07:02.000Z","updated_at":"2022-08-12T12:48:48.000Z","dependencies_parsed_at":"2022-09-02T05:11:02.838Z","dependency_job_id":null,"html_url":"https://github.com/getindata/BigDataTutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2FBigDataTutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2FBigDataTutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2FBigDataTutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getindata%2FBigDataTutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/getindata","download_url":"https://codeload.github.com/getindata/BigDataTutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count"
:248103872,"owners_count":21048245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-23T23:53:38.639Z","updated_at":"2025-04-09T20:11:45.404Z","avatar_url":"https://github.com/getindata.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"BigDataTutorial\n=======\n\nSpark Streaming And Kafka\n-------------------------\n\nSSH into one of the slave nodes (the edge node cannot be used because it does not have the Kafka libraries and configuration).\n\nBefore the demo, install the following tools:\n\n\tcurl https://bintray.com/sbt/rpm/rpm | tee /etc/yum.repos.d/bintray-sbt-rpm.repo\n\tyum install -y sbt vim\n\nDuring the demo\n\nIn terminal (1):\n\n\t# build with dependencies\n\tcd kafka\n\tmvn package -Pfull\n\n\t# export environment variables\n\texport KAFKA=$(hostname):6667\n\texport ZOOKEEPER=$(cat /etc/kafka/conf/server.properties | grep zookeeper.connect= | cut -d'=' -f 2)\n\texport JAVA_HOME=$(cat /etc/hadoop/conf/hadoop-env.sh | grep JAVA_HOME= | cut -d'=' -f 2)\n\n\t# print environment variables\n\techo $KAFKA\n\techo $ZOOKEEPER\n\techo $JAVA_HOME\n\n\t# create the topic\n\t/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper $ZOOKEEPER\n\t/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper $ZOOKEEPER --replication-factor 1 --partitions 1 --topic logevent\n\t/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper $ZOOKEEPER\n\n\t# produce some data to the topic\n\tvim src/main/java/com/getindata/tutorial/bigdatatutorial/kafka/LogEventTsvProducer.java\n\t$JAVA_HOME/bin/java -cp 
target/bigdatatutorial-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.getindata.tutorial.bigdatatutorial.kafka.LogEventTsvProducer $KAFKA logevent true\n\nIn terminal (2):\n\n\texport KAFKA=$(hostname):6667\n\texport ZOOKEEPER=$(cat /etc/kafka/conf/server.properties | grep zookeeper.connect= | cut -d'=' -f 2)\n\n\t# consume data using Kafka console consumer\n\t/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --topic logevent --zookeeper $ZOOKEEPER --from-beginning\n\n\t# start Spark Streaming app\n\tcd ../streaming\n\tvim src/main/scala/TopSongs.scala\n\tsbt assembly\n\t./bin/start.sh TopSongs $ZOOKEEPER $KAFKA logevent\n\nIn terminal (1):\n\n\t# produce some data to the topic\n\t$JAVA_HOME/bin/java -cp target/bigdatatutorial-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.getindata.tutorial.bigdatatutorial.kafka.LogEventTsvProducer $KAFKA logevent true\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetindata%2Fbigdatatutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetindata%2Fbigdatatutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetindata%2Fbigdatatutorial/lists"}