{"id":23080665,"url":"https://github.com/chen0040/vagrant-big-data","last_synced_at":"2026-05-03T18:33:48.745Z","repository":{"id":85884828,"uuid":"95828805","full_name":"chen0040/vagrant-big-data","owner":"chen0040","description":"Vagrantfiles for development in big data","archived":false,"fork":false,"pushed_at":"2017-07-07T15:34:23.000Z","size":32,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-03T13:47:58.615Z","etag":null,"topics":["cassandra","elasticsearc","hdfs","kafka","mesos","redis","spark","storm","vagrantfile","zookeeper"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chen0040.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-29T23:33:15.000Z","updated_at":"2018-03-17T09:42:29.000Z","dependencies_parsed_at":null,"dependency_job_id":"186a44f6-c2d0-4344-888a-6acfc0ff645a","html_url":"https://github.com/chen0040/vagrant-big-data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chen0040/vagrant-big-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fvagrant-big-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fvagrant-big-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fvagrant-big-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%
2Fvagrant-big-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chen0040","download_url":"https://codeload.github.com/chen0040/vagrant-big-data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fvagrant-big-data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32579851,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cassandra","elasticsearc","hdfs","kafka","mesos","redis","spark","storm","vagrantfile","zookeeper"],"created_at":"2024-12-16T13:15:54.871Z","updated_at":"2026-05-03T18:33:48.740Z","avatar_url":"https://github.com/chen0040.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# vagrant-big-data\n\nVagrantfiles for development in big data\n\n# Install\n\ngit clone this project to your local computer, cd into one of its environment directories, and run \"vagrant up\".\n\n# Features\n\nLets users create different DevOps environments locally using Vagrantfiles:\n\n* Zookeeper cluster\n* Zookeeper + Kafka\n* Storm with Zookeeper + Kafka\n* Spark with Zookeeper + Kafka\n* Spark with Mesos + Zookeeper + Kafka\n\n# Usage\n\n### Zookeeper Cluster\n\ncd to the directory 
\"zookeeper\" under the root directory and run \"vagrant up\". This will start a multi-machine vagrant \nsetup in which a zookeeper cluster is auto-started on 3 VMs. The three ubuntu VMs have the following IP addresses \nby default (hostname: IP address):\n\n* zoo1: 192.168.10.12\n* zoo2: 192.168.10.13\n* zoo3: 192.168.10.14\n\nTo log in to one of the VMs, say zoo1, run \"vagrant ssh zoo1\".\n\nTo check the status of the zookeeper cluster, run the following commands (e.g., from within the zoo1 VM):\n\n```bash\necho stat | nc zoo1 2181\necho stat | nc zoo2 2181\necho stat | nc zoo3 2181\n```\n\nIn each VM, the zookeeper log is located at \"/var/log/zookeeper/zookeeper.log\" and the zoo.cfg configuration file is located at \"/etc/zookeeper/conf/zoo.cfg\".\n\nRun \"vagrant suspend\" to pause the zookeeper cluster and \"vagrant resume\" to restart it.\n\nZookeeper is auto-started on each zookeeper VM when the VM is brought up or resumed.\n\n### Zookeeper+Kafka Cluster\n\ncd to the directory \"zookeeper+kafka\" under the root directory and run \"vagrant up\". This will start a multi-machine vagrant \nsetup in which a zookeeper cluster and a kafka server are auto-started on 4 VMs. 
The 4 ubuntu VMs have the following IP addresses \nby default (hostname: IP address):\n\n* zoo1: 192.168.10.12\n* zoo2: 192.168.10.13\n* zoo3: 192.168.10.14\n* kafka1: 192.168.10.15\n\nTo log in to one of the VMs, say kafka1, run \"vagrant ssh kafka1\".\n\nFor the zookeeper configuration, refer to the previous section.\n\nOn kafka1, kafka is installed in /opt/kafka, and the server.properties configuration file is in /opt/kafka/config/.\n\nOn kafka1, kafka is auto-started when the VM is brought up or resumed.\n\nWithin the VM, the kafka service can be managed with commands such as \"service kafka start/stop/restart/status\".\n\nTo check whether kafka is running, run the following command in kafka1:\n\n```bash\nservice kafka status\n```\n\nTo test the kafka producer, send a message to the topic TutorialTopic by running the following command in kafka1. If the topic does not exist yet, the broker creates it automatically as long as auto.create.topics.enable is left at its default of true:\n\n```bash\necho \"Hello, World\" | /opt/kafka/bin/kafka-console-producer.sh --broker-list kafka1:9092 --topic TutorialTopic \u003e /dev/null\n```\n\nTo test the kafka consumer, run the following command in kafka1. Zookeeper runs on the zoo1 to zoo3 VMs rather than on kafka1, so the consumer points at one of them:\n\n```bash\n/opt/kafka/bin/kafka-console-consumer.sh --zookeeper zoo1:2181 --topic TutorialTopic --from-beginning\n```\n\n### Storm Cluster (with Zookeeper + Kafka)\n\ncd to the directory \"storm\" under the root directory and run \"vagrant up\". This will start a multi-machine vagrant setup in which a \nstorm cluster will be auto-started. 
The following VMs will be set up to run:\n\n* a zookeeper cluster at the following addresses (hostname:IP):\n\n    * zoo1:192.168.10.12\n    * zoo2:192.168.10.13\n    * zoo3:192.168.10.14\n\n* a kafka server at the following address (hostname:IP), which uses the zookeeper cluster above:\n\n    * kafka1:192.168.10.15\n\n* a storm cluster at the following addresses (hostname:IP), which uses the zookeeper cluster above:\n\n    * stormnimbus1:192.168.10.17\n    * stormslave1:192.168.10.18\n    * stormslave2:192.168.10.19\n    * stormslave3:192.168.10.20\n\nOn each of the storm VMs, storm is installed at /opt/storm, and the configuration is at /opt/storm/conf/storm.yaml.\n\nWithin the stormnimbus1 VM, the storm nimbus and the storm UI are auto-started when the VM is brought up or resumed.\n\nWithin the stormnimbus1 VM, you can issue commands such as \"service storm-nimbus start/stop/restart/status\" \nand \"service storm-ui start/stop/restart/status\".\n\nWithin each stormslave[x] VM, the storm supervisor is auto-started when the VM is brought up.\n\nWithin the stormslave[x] VMs, you can issue commands such as \"service storm-supervisor start/stop/restart/status\".\n\nOnce the vagrant VMs are up and running, open the URL http://192.168.10.17:8080 in a browser on your host computer to see the storm UI.\n\n### Hadoop Cluster\n\ncd to the directory \"hadoop\" under the root directory and run \"vagrant up\". This will start a multi-machine vagrant setup in which an \nHDFS cluster will be auto-started. The following VMs will be set up to run:\n\n\n\n### Spark Cluster\n\ncd to the directory \"spark\" under the root directory and run \"vagrant up\". This will start a multi-machine vagrant setup in which a \nspark cluster will be auto-started. 
The following VMs will be set up to run:\n\n* a spark cluster at the following addresses (hostname:IP):\n\n    * sparkmaster:192.168.10.21\n    * sparkslave1:192.168.10.22\n    * sparkslave2:192.168.10.23\n    * sparkslave3:192.168.10.24\n\nOn each of the spark VMs, spark is installed at /opt/spark, and the list of worker hosts is configured in /opt/spark/conf/slaves.\n\nWithin the sparkmaster1 VM, the spark cluster is auto-started when the VM is brought up or resumed.\n\nWithin the sparkmaster1 VM, you can issue commands such as \"service spark start/stop/restart/status\".\n\nOnce the vagrant VMs are up and running, open the URL http://192.168.10.21:4040 in a browser on your host computer to see the spark UI. (By default, Spark serves the UI of a running application on port 4040, while the standalone master's web UI listens on port 8080.)\n\n### Spark Cluster (with Zookeeper + Kafka)\n\ncd to the directory \"spark+zookeeper+kafka\" under the root directory and run \"vagrant up\". This will start a multi-machine vagrant setup in which a \nspark cluster will be auto-started. The following VMs will be set up to run:\n\n* a zookeeper cluster at the following addresses (hostname:IP):\n\n    * zoo1:192.168.10.12\n    * zoo2:192.168.10.13\n    * zoo3:192.168.10.14\n\n* a kafka server at the following address (hostname:IP), which uses the zookeeper cluster above:\n\n    * kafka1:192.168.10.15\n\n* a spark cluster at the following addresses (hostname:IP), which uses the zookeeper cluster above:\n\n    * sparkmaster:192.168.10.21\n    * sparkslave1:192.168.10.22\n    * sparkslave2:192.168.10.23\n    * sparkslave3:192.168.10.24\n\nOn each of the spark VMs, spark is installed at /opt/spark, and the list of worker hosts is configured in /opt/spark/conf/slaves.
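\n\nThe slaves file is Spark's standalone worker list. It holds one worker hostname per line; blank lines and text after # are ignored. The launch scripts under /opt/spark/sbin (such as start-slaves.sh) start a worker on each listed host. This sketch assumes the provisioning writes the default slave hostnames above; the file generated by the Vagrantfile may differ. On that assumption, it would look something like this:\n\n```\n# /opt/spark/conf/slaves (assumed contents: one worker hostname per line)\nsparkslave1\nsparkslave2\nsparkslave3\n```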
\n\nWithin the sparkmaster1 VM, the spark cluster is auto-started when the VM is brought up or resumed.\n\nWithin the sparkmaster1 VM, you can issue commands such as \"service spark start/stop/restart/status\".\n\nOnce the vagrant VMs are up and running, open the URL http://192.168.10.21:4040 in a browser on your host computer to see the spark UI. (By default, Spark serves the UI of a running application on port 4040, while the standalone master's web UI listens on port 8080.)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchen0040%2Fvagrant-big-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchen0040%2Fvagrant-big-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchen0040%2Fvagrant-big-data/lists"}