{"id":13752817,"url":"https://github.com/InsightDataScience/ansible-playbook","last_synced_at":"2025-05-09T20:34:25.412Z","repository":{"id":79898294,"uuid":"80498060","full_name":"InsightDataScience/ansible-playbook","owner":"InsightDataScience","description":"Ansible playbook to deploy distributed technologies","archived":false,"fork":false,"pushed_at":"2017-11-20T07:00:05.000Z","size":90,"stargazers_count":67,"open_issues_count":7,"forks_count":44,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-12-01T08:38:34.546Z","etag":null,"topics":["ansible","ansible-playbooks","aws","data-engineering","devops","ec2-instance","infrastructure-management","kafka","zookeeper"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InsightDataScience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-31T06:49:48.000Z","updated_at":"2024-07-03T18:21:17.000Z","dependencies_parsed_at":"2023-05-25T02:00:13.298Z","dependency_job_id":null,"html_url":"https://github.com/InsightDataScience/ansible-playbook","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InsightDataScience%2Fansible-playbook","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InsightDataScience%2Fansible-playbook/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InsightDataScience%2Fansible-playbook/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InsightDataScience%2Fansible-playbook/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/InsightDataScience","download_url":"https://codeload.github.com/InsightDataScience/ansible-playbook/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253321772,"owners_count":21890462,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansible","ansible-playbooks","aws","data-engineering","devops","ec2-instance","infrastructure-management","kafka","zookeeper"],"created_at":"2024-08-03T09:01:11.337Z","updated_at":"2025-05-09T20:34:20.973Z","avatar_url":"https://github.com/InsightDataScience.png","language":"Python","readme":"## Ansible playbook to deploy distributed technologies\nThis project is a set of Ansible playbooks to easily install a set of distributed technologies on [AWS](https://aws.amazon.com/)\n\n## Table of Contents\n1. [Supported playbooks](#supported-playbooks)\n2. [Supported commands](#supported-commands)\n3. [Setup](#setup)\n  * [On your local/remote machine](#on-your-localremote-machine)\n  * [Using Docker container](#using-docker-container)\n4. 
[Playbooks](#playbooks)\n  * [Launch/Terminate EC2 instances on AWS](#ec2)\n  * [Zookeeper](#zookeeper)\n  * [Kafka](#kafka)\n  * [Vowpal Wabbit](#vowpal-wabbit)\n\n## Supported playbooks\n* EC2\n* Zookeeper\n* Kafka\n\n## Supported Commands\n```bash\n~$ ansible-playbook \u003cmaster-playbook\u003e.yml --extra-vars \"\u003cvar1\u003e=\u003cvalue1\u003e \u003cvar2\u003e=\u003cvalue2\u003e\" --tags \"\u003ctag1\u003e,\u003ctag2\u003e\"\n```\n* The **EC2** playbook is controlled by a YAML file containing variables for the EC2 instances to be acted on. More details [below](#ec2)\n* The **Zookeeper**, **Kafka**, and **Vowpal Wabbit** playbooks need their respective cluster tags to be specified to identify which nodes are in the cluster and need to be acted on. More details [below](#zookeeper)\n\n## Setup\n### On your local/remote machine\n1. [Set up Ansible for your system](http://docs.ansible.com/ansible/intro_installation.html)\n2. Create the following folder\n\n  ```bash\n  ~$ mkdir -p /etc/ansible/hosts\n  ```\n3. Clone this repo\n\n  ```bash\n  ~$ git clone https://github.com/InsightDataScience/ansible-playbook.git\n  ```\n\n4. Copy the `ec2.py` and `ec2.ini` files in this repo to `/etc/ansible/hosts`\n5. Update the information in `ansible_example.cfg` and move it to `/etc/ansible/ansible.cfg`\n6. Export AWS credentials as environment variables\n\n  ```bash\n  export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXX\n  export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXX\n  ```\n\n### Using Docker container\n1. [Set up Docker for your system](https://docs.docker.com/engine/installation/)\n2. Clone this repo\n\n  ```bash\n  ~$ git clone https://github.com/InsightDataScience/ansible-playbook.git\n  ```\n3. Build your Docker image locally with the following command - run this from the root folder of this repo\n\n  ```bash\n  ~$ docker build -t ansible-playbook -f conf/Dockerfile .\n  ```\n\n4. 
Run the Docker container in interactive mode using the script in the repo - `run_ansible_playbook_container.sh`\n\n  ```bash\n  ~$ ./run_ansible_playbook_container.sh\n  ```\n5. Update the information in the `/etc/ansible/ansible.cfg` config file inside the container\n6. Export AWS credentials in `~/.profile` inside the container\n\n  ```bash\n  export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXX\n  export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXX\n  ```\n\n## Playbooks\n* ###EC2\n\n  Launch/Start/Stop/Terminate EC2 instances on AWS.\n\n  * ####Variable file: \n    \n    Update `example_ec2_vars.yml` to match your requirements\n\n    The EC2 playbook is controlled by a YAML file with variables defined for the EC2 instances. An example variable file, `example_ec2_vars.yml`, is included in this repo. You can define your own YAML file with the following information:\n\n    ```yaml\n    ---\n    key_pair: \u003ckey-name\u003e\n    instance_type: \u003cinstance-type\u003e\n    region: \u003cregion\u003e\n    security_group_id: \u003csecurity-group-id\u003e\n    num_instances: \u003cnum-of-instances\u003e\n    subnet_id: \u003csubnet-id\u003e\n    tag_key_vals:\n      Name: \u003ccluster-name\u003e\n      \u003ccustom-tag-key1\u003e: \u003ccustom-tag-val1\u003e\n      \u003ccustom-tag-key2\u003e: \u003ccustom-tag-val2\u003e\n    ```\n\n    The `Name` tag in `tag_key_vals` is mandatory; it serves as the identifier for the instances. 
Additional tags are optional.\n\n    In your terminal, you will likely also need to add your private key to an ssh agent:\n\n    ```bash\n    ssh-add \u003c/path/to/my.pem\u003e\n    ```\n\n  * ####Launch EC2 instances:\n    \n    ```bash\n    ~$ ansible-playbook ./ec2.yml --extra-vars \"vars_file=./example_ec2_vars.yml\" --tags launch\n    ```\n  * ####Stop EC2 instances:\n  \n    ```bash\n    ~$ ansible-playbook ./ec2.yml --extra-vars \"vars_file=./example_ec2_vars.yml\" --tags stop\n    ```\n  * ####Start EC2 instances:\n  \n    ```bash\n    ~$ ansible-playbook ./ec2.yml --extra-vars \"vars_file=./example_ec2_vars.yml\" --tags start\n    ```\n  * ####Terminate EC2 instances:\n  \n    ```bash\n    ~$ ansible-playbook ./ec2.yml --extra-vars \"vars_file=./example_ec2_vars.yml\" --tags terminate\n    ```\n\n* ###Zookeeper\n  For the Zookeeper playbook, a `zookeeper_tag` needs to be specified to identify the nodes in the cluster. This `zookeeper_tag` can be any tag specified in `tag_key_vals` in the variable file for [EC2](#ec2) while launching EC2 instances.\n\n  The `zookeeper_tag` is specified as `\u003ckey\u003e_\u003cvalue\u003e` for one of the `tag_key_vals` to be used. For example, if the `\u003ccluster-name\u003e` in the [EC2 variable file](example_ec2_vars.yml) mentioned above was `test-cluster`, the `zookeeper_tag` would be specified as `zookeeper_tag=Name_test-cluster`. 
It doesn't have to be the `Name` tag but could be any key-value pair in `tag_key_vals` specified as `zookeeper_tag=\u003ckey\u003e_\u003cvalue\u003e`.\n\n  * ####Install Zookeeper:\n\n    ```bash\n    ~$ ansible-playbook ./zookeeper.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e\" --tags install\n    ```\n  * ####Start Zookeeper:\n\n    ```bash\n    ~$ ansible-playbook ./zookeeper.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e\" --tags start\n    ```\n  * ####Get info about Zookeeper on the specified cluster:\n\n    ```bash\n    ~$ ansible-playbook ./zookeeper.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e\" --tags info\n    ```\n  * ####Stop Zookeeper:\n\n    ```bash\n    ~$ ansible-playbook ./zookeeper.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e\" --tags stop\n    ```\n  * ####Uninstall Zookeeper:\n\n    ```bash\n    ~$ ansible-playbook ./zookeeper.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e\" --tags uninstall\n    ```\n\n* ###Kafka\n  Kafka depends on Zookeeper for cluster membership, topic configuration, data partitioning, etc. For the Kafka playbook, a `zookeeper_tag` and a `kafka_tag` need to be specified to identify the nodes in the Zookeeper and Kafka clusters, respectively. The `kafka_tag` and `zookeeper_tag` can be any tag specified in `tag_key_vals` in the [variable file for EC2](#variable-file).\n\n  The `kafka_tag` and `zookeeper_tag` are specified as `\u003ckey\u003e_\u003cvalue\u003e` for one of the `tag_key_vals` to be used. For example, if the `\u003ccluster-name\u003e` in the [EC2 variable file](#variable-file) mentioned above was `test-cluster` and the same cluster was used for Zookeeper and Kafka, the tags would be specified as `zookeeper_tag=Name_test-cluster` and `kafka_tag=Name_test-cluster`. 
Zookeeper and Kafka don't have to be on the same cluster, and the tag doesn't have to be the `Name` tag; it can be any key-value pair in `tag_key_vals`, specified as `zookeeper_tag=\u003ckey\u003e_\u003cvalue\u003e` and `kafka_tag=\u003ckey\u003e_\u003cvalue\u003e`.\n\n  ####Kafka's dependency on Zookeeper\n\n  The Kafka playbook takes care of Kafka's dependency on Zookeeper. If you set up Kafka on the cluster specified by `kafka_tag`, the playbook checks that Zookeeper is installed on the cluster specified by `zookeeper_tag`; if it is not, the playbook sets up Zookeeper first and then Kafka. By default, any operation on the Kafka cluster, such as `start` or `install`, is first executed on the Zookeeper cluster. Some operations on the Kafka cluster, such as `stop` and `uninstall`, should usually not be executed on the Zookeeper cluster. To restrict them to Kafka, pass the flag `--skip-tags zookeeper` when running the Kafka playbook. 
Examples for this behavior are shown below in the `stop` and `uninstall` operations.\n  \n\n  * ####Install Kafka:\n\n    ```bash\n    ~$ ansible-playbook ./kafka.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e kafka_tag=\u003ccluster_tag\u003e\" --tags install\n    ```\n  * ####Start Kafka:\n\n    ```bash\n    ~$ ansible-playbook ./kafka.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e kafka_tag=\u003ccluster_tag\u003e\" --tags start\n    ```\n  * ####Get info about Kafka on the specified cluster:\n\n    ```bash\n    ~$ ansible-playbook ./kafka.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e kafka_tag=\u003ccluster_tag\u003e\" --tags info\n    ```\n  * ####Stop Kafka:\n\n    ```bash\n    ~$ ansible-playbook ./kafka.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e kafka_tag=\u003ccluster_tag\u003e\" --tags stop --skip-tags zookeeper\n    ```\n  * ####Uninstall Kafka:\n\n    ```bash\n    ~$ ansible-playbook ./kafka.yml --extra-vars \"zookeeper_tag=\u003ccluster_tag\u003e kafka_tag=\u003ccluster_tag\u003e\" --tags uninstall --skip-tags zookeeper\n\n    ```\n\n *  #### Vowpal Wabbit   \n\nVowpal Wabbit is a fast out-of-core Machine Learning system. Installation can take upwards of 10 minutes on micro instances, as it compiles a lot of C++ with high optimization levels using Clang. \n\n  * ####Install Vowpal Wabbit:\n\n    ```bash\n    ~$ ansible-playbook ./vw.yml --extra-vars \"vw_tag=class_vw\" --tags install\n    ```\n\n","funding_links":[],"categories":["kafka"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FInsightDataScience%2Fansible-playbook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FInsightDataScience%2Fansible-playbook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FInsightDataScience%2Fansible-playbook/lists"}