# Docker Swarm Management for Heavy Forwarders

## Introduction

### Problem

Managing "do everything" heavy forwarders is tedious: dependencies, deployment servers, conflicting configurations,
failover.

This can be easier.

### Solution

Containers allow us to independently manage one input, or group a handful of similar inputs without additional
hardware, real or virtual.  They can also be shipped to another infrastructure without worrying about the other admins
changing anything.

Docker Swarm enables scaling and failover for these containers.
Sprinkle in some persistent storage and your heavy forwarders are now significantly more resilient than before.

## Basic Process

These playbooks provide you with this path:

* Deploy Docker Swarm for your environment

* Define your heavy forwarder image
  * Operating System
  * Splunk Version
  * TAs (or other configs) to install via local file, templates, git repositories, etc.
  * Healthcheck index/sourcetype and expected event time period
  * Volumes for persistent storage

* Build your image, along with some convenient variants
  * Primary image
    * Intended to be deployed to swarm
    * Disables indexes, splunkweb
    * Configures license master
  * Standalone image
    * Eases development efforts prior to deploying to swarm
    * Leaves splunkweb enabled

* Push your image to a registry

* Deploy a service using your image

## Deploy Docker Swarm

NOTE: This playbook and associated roles are a starting point, and assume your docker nodes are all running
RHEL/CentOS or Ubuntu.

### Define your nodes

In your inventory, you should specify groups in this manner:

    docker_nodes:
      children:
        # defines which host will perform the build
        # only the first host in this group will ever be used
        docker_build_hosts:
          hosts:
            docker-build-hostname:
        # nodes that are actually part of the swarm
        docker_swarm_nodes:
          children:
            # swarm managers are a subset of swarm nodes that are defined as managers of the swarm
            # however it is valid, and potentially preferred, for all swarm nodes to be managers
            docker_swarm_managers:
              hosts:
                docker-swarm-hostname01:
                ...
                docker-swarm-hostname0N:

Once your inventory has been defined, you can provision your docker nodes by running:

`ansible-playbook -i <path-to-inventory-file>
docker_nodes_provision.yml`

This playbook (docker_nodes_provision.yml) supports automated Docker installation and configuration on both **CentOS** and **Ubuntu** hosts.

### Features

- Detects and handles CentOS and Ubuntu distributions
- Installs Docker on manager and worker nodes
- Optional `firewalld` support (disabled by default)

### Enable Firewalld (Optional)

By default, `firewalld` is **disabled**.
To enable and configure `firewalld`, run the playbook with the `enable_firewall` tag:

```bash
ansible-playbook -i <path-to-inventory-file> docker_nodes_provision.yml --tags enable_firewall
```

### Additional Configuration

You can also specify additional packages to be installed on CentOS and Ubuntu hosts by defining the following variables in your inventory:

- `centos_packages`: A list of additional packages to install on CentOS hosts.
- `ubuntu_packages`: A list of additional packages to install on Ubuntu hosts.

For example:

```yaml
centos_packages:
  - vim
  - git
  - curl

ubuntu_packages:
  - vim
  - git
  - curl
```

Additionally, the playbook uses the `system_architecture` variable to determine the architecture of the system (e.g., `amd64`, `arm64`) when adding the Docker repository. This ensures compatibility with the host's architecture.

You can define the `system_architecture` variable explicitly in your inventory or allow it to be automatically detected by Ansible. For example:

```yaml
system_architecture: amd64
```

This variable is used in tasks like adding the Docker repository:

```yaml
apt_repository:
  repo: "deb [arch={{ system_architecture }} signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
  state: present
  filename: docker
```

The playbook uses the `ubuntu_prerequisites` variable to define the list of prerequisite packages that need to be installed on Ubuntu hosts.
These packages are installed before configuring Docker.

You can customize the `ubuntu_prerequisites` variable in your inventory or group variables file. For example:

```yaml
ubuntu_prerequisites:
  - curl
  - jq
```

## Examples

### Simplest image definition

This image doesn't actually do anything valuable, but shows the required variables that must be set to perform the
build.  The full inventory file is in [all-in-one/inventory.yml](examples/all-in-one/inventory.yml).  The following shows
the host variables only.

    # every image must have a version
    version: 1.0.0

    # define where to get the splunk tarball
    # this can be http://, ftp://, file://, etc.
    # or more specifically, any URL wget will handle
    splunk_tgz_url: <url to splunk tarball>

    # set the splunk admin user/password for user-seed.conf
    splunk_admin_user: admin
    splunk_admin_password: pleasechangeme
    # you could use splunk_admin_password_hash for a pre-hashed password
    # splunk_admin_password_hash: <your password hash>

    # a license master uri is required
    # though forwarders don't index, and therefore don't consume license, they sometimes need to run searches
    # against populated lookups to forward results to the indexing tier.  the forwarder license doesn't enable
    # this type of usage.
    license_master_uri: https://lm.example.org:8089

### Splunk metrics.log healthchecking image

#### But first, let's organize with group_vars

Because our image definitions are represented as Ansible hosts, they can make use of Ansible's standard variable
precedence.  For our examples we're going to place the common variables from the above example in group_vars.  This
inventory is named [organized-environment](examples/organized-environment).

Here we have an [inventory.yml](examples/organized-environment/inventory.yml), just as above.
But it only keeps the
host-specific variables, not the common ones, as they have been moved to
[organized-environment/group_vars/all.yml](examples/organized-environment/group_vars/all.yml).

Our simplest image definition is still included in this [inventory.yml](examples/organized-environment/inventory.yml),
but now it only has one variable (`version`), since the rest of its configuration was generic and applicable to almost
any host in our environment.

Now that our inventory is a bit easier to manage for multiple images, we'll move on to the healthchecking image.

#### internal_healthcheck_forwarder's image-specific config

In [organized-environment/inventory.yml](examples/organized-environment/inventory.yml) under
`internal_healthcheck_forwarder`:

    # remember every image must have a version number
    version: 1.0.0

    # in splunk's metrics.log
    # for the per_index_thruput group
    healthcheck_metrics_group: per_index_thruput
    # for the _internal series
    healthcheck_metrics_series: _internal
    # expect to see non-zero thruput no older than 60 seconds
    healthcheck_allowed_age_seconds: 60

This configuration enables Docker's built-in healthcheck functionality, using Splunk's metrics.log to identify whether the
container has processed events for the configured group/series.

### Staging components into your Docker image

For any of this to have any value, you need a way to add custom content to the built image.  This is done through
"stage_items".
You are given two standard variables to use: `common_stage_items` (intended for `group_vars`) and
`host_stage_items` (intended for `host_vars`).

These two variables allow you to deploy a common set of components to all of your images, while still allowing you to
tailor individual images for their specific purposes.

An example of `host_stage_items` use is to include the specific Splunk TA and `inputs.conf` to enable data collection.

But all of your images will need to forward their collected data to your indexers.  `common_stage_items` lets you define
that configuration just once, in `group_vars`, to ensure that every container you run will properly send its events,
and not just index them locally to never see the light of day.

In our inventory, we have this in [group_vars/all.yml](examples/organized-environment/group_vars/all.yml):

    common_stage_items:
      - type: copy
        src: "{{ inventory_dir }}/files/docker_forwarder_outputs"
        # stage_items dest points to a staging_path
        dest: splunk
        # because dest is, effectively, /opt/splunk, have this directory copied to a subdirectory
        dest_sub_path: etc/apps/docker_forwarder_outputs

This tells the build process to copy
[files/docker_forwarder_outputs](examples/organized-environment/files/docker_forwarder_outputs) to the `splunk`
staging path, and place it at `etc/apps/docker_forwarder_outputs` under that path.

Which brings us to...

#### Staging paths

A "staging path" is a directory that will be copied during the build process, via `Dockerfile`'s `COPY` command.  This
was implemented to minimize the number of image layers that will be created, since each `COPY` command creates a new
layer.  It ends up simplifying most image build definitions, by allowing re-use of standard paths.

These playbooks ship with two default staging paths:

* `build`
* `splunk`

`build` represents the directory where `Dockerfile` is created.
As such, this path never has a `COPY` performed on it.

`splunk` represents `$SPLUNK_HOME`.  This is where most configurations end up needing to be placed.

Additional staging paths can be created, but that is an advanced topic for later.

#### Standalone images shouldn't forward to the indexers

But what about the standalone image that also gets created?  In my opinion it shouldn't forward events to the indexers,
because it's intended to be run in a more convenient form (with splunkweb still enabled, etc.) where I imagine most of
your troubleshooting would occur.

Stage items also give you the option of using a `condition`.  This is a Jinja expression that is evaluated while staging
items to build your image; if it evaluates true, that particular item will be included for that image.  Our
`docker_forwarder_outputs` example actually looks like this in our
[group_vars/all.yml](examples/organized-environment/group_vars/all.yml):

    common_stage_items:
      - type: copy
        src: "{{ inventory_dir }}/files/docker_forwarder_outputs"
        # stage_items dest points to a staging_path
        dest: splunk
        # because dest is, effectively, /opt/splunk, have this directory copied to a subdirectory
        dest_sub_path: etc/apps/docker_forwarder_outputs
        # but only when building the primary (not standalone) build
        condition: "{{ build_vars.primary_build }}"

This `condition` makes use of a variable defined in our "build variations" configuration, `primary_build`, which has
a value of `True` when the non-standalone image is built.
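The same flag can be negated to stage an item only into the standalone build.  For example (this item and its paths are hypothetical, purely to illustrate the `condition` expression):

```yaml
host_stage_items:
  - type: copy
    src: "{{ inventory_dir }}/files/dev_only_settings"
    dest: splunk
    dest_sub_path: etc/apps/dev_only_settings
    # only stage this app into the standalone (non-primary) build
    condition: "{{ not build_vars.primary_build }}"
```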
Thus we have these image versions built:

* 1.0.0 (has `docker_forwarder_outputs`)
* 1.0.0-standalone (does not have any forwarding app installed)

#### Included, out-of-the-box stage items

[templates/build](roles/docker_image_build/templates/build)
* Dockerfile

[templates/base](roles/docker_image_build/templates/base)
* Install user-seed.conf
* Configure server.conf, inputs.conf with servername
* Configure convenience indexed fields
* Configure `storageEngineMigration` in `server.conf` when `migrate_kvstore` is defined (only needed on version 8.x)
* Always upgrade kvstore from 3.6 to 4.0 (only works on 9.x)
* Upgrade kvstore to the latest version when `upgrade_kvstore` is defined

[templates/kvstore_disable](roles/docker_image_build/templates/kvstore_disable)
* Disables KVStore on primary builds (but overridable)

[templates/metrics_healthcheck](roles/docker_image_build/templates/metrics_healthcheck)
* Enables the metrics.log healthcheck functionality detailed above

[templates/primary_build](roles/docker_image_build/templates/primary_build)
* Deployed to primary builds only
* Disables indexes
* Disables splunkweb
* Sets license master
* Sets pass4SymmKey

The set of included stage items attempts to get you on your feet with this process as soon as possible.  These are
configurations we consider standard and reasonable for almost all instances.

### Mounts and volumes

Many use cases of heavy forwarders require some form of persistent storage for checkpoints.
You get full control of
mounts defined for your built image and deployed service.

Similar to "stage items", you are given two standard variables to use: `common_volumes` (intended for `group_vars`) and
`host_volumes` (intended for `host_vars`).

Each item in the `_volumes` list is a dictionary, with this set of keys:

    host_volumes:
      - name: <name of volume>
        path: <path to be mounted on the container>
        mount_type: image_volume|service_volume|service_bind
        # optional volume_type.  leave unset for "normal" docker volumes.
        # requires additional configuration for nfs or other drivers.
        #volume_type:
        owner: <owner of the mounted location>
        group: <group of the mounted location>

#### volumes_mounted_forwarder

The `volumes_mounted_forwarder` image definition has this volume definition:

    host_volumes:
      # create a bind mount to see how the container can be made aware of specific host data
      - mount_type: service_bind
        source: /etc/hosts
        target: "{{ splunk_home }}/host_etc_hosts"

      # create a persistent volume for $SPLUNK_HOME/var
      # note that if your swarm nodes don't have a common storage path, this likely needs to be NFS
      # or it's only persistent per node, and not across all nodes
      - mount_type: service_volume
        name: volumes_mounted_forwarder_service_volume
        path: "{{ splunk_home }}/var"

The second volume enables persistent storage for $SPLUNK_HOME/var.
This allows logging and checkpointing to survive
between runs of the container.

### Time to build!

Now that we've discussed the components of the image definition, let's actually build the sample inventory.

#### Prerequisites

TODO - Add them here

#### The build playbook

From the directory containing this set of playbooks, run:

`ansible-playbook -i examples/organized-environment/inventory.yml build.yml`

If all goes well, you should have six new docker images:

    % docker image ls
    REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
    hello_swarm_forwarder            1.0.0               1cc85627b805        4 minutes ago       1.16GB
    hello_swarm_forwarder            1.0.0-standalone    c57563200b60        4 minutes ago       1.16GB
    internal_healthcheck_forwarder   1.0.0               c1aced0b9f2e        4 minutes ago       1.16GB
    internal_healthcheck_forwarder   1.0.0-standalone    87341b058f36        4 minutes ago       1.16GB
    volumes_mounted_forwarder        1.0.0               d7ad71c2bad9        5 minutes ago       1.16GB
    volumes_mounted_forwarder        1.0.0-standalone    11d48a8886bd        4 minutes ago       1.16GB

### Time to push!

#### Prerequisites

* Docker registry you have permissions to push to
* Changes made to the [inventory variables](examples/organized-environment/group_vars/all.yml) for:
  * registry
  * registry_username
  * registry_password
  * repository_path (optional)
* Changes made to the [inventory hosts](examples/organized-environment/inventory.yml) list for:
  * docker_nodes

#### Pushing to an ECR repository

To use an ECR repository on AWS, set `is_ecr_registry: true`.  This calls the `aws_tools` role, which installs the AWS CLI on the Ansible controller and grabs the password for `docker_login` to use.

The following variables need to be set to use an ECR repository:
* ecr_access_key_id
* ecr_secret_access_key
* ecr_aws_region
* aws_ecr_username (defaults to `AWS`)

#### The push playbook

From the directory containing this set of playbooks, run:

`ansible-playbook -i examples/organized-environment/inventory.yml push.yml`

This will log in to the registry and perform the necessary tasks to have your newly built images pushed to your Docker
registry.

That's it!  There's very little to this step, but it is separated out to prevent unintentional pushing of an image prior
to validation that it is correct.

### Time to deploy!

#### Prerequisites

* Operational Docker Swarm environment.

TODO - Add Swarm bringup playbooks and documentation.

#### The deploy playbook

From the directory containing this set of playbooks, run:

`ansible-playbook -i examples/organized-environment/inventory.yml deploy.yml`
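As a recap, here is a hypothetical image definition pulling the variables from the examples above into one place.  The values are the same placeholders used earlier; the split between shared and per-image settings follows the organized-environment layout:

```yaml
# group_vars/all.yml -- common to every image definition
splunk_tgz_url: <url to splunk tarball>
splunk_admin_user: admin
splunk_admin_password: pleasechangeme
license_master_uri: https://lm.example.org:8089

# host variables for one image definition
version: 1.0.0
# metrics.log healthcheck: group, series, and maximum event age
healthcheck_metrics_group: per_index_thruput
healthcheck_metrics_series: _internal
healthcheck_allowed_age_seconds: 60
# persistent storage for $SPLUNK_HOME/var
host_volumes:
  - mount_type: service_volume
    name: internal_healthcheck_forwarder_service_volume
    path: "{{ splunk_home }}/var"
```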
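The metrics.log healthcheck described above boils down to: find the most recent metrics.log event matching the configured group/series and confirm it is no older than `healthcheck_allowed_age_seconds`.  A minimal sketch of that logic in Python follows.  This is an illustration, not the script shipped in `templates/metrics_healthcheck`; `parse_metrics_line` and `is_healthy` are names invented here, and the sample line mimics Splunk's metrics.log format:

```python
import re
import time

# Splunk writes one thruput metrics event per group/series per interval, e.g.:
#   03-14-2024 13:49:44.123 +0000 INFO  Metrics - group=per_index_thruput,
#   series="_internal", kbps=1.2, eps=3.4, kb=38.5, ev=105
METRICS_RE = re.compile(r'group=(?P<group>\S+?),\s+series="(?P<series>[^"]+)"')

def parse_metrics_line(line):
    """Return (group, series) if the line is a thruput metrics event, else None."""
    m = METRICS_RE.search(line)
    return (m.group("group"), m.group("series")) if m else None

def is_healthy(lines_with_times, group, series, allowed_age_seconds, now=None):
    """True if a matching metrics event is no older than allowed_age_seconds.

    lines_with_times: iterable of (epoch_seconds, raw_line) pairs.
    """
    now = time.time() if now is None else now
    newest = None
    for ts, line in lines_with_times:
        if parse_metrics_line(line) == (group, series):
            newest = ts if newest is None else max(newest, ts)
    return newest is not None and (now - newest) <= allowed_age_seconds
```

Docker marks a container unhealthy when its configured healthcheck command exits non-zero, which is what lets Swarm restart or reschedule the task.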