{"id":13658722,"url":"https://github.com/networkop/arista-network-ci","last_synced_at":"2025-10-23T22:30:49.680Z","repository":{"id":41467384,"uuid":"150089279","full_name":"networkop/arista-network-ci","owner":"networkop","description":"A portable network CI demo with Gitlab, Ansible, cEOS, Robot Framework and Batfish","archived":false,"fork":false,"pushed_at":"2020-09-18T09:50:26.000Z","size":633,"stargazers_count":74,"open_issues_count":0,"forks_count":22,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-09-27T10:42:21.481Z","etag":null,"topics":["ansible","arista","batfish","gitlab-ci","network-ci","robot-framework"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/networkop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-24T10:57:34.000Z","updated_at":"2024-09-06T09:19:30.000Z","dependencies_parsed_at":"2022-09-05T21:30:55.432Z","dependency_job_id":null,"html_url":"https://github.com/networkop/arista-network-ci","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/networkop%2Farista-network-ci","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/networkop%2Farista-network-ci/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/networkop%2Farista-network-ci/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/networkop%2Farista-network-ci/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/networkop","
download_url":"https://codeload.github.com/networkop/arista-network-ci/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219867717,"owners_count":16555814,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansible","arista","batfish","gitlab-ci","network-ci","robot-framework"],"created_at":"2024-08-02T05:01:02.025Z","updated_at":"2025-10-23T22:30:44.314Z","avatar_url":"https://github.com/networkop.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# A portable network CI demo with Gitlab, Ansible, cEOS, Robot Framework and Batfish\n\n## Introduction\n\nThe goal of this project is to demonstrate the following:\n1. Applicability of CI/CD principles to data centre networks\n2. The use of Arista cEOS and [docker-topo](https://github.com/networkop/arista-ceos-topo) to build arbitrary network topologies\n3. The use of [Batfish](https://github.com/batfish/batfish) to analyze network configurations and verify the expected control plane and data plane properties\n4. The use of [Robot Framework](https://github.com/aristanetworks/robotframework-aristalibrary) for network testing and verification\n5. The use of Ansible to generate complex network configurations from simple data models \n6. The use of Ansible to generate **diffs** for both network configurations and network state pre- and post-change\n\nTo achieve the above goals, I'll use the following topology:\n\n![Test Network](./img/net.png)\n\nWe start with Leaf-1, Leaf-2, Spine-1 and Spine-2 fully functional. 
Our CI pipeline is going to go through the following sequence of actions:\n\n1. Generate configurations for the new Leaf-3 (along with the necessary configs for both of the spine switches)\n2. Analyse these new configurations and their resulting data plane with Batfish\n3. Build a lab topology with docker-topo and run a series of Robot Framework tests against it\n4. Generate **diffs** between the proposed and the current production configuration\n5. Wait for a manual trigger to push the changes to the production network\n6. Collect and compare the contents of IPv4 FIB and ARP tables before and after the change\n\nThe generated configs have several configuration bugs injected to illustrate the most salient features of Batfish and Robot Framework. The following walkthrough explains how those bugs are found and flagged by both tools, and how to rectify them to achieve a successful build.\n\nAll of the components of this demo are encapsulated inside Docker containers. The following Docker containers are created directly on the test machine's Docker engine:\n\n* Gitlab server - serves as a git repository and a CI server\n* Gitlab runner - receives jobs scheduled by the CI server and executes them\n* Docker registry - stores large Docker images to speed up the running of the demo\n* Batfish - the server component of the Batfish network configuration analysis tool\n\nIn addition to the above containers, the Gitlab runner will spin up a Docker container to execute scheduled jobs, and this container will, in turn, spin up cEOS containers that simulate the network topology. 
Such \"nested containerization\" is made possible by [docker-in-docker](https://github.com/jpetazzo/dind).\n\n## Environment Setup\n\n\u003e Note: A stable internet connection is required\n\nClone this git repository\n\n```\ngit clone https://github.com/networkop/arista-network-ci.git \u0026\u0026 cd arista-network-ci\n```\n\nSave the full absolute path to the current directory:\n\n```\nexport ROOTPATH=$(pwd)\n```\n\nBuild the Batfish server Docker image (this may take up to 20 minutes):\n\n```\ndocker build -t batfish tests/batfish/\n```\n\nDownload cEOS-lab and Network Validation tools from [arista.com](https://www.arista.com) and copy them into the current directory\n\n```\nls -q *.tar.*\ncEOS-Lab.tar.xz  network_validation-1.0.1.tar.gz\n```\n\nBuild the Docker executor image that will be used by the Gitlab runner to execute jobs (this may take up to 10 minutes):\n\n```\ndocker build -t networkci .\n```\n\nFind out the default Docker bridge IPv4 address of the local host:\n\n```\nexport PRIMARY_IP=$(docker network inspect bridge --format \"{{range .IPAM.Config }}{{.Gateway}}{{ end }}\")\n```\n\nThis IP address is used in the default URLs of the Gitlab server and the Docker registry.\n\nEnable an insecure Docker registry at this address and restart the Docker daemon:\n```bash\n# cat \u003c\u003cEOF \u003e /etc/docker/daemon.json \n{\n  \"insecure-registries\" : [\"$PRIMARY_IP:5000\"]\n}\nEOF\n# systemctl restart docker\n```\n\nSpin up the local Gitlab server, runner, Batfish server and a private Docker registry (assuming docker-compose is [installed](https://docs.docker.com/compose/install/))\n\n```\ndocker-compose up -d\n```\n\nImport the cEOS-lab image into the private Docker registry\n\n```\ndocker import cEOS-Lab.tar.xz ceos:4.20.0F\ndocker tag ceos:4.20.0F $PRIMARY_IP:5000/ceos:4.20.0F\ndocker push $PRIMARY_IP:5000/ceos:4.20.0F\n```\n\nImport the Docker executor image into the private Docker registry\n\n```\ndocker tag networkci $PRIMARY_IP:5000/networkci\ndocker push 
$PRIMARY_IP:5000/networkci\n```\n\nUpdate the `PRIMARY_IP` variable in `.gitlab-ci.yml` with the value of the PRIMARY_IP environment variable\n\n```\nsed -ri \"s/(^\\s*)PRIMARY_IP:.*$/\\1PRIMARY_IP: ${PRIMARY_IP}/\" .gitlab-ci.yml\n```\n\nWait until the Gitlab server's status changes from `starting` to `healthy`:\n\n```\nwatch docker ps\n```\n\nRegister the Gitlab Runner\n\n```\n$ ./register-runner.sh \nRunning in system-mode.                            \n                                                   \nRegistering runner... succeeded                     runner=RunnerTo\nRunner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded! \n```\n\n\n\nLog in to the Gitlab server at http://localhost:9080 with username `root` and password `GitLabAdmin`, then create a new **public** project called `network-ci`.\n\n![Test Network](./img/project.png)\n\nRedirect the current repository to the new project's remote\n\n```\ngit remote remove origin\ngit remote add origin http://$PRIMARY_IP:9080/root/network-ci.git\n```\n\nMake sure you're on the master branch. \n\n```\ngit branch\n* master\n```\n\nIf not, create it:\n\n```\ngit checkout -b master\n```\n\n## Network CI Walkthrough\n\n### Building production network\n\nBefore we start demonstrating our CI pipeline, we need to build a simulation of the production network. 
To do that, first generate the \"current\" configs for production (these configs are missing the Leaf-3 BGP peerings).\n\n```\ncd ./build\nansible-playbook -e @group_vars/current.yml -e buildenv=prod -e outpath=$ROOTPATH/prod/configs/ build.yml\n```\n\nBuild the \"production\" topology (requires [docker-topo](https://github.com/networkop/arista-ceos-topo) to be installed)\n\n```\ncd ../prod\ndocker image pull networkop/alpine-host:latest\ndocker image tag networkop/alpine-host alpine-host\ndocker-topo --create topo.yml\n```\n\nPaste all the aliases provided in the output of the previous command into the current shell\n\n```\nalias Leaf-1='docker exec -it clos_Leaf-1 Cli'\nalias Leaf-2='docker exec -it clos_Leaf-2 Cli'\n```\n\nLog in to Spine-1 and Spine-2 and verify that only two BGP peerings are configured and that both are in the `Established` state\n\n```\nSpine-2# sh ip bgp sum\nBGP summary information for VRF default\nRouter identifier 1.1.1.200, local AS number 65100\nNeighbor Status Codes: m - Under maintenance\n  Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State  PfxRcd PfxAcc\n  10.0.255.1       4  65001              4         4    0    0 00:00:29 Estab  0      0\n  10.0.255.3       4  65002              4         4    0    0 00:00:29 Estab  0      0\n```\n\n### Pushing the first change\n\nFirst, have a look at the proposed data model:\n\n```\ncd $ROOTPATH\ncat build/group_vars/intended.yml \n```\n\nThis data model is used to build the full network configurations for all devices, including all leaf and spine switches.\n\nPush the current repo to the Gitlab server using root/GitLabAdmin as credentials:\n\n```\ngit add -A .; git commit -m \"new change\"; git push -u origin master\n```\n\nThe Gitlab server automatically detects the CI pipeline in the `.gitlab-ci.yml` file and schedules it to run on our Gitlab runners. 
The pipeline is built out of 5 stages and 6 jobs, which are executed in the order specified in the `stages` list.\n\nThe first, **build stage**, contains two jobs that generate configs for the lab and production network topologies, respectively. They are saved as artifacts, which makes them available to all subsequent pipeline stages.\n\n```yaml\n\n.generate: \u0026generate_template\n  before_script:\n  - apk add --upgrade ansible\n  - pip3 install --upgrade netaddr\n  - mkdir -p $CI_PROJECT_DIR/outputs/configs\n  script: \n  - cd build\n  - ansible-playbook -e buildenv=$BUILDENV \n                     -e ansible_python_interpreter=/usr/bin/python3 \n                     -e @group_vars/intended.yml \n                     -e outpath=$CONF_DIR\n                     build.yml\n  artifacts:\n    when: always\n    paths:\n    - ./outputs\n\ngenerate_lab_configs:\n  \u003c\u003c: *generate_template\n  stage: build\n  variables:\n    BUILDENV: lab\n\ngenerate_prod_configs:\n  \u003c\u003c: *generate_template\n  stage: build\n  variables:\n    BUILDENV: prod\n```\n\nThe second, **analysis stage**, runs a series of Batfish tests against the generated configurations and verifies that:\n* The configs do not contain any unused or non-existent configuration objects\n* All BGP peerings have been configured correctly and each leaf peers with each one of the spine switches\n* The traceroute between Leaf-3 and Leaf-1/2 loopbacks is successful, traverses exactly two hops and takes two different paths (i.e. satisfies the multipathing property of Clos topologies)\n* If **Spine-2** fails, the traceroutes between leaf loopbacks still succeed.\n\n```yaml\nbatfish:\n  stage: analysis\n  before_script:\n    - mkdir -p ./candidate/configs\n    - mkdir -p ./candidate-with-failure/configs\n    - cp ./outputs/configs/prod_* ./candidate/configs\n    - cp ./outputs/configs/prod_* ./candidate-with-failure/configs\n    - cp ./tests/batfish/node_blacklist 
./candidate-with-failure/node_blacklist\n  script:\n    - ./tests/batfish/leaf-3.py --host $PRIMARY_IP --log ./outputs/batfish.txt\n  artifacts:\n    when: always\n    paths:\n      - ./outputs\n```\n\nThe third, **test stage**, builds a lab topology out of cEOS Docker images and runs a series of Robot Framework tests against it to confirm that:\n* Leaf-1 and Leaf-2 loopbacks are learned by Leaf-3 via BGP\n* The SVIs of those switches are reachable and Host-3 can reach both Host-1 and Host-2  \n\nThe final Robot Framework report is uploaded to Gitlab as an artifact and can be viewed from the Gitlab web GUI, rendered by the built-in Gitlab Pages.\n\n\n```yaml\nlab_testing:\n  stage: test\n  before_script:\n  - dockerd --insecure-registry $LOCAL_REGISTRY \u0026\n  - sleep 2\n  - mkdir -p $CI_PROJECT_DIR/outputs/robot\n  - docker image pull $LOCAL_REGISTRY/ceos:4.20.0F\n  - docker image tag $LOCAL_REGISTRY/ceos:4.20.0F ceos:4.20.0F\n  - docker image pull networkop/alpine-host:latest\n  - docker image tag networkop/alpine-host alpine-host\n  - pip3 install --upgrade git+https://github.com/networkop/arista-ceos-topo.git\n  - docker-topo --create ./tests/robot/topo.yml\n  - docker exec lab_Leaf-1 wfw Aaa\n  - sleep 120\n  script:\n  - cd ./tests/robot\n  - mkdir ./report\n  - validate_network.py --config test.yml --reportdir report\n  - cp report/* $CI_PROJECT_DIR/outputs/robot/\n  - ./has_failed.sh\n  artifacts:\n    when: always\n    paths:\n    - ./outputs\n```\n\nThe fourth stage, called **diff**, dry-runs the new change against the production network, generates configuration diffs for each device in the topology and saves them as artifacts for future review.\n\n```yaml\ngenerate_diffs:\n  stage: diff\n  before_script:\n  - mkdir -p $CI_PROJECT_DIR/outputs/diffs\n  script:\n  - cd ./build\n  - ansible-playbook --diff --check \n                     -e PROD_IP=$PRIMARY_IP \n                     -e buildenv=$BUILDENV \n                     -e outpath=$CI_PROJECT_DIR/outputs/diffs\n 
                    -e confdir=$CONF_DIR\n                     diff.yml\n  variables:\n    BUILDENV: prod\n  artifacts:\n    when: always\n    paths:\n    - ./outputs\n```\n\nThe final job has to be triggered manually, on the assumption that the change reviewer first needs to be satisfied with the test results and generated diffs. The outputs of `show ip route` and `show arp` are captured before and after the change is pushed and saved as artifacts of the current pipeline run.\n\n```yaml\npush_to_prod:\n  stage: push\n  script:\n  - cd ./build\n  - ansible-playbook -e PROD_IP=$PRIMARY_IP \n                     -e buildenv=$BUILDENV  \n                     push.yml\n  when: manual\n  allow_failure: true\n  artifacts:\n    paths:\n    - ./build/snapshots\n  variables:\n    BUILDENV: prod\n```\n\n### Fixing bugs found by Batfish\n\nThe first run of the pipeline should fail at the batfish job, which should detect a large number of issues. To view the output of batfish, we can navigate to Project -\u003e CI/CD -\u003e Pipelines -\u003e Latest failed pipeline -\u003e Batfish and either view it right there or click `Browse` under Job artifacts and find the `batfish.txt` file. \n\n![Batfish Errors](./img/batfish.png)\n\nBased on the provided output we can conclude that:\n\n* Each configuration file contains an undefined data structure (e.g. Leaf-1's config at line 31)\n* Each configuration file contains an unused data structure (e.g. Leaf-1's config at line 33)\n* All traceroutes from Leaf-3's loopback to Leaf-1 and Leaf-2 loopbacks have failed with a `NO_ROUTE` reason\n\nIf we browse to artifacts and examine production Leaf-1's config at lines 31 and 33 we will see the following:\n\n![Batfish Error #1](./img/pl-error.png)\n\nThis looks like a misspelt prefix-list name, which is used as a match in a route-map that controls which routes get redistributed into BGP. This would also explain why the traceroutes between the loopbacks have failed. 
To correct that, we can edit the [routing.j2](./build/roles/build/template/routing.j2) file to make the prefix-list names match. Once done, we can re-run our pipeline:\n\n```bash\ngit add .; git commit -m \"bug fix\"; git push\n```\n\nThe next pipeline run will also fail; however, this time the error is different:\n\n![Batfish Error #2](./img/mpath-error.png)\n\nThis looks like the traceroute hasn't explored all the possible paths from one leaf to another, which typically means that we haven't configured the BGP `maximum-paths` setting. We can once again edit `routing.j2`, add `maximum-paths 2` right after the redistribution statement and re-run the pipeline:\n\n```bash\ngit add .; git commit -m \"bug fix\"; git push\n```\n\n### Fixing bugs found by Robot\n\nThe next run of the pipeline will fail on Robot tests. Specifically, the data plane connectivity tests that verify that Leaf-3 can reach the SVIs of Leaf-1 and Leaf-2 are failing. If we browse the artifacts collected for this job, we'll see a folder called `robot`. Inside that folder, the `log.html` file contains the output of the Robot test run, including a snapshot of the BGP RIB captured right before the data plane tests. If we expand that test, we won't be able to see any of the remote `192.168.X.X` subnets. \n\n![Robot Error #1](./img/redist-error.png)\n\nIf we go back to our configs, we should be able to see that, since we're only redistributing loopbacks, none of the other directly connected subnets ends up in the BGP RIB. 
So let's fix that by allowing all directly connected networks to be redistributed:\n\n```\n!\nroute-map RMAP-CONNECTED-BGP permit 1000\n!\n```\n\nLet's push the updates to Gitlab:\n\n```bash\ngit add .; git commit -m \"bug fix\"; git push\n```\n\nThis time our pipeline will fail on the last, end-to-end reachability test, which is run between Host-3 and Host-1/2.\n\n![Robot Error #2](./img/svi-error.png)\n\nThis final test performs a ping from a directly connected host and it looks like, although Leaf-3 can reach both SVIs, Host-3 cannot. To fix that, let's take a closer look at our access-port data model:\n\n```yaml\nservers:\n  Leaf-1:\n    interfaces:\n      lab: eth3\n      prod: Ethernet10-11\n    vlan: 10\n    svi: 192.168.10.1/24\n  Leaf-2:\n    interfaces:\n      lab: eth3\n      prod: Ethernet10\n    vlan: 20\n    svi: 192.168.20.1/24\n  Leaf-3:\n    interfaces:\n      lab: eth3\n      prod: Ethernet10\n    vlan: 30\n    svi: 192.168.30.3/24\n```\n\nIt looks like the IP assigned to Leaf-3's SVI is not the first IP of the subnet, unlike on Leaf-1 and Leaf-2. Let's change that to `192.168.30.1/24` and re-run our pipeline:\n\n```bash\ngit add .; git commit -m \"bug fix\"; git push\n```\n\n### Diff review and manual push\n\nThis time all tests should complete successfully and the pipeline stops right before the change gets pushed into production. 
The penultimate job dry-runs the proposed changes against the production network, generates diffs and stores them as artifacts, like this example for Spine-2:\n\n```diff\n--- system:/running-config\n+++ session:/ansible_1537901087-session-config\n@@ -47,6 +47,9 @@\n    neighbor 10.0.254.3 remote-as 65002\n    neighbor 10.0.254.3 send-community\n    neighbor 10.0.254.3 maximum-routes 12000 \n+   neighbor 10.0.254.5 remote-as 65003\n+   neighbor 10.0.254.5 send-community\n+   neighbor 10.0.254.5 maximum-routes 12000 \n    redistribute connected route-map RMAP-CONNECTED-BGP\n !\n end\n```\n\nFinally, once the change reviewer is happy with the test results, configs and diffs, they may choose to push the change directly into production. The final job, `push_to_prod`, captures the state of the IPv4 routing and ARP tables before and after the change and generates deltas that show which routes and ARP entries have been added or removed as a result. These state snapshots are stored as artifacts in the `outputs/snapshots` folder. 
For example, this is how added IPv4 routes delta is reported by the last job:\n\n```\n###########################\n# IP routing table added: #\n###########################\n\n[\n    \"1.1.1.1/32\", \n    \"1.1.1.200/32\", \n    \"1.1.1.100/32\", \n    \"192.168.20.0/24\", \n    \"10.0.254.0/31\", \n    \"1.1.1.2/32\", \n    \"10.0.254.2/31\", \n    \"192.168.30.0/24\", \n    \"192.168.10.0/24\", \n    \"10.0.255.0/31\", \n    \"1.1.1.3/32\", \n    \"10.0.255.2/31\"\n]\n------------------------------------------------------\n\n#############################\n# IP routing table removed: #\n#############################\n\n[]\n------------------------------------------------------\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnetworkop%2Farista-network-ci","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnetworkop%2Farista-network-ci","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnetworkop%2Farista-network-ci/lists"}