{"id":16590875,"url":"https://github.com/qlyoung/lagopus","last_synced_at":"2025-03-21T13:31:20.707Z","repository":{"id":46831043,"uuid":"235864470","full_name":"qlyoung/lagopus","owner":"qlyoung","description":"Distributed fuzzing platform","archived":false,"fork":false,"pushed_at":"2023-02-15T22:54:58.000Z","size":17271,"stargazers_count":46,"open_issues_count":3,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-15T03:44:16.668Z","etag":null,"topics":["cluster","fuzzing","kubernetes","security"],"latest_commit_sha":null,"homepage":"https://docs.lagopus.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qlyoung.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-23T18:57:39.000Z","updated_at":"2024-06-25T05:18:48.000Z","dependencies_parsed_at":"2024-10-28T10:49:28.574Z","dependency_job_id":null,"html_url":"https://github.com/qlyoung/lagopus","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qlyoung%2Flagopus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qlyoung%2Flagopus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qlyoung%2Flagopus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qlyoung%2Flagopus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qlyoung","download_url":"https://codeload.github.com/qlyoung/lagopus/tar.
gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244806091,"owners_count":20513378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster","fuzzing","kubernetes","security"],"created_at":"2024-10-11T23:14:37.103Z","updated_at":"2025-03-21T13:31:19.414Z","avatar_url":"https://github.com/qlyoung.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/qlyoung/lagopus/blob/master/etc/lagopus.svg\" alt=\"The project logo; a stylized Arctic fox head\" width=\"20%\"/\u003e\n\u003c/p\u003e\n\n# lagopus\n\nDistributed fuzzing platform on Kubernetes.\n\nhttps://docs.lagopus.io/\n\n![Screenshots of web interface](etc/lagopus-demo.gif)\n\n\n## About\n\n*Note*: This is early access software with no production-ready releases yet.\nIt works, but there's a lot left to do to make it easy to use.\n\nLagopus is a distributed fuzzing application built on top of Kubernetes. It\nallows you to fuzz arbitrary targets using clustered compute resources. You\nspecify the target binary, fuzzing driver (`afl` or `libFuzzer`), corpus, and\njob parameters such as #CPUs, memory and TTL. Lagopus then creates a container\nfor the fuzzing job and runs it on an available cluster node.\n\nLagopus takes much of the manual work out of analyzing fuzzing results. When a\nfuzzing job finishes, Lagopus collects any crashing inputs and their stack\ntraces and analyses them for type and severity. Results are collated,\ndeduplicated and stored in a database. 
The generated corpus is cleaned and\nminimized. All results are deposited in your NFS share for later use.\n\nLagopus has a web interface for creating jobs, monitoring progress and fuzzing\nstatistics, and viewing crash analyses. This interface is built on top of an\nHTTP REST API which can be used for integration with external tools such as\nCI/CD systems and test suites.\n\nLagopus runs on Kubernetes, so it runs as well in on-prem, bare metal\ndeployments as it does on your choice of k8s cloud providers.\n\n## Installation \u0026 Usage\n\nSee the docs for:\n\n- [Installation](http://docs.lagopus.io/en/latest/installing.html)\n- [Usage](http://docs.lagopus.io/en/latest/usage.html)\n\n## FAQ\n\n- Q: Why not ClusterFuzz?\n\n  A: ClusterFuzz is tightly integrated with Google Cloud. While bots can be\n     deployed locally with some effort, the application itself runs on App\n     Engine, requires Firebase for authentication, only supports Google for\n     user authentication, and is quite difficult to deploy. Perhaps the most\n     annoying thing is that ClusterFuzz has a hard dependency on Google Cloud\n     Storage for fuzzing results. Because of the amount of corpus data\n     generated by fuzzers, and the long term size of uploaded target binaries,\n     this can get expensive. App Engine is also an unavoidable fixed cost.\n     Finally, even if a cloud budget is available, CF still locks you into\n     Gcloud.\n\n     ClusterFuzz is primarily designed for Google's scale and use cases. Google\n     uses ClusterFuzz to fuzz V8, Fuchsia and Android, so much of the codebase\n     has customizations for these targets. It's also designed as a massively\n     scalable system - Google claims their internal instance is sized at 25,000\n     nodes. Depending on who you are, these design choices could be a pro or\n     con. 
I personally wanted something lighter, with first class support for\n     on prem deployments as well as non-Google clouds, and a smaller, more\n     hackable codebase.\n\n     Please note that none of this should be taken as criticism directed at\n     Google, ClusterFuzz, or any of the people working on ClusterFuzz.\n     ClusterFuzz is obviously super awesome for the scale and workloads it is\n     designed for, and if you have the cloud budget and scaling requirements,\n     it's your best option. Lagopus is for the rest of us :)\n\n- Q: Why just AFL and libFuzzer?\n\n  A: I am most familiar with those tools. More fuzzers can be added with time.\n\n- Q: Why Kubernetes?\n\n  A: Kubernetes is, to my knowledge, the only clustered orchestration tool that\n     supports certain features necessary for high performance fuzzing jobs,\n     such as static CPU resources, privileged containers, and distributed\n     storage. Also, my existing workloads were already running in Docker. And I\n     wanted to learn Kubernetes.\n\n- Q: Why [lagopus](https://en.wikipedia.org/wiki/Arctic_fox)?\n\n  A: Cuz they're awesome ^_^\n\n- Q: My target is dynamically linked and the fuzzer image doesn't have its\n     shared libraries; what do?\n\n  A: I've had good success with [ermine](http://magicermine.com/index.html).\n     Statifier will likely not work due to ASLR.\n\n     If you can't statically link your target, then you can simply copy the\n     necessary dependencies into the job zip and install them via\n     `provision.sh`, as described in the\n     [docs](http://docs.lagopus.io/en/latest/usage.html#creating-jobs).\n\n## Prior Art\n\nOther projects in the same vein.\n\n- The venerable [ClusterFuzz](https://github.com/google/ClusterFuzz), parts of\n  which are used in Lagopus\n- [LuckyCat](https://github.com/fkie-cad/LuckyCAT) (greetz!)\n\n\n## Prerequisites\n\nYou should have a k8s cluster, ideally on bare metal, or vms with static CPU\nresources.\n\nYou should have at least 1 
node with at least 4 real CPUs (not hyperthreads /\nsmt). More CPUs are better. More nodes are better.\n\nYou should understand how AFL and libFuzzer operate and the differences between\nthem.\n\nYou should be aware that fuzzers thrash disk and consume massive amounts of\nCPU, and plan accordingly. See AFL docs for more info.\n\nYou should understand how rss limits work with ASAN on x86-64, and how\nlibFuzzer and AFL handle those.\n\nYou should install the absolute latest stable version of k8s. This project uses\nadvanced features of k8s for performance reasons, many of which are only\navailable in k8s \u003e= v1.17.\n\nIt's a good idea to have an operational understanding of Kubernetes.\nSpecifically, you will have an easier time with debugging cluster setup if you\nunderstand how k8s handles:\n\n- `sysctl`s\n- CPU management policies (`none` and `static`)\n- CPU affinity \u0026 resources\n- Container QOS policies (`Guaranteed`, `BestEffort`, etc)\n\n\n## Todo\n\nComplicated:\n\n- Lagopus cannot distribute multithreaded / multiprocess jobs across nodes.\n  Distribution is at the job level; a single job is always contained to a\n  single node (but not vice versa). This means a small cluster where each node\n  has a high CPU count is preferable to a larger cluster of smaller nodes.\n\n- Lagopus depends on the existence of an NFS share external to itself to store\n  job data. 
This isn't really a limitation, but it slightly complicates initial\n  setup.\n\n- Lagopus runs on Kubernetes, which can be a significant barrier for people who\n  aren't familiar with it.\n\nPlanned:\n\n- ~~Backtrace collection~~ :heavy_check_mark:\n- ~~Job monitoring~~ :heavy_check_mark:\n- ~~Job input validation~~ :heavy_check_mark:\n- Source-based code coverage viewer\n- Better deployment process\n- Job tags\n- CLI client\n- Corpus management\n- More fuzzers\n- Performance audit\n- TLS support\n- Security (always last :-)\n\nWishlist:\n\n- Authentication\n- Docker-compose support\n- Reduce k8s tendrils\n- Reduce vendored code\n- Support for interpreted targets\n- Support for non-x64 targets\n\n## Dev Notes\n\nMiscellaneous bits of information that may be relevant in the future, or for\ndebugging.\n\n- `gdb` will not work in a container without `seccomp=unconfined`; this is the\n  default in k8s, so it's not explicitly set anywhere in Lagopus, but when\n  running the fuzzer container manually make sure to pass\n  `--security-opt seccomp=unconfined` or you won't get the detailed analysis of\n  crashes usually provided by\n  [exploitable](https://github.com/jfoote/exploitable).\n\n- Lagopus was using gdb exploitable ^ to analyze crashes, but exploitable is\n  currently turned off because its heuristics see ASAN terminations as clean\n  exits. Currently the crash analysis code is lifted from ClusterFuzz, as it\n  already has all the gritty regexes and classifiers sorted out.\n\n- `afl` requires sysctl `kernel.core_pattern=core` to get core files. k8s has\n  support for allowing nodes to allow pods to set sysctls (pods also have\n  settings to allow that at the pod level) which results in a taint on the node\n  and therefore requires tolerations on the pod.  
However, k8s only supports\n  namespaced sysctls; `kernel.core_pattern` isn't one, and thus must be\n  manually set on the node entirely outside of k8s before starting the kubelet.\n\n- Fuzzer jobs should run with dedicated cores, for a couple reasons. The first\n  is that this is just better for performance regardless of the particular\n  fuzzer in use. The second is more subtle, and applies to fuzzers that pin\n  themselves to particular CPUs in order to increase cache locality and\n  reduce kernel scheduler overhead. `afl` in particular does this. When\n  starting up, `afl` searches for \"free\" (not already pinned by another\n  process) CPU cores to bind itself to, which it determines by looking at\n  `/proc`. However, `/proc` is not bind mounted into containers by default, so\n  it's possible for another process, either on the host or another container, to\n  be bound to a given CPU even though the container's `/proc` says the core is\n  free. In this case the bind will still work but now you have two processes\n  pinned to the same CPU on the host. This is far worse than not binding at\n  all. So until container runtimes fix this (if they ever do), CPU assignments\n  must be manually set on the container itself by the container runtime.\n\n  This is a bit tricky to do in k8s. First the nodes must be configured with\n  the `static` cpu policy by passing `--cpu-manager-policy=static` to kubelet.\n  Second, the pod containers must be in the \"Guaranteed\" QOS class, which means\n  both requests and limits for both memory and cpu must be set, and must equal\n  each other. This will cause each container to have N CPUs assigned to it\n  exclusively, which solves the issue with `/proc` by sidestepping it\n  completely.\n\n  However, again there is a caveat. 
A peculiarity of container runtimes is that\n  even when containers are assigned to specific CPUs, the containers still see\n  all of the host CPUs and don't actually know which of them have been assigned\n  to it. This again poses some complications with `afl`'s CPU pinning\n  mechanism.  `afl`'s current (upstream) CPU selection heuristics will usually\n  fail when run in a container because it tries to bind to the first available\n  CPU (as per `/proc`), typically CPU 0, which may or may not be assigned to\n  the container. If not, the system call to perform the bind -\n  `sched_setaffinity` - will fail and `afl` will bail out.  This is solved for\n  Lagopus by packaging a patched `afl` that tries *all* cores until it finds\n  one that binding succeeds on. I have an open PR[0] against `afl` for this\n  patch, so hopefully at some point Lagopus can go back to using upstream\n  `afl`. However, Google doesn't seem to be paying much attention to the\n  repository, so who knows how long that will take.\n\n  [0] https://github.com/google/AFL/pull/68\n\n- It would be nice to use\n  [halfempty](https://github.com/googleprojectzero/halfempty) for minimization\n  instead of the current tools, as it's much faster. This can probably be done\n  fairly easily.\n\n- Corpus minimization probably shouldn't be subject to the same resource\n  constraints as fuzzing itself, because it takes so long. However, doing this\n  would introduce a lot of complications; multiple stages of pods (hard to do\n  with k8s Jobs), resource availability no longer corresponds directly to the\n  jobs the user scheduled...definitely should evaluate halfempty first to see\n  if it speeds things up enough on e.g. 
2 core jobs to make this a non issue.\n\n- Jobs could be distributed across nodes by using the NFS share as a source for\n  new corpus inputs and periodically synchronizing?\n\n- Collecting `perf stat` output for fuzzing tasks into InfluxDB could be cool.\n  However, perf inside containers is still a pita, for the same reasons that\n  make procfs a pain. Brendan Gregg touched on this in his container perf talk,\n  https://www.youtube.com/watch?v=bK9A5ODIgac @ 33m.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqlyoung%2Flagopus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqlyoung%2Flagopus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqlyoung%2Flagopus/lists"}