{"id":49576773,"url":"https://github.com/macstab/macstab-chaos-jvm-agent","last_synced_at":"2026-05-03T17:35:40.542Z","repository":{"id":353550116,"uuid":"1219416882","full_name":"macstab/macstab-chaos-jvm-agent","owner":"macstab","description":"A set of Java libraries providing an JVM agent to perform chaos testing on the JVM level. It provides a comprehensive set of features for chaos testing, including network latency, packet loss, disk speed modifications, and more.","archived":false,"fork":false,"pushed_at":"2026-04-24T11:17:17.000Z","size":1222,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-24T12:30:37.304Z","etag":null,"topics":["backend","java"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/macstab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-23T21:20:28.000Z","updated_at":"2026-04-23T21:27:37.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/macstab/macstab-chaos-jvm-agent","commit_stats":null,"previous_names":["macstab/macstab-chaos-jvm-agent"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/macstab/macstab-chaos-jvm-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macstab%2Fmacstab-chaos-jvm-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/macstab%2Fmacstab-chaos-jvm-agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macstab%2Fmacstab-chaos-jvm-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macstab%2Fmacstab-chaos-jvm-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/macstab","download_url":"https://codeload.github.com/macstab/macstab-chaos-jvm-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macstab%2Fmacstab-chaos-jvm-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32578953,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backend","java"],"created_at":"2026-05-03T17:35:39.007Z","updated_at":"2026-05-03T17:35:40.529Z","avatar_url":"https://github.com/macstab.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n  Engineered by  Christian Schnapka\n                 Embedded Principal+ Engineer\n                 Macstab GmbH · Hamburg, Germany\n                 
https://macstab.com\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n--\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n# macstab-chaos-jvm-agent\n\n**Pipeline-grade chaos engineering for the JVM. The failures that page your on-call become commits that fail in PR review.**\n\n[![Java 21+](https://img.shields.io/badge/Java-21%2B-blue.svg)](https://openjdk.org/projects/jdk/21/)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)\n[![ByteBuddy](https://img.shields.io/badge/ByteBuddy-instrumentation-orange.svg)](https://bytebuddy.net/)\n[![Spring Boot](https://img.shields.io/badge/Spring%20Boot-3%20%26%204-6DB33F.svg)](https://spring.io/projects/spring-boot)\n[![Quarkus](https://img.shields.io/badge/Quarkus-supported-4695EB.svg)](https://quarkus.io/)\n[![Micronaut](https://img.shields.io/badge/Micronaut-supported-1A1A1A.svg)](https://micronaut.io/)\n\n*Designed and engineered by* **[Christian Schnapka](https://macstab.com)** —\nPrincipal+ Engineer · [Macstab GmbH](https://macstab.com) · Hamburg, Germany\n\n\u003c/div\u003e\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n### Part of the Macstab Chaos Engineering Stack\n\n|            **JVM bytecode** *(this repo)*            | [**Container orchestration**](https://github.com/macstab/chaos-testing) | [**LD_PRELOAD libc**](https://github.com/macstab/macstab-chaos-testing-libraries) |\n|:----------------------------------------------------:|:-----------------------------------------------------------------------:|:---------------------------------------------------------------------------------:|\n|        In-process chaos for JVM applications         |         Annotation-driven Testcontainers chaos for any service          |               Pure C99 syscall-level chaos for any Linux container                |\n| 62 JDK call sites · Spring 3/4 · Micronaut · Quarkus |        Network · disk · DNS · CPU · memory · pre-built scenarios        |                glibc + musl × 
amd64 + arm64 · 100 % line coverage                 |\n\n**One mental model — three layers.** Same selector × effect × policy DSL spans the JVM, the container, and the libc layer. Each layer ships and runs independently; combine them when you need full distributed-system chaos coverage.\n\n\u003c/div\u003e\n\n---\n\n## The Short Version\n\nYour Redis cluster is fine 99.9 % of the time.\n\nThen a pod drains. Replication lag spikes to 300 ms. Your retry logic hammers the primary. p99 doubles. PagerDuty fires at 3 AM.\n\nYou've been in this room before. Toxiproxy didn't catch it — it's TCP-blind, it never sees `HikariPool.getConnection()` blocking. Your unit tests didn't catch it — they mocked the Redis client. Your integration tests didn't catch it — Testcontainers gave you a perfect Redis. The last game day was three months ago.\n\n**Add eight lines to your test suite. Catch it on the next PR.**\n\n```java\n@ChaosTest\nvoid retryLogicSurvivesReplicationLag(ChaosControlPlane chaos) {\n    chaos.activate(ChaosScenario.builder(\"replica-lag-300ms\")\n        .selector(ChaosSelector.network(\n            Set.of(OperationType.SOCKET_READ),\n            NamePattern.prefix(\"redis-replica.\")))\n        .effect(ChaosEffect.delay(Duration.ofMillis(300)))\n        .activationPolicy(ActivationPolicy.always())\n        .build());\n\n    assertThat(service.read1000Keys().latencyP99()).isLessThan(BUDGET);\n}\n```\n\nNo sidecar. No mocks. No application code changes. The bytecode of `java.net.Socket` reads is rewritten at JVM startup; chaos applies surgically inside the JVM that's already running your production code paths. **62 JDK call sites** auto-wired across DNS, SSL, JDBC, HTTP, NIO, sockets, virtual threads, monitors, scheduler, GC, class loading, file I/O, ThreadLocal, JNDI, JMX, serialization, native libraries, queues, executors, async completion, and more. One annotation. Zero `--add-opens` flags. 
The agent self-grants every JDK module open it needs at install time.\n\nWorks in JUnit 5 with **Spring Boot 3, Spring Boot 4, Micronaut, and Quarkus** out of the box.\n\n### What questions does it answer?\n\nThe questions your on-call has to answer at 3 AM — turned into PR-blocking assertions:\n\n- *\"Will a 3-second network outage kill my HikariCP pool, or will it recover?\"*\n- *\"If one connection has 1 s latency, does my repeatable-read transaction still produce consistent data, or does it return stale rows?\"*\n- *\"Is `read_from = REPLICA_PREFERRED` actually routing reads correctly when the primary is slow?\"*\n- *\"Does my circuit breaker open before my caller's timeout fires?\"*\n- *\"When DNS resolution slows from 1 ms to 800 ms, does anything in my stack actually time out, or does it deadlock?\"*\n- *\"If GC pauses 200 ms during a burst, do my queue consumers fall behind permanently or catch up?\"*\n\nEvery one of those becomes a `@ChaosTest` method that runs on every PR. No game day required. No SRE team required. No production blast radius. **Failures become commits, not incidents.**\n\n---\n\n## Part of a Three-Layer Chaos Engineering Stack\n\n`macstab-chaos-jvm-agent` is the **JVM bytecode layer** of a vertically-integrated chaos engineering toolkit. 
This repo is self-contained — everything in this README works standalone — but it composes with two sibling layers when broader coverage is needed.\n\n| Layer                        | Repo                                                                                                    | What it covers                                                                                                                                                                                                                                                                                                                |\n|------------------------------|---------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **JVM bytecode** (this repo) | [`macstab/macstab-chaos-jvm-agent`](https://github.com/macstab/macstab-chaos-jvm-agent)                 | 62 JDK call sites instrumented in-process. Spring Boot 3/4 + Micronaut + Quarkus integration. JUnit 5 `@ChaosTest`. Selector × effect × policy DSL. Live config reload.                                                                                                                                                       |\n| **Container orchestration**  | [`macstab/chaos-testing`](https://github.com/macstab/chaos-testing)                                     | Annotation-driven chaos on top of Testcontainers. CPU throttling, memory pressure, disk I/O, network partitions, DNS failures, pre-built Redis Sentinel + replication-lag scenarios, Toxiproxy adapter, Redis-aware fault injection.                                                                                          
|\n| **LD_PRELOAD libc**          | [`macstab/macstab-chaos-testing-libraries`](https://github.com/macstab/macstab-chaos-testing-libraries) | Pure C99 LD_PRELOAD shared objects: file I/O (latency / `errno` / torn / corrupt), network, DNS, clock, process, memory. **glibc + musl × amd64 + arm64**, 100 % line coverage on shipped sources, Docker runtime validation as a quality gate. Language-agnostic — works for any process inside any container, not just JVM. |\n\n**Start here** if you're on the JVM. This is the entry point and the richest layer.\n\n**Compose layers** for full distributed-system coverage:\n- Add the LD_PRELOAD layer to inject *kernel-real* time skew, slow disks, and DNS slowdowns into containers — chaos for failure modes the JVM can't see (e.g. `clock_gettime` is a syscall the JVM can't intrinsically intercept; the LD_PRELOAD lib can).\n- Add the orchestration layer to wire `@ChaosTest` annotations directly to Testcontainers-managed Redis, Postgres, Kafka — including pre-built scenarios like *replication lag during pod drainage*.\n\nThree repos, one mental model: **the same selector × effect × policy DSL spans the libc layer, the JVM layer, and the orchestration layer.** No cross-layer coupling — each layer is independently adoptable, independently versioned, independently released.\n\n---\n\n\u003c!-- TOC --\u003e\n* [macstab-chaos-jvm-agent](#macstab-chaos-jvm-agent)\n  * [The Short Version](#the-short-version)\n    * [What questions does it answer?](#what-questions-does-it-answer)\n  * [Part of a Three-Layer Chaos Engineering Stack](#part-of-a-three-layer-chaos-engineering-stack)\n  * [Floor 0 — What it does (plain English)](#floor-0--what-it-does-plain-english)\n  * [Floor -1 — Architecture (senior engineer territory)](#floor--1--architecture-senior-engineer-territory)\n  * [Floor -2 — Runtime mechanics (principal-level)](#floor--2--runtime-mechanics-principal-level)\n    * [Evaluation Pipeline](#evaluation-pipeline)\n    * [Classloader 
Bridge](#classloader-bridge)\n    * [Reentrancy Guard](#reentrancy-guard)\n  * [Floor -3 — JVM internals, bytecode, and OS mechanics (the 1% layer)](#floor--3--jvm-internals-bytecode-and-os-mechanics-the-1-layer)\n    * [ByteBuddy Advice and the JVM Retransformation Mechanism](#bytebuddy-advice-and-the-jvm-retransformation-mechanism)\n    * [JMM Happens-Before in the Bootstrap Bridge](#jmm-happens-before-in-the-bootstrap-bridge)\n    * [`AtomicLong.incrementAndGet()` on x86-64](#atomiclongincrementandget-on-x86-64)\n    * [`synchronized(this)` — Rate Limit and the HotSpot Lock Inflation Protocol](#synchronizedthis--rate-limit-and-the-hotspot-lock-inflation-protocol)\n    * [`LockSupport.park()` and OS Thread Scheduling](#locksupportpark-and-os-thread-scheduling)\n    * [Safepoint Mechanics and `SafepointStormStressor`](#safepoint-mechanics-and-safepointstormstressor)\n    * [Virtual Threads and Carrier-Thread Pinning Risk](#virtual-threads-and-carrier-thread-pinning-risk)\n    * [`SplittableRandom` — Why Not `ThreadLocalRandom`?](#splittablerandom--why-not-threadlocalrandom)\n  * [Quick Start](#quick-start)\n    * [1. Add the dependency](#1-add-the-dependency)\n    * [2. Annotate your test](#2-annotate-your-test)\n    * [3. 
Or attach at startup for production-like testing](#3-or-attach-at-startup-for-production-like-testing)\n  * [Core Concepts](#core-concepts)\n  * [Selectors — Full Reference](#selectors--full-reference)\n  * [Effects — Full Reference](#effects--full-reference)\n    * [Choosing an effect](#choosing-an-effect)\n      * [Latency and timing](#latency-and-timing)\n      * [Errors and failure handling](#errors-and-failure-handling)\n      * [Resource pressure (background stressors)](#resource-pressure-background-stressors)\n      * [Threading and concurrency](#threading-and-concurrency)\n      * [JVM-wide pause pressure](#jvm-wide-pause-pressure)\n    * [Inline effects (execute on the calling thread)](#inline-effects-execute-on-the-calling-thread)\n    * [Background stressor effects](#background-stressor-effects)\n  * [Activation Policy](#activation-policy)\n  * [Session Isolation](#session-isolation)\n  * [Recipes](#recipes)\n    * [Recipe 1 — Flaky downstream under a retry policy](#recipe-1--flaky-downstream-under-a-retry-policy)\n    * [Recipe 2 — Circuit breaker verification under sustained failure](#recipe-2--circuit-breaker-verification-under-sustained-failure)\n    * [Recipe 3 — Memory pressure soak without crashing CI](#recipe-3--memory-pressure-soak-without-crashing-ci)\n  * [Startup Configuration (JSON)](#startup-configuration-json)\n    * [Live config reload (file watch)](#live-config-reload-file-watch)\n  * [Diagnostics](#diagnostics)\n  * [Architecture](#architecture)\n    * [Module responsibilities](#module-responsibilities)\n    * [Runtime dispatch](#runtime-dispatch)\n    * [Structural diagram](#structural-diagram)\n    * [Module reference](#module-reference)\n  * [Spring Boot Integration](#spring-boot-integration)\n    * [Test Starter](#test-starter)\n    * [Runtime Starter](#runtime-starter)\n  * [Performance](#performance)\n    * [Real-world service impact](#real-world-service-impact)\n    * [Hot-path overhead targets](#hot-path-overhead-targets)\n    * 
[What the JIT does](#what-the-jit-does)\n    * [Benchmarks](#benchmarks)\n  * [Build](#build)\n  * [Detailed Documentation](#detailed-documentation)\n  * [License](#license)\n  * [About the Engineer](#about-the-engineer)\n    * [Timeline](#timeline)\n    * [Specific evidence in this project](#specific-evidence-in-this-project)\n    * [Available for senior engineering engagements](#available-for-senior-engineering-engagements)\n\u003c!-- TOC --\u003e\n\n---\n\n## Floor 0 — What it does (plain English)\n\nYou have a Java service. You want to know what happens when:\n- the database connection pool is always slow\n- the executor that processes your orders gets delayed\n- `System.currentTimeMillis()` lies to your TTL checks\n- `Selector.select()` wakes up for no reason, like the Linux kernel actually does\n- the GC is under pressure while your request is in-flight\n\nThis library lets you **turn those scenarios on and off programmatically**, scoped to a single test thread, without touching production code. Multiple tests run in parallel in the same JVM — each test's chaos is invisible to every other test.\n\n---\n\n## Floor -1 — Architecture (senior engineer territory)\n\nThe agent loads via the standard Java Instrumentation API (`-javaagent:` or dynamic self-attach). ByteBuddy weaves `@Advice` hooks into selected JDK methods at startup. Those hooks call a static dispatcher that routes to the live scenario registry, evaluates matching scenarios against an 8-check activation pipeline, and executes the decision inline on the calling thread.\n\n```\nApplication Thread\n  └─► instrumented JDK method (e.g. 
ThreadPoolExecutor.execute)\n        └─► ByteBuddy @Advice (inlined bytecode)\n              └─► BootstrapDispatcher (bootstrap classloader)\n                    └─► ChaosBridge → ChaosRuntime (agent classloader)\n                          └─► ScenarioRegistry.match() → evaluate() × N\n                                └─► RuntimeDecision: delay + gate + terminal action\n```\n\n**57 interception handles** span: thread lifecycle, executor submission, scheduled ticks, blocking queues, `CompletableFuture`, NIO selectors, TCP sockets, clock (`currentTimeMillis`/`nanoTime`), GC, `System.exit`, reflection, `ObjectInputStream`, class loading, `LockSupport.park`, AQS acquire, JNDI, JMX, ZIP compression, `ThreadLocal`, native library loading, HTTP client send, JDBC execute, DNS resolve, SSL handshake, `Thread.sleep`, file I/O, and arbitrary method entry/exit.\n\n**Session isolation**: each test gets a `ChaosSession` backed by a `ThreadLocal\u003cString\u003e`. Session-scoped chaos evaluates only when the session ID on the current thread matches. Executor submissions within a `session.bind()` scope carry the session ID into worker threads via task decoration — chaos propagates exactly where intended and nowhere else.\n\n---\n\n## Floor -2 — Runtime mechanics (principal-level)\n\n### Evaluation Pipeline\n\nEvery intercepted JVM operation runs through `ScenarioController.evaluate()` — an 8-gate pipeline that short-circuits on the first failed check:\n\n1. `started.get()` — `AtomicBoolean`, maps to a volatile read + memory barrier\n2. `sessionId` equality — `String.equals()`, null means JVM-scope (passes all)\n3. `SelectorMatcher.matches()` — exhaustive sealed-type `switch` over `ChaosSelector` subtypes; stateless; zero allocation\n4. `matchedCount.incrementAndGet()` + activation-window check — `AtomicLong` CAS → lazy INACTIVE transition\n5. Warm-up gate: `matched \u003c= activateAfterMatches`\n6. 
Rate limit: `synchronized(this)` sliding-window token bucket — `rateWindowStartMillis + rateWindowPermits`\n7. Probability: `new SplittableRandom(baseSeed ^ matched ^ id.hashCode()).nextDouble()`\n8. Max-applications: CAS loop on `appliedCount` — prevents overshoot under concurrent access\n\n**Why the CAS loop at step 8?** A naive check-then-increment pattern (read the count, compare it with the cap, then call `incrementAndGet()`) allows N racing threads to simultaneously read `count \u003c max`, all increment past the cap, and all apply the effect. The CAS loop (`compareAndSet(current, current+1)` with retry on collision) makes the check and the increment a single atomic step, so at most `max` threads ever win and `appliedCount` never exceeds the cap.\n\n### Classloader Bridge\n\nJDK classes (`Thread`, `Socket`, `System`, etc.) are loaded by the **bootstrap classloader** — the root of the classloader hierarchy with no parent. ByteBuddy advice woven into these classes executes *in* the bootstrap classloader's namespace, which cannot see agent classes by name. The bridge:\n\n1. At startup, `BootstrapDispatcher.class` bytecode is extracted from the agent JAR, written to a temp JAR, and appended to the bootstrap classpath via `Instrumentation.appendToBootstrapClassLoaderSearch`\n2. A 57-slot `MethodHandle[]` array is built against `BridgeDelegate.class` using `MethodHandles.publicLookup()` and wired into `BootstrapDispatcher.install()` via reflection against the bootstrap-classloader version (`Class.forName(\"...BootstrapDispatcher\", true, null)`)\n3. `handles` is written **before** `delegate` (both are `volatile` fields) — the Java Memory Model's happens-before rule on volatile writes guarantees any thread that observes `delegate != null` also observes the fully-initialized `handles` array\n\n### Reentrancy Guard\n\nChaos code itself calls instrumented JDK methods (`Thread.sleep`, `LockSupport.park`, `ConcurrentHashMap` internals). Without protection, each chaos dispatch would trigger another chaos dispatch, recursing until stack overflow. 
The guard:\n\n```\nDEPTH : ThreadLocal\u003cint[]\u003e  (bootstrap-classloader resident)\ninvoke():\n  if DEPTH.get() \u003e 0 → return fallback immediately\n  DEPTH.set(DEPTH.get() + 1)\n  try { ... dispatch ... }\n  finally { if --depth == 0: DEPTH.remove() }\n```\n\nThe `ThreadLocal.remove()` in the `finally` block is critical for thread pool longevity — without it, the `ThreadLocal` entry accumulates on pooled threads, creating a slow per-thread memory leak across thousands of requests.\n\nA second recursion risk: `ThreadLocal.get()` is itself instrumented in Phase 2. Reading `DEPTH.get()` inside `invoke()` would re-trigger `ThreadLocalGetAdvice`, which would call `invoke()`, which would call `DEPTH.get()` ... The advice contains an identity check:\n\n```java\nif (threadLocal == BootstrapDispatcher.depthThreadLocal()) return false;\n```\n\nThis single pointer-equality check is the only thing preventing infinite recursion at that specific callsite.\n\n---\n\n## Floor -3 — JVM internals, bytecode, and OS mechanics (the 1% layer)\n\n### ByteBuddy Advice and the JVM Retransformation Mechanism\n\nByteBuddy instrumentation uses `AgentBuilder` with `RedefinitionStrategy.RETRANSFORMATION` and `disableClassFormatChanges()`. What this means at the bytecode level:\n\n- **`disableClassFormatChanges()`** constrains ByteBuddy to inline-only transformations: no new fields, no new constant pool entries that change the class format, no changes to method signatures. The transformed class must be accepted by `ClassFileTransformer.transform()` under the constraints of JVMTI's `RetransformClasses`. This is enforced by JVMTI spec §11.2.2 (\"Retransformation Incapable\") — specifically, that retransformable transformers may only modify method bodies, not the class schema.\n- **`@Advice.OnMethodEnter`** bytecode is copied verbatim (not called — *copied*) into the target method's bytecode at the entry point. The JVM sees one contiguous method body. 
After JIT compilation (`-XX:CompileThreshold`, default 10,000 invocations for C2 on HotSpot when tiered compilation is disabled; tiered compilation uses its own per-tier thresholds), the advice body is inlined by the JIT compiler as part of the compiled native frame. There is no virtual dispatch overhead after warm-up.\n- **Native method interception** (specifically `System.currentTimeMillis()`, `System.nanoTime()`): these are `@IntrinsicCandidate` native methods. HotSpot replaces calls to them with architecture-specific intrinsics during JIT compilation, so compiled callers go straight to the platform time source, which is ultimately backed by the TSC on x86-64 (`RDTSC`) and by the `CNTVCT_EL0` counter on AArch64 (`MRS X0, CNTVCT_EL0`). ByteBuddy advice on the Java wrapper is dead code after JIT compilation. The `ClockSkewEffect` cannot intercept production clock reads via bytecode instrumentation alone — it works only via the direct `ChaosRuntime.applyClockSkew()` API.\n\n### JMM Happens-Before in the Bootstrap Bridge\n\nThe two-field publication in `BootstrapDispatcher.install()`:\n\n```java\nhandles = methodHandles;  // volatile write W1\ndelegate = bridgeDelegate; // volatile write W2\n```\n\nBy JLS §17.4.5, the memory model introduced by JSR-133 ([https://jcp.org/aboutJava/communityprocess/mrel/jsr133/index.html](https://jcp.org/aboutJava/communityprocess/mrel/jsr133/index.html)):\n\n\u003e A write to a volatile field happens-before every subsequent read of that field.\n\nW1 happens-before W2 (program order). Any thread T that reads `delegate != null` (volatile read R2) observes W2, so W2 synchronizes-with R2 and therefore happens-before it. By transitivity, W1 happens-before R2: `handles` is visible to T. On x86-64, HotSpot compiles the `volatile` write to a `MOV` followed by a `LOCK`-prefixed instruction (typically `LOCK ADDL $0, (%rsp)`) or an `MFENCE`, depending on JIT strategy, to enforce store-load ordering. 
On AArch64, it emits `STLR` (Store-Release), which provides release semantics, ensuring all prior stores are visible before this store completes.\n\n### `AtomicLong.incrementAndGet()` on x86-64\n\n```java\nmatchedCount.incrementAndGet();\n// On x86-64 the JIT compiles this to an atomic fetch-and-add:\n//   LOCK XADD [rsi+offset], rax\n// or, equivalently, to a CAS retry loop around:\n//   LOCK CMPXCHG [rsi+offset], rdx\n```\n\nThe `LOCK` prefix on x86 makes the read-modify-write atomic by holding exclusive ownership of the cache line containing the field through the cache coherency protocol (MESI), issues a full memory barrier (both acquire and release semantics), and ensures atomicity across hyperthreads sharing an L1 cache. On AMD Zen and Intel architectures with MESIF, this causes a cache-line ownership transfer if another core holds the line in Modified state — the latency spike is 40–70 cycles for cross-core coherency vs. ~4 cycles for same-core hits.\n\n**False sharing risk**: `ScenarioController` packs `matchedCount` and `appliedCount` as adjacent `AtomicLong` fields. Both fields fit in the same 64-byte cache line on typical JVMs. Under high-concurrency scenarios where many threads increment `matchedCount` while others CAS `appliedCount`, the cache line bounces between cores. This is a known trade-off in the current implementation — `@Contended` (JDK internal) padding could eliminate it at the cost of 128 bytes per controller.\n\n### `synchronized(this)` — Rate Limit and the HotSpot Lock Inflation Protocol\n\nThe rate-limit check is `synchronized(this)` on the `ScenarioController` instance. HotSpot's lock protocol ([JVM Spec §6.5 monitorenter](https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-6.html#jvms-6.5.monitorenter)):\n\n1. **Biased locking** (older JDKs only, with `-XX:+UseBiasedLocking`): the thread ID is CAS-written into the mark word of the object header. Subsequent acquires by the same thread are lock-free — just a mark word read. [JEP 374](https://openjdk.org/jeps/374) deprecated biased locking and disabled it by default in JDK 15, so this step does not apply on the Java 21+ runtimes this agent targets.\n2. 
**Lightweight lock**: CAS on the object's mark word to install a pointer to the current thread's stack frame. The `monitorenter` bytecode (opcode `0xC2` in the JVM instruction set) triggers this.\n3. **Heavyweight lock (inflated)**: when contention is detected, HotSpot inflates to an OS mutex — `pthread_mutex_lock(3)` on Linux, which maps to `futex(2)` with `FUTEX_WAIT` on the lock word. The inflated monitor (`ObjectMonitor` in JVM internals) contains an entry queue and a wait set backed by `ParkEvent` objects.\n\nFor the rate-limit case, contention is expected to be near-zero (rate-limited scenarios are rare by design). On the Java 21+ runtimes this agent targets, lightweight locking dominates, and the synchronized block executes in ~5 ns on the uncontended path.\n\n### `LockSupport.park()` and OS Thread Scheduling\n\n`LockSupport.park(blocker)` ([JDK source: `java.util.concurrent.locks.LockSupport`](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/concurrent/locks/LockSupport.java)) maps to `Unsafe.park(false, 0L)` → JVM intrinsic → OS-level thread suspension:\n\n- **Linux**: `pthread_cond_wait(3)` (or `pthread_cond_timedwait(3)` for timed parks) → `futex(2)` with `FUTEX_WAIT_BITSET` or `FUTEX_WAIT`. The thread is removed from the kernel run queue, its `task_struct` state is set to `TASK_INTERRUPTIBLE`, and control returns to the scheduler. Wakeup via `LockSupport.unpark()` calls `futex(FUTEX_WAKE)`.\n- **macOS**: `mach_wait_until(2)` or `semaphore_timedwait` via the Mach port abstraction.\n- **Minimum scheduler quanta**: on a Linux kernel with `CONFIG_HZ=1000` (1 ms tick) and no high-resolution timers, a timed wake-up cannot be finer than the 1 ms tick. `Thread.sleep(delayMillis)` with `delayMillis=1` may actually sleep 1–2 ms depending on scheduler load. High-resolution timers (`CONFIG_HIGH_RES_TIMERS=y`) reduce this to sub-millisecond on modern kernels.\n\nThe `THREAD_PARK` instrumentation fires a `beforeThreadPark()` dispatch before the actual park. 
If a `DelayEffect` is configured, `Thread.sleep(delayMillis)` is called — which itself calls `park`, which the reentrancy DEPTH guard intercepts and returns the fallback immediately. Without the DEPTH guard, a 100ms chaos delay on `THREAD_PARK` would cause infinite `sleep → park → chaos eval → sleep → park ...` recursion.\n\n### Safepoint Mechanics and `SafepointStormStressor`\n\nHotSpot safepoints are JVM-global stop-the-world pauses. All application threads must reach a \"safe point\" — a location in the bytecode where the JVM knows the full GC root set. The `SafepointStormStressor` deliberately triggers them by calling both `System.gc()` and `Instrumentation.retransformClasses()` on a timer.\n\nHow safepoints work at the JVM level:\n1. The JVM sets a \"safepoint flag\" in a polling page (a memory-mapped page set to no-access)\n2. JIT-compiled code contains safepoint polls at loop back-edges and method returns — a load from the polling page. If the page is no-access, the resulting `SIGSEGV` is caught by the JVM signal handler which parks the thread at the safepoint\n3. Interpreted code polls at each bytecode boundary\n4. Once all threads are parked, the JVM performs the safepoint operation (GC, retransformation, deoptimization, etc.) and releases all threads\n\n`SafepointStormStressor` calling `Instrumentation.retransformClasses()` forces a safepoint on the calling timer thread, which blocks all application threads for the duration of the transformation. This simulates STW pause pressure that would appear in production under heavy GC load or JVM agent activity.\n\n### Virtual Threads and Carrier-Thread Pinning Risk\n\nOn JDK 21+ ([JEP 444 — Virtual Threads](https://openjdk.org/jeps/444)), virtual threads (`Thread.ofVirtual()`) are scheduled by the JVM's `ForkJoinPool`-based scheduler, mounted on platform carrier threads. 
A virtual thread that blocks while inside a `synchronized` block or method *pins* its carrier thread — the carrier cannot be reassigned to another virtual thread while the virtual thread holds a monitor.\n\nThe `MONITOR_ENTER` interception instruments `AbstractQueuedSynchronizer.acquire()` — a proxy for `java.util.concurrent.locks.ReentrantLock`, not for `synchronized` blocks. The `beforeMonitorEnter()` chaos dispatch (if configured with a `DelayEffect`) adds latency to every AQS lock acquisition. On virtual threads, this delay ties up the carrier only when the acquisition happens while the virtual thread is already pinned (inside a `synchronized` block or a native frame); in that case the sleeping virtual thread blocks its carrier from serving other virtual threads. Under high concurrency, this can cascade into carrier thread exhaustion.\n\nJEP 444 documents the two pinning cases: a virtual thread cannot be unmounted while it executes inside a `synchronized` block or method, or while it executes a native method or foreign function. AQS-based locks (`ReentrantLock`) do not pin virtual threads; the virtual thread is unmounted when parked inside AQS.\n\n### `SplittableRandom` — Why Not `ThreadLocalRandom`?\n\n`ScenarioController.passesProbability()` creates `new SplittableRandom(baseSeed ^ matched ^ id.hashCode())` per call. Why not `ThreadLocalRandom.current().nextDouble()`?\n\n1. **Reproducibility**: `ThreadLocalRandom` seeds are non-deterministic (seeded from `/dev/urandom` or `nanoTime()`). With a fixed `randomSeed` in `ActivationPolicy`, we need deterministic sampling across runs — same seed + same `matchedCount` = same draw. `SplittableRandom` with an explicit seed satisfies this; `ThreadLocalRandom` does not.\n2. **Thread-safety**: `SplittableRandom` is not thread-safe ([JDK API](https://docs.oracle.com/en/java/docs/api/java.base/java/util/SplittableRandom.html)). Creating a new instance per call (cheap — two `long` fields) avoids any shared-state issue. The seed is varied by `matched` to prevent the same `SplittableRandom(seed)` from always returning the same first value.\n3. 
**Why not `Random(seed).nextDouble()`?** `java.util.Random` uses a linear congruential generator with `AtomicLong` state. It is thread-safe, but that safety comes from a CAS on every draw, which is unnecessary overhead for a generator that is created and discarded per call. `SplittableRandom` uses the SplitMix64 algorithm (an additive counter passed through a 64-bit mixing function) with no internal synchronization.\n\n---\n\n## Quick Start\n\n### 1. Add the dependency\n\n```kotlin\n// build.gradle.kts\ntestImplementation(\"com.macstab:chaos-agent-testkit:0.1.0-SNAPSHOT\")\n```\n\n### 2. Annotate your test\n\n```java\n@ExtendWith(ChaosAgentExtension.class)\nclass MyServiceTest {\n\n    @Test\n    void shouldHandleExecutorDelays(ChaosSession session) {\n        session.activate(ChaosScenario.builder(\"slow-executor\")\n            .scope(ChaosScenario.ScenarioScope.SESSION)\n            .selector(ChaosSelector.executor(Set.of(OperationType.EXECUTOR_SUBMIT, OperationType.EXECUTOR_WORKER_RUN)))\n            .effect(ChaosEffect.delay(Duration.ofMillis(200)))\n            .build());\n\n        try (ChaosSession.ScopeBinding scope = session.bind()) {\n            myService.doWork(); // executor submissions delayed 200ms\n        }\n    }\n}\n```\n\n`ChaosAgentExtension` self-attaches the agent, opens a fresh `ChaosSession` per test, and closes it after. No `-javaagent` flag required for tests.\n\n### 3. 
Or attach at startup for production-like testing\n\n```bash\njava -javaagent:chaos-agent-bootstrap-0.1.0-SNAPSHOT.jar=configFile=/etc/chaos/plan.json \\\n     -jar your-app.jar\n```\n\n---\n\n## Core Concepts\n\n| Concept               | What it is                                                                    |\n|-----------------------|-------------------------------------------------------------------------------|\n| **Scenario**          | One selector + one effect + one activation policy                             |\n| **Selector**          | Matching rule: which JVM operation(s) trigger this scenario                   |\n| **Effect**            | What happens: delay, reject, suppress, gate, exception, corrupt, stress, skew |\n| **Activation policy** | Gating: probability, rate limit, warm-up, time window, max applications       |\n| **Session**           | Thread-local isolation scope — chaos targets only session-bound threads       |\n| **Handle**            | `AutoCloseable` returned by `activate()`; close to stop the scenario          |\n\n---\n\n## Selectors — Full Reference\n\nEvery selector factory takes a `Set\u003cOperationType\u003e`. Pass an empty set to accept every operation the selector understands. For pattern-based filters, pass a `NamePattern` (e.g. 
`NamePattern.prefix(...)`, `NamePattern.regex(...)`, `NamePattern.any()`).\n\n| Factory                                                                                                        | Intercepts                                                              |\n|----------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|\n| `ChaosSelector.executor(Set\u003cOperationType\u003e)`                                                                   | `ThreadPoolExecutor.execute()` / `submit()` / `invokeAll()`             |\n| `ChaosSelector.scheduling(Set\u003cOperationType\u003e)`                                                                 | `ScheduledExecutorService.schedule*()`                                  |\n| `ChaosSelector.thread(Set\u003cOperationType\u003e, ThreadKind)`                                                         | `Thread.start()` — platform and virtual threads                         |\n| `ChaosSelector.queue(Set\u003cOperationType\u003e)`                                                                      | `BlockingQueue.put()` / `take()` / `offer()` / `poll()`                 |\n| `ChaosSelector.async(Set\u003cOperationType\u003e)`                                                                      | `CompletableFuture.complete()` / `completeExceptionally()` / `cancel()` |\n| `ChaosSelector.network(Set\u003cOperationType\u003e[, NamePattern remoteHostPattern])`                                   | `Socket` connect / read / write, `ServerSocket.accept()`                |\n| `ChaosSelector.nio(Set\u003cOperationType\u003e[, NamePattern channelClassPattern])`                                     | `SocketChannel`, `ServerSocketChannel`, `Selector.select()`             |\n| `ChaosSelector.method(Set\u003cOperationType\u003e, NamePattern classPattern, NamePattern methodPattern)`                | Arbitrary method 
entry (`METHOD_ENTER`) and exit (`METHOD_EXIT`)        |\n| `ChaosSelector.classLoading(Set\u003cOperationType\u003e, NamePattern loaderClassPattern, NamePattern classNamePattern)` | `ClassLoader.loadClass()` / `defineClass()` / `getResource()`           |\n| `ChaosSelector.monitor(Set\u003cOperationType\u003e)`                                                                    | `synchronized` monitor enter/exit                                       |\n| `ChaosSelector.jvmRuntime(Set\u003cOperationType\u003e)`                                                                 | `currentTimeMillis()`, `nanoTime()`, `gc()`, `exit()`, `halt()`         |\n| `ChaosSelector.threadLocal(Set\u003cOperationType\u003e[, NamePattern valueClassPattern])`                               | `ThreadLocal.get()` / `ThreadLocal.set()`                               |\n| `ChaosSelector.shutdown(Set\u003cOperationType\u003e)`                                                                   | `System.exit()` / `Runtime.halt()` / shutdown hook register/remove      |\n| `ChaosSelector.httpClient(Set\u003cOperationType\u003e[, NamePattern urlPattern])`                                       | `HttpClient.send()` / `sendAsync()`                                     |\n| `ChaosSelector.jdbc()` / `ChaosSelector.jdbc(OperationType...)`                                                | JDBC `Connection`, `Statement`, `PreparedStatement`, `ResultSet`        |\n| `ChaosSelector.dns(Set\u003cOperationType\u003e[, NamePattern hostnamePattern])`                                         | `InetAddress.getAllByName()`                                            |\n| `ChaosSelector.ssl(Set\u003cOperationType\u003e)`                                                                        | `SSLEngine.wrap()` / `unwrap()` handshake                               |\n| `ChaosSelector.fileIo(Set\u003cOperationType\u003e)`                                                                     | `FileInputStream` / 
`FileOutputStream` / `RandomAccessFile` / `Files`   |\n| `ChaosSelector.stress(StressTarget)`                                                                           | Background stressor lifecycle binding                                   |\n\nFor full parameter semantics and edge-case behaviour of every selector, see [`docs/configuration-reference.md`](docs/configuration-reference.md).\n\n---\n\n## Effects — Full Reference\n\n### Choosing an effect\n\nThe decision guide below maps **testing goals** to effects. Every effect in the two reference tables that follow is listed here at least once; effects with multiple distinct use cases (for example `skewClock` with three modes) appear on multiple rows so the goal-to-effect mapping is direct.\n\n#### Latency and timing\n\n| I want to test…                                                        | Effect                                                     |\n|------------------------------------------------------------------------|------------------------------------------------------------|\n| What happens when a downstream is slow                                 | `delay(Duration)` — fixed pause                            |\n| Behaviour under variable latency (realistic network jitter)            | `delay(Duration min, Duration max)` — uniform-random pause |\n| A caller's timeout logic when the callee blocks indefinitely           | `gate(Duration maxBlock)` — block until released or capped |\n| Clock-based logic (TTLs, retries, token expiry) under continuous drift | `skewClock(Duration, ClockSkewMode.DRIFT)`                 |\n| Behaviour when the clock stops advancing                               | `skewClock(Duration, ClockSkewMode.FREEZE)`                |\n| Behaviour under a fixed clock offset (positive or negative)            | `skewClock(Duration, ClockSkewMode.FIXED)`                 |\n| Code that handles `Selector.select()` spurious wakeups                 | `spuriousWakeup()`                             
            |\n\n#### Errors and failure handling\n\n| I want to test…                                                 | Effect                                                                                                |\n|-----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|\n| A downstream returning \"not available\"                          | `reject(String message)` — throws a semantically correct exception for the intercepted operation type |\n| A call that silently fails (returns `null` / `false` / empty)   | `suppress()`                                                                                          |\n| A `CompletableFuture` completing exceptionally mid-pipeline     | `exceptionalCompletion(FailureKind, String)`                                                          |\n| A callee throwing a *specific* exception class                  | `injectException(String className, String message)`                                                   |\n| A return value that is technically valid but semantically wrong | `corruptReturnValue(ReturnValueStrategy.NULL / ZERO / EMPTY / BOUNDARY_MAX / BOUNDARY_MIN)`           |\n\n#### Resource pressure (background stressors)\n\n| I want to test…                                                | Effect                                                   |\n|----------------------------------------------------------------|----------------------------------------------------------|\n| Behaviour under sustained heap pressure                        | `heapPressure(long bytes, int chunkSize)`                |\n| GC behaviour under high allocation churn                       | `gcPressure(long bytesPerSecond, Duration)`              |\n| Metaspace (class metadata) pressure from dynamic class loading | `metaspacePressure(int classCount, int fieldsPerClass)`  |\n| Direct (off-heap) buffer pressure             
                 | `directBufferPressure(long totalBytes, int bufferSize)`  |\n| JIT code cache filling up (inline-cache thrash, deopt)         | `codeCachePressure(int classCount, int methodsPerClass)` |\n| String intern pool growth (permanent root retention)           | `stringInternPressure(int count, int length)`            |\n| Reference-queue flood (phantom-reference reclamation delay)    | `referenceQueueFlood(int count, Duration interval)`      |\n| Finalizer backlog stalling GC                                  | `finalizerBacklog(int objectCount, Duration delay)`      |\n\n#### Threading and concurrency\n\n| I want to test…                                    | Effect                                                                                                        |\n|----------------------------------------------------|---------------------------------------------------------------------------------------------------------------|\n| A real JVM monitor deadlock between N participants | `deadlock(int participantCount)` — requires `ActivationPolicy.withDestructiveEffects()`                       |\n| Thread-leak behaviour (unbounded thread creation)  | `threadLeak(int count, String prefix, boolean daemon)` — requires `ActivationPolicy.withDestructiveEffects()` |\n| `ThreadLocal` leaks on a pooled thread             | `threadLocalLeak(int entries, int valueSize)`                                                                 |\n| Monitor contention on a shared lock                | `monitorContention(...)`                                                                                      |\n| A keep-alive thread preventing clean shutdown      | `keepAlive(String name, boolean daemon, Duration heartbeat)`                                                  |\n\n#### JVM-wide pause pressure\n\n| I want to test…                                    | Effect                                                                                                  
|\n|----------------------------------------------------|---------------------------------------------------------------------------------------------------------|\n| Application behaviour during stop-the-world pauses | `safepointStorm(Duration gcInterval)` — periodic `System.gc()` + `Instrumentation.retransformClasses()` |\n\nTwo behavioural notes that the table cannot express:\n\n- **Effects compose across matching scenarios.** Delays from all matching scenarios accumulate; terminal actions (`reject` / `suppress` / `exception` / `corruptReturnValue`) resolve by `precedence` — higher wins. This lets you layer e.g. \"slow every call + reject 10 % of calls\" without conflict.\n- **Stressors are background-attached, not per-call.** Their selector pairing (`ChaosSelector.stress(StressTarget)`) is a lifecycle binding — the stressor starts when the scenario activates and persists until the handle is closed or `activeFor` elapses, independent of operation traffic on the JVM.\n\n### Inline effects (execute on the calling thread)\n\n| Factory                                                          | Description                                                                   |\n|------------------------------------------------------------------|-------------------------------------------------------------------------------|\n| `ChaosEffect.delay(Duration)`                                    | Fixed pause before the operation proceeds                                     |\n| `ChaosEffect.delay(Duration min, Duration max)`                  | Uniform random pause in `[min, max]`                                          |\n| `ChaosEffect.gate(Duration maxBlock)`                            | Block until `handle.release()` is called (or `maxBlock` elapses)              |\n| `ChaosEffect.reject(String message)`                             | Throw a semantically correct exception for the operation type                 |\n| `ChaosEffect.suppress()`                                 
        | Silently discard; return `null` / `false` per operation contract              |\n| `ChaosEffect.exceptionalCompletion(FailureKind, String message)` | Complete a `CompletableFuture` with a failure                                 |\n| `ChaosEffect.injectException(String className, String message)`  | Inject arbitrary exception at method entry via reflection                     |\n| `ChaosEffect.corruptReturnValue(ReturnValueStrategy)`            | Corrupt return value: `NULL`, `ZERO`, `EMPTY`, `BOUNDARY_MAX`, `BOUNDARY_MIN` |\n| `ChaosEffect.skewClock(Duration, ClockSkewMode)`                 | Skew `currentTimeMillis()` / `nanoTime()`: `FIXED`, `DRIFT`, `FREEZE`         |\n| `ChaosEffect.spuriousWakeup()`                                   | Force `Selector.select()` to return 0 immediately                             |\n\n### Background stressor effects\n\n| Factory                                                                        | What it does                                           |    Recoverable?     
|\n|--------------------------------------------------------------------------------|--------------------------------------------------------|:-------------------:|\n| `ChaosEffect.heapPressure(long bytes, int chunkSizeBytes)`                     | Retain `byte[]` allocations on heap                    |          ✅          |\n| `ChaosEffect.keepAlive(String threadName, boolean daemon, Duration heartbeat)` | Spawn an idle keep-alive thread                        |          ✅          |\n| `ChaosEffect.metaspacePressure(int classCount, int fieldsPerClass)`            | Define synthetic classes into an isolated classloader  |     ✅ (slow GC)     |\n| `ChaosEffect.directBufferPressure(long totalBytes, int bufferSizeBytes)`       | Allocate off-heap `ByteBuffer.allocateDirect`          |  ✅ (GC-dependent)   |\n| `ChaosEffect.gcPressure(long allocationRateBytesPerSecond, Duration duration)` | Continuously allocate short-lived objects              |          ✅          |\n| `ChaosEffect.finalizerBacklog(int objectCount, Duration finalizerDelay)`       | Flood the finalizer queue                              |          ✅          |\n| `ChaosEffect.deadlock(int participantCount)`                                   | Create a real JVM monitor deadlock between N threads   |          ✅          |\n| `ChaosEffect.threadLeak(int threadCount, String namePrefix, boolean daemon)`   | Start permanently-parked threads that are never joined |          ✅          |\n| `ChaosEffect.threadLocalLeak(int entriesPerThread, int valueSizeBytes)`        | Leak `ThreadLocal` entries on a background thread      |     ✅ (partial)     |\n| `ChaosEffect.monitorContention(…)`                                             | Saturate a shared lock with background contenders      |          ✅          |\n| `ChaosEffect.codeCachePressure(int classCount, int methodsPerClass)`           | Generate ByteBuddy classes to fill the JIT code cache  |          ✅          |\n| `ChaosEffect.safepointStorm(Duration 
gcInterval)`                              | Trigger periodic GC + retransformation (STW pauses)    |          ✅          |\n| `ChaosEffect.stringInternPressure(int internCount, int stringLengthBytes)`     | Intern unique strings into the JVM string pool         | ✅ (pool is GC root) |\n| `ChaosEffect.referenceQueueFlood(int referenceCount, Duration floodInterval)`  | Flood the JVM reference queue with phantom refs        |          ✅          |\n\n\u003e ⚠️ `deadlock()` and `threadLeak()` with `daemon=false` prevent a clean JVM exit until the activation handle is closed. Both require `ActivationPolicy.withDestructiveEffects()` at registration time. Closing the handle interrupts all participating threads and releases all locks.\n\nFor full parameter semantics, bounds, and edge-case behaviour of every effect, see [`docs/configuration-reference.md`](docs/configuration-reference.md).\n\n---\n\n## Activation Policy\n\n`ActivationPolicy` is a record. Use the static factories for the common cases, or construct the canonical record directly for fine-grained control. 
Probability must be in `(0.0, 1.0]`. Pass `null` or omit the JSON field for the `1.0` default; to disable a scenario, omit it entirely.\n\n```java\n// Always fire (default)\nActivationPolicy fire = ActivationPolicy.always();\n\n// Fire on every match, but start paused until handle.start() is called\nActivationPolicy armed = ActivationPolicy.manual();\n\n// Explicit opt-in for deadlock() / threadLeak()\nActivationPolicy destructive = ActivationPolicy.withDestructiveEffects();\n\n// Fine-grained: 30% probability, rate-limit 10/s, warm-up, auto-expire, cap, seed\nActivationPolicy tuned = new ActivationPolicy(\n    ActivationPolicy.StartMode.AUTOMATIC,\n    0.30,                                              // probability (in (0, 1])\n    5L,                                                // activateAfterMatches (warm-up)\n    100L,                                              // maxApplications\n    Duration.ofSeconds(30),                            // activeFor\n    new ActivationPolicy.RateLimit(10, Duration.ofSeconds(1)), // rate limit: 10 per second\n    42L,                                               // randomSeed\n    false);                                            // allowDestructiveEffects\n```\n\nAll guards compose as AND: an effect is applied only when every configured guard passes. The nullable components (`maxApplications`, `activeFor`, the rate limit, and `randomSeed`, typed `Long` / `Duration` / `RateLimit` / `Long`) may be `null` to opt out of that axis.\n\n---\n\n## Session Isolation\n\n```java\n@Test void testA(ChaosSession sessionA) {\n    sessionA.activate(delayScenario);\n    try (var b = sessionA.bind()) {\n        // only threads carrying sessionA's ID see this chaos\n        executor.execute(sessionA.wrap(() -\u003e myService.doWork()));\n    }\n}\n\n@Test void testB(ChaosSession sessionB) {\n    // completely independent, even in parallel — different session UUID\n    sessionB.activate(rejectScenario);\n}\n```\n\n---\n\n## Recipes\n\nThree worked examples showing how selectors, effects, and activation policy compose for real testing goals. 
Each recipe is self-contained — copy, adapt the selector pattern, and run.\n\n### Recipe 1 — Flaky downstream under a retry policy\n\n**Goal.** Verify that a client's retry logic correctly recovers when 30 % of outgoing HTTP calls to a specific host fail with a connection reset, mixed with variable latency on the successes.\n\n```java\nvar flakyDownstream = ChaosScenario.builder(\"flaky-payments-api\")\n    .selector(ChaosSelector.httpClient(\n        Set.of(OperationType.HTTP_CLIENT_SEND, OperationType.HTTP_CLIENT_SEND_ASYNC),\n        NamePattern.prefix(\"https://payments.internal/\")))\n    .effect(ChaosEffect.reject(\"chaos: connection reset by peer\"))\n    .activationPolicy(new ActivationPolicy(\n        ActivationPolicy.StartMode.AUTOMATIC,\n        0.30,                                               // 30 % of matched calls reject\n        0L, null, null, null,\n        42L,                                                // deterministic seed — reproducible failure pattern\n        false))\n    .precedence(10)                                         // reject wins over delay when both match\n    .build();\n\nvar slowDownstream = ChaosScenario.builder(\"slow-payments-api\")\n    .selector(ChaosSelector.httpClient(\n        Set.of(OperationType.HTTP_CLIENT_SEND, OperationType.HTTP_CLIENT_SEND_ASYNC),\n        NamePattern.prefix(\"https://payments.internal/\")))\n    .effect(ChaosEffect.delay(Duration.ofMillis(50), Duration.ofMillis(400)))\n    .build();\n\nsession.activate(flakyDownstream);\nsession.activate(slowDownstream);\n// 100 % of calls experience jitter; 30 % of calls additionally fail.\n// The retry policy under test sees a realistic mixed-failure signal.\n```\n\n**What this exercises.** Delays accumulate across matching scenarios, but terminal actions resolve by precedence — the `reject` (precedence=10) wins over the default `delay`, so the 30 % of calls that lose the probability roll see `reject` *after* the latency jitter is applied. 
`randomSeed=42L` makes the pattern reproducible across runs — same input produces same failure sequence, so a failing test bisects cleanly.\n\n### Recipe 2 — Circuit breaker verification under sustained failure\n\n**Goal.** Prove that a Resilience4j `CircuitBreaker` transitions to `OPEN` state after 20 downstream failures, then rejects further calls fast with `CallNotPermittedException`.\n\n```java\nvar sustainedFailure = ChaosScenario.builder(\"circuit-breaker-trigger\")\n    .selector(ChaosSelector.jdbc(OperationType.JDBC_STATEMENT_EXECUTE))\n    .effect(ChaosEffect.injectException(\n        \"java.sql.SQLTransientConnectionException\",\n        \"chaos: pool exhausted\"))\n    .activationPolicy(new ActivationPolicy(\n        ActivationPolicy.StartMode.AUTOMATIC,\n        1.0,                                                // 100 % — every call fails\n        0L,\n        20L,                                                // hard cap: after 20 applications, stop\n        null, null, null,\n        false))\n    .build();\n\nsession.activate(sustainedFailure);\n\n// Drive 25 calls through the circuit breaker.\n// Assumes the breaker is configured to open after 20 recorded failures.\n// Calls 1-20: fail with SQLTransientConnectionException — circuit breaker counts failures.\n// Calls 21-25: fail with CallNotPermittedException — circuit breaker is now OPEN, fast-rejecting.\nfor (int i = 0; i \u003c 25; i++) {\n    try {\n        circuitBreaker.executeCallable(() -\u003e jdbcTemplate.queryForList(\"SELECT 1\"));\n    } catch (CallNotPermittedException e) {\n        // calls 21-25: breaker is OPEN; assert the transition happened after exactly 20 failures\n    } catch (Exception e) {\n        // calls 1-20: injected failure, recorded by the circuit breaker\n    }\n}\n```\n\n**What this exercises.** `maxApplications=20L` caps the chaos at exactly 20 effect applications. From the 21st matched call onward the operation passes through unaffected, so the circuit breaker's fast-reject behaviour is observable on real calls instead of being masked by further injected failures. 
The CAS loop on `appliedCount` (Floor -2, step 8) guarantees the cap holds even under concurrent test threads hitting the connection pool in parallel.\n\n### Recipe 3 — Memory pressure soak without crashing CI\n\n**Goal.** Verify that a service holds its SLO under sustained heap pressure for 30 seconds, then the chaos self-releases so CI does not OOM-kill the test JVM.\n\n```java\nvar heapSoak = ChaosScenario.builder(\"heap-pressure-soak\")\n    .selector(ChaosSelector.stress(StressTarget.HEAP))\n    .effect(ChaosEffect.heapPressure(\n        512L * 1024 * 1024,                                 // 512 MiB total retention\n        1024 * 1024))                                       // 1 MiB chunks — avoids single-allocation OOM\n    .activationPolicy(new ActivationPolicy(\n        ActivationPolicy.StartMode.AUTOMATIC,\n        1.0, 0L, null,\n        Duration.ofSeconds(30),                             // auto-release after 30 s — CI safety\n        null, null, false))\n    .build();\n\ntry (var handle = session.activate(heapSoak)) {\n    // Run your soak test here. The stressor holds 512 MiB for up to 30 s,\n    // then self-releases. If your test finishes sooner, the try-with-resources\n    // closes the handle and the heap is freed immediately.\n    runWorkloadFor(Duration.ofSeconds(20));\n}\n// After handle.close() or activeFor expiry, the 512 MiB is eligible for GC.\n// CI's next test starts from a clean heap.\n```\n\n**What this exercises.** Stressors are background-attached (they do not fire per call) — activation spins up the stressor once, it holds memory for the lifetime of the handle, and closing the handle (via try-with-resources or `activeFor` expiry) releases the retained arrays. 
Using 1 MiB chunks instead of one 512 MiB allocation avoids a single `OutOfMemoryError` if the JVM is already close to its ceiling — pressure builds gradually, so the service under test experiences GC thrash rather than instant death.\n\n---\n\n## Startup Configuration (JSON)\n\n```json\n{\n  \"name\": \"soak-test-plan\",\n  \"scenarios\": [\n    {\n      \"id\": \"executor-latency\",\n      \"scope\": \"JVM\",\n      \"selector\": { \"type\": \"executor\" },\n      \"effect\": { \"type\": \"delay\", \"minDelay\": \"PT0.1S\", \"maxDelay\": \"PT0.5S\" },\n      \"activationPolicy\": { \"probability\": 0.5 }\n    }\n  ]\n}\n```\n\n```bash\n-javaagent:agent.jar=configFile=/etc/chaos/plan.json\n-javaagent:agent.jar=configBase64=\u003cbase64-json\u003e\n-javaagent:agent.jar=configJson={\"name\":\"...\"}\n-javaagent:agent.jar=configFile=/etc/plan.json,debugDumpOnStart=true\n```\n\nEnvironment variables: `MACSTAB_CHAOS_CONFIG_FILE`, `MACSTAB_CHAOS_CONFIG_JSON`, `MACSTAB_CHAOS_CONFIG_BASE64`.\n\n### Live config reload (file watch)\n\nPoint the agent at a file and enable watch mode — the agent polls the file at the configured interval, computes the diff, and updates only what changed while the JVM runs:\n\n```bash\n# poll every 500 ms\n-javaagent:agent.jar=configFile=/etc/chaos/plan.json,configWatchInterval=500\n\n# or via environment\nMACSTAB_CHAOS_CONFIG_FILE=/etc/chaos/plan.json\nMACSTAB_CHAOS_WATCH_INTERVAL=500\n```\n\nThe diff algorithm is structural: scenarios with the same `id` **and** identical content are kept running untouched. Scenarios that are new or whose content changed are stopped and re-activated. Scenarios that were removed are stopped. 
Programmatically activated scenarios (via `ChaosControlPlane.activate()`) are never touched by the poller.\n\n---\n\n## Diagnostics\n\n```java\nChaosDiagnostics diag = controlPlane.diagnostics();\nChaosDiagnostics.Snapshot snap = diag.snapshot();\n\nsnap.scenarios().forEach(r -\u003e\n    System.out.printf(\"%s: state=%s matched=%d applied=%d reason=%s%n\",\n        r.id(), r.state(), r.matchedCount(), r.appliedCount(), r.reason()));\n\nSystem.out.println(diag.debugDump()); // full text dump\n```\n\nJMX MBean: `com.macstab.chaos.jvm:type=ChaosDiagnostics` — inspect from `jconsole` without code changes.\n\n**Diagnosing zero applications**:\n- `matchedCount \u003e 0 \u0026\u0026 appliedCount == 0` → selector works; activation policy is filtering\n- `matchedCount == 0` → selector not matching; verify operation type, class name pattern, and `session.bind()` is active\n\n---\n\n## Architecture\n\nThe agent is a multi-module Gradle project organised as a strict directed dependency graph: the stable public API at the top, bytecode instrumentation and the bootstrap bridge at the bottom, with runtime core, startup configuration, and framework integrations layered between. Every module is independently versioned and independently publishable — consumers depend on `chaos-agent-api` for contract stability and pull in exactly one integration module (test or runtime, Boot 3 or Boot 4) for the chosen environment.\n\n### Module responsibilities\n\n- **Public contract** — `chaos-agent-api` is the only module application code compiles against. Sealed hierarchies (`ChaosSelector`, `ChaosEffect`), records (`ChaosScenario`, `ChaosPlan`, `ActivationPolicy`, `NamePattern`), and the `ChaosControlPlane` / `ChaosSession` interfaces form the stable surface. 
Everything else is implementation.\n- **Runtime core** — `chaos-agent-core` holds the scenario registry, the 8-gate evaluation pipeline, session scoping, and every stressor implementation (heap, metaspace, direct-buffer, GC, finalizer, deadlock, thread-leak, monitor, code-cache, safepoint, string-intern, reference-queue, thread-local, keep-alive, virtual-thread carrier pinning). Hot-path code (`ChaosDispatcher`, `ScenarioController`) is profiled against JMH benchmarks.\n- **Bytecode instrumentation** — `chaos-agent-instrumentation-jdk` defines the ByteBuddy `@Advice` classes and the bootstrap-classloader bridge. `BootstrapDispatcher` (bootstrap-resident, appended via `Instrumentation.appendToBootstrapClassLoaderSearch`) is the only path by which instrumented JDK methods reach the agent classloader — a `volatile` two-field publication protocol guarantees JMM visibility of the 57-slot `MethodHandle[]` dispatch table.\n- **Agent lifecycle** — `chaos-agent-bootstrap` owns the `premain` / `agentmain` entry points, the singleton `ChaosControlPlane` installation, and JMX MBean registration.\n- **Configuration resolution** — `chaos-agent-startup-config` resolves plans from `configFile`, `configJson`, `configBase64`, and environment variables, deserialises them via Jackson polymorphic mapping, and runs the live-reload file watcher.\n- **Test integrations** — `chaos-agent-testkit` provides `ChaosAgentExtension` (JUnit 5) and `ChaosPlatform.installLocally()` for self-attach. 
`chaos-agent-spring-boot3-test-starter` and `chaos-agent-spring-boot4-test-starter` compose `@ChaosTest` on top and wire a class-scoped `ChaosSession` into Spring Boot tests.\n- **Runtime integrations** — `chaos-agent-spring-boot3-starter` and `chaos-agent-spring-boot4-starter` expose `ChaosControlPlane` as a Spring bean and register the `/actuator/chaos` endpoint for live plan activation against running applications.\n\n### Runtime dispatch\n\nEvery intercepted JVM operation passes through five stages; the first four reach a decision and the fifth applies it:\n\n1. **Instrumented JDK method** — e.g. `ThreadPoolExecutor.execute()`, with ByteBuddy advice woven into the method body at agent startup (not called — *copied* inline)\n2. **`BootstrapDispatcher`** — bootstrap-classloader-resident static entry point holding the 57-slot `MethodHandle[]` and a `ThreadLocal\u003cint[]\u003e` reentrancy depth guard (a one-element `int` array avoids `Integer` autoboxing per call)\n3. **`BridgeDelegate`** — agent-classloader bridge that unboxes arguments and forwards to the active runtime\n4. **`ChaosDispatcher` / `ScenarioController`** — registry lookup + the 8-gate evaluation pipeline: `started` → session match → selector match → `matchedCount++` + activation window → warm-up → rate limit → probability → max-applications CAS\n5. **`ChaosEffect`** — terminal action executed inline on the calling thread (`delay` / `reject` / `suppress` / `gate` / `exception` / `corruptReturnValue` / `skewClock` / stressor handle)\n\nThe reentrancy guard short-circuits when chaos code itself calls instrumented JDK methods (e.g. a delay effect's own `Thread.sleep` would otherwise recurse into `THREAD_PARK` interception). 
Removing the `ThreadLocal` entry in the `finally` block is critical for thread-pool longevity — without it the entry accumulates on pooled threads, creating a slow per-thread memory leak.\n\n### Structural diagram\n\n```plantuml\n@startuml\n!theme plain\ntitle macstab-chaos-jvm-agent — Module Architecture\n\nskinparam componentStyle rectangle\nskinparam component {\n  BackgroundColor\u003c\u003capi\u003e\u003e #E8F4FD\n  BackgroundColor\u003c\u003ccore\u003e\u003e #FFF3E0\n  BackgroundColor\u003c\u003cbridge\u003e\u003e #F3E5F5\n  BackgroundColor\u003c\u003cboot\u003e\u003e #E8F5E9\n  BackgroundColor\u003c\u003cconfig\u003e\u003e #FFF8E1\n  BackgroundColor\u003c\u003ctest\u003e\u003e #FCE4EC\n  BackgroundColor\u003c\u003cspring\u003e\u003e #E0F2F1\n}\n\npackage \"Public Contract\" {\n  [chaos-agent-api] \u003c\u003capi\u003e\u003e\n}\n\npackage \"Runtime\" {\n  [chaos-agent-core]                  \u003c\u003ccore\u003e\u003e\n  [chaos-agent-instrumentation-jdk]   \u003c\u003cbridge\u003e\u003e\n  [chaos-agent-startup-config]        \u003c\u003cconfig\u003e\u003e\n  [chaos-agent-bootstrap]             \u003c\u003cboot\u003e\u003e\n}\n\npackage \"Test-Time Integrations\" {\n  [chaos-agent-testkit]                     \u003c\u003ctest\u003e\u003e\n  [chaos-agent-spring-boot3-test-starter]   \u003c\u003ctest\u003e\u003e\n  [chaos-agent-spring-boot4-test-starter]   \u003c\u003ctest\u003e\u003e\n}\n\npackage \"Runtime Integrations\" {\n  [chaos-agent-spring-boot3-starter]  \u003c\u003cspring\u003e\u003e\n  [chaos-agent-spring-boot4-starter]  \u003c\u003cspring\u003e\u003e\n}\n\n[chaos-agent-core]                --\u003e [chaos-agent-api]\n[chaos-agent-instrumentation-jdk] --\u003e [chaos-agent-core]\n[chaos-agent-instrumentation-jdk] --\u003e [chaos-agent-api]\n[chaos-agent-startup-config]      --\u003e [chaos-agent-api]\n\n[chaos-agent-bootstrap] --\u003e [chaos-agent-core]\n[chaos-agent-bootstrap] --\u003e [chaos-agent-instrumentation-jdk]\n[chaos-agent-bootstrap] 
--\u003e [chaos-agent-startup-config]\n\n[chaos-agent-testkit]                   --\u003e [chaos-agent-bootstrap]\n[chaos-agent-testkit]                   --\u003e [chaos-agent-api]\n[chaos-agent-spring-boot3-test-starter] --\u003e [chaos-agent-testkit]\n[chaos-agent-spring-boot4-test-starter] --\u003e [chaos-agent-testkit]\n\n[chaos-agent-spring-boot3-starter] --\u003e [chaos-agent-bootstrap]\n[chaos-agent-spring-boot4-starter] --\u003e [chaos-agent-bootstrap]\n\nnote right of [chaos-agent-api]\n  Sealed hierarchies:\n    ChaosSelector (19 subtypes)\n    ChaosEffect   (23 subtypes)\n  Records:\n    ChaosScenario, ChaosPlan,\n    ActivationPolicy, NamePattern\n  Enums:\n    OperationType (57 values)\nend note\n\nnote bottom of [chaos-agent-instrumentation-jdk]\n  ByteBuddy @Advice +\n  BootstrapDispatcher (bootstrap CL)\n  — 57 MethodHandles\n  — ThreadLocal\u003cint[]\u003e reentrancy guard\n  — volatile two-field publication\nend note\n\n@enduml\n```\n\n`plantuml` code fences do not render on GitHub directly — copy the source into the IntelliJ PlantUML plugin, `plantuml.jar`, or [plantuml.com/plantuml](https://www.plantuml.com/plantuml) to render. 
Pre-rendered diagrams for this and all downstream sequences live in [`docs/overall-agent.md`](docs/overall-agent.md).\n\n### Module reference\n\n| Module                                  | Role                                                                      |\n|-----------------------------------------|---------------------------------------------------------------------------|\n| `chaos-agent-api`                       | **Stable public API** — the only module application code should depend on |\n| `chaos-agent-bootstrap`                 | Agent entry point (`premain`/`agentmain`), singleton, MBean registration  |\n| `chaos-agent-core`                      | Scenario registry, evaluation pipeline, session scoping, stressors        |\n| `chaos-agent-instrumentation-jdk`       | ByteBuddy advice, bootstrap bridge (57 interception handles)              |\n| `chaos-agent-startup-config`            | JSON/base64/file config resolution and Jackson mapping                    |\n| `chaos-agent-testkit`                   | JUnit 5 extension, `ChaosPlatform.installLocally()` for self-attach       |\n| `chaos-agent-spring-boot3-test-starter` | `@ChaosTest` + `ChaosAgentExtension` for Spring Boot 3 tests              |\n| `chaos-agent-spring-boot4-test-starter` | `@ChaosTest` + `ChaosAgentExtension` for Spring Boot 4 tests              |\n| `chaos-agent-spring-boot3-starter`      | Runtime starter with Actuator endpoint for Spring Boot 3                  |\n| `chaos-agent-spring-boot4-starter`      | Runtime starter with Actuator endpoint for Spring Boot 4                  |\n| `chaos-agent-examples`                  | Runnable usage examples                                                   |\n\n---\n\n## Spring Boot Integration\n\nTwo axes, four modules: test-time vs runtime, Boot 3 vs Boot 4. 
All four are `compileOnly` against their Spring Boot BOM — they are inert until the consuming application supplies Spring Boot on the classpath.\n\n### Test Starter\n\nThe test starters give a `@SpringBootTest` class one-annotation access to chaos instrumentation. Add the dependency, put `@ChaosTest` on the class, declare a `ChaosSession` parameter on any test method.\n\n```kotlin\n// build.gradle.kts — Spring Boot 3\ntestImplementation(\"com.macstab:chaos-agent-spring-boot3-test-starter:0.1.0-SNAPSHOT\")\n\n// Spring Boot 4\ntestImplementation(\"com.macstab:chaos-agent-spring-boot4-test-starter:0.1.0-SNAPSHOT\")\n```\n\n```java\n@ChaosTest\nclass OrderServiceChaosTest {\n\n    @Test\n    void slowDatabaseRejectsOrdersGracefully(ChaosSession chaos) {\n        chaos.activate(ChaosScenario.builder(\"slow-jdbc\")\n            .scope(ChaosScenario.ScenarioScope.SESSION)\n            .selector(ChaosSelector.jdbc())\n            .effect(ChaosEffect.delay(Duration.ofSeconds(3)))\n            .build());\n\n        try (var binding = chaos.bind()) {\n            assertThrows(OrderTimeoutException.class,\n                () -\u003e orderService.placeOrder(testOrder));\n        }\n    }\n}\n```\n\n`@ChaosTest` composes `@SpringBootTest` and `@ExtendWith(ChaosAgentExtension.class)`. The extension self-attaches the agent (idempotent across the JVM), opens a class-scoped `ChaosSession`, injects it into test method parameters, and closes it after the last test method runs. `@Nested` classes inherit the same session. `ChaosControlPlane` can also be injected as a parameter. 
No `-javaagent` flag is needed for test JVMs.\n\n### Runtime Starter\n\nThe runtime starters wire the chaos agent into a running Spring Boot application and optionally expose a Spring Boot Actuator endpoint for runtime activation and control.\n\n```kotlin\n// build.gradle.kts — Spring Boot 3\nimplementation(\"com.macstab:chaos-agent-spring-boot3-starter:0.1.0-SNAPSHOT\")\n\n// Spring Boot 4\nimplementation(\"com.macstab:chaos-agent-spring-boot4-starter:0.1.0-SNAPSHOT\")\n```\n\n```yaml\n# application.yml — opt-in required; all flags default to false\nmacstab:\n  chaos:\n    enabled: true\n    config-file: /etc/chaos/soak-plan.json  # optional startup plan\n    debug-dump-on-start: false\n    actuator:\n      enabled: true   # exposes /actuator/chaos — protect with Spring Security\n```\n\nWhen `enabled: true`, the starter installs the agent and exposes `ChaosControlPlane` as a Spring bean with `destroyMethod = \"close\"`. If `config-file` is set, the plan is loaded and activated on `ApplicationReadyEvent`. When `actuator.enabled: true` (and `spring-boot-actuator` is on the classpath), the `/actuator/chaos` endpoint becomes available:\n\n```bash\n# Inspect active scenarios\ncurl http://localhost:8080/actuator/chaos\n\n# Activate a plan inline\ncurl -X POST http://localhost:8080/actuator/chaos \\\n     -H 'Content-Type: application/json' \\\n     -d '{\"name\":\"latency\",\"scenarios\":[{\"id\":\"slow-executor\",\"scope\":\"JVM\",\"selector\":{\"type\":\"executor\"},\"effect\":{\"type\":\"delay\",\"minDelay\":\"PT0.2S\",\"maxDelay\":\"PT0.5S\"}}]}'\n\n# Stop a specific scenario by ID\ncurl -X DELETE http://localhost:8080/actuator/chaos/slow-executor\n\n# Stop all starter-managed scenarios\ncurl -X DELETE http://localhost:8080/actuator/chaos\n```\n\n\u003e The `/actuator/chaos` endpoint can activate arbitrary fault injection in the live JVM. 
Protect it as you would a shutdown endpoint — never expose it unauthenticated to the public internet.\n\nFor deep technical detail on all four modules — lifecycle, conditional wiring, `@Nested` session propagation, `ChaosHandleRegistry` design, `ChaosAgentInitializer` timing, Boot 3 vs Boot 4 factory differences, and PlantUML sequence diagrams — see [`docs/spring-integration.md`](docs/spring-integration.md).\n\n---\n\n## Performance\n\nThe agent is designed to be invisible on any I/O-bound path. All numbers below are for the **hot path after JIT warm-up** (~10 000 invocations for C2 tier on HotSpot).\n\n### Real-world service impact\n\nThe number that matters is not nanoseconds per call — it is the total overhead on your service\nwhile chaos is active. For a typical Java microservice handling **2 000 requests/sec** with a\nrealistic mix of file reads, DNS lookups, SSL connections, and timed retries:\n\n| Agent state                                         | Total dispatch overhead | % of one CPU core (2.5 GHz) |\n|-----------------------------------------------------|-------------------------|-----------------------------|\n| Agent installed, **no scenarios**                   | ~0.31 ms/sec            | **0.031 %**                 |\n| **4 active scenarios**, all Phase 4 operation types | ~1.74 ms/sec            | **0.174 %**                 |\n| **4 exhausted scenarios** left resident in registry | ~1.74 ms/sec            | **0.174 %**                 |\n\nUnder two tenths of one percent of a single core. 
That is the tax for running four simultaneous chaos scenarios\nacross all instrumented operation types in a busy service.\n\nThe practical implication by operation type:\n\n| Operation                      | Cost matters when…                                                    | Cost is negligible when…                                                |\n|--------------------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------|\n| File I/O read (page cache hot) | Exhausted scenarios left resident; \u003e100 K reads/sec                   | Container under memory pressure, page cache evicted → syscall dominates |\n| DNS resolution                 | JVM address cache hit + exhausted scenarios                           | Real DNS query involved → network roundtrip (≥500 µs) dwarfs 300 ns     |\n| SSL handshake                  | Never — TLS crypto (1–10 ms) is always 3–4 orders of magnitude larger | Always                                                                  |\n| Thread.sleep                   | Never — the sleep duration dominates completely                       | Always                                                                  |\n\n**One rule to remember:** call `ChaosActivationHandle.stop()` when a scenario is done.\nAn exhausted scenario (one that hit `maxApplications`) costs as much to evaluate per call as one\nthat is actively firing — and delivers nothing. 
Stopping it returns overhead to the zero-scenario\nfloor immediately.\n\n### Hot-path overhead targets\n\n| Scenario                                       | Target                 | What drives the cost                                                  |\n|------------------------------------------------|------------------------|-----------------------------------------------------------------------|\n| Agent installed, zero active scenarios         | **\u003c 60 ns**            | Registry empty check + early return                                   |\n| One scenario active, no selector match         | **\u003c 100 ns**           | Operation-type mismatch exits before pattern evaluation               |\n| One scenario active, match, no terminal effect | **\u003c 300 ns**           | Full 8-check evaluation pipeline, no effect applied                   |\n| Session scope miss (wrong session ID)          | **\u003c 20 ns additional** | ThreadLocal read + identity compare, exits before selector evaluation |\n| 10 active scenarios, one match                 | **\u003c 1 µs**             | Linear registry scan, all misses exit at selector check               |\n\nFor reference: HikariCP connection borrow ~5–15 µs · local TCP roundtrip ~50–200 µs · `Thread.sleep(1)` ~1 ms.\n\n### What the JIT does\n\nByteBuddy advice bytecode is **copied verbatim into the target method body** at retransformation time — not called, *inlined at the bytecode level*. After JIT warm-up, the C2 compiler inlines the `MethodHandle.invoke()` dispatch chain through `ChaosBridge` into `ChaosDispatcher`. In the zero-scenario case the entire hot path reduces to a null check and an untaken branch.\n\nThe `volatile` read of `ChaosDispatcher`'s scenario registry is a single acquire load. On x86 TSO (total store order) the hardware guarantees load-load ordering without an `MFENCE` instruction — the acquire semantics cost zero additional cycles versus a plain read on Intel/AMD. 
On AArch64 it compiles to `LDAR` (load-acquire), which keeps later loads and stores from being reordered ahead of the registry read.\n\n### Benchmarks\n\n`chaos-agent-benchmarks` contains a full JMH 1.37 suite across JDBC, HTTP client, Thread, DNS,\nSSL, and File I/O hot paths at all scenario-count variants. Run with:\n\n```bash\n./gradlew :chaos-agent-benchmarks:run\n```\n\nSee [`docs/benchmarks.md`](docs/benchmarks.md) for the full analysis: JIT warm-up reasoning,\nper-operation throughput impact tables, CPU cycle breakdown at 2.5 GHz, and realistic mixed\nmicroservice modelling.\n\n---\n\n## Build\n\n```bash\n./gradlew build                        # compile + test all modules\n./gradlew test                         # tests only\n./gradlew :chaos-agent-bootstrap:jar   # produce the agent JAR\n./gradlew :chaos-agent-benchmarks:run  # run JMH benchmarks\n```\n\nRequires JDK 21+ at runtime. Build toolchain targets JDK 25; `--release 21` is enforced.\n\n---\n\n## Detailed Documentation\n\nInternal architecture documentation lives in [`docs/`](docs/):\n\n| Document                                              | What it covers                                                                                                            |\n|-------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|\n| [`overall-agent.md`](docs/overall-agent.md)           | System architecture, all analysis dimensions, stack walkdown, PlantUML diagrams                                           |\n| [`api.md`](docs/api.md)                               | Stable API contract: builders, selectors, effects, diagnostics                                                            |\n| [`core.md`](docs/core.md)                             | Evaluation pipeline, session scoping, stressor lifecycle, JMM analysis                                                    |\n| 
[`instrumentation.md`](docs/instrumentation.md)       | ByteBuddy advice, bootstrap bridge, reentrancy guard, 57-handle table                                                     |\n| [`bootstrap.md`](docs/bootstrap.md)                   | Agent initialization, self-attach, MBean registration                                                                     |\n| [`startup-config.md`](docs/startup-config.md)         | Config source resolution, JSON schema, path safety                                                                        |\n| [`testkit.md`](docs/testkit.md)                       | JUnit 5 extension, session lifecycle, anti-patterns                                                                       |\n| [`spring-integration.md`](docs/spring-integration.md) | Spring Boot 3 and 4 starters: `@ChaosTest`, `ChaosAgentExtension`, Actuator endpoint, configuration reference             |\n| [`benchmarks.md`](docs/benchmarks.md)                 | JMH benchmark suite: hot-path targets, JIT analysis, result interpretation, `ChaosDispatcher` vs `ChaosRuntime` profiling |\n\n---\n\n## License\n\nApache License 2.0 — see [LICENSE](LICENSE). Use it in production, ship it in your products, fork it, build a business around it. 
The only thing you cannot do is claim you wrote it.\n\n---\n\n## About the Engineer\n\nThis three-repo stack — `macstab-chaos-jvm-agent`, [`chaos-testing`](https://github.com/macstab/chaos-testing), [`macstab-chaos-testing-libraries`](https://github.com/macstab/macstab-chaos-testing-libraries) — is the work of one engineer: **Christian Schnapka**, Hamburg, Germany.\n\n### Timeline\n\n| Year                     | What I was shipping                                                                                                                                                                                                        |\n|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **1984** *(age 10)*      | 6502 assembler on the Commodore 64                                                                                                                                                                                         |\n| **1987** *(age 14)*      | Motorola 68000 (M68k) assembler / C on the Commodore Amiga                                                                                                                                                                 |\n| **1989** *(from age 15)* | International demoscene — active in **Razor 1911**, **Sanity**, **Anthrox**, **Incal**; multiple demo-competition wins with my groups                                                                                      |\n| **1990**                 | x86 assembler + C / C++ on PC. 
Part-time at German game studios (**Software 2000**, **Rainbow Arts**) and short stints at studios in Birmingham, UK — shipping on cartridges and floppies, where there was no patch button |\n| **1996**                 | Transitioned to business / enterprise software engineering — the arc that runs to today                                                                                                                                    |\n| **1996**                 | Java — since 1.0, 30 years and counting                                                                                                                                                                                    |\n| **2002**                 | Python — 24 years and counting                                                                                                                                                                                             |\n| **2008**                 | LXC (Linux Containers) — early adopter, production use through to the Docker era and beyond                                                                                                                                |\n| **2013**                 | Docker — since first release; production use across enterprise stacks                                                                                                                                                      |\n| **~2014**                | Kubernetes                                                                                                                                                                                                                 |\n| **~2015**                | Go — distributed-system internals, network programming                                                                                                                                                                     |\n| **2018**                 | Kotlin — JVM ecosystem coverage alongside Java; coroutines, multiplatform, server-side                                                                                                                                     |\n\n**Diplom Informatiker** — German pre-Bologna 5-year computer-science degree, equivalent to a master's. 42 years of programming, 36 years of professional systems work, 30 years of enterprise software, 24 of Python, 10 of Go, 7 of Kotlin.\n\nThe depth shown in this project — JVMTI re-entrancy debugging on JDK 25, `@IntrinsicCandidate` JIT bypass analysis for the clock-skew limitation, ByteBuddy advice composition with `disableClassFormatChanges()`, the post-install retransform pass for classes that escape `installOn()`, the agent self-granting JDK module opens via manifest *and* `Instrumentation.redefineModule` — comes from a path that started with peeking C64 memory at 10, ran through the demoscene where every cycle counted on the wire, through game studios that shipped on cartridges with no recall option, and then 30 years of production enterprise software. Most engineers enter at the framework layer and look down. 
**This stack reads from below.** Principal-engineer titles are job descriptions; assembler at 10, the demoscene at 15, and shipping for game studios at 16 — that is a starting line.\n\n### Specific evidence in this project\n\nConcrete artifacts a reviewer can read:\n\n- **62 of 67 `OperationType` values auto-wired** across modern JDK internals — including JDK 25 changes most chaos tools haven't caught up to (`Socket$SocketInputStream` rename, `sun.nio.ch.NioSocketImpl` as default `SocketImpl`, `jdk.internal.loader.NativeLibraries.load` going `native`)\n- **Single-annotation `@ChaosTest` integration** working on Spring Boot 3, Spring Boot 4, Micronaut, and Quarkus — four frameworks with four different test-context conventions, one annotation\n- **Honest documentation** of what *cannot* work and why — `SYSTEM_CLOCK_MILLIS` documented with the actual JVM constraints (native `@IntrinsicCandidate`, JIT replacement with `RDTSC`/`MRS CNTVCT_EL0`), not papered over\n- **Cross-libc and cross-arch validation** in the sister C repo — `glibc + musl × amd64 + arm64`, 100 % line coverage on shipped sources, Docker runtime validation as a quality gate\n- **Apache 2.0 throughout** — usable in production, in commercial products, no lock-in\n\n### Available for senior engineering engagements\n\nLimited capacity. 
Typically:\n\n- **Fractional / interim Principal Engineer** — architecture, mentoring, hardest-problem ownership\n- **Reliability engineering** — chaos-engineering / SRE-tooling enablement, post-incident systemic fixes, \"we keep getting paged for X\" investigations\n- **JVM performance** — agents, GC tuning, instrumentation, deep profiling\n- **Systems-level work** — C / C++ / assembler-adjacent investigations, native libraries, Linux internals\n\nIf your team is fighting production issues that \"more tests\" hasn't fixed:\n\n- **[macstab.com](https://macstab.com)** — engagement enquiries\n- **info@macstab.com** — direct contact\n- **[GitHub @macstab](https://github.com/macstab)** — more open-source work\n\nA small number of engagements per year. The work is deep — production systems with receipts in `git log`, not slide decks.\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**[Christian Schnapka](https://macstab.com)**\nPrincipal+ Engineer\n[Macstab GmbH](https://macstab.com) · Hamburg, Germany\n\n*Building systems that operate correctly at the edges — including the ones you deliberately break.*\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacstab%2Fmacstab-chaos-jvm-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmacstab%2Fmacstab-chaos-jvm-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacstab%2Fmacstab-chaos-jvm-agent/lists"}