{"id":50517969,"url":"https://github.com/jacksonpradolima/coleman","last_synced_at":"2026-06-03T01:30:49.932Z","repository":{"id":81743917,"uuid":"432847332","full_name":"jacksonpradolima/coleman","owner":"jacksonpradolima","description":"COLEMAN (Combinatorial VOlatiLE Multi-Armed BANdit)  - and strategies for HCS context","archived":false,"fork":false,"pushed_at":"2026-05-20T02:31:05.000Z","size":18033,"stargazers_count":23,"open_issues_count":0,"forks_count":10,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-20T05:55:48.801Z","etag":null,"topics":["ci","coleman","continuous-integration","hcs","highly-configurable-system","mab","multi-armed-bandit","tcp","tcpci","test-case-prioritization"],"latest_commit_sha":null,"homepage":"https://jacksonpradolima.github.io/coleman/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jacksonpradolima.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["jacksonpradolima"],"buy_me_a_coffee":"pradolima"}},"created_at":"2021-11-28T23:12:49.000Z","updated_at":"2026-05-20T02:32:45.000Z","dependencies_parsed_at":"2024-04-06T00:27:23.402Z","dependency_job_id":"3c0c76c0-5281-4036-b06d-2dc983dfcfdb","html_url":"https://github.com/jacksonpradolima/coleman","commit_stats":null,"previous_names":["jacksonpradolima/coleman"],"tags_count":15,"template":false,"template_full_name":null,"
purl":"pkg:github/jacksonpradolima/coleman","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonpradolima%2Fcoleman","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonpradolima%2Fcoleman/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonpradolima%2Fcoleman/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonpradolima%2Fcoleman/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jacksonpradolima","download_url":"https://codeload.github.com/jacksonpradolima/coleman/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonpradolima%2Fcoleman/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33844686,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ci","coleman","continuous-integration","hcs","highly-configurable-system","mab","multi-armed-bandit","tcp","tcpci","test-case-prioritization"],"created_at":"2026-06-03T01:30:44.879Z","updated_at":"2026-06-03T01:30:49.924Z","avatar_url":"https://github.com/jacksonpradolima.png","language":"Jupyter 
Notebook","funding_links":["https://github.com/sponsors/jacksonpradolima","https://buymeacoffee.com/pradolima"],"categories":[],"sub_categories":[],"readme":"# Coleman\n\n[![Docs](https://img.shields.io/badge/Docs-Coleman%20Site-3D9970?style=flat-square)](https://jacksonpradolima.github.io/coleman/)\n![](https://img.shields.io/badge/python-3.14+-blue.svg)\n[![Bugs](https://sonarcloud.io/api/project_badges/measure?project=jacksonpradolima_coleman\u0026metric=bugs)](https://sonarcloud.io/summary/new_code?id=jacksonpradolima_coleman)\n[![Vulnerabilities](https://sonarcloud.io/api/project_badges/measure?project=jacksonpradolima_coleman\u0026metric=vulnerabilities)](https://sonarcloud.io/summary/new_code?id=jacksonpradolima_coleman)\n[![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=jacksonpradolima_coleman\u0026metric=security_rating)](https://sonarcloud.io/summary/new_code?id=jacksonpradolima_coleman)\n[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=jacksonpradolima_coleman\u0026metric=sqale_rating)](https://sonarcloud.io/summary/new_code?id=jacksonpradolima_coleman)\n[![codecov](https://codecov.io/github/jacksonpradolima/coleman/branch/main/graph/badge.svg?token=BW04LB0B5Y)](https://codecov.io/github/jacksonpradolima/coleman)\n\n\n### Solving the Test Case Prioritization using Multi-Armed Bandit Algorithms\n\n**COLEMAN** (_**C**ombinatorial V**O**lati**LE** **M**ulti-Armed B**AN**dit_) is a Multi-Armed Bandit (MAB) based approach designed\nto address the Test Case Prioritization in Continuous Integration (TCPCI) Environments in a cost-effective way.\n\nWe designed **COLEMAN** to be generic regarding the programming language in the system under test,\nand adaptive to different contexts and testers' guidelines.\n\nModeling a MAB-based approach for TCPCI problem gives us some advantages in relation to studies found in the literature, as follows:\n\n- It learns how to incorporate the feedback from the 
application of the test cases, thus incorporating diversity in the test suite prioritization;\n- It uses a policy to deal with the Exploration vs Exploitation (EvE) dilemma, thus mitigating the problem of beginning without knowledge (learning) and adapting to changes in the execution environment, for instance, the fact that some test cases are added (new test cases) and removed (obsolete test cases) from one cycle to another (volatility of test cases);\n- It is model-free. The technique is independent of the development environment and programming language, and does not require any analysis at the code level;\n- It is more lightweight, that is, it needs only the historical failure data to execute, and has higher performance.\n\nIn this way, this repository contains the implementation of COLEMAN.\nFor more information about **COLEMAN** read **Ref1** in [References](#references).\n\nFurthermore, this repository contains the adaptation to deal with Highly-Configurable System (HCS) context through two strategies:\n- **Variant Test Set Strategy** (VTS) that relies on the test set specific for each variant; and\n- **Whole Test Set Strategy** (WTS) that prioritizes the test set composed by the union of the test cases of all variants.\n\nFor more information about **WTS** and **VTS** read **Ref2** in [References](#references).\n\nWe also extended **COLEMAN** to consider the context surrounding each CI Cycle. 
The extended version\nis named **CONSTANTINE** (_**CON**textual te**S**t priori**T**iz**A**tion for co**NT**inuous **INE**gration_).\n\n**CONSTANTINE** can use any feature given a dataset, for instance:\n- Test Case Duration (Duration): The time spent by a test case to execute;\n- Number of Test Ran Methods (NumRan): The number of test methods executed during the test, considering that some test\nmethods are not executed because some previous test method(s) failed;\n- Number of Test Failed Methods (NumErrors): The number of test methods which failed during the test;\n- Test Case Age (TcAge): This feature measures how long the test case has existed and is given by a number which\nis incremented for each new CI Cycle in which the test case is used;\n- Test Case Change (ChangeType): Considers whether a test case changed. If a test case is changed from one commit\nto another, there is a high probability that the alteration was performed because some change in the software needs\nto be tested. If the test case was changed, we can detect and consider whether the test case was renamed, or whether it added\nor removed some methods;\n- Cyclomatic Complexity (McCabe): This feature considers the McCabe cyclomatic complexity.\nHigh complexity can be related to a more elaborate test case;\n- Test Size (SLOC): Typically, the size of a test case refers to either the lines of code or the number of\n`assertions` in a test case. This feature is not correlated with coverage. 
For instance,\nif we have two tests t_1 and t_2 and both cover a method, but t_2 have more assertions than t_1, consequently,\nt_2 have higher chances to detect failures.\n\nIn order to use this `version`, use any Contextual-MAB available, for instance, LinUCB and SWLinUCB.\n\n# Getting started\n\n- [Coleman](#coleman)\n    - [Solving the Test Case Prioritization using Multi-Armed Bandit Algorithms](#solving-the-test-case-prioritization-using-multi-armed-bandit-algorithms)\n- [Getting started](#getting-started)\n- [Citation](#citation)\n- [Quick start](#quick-start)\n  - [Library API](#library-api)\n  - [CLI](#cli)\n  - [Config packs](#config-packs)\n  - [Sweep engine](#sweep-engine)\n  - [Deterministic run\\_id](#deterministic-run_id)\n  - [Provenance](#provenance)\n- [Installation](#installation)\n  - [As a library (recommended for new projects)](#as-a-library-recommended-for-new-projects)\n  - [From source (for development)](#from-source-for-development)\n- [Development](#development)\n  - [Code Cost Evaluation](#code-cost-evaluation)\n  - [DevContainer (recommended)](#devcontainer-recommended)\n- [Architecture: Results, Checkpoints \\\u0026 Telemetry](#architecture-results-checkpoints--telemetry)\n  - [Configuration](#configuration)\n  - [Optional extras](#optional-extras)\n  - [Querying results](#querying-results)\n- [Observability](#observability)\n  - [Using the DevContainer (zero-step setup)](#using-the-devcontainer-zero-step-setup)\n  - [Local setup (without DevContainer)](#local-setup-without-devcontainer)\n  - [Port reference](#port-reference)\n  - [Metric names](#metric-names)\n    - [Cardinality rules](#cardinality-rules)\n  - [Adding ClickHouse (optional)](#adding-clickhouse-optional)\n  - [Tear down](#tear-down)\n- [Datasets](#datasets)\n- [About the files input](#about-the-files-input)\n- [Using the tool](#using-the-tool)\n  - [How data flows through Coleman](#how-data-flows-through-coleman)\n  - [MAB Policies Available](#mab-policies-available)\n  
- [Running for Non-HCS System](#running-for-non-hcs-system)\n  - [Running for an HCS system](#running-for-an-hcs-system)\n    - [Whole Test Set Strategy](#whole-test-set-strategy)\n    - [Variant Test Set Strategy](#variant-test-set-strategy)\n- [Analysis of COLEMAN Performance](#analysis-of-coleman-performance)\n  - [Performance Metrics](#performance-metrics)\n  - [Methodologies](#methodologies)\n  - [Visualizations](#visualizations)\n- [References](#references)\n- [Contributors](#contributors)\n----------------------------------\n\n\n# Citation\n\nIf this tool contributes to a project which leads to a scientific publication, I would appreciate a citation.\n\n```\n@Article{pradolima2020TSE,\n  author  = {Prado Lima, Jackson A. and Vergilio, Silvia R.},\n  journal = {IEEE Transactions on Software Engineering},\n  title   = {A Multi-Armed Bandit Approach for Test Case Prioritization in Continuous Integration Environments},\n  year    = {2020},\n  pages   = {12},\n  doi     = {10.1109/TSE.2020.2992428},\n}\n\n@article{pradolima2021EMSE,\n  author  = {Prado Lima, Jackson A. and Mendon{\\c{c}}a, Willian D. F. and Vergilio, Silvia R. and Assun{\\c{c}}{\\~a}o, Wesley K. G.},\n  journal = {Empirical Software Engineering},\n  title   = {{Cost-effective learning-based strategies for test case prioritization in Continuous Integration of Highly-Configurable Software}},\n  year    = {2021}\n}\n\n```\n\n# Quick start\n\nColeman ships a **typed, library-first experiment system** with\nYAML configs, composable config packs, a sweep engine, and deterministic\n`run_id` hashing.  External projects can `pip install coleman` and\ndrive experiments programmatically **or** via the `coleman` CLI — no repo\ncheckout required.\n\nThe library namespace is now `coleman`.\n\n\u003e **Breaking change** — the `CONFIG_FILE` environment variable, raw TOML\n\u003e dict workflow, and `main.py` entry-point are removed.  
Configuration is now\n\u003e handled via YAML configs, typed Pydantic v2 models, config packs, and\n\u003e the `coleman` CLI.\n\n## Library API\n\n```python\nfrom coleman.spec import RunSpec, SweepSpec, SweepAxis, compute_run_id\nfrom coleman.api import run, run_many, sweep\n\n# 1. Define a spec\nspec = RunSpec(\n    experiment={\"datasets\": [\"alibaba@druid\"], \"policies\": [\"UCB\"], \"rewards\": [\"RNFail\"]},\n    results={\"out_dir\": \"./runs\"},\n)\n\n# 2. Deterministic run_id — same config always produces the same ID\nassert compute_run_id(spec) == compute_run_id(spec)\n\n# 3. Single run\nresult = run(spec)\nprint(result.run_id, result.artifacts_dir)\n\n# 4. Parameter sweep (grid × seeds)\nspecs = sweep(spec, SweepSpec(\n    axes=[SweepAxis(mode=\"grid\", params={\"algorithm.ucb.rnfail.c\": [0.1, 0.3, 0.5]})],\n    seeds=[0, 1, 2],\n))  # 3 values × 3 seeds = 9 specs\n\nresults = run_many(specs, max_workers=4)\n```\n\n| Function | Description |\n|----------|-------------|\n| `run(spec)` | Execute a single experiment from a resolved `RunSpec` |\n| `run_many(specs, max_workers=)` | Execute multiple specs, optionally in parallel |\n| `sweep(base, sweep_spec)` | Expand a base spec × sweep definition into concrete specs |\n| `load_spec(path)` | Load and validate a `RunSpec` from YAML (with pack resolution) |\n| `save_resolved(spec, path)` | Persist a resolved spec as canonical JSON |\n\n## CLI\n\nThe `coleman` console script is installed automatically with the package:\n\n```bash\n# Execute a single run\ncoleman run --config run.yaml\n\n# Parameter sweep (grid mode)\ncoleman sweep --config base.yaml \\\n    --grid algorithm.ucb.rnfail.c=0.1,0.3,0.5 \\\n    --grid execution.seed=range(0,20) \\\n    --workers 4\n\n# Parameter sweep declared in YAML (no --grid needed)\ncoleman sweep --config base.yaml --workers 4\n\n# Dry-run — print generated specs without executing\ncoleman sweep --config base.yaml --grid execution.seed=range(0,5) --dry-run\n\n\nYou can now 
define a top-level `sweep:` section in the YAML config used by\n`coleman sweep`; CLI `--grid` values are merged on top.\n# Validate a config and optionally write the resolved spec\ncoleman validate --config base.yaml --resolve resolved.json\n```\n\n## Config packs\n\nConfig packs are composable YAML fragments under `packs/`.  Reference them\nwith the `packs` key in your config and inline overrides win:\n\n```yaml\n# my-experiment.yaml\npacks:\n  - policy/linucb\n  - reward/rnfail\n  - results/parquet\n  - telemetry/off\n\nexperiment:\n  datasets: [\"alibaba@druid\"]\n  policies: [\"LinUCB\"]\n\nexecution:\n  independent_executions: 30\n\nhooks:\n  fail_fast: false\n  plugins:\n    - my_project.hooks.ForecastHook\n\nextensions:\n  my_domain:\n    forecast_selection:\n      policy: ThompsonSampling\n      reward: Binary\n```\n\nShipped starter packs:\n\n| Pack | Category | Contents |\n|------|----------|----------|\n| `policy/linucb` | Policy | LinUCB with default alpha values |\n| `reward/rnfail` | Reward | RNFail reward function |\n| `runtime/local` | Runtime | Single-process local execution |\n| `results/parquet` | Results | Parquet sink with defaults |\n| `results/duckdb` | Results | DuckDB sink with consolidated files |\n| `telemetry/off` | Telemetry | Telemetry disabled |\n\n## Sweep engine\n\nThe sweep engine supports **grid** (Cartesian product) and **zip** (paired)\nmodes, with optional seed replication:\n\n```python\nfrom coleman.spec import SweepSpec, SweepAxis, expand_sweep, RunSpec\n\nbase = RunSpec()\nsweep_spec = SweepSpec(\n    axes=[\n        SweepAxis(mode=\"grid\", params={\n            \"algorithm.ucb.rnfail.c\": [0.1, 0.3, 0.5],\n            \"execution.parallel_pool_size\": [1, 4],\n        }),\n    ],\n    seeds=[0, 1, 2],\n)\nspecs = expand_sweep(base, sweep_spec)\n# 3 × 2 × 3 seeds = 18 specs, deterministic order\n```\n\n- **Grid mode** — Cartesian product of all parameter lists\n- **Zip mode** — paired lists (must have equal length, raises 
`ValueError` otherwise)\n- **Seeds** — each base spec is replicated per seed; the seed is stored on `ExecutionSpec.seed` and affects `run_id`\n- **CLI + YAML composition** — `coleman sweep` reads top-level `sweep:` from YAML and combines it with `--grid`\n\n## Runner extensibility\n\nColeman supports extensibility without replacing the native runner:\n\n1. `extensions` for namespaced custom domain config.\n2. `hooks` for lifecycle plugins (`on_run_start`, `on_dataset_start`,\n   `on_execution_start`, `on_execution_end`, `on_dataset_end`, `on_run_end`,\n   `on_error`).\n\nExecution-level hooks are process-local in worker context and remain\nparallel-safe when `parallel_pool_size \u003e 1`.\n\n## Deterministic run_id\n\nEvery `RunSpec` hashes to a deterministic 12-character identifier:\n\n```\nrun_id = sha256(canonical_json(resolved_spec))[:12]\n```\n\n- **Canonical JSON**: sorted keys, compact separators (`json.dumps(sort_keys=True, separators=(\",\", \":\"))`)\n- **Same config → same `run_id` → same output directory**\n- Provenance files and artifacts are written to `\u003cout_dir\u003e/\u003crun_id\u003e/`\n\n## Provenance\n\nEach run persists:\n\n| File | Contents |\n|------|----------|\n| `spec.resolved.json` | The fully resolved `RunSpec` as canonical JSON |\n| `provenance.json` | Git commit, dirty flag, Python version, `uv.lock` hash |\n\n# Installation\n\n## As a library (recommended for new projects)\n\nInstall Coleman as a dependency in your project:\n\n```bash\npip install coleman\n```\n\nOr with optional extras:\n\n```bash\npip install coleman[telemetry]     # OpenTelemetry SDK\npip install coleman[clickhouse]    # ClickHouse results sink\n```\n\nThen use the [Library API](#library-api) or the [`coleman` CLI](#cli) to\ndrive experiments — no repo checkout required.\n\n## From source (for development)\n\nTo develop or modify the tool, follow these steps:\n\n1. 
Clone the repository:\n\n```shell\ngit clone git@github.com:jacksonpradolima/coleman.git\ncd coleman\n```\n\n2. Install [UV](https://docs.astral.sh/uv/) – a fast Python package manager.\n\n3. Install dependencies:\n\n```shell\nuv sync\n```\n\n4. Install the project locally in editable mode:\n\n```shell\nuv pip install -e .\n```\n\n5. Create a YAML config file (see [Quick start](#quick-start) for\n   the config format) and run with the `coleman` CLI:\n\n```bash\ncoleman run --config my-experiment.yaml\n```\n\n# Development\n\nThis project uses a `Makefile` to streamline common development tasks. Run `make help` to see all available commands.\n\n| Command                  | Description                          |\n|--------------------------|--------------------------------------|\n| `make install`           | Full dev setup (all extras + editable install) |\n| `make pre-commit-install`| Install pre-commit hooks             |\n| `make test`              | Run tests with pytest                |\n| `make lint`              | Run the ruff linter                  |\n| `make format`            | Run the ruff formatter               |\n| `make docs`              | Build documentation with Zensical    |\n| `make cost-structural`   | Run all structural cost checks (CC + MI + Xenon) |\n| `make cost-energy`       | Estimate energy/carbon for a workload |\n| `make help`              | Show all available Make targets      |\n\n## Code Cost Evaluation\n\nColeman enforces code quality through a **multi-dimensional cost scorecard**\ncovering structural complexity, runtime profiling, and energy estimation.\n\n**CI gates** run automatically on every pull request:\n\n- **Xenon complexity gate** — fails if any block exceeds C, any module average exceeds B,\n  or the project average exceeds A.\n- **Radon maintainability index** — fails if any module scores below A (MI \u003c 20).\n\n**Local evaluation commands:**\n\n```bash\nmake cost-structural        # all structural checks (CC + MI + 
Xenon)\nmake cost-complexity        # radon cyclomatic complexity\nmake cost-maintainability   # radon MI gate (fails if any module \u003c 20)\nmake cost-xenon             # xenon complexity gate\nmake cost-wily              # wily trend analysis\nmake cost-profile-scalene   # scalene CPU/memory profiling\nmake cost-energy            # codecarbon energy estimation\n```\n\nSee [Code Cost Evaluation](docs/code-cost.md) for full documentation.\n\n## DevContainer (recommended)\n\nThe fastest way to start developing is with a [DevContainer](https://containers.dev/). Open the repo in VS Code or any DevContainer-compatible editor and select **\"Reopen in Container\"** — **everything works out of the box**, including the full observability stack.\n\n**What happens automatically:**\n\n1. Python 3.14 + uv + all dependencies (including telemetry \u0026 ClickHouse extras) are installed *(on create)*\n2. Pre-commit hooks are configured *(on create)*\n3. The **observability stack** (OTel Collector + Prometheus + Grafana + ClickHouse) starts via Docker-in-Docker *(on every start)*\n4. **Telemetry** can be enabled via the `telemetry/local` pack *(swap `telemetry/off` for `telemetry/local` in `run.yaml`)*\n\nAfter the container builds, just run your experiment:\n\n```bash\ncoleman run --config run.yaml\n```\n\nGrafana is already live at http://localhost:3000 — open it in your browser to see metrics in real-time.\n\n**Other useful commands:**\n\n```bash\nmake test           # run the test suite\nmake lint           # lint with ruff\nmake docs-serve     # preview docs locally\n```\n\n**What's included in the DevContainer:**\n\n| What | Why |\n|------|-----|\n| Python 3.14 + [uv](https://docs.astral.sh/uv/) | The project's package manager |\n| Docker-in-Docker | Runs the observability stack automatically |\n| VS Code extensions | Ruff, Pylance, Pyright, Copilot, TOML, Jupyter, etc. 
|\n| Telemetry + ClickHouse extras | Pre-installed — no extra `pip install` needed |\n| OTel Collector + Prometheus + Grafana + ClickHouse | Started automatically on container start |\n| Port forwarding | All service ports mapped to your host (see table below) |\n\n**Forwarded ports (all accessible from your host browser):**\n\n| Port | Service | URL |\n|------|---------|-----|\n| 3000 | Grafana | http://localhost:3000 |\n| 9090 | Prometheus | http://localhost:9090 |\n| 4317 | OTel Collector (gRPC) | — (used by the framework) |\n| 4318 | OTel Collector (HTTP) | — (used by the framework) |\n| 8889 | Prometheus metrics | http://localhost:8889/metrics |\n| 8123 | ClickHouse (HTTP) | http://localhost:8123 |\n| 9000 | ClickHouse (native) | — |\n\n\u003e **ClickHouse sink remains optional.** The service is running in DevContainer,\n\u003e but you still need `sink: \"clickhouse\"` in your results config or use\n\u003e a ClickHouse results pack if you want results persisted to ClickHouse instead of Parquet.\n\n# Architecture: Results, Checkpoints \u0026 Telemetry\n\nColeman is **framework-first**: `coleman run --config run.yaml` works with zero external\nservices.  
All monitoring is split into three independent layers that can be\nenabled or disabled individually:\n\n| Layer | Purpose | Default | Optional |\n|-------|---------|---------|----------|\n| **Results** | Persist experiment facts (NAPFD, APFDc, …) | Partitioned Parquet (zstd) | ClickHouse sink |\n| **Checkpoints** | Crash-safe resume | Local filesystem (pickle + `progress.json`) | — |\n| **Telemetry** | Observability (latency, throughput) | Disabled (NoOp) | OpenTelemetry + Collector |\n\nWhen a layer is disabled its module resolves to a **null implementation** with\nnear-zero overhead (`NullSink`, `NullCheckpointStore`, `NoOpTelemetry`).\n\n## Configuration\n\nAll settings live in `run.yaml` and composable config packs under `packs/`.\nSee [Configuration](docs/configuration.md) for the full YAML schema reference.\n\n```yaml\n# run.yaml — default configuration using packs\npacks:\n  - execution/default        # parallel_pool_size: 10, independent_executions: 10\n  - experiment/alibaba_druid # datasets, rewards, policies\n  - algorithm/defaults       # baseline defaults (UCB/FRRMAB/EpsilonGreedy/LinUCB/SWLinUCB)\n  - results/parquet          # Parquet sink with default settings\n  - checkpoint/default       # checkpoint enabled, interval: 50000\n  - telemetry/off            # telemetry disabled (swap for telemetry/local to enable)\n\n# Inline overrides (applied on top of packs):\n# execution:\n#   parallel_pool_size: 4\n```\n\n## Optional extras\n\n```bash\n# Telemetry (OpenTelemetry SDK)\npip install coleman[telemetry]\n\n# ClickHouse results sink\npip install coleman[clickhouse]\n```\n\n## Querying results\n\nResults are written as Hive-partitioned Parquet files under `./runs/`.  
You\ncan query them directly with DuckDB (already a project dependency):\n\nFor a guided end-to-end example covering configuration, observability,\nresume/recovery, export, and final analysis, see [docs/workflow.py](docs/workflow.py).\n\n```sql\n-- Average NAPFD per policy\nSELECT policy, AVG(fitness) AS avg_napfd\nFROM read_parquet('./runs/**/*.parquet', hive_partitioning=1)\nGROUP BY policy\nORDER BY avg_napfd DESC;\n\n-- Cost distribution per reward function\nSELECT reward_function,\n       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cost) AS median_cost\nFROM read_parquet('./runs/**/*.parquet', hive_partitioning=1)\nGROUP BY reward_function;\n```\n\n# Observability\n\n\u003e **Framework-first guarantee:** `coleman run --config run.yaml` works without Docker or\n\u003e any of these services.  The observability stack is **optional** for local\n\u003e installs, but **enabled automatically** in the DevContainer.\n\nColeman ships with a local observability stack (OTel Collector + Prometheus + Grafana)\nfor real-time metrics and traces during experiments.\n\n## Using the DevContainer (zero-step setup)\n\nIf you develop inside the DevContainer, **everything is already running**.\nThe post-start hook automatically:\n\n1. Starts the OTel Collector + Prometheus + Grafana + ClickHouse via Docker Compose\n2. Telemetry can be enabled via the `telemetry/local` pack in `run.yaml`\n\nThe `telemetry` and `clickhouse` pip extras are installed during container creation.\n\nJust run your experiment and open Grafana:\n\n```bash\ncoleman run --config run.yaml\n# Open http://localhost:3000 → metrics appear in real-time\n```\n\nNo manual steps required.\n\n## Local setup (without DevContainer)\n\nIf you're **not** using the DevContainer, follow these steps:\n\n```bash\n# 1. Start the observability stack\ncd examples/observability\ndocker compose up -d\n\n# 2. Install telemetry extras\nuv pip install coleman[telemetry]\n\n# 3. 
Enable telemetry in your run.yaml:\n#    Replace telemetry/off with telemetry/local in the packs list\n\n# 4. Run experiments — metrics flow to Grafana\ncoleman run --config run.yaml\n```\n\n## Port reference\n\n| Port | Service | URL |\n|------|---------|-----|\n| **3000** | Grafana | http://localhost:3000 |\n| **4317** | OTel Collector (gRPC) | — (used by the framework) |\n| **4318** | OTel Collector (HTTP) | — (used by the framework, default endpoint) |\n| **8889** | Prometheus metrics | http://localhost:8889/metrics |\n| **8123** | ClickHouse (HTTP) | http://localhost:8123 *(auto-started in DevContainer; local: `--profile clickhouse`)* |\n| **9000** | ClickHouse (native) | — *(auto-started in DevContainer; local: `--profile clickhouse`)* |\n\n## Metric names\n\n| Metric | Type | Description |\n|--------|------|-------------|\n| `coleman.cycles_total` | Counter | Total experiment cycles processed |\n| `coleman.bandit_update_latency` | Histogram (s) | Bandit arm-update latency |\n| `coleman.prioritization_latency` | Histogram (s) | Test-case prioritization latency |\n| `coleman.evaluation_latency` | Histogram (s) | Evaluation step latency |\n| `coleman.napfd` | Histogram | NAPFD score distribution |\n| `coleman.apfdc` | Histogram | APFDc score distribution |\n\n### Cardinality rules\n\n- **No `step` label** in metrics (would create unbounded cardinality).\n- `run_id` is a resource attribute, not a metric label.\n- Per-step detail is available in **traces** (span attributes).\n\n## Adding ClickHouse (optional)\n\nIn the DevContainer, ClickHouse is already started by the post-start hook.\nFor local setups, start ClickHouse alongside the existing stack:\n\n```bash\ncd examples/observability\ndocker compose --profile clickhouse up -d\n```\n\nThen switch the results sink in your `run.yaml`:\n\n```yaml\nresults:\n  sink: clickhouse\n```\n\n```bash\n# Run experiments — results go to the coleman_results table\ncoleman run --config run.yaml\n```\n\nThe ClickHouse extras 
are already installed in the DevContainer.  For local\ninstalls run `uv pip install coleman[clickhouse]` first.\n\n## Tear down\n\n```bash\ncd examples/observability\ndocker compose --profile clickhouse down -v\n```\n\n# Datasets\n\nThe datasets used in the examples (and many more datasets) are available at [Harvard Dataverse Repository](https://dataverse.harvard.edu/dataverse/gres-ufpr).\nYou can create your own dataset using our [GitLab CI - Torrent tool](https://github.com/jacksonpradolima/gitlabci-torrent) or our adapted version of the [TravisTorrent tool](https://github.com/jacksonpradolima/travistorrent-tools).\nBesides that, you can extract relevant information about each system using our tool named [TCPI - Dataset - Utils](https://github.com/jacksonpradolima/tcpci-dataset-utils).\n\n# About the files input\n\n**COLEMAN** now uses Parquet files (`features-engineered.parquet` and\n`data-variants.parquet`) as the primary scenario input format. CSV inputs are\nstill accepted for compatibility, but they are deprecated and emit warnings.\n\nThe second file, **data-variants** (Parquet/CSV), is used in the HCS context, and it represents all results from all variants.\nThe information is organized by commit and variant.\n\n- **features-engineered.parquet** (or legacy `features-engineered.csv`) contains the following information:\n  - **Id**: unique numeric identifier of the test execution;\n  - **Name**: unique numeric identifier of the test case;\n  - **BuildId**: a value uniquely identifying the build;\n  - **Duration**: approximate runtime of the test case;\n  - **LastRun**: previous execution of the test case as *DateTime*;\n  - **Verdict**: test verdict of this test execution (Failed: 1, Passed: 0).\n\n- **data-variants.parquet** (or legacy `data-variants.csv`) contains all information that **features-engineered.parquet** has, and in addition the following information:\n  - **Variant**: variant name.\n\nIn this way, **features-engineered** organizes the information 
for a single system or variant, and\n**data-variants** tracks the information for all variants used during the software life-cycle (for each commit).\n\nDuring **COLEMAN**'s execution, we use **data-variants** to identify the variants used in the current commit and apply the **WTS** strategy.\n\nFor **CONSTANTINE**, additional columns can be used to represent contextual information. In this way, you define\nwhat kind of information can be used!\n\n# Using the tool\n\n## How data flows through Coleman\n\n```mermaid\nflowchart TD\n    A[YAML config + packs: run.yaml] --\u003e B[coleman CLI / library API]\n    C[\"features-engineered.csv (data-variants.csv for HCS)\"] --\u003e D[Scenario Provider]\n    B --\u003e D\n\n    D --\u003e E[\"Virtual Scenarios (CI cycles with test cases)\"]\n    E --\u003e F[Environment]\n\n    G[\"Policy (Random, UCB, FRRMAB, LinUCB, etc.)\"] --\u003e H[Agent]\n    I[\"Reward Function: (RNFail, TimeRank)\"] --\u003e H\n    H --\u003e F\n\n    F --\u003e J[Test prioritization per cycle]\n    J --\u003e K[\"Test execution outcomes (verdict, duration, rank)\"]\n    K --\u003e L[\"Evaluation Metrics (e.g., NAPFD)\"]\n    L --\u003e M[\"Results Sink (Parquet / ClickHouse)\"]\n    L --\u003e P[\"Telemetry (OTel → Collector → Grafana, optional)\"]\n\n    F --\u003e Q[\"Checkpoint Store (local, crash-safe resume)\"]\n\n    K --\u003e O[Feedback loop]\n    O --\u003e I\n    O --\u003e G\n```\n\nTo use COLEMAN, you need to provide the necessary configurations. This includes setting up environment variables and configuration files.\n\nConfigure the utility by editing the `run.yaml` file located in the project's root directory.\nThe file uses composable config packs for a clean, modular setup — each pack provides\ndefaults for one concern.  
Add inline overrides below the `packs:` list to customise settings.\n\nHere's an example `run.yaml` file:\n\n```yaml\n# run.yaml — Default Run Configuration\npacks:\n  - execution/default        # parallel_pool_size: 10, independent_executions: 10\n  - experiment/alibaba_druid # datasets, rewards, policies\n  - algorithm/defaults       # FRRMAB, UCB, EpsilonGreedy, LinUCB, SWLinUCB params\n  - results/parquet          # Parquet sink\n  - checkpoint/default       # checkpoint enabled\n  - telemetry/off            # telemetry disabled\n  - reward/rnfail            # RNFail reward\n  - hcs/off                  # HCS disabled\n  - contextual/default       # contextual information defaults\n\n# Inline overrides (applied on top of packs):\nexecution:\n  parallel_pool_size: 4\nexperiment:\n  datasets:\n    - square@retrofit\n```\n\n**where:**\n- Execution Configuration:\n  - `parallel_pool_size` is the number of worker processes to run **COLEMAN** in parallel.\n  - `independent_executions` is the number of independent experiments we desire to run.\n  - `force_sequential_under_scalene` forces sequential execution while Scalene is active to improve profiling stability and avoid missing per-thread attribution issues.\n  - Parallelism has two layers:\n    - inside a run: `execution.parallel_pool_size` controls process-pool workers for independent executions;\n    - across many specs: `coleman sweep --workers` (or API `run_many(..., max_workers=...)`) controls concurrent spec execution.\n- Experiment Configuration:\n  - `scheduled_time_ratio` represents the Schedule Time Ratio, that is, the time constraints that represent the time available to run the tests. **Default**: 0.1 (10%), 0.5 (50%), and 0.8 (80%) of the time available.\n  - `datasets_dir` is the directory that contains your system. For instance, set it to **data** to run the algorithm for the systems inside the **data** directory.\n  - `datasets` is an array that represents the datasets to analyse. 
It's the folder name inside `datasets_dir` which contains the required file inputs.\n  - `experiment_dir` is the directory where we will save the results.\n  - `rewards` defines the reward functions to be used; the available ones are RNFailReward and TimeRankReward (See **Ref1** in [References](#references)).\n  - `policies` selects the Policies available on **COLEMAN** (classic + extended), including greedy, UCB, contextual, and non-stationary/sliding-window variants.\n- Algorithm Configuration: each algorithm has its own individual configuration. Next, we present some of them:\n  - FRRMAB:\n    - `window_sizes` is an array that contains the sliding window sizes.\n    - `c` is the scaling factor. It's defined for each reward function used.\n  - UCB:\n    - `c` is the scaling factor. It's defined for each reward function used.\n  - Epsilon-Greedy:\n    - `epsilon` is the epsilon value. It's defined for each reward function used.\n- HCS Configuration:\n  - `wts_strategy` represents the usage of Whole Test Set (WTS) Strategy for an HCS system (See [Whole Test Set Strategy](#whole-test-set-strategy)).\n- Contextual Information:\n  - Config\n    - `previous_build` defines what kind of information we obtain from the previous build and not from the current one. For\n      instance, the test case duration (`Duration`) is known only after the test execution.\n  - Feature Group\n    - `feature_group_name` represents the name of a feature group. 
We can create different groups of features to\n      evaluate the influence of each one.\n    - `feature_group_values` represents the features selected to be used by the Contextual MAB.\n\n\n##  MAB Policies Available\n\nThe following policies are available on **COLEMAN**.\n\nBaseline and classical policies:\n- Random\n- Greedy\n- EpsilonGreedy\n- UCB\n- UCB1\n- SlMAB\n- FRRMAB\n\nGreedy variants:\n- DecayEpsilonGreedy\n- OptimisticGreedy\n\nUCB variants:\n- UCB2\n- SlidingWindowUCB\n- KLUCB\n- UCBTuned\n- UCBV\n- MOSSUCB\n\nBayesian/stochastic/adversarial variants:\n- ThompsonSampling\n- BayesianUCB\n- Softmax\n- Pursuit\n- EpsilonDecreasing\n- BootstrappedThompson\n- PHE\n- EXP3\n- EXP3IX\n- DiscountedUCB\n- ChangeDetectionUCB\n\nCombinatorial variants:\n- CombinatorialUCB\n- CombinatorialThompson\n\nDueling / ranking variants:\n- DuelingUCB\n- PairwiseThompsonRanking\n\nPortfolio meta-policy:\n- PortfolioUCB\n\nContextual variants:\n- LinUCB\n- SWLinUCB\n- LinTS\n- ContextualEpsilonGreedy\n- SWLinTS\n- SWContextualEpsilonGreedy\n\n## Running for Non-HCS System\n\nTo execute **COLEMAN** for a non-HCS system, first update the experiment settings in your `run.yaml` (or override inline):\n\n- `datasets_dir: examples`\n- `datasets: [fakedata]`\n\nSubsequently, you can run the program with the following command:\n\n```\ncoleman run --config run.yaml\n```\n\n## Running for an HCS system\n\nFor HCS systems, we provide two distinct strategies to determine optimal solutions for variants:\n**WTS** (Whole Test Set Strategy) and **VTS** (Variant Test Set Strategy).\nYou can learn more about these in **Ref2** under the [References](#references) section.\n\nWhen employing the **WTS** and **VTS** strategies, regard `datasets_dir` as the directory housing your system.\nFor the **WTS** approach, variants of a system are discerned from subfolders within the `datasets_dir` directory.\nEssentially, `datasets_dir` symbolizes the project name.\nThis differentiation in execution 
methodology between HCS and non-HCS systems is crucial,\nalongside the `wts_strategy` variable. For clarity, please inspect our example directory.\n\n### Whole Test Set Strategy\n\nThe **WTS** strategy prioritizes the test set composed of the union of the test cases of all variants.\nTo employ this strategy, set in your `run.yaml`:\n\n- `hcs_configuration.wts_strategy: true`\n\nFor a practical demonstration, set `datasets = [\"dune@total\"]`\n(a dataset amalgamating test cases from all variants)\nand `datasets_dir = \"examples/core@dune-common\"`.\nThis provides a concise example using the Dune dataset.\nMore details on the dataset are available under [Datasets](#datasets).\n\n### Variant Test Set Strategy\n\nContrastingly, the **VTS** approach evaluates each variant as an isolated system.\nTo harness this strategy, set in your `run.yaml`:\n\n- `hcs_configuration.wts_strategy: false`\n\nAs a hands-on example, set `datasets =\n[\"dune@debian_10 clang-7-libcpp-17\", \"dune@debian_11 gcc-10-20\", \"dune@ubuntu_20_04 clang-10-20\"]`\nand `datasets_dir = \"examples/core@dune-common\"`.\nThis offers a succinct example using the Dune dataset, treating each variant as a unique system.\nFurther insights into the dataset are available in the [Datasets](#datasets) section.\n\n# Analysis of COLEMAN Performance\n\nAs part of our ongoing effort to provide a state-of-the-art tool, Coleman, for TCPCI, we've created examples to help researchers understand the performance, effectiveness, and adaptability of our tool. 
The analysis, available in our [marimo notebook](notebooks/analysis.py) (and the original [Jupyter notebook](notebooks/analysis.ipynb)), leverages various libraries such as DuckDB, Pandas, Seaborn, and Matplotlib to process data and visualize the results.\n\n## Performance Metrics\n\nThe notebook has examples including but not limited to test case execution times, prioritization effectiveness, and algorithm efficiency under different configurations and environments.\n\n## Methodologies\n\nThe notebook employs SQL queries for data manipulation and leverages Python's data analysis and visualization libraries to derive meaningful insights from historical test data. Our methodology ensures a robust analysis framework capable of handling large datasets and producing actionable intelligence.\n\n## Visualizations\n\nData visualizations play a key role in our analysis, offering intuitive understanding of complex data patterns and algorithm performance. The notebook includes various charts and graphs that elucidate the trade-offs between different prioritization strategies and their impact on test cycle times and failure detection rates.\n\n# References\n\n- 📖 [**Ref1**] [A Multi-Armed Bandit Approach for Test Case Prioritization in Continuous Integration Environments](https://doi.org/10.1109/TSE.2020.2992428) published at **IEEE Transactions on Software Engineering (TSE)**\n\n[![ESEC/FSE](https://img.youtube.com/vi/w8Lf0VEWkQk/0.jpg)](https://www.youtube.com/watch?v=w8Lf0VEWkQk)\n\n- 📖 [**Ref2**] [Learning-based prioritization of test cases in continuous integration of highly-configurable software](https://doi.org/10.1145/3382025.3414967) published at **Proceedings of the 24th ACM Conference on Systems and Software Product Line (SPLC'20)**\n\n\n[![SPLC](https://img.youtube.com/vi/tT8Ygt8jCKg/0.jpg)](https://www.youtube.com/watch?v=tT8Ygt8jCKg)\n\n# Contributors\n\nPlease see our [Contributing Guidelines](CONTRIBUTING.md) if you'd like to contribute.\n\nFor vulnerability 
reports, refer to our [Security Policy](SECURITY.md).\n\n- 👨‍💻 Jackson Antonio do Prado Lima \u003ca href=\"mailto:jacksonpradolima@gmail.com\"\u003e:e-mail:\u003c/a\u003e\n\n\u003ca href=\"https://github.com/jacksonpradolima/coleman/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=jacksonpradolima/coleman\" /\u003e\n\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonpradolima%2Fcoleman","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacksonpradolima%2Fcoleman","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonpradolima%2Fcoleman/lists"}