{"id":48327289,"url":"https://github.com/devdanzin/labeille","last_synced_at":"2026-04-05T00:55:25.912Z","repository":{"id":340017222,"uuid":"1164162462","full_name":"devdanzin/labeille","owner":"devdanzin","description":"Hunt for CPython JIT bugs by running real-world test suites","archived":false,"fork":false,"pushed_at":"2026-03-20T00:46:12.000Z","size":2559,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-20T16:12:23.549Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devdanzin.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-22T18:23:20.000Z","updated_at":"2026-03-20T00:46:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/devdanzin/labeille","commit_stats":null,"previous_names":["devdanzin/labeille"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/devdanzin/labeille","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdanzin%2Flabeille","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdanzin%2Flabeille/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdanzin%2Flabeille/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdanzin%2Flabeille/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devdanzin","download_url":"https://codeload.github.com/devdanzin/labeille/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devdanzin%2Flabeille/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31420785,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T00:25:07.052Z","status":"ssl_error","status_checked_at":"2026-04-05T00:25:05.923Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-05T00:55:25.791Z","updated_at":"2026-04-05T00:55:25.889Z","avatar_url":"https://github.com/devdanzin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# labeille\n\n**Hunt for CPython JIT bugs by running real-world test suites.**\n\nlabeille is a companion to [lafleur](https://github.com/devdanzin/lafleur), an\nevolutionary fuzzer for CPython's JIT compiler. Where lafleur generates synthetic\nprograms to find structural bugs, labeille takes a complementary approach: it runs\nthe test suites of popular PyPI packages against JIT-enabled CPython builds to find\ncrashes — segfaults, aborts, and assertion failures that only surface with real-world\ncode patterns.\n\n## Why?\n\nFuzzers are great at finding crashes triggered by unusual code structures, but they\nrarely produce code that resembles real-world usage. Meanwhile, the test suites of\npopular packages exercise well-established code patterns, library interactions, and\nedge cases that package authors have accumulated over years. Running these suites\nagainst a JIT-enabled CPython catches bugs that synthetic programs miss — semantic\nerrors, optimization regressions, and interaction effects between the JIT and native\nextensions.\n\n## Status\n\n**Early development (alpha).** Core features are implemented and functional:\n`resolve` and `run` for discovering packages and running test suites, `bench`\nfor multi-condition performance comparison, `ft` for free-threaded CPython\ncompatibility testing, and `compat` for C extension build compatibility surveys.\nThe registry format and CLI interface may change.\n\n## Security Considerations\n\nLabeille installs PyPI packages and runs their test suites, which means\nexecuting arbitrary third-party code on your machine. This is inherent to the\ntask, not a bug — `setup.py`, build scripts, post-install hooks, and test code\nall run with your user's privileges.\n\n**Run labeille in a disposable, isolated environment**, especially when testing\nbeyond the most popular, well-audited packages. Even for well-known packages,\nsupply chain attacks (typosquatting, compromised maintainer accounts, malicious\nupdates) are a real and growing threat.\n\nRecommended isolation strategies, from simplest to strongest:\n\n- **Docker or Podman container** — easiest to set up, good process isolation\n- **Dedicated VM** — stronger isolation from host filesystem and network\n- **Ephemeral cloud instance** torn down after each batch run — strongest\n  guarantee of a clean slate\n- At minimum, avoid running as root and use a dedicated user account\n\nWhen using `--repos-dir` or `--venvs-dir` for persistent directories, cached\nrepos and venvs from previous runs persist on disk. A compromised package's\nartifacts survive across runs unless the directories are cleaned.\n\n## Installation\n\n```bash\npipx install labeille\n```\n\nOr with pip:\n\n```bash\npip install labeille\n```\n\n### From source\n\n```bash\ngit clone https://github.com/devdanzin/labeille\ncd labeille\npip install -e '.[dev]'\n```\n\n## Quick Start\n\n```bash\n# Step 0: Fetch the package registry\nlabeille registry sync\n\n# Step 1: Resolve packages — build the test registry from a PyPI top-packages dump\nlabeille resolve --from-json top-pypi-packages.json --top 50\n\n# Or resolve specific packages by name\nlabeille resolve requests click flask\n\n# Step 2: Run test suites against a JIT-enabled Python build\nlabeille run --target-python /path/to/jit-python\n\n# Dry-run to see what would be tested without actually running anything\nlabeille run --target-python /path/to/jit-python --dry-run\n\n# Run only pure-Python packages (skip C extensions)\nlabeille run --target-python /path/to/jit-python --skip-extensions\n\n# Stop after finding the first crash\nlabeille run --target-python /path/to/jit-python --stop-after-crash 1\n\n# Run tests in parallel (4 workers)\nlabeille run --target-python /path/to/jit-python --workers 4\n\n# Test a specific package at a specific git revision\nlabeille run --target-python /path/to/jit-python \\\n    --packages=requests@abc1234 --no-shallow\n```\n\n### Testing specific revisions\n\nTo test a package at a specific git revision (useful for reproducing\ncrashes or bisecting regressions):\n\n```bash\nlabeille run --target-python /path/to/python \\\n    --packages=requests@abc1234 --no-shallow\n```\n\nThe `@revision` accepts any git ref: commit hashes, branch names,\ntags, or relative refs like `HEAD~10`. Use `--no-shallow` (or\n`--clone-depth=0`) when the target revision may be beyond the default\nshallow clone depth.\n\nRevision overrides are ephemeral — they apply to the current run only\nand are not written back to the registry. The exact CLI invocation is\nrecorded in `run_meta.json` for reproducibility.\n\n### Runtime customization\n\nOverride test behavior without modifying the registry:\n\n```bash\n# Run with coverage\nlabeille run --extra-deps coverage \\\n    --test-command-override \"coverage run -m pytest\"\n\n# Add verbose output to all test commands\nlabeille run --test-command-suffix \"--tb=long -v\"\n\n# Test a fork\nlabeille run --packages=requests \\\n    --repo-override \"requests=https://github.com/fork/requests\"\n\n# Combine: test a specific revision of a fork with extra deps\nlabeille run --packages=requests@fix-branch \\\n    --repo-override \"requests=https://github.com/fork/requests\" \\\n    --extra-deps \"coverage\" --no-shallow\n```\n\n### Bisecting crashes\n\nWhen a crash is found, bisect the package's git history to pinpoint the\nfirst commit that introduced it:\n\n```bash\n# Find which commit introduced a SIGSEGV in requests\nlabeille bisect requests \\\n    --good=v2.30.0 --bad=v2.31.0 \\\n    --target-python /path/to/jit-python\n\n# Filter by crash signature\nlabeille bisect requests \\\n    --good=v2.30.0 --bad=v2.31.0 \\\n    --target-python /path/to/jit-python \\\n    --crash-signature \"SIGSEGV\"\n\n# Use a persistent work directory (avoids re-cloning)\nlabeille bisect requests \\\n    --good=v2.30.0 --bad=v2.31.0 \\\n    --target-python /path/to/jit-python \\\n    --work-dir /tmp/bisect-work\n```\n\nThe bisect algorithm clones the repo at full depth, verifies the good and\nbad revisions, then binary-searches to find the first bad commit. Commits\nthat fail to build are automatically skipped by trying neighboring commits.\n\n### Platform support\n\nSystem profiling works on Linux and macOS. Platform-specific details:\n\n- **Linux**: CPU info from `/proc/cpuinfo`, memory from `/proc/meminfo`,\n  disk type from `/sys/block/`.\n- **macOS**: CPU info from `sysctl`, memory from `vm_stat`, disk type\n  from `diskutil`.\n\nAll other features (registry, runner, analysis, bisect) work identically\non both platforms.\n\n## How It Works\n\nlabeille operates in two phases:\n\n### Phase 1: Resolve (`labeille resolve`)\n\nBuilds a registry of packages to test:\n\n1. Reads package names from CLI arguments, a text file, or a PyPI top-packages\n   JSON dump.\n2. Queries the PyPI JSON API for each package to find its source repository URL.\n3. Classifies each package as pure Python, C extension, or unknown by inspecting\n   wheel tags.\n4. Creates a YAML configuration file per package in the registry.\n5. Updates the registry index sorted by download count.\n\nResolve is **non-destructive**: it never overwrites package files that have been\nmanually enriched (`enriched: true`).\n\n### Phase 2: Run (`labeille run`)\n\nRuns test suites and detects crashes:\n\n1. Reads the registry and filters packages based on CLI options.\n2. For each package: clones the repo, creates a venv with the target Python,\n   installs the package, and runs its test command.\n3. Sets `PYTHON_JIT=1` and `PYTHONFAULTHANDLER=1` to enable the JIT and get\n   crash tracebacks.\n4. Classifies each result as pass, fail, crash, timeout, or error.\n5. For crashes: extracts a signature (signal + stderr context) and saves the\n   full stderr output.\n6. Writes results as JSONL for analysis, with full metadata for reproducibility.\n\nRuns can execute packages in parallel with `--workers N` for faster batch\ntesting. Each worker handles one package end-to-end with results collected\nthread-safely.\n\nRuns are **resumable**: use `--skip-completed` with the same `--run-id` to\ncontinue after an interruption.\n\n## Dependency Scanning\n\nBefore enriching a package, scan its test imports to discover dependencies:\n\n```bash\n# Clone and scan\ngit clone --depth=1 https://github.com/psf/requests /tmp/requests\nlabeille scan-deps /tmp/requests --package-name requests\n\n# Compare against existing install_command\nlabeille scan-deps /tmp/requests --package-name requests \\\n    --install-command \"pip install -e '.[dev]'\"\n\n# Get just the pip install line for missing deps\nlabeille scan-deps /tmp/requests --format pip\n\n# JSON output for scripting\nlabeille scan-deps /tmp/requests --format json\n```\n\n## Enriching Packages\n\nAfter `labeille resolve` creates skeleton registry files, each package needs\nto be *enriched* with specific installation and test instructions. This is\nthe most important step — without accurate enrichment, test runs will fail\nwith missing dependencies, broken installs, or pytest configuration errors.\n\nEnrichment can be done manually, with Claude Code, or with another AI coding\nagent. For the field reference, enrichment guidelines, and the registry schema,\nsee [laruche](https://github.com/devdanzin/laruche). For a step-by-step\nwalkthrough with common problems and ready-to-use Claude Code prompts, see\n**[doc/enrichment.md](doc/enrichment.md)**.\n\n## Registry\n\nThe package registry is maintained in\n[laruche](https://github.com/devdanzin/laruche), a separate repository\ncontaining YAML configurations for each tracked package.\n\n- **Fetch the registry:** `labeille registry sync` clones or updates the\n  registry to `~/.local/share/labeille/registry/`.\n- **Override the location:** Pass `--registry-dir \u003cpath\u003e` to any command\n  to use a different registry directory.\n- **Field schema:** See the [laruche](https://github.com/devdanzin/laruche)\n  README for the full field reference and enrichment guidelines.\n\n## Analyzing Results\n\nAnalyze registry composition and run results:\n\n```bash\n# Registry overview (counts by type, framework, skip reasons)\nlabeille analyze registry\n\n# Registry as a table, filtered\nlabeille analyze registry --format table --where extension_type:pure\n\n# Single run summary (aggregate stats, crash detail, reproduce commands)\nlabeille analyze run\n\n# Specific run, quiet mode (crashes only)\nlabeille analyze run 2026-02-23T08-01-05 -q\n\n# Compare two runs (status changes, timing deltas)\nlabeille analyze compare 2026-02-20T10-00-00 2026-02-22T10-00-00\n\n# Run history with trends and flaky package detection\nlabeille analyze history --last 5\n\n# Deep dive on a specific package\nlabeille analyze package requests\n```\n\n### Commit-aware comparison\n\nWhen comparing runs, labeille shows whether each package's repository\nchanged between runs:\n\n```\nlabeille analyze compare run_001 run_002\n\nStatus changes:\n  requests: PASS → CRASH\n    Repo: abc1234 → abc1234 (unchanged — likely a CPython/JIT regression)\n```\n\nThis helps triage new crashes: if the package code didn't change,\nthe regression is almost certainly on the CPython/JIT side.\n\n## Registry Management\n\nBatch operations for managing the package registry:\n\n```bash\n# Preview adding a new field (dry run)\nlabeille registry add-field skip_versions --type dict --after skip_reason\n\n# Apply the change\nlabeille registry add-field skip_versions --type dict --after skip_reason --apply\n\n# Resume after an interrupted operation\nlabeille registry add-field skip_versions --type dict --after skip_reason --apply --lenient\n\n# Set a field on filtered packages\nlabeille registry set-field timeout 600 --where extension_type=extensions --apply\n\n# Validate registry against schema\nlabeille registry validate\n\n# Remove a deprecated field\nlabeille registry remove-field old_field --apply --lenient\n```\n\n## Benchmarking\n\nCompare test suite performance across conditions — JIT vs no-JIT, different\ninterpreters, with/without coverage:\n\n```bash\n# Compare JIT-enabled vs disabled\nlabeille bench run \\\n    --condition \"jit:target_python=/opt/cpython/python,env.PYTHON_JIT=1\" \\\n    --condition \"nojit:target_python=/opt/cpython/python,env.PYTHON_JIT=0\" \\\n    --work-dir ~/bench-work --top 30\n\n# Or use a YAML profile for repeated benchmarks\nlabeille bench run --profile jit-overhead.yaml\n\n# View results and compare conditions\nlabeille bench show results/bench_*\nlabeille bench compare results/bench_*\n\n# Track performance over time\nlabeille bench track init jit-perf\nlabeille bench track add jit-perf results/bench_*\nlabeille bench track trend jit-perf\n```\n\nKey features: multi-condition comparison via YAML profiles or inline definitions,\nstatistical analysis with confidence intervals, per-test timing capture, anomaly\ndetection, longitudinal tracking with regression alerts, resource constraints\n(memory limits, CPU affinity), cache dropping for cold-start benchmarks, and\nexport to CSV/Markdown.\n\nFor the complete guide see **[doc/benchmarking.md](doc/benchmarking.md)**.\n\n## Free-Threaded Testing\n\nTest packages against free-threaded CPython builds to detect crashes, deadlocks,\nand race conditions:\n\n```bash\n# Run each package 10 times with PYTHON_GIL=0\nlabeille ft run --target-python ~/cpython-ft/python \\\n    --work-dir ~/ft-work --top 50\n\n# Compare with GIL-enabled behavior to isolate free-threading issues\nlabeille ft run --target-python ~/cpython-ft/python \\\n    --work-dir ~/ft-work --compare-with-gil --top 50\n\n# View results and analyze flakiness\nlabeille ft show results/ft_*\nlabeille ft flaky results/ft_* --package urllib3\n```\n\nKey features: multiple iterations per package to catch intermittent races,\ndeadlock detection via output stall monitoring, GIL comparison mode, C extension\nGIL compatibility probing (`Py_mod_gil`), TSAN race condition detection, and\nfailure categories (compatible, GIL fallback, intermittent, crash, deadlock).\n\nFor the complete guide see **[doc/free-threaded.md](doc/free-threaded.md)**.\n\n## Compatibility Analysis\n\nSurvey C extension packages for build compatibility against any Python version:\n\n```bash\n# Survey all C extensions in the registry against Python 3.15\nlabeille compat survey --target-python ~/cpython-315/python \\\n    --extensions-only --workers 4\n\n# Or survey specific packages from sdist or source\nlabeille compat survey --target-python ~/cpython-315/python \\\n    --packages numpy,scipy,pandas --from source\n\n# View results, compare two surveys\nlabeille compat show compat-results/compat_*\nlabeille compat compare compat-results/compat_314 compat-results/compat_315\n```\n\nKey features: three build modes (sdist, git source, `--no-binary :all:`), 40+\nbuilt-in error classification patterns (removed C API, Cython, PyO3, struct\nchanges, meson, cmake), custom pattern support via YAML, import probing after\nbuild, parallel execution, and markdown export.\n\nFor the complete guide see **[doc/compat.md](doc/compat.md)**.\n\n## Registry Format\n\nThe registry field schema is documented in\n[laruche](https://github.com/devdanzin/laruche). Each package has a YAML\nfile with fields for installation, testing, and metadata.\n\nThe default registry location is `~/.local/share/labeille/registry/`. Use\n`--registry-dir` to override this for any command.\n\n## Results\n\nEach run creates a directory under `results/{run_id}/` containing:\n\n- **`run_meta.json`** — Run metadata: Python version, JIT status, hostname, timing.\n- **`results.jsonl`** — One JSON line per package with status, exit code, signal,\n  crash signature, timing, and installed dependency versions.\n- **`crashes/`** — Full stderr captures for crashed packages.\n- **`run.log`** — Detailed debug log.\n\nResult statuses: `pass`, `fail`, `crash`, `timeout`, `install_error`,\n`clone_error`, `error`.\n\n## Project Structure\n\n```\nlabeille/\n├── src/labeille/        # Main package\n│   ├── cli.py           # Click CLI entry point (resolve, run, bisect, scan-deps, registry, analyze)\n│   ├── resolve.py       # Resolve PyPI packages to source repositories\n│   ├── runner.py        # Run test suites and capture results\n│   ├── bisect.py        # Automated crash bisection across git history\n│   ├── registry.py      # Registry reading/writing/schema\n│   ├── registry_cli.py  # Batch registry management CLI\n│   ├── registry_ops.py  # Batch operations (add/remove/rename/set/validate)\n│   ├── analyze.py       # Data loading and analysis functions\n│   ├── analyze_cli.py   # Analysis CLI (registry, run, compare, history, package)\n│   ├── bench_cli.py     # Benchmarking CLI (run, show, compare, track, export)\n│   ├── bench/           # Benchmarking subsystem\n│   │   ├── runner.py    # Benchmark execution engine\n│   │   ├── config.py    # Profile loading and condition resolution\n│   │   ├── compare.py   # Statistical comparison\n│   │   ├── tracking.py  # Longitudinal tracking series\n│   │   ├── trends.py    # Trend analysis and regression detection\n│   │   └── ...          # stats, anomaly, constraints, cache, system, export\n│   ├── ft_cli.py        # Free-threaded testing CLI (run, show, flaky, compat, compare)\n│   ├── ft/              # Free-threaded testing subsystem\n│   │   ├── runner.py    # Free-threading test execution\n│   │   ├── results.py   # Failure categories and result structures\n│   │   ├── compat.py    # Extension GIL compatibility detection\n│   │   └── ...          # analysis, compare, display, export\n│   ├── compat_cli.py    # Compatibility survey CLI (survey, show, diff, patterns)\n│   ├── compat.py        # C extension compatibility survey and error classification\n│   ├── formatting.py    # Shared text formatting (tables, histograms, sparklines)\n│   ├── summary.py       # Run summary formatting\n│   ├── yaml_lines.py    # Line-level YAML manipulation\n│   ├── classifier.py    # Pure Python / C extension detection\n│   ├── scan_deps.py     # AST-based test dependency scanner\n│   ├── import_map.py    # Import name to pip package mapping\n│   ├── crash.py         # Crash detection and signature extraction\n│   └── logging.py       # Structured logging setup\n├── doc/                 # Documentation\n│   ├── workflow.md      # Resolve-run workflow guide\n│   ├── benchmarking.md  # Benchmarking guide\n│   ├── free-threaded.md # Free-threaded testing guide\n│   ├── compat.md        # Compatibility analysis guide\n│   └── enrichment.md    # Package enrichment guide\n├── tests/               # Unit and integration tests\n└── results/             # Test run output (gitignored)\n```\n\nThe package registry lives in a separate repository:\n[laruche](https://github.com/devdanzin/laruche).\nUse `labeille registry sync` to fetch it.\n\n## Relationship to lafleur\n\n[lafleur](https://github.com/devdanzin/lafleur) and labeille are complementary tools\nfor finding CPython JIT bugs:\n\n| | lafleur | labeille |\n|---|---|---|\n| **Approach** | Evolutionary fuzzing | Real-world test suites |\n| **Input** | Generated synthetic programs | Existing package tests |\n| **Finds** | Structural JIT bugs | Semantic bugs, regressions |\n| **Coverage** | Broad, random exploration | Targeted, real usage patterns |\n\nUsed together, they provide broad coverage of the JIT's behavior under both synthetic\nand real-world workloads.\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, coding standards, and\nthe pull request process.\n\n## Acknowledgments\n\n[Anthropic](https://www.anthropic.com/) provided financial support that enabled access to\nadvanced AI capabilities for labeille's development. See [CREDITS.md](CREDITS.md) for full\ndetails.\n\n## License\n\nMIT — see [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevdanzin%2Flabeille","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevdanzin%2Flabeille","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevdanzin%2Flabeille/lists"}