{"id":47663750,"url":"https://github.com/mattyopon/faultray","last_synced_at":"2026-04-02T11:48:05.326Z","repository":{"id":344153353,"uuid":"1176780870","full_name":"mattyopon/faultray","owner":"mattyopon","description":"Zero-risk infrastructure chaos simulation — 5 engines, 2000+ scenarios, 3-Layer availability proof. No production fault injection.","archived":false,"fork":false,"pushed_at":"2026-03-28T13:04:40.000Z","size":8926,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-28T15:35:45.306Z","etag":null,"topics":["availability","chaos-engineering","devops","infrastructure","python","resilience","simulation","sre"],"latest_commit_sha":null,"homepage":"https://faultray.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mattyopon.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":"CLA.md"},"funding":{"github":"mattyopon"}},"created_at":"2026-03-09T11:19:20.000Z","updated_at":"2026-03-28T13:04:43.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mattyopon/faultray","commit_stats":null,"previous_names":["mattyopon/infrasim","mattyopon/faultray"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/mattyopon/faultray","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattyopon%2Ffaultray","tags_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattyopon%2Ffaultray/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattyopon%2Ffaultray/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattyopon%2Ffaultray/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mattyopon","download_url":"https://codeload.github.com/mattyopon/faultray/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mattyopon%2Ffaultray/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31305809,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T09:48:21.550Z","status":"ssl_error","status_checked_at":"2026-04-02T09:48:19.196Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["availability","chaos-engineering","devops","infrastructure","python","resilience","simulation","sre"],"created_at":"2026-04-02T11:48:04.575Z","updated_at":"2026-04-02T11:48:05.318Z","avatar_url":"https://github.com/mattyopon.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003eFaultRay\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\u003cstrong\u003eDORA-Compliant Resilience Testing — Without Touching Production\u003c/strong\u003e\u003c/p\u003e\n\u003c/p\u003e\n\n\u003cp 
align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/faultray/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/faultray\" alt=\"PyPI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/faultray/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/dm/faultray\" alt=\"Downloads\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.python.org/downloads/\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.11+-blue.svg\" alt=\"Python 3.11+\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-BSL%201.1-orange.svg\" alt=\"License\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://doi.org/10.5281/zenodo.19139911\"\u003e\u003cimg src=\"https://zenodo.org/badge/DOI/10.5281/zenodo.19139911.svg\" alt=\"DOI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/mattyopon/faultray/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/mattyopon/faultray/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://faultray.streamlit.app/\"\u003e\u003cimg src=\"https://static.streamlit.io/badges/streamlit_badge_black_white.svg\" alt=\"Open in Streamlit\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/mattyopon/faultray\"\u003e\u003cimg src=\"https://img.shields.io/badge/resilience-72%2F100-green\" alt=\"Resilience Score\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\nFaultRay simulates **2,000+ failure scenarios** entirely in memory — mathematically proving your availability ceiling before anything breaks. 
Built for financial institutions that need to prove DORA compliance without risking production systems.\n\n```bash\npip install faultray\nfaultray demo\n```\n\n```\n╭────────── FaultRay Chaos Simulation Report ──────────╮\n│ Resilience Score: 36/100                             │\n│ Scenarios tested: 2,000+                             │\n│ Critical: 7  Warning: 66  Passed: 77                 │\n│ DORA Compliance: 48/52 controls assessed             │\n╰──────────────────────────────────────────────────────╯\n```\n\n## Why Financial Institutions Choose FaultRay\n\nTraditional chaos engineering tools (Gremlin, Steadybit, AWS FIS) inject real failures into production. For banks, insurers, and payment processors operating under DORA, that approach creates unacceptable risk.\n\nFaultRay takes a fundamentally different approach: **mathematical simulation**. Your trading systems stay online. Your payment rails keep running. You still get the evidence regulators need.\n\n| | Gremlin | Steadybit | AWS FIS | **FaultRay** |\n|---|---|---|---|---|\n| Approach | Breaks production | Breaks production | Breaks production | **Math simulation** |\n| Production risk | Medium-High | Medium | Medium | **Zero** |\n| Setup | Agent per host | Agent per host | AWS only | **`pip install`** |\n| DORA evidence | No | No | No | **Yes — audit-ready** |\n| AI agent testing | No | No | No | **Yes** |\n| Cost | $$$$ | $$$ | $$ | **Free tier / Enterprise** |\n\n## DORA Compliance — All 5 Pillars\n\nFaultRay maps directly to the EU Digital Operational Resilience Act (Regulation EU 2022/2554), fully effective since January 17, 2025. 
Non-compliance carries fines up to **2% of global annual turnover**.\n\n### Full DORA Command Suite\n\n```bash\n# Pillar 1: ICT Risk Management (Articles 5-16)\nfaultray dora assess model.json              # 52-control compliance check\nfaultray dora risk-assessment model.json     # Comprehensive risk evaluation\nfaultray dora gap-analysis model.json        # Control gaps + remediation\n\n# Pillar 2: Incident Management (Articles 17-23)\nfaultray dora incident-assess model.json     # Incident readiness evaluation\n\n# Pillar 3: Resilience Testing (Articles 24-27)\nfaultray simulate model.json --json          # 2,000+ scenario simulation\nfaultray dora test-plan model.json           # Generate resilience test plan\nfaultray dora tlpt-readiness model.json      # TLPT preparation assessment\n\n# Pillar 4: Third-Party Risk (Articles 28-30)\nfaultray dora concentration-risk model.json  # ICT concentration risk (HHI)\nfaultray dora register model.json            # ITS 2024/2956 register\n\n# Pillar 5: Information Sharing (Article 45)\n# Integrated threat intelligence from CVE/CISA advisories\n\n# Evidence \u0026 Reporting\nfaultray dora evidence model.json            # Audit-ready evidence package\nfaultray dora report model.json              # HTML report for regulators\nfaultray dora rts-export model.json --format csv  # Machine-readable export\n```\n\n### What Regulators See\n\nFaultRay generates timestamped, signed evidence packages that map every finding to specific DORA articles and RTS/ITS requirements:\n\n- **RTS 2024/1774** — ICT Risk Management Framework details\n- **ITS 2024/2956** — Register of Information templates\n- **RTS 2025/301** — Incident reporting content and timelines\n\n## Quick Start\n\n### 1. 
Terraform Safety Net (CI/CD Integration)\n\n```bash\nterraform plan -out=plan.out\nterraform show -json plan.out \u003e plan.json\nfaultray tf-check plan.json --fail-on-regression --min-score 60\n```\n\n```yaml\n# .github/workflows/terraform.yml\n- name: Resilience Gate\n  run: |\n    pip install faultray\n    terraform show -json plan.out \u003e plan.json\n    faultray tf-check plan.json --fail-on-regression --min-score 60\n```\n\n### 2. GitHub Action (Marketplace)\n\nAdd FaultRay to any CI/CD pipeline with our official GitHub Action:\n\n```yaml\n# .github/workflows/resilience.yml\nname: Resilience Check\non: [pull_request]\njobs:\n  check:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - uses: mattyopon/faultray@v1\n        with:\n          plan-file: plan.json\n          min-score: 60\n          fail-on-regression: true\n          financial: true\n```\n\nOr use it with a YAML infrastructure definition:\n\n```yaml\n      - uses: mattyopon/faultray@v1\n        with:\n          yaml-file: infra.yaml\n          financial: true\n          cost-per-hour: 25000\n```\n\nAvailable inputs:\n\n| Input | Description | Default |\n|---|---|---|\n| `plan-file` | Path to Terraform plan JSON file | `''` |\n| `yaml-file` | Path to infrastructure YAML file | `''` |\n| `min-score` | Minimum resilience score (0-100). Fails if below. | `0` |\n| `fail-on-regression` | Fail if resilience score drops from baseline | `false` |\n| `financial` | Include financial impact analysis | `false` |\n| `cost-per-hour` | Default cost per hour of downtime (USD) | `10000` |\n\n### 3. 
Define Your Infrastructure\n\n```yaml\n# infra.yaml\ncomponents:\n  - id: api-gateway\n    type: load_balancer\n    replicas: 2\n  - id: trading-engine\n    type: app_server\n    replicas: 3\n  - id: market-data\n    type: database\n    replicas: 1   # ← FaultRay flags this as SPOF\n\ndependencies:\n  - source: api-gateway\n    target: trading-engine\n    type: requires\n  - source: trading-engine\n    target: market-data\n    type: requires\n```\n\n```bash\nfaultray load infra.yaml\nfaultray simulate --html report.html\n```\n\n### 4. AI Agent Testing\n\n```bash\nfaultray agent assess ai-workflow.yaml     # Risk assessment\nfaultray agent scenarios ai-workflow.yaml  # What could go wrong?\n```\n\nSimulates AI-specific failures: hallucination cascades, context overflow, LLM rate limiting, token exhaustion, tool failures, agent loops, prompt injection.\n\n### Sensitivity Ratchet Simulation\n\nMeasure how much damage the **sensitivity ratchet** prevents. The ratchet is a security mechanism where an agent's outbound permissions narrow irreversibly once it accesses data above a certain sensitivity threshold (PUBLIC \u003c INTERNAL \u003c CONFIDENTIAL \u003c RESTRICTED \u003c TOP_SECRET).\n\n```bash\nfaultray agent ratchet                        # Run all built-in scenarios\nfaultray agent ratchet --scenario exfiltration  # Single scenario\nfaultray agent ratchet --json                 # Machine-readable output\n```\n\nBuilt-in scenarios:\n- **exfiltration** — Agent reads classified data then tries to send externally\n- **cross-agent** — Agent A passes classified data to Agent B who attempts external send\n- **escalation** — Agent gradually accesses higher-sensitivity data\n\nEach scenario runs twice (with and without the ratchet) and reports an **effectiveness score** showing how much data-leak damage the ratchet prevents.\n\n### 5. 
Continuous Compliance Monitoring\n\n```bash\nfaultray compliance-monitor model.json --framework dora  # DORA\nfaultray compliance-monitor model.json --framework soc2  # SOC 2\nfaultray compliance-monitor model.json --framework pci   # PCI DSS\n```\n\nTracks compliance trends over 90 days with automated drift detection.\n\n## APM — Application Performance Monitoring\n\nFaultRay includes a lightweight APM agent that collects real-time host metrics and feeds them to the FaultRay collector for anomaly detection, alerting, and topology-aware analysis.\n\n```bash\n# One-command interactive setup\nfaultray apm setup\n\n# Or manual setup\nfaultray apm install --collector http://localhost:8080\nfaultray apm start\nfaultray apm status\n```\n\n### Architecture\n\n```\nYour Hosts                          FaultRay Server\n┌────────────────────────────┐      ┌──────────────────────────────┐\n│  APM Agent  (each host)    │      │  Collector  faultray serve   │\n│  ─────────────────────     │      │  ─────────────────────────── │\n│  Collects every 15s:       │─────▶│  Time-Series DB              │\n│  • CPU utilization         │ HTTP │  Anomaly Detection (Z-score) │\n│  • Memory usage            │      │  Alert Rules Engine          │\n│  • Disk usage              │      │  Web Dashboard  :8080/apm    │\n│  • Network I/O             │      └──────────────────────────────┘\n│  • Process count           │\n│  • TCP connections         │\n└────────────────────────────┘\n```\n\n### Metrics Collected\n\n| Metric | Description |\n|---|---|\n| `cpu_percent` | CPU utilization across all cores |\n| `memory_percent` | RAM usage (used / total) |\n| `disk_percent` | Root disk usage |\n| `net_bytes_sent` | Network bytes sent |\n| `net_bytes_recv` | Network bytes received |\n| `process_count` | Number of running processes |\n| `tcp_connections` | Active TCP connections |\n\n### Integration with Simulation\n\nAPM real-baseline data feeds directly into chaos simulations:\n\n```bash\n# Capture 
baseline metrics\nfaultray apm metrics \u003cagent-id\u003e --json \u003e baseline.json\n\n# Run simulation using real topology\nfaultray simulate infra.yaml\n\n# Correlate simulation results with APM alerts\nfaultray apm alerts --severity critical\n```\n\n## Resilience Badge\n\nShow your infrastructure resilience score in your README:\n\n```bash\nfaultray badge infra.yaml\n```\n\nOutput:\n\n```\n[![Resilience Score](https://img.shields.io/badge/resilience-72%2F100-green)](https://github.com/mattyopon/faultray)\n```\n\nWhich renders as: ![Resilience Score](https://img.shields.io/badge/resilience-72%2F100-green)\n\nThe badge color adjusts automatically based on your score:\n\n| Score | Color |\n|-------|-------|\n| 80-100 | Bright green |\n| 60-79 | Green |\n| 40-59 | Yellow |\n| 20-39 | Orange |\n| 0-19 | Red |\n\nFor raw URL output (no markdown wrapping):\n\n```bash\nfaultray badge infra.yaml --url\n```\n\n## Key Features\n\n| Feature | Description |\n|---|---|\n| **5-Layer Availability Model** | Mathematical proof of your uptime ceiling — \"your 99.99% SLA is physically impossible given this topology\" |\n| **5 Simulation Engines** | Cascade, Dynamic, Ops, What-If, Capacity |\n| **DORA Compliance Suite** | 52 controls, 5 pillars, audit-ready evidence packages |\n| **Cascade Failure Analysis** | Graph-based blast radius mapping with containment scoring |\n| **SPOF Detection** | Automatic identification of single points of failure |\n| **AI Agent Testing** | 7 agent-specific fault types (hallucination, loops, etc.) 
|\n| **Terraform Integration** | Pre-apply impact analysis as a CI/CD gate |\n| **Third-Party Risk** | ICT concentration risk analysis (Herfindahl-Hirschman Index) |\n| **Multi-Framework Compliance** | SOC 2, ISO 27001, PCI DSS 4.0, NIST CSF, DORA, HIPAA, GDPR |\n| **APM Agent** | Install once, monitor forever — real-time metrics, anomaly detection, topology auto-discovery |\n| **100+ CLI Commands** | From `faultray demo` to `faultray war-room` |\n\n## The 5-Layer Availability Model\n\nMost SLA claims are aspirational. FaultRay proves what's actually achievable:\n\n| Layer | What It Measures | Financial Impact |\n|---|---|---|\n| L1: Software | Deploy downtime, human error, config drift | Operational uptime ceiling |\n| L2: Hardware | MTBF/MTTR × redundancy × failover | Physical infrastructure limits |\n| L3: Theoretical | Network loss, GC pauses, jitter | Unreachable upper bound |\n| L4: Operational | Incident rate × response time, on-call coverage | Team capacity constraints |\n| L5: External SLA | ∏(third-party SLAs) | Vendor dependency floor |\n\n**Result**: A mathematically provable availability ceiling. If your infrastructure graph says 99.95% max but you're promising 99.99%, FaultRay catches it — before the regulator does.\n\n## Research \u0026 Patent\n\nFaultRay's core algorithms are described in a peer-reviewable paper and protected by a US patent application.\n\n**Paper:**\n\u003e Maeda, Y. (2026). *FaultRay: In-Memory Infrastructure Resilience Simulation with Graph-Based Cascade Analysis, Multi-Layer Availability Limits, and AI Agent Failure Modeling.* Zenodo. [DOI: 10.5281/zenodo.19139911](https://doi.org/10.5281/zenodo.19139911)\n\n**Patent:**\n\u003e US Provisional Patent Application No. 
64/010,200 (filed March 19, 2026)\n\n```bibtex\n@misc{maeda2026faultray,\n  author    = {Maeda, Yutaro},\n  title     = {FaultRay: In-Memory Infrastructure Resilience Simulation},\n  year      = {2026},\n  doi       = {10.5281/zenodo.19139911},\n  publisher = {Zenodo}\n}\n```\n\n## Development\n\n```bash\npip install -e \".[dev]\"\npytest tests/ -v\nruff check src/ tests/\n```\n\n## Community\n\n- [Contributing Guide](CONTRIBUTING.md)\n- [Security Policy](SECURITY.md)\n- [Code of Conduct](CODE_OF_CONDUCT.md)\n- [Changelog](CHANGELOG.md)\n\n## License\n\nBSL 1.1 — see [LICENSE](LICENSE). Converts to Apache 2.0 on 2030-03-17.\n","funding_links":["https://github.com/sponsors/mattyopon"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmattyopon%2Ffaultray","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmattyopon%2Ffaultray","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmattyopon%2Ffaultray/lists"}