https://github.com/gnufood/parlov

Detects HTTP oracle vulnerabilities through differential probing of RFC-compliant servers.

# parlov

**HTTP oracle detection tool — systematic probing for RFC-compliant information leakage.**

HTTP servers that faithfully implement RFC 9110 semantics often leak protected internal state through deterministic differences in status codes, response bodies, and headers. parlov detects those differential signals, classifies their severity, and reports whether an application is vulnerable to oracle-based enumeration.

parlov doesn't exploit vulnerabilities — it observes *correct* server behavior and determines whether that behavior reveals information it shouldn't.

---

## Install

```bash
curl -fsSL https://releases.gnu.foo/parlov/install.sh | sh
# or: cargo install parlov
```

---

## How It Works

Every oracle is a **differential**. parlov sends the same request twice, once for a known-existing resource and once for a known-nonexistent one, and compares the two responses. Any difference in status code, body, headers, or timing is a signal.

parlov runs this differential across multiple strategies and HTTP methods, accumulates evidence into a Bayesian posterior, and produces a verdict: **Confirmed**, **Likely**, **Inconclusive**, or **NotPresent**.
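As a rough illustration of how evidence can be accumulated into a posterior, the sketch below sums per-strategy log-odds contributions onto a prior and maps the result to a verdict. The prior of 0.5 used in the usage note and the thresholds (0.95, 0.80, 0.20) are assumptions chosen for the example, and the function names are hypothetical; parlov's actual values and internals are not documented here.

```rust
/// Verdicts named in the text above.
#[derive(Debug, PartialEq)]
enum Verdict {
    Confirmed,
    Likely,
    Inconclusive,
    NotPresent,
}

/// Convert a probability in (0, 1) to log-odds.
fn logit(p: f64) -> f64 {
    (p / (1.0 - p)).ln()
}

/// Convert log-odds back to a probability.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Add each strategy's log-odds contribution to the prior and return the posterior.
fn posterior(prior: f64, contributions: &[f64]) -> f64 {
    sigmoid(logit(prior) + contributions.iter().sum::<f64>())
}

/// Map a posterior to a verdict. The thresholds are illustrative assumptions.
fn verdict(p: f64) -> Verdict {
    if p >= 0.95 {
        Verdict::Confirmed
    } else if p >= 0.80 {
        Verdict::Likely
    } else if p > 0.20 {
        Verdict::Inconclusive
    } else {
        Verdict::NotPresent
    }
}
```

Under these assumed numbers, `posterior(0.5, &[2.0, 1.5])` is roughly 0.97, which `verdict` maps to `Confirmed`. Working in log-odds keeps each strategy's contribution additive, which is also how the `log_odds_contribution` field in the JSON output is described.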

---

## Detection

### Vectors

Strategies fall into four detection vectors:

- **Status-code differential** — compares HTTP status codes between baseline and probe (e.g. 403 vs 404)
- **Cache probing** — manipulates conditional-request headers (`If-Match`, `If-None-Match`, `If-Modified-Since`, `Range`) to elicit cache-state differentials
- **Error-message granularity** — distinguishes error semantics under matching status codes (e.g. FK constraint violation vs generic 404)
- **Redirect-diff** — compares redirect behavior across URL mutations (case, trailing slash, percent-encoding) to detect redirects that reveal whether the canonical resource exists

### Per-method elicitation

- **GET / HEAD** — 403 vs 404 is the classic signal; HEAD is bandwidth-efficient for high-volume scanning
- **POST** — 409 Conflict (existing) vs 201 / 200 / 202 / 303 (new resource created or accepted)
- **PATCH / PUT** — 422 Unprocessable Content (existing, invalid patch) vs 404 (nonexistent); PUT may also produce 422 vs 201 under upsert semantics
- **DELETE** — 405 Method Not Allowed or 409 Conflict (existing, deletion blocked) vs 404 (nonexistent)

### Classification patterns

The table below maps each (baseline, probe) status pair to a base verdict. The verdict shown comes from the pattern alone; additional signals (body diff, header presence, metadata leak) can raise confidence further at runtime. Section numbers in the RFC Basis column refer to RFC 9110 unless another RFC is named.

| Baseline | Probe | Base Verdict | RFC Basis |
|----------|-------|--------------|-----------|
| 200 OK | 404 | Confirmed | §15.3.1 |
| 206 Partial Content | 404 | Confirmed | §15.3.7 |
| 416 Range Not Satisfiable | 404 | Confirmed | §15.5.17 |
| 403 Forbidden | 404 | Confirmed | §15.5.4 |
| 401 Unauthorized | 404 | Confirmed | §15.5.2 |
| 409 Conflict | 201 / 200 / 202 / 303 | Confirmed | §15.5.10 |
| 304 Not Modified | 404 | Confirmed | §15.4.5 |
| 422 Unprocessable Content | 404 / 201 | Confirmed | §15.5.21 |
| 412 Precondition Failed | 404 | Confirmed | §13.1.1 |
| 409 Conflict | 404 | Confirmed | §15.5.10 |
| 409 Conflict | 204 | Confirmed | §15.5.10 |
| 406 Not Acceptable | 404 | Confirmed | §15.5.7 |
| 415 Unsupported Media Type | 404 | Confirmed | §15.5.16 |
| 410 Gone | 404 | Confirmed | §15.5.11 |
| 500 Internal Server Error | 404 | Confirmed | §15.6.1 |
| 204 No Content | 404 | Confirmed | §9.3.2 |
| 405 Method Not Allowed | 404 | Confirmed | §15.5.6 |
| 301 Moved Permanently | 404 | Confirmed | §15.4.2 |
| 302 Found | 404 | Confirmed | §15.4.3 |
| 303 See Other | 404 | Confirmed | §15.4.4 |
| 307 Temporary Redirect | 404 | Confirmed | §15.4.8 |
| 308 Permanent Redirect | 404 | Confirmed | §15.4.9 |
| 413 Content Too Large | 404 | Confirmed | §15.5.14 |
| 411 Length Required | 404 | Confirmed | §15.5.12 |
| 202 Accepted | 404 | Confirmed | §15.3.3 |
| 402 Payment Required | 404 | Likely | §15.5.3 |
| 429 Too Many Requests | 404 | Likely | RFC 6585 §4 |
| 400 Bad Request | 201 / 200 | Likely | §15.5.1 |
| 300 Multiple Choices | 404 | Likely | §15.4.1 |
| *same* | *same* | NotPresent | — |

Unrecognized differentials carry a base confidence of 40, which alone classifies as `NotPresent`. Additional signals (body diff, header value, metadata leak) can lift them to `Likely` or higher.
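The lookup described by the pattern table can be pictured as a match on the status pair. The sketch below covers only a handful of rows from the table and uses a hypothetical `BaseVerdict` enum and `base_verdict` function; it is not parlov's implementation, and it ignores the runtime signals that can adjust the result.

```rust
/// Base verdict from the status pair alone, before any additional signals.
#[derive(Debug, PartialEq)]
enum BaseVerdict {
    Confirmed,
    Likely,
    NotPresent,
    /// A differential with no matching row; per the text it starts at
    /// confidence 40, which on its own classifies as NotPresent.
    Unrecognized,
}

/// Look up a (baseline, probe) status pair. Only a subset of the table is encoded.
fn base_verdict(baseline: u16, probe: u16) -> BaseVerdict {
    match (baseline, probe) {
        // Identical statuses carry no differential.
        (b, p) if b == p => BaseVerdict::NotPresent,
        (200 | 206 | 401 | 403 | 410 | 416, 404) => BaseVerdict::Confirmed,
        (409, 200 | 201 | 202 | 204 | 303 | 404) => BaseVerdict::Confirmed,
        (300 | 402 | 429, 404) => BaseVerdict::Likely,
        (400, 200 | 201) => BaseVerdict::Likely,
        _ => BaseVerdict::Unrecognized,
    }
}
```

For example, `base_verdict(403, 404)` yields `Confirmed` and `base_verdict(429, 404)` yields `Likely`, matching the corresponding table rows.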

### Severity

Finding-level severity is signal-dependent, not derived from verdict alone. Strong patterns reach `Medium` from base impact alone; weak patterns (402, 429, 300, 400) and most redirect/payload patterns stay at `Low` without corroborating header signals. `High` is reserved for findings where a `MetadataLeak` signal carries size information (e.g. `Content-Range` on 206 / 416). `Likely` verdicts cap one severity tier below `Confirmed`.

The endpoint-level `severity` field in `endpoint_verdict` is verdict-derived: `Confirmed → High`, `Likely → Medium`, otherwise `None`.

### Stability

Stability is confirmed via adaptive sampling — when an initial differential is detected, parlov collects exactly three sample pairs and only classifies if all three are consistent on each side. A same-status first pair short-circuits to `NotPresent` immediately.
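The sampling rule can be sketched as follows. The `sample` closure stands in for sending one baseline/probe pair and returning the two status codes; it, the `Stability` enum, and the choice to count the initial pair as the first of the three samples are assumptions made for this example.

```rust
/// Outcome of the stability check.
#[derive(Debug, PartialEq)]
enum Stability {
    /// The first pair had identical statuses, so there is nothing to confirm.
    NotPresent,
    /// All three pairs agreed on both the baseline and the probe side.
    Stable { baseline: u16, probe: u16 },
    /// A differential appeared but did not repeat consistently.
    Unstable,
}

/// `sample` returns (baseline_status, probe_status) for one request pair.
fn check_stability(mut sample: impl FnMut() -> (u16, u16)) -> Stability {
    let first = sample();
    // Same-status first pair: short-circuit without further requests.
    if first.0 == first.1 {
        return Stability::NotPresent;
    }
    // Collect three pairs in total, counting the initial one (an assumption).
    let pairs = [first, sample(), sample()];
    if pairs.iter().all(|pair| *pair == first) {
        Stability::Stable { baseline: first.0, probe: first.1 }
    } else {
        Stability::Unstable
    }
}
```

A closure that always returns `(403, 404)` produces `Stable { baseline: 403, probe: 404 }`, while one that returns `(404, 404)` on the first call is invoked exactly once.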

---

## `parlov scan`

Automated elicitation scan — runs every applicable strategy against a target endpoint and reports a single endpoint-level verdict. Phase 2 follow-up probes chain off Phase 1 evidence (e.g. follow a baseline `Location`, replay a harvested `ETag`).

### Flags

| Flag | Required | Default | Description | Example |
|------|----------|---------|-------------|---------|
| `--target` | ✓ | — | Target URL template; `{id}` is substituted at request time | `--target "https://api.example.com/users/{id}"` |
| `--baseline-id` | ✓ | — | Known-existing resource ID | `--baseline-id 1001` |
| `--probe-id` | — | Random UUIDv4 | Known-nonexistent resource ID | `--probe-id 0` |
| `--body` | — | — | Request body template; `{id}` is substituted at request time | `--body '{"email":"{id}"}'` |
| `--header` | — | — | Additional request header; repeatable | `--header "Authorization: Bearer ..."` |
| `--risk` | — | `safe` | Maximum strategy risk level — see [Risk levels](#risk-levels) | `--risk method-destructive` |
| `--vector` | — | — | Filter strategies by detection vector; repeatable | `--vector cache-probing:safe` |
| `--strategy` | — | — | Run only the named strategy; repeatable | `--strategy if-none-match-elicit` |
| `--alt-credential` | — | — | Under-scoped credential for the scope-manipulation strategy | `--alt-credential "Authorization: Bearer ..."` |
| `--known-duplicate` | — | — | Known-duplicate field for the uniqueness strategy | `--known-duplicate email=alice@corp.com` |
| `--state-field` | — | — | State field for the state-transition strategy | `--state-field status=invalid` |
| `--exhaustive` | — | `false` | Run all strategies regardless of interim confidence | `--exhaustive` |
| `--repro` | — | `false` | Emit reproducible `curl` commands per finding | `--repro` |
| `--verbose` | — | `false` | Include filtered headers and body samples per finding | `--verbose` |
| `--format` | — | `table` | Output format — see [Output](#output) | `--format json` |

**Notes:**

- `--repro` and `--verbose` emit secrets verbatim. Headers — including `Authorization` and `Cookie` — are not redacted. Both flags are opt-in for hand verification.
- `--exhaustive` also forces Phase 2 chaining to run when the Phase 1 stop rule has already fired.
- `--format` is a global flag, parsed before the subcommand: `parlov --format json scan --target ...`
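To make the `{id}` substitution in `--target` and `--body` concrete, here is a minimal sketch of how a baseline/probe pair could be rendered from the templates. `ProbePair`, `render`, and `build_pair` are hypothetical names, and the sketch applies no URL or JSON escaping; whether parlov escapes substituted values is not stated above.

```rust
/// One baseline/probe request pair after `{id}` substitution.
#[derive(Debug, PartialEq)]
struct ProbePair {
    baseline_url: String,
    probe_url: String,
    baseline_body: Option<String>,
    probe_body: Option<String>,
}

/// Replace every `{id}` placeholder in a template with the given ID.
/// No escaping is applied; that is a simplification of this sketch.
fn render(template: &str, id: &str) -> String {
    template.replace("{id}", id)
}

/// Render both sides of the differential from the same templates.
fn build_pair(target: &str, body: Option<&str>, baseline_id: &str, probe_id: &str) -> ProbePair {
    ProbePair {
        baseline_url: render(target, baseline_id),
        probe_url: render(target, probe_id),
        baseline_body: body.map(|b| render(b, baseline_id)),
        probe_body: body.map(|b| render(b, probe_id)),
    }
}
```

With the target `https://api.example.com/users/{id}` and IDs `1001` and `0`, the two URLs differ only in the final path segment. For a body-shaped target such as a registration endpoint, the URLs are identical and the ID appears only in the rendered bodies.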

### Strategy selection

The strategy plan is shaped by exactly one of three mutually exclusive flags. Combining any two returns an error at startup.

- **`--risk`** *(default `safe`)* — runs every strategy at or below the given risk ceiling
- **`--vector`** — runs only strategies in the listed detection vectors; per-vector risk ceilings via the `vector:risk` syntax. Valid vectors: `cache-probing`, `error-message-granularity`, `redirect-diff`, `status-code-diff`
- **`--strategy`** — runs only the named strategy IDs
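The selection rules above might be validated along these lines. This is a sketch only: `VectorSpec`, `parse_vector_spec`, and `check_exclusive` are hypothetical, the risk suffix is kept as an unvalidated string, and "given" is assumed to mean the flag was passed explicitly (so the implicit `safe` default does not count).

```rust
/// A parsed `--vector` argument such as `cache-probing:safe`.
#[derive(Debug, PartialEq)]
struct VectorSpec {
    vector: String,
    /// Optional per-vector risk ceiling from the `vector:risk` syntax.
    risk: Option<String>,
}

/// Vector names listed as valid in the text above.
const VALID_VECTORS: [&str; 4] = [
    "cache-probing",
    "error-message-granularity",
    "redirect-diff",
    "status-code-diff",
];

/// Split an argument on the first ':' and check the vector name.
fn parse_vector_spec(arg: &str) -> Result<VectorSpec, String> {
    let (vector, risk) = match arg.split_once(':') {
        Some((v, r)) => (v, Some(r.to_string())),
        None => (arg, None),
    };
    if !VALID_VECTORS.contains(&vector) {
        return Err(format!("unknown vector: {}", vector));
    }
    Ok(VectorSpec { vector: vector.to_string(), risk })
}

/// Reject any combination of two or more explicitly given selection flags.
fn check_exclusive(risk_given: bool, vector_given: bool, strategy_given: bool) -> Result<(), String> {
    let given = [risk_given, vector_given, strategy_given]
        .iter()
        .filter(|flag| **flag)
        .count();
    if given > 1 {
        Err("--risk, --vector and --strategy are mutually exclusive".to_string())
    } else {
        Ok(())
    }
}
```

Here `parse_vector_spec("cache-probing:safe")` yields the vector `cache-probing` with a `safe` ceiling, and `check_exclusive(true, true, false)` returns an error.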

### Risk levels

| Level | What it includes |
|-------|------------------|
| `safe` | Read-only probes; no server state is modified |
| `method-destructive` | State-changing probes (POST, PATCH, PUT); may have side effects but avoid permanent data loss |
| `operation-destructive` | Irreversible probes (DELETE, resource exhaustion); may cause permanent state changes |
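Because the three levels form an ordered scale, a risk ceiling can be modelled as a comparison on an ordered enum. The `Risk` type and helper functions below are hypothetical; the sketch only shows the "at or below the ceiling" rule.

```rust
/// Risk levels in ascending order; derived ordering follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Risk {
    Safe,
    MethodDestructive,
    OperationDestructive,
}

/// Parse the CLI spelling of a risk level.
fn parse_risk(s: &str) -> Option<Risk> {
    match s {
        "safe" => Some(Risk::Safe),
        "method-destructive" => Some(Risk::MethodDestructive),
        "operation-destructive" => Some(Risk::OperationDestructive),
        _ => None,
    }
}

/// A strategy runs only if its risk is at or below the ceiling.
fn allowed(strategy_risk: Risk, ceiling: Risk) -> bool {
    strategy_risk <= ceiling
}
```

With the default ceiling of `safe`, `allowed(Risk::MethodDestructive, Risk::Safe)` is `false`, which is why the registration example needs `--risk method-destructive`.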

### Examples

**Anonymous existence probe — minimal invocation:**

```bash
parlov scan \
  --target "https://api.example.com/users/{id}" \
  --baseline-id "1001"
```

A 403 (baseline) vs 404 (probe) confirms the server reveals which users exist to unauthenticated callers.

**Authenticated IDOR probe — detect resources the caller can't access:**

```bash
parlov scan \
  --target "https://api.example.com/projects/{id}" \
  --baseline-id "proj-abc" \
  --header "Authorization: Bearer eyJhbG..."
```

A 403 vs 404 with a low-privilege token confirms the server discloses which projects exist to authenticated callers who lack access — the classic BOLA enumeration primitive.

**Email enumeration via registration — body-shaped target:**

```bash
parlov scan \
  --target "https://api.example.com/register" \
  --baseline-id "alice@corp.com" \
  --probe-id "nonexistent@corp.com" \
  --body '{"email": "{id}", "password": "test123"}' \
  --risk method-destructive
```

A 409 (existing) vs 201 (new) confirms the server leaks which emails are registered. `--risk method-destructive` is required because POST is non-idempotent; note that a 201 on the probe means the probe address was actually registered.

**Single-vector scan — only test cache-probing strategies:**

```bash
parlov scan \
  --target "https://api.example.com/users/{id}" \
  --baseline-id "1001" \
  --vector cache-probing
```

Narrows the plan to conditional-request strategies (`If-Match`, `If-None-Match`, `Range`, etc.) — useful when you've already swept the status-code vector and want to isolate cache-state oracles.

**CI integration — structured JSON output:**

```bash
parlov --format json scan \
  --target "https://api.example.com/users/{id}" \
  --baseline-id "1001" \
  > findings.json
```

Both `--format json` and `--format sarif` are designed for pipeline ingestion. SARIF v2.1.0 plugs directly into GitHub Advanced Security / Code Scanning.

**Manual verification — reproducible curl per finding:**

```bash
parlov scan \
  --target "https://api.example.com/users/{id}" \
  --baseline-id "1001" \
  --header "Authorization: Bearer eyJ..." \
  --repro
```

Each finding includes baseline and probe `curl` invocations, copy-pasteable for hand verification. Useful when triaging findings or writing up reports.

---

## Output

parlov emits three views of the same scan data: a colored terminal table (default), structured JSON (v1.2.0) for piping, and SARIF v2.1.0 for CI/security-platform ingestion. All three come from the same `EndpointVerdict` + `ScanFinding[]` payload — the differences are layout, naming, and SARIF's parity gaps.

### Output forms

#### `--format table`

ANSI-colored terminal output. Each non-`NotPresent` finding renders the strategy, verdict, severity, evidence (status pair, label, leak description, RFC basis), and the actual probe URL pair. An `EndpointVerdict` summary block follows the per-finding rows with posterior probability, stop reason, strategies-run / total, and first-confirming / final-confirming strategy. When the endpoint status is anything other than `EvidenceObserved`, the table appends `Observability` and `Action` rows explaining what blocked the scan and what to do next.

#### `--format json`

Structured JSON v1.2.0 — fields per [Schema](#schema). With `--verbose`, each finding additionally carries filtered request/response headers and 256-byte body samples; with `--repro`, reproducible `curl` invocations per finding.

#### `--format sarif`

SARIF v2.1.0 suitable for GitHub Advanced Security / Code Scanning ingestion. The same underlying `ScanFinding` data is reshaped into SARIF's rule/result model; parlov domain blocks are preserved verbatim under the spec-blessed `properties` extension point. See [SARIF reference](#sarif-reference) for the field mapping and SARIF-only fields.

### Schema

Canonical field definitions used by both `--format json` and `--format sarif`. See [Field mapping](#field-mapping) for where each parlov field lands in SARIF.

#### Top-level envelope

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `schema_version` | string | always | Schema version — currently `"1.2.0"` |
| `target_url` | string | always | URL template passed to `--target` |
| `endpoint_verdict` | object | `scan` command | Aggregated verdict across all strategies; absent on single-finding output |
| `findings` | array | always | Per-strategy `ScanFinding` entries |

#### `endpoint_verdict`

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `oracle_class` | string | always | e.g. `"Existence"` |
| `verdict` | string | always | `Confirmed`, `Likely`, `Inconclusive`, or `NotPresent` |
| `posterior_probability` | number | always | Bayesian posterior `[0.0, 1.0]` |
| `severity` | string | always | `High` (Confirmed), `Medium` (Likely), `None` otherwise |
| `strategies_run` | integer | always | Strategies dispatched during the scan |
| `strategies_total` | integer | always | Strategies planned at scan start |
| `stop_reason` | string | when stopped | `EarlyAccept`, `EarlyReject`, or `ExhaustedPlan`; omitted while scan is running |
| `first_threshold_crossed_by` | string | exhaustive mode | Strategy ID where the posterior first crossed the confirm threshold |
| `final_confirming_strategy` | string | when Confirmed | First strategy in scan order where cumulative attribution crosses the confirm threshold |
| `contributing_findings` | array | always | Per-strategy log-odds contributions — see below |
| `observability_status` | string | always | `EvidenceObserved`, `ProbedNoEvidence`, `BlockedBeforeOracleLayer`, `PartiallyBlocked`, `Underpowered`, or `SurfaceMismatch` |
| `block_summary` | object | when blocked | Present when `observability_status` is `BlockedBeforeOracleLayer` or `PartiallyBlocked`; omitted otherwise |

#### `contributing_findings[]`

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `strategy_id` | string | always | e.g. `"emg-fk-violation"` |
| `strategy_name` | string | always | Human-readable strategy name |
| `outcome_kind` | string | always | `Positive`, `NoSignal`, `Contradictory`, or `Inapplicable` |
| `log_odds_contribution` | number | always | Log-odds delta applied to the running posterior; `0.0` for `NoSignal` and `Inapplicable` |
| `block_family` | string | `Inapplicable` outcomes | `Authorization`, `Method`, `Parser`, `TechniqueLocal`, or `Surface` |

#### `block_summary`

Present only when `observability_status` is `BlockedBeforeOracleLayer` or `PartiallyBlocked`.

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `expected_observation_opportunities` | integer | always | Total strategies that could have reached the oracle layer |
| `blocked_before_oracle_layer` | integer | always | Count blocked by scan-wide gates before reaching the oracle layer |
| `blocked_fraction` | number | always | `blocked_before_oracle_layer / expected_observation_opportunities` |
| `dominant_block_family` | string | always | Block family causing the most blocks |
| `dominant_block_reasons` | array of strings | always | Operator-facing reason strings from blocked strategies |
| `operator_action` | string | when available | Suggested remediation action |
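The two derived fields in `block_summary` can be illustrated with a short sketch. The function names are hypothetical, and two details are assumptions of this example rather than documented behavior: returning `None` when there were no observation opportunities, and breaking ties between block families alphabetically.

```rust
use std::collections::BTreeMap;

/// blocked_before_oracle_layer / expected_observation_opportunities.
/// Returns None when there were no opportunities, to avoid dividing by zero.
fn blocked_fraction(blocked: u32, expected: u32) -> Option<f64> {
    if expected == 0 {
        None
    } else {
        Some(f64::from(blocked) / f64::from(expected))
    }
}

/// The block family that occurs most often among blocked strategies.
/// Ties resolve to the alphabetically last family, an arbitrary choice here.
fn dominant_family<'a>(families: &[&'a str]) -> Option<&'a str> {
    let mut counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for family in families {
        *counts.entry(*family).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by_key(|(_, count)| *count)
        .map(|(family, _)| family)
}
```

For instance, three blocked strategies out of four opportunities give a `blocked_fraction` of 0.75.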

#### Per-finding fields

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `finding_id` | string | always | Deterministic fingerprint of technique + target + method |
| `strategy` | object | always | `id`, `name`, `method` |
| `result` | object | always | Verdict, confidence, severity, and impact class |
| `technique` | object | always | Detection vector and RFC strength |
| `matched_pattern` | object | always | Pattern label, leak description, and RFC basis |
| `evidence` | object | always | `reasons` (scoring breakdown) and `signals` (typed observations) |
| `probe` | object | always | Baseline/probe URLs and method; `headers` added by `--verbose` |
| `exchange` | object | always | Status codes; `headers` and `body_samples` added by `--verbose` |
| `repro` | object | `--repro` | `baseline_curl` and `probe_curl` as copy-pasteable commands |
| `chain_provenance` | object | Phase 2 findings | `producer_kind` and `producer_value` of the harvested signal |

**`result`**

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `oracle_class` | string | always | e.g. `"Existence"` |
| `verdict` | string | always | `Confirmed`, `Likely`, `Inconclusive`, or `NotPresent` |
| `confidence` | integer | always | `0–100` |
| `severity` | string | always | Signal-dependent: `High`, `Medium`, `Low`, or `None` — not derived from verdict alone |
| `impact_class` | string | when set | `High`, `Medium`, or `Low`; omitted when the pattern carries no impact classification |

**`technique`**

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `id` | string | always | Technique ID, e.g. `"emg-fk-violation"` |
| `vector` | string | always | `StatusCodeDiff`, `CacheProbing`, `ErrorMessageGranularity`, or `RedirectDiff` |
| `normative_strength` | string | always | `Must`, `MustNot`, `Should`, or `May` |

**`matched_pattern`**

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `label` | string | when matched | Short pattern name, e.g. `"State-conflict differential"` |
| `leaks` | string | when matched | Human description of what is leaked |
| `rfc_basis` | string | when matched | e.g. `"RFC 9110 §15.5.10"` |

**`evidence`**

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `reasons` | array | always | Scoring breakdown — see `reasons[]` below |
| `signals` | array | always | Typed observations — see `signals[]` below |

**`evidence.reasons[]`**

| Field | Type | Description |
|-------|------|-------------|
| `description` | string | Human-readable explanation of the contribution |
| `points` | integer | Points added (positive) or subtracted (negative); `i16` range |
| `dimension` | string | `Confidence` or `Impact` |

**`evidence.signals[]`**

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `kind` | string | always | `StatusCodeDiff`, `HeaderPresence`, `HeaderValue`, `BodyDiff`, `TimingDiff`, `MetadataLeak`, or `InputReflection` |
| `evidence` | string | always | Human-readable observation, e.g. `"409 (baseline) vs 404 (probe)"` |
| `rfc_basis` | string | when applicable | RFC section grounding the expected behavior |

**`probe`**

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `baseline_url` | string | always | Baseline URL after `{id}` substitution |
| `probe_url` | string | always | Probe URL after `{id}` substitution |
| `method` | string | always | HTTP method, e.g. `"DELETE"` |
| `headers` | object | `--verbose` | Security-relevant request headers; `baseline` and `probe` sub-keys |

**`exchange`**

| Field | Type | Present | Description |
|-------|------|---------|-------------|
| `baseline_status` | integer | always | HTTP status of the baseline response |
| `probe_status` | integer | always | HTTP status of the probe response |
| `headers` | object | `--verbose` | Security-relevant response headers; `baseline` and `probe` sub-keys |
| `body_samples` | object | `--verbose` | 256-byte UTF-8-safe body samples; `baseline` and `probe` sub-keys |
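A "UTF-8-safe" sample means the cut must not land in the middle of a multi-byte character. One way to do that, assuming the body has already been decoded as valid UTF-8, is to back off from the byte limit to the nearest character boundary. `body_sample` is a hypothetical helper, not parlov's function.

```rust
/// Return at most `max_bytes` bytes of `body`, never splitting a character.
fn body_sample(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    // Walk back from the limit until we sit on a character boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}
```

Calling `body_sample(text, 256)` therefore returns a sample that may be a few bytes shorter than 256 when the limit falls inside a multi-byte character.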

**`repro`**

| Field | Type | Description |
|-------|------|-------------|
| `baseline_curl` | string | Reproducible `curl` for the baseline request; headers verbatim, not redacted |
| `probe_curl` | string | Reproducible `curl` for the probe request; headers verbatim, not redacted |

**`chain_provenance`**

| Field | Type | Description |
|-------|------|-------------|
| `producer_kind` | string | Discriminant of the harvested signal, e.g. `"Etag"`, `"Location"` |
| `producer_value` | string | Serialized value of the harvested signal |

### SARIF reference

#### Field mapping

How parlov [Schema](#schema) fields appear in SARIF output. `—` indicates the field is not exposed in SARIF.

| parlov field | SARIF location |
|--------------|----------------|
| `schema_version` | `version` (`"2.1.0"`, the SARIF format version rather than parlov's JSON schema version) |
| `target_url` | `runs[].properties.target_url` and `runs[].results[].locations[].physicalLocation.artifactLocation.uri` |
| `endpoint_verdict.oracle_class` | — |
| `endpoint_verdict.verdict` | `runs[].properties.endpoint_verdict` |
| `endpoint_verdict.posterior_probability` | `runs[].properties.posterior_probability` |
| `endpoint_verdict.severity` | — |
| `endpoint_verdict.strategies_run` | `runs[].properties.strategies_run` |
| `endpoint_verdict.strategies_total` | `runs[].properties.strategies_total` |
| `endpoint_verdict.stop_reason` | `runs[].properties.stop_reason` |
| `endpoint_verdict.first_threshold_crossed_by` | — |
| `endpoint_verdict.final_confirming_strategy` | — |
| `endpoint_verdict.contributing_findings` | — |
| `endpoint_verdict.observability_status` | `runs[].properties.observability_status` |
| `endpoint_verdict.block_summary.operator_action` | `runs[].properties.operator_action` |
| `endpoint_verdict.block_summary` (other fields) | — |
| `finding.finding_id` | `runs[].results[].fingerprints.oracleFingerprint/v1` |
| `finding.strategy.id` | `runs[].results[].ruleId` (and matched rule's `id`) |
| `finding.strategy.name` | rule `name` (derived as `OracleClass`+`Oracle`) |
| `finding.strategy.method` | `runs[].results[].properties.method` |
| `finding.result.oracle_class` | `runs[].results[].properties.oracle_class` |
| `finding.result.verdict` | `runs[].results[].level` (derived) and `properties.verdict` |
| `finding.result.confidence` | `runs[].results[].properties.confidence` and rule `properties.security-severity` (derived as `confidence/10`) |
| `finding.result.severity` | — |
| `finding.result.impact_class` | `runs[].results[].properties.impact_class` |
| `finding.technique.id` | matches `ruleId` |
| `finding.technique.vector` | rule `properties.vector` |
| `finding.technique.normative_strength` | — |
| `finding.matched_pattern.label` | rule `shortDescription.text` |
| `finding.matched_pattern.leaks` | `runs[].results[].message.text` (fallback: primary signal evidence) |
| `finding.matched_pattern.rfc_basis` | embedded in `relatedLocations[].message.text` |
| `finding.evidence.reasons` | `runs[].results[].properties.reasons` |
| `finding.evidence.signals` | `runs[].results[].relatedLocations[]` — flattened to `[kind] evidence (rfc_basis)` strings |
| `finding.probe` | `runs[].results[].properties.probe` |
| `finding.exchange` | `runs[].results[].properties.exchange` |
| `finding.repro` | `runs[].results[].properties.repro` |
| `finding.chain_provenance` | `runs[].results[].properties.chain_provenance` |
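The flattening of `evidence.signals[]` into `relatedLocations[]` message strings follows the `[kind] evidence (rfc_basis)` shape given in the table. A minimal sketch of that formatting is below; `flatten_signal` is a hypothetical name, and dropping the parenthesised part when `rfc_basis` is absent is an assumption.

```rust
/// Format one signal as `[kind] evidence (rfc_basis)`.
/// When no RFC basis is present, the trailing parenthesised part is omitted.
fn flatten_signal(kind: &str, evidence: &str, rfc_basis: Option<&str>) -> String {
    match rfc_basis {
        Some(rfc) => format!("[{}] {} ({})", kind, evidence, rfc),
        None => format!("[{}] {}", kind, evidence),
    }
}
```

A `StatusCodeDiff` signal with evidence `409 (baseline) vs 404 (probe)` and basis `RFC 9110 §15.5.10` becomes `[StatusCodeDiff] 409 (baseline) vs 404 (probe) (RFC 9110 §15.5.10)`.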

#### SARIF-only fields

Fields present in SARIF that have no direct parlov schema equivalent — either SARIF metadata or values derived from the parlov model.

| SARIF field | Value / derivation |
|-------------|--------------------|
| `$schema` | SARIF v2.1.0 JSON schema URL |
| `version` | `"2.1.0"` |
| `runs[].tool.driver.name` | `"parlov"` |
| `runs[].tool.driver.version` | parlov package version |
| `runs[].tool.driver.rules[]` | One per unique technique that fired — see [Rule definitions](#rule-definitions) below |
| `runs[].results[].level` | `error` (Confirmed), `warning` (Likely), or `note` (Inconclusive). Derived from `finding.result.verdict` |
| `runs[].results[].partialFingerprints.techniqueTargetHash/v1` | `technique_id:host/path` (scheme stripped) — for cross-run deduplication |
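The `technique_id:host/path` derivation can be sketched as simple string handling. This example only strips the scheme and joins the two parts; whether parlov additionally hashes the result (as the field name suggests) or normalises query strings is not stated above, so treat `technique_target_key` as a hypothetical helper.

```rust
/// Build `technique_id:host/path` from a technique ID and a target URL,
/// dropping the scheme (everything up to and including "://").
fn technique_target_key(technique_id: &str, target_url: &str) -> String {
    let without_scheme = match target_url.split_once("://") {
        Some((_, rest)) => rest,
        None => target_url,
    };
    format!("{}:{}", technique_id, without_scheme)
}
```

Because the scheme is removed, the same endpoint scanned over `http` and `https` produces the same key, which is what makes it usable for cross-run deduplication.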

#### Rule definitions

Each unique technique that fired produces one entry in `runs[].tool.driver.rules[]`.

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Technique ID, e.g. `"emg-fk-violation"` |
| `name` | string | `OracleClass` + `Oracle`, e.g. `"ExistenceOracle"` |
| `shortDescription.text` | string | Pattern label, or `"HTTP differential oracle"` when no label matched |
| `properties.oracle_class` | string | Oracle class, e.g. `"Existence"` |
| `properties.vector` | string | Detection vector, e.g. `"ErrorMessageGranularity"` |
| `properties.security-severity` | string | One-decimal float in `[0.0, 10.0]`, derived from `confidence / 10` |
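The two derived SARIF values, `security-severity` and `level`, can be illustrated as follows. Both helpers are hypothetical; clamping confidence to 100 and returning `None` for `NotPresent` (for which no level is listed) are assumptions of this sketch.

```rust
/// `confidence / 10` rendered as a one-decimal string in [0.0, 10.0].
fn security_severity(confidence: u8) -> String {
    // Clamp defensively so out-of-range input cannot exceed 10.0.
    format!("{:.1}", f64::from(confidence.min(100)) / 10.0)
}

/// Map a parlov verdict to a SARIF result level.
fn sarif_level(verdict: &str) -> Option<&'static str> {
    match verdict {
        "Confirmed" => Some("error"),
        "Likely" => Some("warning"),
        "Inconclusive" => Some("note"),
        _ => None,
    }
}
```

A finding with confidence 85 and verdict `Confirmed` would carry a `security-severity` of `"8.5"` and a `level` of `error`.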

---

## License

Dual-licensed under [MIT](LICENSE-MIT) or [Apache-2.0](LICENSE-APACHE) at your option.