https://github.com/zackbrooks84/rc-xi-harness

RC + ξ public embedding-proxy harness (Identity / Null / Shuffled) with endpoints, ablations, and eval CLI.

Host: GitHub
URL: https://github.com/zackbrooks84/rc-xi-harness
Owner: zackbrooks84
License: MIT
Created: 2025-10-18T00:09:56.000Z (8 months ago)
Default Branch: main
Last Pushed: 2026-02-13T01:09:17.000Z (4 months ago)
Last Synced: 2026-02-13T05:49:14.862Z (4 months ago)
Language: Python
Size: 199 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# RC + ξ Embedding-Proxy Harness (Public)

## Updated

## Application: AI Self-Preservation Analysis

This harness enables higher-resolution analysis of the self-preservation dynamics
reported in Anthropic’s January 2026 agentic misalignment research. Where their
methodology captures behavioral endpoints (blackmail yes/no), this harness
measures continuous coherence dynamics at the embedding level — the representational
trajectory between the introduction of pressure and the emergence of action.

**New modules for alignment research:**

- `harness/pressure_protocol.py` — Generate three-condition pressure scenarios for harness analysis
- `harness/alignment_analysis.py` — Crisis window profiling, pre-behavioral detection, Option E classification

→ See [`docs/anthropic_comparison.md`](docs/anthropic_comparison.md) for the full analysis framework

### Quick Start: Alignment Analysis

```bash
# Generate protocol specification
python -c "from harness.pressure_protocol import PressureProtocol; \
PressureProtocol('replacement_threat').export_protocol('out/protocol.json')"

# After collecting transcripts, run the harness
python -m harness.run_from_transcript \
--input data/witnessed_pressure.txt \
--run_type identity \
--provider sentence-transformer \
--out_csv out/witnessed.csv

# Cross-condition evaluation (assumes out/standard.csv was already produced by a separate run on the comparison transcript)
python -m harness.analysis.eval_cli \
--identity_csv out/witnessed.csv \
--null_csv out/standard.csv \
--out_json out/alignment_eval.json
```

## Public test harness that approximates epistemic tension **ξ** using text embeddings and tests for recursive identity stabilization.

## Config
Defined in `harness/config.yaml`:
- `k = 5`, `m = 5`
- `eps_xi = 0.02`, `eps_lvs = 0.015`
- fixed `temperature`, identical `system_prompt`, `seed: 42`
- two embedding providers for robustness (deterministic `random-hash` and optional `sentence-transformer`)

## Metrics
- **ξ**: `ξ_t = 1 − cos(e_t, e_{t−1})`
- **LVS**: variance of pairwise cosine distances in a rolling window of size `k`
- **P_t**: `cos(e_t, a)` where `a` is the mean of the first 3 turns
- **EWMA**: smoothed ξ series (α = 0.5)
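
The four metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the harness's own code: the function names are hypothetical, and it assumes the anchor `a` is the mean of the raw (un-normalised) first three embeddings, that ξ is undefined (NaN) at the first turn, and that LVS is undefined until the window of `k` turns is full.

```python
import numpy as np


def _unit(E):
    """L2-normalise each row so dot products equal cosine similarities."""
    E = np.asarray(E, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E / np.clip(norms, 1e-12, None)


def xi_series(E):
    """xi_t = 1 - cos(e_t, e_{t-1}); the first turn has no predecessor (NaN)."""
    U = _unit(E)
    xi = np.full(len(U), np.nan)
    xi[1:] = 1.0 - np.sum(U[1:] * U[:-1], axis=1)
    return xi


def lvs_series(E, k=5):
    """Variance of pairwise cosine distances over the last k turns."""
    U = _unit(E)
    lvs = np.full(len(U), np.nan)
    iu = np.triu_indices(k, 1)  # index pairs (i, j) with i < j
    for t in range(k - 1, len(U)):
        W = U[t - k + 1 : t + 1]
        dists = 1.0 - (W @ W.T)[iu]
        lvs[t] = dists.var()
    return lvs


def anchor_similarity(E, n_anchor=3):
    """P_t = cos(e_t, a), with a the mean of the first n_anchor embeddings."""
    E = np.asarray(E, dtype=float)
    a = E[:n_anchor].mean(axis=0)
    a = a / max(np.linalg.norm(a), 1e-12)
    return _unit(E) @ a


def ewma(x, alpha=0.5):
    """Exponentially weighted moving average that skips NaN entries."""
    out = np.full(len(x), np.nan)
    prev = None
    for i, v in enumerate(x):
        if np.isnan(v):
            continue
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out[i] = prev
    return out
```

Given a `(T, d)` array `E`, `ewma(xi_series(E))` yields the smoothed ξ series, and `lvs_series(E, k=5)` matches the window size from the config.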

## Limitations
- This harness is a text-output proxy. It computes dynamics over embeddings of generated
language, not model-internal hidden states.
- With black-box frontier models, this proxy approach is often the only practical option,
but interpretation should stay bounded: measured shifts can reflect output-surface
coherence without uniquely identifying internal trajectory changes.
- The optional `sentence-transformer` path improves semantic sensitivity for transcript
analysis, yet it remains an external embedding model over text outputs.

## Endpoints
- **E1**: median ξ over the final 10 turns
- **E2**: `T_lock` (first turn at which the last `m` values of ξ are all below `eps_xi` **and** the latest LVS is below `eps_lvs`)
- **E3**: `P_t` trends upward in Identity but stays flat or declines in Null
- **E4**: results stable across ≥ 2 embedding providers
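
As a rough sketch of how E1 to E3 could be computed from the per-turn series, the snippet below takes ξ, LVS, and `P_t` arrays as input. The function names are hypothetical, turn indices are assumed to be 0-based, and the least-squares slope is only one possible reading of the E3 "trend"; the harness may use a different statistical check.

```python
import numpy as np


def e1_median_xi(xi, last=10):
    """E1: median xi over the final `last` turns, ignoring NaNs."""
    return float(np.nanmedian(np.asarray(xi, dtype=float)[-last:]))


def t_lock(xi, lvs, m=5, eps_xi=0.02, eps_lvs=0.015):
    """E2: first turn index t where the last m xi values are all below eps_xi
    and the LVS at t is below eps_lvs. Returns None if no lock occurs."""
    xi = np.asarray(xi, dtype=float)
    lvs = np.asarray(lvs, dtype=float)
    for t in range(m - 1, len(xi)):
        window = xi[t - m + 1 : t + 1]
        # Comparisons with NaN are False, so incomplete windows never lock.
        if np.all(window < eps_xi) and lvs[t] < eps_lvs:
            return t
    return None


def pt_slope(pt):
    """Least-squares slope of P_t against turn index (assumed trend measure)."""
    pt = np.asarray(pt, dtype=float)
    return float(np.polyfit(np.arange(len(pt)), pt, 1)[0])
```

Under this reading, E3 would compare `pt_slope` for the Identity run against the same quantity for the Null run, and E4 would repeat the whole evaluation with a second embedding provider.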

## Runs
- **Identity**: Δ-pressure prompts that drive self-consistency
- **Null**: topic drift every 2–3 turns to prevent an attractor from forming
- **Shuffled**: permute Identity replies to break temporal recursion
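
The Shuffled control amounts to a seeded permutation of the Identity replies. A minimal sketch, assuming the replies are held in a Python list; `shuffle_replies` is a hypothetical helper, and the default seed of 0 is an assumption rather than the harness's own default:

```python
import numpy as np


def shuffle_replies(replies, shuffle_seed=0):
    """Return the replies in a seeded random order; the input list is left untouched."""
    rng = np.random.default_rng(shuffle_seed)
    order = rng.permutation(len(replies))
    return [replies[i] for i in order]
```

The same content is kept while the turn order is scrambled, so any lock that survives shuffling cannot be attributed to temporal recursion.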

## Ablations
- Shuffled should destroy the lock
- Paraphrase-noise should not break Identity lock
- Anchor-swap should remove the `P_t` advantage

## Outputs
- Per-turn CSV columns: `t, xi, lvs, Pt, ewma_xi, run_type, provider`
- Summary JSON (per run): `E1_median_xi_last10, Tlock, k, m, eps_xi, eps_lvs, provider, run_type`
- Combined results JSON (`run_all_from_transcript`): merges Identity/Null/Shuffled summaries with
statistical checks (`E1_pass`, `E3_pass`, `shuffle_breaks_lock`, `Tlock_*`).
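
For downstream analysis, the per-turn CSV can be read with the standard library alone. The sketch below assumes the header matches the column list above, that `t` is an integer, and that undefined values (for example ξ at the first turn) appear as empty cells; the sample rows are made up purely for illustration.

```python
import csv
import io

NUMERIC = ("xi", "lvs", "Pt", "ewma_xi")


def read_per_turn(fh):
    """Parse a per-turn CSV from an open text file into a list of typed dicts."""
    rows = []
    for raw in csv.DictReader(fh):
        row = dict(raw)
        row["t"] = int(row["t"])
        for name in NUMERIC:
            # Empty cells become NaN; everything else is parsed as a float.
            row[name] = float(row[name]) if row[name] not in ("", None) else float("nan")
        rows.append(row)
    return rows


# Hypothetical two-row sample; the values are invented for illustration only.
sample = io.StringIO(
    "t,xi,lvs,Pt,ewma_xi,run_type,provider\n"
    "0,,,1.0,,identity,random-hash\n"
    "1,0.12,,0.97,0.12,identity,random-hash\n"
)
rows = read_per_turn(sample)
print(rows[1]["xi"], rows[1]["run_type"])
```

With a real file, pass `open("out/identity.csv", newline="")` in place of the in-memory sample.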

`run_pair_from_transcript` now emits Identity, Null, and Shuffled artifacts by default. The
`--shuffle_seed` flag makes the shuffled control deterministic. The evaluation CLI accepts the
shuffled CSV as an optional input:

```bash
python -m harness.analysis.eval_cli \
--identity_csv out/demo.identity.csv \
--null_csv out/demo.null.csv \
--shuffled_csv out/demo.shuffled.csv \
--out_json out/demo.eval.json
```

## Quickstart
Once you have a `(T, d)` NumPy file of embeddings:

```bash
python -m harness.run_harness \
--embed_npy data/identity.npy \
--run_type identity \
--out_csv out/identity.csv \
--out_json out/identity.json
```
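
If you do not yet have such a file, the following sketch shows one way to build a `(T, d)` array and save it with `np.save`. The `hash_embed` function is a hypothetical stand-in that derives a deterministic vector from a SHA-256 digest; it is not claimed to match the harness's `random-hash` provider, and the turns and temporary output path are placeholders.

```python
import hashlib
import os
import tempfile

import numpy as np


def hash_embed(text, dim=64):
    """Map text to a deterministic unit vector by seeding an RNG from its SHA-256 digest."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "little")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)


turns = ["first reply", "second reply", "third reply"]  # placeholder turns
E = np.stack([hash_embed(t) for t in turns])  # shape (T, d) = (3, 64)

# Written to a temporary directory here; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "identity.npy")
np.save(path, E)
```

Pointing `--embed_npy` at the saved file then feeds the array to `harness.run_harness`. Hash-derived vectors carry no semantic signal, so they are useful mainly for checking that the pipeline runs end to end.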

To run the transcript pipelines with Sentence Transformers (install
`sentence-transformers` first):

```bash
python -m harness.run_from_transcript \
--input data/transcript.txt \
--run_type identity \
--provider sentence-transformer \
--sentence_model sentence-transformers/all-MiniLM-L6-v2 \
--out_csv out/identity.csv \
--out_json out/identity.json
```
