https://github.com/pineforge-4pass/pineforge-corpus

PineForge TV validation corpus (maintainers only)

Host: GitHub
URL: https://github.com/pineforge-4pass/pineforge-corpus
Owner: pineforge-4pass
License: apache-2.0
Created: 2026-05-05T14:19:07.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2026-05-31T18:53:20.000Z (18 days ago)
Last Synced: 2026-05-31T20:23:13.621Z (18 days ago)
Language: C++
Size: 51 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Notice: NOTICE


# PineForge validation corpus

The corpus is PineForge's reproducibility kit for the parity claim in the
project README. Every probe is a hand-written, clean-room PineScript v6
strategy paired with TradingView's exported trade list and PineForge's own
trade list, so a third party can diff the two CSVs and confirm engine
behaviour matches TradingView on the same bar feed.

## Headline parity

- **246** verified strategies, all under `corpus/validation/`.
- **~375,000 trades** total across the suite (TV: 375,453; engine: 375,548 — `+95` ≈ 0.03 % over TV).
- **245** excellent (bit-for-bit or within strict thresholds on every
parity dimension).
- **1** documented anomaly — `anomaly-equity-mirror-strategy-equity-01` —
where TradingView's broker emulator exhibits non-deterministic
accept/reject behaviour at the exact 1× equity margin boundary. Engine
is deterministic and correct; full write-up lives in
`pineforge-utils/parity-anomalies/tv-margin-boundary.md`.
- **0** strong / moderate / weak.
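The suite-level delta in the second bullet is plain arithmetic on the two totals. Here is a quick check in Python, the language the verifier scripts use. Only the two totals come from this README.

```python
# Suite-level trade totals, copied from the headline bullet above.
TV_TRADES = 375_453
ENGINE_TRADES = 375_548


def relative_delta_pct(engine: int, tv: int) -> float:
    """Signed engine-minus-TV delta, as a percentage of the TV count."""
    return (engine - tv) / tv * 100.0


delta = ENGINE_TRADES - TV_TRADES
pct = relative_delta_pct(ENGINE_TRADES, TV_TRADES)
print(f"+{delta} trades, {pct:.4f} % over TV")  # prints "+95 trades, 0.0253 % over TV"
```

Rounded to two decimals, 0.0253 % becomes the ≈ 0.03 % quoted above. For scale, the per-probe trade-count tolerance under both threshold profiles is 1.0 %.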

The canonical, regenerated-each-sweep disposition table is
[`validation_report.md`](validation_report.md) (rendered as
`validation_report.html` and `validation_report.pdf`).

## Artifact tuple

Each probe directory ships four files in git:

| File | Source | Role |
| ------------------- | ---------------------------- | ---------------------------------------------------------- |
| `strategy.pine` | hand-written | PineScript v6 source |
| `generated.cpp` | pineforge-codegen transpiler | C++ output of the transpiler over `strategy.pine` |
| `tv_trades.csv` | TradingView export | TV broker emulator's trade list for `strategy.pine` |
| `engine_trades.csv` | PineForge | Engine's trade list for the same script (TV-format CSV) |

`generated.cpp` is the transpiler output of our own clean-room
PineScript and ships under the same Apache-2.0 license as
`strategy.pine`. It is included in-tree so public users can rebuild
without needing access to the separate, source-available `pineforge-codegen`
transpiler — `cmake --build build --target corpus_strategies` compiles
each `generated.cpp` into a per-strategy shared library. The compiled
`strategy.dylib` / `.so` / `.dll` are platform-specific build artefacts
and remain ignored.
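A probe missing any member of this tuple is what the verifier's `missing` tier flags. The sketch below shows one way to audit a checkout for gaps before a sweep. The helper names `missing_artefacts` and `audit` are hypothetical and not part of the corpus scripts.

```python
from pathlib import Path
from typing import Dict, List

# The four files every probe directory is expected to ship in git.
REQUIRED_ARTEFACTS = ("strategy.pine", "generated.cpp", "tv_trades.csv", "engine_trades.csv")


def missing_artefacts(probe_dir: Path) -> List[str]:
    """Names of required artefacts that are absent from probe_dir."""
    return [name for name in REQUIRED_ARTEFACTS if not (probe_dir / name).is_file()]


def audit(validation_root: Path) -> Dict[str, List[str]]:
    """Map each probe directory with gaps to the artefacts it lacks."""
    report: Dict[str, List[str]] = {}
    for probe in sorted(p for p in validation_root.iterdir() if p.is_dir()):
        gaps = missing_artefacts(probe)
        if gaps:
            report[probe.name] = gaps
    return report
```

An empty result from `audit(Path("validation"))` means every probe carries its full tuple.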

## Reference OHLCV

The comparison feed is **Binance ETH-USDT-USDT 15-minute bars**, with an
optional 1-minute companion for `bar_magnifier` and lower-timeframe probes:

- `data/ohlcv_ETH-USDT-USDT_15m.csv` — primary 15m bar feed.
- `data/ohlcv_ETH-USDT-USDT_15m_warmup6m.csv` — same window plus six
months of leading warmup bars; used by default when present so TA, MTF,
pivot, and equity-feedback state starts closer to TradingView's chart
state.
- `data/ohlcv_ETH-USDT-USDT_1m.csv` and `..._1m_warmup6m.csv` — 1m feed
for `magnifier-*` and `ltf-*` probes.
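The "use the warmup file when present" rule amounts to a simple fallback. The following is a minimal sketch that assumes the file-naming scheme shown above. `pick_feed` is a hypothetical helper; the real selection happens in `scripts/run_strategy.py` and may differ in detail.

```python
from pathlib import Path


def pick_feed(data_dir: Path, timeframe: str = "15m", symbol: str = "ETH-USDT-USDT") -> Path:
    """Prefer the six-month warmup variant of a feed; fall back to the primary file."""
    warmup = data_dir / f"ohlcv_{symbol}_{timeframe}_warmup6m.csv"
    primary = data_dir / f"ohlcv_{symbol}_{timeframe}.csv"
    for candidate in (warmup, primary):
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no {timeframe} feed for {symbol} under {data_dir}")
```

For the `magnifier-*` and `ltf-*` probes the same helper would be called with `timeframe="1m"`.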

## Layout

```
corpus/
├── validation/ 246 probes — surface-driven probe family
│ ├── ta-* 50 probes — TA built-in math (rsi, macd, sma, ...)
│ ├── order-* 40 probes — entry/exit/cancel placement
│ ├── bracket-* 13 probes — TP/SL via strategy.exit / strategy.order
│ ├── udt-* 22 probes — user-defined types + methods
│ ├── mtf-* 15 probes — request.security regular HTF
│ ├── matrix-* 6 probes — matrix typed/generic
│ ├── analyzer-* 6 probes — engine analyzer / parity isolation
│ ├── composite-* 52 probes — multi-surface integration (community-style)
│ ├── pyramid-* 4 probes — pyramiding=N
│ ├── oca-* 3 probes — OCA group cancel/reduce/none
│ ├── magnifier-* 3 probes — bar_magnifier sub-bar walks
│ ├── ltf-* 2 probes — request.security_lower_tf arrays
│ ├── session-* 2 probes — session() / TZ / DST
│ ├── recompute-* 2 probes — calc_on_every_tick / TA recompute
│ ├── na-* 2 probes — na propagation
│ ├── cap-* 2 probes — intraday cap (max_intraday_filled_orders)
│ ├── barstate-* 2 probes — barstate.* checks
│ ├── vwap-* 2 probes — VWAP band pricing / fills
│ ├── risk-* 1 probe — risk gates / limits
│ ├── stats-* 1 probe — performance stats / reporting
│ ├── symbol-* 1 probe — ticker/symbol specification mapping
│ ├── timeframe-* 1 probe — script_tf/input_tf timeframe handling
│ └── anomaly-* 1 probe — documented TV non-determinism
├── data/ reference OHLCV (Binance ETH-USDT-USDT 15m + 1m)
├── LICENSE Apache-2.0
├── NOTICE attribution
├── LEGAL.md provenance / trademarks
├── README.md this file
├── CMakeLists.txt per-strategy .so build glob
├── .gitignore ignores compiled strategy.dylib/.so/.dll only
├── validation_report.md canonical parity disposition, regenerated each sweep
└── validation_report.{html,pdf} rendered from .md
```

Total: **246** probes.

## Naming convention

Every probe directory follows:

```
<category>-<slug>-NN[a-z]?
```

- **`<category>`**: one of the 23 surface categories below. The category
is the engine surface or PineScript feature the probe is built to
exercise.
- **`<slug>`**: kebab-case description of the specific
behaviour under test (e.g. `atr-trail-series-int-points`,
`kalman-filter-1d`, `bb-kc-squeeze-release`).
- **`NN`**: two-digit sequence number, used to disambiguate when more
than one probe lands on the same `(category, slug)` pair.
- **`[a-z]?`**: optional letter suffix, used **only** for documented
A/B variant pairs that share the same numeric slot (e.g.
`barstate-isconfirmed-magnifier-on-01a` vs
`…-magnifier-off-01b`).

The 23 categories (with probe counts):

| Category    | Count | Surface exercised                                          |
| ----------- | ----: | ---------------------------------------------------------- |
| `composite` |    52 | Multi-surface integration probes (community-style scripts) |
| `ta`        |    50 | TA built-in math (rsi, macd, sma, hma, …)                  |
| `order`     |    40 | Entry/exit/cancel order placement                          |
| `udt`       |    22 | User-defined types + methods                               |
| `mtf`       |    15 | `request.security` regular HTF                             |
| `bracket`   |    13 | TP/SL via `strategy.exit` / `strategy.order`               |
| `matrix`    |     6 | `matrix` typed and generic                                 |
| `analyzer`  |     6 | Engine analyzer / parity isolation                         |
| `pyramid`   |     4 | `pyramiding=N`                                             |
| `oca`       |     3 | OCA group cancel / reduce / none                           |
| `magnifier` |     3 | `bar_magnifier` sub-bar walks                              |
| `session`   |     2 | `session()` / TZ / DST                                     |
| `recompute` |     2 | `calc_on_every_tick` / TA recompute                        |
| `na`        |     2 | `na` propagation                                           |
| `ltf`       |     2 | `request.security_lower_tf` arrays                         |
| `cap`       |     2 | Intraday cap (`max_intraday_filled_orders`)                |
| `barstate`  |     2 | `barstate.*` checks                                        |
| `vwap`      |     2 | VWAP band pricing / fills                                  |
| `risk`      |     1 | Risk gates / limits                                        |
| `stats`     |     1 | Performance stats / reporting                              |
| `symbol`    |     1 | Ticker/symbol specification mapping                        |
| `timeframe` |     1 | script_tf/input_tf timeframe handling                      |
| `anomaly`   |     1 | Documented TV non-determinism                              |
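The naming rule and the 23-category list can be enforced mechanically. The sketch below is one possible validator, not the corpus's own tooling. Its regular expression encodes two assumptions: categories are single lowercase words, and slugs are lowercase kebab-case.

```python
import re
from typing import NamedTuple, Optional

CATEGORIES = frozenset({
    "composite", "ta", "order", "udt", "mtf", "bracket", "matrix", "analyzer",
    "pyramid", "oca", "magnifier", "session", "recompute", "na", "ltf", "cap",
    "barstate", "vwap", "risk", "stats", "symbol", "timeframe", "anomaly",
})

# <category>-<slug>-NN[a-z]?  (slug segments are [a-z0-9]+ joined by hyphens)
PROBE_RE = re.compile(
    r"^(?P<category>[a-z]+)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<seq>\d{2})(?P<variant>[a-z])?$"
)


class ProbeName(NamedTuple):
    category: str
    slug: str
    seq: int
    variant: Optional[str]  # A/B suffix letter, or None


def parse_probe_name(name: str) -> ProbeName:
    """Split a probe directory name into its parts, rejecting unknown categories."""
    m = PROBE_RE.match(name)
    if m is None or m["category"] not in CATEGORIES:
        raise ValueError(f"not a valid probe name: {name!r}")
    return ProbeName(m["category"], m["slug"], int(m["seq"]), m["variant"])
```

For example, `parse_probe_name("barstate-isconfirmed-magnifier-on-01a")` yields category `barstate`, slug `isconfirmed-magnifier-on`, sequence 1 and variant `a`.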

## Where the numbers come from

The headline figure is produced by the verifier sweep that emits
[`validation_report.md`](validation_report.md). That report is the
authoritative disposition for every probe — tier, profile, per-dimension
deltas, anomaly verdicts.

The full pipeline (build + run + verify across the whole tree) is one
command:

```bash
JOBS=8 scripts/run_corpus.sh
```

That script:

1. Configures CMake with `-DPINEFORGE_BUILD_CORPUS_STRATEGIES=ON`.
2. Builds `libpineforge.a` plus one `strategy.so` per probe via
`cmake --build build --target corpus_strategies`.
3. Loads each `strategy.so` through `scripts/run_strategy.py`, runs it
against `corpus/data/ohlcv_ETH-USDT-USDT_15m.csv` (or the warmup
variant when present), and writes `engine_trades.csv` next to the
probe.
4. Runs `scripts/verify_corpus.py --all` to produce the report.

## Reproducing parity locally

No transpiler access required — `generated.cpp` ships in-tree.

```bash
# 1. Clone the engine and pull this corpus submodule
git clone https://github.com/fullpass-4pass/pineforge-engine.git
cd pineforge-engine
git submodule update --init corpus

# 2. Build all per-strategy .so files, run them, and verify
JOBS=8 scripts/run_corpus.sh
```

You need the engine repo, this corpus, a C++17 compiler, and the CMake
and Python tooling that `scripts/run_corpus.sh` invokes. The engine
is deterministic given a fixed bar feed, the shipped `generated.cpp`,
and a fixed runtime build. If a probe's rebuilt `engine_trades.csv`
disagrees with the committed copy, that is a bug; please open an issue.
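Because the engine is deterministic, a rebuilt `engine_trades.csv` should match the committed copy line for line. The sketch below locates the first diverging row when it does not. It assumes you saved the committed version separately before the rerun (for example with `git show HEAD:<path>`). `first_difference` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_difference(committed: Path, rebuilt: Path) -> Optional[Tuple[int, str, str]]:
    """Return (1-based line number, committed line, rebuilt line) at the first mismatch, or None."""
    old = committed.read_text().splitlines()
    new = rebuilt.read_text().splitlines()
    for i in range(max(len(old), len(new))):
        # A shorter file is padded with an explicit end-of-file marker.
        left = old[i] if i < len(old) else "<EOF>"
        right = new[i] if i < len(new) else "<EOF>"
        if left != right:
            return i + 1, left, right
    return None
```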

## CSV format

Both `tv_trades.csv` and `engine_trades.csv` use TradingView's row layout:

- **Two rows per trade**, sharing the same `Trade #`. The exit row is
emitted before the entry row (TV convention; PineForge mirrors it for
direct diff).
- **Reverse-chronological by trade number** (newest first).
- **Time format**: `YYYY-MM-DD HH:MM`. Engine CSVs are UTC. TradingView
exports use the chart's wall-clock timezone; this corpus defaults to
UTC+8 unless a probe's `inputs.json` overrides it via `tv_trades_csv_tz`.
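Engine and TV rows can only be compared once both clocks agree. The following is a minimal sketch of normalising both to UTC, assuming the default fixed UTC+8 offset. This README does not specify the format of a per-probe `tv_trades_csv_tz` override. If it holds a named zone rather than an offset, `zoneinfo.ZoneInfo` would replace the fixed offset used here.

```python
from datetime import datetime, timedelta, timezone

TIME_FMT = "%Y-%m-%d %H:%M"


def tv_time_to_utc(stamp: str, utc_offset_hours: float = 8.0) -> datetime:
    """Read a TV export timestamp in a fixed-offset zone and convert it to UTC."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.strptime(stamp, TIME_FMT).replace(tzinfo=local_tz).astimezone(timezone.utc)


def engine_time(stamp: str) -> datetime:
    """Engine CSV timestamps are already UTC; just attach the zone."""
    return datetime.strptime(stamp, TIME_FMT).replace(tzinfo=timezone.utc)
```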

`tv_trades.csv` (TradingView's actual export):

```
Trade #,Type,Date and time,Signal,Price USDT,Position size (qty),...
14,Exit long,2026-04-27 20:30,TPSL,2291.52,1,2291.52,...
14,Entry long,2026-04-27 20:15,…,2289.7,1,2289.7,…
```

`engine_trades.csv` (PineForge's mirrored format, fewer columns —
PineForge does not currently emit TV's "Signal" tag or percent-of-position
MFE/MAE):

```
Trade #,Type,Date and time,Price,Qty,Net PnL,Net PnL %,MFE,MAE,Cumulative PnL
14,Exit long,2026-04-27 20:30,2291.520000,1,1.820000,0.0795,2.250000,0.160000,-38.120000
14,Entry long,2026-04-27 20:15,2289.700000,1,1.820000,0.0795,2.250000,0.160000,-38.120000
```

`Net PnL` and `Net PnL %` are per-trade. `Cumulative PnL` is the
engine-side running total. `MFE`/`MAE` are dollars per unit qty,
mirroring TV's "Favorable / Adverse excursion USD".
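You can cross-check the per-trade columns from the price legs alone. This sketch pairs the exit and entry rows by `Trade #` and recomputes `Net PnL` and `Net PnL %`. It assumes zero commission and slippage, and it assumes `Net PnL %` is measured against entry notional. That assumption reproduces the 1.82 / 0.0795 figures in the sample above but is not a documented definition.

```python
import csv
import io
from collections import defaultdict


def recompute_pnl(csv_text: str) -> dict:
    """Map Trade # -> (net PnL, net PnL %) rebuilt from entry/exit price and qty.

    Assumes no commission or slippage and a percentage relative to entry notional.
    """
    legs = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        side = "exit" if row["Type"].startswith("Exit") else "entry"
        legs[int(row["Trade #"])][side] = row

    result = {}
    for trade, pair in legs.items():
        entry, exit_row = pair["entry"], pair["exit"]
        qty = float(entry["Qty"])
        direction = 1.0 if entry["Type"].endswith("long") else -1.0
        entry_px, exit_px = float(entry["Price"]), float(exit_row["Price"])
        pnl = direction * (exit_px - entry_px) * qty
        result[trade] = (pnl, pnl / (entry_px * qty) * 100.0)
    return result
```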

## Parity thresholds

The verifier (`scripts/verify_corpus.py`) applies one of two threshold
profiles per probe and emits a tier label:

### Profiles

| Dimension | STRICT | PRODUCTION |
| -------------------------- | ------: | ---------: |
| Trade-count delta | 1.0% | 1.0% |
| Entry-price p90 delta | 0.01% | 0.01% |
| Exit-price p90 delta | 0.01% | 0.05% |
| Per-trade P&L p90 delta | 1.0% | 1.0% |

PRODUCTION relaxes only the exit-price tolerance (5×) to absorb sub-bar
broker-side fill drift on probes that use `strategy.exit`. The verifier
auto-detects `strategy.exit` in `strategy.pine` and selects PRODUCTION
for those probes; everything else stays on STRICT.
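The profile table and the auto-detection rule translate directly into data plus a predicate. The sketch rests on two assumptions that this README does not state. First, detection is a plain substring test, so a `strategy.exit` inside a comment would also switch the profile. Second, a delta exactly equal to its tolerance counts as a pass.

```python
from typing import Dict, List

# Tolerances from the profile table, all in percent.
PROFILES: Dict[str, Dict[str, float]] = {
    "STRICT": {
        "trade_count": 1.0, "entry_price_p90": 0.01,
        "exit_price_p90": 0.01, "pnl_p90": 1.0,
    },
    "PRODUCTION": {
        "trade_count": 1.0, "entry_price_p90": 0.01,
        "exit_price_p90": 0.05, "pnl_p90": 1.0,
    },
}


def resolve_profile(pine_source: str) -> str:
    """PRODUCTION when the script mentions strategy.exit, otherwise STRICT (naive substring test)."""
    return "PRODUCTION" if "strategy.exit" in pine_source else "STRICT"


def failing_dimensions(deltas_pct: Dict[str, float], profile: str) -> List[str]:
    """Dimensions whose absolute delta exceeds the profile's tolerance."""
    limits = PROFILES[profile]
    return [dim for dim, limit in limits.items() if abs(deltas_pct[dim]) > limit]
```

An empty list means all four dimensions pass the resolved profile.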

A trade is "matched" when engine and TV agree on direction and their
entry and exit times fall within a 1-hour gating window, with an extra $3
entry-price gate to defend against same-bar duplicates. The PnL p90
calculation applies a near-zero filter (`|tv_pnl| > $0.01`) so that TV's
zero-PnL magnifier trades do not blow up the relative delta through
division by a near-zero value.
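The gates and the near-zero filter can be sketched as follows. This README does not describe how the verifier pairs trades or computes its percentile, so three choices here are assumptions: greedy one-to-one pairing, a nearest-rank 90th percentile, and a relative delta of |engine − TV| / |TV|. Treat the code as an illustration of the rules, not as the verifier's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

TIME_GATE = timedelta(hours=1)   # max entry/exit time skew between engine and TV
ENTRY_PRICE_GATE = 3.0           # max entry-price gap in dollars
PNL_FLOOR = 0.01                 # TV trades with |pnl| <= this are left out of the PnL p90


@dataclass
class Trade:
    direction: str               # "long" or "short"
    entry_time: datetime         # UTC
    exit_time: datetime          # UTC
    entry_price: float
    pnl: float


def is_match(eng: Trade, tv: Trade) -> bool:
    """True when a pair passes the direction, time, and entry-price gates."""
    return (
        eng.direction == tv.direction
        and abs(eng.entry_time - tv.entry_time) <= TIME_GATE
        and abs(eng.exit_time - tv.exit_time) <= TIME_GATE
        and abs(eng.entry_price - tv.entry_price) <= ENTRY_PRICE_GATE
    )


def match_trades(engine: List[Trade], tv: List[Trade]) -> List[Tuple[Trade, Trade]]:
    """Greedy one-to-one pairing: each TV trade takes the first unused engine trade that passes."""
    used = set()
    pairs = []
    for t in tv:
        for i, e in enumerate(engine):
            if i not in used and is_match(e, t):
                used.add(i)
                pairs.append((e, t))
                break
    return pairs


def p90(values: List[float]) -> Optional[float]:
    """Nearest-rank 90th percentile (rank = ceil(0.9 * n), computed in integers)."""
    if not values:
        return None
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def pnl_p90_delta_pct(pairs: List[Tuple[Trade, Trade]]) -> Optional[float]:
    """p90 of |engine - TV| / |TV| PnL, in percent, over pairs with non-negligible TV PnL."""
    deltas = [
        abs(e.pnl - t.pnl) / abs(t.pnl) * 100.0
        for e, t in pairs
        if abs(t.pnl) > PNL_FLOOR
    ]
    return p90(deltas)
```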

### Tier labels

| Tier | Meaning |
| ------------- | ------- |
| `excellent` | All four dimensions pass the resolved profile. Bit-for-bit or within strict-profile thresholds. |
| `strong` | Dimensions pass a relaxed envelope (5× thresholds) — close but not excellent. Used as a pass-with-caveat tier. |
| `moderate` | Some dimensions exceed the strong envelope but trades still align meaningfully. Investigate. |
| `weak` | Significant divergence. Real bug or probe-design issue. |
| `minimal` | Probe produces zero engine trades or zero TV trades — nothing to compare. |
| `anomaly` | Engine output is correct per Pine spec; TV is non-deterministic on this probe. Documented per-probe via `inputs.json::expected_tier: "anomaly"` plus a `notes` field with the deep-analysis write-up. Excluded from headline excellent count. Currently 1 probe (`anomaly-equity-mirror-strategy-equity-01`). |
| `engine_only` | Engine produces correct trades that intentionally diverge from TV (e.g., engine fires a bar TV's broker emulator silently drops). Documented per-probe via `inputs.json::validation_overrides::expect_tv_match: false` plus an `expect_tv_match_reason` write-up. Currently 0 probes. |
| `missing` | Required artefact (TV CSV or engine CSV) absent. Should never appear in committed state. |

The `anomaly` and `engine_only` overrides fire only when the computed
tier would be below `excellent`. A future engine fix that lifts a
documented divergence to a bit-for-bit match therefore still reports as
`excellent` instead of being silently masked by the override.
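The override rule reads naturally as a guard clause placed ahead of the per-probe annotations. Here is a minimal sketch. Two points are assumptions this README does not settle: which override wins when both are set, and whether the overrides also apply to `minimal` or `missing`.

```python
from typing import Optional


def resolve_tier(computed: str, expected_tier: Optional[str] = None,
                 expect_tv_match: bool = True) -> str:
    """Apply documented per-probe overrides without masking a genuine excellent result.

    computed        -- tier from the threshold checks
    expected_tier   -- inputs.json::expected_tier, if set
    expect_tv_match -- inputs.json::validation_overrides::expect_tv_match (default True)
    """
    if computed == "excellent":
        return computed  # parity reached: report it even if an override is documented
    if expected_tier == "anomaly":
        return "anomaly"
    if not expect_tv_match:
        return "engine_only"
    return computed
```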

## Publishing posture

The corpus is published under **Apache-2.0**, matching the engine. Every
`strategy.pine` is a clean-room PineForge original — no third-party
PineScript is redistributed. TradingView trade-list CSVs are factual
records of running each script on TV's broker emulator, included only for
parity verification. OHLCV is public market data from Binance USDT-M
futures. See [`LEGAL.md`](LEGAL.md) for the full provenance and trademark
notes.
