https://github.com/samuraiwriter7/purity-detection-algorithm-v0.1

A draft algorithmic specification for estimating origin purity, AI-generated ratio, warning flags, and review readiness in AI source-preservation systems.
https://github.com/samuraiwriter7/purity-detection-algorithm-v0.1

ai-governance ai-provenance civilization-os epicenter-preservation model-collapse purity-protocol rag-provenance royalty-os source-preservation trace-protocol

Last synced: 3 days ago
JSON representation

A draft algorithmic specification for estimating origin purity, AI-generated ratio, warning flags, and review readiness in AI source-preservation systems.

Host: GitHub
URL: https://github.com/samuraiwriter7/purity-detection-algorithm-v0.1
Owner: SamuraiWriter7
License: mit
Created: 2026-05-25T01:49:11.000Z (17 days ago)
Default Branch: main
Last Pushed: 2026-05-25T02:11:23.000Z (17 days ago)
Last Synced: 2026-05-25T03:25:39.673Z (17 days ago)
Topics: ai-governance, ai-provenance, civilization-os, epicenter-preservation, model-collapse, purity-protocol, rag-provenance, royalty-os, source-preservation, trace-protocol
Homepage:
Size: 24.4 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Citation: CITATION.cff

Awesome Lists containing this project

README

# Epicenter Preservation OS Protocol v0.1

**Status:** Draft v0.1.1
**Version:** 0.1.1
**Release Date:** 2026-05-25

Epicenter Preservation OS Protocol is a draft specification for preserving primary sources, reference traces, value circulation, and model-collapse monitoring in AI civilization.

This protocol defines a minimal structure for identifying and preserving **Epicenters** — primary sources of thought, expression, observation, practice, and structural invention — while making AI reference events, purity signals, and royalty-readiness review more traceable.

The goal is not to prohibit AI-assisted creation.

The goal is to preserve the source ecology of AI civilization.

---

## Concept

AI systems increasingly depend on human-created sources:

- articles
- books
- notes
- logs
- datasets
- talks
- protocols
- structural concepts
- firsthand observations

However, when AI systems consume, summarize, rewrite, and recursively reuse these sources, the original Epicenter can become difficult to identify.

This creates several risks:

- primary sources become invisible
- reference events are not recorded
- creators are not credited or compensated
- synthetic data begins to replace natural data
- model-collapse risk increases
- AI civilization loses source diversity

Epicenter Preservation OS Protocol proposes a minimal structure for addressing these risks.

```text
Epicenter
↓
ReferenceEvent
↓
TraceProtocol
↓
PurityProtocol
↓
CollapseMonitor
↓
RoyaltyProtocol / Allocation Review
```

---

## Core Idea

The protocol separates five layers:

1. **Epicenter Layer**
Identifies the original source unit.

2. **Reference Layer**
Records when an AI system references or uses an Epicenter.

3. **Purity Layer**
Estimates the origin composition of a source, including natural, synthetic, hybrid, or recursive synthetic signals.

4. **Collapse Monitoring Layer**
Observes ecosystem-level risks caused by declining natural-data ratios or recursive synthetic reuse.

5. **Royalty / Value Circulation Layer**
Provides a path toward auditable value return, without directly automating final allocation decisions.

---

## Design Principles

### 1. Preserve the Source

AI civilization depends on primary sources.

If the source layer collapses, downstream AI systems become increasingly shallow, repetitive, and derivative.

### 2. Trace Before Allocation

Reference events must be recorded before any royalty or compensation logic is applied.

```text
No trace
↓
No reliable allocation
```

### 3. Purity Is Not Moral Judgment

Origin purity is not a measure of artistic value, legal authorship, or human worth.

It is an ecological signal for understanding data composition.

```text
Purity ≠ Value
Purity ≠ Copyright
Purity ≠ Authorship
Purity = estimated origin composition
```

### 4. AI-Assisted Creation Is Not Invalid

The protocol does not reject AI-assisted work.

Instead, it distinguishes:

- human-origin sources
- AI-assisted sources
- hybrid sources
- synthetic sources
- recursively generated sources

### 5. Review Before Royalty

Purity scores and trace records should support review.

They should not automatically determine legal ownership or final payment.

---

## Repository Structure

```text
.
├── README.md
├── LICENSE
├── CHANGELOG.md
├── CITATION.cff
├── spec/
│ └── epicenter-preservation-os-v0.1.yaml
├── schemas/
│ ├── epicenter-preservation-os.schema.json
│ └── purity-assessment.schema.json
├── docs/
│ └── purity-detection-algorithm.md
├── examples/
│ ├── epicenter.sample.yaml
│ ├── reference-event.sample.yaml
│ ├── royalty-record.sample.yaml
│ ├── model-profile.sample.yaml
│ └── purity-assessment.sample.yaml
└── .github/
└── workflows/
└── validate-examples.yml
```

---

## Key Documents

### `spec/epicenter-preservation-os-v0.1.yaml`

Defines the core entities and protocol layers of the Epicenter Preservation OS, including:

- `Epicenter`
- `ReferenceEvent`
- `RoyaltyRecord`
- `ModelProfile`
- `TraceProtocol`
- `RoyaltyProtocol`
- `PurityProtocol`
- `CollapseMonitor`

---

### `docs/purity-detection-algorithm.md`

Defines the draft algorithmic framework for estimating:

- `origin_purity_score`
- `ai_generated_ratio`
- `warning_flags`

This document supports:

- natural / synthetic data separation
- hybrid data classification
- recursive synthetic data risk detection
- model-collapse monitoring
- reference auditing
- royalty-readiness review

---

### `schemas/epicenter-preservation-os.schema.json`

Provides a simplified machine-readable schema for validating the main protocol objects.

---

### `schemas/purity-assessment.schema.json`

Provides a machine-readable validation schema for purity assessment records, including:

- signal scores
- confidence values
- warning flags
- review status
- downstream-use permissions

---

### `examples/`

Contains sample objects for:

- Epicenter registration
- AI reference logging
- royalty records
- model profiles
- purity assessment results

---

## Start Here

The recommended reading order is:

### 1. Core Protocol

Start with:

```text
spec/epicenter-preservation-os-v0.1.yaml
```

This file defines the main structure of the protocol.

It introduces the core entities:

- `Epicenter`
- `ReferenceEvent`
- `RoyaltyRecord`
- `ModelProfile`

And the core protocol layers:

- `TraceProtocol`
- `RoyaltyProtocol`
- `PurityProtocol`
- `CollapseMonitor`

---

### 2. Purity Detection Algorithm

Then read:

```text
docs/purity-detection-algorithm.md
```

This document explains how the system may estimate the origin purity of an Epicenter.

It defines:

- natural data
- synthetic data
- hybrid data
- recursive synthetic data
- `origin_purity_score`
- `ai_generated_ratio`
- `warning_flags`
- human / multi-wing review triggers

This layer is especially important because the protocol does not treat AI-assisted creation as invalid.

Instead, it separates ecological data-quality signals from legal authorship, copyright ownership, or final compensation decisions.

---

### 3. Purity Assessment Example

Review:

```text
examples/purity-assessment.sample.yaml
```

This file shows how a purity assessment result may be represented in practice.

It includes:

- input signal scores
- score basis notes
- `origin_purity_score`
- `ai_generated_ratio`
- `warning_flags`
- review status
- downstream-use decisions

---

### 4. Schema Layer

Review:

```text
schemas/epicenter-preservation-os.schema.json
schemas/purity-assessment.schema.json
```

These files provide machine-readable validation structures for the core protocol objects and purity assessment outputs.

---

### 5. Example Objects

Finally, review the remaining sample files in:

```text
examples/
```

These examples show how Epicenters, reference events, royalty records, model profiles, and purity assessment results may be represented in practice.

---

## Validation

This repository includes a GitHub Actions workflow for validating example files against their corresponding JSON Schemas.

The validation workflow is defined in:

```text
.github/workflows/validate-examples.yml
```

Current validation targets:

```text
examples/purity-assessment.sample.yaml
↓
schemas/purity-assessment.schema.json
```

The workflow checks that the sample purity assessment object conforms to the schema definition, including:

- required fields
- score ranges from `0.0` to `1.0`
- allowed `method` values
- allowed `warning_flags`
- review status structure
- downstream-use permission structure
- ISO 8601 date-time format for `assessed_at`

This validation layer supports the `PurityProtocol` defined in the core protocol by ensuring that `origin_purity_score`, `ai_generated_ratio`, and `warning_flags` can be represented in a machine-readable and testable format.

To run the validation automatically, push changes to the `main` branch or open a pull request.

The workflow can also be triggered manually from the GitHub Actions tab.

### Local Validation Concept

The GitHub Actions workflow uses:

```text
Python 3.12
jsonschema
PyYAML
```

The validation process is:

```text
Load YAML example
↓
Load JSON Schema
↓
Validate example against schema
↓
Report pass / fail
```

If validation fails, the workflow prints the failing field path and the corresponding schema error.

This makes the protocol easier to maintain as the specification evolves.

---

## Core Entities

### Epicenter

An Epicenter is a primary source unit.

It may represent:

- an article
- a note
- a book
- a talk
- a log
- a dataset
- a protocol
- a structural concept

Example fields include:

- `id`
- `type`
- `source_platform`
- `author_id`
- `created_at`
- `updated_at`
- `language`
- `tags`
- `structure_fingerprint`
- `origin_purity_score`
- `ai_generated_ratio`
- `metadata`

---

### ReferenceEvent

A ReferenceEvent records when an AI system references, uses, retrieves, trains on, or reasons from an Epicenter.

Example fields include:

- `id`
- `model_id`
- `epicenter_id`
- `timestamp`
- `context_window_hash`
- `usage_type`
- `request_id`
- `trace_id`
- `weight`

Supported usage types may include:

- `training`
- `inference`
- `rag`
- `fine_tune`

---

### RoyaltyRecord

A RoyaltyRecord represents a possible value-circulation record derived from reference activity.

Example fields include:

- `id`
- `epicenter_id`
- `creator_id`
- `total_reference_weight`
- `amount`
- `period`
- `settlement_status`
- `settlement_channel`

This object does not automatically prove legal payment entitlement.

It represents a candidate accounting layer for future review and settlement.

---

### ModelProfile

A ModelProfile describes model-side purity and source-dependency characteristics.

Example fields include:

- `id`
- `provider`
- `type`
- `training_data_purity`
- `collapse_risk_score`
- `epicenter_dependency_index`

This object helps monitor whether AI systems remain connected to diverse primary sources or drift toward recursive synthetic dependency.

---

## Protocol Layers

### TraceProtocol

Defines the minimum requirements for recording reference events.

Required fields include:

- `ReferenceEvent.id`
- `ReferenceEvent.epicenter_id`
- `ReferenceEvent.model_id`
- `ReferenceEvent.timestamp`
- `ReferenceEvent.usage_type`
- `ReferenceEvent.trace_id`

Expected guarantees include:

- immutability
- auditability
- creator accessibility
- regulator accessibility

---

### PurityProtocol

Defines how the protocol may estimate the origin composition of an Epicenter.

Input signals may include:

- `ai_generated_ratio`
- structure-fingerprint similarity to known AI patterns
- author self-declaration
- provenance evidence
- revision lineage
- citation transparency
- structural originality

Outputs may include:

- `origin_purity_score`
- `ai_generated_ratio`
- `warning_flags`

The detailed algorithm is defined in:

```text
docs/purity-detection-algorithm.md
```

---

### CollapseMonitor

Defines how ecosystem-level risk may be monitored.

Inputs may include:

- model training-data purity
- synthetic-data ratio
- natural-data ratio
- reference statistics
- collapse-risk score
- Epicenter production decline

Possible alerts include:

- `natural_data_ratio_below_threshold`
- `epicenter_production_decline`
- `model_collapse_risk_high`

---

### RoyaltyProtocol

Defines an abstract structure for aggregating reference weight into possible value-circulation records.

A draft formula may look like:

```text
Σ(ReferenceEvent.weight.value) × rate_table[usage_type]
```

Possible modifiers include:

- `origin_purity_score`
- `ai_generated_ratio`
- review status
- dispute status
- platform policy
- creator permission

RoyaltyProtocol is intentionally abstract in v0.1.1.

It should not be treated as a final legal or financial settlement mechanism.

---

## Example Flow

```text
1. A creator publishes an original article.
↓
2. The article is registered as an Epicenter.
↓
3. The Epicenter receives a structure fingerprint and metadata.
↓
4. An AI system references the Epicenter through RAG or inference.
↓
5. A ReferenceEvent is logged.
↓
6. A purity assessment estimates origin composition.
↓
7. CollapseMonitor uses aggregated purity signals.
↓
8. RoyaltyProtocol may prepare candidate value-circulation records.
↓
9. Human / multi-wing review determines readiness.
```

---

## Non-Goals

This repository does not attempt to:

- define final copyright ownership
- prove legal authorship
- automatically determine payment
- ban AI-assisted creation
- classify all AI-generated content as low value
- replace human review
- provide a production-ready payment system
- define universal originality
- make moral judgments about creators or content

This protocol is a draft structural layer for traceability, review, and ecosystem-health monitoring.

---

## Relationship to Other Concepts

Epicenter Preservation OS Protocol may relate to:

- Trace Protocol
- Structure Fingerprint
- Royalty OS
- Allocation Readiness
- Model Collapse monitoring
- RAG provenance
- AI data governance
- creator compensation systems
- natural / synthetic data separation
- AI reference auditing

---

## Version History

See:

```text
CHANGELOG.md
```

Current version:

```text
0.1.1
```

### v0.1.1

Adds the PurityProtocol implementation layer:

- purity detection algorithm document
- purity assessment sample
- purity assessment JSON Schema
- GitHub Actions validation workflow
- README validation documentation

### v0.1.0

Initial draft protocol.

Defined the core entities and high-level protocol layers:

- `Epicenter`
- `ReferenceEvent`
- `RoyaltyRecord`
- `ModelProfile`
- `TraceProtocol`
- `RoyaltyProtocol`
- `PurityProtocol`
- `CollapseMonitor`

---

## Citation

If you use this specification, please cite it using:

```text
CITATION.cff
```

---

## License

This repository is released under the license defined in:

```text
LICENSE
```

---

## Summary

Epicenter Preservation OS Protocol v0.1.1 defines a draft structure for preserving primary sources in AI civilization.

It connects:

```text
source preservation
↓
reference tracing
↓
purity assessment
↓
collapse monitoring
↓
value-circulation readiness
```

The protocol does not claim to solve legal authorship or compensation automatically.

Instead, it provides a machine-readable foundation for preserving the source layer that AI civilization depends on.

In short:

```text
If AI civilization is a river,
Epicenters are the springs.
This protocol is a first draft of the water-source registry.
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/samuraiwriter7/purity-detection-algorithm-v0.1

Awesome Lists containing this project

README