{"id":43199732,"url":"https://github.com/anuroopvj/raiiaf","last_synced_at":"2026-02-01T06:01:39.046Z","repository":{"id":331601722,"uuid":"1124491983","full_name":"AnuroopVJ/RAIIAF","owner":"AnuroopVJ","description":"RAIIAF is a binary container format aimed at increased reproducibility for AI-generated images.","archived":false,"fork":false,"pushed_at":"2026-02-01T04:15:15.000Z","size":2853,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-01T05:22:20.985Z","etag":null,"topics":["file-format","file-format-library","python-library","python3"],"latest_commit_sha":null,"homepage":"https://anuroopvj.github.io/RAIIAF/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AnuroopVJ.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-29T05:41:46.000Z","updated_at":"2026-02-01T04:13:12.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/AnuroopVJ/RAIIAF","commit_stats":null,"previous_names":["anuroopvj/gen5","anuroopvj/raiiaf"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/AnuroopVJ/RAIIAF","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnuroopVJ%2FRAIIAF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnuroopVJ%2FRAIIAF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anu
roopVJ%2FRAIIAF/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnuroopVJ%2FRAIIAF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AnuroopVJ","download_url":"https://codeload.github.com/AnuroopVJ/RAIIAF/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnuroopVJ%2FRAIIAF/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28970194,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T05:48:53.985Z","status":"ssl_error","status_checked_at":"2026-02-01T05:47:55.855Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["file-format","file-format-library","python-library","python3"],"created_at":"2026-02-01T06:01:37.900Z","updated_at":"2026-02-01T06:01:39.028Z","avatar_url":"https://github.com/AnuroopVJ.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# .raiiaf - Rich AI Image Artifact Format\n\n![bannerimage](raiiaf_img.png)\n[![Build and Deploy 
Docs](https://github.com/AnuroopVJ/RAIIAF/actions/workflows/docs.yml/badge.svg)](https://github.com/AnuroopVJ/RAIIAF/actions/workflows/docs.yml)\n[![CI](https://github.com/AnuroopVJ/RAIIAF/actions/workflows/ci.yml/badge.svg)](https://github.com/AnuroopVJ/RAIIAF/actions/workflows/ci.yml)\n![Python](https://img.shields.io/badge/python-3.9%2B-blue)\n![PyPI](https://img.shields.io/pypi/v/raiiaf)\n\n## Features\n- Noise latent tensor storage\n- Rich AI-native metadata that includes:\n   \n      - Model name and version\n      - Prompt  \n      - Tags  \n      - Hardware information  \n      - Generation settings\n- Environment info stored automatically as hashed canonical strings\n- Issues warnings if environment drift is detected!\n\n## Storage Efficiency Benchmark (Full Latent Tensor)\n\nWe evaluate the storage overhead of different industry-standard strategies for embedding large AI metadata by comparing how the same image and the same latent tensor are stored across multiple file formats.\n\n### Experimental Setup\n- **Images:** 5 PNG images\n- **Latent tensor:** Shape (1, 4, 64, 64), approximately 89 KB\n- **Metadata:** Identical semantic metadata across all formats\n- **raiiaf implementation:** Official `raiiaf` API (v0.1.0), no mocks\n- **Metric:** Relative file size overhead compared to the raw PNG baseline\n\n### Compared Storage Strategies\n- **Raw PNG:** Image only (baseline)\n- **PNG + Embedded XMP:** Latent tensor serialized as XMP and embedded inside the PNG\n- **PNG + XMP Sidecar:** Latent tensor stored in a separate `.xmp` file alongside the PNG\n- **raiiaf (.raiiaf):** Single-file, binary container storing both image and latent tensor\n\n### Results\nThe average file sizes and relative overheads are summarized below:\n\n| Format | Avg. Size (KB) | Avg. 
Overhead (%) |\n|------|---------------|-------------------|\n| Raw PNG | 1708.3 | – |\n| RAIIAF (.raiiaf) | 1739.3 | 1.8 |\n| PNG + Embedded XMP | 1797.9 | 5.2 |\n| PNG + XMP Sidecar | 1795.2 | 5.1 |\n\n![Storage overhead comparison for different metadata strategies](paper/graph.png)\n\n### Interpretation\nFor the same image and identical latent tensor, XMP-based workflows incur approximately 5.1–5.2% storage overhead, regardless of whether the metadata is embedded or stored as a sidecar file. In contrast, raiiaf introduces only ~1.8% overhead.\n\nThis corresponds to a ~3.3% absolute reduction in file size and approximately 65% lower relative metadata overhead compared to standard XMP-based approaches, while preserving a single-file workflow.\n\nThese results indicate that a binary, AI-native container can store large latent tensors more space-efficiently than XML-based metadata strategies under identical conditions.\n\n# Why not just use EXIF/XMP/Sidecar files?\nHere is an empirical comparison:\n\n| Aspect | EXIF / XMP (Custom Metadata) | raiiaf |\n|------|-------------------------------|------|\n| Schema enforcement | Convention-based, unenforced | 🟢 Canonical, versioned schema |\n| Semantic consistency | Low; tag drift common | 🟢 High; fixed fields + chunk types |\n| Latent representation | Not supported (text hacks) | 🟢 Native latent chunks (binary-safe) |\n| Environment capture | Ad-hoc text notes | 🟢 Explicit env chunks (model, seed, hw, libs) |\n| Reproducibility ceiling | Limited, state incomplete | 🟢 High; full generation state captured |\n| Data typing | Weak (string-heavy XML) | 🟢 Strong typing (binary, arrays, structs) |\n| Extensibility | Easy but uncontrolled | 🟡 Controlled; safer but slower evolution |\n| Tooling ecosystem | Mature, ubiquitous | 🔴 Immature, raiiaf-specific tools needed |\n| Interoperability | Works almost everywhere | 🔴 Breaks without raiiaf-aware readers |\n| Failure mode | Metadata silently ignored | 🟡 Metadata explicit but 
unreadable without tooling |\n\nAlthough XMP has the ability to embed any binary payload, doing so necessitates ad hoc conventions, manual validation, and cautious handling to prevent silent data loss. By formalizing these practices into a first-class, schema-enforced representation, raiiaf lessens failure modes and implementation burden.\n\nA comparison with sidecar files:\n\n| Aspect | Where Sidecar (XMP) shines / empirical behavior | Where raiiaf shines / empirical behavior |\n|---|---|---|\n| Non-destructive editing | 🟢 Sidecars allow metadata edits without touching the original asset (important for read-only/RAW workflows). Many DAMs expose \"write to sidecar only\" modes. | 🟡 raiiaf typically embeds state into a container; editing metadata might require raiiaf-aware tooling. Good for integrity but less “drop-in” non-destructive edit unless you design a sidecar-style raiiaf wrapper. |\n| Support \u0026 tooling (read/write) | 🟢 Very broad: exiftool, Lightroom, digikam and many tools understand XMP and sidecars; exiftool can create and sync sidecars. This is production-proven. | 🔴 Immature: raiiaf requires custom readers/writers; integration work required. This is the main adoption barrier (engineering cost). |\n| Round-trip fidelity (read-\u003emodify-\u003ewrite-\u003eread) | 🔴 Variable: many apps will normalize/sync/overwrite XMP fields; different tools may not round-trip custom blobs reliably (risk of normalization or loss). Empirical reports of inconsistent XMP syncing and sidecar issues exist. | 🟢 High if tools are spec-compliant: raiiaf’s schema + chunk validation lets compliant tools preserve unknown chunks and guarantee round-trip fidelity (design goal). |\n| Orphaning \u0026 file management | 🔴 Sidecars can be orphaned (moved, renamed, or not uploaded); cloud/backup processes often miss them, as real-world bug reports and GH issues show. 
| 🟢 raiiaf embeds state into the artifact (single file), eliminating the orphaning problem — better for long-term datasets and archives. |\n| Resilience to platform stripping | 🔴 Both can be stripped in public pipelines; embedded XMP is sometimes removed by social platforms and services. Sidecars are even more fragile because many uploaders ignore sidecar files. | 🟡 raiiaf helps when you control the pipeline (archives, datasets). For public publishing (social platforms) nothing is guaranteed unless the platform preserves custom blocks — but embedding reduces the chance of separate-file loss. |\n| Validation \u0026 semantics | 🔴 XMP allows arbitrary namespaces; no enforcement — different users/tools will store semantically identical things under different keys (fragmentation). Empirical evidence of inconsistent metadata across tools.| 🟢 raiiaf enforces typed schema, versioning, and chunk semantics → machine-verifiable metadata (reduces semantic drift). This is the core reproducibility gain. |\n| Indexing \u0026 large-scale querying | 🟡 Sidecars can be indexed when present, but inconsistent field names and scattered sidecars complicate large dataset indexing.| 🟢 raiiaf’s structured schema makes programmatic indexing and deterministic query semantics straightforward — ideal for datasets and benchmarks. |\n| Forensic reproducibility (latents + env) | 🔴 Sidecars *can* carry execution state but lack canonical typing; implementations vary, and many tools will not preserve or validate these blobs — proven brittle in practice. | 🟢 raiiaf explicitly models latents + env as first-class chunks; enables replayability and causal experiments (the main scientific advantage). |\n| Failure modes | 🟡 Sidecar failure mode = orphaning or silent loss (metadata disappears silently). Empirical reports show users losing sidecars in real workflows. | 🟢 raiiaf failure mode = unreadable/unsupported file (loud failure) — preferable for research because you detect lack of tooling. 
|\n| Best practical use-cases | 🟢 Sidecars: photographers, mixed toolchains, read-only RAW editing, workflows that must avoid touching assets.| 🟢 raiiaf: datasets, reproducible experiments, forensic archives, research pipelines where exact replayability matters (latent + env capture).|\n\n## Installation\nJust pip install the package!\n```bash\npip install raiiaf\n```\n## Usage\nImport the handler class:\n```python\nfrom raiiaf.main import raiiafFileHandler\n```\nThen instantiate the `raiiafFileHandler` class:\n```python\nraiiaf = raiiafFileHandler()\n```\n\n# Encoding\n!!! danger\n    **DISCLAIMER**:\n    The encoder expects **NumPy arrays**.  \n    If you use PyTorch tensors, convert them with `.detach().cpu().numpy()`.\n\n```python\nimport torch\n\nfrom raiiaf.main import raiiafFileHandler\n\nraiiaf = raiiafFileHandler()\nbatch_size, channels, height, width = 1, 4, 64, 64  # example latent shape\ninitial_noise_tensor = torch.randn(batch_size, channels, height, width)\nlatent = {\n    # The encoder expects a NumPy array, not a torch tensor\n    \"initial_noise\": initial_noise_tensor.detach().cpu().numpy()\n}\nbinary_img_data = raiiaf.png_to_bytes(r'path/to/image.png')  # helper that reads the image as bytes\n\nraiiaf.file_encoder(\n    filename=\"encoded_img.raiiaf\",  # The .raiiaf extension is required!\n    latent=latent,  # initial latent noise\n    chunk_records=[],\n    model_name=\"Stable Diffusion 3\",\n    model_version=\"3\",  # model version\n    prompt=\"A puppy smiling, cinematic\",\n    tags=[\"puppy\", \"dog\", \"smile\"],\n    img_binary=binary_img_data,\n    convert_float16=False,  # whether to convert input tensors to float16\n    generation_settings={\n        \"seed\": 42,\n        \"steps\": 20,\n        \"sampler\": \"ddim\",\n        \"cfg_scale\": 7.5,\n        \"scheduler\": \"pndm\",\n        \"eta\": 0.0,\n        \"guidance\": \"classifier-free\",\n        \"precision\": \"fp16\",\n        \"deterministic\": True\n    },\n    hardware_info={\n        \"machine_name\": \"test_machine\",\n        \"os\": \"linux\",\n    
    \"cpu\": \"Intel\",\n        \"cpu_cores\": 8,  # minimum 1\n        \"gpu\": [{\"name\": \"RTX 3090\", \"memory_gb\": 24, \"driver\": \"nvidia\", \"cuda_version\": \"12.1\"}],\n        \"ram_gb\": 64.0,\n        \"framework\": \"torch\",\n        \"compute_lib\": \"cuda\"\n    }\n)\n```\n\n# Decoding\n```python\nimport io\nimport json\n\nfrom PIL import Image  # Pillow, needed only to re-save the image\n\nfilename = \"encoded_img.raiiaf\"  # file written in the Encoding example\ndecoded = raiiaf.file_decoder(filename)\n\n# Full metadata block\nmetadata = decoded[\"metadata\"][\"raiiaf_metadata\"]\n\n# A specific metadata block\nmodel_info = decoded[\"metadata\"][\"raiiaf_metadata\"][\"model_info\"]\n\n# Save the decoded metadata to a JSON file\nwith open(\"decoded_metadata.json\", \"w\") as f:\n    json.dump(decoded[\"metadata\"], f, indent=2)\n\n# Save just the image bytes as a PNG\nimage_bytes = decoded[\"chunks\"].get(\"image\")\nif image_bytes is not None:\n    img = Image.open(io.BytesIO(image_bytes))\n    img.save(\"decoded_image.png\")\n```\n\n# LICENSE\nMIT\n# Contribution\nPlease refer to the CONTRIBUTING.md file in the repo.\n# Documentation\nFull docs: https://anuroopvj.github.io/RAIIAF\n\n# Future improvements\n- Supporting other frameworks and utilities in the EnvChunk\n- Reducing File Size\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanuroopvj%2Fraiiaf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanuroopvj%2Fraiiaf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanuroopvj%2Fraiiaf/lists"}