{"id":49368219,"url":"https://github.com/yaniv2809/fixtureforge","last_synced_at":"2026-04-27T21:00:51.683Z","repository":{"id":351159566,"uuid":"1152710308","full_name":"Yaniv2809/fixtureforge","owner":"Yaniv2809","description":"Agentic test data harness for Python — deterministic in CI, AI-powered in dev. pytest plugin included.","archived":false,"fork":false,"pushed_at":"2026-04-17T19:38:28.000Z","size":1415,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-23T18:07:01.063Z","etag":null,"topics":["ai","anthropic","developer-tools","faker","fixtures","gemini","llm","openai","pydantic","pytest","python","synthetic-data","test-data","testing"],"latest_commit_sha":null,"homepage":"https://yaniv2809.github.io/fixtureforge/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yaniv2809.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-08T10:02:15.000Z","updated_at":"2026-04-17T19:38:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Yaniv2809/fixtureforge","commit_stats":null,"previous_names":["yaniv2809/fixtureforge"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Yaniv2809/fixtureforge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yaniv2809%2Ffixtureforge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/Yaniv2809%2Ffixtureforge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yaniv2809%2Ffixtureforge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yaniv2809%2Ffixtureforge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yaniv2809","download_url":"https://codeload.github.com/Yaniv2809/fixtureforge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yaniv2809%2Ffixtureforge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32354574,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"ssl_error","status_checked_at":"2026-04-27T20:07:00.910Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","anthropic","developer-tools","faker","fixtures","gemini","llm","openai","pydantic","pytest","python","synthetic-data","test-data","testing"],"created_at":"2026-04-27T21:00:30.660Z","updated_at":"2026-04-27T21:00:51.669Z","avatar_url":"https://github.com/Yaniv2809.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"fixtureforge-logo.png\" alt=\"FixtureForge Logo\" width=\"400\"/\u003e\n\u003c/p\u003e\n\n# 
FixtureForge\n\n**Agentic Test Data Harness for Python.**  \nGenerate realistic, context-aware fixtures — deterministic in CI, AI-powered in development.\n\n[![PyPI version](https://img.shields.io/pypi/v/fixtureforge.svg)](https://pypi.org/project/fixtureforge/)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n---\n\n## The Problem\n\n```python\n# This is what most test data looks like:\nuser = User(name=\"Test User\", email=\"test@test.com\", bio=\"Lorem ipsum...\")\n\n# It doesn't catch real-world edge cases.\n# It doesn't feel like production data.\n# And writing 500 of them by hand? Not happening.\n```\n\nFixtureForge solves this in two modes:\n\n```python\n# CI mode — deterministic, zero AI, seed-controlled. Same seed = same data. Always.\nforge = Forge(use_ai=False, seed=42)\nusers = forge.create_batch(User, count=500)\n\n# Dev mode — AI-generated, context-aware, realistic\nforge = Forge()\nreviews = forge.create_batch(Review, count=50, context=\"angry holiday customers\")\n```\n\n---\n\n## Installation\n\n```bash\npip install fixtureforge\n```\n\nWith your preferred AI provider:\n\n```bash\npip install \"fixtureforge[anthropic]\"   # Claude\npip install \"fixtureforge[openai]\"      # GPT\npip install \"fixtureforge[gemini]\"      # Google Gemini\npip install \"fixtureforge[all]\"         # All providers\n```\n\n---\n\n## Quick Start\n\n```python\nfrom fixtureforge import Forge\nfrom pydantic import BaseModel\n\nclass User(BaseModel):\n    id: int\n    name: str\n    email: str\n    bio: str\n\nforge = Forge()  # auto-detects provider from env vars\nusers = forge.create_batch(User, count=50, context=\"SaaS platform users\")\n```\n\nThat's it. 
FixtureForge:\n- Assigns sequential IDs automatically\n- Generates `name` and `email` with Faker (zero API cost)\n- Sends only `bio` to the AI — in a single batch call for all 50 records\n\n---\n\n## Core Concepts\n\n### Intelligent Field Routing\n\nEvery field is classified into a tier. Only semantic fields hit the AI:\n\n| Tier | Fields | Generator | Cost |\n|------|--------|-----------|------|\n| **Structural** | `id`, `user_id`, `order_id` | Internal counters / FK registry | Free |\n| **Standard** | `name`, `email`, `phone`, `address`, `date` | Faker | Free |\n| **Computed** | `@computed_field` properties | Pydantic | Free |\n| **Semantic** | `bio`, `description`, `review`, `message` | LLM (batched) | API tokens |\n\n100 users with 2 semantic fields = **2 API calls**, not 200.\n\n### CI Mode vs Dev Mode\n\n```python\n# CI — fully deterministic, no network, reproducible\nforge = Forge(use_ai=False, seed=42)\n\n# Dev — AI-powered, realistic context\nforge = Forge(provider_name=\"anthropic\", model=\"claude-haiku-4-5-20251001\")\n\n# Large datasets — seed + interpolation; only ~count * seed_ratio records are AI-generated\nforge.create_large(Order, count=100_000, seed_ratio=0.01)  # pays for ~1k, delivers 100k\n```\n\n### Verbose Mode\n\nSee exactly where each value comes from:\n\n```python\nforge = Forge(seed=42, verbose=True)  # AI enabled, so `bio` is LLM-generated\nuser = forge.create(User)\n\n# [structural] id    = 1\n# [faker]      name  = 'Allison Hill'\n# [faker]      email = 'donaldgarcia@example.net'\n# [ai]         bio   = 'Passionate developer with 8 years...'\n```\n\n---\n\n## Providers\n\nFixtureForge auto-detects your provider from environment variables:\n\n```bash\nexport ANTHROPIC_API_KEY=...   # → Claude (default: claude-haiku-4-5-20251001)\nexport OPENAI_API_KEY=...      # → GPT    (default: gpt-4o-mini)\nexport GOOGLE_API_KEY=...      # → Gemini (default: gemini-2.0-flash)\nexport GROQ_API_KEY=...        # → Groq   (default: llama-3.3-70b-versatile)\n# No key? 
→ Ollama (localhost:11434) → Deterministic-only\n```\n\nOr be explicit:\n\n```python\nforge = Forge(provider_name=\"anthropic\", model=\"claude-sonnet-4-6\")\nforge = Forge(provider_name=\"ollama\", model=\"llama3.2\")\nforge = Forge(use_ai=False)  # zero cost, zero network\n```\n\n---\n\n## Foreign Key Relationships\n\nRegister parent records first — child FKs resolve automatically:\n\n```python\n# Step 1: generate customers\ncustomers = forge.create_batch(Customer, count=10)\n\n# Step 2: orders automatically reference real customer IDs\norders = forge.create_batch(Order, count=100)\n# order.customer_id → always a valid customer.id\n```\n\n---\n\n## DataSwarms — Parallel Multi-Model Generation\n\nGenerate multiple models in parallel with a shared AI cache.  \nThe first model warms the cache; every subsequent model inherits it (~90% cheaper per model).\n\n```python\nresults = forge.swarm(\n    models=[User, Order, Product, Payment],\n    counts=[10,   50,    100,     30],\n    contexts=[\"SaaS users\", \"E-commerce orders\", None, None],\n)\n\n# returns:\n# {\n#   \"User\":    [...10 users...],\n#   \"Order\":   [...50 orders...],\n#   \"Product\": [...100 products...],\n#   \"Payment\": [...30 payments...],\n# }\n```\n\n5 models ≈ the cost of 1.5 models.\n\n---\n\n## Permission Gates\n\nFixtureForge classifies models by data sensitivity and gates dangerous operations:\n\n```python\nclass SafeUser(BaseModel):\n    id: int\n    name: str          # SAFE — auto-approved\n\nclass CustomerProfile(BaseModel):\n    id: int\n    ssn: str           # SENSITIVE — requires FORGE_ALLOW_PII=1\n    salary: float      # SENSITIVE\n\nclass SecurityTest(BaseModel):\n    id: int\n    sql_injection: str # DANGEROUS — requires interactive confirmation\n```\n\n```python\n# PII auto-approved\nforge = Forge(allow_pii=True)\n\n# CI/headless — dangerous ops silently rejected\nforge = Forge(interactive=False)\n```\n\nThree levels: `safe` (auto) → `sensitive` (env gate) → `dangerous` (human 
prompt).\n\n---\n\n## Domain Rules — ForgeMemory\n\nPersist business rules that survive across sessions.  \nRules are re-read on every generation call, so an updated rule takes effect on the very next call.\n\n```python\nforge.memory.add_rule(\"financial\", \"Users under 18 get restricted account type\")\nforge.memory.add_rule(\"user\", \"Israeli phone numbers use format 05x-xxx-xxxx\")\nforge.memory.add_rule(\"orders\", \"Max 3 active loans per customer at any time\")\n\n# Rules are injected into AI prompts automatically\nusers = forge.create_batch(User, count=50, context=\"Israeli SaaS platform\")\n```\n\n**Skeptical Memory** — rules are hints, not truth. FixtureForge validates stored rules against the live schema before every generation call.\n\n**Progressive Forgetting** — field names and types are never stored (re-derivable from the model). Only business rules that exist nowhere else in the code are kept.\n\n---\n\n## ForgeDream — Coverage Analysis\n\nFind gaps in your test-data coverage automatically:\n\n```python\nimport os\nos.environ[\"FORGE_FLAG_DREAM\"] = \"1\"\n\nreport = forge.dream(models=[User, Order], force=True)\nprint(report.summary())\n\n# ForgeDream Report - 2026-04-08\n#   Coverage gaps found  : 3\n#   Rule conflicts found : 0\n#   Top gaps:\n#     [User.age]   no_boundary : No boundary-value rules for numeric field 'age'\n#     [User.email] no_invalid  : No invalid-data rules for well-known field 'email'\n#     [Order.total] no_boundary: No boundary-value rules for numeric field 'total'\n```\n\nFour phases: **Orient** (read index) → **Gather** (find gaps) → **Consolidate** (merge rules) → **Prune** (trim to ≤200 lines).\n\nThe report is saved to `.forge/coverage_gaps.json`.\n\n---\n\n## Streaming — Memory-Safe Large Datasets\n\n```python\n# Lazy evaluation — writes to disk one record at a time\nfor user in forge.create_stream(User, count=1_000_000, filename=\"users.json\"):\n    pass  # process one record, never loads all into memory\n```\n\nSupports `.json`, 
`.csv`, `.sql` output formats.\n\n---\n\n## Export\n\n```python\nfrom fixtureforge.core.exporter import DataExporter\n\nusers = forge.create_batch(User, count=100)\nDataExporter.to_json(users, \"users.json\")\nDataExporter.to_csv(users, \"users.csv\")\nDataExporter.to_sql(users, \"users.sql\", table_name=\"users\")\n```\n\n---\n\n## Response Cache\n\nAI responses are cached locally for 7 days. Identical requests cost nothing after the first call.\n\n```python\nforge = Forge(use_cache=True)   # default — saves to ~/.fixtureforge/cache/\nforge = Forge(use_cache=False)  # disable caching\n```\n\n---\n\n## Feature Flags\n\n```python\nfrom fixtureforge.config import is_enabled, flag_summary\n\nflag_summary()\n# {\n#   'FORGE_SWARMS':      True,   # shipped\n#   'FORGE_PERMISSIONS': True,   # shipped\n#   'FORGE_COMPRESSION': True,   # shipped\n#   'FORGE_MCP':         True,   # shipped\n#   'FORGE_DREAM':       False,  # enable with FORGE_FLAG_DREAM=1\n#   'FORGE_KAIROS':      False,  # coming in v2.x\n#   'FORGE_ULTRAPLAN':   False,  # coming in v2.x\n# }\n```\n\nEnable any staged feature with an env var:\n\n```bash\nFORGE_FLAG_DREAM=1 python run_tests.py\n```\n\n---\n\n## Stats \u0026 Diagnostics\n\n```python\nforge.stats()\n# {\n#   \"registry\": {\"user\": 50, \"order\": 200},\n#   \"session_tokens\": 1240,\n#   \"memory\": {\"topics\": 3, \"total_kb\": 2.4},\n#   \"flags\": {\"FORGE_SWARMS\": True, \"FORGE_PERMISSIONS\": True}\n# }\n\nforge.clear_registry()  # reset FK registry between independent test scenarios\n```\n\n---\n\n## Architecture\n\n```\nFixtureForge v2.0\n├── Config Layer        feature flags, env-var overrides\n├── Security Layer      safe / sensitive / dangerous gates, mailbox pattern\n├── Memory Layer        FORGE.md pointer index, on-demand topic files\n├── Generation Layer    IntelligentRouter, SmartBatchEngine, DataSwarms\n├── Compression Layer   Micro → Auto → Full (three-layer pipeline)\n├── Export Layer        JSON / CSV / SQL / 
streaming\n└── Background Layer    ForgeDream coverage analysis (feature-flagged)\n```\n\n**Provider-agnostic**: Claude, GPT, Gemini, Groq, Ollama, or no AI at all.  \n**Pydantic v2 native**: full support for `@computed_field`, validators, and constrained types.  \n**CI-safe**: `seed=` parameter guarantees identical output across runs.\n\n---\n\n## Comparison\n\n| | FixtureForge | factory_boy | faker | hypothesis |\n|---|---|---|---|---|\n| AI-generated context | Yes | No | No | No |\n| Deterministic (seed=) | Yes | Yes | Yes | Yes |\n| FK relationships | Auto | Manual | No | No |\n| Coverage analysis | Yes | No | No | Partial |\n| CI-safe mode | Yes | Yes | Yes | Yes |\n| Large datasets | Yes (100k+) | Manual | Manual | No |\n| Permission gates | Yes | No | No | No |\n\nFixtureForge is not a replacement for `faker` — it uses `faker` internally. It's not a replacement for `hypothesis` — it solves a different problem. It adds the layer between \"I need realistic data\" and \"I need it to feel like production\".\n\n---\n\n## Requirements\n\n- Python 3.11+\n- pydantic \u003e= 2.5\n- faker \u003e= 22.0\n\nAI providers are optional extras — the core works with zero dependencies beyond pydantic and faker.\n\n---\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n---\n\n## Links\n\n- **Docs**: https://yaniv2809.github.io/fixtureforge/\n- **PyPI**: https://pypi.org/project/fixtureforge/\n- **Repository**: https://github.com/Yaniv2809/fixtureforge\n- **Issues**: https://github.com/Yaniv2809/fixtureforge/issues\n\n💬 [Join the discussion](https://github.com/Yaniv2809/fixtureforge/discussions/1)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyaniv2809%2Ffixtureforge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyaniv2809%2Ffixtureforge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyaniv2809%2Ffixtureforge/lists"}