{"id":48325612,"url":"https://github.com/juliensimon/ocel-generator","last_synced_at":"2026-04-07T22:01:18.425Z","repository":{"id":335120966,"uuid":"1144275158","full_name":"juliensimon/ocel-generator","owner":"juliensimon","description":"Generate realistic multi-agent workflow traces with LLM-enriched content. pip install open-agent-traces","archived":false,"fork":false,"pushed_at":"2026-04-04T19:31:20.000Z","size":963,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-05T01:03:22.056Z","etag":null,"topics":["agent-observability","agent-traces","ai-agents","anomaly-detection","conformance-checking","dataset-generation","dataset-generator","langchain","llm","llm-agents","multi-agent","ocel","process-mining","synthetic-data"],"latest_commit_sha":null,"homepage":"https://huggingface.co/datasets/juliensimon/open-agent-traces","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/juliensimon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-28T14:05:09.000Z","updated_at":"2026-04-04T20:02:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/juliensimon/ocel-generator","commit_stats":null,"previous_names":["juliensimon/ocel-generator"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/juliensimon/ocel-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Focel-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Focel-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Focel-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Focel-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/juliensimon","download_url":"https://codeload.github.com/juliensimon/ocel-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Focel-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31530647,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"ssl_error","status_checked_at":"2026-04-07T16:28:06.951Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-observability","agent-traces","ai-agents","anomaly-detection","conformance-checking","dataset-generation","dataset-generator","langchain","llm","llm-agents","multi-agent","ocel","process-mining","synthetic-data"],"created_at":"2026-04-05T00:53:10.359Z","updated_at":"2026-04-07T22:01:18.416Z","avatar_url":"https://github.com/juliensimon.png","language":"Python","readme":"# ocelgen\n\n**Generate realistic multi-agent workflow traces on demand.** Any domain, any pattern, any LLM. Validated against OCEL 2.0 and PM4Py.\n\n[![PyPI](https://img.shields.io/pypi/v/open-agent-traces)](https://pypi.org/project/open-agent-traces/)\n[![CI](https://github.com/juliensimon/ocel-generator/actions/workflows/ci.yml/badge.svg)](https://github.com/juliensimon/ocel-generator/actions)\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n[![Python 3.11+](https://img.shields.io/badge/Python-3.11%2B-blue.svg)](https://python.org)\n[![OCEL 2.0](https://img.shields.io/badge/OCEL-2.0-orange.svg)](https://www.ocel-standard.org/)\n[![Dataset on HF](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-open--agent--traces-yellow)](https://huggingface.co/datasets/juliensimon/open-agent-traces)\n\n```bash\npip install open-agent-traces\n\nocelgen generate --pattern sequential --runs 50 --noise 0.2 --seed 42\n```\n\n**1,500+ events in under 2 seconds.** No API key needed for structural traces.\n\n```\nrun-0000: \"My order arrived damaged, what are my options?\"\n├── run_started                                              08:00:00.007\n├── agent_invoked          researcher    gpt-4o              08:00:00.052\n│   ├── llm_request_sent   \"Search for refund policy...\"     08:00:00.067\n│   ├── llm_response       \"The refund policy states...\"     08:00:00.749\n│   ├── tool_called        web_search    → policy found      08:00:01.705\n│   └── tool_called        file_reader   → order history     08:00:01.898\n├── agent_invoked          analyst       gpt-4o              08:00:02.281\n│   ├── llm_request_sent   \"Analyze refund eligibility...\"   08:00:02.334\n│   ├── llm_response       \"Customer is eligible for...\"     08:00:06.747\n│   └── tool_called        calculator    → refund amount     08:00:08.819\n├── agent_invoked          summarizer    claude-3.5-sonnet   08:00:09.680\n│   ├── llm_request_sent   \"Draft resolution response...\"    08:00:09.717\n│   └── llm_response       \"Dear customer, we apologize...\"  08:00:10.363\n└── run_completed                                            08:00:10.369\n    cost: $0.038 | 3,950 input + 2,516 output tokens | 5 LLM calls | 3 tool calls\n```\n\n## What it generates\n\nEach trace includes LLM prompts and completions, tool call inputs and outputs, agent reasoning chains, inter-agent messages, calibrated token counts, realistic timestamps, and cost estimates — the same data you'd see in LangSmith, Arize, or Braintrust.\n\n**3 workflow patterns:**\n\n```\nSequential:    Research → Analyze → Summarize\nSupervisor:    Supervisor → [Worker A, Worker B, Worker C] → Aggregate\nParallel:      Split → [Worker A ‖ Worker B ‖ Worker C] → Aggregate\n```\n\n**10 deviation types** with ground-truth labels for anomaly detection: skipped steps, wrong tools, swapped order, timeouts, missing handoffs, extra LLM calls, wrong routing, repeated activities, inserted activities, wrong resources.\n\n**10 built-in enterprise domains** — or [define your own](#define-your-own-domains) in YAML:\n\n| Domain | Pattern | What it simulates |\n|--------|---------|-------------------|\n| `customer-support-triage` | sequential | Classify ticket, research KB, draft response |\n| `code-review-pipeline` | supervisor | Delegate to linter, security reviewer, style checker |\n| `incident-response` | supervisor | Route to diagnostics, mitigation, communications |\n| `data-pipeline-debugging` | supervisor | Log analyzer, schema checker, fix proposer |\n| `market-research` | parallel | Competitor analyst, trend researcher, report writer |\n| `content-generation` | parallel | Researcher, writer, editor working concurrently |\n| `academic-paper-review` | parallel | Methodology, novelty, writing reviewers |\n| `legal-document-analysis` | sequential | Extract clauses, check compliance, summarize risks |\n| `financial-analysis` | sequential | Gather filings, compute ratios, write investment memo |\n| `ecommerce-product-enrichment` | sequential | Scrape specs, normalize attributes, generate descriptions |\n\n## Enrich with any LLM\n\nPlug in any OpenAI-compatible endpoint to fill traces with realistic content:\n\n```bash\n# Cloud (OpenRouter — default)\nexport OPENAI_API_KEY=\"your-key\"\nocelgen enrich output.jsonocel --domain customer-support-triage\n\n# Local (llama.cpp, Ollama, vLLM — no API key needed)\nocelgen enrich output.jsonocel -d customer-support-triage \\\n  --model local-model --base-url http://localhost:8080/v1\n\n# Full pipeline: generate + enrich + upload to Hugging Face\nocelgen pipeline --domain customer-support-triage --namespace your-hf-username\n```\n\nEnrichment chains context across agent steps, reflects deviations in the generated content, recalibrates token counts and timestamps, and expands seed queries via LLM for diversity across runs.\n\n## Validated, not just generated\n\nEvery trace is checked by 5 validation layers — tested across all 10 domains, all 3 patterns, and [the live HF dataset](https://huggingface.co/datasets/juliensimon/open-agent-traces):\n\n| Validator | What it checks |\n|-----------|---------------|\n| JSON Schema | OCEL 2.0 structural compliance |\n| Referential integrity | Every relationship points to an existing object |\n| Type attributes | Every attribute matches its declared type schema |\n| Temporal ordering | Causal pairs in order, run boundaries correct |\n| Workflow conformance | Conformant runs follow the template (parallel-aware) |\n\n```python\nfrom ocelgen.generation.engine import generate\nfrom ocelgen.validation import (\n    validate_referential_integrity,\n    validate_workflow_conformance,\n)\n\nresult = generate(\"sequential\", num_runs=50, noise_rate=0.3, seed=42)\nassert validate_referential_integrity(result.log) == []\nassert validate_workflow_conformance(result.log, result.template) == []\n```\n\nTraces load directly in [PM4Py](https://pm4py.fit.fraunhofer.de/) — the reference OCEL 2.0 process mining library:\n\n```bash\npip install open-agent-traces[conformance]\n```\n\n```python\nimport pm4py\nocel = pm4py.read.read_ocel2_json(\"output.jsonocel\")\n```\n\n## Define your own domains\n\nCreate custom domains in YAML — they merge with the 10 built-ins:\n\n```yaml\ndomains:\n  - name: \"hr-onboarding\"\n    description: \"HR onboarding: collect docs, run checks, provision access\"\n    pattern: \"sequential\"\n    runs: 30\n    noise: 0.15\n    seed: 50001\n    user_queries:\n      - \"New hire starting March 15 as Senior Engineer\"\n    agent_personas:\n      researcher: \"You are an HR coordinator collecting new hire documentation\"\n      analyst: \"You are a compliance officer verifying background checks\"\n      summarizer: \"You are an IT provisioner setting up accounts and access\"\n    tool_descriptions:\n      web_search: \"Search HR knowledge base for onboarding checklists\"\n      file_reader: \"Read employee records and compliance documents\"\n```\n\n```bash\nocelgen pipeline --domain hr-onboarding --config domains.yaml --namespace your-hf-username\n```\n\n## Pre-built dataset\n\nDon't want to generate? Load 17,000+ events directly from Hugging Face:\n\n```python\nfrom datasets import load_dataset\n\nds = load_dataset(\"juliensimon/open-agent-traces\", \"incident-response\")\n\nfor event in ds[\"train\"]:\n    if event[\"run_id\"] == \"run-0000\":\n        print(f\"{event['event_type']:25s} | {event['agent_role']:12s} | {event['reasoning'][:60] if event['reasoning'] else ''}\")\n```\n\n## Who is this for?\n\n- **Agent observability teams** — build and test monitoring dashboards with realistic trace data\n- **ML researchers** — train anomaly detectors on labeled conformant vs deviant traces\n- **Process mining researchers** — apply OCEL 2.0 conformance checking to multi-agent systems\n- **Agent framework developers** — test LangGraph, CrewAI, AutoGen, Smolagents pipelines\n- **Evaluation teams** — benchmark agent reasoning quality across domains and architectures\n\n## Examples\n\n| Script | What it shows |\n|--------|---------------|\n| [`basic_generation.py`](examples/basic_generation.py) | Generate logs via Python API, inspect results, write files |\n| [`validate_traces.py`](examples/validate_traces.py) | Run all 5 semantic validators across all 3 patterns |\n| [`inspect_run.py`](examples/inspect_run.py) | Walk a single run's event timeline, LLM calls, tools, costs |\n| [`explore_with_pm4py.py`](examples/explore_with_pm4py.py) | Download from HF, query with pm4py and datasets library |\n| [`conformance_demo.py`](examples/conformance_demo.py) | Generate and load with pm4py |\n\n## Documentation\n\n- **[Quick Start](docs/quickstart.md)** — first dataset in 5 minutes\n- **[User Guide](docs/user-guide.md)** — CLI reference, patterns, domains, custom YAML, validation, PM4Py\n- **[Dataset on HF](https://huggingface.co/datasets/juliensimon/open-agent-traces)** — 17,000+ events across 10 domains\n\n## Development\n\n```bash\ngit clone https://github.com/juliensimon/ocel-generator.git \u0026\u0026 cd ocel-generator\nuv sync --extra dev\nuv run pre-commit install   # ruff + mypy + pytest on every commit\nuv run pytest               # 265 tests, 98% coverage\n```\n\n## License\n\nMIT\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliensimon%2Focel-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuliensimon%2Focel-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliensimon%2Focel-generator/lists"}