{"id":39444457,"url":"https://github.com/mbeacom/genai-processors-pydantic","last_synced_at":"2026-01-18T04:23:43.751Z","repository":{"id":304887712,"uuid":"1019595844","full_name":"mbeacom/genai-processors-pydantic","owner":"mbeacom","description":"The Pydantic Gemini Processor to be used with Gemini's genai-processors","archived":false,"fork":false,"pushed_at":"2025-10-27T13:26:40.000Z","size":510,"stargazers_count":4,"open_issues_count":7,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-09T05:24:23.123Z","etag":null,"topics":["agent","ai","asyncio","gemini","gemini-processor","genai","generative-ai","hacktoberfest","multimodal","processor","pydantic","pydantic-ai","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mbeacom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-14T15:13:21.000Z","updated_at":"2025-11-08T16:24:09.000Z","dependencies_parsed_at":"2025-07-17T02:43:54.787Z","dependency_job_id":"b6789b66-58a8-46b6-acdf-6aa684baea38","html_url":"https://github.com/mbeacom/genai-processors-pydantic","commit_stats":null,"previous_names":["mbeacom/genai-processors-pydantic","mbeacom/pydantic-genai-processor"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/mbeacom/genai-processors-pydantic","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbeacom%2Fgenai-processors-pydantic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/mbeacom%2Fgenai-processors-pydantic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbeacom%2Fgenai-processors-pydantic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbeacom%2Fgenai-processors-pydantic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mbeacom","download_url":"https://codeload.github.com/mbeacom/genai-processors-pydantic/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbeacom%2Fgenai-processors-pydantic/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28529507,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","ai","asyncio","gemini","gemini-processor","genai","generative-ai","hacktoberfest","multimodal","processor","pydantic","pydantic-ai","python"],"created_at":"2026-01-18T04:23:43.691Z","updated_at":"2026-01-18T04:23:43.743Z","avatar_url":"https://github.com/mbeacom.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# genai-processors-pydantic\n\n[![PyPI 
version](https://img.shields.io/pypi/v/genai-processors-pydantic.svg)](https://pypi.org/project/genai-processors-pydantic/)\n[![Validation](https://github.com/mbeacom/genai-processors-pydantic/actions/workflows/validate.yml/badge.svg)](https://github.com/mbeacom/genai-processors-pydantic/actions/workflows/validate.yml)\n[![codecov](https://codecov.io/github/mbeacom/pydantic-gemini-processor/graph/badge.svg?token=9Ue94I4FEw)](https://codecov.io/github/mbeacom/pydantic-gemini-processor)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)\n\nA Pydantic validator processor for Google's [genai-processors](https://github.com/google-gemini/genai-processors) framework.\n\n**Note:** This is an independent contrib processor that extends the genai-processors ecosystem.\n\n## ⚠️ Important: Current Limitations \u0026 Roadmap\n\nThis processor was developed based on feedback from the genai-processors maintainers. While functional and tested, it has known limitations in certain scenarios. See [MAINTAINER_FEEDBACK.md](MAINTAINER_FEEDBACK.md) for detailed analysis and our roadmap to address these challenges:\n\n* **Streaming**: Currently works best with complete JSON in single Parts\n* **Tool Integration**: Planned support for `genai_types.ToolResponse` Parts\n* **Multi-Model Validation**: Single-model design; multi-model support planned\n* **MIME Type Independence**: ✅ Already handles unmarked JSON Parts\n\nWe're committed to addressing these limitations while maintaining a stable API.\n\n## PydanticValidator\n\nThe PydanticValidator is a PartProcessor that validates the JSON content of a ProcessorPart against a specified [Pydantic](https://docs.pydantic.dev/latest/) model. It provides a simple, declarative way to enforce data schemas and improve the robustness of your AI pipelines.\n\n## Motivation\n\nIn many AI applications, processors ingest data from external sources like user inputs or API calls. This data can be unpredictable or malformed. 
The PydanticValidator solves this by:\n\n* **Preventing Errors:** It catches invalid data early, before it can cause errors in downstream processors like a GenaiModel or a database writer.\n* **Ensuring Structure:** It guarantees that any data moving forward in the pipeline conforms to a reliable, expected structure.\n* **Simplifying Logic:** It allows other processors to focus on their core tasks without being cluttered with boilerplate data validation code.\n\n## Installation\n\nInstall the package from PyPI:\n\n```bash\npip install genai-processors-pydantic\n```\n\nOr with uv:\n\n```bash\nuv add genai-processors-pydantic\n```\n\nThis will automatically install the required dependencies:\n\n* `genai-processors\u003e=1.0.4`\n* `pydantic\u003e=2.0`\n\n## Configuration\n\nYou can customize the validator's behavior by passing a ValidationConfig object during initialization.\n\n```python\nfrom genai_processors_pydantic import PydanticValidator, ValidationConfig\n\nconfig = ValidationConfig(fail_on_error=True, strict_mode=True)\nvalidator = PydanticValidator(MyModel, config=config)\n```\n\n### ValidationConfig Parameters\n\n* fail_on_error (bool, default: False):\n  * If False, the processor will catch ValidationErrors, add error details to the part's metadata, and allow the stream to continue.\n  * If True, the processor will re-raise the ValidationError, stopping the stream immediately. This is useful for \"fail-fast\" scenarios.\n* strict_mode (bool, default: False):\n  * If False, Pydantic will attempt to coerce types where possible (e.g., converting the string \"123\" to the integer 123).\n  * If True, Pydantic will enforce strict type matching and will not perform type coercion.\n\n## Behavior and Metadata\n\nThe PydanticValidator processes parts that contain valid JSON in their text field. For each part it processes, it yields one or more new parts:\n\n1. **The Result Part:** The original part, now with added metadata.\n2. 
**A Status Part:** A message sent to the STATUS_STREAM indicating the outcome.\n\n### On Successful Validation\n\n* The yielded part's metadata['validation_status'] is set to 'success'.\n* The metadata['validated_data'] contains the serialized dictionary representation of the validated data (ensuring ProcessorParts remain serializable).\n* The part's text is updated to be the formatted JSON representation of the validated data.\n* A processor.status() message like ✅ Successfully validated... is yielded.\n\n### On Failed Validation\n\n* The yielded part's metadata['validation_status'] is set to 'failure'.\n* metadata['validation_errors'] contains a structured list of validation errors.\n* metadata['original_data'] contains the raw data that failed validation.\n* A processor.status() message like ❌ Validation failed... is yielded.\n\n## Practical Demonstration: Building a Robust Pipeline\n\nA common use case is to validate a stream of user data and route valid and invalid items to different downstream processors.\nThis example shows how to create a filter to separate the stream after validation.\n\n### Example\n\n```python\nimport asyncio\nimport json\n\nfrom genai_processors import streams, processor\nfrom genai_processors_pydantic import PydanticValidator\nfrom pydantic import BaseModel, Field\n\n\n# 1. Define the data schema.\nclass UserEvent(BaseModel):\n    user_id: int\n    event_name: str = Field(min_length=3)\n\n\n# 2. Create the validator.\nvalidator = PydanticValidator(model=UserEvent)\n\n# 3. 
Define downstream processors for success and failure cases.\nclass DatabaseWriter(processor.PartProcessor):\n    async def call(self, part: processor.ProcessorPart):\n        validated_data = part.metadata['validated_data']\n        print(\n            f\"DATABASE: Writing event '{validated_data['event_name']}' \"\n            f\"for user {validated_data['user_id']}\"\n        )\n        yield part\n\n\nclass ErrorLogger(processor.PartProcessor):\n    async def call(self, part: processor.ProcessorPart):\n        errors = part.metadata['validation_errors']\n        print(f\"ERROR_LOG: Found {len(errors)} validation errors.\")\n        yield part\n\n\n# 4. Create a stream with mixed-quality data.\ninput_stream = streams.stream_content([\n    # Valid example\n    processor.ProcessorPart(json.dumps({\"user_id\": 101, \"event_name\": \"login\"})),\n    # Invalid user_id: a non-numeric string cannot be coerced to int,\n    # even with the default strict_mode=False\n    processor.ProcessorPart(json.dumps({\"user_id\": \"abc\", \"event_name\": \"logout\"})),\n    # Invalid event_name: shorter than min_length=3\n    processor.ProcessorPart(json.dumps({\"user_id\": 103, \"event_name\": \"up\"})),\n    # Not JSON: the validator ignores this part\n    processor.ProcessorPart(\"This is not a JSON part and will be ignored.\"),\n])\n\n\n# 5. 
Build and run the pipeline.\nasync def main():\n    print(\"--- Running Validation Pipeline ---\")\n\n    # Run each input part through the validator as it arrives and\n    # collect the results into separate lists by validation status\n    valid_parts = []\n    invalid_parts = []\n\n    async for input_part in input_stream:\n        async for validated_part in validator(input_part):\n            # Filter based on validation status (skip status messages)\n            status = validated_part.metadata.get(\"validation_status\")\n            if status == \"success\":\n                valid_parts.append(validated_part)\n            elif status == \"failure\":\n                invalid_parts.append(validated_part)\n\n    # Create streams from the filtered parts\n    valid_stream = streams.stream_content(valid_parts)\n    invalid_stream = streams.stream_content(invalid_parts)\n\n    # Create processor instances\n    db_writer = DatabaseWriter()\n    error_logger = ErrorLogger()\n\n    # Define one coroutine per stream\n    async def process_valid():\n        async for part in valid_stream:\n            async for result in db_writer(part):\n                pass  # Results are printed in the processor\n\n    async def process_invalid():\n        async for part in invalid_stream:\n            async for result in error_logger(part):\n                pass  # Results are printed in the processor\n\n    # Run both processing pipelines concurrently\n    await asyncio.gather(process_valid(), process_invalid())\n    print(\"--- Pipeline Finished ---\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n\n\n# Expected Output:\n# --- Running Validation Pipeline ---\n# DATABASE: Writing event 'login' for user 101\n# ERROR_LOG: Found 1 validation errors.\n# ERROR_LOG: Found 1 validation errors.\n# --- Pipeline Finished 
---\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmbeacom%2Fgenai-processors-pydantic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmbeacom%2Fgenai-processors-pydantic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmbeacom%2Fgenai-processors-pydantic/lists"}