{"id":43975961,"url":"https://github.com/docutray/docutray-python","last_synced_at":"2026-02-07T08:09:29.864Z","repository":{"id":336493102,"uuid":"1149811318","full_name":"docutray/docutray-python","owner":"docutray","description":null,"archived":false,"fork":false,"pushed_at":"2026-02-05T02:55:35.000Z","size":108,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-05T04:33:50.458Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/docutray.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-04T14:51:30.000Z","updated_at":"2026-02-05T02:44:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/docutray/docutray-python","commit_stats":null,"previous_names":["docutray/docutray-python"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/docutray/docutray-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docutray%2Fdocutray-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docutray%2Fdocutray-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docutray%2Fdocutray-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docutray%2Fdocutray-python/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/docutray","download_url":"https://codeload.github.com/docutray/docutray-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docutray%2Fdocutray-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29189675,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-07T07:37:03.739Z","status":"ssl_error","status_checked_at":"2026-02-07T07:37:03.029Z","response_time":63,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-07T08:09:29.795Z","updated_at":"2026-02-07T08:09:29.858Z","avatar_url":"https://github.com/docutray.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DocuTray Python Library\n\n[![PyPI version](https://img.shields.io/pypi/v/docutray.svg)](https://pypi.org/project/docutray/)\n[![Python versions](https://img.shields.io/pypi/pyversions/docutray.svg)](https://pypi.org/project/docutray/)\n[![License](https://img.shields.io/pypi/l/docutray.svg)](https://github.com/docutray/docutray-python/blob/main/LICENSE)\n\nThe official Python library for the [DocuTray API](https://docutray.com), providing access to document processing capabilities including OCR, document identification, data extraction, and knowledge bases.\n\n## 
Documentation\n\nFull API documentation is available at [docs.docutray.com](https://docs.docutray.com).\n\n## Installation\n\n```bash\npip install docutray\n```\n\nFor API reference generation tools (maintainers only):\n\n```bash\npip install \"docutray[docs]\"\n```\n\n### Requirements\n\n- Python 3.10+\n\n## Quick Start\n\n### Synchronous Usage\n\n```python\nfrom pathlib import Path\nfrom docutray import Client\n\nclient = Client(api_key=\"your-api-key\")\n\n# Convert a document\nresult = client.convert.run(\n    file=Path(\"invoice.pdf\"),\n    document_type_code=\"invoice\"\n)\nprint(result.data)\n\nclient.close()\n```\n\n### Asynchronous Usage\n\n```python\nimport asyncio\nfrom pathlib import Path\nfrom docutray import AsyncClient\n\nasync def main():\n    async with AsyncClient(api_key=\"your-api-key\") as client:\n        result = await client.convert.run(\n            file=Path(\"invoice.pdf\"),\n            document_type_code=\"invoice\"\n        )\n        print(result.data)\n\nasyncio.run(main())\n```\n\n## Configuration\n\n### API Key\n\nSet your API key via constructor argument or environment variable:\n\n```python\n# Via constructor\nclient = Client(api_key=\"your-api-key\")\n\n# Via environment variable\n# export DOCUTRAY_API_KEY=\"your-api-key\"\nclient = Client()  # Reads from DOCUTRAY_API_KEY\n```\n\n### Base URL\n\nOverride the default API endpoint:\n\n```python\nclient = Client(\n    api_key=\"your-api-key\",\n    base_url=\"https://custom-api.example.com\"\n)\n```\n\n### Timeout\n\nConfigure request timeouts (in seconds):\n\n```python\nimport httpx\n\n# Simple timeout (applies to all operations)\nclient = Client(api_key=\"your-api-key\", timeout=30.0)\n\n# Granular timeout control\nclient = Client(\n    api_key=\"your-api-key\",\n    timeout=httpx.Timeout(\n        connect=5.0,\n        read=60.0,\n        write=60.0,\n        pool=10.0\n    )\n)\n```\n\n### Retries\n\nConfigure automatic retry behavior:\n\n```python\n# Default: 2 retries with 
exponential backoff\nclient = Client(api_key=\"your-api-key\")\n\n# Custom retry count\nclient = Client(api_key=\"your-api-key\", max_retries=5)\n\n# Disable retries\nclient = Client(api_key=\"your-api-key\", max_retries=0)\n```\n\n## Error Handling\n\nThe SDK provides a comprehensive exception hierarchy:\n\n```\nDocuTrayError (base)\n├── APIConnectionError (network errors)\n│   └── APITimeoutError (request timeout)\n└── APIError (HTTP errors)\n    ├── BadRequestError (400)\n    ├── AuthenticationError (401)\n    ├── PermissionDeniedError (403)\n    ├── NotFoundError (404)\n    ├── ConflictError (409)\n    ├── UnprocessableEntityError (422)\n    ├── RateLimitError (429)\n    └── InternalServerError (5xx)\n```\n\n### Catching Errors\n\n```python\nfrom pathlib import Path\nfrom docutray import Client\nfrom docutray import (\n    DocuTrayError,\n    APIConnectionError,\n    APIError,\n    AuthenticationError,\n    RateLimitError,\n    NotFoundError,\n)\n\nclient = Client(api_key=\"your-api-key\")\n\ntry:\n    result = client.convert.run(\n        file=Path(\"document.pdf\"),\n        document_type_code=\"invoice\"\n    )\nexcept AuthenticationError as e:\n    print(f\"Invalid API key: {e.message}\")\nexcept RateLimitError as e:\n    print(f\"Rate limited. 
Retry after {e.retry_after} seconds\")\nexcept NotFoundError as e:\n    print(f\"Resource not found: {e.message}\")\nexcept APIError as e:\n    print(f\"API error {e.status_code}: {e.message}\")\n    print(f\"Request ID: {e.request_id}\")\nexcept APIConnectionError as e:\n    print(f\"Connection failed: {e.message}\")\nexcept DocuTrayError as e:\n    print(f\"SDK error: {e.message}\")\n```\n\n### Rate Limit Handling\n\n```python\nimport time\nfrom pathlib import Path\nfrom docutray import Client, RateLimitError\n\nclient = Client(api_key=\"your-api-key\")\n\ntry:\n    result = client.convert.run(file=Path(\"document.pdf\"), document_type_code=\"invoice\")\nexcept RateLimitError as e:\n    if e.retry_after:\n        print(f\"Rate limited. Waiting {e.retry_after} seconds...\")\n        time.sleep(e.retry_after)\n        # Retry the request once after waiting\n        result = client.convert.run(file=Path(\"document.pdf\"), document_type_code=\"invoice\")\n```\n\n## Resources\n\n### Convert\n\nConvert documents to structured data using OCR and AI extraction.\n\n```python\nfrom pathlib import Path\n\n# Synchronous conversion (waits for result)\nresult = client.convert.run(\n    file=Path(\"invoice.pdf\"),\n    document_type_code=\"invoice\"\n)\nprint(result.data)\n\n# From bytes\nwith open(\"invoice.pdf\", \"rb\") as f:\n    result = client.convert.run(\n        file=f.read(),\n        document_type_code=\"invoice\"\n    )\n\n# From URL\nresult = client.convert.run(\n    url=\"https://example.com/invoice.pdf\",\n    document_type_code=\"invoice\"\n)\n\n# Asynchronous conversion (returns immediately)\nstatus = client.convert.run_async(\n    file=Path(\"large_document.pdf\"),\n    document_type_code=\"invoice\"\n)\nprint(f\"Conversion ID: {status.conversion_id}\")\n\n# Poll for completion\nfinal_status = status.wait()\nif final_status.is_success():\n    print(final_status.data)\n```\n\n### Identify\n\nAutomatically identify document types.\n\n```python\nresult = client.identify.run(file=Path(\"unknown_document.pdf\"))\n\nprint(f\"Identified as: 
{result.document_type.name}\")\nprint(f\"Confidence: {result.document_type.confidence:.2%}\")\n\n# View alternatives\nfor alt in result.alternatives:\n    print(f\"  Alternative: {alt.name} ({alt.confidence:.2%})\")\n```\n\n### Document Types\n\nList and retrieve document type definitions.\n\n```python\n# List all document types\npage = client.document_types.list()\nfor doc_type in page.data:\n    print(f\"{doc_type.code}: {doc_type.name}\")\n\n# Search document types\npage = client.document_types.list(search=\"invoice\")\n\n# Get a specific document type\ndoc_type = client.document_types.get(\"dt_invoice\")\nprint(f\"Schema: {doc_type.schema_}\")\n\n# Validate data against a document type schema\nvalidation = client.document_types.validate(\n    \"dt_invoice\",\n    {\"invoice_number\": \"INV-001\", \"total\": 100.00}\n)\nif validation.is_valid():\n    print(\"Data is valid!\")\nelse:\n    for error in validation.errors.messages:\n        print(f\"Validation error: {error}\")\n```\n\n### Steps\n\nExecute predefined document processing workflows.\n\n```python\n# Start async step execution\nstatus = client.steps.run_async(\n    step_id=\"step_invoice_extraction\",\n    file=Path(\"invoice.pdf\")\n)\n\n# Wait for completion with progress callback\ndef on_progress(s):\n    print(f\"Status: {s.status}\")\n\nfinal = status.wait(on_status=on_progress)\nprint(final.data)\n```\n\n### Knowledge Bases\n\nManage document collections with semantic search capabilities.\n\n```python\n# List knowledge bases\nfor kb in client.knowledge_bases.list().auto_paging_iter():\n    print(f\"{kb.name}: {kb.document_count} documents\")\n\n# Create a knowledge base\nkb = client.knowledge_bases.create(\n    name=\"Product Documentation\",\n    description=\"Technical documentation for products\",\n    schema={\n        \"type\": \"object\",\n        \"properties\": {\n            \"title\": {\"type\": \"string\"},\n            \"content\": {\"type\": \"string\"},\n            \"category\": 
{\"type\": \"string\"}\n        }\n    }\n)\n\n# Add documents\ndoc = client.knowledge_bases.documents(kb.id).create(\n    content={\n        \"title\": \"Getting Started\",\n        \"content\": \"Welcome to our product...\",\n        \"category\": \"guides\"\n    },\n    metadata={\"source\": \"manual\"}\n)\n\n# Semantic search\nresults = client.knowledge_bases.search(\n    kb.id,\n    query=\"how to configure authentication\",\n    limit=5\n)\nfor item in results.data:\n    print(f\"{item.similarity:.2%}: {item.document.content['title']}\")\n\n# Update a document\nclient.knowledge_bases.documents(kb.id).update(\n    doc.id,\n    content={\"title\": \"Updated Title\", \"content\": \"...\"}\n)\n\n# Delete a document\nclient.knowledge_bases.documents(kb.id).delete(doc.id)\n\n# Delete knowledge base\nclient.knowledge_bases.delete(kb.id)\n```\n\n## Pagination\n\nResources that return lists support pagination:\n\n```python\n# Get the first page\npage = client.document_types.list(limit=10)\nprint(f\"Page {page.page} of {page.total_pages}\")\n\n# Iterate through all pages manually\nfor page in client.document_types.list().iter_pages():\n    for doc_type in page.data:\n        print(doc_type.name)\n\n# Auto-iterate through all items\nfor doc_type in client.document_types.list().auto_paging_iter():\n    print(doc_type.name)\n\n# Async pagination (inside an async function)\n# async for doc_type in (await client.document_types.list()).auto_paging_iter_async():\n#     print(doc_type.name)\n```\n\n## Raw Response Access\n\nAccess raw HTTP response data for debugging:\n\n```python\nfrom pathlib import Path\nfrom docutray import Client\n\nclient = Client(api_key=\"your-api-key\")\n\nresponse = client.convert.with_raw_response.run(\n    file=Path(\"invoice.pdf\"),\n    document_type_code=\"invoice\"\n)\n\nprint(f\"Status: {response.status_code}\")\nprint(f\"Headers: {response.headers}\")\nprint(f\"Request ID: {response.headers.get('x-request-id')}\")\n\n# Parse the response 
body\nresult = response.parse()\nprint(result.data)\n```\n\n## Async Operations\n\nFor long-running operations, use async methods with polling:\n\n```python\nfrom pathlib import Path\nfrom docutray import Client\n\nclient = Client(api_key=\"your-api-key\")\n\n# Start async conversion\nstatus = client.convert.run_async(\n    file=Path(\"large_document.pdf\"),\n    document_type_code=\"invoice\"\n)\n\n# Poll with progress callback\ndef on_status(s):\n    print(f\"Status: {s.status}, Progress: {s.progress or 'N/A'}\")\n\nfinal = status.wait(\n    on_status=on_status,\n    poll_interval=2.0,  # seconds between polls\n    timeout=300.0       # maximum wait time\n)\n\nif final.is_success():\n    print(\"Conversion complete!\")\n    print(final.data)\nelif final.is_failed():\n    print(f\"Conversion failed: {final.error}\")\n```\n\n## Type Safety\n\nThe SDK uses Pydantic models for all responses, providing full type safety:\n\n```python\nfrom pathlib import Path\nfrom docutray import Client\nfrom docutray.types import ConversionResult, DocumentType\n\nclient = Client(api_key=\"your-api-key\")\n\n# Type hints work with your IDE\nresult: ConversionResult = client.convert.run(\n    file=Path(\"invoice.pdf\"),\n    document_type_code=\"invoice\"\n)\n\n# Access typed attributes\nprint(result.conversion_id)  # str\nprint(result.data)           # dict[str, Any]\nprint(result.status)         # str\n```\n\n## Contributing\n\nWe welcome contributions! 
Here's how to get started:\n\n### Development Setup\n\n```bash\n# Clone the repository\ngit clone https://github.com/docutray/docutray-python.git\ncd docutray-python\n\n# Install dependencies with uv\nuv sync\n\n# Run tests\nuv run pytest\n\n# Run tests with coverage\nuv run pytest --cov=src/docutray\n\n# Type checking\nuv run mypy src\n\n# Linting\nuv run ruff check src\n\n# Format code\nuv run ruff format src\n```\n\n### Running Tests\n\n```bash\n# All tests\nuv run pytest\n\n# Specific test file\nuv run pytest tests/test_client.py\n\n# Specific test\nuv run pytest tests/test_client.py::test_client_initialization\n```\n\n## Support\n\n- [Documentation](https://docs.docutray.com)\n- [API Reference](https://docs.docutray.com/api)\n- [Issue Tracker](https://github.com/docutray/docutray-python/issues)\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocutray%2Fdocutray-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdocutray%2Fdocutray-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocutray%2Fdocutray-python/lists"}