{"id":48768879,"url":"https://github.com/teddynote-lab/dify-upstageparser-plugin","last_synced_at":"2026-04-13T09:03:56.911Z","repository":{"id":283866234,"uuid":"949054376","full_name":"teddynote-lab/dify-upstageparser-plugin","owner":"teddynote-lab","description":"Upstage Document Parse Plugin for dify","archived":false,"fork":false,"pushed_at":"2025-03-15T16:02:14.000Z","size":124,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-22T17:41:29.831Z","etag":null,"topics":["dify","document","plugin","upstage"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/teddynote-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-15T15:15:53.000Z","updated_at":"2025-03-16T13:53:40.000Z","dependencies_parsed_at":"2025-03-22T17:41:36.928Z","dependency_job_id":"3fdafe30-b045-4e2a-8cdc-081dcd7e4aa0","html_url":"https://github.com/teddynote-lab/dify-upstageparser-plugin","commit_stats":null,"previous_names":["teddynote-lab/dify-upstageparser-plugin"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/teddynote-lab/dify-upstageparser-plugin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teddynote-lab%2Fdify-upstageparser-plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teddynote-lab%2Fdify-upstageparser-plugin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teddynote-lab%2Fdify-upstageparser-plugin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teddynote-lab%2Fdify-upstageparser-plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/teddynote-lab","download_url":"https://codeload.github.com/teddynote-lab/dify-upstageparser-plugin/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teddynote-lab%2Fdify-upstageparser-plugin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31746116,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T06:26:45.479Z","status":"ssl_error","status_checked_at":"2026-04-13T06:26:44.645Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dify","document","plugin","upstage"],"created_at":"2026-04-13T09:03:53.578Z","updated_at":"2026-04-13T09:03:56.903Z","avatar_url":"https://github.com/teddynote-lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Upstage Document Parse Plugin for Dify\n\n[![en](https://img.shields.io/badge/lang-English-blue.svg)](README.md)\n[![ko](https://img.shields.io/badge/lang-한국어-red.svg)](README_KO.md)\n\n**Ready to use?**\n\n[Download the Dify plugin package](https://www.dropbox.com/scl/fi/ehbl0zmd409njmq2tmya3/upstage-documentparse.difypkg?rlkey=my8l73m70emtnc9fi1mo0tvg7\u0026st=a10wvxty\u0026dl=1) and upload it directly to your Dify instance.\n\n## Repository\n\n**GitHub:** [teddynote-lab/dify-upstageparser-plugin](https://github.com/teddynote-lab/dify-upstageparser-plugin)\n\nYou can clone this repository using:\n\n```bash\ngit clone https://github.com/teddynote-lab/dify-upstageparser-plugin.git\ncd dify-upstageparser-plugin\n```\n\nA powerful document parsing plugin for the [Dify](https://dify.ai) platform that leverages the Upstage Document Parse API to convert various document formats into structured markdown, HTML, or text.\n\n## Features\n\n- **Multi-format Support**: Process PDFs, DOCX files, and various image formats\n- **Intelligent Document Understanding**: Extract text, tables, charts, and figures with their original structure\n- **Multiple Output Formats**: Convert documents to markdown, HTML, or plain text\n- **Efficient Caching**: Avoid reprocessing identical files with content-based caching\n- **OCR Capabilities**: Extract text from scanned documents and images\n- **Chart Recognition**: Identify and extract charts from documents\n- **Batch Processing**: Process multi-page documents efficiently\n- **Coordinate Extraction**: Obtain bounding box coordinates for document elements\n\n## Installation\n\nThe installation steps below are only needed for developers who want to manually develop or modify the plugin. If you're an end user, simply [download the Dify plugin package](https://www.dropbox.com/scl/fi/ehbl0zmd409njmq2tmya3/upstage-documentparse.difypkg?rlkey=my8l73m70emtnc9fi1mo0tvg7\u0026st=a10wvxty\u0026dl=0) and upload it to your Dify instance.\n\nFor development:\n\n```bash\npip install -r requirements.txt\n```\n\nConfigure the plugin in your Dify platform.\n\n## Configuration\n\n### Required Credentials\n\nThe plugin requires the following credentials:\n\n- `upstage_api_key`: Your Upstage API key (obtain from [Upstage Console](https://console.upstage.ai))\n- `base_url`: Your Dify instance base URL (default: \"https://cloud.dify.ai\")\n\n### Parameter Options\n\nWhen using the tool, you can configure the following parameters:\n\n- `result_type`: Output format (options: \"md\", \"html\", \"text\")\n- `as_file`: Whether to return results as a file or text (options: \"file\", \"text\")\n\n## Usage\n\n### In Dify Application\n\n1. Add the Upstage Document Parse tool to your application.\n2. Configure the required credentials.\n3. Use the tool in your application flows to process documents.\n\n### Direct Python Usage\n\nYou can also use the client directly in your Python code:\n\n```python\nfrom tools.upstage_client import UpstageDocumentParseClient\n\n# Initialize the client\nclient = UpstageDocumentParseClient(\n    api_key=\"your_upstage_api_key\",\n    output_dir=\"exported_documents\"\n)\n\n# Convert a document to markdown\nmarkdown_content = client.convert_to_markdown(\"path/to/your/document.pdf\")\n\n# Convert a document to HTML\nhtml_content = client.convert_to_html(\"path/to/your/document.docx\")\n\n# Convert a document to plain text\ntext_content = client.convert_to_text(\"path/to/your/image.jpg\")\n```\n\n## API Parameters\n\nThe plugin uses the following parameters when calling the Upstage Document Parse API:\n\n| Parameter | Type | Description | Default |\n|-----------|------|-------------|---------|\n| `document` | File | The document file to be processed | Required |\n| `ocr` | String | Controls OCR behavior: \"auto\" (apply to images only) or \"force\" (convert all to images first) | \"auto\" |\n| `coordinates` | Boolean | Whether to return bounding box coordinates | false |\n| `chart_recognition` | Boolean | Whether to use chart recognition | true |\n| `output_formats` | List[String] | Format for layout elements: \"text\", \"html\", \"markdown\" | [\"html\", \"markdown\", \"text\"] |\n| `model` | String | Model used for inference | \"document-parse-250305\" |\n| `base64_encoding` | List[String] | Layout categories to provide as base64 encoded strings | [\"table\", \"figure\", \"chart\"] |\n\n## Caching Mechanism\n\nThe plugin implements an efficient caching system:\n\n1. File content hashing to identify duplicate documents\n2. Result caching based on content hash and output format\n3. TTL-based cache expiration (default: 1 hour)\n\n## Examples\n\n### Converting a PDF to Markdown\n\n```python\nclient = UpstageDocumentParseClient(api_key=\"your_api_key\")\nmarkdown = client.convert_to_markdown(\"sample.pdf\")\nprint(markdown)\n```\n\n### Processing a Large Document\n\n```python\nclient = UpstageDocumentParseClient(api_key=\"your_api_key\")\nexported_files = client.process_document(\n    \"large_document.pdf\",\n    wait=True,\n    poll_interval=2,\n    max_wait=600\n)\nprint(f\"Files exported: {exported_files}\")\n```\n\n## Development\n\n### Project Structure\n\n- `upstage-documentparse.py`: Main Dify plugin integration\n- `upstage_client.py`: Core client for interacting with the Upstage API\n- `requirements.txt`: Python dependencies\n\n### Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\n[MIT License](LICENSE.md)\n\n## Contact\n\n**For any inquiries, please contact:**  \ndev@brain-crew.com\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteddynote-lab%2Fdify-upstageparser-plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteddynote-lab%2Fdify-upstageparser-plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteddynote-lab%2Fdify-upstageparser-plugin/lists"}