{"id":14193412,"url":"https://github.com/aryn-ai/sycamore","last_synced_at":"2026-03-10T10:12:52.408Z","repository":{"id":212195027,"uuid":"663604762","full_name":"aryn-ai/sycamore","owner":"aryn-ai","description":"🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.","archived":false,"fork":false,"pushed_at":"2026-02-18T22:32:44.000Z","size":116633,"stargazers_count":584,"open_issues_count":56,"forks_count":66,"subscribers_count":9,"default_branch":"main","last_synced_at":"2026-02-18T22:39:23.804Z","etag":null,"topics":["ai","dataprep","etl","information-retrieval","llm","ml","nlp","opensearch","search","semantic-search"],"latest_commit_sha":null,"homepage":"https://sycamore.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aryn-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-07-07T17:25:31.000Z","updated_at":"2026-02-18T20:26:15.000Z","dependencies_parsed_at":"2024-02-19T20:46:53.161Z","dependency_job_id":"850dff05-9f95-448c-9e26-ad5a61a2bbfe","html_url":"https://github.com/aryn-ai/sycamore","commit_stats":null,"previous_names":["aryn-ai/sycamore"],"tags_count":61,"template":false,"template_full_name":null,"purl":"pkg:github/aryn-ai/sycamore","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryn-ai%2Fsycamore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryn-ai%2Fsycamore/tags","releases_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryn-ai%2Fsycamore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryn-ai%2Fsycamore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aryn-ai","download_url":"https://codeload.github.com/aryn-ai/sycamore/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryn-ai%2Fsycamore/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30329698,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T05:25:20.737Z","status":"ssl_error","status_checked_at":"2026-03-10T05:25:17.430Z","response_time":106,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","dataprep","etl","information-retrieval","llm","ml","nlp","opensearch","search","semantic-search"],"created_at":"2024-08-18T18:01:03.467Z","updated_at":"2026-03-10T10:12:52.388Z","avatar_url":"https://github.com/aryn-ai.png","language":"Python","funding_links":[],"categories":["Data Integration \u0026 Migration","Data Pipeline","Text"],"sub_categories":["Document Parsing"],"readme":"\u003ca 
name=\"readme-top\"\u003e\u003c/a\u003e\n![SycamoreLogoFinal.svg](https://raw.githubusercontent.com/aryn-ai/sycamore/main/docs/source/images/sycamore_logo.svg)\n\n[![PyPI](https://img.shields.io/pypi/v/sycamore-ai)](https://pypi.org/project/sycamore-ai/)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sycamore-ai)](https://pypi.org/project/sycamore-ai/)\n[![Slack](https://img.shields.io/badge/slack-sycamore-brightgreen.svg?logo=slack)](https://join.slack.com/t/aryn-community/shared_invite/zt-36vhennsx-mN3UsqD6PT2vxVZxpqdHsw)\n[![Docs](https://readthedocs.org/projects/sycamore/badge/?version=stable)](https://sycamore.readthedocs.io/en/stable/?badge=stable)\n![License](https://img.shields.io/github/license/aryn-ai/sycamore)\n[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/aryn-ai/sycamore)\n\nSycamore is an open source, AI-powered document processing engine for ETL, RAG, LLM-based applications, and analytics on unstructured data. Sycamore can partition and enrich a wide range of document types including reports, presentations, transcripts, manuals, and more. It can analyze and chunk complex documents such as PDFs and images with embedded tables, figures, graphs, and other infographics. Check out an [example notebook](https://github.com/aryn-ai/sycamore/blob/main/notebooks/sycamore-tutorial-intermediate-etl.ipynb).\n\nFor processing documents, Sycamore leverages [Aryn DocParse](https://www.aryn.ai/post/announcing-the-aryn-partitioning-service) (formerly known as the Aryn Partitioning Service), a serverless, GPU-powered API for segmenting and labeling documents, doing OCR, extracting tables and images, and more. It leverages Aryn's state-of-the-art, [open source deep learning DETR AI model](https://huggingface.co/Aryn/deformable-detr-DocLayNet) trained on 80k+ enterprise documents, and it can lead to 6x more accurate data chunking and 2x improved recall on hybrid search or RAG when compared to alternate systems. 
You can [sign up for free here](http://www.aryn.ai/get-started), or choose to run the Aryn Partitioner locally.\n\nAryn DocParse takes [documents](https://docs.aryn.ai/docparse/formats_supported) and returns the partitioned output in JSON, and you can use Sycamore for additional data extraction, enrichment, transforms, cleaning, and loading into downstream databases. You can choose the LLMs to use with these transforms.\n\nSycamore reliably loads your vector databases and hybrid search engines, including OpenSearch, Elasticsearch, Pinecone, DuckDB, Qdrant, and Weaviate, with higher-quality data.\n\nThe Sycamore framework is built around a scalable and robust abstraction for document processing called a DocSet, and includes powerful high-level transformations in Python for data processing, enrichment, and cleaning. DocSets also encapsulate scalable data processing techniques, removing the undifferentiated heavy lifting of reliably loading chunks. DocSets' functional programming approach allows you to rapidly customize and experiment with your chunking for better-quality RAG results.\n\n![Untitled](docs/source/images/SycamoreDataflowDiagramv2.png)\n\n## Features\n\n- Integrated with [Aryn DocParse](https://sycamore.readthedocs.io/en/stable/aryn_cloud/aryn_partitioning_service.html), using a [state-of-the-art vision AI model](https://huggingface.co/Aryn/deformable-detr-DocLayNet) for segmentation and preserving the semantic structure of documents\n- DocSet abstraction to scalably and reliably transform and manipulate unstructured documents\n- High-quality table extraction, OCR, visual summarization, LLM-powered UDFs, and other performant Python data transforms\n- Quickly create vector embeddings using your choice of AI model\n- Helpful features like automatic data crawlers (Amazon S3 and HTTP), a Jupyter notebook for writing and iterating on jobs, and an OpenSearch hybrid search and RAG engine for testing\n- Scalable [Ray](https://github.com/ray-project/ray) 
backend\n\n## Demo\n\n[Introduction to Aryn DocParse (formerly known as the Aryn Partitioning Service)](https://www.aryn.ai/?name=ArynPartitioningService_Intro)\n\n## Get Started\n\nSycamore currently runs on Linux and macOS. To install, run:\n\n```pip install sycamore-ai```\n\nSycamore provides connectors to vector databases via Python extras. To install a connector, include it as an extra with your pip install. For example:\n\n```pip install sycamore-ai[duckdb]```\n\nSupported connectors include `duckdb`, `elasticsearch`, `opensearch`, `pinecone`, `qdrant`, and `weaviate`.\n\nTo use Aryn DocParse, [sign up for free here](https://www.aryn.ai/get-started) and use the API key.\n\n## Resources\n\n- Documentation: https://sycamore.readthedocs.io\n- Example notebook: https://github.com/aryn-ai/sycamore/blob/main/notebooks/sycamore-tutorial-intermediate-etl.ipynb\n- Slack: https://join.slack.com/t/aryn-community/shared_invite/zt-36vhennsx-mN3UsqD6PT2vxVZxpqdHsw\n- Data preparation libraries (PyPI): https://pypi.org/project/sycamore-ai/\n- Contact us: info@aryn.ai\n\n## Contributing\n\nCheck out our [Contributing Guide](https://github.com/aryn-ai/sycamore/blob/main/CONTRIBUTING.md) for more information about how to contribute to Sycamore and set up your environment for development.\n\n## Contributors\n\n\u003ca href=\"https://github.com/aryn-ai/sycamore/graphs/contributors\"\u003e\n  \u003cimg alt=\"contributors\" src=\"https://contrib.rocks/image?repo=aryn-ai/sycamore\"/\u003e\n\u003c/a\u003e\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=aryn-ai/sycamore\u0026type=Date)](https://star-history.com/#aryn-ai/sycamore\u0026Date)\n\n\u003cp align=\"right\" style=\"font-size: 14px; color: #555; margin-top: 20px;\"\u003e\n    \u003ca href=\"#readme-top\" style=\"text-decoration: none; color: #007bff; font-weight: bold;\"\u003e\n        ↑ Back to Top ↑\n    
\u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faryn-ai%2Fsycamore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faryn-ai%2Fsycamore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faryn-ai%2Fsycamore/lists"}