{"id":15674661,"url":"https://github.com/tuanacelik/unstructuredio-haystack","last_synced_at":"2025-05-06T23:25:00.926Z","repository":{"id":191229888,"uuid":"683025238","full_name":"TuanaCelik/unstructuredio-haystack","owner":"TuanaCelik","description":"💙 Unstructured Data Connectors for Haystack 2.0","archived":false,"fork":false,"pushed_at":"2023-09-21T12:43:43.000Z","size":23,"stargazers_count":16,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-31T04:01:48.458Z","etag":null,"topics":["haystack","llm","nlp","python","unstructured-data"],"latest_commit_sha":null,"homepage":"https://haystack.deepset.ai/integrations","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TuanaCelik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-25T12:21:30.000Z","updated_at":"2024-07-14T03:41:42.000Z","dependencies_parsed_at":"2024-10-23T12:09:21.863Z","dependency_job_id":null,"html_url":"https://github.com/TuanaCelik/unstructuredio-haystack","commit_stats":null,"previous_names":["tuanacelik/unstructuredio-haystack"],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TuanaCelik%2Funstructuredio-haystack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TuanaCelik%2Funstructuredio-haystack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TuanaCelik%2Funstructuredio-haystack/releases","manifests_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/TuanaCelik%2Funstructuredio-haystack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TuanaCelik","download_url":"https://codeload.github.com/TuanaCelik/unstructuredio-haystack/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252784525,"owners_count":21803699,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["haystack","llm","nlp","python","unstructured-data"],"created_at":"2024-10-03T15:48:56.207Z","updated_at":"2025-05-06T23:25:00.892Z","avatar_url":"https://github.com/TuanaCelik.png","language":"Python","readme":"# Unstructured Haystack\n\n[![PyPI - Version](https://img.shields.io/pypi/v/unstructured-haystack.svg)](https://pypi.org/project/unstructured-haystack)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/unstructured-haystack.svg)](https://pypi.org/project/unstructured-haystack)\n\n-----\n\n## Unstructured Connectors for Haystack\n\nThis is an example Haystack 2.0 integration that wraps Unstructured.io connectors. 
Please contribute 🚀\n\nThe current version has 3 available Unstructured connectors:\n- **Discord**: `UnstructuredDiscordConnector`\n- **GitHub**: `UnstructuredGitHubConnector`\n- **Google Drive**: `UnstructuredGoogleDriveConnector`\n\n## How to use in a Haystack 2.0 Pipeline\nFor example, you can write documents fetched from Discord using the `UnstructuredDiscordConnector`:\n\n```python\nfrom haystack.preview import Pipeline\nfrom haystack.preview.components.writers import DocumentWriter\nfrom unstructured_haystack import UnstructuredDiscordConnector\nfrom chroma_haystack import ChromaDocumentStore\n\n# Chroma is used in-memory so we use the same instances in the two pipelines below\ndocument_store = ChromaDocumentStore()\nconnector = UnstructuredDiscordConnector(api_key=\"UNSTRUCTURED_API_KEY\", discord_token=\"DISCORD_TOKEN\")\n\n# Build an indexing pipeline: the connector fetches documents, the writer stores them\nindexing = Pipeline()\nindexing.add_component(\"connector\", connector)\nindexing.add_component(\"writer\", DocumentWriter(document_store))\nindexing.connect(\"connector.documents\", \"writer.documents\")\nindexing.run({\"connector\": {\"channels\": \"993539071815200889\", \"period\": 3, \"output_dir\": \"discord-example\"}})\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuanacelik%2Funstructuredio-haystack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftuanacelik%2Funstructuredio-haystack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuanacelik%2Funstructuredio-haystack/lists"}