https://github.com/tuanacelik/unstructuredio-haystack
💙 Unstructured Data Connectors for Haystack 2.0
haystack llm nlp python unstructured-data
- Host: GitHub
- URL: https://github.com/tuanacelik/unstructuredio-haystack
- Owner: TuanaCelik
- License: mit
- Created: 2023-08-25T12:21:30.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-21T12:43:43.000Z (about 1 year ago)
- Last Synced: 2024-10-04T15:49:00.001Z (about 1 month ago)
- Topics: haystack, llm, nlp, python, unstructured-data
- Language: Python
- Homepage: https://haystack.deepset.ai/integrations
- Size: 22.5 KB
- Stars: 16
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Unstructured Haystack
[![PyPI - Version](https://img.shields.io/pypi/v/unstructured-haystack.svg)](https://pypi.org/project/unstructured-haystack)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/unstructured-haystack.svg)](https://pypi.org/project/unstructured-haystack)

-----
## Unstructured Connectors for Haystack
This is an example Haystack 2.0 integration that wraps Unstructured.io connectors. Contributions are welcome 🚀
The current version has 3 available Unstructured connectors:
- **Discord**: `UnstructuredDiscordConnector`
- **GitHub**: `UnstructuredGitHubConnector`
- **Google Drive**: `UnstructuredGoogleDriveConnector`

## How to use in a Haystack 2.0 Pipeline
For example, you can write documents fetched from Discord using the `UnstructuredDiscordConnector`:

```python
from haystack.preview import Pipeline
from haystack.preview.components.writers import DocumentWriter
from unstructured_haystack import UnstructuredDiscordConnector
from chroma_haystack import ChromaDocumentStore

# Chroma is used in-memory so we use the same instances in the two pipelines below
document_store = ChromaDocumentStore()
connector = UnstructuredDiscordConnector(api_key="UNSTRUCTURED_API_KEY", discord_token="DISCORD_TOKEN")

indexing = Pipeline()
indexing.add_component("connector", connector)
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("connector.documents", "writer.documents")
indexing.run({"connector": {"channels": "993539071815200889", "period": 3, "output_dir": "discord-example"}})
```
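
The example above passes placeholder strings for `api_key` and `discord_token`. One possible way to avoid hardcoding real credentials is to read them from environment variables. The sketch below is not part of this package: the helper name `load_connector_credentials` and the environment variable names `UNSTRUCTURED_API_KEY` and `DISCORD_TOKEN` are assumptions, chosen only to mirror the placeholders above.

```python
import os


def load_connector_credentials(env=None):
    """Build keyword arguments for the connector from environment variables.

    Hypothetical helper: the variable names below are assumptions that
    mirror the placeholder strings in the README example.
    """
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env

    # Map constructor keyword -> environment variable name (assumed names).
    names = {"api_key": "UNSTRUCTURED_API_KEY", "discord_token": "DISCORD_TOKEN"}

    # Fail early with a clear message if anything is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {kwarg: env[var] for kwarg, var in names.items()}
```

With both variables set, the result could be unpacked into the constructor shown earlier, for example `UnstructuredDiscordConnector(**load_connector_credentials())`.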