https://github.com/rririanto/unstructured-demo-streamlit
Extract your docs (CSV, PDF, JSON, HTML, DOCS, Sheets and more) for your own GPT and LLM projects using Unstructured.io via streamlit
ai data data-extraction gpt unstructured unstructured-data
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/rririanto/unstructured-demo-streamlit
- Owner: rririanto
- Created: 2023-08-01T05:39:14.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-08-01T16:39:37.000Z (over 2 years ago)
- Last Synced: 2025-03-23T18:54:14.866Z (11 months ago)
- Topics: ai, data, data-extraction, gpt, unstructured, unstructured-data
- Language: Python
- Homepage: https://unstructured-demo.streamlit.app
- Size: 6.84 KB
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Extract your docs using Unstructured-IO
## Description
This [Streamlit](https://streamlit.io) app is designed to help you analyze and extract valuable insights from challenging data formats commonly found in enterprise settings, such as HTML, PDF, CSV, PNG, PPTX, and more.
This app uses [unstructured.io](https://unstructured.io) as a base library, providing an easy way to extract and convert unstructured data into a format compatible with popular vector databases and LLM frameworks. With this tool, you can streamline complex data handling and ensure compatibility with your preferred data analysis pipelines.
Supported file types:
| Category | Document Types |
|-----------|-------------------------------|
| Plaintext | `.txt`, `.eml`, `.msg`, `.xml`, `.html`, `.md`, `.rst`, `.json`, `.rtf` |
| Images | `.jpeg`, `.png` |
| Documents | `.doc`, `.docx`, `.ppt`, `.pptx`, `.pdf`, `.odt`, `.epub`, `.csv`, `.tsv`, `.xlsx` |
Find out more about it at [unstructured.io](https://github.com/Unstructured-IO/unstructured-api).
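As a rough illustration, the supported-types table can be expressed as a small lookup that maps a file name to its category. This is only a sketch using the standard library; the names `SUPPORTED_TYPES` and `category_for` are hypothetical and are not taken from the app's source.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported-types table in this README.
SUPPORTED_TYPES = {
    "Plaintext": {".txt", ".eml", ".msg", ".xml", ".html", ".md", ".rst", ".json", ".rtf"},
    "Images": {".jpeg", ".png"},
    "Documents": {".doc", ".docx", ".ppt", ".pptx", ".pdf", ".odt", ".epub", ".csv", ".tsv", ".xlsx"},
}


def category_for(filename: str) -> Optional[str]:
    """Return the table category for a file name, or None if unsupported."""
    # Compare case-insensitively so "REPORT.PDF" is treated like "report.pdf".
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_TYPES.items():
        if suffix in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ("report.pdf", "scan.png", "notes.md", "archive.zip"):
        print(name, "->", category_for(name))
```

A check like this could run before an upload is processed, so unsupported files are rejected early.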
To get started, upload a document and it will be shown in the preview. You can also adjust the parameters to fine-tune your tests.
## Accessing the App
You can access the app on Streamlit Community Cloud at [https://unstructured-demo.streamlit.app/](https://unstructured-demo.streamlit.app/).
## Getting Started
The app does not require an API key to function; extractions are processed on the Streamlit Cloud server unless you choose to process them on the unstructured.io server.
However, if you choose to use the unstructured.io API, I have included a temporary key in the app, but it may be limited. Create your own at [unstructured](https://unstructured.io). After obtaining your API key, select the unstructured.io API option, enter your own key, and upload your file.
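The choice between local processing and the hosted API might look roughly like the sketch below. It is not the app's actual code: `ExtractionSettings`, `choose_backend`, and `extract` are hypothetical names, and the sketch assumes the `unstructured` package is installed and exposes `partition` and `partition_via_api` as in the releases I am aware of. Unlike the app, it does not ship a temporary key, so API mode requires your own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionSettings:
    """Hypothetical container for the two options described above."""
    use_api: bool = False          # False: process locally, no key needed
    api_key: Optional[str] = None  # only used when use_api is True


def choose_backend(settings: ExtractionSettings) -> str:
    """Return "local" or "api", validating that API mode has a key."""
    if not settings.use_api:
        return "local"
    if not settings.api_key:
        raise ValueError("API mode selected but no API key was provided")
    return "api"


def extract(path: str, settings: ExtractionSettings):
    """Partition a file locally or through the hosted API.

    Imports are deferred so the selection logic above works even when
    `unstructured` is not installed (assumption: it is installed when
    this function is actually called).
    """
    if choose_backend(settings) == "api":
        from unstructured.partition.api import partition_via_api
        return partition_via_api(filename=path, api_key=settings.api_key)
    from unstructured.partition.auto import partition
    return partition(filename=path)
```

Keeping the backend decision in a separate function makes it easy to validate the settings before any file is uploaded or sent to a remote server.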
## Feedback
If you have any feedback or questions about this app, please reach out to me on Twitter at [@rririanto](https://twitter.com/rririanto).
Thank you for checking out the tool!