https://github.com/unstructured-io/irs-manual-demo
- Host: GitHub
- URL: https://github.com/unstructured-io/irs-manual-demo
- Owner: Unstructured-IO
- Created: 2023-04-09T21:32:13.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-09T19:17:25.000Z (about 3 years ago)
- Last Synced: 2025-06-14T14:04:41.837Z (about 1 year ago)
- Language: Python
- Size: 63.8 MB
- Stars: 15
- Watchers: 5
- Forks: 7
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
## Chat with IRS Manuals
This directory contains an application for chatting with IRS manuals. Once data is available, the chat application only uses self-hosted models and can be run in a disconnected environment. Here's how to get started with the chatbot:
### Installation
```bash
pip install -r requirements.txt
```
### Environment Variables
**Note: there are other options for these connections, but these are the ones referenced in this implementation.**

- [OpenAI](https://platform.openai.com/docs/api-reference)
- [Pinecone](https://docs.pinecone.io/)

```text
PINECONE_API_KEY
PINECONE_API_ENV
OPENAI_API_KEY
PINECONE_INDEX_NAME
```
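
Before running the later steps, it can help to confirm that all four variables are set. The snippet below is a minimal sketch, not part of this repository: the helper name `missing_env_vars` is hypothetical, and it only checks that each variable is non-empty, not that the keys are valid.

```python
import os

# Variable names copied from the list above.
REQUIRED_VARS = (
    "PINECONE_API_KEY",
    "PINECONE_API_ENV",
    "OPENAI_API_KEY",
    "PINECONE_INDEX_NAME",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the process environment; passing a plain dict
    makes the check easy to try out without touching real settings.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment and returns an empty list when everything is set.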
### Download PDFs from IRS website
```bash
python download_data.py
```

### Run PDFs against unstructured-ingest
```bash
# Supply a directory path after --local-input-path and after --structured-output-dir.
# --partition-by-api is optional: it hits the *NEW* API instead of processing locally.
PYTHONPATH=. ./unstructured/ingest/main.py \
--local-input-path \
--structured-output-dir \
--partition-by-api
```

Here's an example of the structured JSON output
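
As a rough sketch of how that output can be consumed, the snippet below assumes the usual layout of `unstructured` JSON: an array of elements, each with `type`, `element_id`, `text`, and a `metadata` object that may carry `filename` and `page_number`. The sample values are placeholders, not real IRS manual content, and `texts_by_page` is a hypothetical helper.

```python
import json

# Placeholder sample in the assumed element layout (not real output).
SAMPLE = """
[
  {"type": "Title", "element_id": "a1", "text": "Example heading",
   "metadata": {"filename": "example.pdf", "page_number": 1}},
  {"type": "NarrativeText", "element_id": "b2", "text": "Example paragraph.",
   "metadata": {"filename": "example.pdf", "page_number": 1}},
  {"type": "NarrativeText", "element_id": "c3", "text": "Another paragraph.",
   "metadata": {"filename": "example.pdf", "page_number": 2}}
]
"""


def texts_by_page(elements):
    """Group non-empty element texts by their page number.

    Elements without a page number are collected under the key None.
    """
    pages = {}
    for element in elements:
        text = element.get("text", "").strip()
        if not text:
            continue
        page = element.get("metadata", {}).get("page_number")
        pages.setdefault(page, []).append(text)
    return pages


pages = texts_by_page(json.loads(SAMPLE))
```

Grouping by page is only one option; the same loop could filter on `type` to keep, say, narrative text and drop titles.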

### Seed and utilize vector db
```bash
python ingest_data.py
```
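
Conceptually, seeding a vector database means storing one vector per text chunk, and using it means ranking the stored vectors by similarity to a query vector. The toy below illustrates that idea in memory only. It is not what `ingest_data.py` does and it does not call Pinecone or OpenAI: `ToyIndex`, `embed`, and the sample strings are hypothetical, and the bag-of-words `embed` is a stand-in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """In-memory stand-in for a vector index (hypothetical, for illustration)."""

    def __init__(self):
        self._rows = {}

    def upsert(self, doc_id, text):
        # Insert or overwrite the entry stored under doc_id.
        self._rows[doc_id] = (embed(text), text)

    def query(self, question, top_k=1):
        # Rank stored entries by similarity to the question, best first.
        q = embed(question)
        ranked = sorted(
            self._rows.items(),
            key=lambda item: cosine(q, item[1][0]),
            reverse=True,
        )
        return [(doc_id, text) for doc_id, (_, text) in ranked[:top_k]]
```

For example, after `index.upsert("b", "penalty abatement procedures")`, a call to `index.query("penalty procedures")` ranks that entry ahead of unrelated ones.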
### Run the chat CLI
```bash
python cli_app.py
```

### Chat with our hosted instance [here](https://huggingface.co/spaces/unstructuredio/irs-manuals)