https://github.com/unstructured-io/irs-manual-demo



## Chat with IRS Manuals

This directory contains an application for chatting with IRS manuals. Once data is available, the chat application only uses self-hosted models and can be run in a disconnected environment. Here's how to get started with the chatbot:

### Installation

```bash
pip install -r requirements.txt
```

### Environment Variables

**Note: other options exist for these connections, but OpenAI and Pinecone are the ones this implementation references.**

[OpenAI](https://platform.openai.com/docs/api-reference)

[Pinecone](https://docs.pinecone.io/)

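```bash
# Replace each placeholder with your own value before running the scripts.
export PINECONE_API_KEY="<your-pinecone-api-key>"
export PINECONE_API_ENV="<your-pinecone-environment>"
export OPENAI_API_KEY="<your-openai-api-key>"
export PINECONE_INDEX_NAME="<your-pinecone-index-name>"
```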
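
Before running the later steps, it can help to confirm that all four variables are set. The following is a minimal sketch, not part of this repository: the helper name `missing_env_vars` is hypothetical, and only the variable names come from the list above.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "PINECONE_API_KEY",
    "PINECONE_API_ENV",
    "OPENAI_API_KEY",
    "PINECONE_INDEX_NAME",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: pass a mapping to check it instead of os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_env_vars({"OPENAI_API_KEY": "placeholder"}))
# → ['PINECONE_API_KEY', 'PINECONE_API_ENV', 'PINECONE_INDEX_NAME']
```

Calling `missing_env_vars()` with no argument checks the real process environment; an empty list means everything is set.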

### Download PDFs from IRS website

```bash
python download_data.py
```
![Download](./gifs/down.gif)

### Run PDFs against unstructured-ingest

```bash
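# path/to/pdfs and path/to/structured-output are placeholders; use your own directories.
# --partition-by-api is optional: it sends documents to the *NEW* API instead of processing locally.
PYTHONPATH=. ./unstructured/ingest/main.py \
  --local-input-path path/to/pdfs \
  --structured-output-dir path/to/structured-output \
  --partition-by-api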
```

![Process](./gifs/process.gif)

Here's an example of the structured JSON output:

![JSON](./gifs/sbys.gif)
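
To make the output easier to picture, here is a minimal sketch of reading it in Python. It assumes the usual shape of unstructured's JSON output: a list of elements, each with a `type`, a `text`, and a `metadata` object. The sample values and the helper names `count_types` and `narrative_text` are made up for illustration and do not come from the IRS manuals or this repository.

```python
import json
from collections import Counter

# Made-up sample shaped like a structured output file (placeholder text only).
SAMPLE = """
[
  {"type": "Title", "text": "Example heading",
   "metadata": {"filename": "example.pdf", "page_number": 1}},
  {"type": "NarrativeText", "text": "Example paragraph text.",
   "metadata": {"filename": "example.pdf", "page_number": 1}},
  {"type": "NarrativeText", "text": "Another example paragraph.",
   "metadata": {"filename": "example.pdf", "page_number": 2}}
]
"""


def count_types(elements):
    """Count how many elements of each type the document contains."""
    return Counter(element["type"] for element in elements)


def narrative_text(elements):
    """Keep only the body text, which is typically what gets embedded."""
    return [e["text"] for e in elements if e["type"] == "NarrativeText"]


elements = json.loads(SAMPLE)
print(count_types(elements))
print(narrative_text(elements))
```

For a real file, replace `json.loads(SAMPLE)` with `json.load` on a file opened from the structured output directory.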

### Seed and use the vector DB

```bash
python ingest_data.py
```
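
The contents of `ingest_data.py` are not shown here. As a conceptual illustration only, the sketch below shows the kind of nearest-neighbor lookup a vector database performs at query time. It is a toy in-memory stand-in, not the Pinecone API and not this repository's code; the vectors, chunk ids, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
index = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.7, 0.7, 0.0],
}
print(top_k([0.9, 0.1, 0.0], index, k=2))  # → ['chunk-a', 'chunk-c']
```

In a setup like this one, the stored vectors would be embeddings of document chunks, and the query vector would be the embedding of the user's question.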

### Run the chat CLI

```bash
python cli_app.py
```

![Chat](./gifs/chat.gif)

### Chat with our hosted instance [here](https://huggingface.co/spaces/unstructuredio/irs-manuals)