Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sukhbinder/talk2doc
A tool to ask questions to documents in a set of PDFs
- Host: GitHub
- URL: https://github.com/sukhbinder/talk2doc
- Owner: sukhbinder
- License: apache-2.0
- Created: 2024-08-11T11:41:03.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-19T15:12:23.000Z (5 months ago)
- Last Synced: 2024-11-21T23:53:19.730Z (2 months ago)
- Language: Python
- Size: 12.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# talk2doc
A tool for asking questions about the documents in a set of PDFs.

It works with Ollama.
## Installation
To install, run:
```bash
pip install talk2doc
```
You may need to adjust the installed version of `langchain_community` to match the version the library actually requires.

## Usage
Run:
```bash
talk2doc [model_name] [-p PDF_FILES] [-s CHUNK_SIZE] [-o CHUNK_OVERLAP] [-k TOP_K]
```

Replace `[model_name]` with the name of an LLM (e.g., "mistral", "gemma").
Replace `PDF_FILES` with a list of paths to PDF files.

Optional arguments:
* `-s CHUNK_SIZE`: Chunk size. Default: 500.
* `-o CHUNK_OVERLAP`: Chunk overlap. Default: 50.
* `-k TOP_K`: Top K docs to return. Default: 6.

Example usage:
```bash
talk2doc mistral -p /path/to/pdfs.pdf -s 1000 -o 75 -k 10
```

This will use the "mistral" model, load PDF files from `/path/to/pdfs.pdf`, split each document into chunks of size 1000 with overlap 75, and return up to 10 top matches for each question.
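
To make the roles of `CHUNK_SIZE`, `CHUNK_OVERLAP`, and `TOP_K` more concrete, here is a minimal sketch in plain Python. It is not talk2doc's actual implementation: the function names `chunk_text` and `top_k_chunks` are hypothetical, the splitter works on raw characters, and the ranking uses a toy word-overlap score rather than whatever embedding-based retrieval the tool performs. It only illustrates how overlapping chunks are cut and how a top-K cutoff limits what is returned.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration; defaults mirror the CLI defaults above.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def top_k_chunks(question: str, chunks: List[str], k: int = 6) -> List[str]:
    """Return up to k chunks, ranked by how many question words they share.

    Assumption: word overlap stands in for a real similarity search.
    """
    question_words = set(question.lower().split())

    def score(chunk: str) -> int:
        return len(question_words & set(chunk.lower().split()))

    return sorted(chunks, key=score, reverse=True)[:k]


if __name__ == "__main__":
    # Same numbers as the example command: -s 1000 -o 75
    pieces = chunk_text("x" * 2000, chunk_size=1000, chunk_overlap=75)
    print([len(p) for p in pieces])
```

With a chunk size of 1000 and an overlap of 75, each window starts 925 characters after the previous one, so consecutive chunks share 75 characters. A larger overlap reduces the chance that a sentence relevant to a question is cut in half, at the cost of more chunks to index.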