https://github.com/sukhbinder/talk2doc

A tool for asking questions about the contents of a set of PDFs

Host: GitHub
URL: https://github.com/sukhbinder/talk2doc
Owner: sukhbinder
License: apache-2.0
Created: 2024-08-11T11:41:03.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-08-19T15:12:23.000Z (11 months ago)
Last Synced: 2025-01-22T14:41:40.651Z (6 months ago)
Language: Python
Size: 12.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# talk2doc

A tool for asking questions about the contents of a set of PDFs.

It works with models served by Ollama.

## Installation

To install, run:

```bash
pip install talk2doc
```

You may need to adjust the installed version of `langchain_community` if it conflicts with the version the library requires.

## Usage

Run:

```bash
talk2doc [model_name] [-p PDF_FILES] [-s CHUNK_SIZE] [-o CHUNK_OVERLAP] [-k TOP_K]
```

Replace `[model_name]` with the name of an LLM (e.g., "mistral" or "gemma").

Replace `PDF_FILES` with a list of paths to PDF files.

Optional arguments:

* `-s CHUNK_SIZE`: Size of each text chunk. Default: 500.

* `-o CHUNK_OVERLAP`: Overlap between consecutive chunks. Default: 50.

* `-k TOP_K`: Number of top-matching documents to return. Default: 6.
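The interface above can be sketched with `argparse`. This is a hypothetical reconstruction for illustration, not the tool's actual parser: the `build_parser` function, the `dest` names, and the choice to accept several paths after `-p` are assumptions. The flags and defaults are the ones listed above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented interface;
    # the real talk2doc parser may be organized differently.
    parser = argparse.ArgumentParser(prog="talk2doc")
    parser.add_argument("model_name", help="Name of the model, e.g. mistral")
    # Assumption: -p accepts one or more space-separated paths.
    parser.add_argument("-p", dest="pdf_files", nargs="+", default=[],
                        help="Paths to PDF files")
    parser.add_argument("-s", dest="chunk_size", type=int, default=500,
                        help="Chunk size")
    parser.add_argument("-o", dest="chunk_overlap", type=int, default=50,
                        help="Chunk overlap")
    parser.add_argument("-k", dest="top_k", type=int, default=6,
                        help="Top K docs to return")
    return parser


# Parse a sample argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["mistral", "-p", "a.pdf", "b.pdf"])
print(args.model_name, args.pdf_files, args.chunk_size, args.chunk_overlap, args.top_k)
```

With only the model name and two PDF paths given, the three optional values fall back to 500, 50, and 6.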

Example usage:

```bash
talk2doc mistral -p /path/to/pdfs.pdf -s 1000 -o 75 -k 10
```

This uses the "mistral" model, loads the PDF at `/path/to/pdfs.pdf`, splits each document into chunks of size 1000 with an overlap of 75, and returns up to 10 top matches for each question.
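To make the chunk size, overlap, and top-K settings concrete, here is a minimal plain-Python sketch. It is not how talk2doc is implemented. It assumes sizes are counted in characters (the README does not state the unit), and it ranks chunks by shared words as a toy stand-in for whatever similarity search the tool actually performs. The function names `split_text` and `top_k_chunks` are hypothetical.

```python
import heapq


def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def top_k_chunks(question: str, chunks: list[str], k: int = 6) -> list[str]:
    """Return up to k chunks ranked by a toy score: words shared with the question."""
    question_words = set(question.lower().split())

    def score(chunk: str) -> int:
        return len(question_words & set(chunk.lower().split()))

    return heapq.nlargest(k, chunks, key=score)


# A 2500-character text with the example settings: size 1000, overlap 75.
chunks = split_text("x" * 2500, chunk_size=1000, chunk_overlap=75)
print(len(chunks), [len(c) for c in chunks])  # 3 [1000, 1000, 650]

docs = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
print(top_k_chunks("cat dog", docs, k=2))  # ['a cat and a dog', 'the cat sat']
```

Each window starts 925 characters after the previous one (1000 minus 75), so neighboring chunks share 75 characters. A larger overlap reduces the chance that a relevant sentence is cut at a chunk boundary, at the cost of more chunks to search.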
