https://github.com/terror/pdfathom
Query PDFs in natural language from the command-line
- Host: GitHub
- URL: https://github.com/terror/pdfathom
- Owner: terror
- License: cc0-1.0
- Created: 2023-04-19T20:01:15.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-04-20T17:17:25.000Z (over 2 years ago)
- Last Synced: 2025-09-24T03:38:36.627Z (4 months ago)
- Language: Python
- Size: 7.51 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING
- License: LICENSE
README
**pdfathom** is a command-line utility that lets you query PDF documents with
natural language.

### Installation
You can install it via the pip package manager:
```bash
$ pip install pdfathom
```
### Configuration
By default, **pdfathom** looks for a configuration file named `.pdfathom.json`
in your home directory (the `--config` option points it at a different file).
The file looks like this:
```json
{"openai_api_key": ""}
```
If the configuration file doesn't contain an OpenAI API key, you will be
prompted for one when you run the program. This also creates the configuration
file for you if it doesn't exist yet.
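For illustration, the lookup-and-prompt behaviour described above could be sketched in Python roughly as follows. This is not pdfathom's actual code: the `load_api_key` function and its `prompt` parameter are hypothetical, and the sketch assumes the file holds a single JSON object like the one shown.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical sketch; pdfathom's real implementation may differ.
DEFAULT_CONFIG = Path.home() / ".pdfathom.json"


def load_api_key(
    config_path: Path = DEFAULT_CONFIG,
    prompt: Callable[[str], str] = input,
) -> str:
    """Return the OpenAI API key, asking for it and saving it if it is missing."""
    config: dict = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())
    key = config.get("openai_api_key", "")
    if not key:
        # No usable key on disk: prompt the user, then create or update the file.
        key = prompt("OpenAI API key: ").strip()
        config["openai_api_key"] = key
        config_path.write_text(json.dumps(config))
    return key
```

Passing `prompt` as a parameter keeps the function testable without a terminal; on a second run the saved key is returned without prompting.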
### Usage
Below is the output of `pdfathom --help`:
```
usage: pdfathom [-h] [--config CONFIG] [--openai_api_key OPENAI_API_KEY]
                [--chunk_size CHUNK_SIZE] [--chunk_overlap CHUNK_OVERLAP]
                pdfs [pdfs ...]

positional arguments:
  pdfs                  Path to the pdf file(s) or URL(s)

options:
  -h, --help            show this help message and exit
  --config CONFIG, -c CONFIG
                        Path to the configuration file
  --openai_api_key OPENAI_API_KEY, -k OPENAI_API_KEY
                        OpenAI API key
  --chunk_size CHUNK_SIZE, -s CHUNK_SIZE
                        Chunk size
  --chunk_overlap CHUNK_OVERLAP, -o CHUNK_OVERLAP
                        Chunk overlap
```
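The help text describes `--chunk_size` and `--chunk_overlap` only briefly. Tools of this kind commonly split the extracted PDF text into overlapping chunks before querying it; the hypothetical `chunk_text` function below sketches that idea, assuming sizes are measured in characters. pdfathom's actual splitter, units, and defaults are not documented here.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunk_size-character windows; consecutive windows
    share chunk_overlap characters (a hypothetical character-based splitter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", 4, 2)` yields `["abcd", "cdef", "efgh", "ghij"]`: each chunk repeats the last two characters of the previous one, so text near a boundary appears in two neighbouring chunks.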
A sample run looks like `pdfathom a.pdf b.pdf https://someurl.com/baz.pdf`,
which loads the listed PDFs (local paths or URLs, separated by spaces) into an
interactive REPL, assuming those files exist.
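As a rough illustration, the interface from the help output above can be reconstructed with Python's `argparse`. This is a hypothetical sketch rather than pdfathom's own parser: `build_parser` leaves every default as `None` because the help text shows none, and the `is_url` helper, which treats `http`/`https` sources as URLs and everything else as a local path, is an assumption about how sources are told apart.

```python
from __future__ import annotations

import argparse
from urllib.parse import urlparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags shown in `pdfathom --help`; defaults are assumptions.
    parser = argparse.ArgumentParser(prog="pdfathom")
    parser.add_argument("pdfs", nargs="+", help="Path to the pdf file(s) or URL(s)")
    parser.add_argument("--config", "-c", help="Path to the configuration file")
    parser.add_argument("--openai_api_key", "-k", help="OpenAI API key")
    parser.add_argument("--chunk_size", "-s", type=int, help="Chunk size")
    parser.add_argument("--chunk_overlap", "-o", type=int, help="Chunk overlap")
    return parser


def is_url(source: str) -> bool:
    """Hypothetical rule: http(s) sources are URLs, anything else is a path."""
    return urlparse(source).scheme in ("http", "https")
```

Parsing `["a.pdf", "b.pdf", "https://someurl.com/baz.pdf", "-s", "500"]` gives `pdfs` as the three sources and `chunk_size` as `500`, and `is_url` marks only the last source as a URL.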
Inside the REPL, a few commands make it easier to load documents and switch
between them:
```
- active: Prints the active PDF document.
- clear: Clears the terminal screen.
- exit: Exits the application.
- help: Displays the help text with available commands.
- list: Lists all loaded PDF documents.
- load : Loads a new PDF document from a specified path or URL.
- switch : Switches to another PDF document from a specified path or URL.
```