Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tddschn/langchain-utils
LangChain Utilities for prompt generation from documents, URLs, and arbitrary files - streamlining your interactive workflow with LLMs!
- Host: GitHub
- URL: https://github.com/tddschn/langchain-utils
- Owner: tddschn
- Created: 2023-04-09T04:40:52.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-06-05T17:20:20.000Z (7 months ago)
- Last Synced: 2024-12-25T03:11:51.099Z (4 days ago)
- Topics: chatgpt, cli, langchain, llm, pandoc, prompt, python3, utility
- Language: Python
- Homepage: https://pypi.org/project/langchain-utils
- Size: 745 KB
- Stars: 9
- Watchers: 3
- Forks: 2
- Open Issues: 4
Metadata Files:
- Readme: README.md
# langchain-utils
LangChain Utilities
- [langchain-utils](#langchain-utils)
- [Prompt generation using LangChain document loaders](#prompt-generation-using-langchain-document-loaders)
- [Demos](#demos)
- [`pandocprompt`](#pandocprompt)
- [`urlprompt`](#urlprompt)
- [`pdfprompt`](#pdfprompt)
- [`ytprompt`](#ytprompt)
- [`textprompt`](#textprompt)
- [`htmlprompt`](#htmlprompt)
- [Installation](#installation)
- [pipx](#pipx)
- [pip](#pip)
- [Develop](#develop)

## Prompt generation using LangChain document loaders
Do you find yourself frequently copy-pasting texts from the web / PDFs / other documents into ChatGPT?
If yes, these tools are for you!
The prompts are optimized to be fed manually into a chat interface (like ChatGPT), either in one go or in multiple parts to get around context length limits.
The generated prompts look like this:
```python
REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:"""
{content}
"""
'''.strip()
```
You can feed it directly to a chat interface like ChatGPT, and ask follow-up questions about it.
See [`prompts.py`](./langchain_utils/prompts.py) for other variations.
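As a concrete illustration, here is a minimal sketch of filling in that template with Python's `str.format` (the `what` and `content` values are made up; the package's own code handles this for you):

```python
# Template as shown above (from prompts.py).
REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:"""
{content}
"""
'''.strip()

# Fill in the placeholders to produce a paste-ready prompt.
prompt = REPLY_OK_IF_YOU_READ_TEMPLATE.format(
    what='the content of a webpage',
    content='(the extracted text goes here)',
)
print(prompt)
```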
### Demos
- Loading `https://github.com/tddschn/langchain-utils` and copying the prompt to the clipboard.
- Loading 3 pages of a PDF file, opening each part for inspection before copying, and optionally merging the 3 pages into 2 prompts that stay within `gpt-3.5-turbo`'s context length limit, using LangChain's `TokenTextSplitter` (a command sketch follows this list).
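In command form, the second demo corresponds roughly to the following (a sketch; `paper.pdf` is a placeholder path, and the flags are documented in the `pdfprompt` help below):

```
$ pdfprompt -p 1 2 3 -M -e -m gpt-3.5-turbo paper.pdf
```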
### `pandocprompt`
```
$ pandocprompt --help
usage: pandocprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT] [-C]
[-w WHAT] [-M] [--from PANDOC_FROM_FORMAT]
[--to PANDOC_TO_FORMAT]
[PATH ...]

Get prompts from arbitrary files. You need to have `pandoc` installed and in
$PATH; it will be used to convert source files to the desired (hopefully
textual) format. Common use cases: getting prompts from EPub books or several
TeX files.

positional arguments:
PATH Paths to the text files, or stdin if not provided
(default: None)

options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processed list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-C, --from-clipboard Load text from clipboard (default: False)
-w WHAT, --what WHAT Initial knowledge you want to insert before the document
content in the prompt (default: the content of a
document)
-M, --merge Merge contents of all pages before processing
(default: False)
--from PANDOC_FROM_FORMAT
The format that is passed to -f in pandoc (default:
None)
--to PANDOC_TO_FORMAT
The format that is passed to -t in pandoc. gfm-
raw_html means GitHub Flavored Markdown with raw HTML
stripped. (default: gfm-raw_html)
```
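For example, to convert an EPub book with pandoc and copy the resulting prompt to the clipboard (`book.epub` is a placeholder path):

```
$ pandocprompt -c book.epub
```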
### `urlprompt`

```
$ urlprompt --help
usage: urlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT] [-w WHAT]
[-M] [-j] [-g] [--github-path GITHUB_PATH]
[--github-revision GITHUB_REVISION] [--substack]
URL

Get a prompt consisting of the text content of a webpage
positional arguments:
URL URL to the webpage

options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processed list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-w WHAT, --what WHAT Initial knowledge you want to insert before the webpage
content in the prompt (default: the content of a
webpage)
-M, --merge Merge contents of all pages before processing
(default: False)
-j, --javascript Use JavaScript to render the page (default: False)
-g, --github Load the raw file from a GitHub URL (default: False)
--github-path GITHUB_PATH
Path to the GitHub file (default: README.md)
--github-revision GITHUB_REVISION
Revision for the GitHub file (default: master)
--substack Load from a Substack URL and convert it to Markdown
(default: False)
```
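For example, one way to reproduce the first demo is to load the raw `README.md` of a GitHub repository with `-g` and copy the prompt to the clipboard:

```
$ urlprompt -c -g https://github.com/tddschn/langchain-utils
```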
### `pdfprompt`

```
$ pdfprompt --help
usage: pdfprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT]
[-p PAGES [PAGES ...]] [-l PAGE_SLICE] [-M] [-w WHAT] [-o]
[-O] [-L OCR_LANGUAGE]
PDF Path

Get a prompt consisting of the text content of a PDF file
positional arguments:
PDF Path Path to the PDF file

options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processed list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-p PAGES [PAGES ...], --pages PAGES [PAGES ...]
Only include specified page numbers (default: None)
-l PAGE_SLICE, --page-slice PAGE_SLICE
Use Python slice syntax to select page numbers (e.g.
1:3, 1:10:2, etc.) (default: None)
-M, --merge Merge contents of all pages before processing
(default: False)
-w WHAT, --what WHAT Initial knowledge you want to insert before the PDF
content in the prompt (default: the content of a PDF
file)
-o, --fallback-ocr Use OCR as fallback if no text detected on page,
please set TESSDATA_PREFIX environment variable to the
path of your tesseract data directory (default: False)
-O, --force-ocr Force OCR on all pages (default: False)
-L OCR_LANGUAGE, --ocr-language OCR_LANGUAGE
Language to use for Tesseract OCR (like eng, chi_sim,
chi_tra, chi_tra_vert, etc.) (default: eng)
```
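For example, to force OCR on a scanned PDF in Simplified Chinese and copy the result (`scanned.pdf` is a placeholder path; per the `-o`/`-O` notes above, Tesseract data must be reachable via `TESSDATA_PREFIX`):

```
$ pdfprompt -O -L chi_sim -c scanned.pdf
```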
### `ytprompt`

```
$ ytprompt --help
usage: ytprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT]
YouTube URL

Get a prompt consisting of the Title and Transcript of a YouTube Video
positional arguments:
YouTube URL YouTube URL

options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processed list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
```
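For example, to turn a video's title and transcript into a prompt and copy it to the clipboard (the URL is a placeholder):

```
$ ytprompt -c 'https://www.youtube.com/watch?v=VIDEO_ID'
```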
### `textprompt`

```
$ textprompt --help
usage: textprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT] [-C]
[-w WHAT] [-M]
[PATH ...]

Get a prompt from text files
positional arguments:
PATH Paths to the text files, or stdin if not provided
(default: None)

options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processed list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-C, --from-clipboard Load text from clipboard (default: False)
-w WHAT, --what WHAT Initial knowledge you want to insert before the document
content in the prompt (default: the content of a
document)
-M, --merge Merge contents of all pages before processing
(default: False)
```
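For example, to build a prompt from the text currently on the clipboard and copy the result back to it:

```
$ textprompt -C -c
```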
### `htmlprompt`

```
$ htmlprompt --help
usage: htmlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
[-P PARTS [PARTS ...]] [-r] [-R]
[--print-percentage-non-ascii] [-n] [--out OUT] [-C]
[-w WHAT] [-M]
[PATH ...]

Get a prompt from html files
positional arguments:
PATH Paths to the html files, or stdin if not provided
(default: None)

options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-c, --copy Copy the prompt to clipboard (default: False)
-e, --edit Edit the prompt and copy manually (default: False)
-m model, --model model
Model to use. This only affects the chunk size. Use -S
to disable splitting (infinite chunk size). (default:
gpt-4-32k)
-S, --no-split Do not split the prompt into multiple parts (use this
if the model has a really large context size)
(default: False)
-s chunk_size, --chunk-size chunk_size
Chunk size when splitting transcript, also used to
determine whether to split, defaults to 1/2 of the
context length limit of the model (default: None)
-P PARTS [PARTS ...], --parts PARTS [PARTS ...]
Parts to select in the processed list of Documents
(default: None)
-r, --raw Wraps the content in triple quotes with no extra text
(default: False)
-R, --raw-no-quotes Output the content only (default: False)
--print-percentage-non-ascii
Print percentage of non-ascii characters (default:
False)
-n, --dry-run Dry run (default: False)
--out OUT Output file (default: None)
-C, --from-clipboard Load text from clipboard (default: False)
-w WHAT, --what WHAT Initial knowledge you want to insert before the HTML
content in the prompt (default: the text content of a
html file)
-M, --merge Merge contents of all pages before processing
(default: False)
```
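For example, to extract the text of a saved HTML page into a prompt and copy it to the clipboard (`page.html` is a placeholder path):

```
$ htmlprompt -c page.html
```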
## Installation
### pipx
This is the recommended installation method.
```
$ pipx install langchain-utils
```

### [pip](https://pypi.org/project/langchain-utils/)
```
$ pip install langchain-utils
```

## Develop
```
$ git clone https://github.com/tddschn/langchain-utils.git
$ cd langchain-utils
$ poetry install
```