Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/namuan/dr-doc-search
Converse with book - Built with GPT-3
gpt3 langchain nlp-machine-learning python summarization
- Host: GitHub
- URL: https://github.com/namuan/dr-doc-search
- Owner: namuan
- License: mit
- Created: 2023-01-03T22:31:17.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-01T00:11:26.000Z (4 months ago)
- Last Synced: 2025-01-24T06:05:41.496Z (11 days ago)
- Topics: gpt3, langchain, nlp-machine-learning, python, summarization
- Language: Python
- Homepage:
- Size: 28.8 MB
- Stars: 603
- Watchers: 10
- Forks: 60
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
Awesome Lists containing this project
- awesome-langchain-zh - Doc Search: Converse with a book - Built with GPT-3 (Open Source Projects / Other Chatbots)
- awesome-ChatGPT-repositories - dr-doc-search - Converse with book - Built with GPT-3 (Langchain)
- awesome-generative-ai - TalkToPapers - namuan/dr-doc-search: Converse with book - Built with GPT-3
- awesome-langchain - Doc Search - Built with GPT-3 ![GitHub Repo stars](https://img.shields.io/github/stars/namuan/dr-doc-search?style=social) (Open Source Projects / Other / Chatbots)
README
# Doc Search
[![PyPI](https://img.shields.io/pypi/v/dr-doc-search?style=flat-square)](https://pypi.python.org/pypi/dr-doc-search/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dr-doc-search?style=flat-square)](https://pypi.python.org/pypi/dr-doc-search/)
[![PyPI - License](https://img.shields.io/pypi/l/dr-doc-search?style=flat-square)](https://pypi.python.org/pypi/dr-doc-search/)

Converse with a book (PDF)
![](assets/dr-doc-search-github-demo.gif)
See [tweet](https://twitter.com/deskriders_twt/status/1612088387984588802) for full demo.
---
**Documentation**: [https://namuan.github.io/dr-doc-search](https://namuan.github.io/dr-doc-search)
**Source Code**: [https://github.com/namuan/dr-doc-search](https://github.com/namuan/dr-doc-search)
**PyPI**: [https://pypi.org/project/dr-doc-search/](https://pypi.org/project/dr-doc-search/)
---
## Pre-requisites
- [Tesseract OCR](https://github.com/tesseract-ocr/tesseract)
- [ImageMagick](https://imagemagick.org/index.php)

> **Note:**
> If you are using Windows, make sure that you set the location
> of the ImageMagick executable in the `IMCONV` environment variable.

```shell
# For example, if you have installed ImageMagick in PROGRAMFILES\ImageMagick-7.1.0-Q16-HDRI
set IMCONV="%PROGRAMFILES%\ImageMagick-7.1.0-Q16-HDRI\magick"
```

## Installation
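
Before installing, it may help to confirm that the pre-requisites listed above can be found. The following is a minimal sketch and not part of dr-doc-search. It assumes the executables are named `tesseract` and `magick` (or `convert` for older ImageMagick releases), and that `IMCONV`, when set, holds the ImageMagick path. The function name `find_prerequisites` is hypothetical, and how the tool itself locates these programs is not described here.

```python
import os
import shutil


def find_prerequisites(env=None):
    """Return a dict mapping each prerequisite to its resolved path, or None if missing."""
    env = os.environ if env is None else env
    search_path = env.get("PATH")

    # Assumption: Tesseract's command-line program is called `tesseract`.
    tesseract = shutil.which("tesseract", path=search_path)

    # IMCONV takes priority when set; surrounding quotes left by
    # `set IMCONV="..."` on Windows are stripped.
    imconv = env.get("IMCONV", "").strip('"')

    # Assumption: ImageMagick 7 ships `magick`, older releases ship `convert`.
    # On Windows, `convert` may resolve to an unrelated system tool.
    imagemagick = (
        imconv
        or shutil.which("magick", path=search_path)
        or shutil.which("convert", path=search_path)
    )
    return {"tesseract": tesseract, "imagemagick": imagemagick or None}


if __name__ == "__main__":
    for name, location in find_prerequisites().items():
        print(f"{name}: {location or 'NOT FOUND'}")
```

If either entry prints `NOT FOUND`, install the missing program or adjust `PATH` (or `IMCONV` on Windows) before continuing.
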
```sh
pip install dr-doc-search
```

## Example Usage
There are two steps to use this application:
**1.** First, you need to create the index and generate embeddings for the PDF file.
Here I'm using a PDF file generated from this page: [Parable of a Monetary Economy](http://heteconomist.com/parable-of-a-monetary-economy/).

Before running this, you need to set up your OpenAI API key. You can get it from [OpenAI](https://beta.openai.com/account/api-keys).
> From version 1.5.0, you can skip OpenAI and use HuggingFace models to generate embeddings and answers.
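
A small pre-flight check can confirm that the key is visible in the environment before a run starts. This is only a sketch: the function `missing_settings` and its `use_openai` parameter are hypothetical names chosen for illustration, not part of dr-doc-search.

```python
import os


def missing_settings(env=None, use_openai=True):
    """Return the names of required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    # Assumption: only OPENAI_API_KEY is needed, and only when OpenAI is used.
    required = ["OPENAI_API_KEY"] if use_openai else []
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Set these variables first: " + ", ".join(missing))
    else:
        print("Environment looks ready.")
```

Passing `use_openai=False` skips the check, which matches the case where HuggingFace models are used instead.
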
```shell
export OPENAI_API_KEY=
```

Then run the following command to start the training process:
```shell
dr-doc-search --train -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
```

Use `huggingface` for generating embeddings:
```shell
dr-doc-search --train -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --embedding huggingface
```

The training process generates some temporary files in the `OutputDir/dr-doc-search/` folder under your home directory.
Here is what it looks like:

```text
~/OutputDir/dr-doc-search/parable-of-a-monetary-economy-heteconomist
$ tree
.
├── images
│   ├── output-1.png
│   ├── output-10.png
│   ├── output-11.png
...
│   └── output-9.png
├── index
│   ├── docsearch.index
│   └── index.pkl
├── parable-of-a-monetary-economy-heteconomist.pdf
└── scanned
    ├── output-1.txt
    ...
    └── output-9.txt
```

> **Note:**
> It is possible to change the base of the output directory by providing the `--app-dir` argument.

**2.** Now that we have the index, we can use it to start asking questions.
```shell
dr-doc-search -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --input-question "How did the attempt to reduce the debt result in a decrease in employment?"
```

Or you can open a web interface (on port 5006) to ask questions:
```shell
dr-doc-search --web-app -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
```

To use a `huggingface` model, provide the `--llm` argument:
```shell
dr-doc-search --web-app -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --llm huggingface
```

There are more options, for example to choose the start and end pages of the PDF file.
See the help for more details:

```shell
dr-doc-search --help
```

## Acknowledgements
- [anton/@abacaj](https://twitter.com/abacaj/status/1608163940726358024) for the idea
- [LangChain](https://github.com/hwchase17/langchain)
- [HoloViz Panel](https://panel.holoviz.org/)
- [OpenAI](https://beta.openai.com/)

## Development
* Clone this repository
* Requirements:
  * Python 3.7+
  * [Poetry](https://python-poetry.org/)
* Create a virtual environment and install the dependencies
```sh
poetry install
```
* Activate the virtual environment
```sh
poetry shell
```

### Validating build
```sh
make build
```

### Release process
A release is automatically published when a new version is bumped using `make bump`.
See `.github/workflows/build.yml` for more details.
Once the release is published, `.github/workflows/publish.yml` will automatically publish it to PyPI.

### Disclaimer
This project is not affiliated with OpenAI.
The OpenAI API and GPT-3 language model are not free after the trial period.