Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mayooear/gpt4-pdf-chatbot-langchain
GPT4 & LangChain Chatbot for large PDF docs
GPT4 & LangChain Chatbot for large PDF docs
- Host: GitHub
- URL: https://github.com/mayooear/gpt4-pdf-chatbot-langchain
- Owner: mayooear
- Created: 2023-03-17T01:23:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-29T01:25:10.000Z (5 months ago)
- Last Synced: 2024-12-02T17:07:46.797Z (11 days ago)
- Topics: gpt4, langchain, nextjs, openai, pdf, typescript
- Language: TypeScript
- Homepage: https://www.youtube.com/watch?v=ih9PBGVVOO4
- Size: 5.21 MB
- Stars: 14,948
- Watchers: 150
- Forks: 3,019
- Open Issues: 31
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome - mayooear/gpt4-pdf-chatbot-langchain - GPT4 & LangChain Chatbot for large PDF docs (TypeScript)
- awesome-chatgpt - gpt4-pdf-chatbot-langchain - Chatbot for large PDF files. (Bots / Examples)
- allinchatgpt - gpt4-pdf-chatbot-langchain - Use the new GPT-4 API to build a ChatGPT chatbot for multiple large PDF files. (Uncategorized / Uncategorized)
- awesome-ChatGPT-repositories - gpt4-pdf-chatbot-langchain - GPT4 & LangChain Chatbot for large PDF docs (Chatbots)
- awesome-gpt4 - gpt4-pdf-chatbot-langchain - GPT4 & LangChain Chatbot for large PDF docs. (Tools / Open-source projects)
- chatgpt-awesome - gpt4-pdf-chatbot-langchain
- stars - mayooear/gpt4-pdf-chatbot-langchain - GPT4 & LangChain Chatbot for large PDF docs (TypeScript)
- awesome-gpt4-zh-CN - gpt4-pdf-chatbot-langchain - GPT4 & LangChain chatbot for large PDF files. (Tools / Open-source projects)
- StarryDivineSky - mayooear/gpt4-pdf-chatbot-langchain
- awesome-gpt - gpt4-pdf-chatbot-langchain - GPT4 & LangChain Chatbot for large PDF docs. (Applications and Demos / LLM (Large Language Model))
- awesome-gpt - gpt4-pdf-chatbot-langchain
- awesome-open-gpt - gpt4-pdf-chatbot-langchain - Build a ChatGPT chatbot for multiple large PDF files with the GPT-4 API. (Curated open-source projects / GPT tools)
- AiTreasureBox - mayooear/gpt4-pdf-chatbot-langchain - GPT4 & LangChain Chatbot for large PDF docs (Repos)
- Awesome-ChatGPT - gpt4-pdf-chatbot-langchain - Build a ChatGPT chatbot for multiple large PDF files with the GPT-4 API. (Curated open-source projects / GPT tools)
README
# GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Files
Use the GPT-4 API to build a ChatGPT-style chatbot for multiple large PDF files.
The tech stack includes LangChain, Pinecone, TypeScript, OpenAI, and Next.js. LangChain is a framework that makes it easier to build scalable AI/LLM apps and chatbots. Pinecone is a vector store that holds the embeddings and the extracted text of your PDFs so that similar passages can be retrieved later.
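To make "retrieve similar docs" concrete, here is a minimal in-memory sketch of similarity search over embeddings. It is not how Pinecone is implemented and not code from this repo: the `StoredChunk` type and the `cosineSimilarity` and `topK` helpers are hypothetical names, cosine similarity is assumed as the metric, and the vectors are toy 3-dimensional values rather than real embeddings.

```typescript
// Hypothetical shape for a stored chunk: its text plus its embedding vector.
interface StoredChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query embedding.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy data: made-up texts and vectors, for illustration only.
const chunks: StoredChunk[] = [
  { text: "refund policy", embedding: [1, 0, 0] },
  { text: "shipping times", embedding: [0, 1, 0] },
  { text: "returns and refunds", embedding: [0.9, 0.1, 0] },
];
const results = topK([1, 0, 0], chunks, 2);
console.log(results.map((c) => c.text));
```

In the real app the question is embedded first and the vector store performs this ranking server-side; the sketch only shows why passages with nearby vectors come back first.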
[Tutorial video](https://www.youtube.com/watch?v=ih9PBGVVOO4)
[Join the discord if you have questions](https://discord.gg/E4Mc77qwjm)
The visual guide of this repo and tutorial is in the `visual guide` folder.
**If you run into errors, please review the troubleshooting section further down this page.**
Prerequisite: make sure Node.js is installed on your system and that the version is 18 or greater.
## Development
1. Clone the repo or download the ZIP
```
git clone [github https url]
```

2. Install packages
First run `npm install yarn -g` to install yarn globally (if you haven't already).
Then run:
```
yarn install
```

After installation, you should now see a `node_modules` folder.
3. Set up your `.env` file
- Copy `.env.example` into `.env`
Your `.env` file should look like this:

```
OPENAI_API_KEY=
PINECONE_API_KEY=
PINECONE_ENVIRONMENT=
PINECONE_INDEX_NAME=
```
- Visit [openai](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key) to retrieve API keys and insert into your `.env` file.
- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys, and also retrieve your environment and index name from the dashboard.

4. In the `config` folder, replace the `PINECONE_NAME_SPACE` with a `namespace` where you'd like to store your embeddings on Pinecone when you run `npm run ingest`. This namespace will later be used for queries and retrieval.
5. In `utils/makechain.ts`, change the `QA_PROMPT` for your own use case. Change `modelName` in `new OpenAI` to `gpt-4` if you have access to the `gpt-4` API. Verify outside this repo that you have access to the `gpt-4` API; otherwise the application will not work.
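Since several of the troubleshooting tips come down to missing or empty environment variables, a small startup check can surface them early. The following is a sketch, not code from this repo: `findMissingEnvKeys` and `assertEnv` are hypothetical helpers, and the only names taken from the setup steps are the four variables in the `.env` template.

```typescript
// The four variable names from the `.env` template in step 3.
const REQUIRED_ENV_KEYS = [
  "OPENAI_API_KEY",
  "PINECONE_API_KEY",
  "PINECONE_ENVIRONMENT",
  "PINECONE_INDEX_NAME",
] as const;

type Env = Record<string, string | undefined>;

// Return the names of required variables that are missing or blank.
function findMissingEnvKeys(env: Env): string[] {
  return REQUIRED_ENV_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Throw a single error listing every missing key, rather than failing later mid-request.
function assertEnv(env: Env): void {
  const missing = findMissingEnvKeys(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}

// Example with a partially filled environment (placeholder values only).
const exampleEnv: Env = {
  OPENAI_API_KEY: "sk-placeholder",
  PINECONE_API_KEY: "",
  PINECONE_INDEX_NAME: "my-index",
};
console.log(findMissingEnvKeys(exampleEnv));
```

In a Node.js script, one would call `assertEnv(process.env)` once after the `.env` file has been loaded.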
## Convert your PDF files to embeddings
**This repo can load multiple PDF files**
1. Inside the `docs` folder, add your PDF files or folders that contain PDF files.
2. Run the script `yarn run ingest` to 'ingest' and embed your docs. If you run into errors, see the Troubleshooting section below.
3. Check Pinecone dashboard to verify your namespace and vectors have been added.
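Before running the ingest script, it can help to confirm which files under `docs` would be picked up, including those in nested folders. The sketch below uses only Node's built-in `fs` and `path` modules. `findPdfFiles` is a hypothetical helper written for illustration, not the loader this repo uses, and matching on a case-insensitive `.pdf` extension is an assumption.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively collect every file ending in .pdf (case-insensitive) under `dir`.
function findPdfFiles(dir: string): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Descend into subfolders so nested PDFs are included.
      found.push(...findPdfFiles(fullPath));
    } else if (entry.isFile() && path.extname(entry.name).toLowerCase() === ".pdf") {
      found.push(fullPath);
    }
  }
  return found.sort();
}

// List what is found in the `docs` folder, if it exists in the current directory.
const docsDir = "docs";
if (fs.existsSync(docsDir)) {
  console.log(findPdfFiles(docsDir));
}
```

An empty list here suggests the files are in the wrong folder or have an unexpected extension, which is cheaper to discover before any embedding calls are made.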
## Run the app
Once you've verified that the embeddings and content have been added to your Pinecone index, run `npm run dev` to launch the local dev environment, then type a question in the chat interface.
## Troubleshooting
In general, keep an eye on the `issues` and `discussions` sections of this repo for solutions.
**General errors**
- Make sure you're running the latest Node version. Run `node -v` to check.
- Try a different PDF or convert your PDF to text first. It's possible your PDF is corrupted, scanned, or requires OCR to convert to text.
- `console.log` the `env` variables and make sure they are exposed.
- Make sure you're using the same versions of LangChain and Pinecone as this repo.
- Check that you've created an `.env` file that contains your valid (and working) API keys, environment and index name.
- If you change `modelName` in `OpenAI`, make sure you have access to the api for the appropriate model.
- Make sure you have enough OpenAI credits and a valid card on your billing account.
- Check that you don't have multiple OpenAI API keys in your global environment. If you do, the system `env` variable will take precedence over the project's local `.env` file.
- Try hard-coding your API keys into the `process.env` variables if there are still issues.

**Pinecone errors**
- Make sure the `environment` and `index` in your Pinecone dashboard match the ones in the `pinecone.ts` and `.env` files.
- Check that you've set the vector dimensions to `1536`.
- Make sure your pinecone namespace is in lowercase.
- Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter before 7 days.
- Retry from scratch with a new Pinecone project, index, and cloned repo.

## Credit
Frontend of this repo is inspired by [langchain-chat-nextjs](https://github.com/zahidkhawaja/langchain-chat-nextjs)