https://github.com/natserract/natserract-ai
Using Doc2Vec, Langchain and OpenAI to chat with Natserract blog https://engineering-natserract.vercel.app/
chatbot doc2vec gpt4 langchain natserract openai pgvector qatool
- Host: GitHub
- URL: https://github.com/natserract/natserract-ai
- Owner: natserract
- Created: 2023-11-09T02:52:11.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2023-11-16T08:38:37.000Z (about 1 year ago)
- Last Synced: 2024-04-28T06:27:30.218Z (8 months ago)
- Topics: chatbot, doc2vec, gpt4, langchain, natserract, openai, pgvector, qatool
- Language: Python
- Homepage:
- Size: 7.79 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Natserract AI
This repository contains code that demonstrates how to build an AI assistant using Langchain, integrating GPT-4 from OpenAI. The assistant can handle question answering (QA), provides various tools, and uses similarity search with a Doc2Vec approach to answer user queries based on the provided documents.

Throughout this journey, I use PostgreSQL as the main database and the PGVector extension to store the embeddings.
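To make the storage step more concrete, here is a minimal sketch of how an embedding might be serialized for PGVector and how a cosine-distance query could be assembled. It is not the repository's code, which may rely on Langchain's own PGVector integration instead. The table name `documents`, the columns `content` and `embedding`, and both helper functions are hypothetical. The `<=>` operator is pgvector's cosine distance operator, and the `%s` placeholders assume a psycopg-style driver.

```python
from typing import Sequence, Tuple

# Hypothetical schema: a `documents` table with a text column `content`
# and a pgvector column `embedding`. `<=>` is pgvector's cosine distance.
SIMILARITY_SQL = (
    "SELECT content, embedding <=> %s::vector AS distance "
    "FROM documents ORDER BY distance LIMIT %s"
)


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Serialize a vector into pgvector's text form, e.g. '[0.25,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def similarity_query(query_vec: Sequence[float], limit: int = 5) -> Tuple[str, tuple]:
    """Return the SQL text and its parameters for a nearest-neighbour lookup."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return SIMILARITY_SQL, (to_pgvector_literal(query_vec), limit)


# Build the query for a toy 2-dimensional embedding.
sql, params = similarity_query([0.5, 0.5], limit=3)
```

The returned pair could then be passed to a database cursor, for example `cursor.execute(sql, params)`, so that the vector is sent as a bound parameter rather than interpolated into the SQL string.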
## Setup
Before running the script, you need to set up the required credentials and install the necessary libraries.
### Install Required Libraries
You can install the required libraries using poetry. Run the following command in your terminal or command prompt:
```sh
poetry install
```
### Install Spacy
```sh
python -m spacy download en_core_web_sm
```
### Setup API Keys
The script uses an OpenAI API key. Set it as the `OPENAI_API_KEY` environment variable on your system, using your actual key as the value.
### Setup Database
- Postgres 15
- Enable the pgvector extension (`vector`):
```sql
CREATE EXTENSION vector;
```
## Running
```sh
poetry shell
poetry run python main.py
```
## Process
![](process.png)
## Demo
https://github.com/natserract/natserract-ai/assets/31182611/8b4be411-1fd4-4c9f-89cc-eb25670f7ead
## Custom Datasets
Create a `_datasets` directory and place all of your Markdown documents in it.
## Performance Considerations
If you need to run similarity searches frequently, especially over a large set of word vectors, it may be practical to use a database or data store optimized for vector operations. These data stores can persist your word vectors and provide efficient similarity search functionality:
- FAISS by Facebook AI Research is a library for efficient similarity search and clustering of dense vectors.
- Elasticsearch has plugins like elasticsearch-vector-scoring to handle vector similarity.
- Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point.

Using such systems can significantly speed up the similarity search process.
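For reference, the operation these systems accelerate can be sketched as a brute-force cosine-similarity scan in plain Python. This is only an illustrative baseline, not code from this repository. The function names and the toy vectors are hypothetical. Every query compares against every stored vector, which is the cost that grows with collection size and that dedicated vector stores are designed to reduce.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best matches."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for document embeddings.
docs = {
    "post-a": [1.0, 0.0, 0.0],
    "post-b": [0.9, 0.1, 0.0],
    "post-c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], docs, k=2)
```

Here `results` holds the two closest entries as `(key, score)` pairs, best match first. A scan like this is usually adequate for small collections; the libraries listed above become worthwhile once the number of vectors or the query rate makes a full scan too slow.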