Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/natserract/natserract-ai

Using Doc2Vec, Langchain and OpenAI to chat with Natserract blog https://engineering-natserract.vercel.app/

chatbot doc2vec gpt4 langchain natserract openai pgvector qatool

Host: GitHub
URL: https://github.com/natserract/natserract-ai
Owner: natserract
Created: 2023-11-09T02:52:11.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2023-11-16T08:38:37.000Z (about 1 year ago)
Last Synced: 2024-04-28T06:27:30.218Z (10 months ago)
Topics: chatbot, doc2vec, gpt4, langchain, natserract, openai, pgvector, qatool
Language: Python
Homepage:
Size: 7.79 MB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Natserract AI
This repository contains code that demonstrates how to build an AI assistant with LangChain and OpenAI's GPT-4. The assistant handles question answering (QA), exposes several tools, and runs similarity search with a Doc2Vec approach to answer user queries based on the provided documents.

Throughout this project, I use PostgreSQL as the main database and the pgvector extension to store the embeddings.

## Setup
Before running the script, you need to set up the required credentials and install the necessary libraries.

### Install Required Libraries
You can install the required libraries with Poetry. Run the following command in your terminal or command prompt:
```sh
poetry install
```

### Install the spaCy Model
```sh
python -m spacy download en_core_web_sm
```

### Setup API Keys
The script reads the OpenAI API key from the environment. Set the `OPENAI_API_KEY` environment variable on your system to your actual API key.
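
A minimal sketch of how a script might check for the key at startup, assuming it is exposed under the name `OPENAI_API_KEY`. The helper `require_api_key` is hypothetical and is not part of this repository; it only illustrates failing early with a readable message.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper for illustration: it raises early with a clear
    message instead of failing later inside an API call.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it before running, "
            f"e.g. `export {name}=<your key>`"
        )
    return key
```

Calling `require_api_key()` once at the top of the entry point keeps a missing key from surfacing as an opaque authentication error later on.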

### Setup Database
- PostgreSQL 15
- Enable the pgvector extension:
```sql
CREATE EXTENSION vector;
```
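
Once the extension is enabled, embeddings can be stored in a `vector` column and queried by distance. The sketch below is an assumption-laden illustration, not this repository's schema: the `documents` table, its columns, and both helper functions are hypothetical, and the `%s` placeholders assume a driver with psycopg-style parameters. The `<=>` operator is pgvector's cosine distance.

```python
from typing import Sequence

# Hypothetical table and column names, chosen only for this example.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(%d)
);
"""

# `<=>` is pgvector's cosine-distance operator; smaller means more similar.
# The two %s placeholders are left for the database driver to fill in.
SEARCH_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a sequence of numbers in pgvector's text form, e.g. '[1.0,2.0]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def create_table_sql(dimensions: int) -> str:
    """Fill the embedding dimension into the CREATE TABLE statement."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    return CREATE_TABLE_SQL % dimensions
```

With a driver cursor, a search would then pass `(to_pgvector_literal(query_vector), k)` as the parameters for `SEARCH_SQL`. The dimension given to `create_table_sql` has to match the size of the vectors the embedding model produces.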

## Running
```sh
poetry shell

poetry run python main.py
```

## Process
![](process.png)

## Demo
https://github.com/natserract/natserract-ai/assets/31182611/8b4be411-1fd4-4c9f-89cc-eb25670f7ead

## Custom Datasets
Create a `_datasets` directory and place all of your Markdown documents in it.
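
As a rough illustration of what reading that directory could look like, the sketch below collects every `.md` file under `_datasets`. The function `load_markdown_documents` is hypothetical, and the recursive search and UTF-8 encoding are assumptions; the repository's own loader may work differently.

```python
from pathlib import Path
from typing import Dict


def load_markdown_documents(directory: str = "_datasets") -> Dict[str, str]:
    """Read every `.md` file under `directory` (recursively).

    Returns a dict mapping each file's path, relative to `directory`,
    to its text content. Non-Markdown files are ignored.
    """
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {root}")
    documents: Dict[str, str] = {}
    for path in sorted(root.rglob("*.md")):
        relative_name = path.relative_to(root).as_posix()
        documents[relative_name] = path.read_text(encoding="utf-8")
    return documents
```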

## Performance Considerations

If you need to run similarity search frequently, and especially if the set of vectors is large, it may be practical to use a database or data store optimized for vector operations. These data stores can persist your vectors and provide efficient similarity search:

- FAISS by Facebook AI Research is a library for efficient similarity search and clustering of dense vectors.
- Elasticsearch has plugins like elasticsearch-vector-scoring to handle vector similarity.
- Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point.

Using such systems can significantly speed up the similarity search process.
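
For comparison, the linear scan that these systems are designed to avoid can be sketched in plain Python. This is an illustrative baseline, not code from this repository; the function names and the choice of cosine similarity as the metric are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best.

    This is a linear scan, so its cost grows with the number of
    stored vectors.
    """
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Every query here touches every stored vector, which is acceptable for a small set of documents but becomes the bottleneck as the collection grows. Indexed or approximate nearest-neighbor search trades some of that exhaustive work for speed.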