Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/brianlesko/rag-text-search
This git repository hosts a user interface for a chat-app, with integrated text similarity search for querying a document. Think of it as an upgraded Cmd+F search. It's written in Pure Python. Created for Learning Purposes.
cosine-similarity gpt llm openai python search-engine streamlit text text-embedding text-processing ui
Last synced: about 20 hours ago
- Host: GitHub
- URL: https://github.com/brianlesko/rag-text-search
- Owner: BrianLesko
- License: mit
- Created: 2023-11-05T19:22:25.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-17T23:16:50.000Z (10 months ago)
- Last Synced: 2024-01-18T06:07:09.860Z (10 months ago)
- Topics: cosine-similarity, gpt, llm, openai, python, search-engine, streamlit, text, text-embedding, text-processing, ui
- Language: Python
- Homepage:
- Size: 8.17 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE.md
README
# Text Similarity Search
This code implements a chat-app with text similarity search for querying a document. Think of it as an upgraded Cmd+F search. It's written in [Pure Python](https://github.com/BrianLesko/text-similarity-search/blob/main/app.py). Created for Learning Purposes.
## Dependencies
This code uses the following libraries:
- `streamlit`: for building the user interface.
- `openai`: for generating responses to user questions.
- `tiktoken`: for tokenizing text.
- `scikit-learn`: for finding the relevant text chunks based on a user's question.
- `numpy`: for creating arrays.
- `pandas`: for creating dataframes.
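
The "text chunks" mentioned above have to come from somewhere, so here is a minimal sketch of one way a document could be split by token count. It is not taken from `app.py`: the function name `chunk_tokens` and the `max_tokens` parameter are hypothetical, and whitespace-separated words stand in for `tiktoken` tokens so the sketch runs with the standard library only.

```python
def chunk_tokens(text, max_tokens=50):
    """Split text into chunks of at most max_tokens tokens.

    Assumption: whitespace-separated words stand in for tiktoken tokens,
    so this runs without any third-party package.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    tokens = text.split()
    # Step through the token list in windows of max_tokens.
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


document = "one two three four five six seven"
chunks = chunk_tokens(document, max_tokens=3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

With `tiktoken` installed, the same windowing could presumably be applied to the integer token list returned by an encoding's `encode` method, decoding each slice back to text.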
## Usage
To run this code, you need an OpenAI API key. You can get one by creating an account on the OpenAI website. Copy it to your clipboard and paste it into the app once it's running. All the dependencies are handled automatically from the `requirements.txt` file.
Run the following commands:
```
pip install --upgrade streamlit
streamlit run https://github.com/BrianLesko/text-similarity-search/blob/main/app.py
```
This will start the Streamlit server, and you can access the chatbot by opening a web browser and navigating to `http://localhost:8501`.
## How it Works
The chatbot works as follows:
1. The user enters a question in the input field.
2. The chatbot retrieves relevant text chunks based on the user's question using scikit-learn cosine similarity search.
3. The chatbot adds the user's question to the retrieved text chunks to create an augmented query.
4. The chatbot generates a response to the augmented query using OpenAI's GPT-3.5 (ChatGPT) language model.
5. The chatbot displays the response to the user, along with the chat history.

The chat history is saved in `st.session_state`, a dictionary-like object that persists across script reruns within a user's session.
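
Steps 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `app.py`: toy word-count vectors stand in for OpenAI text embeddings, a hand-written `cosine` function stands in for scikit-learn's `cosine_similarity` (which computes the same quantity on matrices), and the names `embed`, `top_chunks`, and `build_augmented_query`, as well as the prompt layout, are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for OpenAI embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_chunks(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_augmented_query(question, retrieved):
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Streamlit renders the chat interface in the browser.",
    "Cosine similarity ranks text chunks against the question.",
    "The license for this project is MIT.",
]
question = "How are text chunks ranked?"
retrieved = top_chunks(question, chunks, k=1)
prompt = build_augmented_query(question, retrieved)
print(prompt)
```

In the real app, the resulting prompt would then be sent to the language model (step 4); that call is left out here so the sketch runs without an API key.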
## Repository Structure
```
doc-chat/
├── .streamlit/
│ └── config.toml # theme info for the UI
├── docs/
│ └── content.png
├── app.py # the code and UI integrated together live here
├── about.py # for the UI
├── requirements.txt # the python packages needed to run locally
└── .gitignore # includes the api key file and the local virtual environment
```
## Topics
```
Python | Streamlit | Git | Low Code UI
Template Repository | Chat interface | LLM
Text similarity | Text embeddings | Cosine Similarity
Sklearn | OpenAI
```
╭━━╮╭━━━┳━━┳━━━┳━╮╱╭╮ ╭╮╱╱╭━━━┳━━━┳╮╭━┳━━━╮
┃╭╮┃┃╭━╮┣┫┣┫╭━╮┃┃╰╮┃┃ ┃┃╱╱┃╭━━┫╭━╮┃┃┃╭┫╭━╮┃
┃╰╯╰┫╰━╯┃┃┃┃┃╱┃┃╭╮╰╯┃ ┃┃╱╱┃╰━━┫╰━━┫╰╯╯┃┃╱┃┃
┃╭━╮┃╭╮╭╯┃┃┃╰━╯┃┃╰╮┃┃ ┃┃╱╭┫╭━━┻━━╮┃╭╮┃┃┃╱┃┃
┃╰━╯┃┃┃╰┳┫┣┫╭━╮┃┃╱┃┃┃ ┃╰━╯┃╰━━┫╰━╯┃┃┃╰┫╰━╯┃
╰━━━┻╯╰━┻━━┻╯╱╰┻╯╱╰━╯ ╰━━━┻━━━┻━━━┻╯╰━┻━━━╯