https://github.com/bitswired/website-to-knowledge-base
https://github.com/bitswired/website-to-knowledge-base
Last synced: 4 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/bitswired/website-to-knowledge-base
- Owner: bitswired
- Created: 2023-04-19T19:18:39.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-03T10:43:13.000Z (almost 2 years ago)
- Last Synced: 2024-08-13T07:07:36.208Z (8 months ago)
- Language: Python
- Size: 112 KB
- Stars: 66
- Watchers: 1
- Forks: 27
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - bitswired/website-to-knowledge-base - (Python)
README
# AI-Powered Knowledge Base
[Demo](https://user-images.githubusercontent.com/19983429/235894528-9ad791a8-f2f0-4ad8-a2fc-69f4ed0aa8d8.mp4)
[YouTube Demo](https://youtu.be/qRsNQweVKj0)
This repository contains an AI-powered knowledge base that utilizes the LLMs model to answer questions based on a given website's content and **provide sources as links** to the relevant pages.
The system:
1. Loads the website's content using a **sitemap**
2. Split each web page into **chunks**
4. **Embed** each chunk using a LLM (for now OpenAI) and store them in the **Chroma vector database
5. Then it embeds the user query and run a similarity search using the Chroma database
5. Finally it loads the similarity search results as context for a LLM (for now ChatGPT) to find relevant answers and citing the sourcesIt also provides a Streamlit-based web interface for an easy-to-use experience.
## Files
- `knowledge_base.py`: The main module that creates the KnowledgeBase class. This class is responsible for loading and processing the website content, creating the document index, and querying the LLM model for answers.
- `app.py`: A Streamlit web application that provides a user interface for querying the AI-powered knowledge base.## Installation
1. Clone the repository:
```
git clone [email protected]:bitswired/website-to-knowledge-base.git
```2. Instal the project with poetry:
```
poetry install
```## Usage
### Knowledge Base
To use the KnowledgeBase class, follow these steps:
1. Import the KnowledgeBase class:
```python
from knowledge_base import KnowledgeBase
```2. Instantiate the KnowledgeBase with the appropriate sitemap URL and pattern (optional):
```python
kb = KnowledgeBase(
sitemap_url="https://nextjs.org/sitemap.xml",
pattern="docs/api-refe",
chunk_size=8000,
chunk_overlap=3000,
)
```3. Ask a question:
```python
result = kb.ask("How do I deploy my Next.js app?")
print(result)
```### Web Application
To run the Streamlit web application, execute the following command in your terminal:
```
streamlit run app.py
```The web app will open in your default browser. Enter the URL to the website's sitemap, an optional filter pattern for the URLs, and your question. The AI-powered knowledge base will return an answer based on the content of the website.
## Requirements
- An API key for OpenAI's GPT-4 (see [OpenAI's API documentation](https://beta.openai.com/docs/) for details)
## License
This project is licensed under the MIT License.