{"id":13456794,"url":"https://github.com/mpaepper/content-chatbot","last_synced_at":"2025-04-05T01:06:20.308Z","repository":{"id":144597829,"uuid":"617162058","full_name":"mpaepper/content-chatbot","owner":"mpaepper","description":"Build a chatbot or Q\u0026A bot of your website's content","archived":false,"fork":false,"pushed_at":"2024-01-28T19:55:24.000Z","size":301,"stargazers_count":534,"open_issues_count":1,"forks_count":60,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-29T00:08:18.663Z","etag":null,"topics":["deep-learning","llm","machine-learning"],"latest_commit_sha":null,"homepage":"https://www.paepper.com/blog/posts/build-q-and-a-bot-of-your-website-using-langchain/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mpaepper.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-03-21T20:21:50.000Z","updated_at":"2025-03-21T14:05:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"1562b6f7-7a8e-470f-816d-9636b994d5e8","html_url":"https://github.com/mpaepper/content-chatbot","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpaepper%2Fcontent-chatbot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpaepper%2Fcontent-chatbot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpaepper%2Fcontent-chatbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpaepper%2Fcontent-chatbot/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mpaepper","download_url":"https://codeload.github.com/mpaepper/content-chatbot/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247271528,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","llm","machine-learning"],"created_at":"2024-07-31T08:01:27.886Z","updated_at":"2025-04-05T01:06:20.285Z","avatar_url":"https://github.com/mpaepper.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\nThis repo reached the front page of hacker news on March 22nd 2023, see the discussion \u003ca href=\"https://news.ycombinator.com/item?id=35252223\" target=\"_blank\"\u003ehere\u003c/a\u003e.\n\n## Your website content -\u003e chatbot / Q\u0026A agent\n\nTurn your website content into a question answering bot which can cite your document sources.\n\nAlternatively, use it in an interactive chatbot style fashion.\n\nAll this can be achieved with a tool called \u003ca href=\"https://github.com/hwchase17/langchain\" target='_blank'\u003elangchain\u003c/a\u003e which in turn uses the OpenAI API.\n\nThis simple repository showcases how to apply it on your own website content.\n\nTo do so, there are three scripts:\n\n* create_embeddings.py: this is the main script which loops your website's sitemap.xml to create embeddings (vectors representing the semantics of your data) of your content\n* ask_question.py: after you have the embeddings (a file called `faiss_store.pkl` was created), this script can be 
used to directly ask a question. It will answer the question and return the URLs of your website which were used as the source.\n* start_chat_app.py: starts a simple chat interface where you can ask a question and then follow up on the answer. If the bot is uncertain, it will indicate so. Note that you can tune the query in this script to be more relevant for your content. In my case, I made it specific to machine learning and technical topics.\n\nTo install the dependencies, simply run `pip install -r requirements.txt`.\n\n### Create your embeddings\n\n\u003cimg src=\"imgs/llm-qa-overview.png\" alt=\"overview of the embedding process: each blog post is split into N documents and each document yields a vector representation.\" /\u003e\n\nThis is the most important step and you will need to obtain an OpenAI API key to use it.\n\nOnce you have the `$api_key`, you can run `export OPENAI_API_KEY='$api_key'` in your terminal.\n\nThen simply run `python create_embeddings.py --sitemap https://path/to/your/sitemap.xml --filter https://path/to/your/blog/posts`.\n\nThis will create your embeddings in a file called `faiss_store.pkl`. You need to point the script to your website's sitemap.xml, and you can use `--filter` to keep only URLs that start with the given prefix. 
If you want to include all pages of your site, you can just set `--filter https://`.\n\nFor more details about this, please check \u003ca href=\"https://www.paepper.com/blog/posts/build-q-and-a-bot-of-your-website-using-langchain/\"\u003ethis blog post\u003c/a\u003e.\n\n### Answering a question while getting the answer source documents\n\n\u003cimg src=\"imgs/llm-qa-process.png\" alt=\"overview of the Q\u0026A process: first we find the closest matches of our documents from the FAISS store and then we ask the question to the GPT3 API.\" /\u003e\n\nWith the embeddings set up, ask a question like this: `python ask_question.py \"How to detect objects in images?\"`\n\n    Answer:\n\n    Object detection in images can be done using algorithms such as R-CNN, Fast R-CNN, and data augmentation techniques such as shifting, rotations, elastic deformations, and gray value variations.\n\n    Sources:\n\n    https://www.paepper.com/blog/posts/deep-learning-on-medical-images-with-u-net/\n    https://www.paepper.com/blog/posts/end-to-end-object-detection-with-transformers/\n\n### Starting a chatbot on your content\n\nWith the embeddings set up, start a chatbot like this: `python start_chat_app.py`. Then when it's running, ask your questions and follow-ups.\n\n\n## Zendesk Content Embedding\n\nThis repository includes an enhancement to the LangChain chatbot project, introducing the `create_embeddings` for Zendesk feature. This functionality utilizes the Zendesk API to retrieve website content and construct a Faiss knowledge base for improved chatbot responses.\n\n### How it Works\n\nThe `create_embeddings` script performs the following steps:\n\n1. **Zendesk API Integration:** Retrieves articles using the Zendesk API.\n2. **Text Cleaning:** Parses HTML content, extracting and cleaning text for embedding.\n3. **Text Splitting:** Breaks down the content into smaller chunks for efficient embedding processing.\n4. 
**Embedding Creation:** Utilizes OpenAI Embeddings to create embeddings for the text chunks.\n5. **Faiss Knowledge Base Construction:** Constructs a Faiss store with the generated embeddings, facilitating efficient similarity search.\n\n### Usage\n\n- **Zendesk API Credentials:** To create a Faiss knowledge base from Zendesk content, please obtain and configure your Zendesk API credentials.\n\n### Running the Script\n\n   **Execute the Script:** Run the `create_embeddings.py` script to generate the Faiss store.\n\n   Example:\n   ```bash\n    python create_embeddings.py -m zendesk -z \"https://your.zendesk.api/\"   # replace the link\n   ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpaepper%2Fcontent-chatbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmpaepper%2Fcontent-chatbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpaepper%2Fcontent-chatbot/lists"}