{"id":14064878,"url":"https://github.com/mkwatson/chat_any_site","last_synced_at":"2025-10-30T03:03:28.058Z","repository":{"id":163636858,"uuid":"639103084","full_name":"mkwatson/chat_any_site","owner":"mkwatson","description":null,"archived":false,"fork":false,"pushed_at":"2023-05-11T15:52:14.000Z","size":595,"stargazers_count":34,"open_issues_count":1,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-08-13T07:07:43.520Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkwatson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-10T19:08:46.000Z","updated_at":"2024-08-13T07:07:45.009Z","dependencies_parsed_at":null,"dependency_job_id":"4566c593-882b-4746-9fc9-05b871c76acb","html_url":"https://github.com/mkwatson/chat_any_site","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkwatson%2Fchat_any_site","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkwatson%2Fchat_any_site/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkwatson%2Fchat_any_site/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkwatson%2Fchat_any_site/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkwatson","download_url":"https://codeload.github.com/mkwatson/chat_any_site/tar.gz/refs/heads/main","ho
st":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228040876,"owners_count":17860211,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-13T07:04:08.935Z","updated_at":"2025-10-30T03:03:23.023Z","avatar_url":"https://github.com/mkwatson.png","language":"Python","funding_links":[],"categories":["\u003ca name=\"Python\"\u003e\u003c/a\u003ePython","Python"],"sub_categories":[],"readme":"# Chat-Any-Site\n\n## Description\n\nA small Python project to ask a chatbot questions about any website (with a sitemap), using an LLM service and a local vector store.\n\nThis uses OpenAI models (so you must have API access) and Chroma (an in-memory vector store) to store the embeddings of the site data locally.\n\n## Motivation\n\n1. A good excuse to play with LangChain\n2. OpenAI models, like GPT-4, were only trained up until Sep 2021. So, if you ask a question about anything more recent, they won't have a clue. This project lets you do exactly that.\n3. Publicly available models were only trained on public data. I want a chatbot that can answer questions about sensitive or private data. This can do exactly that. For example, by passing in the sitemap of a private wiki, like Confluence.\n\n## Before You Begin\n\nMake sure you have an [OpenAI API Key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key)\n\n**Respect any website's robots.txt and ai.txt. 
In fact, maybe use this only on websites that you own.**\n\n## Setup and Installation\n\nThis project was developed using Python 3.10.\n\nFollow these steps to install and run the project:\n\n1. Clone the repository:\n```commandline\ngit clone https://github.com/mkwatson/chat_any_site.git\n```\n\n2. Change into the project directory:\n```commandline\ncd chat_any_site\n```\n\n3. Create a virtual environment and activate it (optional, but recommended):\n```commandline\npython3.10 -m venv env\nsource env/bin/activate  # On Windows, use `env\\Scripts\\activate`\n```\n\n4. Install the project dependencies:\n```commandline\npip install .\n```\n\n## Usage\n\nTo run the command-line interface, execute the script:\n\n```commandline\nchatanysite\n```\n\nThere are three arguments you must pass:\n1. OpenAI API Key (defaults to the `OPENAI_API_KEY` environment variable)\n2. A valid sitemap.xml\n3. OpenAI model you want to use\n\nYou can also pass some or all of them on the command line, like\n```commandline\nchatanysite \\\n  --open-api-key=\u003cyour openai api key\u003e \\\n  --site=https://\u003chost\u003e/sitemap.xml \\\n  --model=gpt-4\n```\n\n## Demo\n\nIn this demo I'm passing in the sitemap of the [LangChain Documentation](https://python.langchain.com/en/latest/index.html).\nLangChain was released in Oct 2022. I'm also using GPT-4, which was only trained on data up to Sep 2021. \nSo, the model, as is, does not know about LangChain. 
Nevertheless, I'm able to get expert responses about LangChain.\n\n![Chat Any Site Demo Gif](img/demo.gif)\n[Higher quality YouTube version](https://youtu.be/vAWgbTUTuRc)\n\n## Known Limitations\n\n- Because the vector data is stored locally in memory, it's only transient.\n- It can take a long time to download all the pages listed in the sitemap.\n\n## Next Steps\n\n- [ ] Add the ability to store the vector data in a remote persistent data store\n- [ ] Make a web client\n- [ ] What if you made a Google user, and then if you added it as read-only to any Google Doc or file on Google Drive it sucked it down and you could ask questions about it?\n- [ ] Probably at least one test","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkwatson%2Fchat_any_site","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkwatson%2Fchat_any_site","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkwatson%2Fchat_any_site/lists"}