{"id":13757770,"url":"https://github.com/wandb/wandbot","last_synced_at":"2025-06-17T19:41:08.754Z","repository":{"id":155969303,"uuid":"618560494","full_name":"wandb/wandbot","owner":"wandb","description":"wandbot is a technical support bot for Weights \u0026 Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk","archived":false,"fork":false,"pushed_at":"2025-06-10T18:09:02.000Z","size":4096,"stargazers_count":300,"open_issues_count":7,"forks_count":54,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-06-13T23:14:33.543Z","etag":null,"topics":["ai","chat","discord","gpt","gpt-4","openai","slack","support-bot","wandb","zendesk"],"latest_commit_sha":null,"homepage":"https://www.wandb.ai/site","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wandb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-03-24T18:30:59.000Z","updated_at":"2025-06-10T18:09:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"7c0d59a9-b093-4489-a6c8-c21efb935a99","html_url":"https://github.com/wandb/wandbot","commit_stats":{"total_commits":238,"total_committers":15,"mean_commits":"15.866666666666667","dds":"0.47058823529411764","last_synced_commit":"37d142ea112eb4dbfea5c914fb9ff0df05a58bf5"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/wandb/wandbot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wandb%2Fwandbot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/wandb%2Fwandbot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wandb%2Fwandbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wandb%2Fwandbot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wandb","download_url":"https://codeload.github.com/wandb/wandbot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wandb%2Fwandbot/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259732876,"owners_count":22903098,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chat","discord","gpt","gpt-4","openai","slack","support-bot","wandb","zendesk"],"created_at":"2024-08-03T12:00:49.563Z","updated_at":"2025-06-17T19:41:03.740Z","avatar_url":"https://github.com/wandb.png","language":"Python","funding_links":[],"categories":["Example App","Chatbots"],"sub_categories":["Production Level Examples"],"readme":"# WandBot\n\nWandBot is a support assistant designed for Weights \u0026 Biases' Models and Weave.\n\n## What's New\n\n\u003cdetails\u003e\n\u003csummary\u003ewandbot v1.3.0\u003c/summary\u003e\n\n- Up to date wandb docs + code, weave docs + code, example colabs, edu content - using chroma_index:v50\n- Gemini flash-2.0 for query expansion (was gpt-4o)\n- GPT-4o for composing the response (was gpt-4o)\n- Cohere rerank-v3.5 (was rerank-v2.0)\n- Hosted Chroma (was locally hosted chroma)\n- Turned off web-search for now\n- Moved all configs to configs folder\n- Removed most langchain 
dependencies\n- Implemented LLM and EmbeddingModel classes\n- More robust evaluation pipeline, added retries, error handling and cli args\n- Move package management to use uv\n- Update to use python 3.12\n- Add dotenv for env loading, while developing\n- Add new endpoint, removed retriever endpoint for now\n- Improved error handling and retries for all apis\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ewandbot v1.2.0\u003c/summary\u003e\n\nThis release introduces a number of exciting updates and improvements:\n\n- **Parallel LLM Calls**: Replaced llama-index with LCEL (LangChain Expression Language), enabling parallel LLM calls for increased efficiency.\n- **ChromaDB Integration**: Transitioned from FAISS to ChromaDB to leverage metadata filtering and speed.\n- **Query Enhancer Optimization**: Improved the query enhancer to operate with a single LLM call.\n- **Modular RAG Pipeline**: Split the RAG pipeline into three distinct modules: query enhancement, retrieval, and response synthesis, for improved clarity and maintenance.\n- **Parent Document Retrieval**: Introduced parent document retrieval functionality within the retrieval module to enhance contextuality.\n- **Sub-query Answering**: Added sub-query answering capabilities in the response synthesis module to handle complex queries more effectively.\n- **API Restructuring**: Redesigned the API into separate routers for retrieval, database, and chat operations.\n\nThese updates are part of our ongoing commitment to improve performance and usability.\n\u003c/details\u003e\n\n## Evaluation\n\nEnglish\n\n| wandbot version  | Comment  | Response Correctness | Num Trials | Data ingestion Report |\n|---|---|---| --- | --- |\n| 1.0.0 | baseline wandbot |  53.8 % | 1 |  |\n| 1.1.0 | improvement over baseline; in production for the longest | 72.5 %  | 1 |  |\n| 1.2.0 | our new enhanced wandbot | 81.6 % | 1 |  |\n| 1.3.0rc | [1.3.0rc with gpt-4-preview 
judge](https://wandb.ai/wandbot/wandbot-eval/weave/evaluations?peekPath=%2Fwandbot%2Fwandbot-eval%2Fcalls%2F0196172b-bed6-77e3-8d43-dc1c31fc9a9b%3FhideTraceTree%3D1) | 71.3 % | 5 | [v50](https://wandb.ai/wandbot/wandbot-dev/reports/Prod-v1-3-Wandbot-Data-Ingestion-Report-2025-04-09-15-44-45--VmlldzoxMjIwNzI0Mg) |\n| 1.3.0rc | [1.3.0rc with gpt-4o judge](https://wandb.ai/wandbot/wandbot-eval/weave/evaluations?peekPath=%2Fwandbot%2Fwandbot-eval%2Fcalls%2F019619b7-6ca1-7cc1-bdb9-d1053a6386d8%3FhideTraceTree%3D1) | 88.8 % | 5 | [v50](https://wandb.ai/wandbot/wandbot-dev/reports/Prod-v1-3-Wandbot-Data-Ingestion-Report-2025-04-09-15-44-45--VmlldzoxMjIwNzI0Mg) |\n| 1.3.0 | [v1.3.0 prod, v50 index, gpt-4o judge](https://wandb.ai/wandbot/wandbot-eval/weave/evaluations?peekPath=%2Fwandbot%2Fwandbot-eval%2Fcalls%2F01961c5e-9570-7f93-b3db-572ae83d9dbe%3FhideTraceTree%3D1) | 91.2 % | 5 | [v50](https://wandb.ai/wandbot/wandbot-dev/reports/Prod-v1-3-Wandbot-Data-Ingestion-Report-2025-04-09-15-44-45--VmlldzoxMjIwNzI0Mg) |\n| 1.3.1 | [v1.3.1 prod, v52 index, gpt-4o judge](https://wandb.ai/wandbot/wandbot-eval/weave/calls/01962210-44be-7f53-986d-4dc529660ad1?hideTraceTree=1) | 91.2 % | 5 | [v52](https://wandb.ai/wandbot/wandbot-dev/reports/Wandbot-Data-Ingestion-Report-for-chroma_index-v52-2025-04-10-23-28--VmlldzoxMjIzMDczNQ) |\n\n\n**Note**\n- v1.3.1 uses:\n  - Claude Sonnet-3.7 for the response synthesizer, updated from gpt-4o-2024-11-20\n  - an updated index that excludes the Korean and Japanese versions of the docs as well as the blog posts from Fully Connected.\n- `1.3.0rc with gpt-4-preview judge` and `1.3.0rc with gpt-4o judge` are the same wandbot system evaluated with different judges. \n- The ~2.5% improvement between `1.3.0rc (gpt-4o judge)` and `1.3.0 prod` is mostly due to using `reranker-v3.5` (from 2.0) and `flash-2.0-001` (from gpt-4o). 
However, evals prior to the v1.3.0 prod eval had 10-12 errors (out of 490 total calls), so there might be some noise in the results.\n\nJapanese\n\n| wandbot version  | Comment  | response accuracy |\n|---|---|---|\n| 1.2.0 | our new enhanced wandbot | 56.3 % |\n| 1.2.1 | add translation process | 71.9 % |\n\n## Features\n\n- WandBot uses:\n  - a hosted ChromaDB vector store\n  - OpenAI's v3 embeddings\n  - Gemini flash-2.0 for query enhancement\n  - GPT-4o for response synthesis\n  - Cohere's re-ranking model\n- It features periodic data ingestion and report generation, contributing to the bot's continuous improvement. You can view the latest data ingestion report [here](https://wandb.ai/wandbot/wandbot-dev/reportlist).\n- The bot integrates with Discord and Slack, so it can be used directly from these collaboration platforms.\n- Performance monitoring and continuous improvement are made possible through logging and analysis with Weights \u0026 Biases Weave.\n- It has a fallback mechanism for model selection.\n\n## Installation\n\nThe project is built with Python version `3.12` and uses `uv` for dependency management. 
Follow the steps below to install the necessary dependencies:\n\n```bash\nbash build.sh\n```\n\n## Usage\n\n### Running WandBot\n\nBefore running the Q\u0026A bot, ensure the following environment variables are set:\n\n```bash\nOPENAI_API_KEY\nCOHERE_API_KEY\nWANDB_API_KEY\nWANDBOT_API_URL=\"http://localhost:8000\"\nWANDB_TRACING_ENABLED=\"true\"\nLOG_LEVEL=INFO\nWANDB_PROJECT=\"wandbot-dev\"\nWANDB_ENTITY= \u003cyour W\u0026B entity\u003e\n```\n\nIf you're running the Slack or Discord apps, you'll also need the following keys/tokens set as env vars:\n\n```\nSLACK_EN_APP_TOKEN\nSLACK_EN_BOT_TOKEN\nSLACK_EN_SIGNING_SECRET\nSLACK_JA_APP_TOKEN\nSLACK_JA_BOT_TOKEN\nSLACK_JA_SIGNING_SECRET\nDISCORD_BOT_TOKEN\n```\n\nThen build the app to install all dependencies in a virtual env. Note that this step is heavily tailored for Replit.\n\n```\nbash build.sh\n```\n\nStart the Q\u0026A bot application using the following command:\n\n```bash\nbash run.sh\n```\n\nThen call the `/startup` endpoint to trigger the final wandbot app initialisation:\n\n```bash\ncurl http://localhost:8000/startup\n```\n\nFor more detailed instructions on installing and running the bot, please refer to the [run.sh](./run.sh) file located in the root of the repository.\n\nExecuting these commands will launch the API, Slackbot, and Discord bot applications, enabling you to interact with the bot and ask questions related to the Weights \u0026 Biases documentation.\n\n### Running the Evaluation pipeline\n\n**Eval Config**\n\nThe eval config includes CLI args to set the number of trials, weave parallelism, weave logging details and debug mode. It can be found here: `wandbot/src/wandbot/evaluation/eval_config.py`\n\nThe following evaluation sets are used:\n\n[English evaluation dataset](https://wandb.ai/wandbot/wandbot-eval/weave/datasets?peekPath=%2Fwandbot%2Fwandbot-eval%2Fobjects%2Fwandbot_eval_data%2Fversions%2FeCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU%3F%26)\n- ref: 
`weave:///wandbot/wandbot-eval/object/wandbot_eval_data:eCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU`\n    \n[Japanese evaluation dataset](https://wandb.ai/wandbot/wandbot-eval-jp/weave/datasets?peekPath=%2Fwandbot%2Fwandbot-eval-jp%2Fobjects%2Fwandbot_eval_data_jp%2Fversions%2FoCWifIAtEVCkSjushP0bOEc5GnhsMUYXURwQznBeKLA%3F%26)\n- ref: `weave:///wandbot/wandbot-eval-jp/object/wandbot_eval_data_jp:oCWifIAtEVCkSjushP0bOEc5GnhsMUYXURwQznBeKLA`\n\n\n**Dependencies**\n\nInstall the production dependencies to ensure wandbot is installed, activate the virtual env that was created, and then install the evaluation dependencies:\n\n```\nbash build.sh\nsource wandbot_venv/bin/activate\nuv pip install -r eval_requirements.txt\npoetry install\n```\n\n**Environment variables**\n\nMake sure to set the environment variables (e.g. LLM provider keys) from the `.env` file.\n\n**Launch the wandbot app**\nYou can use either `uvicorn` or `gunicorn` to launch N workers to be able to serve eval requests in parallel. Note that weave Evaluations also have a limit on the number of parallel calls they make, set via the `WEAVE_PARALLELISM` env variable, which is set further down in the `eval.py` file using the `n_weave_parallelism` flag. Launch wandbot with 8 workers for faster evaluation. 
The `WANDBOT_FULL_INIT` env var triggers the full wandbot app initialization.\n\n`uvicorn`\n```bash\nWANDBOT_FULL_INIT=1 uvicorn wandbot.api.app:app \\\n--host 0.0.0.0 \\\n--port 8000 \\\n--workers 8 \\\n--timeout-keep-alive 75 \\\n--loop uvloop \\\n--http httptools\n```\n\nTesting: You can test that the app is running correctly by making a request to the `chat/query` endpoint. You should receive a response payload back from wandbot after 30 - 90 seconds:\n\n```bash\ncurl -X POST \\\n   http://localhost:8000/chat/query \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"question\": \"How do I log a W\u0026B artifact?\"}'\n```\n\n**Debugging**\nFor debugging purposes during evaluation you can run a single instance of the app by changing the `uvicorn` command above to use `--workers 1`.\n\n**Run the evaluation**\n\nLaunch W\u0026B Weave evaluation in the root `wandbot` directory. Ensure that your virtual environment is active. By default, a sample will be evaluated 3 times in order to account for both the stochasticity of wandbot and our LLM judge. \n\n- For debugging, pass the `--debug` flag to only evaluate on a small number of samples. 
\n- To adjust the number of parallel evaluation calls weave makes, use the `--n_weave_parallelism` flag when calling `eval.py`.\n- See `eval_config.py` for all evaluation options.\n\n```\nsource wandbot_venv/bin/activate\n\npython src/wandbot/evaluation/eval.py\n```\n\nFor debugging, run evals on only 1 sample for 1 trial:\n\n```\npython src/wandbot/evaluation/eval.py  --debug --n_debug_samples=1 --n_trials=1\n```\n\nEvaluate on the Japanese dataset:\n\n```\npython src/wandbot/evaluation/eval.py  --lang ja\n```\n\nTo only evaluate each sample once:\n\n```\npython src/wandbot/evaluation/eval.py  --n_trials 1\n```\n\n\n### Data Ingestion\n\nThe data ingestion module pulls code and markdown from the Weights \u0026 Biases repositories [docodile](https://github.com/wandb/docodile) and [examples](https://github.com/wandb/examples) and ingests them into vectorstores for the retrieval augmented generation pipeline.\n\nTo ingest the data, run the following command from the root of the repository. See `run_ingestion_config.py` for all available arguments.\n\n```bash\npython -m wandbot.ingestion\n```\n\n**Note:**\n\nPay special attention to the configs in `src/wandbot/configs/vector_store_config.py` and `src/wandbot/configs/ingestion_config` as this is where important settings such as the embedding model, embedding dimensions and hosted vs local vector db are set.\n\nYou will notice that the data is ingested into the `data/cache` directory and stored in three different directories `raw_data`, `vectorstore` with individual files for each step of the ingestion process.\n\nThese datasets are also stored as wandb artifacts in the project defined in the environment variable `WANDB_PROJECT` and can be accessed from the [wandb dashboard](https://wandb.ai/wandb/wandbot-dev).\n\n#### Ingestion pipeline debugging\n\nTo help with debugging, you can use the `steps` and `include_sources` flags to specify only sub-components of the pipeline and only certain document sources to run. 
For example, if you wanted to stop the pipeline before it creates the vector db, the artifacts and the W\u0026B report, and you only wanted to process the Weave documentation, you would run the following:\n\n```\npython -m wandbot.ingestion --steps prepare preprocess --include_sources \"weave_documentation\" --debug\n```\n\n#### Note on updating hosted Chroma vector db\n\nA. If you compute a diff between the old dev docs and the new ones\n1. You could use `delete()` then `add()` on the same ids, if you have consistent ids across updates.\n2. You could call `update()` or `upsert()` on the same ids, but if you changed any metadata schemas and want to drop old keys, you'll have to explicitly do that.\n\nB. If you don't compute a diff or want a simple way to do this\n1. You could delete everything in the collection and re-add it.\n2. You could create a new collection and insert the new data into that.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwandb%2Fwandbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwandb%2Fwandbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwandb%2Fwandbot/lists"}