{"id":22300094,"url":"https://github.com/rohitedathil/helpster","last_synced_at":"2025-03-25T23:21:18.623Z","repository":{"id":195329062,"uuid":"692408963","full_name":"RohitEdathil/helpster","owner":"RohitEdathil","description":"Helpster one of the entries that won **Inquiry Bot Challange** by IEEE CS KS. It is an LLM-based chat bot that can answer questions related to AICSSYC 23, previous AICSSYC's, IEEE CS KS,  IEEE CS, IEEE, and other related questions.","archived":false,"fork":false,"pushed_at":"2023-10-28T15:43:16.000Z","size":608,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-30T20:39:01.248Z","etag":null,"topics":["bot","chainlit","llm","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RohitEdathil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-16T11:41:44.000Z","updated_at":"2023-10-28T15:43:28.000Z","dependencies_parsed_at":"2025-01-30T20:30:22.425Z","dependency_job_id":null,"html_url":"https://github.com/RohitEdathil/helpster","commit_stats":null,"previous_names":["rohitedathil/helpster"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RohitEdathil%2Fhelpster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RohitEdathil%2Fhelpster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RohitEdathil%2Fhelpster/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RohitEdathil%2Fhelpster/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RohitEdathil","download_url":"https://codeload.github.com/RohitEdathil/helpster/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245557672,"owners_count":20635023,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot","chainlit","llm","python"],"created_at":"2024-12-03T18:09:07.855Z","updated_at":"2025-03-25T23:21:18.601Z","avatar_url":"https://github.com/RohitEdathil.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Helpster\n\nHelpster is one of the entries that won the **Inquiry Bot Challenge** by IEEE CS KS. 
It is an LLM-based chatbot that can answer questions about AICSSYC 23, previous editions of AICSSYC, IEEE CS KS, IEEE CS, IEEE, and related topics.\n\nBuilt using:\n\n- [Python](https://www.python.org/)\n- [LangChain](https://www.langchain.com/)\n- [OpenAI](https://openai.com/)\n- [ChainLit](https://github.com/Chainlit/chainlit)\n- [Pinecone](https://www.pinecone.io/)\n\n\u003cimg src=\"https://github.com/RohitEdathil/helpster/blob/main/img/ss1.jpg?raw=true\" width=\"700px\"\u003e\n\u003cimg src=\"https://github.com/RohitEdathil/helpster/blob/main/img/ss2.jpg?raw=true\" width=\"700px\"\u003e\n\n## Setup\n\nClone the repository\n\n```bash\ngit clone https://github.com/RohitEdathil/helpster\n```\n\nInstall dependencies\n\n```bash\npip install -r requirements.txt\n```\n\nSet up environment variables\n\n| Variable           | Description               |\n| ------------------ | ------------------------- |\n| `OPENAI_API_KEY`   | OpenAI API Key            |\n| `PINECONE_API_KEY` | Pinecone API Key          |\n| `PINECONE_ENV`     | Pinecone Environment Name |\n\nTo enable LangChain tracing (optional)\n\n| Variable               | Description      |\n| ---------------------- | ---------------- |\n| `LANGCHAIN_TRACING_V2` | Enable version 2 |\n| `LANGCHAIN_ENDPOINT`   | API endpoint     |\n| `LANGCHAIN_API_KEY`    | API key          |\n| `LANGCHAIN_PROJECT`    | Project name     |\n\nCreate an index in Pinecone with the name `helpster` and dimension `1536`\n\n## Usage\n\n### Load\n\n```bash\npython3 load.py\n```\n\nRunning this command loads documents from the `data` folder into the Pinecone index. All files must be plain-text files with the `.txt` extension. 
One file is treated as one document in the index.\n\n**Note:** The index `helpster` must be created in Pinecone before running this command.\n\n### Run\n\n```bash\nchainlit run main.py\n```\n\nThis will start the server at `http://localhost:8000`, where you can ask questions in a ChatGPT-like chat interface.\n\n## Working\n\n### RAG\n\nThis bot uses RAG (Retrieval-Augmented Generation). It queries a Pinecone index to retrieve the document most similar to the question, then passes the retrieved document as context to the GPT-3 model to generate the answer.\n\n### ConversationBufferWindowMemory\n\nThe bot uses a memory buffer to store the last 3 questions and answers, so you can ask it follow-up questions. The buffer is limited to 3 because of token consumption limits.\n\n### Information Constraints\n\nThe bot is explicitly instructed not to answer questions that it is not trained to answer. This reduces hallucination, makes the bot more robust, and keeps it close to its purpose.\n\n## Data Sources\n\nHere are the data sources used to train the bot:\n\n- Official website of AICSSYC 23\n- Official Instagram page of AICSSYC 23\n- Instagram pages of previous AICSSYC editions\n- Facebook page of AICSSYC 23\n- Archived website of previous AICSSYC editions\n- IEEE CS KS website\n- IEEE CS website\n- Wikipedia\n\n## Data Handling\n\n### Training Data (Public)\n\nThe data used to train the bot is publicly available. Most of it was collected manually, and a tool called `instaloader` was used to download the Instagram posts. The data was then cleaned and formatted for training. Even though the data is publicly available, it is still stored securely, only on the development system and on secure Pinecone servers.\n\n### Chat Logs\n\nAll the **chat information is monitored** through `LangSmith`, a sister project of `LangChain`. It is a tool to diagnose and monitor LLM apps. The chat information is stored in a secure database and is only **used for debugging purposes**. 
Developers may look into the chat information to improve the bot. The chat information is **not shared with any third party**.\n\n### LLM\n\nThe LLM used by the app is provided by OpenAI. Hence, data sent to the LLM is visible to OpenAI.\n\nRead more about OpenAI's API privacy policy [here](https://openai.com/enterprise-privacy).\n\nAccording to OpenAI's privacy policy, data submitted through the API platform is **not used to train** the models. The policy also states that the data is not shared with any third party and that the platform is SOC 2 Type 2 compliant.\n\nRead more about SOC 2 Type 2 [here](https://us.aicpa.org/interestareas/frc/assuranceadvisoryservices/aicpasoc2report).\n\nHence we can conclude that there **won't be any data leaks** (one user's information being presented to other users).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohitedathil%2Fhelpster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frohitedathil%2Fhelpster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohitedathil%2Fhelpster/lists"}