{"id":24719224,"url":"https://github.com/vshanbha/ragpoc","last_synced_at":"2026-04-16T07:34:18.427Z","repository":{"id":258333534,"uuid":"860535209","full_name":"vshanbha/RAGPoC","owner":"vshanbha","description":"A Proof of Concept / Quick Start for Retrieval Augmented Generation using Langchain, Python, FAISS and Streamlit.","archived":false,"fork":false,"pushed_at":"2024-12-28T10:43:29.000Z","size":48,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-27T11:17:02.741Z","etag":null,"topics":["faiss","langchain","openai","python","streamlit"],"latest_commit_sha":null,"homepage":"https://ragpoc.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vshanbha.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-20T16:08:22.000Z","updated_at":"2024-12-28T10:43:33.000Z","dependencies_parsed_at":"2024-12-27T19:19:15.055Z","dependency_job_id":"77d58f30-1007-4a8b-a9e2-a009ba59294f","html_url":"https://github.com/vshanbha/RAGPoC","commit_stats":null,"previous_names":["vshanbha/ragpoc"],"tags_count":0,"template":false,"template_full_name":"streamlit/document-qa-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vshanbha%2FRAGPoC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vshanbha%2FRAGPoC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vshanbha%2FRAGPoC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/vshanbha%2FRAGPoC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vshanbha","download_url":"https://codeload.github.com/vshanbha/RAGPoC/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244952554,"owners_count":20537467,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["faiss","langchain","openai","python","streamlit"],"created_at":"2025-01-27T11:17:03.774Z","updated_at":"2026-04-16T07:34:18.422Z","avatar_url":"https://github.com/vshanbha.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📄 Document question answering template\n\nA simple Streamlit app that answers questions about an uploaded document via OpenAI's GPT-3.5.\n\n[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://ragpoc.streamlit.app/)\n\n## How to run it on your own machine ##\n\n### Prerequisites ###\n- Python 3.13\n- Java 7 or higher\n\n### Dev Setup ###\n\n1. Create a Python virtual environment. Open a terminal and run:\n\n   ```\n   python3.13 -m venv .venv\n   source .venv/bin/activate\n   ```\n\n2. Install the requirements:\n\n   ```\n   pip install -r requirements.txt\n   ```\n\n3. 
Run the app:\n\n   ```\n   streamlit run streamlit_app.py\n   ```\n\n## Users and Roles ##\nCreate a file called `secrets.toml` inside the `.streamlit` directory and add the following information:\n   ```\n   API_KEY=\"\u003cyour-OpenAI-api-key\u003e\"\n\n   [passwords]\n   # Follow the rule: username = \"password\"\n   \u003cuser\u003e = \"\u003cpassword\u003e\"\n\n   [roles]\n   # Follow the rule: username = \"role\"\n   \u003cuser\u003e = \"\u003crole\u003e\"\n   ```\nReplace `\u003cuser\u003e` with the user names that can log in to the application and `\u003cpassword\u003e` with each user's password.\nReplace `\u003crole\u003e` with one of `user`, `admin` or `super-admin`.\n\n## Troubleshooting ##\n1. Certificate issues preventing text extraction\nThe application uses the Python wrapper for Apache Tika to extract text from documents, so Java 7 or higher must be installed on the machine.\nOn macOS, uploading a document may raise the following exception:\n   ```\n   ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)\n   ```\nTo resolve this, follow the steps in this [Stack Overflow](https://stackoverflow.com/questions/27835619/urllib-and-ssl-certificate-verify-failed-error) question.\n\n## Known Issues ##\n1. Token Limit\nCurrently, the application embeds the entire text of a document as a single document in the vector DB. If the text exceeds the embedding model's token limit, the upload fails.\n\n2. Chat Errors due to Token Limit\nNo attempt has yet been made to reduce the amount of content sent to the model for RAG. The code limits the number of documents sent, but if their combined token count exceeds the model's token limit, the request fails with an error.\n\n3. The delete buttons work, but the implementation is inefficient. 
It is not yet clear how to efficiently delete individual documents from FAISS when indexing was done with the LangChain Indexing API.\n\n## References ##\n[Streamlit Docs](https://docs.streamlit.io/)\n\n[LangChain How-To Guides](https://python.langchain.com/docs/how_to/)\n\n[LangChain Indexing API Guide](https://python.langchain.com/docs/how_to/indexing/)\n\n[Medium: CRUD Operations with a Vector Database Using LangChain](https://medium.com/gopenai/how-to-perform-crud-operations-with-vector-database-using-langchain-2df3f7fb48aa)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvshanbha%2Fragpoc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvshanbha%2Fragpoc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvshanbha%2Fragpoc/lists"}