{"id":18068363,"url":"https://github.com/amha-kindu/fastllama","last_synced_at":"2026-04-20T05:33:26.691Z","repository":{"id":260339689,"uuid":"880993475","full_name":"amha-kindu/FastLlama","owner":"amha-kindu","description":"A Scalable Local Knowledge Base augmented with an LLM using FastAPI, LlamaIndex, and MongoDB. It features two operational modes: a Question Answering mode, which retrieves answers from a local database or queries OpenAI's ChatGPT API when necessary, and an in-development Chatbot mode that will allow broader topic coverage.","archived":false,"fork":false,"pushed_at":"2024-10-30T19:08:55.000Z","size":804,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-05T16:12:29.893Z","etag":null,"topics":["docker","fastapi","llama-index","llm","mongodb","openai-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amha-kindu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-30T18:20:40.000Z","updated_at":"2024-11-03T18:17:33.000Z","dependencies_parsed_at":"2024-10-30T20:34:04.890Z","dependency_job_id":null,"html_url":"https://github.com/amha-kindu/FastLlama","commit_stats":null,"previous_names":["amha-kindu/fastllama"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amha-kindu%2FFastLlama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amha-kindu%2FFastLlama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amha-kindu%2FFastLlama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amha-kindu%2FFastLlama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amha-kindu","download_url":"https://codeload.github.com/amha-kindu/FastLlama/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247361701,"owners_count":20926643,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","fastapi","llama-index","llm","mongodb","openai-api"],"created_at":"2024-10-31T08:06:11.207Z","updated_at":"2026-04-20T05:33:26.622Z","avatar_url":"https://github.com/amha-kindu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FastLlama: Local Knowledge Base Augmented LLM\n\nThis project leverages a Local Knowledge Base augmented with a Language Model (LLM) to provide scalable question-answering capabilities for millions of users. Built on top of **LlamaIndex**, **FastAPI**, and **MongoDB**, the system supports two operational modes: Question Answering and Chatbot.\n\n![System Architecture](./imgs/system_architecture.png)\n\n## Modes of Operation\n\n### 1. Question Answering Mode\n\nIn this mode, the bot utilizes a local knowledge base sourced from a CSV file containing standard question/answer pairs. The operational flow is as follows:\n\n- **Data Ingestion**: Upon the first execution, the CSV file is ingested. Standard questions are vectorized using **LlamaIndex**, which serves as the embedding engine. Standard answers are stored in **MongoDB**. This decoupling allows for flexibility and efficient data retrieval.\n  \n- **Query Processing**: When a user poses a question:\n  - The query engine searches for a matching question in the local database. If a match is found, the corresponding answer is retrieved from MongoDB.\n  - If no suitable match exists, the bot queries OpenAI's ChatGPT API to obtain an answer and stores the new question in the index for future reference.\n  \n- **Relevance Check**: If the question does not pertain to the designated topic (in this implementation, Golf), the bot will decline to provide an answer.\n\n### 2. Chatbot Mode\n\nCurrently under development, the Chatbot Mode aims to allow the bot to respond to a broader range of questions, irrespective of the specific topic. It will also enable the extraction of relevant information from chat history.\n\n#### Interaction Examples\n\n- **Knowledge Base Interaction**  \n  ![Question Answering Demo](./imgs/question_answering_demo_1.png)\n\n- **Irrelevant Question Handling**  \n  ![Irrelevant Question Demo](./imgs/question_answering_demo_2.png)\n\n## System Architecture Overview\n\nThe bot operates within a robust architecture comprising:\n\n- **FastAPI**: The chosen web framework, providing high performance and asynchronous capabilities.\n- **LlamaIndex**: Serves as the search engine, facilitating efficient vector embeddings.\n- **MongoDB**: Utilized for metadata storage, ensuring quick retrieval of answers.\n\n### Technical Implementation\n\n- **Embedding Process**: Utilizes the OpenAI API at `https://api.openai.com/v1/embeddings` for high-performance embeddings. The process is both cost-effective and efficient.\n  \n- **Answer Retrieval**: Queries OpenAI's ChatGPT for answers via `https://api.openai.com/v1/chat/completions`, defaulting to the `gpt-3.5-turbo` model for its response generation.\n\n- **Concurrency Support**: The system is designed to handle concurrent requests natively, ensuring scalability for multiple users.\n\n## Future Enhancements\n\n- Integrate OpenAI's Assistant API as a potential alternative search engine (current trials indicate that LlamaIndex outperforms it at this time).\n- Develop additional test cases to enhance reliability and performance.\n\n## Development Instructions\n\n### Environment Setup\n\nTo set up the development environment, follow these steps:\n\n```bash\nexport OPENAI_API_KEY=your_openai_api_key \npyenv install 3.11.8 \nvirtualenv -p python3.11 env\nsource env/bin/activate\npip install -r requirements.txt\n```\n\n### Running Unit Tests\n\nExecute the following command to run unit tests:\n\n```bash\npytest -ss\n```\n\n### Starting the Server\n\nTo start the FastAPI server, use:\n\n```bash\nuvicorn app.main:app --host 127.0.0.1 --port 8081\n# Alternatively, run with:\n# PYTHONPATH=. python app/main.py 8082\n```\n\n### API Documentation\n\nAccess the automatically generated API documentation at:\n\n```plaintext\nhttp://127.0.0.1:8081/docs\n```\n\nTo generate OpenAPI documentation, use:\n\n```bash\nPYTHONPATH=. python app/utils/api-docs/extract_openapi.py app.main:app --out openapi.yaml\npython app/utils/api-docs/swagger_html.py \u003c openapi.yaml \u003e swagger.html\npython app/utils/api-docs/redoc_html.py \u003c openapi.yaml \u003e redoc.html\n```\n\n### Local Testing Guidelines\n\n- Write test cases in files located at `/app/tests/test_*.py`.\n- Ensure all local test cases pass before committing any changes.\n\n## References\n\n- [LlamaIndex Official Documentation: Fullstack App Guide](https://docs.llamaindex.ai/en/stable/understanding/putting_it_all_together/apps/fullstack_app_guide.html)\n- [LlamaIndex Official Demo Code: Flask + React](https://github.com/logan-markewich/llama_index_starter_pack/tree/main/flask_react)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famha-kindu%2Ffastllama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famha-kindu%2Ffastllama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famha-kindu%2Ffastllama/lists"}