{"id":19173768,"url":"https://github.com/tade0726/ato_chatbot","last_synced_at":"2025-04-09T19:36:08.852Z","repository":{"id":261713053,"uuid":"884846194","full_name":"tade0726/ato_chatbot","owner":"tade0726","description":"Australian Tax Office (ATO) chatbot using LlamaIndex RAG and OpenAI. Features automated documentation processing with ZenML pipelines, Qdrant vector storage, and Streamlit interface. Built for accurate tax information retrieval and natural language query processing.","archived":false,"fork":false,"pushed_at":"2024-12-12T07:05:17.000Z","size":823,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-23T21:35:44.262Z","etag":null,"topics":["llamaindex","llm","rag","streamlit"],"latest_commit_sha":null,"homepage":"https://ato-chat.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tade0726.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-07T13:50:11.000Z","updated_at":"2024-12-29T12:06:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"1d14a275-e584-48ae-8380-541ae9e9fc3f","html_url":"https://github.com/tade0726/ato_chatbot","commit_stats":null,"previous_names":["tade0726/ato_chatbot"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tade0726%2Fato_chatbot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tade0726%2Fato_chatbot/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/tade0726%2Fato_chatbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tade0726%2Fato_chatbot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tade0726","download_url":"https://codeload.github.com/tade0726/ato_chatbot/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248098429,"owners_count":21047437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llamaindex","llm","rag","streamlit"],"created_at":"2024-11-09T10:14:43.407Z","updated_at":"2025-04-09T19:36:08.821Z","avatar_url":"https://github.com/tade0726.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ATO Chatbot\n\nA RAG-based chatbot system for Australian Taxation Office (ATO) information retrieval and assistance, powered by data sourced from ato.gov.au.\n\n\n## Live Demo\nhttps://ato-chat.streamlit.app/\n\n\n![Chat Interface Screenshot](./docs/chat_interface.png)\n*Figure 1: Streamlit Chat Interface with example conversation*\n\n\n## Overview\n\nThis project implements a Retrieval-Augmented Generation (RAG) chatbot system specifically designed for ATO-related queries. It consists of two main components:\n\n1. **Data Pipeline \u0026 Model Training**: A modular pipeline built with ZenML for data processing and index creation\n2. 
**Interactive Interface**: A Streamlit-based chat interface for user interactions\n\n## Architecture\n\n![System Architecture](./docs/architecture.svg)\n\n## Technology Stack\n\n- **Data Collection \u0026 Processing**\n  - [Firecrawl](https://github.com/brave-experiments/firecrawl) - Web crawling and content extraction\n  - [ZenML](https://zenml.io/) - MLOps pipeline orchestration\n  - [Qdrant](https://qdrant.tech/) - Vector database for embeddings storage\n\n- **Machine Learning \u0026 AI**\n  - [OpenAI](https://openai.com/) - Large Language Model API\n  - [LlamaIndex](https://www.llamaindex.ai/) - RAG framework and indexing\n\n- **Backend \u0026 Infrastructure**\n  - [Docker](https://www.docker.com/) - Containerization\n  - [MongoDB](https://www.mongodb.com/) - Document storage\n\n- **Frontend**\n  - [Streamlit](https://streamlit.io/) - Interactive web interface\n  - [Streamlit-Chat](https://streamlit.io/components) - Chat UI components\n\n## Components\n\n### 1. Data Pipeline\n\nThe data pipeline is built using ZenML and consists of several key steps:\n\n1. **Data Collection**: Uses Firecrawl to extract content from ATO pages\n2. **Data Cleaning**: Processes and filters the collected data\n3. **Index Creation**: Creates embeddings and stores them in Qdrant\n\n![Data Pipeline Flow](./docs/pipeline_flow.png)\n*Figure 2: ZenML Pipeline Workflow showing data processing steps*\n\nKey pipeline components:\n\n```\npython:src/ato_chatbot/pipelines/simple_index_pipeline.py\n```\n\n\n### 2. Chat Interface\n\nThe chat interface is built with Streamlit and implements a 3-step RAG process:\n\n1. **Query Rephrasing**: Improves query understanding\n2. **Knowledge Retrieval**: Fetches relevant information from Qdrant\n3. 
**Response Generation**: Uses OpenAI to generate contextual responses\n\n\nKey interface components:\n\n```\npython:src/ato_chatbot/chat_interface.py\n```\n\n## Setup\n\n### Prerequisites\n\n- Python 3.12+\n- Docker and Docker Compose\n- OpenAI API key\n\n### Installation\n\n1. Clone the repository\n2. Install dependencies:\n\n```\nuv sync\n```\n\n3. Start required services:\n\n```bash\nmake up\n```\n\n\n### Running the Application\n\n1. Build the index by running the ZenML pipeline:\n\n```bash\nmake zen_run_simple_index\n```\n\n\n2. Start the chat interface:\n\n```bash\nmake streamlit\n```\n\n\n## Dependencies\n\nKey dependencies include:\n\n```\ntoml:pyproject.toml\n```\n\n\n## License\n\nApache License 2.0","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftade0726%2Fato_chatbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftade0726%2Fato_chatbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftade0726%2Fato_chatbot/lists"}