An open API service indexing awesome lists of open source software.

https://github.com/zeuscoderbe/legal-chat-bot

This project is designed as a legal advisory chatbot system to assist users by answering questions related to Vietnamese law.The main data source includes legal documents collected from official Vietnamese legal websites.
https://github.com/zeuscoderbe/legal-chat-bot

cuda-toolkit docker fine-tuning flask llm mlops python

Last synced: about 1 month ago
JSON representation

This project is designed as a legal advisory chatbot system to assist users by answering questions related to Vietnamese law.The main data source includes legal documents collected from official Vietnamese legal websites.

Awesome Lists containing this project

README

        

# Legal Advisory Chatbot System
## Description

The **Legal Advisory Chatbot System** is designed to assist users by answering questions related to Vietnamese law. By leveraging advanced Natural Language Processing (NLP) techniques, this system can analyze and retrieve relevant legal documents from an extensive collection of official Vietnamese legal texts. The core functionality of the chatbot includes providing users with accurate and up-to-date legal information, helping them understand complex legal terminology, and offering clear, actionable advice based on Vietnamese legal documents. This system is highly valuable for individuals, businesses, and legal professionals seeking quick access to legal knowledge.

![Law_ChatBot (2)](https://github.com/user-attachments/assets/2a10c03a-3d00-4894-a2ce-f5591a7cd27f)

## Demo

![image](https://github.com/user-attachments/assets/443b303d-f11b-44ef-9299-63f5f13a5936)

## Model Fine-Tuned

**Link Model Bert Fine-Tuned for Extract : [quanghuy123/BERT-LAW](https://huggingface.co/quanghuy123/BERT-LAW)**

**Link Model Sentences Transformer for Embedding: [quanghuy123/LEGAL_EMBEDDING](https://huggingface.co/quanghuy123/LEGAL_EMBEDDING)**

**Author : Đặng Nguyễn Quang Huy**

## Features

- **Vietnamese Law Knowledge Base**: The chatbot is built upon a large dataset of Vietnamese legal documents, ensuring that the responses are based on reliable and official sources.
- **Contextual Understanding**: By utilizing advanced NLP models, the chatbot can understand and process complex legal queries, providing responses that are contextually relevant.
- **Efficient Legal Information Retrieval**: The system prioritizes relevant legal documents using state-of-the-art models to ensure that users get accurate and timely information.
- **User-Friendly Interface**: A seamless and interactive user experience through the chatbot, making it easy for anyone to get legal answers quickly.

## Data Collection & Processing

The core of the system relies on high-quality legal documents obtained from trusted Vietnamese government websites. The steps involved in data collection and processing include:

1. **Data Collection**: Using web scraping techniques, I gather legal articles, laws, regulations, and official government texts published on various legal websites.
2. **Data Cleaning**: After gathering the raw data, it undergoes a cleaning process to remove any noise, irrelevant content, or inconsistencies in the text.
3. **Data Organization**: The cleaned data is organized into structured formats, making it easier to process and query. This includes separating legal content into meaningful chunks (sections, paragraphs, etc.).
4. **Data Storage**: To enable fast and efficient retrieval, the processed data is stored in a vector database (Qdrant). Qdrant allows us to index the data based on vector embeddings, which significantly improves the search and retrieval process by matching user queries to relevant documents.

## Model Optimization

To optimize the chatbot's performance and improve the relevance of its responses, I employed the following techniques:

1. **BERT Fine-Tuning for Question-Answering (QA)**:BERT-Law is a fine-tuned version of BERT (Bidirectional Encoder Representations from Transformers), focusing on information extraction from legal documents. The model is specifically trained on a custom dataset called UTE_LAW, which consists of approximately 30,000 pairs of legal questions and related documents. The main goal of this model is to extract relevant information from legal text while reducing the costs associated with using third-party APIs.Additionally, the model supports Retrieval-Augmented Generation (RAG), which enhances its ability to handle smaller context windows, thereby optimizing API costs for third-party integrations. RAG is especially useful in scenarios where processing large legal documents in a single request might be inefficient or expensive.
2. **Re-Ranking Model**: A re-ranking model is implemented to reorder search results based on their relevance. By evaluating the context and query intent, this model ensures that the most relevant legal documents are presented to the user first.
3. **Embedding Model Optimization**: To improve the quality of search results, I optimized embedding models to map both legal documents and user queries into a shared vector space. This ensures that similar queries and documents are closely matched, improving the chatbot's ability to retrieve relevant content.

## Deployment

The Legal Advisory Chatbot is designed to be scalable, accessible, and easy to use. The deployment process involves the following:

1. **Flask API**: Flask is used to create a lightweight API that handles incoming requests from users, processes them through the chatbot model, and returns the most relevant responses.
2. **Docker for Deployment**: The entire application is containerized using Docker, which ensures easy deployment, scaling, and isolation. Docker allows the system to run consistently across different environments, from local development to production.

By using Flask and Docker, the chatbot can be easily deployed to any cloud platform or on-premise server, providing a reliable service for end-users.

## Technologies Used

- **LangChain**: A framework for building language model-driven applications. It helps manage the flow of data, queries, and responses in the system.
- **Qdrant**: A vector database designed for fast similarity search and document retrieval, making it an ideal solution for storing and querying legal documents.
- **Python**: The primary programming language used to develop the chatbot and integrate the various components.
- **Large Language Models (LLMs)**: Advanced language models like BERT are used to process and understand legal text and respond to user queries.
- **Transformers**: A library by Hugging Face that provides pre-trained models like BERT for fine-tuning and deployment.
- **Flask**: A micro web framework used to build the API that serves the chatbot responses.
- **Docker**: A containerization platform that ensures the chatbot is deployed and runs consistently across different environments.

## Future Enhancements

- **Multilingual Support**: Expand the chatbot's capabilities to handle queries in multiple languages, including regional dialects, to ensure a wider reach.
- **Improved User Interface**: Develop a web-based UI or mobile app for a more interactive and user-friendly experience.
- **Continuous Learning**: Implement mechanisms for continuous model training and updating with the latest legal documents to ensure the chatbot remains current and accurate.

## Conclusion

The **Legal Advisory Chatbot System** is an advanced AI-driven solution designed to provide instant, accurate, and actionable legal information based on Vietnamese law. Through the use of cutting-edge NLP techniques, vector databases, and scalable deployment tools, this system ensures that users have access to relevant legal knowledge at their fingertips. Whether you're a student, legal professional, or anyone seeking legal advice, this chatbot makes navigating the complex legal landscape of Vietnam easier and more efficient.