https://github.com/FudanDISC/DISC-LawLLM

DISC-LawLLM, an intelligent legal system utilizing large language models (LLMs) to provide a wide range of legal services
Host: GitHub
URL: https://github.com/FudanDISC/DISC-LawLLM
Owner: FudanDISC
License: apache-2.0
Created: 2023-09-21T11:37:35.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-04-22T12:04:06.000Z (over 1 year ago)
Last Synced: 2024-04-22T13:05:02.488Z (over 1 year ago)
Language: Python
Homepage:
Size: 43 MB
Stars: 404
Watchers: 10
Forks: 38
Open Issues: 19
Metadata Files:
- Readme: README-en.md
- License: LICENSE
Awesome Lists containing this project

StarryDivineSky - FudanDISC/DISC-LawLLM - [Law-SFT dataset](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT) (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)
Awesome-Domain-LLM - DISC-LawLLM
README

          


[ZH](./README.md) | EN

DISC-LawLLM

  

[![Generic badge](https://img.shields.io/badge/🤗-Huggingface%20Repo-green.svg)](https://huggingface.co/ShengbinYue/DISC-LawLLM)

[![license](https://img.shields.io/github/license/modelscope/modelscope.svg)](./LICENSE)

[Demo](http://law.fudan-disc.com) | [Technical Report](https://arxiv.org/abs/2309.11325)



DISC-LawLLM is a large language model specialized in the Chinese legal domain, developed and open-sourced by the [Fudan University Data Intelligence and Social Computing Lab (Fudan-DISC)](http://fudan-disc.com) to provide comprehensive intelligent legal services.

We will open-source the following resources in this project:

* [DISC-Law-SFT dataset](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT) 

* [DISC-LawLLM model weights](https://huggingface.co/ShengbinYue/DISC-LawLLM)

* [DISC-Law-Eval Benchmark](./eval/)

You can experience our DISC-LawLLM [online](http://law.fudan-disc.com).

## News

**[2024/10/15]** 🎉 We released the [legal Q&A part](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT) of DISC-Law-SFT (DISC-Law-SFT-Pair-QA-released.jsonl and DISC-Law-SFT-Triplet-QA-released.jsonl).

**[2024/03/15]** 🎉🥳✨ Our paper "[LawLLM: Intelligent Legal System with Legal Reasoning and Verifiable Retrieval](https://link.springer.com/chapter/10.1007/978-981-97-5569-1_19)" is accepted as a LONG PAPER for the Research Track at DASFAA 2024 (**CCF-B**). ✨

**[2023/12/20]** 🎉 We have evaluated DISC-LawLLM on the latest benchmark, [LawBench](https://github.com/open-compass/LawBench). [Our performance](#model-performance-on-lawbench) is second only to **GPT-4**, surpassing **GPT-3.5** and all other existing LLMs in the law domain.

**[2023/11/20]** 🎉 We have open sourced the evaluation scripts of our DISC-Law-Eval Benchmark. You can view more details [here](./eval/README.md).

**[2023/10/19]** We have open sourced the [evaluation datasets](./eval/datasets/) (including reference outputs) of our DISC-Law-Eval Benchmark.

**[2023/09/25]** DISC-LawLLM v1.0 has been officially released, with the [DISC-LawLLM-13B model weights](https://huggingface.co/ShengbinYue/DISC-LawLLM) and the [DISC-Law-SFT dataset](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT) made open source.

## Table of Contents

- [Overview](#overview)

- [Inference and Deployment](#inference-and-deployment)

- [Model Fine-tuning](#model-fine-tuning)

- [DISC-Law-Eval Benchmark](#disc-law-eval-benchmark)

- [Acknowledgements](#acknowledgements)

- [Disclaimer](#disclaimer)

- [Citation](#citation)

- [License](#license)

## Overview

![Image](./images/model_en.png)



DISC-LawLLM is a large language model designed to provide professional, intelligent, and comprehensive legal services. It caters to different user groups and offers assistance in various scenarios, with the following main features:

* **Legal text processing abilities:** DISC-LawLLM can comprehend legal knowledge and generate text grounded in legal documents. Its main functionalities include information extraction and text summarization, among others. These abilities come from fine-tuning on publicly available NLP datasets for Chinese legal tasks and on real-world legal texts.

* **Legal reasoning abilities:** To meet the requirements of tasks in smart legal services, DISC-LawLLM is trained with specialized legal reasoning techniques. It leverages legal syllogism, a theory of legal reasoning, to enhance its reasoning capabilities in the Chinese legal domain.

* **Compliance with Chinese legal domain knowledge:** DISC-LawLLM is augmented with a retrieval module, strengthening its ability to retrieve, comprehend, and adhere to background Chinese legal knowledge.

In addition to these features, we have made the following contributions during our research behind DISC-LawLLM:

* **High-Quality training datasets and universally effective training paradigms**

* **Comprehensive Chinese legal model evaluation framework and evaluation datasets**

### Model Performance on Lawbench

DISC-LawLLM's performance on [LawBench](https://github.com/open-compass/LawBench) is second only to GPT-4, surpassing all other existing LLMs in the law domain. Below is the average performance (zero-shot and one-shot) of DISC-LawLLM and other LLMs evaluated on LawBench.

#### Zero-shot Performance

![lawbench1](https://github.com/FudanDISC/DISC-LawLLM/assets/82264449/50600757-262a-4acb-9873-a867f03c42d8)

#### One-shot Performance

![lawbench2](https://github.com/FudanDISC/DISC-LawLLM/assets/82264449/f0d0c945-ab39-48a2-a452-7489885c968a)

### Demonstration

#### Legal consultation

![consult_demo](./images/example_consult.gif)

#### Agreement writing

![document_demo](./images/example_document.gif)

#### Professional judicial tools

![tool_demo](./images/example_tool.gif)

#### Examination Assistant

![exam_ref_demo](./images/example_exam_ref.gif)

#### Law retrieval

![law_ref_demo](./images/example_law_ref.gif)

#### Legal consultation with retrieval module

![consult_ref_demo](./images/example_consult_ref.gif)

### DISC-Law-SFT Dataset

Intelligent applications in the Chinese legal domain often require a combination of abilities across different scenarios, including legal text understanding and generation. To support this, we have constructed a high-quality supervised fine-tuning dataset called DISC-Law-SFT. The dataset covers different judicial application scenarios and includes a wide variety of tasks such as legal information extraction, legal judgment prediction, legal document summarization, and legal question answering. DISC-Law-SFT comprises two subsets, DISC-Law-SFT-Pair and DISC-Law-SFT-Triplet. The former aims to introduce legal reasoning abilities to the LLM, while the latter helps enhance the model's capability to utilize external legal knowledge. For more detailed information, please refer to our [technical report](https://arxiv.org/abs/2309.11325). The distribution of the dataset is as follows:



  

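| Dataset | Task/Source | Size | Scenario |
|:--------|:------------|:----:|:---------|
| DISC-Law-SFT-Pair | Legal information extraction | 32K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal event detection | 27K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal case classification | 20K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal judgement prediction | 11K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal case matching | 8K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal text summarization | 9K | Legal professional assistant |
| DISC-Law-SFT-Pair | Judicial public opinion summarization | 6K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal question answering | 93K | Legal consultation services |
| DISC-Law-SFT-Pair | Legal reading comprehension | 38K | Judicial examination assistant |
| DISC-Law-SFT-Pair | Judicial examination | 12K | Judicial examination assistant |
| DISC-Law-SFT-Triplet | Legal judgement prediction | 16K | Legal professional assistant |
| DISC-Law-SFT-Triplet | Legal question answering | 23K | Legal consultation services |
| General | Alpaca-GPT4 | 48K | General scenarios |
| General | Firefly | 60K | General scenarios |
| Total | | 403K | |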

  

We have released nearly 300K training examples in total, covering both the DISC-Law-SFT-Pair and DISC-Law-SFT-Triplet subsets. They are currently available from this [link](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT).
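
Before training on the released files, it can help to check how many records each file holds and which fields they carry. The following is a minimal sketch, assuming the `.jsonl` files named in the News section have been downloaded to the working directory. It makes no assumption about field names; `summarize_jsonl` is a hypothetical helper, not part of this project.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSONL file."""
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            record = json.loads(line)
            count += 1
            keys.update(record.keys())
    return count, keys
```

For example, `summarize_jsonl("DISC-Law-SFT-Pair-QA-released.jsonl")` returns the number of records together with how often each top-level key occurs.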

## Retrieval Module

We have augmented DISC-LawLLM with a retrieval module built on the open-source retrieval framework [Langchain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat). Our knowledge base currently includes repositories of legal provisions, judicial documents, and judicial examinations.

* The repository of legal provisions and judicial documents includes over 800 national and local Chinese laws, regulations, and provisions. It covers *Constitution*, *Criminal Law*, *Administrative Procedure*, *Labor Law*, *Copyright Law*, *Civil Code*, *Patent Law*, *Law on the Exclusive Economic Zone and the Continental Shelf*, *Measures for the Election of Deputies from the Chinese People's Liberation Army to the National People's Congress and Local People's Congresses at or above the County Level*, *Anti-Secession Law*, *Regulation on the Administration of the Entry and Exit of Foreign Nationals*, *Provisions of State Council for Encouraging Taiwan Compatriots to Invest*, *Provisions on the Administration of Religious Activities of Aliens Within the Territory*, and more.

* The repository for judicial examinations includes 24K problems related to Chinese legal knowledge.

In the future, we will continuously expand the knowledge base of our retrieval module. Furthermore, we will continue to explore and enhance the retrieval system of DISC-LawLLM. This may include, but is not limited to, mechanisms for joint training of the retrieval module and the LLM. If you are interested, we welcome further discussions and collaboration in this regard.
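
To illustrate the general retrieve-then-prompt idea, here is a toy sketch. It is not the Langchain-Chatchat pipeline used by this project: it ranks candidate provisions by character-bigram overlap instead of a real embedding-based retriever, and the prompt template, function names, and sample provisions are all assumptions made for illustration.

```python
def char_bigrams(text):
    """Character bigrams: a crude tokenizer that also works for Chinese text."""
    text = "".join(text.split())
    return {text[i:i + 2] for i in range(len(text) - 1)}


def retrieve(query, provisions, top_k=2):
    """Rank provisions by Jaccard overlap of character bigrams with the query."""
    q = char_bigrams(query)
    scored = []
    for provision in provisions:
        b = char_bigrams(provision)
        union = q | b
        score = len(q & b) / len(union) if union else 0.0
        scored.append((score, provision))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [provision for score, provision in scored[:top_k] if score > 0]


def build_prompt(query, references):
    """Prepend retrieved references to the question (hypothetical template)."""
    ref_block = "\n".join(f"[{i + 1}] {r}" for i, r in enumerate(references))
    return f"References:\n{ref_block}\n\nQuestion: {query}"
```

The resulting prompt string can then be sent as the `content` of a user message, so that the model answers with the retrieved provisions in view.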

## Inference and Deployment

The open-source DISC-LawLLM is fine-tuned based on [Baichuan-13B-Base](https://github.com/baichuan-inc/Baichuan-13B). Our model weights can be downloaded directly from [Hugging Face](https://huggingface.co/ShengbinYue/DISC-LawLLM), or obtained automatically from the example code below. First of all, please install the dependencies required for this project:

```

pip install -r requirements.txt

```

### Python

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Hugging Face repo ID; the weights are downloaded automatically on first use.
model_path = "ShengbinYue/DISC-LawLLM"

# trust_remote_code=True is required because the repo ships custom model code.
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)
model.generation_config = GenerationConfig.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(
    model_path, use_fast=False, trust_remote_code=True,
)

# Question: "How is the crime of producing and selling counterfeit or shoddy goods sentenced?"
messages = [
    {"role": "user", "content": "生产销售假冒伪劣商品罪如何判刑？"},
]
response = model.chat(tokenizer, messages)
print(response)
```
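
For multi-turn conversations, the message list has to carry the earlier turns. The helper below is a small sketch of that bookkeeping. It assumes that model replies are stored under the role `"assistant"`, following the Baichuan-13B chat convention, and `chat_turn` itself is a hypothetical name, not part of this project.

```python
def chat_turn(chat_fn, history, user_text):
    """Append the user turn, call the model, record its reply, and return it.

    chat_fn is any callable that maps a message list to a reply string.
    """
    history.append({"role": "user", "content": user_text})
    reply = chat_fn(history)
    # Assumption: replies are stored with the "assistant" role.
    history.append({"role": "assistant", "content": reply})
    return reply
```

With the model loaded as above, `chat_fn` could be `lambda msgs: model.chat(tokenizer, msgs)`, and the same `history` list is reused across turns.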

### Command Line Tool

```

python cli_demo.py

```

### Web Demo

The following command starts a web server based on Streamlit. The console prints an address that you can open in your browser:

```

streamlit run web_demo.py --server.port 8888

```

### Deployment

The current version of DISC-LawLLM is fine-tuned based on the Baichuan-13B model, so you can refer to [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) documentation for information on deploying int8 or int4 quantized inference and CPU deployment.
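
As a rough guide to why quantization matters for deployment, the sketch below estimates the memory needed for the model weights alone at different precisions. It assumes a nominal 13 billion parameters (taken from the "13B" name) and ignores activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 13e9  # assumption: nominal size implied by the "13B" name

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Halving the bits per parameter halves the weight footprint, which is what makes int8 or int4 inference feasible on smaller GPUs.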

## Model Fine-Tuning

Developers can fine-tune DISC-LawLLM for specialized use. To do this, you can refer to [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning) or our [DISC-MedLLM](https://github.com/FudanDISC/DISC-MedLLM) medical model, which are compatible with DISC-LawLLM for fine-tuning. Here we will take [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning) as an example, and provide reference scripts for both **full** and **LoRA** fine-tuning.

First, download [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning) and follow its [instructions](https://github.com/hiyouga/LLaMA-Efficient-Tuning#getting-started) to install the required dependencies. Example scripts are given in the subsections below. Note that the training data must first be converted to the format specified by that project.
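
As an illustration of the data preparation step, the sketch below converts a JSONL file into an Alpaca-style JSON list with `instruction`, `input`, and `output` fields. The source field names `"input"` and `"output"` are assumptions about your data, so adjust them to match your files. Both function names are hypothetical, and the fine-tuning project's own documentation remains the reference for the exact format and for registering a custom dataset.

```python
import json


def to_alpaca(record, question_key="input", answer_key="output"):
    """Map one source record to an Alpaca-style example."""
    return {
        "instruction": record[question_key],
        "input": "",
        "output": record[answer_key],
    }


def convert_jsonl_to_alpaca(src_path, dst_path, **key_names):
    """Read a JSONL file and write a JSON list of Alpaca-style examples."""
    with open(src_path, encoding="utf-8") as src:
        examples = [
            to_alpaca(json.loads(line), **key_names) for line in src if line.strip()
        ]
    with open(dst_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False keeps Chinese text readable in the output file.
        json.dump(examples, dst, ensure_ascii=False, indent=2)
    return len(examples)
```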

### Full Fine-Tuning

We tested full fine-tuning on 8 × NVIDIA A800 80 GB GPUs with DeepSpeed. The script is as follows:

```
deepspeed --num_gpus=8 src/train_bash.py \
    --stage sft \
    --model_name_or_path ShengbinYue/DISC-LawLLM \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template baichuan \
    --finetuning_type full \
    --output_dir path_to_your_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --preprocessing_num_workers 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 2.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16 \
    --deepspeed deepspeed.json
```

The `deepspeed.json` configuration is as follows:

```json
{
    "train_micro_batch_size_per_gpu": "auto",
    "zero_allow_untested_optimizer": true,
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 16,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": true,
        "allgather_bucket_size": 5e8,
        "overlap_comm": false,
        "reduce_scatter": true,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": true
    }
}
```

### LoRA Fine-Tuning

We tested LoRA fine-tuning on 4 × NVIDIA A800 80 GB GPUs. The script is as follows:

```
torchrun --nproc_per_node 4 src/train_bash.py \
    --stage sft \
    --model_name_or_path ShengbinYue/DISC-LawLLM \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_rank 8 \
    --lora_target W_pack \
    --output_dir path_to_your_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --preprocessing_num_workers 16 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 1e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 2.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16
```
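
When adapting either script to a different number of GPUs, it is useful to know the effective (global) batch size the settings imply, since it is the product of the GPU count, the per-device batch size, and the gradient accumulation steps. The short sketch below works this out for the two reference configurations; the function name is only illustrative.

```python
def effective_batch_size(num_gpus, per_device_batch, grad_accum_steps):
    """Number of examples contributing to each optimizer step."""
    return num_gpus * per_device_batch * grad_accum_steps


full_ft = effective_batch_size(num_gpus=8, per_device_batch=4, grad_accum_steps=8)
lora_ft = effective_batch_size(num_gpus=4, per_device_batch=4, grad_accum_steps=8)
print(full_ft, lora_ft)  # 256 128
```

If you train on fewer GPUs, raising `--gradient_accumulation_steps` proportionally keeps the effective batch size unchanged.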

## DISC-Law-Eval Benchmark

Inspired by the composition of judicial examinations, we developed a fair and comprehensive evaluation framework called the DISC-Law-Eval Benchmark. This framework assesses the performance of LLMs in the Chinese legal domain from both objective and subjective perspectives. More details about the DISC-Law-Eval Benchmark are available [here](./eval/README-en.md). We also provide a Python package for **M**ultilevel **L**egal **LLM** called [ml3m](https://github.com/Charlie-XIAO/ml3m), with documentation available [here](https://charlie-xiao.github.io/ml3m/).

### Objective Evaluation

*Note: Throughout this project, we use the term "single-choice question" for a multiple-choice question with a single correct option, and the term "multiple-choice question" only for a multiple-choice question with more than one correct option.*

To objectively and quantitatively evaluate the legal knowledge and reasoning abilities of LLMs in the Chinese legal domain, the objective evaluation dataset consists of single-choice and multiple-choice questions drawn from standardized Chinese judicial examinations and legal knowledge competitions. The questions are categorized into three levels of difficulty (hard, medium, and easy) based on contextual and deductive complexity. This dataset provides a more challenging and reliable way to measure whether a model can use its knowledge of the Chinese legal domain to deduce correct answers. The chosen options are extracted from the model responses by [a carefully designed set of regular expressions](./eval/src/eval.py#L5) and compared with the standard solutions; performance is the percentage of correctly answered questions. Our objective evaluation datasets are available [here](./eval/datasets/objective/). Details are as follows:

  

| Subject | Difficulty | Size (Single Choice) | Size (Multiple Choice) | Size (Total) |
|:--------|:----------:|:--------------------:|:----------------------:|:------------:|
| NJE: China's Unified Qualification Exam for Legal Professionals | Hard | 537 | 463 | 1000 |
| PAE: Qualification Exam of Patent Attorney | Hard | 118 | 276 | 394 |
| CPA: Qualification Exam of Certified Public Accountant | Hard | 197 | 120 | 317 |
| UNGEE: China's Unified Entrance Examination for Master of Laws | Medium | 320 | 87 | 407 |
| LBK: Repository of Fundamental Chinese Legal Knowledge | Easy | 275 | - | 275 |
| PFE: Legal Examination for Public Service | Easy | 170 | - | 170 |
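
The extract-and-compare procedure described above can be sketched as follows. This is a simplified illustration, not the project's actual regular expressions (those live in `eval/src/eval.py`). It assumes options are labeled A to D and that a question counts as correct only when the extracted set of options exactly matches the reference set.

```python
import re

# Runs of option letters that are not part of a longer Latin-letter word.
OPTION_RUN = re.compile(r"(?<![A-Za-z])[A-D]+(?![A-Za-z])")


def extract_choices(response):
    """Return the set of option letters (A-D) found in a model response."""
    return set("".join(OPTION_RUN.findall(response)))


def accuracy(responses, answers):
    """Percentage of questions whose extracted options equal the reference."""
    correct = sum(
        extract_choices(resp) == set(ans) for resp, ans in zip(responses, answers)
    )
    return 100.0 * correct / len(answers)
```

For instance, `extract_choices("答案是A和C")` yields `{"A", "C"}`, which matches a reference answer of `"AC"` under the exact-match rule assumed here.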

  

### Subjective Evaluation

For subjective evaluation, we assess the LLMs in the question-answering format to simulate the process of subjective examination. We have manually constructed a high-quality test set by sourcing data from legal consultations, online forums, judicial-related publications, and Chinese judicial documents. We use GPT-3.5 Turbo as the referee model to evaluate the model's outputs and provide scores ranging from 1 to 5 based on three criteria: accuracy, completeness, and clarity with respect to standard answers. Details of these criteria can be found in the [ml3m documentation](https://charlie-xiao.github.io/ml3m/modules/ml3m.qa.html#ml3m.qa.QaOpenAIEvaluator).

The subjective evaluation dataset is a high-quality test set comprising 300 examples. It covers various scenarios, including legal knowledge QA, legal consultations, and legal judgment predictions. These examples are manually curated from legal consultations, online posts, judicial-related publications, and legal documents. You can check [here](./eval/datasets/subjective/) for our subjective evaluation datasets.

### Evaluation Results

We applied the few-shot approach for objective evaluation. The following results represent the accuracy (%) of the models on objective questions. S stands for single-choice questions, and M stands for multiple-choice questions.

| Model | NJE (S) | NJE (M) | PAE (S) | PAE (M) | CPA (S) | CPA (M) | UNGEE (S) | UNGEE (M) | PFE (S) | LBK (S) | Average |
|:----------------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| ChatGLM | 31.66 | 1.08 | 27.97 | 2.90 | 37.06 | 13.33 | 39.69 | 20.69 | 37.65 | 42.91 | 24.66 |
| Baichuan-Chat | 31.47 | 10.15 | 29.66 | 8.70 | 35.53 | 19.17 | 50.00 | 27.59 | 53.12 | 53.45 | 30.78 |
| Chinese-Alpaca-2 | 25.70 | 10.15 | 30.51 | 11.59 | 32.99 | 19.17 | 40.94 | 21.84 | 44.12 | 43.27 | 26.73 |
| GPT-3.5-turbo | 36.50 | 10.58 | 37.29 | 17.03 | **42.13** | **21.67** | **51.25** | **28.74** | 53.53 | 54.18 | 34.10 |
| LexiLaw | 20.11 | 7.56 | 23.73 | 10.14 | 24.87 | 19.17 | 31.56 | 16.09 | 31.76 | 40.36 | 21.50 |
| LawGPT | 22.91 | 6.26 | 31.36 | 7.61 | 25.38 | 16.67 | 30.31 | 13.79 | 34.71 | 29.09 | 20.60 |
| Lawyer LLaMa | 35.75 | 5.62 | 32.20 | 6.52 | 29.95 | 13.33 | 32.50 | 14.94 | 39.41 | 39.64 | 25.05 |
| ChatLaw | 27.56 | 7.99 | 31.36 | 9.42 | 35.53 | 11.67 | 35.62 | 17.24 | 42.35 | 41.09 | 25.20 |
| DISC-LawLLM | **42.09** | **19.87** | **40.68** | **18.48** | 39.59 | 19.17 | 50.94 | 25.29 | **57.06** | **54.91** | **37.10** |
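
The Average column appears to be weighted by the number of questions in each subset rather than being a plain mean of the ten columns. The sketch below checks this reading for the DISC-LawLLM row, using the subset sizes from the objective dataset table. This interpretation is inferred from the published numbers and is not stated in the source, and small differences are expected because the per-subset accuracies are already rounded.

```python
# Question counts per subset, taken from the objective dataset table.
SIZES = {
    "NJE (S)": 537, "NJE (M)": 463, "PAE (S)": 118, "PAE (M)": 276,
    "CPA (S)": 197, "CPA (M)": 120, "UNGEE (S)": 320, "UNGEE (M)": 87,
    "PFE (S)": 170, "LBK (S)": 275,
}

# Accuracy (%) of DISC-LawLLM per subset, taken from the results table.
DISC_LAWLLM = {
    "NJE (S)": 42.09, "NJE (M)": 19.87, "PAE (S)": 40.68, "PAE (M)": 18.48,
    "CPA (S)": 39.59, "CPA (M)": 19.17, "UNGEE (S)": 50.94, "UNGEE (M)": 25.29,
    "PFE (S)": 57.06, "LBK (S)": 54.91,
}


def weighted_average(accuracies, sizes=SIZES):
    """Average accuracy weighted by the number of questions in each subset."""
    total = sum(sizes.values())
    return sum(accuracies[name] * sizes[name] for name in sizes) / total


print(round(weighted_average(DISC_LAWLLM), 2))
```

The result agrees with the reported 37.10 to within rounding error, whereas an unweighted mean of the ten columns would give a noticeably different value.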

The results of subjective evaluation are as follows. Each score is on the scale of 1 to 5.

| Model | Accuracy | Completeness | Clarity | Average |
|:----------------:|:----:|:----:|:----:|:----:|
| ChatGLM | 2.64 | 2.75 | 3.23 | 2.87 |
| Baichuan-Chat | 3.22 | 3.34 | 3.18 | 3.25 |
| Chinese-Alpaca-2 | 3.13 | 3.23 | 3.17 | 3.17 |
| LexiLaw | 3.06 | 2.62 | 3.00 | 2.90 |
| LawGPT | 3.02 | 2.58 | 2.96 | 2.86 |
| Lawyer LLaMa | 3.13 | 2.83 | 3.35 | 3.10 |
| ChatLaw | 3.31 | 2.90 | 3.35 | 3.19 |
| DISC-LawLLM | 3.46 | 3.12 | 3.59 | 3.39 |
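
The Average column here is consistent with the unweighted mean of the three criterion scores, up to rounding of the published values. A minimal sketch of that calculation, using two rows copied from the table (the function name is only illustrative):

```python
def criteria_mean(accuracy, completeness, clarity):
    """Unweighted mean of the three subjective criterion scores."""
    return (accuracy + completeness + clarity) / 3


# (accuracy, completeness, clarity, reported average) from the table above.
ROWS = {
    "ChatGLM": (2.64, 2.75, 3.23, 2.87),
    "DISC-LawLLM": (3.46, 3.12, 3.59, 3.39),
}

for model, (acc, comp, clar, reported) in ROWS.items():
    print(model, round(criteria_mean(acc, comp, clar), 2), reported)
```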

## Acknowledgements

This project is built upon the following open-source projects, and we would like to express our sincere gratitude to the respective projects and developers:

- [**Baichuan-13B**](https://github.com/baichuan-inc/Baichuan-13B)

- [**Langchain-Chatchat**](https://github.com/chatchat-space/Langchain-Chatchat)

- [**LLaMA Efficient Tuning**](https://github.com/hiyouga/LLaMA-Efficient-Tuning)

- [**FireFly**](https://github.com/yangjianxin1/Firefly)

We also extend our gratitude to the other contributors who have provided valuable assistance to this project but are not explicitly listed due to limited space.

## Disclaimer

DISC-LawLLM comes with issues and limitations that current LLMs have yet to overcome. While it can provide Chinese legal services across a wide variety of tasks and scenarios, the model should be used for reference purposes only and cannot replace professional lawyers and legal experts. We encourage users of DISC-LawLLM to evaluate the model critically. We do not take responsibility for any issues, risks, or adverse consequences that may arise from the use of DISC-LawLLM.

## Citation

If our project has been helpful for your research and work, please kindly cite our work as follows:

```
@misc{yue2023disclawllm,
    title={DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services},
    author={Shengbin Yue and Wei Chen and Siyuan Wang and Bingxuan Li and Chenchen Shen and Shujun Liu and Yuxuan Zhou and Yao Xiao and Song Yun and Xuanjing Huang and Zhongyu Wei},
    year={2023},
    eprint={2309.11325},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

@inproceedings{yue2024lawllm,
    title={LawLLM: Intelligent Legal System with Legal Reasoning and Verifiable Retrieval},
    author={Yue, Shengbin and Liu, Shujun and Zhou, Yuxuan and Shen, Chenchen and Wang, Siyuan and Xiao, Yao and Li, Bingxuan and Song, Yun and Shen, Xiaoyu and Chen, Wei and others},
    booktitle={International Conference on Database Systems for Advanced Applications},
    pages={304--321},
    year={2024},
    organization={Springer}
}
```

## License

DISC-LawLLM is available under the Apache License. See the [LICENSE](./LICENSE) file for more information.
