https://github.com/dmis-lab/law-and-order

Host: GitHub
URL: https://github.com/dmis-lab/law-and-order
Owner: dmis-lab
Created: 2025-05-10T23:28:38.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-05-16T07:15:09.000Z (about 1 year ago)
Last Synced: 2025-07-18T04:44:42.493Z (about 1 year ago)
Language: Python
Size: 71.3 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Law & Order

Benchmark Dataset for Evaluating Large Language Models in Policing

## Contributors

| Name | Affiliation | Email |
|------|-------------|-------|
| Heedou Kim | Korean National Police Agency; Data Mining and Information Systems Lab, Korea University, South Korea | heedou123@korea.ac.kr |
| Mogan Gim | Department of Biomedical Engineering, Hankuk University of Foreign Studies, South Korea | gimmogan@hufs.ac.kr |
| Donghee Choi | Department of Metabolism, Digestion and Reproduction, Imperial College London, United Kingdom | donghee.choi@imperial.ac.uk |
| Soonil Bae | Police Science Institute, Korea National Police University, South Korea | soonil.bae@police.go.kr |
| Miyoung Kim&ast; | Department of Computing Science, University of Alberta, Canada | miyoung2@ualberta.ca |
| Jaewoo Kang&ast; | Data Mining and Information Systems Lab, Korea University, South Korea | kangj@korea.ac.kr |

- &ast;: *Corresponding Author*

# How to Use the Dataset

```python
from datasets import load_dataset

# Criminal Hypothesis
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Criminal_Hypothesis")
print(ds["train"][0])       # first training example
print(ds["validation"][0])  # first validation example
print(ds["test"][0])        # first test example

# Statute Mapping
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Statute_Mapping")

# Element Analysis
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Element_Analysis")

# Fraudulent Intention Interpretation
# Note: the config name is spelled "Fradulent" in the source; it is kept verbatim.
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Fradulent_Intention_Interpretation")

# Fraudulent Scenario Completion
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Fradulent_Scenario_Completion")

# Case Analysis NER
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Case_Analysis_NER")

# Deceptive Message Analysis
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Deceptive_Message_Analysis")

# Offense Detection
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PO_Offense_Detection")

# Operational QA
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PO_Operational_QA")

# Emergency Reports Summarization
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PT_Emergency_Reports_Summarization")
```
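To avoid repeating the `load_dataset` call for every subset, the config names can be kept in one list and loaded in a loop. The following is a minimal sketch, not part of the original repository. The helper names `group_by_role` and `load_all` are hypothetical, and the reading of the prefixes (`CI`, `IA`, `PO`, `PT`) as the roles listed in the Benchmarks table is an assumption inferred from that table.

```python
from collections import defaultdict

# Config names copied verbatim from the usage snippet above
# (including the "Fradulent" spelling).
CONFIG_NAMES = [
    "CI_Criminal_Hypothesis",
    "CI_Statute_Mapping",
    "CI_Element_Analysis",
    "IA_Fradulent_Intention_Interpretation",
    "IA_Fradulent_Scenario_Completion",
    "IA_Case_Analysis_NER",
    "IA_Deceptive_Message_Analysis",
    "PO_Offense_Detection",
    "PO_Operational_QA",
    "PT_Emergency_Reports_Summarization",
]

# ASSUMPTION: prefix-to-role mapping inferred from the "LLM as" column
# of the Benchmarks table; the README does not state it explicitly.
ROLE_PREFIXES = {
    "CI": "Criminal Investigator",
    "IA": "Intelligence Analyst",
    "PO": "Police Officer",
    "PT": "Patrol Officer",
}


def group_by_role(names):
    """Group config names by their (assumed) role prefix."""
    groups = defaultdict(list)
    for name in names:
        prefix, _, task = name.partition("_")
        groups[ROLE_PREFIXES.get(prefix, prefix)].append(task)
    return dict(groups)


def load_all(repo_id="PSI-PAIRC/Law_and_Order"):
    """Load every config into a dict keyed by config name (needs network access)."""
    from datasets import load_dataset  # imported lazily so grouping works offline

    return {name: load_dataset(repo_id, name=name) for name in CONFIG_NAMES}


if __name__ == "__main__":
    for role, tasks in group_by_role(CONFIG_NAMES).items():
        print(f"{role}: {tasks}")
```

Calling `load_all()` downloads all ten subsets, so it may be preferable to load only the configs a given experiment needs.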

## Link to Dataset

https://huggingface.co/datasets/PSI-PAIRC/Law_and_Order

# Benchmarks

| LLM as                | Task                                | Metric            | GPT-4o | Gemini 2.0   | EEVE 10.8B   | SOLAR 10.7B   | Llama 3.1-8B   | Llama 3.2-1B   |
|-----------------------|-------------------------------------|-------------------|--------|--------------|--------------|---------------|----------------|----------------|
| Police Officer        | Operational QA                      | LLM-as-a-Judge    | 0.69   | 0.66         | 0.87         | 0.85          | 0.88           | 0.64           |
|                       | Offense Detection                   | ACC               | 0.86   | 0.86         | 0.87         | 0.98          | 0.50           | 0.21           |
|                       |                                     | F1                | 0.90   | 0.93         | 0.95         | 0.99          | 0.77           | 0.61           |
| Intelligence Analyst  | Fraudulent Scenario Detection       | ACC               | 0.97   | 0.87         | 0.99         | 0.99          | 0.86           | 0.63           |
|                       |                                     | F1                | 0.97   | 0.88         | 0.99         | 0.99          | 0.85           | 0.58           |
|                       | Fraudulent Scenario Completion      | LLM-as-a-Judge    | 0.70   | 0.66         | 0.67         | 0.71          | 0.71           | 0.64           |
|                       | Fraudulent Intention Interpretation | ACC               | 0.11   | 0.16         | 0.19         | 0.14          | 0.14           | 0.04           |
|                       |                                     | F1 (micro)        | 0.51   | 0.79         | 0.64         | 0.47          | 0.56           | 0.27           |
|                       | Deceptive Message Analysis          | ACC               | 0.88   | 0.93         | 0.97         | 0.99          | 0.97           | 0.88           |
|                       |                                     | F1 (macro)        | 0.70   | 0.76         | 0.91         | 0.98          | 0.95           | 0.73           |
|                       |                                     | F1 (micro)        | 0.88   | 0.93         | 0.97         | 0.99          | 0.97           | 0.88           |
|                       | Case Analysis NER                   | Precision         | 0.17   | 0.14         | 0.31         | 0.52          | 0.17           | 0.08           |
|                       |                                     | F1 (macro)        | 0.17   | 0.14         | 0.22         | 0.29          | 0.16           | 0.06           |
|                       |                                     | F1 (micro)        | 0.46   | 0.44         | 0.06         | 0.11          | 0.04           | 0.03           |
|                       |                                     | F1 (weighted avg) | 0.51   | 0.49         | 0.26         | 0.42          | 0.22           | 0.11           |
| Patrol Officer        | Emergency Reports Summarization     | LLM-as-a-Judge    | 0.89   | 0.75         | 0.62         | 0.56          | 0.51           | 0.20           |
| Criminal Investigator | Criminal Hypothesis                 | ACC               | 0.73   | 0.62         | 0.74         | 0.62          | 0.62           | 0.62           |
|                       |                                     | F1                | 0.79   | 0.77         | 0.79         | 0.77          | 0.77           | 0.77           |
|                       | Statute Mapping                     | ACC               | 0.43   | 0.40         | 0.86         | 0.88          | 0.19           | 0.07           |
|                       |                                     | F1                | 0.65   | 0.69         | 0.92         | 0.95          | 0.35           | 0.12           |
|                       | Element Analysis                    | ACC               | 0.67   | 0.81         | 0.66         | 0.71          | 0.64           | 0.12           |
|                       |                                     | F1                | 0.84   | 0.93         | 0.81         | 0.88          | 0.82           | 0.24           |
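The table reports accuracy together with macro-, micro-, and weighted-averaged F1. The README does not say how these were computed, so the following is only an illustrative sketch of the standard definitions for single-label classification, using hypothetical toy labels rather than benchmark data. In that setting micro-averaged F1 equals accuracy, which is consistent with the identical ACC and F1 (micro) rows for Deceptive Message Analysis. The function name `f1_report` is a placeholder.

```python
from collections import Counter


def f1_report(y_true, y_pred):
    """Accuracy and macro/micro/weighted F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_label = {l: f1(tp[l], fp[l], fn[l]) for l in labels}
    support = Counter(y_true)
    n = len(y_true)
    return {
        "accuracy": sum(tp.values()) / n,
        # macro: unweighted mean of per-label F1
        "f1_macro": sum(per_label.values()) / len(labels),
        # micro: F1 over counts pooled across all labels
        "f1_micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # weighted: per-label F1 weighted by true-label frequency
        "f1_weighted": sum(per_label[l] * support[l] for l in labels) / n,
    }


# Hypothetical toy labels, not taken from the dataset.
y_true = ["fraud", "fraud", "fraud", "normal", "normal", "spam"]
y_pred = ["fraud", "fraud", "normal", "normal", "normal", "normal"]
report = f1_report(y_true, y_pred)
print({k: round(v, 4) for k, v in report.items()})
```

Macro F1 drops well below micro F1 here because the rare `spam` class is never predicted, which is the kind of gap a macro average is meant to expose. For multi-label tasks the equality between micro F1 and accuracy does not hold in general.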

# Licensing Information

Licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International).