https://github.com/dmis-lab/law-and-order
https://github.com/dmis-lab/law-and-order
Last synced: 10 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/dmis-lab/law-and-order
- Owner: dmis-lab
- Created: 2025-05-10T23:28:38.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-16T07:15:09.000Z (about 1 year ago)
- Last Synced: 2025-07-18T04:44:42.493Z (11 months ago)
- Language: Python
- Size: 71.3 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Law & Order
Benchmark Dataset for Evaluating Large Language Models in Policing
## Contributors
Name
Affiliation
Email
Heedou Kim
Korean National Police Agency
Data Mining and Information Systems Lab,
Korea University, South Korea
heedou123@korea.ac.kr
Mogan Gim
Department of Biomedical Engineering,
Hankuk University of Foreign Studies, South Korea
gimmogan@hufs.ac.kr
Donghee Choi
Department of Metabolism, Digestion and Reproduction,
Imperial College London, United Kingdom
donghee.choi@imperial.ac.uk
Soonil Bae
Police Science Institute,
Korea National Police University, South Korea
soonil.bae@police.go.kr
Miyoung Kim*
Department of Computing Science,
University of Alberta, Canada
miyoung2@ualberta.ca
Jaewoo Kang*
Data Mining and Information Systems Lab,
Korea University, South Korea
kangj@korea.ac.kr
- *: *Corresponding Author*
# How to Use Dataset
```python
from datasets import load_dataset
# Criminal Hypothesis
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Criminal_Hypothesis")
print(ds["train"][0])
print(ds["validation"][0])
print(ds["test"][0])
# Statute_Mapping
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Statute_Mapping")
# Element_Analysis
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="CI_Element_Analysis")
# Fradulent_Intention_Interpretation
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Fradulent_Intention_Interpretation")
# Fradulent_Scenario_Completion
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Fradulent_Scenario_Completion")
# Case_Analysis_NER
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Case_Analysis_NER")
# Deceptive_Message_Analysis
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="IA_Deceptive_Message_Analysis")
# Offense_Detection
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PO_Offense_Detection")
# Operational_QA
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PO_Operational_QA")
# Emergency_Reports_Summarization
ds = load_dataset("PSI-PAIRC/Law_and_Order", name="PT_Emergency_Reports_Summarization")
```
## Link to Dataset
https://huggingface.co/datasets/PSI-PAIRC/Law_and_Order
# Benchmarks
| LLM as | Task | Metric | GPT4o | Gemini 2.0 | EEVE 10.8B | SOLAR 10.7B | Llama 3.1-8B | Llama 3.2-1B |
|-----------------------|-------------------------------------|-------------------|--------|--------------|--------------|---------------|----------------|----------------|
| Police Officer | Operational QA | LLM-as-a-Judge | 0.69 | 0.66 | 0.87 | 0.85 | 0.88 | 0.64 |
| | Offense Detection | ACC | 0.86 | 0.86 | 0.87 | 0.98 | 0.50 | 0.21 |
| | | F1 | 0.90 | 0.93 | 0.95 | 0.99 | 0.77 | 0.61 |
| Intelligence Analyst | Fraudulent Scenario Detection | ACC | 0.97 | 0.87 | 0.99 | 0.99 | 0.86 | 0.63 |
| | | F1 | 0.97 | 0.88 | 0.99 | 0.99 | 0.85 | 0.58 |
| | Fraudulent Scenario Completion | LLM-as-a-Judge | 0.70 | 0.66 | 0.67 | 0.71 | 0.71 | 0.64 |
| | Fraudulent Intention Interpretation | ACC | 0.11 | 0.16 | 0.19 | 0.14 | 0.14 | 0.04 |
| | | F1 (micro) | 0.51 | 0.79 | 0.64 | 0.47 | 0.56 | 0.27 |
| | Deceptive Message Analysis | ACC | 0.88 | 0.93 | 0.97 | 0.99 | 0.97 | 0.88 |
| | | F1 (macro) | 0.70 | 0.76 | 0.91 | 0.98 | 0.95 | 0.73 |
| | | F1 (micro) | 0.88 | 0.93 | 0.97 | 0.99 | 0.97 | 0.88 |
| | Case Analysis NER | Precision | 0.17 | 0.14 | 0.31 | 0.52 | 0.17 | 0.08 |
| | | F1 (macro) | 0.17 | 0.14 | 0.22 | 0.29 | 0.16 | 0.06 |
| | | F1 (micro) | 0.46 | 0.44 | 0.06 | 0.11 | 0.04 | 0.03 |
| | | F1 (weighted avg) | 0.51 | 0.49 | 0.26 | 0.42 | 0.22 | 0.11 |
| Patrol Officer | Emergency Reports Summarization | LLM-as-a-Judge | 0.89 | 0.75 | 0.62 | 0.56 | 0.51 | 0.20 |
| Criminal Investigator | Criminal Hypothesis | ACC | 0.73 | 0.62 | 0.74 | 0.62 | 0.62 | 0.62 |
| | | F1 | 0.79 | 0.77 | 0.79 | 0.77 | 0.77 | 0.77 |
| | Statute Mapping | ACC | 0.43 | 0.40 | 0.86 | 0.88 | 0.19 | 0.07 |
| | | F1 | 0.65 | 0.69 | 0.92 | 0.95 | 0.35 | 0.12 |
| | Element Analysis | ACC | 0.67 | 0.81 | 0.66 | 0.71 | 0.64 | 0.12 |
| | | F1 | 0.84 | 0.93 | 0.81 | 0.88 | 0.82 | 0.24 |
# Licensing Information
Licensed under the CC BY-NC 4.0