{"id":50262908,"url":"https://github.com/XMUDeepLIT/LegalGraphRAG","last_synced_at":"2026-06-13T03:00:42.151Z","repository":{"id":351437369,"uuid":"1182983194","full_name":"XMUDeepLIT/LegalGraphRAG","owner":"XMUDeepLIT","description":"[ACL'26 Main] Official code for \"LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning\".","archived":false,"fork":false,"pushed_at":"2026-04-23T07:53:44.000Z","size":13462,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-23T09:09:56.087Z","etag":null,"topics":["graphrag","legal","llms","rag"],"latest_commit_sha":null,"homepage":"https://www.researchgate.net/publication/403734810_LegalGraphRAG_Multi-Agent_Graph_Retrieval-Augmented_Generation_for_Reliable_Legal_Reasoning","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/XMUDeepLIT.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-16T06:39:39.000Z","updated_at":"2026-04-23T07:53:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/XMUDeepLIT/LegalGraphRAG","commit_stats":null,"previous_names":["xmudeeplit/legalgraphrag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/XMUDeepLIT/LegalGraphRAG","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XMUDeepLIT%2FLegalGraphRAG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/XMUDeepLIT%2FLegalGraphRAG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XMUDeepLIT%2FLegalGraphRAG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XMUDeepLIT%2FLegalGraphRAG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/XMUDeepLIT","download_url":"https://codeload.github.com/XMUDeepLIT/LegalGraphRAG/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XMUDeepLIT%2FLegalGraphRAG/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34270417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graphrag","legal","llms","rag"],"created_at":"2026-05-27T12:00:22.872Z","updated_at":"2026-06-13T03:00:42.145Z","avatar_url":"https://github.com/XMUDeepLIT.png","language":"Python","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"readme":"# **LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning**\n\n\u003e An evaluation framework for legal judgment prediction that integrates multi-agent graph retrieval and supports reproducible comparisons across multiple models and baselines.\n\n\u003c!-- \u003cp 
align=\"center\"\u003e\n  \u003ca href=\"https://www.researchgate.net/publication/403734810_LegalGraphRAG_Multi-Agent_Graph_Retrieval-Augmented_Generation_for_Reliable_Legal_Reasoning\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Paper-ResearchGate-blue?style=flat-square\" alt=\"Paper\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/DEEP-PolyU/LegalGraphRAG\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/GitHub-Project-181717?logo=github\u0026style=flat-square\" alt=\"GitHub\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e --\u003e\n\n---\n\n## 🚀 **Highlights**\n- ✅ **Automated Evaluation**: Computes `Accuracy (Acc)` and `Micro-F1` automatically for legal judgment prediction tasks.\n- ✅ **Multi-Model Support**: Supports Qwen, DeepSeek, GPT, InternLM, GLM, Gemma, and more.\n- ✅ **Dataset Coverage**: Includes legal datasets such as CAIL and CMDL.\n- ✅ **Baseline Comparison**: Enables direct comparison with `HippoRAG2`, `RAPTOR`, `LightRAG`, `LegalΔ`, and `ADAPT`.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/method.png\" width=\"95%\" alt=\"Framework Overview\"\u003e\n\u003c/p\u003e\n\n---\n\n## 🧩 **Project Structure**\n\n```text\nLegalGraphRAG/\n├── core/                      # Core modules\n│   ├── LegalGraphRAG.py       # Main LegalGraphRAG class\n│   ├── models/                # Model implementations\n│   │   ├── transformers/      # Transformers-based models (Qwen, InternLM, GLM, Gemma)\n│   │   └── openai/            # OpenAI-compatible models (DeepSeek, GPT)\n│   ├── graph_construct/       # Graph construction and management\n│   ├── judge/                 # Legal judgment modules\n│   ├── preprocess/            # Data preprocessing\n│   ├── prompt/                # Prompt templates\n│   └── utils/                 # Utility functions\n├── scripts/                   # Data preparation scripts\n├── raw_data/                  # User-provided source files for preprocessing\n├── datas/   
                  # Generated preprocessing outputs\n├── run.py                     # Main evaluation script\n├── env.example                # Configuration file template\n└── README.md                  # Project documentation\n```\n\n---\n\n## 🛠️ **Usage**\n\n### 1️⃣ Environment Setup\n\n```bash\n# Install dependencies\npip install -r requirements.txt\n\n# Copy and configure environment file\ncp env.example .env\n# Edit .env with model paths, API keys, and runtime settings\n```\n\n### 2️⃣ Data Preparation (CAIL Example)\n\nPut these source files under `./raw_data/`:\n\n- `final_test.json`: raw CAIL case records used to build the case corpus.\n- `law_to_crime.json`: base mapping from law article ids to candidate crimes.\n- `criminal_law_processed.json`: structured criminal law articles (article id + item texts).\n- `judicial_explanations.json`: judicial interpretation snippets linked to law article ids.\n- `law_corpus.jsonl`: full law text corpus used as a fallback when law text is missing.\n\nUse one command to prepare all required data:\n\n```bash\npython scripts/prepare_data.py --dotenv-path .env --raw-data-dir ./raw_data\n```\n\nThis pipeline does five things in order:\n\n- Builds sampled CAIL cases from raw records.\n- Generates the evaluation input file under `datasets/`.\n- Uses an LLM to extract structured case features.\n- Uses an LLM to generate law judgment dependency hints.\n- Merges law resources into the final project-ready law mapping data.\n\nAfter these steps, make sure all three files exist:\n\n- `datas/cases_with_feature.json`\n- `datasets/crime_data_CAIL_small.json`\n- `datas/law_to_crime.json`\n\n### 3️⃣ Run Evaluation\n\n```bash\npython run.py --model qwen3 --datasets CAIL --devices cuda:2 cuda:3\n```\n\n**Main arguments**\n\n- `--model`: `qwen3`, `qwen2_5`, `gemma3`, `internlm3`, `glm4`, `deepseek_v3`, `gpt4o_mini`\n- `--datasets`: dataset name, e.g. 
`CAIL`, `CMDL`\n- `--dotenv_path`: path to `.env` (default: `.env`)\n- `--datasets_path`: path to datasets (default: `./datasets`)\n- `--devices`: GPU devices, e.g. `cuda:0 cuda:1`\n- `--no-build-graph`: skip graph construction when graph already exists\n- `--force-rebuild`: force graph rebuild even if artifacts already exist\n\n### 4️⃣ Output Files\n\n- Prediction outputs:\n  - `{output_dir}/{dataset}/{model}_results_combined.json`\n- Statistics:\n  - `{output_dir}/{dataset}/{model}_stats.json`\n\nExample output summary:\n\n```json\n{\n  \"model_name\": \"qwen3\",\n  \"dataset\": \"CAIL\",\n  \"total_cases\": 1000,\n  \"correct_count\": 850,\n  \"elapsed_time\": 3600.0,\n  \"output_file\": \"./outputs/CAIL/qwen3_results_combined.json\"\n}\n```\n\n---\n\n## ⚙️ **Configuration**\n\nConfiguration is managed via `.env`. Key groups include:\n\n- **Model Configuration**: model names, devices, API keys, generation parameters\n- **Data Configuration**: dataset paths and output directory\n- **Graph Configuration**: graph construction and retrieval settings\n\nSee `env.example` for the full configuration list.\n\n---\n\n## 🎯 **Supported Models**\n\n- Qwen3-8B\n- Qwen2.5-7B-Instruct\n- DeepSeek-V3\n- GPT-4o-mini\n- InternLM3\n- GLM-4\n\n---\n\n## ⚡ **Multi-GPU Execution**\n\nRun on multiple GPUs by passing several devices:\n\n```bash\npython run.py --model qwen3 --datasets CAIL --devices cuda:0 cuda:1 cuda:2 cuda:3\n```\n\nCases are automatically distributed across the selected devices.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FXMUDeepLIT%2FLegalGraphRAG","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FXMUDeepLIT%2FLegalGraphRAG","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FXMUDeepLIT%2FLegalGraphRAG/lists"}