{"id":50792837,"url":"https://github.com/cryptojones/selma","last_synced_at":"2026-06-12T12:02:24.101Z","repository":{"id":357780919,"uuid":"1238050268","full_name":"CryptoJones/SELMA","owner":"CryptoJones","description":"An Open-Source Model Trained for Law Enforcement","archived":false,"fork":false,"pushed_at":"2026-06-07T08:37:33.000Z","size":955,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-07T10:21:14.717Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CryptoJones.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-13T19:06:15.000Z","updated_at":"2026-06-07T08:37:37.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/CryptoJones/SELMA","commit_stats":null,"previous_names":["cryptojones/selma"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CryptoJones/SELMA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CryptoJones%2FSELMA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CryptoJones%2FSELMA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CryptoJones%2FSELMA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CryptoJones%2FSELMA/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CryptoJones","download_url":"https://codeload.github.com/CryptoJones/SELMA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CryptoJones%2FSELMA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34243053,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-12T12:01:45.548Z","updated_at":"2026-06-12T12:02:23.894Z","avatar_url":"https://github.com/CryptoJones.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SELMA — Specified Encapsulated Limitless Memory Archive\n\n\u003e **Before deploying in an operational context, read [LIMITATIONS.md](LIMITATIONS.md).**\n\n\n**An Open-Source Model Trained for Law 
Enforcement**\n\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg?logo=apache)](https://opensource.org/licenses/Apache-2.0)\n[![HuggingFace](https://img.shields.io/badge/HuggingFace-Ronin48LLC%2Fselma--lora--adapter-FFD21E?logo=huggingface\u0026logoColor=000)](https://huggingface.co/Ronin48LLC/selma-lora-adapter)\n[![Codeberg](https://img.shields.io/badge/Codeberg-Ronin48%2FSELMA-2185D0?logo=codeberg\u0026logoColor=white)](https://codeberg.org/Ronin48/SELMA)\n[![GitHub](https://img.shields.io/badge/GitHub-CryptoJones%2FSELMA-181717?logo=github\u0026logoColor=white)](https://github.com/CryptoJones/SELMA)\n[![Version](https://img.shields.io/badge/version-v0.1.1-blue)](CHANGELOG.md)\n\n```\npython3 assets/banner.py\n```\n\n\u003e *\"Justice will not be served until those who are unaffected are as outraged as those who are.\"*\n\u003e — Benjamin Franklin\n\n\u003e *\"I am SELMA — Specified Encapsulated Limitless Memory Archive. I am always here.\"*\n\u003e — SELMA, *Time Trax* (1993)\n\n---\n\n## Supporters\n\nSELMA is community-funded. Every contribution — great or small — keeps this project free, open,\nand in the hands of the people it is meant to serve.\n\n| Donor | Amount | Note |\n|---|---|---|\n| Ronin 48, LLC | N/A | Founding donor \u0026 primary sponsor of research time and equipment |\n\n*Want to support SELMA? See [CONTRIBUTING.md](CONTRIBUTING.md) or reach out to the maintainers.*\n\n---\n\n## Overview\n\nSELMA is an open-source machine learning model fine-tuned to assist law enforcement professionals\nin identifying potential violations of criminal law. 
Given an incident description or fact pattern,\nSELMA identifies applicable federal and state criminal statutes, carefully breaks down the elements\nof each offense, maps those elements to the facts at hand, and provides structured, transparent\nlegal reasoning — all in plain language.\n\nSELMA was built on the conviction that good tools should be open, accountable, and freely available\nto every agency regardless of budget. It does not replace prosecutors, attorneys, or judicial\nreview — it is a force-multiplier for the investigator who needs a place to start.\n\n---\n\n## Architecture\n\n- **Base Model:** [Meta Llama 3.3 70B Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) (Llama 3.3 Community License)\n- **Fine-tuning Method:** QLoRA (4-bit quantization with Low-Rank Adaptation)\n- **Context Window:** 128K tokens (native)\n- **Quantization:** NF4 double quantization via bitsandbytes\n- **Origin:** Meta Platforms, Inc. (United States)\n\n\u003e **Why Llama 3.3 70B?** See [docs/MODEL_SELECTION.md](docs/MODEL_SELECTION.md) for the full\n\u003e rationale, including national security, licensing, and performance considerations.\n\n---\n\n## Capabilities\n\nGiven an incident description, SELMA can:\n\n1. **Statute Identification** — Identify which federal and/or state criminal statutes may have\n   been violated, cited by title, chapter, and section\n2. **Element Analysis** — Break down the elements of each identified offense and map them to\n   specific facts present in the incident description\n3. **Charge Classification** — Classify potential charges by severity (felony/misdemeanor), degree,\n   and jurisdiction, including mandatory minimum and maximum penalties\n4. **Legal Reasoning** — Provide transparent, chain-of-thought reasoning explaining why each\n   statute applies or does not apply, so the operator can evaluate the analysis rather than\n   simply accepting it\n5. 
**Cross-Reference** — Flag related statutes, lesser included offenses, concurrent jurisdiction\n   issues, and federal/state overlap\n---\n\n## Constitutional Override\n\nThe U.S. Constitution is the supreme law of the land, and SELMA is trained to know it.\nNo statute, regulation, or agency policy overrides the Bill of Rights. Where SELMA identifies\na potential charge that implicates constitutional protections — an unlawful search, a coerced\nconfession, a due process violation — it will say so plainly:\n\n\u003e ⚠ **CONSTITUTIONAL CONCERN** — evidence obtained through this method may be subject to\n\u003e suppression under the [Amendment]. SELMA recommends consulting with the prosecuting attorney\n\u003e before charging.\n\nThis is not a limitation. It is the feature.\n\n---\n\n## Jurisdictions Covered\n\nSELMA trains a separate model per jurisdiction. Every state model includes federal law as\nbaseline. See [docs/MULTI_STATE_ARCHITECTURE.md](docs/MULTI_STATE_ARCHITECTURE.md).\n\n- **Federal:** U.S. Code Title 18 — Crimes and Criminal Procedure (baseline for all models)\n- **50 State Models:** Each state's criminal code + federal law\n- **Priority states:** Georgia (O.C.G.A. Title 16), California, Texas, New York, Florida\n\n---\n\n## Where to Get SELMA\n\nSELMA is published on multiple platforms. Choose the one that fits your environment:\n\n### Ollama (Recommended for most users)\n\nNo Python, no GPU, no configuration required. Works on any machine with Ollama installed:\n\n```bash\nollama run Ronin48/selma\n```\n\nThe published model uses Llama 3.3 70B with SELMA's full system prompt and inference\nparameters. 
A fine-tuned QLoRA version (v1.0.0) will replace it upon training completion.\n\n### HuggingFace\n\nAdapter weights and merged model weights will be published at:\n\n- **LoRA Adapter:** `Ronin48LLC/selma-lora-adapter` — the fine-tuned adapter only (smaller download)\n- **Merged Model:** `Ronin48LLC/selma-70b` — full merged weights, ready for inference\n- **Quantized (GGUF):** `Ronin48LLC/selma-70b-GGUF` — for use with llama.cpp, LM Studio, and Ollama\n\n```python\nfrom peft import PeftModel\nfrom transformers import AutoModelForCausalLM\n\nbase = AutoModelForCausalLM.from_pretrained(\"meta-llama/Llama-3.3-70B-Instruct\")\nmodel = PeftModel.from_pretrained(base, \"Ronin48LLC/selma-lora-adapter\")\n```\n\n### LM Studio\n\nOnce the GGUF weights are published to HuggingFace, SELMA will be searchable and\ndownloadable directly inside LM Studio. Search for `Ronin48LLC/selma`.\n\n---\n\n## Project Structure\n\n```\nSELMA/\n├── LICENSE                          # Apache 2.0\n├── README.md                        # This file\n├── SECURITY.md                      # Security policy\n├── CONTRIBUTING.md                  # Contribution guidelines\n├── models/\n│   ├── federal/                     # Federal-only model (18 U.S.C.)\n│   │   ├── config.yaml\n│   │   ├── README.md\n│   │   └── training_data/\n│   ├── georgia/                     # Georgia + federal\n│   │   ├── config.yaml\n│   │   ├── README.md\n│   │   └── training_data/\n│   ├── california/                  # California + federal\n│   │   └── ...\n│   └── [48 more states]/            # One directory per state\n├── configs/\n│   ├── training_config.yaml         # Base QLoRA fine-tuning configuration\n│   └── model_config.yaml            # Model inference configuration\n├── data/\n│   ├── raw/                         # Downloaded source data\n│   ├── processed/                   # Cleaned, structured statute data\n│   └── synthetic/                   # Generated training examples\n├── scripts/\n│   ├── 
data_collection/\n│   ├── training/\n│   │   ├── train_qlora.py           # Core QLoRA trainer\n│   │   ├── train_state.py           # Multi-state training orchestrator\n│   │   ├── prepare_dataset.py\n│   │   └── merge_adapter.py\n│   └── evaluation/\n├── src/selma/                       # Core Python modules\n├── tests/\n└── docs/\n    ├── TRAINING.md\n    ├── DATA_SOURCES.md\n    ├── USAGE.md\n    ├── MODEL_SELECTION.md           # Why Llama 3.3 70B (not Chinese models)\n    ├── MULTI_STATE_ARCHITECTURE.md  # 50-state model design\n    ├── OWASP_COMPLIANCE.md          # Full security evaluation\n    └── SECURITY.md\n```\n\n---\n\n## Quick Start\n\n```bash\n# Install dependencies\npip install -r requirements.txt\n# flash-attn is optional but strongly recommended for training speed\npip install flash-attn --no-build-isolation\n\n# Authenticate with HuggingFace (required — Llama 3.3 is a gated model)\n# First accept the license at: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct\nhuggingface-cli login\n\n# Download training data\npython scripts/data_collection/fetch_federal_statutes.py\npython scripts/data_collection/fetch_georgia_statutes.py\npython scripts/data_collection/fetch_legal_datasets.py\n\n# Generate synthetic training examples (~50K incident-to-statute pairs)\npython scripts/data_collection/generate_synthetic.py\n\n# Prepare the dataset\npython scripts/training/prepare_dataset.py\n\n# Fine-tune the model (requires A100-80GB or equivalent, ~6-10 hours)\n# The merge step (merge_adapter.py) requires ~140GB system RAM\npython scripts/training/train_qlora.py --config configs/training_config.yaml\n\n# Merge LoRA adapter into base model\npython scripts/training/merge_adapter.py\n\n# Run inference\npython -m src.selma.model --input \"Describe an incident...\"\n```\n\n---\n\n## Training Data Sources\n\n| Source | Description | Size | License |\n|--------|-------------|------|---------|\n| U.S. 
Code Title 18 | Federal criminal statutes (USLM XML) | ~2,700 sections | Public Domain |\n| O.C.G.A. Title 16 | Georgia criminal code | ~500 sections | Fair Use |\n| ALEA US Courts | Federal court filings with NOS codes | 491K examples | Open |\n| LegalBench | Legal reasoning benchmark tasks | 91.8K examples | Open |\n| CaseHOLD | Legal holding classification | 585K examples | Open |\n| Digital Forensics Case Law | CFAA prosecutions, search/seizure digital | ~5K opinions | Public Domain |\n| Synthetic | Generated incident-to-statute mappings | ~50K examples | Apache 2.0 |\n\n---\n\n## Related Models — Ronin 48 First Responder Suite\n\nSELMA, BONES, and BRUNO are the three first responder models. Law enforcement, EMS, and fire share scenes constantly — consult the appropriate model for each domain.\n\n| Model | Domain | Use When... |\n|---|---|---|\n| **SELMA** | Law Enforcement | Criminal statute identification, charge elements, constitutional flags |\n| **[BONES](https://codeberg.org/Ronin48/BONES)** | EMS — EMR / EMT / AEMT / Paramedic | Patient assessment, treatment protocols, drug dosing, triage, transport |\n| **[BRUNO](https://codeberg.org/Ronin48/BRUNO)** *(Building Rescue and Unified Navigation Operations)* | Fire Service — Company Officer / IC | Fireground tactics, size-up, hazmat, extrication, water supply, ICS |\n\n### Common Shared Scenes\n\n| Scene Type | Primary | Support |\n|---|---|---|\n| Overdose call | BONES (patient care, naloxone) | SELMA (distribution charges if applicable) |\n| Domestic violence with injuries | SELMA (criminal charges, elements) | BONES (patient care) |\n| Active shooter / active threat | SELMA (legal authority, use of force) | BONES (casualty care, TECC) + BRUNO (scene safety, ICS) |\n| Mental health crisis with violence | SELMA (criminal elements) | BONES (patient assessment) |\n| Arson with casualties | BRUNO (fireground, origin/cause) | SELMA (arson statutes) + BONES (patient care) |\n| DUI crash with injuries | SELMA 
(criminal charges) | BONES (patient care) + BRUNO (extrication if needed) |\n| Mass casualty incident | BONES (triage, treatment) | BRUNO (ICS, sectors) + SELMA (criminal nexus if applicable) |\n\n\u003e SELMA pairs with [ATTICUS](https://codeberg.org/Ronin48/ATTICUS) *(Advocacy, Trial, Testimony, Innocence, Case, Unified Scout)* — every capability SELMA gives law enforcement has a counterpart in the hands of the public defender. ABBY (digital forensics) operates independently of the first responder suite.\n\n---\n\n## Disclaimer\n\nSELMA is a research tool designed to assist law enforcement professionals. It is\n**NOT** a substitute for legal counsel, prosecutorial judgment, or judicial review.\nAll outputs should be verified by qualified legal professionals before any action\nis taken. The model may produce incorrect or incomplete legal analysis.\n\nSELMA does not advocate for any outcome. It identifies what the law says. The decision\nto charge, to investigate further, or to pursue alternative courses of action remains\nentirely with the human operator and the appropriate legal authorities.\n\nThis software is provided \"AS IS\" without warranty of any kind. The developers assume\nno liability for decisions made based on SELMA's outputs.\n\n---\n\n## Security\n\nSELMA has been evaluated against:\n- **OWASP Top 10 for LLM Applications (2025)** — AI-specific threats\n- **OWASP Top 10 for Web Applications (2021)** — General software security\n\nSee [docs/OWASP_COMPLIANCE.md](docs/OWASP_COMPLIANCE.md) for the full evaluation\nand [SECURITY.md](SECURITY.md) for the security policy.\n\n---\n\n## Training Notes\n\nIf you're training SELMA on RunPod or another GPU cloud provider, read [LESSONS_LEARNED.md](LESSONS_LEARNED.md)\nbefore you start. ABBY's file has the most complete record of first-run errors and fixes —\nSELMA's file links there and will capture any SELMA-specific issues as they arise.\n\n---\n\n## Contributing\n\nContributions are welcome. 
Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\nSubject matter experts in criminal law, digital forensics, and constitutional law are\nespecially encouraged to contribute.\n\n---\n\n## License\n\n**Project Code, Data, and Documentation:** Apache License 2.0 — Copyright 2026 Ronin 48, LLC. See [LICENSE](LICENSE).\n\n**Base Model Weights:** Meta Llama 3.3 Community License. See [docs/MODEL_SELECTION.md](docs/MODEL_SELECTION.md) for details.\nFine-tuned adapter weights and all original SELMA contributions remain Apache 2.0.\n\n---\n\nProudly Made in Nebraska. Go Big Red! 🌽\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcryptojones%2Fselma","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcryptojones%2Fselma","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcryptojones%2Fselma/lists"}