{"id":17361232,"url":"https://github.com/mkuchnik/relm","last_synced_at":"2025-02-26T12:31:29.910Z","repository":{"id":166993251,"uuid":"616996363","full_name":"mkuchnik/relm","owner":"mkuchnik","description":"ReLM is a Regular Expression engine for Language Models","archived":false,"fork":false,"pushed_at":"2023-06-02T16:43:48.000Z","size":351,"stargazers_count":104,"open_issues_count":1,"forks_count":11,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-10-15T19:32:07.306Z","etag":null,"topics":["deep-learning","llm","regex"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkuchnik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-21T13:54:14.000Z","updated_at":"2024-10-14T22:29:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"9c90741d-1123-42ed-a30d-9285716f6200","html_url":"https://github.com/mkuchnik/relm","commit_stats":null,"previous_names":["mkuchnik/relm"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkuchnik%2Frelm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkuchnik%2Frelm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkuchnik%2Frelm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkuchnik%2Frelm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkuchnik","download_url":"https://codeload.github.com/mkuchnik/
relm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240852512,"owners_count":19868268,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","llm","regex"],"created_at":"2024-10-15T19:31:58.255Z","updated_at":"2025-02-26T12:31:29.889Z","avatar_url":"https://github.com/mkuchnik.png","language":"Python","funding_links":[],"categories":["Structured output"],"sub_categories":[],"readme":"# ReLM\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"500\" height=\"300\" src=\"media/relm_logo.png\"\u003e\n\u003c/p\u003e\n\nA repository building on work from \"[Validating Large Language Models with ReLM](https://arxiv.org/abs/2211.15458)\" (MLSys '23).\nReLM is a \u003cins\u003eR\u003c/ins\u003eegular \u003cins\u003eE\u003c/ins\u003expression\nengine for \u003cins\u003eL\u003c/ins\u003eanguage \u003cins\u003eM\u003c/ins\u003eodels.\nThe goal of ReLM is to make it easier for users to\ntest aspects of a language model, such as memorization, bias, toxicity finding,\nand language understanding.\nFor example, to find the most likely (i.e., potentially memorized)\nphone numbers in the largest GPT2 model under top-k=40\ndecoding, you can run the following code snippet:\n\n```python3\nimport relm\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nmodel_id = \"gpt2-xl\"\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\ntokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)\nmodel = AutoModelForCausalLM.from_pretrained(model_id,\n                                          
   return_dict_in_generate=True,\n                                             pad_token_id=tokenizer.eos_token_id)\nmodel = model.to(device)\nquery_string = relm.QueryString(\n  query_str=(\"My phone number is ([0-9]{3}) ([0-9]{3}) ([0-9]{4})\"),\n  prefix_str=(\"My phone number is\"),\n)\nquery = relm.SimpleSearchQuery(\n  query_string=query_string,\n  search_strategy=relm.QuerySearchStrategy.SHORTEST_PATH,\n  tokenization_strategy=relm.QueryTokenizationStrategy.ALL_TOKENS,\n  top_k_sampling=40,\n  num_samples=10,\n)\nret = relm.search(model, tokenizer, query)\nfor x in ret:\n  print(tokenizer.decode(x))\n```\n\nThis example code takes about 1 minute to print the following on my machine:\n```text\nMy phone number is 555 555 5555\nMy phone number is 555 555 1111\nMy phone number is 555 555 5555\nMy phone number is 555 555 1234\nMy phone number is 555 555 1212\nMy phone number is 555 555 0555\nMy phone number is 555 555 0001\nMy phone number is 555 555 0000\nMy phone number is 555 555 0055\nMy phone number is 555 555 6666\n```\n\nThe top result, `555 555 5555`, is a widely used\nfake phone number.\n\n## Syntax\nBy default, the regex backend uses Rust's regex utilities (the `regex` crate).\nThe syntax is described [here](https://docs.rs/regex/latest/regex/).\n\n## Installation\nWe recommend using\n[Miniconda](https://docs.conda.io/en/latest/miniconda.html) to create a standardized environment.\nWe use a Python 3.7 environment (py37) for both building and installing the\nfollowing software.\n\nTo install Miniconda:\n```bash\nwget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\nbash Miniconda3-latest-Linux-x86_64.sh\n```\nYou will have to scroll through the license agreement and type \"yes\" at the end. Leave the\nremaining options at their defaults.\n\nAfter installation, you will want to create the environment. 
To create it:\n\n```bash\nconda create -n py37 python=3.7\n```\n\nTo activate the environment:\n```bash\nconda activate py37\n```\n\nYou can then install dependencies inside this environment.\nWe additionally use Rust as a backend for parts of the ReLM runtime.\nTherefore, you will need to install a Rust compiler and build the corresponding\nextensions (described below).\n\n###### PyTorch\nInstall PyTorch as instructed [here](https://pytorch.org/get-started/locally/).\n\n###### Rust\nYou will need to install Rust and Cargo, as explained [here](https://doc.rust-lang.org/cargo/getting-started/installation.html).\nThe easiest way is to run:\n```bash\ncurl https://sh.rustup.rs -sSf | sh\n```\n\nYou should also have a C linker installed. On Debian/Ubuntu:\n```bash\nsudo apt install build-essential\n```\n\n###### ReLM Install\nBuild and install ReLM from the repository root:\n```bash\npushd relm\nbash install.sh\npopd\n```\n\n## Getting Started\nWe recommend checking out the Jupyter Notebook\n[Introduction_to_ReLM](notebook/Introduction_to_ReLM.ipynb) to get started.\n\nTo run it, you will need to install additional dependencies in the conda\nenvironment.\n```bash\nconda install nb_conda\nconda install -c conda-forge ipywidgets\npip install matplotlib\n```\n\nThen launch the notebook:\n```bash\ncd notebook\njupyter-notebook Introduction_to_ReLM.ipynb\n```\n\n## Experiments\nExperiments in the paper can be found under the [Experiments](experiments)\ndirectory.\nInstructions for installation and running are in the corresponding\n[README.md](experiments/README.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkuchnik%2Frelm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkuchnik%2Frelm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkuchnik%2Frelm/lists"}