{"id":27891141,"url":"https://github.com/dmis-lab/monet","last_synced_at":"2025-05-05T11:53:38.527Z","repository":{"id":266787671,"uuid":"899346612","full_name":"dmis-lab/Monet","owner":"dmis-lab","description":"[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers","archived":false,"fork":false,"pushed_at":"2025-01-23T07:00:46.000Z","size":258,"stargazers_count":60,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-25T06:34:12.581Z","etag":null,"topics":["iclr","iclr2025","interpretability","large-language-models","mixture-of-experts","sparse-autoencoders"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2412.04139","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmis-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-06T04:43:04.000Z","updated_at":"2025-03-10T13:36:17.000Z","dependencies_parsed_at":"2025-01-23T08:17:59.316Z","dependency_job_id":"2bdaa691-1ff8-48d5-a137-fb9285951007","html_url":"https://github.com/dmis-lab/Monet","commit_stats":null,"previous_names":["dmis-lab/monet"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmis-lab%2FMonet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmis-lab%2FMonet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmis-lab%2FMonet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmis-lab%2F
Monet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dmis-lab","download_url":"https://codeload.github.com/dmis-lab/Monet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252495077,"owners_count":21757224,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["iclr","iclr2025","interpretability","large-language-models","mixture-of-experts","sparse-autoencoders"],"created_at":"2025-05-05T11:53:37.422Z","updated_at":"2025-05-05T11:53:38.510Z","avatar_url":"https://github.com/dmis-lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Monet: Mixture of Monosemantic Experts for Transformers\n\n[![arXiv](https://img.shields.io/badge/arXiv-2412.04139-b31b1b?style=flat-square)](https://arxiv.org/abs/2412.04139)\n[![Models](https://img.shields.io/badge/%F0%9F%A4%97Hugging_Face-Model_Zoo-ffd200?style=flat-square)](https://huggingface.co/MonetLLM)\n[![Demo](https://img.shields.io/badge/%F0%9F%A4%97Hugging_Face-Demo-ffd200?style=flat-square)](https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer)\n[![code](https://img.shields.io/badge/Github-Code-keygen.svg?logo=github\u0026style=flat-square)](https://github.com/dmis-lab/Monet)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue?style=flat-square)](./LICENSE)\n\n![](./assets/figure1.png)\n\n## Introduction\n\n**Monet** presents a novel approach to enhancing mechanistic interpretability in large language models (LLMs) through an innovative Sparse Mixture-of-Experts (SMoE) 
architecture. By directly incorporating sparse dictionary learning into end-to-end pretraining, **Monet** addresses the fundamental challenge of polysemanticity - where individual neurons respond to multiple unrelated concepts - while maintaining model performance.\n\n#### ✨Key Highlights\n\n- 📈 **Scalable Expert Architecture**: **Monet** introduces parameter-efficient expert decomposition methods that enable scaling to 262,144 experts per layer while ensuring total parameters scale proportionally to the square root of expert count.\n- 📊 **Monosemantic Experts**: Through fine-grained expert specialization, **Monet** achieves monosemantic experts that demonstrate mutual exclusivity of knowledge, allowing transparent observation of model behavior and parametric knowledge.\n- 🛠️ **Robust Knowledge Control**: The architecture enables precise manipulation of domain-specific knowledge, language capabilities, and toxicity mitigation without compromising general performance.\n\n### Why Monet?\n\nUnlike traditional approaches using post-hoc reconstruction (like Sparse Autoencoders), **Monet** integrates interpretability directly into its architecture. This enables both transparent understanding of model internals and fundamental behavior control. By scaling monosemantic experts, Monet paves the way for more transparent and controllable language models.\n\n## News\n\n- **2025-01-23**: Our paper has been accepted to **ICLR 2025**! 
🎉\n- **2024-12-06**: Released **Monet: Mixture of Monosemantic Experts for Transformers** on [arXiv](https://arxiv.org/abs/2412.04139), with [GitHub](https://github.com/dmis-lab/Monet), [models](https://huggingface.co/MonetLLM), and [demo](https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer).\n\n## Model Checkpoints\n\n#### Base Models\n\n\u003ctable class=\"center\"\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003eModel\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003eDataset\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003e#Params\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003e#Tokens\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003eCheckpoint\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003eDemo\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\" rowspan=\"4\"\u003e\u003cb\u003eMonet-VD\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\" rowspan=\"3\"\u003e\u003ca href=\"https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu\"\u003eFineWeb-Edu\u003c/a\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e850M\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e100BT\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/MonetLLM/monet-vd-850M-100BT-hf\"\u003emonet-vd-850M-100BT-hf\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\"\u003e1.4B\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e100BT\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/MonetLLM/monet-vd-1.4B-100BT-hf\"\u003emonet-vd-1.4B-100BT-hf\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003e🔍\u003ca 
href=\"https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer\"\u003eViewer\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\"\u003e4.1B\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e100BT\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/MonetLLM/monet-vd-4.1B-100BT-hf\"\u003emonet-vd-4.1B-100BT-hf\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\"\u003e\u003ca href=\"https://huggingface.co/datasets/bigcode/starcoderdata\"\u003eStarCoderData\u003c/a\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e1.4B\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e100BT\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/MonetLLM/codemonet-vd-1.4B-100BT-hf\"\u003ecodemonet-vd-1.4B-100BT-hf\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003e🔍\u003ca href=\"https://huggingface.co/spaces/MonetLLM/codemonet-vd-1.4B-100BT-hf-viewer\"\u003eViewer\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\" rowspan=\"3\"\u003e\u003cb\u003eMonet-HD\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\" rowspan=\"3\"\u003e\u003ca href=\"https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu\"\u003eFineWeb-Edu\u003c/a\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e850M\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e100BT\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/MonetLLM/monet-hd-850M-100BT-hf\"\u003emonet-hd-850M-100BT-hf\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\"\u003e1.4B\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e100BT\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca 
href=\"https://huggingface.co/MonetLLM/monet-hd-1.4B-100BT-hf\"\u003emonet-hd-1.4B-100BT-hf\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\"\u003e4.1B\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e100BT\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/MonetLLM/monet-hd-4.1B-100BT-hf\"\u003emonet-hd-4.1B-100BT-hf\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n#### Instruction-Tuned Models\n\n\u003ctable class=\"center\"\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003eModel\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003ePurpose\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003eRecipe\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003e#Params\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003cb\u003eCheckpoint\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\" rowspan=\"2\"\u003e\u003cb\u003eMonet-VD\u003c/b\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003eChat Completion\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm\"\u003eSmolLM\u003c/a\u003e\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e1.4B\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/MonetLLM/monet-vd-1.4B-100BT-chat-hf\"\u003emonet-vd-1.4B-100BT-chat-hf\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd align=\"center\"\u003eVision-Language Model\u003c/td\u003e\n        \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/haotian-liu/LLaVA\"\u003eLLaVA\u003c/a\u003e\u003c/td\u003e\n        \u003ctd 
align=\"center\"\u003e1.6B\u003c/td\u003e\n        \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/MonetLLM/visionmonet-vd-1.4B-100BT-hf\"\u003evisionmonet-vd-1.4B-100BT-hf\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n## Quickstart\n\nYou can explore the core implementation of **Monet** in [modeling_monet.py](./modeling_monet.py). We've made it easy to use Monet by including our custom code in the 🤗[Hugging Face model zoo](https://huggingface.co/MonetLLM). Simply set `trust_remote_code=True` when loading the models through the Transformers library.\n\n### Text Generation\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, pipeline\n\nmodel_name = \"MonetLLM/monet-vd-1.4B-100BT-hf\"\npipe = pipeline(\n    \"text-generation\",\n    model_name,\n    tokenizer=AutoTokenizer.from_pretrained(model_name),\n    torch_dtype=torch.bfloat16,\n    device_map=\"auto\",\n    trust_remote_code=True,  # required to load Monet's custom modeling code\n)\nprint(pipe(\"The key to life is\", max_new_tokens=20, do_sample=True)[0][\"generated_text\"])\n```\n\nOutput:\n\n```\n\u003cs\u003e The key to life is learning how to live creatively. 
The question is: how do we do that, and what will\n```\n\n### Code Generation\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, pipeline\n\nmodel_name = \"MonetLLM/codemonet-vd-1.4B-100BT-hf\"\npipe = pipeline(\n    \"text-generation\",\n    model_name,\n    tokenizer=AutoTokenizer.from_pretrained(model_name),\n    torch_dtype=torch.bfloat16,\n    device_map=\"auto\",\n    trust_remote_code=True,\n)\n\ntext = '''\ndef print_len(x: str):\n    \"\"\"For a given string x, print the length of x.\"\"\"\n'''\n# Keep only the first completed function by cutting at the first blank line.\nprint(pipe(text, max_new_tokens=10)[0][\"generated_text\"].split(\"\\n\\n\")[0])\n```\n\nOutput:\n\n```\n\u003cs\u003e\ndef print_len(x: str):\n    \"\"\"For a given string x, print the length of x.\"\"\"\n    print(len(x))\n```\n\n### Chat Completion\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, pipeline\n\nmodel_name = \"MonetLLM/monet-vd-1.4B-100BT-chat-hf\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\npipe = pipeline(\n    \"text-generation\",\n    model_name,\n    tokenizer=tokenizer,\n    torch_dtype=torch.bfloat16,\n    device_map=\"auto\",\n    trust_remote_code=True,\n)\n\n# Render the conversation with the model's chat template before generation.\ntext = tokenizer.apply_chat_template(\n    [{\"role\": \"user\", \"content\": \"Hi! How are you?\"}],\n    add_generation_prompt=True,\n    tokenize=False,\n)\nprint(pipe(text, max_new_tokens=30, do_sample=True)[0][\"generated_text\"])\n```\n\nOutput:\n\n```\n\u003cs\u003e[INST] Hi! How are you? [/INST] I'm good, thanks! How can I help you today? \u003c/s\u003e\n```\n\n### Using vLLM\n\nFor enhanced inference performance, **Monet** can be integrated with the vLLM engine. Note that **Monet** requires manual registration with vLLM's `ModelRegistry` before initialization. 
The custom implementation is provided in [modeling_monet_vllm.py](./modeling_monet_vllm.py).\n\n```python\nfrom vllm import LLM, ModelRegistry, SamplingParams\nfrom modeling_monet_vllm import MonetForCausalLM\n\n# Register Monet architecture with vLLM\nModelRegistry.register_model(\"MonetForCausalLM\", MonetForCausalLM)\n\nmodel = LLM(\n    \"MonetLLM/monet-vd-1.4B-100BT-hf\",\n    trust_remote_code=True,\n    dtype=\"bfloat16\",\n    gpu_memory_utilization=0.8\n)\nsampling_params = SamplingParams(max_tokens=20, temperature=1.0)\nprint(model.generate(\"The key to life is\", sampling_params)[0].outputs[0].text)\n```\nOutput:\n```\n what you’re born with. If you think that you don’t have the same control and\n```\n\n### Get Expert Routing Probabilities\n\nExpert routing probabilities show which sparse features (experts) are activated for each token, which is the basis of **Monet**'s mechanistic interpretability. Following the standard MoE approach, you can obtain expert routing probabilities for all layers by setting `output_router_probs=True`. 
The example below demonstrates how to compute and analyze the expert activation patterns:\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"MonetLLM/monet-vd-1.4B-100BT-hf\",\n    torch_dtype=torch.bfloat16,\n    device_map=\"auto\",\n    trust_remote_code=True,\n)\ntokenizer = AutoTokenizer.from_pretrained(\"MonetLLM/monet-vd-1.4B-100BT-hf\")\n\ninputs = tokenizer(\"City and County of San Francisco\", return_tensors=\"pt\")\noutputs = model(**inputs.to(model.device), output_router_probs=True)\n\n# Get full expert routing probabilities: [batch_size, seq_len, moe_heads, moe_experts**2]\ng1, g2 = outputs.router_probs[0][0], outputs.router_probs[0][1]\ng = torch.einsum(\"bthi,bthj-\u003ebthij\", g1, g2).flatten(-2)\nprint(g.shape)\n\n# Print number of activated experts per token.\nfor token, routing in zip(inputs.input_ids.squeeze(0), g.squeeze(0)):\n    token = tokenizer.decode(token).ljust(16, \" \")\n    expert_indices = (routing.sum(0) \u003e 1e-2).argwhere().squeeze(-1)\n    print(f\"Token: {token} Activated Experts: {len(expert_indices)}\")\n```\n\nOutput:\n\n```\ntorch.Size([1, 7, 8, 262144])\nToken: \u003cs\u003e              Activated Experts: 62\nToken: City             Activated Experts: 60\nToken: and              Activated Experts: 16\nToken: County           Activated Experts: 102\nToken: of               Activated Experts: 11\nToken: San              Activated Experts: 70\nToken: Francisco        Activated Experts: 67\n```\n\n## Citation\nIf you find this work useful for your research or applications, please cite our paper using the following BibTeX entry.\n```bibtex\n@article{park2024monet,\n      title={{Monet: Mixture of Monosemantic Experts for Transformers}}, \n      author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},\n      journal={arXiv preprint arXiv:2412.04139},\n      
year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmis-lab%2Fmonet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmis-lab%2Fmonet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmis-lab%2Fmonet/lists"}