{"id":13520920,"url":"https://github.com/marin-community/levanter","last_synced_at":"2026-04-06T06:31:42.914Z","repository":{"id":61262630,"uuid":"496005961","full_name":"marin-community/levanter","owner":"marin-community","description":"Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax","archived":false,"fork":false,"pushed_at":"2026-01-26T20:08:30.000Z","size":15073,"stargazers_count":699,"open_issues_count":25,"forks_count":121,"subscribers_count":13,"default_branch":"main","last_synced_at":"2026-03-15T15:42:06.361Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://levanter.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marin-community.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2022-05-24T22:26:27.000Z","updated_at":"2026-03-15T15:31:08.000Z","dependencies_parsed_at":"2023-12-14T20:51:48.626Z","dependency_job_id":"d02b18dd-0589-49b1-9b6e-d4855ba8f269","html_url":"https://github.com/marin-community/levanter","commit_stats":null,"previous_names":["marin-community/levanter","stanford-crfm/levanter"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/marin-community/levanter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marin-community%2Flevanter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marin-community%2Flevanter/t
ags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marin-community%2Flevanter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marin-community%2Flevanter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marin-community","download_url":"https://codeload.github.com/marin-community/levanter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marin-community%2Flevanter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31463012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T06:00:24.288Z","updated_at":"2026-04-06T06:31:42.890Z","avatar_url":"https://github.com/marin-community.png","language":"Python","funding_links":[],"categories":["Models and Projects","A01_文本生成_文本对话","Libraries","Python"],"sub_categories":["大语言对话模型及数据"],"readme":"# Levanter\n\n\u003e [!IMPORTANT]\n\u003e **Levanter has been merged into [Marin](https://github.com/marin-community/marin)** as of November 2025.\n\u003e\n\u003e All active development now happens in the [Marin monorepo](https://github.com/marin-community/marin) at 
[`lib/levanter/`](https://github.com/marin-community/marin/tree/main/lib/levanter).\n\u003e\n\u003e - **Issues**: Please open new issues at [marin-community/marin](https://github.com/marin-community/marin/issues)\n\u003e - **Pull Requests**: Submit new PRs to [marin-community/marin](https://github.com/marin-community/marin)\n\u003e - **Installation**: `pip install levanter` still works\n\u003e\n\u003e See [marin#1773](https://github.com/marin-community/marin/issues/1773) and [marin#1723](https://github.com/marin-community/marin/pull/1723) for details on the merger.\n\n---\n\n\u003ca href=\"https://github.com/stanford-crfm/levanter/actions?query=branch%3Amain++\"\u003e\n    \u003cimg alt=\"Build Status\" src=\"https://img.shields.io/github/actions/workflow/status/stanford-crfm/levanter/run_tests.yaml?branch=main\"\u003e\n\u003c/a\u003e\n\u003ca href=\"https://levanter.readthedocs.io/en/latest/?badge=latest\"\u003e\n    \u003cimg alt=\"Documentation Status\" src=\"https://readthedocs.org/projects/levanter/badge/?version=latest\"\u003e\n\u003c/a\u003e\n\u003ca href=\"\"\u003e\n\u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/stanford-crfm/levanter?color=blue\" /\u003e\n\u003c/a\u003e\n\u003ca href=\"https://pypi.org/project/levanter/\"\u003e\n    \u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/levanter?color=blue\" /\u003e\n\u003c/a\u003e\n\n\n\u003c!--levanter-intro-start--\u003e\n\u003e *You could not prevent a thunderstorm, but you could use the electricity; you could not direct the wind, but you could trim your sail so as to propel your vessel as you pleased, no matter which way the wind blew.* \u003cbr/\u003e\n\u003e — Cora L. V. Hatch\n\nLevanter is a framework for training large language models (LLMs) and other foundation models that strives for legibility, scalability, and reproducibility:\n\n1. 
**Legible**: Levanter uses our named tensor library [Haliax](https://github.com/stanford-crfm/haliax) to write easy-to-follow, composable deep learning code, while still being high performance.\n2. **Scalable**: Levanter scales to large models and can train on a variety of hardware, including GPUs and TPUs.\n3. **Reproducible**: Levanter is bitwise deterministic, meaning that the same configuration will always produce the same results, even in the face of preemption and resumption.\n\nWe built Levanter with [JAX](https://github.com/jax-ml/jax), [Equinox](https://github.com/patrick-kidger/equinox), and [Haliax](https://github.com/stanford-crfm/haliax).\n\n## Documentation\n\nLevanter's documentation is available at [levanter.readthedocs.io](https://levanter.readthedocs.io/en/latest/).\nHaliax's documentation is available at [haliax.readthedocs.io](https://haliax.readthedocs.io/en/latest/).\n\n## Features\n\n* **Distributed Training**: We support distributed training on TPUs and GPUs, including FSDP and tensor parallelism.\n* **Compatibility**: Levanter supports importing and exporting models to/from the Hugging Face ecosystem, including tokenizers, datasets, and models via [SafeTensors](https://github.com/huggingface/safetensors).\n* **Performance**: Levanter's performance rivals commercially backed frameworks like MosaicML's Composer or Google's MaxText.\n* **Resilience**: Levanter supports fast, distributed checkpointing and fast resume from checkpoints with no data seek, making Levanter robust to preemption and hardware failure.\n* **Cached On-Demand Data Preprocessing**: We preprocess corpora online, but we cache the results of preprocessing so\nthat resumes are much faster and so that subsequent runs are even faster. As soon as the first part of the cache is complete, Levanter will start training.\n* **Logging**: Levanter logs a rich and detailed set of metrics covering loss and performance. 
Levanter also supports a few different logging backends, including [WandB](https://wandb.ai/site) and [TensorBoard](https://www.tensorflow.org/tensorboard). (Adding a new logging backend is easy!) Levanter even exposes the ability\nto log inside of JAX `jit`-ted functions.\n* **Reproducibility**: On TPU, Levanter is bitwise deterministic, meaning that the same configuration will always produce the same results, even in the face of preemption and resumption.\n* **Distributed Checkpointing**: Distributed checkpointing is supported via Google's [TensorStore](https://google.github.io/tensorstore/) library. Training can even be resumed on a different number of hosts, though this breaks reproducibility for now.\n* **Optimization**: Levanter supports the new [Sophia](https://arxiv.org/abs/2305.14342) optimizer, which can be 2x as fast as Adam. We also support [Optax](https://github.com/deepmind/optax) for optimization with AdamW, etc.\n* **Flexible**: Levanter supports tuning data mixtures without having to retokenize or shuffle data.\n\n\u003c!--levanter-intro-end--\u003e\n\nLevanter was created by [Stanford's Center for Research on Foundation Models (CRFM)](https://crfm.stanford.edu/)'s research engineering team.\nYou can also find us in the #levanter channel on the unofficial [Jax LLM Discord](https://discord.gg/CKazXcbbBm).\n\n## Getting Started\n\nHere is a small set of examples to get you started. 
For more information about the various configuration options,\nplease see the [Getting Started](./docs/Getting-Started-Training.md) guide or the [In-Depth Configuration Guide](docs/reference/Configuration.md).\nYou can also use `--help` or poke around other configs to see all the options available to you.\n\n\n### Installing Levanter\n\n\u003c!--levanter-installation-start--\u003e\n\nAfter [installing JAX](https://github.com/google/jax/blob/main/README.md#installation) with the appropriate configuration\nfor your platform, you can install Levanter with:\n\n```bash\npip install levanter\n```\n\nor using the latest version from GitHub:\n\n```bash\npip install git+https://github.com/stanford-crfm/levanter.git\nwandb login  # optional, we use wandb for logging\n```\n\nIf you're developing Haliax and Levanter at the same time, you can do something like:\n```bash\ngit clone https://github.com/stanford-crfm/levanter.git\ncd levanter\npip install -e .\ncd ..\ngit clone https://github.com/stanford-crfm/haliax.git\ncd haliax\npip install -e .\ncd ../levanter\n```\n\n\u003c!--levanter-installation-end--\u003e\n\nPlease refer to the [Installation Guide](docs/Installation.md) for more information on how to install Levanter.\n\nIf you're using a TPU, more complete documentation for setting that up is available [here](docs/Getting-Started-TPU-VM.md). 
GPU support is still in progress; documentation is available [here](docs/Getting-Started-GPU.md).\n\n\u003c!--levanter-user-guide-start--\u003e\n\n### Training a GPT2-nano\n\nAs a kind of hello world, here's how you can train a GPT-2 \"nano\"-sized model on a small dataset.\n\n```bash\npython -m levanter.main.train_lm --config_path config/gpt2_nano.yaml\n\n# alternatively, if you didn't use -e and are in a different directory\npython -m levanter.main.train_lm --config_path gpt2_nano\n```\n\nThis will train a GPT2-nano model on the [WikiText-103](https://huggingface.co/datasets/Salesforce/wikitext) dataset.\n\n### Training a Llama-small on your own data\n\nYou can also change the dataset by changing the `data` section in the config file.\nIf your dataset is a [Hugging Face dataset](https://huggingface.co/docs/datasets/loading_datasets.html), you can use the `data.id` field to specify it:\n\n```bash\npython -m levanter.main.train_lm --config_path config/llama_small_fast.yaml --data.id openwebtext\n\n# optionally, you may specify a tokenizer and/or a cache directory, which may be local or on GCS\npython -m levanter.main.train_lm --config_path config/llama_small_fast.yaml --data.id openwebtext --data.tokenizer \"NousResearch/Llama-2-7b-hf\" --data.cache_dir \"gs://path/to/cache/dir\"\n```\n\nIf instead your data is a list of URLs, you can use the `data.train_urls` and `data.validation_urls` fields to specify them.\nData URLs can be local files, GCS files, or http(s) URLs, or anything that [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) supports.\nLevanter (really, fsspec) will automatically uncompress `.gz` and `.zstd` files, and probably other formats too.\n\n```bash\npython -m levanter.main.train_lm --config_path config/llama_small_fast.yaml --data.train_urls [\"https://path/to/train/data_*.jsonl.gz\"] --data.validation_urls [\"https://path/to/val/data_*.jsonl.gz\"]\n```\n\n### Customizing a Config File\n\nYou can modify the config file to change the 
model, the dataset, the training parameters, and more. Here's\nthe `llama_small_fast.yaml` file:\n\n```yaml\ndata:\n  train_urls:\n      - \"gs://pubmed-mosaic/openwebtext-sharded/openwebtext_train.{1..128}-of-128.jsonl.gz\"\n  validation_urls:\n      - \"gs://pubmed-mosaic/openwebtext-sharded/openwebtext_val.{1..8}-of-8.jsonl.gz\"\n  cache_dir: \"gs://pubmed-mosaic/tokenized/openwebtext/\"\nmodel:\n  type: llama\n  hidden_dim: 768\n  intermediate_dim: 2048\n  num_heads: 12\n  num_kv_heads: 12\n  num_layers: 12\n  seq_len: 1024\n  gradient_checkpointing: true\ntrainer:\n  tracker:\n    type: wandb\n    project: \"levanter\"\n    tags: [ \"openwebtext\", \"llama\" ]\n\n  mp: p=f32,c=bfloat16\n  model_axis_size: 1\n  per_device_parallelism: 4\n\n  train_batch_size: 512\noptimizer:\n  learning_rate: 6E-4\n  weight_decay: 0.1\n  min_lr_ratio: 0.1\n```\n\n### Other Architectures\n\nCurrently, we support the following architectures:\n\n* GPT-2\n* [Llama](https://ai.meta.com/llama/), including Llama 1, 2 and 3\n* [Gemma](https://ai.google.dev/gemma), including Gemma 1, 2 and 3\n* [Qwen2](https://huggingface.co/Qwen/Qwen2.5-7B)\n* [Qwen3](https://huggingface.co/Qwen/Qwen3-8B)\n* [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)\n* [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)\n* [Olmo2](https://huggingface.co/allenai/Olmo-2-1124-7B)\n\nWe plan to add more in the future.\n\nFor speech, we currently only support [Whisper](https://huggingface.co/openai/whisper-large-v3).\n\n#### Continued Pretraining with Llama\n\nHere's an example of how to continue pretraining a Llama 1 or Llama 2 model on the OpenWebText dataset:\n\n```bash\npython -m levanter.main.train_lm --config_path config/llama2_7b_continued.yaml\n```\n\n\n## Distributed and Cloud Training\n\n### Training on a TPU Cloud VM\n\nPlease see the [TPU Getting Started](docs/Getting-Started-TPU-VM.md) guide for more information on how to set up a TPU Cloud VM and run 
Levanter there.\n\n### Training with CUDA\n\nPlease see the [CUDA Getting Started](docs/Getting-Started-GPU.md) guide for more information on how to set up a CUDA environment and run Levanter there.\n\n\u003c!--levanter-user-guide-end--\u003e\n\n## Contributing\n\n[![GitHub repo Good Issues for newbies](https://img.shields.io/github/issues/stanford-crfm/levanter/good%20first%20issue?style=flat\u0026logo=github\u0026logoColor=green\u0026label=Good%20First%20issues)](https://github.com/stanford-crfm/levanter/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) [![GitHub Help Wanted issues](https://img.shields.io/github/issues/stanford-crfm/levanter/help%20wanted?style=flat\u0026logo=github\u0026logoColor=b545d1\u0026label=%22Help%20Wanted%22%20issues)](https://github.com/stanford-crfm/levanter/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22) [![GitHub Help Wanted PRs](https://img.shields.io/github/issues-pr/stanford-crfm/levanter/help%20wanted?style=flat\u0026logo=github\u0026logoColor=b545d1\u0026label=%22Help%20Wanted%22%20PRs)](https://github.com/stanford-crfm/levanter/pulls?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22) [![GitHub repo Issues](https://img.shields.io/github/issues/stanford-crfm/levanter?style=flat\u0026logo=github\u0026logoColor=red\u0026label=Issues)](https://github.com/stanford-crfm/levanter/issues?q=is%3Aopen)\n\nWe welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for more information.\n\n## License\n\nLevanter is licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for the full license text.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarin-community%2Flevanter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarin-community%2Flevanter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarin-community%2Flevanter/lists"}