# Barbet

Barbet is a Hugging Face Transformers implementation of the Barbet causal
language model family. The repository provides remote-code-compatible modeling
classes and three configuration presets: Barbet 300M, Barbet 1B, and a Barbet
1B 1M research-extension config. The architecture mirrors the R2 revision of the
[Open Formosa](https://github.com/voidful/open_formosa) training stack
(Taiwan-Omni-300M-R2 / Taiwan-Omni-1B-R2).

This repository is intentionally lightweight: it contains model code, config
metadata, and checkpoint-conversion tooling. Megatron runtime artifacts remain
in the Open Formosa training stack.

## Contents

- `BarbetConfig`
- `BarbetModel`
- `BarbetForCausalLM`
- `configs/barbet_300m/config.json`
- `configs/barbet_1b/config.json`
- `configs/barbet_1b_1m/config.json`
- remote-code files for Hugging Face Hub loading:
  - `configuration_barbet.py`
  - `modeling_barbet.py`

## Model Summary

Barbet is a decoder-only hybrid language model with:

- grouped-query attention
- QK RMSNorm
- RoPE with a large theta (base frequency) for long context
- a repeating `global, sliding, sliding, mamba` layer motif
- local sliding-window attention layers
- SwiGLU feed-forward layers
- tied token embeddings and LM head (R2 rebalance: the parameter budget saved
  on the vocabulary is spent on extra depth)
- the frozen `voidful/PangolinTokenizer` vocabulary (114944 entries after
  padding)
- incremental decoding with a hybrid KV/conv-state cache (a rolling window for
  sliding layers and O(1) per-token steps for Mamba layers)
- an optional multi-token prediction (MTP) loss for training
- optional QK logit clipping and a learnable attention sink (both off in the
  shipped R2 configs, matching the validated upstream recipe)
- an optional `mamba_ssm` GPU path for Megatron-compatible Mamba2 scan kernels,
  with a self-contained PyTorch fallback when those kernels are unavailable

The 300M config (20 layers, 8K context) is the proxy model used for systems
validation. The 1B config (28 layers, 256K context) is the target family
configuration. The 1B 1M config keeps the same weights and applies linear RoPE
scaling with a factor of 4 on top of the 256K base for inference-time
extrapolation experiments.

## Quick Start

```bash
pip install -e ".[dev]"
pytest -q
```

```python
from barbet import BarbetConfig, BarbetForCausalLM

config = BarbetConfig.barbet_300m()
model = BarbetForCausalLM(config)
```

## Hugging Face Loading

Once the converted `safetensors` weights and the remote-code files have been
uploaded to a Hugging Face model repository, the model can be loaded with:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("voidful/barbet-1b-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "voidful/barbet-1b-base", config=config, trust_remote_code=True
)
```

The config files under `configs/` already include the `auto_map` fields required
for remote-code loading.

## Checkpoint Conversion

Production Megatron `torch_dist` checkpoints can be converted with:

```bash
python scripts/convert_torch_dist_to_hf.py \
  --checkpoint /path/to/megatron/checkpoint_dir \
  --output-dir /path/to/hf_export \
  --force
```

The converter exports the main causal-LM path to `model.safetensors`. Megatron
MTP auxiliary heads are training-only and are intentionally not exported.

## Documentation

- [Architecture](docs/architecture.md)
- [Configuration](docs/configuration.md)
- [Transformers Usage](docs/transformers_usage.md)
- [Checkpoint Conversion](docs/checkpoint_conversion.md)
- [Long Context](docs/long_context.md)
- [Development](docs/development.md)

## Current Limitations

- On CPU, Mamba layers run through the PyTorch fallback. For the closest
  Megatron decode parity, install `mamba_ssm` and run on CUDA so the model uses
  the fused Mamba2 scan and gated RMSNorm path.
- The bundled PyTorch reference path can express the 1M RoPE extension, but
  practical 1M-token prefill still requires an optimized external long-context
  runtime; without one, the global attention layers scale quadratically with
  sequence length.
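As an illustration of the repeating `global, sliding, sliding, mamba` layer motif from the Model Summary, the sketch below cycles the motif across the layer stack and counts each layer type. It assumes the motif starts at layer 0 and simply wraps around; the helper name `layer_schedule` is hypothetical, and the real `BarbetConfig` may store or order layer types differently.

```python
from collections import Counter

# Assumption: the motif starts at layer 0 and repeats without offset.
MOTIF = ("global", "sliding", "sliding", "mamba")


def layer_schedule(num_layers: int, motif=MOTIF) -> list[str]:
    """Hypothetical helper: layer type for each index, cycling through the motif."""
    return [motif[i % len(motif)] for i in range(num_layers)]


if __name__ == "__main__":
    # Depths taken from the README: 300M has 20 layers, 1B has 28 layers.
    for name, depth in (("barbet_300m", 20), ("barbet_1b", 28)):
        print(name, dict(Counter(layer_schedule(depth))))
```

Under this assumption both depths are multiples of the motif length, so the 300M stack has 5 global, 10 sliding, and 5 Mamba layers, and the 1B stack has 7, 14, and 7. Only the global layers attend over the full context, which is why they dominate the quadratic prefill cost noted under Current Limitations.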
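The 1B 1M config described in the Model Summary relies on linear RoPE scaling with a factor of 4. The sketch below assumes the common position-interpolation form of linear scaling, where each position is divided by the factor before the rotation angles are computed, so a position near 1M lands inside the 256K range seen in training. `HEAD_DIM` and `THETA` are hypothetical placeholders, since the README does not state Barbet's actual head dimension or RoPE theta.

```python
import math

# Hypothetical values for illustration only; not taken from Barbet's configs.
HEAD_DIM = 128
THETA = 10_000_000.0


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_angles(position: int, head_dim: int, theta: float, factor: float = 1.0) -> list[float]:
    """Rotation angle per frequency pair; linear scaling divides the position by `factor`."""
    scaled_position = position / factor
    return [scaled_position * f for f in rope_inv_freq(head_dim, theta)]


if __name__ == "__main__":
    base_ctx = 256 * 1024        # 256K base context
    extended_ctx = 4 * base_ctx  # 1M context with a scaling factor of 4
    p = base_ctx - 1
    same = all(
        math.isclose(a, b)
        for a, b in zip(
            rope_angles(4 * p, HEAD_DIM, THETA, factor=4.0),
            rope_angles(p, HEAD_DIM, THETA),
        )
    )
    print("position 4p with factor 4 matches position p:", same)
    print("largest scaled position:", (extended_ctx - 1) / 4.0)
```

Running it prints `True` for the match and a largest scaled position of 262143.75, just below the 256K base. The trade-off of this form of scaling is that neighbouring tokens become four times closer in angle space, which is why the README frames the 1M config as an extrapolation experiment rather than a validated context length.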
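The hybrid decoding cache in the Model Summary keeps a rolling window for sliding-attention layers, while global layers must retain every past key and value. The sketch below shows that difference with a bounded `collections.deque`. The class name `RollingKVCache` and the window size of 4 are hypothetical and chosen for illustration; this is not Barbet's actual cache implementation, which also tracks Mamba conv/SSM state.

```python
from collections import deque


class RollingKVCache:
    """Hypothetical per-layer cache that keeps only the last `window` entries."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self.keys: deque = deque(maxlen=window)
        self.values: deque = deque(maxlen=window)

    def append(self, key, value) -> None:
        # A deque with maxlen evicts the oldest entry automatically once full,
        # so memory stays bounded no matter how long decoding runs.
        self.keys.append(key)
        self.values.append(value)

    def __len__(self) -> int:
        return len(self.keys)


if __name__ == "__main__":
    window = 4  # illustrative; the README does not state Barbet's window size
    sliding = RollingKVCache(window)
    global_keys = []  # a global-attention layer keeps every past key
    for pos in range(10):
        sliding.append(("k", pos), ("v", pos))
        global_keys.append(("k", pos))
    print("sliding layer holds positions", [k[1] for k in sliding.keys])
    print("global layer holds", len(global_keys), "keys")
```

After 10 decoded tokens the sliding cache holds only positions 6 to 9, while the global cache holds all 10 keys. Mamba layers go further and carry a fixed-size recurrent state instead of a per-token cache, which is what makes their decode step O(1).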