{"id":50425005,"url":"https://github.com/dioxuslabs/betlang","last_synced_at":"2026-05-31T10:01:17.316Z","repository":{"id":358582225,"uuid":"1237200286","full_name":"DioxusLabs/betlang","owner":"DioxusLabs","description":"Like guesslang, but smaller","archived":false,"fork":false,"pushed_at":"2026-05-18T04:45:24.000Z","size":3306,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-18T04:56:36.848Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DioxusLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-13T01:11:19.000Z","updated_at":"2026-05-16T23:28:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/DioxusLabs/betlang","commit_stats":null,"previous_names":["dioxuslabs/betlang"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/DioxusLabs/betlang","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DioxusLabs%2Fbetlang","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DioxusLabs%2Fbetlang/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DioxusLabs%2Fbetlang/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DioxusLabs%2Fbetlang/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DioxusLabs","download_url":"https://codeload.github.com/DioxusLabs/betlang/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DioxusLabs%2Fbetlang/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33726719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-31T10:01:16.471Z","updated_at":"2026-05-31T10:01:17.288Z","avatar_url":"https://github.com/DioxusLabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Betlang\n\n[![Crates.io](https://img.shields.io/crates/v/betlang.svg)](https://crates.io/crates/betlang)\n[![Docs.rs](https://docs.rs/betlang/badge.svg)](https://docs.rs/betlang)\n\nCPU source-language detection for code with a tiny 50kb model.\n\n```toml\n[dependencies]\nbetlang = \"0.1.0\"\n```\n\n```rust\nlet detection = betlang::detect(\"fn main() { println!(\\\"hi\\\"); }\");\n\nassert_eq!(detection.language(), Some(betlang::Language::Rust));\n```\n\nUse `betlang::detect(source)` for UTF-8 source strings or byte slices. It\nreturns a `Detection`; call `Detection::language()` to read the top language.\nCall `Detection::top_languages()` when you need ranked probabilities.\n\n## Supported Languages\n\nSlugs parse through the standard `FromStr` implementation:\n\n```rust\nassert_eq!(\"rust\".parse::\u003cbetlang::Language\u003e(), Ok(betlang::Language::Rust));\n```\n\n`asm`, `batch`, `c`, `clojure`, `cmake`, `cobol`, `cpp`, `cs`, `css`, `dart`,\n`dockerfile`, `elixir`, `erlang`, `gemfile`, `gemspec`, `go`, `gradle`,\n`groovy`, `haskell`, `html`, `ini`, `java`, `javascript`, `json`, `julia`,\n`kotlin`, `lisp`, `lua`, `markdown`, `objectivec`, `ocaml`, `perl`, `php`,\n`powershell`, `python`, `r`, `ruby`, `rust`, `scala`, `shell`, `sql`, `swift`,\n`toml`, `typescript`, `vba`, `verilog`, `xml`, `yaml`.\n\nThese are the model's 48 output labels. Runtime detections expose them\none-to-one with no label aggregation.\n\nThe confusion matrix uses the same labels:\n\n![Betlang wordseq confusion](https://raw.githubusercontent.com/ealmloff/betlang/92da743c8c97fd11bb57645ec394371ee7cf836f/assets/confusion-overall.png)\n\n## Model\n\nThe embedded model is `assets/magika/source-student-q4.bin`, a 47,840-byte\nweights-only MSQ1 payload with SHA-256:\n\n```text\n59ef24167bddd1364eb9c1650add8a67e1a542b5155fac67f5e1cda07df0c0f0\n```\n\nArchitecture: `wordseq-b1024-k3-m2048-tiny-3conv-hidden`, tokenizer version 3.\nOn the manifest-aligned held-out filesystem-label test split it reaches\n`test_fs_accuracy=0.965238` with `macro_recall=0.965411`.\n\nSee [MODEL_CARD.md](MODEL_CARD.md) for the training and evaluation summary.\n\n## Performance\n\nBetlang uses a fixed 4096-byte Magika window and pads runtime inference to the\nsame 2048-token shape used by evaluation. The model is loaded once per process\nand then reused through a `OnceLock`.\n\nNative CPU inference dispatches through `fearless_simd`. Benchmark entry points\nare available through `cargo bench`. Current baseline numbers are tracked in\n[BENCHMARKS.md](BENCHMARKS.md).\n\n## License And Attribution\n\nBetlang is licensed under MIT. The embedded student model was trained from\noutputs of Google's Magika teacher model; Magika is published by Google under\nApache-2.0. Keep this attribution with redistributed model artifacts.\n\n## Confusion By File Size\n\nThe shipped wordseq model is evaluated below on the held-out `bigorig` test\nsplit. Each panel is a row-normalized confusion matrix for one file-size\nbucket: actual labels are rows, predicted labels are columns, and the diagonal\nis correct classification.\n\n![Betlang wordseq confusion by file size](https://raw.githubusercontent.com/ealmloff/betlang/92da743c8c97fd11bb57645ec394371ee7cf836f/assets/confusion-by-size.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdioxuslabs%2Fbetlang","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdioxuslabs%2Fbetlang","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdioxuslabs%2Fbetlang/lists"}