{"id":13440107,"url":"https://github.com/greyblake/whatlang-rs","last_synced_at":"2025-05-13T17:07:27.990Z","repository":{"id":12844248,"uuid":"72954239","full_name":"greyblake/whatlang-rs","owner":"greyblake","description":"Natural language detection library for Rust. Try demo online: https://whatlang.org/","archived":false,"fork":false,"pushed_at":"2025-03-12T10:56:38.000Z","size":2146,"stargazers_count":1010,"open_issues_count":10,"forks_count":112,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-04-24T01:52:58.625Z","etag":null,"topics":["ai","algorithm","classifier","detect-language","language","language-recognition","nlp","rust","rustlang","text-analysis","text-classification","text-classifier","whatlang"],"latest_commit_sha":null,"homepage":"https://whatlang.org/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greyblake.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"SUPPORTED_LANGUAGES.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-11-05T21:26:51.000Z","updated_at":"2025-04-20T22:39:17.000Z","dependencies_parsed_at":"2024-01-08T14:30:36.238Z","dependency_job_id":"4d9f1d00-369a-4eb2-a13b-dce42d4e1a08","html_url":"https://github.com/greyblake/whatlang-rs","commit_stats":{"total_commits":485,"total_committers":23,"mean_commits":21.08695652173913,"dds":"0.12577319587628866","last_synced_commit":"0c03d281a8d327558ab89632d2d997e644a8c7dd"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greyblake%2Fwhatlang-rs","t
ags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greyblake%2Fwhatlang-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greyblake%2Fwhatlang-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greyblake%2Fwhatlang-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/greyblake","download_url":"https://codeload.github.com/greyblake/whatlang-rs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253990466,"owners_count":21995774,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","algorithm","classifier","detect-language","language","language-recognition","nlp","rust","rustlang","text-analysis","text-classification","text-classifier","whatlang"],"created_at":"2024-07-31T03:01:19.840Z","updated_at":"2025-05-13T17:07:27.969Z","avatar_url":"https://github.com/greyblake.png","language":"Rust","funding_links":[],"categories":["Libraries","Rust","库 Libraries","库","函式庫","Packages","NLP"],"sub_categories":["Text processing","文本处理 Text processing","文本处理","書籍","Libraries"],"readme":"\u003cp align=\"center\"\u003e\u003cimg width=\"160\" src=\"https://raw.githubusercontent.com/greyblake/whatlang-rs/master/misc/logo/whatlang-logo.svg\" alt=\"Whatlang - rust library for natural language detection\"\u003e\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eWhatlang\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003eNatural language detection for Rust with focus on simplicity and performance.\u003c/p\u003e\n\u003cp 
align=\"center\"\u003e\u003ca href=\"https://whatlang.org/\" target=\"_blank\"\u003eTry online demo.\u003c/a\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/greyblake/whatlang-rs/actions/workflows/ci.yml\" rel=\"nofollow\"\u003e\u003cimg src=\"https://github.com/greyblake/whatlang-rs/actions/workflows/ci.yml/badge.svg\" alt=\"Build Status\"\u003e\u003c/a\u003e\n\u003ca href=\"https://raw.githubusercontent.com/greyblake/whatlang-rs/master/LICENSE\" rel=\"nofollow\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-blue.svg\" alt=\"License\"\u003e\u003c/a\u003e\n\u003ca href=\"https://docs.rs/whatlang\" rel=\"nofollow\"\u003e\u003cimg src=\"https://docs.rs/whatlang/badge.svg\" alt=\"Documentation\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n[![Stand With Ukraine](https://raw.githubusercontent.com/vshymanskyy/StandWithUkraine/main/banner2-direct.svg)](https://stand-with-ukraine.pp.ua/)\n\n## Content\n* [Features](#features)\n* [Get started](#get-started)\n* [Who uses Whatlang?](#who-uses-whatlang)\n* [Documentation](https://docs.rs/whatlang)\n* [Supported languages](https://github.com/greyblake/whatlang-rs/blob/master/SUPPORTED_LANGUAGES.md)\n* [Feature toggles](#feature-toggles)\n* [How does it work?](#how-does-it-work)\n  * [How does the language recognition work?](#how-does-the-language-recognition-work)\n  * [How is `is_reliable` calculated?](#how-is-is_reliable-calculated)\n* [Make tasks](#make-tasks)\n* [Comparison with alternatives](#comparison-with-alternatives)\n* [Ports and clones](#ports-and-clones)\n* [Donations](#donations)\n* [Derivation](#derivation)\n* [License](#license)\n* [Contributors](#contributors)\n\n\n## Features\n* Supports [69 languages](https://github.com/greyblake/whatlang-rs/blob/master/SUPPORTED_LANGUAGES.md)\n* 100% written in Rust\n* Lightweight, fast and simple\n* Recognizes not only a language, but also a script (Latin, Cyrillic, etc.)\n* Provides reliability information\n\n## Get 
started\n\nExample:\n\n```rust\nuse whatlang::{detect, Lang, Script};\n\nfn main() {\n    let text = \"Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!\";\n\n    let info = detect(text).unwrap();\n    assert_eq!(info.lang(), Lang::Epo);\n    assert_eq!(info.script(), Script::Latin);\n    assert_eq!(info.confidence(), 1.0);\n    assert!(info.is_reliable());\n}\n```\n\nFor more details (e.g. how to blacklist some languages) please check the [documentation](https://docs.rs/whatlang).\n\n## Who uses Whatlang?\n\nWhatlang is used within the following big projects as a direct or indirect dependency for language recognition.\nYou'll be in great company using Whatlang:\n\n* [Sonic](https://github.com/valeriansaliou/sonic) - fast, lightweight and schema-less search backend in Rust.\n* [Meilisearch](https://github.com/meilisearch) - an open-source, easy-to-use, blazingly fast, and hyper-relevant search engine built in Rust.\n\n## Feature toggles\n\n| Feature     | Description                                                                           |\n|-------------|---------------------------------------------------------------------------------------|\n| `enum-map`  | `Lang` and `Script` implement `Enum` trait from [enum-map](https://docs.rs/enum-map/) |\n| `arbitrary` | Support [Arbitrary](https://crates.io/crates/arbitrary)                               |\n| `serde`     | Implements `Serialize` and `Deserialize` for `Lang` and `Script`                      |\n| `dev`       | Enables `whatlang::dev` module which provides some internal API.\u003cbr/\u003e It exists for profiling purposes and normal users are discouraged from relying on this API.  
|\n\n## How does it work?\n\n### How does the language recognition work?\n\nThe algorithm is based on trigram language models, which are a particular case of n-grams.\nTo understand the idea, please check the original paper, [Cavnar and Trenkle '94: N-Gram-Based Text Categorization](https://www.researchgate.net/publication/2375544_N-Gram-Based_Text_Categorization).\n\n### How is `is_reliable` calculated?\n\nIt is based on the following factors:\n* How many unique trigrams are in the given text.\n* How big the difference is between the first and the second (not returned) detected languages. This metric is called `rate` in the code base.\n\nTherefore, it can be presented as a 2D space with a threshold function that splits it into \"Reliable\" and \"Not reliable\" areas.\nThis function is a hyperbola and looks like the following:\n\n\u003cimg alt=\"Language recognition whatlang rust\" src=\"https://raw.githubusercontent.com/greyblake/whatlang-rs/master/misc/images/whatlang_is_reliable.png\" width=\"450\" height=\"300\" /\u003e\n\nFor more details, please check the blog article [Introduction to Rust Whatlang Library and Natural Language Identification Algorithms](https://www.greyblake.com/blog/introduction-to-rust-whatlang-library-and-natural-language-identification-algorithms/).\n\n## Make tasks\n\n* `make bench` - run performance benchmarks\n* `make doc` - generate and open doc\n* `make test` - run tests\n* `make watch` - watch changes and run tests\n\n## Comparison with alternatives\n\n|                           | Whatlang   | CLD2        | CLD3           |\n| ------------------------- | ---------- | ----------- | -------------- |\n| Implementation language   | Rust       | C++         | C++            |\n| Languages                 | 68         | 83          | 107            |\n| Algorithm                 | trigrams   | quadgrams   | neural network |\n| Supported Encoding        | UTF-8      | UTF-8       | ?              
|\n| HTML support              | no         | yes         | ?              |\n\n\n## Ports and clones\n\n* [whatlang-ffi](https://github.com/greyblake/whatlang-ffi) - C bindings\n* [whatlanggo](https://github.com/abadojack/whatlanggo) - whatlang clone for Go language\n* [whatlang-py](https://github.com/cathalgarvey/whatlang-py) - bindings for Python\n* [whatlang-rb](https://gitlab.com/KitaitiMakoto/whatlang-rb) - bindings for Ruby\n* [whatlangex](https://github.com/pierrelegall/whatlangex) - bindings for Elixir\n\n## Donations\n\nYou can support the project by donating [NEAR tokens](https://near.org).\n\nOur NEAR wallet address is `whatlang.near`\n\n## Derivation\n\n**Whatlang** is a derivative work from [Franc](https://github.com/wooorm/franc) (JavaScript, MIT) by [Titus Wormer](https://github.com/wooorm).\n\n## License\n\n[MIT](https://github.com/greyblake/whatlang-rs/blob/master/LICENSE) © [Sergey Potapov](http://greyblake.com/)\n\n\n## Contributors\n\n- [greyblake](https://github.com/greyblake) Potapov Sergey - creator, maintainer.\n- [Dr-Emann](https://github.com/Dr-Emann) Zachary Dremann - optimization and improvements\n- [BaptisteGelez](https://github.com/BaptisteGelez) Baptiste Gelez - improvements\n- [Vishesh Chopra](https://github.com/KarmicKonquest) - designed the logo\n- [Joel Natividad](https://github.com/jqnatividad) - support of Tagalog\n- [ManyTheFish](https://github.com/ManyTheFish) - crazy optimization\n- [Kerollmops](https://github.com/Kerollmops) Clément Renault - crazy optimization\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreyblake%2Fwhatlang-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreyblake%2Fwhatlang-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreyblake%2Fwhatlang-rs/lists"}