Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wooorm/franc
Natural language detection
classification classify detect detection javascript language language-detection natural natural-language nlp
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/wooorm/franc
- Owner: wooorm
- License: mit
- Created: 2014-07-19T14:50:53.000Z (over 10 years ago)
- Default Branch: main
- Last Pushed: 2024-06-12T22:48:23.000Z (7 months ago)
- Last Synced: 2024-10-29T15:07:56.299Z (3 months ago)
- Topics: classification, classify, detect, detection, javascript, language, language-detection, natural, natural-language, nlp
- Language: JavaScript
- Homepage: https://wooorm.com/franc/
- Size: 4.4 MB
- Stars: 4,129
- Watchers: 43
- Forks: 174
- Open Issues: 4
Metadata Files:
- Readme: readme.md
- Changelog: changelog.md
- Funding: funding.yml
- License: license
Awesome Lists containing this project
- my-awesome-list - franc
- awesome-nodejs-cn - franc - Detect the language of text (Packages / Natural language processing)
- awesome-nodejs - franc - Detect the language of text. ![](https://img.shields.io/github/stars/wooorm/franc.svg?style=social&label=Star) (Repository / Natural language processing)
- awesome-nodejs-cn - franc - **star:4170** Detect the language of text ![star > 2000][Awesome] (Packages / Natural language processing)
- awesome-imgcook - wooorm/franc - Detect the language of text. (JavaScript packages for machine learning / Natural language processing)
- awesome-nodejs - franc - Detect the language of text. (Packages / Natural language processing)
- low-resource-languages - Franc - Natural language detection https://wooorm.com/franc/. (Software / Utilities)
- awesome-nodejs - franc - Natural language detection - ★ 2823 (Natural language processing)
- awesome-node - franc - Detect the language of text. (Packages / Natural language processing)
- awesome-nodejs-cn - franc - Detect the language of text. (Contents / NLP natural language processing)
- starred-awesome - franc - Natural language detection (JavaScript)
README
# ![franc][logo]
[![Build Status][build-badge]][build]
[![Coverage Status][coverage-badge]][coverage]

Detect the language of text.

## What’s so cool about franc?
1. **franc** can support more languages(†) than any other
library
2. **franc** is packaged with support for [82][s], [187][m], or [414][l]
languages
3. **franc** has a CLI

† Based on the [UDHR][], the most translated copyright-free document in the
world.

## What’s not so cool about franc?
**franc** supports many languages, which means it’s easily confused on small
samples.
Make sure to pass it big documents to get reliable results.

## Install
> 👉 **Note**: this installs the [`franc`][m] package, with support for 187
> languages (languages which have 1 million or more speakers).
> [`franc-min`][s] (82 languages, 8m or more speakers) and [`franc-all`][l]
> (all 414 possible languages) are also available.
> Finally, use `franc-cli` to install the [CLI][].

This package is [ESM only][esm].
In Node.js (version 14.14+, 16.0+), install with [npm][]:

```sh
npm install franc
```

In Deno with [`esm.sh`][esmsh]:

```js
import {franc, francAll} from 'https://esm.sh/franc@6'
```

In browsers with [`esm.sh`][esmsh]:

```html
<script type="module">
  import {franc, francAll} from 'https://esm.sh/franc@6?bundle'
</script>
```

## Use
```js
import {franc, francAll} from 'franc'

franc('Alle menslike wesens word vry') //=> 'afr'
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট') //=> 'ben'
franc('Alle menneske er fødde til fridom') //=> 'nno'

franc('') //=> 'und' (language code that stands for undetermined)

// You can change what’s too short (default: 10):
franc('the') //=> 'und'
franc('the', {minLength: 3}) //=> 'sco'

console.log(francAll('Considerando ser essencial que os direitos humanos'))
//=> [['por', 1], ['glg', 0.771284519307895], ['spa', 0.6034146900423971], …123 more items]

console.log(francAll('Considerando ser essencial que os direitos humanos', {only: ['por', 'spa']}))
//=> [['por', 1], ['spa', 0.6034146900423971]]

console.log(francAll('Considerando ser essencial que os direitos humanos', {ignore: ['spa', 'glg']}))
//=> [['por', 1], ['cat', 0.5367251059928957], ['src', 0.47461899851037015], …121 more items]
```

## API
This package exports the identifiers `franc`, `francAll`.
There is no default export.

### `franc(value[, options])`
Get the most probable language for the given value.
###### Parameters
* `value` (`string`) — value to test
* `options` (`Options`, optional) — configuration

###### Returns
The most probable language (`string`).
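
Because `franc` returns `'und'` when it cannot decide (for instance when the input is shorter than `minLength`), callers often want a fallback language. The following is a minimal sketch only: `detectOrFallback` is a hypothetical helper, not part of franc, and the detector is passed in as an argument so the helper can be tried with a stand-in function.

```js
// Hypothetical helper, not part of franc.
// `detect` is assumed to behave like `franc`:
// (value, options) => an ISO 639-3 code, or 'und' when undetermined.
function detectOrFallback(detect, value, fallback, options) {
  const code = detect(value, options)
  // 'und' is the code franc uses for "undetermined".
  return code === 'und' ? fallback : code
}

// Usage with the real package (ESM):
//   import {franc} from 'franc'
//   detectOrFallback(franc, 'the', 'eng')
```

Passing the detector in keeps the helper independent of which package (`franc`, `franc-min`, or `franc-all`) is installed.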
### `francAll(value[, options])`
Get a list of probable languages for the given value, ordered from most to least probable.
###### Parameters
* `value` (`string`) — value to test
* `options` (`Options`, optional) — configuration

###### Returns
Array containing language—distance tuples (`Array<[string, number]>`).
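
Since franc is easily confused on small samples, one way to use the tuples is to accept the top guess only when it clearly beats the runner-up. This is a sketch under assumptions, not franc's own behavior: `pickIfClear` and its `minGap` threshold are hypothetical, and it assumes the list is ordered best guess first, as in the `francAll` examples in this readme.

```js
// Hypothetical helper, not part of franc.
// `guesses` has the shape returned by `francAll`: [[code, score], ...],
// assumed to be ordered best guess first.
function pickIfClear(guesses, minGap = 0.1) {
  if (guesses.length === 0) return 'und'
  const [bestCode, bestScore] = guesses[0]
  if (bestCode === 'und') return 'und'
  // With a single candidate there is no runner-up to compare against.
  if (guesses.length === 1) return bestCode
  const runnerUpScore = guesses[1][1]
  // Accept the best guess only if it leads by at least `minGap`.
  return bestScore - runnerUpScore >= minGap ? bestCode : 'und'
}

// Usage with the real package (ESM):
//   import {francAll} from 'franc'
//   pickIfClear(francAll('Considerando ser essencial que os direitos humanos'))
```

The default of `0.1` is arbitrary; a suitable gap depends on the languages involved and on how long the input is.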
### `Options`
Configuration (`Object`, optional) with the following fields:
###### `options.only`
Languages to allow (`Array`, optional).
###### `options.ignore`
Languages to ignore (`Array`, optional).
###### `options.minLength`
Minimum length to accept (`number`, default: `10`).
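
A common mistake is passing two-letter codes such as `'es'` in `only` or `ignore`, while franc works with three-letter ISO 639-3 codes. The sketch below builds an options object and rejects codes with the wrong shape. `buildOptions` is a hypothetical helper, not part of franc; it checks only that each code is three lowercase letters, not that the code exists or is supported, and the requirement that `minLength` be a non-negative integer is an assumption of this sketch.

```js
// Hypothetical helper, not part of franc.
// Checks only the *shape* of each code (three lowercase ASCII letters);
// it does not verify that the code exists in ISO 639-3 or is supported by franc.
function buildOptions({only, ignore, minLength} = {}) {
  const options = {}
  for (const [name, codes] of Object.entries({only, ignore})) {
    if (codes === undefined) continue
    const bad = codes.filter((code) => !/^[a-z]{3}$/.test(code))
    if (bad.length > 0) {
      throw new TypeError(
        `options.${name}: expected three-letter codes, got ${bad.join(', ')}`
      )
    }
    options[name] = [...codes]
  }
  if (minLength !== undefined) {
    // Assumption of this sketch: a non-negative integer.
    if (!Number.isInteger(minLength) || minLength < 0) {
      throw new TypeError('options.minLength: expected a non-negative integer')
    }
    options.minLength = minLength
  }
  return options
}

// Usage with the real package (ESM):
//   import {franc} from 'franc'
//   franc('the', buildOptions({only: ['eng', 'sco'], minLength: 3}))
```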
## CLI
Install:
```sh
npm install franc-cli --global
```

Use:

```text
CLI to detect the language of text

Usage: franc [options]

Options:

  -h, --help        output usage information
  -v, --version     output version number
  -m, --min-length  minimum length to accept
  -o, --only        allow languages
  -i, --ignore      disallow languages
  -a, --all         display all guesses

Usage:

# output language
$ franc "Alle menslike wesens word vry"
# afr

# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben

# ignore certain languages
$ franc --ignore por,glg "O Brasil caiu 26 posições"
# src

# output language from stdin with only
$ echo "Alle mennesker er født frie og" | franc --only nob,dan
# nob
```

## Data
###### Supported languages
| Package | Languages | Speakers |
| - | - | - |
| [`franc-min`][s] | 82 | 8M or more |
| [`franc`][m] | 187 | 1M or more |
| [`franc-all`][l] | 414 | - |

###### Language code
> 👉 **Note**: franc returns [ISO 639-3][iso6393] codes (three letter codes).
> **Not** ISO 639-1 or ISO 639-2.
> See also [GH-10][] and [GH-30][].

To get more info about the languages represented by ISO 639-3, use
[`iso-639-3`][iso-639-3].
There is also an index available to map ISO 639-3 to ISO 639-1 codes,
[`iso-639-3/to-1.json`][iso-639-3-to-1], but note that not all 639-3 codes can
be represented in 639-1.

## Types
These packages are fully typed with [TypeScript][].
They export the additional types `TrigramTuple` and `Options`.

## Compatibility

These packages are at least compatible with all maintained versions of Node.js.
As of now, that is Node.js 14.14+ and 16.0+.
They also work in Deno and modern browsers.

## Ports
Franc has been ported to several other programming languages.
* Elixir — [`paasaa`](https://github.com/minibikini/paasaa)
* Erlang — [`efranc`](https://github.com/G-Corp/efranc)
* Go — [`franco`](https://github.com/kapsteur/franco),
[`whatlanggo`](https://github.com/abadojack/whatlanggo)
* R — [`franc`](https://github.com/MangoTheCat/franc)
* Rust — [`whatlang-rs`](https://github.com/greyblake/whatlang-rs)
* Dart — [`francd`](https://github.com/svonidze/francd)
* Python — [`pyfranc`](https://github.com/cyb3rk0tik/pyfranc)

The works franc is derived from have themselves also been ported to other
languages.

## Derivation
Franc is a derivative work from [guess-language][] (Python, LGPL),
[guesslanguage][] (C++, LGPL), and [Language::Guess][language-guess]
(Perl, GPL).
Their creators granted me the rights to distribute franc under the MIT license:
respectively, [Kent S. Johnson][grant-3], [Jacob R. Rideout][grant-2], and
[Maciej Ceglowski][grant-1].

## Contribute
Yes please!
See [How to Contribute to Open Source][contribute].

## Security
This package is safe.
## License
[MIT][] © [Titus Wormer][home]

[logo]: https://raw.githubusercontent.com/wooorm/franc/a162cc0/logo.svg?sanitize=true
[build-badge]: https://github.com/wooorm/franc/workflows/main/badge.svg
[build]: https://github.com/wooorm/franc/actions
[coverage-badge]: https://img.shields.io/codecov/c/github/wooorm/franc.svg
[coverage]: https://codecov.io/github/wooorm/franc
[npm]: https://docs.npmjs.com/cli/install
[guess-language]: https://github.com/kent37/guess-language
[guesslanguage]: http://websvn.kde.org/branches/work/sonnet-refactoring/common/nlp/guesslanguage.cpp?view=markup
[language-guess]: http://web.archive.org/web/20090228163219/http://languid.cantbedone.org/
[grant-1]: https://github.com/wooorm/franc/issues/6#issuecomment-59669191
[grant-2]: https://github.com/wooorm/franc/issues/6#issuecomment-60196819
[grant-3]: https://github.com/wooorm/franc/issues/6#issuecomment-59936827
[esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c
[esmsh]: https://esm.sh
[typescript]: https://www.typescriptlang.org
[contribute]: https://opensource.guide/how-to-contribute/
[mit]: license
[home]: http://wooorm.com
[cli]: #cli
[udhr]: http://unicode.org/udhr/
[s]: https://github.com/wooorm/franc/tree/main/packages/franc-min
[m]: https://github.com/wooorm/franc/tree/main/packages/franc
[l]: https://github.com/wooorm/franc/tree/main/packages/franc-all
[iso6393]: https://iso639-3.sil.org/code_tables/639/data
[gh-10]: https://github.com/wooorm/franc/issues/10
[gh-30]: https://github.com/wooorm/franc/issues/30
[iso-639-3]: https://github.com/wooorm/iso-639-3
[iso-639-3-to-1]: https://github.com/wooorm/iso-639-3/blob/main/iso6393-to-1.js