https://github.com/proycon/lingua-cli
Very small simple command-line interface for language detection using lingua-rs
- Host: GitHub
- URL: https://github.com/proycon/lingua-cli
- Owner: proycon
- License: gpl-3.0
- Created: 2022-04-16T21:26:34.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-10-12T21:13:02.000Z (about 1 year ago)
- Last Synced: 2025-04-23T03:49:31.590Z (7 months ago)
- Topics: languagedetection, nlp
- Language: Rust
- Homepage: https://git.sr.ht/~proycon/lingua-cli
- Size: 78.1 KB
- Stars: 7
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-cli-apps-in-a-csv - lingua-cli - This is a small command-line tool for language detection, it is a simple wrapper around the lingua-rs library for Rust. (Text processing)
- awesome-cli-apps - lingua-cli - This is a small command-line tool for language detection, it is a simple wrapper around the lingua-rs library for Rust. (Text processing)
README

# Lingua-cli
This is a small command-line tool for language detection. It is a simple
wrapper around the [lingua-rs](https://github.com/pemistahl/lingua-rs/) library
for Rust; refer to that project for extensive documentation. A distinguishing
feature is that lingua-rs works better on short texts than many other
language-detection libraries.
## Installation
Ensure you have Rust's package manager `cargo` installed, then download, compile, and install `lingua-cli` in one go as follows:
``$ cargo install lingua-cli``
## Usage
Pass text as command-line arguments:
``$ lingua-cli bonjour à tous``
Pass text via standard input:
``$ echo "bonjour à tous" | lingua-cli``
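The two invocation styles above could be reconciled along the following lines. This is a hypothetical Rust sketch, not lingua-cli's actual source: it assumes that positional arguments are joined with single spaces (which would explain why `bonjour à tous` works unquoted) and that standard input is read only when no arguments are given. The function name `read_input` is made up for this illustration.

```rust
use std::io::{self, Read};

/// Hypothetical helper: returns the text to classify.
/// Assumption: positional arguments take precedence and are joined with
/// single spaces; otherwise the whole of `reader` (e.g. stdin) is read.
fn read_input<R: Read>(args: &[String], mut reader: R) -> io::Result<String> {
    if !args.is_empty() {
        // `lingua-cli bonjour à tous` arrives as three separate arguments.
        return Ok(args.join(" "));
    }
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    // `echo` appends a trailing newline; strip it before detection.
    Ok(buf.trim_end_matches('\n').to_string())
}
```

In a real binary this would be called as `read_input(&std::env::args().skip(1).collect::<Vec<_>>(), io::stdin())`.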
Constrain the languages you want to detect using `-l` with a comma-separated
list of ISO 639-1 language codes. Constraining the list improves accuracy. Run
with `-L` to see a list of supported languages.
``$ echo "bonjour à tous" | lingua-cli -l "fr,de,es,nl,en"``
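The `-l` value is a comma-separated list of ISO 639-1 codes. Below is a minimal Rust sketch of how such a value could be split and sanity-checked before being handed to lingua-rs. The name `parse_language_list` is hypothetical, and the check verifies only the two-lowercase-letter shape of an ISO 639-1 code, not whether lingua-rs actually supports that language (`-L` answers that).

```rust
/// Hypothetical parser for a `-l` argument such as "fr,de,es,nl,en".
/// Splits on commas, trims surrounding whitespace, lowercases, and rejects
/// anything that is not exactly two ASCII letters (the shape of ISO 639-1).
fn parse_language_list(arg: &str) -> Result<Vec<String>, String> {
    arg.split(',')
        .map(|code| code.trim().to_ascii_lowercase())
        .map(|code| {
            if code.len() == 2 && code.chars().all(|c| c.is_ascii_lowercase()) {
                Ok(code)
            } else {
                Err(format!("not an ISO 639-1 code: {:?}", code))
            }
        })
        // Stops at the first invalid code and returns its error.
        .collect()
}
```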
To classify input line-by-line, pass ``-n``.
``$ echo -e "bonjour à tous\nhola a todos\nhallo allemaal" | lingua-cli -n -l "fr,de,es,nl,en"``
```
fr 0.9069164472389637 bonjour à tous
es 0.918273871035807 hola a todos
nl 0.988293648761749 hallo allemaal
```
The output is tab-separated (TSV) and consists of an ISO 639-1 language code, a confidence score, and, in line-by-line mode, a copy of the input line.
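When lingua-cli is called from another program, this line-by-line output is easy to consume. A Rust sketch, assuming tab-separated fields exactly as described above (language code, confidence, copied line); the `Detection` struct and `parse_line` function are illustrative names, not part of lingua-cli. Using `splitn(3, '\t')` keeps any tabs inside the copied line intact.

```rust
/// One row of `lingua-cli -n` output (illustrative type, not part of lingua-cli).
#[derive(Debug, PartialEq)]
struct Detection {
    lang: String,    // ISO 639-1 code, e.g. "fr"
    confidence: f64, // confidence score as printed
    text: String,    // copy of the classified input line
}

/// Parses one TSV row; returns None if a field is missing or malformed.
fn parse_line(row: &str) -> Option<Detection> {
    // At most three fields, so tabs inside the copied text are preserved.
    let mut fields = row.splitn(3, '\t');
    let lang = fields.next()?.to_string();
    let confidence = fields.next()?.parse::<f64>().ok()?;
    let text = fields.next()?.to_string();
    Some(Detection { lang, confidence, text })
}
```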
You can also classify mixed-language text using the ``--multi`` option. This outputs the start and end UTF-8 byte offsets of each segment, followed by its language code and text:
```
$ lingua-cli --multi -l fr,de,en < /tmp/test.txt
0 23 fr Parlez-vous français?
23 73 de Ich spreche ein bisschen spreche Französisch ja.
73 110 en A little bit is better than nothing.
```
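Because the `--multi` offsets count UTF-8 bytes rather than characters, a Rust consumer can slice the original input with them directly, since Rust string indexing is byte-based. A minimal sketch with two hypothetical helpers: `segment` slices without panicking, and `byte_to_char_offset` converts for consumers whose strings are indexed by character. The test input is assumed to have the same byte layout as `/tmp/test.txt` above. Note that the first segment spans 23 bytes but only 22 characters, because `ç` takes two bytes.

```rust
/// Returns the slice of `input` between two UTF-8 byte offsets, or None if
/// the offsets are out of range or do not fall on character boundaries.
fn segment(input: &str, start: usize, end: usize) -> Option<&str> {
    // `str::get` is the non-panicking form of `&input[start..end]`.
    input.get(start..end)
}

/// Converts a UTF-8 byte offset into a character (Unicode scalar value)
/// offset; returns None if the offset is not on a character boundary.
fn byte_to_char_offset(input: &str, byte_offset: usize) -> Option<usize> {
    // `is_char_boundary` is also false for offsets past the end of `input`.
    if !input.is_char_boundary(byte_offset) {
        return None;
    }
    Some(input[..byte_offset].chars().count())
}
```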