https://github.com/rmalouf/polars_whichlang
Language identification plugin for polars
- Host: GitHub
- URL: https://github.com/rmalouf/polars_whichlang
- Owner: rmalouf
- License: mit
- Created: 2025-06-28T00:47:52.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-07-03T23:22:10.000Z (6 months ago)
- Last Synced: 2025-07-03T23:28:15.878Z (6 months ago)
- Language: Python
- Homepage:
- Size: 39.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-polars - polars-whichlang - Polars plugin for fast language identification by [@rmalouf](https://github.com/rmalouf). (Libraries/Packages/Scripts / Polars plugins)
README
# polars-whichlang
This Polars plugin wraps [whichlang](https://github.com/quickwit-oss/whichlang),
a very fast and reasonably accurate language identification library written in Rust.
It currently supports the following languages, each reported by its three-letter code:
- Arabic (ara)
- Dutch (nld)
- English (eng)
- French (fra)
- German (deu)
- Hindi (hin)
- Italian (ita)
- Japanese (jpn)
- Korean (kor)
- Mandarin (cmn)
- Portuguese (por)
- Russian (rus)
- Spanish (spa)
- Swedish (swe)
- Turkish (tur)
- Vietnamese (vie)
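
The three-letter codes in the list above can be turned back into readable names with a plain lookup table. The following is a minimal sketch, not part of the plugin: `LANGUAGE_NAMES` and `language_name` are hypothetical names introduced here for illustration, and the table simply restates the list above.

```python
# Hypothetical helper, not provided by polars-whichlang:
# maps each three-letter code from the list above to its language name.
LANGUAGE_NAMES = {
    "ara": "Arabic",
    "nld": "Dutch",
    "eng": "English",
    "fra": "French",
    "deu": "German",
    "hin": "Hindi",
    "ita": "Italian",
    "jpn": "Japanese",
    "kor": "Korean",
    "cmn": "Mandarin",
    "por": "Portuguese",
    "rus": "Russian",
    "spa": "Spanish",
    "swe": "Swedish",
    "tur": "Turkish",
    "vie": "Vietnamese",
}


def language_name(code: str) -> str:
    """Return the readable name for a code, or the code itself if unknown."""
    return LANGUAGE_NAMES.get(code, code)


print(language_name("cmn"))
```

Falling back to the raw code for unknown input is a design choice made for this sketch, so that unexpected values pass through unchanged.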
## Installation
```
pip install polars-whichlang
```
## Examples
```python
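import polars as pl
from polars_whichlang import detect_lang

# Four short sentences in English, Vietnamese, German, and Mandarin.
df = pl.DataFrame(
    {
        "index": [1, 2, 3, 4],
        "text": [
            "This is a test.",
            "Đây là một bài kiểm tra.",
            "Dies ist ein Test",
            "这是一个测试",
        ],
    }
)

# Add a "lang" column holding the detected language code for each row.
print(df.with_columns(detect_lang("text").alias("lang")))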
```
```
shape: (4, 3)
┌───────┬──────────────────────────┬──────┐
│ index ┆ text                     ┆ lang │
│ ---   ┆ ---                      ┆ ---  │
│ i64   ┆ str                      ┆ str  │
╞═══════╪══════════════════════════╪══════╡
│ 1     ┆ This is a test.          ┆ eng  │
│ 2     ┆ Đây là một bài kiểm tra. ┆ vie  │
│ 3     ┆ Dies ist ein Test        ┆ deu  │
│ 4     ┆ 这是一个测试             ┆ cmn  │
└───────┴──────────────────────────┴──────┘
```