https://github.com/rmalouf/polars_whichlang
Language identification plugin for polars
- Host: GitHub
- URL: https://github.com/rmalouf/polars_whichlang
- Owner: rmalouf
- License: MIT
- Created: 2025-06-28T00:47:52.000Z
- Default Branch: main
- Last Pushed: 2025-07-03T23:22:10.000Z
- Last Synced: 2025-07-03T23:28:15.878Z
- Language: Python
- Size: 39.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0

Metadata Files:
- Readme: README.md
- License: LICENSE
 
 
Awesome Lists containing this project
- awesome-polars - polars-whichlang - Polars plugin for fast language identification by [@rmalouf](https://github.com/rmalouf). (Libraries/Packages/Scripts / Polars plugins)
 
README
# polars-whichlang
This Polars plugin is a wrapper for [whichlang](https://github.com/quickwit-oss/whichlang),
a very fast and reasonably accurate language identification library written in Rust.
It currently supports the following languages, listed with the three-letter code returned for each:
 
- Arabic (ara)
- Dutch (nld)
- English (eng)
- French (fra)
- German (deu)
- Hindi (hin)
- Italian (ita)
- Japanese (jpn)
- Korean (kor)
- Mandarin (cmn)
- Portuguese (por)
- Russian (rus)
- Spanish (spa)
- Swedish (swe)
- Turkish (tur)
- Vietnamese (vie)
## Installation
```
pip install polars-whichlang
```
## Examples
```python
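import polars as pl
from polars_whichlang import detect_lang

# Sample frame with one short sentence per row, each in a different language
df = pl.DataFrame(
    {
        "index": [1, 2, 3, 4],
        "text": [
            "This is a test.",
            "Đây là một bài kiểm tra.",
            "Dies ist ein Test",
            "这是一个测试",
        ],
    }
)

# Add a "lang" column holding the detected language code for each row
print(df.with_columns(detect_lang("text").alias("lang")))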
```
```
shape: (4, 3)
┌───────┬──────────────────────────┬──────┐
│ index ┆ text                     ┆ lang │
│ ---   ┆ ---                      ┆ ---  │
│ i64   ┆ str                      ┆ str  │
╞═══════╪══════════════════════════╪══════╡
│ 1     ┆ This is a test.          ┆ eng  │
│ 2     ┆ Đây là một bài kiểm tra. ┆ vie  │
│ 3     ┆ Dies ist ein Test        ┆ deu  │
│ 4     ┆ 这是一个测试               ┆ cmn  │
└───────┴──────────────────────────┴──────┘
```
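
The `lang` column holds only the three-letter codes, so a small lookup table can help when readable names are needed downstream. The sketch below is built from the supported-language list above; `LANG_NAMES` and `lang_name` are hypothetical helpers for illustration and are not part of polars-whichlang.

```python
# Hypothetical lookup table, copied from the supported-language list above.
# It is NOT provided by polars-whichlang.
LANG_NAMES = {
    "ara": "Arabic",
    "nld": "Dutch",
    "eng": "English",
    "fra": "French",
    "deu": "German",
    "hin": "Hindi",
    "ita": "Italian",
    "jpn": "Japanese",
    "kor": "Korean",
    "cmn": "Mandarin",
    "por": "Portuguese",
    "rus": "Russian",
    "spa": "Spanish",
    "swe": "Swedish",
    "tur": "Turkish",
    "vie": "Vietnamese",
}


def lang_name(code: str) -> str:
    """Return the English name for a language code, or the code itself if unknown."""
    return LANG_NAMES.get(code, code)


# Codes taken from the example output above
for code in ["eng", "vie", "deu", "cmn"]:
    print(code, "->", lang_name(code))
```

Falling back to the raw code keeps the helper from raising on any value missing from the table. Within Polars, the same dictionary could likely be applied to the whole column with an expression such as `pl.col("lang").replace_strict(LANG_NAMES)`, assuming a Polars version that provides `replace_strict`.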