https://github.com/pear/text_languagedetect

PHP library to identify human languages from text samples.

Host: GitHub
URL: https://github.com/pear/text_languagedetect
Owner: pear
Created: 2012-04-13T05:41:24.000Z (over 14 years ago)
Default Branch: master
Last Pushed: 2023-02-27T20:55:38.000Z (over 3 years ago)
Last Synced: 2025-05-07T15:08:07.164Z (about 1 year ago)
Topics: detect-language, languages, php
Language: PHP
Homepage: http://pear.php.net/package/Text_LanguageDetect
Size: 308 KB
Stars: 50
Watchers: 17
Forks: 18
Open Issues: 1
Metadata Files:
- Readme: README.rst


*******************
Text_LanguageDetect
*******************

PHP library to identify human languages from text samples.
Returns a confidence score for each candidate language.

Installation
============

PEAR
----

::

    $ pear install Text_LanguageDetect

Composer
--------

::

    $ composer require pear/text_languagedetect

Usage
=====

Also see the examples in the ``docs/`` directory and
the `official documentation`__.

__ http://pear.php.net/package/Text_LanguageDetect/docs

Language detection
------------------

Simple language detection::

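    <?php
    require_once 'Text/LanguageDetect.php';

    $text = 'Das ist ein kleiner Text';
    $ld = new Text_LanguageDetect();
    $language = $ld->detectSimple($text);

    echo $language;
    //output: german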

Show the three most probable languages with their confidence score::

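    <?php
    require_once 'Text/LanguageDetect.php';

    $text = 'Das ist ein kleiner Text';
    $ld = new Text_LanguageDetect();
    //3 most probable languages
    $results = $ld->detect($text, 3);

    foreach ($results as $language => $confidence) {
        echo $language . ': ' . number_format($confidence, 2) . "\n";
    }

    //example output (exact scores depend on the input text):
    //german: 0.35
    //dutch: 0.25
    //swedish: 0.20
    ?>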
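
The scores are relative, so a caller may want to reject a result when the best
candidate is weak or barely ahead of the runner-up. The sketch below shows one
way to do that on a ``language => score`` array like the one in the example
output. ``pickLanguage()`` and its two thresholds are hypothetical names and
values chosen for illustration; they are not part of Text_LanguageDetect. The
type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of Text_LanguageDetect): accept the top
// candidate only if its score is high enough and clearly ahead of the next one.
function pickLanguage(array $results, float $minScore = 0.3, float $minMargin = 0.05): ?string
{
    if (count($results) === 0) {
        return null; // nothing was detected
    }

    arsort($results); // highest score first; language names stay attached as keys
    $languages = array_keys($results);
    $scores    = array_values($results);

    if ($scores[0] < $minScore) {
        return null; // best guess is too weak
    }
    if (count($scores) > 1 && ($scores[0] - $scores[1]) < $minMargin) {
        return null; // top two candidates are too close to call
    }
    return $languages[0];
}

// Scores copied from the example output of this section.
$results = array('german' => 0.35, 'dutch' => 0.25, 'swedish' => 0.20);
var_dump(pickLanguage($results));      // accepted with the default thresholds
var_dump(pickLanguage($results, 0.5)); // rejected: 0.35 is below the 0.5 minimum
```

Suitable thresholds depend on the length and kind of the input text, so they
are best tuned against real samples.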

Language code
-------------

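Instead of returning the full language name, ISO 639-1 two-letter or
ISO 639-2 three-letter codes can be returned::

    <?php
    require_once 'Text/LanguageDetect.php';
    $ld = new Text_LanguageDetect();

    //will output the ISO 639-1 two-letter language code
    // "de"
    $ld->setNameMode(2);
    echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";

    //will output the ISO 639-2 three-letter language code
    // "deu"
    $ld->setNameMode(3);
    echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";
    ?>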

Supported languages
===================

- albanian

- arabic

- azeri

- bengali

- bulgarian

- cebuano

- croatian

- czech

- danish

- dutch

- english

- estonian

- farsi

- finnish

- french

- german

- hausa

- hawaiian

- hindi

- hungarian

- icelandic

- indonesian

- italian

- kazakh

- kyrgyz

- latin

- latvian

- lithuanian

- macedonian

- mongolian

- nepali

- norwegian

- pashto

- pidgin

- polish

- portuguese

- romanian

- russian

- serbian

- slovak

- slovene

- somali

- spanish

- swahili

- swedish

- tagalog

- turkish

- ukrainian

- urdu

- uzbek

- vietnamese

- welsh

Links
=====

Homepage
  http://pear.php.net/package/Text_LanguageDetect
Bug tracker
  http://pear.php.net/bugs/search.php?cmd=display&package_name[]=Text_LanguageDetect
Documentation
  http://pear.php.net/package/Text_LanguageDetect/docs
Unit test status
  https://travis-ci.org/pear/Text_LanguageDetect

  .. image:: https://travis-ci.org/pear/Text_LanguageDetect.svg?branch=master
     :target: https://travis-ci.org/pear/Text_LanguageDetect

Notes
=====

Where are the data from?
 I don't recall where I got the original data set.
 It's just the frequencies of 3-letter combinations in each supported language.
 It could be generated from a few random Wikipedia pages for each language.
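
To make "frequencies of 3-letter combinations" concrete, here is a minimal
sketch that counts overlapping trigrams in a UTF-8 string. It is not the
library's own data-generation code: the function name ``trigramFrequencies()``
and the normalization steps (lowercasing, collapsing non-letters to a single
space, padding with spaces) are assumptions made for illustration, and the
library's data files may have been built differently.

```php
<?php
// Hypothetical helper: count overlapping 3-character sequences in valid
// UTF-8 text and return them ordered from most to least frequent.
function trigramFrequencies(string $text): array
{
    // Lowercase; without the mbstring extension only ASCII letters are folded.
    $text = function_exists('mb_strtolower')
        ? mb_strtolower($text, 'UTF-8')
        : strtolower($text);

    // Collapse every run of non-letters into one space, then pad both ends
    // so that word boundaries contribute trigrams as well.
    $text = trim((string) preg_replace('/[^\p{L}]+/u', ' ', $text));
    $text = ' ' . $text . ' ';

    // Split into characters (not bytes) so multi-byte letters stay intact.
    $chars  = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $counts = array();
    for ($i = 0, $n = count($chars) - 2; $i < $n; $i++) {
        $trigram = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
        $counts[$trigram] = ($counts[$trigram] ?? 0) + 1;
    }

    arsort($counts); // most frequent trigrams first
    return $counts;
}

// Show the five most frequent trigrams of a short sample.
print_r(array_slice(trigramFrequencies('Das ist ein kleiner Text'), 0, 5, true));
```

Run over a larger body of text per language, counts like these could serve as
the kind of frequency profile described above.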
