https://github.com/pear/text_languagedetect
PHP library to identify human languages from text samples.
- Host: GitHub
- URL: https://github.com/pear/text_languagedetect
- Owner: pear
- Created: 2012-04-13T05:41:24.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2023-02-27T20:55:38.000Z (over 2 years ago)
- Last Synced: 2025-05-07T15:08:07.164Z (5 months ago)
- Topics: detect-language, languages, php
- Language: PHP
- Homepage: http://pear.php.net/package/Text_LanguageDetect
- Size: 308 KB
- Stars: 50
- Watchers: 17
- Forks: 18
- Open Issues: 1
Metadata Files:
- Readme: README.rst
README
*******************
Text_LanguageDetect
*******************
PHP library to identify human languages from text samples.
Returns confidence scores for each.

Installation
============

PEAR
----
::

    $ pear install Text_LanguageDetect

Composer
--------
::

    $ composer require pear/text_languagedetect

Usage
=====
Also see the examples in the ``docs/`` directory and
the `official documentation`__.

__ http://pear.php.net/package/Text_LanguageDetect/docs

Language detection
------------------
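Simple language detection::

    <?php
    require_once 'Text/LanguageDetect.php';
    //sample text, reused from the language code example in this README
    $text = 'Das ist ein kleiner Text';
    $ld = new Text_LanguageDetect();
    $language = $ld->detectSimple($text);
    echo $language;
    //output: german

Show the three most probable languages with their confidence score::

    <?php
    require_once 'Text/LanguageDetect.php';
    $text = 'Das ist ein kleiner Text';
    $ld = new Text_LanguageDetect();
    //3 most probable languages
    $results = $ld->detect($text, 3);
    foreach ($results as $language => $confidence) {
        echo $language . ': ' . number_format($confidence, 2) . "\n";
    }

    //example output (scores vary with the input text):
    //german: 0.35
    //dutch: 0.25
    //swedish: 0.20
    ?>

Language code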
-------------
Instead of returning the full language name, ISO 639-1 two-letter and
ISO 639-2 three-letter codes can be returned::

    <?php
    require_once 'Text/LanguageDetect.php';
    $ld = new Text_LanguageDetect();

    //will output the ISO 639-1 two-letter language code
    // "de"
    $ld->setNameMode(2);
    echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";

    //will output the ISO 639-2 three-letter language code
    // "deu"
    $ld->setNameMode(3);
    echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";
    ?>

Supported languages
===================
- albanian
- arabic
- azeri
- bengali
- bulgarian
- cebuano
- croatian
- czech
- danish
- dutch
- english
- estonian
- farsi
- finnish
- french
- german
- hausa
- hawaiian
- hindi
- hungarian
- icelandic
- indonesian
- italian
- kazakh
- kyrgyz
- latin
- latvian
- lithuanian
- macedonian
- mongolian
- nepali
- norwegian
- pashto
- pidgin
- polish
- portuguese
- romanian
- russian
- serbian
- slovak
- slovene
- somali
- spanish
- swahili
- swedish
- tagalog
- turkish
- ukrainian
- urdu
- uzbek
- vietnamese
- welsh

Links
=====
Homepage
  http://pear.php.net/package/Text_LanguageDetect
Bug tracker
  http://pear.php.net/bugs/search.php?cmd=display&package_name[]=Text_LanguageDetect
Documentation
  http://pear.php.net/package/Text_LanguageDetect/docs
Unit test status
  https://travis-ci.org/pear/Text_LanguageDetect

  .. image:: https://travis-ci.org/pear/Text_LanguageDetect.svg?branch=master
     :target: https://travis-ci.org/pear/Text_LanguageDetect

Notes
=====
Where are the data from?

I don't recall where I got the original data set.
It's just the frequencies of 3-letter combinations in each supported language.
It could be generated from a few random Wikipedia pages from each language.
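
To make the idea of "frequencies of 3-letter combinations" concrete, here is a minimal, self-contained sketch of how such a profile could be built from a text and compared against a sample. It is an illustration only: the function names (``trigramCounts``, ``rankProfile``, ``profileDistance``, ``guessLanguage``) are hypothetical and not part of Text_LanguageDetect, the rank-based "out-of-place" distance is one common way to compare trigram profiles rather than a description of the library's internals, and the two tiny reference texts stand in for the much larger data set the library ships with. The sketch assumes the ``mbstring`` extension is available.

```php
<?php
// Hypothetical helpers for illustration; not the Text_LanguageDetect API.

/** Count overlapping 3-character sequences in a text sample. */
function trigramCounts(string $text): array
{
    // Lowercase, then collapse every run of non-letters into one space so
    // that word boundaries become part of the trigrams.
    $clean = preg_replace('/\P{L}+/u', ' ', mb_strtolower($text, 'UTF-8')) ?? '';
    $clean = ' ' . trim($clean) . ' ';

    $counts = [];
    $length = mb_strlen($clean, 'UTF-8');
    for ($i = 0; $i + 3 <= $length; $i++) {
        $trigram = mb_substr($clean, $i, 3, 'UTF-8');
        $counts[$trigram] = ($counts[$trigram] ?? 0) + 1;
    }
    return $counts;
}

/** Map each trigram to its rank; the most frequent trigram gets rank 0. */
function rankProfile(array $counts, int $limit = 300): array
{
    $trigrams = array_keys($counts);
    // Sort by count (descending); break ties alphabetically so the
    // result is deterministic.
    usort($trigrams, function ($a, $b) use ($counts) {
        return $counts[$b] <=> $counts[$a] ?: strcmp($a, $b);
    });
    return array_flip(array_slice($trigrams, 0, $limit));
}

/** Sum of rank differences; trigrams missing from the reference cost the most. */
function profileDistance(array $sample, array $reference): int
{
    $maxPenalty = count($reference);
    $distance = 0;
    foreach ($sample as $trigram => $rank) {
        $distance += isset($reference[$trigram])
            ? abs($rank - $reference[$trigram])
            : $maxPenalty;
    }
    return $distance;
}

/** Return the language whose reference profile is closest to the sample. */
function guessLanguage(string $text, array $profiles): ?string
{
    $sample = rankProfile(trigramCounts($text));
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($profiles as $language => $reference) {
        $distance = profileDistance($sample, $reference);
        if ($distance < $bestDistance) {
            $best = $language;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Toy reference profiles; a real data set would use far more text.
$profiles = [
    'english' => rankProfile(trigramCounts('this is a small text and the text is short')),
    'german'  => rankProfile(trigramCounts('das ist ein kleiner text und der text ist kurz')),
];

echo guessLanguage('ein kleiner text', $profiles) . "\n"; // prints "german"
```

With reference texts this small the result is only meaningful for samples that closely resemble them; generating the profiles from a few longer pages per language, as suggested above, would make the comparison far more robust.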