{"id":13468035,"url":"https://github.com/Mimino666/langdetect","last_synced_at":"2025-03-26T03:31:27.466Z","repository":{"id":16942780,"uuid":"19704737","full_name":"Mimino666/langdetect","owner":"Mimino666","description":"Port of Google's language-detection library to Python.","archived":false,"fork":false,"pushed_at":"2024-01-24T10:11:21.000Z","size":1001,"stargazers_count":1722,"open_issues_count":66,"forks_count":198,"subscribers_count":26,"default_branch":"master","last_synced_at":"2024-10-29T15:06:24.519Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mimino666.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2014-05-12T15:44:46.000Z","updated_at":"2024-10-24T09:50:53.000Z","dependencies_parsed_at":"2024-03-16T15:31:50.523Z","dependency_job_id":"cdbcc696-7ee1-4cc8-8831-1c461f1ef312","html_url":"https://github.com/Mimino666/langdetect","commit_stats":{"total_commits":55,"total_committers":10,"mean_commits":5.5,"dds":"0.36363636363636365","last_synced_commit":"a1598f1afcbfe9a758cfd06bd688fbc5780177b2"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mimino666%2Flangdetect","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mimino666%2Flangdetect/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mimino666%2Flangdetect/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mimino666%2Flangde
tect/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mimino666","download_url":"https://codeload.github.com/Mimino666/langdetect/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242714048,"owners_count":20173583,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T15:01:04.505Z","updated_at":"2025-03-26T03:31:26.229Z","avatar_url":"https://github.com/Mimino666.png","language":"Python","funding_links":[],"categories":["Python","Data Processing","Feature Extraction","NLP"],"sub_categories":["Natural Language Processing","Text/NLP","Analysis"],"readme":"langdetect\n==========\n\n[![Build Status](https://travis-ci.org/Mimino666/langdetect.svg?branch=master)](https://travis-ci.org/Mimino666/langdetect)\n\nPort of Nakatani Shuyo's [language-detection](https://github.com/shuyo/language-detection) library (version from 03/03/2014) to Python.\n\n\nInstallation\n============\n\n    $ pip install langdetect\n\nSupported Python versions 2.7, 3.4+.\n\n\nLanguages\n=========\n\n``langdetect`` supports 55 languages out of the box ([ISO 639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)):\n\n    af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,\n    hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,\n    pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw\n\n\nBasic usage\n===========\n\nTo detect the language of the text:\n\n```python\n\u003e\u003e\u003e from langdetect import 
detect\n\u003e\u003e\u003e detect(\"War doesn't show who's right, just who's left.\")\n'en'\n\u003e\u003e\u003e detect(\"Ein, zwei, drei, vier\")\n'de'\n```\n\nTo find out the probabilities for the top languages:\n\n```python\n\u003e\u003e\u003e from langdetect import detect_langs\n\u003e\u003e\u003e detect_langs(\"Otec matka syn.\")\n[sk:0.572770823327, pl:0.292872522702, cs:0.134356653968]\n```\n\n**NOTE**\n\nThe language detection algorithm is non-deterministic: if you run it on text that is too short or too ambiguous, you might get different results every time you run it.\n\nTo enforce consistent results, call the following code before the first language detection:\n\n```python\nfrom langdetect import DetectorFactory\nDetectorFactory.seed = 0\n```\n\nHow to add new language?\n========================\n\nYou need to create a new language profile. The easiest way to do it is to use the [langdetect.jar](https://github.com/shuyo/language-detection/raw/master/lib/langdetect.jar) tool, which can generate language profiles from Wikipedia abstract database files or plain text.\n\nWikipedia abstract database files can be retrieved from \"Wikipedia Downloads\" ([http://download.wikimedia.org/](http://download.wikimedia.org/)). Their filenames have the form '(language code)wiki-(version)-abstract.xml' (e.g. 
'enwiki-20101004-abstract.xml').\n\nusage: ``java -jar langdetect.jar --genprofile -d [directory path] [language codes]``\n\n- Specify the directory that contains the abstract databases with the -d option.\n- This tool can handle gzip-compressed files.\n\nRemark: The database filenames for Chinese look like 'zhwiki-(version)-abstract-zh-cn.xml' or 'zhwiki-(version)-abstract-zh-tw.xml', so they must be renamed to 'zh-cnwiki-(version)-abstract.xml' or 'zh-twwiki-(version)-abstract.xml'.\n\nTo generate a language profile from plain text, use the genprofile-text command.\n\nusage: ``java -jar langdetect.jar --genprofile-text -l [language code] [text file path]``\n\nFor more details see [language-detection Wiki](https://code.google.com/archive/p/language-detection/wikis/Tools.wiki).\n\n\nOriginal project\n================\n\nThis library is a direct port of Google's [language-detection](https://code.google.com/p/language-detection/) library from Java to Python. All the classes and methods are unchanged, so for more information see the project's website or wiki.\n\nPresentation of the language detection algorithm: [http://www.slideshare.net/shuyo/language-detection-library-for-java](http://www.slideshare.net/shuyo/language-detection-library-for-java).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMimino666%2Flangdetect","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMimino666%2Flangdetect","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMimino666%2Flangdetect/lists"}