{"id":13830877,"url":"https://github.com/bsolomon1124/pycld3","last_synced_at":"2025-04-05T15:08:49.141Z","repository":{"id":53867655,"uuid":"212926892","full_name":"bsolomon1124/pycld3","owner":"bsolomon1124","description":"Python3 bindings for the Compact Language Detector v3 (CLD3)","archived":false,"fork":false,"pushed_at":"2023-06-26T09:05:12.000Z","size":800,"stargazers_count":145,"open_issues_count":11,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-03-15T13:07:32.016Z","etag":null,"topics":["cld3","cpp","cython","language-detection","python3"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bsolomon1124.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-10-05T01:18:50.000Z","updated_at":"2024-03-09T06:29:48.000Z","dependencies_parsed_at":"2024-01-13T16:23:32.179Z","dependency_job_id":"965a2145-9ceb-409b-9cf0-b911b5a9de2f","html_url":"https://github.com/bsolomon1124/pycld3","commit_stats":{"total_commits":61,"total_committers":5,"mean_commits":12.2,"dds":0.2622950819672131,"last_synced_commit":"af6187d4020eafbe3f8517fa144072a8aa1d9bbc"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsolomon1124%2Fpycld3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsolomon1124%2Fpycld3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsolomon1124%2Fpycld3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/bsolomon1124%2Fpycld3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bsolomon1124","download_url":"https://codeload.github.com/bsolomon1124/pycld3/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247353746,"owners_count":20925329,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cld3","cpp","cython","language-detection","python3"],"created_at":"2024-08-04T10:01:11.273Z","updated_at":"2025-04-05T15:08:49.116Z","avatar_url":"https://github.com/bsolomon1124.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"# `pycld3`\n\nPython bindings to the Compact Language Detector v3 (CLD3).\n\n[![CircleCI](https://circleci.com/gh/bsolomon1124/pycld3.svg?style=svg)](https://circleci.com/gh/bsolomon1124/pycld3)\n[![License](https://img.shields.io/github/license/bsolomon1124/pycld3.svg)](https://github.com/bsolomon1124/pycld3/blob/master/LICENSE)\n[![PyPI](https://img.shields.io/pypi/v/pycld3.svg)](https://pypi.org/project/pycld3/)\n[![Wheel](https://img.shields.io/pypi/wheel/pycld3)](https://img.shields.io/pypi/wheel/pycld3)\n[![Status](https://img.shields.io/pypi/status/pycld3.svg)](https://pypi.org/project/pycld3/)\n[![Python](https://img.shields.io/pypi/pyversions/pycld3.svg)](https://pypi.org/project/pycld3)\n[![Implementation](https://img.shields.io/pypi/implementation/pycld3)](https://pypi.org/project/pycld3)\n\n## Newer Alternative: `gcld3`\n\n**Note**: Since the original publication of this `pycld3`, Google's `cld3` authors 
have published the Python package [gcld3](https://pypi.org/project/gcld3/), which provides official Python bindings built with [pybind](https://github.com/pybind/pybind11). Please check that project out as it is part of the canonical `cld3` repository and will likely stay in better lockstep with any `cld3` changes over time.\n\n## Overview\n\nThis package contains Python bindings (via Cython) to Google's [CLD3](https://github.com/google/cld3/) library.\n\n```python\n\u003e\u003e\u003e import cld3\n\u003e\u003e\u003e cld3.get_language(\"影響包含對氣候的變化以及自然資源的枯竭程度\")\nLanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)\n```\n\nThe library outputs BCP-47-style language codes. For some languages, output is differentiated by script. Language and script names are taken from Unicode CLDR. It supports over 100 languages/scripts. See the full list of [supported languages/scripts](https://github.com/google/cld3/blob/master/README.md#supported-languages) in Google's CLD3 documentation.\n\n## Installing with Wheels: Supported Versions and Platforms\n\nThis project supports **CPython versions 3.6 through 3.9.**\n\nWe publish [wheels](https://pypi.org/project/pycld3/#files) for the following matrix:\n\n- **MacOS**: CPython 3.6 through 3.9\n- **Linux**: CPython 3.6 through 3.9 ([manylinux1](https://www.python.org/dev/peps/pep-0513/#the-manylinux1-policy))\n\n\u003csup\u003eThe wheels for both MacOS and manylinux1 include the external protobuf library copied into the wheel itself\nvia [auditwheel](https://github.com/pypa/auditwheel) or\n[delocate](https://github.com/matthew-brett/delocate) so that you won't need to install any extra non-PyPI dependencies.\u003c/sup\u003e\n\nIf you are installing on one of the variants listed above, you should **not** need to have `protoc` or `libprotobuf` installed:\n\n```bash\npython -m pip install -U pycld3\n```\n\n## Installing from Source: Prerequisites\n\nIf you are not on a platform variant that is eligible to use a wheel, 
you may still be able to use `pycld3` via its [source distribution](https://docs.python.org/3/distutils/sourcedist.html) (`tar.gz`), but a bit more work is required to install.\nNamely, you'll also need:\n\n- the Protobuf compiler (the `protoc` executable)\n- the Protobuf development headers and `libprotoc` library\n- a compiler, preferably `g++`\n\nPlease consult [the official protobuf repository](https://github.com/protocolbuffers/protobuf) for information on installing Protobuf.\nThe project contains an [Installation README](https://github.com/protocolbuffers/protobuf/tree/master/src) that covers installation\non Windows and Unix.\n\nIf for whatever reason you are on a Unix host but unable to use the wheels (for instance, if you have an i686 architecture), here is a quick-and-dirty guide to installing.\n\n### Debian/Ubuntu\n\n```bash\nsudo apt-get update -y\nsudo apt-get install -y --no-install-recommends \\\n    g++ \\\n    protobuf-compiler \\\n    libprotobuf-dev\npython -m pip install -U pycld3\n```\n\n### Alpine Linux\n\n_Note_:\n[Alpine Linux does not support PyPI wheels](https://pythonspeed.com/articles/alpine-docker-python/)\nas of April 2020.  The steps below are mandatory on Alpine Linux because you will need\nto install from the source distribution.  
If the situation permits, using a Debian distro\nshould be much easier (and faster).\n\n```bash\napk --update add g++ protobuf protobuf-dev\npython -m pip install -U pycld3\n```\n\n### CentOS/RHEL\n\nInstall from source, as root/UID 0:\n\n```bash\nsudo su -\nset -ex\npushd /opt\nPROTOBUF_VERSION='3.11.4'\nyum update -y\nyum install -y autoconf automake gcc-c++ glibc-headers gzip libtool make python3-devel zlib-devel\ncurl -Lo /opt/protobuf.tar.gz \\\n    \"https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOBUF_VERSION}/protobuf-cpp-${PROTOBUF_VERSION}.tar.gz\"\ntar -xzvf protobuf.tar.gz\nrm -f protobuf.tar.gz\npushd \"protobuf-${PROTOBUF_VERSION}\"\n./configure --with-zlib --disable-debug \u0026\u0026 make \u0026\u0026 make install \u0026\u0026 ldconfig --verbose\npopd \u0026\u0026 rm -rf \"protobuf-${PROTOBUF_VERSION}\" \u0026\u0026 popd \u0026\u0026 set +ex\n\npython -m pip install -U pycld3\n```\n\nNote: the steps above are for CentOS 8.  For earlier versions, you may need to replace:\n\n- `gcc-c++` with `g++`\n- `python3-devel` with `python-devel`\n\n### MacOS/Homebrew\n\n```bash\nbrew update\nbrew upgrade protobuf || brew install -v protobuf\npython -m pip install -U pycld3\n```\n\n### Windows\n\nPlease consult Protobuf's\n[C++ Installation - Windows](https://github.com/protocolbuffers/protobuf/tree/master/src#c-installation---windows)\nsection for help with installing Protobuf on Windows.\n\nIf you would like to help contribute Windows wheels (preferably as a job within the project's\nCI/CD pipelines), please [file an issue](https://github.com/bsolomon1124/pycld3).\n\n## Usage\n\n`cld3` exports two module-level functions, `get_language()` and `get_frequent_languages()`:\n\n```python\n\u003e\u003e\u003e import cld3\n\n\u003e\u003e\u003e cld3.get_language(\"影響包含對氣候的變化以及自然資源的枯竭程度\")\nLanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)\n\n\u003e\u003e\u003e cld3.get_language(\"This is a 
test\")\nLanguagePrediction(language='en', probability=0.9999980926513672, is_reliable=True, proportion=1.0)\n\n\u003e\u003e\u003e for lang in cld3.get_frequent_languages(\n...     \"This piece of text is in English. Този текст е на Български.\",\n...     num_langs=3\n... ):\n...     print(lang)\n...\nLanguagePrediction(language='bg', probability=0.9173890948295593, is_reliable=True, proportion=0.5853658318519592)\nLanguagePrediction(language='en', probability=0.9999790191650391, is_reliable=True, proportion=0.4146341383457184)\n```\n\n## FAQ\n\n### `cld3` incorrectly detects my input.  How can I fix this?\n\nA first resort is to **preprocess (clean) your input text** based on conditions specific to your program.\n\nA salient example is to remove URLs and email addresses from the input.  **CLD3 (unlike [CLD2](https://github.com/CLD2Owners/cld2))\ndoes almost none of this cleaning for you**, in the spirit of not penalizing other users with overhead that they may not need.\n\nHere's such an example using a simplified URL regex from _Regular Expressions Cookbook, 2nd ed._:\n\n```python\n\u003e\u003e\u003e import re\n\u003e\u003e\u003e import cld3\n\n# cld3 does not ignore the URL components by default\n\u003e\u003e\u003e s = \"Je veux que: https://site.english.com/this/is/a/url/path/component#fragment\"\n\u003e\u003e\u003e cld3.get_language(s)\nLanguagePrediction(language='en', probability=0.5319557189941406, is_reliable=False, proportion=1.0)\n\n\u003e\u003e\u003e url_re = r\"\\b(?:https?://|www\\.)[a-z0-9-]+(\\.[a-z0-9-]+)+(?:[/?].*)?\"\n\u003e\u003e\u003e new_s = re.sub(url_re, \"\", s)\n\u003e\u003e\u003e new_s\n'Je veux que: '\n\u003e\u003e\u003e cld3.get_language(new_s)\nLanguagePrediction(language='fr', probability=0.9799421429634094, is_reliable=True, proportion=1.0)\n```\n\n\u003csup\u003e_Note_: This URL regex aims for simplicity.  
It requires a domain name, and doesn't allow a username or password; it allows the scheme\n(http or https) to be omitted if it can be inferred from the subdomain (www).  Source: _Regular Expressions Cookbook, 2nd ed._ - Goyvaerts \u0026 Levithan.\u003c/sup\u003e\n\n**In some other cases, you cannot fix the incorrect detection.**\nLanguage detection algorithms in general may perform poorly with very short inputs.\nRarely should you trust the output of something like `detect(\"hi\")`.  Keep this limitation in mind regardless\nof what library you are using.\n\nPlease remember that, at the end of the day, this project is just a Python wrapper to the CLD3 C++ library that does the actual heavy lifting.\n\n### I'm seeing an error during `pip` installation.  How can I fix this?\n\nFirst, please make sure you have read the [installation](#installing-with-wheels-supported-versions-and-platforms) section and that you have\ninstalled Protobuf if necessary.\n\nIf that doesn't help, please [file an issue](https://github.com/bsolomon1124/pycld3/issues) in this repository.\nThe build process for this project is somewhat complex because it involves both Cython and Protobuf, but I do my best\nto make it work everywhere possible.\n\n### Protobuf is installed, but I'm still seeing \"cannot open shared object file\"\n\nIf you've installed Protobuf, but are seeing an error such as:\n\n```\nImportError: libprotobuf.so.22: cannot open shared object file: No such file or directory\n```\n\nThis likely means that Python is not finding the `libprotobuf` shared object,\npossibly because `ldconfig` didn't do what it was supposed to.\nYou may need to tell it where to look.\n\nYou can find where the library sits via:\n\n```bash\n$ find /usr -name 'libprotoc.so' \\( -type l -o -type f \\)\n/usr/local/lib/libprotoc.so\n```\n\nThen, you can add the directory containing this file to `LD_LIBRARY_PATH`:\n\n```bash\nexport LD_LIBRARY_PATH=\"$(dirname $(find /usr -name 'libprotoc.so' \\( -type l -o -type f 
\\))):$LD_LIBRARY_PATH\"\n```\n\nYou can quickly test that this worked:\n\n```bash\n$ python -c 'import cld3; print(cld3.get_language(\"影響包含對氣候的變化以及自然資源的枯竭程度\"))'\nLanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)\n```\n\n### Authors\n\nThis repository contains a fork of [`google/cld3`](https://github.com/google/cld3/) at commit 06f695f.  The license for `google/cld3` can be found at\n[LICENSES/CLD3\\_LICENSE](https://github.com/bsolomon1124/pycld3/blob/master/LICENSES/CLD3_LICENSE).\n\nThis repository is a combination of changes [introduced](https://github.com/google/cld3/issues/15) by [various forks](https://github.com/google/cld3/network/members) of `google/cld3` by the following people:\n\n- Johannes Baiter ([@jbaiter](https://github.com/jbaiter))\n- Elizabeth Myers ([@Elizafox](https://github.com/Elizafox))\n- Witold Bołt ([@houp](https://github.com/houp))\n- Alfredo Luque ([@iamthebot](https://github.com/iamthebot))\n- WISESIGHT ([@wisesight](https://github.com/wisesight))\n- RNogales ([@RNogales94](https://github.com/RNogales94))\n- Brad Solomon ([@bsolomon1124](https://github.com/bsolomon1124))\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbsolomon1124%2Fpycld3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbsolomon1124%2Fpycld3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbsolomon1124%2Fpycld3/lists"}