{"id":28626200,"url":"https://github.com/commoncrawl/language-detection-cld2","last_synced_at":"2025-06-12T08:41:11.735Z","repository":{"id":54460839,"uuid":"138736671","full_name":"commoncrawl/language-detection-cld2","owner":"commoncrawl","description":"Natural language detection, Java bindings for CLD2","archived":false,"fork":false,"pushed_at":"2024-11-09T17:07:23.000Z","size":141,"stargazers_count":14,"open_issues_count":2,"forks_count":2,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-11-09T18:18:55.078Z","etag":null,"topics":["language-detection","language-identification","natural-language"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/commoncrawl.png","metadata":{"files":{"readme":"README.deezer-weslang","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-26T12:44:07.000Z","updated_at":"2024-11-09T17:07:27.000Z","dependencies_parsed_at":"2024-11-09T18:18:09.855Z","dependency_job_id":"a11d1e25-7f9c-4d1e-a92f-c7849d857c16","html_url":"https://github.com/commoncrawl/language-detection-cld2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/commoncrawl/language-detection-cld2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Flanguage-detection-cld2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Flanguage-detection-cld2/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Flanguage-detection-cld2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Flanguage-detection-cld2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/commoncrawl","download_url":"https://codeload.github.com/commoncrawl/language-detection-cld2/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Flanguage-detection-cld2/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259432329,"owners_count":22856727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["language-detection","language-identification","natural-language"],"created_at":"2025-06-12T08:41:08.937Z","updated_at":"2025-06-12T08:41:11.725Z","avatar_url":"https://github.com/commoncrawl.png","language":"Java","funding_links":[],"categories":["人工智能"],"sub_categories":["自然语言处理"],
"readme":"This is a Java wrapper, built with JNA, for the CLD2 library\n(https://code.google.com/p/cld2/).\n\nInitially, the classes CLDHints and Cld2Library were generated automatically\nusing jnaerator (https://code.google.com/p/jnaerator/). To make this work, we had\nto remove the include of \u003cvector\u003e, because it crashed jnaerator.\n\nWe then ran the following command:\n$ java -jar jnaerator-0.12-20140604.001151-54-shaded.jar -library Cld2 ~/language_detection/cld2/internal/generated_language.h ~/language_detection/cld2/public/encodings.h ~/language_detection/cld2/public/compact_lang_det.h -o . -v -noJar -noComp -runtime JNA -f -noComments\n\nNext, running:\n$ nm libcld2_full.so\ngave us the mangled C++ symbol names, which we substituted into Cld2Library.\nThis was needed to work around bug\nhttps://github.com/ochafik/nativelibs4java/issues/515.\n\nWe also removed a lot of unneeded content and made the class protected.\nNote also that we changed the signature of ExtDetectLanguagesummary to use\narrays. We keep only this one function, since the others can easily be\nimplemented on top of it.\n\nWe also moved both enumerations into their own classes.\n\nIn CLDHints, we replaced the pointers with strings and removed the references to\nbyreference and bypointer.\n\nIMPORTANT: these bindings have only been tested on linux-x86-64; other\noperating systems and variants are not explicitly supported.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcommoncrawl%2Flanguage-detection-cld2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcommoncrawl%2Flanguage-detection-cld2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcommoncrawl%2Flanguage-detection-cld2/lists"}