{"id":37030462,"url":"https://github.com/kju2/language-detector","last_synced_at":"2026-01-14T03:42:47.160Z","repository":{"id":57733846,"uuid":"155407768","full_name":"kju2/language-detector","owner":"kju2","description":"Language Detection Library for Java","archived":true,"fork":true,"pushed_at":"2019-03-23T13:00:12.000Z","size":3419,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-07-28T10:09:48.814Z","etag":null,"topics":["java","language-detection","library"],"latest_commit_sha":null,"homepage":"https://kju2.github.io/language-detector/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"optimaize/language-detector","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kju2.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-30T15:19:19.000Z","updated_at":"2023-04-16T15:58:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/kju2/language-detector","commit_stats":null,"previous_names":[],"tags_count":9,"template":null,"template_full_name":null,"purl":"pkg:github/kju2/language-detector","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kju2%2Flanguage-detector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kju2%2Flanguage-detector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kju2%2Flanguage-detector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kju2%2Flanguage-detector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kju2","download_url":"https://codeload.github.com/kju2/lang
uage-detector/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kju2%2Flanguage-detector/sbom","scorecard":{"id":562478,"data":{"date":"2025-08-11","repo":{"name":"github.com/kju2/language-detector","commit":"17de8a46916dabf7f26ab32b530478dafa3c5579"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"project is archived","details":["Warn: Repository is archived."],"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-20T14:06:22.049Z","repository_id":57733846,"created_at":"2025-08-20T14:06:22.049Z","updated_at":"2025-08-20T14:06:22.049Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28408858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T01:52:23.358Z","status":"online","status_checked_at":"2026-01-14T02:00:06.678Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","language-detection","library"],"created_at":"2026-01-14T03:42:46.545Z","updated_at":"2026-01-14T03:42:47.152Z","avatar_url":"https://github.com/kju2.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# language-detector\n\n## Try it out\n\n```bash\ncurl -X POST https://k18vr57k5l.execute-api.eu-central-1.amazonaws.com/default/languageDetectorV1 \\\n  -H 'x-api-key: eFU71njyBpYOE0O6NUqW5mk7ZtmpYYO5fCe7s8u3' \\\n  -d '\"Now, here, you see, it takes all the running you can do, to keep in the same place.\"'\n```\n\n## How to Use\n\nAdd the dependency to your \"pom.xml\":\n\n```xml\n\u003cdependency\u003e\n    
\t\u003cgroupId\u003eio.github.kju2.languagedetector\u003c/groupId\u003e\n\t\u003cartifactId\u003elanguage-detector\u003c/artifactId\u003e\n\t\u003cversion\u003e1.0.3\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nThen create an instance of the LanguageDetector and feed it some text:\n\n```java\nLanguageDetector detector = new LanguageDetector();\n\nString text = \"This is the text you want to know the language it is written in.\";\nLanguage detectedLanguage = detector.detectPrimaryLanguageOf(text);\n```\n\n## Language Support\n\n### 68 Built-in Language Profiles\n\n1. AFRIKAANS (af)\n2. ALBANIAN (sq)\n3. ARABIC (ar)\n4. ARAGONESE (an)\n5. BASQUE (eu)\n6. BELARUSIAN (be)\n7. BENGALI (bn)\n8. BRETON (br)\n9. BULGARIAN (bg)\n10. CATALAN (ca)\n11. CENTRAL_KHMER (km)\n12. CHINESE (zh)\n13. CROATIAN (hr)\n14. CZECH (cs)\n15. DANISH (da)\n16. DUTCH (nl)\n17. ENGLISH (en)\n18. ESTONIAN (et)\n19. FINNISH (fi)\n20. FRENCH (fr)\n21. GALICIAN (gl)\n22. GERMAN (de)\n23. GREEK (el)\n24. GUJARATI (gu)\n25. HAITIAN (ht)\n26. HEBREW (he)\n27. HINDI (hi)\n28. HUNGARIAN (hu)\n29. ICELANDIC (is)\n30. INDONESIAN (id)\n31. IRISH (ga)\n32. ITALIAN (it)\n33. JAPANESE (ja)\n34. KANNADA (kn)\n35. KOREAN (ko)\n36. LATVIAN (lv)\n37. LITHUANIAN (lt)\n38. MACEDONIAN (mk)\n39. MALAY (ms)\n40. MALAYALAM (ml)\n41. MALTESE (mt)\n42. MARATHI (mr)\n43. NEPALI (ne)\n44. NORWEGIAN (no)\n45. OCCITAN (oc)\n46. PANJABI (pa)\n47. PERSIAN (fa)\n48. POLISH (pl)\n49. PORTUGUESE (pt)\n50. ROMANIAN (ro)\n51. RUSSIAN (ru)\n52. SERBIAN (sr)\n53. SLOVAK (sk)\n54. SLOVENIAN (sl)\n55. SOMALI (so)\n56. SPANISH (es)\n57. SWAHILI (sw)\n58. SWEDISH (sv)\n59. TAGALOG (tl)\n60. TAMIL (ta)\n61. TELUGU (te)\n62. THAI (th)\n63. TURKISH (tr)\n64. UKRAINIAN (uk)\n65. URDU (ur)\n66. VIETNAMESE (vi)\n67. WELSH (cy)\n68. 
YIDDISH (yi)\n\n### Other Languages\n\nYou can easily create a language profile for your own language.\nSee https://github.com/optimaize/language-detector/blob/master/src/main/resources/README.md\n\n\n## How it Works\n\nThe software uses language profiles that were built from common text in each language.\nN-grams (http://en.wikipedia.org/wiki/N-gram) were extracted from that text and stored in the profiles.\n\nTo determine the language of an input text, the program goes through the same process:\nit extracts the same kind of n-grams from the input text, compares their relative frequencies with those in each profile, and picks the\nlanguage that matches best.\n\n\n### Challenges\n\nThis software works less well when the input text is short or unclean, for example tweets.\n\nWhen a text is written in multiple languages, the default algorithm of this software is not appropriate.\nYou can try to split the text (by sentence or paragraph) and detect the language of each part. Running the language guesser\non the whole text will, at best, tell you only the most dominant language.\n\nThis software does not handle input text well when it is written in none of the expected (and supported) languages.\nFor example, if you only load the language profiles for English and German, but the text is written in French,\nthe program may pick the more likely of the two, or say it doesn't know. (An improvement would be to clearly detect that\nthe text is unlikely to be in one of the supported languages.)\n\nIf you are looking for a language detector / language guesser library in Java, this seems to be the best open-source\nlibrary you can get at this time. If it doesn't need to be Java, you may want to take a look at https://code.google.com/p/cld2/\n\n## How You Can Help\n\nIf your language is not supported yet, you can provide clean \"training text\", that is, common text written in your\nlanguage. 
The text should be fairly long (a couple of pages at the very least). If you can provide that, please open\na ticket.\n\nIf your language is already supported but not always identified correctly, you can still provide such training\ntext. We might then be able to improve detection for your language.\n\nIf you're a programmer, dig into the source and see what you can improve. Check the open tasks.\n\n\n## History and Changes\n\nThis project is a fork of a fork; the original author is Nakatani Shuyo.\nFor details, see https://github.com/optimaize/language-detector/wiki/History-and-Changes\n\n## License\n\nApache 2 (business friendly)\n\n\n## Authors\n\nNakatani Shuyo, Fabian Kessler, Francois ROLAND, Robert Theis, Kju2\n\nFor details, see https://github.com/optimaize/language-detector/wiki/Authors\n\n## References\n\n\n### Research Papers and Articles on Language Identification\n- [Automatic Language Identification in Texts: A Survey (2018)](https://arxiv.org/abs/1804.08186)\n\n\n### Libraries for Language Identification\n- Compact Language Detector 2 ([C++](https://github.com/CLD2Owners/cld2), 83 languages supported) uses a Naïve Bayes classifier for language identification. The library can handle HTML and is optimized for texts with 200 characters.\n- Compact Language Detector 3 ([C++](https://github.com/google/cld3), ? languages supported) is used in Chromium and is based on a neural network.\n- Langid ([C](https://github.com/saffsd/langid.c), [JavaScript](https://github.com/saffsd/langid.js), [Python](https://github.com/saffsd/langid.py)) is a library for language detection with models for 97 languages included. 
It is based on [Langid.py: an off-the-shelf language identification tool (2012)](https://dl.acm.org/citation.cfm?id=2390475).\n- Language-Detection ([Java](https://github.com/shuyo/language-detection), [Wiki](https://code.google.com/archive/p/language-detection/), [Slides](https://www.slideshare.net/shuyo/language-detection-library-for-java)) for 53 languages. This is the origin of this library and is unfortunately no longer maintained.\n\n\n### Web Services for Language Identification\n- [API Layer](https://apilayer.com)\n- [Cortical.io](http://www.cortical.io/detect-language.html)\n- [Detect Language](https://detectlanguage.com)\n- [Google Translation API](https://cloud.google.com/translate/docs/detecting-language)\n- [Microsoft Azure Cognitive Services](https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-language-detection)\n- [UClassify](https://uclassify.com/browse/uclassify/language-detector)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkju2%2Flanguage-detector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkju2%2Flanguage-detector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkju2%2Flanguage-detector/lists"}