{"id":16696946,"url":"https://github.com/seratch/juniversalchardet","last_synced_at":"2025-04-05T03:43:39.044Z","repository":{"id":36886010,"uuid":"41193001","full_name":"seratch/juniversalchardet","owner":"seratch","description":"Automatically exported from code.google.com/p/juniversalchardet","archived":false,"fork":false,"pushed_at":"2015-08-22T05:52:29.000Z","size":668,"stargazers_count":1,"open_issues_count":17,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-10T12:12:37.070Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seratch.png","metadata":{"files":{"readme":"readme.txt","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-08-22T05:49:57.000Z","updated_at":"2019-03-04T06:54:53.000Z","dependencies_parsed_at":"2022-09-11T23:41:48.683Z","dependency_job_id":null,"html_url":"https://github.com/seratch/juniversalchardet","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seratch%2Fjuniversalchardet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seratch%2Fjuniversalchardet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seratch%2Fjuniversalchardet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seratch%2Fjuniversalchardet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seratch","download_url":"https://codeload.github.com/seratch/juniversalchardet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"
https://github.com","kind":"github","repositories_count":247284918,"owners_count":20913691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-12T17:45:22.749Z","updated_at":"2025-04-05T03:43:39.019Z","avatar_url":"https://github.com/seratch.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"juniversalchardet\r\n\r\n\r\n1. What is it?\r\n\r\njuniversalchardet is a Java port of \"universalchardet\",\r\nMozilla's encoding detector library.\r\n\r\nThe original code of universalchardet is available at\r\nhttp://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/\r\n\r\nThe techniques used by universalchardet are described at\r\nhttp://www.mozilla.org/projects/intl/UniversalCharsetDetection.html\r\n\r\n\r\n2. Encodings that can be detected\r\n\r\n- Chinese\r\n  - ISO-2022-CN\r\n  - BIG-5\r\n  - EUC-TW\r\n  - GB18030\r\n  - HZ-GB-2312\r\n\r\n- Cyrillic\r\n  - ISO-8859-5\r\n  - KOI8-R\r\n  - WINDOWS-1251\r\n  - MACCYRILLIC\r\n  - IBM866\r\n  - IBM855\r\n\r\n- Greek\r\n  - ISO-8859-7\r\n  - WINDOWS-1253\r\n\r\n- Hebrew\r\n  - ISO-8859-8\r\n  - WINDOWS-1255\r\n\r\n- Japanese\r\n  - ISO-2022-JP\r\n  - Shift_JIS\r\n  - EUC-JP\r\n\r\n- Korean\r\n  - ISO-2022-KR\r\n  - EUC-KR\r\n\r\n- Unicode\r\n  - UTF-8\r\n  - UTF-16BE / UTF-16LE\r\n  - UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-3412 / X-ISO-10646-UCS-4-2143\r\n\r\n- Others\r\n  - WINDOWS-1252\r\n\r\nAll supported encodings are listed in org.mozilla.universalchardet.Constants.\r\n\r\n\r\n3. 
How to use it\r\n\r\n(1) Construct an instance of org.mozilla.universalchardet.UniversalDetector.\r\n(2) Feed some data (typically a few thousand bytes) to the detector\r\n    using UniversalDetector.handleData().\r\n(3) Notify the detector of the end of data by using\r\n    UniversalDetector.dataEnd().\r\n(4) Get the detected encoding name by using\r\n    UniversalDetector.getDetectedCharset().\r\n(5) Don't forget to call UniversalDetector.reset() before you reuse\r\n    the detector instance for another guess.\r\n\r\n\r\n------------ Sample Code ------------\r\nimport org.mozilla.universalchardet.UniversalDetector;\r\n\r\npublic class TestDetector\r\n{\r\n  public static void main(String[] args) throws java.io.IOException\r\n  {\r\n    byte[] buf = new byte[4096];\r\n    java.io.FileInputStream fis = new java.io.FileInputStream(\"test.txt\");\r\n\r\n    // (1)\r\n    UniversalDetector detector = new UniversalDetector(null);\r\n\r\n    // (2)\r\n    int nread;\r\n    while ((nread = fis.read(buf)) \u003e 0 \u0026\u0026 !detector.isDone()) {\r\n      detector.handleData(buf, 0, nread);\r\n    }\r\n    fis.close();\r\n\r\n    // (3)\r\n    detector.dataEnd();\r\n\r\n    // (4)\r\n    String encoding = detector.getDetectedCharset();\r\n    if (encoding != null) {\r\n      System.out.println(\"Detected encoding = \" + encoding);\r\n    } else {\r\n      System.out.println(\"No encoding detected.\");\r\n    }\r\n\r\n    // (5)\r\n    detector.reset();\r\n  }\r\n}\r\n\r\n\r\n4. Related Works\r\n\r\n- jchardet  http://jchardet.sourceforge.net/\r\n\r\njchardet is another Java port of Mozilla's encoding detection library.\r\nThe main difference between jchardet and juniversalchardet is the modules\r\nthey are based on. jchardet is based on the \"chardet\" module that has\r\nlong existed. juniversalchardet is based on the \"universalchardet\" module\r\nthat is newer and generally provides more accurate detection results.\r\n\r\n\r\n5. 
License\r\n\r\nThe library is subject to the Mozilla Public License Version 1.1.\r\nAlternatively, the library may be used under the terms of either\r\nthe GNU General Public License Version 2 or later, or the GNU\r\nLesser General Public License 2.1 or later.\r\n\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseratch%2Fjuniversalchardet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseratch%2Fjuniversalchardet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseratch%2Fjuniversalchardet/lists"}