https://github.com/glavo/java-chardet
- Host: GitHub
- URL: https://github.com/glavo/java-chardet
- Owner: Glavo
- Created: 2024-04-29T20:10:33.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-04-30T22:26:08.000Z (7 months ago)
- Last Synced: 2024-05-02T14:24:02.492Z (7 months ago)
- Language: Java
- Size: 476 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Java Chardet
[![Latest release](https://img.shields.io/maven-central/v/org.glavo/chardet)](https://github.com/Glavo/java-chardet/releases/latest)
This library is a fork of [albfernandez/juniversalchardet](https://github.com/albfernandez/juniversalchardet),
based on commit [ff74981](https://github.com/albfernandez/juniversalchardet/commit/ff7498139012dfc82e2b6c0a8eb257a9c1fd657f).

The purpose of this library is to detect the encoding of text whose encoding is unknown.
Note: This library is in beta, and future releases may include breaking API changes.
## Differences from upstream
The main difference between this library and upstream (and my motivation for creating this fork)
is that all APIs are based on `ByteBuffer` instead of `byte[]`, so this library can handle off-heap memory directly.

Of course, I've also provided `byte[]`-based shorthands for these APIs, so working with `byte[]` isn't any more cumbersome.
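
To illustrate why a `ByteBuffer`-based API matters for off-heap memory, here is a minimal sketch that uses only the JDK and none of this library's classes. The class `OffHeapBufferDemo` and the helper `copyToHeap` are hypothetical names introduced for illustration. The sketch shows that a direct (off-heap) buffer exposes no backing array, so passing its contents to a `byte[]`-only API would require a copy like the one below.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapBufferDemo {
    /**
     * Hypothetical helper: copies the remaining bytes of any buffer into a new
     * heap array. This is the extra step a byte[]-only API would force on
     * callers that hold off-heap data.
     */
    public static byte[] copyToHeap(ByteBuffer buffer) {
        byte[] copy = new byte[buffer.remaining()];
        // duplicate() shares the content but has its own position,
        // so the caller's buffer is left untouched.
        buffer.duplicate().get(copy);
        return copy;
    }

    public static void main(String[] args) {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);

        // Heap buffer: backed by an accessible byte[].
        ByteBuffer heap = ByteBuffer.wrap(text);

        // Direct buffer: memory lives outside the Java heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(text.length);
        direct.put(text);
        direct.flip();

        System.out.println("heap.hasArray()   = " + heap.hasArray());
        System.out.println("direct.hasArray() = " + direct.hasArray());
        System.out.println("copied bytes      = " + copyToHeap(direct).length);
    }
}
```

An API that accepts `ByteBuffer` can read the direct buffer in place and skip this copy.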
In addition, I did some cleanup of the library.
A more important difference is that this library no longer uses `String` to represent an encoding;
instead, [DetectedCharset](src/main/java/org/glavo/chardet/DetectedCharset.java) is used.
You can convert `DetectedCharset` to Java's `java.nio.charset.Charset` easily:

```java
DetectedCharset result = UniversalDetector.detectCharset(Paths.get("testfile.txt"));
Charset charset = result != null ? result.getCharset() : StandardCharsets.UTF_8;
```

The reason for not using `Charset` directly is that this library supports detection of some encodings that Java does not support (e.g. `HZ-GB-2312`).
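
The gap between "detected" and "decodable" encodings can be sketched with the JDK alone. The class `CharsetFallbackDemo` and the helper `resolveOrDefault` below are hypothetical names, not part of this library, and the sketch makes no claim about how `DetectedCharset` handles unsupported encodings internally. It only shows how an encoding name may fail to map to a `Charset` on a given JVM, and one way a caller might fall back.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackDemo {
    /**
     * Hypothetical helper: returns the Charset for the given name if the
     * running JVM supports it, otherwise returns the supplied fallback.
     */
    public static Charset resolveOrDefault(String name, Charset fallback) {
        if (name == null) {
            return fallback;
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : fallback;
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Which names resolve depends on the charset providers installed in the JVM.
        for (String name : new String[] {"UTF-8", "GB18030", "HZ-GB-2312"}) {
            Charset resolved = resolveOrDefault(name, StandardCharsets.UTF_8);
            System.out.println(name + " -> " + resolved
                    + " (supported: " + Charset.isSupported(name) + ")");
        }
    }
}
```

A name-only representation would leave each caller to repeat this check, which is one argument for a dedicated result type.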
There are some other minor cleanups and fixes to this library. I plan to submit some patches to upstream in the future.
## Adding the library to your build
Maven:
```xml
<dependency>
    <groupId>org.glavo</groupId>
    <artifactId>chardet</artifactId>
    <version>2.4.0-beta1</version>
</dependency>
```
Gradle:
```kotlin
implementation("org.glavo:chardet:2.4.0-beta1")
```

## License
The library is subject to the Mozilla Public License Version 1.1.
Alternatively, the library may be used under the terms of either the GNU General Public License Version 2 or later,
or the GNU Lesser General Public License 2.1 or later.