
# Java Chardet

[![Latest release](https://img.shields.io/maven-central/v/org.glavo/chardet)](https://github.com/Glavo/java-chardet/releases/latest)

This library is a fork of [albfernandez/juniversalchardet](https://github.com/albfernandez/juniversalchardet),
based on commit [ff74981](https://github.com/albfernandez/juniversalchardet/commit/ff7498139012dfc82e2b6c0a8eb257a9c1fd657f).

The purpose of this library is to detect the character encoding of text whose encoding is unknown.

Note: This library is in beta, and future releases may introduce breaking API changes.

## Differences from upstream

The main difference between this library and upstream (and my motivation for creating this fork)
is that all APIs are based on `ByteBuffer` instead of `byte[]`, so this library can handle off-heap memory directly.

Of course, I've also provided `byte[]` based shorthands for these APIs, so working with `byte[]` isn't any more cumbersome.
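
To illustrate why a `ByteBuffer`-based API covers both cases, here is a minimal sketch that uses only the JDK. It is not this library's code: the class `ByteBufferApiSketch` and the method `countNonAscii` are hypothetical names invented for the example. The core routine takes a `ByteBuffer`, so it accepts a direct (off-heap) buffer as-is, and the `byte[]` shorthand simply wraps the array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of org.glavo.chardet.
final class ByteBufferApiSketch {

    /** Counts bytes with the high bit set between the buffer's position and limit. */
    static int countNonAscii(ByteBuffer buffer) {
        // duplicate() shares the content but has its own position and limit,
        // so the caller's buffer state is left untouched.
        ByteBuffer view = buffer.duplicate();
        int count = 0;
        while (view.hasRemaining()) {
            if (view.get() < 0) { // Java bytes are signed: negative means >= 0x80
                count++;
            }
        }
        return count;
    }

    /** byte[] shorthand: wraps the array (no copy) and delegates to the ByteBuffer version. */
    static int countNonAscii(byte[] bytes) {
        return countNonAscii(ByteBuffer.wrap(bytes));
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        // Off-heap buffer: the same routine reads it without copying to a byte[].
        ByteBuffer offHeap = ByteBuffer.allocateDirect(data.length);
        offHeap.put(data);
        offHeap.flip();

        System.out.println(countNonAscii(offHeap));
        System.out.println(countNonAscii(data));
    }
}
```

With this shape, supporting `byte[]` costs one extra overload, while an API built only on `byte[]` would force callers holding off-heap data to copy it onto the heap first.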

I also cleaned up parts of the library.
Another important difference is that this library no longer uses `String` to represent an encoding;
instead, [DetectedCharset](src/main/java/org/glavo/chardet/DetectedCharset.java) is used.
You can easily convert a `DetectedCharset` to a `java.nio.charset.Charset`:

```java
// Detect the encoding of a file.
DetectedCharset result = UniversalDetector.detectCharset(Paths.get("testfile.txt"));
// Fall back to UTF-8 when the result is null.
Charset charset = result != null ? result.getCharset() : StandardCharsets.UTF_8;
```

`Charset` is not used directly because this library can detect some encodings that Java does not support (e.g. `HZ-GB-2312`).
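
The following sketch, which uses only the JDK, shows why a separate result type can be useful. It is an assumption-laden illustration, not the actual `DetectedCharset` implementation: the class `DetectedEncodingSketch` and its methods are hypothetical. The idea is that an encoding name can always be carried around, while a `Charset` object exists only if the running JVM supports that name.

```java
import java.nio.charset.Charset;

// Hypothetical illustration only; the real DetectedCharset may work differently.
final class DetectedEncodingSketch {
    private final String name;

    DetectedEncodingSketch(String name) {
        this.name = name;
    }

    /** The detected encoding name; always available, even without JVM support. */
    String getName() {
        return name;
    }

    /** Returns the matching Charset, or null if this JVM does not provide one. */
    Charset getCharsetOrNull() {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names.
            return null;
        }
    }

    public static void main(String[] args) {
        // Whether HZ-GB-2312 is available depends on the JVM and installed charset providers.
        for (String n : new String[] {"UTF-8", "HZ-GB-2312"}) {
            DetectedEncodingSketch detected = new DetectedEncodingSketch(n);
            System.out.println(detected.getName() + " -> " + detected.getCharsetOrNull());
        }
    }
}
```

A plain `Charset` return type could not express a detection result for which the JVM has no `Charset`, which is the gap a dedicated type fills.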

There are some other minor cleanups and fixes to this library. I plan to submit some patches to upstream in the future.

## Adding the library to your build

Maven:
```xml
<dependency>
  <groupId>org.glavo</groupId>
  <artifactId>chardet</artifactId>
  <version>2.4.0-beta1</version>
</dependency>
```

Gradle:
```kotlin
implementation("org.glavo:chardet:2.4.0-beta1")
```

## License

The library is subject to the Mozilla Public License Version 1.1.

Alternatively, the library may be used under the terms of either the GNU General Public License Version 2 or later,
or the GNU Lesser General Public License 2.1 or later.