https://github.com/sonicdoe/detect-character-encoding

Detect character encoding using ICU
https://github.com/sonicdoe/detect-character-encoding

c-plus-plus character-encoding charset detect encoding icu javascript nodejs

Last synced: 10 months ago
JSON representation

Detect character encoding using ICU

Host: GitHub
URL: https://github.com/sonicdoe/detect-character-encoding
Owner: sonicdoe
License: other
Created: 2015-03-15T11:01:13.000Z (over 11 years ago)
Default Branch: develop
Last Pushed: 2024-01-06T12:33:37.000Z (over 2 years ago)
Last Synced: 2025-04-02T09:07:41.960Z (over 1 year ago)
Topics: c-plus-plus, character-encoding, charset, detect, encoding, icu, javascript, nodejs
Language: C++
Homepage:
Size: 57 MB
Stars: 83
Watchers: 2
Forks: 15
Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md

Awesome Lists containing this project

README

          # detect-character-encoding

> Detect character encoding using [ICU](http://site.icu-project.org)

**Tip:** If you don’t need ICU in particular, consider using [ced](https://github.com/sonicdoe/ced), which is based on Google’s lighter [compact_enc_det](https://github.com/google/compact_enc_det) library.

## Installation

```console

$ npm install detect-character-encoding

```

detect-character-encoding is a C++ addon. Therefore, you may need to install various build tools. Check [node-gyp’s readme](https://github.com/nodejs/node-gyp#installation) for more information.

## Usage

```js

const fs = require('fs');

const detectCharacterEncoding = require('detect-character-encoding');

const fileBuffer = fs.readFileSync('file.txt');

const charsetMatch = detectCharacterEncoding(fileBuffer);

console.log(charsetMatch);

// {

//   encoding: 'UTF-8',

//   confidence: 60

// }

```

detect-character-encoding may return `null` if no charset matches.

## Supported operating systems

- macOS Sonoma

- Ubuntu 22.04 and 20.04

- Debian 12, 11, and 10

detect-character-encoding does not support 32-bit operating systems.

## Supported character sets

As listed in [ICU’s user guide](http://userguide.icu-project.org/conversion/detection#TOC-Detected-Encodings):

- UTF-8

- UTF-16BE

- UTF-16LE

- UTF-32BE

- UTF-32LE

- Shift_JIS

- ISO-2022-JP

- ISO-2022-CN

- ISO-2022-KR

- GB18030

- Big5

- EUC-JP

- EUC-KR

- ISO-8859-1

- ISO-8859-2

- ISO-8859-5

- ISO-8859-6

- ISO-8859-7

- ISO-8859-8

- ISO-8859-9

- windows-1250

- windows-1251

- windows-1252

- windows-1253

- windows-1254

- windows-1255

- windows-1256

- KOI8-R

- IBM420

- IBM424

## License

detect-character-encoding is licensed under the BSD 2-clause license but includes third-party software under different licenses. See [`LICENSE.md`](./LICENSE.md) for the full license text.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/sonicdoe/detect-character-encoding

Awesome Lists containing this project

README