https://github.com/runk/node-chardet

Character encoding detection tool for NodeJS

Host: GitHub
URL: https://github.com/runk/node-chardet
Owner: runk
License: mit
Created: 2013-04-29T14:29:28.000Z (about 12 years ago)
Default Branch: master
Last Pushed: 2025-02-24T22:26:27.000Z (5 months ago)
Last Synced: 2025-05-04T15:39:11.638Z (2 months ago)
Topics: hacktoberfest
Language: TypeScript
Homepage:
Size: 1.86 MB
Stars: 291
Watchers: 7
Forks: 73
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# chardet

_Chardet_ is a character encoding detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.

- Packed size is only **22 KB**
- Works in all environments: Node / Browser / Native
- Works on all platforms: Linux / Mac / Windows
- No dependencies
- No native code / bindings
- 100% written in TypeScript
- Extensive code coverage

## Installation

```
npm i chardet
```

## Usage

To return the encoding with the highest confidence:

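```javascript
import chardet from 'chardet';

// Detect the encoding of an in-memory buffer
const encoding = chardet.detect(Buffer.from('hello there!'));

// or read a file asynchronously (top-level await requires an ES module)
const fileEncoding = await chardet.detectFile('/path/to/file');

// or read a file synchronously
const fileEncodingSync = chardet.detectFileSync('/path/to/file');
```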
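chardet reports an encoding name; it does not decode the bytes. A common next step is to decode them with the platform's built-in `TextDecoder`. The sketch below uses a hypothetical helper, `decodeBytes` (not part of chardet), and assumes the detected name is a label `TextDecoder` recognises. That holds for common names such as `UTF-8` or `windows-1252`, but not for every encoding chardet can report: the WHATWG Encoding Standard that `TextDecoder` follows has no UTF-32, for instance.

```javascript
// Hypothetical helper: decode raw bytes using an encoding name such as the one
// returned by chardet.detect(). TextDecoder only accepts labels from the WHATWG
// Encoding Standard, so unsupported names make this return null instead of throwing.
function decodeBytes(bytes, encodingName) {
  if (!encodingName) return null; // detection produced no name
  let decoder;
  try {
    decoder = new TextDecoder(encodingName);
  } catch (err) {
    if (err instanceof RangeError) return null; // label not supported by TextDecoder
    throw err;
  }
  return decoder.decode(bytes);
}
```

Usage would look like `decodeBytes(buffer, chardet.detect(buffer))`. For encodings `TextDecoder` rejects, a dedicated conversion library is needed.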

To return the full list of possible encodings, use the `analyse` method:

```javascript
import chardet from 'chardet';

chardet.analyse(Buffer.from('hello there!'));
```

The returned value is an array of objects sorted by confidence in descending order:

```javascript
[
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];
```
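If a low-confidence guess is worse than no guess for your application, the `analyse` result can be filtered before use. A minimal sketch, assuming the array shape shown above; `pickEncoding` and its default threshold of 50 are hypothetical choices, not chardet defaults.

```javascript
// Hypothetical helper: choose an encoding from an analyse()-style result, i.e. an
// array of { confidence, name, lang? } objects sorted by confidence (highest first).
// Returns null when the best match is below `minConfidence` or there is no match.
function pickEncoding(matches, minConfidence = 50) {
  const best = matches[0];
  if (!best || best.confidence < minConfidence) return null;
  return best.name;
}
```

For example, `pickEncoding(chardet.analyse(buffer), 80)` would return a name only when the top match reaches a confidence of 80.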

In the browser, you can pass a [Uint8Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array) instead of a `Buffer`:

```javascript
import chardet from 'chardet';

chardet.analyse(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f]));
```
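Browser APIs that read binary data, such as `Response.arrayBuffer()` and `Blob.arrayBuffer()` (which `File` inherits), resolve to an `ArrayBuffer` rather than a `Uint8Array`. A small normalising helper could look like the sketch below; `toUint8Array` is hypothetical and not part of chardet.

```javascript
// Hypothetical helper: turn common binary inputs into a Uint8Array view without
// copying the underlying bytes. Strings are rejected on purpose, because they no
// longer carry the original encoding.
function toUint8Array(input) {
  if (input instanceof Uint8Array) return input; // also covers Node.js Buffer
  if (input instanceof ArrayBuffer) return new Uint8Array(input);
  if (ArrayBuffer.isView(input)) {
    // Other typed arrays and DataView: view the same bytes, keeping offset and length.
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  throw new TypeError('Expected an ArrayBuffer, TypedArray, DataView or Buffer');
}
```

In a browser this might be used as `chardet.analyse(toUint8Array(await file.arrayBuffer()))`.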

## Working with large data sets

When a data set is large and you want to optimize performance (at the cost of some accuracy), you can sample only the first N bytes of the input:

```javascript
const encoding = await chardet.detectFile('/path/to/file', { sampleSize: 32 });
```

You can also specify the position (in bytes) at which to start reading:

```javascript
const encoding = await chardet.detectFile('/path/to/file', {
  sampleSize: 32,
  offset: 128,
});
```
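The `sampleSize` and `offset` options are shown here for `detectFile`. For data that is already in memory, a similar effect can be had by passing a slice of the buffer to `detect` or `analyse`. `sampleWindow` below is a hypothetical helper that mirrors the two options; it relies on `subarray()`, which returns a view over the same memory rather than a copy, so it stays cheap for very large buffers.

```javascript
// Hypothetical helper: return a view of `sampleSize` bytes starting at `offset`,
// mirroring the detectFile() options for data that is already in memory.
function sampleWindow(bytes, { sampleSize, offset = 0 } = {}) {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError('offset must be a non-negative integer');
  }
  if (sampleSize === undefined) return bytes.subarray(offset); // no limit: rest of the data
  if (!Number.isInteger(sampleSize) || sampleSize < 0) {
    throw new RangeError('sampleSize must be a non-negative integer');
  }
  // subarray() clamps the end index to bytes.length, so short inputs are fine.
  return bytes.subarray(offset, offset + sampleSize);
}
```

Usage would look like `chardet.detect(sampleWindow(data, { sampleSize: 32, offset: 128 }))`.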

## Working with strings

In both Node.js and browsers, JavaScript strings are sequences of UTF-16 code units; this is part of the JavaScript language specification. Therefore, you cannot pass plain strings to `chardet.analyse()` or `chardet.detect()`. Instead, pass the original raw bytes as a `Buffer` or `Uint8Array`.

In other words, if you receive data over the network and want to detect its encoding, use the original payload bytes, not their string representation. Once the data has been converted to a string, it has already been decoded into UTF-16, and its original encoding can no longer be detected.
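When the payload arrives as a stream, one way to keep the original bytes is to collect the raw chunks instead of decoding them (for example, by not calling `setEncoding()` on a Node.js readable stream). The sketch below assumes the chunks arrive as `Uint8Array` or `Buffer` values from an async iterable, such as a Node.js readable stream; `collectBytes` is a hypothetical helper, not part of chardet.

```javascript
// Hypothetical helper: concatenate raw byte chunks from an async iterable
// (e.g. a Node.js readable stream) into one Uint8Array, without decoding them.
async function collectBytes(chunks) {
  const parts = [];
  let total = 0;
  for await (const chunk of chunks) {
    if (typeof chunk === 'string') {
      // A string chunk means the data was already decoded somewhere upstream.
      throw new TypeError('Expected binary chunks, got a string');
    }
    parts.push(chunk);
    total += chunk.length;
  }
  // Copy every chunk into a single contiguous buffer.
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    bytes.set(part, offset);
    offset += part.length;
  }
  return bytes;
}
```

In Node.js, `Buffer.concat(parts)` does the same job; the manual copy keeps the sketch free of Node-specific APIs. With an incoming HTTP request `req`, usage might be `chardet.detect(await collectBytes(req))`.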

Note on [TextEncoder](https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder/TextEncoder): it always encodes to UTF-8, so the resulting `Uint8Array` does not reflect the string's original encoding.
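To make this concrete, the following sketch (plain JavaScript, no chardet calls) takes bytes in ISO-8859-1, turns them into a string, and re-encodes that string with `TextEncoder`. The byte sequence changes, so analysing the re-encoded bytes would describe UTF-8 rather than the original encoding.

```javascript
// Bytes as they might arrive over the network: "café" encoded as ISO-8859-1,
// where "é" is the single byte 0xE9.
const original = Uint8Array.of(0x63, 0x61, 0x66, 0xe9);

// Decoding produces a JavaScript string (UTF-16 code units). ISO-8859-1 maps each
// byte to the code point with the same value, so String.fromCharCode is enough here.
const text = String.fromCharCode(...original); // "café"

// TextEncoder always encodes to UTF-8, so "é" becomes the two bytes 0xC3 0xA9.
const reencoded = new TextEncoder().encode(text);

console.log(original.length); // 4
console.log(reencoded.length); // 5
console.log(Array.from(reencoded.subarray(3), (b) => b.toString(16)).join(' ')); // c3 a9
```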

## Supported Encodings

- UTF-8
- UTF-16 LE
- UTF-16 BE
- UTF-32 LE
- UTF-32 BE
- ISO-2022-JP
- ISO-2022-KR
- ISO-2022-CN
- Shift_JIS
- Big5
- EUC-JP
- EUC-KR
- GB18030
- ISO-8859-1
- ISO-8859-2
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1254
- windows-1255
- windows-1256
- KOI8-R

Currently only these encodings are supported.

## TypeScript?

Yes. Type definitions are included.

### References

- ICU project http://site.icu-project.org/
