https://github.com/treyhunner/detect-charset

Detect character set of file
https://github.com/treyhunner/detect-charset

Last synced: 3 months ago
JSON representation

Detect character set of file

Host: GitHub
URL: https://github.com/treyhunner/detect-charset
Owner: treyhunner
Created: 2015-02-01T05:35:38.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2015-02-01T07:17:02.000Z (over 10 years ago)
Last Synced: 2025-02-28T16:33:18.663Z (4 months ago)
Language: JavaScript
Size: 203 KB
Stars: 2
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md

Awesome Lists containing this project

README

        # detect-charset 

[![NPM version][npm-image]][npm-url] [![Build Status][travis-image]][travis-url] [![Dependency Status][daviddm-url]][daviddm-image] [![Coverage Status][coveralls-image]][coveralls-url]

Detect character set of file

## Install

```bash

$ npm install --save detect-charset

```

## Usage

```javascript

var detectCharset = require('detect-charset');

var fileBuffer = fs.readFileSync('myfile.txt');

detectCharset(fileBuffer); // "latin1"

```

## Supported Features

### latin1

Any file without a byte order mark or unicode characters will return "latin1".

Valid UTF-8 files without any unicode charaters will return "latin1".

### utf-8

Any file that contains unicode characters but does not have a byte order mark will return UTF-8.

In the future, more advanced character set guessing techniques may be employed, but at the moment all files without a byte order mark are assumed to be UTF-8.

### utf-8-bom

Any file with a UTF-8 byte order mark (`'\xEF\xBB\xBF'`) will return "utf-8-bom".

### utf-16be

Any file with a UTF-16 big-endian byte order mark (`'\xFE\xFF'`) will return "utf-16be".

### utf-16be

Any file with a UTF-16 little-endian byte order mark (`'\xFF\xFE'`) will return "utf-16le".

### utf-32be

Any file with a UTF-32 big-endian byte order mark (`'\x00\x00\xFE\xFF'`) will return "utf-32be".

### utf-32be

Any file with a UTF-32 little-endian byte order mark (`'\x00\x00\xFF\xFE'`) will return "utf-32le".

## Contributing

In lieu of a formal styleguide, take care to maintain the existing coding style. Add unit tests for any new or changed functionality. Lint and test your code using [gulp](http://gulpjs.com/).

## License

Copyright (c) 2015 Trey Hunner. Licensed under the MIT license.

[npm-url]: https://npmjs.org/package/detect-charset

[npm-image]: https://badge.fury.io/js/detect-charset.svg

[travis-url]: https://travis-ci.org/treyhunner/detect-charset

[travis-image]: https://travis-ci.org/treyhunner/detect-charset.svg?branch=master

[daviddm-url]: https://david-dm.org/treyhunner/detect-charset.svg?theme=shields.io

[daviddm-image]: https://david-dm.org/treyhunner/detect-charset

[coveralls-url]: https://coveralls.io/r/treyhunner/detect-charset

[coveralls-image]: https://coveralls.io/repos/treyhunner/detect-charset/badge.png

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/treyhunner/detect-charset

Awesome Lists containing this project

README