https://github.com/fumoboy007/universalcharsetdetection
A Swift wrapper around the `uchardet` library to detect the character encoding of a sequence of bytes.
- Host: GitHub
- URL: https://github.com/fumoboy007/universalcharsetdetection
- Owner: fumoboy007
- License: other
- Created: 2020-02-22T20:10:24.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-04-19T23:31:38.000Z (about 6 years ago)
- Last Synced: 2025-10-16T16:32:38.134Z (8 months ago)
- Topics: character-encoding, charset-detection, charset-detector, swift, swift4-2, swift5, uchardet
- Language: C++
- Homepage:
- Size: 242 KB
- Stars: 9
- Watchers: 1
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE-UCHARDET
README
# UniversalCharsetDetection
A Swift wrapper around the [`uchardet` library](https://www.freedesktop.org/wiki/Software/uchardet/) to detect the [character encoding](https://en.wikipedia.org/wiki/Character_encoding) of a sequence of bytes.
`uchardet` is more versatile than [`NSString.stringEncoding(for:encodingOptions:convertedString:usedLossyConversion:)`](https://developer.apple.com/documentation/foundation/nsstring/1413576-stringencoding) because
- it supports many more encodings;
- it supports streaming large files; and
- its output is compatible with [`iconv`](https://en.wikipedia.org/wiki/Iconv).
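The streaming point matters because a detector that accepts bytes incrementally never needs the whole file in memory. Below is a minimal sketch of the reading side only, using plain Foundation. The `forEachChunk` helper and its `consume` closure are hypothetical names introduced for illustration; they are not part of this package's API, and the closure stands in for whatever detector receives the bytes.

```swift
import Foundation

/// Reads the file at `url` in fixed-size chunks and hands each chunk to `consume`.
/// `consume` returns `false` to stop early (for example, once a detector is confident).
/// Returns the number of bytes handed to `consume`.
/// NOTE: hypothetical helper for illustration; not part of UniversalCharsetDetection.
@discardableResult
func forEachChunk(ofFileAt url: URL,
                  chunkSize: Int = 64 * 1024,
                  consume: (Data) -> Bool) throws -> Int {
    let handle = try FileHandle(forReadingFrom: url)
    defer { handle.closeFile() }

    var totalBytes = 0
    while true {
        let chunk = handle.readData(ofLength: chunkSize)
        if chunk.isEmpty { break }      // end of file
        totalBytes += chunk.count
        if !consume(chunk) { break }    // caller asked to stop early
    }
    return totalBytes
}
```

Only one chunk is resident at a time, so memory use is bounded by `chunkSize` regardless of file size. An API that takes a complete `Data` value cannot offer that bound.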
## Usage
Compatible with Swift 4.2+.
- To integrate the library into your project, add this package as a dependency in your project’s Swift Package Manager manifest (`Package.swift`) or in your Xcode 11+ project.
- To detect the character encoding of a file, see `CharacterEncodingDetector+File`.
- To detect the character encoding of a collection of bytes, see `DataProtocol+CharacterEncoding`.
- To detect the character encoding of a manually provided stream of bytes, see `CharacterEncodingDetector`.
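For the Swift Package Manager route, a manifest might look roughly like the sketch below. The package name `MyApp` is hypothetical, the product name `UniversalCharsetDetection` is assumed from the README title, and tracking the `master` branch is an assumption made because no release versions are listed here; pin to a tag or revision if the repository provides one.

```swift
// swift-tools-version:5.1
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    dependencies: [
        // Assumption: track the default branch, since no version tags are documented here.
        .package(url: "https://github.com/fumoboy007/universalcharsetdetection", .branch("master"))
    ],
    targets: [
        .target(
            name: "MyApp",
            // Assumption: the library product shares the README's title.
            dependencies: ["UniversalCharsetDetection"]
        )
    ]
)
```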
## License
See the `LICENSE.md` file.
## Contributing
Since the Swift Package Manager does not yet support binary dependencies, we copy the `uchardet` source code into the `Sources/Cuchardet` directory to enable the Swift Package Manager to build and link with the `uchardet` library. See the `adapt-uchardet-to-swiftpm` script.
To change the version of the `uchardet` library, run the following commands in the root of the source code directory tree:
```
$ git submodule init uchardet
$ git submodule update --remote uchardet
$ cd uchardet
$ git checkout <version-tag-or-commit>
$ cd ..
$ ./adapt-uchardet-to-swiftpm
```