https://github.com/dimchansky/utfbom
Detection of the BOM and removing as necessary
- Host: GitHub
- URL: https://github.com/dimchansky/utfbom
- Owner: dimchansky
- License: apache-2.0
- Created: 2017-03-25T17:59:08.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2023-11-20T08:06:53.000Z (over 1 year ago)
- Last Synced: 2025-03-31T15:01:18.672Z (27 days ago)
- Topics: bom, golang, unicode, utf
- Language: Go
- Homepage:
- Size: 21.5 KB
- Stars: 125
- Watchers: 5
- Forks: 13
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# utfbom

[GoDoc](https://godoc.org/github.com/dimchansky/utfbom) [License](https://opensource.org/licenses/Apache-2.0) [Build Status](https://travis-ci.org/dimchansky/utfbom) [Go Report Card](https://goreportcard.com/report/github.com/dimchansky/utfbom) [Coverage Status](https://coveralls.io/github/dimchansky/utfbom?branch=master)

The package utfbom detects a Unicode byte order mark (BOM) at the start of a stream and removes it when present. It can also report the encoding indicated by the BOM.
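
To make "detecting the BOM" concrete, here is a minimal standard-library sketch of the idea. It is not the package's implementation: `bomTable` and `detectBOM` are hypothetical names introduced only for illustration, and the byte sequences are the standard Unicode BOM encodings.

```go
package main

import (
	"bytes"
	"fmt"
)

// bomTable lists the standard BOM byte sequences. The 4-byte marks come
// first because the UTF-32 LE mark starts with the UTF-16 LE mark (FF FE).
var bomTable = []struct {
	name string
	mark []byte
}{
	{"UTF-32 BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32 LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF-16 BE", []byte{0xFE, 0xFF}},
	{"UTF-16 LE", []byte{0xFF, 0xFE}},
}

// detectBOM is a hypothetical helper (not part of utfbom). It returns the
// encoding name implied by a leading BOM and the length of the mark in
// bytes, or ("", 0) when data does not start with a known BOM.
func detectBOM(data []byte) (string, int) {
	for _, e := range bomTable {
		if bytes.HasPrefix(data, e.mark) {
			return e.name, len(e.mark)
		}
	}
	return "", 0
}

func main() {
	data := []byte("\xEF\xBB\xBFhello")
	name, n := detectBOM(data)
	// data[n:] is the payload with the BOM removed.
	fmt.Printf("encoding=%q bomLen=%d payload=%s\n", name, n, data[n:])
}
```

The ordering of the table matters: checking the 2-byte UTF-16 LE mark before the 4-byte UTF-32 LE mark would misclassify UTF-32 LE input.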
## Installation

    go get -u github.com/dimchansky/utfbom

## Example

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/dimchansky/utfbom"
)

func main() {
	trySkip([]byte("\xEF\xBB\xBFhello"))
	trySkip([]byte("hello"))
}

func trySkip(byteData []byte) {
	fmt.Println("Input:", byteData)

	// just skip BOM
	output, err := io.ReadAll(utfbom.SkipOnly(bytes.NewReader(byteData)))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ReadAll with BOM skipping", output)

	// skip BOM and detect encoding
	sr, enc := utfbom.Skip(bytes.NewReader(byteData))
	fmt.Printf("Detected encoding: %s\n", enc)
	output, err = io.ReadAll(sr)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ReadAll with BOM detection and skipping", output)
	fmt.Println()
}
```

Output:

```
$ go run main.go
Input: [239 187 191 104 101 108 108 111]
ReadAll with BOM skipping [104 101 108 108 111]
Detected encoding: UTF8
ReadAll with BOM detection and skipping [104 101 108 108 111]

Input: [104 101 108 108 111]
ReadAll with BOM skipping [104 101 108 108 111]
Detected encoding: Unknown
ReadAll with BOM detection and skipping [104 101 108 108 111]
```
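
The example above works on an `io.Reader`, so the BOM is dropped without loading the whole input first. As a rough illustration of how that can be done on a stream, the sketch below peeks at the first bytes with `bufio.Reader` and discards them only when they match. It uses the standard library alone, handles UTF-8 only, and is not the package's own code; `skipUTF8BOM` and `readAllSkippingBOM` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// skipUTF8BOM is a hypothetical helper (not part of utfbom). It wraps r and
// drops a leading UTF-8 BOM if one is present. Other encodings are ignored.
func skipUTF8BOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek does not consume bytes. On inputs shorter than 3 bytes it returns
	// what is available plus an error, which is safe to ignore here.
	head, _ := br.Peek(3)
	if bytes.Equal(head, []byte{0xEF, 0xBB, 0xBF}) {
		_, _ = br.Discard(3) // consume the BOM so callers never see it
	}
	return br
}

// readAllSkippingBOM is a small convenience wrapper for the demo below.
func readAllSkippingBOM(s string) (string, error) {
	out, err := io.ReadAll(skipUTF8BOM(strings.NewReader(s)))
	return string(out), err
}

func main() {
	for _, in := range []string{"\xEF\xBB\xBFhello", "hello", "hi"} {
		out, err := readAllSkippingBOM(in)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("%q -> %q\n", in, out)
	}
}
```

Peeking before discarding keeps inputs without a BOM, including ones shorter than three bytes, intact.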