Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/orisano/gosax
Go library for XML SAX (Simple API for XML) parsing
https://github.com/orisano/gosax
go sax xml
Last synced: 2 days ago
JSON representation
Go library for XML SAX (Simple API for XML) parsing
- Host: GitHub
- URL: https://github.com/orisano/gosax
- Owner: orisano
- License: other
- Created: 2024-06-26T03:49:42.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-10-06T10:26:22.000Z (about 1 month ago)
- Last Synced: 2024-10-30T23:05:01.774Z (17 days ago)
- Topics: go, sax, xml
- Language: Go
- Homepage:
- Size: 27.3 KB
- Stars: 117
- Watchers: 2
- Forks: 4
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# gosax
[![Go Reference](https://pkg.go.dev/badge/github.com/orisano/gosax.svg)](https://pkg.go.dev/github.com/orisano/gosax)
`gosax` is a Go library for XML SAX (Simple API for XML) parsing, supporting read-only functionality. This library is
designed for efficient and memory-conscious XML parsing, drawing inspiration from various sources to provide a
performant parser.## Features
- **Read-only SAX parsing**: Stream and process XML documents without loading the entire document into memory.
- **Efficient parsing**: Utilizes techniques inspired by `quick-xml` and `pkg/json` for high performance.
- **SWAR (SIMD Within A Register)**: Optimizations for fast text processing, inspired by `memchr`.
- **Compatibility with encoding/xml**: Includes utility functions to bridge `gosax` types with `encoding/xml` types, facilitating easy integration with existing code that uses the standard library.## Benchmark
```
goos: darwin
goarch: arm64
pkg: github.com/orisano/gosax
BenchmarkReader_Event-12 5 211845800 ns/op 1103.30 MB/s 2097606 B/op 6 allocs/op
```## Installation
To install `gosax`, use `go get`:
```bash
go get github.com/orisano/gosax
```## Usage
Here is a basic example of how to use `gosax` to parse an XML document:
```go
package mainimport (
"fmt"
"log"
"strings""github.com/orisano/gosax"
)func main() {
xmlData := `Value`
reader := strings.NewReader(xmlData)r := gosax.NewReader(reader)
for {
e, err := r.Event()
if err != nil {
log.Fatal(err)
}
if e.Type() == gosax.EventEOF {
break
}
fmt.Println(string(e.Bytes))
}
// Output:
//
//
// Value
//
//
}```
### Bridging with encoding/xml
**Important Note for encoding/xml Users:**
> When migrating from `encoding/xml` to `gosax`, note that self-closing tags are handled differently. To mimic `encoding/xml` behavior, set `gosax.Reader.EmitSelfClosingTag` to `true`. This ensures self-closing tags are recognized and processed correctly.#### Using TokenE
If you are used to `encoding/xml`'s `Token`, start with `gosax.TokenE`.
**Note:** Using `gosax.TokenE` and `gosax.Token` involves memory allocation due to interfaces.**Before:**
```go
var dec *xml.Decoder
for {
tok, err := dec.Token()
if err == io.EOF {
break
}
// ...
}
```**After:**
```go
var dec *gosax.Reader
for {
tok, err := gosax.TokenE(dec.Event())
if err == io.EOF {
break
}
// ...
}
```#### Utilizing xmlb
`xmlb` is an extension for `gosax` to simplify rewriting code from `encoding/xml`. It provides a higher-performance bridge for XML parsing and processing.**Before:**
```go
var dec *xml.Decoder
for {
tok, err := dec.Token()
if err == io.EOF {
break
}
switch t := tok.(type) {
case xml.StartElement:
// ...
case xml.CharData:
// ...
case xml.EndElement:
// ...
}
}
```**After:**
```go
var dec *xmlb.Decoder
for {
tok, err := dec.Token()
if err == io.EOF {
break
}
switch tok.Type() {
case xmlb.StartElement:
t, _ := tok.StartElement()
// ...
case xmlb.CharData:
t, _ := tok.CharData()
// ...
case xmlb.EndElement:
t := tok.EndElement()
// ...
}
}
```## License
This library is licensed under the terms specified in the LICENSE file.
## Acknowledgements
`gosax` is inspired by the following projects and resources:
- [Dave Cheney's GopherCon SG 2023 Talk](https://dave.cheney.net/paste/gophercon-sg-2023.html)
- [quick-xml](https://github.com/tafia/quick-xml)
- [memchr](https://github.com/BurntSushi/memchr) (SWAR part)## Contributing
Contributions are welcome! Please fork the repository and submit pull requests.
## Contact
For any questions or feedback, feel free to open an issue on the GitHub repository.