https://github.com/dps/go-xml-parse
Streaming XML parser example in go
- Host: GitHub
- URL: https://github.com/dps/go-xml-parse
- Owner: dps
- Created: 2012-06-19T03:38:44.000Z (about 14 years ago)
- Default Branch: master
- Last Pushed: 2015-07-04T01:01:46.000Z (almost 11 years ago)
- Last Synced: 2025-07-21T10:19:44.271Z (11 months ago)
- Language: Go
- Size: 253 KB
- Stars: 133
- Watchers: 7
- Forks: 27
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
go-xml-parse
============
Streaming XML parser example in Go
Intro
-----
I've recently been messing around with the XML dumps of Wikipedia. These are pretty huge XML files; for instance, the most recent revision is 36 GB uncompressed. That's a lot of XML!
I've been experimenting with a few different languages and parsers for my task (which also happens to involve some non-trivial processing for each article) and found Go to be a great fit.
Go has a standard library package for parsing XML (`encoding/xml`) which is very convenient to code against. However, the simple version of the API (`xml.Unmarshal`) parses the whole document into memory at once, which is not a viable strategy for a 36 GB file.
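For contrast, here is a minimal sketch of that whole-document approach using `xml.Unmarshal`. The `MediaWiki` and `Page` types are hypothetical, heavily trimmed stand-ins for the dump's real schema. Every page ends up in the `Pages` slice at the same time, which is exactly what breaks down at dump scale.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Page is a hypothetical, trimmed model of one <page> element;
// real dump pages carry many more fields.
type Page struct {
	Title string `xml:"title"`
}

// MediaWiki is a hypothetical root type that holds every page in the document.
type MediaWiki struct {
	Pages []Page `xml:"page"`
}

// parseAll decodes the entire document in one call. The input must already be
// fully in memory as a []byte, and all decoded pages are kept in memory too.
func parseAll(data []byte) (MediaWiki, error) {
	var doc MediaWiki
	err := xml.Unmarshal(data, &doc)
	return doc, err
}

func main() {
	data := []byte(`<mediawiki><page><title>Apollo 11</title></page><page><title>Apollo 12</title></page></mediawiki>`)
	doc, err := parseAll(data)
	if err != nil {
		panic(err)
	}
	for _, p := range doc.Pages {
		fmt.Println(p.Title)
	}
}
```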
The parser can also be used in a streaming mode, but I found the documentation terse and examples online non-existent. So here is my example code for parsing Wikipedia with `encoding/xml`, plus a little explanation! (Full example code at https://github.com/dps/go-xml-parse/blob/master/go-xml-parse.go)
Here's a little snippet of an example Wikipedia page in the dump:
```xml
Apollo 11
...
...
{{Infobox Space mission
|mission_name=