https://github.com/dps/go-xml-parse

go-xml-parse
============

Streaming XML parser example in Go

Intro
-----

I've recently been messing around with the XML dumps of Wikipedia. These are pretty huge XML files: the most recent revision, for instance, is 36G uncompressed. That's a lot of XML!

I've been experimenting with a few different languages and parsers for my task (which also happens to involve some non-trivial processing for each article) and found Go to be a great fit.

Go's standard library has a package for parsing XML (`encoding/xml`) which is very convenient to code against. However, the simple version of the API (`xml.Unmarshal`) requires parsing the whole document at once, which is not a viable strategy for a 36G file.
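For contrast, here is a minimal sketch of the whole-document approach, assuming "the simple version of the API" means `xml.Unmarshal`. The `Page` and `MediaWiki` types and the `titlesAll` helper are hypothetical, trimmed models for illustration, not the structs used in the full example code. Because `xml.Unmarshal` takes a `[]byte`, the entire file must be read into memory before decoding starts, and every decoded page is held in one slice.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Page is a hypothetical, heavily trimmed model of a Wikipedia <page> element.
type Page struct {
	Title string `xml:"title"`
}

// MediaWiki models the dump's root element; every page ends up in one slice.
type MediaWiki struct {
	Pages []Page `xml:"page"`
}

// titlesAll decodes the whole document in a single call. The entire input
// must already be in memory as a []byte, and all pages are kept at once.
func titlesAll(data []byte) ([]string, error) {
	var doc MediaWiki
	if err := xml.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	titles := make([]string, 0, len(doc.Pages))
	for _, p := range doc.Pages {
		titles = append(titles, p.Title)
	}
	return titles, nil
}

func main() {
	data := []byte(`<mediawiki><page><title>Apollo 11</title></page><page><title>Apollo 12</title></page></mediawiki>`)
	titles, err := titlesAll(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(titles)
}
```

This is fine for small inputs, but for a 36G dump the `[]byte` alone would need 36G of RAM before any decoding happens.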

The parser can also be used in a streaming mode, but I found the documentation and examples online to be terse and non-existent, respectively, so here is my example code for parsing Wikipedia with `encoding/xml`, plus a little explanation! (Full example code at https://github.com/dps/go-xml-parse/blob/master/go-xml-parse.go.)

Here's a little snippet of an example Wikipedia page from the dump:

```xml

Apollo 11

...

...

{{Infobox Space mission
|mission_name=