Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/antchfx/xquery
Extract data or evaluate value from HTML/XML documents using XPath
- Host: GitHub
- URL: https://github.com/antchfx/xquery
- Owner: antchfx
- License: mit
- Archived: true
- Created: 2016-10-09T05:54:10.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-05-15T05:19:11.000Z (over 6 years ago)
- Last Synced: 2024-08-03T15:06:15.724Z (5 months ago)
- Topics: extracting, golang, html, scraping, xml, xpath
- Language: Go
- Homepage: https://github.com/antchfx/xpath
- Size: 93.8 KB
- Stars: 158
- Watchers: 10
- Forks: 28
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-go - xquery - xquery, extract data or evaluate value from HTML/XML documents using XPath expression in Go - ★ 136 (XML)
- awesome-go-extra - ARCHIVED - 10-09T05:54:10Z|2018-05-15T05:19:11Z| (XML / Routers)
README
xquery
====
[![Build Status](https://travis-ci.org/antchfx/xquery.svg?branch=master)](https://travis-ci.org/antchfx/xquery)
[![Coverage Status](https://coveralls.io/repos/github/antchfx/xquery/badge.svg?branch=master)](https://coveralls.io/github/antchfx/xquery?branch=master)
[![GoDoc](https://godoc.org/github.com/antchfx/xquery?status.svg)](https://godoc.org/github.com/antchfx/xquery)
[![Go Report Card](https://goreportcard.com/badge/github.com/antchfx/xquery)](https://goreportcard.com/report/github.com/antchfx/xquery)

> NOTE: This package is deprecated. Use the [htmlquery](https://github.com/antchfx/htmlquery) and [xmlquery](https://github.com/antchfx/xmlquery) packages instead to get the latest version, which fixes some issues.

Overview
===

A Go package that lets you extract data from HTML/XML documents using XPath expressions.

The list of supported XPath functions can be found in the [XPath package](https://github.com/antchfx/xpath).

Installation
====

> go get github.com/antchfx/xquery

HTML Query [![GoDoc](https://godoc.org/github.com/antchfx/xquery/html?status.svg)](https://godoc.org/github.com/antchfx/xquery/html)
===

Extract data from an HTML document.

```go
package main

import (
	"fmt"
	"os"

	"github.com/antchfx/xpath"
	htmlquery "github.com/antchfx/xquery/html"
)

func main() {
	// Load the HTML file.
	f, err := os.Open(`./examples/test.html`)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Parse the HTML document.
	doc, err := htmlquery.Parse(f)
	if err != nil {
		panic(err)
	}

	// Option 1: use a compiled xpath expression to match nodes.
	expr := xpath.MustCompile("count(//div[@class='article'])")
	fmt.Printf("%f \n", expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64))

	expr = xpath.MustCompile("//a/@href")
	iter := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(*xpath.NodeIterator)
	for iter.MoveNext() {
		fmt.Printf("%s \n", iter.Current().Value()) // output href
	}

	// Option 2: use the built-in Find() function to match nodes.
	for _, n := range htmlquery.Find(doc, "//a/@href") {
		fmt.Printf("%s \n", htmlquery.SelectAttr(n, "href")) // output href
	}
}
```

XML Query [![GoDoc](https://godoc.org/github.com/antchfx/xquery/xml?status.svg)](https://godoc.org/github.com/antchfx/xquery/xml)
===

Extract data from an XML document.

```go
package main

import (
	"fmt"
	"os"

	"github.com/antchfx/xpath"
	xmlquery "github.com/antchfx/xquery/xml"
)

func main() {
	// Load the XML document from a file.
	f, err := os.Open(`./examples/test.xml`)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Parse the XML document.
	doc, err := xmlquery.Parse(f)
	if err != nil {
		panic(err)
	}

	// Option 1: use a compiled xpath expression to match nodes.
	// Sum the price of every book via Evaluate().
	expr, err := xpath.Compile("sum(//book/price)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("total price: %f\n", expr.Evaluate(xmlquery.CreateXPathNavigator(doc)).(float64))

	for _, n := range xmlquery.Find(doc, "//book") {
		fmt.Printf("%s : %s \n", n.SelectAttr("id"), xmlquery.FindOne(n, "title").InnerText())
	}

	// Option 2: use the built-in FindOne() function to match a single node.
	n := xmlquery.FindOne(doc, "//book[@id='bk104']")
	fmt.Printf("%s \n", n.OutputXML(true))
}
```