{"id":13413951,"url":"https://github.com/antchfx/htmlquery","last_synced_at":"2025-03-14T20:30:48.566Z","repository":{"id":39461677,"uuid":"113114020","full_name":"antchfx/htmlquery","owner":"antchfx","description":"htmlquery is golang XPath package for HTML query.","archived":false,"fork":false,"pushed_at":"2024-06-24T04:07:51.000Z","size":140,"stargazers_count":715,"open_issues_count":8,"forks_count":72,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-07-31T20:53:11.099Z","etag":null,"topics":["go","golang","html","html-parser","xpath","xpath-selector","xpath2"],"latest_commit_sha":null,"homepage":"https://github.com/antchfx/xpath","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antchfx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-05T01:08:41.000Z","updated_at":"2024-07-30T11:23:12.000Z","dependencies_parsed_at":"2024-04-06T12:29:51.767Z","dependency_job_id":"b9410511-3952-4fbc-838e-e6803ff24de5","html_url":"https://github.com/antchfx/htmlquery","commit_stats":{"total_commits":110,"total_committers":13,"mean_commits":8.461538461538462,"dds":"0.18181818181818177","last_synced_commit":"11003dab477278c8e3b7b0441a58b6f35c75ff61"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antchfx%2Fhtmlquery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antchfx%2Fhtmlquery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antchfx%2Fht
mlquery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antchfx%2Fhtmlquery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antchfx","download_url":"https://codeload.github.com/antchfx/htmlquery/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243642008,"owners_count":20323949,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","golang","html","html-parser","xpath","xpath-selector","xpath2"],"created_at":"2024-07-30T20:01:53.507Z","updated_at":"2025-03-14T20:30:48.228Z","avatar_url":"https://github.com/antchfx.png","language":"Go","readme":"# htmlquery\n\n[![Build Status](https://github.com/antchfx/htmlquery/actions/workflows/testing.yml/badge.svg)](https://github.com/antchfx/htmlquery/actions/workflows/testing.yml)\n[![GoDoc](https://godoc.org/github.com/antchfx/htmlquery?status.svg)](https://godoc.org/github.com/antchfx/htmlquery)\n[![Go Report Card](https://goreportcard.com/badge/github.com/antchfx/htmlquery)](https://goreportcard.com/report/github.com/antchfx/htmlquery)\n\n# Overview\n\n`htmlquery` is an XPath query package for HTML that lets you extract data from, or evaluate expressions against, HTML documents using XPath expressions.\n\n`htmlquery` has a built-in query object cache based on [LRU](https://godoc.org/github.com/golang/groupcache/lru) that caches recently used XPath query strings. 
Enabling query caching avoids recompiling the XPath expression on every query.\n\nSee https://github.com/antchfx/xpath for the supported XPath (1.0/2.0) syntax.\n\n# XPath query packages for Go\n\n| Name                                              | Description                               |\n| ------------------------------------------------- | ----------------------------------------- |\n| [htmlquery](https://github.com/antchfx/htmlquery) | XPath query package for the HTML document |\n| [xmlquery](https://github.com/antchfx/xmlquery)   | XPath query package for the XML document  |\n| [jsonquery](https://github.com/antchfx/jsonquery) | XPath query package for the JSON document |\n\n# Installation\n\n```\ngo get github.com/antchfx/htmlquery\n```\n\n# Getting Started\n\n#### Query; returns the matched elements or an error.\n\n```go\nnodes, err := htmlquery.QueryAll(doc, \"//a\")\nif err != nil {\n\tpanic(`not a valid XPath expression.`)\n}\n```\n\n#### Load HTML document from URL.\n\n```go\ndoc, err := htmlquery.LoadURL(\"http://example.com/\")\n```\n\n#### Load HTML document from file.\n\n```go\nfilePath := \"/home/user/sample.html\"\ndoc, err := htmlquery.LoadDoc(filePath)\n```\n\n#### Load HTML document from string.\n\n```go\ns := `\u003chtml\u003e....\u003c/html\u003e`\ndoc, err := htmlquery.Parse(strings.NewReader(s))\n```\n\n#### Find all A elements.\n\n```go\nlist := htmlquery.Find(doc, \"//a\")\n```\n\n#### Find all A elements that have an `href` attribute.\n\n```go\nlist := htmlquery.Find(doc, \"//a[@href]\")\n```\n\n#### Find all A elements with an `href` attribute and return only the `href` value.\n\n```go\nlist := htmlquery.Find(doc, \"//a/@href\")\nfor _, n := range list {\n\tfmt.Println(htmlquery.InnerText(n)) // output @href value\n}\n```\n\n#### Find the third A element.\n\n```go\na := htmlquery.FindOne(doc, \"//a[3]\")\n```\n\n#### Find the child element (img) under an A element and print its source.\n\n```go\na := htmlquery.FindOne(doc, \"//a\")\nimg := 
htmlquery.FindOne(a, \"//img\")\nfmt.Println(htmlquery.SelectAttr(img, \"src\")) // output @src value\n```\n\n#### Count all IMG elements.\n\n```go\nexpr, _ := xpath.Compile(\"count(//img)\")\nv := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)\nfmt.Printf(\"total count is %f\", v)\n```\n\n# Quick Start\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/antchfx/htmlquery\"\n)\n\nfunc main() {\n\tdoc, err := htmlquery.LoadURL(\"https://www.bing.com/search?q=golang\")\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\t// Find all news items.\n\tlist, err := htmlquery.QueryAll(doc, \"//ol/li\")\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\tfor i, n := range list {\n\t\ta := htmlquery.FindOne(n, \"//a\")\n\t\tif a != nil {\n\t\t\tfmt.Printf(\"%d %s(%s)\\n\", i, htmlquery.InnerText(a), htmlquery.SelectAttr(a, \"href\"))\n\t\t}\n\t}\n}\n```\n\n# FAQ\n\n#### `Find()` vs `QueryAll()`, which is better?\n\n`Find` and `QueryAll` do the same thing: both search for all matching HTML nodes.\n`Find` panics if you pass an invalid XPath query, whereas `QueryAll` returns an error.\n\n#### Can I save my query expression object for the next query?\n\nYes, you can. 
The `QuerySelector` and `QuerySelectorAll` functions accept a compiled query expression object.\n\nCaching (or reusing) a query expression object avoids recompiling the XPath expression and improves query performance.\n\n#### XPath query object cache performance\n\n```\ngoos: windows\ngoarch: amd64\npkg: github.com/antchfx/htmlquery\nBenchmarkSelectorCache-4                20000000                55.2 ns/op\nBenchmarkDisableSelectorCache-4           500000              3162 ns/op\n```\n\n#### How to disable caching?\n\n```\nhtmlquery.DisableSelectorCache = true\n```\n\n# Questions\n\nPlease let me know if you have any questions.\n","funding_links":[],"categories":["开源类库","Text Processing","HTML utilities","Open source library","文本处理","Specific Formats","Template Engines","Bot Building","文本处理`解析和操作文本的代码库`"],"sub_categories":["文本处理","HTTP Clients","Word Processing","Markup Languages","查询语","标记语言","交流"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantchfx%2Fhtmlquery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantchfx%2Fhtmlquery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantchfx%2Fhtmlquery/lists"}