# scrape

A simple, higher-level interface for Go web scraping.

When scraping with Go, I find myself redefining tree traversal and other
utility functions.

This package is a place to put some simple tools which build on top of the
[Go HTML parsing library](https://godoc.org/golang.org/x/net/html).

For the full interface, check out the godoc
[![GoDoc](https://godoc.org/github.com/yhat/scrape?status.svg)](https://godoc.org/github.com/yhat/scrape)

## Sample

Scrape defines traversal functions like `Find` and `FindAll` while attempting
to be generic. It also defines convenience functions such as `Attr` and `Text`.

```go
// Parse the page
root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
// Search for the title
title, ok := scrape.Find(root, scrape.ByTag(atom.Title))
if ok {
    // Print the title
    fmt.Println(scrape.Text(title))
}
```

## A full example: Scraping Hacker News

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/yhat/scrape"
	"golang.org/x/net/html"
	"golang.org/x/net/html/atom"
)

func main() {
	// request and parse the front page
	resp, err := http.Get("https://news.ycombinator.com/")
	if err != nil {
		panic(err)
	}
	// always close the response body when done with it
	defer resp.Body.Close()
	root, err := html.Parse(resp.Body)
	if err != nil {
		panic(err)
	}

	// define a matcher
	matcher := func(n *html.Node) bool {
		// must check for nil values
		if n.DataAtom == atom.A && n.Parent != nil && n.Parent.Parent != nil {
			return scrape.Attr(n.Parent.Parent, "class") == "athing"
		}
		return false
	}
	// grab all articles and print them
	articles := scrape.FindAll(root, matcher)
	for i, article := range articles {
		fmt.Printf("%2d %s (%s)\n", i, scrape.Text(article), scrape.Attr(article, "href"))
	}
}
```
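The matchers above can also be combined with an inline document, which makes it easier to experiment without a network call. This is a minimal sketch (not from the original README) using the package's documented `ByClass` and `ByTag` matchers; the HTML string and its class names are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/yhat/scrape"
	"golang.org/x/net/html"
	"golang.org/x/net/html/atom"
)

func main() {
	// A small inline document, so the example runs without an HTTP request.
	doc := `<html><body>
		<p class="intro">Hello, scrape.</p>
		<a href="https://example.com">Example</a>
	</body></html>`

	root, err := html.Parse(strings.NewReader(doc))
	if err != nil {
		panic(err)
	}

	// ByClass matches nodes carrying the given class attribute.
	if p, ok := scrape.Find(root, scrape.ByClass("intro")); ok {
		fmt.Println(scrape.Text(p))
	}

	// FindAll with ByTag collects every anchor; Attr reads an attribute value.
	for _, a := range scrape.FindAll(root, scrape.ByTag(atom.A)) {
		fmt.Println(scrape.Attr(a, "href"))
	}
}
```

Note that this requires `github.com/yhat/scrape` and `golang.org/x/net/html` in your module; `go get github.com/yhat/scrape` pulls in both.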