{"id":26504900,"url":"https://github.com/keinberger/goscraper","last_synced_at":"2025-03-20T20:46:16.291Z","repository":{"id":39362486,"uuid":"480839366","full_name":"Keinberger/goScraper","owner":"Keinberger","description":"Go web scraper library","archived":false,"fork":false,"pushed_at":"2022-06-01T06:53:47.000Z","size":70,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-06-21T13:10:01.128Z","etag":null,"topics":["go","golang","html-node","scraper","web-scraper"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Keinberger.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-12T14:12:23.000Z","updated_at":"2022-05-22T20:55:21.000Z","dependencies_parsed_at":"2022-09-17T05:41:32.802Z","dependency_job_id":null,"html_url":"https://github.com/Keinberger/goScraper","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Keinberger%2FgoScraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Keinberger%2FgoScraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Keinberger%2FgoScraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Keinberger%2FgoScraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Keinberger","download_url":"https://codeload.github.com/Keinberger/goScraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":
"github","repositories_count":244686978,"owners_count":20493535,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","golang","html-node","scraper","web-scraper"],"created_at":"2025-03-20T20:46:15.733Z","updated_at":"2025-03-20T20:46:16.286Z","avatar_url":"https://github.com/Keinberger.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://app.travis-ci.com/Keinberger/goScraper.svg?token=ki5cZ9pSp5tPxqp4fztS\u0026branch=main)](https://app.travis-ci.com/Keinberger/goScraper)\n[![Go Report Card](https://goreportcard.com/badge/github.com/keinberger/goScraper)](https://goreportcard.com/report/github.com/keinberger/goScraper)\n[![Go Reference](https://pkg.go.dev/badge/github.com/keinberger/goScraper.svg)](https://pkg.go.dev/github.com/keinberger/goScraper)\n\n# goScraper\n\ngoScraper is a small web-scraping library for Go.\n\n## Installation\n\nPackage can be installed manually using \u003cbr /\u003e\n```go\ngo get github.com/keinberger/goScraper\n```\n\nBut may also be normally imported when using go modules\u003cbr /\u003e\n```go\nimport \"github.com/keinberger/goScraper\"\n```\n\n## Usage\n\nThe package provides several exported functions to provide high functionality.\u003cbr /\u003e\nHowever, the main scrape functions \n```go\nfunc (w Website) Scrape(funcs map[string]interface{}, vars ...interface{}) (string, error)\n```\n```go\nfunc (el lookUpElement) ScrapeTreeForElement(node *html.Node) (string, error)\n```\n```go\nfunc (e *Element) GetElementNodes(doc *html.Node) ([]*html.Node, error)\n```\nshould be the preffered 
way to use the scraper library.\n\nBecause these functions call the other exported functions internally, they bundle all the features of the library\nwhile requiring only a minimal amount of input. For the main `Scrape()` function, the only input you need to provide is a custom `Website` variable.\n\n### Example using `Scrape()`\n\nThis example shows how to scrape a website for specific HTML elements. The HTML elements are returned joined together, separated by a custom separator.\n\nThe example defines a custom `Website` variable and calls `Scrape()` on it. The arguments of `Scrape()` are optional and are not needed in this example.\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/keinberger/goScraper\"\n)\n\nfunc main() {\n\twebsite := scraper.Website{\n\t\tURL: \"https://wikipedia.org/wiki/wikipedia\",\n\t\tElements: []scraper.Element{\n\t\t\t{\n\t\t\t\t// Look for the h1 element whose id is \"firstHeading\".\n\t\t\t\tHtmlElement: scraper.HtmlElement{\n\t\t\t\t\tTyp: \"h1\",\n\t\t\t\t\tTags: []scraper.Tag{\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tTyp:   \"id\",\n\t\t\t\t\t\t\tValue: \"firstHeading\",\n\t\t\t\t\t\t},\n\t\t\t\t\t},\n\t\t\t\t},\n\t\t\t},\n\t\t\t{\n\t\t\t\t// Look for td elements whose class is \"infobox-data\".\n\t\t\t\tHtmlElement: scraper.HtmlElement{\n\t\t\t\t\tTyp: \"td\",\n\t\t\t\t\tTags: []scraper.Tag{\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tTyp:   \"class\",\n\t\t\t\t\t\t\tValue: \"infobox-data\",\n\t\t\t\t\t\t},\n\t\t\t\t\t},\n\t\t\t\t},\n\t\t\t\tIndex: 0,\n\t\t\t},\n\t\t},\n\t\tSeparator: \", \",\n\t}\n\n\t// The optional arguments are not needed here, so funcs is nil.\n\tscraped, err := website.Scrape(nil)\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\tfmt.Println(scraped)\n}\n```\n\n### Example using `ScrapeTreeForElement()`\nThis example uses `ScrapeTreeForElement()`, which returns the content of an HTML element (`*html.Node`) inside a larger node tree. 
This function is especially useful if you want only one HTML element from a website but still want to retain control over formatting settings.\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/keinberger/goScraper\"\n)\n\nfunc main() {\n\thtmlNode, err := scraper.GetHTMLNode(\"https://wikipedia.org/wiki/wikipedia\")\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\t// Look for the li element whose id is \"ca-viewsource\".\n\telement := scraper.Element{\n\t\tHtmlElement: scraper.HtmlElement{\n\t\t\tTyp: \"li\",\n\t\t\tTags: []scraper.Tag{\n\t\t\t\t{\n\t\t\t\t\tTyp:   \"id\",\n\t\t\t\t\tValue: \"ca-viewsource\",\n\t\t\t\t},\n\t\t\t},\n\t\t},\n\t}\n\tcontent, err := element.ScrapeTreeForElement(htmlNode)\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\tfmt.Println(content)\n}\n```\n\n### Other exported functions\nGetElementNodes returns all HTML elements (`[]*html.Node`) found in the node tree `htmlNode *html.Node` that have the same properties as `e *Element`\n```go\nfunc (e *Element) GetElementNodes(htmlNode *html.Node) ([]*html.Node, error)\n```\nGetTextOfNode returns the content of an HTML element `node *html.Node`\n```go\nfunc GetTextOfNode(node *html.Node, notRecursive bool) (text string)\n```\nRenderNode returns the string representation of a `node *html.Node`\n```go\nfunc RenderNode(node *html.Node) string\n```\nGetHTMLNode returns the node tree `*html.Node` of the HTML string `data`\n```go\nfunc GetHTMLNode(data string) (*html.Node, error)\n```\nGetHTML returns the HTML data of `URL`\n```go\nfunc GetHTML(URL string) (string, error)\n```\n\n## Contributions\n\nI created this project as a side project alongside my regular work. Contributions are very welcome. Just open an issue or create a pull request if you want to contribute.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeinberger%2Fgoscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkeinberger%2Fgoscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeinberger%2Fgoscraper/lists"}