{"id":13491111,"url":"https://github.com/anaskhan96/soup","last_synced_at":"2025-05-14T14:07:43.320Z","repository":{"id":44406612,"uuid":"82963274","full_name":"anaskhan96/soup","owner":"anaskhan96","description":"Web Scraper in Go, similar to BeautifulSoup","archived":false,"fork":false,"pushed_at":"2023-11-02T18:55:42.000Z","size":102,"stargazers_count":2195,"open_issues_count":22,"forks_count":167,"subscribers_count":35,"default_branch":"master","last_synced_at":"2025-04-11T06:13:50.137Z","etag":null,"topics":["beautifulsoup","go","golang","html-node","web-scraper","webscraper","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anaskhan96.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-23T19:28:58.000Z","updated_at":"2025-04-09T13:49:31.000Z","dependencies_parsed_at":"2023-10-20T17:05:57.302Z","dependency_job_id":"6fa59ca3-c467-4677-a9c5-5ef497beac89","html_url":"https://github.com/anaskhan96/soup","commit_stats":{"total_commits":132,"total_committers":21,"mean_commits":6.285714285714286,"dds":0.2954545454545454,"last_synced_commit":"fd2b5f70c1ddce68f515ce92f5f9bd9a399cf161"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anaskhan96%2Fsoup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anaskhan96%2Fsoup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anaskhan96%2Fsoup/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anaskhan96%2Fsoup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anaskhan96","download_url":"https://codeload.github.com/anaskhan96/soup/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254159194,"owners_count":22024558,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","go","golang","html-node","web-scraper","webscraper","webscraping"],"created_at":"2024-07-31T19:00:53.656Z","updated_at":"2025-05-14T14:07:43.259Z","avatar_url":"https://github.com/anaskhan96.png","language":"Go","funding_links":[],"categories":["Go","Repositories"],"sub_categories":[],"readme":"# soup\n[![Build Status](https://travis-ci.org/anaskhan96/soup.svg?branch=master)](https://travis-ci.org/anaskhan96/soup)\n[![GoDoc](https://godoc.org/github.com/anaskhan96/soup?status.svg)](https://pkg.go.dev/github.com/anaskhan96/soup)\n[![Go Report Card](https://goreportcard.com/badge/github.com/anaskhan96/soup)](https://goreportcard.com/report/github.com/anaskhan96/soup)\n\n**Web Scraper in Go, similar to BeautifulSoup**\n\n*soup* is a small web scraper package for Go, with its interface highly similar to that of BeautifulSoup.\n\nExported variables and functions implemented till now :\n```go\nvar Headers map[string]string // Set headers as a map of key-value pairs, an alternative to calling Header() individually\nvar Cookies map[string]string // Set cookies as a map of key-value  pairs, an alternative to calling Cookie() 
individually\nfunc Get(string) (string,error) {} // Takes the url as an argument, returns HTML string\nfunc GetWithClient(string, *http.Client) {} // Takes the url and a custom HTTP client as arguments, returns HTML string\nfunc Post(string, string, interface{}) (string, error) {} // Takes the url, bodyType, and payload as arguments, returns HTML string\nfunc PostForm(string, url.Values) {} // Takes the url and body. bodyType is set to \"application/x-www-form-urlencoded\"\nfunc Header(string, string) {} // Takes key,value pair to set as headers for the HTTP request made in Get()\nfunc Cookie(string, string) {} // Takes key, value pair to set as cookies to be sent with the HTTP request in Get()\nfunc HTMLParse(string) Root {} // Takes the HTML string as an argument, returns a pointer to the DOM constructed\nfunc Find([]string) Root {} // Element tag,(attribute key-value pair) as argument, pointer to first occurrence returned\nfunc FindAll([]string) []Root {} // Same as Find(), but pointers to all occurrences returned\nfunc FindStrict([]string) Root {} //  Element tag,(attribute key-value pair) as argument, pointer to first occurrence returned with exact matching values\nfunc FindAllStrict([]string) []Root {} // Same as FindStrict(), but pointers to all occurrences returned\nfunc FindNextSibling() Root {} // Pointer to the next sibling of the Element in the DOM returned\nfunc FindNextElementSibling() Root {} // Pointer to the next element sibling of the Element in the DOM returned\nfunc FindPrevSibling() Root {} // Pointer to the previous sibling of the Element in the DOM returned\nfunc FindPrevElementSibling() Root {} // Pointer to the previous element sibling of the Element in the DOM returned\nfunc Children() []Root {} // Find all direct children of this DOM element\nfunc Attrs() map[string]string {} // Map returned with all the attributes of the Element as lookup to their respective values\nfunc Text() string {} // Full text inside a non-nested tag returned, 
first half returned in a nested one\nfunc FullText() string {} // Full text inside a nested/non-nested tag returned\nfunc SetDebug(bool) {} // Sets the debug mode to true or false; false by default\nfunc HTML() {} // HTML returns the HTML code for the specific element\n```\n\n`Root` is a struct, containing three fields :\n* `Pointer` containing the pointer to the current html node\n* `NodeValue` containing the current html node's value, i.e. the tag name for an ElementNode, or the text in case of a TextNode\n* `Error` containing an error in a struct if one occurs, else `nil` is returned. \nA detailed text explanation of the error can be accessed using the `Error()` function. A field `Type` in this struct of type `ErrorType` will denote the kind of error that took place, which will be one of the following\n\t* `ErrUnableToParse`\n\t* `ErrElementNotFound`\n\t* `ErrNoNextSibling`\n\t* `ErrNoPreviousSibling`\n\t* `ErrNoNextElementSibling`\n\t* `ErrNoPreviousElementSibling`\n\t* `ErrCreatingGetRequest`\n\t* `ErrInGetRequest`\n\t* `ErrReadingResponse`\n\n## Installation\nInstall the package using the command\n```bash\ngo get github.com/anaskhan96/soup\n```\n\n## Example\nAn example code is given below to scrape the \"Comics I Enjoy\" part (text and its links) from [xkcd](https://xkcd.com).\n\n[More Examples](https://github.com/anaskhan96/soup/tree/master/examples)\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/anaskhan96/soup\"\n\t\"os\"\n)\n\nfunc main() {\n\tresp, err := soup.Get(\"https://xkcd.com\")\n\tif err != nil {\n\t\tos.Exit(1)\n\t}\n\tdoc := soup.HTMLParse(resp)\n\tlinks := doc.Find(\"div\", \"id\", \"comicLinks\").FindAll(\"a\")\n\tfor _, link := range links {\n\t\tfmt.Println(link.Text(), \"| Link :\", link.Attrs()[\"href\"])\n\t}\n}\n```\n\n## Contributions\nThis package was developed in my free time. However, contributions from everybody in the community are welcome, to make it a better web scraper. 
If you think there should be a particular feature or function included in the package, feel free to open up a new issue or pull request.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanaskhan96%2Fsoup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanaskhan96%2Fsoup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanaskhan96%2Fsoup/lists"}