{"id":32948862,"url":"https://github.com/aafeher/go-sitemap-parser","last_synced_at":"2026-01-14T12:33:37.989Z","repository":{"id":223746940,"uuid":"754710110","full_name":"aafeher/go-sitemap-parser","owner":"aafeher","description":"Go library for parsing Sitemaps","archived":false,"fork":false,"pushed_at":"2025-08-25T15:45:59.000Z","size":139,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-12T22:02:51.071Z","etag":null,"topics":["go","golang","robots-txt","robotstxt","sitemap","sitemap-parser","sitemap-xml","sitemap-xml-gz","sitemaps","sitemapxml"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aafeher.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-08T16:12:46.000Z","updated_at":"2025-08-25T15:46:03.000Z","dependencies_parsed_at":"2024-02-21T21:44:55.775Z","dependency_job_id":"152f7459-2829-41f6-b4ab-1ec86111ec02","html_url":"https://github.com/aafeher/go-sitemap-parser","commit_stats":null,"previous_names":["aafeher/go-sitemap-parser"],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/aafeher/go-sitemap-parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aafeher%2Fgo-sitemap-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aafeher%2Fgo-sitemap-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aafeher%2Fgo-sitemap-parser/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/aafeher%2Fgo-sitemap-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aafeher","download_url":"https://codeload.github.com/aafeher/go-sitemap-parser/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aafeher%2Fgo-sitemap-parser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28420798,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T10:47:48.104Z","status":"ssl_error","status_checked_at":"2026-01-14T10:46:19.031Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","golang","robots-txt","robotstxt","sitemap","sitemap-parser","sitemap-xml","sitemap-xml-gz","sitemaps","sitemapxml"],"created_at":"2025-11-12T20:00:35.670Z","updated_at":"2026-01-14T12:33:37.981Z","avatar_url":"https://github.com/aafeher.png","language":"Go","funding_links":[],"categories":["Text Processing","文本处理","Template Engines"],"sub_categories":["Scrapers","刮刀"],"readme":"# 
go-sitemap-parser\n\n[![codecov](https://codecov.io/gh/aafeher/go-sitemap-parser/graph/badge.svg?token=KEABI9UTQY)](https://codecov.io/gh/aafeher/go-sitemap-parser)\n[![Go](https://github.com/aafeher/go-sitemap-parser/actions/workflows/go.yml/badge.svg)](https://github.com/aafeher/go-sitemap-parser/actions/workflows/go.yml)\n[![Go Reference](https://pkg.go.dev/badge/github.com/aafeher/go-sitemap-parser.svg)](https://pkg.go.dev/github.com/aafeher/go-sitemap-parser)\n[![Go Report Card](https://goreportcard.com/badge/github.com/aafeher/go-sitemap-parser)](https://goreportcard.com/report/github.com/aafeher/go-sitemap-parser)\n[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)\n\nA Go package to parse XML Sitemaps compliant with the [Sitemaps.org protocol](http://www.sitemaps.org/protocol.html).\n\n## Features\n- Recursive parsing\n\n## Formats supported\n- `robots.txt`\n- XML `.xml`\n- Gzip compressed XML `.xml.gz`\n\n## Installation\n\n```bash\ngo get github.com/aafeher/go-sitemap-parser\n```\n\n```go\nimport \"github.com/aafeher/go-sitemap-parser\"\n```\n\n## Usage\n\n### Create instance\n\nTo create a new instance with default settings, you can simply call the `New()` function.\n```go\ns := sitemap.New()\n```\n\n### Configuration defaults\n\n - userAgent: `\"go-sitemap-parser (+https://github.com/aafeher/go-sitemap-parser/blob/main/README.md)\"`\n - fetchTimeout: `3` seconds\n - multiThread: `true`\n\n### Overwrite defaults\n\n#### User Agent\n\nTo set the user agent, use the `SetUserAgent()` function.\n\n```go\ns := sitemap.New()\ns = s.SetUserAgent(\"YourUserAgent\")\n```\n... or ...\n```go\ns := sitemap.New().SetUserAgent(\"YourUserAgent\")\n```\n\n#### Fetch timeout\n\nTo set the fetch timeout, use the `SetFetchTimeout()` function. It should be specified in seconds as an **uint8** value.\n\n```go\ns := sitemap.New()\ns = s.SetFetchTimeout(10)\n```\n... 
or ...\n```go\ns := sitemap.New().SetFetchTimeout(10)\n```\n\n#### Multi-threading\n\nBy default, the package uses multi-threading to fetch and parse sitemaps concurrently.\nTo switch the multi-thread flag on or off, use the `SetMultiThread()` function.\n\n```go\ns := sitemap.New()\ns = s.SetMultiThread(false)\n```\n... or ...\n```go\ns := sitemap.New().SetMultiThread(false)\n```\n\n#### Follow rules\n\nTo set the follow rules, use the `SetFollow()` function. It takes a `[]string` value.\nThe value is a list of regular expressions. When parsing a sitemap index, only sitemaps with a `loc` that matches one of these expressions will be followed and parsed.\nIf no follow rules are provided, all sitemaps in the index are followed.\n\n```go\ns := sitemap.New()\ns.SetFollow([]string{\n\t`\\.xml$`,\n\t`\\.xml\\.gz$`,\n})\n```\n... or ...\n```go\ns := sitemap.New().SetFollow([]string{\n\t`\\.xml$`,\n\t`\\.xml\\.gz$`,\n})\n```\n\n#### URL rules\n\nTo set the URL rules, use the `SetRules()` function. It takes a `[]string` value.\nThe value is a list of regular expressions. Only URLs that match one of these expressions will be included in the final result.\nIf no rules are provided, all URLs found are included.\n\n```go\ns := sitemap.New()\ns.SetRules([]string{\n\t`product/`,\n\t`category/`,\n})\n```\n... 
or ...\n```go\ns := sitemap.New().SetRules([]string{\n\t`product/`,\n\t`category/`,\n})\n```\n\n#### Chaining methods\n\nIn all cases, the setter functions return a pointer to the main object of the package, allowing you to chain them in a fluent interface style:\n```go\ns := sitemap.New().SetUserAgent(\"YourUserAgent\").SetFetchTimeout(10)\n```\n\n### Parse\n\nOnce you have initialized and configured your instance, you can parse sitemaps using the `Parse()` function.\n\nThe `Parse()` function takes two parameters:\n - `url`: the URL of the sitemap to be parsed,\n   - `url` can point to a robots.txt, a sitemap index, or a sitemap (urlset)\n - `urlContent`: an optional string pointer for the content of the URL.\n\nIf you wish to provide the content yourself, pass it as the second parameter. If not, pass `nil` and the function will fetch the content on its own.\nThe `Parse()` function uses Go's goroutines and the `sync` package to fetch and parse sitemaps concurrently.\n\n```go\ns, err := s.Parse(\"https://www.sitemaps.org/sitemap.xml\", nil)\n```\nIn this example, the sitemap is parsed from \"https://www.sitemaps.org/sitemap.xml\". The function fetches the content itself, because `nil` was passed as `urlContent`.\n\n## Examples\n\nExamples can be found in [/examples](https://github.com/aafeher/go-sitemap-parser/tree/main/examples).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faafeher%2Fgo-sitemap-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faafeher%2Fgo-sitemap-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faafeher%2Fgo-sitemap-parser/lists"}