{"id":13493172,"url":"https://github.com/PuerkitoBio/purell","last_synced_at":"2025-03-28T11:31:51.548Z","repository":{"id":4725493,"uuid":"5874026","full_name":"PuerkitoBio/purell","owner":"PuerkitoBio","description":"tiny Go library to normalize URLs","archived":false,"fork":false,"pushed_at":"2024-03-19T17:42:59.000Z","size":102,"stargazers_count":473,"open_issues_count":4,"forks_count":58,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-10-29T20:32:54.328Z","etag":null,"topics":["normalizer","url"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PuerkitoBio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-09-19T15:34:36.000Z","updated_at":"2024-10-14T07:30:27.000Z","dependencies_parsed_at":"2022-08-09T14:08:33.773Z","dependency_job_id":"5ff85dd1-2fdc-4f44-9930-43dff8e6464b","html_url":"https://github.com/PuerkitoBio/purell","commit_stats":{"total_commits":90,"total_committers":14,"mean_commits":6.428571428571429,"dds":"0.30000000000000004","last_synced_commit":"49367e944ff1b9e4671d2c477bffd54da9672191"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PuerkitoBio%2Fpurell","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PuerkitoBio%2Fpurell/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PuerkitoBio%2Fpurell/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/PuerkitoBio%2Fpurell/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PuerkitoBio","download_url":"https://codeload.github.com/PuerkitoBio/purell/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246021041,"owners_count":20710871,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["normalizer","url"],"created_at":"2024-07-31T19:01:12.877Z","updated_at":"2025-03-28T11:31:51.318Z","avatar_url":"https://github.com/PuerkitoBio.png","language":"Go","funding_links":[],"categories":["开源类库","Go","Open source library"],"sub_categories":["HTTP","HTTP Print Test"],"readme":"# Purell\n\nPurell is a tiny Go library to normalize URLs. It returns a pure URL. Pure-ell. Sanitizer and all. 
Yeah, I know...\n\nBased on the [wikipedia paper][wiki] and the [RFC 3986 document][rfc].\n\n[![CI](https://github.com/PuerkitoBio/purell/actions/workflows/ci.yml/badge.svg)](https://github.com/PuerkitoBio/purell/actions/workflows/ci.yml)\n\n## Install\n\n`go get github.com/PuerkitoBio/purell`\n\n## Changelog\n\n*    **v1.1.1** : Fix failing test due to Go1.12 changes (thanks to @ianlancetaylor).\n*    **2016-11-14 (v1.1.0)** : IDN: Conform to RFC 5895: Fold character width (thanks to @beeker1121).\n*    **2016-07-27 (v1.0.0)** : Normalize IDN to ASCII (thanks to @zenovich).\n*    **2015-02-08** : Add fix for relative paths issue ([PR #5][pr5]) and add fix for unnecessary encoding of reserved characters ([see issue #7][iss7]).\n*    **v0.2.0** : Add benchmarks, Attempt IDN support.\n*    **v0.1.0** : Initial release.\n\n## Examples\n\nFrom `example_test.go` (note that in your code, you would import \"github.com/PuerkitoBio/purell\", and would prefix references to its methods and constants with \"purell.\"):\n\n```go\npackage purell\n\nimport (\n  \"fmt\"\n  \"net/url\"\n)\n\nfunc ExampleNormalizeURLString() {\n  if normalized, err := NormalizeURLString(\"hTTp://someWEBsite.com:80/Amazing%3f/url/\",\n    FlagLowercaseScheme|FlagLowercaseHost|FlagUppercaseEscapes); err != nil {\n    panic(err)\n  } else {\n    fmt.Print(normalized)\n  }\n  // Output: http://somewebsite.com:80/Amazing%3F/url/\n}\n\nfunc ExampleMustNormalizeURLString() {\n  normalized := MustNormalizeURLString(\"hTTpS://someWEBsite.com:443/Amazing%fa/url/\",\n    FlagsUnsafeGreedy)\n  fmt.Print(normalized)\n\n  // Output: http://somewebsite.com/Amazing%FA/url\n}\n\nfunc ExampleNormalizeURL() {\n  if u, err := url.Parse(\"Http://SomeUrl.com:8080/a/b/.././c///g?c=3\u0026a=1\u0026b=9\u0026c=0#target\"); err != nil {\n    panic(err)\n  } else {\n    normalized := NormalizeURL(u, FlagsUsuallySafeGreedy|FlagRemoveDuplicateSlashes|FlagRemoveFragment)\n    fmt.Print(normalized)\n  }\n\n  // Output: 
http://someurl.com:8080/a/c/g?c=3\u0026a=1\u0026b=9\u0026c=0\n}\n```\n\n## API\n\nAs seen in the examples above, purell offers three methods, `NormalizeURLString(string, NormalizationFlags) (string, error)`, `MustNormalizeURLString(string, NormalizationFlags) (string)` and `NormalizeURL(*url.URL, NormalizationFlags) (string)`. They all normalize the provided URL based on the specified flags. Here are the available flags:\n\n```go\nconst (\n\t// Safe normalizations\n\tFlagLowercaseScheme           NormalizationFlags = 1 \u003c\u003c iota // HTTP://host -\u003e http://host, applied by default in Go1.1\n\tFlagLowercaseHost                                            // http://HOST -\u003e http://host\n\tFlagUppercaseEscapes                                         // http://host/t%ef -\u003e http://host/t%EF\n\tFlagDecodeUnnecessaryEscapes                                 // http://host/t%41 -\u003e http://host/tA\n\tFlagEncodeNecessaryEscapes                                   // http://host/!\"#$ -\u003e http://host/%21%22#$\n\tFlagRemoveDefaultPort                                        // http://host:80 -\u003e http://host\n\tFlagRemoveEmptyQuerySeparator                                // http://host/path? 
-\u003e http://host/path\n\n\t// Usually safe normalizations\n\tFlagRemoveTrailingSlash // http://host/path/ -\u003e http://host/path\n\tFlagAddTrailingSlash    // http://host/path -\u003e http://host/path/ (should choose only one of these add/remove trailing slash flags)\n\tFlagRemoveDotSegments   // http://host/path/./a/b/../c -\u003e http://host/path/a/c\n\n\t// Unsafe normalizations\n\tFlagRemoveDirectoryIndex   // http://host/path/index.html -\u003e http://host/path/\n\tFlagRemoveFragment         // http://host/path#fragment -\u003e http://host/path\n\tFlagForceHTTP              // https://host -\u003e http://host\n\tFlagRemoveDuplicateSlashes // http://host/path//a///b -\u003e http://host/path/a/b\n\tFlagRemoveWWW              // http://www.host/ -\u003e http://host/\n\tFlagAddWWW                 // http://host/ -\u003e http://www.host/ (should choose only one of these add/remove WWW flags)\n\tFlagSortQuery              // http://host/path?c=3\u0026b=2\u0026a=1\u0026b=1 -\u003e http://host/path?a=1\u0026b=1\u0026b=2\u0026c=3\n\n\t// Normalizations not in the wikipedia article, required to cover tests cases\n\t// submitted by jehiah\n\tFlagDecodeDWORDHost           // http://1113982867 -\u003e http://66.102.7.147\n\tFlagDecodeOctalHost           // http://0102.0146.07.0223 -\u003e http://66.102.7.147\n\tFlagDecodeHexHost             // http://0x42660793 -\u003e http://66.102.7.147\n\tFlagRemoveUnnecessaryHostDots // http://.host../path -\u003e http://host/path\n\tFlagRemoveEmptyPortSeparator  // http://host:/path -\u003e http://host/path\n\n\t// Convenience set of safe normalizations\n\tFlagsSafe NormalizationFlags = FlagLowercaseHost | FlagLowercaseScheme | FlagUppercaseEscapes | FlagDecodeUnnecessaryEscapes | FlagEncodeNecessaryEscapes | FlagRemoveDefaultPort | FlagRemoveEmptyQuerySeparator\n\n\t// For convenience sets, \"greedy\" uses the \"remove trailing slash\" and \"remove www. 
prefix\" flags,\n\t// while \"non-greedy\" uses the \"add (or keep) the trailing slash\" and \"add www. prefix\".\n\n\t// Convenience set of usually safe normalizations (includes FlagsSafe)\n\tFlagsUsuallySafeGreedy    NormalizationFlags = FlagsSafe | FlagRemoveTrailingSlash | FlagRemoveDotSegments\n\tFlagsUsuallySafeNonGreedy NormalizationFlags = FlagsSafe | FlagAddTrailingSlash | FlagRemoveDotSegments\n\n\t// Convenience set of unsafe normalizations (includes FlagsUsuallySafe)\n\tFlagsUnsafeGreedy    NormalizationFlags = FlagsUsuallySafeGreedy | FlagRemoveDirectoryIndex | FlagRemoveFragment | FlagForceHTTP | FlagRemoveDuplicateSlashes | FlagRemoveWWW | FlagSortQuery\n\tFlagsUnsafeNonGreedy NormalizationFlags = FlagsUsuallySafeNonGreedy | FlagRemoveDirectoryIndex | FlagRemoveFragment | FlagForceHTTP | FlagRemoveDuplicateSlashes | FlagAddWWW | FlagSortQuery\n\n\t// Convenience set of all available flags\n\tFlagsAllGreedy    = FlagsUnsafeGreedy | FlagDecodeDWORDHost | FlagDecodeOctalHost | FlagDecodeHexHost | FlagRemoveUnnecessaryHostDots | FlagRemoveEmptyPortSeparator\n\tFlagsAllNonGreedy = FlagsUnsafeNonGreedy | FlagDecodeDWORDHost | FlagDecodeOctalHost | FlagDecodeHexHost | FlagRemoveUnnecessaryHostDots | FlagRemoveEmptyPortSeparator\n)\n```\n\nFor convenience, the set of flags `FlagsSafe`, `FlagsUsuallySafe[Greedy|NonGreedy]`, `FlagsUnsafe[Greedy|NonGreedy]` and `FlagsAll[Greedy|NonGreedy]` are provided for the similarly grouped normalizations on [wikipedia's URL normalization page][wiki]. 
You can add (using the bitwise OR `|` operator) or remove (using the bitwise AND NOT `\u0026^` operator) individual flags from the sets if required, to build your own custom set.\n\nThe [full godoc reference is available on gopkgdoc][godoc].\n\nSome things to note:\n\n*    `FlagDecodeUnnecessaryEscapes`, `FlagEncodeNecessaryEscapes`, `FlagUppercaseEscapes` and `FlagRemoveEmptyQuerySeparator` are always implicitly set, because internally, the URL string is parsed as a URL object, which automatically decodes unnecessary escapes, uppercases and encodes necessary ones, and removes empty query separators (an unnecessary `?` at the end of the URL). So this operation cannot **not** be done. For this reason, `FlagRemoveEmptyQuerySeparator` (as well as the other three) has been included in the `FlagsSafe` convenience set, instead of `FlagsUnsafe`, where Wikipedia puts it.\n\n*    The `FlagDecodeUnnecessaryEscapes` decodes the following escapes (*from -\u003e to*):\n    -    %24 -\u003e $\n    -    %26 -\u003e \u0026\n    -    %2B-%3B -\u003e +,-./0123456789:;\n    -    %3D -\u003e =\n    -    %40-%5A -\u003e @ABCDEFGHIJKLMNOPQRSTUVWXYZ\n    -    %5F -\u003e _\n    -    %61-%7A -\u003e abcdefghijklmnopqrstuvwxyz\n    -    %7E -\u003e ~\n\n\n*    When the `NormalizeURL` function is used (passing a URL object), this source URL object is modified (that is, after the call, the URL object will be modified to reflect the normalization).\n\n*    The *replace IP with domain name* normalization (`http://208.77.188.166/ → http://www.example.com/`) is obviously not possible for a library without making some network requests. 
This is not implemented in purell.\n\n*    The *remove unused query string parameters* and *remove default query parameters* normalizations are also not implemented, since they are very case-specific and quite trivial to do with a URL object.\n\n### Safe vs Usually Safe vs Unsafe\n\nPurell allows you to control the level of risk you take while normalizing a URL. You can aggressively normalize, play it totally safe, or anything in between.\n\nConsider the following URL:\n\n`HTTPS://www.RooT.com/toto/t%45%1f///a/./b/../c/?z=3\u0026w=2\u0026a=4\u0026w=1#invalid`\n\nNormalizing with the `FlagsSafe` gives:\n\n`https://www.root.com/toto/tE%1F///a/./b/../c/?z=3\u0026w=2\u0026a=4\u0026w=1#invalid`\n\nWith the `FlagsUsuallySafeGreedy`:\n\n`https://www.root.com/toto/tE%1F///a/c?z=3\u0026w=2\u0026a=4\u0026w=1#invalid`\n\nAnd with `FlagsUnsafeGreedy`:\n\n`http://root.com/toto/tE%1F/a/c?a=4\u0026w=1\u0026w=2\u0026z=3`\n\n## TODOs\n\n*    Add a class/default instance to allow specifying custom directory index names? At the moment, removing directory index removes `(^|/)((?:default|index)\\.\\w{1,4})$`.\n\n## Thanks / Contributions\n\n@rogpeppe\n@jehiah\n@opennota\n@pchristopher1275\n@zenovich\n@beeker1121\n\n## License\n\nThe [BSD 3-Clause license][bsd].\n\n[bsd]: http://opensource.org/licenses/BSD-3-Clause\n[wiki]: http://en.wikipedia.org/wiki/URL_normalization\n[rfc]: http://tools.ietf.org/html/rfc3986#section-6\n[godoc]: http://go.pkgdoc.org/github.com/PuerkitoBio/purell\n[pr5]: https://github.com/PuerkitoBio/purell/pull/5\n[iss7]: https://github.com/PuerkitoBio/purell/issues/7\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPuerkitoBio%2Fpurell","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPuerkitoBio%2Fpurell","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPuerkitoBio%2Fpurell/lists"}