Projects in Awesome Lists tagged with url-normalization
A curated list of projects in awesome lists tagged with url-normalization.
https://github.com/sindresorhus/normalize-url
Normalize a URL
compare-urls npm-package sanitize-url url-normalization
Last synced: 14 May 2025
https://github.com/patternhelloworld/url-knife
Extract and decompose (fuzzy) URLs (including emails, which are conceptually a part of URLs) in texts with Area-Pattern-based modularity
email-extractor email-parser email-parsing pre-processing uri-template url-extractor url-normalization url-normalizer url-parser url-parsing url-validation
Last synced: 09 Jul 2025
https://github.com/hanover-computing/canonicize-url
Get a stable, canonical version of any URL, with DNS and HTTPS checks, redirects, tracker stripping, and canonical link extraction!
amp canonical canonical-urls compare-urls javascript normalize-url npm-package privacy sanitize-url ssrf tracker tracking url-normalization
Last synced: 28 Jul 2025
https://github.com/vladkens/url-normalize
Normalize URLs to a standardized form. HTTPS by default, flexible configuration, custom protocols, domain extraction, URL humanizing, and punycode support. Both CJS & ESM modules available.
cjs esm normalization normalizer npm-package punycode typescript url url-normalization url-normalizer
Last synced: 24 Apr 2025
https://github.com/seroperson/urlopt4s
Allows you to remove ad/tracking query params from a given URL in Scala
adguard graaljs js query-params-filtering scala url-canonicalization url-normalization url-query
Last synced: 07 Mar 2026
https://github.com/opensite-ai/domain_extractor
Lightweight Ruby library for parsing URLs and extracting domain components with accurate multi-part TLD support. Handles nested subdomains, query parameters, and URL normalization. Perfect for web scraping, analytics, and URL manipulation. Built on URI and public_suffix gem.
analytics domain-analysis domain-extraction domain-parser public-suffix ruby ruby-library rubygem subdomain-parser tld-parser url-manipulation url-normalization url-parser url-parsing web-scraping
Last synced: 12 Dec 2025
https://github.com/simonpierreboucher/crawler
A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.
concurrent-crawling content-extraction data-collection data-extraction-pipeline data-preservation-and-recovery data-scraping error-handling html-parsing http-requests metadata-storage modular-design pdf-text-extraction python-crawler rate-limiting structured-data-storage text-processing url-normalization web-crawling yaml-configuration
Last synced: 30 Mar 2025
https://github.com/manu-sh/http_normalizer
HTTP URL normalization for web crawlers
crawler http spider url-normalization
Last synced: 12 Jun 2025
https://github.com/hueristiq/url
A Go (Golang) package for URL parsing and normalization.
go golang golang-package url url-normalization url-normalizer url-parser url-parsing
Last synced: 15 May 2025
https://github.com/chipslays/php-url-fingerprint
Pathor is a PHP library for normalizing, analyzing, and comparing URLs.
fingerprint url url-fingerprint url-normalization url-normalizer
Last synced: 09 Feb 2026