{"id":15887828,"url":"https://github.com/seroperson/urlopt4s","last_synced_at":"2026-03-07T18:04:33.359Z","repository":{"id":236081269,"uuid":"791872304","full_name":"seroperson/urlopt4s","owner":"seroperson","description":"Allows you to remove ad/tracking query params from a given URL in Scala","archived":false,"fork":false,"pushed_at":"2025-08-15T08:11:25.000Z","size":89,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-15T10:09:54.404Z","etag":null,"topics":["adguard","graaljs","js","query-params-filtering","scala","url-canonicalization","url-normalization","url-query"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seroperson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-04-25T14:33:04.000Z","updated_at":"2025-08-15T08:11:28.000Z","dependencies_parsed_at":"2024-10-27T23:45:10.521Z","dependency_job_id":"4dd9424c-0327-4065-a253-b7fc59d5184a","html_url":"https://github.com/seroperson/urlopt4s","commit_stats":null,"previous_names":["seroperson/urlopt4s"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/seroperson/urlopt4s","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seroperson%2Furlopt4s","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seroperson%2Furlopt4s/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seroperson%2Furlopt4s/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seroperson%2Furlopt4s/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seroperson","download_url":"https://codeload.github.com/seroperson/urlopt4s/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seroperson%2Furlopt4s/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30225468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T17:00:40.062Z","status":"ssl_error","status_checked_at":"2026-03-07T17:00:39.026Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adguard","graaljs","js","query-params-filtering","scala","url-canonicalization","url-normalization","url-query"],"created_at":"2024-10-06T06:05:10.193Z","updated_at":"2026-03-07T18:04:33.353Z","avatar_url":"https://github.com/seroperson.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# urlopt4s\n\n[![Build Status](https://github.com/seroperson/urlopt4s/actions/workflows/build.yml/badge.svg)](https://github.com/seroperson/urlopt4s/actions/workflows/build.yml)\n[![Maven Central 
Version](https://img.shields.io/maven-central/v/me.seroperson/urlopt4s_2.12)](https://mvnrepository.com/artifact/me.seroperson/urlopt4s)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/seroperson/urlopt4s/LICENSE)\n\nThis small library allows you to remove advertising and tracking query\nparameters from a given URL in Scala. It does not contain any filtering logic\nitself, but instead uses part of the JS [AdGuard][1] adblocker engine under the\nhood. So, besides its main function, you can treat this library as a\nkind of PoC showing that it is possible to run almost any kind of JS code on the\nJVM. Sometimes this method can be excessive, so [read on](#preface) to find out\nwhy we need to do it this way.\n\n## Installation\n\nIf you use `sbt`:\n\n```sbt\nlibraryDependencies += \"me.seroperson\" %% \"urlopt4s\" % \"0.3.0\"\n```\n\nIf you use `mill`:\n\n```scala\nivy\"me.seroperson::urlopt4s::0.3.0\"\n```\n\n## How to use it\n\nThe main object is `UrlOptimizer[F]`. It provides a Scala API to remove\nadvertising and tracking query parameters from a given URL, and it interacts with\na JS adblocker engine via GraalJS under the hood. For usage examples, you can\ncheck the tests and the `example` directory, but here is a short one\ntoo:\n\n```scala\n// ...\n\nobject ExampleApp extends IOApp {\n\n  override def run(args: List[String]): IO[ExitCode] = UrlOptimizer[IO]()\n    .use { urlOptimizer =\u003e\n      for {\n        result \u003c- urlOptimizer.removeAdQueryParams(\"https://www.google.com/?utm_source=test\")\n        _ \u003c- IO.println(result) // \"https://www.google.com/\"\n      } yield ExitCode.Success\n    }\n\n}\n```\n\nYou can run a real example like so:\n\n```\n./millw 'example.cats[2.12.18].run' 'https://google.com/?utm_source=test'\n```\n\nThe function `removeAdQueryParams` takes some time to execute, so you usually\nwant to run your processing in the background. 
It can be used concurrently, as\nall the necessary locks have been implemented internally. Also, be sure not to\nblock on `UrlOptimizer` resource initialization, as it may take some time for it\nto start.\n\nYou can also pass your own custom rule list to the `UrlOptimizer.apply` method\nif the default one (see `urlopt4s/resources/rules.txt`) does not cover your\nneeds. You can also redefine the GraalJS context, but usually this is not\nnecessary.\n\n## Preface\n\nI encountered the necessity to remove advertising and tracking query parameters\nwhile developing my pet project, a Telegram bot \"[the advanced link saver][4]\".\nI wanted to implement a feature that strips redundant query parameters from URLs,\nand the first thing I coded was a simple filter for a predefined set of\ncommonly used tracking query parameters, such as `utm_source`, `utm_medium`,\n`fbclid`, etc.\n\nQuite quickly, I realized that this method didn't work well enough. There were\nfar too many parameters around to cover all of them. 
For\nexample, the Google Search URL typically looks like this:\n\n```text\nhttps://www.google.com/search?q=hello\u0026sca_esv=494940dbc25649b8\u0026source=hp\u0026ei=rmEhZuPhF6eLxc8P6-C_mAI\u0026iflsig=ANes7DEAAAAAZiFvvg9IypzVMAznAHWL3LCM0tiJHpsL\u0026udm=\u0026ved=0ahUKEwjj8PzlrcyFAxWnRfEDHWvwDyMQ4dUDCA0\u0026uact=5\u0026oq=hello\u0026gs_lp=Egdnd3Mtd2l6IgVoZWxsbzIIEC4YgAQYsQMyCxAuGIAEGLEDGNQCMggQABiABBixAzIIEC4YgAQYsQMyCBAuGIAEGLEDMgsQLhiABBixAxiDATILEC4YgAQYsQMY1AIyCBAAGIAEGLEDMggQABiABBixAzIIEAAYgAQYsQNInAhQAFivBnAAeACQAQCYAUigAdwCqgEBNbgBA8gBAPgBAZgCBaAC5wLCAhEQLhiABBixAxjRAxiDARjHAcICDhAuGIAEGMcBGI4FGK8BwgIEEAAYA8ICCxAAGIAEGLEDGIMBwgIYEC4YgAQYARixAxjRAxiDARjHARiKBRgKwgIFEAAYgATCAhEQLhiABBixAxiDARjHARivAZgDAJIHATWgB5VV\u0026sclient=gws-wiz\n```\n\nThe only part that matters here is:\n\n```text\nhttps://www.google.com/search?q=hello\n```\n\nOf course, you can manually collect a list of redundant query parameters by\nvisiting the most popular websites and carefully searching for truly ad-related\nquery parameters. However, if we dive deeply:\n\n- It's a really time-consuming process and it's hard to do it properly.\n- A parameter that you think is related to ads may be actually so at one website\n  but not at another, and you may never know if something has gone wrong.\n- Sometimes you want to filter parameters or match domain names using regular\n  expressions.\n- And probably some other points that aren't so obvious.\n\nAs you can see, this simple task becomes the more and more difficult as you go\nalong.\n\n## Implementation details\n\nThe method I came up with is reusing the code that popular web adblockers\nalready have. If you have a good adblocker installed as a browser extension, you\nmay notice that it sometimes rewrites your URLs to get rid of advertising or\ntracking query parameters. 
This means that adblockers already maintain a\nlist of junk query parameters and have all the necessary code to filter URLs.\nWe just need to find this code and reuse it.\n\nI have chosen the [AdGuard ecosystem][1] to do this. They have very friendly\ndocumentation, most things are open-source, and it is relatively easy to get\nthings right with them. The [tsurlfilter][2] project is the core, which is\nresponsible for the common logic and is used in all their adblockers. Using this\nAPI, we can initialize the adblocker engine, pass in some URLs, match them\nagainst our adblocker rules, and then perform blocking, filtering,\nredirecting and so on, depending on what is matched.\n\nAs I said, an adblocker usually works according to a predefined set of rules.\nTherefore, we also need to create our own list of rules that contains only\nentries related to filtering query parameters. The [FiltersRegistry][3] allows\nyou to do it. [We will discuss this later](#how-to-make-your-own-rules-list).\n\nThen, if our backend were written in JS, we would have no further problems: just\nadd dependencies, maybe some polyfills, and run the code. But we're running on\nthe JVM, and that is actually another purpose of this library: to show that it's\npossible to run a large webpack bundle consisting of TypeScript libraries and\nmodern JS APIs on the JVM.\n\nSo, this is how I have done it:\n\n- The `urlopt4s-js` JS module interacts with the `tsurlfilter`\n  library, initializes an engine, and provides functions to be called from the JVM, such as\n  `removeAdQueryParams(str)`.\n- The `urlopt4s-js` bundle is built with webpack, with some polyfills and some\n  tricks added to make JS-on-JVM work.\n- A custom set of rules is compiled, which contains only the rules for\n  filtering query parameters.\n- Finally, the `urlopt4s` Scala module packs the JS bundle and\n  rules inside. 
It initializes everything and then just provides a Scala interface to\n  call the JS code.\n\nJS-on-JVM is implemented using GraalJS and works quite well, but a lot of tricky\nthings were required to get everything working together.\n\nStill, there is plenty of room for optimization, and I believe many things could\nbe improved, but for now I am leaving it as it is.\n\n## How to build JAR artifact\n\nFirst, you have to build the webpack bundle, which will be included in the final\n`.jar`. Go to `urlopt4s-js` and run:\n\n```\nnpm exec webpack\n```\n\nYour bundle will be available at `urlopt4s-js/dist/main-bundle.mjs`. It should\nthen be moved to `urlopt4s/resources/urlopt4s.mjs`.\n\nNow you can compile and build the `.jar`:\n\n```\n./millw __.publishLocal\n```\n\n## How to make your own rules list\n\n`urlopt4s` comes with a predefined set of rules: `urlopt4s/resources/rules.txt`.\nIt was compiled using the [FiltersRegistry][3] repository and contains only\n`$removeparam` directives. You may use the default one or compile your own. The\nrepository has pretty nice documentation, but compiling the list that you see\nhere requires some additional code. 
I'm leaving the patch which I did to do it:\n\n```diff\ndiff --git a/scripts/build/build.js b/scripts/build/build.js\nindex 8f7332b7657..033c0a59c55 100755\n--- a/scripts/build/build.js\n+++ b/scripts/build/build.js\n@@ -1,6 +1,7 @@\n const fs = require('fs');\n const path = require('path');\n const compiler = require('adguard-filters-compiler');\n+const compilerOptimization = require('../../node_modules/adguard-filters-compiler/src/main/optimization.js');\n\n const customPlatformsConfig = require('./custom_platforms');\n const { formatDate } = require('../utils/strings');\n@@ -72,6 +73,8 @@ const buildFilters = async () =\u003e {\n         await fs.promises.cp(platformsPath, copyPlatformsPath, { recursive: true });\n     }\n\n+    compilerOptimization.disableOptimization();\n+\n     await compiler.compile(\n         filtersDir,\n         logPath,\ndiff --git a/scripts/build/custom_platforms.js b/scripts/build/custom_platforms.js\nindex 71dcb17cd00..867603af78b 100644\n--- a/scripts/build/custom_platforms.js\n+++ b/scripts/build/custom_platforms.js\n@@ -533,7 +533,44 @@ const SAFARI_BASED_EXTENSION_PATTERNS = [\n     ...JSONPRUNE_MODIFIER_PATTERNS,\n ];\n\n+const ONLY_REMOVEPARAM_MODIFIER_PATTERNS = [\n+    '^(?!.*(\\\\$(?!#|(path|domain)=.*]).*removeparam(,|=|$))).*$',\n+];\n+\n+const SKIP_CONTENT_TYPE_PATTERNS = [\n+    '\\\\$.*document',\n+    '\\\\$.*subdocument',\n+    '\\\\$.*font',\n+    '\\\\$.*image',\n+    '\\\\$.*media',\n+    '\\\\$.*object',\n+    '\\\\$.*other',\n+    '\\\\$.*ping',\n+    '\\\\$.*script',\n+    '\\\\$.*stylesheet',\n+    '\\\\$.*websocket',\n+    '\\\\$.*xmlhttprequest'\n+];\n+\n module.exports = {\n+    'LINK_OPTIMIZER': {\n+        'platform': 'link_optimizer',\n+        'path': 'link_optimizer',\n+        'expires': '10 days',\n+        'configuration': {\n+            // removing everything except of $removeparam\n+            'removeRulePatterns': [\n+                ...ONLY_REMOVEPARAM_MODIFIER_PATTERNS,\n+                
...SKIP_CONTENT_TYPE_PATTERNS\n+            ],\n+            'replacements': null,\n+            'ignoreRuleHints': false,\n+        },\n+        'defines': {\n+            'adguard': true,\n+            'adguard_ext_chromium': true,\n+        },\n+    },\n     'WINDOWS': {\n         'platform': 'windows',\n         'path': 'windows',\n```\n\nAfter compiling you will have to concat all the output and get rid of\nduplicates.\n\n## License\n\n```text\nMIT License\n\nCopyright (c) 2024 Daniil Sivak\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n```\n\n[1]: https://github.com/AdguardTeam\n[2]: https://github.com/AdguardTeam/tsurlfilter\n[3]: https://github.com/AdguardTeam/FiltersRegistry\n[4]: https://seroperson.me/2023/09/08/link-saver-bot-for-telegram/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseroperson%2Furlopt4s","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseroperson%2Furlopt4s","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseroperson%2Furlopt4s/lists"}