{"id":13448955,"url":"https://github.com/jimen0/differer","last_synced_at":"2025-03-22T18:32:16.507Z","repository":{"id":57550395,"uuid":"259895880","full_name":"jimen0/differer","owner":"jimen0","description":"differer finds how URLs are parsed by different languages in order to help bug hunters break filters","archived":false,"fork":false,"pushed_at":"2020-05-03T08:13:58.000Z","size":34,"stargazers_count":63,"open_issues_count":0,"forks_count":5,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-10-28T15:42:19.119Z","etag":null,"topics":["bugbounty","cloudrun","go","golang","serverless","url"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jimen0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-29T10:34:26.000Z","updated_at":"2024-07-25T13:37:39.000Z","dependencies_parsed_at":"2022-08-29T20:41:13.392Z","dependency_job_id":null,"html_url":"https://github.com/jimen0/differer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jimen0%2Fdifferer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jimen0%2Fdifferer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jimen0%2Fdifferer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jimen0%2Fdifferer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jimen0","download_url":"https://codeload.github.com/jimen0/differer/tar.gz/refs/heads/master","host":{"name
":"GitHub","url":"https://github.com","kind":"github","repositories_count":245002937,"owners_count":20545518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bugbounty","cloudrun","go","golang","serverless","url"],"created_at":"2024-07-31T06:00:26.217Z","updated_at":"2025-03-22T18:32:11.467Z","avatar_url":"https://github.com/jimen0.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"## differer\n\nThe differer project aims to help bug bounty hunters find differences between the URL parsers of several languages and libraries. Not all of them behave in the same way, and those differences might lead to unexpected vulnerabilities.\n\nThe URL format is defined in [RFC 3986](https://tools.ietf.org/html/rfc3986); however, languages and libraries differ slightly in how they deal with incorrect URLs. Some of them report an error to the caller, others raise exceptions, and others take a best-effort approach and try to fix the URL for you. That is exactly where unexpected security issues might arise.\n\n```\n         foo://example.com:8042/over/there?name=ferret#nose\n         \\_/   \\______________/\\_________/ \\_________/ \\__/\n          |           |            |            |        |\n       scheme     authority       path        query   fragment\n          |   _____________________|__\n         / \\ /                        \\\n         urn:example:animal:ferret:nose\n```\n\nA lot of work has already been done on this particular topic. 
One of the most prominent discussions of it is Orange's presentation _[A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages!](https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf)_.\n\nThis project doesn't introduce any new attack technique; rather, it tries to make the process of finding parser differences easier.\n\n### The goal\n\nTo be able to run the parsers against the desired URLs on demand, without worrying about setting up the compilers, interpreters or the message broker. The main use case of this project is to be deployed on Google Cloud Platform using the smallest amount of resources, letting the platform tear down the services when they aren't being used.\n\nAll I want to do is submit a URL somewhere and get the results from the different parsers.\n\n### Show me the numbers\n\n\u003e Please remember that this project focuses on maintainability, not on performance.\n\nHere are the numbers when running the project with `400` URLs against `3` parsers. Parser instances are located in `europe-west1` and have `128 MiB` of RAM, `1` vCPU, a maximum of `4` instances per parser and up to `60` concurrent requests per instance. 
The parsers are:\n\n| Language | Version       | Parser                                                                                   |\n|----------|---------------|------------------------------------------------------------------------------------------|\n| Go       | 1.14.2        | [`net/url.Parse`](https://pkg.go.dev/net/url?tab=doc#Parse)                              |\n| Python   | 3.8           | [`urllib.parse.urlparse`](https://docs.python.org/3/library/urllib.parse.html)           |\n| Node     | 14            | [`url`](https://nodejs.org/api/url.html)                                                 |\n\n\n```console\n$ curl -o /dev/null -X'POST' -d @data.json -s -w \"%{time_total}\\n\" \"https://REDACTED/differer\"\n0.888706\n```\n\nFor cold starts (calls that happen after Cloud Run has shut down the containers), the time is higher, as expected:\n\n```console\n$ curl -o /dev/null -X'POST' -d @data.json -s -w \"%{time_total}\\n\" \"https://REDACTED/differer\"\n2.760033\n```\n\n### Why microservices?\n\nBecause I don't want to run the tools locally each time I want to see how different languages parse a URL. I simply send a query to my service and get the output.\n\nSetting the project up using App Engine and Cloud Run allows me to forget about infrastructure. GCP starts and stops my services on demand and lets me restrict access to them through firewall and IAM rules.\n\n![GCP architecture](./docs/differer.svg)\n\nHowever, the project can be used locally too. See the [local setup docs](./docs/RUN_LOCAL.md) or the [GCP docs](./docs/RUN_GCP.md).\n\n### How to configure?\n\nThe configuration file is a simple YAML file. Here is an example for running the project locally with Go's, Python's and Node's parsers. 
See the [`config_example.yaml`](./config_example.yaml) file for the raw example.\n\n```yaml\n---\nrunners:\n  golang: http://golang-parseurl:8082/\n  python3_urllib_urlparse: http://python-parseurl:8083/\n  node_url_parse: http://node-parseurl:8084/\ntimeout: 10s\n```\n\n### How to add a language or library?\n\nAs long as your new `runner` listens on HTTP for `POST` requests containing the jobs, the service is agnostic about where or how you run each runner.\n\nThe `Job` and `Result` structures can be found in the [Protocol Buffer](./scheduler/scheduler.proto) definition the project uses. Use the [`protoc`](https://github.com/protocolbuffers/protobuf) compiler to generate the code that parses jobs and results in your language. See [this document](./docs/RUNNER_EXAMPLE.md) for a complete example using Go.\n\nOnce your `runner` is deployed somewhere, just add it to your `config.yaml`.\n\n### How to run locally?\n\nFor simplicity, let's assume you run all the services using Docker containers. 
Follow the [local setup guide](./docs/RUN_LOCAL.md) and then just send a request to `differer` with the URLs you want it to parse.\n\n\u003cdetails\u003e\n\u003csummary\u003eLocal run\u003c/summary\u003e\n\u003cbr\u003e\n\n```bash\n$ curl -s --request POST 'http://127.0.0.1:8080/differer' \\\n  --header 'Content-Type: application/json' \\\n  --data-raw '{\n    \"addresses\": [\n        \"https://google.com:443/foobar\",\n        \"http://user:legit.com@attacker.com/?pwnz=1\"\n    ]\n}' | jq .\n{\n  \"results\": [\n    {\n      \"runner\": \"python3_urllib_urlparse\",\n      \"string\": \"https://google.com:443/foobar\",\n      \"outputs\": {\n        \"id\": \"python3:urllib:urlparse\",\n        \"value\": \"Scheme=https; Host=google.com:443; Path=/foobar;\"\n      }\n    },\n    {\n      \"runner\": \"python3_urllib_urlparse\",\n      \"string\": \"http://user:legit.com@attacker.com/?pwnz=1\",\n      \"outputs\": {\n        \"id\": \"python3:urllib:urlparse\",\n        \"value\": \"Scheme=http; Host=user:legit.com@attacker.com; Path=/; User=user:legit.com;\"\n      }\n    },\n    {\n      \"runner\": \"node_url_parse\",\n      \"string\": \"http://user:legit.com@attacker.com/?pwnz=1\",\n      \"outputs\": {\n        \"id\": \"node14:url.parse\",\n        \"value\": \"Scheme=http:; Host=attacker.com; Path=/; User=user:legit.com\"\n      }\n    },\n    {\n      \"runner\": \"node_url_parse\",\n      \"string\": \"https://google.com:443/foobar\",\n      \"outputs\": {\n        \"id\": \"node14:url.parse\",\n        \"value\": \"Scheme=https:; Host=google.com:443; Path=/foobar;\"\n      }\n    },\n    {\n      \"runner\": \"golang\",\n      \"string\": \"https://google.com:443/foobar\",\n      \"outputs\": {\n        \"id\": \"golang\",\n        \"value\": \"Scheme=https; Host=google.com:443; Path=/foobar;\"\n      }\n    },\n    {\n      \"runner\": \"golang\",\n      \"string\": \"http://user:legit.com@attacker.com/?pwnz=1\",\n      \"outputs\": {\n        \"id\": 
\"golang\",\n        \"value\": \"Scheme=http; Host=attacker.com; Path=/; User=user:legit.com;\"\n      }\n    }\n  ]\n}\n```\n\u003c/details\u003e\n\n### Why do the runners only accept one task at a time?\n\nIndeed, the service would be faster if runners accepted multiple tasks at a time, and adding support for that would be straightforward. However, I decided to keep it as simple as possible, since it's performant enough for me.\n\n### It would be faster if it used `xyz`\n\nThe project aims for maintainability and ease of use over performance. Feel free to fork it if you disagree.\n\n### How can I contribute?\n\nPlease check the [contributing documentation](./docs/CONTRIBUTE.md).\n\n### Credits\n\nI decided to build this project after a discussion with some friends ([Karel](https://github.com/karelorigin), [Karim](https://github.com/KarimPwnz), [Corben](https://github.com/lc) and [Amal](https://github.com/amalmurali47)).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjimen0%2Fdifferer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjimen0%2Fdifferer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjimen0%2Fdifferer/lists"}