{"id":30294431,"url":"https://github.com/linkedin/url-detector","last_synced_at":"2025-08-17T01:34:54.938Z","repository":{"id":26974918,"uuid":"30438494","full_name":"linkedin/URL-Detector","owner":"linkedin","description":"A Java library to detect and normalize URLs in text","archived":false,"fork":false,"pushed_at":"2025-07-12T01:38:27.000Z","size":86,"stargazers_count":782,"open_issues_count":21,"forks_count":186,"subscribers_count":67,"default_branch":"master","last_synced_at":"2025-07-19T16:27:48.771Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linkedin.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-02-06T23:53:22.000Z","updated_at":"2025-04-04T14:38:06.000Z","dependencies_parsed_at":"2022-08-31T12:13:12.282Z","dependency_job_id":null,"html_url":"https://github.com/linkedin/URL-Detector","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/linkedin/URL-Detector","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FURL-Detector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FURL-Detector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FURL-Detector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FURL-Detector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linkedin","download_url":"https://codeload.github.com/linkedin/URL-Detector/tar.gz/refs/heads/master","sbom_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2FURL-Detector/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270796217,"owners_count":24647319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-17T01:34:54.018Z","updated_at":"2025-08-17T01:34:54.925Z","avatar_url":"https://github.com/linkedin.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Url Detector\n\nThe URL detector is a library created by the LinkedIn Security Team to detect and extract URLs from long pieces of text.\n\nIt can detect URLs in many forms, such as:\n\n* __HTML 5 Scheme__   - //www.linkedin.com\n* __Usernames__       - user:pass@linkedin.com\n* __Email__           - fred@linkedin.com\n* __IPv4 Address__    - 192.168.1.1/hello.html\n* __IPv4 Octets__     - 0x00.0x00.0x00.0x00\n* __IPv4 Decimal__    - http://123123123123/\n* __IPv6 Address__    - ftp://[::]/hello\n* __IPv4-mapped IPv6 Address__  - http://[fe30:4:3:0:192.3.2.1]/\n\n_Note: Keep in mind that for security purposes, it's better to over-detect URLs and check more of them against blacklists than to miss a URL that was submitted. As such, some strings that we detect might not be URLs but merely look like them. 
Also, instead of complying with RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt), we base detection on browser behavior, optimizing for URLs that can be visited through the address bar of Chrome, Firefox, Internet Explorer, and Safari._\n\nIt can also identify the parts of each detected URL. For example, for the URL `http://user@linkedin.com:39000/hello?boo=ff#frag`:\n\n* Scheme   - \"http\"\n* Username - \"user\"\n* Password - null\n* Host     - \"linkedin.com\"\n* Port     - 39000\n* Path     - \"/hello\"\n* Query    - \"?boo=ff\"\n* Fragment - \"#frag\"\n\n---\n## How to Use:\n\nUsing the URL detector library is simple: create a `UrlDetector` with the input text and a set of options, then call `detect()`. In response, you get a list of the URLs that were detected.\n\nFor example, the following code finds the URL `linkedin.com`:\n\n```java\n\n    import java.util.List;\n\n    import com.linkedin.urls.Url;\n    import com.linkedin.urls.detection.UrlDetector;\n    import com.linkedin.urls.detection.UrlDetectorOptions;\n\n    // Detect URLs in plain text using the default options.\n    UrlDetector parser = new UrlDetector(\"hello this is a url Linkedin.com\", UrlDetectorOptions.Default);\n    List\u003cUrl\u003e found = parser.detect();\n\n    // Print the parts of each detected URL.\n    for (Url url : found) {\n        System.out.println(\"Scheme: \" + url.getScheme());\n        System.out.println(\"Host: \" + url.getHost());\n        System.out.println(\"Path: \" + url.getPath());\n    }\n```\n\n### Quote Matching and HTML\nDepending on your input string, you may want to handle certain characters in a special way. For example, if you are\nparsing HTML, you probably want detection to stop at characters like quotes and angle brackets. Suppose your input looks like this:\n\n\u003e \u0026lt;a href=\"http://linkedin.com/abc\"\u0026gt;linkedin.com\u0026lt;/a\u0026gt;\n\nIn that case, you want the quotes and brackets to act as URL boundaries rather than be included in the detected URLs. `UrlDetectorOptions`\nlets you adjust detection to match your expected input type. 
This way you can detect\n`linkedin.com` instead of `linkedin.com\u003c/a\u003e`.\n\nIn code, this looks like:\n\n```java\n\n    // Quotes inside a Java string literal must be escaped.\n    UrlDetector parser = new UrlDetector(\"\u003ca href=\\\"linkedin.com/abc\\\"\u003elinkedin.com\u003c/a\u003e\", UrlDetectorOptions.HTML);\n    List\u003cUrl\u003e found = parser.detect();\n\n```\n\n\n---\n## About:\n\nThis library was written by the security team at LinkedIn when no other options existed. Some of the primary authors are:\n\n* Vlad Shlosberg (vshlosbe@linkedin.com)\n* Tzu-Han Jan (tjan@linkedin.com)\n* Yulia Astakhova (jastakho@linkedin.com)\n\n---\n## Third Party Dependencies\n\n#### TestNG\n* http://testng.org/\n* Copyright © 2004-2014 Cédric Beust\n* License: Apache 2.0\n\n#### Apache Commons Lang 3: org.apache.commons:commons-lang3:3.1\n* http://commons.apache.org/proper/commons-lang/\n* Copyright © 2001-2014 The Apache Software Foundation\n* License: Apache 2.0\n\n---\n## License\n\nCopyright 2015 LinkedIn Corp. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the license at http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Furl-detector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinkedin%2Furl-detector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Furl-detector/lists"}