{"id":16582092,"url":"https://github.com/robinst/autolink-java","last_synced_at":"2025-05-14T13:09:42.788Z","repository":{"id":33221382,"uuid":"36864499","full_name":"robinst/autolink-java","owner":"robinst","description":"Java library to extract links (URLs, email addresses) from plain text; fast, small and smart","archived":false,"fork":false,"pushed_at":"2024-11-19T08:00:20.000Z","size":161,"stargazers_count":209,"open_issues_count":3,"forks_count":42,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-05-07T20:11:21.750Z","etag":null,"topics":["autolink","extraction","java-library","linkify","links","url"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/robinst.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"robinst"}},"created_at":"2015-06-04T10:42:53.000Z","updated_at":"2025-05-05T15:33:25.000Z","dependencies_parsed_at":"2024-10-11T22:31:40.038Z","dependency_job_id":"2a46e676-00e7-4c0c-a536-1b8baae7522c","html_url":"https://github.com/robinst/autolink-java","commit_stats":{"total_commits":147,"total_committers":6,"mean_commits":24.5,"dds":0.08163265306122447,"last_synced_commit":"096071033f56e120d4f9545d7ef24044a5531b31"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robinst%2Fautolink-java","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robinst%2Fautolink-java/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robinst%2Fautolink-java/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robinst%2Fautolink-java/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/robinst","download_url":"https://codeload.github.com/robinst/autolink-java/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254149977,"owners_count":22022852,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autolink","extraction","java-library","linkify","links","url"],"created_at":"2024-10-11T22:31:26.846Z","updated_at":"2025-05-14T13:09:37.751Z","avatar_url":"https://github.com/robinst.png","language":"Java","readme":"autolink-java\n=============\n\nJava library to extract links such as URLs and email addresses from plain text.\nIt's smart about where a link ends, such as with trailing punctuation.\n\n[![ci](https://github.com/robinst/autolink-java/workflows/ci/badge.svg)](https://github.com/robinst/autolink-java/actions?query=workflow%3Aci)\n[![Coverage status](https://codecov.io/gh/robinst/autolink-java/branch/main/graph/badge.svg)](https://codecov.io/gh/robinst/autolink-java)\n[![Maven Central status](https://img.shields.io/maven-central/v/org.nibor.autolink/autolink.svg)](https://search.maven.org/search?q=g:org.nibor.autolink%20AND%20a:autolink\u0026core=gav)\n\nIntroduction\n------------\n\nYou might think: \"Do I need a library for this? I can just write a regex for this!\".\nLet's look at a few cases:\n\n* In text like `https://example.com/.` the link should not include the trailing dot\n* `https://example.com/,` should not include the trailing comma\n* `(https://example.com/)` should not include the parens\n\nSeems simple enough. But then we also have these cases:\n\n* `https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)` should include the trailing paren\n* `https://üñîçøðé.com/ä` should also work for Unicode (including Emoji and Punycode)\n* `\u003chttps://example.com/\u003e` should not include angle brackets\n\nThis library behaves as you'd expect in the above cases and many more.\nIt parses the input text in one pass with limited backtracking.\n\nThanks to [Rinku](https://github.com/vmg/rinku) for the inspiration.\n\nUsage\n-----\n\nThis library is supported on Java 11 or later. It works on Android\n(minimum API level 19). It has no external dependencies.\n\nMaven coordinates\n(see\n[here](https://central.sonatype.com/artifact/org.nibor.autolink/autolink/0.11.0/overview)\nfor other build systems):\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.nibor.autolink\u003c/groupId\u003e\n    \u003cartifactId\u003eautolink\u003c/artifactId\u003e\n    \u003cversion\u003e0.11.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nExtracting links:\n\n```java\nimport org.nibor.autolink.*;\n\nvar input = \"two links: https://test.com and https://example.com\";\nvar linkExtractor = LinkExtractor.builder()\n        .linkTypes(EnumSet.of(LinkType.URL)) // limit to URLs\n        .build();\nvar links = new ArrayList\u003c\u003e();\nfor (var span : linkExtractor.extractLinks(input)) {\n    var link = input.substring(span.getBeginIndex(), span.getEndIndex());\n    links.add(link);\n}\n\nlinks;  // List.of(\"https://test.com\", \"https://example.com\")\n```\n\nNote that by default all supported types of links are extracted. If\nyou're only interested in specific types, narrow it down using the\n`linkTypes` method.\n\nThe above returns all the links. Sometimes what you want to do is go over some input,\nprocess the links and keep the surrounding text. For that case,\nthere's an `extractSpans` method.\n\nHere's an example of using that to transform the text to HTML and wrapping URLs in\nan `\u003ca\u003e` tag (escaping is done using owasp-java-encoder):\n\n```java\nimport org.nibor.autolink.*;\nimport org.owasp.encoder.Encode;\n\nString input = \"wow http://test.com such linked\";\nLinkExtractor linkExtractor = LinkExtractor.builder()\n        .linkTypes(EnumSet.of(LinkType.URL)) // limit to URLs\n        .build();\nIterable\u003cSpan\u003e spans = linkExtractor.extractSpans(input);\n\nStringBuilder sb = new StringBuilder();\nfor (Span span : spans) {\n    String text = input.substring(span.getBeginIndex(), span.getEndIndex());\n    if (span instanceof LinkSpan) {\n        // span is a URL\n        sb.append(\"\u003ca href=\\\"\");\n        sb.append(Encode.forHtmlAttribute(text));\n        sb.append(\"\\\"\u003e\");\n        sb.append(Encode.forHtml(text));\n        sb.append(\"\u003c/a\u003e\");\n    } else {\n        // span is plain text before/after link\n        sb.append(Encode.forHtml(text));\n    }\n}\n\nsb.toString();  // \"wow \u003ca href=\\\"http://test.com\\\"\u003ehttp://test.com\u003c/a\u003e such linked\"\n```\n\nNote that this assumes that the input is plain text, not HTML.\nAlso see the \"What this is not\" section below.\n\nFeatures\n--------\n\n### URL extraction\n\nExtracts URLs of the form `scheme://example` with any potentially valid scheme.\nURIs such as `example:test` are not matched (may be added as an option in the\nfuture). If only certain schemes should be allowed, the result can be filtered.\n(Note that schemes can contain dots, so `foo.http://example` is recognized as\na single link.)\n\nIncludes heuristics for not including trailing delimiters such as punctuation\nand unbalanced parentheses, see examples below.\n\nSupports internationalized domain names (IDN). Note that they are not validated\nand as a result, invalid URLs may be matched.\n\nExample input and linked result:\n\n* `http://example.com.` → [http://example.com]().\n* `http://example.com,` → [http://example.com](),\n* `(http://example.com)` → ([http://example.com]())\n* `(... (see http://example.com))` → (... (see [http://example.com]()))\n* `https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)` →\n  [https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)]()\n* `http://üñîçøðé.com/` → [http://üñîçøðé.com/]()\n\nUse `LinkType.URL` for this, and see [test\ncases here](src/test/java/org/nibor/autolink/AutolinkUrlTest.java).\n\n### WWW link extraction\n\nExtract links like `www.example.com`. They need to start with `www.` but\ndon't need a `scheme://`. For detecting the end of the link, the same\nheuristics apply as for URLs.\n\nExamples:\n\n* `www.example.com.` → [www.example.com]().\n* `(www.example.com)` → ([www.example.com]())\n* `[..] link:www.example.com [..]` → \\[..\\] link:[www.example.com]() \\[..\\]\n\nNot supported:\n\n* Uppercase `www`'s, e.g. `WWW.example.com` and `wWw.example.com`\n* Too many or too few `w`'s, e.g. `wwww.example.com`\n\nThe domain must have at least 3 parts, so `www.com` is not valid, but `www.something.co.uk` is.\n\nUse `LinkType.WWW` for this, and see [test\ncases here](src/test/java/org/nibor/autolink/AutolinkWwwTest.java).\n\n### Email address extraction\n\nExtracts emails such as `foo@example.com`. Matches international email\naddresses, but doesn't verify the domain name (may match too much).\n\nExamples:\n\n* `foo@example.com` → [foo@example.com]()\n* `foo@example.com.` → [foo@example.com]().\n* `foo@example.com,` → [foo@example.com](),\n* `üñîçøðé@üñîçøðé.com` → [üñîçøðé@üñîçøðé.com]()\n\nNot supported:\n\n* Quoted local parts, e.g. `\"this is sparta\"@example.com`\n* Address literals, e.g. `foo@[127.0.0.1]`\n\nNote that the domain must have at least one dot (e.g. `foo@com` isn't\nmatched), unless the `emailDomainMustHaveDot` option is disabled.\n\nUse `LinkType.EMAIL` for this, and see [test cases\nhere](src/test/java/org/nibor/autolink/AutolinkEmailTest.java).\n\nWhat this is not\n----------------\n\nThis library is intentionally *not* aware of HTML. If it was, it would need to depend on an HTML parser and renderer.\nConsider this input:\n\n```\nHTML that contains \u003ca href=\"https://one.example\"\u003elinks\u003c/a\u003e but also plain URLs like https://two.example.\n```\n\nIf you want to turn the plain links into `a` elements but leave the already linked ones intact, I recommend:\n\n1. Parse the HTML using an HTML parser library\n2. Walk through the resulting DOM and use autolink-java to find links within *text* nodes only\n3. Turn those into `a` elements\n4. Render the DOM back to HTML\n\nContributing\n------------\n\nSee CONTRIBUTING.md file.\n\nLicense\n-------\n\nCopyright (c) 2015-2022 Robin Stocker and others, see Git history\n\nMIT licensed, see LICENSE file.\n","funding_links":["https://github.com/sponsors/robinst"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobinst%2Fautolink-java","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobinst%2Fautolink-java","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobinst%2Fautolink-java/lists"}