{"id":15062280,"url":"https://github.com/nigelhorne/html-simplelinkextor","last_synced_at":"2025-10-22T08:52:04.394Z","repository":{"id":56833803,"uuid":"312657372","full_name":"nigelhorne/HTML-SimpleLinkExtor","owner":"nigelhorne","description":"Extract links from HTML","archived":false,"fork":true,"pushed_at":"2024-12-03T19:47:09.000Z","size":115,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-03T20:34:04.323Z","etag":null,"topics":["cpan","perl","perl-module"],"latest_commit_sha":null,"homepage":"https://metacpan.org/pod/HTML::SimpleLinkExtor","language":"Perl","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"CPAN-Adoptable-Modules/html-simplelinkextor","license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nigelhorne.png","metadata":{"files":{"readme":"README.md","changelog":"Changes","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-13T18:46:06.000Z","updated_at":"2024-12-03T19:47:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nigelhorne/HTML-SimpleLinkExtor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nigelhorne%2FHTML-SimpleLinkExtor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nigelhorne%2FHTML-SimpleLinkExtor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nigelhorne%2FHTML-SimpleLinkExtor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nigelhorne%2FHTML-SimpleLinkExtor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nigelhorne","download_url"
:"https://codeload.github.com/nigelhorne/HTML-SimpleLinkExtor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235364933,"owners_count":18978264,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpan","perl","perl-module"],"created_at":"2024-09-24T23:33:36.720Z","updated_at":"2025-10-05T04:31:48.180Z","avatar_url":"https://github.com/nigelhorne.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NAME\n\nHTML::SimpleLinkExtor - Extract links from HTML\n\n# SYNOPSIS\n\n        use HTML::SimpleLinkExtor;\n\n        my $extor = HTML::SimpleLinkExtor-\u003enew();\n        $extor-\u003eparse_file($filename);\n        #--or--\n        $extor-\u003eparse($html);\n\n        $extor-\u003eparse_file($other_file); # get more links\n\n        $extor-\u003eclear_links; # reset the link list\n\n        #extract all of the links\n        @all_links   = $extor-\u003elinks;\n\n        #extract the img links\n        @img_srcs    = $extor-\u003eimg;\n\n        #extract the frame links\n        @frame_srcs  = $extor-\u003eframe;\n\n        #extract the hrefs\n        @area_hrefs  = $extor-\u003earea;\n        @a_hrefs     = $extor-\u003ea;\n        @base_hrefs  = $extor-\u003ebase;\n        @hrefs       = $extor-\u003ehref;\n\n        #extract the body background link\n        @body_bg     = $extor-\u003ebody;\n        @background  = $extor-\u003ebackground;\n\n        @links       = $extor-\u003eschemes( 'http' );\n\n# DESCRIPTION\n\nThis is a simple HTML link extractor designed for the 
person who does\nnot want to deal with the intricacies of `HTML::Parser` or the\nde-referencing needed to get links out of `HTML::LinkExtor`.\n\nYou can extract all the links or some of the links (based on the HTML\ntag name or attribute name). If a `\u003cBASE HREF\u003e` tag is found,\nall of the relative URLs will be resolved according to that reference.\n\nThis module is simply a subclass of `HTML::LinkExtor`, so it can\nonly parse what that module can handle.  Invalid HTML or XHTML may\ncause problems.\n\nIf you parse multiple files, the link list grows and contains the\naggregate list of links for all of the files parsed. If you want to\nreset the link list between files, use the clear\\_links method.\n\n## Class Methods\n\n- $extor = HTML::SimpleLinkExtor-\u003enew()\n\n    Create the link extractor object.\n\n- $extor = HTML::SimpleLinkExtor-\u003enew('')\n- $extor = HTML::SimpleLinkExtor-\u003enew($base)\n\n    Create the link extractor object and resolve the relative URLs\n    according to the supplied base URL. The supplied base URL overrides\n    any other base URL found in the HTML.\n\n    Create the link extractor object and do not resolve relative\n    links.\n\n- HTML::SimpleLinkExtor-\u003eua;\n\n    Returns the internal user agent, an `LWP::UserAgent` object.\n\n- HTML::SimpleLinkExtor-\u003eadd\\_tags( TAG \\[, TAG \\] )\n\n    `HTML::SimpleLinkExtor` keeps an internal list of HTML tags (such as\n    'a' and 'img') that have URLs as values. If you run into another tag\n    that this module doesn't handle, please send it to me and I'll add it.\n    Until then you can add that tag to the internal list. This affects\n    the entire class, including previously created objects.\n\n- HTML::SimpleLinkExtor-\u003eadd\\_attributes( ATTR \\[, ATTR\\] )\n\n    `HTML::SimpleLinkExtor` keeps an internal list of HTML tag attributes\n    (such as 'href' and 'src') that have URLs as values. 
If you run into\n    another attribute that this module doesn't handle, please send it to\n    me and I'll add it. Until then you can add that attribute to the\n    internal list. This affects the entire class, including previously\n    created objects.\n\n- can()\n\n    A smarter `can` that can tell which attributes are also methods.\n\n- HTML::SimpleLinkExtor-\u003eremove\\_tags( TAG \\[, TAG \\] )\n\n    Take tags out of the internal list that `HTML::SimpleLinkExtor` uses\n    to extract URLs. This affects the entire class, including previously\n    created objects.\n\n- HTML::SimpleLinkExtor-\u003eremove\\_attributes( ATTR \\[, ATTR\\] )\n\n    Takes attributes out of the internal list that\n    `HTML::SimpleLinkExtor` uses to extract URLs. This affects the entire\n    class, including previously created objects.\n\n- HTML::SimpleLinkExtor-\u003eattribute\\_list\n\n    Returns a list of the attributes `HTML::SimpleLinkExtor` pays\n    attention to.\n\n- HTML::SimpleLinkExtor-\u003etag\\_list\n\n    Returns a list of the tags `HTML::SimpleLinkExtor` pays attention to.\n    These tags have convenience methods.\n\n## Object methods\n\n- $extor-\u003eparse\\_file( $filename )\n\n    Parse the file for links. Inherited from `HTML::Parser`.\n\n- $extor-\u003eparse\\_url( $url \\[, $ua\\] )\n\n    Fetch URL and parse its content for links.\n\n- $extor-\u003eparse( $data )\n\n    Parse the HTML in `$data`. Inherited from `HTML::Parser`.\n\n- $extor-\u003eclear\\_links\n\n    Clear the link list. 
This way, you can use the same parser for\n    another file.\n\n- $extor-\u003elinks\n\n    Return a list of the links.\n\n- $extor-\u003eimg\n\n    Return a list of the links from all the SRC attributes of the\n    IMG.\n\n- $extor-\u003eframe\n\n    Return a list of all the links from all the SRC attributes of\n    the FRAME.\n\n- $extor-\u003eiframe\n\n    Return a list of all the links from all the SRC attributes of\n    the IFRAME.\n\n- $extor-\u003eframes\n\n    Returns the combined list from frame and iframe.\n\n- $extor-\u003esrc\n\n    Return a list of the links from all the SRC attributes of any\n    tag.\n\n- $extor-\u003ea\n\n    Return a list of the links from all the HREF attributes of the\n    A tags.\n\n- $extor-\u003earea\n\n    Return a list of the links from all the HREF attributes of the\n    AREA tags.\n\n- $extor-\u003ebase\n\n    Return a list of the links from all the HREF attributes of the\n    BASE tags.  There should only be one.\n\n- $extor-\u003ehref\n\n    Return a list of the links from all the HREF attributes of any\n    tag.\n\n- $extor-\u003ebody, $extor-\u003ebackground\n\n    Return the link from the BODY tag's BACKGROUND attribute.\n\n- $extor-\u003escript\n\n    Return the link from the SCRIPT tag's SRC attribute.\n\n- $extor-\u003eschemes( SCHEME, \\[ SCHEME, ... \\] )\n\n    Return the links that use any of SCHEME. These must be absolute URLs (which\n    might include those converted to absolute URLs by specifying a\n    base). SCHEME is case-insensitive. You can specify more than one\n    scheme.\n\n    In list context it returns the links. In scalar context it returns\n    the count of the matching links.\n\n- $extor-\u003eabsolute\\_links\n\n    Returns the absolute URLs (which might include those converted to\n    absolute URLs by specifying a base).\n\n    In list context it returns the links. 
In scalar context it returns\n    the count of the matching links.\n\n- $extor-\u003erelative\\_links\n\n    Returns the relative URLs (which might exclude those converted to\n    absolute URLs by specifying a base or having a base in the document).\n\n    In list context it returns the links. In scalar context it returns\n    the count of the matching links.\n\n# TO DO\n\nThis module doesn't handle all of the HTML tags that might\nhave links.  If someone wants those, I'll add them, or you\ncan edit `%AUTO_METHODS` in the source.\n\n# CREDITS\n\nWill Crain who identified a problem with IMG links that had\na USEMAP attribute.\n\n# AUTHORS\n\nbrian d foy, `\u003cbdfoy@cpan.org\u003e`\n\nMaintained by Nigel Horne, `\u003cnjh at bandsman.co.uk\u003e`\n\n# COPYRIGHT AND LICENSE\n\nCopyright © 2004-2019, brian d foy \u003cbdfoy@cpan.org\u003e. All rights reserved.\n\nThis program is free software; you can redistribute it and/or modify\nit under the terms of the Artistic License 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnigelhorne%2Fhtml-simplelinkextor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnigelhorne%2Fhtml-simplelinkextor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnigelhorne%2Fhtml-simplelinkextor/lists"}