# HTML HREF Lookup Documentation
`parsing.sh` is a Bash script that parses an HTML file, extracts the URLs from the `href` attributes in its body section, performs a DNS lookup on each URL, and saves the results to an output file.

## Usage
![parsing.sh usage](image.png)
```shell
./parsing.sh [TARGET_URL]
```
Replace `[TARGET_URL]` with the URL of the HTML page you want to parse.

## Variables
- `DEFAULT_OUTPUT_FILE`: The default path of the output file.
- `DEFAULT_INDEX_FILE`: The default name of the index file.
- `TEMP_FOLDER_NAME`: The name of the temporary folder used while the script runs.
- `TARGET_URL`: The URL of the HTML file to parse, passed as the script's first argument.

## Functions
- `filter_href_html`: Extracts the `href` attribute values from the `<body>` section of the HTML.

## Workflow
1. The script creates a temporary folder and changes into it.
2. It downloads the HTML file from the target URL.
3. It extracts the URLs from the HTML file, performs a DNS lookup on each one, and stores the results in an array.
4. It writes the array of results to the output file.
5. It returns to the parent directory and removes the temporary folder.

## Output
The script produces a list of the URLs found in the HTML file, together with the DNS lookup result for each one, and saves it to the file specified by `DEFAULT_OUTPUT_FILE`.

## Error Handling
If the script cannot create the temporary folder or change into it, it exits with status code 1.
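The script itself is not reproduced in this README. What follows is a minimal sketch of workflow steps 1, 2 and 5 together with the exit-status-1 behaviour described under Error Handling. It assumes `curl` as the downloader. The helper names `enter_temp_folder`, `download_index` and `leave_temp_folder`, the `PARENT_DIR` variable and the default values are hypothetical and are not taken from `parsing.sh`.

```shell
# Sketch only (Bash): default values and helper names are hypothetical,
# and curl is assumed as the downloader.
TEMP_FOLDER_NAME="${TEMP_FOLDER_NAME:-html-href-lookup.tmp}"
DEFAULT_INDEX_FILE="${DEFAULT_INDEX_FILE:-index.html}"

# Step 1: create the temporary folder and change into it; exit with
# status 1 if either operation fails.
enter_temp_folder() {
  PARENT_DIR=$PWD   # remember where to return to in step 5
  mkdir -p -- "$TEMP_FOLDER_NAME" || exit 1
  cd -- "$TEMP_FOLDER_NAME" || exit 1
}

# Step 2: download the page at URL ($1) into $DEFAULT_INDEX_FILE.
# -f fails on HTTP errors, -sS hides progress but keeps error messages,
# -L follows redirects.
download_index() {
  curl -fsSL -o "$DEFAULT_INDEX_FILE" "$1"
}

# Step 5: go back to the parent directory and delete the temporary folder.
# ${VAR:?} aborts with an error instead of running rm -rf on an empty name.
leave_temp_folder() {
  cd -- "$PARENT_DIR" || exit 1
  rm -rf -- "${TEMP_FOLDER_NAME:?}"
}
```

A run would call `enter_temp_folder`, then `download_index "$TARGET_URL"`, then the extraction and lookup steps, and finally `leave_temp_folder`. Recording `PARENT_DIR` instead of running `cd ..` ensures the script returns to the starting directory even if `TEMP_FOLDER_NAME` contains a slash.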
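`filter_href_html` is only named and described above; its body is not shown. One way such a filter could be written with `sed` and `grep` is sketched below. It assumes lowercase, double-quoted `href` attributes and `<body>`/`</body>` tags on separate lines. The `extract_hosts` helper is hypothetical. It reduces absolute `http(s)` URLs to hostnames, because DNS resolves names rather than full URLs, and it drops relative links.

```shell
# Sketch only (Bash): assumes lowercase, double-quoted href attributes and
# <body> / </body> tags that sit on different lines.

# filter_href_html FILE: print the unique href values inside <body> of FILE.
filter_href_html() {
  local html_file="$1"
  # Keep only the lines from the one containing <body through </body>.
  sed -n '/<body/,/<\/body>/p' "$html_file" |
    # Pull out every href="..." occurrence, one per line.
    grep -oE 'href="[^"]*"' |
    # Strip the leading href=" and the trailing quote.
    sed -E 's/^href="//; s/"$//' |
    # Byte-order sort so the result does not depend on the locale.
    LC_ALL=C sort -u
}

# extract_hosts (hypothetical helper): read URLs on stdin and print the
# unique hostnames of the absolute http(s) ones; relative links are dropped.
extract_hosts() {
  grep -E '^https?://' |
    # Remove the scheme, then cut at the first /, :, ? or #
    # (start of path, port, query or fragment).
    sed -E 's|^https?://||; s|[/:?#].*$||' |
    LC_ALL=C sort -u
}
```

For example, `filter_href_html index.html | extract_hosts` prints one hostname per line. Regular expressions keep the sketch dependency-free. A real HTML parser would handle minified or unusual markup more reliably.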
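Workflow steps 3 and 4 look up each entry, keep the results in an array, and then write the array to the output file. The following minimal sketch shows one way to do that. It assumes the `host` utility (from BIND/dnsutils) performs the lookups and that hostnames arrive one per line on stdin. `lookup_hosts` and the `./output.txt` default are hypothetical.

```shell
# Sketch only (Bash): assumes `host` is installed and that hostnames
# arrive one per line on stdin.

# Hypothetical default; the real value is defined in parsing.sh.
DEFAULT_OUTPUT_FILE="${DEFAULT_OUTPUT_FILE:-./output.txt}"

# lookup_hosts [OUTPUT_FILE]: resolve each hostname read from stdin, collect
# the results in an array, then write the whole array to OUTPUT_FILE at once.
lookup_hosts() {
  local output_file="${1:-$DEFAULT_OUTPUT_FILE}"
  local -a results=()
  local hostname answer
  while IFS= read -r hostname; do
    [ -n "$hostname" ] || continue   # skip blank lines
    # Capture stdout and stderr so that failed lookups are recorded too.
    answer=$(host "$hostname" 2>&1)
    results+=("$hostname" "$answer" "")   # name, lookup result, blank separator
  done
  if [ "${#results[@]}" -gt 0 ]; then
    printf '%s\n' "${results[@]}" > "$output_file"
  else
    : > "$output_file"   # no hostnames: leave an empty file
  fi
}
```

Feeding it hostnames, for example `printf 'www.example.com\n' | lookup_hosts "$DEFAULT_OUTPUT_FILE"`, writes each name followed by the raw `host` output and a blank line. Collecting everything in an array first matches the workflow described above: the output file is written once, after all lookups finish.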