{"id":15374454,"url":"https://github.com/ivan-sincek/domain-extractor","last_synced_at":"2025-02-28T00:32:28.915Z","repository":{"id":107018916,"uuid":"286042240","full_name":"ivan-sincek/domain-extractor","owner":"ivan-sincek","description":"Extract valid or partially valid domain names and IPs from malicious or invalid URLs.","archived":true,"fork":false,"pushed_at":"2023-06-19T21:10:00.000Z","size":5,"stargazers_count":8,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-16T14:56:51.655Z","etag":null,"topics":["bug-bounty","computer-forensics","defensive-security","domain","domain-name","ethical-hacking","extractor","incident-response","ip","penetration-testing","python","red-team-engagement","security","threat-hunting","threat-intelligence","url"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ivan-sincek.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-08T12:49:57.000Z","updated_at":"2024-11-22T17:04:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"fdbe6837-5f40-4249-8b8e-b77d51f8fe5d","html_url":"https://github.com/ivan-sincek/domain-extractor","commit_stats":{"total_commits":1,"total_committers":1,"mean_commits":1.0,"dds":0.0,"last_synced_commit":"c6ea36480ddb293b0c117319ac19d016e9b48afb"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivan-sincek%2Fdomain-extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivan-sincek%2Fdomain-extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivan-sincek%2Fdomain-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivan-sincek%2Fdomain-extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ivan-sincek","download_url":"https://codeload.github.com/ivan-sincek/domain-extractor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241078872,"owners_count":19905949,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bug-bounty","computer-forensics","defensive-security","domain","domain-name","ethical-hacking","extractor","incident-response","ip","penetration-testing","python","red-team-engagement","security","threat-hunting","threat-intelligence","url"],"created_at":"2024-10-01T13:58:49.456Z","updated_at":"2025-02-28T00:32:28.907Z","avatar_url":"https://github.com/ivan-sincek.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Domain Extractor\n\nExtract valid or partially valid domain names and IPs from malicious or invalid URLs.\n\nKeep in mind that the algorithm is not perfect, so false positives are possible.\n\nTested on Kali Linux v2023.1 (64-bit).\n\nCheck the testing URLs [here](https://github.com/ivan-sincek/domain-extractor/blob/master/examples/urls.txt) and the results [here](https://github.com/ivan-sincek/domain-extractor/blob/master/examples/results.json).\n\nMade for educational purposes. I hope it will help!\n\nFuture plans:\n\n* detect IPv6 addresses.\n\n## How to Run\n\nOpen your preferred console from [/src/](https://github.com/ivan-sincek/domain-extractor/tree/master/src) and run the commands shown below.\n\nInstall the required packages:\n\n```fundamental\npip3 install -r requirements.txt\n```\n\nRun the script:\n\n```fundamental\npython3 domain_extractor.py\n```\n\n## Extract Results\n\nExtract hosts from the results:\n\n```bash\njq -r '.[].hosts[]' results.json | sort -u -f | tee -a hosts.txt\n```\n\nExtract URLs with valid or partially valid hosts from the results:\n\n```bash\njq -r '.[] | if (.hosts != []) then (.original) else (empty) end' results.json | sort -u -f | tee -a valid_urls.txt\n```\n\nExtract URLs with neither valid nor partially valid hosts from the results:\n\n```bash\njq -r '.[] | if (.hosts == []) then (.original) else (empty) end' results.json | sort -u -f | tee -a invalid_urls.txt\n```\n\n## Usage\n\n```fundamental\nDomain Extractor v3.0 ( github.com/ivan-sincek/domain-extractor )\n\nUsage:   python3 domain_extractor.py -f file               -o out\nExample: python3 domain_extractor.py -f malicious_urls.txt -o results.json\n\nDESCRIPTION\n    Extract valid or partially valid domain names and IPs from URLs\nFILE\n    File with URLs you want to extract data from\n    -f \u003cfile\u003e - malicious_urls.txt | etc.\nOUT\n    Output file\n    -o \u003cout\u003e - results.json | etc.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fivan-sincek%2Fdomain-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fivan-sincek%2Fdomain-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fivan-sincek%2Fdomain-extractor/lists"}