{"id":22960246,"url":"https://github.com/stringmanolo/wtm","last_synced_at":"2025-04-02T02:43:15.317Z","repository":{"id":245322670,"uuid":"817863225","full_name":"StringManolo/wtm","owner":"StringManolo","description":"WebTextMiner extracts text from webs to create wordlists usefull for brute-force attacks","archived":false,"fork":false,"pushed_at":"2024-06-20T18:03:20.000Z","size":18,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-24T16:21:25.879Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StringManolo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-20T15:40:01.000Z","updated_at":"2024-06-20T18:03:23.000Z","dependencies_parsed_at":"2024-06-21T11:56:33.815Z","dependency_job_id":"6a884dc7-a462-4a8d-8b70-be421a931e75","html_url":"https://github.com/StringManolo/wtm","commit_stats":null,"previous_names":["stringmanolo/wtm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StringManolo%2Fwtm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StringManolo%2Fwtm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StringManolo%2Fwtm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StringManolo%2Fwtm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StringManol
o","download_url":"https://codeload.github.com/StringManolo/wtm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246746879,"owners_count":20827061,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-14T18:31:42.538Z","updated_at":"2025-04-02T02:43:15.297Z","avatar_url":"https://github.com/StringManolo.png","language":"Shell","readme":"# wtm\nWebTextMiner extracts text from web pages to create wordlists useful for brute-force attacks\n\n### What is this program?\n\n- WebTextMiner creates wordlists from web pages to be used in brute-force attacks.  \n- These wordlists can be used to enumerate passwords, usernames, etc.\n- You probably want to feed these wordlists to another tool like Hydra, Hashcat, John the Ripper, etc.\n\n### Download\n\n```bash\ngit clone https://github.com/stringmanolo/wtm\ncd wtm\n```\n\n### Install\n\n##### In Linux and Proot-Distro\n```bash\nchmod +x wtm.sh\nmv wtm.sh /bin/wtm\n```\n\n##### In Termux\n```bash\nmv wtm.sh /data/data/com.termux/files/usr/bin/wtm\n```\n\n### Usage\n```bash\nwtm -u url -f filename.txt -d depth_level\n```\n\n\u003e filename.txt can be reused. If it already contains words, passwords, or usernames, the new words will be added to it without duplicates.\n\n\n### Example\n```bash\nwtm -u https://example.com -f passwords.txt -d 2\n\ncat passwords.txt\n# cat debug_urls.txt\n```\n\n\u003e Be careful with the depth level. 
The bigger the depth level, the more words and URLs will be extracted.\n\u003e You can scan millions of URLs with depth level 3+ or 4+, depending on how many URLs the web page has.\n\n\n### Alternative Install\n\n##### You can also copy the script and paste it into your text editor if you would rather not use git clone\n```bash\n#!/bin/sh\n\ndebug_urls() {\n    if [ -f \"debug_urls.txt\" ]; then\n        rm -f debug_urls.txt\n    fi\n}\n\nextract_words_and_urls() {\n    local url=\"$1\"\n    local wordlist_file=\"$2\"\n    local temp_file=$(mktemp)\n\n    local user_agent=\"Mozilla/5.0 (Linux; Android 10; SM-G960F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.181 Mobile Safari/537.36\"\n\n    curl -s -L -A \"$user_agent\" \"$url\" -o \"$temp_file\"\n    if [ $? -ne 0 ]; then\n        echo \"Error: Failed to download the webpage $url.\"\n        rm -f \"$temp_file\"\n        exit 1\n    fi\n\n    if [ -f \"$temp_file\" ]; then\n        # Extract words and URLs\n        tr -s '[:space:]' '\\n' \u003c \"$temp_file\" | grep -oE '\\b\\w+\\b' | sort -u \u003e\u003e \"$wordlist_file\"\n        # Deduplicate the whole wordlist, including words already in the file\n        sort -u -o \"$wordlist_file\" \"$wordlist_file\"\n        sed -nE 's/.*((https?|ftp|file):\\/\\/[^\"]+).*/\\1/p' \"$temp_file\" | sort -u \u003e\u003e debug_urls.txt\n        rm \"$temp_file\"\n    else\n        echo \"Error: Temporary file not found.\"\n        exit 1\n    fi\n}\n\nextract_urls_recursively() {\n    local urls=\"$1\"\n    local depth=\"$2\"\n    local current_depth=1\n\n    while [ \"$current_depth\" -le \"$depth\" ]; do\n        echo \"Depth: $current_depth\"\n\n        new_urls=\"\"\n        for url in $urls; do\n            extract_words_and_urls \"$url\" \"$wordlist_file\"\n            extracted_urls=$(sed -nE 's/.*((https?|ftp|file):\\/\\/[^\"]+).*/\\1/p' debug_urls.txt | sort -u)\n            new_urls=\"$new_urls $extracted_urls\"\n        done\n\n        urls=$(echo \"$new_urls\" | sort -u)\n        current_depth=$((current_depth + 1))\n    done\n\n    echo \"A total of $(wc -l \u003c \"$wordlist_file\") unique words 
have been extracted.\"\n}\n\nmain() {\n    local url=\"\"\n    local wordlist_file=\"wordlist.txt\"\n    local depth=1\n\n    debug_urls  # Clear any existing debug file\n\n    while [ $# -gt 0 ]; do\n        case \"$1\" in\n            -u|--url)\n                url=\"$2\"\n                shift 2\n                ;;\n            -f|--file)\n                wordlist_file=\"$2\"\n                shift 2\n                ;;\n            -d|--depth)\n                depth=\"$2\"\n                shift 2\n                ;;\n            -h|--help)\n                echo \"Usage: $0 -u|--url URL [-f|--file FILE] [-d|--depth DEPTH]\"\n                exit 0\n                ;;\n            *)\n                echo \"Unrecognized argument: $1\"\n                exit 1\n                ;;\n        esac\n    done\n\n    if [ -z \"$url\" ]; then\n        echo \"Error: You must provide a URL using -u or --url.\"\n        exit 1\n    fi\n\n    echo \"Extracting words and URLs recursively with depth $depth from $url...\"\n\n    extract_urls_recursively \"$url\" \"$depth\"\n}\n\nmain \"$@\"\n```\n\n##### Remember to make the script executable\n```bash\nchmod +x wtm.sh\n\n./wtm.sh -h\n\n# Move it to your bin folder as wtm to use it as a command from any folder\n\n# Termux:\n# mv wtm.sh /data/data/com.termux/files/usr/bin/wtm\n\n# Linux and Proot-Distro:\n# mv wtm.sh /bin/wtm\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstringmanolo%2Fwtm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstringmanolo%2Fwtm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstringmanolo%2Fwtm/lists"}