{"id":34028923,"url":"https://github.com/roys/cewler","last_synced_at":"2026-04-05T11:31:24.358Z","repository":{"id":65781473,"uuid":"599662806","full_name":"roys/cewler","owner":"roys","description":"CeWLeR - Custom Word List generator Redefined. CeWL alternative in Python, based on the Scrapy framework.","archived":false,"fork":false,"pushed_at":"2025-11-08T19:53:16.000Z","size":162,"stargazers_count":131,"open_issues_count":8,"forks_count":16,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-11-08T21:18:02.850Z","etag":null,"topics":["bugbounty","crawler","reconnaissance","spider"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roys.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-02-09T16:08:04.000Z","updated_at":"2025-11-08T19:53:19.000Z","dependencies_parsed_at":"2025-04-12T07:23:54.326Z","dependency_job_id":"44cfab6c-53dd-49b1-833b-19a80bcdbbd1","html_url":"https://github.com/roys/cewler","commit_stats":{"total_commits":22,"total_committers":1,"mean_commits":22.0,"dds":0.0,"last_synced_commit":"fbcd7e9f214c24e4fc59d1d509d305f1ab92edec"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/roys/cewler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roys%2Fcewler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roys%2Fcewler/tags","releases
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roys%2Fcewler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roys%2Fcewler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roys","download_url":"https://codeload.github.com/roys/cewler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roys%2Fcewler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31434624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T08:13:15.228Z","status":"ssl_error","status_checked_at":"2026-04-05T08:13:11.839Z","response_time":75,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bugbounty","crawler","reconnaissance","spider"],"created_at":"2025-12-13T17:22:45.636Z","updated_at":"2026-04-05T11:31:24.351Z","avatar_url":"https://github.com/roys.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CeWLeR - Custom Word List generator Redefined\n_CeWLeR_ crawls from a specified URL and collects words to create a custom wordlist.\n\nIt's a great tool for security testers and bug bounty hunters. The lists can be used for password cracking, subdomain enumeration, directory and file brute forcing, API endpoint discovery, etc. 
It's good to have an additional target-specific wordlist that is different from what everybody else uses.\n\n_CeWLeR_ was sort of originally inspired by the really nice tool [CeWL](https://github.com/digininja/CeWL). I had some challenges with _CeWL_ on a site I wanted a wordlist from, but without any Ruby experience I didn't know how to contribute or work around it. So instead I created a custom wordlist generator in Python to get the job done.\n\n## At a glance\n\u003cimg src=\"https://github.com/roys/cewler/blob/main/misc/demo.gif?raw=true\" width=\"800\" /\u003e\n\n## Features\n- Generates custom wordlists by scraping words from websites\n- A lot of options:\n  - Output to screen or file\n  - Can stay within the exact (sub)domain, also visit its child subdomains, or visit anything within the same top domain\n  - Can limit crawl depth (link hops from the start URL)\n  - Speed can be controlled\n  - Word length and casing can be configured\n  - JavaScript and CSS can be included\n  - Text can be extracted from PDF files (using [pypdf](https://pypi.org/project/pypdf/))\n  - Crawled URLs can be output to a separate file\n  - Scraped e-mail addresses can also be output to a separate file\n  - Custom HTTP headers can be added\n  - ++\n- Uses the excellent [Scrapy](https://scrapy.org) framework for scraping and the beautiful [rich](https://github.com/Textualize/rich) library for terminal output\n\n## Commands and options\n### Quick examples\n#### Output to file\nWill output to screen unless a file is specified.  \n`cewler --output wordlist.txt https://example.com`  \n\n#### Control speed and depth\nThe rate is specified in requests per second. Depth controls link hops from the start URL (not URL path depth). Please play nicely and don't break any rules.  \n`cewler --output wordlist.txt --rate 5 --depth 2 https://example.com`  \n\n#### Change User-Agent header\nThe default User-Agent is that of a common browser.  
\n`cewler --output wordlist.txt --user-agent \"Cewler\" https://example.com`  \n\n#### Add custom HTTP headers\nIt's possible to specify custom HTTP headers for the requests. Multiple headers can be specified by repeating `-H`.  \n`cewler -H \"X-Bounty: d14c14ec\" https://httpbin.org/headers`  \n\n#### Control casing, word length and characters\nUnless specified otherwise, words keep their original (mixed) casing and must be at least 5 characters long.  \n`cewler --output wordlist.txt --lowercase --min-word-length 2 --without-numbers https://example.com`  \n\n#### Visit all domains - including parent, children and siblings\nThe default is to just visit exactly the same (sub)domain as specified.  \n`cewler --output wordlist.txt -s all https://example.com`  \n\n#### Visit same (sub)domain + any belonging child subdomains\n`cewler --output wordlist.txt -s children https://example.com`  \n\n#### Include JavaScript and/or CSS\nIf you want, you can include links from `\u003cscript\u003e` and `\u003clink\u003e` tags, plus words from within JavaScript and CSS.  \n`cewler --output wordlist.txt --include-js --include-css https://example.com`  \n\n#### Include PDF files\nIt's easy to extract text from PDF files as well.    \n`cewler --output wordlist.txt --include-pdf https://example.com`  \n\n#### Output visited URLs to file\nIt's also possible to store the crawled URLs in a file.  \n`cewler --output wordlist.txt --output-urls urls.txt https://example.com`  \n\n#### Output e-mails to file\nIt's also possible to store the scraped e-mail addresses in a separate file (they are always added to the wordlist).  \n`cewler --output wordlist.txt --output-emails emails.txt https://example.com`  \n\n#### HTTP proxy\nYou can specify an HTTP proxy.  \n`cewler --proxy http://localhost:8080 https://example.com`  \n\n#### Ninja trick 🥷\nIf it just takes too long to crawl a site, you can press `ctrl + c` once(!) 
and wait while the spider finishes the current requests; whatever words have been found so far are then stored in the output file.\n\n### All options\n```\ncewler -h\nusage: cewler [-h] [-d DEPTH] [-css] [-js] [-pdf] [-l] [-m MIN_WORD_LENGTH] [-o OUTPUT] [-oe OUTPUT_EMAILS]\n              [-ou OUTPUT_URLS] [-r RATE] [-s {all,children,exact}] [--stream] [-u USER_AGENT] [-H HEADER] [-p PROXY]\n              [-v] [-w]\n              url\n\nCeWLeR - Custom Word List generator Redefined\n\npositional arguments:\n  url                   URL to start crawling from\n\noptions:\n  -h, --help            show this help message and exit\n  -d, --depth DEPTH     max link hops from start URL, 0 for unlimited (default: 2)\n  -css, --include-css   include CSS from external files and \u003cstyle\u003e tags\n  -js, --include-js     include JavaScript from external files and \u003cscript\u003e tags\n  -pdf, --include-pdf   include text from PDF files\n  -l, --lowercase       lowercase all parsed words\n  -m, --min-word-length MIN_WORD_LENGTH\n                        minimum word length to include (default: 5)\n  -o, --output OUTPUT   file were to stream and store wordlist instead of screen (default: screen)\n  -oe, --output-emails OUTPUT_EMAILS\n                        file were to stream and store e-mail addresses found (they will always be outputted in the\n                        wordlist)\n  -ou, --output-urls OUTPUT_URLS\n                        file were to stream and store URLs visited (default: not outputted)\n  -r, --rate RATE       requests per second (default: 20)\n  -s, --subdomain_strategy {all,children,exact}\n                        allow crawling [all] domains, including children and siblings, only [exact] the same (sub)domain\n                        (default), or same domain and any belonging [children]\n  --stream              writes to file after each request (may produce duplicates because of threading) (default: false)\n  -u, --user-agent USER_AGENT\n          
              User-Agent header to send (default: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36\n                        (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36)\n  -H, --header HEADER   custom header in 'Name: Value' format (can be used multiple times, overrides -u if 'User-Agent'\n                        is specified)\n  -p, --proxy PROXY     proxy URL ([http(s)://[user:pass@]]host[:port])\n  -v, --verbose         a bit more detailed output\n  -w, --without-numbers\n                        ignore words that are numbers or contain numbers\n```\n\n### Subdomain strategies\n\nExample URL to scan `https://sub.example.com`:\n\n| Host | `-s exact`* | `-s children` | `-s all` |\n| --- | --- | --- | --- |\n| `sub.example.com` | ✅ | ✅ | ✅ |\n| `child.sub.example.com` | ❌ | ✅ | ✅ |\n| `sibling.example.com` | ❌ | ❌ | ✅ |\n| `example.com` | ❌ | ❌ | ✅ |\n\n\\* Default strategy\n\n### Digging into the code\nIf you want to do some tweaking yourself, you can probably find what you want in [src/cewler/constants.py](src/cewler/constants.py) and [src/cewler/spider.py](src/cewler/spider.py).\n\n## Installation and upgrade\n### Alternative 1 - installing from PyPI\nPackage homepage: https://pypi.org/project/cewler/\n\n`python3 -m pip install cewler`\n\n#### Upgrade\n`python3 -m pip install cewler --upgrade`\n\n### Alternative 2 - installing from GitHub\n#### 1. Clone repository\n```\ngit clone https://github.com/roys/cewler.git --depth 1\ncd cewler\n```\n\n#### 2. Create virtual environment (optional, but recommended)\nThis keeps dependencies isolated and avoids affecting your system Python.\n```\npython3 -m venv venv\nsource venv/bin/activate\n```\n\n#### 3. Install cewler in editable mode\n```\npython3 -m pip install -e .\n```\n\nThis installs cewler and all its dependencies, creating the `cewler` command that you can run from anywhere (while the venv is active). 
Any changes you make to the source code will be immediately reflected when you run the command.\n\n#### Upgrade\n`git pull`\n\n## Docker\nTo run CeWLeR with Docker, first build the Docker image:  \n`docker build . -t cewler`\n\nAfter the image finishes building, you can run CeWLeR like this to store the output in the current folder:  \n`docker run -v \"$(pwd):/app\" cewler --output /app/wordlist.txt --depth 1 https://blog.roysolberg.com`\n\n## Pronunciation\n_CeWLeR_ is pronounced _\"cooler\"_.\n\n## Contributors\nA huge thank you to everyone who has contributed to making CeWLeR better! Your contributions, big and small, make a significant difference.\n\nContributions of any kind are welcome and recognized. From bug reports to coding, documentation to design, every effort is appreciated:\n - [Chris Dale](https://github.com/ChrisAD) - for testing, bug reporting and fixing\n - [Mathies Svarrer-Lanthén](https://github.com/seihtam) - for adding support for PDF extraction\n - [webhak](https://github.com/webhak) - for adding Docker support\n\n## License\n[Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froys%2Fcewler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froys%2Fcewler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froys%2Fcewler/lists"}