{"id":13585497,"url":"https://github.com/ReconInfoSec/web-traffic-generator","last_synced_at":"2025-04-07T10:30:56.837Z","repository":{"id":38779052,"uuid":"84889399","full_name":"ReconInfoSec/web-traffic-generator","owner":"ReconInfoSec","description":"A quick and dirty HTTP/S \"organic\" traffic generator. ","archived":false,"fork":false,"pushed_at":"2023-04-06T08:59:21.000Z","size":30,"stargazers_count":475,"open_issues_count":7,"forks_count":164,"subscribers_count":29,"default_branch":"master","last_synced_at":"2024-11-06T03:42:54.180Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ReconInfoSec.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-03-14T00:56:54.000Z","updated_at":"2024-10-21T10:37:22.000Z","dependencies_parsed_at":"2022-08-09T06:00:47.451Z","dependency_job_id":"cde23112-bf01-42a9-86b5-c5e9e051efdd","html_url":"https://github.com/ReconInfoSec/web-traffic-generator","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ReconInfoSec%2Fweb-traffic-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ReconInfoSec%2Fweb-traffic-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ReconInfoSec%2Fweb-traffic-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ReconInfoSec%2Fweb-traffic-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ReconInf
oSec","download_url":"https://codeload.github.com/ReconInfoSec/web-traffic-generator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247636025,"owners_count":20970847,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:04:58.723Z","updated_at":"2025-04-07T10:30:56.559Z","avatar_url":"https://github.com/ReconInfoSec.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# web-traffic-generator\n\nA quick and dirty HTTP/S \"organic\" traffic generator.\n\n## About\n\nJust a simple (poorly written) Python script that aimlessly \"browses\" the internet by starting at pre-defined `ROOT_URLS` and randomly \"clicking\" links on pages until the pre-defined `MAX_DEPTH` is met.\n\nI created this as a noise generator to use for an Incident Response / Network Defense simulation. The only issue is that my simulation environment uses multiple IDS/IPS/NGFW devices that will not pass and log simple TCPreplays of canned traffic. 
I needed the traffic to be as organic as possible, essentially mimicking real users browsing the web.\n\nTested on Ubuntu 14.04 \u0026 16.04 minimal, but should work on any system with Python installed.\n\n[![asciicast](https://asciinema.org/a/304683.png)](https://asciinema.org/a/304683)\n\n## How it works\n\nAbout as simple as it gets...\n\n**First, specify a few settings at the top of the script...**\n\n- `MAX_DEPTH = 10`, `MIN_DEPTH = 5` Starting from each root URL (e.g. www.yahoo.com), our generator will click to a depth randomly selected between `MIN_DEPTH` and `MAX_DEPTH`.\n\n*The interval between consecutive HTTP GET requests is chosen at random between the following two variables...*\n\n- `MIN_WAIT = 5` Wait a minimum of `5` seconds between requests... Be careful with making requests too quickly as that tends to piss off web servers.\n- `MAX_WAIT = 10` I think you get the point.\n\n- `DEBUG = False` A poor man's logger. Set to `True` for verbose realtime printing to console for debugging or development. I'll incorporate proper logging later on (maybe).\n\n- `ROOT_URLS = [url1,url2,url3]` The list of root URLs to start from when browsing. Randomly selected.\n\n- `blacklist = [\".gif\", \"intent/tweet\", \"badlink\", etc...]` A blacklist of strings that we check every link against. If the link contains any of the strings in this list, it's discarded. Useful to avoid things that are not traffic-generator friendly like \"Tweet this!\" links or links to image files.\n\n- `userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3).......'` You guessed it, the user-agent our headless browser hands over to the web server. You can probably leave it set to the default, but feel free to change it. I would strongly suggest using a common/valid one or else you'll likely get rate-limited quickly.\n\n## Dependencies\n\nThe only thing you need and *might* not have is `requests`. 
Grab it with\n\n```bash\nsudo pip install requests\n```\n\n## Usage\n\nCreate your config file first:\n\n```bash\ncp config.py.template config.py\n```\n\nRun the generator:\n\n```bash\npython gen.py\n```\n\n## Troubleshooting and debugging\n\nTo get more deets on what is happening under the hood, change the `DEBUG` variable in `config.py` from `False` to `True`. This provides the following output...\n\n```console\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nTraffic generator started\nDiving between 3 and 10 links deep into 489 different root URLs,\nWaiting between 5 and 10 seconds between requests.\nThis script will run indefinitely. Ctrl+C to stop.\nRandomly selecting one of 489 URLs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nRecursively browsing [https://arstechnica.com] ~~~ [depth = 7]\n  Requesting page...\n  Page size: 77.6KB\n  Data meter: 77.6KB\n  Good requests: 1\n  Bad reqeusts: 0\n  Scraping page for links\n  Found 171 valid links\n  Pausing for 7 seconds...\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nRecursively browsing [https://arstechnica.com/author/jon-brodkin/] ~~~ [depth = 6]\n  Requesting page...\n  Page size: 75.7KB\n  Data meter: 153.3KB\n  Good requests: 2\n  Bad reqeusts: 0\n  Scraping page for links\n  Found 168 valid links\n  Pausing for 9 seconds...\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nRecursively browsing [https://arstechnica.com/information-technology/2020/01/directv-races-to-decommission-broken-boeing-satellite-before-it-explodes/] ~~~ [depth = 5]\n  Requesting page...\n  Page size: 43.8KB\n  Data meter: 197.1KB\n  Good requests: 3\n  Bad reqeusts: 0\n  Scraping page for links\n  Found 32 valid links\n  Pausing for 8 seconds...\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nRecursively browsing [https://www.facebook.com/sharer.php?u=https%3A%2F%2Farstechnica.com%2F%3Fpost_type%3Dpost%26p%3D1647915] ~~~ [depth = 4]\n  Requesting page...\n  Page size: 64.2KB\n  Data meter: 261.2KB\n  Good requests: 4\n  Bad 
reqeusts: 0\n  Scraping page for links\n  Found 0 valid links\n  Stopping and blacklisting: no links\n```\n\nThe last URL attempted is a good example of what happens when a URL yields no usable links or throws an error: we add it to the `config.blacklist` array in memory and continue browsing. This prevents a known bad URL from returning to the queue.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FReconInfoSec%2Fweb-traffic-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FReconInfoSec%2Fweb-traffic-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FReconInfoSec%2Fweb-traffic-generator/lists"}