{"id":13586219,"url":"https://github.com/wisepythagoras/website-fingerprinting","last_synced_at":"2025-10-27T11:30:40.628Z","repository":{"id":53916716,"uuid":"107194806","full_name":"wisepythagoras/website-fingerprinting","owner":"wisepythagoras","description":"Deanonymizing Tor or VPN users with website fingerprinting and machine learning.","archived":false,"fork":false,"pushed_at":"2024-08-15T23:03:01.000Z","size":197,"stargazers_count":91,"open_issues_count":1,"forks_count":24,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-04T03:31:36.335Z","etag":null,"topics":["anonymous","anonymous-users","capture","classifier","deanonymization","fingerprinting","fingerprinting-algorithm","fingerprinting-methods","goldberg","machine-learning","packets","privacy","scikit-learn","tor","traffic","user-privacy","wang","website-fingerprinting"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wisepythagoras.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-16T23:45:02.000Z","updated_at":"2025-01-31T13:13:52.000Z","dependencies_parsed_at":"2024-01-07T22:49:34.378Z","dependency_job_id":"5c65405c-d7ec-46e8-93c1-886c0ce94d25","html_url":"https://github.com/wisepythagoras/website-fingerprinting","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wisepythagoras%2Fwebsite-fingerprinting","tags_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/wisepythagoras%2Fwebsite-fingerprinting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wisepythagoras%2Fwebsite-fingerprinting/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wisepythagoras%2Fwebsite-fingerprinting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wisepythagoras","download_url":"https://codeload.github.com/wisepythagoras/website-fingerprinting/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238486743,"owners_count":19480470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymous","anonymous-users","capture","classifier","deanonymization","fingerprinting","fingerprinting-algorithm","fingerprinting-methods","goldberg","machine-learning","packets","privacy","scikit-learn","tor","traffic","user-privacy","wang","website-fingerprinting"],"created_at":"2024-08-01T15:05:24.485Z","updated_at":"2025-10-27T11:30:35.342Z","avatar_url":"https://github.com/wisepythagoras.png","language":"Python","funding_links":[],"categories":["Python","\u003ca id=\"d62a971d37c69db9f3b9187318c3921a\"\u003e\u003c/a\u003e工具"],"sub_categories":["\u003ca id=\"8ea8f890cf767c3801b5e7951fca3570\"\u003e\u003c/a\u003e公网访问局域网"],"readme":"# Website Fingerprinting\n\nWebsite fingerprinting is a method of Tor or VPN packet inspection that aims to collect enough features and information from individual sessions that could aid in identifying the activity of anonymized users.\n\nContributions 
and bug fixes are welcome.\n\n## Introduction\n\nFor this experiment, Tor is required. It can be installed by running the following commands:\n\n\n``` bash\n# For Debian or Ubuntu\nsudo apt install tor lynx\n\n# For Fedora\nsudo yum install tor lynx\n\n# For ArchLinux\nsudo pacman -S tor torsocks lynx\n```\n\nBy installing Tor we also get a program called `torsocks`; this program will be used to redirect traffic of common programs through the Tor network. For example, it can be run as follows:\n\n``` bash\n# SSH through Tor.\ntorsocks ssh user@example.com\n\n# curl through Tor.\ntorsocks curl -L http://httpbin.org/ip\n\n# Etc...\n```\n\n## Required Python 3 Modules\n\nFirst, create and activate a virtual environment:\n\n``` bash\ncd path/to/website-fingerprinting\npython -m venv $PWD/venv\nsource venv/bin/activate\n```\n\nAnd then install all the dependencies:\n\n``` bash\npip install -r requirements.txt\n```\n\n## Data Collection\n\nFor the data collection process two terminal windows in a side-by-side orientation are required, as this process is fairly manual. Also, it's advised to collect the fingerprints in a VM, in order to avoid capturing any unintended traffic. To capture traffic, run the [capture.sh](pcaps/capture.sh) script in one of the terminals:\n\n``` bash\n./pcaps/capture.sh duckduckgo.com\n```\n\nOnce the listener is capturing traffic, run the following in the other terminal:\n\n``` bash\ntorsocks lynx https://duckduckgo.com\n```\n\nOnce the website has finished loading, the capture process needs to be killed, along with the browser session (by hitting the `q` key twice). 
The process should be repeated several times for each web page so that there is enough data.\n\n## Machine Learning\n\n[Scikit Learn](http://scikit-learn.org/stable/) was used to write a [k Nearest Neighbors](http://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbors-classification) classifier that reads the pcap files specified in the [config.json](config.json) file. `config.json` can be changed according to which webpages were targeted for training. The training script is [gather_and_train.py](gather_and_train.py).\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://scikit-learn.org/stable/_images/sphx_glr_plot_classification_001.png\" alt=\"Scikit Learn kNN\" /\u003e\n\u003c/p\u003e\n\n## Classifying Unknown Traffic\n\n```bash\n# python predict.py [packet to classify]\n  python predict.py xyz.pcap\n```\nOnce the training is done, and the `classifier-nb.dmp` is created, the [predict.py](predict.py) script can be run with the pcap file as the sole argument. The script will load the classifier and attempt to identify which web page the traffic originated from.\n\nIt is worth noting that from each sample only the first 40 packets will be used to train a usable model and to run through the resulting classifier.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"graphs/graph-screenshot.png\" alt=\"Visualizing the patterns\" /\u003e\n\u003c/p\u003e\n\nAs the screenshot above shows, the packet patterns of each website are clearly distinguishable on a 3D scale. The classifier visualizes the data in a similar way and gives us the most accurate result.\n\nAn interactive version of this graph can be found in the [graphs](graphs) folder.\n\n## Limitations and Disclaimers\n\nThis setup was created in order to research the topic of website fingerprinting and how easy it is to attempt to deanonymize users over Tor or VPNs. 
Traffic was captured and identified in a private setting and for purely academic purposes; the use of this source code is intended for those reasons only.\n\nReal-world traffic is never \"clean\", although this research assumed it was for simplicity. However, if an entity has enough resources, the desired anonymized traffic can be isolated and fed into this simple classifier. This means that it is entirely possible to use a method like this to compromise anonymized users.\n\n## References\n\nThis research was inspired by the following work:\n\n1. Wang, T. and Goldberg, I. (2017). Website Fingerprinting. [online] Cse.ust.hk. Available at: http://web.archive.org/web/*/https://www.cse.ust.hk/~taow/wf/*.\n2. Wang, T. and Goldberg, I. (2013). Improved Website Fingerprinting on Tor. Cheriton School of Computer Science. Available at: http://www.cypherpunks.ca/~iang/pubs/webfingerprint-wpes.pdf\n3. Wang, T. (2015). Website Fingerprinting: Attacks and Defenses. University of Waterloo. Available at: https://uwspace.uwaterloo.ca/bitstream/handle/10012/10123/Wang_Tao.pdf\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwisepythagoras%2Fwebsite-fingerprinting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwisepythagoras%2Fwebsite-fingerprinting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwisepythagoras%2Fwebsite-fingerprinting/lists"}