{"id":19174634,"url":"https://github.com/equalitie/bothound","last_synced_at":"2025-10-28T03:32:23.250Z","repository":{"id":93032727,"uuid":"43558572","full_name":"equalitie/BotHound","owner":"equalitie","description":"Automatic attack detector and botnet classifier","archived":false,"fork":false,"pushed_at":"2017-01-10T13:36:56.000Z","size":21358,"stargazers_count":28,"open_issues_count":1,"forks_count":9,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-04-20T01:33:01.247Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/equalitie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-10-02T14:51:28.000Z","updated_at":"2024-08-12T19:19:31.000Z","dependencies_parsed_at":"2023-03-13T17:22:53.867Z","dependency_job_id":null,"html_url":"https://github.com/equalitie/BotHound","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2FBotHound","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2FBotHound/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2FBotHound/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2FBotHound/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/equalitie","download_url":"https://codeload.github.com/equalitie/
BotHound/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252931815,"owners_count":21827171,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T10:18:36.375Z","updated_at":"2025-10-28T03:32:23.183Z","avatar_url":"https://github.com/equalitie.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"BotHound\n=======\n\nAutomatic DDoS attack detector and botnet classifier\n-----------\n\n# Description\nBothound is an automatic DDoS attack detector and botnet classifier. Its purpose is to create a historical classification of the attacks with detailed information regarding the attackers (country-based, time-based, etc.).\n\nBothound's role is to detect and classify the attacks (incidents), using the anomaly-detection and machine-learning tool [Grey Memory](https://github.com/greymemory). BotHound attack classifier reacts to anomalous detectors and starts gathering live information from the Deflect network. It computes a behaviour vector for all visitors of the network when Grey Memory detects an anomaly. BotHound groups the client IPs in different groups (clusters) using unsupervised machine learning algorithms in order to profile the group of malicious visitors. It uses different measures to tag the groups which are more likely to be attackers. After that, it feeds all the behaviour vectors of bot IPs into a classifier to detect if the botnet has a history of attacking the [Deflect network](https://wiki.deflect.ca/wiki/Main_Page) in the past. 
It finally generates a report based on its conclusions for Deflect's sysops and gets feedback to improve its classification performance.\n\n# Installation\n\n## Python\nPython 2.7 should be installed\n\n## Libraries\n\nFirst add the Jessie backports repository to `/etc/apt/sources.list`:\n\n    deb http://http.debian.net/debian jessie-backports main\n\nand run `apt-get update`.\n\nThe following libraries should be installed:  \n\n```  \n[sudo] apt-get install emacs python libmysqlclient-dev build-essential python-dev python-numpy python-setuptools python-scipy libatlas-dev python-matplotlib python-mysqldb python-geoip libffi-dev python-dnspython libssl-dev python-zmq   \n[sudo] apt-get install python-pip\n[sudo] pip install -U scikit-learn  \n[sudo] apt-get install python-twisted\n[sudo] apt-get install git  \n[sudo] apt-get install openjdk-8-jre openjdk-8-jdk\n[sudo] apt-get install mysql-server\n[sudo] apt-get install ant\n ```  \nWhen installing `openjdk-8-jre` and `openjdk-8-jdk`, make sure that version 7 is not installed.\n \n \n## Adminer\nInstall [Adminer](https://www.adminer.org/) interface  \n\n## Jupyter\n* First make sure that you install Jupyter locally because nbextension has a bug and is only able to install if there is a local installation.  \n``` \nsudo pip install jupyter_contrib_core\nsudo pip install jupyter --user\n```\n\n* Install Jupyter system-wide  \n```\nsudo pip install jupyter\n```\n\n* Install Jupyter nbextensions  \n```\npip install https://github.com/ipython-contrib/IPython-notebook-extensions/archive/master.zip\n```\n\n* The file is erroneously copied in the local folder. Copy the files to the system-wide folder.  
\n```\nsudo cp -R /root/.local/share/jupyter /usr/local/share/\nsudo chmod -R a+r /usr/local/share/jupyter \n```\n\n## Get Source Code \n```\ngit clone https://github.com/equalitie/bothound  \ncd bothound/\n```\n\n## Install Packages\nInstall required packages from requirements.txt:  \n\n```\npip install -r requirements.txt  \n```\n\n## Configuration \nYou need to create a configuration file bothound.yaml\n\n* Make a copy of the [example configuration file](conf/rename_me_to_bothound.yaml)  \n* Rename the copy to bothound.yaml  \n* Update the file with your credentials.  \n\nbothound.yaml description:  \n\n* encryption\\_passphrase - the password for IP encryption  \n* hash\\_passphrase - the salt for the hash function used for the IP hash stored in the database  \n* sniffles section - not supported yet  \n* elastic\\_db - Elasticsearch node credentials  \n\t\n## Greymemory installation\n* Build greymemory using the following script:  \n```\nsh build_greymemory.sh\n```  \n\nThe script will get the source code from GitHub and build it using ant.\nMake sure the build is successful and that the subfolder \"greymemory/greymemory.AnomalyDetector/dist\" contains greymemory.AnomalyDetector.jar.\n* Rename greymemory/greymemory.AnomalyDetector/rename\\_me\\_to\\_AnomalyDetector.config to AnomalyDetector.config  \n\n## Greymemory configuration\nGreymemory monitors the rate of successful HTTP requests for every protected host.\nTo calculate this rate, Greymemory sends two requests to Elasticsearch: 1) to get the total number of successful HTTP requests, and 2) to get the total number of failed HTTP requests. The rate is calculated every 2 minutes by default. Every time a new rate is calculated, Greymemory calculates the corresponding anomaly rate for the new value. If this anomaly rate is greater than a threshold, an anomaly is reported to Bothound. 
Bothound creates a new incident for the corresponding host.\n\nThe file greymemory/greymemory.AnomalyDetector/AnomalyDetector.config contains the Greymemory configuration:  \n\n* threshold - the threshold value of the anomaly rate (default is 0.95)  \n* sample\\_rate\\_in\\_minutes - the sampling rate (default is 2 minutes)  \n* es\\_host, es\\_port, es\\_user, es\\_password - Elasticsearch credentials  \n* mail\\_alert1, mail\\_alert2,... - emails for anomaly notifications  \n* target\\_host1, target\\_host2=... - the hosts being monitored. Don't use \"www.\"   \n\n# Initialization\n\n## Creating a database\n* Make sure the MySQL server is up and running.  \n* To create a database, you just need to launch Bothound:  \n```\npython src/bothound.py  \n```\nMake sure the database and the tables are created successfully.  \n* Create a test incident using the following SQL:  \n```\nINSERT INTO incidents (start,stop,process,target) VALUES ('2016-06-01', '2016-06-02', 1, 'mysite.com');\n```\n* Make sure Bothound is processing data from the Elasticsearch server. You should see the following message if the test incident is processed correctly: \"Incident 1 processed\"\n\n## Establish ZMQ relay\nThe ZMQ relay script provides a communication channel between Greymemory and Bothound. Bothound uses a TCP socket to connect to the relay. The relay uses encrypted ZMQ messages to communicate with Bothound. This design makes it possible to scale the system and run multiple instances of Greymemory.\nTo run the relay:  \n```\npython src/util/socket2zmq.py\n```  \nMake sure you see the message \"Listening on port ... , relayint to ZMQ port ...\"\n\n## Test Greymemory\nTo run Greymemory:  \n```\ncd ./greymemory/greymemory.AnomalyDetector\nsh anomaly_detector.sh\n```  \nMake sure you see a test anomaly message in the Bothound console: \"New incident : test_host, ...\"\n\n# Running\n\n## Running Bothound\nThe following scripts are provided to simplify the launch procedure. 
Launch in any order:  \n\n* bothound.sh  \n* greymemory.sh  \n* relay.sh  \n\n## Running Jupyter\n1. Make sure the Jupyter instance is running on the Bothound server. \nTo run the instance, launch this command:  \n```\njupyter notebook --no-browser --port=8889\n```\n2. Establish a tunnel to the Jupyter instance from your local computer:  \n```\nssh -N -L 8889:127.0.0.1:8889 user@server\n```\n3. Open the local URL [http://localhost:8889/](http://localhost:8889/).\nMake sure you see a list of files and folders.\n\n# Definitions\n* Session - an IP and a vector of feature values recorded and calculated during a period of the IP activity  \n* Feature - an individual measurable property of a session   \n* Incident - a set of sessions recorded during a time interval  \n* Attack - a subset of sessions in an incident which was labeled as an attack  \n* Botnet - a list of IPs that participated in similar attacks   \n\n# Incidents \nIncidents are created manually using the Adminer interface. In the future, incidents will be created automatically based on messages from the Grey Memory anomaly detector.\n\n## Creating incidents \n* Insert a new record into the \"incidents\" table. \n* Make sure you filled at least the \"start\", \"stop\" and \"target\" fields.\n* The target URL should not contain \"www.\" at the beginning. If you have multiple targets, you can add them separated by a comma.\n* Set \"process\" field to 1.\n\n## Creating incidents from nginx logs\n* Insert a new record into the \"incidents\" table. \n* Make sure you filled \"file_name\" with the full path to a nginx log file.\n* Set \"process\" field to 1.\n\n## Jupyter Notebook\nThe [Jupyter Notebook](http://jupyter.org/) is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. \nNotebook contains a list of cells (markdown, python code, graphs). 
\nUse Shift+Enter to execute a cell.\nYou can fold/unfold the content of a cell using the left \"arrow\" key.\n\n# Sessions\n\nBothound calculates sessions for all the records in the \"incidents\" table containing \"1\" in the \"process\" field. \n* Bothound monitors records in the \"incidents\" table. \n* Bothound recalculates sessions for all the records in the \"incidents\" table containing \"1\" in the \"process\" field. \n* For regular incidents, Bothound runs Elasticsearch queries. For nginx incidents, Bothound parses the corresponding log file.\n* The sessions are stored in the \"sessions\" table.\n\n## IP Encryption\nFor security reasons, Bothound stores only encrypted IPs in the \"sessions\" table, in the \"ip\\_encrypted\", \"ip\\_iv\", and \"ip\\_tag\" fields. \nThe hash of the IP is also stored in the \"ip\" field.\nThe encryption key is set in the configuration file \"conf/bothound.yaml\" (\"encryption\\_passphrase\").\nBothound supports multiple encryption keys. The encryption table contains the hash value of the key which was used to encrypt the IPs of an incident. \n\nIn order to get the decrypted IPs of the incident, use the extract_attack_ips() function in bothound_tools.py \n\n# Attacks\nBothound uses clustering methods in order to separate attackers from regular traffic.\nThe process of labeling a subset of incident sessions as an attack is manual. \nThe user opens a Jupyter notebook, chooses an incident, clusters the sessions with different clustering algorithms, and manually assigns an arbitrary attack number to the selected clusters. 
\n\n## Loading incident\n* Open Jupyter interface URL: [http://localhost:8889/](http://localhost:8889/)\n* Open src/Clustering.ipynb  \n* Execute \"Initialization\" chapter  \n* \"Configuration\" chapter: change the assignment of variable \"id\\_incident = ...\" to your incident number  \n* \"Configuration\" chapter: uncomment the features you want to use: \"features = [...]\"  \n* Execute \"Configuration\" chapter  \n* Execute \"Load Data\" chapter \n\n## Clustering\n* Execute DBSCAN Clustering chapter. \nAfter the clustering is done, you will see a bar plot of clusters. \nY-axis corresponds to the size of the cluster. Every cluster has its own color from a predefined palette.\n\n* Use plot3() function in the second cell of the chapter to create different 3D scatter plots of the calculated clusters:\n\n```python\nplot3([0,1,3], X, clusters, [])  \n```\nThe first argument of this function is an array of indexes of the 3 features to display at the scatter plot. Note that these are the indexes in the array of uncommented features from the \"Configuration\" chapter. If you have more than 3 uncommented features, choose different indexes and re-execute plot3() cell.\n\n* Choose your features carefully. \nIt is always better to experiment and play with different features subsets (uncommented in \"Configuration\" chapter). Clustering is very sensitive to feature selection. \nDifferent attacks might have different distinguishable features. \nIf you change your features selection in \"Configuration\" chapter, you must re-execute the \"Configuration\", \"Load Data\", and \"Clustering\" chapters. \n\n* Double clustering.\nIn some cases DBSCAN clustering is not good enough. The suspected cluster might have a weird shape and even contain two different botnets. In order to further divide such a cluster, you can use the second iteration, which we call \"Double Clustering\". 
You should choose the target cluster after the first clustering, as well as the number of clusters for the K-Means clustering algorithm.  \nThe second cell in this chapter is the same plot3() function, which displays a 3D scatter plot of the double clustering.\n\n```python\nplot3([0,1,3], X2, clusters2, [])\n```\nNote that you should use the X2 and clusters2 arguments.\n\n## Attack saving\n* Choose your attack ID(s).\nAttack IDs are arbitrary numbers you assign to each botnet. The attack is identified by its incident ID and attack ID.\nIt is possible to have more than one attack in a single incident. \n\n* Modify the tools.label\\_attack() function arguments  \nIf you have more than one attack number to save, you should add a call to the label\\_attack() function for every attack.  \nFor example, for attack #1 you choose cluster #3:  \n```python \ntools.label_attack(id_incident, attack_number = 1, selected_clusters = [3], selected_clusters2 = [])  \n```\nIf you use double clustering, don't forget to specify the indexes for selected_clusters2.\nFor example, for attack #1 you will choose cluster #3 and double clusters #4 and #5:   \n```python\ntools.label_attack(id_incident, attack_number = 1, selected_clusters = [3], selected_clusters2 = [4,5])  \n```\n\n* Execute \"Save Attack\" chapter. \n\n## Feature exploration\nIn this section, users can explore the distribution of a single feature over the clusters to verify the quality of the clustering results.  \n\n```python\nbox_plot_feature(clusters, num_clusters = 4, X = X, feature_index = 2)  \n```\n\nThe function will display a boxplot of the feature value distribution per cluster.\nUsing this graph, you can get more insight into the quality of the clustering you used.  
\nFor instance, if you know in advance that the attack you are clustering should have a significantly higher hit rate, then you can expect a proper attack cluster to stand out in the boxplot of the \"request_interval\" feature.\n\n## Common IPs with other incidents\nIf two attacks share a significant portion of identical IPs, they are likely to belong to the same botnet.\n\n```python\nplot_intersection(clusters, num_clusters, id_incident, ips, id_incident2 = ..., attack2 = -1)  \n```\n\nThis function will create a bar plot highlighting the portions of the clusters which share identical IPs with another incident (specified by the variable id_incident2). It is also possible to specify a particular attack index.\n\n## Countries\nThis graph explores the country distribution over the clusters. \n\n## Banjax\nEven if an IP was banned during the incident, Bothound does not use this information for clustering.\nNevertheless, the distribution of banned IPs over the clusters might be useful.\nThis graph will display the portion of IPs banned by [Banjax](https://github.com/equalitie/banjax) per cluster.\n\n# Analytics\nWhen attack labeling is completed (see \"Attacks\" chapter), a set of analytic scripts may be executed from a separate Jupyter notebook:\n\n* Open Jupyter interface URL: [http://localhost:8889/](http://localhost:8889/)\n* Open src/Analytics_1.ipynb \n* Execute \"Initialization\" chapter  \n* \"Configuration\" chapter: type the incident IDs to explore  \n* Execute \"Read Data\" chapter\n\n## Attacks Summary\nIn this section you can get general information about the attacks in the selected incidents:  \n* number of unique IPs  \n* IDs of labeled attacks  \n* number of bots in each attack  \n```python\nIncident 29, num IPs = 14790, num Bots = 13013  \nIncident 42, num IPs = 10963, num Bots = 9023  \nAttack 1 = 13857 ips  \nAttack 4 = 2589 ips  \nAttack 7 = 11746 ips  \n```\n\n## Countries by attack\nA barplot of country distribution over the botnets.\n\n## 
Countries by Incident\nA barplot of country distribution over the incidents.\n\n## User Agents\nThe User Agent strings most frequently used by attackers.\n\n## Attacks Scatter Plot\nThis 3D scatter plot illustrates the distribution of attack sessions vs. the regular traffic.\nThe first cell contains the code for preprocessing the plot.\nThe first line in this cell defines an array with all the features.  \n```python\nfeatures = [  \n    \"request_interval\", #1  \n    \"ua_change_rate\",#2  \n    \"html2image_ratio\",#3  \n    \"variance_request_interval\",#4  \n    \"payload_average\",#5  \n    \"error_rate\",#6  \n    \"request_depth\",#7  \n    \"request_depth_std\",#8  \n    \"session_length\",#9  \n    \"percentage_cons_requests\",#10  \n]  \n...  \n```  \nThe second cell contains the call to the plot3() function (the same function used in the \"Clustering.ipynb\" Jupyter notebook).\nMake sure you correctly specify the first argument: an array of 3 indexes from the features array.  \n```python\nplot3([3,2,5], X, incident_indexes, -1, \"Attack \")  \n```  \n\n## Attack metrics\nThe three basic metrics of the attacks:  \n\n* session length   \n* html/image ratio  \n* hit rate  \n\n## Attack similarity\nAttack similarity is an important metric. It quantifies how close a selected attack is to previously processed attacks.  \n```python\ntools.calculate_distances(  \n    id_incident = 29, # incident to explore  \n    id_attack = 1, # attack to explore  \n    id_incidents = [29,30,31,32,33,34,36,37,39,40,42], # incidents to compare with  \n    features = [] # specify the features by name. Use all features if empty  \n)  \n```  \nThe output is a list of previous attacks ordered by similarity or distance.  \n\n## Common IPs\nThe number of common IPs with previously recorded attacks is another important metric.\nWhen a new attack shares a significant portion of IPs with another attack, it is a plausible sign that a single botnet is behind both attacks.  
\n\n```python  \n# common ips with other attacks  \ntools.calculate_common_ips(  \n    incidents1 = [29,30], # incidents to explore  \n    id_attack = 1, # attack to explore(use -1 for all attacks)  \n    incidents2 = [36,37,39,40] # incidents to compare with  \n)  \n```  \n\nThe output is a list of attacks, ordered by the portion of common IPs.  \n* The first number - \"identical\" - is the total number of common identical IPs\n* The second number - % of attack - is the portion of identical IPs in the target attack\n* The third number - % of incident IPs - is the portion of identical IPs in the incident botnet\n\n```python  \nIntersection with incidents:  \n[36, 37, 39, 40]  \n\n========================== Attack 1:  \nNum IPs in the attack 13857:  \n\n__________ Incident 36:  \nNum IPs in the incident 111:  \n# identical   IPs: 134  \n% of attack   IPs: 5.00%   \n% of incident IPs: 77.00%  \n\n__________ Incident 37:  \nNum IPs in the incident 2720:  \n# identical   IPs: 4567  \n% of attack   IPs: 12.00%  \n% of incident IPs: 7.00%  \n```\n\n\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fequalitie%2Fbothound","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fequalitie%2Fbothound","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fequalitie%2Fbothound/lists"}