{"id":16668628,"url":"https://github.com/codhek/mitibot","last_synced_at":"2025-06-23T09:33:08.382Z","repository":{"id":55318618,"uuid":"215525474","full_name":"CodHeK/mitiBot","owner":"CodHeK","description":"A Graph based machine learning approach to bot mitigation systems.","archived":false,"fork":false,"pushed_at":"2021-01-05T15:57:12.000Z","size":14844,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-13T00:19:56.569Z","etag":null,"topics":["bot-mitigation","bot-mitigation-systems","bots-protection","machine-learning"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CodHeK.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-10-16T10:54:10.000Z","updated_at":"2022-04-03T14:20:40.000Z","dependencies_parsed_at":"2022-08-14T20:53:11.386Z","dependency_job_id":null,"html_url":"https://github.com/CodHeK/mitiBot","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CodHeK/mitiBot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodHeK%2FmitiBot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodHeK%2FmitiBot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodHeK%2FmitiBot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodHeK%2FmitiBot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CodHeK","download_url":"https://codeload.github.com/CodHeK/mitiBot/tar.gz/refs/heads/master","sb
om_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodHeK%2FmitiBot/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261453019,"owners_count":23160436,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot-mitigation","bot-mitigation-systems","bots-protection","machine-learning"],"created_at":"2024-10-12T11:26:13.183Z","updated_at":"2025-06-23T09:33:03.370Z","avatar_url":"https://github.com/CodHeK.png","language":"Python","readme":"# mitiBot\nA Graph based machine learning approach to bot mitigation systems.\n\n### Datasets\n\nJust run `setup.sh` to download the `training` and `testing` datasets. 
The datasets get downloaded into the `./datasets` folder.\n\n### Using the data files\n\n```\nb = Build(['42.csv', '43.csv', '46.csv', '47.csv', '48.csv', '52.csv', '53.csv'])\n```\n\nJust pass the file names; the files are read from the `./datasets` directory and the data is loaded.\n\n\n### e2e mode\n\nTo perform `training` followed by `testing`, use the `--e2e` flag.\n\n```\npython3 model.py --e2e\n```\n\nConfiguration used in the e2e mode:\n\n```\n# Training dataset\n\nb = Build(['42.csv', '43.csv', '46.csv', '47.csv', '48.csv', '52.csv', '53.csv'])\nb.data = b.build_train_set(b.non_bot_tuples, b.bot_tuples)\nb.preprocess()\n\ntrain_p1()\ntrain_p2()\n\n# Testing dataset\n\nt = Build(['50.csv', '51.csv'])\nt.data = t.build_test_set(t.non_bot_tuples, t.bot_tuples, 50)\nt.preprocess()\n\ntest()\n```\n\nTotal time:\n```\n  Avg: ~45m\n```\n\n\n### k-fold mode\n\nPerform K-fold cross-validation on the 9 datasets using the `--kfold` flag.\n\nWe use:\n\n```\ndatasets = ['42.csv', '43.csv', '46.csv', '47.csv', '48.csv', '50.csv', '51.csv', '52.csv', '53.csv']\n```\n\nIn each iteration, one dataset is used for testing and the rest for training.\n\n```\n$ python3 model.py --kfold\n```\n\nTakes about `~8 hours` in total to complete! 
(Check [logs](https://github.com/CodHeK/mitiBot/blob/master/kfold.logs))\n\n![kfold-output](screenshots/kfold-output.png)\n\nAt the end, it prints the average accuracy for Logistic Regression and Naive Bayes using DBSCAN in phase 1.\n\nDBSCAN + LR | DBSCAN + NB\n:-------------------------:|:-------------------------:\n97.46%  |  97.29%\n\n### Training\n\nYou can train the model in 2 ways, as it has PHASE 1 (UNSUPERVISED) and PHASE 2 (SUPERVISED).\n\nThis will perform both phases one after the other.\n```\npython3 model.py --train\n```\n\nIf you want to perform the 2 phases separately `(given the feature vectors are already saved in f.json and fvecs.json)`:\n\n```\npython3 model.py --phase1\n```\n\nand\n\n```\npython3 model.py --phase2\n```\n\nOnce trained, the model is saved as pickle files in the `saved` folder, which are then used for testing.\n\nNOTE:\n\nYou can directly use the saved `feature vectors` stored in JSON format in the `saved_train` folder and train only `phase2` of the training process in order to speed up training!\n\nThe saved weights above were trained on the following data files: `['42.csv', '43.csv', '46.csv', '47.csv', '48.csv', '52.csv', '53.csv']`. If you want to modify this list, you'll have to train `phase1` first; its weights, once trained, will be saved in the `/saved` folder.\n\n### Testing\n\nThe command below uses the pre-trained classifier saved as a pickle file in the `saved` folder.\n```\npython3 model.py --test\n```\n\n### Cluster size maps\n#\n Kmeans (n_clusters=2, random_state=0) | DBScan (eps=0.4, min_samples=4)\n:-------------------------:|:-------------------------:\n![cluster_png](screenshots/kmeans.png)  |  ![cluster_png](screenshots/dbscan.png)\n\n### DBSCAN + Naive Bayes Classifier\n\nTested on the data files `50.csv` and `51.csv`.\n#\nTest run:\n#\n![test50_51](screenshots/test_db_nb.png)\n#\nTest time:\n```\n  Avg: ~7m\n```\n\n### DBSCAN + Logistic Regression Classifier\n\nTested on the 
data files `50.csv` and `51.csv`.\n#\nTest run:\n#\n![test50_51](screenshots/test_db_lr.png)\n#\nTest time:\n```\n  Avg: ~6m\n```\n\n### Experimenting\n\nUsing only unsupervised learning as our technique, we get:\n\n#### DBScan (eps=1.0, min_samples=4)\n\nTesting the cluster on the data files `50.csv` and `51.csv`.\n\n![dbscan_exp](screenshots/dbscan_exp.png)\n\n### Using various numbers of clusters for KMeans\n\nTesting the cluster on the data files `50.csv` and `51.csv`.\n\n![kmeans_exp](screenshots/kmeans_exp.png)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodhek%2Fmitibot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodhek%2Fmitibot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodhek%2Fmitibot/lists"}