{"id":22848379,"url":"https://github.com/feedzai/research-arms","last_synced_at":"2025-04-30T04:49:31.149Z","repository":{"id":37574236,"uuid":"240403149","full_name":"feedzai/research-arms","owner":"feedzai","description":null,"archived":false,"fork":false,"pushed_at":"2020-02-17T13:25:43.000Z","size":2813,"stargazers_count":6,"open_issues_count":1,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-30T04:49:16.772Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feedzai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-14T01:23:30.000Z","updated_at":"2025-04-09T11:10:20.000Z","dependencies_parsed_at":"2022-08-29T06:33:00.432Z","dependency_job_id":null,"html_url":"https://github.com/feedzai/research-arms","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Fresearch-arms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Fresearch-arms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Fresearch-arms/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feedzai%2Fresearch-arms/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/feedzai","download_url":"https://codeload.github.com/feedzai/research-arms/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251644827,"owners_count":21620630,
"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-13T04:11:34.610Z","updated_at":"2025-04-30T04:49:31.124Z","avatar_url":"https://github.com/feedzai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![](misc/arms_logo.png)\n\n**Paper**: *ARMS: Automated rules management system for fraud detection*, D. Aparício,\nR. Barata, J. Bravo, J. T. Ascensão, P. Bizarro, submitted to KDD 2020.\n\nThis repository contains the following:\n*  ARMS binary in bin/ARMS. Please see the [README](bin/README) and the [license](bin/LICENSE).\n*  The synthetic data used in the paper in [data/](data), as well as the [script used to generate the data](data_generator/data_generator.py).\n*  [More detailed results than the ones shown in the paper for the synthetic data](results/synthetic_data_results.xlsx).\n*  [A more detailed description of the optimization algorithms](supplementary_material/algorithms.pdf).\n*  [A presentation with a simple overview of ARMS](supplementary_material/ARMS_overview.pptx).\n*  [More figures than the ones shown in the paper, mostly with more detailed results on real data](supplementary_material/figures.pdf).\n\n# ARMS parameters\n\n*  `-df` train dataset file; check [data/synthetic_data_train.csv](data/synthetic_data_train.csv) for the correct format.\n*  `-tdf` validation dataset file\n*  `-pr` priorities file; check [data/synthetic_data_priorities.csv](data/synthetic_data_priorities.csv) for the correct format.\n*  `-lff` loss function file; check [data/loss_function.txt](data/loss_function.txt) for the 
correct format.\n*  `-seed` random seed value\n*  `-m` method\n    * `single_eval` evaluate a rules configuration; check [data/best_genetic_w_arp_config](data/best_genetic_w_arp_config)\n        * `-cf` rules configuration file to evaluate\n    * `random` use random search\n        * `-rr` number of random runs\n        * `-sp` mutation/shut-off probability\n    * `greedy` use greedy expansion\n    * `genetic` use genetic programming\n        * `-nr` number of runs\n        * `-ps` population size\n        * `-tps` survivors fraction\n        * `-mp` mutation probability\n*   `-arp` augment rules pool before optimization\n*   `-gpcp`, `-ipcp` random priority shuffling during optimization\n\n# Reproducing the experiments on synthetic data\n\nSince we cannot share our clients' data, and we could not find a similar dataset\nonline, we created synthetic data as described in the main paper.\n\nNote that ARMS writes the results to `results/[DATASET]/[FILENAME].json`. The last\nline of ARMS' output is the path to the JSON results file.\n\n## Evaluate the original system\n\n### On the train set\n```\nbin/ARMS -df data/synthetic_data_train.csv -pr data/synthetic_data_priorities.csv -m single_eval -cf data/all_on -lff data/loss_function.txt -seed 42\n```\n\n### On the validation set\n```\nbin/ARMS -df data/synthetic_data_validation.csv -pr data/synthetic_data_priorities.csv -m single_eval -cf data/all_on -lff data/loss_function.txt -seed 42\n```\n\n### On the test set\n```\nbin/ARMS -df data/synthetic_data_test.csv -pr data/synthetic_data_priorities.csv -m single_eval -cf data/all_on -lff data/loss_function.txt -seed 42\n```\n\n## Optimization using random search\n\nFirst, we train on the train set and evaluate on the validation set.\n\n```\nfor mp in 0.04 0.10 0.16 0.22 0.28 0.34 0.40 0.46 0.52 0.58 0.64 0.70 0.76 0.82 0.88 0.94\ndo\n   bin/ARMS -df data/synthetic_data_train.csv -pr data/synthetic_data_priorities.csv -m random -rr 300000 -mp \"$mp\" -lff data/loss_function.txt -seed 42 -tdf 
data/synthetic_data_validation.csv\ndone\n```\n\nFrom these evaluations (i.e., by checking the \"validation system\" object of each evaluation JSON file),\nwe observe that `mp = 0.40` obtained the best results on the validation set. Then, by checking the \"removed rules\", we created\na rule configuration file [\"best_random_config\"](data/best_random_config) and evaluated that rule configuration\non the test set.\n\n```\nbin/ARMS -df data/synthetic_data_test.csv -pr data/synthetic_data_priorities.csv -m single_eval -cf data/best_random_config -lff data/loss_function.txt -seed 42\n```\n\n## Optimization using greedy expansion\n\n### On the original rules\n\nFirst, we train on the train set and evaluate on the validation set, using only the original rules.\n\n```\nbin/ARMS -df data/synthetic_data_train.csv -pr data/synthetic_data_priorities.csv -m greedy -lff data/loss_function.txt -seed 42 -tdf data/synthetic_data_validation.csv\n```\n\nAs described for random search, we then evaluate the best rule configuration found on the test set.\n\n```\nbin/ARMS -df data/synthetic_data_test.csv -pr data/synthetic_data_priorities.csv -m single_eval -cf data/best_greedy_config -lff data/loss_function.txt -seed 42\n```\n\n### On the augmented rules pool\n\nSecond, we train on the train set and evaluate on the validation set, using the augmented rules pool.\n\n```\nbin/ARMS -df data/synthetic_data_train.csv -pr data/synthetic_data_priorities.csv -arp -m greedy -lff data/loss_function.txt -seed 42 -tdf data/synthetic_data_validation.csv\n```\n\nWe also evaluate the best configuration found on the test set.\n\n```\nbin/ARMS -df data/synthetic_data_test.csv -pr data/synthetic_data_priorities.csv -m single_eval -cf data/best_greedy_w_arp_config -lff data/loss_function.txt -seed 42\n```\n\n## Optimization using genetic programming\n\n### Using the original priorities\n\nFirst, we train on the train set and evaluate on the validation set, using only the original rule 
priorities.\n\n```\nfor ps in 20 30\ndo\n    for tps in 10 20 30\n    do\n        for mp in 2 5\n        do\n            bin/ARMS -df data/synthetic_data_train.csv -pr data/synthetic_data_priorities.csv -m genetic -nr 10000 -ps \"$ps\" -tps \"$tps\" -mp \"$mp\" -lff data/loss_function.txt -seed 42 -tdf data/synthetic_data_validation.csv\n        done\n    done\ndone\n```\n\nAs described for random search and greedy expansion, we then evaluate the best rule configuration found on the test set.\n\n```\nbin/ARMS -df data/synthetic_data_test.csv -pr data/synthetic_data_priorities.csv -m single_eval -cf data/best_genetic_config -lff data/loss_function.txt -seed 42\n```\n\n### Using priority shuffling\n\nInitially, we tried augmenting the rules pool as we did for greedy expansion, but this\nenlarged the search space too much and the method's performance degraded. We then\ntried random priority shuffling during optimization, and the results improved.\n\n```\nfor ps in 20 30\ndo\n    for tps in 10 20 30\n    do\n        for mp in 2 5\n        do\n            bin/ARMS -df data/synthetic_data_train.csv -pr data/synthetic_data_priorities.csv -m genetic -nr 10000 -ps \"$ps\" -tps \"$tps\" -mp \"$mp\" -gpcp 0.2 -ipcp 0.1 -lff data/loss_function.txt -seed 42 -tdf data/synthetic_data_validation.csv\n        done\n    done\ndone\n```\n\nWe also evaluate the best configuration found on the test set.\n\n```\nbin/ARMS -df data/synthetic_data_test.csv -pr data/synthetic_data_priorities.csv -m single_eval -cf data/best_genetic_w_arp_config -lff data/loss_function.txt -seed 42\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedzai%2Fresearch-arms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeedzai%2Fresearch-arms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeedzai%2Fresearch-arms/lists"}