{"id":17118894,"url":"https://github.com/masoudhashemi/satreeverify","last_synced_at":"2025-03-24T02:18:45.105Z","repository":{"id":55525017,"uuid":"503189264","full_name":"masoudhashemi/SATreeVerify","owner":"masoudhashemi","description":"Verifying decision tree ensembles using propositional satisfiability","archived":false,"fork":false,"pushed_at":"2023-02-14T00:12:37.000Z","size":344,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-29T08:22:48.959Z","etag":null,"topics":["adversarial-attacks","ensemble-model","machine-learning","propositional-logic","sat-solver"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/masoudhashemi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-14T03:00:56.000Z","updated_at":"2023-01-12T16:22:27.000Z","dependencies_parsed_at":"2024-12-01T05:20:43.647Z","dependency_job_id":"001d936d-8d34-444e-a180-f27d4fdb8865","html_url":"https://github.com/masoudhashemi/SATreeVerify","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masoudhashemi%2FSATreeVerify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masoudhashemi%2FSATreeVerify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masoudhashemi%2FSATreeVerify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/masoudhashemi%2FSATreeVerify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/masoudhashemi","download_url":"https://codeload.github.com/masoudhashemi/SATreeVerify/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245195962,"owners_count":20575938,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adversarial-attacks","ensemble-model","machine-learning","propositional-logic","sat-solver"],"created_at":"2024-10-14T17:55:42.453Z","updated_at":"2025-03-24T02:18:45.085Z","avatar_url":"https://github.com/masoudhashemi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Verifying Tree Ensembles using SAT\n\nThis code implements a method to represent tree ensembles (XGBoost and RandomForest) with propositional logic. We then use this representation to test the robustness of the models using SAT (with the Z3 solver).\n\nClique Formulation as Weighted Max-SAT\n--------------------------------------\n\nWe aim to formulate a weighted Max-SAT problem which, given a point $x$\nwith predicted output $y$ and $\\text{Ball}(x,\\epsilon)$, represents the\ntask of searching over a random forest to find a point $x^\\prime$ such\nthat:\n\n$\\lor_{x^\\prime} x^\\prime \\in \\text{Ball}(x,\\epsilon)  \\land f(x^\\prime) \\neq y$\n\nLet $b=\\{b_1, ...,b_T\\}$, with\n$b \\in \\mathcal{B} = \\mathcal{B}_1 \\times ... \\times \\mathcal{B}_T$,\nwhere $b_i$ indexes the leaf nodes in tree $i$. 
Furthermore, let\n$\\text{Box}(b_i)$ define the bounding box in the coordinate space\nrepresented by leaf node $b_i$ and $\\text{Val}(b_i)$ denote the value\noutput by that leaf.\n\nTo craft our statement we need two types of weighted clauses:\n\n1.  $\\phi_{ij}(b_i,b_j)$, clauses with weight 1 that check whether leaf\n    nodes from two different trees $i$ and $j$ overlap:\n    $\\text{Box}(b_i) \\cap \\text{Box}(b_j) \\neq \\emptyset$\n\n2.  $\\phi_i(b_i)$, clauses with weight $\\text{Val}(b_i)$ that check\n    whether a leaf node from tree $i$ overlaps with the ball:\n    $\\text{Box}(b_i) \\cap \\text{Ball}(x,\\epsilon) \\neq \\emptyset$\n\nWe can then combine these two types of clauses to write out\n\n$\\land_{ij} \\phi_{ij} \\land_{i} \\phi_i$\n\nwith the associated weights and run a weighted MaxSAT solver over the\nexpression. The solver should return a configuration of values for the\nvariables $b$.\n\nBoolean encoding\n----------------\n\nThe $b$ variables in the problem of the previous section take on values\nbeyond zero and one. In this section we describe a formulation using the\nvariables described in the table below to rewrite the adversarial perturbation problem in propositional logic.\n\nTo encode the equivalent CNF model of the tree ensemble, we extract all\nthresholds from the nodes of the trees. For each threshold we create one column in the data, which is True if the value of the\ncorresponding feature $x_{n_i}$ follows the right branch (larger than the threshold) and False\notherwise. Therefore, $\\mathbf{X} \\subseteq \\{0,1\\}^N$, where $N$ is the\nnumber of unique feature-threshold pairs.\n\nUsing the binary data, let\n$\\mathcal{F}: \\mathbf{X}^n \\rightarrow \\mathcal{R}^2$ be the decision\nfunction of the tree-based classifier. For each path combination $p$ in\na tree ensemble, with $\\mathbf{X}_p \\subseteq \\mathbf{X}^n$ the set of inputs that lead to\ntraversing $p$, the corresponding output is $y_p \\in \\mathcal{R}^2$. 
To\ncreate the SAT clauses we generate the equivalence classes as pairs\n$(\\mathbf{X}_p, y_p)$, where $y_{p,i}$ is the decision associated with\nthe leaf $p_i$ of tree $i$, and we use the leaf node value $\\text{Val}(b_{p,i})$\nas the MaxSAT weight.\n\nTo check the adversarial robustness of the tree-based model with SAT,\nthe CNF of the ensemble contains the following components:\n\n1.  Declare Boolean variables for each binary feature\n    $x_{f, th_{f,i}}$, defined for feature $f$ and the $i$-th threshold\n    $th_{f,i}$ associated with that feature, and for each decision box\n    $b_{p,i}$.\n\n2.  Define the boundaries of the decision boxes, e.g.,\n    $b_{1,1} \\leftrightarrow (\\neg x_{2, th_{2,1}} \\wedge x_{5, th_{5,3}})$\n    means that the boundaries of $b_{1,1}$ are defined by\n    $x_{2, th_{2, 1}}$ and $x_{5, th_{5,3}}$. Since $x_{2, th_{2,1}}$ is\n    negated, the values in this box should be on the left branch (smaller\n    than or equal to $th_{2,1}$). The other boundary involves\n    feature $x_5$ and should be on the right branch (larger than\n    $th_{5,3}$).\n\n3.  Each tree must make exactly one decision. Therefore, we force each tree\n    to have exactly one True decision box, e.g., for a tree with three\n    decision boxes $b_i, i \\in \\{1,2,3\\}$ we have\n    $(b_1 \\vee b_2 \\vee b_3) \\wedge (b_1 \\rightarrow \\neg(b_2 \\vee b_3)) \\wedge (b_2 \\rightarrow \\neg(b_1 \\vee b_3)) \\wedge (b_3 \\rightarrow \\neg(b_1 \\vee b_2))$.\n\n4.  The search should be limited to the boxes that overlap with\n    $\\text{Ball}(x,\\epsilon)$. Therefore, all the decision boxes that do\n    not overlap with the ball are negated.\n\n5.  To choose the best combination of leaves, we use the values of\n    the leaves as the weights of the soft assertions, and only choose\n    leaves that overlap with $\\text{Ball}(x,\\epsilon)$.\n    MaxSAT therefore finds the satisfying assignment with the highest\n    score.\n\n6.  
Since we want the decision result to change, the current leaves\n    should not be chosen. To enforce this constraint, the decision boxes\n    associated with the input sample are negated.\n\nAssuming that the trees are of equal size, the number of path\ncombinations in the tree ensemble with $T$ trees of depth $d$ is\n$2^{d \\times T}$. However, in practice, decisions made by the individual\ntrees are influenced by a subset of features shared amongst several\ntrees within the same ensemble, and thus several path combinations are\ninfeasible and may be discarded from analysis.\n\nDemo\n====\nCheck the `examples` folder for examples of using the code.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasoudhashemi%2Fsatreeverify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmasoudhashemi%2Fsatreeverify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasoudhashemi%2Fsatreeverify/lists"}