{"id":18932005,"url":"https://github.com/kratsg/optimization","last_synced_at":"2025-07-05T08:35:57.832Z","repository":{"id":32876361,"uuid":"36470577","full_name":"kratsg/optimization","owner":"kratsg","description":"Code for optimizing simple n-tuples","archived":false,"fork":false,"pushed_at":"2025-03-04T18:09:04.000Z","size":783,"stargazers_count":7,"open_issues_count":21,"forks_count":9,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-06-16T10:51:50.506Z","etag":null,"topics":["analysis","high-energy-physics","ntuples","optimization","root-cern","root-ntuples"],"latest_commit_sha":null,"homepage":"http://giordonstark.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kratsg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-05-28T22:54:00.000Z","updated_at":"2025-03-04T18:09:08.000Z","dependencies_parsed_at":"2025-04-15T18:50:16.825Z","dependency_job_id":null,"html_url":"https://github.com/kratsg/optimization","commit_stats":null,"previous_names":[],"tags_count":36,"template":false,"template_full_name":null,"purl":"pkg:github/kratsg/optimization","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kratsg%2Foptimization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kratsg%2Foptimization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kratsg%2Foptimization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kratsg%2Foptimization/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kratsg","download_url":"https://codeload.github.com/kratsg/optimization/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kratsg%2Foptimization/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263711080,"owners_count":23499817,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","high-energy-physics","ntuples","optimization","root-cern","root-ntuples"],"created_at":"2024-11-08T11:47:44.086Z","updated_at":"2025-07-05T08:35:57.812Z","avatar_url":"https://github.com/kratsg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1\u003eDeprecated\u003c/h1\u003e\n\nMostly deprecated -- use it for a brute-force approach, but it's not being maintained anymore! Please see the work on [cabin](https://github.com/scipp-atlas/cabin) by Juan Robles and Mike Hance.\n\n# Optimization - A uproot Codebase\n\nThis tool allows you to take a series of ROOT ntuples, signal \u0026 background, apply a lot of cuts automatically, and figure out the optimal selections to maximize significance. 
It comes packed with several features:\n\n- generator script to create what we call a supercuts file, containing all the rules that tell the script which cuts to apply and on which branches\n- cut script which will take your signal, background, and supercuts; run them all; and output a series of files with the appropriate event counts for all cuts provided\n- optimization script which will take your signal counts and background counts; run them all; and output a sorted list of optimal cuts\n- hash lookup script to reverse-engineer the cut for a given hash when you supply the supercuts file\n\n*Note*: as part of making the script run as fast as possible, I try to maintain a low memory profile. It will not store (or remember) the cut used to create a significance value. Instead, we compute a 32-character (128-bit MD5) hash which is used to encode the cuts, and a way to \"decode\" the hash is also provided.\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*\n\n- [Major Dependencies](#major-dependencies)\n- [Quick Start](#quick-start)\n  - [Installing](#installing)\n    - [Using virtual environment](#using-virtual-environment)\n    - [Without using virtual environment](#without-using-virtual-environment)\n    - [On a CVMFS-enabled machine](#on-a-cvmfs-enabled-machine)\n  - [Using](#using)\n    - [Grab some optimization ntuples](#grab-some-optimization-ntuples)\n    - [Generate a supercuts template](#generate-a-supercuts-template)\n    - [Running the cuts](#running-the-cuts)\n    - [Calculating the significances](#calculating-the-significances)\n    - [Looking up a cut (or two)](#looking-up-a-cut-or-two)\n  - [Profiling Code](#profiling-code)\n  - [Example Script](#example-script)\n- [Documentation](#documentation)\n  - [Top-Level](#top-level)\n    - [Parameters](#parameters)\n  - 
[Action:Generate](#actiongenerate)\n    - [Required Parameters](#required-parameters)\n    - [Optional Parameters](#optional-parameters)\n    - [Output](#output)\n  - [Action:Cut](#actioncut)\n    - [Required Parameters](#required-parameters-1)\n    - [Optional Parameters](#optional-parameters-1)\n    - [Output](#output-1)\n  - [Action:Optimize](#actionoptimize)\n    - [Required Parameters](#required-parameters-2)\n    - [Optional Parameters](#optional-parameters-2)\n    - [Output](#output-2)\n  - [Action:Hash](#actionhash)\n    - [Required Parameters](#required-parameters-3)\n    - [Optional Parameters](#optional-parameters-3)\n    - [Output](#output-3)\n  - [Action:Summary](#actionsummary)\n    - [Required Parameters](#required-parameters-4)\n    - [Optional Parameters](#optional-parameters-4)\n    - [Output](#output-4)\n  - [Supercuts File](#supercuts-file)\n    - [Defining a fixed cut](#defining-a-fixed-cut)\n    - [Defining a supercut](#defining-a-supercut)\n    - [Example of a supercuts file](#example-of-a-supercuts-file)\n    - [More Complicated Selections](#more-complicated-selections)\n- [Authors](#authors)\n\n\u003c!-- END doctoc generated TOC please keep comment here to allow auto update --\u003e\n\n\n## Major Dependencies\n - [numpy](http://www.numpy.org/)\n - [uproot](https://github.com/scikit-hep/uproot)\n\nAll other dependencies are listed in [requirements.txt](requirements.txt) and can be installed in one line with `pip install -r requirements.txt`.\n\n## Quick Start\n\ntl;dr - copy and paste, and off you go.\n\n### Installing\n\n#### Using virtual environment\n\nI use [`virtualenvwrapper`](https://virtualenvwrapper.readthedocs.org/en/latest/) to manage my python dependencies and workspace. 
I assume you have `pip`.\n\n```bash\npip install virtualenvwrapper\necho \"source /usr/local/bin/virtualenvwrapper.sh\" \u003e\u003e ~/.bash_profile\nsource ~/.bash_profile\n```\n\nThen set up the environment and install the package:\n\n```bash\nmkvirtualenv optimization\nworkon optimization\npip install root-optimize\nrooptimize -h\n```\n\nStart a new environment with `mkvirtualenv NameOfEnv`, and every time you open a new shell, just type `workon NameOfEnv`. Type `workon` alone to see a list of environments you've already created. Read the [virtualenvwrapper docs](https://virtualenvwrapper.readthedocs.org/en/latest/) for more information.\n\n#### Without using virtual environment\n\n```bash\npip install root-optimize\nrooptimize -h\n```\n\n#### On a CVMFS-enabled machine\n\nFirst, set up a virtual environment using `python3` on the CVMFS-enabled machine and install the package:\n\n```bash\npython3 -m venv optimization\nsource optimization/bin/activate\npip install root-optimize\nrooptimize -h\n```\n\nwhich gives us a virtual environment (`optimization`) to work with. The next time you log in, all that's left is\n\n```bash\nsource optimization/bin/activate\nrooptimize -h\n```\n\nand you're good to go!\n\n### Using\n\n#### Generate a supercuts template\n\nThe simplest example is\n\n```bash\nrooptimize generate \"Gtt_0L_a/fetch/data-optimizationTree/user.lgagnon:user.lgagnon.370101.Gtt.DAOD_SUSY10.e4049_s2608_r6765_r6282_p2411_tag_10_v1_output_xAOD.root-0.root\"\n```\n\nwhich will create a `supercuts.json` file for you to edit so that you can run the optimizations. 
As a more advanced example, I wanted to generate a file using only a subset of the branches in my file and to give some of them a fixed cut that I would configure myself, so I ran\n\n```bash\nrooptimize generate \"Gtt_0L_a/fetch/data-optimizationTree/user.lgagnon:user.lgagnon.370101.Gtt.DAOD_SUSY10.e4049_s2608_r6765_r6282_p2411_tag_10_v1_output_xAOD.root-0.root\" --fixedBranches multiplicity_topTag* -o dump.json -b -vv --skipBranches *_jet_rc*\n```\n\nwhich gives branches matching `multiplicity_topTag*` a fixed cut in the template, and skips branches matching `*_jet_rc*` so they are not considered for cuts at all.\n\n#### Running the cuts\n\nAfter that, we just specify all of our ROOT files. The script takes advantage of `TChain` and \\*nix file handling, so it automatically handles multiple files, whether they are given as a pattern or written out explicitly. Every output is grouped by the filenames/treenames passed in.\n\n```bash\nrooptimize cut TA07_MBJ10V1/*_0L_a/fetch/data-optimizationTree/*.root --supercuts=supercuts_small.json -o cuts_0L_a -b\nrooptimize cut TA07_MBJ10V1/*_0L_b/fetch/data-optimizationTree/*.root --supercuts=supercuts_small.json -o cuts_0L_b -b\nrooptimize cut TA07_MBJ10V1/*_1L/fetch/data-optimizationTree/*.root --supercuts=supercuts_small.json -o cuts_1L -b\n```\n\nWe use `numpy` and `awkward-array` to calculate the number of events passing a given cut, and we attempt to parallelize the computations as much as possible.\n\n#### Calculating the significances\n\nAfter that, we just specify (at a bare minimum) the `signal` and `bkgd` json cut files. 
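Each json file written by `cut` maps a cut hash to a `raw` and a `weighted` event count. The following is a minimal sketch, not the package's actual code. It assumes the weighted count is the sum of the `--eventWeight` branch over the events that pass a cut, and computes both numbers with numpy from a boolean mask. The arrays are toy data and `count_passing` is a hypothetical name:

```python
import numpy as np


def count_passing(mask, weights):
    # raw: number of events passing the cut
    # weighted: sum of the per-event weights of those same events
    mask = np.asarray(mask, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    return {'raw': int(np.count_nonzero(mask)),
            'weighted': float(weights[mask].sum())}


# Toy arrays standing in for two branches of one tree.
multiplicity_jet = np.array([2, 5, 7, 3, 6])
event_weight = np.array([0.5, 1.0, 0.25, 2.0, 0.75])

print(count_passing(multiplicity_jet > 4, event_weight))
# {'raw': 3, 'weighted': 2.0}
```
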
The following example takes the `0L_a` files and calculates significances:\n\n```bash\nrooptimize optimize --signal 37* --bkgd 4* --searchDirectory=cuts_0L_a -b -o significances_0L_a\n```\n\nwhich automatically combines the backgrounds and produces a significances file for each signal passed in.\n\n#### Looking up a cut (or two)\n\nWhen the optimizations have finished running, you'll want to figure out which cuts the resulting hash(es) correspond to. You can do this with\n\n```bash\nrooptimize hash e31dcf5ba4786d9e8ffa9e642729a6b9 4e16fdc03c171913bc309d57739c7225 8fa0e0ab6bf6a957d545df68dba97a53 --supercuts=supercuts_small.json\n```\n\nwhich will create `outputHash/\u003chash\u003e.json` files detailing the cuts involved.\n\n### Profiling Code\n\nWe always want this code to run as fast as possible, since optimization should not take long. To find the bottlenecks, I use [snakeviz](https://jiffyclub.github.io/snakeviz/). The `requirements.txt` file contains this dependency. First, I profile the code by running it under `cProfile`. Because `rooptimize` is an installed console script rather than a file in the working directory, I pass its full path:\n\n```bash\npython -m cProfile -o profiler.log $(which rooptimize) cut TA06_MBJ05/*_0L/fetch/data-optimizationTree/*.root --supercuts=supercuts.json -o cuts_0L -b\n```\n\nthen I use the `snakeviz` script to visualize the results\n\n```bash\nsnakeviz profiler.log\n```\n\nand I'm good to go.\n\n### Example Script\n\nSee [example_script.sh](example_script.sh) for an idea of how to run everything in order to produce a plot of significances.\n\n## Documentation\n\n### Top-Level\n\n```bash\nrooptimize\n```\n\nor\n\n```bash\nrooptimize -h\n```\n\n```\nusage: rooptimize [-h] [-a] {generate,cut,optimize,hash,summary} ...\n\nAuthor: Giordon Stark. 
vX.Y.Z\n\npositional arguments:\n  {generate,cut,optimize,hash,summary}\n                              actions available\n    generate                  Write supercuts template\n    cut                       Apply the cuts\n    optimize                  Calculate significances for a series of computed\n                              cuts\n    hash                      Translate hash to cut\n    summary                   Summarize Optimization Results\n\noptional arguments:\n  -h, --help                  show this help message and exit\n  -a, --allhelp               show this help message and all subcommand help\n                              messages and exit\n\nThis is the top-level. You have no power here.\n```\n\n#### Parameters\n\nThere is only one required positional argument: the `action`. You can choose from\n\n- [generate](#actiongenerate)\n- [cut](#actioncut)\n- [optimize](#actionoptimize)\n- [hash](#actionhash)\n- [summary](#actionsummary)\n\nWe also provide an optional argument `-a, --allhelp` which prints all the help documentation at once instead of just the top-level `-h, --help`.\n\n### Action:Generate\n\nGenerate helps you get started quickly. Given a ROOT ntuple, it writes a `supercuts.json` template.\n\n```bash\nusage: rooptimize generate \u003cfile.root\u003e 
[options]\n```\n\n#### Required Parameters\n\nVariable | Type | Description\n---------|------|------------\nfile | string | path to a ROOT file containing an optimization ntuple to use\n\n#### Optional Parameters\n\nVariable | Type | Description | Default\n---------|------|-------------|---------\n-h, --help | bool | display help message | False\n-v, --verbose | count | enable more verbose output | 0\n--debug | bool | enable full-on debugging | False\n-b, --batch | bool | enable batch mode for ROOT | False\n--tree | string | TTree name in the ntuples | oTree\n--eventWeight | string | event weight branch name | event_weight\n-o, --output | string | output json file to store generated supercuts file | supercuts.json\n--fixedBranches | strings | branches that should have a fixed cut | []\n--skipBranches | strings | branches that should not have a cut (skip them) | []\n\n- `--globalMinVal` is an aesthetic option that makes it easier to identify the \"true\" minimum of your ntuples. I often store -99.0 when (for example) there is no 4th jet or some substructure information could not be calculated; this option lets me automatically chop off the low end of a branch to get a better calculation of the percentiles\n- `--fixedBranches` and `--skipBranches` can take a series of strings or a series of patterns\n\n  ```bash\n  --fixedBranches multiplicity_jet multiplicity_topTag_loose multiplicity_topTag_tight\n  ```\n\n  or\n\n  ```bash\n  --fixedBranches multiplicity_* pt_jet_rc8_1\n  ```\n\n  which aims to make life easier for all of us.\n\n#### Output\n\nThis script will generate a supercuts json file. See [Supercuts File](#supercuts-file) for more information.\n\n### Action:Cut\n\nCut generates the cuts from a supercuts file and applies them, creating output files of event counts. Process ROOT ntuples and apply cuts.\n\n```bash\nusage: rooptimize cut \u003cfile.root\u003e ... 
[options]\n```\n\n#### Required Parameters\n\nVariable | Type | Description\n---------|------|------------\nfiles    | string | path(s) to ROOT files containing ntuples\n\n#### Optional Parameters\n\nVariable | Type | Description | Default\n---------|------|-------------|---------\n-h, --help | bool | display help message | False\n-v, --verbose | count | enable more verbose output | 0\n--debug | bool | enable full-on debugging | False\n-b, --batch | bool | enable batch mode for ROOT | False\n--tree-pattern | string | patterns for TTree names in the files | *\n--eventWeight | string | event weight branch name | event_weight\n--supercuts | string | path to json dict of supercuts for generating cuts | supercuts.json\n-o, --output | directory | output directory to store json files containing cuts | cuts\n\n#### Output\n\nVariable | Type | Description\n---------|------|------------\nhash | 32-character string | MD5 hash of the cut\nraw | integer | raw number of events passing the cut\nweighted | float | number of events passing the cut, with event weights applied\n\nWeighted counts apply the Monte Carlo event weights stored in the `--eventWeight` branch. Significances are later calculated for both the raw and the weighted counts.\n\nThe output is a directory of json files which will look like\n\n```json\n{\n    ...\n    \"09a130622e1e6345b83739b3527eccb1\": {\n        \"raw\": 90909,\n        \"weighted\": 2.503\n    },\n    ...\n}\n```\n\nInput files are grouped by filename and tree name, and each group gets its own output file.\n\n### Action:Optimize\n\nOptimize helps you find your optimal cuts. Process cuts and determine significance.\n\n```bash\nusage: rooptimize optimize --signal=signal.json [...] --bkgd=bkgd.json [...] [options]\n```\n\n**Note**: You can specify multiple backgrounds and multiple signals. 
Each signal is processed separately, while all backgrounds are merged and treated as a single background.\n\n#### Required Parameters\n\nVariable | Type | Description\n---------|------|------------\n--signal | string | path(s) to json files containing signal cuts\n--bkgd | string | path(s) to json files containing background cuts\n\n**Note**: these files are looked up under the `--searchDirectory` directory, which defaults to `cuts`, where `rooptimize cut` writes its output.\n\n#### Optional Parameters\n\nVariable | Type | Description | Default\n---------|------|-------------|---------\n-h, --help | bool | display help message | False\n-v, --verbose | count | enable more verbose output | 0\n--debug | bool | enable full-on debugging | False\n-b, --batch | bool | enable batch mode for ROOT | False\n--searchDirectory | string | the directory that contains all cut.json files | 'cuts'\n--bkgdUncertainty | float | bkgd sigma for calculating sig. | 0.3\n--bkgdStatUncertainty | float | bkgd statistical uncertainty for significance | 0.3\n--insignificance | int | min. number of events for non-zero sig. | 0.5\n-o, --output | string | output directory to store significances calculated | significances\n-n, --max-num-hashes | int | maximum number of hashes to dump in the significance files | 25\n\n#### Output\n\nVariable | Type | Description\n---------|------|------------\nhash | 32-character string | MD5 hash of the cut\nsignificance_raw, significance_weighted | float | calculated significance of the cut, from raw or weighted counts\nyield_raw, yield_weighted | object | yields of events passing the cut, from raw or weighted counts\n\nThe output is a directory of json files which will look like\n\n```json\n[\n    ...\n    {\n        \"hash\": \"7595976a84303a003f6a4a7458f12b8d\",\n        \"significance_raw\": 7.643122000999725,\n        \"significance_weighted\": 18.34212454602254,\n        \"yield_raw\": { ... },\n        \"yield_weighted\": { ... 
}\n    },\n    ...\n]\n```\n\nif a significance was calculated successfully, or\n\n```json\n[\n    ...\n    {\n        \"hash\": \"c911af35708dcdc51380ebbde81c9b1e\",\n        \"significance_raw\": -3,\n        \"significance_weighted\": -3,\n        \"yield_raw\": { ... },\n        \"yield_weighted\": { ... }\n    },\n    {\n        \"hash\": \"b383cea24037667ffb6136d670a33468\",\n        \"significance_raw\": -2,\n        \"significance_weighted\": -2,\n        \"yield_raw\": { ... },\n        \"yield_weighted\": { ... }\n    },\n    {\n        \"hash\": \"095414bacf1022f2c941cc6164b175a1\",\n        \"significance_raw\": 9.421795580339449,\n        \"significance_weighted\": 20.37611073465684,\n        \"yield_raw\": { ... },\n        \"yield_weighted\": { ... }\n    },\n    ...\n]\n```\n\nif the number of signal or background events did not pass the `--insignificance` minimum threshold. In that case, the significance is set to a negative value that identifies what happened. The table below summarizes these cases:\n\nSig. Value | What Happened\n----------:|:-------------\n-1         | The signal was insignificant\n-2         | The background was insignificant\n-3         | There were not enough statistics in the background events\n\nNote that `--max-num-hashes` determines how many hashes you will actually see in these output files.\n\n### Action:Hash\n\nHash to cut translation. Given a hash from optimization, dump the cuts associated with it.\n\n```bash\nusage: rooptimize hash \u003chash\u003e [\u003chash\u003e ...] [options]\n```\n\n#### Required Parameters\n\nVariable | Type | Description\n---------|------|------------\nhash (positional) | string | 32-character hash(es) to decode as cuts. 
If `--use-summary` is set, you can pass in your summary.json file instead.\n\n#### Optional Parameters\n\nVariable | Type | Description| Default\n---------|------|------------|---------\n-h, --help | bool | display help message | False\n-v, --verbose | count | enable more verbose output | 0\n--debug | bool | enable full-on debugging | False\n-b, --batch | bool | enable batch mode for ROOT | False\n--supercuts | string | path to json dict of supercuts | supercuts.json\n-o, --output | directory | output directory to store json files containing cuts | outputHash\n--use-summary | bool | if enabled, you can pass in your summary.json file instead of a bunch of hashes | False\n\n#### Output\n\nThe hash subcommand will create an output directory with multiple json files, one for each hash, containing details about the cut applied. Unlike a standard supercuts file, each of these files only contains dictionaries with **4** keys:\n\nVariable | Type | Description\n---------|------|------------\nbranch | string | name of the branch that the cut was applied on\nfixed | bool | whether the cut came from a fixed cut or a supercut\npivot | number | the value which we cut on, see `signal_direction` for more\nsignal_direction | string | either `\u003e` or `\u003c`; cuts obey the rule `value ? pivot`, where `?` is this direction\n\n### Action:Summary\n\nOptimize results to summary json. 
Given the finished results of an optimization, produce a json summarizing it entirely.\n\n```bash\nusage: rooptimize summary [options]\n```\n\n#### Required Parameters\n\nNo required parameters\n\n#### Optional Parameters\n\nVariable | Type | Description| Default\n---------|------|------------|---------\n-h, --help | bool | display help message | False\n-v, --verbose | count | enable more verbose output | 0\n--debug | bool | enable full-on debugging | False\n-b, --batch | bool | enable batch mode for ROOT | False\n--ncores | int | Number of cores to use for parallelization | \u003cnum cores\u003e\n-o, --output | str | Name of output file to use | summary.json\n--searchDirectory | str | The directory containing the significances produced from `rooptimize optimize`\n-f, --fmtstr | str | format string for matching against signal filenames in config.json | \"([a-zA-Z]+)_(\\d+)_(\\d+)_(\\d+)\"\n-p, --interpretation | str | how to interpret the corresponding format string | \"signal_type:gluino:stop:neutralino\"\n\n\n#### Output\n\nThe summary subcommand will create an output json file containing a list of dictionaries, one for each signal used in optimization. 
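The default `--fmtstr` and `--interpretation` values suggest how the mass columns below are filled. The regular expression captures groups from each signal filename, and the colon-separated interpretation names those groups in order. The following is a minimal sketch of that mapping. It assumes `re.search` semantics, writes the default digit class as `[0-9]` (equivalent for ordinary ASCII filenames), and uses a hypothetical `parse_signal` helper:

```python
import re

# Same pattern as the documented default --fmtstr, with the digit
# class written as [0-9].
FMTSTR = '([a-zA-Z]+)_([0-9]+)_([0-9]+)_([0-9]+)'
INTERPRETATION = 'signal_type:gluino:stop:neutralino'


def parse_signal(filename, fmtstr=FMTSTR, interpretation=INTERPRETATION):
    # Hypothetical helper: search the filename for the pattern and name
    # each captured group with the colon-separated interpretation.
    match = re.search(fmtstr, filename)
    if match is None:
        return None
    return dict(zip(interpretation.split(':'), match.groups()))


print(parse_signal('significances/Gtt_900_5000_400.json'))
# {'signal_type': 'Gtt', 'gluino': '900', 'stop': '5000', 'neutralino': '400'}
```

The captured values stay strings, which matches the `str` types listed for the mass columns.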
Each dictionary contains the following variables (assuming the default `--fmtstr` and `--interpretation`):\n\nVariable | Type | Description\n---------|------|------------\nbkgd | float | Background yield\nfilename | str | signal filename used\nhash | 32-character string | MD5 hash of the optimal cut\ngluino | str | Mass of Gluino\nneutralino | str | Mass of LSP\nstop | str | Mass of Stop\nsignal | float | Signal yield\nsignal_type | str | type of signal\nratio | float | Ratio of signal/bkgd\nsignificance | float | Significance of optimal cut\n\nThis will look something like:\n\n```json\n[\n    ...\n    {\n        \"bkgd\": 5.656293846714225,\n        \"filename\": \"significances/Gtt_900_5000_400.json\",\n        \"hash\": \"dc41780c77207a9a5dcf6b97b0cac5ac\",\n        \"gluino\": \"900\",\n        \"neutralino\": \"400\",\n        \"stop\": \"5000\",\n        \"ratio\": 79.33950854202577,\n        \"signal\": 448.76757396759103,\n        \"signal_type\": \"Gtt\",\n        \"significance\": 29.78897424015455\n    },\n    ...\n]\n```\n\n### Supercuts File\n\nThis is a potentially large [JSON](http://www.json.org/) file that tells the [cut](#actioncut), [hash](#actionhash), and [generate](#actiongenerate) commands the rules of your cuts.\n\n- The `cut` command uses it to generate a series of cuts to apply to your ntuples, hashes each cut, and stores the event counts passing it.\n- The `hash` command uses it to recompute the hash and find the cuts that match up to the hashes you need to decode.\n- The `generate` command creates this file for you based on your ntuples to help you get started.\n\nThe file always contains a list of objects (dictionaries), one for each selection you want to cut on.\n\n#### Defining a fixed cut\n\nA fixed cut is a single cut on a single branch. This is like taking a partial derivative where you fix one thing and vary others. 
Here, the branch is held at a single pivot value.\n\nKey | Type | Description\n----|------|------------\nselections | string | the selection string to apply, with `{0}` marking where the pivot goes\npivot | number | the value at which we cut (or *pivot* against)\n\nThe simplest example is a single fixed cut on a single branch. Your object will look like\n\n```json\n[\n    ...\n    {\n        \"selections\": \"multiplicity_jet \u003e {0}\",\n        \"pivot\": 3\n    },\n    ...\n]\n```\n\nThis says we would like a fixed cut on `multiplicity_jet` requiring more than 3 jets (i.e., the rule we obey is `value \u003e 3`).\n\n#### Defining a supercut\n\nA supercut is our term for an object that generates more than 1 cut on the defined branch. A fixed cut generates 1 cut, but a supercut can generate an arbitrary number of cuts.\n\nKey | Type | Description\n----|------|------------\nselections | string | the selection string to apply, with `{0}`, `{1}`, ... marking where the pivots go\nst3 | list | a list of [start, stop, step] values for each set of pivots\nlist | list | a list of [cut1, cut2, ..., cutN] values for each set of pivots\n\n**Note**: the direction in which cuts are generated can be controlled by running cuts in increasing values (`start \u003c stop`, `step \u003e 0`) or decreasing values (`start \u003e stop`, `step \u003c 0`).\n\nThe two examples below show the different cuts that can be generated.\n\n```json\n[\n    ...\n    {\n        \"selections\": \"multiplicity_jet \u003c {0}\",\n        \"st3\": [\n            [2, 7, 2]\n        ]\n    },\n    ...\n]\n```\n\nThis says we would like a supercut on `multiplicity_jet` where the pivot values are `2, 4, 6` obeying the rule that `value \u003c pivot`. 
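Assuming each `st3` row behaves exactly like Python's `range(start, stop, step)` (inclusive start, exclusive stop, as this section describes), the pivot expansion can be sketched in a few lines. `expand_st3` and `expand_selection` are hypothetical helpers, not part of `rooptimize`:

```python
def expand_st3(start, stop, step):
    # Mirrors range(): inclusive start, exclusive stop, and a negative
    # step counts down.
    return list(range(start, stop, step))


def expand_selection(selection, start, stop, step):
    # Fill the {0} placeholder once per pivot value, in order.
    return [selection.format(pivot) for pivot in expand_st3(start, stop, step)]


print(expand_st3(2, 7, 2))   # [2, 4, 6]
print(expand_st3(3, 1, -1))  # [3, 2]
print(expand_selection('multiplicity_jet < {0}', 2, 7, 2))
# ['multiplicity_jet < 2', 'multiplicity_jet < 4', 'multiplicity_jet < 6']
```

Python's `range` only accepts integers. Fractional steps, such as a cut on a continuous variable, would need something like `numpy.arange` instead.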
This supercut will generate 3 cuts:\n\n- `value \u003c 2`\n- `value \u003c 4`\n- `value \u003c 6`\n\nin that order.\n\n```json\n[\n    ...\n    {\n        \"selections\": \"multiplicity_jet \u003e {0}\",\n        \"st3\": [\n            [3, 1, -1]\n        ]\n    },\n    ...\n]\n```\n\nThis says we would like a supercut on `multiplicity_jet` where the pivot values are `3, 2` obeying the rule that `value \u003e pivot`. This supercut will generate 2 cuts:\n\n- `value \u003e 3`\n- `value \u003e 2`\n\nin that order.\n\n#### Example of a supercuts file\n\nHere is an example `supercuts.json` file:\n\n```json\n[\n    {\n        \"selections\": \"multiplicity_jet \u003e {0}\",\n        \"st3\": [\n            [2, 15, 1]\n        ]\n    },\n    {\n        \"selections\": \"multiplicity_jet_largeR \u003e {0}\",\n        \"st3\": [\n            [3, 1, -1]\n        ]\n    },\n    {\n        \"selections\": \"multiplicity_topTag_loose \u003e {0}\",\n        \"pivot\": [1]\n    }\n]\n```\n\nHow do we interpret this? This file tells the code that there are 3 branches to apply cuts on: `multiplicity_jet`, `multiplicity_jet_largeR`, and `multiplicity_topTag_loose`. Each object `{...}` represents a branch. In order:\n\n- This is a supercut. **13** cuts will be generated for `multiplicity_jet` starting from `2` to `15` in increments of `1`. This means the cut values (`pivot`) used will be `2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14` (inclusive start, exclusive end, following Python's `range` convention). The `signal_direction` specifies where we expect the signal to be. `\u003e` means to cut on the **right**, so we only keep events with `value \u003e pivot`.\n- This is a supercut. **2** cuts will be generated for `multiplicity_jet_largeR` starting from `3` to `1` in increments of `-1`. This means the cut values (`pivot`) used will be `3, 2` (inclusive start, exclusive end, following Python's `range` convention). The `signal_direction` specifies where we expect the signal to be. 
`\u003e` means to cut on the **right**, so we only keep events with `value \u003e pivot`.\n- This is a fixed cut. **1** cut will be used for `multiplicity_topTag_loose` with a `pivot = 1` and `signal_direction = \u003e`. This means we always select only events with `value \u003e 1`. The `pivot` will be fixed. One could also fix the cut by providing `start`, `stop`, `step` such that it only generates 1 cut, but the script will not identify this as a fixed cut for you when you look up the `cut` using [hash](#actionhash).\n\nThis supercuts file will generate **26** total cuts (`13*2*1 = 26`). Each cut will have an associated hash value and an associated significance which will be recorded to an output file when you run [optimize](#actionoptimize).\n\nIf you wish to provide a fixed cut (the pivot does not change), you simply need to specify the pivot instead. Taking the example shown above, you might have something like\n\n```json\n[\n    {\n        \"selections\": \"multiplicity_jet \u003e= {0}\",\n        \"pivot\": [4]\n    },\n    {\n        \"selections\": \"multiplicity_jet_largeR \u003e {0}\",\n        \"st3\": [\n            [3, 1, -1]\n        ]\n    },\n    {\n        \"selections\": \"multiplicity_topTag_loose \u003e {0}\",\n        \"pivot\": [1]\n    }\n]\n```\n\nwhich tells the code to always apply a cut of `multiplicity_jet \u003e= 4`.\n\n#### More Complicated Selections\n\nOne can certainly provide more complicated selections involving multiple pivots and multiple branches, which makes the optimization considerably more flexible. We use the [formulate](https://github.com/scikit-hep/formulate/) and [numexpr](https://github.com/pydata/numexpr/) packages to parse the selection strings. This supports \"standard\" selection strings as well as those recognized by `TCut`/`TFormula`. Their documentation has examples of what you can do. 
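The following is a minimal sketch of how pivot placeholders and the `13*2*1 = 26` count quoted above fit together. It assumes plain `str.format` substitution of `{0}`, `{1}`, and so on, plus a Cartesian product both over the `st3` rows of one entry and across entries. The helper names are hypothetical, and this is not the package's own code (it needs Python 3.8+ for `math.prod`):

```python
import itertools
import math


def pivots_for(entry):
    # Fixed cut: the pivot is given directly (a number or a list).
    if 'pivot' in entry:
        pivot = entry['pivot']
        return [tuple(pivot) if isinstance(pivot, list) else (pivot,)]
    # Supercut: one [start, stop, step] row per placeholder, expanded
    # like range() and combined as a Cartesian product.
    rows = [range(start, stop, step) for start, stop, step in entry['st3']]
    return list(itertools.product(*rows))


def expand(entry):
    # Substitute each pivot combination into {0}, {1}, ...
    return [entry['selections'].format(*combo) for combo in pivots_for(entry)]


def total_cuts(supercuts):
    # Every entry is combined with every other entry, so counts multiply.
    return math.prod(len(pivots_for(entry)) for entry in supercuts)


supercuts = [
    {'selections': 'multiplicity_jet > {0}', 'st3': [[2, 15, 1]]},
    {'selections': 'multiplicity_jet_largeR > {0}', 'st3': [[3, 1, -1]]},
    {'selections': 'multiplicity_topTag_loose > {0}', 'pivot': [1]},
]

print(total_cuts(supercuts))  # 26
print(expand(supercuts[1]))
# ['multiplicity_jet_largeR > 3', 'multiplicity_jet_largeR > 2']
```

With two `st3` rows, for example a mass scan and a count scan, `pivots_for` returns every pair of values, so the number of cuts grows multiplicatively.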
You still need to specify placeholders for your pivots like below:\n\n```json\n[\n    ...\n    {\n        \"selections\": \"(mass_jets_largeR_1 \u003e {0} \u0026 mass_jets_largeR_2 \u003e {0} \u0026 mass_jets_largeR_3 \u003e {0}) \u003e= {1}\",\n        \"st3\": [\n            [50, 2000, 50],\n            [0, 4, 1]\n        ]\n    },\n    ...\n]\n```\n\nis an example of a more complicated selection. Here, we determine how many of the 3 leading jets pass a mass cut and also apply a cut on that count. The `{0}` pivot placeholder refers to the first `st3` row, `[50, 2000, 50]`, which varies the first pivot `{0}` from 50 GeV up to (but not including) 2 TeV in 50 GeV steps, i.e. 50 to 1950 GeV. The `{1}` pivot placeholder refers to the second `st3` row, `[0, 4, 1]`, which varies the second pivot `{1}` from 0 to 3 in steps of 1. This lets us iterate over all combinations of pivots: the Cartesian product of `[50, 2000, 50]` and `[0, 4, 1]`, which is 39 x 4 = 156 combinations.\n\n## Authors\n- [Giordon Stark](https://github.com/kratsg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkratsg%2Foptimization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkratsg%2Foptimization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkratsg%2Foptimization/lists"}