# Thresher - THRESHold EvaluatoR for Python

![eye illusion old vs young woman face](https://www.shared.com/content/images/2018/09/InkedUntitled-design--5-_LI_GH_content_1150px.jpg)

_That's either a young girl's head or an old woman's face; it all depends on what the brain chooses to see._

_Choose your cut-off point wisely!_

## Project description

A bare pandas implementation of a tool for finding the threshold that maximizes the accuracy
of `predict_proba`-like outputs (e.g. from `scikit-learn`) with respect to the provided ground truth (labels).

_Note: you can jump directly to the sample usage [here](https://github.com/oskar-j/thresher#sample-usage)._

The main method of interest is `optimize_threshold(scores, actual_classes)`, available
from the `Thresher` class. Given _scores_ and _actual classes_, this method
returns the threshold that yields the _**highest fraction** of correctly classified_ samples.

```
optimize_threshold parameters:
  scores: list
    The list of scores.
  actual_classes: list
    The list of ground truth (correct) classes.
    Classes are represented as -1 and 1 (configurable via `labels`).
returns:
  threshold: float
    The threshold value that yields the highest fraction of correctly classified
    samples. If multiple thresholds give the optimal fraction, any of them may be returned.
```

### An oracle mechanism

We implemented a meta-optimizer, an 'oracle' mechanism, which chooses a suitable algorithm based on the provided data. This is the default behaviour, and it can be overridden by changing the `algorithm` parameter of the `Thresher` constructor. See the source code of [oracle.py](https://github.com/oskar-j/thresher/blob/main/thresher/oracle.py) and [interface.py](https://github.com/oskar-j/thresher/blob/main/thresher/interface.py) for more details.

## Implemented algorithms

### Linear search

This is the most basic, iterative approach, recommended for smaller datasets. Every _threshold_ present in the input (in the _scores_ list) is evaluated by calculating the exact accuracy of the _split_ it produces. The threshold that produces the most accurate split is returned.
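
For illustration, here is a minimal pure-Python sketch of the exhaustive search described above. It is not Thresher's actual implementation (which is built on pandas). It assumes labels `-1`/`1` and assumes that a sample is predicted positive when its score is greater than or equal to the threshold; the library may break ties differently. The helper names `accuracy_at` and `linear_search` are hypothetical.

```python
from typing import Optional, Sequence


def accuracy_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of correctly classified samples.

    Assumption: a score >= threshold is predicted as 1, otherwise as -1.
    """
    correct = sum(1 for s, y in zip(scores, labels) if (1 if s >= threshold else -1) == y)
    return correct / len(scores)


def linear_search(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Evaluate every distinct score as a threshold and return the most accurate one."""
    best_threshold: Optional[float] = None
    best_accuracy = -1.0
    for candidate in sorted(set(scores)):
        accuracy = accuracy_at(scores, labels, candidate)
        if accuracy > best_accuracy:  # strict '>' keeps the smallest threshold among ties
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold


cases = [0.1, 0.3, 0.4, 0.7]
actual_labels = [-1, -1, 1, 1]
print(linear_search(cases, actual_labels))  # 0.4
```

This sketch re-scores every candidate, so it runs in O(n²) time. Sorting the scores once and sweeping the threshold while updating the count of correct predictions incrementally would reduce this to O(n log n).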

List of parameters to customize:
* `n_jobs` (default: 1) - set to `-1` to use all available processors except one; any value of `2` or more
enables multiprocessing, while the default value of `1` disables it

### 2-dim Stochastic Gradient Descent

This algorithm uses a naive implementation of the popular 'Stochastic Gradient Descent' algorithm, which iteratively minimizes a function: in our case, an error curve representing the ratio of misclassified samples for a given threshold. Following the gradient, the algorithm moves along the curve towards the optimal value, that is,
a threshold producing the fewest misclassifications. The disadvantage of this algorithm is its questionable robustness: it may
converge to a local optimum instead of the global one.

List of parameters to customize:
* `num_of_iters` (default: 200) - number of iterations during which the algorithm tries to converge
* `stop_thresh` (default: 0.001) - minimal improvement below which the algorithm stops
* `alpha` (default: 0.01)

### Evolutionary algorithm

This is a simulation approach based on an evolutionary algorithm. It works by simulating multiple generations of a "population" of candidate solutions. During every iteration of a generation, the algorithm stochastically evaluates the candidate solutions. At the end of a generation, the least fit agents (solutions) are removed from the population, and _crossover_ between the remaining solutions produces new "offspring" candidate solutions. The offspring may also mutate, which adds an element of randomness.
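
The evolutionary loop described above can be sketched as follows. The parameter names mirror the list that follows, but the operators are assumptions, not Thresher's code. Agents are candidate thresholds initialized uniformly over the score range, and fitness is accuracy accumulated over random subsamples of size `stoch_ratio`. The `sus_factor` least-fit agents are dropped, and each offspring is the average of two random survivors (arithmetic crossover). Mutation adds Gaussian noise scaled by `mutation_factor`. `optimized_start` is not modeled, and `evolve_threshold` and its `seed` argument are hypothetical.

```python
import random
from typing import List, Sequence


def accuracy_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of correct predictions, assuming score >= threshold means label 1 (else -1)."""
    correct = sum(1 for s, y in zip(scores, labels) if (1 if s >= threshold else -1) == y)
    return correct / len(scores)


def evolve_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    population_size: int = 30,
    number_of_generations: int = 20,
    number_of_iterations: int = 10,
    sus_factor: int = 2,
    stoch_ratio: float = 0.02,
    mutation_chance: float = 0.05,
    mutation_factor: float = 0.10,
    seed: int = 0,  # hypothetical: makes the sketch reproducible
) -> float:
    if population_size - sus_factor < 2:
        raise ValueError("need at least two surviving agents for crossover")
    rng = random.Random(seed)
    lo, hi = min(scores), max(scores)
    data = list(zip(scores, labels))
    sample_size = max(1, int(len(data) * stoch_ratio))

    # Assumption: initial agents are thresholds drawn uniformly from the score range.
    population: List[float] = [rng.uniform(lo, hi) for _ in range(population_size)]

    for _ in range(number_of_generations):
        # Stochastic evaluation: each iteration scores every agent on a fresh random subsample.
        fitness = [0.0] * population_size
        for _ in range(number_of_iterations):
            batch_scores, batch_labels = zip(*rng.sample(data, sample_size))
            for i, threshold in enumerate(population):
                fitness[i] += accuracy_at(batch_scores, batch_labels, threshold)

        # Rank agents best-first and drop the `sus_factor` least fit ones.
        ranked = [t for _, t in sorted(zip(fitness, population), reverse=True)]
        survivors = ranked[: population_size - sus_factor]

        # Refill the population with offspring of random survivor pairs.
        offspring: List[float] = []
        while len(survivors) + len(offspring) < population_size:
            a, b = rng.sample(survivors, 2)
            child = (a + b) / 2  # assumption: arithmetic crossover
            if rng.random() < mutation_chance:
                # Assumption: mutation is Gaussian noise scaled by the score range.
                child += rng.gauss(0.0, mutation_factor * (hi - lo))
            offspring.append(min(max(child, lo), hi))  # keep thresholds inside the score range
        population = survivors + offspring

    # Final choice: re-score the last population on the full data set.
    return max(population, key=lambda t: accuracy_at(scores, labels, t))


scores = [i / 100 for i in range(100)]
labels = [1 if s >= 0.5 else -1 for s in scores]
best = evolve_threshold(scores, labels, stoch_ratio=0.2)
print(best, accuracy_at(scores, labels, best))
```

In this sketch, survivors are carried over unchanged (a form of elitism), so a noisy generation cannot lose the best threshold found so far. Only the final pick is scored on the full data.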

List of parameters to customize:
* `population_size` (default: 30) - number of agents in the simulation
* `number_of_generations` (default: 20) - number of generations
* `number_of_iterations` (default: 10) - number of iterations per generation
* `sus_factor` (default: 2) - how many least-fit agents should be left childless at the end of a generation
* `stoch_ratio` (default: 0.02) - fraction of the data used to evaluate the fitness of a single agent per iteration
* `optimized_start` (default: True)
* `mutation_chance` (default: 0.05)
* `mutation_factor` (default: 0.10)

### Grid search

Added in version `0.1.2`. This algorithm generates a grid of candidate solutions, with a granularity set
by the `no_of_decimal_places` parameter. All candidate solutions are evaluated exhaustively
and the best one is chosen at the end.

List of parameters to customize:
* `no_of_decimal_places` (default: 2) - the grid is generated by rounding values to the given number of decimal places

### Stochastic Grid search

Added in version `0.1.2`. This algorithm works like the 'Grid search' method above, except that
every point of the grid is evaluated only partially, on a fraction of the data controlled by the `stoch_ratio` parameter.

List of parameters to customize:
* `no_of_decimal_places` (default: 2) - the grid is generated by rounding values to the given number of decimal places
* `stoch_ratio` (default: 0.05) - fraction of the data used to evaluate the fitness of a candidate value in the grid
* `reshuffle` (default: False) - whether the random projection should be recalculated at every step

## How to set up?

The process is straightforward: you just need to decide whether to install
from source (latest revision) or from the PyPI repository (stable release).

### Requirements

Tested with Python `3.7+` on a standard Unix environment.

### Installation

Installation from source:

```
pip install git+https://github.com/oskar-j/thresher.git
```

Stable release using `pip`:

```
pip install thresher-py
```

## Custom parameters

Additional parameters can be provided to the `Thresher` constructor:

```python
from thresher import Thresher

t = Thresher(algorithm='auto',
             allow_parallel=True,
             verbose=False,
             progress_bar=False,
             labels=(0, 1))
```

Here is what each parameter does:

* **algorithm** (default value: `'auto'`) - lets you manually choose the algorithm from the list of available algorithms.
The same effect can be achieved by calling `set_algorithm(algorithm_name)` on the `Thresher` instance.
The default value is `'auto'`, which means that the tool uses the oracle mechanism to choose a suitable algorithm automatically.
* **allow_parallel** (default value: `True`) - enables/disables multiprocessing for the algorithms
* **verbose** (default value: `False`) - enables verbose output
* **progress_bar** (default value: `False`) - shows a progress bar in the terminal (if supported by the algorithm)
* **labels** - necessary if your labels differ from `(-1, 1)`; the first item of the tuple/list is the negative label,
and the second item is the positive label

### Control parameters for the algorithms

Some of the algorithms described above accept custom parameters.
They should be provided as a dictionary in the `algorithm_params` parameter.
If no custom parameters are provided, the default values apply.

Examples:

```python
import thresher

t = thresher.Thresher(algorithm_params={'n_jobs': 3})
```

```python
import thresher

t = thresher.Thresher(algorithm_params={'no_of_decimal_places': 3,
                                        'stoch_ratio': 0.10})
```

## Sample usage

```python
import thresher

t = thresher.Thresher()

print('Currently supported algorithms:')
print(t.get_supported_algorithms())

cases = [0.1, 0.3, 0.4, 0.7]
actual_labels = [-1, -1, 1, 1]

print(f'Optimization result: {t.optimize_threshold(cases, actual_labels)}')
```

See the [examples](https://github.com/oskar-j/thresher/tree/main/examples) directory for more sample code.

## Performance tests

A very basic performance test (with 10 repeats, on real-world [anonymized data](https://github.com/oskar-j/thresher/blob/main/examples/performance_test/milion_samples.7z) consisting of `10^6` rows) can be found in the notebook [located here](https://github.com/oskar-j/thresher/blob/main/examples/performance_test/TresherPerformanceTest.ipynb).
A similar experiment with more iterations, designed to test the oracle, was conducted in [TresherPerformanceTestExtended.ipynb](https://github.com/oskar-j/thresher/blob/main/examples/performance_test/TresherPerformanceTestExtended.ipynb).

## Future work

* adding more algorithms,
* publishing on conda,
* heavier test loads,
* Python docs,
* CI/CD pipeline for automated tests.