{"id":22344721,"url":"https://github.com/languagemachines/paramsearch","last_synced_at":"2025-03-26T10:13:27.933Z","repository":{"id":17718471,"uuid":"20543661","full_name":"LanguageMachines/paramsearch","owner":"LanguageMachines","description":"Automated parameter optimisation for Timbl","archived":false,"fork":false,"pushed_at":"2014-12-08T15:30:18.000Z","size":232,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-17T16:19:03.671Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LanguageMachines.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-06-05T22:24:38.000Z","updated_at":"2015-11-23T10:44:18.000Z","dependencies_parsed_at":"2022-08-25T09:21:38.229Z","dependency_job_id":null,"html_url":"https://github.com/LanguageMachines/paramsearch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LanguageMachines%2Fparamsearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LanguageMachines%2Fparamsearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LanguageMachines%2Fparamsearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LanguageMachines%2Fparamsearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LanguageMachines","download_url":"https://codeload.github.com/LanguageMachines/paramsearch/tar.gz/refs/heads/master"
,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245632418,"owners_count":20647194,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-04T09:14:54.672Z","updated_at":"2025-03-26T10:13:27.908Z","avatar_url":"https://github.com/LanguageMachines.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"paramsearch 1.3\n\ncopyright (c) 2003-2011, Antal van den Bosch\n\n    paramsearch is free software; you can redistribute it and/or modify\n    it under the terms of the GNU General Public License as published by\n    the Free Software Foundation; either version 3 of the License, or\n    (at your option) any later version.\n    \n    paramsearch is distributed in the hope that it will be useful,\n    but WITHOUT ANY WARRANTY; without even the implied warranty of\n    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
See the\n    GNU General Public License for more details.\n    \n    You should have received a copy of the GNU General Public License\n    along with this program; if not, see \u003chttp://www.gnu.org/licenses/\u003e.\n    \nAuthor: Antal van den Bosch / antalb@uvt.nl / http://ilk.uvt.nl\n\n\nGENERAL INFORMATION\n\nOn the basis of an instance base (a data file containing a list of\nexamples of some classification task, where one example is represented\nby a list of feature values and a class label), paramsearch produces a\ncombination of algorithmic parameters of a machine learning algorithm\nof choice, estimated to do well on unseen material from the same\nsource as the input instance base. Paramsearch implements two\nheuristics for search in multi-dimensional algorithmic parameter\nspaces:\n\n1. cross-validated classifier wrapping, recombining parameter settings\n   pseudo-exhaustively, for small data sets (less than 1000 instances);\n2. wrapped progressive sampling for larger data sets (\u003e=1000 instances).\n\nThe main goal of paramsearch is to provide a fast approximation of\nutopian exhaustive parameter optimisation-by-validation. While small\ndata sets do allow for some pseudo-exhaustive classifier wrapping, for\nlarger data sets paramsearch uses wrapped progressive sampling, a\ncombination of classifier wrapping and progressive sampling.\n\nA reasonably working metaphor for wrapped progressive sampling is that\nof a competition among mountaineers climbing a mountain. At the start\nof the competition, some thousand mountaineers run up the foot of the\nmountain and start to climb. After some time, the group of\nmountaineers that has been the least successful in terms of the height\nthey reached (established by a one-dimensional clustering along the\nheight dimension) is dropped out of the competition, and the race\ncontinues with the remaining contestants, higher up the mountain,\nwhere climbing becomes increasingly difficult. 
This\ncompetition-in-stages is repeated until only one mountaineer is left,\nor the top of the mountain is reached by a group of mountaineers, in\nwhich case one random contestant from this group is selected.\n\nWrapped progressive sampling always starts with a 500-instance\ntraining set and a 100-instance test set. During the progressive\nsampling steps, the training set is exponentially grown to 80% of the\nfull training set, while the size of the test set is synchronously\ngrown at 20% of the size of the training set.\n\nParamsearch currently supports the following machine learning\nalgorithms: IB1, Fambl, SVM-light, Ripper, Zhang Le's Maxent, C4.5,\nWinnow, and Perceptron (the latter two using SNoW). More details\nbelow.\n\n\nINSTALLATION\n\nUnpack the paramsearch.1.3.tar.gz tarball, go to the paramsearch\ndirectory, and type \"make\":\n\n\u003e tar zxf paramsearch.1.3.tar.gz\n\u003e cd paramsearch.1.3\n\u003e make\n\nThis generates a bunch of executables. Install the \"paramsearch\"\nexecutable in a directory in your $PATH, e.g. /usr/local/bin if you\nhave permission.\n\n\u003e cp paramsearch /usr/local/bin\n\nThen, set the environment variable PARAMSEARCH_DIR to the paramsearch\ndirectory.\n\nIn (t)csh: \u003e setenv PARAMSEARCH_DIR /the/path/to/paramsearch\nIn bash:   \u003e export PARAMSEARCH_DIR=/the/path/to/paramsearch\n\n\nUSAGE\n\n\u003e paramsearch \u003calgorithm\u003e \u003ctrainingfile\u003e [extra]\n\nWhere \u003calgorithm\u003e is ib1, ib1-bin, ibn, fambl, igtree, tribl2,\nsvmlight, ripper, c4.5, winnow, or perceptron. \u003calgorithm\u003e is supposed\nto be installed, and present in $PATH. [extra] either contains an\noptional metric modifier for IB1, or the (obligatory!) 
number of\nclasses for Winnow or Perceptron (the underlying software emulating\nWinnow and Perceptron, SNoW, demands a user specification of the\nnumber of classes).\n\nAn optional metric modifier for IB1 is attached to -mO, -mM or -mJ,\nand can be used to ignore features or overrule similarity function\nsettings for certain features (e.g. declare them as numeric). An\nexample metric setting in which features 1 and 2 are ignored, and 4 is\ndeclared to be numeric: :I1,2:N4 .\n\nMore details on the algorithms:\n\n- ib1        IB1, from the TiMBL software package. Required \n             version: 6.4 or higher. Current \n             tested version: 6.4.\n             See http://ilk.uvt.nl/timbl\n- ib1-bin    An emulation of IB1 inside the Fambl package,\n             with automatically binarized multi-valued features. \n             Required version: 2.1.10 or higher. \n             See http://ilk.uvt.nl/~antalb\n- ibn        IB1 from TiMBL with numeric feature metrics only.\n             Tested version: 6.4. See http://ilk.uvt.nl/timbl\n- igtree     IGTree, from the TiMBL software package. Tested\n  \t     version: 6.4. See http://ilk.uvt.nl/timbl\n- tribl2     TRIBL2, from the TiMBL software package. Tested\n  \t     version: 6.4. See http://ilk.uvt.nl/timbl\n- ib1-sparse IB1 with sparse feature coding (-F Sparse). Tested \n  \t     version: 6.4. See http://ilk.uvt.nl/timbl\n- fambl      Fambl. Required version: 2.1.10 or higher.\n             See http://ilk.uvt.nl/~antalb\n- svmlight   SVM-Light by Thorsten Joachims. Tested version: 5.00. \n             See http://svmlight.joachims.org \n- ripper     Ripper by William Cohen. Tested version: V1 release\n             2.5 (patch 1). \n             See http://www.wcohen.com\n- maxent     Maximum entropy classifier by Zhang Le. Tested\n             version: 20041229. See \n             http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html\n- c4.5       C4.5 by J. Ross Quinlan. 
Tested version: release 8.\n             See http://www.cse.unsw.edu.au/~quinlan/ \n- winnow     Sparse Network of Winnow as implemented in SNoW by\n             Carlson, Cumby, Rosen, \u0026 Roth. Tested version: 3.1.3\n             See http://l2r.cs.uiuc.edu/~danr/snow.html\n- perceptron Perceptron (sparse implementation) as implemented\n             in SNoW by Carlson, Cumby, Rosen, \u0026 Roth. Tested \n             version: 3.1.3. \n             See http://l2r.cs.uiuc.edu/~danr/snow.html\n\nNote that \n\n(1) \u003ctrainingfile\u003e must adhere to the data formatting \n    requirements of \u003calgorithm\u003e (which are quite different among the\n    current set of supported algorithms; paramsearch does not do\n    any conversion)\n\n(2) C4.5 expects a \"names\" file declaring all feature value and\n    class names present in training and test material. This should\n    actually be named \u003cfilestem\u003e.data.names for paramsearch to operate\n    correctly. (TiMBL or Ripper can generate names files automatically).\n\n(3) C4.5 and Ripper return errors, not accuracies.\n\n(4) Winnow and Perceptron demand a third command line argument stating \n    the number of classes.\n\n\nADDITIONAL TOOLS\n\nThe paramsearch distribution contains the following tools:\n\n* runfull-\u003calgorithm\u003e: runs a full experiment with a \u003ctrainingfile\u003e\nand a \u003ctestfile\u003e, using the settings as logged in\n\u003ctrainingfile\u003e.\u003calgorithm\u003e.bestsetting. This saves you from typing the\nbest settings found in the *.bestsetting file.\n\n* make_binary.simpel.pl: written by Iris Hendrickx. Rewrites a TiMBL\ndata file into a \"binary\" file suited either for SVMLight, Maxent, or\nSNoW (winnow and perceptron). 
 \n\nUsage: perl make_binary.simpel.pl [svm | snow]  trainfile ( testfile )\n\nSee the file for more information.\n\n\nACKNOWLEDGEMENTS\n\nThanks to Iris Hendrickx, Walter Daelemans, Dan Roth, William Cohen,\nZhang Le, Erwin Marsi, Piroska Lendvai, Erik Tjong Kim Sang, Bertjan\nBusser, Ko van der Sloot, Jakub Zavrel, Sabine Buchholz, Jorn\nVeenstra, Sander Canisius, Menno van Zaanen, Anders Noklestad, Pei-yun\nHsueh, Guy De Pauw, Gunn Inger Lyse, Svetoslav Marinov, Handre\nGroenewald, Kim Luyckx, and Maarten van Gompel for merciless testing\nand for passing along pieces of the puzzle.\n\n\nBUGS\n\nThis software is in development and may contain errors. Please send\nall bug reports to Antal van den Bosch \u003cAntal.vdnBosch@uvt.nl\u003e.\n\n\nDISCLAIMER\n\nParamsearch comes WITHOUT ANY WARRANTY. Neither the author nor the\ndistributor accepts responsibility to anyone for the consequences of\nusing it or for whether it serves any particular purpose or works at\nall.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flanguagemachines%2Fparamsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flanguagemachines%2Fparamsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flanguagemachines%2Fparamsearch/lists"}