{"id":13656009,"url":"https://github.com/proycon/python-timbl","last_synced_at":"2025-04-15T06:53:35.344Z","repository":{"id":6887186,"uuid":"8136669","full_name":"proycon/python-timbl","owner":"proycon","description":"python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.","archived":false,"fork":false,"pushed_at":"2025-01-22T14:08:36.000Z","size":169,"stargazers_count":18,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-15T06:53:26.453Z","etag":null,"topics":["k-nearest-neighbours","knn","machine-learning","python","timbl"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/proycon.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-02-11T11:07:42.000Z","updated_at":"2025-01-22T13:53:27.000Z","dependencies_parsed_at":"2024-12-18T12:21:54.754Z","dependency_job_id":"428f8dba-84c8-4e80-9eeb-8e188cb2884d","html_url":"https://github.com/proycon/python-timbl","commit_stats":{"total_commits":157,"total_committers":2,"mean_commits":78.5,"dds":0.07643312101910826,"last_synced_commit":"acdb8d57c6c7a444a74ab28fd00ed499879b6201"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/proycon%2Fpython-timbl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proycon%2Fpython-timbl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proycon%2Fpython-timbl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proycon%2Fpython-timbl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/proycon","download_url":"https://codeload.github.com/proycon/python-timbl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249023711,"owners_count":21199958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["k-nearest-neighbours","knn","machine-learning","python","timbl"],"created_at":"2024-08-02T04:00:45.779Z","updated_at":"2025-04-15T06:53:35.337Z","avatar_url":"https://github.com/proycon.png","language":"Python","readme":".. image:: http://applejack.science.ru.nl/lamabadge.php/python-timbl\n   :target: http://applejack.science.ru.nl/languagemachines/\n\n.. image:: https://www.repostatus.org/badges/latest/active.svg\n   :alt: Project Status: Active – The project has reached a stable, usable state and is being actively developed.\n   :target: https://www.repostatus.org/#active\n\n.. 
image:: https://zenodo.org/badge/8136669.svg\n   :target: https://zenodo.org/badge/latestdoi/8136669\n\n======================\n README: python-timbl\n======================\n\n:Authors: Sander Canisius, Maarten van Gompel\n:Contact: proycon@anaproy.nl\n:Web site: https://github.com/proycon/python-timbl/\n\npython-timbl is a Python extension module wrapping the full TiMBL C++\nprogramming interface. With this module, all functionality exposed\nthrough the C++ interface is also available to Python scripts. Being\nable to access the API from Python greatly facilitates prototyping\nTiMBL-based applications.\n\nThis is the 2013 release by Maarten van Gompel, building on the 2006 release by Sander Canisius. For those used to the old library, there is one backwards-incompatible change: adapt your scripts to use ``import timblapi`` instead of ``import timbl``, as the latter is now a higher-level interface.\n\nSince 2020, python-timbl supports only Python 3; Python 2 support has been deprecated.\n\nLicense\n=======\n\npython-timbl is free software, distributed under the terms of the GNU `General\nPublic License`_. Please cite TiMBL in publications of research that uses\nTiMBL.\n\n.. _General Public License: http://www.gnu.org/licenses/gpl.html\n\nInstallation\n============\n\nIn a Python virtual environment, run::\n\n\tpip install python3-timbl\n\nNote that on macOS, wheel packages are currently only available for Python\n3.13, as this is the Python version Homebrew uses in linking libboost-python.\n\nIf no wheels (binary packages) are available for your system, then pip will\nattempt to compile from source. In that case, the following dependencies\nare required:\n\npython-timbl depends on two external packages, which must have been built\nand/or installed on your system in order to successfully build python-timbl.\nThe first is TiMBL itself; download its tarball from TiMBL's homepage and\nfollow the installation instructions.  
The second prerequisite is Boost.Python, a library that facilitates writing\nPython extension modules in C++. Many Linux distributions come with prebuilt\npackages of Boost.Python. If so, install this package; on Ubuntu/Debian this\ncan be done as follows::\n\n\t$ sudo apt-get install libboost-python libboost-python-dev\n\n\nUsage\n=======\n\npython-timbl offers two interfaces to the TiMBL API: a low-level interface contained in the module ``timblapi``, which is very much like the C++ library, and a high-level object-oriented interface in the ``timbl`` module, which offers a ``TimblClassifier`` class.\n\ntimbl.TimblClassifier: High-level interface\n----------------------------------------------\n\nThe high-level interface features a ``TimblClassifier`` class, which can be used for training and testing classifiers. An example is provided in ``example.py``; parts of it are discussed here.\n\nAfter importing the necessary module, the classifier is instantiated by passing it an identifier, which will be used as a prefix for all filenames written, and a string containing options just as you would pass them to Timbl::\n\n\timport timbl\n\tclassifier = timbl.TimblClassifier(\"wsd-bank\", \"-a 0 -k 1\" )\n\nNormalization of the class distribution is enabled by default (regardless of the ``-G`` option to Timbl); pass ``normalize=False`` to disable it.\n\nTraining instances can be added using the ``append(featurevector, classlabel)`` method::\n\n\tclassifier.append( (1,0,0), 'financial')\n\tclassifier.append( (0,1,0), 'furniture')\n\tclassifier.append( (0,0,1), 'geographic')\n\nNext, invoke the actual training. Note that at each step Timbl may write considerable detail about what it is doing to standard error::\n\n\tclassifier.train()\n\nThe result of this training is an instance base, which you can save to file so you can load it again later::\n\n\tclassifier.save()\n\n\tclassifier = timbl.TimblClassifier(\"wsd-bank\", \"-a 0 -k 1\" 
)\n\tclassifier.load()\n\n\n\nThe main advantage of the Python library is that you can classify instances on the fly: pass a feature vector, and optionally also a class label, to ``classify(featurevector, classlabel)``::\n\n\tclasslabel, distribution, distance = classifier.classify( (1,0,0) )\n\nYou can also create a test file and test it all at once::\n\n\tclassifier = timbl.TimblClassifier(\"wsd-bank\", \"-a 0 -k 1\" )\n\tclassifier.load()\n\tclassifier.addinstance(\"testfile\", (1,0,0),'financial' ) #addinstance can be used to add instances to external files (use append() for training)\n\tclassifier.addinstance(\"testfile\", (0,1,0),'furniture' )\n\tclassifier.addinstance(\"testfile\", (0,0,1),'geographic' )\n\tclassifier.addinstance(\"testfile\", (1,1,0),'geographic' ) #this one will be wrongly classified as financial \u0026 furniture\n\tclassifier.test(\"testfile\")\n\n\tprint(\"Accuracy: \", classifier.getAccuracy())\n\n\nReal multithreading support\n-----------------------------\n\nIf you are writing a multithreaded Python application (i.e. using the\n``threading`` module) and want to benefit from actual concurrency,\nside-stepping Python's Global Interpreter Lock, add the parameter\n``threading=True`` when invoking the ``TimblClassifier`` constructor.  Take\ncare to instantiate ``TimblClassifier`` *before* threading. You can then call\n``TimblClassifier.classify()`` from within your threads.  Concurrency only\nexists for this ``classify`` method.\n\nIf you do not set this option, everything will still work fine, but you won't benefit\nfrom actual concurrency due to Python's Global Interpreter Lock.\n\n\ntimblapi: Low-level interface\n-------------------------------\n\nFor documentation on the low-level ``timblapi`` interface you can consult the TiMBL API guide.  
Although this document actually describes the C++ interface to TiMBL, the latter is similar enough to its Python binding for this document to be a useful reference for python-timbl as well. For the most part, the Python TiMBL interface follows the C++ version closely. The differences are listed below.\n\n**Naming style**\n\nIn the C++ interface, method names are in *UpperCamelCase*; for example, ``Classify``, ``SetOptions``, etc. In contrast, the Python interface uses *lowerCamelCase*: ``classify``, ``setOptions``, etc.\n\n**Method overloading**\n\nTiMBL's ``Classify`` methods use the C++ method overloading feature to provide three different kinds of output. Method overloading does not exist in Python; therefore, python-timbl has three differently named methods to mirror the functionality of the overloaded ``Classify`` method. The mapping is as follows::\n\n\t# bool TimblAPI::Classify(const std::string\u0026 Line,\n\t#                         std::string\u0026 result);\n\t#\n\tdef TimblAPI.classify(line) -\u003e bool, result\n\n\t#\n\t# bool TimblAPI::Classify(const std::string\u0026 Line,\n\t#                         std::string\u0026 result,\n\t#                         double\u0026 distance);\n\t#\n\tdef TimblAPI.classify2(line) -\u003e bool, string, distance\n\n\t#\n\t# bool TimblAPI::Classify(const std::string\u0026 Line,\n\t#                         std::string\u0026 result,\n\t#                         std::string\u0026 Distrib,\n\t#                         double\u0026 distance);\n\t#\n\tdef TimblAPI.classify3(line, bool normalize=true,int requireddepth=0) -\u003e bool, string, dictionary, distance\n\n\t# Thread-safe version of the above, releases and reacquires Python's Global Interpreter Lock\n\tdef TimblAPI.classify3safe(line, normalize, requireddepth=0) -\u003e bool, string, dictionary, distance\n\n\nNote that the ``classify3`` function returned a string representation of the\ndistribution in versions of python-timbl prior to 2015.08.12; it now returns 
an\nactual dictionary. When using ``classify3safe`` (the thread-safe version),\nensure you first call initthreads after instantiating ``timblapi``, and\nmanually call the ``initthreading()`` method.\n\n\n**Python-only methods**\n\nThree TiMBL API methods print information to a standard C++ output stream object (``ShowBestNeighbors``, ``ShowOptions``, ``ShowSettings``). In the Python interface, these methods will only work with Python (stream) objects that have a ``fileno`` method returning a valid file descriptor. Alternatively, three new methods are provided (bestNeighbo(u)rs, options, settings); these methods return the same information as a Python string object.\n\n\n**scikit-learn wrapper**\n\nA wrapper for use in scikit-learn has been added. It was designed for use in scikit-learn Pipeline objects. The wrapper is not finished and has to date only been tested on sparse data. Note that TiMBL does not work well with large numbers of features. It is suggested to reduce the number of features to below 100 to keep system performance reasonable. Use on servers with ample memory and many processing cores is advised.\n","funding_links":[],"categories":["Python","Uncategorized"],"sub_categories":["General-Purpose Machine Learning","Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproycon%2Fpython-timbl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fproycon%2Fpython-timbl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproycon%2Fpython-timbl/lists"}