{"id":23487113,"url":"https://github.com/lisitsyn/tapkee","last_synced_at":"2025-04-06T20:13:36.919Z","repository":{"id":5234954,"uuid":"6411898","full_name":"lisitsyn/tapkee","owner":"lisitsyn","description":"A flexible and efficient С++ template library for dimension reduction","archived":false,"fork":false,"pushed_at":"2024-05-19T12:40:04.000Z","size":1315,"stargazers_count":231,"open_issues_count":16,"forks_count":58,"subscribers_count":23,"default_branch":"main","last_synced_at":"2024-05-19T14:26:27.541Z","etag":null,"topics":["dimension-reduction","machine-learning"],"latest_commit_sha":null,"homepage":"http://tapkee.lisitsyn.me","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"CyanogenMod/android_libcore","license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lisitsyn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-10-26T23:35:36.000Z","updated_at":"2024-05-20T17:23:33.038Z","dependencies_parsed_at":"2024-05-20T17:23:23.362Z","dependency_job_id":null,"html_url":"https://github.com/lisitsyn/tapkee","commit_stats":{"total_commits":605,"total_committers":9,"mean_commits":67.22222222222223,"dds":0.0760330578512397,"last_synced_commit":"a755ddb772492591571147cccc43938317ec7389"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lisitsyn%2Ftapkee","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lisitsyn%2Ftapkee/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lisitsyn%2Ftapkee/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lisitsyn%2Ftapkee/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lisitsyn","download_url":"https://codeload.github.com/lisitsyn/tapkee/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247543595,"owners_count":20955865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dimension-reduction","machine-learning"],"created_at":"2024-12-24T22:25:13.803Z","updated_at":"2025-04-06T20:13:36.885Z","avatar_url":"https://github.com/lisitsyn.png","language":"C++","readme":"Tapkee is a C++ template library for dimensionality reduction with some bias towards\nspectral methods. Tapkee originates from code developed during\n[GSoC 2011](http://www.google-melange.com/gsoc/homepage/google/gsoc2011) as\npart of the [Shogun machine learning toolbox](https://github.com/shogun-toolbox/shogun).\nThe project aims to provide an efficient and flexible standalone library for\ndimensionality reduction which can be easily integrated into existing codebases.\nTapkee leverages the capabilities of the efficient [Eigen3 linear algebra library](http://eigen.tuxfamily.org) and\noptionally makes use of the [ARPACK eigensolver](http://www.caam.rice.edu/software/ARPACK/).\nThe library uses CoverTree and VP-tree data structures to compute nearest neighbors. 
To achieve\ngreater flexibility we provide a callback interface which decouples dimension reduction algorithms from\nthe data representation and storage schemes.\n\nThe library is distributed under the permissive\n[BSD 3-clause license](https://github.com/lisitsyn/tapkee/blob/master/include/LICENSE)\n(except for a few optional parts that are distributed under other\nopen source licenses, see the Licensing section of this document). If you use\nthis software in any publication we would be happy if you cite the following paper:\n\n\tSergey Lisitsyn and Christian Widmer and Fernando J. Iglesias Garcia. Tapkee: An Efficient Dimension Reduction Library. Journal of Machine Learning Research, 14: 2355-2359, 2013.\n\nTo get started with dimension reduction you may try the\n[go.py script](https://github.com/lisitsyn/tapkee/blob/master/examples/go.py)\nthat embeds common datasets (swissroll, helix, scurve) using\nthe Tapkee library and plots the result with the\nMatplotlib library. To use the script, build the\nsample application (see the Command line application section for more details)\nand call go.py with the following command:\n\n\t./examples/go.py [swissroll|helix|scurve|...] [lle|isomap|...]\n\nYou may also try out a minimal example using `make minimal` (examples/minimal)\nand the RNA example using `make rna` (examples/rna). There are also a few graphical\nexamples. To run the MNIST digits embedding example use `make mnist` (examples/mnist),\nto run the promoters embedding example use `make promoters` (examples/promoters)\nand to run the embedding for the faces dataset use `make faces` (examples/faces).\nAll graphical examples require Matplotlib, which can usually be\ninstalled with a package manager. The promoters example also\nhas a non-trivial dependency on the Shogun machine learning toolbox\n(minimal version is 2.1.0). We also provide\nsome examples of using Tapkee in Shogun as the\n`make langs` (examples/langs) example.\n\nAPI\n---\n\nWe provide an interface based on the method chaining technique. 
The chain starts with a call\nto the `with(const ParametersSet\u0026)` method, which is used to provide parameters such as the method\nto use and its settings. The provided argument is formed with the following syntax:\n\n\t(keyword1=value1, keyword2=value2)\n\nThis syntax is possible due to comma operator overloading, which groups all assigned keywords\ninto a comma-separated list.\n\nKeywords are defined in the `tapkee` namespace. Currently, the following keywords\nare defined: `method`, `eigen_method`, `neighbors_method`, `num_neighbors`, `target_dimension`,\n`diffusion_map_timesteps`, `gaussian_kernel_width`, `max_iteration`, `spe_global_strategy`,\n`spe_num_updates`, `spe_tolerance`, `landmark_ratio`, `nullspace_shift`, `klle_shift`,\n`check_connectivity`, `fa_epsilon`, `progress_function`, `cancel_function`, `sne_perplexity`,\n`sne_theta`. See the documentation for their detailed meaning.\n\nFor example, to use the Isomap\nalgorithm with the number of neighbors set to 15:\n\n\ttapkee::with((method=Isomap,num_neighbors=15))\n\nPlease note that the inner parentheses are necessary: without them the commas\nwould be parsed as function argument separators rather than as the overloaded comma operator.\n\nNext, you may either embed the provided matrix with:\n\n\ttapkee::with((method=Isomap,num_neighbors=15)).embedUsing(matrix);\n\nOr provide callbacks (kernel, distance and features) using any combination\nof the `withKernel(KernelCallback)`, `withDistance(DistanceCallback)` and\n`withFeatures(FeaturesCallback)` member functions:\n\n\ttapkee::with((method=Isomap,num_neighbors=15))\n\t       .withKernel(kernel_callback)\n\t       .withDistance(distance_callback)\n\t       .withFeatures(features_callback)\n\nOnce callbacks are initialized you may either embed data using an\nSTL-compatible sequence of indices or objects (that supports the\n`begin()` and `end()` methods to obtain the corresponding iterators)\nwith the `embedUsing(Sequence)` member function\nor embed the data using 
a sequence range with the\n`embedRange(RandomAccessIterator, RandomAccessIterator)`\nmember function.\n\nTo summarize, here are a few examples:\n\n\tTapkeeOutput output = with((method=Isomap,num_neighbors=15))\n\t    .embedUsing(matrix);\n\n\tTapkeeOutput output = with((method=Isomap,num_neighbors=15))\n\t    .withDistance(distance_callback)\n\t    .embedUsing(indices);\n\n\tTapkeeOutput output = with((method=Isomap,num_neighbors=15))\n\t    .withDistance(distance_callback)\n\t    .embedRange(indices.begin(),indices.end());\n\nMinimal example\n---------------\n\nA minimal working example of a program that uses the library is:\n\n\t#include \u003ctapkee/tapkee.hpp\u003e\n\t#include \u003ctapkee/callbacks/dummy_callbacks.hpp\u003e\n\n\tusing namespace std;\n\tusing namespace tapkee;\n\n\tstruct MyDistanceCallback\n\t{\n\t\tScalarType distance(IndexType l, IndexType r) { return abs(l-r); }\n\t};\n\n\tint main(int argc, const char** argv)\n\t{\n\t\tconst int N = 100;\n\t\tvector\u003cIndexType\u003e indices(N);\n\t\tfor (int i=0; i\u003cN; i++) indices[i] = i;\n\n\t\tMyDistanceCallback d;\n\n\t\tTapkeeOutput output = tapkee::with((method=MultidimensionalScaling,target_dimension=1))\n\t\t   .withDistance(d)\n\t\t   .embedUsing(indices);\n\n\t\tcout \u003c\u003c output.embedding.transpose() \u003c\u003c endl;\n\t\treturn 0;\n\t}\n\nThis example requires Tapkee to be in the include path. With Linux compilers\nyou may do that with the `-I/path/to/tapkee/headers/folder` option.\n\nIntegration\n-----------\n\nThere are a few issues to consider when including the Tapkee library in your code. First, if (and only if) your project\nalready includes Eigen3, you might need to let Tapkee\nknow about that with the following define:\n\n`#define TAPKEE_EIGEN_INCLUDE_FILE \u003cpath/to/your/eigen/include/file.h\u003e`\n\nPlease note that if you don't include Eigen3 in your project there is no need to define that variable;\nin this case Eigen3 will be included by Tapkee. 
This issue comes from the need to include the Eigen3 library\nonly once when using some specific parameters (like debug and extensions).\n\nIf your project can accept code under other licenses (such as LGPLv3) you may define\nthe following variable:\n\n- `TAPKEE_USE_LGPL_COVERTREE` to use Covertree code by John Langford.\n\nWhen compiling software that includes Tapkee, make sure the Eigen3 headers are in the include path and your code\nis linked against the ARPACK library (the `-larpack` flag for g++ and clang++).\n\nFor an example of integration you may check the\n[Tapkee adapter in Shogun](https://github.com/shogun-toolbox/shogun/blob/master/src/shogun/lib/tapkee/tapkee_shogun.cpp).\n\nWhen working with installed headers you may check which version of the library\nyou have by checking the values of the `TAPKEE_WORLD_VERSION`, `TAPKEE_MAJOR_VERSION`\nand `TAPKEE_MINOR_VERSION` defines.\n\nWe welcome any integration, so please contact the authors if you have any questions. If you have\nsuccessfully used the library please also let the authors know about that: mentions of any\napplications are much appreciated.\n\nCustomization\n-------------\n\nTapkee is designed to be highly customizable with preprocessor definitions.\n\nIf you want to use float as the internal numeric type (default is double) you may do\nthat by adding `#define TAPKEE_CUSTOM_NUMTYPE float`\nbefore including the [defines header](https://github.com/lisitsyn/tapkee/blob/master/include/tapkee_defines.hpp).\n\nIf you use a non-standard STL-compatible implementation of vector, map and pair you may redefine them\nwith `TAPKEE_INTERNAL_VECTOR`, `TAPKEE_INTERNAL_PAIR`, `TAPKEE_INTERNAL_MAP`\n(by default they are set to std::vector, std::pair and std::map).\n\nYou may define `TAPKEE_USE_FIBONACCI_HEAP` or `TAPKEE_USE_PRIORITY_QUEUE` to select which\ndata structure should be used in the shortest paths computing algorithm. 
By default\na priority queue is used.\n\nOther properties can be loaded from a provided header file using `#define TAPKEE_CUSTOM_PROPERTIES`. Currently\nsuch a file should define only one variable, `COVERTREE_BASE`, which defines the base of the CoverTree (default is 1.3).\n\nCommand line application\n-----------\n\nTapkee comes with a sample application which can be used to construct\nlow-dimensional representations of dense feature matrices. For more information on\nits usage please run:\n\n`./bin/tapkee -h`\n\nThe application takes a plain ASCII file containing a dense matrix (each vector is a column and each\nline contains the values of one feature). The output of the application is stored in the provided\nfile in the same format (each line is a feature).\n\nTo compile the application please use [CMake](http://cmake.org/). Compiling\nTapkee with CMake follows the usual workflow. On Unix-based\nsystems you may use the following command to compile the Tapkee application:\n\n`mkdir build \u0026\u0026 cd build \u0026\u0026 cmake [definitions] .. \u0026\u0026 make`\n\nThere are a few cases when you may want to add definitions:\n\n- To enable unit-tests compilation add `-DBUILD_TESTS=1` to `[definitions]` when building. Please note that\n  building unit-tests requires googletest. 
If you are running Ubuntu you may install the `libgtest-dev` package for that.\n  Otherwise, if you have gtest sources around you may provide them as `-DGTEST_SOURCE_DIR` and `-DGTEST_INCLUDES_DIR`.\n  You may also download gtest with the following command:\n\n`wget https://github.com/google/googletest/archive/release-1.8.0.tar.gz \u0026\u0026 tar xfv release-1.8.0.tar.gz`\n\n  Downloaded sources will be used by Tapkee.\n  To run the tests use the `make test` command (or, better, `ctest -VV`).\n\n- To let the make script store test coverage information using GCOV and\n  add a target that outputs test coverage in HTML with LCOV, add the `-DUSE_GCOV=1` flag to `[definitions]`.\n\n- To enable precomputation of kernel/distance matrices, which can speed up algorithms (but requires much more memory), add\n  `-DPRECOMPUTED=1` to `[definitions]` when building.\n\n- To build the application without parts licensed under LGPLv3 use the `-DGPL_FREE=1` definition.\n\nThe library requires Eigen3 to be available in your path. The ARPACK library is also highly\nrecommended to achieve best performance. 
On Debian/Ubuntu these packages can be installed with\n\n\tsudo apt-get install libeigen3-dev libarpack2-dev\n\nIf you are using Mac OS X and Macports you can install these packages with\n\n\tsudo port install eigen3 \u0026\u0026 sudo port install arpack\n\nIn case you want to use a non-default\ncompiler use `CC=your-C-compiler CXX=your-C++-compiler cmake [definitions] ..` when running cmake.\n\nDirectory contents\n------------------\n\nThe repository of Tapkee contains the following directories:\n\n- `src/` that contains a simple command-line application (`src/cli`)\n  and CMake module finders (`src/cmake`).\n- `include/` that contains the library itself in the `include/tapkee`\n  subdirectory.\n- `test/` that contains unit-tests in the `test/unit` subdirectory and\n  a few helper scripts.\n- `examples/` that contains a few examples including those already mentioned\n  (these examples are supposed to be called through `make` as described\n   above, e.g. `make minimal`).\n- `data/`, a git submodule that contains data required for the\n  examples. To initialize this submodule use `git submodule update --init`.\n- `doc/` that contains the Doxygen interface file which is used to\n  generate HTML documentation of the library. Calling\n  `doxygen doc/Doxyfile` will generate it in this folder.\n\nOnce built, the root will also contain the following directories:\n\n- `bin` that contains binaries (`tapkee`, the command line application,\n  and various tests with the common naming `test_*`)\n- `lib` that contains gtest shared libraries.\n\nNeed help?\n----------\n\nIf you need any help or advice don't hesitate to send [an email](mailto://lisitsyn@hey.com) or\nfile [an issue on GitHub](https://github.com/lisitsyn/tapkee/issues/new).\n\nSupported platforms\n-------------------\n\nTapkee is tested to be fully functional on Linux (ICC, GCC, Clang compilers)\nand Mac OS X (GCC and Clang compilers). It also compiles under Windows natively\n(MSVS 2012 compiler) with a few known issues. 
In general, Tapkee uses no platform\nspecific code and should work on other systems as well. Please\n[let us know](mailto://lisitsyn@hey.com) if you have successfully compiled it\nor have run into any issues on any other system not listed above.\n\nSupported dimension reduction methods\n-------------------------------------\n\nTapkee provides implementations of the following dimension reduction methods (urls to descriptions provided):\n\n* Locally Linear Embedding and Kernel Locally Linear Embedding (LLE/KLLE)\n* Neighborhood Preserving Embedding (NPE)\n* Local Tangent Space Alignment (LTSA)\n* Linear Local Tangent Space Alignment (LLTSA)\n* Hessian Locally Linear Embedding (HLLE)\n* Laplacian eigenmaps\n* Locality Preserving Projections\n* Diffusion map\n* Isomap and landmark Isomap\n* Multidimensional scaling and landmark Multidimensional scaling (MDS/lMDS)\n* Stochastic Proximity Embedding (SPE)\n* Principal Component Analysis (PCA)\n* Kernel Principal Component Analysis (kPCA)\n* Random projection\n* Factor analysis\n* t-SNE\n* Barnes-Hut-SNE\n\nLicensing\n---------\n\nThe library is distributed under the [BSD 3-clause](https://github.com/lisitsyn/tapkee/blob/master/LICENSE) license.\n\nExceptions are:\n\n- [Barnes-Hut-SNE code](https://github.com/lisitsyn/tapkee/blob/master/include/external/barnes_hut_sne/) by Laurens van der Maaten which\n  is distributed under the BSD 4-clause license.\n\n- [Covertree code](https://github.com/lisitsyn/tapkee/blob/master/include/neighbors/covertree.hpp) by John Langford and Dinoj Surendran\n  which is distributed under the [LGPLv3 license](https://github.com/lisitsyn/tapkee/blob/master/LGPL-LICENSE).\n","funding_links":[],"categories":["Software"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flisitsyn%2Ftapkee","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flisitsyn%2Ftapkee","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flisitsyn%2Ftapkee/lists"}