{"id":13689328,"url":"https://github.com/eamid/trimap","last_synced_at":"2025-05-01T23:33:53.340Z","repository":{"id":37622724,"uuid":"130825673","full_name":"eamid/trimap","owner":"eamid","description":"TriMap: Large-scale Dimensionality Reduction Using Triplets","archived":false,"fork":false,"pushed_at":"2022-08-29T16:48:47.000Z","size":24962,"stargazers_count":305,"open_issues_count":4,"forks_count":20,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-11-10T12:52:11.884Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eamid.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-04-24T08:57:41.000Z","updated_at":"2024-11-08T11:32:40.000Z","dependencies_parsed_at":"2022-08-30T00:10:47.662Z","dependency_job_id":null,"html_url":"https://github.com/eamid/trimap","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eamid%2Ftrimap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eamid%2Ftrimap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eamid%2Ftrimap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eamid%2Ftrimap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eamid","download_url":"https://codeload.github.com/eamid/trimap/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224282236,"owners_count":17285793,"icon
_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T15:01:43.799Z","updated_at":"2024-11-12T13:31:30.031Z","avatar_url":"https://github.com/eamid.png","language":"Python","readme":"\n======\nTriMap\n======\n\nTriMap is a dimensionality reduction method that uses triplet constraints\nto form a low-dimensional embedding of a set of points. The triplet constraints\nare of the form \"point *i* is closer to point *j* than point *k*\". The triplets are\nsampled from the high-dimensional representation of the points, and a weighting\nscheme is used to reflect the importance of each triplet.\n\nTriMap provides a significantly better global view of the data than\nother dimensionality reduction methods such as t-SNE, LargeVis, and UMAP. The global\nstructure includes the relative distances of the clusters, multiple scales in\nthe data, and the existence of possible outliers. We define a global score to quantify how well an embedding reflects the global structure of the data.\n\nCIFAR-10 dataset (test set) passed through a CNN (*n = 10,000, d = 1024*): Notice the semantic structure unveiled by TriMap.\n\n.. image:: results/cifar10.png\n    :alt: Visualizations of the CIFAR-10 dataset\n\nThis implementation is in Python. Further details and more experimental results are available in the `paper \u003chttps://arxiv.org/abs/1910.00204\u003e`_. 
See the `example colab \u003chttps://github.com/eamid/examples/blob/master/TriMap.ipynb\u003e`_ for some analysis.\n\n-----------------\nNews!\n-----------------\n\n[Mar 16, 2022] An example colab using the TriMap `JAX implementation \u003chttps://github.com/google-research/google-research/tree/master/trimap\u003e`_ is now available at https://github.com/eamid/examples/blob/master/TriMap.ipynb. We analyze the results on the S-curve, MNIST, Fashion MNIST, etc. using t-SNE, UMAP, TriMap, and PCA.\n\n[Feb 17, 2022] A JAX implementation is now available at https://github.com/google-research/google-research/tree/master/trimap. More updates are coming soon!\n\n\n-----------------\nHow to use TriMap\n-----------------\n\nTriMap has a transformer API similar to other scikit-learn estimators. To use\nTriMap with the default parameters, simply do:\n\n.. code:: python\n\n    import trimap\n    from sklearn.datasets import load_digits\n\n    digits = load_digits()\n\n    embedding = trimap.TRIMAP().fit_transform(digits.data)\n\nTo find the embedding using a precomputed pairwise distance matrix ``D``, pass ``D`` as input and set ``use_dist_matrix`` to ``True``:\n\n.. code:: python\n\n    embedding = trimap.TRIMAP(use_dist_matrix=True).fit_transform(D)\n\nYou can also pass the precomputed k-nearest neighbors and their corresponding distances as a tuple ``(knn_nbrs, knn_distances)``. Note that the rows must be in order, starting from point 0 to point n-1. This feature also requires the original data ``X`` to compute the embedding:\n\n.. code:: python\n\n    embedding = trimap.TRIMAP(knn_tuple=(knn_nbrs, knn_distances)).fit_transform(X)\n\nTo calculate the global score, do:\n\n.. 
code:: python\n\n    gs = trimap.TRIMAP(verbose=False).global_score(digits.data, embedding)\n    print(\"global score %2.2f\" % gs)\n\n\n-----------------\nParameters\n-----------------\n\nThe list of parameters is given below:\n\n -  ``n_dims``: Number of dimensions of the embedding (default = 2).\n\n -  ``n_inliers``: Number of nearest neighbors for forming the nearest-neighbor triplets (default = 12).\n\n -  ``n_outliers``: Number of outliers for forming the nearest-neighbor triplets (default = 4).\n\n -  ``n_random``: Number of random triplets per point (default = 3).\n\n -  ``distance``: Distance measure ('euclidean' (default), 'manhattan', 'angular' (or 'cosine'), 'hamming').\n\n -  ``weight_temp``: Temperature of the logarithm applied to the weights. Larger temperatures generate more compact embeddings; ``weight_temp = 0.`` corresponds to no transformation (default = 0.5).\n\n -  ``weight_adj`` (deprecated): The value of gamma for the log-transformation (default = 500.0).\n\n -  ``lr``: Learning rate (default = 0.1).\n\n -  ``n_iters``: Number of iterations (default = 400).\n\nThe other parameters include:\n\n -  ``knn_tuple``: Use the precomputed nearest-neighbor information in the form of a tuple ``(knn_nbrs, knn_distances)`` (default = None).\n\n -  ``use_dist_matrix``: Use the precomputed pairwise distance matrix (default = False).\n\n -  ``apply_pca``: Reduce the number of dimensions of the data to 100, if necessary, before applying the nearest-neighbor search (default = True).\n\n -  ``opt_method``: Optimization method {'sd' (steepest descent), 'momentum' (GD with momentum), 'dbd' (delta-bar-delta, default)}.\n\n -  ``verbose``: Print the progress report (default = False).\n\n -  ``return_seq``: Store the intermediate results and return the results in a tensor (default = False).\n\nAn example of adjusting these parameters:\n\n.. 
code:: python\n\n    import trimap\n    from sklearn.datasets import load_digits\n\n    digits = load_digits()\n\n    embedding = trimap.TRIMAP(n_inliers=20,\n                              n_outliers=10,\n                              n_random=10).fit_transform(digits.data)\n\nThe nearest-neighbor calculation is performed using `ANNOY \u003chttps://github.com/spotify/annoy\u003e`_.\n\n\n--------\nExamples\n--------\n\nThe following are some of the results on real-world datasets. The values of nearest-neighbor accuracy and global score are shown as a pair (NN, GS) on top of each figure. For more results, please refer to our `paper \u003chttps://arxiv.org/abs/1910.00204\u003e`_.\n\nUSPS Handwritten Digits (*n = 11,000, d = 256*)\n\n.. image:: results/usps.png\n    :alt: Visualizations of the USPS dataset\n\n20 News Groups (*n = 18,846, d = 100*)\n\n.. image:: results/news20.png\n    :alt: Visualizations of the 20 News Groups dataset\n\nTabula Muris (*n = 53,760, d = 23,433*)\n\n.. image:: results/tabula.png\n    :alt: Visualizations of the Tabula Muris Mouse Tissues dataset\n\nMNIST Handwritten Digits (*n = 70,000, d = 784*)\n\n.. image:: results/mnist.png\n    :alt: Visualizations of the MNIST dataset\n\nFashion MNIST (*n = 70,000, d = 784*)\n\n.. image:: results/fmnist.png\n    :alt: Visualizations of the Fashion MNIST dataset\n\nTV News (*n = 129,685, d = 100*)\n\n.. image:: results/tvnews.png\n    :alt: Visualizations of the TV News dataset\n\n\nThe runtime of t-SNE, LargeVis, UMAP, and TriMap (in hh:mm:ss format) on a single machine with a 2.6 GHz Intel Core i5 CPU and 16 GB of memory is given in the following table. We limit the runtime of each method to 12 hours. Also, UMAP runs out of memory on datasets larger than ~4M points.\n\n.. 
image:: results/runtime.png\n    :alt: Runtime of TriMap compared to other methods\n\n\n----------\nInstalling\n----------\n\nRequirements:\n\n* numpy\n* scikit-learn\n* numba\n* annoy\n\n**Installing annoy**\n\nIf you are having trouble installing `annoy` on macOS using the command:\n\n.. code:: bash\n\n    pip3 install annoy\n\nyou can alternatively try:\n\n.. code:: bash\n\n    pip3 install git+https://github.com/sutao/annoy.git@master\n\n**Install Options**\n\nIf you have all the requirements installed, you can use pip:\n\n.. code:: bash\n\n    sudo pip install trimap\n\nPlease regularly check for updates and make sure you are using the most recent version. If you have TriMap installed and would like to upgrade to a newer version, you can use the command:\n\n.. code:: bash\n\n    sudo pip install --upgrade --force-reinstall trimap\n\nAn alternative is to install the dependencies manually using anaconda and to use pip\nto install TriMap:\n\n.. code:: bash\n\n    conda install numpy\n    conda install scikit-learn\n    conda install numba\n    conda install annoy\n    pip install trimap\n\nFor a manual install, get this package:\n\n.. code:: bash\n\n    wget https://github.com/eamid/trimap/archive/master.zip\n    unzip master.zip\n    rm master.zip\n    cd trimap-master\n\nInstall the requirements:\n\n.. code:: bash\n\n    sudo pip install -r requirements.txt\n\nor\n\n.. code:: bash\n\n    conda install scikit-learn numba annoy\n\nInstall the package:\n\n.. code:: bash\n\n    python setup.py install\n\n\n------------------------\nSupport and Contribution\n------------------------\n\nThis implementation is still a work in progress. Any comments/suggestions/bug-reports\nare highly appreciated. Please feel free to contact me at: eamid@ucsc.edu. 
If you would \nlike to contribute to the code, please `fork the project \u003chttps://github.com/eamid/trimap/issues#fork-destination-box\u003e`_\nand send me a pull request.\n\n\n--------\nCitation\n--------\n\nIf you use TriMap in your publications, please cite our current reference on arXiv:\n\n::\n\n   @article{2019TRIMAP,\n        author = {{Amid}, Ehsan and {Warmuth}, Manfred K.},\n        title = \"{TriMap: Large-scale Dimensionality Reduction Using Triplets}\",\n        journal = {arXiv preprint arXiv:1910.00204},\n        archivePrefix = \"arXiv\",\n        eprint = {1910.00204},\n        year = 2019,\n   }\n\n\n-------\nLicense\n-------\n\nPlease see the LICENSE file.\n\n\n","funding_links":[],"categories":["Python","Uncategorized"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feamid%2Ftrimap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feamid%2Ftrimap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feamid%2Ftrimap/lists"}