{"id":15063954,"url":"https://github.com/davisidarta/fastlapmap","last_synced_at":"2025-04-10T16:42:35.692Z","repository":{"id":57428634,"uuid":"420576923","full_name":"davisidarta/fastlapmap","owner":"davisidarta","description":"Fast Laplacian Eigenmaps: lightweight multicore LE for non-linear dimensional reduction with minimal memory usage. Outperforms sklearn's implementation and escalates linearly beyond 10e6 samples.","archived":false,"fork":false,"pushed_at":"2021-11-12T15:20:25.000Z","size":5587,"stargazers_count":23,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-18T13:32:23.746Z","etag":null,"topics":["denoising","dimensionality-reduction","embedding","feature-engineering","laplacian-eigenmaps","machine-learning","multithreading","python","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davisidarta.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-24T03:23:47.000Z","updated_at":"2025-01-13T03:28:28.000Z","dependencies_parsed_at":"2022-08-26T04:00:14.747Z","dependency_job_id":null,"html_url":"https://github.com/davisidarta/fastlapmap","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davisidarta%2Ffastlapmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davisidarta%2Ffastlapmap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davisidarta%2Ffastlapmap/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/davisidarta%2Ffastlapmap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davisidarta","download_url":"https://codeload.github.com/davisidarta/fastlapmap/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248252728,"owners_count":21072703,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["denoising","dimensionality-reduction","embedding","feature-engineering","laplacian-eigenmaps","machine-learning","multithreading","python","scikit-learn"],"created_at":"2024-09-25T00:09:15.027Z","updated_at":"2025-04-10T16:42:35.675Z","avatar_url":"https://github.com/davisidarta.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Latest PyPI version](https://img.shields.io/pypi/v/fastlapmap.svg)](https://pypi.org/project/fastlapmap/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/DaviSidarta.svg?label=Follow%20%40davisidarta\u0026style=social)](https://twitter.com/davisidarta)\n        \n# Fast Laplacian Eigenmaps in python\n\nOpen-source [Laplacian Eigenmaps](https://www2.imm.dtu.dk/projects/manifold/Papers/Laplacian.pdf) for dimensionality reduction of large data in python. 
Comes with a wrapper for [NMSlib](https://github.com/nmslib/nmslib) to compute approximate nearest neighbors.\nRuns several times faster than the default [scikit-learn implementation](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.SpectralEmbedding.html).\n\n# Installation\n\nYou'll need NMSlib to use this package. Building it from source (without binaries) is recommended if your CPU supports\nadvanced instructions (it probably does):\n```\npip3 install --no-binary :all: nmslib\n```\nAlong with the other requirements:\n```\npip3 install numpy pandas scipy scikit-learn\n```\nThen you can install this package with pip:\n```\npip3 install fastlapmap\n```\n\n\n# Usage\n\nSee the following example with the [handwritten digits data](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits).\nHere, I visually compare results from the scikit-learn Laplacian Eigenmaps\n[implementation](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.SpectralEmbedding.html#sklearn.manifold.SpectralEmbedding) to\nthose from my implementation.\n\nThe example compares two of the similarity-learning algorithms implemented here: [anisotropic diffusion maps](https://doi.org/10.1073/pnas.0500334102) and [fuzzy simplicial sets](https://arxiv.org/abs/1802.03426).\n\n\n```\n# Import libraries\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.manifold import SpectralEmbedding\nfrom fastlapmap import LapEigenmap\n\n# Load some data\nfrom sklearn.datasets import load_digits\ndigits = load_digits()\ndata = digits.data\n\n# Define hyperparameters\nN_EIGS = 2\nN_NEIGHBORS = 10\nN_JOBS = 10\n\n# Reference: scikit-learn's Laplacian Eigenmaps (SpectralEmbedding)\nsk_se = SpectralEmbedding(n_components=N_EIGS, n_neighbors=N_NEIGHBORS, n_jobs=N_JOBS).fit_transform(data)\n\n# fastlapmap with diffusion harmonics and with fuzzy simplicial sets as the similarity\nflapmap_diff = LapEigenmap(data, n_eigs=N_EIGS, similarity='diffusion', norm_laplacian=True, k=N_NEIGHBORS, n_jobs=N_JOBS)\nflapmap_fuzzy = LapEigenmap(data, n_eigs=N_EIGS, similarity='fuzzy', norm_laplacian=True, k=N_NEIGHBORS, 
n_jobs=N_JOBS)\n\n# Plot the three embeddings side by side, colored by digit label\nfig, (ax1, ax2, ax3) = plt.subplots(1, 3)\nfig.suptitle('Handwritten digits data:', fontsize=24)\nax1.scatter(sk_se[:, 0], sk_se[:, 1], c=digits.target, cmap='Spectral', s=5)\nax1.set_title('Sklearn\\'s Laplacian Eigenmaps', fontsize=20)\nax2.scatter(flapmap_diff[:, 0], flapmap_diff[:, 1], c=digits.target, cmap='Spectral', s=5)\nax2.set_title('Fast Laplacian Eigenmaps with diffusion harmonics', fontsize=20)\nax3.scatter(flapmap_fuzzy[:, 0], flapmap_fuzzy[:, 1], c=digits.target, cmap='Spectral', s=5)\nax3.set_title('Fast Laplacian Eigenmaps with fuzzy simplicial sets', fontsize=20)\nplt.show()\n```\n![Embedding comparison](figs/Embedding_comparison.png)\n\nAs we can see, the results are nearly identical, although, qualitatively, the fastlapmap embeddings seem to separate distinct digits better than the default\n[scikit-learn implementation](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.spectral_embedding.html).\n\n# Parameters\n\n`data` : numpy.ndarray, pandas.DataFrame or scipy.sparse.csr_matrix.\n        Input data. By default, nmslib is used to compute approximate nearest neighbors (the faster and cheaper option); it works on both numpy arrays and sparse matrices.\n        Alternatively, users can provide a precomputed affinity matrix by setting `metric='precomputed'`.\n\n`n_eigs` : int (optional, default 10).\n        Number of eigenvectors of the graph Laplacian to compute.\n\n`k` : int (optional, default 10).\n        Number of nearest neighbors to use when computing affinities.\n\n`metric` : str (optional, default 'euclidean').\n        Which metric to use when computing neighborhood distances. 
\n        Accepted metrics include:\n        - 'sqeuclidean'\n        - 'euclidean'\n        - 'l1'\n        - 'lp' (requires setting the parameter `p`; equivalent to the Minkowski distance)\n        - 'cosine'\n        - 'angular'\n        - 'negdotprod'\n        - 'levenshtein'\n        - 'hamming'\n        - 'jaccard'\n        - 'jansen-shan'\n\n`M` : int (optional, default 10).\n        Defines the maximum number of neighbors in the zero and above-zero layers of the HNSW\n        (Hierarchical Navigable Small World) graph. Note that the default maximum number\n        of neighbors for the zero layer is 2*M. A reasonable range for this parameter\n        is 5-100. For more information on HNSW, please check https://arxiv.org/abs/1603.09320.\n        HNSW is available in Python via NMSlib; see https://github.com/nmslib/nmslib for more about NMSlib.\n\n`efC` : int (optional, default 20).\n        An 'hnsw' parameter. Increasing this value improves the quality of the constructed graph\n        and leads to higher search accuracy. However, this also leads to longer indexing times.\n        A reasonable range for this parameter is 10-500.\n\n`efS` : int (optional, default 100).\n        An 'hnsw' parameter. As with efC, increasing this value improves recall at the\n        expense of longer retrieval times. A reasonable range for this parameter is 10-500.\n\n`n_jobs` : int (optional, default 1)\n        Number of threads to use when computing approximate nearest neighbors.\n\n`similarity` : str (optional, default 'diffusion').\n        Which algorithm to use for similarity learning. 
Options are diffusion harmonics ('diffusion'),\n        fuzzy simplicial sets ('fuzzy') and continuous k-nearest neighbors ('cknn').\n\n`norm_laplacian` : bool (optional, default True).\n        Whether to renormalize the graph Laplacian.\n\n`return_evals` : bool (optional, default False).\n        Whether to also return the eigenvalues, as an (eigenvectors, eigenvalues) tuple.\n\n`verbose` : bool (optional, default False).\n        Whether to report information on the current progress of the algorithm.\n\n\n# Benchmark\n\nSee the runtime comparison between this implementation and scikit-learn:\n\n```\n# Load the benchmark function\nfrom fastlapmap.benchmark import runtime_benchmark\n\n# Load data\nfrom sklearn.datasets import load_digits\ndigits = load_digits()\ndata = digits.data\n\n# Define hyperparameters\nN_EIGS = 2\nN_NEIGHBORS = 10\nN_JOBS = 10\nSIZES = [1000, 5000, 10000, 25000, 50000, 100000]\nN_RUNS = 3\n\nruntime_benchmark(data,\n                  n_eigs=N_EIGS,\n                  n_neighbors=N_NEIGHBORS,\n                  n_jobs=N_JOBS,\n                  sizes=SIZES,\n                  n_runs=N_RUNS)\n```\n\n![Runtime benchmark](figs/Runtime_benchmark.png)\n\nAs you can see, the diffusion harmonics model is the fastest, followed closely by fuzzy simplicial sets. Both outperform the default\nscikit-learn implementation and scale linearly with sample size.\n\n\n# License\nMIT License\n2021, Davi Sidarta-Oliveira\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavisidarta%2Ffastlapmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavisidarta%2Ffastlapmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavisidarta%2Ffastlapmap/lists"}