https://github.com/CannyLab/tsne-cuda
GPU Accelerated t-SNE for CUDA with Python bindings
- Host: GitHub
- URL: https://github.com/CannyLab/tsne-cuda
- Owner: CannyLab
- License: bsd-3-clause
- Created: 2018-03-24T01:14:53.000Z (about 8 years ago)
- Default Branch: main
- Last Pushed: 2024-10-02T16:38:47.000Z (over 1 year ago)
- Last Synced: 2025-03-12T08:01:38.542Z (about 1 year ago)
- Topics: barnes-hut, barnes-hut-tsne, cuda, data-analysis, data-visualization, fit-tsne, gpu, mnist, multithreading, python, tsne, tsne-algorithm, tsne-cuda
- Language: Cuda
- Homepage:
- Size: 15.1 MB
- Stars: 1,839
- Watchers: 29
- Forks: 132
- Open Issues: 17
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
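- StarryDivineSky - CannyLab/tsne-cuda - tsne-cuda is a CUDA-accelerated implementation of the t-SNE dimensionality-reduction algorithm, with Python bindings for efficient data visualization. The project is designed for large, high-dimensional datasets and uses the parallel computing power of NVIDIA GPUs to make t-SNE tens of times faster than traditional implementations, which suits machine learning tasks with tens or even hundreds of thousands of data points. Its core principle is to split t-SNE's gradient-descent optimization into many parallel units of work, assign each data point's similarity computation to CUDA thread blocks, and reduce complexity through shared-memory optimization and the Barnes-Hut approximation, reaching millisecond-level dimensionality reduction on the GPU. The developers provide a complete Python API that loads data directly from NumPy arrays and includes parameter-tuning interfaces and visualization output; users can install it with pip and deploy it quickly in Jupyter Notebook or on the command line. The project also includes benchmark scripts for standard datasets such as MNIST and CIFAR; measurements show it to be about 35 times faster than the CPU version with 10,000 data points. The developers continue to maintain the CUDA kernel code, support NVIDIA Volta and Ampere GPUs, and provide detailed documentation and a GitHub discussion area, making the project suitable for researchers and engineers who need to visualize high-dimensional data quickly. (Other_Machine Learning and Deep Learning)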
README
# TSNE-CUDA





This repo is an optimized CUDA version of the [FIt-SNE algorithm](https://github.com/KlugerLab/FIt-SNE) with associated Python modules. We find that our implementation of t-SNE can be up to 1200x faster than scikit-learn, or up to 50x faster than Multicore-TSNE, when used with the right GPU. The paper describing our approach, as well as the results below, is available at [https://arxiv.org/abs/1807.11824](https://arxiv.org/abs/1807.11824).
You can install binaries with Anaconda for CUDA versions 10.1 and 10.2 using `conda install tsnecuda -c conda-forge`. tsnecuda supports CUDA versions 9.0 and later through source installation; see the wiki for up-to-date installation instructions: [https://github.com/CannyLab/tsne-cuda/wiki/](https://github.com/CannyLab/tsne-cuda/wiki/)
# Benchmarks
### Simulated Data

Time taken compared to other state-of-the-art algorithms on synthetic datasets with 50 dimensions and four clusters for varying numbers of points. Note the log scale on both the points and time axes, and that the x-axis is in thousands of points (thus, the values on the x-axis range from 1K to 10M points). Dashed lines on SkLearn, BH-TSNE, and MULTICORE-4 represent projected times. Projected scaling assumes an O(n log(n)) implementation.
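As a rough illustration of how a projected time can be derived under the O(n log(n)) assumption, the sketch below scales one measured runtime to a larger point count. The function name `project_time` and the measurement used (100 s at 100K points) are hypothetical and are not taken from the benchmark; the exact procedure used for the dashed lines is not described here.

```python
import math


def project_time(measured_n, measured_seconds, target_n):
    """Extrapolate a runtime to target_n points, assuming O(n log n) scaling."""
    scale = (target_n * math.log(target_n)) / (measured_n * math.log(measured_n))
    return measured_seconds * scale


# Hypothetical measurement: 100 s at 100K points, projected to 10M points.
projected = project_time(100_000, 100.0, 10_000_000)
print(f"Projected time at 10M points: {projected:.0f} s")
```

Because the ratio depends on both n and log(n), a 100x increase in points costs somewhat more than 100x in time under this model.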
### MNIST

The performance of t-SNE-CUDA compared to other state-of-the-art implementations on the MNIST dataset. t-SNE-CUDA runs on the raw pixels of the MNIST dataset (60000 images x 784 dimensions) in under 7 seconds.
### CIFAR

The performance of t-SNE-CUDA compared to other state-of-the-art implementations on the CIFAR-10 dataset. t-SNE-CUDA runs on the output of a classifier on the CIFAR-10 training set (50000 images x 1024 dimensions) in under 6 seconds. While we can run on the full pixel set in under 12 seconds, Euclidean distance is a poor metric in raw pixel space, which leads to poor-quality embeddings.
### Comparison of Embedding Quality
The quality of the embeddings produced by t-SNE-CUDA does not differ significantly from that of the state-of-the-art implementations. See below for a comparison of MNIST cluster outputs.

Left: MULTICORE-4 (501s), Middle: BH-TSNE (1156s), Right: t-SNE-CUDA (Ours, 6.98s).
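For a sense of scale, the runtimes in the caption above can be turned into speedup factors with simple division. This is only arithmetic on the three quoted numbers; the dictionary and variable names are illustrative.

```python
# Runtimes in seconds, as quoted in the caption above.
runtimes = {
    "MULTICORE-4": 501.0,
    "BH-TSNE": 1156.0,
    "t-SNE-CUDA": 6.98,
}

# Speedup of t-SNE-CUDA relative to each of the other implementations.
speedups = {
    name: seconds / runtimes["t-SNE-CUDA"]
    for name, seconds in runtimes.items()
    if name != "t-SNE-CUDA"
}

for name, factor in speedups.items():
    print(f"t-SNE-CUDA vs {name}: {factor:.1f}x faster")
```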
# Installation
To install our library, follow the [installation instructions](https://github.com/CannyLab/tsne-cuda/blob/main/INSTALL.md).
### Run
Like many of the libraries available, the Python wrappers follow the same API as [sklearn.manifold.TSNE](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).
You can run it as follows:
```
from tsnecuda import TSNE
# X: array-like of shape (n_samples, n_features), e.g. a NumPy array
X_embedded = TSNE(n_components=2, perplexity=15, learning_rate=10).fit_transform(X)
```
We only support `n_components=2`. We currently have no plans to support more dimensions, as doing so would require significant changes to the code.
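A slightly fuller sketch of an end-to-end run might look like the following. It builds a small synthetic dataset in the spirit of the simulated benchmark (50 dimensions, four clusters) and passes it to the `TSNE` call shown above. The helpers `make_clusters` and `embed`, the cluster spread, and the point count are all hypothetical choices for illustration; `embed` assumes NumPy, tsnecuda, and a CUDA-capable GPU are available, so the call is left commented out.

```python
import random


def make_clusters(n_points, n_dims=50, n_clusters=4, spread=1.0, seed=0):
    """Return n_points rows drawn from n_clusters Gaussian blobs (illustrative helper)."""
    rng = random.Random(seed)
    centers = [
        [rng.uniform(-10.0, 10.0) for _ in range(n_dims)]
        for _ in range(n_clusters)
    ]
    rows = []
    for i in range(n_points):
        center = centers[i % n_clusters]  # cycle through the clusters
        rows.append([rng.gauss(c, spread) for c in center])
    return rows


def embed(rows):
    """Embed rows into 2D; requires numpy, tsnecuda, and a CUDA-capable GPU."""
    import numpy as np
    from tsnecuda import TSNE

    X = np.asarray(rows)
    # Same call as in the snippet above; only n_components=2 is supported.
    return TSNE(n_components=2, perplexity=15, learning_rate=10).fit_transform(X)


rows = make_clusters(1000)
# embedding = embed(rows)  # uncomment on a machine with tsnecuda installed
```

Importing NumPy and tsnecuda inside `embed` keeps the data-generation part usable on machines without a GPU.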
For more information on running the library, or using it as a C++ library, see the [Python usage](https://github.com/CannyLab/tsne-cuda/wiki/Basic-Usage:-Python) or [C++ Usage](https://github.com/CannyLab/tsne-cuda/wiki/Basic-Usage:-Cxx) sections of the wiki.
# Citation
Please cite the corresponding paper if it was useful for your research:
```
@article{chan2019gpu,
title={GPU accelerated t-distributed stochastic neighbor embedding},
author={Chan, David M and Rao, Roshan and Huang, Forrest and Canny, John F},
journal={Journal of Parallel and Distributed Computing},
volume={131},
pages={1--13},
year={2019},
publisher={Elsevier}
}
```
This library is built on top of the following technology. Without it, none of this would be possible!
- [L. Van der Maaten's paper](http://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf)
- [FIt-SNE](https://github.com/KlugerLab/FIt-SNE)
- [Multicore-TSNE](https://github.com/DmitryUlyanov/Multicore-TSNE)
- [BHTSNE](https://github.com/lvdmaaten/bhtsne/)
- [CUDA Utilities/Pairwise Distance](https://github.com/OrangeOwlSolutions)
- [LONESTAR-GPU](http://iss.ices.utexas.edu/?p=projects/galois/lonestargpu)
- [FAISS](https://github.com/facebookresearch/faiss)
- [GTest](https://github.com/google/googletest)
- [CXXopts](https://github.com/jarro2783/cxxopts)
# License
Our code is built using components from FAISS, the Lonestar GPU library, GTest, CXXopts, and OrangeOwl's CUDA utilities. Each of these components is governed by its own license; our code is governed by the BSD-3 license found in LICENSE.txt.