https://github.com/doxakis/cosinesimilaritydistancesongpu
Compute cosine similarity distances for all combinations of the dataset on the gpu with CUDA
- Host: GitHub
- URL: https://github.com/doxakis/cosinesimilaritydistancesongpu
- Owner: doxakis
- License: mit
- Created: 2018-06-14T04:00:58.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-06-14T04:22:01.000Z (about 8 years ago)
- Last Synced: 2025-07-10T04:34:36.538Z (11 months ago)
- Topics: cuda
- Language: C#
- Size: 10.7 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Cosine similarity distances on GPU
Compute cosine similarity distances for all combinations of the dataset on the GPU with CUDA.
This was written in C# with the Alea GPU library. Similar logic could be reused in any language (Python, C++, etc.).
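To make "all combinations of the dataset" concrete, here is a minimal CPU reference sketch in plain Python. It is not the Alea GPU kernel from this repository. It assumes the distance is defined as 1 minus the cosine similarity and that a zero vector is treated as maximally distant; both are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(a, b). Zero vectors get distance 1.0 (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_cosine_distances(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Fill a symmetric n x n matrix, computing each unordered pair once."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(rows[i], rows[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist
```

Every (i, j) pair is independent of the others, which is what makes the problem a natural fit for one GPU thread per pair or per row.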
I plan to use it as a preprocessing step before running HDBSCAN (text clustering in an unsupervised way).
Calculating the distances up front could make the algorithm faster and can be a way to scale out (no need to use PCA to reduce the complexity).
This is more of a proof of concept.
Please note that the first time the kernel function runs, a JIT compilation occurs. It takes about 1 second.
I recommend running it once when your application starts, if possible, to minimize the impact on perceived performance.
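The warm-up idea can be sketched as follows. This is only an illustration of the pattern in plain Python, where no JIT compilation actually happens; `warm_up` and the `Kernel` type are hypothetical names, and the kernel passed in stands for whatever function triggers the one-time compilation in your application.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical signature: takes the dataset rows, returns the distance matrix.
Kernel = Callable[[Sequence[Sequence[float]]], List[List[float]]]


def warm_up(kernel: Kernel, dim: int = 4) -> float:
    """Run `kernel` once on a tiny dummy dataset and return the elapsed seconds."""
    dummy = [[1.0] * dim, [1.0] * dim]  # two rows are enough to form one pair
    start = time.perf_counter()
    kernel(dummy)  # the result is discarded; only the side effect matters
    return time.perf_counter() - start
```

Calling `warm_up(my_kernel)` during startup, possibly on a background thread, moves the one-time cost away from the first real request.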
# Future work
- Batch processing (if the array is too large, it does not work; we got: System.Exception: '[CUDAError] CUDA_ERROR_OUT_OF_MEMORY')
- Find optimal parameters (determine if it's better to use the CPU only)
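One possible approach to the batch processing item is to compute the result matrix in row tiles that each fit in a fixed memory budget. The sketch below only does the sizing arithmetic and is not part of this repository. It assumes a dense n x n result with 4 bytes per value (float32) and ignores the memory needed for the input vectors; the budget value is something you would have to pick for your own GPU, and all function names are hypothetical.

```python
from typing import Iterator, Tuple


def distance_matrix_bytes(n: int, bytes_per_value: int = 4) -> int:
    """Size of a dense n x n result matrix (4 bytes per value assumes float32)."""
    return n * n * bytes_per_value


def max_tile_rows(n: int, budget_bytes: int, bytes_per_value: int = 4) -> int:
    """Largest number of result rows (each n values wide) that fits in the budget."""
    if n <= 0:
        raise ValueError("n must be positive")
    rows = budget_bytes // (n * bytes_per_value)
    return max(1, min(n, rows))  # always make progress, never exceed n


def row_tiles(n: int, tile_rows: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) row ranges that together cover 0..n."""
    for start in range(0, n, tile_rows):
        yield start, min(start + tile_rows, n)
```

For each `(start, stop)` tile, the kernel would compute the distances from rows `start..stop` to all rows and copy that slice back to the host before moving on. With 50,000 rows, the full float32 matrix alone is 10 GB, which illustrates why a single allocation can fail.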
# Copyright and license
Code released under the MIT license.