https://github.com/hansalemaos/cyhdbscan
Very fast hdbscan for Python - written in Cython/C++
- Host: GitHub
- URL: https://github.com/hansalemaos/cyhdbscan
- Owner: hansalemaos
- License: mit
- Created: 2025-01-18T19:18:20.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-01-18T19:18:22.000Z (9 months ago)
- Last Synced: 2025-04-15T21:15:26.513Z (7 months ago)
- Topics: cpp, cython, data-science, euclidean, fast, hdbscan, python
- Language: C++
- Homepage: https://pypi.org/project/cyhdbscan
- Size: 19.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.MD
- License: LICENSE
README
# Python Wrapper for HDBSCAN-C++
### `pip install cyhdbscan`
This repository contains a Python wrapper for the [HDBSCAN-C++ implementation by Rohan Mohapatra / Sumedh Basarkod](https://github.com/rohanmohapatra/hdbscan-cpp). It lets you run HDBSCAN clustering directly from Python, using Cython to bridge Python and C++. It has no dependencies other than Cython, which is needed to compile the extension.
## Features
- Utilize the fast and efficient HDBSCAN algorithm implemented in C++
- Easy to use from Python
- Supports two distance metrics: Euclidean and Manhattan
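
For readers unfamiliar with the two metrics, the sketch below shows how each distance is defined for a pair of points. It is plain Python written for illustration only: the helper names `euclidean` and `manhattan` are hypothetical and are not part of `cyhdbscan`, which computes distances in C++.

```py
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

Manhattan distance is never smaller than Euclidean distance for the same pair of points, so the same dataset can cluster differently depending on the metric.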
## Prerequisites
Before you can use this wrapper, ensure you have the following installed:
- Python (of course)
- Cython
- A C++ compiler (e.g., GCC or MSVC)
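
Because the library is compiled on first import, it can help to confirm the prerequisites beforehand. The following is a minimal sketch that uses only the standard library. The function name `check_build_prerequisites` and the list of compiler executables are assumptions made for illustration. A compiler can also be installed without being on `PATH` (common with MSVC), so an empty result is a hint rather than proof that compilation will fail.

```py
import importlib.util
import shutil


def check_build_prerequisites(compilers=("g++", "gcc", "clang++", "cl")):
    """Report whether Cython is importable and which compilers are on PATH."""
    has_cython = importlib.util.find_spec("Cython") is not None
    found = [name for name in compilers if shutil.which(name) is not None]
    return {"cython": has_cython, "compilers": found}


status = check_build_prerequisites()
print(status)
```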
## Usage example
```py
from cyhdbscan import py_calculate_hdbscan # The lib will be compiled the first time you import it
dataset = [
(0.837, 2.136),
(-1.758, 2.974),
(1.190, 4.728),
(2.140, 0.706),
(-1.035, 8.206),
(1.255, 0.090),
(0.596, 4.086),
(1.280, 1.058),
(1.730, 1.147),
(-0.949, 8.464),
(0.935, 5.332),
(2.369, 0.795),
(0.429, 4.974),
(-2.048, 6.654),
(-1.457, 7.487),
(0.529, 3.808),
(1.782, 0.908),
(-1.956, 8.616),
(-1.746, 3.012),
(-1.180, 3.128),
(1.164, 3.791),
(1.362, 1.366),
(2.601, 1.088),
(0.272, 5.470),
(-3.122, 3.282),
(-0.588, 8.614),
(1.669, -0.436),
(-0.683, 7.675),
(2.368, 0.552),
(1.052, 4.545),
(2.227, 1.263),
(2.439, -0.073),
(1.345, 4.857),
(-1.315, 6.839),
(0.983, 5.375),
(-1.063, 2.208),
(-1.607, 3.565),
(1.573, 0.484),
(-2.179, 8.086),
(1.834, 0.754),
(2.106, 3.495),
(-1.643, 7.527),
(1.106, 1.264),
(1.612, 1.823),
(0.460, 5.450),
(-0.538, 3.016),
(1.678, 0.609),
(-1.012, 3.603),
(1.342, 0.594),
(1.428, 1.624),
(2.045, 1.125),
(1.673, 0.659),
(-1.359, 2.322),
(1.131, 0.936),
(-1.739, 1.948),
(-0.340, 8.167),
(-1.638, 2.433),
(-1.688, 2.241),
(2.430, -0.064),
(-1.380, 7.185),
(-1.252, 2.339),
(-2.395, 3.398),
(-2.092, 7.481),
(0.488, 3.268),
(-0.539, 7.456),
(-2.592, 8.076),
(-1.047, 2.965),
(1.256, 3.382),
(-1.622, 4.272),
(1.869, 5.441),
(-1.764, 2.222),
(-1.382, 7.288),
(0.008, 4.176),
(-1.103, 7.302),
(-1.794, 7.581),
(-1.512, 7.944),
(0.959, 4.561),
(-0.601, 6.300),
(0.225, 4.770),
(1.567, 0.018),
(-1.034, 2.921),
(-0.922, 8.099),
(-1.886, 2.248),
(1.869, 0.956),
(1.101, 4.890),
(-1.932, 8.306),
(0.670, 4.041),
(0.744, 4.122),
(1.640, 1.819),
(0.815, 4.785),
(-2.633, 2.631),
(-0.961, 1.274),
(0.214, 4.885),
(1.435, 1.307),
(1.214, 3.648),
(1.083, 4.063),
(-1.226, 8.296),
(1.482, 0.690),
(1.896, 5.185),
(-1.324, 4.131),
(-1.150, 7.893),
(2.469, 1.679),
(2.311, 1.304),
(0.573, 4.088),
(-0.968, 3.122),
(2.625, 0.950),
(1.684, 4.196),
(-2.221, 2.731),
(-1.578, 3.034),
(0.082, 4.567),
(1.433, 4.377),
(1.063, 5.176),
(0.768, 4.398),
(2.470, 1.315),
(-1.732, 7.164),
(0.347, 3.452),
(-1.001, 2.849),
(1.016, 4.485),
(0.560, 4.214),
(-2.118, 2.035),
(-1.362, 2.383),
(-2.784, 2.992),
(1.652, 3.656),
(-1.940, 2.189),
(-1.815, 7.978),
(1.202, 3.644),
(-0.969, 3.267),
(1.870, -0.108),
(-1.807, 2.068),
(1.218, 3.893),
(-1.484, 6.008),
(-1.564, 2.853),
(-0.686, 8.683),
(1.076, 4.685),
(-0.976, 6.738),
(1.380, 4.548),
(-1.641, 2.681),
(-0.002, 4.581),
(1.714, 5.025),
(-1.405, 7.726),
(-0.708, 2.504),
(-0.886, 2.646),
(1.984, 0.490),
(2.952, -0.344),
(0.432, 4.335),
(-1.866, 7.625),
(2.527, 0.618),
(2.041, 0.455),
(-2.580, 3.188),
(1.620, 0.068),
(-2.588, 3.131),
(0.444, 3.115),
(-0.457, 7.306),
(-1.129, 7.805),
(2.130, 5.192),
(1.004, 4.191),
(-1.393, 8.746),
(0.728, 3.855),
(0.893, 1.011),
(-1.108, 2.920),
(0.789, 4.337),
(1.976, 0.719),
(-1.249, 3.085),
(-1.078, 8.881),
(-1.868, 3.080),
(2.768, 1.088),
(0.277, 4.844),
(3.411, 0.872),
(-1.581, 7.553),
(-1.530, 7.705),
(-1.825, 7.360),
(-1.686, 7.953),
(-1.651, 3.446),
(-1.304, 3.003),
(-0.731, 6.242),
(2.406, 4.870),
(-1.536, 3.014),
(1.489, 0.652),
(0.514, 4.627),
(-1.815, 3.290),
(-1.937, 3.914),
(-0.615, 3.950),
(2.032, 0.197),
(2.149, 1.037),
(-1.370, 7.770),
(0.914, 4.550),
(0.334, 4.936),
(-2.160, 3.410),
(1.367, 0.635),
(-0.571, 8.133),
(-1.006, 3.084),
(1.495, 3.858),
(-0.590, 7.695),
(0.715, 5.413),
(2.114, 1.247),
(1.201, 0.602),
(-2.546, 3.150),
(-1.959, 2.430),
(2.338, 3.431),
(3.353, 1.700),
(1.843, 0.073),
(1.320, 1.404),
(2.097, 4.847),
(-1.243, 8.152),
(-1.859, 7.789),
(2.747, 1.545),
(2.608, 1.089),
(1.660, 3.563),
(2.352, 0.828),
(2.223, 0.839),
(3.229, 1.132),
(-1.559, 7.248),
(-0.647, 3.429),
(-1.327, 8.515),
(0.917, 3.906),
(2.295, -0.766),
(1.816, 1.120),
(-1.120, 7.110),
(-1.655, 8.614),
(-1.276, 7.968),
(1.974, 1.580),
(2.518, 1.392),
(0.439, 4.536),
(0.369, 7.791),
(-1.791, 2.750),
]
result = py_calculate_hdbscan(
    data=dataset, min_points=5, min_cluster_size=5, distance_metric="Euclidean"
)

# pandas is not required by cyhdbscan; it is used here only to pretty-print the result
import pandas as pd

print(pd.DataFrame(result).to_string())
# original_data label membership_probability outlier_score outlier_id
# 0 [0.837, 2.136] 4 0.000000 0.000000 66
# 1 [-1.758, 2.974] 7 0.000000 0.000000 29
# 2 [1.19, 4.728] 6 0.000000 0.000000 6
# 3 [2.14, 0.706] 4 0.742785 0.000000 80
# 4 [-1.035, 8.206] 3 0.000000 0.000000 188
# 5 [1.255, 0.09] 4 0.738651 0.000000 48
# 6 [0.596, 4.086] 6 0.742785 0.000000 76
# 7 [1.28, 1.058] 4 0.719853 0.000000 103
# 8 [1.73, 1.147] 4 0.689899 0.000000 70
# 9 [-0.949, 8.464] 3 0.742785 0.000000 190
# 10 [0.935, 5.332] 6 0.738651 0.000000 177
# 11 [2.369, 0.795] 4 0.416808 0.000000 108
# 12 [0.429, 4.974] 6 0.719853 0.000000 159
# 13 [-2.048, 6.654] 3 0.738651 0.000000 51
# 14 [-1.457, 7.487] 3 0.719853 0.000000 97
# 15 [0.529, 3.808] 6 0.689899 0.000000 86
# 16 [1.782, 0.908] 4 0.826320 0.000000 82
# 17 [-1.956, 8.616] 3 0.689899 0.000000 128
# 18 [-1.746, 3.012] 7 0.742785 0.000000 186
# 19 [-1.18, 3.128] 7 0.738651 0.000000 166
# 20 [1.164, 3.791] 6 0.416808 0.000000 118
# 21 [1.362, 1.366] 4 0.636566 0.000000 87
# 22 [2.601, 1.088] 4 0.639942 0.000000 133
# 23 [0.272, 5.47] 6 0.826320 0.000000 117
# 24 [-3.122, 3.282] 7 0.719853 0.000000 57
# 25 [-0.588, 8.614] 3 0.416808 0.000000 18
# 26 [1.669, -0.436] 4 0.582667 0.000000 19
# 27 [-0.683, 7.675] 3 0.826320 0.000000 185
# 28 [2.368, 0.552] 4 0.461632 0.000000 41
# 29 [1.052, 4.545] 6 0.636566 0.000000 169
# 30 [2.227, 1.263] 4 0.722914 0.000000 168
# 31 [2.439, -0.073] 4 0.671035 0.000000 74
# 32 [1.345, 4.857] 6 0.639942 0.000000 219
# 33 [-1.315, 6.839] 3 0.636566 0.000000 184
# 34 [0.983, 5.375] 6 0.582667 0.000000 1
# 35 [-1.063, 2.208] 7 0.689899 0.000000 176
# 36 [-1.607, 3.565] 7 0.416808 0.000000 131
# 37 [1.573, 0.484] 4 0.122696 0.000000 92
# 38 [-2.179, 8.086] 3 0.639942 0.000000 123
# 39 [1.834, 0.754] 4 0.737856 0.000000 78
# 40 [2.106, 3.495] 6 0.461632 0.000000 201
# 41 [-1.643, 7.527] 3 0.582667 0.000000 21
# 42 [1.106, 1.264] 4 0.673931 0.000000 148
# 43 [1.612, 1.823] 4 0.721101 0.000000 49
# 44 [0.46, 5.45] 6 0.722914 0.000000 12
# 45 [-0.538, 3.016] 7 0.826320 0.000000 196
# 46 [1.678, 0.609] 4 0.341140 0.000000 93
# 47 [-1.012, 3.603] 7 0.636566 0.000000 7
# 48 [1.342, 0.594] 4 0.760534 0.000000 61
# 49 [1.428, 1.624] 4 0.760116 0.000000 150
# 50 [2.045, 1.125] 4 0.689325 0.000000 121
# 51 [1.673, 0.659] 4 0.685775 0.005590 104
# 52 [-1.359, 2.322] 7 0.639942 0.023094 14
# 53 [1.131, 0.936] 4 0.701151 0.031076 42
# 54 [-1.739, 1.948] 7 0.582667 0.056146 39
# 55 [-0.34, 8.167] 3 0.461632 0.057855 204
# 56 [-1.638, 2.433] 7 0.461632 0.062953 75
# 57 [-1.688, 2.241] 7 0.722914 0.080455 2
# 58 [2.43, -0.064] 4 0.387009 0.081781 139
# 59 [-1.38, 7.185] 3 0.722914 0.087602 46
# 60 [-1.252, 2.339] 7 0.671035 0.094225 173
# 61 [-2.395, 3.398] 7 0.122696 0.098303 116
# 62 [-2.092, 7.481] 3 0.671035 0.104613 162
# 63 [0.488, 3.268] 6 0.671035 0.135718 211
# 64 [-0.539, 7.456] 3 0.122696 0.140783 37
# 65 [-2.592, 8.076] 3 0.737856 0.160861 56
# 66 [-1.047, 2.965] 7 0.737856 0.161913 145
# 67 [1.256, 3.382] 6 0.122696 0.162461 170
# 68 [-1.622, 4.272] 0 0.000000 0.180867 194
# 69 [1.869, 5.441] 6 0.737856 0.180867 102
# 70 [-1.764, 2.222] 7 0.673931 0.180867 50
# 71 [-1.382, 7.288] 3 0.673931 0.180867 216
# 72 [0.008, 4.176] 6 0.673931 0.180867 83
# 73 [-1.103, 7.302] 3 0.721101 0.183881 4
# 74 [-1.794, 7.581] 3 0.341140 0.183881 100
# 75 [-1.512, 7.944] 3 0.760534 0.183881 203
# 76 [0.959, 4.561] 6 0.721101 0.190656 209
# 77 [-0.601, 6.3] 3 0.760116 0.190656 30
# 78 [0.225, 4.77] 6 0.341140 0.190656 183
# 79 [1.567, 0.018] 4 0.326183 0.196035 71
# 80 [-1.034, 2.921] 7 0.721101 0.203713 11
# 81 [-0.922, 8.099] 3 0.689325 0.208430 160
# 82 [-1.886, 2.248] 7 0.341140 0.208627 224
# 83 [1.869, 0.956] 4 0.417112 0.208888 16
# 84 [1.101, 4.89] 6 0.760534 0.212415 3
# 85 [-1.932, 8.306] 3 0.685775 0.217258 112
# 86 [0.67, 4.041] 6 0.760116 0.217690 153
# 87 [0.744, 4.122] 6 0.689325 0.226782 157
# 88 [1.64, 1.819] 4 0.386631 0.233446 171
# 89 [0.815, 4.785] 6 0.685775 0.249974 54
# 90 [-2.633, 2.631] 7 0.760534 0.253104 59
# 91 [-0.961, 1.274] 0 0.000000 0.255871 161
# 92 [0.214, 4.885] 6 0.701151 0.256637 129
# 93 [1.435, 1.307] 4 0.470280 0.256637 125
# 94 [1.214, 3.648] 6 0.387009 0.256637 94
# 95 [1.083, 4.063] 6 0.326183 0.256637 20
# 96 [-1.226, 8.296] 3 0.701151 0.256637 214
# 97 [1.482, 0.69] 4 0.684759 0.261700 22
# 98 [1.896, 5.185] 6 0.417112 0.261700 113
# 99 [-1.324, 4.131] 7 0.760116 0.269267 206
# 100 [-1.15, 7.893] 3 0.387009 0.275141 95
# 101 [2.469, 1.679] 4 0.838035 0.280604 15
# 102 [2.311, 1.304] 4 0.727384 0.283523 164
# 103 [0.573, 4.088] 6 0.386631 0.286972 84
# 104 [-0.968, 3.122] 7 0.689325 0.288302 147
# 105 [2.625, 0.95] 4 0.338441 0.292141 208
# 106 [1.684, 4.196] 6 0.470280 0.300090 28
# 107 [-2.221, 2.731] 7 0.685775 0.300860 155
# 108 [-1.578, 3.034] 7 0.701151 0.306750 96
# 109 [0.082, 4.567] 6 0.684759 0.310031 144
# 110 [1.433, 4.377] 6 0.838035 0.324773 89
# 111 [1.063, 5.176] 6 0.727384 0.325524 126
# 112 [0.768, 4.398] 6 0.338441 0.327308 217
# 113 [2.47, 1.315] 4 0.635927 0.333191 136
# 114 [-1.732, 7.164] 3 0.326183 0.335403 221
# 115 [0.347, 3.452] 6 0.635927 0.336760 52
# 116 [-1.001, 2.849] 7 0.387009 0.342861 195
# 117 [1.016, 4.485] 6 0.482353 0.344652 197
# 118 [0.56, 4.214] 6 0.430840 0.344827 179
# 119 [-2.118, 2.035] 7 0.326183 0.348280 142
# 120 [-1.362, 2.383] 7 0.417112 0.352888 105
# 121 [-2.784, 2.992] 7 0.386631 0.355081 124
# 122 [1.652, 3.656] 6 0.472956 0.355697 32
# 123 [-1.94, 2.189] 7 0.470280 0.362003 178
# 124 [-1.815, 7.978] 3 0.417112 0.363128 81
# 125 [1.202, 3.644] 6 0.393618 0.372836 135
# 126 [-0.969, 3.267] 7 0.684759 0.386366 9
# 127 [1.87, -0.108] 4 0.482353 0.387209 8
# 128 [-1.807, 2.068] 7 0.838035 0.391382 191
# 129 [1.218, 3.893] 6 0.743208 0.392763 120
# 130 [-1.484, 6.008] 3 0.386631 0.395156 114
# 131 [-1.564, 2.853] 7 0.727384 0.397227 213
# 132 [-0.686, 8.683] 3 0.470280 0.401954 222
# 133 [1.076, 4.685] 6 0.502175 0.402517 109
# 134 [-0.976, 6.738] 3 0.684759 0.403438 141
# 135 [1.38, 4.548] 6 0.766235 0.418471 62
# 136 [-1.641, 2.681] 7 0.338441 0.432980 53
# 137 [-0.002, 4.581] 6 0.189619 0.437722 73
# 138 [1.714, 5.025] 6 0.759588 0.439706 200
# 139 [-1.405, 7.726] 3 0.838035 0.439706 79
# 140 [-0.708, 2.504] 7 0.635927 0.439706 127
# 141 [-0.886, 2.646] 7 0.482353 0.439706 182
# 142 [1.984, 0.49] 4 0.430840 0.441015 146
# 143 [2.952, -0.344] 4 0.472956 0.457827 85
# 144 [0.432, 4.335] 6 0.624909 0.457827 218
# 145 [-1.866, 7.625] 3 0.727384 0.458990 119
# 146 [2.527, 0.618] 4 0.393618 0.463463 137
# 147 [2.041, 0.455] 4 0.743208 0.470460 60
# 148 [-2.58, 3.188] 7 0.430840 0.470825 149
# 149 [1.62, 0.068] 4 0.502175 0.483465 165
# 150 [-2.588, 3.131] 7 0.472956 0.485582 38
# 151 [0.444, 3.115] 0 0.000000 0.492651 163
# 152 [-0.457, 7.306] 3 0.338441 0.505280 33
# 153 [-1.129, 7.805] 3 0.635927 0.505572 172
# 154 [2.13, 5.192] 6 0.417656 0.511402 111
# 155 [1.004, 4.191] 6 0.441913 0.511402 193
# 156 [-1.393, 8.746] 3 0.482353 0.516557 192
# 157 [0.728, 3.855] 6 0.438766 0.516557 27
# 158 [0.893, 1.011] 4 0.766235 0.518120 110
# 159 [-1.108, 2.92] 7 0.393618 0.521854 189
# 160 [0.789, 4.337] 6 0.758176 0.524485 101
# 161 [1.976, 0.719] 4 0.189619 0.526365 212
# 162 [-1.249, 3.085] 7 0.743208 0.532096 156
# 163 [-1.078, 8.881] 3 0.430840 0.535303 67
# 164 [-1.868, 3.08] 7 0.502175 0.536606 98
# 165 [2.768, 1.088] 4 0.759588 0.536606 202
# 166 [0.277, 4.844] 6 0.425497 0.536606 154
# 167 [3.411, 0.872] 4 0.624909 0.536606 138
# 168 [-1.581, 7.553] 3 0.472956 0.543011 122
# 169 [-1.53, 7.705] 3 0.393618 0.544018 207
# 170 [-1.825, 7.36] 3 0.743208 0.544851 35
# 171 [-1.686, 7.953] 3 0.502175 0.547966 187
# 172 [-1.651, 3.446] 7 0.766235 0.555784 220
# 173 [-1.304, 3.003] 7 0.189619 0.560654 25
# 174 [-0.731, 6.242] 3 0.766235 0.564289 10
# 175 [2.406, 4.87] 6 0.806437 0.572578 45
# 176 [-1.536, 3.014] 7 0.759588 0.576194 107
# 177 [1.489, 0.652] 4 0.417656 0.577034 205
# 178 [0.514, 4.627] 6 0.671555 0.579684 44
# 179 [-1.815, 3.29] 7 0.624909 0.582450 47
# 180 [-1.937, 3.914] 7 0.417656 0.587861 34
# 181 [-0.615, 3.95] 0 0.000000 0.600229 90
# 182 [2.032, 0.197] 4 0.441913 0.600301 132
# 183 [2.149, 1.037] 4 0.438766 0.602544 140
# 184 [-1.37, 7.77] 3 0.189619 0.604401 36
# 185 [0.914, 4.55] 6 0.738818 0.610496 134
# 186 [0.334, 4.936] 6 0.779360 0.611313 17
# 187 [-2.16, 3.41] 7 0.441913 0.615856 64
# 188 [1.367, 0.635] 4 0.758176 0.616716 55
# 189 [-0.571, 8.133] 3 0.759588 0.617330 23
# 190 [-1.006, 3.084] 7 0.438766 0.618283 180
# 191 [1.495, 3.858] 6 0.638316 0.619707 106
# 192 [-0.59, 7.695] 3 0.624909 0.621117 43
# 193 [0.715, 5.413] 6 0.610878 0.621668 5
# 194 [2.114, 1.247] 4 0.425497 0.622072 158
# 195 [1.201, 0.602] 4 0.806437 0.628202 72
# 196 [-2.546, 3.15] 7 0.758176 0.629062 115
# 197 [-1.959, 2.43] 7 0.425497 0.630759 88
# 198 [2.338, 3.431] 0 0.000000 0.640282 26
# 199 [3.353, 1.7] 4 0.671555 0.643993 24
# 200 [1.843, 0.073] 4 0.738818 0.652346 152
# 201 [1.32, 1.404] 4 0.779360 0.666485 31
# 202 [2.097, 4.847] 6 0.642165 0.673339 58
# 203 [-1.243, 8.152] 3 0.417656 0.675609 63
# 204 [-1.859, 7.789] 3 0.441913 0.676482 99
# 205 [2.747, 1.545] 4 0.638316 0.676673 69
# 206 [2.608, 1.089] 4 0.610878 0.689151 210
# 207 [1.66, 3.563] 6 0.331853 0.708094 13
# 208 [2.352, 0.828] 4 0.642165 0.709907 175
# 209 [2.223, 0.839] 4 0.331853 0.710540 40
# 210 [3.229, 1.132] 4 0.680165 0.713226 65
# 211 [-1.559, 7.248] 3 0.438766 0.718864 181
# 212 [-0.647, 3.429] 7 0.806437 0.731078 174
# 213 [-1.327, 8.515] 3 0.758176 0.740460 151
# 214 [0.917, 3.906] 6 0.680165 0.747472 130
# 215 [2.295, -0.766] 4 0.760572 0.751100 68
# 216 [1.816, 1.12] 4 0.324560 0.752195 215
# 217 [-1.12, 7.11] 3 0.425497 0.758514 77
# 218 [-1.655, 8.614] 3 0.806437 0.766876 167
# 219 [-1.276, 7.968] 3 0.671555 0.767946 223
# 220 [1.974, 1.58] 4 0.657128 0.771445 199
# 221 [2.518, 1.392] 4 0.546996 0.779360 0
# 222 [0.439, 4.536] 6 0.760572 0.782303 198
# 223 [0.369, 7.791] 3 0.738818 0.816012 143
# 224 [-1.791, 2.75] 7 0.671555 0.827391 91
```
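
A common next step is to collect the points that belong to each cluster. The sketch below does this without pandas. The exact shape of `result` is not documented here, so the hypothetical helper `to_records` accepts either a list of per-point dicts or a dict of columns, both of which `pd.DataFrame` would render as above. The `sample` rows are copied by hand from the output above and stand in for a real result. Label `0` appears to mark points that were not assigned to any cluster, but treat that reading as an assumption.

```py
from collections import defaultdict


def to_records(result):
    """Normalize a dict of columns or a list of dicts into a list of dicts."""
    if isinstance(result, dict):
        keys = list(result)
        return [dict(zip(keys, row)) for row in zip(*(result[k] for k in keys))]
    return [dict(record) for record in result]


def group_by_label(result):
    """Map each cluster label to the list of points carrying that label."""
    clusters = defaultdict(list)
    for record in to_records(result):
        clusters[record["label"]].append(tuple(record["original_data"]))
    return dict(clusters)


# Rows copied from the printed output above (assumed record layout)
sample = [
    {"original_data": [0.837, 2.136], "label": 4},
    {"original_data": [-1.758, 2.974], "label": 7},
    {"original_data": [2.14, 0.706], "label": 4},
    {"original_data": [-1.622, 4.272], "label": 0},
]

clusters = group_by_label(sample)
for label, points in sorted(clusters.items()):
    print(label, len(points))
```

With a real run, pass `result` from `py_calculate_hdbscan` in place of `sample`.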