https://github.com/gagolews/quitefastmst
quitefastmst: Euclidean and Mutual Reachability Minimum Spanning Trees
- Host: GitHub
- URL: https://github.com/gagolews/quitefastmst
- Owner: gagolews
- License: other
- Created: 2025-07-18T09:01:44.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2026-02-02T14:08:31.000Z (4 months ago)
- Last Synced: 2026-02-03T02:54:00.811Z (4 months ago)
- Topics: cluster-analysis, clustering, clustering-evaluation, euclidean-distances, genie, hdbscan, hdbscan-clustering-algorithm, machine-learning, machine-learning-algorithms, minimum-spanning-tree, mst, mutual-reachability-distance, neighbor-search, outlier-detection
- Language: C++
- Homepage: https://quitefastmst.gagolewski.com/
- Size: 25.4 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: NEWS
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# [*quitefastmst*](https://quitefastmst.gagolewski.com/) Package for R and Python
## Euclidean and Mutual Reachability Minimum Spanning Trees


**Keywords**: Euclidean minimum spanning tree, MST, EMST,
mutual reachability distance, nearest neighbours, k-nn, k-d tree,
Borůvka, Prim, Jarník, Kruskal, Genie, HDBSCAN\*, DBSCAN,
clustering, outlier detection.
Package **features**:
* [Euclidean Minimum Spanning Trees](https://en.wikipedia.org/wiki/Euclidean_minimum_spanning_tree)
using single-, sesqui-, and dual-tree Borůvka algorithms – quite fast
in spaces of low intrinsic dimensionality,
* Minimum spanning trees with respect to mutual reachability distances based
on the Euclidean metric (used in the definition of the HDBSCAN\* algorithm;
see Campello, Moulavi, Sander, 2013),
* Euclidean nearest neighbours with nicely-optimised K-d trees,
* relatively fast fallback algorithms for spaces of higher dimensionality,
* multi-threaded computation via OpenMP (on selected platforms).
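
To make the mutual reachability distance concrete, here is a minimal pure-Python sketch. It is not the package's API and does not use its K-d tree or Borůvka algorithms: it is a brute-force Jarník–Prim procedure that needs O(n²) distance evaluations. The function names `core_distances` and `mst_mutreach` are hypothetical. The sketch assumes the convention of Campello et al. (2013), where a point counts as its own first neighbour, so the core distance for a smoothing parameter `M` is the distance to the (M-1)-th nearest *other* point; `M = 1` yields the ordinary Euclidean MST.

```python
import math


def core_distances(X, M):
    """Distance from each point to its (M-1)-th nearest other point (0 if M <= 1).

    Assumed convention: a point is its own first neighbour; requires M <= len(X).
    """
    if M <= 1:
        return [0.0] * len(X)
    cores = []
    for i, x in enumerate(X):
        d = sorted(math.dist(x, y) for j, y in enumerate(X) if j != i)
        cores.append(d[M - 2])
    return cores


def mst_mutreach(X, M=1):
    """Brute-force Jarnik-Prim MST w.r.t. d_M(i,j) = max(d(i,j), core_i, core_j).

    Returns a sorted list of (weight, i, j) tuples with i < j.
    """
    n = len(X)
    core = core_distances(X, M)

    def d_M(i, j):
        return max(math.dist(X[i], X[j]), core[i], core[j])

    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0  # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # pick the not-yet-included vertex that is closest to the current tree
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v]:
                w = d_M(u, v)
                if w < best[v]:
                    best[v], parent[v] = w, u
    return sorted(edges)


X = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
print(mst_mutreach(X))                           # Euclidean MST (M = 1)
print([w for w, _, _ in mst_mutreach(X, M=3)])   # edge weights for M = 3
```

For the four collinear points above, the Euclidean tree joins consecutive points with weights 1, 2, and 4. With `M = 3`, the core distances are 3, 2, 3, and 6, so every edge weight is lifted to at least the larger core distance of its endpoints and the tree weights become 3, 3, and 6.
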
Refer to the package **homepage** at <https://quitefastmst.gagolewski.com/>
for the reference manual, tutorials, examples, and benchmarks.
**Author and maintainer**: [Marek Gagolewski](https://www.gagolewski.com/)
Possible applications in topological data analysis:
clustering ([HDBSCAN\*](https://hdbscan.readthedocs.io/en/latest/index.html),
[Lumbermark](https://lumbermark.gagolewski.com/),
[Genie](https://genieclust.gagolewski.com/), Single linkage, etc.),
outlier detection ([Deadwood](https://deadwood.gagolewski.com/)),
density estimation, dimensionality reduction, and many more.
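
As an illustration of the clustering use case, single linkage with `k` clusters can be read off a minimum spanning tree by dropping its `k - 1` heaviest edges and labelling the connected components that remain. The sketch below is a generic union-find implementation, not code from this package or from the clustering packages linked above. The function name `single_linkage_labels` and the `(weight, i, j)` edge format are assumptions made for this example, and ties between equal edge weights are broken by vertex index.

```python
def single_linkage_labels(n, mst_edges, k):
    """Cut the k-1 heaviest MST edges and label the resulting components.

    n         -- number of points
    mst_edges -- iterable of (weight, i, j) tuples forming a spanning tree
    k         -- requested number of clusters, 1 <= k <= n
    """
    parent = list(range(n))

    def find(i):
        # find the component root, shortening the path along the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # keep the n-k lightest edges, i.e., drop the k-1 heaviest ones
    for _, i, j in sorted(mst_edges)[: n - k]:
        parent[find(i)] = find(j)

    # relabel component roots as 0, 1, ... in order of first appearance
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# a spanning tree over 4 points, given as (weight, i, j)
edges = [(1.0, 0, 1), (2.0, 1, 2), (4.0, 2, 3)]
print(single_linkage_labels(4, edges, k=2))  # [0, 0, 0, 1]
print(single_linkage_labels(4, edges, k=3))  # [0, 0, 1, 2]
```

Because only the sorted edge list is needed, the tree has to be computed once and can then be cut at any number of clusters.
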
## How to Install
### Python Version
To install from [PyPI](https://pypi.org/project/quitefastmst), call:
```bash
pip3 install quitefastmst # python3 -m pip install quitefastmst
```
*To learn more about Python, check out my open-access textbook*
[Minimalist Data Wrangling in Python](https://datawranglingpy.gagolewski.com/).
For best performance, advanced users will benefit from compiling the package
from sources:
```bash
CPPFLAGS="-O3 -march=native" pip3 install quitefastmst --force --no-binary="quitefastmst"
```
🚧 TO DO (help needed): How to enable OpenMP support on macOS/Darwin in `setup.py`?
### R Version
To install from [CRAN](https://CRAN.R-project.org/package=quitefastmst), call:
```r
install.packages("quitefastmst")
```
*To learn more about R, check out my open-access textbook*
[Deep R Programming](https://deepr.gagolewski.com/).
For best performance, advanced users will benefit from compiling the package
from sources:
```r
Sys.setenv(CXX_DEFS="-O3 -march=native") # for gcc and clang
install.packages("quitefastmst", type="source")
```
### Other
The core functionality is implemented in the form of a C++ library.
It can thus be easily adapted for use in other environments.
Contributions are welcome, e.g., wrappers for Julia or Matlab/GNU Octave.
## License
Copyright (C) 2025–2026 Marek Gagolewski
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU Affero General Public License Version 3,
19 November 2007, published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero
General Public License Version 3 for more details. You should have
received a copy of the License along with this program. If not, see
<https://www.gnu.org/licenses/>.
## References
O. Borůvka, O jistém problému minimálním,
*Práce Moravské Přírodovědecké Společnosti* **3**, 1926, 37–58
J.L. Bentley, Multidimensional binary search trees used for associative
searching, *Communications of the ACM* **18**(9), 509–517, 1975,
[DOI:10.1145/361002.361007](https://doi.org/10.1145/361002.361007)
R.J.G.B. Campello, D. Moulavi, A. Zimek, J. Sander, Hierarchical
density estimates for data clustering, visualization, and outlier detection,
*ACM Transactions on Knowledge Discovery from Data (TKDD)* **10**(1),
2015, 1–51, [DOI:10.1145/2733381](https://doi.org/10.1145/2733381)
R.J.G.B. Campello, D. Moulavi, J. Sander,
Density-based clustering based on hierarchical density estimates,
*Lecture Notes in Computer Science* **7819**, 2013, 160–172,
[DOI:10.1007/978-3-642-37456-2_14](https://doi.org/10.1007/978-3-642-37456-2_14)
M. Gagolewski, quitefastmst, in preparation, 2026
M. Gagolewski, A. Cena, M. Bartoszuk, Ł. Brzozowski,
Clustering with minimum spanning trees: How good can it be?,
*Journal of Classification* **42**, 2025, 90–112,
[DOI:10.1007/s00357-024-09483-1](https://doi.org/10.1007/s00357-024-09483-1)
V. Jarník, O jistém problému minimálním (z dopisu panu O. Borůvkovi),
*Práce Moravské Přírodovědecké Společnosti* **6**, 1930, 57–63
S. Maneewongvatana, D.M. Mount, It's okay to be skinny, if your friends
are fat, *The 4th CGC Workshop on Computational Geometry*, 1999
W.B. March, P. Ram, A.G. Gray, Fast Euclidean minimum spanning
tree: Algorithm, analysis, and applications,
*Proc. 16th ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining (KDD '10)*,
2010, 603–612
C.F. Olson, Parallel algorithms for hierarchical clustering,
*Parallel Computing* **21**(8), 1995, 1313–1325
L. McInnes, J. Healy, Accelerated hierarchical density-based
clustering, *IEEE Intl. Conf. Data Mining Workshops (ICDMW)*, 2017, 33–42,
[DOI:10.1109/ICDMW.2017.12](https://doi.org/10.1109/ICDMW.2017.12)
R. Prim, Shortest connection networks and some generalizations,
*The Bell System Technical Journal* **36**(6), 1957, 1389–1401
N. Sample, M. Haines, M. Arnold, T. Purcell,
Optimizing search strategies in K-d Trees, *5th WSES/IEEE Conf. Circuits,
Systems, Communications & Computers (CSCC'01)*, 2001
See **quitefastmst**'s [homepage](https://quitefastmst.gagolewski.com/)
for more references.