https://github.com/d-chambers/dbscan1d
An efficient 1D implementation of the DBSCAN clustering algorithm
- Host: GitHub
- URL: https://github.com/d-chambers/dbscan1d
- Owner: d-chambers
- License: lgpl-3.0
- Created: 2019-10-07T05:04:44.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-10-09T20:38:49.000Z (over 1 year ago)
- Last Synced: 2025-06-12T18:41:37.683Z (11 months ago)
- Topics: clustering, dbscan-algorithm, machine-learning, python
- Language: Python
- Size: 146 KB
- Stars: 24
- Watchers: 2
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DBSCAN1D
dbscan1d is a 1D implementation of the [DBSCAN algorithm](https://en.wikipedia.org/wiki/DBSCAN). It was created to efficiently
perform clustering on large 1D arrays.
[scikit-learn's DBSCAN implementation](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) does
not have a special case for 1D, where calculating the full distance matrix is wasteful. It is much better to simply sort
the input array and perform efficient bisects to find the closest points. The package includes a simple profile script
that compares the two implementations; in every case it covers, DBSCAN1D is much faster than scikit-learn's implementation.
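
The sort-and-bisect idea can be illustrated with a minimal sketch. This is not the package's actual code: the function name `count_neighbors_1d` is hypothetical, and it assumes neighbors are counted inclusively (distance `<= eps`, with each point counting itself), as scikit-learn does.

```python
import numpy as np


def count_neighbors_1d(values, eps):
    """Count the points within eps of each point (the point itself included).

    Hypothetical helper for illustration only; not part of dbscan1d's API.
    """
    values = np.asarray(values, dtype=float)
    # Sort once so each neighborhood becomes a contiguous slice.
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    # Bisect for the first index >= v - eps and the first index > v + eps.
    left = np.searchsorted(sorted_vals, sorted_vals - eps, side="left")
    right = np.searchsorted(sorted_vals, sorted_vals + eps, side="right")
    counts_sorted = right - left
    # Map the counts back to the original input order.
    counts = np.empty_like(counts_sorted)
    counts[order] = counts_sorted
    return counts
```

Under these assumptions, core points would be those where `count_neighbors_1d(values, eps) >= min_samples`. The work is dominated by the sort, so no pairwise distance matrix is ever built.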

## Installation
Simply use pip to install dbscan1d:
```bash
pip install dbscan1d
```
It only requires numpy.
## Quickstart
dbscan1d is designed to be interchangeable with sklearn's implementation in almost
all cases.
```python
from sklearn.datasets import make_blobs
from dbscan1d.core import DBSCAN1D
# make blobs to test clustering
X = make_blobs(1_000_000, centers=2, n_features=1)[0]
# init dbscan object
dbs = DBSCAN1D(eps=.5, min_samples=4)
# get labels for each point
labels = dbs.fit_predict(X)
# show core point indices
dbs.core_sample_indices_
# get values of core points
dbs.components_
```
## Notes
- dbscan1d can return different group numbers than sklearn for non-core points that are within
eps of core points from two separate groups. For example:
`--C1--C1--P--C2--C2`
Here C1 and C2 are core points for group 1 and group 2, respectively. If P is within eps of both C1 and
C2, dbscan1d will assign it the same label as the closest core point. sklearn does not always do this.
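
The tie-break described above can be sketched as follows. This is only an illustration of the "closest core point wins" rule, not the package's implementation; the function `label_border_point` and its arguments are hypothetical.

```python
import numpy as np


def label_border_point(point, core_values, core_labels, eps):
    """Return the label of the closest core point within eps, or -1 (noise).

    Hypothetical helper for illustration only; not part of dbscan1d's API.
    """
    core_values = np.asarray(core_values, dtype=float)
    core_labels = np.asarray(core_labels)
    if core_values.size == 0:
        return -1
    # Distance from the candidate point to every core point.
    distances = np.abs(core_values - point)
    nearest = int(np.argmin(distances))
    # A point with no core point within eps is noise.
    if distances[nearest] > eps:
        return -1
    return int(core_labels[nearest])


# Two groups of core points; P = 2.4 is within eps of a core point in each group.
core_values = [0.0, 1.0, 4.0, 5.0]
core_labels = [0, 0, 1, 1]
label = label_border_point(2.4, core_values, core_labels, eps=2.0)
```

Here the closest core point to `2.4` is `1.0`, so under this rule the point receives that core point's label.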