
Warning - Attention Users
=========================

**Please be aware that the `pyclustering` library is no longer supported as of 2021 due to personal reasons. There will be no further maintenance, issue addressing, or feature development for this repository.**

**For continued usage, I recommend seeking alternative solutions.**

**Thank you for your understanding.**

Build Status
============

|Build Status Linux MacOS| |Build Status Win| |Coverage Status| |PyPi| |Download Counter| |JOSS|

PyClustering
============

**pyclustering** is a Python and C++ data mining library covering clustering
algorithms, oscillatory networks, and neural networks. Algorithms and models are
implemented in Python, and many of them also have a C++ implementation (the C++
pyclustering library). The C++ pyclustering library is part of pyclustering and
is supported on Linux, Windows, and macOS.

**Version**: 0.11.dev

**License**: The 3-Clause BSD License

**E-Mail**: [email protected]

**Documentation**: https://pyclustering.github.io/docs/0.10.1/html/

**Homepage**: https://pyclustering.github.io/

**PyClustering Wiki**: https://github.com/annoviko/pyclustering/wiki

Dependencies
============

**Required packages**: scipy, matplotlib, numpy, Pillow

**Python version**: >=3.6 (32-bit, 64-bit)

**C++ version**: >= 14 (32-bit, 64-bit)
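
The requirements above can be checked with a short script before installing the library. The following is a minimal sketch that uses only the standard library; the helper name ``check_environment`` is hypothetical and is not part of pyclustering. Note that Pillow is imported under the module name ``PIL``.

.. code:: python

    import importlib.util
    import sys

    # Import names of the required packages listed above.
    # Pillow is distributed as "Pillow" but imported as "PIL".
    REQUIRED_MODULES = {
        "scipy": "scipy",
        "matplotlib": "matplotlib",
        "numpy": "numpy",
        "Pillow": "PIL",
    }


    def check_environment(min_python=(3, 6)):
        """Return a dict describing which requirements are satisfied."""
        report = {"python": sys.version_info[:2] >= min_python}
        for package, module in REQUIRED_MODULES.items():
            # find_spec returns None when the module cannot be located.
            report[package] = importlib.util.find_spec(module) is not None
        return report


    for name, satisfied in check_environment().items():
        print("{:<12} {}".format(name, "ok" if satisfied else "MISSING"))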

Performance
===========

Algorithms are implemented in Python and, where available, in C/C++. The C/C++ implementation is used by default;
if your platform is not supported, the Python implementation is used instead. The implementation can be selected
with the ``ccore`` flag (``True`` by default, which means that C/C++ is used), for example:

.. code:: python

    from pyclustering.cluster.xmeans import xmeans

    # data_points and start_centers are assumed to be prepared beforehand.

    # By default the C/C++ part of the library is used
    xmeans_instance_1 = xmeans(data_points, start_centers, 20, ccore=True)

    # The same: the C/C++ part of the library is used by default
    xmeans_instance_2 = xmeans(data_points, start_centers, 20)

    # Switch off the core: the Python implementation is used
    xmeans_instance_3 = xmeans(data_points, start_centers, 20, ccore=False)
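
To see what the ``ccore`` flag changes on a given machine, the two settings can be timed side by side. The following is a rough sketch, not a benchmark shipped with the library: ``time_backends`` and ``StandInAlgorithm`` are hypothetical names, and the stand-in class only imitates an algorithm object with a ``process()`` method so that the sketch runs without pyclustering installed.

.. code:: python

    import time


    def time_backends(make_instance, repeats=3):
        """Time process() for ccore=True and ccore=False.

        make_instance is a callable that accepts the ccore flag and
        returns a fresh algorithm object exposing process().
        Returns the best wall-clock time in seconds for each setting.
        """
        timings = {}
        for ccore in (True, False):
            best = float("inf")
            for _ in range(repeats):
                instance = make_instance(ccore)
                start = time.perf_counter()
                instance.process()
                best = min(best, time.perf_counter() - start)
            timings[ccore] = best
        return timings


    class StandInAlgorithm:
        """Hypothetical stand-in so the sketch runs without pyclustering."""

        def __init__(self, ccore):
            self.ccore = ccore

        def process(self):
            # Pretend that the Python path does more work than the C/C++ path.
            iterations = 10000 if self.ccore else 200000
            sum(i * i for i in range(iterations))


    timings = time_backends(StandInAlgorithm)
    print("ccore=True:  {:.6f} s".format(timings[True]))
    print("ccore=False: {:.6f} s".format(timings[False]))

With pyclustering installed, the stand-in can be replaced by a factory such as ``lambda ccore: xmeans(data_points, start_centers, 20, ccore=ccore)``.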

Installation
============

Installation using pip3 tool:

.. code:: bash

    $ pip3 install pyclustering
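
After installation with pip3, the installed version can be confirmed from Python. This is a small sketch that relies on ``importlib.metadata`` from the standard library, which assumes Python 3.8 or later; the helper name ``installed_version`` is hypothetical.

.. code:: python

    from importlib import metadata  # available in Python 3.8 and later


    def installed_version(distribution="pyclustering"):
        """Return the installed version string, or None if not installed."""
        try:
            return metadata.version(distribution)
        except metadata.PackageNotFoundError:
            return None


    version = installed_version()
    if version is None:
        print("pyclustering is not installed")
    else:
        print("pyclustering " + version)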

Manual installation from official repository using Makefile:

.. code:: bash

    # get sources of the pyclustering library, for example, from repository
    $ mkdir pyclustering
    $ cd pyclustering/
    $ git clone https://github.com/annoviko/pyclustering.git .

    # compile CCORE library (core of the pyclustering library).
    $ cd ccore/
    $ make ccore_64bit      # build for 64-bit OS

    # $ make ccore_32bit    # build for 32-bit OS

    # return to parent folder of the pyclustering library
    $ cd ../

    # install pyclustering library
    $ python3 setup.py install

    # optionally - test the library
    $ python3 setup.py test

Manual installation using CMake:

.. code:: bash

    # get sources of the pyclustering library, for example, from repository
    $ mkdir pyclustering
    $ cd pyclustering/
    $ git clone https://github.com/annoviko/pyclustering.git .

    # generate build files in a separate build folder.
    $ mkdir build
    $ cd build
    $ cmake ..

    # build pyclustering-shared target depending on what was generated (Makefile or MSVC solution)
    # if Makefile has been generated then
    $ make pyclustering-shared

    # return to parent folder of the pyclustering library
    $ cd ../

    # install pyclustering library
    $ python3 setup.py install

    # optionally - test the library
    $ python3 setup.py test

Manual installation using Microsoft Visual Studio solution:

1. Clone repository from: https://github.com/annoviko/pyclustering.git
2. Open folder `pyclustering/ccore`
3. Open Visual Studio project `ccore.sln`
4. Select solution platform: `x86` or `x64`
5. Build `pyclustering-shared` project.
6. Add pyclustering folder to python path or install it using setup.py

.. code:: bash

    # install pyclustering library
    $ python3 setup.py install

    # optionally - test the library
    $ python3 setup.py test

Proposals, Questions, Bugs
==========================

For questions, proposals, or bug reports related to pyclustering, please contact [email protected] or create an issue in the GitHub repository.

PyClustering Status
===================

+----------------------+------------------------------+-------------------------------------+---------------------------------+
| Branch               | master                       | 0.10.dev                            | 0.10.1.rel                      |
+======================+==============================+=====================================+=================================+
| Build (Linux, MacOS) | |Build Status Linux MacOS|   | |Build Status Linux MacOS 0.10.dev| | |Build Status Linux 0.10.1.rel| |
+----------------------+------------------------------+-------------------------------------+---------------------------------+
| Build (Win)          | |Build Status Win|           | |Build Status Win 0.10.dev|         | |Build Status Win 0.10.1.rel|   |
+----------------------+------------------------------+-------------------------------------+---------------------------------+
| Code Coverage        | |Coverage Status|            | |Coverage Status 0.10.dev|          | |Coverage Status 0.10.1.rel|    |
+----------------------+------------------------------+-------------------------------------+---------------------------------+

Cite the Library
================

If you use the pyclustering library in a scientific paper, please cite it:

Novikov, A., 2019. PyClustering: Data Mining Library. Journal of Open Source Software, 4(36), p.1230. Available at: http://dx.doi.org/10.21105/joss.01230.

BibTeX entry:

.. code::

    @article{Novikov2019,
        doi       = {10.21105/joss.01230},
        url       = {https://doi.org/10.21105/joss.01230},
        year      = 2019,
        month     = {apr},
        publisher = {The Open Journal},
        volume    = {4},
        number    = {36},
        pages     = {1230},
        author    = {Andrei Novikov},
        title     = {{PyClustering}: Data Mining Library},
        journal   = {Journal of Open Source Software}
    }

Brief Overview of the Library Content
=====================================

**Clustering algorithms and methods (module pyclustering.cluster):**

+------------------------+---------+-----+
| Algorithm              | Python  | C++ |
+========================+=========+=====+
| Agglomerative          | ✓       | ✓   |
+------------------------+---------+-----+
| BANG                   | ✓       |     |
+------------------------+---------+-----+
| BIRCH                  | ✓       |     |
+------------------------+---------+-----+
| BSAS                   | ✓       | ✓   |
+------------------------+---------+-----+
| CLARANS                | ✓       |     |
+------------------------+---------+-----+
| CLIQUE                 | ✓       | ✓   |
+------------------------+---------+-----+
| CURE                   | ✓       | ✓   |
+------------------------+---------+-----+
| DBSCAN                 | ✓       | ✓   |
+------------------------+---------+-----+
| Elbow                  | ✓       | ✓   |
+------------------------+---------+-----+
| EMA                    | ✓       |     |
+------------------------+---------+-----+
| Fuzzy C-Means          | ✓       | ✓   |
+------------------------+---------+-----+
| GA (Genetic Algorithm) | ✓       | ✓   |
+------------------------+---------+-----+
| G-Means                | ✓       | ✓   |
+------------------------+---------+-----+
| HSyncNet               | ✓       | ✓   |
+------------------------+---------+-----+
| K-Means                | ✓       | ✓   |
+------------------------+---------+-----+
| K-Means++              | ✓       | ✓   |
+------------------------+---------+-----+
| K-Medians              | ✓       | ✓   |
+------------------------+---------+-----+
| K-Medoids              | ✓       | ✓   |
+------------------------+---------+-----+
| MBSAS                  | ✓       | ✓   |
+------------------------+---------+-----+
| OPTICS                 | ✓       | ✓   |
+------------------------+---------+-----+
| ROCK                   | ✓       | ✓   |
+------------------------+---------+-----+
| Silhouette             | ✓       | ✓   |
+------------------------+---------+-----+
| SOM-SC                 | ✓       | ✓   |
+------------------------+---------+-----+
| SyncNet                | ✓       | ✓   |
+------------------------+---------+-----+
| Sync-SOM               | ✓       |     |
+------------------------+---------+-----+
| TTSAS                  | ✓       | ✓   |
+------------------------+---------+-----+
| X-Means                | ✓       | ✓   |
+------------------------+---------+-----+

**Oscillatory networks and neural networks (module pyclustering.nnet):**

+--------------------------------------------------------------------------------+---------+-----+
| Model                                                                          | Python  | C++ |
+================================================================================+=========+=====+
| CNN (Chaotic Neural Network)                                                   | ✓       |     |
+--------------------------------------------------------------------------------+---------+-----+
| fSync (Oscillatory network based on Landau-Stuart equation and Kuramoto model) | ✓       |     |
+--------------------------------------------------------------------------------+---------+-----+
| HHN (Oscillatory network based on Hodgkin-Huxley model)                        | ✓       | ✓   |
+--------------------------------------------------------------------------------+---------+-----+
| Hysteresis Oscillatory Network                                                 | ✓       |     |
+--------------------------------------------------------------------------------+---------+-----+
| LEGION (Local Excitatory Global Inhibitory Oscillatory Network)                | ✓       | ✓   |
+--------------------------------------------------------------------------------+---------+-----+
| PCNN (Pulse-Coupled Neural Network)                                            | ✓       | ✓   |
+--------------------------------------------------------------------------------+---------+-----+
| SOM (Self-Organized Map)                                                       | ✓       | ✓   |
+--------------------------------------------------------------------------------+---------+-----+
| Sync (Oscillatory network based on Kuramoto model)                             | ✓       | ✓   |
+--------------------------------------------------------------------------------+---------+-----+
| SyncPR (Oscillatory network for pattern recognition)                           | ✓       | ✓   |
+--------------------------------------------------------------------------------+---------+-----+
| SyncSegm (Oscillatory network for image segmentation)                          | ✓       | ✓   |
+--------------------------------------------------------------------------------+---------+-----+

**Graph Coloring Algorithms (module pyclustering.gcolor):**

+------------------------+---------+-----+
| Algorithm              | Python  | C++ |
+========================+=========+=====+
| DSatur                 | ✓       |     |
+------------------------+---------+-----+
| Hysteresis             | ✓       |     |
+------------------------+---------+-----+
| GColorSync             | ✓       |     |
+------------------------+---------+-----+

**Containers (module pyclustering.container):**

+------------------------+---------+-----+
| Container              | Python  | C++ |
+========================+=========+=====+
| KD Tree                | ✓       | ✓   |
+------------------------+---------+-----+
| CF Tree                | ✓       |     |
+------------------------+---------+-----+

Examples in the Library
=======================

The library contains examples for each algorithm and oscillatory network model:

**Clustering examples:** ``pyclustering/cluster/examples``

**Graph coloring examples:** ``pyclustering/gcolor/examples``

**Oscillatory network examples:** ``pyclustering/nnet/examples``

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/example_cluster_place.png
   :alt: Where are examples?

Code Examples
=============

**Data clustering by CURE algorithm**

.. code:: python

    from pyclustering.cluster import cluster_visualizer
    from pyclustering.cluster.cure import cure
    from pyclustering.utils import read_sample
    from pyclustering.samples.definitions import FCPS_SAMPLES

    # Input data in following format [ [0.1, 0.5], [0.3, 0.1], ... ].
    input_data = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

    # Allocate three clusters.
    cure_instance = cure(input_data, 3)
    cure_instance.process()
    clusters = cure_instance.get_clusters()

    # Visualize allocated clusters.
    visualizer = cluster_visualizer()
    visualizer.append_clusters(clusters, input_data)
    visualizer.show()

**Data clustering by K-Means algorithm**

.. code:: python

    from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.samples.definitions import FCPS_SAMPLES
    from pyclustering.utils import read_sample

    # Load list of points for cluster analysis.
    sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)

    # Prepare initial centers using K-Means++ method.
    initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()

    # Create instance of K-Means algorithm with prepared centers.
    kmeans_instance = kmeans(sample, initial_centers)

    # Run cluster analysis and obtain results.
    kmeans_instance.process()
    clusters = kmeans_instance.get_clusters()
    final_centers = kmeans_instance.get_centers()

    # Visualize obtained results
    kmeans_visualizer.show_clusters(sample, clusters, final_centers)
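
``get_clusters()`` returns each cluster as a list of indices into the input sample rather than as a list of points. The sketch below converts such index lists into one label per point and collects the points of each cluster. Both helper names are hypothetical, and the small ``sample`` and ``clusters`` values are made-up stand-ins for real results; the unassigned point mimics algorithms that report noise separately.

.. code:: python

    def clusters_to_labels(clusters, amount_of_points, noise_label=-1):
        """Convert index-list clusters into one label per point."""
        labels = [noise_label] * amount_of_points
        for label, cluster in enumerate(clusters):
            for index in cluster:
                labels[index] = label
        return labels


    def clusters_to_points(clusters, sample):
        """Replace every index in every cluster by the point it refers to."""
        return [[sample[index] for index in cluster] for cluster in clusters]


    # Made-up stand-ins for a sample and the result of get_clusters().
    sample = [[0.1, 0.5], [0.3, 0.1], [4.0, 4.2], [4.1, 3.9], [9.0, 9.0]]
    clusters = [[0, 1], [2, 3]]  # point 4 is assigned to no cluster

    print(clusters_to_labels(clusters, len(sample)))  # [0, 0, 1, 1, -1]
    print(clusters_to_points(clusters, sample)[1])    # [[4.0, 4.2], [4.1, 3.9]]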

**Data clustering by OPTICS algorithm**

.. code:: python

    from pyclustering.cluster import cluster_visualizer
    from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
    from pyclustering.samples.definitions import FCPS_SAMPLES
    from pyclustering.utils import read_sample

    # Read sample for clustering from some file
    sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

    # Run cluster analysis where connectivity radius is bigger than real
    radius = 2.0
    neighbors = 3
    amount_of_clusters = 3
    optics_instance = optics(sample, radius, neighbors, amount_of_clusters)

    # Performs cluster analysis
    optics_instance.process()

    # Obtain results of clustering
    clusters = optics_instance.get_clusters()
    noise = optics_instance.get_noise()
    ordering = optics_instance.get_ordering()

    # Visualize ordering diagram
    analyser = ordering_analyser(ordering)
    ordering_visualizer.show_ordering_diagram(analyser, amount_of_clusters)

    # Visualize clustering results
    visualizer = cluster_visualizer()
    visualizer.append_clusters(clusters, sample)
    visualizer.show()

**Simulation of oscillatory network PCNN**

.. code:: python

    from pyclustering.nnet.pcnn import pcnn_network, pcnn_visualizer

    # Create Pulse-Coupled neural network with 10 oscillators.
    net = pcnn_network(10)

    # Perform simulation during 50 steps using binary external stimulus.
    dynamic = net.simulate(50, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

    # Allocate synchronous ensembles from the output dynamic.
    ensembles = dynamic.allocate_sync_ensembles()

    # Show output dynamic.
    pcnn_visualizer.show_output_dynamic(dynamic, ensembles)

**Simulation of chaotic neural network CNN**

.. code:: python

    from pyclustering.cluster import cluster_visualizer
    from pyclustering.samples.definitions import SIMPLE_SAMPLES
    from pyclustering.utils import read_sample
    from pyclustering.nnet.cnn import cnn_network, cnn_visualizer

    # Load stimulus from file.
    stimulus = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

    # Create chaotic neural network, amount of neurons should be equal to amount of stimulus.
    network_instance = cnn_network(len(stimulus))

    # Perform simulation during 100 steps.
    steps = 100
    output_dynamic = network_instance.simulate(steps, stimulus)

    # Display output dynamic of the network.
    cnn_visualizer.show_output_dynamic(output_dynamic)

    # Display dynamic matrix and observation matrix to show clustering phenomenon.
    cnn_visualizer.show_dynamic_matrix(output_dynamic)
    cnn_visualizer.show_observation_matrix(output_dynamic)

    # Visualize clustering results.
    clusters = output_dynamic.allocate_sync_ensembles(10)
    visualizer = cluster_visualizer()
    visualizer.append_clusters(clusters, stimulus)
    visualizer.show()

Illustrations
=============

**Cluster allocation on FCPS dataset collection by DBSCAN:**

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/fcps_cluster_analysis.png
   :alt: Clustering by DBSCAN

**Cluster allocation by OPTICS using cluster-ordering diagram:**

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/optics_example_clustering.png
   :alt: Clustering by OPTICS

**Partial synchronization (clustering) in Sync oscillatory network:**

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/sync_partial_synchronization.png
   :alt: Partial synchronization in Sync oscillatory network

**Cluster visualization by SOM (Self-Organized Feature Map):**

.. image:: https://github.com/annoviko/pyclustering/blob/master/docs/img/target_som_processing.png
   :alt: Cluster visualization by SOM

.. |Build Status Linux MacOS| image:: https://travis-ci.org/annoviko/pyclustering.svg?branch=master
   :target: https://travis-ci.org/annoviko/pyclustering
.. |Build Status Win| image:: https://ci.appveyor.com/api/projects/status/4uly2exfp49emwn0/branch/master?svg=true
   :target: https://ci.appveyor.com/project/annoviko/pyclustering/branch/master
.. |Coverage Status| image:: https://coveralls.io/repos/github/annoviko/pyclustering/badge.svg?branch=master&ts=1
   :target: https://coveralls.io/github/annoviko/pyclustering?branch=master
.. |DOI| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4280556.svg
   :target: https://doi.org/10.5281/zenodo.4280556
.. |PyPi| image:: https://badge.fury.io/py/pyclustering.svg
   :target: https://badge.fury.io/py/pyclustering
.. |Build Status Linux MacOS 0.10.dev| image:: https://travis-ci.org/annoviko/pyclustering.svg?branch=0.10.dev
   :target: https://travis-ci.org/annoviko/pyclustering
.. |Build Status Win 0.10.dev| image:: https://ci.appveyor.com/api/projects/status/4uly2exfp49emwn0/branch/0.10.dev?svg=true
   :target: https://ci.appveyor.com/project/annoviko/pyclustering/branch/0.10.dev
.. |Coverage Status 0.10.dev| image:: https://coveralls.io/repos/github/annoviko/pyclustering/badge.svg?branch=0.10.dev&ts=1
   :target: https://coveralls.io/github/annoviko/pyclustering?branch=0.10.dev
.. |Build Status Linux 0.10.1.rel| image:: https://travis-ci.org/annoviko/pyclustering.svg?branch=0.10.1.rel
   :target: https://travis-ci.org/annoviko/pyclustering
.. |Build Status Win 0.10.1.rel| image:: https://ci.appveyor.com/api/projects/status/4uly2exfp49emwn0/branch/0.10.1.rel?svg=true
   :target: https://ci.appveyor.com/project/annoviko/pyclustering/branch/0.10.1.rel
.. |Coverage Status 0.10.1.rel| image:: https://coveralls.io/repos/github/annoviko/pyclustering/badge.svg?branch=0.10.1.rel&ts=1
   :target: https://coveralls.io/github/annoviko/pyclustering?branch=0.10.1.rel
.. |Download Counter| image:: https://pepy.tech/badge/pyclustering
   :target: https://pepy.tech/project/pyclustering
.. |JOSS| image:: http://joss.theoj.org/papers/10.21105/joss.01230/status.svg
   :target: https://doi.org/10.21105/joss.01230