https://github.com/adriacabeza/graphclustering
:milky_way: Method to partition large networks into communities
clustering-methods graphs large-network python3 spectral-clustering
- Host: GitHub
- URL: https://github.com/adriacabeza/graphclustering
- Owner: adriacabeza
- Created: 2019-11-08T17:04:47.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-10-03T21:39:03.000Z (over 1 year ago)
- Last Synced: 2025-01-11T03:41:52.295Z (5 months ago)
- Topics: clustering-methods, graphs, large-network, python3, spectral-clustering
- Language: TeX
- Homepage:
- Size: 121 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
README
# :milky_way: Graph Clustering into communities
We use the following graphs from the [Stanford Network Analysis Project (SNAP)](http://snap.stanford.edu/data/index.html): ca-GrQc, Oregon-1, roadNet-CA, soc-Epinions1, and web-NotreDame. The project description is in [*project.pdf*](./project.pdf) and the final report is in [*report.pdf*](./report/report.pdf).
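SNAP distributes its graphs as plain-text edge lists: comment lines starting with `#`, followed by one whitespace-separated `u v` vertex pair per line. As a minimal stdlib-only sketch (not the repository's own loader, whose expected input format may differ), such a file can be read into an undirected adjacency map. `load_edge_list` and `graph_size` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def load_edge_list(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from SNAP-style edge-list lines.

    Lines starting with '#' are treated as comments; every other non-empty
    line is read as two whitespace-separated integer vertex ids "u v".
    """
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        u, v = map(int, line.split()[:2])
        if u == v:
            continue  # drop self-loops
        # Store each edge in both directions; duplicates collapse in the sets.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def graph_size(adjacency: Dict[int, Set[int]]) -> Tuple[int, int]:
    """Return (#vertices, #undirected edges) of an adjacency map."""
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return len(adjacency), n_edges
```

For example, `with open("ca-GrQc.txt") as f: adjacency = load_edge_list(f)` (the file name is illustrative). Vertices that only appear in self-loops are not included, because the map is built from edges.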
## Initial example visualization and clustering of the graph ca-GrQc
Kamada-Kawai visualization of the ca-GrQc graph, and its clustering using the spectral embedding.
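Spectral clustering embeds each vertex using eigenvectors of the graph Laplacian. Its simplest case splits the graph in two by the sign of the Fiedler vector, the eigenvector of the second smallest Laplacian eigenvalue (the *second* argument listed under Usage appears to select a variant of this). The following is a stdlib-only illustration, assuming a connected undirected graph with at least two vertices given as an adjacency map. It uses power iteration in place of the sparse eigensolver a real pipeline would more likely use (for example `scipy.sparse.linalg.eigsh`), and `fiedler_bisection` is a hypothetical name, not the repository's code.

```python
from typing import Dict, Set, Tuple


def fiedler_bisection(adjacency: Dict[int, Set[int]],
                      iterations: int = 500) -> Tuple[Set[int], Set[int]]:
    """Split a connected graph in two by the sign of its Fiedler vector.

    Runs power iteration on M = c*I - L, where L = D - A is the
    unnormalized Laplacian and c = 2 * max_degree is an upper bound on
    L's largest eigenvalue, so M is positive semidefinite. The constant
    vector is L's eigenvector for eigenvalue 0; projecting it out on every
    step makes the iteration converge to the eigenvector of L's second
    smallest eigenvalue (the Fiedler vector).
    """
    nodes = sorted(adjacency)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    c = 2.0 * max(len(adjacency[v]) for v in nodes)

    # Deterministic, non-constant start vector (assumed not to be
    # orthogonal to the Fiedler vector).
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant component
        # y = M x = (c - deg(v)) * x_v + sum of x_u over neighbours u of v
        y = [
            (c - len(adjacency[v])) * x[index[v]]
            + sum(x[index[u]] for u in adjacency[v])
            for v in nodes
        ]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]

    positive = {v for v in nodes if x[index[v]] >= 0}
    return positive, set(nodes) - positive
```

On two triangles joined by a single edge, the two triangles end up on opposite sides. Convergence slows when the second and third smallest eigenvalues are close, and a sign split only yields two parts; k-way clustering needs several eigenvectors plus a clustering step.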
## Statistics of graph datasets
| Graph | #vertices | #edges | #clusters |
|---------------|-----------|---------|-----------|
| ca-GrQc | 4158 | 13428 | 2 |
| Oregon-1 | 10670 | 22002 | 5 |
| soc-Epinions1 | 75877 | 405739 | 10 |
| web-NotreDame | 325729 | 1117563 | 20 |
| roadNet-CA | 1957027 | 2760388 | 50 |
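The #clusters column gives the target number of communities k for each graph. In a k-way spectral approach, each vertex becomes a point (its row in the kept eigenvectors), and these points are grouped into k clusters. A minimal pure-Python sketch of that last step is a k-means whose distance can be switched between the three metric names listed under Usage for *custom_kmeans*. These are assumptions, not taken from the repository: the Minkowski order defaults to `p = 3`, and when random initialization is off the sketch uses farthest-first seeding. `kmeans` and `distance` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def distance(a: Point, b: Point, metric: str, p: float = 3.0) -> float:
    """Distance between two points; p is the (assumed) Minkowski order."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if metric == "EUCLIDEAN":
        return sum(d * d for d in diffs) ** 0.5
    if metric == "CHEBYSHEV":
        return max(diffs)
    if metric == "MINKOWSKI":
        return sum(d ** p for d in diffs) ** (1.0 / p)
    raise ValueError(f"unknown metric: {metric}")


def kmeans(points: List[Point], k: int, metric: str = "EUCLIDEAN",
           random_centroids: bool = False, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style k-means with a pluggable distance metric."""
    if random_centroids:
        # Random initialization: k distinct input points, seeded.
        centroids = [list(pt) for pt in random.Random(seed).sample(points, k)]
    else:
        # Farthest-first initialization: start from the first point, then
        # repeatedly add the point farthest from all chosen centroids.
        centroids = [list(points[0])]
        while len(centroids) < k:
            far = max(points, key=lambda pt: min(distance(pt, c, metric)
                                                 for c in centroids))
            centroids.append(list(far))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen metric.
        new_labels = [min(range(k), key=lambda j: distance(pt, centroids[j], metric))
                      for pt in points]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
        # Update step: coordinate-wise mean of each cluster. The mean is the
        # exact minimizer only for EUCLIDEAN; for the others it is a heuristic.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `labels, _ = kmeans(rows, k=2, metric="CHEBYSHEV")` clusters embedding rows `rows` into two groups. Farthest-first seeding makes runs deterministic, while random seeding with several seeds mirrors the idea of repeating the clustering over different seeds.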
## Run it
### Requirements
Python 3. Install the dependencies with:
```bash
pip install -r requirements.txt
```
### Recommendations
Using [virtualenv](https://realpython.com/blog/python/python-virtual-environments-a-primer/) is recommended to isolate the package libraries and the runtime.
### Usage
Run the clustering algorithm with the main Python file, *graph_clustering.py*. The script's argument help describes each option, and *EXPERIMENTS.sh* contains example commands. Arguments:
- *seed*: Random seed.
- *iterations*: Number of iterations, each with a different seed.
- *file*: Path of the input graph file.
- *outputs_path*: Path to save the outputs.
- *clustering*: Use "kmeans", "custom_kmeans", "kmeans_sklearn", "xmeans" or "agglomerative".
- *random_centroids*: Use random centroid initialization for "custom_kmeans".
- *distance_metric*: Distance metric for "custom_kmeans": "MINKOWSKI", "CHEBYSHEV", "EUCLIDEAN".
- *compute_eig*: Compute the eigenvectors, or load previously computed ones.
- *k*: Number of desired clusters.
- *networkx*: Use the networkx library to compute the Laplacian.
- *eig_kept*: Number of eigenvectors kept.
- *normalize_laplacian*: Normalize the Laplacian.
- *invert_laplacian*: Invert the Laplacian.
- *second*: Use only the second smallest eigenvector.
- *eig_normalization*: Normalization of the eigenvectors: "vertex", "eig" or "None".
## Authors
👤 Álvaro Orgaz Expósito ([alvarorgaz](https://github.com/alvarorgaz))
👤 Adrià Cabeza ([adriacabeza](https://github.com/adriacabeza))