https://github.com/benedekrozemberczki/splitter
A PyTorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).
clustering community-detection deep-learning deep-neural-network deepwalk ego-splitting factorization gensim graph-embedding graph-neural-network graph-representation-learning implicit-factorization machine-learning network-embedding node-embedding node2vec overlapping-community-detection pytorch word-vector word2vec
- Host: GitHub
- URL: https://github.com/benedekrozemberczki/splitter
- Owner: benedekrozemberczki
- License: gpl-3.0
- Created: 2019-03-17T16:08:17.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-06-06T22:50:48.000Z (over 1 year ago)
- Last Synced: 2025-02-08T14:33:51.747Z (13 days ago)
- Topics: clustering, community-detection, deep-learning, deep-neural-network, deepwalk, ego-splitting, factorization, gensim, graph-embedding, graph-neural-network, graph-representation-learning, implicit-factorization, machine-learning, network-embedding, node-embedding, node2vec, overlapping-community-detection, pytorch, word-vector, word2vec
- Language: Python
- Homepage:
- Size: 11.4 MB
- Stars: 212
- Watchers: 10
- Forks: 44
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Splitter [[arXiv]](https://arxiv.org/pdf/1905.02138.pdf) [[Source archive]](https://github.com/benedekrozemberczki/Splitter/archive/master.zip)
======================
A **PyTorch** implementation of **Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019).**
### Abstract
Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

This repository provides a PyTorch implementation of Splitter as described in the paper:
> Splitter: Learning Node Representations that Capture Multiple Social Contexts.
> Alessandro Epasto and Bryan Perozzi.
> WWW, 2019.
> [[Paper]](http://epasto.org/papers/www2019splitter.pdf)

The original TensorFlow implementation is available [[here]](https://github.com/google-research/google-research/tree/master/graph_embedding/persona).
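
The ego-network decomposition mentioned in the abstract can be illustrated with a small standard-library sketch. It assumes that the local communities of a node are the connected components of its ego-network with the node itself removed, which is only one possible clustering choice. The function names `connected_components` and `persona_graph` are hypothetical and are not taken from this repository, and the graph is assumed to have no self-loops.

```python
from collections import defaultdict


def connected_components(nodes, adjacency):
    """Yield the connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen = set()
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            # Only follow edges that stay inside the induced subgraph.
            stack.extend((adjacency[node] & nodes) - component)
        seen |= component
        yield component


def persona_graph(edges):
    """Split every node into one persona per local community (illustrative only)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    personas = {}  # original node -> list of persona IDs
    owner = {}     # (ego, neighbor) -> persona of ego whose community holds neighbor
    next_id = 0
    for ego in sorted(adjacency):
        personas[ego] = []
        # Assumption: communities are components of the ego-network minus the ego.
        for component in connected_components(adjacency[ego], adjacency):
            for neighbor in component:
                owner[(ego, neighbor)] = next_id
            personas[ego].append(next_id)
            next_id += 1
    # Each original edge connects the two personas that "own" its endpoints.
    persona_edges = {tuple(sorted((owner[(u, v)], owner[(v, u)]))) for u, v in edges}
    return personas, sorted(persona_edges)


# Two triangles sharing node 2, so node 2 takes part in two local communities.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
personas, persona_edges = persona_graph(toy_edges)
print(personas[2])      # node 2 is split into two personas
print(persona_edges)    # the persona graph falls apart into two triangles
```

In this toy graph the shared node receives one persona per triangle, so each persona can later be embedded separately.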
### Requirements
The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.
```
networkx 1.11
tqdm 4.28.1
numpy 1.15.4
pandas 0.23.4
texttable 1.5.0
scipy 1.1.0
argparse 1.1.0
torch 1.1.0
gensim 3.6.0
```
### Datasets
The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/` directory.
### Outputs
The embeddings are saved in the `output/` directory by default. Each embedding file has a header row and a column with the node IDs, and its rows are sorted by the node ID column.
### Options
The training of a Splitter embedding is handled by the `src/main.py` script, which provides the following command line arguments.
#### Input and output options
```
  --edge-path               STR    Edge list csv.           Default is `input/chameleon_edges.csv`.
  --embedding-output-path   STR    Embedding output csv.    Default is `output/chameleon_embedding.csv`.
  --persona-output-path     STR    Persona mapping JSON.    Default is `output/chameleon_personas.json`.
```
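
The edge list format described under Datasets can be checked before training with a short standard-library sketch like the one below. The header names `id1,id2`, the in-memory sample, and the function name `read_edge_list` are assumptions made for illustration; the repository's own loader may work differently.

```python
import csv
import io


def read_edge_list(handle):
    """Read a comma-separated edge list whose first row is a header."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    edges = [(int(row[0]), int(row[1])) for row in reader if row]
    nodes = {node for edge in edges for node in edge}
    # The README asks for node IDs indexed from 0.
    if nodes and min(nodes) != 0:
        raise ValueError("Node IDs should start at 0.")
    return edges, len(nodes)


# Hypothetical sample with the same layout as an edge list csv.
sample = io.StringIO("id1,id2\n0,1\n0,2\n1,2\n2,3\n")
edges, node_count = read_edge_list(sample)
print(len(edges), node_count)
```

To check a file on disk, the same function can be called with a handle opened as `open("input/chameleon_edges.csv", newline="")`.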
#### Model options
```
  --seed               INT     Random seed.                        Default is 42.
  --number-of-walks    INT     Number of random walks per node.    Default is 10.
  --window-size        INT     Skip-gram window size.              Default is 5.
  --negative-samples   INT     Number of negative samples.         Default is 5.
  --walk-length        INT     Random walk length.                 Default is 40.
  --lambd              FLOAT   Regularization parameter.           Default is 0.1.
  --dimensions         INT     Number of embedding dimensions.     Default is 128.
  --workers            INT     Number of cores for pre-training.   Default is 4.
  --learning-rate      FLOAT   SGD learning rate.                  Default is 0.025.
```
--------------------------------------------------------------------------------
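
To make the roles of `--number-of-walks` and `--walk-length` concrete, here is a minimal sketch of uniform random walks in the DeepWalk style. It assumes first-order walks with uniformly sampled neighbors; the walker used in this repository may differ, and `random_walks` and the toy adjacency list are hypothetical.

```python
import random


def random_walks(adjacency, number_of_walks, walk_length, seed=42):
    """Generate `number_of_walks` uniform random walks from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(number_of_walks):
        for start in sorted(adjacency):
            walk = [start]
            # Extend the walk until it reaches the requested length
            # or arrives at a node without neighbors.
            while len(walk) < walk_length and adjacency[walk[-1]]:
                walk.append(rng.choice(adjacency[walk[-1]]))
            walks.append(walk)
    return walks


# Toy graph as an adjacency list; every node has at least one neighbor.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adjacency, number_of_walks=10, walk_length=40)
print(len(walks), len(walks[0]))
```

With the default values this produces 10 walks of 40 nodes per starting node, so the number of training sequences grows linearly with the number of walks, and the skip-gram window then slides over each sequence.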
### Examples
The following commands learn an embedding and save it with the persona map.
Training a model on the default dataset.
```
python src/main.py
```
Training a Splitter model with 32 dimensions.
```
python src/main.py --dimensions 32
```
Increasing the number of walks and the walk length.
```
python src/main.py --number-of-walks 20 --walk-length 80
```
--------------------------------------------------------------------------------
**License**
- [GNU License](https://github.com/benedekrozemberczki/Splitter/blob/master/LICENSE)
--------------------------------------------------------------------------------