Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/benedekrozemberczki/MUSAE
The reference implementation of "Multi-scale Attributed Node Embedding". (Journal of Complex Networks 2021)
- Host: GitHub
- URL: https://github.com/benedekrozemberczki/MUSAE
- Owner: benedekrozemberczki
- License: gpl-3.0
- Created: 2019-04-17T12:57:59.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-09-19T18:34:40.000Z (about 2 years ago)
- Last Synced: 2024-08-01T22:41:50.420Z (4 months ago)
- Topics: aane, asne, asonam, attributed-embedding, deep-learning, deepwalk, embedding, gemsec, gensim, graph-embedding, graph-neural-network, implicit-factorization, musae, network-analysis, network-embedding, node-embedding, node2vec, tadw, walklets, word2vec
- Language: Python
- Homepage: https://karateclub.readthedocs.io/
- Size: 19.4 MB
- Stars: 153
- Watchers: 5
- Forks: 22
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
  - Funding: .github/FUNDING.yml
  - License: LICENSE
Awesome Lists containing this project
- awesome-network-embedding (Python)
README
MUSAE
============
[![Arxiv](https://img.shields.io/badge/ArXiv-1909.13021-orange.svg)](https://arxiv.org/abs/1909.13021) [![codebeat badge](https://codebeat.co/badges/5aef8ac3-08bf-44b9-94ec-929778ec3b94)](https://codebeat.co/projects/github-com-benedekrozemberczki-musae-master) [![repo size](https://img.shields.io/github/repo-size/benedekrozemberczki/MUSAE.svg)](https://github.com/benedekrozemberczki/MUSAE/archive/master.zip) [![benedekrozemberczki](https://img.shields.io/twitter/follow/benrozemberczki?style=social&logo=twitter)](https://twitter.com/intent/follow?screen_name=benrozemberczki)

The reference implementation of **Multi-Scale Attributed Node Embedding. (Journal of Complex Networks 2021)**
### Abstract
We present network embedding algorithms that capture information about a node from the local distribution over node attributes around it, as observed over random walks following an approach similar to Skip-gram. Observations from neighborhoods of different sizes are either pooled (AE) or encoded distinctly in a multi-scale approach (MUSAE). Capturing attribute-neighborhood relationships over multiple scales is useful for a diverse range of applications, including latent feature identification across disconnected networks with similar attributes. We prove theoretically that matrices of node-feature pointwise mutual information are implicitly factorized by the embeddings. Experiments show that our algorithms are robust, computationally efficient and outperform comparable models on social, web and citation network datasets.

The second-order random walk sampling methods were taken from the reference implementation of [Node2Vec](https://github.com/aditya-grover/node2vec).
The datasets are also available on [SNAP](http://snap.stanford.edu/).
The model is now also available in the package [Karate Club](https://github.com/benedekrozemberczki/karateclub).
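To make the pooled (AE) versus multi-scale (MUSAE) distinction concrete, below is a toy sketch (not the repository's code) of how node-feature co-occurrence statistics can be gathered from truncated first-order random walks: AE pools observations from every window offset, while MUSAE keeps a separate table per offset (scale). All names and the tiny graph are illustrative.

```python
import random
from collections import Counter, defaultdict

def random_walk(adjacency, start, length):
    """Sample a truncated first-order random walk."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

def cooccurrences(adjacency, features, walk_number, walk_length, window):
    """Count node-feature co-occurrences per scale (MUSAE) and pooled (AE)."""
    per_scale = defaultdict(Counter)   # scale r -> Counter of (node, feature)
    pooled = Counter()                 # (node, feature) regardless of scale
    for node in adjacency:
        for _ in range(walk_number):
            walk = random_walk(adjacency, node, walk_length)
            for i, source in enumerate(walk):
                for r in range(1, window + 1):
                    if i + r >= len(walk):
                        break
                    for feature in features[walk[i + r]]:
                        per_scale[r][(source, feature)] += 1
                        pooled[(source, feature)] += 1
    return per_scale, pooled

# Tiny illustrative attributed graph.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
features = {0: ["a"], 1: ["a", "b"], 2: ["b"], 3: ["c"]}
per_scale, pooled = cooccurrences(adjacency, features, walk_number=5, walk_length=10, window=3)
print(per_scale[1].most_common(3))  # scale-1 statistics (MUSAE keeps these per scale)
print(pooled.most_common(3))        # pooled statistics (AE merges all scales)
```

The paper shows that the learned embeddings implicitly factorize pointwise mutual information matrices derived from counts of this kind; the sketch only illustrates how the counts arise.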
This repository provides the reference implementations for **MUSAE** and **AE** as described in the paper:
> Multi-Scale Attributed Node Embedding.
> [Benedek Rozemberczki](http://homepages.inf.ed.ac.uk/s1668259/), [Carl Allen](http://homepages.inf.ed.ac.uk/s1577741/), and [Rik Sarkar](https://homepages.inf.ed.ac.uk/rsarkar/).
> [Journal of Complex Networks 2021](https://arxiv.org/abs/1909.13021)

### Table of Contents
1. [Citing](#citing)
2. [Requirements](#requirements)
3. [Datasets](#datasets)
4. [Logging](#logging)
5. [Options](#options)
6. [Examples](#examples)

### Citing
If you find MUSAE useful in your research, please consider citing the following paper:
```bibtex
@article{musae,
author = {Rozemberczki, Benedek and Allen, Carl and Sarkar, Rik},
title = {{Multi-Scale Attributed Node Embedding}},
journal = {Journal of Complex Networks},
volume = {9},
number = {2},
year = {2021},
}
```
### Requirements
The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.
```
networkx 2.4
tqdm 4.28.1
numpy 1.15.4
pandas 0.23.4
texttable 1.5.0
scipy 1.1.0
argparse 1.1.0
gensim 3.6.0
```
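One way to reproduce this environment is a pinned install along the following lines (`argparse` is part of the Python 3 standard library, so it normally does not need to be installed separately):

```sh
$ pip install networkx==2.4 tqdm==4.28.1 numpy==1.15.4 pandas==0.23.4 texttable==1.5.0 scipy==1.1.0 gensim==3.6.0
```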
### Datasets

The repository ships with a default dataset: the chameleon edge list (`input/edges/chameleon_edges.csv`) and node features (`input/features/chameleon_features.json`). Other datasets in the same format can be supplied through the input options below.

### Logging
The models are defined so that parameter settings and runtimes are logged. Specifically, we log the following:
```
1. Hyperparameter settings. We save each hyperparameter used in the experiment.
2. Optimization runtime. We measure the time needed for optimization, in seconds.
3. Sampling runtime. We measure the time needed for sampling, in seconds.
```

### Options
Learning the embedding is handled by the `src/main.py` script which provides the following command line arguments.
#### Input and output options
```
--graph-input STR Input edge list csv. Default is `input/edges/chameleon_edges.csv`.
--features-input STR Input features json. Default is `input/features/chameleon_features.json`.
--output STR Embedding output path. Default is `output/chameleon_embedding.csv`.
--log STR Log output path. Default is `logs/chameleon.json`.
```
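For example, the script can be pointed at the bundled chameleon data explicitly; the paths below simply restate the documented defaults:

```sh
$ python src/main.py --graph-input input/edges/chameleon_edges.csv \
                     --features-input input/features/chameleon_features.json \
                     --output output/chameleon_embedding.csv \
                     --log logs/chameleon.json
```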
#### Random walk options
```
--sampling STR Random walker order (first/second). Default is `first`.
--P FLOAT Return hyperparameter for second-order walk. Default is 1.0.
--Q FLOAT In-out hyperparameter for second-order walk. Default is 1.0.
--walk-number INT Walks per source node. Default is 5.
--walk-length INT Truncated random walk length. Default is 80.
```

#### Model options
```
--model STR Pooled or multi-scale model (AE/MUSAE). Default is `musae`.
--base-model STR Use of Doc2Vec base model. Default is `null`.
--approximation-order INT Matrix powers approximated. Default is 3.
--dimensions INT Number of dimensions. Default is 32.
--down-sampling FLOAT Down-sampling rate of feature frequency. Default is 0.001.
--exponent FLOAT Downsampling exponent of frequency. Default is 0.75.
--alpha FLOAT Initial learning rate. Default is 0.05.
--min-alpha FLOAT Final learning rate. Default is 0.025.
--min-count INT Minimal occurrence of features. Default is 1.
--negative-samples INT Number of negative samples per node. Default is 5.
--workers INT Number of cores used for optimization. Default is 4.
--epochs INT Gradient descent epochs. Default is 5.
```

### Examples
Training a MUSAE model for 10 epochs.
```sh
$ python src/main.py --epochs 10
```
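Switching to second-order (Node2Vec style) sampling with custom return and in-out parameters; the values here are only illustrative:

```sh
$ python src/main.py --sampling second --P 4.0 --Q 0.5
```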
Changing the dimension size.
```sh
$ python src/main.py --dimensions 32
```
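The embedding is written to the CSV given by `--output`. A minimal sketch for loading it, assuming the first column holds the node identifier and the remaining columns hold the embedding dimensions:

```python
import pandas as pd

# Load the embedding written by src/main.py (path is the documented default).
embedding = pd.read_csv("output/chameleon_embedding.csv")

# Assumption: first column is the node id, the rest are embedding dimensions.
node_ids = embedding.iloc[:, 0]
vectors = embedding.iloc[:, 1:].to_numpy()
print(vectors.shape)
```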
--------------------------------------------------------------------------------

**License**
- [GNU](https://github.com/benedekrozemberczki/MUSAE/blob/master/LICENSE)