Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eliorc/node2vec
Implementation of the node2vec algorithm.
https://github.com/eliorc/node2vec
deep-learning embeddings machine-learning-algorithms
Last synced: 25 days ago
JSON representation
Implementation of the node2vec algorithm.
- Host: GitHub
- URL: https://github.com/eliorc/node2vec
- Owner: eliorc
- License: mit
- Created: 2017-12-08T13:30:06.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2024-08-02T11:14:24.000Z (3 months ago)
- Last Synced: 2024-10-01T15:01:50.765Z (about 1 month ago)
- Topics: deep-learning, embeddings, machine-learning-algorithms
- Language: Python
- Size: 88.9 KB
- Stars: 1,215
- Watchers: 20
- Forks: 247
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Node2Vec
[![Downloads](http://pepy.tech/badge/node2vec)](http://pepy.tech/project/node2vec)Python3 implementation of the node2vec algorithm Aditya Grover, Jure Leskovec and Vid Kocijan.
[node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.](https://snap.stanford.edu/node2vec/)# Maintenance
### I no longer have time to maintain this, if someone wants to pick the baton let me know
## Installation
`pip install node2vec`
## Usage
```python
import networkx as nx
from node2vec import Node2Vec# Create a graph
graph = nx.fast_gnp_random_graph(n=100, p=0.5)# Precompute probabilities and generate walks - **ON WINDOWS ONLY WORKS WITH workers=1**
node2vec = Node2Vec(graph, dimensions=64, walk_length=30, num_walks=200, workers=4) # Use temp_folder for big graphs# Embed nodes
model = node2vec.fit(window=10, min_count=1, batch_words=4) # Any keywords acceptable by gensim.Word2Vec can be passed, `dimensions` and `workers` are automatically passed (from the Node2Vec constructor)# Look for most similar nodes
model.wv.most_similar('2') # Output node names are always strings# Save embeddings for later use
model.wv.save_word2vec_format(EMBEDDING_FILENAME)# Save model for later use
model.save(EMBEDDING_MODEL_FILENAME)# Embed edges using Hadamard method
from node2vec.edges import HadamardEmbedderedges_embs = HadamardEmbedder(keyed_vectors=model.wv)
# Look for embeddings on the fly - here we pass normal tuples
edges_embs[('1', '2')]
''' OUTPUT
array([ 5.75068220e-03, -1.10937878e-02, 3.76693785e-01, 2.69105062e-02,
... ... ....
..................................................................],
dtype=float32)
'''# Get all edges in a separate KeyedVectors instance - use with caution could be huge for big networks
edges_kv = edges_embs.as_keyed_vectors()# Look for most similar edges - this time tuples must be sorted and as str
edges_kv.most_similar(str(('1', '2')))# Save embeddings for later use
edges_kv.save_word2vec_format(EDGES_EMBEDDING_FILENAME)```
### Parameters
#### `node2vec.Node2vec`
- `Node2Vec` constructor:
1. `graph`: The first positional argument has to be a networkx graph. Node names must be all integers or all strings. On the output model they will always be strings.
2. `dimensions`: Embedding dimensions (default: 128)
3. `walk_length`: Number of nodes in each walk (default: 80)
4. `num_walks`: Number of walks per node (default: 10)
5. `p`: Return hyper parameter (default: 1)
6. `q`: Input parameter (default: 1)
7. `weight_key`: On weighted graphs, this is the key for the weight attribute (default: 'weight')
8. `workers`: Number of workers for parallel execution (default: 1)
9. `sampling_strategy`: Node specific sampling strategies, supports setting node specific 'q', 'p', 'num_walks' and 'walk_length'.
Use these keys exactly. If not set, will use the global ones which were passed on the object initialization`
10. `quiet`: Boolean controlling the verbosity. (default: False)
11. `temp_folder`: String path pointing to folder to save a shared memory copy of the graph - Supply when working on graphs that are too big to fit in memory during algorithm execution.
12. `seed`: Seed for the random number generator (default: None). Deterministic results can be obtained if seed is set and `workers=1`.- `Node2Vec.fit` method:
Accepts any key word argument acceptable by gensim.Word2Vec#### `node2vec.EdgeEmbedder`
`EdgeEmbedder` is an abstract class which all the concrete edge embeddings class inherit from.
The classes are `AverageEmbedder`, `HadamardEmbedder`, `WeightedL1Embedder` and `WeightedL2Embedder` which their practical definition could be found in the [paper](https://arxiv.org/pdf/1607.00653.pdf) on table 1
Notice that edge embeddings are defined for any pair of nodes, connected or not and even node with itself.- Constructor:
1. `keyed_vectors`: A gensim.models.KeyedVectors instance containing the node embeddings
2. `quiet`: Boolean controlling the verbosity. (default: False)- `EdgeEmbedder.__getitem__(item)` method, better known as `EdgeEmbedder[item]`:
1. `item` - A tuple consisting of 2 nodes from the `keyed_vectors` passed in the constructor. Will return the embedding of the edge.- `EdgeEmbedder.as_keyed_vectors` method: Returns a `gensim.models.KeyedVectors` instance with all possible node pairs in a *sorted* manner as string.
For example, for nodes ['1', '2', '3'] we will have as keys "('1', '1')", "('1', '2')", "('1', '3')", "('2', '2')", "('2', '3')" and "('3', '3')".## Caveats
- Node names in the input graph must be all strings, or all ints
- Parallel execution not working on Windows (`joblib` known issue). To run non-parallel on Windows pass `workers=1` on the `Node2Vec`'s constructor## TODO
- [x] Parallel implementation for walk generation
- [ ] Parallel implementation for probability precomputation