https://github.com/filipinascimento/tpsimilarity
TP Similarity is a Python package designed to compute Transition Probability (TP) similarities between nodes in a network. This package provides various methods to estimate these similarities, including exact computation, estimated methods, and node2vec-based cosine similarity.
- Host: GitHub
- URL: https://github.com/filipinascimento/tpsimilarity
- Owner: filipinascimento
- License: mit
- Created: 2024-08-20T06:39:25.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-11-05T17:14:25.000Z (about 2 months ago)
- Last Synced: 2024-11-05T18:27:03.338Z (about 2 months ago)
- Language: Python
- Size: 52.7 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# TP Similarity
**TP Similarity** is a Python package for computing Transition Probability (TP) similarities between nodes in a network. It offers various methods to estimate these similarities, including exact computation, estimation via random walks, shortest paths, and node2vec-based cosine similarity.
[![Build Status](https://github.com/filipinascimento/tpsimilarity/actions/workflows/test.yml/badge.svg)](https://github.com/filipinascimento/tpsimilarity/actions/workflows/test.yml)
[![Coverage Status](https://coveralls.io/repos/github/filipinascimento/tpsimilarity/badge.svg?branch=main)](https://coveralls.io/github/filipinascimento/tpsimilarity?branch=main)
## Table of Contents
- [Overview](#overview)
- [Installation](#installation)
- [Features](#features)
- [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Importing the Package](#importing-the-package)
  - [Example Usage](#example-usage)
    - [Compute Exact Transition Probabilities (TP)](#1-compute-exact-transition-probabilities-tp)
    - [Compute Estimated Transition Probabilities](#2-compute-estimated-transition-probabilities)
    - [Compute Node2Vec Similarity](#3-compute-node2vec-similarity)
    - [Compute Shortest Paths Transition Probabilities](#4-compute-shortest-paths-transition-probabilities)
  - [Parameters](#parameters)
- [Examples](#examples)
- [Contributing](#contributing)
- [Authors](#authors)
- [License](#license)
## Overview
TP similarity is a measure designed for papers and authors, simulating a literature search procedure on citation networks. Inspired by information retrieval concepts, this approach does not rely on curated classification systems, avoids clustering complexities, and provides a continuous measure of similarity between nodes. By implementing the TP similarity measure, researchers can approximate the research interest similarity of individual scientists using publication-level information.
The package accompanies the paper:
**Varga, Attila, Sadamori Kojaku, and Filipi N. Silva. "Measuring Research Interest Similarity with Transition Probabilities."** *arXiv preprint arXiv:2409.18240* (2024). [Available on arXiv](https://arxiv.org/abs/2409.18240)
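To make the idea of a transition probability concrete, the following is a minimal sketch, not the package's implementation. It assumes that the quantity of interest is the probability that a simple random walk starting at a source node sits on a target node after a fixed number of steps; the package's own definition may differ (for example in how it aggregates over steps or normalizes by degree). The helper names `transition_matrix` and `t_step_probabilities` are hypothetical and exist only for this illustration.

```python
import numpy as np


def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix into one-step transition probabilities."""
    adjacency = np.asarray(adjacency, dtype=float)
    out_degree = adjacency.sum(axis=1, keepdims=True)
    # Rows of nodes without outgoing edges stay all-zero instead of dividing by zero.
    return np.divide(
        adjacency,
        out_degree,
        out=np.zeros_like(adjacency),
        where=out_degree > 0,
    )


def t_step_probabilities(adjacency, sources, targets, steps):
    """Probability of being at each target after `steps` steps from each source."""
    one_step = transition_matrix(adjacency)
    multi_step = np.linalg.matrix_power(one_step, steps)
    # Keep only the rows of the sources and the columns of the targets.
    return multi_step[np.ix_(sources, targets)]


# Illustrative input (an assumption, not data from the paper): the path graph 0 - 1 - 2.
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
probabilities = t_step_probabilities(path_graph, sources=[0], targets=[0, 2], steps=2)
print(probabilities)
```

In this toy graph a walker leaving node 0 must step to node 1 and then moves to node 0 or node 2 with equal probability, so the two printed entries are both one half.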
## Installation
Install the package using pip:
```bash
pip install tpsimilarity
```
## Features
- **Exact Transition Probabilities (TP):** Computes the exact transition probabilities between nodes in a graph.
- **Estimated Transition Probabilities:** Estimates transition probabilities using random walks.
- **Shortest Paths Transition Probabilities:** Computes transition probabilities along the shortest paths.
- **Node2Vec Similarity:** Computes cosine similarity between node embeddings generated by node2vec.
## Getting Started
### Prerequisites
- **Python**: Version 3.6 or higher
- **Dependencies**:
  - `numpy`
  - `scipy`
  - `gensim`
  - `tqdm`
  - `joblib`
  - `igraph`
  - (optional) `networkx`

Install the dependencies using:
```bash
pip install numpy scipy networkx gensim tqdm joblib igraph
```
### Importing the Package
```python
from tpsimilarity import similarity
```
### Example Usage
#### 1. Compute Exact Transition Probabilities (TP)
```python
import networkx as nx
import igraph as ig
from tpsimilarity import similarity

# Create or load your graph
G = nx.karate_club_graph()

# Convert NetworkX graph to iGraph
G = ig.Graph.from_networkx(G)

# Define sources and targets
sources = [0, 1, 2]  # Source nodes
targets = [3, 4, 5]  # Target nodes

# Compute exact TP similarities
tp_sim = similarity.TP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)

# tp_sim contains the similarity matrix or list based on return_type
```
#### 2. Compute Estimated Transition Probabilities
```python
# Estimate TP similarities using random walks
estimated_tp = similarity.estimatedTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5,
    walks_per_source=1000,
    batch_size=100,
    return_type="matrix",
    degreeNormalization=True,
    progressBar=True
)
```
#### 3. Compute Node2Vec Similarity
```python
# Compute node2vec-based cosine similarities
node2vec_sim = similarity.node2vec(
    graph=G,
    sources=sources,
    targets=targets,
    dimensions=64,
    window_length=40,
    context_size=5,
    workers=4,
    batch_walks=100,
    return_type="matrix",
    progressBar=True
)
```
#### 4. Compute Shortest Paths Transition Probabilities
```python
# Compute TP similarities along shortest paths
sp_tp = similarity.shortestPathsTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)
```
### Parameters
- **graph** (`networkx.Graph` or `igraph.Graph`): The graph on which to compute the similarities.
- **sources** (`list`): List of source node indices.
- **targets** (`list`): List of target node indices.
- **window_length** (`int`): The length of the random walks.
- **return_type** (`str`, optional): The format of the output (`"list"`, `"matrix"`, or `"dict"`). Default is `"matrix"`.
- **degreeNormalization** (`bool`, optional): Whether to normalize by the degree of the target node. Default is `True`.
- **dimensions** (`int`, optional): Number of dimensions for node embeddings in node2vec. Default is `64`.
- **context_size** (`int`, optional): Context size for the node2vec algorithm. Default is `10`.
- **workers** (`int`, optional): Number of parallel workers for node2vec. Default is `4`.
- **batch_walks** (`int`, optional): Number of walks per batch for node2vec. Default is `10000`.
- **progressBar** (`bool` or `tqdm`, optional): Whether to display a progress bar during computation. Default is `True`.
## Examples
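As a small illustration of the node2vec-based option, the sketch below computes cosine similarities between source and target rows of an embedding matrix. It is an assumption-laden example rather than the package's code: the embeddings here are hand-written placeholders (a real run would obtain them from node2vec with the chosen `dimensions`), and the helper `cosine_similarity_matrix` is a hypothetical name.

```python
import numpy as np


def cosine_similarity_matrix(embeddings, sources, targets):
    """Cosine similarity between the embedding rows of sources and targets."""
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against zero vectors so they yield similarity 0 instead of NaN.
    unit_vectors = embeddings / np.where(norms == 0, 1.0, norms)
    return unit_vectors[sources] @ unit_vectors[targets].T


# Placeholder two-dimensional embeddings for three nodes (illustrative values only).
toy_embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 0.0],
]
cosine_sim = cosine_similarity_matrix(toy_embeddings, sources=[0], targets=[1, 2])
print(cosine_sim)
```

The result has one row per source and one column per target, which is the layout one would expect from a matrix-style return value; whether the package orders its output the same way is not confirmed here.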
You can find more examples and tutorials in the [examples directory](examples/) or in the [Jupyter notebooks](notebooks/) provided.
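For intuition about estimating transition probabilities with random walks, here is a minimal sketch using only the Python standard library. It assumes the estimate is the fraction of walks from a source that end on each node after a fixed number of steps; the package's estimator may differ in batching, normalization, and output format. The function `estimate_transition_probabilities` and the adjacency-list input are hypothetical and used only for this illustration.

```python
import random
from collections import Counter


def estimate_transition_probabilities(neighbors, source, steps, walks_per_source=1000, seed=0):
    """Estimate where a random walk from `source` ends after `steps` steps."""
    rng = random.Random(seed)  # Local generator so results are reproducible.
    end_counts = Counter()
    for _ in range(walks_per_source):
        node = source
        for _ in range(steps):
            if not neighbors[node]:
                break  # Dead end: the walker stays where it is.
            node = rng.choice(neighbors[node])
        end_counts[node] += 1
    # Convert end-point counts into relative frequencies.
    return {node: count / walks_per_source for node, count in end_counts.items()}


# Illustrative input (an assumption): the path graph 0 - 1 - 2 as an adjacency list.
path_neighbors = {0: [1], 1: [0, 2], 2: [1]}
one_step = estimate_transition_probabilities(path_neighbors, source=0, steps=1)
two_steps = estimate_transition_probabilities(path_neighbors, source=0, steps=2, walks_per_source=2000)
print(one_step)
print(two_steps)
```

With more walks per source the relative frequencies move closer to the exact probabilities, at the cost of longer running time, which is the usual trade-off behind a parameter such as `walks_per_source`.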
## Authors
- **Attila Varga**
- **Sadamori Kojaku**
- **Filipi N. Silva**
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.