https://github.com/filipinascimento/tpsimilarity

TP Similarity is a Python package designed to compute Transition Probability (TP) similarities between nodes in a network. This package provides various methods to estimate these similarities, including exact computation, estimated methods, and node2vec-based cosine similarity.

Host: GitHub
URL: https://github.com/filipinascimento/tpsimilarity
Owner: filipinascimento
License: mit
Created: 2024-08-20T06:39:25.000Z (4 months ago)
Default Branch: main
Last Pushed: 2024-11-05T17:14:25.000Z (about 2 months ago)
Last Synced: 2024-11-05T18:27:03.338Z (about 2 months ago)
Language: Python
Size: 52.7 KB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# TP Similarity

**TP Similarity** is a Python package for computing Transition Probability (TP) similarities between nodes in a network. It offers various methods to estimate these similarities, including exact computation, estimation via random walks, shortest paths, and node2vec-based cosine similarity.

[![Build Status](https://github.com/filipinascimento/tpsimilarity/actions/workflows/test.yml/badge.svg)](https://github.com/filipinascimento/tpsimilarity/actions/workflows/test.yml)

[![Coverage Status](https://coveralls.io/repos/github/filipinascimento/tpsimilarity/badge.svg?branch=main)](https://coveralls.io/github/filipinascimento/tpsimilarity?branch=main)

## Table of Contents

- [Overview](#overview)

- [Installation](#installation)

- [Features](#features)

- [Getting Started](#getting-started)

  - [Prerequisites](#prerequisites)

  - [Importing the Package](#importing-the-package)

  - [Example Usage](#example-usage)

    - [Compute Exact Transition Probabilities (TP)](#1-compute-exact-transition-probabilities-tp)

    - [Compute Estimated Transition Probabilities](#2-compute-estimated-transition-probabilities)

    - [Compute Node2Vec Similarity](#3-compute-node2vec-similarity)

    - [Compute Shortest Paths Transition Probabilities](#4-compute-shortest-paths-transition-probabilities)

  - [Parameters](#parameters)

- [Examples](#examples)

- [Authors](#authors)

- [License](#license)

## Overview

TP similarity is a measure designed for papers and authors, simulating a literature search procedure on citation networks. Inspired by information retrieval concepts, this approach does not rely on curated classification systems, avoids clustering complexities, and provides a continuous measure of similarity between nodes. By implementing the TP similarity measure, researchers can approximate the research interest similarity of individual scientists using publication-level information.

The package accompanies the paper:

**Varga, Attila, Sadamori Kojaku, and Filipi N. Silva. "Measuring Research Interest Similarity with Transition Probabilities."** *arXiv preprint arXiv:2409.18240* (2024). [Available on arXiv](https://arxiv.org/abs/2409.18240)

## Installation

Install the package using pip:

```bash
pip install tpsimilarity
```

## Features

- **Exact Transition Probabilities (TP):** Computes the exact transition probabilities between nodes in a graph.

- **Estimated Transition Probabilities:** Estimates transition probabilities using random walks.

- **Shortest Paths Transition Probabilities:** Computes transition probabilities along the shortest paths.

- **Node2Vec Similarity:** Computes cosine similarity between node embeddings generated by node2vec.
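To make the exact computation more concrete, the sketch below assumes one plausible reading of "transition probability": the probability that a walker starting at a source is at a target after `t` steps, averaged over `t = 1 .. window_length`, using the row-normalized adjacency matrix `P = D^-1 A`, and optionally divided by the target's degree (mirroring the `degreeNormalization` option). This definition is an assumption for illustration only; the exact formulation used by `similarity.TP` is given in the paper. The function `exact_tp_sketch` is hypothetical, uses dense matrices, and is only suitable for small, unweighted, undirected graphs without isolated nodes.

```python
import numpy as np

def exact_tp_sketch(adjacency, sources, targets, window_length, degree_normalization=True):
    """Hypothetical sketch (not the tpsimilarity implementation).

    Averages the t-step transition matrices P^t for t = 1..window_length,
    where P = D^-1 A, and optionally divides each column by the target degree.
    Assumes a dense, unweighted, undirected adjacency matrix with no isolated nodes.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    P = A / degrees[:, None]          # row-stochastic transition matrix
    Pt = np.eye(A.shape[0])           # P^0 (identity)
    acc = np.zeros_like(A)
    for _ in range(window_length):
        Pt = Pt @ P                   # advance to P^t
        acc += Pt                     # accumulate P^1 + ... + P^t
    S = acc / window_length           # mean over walk lengths 1..window_length
    if degree_normalization:
        S = S / degrees[None, :]      # divide column j by the degree of target j
    return S[np.ix_(sources, targets)]

# Usage on a 3-node path graph 0-1-2
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(exact_tp_sketch(A, [0], [0, 1, 2], window_length=2, degree_normalization=False))
```

Without degree normalization each row of the full matrix sums to 1, because it is an average of stochastic matrices; dividing by the target degree down-weights hub nodes that random walks visit often regardless of the starting point.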

## Getting Started

### Prerequisites

- **Python**: Version 3.6 or higher

- **Dependencies**:

  - `numpy`

  - `scipy`

  - `gensim`

  - `tqdm`

  - `joblib`

  - `igraph` 

  - (optional, used in the examples below) `networkx`

Install the dependencies using:

```bash
pip install numpy scipy networkx gensim tqdm joblib igraph
```

### Importing the Package

```python
from tpsimilarity import similarity
```

### Example Usage

#### 1. Compute Exact Transition Probabilities (TP)

```python
import networkx as nx
import igraph as ig
from tpsimilarity import similarity

# Create or load your graph
G = nx.karate_club_graph()

# Convert the NetworkX graph to iGraph
G = ig.Graph.from_networkx(G)

# Define sources and targets (vertex indices)
sources = [0, 1, 2]  # Source nodes
targets = [3, 4, 5]  # Target nodes

# Compute exact TP similarities
tp_sim = similarity.TP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)

# tp_sim holds the similarities in the format selected by return_type
# (a matrix by default)
```

#### 2. Compute Estimated Transition Probabilities

```python
# Estimate TP similarities using random walks
# (reuses G, sources and targets from example 1)
estimated_tp = similarity.estimatedTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5,
    walks_per_source=1000,
    batch_size=100,
    return_type="matrix",
    degreeNormalization=True,
    progressBar=True
)
```
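Random-walk estimation trades exactness for scalability on large networks. The hypothetical sketch below (`estimated_tp_sketch`, not the package's code) assumes that TP is the fraction of walk steps, over steps 1 to `window_length`, that land on each target, optionally divided by the target's degree. The argument names echo `walks_per_source` and `degreeNormalization`, but the internals are assumptions; it also assumes an unweighted, undirected graph without isolated nodes.

```python
import numpy as np

def estimated_tp_sketch(adjacency, sources, targets, window_length,
                        walks_per_source=1000, degree_normalization=True, seed=0):
    """Hypothetical Monte Carlo sketch of random-walk TP estimation.

    Runs walks_per_source uniform random walks of window_length steps from each
    source and records which node the walker occupies at every step.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    neighbors = [np.flatnonzero(row) for row in A]   # unweighted neighbour lists
    rng = np.random.default_rng(seed)
    counts = np.zeros((len(sources), A.shape[0]))
    for i, s in enumerate(sources):
        for _ in range(walks_per_source):
            node = s
            for _ in range(window_length):
                node = rng.choice(neighbors[node])   # uniform step to a neighbour
                counts[i, node] += 1                 # count visit at steps 1..window_length
    S = counts / (walks_per_source * window_length)  # fraction of steps spent at each node
    if degree_normalization:
        S = S / degrees[None, :]                     # divide by the target's degree
    return S[:, targets]

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(estimated_tp_sketch(A, [0], [0, 1, 2], window_length=2, degree_normalization=False))
```

Each walk contributes `window_length` visits, so without normalization the estimates from one source over all nodes sum to 1, and the sampling error shrinks roughly as `1 / sqrt(walks_per_source)`.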

#### 3. Compute Node2Vec Similarity

```python
# Compute node2vec-based cosine similarities
# (reuses G, sources and targets from example 1)
node2vec_sim = similarity.node2vec(
    graph=G,
    sources=sources,
    targets=targets,
    dimensions=64,
    window_length=40,
    context_size=5,
    workers=4,
    batch_walks=100,
    return_type="matrix",
    progressBar=True
)
```
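The node2vec method scores a source/target pair by the cosine similarity of their embedding vectors. The helper below is an illustrative sketch, not part of the package API: it assumes the embeddings are already available as an `(n_nodes, dimensions)` NumPy array whose row `i` corresponds to vertex `i`, and computes the cosine similarity for every source/target pair.

```python
import numpy as np

def cosine_similarity_matrix(embeddings, sources, targets):
    """Hypothetical helper: cosine similarity between source and target rows.

    Assumes row i of `embeddings` is the vector for vertex i.
    Returns an array of shape (len(sources), len(targets)).
    """
    E = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    U = E / np.where(norms == 0, 1.0, norms)   # unit-normalize rows; zero vectors stay zero
    return U[sources] @ U[targets].T           # dot products of unit vectors = cosines

E = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(cosine_similarity_matrix(E, [0], [0, 1, 2]))
```

If the vectors come from a gensim `Word2Vec` model trained on walks whose tokens are vertex ids, the rows of `model.wv.vectors` follow `model.wv.index_to_key` rather than vertex order, so they would need to be reordered (for example by looking up each vertex's token in `model.wv`) before using a helper like this.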

#### 4. Compute Shortest Paths Transition Probabilities

```python
# Compute TP similarities along shortest paths
# (reuses G, sources and targets from example 1)
sp_tp = similarity.shortestPathsTP(
    graph=G,
    sources=sources,
    targets=targets,
    window_length=5
)
```

### Parameters

- **graph** (`networkx.Graph` or `igraph.Graph`): The graph on which to compute the similarities.

- **sources** (`list`): List of source node indices.

- **targets** (`list`): List of target node indices.

- **window_length** (`int`): The length of the random walks.

- **return_type** (`str`, optional): The format of the output (`"list"`, `"matrix"`, or `"dict"`). Default is `"matrix"`.

- **degreeNormalization** (`bool`, optional): Whether to normalize by the degree of the target node. Default is `True`.

- **dimensions** (`int`, optional): Number of dimensions for node embeddings in node2vec. Default is `64`.

- **context_size** (`int`, optional): Context size for the node2vec algorithm. Default is `10`.

- **workers** (`int`, optional): Number of parallel workers for node2vec. Default is `4`.

- **batch_walks** (`int`, optional): Number of walks per batch for node2vec. Default is `10000`.

- **progressBar** (`bool` or `tqdm`, optional): Whether to display a progress bar during computation. Default is `True`.
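When results are requested as a matrix, rows follow the order of `sources` and columns the order of `targets`. The helper below is a hypothetical sketch showing how such a matrix could be flattened into `(source, target, value)` triples and a `{(source, target): value}` dictionary for lookups; the package's own `"list"` and `"dict"` layouts are not documented here and may differ.

```python
import numpy as np

def matrix_to_pairs(matrix, sources, targets):
    """Hypothetical helper: flatten a (sources x targets) similarity matrix.

    Returns a list of (source, target, value) triples and a dict keyed by
    (source, target). Not the package's own return format.
    """
    M = np.asarray(matrix, dtype=float)
    triples = [(s, t, float(M[i, j]))
               for i, s in enumerate(sources)      # row i belongs to sources[i]
               for j, t in enumerate(targets)]     # column j belongs to targets[j]
    lookup = {(s, t): v for s, t, v in triples}
    return triples, lookup

triples, lookup = matrix_to_pairs([[0.1, 0.2], [0.3, 0.4]], sources=[0, 1], targets=[3, 4])
print(lookup[(1, 3)])
```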

## Examples

You can find more examples and tutorials in the [examples directory](examples/) or in the [Jupyter notebooks](notebooks/) provided.

## Authors

- **Attila Varga**

- **Sadamori Kojaku**

- **Filipi N. Silva**

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.