https://github.com/thudm/mcns

Source code and dataset for KDD 2020 paper "Understanding Negative Sampling in Graph Representation Learning"

Host: GitHub
URL: https://github.com/thudm/mcns
Owner: THUDM
Created: 2020-04-20T15:12:46.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2021-07-15T06:33:40.000Z (almost 5 years ago)
Last Synced: 2025-03-24T13:11:18.435Z (over 1 year ago)
Language: Python
Homepage:
Size: 16.6 MB
Stars: 112
Watchers: 5
Forks: 33
Open Issues: 5
Metadata Files:
- Readme: README.md


# MCNS

### __[arXiv](https://arxiv.org/abs/2005.09863)__

Understanding Negative Sampling in Graph Representation Learning.

Zhen Yang*, Ming Ding*, Chang Zhou, Hongxia Yang, Jingren Zhou, Jie Tang. (*These authors contributed equally to this work.)

In KDD 2020 (Research Track)

## Introduction
We systematically analyze the role of negative sampling from the perspectives of both objective and risk, and show that the negative sampling distribution should be positively but sub-linearly correlated with the positive sampling distribution. Guided by this theory, we propose MCNS, which approximates the positive distribution via self-contrast approximation and accelerates negative sampling with Metropolis-Hastings.
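
As a rough illustration of the two ideas above, the sketch below builds negative-sampling weights proportional to a positive distribution raised to a power `alpha` in (0, 1), then draws from them with a Metropolis-Hastings chain that only needs unnormalized weights. The toy distribution, the value `alpha=0.75`, the uniform proposal, and every function name are assumptions made for this example. It is not the repository's implementation.

```python
import random

def sublinear_weights(positive_probs, alpha=0.75):
    """Unnormalized negative-sampling weights p_d ** alpha (alpha assumed in (0, 1))."""
    return [p ** alpha for p in positive_probs]

def metropolis_hastings(weights, num_steps, rng, burn_in=100):
    """Sample indices with probability proportional to `weights`.

    Uses a uniform (symmetric) proposal, so the acceptance ratio
    reduces to weights[candidate] / weights[current].
    """
    n = len(weights)
    current = rng.randrange(n)
    samples = []
    for step in range(burn_in + num_steps):
        candidate = rng.randrange(n)
        accept_prob = min(1.0, weights[candidate] / weights[current])
        if rng.random() < accept_prob:
            current = candidate
        if step >= burn_in:
            samples.append(current)
    return samples

# Toy positive distribution over four candidate nodes (made-up numbers).
positive_probs = [0.6, 0.25, 0.1, 0.05]
weights = sublinear_weights(positive_probs)
rng = random.Random(0)
samples = metropolis_hastings(weights, num_steps=20000, rng=rng)

# Compare the chain's empirical frequencies with the normalized target.
empirical = [samples.count(i) / len(samples) for i in range(len(weights))]
target = [w / sum(weights) for w in weights]
```

Compared with `positive_probs`, the target is flatter: frequent nodes are still sampled more often than rare ones, but less than proportionally.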

## Preparation
* Python 3.7
* TensorFlow 1.14.0

## Training
### Training on the existing datasets
#### For GraphSAGE:
Use ```$ ./experiments/graphsage/***.sh``` to train the MCNS model on the recommendation task. For example, to train on the Amazon dataset, run ```$ ./experiments/graphsage/amazon.sh``` or ```python main.py --input data/amazon/ --model graphsage_mean```.

#### For DeepWalk:
Use ```$ ./experiments/deepwalk/***.sh``` to train the MCNS model on the recommendation task. For example, to train on the ml-100k dataset, run ```$ ./experiments/deepwalk/ml.sh``` or ```python main.py --input data/ml-100k/ --model deepwalk```.
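
If you prefer to launch several runs from Python, the two `main.py` invocations quoted above can be wrapped as in the following sketch. It assumes the script is started from the repository root and uses only the `--input` and `--model` flags shown in this README. The helper names `build_train_command` and `run_all` are hypothetical.

```python
import subprocess
import sys

def build_train_command(input_dir, model):
    """Return the argv list for the documented `python main.py` call."""
    return [sys.executable, "main.py", "--input", input_dir, "--model", model]

# The two invocations quoted in this README.
RUNS = [
    ("data/amazon/", "graphsage_mean"),
    ("data/ml-100k/", "deepwalk"),
]

def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for input_dir, model in RUNS:
        command = build_train_command(input_dir, model)
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if training exits non-zero.
            subprocess.run(command, check=True)
```

Calling `run_all()` only prints the commands, while `run_all(dry_run=False)` runs them one after another.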

#### For GCN:
Use ```$ ./experiments/gcn.sh``` to train the MCNS model on the ml-100k dataset for the recommendation task.

### Training on your own datasets
To train MCNS on your own dataset, prepare the following four files:
* train.txt: Each line represents an edge ``` ```.
* valid.txt: Same format as train.txt.
* test.txt: Same format as train.txt.
* test_neg.txt: For each node, a set of unconnected nodes used as negative samples when evaluating hits@k and MRR. For the Amazon and Alibaba datasets we select 500 negatives per node; for ml-100k we use all unconnected nodes. This file is generated by the ```load_test_neg``` function in ```load_data.py```.
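
The evaluation setup described in the list above can be sketched as follows. This is an independent illustration, not the repository's `load_test_neg`. It ASSUMES each edge line holds a user ID and an item ID separated by whitespace (the exact line format is not shown here), and all function names are hypothetical.

```python
import random

def read_edges(lines):
    """Parse edges, ASSUMING each line is '<user> <item>' separated by whitespace."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            edges.append((parts[0], parts[1]))
    return edges

def sample_test_negatives(edges, num_negs, rng):
    """For each user, pick up to `num_negs` items the user has no edge to.

    Passing num_negs=None keeps every unconnected item.
    """
    all_items = sorted({item for _, item in edges})
    connected = {}
    for user, item in edges:
        connected.setdefault(user, set()).add(item)
    negatives = {}
    for user in sorted(connected):
        candidates = [i for i in all_items if i not in connected[user]]
        if num_negs is not None and len(candidates) > num_negs:
            candidates = rng.sample(candidates, num_negs)
        negatives[user] = candidates
    return negatives

def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the positive item among its negatives (ties count against it)."""
    return 1 + sum(1 for s in neg_scores if s >= pos_score)

def hits_at_k_and_mrr(ranks, k):
    """Average hits@k and mean reciprocal rank over a list of 1-based ranks."""
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return hits, mrr
```

For the negatives to be truly unconnected, `edges` should contain every known edge (train, valid, and test together). With that input, `num_negs=500` would mirror the Amazon and Alibaba setting and `num_negs=None` the ml-100k setting.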

## Dataset
* ml-100k contains 943 users, 1,682 items and 100,000 edges.
* Amazon contains 192,403 users, 63,001 items and 1,689,188 edges.
* Alibaba contains 106,042 users, 53,591 items and 907,470 edges.
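
To put these sizes in perspective, the short sketch below derives two summary figures from the counts listed above. It assumes every edge connects one user to one item, so the number of possible edges is users times items. The function names are hypothetical.

```python
# (users, items, edges) as listed above.
DATASETS = {
    "ml-100k": (943, 1682, 100000),
    "Amazon": (192403, 63001, 1689188),
    "Alibaba": (106042, 53591, 907470),
}

def bipartite_density(num_users, num_items, num_edges):
    """Fraction of all possible user-item pairs that are observed edges."""
    return num_edges / (num_users * num_items)

def mean_user_degree(num_users, num_edges):
    """Average number of edges per user."""
    return num_edges / num_users

for name, (users, items, edges) in DATASETS.items():
    print(f"{name}: density={bipartite_density(users, items, edges):.2e}, "
          f"edges/user={mean_user_degree(users, edges):.1f}")
```

By this measure ml-100k is by far the densest of the three graphs.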

## Acknowledgement
The trainable encoder of our code is based on [GraphSAGE](https://github.com/williamleif/GraphSAGE).

## Cite

Please cite our paper if you find this code useful for your research:
```
@misc{yang2020understanding,
Author = {Zhen Yang and Ming Ding and Chang Zhou and Hongxia Yang and Jingren Zhou and Jie Tang},
Title = {Understanding Negative Sampling in Graph Representation Learning},
Year = {2020},
Eprint = {arXiv:2005.09863},
}
```
