https://github.com/thudm/mcns
Source code and dataset for KDD 2020 paper "Understanding Negative Sampling in Graph Representation Learning"
- Host: GitHub
- URL: https://github.com/thudm/mcns
- Owner: THUDM
- Created: 2020-04-20T15:12:46.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-07-15T06:33:40.000Z (almost 5 years ago)
- Last Synced: 2025-03-24T13:11:18.435Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 16.6 MB
- Stars: 112
- Watchers: 5
- Forks: 33
- Open Issues: 5
Metadata Files:
- Readme: README.md
# MCNS
### __[arXiv](https://arxiv.org/abs/2005.09863)__
Understanding Negative Sampling in Graph Representation Learning.
Zhen Yang*, Ming Ding*, Chang Zhou, Hongxia Yang, Jingren Zhou, Jie Tang. (*These authors contributed equally to this work.)
In KDD 2020 (Research Track)
## Introduction
We systematically analyze the role of negative sampling from the perspectives of both objective and risk, and show that the negative sampling distribution should be positively but sub-linearly correlated with the positive sampling distribution. Guided by this theory, we propose MCNS, which approximates the positive distribution with self-contrast approximation and accelerates negative sampling with Metropolis-Hastings.
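The idea of a "positively but sub-linearly correlated" negative distribution sampled with Metropolis-Hastings can be illustrated with a small sketch. This is not the repository's implementation: the function name `metropolis_hastings_negatives`, the exponent `alpha` (and its default of 0.75), the uniform proposal, and the fixed score table standing in for the positive distribution are all assumptions made for illustration.

```python
import random


def metropolis_hastings_negatives(candidates, positive_score, num_samples,
                                  alpha=0.75, burn_in=100, seed=0):
    """Draw negatives with probability proportional to positive_score(u) ** alpha.

    Hypothetical helper: `positive_score` must return a strictly positive
    number for every candidate, and 0 < alpha < 1 makes the target
    sub-linear in the positive score.
    """
    rng = random.Random(seed)

    def target(u):
        # Unnormalized target density; no normalizing constant is needed.
        return positive_score(u) ** alpha

    current = rng.choice(candidates)
    samples = []
    for step in range(burn_in + num_samples):
        # Uniform proposal over all candidates (an assumption of this sketch).
        proposal = rng.choice(candidates)
        # With a uniform proposal the acceptance ratio is the target ratio.
        accept_prob = min(1.0, target(proposal) / target(current))
        if rng.random() < accept_prob:
            current = proposal
        if step >= burn_in:
            samples.append(current)
    return samples


# Toy usage: three candidate nodes with made-up positive scores.
toy_scores = {"a": 1.0, "b": 8.0, "c": 27.0}
toy_samples = metropolis_hastings_negatives(
    list(toy_scores), toy_scores.get, num_samples=1000, alpha=1 / 3)
print({node: toy_samples.count(node) for node in toy_scores})
```

Because only ratios of the target appear in the acceptance step, the sampler never has to normalize over all nodes, which is the property that makes Metropolis-Hastings attractive for large graphs.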
## Preparation
* Python 3.7
* TensorFlow 1.14.0
## Training
### Training on the existing datasets
#### For GraphSAGE:
You can use ```$ ./experiments/graphsage/***.sh``` to train the MCNS model on the recommendation task. For example, to train on the Amazon dataset, run ```$ ./experiments/graphsage/amazon.sh``` or ```python main.py --input data/amazon/ --model graphsage_mean```.
#### For DeepWalk:
You can use ```$ ./experiments/deepwalk/***.sh``` to train the MCNS model on the recommendation task. For example, to train on the ml-100k dataset, run ```$ ./experiments/deepwalk/ml.sh``` or ```python main.py --input data/ml-100k/ --model deepwalk```.
#### For GCN:
You can use ```$ ./experiments/gcn.sh``` to train the MCNS model on the ml-100k dataset for the recommendation task.
### Training on your own datasets
If you want to train MCNS on your own dataset, prepare the following four files:
* train.txt: Each line represents an edge ``` ```.
* valid.txt: Same format as train.txt.
* test.txt: Same format as train.txt.
* test_neg.txt: For each node, we select some unconnected nodes as negatives for evaluation. For the Amazon and Alibaba datasets we select 500 negatives per node; for ml-100k we use all unconnected nodes. These negatives are used to evaluate hits@k and MRR ```(generated by load_data.py/load_test_neg function).```
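As a rough illustration of how evaluation negatives and the two metrics fit together, here is a minimal sketch. It is not the repository's `load_test_neg`: the helpers `read_edges`, `sample_test_negatives`, and `hits_at_k_and_mrr` are hypothetical, and the sketch assumes each edge line holds two whitespace-separated node ids, which may differ from the actual file format.

```python
import random


def read_edges(lines):
    """Parse edges, ASSUMING two whitespace-separated node ids per line."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            edges.append((parts[0], parts[1]))
    return edges


def sample_test_negatives(edges, all_items, num_negs=None, seed=0):
    """For each source node, pick items it has no edge to.

    num_negs=None keeps every unconnected item; an integer keeps a
    random subset of that size.
    """
    rng = random.Random(seed)
    connected = {}
    for u, v in edges:
        connected.setdefault(u, set()).add(v)
    negatives = {}
    for u, items in connected.items():
        pool = sorted(set(all_items) - items)
        if num_negs is None or num_negs >= len(pool):
            negatives[u] = pool
        else:
            negatives[u] = rng.sample(pool, num_negs)
    return negatives


def hits_at_k_and_mrr(pos_score, neg_scores, k):
    """Rank one positive among its negatives (higher score ranks first).

    Ties are resolved in favour of the positive, a choice of this sketch.
    """
    rank = 1 + sum(1 for s in neg_scores if s > pos_score)
    return (1.0 if rank <= k else 0.0), 1.0 / rank


# Toy usage with made-up ids and scores.
toy_edges = read_edges(["u1 i1", "u1 i2", "u2 i3"])
print(sample_test_negatives(toy_edges, ["i1", "i2", "i3", "i4"]))
print(hits_at_k_and_mrr(0.9, [0.95, 0.1, 0.2], k=2))
```

In practice the edges passed to `sample_test_negatives` would need to cover the training, validation, and test files together, so that a held-out positive is never selected as a negative.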
## Dataset
* ml-100k contains 943 users, 1,682 items and 100,000 edges.
* Amazon contains 192,403 users, 63,001 items and 1,689,188 edges.
* Alibaba contains 106,042 users, 53,591 items and 907,470 edges.
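To put these sizes in perspective, the short sketch below computes the density and average edges per user for each dataset from the counts listed above. It assumes each graph is a bipartite user-item graph in which every edge links one user to one item; the helper name `bipartite_density` is hypothetical.

```python
# (users, items, edges) as listed in the Dataset section.
DATASETS = {
    "ml-100k": (943, 1682, 100000),
    "Amazon": (192403, 63001, 1689188),
    "Alibaba": (106042, 53591, 907470),
}


def bipartite_density(users, items, edges):
    """Fraction of all possible user-item pairs that are actual edges."""
    return edges / (users * items)


for name, (users, items, edges) in DATASETS.items():
    density = bipartite_density(users, items, edges)
    print(f"{name}: density={density:.6f}, avg edges/user={edges / users:.1f}")
```

ml-100k has far fewer items than the other two datasets, so ranking against every unconnected item is cheap there, while a fixed-size set of negatives keeps evaluation tractable on the larger graphs.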
## Acknowledgement
The trainable encoder of our code is based on [GraphSAGE](https://github.com/williamleif/GraphSAGE).
## Cite
Please cite our paper if you find this code useful for your research:
```
@misc{yang2020understanding,
Author = {Zhen Yang and Ming Ding and Chang Zhou and Hongxia Yang and Jingren Zhou and Jie Tang},
Title = {Understanding Negative Sampling in Graph Representation Learning},
Year = {2020},
Eprint = {arXiv:2005.09863},
}
```