Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ysyushi/aspem

Code and data for paper AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks
https://github.com/ysyushi/aspem

Last synced: 4 days ago
JSON representation

Code and data for paper AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

Awesome Lists containing this project

README

        

# AspEm

This repository provides codes and data for the paper:

> AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

> Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, and Jiawei Han.

> In Proceedings of the 2018 SIAM International Conference on Data Mining, SIAM, 2018.

Particularly, it includes (1) a reference implementation of incompatibility measure, (2) ad-hoc implementations of the single-aspect embedding algorithm for the datasets used in the paper, (3) the IMDb dataset (the full DBLP dataset is excluded from this repository due to its file size), and (4) the class labels used in the DBLP classification tasks.

### Basic Usage

#### Input

1. The supported input HIN file should contain all edges of the input HIN. Each line corresponds to an edge, with the format

node_1 node_2 edge_weight edge_type

Note that node_1 and node_2 should be in the form

node_type:node_name

An example input HIN file can be found at ``data/imdb/imdb.hin``.

2. Additionally, to run the ad-hoc implementation of the single-aspect embedding algorithm for star-schema datasets, one should also have a file of all center nodes (e.g., ``data/imdb/movie.node``) and a file of all attribute nodes as input (e.g., ``data/imdb/uadg.node``).

#### Execute

To measure the incompactibility of all base aspects in an HIN:

$ python src/calc_base_aspect_inconsistency.py --input $input-hin-file --output $base-aspect-inc-file [optional: --sample-rate $sample-rate]

To aggregate incompactibility for all base aspects from the result of the previous step:

$ python src/agg_aspect_inconsistency.py $base-aspect-inc-file

As an exmaple, to calculate the incompatibility of each aspect of the IMDb dataset, execute the following commands sequentially:

$ python src/calc_base_aspect_inconsistency.py --input data/imdb/imdb.hin --output data/imdb/imdb_base_aspect_inc.csv
$ python src/agg_aspect_inconsistency.py $data/imdb/imdb_base_aspect_inc.csv

To execute the ad-hoc implementation of the embedding algorithm, one should makefile in the corresponding source code directory in ``src/``, and then execute the binary code in its ``bin/``. The argument ``-types`` specifies the attribute node types involved in the current aspect with the following mapping: in IMDb -- u for user, a for actor, d for director, g for genre; in DBLP -- a for author, p for reference, v for venue, w for term, y for year.

As an example, to embed the IMDb network with only attribute node types ``user`` and ``director``, execute the following commands sequentially:

$ cd src/emb_imdb/; make; cd ../..
$ ./src/emb_imdb/bin/emb_imdb -types ud -hin data/imdb/imdb.hin -center data/imdb/movie.node -attribute data/imdb/uadg.node -output data/imdb/attribute.emb -output-center data/imdb/center.em``

### Class Labels for DBLP Classification

In the DBLP experiment of the paper, two classification tasks were conducted based on the two class label files in

data/class_label/

### Citing
If you find *PReP* useful for your research, please consider citing the following paper:

@inproceedings{shi2018aspem,
author = {Shi, Yu and Gui, Huan and Zhu, Qi and Kaplan, Lance and Han, Jiawei},
title = {AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks},
booktitle = {Proceedings of the 2018 SIAM International Conference on Data Mining},
year = {2018},
organization={SIAM}
}

### Miscellaneous

Please send any questions you might have about the codes and/or the algorithm to .

*Note:* This is only a reference implementation of the *AspEm* algorithm. As discussed in the paper, AspEm is a flexible framework and one can choose their favorite network embedding algorithm to embed every single aspect.