https://github.com/zechengz/hin-dataset
Heterogeneous Information Network Datasets
heterogeneous-information-networks hin meta-path network-embedding
Heterogeneous Information Network Datasets
- Host: GitHub
- URL: https://github.com/zechengz/hin-dataset
- Owner: zechengz
- Created: 2019-05-26T04:21:03.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-05-26T04:35:45.000Z (over 5 years ago)
- Last Synced: 2024-12-05T17:52:08.583Z (18 days ago)
- Topics: heterogeneous-information-networks, hin, meta-path, network-embedding
- Homepage:
- Size: 2.93 KB
- Stars: 16
- Watchers: 1
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
## Heterogeneous Information Network Datasets
### Download links
* [DBLP (Google Drive)](https://drive.google.com/open?id=1YG9VR3vd6ewtMhdrcNXF5T_WTbx6MwYK): 601.4MB
* [SLAP (Google Drive)](https://drive.google.com/open?id=1mIcLcxyg3WZApq6a4fIlADyU42WQKeGB): 295.8MB
* [ACM (Google Drive)](https://drive.google.com/open?id=16R7ewS9cb5Bci7ClC0Ao1IYQmWPb-lHs): 752.1MB
* [IMDB (Google Drive)](https://drive.google.com/open?id=1tqzNDkbZWGoG-vpM_M2X-EqRoPT1rp9k): 94.3MB

### Datasets information
| Dataset | # Nodes | Node types | Meta-paths | # Meta-path instances | # Labels | # Features |
|:-------:|:-------:|:----------:|:----------:|:---------------------:|:--------:|:----------:|
| DBLP | 14475(A) | Author(A)<br>Paper(P)<br>Conference(C) | APA<br>APCPA | 40269<br>19445349 | 4 | 5000+ |
| SLAP | 20419(G) | Gene(G)<br>Gene Ontology(O)<br>Pathway(P)<br>Compound(C)<br>Tissue(T)<br>Gene Family(F)<br>Disease(D) | GTG<br>GFG<br>GDG<br>GPG<br>GOG<br>GG<br>GDCDG | 303487<br>582741<br>7494<br>416462<br>3185779<br>172248<br>18095 | 15 | 2695 |
| ACM | 12499(P) | Paper(P)<br>Author(A)<br>Proceeding(O)<br>Institute(I)<br>Conference(C) | PAP<br>PAIAP<br>POP<br>POCOP<br>PP | 91662<br>13303015<br>700386<br>7849967<br>30621 | 11 | 8000 |
| IMDB* | 18352(M) | Movie(M)<br>Actor(A)<br>Actress(E)<br>Director(D) | MAM?<br>MDM?<br>MEM? | 63659?<br>1085810?<br>565443? | 9 | 1000 |

### Notice
* `*`: Multi-label dataset.
* `?`: It is unclear which meta-path corresponds to which number of meta-path instances.
* `+`: Use `nltk.corpus.stopwords` to remove stop words and extract the bag-of-words representation.
* For `DBLP`, `SLAP` and `ACM`, please refer to the paper [Meta Path-Based Collective Classification in Heterogeneous Information Networks](https://arxiv.org/pdf/1305.4433.pdf).
* For `IMDB`, please refer to the paper [Column Networks for Collective Classification](https://arxiv.org/pdf/1609.04508.pdf).
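
The `+` note mentions stop-word removal followed by a bag-of-words representation. The snippet below is a minimal sketch of that idea, not the procedure actually used to build these datasets. The tokenization rule, the `max_features` cutoff, the tiny stop list, and the two example titles are all assumptions made for illustration; the stop list is a stand-in so the code runs without downloading the NLTK corpus.

```python
import re
from collections import Counter

# Hypothetical stand-in stop list so the sketch runs without downloads.
# With NLTK installed and its stopwords corpus fetched, one could use:
#   from nltk.corpus import stopwords
#   STOP_WORDS = set(stopwords.words("english"))
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]


def bag_of_words(docs, max_features=None, stop_words=STOP_WORDS):
    """Return (vocabulary, count_matrix) for a list of text documents."""
    tokenized = [tokenize(d, stop_words) for d in docs]
    totals = Counter(t for toks in tokenized for t in toks)
    # Keep the most frequent terms; ties are broken alphabetically.
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    vocab = ranked[:max_features]
    index = {t: i for i, t in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for t in toks:
            if t in index:
                row[index[t]] += 1
        matrix.append(row)
    return vocab, matrix


# Made-up example texts, not taken from any of the datasets.
titles = [
    "Mining heterogeneous information networks",
    "Collective classification in networks",
]
vocab, X = bag_of_words(titles)
```

Each row of `X` is the count vector of one document over `vocab`. Passing `max_features=5000` would cap the vocabulary at a size similar to the feature counts in the table, though whether the datasets were truncated this way is not stated.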