https://github.com/simonepri/edgelist-mapper
📊 Maps nodes and edges of a multi-relational graph to integers
converter dataset edgelist kb kg knowledge-graph link-prediction machine-learning map relation-prediction
- Host: GitHub
- URL: https://github.com/simonepri/edgelist-mapper
- Owner: simonepri
- License: mit
- Created: 2019-06-19T19:51:39.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-06-23T04:46:48.000Z (over 4 years ago)
- Last Synced: 2025-03-14T12:18:15.761Z (7 months ago)
- Topics: converter, dataset, edgelist, kb, kg, knowledge-graph, link-prediction, machine-learning, map, relation-prediction
- Language: Python
- Homepage:
- Size: 38.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: readme.md
- License: license
README
edgelist-mapper
📊 Maps nodes and edges of a multi-relational graph to integers

## Synopsis
edgelist-mapper is a simple tool that reads an edge-list file representing a graph and maps each node and relation to an integer.
The mapping is assigned so that entities and relations that appear more frequently in the graph receive smaller numerical values.

This tool is particularly useful for pre-processing publicly available knowledge graph datasets that are often used for the machine learning task of [relation prediction][repo:NLP-progress->relation_prediction.md].
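
The frequency-based mapping described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual implementation: the function names `frequency_map` and `map_triples` are hypothetical, and the tie-breaking rule (reverse alphabetical order among equally frequent names) is an assumption, chosen because it is deterministic and happens to reproduce the example shown in the Input format section.

```python
from collections import Counter


def frequency_map(items):
    """Map each distinct item to an integer ID; frequent items get smaller IDs.

    Hypothetical helper, not part of edgelist-mapper's API.
    """
    counts = Counter(items)
    # Assumption: ties are broken in reverse alphabetical order.
    by_name_desc = sorted(counts, reverse=True)
    # sorted() is stable, so the name order above is kept among equal counts.
    ordered = sorted(by_name_desc, key=lambda item: -counts[item])
    return {item: index for index, item in enumerate(ordered)}


def map_triples(triples):
    """Convert (head, relation, tail) string triples to integer triples."""
    # Heads and tails share a single entity vocabulary.
    entity_ids = frequency_map(e for h, _, t in triples for e in (h, t))
    relation_ids = frequency_map(r for _, r, _ in triples)
    mapped = [(entity_ids[h], relation_ids[r], entity_ids[t]) for h, r, t in triples]
    return mapped, entity_ids, relation_ids
```

Calling `map_triples` on a list of `(head, relation, tail)` tuples returns the integer triples together with the two dictionaries that play the role of `entities_map.tsv` and `relations_map.tsv`.
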
Do you believe that this is *useful*?
Has it *saved you time*?
Or maybe you simply *like it*?
If so, [support this work with a Star ⭐️][start].

## Input format
The tool takes as input a file (`edgelist.tsv`) that represents a graph as tab-separated triples of the form `(head, relation, tail)` and generates three new files, namely `mapped_edgelist.tsv`, `entities_map.tsv`, and `relations_map.tsv`.
```
san_marino locatedin europe
belgium locatedin europe
russia locatedin europe
monaco locatedin europe
croatia locatedin europe
poland locatedin europe
```
> Example content of the `edgelist.tsv` file.

```
0 europe
1 san_marino
2 russia
3 poland
4 monaco
5 croatia
6 belgium
```
> Content of the `entities_map.tsv` generated from the `edgelist.tsv` file.

```
0 locatedin
```
> Content of the `relations_map.tsv` generated from the `edgelist.tsv` file.

```
1 0 0
6 0 0
2 0 0
4 0 0
5 0 0
3 0 0
```
> Content of the `mapped_edgelist.tsv` generated from the `edgelist.tsv` file.

## CLI Usage
The CLI takes the following positional arguments:
```
edgelist Path of the edgelist file
output Path of the output directory
```

Example usage:
```bash
pip install edgelist-mapper
python -m edgelist_mapper.bin.run \
edgelist.tsv \
.
```
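
Once the CLI has run, the generated files can be read back into Python for downstream use. The following is a minimal sketch, assuming the files are tab-separated with the column order shown in the examples above (`id`, then name, for the map files; head, relation, tail for the mapped edge list). The helper names `load_id_map`, `load_mapped_edgelist`, and `decode` are hypothetical and not part of the package.

```python
def load_id_map(path):
    """Read an `id<TAB>name` file into a dict from integer ID to name."""
    id_to_name = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            identifier, name = line.split("\t", 1)
            id_to_name[int(identifier)] = name
    return id_to_name


def load_mapped_edgelist(path):
    """Read `head<TAB>relation<TAB>tail` integer triples into a list of tuples."""
    triples = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            head, relation, tail = (int(value) for value in line.split("\t"))
            triples.append((head, relation, tail))
    return triples


def decode(triples, entities, relations):
    """Translate integer triples back to their original names."""
    return [(entities[h], relations[r], entities[t]) for h, r, t in triples]
```

Decoding the mapped triples with the two maps gives a quick sanity check that the round trip reproduces the original edge list.
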
> NB: You need Python 3 to run the CLI.

## Showcase
This tool has been used to create [this collection of datasets][repo:datasets-knowledge-embedding].
## Authors
- **Simone Primarosa** - [simonepri][github:simonepri]
See also the list of [contributors][contributors] who participated in this project.
## License
This project is licensed under the MIT License - see the [license][license] file for details.
[start]: https://github.com/simonepri/edgelist-mapper#start-of-content
[license]: https://github.com/simonepri/edgelist-mapper/tree/master/license
[contributors]: https://github.com/simonepri/edgelist-mapper/contributors
[github:simonepri]: https://github.com/simonepri
[repo:NLP-progress->relation_prediction.md]: https://github.com/sebastianruder/NLP-progress/blob/master/english/relation_prediction.md
[repo:datasets-knowledge-embedding]: https://github.com/simonepri/datasets-knowledge-embedding