https://github.com/mberr/ea-active-learning
Code for paper "Active Learning for Entity Alignment" (https://arxiv.org/abs/2001.08943)
- Host: GitHub
- URL: https://github.com/mberr/ea-active-learning
- Owner: mberr
- License: mit
- Created: 2020-12-28T11:00:21.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-05-01T14:12:15.000Z (almost 3 years ago)
- Last Synced: 2025-04-10T14:35:33.829Z (about 1 year ago)
- Topics: active-learning, entity-alignment, knowledge-graph
- Language: Python
- Homepage:
- Size: 81.1 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Active Learning for Entity Alignment
[arXiv](https://arxiv.org/abs/2001.08943)
[Python 3.8](https://docs.python.org/3.8/)
[PyTorch](https://pytorch.org/docs/stable/index.html)
[MIT License](https://opensource.org/licenses/MIT)
This repository contains the source code for the paper
```
Active Learning for Entity Alignment
Max Berrendorf*, Evgeniy Faerman*, and Volker Tresp
https://arxiv.org/abs/2001.08943
```
# Installation
Set up and activate a virtual environment:
```shell script
python3.8 -m venv ./venv
source ./venv/bin/activate
```
Install requirements (in this virtual environment):
```shell script
pip install -U pip
pip install -U -r requirements.txt
```
# Preparation
To track results with an MLFlow server, start one first by running
```shell script
mlflow server
```
_Note: When storing results for many configurations, we recommend setting up a database backend, following the [instructions](https://mlflow.org/docs/latest/tracking.html)._
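As a minimal sketch, a local SQLite backend could be configured like this (the database file name `mlflow.db` is an arbitrary example, not something this repository prescribes):

```shell script
# Store runs in a local SQLite database instead of plain files;
# the path after sqlite:/// is an arbitrary example.
mlflow server --backend-store-uri sqlite:///mlflow.db
```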
For the following examples, we assume that the server is running at
```shell script
TRACKING_URI=http://localhost:5000
```
# Experiments
For all experiments, the results are logged to the running MLFlow instance. You can inspect the results during training by opening the `TRACKING_URI` in a browser.
Moreover, all experiments are synchronized via the MLFlow instance, so you can start multiple instances of each command on different worker machines to parallelize the experiments.
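For example, the same command could be launched on several machines with a loop like the following. This is only a sketch: the host names are placeholders, and it assumes SSH access plus a checkout of the repository on each worker.

```shell script
# Hypothetical worker hosts; adjust to your environment.
for host in worker1 worker2; do
  # Each worker runs the same command; MLFlow coordinates the runs.
  ssh "$host" "cd ea-active-learning && \
    PYTHONPATH=./src python3 executables/evaluate_active_learning_heuristic.py \
      --phase=hpo --tracking_uri=${TRACKING_URI}" &
done
wait  # block until all background SSH sessions have finished
```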
## Random Baseline
To run the random baseline, use
```shell script
PYTHONPATH=./src python3 executables/evaluate_active_learning_heuristic.py --phase=random --tracking_uri=${TRACKING_URI}
```
## Hyperparameter Search
To run the hyperparameter search, use
```shell script
PYTHONPATH=./src python3 executables/evaluate_active_learning_heuristic.py --phase=hpo --tracking_uri=${TRACKING_URI}
```
_Note: The hyperparameter search takes a significant amount of time (multiple days) and requires access to GPU(s). You can abort the script at any time and inspect the current results via the MLFlow web interface._
## Best Configurations
To rerun the best configurations we found in our hyperparameter search, use
```shell script
PYTHONPATH=./src python3 executables/evaluate_active_learning_heuristic.py --phase=best --tracking_uri=${TRACKING_URI}
```
# Evaluation
To reproduce the tables and numbers of the paper, use
```bash
PYTHONPATH=./src python3 executables/collate_results.py --tracking_uri=${TRACKING_URI}
```
To avoid re-downloading data from a remote MLFlow instance, the metrics and parameters are cached locally. To force a re-download, e.g., after conducting additional runs, use `--force`.
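For example, to refresh the cache after additional runs:

```bash
# --force discards the local cache and re-fetches metrics and parameters.
PYTHONPATH=./src python3 executables/collate_results.py --tracking_uri=${TRACKING_URI} --force
```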