https://github.com/johngiorgi/seq2rel-ds
This is a companion repository to seq2rel (https://github.com/JohnGiorgi/seq2rel) which aims to make it easy to generate training data.
- Host: GitHub
- URL: https://github.com/johngiorgi/seq2rel-ds
- Owner: JohnGiorgi
- Created: 2021-03-24T22:40:52.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2022-04-13T16:40:27.000Z (about 4 years ago)
- Last Synced: 2023-03-03T22:31:14.262Z (over 3 years ago)
- Topics: coreference-resolution, entity-extraction, information-extraction, relation-extraction, seq2rel, seq2seq
- Language: Python
- Homepage:
- Size: 1000 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
README
# seq2rel: Datasets

This is a companion repository to [`seq2rel`](https://github.com/JohnGiorgi/seq2rel), which makes it easy to preprocess training data.
## Installation
This repository requires Python 3.8 or later.
### Setting up a virtual environment
Before installing, you should create and activate a Python virtual environment. If you need pointers on setting up a virtual environment, please see the [AllenNLP install instructions](https://github.com/allenai/allennlp#installing-via-pip).
### Installing the library and dependencies
If you _do not_ plan on modifying the source code, install directly from GitHub using `pip`:
```bash
pip install git+https://github.com/JohnGiorgi/seq2rel-ds.git
```
Otherwise, clone the repository and install from source using [Poetry](https://python-poetry.org/):
```bash
# Install poetry for your system: https://python-poetry.org/docs/#installation
# (the older get-poetry.py script has been deprecated in favor of the official installer)
curl -sSL https://install.python-poetry.org | python3 -
# Clone and move into the repo
git clone https://github.com/JohnGiorgi/seq2rel-ds
cd seq2rel-ds
# Install the package with poetry
poetry install
```
## Usage
Installing this package gives you access to a simple command-line tool, `seq2rel-ds`. To see the list of available commands, run:
```bash
seq2rel-ds --help
```
> Note: you can also call the underlying Python files directly, e.g. `python path/to/seq2rel_ds/main.py --help`.

To preprocess a dataset (and in most cases, download it), call one of the commands, e.g.
```bash
seq2rel-ds cdr main "path/to/cdr"
```
> Note: you have to include `main` because [`typer`](https://typer.tiangolo.com/) does not support default commands.

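
If you would rather drive the tool from a Python script (for example, to preprocess several datasets in a loop), one option is to shell out to the same command with `subprocess`. The following is a minimal sketch, not part of `seq2rel-ds` itself: the helper names `build_command` and `preprocess` are hypothetical, and it assumes the `seq2rel-ds` executable is on your `PATH`.

```python
import shutil
import subprocess
from typing import List


def build_command(dataset: str, output_dir: str) -> List[str]:
    """Build the argument list for `seq2rel-ds <dataset> main <output_dir>`.

    Hypothetical helper for illustration; not part of the seq2rel-ds package.
    """
    # `main` is passed explicitly, as the note above requires.
    return ["seq2rel-ds", dataset, "main", output_dir]


def preprocess(dataset: str, output_dir: str) -> None:
    """Run the command-line tool for one dataset.

    Raises RuntimeError if the executable cannot be found, and
    subprocess.CalledProcessError if the command exits with an error.
    """
    if shutil.which("seq2rel-ds") is None:
        raise RuntimeError("seq2rel-ds not found on PATH; install the package first")
    subprocess.run(build_command(dataset, output_dir), check=True)
```

Calling `preprocess("cdr", "path/to/cdr")` is then equivalent to running the shell command shown above.
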
This will create the preprocessed `tsv` files under the specified output directory, e.g.
```
cdr
┣ train.tsv
┣ valid.tsv
┗ test.tsv
```
which can then be used to train a [`seq2rel`](https://github.com/JohnGiorgi/seq2rel) model.
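
Before training, it can be useful to confirm that all three splits were written and to look at a few rows. The sketch below is one way to do that under stated assumptions: `SPLITS`, `read_split`, and `load_dataset` are hypothetical names rather than part of `seq2rel-ds`, and because the column layout of the `tsv` files is not described here, the code only splits each line on tabs and does not interpret the columns.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# File stems taken from the directory listing above.
SPLITS = ("train", "valid", "test")


def read_split(path: Path) -> List[Tuple[str, ...]]:
    """Read one tab-separated file into a list of row tuples, skipping blank lines."""
    rows = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                rows.append(tuple(line.split("\t")))
    return rows


def load_dataset(output_dir: str) -> Dict[str, List[Tuple[str, ...]]]:
    """Load every split under `output_dir`, failing early if any file is missing."""
    root = Path(output_dir)
    missing = [s for s in SPLITS if not (root / f"{s}.tsv").is_file()]
    if missing:
        raise FileNotFoundError(f"missing splits in {root}: {missing}")
    return {s: read_split(root / f"{s}.tsv") for s in SPLITS}
```

For example, `len(load_dataset("path/to/cdr")["train"])` gives the number of non-empty rows in `train.tsv`.
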