Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/f1uctus/webanno2spacy
Convert WebAnno TSVs to spaCy's Doc-s.
https://github.com/f1uctus/webanno2spacy
spacy spacy-extension webanno webanno-tsv
Last synced: 28 days ago
JSON representation
Convert WebAnno TSVs to spaCy's Doc-s.
- Host: GitHub
- URL: https://github.com/f1uctus/webanno2spacy
- Owner: F1uctus
- Created: 2024-05-10T18:48:52.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-10-07T12:39:40.000Z (30 days ago)
- Last Synced: 2024-10-09T11:43:03.863Z (28 days ago)
- Topics: spacy, spacy-extension, webanno, webanno-tsv
- Language: Python
- Homepage: https://pypi.org/project/webanno2spacy
- Size: 789 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# WebAnno ⟶ spaCy
[![](https://img.shields.io/pypi/v/webanno2spacy)](https://pypi.org/project/webanno2spacy)
[![](https://img.shields.io/pypi/wheel/webanno2spacy)](https://pypi.org/project/webanno2spacy/#files)A tool that helps you to convert the
[WebAnno TSV 3.2](https://webanno.github.io/webanno/releases/3.4.5/docs/user-guide.html#sect_webannotsv)
files to [spaCy](https://spacy.io)'s `Doc` format.
The relations are saved into the Doc's `rel` extension attribute,
in the same way as done in the [spaCy tutorial video](https://www.youtube.com/watch?v=8HL-Ap5_Axo&t=1530s):
```python
{
(0, 6): { "label1": 1.0, "label2": 0.0, ... },
(6, 0): { "label1": 0.0, "label2": 0.0, ... },
...
}
```## Usage
```
$ poetry install
$ webanno2spacy --help
Usage: webanno2spacy [OPTIONS] SPACY_MODEL INPUT_TEXT_FILE INPUT_WEBANNO_FILEArguments:
SPACY_MODEL [required]
INPUT_TEXT_FILE [required]
INPUT_WEBANNO_FILE [required]Options:
--output-file PATH
--install-completion [bash|zsh|fish|powershell|pwsh]
Install completion for the specified shell.
--show-completion [bash|zsh|fish|powershell|pwsh]
Show completion for the specified shell, to
copy it or customize the installation.
--help Show this message and exit.
```## TODO
- Implement batch conversion of multiple files to DocBin## See also
- [WebAnno TSV 3.2 File format](https://webanno.github.io/webanno/releases/3.4.5/docs/user-guide.html#sect_webannotsv)
- [spaCy custom relation extraction pipeline (YouTube)](https://www.youtube.com/watch?v=8HL-Ap5_Axo)
- [spaCy custom relation extraction pipeline (code)](https://github.com/explosion/projects/tree/v3/tutorials/rel_component)