https://github.com/veldhub/veld_chain__eltec_udpipe_inference
chain velds using udpipe 1 to infer on five ELTeC corpora.
- Host: GitHub
- URL: https://github.com/veldhub/veld_chain__eltec_udpipe_inference
- Owner: veldhub
- License: mit
- Created: 2024-12-04T13:06:55.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-03-03T13:50:42.000Z (7 months ago)
- Last Synced: 2025-03-25T11:15:00.355Z (7 months ago)
- Topics: annotation, nlp, udpipe
- Language: XSLT
- Homepage:
- Size: 225 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
#  veld_chain__eltec_udpipe_inference
This chain VELD repo uses [udpipe 1](https://ufal.mff.cuni.cz/udpipe/1) to run inference on five ELTeC
corpora and aggregates their linguistic features into simple overview statistics.

These code velds are reused here:
- https://github.com/veldhub/veld_code__xml_xslt_transformer
- https://github.com/veldhub/veld_code__downloader
- https://github.com/veldhub/veld_code__udpipe
- https://github.com/veldhub/veld_code__analyse_conllu

## requirements
- git
- docker compose (note: older docker compose versions require running `docker-compose` instead of
`docker compose`)

Clone this repo with all its submodules:
```
git clone --recurse-submodules https://github.com/veldhub/veld_chain__eltec_udpipe_inference.git
```

## how to reproduce
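If the repo was cloned without `--recurse-submodules`, the reused code velds will be missing. A minimal sketch of a check that fetches them afterwards is shown here; `ensure_submodules` is a hypothetical helper name, not something this repo provides, and it assumes it is run from the repo root.

```shell
# Sketch: fetch submodules after the fact. `ensure_submodules` is a
# hypothetical helper, not part of this repo.
ensure_submodules() {
  # .gitmodules only exists in repos that declare submodules
  if [ -f .gitmodules ]; then
    git submodule update --init --recursive
  else
    echo "no .gitmodules found; run this from the repo root" >&2
    return 1
  fi
}
```

Calling `ensure_submodules` once before the first chain should be enough; it is harmless to repeat on a repo whose submodules are already checked out.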
After cloning, the chains can be executed [individually](#individual-steps) or all together in
one [aggregated chain](#aggregated-chain).

### individual steps
**[./veld_step_1_preprocess.yaml](./veld_step_1_preprocess.yaml)**
This preprocessing step converts the ELTeC data from xml into txt files as defined in
[./data/xsl/transformation.xsl](./data/xsl/transformation.xsl). The output is saved to
[./data/data_tmp_txt_transformed/](./data/data_tmp_txt_transformed/).
```
docker compose -f veld_step_1_preprocess.yaml up
```

**[./veld_step_2_download_models.yaml](./veld_step_2_download_models.yaml)**
udpipe will be used to infer on the data. The models will be downloaded from
https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3131# and saved to
[./data/data_tmp_udpipe_models/](./data/data_tmp_udpipe_models/).
```
docker compose -f veld_step_2_download_models.yaml up
```

**[./veld_step_3_infer.yaml](./veld_step_3_infer.yaml)**
Input for inference is the data from the previous preprocessing chains, and output is saved to
[./data/data_tmp_conllu_inferenced/](./data/data_tmp_conllu_inferenced/).
```
docker compose -f veld_step_3_infer.yaml up
```

**[./veld_step_4_analyse.yaml](./veld_step_4_analyse.yaml)**
Starts a jupyter notebook which can be used for deeper interactive inspection. Inputs are the
inferenced conllu files, and the output statistics are persisted into the data veld repo:
https://github.com/veldhub/veld_data__eltec_conllu_stats
```
docker compose -f veld_step_4_analyse.yaml up
```

**[./veld_step_5_inspect_with_conllueditor.yaml](./veld_step_5_inspect_with_conllueditor.yaml)**
Starts a [conllueditor](https://github.com/Orange-OpenSource/conllueditor) with which individual
conllu files can be visualized and inspected.
```
docker compose -f veld_step_5_inspect_with_conllueditor.yaml up
```

### aggregated chain
**[./veld_step_all.yaml](./veld_step_all.yaml)**
All of the individual chains above can be simply run by launching the aggregated chain:
```
docker compose -f veld_step_all.yaml up
```
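To run only the file-producing steps from a script, without the notebook and the editor, a small wrapper is one option. The sketch below is an assumption-laden illustration rather than part of this repo: `compose` and `run_batch_steps` are hypothetical helper names, and it assumes that steps 1 to 3 exit on their own once finished, which is not stated above. The `compose` helper also covers the older `docker-compose` binary mentioned in the requirements.

```shell
# Sketch: run steps 1-3 in order. Both functions are hypothetical helpers.

# Use `docker compose` when available, otherwise fall back to `docker-compose`.
compose() {
  if docker compose version >/dev/null 2>&1; then
    docker compose "$@"
  else
    docker-compose "$@"
  fi
}

# Steps 4 and 5 start interactive services, so they are left out here.
# The loop stops as soon as a compose invocation returns a non-zero status.
run_batch_steps() {
  for step in \
    veld_step_1_preprocess.yaml \
    veld_step_2_download_models.yaml \
    veld_step_3_infer.yaml
  do
    echo "running ${step}"
    compose -f "${step}" up || return 1
  done
}
```

After sourcing the snippet from the repo root, `run_batch_steps` runs the three steps in sequence; step 4 or step 5 can then be started by hand as described above.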