https://github.com/natlibfi/fintoai-data-ykl
DVC pipeline for YKL projects of Finto AI
annif dvc dvc-pipeline glam subject-indexing text-classification
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/natlibfi/fintoai-data-ykl
- Owner: NatLibFi
- License: cc0-1.0
- Created: 2022-02-24T09:41:40.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-01-15T15:24:34.000Z (about 2 years ago)
- Last Synced: 2025-01-21T14:46:10.714Z (about 1 year ago)
- Topics: annif, dvc, dvc-pipeline, glam, subject-indexing, text-classification
- Language: Shell
- Homepage: https://ai.finto.fi
- Size: 343 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# FintoAI-data-YKL
Configurations for maintaining the Annif projects that use the YKL vocabulary in the [Finto AI service](https://ai.finto.fi/).
The projects are trained and evaluated using a [DVC (Data Version Control) pipeline](https://dvc.org/doc/start/data-management/data-pipelines) defined in [dvc.yaml](/dvc.yaml).
The pipeline takes care of
1. installing Annif in a venv,
2. loading the vocabulary,
3. training the projects,
4. evaluating the projects.
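
The four steps above roughly correspond to the following manual commands. This is only a sketch of the idea, not the contents of `dvc.yaml`: the project ID `ykl-tfidf-fi`, the vocabulary ID `ykl`, and all paths under `data/` are hypothetical placeholders, and the `run` helper is introduced here purely for illustration. Commands are printed instead of executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the four pipeline steps as plain commands.
# Hypothetical names: project ID "ykl-tfidf-fi", vocabulary ID "ykl",
# and the files under data/ are placeholders, not taken from dvc.yaml.

# Print each command by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pipeline_steps() {
    # 1. Install Annif in a virtual environment.
    run python3 -m venv venv
    run venv/bin/pip install annif
    # 2. Load the vocabulary (older Annif releases used "loadvoc" instead).
    run venv/bin/annif load-vocab ykl data/ykl-skos.ttl
    # 3. Train a project.
    run venv/bin/annif train ykl-tfidf-fi data/train.tsv.gz
    # 4. Evaluate the project.
    run venv/bin/annif eval ykl-tfidf-fi data/test.tsv.gz
}
```

Calling `pipeline_steps` after defining these functions lists the commands in order. What DVC adds on top of such a script is tracking of each stage's inputs and outputs, so that only stages whose dependencies have changed are rerun.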
When the necessary vocabulary and training corpora are in place, the pipeline can be run with the command

    dvc repro

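
Since the pipeline expects the vocabulary and corpora to be present, a small pre-flight check can make a missing file obvious before DVC starts. The helper below is a minimal sketch; the function name `inputs_ready` and the example paths are hypothetical and not part of this repository.

```shell
# Return 0 only if every listed input file exists;
# report each missing one on stderr.
inputs_ready() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (paths are placeholders):
#   inputs_ready data/ykl-skos.ttl data/train.tsv.gz && dvc repro
```

Independently of such a check, `dvc status` reports which stages are out of date, which can help confirm what `dvc repro` is about to rerun.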
For more information about using DVC with Annif projects, see the [DVC exercise of the Annif tutorial](https://github.com/NatLibFi/Annif-tutorial/blob/master/exercises/OPT_dvc.md).