https://github.com/veldhub/veld_data__eltec_conllu_stats

Data velds encapsulating statistics on conllu data.

analysis conllu nlp statistics

# ![veld chain](https://raw.githubusercontent.com/veldhub/.github/refs/heads/main/images/symbol_V_letter.png) veld_data__eltec_conllu_stats

Statistics on CoNLL-U data produced by running UDPipe inference on ELTeC corpora.

This repo and its data are the output of this chain veld repo:
https://github.com/veldhub/veld_chain__eltec_udpipe_inference

## statistics

### count_token

Counts the tokens in each file (token definition: https://universaldependencies.org/format.html)
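
As an illustration only, a minimal sketch of such a count might look like the following. The function name `count_tokens` and the `SAMPLE` text are hypothetical, and the sketch assumes that only lines with a plain integer ID are counted, so multiword token ranges (e.g. `1-2`) and empty nodes (e.g. `1.1`) are skipped. The chain repo may count differently.

```python
def count_tokens(conllu_text: str) -> int:
    """Count word lines in CoNLL-U text (assumption: integer IDs only)."""
    count = 0
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        token_id = line.split("\t")[0]
        # Assumption: ranges like "1-2" and empty nodes like "1.1" are not counted.
        if token_id.isdigit():
            count += 1
    return count


# Hypothetical sample sentence in CoNLL-U format (10 tab-separated columns).
SAMPLE = "\n".join([
    "# text = The dogs bark.",
    "1\tThe\tthe\tDET\t_\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tdogs\tdog\tNOUN\t_\tNumber=Plur\t3\tnsubj\t_\t_",
    "3\tbark\tbark\tVERB\t_\tMood=Ind|Number=Plur\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
])

print(count_tokens(SAMPLE))  # prints 4
```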

### count_lemma_total

Counts the unique lemmas (lemma definition: https://universaldependencies.org/format.html)
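
A rough sketch of a unique-lemma count, under stated assumptions: the lemma is read from the third CoNLL-U column (LEMMA), comparison is case-sensitive, and punctuation lemmas are included. The name `count_unique_lemmas` and the sample data are hypothetical and not taken from the chain repo.

```python
def count_unique_lemmas(conllu_text: str) -> int:
    """Count distinct values of the LEMMA column (third field)."""
    lemmas = set()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumption: only plain integer IDs carry a lemma worth counting.
        if fields[0].isdigit() and len(fields) > 2:
            lemmas.add(fields[2])
    return len(lemmas)


# Hypothetical sample: the lemma "dog" occurs twice but is counted once.
SAMPLE = "\n".join([
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tchase\tchase\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tdogs\tdog\tNOUN\t_\t_\t2\tobj\t_\t_",
])

print(count_unique_lemmas(SAMPLE))  # prints 2
```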

### count_lemma_normalized_by_token

Takes `count_lemma_total` and divides it by `count_token`, so that the lemma count is relative to the overall token count.
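
The division can be sketched as below. The function name `lemma_token_ratio` is hypothetical, and returning `0.0` for a file without tokens is an assumption made here to avoid a division by zero; the source does not say how that case is handled.

```python
def lemma_token_ratio(count_lemma_total: int, count_token: int) -> float:
    """Divide the unique lemma count by the token count."""
    # Assumption: an empty file yields 0.0 instead of raising ZeroDivisionError.
    if count_token == 0:
        return 0.0
    return count_lemma_total / count_token


# Example: 3 unique lemmas over 5 tokens.
print(lemma_token_ratio(3, 5))  # prints 0.6
```

A value close to 1 means almost every token has its own lemma, while a low value means the same lemmas recur often, which makes files of different lengths easier to compare than the raw lemma count.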

### count_pos

For each part-of-speech tag, count its occurrences (POS definition: https://universaldependencies.org/u/pos/index.html)
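
One possible way to tally the tags, assuming they are read from the fourth CoNLL-U column (UPOS), which is what the linked definition describes. `count_pos_tags` and the sample are hypothetical names used for illustration.

```python
from collections import Counter


def count_pos_tags(conllu_text: str) -> Counter:
    """Tally the UPOS column (fourth field) over all word lines."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumption: only plain integer IDs are tallied.
        if fields[0].isdigit() and len(fields) > 3:
            counts[fields[3]] += 1
    return counts


# Hypothetical sample sentence.
SAMPLE = "\n".join([
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tchase\tchase\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tcats\tcat\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

pos_counts = count_pos_tags(SAMPLE)
print(pos_counts["NOUN"])  # prints 2
```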

### count_feat

For each feature tag, count its occurrences (feature definition: https://universaldependencies.org/u/feat/index.html)
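
A sketch of a feature tally, with two assumptions that the source does not confirm: features are read from the sixth CoNLL-U column (FEATS), and each full `Name=Value` pair is treated as one tag. The name `count_feature_tags` and the sample are hypothetical.

```python
from collections import Counter


def count_feature_tags(conllu_text: str) -> Counter:
    """Tally each Name=Value pair found in the FEATS column (sixth field)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if not fields[0].isdigit() or len(fields) < 6:
            continue
        feats = fields[5]
        # "_" marks a word without features; pairs are separated by "|".
        if feats != "_":
            counts.update(feats.split("|"))
    return counts


# Hypothetical sample sentence.
SAMPLE = "\n".join([
    "1\tThe\tthe\tDET\t_\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tdogs\tdog\tNOUN\t_\tNumber=Plur\t3\tnsubj\t_\t_",
    "3\tbark\tbark\tVERB\t_\tMood=Ind|Number=Plur\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])

feat_counts = count_feature_tags(SAMPLE)
print(feat_counts["Number=Plur"])  # prints 2
```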