https://github.com/stanfordnlp/sindhi-tokenization

# sindhi-tokenization
Sindhi tokenization data from ISRA

A collection of text files. Token boundaries are marked in the `tkns_`
files and sentence boundaries in the `stns_` files.

A tool in [Stanza](https://github.com/stanfordnlp/stanza),
`convert_text_files.py`, converts this data into a CoNLL-style
format suitable for training a tokenizer.
(The other annotations are left blank.)
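
To illustrate what "CoNLL-style with the other annotations left blank" could look like, here is a minimal sketch. It is not the code of `convert_text_files.py`, and it does not read the `tkns_` or `stns_` files, whose exact layout is not described here. It assumes the target is the 10-column CoNLL-U layout and that the sentences have already been split into tokens; the function name `to_conllu` and the sample tokens are hypothetical placeholders.

```python
def to_conllu(sentences):
    """Render pre-tokenized sentences as CoNLL-U-like text.

    `sentences` is a list of sentences, each a list of token strings.
    Assumption: the 10-column CoNLL-U layout, where only ID and FORM
    are filled and every other column is the blank marker "_".
    """
    blocks = []
    for sentence in sentences:
        lines = []
        for index, token in enumerate(sentence, start=1):
            # ID, FORM, then 8 blank columns (LEMMA through MISC).
            lines.append("\t".join([str(index), token] + ["_"] * 8))
        blocks.append("\n".join(lines))
    # A blank line terminates each sentence, including the last one.
    return "\n\n".join(blocks) + "\n\n"


if __name__ == "__main__":
    # Placeholder tokens, not taken from the dataset.
    print(to_conllu([["tok1", "tok2", "."], ["tok3", "."]]), end="")
```

Only the token and sentence boundaries carry information in such a file, which is all a tokenizer needs for training.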