https://github.com/hankcs/bolt_splits


# bolt_splits
Split the Broad Operational Language Translation (BOLT) corpus into train/dev/test sets.

The pseudo-code for splitting goes as follows:

```
For files in each genre:
    For files in each ext:
        For files in each length of filename:
            Sort files by filename
            Split files to trn, dev, tst with 8:1:1 ratio
```
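
The pseudo-code above could be sketched in Python roughly as follows. This is only an illustration, not the repository's actual script: the function name `split_files` is hypothetical, the genre is assumed to be the name of each file's parent directory, and the way the 8:1:1 boundaries are rounded (down, with the remainder going to `tst`) is also an assumption.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_files(paths, ratio=(8, 1, 1)):
    """Group paths by (genre, ext, filename length), then split each group."""
    groups = defaultdict(list)
    for p in map(PurePosixPath, paths):
        # Assumption: the genre is the name of the file's parent directory.
        key = (p.parent.name, p.suffix, len(p.name))
        groups[key].append(p)

    splits = {"trn": [], "dev": [], "tst": []}
    total = sum(ratio)
    for key in sorted(groups):
        # Sort files by filename within each group.
        files = sorted(groups[key], key=lambda p: p.name)
        n = len(files)
        # Assumption: boundaries are rounded down; the remainder goes to tst.
        trn_end = n * ratio[0] // total
        dev_end = n * (ratio[0] + ratio[1]) // total
        splits["trn"] += files[:trn_end]
        splits["dev"] += files[trn_end:dev_end]
        splits["tst"] += files[dev_end:]
    return splits
```

Because the split is applied per group rather than over the whole corpus, each genre, extension, and filename length should be represented in roughly the same proportion in all three sets, and sorting first keeps the result deterministic.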