https://github.com/bcdh/spacy-serbian-pipeline

A pipeline for creating a language model for Serbian in spaCy

# Serbian Language Pipeline for spaCy

Work in progress. Far from production ready.

## How to use with spaCy?

...

## Data files

To test training, we currently use the UD (Universal Dependencies) dataset, which has been automatically converted to Cyrillic. This is temporary; we will eventually use our own training data.
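As an illustration of what an automatic Latin-to-Cyrillic conversion involves, here is a minimal sketch. It is not the converter used for this dataset: the function name `latin_to_cyrillic` and the mapping tables are assumptions made for this example. The main point is that the digraphs `lj`, `nj` and `dž` have to be matched before single letters.

```python
# Illustrative sketch only; NOT the converter used by this project.
# Serbian Latin digraphs map to single Cyrillic letters, so they are
# checked before the single-letter table.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}


def latin_to_cyrillic(text: str) -> str:
    """Transliterate Serbian Latin text to Cyrillic, character by character."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # Capitalize when the first letter of the digraph is uppercase.
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLE.get(ch.lower())
        if cyr is None:
            # Digits, punctuation and unknown characters pass through.
            out.append(ch)
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)
```

A greedy digraph match like this mishandles words where the two letters belong to different morphemes (for example `nj` in "injekcija" or `dž` in "nadživeti"), so a real conversion would need an exception list.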

### Lemmatizer data

- data originates from Morpho-SLaWS (Tasovac, Rudan and Rudan 2015) and Transpoetika (Tasovac 2012)
- currently includes both Ekavian and Jekavian forms; I may move the Jekavian forms to the normalization function
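One possible reading of moving the Jekavian forms into normalization is sketched below: Jekavian forms are first mapped to their Ekavian counterparts, and the lemma lookup then only needs Ekavian entries. This is an assumption about the design, not the project's implementation; the names `normalize`, `lemmatize`, `JEKAVIAN_TO_EKAVIAN` and `LEMMA_LOOKUP` are hypothetical, and the entries are toy examples, not taken from the Morpho-SLaWS or Transpoetika data.

```python
# Hypothetical sketch; toy entries, NOT the project's lemmatizer data.
# Step 1 table: Jekavian surface forms -> Ekavian surface forms.
JEKAVIAN_TO_EKAVIAN = {
    "млијеко": "млеко",
    "млијека": "млека",
}

# Step 2 table: Ekavian surface forms -> lemmas.
LEMMA_LOOKUP = {
    "млеко": "млеко",
    "млека": "млеко",
}


def normalize(token: str) -> str:
    """Lowercase the token and map a known Jekavian form to its Ekavian form."""
    lowered = token.lower()
    return JEKAVIAN_TO_EKAVIAN.get(lowered, lowered)


def lemmatize(token: str) -> str:
    """Look up the lemma of the normalized token; fall back to the normalized form."""
    form = normalize(token)
    return LEMMA_LOOKUP.get(form, form)
```

With this split, the lookup table stays roughly half the size, at the cost of losing the Jekavian lemma unless it is mapped back afterwards.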