Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bcdh/spacy-serbian-pipeline
A pipeline for creating a language model for Serbian in spaCy
- Host: GitHub
- URL: https://github.com/bcdh/spacy-serbian-pipeline
- Owner: BCDH
- License: gpl-3.0
- Created: 2019-11-24T07:17:01.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-04-04T14:18:06.000Z (almost 2 years ago)
- Last Synced: 2024-10-31T00:12:36.336Z (2 months ago)
- Language: Python
- Size: 98.2 MB
- Stars: 3
- Watchers: 8
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Serbian Language Pipeline for spaCy
Work in progress. Far from production ready.
## How to use with spaCy?
...
## Data files
To test training, we're currently using the Universal Dependencies (UD) dataset, automatically converted to Cyrillic. This is temporary; we will eventually use our own training data.
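
As a rough illustration of what an automatic Latin-to-Cyrillic conversion involves, here is a minimal sketch. The function name `latin_to_cyrillic` and the mapping tables are assumptions made for this example and are not taken from the repository; the actual conversion script may work differently. The main point is that the digraphs `lj`, `nj` and `dž` have to be matched before single letters.

```python
# Hypothetical helper for illustration; not the repository's conversion code.
# Digraphs must be checked before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "đ": "ђ", "e": "е",
    "ž": "ж", "z": "з", "i": "и", "j": "ј", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "ć": "ћ",
    "u": "у", "f": "ф", "h": "х", "c": "ц", "č": "ч", "š": "ш",
}


def latin_to_cyrillic(text):
    """Transliterate Serbian Latin text to Cyrillic, character by character."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # Capitalize when the first Latin letter is uppercase (Lj, LJ).
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLE.get(ch.lower())
        if cyr is None:
            out.append(ch)  # digits, punctuation, whitespace pass through
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)
```

This sketch assumes precomposed characters (for example `ž` as a single code point) and does not handle words in which `nj` or `dž` are two separate sounds (such as `injekcija`), which a real converter would need an exception list for.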
### Lemmatizer data
- data originates from Morpho-SLaWS (Tasovac, Rudan and Rudan 2015) and Transpoetika (Tasovac 2012)
- currently includes both Ekavian and Jekavian forms; I may move the Jekavian forms to the normalization function
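
The two options mentioned in the list above can be sketched roughly as follows. Everything here is a toy assumption for illustration: the table contents, the names `LEMMA_LOOKUP`, `JEKAVIAN_TO_EKAVIAN` and `EKAVIAN_ONLY_LOOKUP`, and both functions are hypothetical and do not reflect the repository's actual data format or code.

```python
# Toy data for illustration only; not taken from the repository's lemmatizer files.

# Option 1: the lookup table contains both Ekavian and Jekavian forms.
LEMMA_LOOKUP = {
    "млека": "млеко",      # Ekavian genitive singular
    "млијека": "млијеко",  # Jekavian genitive singular
}

# Option 2: Jekavian forms are mapped to Ekavian in a normalization step,
# so the lemma table only needs Ekavian entries.
JEKAVIAN_TO_EKAVIAN = {"млијека": "млека", "млијеко": "млеко"}
EKAVIAN_ONLY_LOOKUP = {"млека": "млеко"}


def lemmatize_with_variants(token):
    """Look the form up directly; unknown tokens are returned unchanged."""
    return LEMMA_LOOKUP.get(token.lower(), token)


def lemmatize_normalized(token):
    """Normalize Jekavian to Ekavian first, then look up the lemma."""
    form = token.lower()
    form = JEKAVIAN_TO_EKAVIAN.get(form, form)
    return EKAVIAN_ONLY_LOOKUP.get(form, form)
```

With the first option the lemma keeps its dialectal variant (`млијека` gives `млијеко`); with the second, both variants collapse to the Ekavian lemma (`млијека` gives `млеко`), which makes the lemma table smaller but discards the distinction.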