https://github.com/miksus/syntags
Lightweight Part of Speech tagger with support for custom word featuring. Development is focusing on Finnish grammar.
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/miksus/syntags
- Owner: Miksus
- License: gpl-3.0
- Created: 2018-08-17T18:31:18.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-09-11T22:02:54.000Z (over 7 years ago)
- Last Synced: 2025-02-12T02:36:58.888Z (11 months ago)
- Language: Python
- Size: 23.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Syntags
Lightweight Part of Speech tagger with support for custom word featuring. Development of the default featuring functions is focused on Finnish grammar.
## Getting Started
### Requirements
```
Python 3
Pandas
Numpy
Scikit-learn
```
### Train the tagger
The tagger does not come pretrained, so it requires pretagged (labeled) data for training.
The input data to the transformer can take various forms:
- list of sentences
```
>>> [["This", "is", "first", "example"], ["This", "is", "another", "example"]]
```
- pandas Series with index indicating sentence number
```
>>> import pandas as pd
>>> pd.Series(["This", "is", "first", "example", "This", "is", "another", "example"],
...           index=[1, 1, 1, 1, 2, 2, 2, 2])
```
- pandas Series containing lists
```
>>> pd.Series([["This", "is", "first", "example"], ["This", "is", "another", "example"]])
```
- pandas DataFrame with the text column in the same format as either of the Series examples above. The other columns are treated as additional features and passed to the estimator.
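
The input forms listed above carry the same information, so it can help to see how they relate. The following is a minimal sketch using plain pandas only; it does not call any Syntags function, and the column names `text` and `is_capitalized` are hypothetical choices made for illustration, not names the library is known to require.

```python
import pandas as pd

words = ["This", "is", "first", "example", "This", "is", "another", "example"]
sentence_ids = [1, 1, 1, 1, 2, 2, 2, 2]

# Flat Series: one word per row, index gives the sentence number.
flat = pd.Series(words, index=sentence_ids)

# Group the words by sentence number to get the list-of-sentences form.
sentences = flat.groupby(level=0).apply(list).tolist()

# Series containing lists: one sentence per row.
nested = pd.Series(sentences)

# DataFrame form: a text column plus one extra feature column.
# Both column names are assumptions made for this example.
df = pd.DataFrame(
    {
        "text": words,
        "is_capitalized": [w[0].isupper() for w in words],
    },
    index=sentence_ids,
)
```

Here `sentences` is the nested list from the first example, `nested` is the Series of lists, and `df` adds one extra column alongside the text column in the flat, sentence-indexed layout.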