https://github.com/miksus/syntags

Lightweight Part of Speech tagger with support for custom word featuring. Development is focusing on Finnish grammar.

Host: GitHub
URL: https://github.com/miksus/syntags
Owner: Miksus
License: gpl-3.0
Created: 2018-08-17T18:31:18.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-09-11T22:02:54.000Z (over 7 years ago)
Last Synced: 2025-02-12T02:36:58.888Z (11 months ago)
Language: Python
Size: 23.4 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Syntags

Lightweight part-of-speech tagger with support for custom word featuring. Development of the default featuring functions is focused on Finnish grammar.

## Getting Started

### Requirements

```
Python 3
Pandas
Numpy
Scikit-learn
```

### Train the tagger

The tagger does not come pretrained, so it must first be trained on pretagged (labeled) data.

The input data to the transformer can take various forms:

- list of sentences

```
>>> [["This", "is", "first", "example"], ["This", "is", "another", "example"]]
```

- pandas Series with index indicating sentence number

```
>>> pd.Series(["This", "is", "first", "example", "This", "is", "another", "example"],
...           index=[1,1,1,1,2,2,2,2])
```

- pandas Series containing lists

```
>>> pd.Series([["This", "is", "first", "example"], ["This", "is", "another", "example"]])
```

- pandas DataFrame with the text column in the same format as either of the Series examples above. The other columns are treated as additional features and passed to the estimator.
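
The input forms listed above carry the same information, so one can be derived from another with plain pandas. The following is a minimal sketch of that relationship only; it does not call Syntags itself. The column names `text` and `is_capitalized` are hypothetical and chosen for illustration, since the README does not state which column name the tagger expects.

```python
import pandas as pd

# Form 1: list of sentences, each sentence a list of words.
sentences = [["This", "is", "first", "example"],
             ["This", "is", "another", "example"]]

# Form 2: flat Series of words whose index holds the sentence number.
flat = pd.Series(
    [word for sent in sentences for word in sent],
    index=[num for num, sent in enumerate(sentences, start=1) for _ in sent],
)

# Form 3: Series in which every element is one sentence (a list of words).
nested = pd.Series(sentences)

# Flat -> nested: group the words by sentence number and collect them.
regrouped = flat.groupby(level=0).agg(list)

# Nested -> flat: explode puts one word on each row and repeats the index.
exploded = nested.explode()

# Form 4: DataFrame with a text column plus one extra feature column.
# Both column names are assumptions made for this sketch.
frame = flat.to_frame(name="text")
frame["is_capitalized"] = [word[0].isupper() for word in frame["text"]]
```

Here `regrouped` holds the same sentences as the list form, and `exploded` holds the same words as the flat form, so whichever form the labeled data already has can be used as the starting point.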
