https://github.com/fostroll/phonetized_ner_srv
Tiny Flask app for phonetization, NE tagging and text distance calculation
- Host: GitHub
- URL: https://github.com/fostroll/phonetized_ner_srv
- Owner: fostroll
- License: apache-2.0
- Created: 2020-08-04T16:45:55.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2021-05-25T16:28:08.000Z (almost 4 years ago)
- Last Synced: 2025-02-14T06:35:59.894Z (3 months ago)
- Language: Python
- Homepage:
- Size: 32.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# phonetized_ner_srv
Tiny Flask app for phonetization, NE tagging and text distance calculation.
## Prerequisites
*Python 3* and *PyPI* packages `flask`, `mordl`, `textdistance`, `toxine`,
`transliterate`.

## Starting the Server
First, place the storages of the trained ***MorDL*** `UposTagger`, `FeatsTagger` and
`NeTagger` into the `srv/models` directory. Change the `emb_path` parameter in
the `ds_config.json` file of every storage so that the path is correct.
Note that the root for relative paths there is `ner_srv`. Thus, if your
embeddings are also placed in the `srv/models` directory, just add `'model/'` to
the beginning of each `emb_path` value.

Second, you may go back to the `srv` directory and correct the port in the `main.py`
script.

After that, ensure that you're still in the `srv` directory and run
```sh
sh ./run.sh prod
```

Or, if you need debug mode, just run
```sh
sh ./run.sh
```

## Usage
All services return data in *json* format.
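
As a minimal sketch of a client, the services can be called from Python with the standard library alone. Everything server-specific here is an assumption: the address `http://localhost:5000` is a placeholder (the actual port is whatever is set in `main.py`), the text is assumed to be passed URL-encoded as the last path segment, and `build_url` and `call_service` are hypothetical helper names that are not part of this project.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: placeholder address; use the host and port of your own server.
BASE_URL = 'http://localhost:5000'


def build_url(service, *texts, **params):
    """Build a request URL for one of the /api/ services.

    Each text is percent-encoded and appended as a path segment;
    keyword arguments that are not None become query parameters.
    """
    path = '/'.join(quote(text, safe='') for text in texts)
    url = f'{BASE_URL}/api/{service}/{path}'
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f'{url}?{query}' if query else url


def call_service(service, *texts, **params):
    """Send a GET request and decode the JSON body of the response."""
    with urlopen(build_url(service, *texts, **params)) as response:
        return json.load(response)


# Usage (needs a running server, so it is left commented out):
# tokens = call_service('tokenize', 'some text')
# phonetized = call_service('phonetize', 'some text', level=3, syllables='false')
```

Keeping URL construction separate from the network call makes it possible to inspect the exact request before sending it.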
```
:/api/tokenize/
http://
```
Returns *Parsed CoNLL-U* for tokenized **text** (untagged).

```
:/api/tag/
http://
```
Returns *Parsed CoNLL-U* with **text** tokenized and with the *UPOS*, *FEATS* and
*MISC:NE* fields filled.

```
:/api/phonetize/?level=3&syllables=false
http://
```
Returns a phonetized version of **text**. Only texts in Russian are processed
correctly.

**level**: the level of simplification. Allowed values:
- `0` changes nothing except excess spaces;
- `1` removes all spaces;
- `2` is the most standard version of phonetization;
- `3` is refined phonetization;
- `4` is coarse phonetization;
- `5` is even coarser.

Default **level** is `3`.

**syllables**: if `true`, returns an array of syllables instead of just the
phonetized **text**. Default is `false`.

```
:/api/text-distance//?ner1=&ner2=&level=3&algorithm=damerau_levenshtein&normalize=true&qval=1
http://
```
Returns the text distance between **text1** and **text2**. Only texts in Russian
are processed correctly.

**ner1**: if specified, **text1** is first tokenized and tagged,
and then replaced by the *FORM* fields of the tokens that have **ner1** as the value of
the *MISC:NE* field.

**ner2**: if specified, **text2** is first tokenized and tagged,
and then replaced by the *FORM* fields of the tokens that have **ner2** as the value of
the *MISC:NE* field.

**level**: before calculating the distance, both **text1** and **text2** are
phonetized with that level (see the `api/phonetize` service).

**algorithm**: the method used to calculate the distance. Allowed
values are: `hamming`, `levenshtein`, `damerau_levenshtein` (default),
`jaro`, `jaro_winkler`, `gotoh`, `smith_waterman`.

**normalize**: use normalized distance (default is `true`).

**qval**: use `1` (default).
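
The parameters above could be assembled into a request as in the following sketch. It rests on assumptions: the address `http://localhost:5000` is a placeholder, **text1** and **text2** are assumed to be passed as two URL-encoded path segments, and **ner1** and **ner2** are assumed to be omitted when not needed. The function name `text_distance_url` is hypothetical and not part of this project; the defaults and the list of algorithms are the ones given in this README.

```python
from urllib.parse import quote, urlencode

# Algorithm names listed in this README.
ALGORITHMS = {
    'hamming', 'levenshtein', 'damerau_levenshtein',
    'jaro', 'jaro_winkler', 'gotoh', 'smith_waterman',
}


def text_distance_url(text1, text2, ner1=None, ner2=None, level=3,
                      algorithm='damerau_levenshtein', normalize=True,
                      qval=1, base_url='http://localhost:5000'):
    """Build a text-distance request URL (base_url is a placeholder)."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f'unknown algorithm: {algorithm!r}')

    params = {}
    # Assumption: ner1/ner2 are optional and are left out when unset.
    if ner1 is not None:
        params['ner1'] = ner1
    if ner2 is not None:
        params['ner2'] = ner2
    params.update(
        level=level,
        algorithm=algorithm,
        # Booleans are written in lowercase, as in the request line above.
        normalize='true' if normalize else 'false',
        qval=qval,
    )

    # Percent-encode both texts so spaces and Cyrillic letters survive.
    path = f'{quote(text1, safe="")}/{quote(text2, safe="")}'
    return f'{base_url}/api/text-distance/{path}?{urlencode(params)}'


print(text_distance_url('a b', 'c', algorithm='levenshtein'))
```

Rejecting unknown algorithm names on the client side gives an immediate error instead of depending on how the server responds to them.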
## License
***phonetized_ner_srv*** is released under the Apache License. See the
[LICENSE](https://github.com/fostroll/ner_srv/blob/master/LICENSE) file for
more details.