{"id":23383143,"url":"https://github.com/fostroll/phonetized_ner_srv","last_synced_at":"2025-04-08T09:30:15.909Z","repository":{"id":202058472,"uuid":"285041219","full_name":"fostroll/phonetized_ner_srv","owner":"fostroll","description":"Tiny Flask app for phonetization, NE tagging and text distance calculation","archived":false,"fork":false,"pushed_at":"2021-05-25T16:28:08.000Z","size":33,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-14T06:35:59.894Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fostroll.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-08-04T16:45:55.000Z","updated_at":"2021-05-25T16:28:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"feeb36f5-6010-4a05-8ff0-5c59b29cf411","html_url":"https://github.com/fostroll/phonetized_ner_srv","commit_stats":null,"previous_names":["fostroll/phonetized_ner_srv"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fphonetized_ner_srv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fphonetized_ner_srv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fphonetized_ner_srv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fphonetized_ner_srv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fostroll","download_url":"https://code
load.github.com/fostroll/phonetized_ner_srv/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247813006,"owners_count":21000411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-21T22:17:41.048Z","updated_at":"2025-04-08T09:30:15.868Z","avatar_url":"https://github.com/fostroll.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# phonetized_ner_srv\r\n\r\nTiny Flask app for phonetization, NE tagging and text distance calculation.\r\n\r\n## Prerequisites\r\n\r\n*Python 3* and *PyPI* packages `flask`, `mordl`, `textdistance`, `toxine`,\r\n`transliterate`.\r\n\r\n## Starting the Server\r\n\r\nFirst, place the storages of the trained ***MorDL*** `UposTagger`, `FeatsTagger` and\r\n`NeTagger` into the `srv/models` directory. Change the `emb_path` parameter in\r\nthe `ds_config.json` file of every storage so that the path is correct.\r\nNote that the root for relative paths there is `ner_srv`. 
Thus, if your\r\nembeddings are also placed in the `srv/models` directory, just add `'model/'` at\r\nthe beginning of each `emb_path` value.\r\n\r\nSecond, you may go back to the `srv` directory and correct the port in the `main.py`\r\nscript.\r\n\r\nAfter that, ensure that you're still in the `srv` directory and run\r\n```sh\r\nsh ./run.sh prod\r\n```\r\n\r\nOr, if you need debug mode, just run\r\n```sh\r\nsh ./run.sh\r\n```\r\n\r\n## Usage\r\n\r\nAll services return data in *json* format.\r\n\r\n```\r\nhttp://\u003caddress\u003e:\u003cport\u003e/api/tokenize/\u003ctext\u003e\r\n```\r\nReturns *Parsed CoNLL-U* for tokenized **text** (untagged).\r\n\r\n```\r\nhttp://\u003caddress\u003e:\u003cport\u003e/api/tag/\u003ctext\u003e\r\n```\r\nReturns *Parsed CoNLL-U* with **text** tokenized and with *UPOS*, *FEATS* and\r\n*MISC:NE* fields filled.\r\n\r\n```\r\nhttp://\u003caddress\u003e:\u003cport\u003e/api/phonetize/\u003ctext\u003e?level=3\u0026syllables=false\r\n```\r\nReturns the phonetized version of **text**. Only texts in Russian are processed\r\ncorrectly.\r\n\r\n**level**: the level of simplification. Allowed values:\r\n- `0` makes no changes except removing excess spaces;\r\n- `1` removes all spaces;\r\n- `2` most standard version of phonetization;\r\n- `3` refined phonetization;\r\n- `4` coarse phonetization;\r\n- `5` even coarser.\r\n\r\nDefault **level** is `3`.\r\n\r\n**syllables**: if `true`, returns an array of syllables instead of just the phonetized\r\n**text**. Default is `false`.\r\n\r\n```\r\nhttp://\u003caddress\u003e:\u003cport\u003e/api/text-distance/\u003ctext1\u003e/\u003ctext2\u003e?ner1=\u0026ner2=\u0026level=3\u0026algorithm=damerau_levenshtein\u0026normalize=true\u0026qval=1\r\n```\r\nReturns the text distance between **text1** and **text2**. 
Only texts in Russian\r\nare processed correctly.\r\n\r\n**ner1**: if specified, **text1** is first tokenized and tagged,\r\nand then replaced by the *FORM* fields of the tokens that have **ner1** as the value of\r\nthe *MISC:NE* field.\r\n\r\n**ner2**: if specified, **text2** is first tokenized and tagged,\r\nand then replaced by the *FORM* fields of the tokens that have **ner2** as the value of\r\nthe *MISC:NE* field.\r\n\r\n**level**: before calculating the distance, both **text1** and **text2** will\r\nbe phonetized with that level (see `api/phonetize` service).\r\n\r\n**algorithm**: the method used to calculate the distance. Allowed\r\nvalues are: `hamming`, `levenshtein`, `damerau_levenshtein` (default),\r\n`jaro`, `jaro_winkler`, `gotoh`, `smith_waterman`.\r\n\r\n**normalize**: use normalized distance (default is `true`).\r\n\r\n**qval**: use `1` (default).\r\n\r\n## License\r\n\r\n***phonetized_ner_srv*** is released under the Apache License. See the\r\n[LICENSE](https://github.com/fostroll/ner_srv/blob/master/LICENSE) file for\r\nmore details.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffostroll%2Fphonetized_ner_srv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffostroll%2Fphonetized_ner_srv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffostroll%2Fphonetized_ner_srv/lists"}