https://github.com/martinjack/uaddresspacy
🇺🇦 UAddresspacy | spaCy parsing of Ukrainian addresses into types
- Host: GitHub
- URL: https://github.com/martinjack/uaddresspacy
- Owner: martinjack
- License: mit
- Archived: true
- Created: 2022-03-18T13:21:43.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2022-03-22T21:23:52.000Z (almost 3 years ago)
- Last Synced: 2024-09-22T07:01:58.545Z (3 months ago)
- Topics: address, nlp, parsing, spacy, spacy-nlp, ukraine
- Language: Python
- Homepage:
- Size: 31.4 MB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.en.md
- License: LICENSE
![header](doc/header.png)
# Description
[![PyPI version](https://badge.fury.io/py/uaddresspacy.svg)](https://badge.fury.io/py/uaddresspacy)

Parses Ukrainian addresses into typed components (region, locality, street, house number, and so on) using spaCy.

> Read this in other languages: [English](README.en.md), [Russian](README.md), [Ukrainian](README.ua.md)
# Requirements
* python3
* spacy
* re
* pandas
* csv
* os
* signal
* threading

## Model preparation
```shell
python3 pretrain.py
```

## Model creation
```shell
python3 train.py
```

## Train model
```shell
python3 -m spacy train config/config.cfg --paths.train training/train.spacy --paths.dev training/test.spacy --output models
```

## Train model more accurately
```shell
python3 -m spacy train config/config_acc.cfg --paths.train training/train.spacy --paths.dev training/test.spacy --output models
```

## Model check
```shell
python3 example.py
```

### Create model description file
```shell
python3 -m spacy init fill-config config/base_config.cfg config/config.cfg
```

### Create description file for a more accurate model
```shell
python3 -m spacy init fill-config config/base_config_acc.cfg config/config_acc.cfg
```

## Examples
```python
import uaddresspacy

print(uaddresspacy.parse(", - полтавська чутівський жовтневе вул. -, буд. -, кв.,"))
# [('полтавська', 'Locality'), ('чутівський', 'CountyType'), ('жовтневе', 'Locality'), ('вул.', 'StreetType'), ('буд.', 'HouseNumberType'), ('кв.', 'ApartmentType')]
print(uaddresspacy.parse(", 01000 київ, місто київ, місто київ воровського, буд. 43-б, кв. 14,"))
# [('01000', 'PostCode'), ('київ', 'Region'), ('місто', 'LocalityType'), ('київ', 'Locality'), ('воровського', 'Street'), ('буд.', 'HouseNumberType'), ('43-б', 'HouseNumber'), ('кв.', 'ApartmentType'), ('14', 'Apartment')]
```
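
The example output is a flat list of `(token, label)` tuples. For downstream use it may be handier to look tokens up by label. The following is a minimal sketch of one way to do that; `group_by_label` is a hypothetical helper, not part of `uaddresspacy`, and the input is the sample output copied from the example rather than a live call to the library.

```python
from collections import defaultdict

# Sample output copied from the README example; in real use this list
# would come from uaddresspacy.parse(...).
parsed = [
    ("01000", "PostCode"),
    ("київ", "Region"),
    ("місто", "LocalityType"),
    ("київ", "Locality"),
    ("воровського", "Street"),
    ("буд.", "HouseNumberType"),
    ("43-б", "HouseNumber"),
    ("кв.", "ApartmentType"),
    ("14", "Apartment"),
]


def group_by_label(pairs):
    """Collect tokens under their label, keeping token order (hypothetical helper)."""
    grouped = defaultdict(list)
    for token, label in pairs:
        grouped[label].append(token)
    return dict(grouped)


address = group_by_label(parsed)
print(address["PostCode"])     # ['01000']
print(address["HouseNumber"])  # ['43-б']
```

Values are kept as lists because a label can occur more than once in a single address.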
![use](doc/use.gif)

```sh
python3 pretrain.py
```
![pretrain](doc/pretrain.gif)

## Structure
| File | Description |
| :------------- | :------------- |
| pretrain.py | Preparing data for model training |
| train.py | Model preparation |
| example.py | Runs example parsing of addresses into types |
| report.csv | Example output of parsing addresses into types |
| addresses.csv | List of addresses to check |
| training/raw.csv | Data for training |
| training/pretrain.csv | Data to train model |

## Types
| Name | Description |
| :------------- | :------------- |
| Country | Country |
| RegionType | Region type |
| Region | Region |
| CountyType | County type |
| County | County |
| Included | Included |
| LocalityType | Locality type |
| Locality | Locality |
| StreetType | Street type |
| Street | Street |
| HousingType | Housing type |
| Housing | Housing |
| HostelType | Hostel type |
| Hostel | Hostel |
| HouseNumberType | House number type |
| HouseNumber | House number |
| HouseNumberAdditionally | Additional house number |
| SectionType | Section type |
| Section | Section |
| ApartmentType | Apartment type |
| Apartment | Apartment |
| RoomType | Room type |
| Room | Room |
| Sector | Sector |
| FloorType | Floor type |
| Floor | Floor |
| PostCode | Postcode |
| Manually | Manually |
| NotAddress | Not address |
| Comment | Comment |
| AdditionalData | Additional data |
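
When consuming parser output it can help to check labels against the table above and to tell marker labels such as `StreetType` apart from value labels such as `Street`. The sketch below assumes, based only on the naming pattern in the table, that a label ending in `Type` marks the value label with the same prefix. `LABELS`, `base_label`, and `unknown_labels` are hypothetical names and are not part of `uaddresspacy`.

```python
# Labels copied from the Types table.
LABELS = {
    "Country", "RegionType", "Region", "CountyType", "County", "Included",
    "LocalityType", "Locality", "StreetType", "Street", "HousingType",
    "Housing", "HostelType", "Hostel", "HouseNumberType", "HouseNumber",
    "HouseNumberAdditionally", "SectionType", "Section", "ApartmentType",
    "Apartment", "RoomType", "Room", "Sector", "FloorType", "Floor",
    "PostCode", "Manually", "NotAddress", "Comment", "AdditionalData",
}


def base_label(label):
    """Return the value label a '...Type' marker refers to, or None.

    Assumption: 'StreetType' marks 'Street', inferred from the label names.
    """
    if label.endswith("Type"):
        base = label[: -len("Type")]
        if base in LABELS:
            return base
    return None


def unknown_labels(pairs):
    """Return, sorted, any labels in (token, label) pairs missing from LABELS."""
    return sorted({label for _, label in pairs if label not in LABELS})


print(base_label("StreetType"))  # Street
print(base_label("PostCode"))    # None
print(unknown_labels([("вул.", "StreetType"), ("x", "Bogus")]))  # ['Bogus']
```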