🇺🇦 UAddresspacy | spaCy parsing of Ukrainian addresses into types
https://github.com/martinjack/uaddresspacy

![header](doc/header.png)
# Description
[![PyPI version](https://badge.fury.io/py/uaddresspacy.svg)](https://badge.fury.io/py/uaddresspacy)

Parses Ukrainian addresses into typed components such as region, locality, street, house number, and apartment, using a spaCy model.

> Read this in other languages: [English](README.en.md), [Русский](README.md), [Український](README.ua.md)

# Requirements
* python3
* spacy
* pandas
* re, csv, os, signal, threading (part of the Python standard library, no separate installation needed)

## Model preparation
```shell
python3 pretrain.py
```

## Model creation
```shell
python3 train.py
```

## Train model
```shell
python3 -m spacy train config/config.cfg --paths.train training/train.spacy --paths.dev training/test.spacy --output models
```

## Train a more accurate model
```shell
python3 -m spacy train config/config_acc.cfg --paths.train training/train.spacy --paths.dev training/test.spacy --output models
```
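The two training commands above differ only in the config file. As a minimal sketch, assuming you want to launch training from a Python script, the command can be assembled as a list; `spacy_train_command` is a hypothetical helper and not part of this repository, and the default paths are the ones used in the commands above.

```python
import sys


def spacy_train_command(config, train="training/train.spacy",
                        dev="training/test.spacy", output="models"):
    """Build the `spacy train` command line as a list of arguments."""
    return [
        sys.executable, "-m", "spacy", "train", config,
        "--paths.train", train,
        "--paths.dev", dev,
        "--output", output,
    ]


# Pick the config for the more accurate model.
cmd = spacy_train_command("config/config_acc.cfg")
print(" ".join(cmd[1:]))
# To actually start training: subprocess.run(cmd, check=True)
```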

## Model check
```shell
python3 example.py
```

### Create the model config file
```shell
python3 -m spacy init fill-config config/base_config.cfg config/config.cfg
```

### Create the config file for the more accurate model
```shell
python3 -m spacy init fill-config config/base_config_acc.cfg config/config_acc.cfg
```

## Examples
```python
import uaddresspacy

print(uaddresspacy.parse(", - полтавська чутівський жовтневе вул. -, буд. -, кв.,"))
# [('полтавська', 'Locality'), ('чутівський', 'CountyType'), ('жовтневе', 'Locality'), ('вул.', 'StreetType'), ('буд.', 'HouseNumberType'), ('кв.', 'ApartmentType')]
print(uaddresspacy.parse(", 01000 київ, місто київ, місто київ воровського, буд. 43-б, кв. 14,"))
# [('01000', 'PostCode'), ('київ', 'Region'), ('місто', 'LocalityType'), ('київ', 'Locality'), ('воровського', 'Street'), ('буд.', 'HouseNumberType'), ('43-б', 'HouseNumber'), ('кв.', 'ApartmentType'), ('14', 'Apartment')]
```
![use](doc/use.gif)
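
The list of `(token, label)` pairs shown above can be awkward to query directly. The following is a minimal sketch, not part of the library, that groups tokens by label. `group_by_label` is a hypothetical helper, and the sample data is copied from the example output above so the snippet runs without a trained model.

```python
from collections import defaultdict

# Sample output copied from the README example; in practice this
# would come from uaddresspacy.parse(...).
parsed = [
    ("01000", "PostCode"),
    ("київ", "Region"),
    ("місто", "LocalityType"),
    ("київ", "Locality"),
    ("воровського", "Street"),
    ("буд.", "HouseNumberType"),
    ("43-б", "HouseNumber"),
    ("кв.", "ApartmentType"),
    ("14", "Apartment"),
]


def group_by_label(pairs):
    """Collect tokens under their label, preserving token order."""
    grouped = defaultdict(list)
    for token, label in pairs:
        grouped[label].append(token)
    return dict(grouped)


fields = group_by_label(parsed)
print(fields["Street"])       # ['воровського']
print(fields["HouseNumber"])  # ['43-б']
```

Values are kept as lists because a label may occur more than once in a single address, as `Locality` does in the first example.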

```sh
python3 pretrain.py
```
![pretrain](doc/pretrain.gif)

## Structure
| File | Description |
| :------------- | :------------- |
| pretrain.py | Prepares data for model training |
| train.py | Prepares the model |
| example.py | Runs example parsing of addresses into types |
| report.csv | Example output of parsing addresses into types |
| addresses.csv | List of addresses to check |
| training/raw.csv | Raw data for training |
| training/pretrain.csv | Prepared data used to train the model |
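
The files above suggest a batch workflow: read a list of addresses, parse each one, and write a report. The following is a rough sketch of that idea only. The column layout (`address`, `token`, `label`) is hypothetical and is not the format of the repository's `report.csv`, and `fake_parse` is a stand-in so the snippet runs without a trained model; in practice you would pass `uaddresspacy.parse` instead.

```python
import csv
import io


def write_report(addresses, parse, out):
    """Write one CSV row per labeled token: address, token, label."""
    writer = csv.writer(out)
    writer.writerow(["address", "token", "label"])  # hypothetical layout
    for address in addresses:
        for token, label in parse(address):
            writer.writerow([address, token, label])


def fake_parse(address):
    """Stand-in parser: labels every whitespace-separated part as Comment."""
    return [(part, "Comment") for part in address.split()]


buffer = io.StringIO()
write_report(["київ воровського"], fake_parse, buffer)
print(buffer.getvalue())
```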

## Types
| Name | Description |
| :------------- | :------------- |
| Country | Country |
| RegionType | Region type |
| Region | Region |
| CountyType | County type |
| County | County |
| Included | Included |
| LocalityType | Locality type |
| Locality | Locality |
| StreetType | Street type |
| Street | Street |
| HousingType | Housing type |
| Housing | Housing |
| HostelType | Hostel type |
| Hostel | Hostel |
| HouseNumberType | House number type |
| HouseNumber | House number |
| HouseNumberAdditionally | Additional house number |
| SectionType | Section type |
| Section | Section |
| ApartmentType | Apartment type |
| Apartment | Apartment |
| RoomType | Room type |
| Room | Room |
| Sector | Sector |
| FloorType | Floor type |
| Floor | Floor |
| PostCode | Postcode |
| Manually | Manually |
| NotAddress | Not an address |
| Comment | Comment |
| AdditionalData | Additional data |
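
Many labels in the table come in pairs: an `XType` marker (for example `буд.`) and the `X` value it introduces (for example `43-б`). As a hedged sketch, the pairs can be joined after parsing. `pair_types` is a hypothetical helper, not part of the library, and the assumption that a type marker precedes its value is taken from the example outputs, not from any documented guarantee.

```python
# Labels copied from the Types table.
LABELS = {
    "Country", "RegionType", "Region", "CountyType", "County", "Included",
    "LocalityType", "Locality", "StreetType", "Street", "HousingType",
    "Housing", "HostelType", "Hostel", "HouseNumberType", "HouseNumber",
    "HouseNumberAdditionally", "SectionType", "Section", "ApartmentType",
    "Apartment", "RoomType", "Room", "Sector", "FloorType", "Floor",
    "PostCode", "Manually", "NotAddress", "Comment", "AdditionalData",
}


def pair_types(pairs):
    """Return (label, type_marker_or_None, value) for each value token."""
    result = []
    pending = {}  # base label -> type marker waiting for its value
    for token, label in pairs:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if label.endswith("Type"):
            pending[label[:-len("Type")]] = token
        else:
            result.append((label, pending.pop(label, None), token))
    return result


sample = [
    ("місто", "LocalityType"),
    ("київ", "Locality"),
    ("воровського", "Street"),
    ("буд.", "HouseNumberType"),
    ("43-б", "HouseNumber"),
]
print(pair_types(sample))
# [('Locality', 'місто', 'київ'), ('Street', None, 'воровського'), ('HouseNumber', 'буд.', '43-б')]
```

Type markers that are never followed by a value are silently dropped in this sketch; keep the contents of `pending` if you need them.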