https://github.com/geocurly/name-splitter
Split name utility
- Host: GitHub
- URL: https://github.com/geocurly/name-splitter
- Owner: geocurly
- License: mit
- Created: 2020-04-05T12:37:24.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-08-29T10:07:14.000Z (almost 6 years ago)
- Last Synced: 2025-01-19T22:50:55.616Z (over 1 year ago)
- Topics: name-splitter, php74, russian-name-splitter, splitter
- Language: PHP
- Size: 109 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# name-splitter
### Unfortunately, the utility supports only Cyrillic names
This is a utility for splitting a full name into its parts. It takes an input string and parses it into a result object.
## Usage:
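```php
<?php
// The line that created $splitter was truncated in the source. It passed an
// options array to the NameSplitter constructor ending with `=> 'CP1251']);`
// (apparently an encoding setting); the option key is not recoverable here.
$splitter = new NameSplitter([/* … */ => 'CP1251']);
$result = $splitter->split('Иванов Иван Иванович');
[$surname, $name, $middleName] = [
    $result->getSurname(),
    $result->getName(),
    $result->getMiddleName(),
];
```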
## Quality
The NameSplitter tests cover about 13,000 Russian names with an accuracy of 99.65%. Each name was tested against many templates, giving 124,283 test cases in total.
You can run the tests against your own data set (add the `--verbose` option to see template errors):
```bash
[aleksandr@aleksandr name-splitter]$ ./bin/name-split-test --file=$(realpath fio.csv)
TESTED TEMPLATES:
%Surname %Name %Middle
%Name %Middle %Surname
%Name %Middle
%Name %Surname
%Surname %Name
%Surname %StrictInitials
%StrictInitials %Surname
%Surname %SplitInitials
%SplitInitials %Surname
ACCURACY: 99.65
COUNT CASE TOTAL: 124283
COUNT CASE PASS: 123848
COUNT CASE ERROR: 435
```
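The `ACCURACY` line is the share of passing cases: 123848 / 124283 ≈ 99.65%, with 124283 - 123848 = 435 errors. A minimal PHP sketch of that arithmetic; the helper `accuracyPercent` is hypothetical and not part of the tool:

```php
<?php

/**
 * Hypothetical helper: percentage of passing cases, rounded to two
 * decimals to mirror the ACCURACY line of ./bin/name-split-test.
 */
function accuracyPercent(int $pass, int $total): float
{
    if ($total <= 0) {
        throw new InvalidArgumentException('Total must be positive');
    }
    return round(100 * $pass / $total, 2);
}

$total = 124283;
$pass = 123848;
$errors = $total - $pass;

echo accuracyPercent($pass, $total), PHP_EOL; // 99.65
echo $errors, PHP_EOL;                        // 435
```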
The `fio.csv` file has the following format:
```csv
SomeSurname;SomeName;SomeMiddleName
```
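Each line of `fio.csv` holds one person's surname, name and middle name separated by semicolons, and each person is checked against several templates. A minimal sketch of how one line could be expanded into test inputs for the templates without initials; `expandRow` and its template list are illustrative assumptions, not the tool's own code:

```php
<?php

/**
 * Hypothetical helper: turn one `Surname;Name;MiddleName` line into the
 * strings produced by the full-word templates (initials templates omitted).
 *
 * @return array<string, string> template => filled-in input string
 */
function expandRow(string $line): array
{
    $parts = explode(';', trim($line));
    if (count($parts) !== 3) {
        throw new InvalidArgumentException("Expected 3 fields: $line");
    }
    [$surname, $name, $middle] = $parts;

    $templates = [
        '%Surname %Name %Middle',
        '%Name %Middle %Surname',
        '%Name %Middle',
        '%Name %Surname',
        '%Surname %Name',
    ];

    $result = [];
    foreach ($templates as $template) {
        // strtr() with an array does not re-scan text it has already replaced.
        $result[$template] = strtr($template, [
            '%Surname' => $surname,
            '%Name' => $name,
            '%Middle' => $middle,
        ]);
    }
    return $result;
}

foreach (expandRow('Иванов;Иван;Иванович') as $template => $input) {
    echo str_pad($template, 24), $input, PHP_EOL;
}
```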
## Problems
* The utility can't recognize templates such as `%Name %Surname` when the surname has the form of a middle name (for example, `Иван Иванович`).
* Some templates may not work correctly when a name part is missing from the [dictionaries](https://github.com/geocurly/name-splitter/tree/master/resources/dictionaries/ru).
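The first problem arises because some Russian surnames end like patronymics. The sketch below uses a simple, hypothetical suffix check (not how the library itself decides) to show that in `Иван Иванович` the second word fits the patronymic pattern, yet `Иванович` is also a surname, so the word's form alone can't tell `%Name %Middle` from `%Name %Surname`:

```php
<?php

/**
 * Hypothetical heuristic: does the word end like a Russian patronymic?
 * Covers -ич (e.g. -ович, -евич) and the feminine -овна, -евна, -ична.
 */
function looksLikePatronymic(string $word): bool
{
    return preg_match('/(?:ич|овна|евна|ична)$/iu', $word) === 1;
}

$words = explode(' ', 'Иван Иванович');

// True whether "Иванович" is meant as a middle name or as a surname.
var_dump(looksLikePatronymic($words[1])); // bool(true)
// A typical surname ending does not match.
var_dump(looksLikePatronymic('Иванов'));  // bool(false)
```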
## Solution
You can pass your own pre- and post-templates as the second and third arguments of the `NameSplitter` constructor:
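```php
<?php
// The start of this snippet was truncated in the source: `$before` opened with
// a pre-template built from the array below (the call that created it and the
// `use` imports for TPL and StateInterface are not recoverable).
$before = [
    /* … */([
        TPL::SURNAME => 'Difficult Surname',
        TPL::NAME => 'Difficult Name',
    ]),
    static function (StateInterface $state) {
        // TODO: your implementation goes here; set $surname and $name.
        return [
            TPL::SURNAME => $surname ?? null,
            TPL::NAME => $name ?? null,
        ];
    },
];
// Templates may be any callable that takes a StateInterface as input.
$after = [];
$splitter = new NameSplitter([], $before, $after);
$result = $splitter->split('Difficult Surname Difficult Name');
```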
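The first `$before` entry appears to tell the splitter the exact surname and name for a string it would otherwise split incorrectly. A standalone sketch of that "known names first" idea using plain arrays; `matchKnownName` and the `'surname'`/`'name'` keys are hypothetical and do not use the library's `StateInterface` or template classes:

```php
<?php

/**
 * Hypothetical "known names first" pass: if the input matches a known full
 * name, return its stored parts; otherwise return null so a general
 * parser can take over.
 *
 * @param array<string, array{surname: string, name: string}> $known
 * @return array{surname: string, name: string}|null
 */
function matchKnownName(string $input, array $known): ?array
{
    // Collapse runs of whitespace so "A  B" and "A B" hit the same entry.
    $normalized = trim((string) preg_replace('/\s+/u', ' ', $input));
    return $known[$normalized] ?? null;
}

$known = [
    'Difficult Surname Difficult Name' => [
        'surname' => 'Difficult Surname',
        'name' => 'Difficult Name',
    ],
];

print_r(matchKnownName('Difficult Surname  Difficult Name', $known));
var_dump(matchKnownName('Иванов Иван', $known)); // NULL
```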