Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ilib-js/ilib-name

Parse and format names of people
https://github.com/ilib-js/ilib-name

Last synced: 3 days ago
JSON representation

Parse and format names of people

Awesome Lists containing this project

README

        

# ilib-name

Library to parse and format names of people

## Installation

```
npm install ilib-name

or

yarn add ilib-name
```

## API Reference

You can see the [generated API reference docs](./docs/ilibName.md)
for full details.

## Motivation

Why would you want to parse and format names?

Let's say you have a part of your web site that allows your users to enter some contacts. You may have a name field and an email field. The name field is a single field where they can type the whole name of their contact. Now, you would probably also have a page that lists the contacts in alphabetical order, and the user may be able to click on the last name or first name column to sort by that field.

The problem is... what is the last name or each contact? What is the first name? In English, you might be tempted to just take the first word in the whole name as the first name, and the last word as the last name. But what if the user has typed in a prefix or suffix like "Dr. John Smith, PhD."? That algorithm won't even work in English, as that contact's name would end up being "Dr. PhD.".

Now things get even more complicated when you have to support multiple languages. Some languages typically reverse the order of the names. For this reason, iLib never deals with "first" and "last" name, as it doesn't even make sense for those languages. Instead, it uses the terms "given name" and "family name", which are unambiguous and do not depend on order.

There is even more fun to be had if you try to parse names in other langauges. Here is a table of example names in various languages and how they are parsed.

| Locale | Name | prefix | given | middle | family | suffix |
|----------------------------------|-------------------------------|--------|--------|--------|-------------------|--------|
| English for the US (en-US) | Dr. John James Smith Jr. | Dr. | John | James | Smith | Jr. |
| German for Germany (de-DE) | Ludwig von Beethoven | | Ludwig | | von Beethoven | |
| Spanish for Spain (es-ES) | Jose Manuel Rodriguez Mendoza | | Jose | Manuel | Rodriguez Mendoza | |
| Korean for South Korea (ko-KR) | 김인환 | | 인환 | | 김 | |
| Indonesian for Indonesia (id-ID) | Sukarno | | Sukamo | | | |

Note that in English example, we have the prefixes and suffixes to deal with. It turns out that many other languages have prefixes and suffixes too.

In German, the auxiliary word "von" is considered part of the family name. However, when sorting by family name, "von Beethoven" would appear under the B's, not the V's, as auxiliaries are ignored. That is, German is sorted by "head word" of the family name rather than the first word of the family name.

In Spain, many people have a given name first, followed by one or more middle names, followed by their father's family name, and followed finally their mother's family name. So in the example above, Jose's family name is "Rodriguez Mendoza" where "Rodriguez" is Jose's father's family name, and "Mendoza" is Jose's mother's family name. If you were to shorten the name, Jose would probably prefer "Jose Rodriguez" instead of "Jose Mendoza", and Jose's children would get the name "Rodriguez" as well.

In Korean, as in Chinese and Japanese, the family name comes first, followed by the given name with no space in between. In Chinese and Japanese, sometimes only the first character is the family name, and some times up to 3 characters. Either way, the parts of the name have to be teased apart without help from a separating space.

Other languages have reversed order as well. In Hungarian, it is common to have reverse order, as in "Kovacs Ernie" instead of "Ernie Kovacs" as he is known in the US. In Russian, it is even more complicated, as it is common to write names both in western order and in reverse. Russian speakers can recognize the order easily, as they are very familiar with customs used to create names. Ilyanov is a patronymic name form, and any Russian speaker would recognize it immediately as a family name, no matter where it appeared in the whole name.

In Indonesia, it is common for people to have only a given name, especially for older folks or people who live outside of large metropolises, or even to have multiple names but no family name. You might be identified more specifically by naming your parents, such as "Sukarno child of Suparman and Wulandari". Relatively recently, it has become more common to have a patronymic name appended to your given name, such as "Sukarno Suparmanputra". Appending patronymics is also common in Iceland as well.

Given all the above, the humble Javascript programmer needs a little help to parse and format names so they don't have to be an expert in a large number of languages. Enter ilib-name.

## Parsing Names with ilib-name

Parsing names is a simple matter of creating an instance of the Name class. Here is an example:

~~~~~~
{
prefix: "Dr.",
givenName: "John",
middleName: "James",
familyName: "Smith",
suffix: ", Jr."
}
~~~~~~

## Parsing in Other Languages

Just like other iLib classes, you can pass in options to the constructor to control how the parsing proceeds. For example, let's say you want to parse the name in Korean instead:

~~~~~~
import { Name } from 'ilib-name';

const name = new Name("김인환", {locale: "ko-KR"});
~~~~~~

Or if you are using older Javascript:

~~~~~~
var NamePkg = require("ilib-name");
var Name = NamePkg.Name;

var name = new Name("김인환", {locale: "ko-KR"});
~~~~~~

Now the properties are:

~~~~~~
{
givenName: "인환",
familyName: "김"
}
~~~~~~

## Mixed Sets of Names in Asian Languages

It is typical in a contact list of a person who speaks an Asian language to have a mix of both Asian names and Western names together in varying degrees. The iLib parser can deal with this situation properly when parsing in Asian locales by checking if the name contains any Asian characters in it. If a name contains at least one Asian character, it assumes that the name is an Asian name. If it contains all non-Asian characters, it is assumed to be a Western style name.

That means the following will still work properly:

~~~~~~
import { Name } from 'ilib-name';

const name = new Name("Dr. John James Smith, Jr.", {locale: "ko-KR"});
~~~~~~

or in older Javascript:

~~~~~~
var NamePkg = require("ilib-name");
var Name = NamePkg.Name;

var name = new Name("Dr. John James Smith, Jr.", {locale: "ko-KR"});
~~~~~~

Now, the "name" variable should contain the following properties:

~~~~~~
{
prefix: "Dr.",
givenName: "John",
middleName: "James",
familyName: "Smith",
suffix: ", Jr."
}
~~~~~~

## It Doesn't Work All the Time

There are many types of names and strings where the parser does not work properly. Don't expect that it should work in 100% of the cases!

For example, in some Exchange-based contact systems, the administrator may have decided to append the name of an employee's division to each name automatically. Thus, the name "John Smith (Mortgages and Loans Group)" will not be parsed corrected. This poor fellow's name is not "John Group)"!

If needed, the parser can be upgraded in the future to support more things and special cases. Please let us know if you have a very common case that is not supported. (email to "[email protected]")

## Formatting Names

In some contacts services, such as Google Contacts, names are given as whole names in a single string, and you need the parser documented in the previous section to decipher the parts of the name.

Other services, such as Outlook/Exchange, give you names that are broken down by field. This is much less ambiguous and makes it easy to sort by a particular name, but it presents a problem if you want to display that name on the screen. What order do you use? What if you don't want to present all the parts of the name and you just want the short version?

There is where you would use the class NameFmt to reassemble the parts into a single string. Here is an example:

~~~~~~
import { Name, NameFmt } from 'ilib-name';

// presumably, these parts came from Outlook or some such service
const name = new Name({
prefix: "Dr.",
givenName: "John",
middleName: "James",
familyName: "Smith",
suffix: ", Jr."
});
const nf = new NameFmt();
const formatted = nf.format(name);
~~~~~~

or in older Javascript:

~~~~~~
var NamePkg = require("ilib-name");
var Name = NamePkg.Name;
var NameFmt = NamePkg.NameFmt;

// presumably, these parts came from Outlook or some such service
var name = new Name({
prefix: "Dr.",
givenName: "John",
middleName: "James",
familyName: "Smith",
suffix: ", Jr."
});
var nf = new NameFmt();
var formatted = nf.format(name);
~~~~~~

Now the name in the "formatted" variable will appear as the string:

> John Smith

Let's say you wanted the full name instead where all the parts are assembled in the correct order. In this case, the formatter also takes options to its constructor:

~~~~~~
import { Name, NameFmt } from 'ilib-name';

// presumably, these parts came from Outlook or some such service
const name = new Name({
prefix: "Dr.",
givenName: "John",
middleName: "James",
familyName: "Smith",
suffix: ", Jr."
});
const nf = new NameFmt({style: "full"});
const formatted = nf.format(name)
~~~~~~

Or in older Javascript:

~~~~~~
var NamePkg = require("ilib-name");
var Name = NamePkg.Name;
var NameFmt = Name.NameFmt;

// presumably, these parts came from Outlook or some such service
var name = new Name({
prefix: "Dr.",
givenName: "John",
middleName: "James",
familyName: "Smith",
suffix: ", Jr."
});
var nf = new NameFmt({style: "full"});
var formatted = nf.format(name)
~~~~~~

The entire name will be re-assembled as:

> Dr. John James Smith, Jr.

## Collating Family Names

As mentioned in a previous section, German groups the auxiliary words along with the family name for display, but not for collation sorting. In the case of sorting, you have to get the head word of the name. Dutch also follows these rules, by the way.

There are two methods of the Name class you can use.

First, you can get the head word directly:

~~~~~~
import { Name } from "ilib-name";

var name = new Name("Ludwig von Beethoven", {locale: "de-AT"});
var headname = name.getHeadFamilyName();
~~~~~~

or in older Javascript:

~~~~~~
var NamePkg = require("ilib-name");
var Name = NamePkg.Name;

var name = new Name("Ludwig von Beethoven", {locale: "de-AT"});
var headname = name.getHeadFamilyName();
~~~~~~

Now the variable "headname" will contain:

> Beethoven

Alternately, you can use the method "getSortFamilyName" to get a re-ordered version of the family name that you can use directly for sorting.

~~~~~~
import { Name } from "ilib-name";

var name = new Name("Pieter van der Veen", {locale: "nl-NL"});
var sortname = name.getSortFamilyName();
~~~~~~

or in older Javascript:

~~~~~~
var Name = require("ilib/lib/Name.js");

var name = new Name("Pieter van der Veen", {locale: "nl-NL"});
var sortname = name.getSortFamilyName();
~~~~~~

Now the variable "sortname" will contain this string with the head word first, followed by a comma, followed by all the auxiliaries:

> Veen, van der

## Asynchronous Operation

All the above examples are shown synchronously. Like all other ilib ESM code,
this library can be use in asynch mode. Both the Name and the NameFmt class
have a static `create()` that is a factory method which creates a new
instance asynchronously and returns a Promise. This would be useful in a
browser environment with webpack when the locale data is not yet loaded.

Example of parsing a name asynchronously:

~~~~~~
import { Name } from 'ilib-name';

Name.create("Dr. John James Smith, Jr.", {locale: "ko-KR"}).then((name) => {
// do something with the name fields
});
~~~~~~

## License

Copyright © 2013-2015, 2018, 2022, JEDLSoft

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and
limitations under the License.

## Release Notes

### v1.0.0

* Copied code from ilib v14.15.0
* converted to ES modules
* Added Name.create() factory method for asynch operation