https://github.com/tantikristanti/nerd_kid

NERD and wiKIData (NERD KID) is a machine learning application for classifying Wikidata items into 27 classes (as defined by the Grobid-NER project).

grobid-ner machine-learning nerd wikidata


# nerdKid :clown_face:

This project is inspired by [entity-fishing](https://github.com/kermitt2/entity-fishing) and [grobid-ner](https://github.com/kermitt2/grobid-ner). Entity-fishing is a tool for automatic entity recognition and disambiguation, while grobid-ner is a named-entity recogniser based on the [GROBID](https://github.com/kermitt2/grobid) library, a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured TEI-encoded documents, with a particular focus on technical and scientific publications.

The **nerdKid** project focuses on classifying entities into types (e.g. Person, Location) taken from the [grobid-ner classes](https://grobid-ner.readthedocs.io/en/latest/class-and-senses/), using Wikidata as the online knowledge base.

![nerdKid](pic/nerdKid.jpg)

# Goal
According to [Wikidata's statistics](https://www.wikidata.org/wiki/Wikidata:Statistics), Wikidata contains more than one hundred million items. Given such a rich and open knowledge base, it is interesting to learn how those items can be classified into 27 classes. These classes are the ones defined by the [Grobid-NER](http://grobid-ner.readthedocs.io/en/latest/class-and-senses/) project.

The idea of this project is to teach computers how to group the millions of items in Wikidata into specific classes based on the characteristics of their data.

Take, for example, the item [Albert Einstein](https://www.wikidata.org/wiki/Q937) in Wikidata, which has the identifier 'Q937'. This item has a number of properties (e.g. 'instance of-P31', 'sex or gender-P21') and a number of values for each property (e.g. 'human-Q5' as a value of property 'P31', 'male-Q6581097' as a value of property 'P21'). Using a trained model, the computer predicts a class for the Albert Einstein item, in this case the Person class. This project also takes ambiguous items into account. For instance, [Marshall Plan](https://www.wikidata.org/wiki/Q4576) is not classified into the Person class, because it is not the name of a person but an American initiative to aid Western Europe.

![Albert Einstein](pic/AlbertEinstein.jpg)
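
To make the idea more concrete, the following minimal sketch shows one way the statements of an item could be turned into a binary feature vector for a classifier. It is an illustration only and not nerdKid's actual feature extraction: the class `ItemFeatureSketch`, the method `toFeatureVector`, the `property_value` naming of features and the feature list itself are assumptions. Only the identifiers Q937, P31, P21, Q5 and Q6581097 come from the example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of nerdKid.
final class ItemFeatureSketch {

    // Each feature name is a "property_value" pair.
    // The feature is 1.0 if the item has that statement, 0.0 otherwise.
    static double[] toFeatureVector(Map<String, Set<String>> statements, List<String> featureNames) {
        double[] vector = new double[featureNames.size()];
        for (int i = 0; i < featureNames.size(); i++) {
            String[] parts = featureNames.get(i).split("_", 2);
            Set<String> values = statements.get(parts[0]);
            boolean present = values != null && parts.length == 2 && values.contains(parts[1]);
            vector[i] = present ? 1.0 : 0.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        // Statements of Albert Einstein (Q937), as in the example above.
        Map<String, Set<String>> einstein = Map.of(
                "P31", Set.of("Q5"),
                "P21", Set.of("Q6581097"));
        // Assumed feature list; the last entry is a statement the item does not have.
        List<String> features = List.of("P31_Q5", "P21_Q6581097", "P31_Q515");
        System.out.println(Arrays.toString(toFeatureVector(einstein, features))); // prints [1.0, 1.0, 0.0]
    }
}
```

A vector of this kind could then be passed to any trained classifier to obtain a predicted class such as Person.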

# Tools

![Developing Tools](pic/Tools.jpg)

# Installation-Build-Run
**1. Installation**

*a. Clone this source*

```$ git clone https://github.com/tantikristanti/NERD_KID.git```

*b. Download the zip file*

[NERD_KID](https://github.com/tantikristanti/NERD_KID/archive/master.zip)

**2. Build the project**

```$ mvn clean install```

# Training and Evaluation

For training, 9922 Wikidata items were chosen. Of these examples, 80% were used for training and the rest for evaluation.
The accuracy of the current model is 92.091%. The FMeasure for each class is shown below:

[Figure: FMeasure per class]

Because the examples were sampled randomly, some classes did not have enough examples. This is why a number of classes have an FMeasure of 0.
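
For reference, a per-class FMeasure is commonly computed as the harmonic mean of precision and recall. The sketch below is an illustration and not nerdKid's evaluation code: the class `FMeasureSketch`, its method and the toy labels are assumptions. It shows why a class with no correctly predicted example ends up with an FMeasure of 0.

```java
import java.util.List;

// Hypothetical sketch, not nerdKid's evaluation code.
final class FMeasureSketch {

    // F-measure (F1) of one class: harmonic mean of precision and recall.
    static double fMeasure(List<String> expected, List<String> predicted, String label) {
        int truePositives = 0;
        int falsePositives = 0;
        int falseNegatives = 0;
        for (int i = 0; i < expected.size(); i++) {
            boolean isExpected = expected.get(i).equals(label);
            boolean isPredicted = predicted.get(i).equals(label);
            if (isExpected && isPredicted) {
                truePositives++;
            } else if (isPredicted) {
                falsePositives++;
            } else if (isExpected) {
                falseNegatives++;
            }
        }
        // Without any correct prediction the score is reported as 0.
        if (truePositives == 0) {
            return 0.0;
        }
        double precision = (double) truePositives / (truePositives + falsePositives);
        double recall = (double) truePositives / (truePositives + falseNegatives);
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Toy labels, invented for illustration.
        List<String> expected = List.of("PERSON", "PERSON", "LOCATION", "EVENT");
        List<String> predicted = List.of("PERSON", "LOCATION", "LOCATION", "PERSON");
        System.out.println(fMeasure(expected, predicted, "PERSON")); // prints 0.5
        System.out.println(fMeasure(expected, predicted, "EVENT"));  // prints 0.0
    }
}
```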

# Get the prediction results

To predict the class of each Wikidata Id listed in [New Elements](data/csv/NewElements.csv), run:

```$ mvn exec:java -Dexec.mainClass="org.nerd.kid.model.WikidataNERPredictor"```

- The result can be found in [Result Predicted Class](result/csv/ResultPredictedClass.csv)

# Demo version

For testing purposes, a demo of Nerd-Kid is available here: [Nerd-Kid](http://nerd.huma-num.fr/kid/service/ner?id=Q1)

To query another item, change the Wikidata Id in the URL. The Id consists of the letter 'Q' followed by a number.

![Prediction Result](pic/ResultPredictionWeb.jpg)

- The result contains the Wikidata Id, its properties, and the predicted class.
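
The demo URL can also be built programmatically. The sketch below checks that the identifier is 'Q' followed by digits and then builds the request URL shown above. The class `DemoUrlSketch` and its method are hypothetical helpers, and nothing is assumed here about the format of the response.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical helper, not part of nerdKid.
final class DemoUrlSketch {

    // Demo endpoint taken from the link above.
    private static final String DEMO_ENDPOINT = "http://nerd.huma-num.fr/kid/service/ner";
    // A Wikidata item Id: the letter 'Q' followed by one or more digits.
    private static final Pattern WIKIDATA_ID = Pattern.compile("Q[0-9]+");

    static URI demoUri(String wikidataId) {
        if (wikidataId == null || !WIKIDATA_ID.matcher(wikidataId).matches()) {
            throw new IllegalArgumentException("Expected 'Q' followed by digits, got: " + wikidataId);
        }
        return URI.create(DEMO_ENDPOINT + "?id=" + wikidataId);
    }

    public static void main(String[] args) {
        System.out.println(demoUri("Q937")); // prints http://nerd.huma-num.fr/kid/service/ner?id=Q937
    }
}
```

The resulting URI can be passed to any HTTP client, for example `java.net.http.HttpClient` on Java 11 or later.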

# Use **nerdKid** Service in Other Projects
**nerdKid** can be used in other projects as follows:

1. Make sure **nerdKid** is built: `$ mvn clean install`

2. Add the **nerdKid** dependency to the pom file of the other project:

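```
<dependency>
    <groupId>org.nerd.kid</groupId>
    <artifactId>nerd-kid-project</artifactId>
    <version>1.0-SNAPSHOT</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```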

To prevent errors caused by overlapping Maven dependencies such as *slf4j*, follow the steps explained in the [slf4j FAQ](http://www.slf4j.org/faq.html):
- Exclude commons-logging by declaring it in the `provided` scope

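```
<dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
    <version>1.1.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>jcl-over-slf4j</artifactId>
    <version>2.0.0-alpha1</version>
</dependency>
```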

- Remove any other slf4j dependencies (or simply comment them out)

3. Add the **nerdKid** library (nerd-kid-project) under `lib/org/nerd/kid`.
This library is produced by the build and is saved under `.m2/repository/org/nerd/kid/`.

4. Call the prediction service:

To predict the NER type, **nerdKid** needs the statements of each Wikidata element. These statements are collected from the *entity-fishing* service.
There are several ways to collect them:
- From the online [entity-fishing](http://nerd.huma-num.fr/nerd/service/kb/concept) service
- From a local instance at `http://localhost:8090/service/kb/concept`
- From the LMDB data of *entity-fishing* (data/db/db-kb), see [entity-fishing-documentation](https://nerd.readthedocs.io/en/latest/build.html#install-build-and-run)

To collect the statements in another way, create a new class that implements the `WikidataFetcherWrapper` interface of **nerdKid** and adapt its `WikidataElement getElement(String wikiId)` method as needed.
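
As a rough illustration of such a custom implementation, the sketch below wraps another fetcher and caches the elements it returns. To stay self-contained it declares its own stand-ins for `WikidataElement` and `WikidataFetcherWrapper`, which mirror only the method signature quoted above; the real nerdKid types have their own members and may differ. `CachingFetcherWrapper` and `delegateCalls` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the nested types are stand-ins, not nerdKid's classes.
final class CachingFetcherSketch {

    // Stand-in for nerdKid's WikidataElement (assumed to carry at least an Id).
    static final class WikidataElement {
        final String id;

        WikidataElement(String id) {
            this.id = id;
        }
    }

    // Stand-in mirroring only the signature quoted in the text above.
    interface WikidataFetcherWrapper {
        WikidataElement getElement(String wikiId);
    }

    // Custom implementation: delegates to another wrapper and caches the results.
    static final class CachingFetcherWrapper implements WikidataFetcherWrapper {
        private final WikidataFetcherWrapper delegate;
        private final Map<String, WikidataElement> cache = new HashMap<>();
        int delegateCalls = 0; // counts how often the delegate was actually queried

        CachingFetcherWrapper(WikidataFetcherWrapper delegate) {
            this.delegate = delegate;
        }

        @Override
        public WikidataElement getElement(String wikiId) {
            return cache.computeIfAbsent(wikiId, id -> {
                delegateCalls++;
                return delegate.getElement(id);
            });
        }
    }

    public static void main(String[] args) {
        // The lambda plays the role of a remote or LMDB-backed fetcher.
        CachingFetcherWrapper wrapper = new CachingFetcherWrapper(id -> new WikidataElement(id));
        wrapper.getElement("Q1077");
        wrapper.getElement("Q1077");
        System.out.println(wrapper.delegateCalls); // prints 1
    }
}
```

In a real project the class would implement nerdKid's own interface instead of the stand-in, and an instance would be passed to the `WikidataNERPredictor` constructor.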

a. Example of using the **nerdKid** service with its default settings:

```
WikidataFetcherWrapper wrapper = new NerdKBFetcherWrapper();
WikidataNERPredictor wikidataNERPredictor = new WikidataNERPredictor(wrapper);
System.out.println(wikidataNERPredictor.predict("Q1077").getPredictedClass());
```

b. Example of using the **nerdKid** service with *entity-fishing* running on localhost (default port 8090):

For this option, *entity-fishing* must be running (`$ mvn clean jetty:run`), see [entity-fishing-documentation](https://nerd.readthedocs.io/en/latest/build.html#install-build-and-run)

```
WikidataFetcherWrapper wrapper = new NerdKBLocalFetcherWrapper();
WikidataNERPredictor wikidataNERPredictor = new WikidataNERPredictor(wrapper);
System.out.println(wikidataNERPredictor.predict("Q1077").getPredictedClass());
```

## Reference

To cite this work, please refer to the GitHub project:

```Nerd-Kid (2017-2023) ```

## Contact

Main author and contact: [Tanti Kristanti](mailto:[email protected])