https://github.com/rosette-api/custom-processor-sample
Custom processor sample for Babel Street Analytics Entity Extractor
- Host: GitHub
- URL: https://github.com/rosette-api/custom-processor-sample
- Owner: rosette-api
- Created: 2020-08-25T20:20:38.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-09-28T18:28:01.000Z (over 4 years ago)
- Last Synced: 2025-01-11T20:46:27.484Z (6 months ago)
- Language: Java
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 7
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Rosette Entity Extractor Custom Processor Sample
The Rosette Entity Extractor supports custom processors that let you preprocess text at the pre-extraction phase or modify the output at the pre-redaction phase. This sample shows how a custom processor can be used to correct an entity type.
Requirements:
- Rosette Enterprise 1.17.0
- Rosette Enterprise license for Entity Extraction and English
- [OPTIONAL] If you start Rosette Server via docker-compose, Docker must be installed and running on your machine

## Observe default behavior
Start Rosette Server
```
# Point ROSAPI_HOME at the "server" directory of your Rosette install
export ROSAPI_HOME=/path-to-rosette-install/server
$ROSAPI_HOME/bin/launch.sh console
```
Call the /entities endpoint
```
curl -s --request POST 'http://localhost:8181/rest/v1/entities' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--data '{"content":"My name is Karin", "language":"eng"}' \
| jq .
```
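The same request can also be issued from Java, which may be convenient if you want to script the before-and-after comparison. The sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11 or later). The class name `EntitiesRequest` and its helper methods are hypothetical; the URL, headers, and payload are copied from the curl command above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EntitiesRequest {
    // Endpoint taken from the curl command above.
    public static final String ENDPOINT = "http://localhost:8181/rest/v1/entities";

    /**
     * Builds the JSON payload. Only backslashes and double quotes are escaped,
     * which is enough for simple single-line text like the sample sentence.
     */
    public static String jsonBody(String content, String language) {
        String escaped = content.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"content\":\"" + escaped + "\",\"language\":\"" + language + "\"}";
    }

    /** Builds (but does not send) the POST request. */
    public static HttpRequest buildRequest(String content, String language) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(content, language)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = buildRequest("My name is Karin", "eng");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            // Raw JSON response; pipe the program's output through jq to pretty-print it.
            System.out.println(response.body());
        } catch (IOException e) {
            System.err.println("Could not reach Rosette Server: " + e.getMessage());
        }
    }
}
```

Run it with Rosette Server started as described above; if the server is not reachable, the program prints an error message instead of a response.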
You will see that "Karin" is extracted with the incorrect type ORGANIZATION. The custom processor sample code will correct entities that follow the string "My name is" to be type PERSON.

## Build
To build the custom processor sample, run
```
export CP_SAMPLE=/path-to-this-directory
cd $CP_SAMPLE
mvn -Drosapi.home=$ROSAPI_HOME -P extract-rex-core-jar
mvn -Drosapi.home=$ROSAPI_HOME -P build-custom-processor
```
This will build the `custom-processor-sample-1.0.jar` in the target directory.
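
To make the correction rule concrete, here is a minimal, self-contained sketch of the logic this sample is described as applying: an entity whose mention starts directly after the string "My name is" is retyped to PERSON. It deliberately does not use the Rosette custom processor API. The `RetypeSketch` class, the `Entity` holder, and the `retype` method are hypothetical names chosen for illustration, and the actual `sample.SampleCustomProcessor` in the jar may be implemented differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RetypeSketch {
    /** Hypothetical stand-in for an extracted entity; not a Rosette class. */
    public static final class Entity {
        public final String type;
        public final int start; // inclusive character offset of the mention
        public final int end;   // exclusive character offset of the mention

        public Entity(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Context string that marks the following entity as a person's name.
    public static final String TRIGGER = "My name is ";

    /** Returns a new list in which any entity directly preceded by TRIGGER is typed PERSON. */
    public static List<Entity> retype(String text, List<Entity> entities) {
        List<Entity> out = new ArrayList<>();
        for (Entity e : entities) {
            int from = e.start - TRIGGER.length();
            boolean followsTrigger = from >= 0 && text.startsWith(TRIGGER, from);
            out.add(followsTrigger ? new Entity("PERSON", e.start, e.end) : e);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "My name is Karin";
        // Simulates the default (incorrect) extraction result for "Karin".
        List<Entity> extracted =
                Collections.singletonList(new Entity("ORGANIZATION", 11, 16));
        Entity fixed = retype(text, extracted).get(0);
        System.out.println(text.substring(fixed.start, fixed.end) + " -> " + fixed.type);
        // prints "Karin -> PERSON"
    }
}
```

The sketch works on character offsets rather than on the mention text, so the same name appearing elsewhere in the input without the "My name is" context keeps its original type.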
## Integrate with Rosette Server
Copy `custom-processor-sample-1.0.jar` into Rosette Server
```
cp target/custom-processor-sample-1.0.jar $ROSAPI_HOME/launcher/bundles
```
Edit `$ROSAPI_HOME/launcher/config/rosapi/rex-factory-config.yaml` and add the following lines:
```
#Custom processors to add to annotators. See AppDev guide for more details on custom processor.
customProcessors:
- personContextAnnotator
- boundaryAdjustAnnotator
- metadataAnnotator

#Register a custom processor class. See AppDev guide for more details on custom processor.
customProcessorClasses:
- sample.SampleCustomProcessor
```

## Run
Start Rosette Server
```
$ROSAPI_HOME/bin/launch.sh console
```
Execute the same curl command as before:
```
curl -s --request POST 'http://localhost:8181/rest/v1/entities' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--data '{"content":"My name is Karin", "language":"eng"}' \
| jq .
```
The output is
```
{
  "entities": [
    {
      "type": "PERSON",
      "mention": "Karin",
      "normalized": "Karin",
      "count": 1,
      "mentionOffsets": [
        {
          "startOffset": 11,
          "endOffset": 16
        }
      ],
      "entityId": "T0",
      "confidence": 0.03511071
    }
  ]
}
```
Notice that the entity type is now correct as PERSON.

## The Docker way
1. Edit the `docker-compose.yaml` file, adding the following files to the volumes section.
```
volumes:
- rosette-roots-vol:/rosette/server/roots:ro
- ${ROSAPI_LICENSE_PATH}:/rosette/server/launcher/config/rosapi/rosette-license.xml:ro
- ${CP_SAMPLE}/config/rex-factory-config.yaml:/rosette/server/launcher/config/rosapi/rex-factory-config.yaml:ro
- ${CP_SAMPLE}/target/custom-processor-sample-1.0.jar:/rosette/server/launcher/bundles/custom-processor-sample-1.0.jar:ro
```
2. Start the Rosette Server Docker container
```
ROSAPI_LICENSE_PATH=/rosette-license.xml docker-compose up
```
Call the /entities endpoint with the same command as shown above. You should see "Karin" extracted as the correct entity type, PERSON.
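
If you want to automate that final check, for example as a smoke test after either setup, the sketch below pulls mention and type pairs out of a response body. It assumes the response has the shape shown in the Run section, with `"type"` immediately followed by `"mention"`, and it uses a regular expression only to stay dependency-free; a real client should use a JSON parser. The `EntityTypeCheck` class and `typesByMention` method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityTypeCheck {
    // Matches a "type" field followed directly by a "mention" field,
    // as in the sample output. Whitespace and newlines between tokens are allowed.
    private static final Pattern TYPE_THEN_MENTION = Pattern.compile(
            "\"type\"\\s*:\\s*\"([^\"]+)\"\\s*,\\s*\"mention\"\\s*:\\s*\"([^\"]+)\"");

    /** Maps each mention to its entity type, in order of appearance. */
    public static Map<String, String> typesByMention(String responseJson) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = TYPE_THEN_MENTION.matcher(responseJson);
        while (m.find()) {
            result.put(m.group(2), m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Abbreviated copy of the response shown in the Run section.
        String response = "{\"entities\":[{\"type\":\"PERSON\",\"mention\":\"Karin\","
                + "\"normalized\":\"Karin\",\"count\":1}]}";
        String type = typesByMention(response).get("Karin");
        if (!"PERSON".equals(type)) {
            throw new IllegalStateException("Karin was typed as " + type);
        }
        System.out.println("Karin -> " + type);
    }
}
```

Feeding it the response from a server without the custom processor installed would make the check fail, since "Karin" is then typed ORGANIZATION.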