Orchestrator connecting different PrimeQA components
https://github.com/primeqa/primeqa-orchestrator


# Orchestrator REST Microservice

This toolkit provides an orchestrator microservice that exposes PrimeQA's retriever and reader modules, along with other search capabilities such as IBM Watson Discovery, through a REST server.

With this orchestrator, you can fetch documents using either a neural retriever from PrimeQA (such as ColBERT) or an external search service (such as IBM Watson Discovery), and then use PrimeQA's reader to extract answer spans from the retrieved documents.

![Build Status](https://github.com/primeqa/primeqa-orchestrator/actions/workflows/primeqa-orchestrator-ci.yml/badge.svg)
[![LICENSE|Apache2.0](https://img.shields.io/github/license/saltstack/salt?color=blue)](https://www.apache.org/licenses/LICENSE-2.0.txt)

✔️ Getting Started

- [Repository](https://github.com/primeqa/primeqa-orchestrator)

✅ Prerequisites

- [Python 3.9](https://www.python.org/downloads/)

⚙️ Setup

📓 Third-party dependencies

- [PrimeQA](https://github.com/primeqa/primeqa): If you don't have access to a running PrimeQA instance, refer to the PrimeQA repository for details on setting up and running a local one.
- [Watson Discovery](https://cloud.ibm.com/) (Optional): Follow the instructions on IBM Cloud to configure the Watson Discovery V2 service.

🧩 Setup Local Environment

- [Setup and activate a Virtual Environment](https://docs.python.org/3/tutorial/venv.html) (as shown below) or use [Miniconda](https://docs.conda.io/en/latest/miniconda.html)

```shell
# Install virtualenv
pip3 install virtualenv

# Create a new virtual environment for this project. If using pyenv, path_to_python_3.9_executable will be ~/.pyenv/versions/3.9.x/bin/python
virtualenv --python=<path_to_python_3.9_executable> venv

# Activate virtual environment
source venv/bin/activate
```

- Install dependencies

```shell
pip install -r requirements.txt
pip install -r requirements_test.txt
```

🐛 `grpcio` and `grpcio-tools` have limited support on Apple Silicon (M1, M2). Please refer to [grpc GitHub issue #25082](https://github.com/grpc/grpc/issues/25082) for details or download appropriate wheels from [here](https://github.com/pietrodn/grpcio-mac-arm-build).

📜 TLS and Certificate Management

The orchestrator service's REST server supports mutual (two-way) TLS authentication, also known as mTLS. The application's [`config.ini`](orchestrator/service/config/config.ini) file contains the default certificate paths, which can be overridden using environment variables.

Self-signed certificates are generated and packaged with the Docker build.
Self-signed certs _may be_ required for local development and testing. If you want to generate them, follow the steps below:

```shell
#!/usr/bin/env bash

# Make necessary directories
mkdir -p security/
mkdir -p security/certs/
mkdir -p security/certs/ca security/certs/server security/certs/client

# Generate CA key and CA cert
openssl req -x509 -days 365 -nodes -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/ca/ca.key -out security/certs/ca/ca.crt

# Generate Server key (without passphrase) and Server cert signing request
openssl req -nodes -new -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/server/server.key -out security/certs/server/server.csr

# Sign Server cert
openssl x509 -req -days 365 -in security/certs/server/server.csr -CA security/certs/ca/ca.crt -CAkey security/certs/ca/ca.key -CAcreateserial -out security/certs/server/server.crt

# Generate Client key (without passphrase) and Client cert signing request
openssl req -nodes -new -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/client/client.key -out security/certs/client/client.csr

# Sign Client cert
openssl x509 -req -days 365 -in security/certs/client/client.csr -CA security/certs/ca/ca.crt -CAkey security/certs/ca/ca.key -CAserial security/certs/ca/ca.srl -out security/certs/client/client.crt

# Delete signing requests
rm -rf security/certs/server/server.csr
rm -rf security/certs/client/client.csr
```

**IMPORTANT:**

- By default, the application tries to load certs from `/opt/tls`. For local use, you will need to update the appropriate `tls_*` variables in [`config.ini`](orchestrator/service/config/config.ini).

- We recommend obtaining certificates from an official certificate authority and supplying them to the application container via volume mounts.
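
The override behaviour described in this section (defaults in `config.ini`, replaced by environment variables when present) can be illustrated with a minimal sketch. The section name `tls`, the individual key names, and the upper-cased environment variable names are all hypothetical, chosen only to match the `tls_*` pattern mentioned above; check the actual `config.ini` for the real names.

```python
import configparser

# Hypothetical defaults: section and key names are illustrative assumptions,
# not copied from the service's real config.ini.
DEFAULTS_INI = """
[tls]
tls_server_cert = /opt/tls/server.crt
tls_server_key = /opt/tls/server.key
tls_ca_cert = /opt/tls/ca.crt
"""


def resolve_tls_paths(ini_text, environ):
    """Return TLS paths from config text, letting environment variables win."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    resolved = {}
    for key, default in parser["tls"].items():
        # Assumed convention: tls_server_cert is overridden by TLS_SERVER_CERT.
        resolved[key] = environ.get(key.upper(), default)
    return resolved
```

Passing `os.environ` as the second argument would apply real overrides; passing a plain dictionary keeps the function easy to test.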

🛠 Build & Deployment

💻 Local

- Open your Python IDE and select the virtual environment created above
- Open `orchestrator/services/config/config.ini` and set `rest_port`; if you wish to use TLS authentication, also set `require_ssl = True`
- Generate the gRPC stubs:
```shell
#!/usr/bin/env bash
set -xeuo pipefail
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/indexer.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/parameter.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/reader.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/retriever.proto
2to3 --fix=import --nobackups --write orchestrator/integrations/primeqa/grpc_generated
```
- Open `application.py` and run/debug
- Go to
- To use the `reader`, `indexer` and `retriever` services, make sure you have access to a running instance of the PrimeQA container
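
The four `protoc` invocations in the gRPC step above differ only in the `.proto` file name, so they can be generated in a loop. The following is a sketch of that idea rather than a script shipped with the repository; the helper names `protoc_command` and `generate_all` are made up for illustration, and the `2to3` import fix-up from the shell script still has to be run afterwards.

```python
import subprocess
import sys

# Paths and proto names are taken from the shell script above.
PROTO_DIR = "orchestrator/integrations/primeqa/protos"
OUT_DIR = "orchestrator/integrations/primeqa/grpc_generated"
PROTO_NAMES = ["indexer", "parameter", "reader", "retriever"]


def protoc_command(proto_name):
    """Build the argument list for one grpc_tools.protoc invocation."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        "-I", f"./{PROTO_DIR}",
        f"--python_out={OUT_DIR}",
        f"--grpc_python_out={OUT_DIR}",
        f"{PROTO_DIR}/{proto_name}.proto",
    ]


def generate_all(run=subprocess.run):
    """Run protoc once per proto file, stopping at the first failure."""
    for name in PROTO_NAMES:
        run(protoc_command(name), check=True)
```

Calling `generate_all()` from the repository root, inside the activated virtual environment, would run the same four commands as the shell script.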

💻 Docker

- Open `config.ini` and set `rest_port`
- Open `Dockerfile` and set `port` to the same value
- Run `docker build -f Dockerfile -t primeqa-orchestrator:$(cat VERSION) .` (creates docker image)
- Run `docker run --rm --name primeqa-orchestrator -d -p : --mount type=bind,source="$(pwd)"/store,target=/store -e STORE_DIR=/store primeqa-orchestrator:$(cat VERSION)` (run docker container)
- Go to
- To use the `reader`, `indexer` and `retriever` services, make sure you have access to a running instance of the PrimeQA container

🚨 Configure

- Before first use, you will need to specify a few necessary settings to connect to third-party dependencies. These settings are intentionally left blank for security purposes.

- Go to the `STORE_DIR` directory on your local machine and copy the [primeqa.json](./data/primeqa.json) file into that directory.

- Add or update the `settings` portion of the `primeqa.json` file. Primarily, add the `service_endpoint` information (including the port) for `PrimeQA` in the `retrievers` and `readers` sections of the settings.

a. To use an IBM® Watson Discovery based retriever, add or update the `Watson Discovery` entry in the `retrievers` section as follows:

```json
"Watson Discovery": {
  "service_endpoint": "",
  "service_api_key": "",
  "service_project_id": ""
}
```

b. For PrimeQA based retrievers, add or update the `PrimeQA` entry in the `retrievers` section as follows:

```json
"PrimeQA": {
  "service_endpoint": ":"
}
```

c. For PrimeQA based readers, add or update the `PrimeQA` entry in the `readers` section as follows:

```json
"PrimeQA": {
  "service_endpoint": ":",
  "beta": 0.7
}
```

For example, to enable an `IBM® Watson Discovery` based retriever, a `PrimeQA` based retriever, and a `PrimeQA` based reader, the settings will look as follows:

```json
{
  "retrievers": {
    "Watson_Discovery": {
      "service_endpoint": "",
      "service_api_key": "",
      "service_project_id": ""
    },
    "PrimeQA": {
      "service_endpoint": ":"
    }
  },
  "readers": {
    "PrimeQA": {
      "service_endpoint": ":",
      "beta": 0.7
    }
  }
}
```
```
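
Because these settings ship blank, it can help to check them before starting the service. The sketch below is not part of the orchestrator; `check_settings` is a hypothetical helper that reports entries whose `service_endpoint` is still empty and `beta` values outside the range 0 to 1 (that range is an assumption, based on `beta` being used as a weight). The endpoint `localhost:50051` is a placeholder.

```python
def check_settings(settings):
    """Return a list of human-readable problems found in a settings dict."""
    problems = []
    for section in ("retrievers", "readers"):
        for name, entry in settings.get(section, {}).items():
            if not entry.get("service_endpoint"):
                problems.append(f"{section}.{name}: service_endpoint is empty")
    for name, entry in settings.get("readers", {}).items():
        beta = entry.get("beta")
        # Assumption: beta is a weight, so it should lie between 0 and 1.
        if beta is not None and not 0.0 <= beta <= 1.0:
            problems.append(f"readers.{name}: beta {beta} is outside [0, 1]")
    return problems


# Placeholder values for illustration only.
example = {
    "retrievers": {"PrimeQA": {"service_endpoint": "localhost:50051"}},
    "readers": {"PrimeQA": {"service_endpoint": "", "beta": 0.7}},
}
for problem in check_settings(example):
    print(problem)
```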

NOTE: The final scoring and ranking is done with a weighted sum of the Reader answer scores and Retriever search hits scores. The `beta` field is the weight assigned to the reader scores and `1-beta` is the weight assigned to the retriever scores.
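
As a rough illustration of the weighted sum described in this note, the sketch below ranks two made-up candidates. It assumes reader and retriever scores are already on a comparable scale; whether and how the orchestrator normalizes scores is not stated here, and the function name `combined_score` is hypothetical.

```python
def combined_score(reader_score, retriever_score, beta=0.7):
    """Weighted sum: beta weights the reader, (1 - beta) the retriever."""
    return beta * reader_score + (1.0 - beta) * retriever_score


# Hypothetical (answer, reader score, retriever score) triples.
candidates = [
    ("answer A", 0.9, 0.2),
    ("answer B", 0.5, 0.9),
]

# Rank candidates by combined score, highest first.
ranked = sorted(
    candidates,
    key=lambda c: combined_score(c[1], c[2]),
    reverse=True,
)
for text, reader_score, retriever_score in ranked:
    print(text, round(combined_score(reader_score, retriever_score), 3))
```

With `beta = 0.7`, the candidate with the stronger reader score ranks first; lowering `beta` to 0.3 would favour the candidate with the stronger retriever score instead.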

🧪 Testing

1. To see all available retrievers, call the [GET] `/retrievers` endpoint

```sh
curl -X 'GET' 'http://{PUBLIC_IP}:50059/retrievers' -H 'accept: application/json'
```

2. To see all available readers, call the [GET] `/readers` endpoint

```sh
curl -X 'GET' 'http://{PUBLIC_IP}:50059/readers' -H 'accept: application/json'
```
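
The same two checks can be scripted with Python's standard library. This is a minimal sketch, assuming the service listens on port 50059 as in the `curl` examples; `localhost` is a placeholder host, and the helper names `build_get` and `fetch_json` are made up for illustration.

```python
import json
import urllib.request


def build_get(base_url, path):
    """Create a GET request that asks for a JSON response."""
    return urllib.request.Request(
        f"{base_url}{path}",
        headers={"accept": "application/json"},
        method="GET",
    )


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder host; substitute the address of your orchestrator instance.
BASE_URL = "http://localhost:50059"
retrievers_request = build_get(BASE_URL, "/retrievers")
readers_request = build_get(BASE_URL, "/readers")
```

Calling `fetch_json(retrievers_request)` against a running instance would return the decoded JSON body; no request is sent until `fetch_json` is called.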

Frequently Asked Questions (FAQs)

1. How do I get feedback to fine-tune my reader model?



```sh
curl -X 'GET' \
'http://localhost:50059/feedbacks?application=reading&application=qa&_format=primeqa' \
-H 'accept: application/json' > feedbacks.json
```

2. How do I get feedback to fine-tune my retriever model?



```sh
curl -X 'GET' \
'http://localhost:50059/feedbacks?application=retrieval&_format=primeqa' \
-H 'accept: application/json' > feedbacks.json
```
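
Both feedback queries above repeat the `application` parameter once per value. A small sketch of building such URLs in Python follows; the function name `feedbacks_url` is hypothetical, and the host and port are copied from the `curl` examples.

```python
from urllib.parse import urlencode


def feedbacks_url(base_url, applications, output_format="primeqa"):
    """Build a /feedbacks URL, repeating `application` once per value."""
    query = urlencode(
        {"application": applications, "_format": output_format},
        doseq=True,  # expand the list into repeated query parameters
    )
    return f"{base_url}/feedbacks?{query}"


reader_url = feedbacks_url("http://localhost:50059", ["reading", "qa"])
retriever_url = feedbacks_url("http://localhost:50059", ["retrieval"])
print(reader_url)
print(retriever_url)
```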

📄 Documentation Sync

**Keep the PrimeQA documentation reference in sync**
Whenever this README file is updated, open a PR on the PrimeQA repository to apply the same modifications to **[the associated file](https://github.com/primeqa/primeqa/blob/main/docs/orchestrator.md)** used on the [documentation page](https://primeqa.github.io/primeqa/orchestrator.html).
_Do not modify initial image path_