https://github.com/primeqa/primeqa-orchestrator
Orchestrator connecting different PrimeQA components
- Host: GitHub
- URL: https://github.com/primeqa/primeqa-orchestrator
- Owner: primeqa
- License: apache-2.0
- Created: 2022-10-13T17:47:02.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-19T12:18:38.000Z (over 1 year ago)
- Last Synced: 2023-07-03T23:31:55.212Z (over 1 year ago)
- Language: Python
- Size: 1010 KB
- Stars: 3
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Orchestrator REST Microservice
This toolkit provides an orchestrator microservice that exposes PrimeQA's retriever and reader modules through a REST server, alongside other "search" capabilities such as IBM Watson Discovery.
With this orchestrator you can fetch documents using either a neural retriever from PrimeQA (e.g. ColBERT) or an external search service (e.g. IBM Watson Discovery), and then use PrimeQA's reader to extract answer spans from the retrieved documents.
![Build Status](https://github.com/primeqa/primeqa-orchestrator/actions/workflows/primeqa-orchestrator-ci.yml/badge.svg)
[![LICENSE|Apache2.0](https://img.shields.io/github/license/saltstack/salt?color=blue)](https://www.apache.org/licenses/LICENSE-2.0.txt)

✔️ Getting Started
- [Repository](https://github.com/primeqa/primeqa-orchestrator)
✅ Prerequisites
- [Python 3.9](https://www.python.org/downloads/)
⚙️ Setup
📓 Third-party dependencies
- [PrimeQA](https://github.com/primeqa/primeqa): If you don't have access to a running PrimeQA instance, refer to the PrimeQA repository for details on setting up and running a local one.
- [Watson Discovery](https://cloud.ibm.com/) (Optional): Follow the instructions on IBM Cloud to configure a Watson Discovery V2 service.

🧩 Setup Local Environment
- [Set up and activate a virtual environment](https://docs.python.org/3/tutorial/venv.html) (as shown below) or use [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
```shell
# Install virtualenv
pip3 install virtualenv

# Create a new virtual environment for this project. If using pyenv, path_to_python_3.9_executable will be ~/.pyenv/versions/3.9.x/bin/python
virtualenv --python=<path_to_python_3.9_executable> venv

# Activate virtual environment
source venv/bin/activate
```

- Install dependencies
```shell
pip install -r requirements.txt
pip install -r requirements_test.txt
```

🐛 `grpcio` and `grpcio-tools` have limited support on Apple Silicon (M1, M2). Please refer to [grpc github issue#25082](https://github.com/grpc/grpc/issues/25082) for details or download appropriate wheels from [here](https://github.com/pietrodn/grpcio-mac-arm-build).
📜 TLS and Certificate Management
The orchestrator's REST server supports mutual (two-way) TLS authentication, also known as mTLS. The application's [`config.ini`](orchestrator/service/config/config.ini) file contains the default certificate paths, which can be overridden using environment variables.
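The override behaviour described here can be pictured with a small sketch. It is not the orchestrator's actual loader: the `[service]` section name, the individual `tls_*` option names, and the rule that an upper-cased environment variable overrides the matching option are all assumptions made for illustration. Check `config.ini` and the application code for the real names.

```python
import configparser
import os

# Hypothetical config contents; the real section and option names may differ.
SAMPLE_INI = """
[service]
require_ssl = True
tls_server_cert = /opt/tls/server.crt
tls_server_key = /opt/tls/server.key
tls_ca_cert = /opt/tls/ca.crt
"""


def load_tls_settings(ini_text, environ=None, section="service"):
    """Return the tls_* options, letting environment variables take precedence.

    Assumed convention: option `tls_ca_cert` is overridden by `TLS_CA_CERT`.
    """
    environ = os.environ if environ is None else environ
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    settings = {}
    for option, default in parser.items(section):
        if option.startswith("tls_"):
            settings[option] = environ.get(option.upper(), default)
    return settings


if __name__ == "__main__":
    # Point only the CA certificate at a locally generated file.
    overrides = {"TLS_CA_CERT": "security/certs/ca/ca.crt"}
    for name, path in load_tls_settings(SAMPLE_INI, environ=overrides).items():
        print(f"{name} = {path}")
```

Passing `environ` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.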
Self-signed certificates are generated and packaged with the Docker build.
Self-signed certs _may be_ required for local development and testing. If you want to generate them, follow the steps below:

```shell
#!/usr/bin/env bash

# Make necessary directories
mkdir -p security/
mkdir -p security/certs/
mkdir -p security/certs/ca security/certs/server security/certs/client

# Generate CA key and CA cert
openssl req -x509 -days 365 -nodes -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/ca/ca.key -out security/certs/ca/ca.crt

# Generate Server key (without passphrase) and Server cert signing request
openssl req -nodes -new -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/server/server.key -out security/certs/server/server.csr

# Sign Server cert
openssl x509 -req -days 365 -in security/certs/server/server.csr -CA security/certs/ca/ca.crt -CAkey security/certs/ca/ca.key -CAcreateserial -out security/certs/server/server.crt

# Generate Client key (without passphrase) and Client cert signing request
openssl req -nodes -new -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/client/client.key -out security/certs/client/client.csr

# Sign Client cert
openssl x509 -req -days 365 -in security/certs/client/client.csr -CA security/certs/ca/ca.crt -CAkey security/certs/ca/ca.key -CAserial security/certs/ca/ca.srl -out security/certs/client/client.crt

# Delete signing requests
rm -rf security/certs/server/server.csr
rm -rf security/certs/client/client.csr
```

**IMPORTANT:**
- By default, the application tries to load certs from `/opt/tls`. For local use, update the appropriate `tls_*` variables in [`config.ini`](orchestrator/service/config/config.ini).
- We recommend generating certificates signed by an official certificate authority and supplying them to the application container via volume mounts.
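For a client calling an mTLS-enabled server from Python, a minimal sketch using the standard-library `ssl` module might look like the following. It assumes the `security/certs` layout produced by the script above; the helper names `client_cert_paths` and `build_client_context` are illustrative and not part of the orchestrator.

```python
import ssl
from pathlib import Path


def client_cert_paths(cert_root="security/certs"):
    """Locate the CA cert and client cert/key under the assumed directory layout."""
    root = Path(cert_root)
    return {
        "cafile": root / "ca" / "ca.crt",
        "certfile": root / "client" / "client.crt",
        "keyfile": root / "client" / "client.key",
    }


def build_client_context(cert_root="security/certs"):
    """Build an SSLContext that verifies the server and presents a client certificate."""
    paths = client_cert_paths(cert_root)
    # Trust the CA that signed the server certificate.
    context = ssl.create_default_context(
        ssl.Purpose.SERVER_AUTH, cafile=str(paths["cafile"])
    )
    # Present the client certificate so the server can authenticate the caller.
    context.load_cert_chain(
        certfile=str(paths["certfile"]), keyfile=str(paths["keyfile"])
    )
    return context


if __name__ == "__main__":
    for name, path in client_cert_paths().items():
        print(f"{name}: {path}")
```

The resulting context can be passed to `urllib.request.urlopen(url, context=context)`. Hostname verification stays enabled by default, so the host you connect to has to match a name in the server certificate; certificates generated only for local testing may need adjusting to satisfy that check.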
🛠 Build & Deployment
💻 Local
- Open Python IDE & set the created virtual environment
- Open `orchestrator/services/config/config.ini`, set `require_ssl = True` (if you wish to use TLS authentication) & `rest_port`
- Generate the gRPC stubs:
```shell
#!/usr/bin/env bash
set -xeuo pipefail
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/indexer.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/parameter.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/reader.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/retriever.proto
2to3 --fix=import --nobackups --write orchestrator/integrations/primeqa/grpc_generated
```
- Open `application.py` and run/debug
- Go to
- To use the `reader`, `indexer` and `retriever` services, make sure you have access to a running instance of the PrimeQA container

💻 Docker
- Open `config.ini` and set `rest_port`
- Open `Dockerfile` and set the same value to `port`
- Run `docker build -f Dockerfile -t primeqa-orchestrator:$(cat VERSION) .` (creates docker image)
- Run `docker run --rm --name primeqa-orchestrator -d -p : --mount type=bind,source="$(pwd)"/store,target=/store -e STORE_DIR=/store primeqa-orchestrator:$(cat VERSION)` (run docker container)
- Go to
- To use the `reader`, `indexer` and `retriever` services, make sure you have access to a running instance of the PrimeQA container

🚨 Configure
- Before first use, you will need to specify a few necessary configurations to connect to third-party dependencies. These settings are intentionally left blank for security purposes.
- Go to the `STORE_DIR` directory on your local machine and copy the [primeqa.json](./data/primeqa.json) file into that directory.
- Add or update the `settings` portion of the `primeqa.json` file. At a minimum, add the `service_endpoint` (including the port) for `PrimeQA` in the `retrievers` and `readers` sections.
a. To use an IBM® Watson Discovery based retriever, add or update the `Watson Discovery` entry in the `retrievers` section as follows
```json
"Watson Discovery": {
"service_endpoint": "",
"service_api_key": "",
"service_project_id": ""
}
```

b. For PrimeQA based retrievers, add/update `PrimeQA` related section in `retrievers` as follows
```json
"PrimeQA": {
"service_endpoint": ":"
}
```

c. For PrimeQA based readers, add/update `PrimeQA` related section in `readers` as follows
```json
"PrimeQA": {
"service_endpoint": ":",
"beta": 0.7
}
```

For example, to enable an IBM® Watson Discovery based retriever, a PrimeQA based retriever, and a PrimeQA based reader, the settings look as follows
```json
{
"retrievers": {
"Watson_Discovery": {
"service_endpoint": "",
"service_api_key": "",
"service_project_id": ""
},
"PrimeQA": {
"service_endpoint": ":"
}
},
"readers": {
"PrimeQA": {
"service_endpoint": ":",
"beta": 0.7
}
}
}
```

NOTE: The final scoring and ranking is done with a weighted sum of the Reader answer scores and Retriever search hits scores. The `beta` field is the weight assigned to the reader scores and `1-beta` is the weight assigned to the retriever scores.
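The weighting in the note can be written out as a short sketch. The function names and the tuple layout are illustrative. Whether the orchestrator normalizes reader and retriever scores before combining them is not stated here, so the example simply assumes both are already on comparable scales.

```python
def combine_scores(reader_score, retriever_score, beta=0.7):
    """Weighted sum: beta for the reader score, (1 - beta) for the retriever score."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    return beta * reader_score + (1.0 - beta) * retriever_score


def rank_answers(candidates, beta=0.7):
    """Rank (answer_text, reader_score, retriever_score) tuples, best first."""
    scored = [
        (text, combine_scores(reader_score, retriever_score, beta))
        for text, reader_score, retriever_score in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    candidates = [
        ("answer A", 0.9, 0.1),
        ("answer B", 0.5, 1.0),
    ]
    for text, score in rank_answers(candidates, beta=0.7):
        print(f"{text}: {score:.3f}")
```

With these made-up scores, a higher `beta` favours the answer the reader is most confident about, while a lower `beta` lets a strong retrieval hit pull its answer up the ranking.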
🧪 Testing
1. To see all available retrievers, execute [GET] `/retrievers` endpoint
```sh
curl -X 'GET' 'http://{PUBLIC_IP}:50059/retrievers' -H 'accept: application/json'
```

2. To see all available readers, execute [GET] `/readers` endpoint
```sh
curl -X 'GET' 'http://{PUBLIC_IP}:50059/readers' -H 'accept: application/json'
```

Frequently Asked Questions (FAQs)
1. How do I get feedback to fine-tune my reader model?
```sh
curl -X 'GET' \
'http://localhost:50059/feedbacks?application=reading&application=qa&_format=primeqa' \
-H 'accept: application/json' > feedbacks.json
```

2. How do I get feedback to fine-tune my retriever model?
```sh
curl -X 'GET' \
'http://localhost:50059/feedbacks?application=retrieval&_format=primeqa' \
-H 'accept: application/json' > feedbacks.json
```

📄 Documentation Sync
**Keep the PrimeQA documentation reference in sync**
Whenever this README file is updated, open a PR on the PrimeQA repository that applies the same modifications to **[the associated file](https://github.com/primeqa/primeqa/blob/main/docs/orchestrator.md)** used on the [documentation page](https://primeqa.github.io/primeqa/orchestrator.html).
_Do not modify initial image path_