https://github.com/kbss-cvut/termit-docker

Dockerization of TermIt - a SKOS terminology manager

# TermIt Docker

TermIt Docker serves to spin up a TermIt deployment consisting of:

- [GraphDB](https://www.ontotext.com/products/graphdb/) (database)
- [TermIt](https://github.com/kbss-cvut/termit) (backend)
- [Annotace](https://github.com/kbss-cvut/annotace)
- [TermIt UI](https://github.com/kbss-cvut/termit-ui) (frontend)

## Prerequisites

- Docker 19.03.0 or later & Docker Compose installed (and accessible under the current user).

### Resource Requirements

- TermIt: at least 512MB RAM (1GB or more is optimal), at least 2 CPUs
- If many users create and edit terms in TermIt concurrently, more CPUs are recommended
- TermIt UI: 100MB RAM
- GraphDB: at least 2GB RAM (depending on the amount of data stored), 1 CPU
- Annotace: at least 512MB RAM

Ideally, the whole deployment should have at least 4GB RAM available, with at least 2-3 CPU cores.
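
The totals above can be checked on a host before deployment. The following is a minimal sketch, assuming a Linux host where `/proc/meminfo` and `nproc` are available; the `meets_minimum` helper is hypothetical and not part of this repository, and the thresholds are simply the 4GB RAM and 2 CPU figures quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given RAM (in MB) and CPU count
# reach the suggested totals for the whole deployment (4GB RAM, 2 CPUs).
meets_minimum() {
  ram_mb="$1"
  cpus="$2"
  [ "$ram_mb" -ge 4096 ] && [ "$cpus" -ge 2 ]
}

# Linux-only probes (assumption): MemTotal from /proc/meminfo, CPU count from nproc.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
  host_ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
  host_cpus=$(nproc)
  if meets_minimum "$host_ram_mb" "$host_cpus"; then
    echo "OK: ${host_ram_mb} MB RAM, ${host_cpus} CPUs"
  else
    echo "Below the suggested totals: ${host_ram_mb} MB RAM, ${host_cpus} CPUs"
  fi
fi
```

Note that `MemTotal` usually reports somewhat less than the installed RAM, so a near miss on a nominally 4GB machine is a prompt to look closer rather than a hard failure.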

## Running TermIt

1. Set email server configuration in `.env`. In particular, set `MAIL_HOST`, `MAIL_USERNAME` and `MAIL_PASSWORD`, and
(optionally) `MAIL_PORT`.
2. (_Optional_) Set `ROOT` variable in `.env` to reflect the local context prefix the app will be running on.
3. (_Optional_) Set `HOST_PORT` variable in `.env` to reflect the port on which TermIt should be accessible.
4. (_Optional_) Set `URL` variable in `.env` to reflect the address TermIt will be running on. If the system is running
behind a reverse proxy (like Apache), the URL should be the **public URL** provided by the proxy (for
example, https://termit.fel.cvut.cz). Otherwise,
the URL should contain the `HOST_PORT` specified above (for example, http://localhost:1234). If the **public URL**
does not use the standard HTTP(S) ports (80, 443), also set `PUBLIC_PORT` so that the backend is able to
correctly generate the server URL for the API docs using Swagger UI.
5. (_Optional_, recommended) Set `JWT_SECRET_KEY` variable in `.env`. It should be a string of at least 32 characters
that will be used to hash the JWT authentication token for logged-in users.
6. Start all the services by running
`docker-compose up -d`
7. (_Optional_) If you have a license for GraphDB, go to `${URL}/${ROOT}/sluzby/db-server/license/register` and upload
the license file.
8. Go to `${URL}/${ROOT}/sluzby/db-server/import#server`, select the "termit" repository, and in the "Server files"
section, click the "Import" button for all the files. In the "Import settings" dialog, set the Base IRI
to `http://onto.fel.cvut.cz/ontologies/termit`.
9. Go to `${URL}/${ROOT}/sluzby/db-server/sparql` and execute all the queries in the `db-server/lucene` directory to
create Lucene connectors for full-text search (see below w.r.t. the connector language settings).
10. Look for admin credentials in the `termit-server` log (on Linux/WSL, you can use
grep: `docker-compose logs | grep "Admin credentials"`) and use them for first login at the configured URL,
e.g. http://localhost:1234/termit.
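
Two of the steps above lend themselves to a small script: producing a sufficiently long `JWT_SECRET_KEY` and assembling the `${URL}/${ROOT}/sluzby/db-server/...` addresses. The following is a minimal sketch; the helper names `gen_jwt_secret` and `db_server_url` are hypothetical, and the sample values `http://localhost:1234` and `termit` (with `ROOT` written without a leading slash) are assumptions taken from the examples in this README rather than from the shipped `.env`.

```shell
#!/bin/sh
# Hypothetical helper: print a random secret. 32 random bytes rendered as
# hexadecimal give 64 characters, which exceeds the 32-character minimum.
gen_jwt_secret() {
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

# Hypothetical helper: build the address of a GraphDB page.
# $1 = URL, $2 = ROOT, $3 = page path below sluzby/db-server
db_server_url() {
  # Strip a trailing slash from URL to avoid a double slash in the result.
  printf '%s/%s/sluzby/db-server/%s\n' "${1%/}" "$2" "$3"
}

# Print a line that could be pasted into .env, followed by the pages
# visited in steps 8 and 9.
echo "JWT_SECRET_KEY=$(gen_jwt_secret)"
db_server_url "http://localhost:1234" "termit" "import#server"
db_server_url "http://localhost:1234" "termit" "sparql"
```

If `ROOT` in your `.env` is empty or starts with a slash, the URL assembly needs to be adjusted accordingly.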

## Configuration

TermIt is highly configurable both in terms of the content and the way it runs. This section provides details on
the most important configuration options.

### Language

The default configuration assumes TermIt is run for Czech vocabularies. To use TermIt in other environments, the
following changes are needed:

#### TermIt

TermIt backend stores and loads strings based on the configured language. To change it, set
the `TERMIT_PERSISTENCE_LANGUAGE` value in `docker-compose.yml` to the appropriate language tag (e.g., en, de).

#### Full Text Search

Full text search (FTS) is implemented via Lucene connectors in the underlying GraphDB repository. These connectors are
language-specific, so for FTS to work correctly when TermIt uses a different language, the Lucene connectors need to be
configured accordingly. To use a language other than Czech, set the following in the connector-creating SPARQL
queries in `db-server/lucene`:

- Set the value of the "languages" attribute to the appropriate language tag
- Set the value of the "analyzer" attribute to the appropriate fully qualified Lucene analyzer class name. See, for
example, https://lucene.apache.org/core/4_0_0/analyzers-common/overview-summary.html.
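
The two attribute changes can also be scripted. The following is a minimal sketch, assuming the queries write the attributes in JSON style as `"languages": ["cs"]` and `"analyzer": "<class name>"`; check the actual files in `db-server/lucene` before relying on it. The helper `set_connector_language` is hypothetical, and the English tag and analyzer class are only sample values.

```shell
#!/bin/sh
# Hypothetical helper: read connector query text on stdin and rewrite the
# "languages" and "analyzer" attributes. Assumes JSON-style attributes.
# $1 = language tag, $2 = fully qualified Lucene analyzer class name
set_connector_language() {
  lang="$1"
  analyzer="$2"
  sed -E \
    -e "s/(\"languages\"[[:space:]]*:[[:space:]]*\[)[^]]*\]/\1\"${lang}\"]/" \
    -e "s/(\"analyzer\"[[:space:]]*:[[:space:]]*)\"[^\"]*\"/\1\"${analyzer}\"/"
}

# Demonstration on an inline sample instead of a real query file.
set_connector_language en org.apache.lucene.analysis.en.EnglishAnalyzer <<'EOF'
"languages": ["cs"],
"analyzer": "org.apache.lucene.analysis.cz.CzechAnalyzer",
EOF
```

To apply it to a real query, redirect a file from `db-server/lucene` into the function and review the output before executing it in the GraphDB SPARQL editor.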

### Further TermIt Configuration

As stated above, TermIt is highly configurable. The following table lists the names of environment variables that can be
passed to TermIt backend either directly in `docker-compose.yml`, in
an [env_file](https://docs.docker.com/compose/compose-file/compose-file-v3/#env_file), or via command line.

| Variable | Explanation |
|:---|:---|
| ```TERMIT_ADMIN_CREDENTIALSFILE```\* | Name of the file in which admin credentials are saved when the admin account is generated. |
| ```TERMIT_ADMIN_CREDENTIALSLOCATION```\* | Specifies the folder in which admin credentials are saved when the admin account is generated. |
| ```TERMIT_CHANGETRACKING_CONTEXT_EXTENSION```\* | Extension appended to asset identifier (presumably a vocabulary ID) to denote its change tracking context identifier. |
| ```TERMIT_COMMENTS_CONTEXT```\* | IRI of the repository context used to store comments (discussion to assets). |
| ```TERMIT_CORS_ALLOWEDORIGINS```\* | A comma-separated list of allowed origins for CORS. Default value: ```http://localhost:3000``` |
| ```TERMIT_TEMPLATE_EXCELIMPORT``` | Template file for Excel import. The purpose of configuring this file is mainly to have the value lists for term types and states in the template aligned with the corresponding languages used by TermIt. Empty value means the built-in template file should be used. |
| ```TERMIT_FILE_STORAGE```\* | Specifies root directory in which document files are stored. |
| ```TERMIT_GLOSSARY_FRAGMENT```\* | IRI path to append to vocabulary IRI to get glossary identifier. |
| ```TERMIT_NAMESPACE_FILE_SEPARATOR```\* | Separator of File namespace from the parent Document identifier. Since File identifier is given by the identifier of the Document it belongs to and its own normalized label, this separator is used to (optionally) configure the File identifier namespace. For example, if we have a Document with IRI ```http://www.example.org/ontologies/resources/metropolitan-plan/document``` and a File with normalized label ```main-file```, the resulting IRI will be ```http://www.example.org/ontologies/resources/metropolitan-plan/document/SEPARATOR/main-file```, where 'SEPARATOR' is the value of this configuration parameter. |
| ```TERMIT_NAMESPACE_RESOURCE```\* | Namespace for resource identifiers. |
| ```TERMIT_NAMESPACE_SNAPSHOT_SEPARATOR```\* | Separator of snapshot timestamp and original asset identifier. For example, if we have a Vocabulary with IRI ```http://www.example.org/ontologies/vocabularies/metropolitan-plan``` and the snapshot separator is configured to ```version```, a snapshot IRI will look something like ```http://www.example.org/ontologies/vocabularies/metropolitan-plan/version/20220530T202317Z```. |
| ```TERMIT_NAMESPACE_TERM_SEPARATOR```\* | Separator of Term namespace from the parent Vocabulary identifier. Since Term identifier is given by the identifier of the Vocabulary it belongs to and its own normalized label, this separator is used to (optionally) configure the Term identifier namespace. For example, if we have a Vocabulary with IRI ```http://www.example.org/ontologies/vocabularies/metropolitan-plan``` and a Term with normalized label ```inhabited-area```, the resulting IRI will be ```http://www.example.org/ontologies/vocabularies/metropolitan-plan/SEPARATOR/inhabited-area```, where 'SEPARATOR' is the value of this configuration parameter. |
| ```TERMIT_NAMESPACE_USER```\* | Namespace for user identifiers. |
| ```TERMIT_NAMESPACE_VOCABULARY```\* | Namespace for vocabulary identifiers. |
| ```TERMIT_PERSISTENCE_DRIVER```\* | OntoDriver class for the repository. |
| ```TERMIT_PERSISTENCE_LANGUAGE```\* | Language used to store strings in the repository (persistence unit language). |
| ```TERMIT_PUBLICVIEW_WHITELISTPROPERTIES```\* | Unmapped properties allowed to appear in the public term access API. |
| ```TERMIT_REPOSITORY_URL```\* | URL of the main application repository. |
| ```TERMIT_TEXTANALYSIS_TERMOCCURRENCEMINSCORE```\* | Score threshold for a term occurrence for it to be saved into the repository. Default value: ```0.49``` |
| ```TERMIT_WORKSPACE_ALLVOCABULARIESEDITABLE```\* | Whether all vocabularies in the repository are editable. Allows running TermIt in two modes. In the first, all vocabularies represent the current version and can be edited. In the second, working copies of vocabularies are created and the user only selects a subset of these working copies to edit (the so-called workspace), while all other vocabularies are read-only for them. Default value: ```true``` |
| ```TERMIT_MAIL_SENDER``` | Human-readable name to use as email sender. |
| ```SPRING_MAIL_HOST``` | Email server hostname. |
| ```SPRING_MAIL_PORT``` | Email server port. |
| ```SPRING_MAIL_USERNAME``` | Email server username. |
| ```SPRING_MAIL_PASSWORD``` | Email server password. |
| ```SPRING_SERVLET_MULTIPART_MAXFILESIZE``` | Maximum size of a single uploaded file. |
| ```TERMIT_ACL_DEFAULTEDITORACCESSLEVEL``` | Default access level for users in the editor role. Default value: ```READ``` |
| ```TERMIT_ACL_DEFAULTREADERACCESSLEVEL``` | Default access level for users in the reader role. Default value: ```READ``` |
| ```TERMIT_CORS_ALLOWEDORIGINPATTERNS``` | A comma-separated list of allowed origin patterns for CORS. This allows a more dynamic configuration of allowed origins than ```TERMIT_CORS_ALLOWEDORIGINS```, which contains exact origin URLs. It is useful, for example, for Netlify preview builds of the frontend which use a generated subdomain URL. |
| ```TERMIT_JWT_SECRETKEY``` | Secret key used when hashing a JWT. |
| ```TERMIT_LANGUAGE_STATES_SOURCE``` | Path to a file containing definition of the language of states terms can be in. The file must be in Turtle format. The term definitions must use SKOS terminology for attributes (prefLabel, scopeNote and broader/narrower). |
| ```TERMIT_LANGUAGE_TYPES_SOURCE``` | Path to a file containing definition of the language of types terms can be classified with. The file must be in Turtle format. The term definitions must use SKOS terminology for attributes (prefLabel, scopeNote and broader/narrower). |
| ```TERMIT_REPOSITORY_PASSWORD``` | Password for connecting to the application repository. |
| ```TERMIT_REPOSITORY_PUBLICURL``` | Public URL of the main application repository. Can be used to provide read-only access to the underlying data without authorization. |
| ```TERMIT_REPOSITORY_USERNAME``` | Username for connecting to the application repository. |
| ```TERMIT_SCHEDULE_CRON_NOTIFICATION_COMMENTS``` | CRON expression configuring when to send notifications of changes in comments to admins and vocabulary authors. Default value: ```-```, which disables this functionality. |
| ```TERMIT_SECURITY_PROVIDER``` | Determines whether the internal security mechanism or an external OIDC service will be used for authentication. In case an OIDC service is selected, it should be configured using standard Spring Boot OAuth2 properties. Default value: ```INTERNAL``` |
| ```TERMIT_SECURITY_ROLECLAIM``` | Claim in the authentication token provided by the OIDC service containing roles mapped to TermIt user roles. Supports nested objects via dot notation. Default value: ```realm_access.roles``` |
| ```TERMIT_TEXTANALYSIS_URL``` | URL of the text analysis service. |
| ```TERMIT_URL``` | TermIt frontend URL. It is used, for example, for links in emails sent to users. Default value: ```http://localhost:3000/#``` |

**\* Required**

The parameters are based on
the [Configuration](https://github.com/kbss-cvut/termit/blob/master/src/main/java/cz/cvut/kbss/termit/util/Configuration.java)
class in TermIt backend. If you need to further adjust the behavior of TermIt, consult this class.
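
When reading the `Configuration` class, it helps to know how a property path maps to an environment variable name. The names in the table are consistent with Spring Boot's usual convention for environment variables: dots become underscores, dashes are dropped, and the result is upper-cased. The following is a minimal sketch of that mapping; the helper `to_env_name` is hypothetical, and the dotted property names are inferred from the table rather than copied from the class.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted property path into the matching
# environment variable name (drop dashes, dots to underscores, upper-case).
to_env_name() {
  printf '%s\n' "$1" | sed -e 's/-//g' -e 's/\./_/g' | tr '[:lower:]' '[:upper:]'
}

to_env_name "termit.cors.allowedOrigins"
to_env_name "termit.textAnalysis.termOccurrenceMinScore"
```

The two calls print `TERMIT_CORS_ALLOWEDORIGINS` and `TERMIT_TEXTANALYSIS_TERMOCCURRENCEMINSCORE`, matching the corresponding rows in the table.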

### Host Proxy Configuration

TermIt uses WebSockets for asynchronous communication between the server and the clients. If the host system runs a web
proxy (most do), the proxy needs to be configured to pass WebSocket connections through.

#### Apache2

For the Apache HTTP server (default on Debian and other Linux systems) this can be done by enabling the
`mod_proxy_wstunnel` [module](https://httpd.apache.org/docs/2.4/mod/mod_proxy_wstunnel.html) and using the following
rewrite rule:

```apache2
# Proxy WebSocket connections to termit at port 1234
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule ^/termit?(.*) "ws://localhost:1234/termit/sluzby/server$1" [P,L]
```

#### Nginx

For nginx, this can be done by adding the following snippet, which initializes the `connection_upgrade` variable, to
the `http` section of the `nginx.conf` file:

```nginx
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
```

And then adding the `Upgrade` and `Connection` headers to the request:

```nginx
location /termit {
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    # Other proxy headers and proxy_pass
}
```