https://github.com/grip-on-software/visualization-site
Configuration for GROS servers, templates for integrated visualization hub, tests, analytics, schema documentation and prediction API routes
- Host: GitHub
- URL: https://github.com/grip-on-software/visualization-site
- Owner: grip-on-software
- License: apache-2.0
- Created: 2023-06-01T10:41:24.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2024-08-05T09:06:39.000Z (almost 2 years ago)
- Last Synced: 2025-09-01T02:01:57.261Z (9 months ago)
- Topics: server-configuration, visualization-dashboard
- Language: Python
- Homepage: https://gros.liacs.nl/visualization/
- Size: 2.71 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GROS visualization proxy and site
This repository contains configuration files and static web files for the
visualization hub of the GROS project. The proxy is intended to provide access
to a Ghost blog and Discourse forum if they are deployed within the
environment, as well as to the visualization reports for the organization,
which were generated using branch-based builds on a Jenkins instance.
The repository needs to be configured to work within a specific environment.
Two stages of reverse proxies are used to safely provide access to the GROS
webservers (separate Docker instances, a Jenkins setup and/or direct access).
The reverse proxies are as follows:
- [Caddy](https://caddyserver.com/) for transparent proxy access from a BigBoat
dashboard. Several subinstances handle specific domain names. This is only
used in an environment where access has to be routed through several VLANs.
- [NGINX](https://www.nginx.com/) for proxy access and/or static file hosting
from a central server listening on specific ports. It can also use Host-based
proxying. Complete, direct hosting is possible but may require additional,
separate server configuration.
- [Apache](https://httpd.apache.org/) as an alternative to NGINX, for proxy
access and/or static file hosting from a central server listening on specific
ports. It can also use Host-based proxying. Complete, direct hosting is
possible but may require additional, separate server configuration.
Many of the configuration files and web files use
[Mustache](https://www.npmjs.com/package/mustache) in order to use
configuration items and functions to adjust to other environments.
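As a rough illustration of the templating idea, the sketch below fills simple `{{name}}` tags from a configuration dictionary. It is not the Mustache implementation that the repository uses: it is plain Python, it only handles escaped variable tags (no sections, partials or functions), and the template line and the configuration value are made up for the example.

```python
import html
import re

# Matches a simple Mustache variable tag such as {{visualization_url}}.
TAG = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

def render(template, config):
    """Replace {{name}} tags with HTML-escaped values from config.

    Unknown names render as an empty string, as Mustache does for missing
    variables. Sections, partials and lambdas are not supported.
    """
    def substitute(match):
        return html.escape(str(config.get(match.group(1), "")))
    return TAG.sub(substitute, template)

# Hypothetical template line and configuration value.
template = '<a href="{{visualization_url}}">Dashboard</a>'
config = {"visualization_url": "/visualization/"}
print(render(template, config))
```

The same template can then be rendered for another environment by supplying a different configuration dictionary, which is the property the repository relies on.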
Additionally, this repository contains a Shell script `goaccess-report.sh` that
can be run periodically to generate a server statistics report. It requires
installation of [GoAccess](https://goaccess.io/) which analyzes logs and
creates an analytics dashboard.
Separate documentation exists for more details on how the second proxy layer
works.
## Dependencies
While the build of the visualization site may work with simply having Docker
installed, the test and production environments will need a more full-fledged
installation of other dependencies.
For the tests, a Jenkins installation is assumed with proper OpenJDK Java 8+
and Python 3.6+ (including `distutils` and `virtualenv`). The agent that
performs the tests must have Docker Compose V2. In addition, tools such as
Bash, Git, `jq`, `awk`, `sed`, `grep` and `xargs` must be available and
GNU-like (POSIX-compliant may not be enough). A SonarQube Scanner must be
registered in Jenkins. The server agent that performs the publishing must have
Docker Compose V2, Node.js, `curl` and `rsync` as well. Details for configuring
Jenkins servers and agents are outside the scope of this documentation,
although some details may be available in other GROS documentation. It may be
possible to run the integration tests outside a Jenkins job, but support for
this is limited and it requires passing additional environment variables.
The deployment assumes an NGINX or Apache service and potentially a Docker
Compose installation, some of which can be set up using (some of) the
configuration files that we build from the templates in this repository.
## Tests
This repository contains integration tests for the visualizations. The entire
visualization proxy layer setup is emulated within a `docker-compose` network,
with each deployed visualization within its own Docker instance running its
Node.js development web service. A Caddy proxy pretends to be the Jenkins
server with access to the HTML reports and artifacts.
The tests themselves are run with a Selenium instance using a Chromium browser,
which is being controlled by a Python runner. The tests are in the form of
Python unit tests which instruct the Selenium instance to navigate to pages and
perform actions such as clicking. The unit tests then check if certain elements
are visible, contain certain text, or have other properties, and the test
passes if this is the case.
The visualizations use their test builds which contain coverage tracking, which
is extracted and combined after the test completes. In addition to coverage,
Sonar scans check for code smells, and a dependency check searches for
vulnerabilities.
At the end of each test, more actions take place. Logs are stored and
a screenshot is made of the page, so that there is a visual reference of the
state of the page in case a test fails. Finally, an accessibility testing
engine is used to verify if the contents of the page conform to various WCAG
rules. All of these results are combined as well into a report.
## Configuration
The visualization dashboard, connections to other servers as well as common
elements such as a navigation bar, are all configured in this repository. The
main configuration point is usually the `config.json` file, which can be copied
from `lib/config.json` to the root of the repository in order to make local
adjustments. The following configuration items (all strings, unless noted
otherwise) are known:
- `visualization_url`: The URL to the visualizations. This may include
a protocol and domain name, but does not need to in case all the resources
are hosted on the same domain. The remainder is a path to the root of the
visualizations, where the dashboard is found and every visualization (except
predictions) has sub-paths below it.
- `prediction_url`: The URL to the prediction site. This may include
a protocol and domain name, but does not need to in case all the resources
are hosted on the same domain. The remainder is a path to the root of the
predictions (without any specific paths below it).
- `base_url`: The absolute URL to the base website where other static resources
are located. This is used to link to the base website in the navigation bar
and to make certain URLs absolute like schema identifiers and canonical URLs.
If it is left empty, then some of those URLs will not have a protocol and
domain name or use the current location as base.
- `blog_url`: The URL to the blog. This may include a protocol and domain name,
but does not need to in case all the resources are hosted on the same domain.
The remainder is a path to the root of the blog. If this is empty, then no
blog is available.
- `discussion_url`: The URL to the discussion forum. This may include
a protocol and domain name, but does not need to in case all the resources
are hosted on the same domain. The remainder is a path to the root of the
forum. If this is empty, then no forum is available.
- `download_url`: The URL to default download links on the dashboard. This can
be overridden with specific download URLs by the visualizations and an altered
value may work with the redirections provided by the NGINX or Apache proxy,
so care should be taken when adjusting this. This may be helpful when direct
access to downloads is available. If this is empty, then no download links
are shown on the dashboard.
- `jira_url`: URL that is used in some navigation bars to link to a Jira
instance.
- `blog_host`: Domain name of an internal server where the blog is hosted.
- `blog_server`: Domain name of an external server that provides access to the
blog. This name should point (possibly via a Caddy proxy) toward the NGINX or
Apache proxy that makes the server available.
- `discussion_host`: Domain name of an internal server where the discussion
forum is hosted.
- `discussion_server`: Domain name of an external server that provides access
to the discussion forum. This name should point (possibly via a Caddy proxy)
toward the NGINX or Apache proxy that makes the server available.
- `visualization_server`: Domain name of an external server that provides
access to the visualization dashboard and every visualization (except
predictions). This name should point (possibly via a Caddy proxy) toward the
NGINX or Apache proxy that makes the server available.
- `www_server`: Domain name of an external server that listens on a "www"
address. This server redirects to the visualization server. This name should
point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that
makes the redirection service available.
- `prediction_server`: Domain name of an external server that provides access
to the predictions. This name should point (possibly via a Caddy proxy)
toward the NGINX or Apache proxy that makes the server available.
- `hub_organizations` (array): When multiple organizations are hosted in the
same environment, an array of objects containing organizations, navigation
data and branch names can be added to make them available cross-builds. For
tests, they provide a build and serve dummy data for visualizations on
multiple branches. The test routes only consider the `visualization-site` and
`prediction-site` as keys in each object, and their values should provide
a branch name that the `visualization_branch` and `prediction_branch` may
respectively point to for NGINX testing, or the branch names generated using
`hub_mapping` for Apache testing. The organization objects in this array
also define how items in a navigation bar dropdown should appear, with
titles, content/alternative text locales, images (plus width/height/style
attributes) and external URLs. Finally, a key `visualizations` may have an
array value with the visualizations that the organization's hub hosts.
- `hub_regex`: When multiple organizations are hosted in the same environment,
a regular expression can be used to match the organization name which must
occur at the start of the path, and place the matched parts into variables
for later use in rewrite rules of NGINX or Apache. Use named groups `(?...)`
for capturing matches for compatibility across proxy servers. To avoid
renumbering issues, this regular expression should not contain unnamed match
captures `(...)`.
- `hub_mapping` (object): Groups of environment variable names in which to
replace the matched substrings from the `hub_regex` regular expression. Only
used for Apache, when hosting multiple organizations in the same environment.
Group keys can be "hub", "visualization" and "prediction", corresponding to
the visualization-site itself, the visualizations and the prediction-site,
respectively. Each variable within the group has an object with "input",
"default", and "output" keys, which are used to build a `RewriteMap`. The
input and default can refer to other variables with dollar signs. The output
mapping cannot contain spaces or empty strings in either the keys or the values.
In addition to the `hub_regex` variables, the "prediction" group may also
transform a `branch_organization` variable to map paths to organizations in
(combined) prediction setups. The mapping is also used for determining the
available prediction path setups on the server for the OpenAPI specification,
and for copying the published visualizations on master branches to the proper
default organizations. In NGINX, one can instead alter variables captured
from regular expressions using `hub_branch`, `visualization_branch` and
`prediction_branch`.
- `branch_maps_path`: Filesystem path where the rewrite maps for the Apache
configuration are expected to reside. The map files are written to
`httpd/maps` in this repository, but may be mapped to another absolute path.
This affects the path within the Docker instance during the `docker-compose`
tests, but can also be helpful when using (portions of) the configuration on
an Apache server.
- `hub_redirect`: When multiple organizations are hosted in the same
environment, variables from a matched organization at the start of the path
using `hub_regex` can be used in a rewrite that redirects to another URL. It
is assumed that this configuration value produces an absolute URL. For NGINX,
use dollar-sign variables; for Apache, use environment variables named with
`hub_mapping` with `%{...}` syntax.
- `hub_branch`: Inject some processing steps in the NGINX configuration for the
visualizations hub. This should at least determine the branch of a Jenkins
build to use for the visualization site. When multiple organizations are
hosted in the same environment, variables from a matched organization at the
start of the path using `hub_regex` can be used in further processing.
- `visualization_branch`: Inject some processing steps in the NGINX
configuration for a visualization. This should at least determine the branch
of a Jenkins build to use for a visualization. When multiple organizations
are hosted in the same environment, variables from a matched organization at
the start of the path using `hub_regex` can be used in further processing.
- `prediction_branch`: Inject some processing steps in the NGINX configuration
for the prediction site. This should at least determine the branch of
a Jenkins build to use for the visualization or prediction site. When
multiple organizations are hosted in the same environment, variables from
a matched organization at the start of the path using `hub_regex` can be used
in further processing.
- `jenkins_host`: Domain name of an internal server where the Jenkins build
system is hosted.
- `jenkins_path`: Path that the Jenkins build system is hosted below. This is
useful in situations where Jenkins is hosted on the same domain as other
resources and is therefore within a path rather than directly at the root of
a domain.
- `jenkins_direct`: Filesystem path of a location where a copy of the
visualizations and predictions are available. If this is not empty, the
structure within the path must follow that which the `copy.sh` script makes
and supported requests are internally rewritten to this path. If this is
empty, then requests for visualizations and predictions are proxied to the
Jenkins server.
- `jenkins_direct_url`: URL through which the Jenkins server is available from
the location of the build (a Jenkins node with the 'publish' tag, most likely
the server itself), when the visualizations and predictions are copied.
- `jenkins_direct_cert`: Filesystem path to a certificate used for validating
an HTTPS connection to the Jenkins server, when the visualizations and
predictions are copied (for branch information).
- `jenkins_api_token`: Encrypted token that can be used for basic authorization
against the Jenkins API for at least branch details of builds.
- `files_host`: Domain name of an internal server where an ownCloud instance is
hosted.
- `files_share_id`: Identifier of a published share on an ownCloud instance
with files that are made available in addition to the prediction resources.
If this is an empty string, then the paths to the files list and specific
files are not passed through to ownCloud from the NGINX or Apache proxy.
- `control_host`: Domain name of an internal server where secure resources are
hosted, including encryption services and access control checks. This domain
must be accessible through HTTPS from the NGINX or Apache proxy. This can be
set to an empty string to disable proxy connections to endpoints for
encrypting names of developers and checking project group access, as used in
some visualizations. An empty string also makes the Caddy proxy configuration
unusable, so only leave this empty if a portion of the configuration is used.
- `websocket_server`: Domain name of an external server where a WebSocket for
real-time updates of access log analytics is hosted. This name should point
(possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the
WebSocket service available (via the GoAccess script).
- `proxy_nginx` (boolean): Whether to use NGINX to provide access to the other
servers or host the direct files. If set to false, use Apache HTTP Server
instead. Affects the test as well as which configuration is generated.
- `proxy_range`: CIDR range of trusted IP addresses that may host the first
layer of proxies in front of the NGINX or Apache proxy, for example the Caddy
proxies. Requests from these addresses may provide headers with the real IP
address of the original request, which are used instead of the proxy's IP
address.
- `proxy_port_in_redirect` (boolean): Whether to make the NGINX or Apache proxy
specify the port number when a redirect is generated. If a Caddy proxy is in
front of it, then this is most likely not wanted. Similarly, if only portions
of the configuration files are used, then this is also not useful to enable.
- `auth_cert`: Filesystem path to a certificate used for validating the HTTPS
connection to the `control_host`. If a `$SERVER_CERTIFICATE` environment
variable is
not set, then this is also used as the path on the Docker host machine during
the `docker-compose` tests to provide this path within the Docker instance.
- `allow_range` (array): CIDR ranges of IP addresses that are allowed to access
the access log analytics.
- `goaccess_path`: Filesystem path to the location where the GoAccess interface
is hosted. The directory should contain an `analytics` subpath which holds
the actual web interface. This can be set to an empty string to disable the
GoAccess endpoint.
- `goaccess_log_path`: Filesystem path to log file location. Access logs stored
in this path (including rotated and possibly GZip-encoded logs) are followed.
- `swagger_openapi_url`: URL prefix to use for the OpenAPI specification files
that are made available in the Swagger UI. The default value of `./` works
well for the Docker instance, but if the OpenAPI files are meant to be stored
elsewhere then another path is possible. For production builds of a static
file hosting setup with `jenkins_direct`, this value is set to the absolute
path of the direct hosting root, and additional OpenAPI files rather than
just the prediction API are made available.
- `swagger_validator_url`: URL to use to connect to the validator. The default
value of `/validator` works well for the Docker compose network. If it is set
to an empty string, then the online validator is used. Can be set to `none`
to disable the validator. For production builds of a static file hosting
setup with `jenkins_direct`, we use the online validator.
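To illustrate the named-capture approach described for `hub_regex`, the following sketch uses Python's `re` module, whose named group syntax is `(?P<name>...)`. The pattern, the organization names and the paths are invented for the example; the expressions used by NGINX and Apache are handled by their own regular expression engines and may differ in detail.

```python
import re

# Hypothetical pattern: an organization name at the start of the path.
# Non-capturing groups (?:...) keep unnamed captures out of the expression.
HUB_REGEX = re.compile(
    r"^/(?P<organization>org-a|org-b|combined)(?:/(?P<rest>.*))?$"
)

def match_organization(path):
    """Return (organization, remaining path), or None if nothing matches."""
    match = HUB_REGEX.match(path)
    if match is None:
        return None
    return match.group("organization"), match.group("rest") or ""

print(match_organization("/org-a/some-visualization/index.html"))
print(match_organization("/unknown/index.html"))
```

Because every capture is named, adding or removing groups elsewhere in the expression does not change how the matched parts are referenced later, which is the renumbering concern mentioned for `hub_regex`.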
Configuration items that have keys ending in `_url` may be processed to direct
toward an organization-specific path, in case multiple organizations are hosted
in the same environment. The value is searched for the substring
`$organization`, possibly after slashes. These can be replaced with the actual
organization that the build is for. In some cases, it is removed only to allow
NGINX or Apache rules to add a supported variant of it in front of the path
using `hub_regex`, and in other cases, it may be preserved to be adjusted at
a later moment within the visualization or hub. Specific [environment
variables](#environment-variables) determine how the substring `$organization`
is replaced.
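The replacement can be pictured with the sketch below. The exact rules are not spelled out in this section, so the behavior shown (the function name, how slashes are handled, and what happens when no organization is set) is an assumption made for illustration, not the repository's implementation.

```python
import os
import re

# Matches "$organization" together with any slashes directly before it.
ORGANIZATION = re.compile(r"/*\$organization")

def resolve_url(url, environ=None):
    """Fill in the $organization placeholder of a *_url configuration value.

    Assumed rules (not taken from the repository's code):
    - VISUALIZATION_COMBINED=true yields "/combined";
    - otherwise VISUALIZATION_ORGANIZATION yields "/<organization>";
    - if neither is set, the placeholder is removed.
    """
    environ = os.environ if environ is None else environ
    if environ.get("VISUALIZATION_COMBINED") == "true":
        replacement = "/combined"
    elif environ.get("VISUALIZATION_ORGANIZATION"):
        replacement = "/" + environ["VISUALIZATION_ORGANIZATION"]
    else:
        replacement = ""
    # A function replacement avoids interpreting backslashes in the name.
    return ORGANIZATION.sub(lambda match: replacement, url)

# Hypothetical configuration value and organization name.
print(resolve_url("/visualization/$organization/",
                  {"VISUALIZATION_ORGANIZATION": "org-a"}))
```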
Note that configuration items that have keys ending in `_host` or `_server` may
be set to "fake" values when only a portion of the final NGINX or Apache
configuration is actually used, for example when some resources are not made
available. The configuration values should still be set to valid domain names
so that they can be used within the `docker-compose` network during the tests.
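The `proxy_range` and `allow_range` items in the list above hold CIDR ranges. As a sketch of what such a range check amounts to, the following uses Python's standard `ipaddress` module. The helper name is made up and the addresses are reserved documentation ranges; the proxy servers perform this check themselves, so this only illustrates the semantics.

```python
import ipaddress

def is_allowed(address, cidr_ranges):
    """Check whether an IP address falls within any of the given CIDR ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidr_ranges)

# Hypothetical allow_range value using reserved documentation addresses.
allow_range = ["192.0.2.0/24", "2001:db8::/32"]
print(is_allowed("192.0.2.17", allow_range))    # inside the first range
print(is_allowed("198.51.100.5", allow_range))  # outside both ranges
```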
### Environment variables
Several environment variables may be used when calling `npm run` or the Shell
scripts, and may affect various contexts, such as resource paths in the HTML
pages, the JavaScript-based navigation bar, the proxy server configuration and
the test environment. Boolean environment variables must be set to `true` in
order to take effect.
- `$VISUALIZATION_ORGANIZATION`: Determines the organization to use within the
URLs defined in the configuration.
- `$VISUALIZATION_COMBINED` (boolean): Determines whether to replace the
organization within the URL with `/combined`.
- `$NAVBAR_SCOPE`: Determines the navigation bar override file to use. If the
file `navbar.$NAVBAR_SCOPE.js` exists, then it is loaded in place of the
generated `navbar.json` file. See [files](#files) for more details.
- `$VISUALIZATION_ANONYMIZED` (boolean): Determines whether to show
a notification on the HTML page indicating that the visualizations contain
anonymized data and that some functionality is limited or missing.
- `$VISUALIZATION_SITE_CONFIGURATION`: Path to the main configuration file,
defaulting to `config.json`. If this file does not exist, then
`lib/config.json` is used.
- `$PREDICTION_CONFIGURATION`: Path to the configuration file of the
`prediction-site` visualization, which is then used during tests. By default,
the prediction's default configuration is used, which may be incompatible in
some complex combined/organization setups. Other visualizations are tested
with default configuration.
- `$VISUALIZATION_NAMES`: Optional space-separated list of repository names of
visualizations (excluding `visualization-site` itself) that should be
included in the navigation bar, proxy configuration, test environment setup,
and publishing phase. By default, all visualizations listed in the file
`visualizations.json` are considered.
- `$VISUALIZATION_MAX_SECONDS` (integer): Number of seconds to wait until the
visualizations are compiled and made available during the test setup. By
default, 60 seconds are allocated, which may be too short for some nodes.
- `$REPO_ROOT`: Directory to store the Git repositories of the visualizations
during the test setup. Relative to the current directory. By default, `repos`
is created as a subdirectory. If another directory is used, then existing
clones are left as-is with no up-to-dateness checks and dependencies are
backed up so that they do not interfere with the test setup.
- `$SERVER_CERTIFICATE`: Path to the HTTPS certificate to use in the test setup
when requesting upstream resources for encryption and access checks. By
default, this is the `auth_cert` from the configuration, but this can be set
to use a different path at the host device than in the Docker test setup.
- `$PUBLISH_PRODUCTION`: A boolean variable which indicates whether to perform
a copy for potential publication from the current feature branch instead of
only doing so when on a main branch. Note that main branches are skipped in
this case, so the publication may not have organization-specific paths.
- `$JENKINS_HOME`: Provided by
[Jenkins](https://www.jenkins.io/doc/book/managing/system-properties/#jenkins_home)
and used by the `copy.sh` script as the root from which to collect artifacts
and visualization HTML reports for publishing.
- `$BRANCH_NAME`: Provided by [Jenkins Multibranch
Pipeline](https://www.jenkins.io/doc/book/pipeline/multibranch/#additional-environment-variables)
and used by the test environment in order to separate Docker resources when
building dependencies for visualizations under test.
- `$BUILD_NUMBER`, `$BUILD_TAG`, `$BUILD_URL`, `$NODE_NAME`: Provided by
[Jenkins](https://www.jenkins.io/doc/book/pipeline/jenkinsfile/#using-environment-variables)
and used by the test environment in order to track build context for
reporting and naming purposes.
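The handling of a few of these variables can be sketched as follows. This is an illustration in Python under stated assumptions rather than the code the build scripts use: the helper names are hypothetical, and the default list of visualizations is passed in as a parameter because the structure of `visualizations.json` is not described here.

```python
from pathlib import Path

def env_bool(name, environ):
    """Boolean variables only take effect when set to the string "true"."""
    return environ.get(name) == "true"

def config_path(environ, root="."):
    """Pick the configuration file, falling back to lib/config.json."""
    name = environ.get("VISUALIZATION_SITE_CONFIGURATION", "config.json")
    path = Path(root) / name
    return path if path.exists() else Path(root) / "lib" / "config.json"

def visualization_names(environ, default_names):
    """Split the space-separated $VISUALIZATION_NAMES, or use the defaults."""
    names = environ.get("VISUALIZATION_NAMES", "").split()
    return names or list(default_names)

# Hypothetical environment; the repository names are placeholders.
environ = {"VISUALIZATION_NAMES": "first-repo second-repo",
           "VISUALIZATION_ANONYMIZED": "1"}
print(env_bool("VISUALIZATION_ANONYMIZED", environ))  # "1" is not "true"
print(visualization_names(environ, ["default-repo"]))
```

Passing the environment as a dictionary, instead of reading `os.environ` directly, keeps the helpers easy to try out without changing the real environment.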
### Files
Whereas the configuration file, usually located at `config.json`, most likely
needs to be copied (from `lib/config.json`) and changed, there are other
files within the repository that can be modified to adjust what is available on
the visualization site. We will briefly introduce these files, as adjustments
should be considered more like code changes.
The navigation bar is configured in `navbar.json.mustache`, with optional
contextual overrides in `navbar.$NAVBAR_SCOPE.js`. The format is defined in the
`@gros/visualization-ui` package for the `Navbar` class, with the addition that
URLs can have `$organization` substrings replaced, and various Mustache
operations can take place to fill the navigation bar with menus/links defined
elsewhere, such as visualizations and organizations. `navbar.$NAVBAR_SCOPE.js`
should, if used, refer to `navbar.json` and use simple JavaScript operations to
augment it, though the use of this file may not be necessary when specifics for
an organization can be set by changing other files, such as `config.json` and
`visualizations.json`.
The available visualizations are configured in `visualizations.json`. The
structure is based on the layout of the dashboard, but the items defined in it
also determine which visualizations are actually made available in the
navigation bar, proxy server configuration as well as within the tests. The
JSON object is used as a Mustache structure within the index template and the
navigation bar, and so they may contain Mustache items to refer to certain
configuration items, such as URLs.
Localization of the visualization site is in `lib/locales.json`. The messages
in it are used when referred from JavaScript code using the `Locales` class
from the `@gros/visualization-ui` package, or when using the `data-message`
attribute within the HTML templates.
## Running
The visualization can be built using Node.js and `npm` by running `npm install`
and then either `npm run watch` to start a development server that also
refreshes browsers upon code changes, or `npm run production` to create
a minimized bundle. The resulting HTML, CSS and JavaScript is made available in
the `www` directory.
As mentioned in the [dependencies](#dependencies), this repository also
contains a `Dockerfile` specification for a Docker image that can perform the
installation of the app and dependencies, which allows building the
visualization site within there, removing the need for a global Node.js
installation. The `Jenkinsfile` contains appropriate steps for a Jenkins CI
deployment, including the tests, visualization building and publishing.
During the build, the JavaScript that constructs the common navigation bar is
separated from the main JavaScript bundle into `vendor.js`. The visualizations
and prediction site also refer to this file to display the navigation bar.
A non-static production environment can make use of the generated proxy server
configuration in order to deploy the reverse proxy layer(s) that allow access
to all the visualizations and other resources. Optionally, the `caddy` Docker
Compose file can be set up. Next, either the main `nginx.conf` plus the files
generated in the `nginx` directory can be supplied to the NGINX service, or
likewise with `httpd.conf` and the generated files in the `httpd` directory
for the Apache service. Alternatively, a subset of these files may be used, for
example to host all under one domain, with additional local configuration.
Documentation based on JSON schema specification files from various GROS
repositories can be built with the `doc` directory. This requires installing
some Python dependencies, preferably after first setting up a `virtualenv`, by
running `pip install -r doc/requirements.txt`. The file `doc/source/conf.py`
contains configuration for the documentation, including paths or URLs to the
schema files, which can be altered to use local paths. When run on the Jenkins
server to create static documentation, these paths may be defined through the
`doc.sh` script which locates the JSON schemas as part of archives and modules
available in the current workspace and beyond.
A static production environment can make use of the result of the Bash script
`copy.sh` when run on the Jenkins server in order to create a document root
directory with the organizational hubs, all the visualizations, JSON schemas,
OpenAPI specifications and Swagger UI available for publishing on a static file
hosting server. This server may make use of the NGINX or Apache configurations,
which in their compiled form properly route the prediction site and error
pages, among others. Like other parts, the `copy.sh` script requires specific
[configuration](#configuration) for mapping hub paths, selecting organizations,
producing paths and URLs and accessing Jenkins for the publishable
visualizations, archived files and prediction branches.