An open API service indexing awesome lists of open source software.

https://github.com/quarkusio/search.quarkus.io

Search backend for Quarkus websites
https://github.com/quarkusio/search.quarkus.io

Last synced: 4 months ago
JSON representation

Search backend for Quarkus websites

Awesome Lists containing this project

README

        

= Quarkus.io Search

> A Quarkus-powered application that allows full-text search in websites of the Quarkus organization such as https://quarkus.io.

[[architecture]]
== Architecture

The application is deployed on an OpenShift cluster, next to an Elasticsearch instance.

It fetches the sources from quarkus.io and localized variants (pt.quarkus.io, ...) to index them,
and exposes the search feature through a REST API.

The webpage at https://quarkus.io/guides calls that API to perform search.

// Source if you need to make changes:
// https://miro.com/app/board/uXjVNtX8LlY=/?share_link_id=204315078797

image::architecture.png[Architecture diagram]

[[development]]
== Development

[[development-sample]]
=== Using the built-in quarkus.io sample

[NOTE]
====
By default, this will use a <>,
which has the advantage of being faster to index on startup,
but is incomplete.

To index the actual data, see <>.
====

To launch the application in dev mode:

[source,shell]
----
quarkus dev
# OR
./mvnw quarkus:dev
----

Then head over to http://0.0.0.0:9000/q/swagger-ui/ to try the service.

[[development-full]]
=== Using the actual quarkus.io data

To use the actual quarkus.io data without downloading it on each startup,
you should clone the https://github.com/quarkusio/quarkusio.github.io[quarkus.io repository]
manually and add a `.env` file to your working directory:

[source,properties]
----
_DEV_QUARKUSIO_GIT_URI=file:/home/myself/path/to/quarkusio.github.io
_DEV_QUARKUSIO_LOCALIZED_JA_GIT_URI=file:/home/myself/path/to/ja.quarkus.io
_DEV_QUARKUSIO_LOCALIZED_ES_GIT_URI=file:/home/myself/path/to/es.quarkus.io
_DEV_QUARKUSIO_LOCALIZED_PT_GIT_URI=file:/home/myself/path/to/pt.quarkus.io
_DEV_QUARKUSIO_LOCALIZED_CN_GIT_URI=file:/home/myself/path/to/cn.quarkus.io
# Avoid reindexing on startup if indexes already contain data.
# You can remove this if you're fine with waiting a few minutes on each live reload.
_DEV_INDEXING_ON_STARTUP_WHEN=indexes-empty
----

Then the application will use data from that clone when indexing on startup.

If you also intend to run quarkus.io locally,
and want links of search hits to redirect to your local instance of quarkus.io,
you should also add this property:

[source,properties]
----
_DEV_QUARKUSIO_WEB_URI=http://localhost:4000
----

By default, in dev mode, any parsing/indexing warnings and errors are going to be simply logged.
But in case you want to get index errors reported to a GitHub issue, the following properties should be added:
[source,properties]
----
_DEV_INDEXING_REPORTING_TYPE=github-issue
# see about tokens https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
INDEXING_REPORTING_GITHUB_TOKEN={your-generated-github-token}
INDEXING_REPORTING_GITHUB_ISSUE_REPOSITORY={your-github-user}/search.quarkus.io
INDEXING_REPORTING_GITHUB_ISSUE_ID={github-issue-id-in-your-repository}
INDEXING_REPORTING_GITHUB_WARNING_REPEAT_DELAY=10m
----

=== Updating the built-in docs.quarkiverse.io sample

By default, the app will use the built-in docs.quarkiverse.io sample containing a snapshot of docs.quarkiverse.io.

Quarkiverse guides are indexed from the GitHub workflow artifact created as part of
the https://github.com/quarkiverse/quarkiverse-docs/actions/workflows/publish.yml[publish action].
To update the included sample: go to the https://github.com/quarkiverse/quarkiverse-docs/actions/workflows/publish.yml?query=is%3Asuccess[page],
open the first run at the top (latest successful execution) and download the `github-pages` at the bottom of the page.

GitHub has a 100mb limit on the files we can include in the repository. This means that the downloaded artifact has to be cleaned up
a bit. Unpack the archive and remove some of the extensions (in particular try dropping the ones taking the most space, e.g. `quarkus-backstage`.).
Make sure to keep Amazon S3 guides as those are used in tests.
Once you are done removing extensions, archive remaining guides back and replace the existing file.

=== Downloading the latest docs.quarkiverse.io artifact

If during the development there's a need to use a fresh docs.quarkiverse.io artifact downloaded by the app itself during the indexing process,
add the following properties to your `.env` file:

[source,properties]
----
_DEV_QUARKIVERSEIO_SOURCE=github-artifact
# see about tokens https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
QUARKIVERSEIO_GITHUB_ARTIFACT_TOKEN={your-generated-github-token}
----

[[testing]]
== Testing

Testing (and, by default <>) will use a small sample of quarkus.io included in this repository
at `src/test/resources/quarkusio-sample.zip` (alongside with the localized site samples such as `quarkusio-sample-cn.zip`,
`quarkusio-sample-es.zip`, `quarkusio-sample-ja.zip`, `quarkusio-sample-pt.zip`).

This sample must itself be a git repository, so generating it can get technical.

Fortunately, there is a tool included in this project to generate the sample.
To update the sample:

* Add the references of the guides you want included in the sample to `GuidesRef`.
* Run `QuarkusIOSample.main()` using your IDE,
with the project root as the current working directory.
+
If you set up your `.env` file as descripted in <>,
you don't need to pass any argument.
+
Otherwise, pass as the path to your clone of quarkus.io,
followed by pairs of language code and path to corresponding localized site,
e.g.`path/to/quarkus.io ja path/to/ja.quarkus.io es path/to/es.quarkus.io [...]`.
Add pairs for all `cn`, `es`, `ja`, `pt` language codes to update all samples at the same time.

[[production]]
== Production

=== OpenShift configuration

The production environment uses OpenShift.
You can generate the OpenShift configuration by running this:

[source,shell]
----
quarkus build -DskipTests
# OR
./mvnw package -DskipTests
----

The generated file will be in `target/kubernetes/openshift.yml`.

=== Running it with podman

If you want to start a (production) container locally with podman, you can build one with the following command:

[source,shell]
----
quarkus build -DskipTests -Dquarkus.container-image.build=true
# OR
./mvnw install -DskipTests -Dquarkus.container-image.build=true
----

Then start it this way:

[source,shell]
----
podman pod create -p 8080:8080 -p 9000:9000 -p 9200:9200 --name search.quarkus.io
# Start multiple Elasticsearch containers
podman container run -d --name search-backend-0 --pod search.quarkus.io \
--cpus=2 --memory=2g \
-e "node.name=search-backend-0" \
-e "discovery.seed_hosts=localhost" \
-e "cluster.initial_cluster_manager_nodes=search-backend-0,search-backend-1,search-backend-2" \
-e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
-e "xpack.security.enabled=false" \
-e "cluster.routing.allocation.disk.threshold_enabled=false" \
elasticsearch-custom:latest
podman container run -d --name search-backend-1 --pod search.quarkus.io \
--cpus=2 --memory=2g \
-e "node.name=search-backend-1" \
-e "discovery.seed_hosts=localhost" \
-e "cluster.initial_cluster_manager_nodes=search-backend-0,search-backend-1,search-backend-2" \
-e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
-e "xpack.security.enabled=false" \
-e "cluster.routing.allocation.disk.threshold_enabled=false" \
elasticsearch-custom:latest
podman container run -d --name search-backend-2 --pod search.quarkus.io \
--cpus=2 --memory=2g \
-e "node.name=search-backend-2" \
-e "discovery.seed_hosts=localhost" \
-e "cluster.initial_cluster_manager_nodes=search-backend-0,search-backend-1,search-backend-2" \
-e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
-e "xpack.security.enabled=false" \
-e "cluster.routing.allocation.disk.threshold_enabled=false" \
elasticsearch-custom:latest
# Then the app; this will fetch the actual data on startup (might take a while):
podman container run -it --rm --name search.quarkus.io --pod search.quarkus.io search-quarkus-io:999-SNAPSHOT
# OR, if you already have locals clones of *.quarkus.io:
# (you might need to run quarkus dev with those repos first to get them all in sync)
REPOS_DIR=$HOME/path/to/dir/containing/repos
podman container run -it --rm --name search.quarkus.io --pod search.quarkus.io \
--cpus=1 --memory=1g \
-v $REPOS_DIR/quarkusio.github.io:/mnt/quarkus.io:ro,z \
-v $REPOS_DIR/cn.quarkus.io:/mnt/cn.quarkus.io:ro,z \
-v $REPOS_DIR/es.quarkus.io:/mnt/es.quarkus.io:ro,z \
-v $REPOS_DIR/ja.quarkus.io:/mnt/ja.quarkus.io:ro,z \
-v $REPOS_DIR/pt.quarkus.io:/mnt/pt.quarkus.io:ro,z \
-e INDEXING_REPORTING_TYPE=log \
-e GITHUB_OAUTH=ignored \
-e GITHUB_STATUS_ISSUE_ID=1 \
-e QUARKUSIO_GIT_URI=file:/mnt/quarkus.io \
-e QUARKUSIO_LOCALIZED_CN_GIT_URI=file:/mnt/cn.quarkus.io \
-e QUARKUSIO_LOCALIZED_ES_GIT_URI=file:/mnt/es.quarkus.io \
-e QUARKUSIO_LOCALIZED_JA_GIT_URI=file:/mnt/ja.quarkus.io \
-e QUARKUSIO_LOCALIZED_PT_GIT_URI=file:/mnt/pt.quarkus.io \
search-quarkus-io:999-SNAPSHOT
----

[[deployment]]
== Deployment

=== Current process

Maintainers can review the application and update configuration/secrets on the OpenShift console.

There are two namespaces containing two separate deployments at the moment:

* Production (`production` branch):
** Console: https://console-openshift-console.apps.ospo-osci.z3b1.p1.openshiftapps.com/k8s/cluster/projects/prod-search-quarkus-io
** Search Web UI: none; that's on purpose, people should use https://quarkus.io/guides
** Endpoint: https://search.quarkus.io/api/guides/search
** Endpoint spec (OpenAPI): https://search.quarkus.io/api/openapi
** Indexing status reports: https://github.com/quarkusio/search.quarkus.io/issues/130
* Staging (`main` branch):
** Console: https://console-openshift-console.apps.ospo-osci.z3b1.p1.openshiftapps.com/k8s/cluster/projects/dev-search-quarkus-io
** Search Web UI (for testing/debugging purposes): https://search-quarkus-io-dev-search-quarkus-io.apps.ospo-osci.z3b1.p1.openshiftapps.com/
** Endpoint: https://search-quarkus-io-dev-search-quarkus-io.apps.ospo-osci.z3b1.p1.openshiftapps.com/api/guides/search
** Endpoint spec (OpenAPI): https://search-quarkus-io-dev-search-quarkus-io.apps.ospo-osci.z3b1.p1.openshiftapps.com/api/openapi
** SwaggerUI: https://search-quarkus-io-dev-search-quarkus-io.apps.ospo-osci.z3b1.p1.openshiftapps.com/q/swagger-ui/
** Indexing status reports: https://github.com/quarkusio/search.quarkus.io/issues/131

Deployment will happen automatically when pushing to the relevant branch.

Be careful about which configuration you change in the UI,
as deployment may overwrite part of the topology.

=== Setting it up

Most of the process is automated, but if you need to deploy to a new cluster,
you will need to set up a few things manually:

1. Service account for GitHub Actions deployment.
The account credentials (username/token) need to be registered as GitHub Actions secrets,
as well as the cluster URI.
See `.github/workflows/deploy.yml`.
2. Namespace
The OpenShift namespace needs to be registered as a GitHub Actions environment variable.
See `.github/workflows/deploy.yml`.
3. Config maps and secrets.
`search-quarkus-io-config`::
Environment variables for the application.
+
Put in there whatever configuration you need for your specific cluster.
+
In particular:
* `GITHUB_STATUS_ISSUE_ID`: The number of an issue on quarkusio/search.quarkus.io
where indexing status should be reported.
See `indexing.reporting.github` configuration properties for more details.
`search-quarkus-io-secret`::
Secret environment variables for the application.
+
Put in there whatever secret configuration you need for your specific cluster.
+
In particular:
* `GITHUB_OAUTH`: a GitHub token that allows commenting/reopening/closing a GitHub issue
on quarkusio/search.quarkus.io.
See `indexing.reporting.github` configuration properties for more details.
`search-backend-config`::
Environment variables for the Elasticsearch instances.
+
Put in there whatever configuration you need for your specific cluster.
`search-backend-secret`::
Secret environment variables for the Elasticsearch instances.
+
Put in there whatever secret configuration you need for your specific cluster.

[[license]]
== License

This project is licensed under the Apache License Version 2.0.

The web assets in `src/main/resources/web` are licensed under the Creative Commons Attribution 3.0 International License.