Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/vy/reactor-pubsub

Google Pub/Sub Java driver for mortals.
https://github.com/vy/reactor-pubsub

google-cloud java pubsub queueing reactive

Last synced: 2 months ago
JSON representation

Google Pub/Sub Java driver for mortals.

Awesome Lists containing this project

README

        

https://github.com/vy/reactor-pubsub/actions[image:https://github.com/vy/reactor-pubsub/workflows/CI/badge.svg[Actions Status]]
https://search.maven.org/search?q=g:com.vlkan%20a:reactor-pubsub[image:https://img.shields.io/maven-central/v/com.vlkan/reactor-pubsub.svg[Maven Central]]
https://www.apache.org/licenses/LICENSE-2.0.txt[image:https://img.shields.io/github/license/vy/reactor-pubsub.svg[License]]

Finally a https://cloud.google.com/pubsub[Google Cloud Pub/Sub] Java 8 driver
that you can wrap your head around its internals and put the fun (i.e.,
backpressure-aware, reactive, efficient, batteries-included, and *simple*!) back
into messaging.

- It is *backpressure-aware*, because it only pulls batch of messages whenever
you need one and tell it to pull one. It doesn't operate a covert thread
(pool) to pull whenever it sees a fit and ram it through the application.

- It is *reactive*, because every request is non-blocking, asynchronous, and
wired up with the rest of the application using
http://www.reactive-streams.org[Reactive Streams].

- It is *efficient*, because it works in batches, the basic unit of message
exchange between your driver and Pub/Sub. You `pull()` some, you `ack()` some.
One-message-at-a-time is an illusion created by drivers and incurs a
significant performance penalty along with operational complexity under the
hood.

- It is *batteries-included*, because it provides goodies (out of the box
metrics integration, an adaptive rate limiter to help you avoid burning money
by continuously pulling and nack'ing messages when something is wrong with
your consumer, a `ScheduledExecutorService` implementation with a bounded task
queue to mitigate backpressure violating consumption) that assist real-world
production deployments.

- It is *simple*, because there are 2 dozens of classes where half is used to
represent JSON models transmitted over the wire and the rest is just reactive
streams gluing. There are no smart retry mechanisms (though thanks to reactive
streams, you can introduce yours), there are no message lease time extending
background tasks (hence, you better consume your batch before it times out),
etc. Spend a minute on the source code and you are ready to write your own
Pub/Sub driver!

Due to the constraints on the development resources (read as, _"it is just
me"_), I needed to take some shortcuts to get this project going:

- https://github.com/reactor/reactor-core/[Reactor Core] for Reactive Streams

- https://github.com/reactor/reactor-netty[Reactor Netty] for HTTP requests

- https://github.com/googleapis/google-auth-library-java[Google Auth
Library]footnote:[This could have been replaced with a more lightweight
alternative, but given you have already been using Pub/Sub, it is highly
likely that you already sold your soul to some other Google Cloud services
too. Hence, no need to introduce an extra dependency.] for authentication

- https://github.com/FasterXML/jackson-databind[Jackson]footnote:[https://github.com/googleapis/google-api-java-client[Google
APIs Client Library] already depends on Jackson.] for JSON serialization

- http://micrometer.io/[Micrometer] for metrics _(optional)_

== Getting started

You first need to add the `reactor-pubsub` artifact into your list of
dependencies:

```xml

com.vlkan
reactor-pubsub
${reactor-pubsub.version}

```

(Note that the Java 9 module name is `com.vlkan.pubsub`.)

You can create a publisher and publish a message as follows:

```java
// Create the publisher.
String projectName = "awesome-project";
PubsubPublisherConfig publisherConfig = PubsubPublisherConfig
.builder()
.setProjectName(projectName)
.setTopicName("awesome-topic")
.build();
PubsubPublisher publisher = PubsubPublisher
.builder()
.setConfig(publisherConfig)
.build();

// Publish a message.
publisher
.publishMessage(
new PubsubDraftedMessage(
"Yolo like nobody's watching!"
.getBytes(StandardCharsets.UTF_8)))))
.doOnSuccess(publishResponse ->
System.out.format(
"Published awesome message ids: %s%n",
publishResponse.getMessageIds()))
.subscribe();
```

Note that `PubsubDraftedMessage` constructor has two variants:

- `PubsubDraftedMessage(byte[] payload)`
- `PubsubDraftedMessage(byte[] payload, Map attributes)`

`PubsubPublisher` provides the following auxiliary methods:

- `publishMessages(List messages)`
- `publishMessage(List message)`
- `publish(PubsubPublishRequest publishRequest)`

You can create a subscriber and start receiving messages from a subscription as
follows:

```java
// Create a puller.
String subscriptionName = "awesome-subscription";
PubsubPullerConfig pullerConfig = PubsubPullerConfig
.builder()
.setProjectName(projectName)
.setSubscriptionName(subscriptionName)
.build();
PubsubPuller puller = PubsubPuller
.builder()
.setConfig(pullerConfig)
.build();

// Create an acker.
PubsubAckerConfig ackerConfig = PubsubAckerConfig
.builder()
.setProjectName(projectName)
.setSubscriptionName(subscriptionName)
.build();
PubsubAcker acker = PubsubAcker
.builder()
.setConfig(ackerConfig)
.build();

// Pull and consume continuously.
puller
.pullAll()
.concatMap(pullResponse -> {
int messageCount = pullResponse.getReceivedMessages().size();
System.out.format("Just got awesome %d messages!%n", messageCount);
return acker.ackPullResponse(pullResponse);
})
.subscribe();
```

Note that `PubsubAcker` provides the following auxiliary methods:

- `ackPullResponse(PubsubPullResponse pullResponse)`
- `ackMessages(List messages)`
- `ackMessage(PubsubReceivedMessage message)`
- `ackIds(List ackIds)`
- `ackId(String ackId)`
- `ack(PubsubAckRequest ackRequest)`

== Utilities

The project ships a couple of utilities where you might find them handy in
assembling your messaging pipeline. Even though they are optional, we strongly
recommend their usage.

=== Rate limiter

We strongly encourage everyone to employ the provided rate limiter while
consuming messages. The rationale is simple: In order to avoid burning GCP bills
for nothing, you better cut down the consumption rate if the rest of the system
is indicating a failure.

`reactor-pubsub` provides the following utilities for rate limiting purposes:

- `RateLimiter` is a simple (_package local_) rate limiter.

- `StagedRateLimiter` is a rate limiter with multiple stages. Each stage is
composed of a _success rate_ and _failure rate_ pair. In the absence of
failure acknowledgements, excessive permit claims replace the active stage
with the next faster one, if there is any. Likewise, excessive failure
acknowledgements replace the active stage with the next slower one, if there
is any.

One can employ the `StagedRateLimiter` for a `PubsubPuller` as follows:

```java
// Create the staged rate limiter and its reactor decorator.
String stagedRateLimiterName = projectName + '/' + subscriptionName;
StagedRateLimiter stagedRateLimiter = StagedRateLimiter
.builder()
.setName(stagedRateLimiterName)
.setSpec("1/1m:, 1/30s:1/1m, 1/1s:2/1m, :1/3m") // (default)
.build();
StagedRateLimiterReactorDecoratorFactory stagedRateLimiterReactorDecoratorFactory =
StagedRateLimiterReactorDecoratorFactory
.builder()
.setStagedRateLimiter(stagedRateLimiter)
.build();
Function, Flux> stagedRateLimiterFluxDecorator =
stagedRateLimiterReactorDecoratorFactory.ofFlux();

// Employ the staged rate limiter.
puller
.pullAll()
.concatMap(pullResponse -> {
// ...
return acker.ackPullResponse(pullResponse);
})
.transform(stagedRateLimiterFluxDecorator)
.subscribe();
```

The stages are described in increasing success rate limit order using a
specification format as follows: `1/1m:, 1/30s:1/1m, 1/1s:2/1m, :1/3m`. The
specification is a comma-separated list of _[success rate limit]:[failure rate
limit]_ pairs where, e.g., `1/1h` is used to denote a rate limit of a single
permit per 1 hour. Temporal unit must be one of h(ours), m(inutes), or
s(econds). The initial failure rate limit and the last success rate limit can be
omitted to indicate no rate limits.) This example will result in the following
stages.

.`StagedRateLimiter` stages for specification `1/1m:, 1/30s:1/1m, 1/1s:2/1m, :1/3m`.
|===
| stage | success rate limit | failure rate limit

| 1
| 1/1m (once per minute)
| infinite

| 2
| 1/30s (once per 30 second)
| 1/1m (once per minute)

| 3
| 1/1s (once per second)
| 2/1m (twice per minute)

| 4
| infinite
| 1/3m (once per 3 minute)
|===

By contract, initially the active stage is set to the one with the slowest
success rate limit.

=== Bounded `SchedulerExecutorService`

`PubsubPuller`, `PubsubAccessTokenCache`, and
`StagedRateLimiterReactorDecoratorFactory` optionally receive either a
`ScheduledExecutorService` or a Reactor `Scheduler` in their builders for timed
invocations. One can explicitly change the implicit scheduler used by any
Reactor `Mono` or `Flux` as well. (See
https://projectreactor.io/docs/core/release/reference/#schedulers[Threading and
Schedulers] in Reactor reference manual.) We strongly suggest employing a common
dedicated scheduler for all these cases with a _bounded task queue_. That said,
unfortunately neither the default Reactor ``Scheduler``s nor the
`ScheduledExecutorService` implementations provided by the Java Standard library
allow one to put a bound on the task queue size. This shortcoming is severely
prone to hiding backpressure problems. (See the
http://cs.oswego.edu/pipermail/concurrency-interest/2019-April/016861.html[the
relevant concurrency-interest discussion].) To mitigate this, we provide
`BoundedScheduledThreadPoolExecutor` wrapper and strongly recommend to employ it
in your Reactor assembly line. Even though this will incur an extra thread
context switching cost, this is almost negligible for a majority of the use
cases and the benefit will overweight this minor expense. The usage is as simple
as follows:

```java
// Create the executor.
ScheduledThreadPoolExecutor executor =
new ScheduledThreadPoolExecutor(
Runtime.getRuntime().availableProcessors());
BoundedScheduledThreadPoolExecutor boundedExecutor =
new BoundedScheduledThreadPoolExecutor(100, executor);
Scheduler scheduler = Schedulers.fromExecutorService(boundedExecutor);

// Set the access token cache executor.
PubsubAccessTokenCache
.builder()
.setExecutorService(executor)
// ...
.build();

// Set the puller scheduler.
PubsubPuller puller = PubsubPuller
.builder()
.setScheduler(scheduler)
// ...
.build();

// Employ the scheduler in the Reactor pipeline.
puller
.pullAll()
.concatMap(pullResponse -> {
// ...
return acker.ackPullResponse(pullResponse);
})
.flatMap(this::doSomeOtherAsyncIO)
.subscribeOn(scheduler)
.subscribe();
```

== F.A.Q

=== How can I avoid stream termination when pull fails?

It is a common pitfall to build a message consumption pipeline as follows:

```java
puller
.pullAll()
.concatMap(pullResponse -> businessLogic
.execute(pullResponse)
.then(acker.ackPullResponse(pullResponse)))
.subscribe();
```

Here the `Flux` returned by `pullAll()` will be terminated
if any of the methods along the reactive chain (`pullAll()`,
`businessLogic.execute()`, `ack()`, etc.) throws an exception. No matter how
many `doOnError()`, `onErrorResume()` you plaster there, the damage has been
done, the subscription has been cancelled, and `pullAll()` will not continue
pulling anymore. Note that this applies to any
https://projectreactor.io/docs/core/release/reference/#flux[`Flux`] and nothing
new to the way we leverage it here. To prevent such premature stream
termination, you need to retry subscribing. While this can be done as simple as
calling `retry()`, you might also want to check out more fancy options like
`retryBackoff()`. As one final remark, make sure you deal (log?) with the error
prior to retrying.

=== How can I retry ack's?

See
https://projectreactor.io/docs/core/release/reference/#faq.exponentialBackoff[How
to use `retryWhen` for exponential backoff?] in Reactor reference manual.

=== How can I change the GCP credentials?

Unless one provided, all `PubsubPublisher`, `PubsubPuller` and `PubsubAcker`
classes use the `PubsubAccessTokenCache.getDefaultInstance()` and
`PubsubClient.getDefaultInstance()` defaults. By default,
`PubsubAccessTokenCache` leverages `GoogleCredentials.getApplicationDefault()`
provided by the `google-auth-library-oauth2-http` artifact. This function
determines the credentials by trying out the following steps in order:

. Credentials file pointed to by the `GOOGLE_APPLICATION_CREDENTIALS`
environment variable

. Credentials provided by the Google Cloud SDK `gcloud auth application-default
login` command

. Google App Engine built-in credentials

. Google Cloud Shell built-in credentials

. Google Compute Engine built-in credentials

Rather than relying on this mechanism, one can explicitly set the credentials
as follows:

```java
// Create the access token cache.
PubsubAccessTokenCache accessTokenCache = PubsubAccessTokenCache
.builder()
.setCredentials("awesome-password") // null falls back to the defaults
.build();

// Create the client.
PubsubClient client = PubsubClient
.builder()
.setAccessTokenCache(accessTokenCache)
.build();

// Create the puller.
PubsubPuller puller = PubsubPuller
.builder()
.setClient(client)
// ...
.build();

// Create the ack'er.
PubsubAcker acker = PubsubAcker
.builder()
.setClient(client)
// ...
.build();

// Create the publisher.
PubsubPublisher publisher = PubsubPublisher
.builder()
.setClient(client)
// ...
.build();
```

=== How can I enable metrics?

Given http://micrometer.io/[Micrometer] is used for metrics, you first need to
have it in your list of dependencies:

```xml

io.micrometer
micrometer-core
${micrometer.version}

```

Both `PubsubClient` and `StagedRateLimiterReactorDecoratorFactory` provide
means to configure metrics. Each can be simply configured as follows:

```java
// Create a meter registry.
MeterRegistry meterRegistry = ...;

// Pass the meter registry to the Pub/Sub client.
PubsubClient
.builder()
.setMeterRegistry(meterRegistry)
.setMeterNamePrefix("pubsub.client") // default
.setMeterTags(Collections.emptyMap()) // default
// ...
.build();

// Pass the meter registry to the rate limiter factory.
StagedRateLimiterReactorDecoratorFactory
.builder()
.setMeterRegistry(meterRegistry)
.setMeterNamePrefix("pubsub.stagedRateLimiter") // default
.setMeterTags(Collections.emptyMap()) // default
// ...
.build();
```

Above will publish metrics with the following footprints:

|===
|Name |Tags |Description

|`pubsub.client.publish.latency`
|`projectName`, `topicName`, `result`
|`publish` request latency

|`pubsub.client.publish.count`
|`projectName`, `topicName`
|``publish``ed message count

|`pubsub.client.{pull,ack}.latency`
|`projectName`, `subscriptionName`, `result`
|`pull` and `ack` request latency

|`pubsub.client.{pull,ack}.count`
|`projectName`, `subscriptionName`
|``pulled``ed/``ack``ed message count

|`pubsub.stagedRateLimiter.permitWaitPeriod`
|`name`
|permit wait period distribution summary
|===

There are a couple of details that need further elaboration here:

- When `PubsubPullerConfig#pullPeriod` is set to zero (default), `pull` requests
will only get completed when there are messages. Hence, one might experience
high latencies in queues that frequently become empty.

- When `PubsubPullerConfig#pullPeriod` is set to a value greater than zero,
repeatedly executed `pull` requests by `PubsubPuller#pullAll()` will get
followed by a `pullPeriod` delay after an empty response. Hence the published
`pubsub.client.pull.latency` metrics are a combination of both the full and
the empty responses.

- As of this writing, Pub/Sub blocks every `pull` requests at least ~1.5 seconds
before returning an empty response.

=== How can I run it against the Pub/Sub emulator?

Pub/Sub provides an https://cloud.google.com/pubsub/docs/emulator[emulator]
to test your applications locally. In order to use it in combination with
`reactor-pubsub`, you need to configure the `baseUrl` of the `PubsubClient` as
follows:

```java
// Create a custom client.
PubsubClientConfig clientConfig = PubsubClientConfig
.builder()
.setBaseUrl("http://localhost:8085")
.build();
PubsubClient client = PubsubClient
.builder()
.setConfig(clientConfig)
.build();

// Create a publisher.
PubsubPublisher publisher = PubsubPublisher
.builder()
.setClient(client)
// ...
.build();

// Create a puller.
PubsubPuller puller = PubsubPuller
.builder()
.setClient(client)
// ...
.build();

// Create an acker.
PubsubAcker acker = PubsubAcker
.builder()
.setClient(client)
// ...
.build();
```

=== How fast is ``reactor-pubsub``?

One of the most frequent questions `reactor-pubsub` is challenged with is how
does it perform given the official Pub/Sub client uses Protobuf over HTTP/2
(gRPC), whereas `reactor-pubsub` uses JSON over HTTP/1?

Before going into convincing figures to elaborate on ``reactor-pubsub``s
performance characteristics, there is one thing that deserves attention in
particular: _JSON over HTTP/1 is a deliberate design decision for simplicity_
rather than a fallback due to technical limitations. Even though it is
opinionated, ``reactor-pubsub`` strives to serve as a Pub/Sub client that
leverages frequently used tools (e.g., JSON) and idioms Java developers are
accustomed to. Further, in case of failures, it should be trivial to spot the
smoking gun using a decent IDE debugger.

`reactor-pubsub` source code ships a reproducible link:benchmark[benchmark]
along with its link:benchmark/results.html[results]. As shared there, one can
retrieve (i.e., `pull` & `ack`) a payload of 781 MiB in 2,083 ms using two
2.70GHz CPU cores, pull batch size 50, pull concurrency 5, and message payload
length 16 KiB. That is, 11,998 messages per second on a single core!* Do you
still need more juice? 🙇 Go ahead and create a ticket with your use case,
observed performance, and implementation details.

image:benchmark/results.png[Benchmark Results]

== Historical account

I (_Volkan Yazıcı_) would like to take this opportunity to share the historical
account from my perspective to justify the effort and defend it against any
potential https://en.wikipedia.org/wiki/Not_invented_here[NIH] syndrome
accusations.

*Why did I feel a need to implement a Pub/Sub Java driver from scratch?* At
https://bol.com[bol.com], we heavily use Pub/Sub. There we started our pursuit
like the rest of the Pub/Sub users with
https://cloud.google.com/pubsub/docs/quickstart-client-libraries[the official
Java drivers] provided by Google. Later on we started bumping into backpressure
problems: tasks on the shared `ScheduledExecutorService` were somehow awkwardly
dating back and constantly piling up. That was the point I introduced a
link:src/main/java/com/vlkan/pubsub/util/BoundedScheduledThreadPoolExecutor.java[BoundedScheduledThreadPoolExecutor]
and shit hit the fan. I figured the official Pub/Sub driver was ramming the
fetched batch of messages through the shared executor. My first reaction was to
cut down the pull buffer size and the concurrent pull count. That solved a
majority of our backpressure-related problems, though created a new one:
efficiency. Then I started examining the source code and wasted quite a lot of
time trying to make forsaken
https://github.com/googleapis/gax-java/blob/master/gax/src/main/java/com/google/api/gax/batching/FlowControlSettings.java[FlowControlSettings]
work. This disappointing inquiry resulted in something remarkable: I understood
how Pub/Sub works and amazed by the extent of complexity for such a simple task.
I have already been using Reactive Streams (RxJava and Reactor) every single
work day in the last five years and compiled a thick collection of lessons and
recipes out of it. The more I examined the official Pub/Sub Java driver source
code, the more I was convinced that I could very well engineer this into
something way more simple. I know how to pump JSON payloads over HTTP via
Reactor Netty and enjoy a backpressure-aware, reactive comfort out of the box.
But that wasn't the tipping point I had decided to implement my own Pub/Sub Java
driver. I made my mind when I witnessed that
https://github.com/spring-cloud/spring-cloud-gcp/pull/1461#discussion_r274098603[Google
engineers are clueless about these problems].

*Why all the fuss about the rate limiting?* One morning I came to the office
and read an e-mail from one of the platform teams asking how come we managed to
burn hundreds of dollars worth of Pub/Sub messaging in the middle of the night.
One of the application (non-critical) databases happened to go down for a couple
of hours and during that period nodes constantly sucked up messages and nack'ed
them due to the database failure. This is an opinionated Pub/Sub driver and in
my opinion you should not relentlessly burn Pub/Sub bills if the rest of the
application is shouting out there is something going on wrong. Hence, please
configure and use the god damn rate limiter.

== Security policy

If you have encountered an unlisted security vulnerability or other unexpected
behaviour that has security impact, please report them privately to the
mailto:[email protected][[email protected]] email address.

== Contributors

- https://github.com/akoshterek[Alex Koshterek]
- https://github.com/berkaybuharali[Berkay Buharalı]
- https://github.com/luiccn[Luiz Neto]
- https://github.com/bsideup[Sergei Egorov]

== License

Copyright © 2019-2020 https://vlkan.com/[Volkan Yazıcı]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use
this file except in compliance with the License. You may obtain a copy of the
License at

```
http://www.apache.org/licenses/LICENSE-2.0
```

Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.