https://github.com/wlandau/txtq

A small message queue for interprocess communication

---
output:
  github_document:
    html_preview: false
---

```{r knitrsetup, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
options(tibble.print_min = 5, tibble.print_max = 5)
```

```{r mainexample, echo = FALSE}
suppressMessages(suppressWarnings(library(txtq)))
```

[![CRAN](https://www.r-pkg.org/badges/version/txtq)](https://cran.r-project.org/package=txtq) [![check](https://github.com/wlandau/txtq/workflows/check/badge.svg)](https://github.com/wlandau/txtq/actions?query=workflow%3Acheck) [![Codecov](https://codecov.io/github/wlandau/txtq/coverage.svg?branch=main)](https://codecov.io/github/wlandau/txtq?branch=main)

# txtq - a small message queue for parallel processes

The `txtq` package helps parallel R processes send messages to each other. Let's say Process A and Process B are working on a parallel task together. First, both processes grab the queue.

```{r setup}
path <- tempfile() # Define a path to your queue.
path # In real life, temp files go away when the session exits, so be careful.
q <- txtq(path) # Create a new queue or recover an existing one.
q$validate() # Check if the queue is corrupted.
```

The queue uses text files to keep track of your data.

```{r files}
list.files(q$path()) # The queue's underlying text files live in this folder.
q$list() # You have not pushed any messages yet.
```

Then, Process A sends instructions to Process B.

```{r ab}
q$push(title = "Hello", message = "process B.")
q$push(
  title = c("Calculate", "Calculate"),
  message = c("sqrt(4)", "sqrt(16)")
)
q$push(title = "Send back", message = "the sum.")
```

You can inspect the contents of the queue from either process.

```{r list}
q$list()
q$list(1) # You can specify the number of messages to list.
q$count()
```

As Process A is pushing the messages, Process B can consume them.

```{r pop2}
q$pop(2) # If you pass 2, you are assuming the queue has >=2 messages.
```

Those popped messages are not technically in the queue any longer.

```{r list2}
q$list()
q$count() # Number of messages technically in the queue.
```

But we still have a full log of all the messages that were ever sent.

```{r listafterpop}
q$log()
q$total() # Number of messages that were ever queued.
```

Let's let Process B get the rest of the instructions.

```{r bfinishconsume}
q$pop() # q$pop() with no arguments just pops one message.
q$pop() # Call q$pop(-1) to pop all the messages at once.
```
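
In practice, Process B often does not know how many messages are waiting. Below is a minimal sketch of a defensive consumer loop. The helper name `drain_queue()` is hypothetical (it is not part of `txtq`), and the example uses its own throwaway queue so it runs independently of the one above.

```r
library(txtq)

# Hypothetical helper (not part of txtq): pop messages one at a time
# until the queue reports that nothing is left.
drain_queue <- function(q) {
  popped <- list()
  while (q$count() > 0) {
    popped[[length(popped) + 1]] <- q$pop(1)
  }
  do.call(rbind, popped) # NULL if the queue was already empty.
}

q_demo <- txtq(tempfile())
q_demo$push(title = c("a", "b", "c"), message = c("1", "2", "3"))
msgs <- drain_queue(q_demo)
msgs$title
n_left <- q_demo$count() # Nothing should be waiting any more.
q_demo$destroy()
```

This sketch assumes a single consumer. The check with `q$count()` and the following `q$pop(1)` are separate calls, so with several consumers another process could pop a message in between.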

Now let's say Process B follows the instructions in the messages. The last step is to send the results back to Process A.

```{r sendback}
q$push(title = "Results", message = as.character(sqrt(4) + sqrt(16)))
```
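
How Process B follows the instructions is up to you. As a rough sketch, assuming the `"Calculate"` messages hold R expressions that are safe to evaluate (only do this with text from a queue you trust), Process B could compute the sum from the popped messages instead of hard-coding it. The helper name `run_instructions()` is hypothetical, and the example uses its own queue so it runs on its own.

```r
library(txtq)

# Hypothetical helper: evaluate every "Calculate" message as R code
# and return the sum of the results.
run_instructions <- function(msgs) {
  exprs <- as.character(msgs$message[msgs$title == "Calculate"])
  values <- vapply(
    exprs,
    function(x) as.numeric(eval(parse(text = x))),
    FUN.VALUE = numeric(1)
  )
  sum(values)
}

q_demo <- txtq(tempfile())
q_demo$push(
  title = c("Calculate", "Calculate"),
  message = c("sqrt(4)", "sqrt(16)")
)
instructions <- q_demo$pop(-1) # Pop everything that is waiting.
result <- run_instructions(instructions)
q_demo$push(title = "Results", message = as.character(result))
reply <- q_demo$list()$message # The reply waiting for Process A.
q_demo$destroy()
```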

Process A can now see the results.

```{r aconsume}
q$pop()
```

The queue can grow large if you are not careful. Popped messages are kept in the database file.

```{r largedb}
q$push(title = "not", message = "popped")
q$count()
q$total()
q$list()
q$log()
```

To keep the database file from getting too big, you can clean out the popped messages.

```{r clean}
q$clean()
q$count()
q$total()
q$list()
q$log()
```
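
For a long-lived queue, you may prefer to clean only once enough popped messages have piled up. Here is a minimal sketch, assuming a hypothetical helper `clean_if_bloated()` and an arbitrary threshold. It treats `q$total() - q$count()` as the number of popped messages still stored, which follows from the descriptions of those two methods above.

```r
library(txtq)

# Hypothetical helper: clean the queue only if more than `max_popped`
# popped messages are still stored. Returns TRUE if it cleaned.
clean_if_bloated <- function(q, max_popped = 100) {
  n_popped <- q$total() - q$count()
  if (n_popped > max_popped) {
    q$clean()
    return(TRUE)
  }
  FALSE
}

q_demo <- txtq(tempfile())
q_demo$push(title = as.character(1:5), message = letters[1:5])
q_demo$pop(4) # Four popped messages now linger in the database file.
cleaned <- clean_if_bloated(q_demo, max_popped = 3)
remaining <- q_demo$count()
logged <- nrow(q_demo$log()) # Only unpopped messages are left in the log.
q_demo$destroy()
```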

You can also reset the queue to remove all messages, popped or not.

```{r reset}
q$reset()
q$count()
q$total()
q$list()
q$log()
```

When you are done, you can destroy the files in the queue.

```{r destroy}
q$destroy()
file.exists(q$path())
```

This entire time, the queue was locked when either process was trying to create, access, or modify it. That way, the results stay correct even when multiple processes try to read or change the data at the same time.
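
To see the locking at work, here is a small sketch that lets two worker processes push to the same queue at the same time. It assumes the base R `parallel` package, a local file system, and that `txtq` is installed where the workers can load it. The worker labels and the message count are arbitrary choices for illustration.

```r
library(txtq)
library(parallel)

path <- tempfile()
q <- txtq(path) # Create the queue before the workers start.

cl <- makeCluster(2) # Two separate R processes on this machine.
invisible(clusterApply(
  cl,
  x = c("worker 1", "worker 2"),
  fun = function(id, path) {
    q <- txtq::txtq(path) # Each worker recovers the same queue.
    for (i in seq_len(10)) {
      q$push(title = id, message = as.character(i))
    }
    TRUE
  },
  path = path
))
stopCluster(cl)

n_total <- q$count() # All pushes from both workers should be here.
counts <- table(as.character(q$list()$title))
counts
q$destroy()
```

If the locking works as described, no message is lost and each worker accounts for 10 of the 20 messages, although the order in which the two workers interleave can vary between runs.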

## Importing

You can import a `txtq` into another `txtq`. The unpopped messages are grouped together and sorted by timestamp. The same goes for the popped messages.

```{r import}
q_from <- txtq(tempfile())
q_to <- txtq(tempfile())
q_from$push(title = "from", message = "popped")
q_from$push(title = "from", message = "unpopped")
q_to$push(title = "to", message = "popped")
q_to$push(title = "to", message = "unpopped")

q_from$pop()

q_to$pop()

q_to$import(q_from)

q_to$list()

q_to$log()
```

# Network file systems

As an interprocess communication tool, `txtq` relies on the [`filelock`](https://github.com/r-lib/filelock) package to prevent race conditions. Unfortunately, `filelock` cannot prevent race conditions on network file systems (NFS), which means neither can `txtq`. In other words, on certain common kinds of clusters, `txtq` cannot reliably manage interprocess communication for processes on different computers. However, it can still serve as a low-tech replacement for a simple non-threadsafe database.

# Similar work

## liteq

[Gábor Csárdi](https://github.com/gaborcsardi)'s [`liteq`](https://github.com/r-lib/liteq) package offers essentially the same functionality implemented with SQLite. It has some additional features, such as the ability to detect crashed workers and re-queue failed messages, but it was in an early stage of development at the time `txtq` was released.

## Other message queues

There is a plethora of message queues beyond R, most notably [ZeroMQ](https://zeromq.org) and [RabbitMQ](https://www.rabbitmq.com/). In fact, [Jeroen Ooms](https://github.com/jeroen) and [Whit Armstrong](https://github.com/armstrtw) maintain [`rzmq`](https://github.com/ropensci/rzmq), a package to work with [ZeroMQ](https://zeromq.org) from R. Even in this landscape, `txtq` has advantages.

1. The `txtq` user interface is friendly, and its internals are simple. No prior knowledge of sockets or message-passing is required.
2. `txtq` is lightweight, R-focused, and easy to install. It only depends on R and a few packages on [CRAN](https://cran.r-project.org).
3. Because `txtq` is file-based,
    - The queue persists even if your work crashes, so you can diagnose failures with `q$log()` and `q$list()`.
    - Job monitoring is easy. Just open another R session and call `q$list()` while your work is running.
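
The monitoring idea in the last bullet can be sketched as a small status function to call from a second R session. The helper `queue_status()` is hypothetical, and `job_path` is a stand-in for wherever your running job keeps its queue.

```r
library(txtq)

# Hypothetical helper: summarize a queue given only its path.
queue_status <- function(path) {
  q <- txtq(path) # Recovers the existing queue at this path.
  c(
    waiting = q$count(),
    ever_queued = q$total()
  )
}

# Stand-in for a running job that has queued three tasks and taken one.
job_path <- tempfile()
job_queue <- txtq(job_path)
job_queue$push(title = c("job1", "job2", "job3"), message = c("x", "y", "z"))
job_queue$pop(1)

status <- queue_status(job_path)
status
job_queue$destroy()
```

Because `txtq()` creates a new queue when none exists at the path, a mistyped path yields an empty queue rather than an error, so double-check the path when monitoring.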