https://github.com/chjj/bthreads

worker threads for javascript

# bthreads

A [worker_threads] wrapper for node.js. Provides transparent fallback for
pre-11.7.0 node.js (via `child_process`) as well as browser web workers.
Browserifiable, webpack-able.

## Usage

``` js
const threads = require('bthreads');

if (threads.isMainThread) {
  const worker = new threads.Worker(__filename, {
    workerData: 'foo'
  });

  worker.on('message', console.log);
  worker.on('error', console.error);

  worker.on('exit', (code) => {
    if (code !== 0)
      console.error(`Worker stopped with exit code ${code}.`);
  });
} else {
  threads.parentPort.postMessage(threads.workerData + 'bar');
}
```

Output (with node@<11.7.0):

``` bash
$ node --experimental-worker threads.js
foobar
$ node threads.js
foobar
```

## Backends

bthreads has 4 backends and a few layers of fallback:

- `worker_threads` - Uses the still experimental [worker_threads] module in
node.js. Only usable prior to node.js v11.7.0 if `--experimental-worker` is
passed on the command line.
- `child_process` - Leverages the [child_process] module in node.js to emulate
worker threads.
- `web_workers` - [Web Workers API][web_workers] (browser only).
- `polyfill` - A [polyfill] for the web workers API.

The current backend is exposed as `threads.backend`. Note that the current
backend can be set with the `BTHREADS_BACKEND` environment variable.
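
As a rough illustration of the kind of fallback described above, the sketch below picks a backend name in node.js by honoring an override first and then probing for `worker_threads`. The `detectBackend` helper is hypothetical, and bthreads itself may well decide differently (it also considers stability and browser environments).

```js
// Hypothetical helper: not part of bthreads.
function detectBackend(env = process.env) {
  // An explicit override wins, mirroring the BTHREADS_BACKEND variable.
  if (env.BTHREADS_BACKEND)
    return env.BTHREADS_BACKEND;

  try {
    // Throws on node.js versions where the module is unavailable.
    require('worker_threads');
    return 'worker_threads';
  } catch (e) {
    return 'child_process';
  }
}

console.log(detectBackend());
```

Passing the environment in as an argument keeps the helper easy to exercise with a fake environment object.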

### Explicit Entry Points

`require('bthreads')` will automatically pick the backend depending on what is
stable, but in some cases that may not be what you want. Because of this,
there are also more explicit entry points:

- `require('bthreads/process')` - Always the `child_process` backend,
regardless of node version.
- `require('bthreads/threads')` - Always the `worker_threads` backend,
regardless of node version.

## Caveats

Some caveats for the `child_process` backend:

- The transfer list only works for MessagePorts. ArrayBuffers won't _actually_
be transferred.
- `options.workerData` probably has a limited size depending on platform (the
maximum size of an environment variable).
- `SharedArrayBuffer` does not work and will throw an error if sent.
- In order to avoid memory leaks, MessagePorts (all aside from the parentPort)
do not hold event loop references (`ref()` and `unref()` are noops).
- Prior to node 10, objects like `Proxy`s can be serialized and cloned as they
cannot be detected from regular javascript.
- `SHARE_ENV` does not work and will throw an error if passed.
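
The environment-variable size limit mentioned above suggests that `workerData` is handed to the child process through its environment. The following is a minimal sketch of that general technique using only node.js built-ins. The variable name `DEMO_WORKER_DATA` and the `runWithWorkerData` helper are made up for illustration and are not what bthreads uses internally.

```js
const { spawnSync } = require('child_process');

// Hypothetical helper: pass data to a child process via an environment variable.
function runWithWorkerData(workerData) {
  // Source for the child: read the data back and append 'bar'.
  const childSource = `
    const data = JSON.parse(process.env.DEMO_WORKER_DATA);
    process.stdout.write(data + 'bar');
  `;

  const result = spawnSync(process.execPath, ['-e', childSource], {
    env: { ...process.env, DEMO_WORKER_DATA: JSON.stringify(workerData) },
    encoding: 'utf8'
  });

  if (result.error)
    throw result.error;

  return result.stdout;
}

console.log(runWithWorkerData('foo')); // foobar
```

Because the payload travels in the environment, anything larger than the platform's environment limit would fail to spawn, which is the caveat described in the list.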

Caveats for the `web_workers` backend:

- `options.workerData` possibly has a limited size depending on the browser
(the maximum size of `options.name`).
- `options.eval` requires a "bootstrap" file for code. This is essentially a
bundle which provides all the necessary browserify modules (such that
`require('path')` works, for example), as well as bthreads itself. By
default, bthreads will pull in its own [bundle] as an [npm
package][bthreads-bundle] from [unpkg.com]. If using the default bootstrap
file, you must have `blob:` and/or `data:` set as a [Content-Security-Policy]
source (see [content-security-policy.com] for a guide). When using a bundler,
note that the bundler will _not_ be able to compile the eval'd code. This
means that `require` will have limited usability (restricted to only core
browserify modules and `bthreads` itself).
- The `close` event for MessagePorts only has partial support (if a thread
suddenly terminates, `close` will not be emitted for any remote ports).
This is because the `close` event is not yet a part of the standard Web
Worker API. See https://github.com/whatwg/html/issues/1766 for more info.
- `SHARE_ENV` does not work and will throw an error if passed.
- `workerData` is serialized as json instead of using the structured clone
algorithm. This limits what can be sent as `workerData`. This was done to
reduce code size since serializing structured data is non-trivial.
- The `stdio`, `stdin`, and `stdout` options will throw an error if passed.
STDIO streams do not exist in the browser. This is done to reduce code size.
- To make sure bthreads is aware of the `Buffer` object in the browser, you
must assign `bthreads.Buffer` like so: `bthreads.Buffer = Buffer;`. Once
again, this was done to reduce code size.
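
To make the JSON limitation above concrete, here is a small sketch of what a plain JSON round trip does to values that the structured clone algorithm would normally preserve. It uses only standard JavaScript and does not touch bthreads.

```js
const workerData = {
  when: new Date(0),
  tags: new Set(['a', 'b']),
  missing: undefined,
  count: 3
};

// Simulate serializing workerData as JSON and parsing it in the worker.
const received = JSON.parse(JSON.stringify(workerData));

console.log(typeof received.when);  // 'string' (the Date became an ISO string)
console.log(received.tags);         // {} (the Set contents are lost)
console.log('missing' in received); // false (undefined properties are dropped)
console.log(received.count);        // 3
```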

Caveats for the `polyfill` backend:

- Code will not actually run in a separate context (obviously).
- `importScripts` will perform a synchronous `XMLHttpRequest` and potentially
freeze the UI. Additionally, XHR is bound to certain cross-origin rules that
`importScripts` is not.
- Similarly, worker scripts are also spawned using XHR. This means they are
restricted by the `connect-src` `Content-Security-Policy` directive
specifically (instead of perhaps the `worker-src` directive).
- All transferred `ArrayBuffer`s behave as if they were `SharedArrayBuffer`s
(i.e. they're not neutered). Be careful!
- Uncaught errors will not be caught and emitted as `error` events on worker
objects.
- Worker scripts cannot be executed as ES modules.
- Exotic objects like `Proxy`s can be serialized and cloned as they cannot be
detected from regular javascript.
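
For reference, this is what "neutered" means with a real transfer. The sketch uses node.js's built-in `worker_threads` `MessageChannel` (available without a flag on recent node.js versions), not the polyfill. With the `polyfill` backend the final length would stay at 8.

```js
const { MessageChannel } = require('worker_threads');

const { port1 } = new MessageChannel();
const buf = new ArrayBuffer(8);

console.log(buf.byteLength); // 8

// Listing the buffer in the transfer list moves it instead of copying it.
port1.postMessage(buf, [buf]);

// The sender's reference is now detached ("neutered") and reports length 0.
console.log(buf.byteLength); // 0

// Close the port so it does not keep the process alive.
port1.close();
```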

Caveats for all of the above:

- For a number of reasons, bthreads has to walk the objects you pass in to
send. Note that the cloning function may get confused if you attempt to send
the raw prototype of a built-in object (for example
`worker.postMessage(Buffer.prototype)`).

Finally, caveats for the `worker_threads` backend:

- It is somewhat unstable and crashes a lot with assertion failures,
particularly when there is an uncaught exception or the thread is forcefully
terminated. Note that `worker_threads` is still experimental in node.js!
- Native modules will be unusable if they are not built as context-aware
addons.

## High-level API

The low-level node.js API is not very useful on its own. bthreads optionally
provides an API similar to [bsock].

Example (for brevity, the async wrapper is not included below):

``` js
const threads = require('bthreads');

if (threads.isMainThread) {
  const thread = new threads.Thread(__filename);

  thread.bind('event', (x, y) => {
    console.log(x + y);
  });

  console.log(await thread.call('job', ['hello']));
} else {
  const {parent} = threads;

  parent.hook('job', async (arg) => {
    return arg + ' world';
  });

  parent.fire('event', ['foo', 'bar']);
}
```

Output:

``` js
foobar
hello world
```
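
To give a feel for how a `call`/`hook` pair can be layered on top of plain message passing, here is a minimal request/response sketch over node.js's built-in `MessageChannel`. The `Endpoint` class and its message shape are assumptions made for illustration. This is not bthreads' implementation, and it leaves out timeouts, transfer lists, and `bind`/`fire` events.

```js
const { MessageChannel } = require('worker_threads');

// Hypothetical endpoint: matches responses to requests by numeric ID.
class Endpoint {
  constructor(port) {
    this.port = port;
    this.hooks = new Map();   // hook name -> handler
    this.pending = new Map(); // request ID -> { resolve, reject }
    this.nextId = 0;
    port.on('message', (msg) => this.handle(msg));
  }

  hook(name, handler) {
    this.hooks.set(name, handler);
  }

  call(name, args) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.port.postMessage({ type: 'call', id, name, args });
    });
  }

  async handle(msg) {
    if (msg.type === 'call') {
      try {
        const handler = this.hooks.get(msg.name);
        if (!handler)
          throw new Error(`No hook for ${msg.name}.`);
        const result = await handler(...msg.args);
        this.port.postMessage({ type: 'result', id: msg.id, result });
      } catch (err) {
        this.port.postMessage({ type: 'result', id: msg.id, error: err.message });
      }
      return;
    }

    // Otherwise this is a response: settle the matching pending call.
    const job = this.pending.get(msg.id);
    if (!job)
      return;
    this.pending.delete(msg.id);
    if (msg.error !== undefined)
      job.reject(new Error(msg.error));
    else
      job.resolve(msg.result);
  }
}

const { port1, port2 } = new MessageChannel();
const main = new Endpoint(port1);
const worker = new Endpoint(port2);

worker.hook('job', async (arg) => arg + ' world');

const demo = main.call('job', ['hello']).then((result) => {
  console.log(result); // hello world
  port1.close();       // closing one side closes the channel
  return result;
});
```

Both endpoints live in one thread here to keep the sketch self-contained. In practice one side would sit inside the worker.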

### Creating a thread pool

You may find yourself wanting to parallelize the same worker jobs. The
high-level API offers a thread pool object (`threads.Pool`) which will
automatically load balance and scale to the number of CPU cores.

``` js
const threads = require('bthreads');

if (threads.isMainThread) {
  const pool = new threads.Pool(__filename);

  const results = await Promise.all([
    pool.call('job1'), // Runs on thread 1.
    pool.call('job2'), // Runs on thread 2.
    pool.call('job3') // Runs on thread 3.
  ]);

  console.log(results);
} else {
  const {parent} = threads;

  Buffer.poolSize = 1; // Make buffers easily transferrable.

  parent.hook('job1', async () => {
    const buf = Buffer.from('job1 result');
    return [buf, [buf.buffer]]; // Transfer the array buffer.
  });

  parent.hook('job2', async () => {
    return 'job2 result';
  });

  parent.hook('job3', async () => {
    return 'job3 result';
  });
}
```
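
The "Runs on thread N" comments above hint at a rotation over lazily spawned threads. A simplified sketch of that idea follows. `RoundRobinPool` and its `spawn` callback are hypothetical stand-ins, and the real `threads.Pool` scheduling may differ.

```js
const os = require('os');

// Hypothetical stand-in for a thread pool; `spawn` creates one worker.
class RoundRobinPool {
  constructor(spawn, size = os.cpus().length) {
    this.spawn = spawn;
    this.size = Math.max(1, size); // Always allow at least one worker.
    this.workers = [];
    this.index = 0;
  }

  // Return the next worker in rotation, spawning lazily until the pool is full.
  next() {
    if (this.workers.length < this.size) {
      const worker = this.spawn(this.workers.length);
      this.workers.push(worker);
      return worker;
    }

    const worker = this.workers[this.index];
    this.index = (this.index + 1) % this.size;
    return worker;
  }
}

// Strings stand in for real threads to keep the sketch runnable anywhere.
const pool = new RoundRobinPool((id) => `thread ${id + 1}`, 3);
const picks = [];

for (let i = 0; i < 5; i++)
  picks.push(pool.next());

console.log(picks);
// [ 'thread 1', 'thread 2', 'thread 3', 'thread 1', 'thread 2' ]
```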

## Writing code for node and the browser

One of the remarkable features of bthreads is that it allows for static
analysis when bundling. The `threads.Pool` and `threads.Thread` objects resolve
their `filename` argument as if it were a `require()` from the calling file.

``` js
const thread = new threads.Thread('./worker.js');
```

The above line will resolve to `${__dirname}/worker.js` in node.js and
`${window.location}/worker.js` in the browser. In node.js, it is _not_ relative
to the current working directory! We accomplish this through various forms of
sorcery.

Why does this matter? Because it allows for browserify and/or webpack to do
static analysis on your code and ship your code (including workers) as a single
bundled file! Of course, this would require an extra browserify or webpack
plugin which adds some more initialization code for choosing the proper entry
point.

### How this works behind the scenes (for plugin implementers)

Statically analyzing the line above, the compiler should replace
`'./worker.js'` with `'bthreads-worker@[id]'`. When initializing the code,
`bthreads` should be implicitly required. `bthreads` will set an environment
variable called `process.env.BTHREADS_WORKER_INLINE` which contains the `[id]`
you generated previously, allowing you to determine which function to run
inside the worker thread.

In other words, when the compiler comes across:

``` js
const thread = new threads.Thread('./worker.js');
```

`./worker.js` should be included in the bundle and mapped to an ID (in our
case, we include it in the bundle with an ID of `1`).

Our line becomes:

``` js
const thread = new threads.Thread('bthreads-worker@1');
```

The bundle's main entry point should include some initialization code like:

``` js
requireBthreads();

if (process.env.BTHREADS_WORKER_INLINE)
  requireWorker(process.env.BTHREADS_WORKER_INLINE);
else
  requireMain();
```
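
The rewriting step described above could be prototyped roughly as follows. This sketch uses a regular expression for brevity, whereas a real plugin would more likely walk an AST. The `rewriteWorkers` function is hypothetical and no such bthreads plugin API is implied.

```js
// Hypothetical bundler step: map each worker path to an ID and rewrite the source.
function rewriteWorkers(source) {
  const ids = new Map(); // worker path -> numeric ID

  // Matches `new threads.Thread('./x.js'` or `new threads.Pool("./x.js"`.
  const pattern = /new threads\.(Thread|Pool)\((['"])(\.[^'"]+)\2/g;

  const output = source.replace(pattern, (match, cls, quote, file) => {
    if (!ids.has(file))
      ids.set(file, ids.size + 1);
    return `new threads.${cls}(${quote}bthreads-worker@${ids.get(file)}${quote}`;
  });

  return { output, ids };
}

const { output, ids } = rewriteWorkers(
  "const thread = new threads.Thread('./worker.js');"
);

console.log(output); // const thread = new threads.Thread('bthreads-worker@1');
console.log(ids.get('./worker.js')); // 1
```

The returned `ids` map is what the bundle's initialization code would consult to decide which worker module to run for a given `BTHREADS_WORKER_INLINE` value.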

## importScripts

In the browser, bthreads exposes a more useful version of `importScripts`
called `threads.require`.

``` js
const threads = require('bthreads');
const _ = threads.require('https://unpkg.com/underscore/underscore.js');
```

This should work for any library exposed as UMD or CommonJS. Note that
`threads.require` behaves more like `require` in that it caches modules
by URL.
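
Caching by URL can be pictured with a small wrapper like the one below. `createCachedRequire` and its `load` callback are hypothetical. The callback stands in for the fetch-and-evaluate step, which this sketch does not attempt to reproduce.

```js
// Hypothetical loader wrapper; `load` stands in for fetching and evaluating.
function createCachedRequire(load) {
  const cache = new Map(); // URL -> module exports

  return function cachedRequire(url) {
    if (!cache.has(url))
      cache.set(url, load(url));
    return cache.get(url);
  };
}

let loads = 0;

const demoRequire = createCachedRequire((url) => {
  loads += 1;
  return { url };
});

const a = demoRequire('https://unpkg.com/underscore/underscore.js');
const b = demoRequire('https://unpkg.com/underscore/underscore.js');

console.log(a === b); // true
console.log(loads);   // 1
```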

## More about eval'd browser code

Note that if you are eval'ing some code inside a script you plan to bundle with
browserify or webpack, `require` may get unintentionally transformed or
overridden. This generally happens when you are calling toString on a defined
function.

``` js
const threads = require('bthreads');

function myWorker() {
  const threads = require('bthreads');

  threads.parentPort.postMessage('foo');
}

const code = `(${myWorker})();`;
const worker = new threads.Worker(code, { eval: true });
```

The solution is to access `module.require` instead of `require`.

``` js
const threads = require('bthreads');

function myWorker() {
  const threads = module.require('bthreads');

  threads.parentPort.postMessage('foo');
}

const code = `(${myWorker})();`;
const worker = new threads.Worker(code, { eval: true });
```

## API

- Default API
  - `threads.isMainThread` - See [worker_threads] documentation.
  - `threads.parentPort` - See [worker_threads] documentation (worker only).
  - `threads.threadId` - See [worker_threads] documentation.
  - `threads.workerData` - See [worker_threads] documentation (worker only).
  - `threads.MessagePort` - See [worker_threads] documentation.
  - `threads.MessageChannel` - See [worker_threads] documentation.
  - `threads.Worker` - See [worker_threads] documentation.
- Helpers
  - `threads.backend` - A string indicating the current backend
    (`worker_threads`, `child_process`, `web_workers`, or `polyfill`).
  - `threads.browser` - `true` if a browser backend is being used.
  - `threads.location` - The current module URL (cross-platform
    `import.meta.url`).
  - `threads.filename` - The current module filename (cross-platform
    `__filename`).
  - `threads.dirname` - The current module dirname (cross-platform
    `__dirname`).
  - `threads.require(location)` - `importScripts()` wrapper (browser+worker
    only).
  - `threads.resolve(location)` - Resolve a URL or path to a filename. This is
    what `threads.require` calls internally.
  - `threads.exit(code)` - A reference to `process.exit`.
  - `threads.cores` - Number of CPU cores available.
- Options
  - `threads.Buffer` - In the browser, this must be set to the `Buffer` object
    in order for bthreads to be aware of buffers.
  - `threads.bufferify` - A boolean indicating whether to cast Uint8Arrays
    to Buffer objects after receiving. Only affects the high-level API. This
    option is on by default.
- High-Level API
  - `threads.Thread` - `Thread` Class (see below).
  - `threads.Port` - `Port` Class (see below).
  - `threads.Channel` - `Channel` Class (see below).
  - `threads.Pool` - `Pool` Class (see below).
  - `threads.parent` - A reference to the parent `Port` (worker only, see
    below).
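
As an illustration of what the `threads.bufferify` cast amounts to, a `Uint8Array` can be viewed as a node.js `Buffer` without copying its memory. The `toBuffer` helper below is hypothetical and is only a sketch of the general technique, not the code bthreads runs.

```js
// Hypothetical helper: wrap a Uint8Array's memory in a Buffer without copying.
function toBuffer(value) {
  if (Buffer.isBuffer(value))
    return value;

  if (value instanceof Uint8Array)
    return Buffer.from(value.buffer, value.byteOffset, value.byteLength);

  return value; // Leave everything else untouched.
}

const bytes = new Uint8Array([0x66, 0x6f, 0x6f]);
const buf = toBuffer(bytes);

console.log(Buffer.isBuffer(buf)); // true
console.log(buf.toString());       // foo
```

Passing `byteOffset` and `byteLength` matters when the `Uint8Array` is a view onto part of a larger `ArrayBuffer`.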

### Socket Class (abstract, extends EventEmitter)

- Constructor
  - `new Socket()` - Not meant to be called directly.
- Properties
  - `Socket#events` (read only) - A reference to the bind `EventEmitter`.
  - `Socket#closed` (read only) - A boolean representing whether the socket is
    closed.
- Methods
  - `Socket#bind(name, handler)` - Bind remote event.
  - `Socket#unbind(name, handler)` - Unbind remote event.
  - `Socket#hook(name, handler)` - Add hook handler.
  - `Socket#unhook(name)` - Remove hook handler.
  - `Socket#send(msg, [transferList])` - Send message, will be emitted as a
    `message` event on the other side.
  - `Socket#read()` (async) - Wait for and read the next `message` event.
  - `Socket#fire(name, args, [transferList])` - Fire bind event.
  - `Socket#call(name, args, [transferList], [timeout])` (async) - Call remote
    hook.
  - `Socket#hasRef()` - Test whether socket has reference.
  - `Socket#ref()` - Reference socket.
  - `Socket#unref()` - Clear socket reference.
- Events
  - `Socket@message(msg)` - Emitted on message received.
  - `Socket@error(err)` - Emitted on error.
  - `Socket@event(event, args)` - Emitted on bind event.

### Thread Class (extends Socket)

- Constructor
  - `new Thread(filename, [options])` - Instantiate thread with module.
  - `new Thread(code, [options])` - Instantiate thread with code.
  - `new Thread(function, [options])` - Instantiate thread with function.
- Properties
  - `Thread#online` (read only) - A boolean representing whether the thread is
    online.
  - `Thread#stdin` (read only) - A writable stream representing stdin (only
    present if `options.stdin` was passed).
  - `Thread#stdout` (read only) - A readable stream representing stdout.
  - `Thread#stderr` (read only) - A readable stream representing stderr.
  - `Thread#threadId` (read only) - An integer representing the thread ID.
- Methods
  - `Thread#open()` (async) - Wait for the `online` event to be emitted.
  - `Thread#close()` (async) - Terminate the thread and wait for `exit` event
    but also listen for errors and reject the promise if any occur (in other
    words, a better `async` version of `Thread#terminate`).
  - `Thread#wait()` (async) - Wait for thread to exit, but do not invoke
    `close()`. Also listen for errors and reject the promise if any occur.
- Events
  - `Thread@online()` - Emitted once thread is online.
  - `Thread@exit(code)` - Emitted on exit.

### Port Class (extends Socket)

- Constructor
  - `new Port()` - Not meant to be called directly.
- Methods
  - `Port#start()` - Open and bind port (usually automatic).
  - `Port#close()` (async) - Close the port and wait for `close` event, but
    also listen for errors and reject the promise if any occur.
  - `Port#wait()` (async) - Wait for port to exit, but do not invoke `close()`.
    Also listen for errors and reject the promise if any occur.
- Events
  - `Port@close()` - Emitted on port close.

### Channel Class

- Constructor
  - `new Channel()` - Instantiate channel.
- Properties
  - `Channel#port1` (read only) - A `Port` object.
  - `Channel#port2` (read only) - A `Port` object.

### Pool Class (extends EventEmitter)

- Constructor
  - `new Pool(filename, [options])` - Instantiate pool with module.
  - `new Pool(code, [options])` - Instantiate pool with code.
  - `new Pool(function, [options])` - Instantiate pool with function.
- Properties
  - `Pool#file` (read only) - A reference to the filename, function, or code
    that was passed in.
  - `Pool#options` (read only) - A reference to the options passed in.
  - `Pool#size` (read only) - Number of threads to spawn.
  - `Pool#events` (read only) - A reference to the bind `EventEmitter`.
  - `Pool#threads` (read only) - A `Set` containing all spawned threads.
- Methods
  - `Pool#open()` (async) - Populate and wait until all threads are online
    (otherwise threads will be lazily spawned).
  - `Pool#close()` (async) - Close all threads in pool, reject on errors.
  - `Pool#populate()` - Populate the pool with `this.size` threads (otherwise
    threads will be lazily spawned).
  - `Pool#next()` - Return the next thread in queue (this may spawn a new
    thread).
  - `Pool#bind(name, handler)` - Bind remote event for all threads.
  - `Pool#unbind(name, handler)` - Unbind remote event for all threads.
  - `Pool#hook(name, handler)` - Add hook handler for all threads.
  - `Pool#unhook(name)` - Remove hook handler for all threads.
  - `Pool#send(msg)` - Send message to all threads, will be emitted as a
    `message` event on the other side (this will populate the pool with threads
    on the first call).
  - `Pool#fire(name, args)` - Fire bind event to all threads (this will
    populate the pool with threads on the first call).
  - `Pool#call(name, args, [transferList], [timeout])` (async) - Call remote
    hook on next thread in queue (this may spawn a new thread).
  - `Pool#hasRef()` - Test whether pool has reference.
  - `Pool#ref()` - Reference pool.
  - `Pool#unref()` - Clear pool reference.
- Events
  - `Pool@message(msg, thread)` - Emitted on message received.
  - `Pool@error(err, thread)` - Emitted on error.
  - `Pool@event(event, args, thread)` - Emitted on bind event.
  - `Pool@spawn(thread)` - Emitted immediately after thread is spawned.
  - `Pool@online(thread)` - Emitted once thread is online.
  - `Pool@exit(code, thread)` - Emitted on thread exit.

### Thread, Pool, and Worker Options

The `options` object accepted by the `Thread`, `Pool`, and `Worker` classes is
nearly identical to the [worker_threads] worker options with some differences:

- `options.type` and `options.credentials` are valid options when using the
browser backend (see [web_workers]). Note that `options.type = 'module'` will
not work with the `polyfill` backend. If a file extension is `.mjs`,
`options.type` is automatically set to `module` for consistency with node.js.
- `options.bootstrap` is a valid option in the browser when used in combination
with `options.eval`. Its value should be the URL of a compiled [bundle] file.
For security, it's recommended to serve your own bootstrap file. This can be
set to `false` to do a _raw_ eval (you must inline your own initialization
code, presumably by using `importScripts`).
- The `Pool` class accepts a `size` option. This allows you to manually set the
pool size instead of determining it by the number of CPU cores.
- `options.dirname` allows you to set the `__dirname` of an eval'd module.
This makes `require` more predictable in eval'd modules (note this is _not_
necessary with the `Thread` and `Pool` objects -- it is done automatically).

#### Worker Data

In the browser, `workerData` is serialized as JSON instead of structured data.
To force usage of the structured clone algorithm, it's possible to require
`./lib/encoding` (note that this will increase your code size _greatly_).

``` js
const threads = require('bthreads');
const encoding = require('bthreads/encoding');

// Pre-encode workerData so it survives as structured data.
const thread = new threads.Thread('./worker.js', {
  workerData: encoding.stringify({ foo: 'bar' })
});
```

## Contribution and License Agreement

If you contribute code to this project, you are implicitly allowing your code
to be distributed under the MIT license. You are also implicitly verifying that
all code is your original work.

## License

- Copyright (c) 2019, Christopher Jeffrey (MIT License).

See LICENSE for more info.

[worker_threads]: https://nodejs.org/api/worker_threads.html
[child_process]: https://nodejs.org/api/child_process.html
[web_workers]: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers
[polyfill]: https://github.com/chjj/bthreads/blob/master/lib/browser/polyfill.js
[Content-Security-Policy]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy
[content-security-policy.com]: https://content-security-policy.com/
[bsock]: https://github.com/bcoin-org/bsock
[bundle]: https://github.com/chjj/bthreads/blob/master/lib/browser/eval.js
[bthreads-bundle]: https://www.npmjs.com/package/bthreads-bundle
[unpkg.com]: https://unpkg.com/