https://github.com/radrow/ddtrace
Distributed deadlock detection via asynchronous (pitch)black-box monitors
- Host: GitHub
- URL: https://github.com/radrow/ddtrace
- Owner: radrow
- Created: 2025-11-14T14:45:47.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-02-09T13:32:23.000Z (4 months ago)
- Last Synced: 2026-02-09T16:32:56.079Z (4 months ago)
- Topics: deadlock-detection, erlang, gen-server, monitoring
- Language: Erlang
- Homepage:
- Size: 2.94 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# DDTrace
DDTrace is a tool for asynchronous distributed deadlock detection in
`gen_server`-based systems.
## Repository layout
The top-level applications are:
- `apps/ddtrace` – the main DDTrace library.
- `apps/model` – the scenario generator, tracer tooling, and Elixir CLI used to
exercise the library.
- `apps/microchip_factory` – an example `gen_server`-based Elixir application
which shows DDTrace in a slightly more realistic setup. Refer to its README
for more details.
- `elephant_patrol` – an example *distributed* `gen_server`-based Elixir app.
To build the tooling together with the testing models, run:
```
mix deps.get
mix escript.build
```
The scenario testing escript is written to `./ddtrace`. Example usage:
```
./ddtrace apps/model/priv/scenarios/deadlock.conf
```
The microchip factory example can be run as follows:
```
mix run -e "MicrochipFactory.start_two(true)"
```
## Prerequisites
- Erlang/OTP 26
- Elixir 1.14
## Application requirements
The monitored system must consist entirely of `gen_server` instances. Moreover,
each server must adhere to _Single-threaded Remote Procedure Call_ (SRPC), which
in practice means that it may only use `gen_server:call` and `gen_server:cast`
for communication. Servers must always reply to calls via `{reply, _Reply,
_State}` (i.e. they must not store the `From` argument and return `{noreply,
_State}`). Multi-calls* through `gen_server:multi_call` and manual request
handling via `gen_server:send_request`/`gen_server:reply` are also forbidden. In
order for deadlock detection to work properly, every generic server must be
monitored.
TODO: there is a chance that `gen_server:multi_call` would work, but this is to be investigated.
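As an illustration of these requirements, the following is a minimal sketch of a server that stays within SRPC. The module `srpc_counter` and its functions are hypothetical and not part of DDTrace. The server communicates only through `gen_server:call` and `gen_server:cast`, and answers every call directly with `{reply, _Reply, _State}`.

``` erlang
-module(srpc_counter).
-behaviour(gen_server).

%% Hypothetical example server; not part of DDTrace.
-export([start/0, increment/1, reset/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start() ->
    gen_server:start(?MODULE, 0, []).

%% Synchronous request: the caller blocks until the server replies.
increment(Server) ->
    gen_server:call(Server, increment).

%% Asynchronous request: no reply is expected.
reset(Server) ->
    gen_server:cast(Server, reset).

init(Count) ->
    {ok, Count}.

%% Every call is answered directly with {reply, Reply, State}.
%% The From argument is never stored, and {noreply, State} is never
%% returned from handle_call/3.
handle_call(increment, _From, Count) ->
    {reply, Count + 1, Count + 1}.

handle_cast(reset, _Count) ->
    {noreply, 0}.
```

A server written this way never defers a reply, which is the property the SRPC requirement asks for.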
### Tracing limitations
`ddtrace` monitors employ the `trace` facility to oversee their `gen_server` instances.
Because Erlang allows at most one tracer for each process, this effectively prevents using
`trace` to debug systems monitored by `ddtrace`.
## Instrumenting generic servers with DDTrace
A monitor is started via `ddtrace:start` or `ddtrace:start_link`. The PID of the
monitored `gen_server` is passed as a parameter.
Monitors recognise each other via a *monitor registry* which maps generic
servers' PIDs to their monitors. The registry is implemented in the `mon_reg`
module using `pg` process groups. Monitors take care of registering themselves
in the registry automatically.
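To give an idea of how a PID-to-monitor mapping can be built on `pg`, here is a minimal sketch. It is not the actual `mon_reg` implementation: the module name `mon_reg_sketch`, the scope name, and the choice of using the server's PID as the group name are all assumptions made for illustration.

``` erlang
-module(mon_reg_sketch).
-export([start/0, register_monitor/2, monitor_of/1]).

%% ASSUMPTION: hypothetical pg scope name; the real mon_reg module
%% may use a different scope and a different group naming scheme.
-define(SCOPE, mon_reg_sketch_scope).

%% Start the pg scope that backs the registry.
start() ->
    pg:start(?SCOPE).

%% Register Monitor in a group named after the monitored server's PID.
register_monitor(Server, Monitor) when is_pid(Server), is_pid(Monitor) ->
    pg:join(?SCOPE, Server, Monitor).

%% Look up the monitor registered for Server, if any.
monitor_of(Server) ->
    case pg:get_members(?SCOPE, Server) of
        [Monitor | _] -> {ok, Monitor};
        [] -> undefined
    end.
```

Since `pg` removes group members when they terminate, a registry of this shape would not need explicit cleanup for monitors that have exited.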
In order to receive a deadlock notification, a process needs to register itself
as a subscriber to a particular monitor. One would normally subscribe to a
monitor immediately after making a call, and unsubscribe upon receiving a
response or a deadlock notification. To subscribe to deadlocks, use the
`ddtrace:subscribe_deadlocks` function (use `ddtrace:unsubscribe_deadlocks` to opt out). The
subscription function returns a request identifier that can be added to a
`gen_server` request ID collection (`reqids`) or waited on directly via
`gen_server:wait_response`.
The following snippet exemplifies how to monitor a single generic
server with DDTrace:
``` erlang
%% Start the service
{ok, P} = gen_server:start(my_gen_server_module, [], []),
%% Start the monitor
{ok, M} = ddtrace:start_link(P),
%% Subscribe to deadlocks
ReqM = ddtrace:subscribe_deadlocks(M),
%% Call the service
ReqP = gen_server:send_request(P, request),
%% Set up request ID collection
ReqIds0 = gen_server:reqids_new(),
ReqIds1 = gen_server:reqids_add(ReqP, process, ReqIds0),
ReqIds2 = gen_server:reqids_add(ReqM, monitor, ReqIds1),
%% Wait for whichever response arrives first
case gen_server:receive_response(ReqIds2, infinity, true) of
    {{reply, R}, process, _ReqIds} ->
        {ok, R}; %% Handle reply
    {{reply, {deadlock, Cycle}}, monitor, _ReqIds} ->
        {deadlock, Cycle} %% Handle deadlock
end.
```
**IMPORTANT:** Self-inflicted deadlocks (e.g. `gen_server:call(self(), lol)`)
are handled by `gen_server` and cause the process to crash without sending a
call message. DDTrace will handle this case as well, but the end user might
receive a crash result before the deadlock notification from DDTrace. Note that
simply waiting for `{error, {calling_self, _}, _Label, _ReqIds}` is not
sufficient, as this may happen in a nested call. Therefore, some additional
recursion might be needed to distinguish such a deadlock from a regular error.