Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/fulcrologic/guardrails

Efficient, hassle-free function call validation with a concise inline syntax for clojure.spec and Malli
https://github.com/fulcrologic/guardrails

Last synced: about 2 months ago
JSON representation

Efficient, hassle-free function call validation with a concise inline syntax for clojure.spec and Malli

Awesome Lists containing this project

README

        

:sectanchors:
ifdef::env-github,env-cljdoc[]
:tip-caption: :bulb:
:note-caption: :information_source:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]

= Guardrails

image:https://img.shields.io/clojars/v/com.fulcrologic/guardrails.svg[link=https://clojars.org/com.fulcrologic/guardrails]
image:https://circleci.com/gh/fulcrologic/guardrails/tree/main.svg?style=svg["CircleCI", link="https://circleci.com/gh/fulcrologic/guardrails/tree/main"]

Efficient, hassle-free function call validation with a concise inline syntax for clojure.spec and Malli:

[source, clojure]
----
(>defn ranged-rand
[start end]
[int? int? | #(< start end) ; `|` = such that
=> int? | #(>= % start) #(< % end)]
(+ start (long (rand (- end start)))))
----

Guardrails is intended to make it possible (and easy) to use specs/schemas as a loose but informative (even advisory) type system during development
so you can better see where you are making mistakes as you make them, with zero impact on _production_ build size or performance, and minimal impact during development.

It's the evolution of https://github.com/gnl/ghostwheel[Ghostwheel's] runtime spec validation functionality, which it streamlines, optimizes and extends, while keeping the original inline syntax. See the <> section for more details on Guardrails' rationale and origin story.

== NEW in 1.2

Version 1.2 adds some exciting new features and performance improvements (partially funded by https://www.dataico.com[Dataico]):

* Use https://github.com/metosin/malli[Malli] as your <>. You can even mix-and-match Clojure Spec and Malli in the same project/namespace.
* <>: Limit the number of function call checks per second at the namespace, function, or global level.
* <>: Authors can auto-exclude internal implementation functions from validation when their libraries are used in dependent projects downstream.
* <>: Users can disable and enable active checking for any namespace or function at runtime, as well as override library exclusions.
* <>: Full support for checking such-that predicates on a function's output, with the ability to reference any input arguments in both clojure.spec and Malli.

== Quick Start

. Add this library to your dependencies
. Create a `guardrails.edn` with `{}` in it in your project root
. When you run a REPL or CLJS compiler, include the JVM option `-Dguardrails.enabled`
** Optionally: If you're using CLJS, set your compiler options to include `{:external-config {:guardrails {}}}`

And code as follows:

[source, clojure]
-----
(ns com.domain.app-ns
(:require
[com.fulcrologic.guardrails.core :refer [>defn >def | ? =>]]))

;; >def (and >fdef) can be used to remove specs from production builds. Use them to define
;; specs that you only need in development. See the docstring of
;; `com.fulcrologic.guardrails.noop` for details.
(>def ::thing (s/or :i int? :s string?))

;; When guardrails is disabled this will just be a normal `defn`, and no fspec overhead will
;; appear in cljs builds. When enabled it will check the inputs/outputs and *always* log
;; an error using `expound`, and then *optionally* throw an exception,
(>defn f
[i]
[::thing => int?]
(if (string? i)
0
(inc i)))
-----

When the function is misused you'll get an error:

[source, bash]
-----
user=> (f 3.2)
ERROR /Users/user/project/src/com/domain/app_ns.clj:12 f's argument list
-- Spec failed --------------------

[3.2]
^^^

should satisfy

int?

or

string?

-- Relevant specs -------

:user/thing:
(clojure.spec.alpha/or :i clojure.core/int? :s clojure.core/string?)
-----

You can control if spec failures are advisory or fatal by editing `guardrails.edn` and setting the `:throw?` option. See
<> for more details.

Make sure to set your editor or IDE to resolve Guardrails' `>defn` and `>fdef` as Clojure's `defn`, and `>defn-` as `defn-` respectively – this way you get proper highlighting, formatting, error handling, structural navigation, symbol resolution, and refactoring support. A linting configuration for clj-kondo is included and should work out of the box.

=== Output Options

The configuration can be changed to help the signal to noise ratio on failures. If you have heavily instrumented your code with Guardrails, then turning on these two config options will likely give you a clearer picture on failures:

* `:guardrails/compact? true` - Remove blank/excess lines from the failure output.
* `:guardrails/stack-trace :none` - Change what is shown for the stack trace. :none elides it, :prune causes a more pruned stack trace on a single line, and :full is the default of showing the whole thing.
* `:guardrails/trace? true` - Shows the GR function call stack on failures (with argument values)

Try the above settings as shown. You can always see the full stack by calling `last-failure` (which is printed with any failures for convenience). The result is way less noisy, and often sufficient to find the problem. When it's not, the full stack trace is just a copy/paste away.

== Clojurescript Considerations

I use `shadow-cljs` as the build tool for all of my projects, and highly recommend it. Version 0.0.11 of Guardrails
checks the compiler optimizations and refuses to output guardrails checks except in development mode (no optimizations). This
prevents you from accidentally releasing a CLJS project with big runtime performance penalties due to spec checking
at every function call.

The recommended approach for using guardrails in your project is to make a separate `:dev` and `:release` section of your
shadow-cljs config, like so:

[source, clojure]
------
{:builds {:main {:target :browser
...
:dev {:compiler-options
{:closure-defines {'goog.DEBUG true}
:external-config {:guardrails {}}}}
:release {}}}
...}
------

Doing so will prevent you from accidentally generating a release build with guardrails enabled in case you had
a shadow-cljs server running in dev mode (which would cache that guardrails was enabled) and built a release
target:

[source, bash]
-----
# in one terminal:
$ shadow-cljs server
# later, in a different terminal
$ shadow-cljs release main
-----

In this scenario Guardrails will detect that you have accidentally enabled it on a production build and will
throw an exception. The only way to get guardrails to build into a CLJS release build is to explicitly set
the JVM property "guardrails.enabled" to "production" (NOTE: any truthy value will enable it in CLJ).

You can set JVM options in shadow-cljs using the `:jvm-opts` key:

[source, clojure]
-----
:jvm-opts ["-Dguardrails.enabled=production"]
-----

but this is highly discouraged.

=== Dead Code Elimination

There is a a noop namespace that can be used in your build settings to attempt to eliminate all traces of guardrails
and dependent code. This will not remove spec dependencies unless you only use spec for guardrails, so do similar tricks
for your inclusions of spec namespaces.

See https://github.com/fulcrologic/guardrails/blob/develop/src/main/com/fulcrologic/guardrails/noop.cljc[noop.cljc].

[[gspec-syntax]]
== The Gspec Syntax

`[arg-specs* (| arg-preds+)? \=> ret-spec (| ret-preds+)? (\<- generator-fn)?]`

`|` : such that

The number of `arg-specs` must match the number of function arguments, including a possible variadic argument – Guardrails will shout at you if it doesn't.

=== Single/Multiple Arities

Write the function as normal, and put a gspec after the argument list:

[source, clojure]
-----
(>defn myf
([x]
[int? => number?]
...)
([x y]
[int? int? => int?]
...))
-----

=== Variadic Argument Lists

`arg-specs` for variadic arguments are defined as one would expect from standard fspec:

[source, clojure]
-----
(>fdef clojure.core/max
[x & more]
[number? (s/* number?) => number?])
-----

[NOTE]
--
The `arg-preds`, if defined, are `s/and`-wrapped together with the `arg-specs` when desugared.

The `ret-preds` are equivalent to (and desugar to) spec's `:fn` predicates, except that the anonymous function parameter
is the ret, and the args are referenced using their symbols. That's because in the gspec syntax spec's `:fn` is simply
considered a 'such that' clause on the ret.
--

=== Such That

To add an additional condition add `|` after either the argument specs (just before `\=>`) or return value spec
and supply a lambda that uses the symbol names from the argument list (and `%` for return value).

[source, clojure]
-----
(>defn f
[i]
[int? | #(< 0 i 10) => int? | #(pos-int? %)]
...)
-----

=== Nilable

The `?` macro can be used as a shorthand for `s/nilable`:

[source, clojure]
-----
(>fdef clojure.core/empty?
[coll]
[(? seqable?) => boolean?])
-----

=== Nested Specs

Nested gspecs are defined using the exact same syntax:

[source, clojure]
-----
(>fdef clojure.core/map-indexed
([f]
[[nat-int? any? => any?] => fn?])
([f coll]
[[nat-int? any? => any?] (? seqable?) => seq?]))
-----

In the rare cases when a nilable gspec is needed `?` is put in a vector rather than a list:

[source, clojure]
-----
(>fdef clojure.core/set-validator!
[a f]
[atom? [? [any? => any?]] => any?])
-----

TIP: For nested gspecs there's no way to reference the args in the `arg-preds` or `ret-preds` by symbol. The recommended
approach here is to register the required gspec separately by using `>fdef` with a keyword.
//You can do it with `#(\-> % :arg1)` in the `arg-preds`, but that won't work in the `ret-preds` and it's quite messy anyway. You could theoretically use a nested `(s/fspec ...)` instead of a gspec, but that gets unwieldy quick.

NOTE: Nested gspecs with one or more `any?` argspecs desugar to `ifn?`, so as not to mess up generative testing. This
can be overridden by passing a generator – even an empty one, that is simply adding `\<-` or `:gen` to the gspec – in which case the gspec will desugar exactly as specified.
{zwsp}
The assumption here is that `any?` does not imply that the function can in fact handle any type of argument.
{zwsp}
You should still write out nested gspecs, even if they are as simple as `[any? \=> any?]` – this is useful as succinct
documentation that this particular function receives exactly one argument.

=== Gspec Advantages

- It's much more concise and easier to write and read.
- It's inline, so you can see at a glance what kind of data a function expects and returns right under the
docstring and arg list, for example when previewing the function definition in your editor.
- Writing specs for multi-arity functions adds zero complexity, even with multiple return types and such-that predicates
- Renaming/refactoring parameters is a breeze – just use your IDE's symbol rename functionality and all references in
the predicate functions will be handled correctly, because `>defn` syntax is valid `defn` syntax.
+
From the point of view of the programmer and the editor, the function arguments are bound to their respective symbols and can be freely referenced in any expression as expected, including the gspec which is considered just another body form.
- For the same reason, you can reliably bypass Guardrails temporarily by simply changing `>defn` to `defn` - the minimal performance impact
of evaluating the gspec vector as the first body form aside, nothing will break.
- It can be elided to have zero impact on the build by an external control (config file/JVM parameter).

Credit: The above documentation was largely taken from https://github.com/gnl/ghostwheel#the-gspec-syntax[Ghostwheel's documentation].

[#malli-support]
== Malli Support

Version 1.2.0 includes full support for Malli. If you use the latter for data validation, you no longer need to maintain a separate spec-based set of schema for Guardrails – it is, after all, the same data you use in your functions!

All you have to do to use it instead of spec is change your require statement. In fact, you can alias BOTH spec-based and malli-based Guardrails in the same namespace – just make sure you use the right kind of schema with the corresponding function!

The special operators `\=>`, `|`, and `?` can come from either implementation, as they are purely symbolic.

[source, clojure]
-----
(ns foo.bar
(:require
[clojure.spec.alpha :as s]
[com.fulcrologic.guardrails.core :as gr.spec]
[com.fulcrologic.guardrails.malli.core :as gr.malli :refer [=> | ?]))

(gr.spec/>defn f
[x]
[(s/keys :req [:thing/x]) => int?]
...)

(gr.malli/>defn f
[x]
[[:map :thing/x] => :int]
...)
-----

All configuration options apply to both variants (max checks per second, throw configuration, etc.). Other than the items used *within* the gspec, they are identical.

=== The Guardrails Malli Registry

Clojure Spec forces you to use a shared global registry, and carefully ensure that your keywords are qualified and do not collide with others.

Malli does have a default registry, but it is not mutable. This lets you to pick registries at will, and allows for more lenient use of "poor" naming because the threat of collision is reduced; however, it makes the writing of function schemas a lot more tedious:

[source, clojure]
-----
(>defn f
[x]
[[:map {:registry my-reg} ...
-----

Fortunately, Malli supports mutable registries, so we can provide the convenience of a global registry and dramatically reduce the boilerplate.

The mutable Guardrails registry is initiated with the exact content of the default Malli registry, and is held in the `com.fulcrologic.guardrails.malli.registry` namespace, which also includes functions that you can use to directly add your own schema to it. You'll need to do this for any qualified keywords you want to use in `>defn`s that leverage Malli. For example you can merge in some other schema maps with:

[source, clojure]
-----
(gr.reg/merge-schemas! my-custom-stuff my-other-stuff)
-----

The `com.fulcrologic.guardrails.malli.core` namespace also has a convenient `>def` that is like the Clojure Spec `def`, in that it will register a schema under a qualified keyword for you:

[source, clojure]
-----
(>def :member/name :string)
-----

== Configuration

=== Enabling

Guardrails is disabled by default, emitting *exactly* what a plain `defn` would until you explicitly turn it on, which is done via a JVM option. We chose this path because it is highly effective at preventing its accidental enabling in production, which could cause huge performance impacts.

The JVM option `-Dguardrails.enabled=true` should be used to turn on
guardrails. When not defined `>defn` will emit exactly what `defn` would.

You may also enable it in cljs in your shadow-cljs config
(see Configuration...adding even an empty config map will enable it).

=== The Configuration File

The default config goes in the root of the project as `guardrails.edn`:

[source, clojure]
-----
{
; what to emit instead of defn, if you have another defn macro
:defn-macro nil

;; Nilable map of Expound configuration options.
:expound {:show-valid-values? true
:print-specs? true}

;; GLOBALLY enable non-exhaustive checking (this is NOT recommended, you'd
;; usually want to set it in a more granular fashion on the function or
;; namespace level using metadata.)
;; Limits function call checks per second to this maximum number.
;; The intermittent checks are spread out evenly to ensure sufficient coverage,
;; so if MCPS is set to 50 and you have 1000 calls per second, roughly every 20th
;; call will be checked.
:guardrails/mcps 100

;; Low-level stack trace output
;; default :full
:guardrails/stack-trace :prune ; or :full or :none

;; nREPL hates using stderr, but in other REPLs seeing your problems in red is nice.
;; default false
:guardrails/use-stderr? true

;; Optional (default false). Compress the explanation of the problem.
:guardrails/compact? true

;; Keep track of the active GR-instrumented calls as a stack, and show that on errors.
;; default false
:guardrails/trace? true

;; should a spec failure on args or ret throw an exception?
;; (always logs an informative message)
:throw? false

;; should a failure be forwarded through tap> ?
:tap>? false}
-----

You can override the config file *name* using JVM option
`-Dguardrails.config=filename`.
In your shadow-cljs config file you can override settings via the `[:compiler-options :external-config :guardrails]`
config path of a build:

[source, clojure]
-----
...
:app {:target :browser
:dev {:compiler-options
{:external-config {:guardrails {:throw? false}}
:closure-defines {'goog.DEBUG true}}}}
...
-----

== Performance

Guardrails adds an overhead that is roughly equivalent to the cost of running a Clojure spec or Malli validation. On an
Apple M1 Max, the average check on a generic code base (tested against https://github.com/fulcrologic/statecharts[our statecharts library]) takes around 11 microseconds. This actually tested out to roughly the same for Malli AND Spec, though we did find cases where
Malli was roughly 2x faster. We did not do further deep analysis.

[source]
-----
nCalls Max Mean MAD Clock Total
174,586 14.72ms 11.44μs ±78% 2.00s 91%
-----
(measured using https://github.com/taoensso/tufte[Tufte])

As you can see, if you instrument a lot of your functions, the number of calls can add up quickly (this result was from running 8 tests). So, even though the runtime checks are only taking microseconds, the overall effect can be dramatic.

Here's how fast those tests are when we turn off Guardrails altogether (one call, because
we measured the entire test suite runtime instead of the overhead of non-existent runtime checks):

[source]
-----
nCalls Max Mean MAD Clock Total
1 72.38ms 72.38ms ±0% 72.38ms 100%
-----

As you can see, the performance can be a significant drag on development, often leading people to strip out their checks, a thing that I've had to do in my own libraries in the past because it hurt downstream users. No more! We now have various ways of improving the situation.

[#check-throttling]
=== Limiting Max Checks Per Second

In version 1.2.0 and above you can tune Guardrails to limit the number of times a function is checked per second. This can have a huge performance benefit for functions that are called in loops and possibly involve complex and expensive checks.

The limit can be applied globally, to a namespace, or even to a function (recommended), by setting `:guardrails/mcps` to an integer. Like most other options, you can place it in the global `guardrails.edn`, the compiler config, the metadata of a namespace, or in the attribute map of a `>defn`. For example:

[source, clojure]
-----
(>defn f
{:guardrails/mcps 100}
[x]
[int? => int?]
...)
-----

The performance boost from this setting can be dramatic. The Fulcrologic Statecharts library uses Guardrails extensively for internal function checks, and without an MCPS limit some state changes can take human-perceptible amounts of time (like seconds). With it applied the performance impact returns to a virtually unnoticeable level. Measurements on this particular library indicate that each *check* takes around 20 microseconds, but the overhead of the max-calls-per-second is only 20 or so nanoseconds (on an M1 Max Mac Studio). So, when a calculation ends up causing 100k+ checks (remember there is a check for each arg, and one for the return value) enabling MCPS makes things run literally 1000x faster.

Here's that same set of 8 tests we showed earlier, but with MCPS set to 100 the Guardrails overhead is reduced to only a few milliseconds! In other words, the non-exhaustive checking makes it appear as if guardrails isn't even there. Since it is very common to have just a handful of heavily called functions, dropping each of their check counts to 100 means that you're more likely to only run a few thousand checks in total.

Of course, the downside is that you are no longer getting rigorous data flow checking, but for functions that are called heavily this is an acceptable trade-off, since the probability of detecting some kind of problem can be tuned as you see fit.

The throttling always checks the "leading edge" first; from there it tracks a counter, and uses the high resolution timer to calculate the current number of checks per second that have been done. If that exceeds the set limit, the check is skipped (and the time will change, but not the check count), so after enough time has elapsed, more checks will happen. This has the tendency to "spread out" the checks over time, but of course even high resolution timers are going to give you a lot of jitter at a high call frequency.

[#dynamic-exclusions]
=== Dynamic Exclusions

Version 1.2.0 also includes the ability to turn checks completely on or off, including at runtime, on a wide or granular level, such as for a namespace or even a function, both in CLJ and CLJS. The functions for controlling this are in `com.fulcrologic.guardrails.config`:

* `(config/exclude-checks! ns-or-fn)` - Turns off checking for an entire ns, or just a single fully-qualified symbol.
* `(config/allow-checks! ns-or-fn)` - Turns on checking for an entire ns, or just a single fully-qualified symbol.
* `(config/excluded? ns-or-fn)` - Indicate if the given (entire ns) or fn (qualified symbol) is excluded from checks.
* `(config/clear-exclusions!)` - Make everything, even in libraries that export exclusions, run checks. See next section.
* `(config/reset-exclusions!)` - Re-apply any library exclusion exports (resets exclusions to what they were at startup). See next section.

[#static-exclusions]
=== Static Exclusions (Special Attention Library Authors)

Most libraries have a main surface API, and then a bunch of internal functions. It is useful to instrument all of these with Guardrails in order to get the benefits of documentation, validation during development, and verification while testing.

Unfortunately, this can have a huge performance impact on downstream consumers of your library that also use Guardrails. It makes sense that a library author should indicate which functions comprise the *public* API (and should be checked by downstream users), and which are considered more *internal* and should only be checked when the author is working on the library itself.

Library authors (and application authors as well) can include a file at the top level of their classpath (e.g. src or resources folder, usually) with the special name `guardrails-export.edn` which contains a config map that can exclude a set of namespaces from ALL checks. To get the checking config at runtime, the `>defn` functions in those namespaces will only run a simple check on a volatile, so setting these exclusions returns things to pretty much full performance (~= no Guardrails used at all).

For example, `src/main/guardrails-export.edn` in the https://github.com/fulcrologic/statecharts[Fulcrologic statecharts library] looks something like this:

[source, clojure]
-----
{:exclude #{com.fulcrologic.statecharts.algorithms.v20150901-impl}}
-----

Remember, this goes in a *specially-named* file, *not* in the primary guardrails configuration file, since these are meant to be seen by downstream consumers (like data_readers.clj).

A quick implementation note: In order to make this work in Clojurescript a macro must run that can read the JVM classpath, and compile all the exclusions found in on-disk (and in JAR files) into a runtime set. The same happens in Clojure (though in CLJ you can read the fs again at any time).

The set of exclusions found in export files at load time is what `reset-exclusions!` will restore if you have dynamically changed the exclusions at runtime. Basically this load-time set is kept in a var for exactly this reason since CLJS cannot re-trigger a classpath scan.

NOTE: As a library *author* these exclusions will end up applying to your code as well, since it is difficult to tell which export file belongs to which project on the classpath. Thus the beginning of your test namespaces (and possibly your non-published user ns) should all start with a call to `config/clear-exclusions!` if you want to include your implementation checks while running your tests and working on your library code.

== Why?

Clojure spec's instrument (and Orchestra's outstrument) have a number of disadvantages when trying to use them for
this purpose. Specifically, they are side-effecting after-calls that do not play particularly well with hot code reload,
and always throw when there is a failed spec. Furthermore, management of the accidental inclusion of specs in your cljs
builds (which increase build size) is a constant pain when writing separate specs for functions (the specs end up in
a whole other file, inclusion needs to be via a development ns, and things easily get out of date).

This library is a middle ground between the features of raw Clojure spec and George Lipov's Ghostwheel.
Much of the source code in this library is directly from https://github.com/gnl/ghostwheel[Ghostwheel].

This library's goals are:

- The ability to use a simple DSL to declare the spec with a function (taken from Ghostwheel). See that library's docs
for *syntax* of `>defn`, `>def`, etc.
- The ability to support dead-code elimination in cljs.
- No reliance on generative testing facilities/checkers. No orchestra/instrument stuff.
- Good output when a function receives or emits an incorrect value.
- The ability to control if a spec failure causes a throw (instrument always throws), because a lot of the time
during development your spec is just wrong, and crashing your program is very inconvenient. You just want a log message
to make you aware.

without the extra overhead of Ghostwheel's support for:

* Automatic generative testing stuff.
* Tracing.
* Side-effect detection/warning.

== Copyright and License

The code and documentation taken from Ghostwheel is by George Lipov and follows the ownership/copyright of that library.
The modifications in this library are copyrighted by Fulcrologic, LLC.

This library follows Ghostwheel's original license: Eclipse public license version 2.0.