{"id":16726363,"url":"https://github.com/brunobonacci/safely","last_synced_at":"2025-04-05T05:09:49.685Z","repository":{"id":62431806,"uuid":"42828184","full_name":"BrunoBonacci/safely","owner":"BrunoBonacci","description":"Safely is a Clojure's circuit-breaker library for handling retries in an elegant declarative way.","archived":false,"fork":false,"pushed_at":"2022-09-09T10:51:19.000Z","size":1499,"stargazers_count":206,"open_issues_count":2,"forks_count":9,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-10-13T22:52:42.849Z","etag":null,"topics":["circuit-breaker","clojure","exceptions","exponential-backoff","retry","retry-policies"],"latest_commit_sha":null,"homepage":"https://cljdoc.org/d/com.brunobonacci/safely","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BrunoBonacci.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-09-20T20:22:01.000Z","updated_at":"2024-09-25T11:43:48.000Z","dependencies_parsed_at":"2022-11-01T21:00:40.208Z","dependency_job_id":null,"html_url":"https://github.com/BrunoBonacci/safely","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrunoBonacci%2Fsafely","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrunoBonacci%2Fsafely/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrunoBonacci%2Fsafely/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrunoBonacci%2Fsafely/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/BrunoBonacci","download_url":"https://codeload.github.com/BrunoBonacci/safely/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247289429,"owners_count":20914464,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["circuit-breaker","clojure","exceptions","exponential-backoff","retry","retry-policies"],"created_at":"2024-10-12T22:52:58.911Z","updated_at":"2025-04-05T05:09:49.658Z","avatar_url":"https://github.com/BrunoBonacci.png","language":"Clojure","readme":"# safely\n[![CircleCI](https://circleci.com/gh/BrunoBonacci/safely.svg?style=svg)](https://circleci.com/gh/BrunoBonacci/safely) [![Clojars Project](https://img.shields.io/clojars/v/com.brunobonacci/safely.svg)](https://clojars.org/com.brunobonacci/safely) ![CircleCi](https://img.shields.io/circleci/project/BrunoBonacci/safely.svg) ![last-commit](https://img.shields.io/github/last-commit/BrunoBonacci/safely.svg) [![cljdoc badge](https://cljdoc.org/badge/com.brunobonacci/safely)](https://cljdoc.org/d/com.brunobonacci/safely/CURRENT)\n\nSafely is a Clojure's circuit-breaker library for handling retries in\nan elegant declarative way.\n\nThe library offers out of the box:\n\n  * declarative exception handling\n  * declarative *circuit breaker* (in pure Clojure)\n  * automatic policy-based retries (declarative)\n  * randomized delays retries\n  * attenuation of self-emergent behaviour in distributed systems\n  * sleepless-mode for testing\n  * automatic and customizable logging of errors\n  * automatic tracking of errors rate/count in 
monitoring tools\n  * automatic tracing compatible with OpenZipkin distributed tracing\n\n## Usage\n\nAdd the dependency into your `project.clj`.\n\n``` clojure\n;; stable version\n[com.brunobonacci/safely \"1.0.0\"]\n```\n\n  * Latest version: [![safely](https://img.shields.io/clojars/v/com.brunobonacci/safely.svg)](https://clojars.org/com.brunobonacci/safely)\n  * Online [Documentation latest version](https://cljdoc.org/d/com.brunobonacci/safely/CURRENT).\n\n\n\nRequire the namespace:\n\n``` clojure\n(ns foo.bar\n  (:require [safely.core :refer [safely]]))\n```\n\nThen, make a call to a remote system:\n\n``` clojure\n;; wrap your critical calls\n;; to external systems (api, db, etc)\n;; into a `safely` block, and define\n;; what to do in case of failures.\n\n(safely\n  (api-call \"other-system\")\n\n  :on-error\n  :max-retries 5\n  :default   {:some :value})\n```\n\nThis is a quick ref-card of all possible configurable options:\n\n``` clojure\n\n;;\n;; all in one example\n;;\n\n(safely\n\n ;; code to execute\n (do (comment run something which can potentially blow))\n\n ;; exception handling\n :on-error\n\n ;; upon error return a default value\n :default \"some value\"\n\n ;; retry a number of times before\n ;; to give up or return the default value\n ;; use :forever for unlimited retries.\n :max-retries 5\n\n ;; between retries wait a fix amount of time (not recommended)\n :retry-delay [:fix 3000] ;; 3s in millis\n\n ;; or wait a uniform random range between :min and :max\n :retry-delay [:random-range :min 1000 :max 3000]\n\n ;; or wait a random amount of time with +/- a random variation\n :retry-delay [:random 3000 :+/- 0.35]\n\n ;; or wait an exponential amount of time with a random variation\n :retry-delay [:random-exp-backoff :base 300 :+/- 0.50]\n :retry-delay [:random-exp-backoff :base 300 :+/- 0.35 :max 25000]\n\n ;; or wait a given list of times with a random variation\n :retry-delay [:rand-cycle [50 100 250 700 1250 2500] :+/- 0.50]\n\n ;; you can 
provide a predicate function which determines\n ;; which class of errors are retryable. Just write a\n ;; function which takes an exception and returns something\n ;; truthy or falsey.\n :retryable-error? #(not (#{ArithmeticException NullPointerException} (type %)))\n\n ;; valid values: :original, :wrapped, :legacy, (fn [exception] true)\n ;; If an exception is thrown, this determines its value.\n ;; :wrapped refers to the ex-info exception thrown by safely\n ;; :original refers to the exception raised inside the block\n ;; :legacy is to maintain the behaviour of earlier versions (mix of the two)\n ;; You can provide a function to control the type of the exception thrown.\n :rethrow :legacy\n\n ;; you can provide a predicate function which determines\n ;; if the output of the body should be considered as a failed response\n ;; this can be useful when using safely with APIs which have a return\n ;; status for errors instead of exceptions. Two good examples are HTTP\n ;; status codes and polling API, in which you wish to slow down the polling\n ;; when the result of the previous polling doesn't contain records.\n :failed? #(not (\u003c= 200 (:status %) 299))\n\n ;; to activate the circuit breaker just give a name to the operation\n :circuit-breaker :operation-name\n\n ;; *PLEASE NOTE*: the following options are ONLY used in conjunction with\n ;; a circuit breaker\n\n ;; control the thread pool size for this operation\n :thread-pool-size  10\n\n ;; control the thread pool queue size for this operation\n :queue-size        5\n\n ;; the number of request outcomes to be sampled for analysis\n :sample-size       100\n\n ;; the number of milliseconds to wait before giving up\n ;; NOTE: it can be used only in conjunction with circuit-breaker\n :timeout           30000 ;; (millis, default no timeout)\n\n ;; What to do with the request when the timeout time is\n ;; elapsed. 
:never, :if-not-running or :always\n :cancel-on-timeout :always\n\n ;; stats are collected about the outcome of the operations\n ;; this parameter controls the number of 1-sec buckets\n ;; to keep.\n :counters-buckets  10\n\n ;; the strategy used to trip the circuit open\n :circuit-breaker-strategy :failure-threshold\n\n ;; the threshold of failing requests after which the circuit trips\n ;; open. This is only used when\n ;; :circuit-breaker-strategy is :failure-threshold\n :failure-threshold 0.5\n\n ;; when the circuit breaker is tripped open, no requests will\n ;; be allowed for a given period.\n :grace-period      3000 ;; millis\n\n ;; the strategy to decide which requests to let through\n ;; for evaluation before closing the circuit again.\n :half-open-strategy :linear-ramp-up\n\n ;; the number of millis during which time an increasing number\n ;; of requests will be let through for evaluation purposes.\n :ramp-up-period    5000\n\n\n ;; General options.\n ;; customize your error message for logs\n :message \"a custom error message\"\n\n ;; set to false if you don't want to log errors\n :log-errors false\n\n ;; or choose the logging level\n :log-level :warn\n\n ;; to disable the stacktrace reporting in the logs\n :log-stacktrace false\n\n ;; whether to enable or disable tracking.\n ;; values: `:enabled` or `:disabled` (default: `:enabled`)\n :tracking :enabled\n\n ;; and track the execution time and outcome with the following action name\n ;; if not provided it will attempt to record the location (line + source file)\n :track-as ::action-name\n\n ;; a vector of key/value pairs to include in the tracking event.\n ;; They are useful to give more context to the event,\n ;; so that when you read the event you have more info.\n ;; for example:\n :tracking-tags [:batch-size 30 :user user-id]\n\n ;; a function which receives the result of the evaluation\n ;; and captures some information from it.\n ;; This is useful, for example if you want to capture 
the\n ;; http-status of a remote call.\n ;; it returns a map or `nil`, the returned map will be merged\n ;; with the tracking event.\n :tracking-capture (fn [r] {:http-status (:http-status r)})\n )\n\n```\n\n\n## Examples and Case studies\n\nHere is a collection of examples and case studies:\n\n  * Use safely with AWS apis\n    * [ETL-load job](./examples/etl-load/doc/etl-load-example.md) -\n      See how the use of `safely` can greatly simplify your ETL jobs\n      and make sure that you are fully utilising your database\n      resources while being tolerant for transitory failures. It is\n      also demonstrated that the exponential backoff exhibits great\n      adaptive behaviours. The example is valid for Hadoop and Spark\n      ETL jobs as well.\n\n\n## Exception handling\n\nThe macro `safely` will run the given code and in case an exception\narises it will follow the policy described after the `:on-error`\nkeyword.\n\n### Return default value\n\nThis is the simplest of the policies. In case of an exception with the\ngiven code a default value will be returned.\n\n```Clojure\n;; no error raised, so result is returned\n(safely\n (/ 1 2)\n\n :on-error\n :default 1)\n;;=\u003e 1/2\n\n\n;; an error is raised, but a default value is given\n;; so the default value is returned\n(safely\n ;; ArithmeticException Divide by zero\n (/ 1 0)\n\n :on-error\n :default 1)\n;;=\u003e 1\n```\n\n### Automatic retry\n\nIn some cases by retrying a failed operation you can get a successful\noutcome.  For example operations which involve network requests might\ntime out or fail for transitory network \"glitches\".  
Typically, before\ngiving up, you want to retry some operations.\n\nFor example, let's assume you wish to retrieve the list of active users\nfrom a corporate RESTful webservice and you want to account for\ntransitory failures. You could retry the operation a number of times\nbefore giving up.\n\nThe code could look as follows:\n\n```Clojure\n;; Automatic retry\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 3)\n```\n\n*In this case `:max-retries 3` means that there can be a maximum of 4\nattempts* in total (the first attempt plus up to 3 retries). Between attempts the thread will be sleeping\nfor a random amount of time.  We will discuss retry delays later on.\n\nIf the first attempt succeeds, then the result of the web request is\nreturned, however if an error arises then `safely` will retry until\none of the following conditions is reached: either the operation\nexecutes successfully, or the `:max-retries` is reached.\n\nOnce the `:max-retries` is reached, if a `:default` value has\nbeen provided then it will be returned, otherwise the exception will\nbe thrown up the stack.\n\n\n```Clojure\n;; Automatic retry with default value\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 3\n  :default {:accounts [] :status \"BUSY\"})\n```\n\nIn the previous case the HTTP GET operation may fail and it will be\nautomatically retried for a maximum of 3 times, after which, the\ndefault value of `{:accounts [] :status \"BUSY\"}` is returned.\n\nIf the `:default` clause is omitted, a\n`clojure.lang.ExceptionInfo` will be thrown with the details of the\nnumber of attempts and the original cause.\n\n\n### Retry delays and randomization\n\n#### Self-emergent Behaviour\n\nIn large distributed systems failures can produce strange behaviour\ndue to the fact that all participants act in the exact same way.\nConsider the example of a service failure where all other services\nwhich use the former detect the 
failure and decide to retry after the\nexact same amount of time. When the system comes back to life it will\nbe flooded with retry requests from all the other services at the same\ntime. If the number of client services is big enough, this can cause the\nservice which is already struggling to die and reboot in a continuous\ncycle.\n\n\u003e \"Emergent behavior is that which cannot be predicted through analysis\n\u003e at any level simpler than that of the system as a whole. Emergent\n\u003e behavior, by definition, is what’s left after everything else has been\n\u003e explained\" (Dyson and George 1997).\n\n\u003e \"Emergent behavior is also been defined as the action of simple rules\n\u003e combining to produce complex results\" (Rollings and Adams 2003)\n\nIn this paper\n[Emergent Behavior in Systems of Systems](http://faculty.nps.edu/thuynh/Conference%20Proceedings%20Papers/Paper_14_Emergent%20Behavior%20in%20Systems%20of%20Systems.pdf)\nyou can see more examples of emergent behaviour.\n\n\n#### Retry policies\n\n`safely` implements several randomization strategies to minimize\nthe appearance of these large scale issues.\n\nAll delay strategies are randomized by default, here is a list of those\nwe currently support.\n\nThe default configuration is: `[:random-exp-backoff :base 300 :+/- 0.50 :max 60000]`\n\n* `:random-range` (min/max) - it defines a random range between fixed boundaries\n* `:random` (amount +/- random percentage) - it defines an amount and\n  percentage of variation (both sides + or -) from that base amount\n* `:random-exp-backoff` - (**default strategy**) it defines an amount\n  of time between each attempt which grows exponentially at every\n  subsequent attempt and it is randomized with a custom +/-\n  percentage. 
Optionally you can specify a maximum amount of time\n  beyond which it won't grow any more.\n* `:rand-cycle` - if none of the above strategies suits your case you\n  can specify your own list of delays between attempts and a\n  randomization factor. If the number of attempts goes beyond the\n  listed values it will start from the first one again in a continuous\n  cycle.\n* `:fix` - for special cases you can specify a fixed amount of time\n  between retries, however I do not recommend the use of this strategy.\n\nNow we will show how each strategy works with code samples.\n\n\n#### :fix\n\nIn this example `safely` will retry for a maximum of 3 times with a\ndelay of exactly 3 seconds (3000 milliseconds). This strategy is\nstrongly discouraged in order to minimize self emergent behaviour.\n\n```Clojure\n;; Automatic retry with fix interval (NOT RECOMMENDED)\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 3\n  :retry-delay [:fix 3000])\n```\n\n#### :random-range\n\nIn this example `safely` will retry for a maximum of 3 times with a\ndelay of minimum 2 seconds (2000 milliseconds) and a maximum of 5\nseconds (5000 milliseconds).\n\n```Clojure\n;; Automatic retry with random-range\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 3\n  :retry-delay [:random-range :min 2000 :max 5000])\n```\n\n#### :random\n\nIn this example `safely` will retry for a maximum of 3 times with a\ndelay of 3 seconds (3000 milliseconds) plus or minus an amount **up\nto** 50% of the base amount. 
This means that the waiting time could be\neffectively anything between 1500 millis (3000 - 50%) and 4500 millis\n(3000 + 50%).\n\n```Clojure\n;; Automatic retry with random variation\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 3\n  :retry-delay [:random 3000 :+/- 0.50])\n```\n\n#### :random-exp-backoff\n\nIn this example `safely` will retry for a maximum of 3 times with an\nexponential backoff delay starting at 300 milliseconds, plus or\nminus a random variation of up to 50% of the calculated wait time. This means that the first retry\nwill be ~300 millis (+/- random variation), the second retry will be ~600 millis\n(+/- random variation) etc.\n\n```Clojure\n;; Automatic retry with random exponential backoff\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 3\n  :retry-delay [:random-exp-backoff :base  300 :+/- 0.50])\n```\n\n**The exponential backoff** typically follows this formula:\n\n    delay = base-delay * 2 ^ retry [+/- random-variation]\n\nNOTE: The random variation is added in a second step.\n\nFor example, an exponential backoff with a base of 3000 millis (3 sec) would be:\n\n    retry:     0       1       2       3       4 ...\n    formula:  3*2^0   3*2^1   3*2^2   3*2^3   3*2^4\n    delay:     3s     6s      12s     24s     48s\n\n\nFor example, for a given base these are the delays in\nmilliseconds for each subsequent retry (before random variation):\n\n\n| Base | Retry 1 | Retry 2 | Retry 3 | Retry 4 | Retry 5 |\n|-----:|--------:|--------:|--------:|--------:|--------:|\n|   50 |      50 |     100 |     200 |     400 |     800 |\n|  100 |     100 |     200 |     400 |     800 |    1600 |\n|  200 |     200 |     400 |     800 |    1600 |    3200 |\n| 2000 |    2000 |    4000 |    8000 |   16000 |   32000 |\n| 3000 |    3000 |    6000 |   12000 |   24000 |   48000 |\n\n\nIf you wish to check the sequence for a given base you can try on the\nREPL as follows:\n\n```Clojure\n(require 'safely.core)\n(take 10 (#'safely.core/exponential-seq 
2000))\n;;=\u003e (2000 4000 8000 16000 32000 64000 128000 256000 512000 1024000)\n```\n\n**The randomization is applied after the exponential value has been\n  calculated**\n\nIf you want to simulate the random variation as well, write as follows:\n\n```Clojure\n(require 'safely.core)\n(-\u003e\u003e (#'safely.core/exponential-seq 2000)\n  (map #(safely.core/random % :+/- 0.50))\n  (take 10))\n;; =\u003e (2488 2152 6072 11159 46051 60235 65198 231233 573339 518515)\n```\nNOTE: Every execution will return different numbers.\n\n#### :random-exp-backoff (with :max)\n\nAdditionally you can specify a maximum amount of time beyond\nwhich the delay stops growing and stays roughly constant.\n\n```Clojure\n;; Automatic retry with random exponential backoff and a max delay\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 10\n  :retry-delay [:random-exp-backoff :base  3000 :+/- 0.50 :max 240000])\n```\n\nThe above example sets a maximum delay of **4 minutes** (240000 millis)\nbeyond which time `safely` won't back off exponentially any more, but\nthe delay will remain constant (with some random variation).\n\nExample for the effect of `:max 240000`\n\n```Clojure\n(require 'safely.core)\n;; without :max\n(take 10 (#'safely.core/exponential-seq 3000))\n;; =\u003e (3000 6000 12000 24000 48000 96000 192000 384000 768000 1536000)\n\n;; with :max 240000\n(take 10 (#'safely.core/exponential-seq 3000 240000))\n;; =\u003e (3000 6000 12000 24000 48000 96000 192000 240000 240000 240000)\n```\n\n#### :rand-cycle\n\nIf you don't like the exponential backoff, then you can specify a\nsequence of expected delays between each retry. `safely` will use these\ntimes (in milliseconds) and add randomization to compute the amount of\ndelay between each retry. 
Once the last delay in the sequence is reached\n`safely` will cycle back to the first number and repeat the sequence.\n\n```Clojure\n;; Automatic retry with random list of delays\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 6\n  :retry-delay [:rand-cycle [1000 3000 5000 10000] :+/- 0.50])\n```\n\nIn the above example I've specified the desired waiting time (with\nvariation) of **1s, 3s, 5s and 10s**. I've also specified that I would\nlike `safely` to retry **6 times**, but only 4 wait times were\nspecified. Safely will cycle back from the beginning of the sequence\nproducing effective waiting times of:\n\n    retry:     1     2     3     4      5     6\n    delay:   1000  3000  5000  10000  1000  3000\n             |---------------------|  |---------...\n       cycling back to the beginning of the sequence\n\nIn this way you can specify your custom values which better suit\nyour particular situation.\n\n\n### Errors logging\n\nOne common mistake is to have an empty `catch` block. The exception in this case\nis swallowed by the program without leaving any trace. There are very few\noccasions when this is a good idea; in most cases it is recommended to\nat least log the exception in a logging system.\n`safely` by default logs the exception with `timbre`. There are a few configurable\noptions which you can leverage to make the message more suitable for your situation.\n\nWe have:\n\n* `:message` to customize the log message and make it more meaningful\n  with information which pertains to the action you were trying to\n  achieve.\n* `:log-level` the level to use while logging the exception. The\n  default value is `:warn`, other possible values are: `:trace`,\n  `:debug`, `:info`, `:warn`, `:error` and `:fatal`\n* `:log-errors` (`true`|`false`) whether or not the error must be\n  logged. 
If you don't want to log exceptions in a particular block\n  you can disable it with: `:log-errors false`\n* `:log-stacktrace` (`true`|`false`) whether to report the full\n  stacktrace of the exception or omit it completely. (default `true`)\n* `:log-ns \"logger.name\"` To specify a logger name. Typically the\n  name of a namespace. When using the macro it defaults to the current\n  namespace, when using the function version it defaults to `safely.log`\n\nFor example, this logs the exception with the given message and a log\nlevel of `:info`.\n\n```Clojure\n;; Customize logging\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :message \"Error while fetching active users\"\n  :log-level :info)\n```\n\nIn this case we disable the error logging for the given block.\n\n```Clojure\n;; Disable logging\n(safely\n  (Thread/sleep 3000)\n  :on-error\n  :log-errors false)\n```\n\n\nIt is possible to control the logging of the individual attempts\nby setting the following options:\n   - `:log-inner-errors`\n   - `:log-inner-level`\n   - `:log-inner-stacktrace`\n   - `:log-inner-ns`\n\nIf no value is provided, the `:log-inner-*` options default to the\nvalue of the corresponding `:log-*` options. 
They are useful to reduce the log\nnoise on individual attempts.\n\nFor example:\n\n```Clojure\n;; Customize logging\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :message \"Error while fetching active users\"\n  :max-retries 3\n  :log-level :error\n  :log-inner-level :debug)\n```\n\nThis will log the errors of individual attempts at `:debug` level, but\nshould all the attempts up to `:max-retries` be exhausted, then\nthe final error is logged at `:error` level.\n\n### Automatic tracking (monitoring)\n\nIf you have (and you should) a monitoring system which tracks application\nmetrics as well then you can track automatically how many times a\nparticular section protected by safely is running into errors.\n\nTracking is enabled by default, but if you wish to disable it, set:\n\n* `:tracking :disabled` *(default `:enabled`)*\n  Whether to enable or disable tracking.\n\nIf you want to track a particular section, all you need to do is to\ngive a name to the section you are protecting with safely with:\n\n* `:track-as ::action-name`\n  Will use the given keyword or string as name for the event. Use\n  names which clearly specify which part of your code\n  you are tracking, for example: `::db-save` and `::fetch-user` clearly\n  specify which action is currently failing. Use namespaced keywords,\n  or fully-qualified actions \"mymodule.myaction\" for avoiding\n  name-conflicts.  Use `mulog/set-global-context!` to add general info\n  such as application name, version, environment, host etc. The tracking\n  is done via [***μ/log***](https://github.com/BrunoBonacci/mulog).\n  If `:track-as` is not provided, its source code location will be\n  used instead. 
_All `safely` blocks are tracked by default._ If you\n  put `:track-as nil` the tracking event won't be collected, but\n  the tracking context will be created.\n\nFor example:\n\n```Clojure\n;; Automatic retry with random-range\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 3\n  :retry-delay [:random-range :min 2000 :max 5000]\n  :track-as ::fetch-active\n  :circuit-breaker :fetch-active-users)\n```\n\nThis will track the call events providing interesting\ninformation about this single block and publish them to a variety of\nmonitoring systems.\n\n* `:tracking-tags [:key1 :val1, :key2 :val2, ...]` *(default `[]`)*\n   A vector of key/value pairs to include in the tracking event.\n   They are useful to give more context to the event, so that\n   when you read the event you have more info.\n\n   Example:\n   `:tracking-tags [:batch-size 30 :user user-id]`\n\n* `:tracking-capture (fn [result] {:k1 :v1, :k2 :v2})` *(default `nil`)*\n   A function which receives the result of the evaluation and\n   captures some information from it.  This is useful, for\n   example if you want to capture the http-status of a remote call.  It\n   returns a map or `nil`, the returned map will be merged with the\n   tracking event.\n\n   Example:\n   `:tracking-capture (fn [r] {:http-status (:http-status r)})`\n\n\nFor more information you can see the [tracking](./doc/tracking.md)\npage.\n\nWith [***μ/trace***](https://github.com/BrunoBonacci/mulog#%CE%BCtrace)\nyour `safely` expressions turn into traces which you can visualise\nwith [OpenZipkin](https://zipkin.io/) compatible tracers.\n\nHere is one example:\n\n![mulog tracing](./doc/images/mulog-tracing.png)\n\n\n### Circuit breaker.\n\nThe circuit breaker functionality (introduced in v0.5.0) was\npopularised by [M. T. 
Nygard's book \"Release\nIt!\"](https://books.google.co.uk/books?id=md4uNwAACAAJ) and\n[2nd ed.](https://books.google.co.uk/books?id=Ug9QDwAAQBAJ).\nThere is already a good number of open-source libraries which offer\nquite good implementations of circuit-breakers as defined by\nNygard. The most popular is\n[Hystrix](https://github.com/Netflix/Hystrix) from Netflix.  However,\nHystrix over the years became an unnecessarily huge library.  `safely`\noffers an implementation of the same ideas in a much simplified way\nand 100% Clojure (for JVM).\n\nIf you want to know more about the general idea behind the circuit\nbreaker I would recommend the book \"Release It!\" mentioned above. Here\nI'm going to describe how the `safely` implementation works.\n\nInternally the circuit breaker is a state machine which looks like\nthis:\n\n![circuit breaker state machine](/doc/images/circuit-breaker-sm.png)\n\nThe state machine is initiated with the `:closed` state. Like an\nelectrical circuit, a _closed_ circuit is a working circuit in which\nthe current can flow through.\n\n#### **`:closed` state**\n\nIn this state the circuit breaker allows all the\nrequests to pass. So when a new request is issued, the circuit breaker will\nretrieve the dedicated thread pool associated with this request type\nand enqueue the new request. Once enqueued an available thread will\npick the request and process it. When the request is completed then\nthe circuit breaker will update its internal state capturing the\noutcome of each request. In this case one of the following things can\nhappen:\n\n  - **the request is successful**, then the result from the processing\n    thread is returned to the caller.\n  - **the request processing fails with an error**, in this case the\n    error is propagated back to the caller and further retries could\n    be made depending on whether they are configured and within the\n    limit. 
If the limit of retries in `:max-retries` is reached then the\n    `:default` value is returned when provided, otherwise the error itself is raised.\n  - **the request times out**, if the request has a configured\n    `:timeout` and the processing isn't completed within this time, an\n    exception is raised and it follows the same path as the\n    processing error.\n  - **all threads are busy and no more requests can be enqueued**, in\n    this case the request is rejected (`:queue-full`) and the same\n    error handling path with retries is used.  The number of threads and\n    the queue size are configurable parameters.  More on how to size\n    them properly later.\n\nFor any of the above outcomes the circuit breaker state machine updates\na counter. Only counters for the last few seconds are kept and they are\nused by the state evaluation function to determine whether the circuit breaker\nshould be tripped and move to the next state.\n\nCurrently the following strategies are available to trip the circuit breaker:\n\n  - **:failure-threshold** which looks at the counters and trips\n    the circuit open when a configurable threshold of failing requests\n    is reached. It will wait until at least 3 requests have been\n    processed before verifying the threshold. Very simple and\n    effective.\n\n\n#### **`:open` state**\n\nIf the state evaluation function decides to trip the circuit open\nbecause too many errors occurred, then the circuit breaker state\nmachine goes into the `:open` state. 
In this state all incoming\nrequests are rejected immediately with a `:circuit-open` error and the\nstandard error path with retries is followed.\n\nThis is useful to immediately reduce the load on the target system.\nThe circuit stays open for a few seconds (according to\n`:grace-period`) and then the circuit automatically transitions to the\n`:half-open` state.\n\n#### **`:half-open` state**\n\nThe purpose of this state is to assess whether the target system is\nback to normal before closing the circuit again and allowing all the\nrequests. So for this purpose the circuit breaker allows only a few\nrequests to pass and it checks their outcome. If the system keeps\nfailing then the circuit goes back to the `:open` state, if the\nrequests are now successful and the issue seems to be resolved then\nthe circuit goes back to the `:closed` state.\nThe same evaluation function used to trip the circuit open is used\nto evaluate whether the system is now back to normal.\n\nDuring the `:half-open` state, only a part of the incoming requests\nwill be allowed. The number of the requests allowed depends on the\n`:half-open-strategy`.\n\nThese are the currently supported strategies:\n\n  - **`:linear-ramp-up`**, it will ramp up the number of requests\n  allowed in the circuit breaker over time. The length of time is\n  configurable via `:ramp-up-period`. The system will go back to\n  closed only after the `:ramp-up-period` has elapsed, however if it\n  detects failures during the ramp up it will preemptively open the\n  circuit again.\n\n\n#### How to use the circuit-breaker\n\nTo activate the circuit breaker function just add the `:circuit-breaker`\noption in your `safely` options:\n\n```Clojure\n;; activating circuit breaker\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  ;; give a name to the circuit-breaker\n  :circuit-breaker :fetch-active-users\n  ;; optionally set a timeout for this operation (millis)\n  :timeout 30000)\n```\n\nThat's it! 
`safely` in the background will create a thread pool named\n`:fetch-active-users` which will be in charge of processing the\nrequests. You can use the circuit breaker in conjunction with all\nother safely options such as retry strategies, logging and tracing.\n\n_**NOTE**: for every unique value passed to `:circuit-breaker` a\nnumber of resources need to be created in the system, namely the\nthread-pool and the circuit-breaker state machine. Therefore you must\nensure that the values passed to the `:circuit-breaker` options are\n**not randomly generated or high cardinality** to avoid the risk of\nrunning out of memory in your system. Best practice is to name the\ncircuit breaker after the operation that it is trying to accomplish._\n\n\n#### Circuit breaker functions\n\n##### `shutdown-pools`\n\nFor every named circuit breaker, `safely` will create its own\ndedicated thread pool. If you wish to shut down the pool\nprogrammatically then you can call the `shutdown-pools` function\nwith a specific circuit breaker name or without parameters\nto shut all of them down.\n\n##### `circuit-breaker-info`\n\nIf you want to access the info stored in the state machine\nfor monitoring purposes then you can use the `circuit-breaker-info`\nfunction with a circuit breaker name for the state regarding the\nspecific circuit breaker or without parameters for all.\n\n#### How to size the thread pool\n\nYou might think that a thread pool of 10 is very small for your\nsystem, and you might be tempted to increase this number by one order\nof magnitude.  Although sometimes this is the correct thing to do,\nmost of the time it won't be. The defaults are already set for large\nvolume systems so most of you won't need to change the size of the\nthread pool and/or the queue length.  
However if you think you should\nchange these values for your system I would recommend using\n[Little's Law](http://web.mit.edu/~sgraves/www/papers/Little's%20Law-Published.pdf)\n(from Queueing Theory) to choose the correct size.\n\n_Little's Law_ says that the long term average number of items `L`\nin your system is equal to the average arrival rate `λ` multiplied by\nthe long term average time `W` required to process an item, therefore:\n\n![Little's Law](/doc/images/LittleLaw.png)\n\nThe interesting property of _Little's Law_ is that it applies\nto the whole system as well as its individual parts.  This means that\nthis law will apply to your system as a whole, meaning all the\ninstances of your system in the cluster, as well as the individual\ninstances. Moreover, if your single instance has two possible paths\nwith two different probabilities, it will apply to these sub-parts as\nwell with the parameters adjusted accordingly.\n\nFor example if you have a system which processes 5000 requests/second\nas a whole, and you have 15 instances to serve these requests,\nand each request takes on average 25 milliseconds, then we can reason\nas follows:\n\n  * `λ = 5000 rq/s`\n  * `W = 25 millis -\u003e 0.025s`\n  * then we can deduce that `L` for the whole system is going to be:\n  * `L = λW -\u003e 5000 rq/s * 0.025 s -\u003e L = 125`\n  * So it means that the whole system will have an average of `125`\n    concurrent requests when processing `5000 rq/s`.\n  * Since every instance follows Little's Law as well and\n    since all the instances typically have the same probability\n    to get a request (via a load balancer), then it is safe\n    to assume that every instance will have the same share of traffic.\n    Since we have *15 instances* then we can say that:\n  * `Li = L / 15 -\u003e 125 / 15 -\u003e Li = 8.33` where `Li` is the load of a\n    single instance.\n\nAs you can see although your system as a whole processes a lot of\nrequests per 
second, the _concurrent load `Li`_ of an individual instance\nwill be within the range of the thread pool. If we size the thread\npool a bit larger to cope with request bursts and we add a small\nqueue, typically 30%-50% of the thread pool size, we can ensure that\noccasional hiccups and bursts of requests are handled properly without\ncausing the circuit breaker to trip.\n\nI hope this small guide helps you to correctly size your system.\nAnyway, always use measurements (tracking, monitoring) to compute\nthe right size and verify your changes against your assumptions\nto see if the change had the effect you hoped for.\n\n\n### Macro vs function\n\n`safely` is a Clojure macro which wraps your code with a try/catch\nand offers an elegant declarative approach to error\nmanagement. However in many cases a macro can't be used easily; for this\nreason we provide a function as well.\n\nEverything you can do with the macro `safely` you can do with the\nfunction `safely-fn` which takes a **thunk** (a function with zero\narguments) and the same options that `safely` takes after the\n`:on-error` clause.\n\nSo for example this is the use of the macro you have seen so far:\n\n```Clojure\n;; Automatic retry with random-range\n(safely\n  (http/get \"http://user.service.local/users?active=true\")\n  :on-error\n  :max-retries 3\n  :retry-delay [:random-range :min 2000 :max 5000])\n```\n\nThis is the same example **but with the `safely-fn` instead**:\n\n```Clojure\n;; Automatic retry with random-range\n(safely-fn\n  (fn []\n    (http/get \"http://user.service.local/users?active=true\"))\n\n  :max-retries 3\n  :retry-delay [:random-range :min 2000 :max 5000])\n```\n\n_Note the use of the **thunk** to wrap the code and the absence of the\n `:on-error` keyword._\n\n\n### Testing and the `sleepless-mode`\n\nIf you are writing automated tests but you don't want to wait then\nyou can enable the **sleepless-mode** in order to skip the waiting\ntimes of the retries. For example:\n\nThis might 
wait up to 40s before returning \"\".\n\n```Clojure\n;; this might wait up to 40s before returning \"\"\n(safely\n  (slurp \"/not/existing/file\")\n  :on-error\n  :max-retries 5\n  :default \"\")\n```\n\nThis one does the same number of retries but doesn't sleep and it\nreturns immediately (same code path, but no sleep).\n\n```Clojure\n;; This one does the same number of retries but doesn't sleep\n(binding [safely.core/*sleepless-mode* true]\n  (safely\n    (slurp \"/not/existing/file\")\n    :on-error\n    :max-retries 5\n    :default \"\"))\n```\n\n\n## License\n\nCopyright © 2015-2024 Bruno Bonacci\n\nDistributed under the Apache License v 2.0 (http://www.apache.org/licenses/LICENSE-2.0)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrunobonacci%2Fsafely","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrunobonacci%2Fsafely","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrunobonacci%2Fsafely/lists"}