https://github.com/andreacrotti/clojure-debugging
https://github.com/andreacrotti/clojure-debugging
Last synced: 10 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/andreacrotti/clojure-debugging
- Owner: AndreaCrotti
- Created: 2023-03-07T13:45:06.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-03-22T13:40:47.000Z (almost 3 years ago)
- Last Synced: 2025-02-01T09:13:16.960Z (11 months ago)
- Language: Clojure
- Size: 1.81 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.org
Awesome Lists containing this project
README
#+AUTHOR: Andrea Crotti
#+REVEAL_THEME: dracula
#+REVEAL_TRANS: fade
#+REVEAL_SPEED: fast
#+REVEAL_TOC: nil
#+OPTIONS: num:nil ^:nil tex:t toc:nil reveal_progress:t reveal_control:t
#+reveal_overview:t
#+title: Problem Solving
#+subtitle: For Software Engineers
* Problem solving
#+begin_notes
As software engineers, we are constantly solving problems. They can be as hard
as a subtle concurrency bug that takes you a week to find, to just understanding why
a library is not behaving as you think it should.
This talk mostly concentrates on the hard problems and suggests a method about how to tackle them.
As a disclaimer I'm in no way an expert problem solver, but I wanted to give this talk is because in my career I've spent an enormous amount of time solving hard problems, and in retrospect with a more methodic approach I would have been a lot faster and not stress as much.
As you will notice there are lots of Sherlock Holmes quotes in the slides, because I think that there is lots in common between solving murders and solving difficult software problems.
Fun fact about the "Elementary, my dear Watson", these exact words are never used in Holmes stories, but it was paraphrased from the "Crooked Man", and it was “‘Excellent!’ I [Watson] cried. ‘Elementary,’ said he.”
#+end_notes
#+begin_quote
Elementary, my dear Watson' ~ Sherlock Holmes
#+end_quote
* The method
#+begin_notes
There are lots of books about problem solving, where probably the most famous is "How to solve it" a small volume written in 1945, which I read ages ago and gave me the idea for this talk.
That book in particular is quite focused on solving math problems, in this talk I just want to focus on software development so I just came up with some steps myself, mostly from personal experience.
The steps I'm suggesting won't apply to all the situations in the same way, but hopefully some main concepts will be general enough.
#+end_notes
** Don't
#+begin_notes
I will start with the things you should not do first.
Specially when the problem you are trying to solve is affecting production you might panic and just try to rush to find a solution.
Unfortunately that pretty much never helps and you if you don't think clearly you might just end up wasting a lot of time following the wrong lead, and make the situation even worse.
For the same reason you should not try to guess without enough evidence.
#+end_notes
- don't panic
- don't rush
- don't guess
** Describe the problem
#+begin_notes
The first phase is actually describing the problem and collecting all the evidence.
This is probably the most important phase and maybe the most neglected one, even by senior developers.
In this phase you should collect all the evidence and write down a detailed description of the problem.
#+end_notes
#+begin_quote
“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay.” ~ Sherlock Holmes
#+end_quote
- gather as much data as possible
- when did the problem happen first?
- how often is it happening?
- what did we change?
- have any of the external dependencies changed?
- write it all down
** Reproduce
#+begin_notes
This phase is maybe the hardest one, since sometimes you can't really reproduce easily a production issue.
It's also very important though because if you can't reproduce a problem, it will be a lot harder to know if you actually fixed it or not.
So try to reproduce the problem locally, even better if you can write a failing test.
If it turns out that you really can't reproduce the problem at all then that's also very valuable information, since it probably means that it's not something under your control that's causing the software to misbehave.
#+end_notes
#+begin_quote
‘You see, but you do not observe. The distinction is clear.’ ~ Sherlock Holmes
#+end_quote
- can you reproduce the problem?
- can you write a failing test?
** Fix it
#+begin_notes
At this stage hopefully we know what the problem is, and we can really go about at fixing it.
This phase can take quite a long time depending on how easily you can reproduce the problem, but the main takeaways here are that you really need to change one thing at a time.
In this phase is also very important to keep track of all the attempts you've made, specially if it's an extremely difficult problem to solve and can take days, you'll be grateful to have it all written down.
#+end_notes
#+BEGIN_QUOTE
“When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.” ~ Sherlock Holmes
#+END_QUOTE
- change *ONE VARIABLE AT A TIME*
- keep track of all your attempts
- test your fix in the real environment
** Are we done yet?
#+begin_notes
Assuming you fixed the problem and deployed it successfully, are we done yet?
In this phase you should have think if you've really done everything you could to make sure the problem you solved won't happen again.
This might involve writing more tests, add some extra assertions or even extra documentation if there are no code changes that can really help anymore.
#+end_notes
#+begin_quote
“Those who cannot remember the past are condemned to repeat it.” ~ George Santayana
#+end_quote
- is the problem really gone?
- are you sure it can't happen again?
- what else can you do to make the system more resilient/transparent?
* Examples
** A slow request
#+begin_notes
So the first example of problem solving was an issue we had while moving a project from a dedicated VM to the internal cloud platform.
After doing that on UAT all the performance tests started to fail miserably and the API just became way too slow.
No code was changed, so the only explanation was the actual infrastructure move, but was that the problem?
Yes and no, the API got slower because of that change, but it turns out that we could not do anything about it, however after some profiling we found out that 90% of the time was actually used parsing strings into datetimes.
Just a one line change adding an actual format made some requests that were taking minutes just take seconds, so even faster than before moving to the internal cloud.
#+end_notes
- moved an API from VM to internal cloud
- the API got unbearably slow
- no code changed
- what is going on?
#+REVEAL: split
#+begin_src clojure
(declare heavy-transformations)
(defn do-lots-of-smart-stuff [request]
(let [;; slow
ts (cf/parse (-> request :json-params :ts))
;; fast
;; ts (cf/parse (-> request :json-params :ts) (cf/formatter "YYYY-MM-DD"))
]
(heavy-transformations (request))))
#+end_src
|--------------------------------------------+---------|
| fn | max time |
|--------------------------------------------+---------|
| :clojure-debugging.speedy/defn_without-fmt | 21.22μs |
| :clojure-debugging.speedy/defn_with-fmt | 53.34μs |
#+REVEAL: split
#+begin_src clojure
(defn parse
"Returns a DateTime instance in the UTC time zone obtained by parsing the
given string according to the given formatter."
(^org.joda.time.DateTime [^DateTimeFormatter fmt ^String s]
(.parseDateTime fmt s))
(^org.joda.time.DateTime [^String s]
(first
(for [f (vals formatters)
:let [d (try (parse f s) (catch Exception _ nil))]
:when d] d))))
#+end_src
** A question of space
#+begin_notes
#+end_notes
- kafka connect workers stop working
- no code changed
- no useful logs anywhere
- the process is still running, it just hangs forever
#+REVEAL: split
#+begin_src clojure
(defn store-files!
[{:keys [file-writing-pool] :as ctx}
{:keys [batch-id] :as batch}]
(log/log "Storing files" {:batch-id batch-id})
(cp/future file-writing-pool
(println "writing out these files")))
#+end_src
* Conclusions
#+begin_notes
I'll just leave you with a last quote from an Italian comedian, that more or less corresponds to "The answer inside you, but it's wrong".
Hopefully next time you are facing a hard problem to debug you'll think What Would Holmes Do in this situation.
#+end_notes
#+begin_quote
"The answer is inside you, but it's wrong" ~ Quelo
#+end_quote
*WWHD* (What Would Holmes Do)