https://github.com/guettli/programming-guidelines


# Programming Guidelines

My opinionated programming guidelines.

[1. Introduction](#1-introduction)

[2. Data structures](#2-data-structures)

[3. Dev](#3-dev)

[4. Remote APIs](#4-remote-apis)

[5. Op](#5-op)

[6. Networking](#6-networking)

[7. Monitoring](#7-monitoring)

[8. Communication with others](#8-communication-with-others)

[9. Epilog](#9-epilog)

## 1. Introduction

### About this README

I was born in 1976. I started coding with BASIC and assembler when I was
13, later Turbo Pascal. From 1996 to 2001 I studied computer science at
HTW-Dresden (Germany). I learned Shell, Perl, Prolog, C, C++, Java, PHP, and finally Python.

Sometimes I see young and talented programmers wasting time. There are
two ways to learn: Make mistakes yourself, or learn from the mistakes
other people made.

This list summarises a lot of mistakes I made in the past. I wrote it to
help you avoid these mistakes.

It's my personal opinion and feeling. No facts, no single truth.

### I need your feedback

If you have a general question, please start a [new discussion](https://github.com/guettli/programming-guidelines/discussions/new).

If you think something is wrong or missing, feel free to open an issue or pull request.

### Relaxed focus on your monitor

Do not look at the keyboard while you type. Have a relaxed focus on your
monitor.

I type with ten fingers. It's like flying once you have learned it. Your eyes
can stay on the rubbish you type, and you don't need to move your eyes
down (to keyboard) and up (to monitor) several hundred times per day.
This saves a lot of energy. This is a simple tool to help you learn touch typing:
[tipp10](https://www.tipp10.com/en/)

Measure your typing speed: [10fastfingers.com](https://10fastfingers.com/)

Avoid switching between mouse and keyboard too much.

I like Lenovo keyboards with track point. If you want more grip, then
read [Desktop Tips "Keyboard"](https://github.com/guettli/desktop-tips-and-tricks/blob/master/README.md#keyboard)

Once I was fascinated by the copy+paste history of Emacs and PyCharm.
But then I thought to myself: "I want more. I am hungry. I want a
copy+paste history not only in one application, but I also want it for the whole
desktop". The solution is very simple, but somehow only a few people use
it. The solution is called a clipboard manager. I use [CopyQ](https://hluk.github.io/CopyQ/). I use ctrl+alt+v to open the list of last
copy+paste texts. CopyQ supports regex searches in the history.

### Avoid searching with your eyes

Avoid searching with your eyes. Search with the tools of your IDE. You
should be able to use it "blind". You should be able to move the cursor
to the matching position in your code without looking at your keyboard,
without grabbing your mouse/touchpad/TrackPoint and without looking
up/down on your screen.

Compare two files with a diff tool, otherwise, you might get this ugly skeptical frown.

How often per day do you search for the mouse cursor on your screen?
Support your eyes by increasing the cursor size. If you use Ubuntu,
you can do it via [Universal Access / Cursor Size](https://askubuntu.com/questions/1266951/increase-mouse-cursor-size-on-ubuntu-20-04/1266961#1266961)

### Increase font size

During daily work, you often jump from one information snippet to the next
information snippet.

When was the last time you read a text with more than 20 sentences?

I think from time to time you should do so. Slow down, focus on one
text, and read slowly. It helps to increase the font-size. `ctrl-+` is
your friend.

### KISS

Keep it simple and stupid. The most boring and most obvious solution is
often the best. Although it sometimes takes months until you know which
solution it is.

From the book "Site Reliability Engineering" (O'Reilly Media 2016)

> **The Virtue of Boring**
>
> Unlike just about everything else in life, "boring" is a
> positive attribute when it comes to software! We don’t want our programs to be spontaneous and interesting; we want them to stick to the script and predictably accomplish their business goals.

Example: [Pure Functions](https://en.wikipedia.org/wiki/Pure_function) are great. They are stateless, their output can be cached forever, they are easy to test.
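
A minimal sketch of why pure functions are easy to cache and test. The function `net_to_gross` and its numbers are made up for illustration. Because the result depends only on the arguments, `functools.lru_cache` can safely reuse it:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def net_to_gross(net_cents: int, vat_percent: int) -> int:
    """Pure function: the result depends only on the arguments."""
    return net_cents + net_cents * vat_percent // 100


first = net_to_gross(1000, 19)   # computed
second = net_to_gross(1000, 19)  # served from the cache
info = net_to_gross.cache_info()
```

Testing needs no setup either: call the function, compare the result.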

### Increase the obviousness

But it is not only about code. It is about the experience of all stakeholders: Users, salespeople, support hotline, developers,...

It is hard work to keep it simple.

One thing I love to do: "Increase the obviousness".

One tool to get there: Use a central wiki (without spaces), and
define terms. Related text from me: [Documentation in Intranets: My point of view](https://github.com/guettli/intranets)

### Avoid redundancy

See heading.

### Premature optimization is the root of all evil.

The famous quote "premature optimization is the root of all evil." is true.
You can read more about this here [When to optimize](https://en.wikipedia.org/wiki/Program_optimization#When_to_optimize).

### MVP

You should know what an [MVP (minimum viable product)](https://en.wikipedia.org/wiki/Minimum_viable_product) is. Building an MVP means bringing something usable to your customer, and then listening to their feedback. Care for their needs, not for your vision of a super performant application.

Avoid i18n in an MVP. German is my mother tongue. If I develop an MVP for German users, then I won't do i18n. This can be done later, if needed.

------------------------------------------------------------------------

## 2. Data structures

### Introduction

"Bad programmers worry about the code. Good programmers worry about data
structures and their relationships." -- Linus Torvalds (creator and
developer of the Linux kernel and the version control system git)

### Cache vs Database

There is a fundamental fact which you need to understand: The difference between
a cache and a database.

Remember the basic Input-Process-Output pattern.

In a cache you store data which is **output**. That's handy since you can access the output
without doing the processing again. But cache invalidation is hard. Maybe
the input has changed, and the value in the cache is outdated? Who knows?
If possible, avoid caching: without a cache you can never serve outdated data.
You don't need to back up your cache data. You can create it again.

In a database you store data which is **input**. Usually it was entered by a human
by hand, or generated by measuring some real-world data. You can use the data
in the database to create a nice HTML page. It is important to back up your valuable
database data, since you can't create it again. The generated output (HTML, JSON, ...)
has no value.

Data which is input usually has value. Data which is output has only little value,
since you can re-create it.

### Relational Database

I know SQL is..... It is either obvious or incomprehensible. And, yes, it is
boring.

A relational database is a rock-solid data storage. Use it.

When I studied computer science, I disliked SQL. I thought it was an
outdated solution. I tried to store data in files in XML format, used
an in-memory Berkeley DB, used an object-oriented database written in Python (ZODB),
and used NoSQL. Finally, I realized that boring SQL is the best solution
for most cases.

I use PostgreSQL.

I don't like NoSQL, except for caching (simple key-value DB).

The [PostgreSQL Documentation](https://www.postgresql.org/docs/current/index.html) contains
an introduction to SQL and is easy to read.

If you want to share small SQL snippets, you can use https://dbfiddle.uk/

### Cardinality

It does not matter how you work with your data (struct in C, classes in
OOP, tables in SQL, ...). Cardinality is very important. Using 0..\* is
often easier to implement than 0..1. The first can be handled by a
simple loop. The second is often a nullable column/attribute. You need
conditions (IFs) to handle nullable columns/attributes.

If this is new to you, I will give you two examples:

- 1:N --> One invoice has several invoice positions. For example,
you buy three books in one order, the invoice will have three invoice positions. This is a 1:N relationship. The invoice position is contained in exactly one invoice.
- N:M --> If you look at tags, for example at the Question+Answer
site StackOverflow: One question can be related to several
tags/topics and of course a topic can be set on several questions.
For example, you have a strange UnicodeError in Python then you can set the tags "python" and "unicode" on your question. This is an N:M
relationship. One well-known example of N:M is users and groups.
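
A small sketch of the difference between 0..\* and 0..1 in code. The `Invoice` class and the `discount_code` attribute are hypothetical. The list is handled by a loop that also covers the empty case, while the optional attribute needs a condition:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Invoice:
    number: str
    positions: List[str] = field(default_factory=list)  # 0..*
    discount_code: Optional[str] = None                 # 0..1


def describe(invoice: Invoice) -> List[str]:
    lines = [f"Invoice {invoice.number}"]
    # 0..*: a plain loop also covers "no positions".
    for position in invoice.positions:
        lines.append(f"- {position}")
    # 0..1: needs an explicit condition.
    if invoice.discount_code is not None:
        lines.append(f"Discount: {invoice.discount_code}")
    return lines
```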

### Conditionless Data Structures

If you have no conditions in your data structures, then the coding for
the input/output of your data will be much easier.

### Avoid nullable Foreign Keys

Imagine you have a table "meeting" and a table "place". The table
"meeting" has a ForeignKey to table "place". In the beginning, it might
be not clear where the meeting will be. Most developers will make
the ForeignKey optional (nullable). WAIT: This will create a condition
in your data structure. There is a way easier solution: Create a place
called "unknown". Use this [sentinel value](https://en.wikipedia.org/wiki/Sentinel_value) as default. This data
structure (without a nullable ForeignKey) makes implementing the GUI
much easier.

In other words: If there is no NULL in your data, then there will be
fewer NullPointerExceptions in your source code while processing the data
:-)

Fewer conditions, fewer bugs.
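
The idea can be sketched like this. I use Python's built-in `sqlite3` module here only so that the example runs without a server; the table layout and the sample rows are assumptions. SQLite needs `PRAGMA foreign_keys = ON`, PostgreSQL enforces foreign keys by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE place (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Sentinel row: stands for "not decided yet".
    INSERT INTO place (id, name) VALUES (1, 'unknown');
    INSERT INTO place (id, name) VALUES (2, 'Room 101');

    CREATE TABLE meeting (
        id INTEGER PRIMARY KEY,
        subject TEXT NOT NULL,
        place_id INTEGER NOT NULL DEFAULT 1 REFERENCES place (id)
    );
    INSERT INTO meeting (subject) VALUES ('Planning');
    INSERT INTO meeting (subject, place_id) VALUES ('Retro', 2);
    """
)

# A plain inner join is enough, no NULL handling needed.
rows = conn.execute(
    "SELECT meeting.subject, place.name FROM meeting "
    "JOIN place ON place.id = meeting.place_id ORDER BY meeting.id"
).fetchall()
```

Every meeting has a place, so the code which renders the list does not need a special case for "no place yet".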

### Avoid nullable boolean columns

\[True, False, Unknown\] is not a nullable Boolean Column.

If you want to store data in a SQL database that has three states
(True, False, Unknown), then you might think a nullable boolean column
(here "my\_column") is the right choice. But I think it is not. Do you
think the SQL statement "select \* from my\_table where my\_column = %s"
works? No, it won't work since "select \* from my\_table where
my\_column = NULL" will never return a single row. If you don't
believe me, read: [Effect of NULL in WHERE clauses
(Wikipedia)](https://en.wikipedia.org/wiki/Null_(SQL)#Effect_of_Unknown_in_WHERE_clauses).
If you like typing, you can work-around this in your application, but I
prefer straightforward solutions with only a few conditions.

If you want to store True, False, Unknown: Use text, integer, or a new
table and a foreign key.
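
If you want to see the effect yourself, here is a small sketch with Python's built-in `sqlite3` module (chosen only because it runs without a server; it uses `?` instead of `%s` as placeholder). The table content is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_column BOOLEAN)")
conn.executemany(
    "INSERT INTO my_table (my_column) VALUES (?)",
    [(True,), (False,), (None,)],
)


def count_where_equals(value):
    query = "SELECT COUNT(*) FROM my_table WHERE my_column = ?"
    return conn.execute(query, (value,)).fetchone()[0]


count_true = count_where_equals(True)
count_false = count_where_equals(False)
count_unknown = count_where_equals(None)  # "= NULL" matches nothing
count_is_null = conn.execute(
    "SELECT COUNT(*) FROM my_table WHERE my_column IS NULL"
).fetchone()[0]
```

The same parameterized query works for True and False, but silently finds nothing for the third state. You need a second code path with `IS NULL`.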

### Avoid nullable characters columns

If you allow NULL in a character column, then you have two ways to
express "empty":

- NULL
- empty string

Avoid it if possible. In most cases, you just need one variant of
"empty". Simplest solution: do not allow NULL in columns holding character data.

If you think the character column should be allowed to be NULL (for example you want a unique, but optional identifier for rows),
then consider a constraint: If the character string in the column is not
NULL, then the string must not be empty. This ensures that there
is only one variant of "empty".
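
A sketch of such a constraint, again with the built-in `sqlite3` module so that it runs anywhere. The table `item` and the column `external_id` are hypothetical names. A `CHECK` constraint only fails if the expression is false, so NULL passes and the empty string does not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        external_id TEXT UNIQUE CHECK (external_id <> '')
    )
    """
)
conn.execute("INSERT INTO item (external_id) VALUES (NULL)")
conn.execute("INSERT INTO item (external_id) VALUES (NULL)")  # UNIQUE allows several NULLs
conn.execute("INSERT INTO item (external_id) VALUES ('abc')")

try:
    conn.execute("INSERT INTO item (external_id) VALUES ('')")
    empty_string_rejected = False
except sqlite3.IntegrityError:
    empty_string_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```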

### SQL: I prefer subqueries to joins

In most cases, I use an ORM to access data and don't write SQL by hand.

If I do write SQL by hand, then I often prefer [SQL Subqueries](https://en.wikipedia.org/wiki/SQL_syntax#Subqueries)
to SQL Joins.

Have a look at this example:
```
SELECT id, name
FROM products
WHERE category_id IN
(SELECT id
FROM categories
WHERE expired = True)
```
I can translate this to human language easily: Select all products, which
belong to a category that has expired.

### Use all features PostgreSQL does offer

If you want to store structured data, then PostgreSQL is a safe default
choice. It fits in most cases. Use all features PostgreSQL does offer.
Don't constrain yourself to use only the portable SQL features. It's ok
if your code does work only with PostgreSQL and no other database if
this will solve your current needs. If there is a need to support
other databases in the future, then handle this problem in the future,
not today. PostgreSQL is great, and you waste time if you don't use its
features.

Imagine there is a Meta-Programming-Language META (AFAIK this does
not exist) and it is an official standard created by the ISO (like SQL).
You can compile this Meta-Programming-Language to Java, Python, C, and
other languages. But this Meta-Programming-Language would only support
70% of all features of the underlying programming languages. Would it
make sense to say "My code must be portable, you must use META, you must
not use implementation-specific stuff!"?. No, I think it would make no
sense.

My conclusion: Use all features PostgreSQL has. Don't make your life more
complicated than necessary and don't restrict yourself to use only
portable SQL.

Great features PG has, which you might not know yet:

* [Insert/Update/Delete Trigger](https://www.postgresql.org/docs/current/sql-createtrigger.html)
* "SELECT FOR UPDATE .... SKIP LOCKED" gives you the perfect foundation for a task-queue. For example [Procrastinate](https://github.com/peopledoc/procrastinate)
* [PGAdmin](https://www.pgadmin.org/) nice GUI to configure your databases.
* [Fulltext Search](https://www.postgresql.org/docs/current/textsearch.html)

There is just one hint: Avoid storing binary data in PostgreSQL. An S3
service like [minio](https://min.io/) is a better choice.

### Where to not use PostgreSQL?

- For embedded systems SQLite may fit better
  - Prefer SQLite if there will only be one process accessing the database at a time. As soon as there are multiple users/connections,
    you need to consider going elsewhere
- TB-scale full-text search systems.
- Scientific number crunching:
[hdf5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format)
- Caching: Redis fits better
- Go with the flow: If you are wearing the admin hat (instead of the
dev hat), and you should install (instead of developing) a product,
then try the default DB (sometimes MySQL) first.

Source: PostgreSQL general mailing list:

### Transactions do not nest

I love nested function calls and recursion. This way you can write
easy-to-read code. For example, recursion in quicksort is great.

Nested transactions ... sounds great. But stop: What is
[ACID](https://en.wikipedia.org/wiki/ACID) about? This is about:

- Atomicity
- Consistency
- Isolation
- Durability

Database transactions are atomic. If the transaction was successful,
then it is **D**urable.

Imagine you have one outer-transaction and two inner transactions.

1. Transaction OUTER starts
2. Transaction INNER1 starts
3. Transaction INNER1 commits
4. Transaction INNER2 starts
5. Transaction INNER2 raises an exception.

Is the result of INNER1 durable or not?

Conclusion: Transactions do not nest

Related:

The "partial transaction" concept in PostgreSQL is called savepoints.
They capture
linear portions of a transaction's work. Your use of them may be able to
express a hierarchical expression of updates that may be preserved or
rolled back, but the concept in PostgreSQL is not itself hierarchical.
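
The OUTER/INNER1/INNER2 scenario from above can be sketched with savepoints. I use Python's built-in `sqlite3` module as a stand-in, since it needs no server; `SAVEPOINT`, `RELEASE SAVEPOINT` and `ROLLBACK TO SAVEPOINT` exist in PostgreSQL, too. The table `log` is made up:

```python
import sqlite3

# isolation_level=None: no implicit BEGIN, transactions are controlled by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("BEGIN")                          # OUTER starts
conn.execute("INSERT INTO log VALUES ('outer')")
conn.execute("SAVEPOINT inner1")               # INNER1 starts
conn.execute("INSERT INTO log VALUES ('inner1')")
conn.execute("RELEASE SAVEPOINT inner1")       # INNER1 "commits", nothing is durable yet
conn.execute("SAVEPOINT inner2")               # INNER2 starts
conn.execute("INSERT INTO log VALUES ('inner2')")
conn.execute("ROLLBACK TO SAVEPOINT inner2")   # INNER2 fails, only its work is undone
conn.execute("RELEASE SAVEPOINT inner2")
conn.execute("COMMIT")                         # only now is anything durable

conn.execute("BEGIN")
conn.execute("SAVEPOINT inner1")
conn.execute("INSERT INTO log VALUES ('lost')")
conn.execute("RELEASE SAVEPOINT inner1")
conn.execute("ROLLBACK")                       # discards the released savepoint as well

messages = [row[0] for row in conn.execute("SELECT msg FROM log ORDER BY rowid")]
```

Releasing a savepoint is not a commit: whether the work of INNER1 survives is decided by the outer transaction alone.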

### My customer wants to extend the data schema...

Imagine you created some kind of issue-tracking system. Up until now, you provide attributes like "subject", "description", "datetime created",
"datetime last-modified", "tags", "related issues", "priority", ...

Now the customer wants to add some new attributes to issues. It would be quite easy for you to update
the database schema and update the code.

Maybe you are lucky and you have 100 customers. Then you would prefer to spend your time
improving the core product. You don't want to spend too much time on features which
only one customer wants.

Or the customer wants to update the schema on their own.

What can you do now?

One solution is EAV: The [Entity–attribute–value model](https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)
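
A minimal sketch of how EAV could look for the issue tracker, using the built-in `sqlite3` module. The table names, the attribute names and the helper `custom_attributes` are assumptions for illustration. Custom attributes become rows, so no schema change is needed for a new one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE issue (
        id INTEGER PRIMARY KEY,
        subject TEXT NOT NULL
    );
    CREATE TABLE issue_attribute (
        issue_id INTEGER NOT NULL REFERENCES issue (id),
        name TEXT NOT NULL,
        value TEXT NOT NULL,
        PRIMARY KEY (issue_id, name)
    );
    INSERT INTO issue (id, subject) VALUES (1, 'Printer is on fire');
    INSERT INTO issue_attribute VALUES (1, 'building', 'B7');
    INSERT INTO issue_attribute VALUES (1, 'cost_center', '4711');
    """
)


def custom_attributes(issue_id):
    """Returns the customer-defined attributes of one issue as a dict."""
    rows = conn.execute(
        "SELECT name, value FROM issue_attribute WHERE issue_id = ? ORDER BY name",
        (issue_id,),
    )
    return dict(rows)
```

The price in this sketch: all values are text, so the database no longer checks the type of a custom attribute.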

### Why I don't want to work with MongoDB

> MongoDB is a cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. ([Wikipedia](https://en.wikipedia.org/wiki/MongoDB))

One document in a collection can differ in its structure. For example, almost all documents in a collection have an integer value on the attribute "foo", but for unknown reasons, one document has a float instead of an integer. Grrr.

What does the solution look like?

```
return try {
    this.getLong(key)
} catch (e: ClassCastException) {
    if (this[key] is Double) this.getDouble(key).toLong() else null
}
```

No! I want a clear schema where all values in a column are of the same type.

Of course, my wish has a drawback: If you want to upgrade a table in a production relational database, you might have downtime, because the database needs some
minutes to convert all rows to the new schema. But at least in my context, this was never a big problem up until now.

Related: [StackOverflow "class java.lang.Double cannot be cast to class java.lang.Long"](https://stackoverflow.com/questions/65141475/mongodb-class-java-lang-double-cannot-be-cast-to-class-java-lang-long)

------------------------------------------------------------------------

## X. UI

### Mockups help

Start with painting. A [Mockup](https://en.wikipedia.org/wiki/Mockup#Software_engineering) helps.

If you improve an existing application, then take a screenshot and then paint it with expressive colors. I like #ff00ff.

If you are doing something from scratch, then create some slides, paint it roughly, add numbers to buttons and add a little
text on what should happen if someone pushes the button. Again, use expressive colors, so that it is easy to see what is ideation and
what is existing GUI.

You don't need expensive tools like Figma or InVision for this. Especially if I create something new, I like to do it on paper with a pencil and crayons.

Of course, the above hints make no sense if you write a device driver that has no graphical user interface.

### Faceted search

You should know this term: [Faceted search](https://en.wikipedia.org/wiki/Faceted_search)

### FTUE

[First-time user experience](https://en.wikipedia.org/wiki/First-time_user_experience) is very important. Does a user who has never used the application before understand it immediately?

### Don't make me think

[Don't make me think](https://en.wikipedia.org/wiki/Don%27t_Make_Me_Think) is the title of a book. I don't think it is necessary to read it. Just remember this title and try to create user interfaces that are easy to understand.

### SEO

I think the best docs about search engine optimization are from the company which creates the currently most popular internet search engine:

[developers.google.com/search](https://developers.google.com/search)

------------------------------------------------------------------------

## 3. Dev

### Input-Processing-Output

There are thousands of programming languages and thousands of ways to exchange data. But finally, it is one concept:

[Input-Processing-Output](https://en.wikipedia.org/wiki/IPO_model)

This holds whether you tell the navigation system of your car "Please show me the route to Casablance Pub, Leipzig" or write your first program which adds two integers and prints the result.

### Less code, fewer bugs

- Non-existent code is the best: Less code, fewer bugs
- Code maintained by a reliable upstream (like Python, PostgreSQL,
Django, Linux, Node.js, TypeScript, ...) is more reliable than my code.

### Avoid low-level stuff

For me this means avoiding: Assembler, C, C++, Rust, golang ...

These tools are great if you want maximum performance.

My goal is to create something useful. Maybe I optimize later.

### Fewer resources, fewer bugs

There are several ways to give data to a method.

Let's have a look at this simple method call: `my_method(some_string)`

You might think there is only one variable which gets accessed by the method?

Let's find more ways this method could get input:

* Environment variables: Maybe setting LANG=de_DE influences the output?
* Filesystem: Maybe the existence or content of a file in the local file system influences the method.
* `my_method()` could access a database, storage, or a cache to read additional data
* Maybe there is a global variable that contains a value that was set by a previous call to `my_method()`
* Maybe the date influences the method. Maybe the method creates a different output at a full moon.
* ...

AFAIK there is no clear name that distinguishes between explicit and implicit input.

You can't avoid implicit input, and it is 100% ok if it is obvious. If your method
should return the data of the user with the id 12345, then your code needs to access
the database.
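
A small sketch of the difference, with two made-up functions. The first one reads the system clock (implicit input), the second one gets the date as a parameter (explicit input):

```python
from datetime import date


def greeting_implicit(name: str) -> str:
    # Implicit input: the result silently depends on the system clock.
    if date.today().month == 12:
        return f"Merry Christmas, {name}!"
    return f"Hello, {name}!"


def greeting_explicit(name: str, today: date) -> str:
    # Explicit input: everything the result depends on is a parameter.
    if today.month == 12:
        return f"Merry Christmas, {name}!"
    return f"Hello, {name}!"
```

The explicit variant can be tested for any day of the year. The implicit variant can only be tested reliably by waiting for December or by patching the clock.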

If the same code works in one environment, but not in a different environment, and you don't know why,
then this tool might help: [dumpenv](https://github.com/guettli/dumpenv). It writes the environment to
a list of files, which you can compare with your favorite diff tool (e.g. Meld).

### Environment variables for configuration?

Environment variables are great for providing applications/containers values for database connection strings, URL
to a storage server ...

As soon as an environment variable is used in a condition like `if $FOO equals "BAR", then ... else ...`, then
it is some kind of magic input.

I prefer "clear" input: For an HTTP request this means the GET/POST data. Using HTTP headers is some kind of magic,
and should be avoided.

For commands called via the command line it is the same: I prefer command line arguments instead of environment variables.

Imagine you have a typo in the environment variable name. A dirty shell script will use an empty string and it is likely that
it will do something wrong. Compare this to a script that takes command line arguments: If you have a typo in the argument name, it will fail and tell
you that the given argument is unknown.

A shell script might make you faster during the first 10 minutes. But it will make you slower in the long run.

Writing a Python script with `argparse` takes longer, but will provide you much more reliability.
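
A minimal sketch of what this looks like. The script and its two arguments are hypothetical. The argument list is passed in explicitly, so the example does not depend on `sys.argv`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Hypothetical cleanup script",
        allow_abbrev=False,  # no guessing of abbreviated argument names
    )
    parser.add_argument("--directory", required=True, help="directory to clean")
    parser.add_argument("--max-age-days", type=int, default=7)
    return parser


def parse(argv):
    return build_parser().parse_args(argv)


args = parse(["--directory", "/tmp/example", "--max-age-days", "3"])

# A typo in an argument name is an immediate, loud error:
# argparse prints a usage message and exits with status 2.
try:
    parse(["--directory", "/tmp/example", "--max-age-dayz", "3"])
    typo_detected = False
except SystemExit as exc:
    typo_detected = exc.code == 2
```

You also get type conversion (`--max-age-days` is an `int`) and a `--help` output for free.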

I know [12factor App](https://12factor.net/):

> III. Config
>
> Store config in the environment

I agree with connection URLs and passwords/keys/tokens which connect the app to the environment. But if the
configuration influences the behaviour, then I think traditional configuration or configuration stored in a database
makes more sense.

For connection-URL and passwords the data type is easy: It is a string.

But if your configuration needs booleans or other data types, then environment variables are not well suited.
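
A sketch of the boolean problem. The variable name `FEATURE_X`, the dictionary `fake_environ` and the accepted spellings in `parse_bool` are assumptions. The dictionary stands in for `os.environ` so that the example does not depend on the real environment:

```python
def parse_bool(raw: str) -> bool:
    """Strict parser: unknown spellings raise instead of guessing."""
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


# Stand-in for os.environ: every value is a string.
fake_environ = {"FEATURE_X": "false"}

naive = bool(fake_environ["FEATURE_X"])         # True: any non-empty string is truthy
strict = parse_bool(fake_environ["FEATURE_X"])  # False
```

Every application ends up writing its own `parse_bool`, and each one accepts slightly different spellings.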

### Zen of Python

[Zen of Python](https://www.python.org/dev/peps/pep-0020/) (Written by
Tim Peters in the year 1999)

- Beautiful is better than ugly.
- Explicit is better than implicit.
- Simple is better than complex.
- Complex is better than complicated.
- Flat is better than nested.
- Sparse is better than dense.
- Readability counts.
- Special cases aren't special enough to break the rules.
- Although practicality beats purity.
- Errors should never pass silently.
- Unless explicitly silenced.
- In the face of ambiguity, refuse the temptation to guess.
- There should be one-- and preferably only one --obvious way to do it.
- Although that way may not be obvious at first unless you're Dutch.
- Now is better than never.
- Although never is often better than *right* now.
- If the implementation is hard to explain, it's a bad idea.
- If the implementation is easy to explain, it may be a good idea.
- Namespaces are one honking great idea -- let's do more of those!

In the year 2001, I knew these programming languages: Basic, Pascal,
Assembler, C, C++, Prolog, Lisp, Visual Basic, Java, JavaScript, Tcl/Tk,
Perl.

I was unhappy with all of them and looked for a new language. I narrowed
down the languages I was interested in, and there were two choices left.
One was Ruby, the other was Python. I chose Python. It looked simpler,
like executable pseudo-code. Since 2001 I have used it nearly every work-day.
I like it, and till now, no other language attracts me.

I am not married to Python. I am willing to change. But the next
language needs to be better. Up until now, I see no alternative.

JavaScript has a big benefit, that it can be executed in the browser.
But I don't like it. Why don't I like it? I don't know. Sometimes
feelings are more important than facts.

### CRUD --> CRD

In most cases, the software does create, read, update, delete data. See
[CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete)

The "update" part is the most difficult one.

Sometimes CRD helps: Do not implement the update operation. Use
delete+create. But be sure to use transactions to avoid data loss, if your
data storage supports this:
"BEGIN; DELETE ...; INSERT ...; COMMIT;"

Translating to SQL terms:

| CRUD Term |SQL |
| ------------|-----------------------------------|
| create |insert into my\_table values (...) |
| read |select ... from my\_table |
| update |update my\_table set col1=... |
| delete |delete from my\_table where ... |
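
The delete+create pattern inside one transaction can be sketched like this, using the built-in `sqlite3` module. The table `tag` and the helpers `replace_tags` and `tags` are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (issue_id INTEGER NOT NULL, name TEXT NOT NULL)")
conn.executemany("INSERT INTO tag VALUES (?, ?)", [(1, "python"), (1, "unicode")])
conn.commit()


def replace_tags(issue_id, names):
    # "with conn" commits on success and rolls back on an exception,
    # so DELETE and INSERT happen together or not at all.
    with conn:
        conn.execute("DELETE FROM tag WHERE issue_id = ?", (issue_id,))
        conn.executemany(
            "INSERT INTO tag VALUES (?, ?)", [(issue_id, name) for name in names]
        )


def tags(issue_id):
    rows = conn.execute(
        "SELECT name FROM tag WHERE issue_id = ? ORDER BY name", (issue_id,)
    )
    return [row[0] for row in rows]


replace_tags(1, ["python", "sqlite"])
after_success = tags(1)

try:
    replace_tags(1, ["ok", None])  # None violates NOT NULL, the INSERT fails
except sqlite3.IntegrityError:
    pass
after_failure = tags(1)  # the DELETE was rolled back as well
```

Without the transaction, the failed second call would have deleted the old tags and left the issue with only one half-written new tag.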

Take a look at virtualization and containers ([Operating-system-level
virtualization](https://en.wikipedia.org/wiki/Operating-system-level_virtualization)).
There CRD gets used, not CRUD. Containers get created, then they
execute, then they get deleted. You might use configuration management
to set up a container. But this gets done exactly once. There is one
update from the vanilla container to your custom container. But this is like
"create". No updates will follow once the container was created. This
makes it easier and more predictable.

The same is true for operating on data structures in memory. In most cases, you should not alter the data structure which you are iterating over. Create a new data structure while iterating the input data. In other words: no in-place editing.
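
A short sketch of what can go wrong. Both functions are made up; the first one is buggy on purpose:

```python
def drop_negative_in_place(values):
    # Buggy on purpose: removing while iterating skips elements.
    for value in values:
        if value < 0:
            values.remove(value)
    return values


def drop_negative(values):
    # Builds a new list and leaves the input untouched.
    return [value for value in values if value >= 0]


data = [1, -2, -3, 4]
clean = drop_negative(data)                  # [1, 4]
broken = drop_negative_in_place(list(data))  # [1, -3, 4], the -3 was skipped
```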

### Stateless

When I was a student I was excited and fascinated by [CORBA (Common Object Request Broker Architecture)](https://en.wikipedia.org/wiki/Common_Object_Request_Broker_Architecture). I thought this was the future of machine-to-machine communication. Today I smile about how childish I was 19 years ago. CORBA is dead, stateless [http](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) has won.

Things are much easier to implement and predict if you just have one method call. One request and one response. You don't have an open connection and a reference to a remote object which executes on a remote server.

Look at all the dated protocols which are like a human conversation between a client and a server: SMTP, IMAP, FTP, ... Nobody wants the client and the server to have a chatty dialog like this:

```
Client: My name is Bob
Server: Hi Bob, nice to meet you.
Server: But are you really Bob?
Server: Please prove to me that you're Bob. You can use method foo, bar, blu for authentication
Client: I choose method "blu"
Server: Ok, then please send the magic blu token
Client: Here it is xyuasdusd8... I hope you like it.
Server: Fine, I accept this. Now I trust you. Now I know you are Bob
Client: Please show me the first message
Server: here it is:
Server:...
Client: looks like spam. Please delete this message
Server: Now I know that you want to delete this message.
Server: But I won't delete it now. Please send me EXPUNGE to execute the delete.
Client: grrrr, this is complicated. I already told you that I want the message to be deleted.
Client: EXPUNGE
...
```

Of course roughly the same needs to be done with HTTP. But with HTTP you can cut the task into several smaller HTTP requests. This gives the service the chance of delegating request-1 to server-a and request-2 to server-b. In a cloud environment, containers get created and destroyed in seconds. It is easier without a long-living connection.

In the above case (IMAP protocol) the EXPUNGE is like a COMMIT in relational databases. It is very handy to have a transactional database to implement a service. But it makes no sense to expose the transaction to the client.

Stateless is like IPO: Input-Processing-Output.

### No Shell Scripting

The shell is nice for interactive usage. But shell scripts are
unreliable: Most scripts fail if filenames contain whitespace.
Shell gurus know how to work around this. But quoting can get complicated. I use the shell for interactive stuff daily. But I stopped
writing shell scripts.

Reasons:

- If an error happens in a shell script, the interpreter steps silently to the next line. Yes, I know you can use "set -e". But you don't get a stack trace. Without a stack trace, you waste a lot of time analyzing why this error happened.
- It makes sense to use (or run) an application monitoring platform. For example "Shell" is not a [supported platform of Sentry](https://docs.sentry.io/platforms/). If you configure it for your preferred environment once, then you get great error reporting in one place. Even if your small backup script is only a three-line shell script: It is unreliable, use a real language!
- Shell-Scripts tend to call a lot of subprocesses. Every call to grep, head, tail, cut creates a new process. This tends to get slow.
I have seen shell scripts that start thousands of processes per second.
After re-writing them in Python they were 100 times faster and 100
times more readable.
- I do this `find ... | xargs` daily, but only while using the shell interactively. But what happens if a filename contains a space character? Yes, I know `find ... -print0 | xargs -r0`. BTW, I switched from find+xargs to [rg](https://github.com/BurntSushi/ripgrep) for most cases.
- Look at all the pitfalls: [Bash
Pitfalls](https://mywiki.wooledge.org/BashPitfalls)
- Even Crontab lines are dangerous. Look at this cron-job which should clean the directory of the temporary files:

> @weekly . ~/.bashrc && find $TMPDIR -mindepth 1 -maxdepth 1 -mtime +1 -print0 | xargs -r0 rm -rf

Do you spot the big risk?

Shell scripts are fine if they are conditionless. This means no "if", no "else", no "for".
For example in a Dockerfile you can use "RUN ...." commands to create a custom image. But I would not call things like this a shell script.
It is just a sequence of commands to execute.

### Portable Shell Scripts

I think writing portable shell scripts and avoiding bashism (shell
scripts that use features that are only available in the bash) is a
useless goal. It is wasting time. It feels productive, but it is not.

Avoid `#!/bin/sh`. The interpreter could be bash, dash, busybox, or something else.
See [Comparison of command
shells](https://en.wikipedia.org/wiki/Comparison_of_command_shells).
Please be explicit. Use `#!/bin/your-favorite-shell`.

If I look at this page
([DashAsBinSh](https://wiki.ubuntu.com/DashAsBinSh)), which explains how
to port shell scripts to /bin/dash I would like to laugh, but I can't
because I think it is sad that young and talented people waste their
precious time with this nonsense. Since systemd gets used, the shell
gets started less often (compared to the old system-V or BSD init). This
architectural change brought improvement. And I think that using dash
instead of bash brings no measurable benefit today. If you want it
minimal, then use Alpine Linux with Busybox.

If you are not able to create a dependency on bash, then solve this
issue. Use rpm/dpkg or configuration management to handle "my script
foo.sh needs bash".

I know that there are some edge cases where the bash is not available,
but in most cases, the time to get things done is far more important.
Execution performance is not that important. First: get it done
including automated tests.

### Server without a shell is possible

In the past, it was unbelievable: A Unix/Linux server that does not
execute a shell while doing its daily work. The dream is true today.
These steps do not need a shell: The operating system boots. Systemd starts.
Systemd spawns daemons, for example a web server. The web server spawns
worker processes. An HTTP request comes in and the worker process handles
one web request after the other. In the past, the boot process and the
start/stop scripts were shell scripts. I am very happy that systemd
exists.

But times have changed. Today applications run in containers. Containers
don't need systemd. In [Kubernetes](https://en.wikipedia.org/wiki/Kubernetes) containers
get started and stopped, not services. There is no need for a daemon starting and
stopping services since this gets done on a higher level.

### Avoid calling command-line tools

I try to avoid calling a command-line tool if a library is available.

Example: You want to know how long a process is running (with Python).
Yes, you could call `ps -p YOUR_PID -o lstart=` with the subprocess
library. This works.

But why not use a library like
[psutil](https://pypi.python.org/pypi/psutil)?

Why do you want to avoid a third-party library?

Is there a feeling like "too much work, too complicated"? Installing a
library is easy, do it.

Check the license of the library. If it is BSD, MIT, LGPL, or Apache like, then
use the library.

Calling a subprocess is slow. If it gets done often, you will notice
the difference soon.

That's one reason I dislike shell scripting. Calling `grep`, `cut`, `sed` again and
again wastes a lot of CPU time. You can see this with the command line tool `top`.
If the `sy` value (CPU time spent in the kernel) is high, then your server may be busy starting new processes. A library is
way more efficient, since you don't start new processes again and again.
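
As a rough illustration of doing the work in-process, the sketch below does what a `grep ... | cut -d: -f1` pipeline would do, without starting a single extra process. The sample data and the helper name `users_with_shell` are made up for this example.

```python
def users_with_shell(passwd_text, shell):
    """Return the login names of all passwd-style lines whose last field equals shell."""
    names = []
    for line in passwd_text.splitlines():
        fields = line.split(":")  # like cut -d:
        if len(fields) == 7 and fields[6] == shell:  # like grep on the shell column
            names.append(fields[0])  # like cut -f1
    return names


SAMPLE = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
"""

print(users_with_shell(SAMPLE, "/bin/bash"))  # ['root', 'alice']
```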

### Shell Scripts are ok, if ...

Shell scripts are ok, if they are almost conditionless: Few "if", "else" or "for".

I use this header to ensure that the script is using Bash and stops if something is wrong (aka "Bash strict mode"):

```
#!/bin/bash
trap 'echo "Warning: A command has failed. Exiting the script. Line was ($0:$LINENO): $(sed -n "${LINENO}p" "$0")"; exit 3' ERR
set -Eeuo pipefail

...
```
And:

* you should check your script in CI with [shellcheck](https://github.com/koalaman/shellcheck).
* Use an IDE plugin which uses shellcheck.
* For formatting you can use [shell-format](https://marketplace.visualstudio.com/items?itemName=foxundermoon.shell-format) based on [shfmt](https://github.com/mvdan/sh)

I elaborated that here [Bash Strict Mode](https://github.com/guettli/bash-strict-mode)

### Avoid toilet paper programming (wrapping)

What is "toilet paper programming"? This is a pattern which was often
used in the past: There is something wrong inside - something is
smelling. Let's write a wrapper. Still something wrong? Let's write a
second wrapper.....

All these wrappers do not solve the underlying issue.

In the past, there were fewer alternatives. And since you had no choice,
you were forced to use a particular tool. If this did not work the way
you wanted it to, you needed to write a wrapper.

Today you have many more alternatives. If tool x does not work
the way you want it to, you can use tool y.

I am happy that the anti-pattern "toilet paper programming" gets used
less often today.

Example: wxPython (GUI toolkit) wraps wxWidgets (formerly wxWindows), which wraps GTK, which wraps Xlib.

There are still some places where toilet paper wrappers need to get coded again and again.

For example, JSON does not support datetime, timedelta, and binary data. See [Let's fix JS](https://github.com/guettli/lets-fix-js). Speak to the upstream, to whoever is responsible for this, even if you think they are way too big, and you are way too small.

### If unsure use MIT License

The [MIT License](https://en.wikipedia.org/wiki/MIT_License) is simple and short. It is one of the
most widely used licenses on GitHub.

Some licenses are much too long. I tried to read the GPL twice, but I fell
asleep. I don't like things that I don't understand.

Next argument: The GPL and AGPL licenses are [viral](https://en.wikipedia.org/wiki/Viral_license). If you want
to create a commercial product, you can't use this.

For me "freedom" means no constraints. That's why I prefer
the MIT License, since GPL and AGPL have the constraint
that you must open your source, too.

See [Code licensed under AGPL MUST NOT be used at Google](https://opensource.google/docs/using/agpl-policy/)

### Loop in DB, not in your code

Do the filtering in the database. In most cases, it is faster than a
loop in your programming language. And if the DB is not fast enough,
then I guess a matching index is simply missing.
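
A minimal sketch of the difference, using the standard `sqlite3` module so that it runs anywhere. The table `orders`, its columns, and the index name are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10), ("bob", 250), ("alice", 300), ("carol", 120)],
)
# The index that lets the database answer the filter without scanning every row.
conn.execute("CREATE INDEX orders_total_idx ON orders (total)")

# Loop in your code: every row crosses the DB boundary and gets filtered afterwards.
all_rows = conn.execute("SELECT id, customer, total FROM orders ORDER BY id").fetchall()
big_in_python = [row for row in all_rows if row[2] >= 200]

# Loop in the DB: only the matching rows are returned.
big_in_db = conn.execute(
    "SELECT id, customer, total FROM orders WHERE total >= ? ORDER BY id", (200,)
).fetchall()

print(big_in_db)  # [(2, 'bob', 250), (3, 'alice', 300)]
```

Both variants return the same rows here. With four rows the difference does not matter, with four million it does.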

### Do permission checking via SQL

Imagine you have three models (users, groups, and permissions) as tables
in a relational database system.

Most systems do the permission checking via source code. Example: `if
user.is_admin then return True`.

Sooner or later you need the list of items: Show all items which the
current user may see.

Now you write SQL (or use your ORM) to create a queryset that returns
all items which satisfy the needed conditions.

Now you have two implementations. The first is `if user.is_admin then
return True` and the second uses set operations (SQL). This is redundant and asking
for trouble. Sooner or later your permission checks get more complex and then one implementation will get out of sync.

That's why I think: do permission checking via SQL.

Some call this "authorization predicate push-down".
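
A sketch of what this can look like, again with `sqlite3` so it is self-contained. The schema (`users`, `memberships`, `permissions`, `items`) and both function names are assumptions for this example. The point is only that the rule is written down once and the single-item check reuses the list query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, is_admin INTEGER NOT NULL DEFAULT 0);
CREATE TABLE memberships (user_id INTEGER, group_id INTEGER);
CREATE TABLE permissions (group_id INTEGER, item_id INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);
INSERT INTO users VALUES (1, 'admin', 1), (2, 'alice', 0), (3, 'bob', 0);
INSERT INTO memberships VALUES (2, 10);
INSERT INTO permissions VALUES (10, 100);
INSERT INTO items VALUES (100, 'report'), (101, 'salaries');
""")

# The one and only place where the permission rule is written down.
VISIBLE_ITEMS = """
SELECT items.id AS id FROM items
WHERE EXISTS (SELECT 1 FROM users WHERE users.id = :user AND users.is_admin = 1)
   OR EXISTS (
        SELECT 1 FROM memberships
        JOIN permissions ON permissions.group_id = memberships.group_id
        WHERE memberships.user_id = :user AND permissions.item_id = items.id)
"""


def visible_items(user_id):
    """List view: all item ids the user may see."""
    rows = conn.execute(VISIBLE_ITEMS + " ORDER BY items.id", {"user": user_id})
    return [row[0] for row in rows]


def may_see(user_id, item_id):
    """Single check: the same rule, narrowed to one item."""
    sql = f"SELECT 1 FROM ({VISIBLE_ITEMS}) AS visible WHERE visible.id = :item"
    return conn.execute(sql, {"user": user_id, "item": item_id}).fetchone() is not None


print(visible_items(2), may_see(2, 101))  # [100] False
```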

### Real men use ORM

[ORM (Object-relational mapping)](https://en.wikipedia.org/wiki/Object-relational_mapping) makes daily
work much easier. The above heading is a stupid joke. Clever people use tools to make work simpler, more fun, and more
convenient. ORMs are great.

Some (usually elderly) developers fear that an ORM is slower than hand-crafted and optimized SQL. Maybe
there are corner cases where this prejudice is true. But that's not a reason to avoid ORMs. Just use them,
and if you hit a corner case, then use raw SQL.

See [premature optimization is the root of all evil](#premature-optimization-is-the-root-of-all-evil)

Make your life easy, use ORM.

Example: [Django ORM "Filtering on a Subquery() or Exists() expressions"](https://docs.djangoproject.com/en/dev/ref/models/expressions/#filtering-on-a-subquery-or-exists-expressions).

```
from datetime import timedelta

from django.db.models import Exists, OuterRef
from django.utils import timezone

# Post and Comment are your own models.
# Select all rows of the model Post which have a comment created within the last day:

one_day_ago = timezone.now() - timedelta(days=1)
recent_comments = Comment.objects.filter(
    post=OuterRef('pk'),
    created_at__gte=one_day_ago,
)

Post.objects.filter(Exists(recent_comments))
```
For me, the above code is super easy to read.

### SQL is an API

If you have a database-driven application and a third party tool wants
to send data to the application, then sometimes the easiest solution is
to give the third party access to the database.

You can create a special database user that has only access to one table.
That's easy.

Nitpickers will disagree: If the database schema changes, then the
communication between both systems will break. Of course, that's true.
But in most cases, this will be the same if you use a "real" API. If
there is a change to the data structure, then the API needs to be
changed, too.

I don't say that SQL is always the best solution. Of course, HTTP-based
APIs are often better for services which get consumed by third parties.

But for internal services PostgreSQL with a custom role (only access to one table, and
only allowed to do INSERT) works fine. You can use NOTIFY, so that you can handle the inserted data
immediately.

For professional internal services you can use [nats.io](https://nats.io/). But using NATS for a small
project makes no sense.

### C is slow

... looking at the time you need to get things implemented. Yes, the
execution is fast, but the time to get the problem done takes "ages". I
avoid C programming, if possible. If Python gets too slow, I can optimize
the hotspots. But do this later. Don't start with the second step. First,
get it done and write tests. Then clean up the code (simplify it). Then
.... What is the next step? Optimize? In most cases, the customer has new
needs and he likely wants new features not faster execution.

Higher-level languages have a better "zero to [MVP](https://en.wikipedia.org/wiki/Minimum_viable_product)" speed.

### Three time dimensions

I think in software development there are three dimensions of "time".

Most developers immediately think about "execution time": How fast is the code? How can I make the code even faster?

But there are:

Time for "From wish to wow": How long does it take to implement and deploy a feature, so that the customer is happy?

Time for "From ? to Aha!": How fast can another developer understand your code?

I think in most cases the priority is like this: First "From wish to wow", then "From ? to Aha!", then "execution time".

Of course this depends on your context. If you are developing PostgreSQL core, Python core, Kubernetes core or the Linux kernel, then
execution time is very important.

But mere mortals do application development.

Of course the application should have a good performance.

But my hint is to optimize the performance of the application by using statistical profiling
of the production system. Just looking at the code and guessing how to optimize performance
won't help if you have not measured the performance of the production system.

### Version Control: git

For version control of software, I use git. I think all other tools (svn,
mercurial, CVS, darcs, bazaar) can be considered "dead". See
[StackOverflow TagTrend](http://sotagtrends.com/?tags=git+svn+mercurial+cvs+darcs+bazaar)

The only exception to the rule "use git" is Google. They use their [own gigantic monorepo](https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext).

### Avoid long-living branches

Avoid long-living branches in your git repos. The more time passes,
the less likely it is that your work will ever get merged. For me one week
is ok, but three weeks are too long.

Ten lines of improvement that get pushed to main today have much more value
than 1000 lines which are in a branch which will never get pushed to main.

Trunk based development goes further. Sounds good:

> ... each developer divides their work into small batches and merges that
> work into the trunk at least once (and potentially several times) a day.

See [Google DevOp Guide "Trunk based development"](https://cloud.google.com/solutions/devops/devops-tech-trunk-based-development)

### Don't put generated code into version control

Please read [Source code vs generated code](#source-code-vs-generated-code). Generated code or binary
data should not be in a git repository. It is possible but strange.

### The best commits remove code

For me, the best commits add some lines to the docs, add some lines to
the tests, and remove more lines from the production code than they add.

### Time is too short to run all tests before commit+push

If the guideline of your team is: "Run all tests before commit+push",
then there is something wrong. Time is too short to watch tests running!
Run only the tests of the code you touched, for example with `pytest -k my_keyword`.

It's the job of automated CI (Continuous Integration) to run all tests.
That's not your job.

### Time is too short to care for "E302 expected 2 blank lines, found 1"

Style Guide Enforcements (like [flake8 for Python](https://flake8.pycqa.org/en/latest/)) don't help much.

Time is too short to manually make the style guide checker happy by
editing the source code.

> E302 expected 2 blank lines, found 1

I don't want to waste my time with "errors" like above. This is no error.
The code is great and makes the customer happy.

Reading the message, understanding it, opening the file, editing it, re-running
the checker .... No, this is not productive.

The solution is (like almost always) **automation**

Style guide enforcement does not help.

Automated source code styling helps.

Unfortunately this is not solved yet.

For Python there is [black](https://github.com/psf/black), but it is not ready yet.

### CI

Use continuous integration. Only tested code is allowed to get deployed.
This needs to be automated. Humans make more errors than automated
processes.

GitHub Actions are great.

Increasing the version number can be done with [BumpVer](https://pypi.org/project/bumpver/) which
can use [Calendar Versioning](https://calver.org/) (for example YYYY.MM.X)

All I need to do is to commit. All other steps are automated :-)

### Tests should work offline

Imagine a developer sits on a train and has an unreliable network connection.

Nevertheless, I want all tests to be executable.

For simple unit-tests that don't need a server, this is easy.

But if your test needs an HTTP-server, a database (PostgreSQL, MySQL),
a key-value DB (Redis), ... What can you do?

Automation is the solution. You can use a tool like Ansible to set up
the needed environment.

### CI Config

CI tools (GitLab, Travis, Jenkins) usually have a web GUI. Keep the
things you configure with the GUI simple. Yes, modern CI tools can do a
lot. With every new version, they get even more [Turing complete](https://en.wikipedia.org/wiki/Turing_completeness) (this was
a joke, I hope you understood it). Please do separation of concerns. The
CI tool is the GUI to start a job. Then the job runs, and then you can
see the result of the job in your browser. If you configure condition
handling "if ... then ... else ..." inside the web GUI, then I think you
are on the wrong track.

The CI tool calls a command line. To make debugging and
development easy, this job should be callable via the command line, too. In
other words: the web GUI gets used to collect the arguments. Then a
command-line script gets called. Then the web GUI displays the result
for you. I think it is wise to avoid a complex CI config. If you want to
switch to a different CI tool (for example from Jenkins to GitLab), then
this is easy if your logic is in scripts and not in the CI tool
configuration.

### Avoid Threads, Async and Promises

Threads and Async are fascinating. BUT: They are hard to debug. You will
need much longer than you initially estimated. Avoid them, if you want to
get things done. It's different in your spare time: Do what you want and
what is fascinating for you.

There is one tool and one concept that is rock solid, well known, easy
to debug, and available everywhere and it is great for parallel
execution. The tool is called "operating system" and the concept is
called "process". Why re-invent it? Do you think starting a new process is
"expensive" ("it is too slow")? Just, do not start a new process for
every small method you want to call in parallel. Use a [Task
Queue](https://www.fullstackpython.com/task-queues.html). Let this tool
handle the complicated async stuff and keep your code simple like
running in one process with one thread. It is all about IPO:
Input-Processing-Output.

There is a good reason to use async: The [C10k
Problem](https://en.wikipedia.org/wiki/C10k_problem). BUT: I guess you
don't have this problem. If you don't have this problem, then don't use
technology which was invented to solve this issue :-)

See the related part of the [Google Codereview Guidelines "Functionality"](https://google.github.io/eng-practices/review/reviewer/looking-for.html#functionality).

There is a huge difference between implementing a task-queue and using
a task-queue. If you implement a task-queue, then threads/async/promises/multiprocessing are
the building blocks. But task-queues exist. There is no need to re-invent them.

I like to use task-queues, and write my code in a very predictable single-thread,
single-process synchronous way.

[Hick's Law](https://en.wikipedia.org/wiki/Hick%27s_law):

> increasing the number of choices will increase the decision time logarithmically.

Every time I need to deal with async or task-queues (like celery or rq) my output decreases.
There are so many ways to handle parallelism. Now you could argue: "Thomas, parallelism is not the problem. The problem is that you are too stupid."
Maybe this is correct. Maybe I am too stupid (or not familiar with this topic). I guess I am just a mediocre developer.
My experience is that the environment should be optimized for mediocre (normal) people. This will provide the best result.
Thus my rule of thumb: keep it simple and try to avoid Threads, Async and all this parallel computing.

### Functions should return values, Not Promises.

Especially in JavaScript, functions often return [Promises](https://developer.mozilla.org/de/docs/Web/JavaScript/Reference/Global_Objects/Promise).

The `Promise` represents the eventual completion (or failure) of an asynchronous operation and its resulting value.

I don't like this. I want a method to execute synchronously and then return the result.

If I want a method to be executed asynchronously, then I (the caller) can use a Promise. But I don't want the function to decide "async or sync?".

I want to decide this, and I want the default to be "synchronous execution".

Pseudo Code (synchronous):
```
response = fetch('https://example.com')
my_json = response.json()
```

JavaScript (asynchronous):
```
const my_json = async () => {
  const response = await fetch('https://example.com');
  return response.json();
}
```

The second code snippet is way more complicated.

I think this can be compared to hyperlinks on web pages. The default is to follow the
hyperlink (synchronous). If the user wants to open the hyperlink in a new tab (asynchronous),
then this decision should be made by the user, not by the one who created this hyperlink.

I have seen JavaScript code where almost every line contained `await`. That's childish.
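
The same idea sketched in Python, where the closest standard-library analogue to a Promise is `concurrent.futures.Future`. The function `word_count` is a made-up stand-in for any ordinary function. It stays synchronous, and only the caller decides whether to run it in the background.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """A plain synchronous function: it returns a value, not a Future."""
    return len(text.split())


# Default: the caller just calls it and gets the result.
count = word_count("functions should return values")

# Opt-in: the caller, not the function, decides to run it in the background.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(word_count, "not promises")
    background_count = future.result()

print(count, background_count)  # 4 2
```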

### Don't waste time doing it "generic and reusable" if you don't need to

If you are doing some kind of software project for the first time, then
focus on getting it done. Don't waste time making it perfect, reusable,
fast, or portable. You don't know the needs of the future today. One main
goal: Try to make your code easy to understand without comments and make
the customer happy. First, get the basics working, then tests and CI,
then listen to the new needs, wishes, and dreams of your customers.

Example: If you are developing web or server applications, don't waste
time making your code work on Linux and MS-Windows. Focus on one
platform.

See [Minimum viable
product](https://en.wikipedia.org/wiki/Minimum_viable_product)

Related Book: [The Lean Startup](http://theleanstartup.com/book)

Several months after writing the above text I found this

[Google Codereview Guidelines "Complexity"](https://google.github.io/eng-practices/review/reviewer/looking-for.html#complexity)
> A particular type of complexity is over-engineering, where developers have made the code more generic than it needs to be, or added functionality that isn’t presently needed by the system. Reviewers should be especially vigilant about over-engineering. Encourage developers to solve the problem they know needs to be solved now, not the problem that the developer speculates might need to be solved in the future. The future problem should be solved once it arrives and you can see its actual shape and requirements in the physical universe.

Related: [YAGNI (You aren't gonna need it)](https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it)

### Use a modern IDE

The time for vi and emacs has passed. Use a modern IDE on modern hardware
(SSD disk). For example PyCharm. I switched from Emacs to PyCharm
in 2016. I used Emacs from 1997 until 2015 (18 years).

### Easy to read code: Use guard clauses (early return)

Guard clauses (early return) help to avoid indentation. They make code
easier to read and understand.

Example:

    # Code with unnecessary complexity

    def my_method(my_model_instance):
        if my_model_instance.is_active:
            if my_model_instance.number > MyModel.MAX_NUMBER:
                if my_model_instance.foo:
                    ....
                    ....
                    ....
                    ....
                    ....

Better:

    # Less complex because less indentation

    def my_method(my_model_instance):
        if not my_model_instance.is_active:
            return
        if not my_model_instance.number > MyModel.MAX_NUMBER:
            return
        if not my_model_instance.foo:
            return
        ....
        ....
        ....
        ....
        ....

Look at the actual code which does something. I used five lines of
`....` for it. I think more indentation makes the code more
complex. The "return" simplifies the code. For me, the second version is
much easier to read.

Please tell me, if you know a tool which can detect and maybe fix missing early returns for Python
code.

For Python there exists a "complexity checker": [radon](https://pypi.org/project/radon/), but AFAIK
it can't be used to detect missing early-returns.

### Source code vs generated code

I guess every young programmer wants to write a tool that automatically
creates source code. Stop! Please think about it again. What do you
gain? Don't confuse data and code. Imagine you have a source code
generator that takes DATA as input and creates SOURCE as output. What
is the difference between the input (DATA) and the output (SOURCE)? What
do you gain? Even if you have some kind of artificial intelligence, you
can't create new information if your only input is DATA. It is just a
different syntax. Why not write a program which reads DATA and does the
thing you want to do?
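
A small sketch of that last question: instead of generating validator source code from DATA, the program below reads the DATA and does the validation directly. The schema format, the `TYPES` table, and the function `validate` are invented for this example.

```python
import json

# DATA: a declarative description, here of required fields and their types.
SCHEMA_JSON = '{"name": "str", "age": "int"}'

TYPES = {"str": str, "int": int}


def validate(record, schema):
    """Interpret the schema directly instead of generating validator source code."""
    errors = []
    for field, type_name in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not isinstance(record[field], TYPES[type_name]):
            errors.append(f"{field}: expected {type_name}")
    return errors


schema = json.loads(SCHEMA_JSON)
print(validate({"name": "Ada", "age": "36"}, schema))  # ['age: expected int']
```

If the DATA changes, there is nothing to regenerate. The next call simply sees the new schema.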

For the current context, I see only two different things: **source code**
for humans and **generated code** for the machine.

Just because a file contains code in a programming language, this does
not mean that this file is source code.

If the TypeScript compiler creates JavaScript, then the output is
generated code since the created JavaScript source is intended for the
interpreter only. Not for humans. If you create JavaScript with a
keyboard and a text editor it is source code. Don't mix source code and
generated code in one file.

In other words: source code gets created by humans with the help of an
editor or IDE.

### Don't believe the "automatically create foo" hype

If you are new to software development you are fascinated by the magic.
You can create things! In this section, I call the magic output "foo".

Yes, you can automatically create foo with a script. Whatever "foo" is
in your context: It has no value. It is worth nothing. It is dust in the wind like a web page that displays the current time.
This output is only temporarily valuable.

Look at the basic IPO pattern: Input - Processing - Output (in this case
"foo").

Do not store "foo", the output of your script, in a database. Do not
store "foo" in version control.

It has no value since you can always create "foo" again. You just need
the input and your script.

You can store "foo" in a cache to improve performance. But do not store
it permanently. Don't make a backup of it.

Don't store automatically created data in your database. Instead re-calculate the data again
and again. Maybe a
[Materialized View (PostgreSQL)](https://www.postgresql.org/docs/current/rules-materializedviews.html) helps
you to improve speed.

A term that is often a hint to this anti-pattern is "generator". Yes,
you can generate a lot of data. But this bloated, generated data is just hot air with
little value.

DevOps who prefer "Op" to "Dev" tend to create a configuration with a script.
You can do this but then create the config again daily. Do not edit
the generated config by hand.

Related: [Single source of truth](https://en.wikipedia.org/wiki/Single_source_of_truth)

### Regex are great - But it's like eating rubbish

Yes, I like regular expressions. But slow down: What do I do, if I use a
regex? I think it is "parsing". I remember reading this some time
ago: "Time is too short to rewrite parsers". Don't parse data! We live
in the 21st century. Consume high-level data structures like JSON, YAML, or
protocol buffers. If possible, refuse to accept CSV or custom text formats
as input data.

From time to time you need to do text processing. Unfortunately, there
are several regex flavors. My guideline: Use PCRE (Perl Compatible Regular Expressions). This syntax, or a very similar Perl-style one, is available
in Python, Postfix, `grep -P` and many other tools. Don't waste time with other
regex flavors, if PCRE is available.

Current Linux distributions ship with a grep version which has the -P
option to enable PCRE. AFAIK this is the only way to grep for special
characters like the binary null: [How to grep for special
character](https://superuser.com/a/612336/95878)
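
In Python the same kind of search works on `bytes` with the standard `re` module (Perl-style syntax, though not the PCRE library itself). A minimal sketch with made-up sample data:

```python
import re

# Bytes pattern: a NUL byte followed by a run of printable ASCII characters.
NUL_THEN_TEXT = re.compile(rb"\x00([\x20-\x7e]+)")

data = b"header\x00payload\x00\x01binary"
print(NUL_THEN_TEXT.findall(data))  # [b'payload']
```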

### Use a password manager

I use KeePass and sync it via Nextcloud.

Don't forget to add the content of your ~/.ssh/id_rsa file to it.

### CSV - Comma-separated values

CSV is not a data format. It is an illness.

If your customer sends you tabular data in Excel, read the Excel file
directly. Do not convert it to CSV just because you think this is
easier.

If a customer wants you to send him CSV, ask if he can consume JSON.

There are great libraries for reading and writing Excel. For example:
[openpyxl](https://openpyxl.readthedocs.io/en/stable/)

Other alternatives to CSV:

* [zarr](https://zarr.readthedocs.io/en/stable/) (for data science, very long arrays)
* [jsonlines](http://jsonlines.org/) (for example for logfiles)
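
JSON Lines needs no extra library. A minimal sketch with the standard `json` module follows. The helper names `write_jsonl` and `read_jsonl` and the sample rows are made up for this example. Note that the values contain a comma, quotes, and a newline, exactly the things that cause trouble in CSV.

```python
import io
import json


def write_jsonl(records, stream):
    """Write one JSON document per line."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(stream):
    """Yield one decoded record per non-empty line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


rows = [
    {"name": "Müller, Anna", "note": 'said "hi"\nand left'},
    {"name": "Bob", "note": None},
]
buffer = io.StringIO()  # stands in for a real file
write_jsonl(rows, buffer)
buffer.seek(0)
round_trip = list(read_jsonl(buffer))
print(round_trip == rows)  # True
```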

### Give booleans a "positive" name

I once gave a DB column the name "failed". It was a boolean indicating
if the transmission of data to the next system was successful. The
output as a table in the GUI looked confusing for humans. The column
heading was "failed". What should be visible in the cell for failed
rows? Booleans usually get translated to "Yes/No" or "True/False". But if
the human brain reads "Yes" or "True" it initially thinks "all right".
But in this case "Yes" meant "Yes, it failed". The next time I will call
the column "was\_successful", then "Yes" means "Yes, it was successful".
Some GUI toolkits render "True" as a green check mark (meaning "everything is ok")
and "False" as a red cross (meaning "it failed").

### Love your docs

I have seen it several times on GitHub. Just have a look
at the README files there. They start with "Installing", "Configuring", then "Special Cases"...

What is missing? An introduction! Just some sentences
about what this great project is all about. Programmers prefer the details to the big picture,
the overview.

But "Project simple-foo simplifies foo" is not enough. What is "foo"?

Dear programmers, learn to relax and look at the thing you create like a newcomer. Imagine a newcomer who knows how to add two integers with his favorite programming
language. What is missing to make him understand why the project/lib/tool is needed?

First, you need to convince him that this project is worth a try, then if he knows
the "why?", then explain how to install it.

If you have this mindset "I do the important (programming)
stuff. Someone else can care for the docs", then your open source
project won't be successful.

If you write docs, then do it for newcomers. Start with the
introduction, define the important terms, then provide simple and straightforward use
cases. Put details and special cases at the end.

If your library gets used and you add a bug, you will get feedback soon.

Tests fail or even worse customers will complain.

But if you write broken docs, no one will complain.

Even if someone reads your mistake, it is unlikely that you get
feedback. Unfortunately, only a few people take this seriously and tell you
that there is a mistake in your docs.

How to solve this?

You need to act.

Let someone else read your docs.

The quality of feedback you get depends on the type of person you ask to
read your docs.

If it is a programmer, he likely does not read your docs
carefully. Most software developers do not care for orthography and it
is hard for them to read the docs like a newcomer. They already know
what's written there, and they will say "it is ok".

My solution: follow-up. Read the text again 30 days later.

A good example is [gVisor](https://github.com/google/gvisor): the README starts with "What is gVisor?" and "Why does gVisor exist?".

### Test your docs

Keeping your docs in the same git repo as your code makes sense. This has the benefit that you have a review and testing process.

Integrate automated spell checking into the CI process.

### Canonical docs

Look at the questions concerning OpenSSH options at the Q+A site [serverfault.com](https://serverfault.com/).
There is a lot of guessing. Something is wrong. Nobody knows where the
canonical upstream docs are. Easy linking to a specific configuration option is not
possible. What happens? Redundant docs. Many blog posts try to explain
stuff... Don't write blog posts. Instead, improve the upstream docs. Talk with
the core developers. Open an issue in the issue tracker if you think something is missing in the docs.

Open an issue if the docs start with the hairy details and don't start
with an introduction/overview. Developers don't realize this, since they
need to deal with the hairy details daily. Don't be shy: Help them to
see the world through the eyes of a newcomer.

I am unsure if I should love or hate "wiki.archlinux.org". On the one
hand, I found there valuable information about systemd and other Linux
related secrets. On the other hand, it is redundant and since a lot of
users take their knowledge from this resource, the canonical upstream
docs get less love. First, determine where the canonical upstream docs
are. Then communicate with the maintainers. Avoid redundant docs.

In other words: Blog posts are nice, but they are like dust in
the wind. They explain a snapshot. Three months later they are outdated.
It makes more sense to add one missing sentence to the upstream docs
than to create a blog post explaining something which is not explained
in the docs. At least in the open-source world, since it is more likely
that you can influence the upstream docs.

Related: [Single Source of Truth (Wikipedia)](https://en.wikipedia.org/wiki/Single_source_of_truth)

Related: [Canonical URL](https://en.wikipedia.org/wiki/Canonicalization#URL)

Related: ["Don't repeat yourself" vs "We enjoy typing"](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself#DRY_vs_WET_solutions)

### One central glossary: One page per term

> There are only two hard things in Computer Science: cache invalidation and naming things.
> -- Phil Karlton

[Martin Fowler](https://martinfowler.com/bliki/TwoHardThings.html)

My best practice to solve the "naming things" challenge

* Define your terms, your terminology. For small projects, a glossary is enough, but for bigger projects, every term should have its own page. It should be easy to create a hyperlink to this term. That's why I prefer the "one term, one page" approach. Creating hyperlinks into a page (https://..../...#foo) is possible but less fun.
* The defined terms should not differ too much from the spoken words (or the words used in your chat/mail messages). If there is a difference, then alter the written definition.
* Someone should be responsible for the docs. "Everybody is responsible for it" does not work.
* Encourage and motivate people, again and again, to speak up if the docs are outdated.

More about this topic from me: [Intranets](https://github.com/guettli/intranets)

### Do not send long instructions to customers via mail

If you send long instructions to customers via mail, then these docs in
the mail are hidden magic. Only the customer who receives this mail
knows the hidden magic.

Publish your docs in your app. Send your customer a link to the online
docs.

Despite all myths: Some users read the docs!

And that's great if the user has more knowledge. Because this means you
have less work. Fewer emails, fewer interrupts, fewer phone calls :-)

This even applies to public discussion forums. Don't write too much. Create great docs and answer questions by
providing links to the docs. And be polite and ask whether this answers the user's question.

[Permalinks](https://en.wikipedia.org/wiki/Permalink) are great, since they provide a single source of truth.

### Don't write tech-docs in a non-English language

General rule: don't waste time.

It is feasible to write high-level blog posts about tech topics in your favorite language.

Sometimes it is easier to communicate the holistic view in your mother-tongue.

But it is not feasible to write detailed tech stuff in a non-English language.

Example:

https://wiki.ubuntuusers.de/Installation_auf_externen_Speichermedien/

I came across this page because I want to install Linux on an external hard disc.

Unfortunately, there seemed to be no good English guide on how to do this.

The most solid guide I found during the first minutes was the above link. Unfortunately, the above guide was outdated.

Grrrrrr. Now I needed to choose:

* V1: Should I update the outdated German guide? It is a wiki editable by everybody.

* V2: Should I use one of the English guides, even though they do not look solid?

Grrr. I don't like thinking.

The people who created the German guide thought they were helping the world. They felt good
while doing what they did. I think they wasted time. Automatic translations are quite
good today. At least if you translate English to your favorite language.
I won't update the outdated German guide in the wiki. This would help only very few people.
Most people who want to install Linux on an external hard drive can either
read English text or they know how to translate English text to their favorite
language. I would update an English wiki page since this would help a lot of people.

Don't get me wrong: Docs for applications you write should be in the language of your customers. Above text
is about tech-related docs.

My conclusion: Don't write tech-docs in a non-English language

### Care for newcomers

In the year 1997, I was very thankful that there was a hint "If unsure
choose ..." when I needed to compile a Linux kernel. In those days you
needed to answer dozens of questions before you could compile the invention of
Linus Torvalds.

I had no clue what most questions were about. But this small advice "If
unsure choose ..." helped me get it done.

If you are managing a project: Care for newcomers. Provide them with
guidelines. But don't reinvent docs. Provide links to the relevant
upstream docs, if you just use a piece of software.

### Good example for "care for newcomers"

> Writing plugins
>
> It is easy to implement local conftest plugins .... Please refer to [Installing and Using plugins](https://docs.pytest.org/en/stable/plugins.html#using-plugins) if you only want to use but not write plugins.

That's great. That's newcomer focused documentation.

### Keep custom IDE configuration small

Imagine you lost your PC and you lost your development environment:

- IDE configuration
- Test data
- Test database

All that's left is your source code from version control, CI servers and
deployment workflow.

How much would you lose? How much time would you waste to set up your
personal development environment again?

Keep this time small. This is related to "care for newcomers". If you
need several hours to set up your development environment, then a new team member would need even much more time.

Although I use PyCharm and VSCode, the introduction of [Gitpod](https://www.gitpod.io/) gets to the point:

> Gitpod does to Dev Environments what Docker did to Servers. Today we are emotionally attached (for better or worse) to our dev environments, give them names & massage them over time. They are pets - similar to servers before docker took advantage of namespaces and cgroups in Linux and turned these nice puppies into cattle.
> With Gitpod it is the same - we treat dev environments as automated resources you can spin up when you need them and close down (and forget about) when you are done with your task. Dev environments become fully automated and ephemeral. Only then you are always ready-to-code - immediately creative, immediately productive with the click of a button, and without any friction.

### Setting up a new development environment should be easy

This happened to me several times: I wanted to improve some open source
software. Up until then I had only used the software, now I wanted to write a
patch. If setting up a new development environment and running the tests
is too complicated or not documented, then I give up and won't
provide a patch. These steps need to be simple for people starting from
scratch:

- check out the source from version control
- check that all tests are working (before modifying something)
- write a patch and write a test for your patch
- check that all tests are working (after modifying something)

### Passing around methods makes things hard to debug

Even in C, you can pass around function pointers. It's very common in
JavaScript and sometimes it gets done in Python, too. It is hard to
debug. IDEs can't resolve the code: "Find usages" doesn't work. I try to
avoid it. I prefer OOP (Inheritance) and avoid passing around methods or
treating methods like variables.

But maybe this is just my strong backend related roots. I have never
coded in a big modern JavaScript-based environment.

I like it simple: Input-Processing-Output.

With "Input" being 100% data. Not a method.
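
A minimal sketch of what I mean. All names (`total_price_with_callback`, `total_price`, `discount_percent`) are made up for this illustration. The first version receives a function as input, the second one receives only data:

```python
# Hypothetical example: all names are made up for illustration.

def total_price_with_callback(prices, transform):
    # "transform" is a function passed in by the caller.
    # "Find usages" in an IDE cannot tell which function ends up here.
    return sum(transform(price) for price in prices)


def total_price(prices, discount_percent):
    # Input is 100% data: a list of numbers and a percentage.
    return sum(price * (100 - discount_percent) / 100 for price in prices)
```

With the second version you can read the call `total_price([100, 200], discount_percent=10)` and know what happens without looking anywhere else.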

### Software Design Patterns are overrated

If you need several pages in a book to explain a software design
pattern, then it is too complicated. I think Software Design Patterns
are overrated.

Why are there so many books about software design patterns and nearly no books
about database design patterns?

### OOP is overrated

About OOP (Object-oriented programming)

**Stateless** has won. OOP is stateful:

1. Create an instance of a class
2. Call a method of this instance
3. Destruct the instance

Three steps vs one step.

OOP is great for implementing an [ORM (Object-relational mapping)](https://en.wikipedia.org/wiki/Object-relational_mapping). But implementing this should be done by people who have more experience than I have :-)

Here is code that uses the well-known jUnit style:
```
# OOP way
import unittest

class TestSMTP(unittest.TestCase):
    def smtp_connection(self):
        import smtplib
        return smtplib.SMTP("smtp.gmail.com", 587, timeout=5)

    def test_ehlo(self):
        response_code, msg = self.smtp_connection().ehlo()
        self.assertEqual(response_code, 250)
```

The non-object-oriented way:
```
# pytest way
import pytest

@pytest.fixture
def smtp_connection():
    import smtplib
    return smtplib.SMTP("smtp.gmail.com", 587, timeout=5)

def test_ehlo(smtp_connection):
    response_code, msg = smtp_connection.ehlo()
    assert response_code == 250
```

My rule of thumb: Less indentation means less complexity, and less complexity means better code.

Two things are simplified: The second version does not need a class or inheritance. Nice, since less code means fewer bugs.

In the second example, `smtp_connection()` is not an instance method of a class, it is just a plain function. If a test
asks for a parameter with this name, then pytest gives the test the result of this function.

And look at the assertion: `self.assertEqual(response_code, 250)` vs `assert response_code == 250`. Namespaces
introduced by dots are great (`assertEqual` is in the namespace of `self`). But if one level is enough, then
this is even better.

Of course, this is opinionated, and it is 100% ok if you prefer the OOP-way and not the shorter solution.

### "Dependency injection" is just "Configuration"

For me, the term [Dependency injection](https://en.wikipedia.org/wiki/Dependency_injection) and the corresponding Wikipedia article are way too complicated.

For me, it is just "Configuration". But some people don't like it simple, they prefer .... (I removed this phrase since it was provocative. Feel free to add your favorite phrase here)

From Wikipedia "Dependency injection"

> In the following Java example, the Client class contains a Service member variable that is initialized by the Client
> constructor. The client controls which implementation of service is used and controls its construction.
> In this situation, the client is said to have a hard-coded dependency on ExampleService.

Now have a look at these docs [Database Settings](https://docs.djangoproject.com/en/3.0/ref/settings/#databases)

```
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'mydatabase',
    }
}
```

That's all: Instead of hard-coded dependencies, you provide a way to configure your software.

I avoid the term "Dependency injection", since it is unclear to me.
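
Here is a small sketch of the same idea without Django. The settings dict and the key name `SERIALIZER` are assumptions for this illustration, not part of any framework. The code looks up its dependency from the configuration instead of hard-coding it:

```python
import importlib

# Hypothetical settings dict; the key name "SERIALIZER" is an assumption for this sketch.
SETTINGS = {
    "SERIALIZER": "json",  # name of a module that provides dumps()
}


def serialize(data, settings=SETTINGS):
    # The dependency is looked up from the configuration instead of being hard-coded.
    serializer = importlib.import_module(settings["SERIALIZER"])
    return serializer.dumps(data)
```

If you change the setting to `"pickle"`, the same code uses a different implementation. For me that is configuration, nothing more.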

### Test-Driven Development

red, green, refactor. More verbose: make the test fail, make the test
pass, refactor (simplify) code.

### Extract Method to get full coverage

Imagine you have a method like this:
```
def my_method(a, b, c):
    # ten
    # lines
    # of
    # code

    if a > b:
        ...  # code for this special case

    # again
    # ten
    # lines
    # of
    # code
```

One thing is 100% sure: You can get full coverage with one test, but you would
need to call the method twice: Once with `a > b` and once with the opposite.

But you don't want to call this method twice, since this needlessly executes
the code above and below the "if" statement a second time. You want to avoid
that your test suite gets too big and too slow.

Maybe you could extract the condition into a new method:

```
def my_method(a, b, c):
    # ten
    # lines
    # of
    # code

    d = handle_case_foo(a, b)

    # again
    # ten
    # lines
    # of
    # code


def handle_case_foo(a, b):
    if a > b:
        return ...
    return ...
```

This way you can test `my_method()` with one test, and you can write
a small test for `handle_case_foo()`.
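
The small tests for `handle_case_foo()` could look like this sketch. The return values are placeholders that I made up, since the real ones depend on your code:

```python
# Hypothetical implementation: the return values are placeholders for this sketch.
def handle_case_foo(a, b):
    if a > b:
        return "a is bigger"
    return "a is not bigger"


def test_handle_case_foo__a_bigger():
    assert handle_case_foo(2, 1) == "a is bigger"


def test_handle_case_foo__a_not_bigger():
    assert handle_case_foo(1, 2) == "a is not bigger"
```

Both branches are covered, and the twenty lines around the condition get executed only once, in the test for `my_method()`.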

### From bug to fix

First, make your bug reproducible. If it is reproducible, then it is easy
to fix it.

Make it reproducible in a test.

Imagine there is a bug in your method do\_foo(). You see the mistake
easily and you fix it. Done?

I think you are not done yet. I try to follow this guideline:

Before fixing the bug, search for test\_do\_foo(). Is there no test for this
method up until now? Then write it.

Now you have test\_do\_foo().

You have two choices now: extend test\_do\_foo() or write
test\_do\_foo\_\_your\_special\_case(). I use the double underscore
here.

Make the test fail (red)

Fix the code. The test is green now.

Slow down. Take a sip of tea. Look at your changes ("git diff" in your
preferred IDE). Is there a way to simplify your patch? If yes, simplify
it.

Run the "surrounding tests": If do\_foo() is inside the module "bar",
then run all tests for module "bar" (I use `pytest -k bar`). But if this
would take more than three minutes, then leave the testing to the CI
which happens after you commit+push (you have a CI, haven't you?)

### Tests and production code go hand in hand.

You implemented the great method foo() and you implement a corresponding
method called test\_foo(). It does not matter if you write foo() first,
and then test\_foo() or the other way round. But it makes sense to store
both methods with one commit to one git repo.

Several months later you discover a bug in your code. Or worse: your
customer discovers it.

If you fix foo() you need to extend test\_foo() or write a new method
test\_foo\_with\_special\_input(). Again both changes (production code
and testing code) walk into the git repo like a pair of young lovers
holding hands :-)

Related: [Guideline of Google: Codereview "Tests"](https://google.github.io/eng-practices/review/reviewer/looking-for.html#tests)

### 80% unit-tests

* 80% unit-tests
* 15% integration tests
* 5% end-to-end tests

From [Software Engineering at Google](https://www.oreilly.com/library/view/software-engineering-at/9781492082781/)

### pre-commit hook

For basic syntax checking (aka linting) before commit I use [pre-commit](https://pre-commit.com/)

Adding simple checks is very easy: [hook to reject commit if a file contains a specific string](https://stackoverflow.com/a/66171121/633961)

### aaa-tests (smoke tests)

If you have a huge test-suite, which takes more than ten minutes to execute, then I recommend
to flag some tests. I call these tests "aaa" tests. These tests should be fast and check the basic
stuff.

This way you can check if most parts are all right before pushing code and triggering CI.

Some call these "smoke tests".

Why "aaa"?

Most test runners allow you to execute all tests which match a certain pattern. I name the tests "test_aaa_...",
and then I can easily run all these tests. Example: `pytest -k aaa`.

Running all aaa-tests should take less than a minute.

But I don't call it automatically before each commit.
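
A tiny sketch of the naming scheme. The test names and bodies are made up, only the "aaa" in the name matters:

```python
# Hypothetical tests: only the naming scheme matters here.

def test_aaa_addition_works():
    # Fast check of basic functionality, selected by "pytest -k aaa".
    assert 1 + 1 == 2


def test_full_report_generation():
    # Stands in for a slow test. Its name does not contain "aaa",
    # so "pytest -k aaa" does not select it.
    assert sorted([3, 1, 2]) == [1, 2, 3]
```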

### Creating test data is much more important than you initially think

Creating test data is very important. It can help you with several
things:

1: It can help you to create a re-usable application. If you have only
one customer, it does not matter. But the real benefit of software is its re-usability.
Your code wants to get executed by several customers. As soon as you have two or more
customers you need a neutral test environment that is not specific to one of your customers.
It is a lot of work to create a neutral test environment if you have not done it from
day one. But the work only needs to be done once and helps in the long run.

2: It can help you to create presentation/demo systems.

3: It can help you in automated tests.

Your tests should not run on real data from customers.

If you create test data this should be automated. This way you can fill a new database with useful data. You should be able to create a
demo system with one command (or one click).

Write the creation of test data once and use it for both: presentations
and automated tests.

### Don't use random data for tests.

Do not use random data for tests. It just makes no sense: the test environment should
be reproducible, not flaky.

Some people use libraries which create random user names and addresses (street, city, postal code, .....) like [Faker](https://pypi.org/project/Faker/).

I don't see why a special library for creating test data is needed. Random data leads to flaky tests.

If you need a list of names/addresses to fill your database, then I see these options:

* Option0: If your users have different roles, use a corresponding name: like "Admin", "Staff", "User", ...
* Option1: Be creative and/or use names which come to your mind: Bob Geldof, Stevie Wonder, Mr. Bean, ...
* Option2: you can take data from here by hand: https://github.com/joke2k/faker/tree/master/faker/providers
* Option3: Use the [faker](https://faker.readthedocs.io/en/master/) library **once** and create some JSON. Store this JSON in your code or in an extra file. Then uninstall faker.

This way it is far easier to debug a test which works on your machine, but fails in CI. If you use random data, then
this is much harder. Imagine in CI a mail gets sent to only three users, although four users should get an email. If you
use random data you can't differentiate between the users. If you use a predictable naming scheme, then you can distinguish between
the users.
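
A predictable naming scheme could look like this sketch. The helper name `create_test_users` and the field names are assumptions, adapt them to your data model:

```python
# Hypothetical helper: the field names and the naming scheme are assumptions.

def create_test_users(roles=("admin", "staff", "user"), per_role=2):
    # Returns the same list on every call, so failures are reproducible.
    users = []
    for role in roles:
        for number in range(1, per_role + 1):
            name = f"{role}-{number}"
            users.append({"name": name, "email": f"{name}@example.com", "role": role})
    return users
```

If the mail to `staff-2@example.com` is missing in CI, you know immediately which user is affected.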

This guideline is about writing tests. If you create demo-systems, then it is the same: Don't use
random data. The output should be repeatable. Although for a demo-system you usually want nice names.

If you use an ORM in your production code, then use the ORM to create your
test data.

I like [pytest fixtures](https://docs.pytest.org/en/latest/explanation/fixtures.html).

I know that there are special cases and corresponding libraries which use fuzzing to test edge cases.
For example, Go has built-in [fuzzing](https://go.dev/doc/fuzz/) support. But that's only for special cases,
it is not needed for most application programming.

### How to create QA and staging systems?

Many teams create the QA and staging systems by copying the production system.

This works, but I think it is better to create these systems from code stored in
version control.

Creating a test system via code looks complicated at first, but it helps you to create
reliable, reproducible systems. This makes you faster in the long run.

### Don't check for counts in unittests.

> AssertionError: 8 != 9

That's a useless error message.

You have absolutely no clue if a test fails with a message like this.

It is much more useful to compare a list of strings.
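
A short sketch of the difference. The function `get_active_user_names()` is made up and stands in for your real code:

```python
# Hypothetical data: get_active_user_names() stands in for your real code.

def get_active_user_names():
    return ["admin-1", "staff-1", "user-1"]


def test_active_users__count_only():
    # If this fails, you only learn that the count is wrong.
    assert len(get_active_user_names()) == 3


def test_active_users__names():
    # If this fails, the diff shows which name is missing or unexpected.
    assert get_active_user_names() == ["admin-1", "staff-1", "user-1"]
```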

### This is untestable code

If you are new to software testing, then you might think ... "some parts
of my code are *untestable*".

I don't think so. I guess your software uses the [IPO pattern](https://en.wikipedia.org/wiki/IPO_model): Input, Processing, Output. The
question is: How to feed the input for testing to my code? Mocking,
virtualization, and automation are your friends.

The "untestable" code needs to be cared for. Code is always testable,
there is no untestable code. Maybe your knowledge of testing is limited
up until now. Finding untestable code and making it testable is the
beginning of an interesting adventure.

### Flaky Tests

Tests are never flaky. If the same code ran fine yesterday, and the same code
fails today, then the test itself is stable.

The environment is flaky. Some small bit in the environment is different today.

Maybe the servers are under more load today, which results in slower responses, which
results in timeouts.

Maybe it fails because there is a new test that executes before the flaky test and which
modifies the database.

Maybe a shared resource contains different data today.

...

The bigger your environment, the more likely you have flaky tests.

This is the way to avoid flaky tests:

* Keep your tests simple. Try to write stateless methods that receive only a few inputs.
* Keep the environment simple. If you can avoid Selenium, then avoid it. This will save you time.
* Avoid shared resources. Tests should have their own database, their own cache, ...
* ...

### Hermetic Testing

This post from the [Google Testing Blog "Hermetic Servers"](https://testing.googleblog.com/2012/10/hermetic-servers.html) explains it in-depth:
End-to-End tests are faster and less flaky if they run on localhost and don't need other resources.

This usually means:

* The database is running on localhost
* storage server (S3) is running on localhost or in-memory
* Cache Server (Redis) is running on localhost or in-memory.

For storage and cache it is easy to find an in-memory solution (for Django: [dj-inmemorystorage](https://github.com/waveaccounting/dj-inmemorystorage)),
but for the database it is more difficult. My opinion: Use PostgreSQL during development.
Don't use SQLite, since it does not support all features of PostgreSQL.

### Hermetic Testing: N times on localhost

It should be easy for developers to set up several test systems on their local machine.

If you are working on a larger change, it is really helpful to have one system with the old state, and
a second system with the new state.

Both systems should be hermetic, which means that they don't share resources.

Using the same database server is fine, but both should use different databases.

### Unit-Tests may use the ORM.

Imagine you use a framework that provides you a nice ORM to create, read, update, and delete your data.

Now you write some backend-methods on top of this ORM.

And on top of your methods, you might provide an HTTP API.

Imagine you have a class `Ticket` which has a method called `resolve()`. This method uses the ORM.

You want to write a unit-test for this method.

A purist argues: I only want to unit-test the method, I must not use the ORM since blablabla.

I understand what the purist wants. But I want to get things done. I want to make
customers happy, not unit-test purists.

For me, it is 100% ok if unit-tests use the ORM.

In other words: Only mock away things that take too long or things that need resources
which are not available (e.g. an SMTP server).
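
Mocking away the SMTP server could look like this sketch, which uses `unittest.mock` from the standard library. The function `send_notification()`, the host name, and the addresses are placeholders that I made up:

```python
import smtplib
from unittest import mock


def send_notification(recipient):
    # Hypothetical production code; host name and addresses are placeholders.
    with smtplib.SMTP("smtp.example.com", 587, timeout=5) as connection:
        connection.sendmail("noreply@example.com", [recipient], "Subject: Hi\n\nHello")


def test_send_notification():
    # Only the SMTP server is mocked away; everything else runs for real.
    with mock.patch("smtplib.SMTP") as smtp_class:
        send_notification("bob@example.com")
    connection = smtp_class.return_value.__enter__.return_value
    connection.sendmail.assert_called_once_with(
        "noreply@example.com", ["bob@example.com"], "Subject: Hi\n\nHello"
    )
```

The test needs no network and no mail server, and it still checks that the mail would have been sent to the right recipient.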

Related Podcast: [Don't Mock your Database (Jeff Triplett)](https://testandcode.com/154)

### Is config code or data?

The heading "Is config code or data?" could be phrased as "config: DB or git?", too.

Where should configuration be stored?

This is a difficult question. At least at the beginning. For me, most
configuration is data, not code. That's why the config is in a
**database**, not in a text or source code file in a version control
system.

This has one major drawback. All developers love their version control
system. Most developers love git. It is such a secure place. Nothing can get lost
or accidentally modified. And if a change was wrong, you can always revert
to an old version. It is like heaven. Isn't it?

No, it is not. The customer can't change it. The customer needs to call
you and you need to do stupid repeatable useless work.

For me, the configuration should be in the database. This way you can provide
a GUI for the customer to change the config.

The configuration and recipes for the configuration management are
stored in git. But this is a different topic. If I speak about
configuration management, then I speak mostly about configuring Linux
servers and networks (aka [Infrastructure as code](https://en.wikipedia.org/wiki/Infrastructure_as_code)). In my case, this is nothing which my customer
touches.

### ForeignKey from code to DB

This code uses the ORM of Django

``` {.sourceCode .python}
if ....:
    issue.responsible_group = Group.objects.get(name='Leaders')
```

`Group` is a class and refers to a table with the same name. Each group has a name. There
is one group (one row) with the name "Leaders".

The above code is dirty because 'Leaders' is like a ForeignKey from code to
a database row.

How to avoid this?

Create a global config table in your database. This table has exactly one
row. That's the global config. There you can create a column called
"Leaders" and there you store the ForeignKey to the matching group.

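A sketch of such a global config table. I use SQLite from the Python standard library here so that the example runs anywhere. The table and column names (`auth_group`, `global_config`, `leaders_group_id`) are assumptions, and in a Django project you would define this with models instead:

```python
import sqlite3

# Sketch with SQLite so it runs anywhere; table and column names are assumptions.
connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE auth_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE global_config (
        id INTEGER PRIMARY KEY CHECK (id = 1),  -- exactly one row
        leaders_group_id INTEGER NOT NULL REFERENCES auth_group(id)
    );
    INSERT INTO auth_group (id, name) VALUES (7, 'Leaders');
    INSERT INTO global_config (id, leaders_group_id) VALUES (1, 7);
""")


def get_leaders_group_id(connection):
    # The code asks the config for the group instead of hard-coding the name 'Leaders'.
    row = connection.execute("SELECT leaders_group_id FROM global_config").fetchone()
    return row[0]
```

Now the group can be renamed without touching the code, because the code no longer depends on the string 'Leaders'.
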
### Testcode is conditionless

Test code should not contain conditions (the keyword `if`). If you have
loops (`for`, `while`) in your tests, then this looks strange, too.

Tests should be straight forward:

> 1. Build environment: Data structures, ...
> 2. Run the code which operates on the data structures
> 3. Ensure that the output is like you want it to.
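
The three steps above in a sketch. The function `full_name()` is made up, it only stands in for the code under test:

```python
# Hypothetical function under test.
def full_name(person):
    return f"{person['first']} {person['last']}"


def test_full_name():
    # 1. Build environment
    person = {"first": "Bob", "last": "Geldof"}
    # 2. Run the code which operates on the data structure
    result = full_name(person)
    # 3. Ensure that the output is like you want it to be
    assert result == "Bob Geldof"
```

No `if`, no loop. You can read the test from top to bottom.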

### Code Review: Start to look at the tests first

If I do a code review, I like to look at the tests first. This hides
the implementation from my eyes and shows how the method gets used.

A clean interface is more important than a clean implementation. The
implementation can get refactored easily. The interface is harder to change,
since in most cases all usages of the interface need to be updated.

### Don't search the needle in a haystack. Inject dynamite and let it explode

Imagine you have a huge codebase that was written by a nerd who is
gone for several months. Somewhere in the code, a row in the database gets
updated. This update should not happen, and you can't find the relevant
source code line during the first minutes. You can reproduce this
failure in a test environment. What can you do? You can start a debugger
and jump through the lines which get executed. Yes, this works. But this
can take longer, it is like "Searching the needle in a haystack". Here is
a different way: Add a constraint or trigger to your database which
fires on the unwanted modification. Execute the code and BANG - you get
the relevant code line with a nice stack trace. This way you get the
solution provided on a silver platter with minimal effort :-)

In other words: Don't waste time searching.
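
A sketch of the dynamite, using SQLite from the Python standard library so that it runs without a server. The table `ticket`, the trigger name, and the function `somewhere_deep_in_the_codebase()` are made up. In PostgreSQL you would write the trigger differently, but the idea is the same:

```python
import sqlite3
import traceback

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, state TEXT);
    INSERT INTO ticket (id, state) VALUES (1, 'open');
    -- The dynamite: abort any update that sets the unwanted value.
    CREATE TRIGGER explode_on_unwanted_update
    BEFORE UPDATE OF state ON ticket
    WHEN NEW.state = 'closed'
    BEGIN
        SELECT RAISE(ABORT, 'unwanted update of ticket.state');
    END;
""")


def somewhere_deep_in_the_codebase(connection):
    # Stands in for the hidden code path you are hunting for.
    connection.execute("UPDATE ticket SET state = 'closed' WHERE id = 1")


try:
    somewhere_deep_in_the_codebase(connection)
except sqlite3.DatabaseError:
    # The stack trace names the function and the line that did the update.
    stack_trace = traceback.format_exc()
    print(stack_trace)
```

Remove the trigger after you have found the line.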

Sometimes you can't use a database constraint to find the relevant
stack trace, but often there are other ways.....

If you can't use a database constraint, maybe this helps: Raise
Exception on unwanted syscall (Python+GDB)

If you want to find the line where unwanted output in stdout gets
emitted:

If you have a library that logs a warning, but the warning does not
help, since it is missing important information. And you have no clue
where this warning comes from. You can use this solution:

You can use `strace -e inject...` to perform syscall tampering for
the specified set of syscalls.

### Avoid magic or uncommon things

- hard links in Linux file systems.
- file system ACLs (Access control lists). Try to use chmod/chown as little as possible.
- git submodules (Please use dependency management, configuration management, deployment, tools, ...)
- [seek()](https://en.cppreference.com/w/c/io/fseek). Stateless is better. If you use seek() the file position is a state. Sooner or later the position (state) will be wrong.
- Scripts which get executed via OpenSSH [ForceCommand](http://man.openbsd.org/OpenBSD-current/man5/sshd_config.5#ForceCommand) or "command" in .ssh/authorized_keys. SSH is not an API, use http.
- I think even [symbolic links](https://en.wikipedia.org/wiki/Symbolic_link) are strange and outdated. Just some minutes ago I got confused because `grep -r foo .` did not show a result, but `grep foo ./my-dir/abc.txt` showed a result. Root-cause: `my-dir` was a symlink.

### Avoid writing a native GUI

Imagine you have developed web applications up until now. You have never
developed a native GUI before. Now a new potential customer has a use
case and you think: This time a native GUI would be a good solution.

Caution: slow down. Developing a native GUI is much more work and needs
much more time than you think.

The edit, compile, run cycle is much longer. This will slow you down.

If you develop a native GUI, you might need several mouse clicks until
you reach the part where you are improving the current code. And like all
humans, you are not perfect, and you have a typo. The application
crashes, and you need to do the edit, compile, run, five clicks cycle
again...

Compare this to a web application: You do not need to do five clicks to
reach the part where you improve the current code. You just hit ctrl-r
and reload the page. The stateless HTTP protocol makes this possible. I
love it.

Next argument: The native GUI community is tiny compared to web
development. If you have a question, you have only a few people to talk
to.

I am at the Chemnitzer Linux Days yearly and meet a lot of newcomers
there. Some people new to software development think: "I just want to
develop a simple app for me. No need to run a web server. I want a real
application running on my pc."

My advice: use Python and Django. The things you learn have more value.
The knowledge you gain can be used to build cool stuff. If you have a
question, there is always someone who has useful advice.

See the [TagTrend gtk, qt, django](http://sotagtrends.com/?tags=%5Bgtk,qt,django%5D)

### Avoid writing native Apps

Developing a mobile-friendly web application is much easier than writing a
native app. If you can avoid it, then avoid writing a native app.

The development and release process is much slower.

Of course, the age of [Progressive Web Apps](https://web.dev/progressive-web-apps/) has just begun.
A lot of things are not possible in a web app up until now. Just be warned, that this road is
slow and in the long run deprecated, since the environments for PWAs are getting better every year.

### My preferred Web Stack

Python, Django, Gunicorn, Nginx, PostgreSQL, [htmx](https://htmx.org/), Bootstrap5.

This way I can write responsive mobile friendly applications.

I think React/Vue are in general overrated and not needed for my use cases.

### Learn one programming language, not ten.

Most young developers think they need to learn many programming languages
to be a good developer.

My opinion: Learn Python, SQL, and some JavaScript.

Then learn other topics: PostgreSQL, Configuration management,
continuous-integration, organizing, teamwork, learn to play a musical
instrument, long-distance running, family

### git

Moved here [git tips](https://github.com/guettli/git-tips)

### Avoid Conditional Breakpoints

Imagine, you can reproduce a bug in a test. But you could not
fix it at the moment. If you want to create a conditional breakpoint to find
the root of the problem, then you could be on the wrong track. Rewrite
the code first, to make it more fine-grained debuggable and testable.

Modify the source and the test until a normal (non-conditional) breakpoint is enough.

This likely means you need to move the body of a loop
into a new method.

``` {.sourceCode .}
# Old
def my_method(...):
    for foo in get_foos():
        do_x(foo)
        do_y(foo)
        ...
```

``` {.sourceCode .}
# new
def my_method(...):
    for foo in get_foos():
        my_method__foo(foo)

def my_method__foo(foo):
    do_x(foo)
    do_y(foo)
    ...
```

Now you can call `my_method__foo()` in a test, and you don't need a
conditional breakpoint anymore. This helps you now (during debugging), but raises
the overall value of the source code in the long run, too. Instead of a few big monster methods,
you have more small and easy to understand methods that follow the simple input-processing-output model.

### Make a clear distinction between Authentication and Permission Checks

It is important to understand the difference.

**Authentication** happens first: Is the user really Bob, or is there
just someone who pretends to be Bob?

**Permission Checks** Is Bob allowed to do action "foo"? Here we already
trust that the user is Bob and not someone else. I use the term
"Permission Checks" on purpose since the synonym "Authorization" sounds
too similar to "Authentication".
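
A sketch which keeps both steps in separate functions. The in-memory dicts are made up for this illustration. A real system stores hashed passwords in a database:

```python
# Hypothetical in-memory data; a real system would use hashed passwords and a database.
PASSWORDS = {"bob": "secret"}
PERMISSIONS = {"bob": {"foo"}}


def authenticate(username, password):
    # Authentication: is this really Bob?
    return username in PASSWORDS and PASSWORDS[username] == password


def has_permission(username, action):
    # Permission check: is Bob (already authenticated) allowed to do this action?
    return action in PERMISSIONS.get(username, set())
```

The second function does not look at the password at all. It trusts that authentication already happened.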

Related question:

Even the http-spec confuses both similar sounding words:

> There's a problem with 401 Unauthorized, the HTTP status code for authentication errors. And that’s just it: it’s for authentication, not authorization. Receiving a 401 response is the server telling you, “you aren’t authenticated–either not authenticated at all or authenticated incorrectly–but please reauthenticate and try again.

Source: [403 Forbidden vs 401 Unauthorized HTTP responses](https://stackoverflow.com/questions/3297048/403-forbidden-vs-401-unauthorized-http-responses)

General guidelines: Avoid [Homonyms](https://en.wikipedia.org/wiki/Homonym)

### Idempotence is great

Idempotence is great, since it ensures that it does no harm if
the method is called twice.

Errors (for example a power outage) can happen at any millisecond.
That's why you need to decide what you want:

- if the power outage happened, some jobs do not get executed.
Cronjobs work this way.
- if the power outage happened, some jobs do get executed twice to
ensure they get done.
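
A small sketch of the difference between an idempotent and a non-idempotent operation. Both functions are made up for this illustration:

```python
def add_member(members, name):
    # Idempotent: calling it twice with the same name gives the same result as calling it once.
    return members | {name}


def append_member(members, name):
    # Not idempotent: every call adds another entry.
    return members + [name]
```

If a job gets executed twice after a power outage, the first version does no harm. The second version leaves you with a duplicate.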

Further reading:
(I don't
use celery, but I like this part of the docs)

### File Locking is deprecated

In the past [File Locking](https://en.wikipedia.org/wiki/File_locking)
was a very interesting and adventurous topic. Sometimes it worked,
sometimes not, and you got interesting edge cases to solve again and
again. It was fun, especially on NFS (Network File System). Only hardcore experts know the difference between
fcntl, flock, and lockf.

.... But on the other hand: It's too complicated, too many edge cases,
too much wasting time.

There will be chaos if there is no central dispatcher.

I like tools like . It is simple and robust.
But the next time I create something like this, I will try [django-pg-queue](https://github.com/SweetProcess/django-pg-queue)

BTW, the topic is called
[Synchronization](https://en.wikipedia.org/wiki/Synchronization_(computer_science)).

Further reading about "task queues":

### No nested directory trees

If you store files, then avoid nested directory trees. It is complicated
and if you want to use a storage server like
[S3](https://en.wikipedia.org/wiki/Amazon_S3) later, you are in trouble.

Most storage servers support containers and
[blobs](https://en.wikipedia.org/wiki/Binary_large_object) inside a
container. Containers in containers are not supported, and that's good
since it makes the environment simpler.

### Code doesn't call mkdir

Code runs in an environment, and this environment was created with
configuration management. This means: source code usually does not call
mkdir. Creating directories is part of configuration management.

Setting up the environment and executing code in this environment are
two distinct parts. When your software runs, the environment already
exists. Code that creates directories if they do not exist yet should be
cut into two parts: one part creates the environment (it gets executed
only once), and the second part is the daily execution, which can trust
the environment that the directory exists. Keep these two parts
separate.

How to create directories if I should not do it with my software? With
automated configuration management (Ansible, Chef, ...) or during
installation (RPM/DPKG).
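
A minimal sketch of the "daily execution" part. The function name `write_report` and the error text are assumptions for this illustration. The point is that the code fails loudly when the environment is broken, and does not repair it silently:

```python
from pathlib import Path


def write_report(data_dir, name, content):
    """Write a file into a directory that must already exist."""
    directory = Path(data_dir)
    if not directory.is_dir():
        # No mkdir here: a missing directory means the environment is broken.
        raise RuntimeError(
            f"Directory {directory} is missing. Check the configuration management."
        )
    target = directory / name
    target.write_text(content)
    return target
```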

Exception: You create a temporary directory that is only needed for
a few seconds. But after switching from subprocess/shell calls to
libraries (see "Avoid calling command line tools"), temporary files are
needed much less often.
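
For this exception, the Python standard library cleans up after itself. A small sketch, where the function name `process_in_scratch_dir` and the file name are only illustrative:

```python
import tempfile
from pathlib import Path


def process_in_scratch_dir(payload):
    """Use a short-lived directory and report whether it is gone afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "payload.bin"
        scratch.write_bytes(payload)
        size = scratch.stat().st_size
    # Leaving the "with" block removes the directory and everything in it.
    return size, Path(tmp).exists()
```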

### Debugging Performance

I use three ways to debug slow performance:

> - Logging and profiling, if you have a particular reproducible use
> case
> - Django Debug Toolbar to see which SQL statements took long in a
> HTTP request.
> - Statistics collected on production environments. For Python:
> or
>
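
For the first point (profiling a reproducible use case), a minimal sketch with `cProfile` and `pstats` from the Python standard library. The helper name `profile_call` is an assumption for this example:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under the profiler and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()
```

Call it like `result, report = profile_call(my_slow_function, some_argument)` and read the report from top to bottom: the functions with the largest cumulative time come first.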

### You provide the GUI for configuring the system. Then the customer (not you) uses this GUI

I developed a workflow system for a customer. The customer gave me an
Excel sheet with steps, transitions, and groups.

The coding was the difficult part.

Then I configured the system according to the Excel sheet.

The code was bug-free, but I made a mistake when I entered the values
(from Excel to the new web-based workflow GUI).

The customer was upset because the configuration contained mistakes.

I learned. Now I ask if it would be ok if I provide the GUI and the
customer enters the configuration. In most cases, the customer likes to
do this.

There is a big difference. The customer feels productive doing
something like this. I hate it. I care about the database design and the
code, but entering data with copy+paste from the Excel sheet ... No, I
don't like this. Results will be better if you like what you do :-)

For detail lovers: No, it was not feasible to write a script that
imported the Excel sheet into the database. The Excel sheet was not well
structured.

*Give a man a fish and you feed him for a day; teach a man to fish and
you feed him for a lifetime.*

### Better error messages

If you have worked with Windows 95, then you must have seen them: Empty
error messages with just a red icon and a button labeled "OK"