Utility for zero downtime restarts of unmodified programs.
https://github.com/amscanne/huptime


High uptime
===========

[![Build Status](https://travis-ci.org/amscanne/huptime.png)](http://travis-ci.org/amscanne/huptime)

Huptime is a tool for achieving zero downtime restarts without the need to
modify your program in any way.

Although many applications support reloading configurations while running, a
zero downtime restart allows for upgrading the application code without
rejecting any clients.

Basic Example
-------------

In a terminal, run (on Python 3, use `python3 -m http.server` in place of
`python -m SimpleHTTPServer`):

    huptime --exec python -m SimpleHTTPServer &
    PID=$!

Then, in a second terminal:

    while true; do curl http://localhost:8000 2>/dev/null || echo "fail"; done

Finally, in a third terminal (or back in the first):

    kill -HUP $PID

You should see no "fail" output on the second terminal.

With this reload, the complete code for SimpleHTTPServer is reloaded
(potentially with changes), but at no time are connections denied or dropped.
When the new version is up and running again (i.e. it binds the socket and
calls accept), then pending connections will be processed.
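
If you'd rather have a number than watch for "fail" lines, the check from the second terminal can be bounded and counted. This is only a sketch: the `count_failures` helper and the request count are my own assumptions, not something huptime ships.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Run CMD N times and print how many runs exited non-zero.
# (Hypothetical helper for illustration; not part of huptime.)
count_failures() {
    n=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example: probe the server 1000 times while another terminal sends SIGHUP.
# count_failures 1000 curl -sf http://localhost:8000
```

A result of 0 corresponds to seeing no "fail" output in the manual loop.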

Why?
----

With continuous deployment, software can be updated dozens, hundreds or even
thousands of times per day. It is critical that service is not interrupted during
upgrades.

In an ideal world, all applications would support a mechanism for doing zero
downtime restarts. The reality is that many standard frameworks make this
difficult to do from the top down. It's not practical to plumb this
functionality through every layer, particularly for applications over which you
have no control.

Compound this with the fact that many applications consist of many different
small components (written using different languages and frameworks), and you've
got yourself a headache.

Because of this complexity, one of the first things people have to do is implement
a custom load balancing tier and a complex upgrade process. Although this is
important at a certain scale, it shouldn't be that hard for simple services.
It's crazy to add a whole new tier when the problem can be solved in a much
simpler way!

Huptime attempts to make it very simple to achieve these restarts for simple,
unmodified applications.

How do I install it?
--------------------

Building this package generally requires:
* make
* gcc and g++
* python
* rpmbuild (optional) and dpkg (optional)

Clone the repo:

    git clone http://github.com/amscanne/huptime

Install it the old-fashioned way:

    cd huptime && sudo make install

Want Ubuntu & Debian packages?

    cd huptime && make deb && dpkg -i huptime*.deb

How about CentOS or RedHat?

    cd huptime && make rpm && rpm -i huptime*.rpm

How do I use it?
----------------

You simply need to run services via huptime.

For example:

    # Start the service.
    huptime /usr/bin/myservice &

    # Zero downtime restart.
    killall -HUP myservice

    # Or, if you prefer...
    huptime --restart /usr/bin/myservice

If there is a pidfile, it can be reset on restart:

    # Start the service.
    huptime --unlink /var/run/myservice.pid /usr/bin/myservice &

    # Zero downtime restarts.
    killall -HUP myservice

    # Again, if you prefer...
    huptime --restart /usr/bin/myservice

Or, if you need exec (for example, to run under upstart):

    # Start the service and get the PID.
    huptime --exec /usr/bin/myservice &
    PID=$!

    # Zero downtime restart (same PID).
    kill -HUP $PID

    # Again, as always...
    huptime --restart /usr/bin/myservice

What does it support?
---------------------

Huptime should [+] handle the following normal things:

* Daemonization & pid files
* Process pools
* Multiple server sockets
* Event-based and thread-based servers
* Integration with supervisors (just use exec!)

In terms of languages and frameworks, huptime should support nearly all
programs that are *dynamically linked* against a *modern libc*.

Most modern dynamic languages (python, ruby, node, etc.) fall into this
category. Most C/C++ programs also fall into this category. A notable exception
is *go*, which invokes system calls directly and uses only static linking.
(For the record, I am a big fan of this approach. However, both approaches have
their merits.)

[+] Should. YMMV.
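
A quick way to guess whether a given binary falls into the supported category is to ask `ldd` whether it can resolve shared libraries for it. This is a rough sketch, not a guarantee: the `is_dynamic` helper and the service path are hypothetical, and passing the check is necessary rather than sufficient.

```shell
#!/bin/sh
# is_dynamic FILE
# Succeed if ldd can resolve shared libraries for FILE.
# ldd exits non-zero for static binaries and for files that are not ELF.
# (Hypothetical helper; avoid running ldd on binaries you do not trust.)
is_dynamic() {
    ldd "$1" >/dev/null 2>&1
}

# Example (the service path is hypothetical):
# if is_dynamic /usr/bin/myservice; then
#     echo "dynamically linked"
# else
#     echo "static or not an ELF binary"
# fi
```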

What else does it do?
---------------------

* Transparent multi-binding (running the same service multiple times)

If you are running Linux 3.9+, then you can also easily enable pools of
processes by starting your services with the *--multi* option. Again, this does
not require any modification to your application.

For example:

    # Start the service (4 workers).
    huptime --multi=4 /usr/bin/myservice &

    # Zero downtime restart of all.
    killall -HUP myservice

    # Or, if you prefer...
    huptime --restart /usr/bin/myservice
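
Since *--multi* depends on the kernel version mentioned above, a start-up script may want to check for it first. A minimal sketch, assuming the release string has the usual `MAJOR.MINOR...` shape; `kernel_at_least` is a hypothetical helper, not part of huptime.

```shell
#!/bin/sh
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeed if RELEASE (default: output of `uname -r`) is at least MAJOR.MINOR.
# (Hypothetical helper for illustration.)
kernel_at_least() {
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}
    major=${release%%.*}        # "5.15.0-91-generic" -> "5"
    rest=${release#*.}          # -> "15.0-91-generic"
    minor=${rest%%[!0-9]*}      # -> "15"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "${minor:-0}" -ge "$want_minor" ]; }
}

# Example: fall back to a single instance on older kernels.
# if kernel_at_least 3 9; then
#     huptime --multi=4 /usr/bin/myservice &
# else
#     huptime /usr/bin/myservice &
# fi
```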

Want to manage the number of running scripts yourself?

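    #!/bin/bash
    # Number of instances to run (the default of 4 is an arbitrary choice).
    N="${N:-4}"
    pids=""

    # Forward SIGHUP to every instance for a zero downtime restart.
    reload() {
        for pid in $pids; do
            kill -HUP "$pid"
        done
    }

    # Forward termination to every instance.
    stop() {
        for pid in $pids; do
            kill -TERM "$pid"
        done
    }

    trap reload SIGHUP
    trap stop SIGTERM
    trap stop SIGINT

    # Start N instances, each run with --multi=1.
    count=0
    while [ "$count" -lt "$N" ]; do
        huptime --multi=1 /usr/bin/myservice &
        pids="$pids $!"
        count=$((count + 1))
    done

    # Wait on each instance in turn.
    for pid in $pids; do
        wait "$pid"
    done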

* Transparent restart on exit

If you've got a stubborn program and you want to restart it automatically when
it fails, you should really fix your program. Barring that, you should use a
supervisor like upstart. Barring *that* (you don't care about the sensible
features that a supervisor provides and want zero downtime restarts), you can
use huptime.

To enable this option, simply specify *--revive* on the huptime command line.

For example:

    # Start a zero downtime netcat.
    huptime --revive nc -l 9000 < message.txt &

    # Clients will always find a server...
    nc localhost 9000

How does it work?
-----------------

Huptime installs a signal handler for `SIGHUP`.

It tracks open file descriptors by intercepting calls to `bind` and `accept`
(among other things). When the program receives a `SIGHUP`, it will
intelligently `exec` a new copy of the program *without* closing any bound
sockets and without requiring any changes to the program.

Note that this is not simply a reload but rather a new version of the
application with config changes and code changes (as both now appear on disk).

When the new copy of the program tries to bind the same socket, huptime will
silently replace it with the still-open socket from the previous version.
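
The kernel behavior this relies on, that descriptors left open survive an `exec`, can be seen with a plain file in a shell. This is only an illustration of that behavior, not huptime's implementation (huptime applies the idea to bound sockets); the temp file and descriptor number 3 are arbitrary choices.

```shell
#!/bin/sh
# Create a scratch file standing in for a long-lived resource.
tmp=$(mktemp)
printf 'still open\n' > "$tmp"

# Run in a subshell so the demo does not replace the calling shell.
# Note: bash and dash keep descriptors above 2 open across exec;
# behavior can vary in other shells.
result=$(
    # Open descriptor 3 in the old program.
    exec 3< "$tmp"
    # Remove the path; only the open descriptor remains.
    rm -f "$tmp"
    # Replace the process image; the new program finds fd 3 already open.
    exec sh -c 'read -r line <&3; echo "$line"'
)
echo "$result"
```

The new program never opened the file itself, yet it can read from it, much as a restarted service can keep using a socket that was bound before the restart.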

There are two fundamental modes of operation:

* fork (default)

If you use fork, then when the process receives a `SIGHUP`, it will `fork`
and `exec` a new copy of the application. This results in less downtime, as new
requests can start being served immediately, while old requests are still being
finished by the original program.

However, this may not integrate cleanly with supervisor processes like upstart,
which depend on the PID of the application staying constant.

This may also present issues for some applications that check pidfiles or
contain internal mechanisms for preventing two copies of themselves from
running. Huptime goes to some effort to prevent conflict (allowing for unlink
prior to executing the child), but it may still arise.

* exec

If you use exec, then when a process receives a `SIGHUP`, it will begin
queueing requests to the bound socket (in the kernel) and wait until all
outstanding requests are finished. Only when existing requests are finished
will the program restart.

This may not work properly if requests are not bounded in how long they will
take. This may also lead to high response times for some clients during the
restart. However, this approach will play well with supervisors.

For example, if you are using upstart, you can do the restart as:

    initctl reload service
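
The PID difference between the two modes is easy to see in a shell. This is a sketch of the general `fork`/`exec` behavior only, with throwaway `sh -c` processes standing in for the service; it does not run huptime.

```shell
#!/bin/sh
# exec: the new program image keeps the PID of the process it replaces.
exec_pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# fork + exec: the new program runs in a child process with a fresh PID.
fork_pids=$(sh -c 'echo $$; sh -c "echo \$\$" & wait')

# Each line shows the PID before and after the simulated restart.
echo "exec:" $exec_pids
echo "fork:" $fork_pids
```

The two PIDs on the `exec` line match, which is what a supervisor tracking a single PID expects; the two on the `fork` line differ.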

Limitations
-----------

Although the majority of programs will work, I'm sure that not *all* will.

The exit is not done through the normal application path on restart. Although
all file descriptors are closed, there may be application-level resources (or
some system resources) that are not cleaned up as expected and may cause
problems.

The command line and environment cannot be changed between restarts. You can
easily work around this issue by putting all configuration inside a file that
is read on start-up (e.g. `myservice --config-file=/etc/myservice.cfg`).
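
For a service that is itself a shell script, the workaround might look like the sketch below. The `read_setting` helper, the `KEY=value` file format and the `port` key are all assumptions made for illustration; your service's config format will differ.

```shell
#!/bin/sh
# read_setting FILE KEY [DEFAULT]
# Print the value of the last "KEY=value" line in FILE, or DEFAULT if absent.
# (Hypothetical helper; assumes KEY contains no regex metacharacters.)
read_setting() {
    value=$(sed -n "s/^$2=//p" "$1" 2>/dev/null | tail -n 1)
    echo "${value:-${3:-}}"
}

# Example: the service re-reads its settings on every launch, so a restart
# picks up edits even though argv and the environment are unchanged.
# port=$(read_setting /etc/myservice.cfg port 8000)
```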

What's up with the name?
------------------------

It's clever! Services are often reloaded using `SIGHUP`. The point of this tool
is to maximize uptime by enabling zero downtime restarts via `SIGHUP`. It's
your high availabilibuddy!