
How I made Inkmi Self Healing with Go and Systemd

Self Healing Code for Less Work

Inkmi is Dream Jobs for CTOs, written as a decoupled monolith in Go, HTMX, Alpine.js, NATS.io and Postgres. I document my adventures and challenges in writing the application here on this blog. Tune in again.

I’m building Inkmi - Dream Jobs for CTOs as a solo entrepreneur. So time is precious, and I don’t have devops people looking into things.

To make Inkmi less error-prone and less work to operate, I looked for ways to make it self-healing. There are many practices out there and I have adopted some:

  • Systemd listeners
  • Restart every 24h
  • Check yourself if you are working, otherwise stop
  • Send notifications to a monitor
  • Check if your binary has changed

Inkmi is written in Go and deployed as one binary with embedded assets. I use systemd to start and monitor the binary, which also lets me reduce privileges and manage memory.


Systemd listeners

With socket activation, systemd owns the listening TCP socket and passes it to the binary. New connections queue on that socket outside the binary, which helps with spikes. When the binary restarts, systemd keeps the listening socket open and hands it over to the newly started Inkmi. This way customers don’t see connections refused while the restart happens.

With the go-systemd package (github.com/coreos/go-systemd) and Echo, this looks like:

listeners, err := activation.Listeners()
if err != nil || len(listeners) == 0 {
	log.Fatalf("no listener passed by systemd: %v", err)
}
// address we're listening on
_ = listeners[0].Addr().String()
// hand the systemd listener to Echo
e.Listener = listeners[0]
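To show what happens underneath, here is a minimal sketch of the socket activation protocol using only the standard library. It is not how Inkmi does it (Inkmi uses the go-systemd package above). It assumes the documented systemd convention that inherited descriptors start at 3 and are announced through the LISTEN_PID and LISTEN_FDS environment variables. The helper names and the fallback address are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// listenFDsStart is the first inherited descriptor in the systemd
// socket activation protocol (SD_LISTEN_FDS_START).
const listenFDsStart = 3

// inheritedFDs returns the descriptor numbers systemd passed to this
// process, or nil when the environment does not describe any.
func inheritedFDs(pidEnv, fdsEnv string, pid int) []int {
	p, err := strconv.Atoi(pidEnv)
	if err != nil || p != pid {
		return nil // variables are missing or meant for another process
	}
	n, err := strconv.Atoi(fdsEnv)
	if err != nil || n <= 0 {
		return nil
	}
	fds := make([]int, 0, n)
	for i := 0; i < n; i++ {
		fds = append(fds, listenFDsStart+i)
	}
	return fds
}

// listener prefers a socket inherited from systemd and falls back to
// opening its own, which is handy for local development.
func listener(fallbackAddr string) (net.Listener, error) {
	fds := inheritedFDs(os.Getenv("LISTEN_PID"), os.Getenv("LISTEN_FDS"), os.Getpid())
	if len(fds) == 0 {
		return net.Listen("tcp", fallbackAddr)
	}
	f := os.NewFile(uintptr(fds[0]), "systemd-socket")
	defer f.Close() // FileListener works on a duplicate of the descriptor
	return net.FileListener(f)
}

func main() {
	l, err := listener("127.0.0.1:0") // hypothetical fallback address
	if err != nil {
		fmt.Fprintln(os.Stderr, "listen:", err)
		os.Exit(1)
	}
	defer l.Close()
	fmt.Println("listening on", l.Addr())
}
```

The fallback keeps the same binary usable on a laptop, where no systemd socket unit exists.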

24h Restart

This one comes from my Windows NT days, and I laughed about it back then: systemd restarts the binary every 24h. Should there be a memory leak or a similar slowly growing problem, the restart clears it. You still need to check graphs, e.g. in Grafana, otherwise you will miss a problem that needs to be fixed in code.

Check if you’re still working

Inkmi requests its own /health URL every minute to check that it still responds. If it doesn’t respond to itself, it exits, and systemd restarts the binary. With systemd holding the listening socket, customers will not see an impact. As Inkmi is written in Go, restarts are very fast, e.g. compared to restarting a JVM application.
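A minimal sketch of such a self-check, assuming a plain net/http handler instead of Echo. The function names, the timeout and the demo server are hypothetical stand-ins. The demo uses a short interval so it finishes quickly, where a real service would probe once a minute and call os.Exit(1) in the failure callback.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// client has a timeout so a hanging server counts as a failure.
var client = &http.Client{Timeout: 5 * time.Second}

// checkHealth returns an error when url does not answer 200 OK in time.
func checkHealth(url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

// watchSelf probes url on every tick and calls onFail at the first failure.
func watchSelf(url string, every time.Duration, onFail func(error)) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for range ticker.C {
		if err := checkHealth(url); err != nil {
			onFail(err)
			return
		}
	}
}

// demoServer stands in for the real application and serves only /health.
func demoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := demoServer()
	fmt.Println("healthy:", checkHealth(srv.URL+"/health") == nil)

	// Simulate a dead service by closing the server; the watcher then fires.
	srv.Close()
	watchSelf(srv.URL+"/health", 10*time.Millisecond, func(err error) {
		fmt.Println("self-check failed, a real service would exit here:", err)
	})
}
```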

Send notifications

Inkmi sends a notification to Systemd every few seconds. If Systemd stops receiving those, it assumes the binary hangs and restarts the binary.

Telling systemd we’re ready to receive connections:

_, err := daemon.SdNotify(false, daemon.SdNotifyReady)

Telling systemd we’re ok:

_, _ = daemon.SdNotify(false, daemon.SdNotifyWatchdog)
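For illustration, here is a sketch of a watchdog loop that speaks the notify protocol directly with the standard library instead of go-systemd. It assumes the documented systemd behaviour: the socket path arrives in NOTIFY_SOCKET and the watchdog timeout in microseconds arrives in WATCHDOG_USEC. The function names are hypothetical, and pinging at half the timeout is a common convention rather than something this article prescribes.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
	"time"
)

// notify sends one state string such as "READY=1" or "WATCHDOG=1" to the
// datagram socket that systemd names in NOTIFY_SOCKET.
func notify(socketPath, state string) error {
	conn, err := net.Dial("unixgram", socketPath)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(state))
	return err
}

// pingInterval derives how often to ping from WATCHDOG_USEC.
// Pinging at half the timeout leaves headroom for a slow tick.
func pingInterval(usecEnv string) (time.Duration, bool) {
	usec, err := strconv.ParseInt(usecEnv, 10, 64)
	if err != nil || usec <= 0 {
		return 0, false
	}
	return time.Duration(usec) * time.Microsecond / 2, true
}

func main() {
	socket := os.Getenv("NOTIFY_SOCKET")
	if socket == "" {
		fmt.Println("not started by systemd, skipping notifications")
		return
	}
	if err := notify(socket, "READY=1"); err != nil {
		fmt.Fprintln(os.Stderr, "ready:", err)
	}
	every, ok := pingInterval(os.Getenv("WATCHDOG_USEC"))
	if !ok {
		return // no WatchdogSec configured for this unit
	}
	for range time.Tick(every) {
		// A real service would ping only after confirming it is healthy.
		if err := notify(socket, "WATCHDOG=1"); err != nil {
			fmt.Fprintln(os.Stderr, "watchdog:", err)
		}
	}
}
```

Reading the interval from the environment keeps the code in step with WatchdogSec in the unit file, so the timeout is configured in one place only.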

Check for a changed binary

The Go Inkmi binary watches itself for changes. When the binary changes, Inkmi exits. Systemd then restarts Inkmi with the new binary. This is how releases have no impact on customers.

watcher, err := fsnotify.NewWatcher()
...
_ = watcher.Add(os.Args[0])
<-watcher.Events
...
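As a dependency-free alternative to fsnotify, the same idea can be sketched by polling os.Stat and comparing size and modification time. This swaps in a different technique from the one Inkmi uses. The type and function names are hypothetical, and the demo watches a temporary file so it terminates, where a real service would pass os.Args[0] and a longer interval.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// stamp is a cheap fingerprint of a file on disk.
type stamp struct {
	size    int64
	modTime int64 // nanoseconds since the Unix epoch
}

// stampOf reads the current fingerprint of path.
func stampOf(path string) (stamp, error) {
	info, err := os.Stat(path)
	if err != nil {
		return stamp{}, err
	}
	return stamp{size: info.Size(), modTime: info.ModTime().UnixNano()}, nil
}

// differs reports whether the file looks different from the starting point.
func differs(start, now stamp) bool {
	return start != now
}

// waitForChange blocks until path no longer matches start.
// Stat errors are skipped, e.g. while a deploy is replacing the file.
func waitForChange(path string, start stamp, interval time.Duration) {
	for {
		time.Sleep(interval)
		if now, err := stampOf(path); err == nil && differs(start, now) {
			return
		}
	}
}

func main() {
	// Demo on a temporary file standing in for the binary.
	f, err := os.CreateTemp("", "binary-demo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "temp file:", err)
		os.Exit(1)
	}
	f.Close()
	defer os.Remove(f.Name())

	start, err := stampOf(f.Name())
	if err != nil {
		fmt.Fprintln(os.Stderr, "stat:", err)
		return
	}
	go func() {
		time.Sleep(50 * time.Millisecond)
		// Simulate a deploy by writing new content.
		_ = os.WriteFile(f.Name(), []byte("new release"), 0o644)
	}()
	waitForChange(f.Name(), start, 10*time.Millisecond)
	fmt.Println("binary changed, a real service would exit so systemd restarts it")
}
```

Because the check goes through the path on every poll, it also notices a binary that was replaced by renaming a new file over the old one.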

Example systemd configuration

The systemd service file looks something like this. You also need a matching socket file (myservice.socket), which the service requires.

[Unit]
Description=MyService
Requires=myservice.socket
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/myservice
WatchdogSec=30s
Restart=always
RuntimeMaxSec=1d

ProtectSystem=strict
ProtectHome=true
PrivateUsers=true
PrivateTmp=true
DynamicUser=yes

[Install]
WantedBy=multi-user.target
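The socket unit that the service requires could look roughly like the sketch below. The port 8080 and the description are assumptions for illustration, not Inkmi's actual configuration. The file would be named myservice.socket so that systemd pairs it with myservice.service.

```
[Unit]
Description=MyService socket

[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target
```

With this in place, the socket unit is the one to enable, and systemd hands the listening socket to the service whenever it starts or restarts.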

Results

With this in place, the Inkmi binary has been very stable and I have less work tending to my startup’s application. In my former startups I didn’t have a self-healing setup and systems needed much more attention.

About Inkmi

Inkmi is a website with Dream Jobs for CTOs. We're on a mission to transform the industry to create more dream jobs for CTOs. If you're a seasoned CTO looking for a new job, or a senior developer ready for your first CTO calling, head over to https://www.inkmi.com


©️2024 Inkmi - Dream ❤️ Jobs for CTOs | Impressum