Making a Dumb NTP Clock Smart: DST Compensation with a Unikernel

My Mondaine SBB wall clock looks great on the wall, but it has a fundamental flaw: it treats NTP time as UTC and applies a fixed offset. It does have a manual DST toggle, but at every time change I’d need to take it off the wall, connect to its Wi-Fi, and flip the setting. That is cumbersome enough that I preferred spending a few hours implementing a fake NTP server to compensate :) ...

23 May 2026 · 4 min · 810 words · Clément Nussbaumer
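The DST compensation mentioned in the entry above can be sketched in a few lines. This is only an illustration, not the article's unikernel implementation: it assumes the clock sits in a zone following the EU rule (DST from the last Sunday of March, 01:00 UTC, to the last Sunday of October, 01:00 UTC), and the names `last_sunday_0100_utc`, `dst_delta`, and `compensated_time` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_sunday_0100_utc(year: int, month: int) -> datetime:
    """Return 01:00 UTC on the last Sunday of March or October (31-day months)."""
    day = datetime(year, month, 31, 1, 0, tzinfo=timezone.utc)
    # weekday(): Monday=0 ... Sunday=6, so step back to the most recent Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def dst_delta(utc_now: datetime) -> timedelta:
    """Extra offset to add while EU summer time is in effect (assumed rule)."""
    start = last_sunday_0100_utc(utc_now.year, 3)
    end = last_sunday_0100_utc(utc_now.year, 10)
    return timedelta(hours=1) if start <= utc_now < end else timedelta(0)


def compensated_time(utc_now: datetime) -> datetime:
    """Time a fake NTP server could hand out: real UTC shifted by the DST delta.

    The clock then adds its own fixed offset and ends up showing local time.
    """
    return utc_now + dst_delta(utc_now)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("real UTC:      ", now.isoformat())
    print("served to clock:", compensated_time(now).isoformat())
```

The design choice here is to keep the clock's configuration untouched and move all DST logic to the server side, so the device never has to come off the wall.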

Copy Fail: From Unprivileged Pod to Kubernetes Node Root

This article covers two complementary paths: the CNI wrapper staging chain, and the fully autonomous operator-SA compromise that eliminates the external trigger dependency. Both are proven on Talos Linux v1.12.4, Cilium v1.18.x, kernel 6.18.9.
Update (May 5th): code and building blocks on GitHub: https://github.com/clementnuss/copyfail-cve-exploits
Context
I work at PostFinance, where we run a Kubernetes platform supporting banking workloads. Our production clusters run Debian 12 with kernel 6.1.158+, which happens not to be vulnerable to CVE-2026-31431 (more on that at the end). ...

1 May 2026 · 13 min · 2602 words · Clément Nussbaumer

Banking on Reliability: Cloud Native SRE Practices in Financial Services

This article is a written companion to my KubeCon EU 2026 talk of the same name. It covers four stories from five years of running a Kubernetes platform at PostFinance, a systemic Swiss financial institution: SLOs as a reliability driver, open-source monitoring tools, continuous end-to-end testing, and an interactive debugging session tracking down rare 502 errors. The interactive visualizations below (hash ring, race condition sequence diagram) are ported from the Slidev presentation so you can explore them at your own pace. ...

26 March 2026 · 8 min · 1587 words · Clément Nussbaumer

Adding PrometheusHistograms support to VictoriaMetrics/metrics

TL;DR: I added support for PrometheusHistograms (those with le buckets) to the VictoriaMetrics/metrics package (a lightweight alternative to prometheus/client_golang), which allows me to: switch to the more lightweight VictoriaMetrics/metrics library in my open-source projects, which I find simpler to use; choose with a flag between classical Prometheus histograms and VictoriaMetrics histograms (much more precise); and maintain compatibility with existing Prometheus-based monitoring setups.
Problem
While working on kubenurse, I wanted to switch from the heavier prometheus/client_golang library to the more lightweight VictoriaMetrics/metrics package. However, there was one significant blocker: the VictoriaMetrics library only supported their own log-based histogram format, not the traditional Prometheus histograms with static le buckets. ...

12 July 2025 · 4 min · 745 words · Clément Nussbaumer
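For readers unfamiliar with the "static le buckets" mentioned in the entry above, here is a minimal sketch of how a classical Prometheus histogram counts observations. It is written in Python for illustration and is not the VictoriaMetrics/metrics API; the class `LeHistogram` and its methods are hypothetical names. What it shows is the standard Prometheus semantics: each bucket reports the cumulative number of observations less than or equal to its upper bound, with a final +Inf bucket that equals the total count.

```python
from bisect import bisect_left
from math import inf


class LeHistogram:
    """Toy histogram with fixed upper bounds (le = "less than or equal")."""

    def __init__(self, upper_bounds):
        # The +Inf bucket is always present and catches every observation.
        self.bounds = sorted(upper_bounds) + [inf]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0  # running sum of observed values (the _sum series)

    def observe(self, value: float) -> None:
        # bisect_left finds the first bound >= value, i.e. the smallest
        # bucket whose "le" condition the observation satisfies.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, as exposed via le labels."""
        running, out = 0, []
        for bound, count in zip(self.bounds, self.counts):
            running += count
            out.append((bound, running))
        return out


if __name__ == "__main__":
    h = LeHistogram([0.1, 0.5, 1.0])
    for v in (0.05, 0.1, 0.3, 2.0):
        h.observe(v)
    for bound, count in h.cumulative():
        print(f'le="{bound}" {count}')
```

Because the bounds are fixed up front, precision depends entirely on how well they match the observed values, which is the trade-off against the log-based format the entry refers to.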

A connected farm, part 3 - weighbridge automation

The Weighbridge
Alongside the dairy operation with its milking cows, the farm also includes a biogas plant. Taking advantage of the facilities there (trucks, buildings, etc.), my wife’s family have been collecting “green waste” for years now, and up until 2024, the cost of handling that waste was covered by a “per-inhabitant” tax paid by the town. Recently, however, due to the so-called “principe de causalité” (the “polluter pays” principle), the per-capita tax is being replaced: people bringing green waste to the biogas plant will have to pay for the amount they bring. As a result, a weighbridge had to be installed, which is only one part of the equation. ...

25 February 2025 · 5 min · 1041 words · Clément Nussbaumer

HTTP 502 - Upstream errors with nginx

A little context
I work at PostFinance, taking care of the Linux systems and of the Open-Source Kubernetes platform we are running to support all sorts of banking workloads. Aside from running the platform, we also take in user support issues (where users are internal developers/colleagues), and this blog article covers an issue named “Ingress gets HTTP 502 errors on high load”. You can take this article as an SRE exercise: I’ll provide the same data I received in the support issue, in the same order, and you should try to discover the actual issue as soon as possible. Good luck ;) ...

17 February 2025 · 5 min · 898 words · Clément Nussbaumer

A Connected Farm, part 2 - Remote Controlled Fence ⚡️

This article again covers a topic related to my wife’s family farm, but this time, instead of exporting milking data to Grafana, I will detail my usage of Michael Stapelberg’s amazing gokrazy project, which made it possible to reliably develop Go software to control fences around the farm.
Fences and Cows 🐄
The farm is spread across two sites, and on each site there are rather long electric fences, inside which the cows happily graze during the day (and, for the heifers’ fence, also during the night). To prevent the cows from escaping and, for example, eating our neighbour’s grass (which is always greener, as we all know), the fences are electrified ⚡️ with high-voltage (6000V) pulses every second. ...

11 May 2024 · 5 min · 995 words · Clément Nussbaumer

Kubenurse: The In-Cluster Doctor Making Network Rounds

TLDR: Kubenurse is the Swiss army knife for Kubernetes network monitoring. It will help you: pinpoint bottlenecks and know the latency in your network; identify nodes with network issues (packet drops, slow connection, etc.); and uncover issues like DNS failures, broken sockets, or interrupted TLS negotiations.
Description
Kubenurse is a Kubernetes network monitoring tool developed and open-sourced by PostFinance (a Swiss banking institution), which acts like an in-cluster doctor, continuously checking the health of your pod-to-pod, pod-to-service, and pod-to-ingress connections. ...

7 April 2024 · 6 min · 1101 words · Clément Nussbaumer

A Connected Farm, part 1 - Milking 🐄 🥛

Alongside my work as a System Engineer (with a focus on Kubernetes) at PostFinance, I’m married to a farmer in Switzerland, and live with her and her family on the family farm. This is quite different from my daily work, and I sometimes have the opportunity to help by, for example, feeding calves during milking, using my skills to install surveillance cameras, deploying a long-distance WiFi network across the farm, or modernizing the milking monitoring. It’s this latter point that I’m detailing today (without all the technical details, which are covered in the README of the open-source project I’ve created for this purpose). ...

17 February 2024 · 3 min · 554 words · Clément Nussbaumer

Backing up MariaDB on Kubernetes

Hosting MariaDB on Kubernetes has so far proved to be quite a good experience: using the Bitnami Helm Chart to host a “standalone” instance of MariaDB (i.e. without replication, as replication already happens on the storage layer, and because simplicity is more valuable than a complex HA setup like Galera) worked out quite well. Being cautious, I had configured a daily backup to S3, using a tool found on GitHub, but when it came to restoring data dumped with this tool, which uses a pretty old mysqldump binary, I was stuck and couldn’t restore 😅 For some reason, the default config of the tool didn’t bother to escape quotes and other sensitive characters. As a result, I had to restore my daily velero backup of the MariaDB instance into another namespace, make a proper export from there, and finally restore my data. ...

27 December 2023 · 3 min · 505 words · Clément Nussbaumer