Banking on Reliability: Cloud Native SRE Practices in Financial Services

This article is a written companion to my KubeCon EU 2026 talk of the same name. It covers four stories from five years of running a Kubernetes platform at PostFinance, a systemically important Swiss financial institution: SLOs as a reliability driver, open-source monitoring tools, continuous end-to-end testing, and an interactive debugging session tracking down rare 502 errors. The interactive visualizations below (hash ring, race condition sequence diagram) are ported from the Slidev presentation so you can explore them at your own pace. ...

26 March 2026 · 8 min · 1587 words · Clément Nussbaumer

DNS servers monitoring

A few months ago, I needed to assess the reliability of an internal DNS provider’s servers after a series of hard-to-track, seemingly random network issues, aka “It’s always DNS”. More specifically, my requirements were the following:

- number of errors/timeouts
- capability to query over TCP or UDP
- capability to monitor multiple DNS servers at once
- return codes received in the answer (e.g. NOERROR, SERVFAIL, NXDOMAIN, you name it) ...
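To make those requirements concrete, here is a minimal sketch of a probe that covers them using only the Python standard library. It is not the tool the post goes on to describe. The names `build_query`, `parse_rcode`, `probe` and `probe_all` are hypothetical, and the sketch assumes IPv4 servers listening on port 53 and a plain A-record query.

```python
import socket
import struct

# Common response codes from the low 4 bits of the DNS header flags.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, query_id=0x1234, qtype=1):
    """Build a wire-format DNS query (qtype 1 = A record, class IN)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_rcode(response):
    """Return the textual return code of a raw DNS response."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, "RCODE%d" % code)


def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def probe(server, name, use_tcp=False, timeout=2.0):
    """Query one server; return the rcode, or 'TIMEOUT' / 'ERROR'."""
    query = build_query(name)
    try:
        if use_tcp:
            with socket.create_connection((server, 53), timeout=timeout) as s:
                # DNS over TCP prefixes each message with a 2-byte length.
                s.sendall(struct.pack("!H", len(query)) + query)
                (length,) = struct.unpack("!H", _recv_exact(s, 2))
                response = _recv_exact(s, length)
        else:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
                s.settimeout(timeout)
                s.sendto(query, (server, 53))
                response, _ = s.recvfrom(4096)
        return parse_rcode(response)
    except socket.timeout:
        return "TIMEOUT"
    except (OSError, ValueError):
        return "ERROR"


def probe_all(servers, name, use_tcp=False):
    """Probe several servers in turn; map each server to its outcome."""
    return {server: probe(server, name, use_tcp) for server in servers}
```

Calling `probe_all(["192.0.2.1", "192.0.2.2"], "example.com", use_tcp=True)` (placeholder addresses) would return one outcome per server. In a real monitor these outcomes would typically be counted per server, protocol and return code, and exported as metrics rather than returned as a dictionary.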

31 July 2023 · 3 min · 612 words · Clément Nussbaumer