Banking on Reliability: Cloud Native SRE Practices in Financial Services

This article is a written companion to my KubeCon EU 2026 talk of the same name. It covers four stories from five years of running a Kubernetes platform at PostFinance, a systemically important Swiss financial institution: SLOs as a reliability driver, open-source monitoring tools, continuous end-to-end testing, and an interactive debugging session tracking down rare 502 errors. The interactive visualizations below (hash ring, race condition sequence diagram) are ported from the Slidev presentation so you can explore them at your own pace. ...

26 March 2026 · 8 min · 1587 words · Clément Nussbaumer

HTTP 502 - Upstream errors with nginx

A little context: I work at PostFinance, where I take care of the Linux systems and the open-source Kubernetes platform we run to support all sorts of banking workloads. Aside from running the platform, we also handle support issues from our users (internal developers and colleagues), and this blog article covers one such issue, titled “Ingress gets HTTP 502 errors on high load”. You can take this article as an SRE exercise: I’ll provide the same data I received in the support issue, in the same order, and you should try to discover the actual issue as quickly as possible. Good luck ;) ...

17 February 2025 · 5 min · 898 words · Clément Nussbaumer