Kubenurse: The In-Cluster Doctor Making Network Rounds

TLDR: Kubenurse is the Swiss army knife for Kubernetes network monitoring. It will help you: pinpoint bottlenecks and know the latency in your network; identify nodes with network issues (packet drops, slow connection, etc.); uncover issues like DNS failures, broken sockets, or interrupted TLS negotiations. Description: Kubenurse is a Kubernetes network monitoring tool developed and open-sourced by PostFinance (a Swiss Banking Institution), which acts like an in-cluster doctor, continuously checking the health of your pod-to-pod, pod-to-service, and pod-to-ingress connections....

7 April 2024 · 6 min · 1101 words · Clément Nussbaumer

A Connected Farm, part 1 - Milking 🐄 🥛

Alongside my work as a System Engineer (with a focus on Kubernetes) at PostFinance, I’m married to a farmer in Switzerland, and live with her and her family on the family farm. This is quite different from my daily work, and I sometimes have the opportunity to help by, for example, feeding calves during milking, using my skills to install surveillance cameras, deploying a long-distance WiFi network across the farm, or modernizing the milking monitoring....

17 February 2024 · 3 min · 554 words · Clément Nussbaumer

Backing up MariaDB on Kubernetes

Hosting MariaDB on Kubernetes has so far proved to be quite a good experience: using the Bitnami Helm Chart to host a “standalone” instance of MariaDB (i.e. without replication, as replication already happens on the storage layer, and because simplicity is more valuable than a complex HA setup like Galera) worked out quite well. Being cautious, I had configured a daily backup to S3, using a tool found on GitHub, but when it came to restoring data dumped with this tool, which uses a pretty old mysqldump binary, I was stuck and couldn’t restore 😅 For some reason, the default config of the tool didn’t bother to escape quotes and other sensitive characters, and as a result I had to restore my daily velero backup of the MariaDB instance into another namespace, make a proper export from there, and finally restore my data....

27 December 2023 · 2 min · 306 words · Clément Nussbaumer

Minimal downtime when rebooting etcd nodes

Graceful leader changes

When you need to restart Kubernetes control-plane nodes on which etcd also happens to be running, you will prefer a graceful transfer of the etcd cluster leadership, to reduce the transition period that comes with a leader election. This can be achieved with the following script, provided you specify the adequate environment variables in the /etc/profile.d/etcd-all file.

    set -o pipefail && \
    source /etc/profile.d/etcd-all && \
    AM_LEADER=$(etcdctl endpoint status | grep $(hostname) | cut -d ',' -f 5 | tr -d ' ') && \
    if [[ $AM_LEADER = "true" ]]
    then
      NEW_LEADER=$(etcdctl endpoint status | grep -v $(hostname) | cut -d ',' -f 2 | tr -d ' ' | tail -n '-1') && \
      etcdctl move-leader $NEW_LEADER && sleep 15
    fi

Info: the following environment variables need to be set, for example through a file such as: /etc/profile....
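The leader-selection logic of the script above can be sketched as two small functions that read `etcdctl endpoint status` output from stdin, so the parsing can be tried without a live cluster. This is only a sketch: it assumes the same comma-separated layout the script relies on (field 2 is the member ID, field 5 is the "is leader" flag), the function names `is_leader` and `pick_new_leader` are hypothetical, and the sample status lines are invented for illustration.

```shell
# Hypothetical helpers; they assume the comma-separated layout the script
# above relies on: field 2 = member ID, field 5 = "is leader" flag.

# Succeeds if the status line mentioning host $1 has its leader flag set to "true".
is_leader() {
  flag=$(grep -F -- "$1" | cut -d ',' -f 5 | tr -d ' ')
  [ "$flag" = "true" ]
}

# Prints the member ID of the last endpoint whose line does not mention host $1.
pick_new_leader() {
  grep -v -F -- "$1" | cut -d ',' -f 2 | tr -d ' ' | tail -n 1
}

# Invented sample output, for illustration only (not taken from a real cluster).
sample_status='https://node-a:2379, 8211f1d0f64f3269, 3.5.9, 25 kB, true, false, 5, 120, 120,
https://node-b:2379, 91bc3c398fb3c146, 3.5.9, 25 kB, false, false, 5, 120, 120,
https://node-c:2379, fd422379fda50e48, 3.5.9, 25 kB, false, false, 5, 120, 120,'
```

With the sample data, `printf '%s\n' "$sample_status" | pick_new_leader node-a` prints `fd422379fda50e48`. On a real node, the output of `etcdctl endpoint status` would be piped in instead, and the resulting ID passed to `etcdctl move-leader` only when `is_leader "$(hostname)"` succeeds.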

7 July 2023 · 1 min · 154 words · Clément Nussbaumer

Kubernetes CNI — deconstructed

A few months ago, I had to understand in detail how the Container Network Interface (CNI) is implemented to, well, simply get a chaos testing solution working on a bare-metal installation of Kubernetes. At that time, I found a few resources that helped me understand how this was implemented, mainly Kubernetes’ official documentation on the topic, and the official CNI specification. And yes, this specification simply consists of a Markdown document, which took me a considerable amount of energy to digest and process....

29 March 2021 · 6 min · 1204 words · Clément Nussbaumer