TLDR: Kubenurse is the Swiss army knife for Kubernetes network monitoring. It will help you
- pinpoint bottlenecks and measure the latency in your network
- identify nodes with network issues (packet drops, slow connections, etc.)
- uncover issues like DNS failures, broken sockets, or interrupted TLS negotiations
Description
Kubenurse is a Kubernetes network monitoring tool developed and open-sourced by PostFinance (a Swiss Banking Institution), which acts like an in-cluster doctor, continuously checking the health of your pod-to-pod, pod-to-service, and pod-to-ingress connections.
It is a small Go application that runs as a DaemonSet on every node in your cluster, and which continuously performs requests against the following endpoints:
- the kubenurse ingress endpoint itself, typically `https://kubenurse.your-cluster-ingress.yourdomain.tld` → this endpoint tells us about the end-to-end latency and also permits detecting ingress controller problems
- the kubenurse service endpoint, i.e. `kubenurse.kubenurse.svc.cluster.local:8080` → monitoring the service helps appreciate in-cluster network latency
- the Kubernetes API server via its DNS name, `kubernetes.default.svc.cluster.local` → this endpoint captures both the K8s apiserver latency and the DNS resolution inside the cluster
- the Kubernetes API server via its direct endpoint, e.g. `10.127.0.1` → same as above, but bypassing DNS resolution; interesting and helpful in conjunction with the above to quickly identify DNS lookup errors/slowness
- neighbouring kubenurse pods, e.g. towards `node-02`, `node-03`, … → especially helpful in diagnosing a neighbour with an erratic network connection
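To make the check loop more concrete, here is a minimal, hypothetical sketch of what such periodic probing could look like. The target URLs are taken from the list above, the check names and timings are made up for illustration, and TLS verification is skipped for brevity; this is not the actual kubenurse implementation.

```go
// Hypothetical sketch of a periodic check loop, not the actual kubenurse code.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Example targets, matching the endpoint list above. The check names
	// ("ingress", "service", ...) are invented for this sketch.
	targets := map[string]string{
		"ingress":        "https://kubenurse.your-cluster-ingress.yourdomain.tld",
		"service":        "http://kubenurse.kubenurse.svc.cluster.local:8080",
		"api_server_dns": "https://kubernetes.default.svc.cluster.local",
		"api_server_ip":  "https://10.127.0.1",
	}

	client := &http.Client{
		Timeout: 5 * time.Second,
		// Sketch only: skip TLS verification; a real deployment would trust
		// the cluster CA instead.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}

	// Probe every target on a fixed interval and report latency or errors.
	for range time.Tick(5 * time.Second) {
		for checkType, url := range targets {
			start := time.Now()
			resp, err := client.Get(url)
			if err != nil {
				fmt.Printf("%s: error: %v\n", checkType, err)
				continue
			}
			resp.Body.Close()
			fmt.Printf("%s: HTTP %d in %v\n", checkType, resp.StatusCode, time.Since(start))
		}
	}
}
```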
It then collects error counters and detailed latency histograms, which can be
used for alerting and visualization. All the collected metrics are partitioned
with a `type` label, as can be seen in this excalidraw.com drawing which illustrates the different request types.
Metrics
For each request type, instrumentation functions around Go's http client record information such as the overall latency of the request, whether an error occurred during the request, and detailed timings (time for DNS lookup, time for TLS establishment, etc.) thanks to instrumentation with the Go `net/http/httptrace` package.
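As an illustration, here is a minimal, self-contained sketch of how `net/http/httptrace` can be used to time the individual phases of a request. The URL is a placeholder and this is a simplified example, not the actual kubenurse instrumentation code.

```go
// Simplified httptrace example: time DNS lookup, TCP connect, TLS handshake
// and the overall request duration. Not the kubenurse source.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	var dnsStart, connStart, tlsStart time.Time

	req, err := http.NewRequest(http.MethodGet, "https://kubenurse.yourdomain.tld/", nil)
	if err != nil {
		panic(err)
	}

	trace := &httptrace.ClientTrace{
		DNSStart: func(httptrace.DNSStartInfo) { dnsStart = time.Now() },
		DNSDone: func(httptrace.DNSDoneInfo) {
			fmt.Println("dns lookup:", time.Since(dnsStart))
		},
		ConnectStart: func(network, addr string) { connStart = time.Now() },
		ConnectDone: func(network, addr string, err error) {
			fmt.Println("tcp connect:", time.Since(connStart))
		},
		TLSHandshakeStart: func() { tlsStart = time.Now() },
		TLSHandshakeDone: func(tls.ConnectionState, error) {
			fmt.Println("tls handshake:", time.Since(tlsStart))
		},
	}

	start := time.Now()
	resp, err := http.DefaultClient.Do(req.WithContext(httptrace.WithClientTrace(req.Context(), trace)))
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("total request duration:", time.Since(start))
}
```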
All this data is then available at the `/metrics` endpoint, and the following metrics are exposed:
| metric name | labels | description |
| --- | --- | --- |
| `kubenurse_httpclient_request_duration_seconds` | `type` | latency histogram for request duration, partitioned by request type |
| `kubenurse_httpclient_trace_request_duration_seconds` | `type`, `event` | latency histogram for httpclient trace metric instrumentation, partitioned by request type and httptrace connection events |
| `kubenurse_httpclient_requests_total` | `type`, `code`, `method` | counter for the total number of http requests, partitioned by HTTP code, method, and request type |
| `kubenurse_errors_total` | `type`, `event` | error counter, partitioned by httptrace event and request type |
| `kubenurse_neighbourhood_incoming_checks` | n/a | gauge which reports how many unique neighbours have queried the current pod in the last minute |
For metrics partitioned with a `type` label, it is possible to know precisely which request type increased an error counter, or to compare the latencies of multiple request types, for example to see how your service and ingress latencies differ.
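To illustrate what such partitioning looks like on the instrumentation side, here is a small, hypothetical sketch using the Prometheus Go client. The metric name and label values are invented for the example; this is not the kubenurse source.

```go
// Hypothetical sketch: a latency histogram partitioned by a "type" label,
// similar in spirit to kubenurse_httpclient_request_duration_seconds.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	requestDuration := prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "demo_request_duration_seconds",
			Help: "request latency, partitioned by request type",
		},
		[]string{"type"},
	)
	prometheus.MustRegister(requestDuration)

	// Record an example 42 ms observation for a "path_node-02" request type.
	requestDuration.WithLabelValues("path_node-02").Observe(0.042)

	// Expose everything on /metrics.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```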
The `event` label takes values such as `dns_start`, `got_conn`, `tls_handshake_done`, and more. The detailed label values can be seen in the httptrace.go file.
Getting started
Installing Kubenurse is child's play with the provided Helm chart:
```sh
helm upgrade kubenurse --install \
  --repo=https://postfinance.github.io/kubenurse/ kubenurse \
  --set=ingress.url="kubenurse.yourdomain.tld"
```
Running that command should get you started, but you should still double-check the kubenurse logs to make sure that there are no errors.
For detailed configuration options, check the Helm parameters collapsible section of the README, or the environment variables part if you prefer to deploy with raw manifests.
Grafana
Once everything is running and metrics are properly collected, you can import the example Grafana dashboard to start visualizing the metrics.
Neighbourhood check
Note
This chapter is rather technical; you can skip to the conclusion if you are not interested in how hashing is used to randomly distribute the neighbourhood checks.
As documented above, kubenurse conducts a series of `path_<neighbour-node-xx>` checks against schedulable (i.e. non-cordoned) nodes, which makes it possible to quickly identify nodes with latency issues or connectivity problems.
Neighbourhood filtering
While the neighbourhood check is really useful, without filtering, the number of neighbourhood requests in a cluster with \( n \) nodes was growing as \( O(n^2) \), which rendered kubenurse impractical on large clusters, as documented in issue #55.
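Concretely, if each of the \( n \) nodes queries every other node on each check round, that amounts to roughly \( n \cdot (n - 1) \approx n^2 \) requests per round, e.g. about \( 100 \cdot 99 = 9900 \) requests for a 100-node cluster.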
To solve this issue, I recently implemented a node filtering feature, which works as follows:
- kubenurse computes the checksum of its own node name, `currentNodeHash`
- it then computes the `sha256` checksums of all the neighbours' node names, and for each neighbour it computes `h := otherNodeHash - currentNodeHash`
- it puts the subtracted hash `h` in a size-10 max-heap, thereby keeping only the next 10 nodes to query
If you want to take a look at the node filtering implementation, follow over here.
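For illustration, here is a condensed, hypothetical sketch of the idea, not the actual kubenurse code. It assumes the first 8 bytes of the sha256 digest are interpreted as an unsigned integer, that the subtraction wraps around on underflow, and it simply sorts the results instead of maintaining a bounded max-heap; the function names are made up for the example.

```go
// Hypothetical sketch of the neighbour-filtering idea (not the kubenurse source).
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// nodeHash interprets the first 8 bytes of the sha256 digest as a uint64.
func nodeHash(name string) uint64 {
	sum := sha256.Sum256([]byte(name))
	return binary.BigEndian.Uint64(sum[:8])
}

// neighboursToCheck returns the `limit` neighbours whose hash follows the
// current node's hash most closely; unsigned subtraction wraps around.
func neighboursToCheck(current string, nodes []string, limit int) []string {
	cur := nodeHash(current)

	type scored struct {
		name string
		h    uint64
	}
	var candidates []scored
	for _, n := range nodes {
		if n == current {
			continue
		}
		// h := otherNodeHash - currentNodeHash, wrapping on underflow.
		candidates = append(candidates, scored{name: n, h: nodeHash(n) - cur})
	}

	// The real implementation keeps the smallest values in a size-bounded
	// max-heap; sorting gives the same result for this small example.
	sort.Slice(candidates, func(i, j int) bool { return candidates[i].h < candidates[j].h })
	if len(candidates) > limit {
		candidates = candidates[:limit]
	}

	out := make([]string, 0, len(candidates))
	for _, c := range candidates {
		out = append(out, c.name)
	}
	return out
}

func main() {
	nodes := []string{"node-01", "node-02", "node-03", "node-04", "node-05", "node-06"}
	fmt.Println(neighboursToCheck("node-01", nodes, 3)) // the 3 neighbours node-01 would query
}
```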
To make it more visual, here is an example with 6 nodes, where each node queries the next 3 nodes (i.e. the limit is set to 3 here):
Thanks to this filtering, every node makes queries to at most 3 (10 by default, configurable) nodes in its neighbourhood, unless one of those nodes is cordoned or deleted, in which case the next node in the list is picked.
This filtering brings several benefits:
- because we first hash the node names, the checks are randomly distributed, independent of the node names. If we only picked the 10 next nodes in a sorted list of node names, we might bias the results in environments where node names are sequential
- metrics-wise, a kubenurse pod should typically only have histogram entries for ca. 10 other neighbouring nodes' worth of checks, which greatly reduces the load on your monitoring infrastructure
- because we use a deterministic algorithm to choose which nodes to query, the metrics churn rate stays minimal (on the contrary, if we randomly picked 10 nodes for every check, there would eventually be one Prometheus bucket for every node in the cluster, which would put useless load on the monitoring infrastructure)
By default, the neighbourhood filtering is set to 10 nodes, which means that on clusters with more than 10 nodes, each kubenurse pod will query exactly 10 neighbours, as described above.
Conclusion
Kubenurse is a lightweight, easy-to-use, and powerful Kubernetes networking monitoring tool that provides millisecond-level latency insights. By using Kubenurse, you can
- troubleshoot network issues faster by pinpointing problems like ingress errors or DNS issues.
- set meaningful alerts and SLOs for your ingress latency, the apiserver latency, the node-to-node latency, etc.
- quickly identify broken nodes with flappy network links thanks to neighborhood checks.
Finally, PRs and issues are welcome: feel free to contribute, or to ask if something is unclear or could be improved, and I'll be happy to work on it :)