One Kubernetes node has crashed. Multiple pods affected
Incident Report for DFDS IT

Resolved
This incident has been resolved.
Posted Mar 14, 2024 - 09:20 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Mar 13, 2024 - 16:40 UTC
Identified
The node is running out of memory due to a sharp increase in memory usage by grafana-agent. A rogue hangfire process in a dev container also seems to be contributing.

A new node has been spawned. The rogue container has been scaled down to establish whether both are root causes of the problem.
Posted Mar 13, 2024 - 15:26 UTC
This incident affected: Kubernetes critical components (Kubernetes [Hellman] - Capacity/Scheduling).