We switched our worker nodes from Ubuntu 20.04 to CentOS 8 (three m5a.xlarge EC2 instances on AWS), and the workers keep stalling, typically within an hour or two.
The documentation is, as expected, pretty hopeless when it comes to troubleshooting. I’m going to see if I can contribute to that.
As you can see, there are no suggestions for diagnosing the cause of stalling, and no ideas for mitigation: https://concourse-ci.org/concourse-worker.html#worker-heartbeating-and-stalling
Is anyone else running CentOS 8 on their worker nodes? If so, do you have stalling problems, or do anything special to prevent them?
Back story: we mainly use CentOS because our main development needs to be RHEL-compatible. We had been using Ubuntu 20.04 for our worker nodes because the kernel supported runc out of the box, but since an official CentOS 8 AMI for AWS came out recently, it seemed time to make the switch and run everything on CentOS, which makes our Puppet modules simpler.