Slow task execution

#1

Hello,

We’re experiencing slow task execution. Once a task starts executing it runs quickly enough, but beforehand it usually spends a good minute or so ‘pending’, whether or not the image has been cached.

For context, we’re running one web node and three worker nodes from the binaries, on EBS-backed AWS t2.large instances, pulling images from ECR and credentials from Parameter Store. There don’t appear to be any errors in the logs, and as far as we can see in the AWS console, utilisation is low (<10%). Today I tried increasing the user file limit, but it doesn’t seem to have made a difference.

Does anyone have any thoughts as to what the problem could be? Are there any recommended minimum requirements? I’ve not seen any in the documentation.

This is on Concourse version 4.2.1.

#2

Are you checking the logs? It sounds like it’s probably doing container cleanup.

#3

Yes, I’m scanning the logs. There is some cleanup activity going on, but it looks normal as far as I can tell. One thing I have noticed is that Concourse seems to be cycling through check containers. Is this normal? I’d have thought it would keep a container around while it’s checking a resource, rather than destroying and recreating it every minute.

#4

We finally had time to take another look at this, and in the end we solved it by switching to the btrfs filesystem. Task start-up time has dropped from around 1 to 2 minutes to 8 seconds.
