I am seeing a lot of "queue-full" errors in my logs

My users are starting to see a lot of 502 errors when attempting to log into Concourse. It is intermittent; some get in fine, others don’t. When I check my logs in Splunk, I see a lot of errors ending in queue-full. I can’t find any reference to these errors here or in the Discord channels. Here are a couple of examples:

io.kubernetes.container.name: concourse-prod-web
io.kubernetes.pod.name: concourse-prod-web-75f87dfcc9-npfrh
io.kubernetes.pod.namespace: concourse-prod

line: {"timestamp":"2019-07-18T20:42:05.653956680Z","level":"error","source":"atc","message":"atc.http-response-time.queue-full","data":{"session":"4259005"}}
source: stdout

and

io.kubernetes.container.name: concourse-prod-web
io.kubernetes.pod.name: concourse-prod-web-75f87dfcc9-npfrh
io.kubernetes.pod.namespace: concourse-prod

line: {"timestamp":"2019-07-18T20:42:05.649669800Z","level":"error","source":"atc","message":"atc.pipelines.scheduler.tick.lock.acquire.lock-acquired.queue-full","data":{"id":[2,30],"pipeline":"<redacted company specific info>","session":"16.13528.22181.1.1.1","team":"<redacted company specific info>"}}
source: stdout

Our environment is:
Concourse 5.2.0
Web nodes: 3

  • CPU: 2 cores
  • RAM: 4 GB

Worker nodes: 8

  • CPU: 6 cores
  • RAM: 60 GB
  • Storage: 600 GB

Thanks,
Matt

I’ve been seeing a lot of this myself. It looks like a fix just got merged (https://github.com/concourse/concourse/pull/3937) and is slated for 5.5. As far as I can tell, this log message only means that metric events aren’t being forwarded to your time-series database; nothing else is affected.
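For anyone wondering why the message shows up under load: the metrics pipeline sits behind a bounded queue, and emitting is a non-blocking send, so when the queue is full the event is simply dropped and the queue-full error is logged. Here’s a minimal Go sketch of that pattern (the names and types are illustrative only, not Concourse’s actual code):

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// event is an illustrative stand-in for a metric event, e.g. an
// http-response-time sample.
type event struct {
	name  string
	value float64
}

// emitter buffers events in front of whatever ships them to the
// time-series database. If the buffer fills faster than the forwarder
// drains it, new events are dropped and a "queue-full" error is logged;
// the request that produced the event is unaffected.
type emitter struct {
	queue chan event
}

func newEmitter(size int) *emitter {
	e := &emitter{queue: make(chan event, size)}
	go e.forward() // drain the queue in the background
	return e
}

// Emit never blocks the caller: if the queue is full, the event is
// dropped and an error is logged instead.
func (e *emitter) Emit(ev event) {
	select {
	case e.queue <- ev:
	default:
		log.Printf(`{"level":"error","source":"atc","message":"atc.%s.queue-full"}`, ev.name)
	}
}

// forward stands in for shipping events to InfluxDB, Prometheus, etc.
func (e *emitter) forward() {
	for ev := range e.queue {
		fmt.Printf("forwarded %s=%v\n", ev.name, ev.value)
	}
}

func main() {
	em := newEmitter(2) // deliberately tiny buffer to make drops visible
	for i := 0; i < 10; i++ {
		em.Emit(event{name: "http-response-time", value: float64(i)})
	}
	time.Sleep(100 * time.Millisecond) // let the forwarder drain what it can
}
```

The point of the non-blocking send is that request handling never stalls waiting on the metrics emitter; the trade-off is that samples get dropped when the queue backs up, which is exactly what the log line is reporting. That’s consistent with the note above that nothing besides metrics forwarding is affected.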