Debugging high memory usage of check containers

Hi all,
I’ve recently noticed that some check processes spawned from check containers on our Concourse workers are consuming nearly 2 GB of memory each, as measured by top. Several of these processes can run at the same time, each using ~2 GB, so resource checks alone can eat up several GB of memory. Some of our users’ jobs are now being terminated by the OOM killer, and I suspect that these check processes are to blame.

The problem is that I cannot find a way to identify which resources the check containers belong to so that I can debug this.

I know that there are some tickets concerning this issue (see below) that were closed because check containers will no longer exist in the near future. Regardless, can someone provide a work-around that will allow me to determine the pipeline and resource of a check container given its ID? Thanks a bunch!

Kyle

Links

I found a solution through the database. To get the pipeline name, resource name, and resource type, run the following query (tested with Concourse 5.6.0):

select pipelines.name, resources.name, resources.type
from pipelines
inner join resources
  on pipelines.id = resources.pipeline_id
inner join resource_config_check_sessions
  on resources.resource_config_id = resource_config_check_sessions.resource_config_id
inner join containers
  on resource_config_check_sessions.id = containers.resource_config_check_session_id
where containers.handle = :handle;

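:handle is a psql variable, so you can save the query as a script (pipeline.sql in the example below) and pass the container handle from the command line. The value must include the single quotes so that it is substituted as an SQL string literal. For example: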

$ psql -v handle="'8e342eab-6049-47d7-6926-3baf7dc6f63c'" -f pipeline.sql concoursedb
      name     |     name      |     type 
---------------+---------------+--------------
 pipeline-name | resource name | resource type
(1 row)
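
The nested quoting in the -v argument is easy to get wrong, so here is a minimal sketch of a wrapper that builds it. It assumes the query is saved as pipeline.sql and the database is called concoursedb, as in the example above, and that handles only ever contain lowercase hex digits and dashes (true of the handles shown here, but an assumption in general). The function names psql_handle_arg and lookup_check_container are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: turn a container handle into the name=value string
# expected by "psql -v", with the value wrapped in single quotes so that
# :handle is substituted as an SQL string literal.
psql_handle_arg() {
    # Assumption: handles consist only of lowercase hex digits and dashes.
    # Anything else is rejected so it cannot break out of the quoted literal.
    case "$1" in
        ''|*[!0-9a-f-]*)
            echo "invalid handle: $1" >&2
            return 1
            ;;
    esac
    printf "handle='%s'" "$1"
}

# Hypothetical wrapper around the psql call shown above.
lookup_check_container() {
    arg=$(psql_handle_arg "$1") || return 1
    psql -v "$arg" -f pipeline.sql concoursedb
}
```

After sourcing this, the lookup above becomes: lookup_check_container 8e342eab-6049-47d7-6926-3baf7dc6f63c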

To get the container handle from a PID as reported by top, use the command:

ps -axfo pid,uname,cmd

Then find the parent of the check process in the process tree. The container handle is the last argument on the parent process’s command line. In the following example, the container handle is 85c0cef8-cfca-4c49-7eb6-05105703225b:

11565 root          \_ /var/gdn/assets/linux/bin/dadoo -runc-root /run/runc exec /var/gdn/assets/linux/bin/runc /home/concourse/worker/depot/85c0cef8-cfca-4c49-7eb6-05105703225b/processes/d8bda513-e5c8-4767-4de1-979c9e82a599 85c0cef8-cfca-4c49-7eb6-05105703225b
11581 4294967+      |   \_ /opt/resource/check
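
If you'd rather not read the tree by eye, the same lookup can be scripted. This is only a sketch: it assumes the direct parent of the check process is the dadoo process, as in the output above, and that the handle is the last whitespace-separated field of its command line. The function names last_arg and handle_for_pid are made up for illustration.

```shell
#!/bin/sh

# Print the last whitespace-separated field of a command line string.
last_arg() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

# Hypothetical helper: given the PID of a check process (e.g. from top),
# print the last argument of its parent's command line, which in the
# example above is the container handle.
handle_for_pid() {
    # "-o ppid=" prints only the parent PID, without a header line.
    ppid=$(ps -o ppid= -p "$1" | tr -d ' ')
    [ -n "$ppid" ] || return 1
    # "-o args=" prints the parent's full command line, without a header.
    last_arg "$(ps -o args= -p "$ppid")"
}
```

For the example above, handle_for_pid 11581 should print 85c0cef8-cfca-4c49-7eb6-05105703225b, which you can then pass to the query.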