Behavior of metrics emitters with multiple ATCs


#1

Hi everyone,

We recently experimented with running two ATCs in our CI instead of one. The first thing we noticed after adding the second ATC was that the average rate of resource checks dropped by a factor of two. (This is the concourse_resource_checks_total Prometheus metric that the ATC emits.) The cause was that our ATCs sat behind a load balancer, and Prometheus, which we use for metrics collection, was configured to scrape their shared endpoint on our network. Each scrape was answered by only one of the two ATCs.

Our conclusion is that the resource-checks metric emitted by an ATC counts only the checks performed by that ATC; to get the total number of resource checks across the whole CI system, we need to collect metrics from both ATCs independently.
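To illustrate that conclusion, here is a minimal sketch (not Concourse tooling) that fetches the Prometheus text output of each ATC separately and sums one metric across them. The endpoint URLs are hypothetical placeholders, and the parser assumes samples carry no trailing timestamp.

```python
import urllib.request

METRIC = "concourse_resource_checks_total"

# Hypothetical per-ATC metrics endpoints: each ATC addressed directly,
# NOT through the shared load-balancer address.
ATC_ENDPOINTS = [
    "http://atc-1.example.internal:9391/metrics",
    "http://atc-2.example.internal:9391/metrics",
]


def parse_metric(text: str, name: str) -> float:
    """Sum every sample of `name` in a Prometheus text-format payload."""
    total = 0.0
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#") or not line.startswith(name):
            continue
        rest = line[len(name):]
        # Require a label block or a space next, so metrics that merely share
        # the prefix (e.g. name_other) are not counted.
        if not rest or rest[0] not in "{ ":
            continue
        # Assumes no trailing timestamp, so the value is the last field.
        total += float(line.rsplit(None, 1)[1])
    return total


def system_total(payloads: list[str]) -> float:
    """Add up the metric across the payloads of all ATCs."""
    return sum(parse_metric(p, METRIC) for p in payloads)


def fetch(url: str, timeout: float = 5.0) -> str:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(system_total([fetch(url) for url in ATC_ENDPOINTS]))
```

Pointing the same code at the load-balanced address instead would return only whichever ATC happened to answer, which matches the halved values described above.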

Question

Do all metrics work this way? For example, to get the concourse_workers_containers metric or the concourse_db_queries_total metric for the whole CI system, do we always need to collect them from all ATCs? Are there any metrics for which collection from only one ATC will equal the system-wide value?

We are using Concourse 4.2.1.

Thanks for the help!
kmd


#2

After some experimentation and a look at the source code, it appears to me that every metric emitted by the ATC must be scraped from each ATC individually to get a system-wide view. ATCs behind a load balancer therefore need separate endpoints exposed to whatever service scrapes the metrics.
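Once each ATC is scraped separately, rates still need care: every ATC's counter resets independently (for example on restart), so a system-wide rate should be computed per ATC and then summed, not derived from a single summed counter. Below is a minimal sketch of that idea, with hypothetical sample data; the reset handling mirrors how Prometheus's `rate()` treats a counter decrease as a reset.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def per_instance_rate(samples: List[Sample]) -> float:
    """Average per-second increase of one ATC's counter over its samples.

    A decrease between consecutive samples is treated as a counter reset,
    so the post-reset value is counted as the increase for that step.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def system_rate(series_by_atc: Dict[str, List[Sample]]) -> float:
    """Sum the per-ATC rates to get a system-wide rate."""
    return sum(per_instance_rate(s) for s in series_by_atc.values())


if __name__ == "__main__":
    # Hypothetical data: atc-2 restarts between t=30 and t=60.
    series = {
        "atc-1": [(0, 100), (60, 160)],
        "atc-2": [(0, 200), (30, 230), (60, 30)],
    }
    print(system_rate(series))
```

Here each ATC performs one check per second, so the system-wide rate is 2.0; seeing only one ATC's series, as happens behind a load balancer, would report half of that.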