Does anyone typically monitor / alert on ATC and TSA log content (maybe shipping them to ELK / Elastic Stack with, perhaps, Journalbeat or an equivalent)?
Or are emitted metrics usually sufficient to monitor Web and Worker health?
In my experience the metrics emitted by the various Concourse components have been sufficient. I’ve mostly seen clients use Prometheus, but I know the Concourse team themselves use Datadog - their dashboard is publicly available.
Hi @crsimmons. Thanks for the info re your experience.
Yes, we’ve set up metrics to emit and are currently pulling them into Prometheus. I was just curious, as the concourse-worker and concourse-web journals do seem to contain a lot of stuff - much of it heartbeat-type logging. Occasionally, when something unusual happens (i.e. something breaks), we look in those logs but have not often found anything too useful. I wondered… is it that there is useful information in there, but it’s too hard to find among all the other noise? (In which case shipping them somewhere else might help.) Or is there just nothing of use to the general operator - in which case we shouldn’t bother?
In the meantime while I ponder over that question, I’ll look more into the emitted logs. Thanks for the link to the public dashboard - that is useful and gives me a few ideas.
The logs can be useful in determining what has broken. In particular, when debugging something like workers intermittently stalling, cross-referencing the TSA logs on the web node with the worker logs can be helpful. Usually I see alerting set up based on metrics; then, if something doesn’t look right, operators can go dig through the logs. Sending the logs to something that is friendlier to search might be useful in this regard, but I wouldn’t base alerting on it.
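To illustrate the cross-referencing idea, here is a rough Python sketch that interleaves log lines from the web and worker journals by timestamp. It assumes each log line is a JSON object with `timestamp` and `message` keys, and that all timestamps share one format that sorts as text. Both are assumptions to check against your own logs, and `merge_logs` is a hypothetical helper, not part of Concourse.

```python
import json


def merge_logs(sources):
    """Interleave log lines from several components by timestamp.

    `sources` maps a label (e.g. "web", "worker") to an iterable of log lines.
    Returns a list of (timestamp, label, message) tuples in timestamp order.
    Lines that are not JSON objects with a "timestamp" key are skipped.
    """
    merged = []
    for label, lines in sources.items():
        for line in lines:
            try:
                record = json.loads(line)
            except ValueError:
                continue  # not JSON, e.g. a stray plain-text line
            if not isinstance(record, dict) or "timestamp" not in record:
                continue
            merged.append(
                (str(record["timestamp"]), label, record.get("message", ""))
            )
    # Assumption: timestamps share one format that sorts lexicographically.
    merged.sort(key=lambda entry: entry[0])
    return merged
```

You could feed it the lines exported from each journal and then scan the merged output around the time a worker stalled.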
Thanks again. Yeah, the raw logs are certainly not friendly!
Is that something you’ve done yourself? (i.e. sending the logs where they’re easier to search)
I think I’ve seen customers implement it before. IIRC the implementation depends on how Concourse was deployed. For example, with a BOSH-deployed Concourse you should be able to configure the syslog addon to ship all the component logs somewhere.
I don’t think there’s any built-in feature of Concourse that will send logs to something external. I believe the Concourse binary itself just dumps all logs onto stdout/stderr so you would just need to wrap it in something that captures the logs.
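As a rough idea of what such a wrapper could look like, here is a minimal Python sketch that runs a command, merges its stdout and stderr, and hands each line to a callback. The names `parse_line`, `run_and_capture` and `sink` are hypothetical, and it assumes log lines are either JSON objects or plain text; what the callback does with them (write to a file, forward to a shipper) is up to you.

```python
import json
import subprocess
import sys


def parse_line(line):
    """Return the line as a dict if it is a JSON object, else wrap the raw text."""
    try:
        record = json.loads(line)
        if isinstance(record, dict):
            return record
    except ValueError:
        pass
    return {"message": line.rstrip("\n")}


def run_and_capture(cmd, sink):
    """Run `cmd`, pass every output line to `sink`, and return the exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
    )
    for line in proc.stdout:
        sink(parse_line(line))
    return proc.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Re-emit every captured record as one JSON line on stdout.
    sys.exit(
        run_and_capture(
            sys.argv[1:], lambda rec: print(json.dumps(rec), flush=True)
        )
    )
```

Saved as, say, `wrap.py` (a made-up name), it would be started with the usual Concourse command and its arguments appended after the script name. If Concourse runs under systemd, the journal already captures this output, so a journal shipper may be simpler than a custom wrapper.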
Hi @crsimmons, yes that sounds like it. We don’t use BOSH, so would need to find an equivalent method. I’m going to see how we get on with the metrics first - that seems to be what most folks rely on from what I can see. If it turns out we’re still sifting through logs looking for information, then I’ll look further into it - but hopefully that won’t be required. We’ll see.
Thanks again for the info & advice on this. Much appreciated.