Resource script '/opt/resource/check []' failed: exit status 128

Hi all :slight_smile:

I am having trouble finding out why the following error message keeps coming back:

resource script '/opt/resource/check []' failed: exit status 128

stderr:
Identity added: /tmp/git-resource-private-key (fakeuser@fakedomain.com)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

If we click the retry button, the check usually succeeds right away. If we just wait for the system to retry, it also succeeds. But the errors keep coming back randomly across all pipelines and all checks, making the system unstable and filling up the web logs.

I checked this forum and Google for similar errors but could not find anything :frowning:.

This is the log line that I see a lot on the web server, from all different pipelines:

Jun 24 14:43:48 ip-10-228-84-197.pnls.local run_concourse.sh[23879]: {"timestamp":"2020-06-24T14:43:48.675714885Z","level":"info","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.check-failed","data":{"exit-status":128,"pipeline":"flow-something","resource":"some-git-resource","session":"17.9.4.1.144","team":"main"}}

I tried searching the internet for atc.pipelines.radar.scan-resource.interval-runner.tick.check-failed but could not find anything useful.

Maybe one of you can help us out, that would be amazing!

Kind regards,
Enri Peters

That’s very interesting, because I started having the same error. The difference is that mine seems to fail consistently. I use Amazon Secrets Manager to store secrets - could that have anything to do with it, i.e., newlines not being encoded correctly?
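
For what it’s worth, something like this should show whether the stored key ends with a proper newline (just a rough sketch; the secret id is a placeholder for whatever naming convention your Concourse team uses, and it assumes jq is installed):

# jq -j prints the raw value without adding its own newline,
# so the trailing \n (or its absence) shown by od belongs to the secret itself
aws secretsmanager get-secret-value --secret-id /concourse/main/git_private_key \
  | jq -j .SecretString | od -c | tail -n 3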

Thanks

Hi @space55!

I am using the AWS Systems Manager Parameter Store instead of the Secrets Manager to store secrets.

I had issues with secrets stored in the Parameter Store in the past, but those were API rate limit errors, not these messages.

I think this 128 error has something to do with Git returning an unexpected exit code, which could maybe happen because the key has disappeared, as you sort of suggest… but I am not sure yet.
I am looking for a way to find out. I have also read somewhere that unexpected behaviour can happen when containers are restarted, but I have no idea how to check that, or whether they even are restarted. I also don’t know where these checks run. Do they run on the workers, in containers, on volumes, or in a separate process? How do I check it? I could use fly hijack, but I need to know what to hijack.
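
The closest thing I have found so far is listing the containers with fly and then hijacking the check container from there (a rough sketch, assuming a fly target called ci and the pipeline/resource names from the log line above; on recent fly versions hijack is an alias for intercept):

# list all containers; resource checks show up with type "check"
fly -t ci containers | grep check

# jump into the check container of one specific resource
fly -t ci intercept --check flow-something/some-git-resource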

As a test, you could try using the Parameter Store to see if that solves it for you?

By the way, those API rate limit errors I got back then seem to be solved by these env vars:

export CONCOURSE_SECRET_CACHE_ENABLED=true
export CONCOURSE_SECRET_CACHE_DURATION=1440m
export CONCOURSE_SECRET_CACHE_DURATION_NOTFOUND=1440m

To give an idea of how many errors we get:

[root@ip-10-228-84-197 concourse]# uptime
 17:19:27 up  8:52,  1 user,  load average: 0.23, 0.42, 0.45
[root@ip-10-228-84-197 concourse]# journalctl -u concourse_service --no-pager | grep 'exit-status":128' | wc -l
8563

and that is only in 8 hours…
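
In case it helps, the same journal can also be grouped per resource to see whether some resources are hit more than others (a rough sketch; it assumes jq is installed and that the JSON part of each log line looks like the one above):

journalctl -u concourse_service --no-pager \
  | grep 'exit-status":128' \
  | grep -o '{.*}' \
  | jq -r '.data.pipeline + "/" + .data.resource' \
  | sort | uniq -c | sort -rn | head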

Thanks @skabba! I don’t think I’m hitting rate limit issues, as I can retrieve the secrets freely from the VM that the web node is running on, and I already have caching enabled (granted, I have a 60m cache duration set, plus 60s for not-found, but thank you for reminding me to increase the cache by a lot).

These checks are run in a container on the worker. I looked for changes that would have broken it in a Concourse release in the past month, and I couldn’t find anything.

If there is anyone with the same issue (and solved it) please let us know!

There are actually three scripts for a resource in Concourse: /check, /in, and /out. What you are receiving is an indication that something is missing during authentication.

Can you check your resource and keys again?
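
For example, you could hijack the failing resource’s check container and run the check script by hand, to see whether authentication works from inside it (a rough sketch; the repo URI is a placeholder, and for a private repo you would also need to put the private_key into source, with its newlines escaped for JSON):

# inside the hijacked check container; /check reads a JSON payload on stdin
echo '{"source": {"uri": "git@github.com:your-org/your-repo.git", "branch": "master"}}' \
  | /opt/resource/check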

Hi @vinodpandey!

Thanks for your reply to this thread!

I will have a look at this, but the strange thing is that if we click the retry button, the check usually succeeds right away. If we just wait for the system to retry, it also succeeds. But the errors keep coming back randomly across all pipelines and all checks.

Kind regards,
Enri