Workers painfully slow to start


#1

Good evening,

I deployed Concourse on Kubernetes using the Helm chart like this:

helm install stable/concourse --name concourse --namespace concourse
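
I am checking the worker logs with roughly this (the pod name comes from the chart's worker StatefulSet, so it may differ in your setup):

kubectl logs concourse-worker-0 --namespace concourse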

and this is what I get:

{"timestamp":"1537749073.369020224","source":"worker","message":"worker.setup.unpacking","log_level":1,"data":{"session":"1"}}
~34sec
{"timestamp":"1537749107.401631832","source":"worker","message":"worker.setup.done","log_level":1,"data":{"session":"1"}}
~0sec
{"timestamp":"1537749107.406176567","source":"worker","message":"worker.garden.extract-resources.extract.extracting","log_level":1,"data":{"resource-type":"bosh-deployment","session":"2.1.1"}}
~835sec
{"timestamp":"1537749942.317373276","source":"worker","message":"worker.garden.extract-resources.extract.extracting","log_level":1,"data":{"resource-type":"bosh-io-release","session":"2.1.2"}}
~408sec
{"timestamp":"1537750350.337106943","source":"worker","message":"worker.garden.extract-resources.extract.extracting","log_level":1,"data":{"resource-type":"bosh-io-stemcell","session":"2.1.3"}}
~562sec
{"timestamp":"1537750913.240059853","source":"worker","message":"worker.garden.extract-resources.extract.extracting","log_level":1,"data":{"resource-type":"cf","session":"2.1.4"}}
~562sec
{"timestamp":"1537751475.658858776","source":"worker","message":"worker.garden.extract-resources.extract.extracting","log_level":1,"data":{"resource-type":"docker-image","session":"2.1.5"}}

I have been looking at the elapsed seconds between each step (annotated above), and this is a big blocker for me at the moment. Can you let me know whether this is expected, or whether there are any tunables that would improve the situation?

Regards,

Marc


#2

That’s…really slow. Those images aren’t even that large, and it’s not like anything super complicated is happening during the extraction. It literally just shells out to tar: https://github.com/concourse/bin/blob/master/cmd/concourse/worker_linux.go#L214
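
In other words, for each bundled resource type it runs more or less the equivalent of:

tar -zxf <work-dir>/<version>/assets/resources/<resource-type>/rootfs.tar.gz -C <work-dir>/<version>/assets/resource-images/<resource-type>/rootfs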

On the plus side, we’ll be removing this step in the next release, and instead the image will just contain the resource types pre-extracted. It should be a lot quicker that way. So if you don’t end up figuring this out, it’ll at least be “fixed” in the next release. But you might want to figure out why your disk is so slow to write to either way, since it’ll probably be slow for builds, too.
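
If you want a quick sanity check of the write path, a dd run against the work dir should give a rough number (just a sketch; it assumes GNU dd is available in the worker image, and conv=fdatasync makes it wait for the data to actually reach the volume):

dd if=/dev/zero of=/concourse-work-dir/ddtest bs=1M count=256 conv=fdatasync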


#3

Hi Vito,

Thanks for the reply!

Yes, the current state makes it quite difficult to use. To be sure it wasn't a PVC issue on k8s, I ran this quick test this morning:

root@concourse-worker-0:/concourse-work-dir/3.14.1/assets/resources# dd if=/dev/zero of=test bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 2.74544 s, 97.8 MB/s

That seems reasonably fast given the size of the files that need to be extracted, so I am even more confused now. The output of ps -ef f does confirm what you said:

root         46      1  1 15:26 ?        Sl     0:09 concourse worker --name=concourse-worker-1
root       1045     46  0 15:39 ?        D      0:00  \_ /concourse-work-dir/3.14.1/assets/bin/tar -zxf /concourse-work-dir/3.14.1/assets/resources/bosh-io-release/rootfs.tar.gz -C /concourse-work-dir/3.14.1/assets/resource-images/bosh-io-release/rootfs
root       1046   1045  0 15:39 ?        S      0:00      \_ gzip -d

and

root         46      1  8 15:26 ?        Sl     0:09 concourse worker --name=concourse-worker-1
root        307     46  0 15:26 ?        D      0:00  \_ /concourse-work-dir/3.14.1/assets/bin/tar -zxf /concourse-work-dir/3.14.1/assets/resources/bosh-deployment/rootfs.tar.gz -C /concourse-work-dir/3.14.1/assets/resource-images/bosh-deployment/rootfs
root        308    307  0 15:26 ?        S      0:00      \_ gzip -d

In both cases the tar process is sitting in D state (uninterruptible sleep, which usually means it is blocked on I/O), so it does look storage-related. I will run a couple more tests and report back.


#4

OK, it seems to be related to local vs PVC (usually remote) storage and the number of files in each archive…

Local storage:

root@concourse-worker-1:/# time /concourse-work-dir/3.14.1/assets/bin/tar -zxf /concourse-work-dir/3.14.1/assets/resources/bosh-io-release/rootfs.tar.gz -C /tmp/test

real	0m0.648s
user	0m0.350s
sys	0m0.336s
root@concourse-worker-1:/# find /tmp/test | wc -l
3856

PVC (remote) storage:

root@concourse-worker-1:/# time /concourse-work-dir/3.14.1/assets/bin/tar -zxf /concourse-work-dir/3.14.1/assets/resources/bosh-io-release/rootfs.tar.gz -C /concourse-work-dir/3.14.1/assets/resource-images/bosh-io-release/rootfs

real	5m25.938s
user	0m0.379s
sys	0m1.807s
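
The dd test above only measures sequential throughput on a single file, while this archive unpacks to 3856 small files, so I suspect the real cost is per-file metadata/sync latency on the remote volume rather than raw bandwidth. A rough way to isolate that (just a sketch, file count picked arbitrarily) would be to time a loop of small writes on the PVC and compare it against local /tmp:

root@concourse-worker-1:/# mkdir -p /concourse-work-dir/smallfile-test
root@concourse-worker-1:/# time sh -c 'for i in $(seq 1 1000); do echo x > /concourse-work-dir/smallfile-test/f$i; done'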

Is there a reason for using a PVC if it gets wiped at each restart, though? (I saw rm -rf ${CONCOURSE_WORK_DIR:-/concourse-work-dir}/*)

Regards,

Marc