Need help switching from btrfs to overlay in Concourse bosh deployment

#1

Hi,

Could you please tell me exactly how I can change the default btrfs file system to overlay in a Concourse BOSH deployment? I have searched for details but could not find a description of the exact procedure. I can see that people have done this in Concourse, but they don’t describe the steps they took to make the switch. What we have done so far is the following:

  1. We have set the storage driver property of baggageclaim to the value “overlay”.
  2. We have redeployed the Concourse BOSH deployment.
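For readers following along, the two steps above could look roughly like the sketch below. It only prints a BOSH ops file; it does not touch a director. The instance group name `worker`, the job name `worker`, and the property path `baggageclaim.driver` are assumptions that are not stated in this thread, so check them against the job spec of the release version you deploy.

```shell
#!/usr/bin/env bash
set -eu

# Assumed names, not taken from the thread: adjust to your manifest.
WORKER_GROUP="${WORKER_GROUP:-worker}"   # instance group that holds the workers
WORKER_JOB="${WORKER_JOB:-worker}"       # job that carries the baggageclaim property

# Print an ops file that sets the baggageclaim storage driver to "overlay".
# The property name (baggageclaim.driver) is an assumption; verify it in the job spec.
ops_file() {
  cat <<EOF
- type: replace
  path: /instance_groups/name=${WORKER_GROUP}/jobs/name=${WORKER_JOB}/properties/baggageclaim?/driver
  value: overlay
EOF
}

ops_file
```

Redirect the output to a file (for example `use-overlay-driver.yml`, a hypothetical name) and pass it to the deploy with `-o`, e.g. `bosh -d concourse deploy concourse.yml -o use-overlay-driver.yml`, where the deployment and manifest names are placeholders as well.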

However, after the redeployment the Concourse worker nodes still use btrfs as the file system for baggageclaim.

Note: Currently our VM Update Strategy is changed from the default delete-create to the new create-swap-delete vm replacement strategy as described here https://bosh.io/docs/changing-deployment-vm-strategy/
I am not sure whether this strategy affects the storage, or whether it will delete the old volumes and create new ones with the right file system, so we had the idea of switching back to the old VM replacement strategy (delete-create) and redeploying the Concourse BOSH deployment again.

Can you please help us to switch from btrfs to overlay? What are we missing?

Best Regards,
Stoyko Dimitrov


#3

I have found how to switch baggageclaim’s file system from btrfs to overlay. After the re-deployment of Concourse you have to recreate all of the workers - then the baggageclaim file system will switch from btrfs to overlay. Therefore the issue can be closed now. Thanks anyways!

Cheers!


#4

Hi,

Before we close this issue I have some additional findings that I would like to share with the Community and to ask if someone has found a solution for them. Before that let me say that we use Concourse version 4.2.1.

  • The first finding is that after changing the configuration of the Concourse BOSH deployment from btrfs to overlay and redeploying, you have to recreate all of the workers at least twice. I still have not figured out the exact number of recreations required for the switch to actually happen, or whether the order in which you recreate the worker nodes matters (that is, whether you have to recreate each node once and then repeat, or recreate each node twice before moving on to the next node).
  • The second finding is that when you switch back from overlay to btrfs you once again must go through the recreation procedure. Unfortunately, in this situation the procedure becomes even more unclear. When you recreate the first worker node twice, Concourse registers it in ATC and workload starts being distributed to it. However, the same is not true for the other worker nodes. Even after the third recreation they are still not registered in ATC and no workload is distributed to them. The command ‘fly workers’ shows only the first worker node as registered.
  • The third finding is that after the redeployment of Concourse with the settings for the new overlay file system all of the worker nodes become unusable - they are not being registered in ATC and therefore the command ‘fly workers’ shows no registered worker nodes. So this means downtime for Concourse.
  • The fourth finding is that after you redeploy Concourse and end up with unusable workers, you don’t really need to wait for the drain during the following worker recreations, because the workers have nothing to drain. Even so, the draining takes forever and usually ends with an error during recreation of the worker due to a drain timeout. You can just use the --skip-drain parameter of the bosh recreate command in this situation.
  • The fifth finding is that the required redeployment of Concourse causes the current builds to fail with an error (check with the ‘fly builds’ command), and the Concourse pipeline shows the messages “no workers” followed by “failed to create volume”. The draining strategy is therefore irrelevant in this case as well, although the draining again takes forever. It seems reasonable to softly stop (with the drain running) all of the Concourse workers before you start the required redeployment of Concourse (probably with the --skip-drain parameter for the ‘bosh deploy’ command) in order to switch the file system used by baggageclaim from btrfs to overlay.
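The sequence suggested in the last two findings can be sketched as a small dry-run script. It prints the commands by default and only executes them when `DRY_RUN=0`. The deployment name `concourse`, the instance group `worker`, the manifest `concourse.yml`, the fly target `ci`, and the `run` helper are all placeholders introduced for illustration. The final `bosh start` is an assumption too: it is there in case the instances remain in the stopped state after the recreate.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder names, not taken from the thread: adjust to your environment.
DEPLOYMENT="${DEPLOYMENT:-concourse}"
WORKER_GROUP="${WORKER_GROUP:-worker}"
MANIFEST="${MANIFEST:-concourse.yml}"
FLY_TARGET="${FLY_TARGET:-ci}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

switch_driver() {
  # 1. Stop the workers with the drain enabled so running builds can finish.
  run bosh -n -d "$DEPLOYMENT" stop "$WORKER_GROUP"
  # 2. Redeploy with the changed storage driver; nothing is left to drain.
  run bosh -n -d "$DEPLOYMENT" deploy "$MANIFEST" --skip-drain
  # 3. Recreate the workers; unregistered workers have nothing to drain either.
  run bosh -n -d "$DEPLOYMENT" recreate "$WORKER_GROUP" --skip-drain
  # 4. Assumption: start the workers in case they are still marked as stopped.
  run bosh -n -d "$DEPLOYMENT" start "$WORKER_GROUP"
  # 5. Check which workers have registered with ATC.
  run fly -t "$FLY_TARGET" workers
}

switch_driver
```

According to the first finding, step 3 may have to be repeated before the workers show up in ‘fly workers’, so treat the script as a starting point rather than a verified procedure.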

Has anyone found a solution for the above findings? Also, can someone comment on the state of these findings in the latest version of Concourse, 5.0.1?

Best Regards,
Stoyko Dimitrov


#5

Regarding the findings described in my previous post, I have found out that the create-swap-delete BOSH VM replacement strategy is causing them. When I switch back to the default, delete-create, I do not experience the issues mentioned above. Therefore I have created another issue, “Inconsistent behaviour of ATC with create-swap-delete vm-replacement strategy with Concourse v4.2.1”, regarding the problem with the VM replacement strategy.
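Switching the strategy back can be done with a small ops file. The sketch below prints one. It assumes the strategy is set in the deployment-wide `update` block of the manifest, as described on the bosh.io page linked in the first post. The function name `vm_strategy_ops` is made up for this example.

```shell
#!/usr/bin/env bash
set -eu

# Print an ops file that sets the VM replacement strategy in the update block.
# Only the two values discussed in this thread are accepted.
vm_strategy_ops() {
  strategy="${1:-delete-create}"
  case "$strategy" in
    delete-create|create-swap-delete) ;;
    *) echo "unknown vm_strategy: $strategy" >&2; return 1 ;;
  esac
  cat <<EOF
- type: replace
  path: /update/vm_strategy?
  value: ${strategy}
EOF
}

vm_strategy_ops delete-create
```

Save the output to a file and pass it with `-o` on the next `bosh deploy`. If your manifest sets the strategy per instance group instead, the path has to point at that instance group’s `update` block.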
