V4.2.1 - 'too many open files'


#1

Deployment Details
v4.2.1
Bosh deployed
9 complex pipelines
hundreds of jobs, many triggering every few minutes
hundreds of resources
12 workers

Notes
We’ve been experiencing issues when using v4.2.1 which we did not encounter on v3.14.1. Same scale, same bosh deployment mechanism, same network. Different stemcell tho. bosh-vsphere-esxi-ubuntu-xenial-go_agent/97.18 for 4.2.1, bosh-vsphere-esxi-ubuntu-trusty-go_agent/3586.40 for 3.14.1.

Basically the webs go off the grid and the workers begin to stall. We get ‘too many open files’ in the logs.

We managed to get into a worker which was suffering and were able to get more info - There were nearly half a million sockets in a CLOSE_WAIT state which were owned by the ‘worker’ process. The worker process was refusing to die/restart because it had baggageclaim as a child process. baggageclaim could not be killed because it was waiting on a btrfs garbage collection process. garbage collection could not complete because it was waiting for btrfs filesystem to respond.

btrfs filesystem was dead… So dead that you could not even ‘ls’ any of the folders on it.

Checking into the kernel logs with dmesg we saw that there had been btrfs related GPFs and stack traces dumped out.

[1925543.779252] general protection fault: 0000 [#1] SMP PTI
[1925543.780347] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype nf_log_ipv4 nf_log_common xt_LOG ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment veth bridge stp llc iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables xfs libcrc32c btrfs zstd_compress xor raid6_pq isofs sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel vmw_balloon aes_x86_64 ppdev crypto_simd glue_helper cryptd joydev input_leds intel_rapl_perf serio_raw parport_pc parport shpchp i2c_piix4 mac_hid vmw_vsock_vmci_transport vsock vmw_vmci autofs4 vmwgfx ttm mptspi drm_kms_helper mptscsih syscopyarea sysfillrect sysimgblt psmouse fb_sys_fops mptbase drm vmxnet3 scsi_transport_spi floppy
[1925543.792587]  pata_acpi
[1925543.793132] CPU: 2 PID: 3923976 Comm: baggageclaim Not tainted 4.15.0-34-generic #37~16.04.1-Ubuntu
[1925543.794799] Hardware name: it's-a-secret
[1925543.796779] RIP: 0010:prefetch_freepointer+0x15/0x30
[1925543.797723] RSP: 0018:ffffb19e03493ca8 EFLAGS: 00010286
[1925543.798696] RAX: 0000000000000000 RBX: b43c7c338b75d7a3 RCX: 0000000009f5ba8f
[1925543.799967] RDX: 0000000009f5ba8e RSI: b43c7c338b75d7a3 RDI: ffff9398296fc600
[1925543.801239] RBP: ffffb19e03493ca8 R08: ffffd19dffd02d90 R09: ffff93955e821488
[1925543.802507] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000001408040
[1925543.803778] R13: ffff9396894d2818 R14: ffff9398296fc600 R15: ffff9398296fc600
[1925543.805051] FS:  00007f0654ff9700(0000) GS:ffff93983fd00000(0000) knlGS:0000000000000000
[1925543.806466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1925543.807515] CR2: 000000c0001a3000 CR3: 00000004295b2004 CR4: 00000000000606e0
[1925543.808819] Call Trace:
[1925543.809374]  kmem_cache_alloc+0xa2/0x1c0
[1925543.810267]  ? btrfs_get_or_create_delayed_node+0x5b/0x1a0 [btrfs]
[1925543.811395]  btrfs_get_or_create_delayed_node+0x5b/0x1a0 [btrfs]
[1925543.812500]  btrfs_delayed_update_inode+0x2c/0x2b0 [btrfs]
[1925543.813536]  ? getnstimeofday64+0xe/0x20
[1925543.814315]  ? btrfs_update_root_times+0x40/0x90 [btrfs]
[1925543.816245]  btrfs_update_inode+0x8f/0x110 [btrfs]
[1925543.817559]  btrfs_dirty_inode+0x5b/0xd0 [btrfs]
[1925543.818849]  ? current_time+0x32/0x70
[1925543.819981]  btrfs_update_time+0x81/0xc0 [btrfs]
[1925543.821241]  touch_atime+0xb2/0xe0
[1925543.822308]  iterate_dir+0x12d/0x1a0
[1925543.823390]  SyS_getdents64+0x9c/0x120
[1925543.824488]  ? iterate_dir+0x1a0/0x1a0
[1925543.825586]  do_syscall_64+0x73/0x130
[1925543.826643]  ? do_syscall_64+0x73/0x130
[1925543.827733]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[1925543.828988] RIP: 0033:0x47c720
[1925543.829919] RSP: 002b:000000c00067cfe8 EFLAGS: 00000212 ORIG_RAX: 00000000000000d9
[1925543.831543] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c720
[1925543.833093] RDX: 0000000000002000 RSI: 000000c000408000 RDI: 0000000000000003
[1925543.834644] RBP: 000000c00067d038 R08: 0000000000000000 R09: 0000000000000000
[1925543.836187] R10: 0000000000000000 R11: 0000000000000212 R12: 00000000000000e0
[1925543.837713] R13: 0000000000000075 R14: 0000000000864f31 R15: 0000000000000038
[1925543.839225] Code: c7 e0 cb ae 95 e8 dc b1 ea ff eb 93 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 85 f6 48 89 e5 74 14 48 63 47 20 48 01 c6 <48> 33 36 48 33 b7 40 01 00 00 0f 18 0e 5d c3 66 90 66 2e 0f 1f
[1925543.843307] RIP: prefetch_freepointer+0x15/0x30 RSP: ffffb19e03493ca8
[1925543.844767] ---[ end trace ceff278338a9a0c1 ]---
[1925543.944765] general protection fault: 0000 [#2] SMP PTI
[1925543.946107] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype nf_log_ipv4 nf_log_common xt_LOG ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment veth bridge stp llc iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables xfs libcrc32c btrfs zstd_compress xor raid6_pq isofs sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel vmw_balloon aes_x86_64 ppdev crypto_simd glue_helper cryptd joydev input_leds intel_rapl_perf serio_raw parport_pc parport shpchp i2c_piix4 mac_hid vmw_vsock_vmci_transport vsock vmw_vmci autofs4 vmwgfx ttm mptspi drm_kms_helper mptscsih syscopyarea sysfillrect sysimgblt psmouse fb_sys_fops mptbase drm vmxnet3 scsi_transport_spi floppy
[1925543.960967]  pata_acpi
[1925543.961824] CPU: 2 PID: 676611 Comm: exe Tainted: G      D          4.15.0-34-generic #37~16.04.1-Ubuntu
[1925543.964211] Hardware name: it's-a-secret
[1925543.966836] RIP: 0010:kmem_cache_alloc+0x81/0x1c0
[1925543.968084] RSP: 0018:ffffb19e038d3cd0 EFLAGS: 00010286
[1925543.969424] RAX: b43c7c338b75d7a3 RBX: b43c7c338b75d7a3 RCX: 0000000009f5ba90
[1925543.971125] RDX: 0000000009f5ba8f RSI: 0000000001408040 RDI: 00003e05c0002d90
[1925543.972768] RBP: ffffb19e038d3d00 R08: ffffd19dffd02d90 R09: ffff9394b07feb58
[1925543.974414] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000001408040
[1925543.976062] R13: b43c7c338b75d7a3 R14: ffff9398296fc600 R15: ffff9398296fc600
[1925543.977720] FS:  00007f99fa5a2700(0000) GS:ffff93983fd00000(0000) knlGS:0000000000000000
[1925543.979532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1925543.981000] CR2: 000000c42032a000 CR3: 000000013553e001 CR4: 00000000000606e0
[1925543.982713] Call Trace:
[1925543.983693]  ? btrfs_get_or_create_delayed_node+0x5b/0x1a0 [btrfs]
[1925543.985233]  btrfs_get_or_create_delayed_node+0x5b/0x1a0 [btrfs]
[1925543.986751]  btrfs_delayed_delete_inode_ref+0x30/0x90 [btrfs]
[1925543.988226]  __btrfs_unlink_inode+0x163/0x530 [btrfs]
[1925543.989602]  btrfs_unlink_inode+0x1c/0x50 [btrfs]
[1925543.990932]  btrfs_unlink+0x84/0xd0 [btrfs]
[1925543.992157]  vfs_unlink+0xfe/0x1b0
[1925543.993257]  ? apparmor_path_unlink+0x1b/0x20
[1925543.994482]  do_unlinkat+0x26c/0x2d0
[1925543.995579]  SyS_unlinkat+0x33/0x50
[1925543.996655]  do_syscall_64+0x73/0x130
[1925543.997757]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[1925543.999072] RIP: 0033:0x8ac3a0
[1925544.000061] RSP: 002b:000000c4202aad80 EFLAGS: 00000206 ORIG_RAX: 0000000000000107
[1925544.001753] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000008ac3a0
[1925544.003365] RDX: 0000000000000000 RSI: 000000c4203264f0 RDI: ffffffffffffff9c
[1925544.004966] RBP: 000000c4202aade0 R08: 0000000000000000 R09: 0000000000000000
[1925544.006557] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000050
[1925544.008138] R13: 000000000000004f R14: 0000000000000200 R15: 0000000000000018
[1925544.009702] Code: 7b 3c 6b 49 83 78 10 00 4d 8b 28 0f 84 05 01 00 00 4d 85 ed 0f 84 fc 00 00 00 49 63 5f 20 48 8d 4a 01 49 8b 3f 4c 89 e8 4c 01 eb <48> 33 1b 49 33 9f 40 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74
[1925544.013932] RIP: kmem_cache_alloc+0x81/0x1c0 RSP: ffffb19e038d3cd0
[1925544.015415] ---[ end trace ceff278338a9a0c2 ]---

#2

I actually forgot to pose a question.

Has anyone else come across this? I’m not sure that our environment is particularly special but we do run at a certain scale and frequency that may be higher than most


#3

We’ve seen this with a few clients recently. There’s more details in this GitHub issue.

You can temporarily resolve it by increasing the ulimit of the ATC process to something higher (like 4096) like this but unfortunately BOSH will undo that if it recreates the vm for any reason.


#4

Thanks for the suggestion Colin - Unfortunately we’ve already tried that. Bumping the ulimit up to 4096 helped keep things running for a little bit longer (8 hours instead of 2 hours), but as btrfs is freezing up behind the scenes then the file handles can’t be released so the worker still falls over in the end.


#5

That’s unfortunate. You could also try using overlay as the baggageclaim driver instead of btrfs. The BOSH option is here.


#6

We’re now up to 48 hours running without any problems using the overlay driver instead. Will report back sometime next week, but right now that’s looking like a good fix for us. Thanks for the pointer crsimmons.