Disable ATS Heartbeat?

Before the winter break, one of my customers ran into storage disconnect issues with the ESXi hosts in a particular cluster connected to a specific shared storage array. I will keep this high level and leave out the specifics, as they do not pertain to the overall article.

During the troubleshooting process, it was recommended at one point to disable VAAI ATS heartbeats on the ESXi hosts, as the storage vendor thought this might be a contributing factor to the underlying issue. The timing of a recent upgrade to vSphere 5.5 U2 led them down this path. The change is low impact to the ESXi hosts, is made from the command line, and if it were the issue you would see a reduction of storage disconnects in the logs. In this case, the change was made during a green-zone and did not resolve the issue.

While this was found NOT to be the cause of the issue, I thought I would share some valuable information that is already well documented on VMware websites and VMware employee blogs.

So what changed?

What changed is the way ESXi hosts communicate with the storage array to maintain the VMFS heartbeat. Before this update, the ESXi host used SCSI reads and writes for validation; now this process is offloaded to the storage system. ATS is enabled on VMFS5 by default. It is disabled for VMFS3. Reference https://kb.vmware.com/kb/2113956 for more information.

Why the change?

According to a blog article on the topic, http://blogs.vmware.com/vsphere/2012/05/vmfs-locking-uncovered.html, the ‘critical section functionality’ is expanded, giving your hosts more reliable and consistent results when VMFS metadata is updated. Basically, it is a better way of ensuring VMFS is ready to commit changes to the data. Give Cormac’s article a read; it goes into much more detail.

Should I disable ATS in my environment?

No. I would recommend leaving this feature on unless support directs you to disable it.

If you see an issue with storage disconnects that is directly related to this feature being enabled, and your storage vendor is recommending that you disable it, reference the aforementioned KB article. As I stated previously, disabling it is not that intrusive and should confirm in short order whether this is the issue.
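For reference, here is a minimal sketch of what the command-line change looks like. It only builds and prints the esxcli commands; it does not run them. The option path `/VMFS3/UseATSForHBOnVMFS5` is what I understand the KB article to use, so treat it as an assumption and check it against the KB for your ESXi build. The helper names `show_command` and `set_command` are made up for illustration.

```python
# Sketch only: builds (does not run) the esxcli commands for the ATS heartbeat setting.
# ASSUMPTION: the option path below is the one given in VMware KB 2113956;
# verify it against the KB for your ESXi build before using it.
ATS_HB_OPTION = "/VMFS3/UseATSForHBOnVMFS5"


def show_command():
    """Return the argv that lists the current value of the option."""
    return ["esxcli", "system", "settings", "advanced", "list",
            "-o", ATS_HB_OPTION]


def set_command(enabled):
    """Return the argv that sets the option: 1 = ATS heartbeat on, 0 = off."""
    value = "1" if enabled else "0"
    return ["esxcli", "system", "settings", "advanced", "set",
            "-i", value, "-o", ATS_HB_OPTION]


if __name__ == "__main__":
    print(" ".join(show_command()))               # check the current value
    print(" ".join(set_command(enabled=False)))   # disable ATS heartbeat
    print(" ".join(set_command(enabled=True)))    # revert to the default
```

The printed commands would be run on each ESXi host in the affected cluster. Keeping the revert command next to the disable command makes it easy to put the default back once the test is over.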

If your storage vendor is recommending this change, you may want to confirm with them that there are no updates available for your storage array that address this issue. There are a few out there.