
Thursday, September 16, 2021

Read IO spike every 5 minutes for 7.0U1 (84220)

 

Symptoms
  • The storage array shows a read I/O spike every 5 minutes, together with high CPU utilization.
  • Other storage operations, such as deduplication and compression, can be delayed or stalled.
  • In our case it was a large environment (200-300 hosts) connected to a Pure Storage array.
Purpose
This article explains the cause and provides a workaround or fix.
Cause

The following changes were made in 7.0U1:

In hostd, an API call is now made every 5 minutes.
In VMFS, a new lighter API was added to get the required stat.

Impact / Risks
Storage overutilization in environments with a large number of hosts and a large number of datastores.
Resolution
Not available yet
Workaround

Change the vmfsStatsIntervalInSecs value in /etc/vmware/hostd/config.xml on each host.
We can recommend that the customer try 12 hours, i.e. vmfsStatsIntervalInSecs=43200.
 

A one-liner to perform this task and restart hostd (this example sets 21600 seconds, i.e. 6 hours; substitute the value you need):

sed -i -e 's/<vmfsStatsIntervalInSecs>.*>/<vmfsStatsIntervalInSecs>21600<\/vmfsStatsIntervalInSecs>/g' /etc/vmware/hostd/config.xml;/etc/init.d/hostd restart
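The same edit can be wrapped in a small function that validates the value and keeps a backup copy first. This is only a sketch: the function name set_vmfs_stats_interval and the .bak suffix are made up for illustration and are not part of ESXi, and the 300 second lower bound is taken from the comments in config.xml. It does not restart hostd.

```shell
#!/bin/sh
# Hypothetical helper (not an ESXi tool): set vmfsStatsIntervalInSecs in a
# hostd config file. Usage: set_vmfs_stats_interval FILE SECONDS
set_vmfs_stats_interval() {
    cfg="$1"
    secs="$2"

    # Accept only a whole number of seconds.
    case "$secs" in
        ''|*[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 1
            ;;
    esac

    # Per the config.xml comments, values below 5 minutes are reset to 5 minutes.
    if [ "$secs" -lt 300 ]; then
        echo "interval must be at least 300 seconds" >&2
        return 1
    fi

    # Refuse to continue if the element is missing; sed would change nothing.
    if ! grep -q '<vmfsStatsIntervalInSecs>' "$cfg"; then
        echo "vmfsStatsIntervalInSecs not found in $cfg" >&2
        return 1
    fi

    # Keep a copy of the original file (the .bak name is an assumption).
    cp "$cfg" "$cfg.bak" || return 1

    # Same substitution as the one-liner, with the value passed in.
    sed -i -e "s/<vmfsStatsIntervalInSecs>.*>/<vmfsStatsIntervalInSecs>${secs}<\/vmfsStatsIntervalInSecs>/g" "$cfg"
}
```

On a host this would be called as set_vmfs_stats_interval /etc/vmware/hostd/config.xml 43200, followed by /etc/init.d/hostd restart as in the one-liner.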
Related Information
30 mins  = vmfsStatsIntervalInSecs=1800
1  hour = vmfsStatsIntervalInSecs=3600
3  hours = vmfsStatsIntervalInSecs=10800
6  hours = vmfsStatsIntervalInSecs=21600
12 hours = vmfsStatsIntervalInSecs=43200
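For intervals not listed above, the value is simply the number of minutes multiplied by 60. A minimal sketch, with minutes_to_secs being a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert a whole number of minutes into the value
# expected by vmfsStatsIntervalInSecs (seconds).
minutes_to_secs() {
    case "$1" in
        ''|*[!0-9]*)
            echo "minutes must be a whole number" >&2
            return 1
            ;;
    esac
    echo $(( $1 * 60 ))
}
```

For example, minutes_to_secs 30 gives the 30 minute value from the list and minutes_to_secs 720 gives the 12 hour value.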
Default setting in /etc/vmware/hostd/config.xml:
 <!-- Vmfs stats collection interval -->                                                                                 
 <!-- Min value:5 mins Default Value:5 mins - in terms of seconds -->                                                    
 <!-- Setting it below 5 mins will reset it back to 5 mins,due to perf impact on VMFS -->                                
 <vmfsStatsIntervalInSecs> 300 </vmfsStatsIntervalInSecs>      
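To check which value a host currently has, the element can be extracted with sed. This is a sketch that assumes the element sits on a single line as shown above; get_vmfs_stats_interval is a made-up name, not an ESXi command.

```shell
#!/bin/sh
# Hypothetical helper: print the configured vmfsStatsIntervalInSecs value
# from a hostd config file. Usage: get_vmfs_stats_interval FILE
get_vmfs_stats_interval() {
    # Print only the digits between the opening and closing tags,
    # ignoring any spaces around the number.
    sed -n 's/.*<vmfsStatsIntervalInSecs>[[:space:]]*\([0-9][0-9]*\)[[:space:]]*<\/vmfsStatsIntervalInSecs>.*/\1/p' "$1"
}
```

On a host this would be called as get_vmfs_stats_interval /etc/vmware/hostd/config.xml. It prints nothing if the element is absent or not on one line.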
 
 
Confidential or Internal Information


The change in 7.0U1 was made under https://bugzilla.eng.vmware.com/show_bug.cgi?id=2580232

The relevant PR for this KB: https://bugzilla.eng.vmware.com/show_bug.cgi?id=2788282

 

- Note: a hostd datastore refresh invokes a VMFS datastore refresh.
Vol3GetAttributesVMFS6 -> Res3StatVMFS6 can end up reading a lot of VMFS
metadata.

- The amount of VMFS metadata read is proportional to both the size of the VMFS
datastore and the number of VMFS datastores on the ESXi server.
