Wednesday, February 27, 2019

APD - ExtendAPDCondition

Good to know about this advanced setting.

[root@esx24:~] esxcli system settings advanced list -o /Scsi/ExtendAPDCondition
   Path: /Scsi/ExtendAPDCondition
   Type: integer
   Int Value: 0
   Default Int Value: 0
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Trigger APD condition when paths are in unavailable states

VMware VDS Healthcheck enhancement


Overview


Customers have given feedback about scalability issues with healthcheck. There are two problems:
1.     Currently, each uplink sends out broadcast packets for each VLAN. If the VLAN range is large, this causes the physical switch to flush its port lookup table, so normal traffic gets flooded and performance suffers.
2.     Currently, we send out a large number of broadcast packets at the same time, and those broadcasts trigger many ACK packets in response, so the healthcheck causes a traffic burst.
We need to work out a way to reduce the number of broadcast packets sent by healthcheck and to resolve the lookup table flush issue.
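
To get a feel for the scale of the first problem, here is a rough back-of-the-envelope sketch in Python. The function name and the example numbers are illustrative assumptions, not taken from the healthcheck code. It only counts one broadcast probe per uplink per VLAN, as described above, and ignores the ACK traffic that follows.

```python
def broadcast_probe_count(num_uplinks: int, num_vlans: int) -> int:
    """One broadcast probe per uplink per VLAN, as in the current model."""
    if num_uplinks < 0 or num_vlans < 0:
        raise ValueError("counts must be non-negative")
    return num_uplinks * num_vlans


# Hypothetical host: 4 uplinks and a trunk range covering 4094 VLAN IDs.
print(broadcast_probe_count(4, 4094))  # 16376
```

Every one of those probes can also trigger ACK packets from other uplinks and hosts, which is the burst described in the second problem.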

Scope and requests


The scope is the vSphere-2016 release.
The requests are:
1.     Allow a one-time check for a specific VLAN.
2.     Resolve the physical switch lookup table flush issue.
3.     Reduce the number of broadcast packets as much as possible.

Detail design

In the new design, healthcheck will provide the following:
      Users can specify a VLAN range to check instead of checking the whole VLAN range of the DVS.
      Users can run the VLAN check on specific hosts instead of on all hosts within the same DVS.
      Unicast packets are used instead of broadcast packets when more than one physical uplink on the host is connected to the DVS. This avoids broadcast storms and cuts down response packets (the same host might not need to send back ACK packets).
      The ACK mode changes: no ACK packets are sent out from the same host. If a received packet was sent from the same host, the session is marked as ACK’ed directly instead of sending ACK packets back over the physical network to the same host.

Provide a UI to customize the VLAN/MTU check:


Users can specify the VLAN range and select the hosts on which to run the check, and the results are listed per host as well. On the UI side, we need to provide the following interface to let customers initiate a customized VLAN/MTU check.

To display results, we can use the current format for both one-time and periodic checks.

Changes for management plane:


On the MP side, the original code gets VLAN settings from all DVPorts and DVPortgroups. We need to provide a VIMAPI that accepts the VLAN range entered in the UI and initiates the one-time check.

The way results are fetched does not need to change.
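
As an illustration of the kind of input handling such an API would need, here is a minimal Python sketch that expands a user-supplied VLAN range into individual VLAN IDs. The function name `parse_vlan_ranges`, the `"10-12, 100"` input format, and the accepted ID range of 1 to 4094 are all assumptions made for this example. They are not the actual VIMAPI.

```python
from typing import List


def parse_vlan_ranges(spec: str) -> List[int]:
    """Expand a string such as "10-12, 100" into a sorted list of VLAN IDs.

    Hypothetical helper: the input format and the 1..4094 bounds are
    assumptions for this sketch, not part of the real API.
    """
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        start = int(low)
        end = int(high) if high else start
        if not (1 <= start <= end <= 4094):
            raise ValueError(f"invalid VLAN range: {part!r}")
        vlans.update(range(start, end + 1))
    return sorted(vlans)


print(parse_vlan_ranges("10-12, 100"))  # [10, 11, 12, 100]
```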

Changes in data plane:


Change the way probe packets are sent.

Original vlanMTU check model:


In the original design, each uplink sends out broadcast packets for each configured vlanID, and ACK packets are received from both the same host and other hosts within the same DVS.

New VLANMTUCHECK steps:


In the new design, all uplinks of the same vswitch are treated as one checking group instead of each sending out packets separately, in order to reduce the number of packets sent to the physical switch. The reasoning is:
-       If a unicast packet for a specific vlanID sent from uplink0 to uplink1 is ACK’ed by uplink1, the vlanID is configured correctly on both uplink0 and uplink1. If it is not ACK’ed by uplink1 but the same vlanID is ACK’ed by another uplink, uplink2, the vlanID is set correctly on uplink0 and uplink2 and wrong on uplink1.
The overall design is:
-       If the vswitch has only one uplink, it sends out broadcast packets as in the old version.
-       If there is more than one link-up uplink, it chooses the first link-up uplink as the source, makes all other link-up uplinks destination ports, and sends a unicast packet for each vlanID to each of the other uplinks.
-       If the ticket gets ACK packets from all other uplinks, mark the vlanID setting as correct on all uplinks.
-       If ACKs are received from only some of the other uplinks, mark the vlanID as correct for the source uplink and for each uplink that ACK’ed.
-       If some vlanIDs did not receive an ACK packet from any other uplink, choose the next uplink as the source uplink and send unicast packets to all following uplinks for all non-ACK’ed vlanIDs. Record the ACK’ed vlanIDs for each uplink and repeat until either 1) there is no untrunked vlanID left, or 2) the last uplink is reached.
-       Compare the trunked VLAN bitmap of every uplink with the configured VLAN bitmap. If there are still untrunked vlanIDs, trigger a round of broadcasts from each uplink for each untrunked vlanID, just as in the previous version. This reduces the chance of misreporting when a vlanID is set correctly on only one of the uplinks, or when a LAG is configured on the physical switch side. With a LAG on the physical switch, unicast packets sent among uplinks within the LAG will not be received by the targeted port, but broadcast packets can still be answered by remote hosts. So we need to send broadcast packets as a second round of checking.
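
The steps above can be sketched as a small Python simulation. This is only a sketch under stated assumptions, not the vlanmtucheck implementation. The names `check_vlans`, `unicast_acked` and `broadcast_acked` are hypothetical, and sending a probe is modelled as a callback that returns whether an ACK came back.

```python
from typing import Callable, Dict, Iterable, List, Set


def check_vlans(
    uplinks: List[str],
    configured: Iterable[int],
    unicast_acked: Callable[[str, str, int], bool],
    broadcast_acked: Callable[[str, int], bool],
) -> Dict[str, Set[int]]:
    """Return, per uplink, the set of vlanIDs found to be trunked."""
    configured = set(configured)
    trunked: Dict[str, Set[int]] = {u: set() for u in uplinks}

    # Unicast phase: only possible with more than one link-up uplink.
    if len(uplinks) > 1:
        pending = set(configured)  # vlanIDs with no ACK so far
        for i, src in enumerate(uplinks[:-1]):
            if not pending:
                break
            acked_now: Set[int] = set()
            for dst in uplinks[i + 1:]:
                for vlan in pending:
                    if unicast_acked(src, dst, vlan):
                        # An ACK proves the vlanID on both ends.
                        trunked[src].add(vlan)
                        trunked[dst].add(vlan)
                        acked_now.add(vlan)
            pending -= acked_now

    # Broadcast phase: whatever is still unverified on an uplink.
    # With a single uplink this covers every configured vlanID.
    for uplink in uplinks:
        for vlan in configured - trunked[uplink]:
            if broadcast_acked(uplink, vlan):
                trunked[uplink].add(vlan)
    return trunked


# Hypothetical physical switch configuration used as ground truth.
switch_trunks = {"uplink0": {10, 20}, "uplink1": {10}, "uplink2": {10, 20, 30}}


def unicast_acked(src: str, dst: str, vlan: int) -> bool:
    return vlan in switch_trunks[src] and vlan in switch_trunks[dst]


def broadcast_acked(uplink: str, vlan: int) -> bool:
    # Assumes some remote host answers on every VLAN the port trunks.
    return vlan in switch_trunks[uplink]


result = check_vlans(["uplink0", "uplink1", "uplink2"], {10, 20, 30},
                     unicast_acked, broadcast_acked)
print(result == switch_trunks)  # True
```

In this example VLAN 30 is trunked only on uplink2, so no unicast probe can confirm it and it is only found in the broadcast phase.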

Please refer to Figure 1 below:

                               Figure 1. New model of VLAN MTU check

Detailed flowchart is:


ACK model change:


In the new design, if request packets are sent among uplinks that belong to the same DVS and the same host, the receiver does not send back ACK packets and instead updates the ticket’s ACK’ed list directly. This reduces the number of unicast packets and the likelihood of flushing the MAC table of the physical switch.
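
A minimal Python sketch of this ACK short-circuit follows. The `Probe` and `Ticket` classes, the `on_probe_received` function, and the `send_ack` callback are hypothetical names chosen for illustration. They do not come from the actual data plane code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Probe:
    vlan_id: int
    src_host: str
    src_uplink: str


@dataclass
class Ticket:
    vlan_id: int
    acked_uplinks: Set[str] = field(default_factory=set)


def on_probe_received(
    probe: Probe,
    local_host: str,
    rx_uplink: str,
    tickets: Dict[int, Ticket],
    send_ack: Callable[[Probe, str], None],
) -> bool:
    """Handle a received probe; return True if an ACK went on the wire."""
    if probe.src_host == local_host:
        # Same host: update the ticket directly, send nothing.
        ticket = tickets.get(probe.vlan_id)
        if ticket is not None:
            ticket.acked_uplinks.add(rx_uplink)
        return False
    # Probe from another host: reply with an ACK packet as before.
    send_ack(probe, rx_uplink)
    return True


tickets = {100: Ticket(vlan_id=100)}
sent = []


def record_ack(probe: Probe, uplink: str) -> None:
    sent.append((probe.vlan_id, uplink))


# "esx25" is a made-up remote host name for this example.
local = on_probe_received(Probe(100, "esx24", "uplink0"), "esx24", "uplink1",
                          tickets, record_ack)
remote = on_probe_received(Probe(100, "esx25", "uplink0"), "esx24", "uplink1",
                           tickets, record_ack)
print(local, remote, sent)  # False True [(100, 'uplink1')]
```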

Risk and assumptions:


The new design replaces broadcast packets with unicast packets, which completely changes how packets are sent and how the check proceeds. The vlanmtucheck module will be re-architected, which introduces code changes in most places, so the QE team needs to run healthcheck testing to ensure quality.

Part of this change requires UI and MP resources. Without them, the customized check request cannot be implemented, because most of that change is on the UI and MP side.

Test cases


Because this design changes the way the VLAN MTU check runs, and uplinks of the same DVS on the same host now interact with each other, new test cases need to be designed to cover this.