Friday, November 29, 2024

VMware vSAN ESA - storage performance testing

I have just finished my first VMware vSAN ESA Plan, Design, Implement project and had a chance to test vSAN ESA performance. Every storage system should be stress-tested before being put into production, and VMware's software-defined hyper-converged storage (vSAN) is no different. It is even more important because the servers' CPU, RAM, and network are leveraged to emulate enterprise-class storage.

vSAN ESA Environment

All storage performance tests were performed on
  • 6-node vSAN ESA Cluster (6x ESXi hosts) 
    • ESXi Specification
      • Server Model: Cisco UCS X210c M7
      • CPU: 32 CPU Cores - 2x CPU Intel Xeon Gold 6544Y 16C @ 3.6 GHz
        • 115.2 GHz capacity
      • RAM: 1.5 TB
      • NIC: Cisco VIC 15230 - 2x 50Gbps
        • vSAN vmknic is active/standby, therefore active on one 50 Gbps NIC (vmnic)
          • 50 Gbps is physically two 25G-KR (transceiver modules)
      • Storage: 5x NVMe 6.4 TB 2.5in U.2 P5620 NVMe High Perf High Endurance
        • The usable raw capacity of one disk is 5.82 TB. That's the difference between the vendor's "sales" capacity (decimal terabytes) and what ESXi reports (binary terabytes), almost a 0.6 TB difference :-(
  • Storage benchmark software - HCIBench 2.8.3 (a roughly equivalent standalone fio invocation is sketched after this list)
    • 18 test VMs (8x data vDisk, 2 workers per vDisk) evenly distributed across the vSAN Cluster
      • 3 VMs per ESXi host
    • fio target storage latency 2.5 ms (2,500 us)
  • vSAN Storage Policy:
    • RAID-5
    • compression enabled
    • IOPS Limit 5,000 (to not totally overload the server's CPU)
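
For reference, below is a roughly equivalent standalone fio invocation for the 32 KB / 100% read / 100% random profile. This is only a sketch - HCIBench generates its own fio job files, and the iodepth, runtime, and target device below are my assumptions, not the exact HCIBench configuration.

 # Sketch only - /dev/sdb stands for one of the 8 data vDisks inside a test VM (2 workers per vDisk)
 fio --name=job0 --filename=/dev/sdb \
     --ioengine=libaio --direct=1 \
     --rw=randread --bs=32k \
     --numjobs=2 --iodepth=32 \
     --latency_target=2500us --latency_window=10s --latency_percentile=95 \
     --time_based --runtime=3600 --group_reporting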

Test Cases

Random storage workloads

32KB IO, 100% read, 100% random

Test Case Name: fio-8vmdk-90ws-32k-100rdpct-100randompct-2500lt-1732885897

Performance Result
Datastore: CUST-1001-VSAN
=============================
JOB_NAME: job0
Number of VMs: 18
I/O per Second: 721,317.28 IO/S
Throughput: 22,540.00 MB/s
Read Latency: 2.03 ms
Write Latency: 0.00 ms
95th Percentile Read Latency: 2.00 ms
95th Percentile Write Latency: 0.00 ms

ESXi Host CPU Usage during test 78 GHz (1 GHz is used in idle)
vSAN vmnic4 transmit traffic ~3.4 GB/s (27.2 Gb/s)
vSAN vmnic4 receive traffic ~3.4 GB/s (27.2 Gb/s)
Storage IOPS per ESXi: 120,220 IOPS (721,317 IOPS / 6 ESXi hosts)

ESXi CPU Usage due to vSAN Storage + vSAN Network Traffic
120,220 Storage IOPS + 27.2 Gb/s Network transmit traffic + 27.2 Gb/s Network receive traffic requires 77 GHz
That means 1 vSAN read 32 KB I/O operation (including TCP network traffic) requires ~640 KHz.
In other words, 640,000 CPU clock cycles for 32 KB read I/O (256,000 bits) means ~2.5 Hz to read 1 bit of data.

ESXi CPU Usage due to vSAN network traffic
I have tested that
9.6 Gb/s of pure transmit network traffic requires 1,681 MHz (1.68 GHz) of CPU usage
That means
10,307,921,510 b/s transmit traffic requires 1,681,000,000 Hz
1 b/s transmit traffic requires 0.163 Hz
1 Gb/s transmit traffic requires 163 MHz

I have also tested that
10 Gb/s of pure receive network traffic requires 4,000 MHz (4 GHz) of CPU usage
That means
10,737,418,240 b/s receive traffic requires 4,000,000,000 Hz
1 b/s receive traffic requires 0.373 Hz
1 Gb/s receive traffic requires 373 MHz 

vSAN ESXi host reports transmitting network traffic of  27.2 Gb/s, thus it requires ~ 4.43 GHz CPU 
vSAN ESXi host reports receiving network traffic of 27.2 Gb/s, thus it requires ~ 10.15 GHz CPU

ESXi CPU Usage due to vSAN Storage without vSAN network traffic
We can deduct 14.58 GHz (4.43 + 10.15) CPU usage (the cost of bidirectional network traffic) from 77 GHz total ESXi CPU usage. That means we need 62.42 GHz CPU usage for vSAN storage operations without network transfers.  
We were able to achieve 120,220 IOPS on the ESXi host at 62.42 GHz (62,420,000,000 Hz)
That means 1 NVMe read 32 KB I/O operation without TCP network traffic requires ~519 KHz.
In other words, 519,000 CPU clock cycles for 32 KB read I/O (256,000 bits) means ~2 Hz to read 1 bit of data.
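
To make this arithmetic repeatable for the other test cases, here is a small helper script. It is just a sketch of the calculation method used in this post; the script name is made up, and the 0.163/0.373 GHz-per-Gb/s coefficients are the ones measured above, valid only for this particular environment. It uses decimal kilobytes (1 KB = 1,000 B) for the bits-per-I/O conversion, as in the 32 KB examples.

 #!/bin/sh
 # vsan-cpu-cost.sh (hypothetical name) - reproduce the CPU cost arithmetic used in this post
 # Usage: ./vsan-cpu-cost.sh <total_GHz> <IOPS_per_host> <tx_Gbps> <rx_Gbps> <io_size_KB>
 awk -v ghz="$1" -v iops="$2" -v tx="$3" -v rx="$4" -v kb="$5" 'BEGIN {
   net = tx*0.163 + rx*0.373                     # GHz spent on vSAN network traffic
   storage = ghz - net                           # GHz left for vSAN storage operations
   printf "Network CPU: %.2f GHz\n", net
   printf "Storage CPU: %.2f GHz\n", storage
   printf "kHz per I/O: %.0f\n", storage * 1e9 / iops / 1e3
   printf "Hz per bit:  %.2f\n", storage * 1e9 / iops / (kb * 8000)
 }'

For example, './vsan-cpu-cost.sh 77 120220 27.2 27.2 32' reproduces the storage-only numbers of the 32 KB read test above (~62.42 GHz, ~519 kHz per I/O, ~2 Hz per bit), and './vsan-cpu-cost.sh 87 47650 35.5 40 32' reproduces the 32 KB write test below.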

32k IO, 100% write, 100% random

Test Case Name: fio-8vmdk-90ws-32k-0rdpct-100randompct-2500lt-1732885897

Performance Result
Datastore: CUST-1001-VSAN
=============================
JOB_NAME: job0
Number of VMs: 18
I/O per Second: 285,892.55 IO/S
Throughput: 8,934.00 MB/s
Read Latency: 0.00 ms
Write Latency: 1.74 ms
95th Percentile Read Latency: 0.00 ms
95th Percentile Write Latency: 2.00 ms

ESXi Host CPU Usage during test 88 GHz (1 GHz is used in idle)
vSAN vmnic4 transmit traffic ~4.44 GB/s (35.5 Gb/s)
vSAN vmnic4 receive traffic ~5 GB/s (40 Gb/s)
Storage IOPS per ESXi: 47,650 IOPS (285,892 IOPS / 6 ESXi hosts)

ESXi CPU Usage due to vSAN Storage + vSAN Network Traffic
47,650 Storage IOPS + 35.5 Gb/s Network transmit traffic + 40 Gb/s Network receive traffic requires 87 GHz
That means 1 vSAN write 32 KB I/O operation (including TCP network traffic) requires ~1,825 KHz.
In other words, 1,825,000 CPU clock cycles for 32 KB write I/O (256,000 bits) means ~7.13 Hz to write 1 bit of data.

ESXi CPU Usage due to vSAN network traffic
1 Gb/s transmit traffic requires 163 MHz
1 Gb/s receive traffic requires 373 MHz 

vSAN ESXi host reports transmitting network traffic of  35.5 Gb/s, thus it requires ~ 5.79 GHz CPU 
vSAN ESXi host reports receiving network traffic of 40 Gb/s, thus it requires ~ 14.92 GHz CPU

ESXi CPU Usage due to vSAN Storage without vSAN network traffic
We can deduct 20.71 GHz (5.79 + 14.92) CPU usage (the cost of bidirectional network traffic) from 87 GHz total ESXi CPU usage. We need 66.29 GHz CPU usage for vSAN storage operations without network transfers.  
We were able to achieve 47,650 IOPS on the ESXi host at 66.29 GHz (66,290,000,000 Hz)
That means 1 NVMe write 32 KB I/O operation without TCP network traffic requires ~1,391 KHz.
In other words, 1,391,000 CPU clock cycles for 32 KB write I/O (256,000 bits) means ~5.43 Hz to write 1 bit of data.

32k IO, 70% read - 30% write, 100% random

Test Case Name: fio-8vmdk-90ws-32k-70rdpct-100randompct-2500lt-1732908719

Performance Result
Datastore: CUST-1001-VSAN
=============================
JOB_NAME: job0
Number of VMs: 18
I/O per Second: 602,702.73 IO/S
Throughput: 18,834.00 MB/s
Read Latency: 1.55 ms
Write Latency: 1.99 ms
95th Percentile Read Latency: 2.00 ms
95th Percentile Write Latency: 2.00 ms

ESXi Host CPU Usage during test 95 GHz (1 GHz is used in idle)
vSAN vmnic4 transmit traffic ~4.5 GB/s (36 Gb/s)
vSAN vmnic4 receive traffic ~4.7 GB/s (37.6 Gb/s)
Storage IOPS per ESXi: 100,450 IOPS (602,702 IOPS / 6 ESXi hosts)

Sequential storage workloads

1024k IO, 100% read, 100% sequential

Test Case Name: fio-8vmdk-90ws-1024k-100rdpct-0randompct-2500lt-1732911329

Performance Result
Datastore: CUST-1001-VSAN
=============================
JOB_NAME: job0
Number of VMs: 18
I/O per Second: 22,575.50 IO/S
Throughput: 22,574.00 MB/s
Read Latency: 6.38 ms
Write Latency: 0.00 ms
95th Percentile Read Latency: 6.00 ms
95th Percentile Write Latency: 0.00 ms

ESXi Host CPU Usage during test 60 GHz (1 GHz is used in idle)
vSAN vmnic4 transmit traffic ~3.4 GB/s (27.2 Gb/s)
vSAN vmnic4 receive traffic ~3.2 GB/s (25.6 Gb/s)
Storage IOPS per ESXi: 3,762 IOPS (22,574 IOPS / 6 ESXi hosts)
Throughput per ESXi: 3,762.00 MB/s (22,574.00 MB/s / 6 ESXi hosts)

ESXi CPU Usage due to vSAN Storage + vSAN Network Traffic
3,762 Storage IOPS + 27.2 Gb/s Network transmit traffic + 25.6 Gb/s Network receive traffic requires 59 GHz
That means 1 vSAN read 1024 KB I/O operation (including TCP network traffic) requires ~15,683 KHz.
In other words, 15,683,000 CPU clock cycles for 1024 KB read I/O (8,388,608 bits) means ~1.87 Hz to read 1 bit of data.

ESXi CPU Usage due to vSAN network traffic
1 Gb/s transmit traffic requires 163 MHz
1 Gb/s receive traffic requires 373 MHz 

vSAN ESXi host reports transmitting network traffic of  27.2 Gb/s, thus it requires ~4.43 GHz CPU 
vSAN ESXi host reports receiving network traffic of 25.6 Gb/s, thus it requires ~9.55 GHz CPU

ESXi CPU Usage due to vSAN Storage without vSAN network traffic
We can deduct 13.98 GHz (4.43 + 9.55) CPU usage (the cost of bidirectional network traffic) from 59 GHz total ESXi CPU usage. That means we need 45.02 GHz CPU usage for vSAN storage operations without network transfers.  
We were able to achieve 3,762 IOPS on the ESXi host at 45.02 GHz (45,020,000,000 Hz)
That means 1 NVMe read 1024 KB I/O operation without TCP network traffic requires ~11,968 KHz.
In other words, 11,968,000 CPU clock cycles for 1024 KB read I/O (8,388,608 bits) means ~1.43 Hz to read 1 bit of data

1024k IO, 100% write, 100% sequential

Test Case Name: fio-8vmdk-90ws-1024k-0rdpct-0randompct-2500lt-1732913825

Performance Result
Datastore: CUST-1001-VSAN
=============================
JOB_NAME: job0
Number of VMs: 18
I/O per Second: 15,174.08 IO/S
Throughput: 15,171.00 MB/s
Read Latency: 0.00 ms
Write Latency: 8.30 ms
95th Percentile Read Latency: 0.00 ms
95th Percentile Write Latency: 12.00 ms

ESXi Host CPU Usage during test 60 GHz (1 GHz is used in idle)
vSAN vmnic4 transmit traffic ~3.9 GB/s (31.2 Gb/s)
vSAN vmnic4 receive traffic ~3.9 GB/s (31.2 Gb/s)
Storage IOPS per ESXi: 2,529 IOPS (15,174.08 IOPS / 6 ESXi hosts)
Throughput per ESXi: 2,529 MB/s (15,171.00 MB/s / 6 ESXi hosts)

ESXi CPU Usage due to vSAN Storage + vSAN Network Traffic
2,529 Storage IOPS + 31.2 Gb/s Network transmit traffic + 31.2 Gb/s Network receive traffic requires 59 GHz
That means 1 vSAN 1024 KB write I/O operation (including TCP network traffic) requires ~23,329 KHz.
In other words, 23,329,000 CPU clock cycles for 1024 KB write I/O (8,388,608 bits) means ~2.78 Hz to write 1 bit of data.

ESXi CPU Usage due to vSAN network traffic
1 Gb/s transmit traffic requires 163 MHz
1 Gb/s receive traffic requires 373 MHz 

vSAN ESXi host reports transmitting network traffic of 31.2 Gb/s, thus it requires ~5.09 GHz CPU 
vSAN ESXi host reports receiving network traffic of 31.2 Gb/s, thus it requires ~11.64 GHz CPU

ESXi CPU Usage due to vSAN Storage without vSAN network traffic
We can deduct 16.73 GHz (5.09 + 11.64) CPU usage (the cost of bidirectional network traffic) from 59 GHz total ESXi CPU usage. That means we need 42.27 GHz CPU usage for vSAN storage operations without network transfers.  
We were able to achieve 2,529 IOPS on the ESXi host at 42.27 GHz (42,270,000,000 Hz)
That means 1 NVMe 1024 KB write I/O operation without TCP network traffic requires ~16,714 KHz.
In other words, 16,714,000 CPU clock cycles for 1024 KB write I/O (8,388,608 bits) means ~1.99 Hz to write 1 bit of data

1024k IO, 70% read - 30% write, 100% sequential

Performance Result
Datastore: CUST-1001-VSAN
=============================
JOB_NAME: job0
Number of VMs: 18
I/O per Second: 19,740.90 IO/S
Throughput: 19,738.00 MB/s
Read Latency: 5.38 ms
Write Latency: 8.68 ms
95th Percentile Read Latency: 7.00 ms
95th Percentile Write Latency: 12.00 ms

ESXi Host CPU Usage during test 62 GHz (1 GHz is used in idle)
vSAN vmnic4 receive traffic ~4.15 GB/s (33.2 Gb/s)
vSAN vmnic4 transmit traffic ~4.3 GB/s (34.4 Gb/s)
Storage IOPS per ESXi: 3,290 IOPS (19,740.90 IOPS / 6 ESXi hosts)
Throughput per ESXi: 3,290 MB/s (19,738.00  MB/s / 6 ESXi hosts)

Observations and explanation

Observation 1 - Storage and network workload requires CPU resources.

This is obvious and logical, however, here is some observed data from our storage performance benchmark exercise.

32K, 100% read, 100% random (721,317.28 IOPS in VM guest,  22,540.00 MB/s in VM guest)
    => CPU Usage ~77 GHz
    => ~2.5 Hz to read 1 bit of data (storage + network) 
    => ~2 Hz to read 1 bit of data (storage only)
    => 25% goes to network traffic

32K, 70%read 30%write, 100% random (602,702.73 IOPS in VM guest, 18,834.00 MB/s in VM guest)
    => CPU Usage ~94 GHz << THIS IS STRANGE, WHY IS IT MORE CPU THAN 100% WRITE? I DON'T KNOW.

32K, 100% write, 100% random (285,892.55 IOPS in VM guest, 8,934.00 MB/s in VM guest) 
    => CPU Usage ~87 GHz
    => ~7.13 Hz to write 1 bit of data  (storage + network)
    => ~5.43 Hz to write 1 bit of data (storage only)
    => 31% goes to network traffic

1M, 100% read, 100% sequential (22,575.50 IOPS in VM guest, 22,574.00 MB/s in VM guest)
    => CPU Usage ~59 GHz
    => ~1.87 Hz to read 1 bit of data (storage + network)
    => ~1.43 Hz to read 1 bit of data (storage only)
    => ~31% goes to network traffic

1M, 70% read 30% write, 100% sequential (19,740.90 IOPS in VM guest, 19,738.00 MB/s in VM guest) 
    => CPU Usage ~61 GHz

1M, 100% write, 100% sequential (15,174.08 IOPS in VM guest, 15,171.00 MB/s in VM guest) 
    => CPU Usage ~59 GHz
    => ~2.78 Hz to write 1 bit of data (storage + network)
    => ~1.99 Hz to write 1 bit of data (storage only)
    => ~40% goes to network traffic

Reading 1 bit of information from vSAN hyper-converged storage requires roughly between ~1.87 Hz (1024 KB I/O size) and ~2.5 Hz (32 KB I/O size).

Writing 1 bit of information to vSAN hyper-converged storage requires roughly between ~2.78 Hz (1024 KB I/O size) and 7.13 Hz (32 KB I/O size).

The above numbers are not set in stone but it is good to observe system behavior. 

When I had no IOPS limits in the vSAN Storage Policies, I was able to fully saturate the ESXi hosts' CPUs. That's a clear sign that neither the storage subsystem (NVMe NAND flash disks) nor the Ethernet/IP network (up to 50 Gbps via a single vmnic4) is the bottleneck. The bottleneck in my case is the CPU. However, there is always some bottleneck, and we are not looking for maximum storage performance, but for predictable and consistent storage performance without a negative impact on other resources (CPU, network). 

That's the reason why it is really good to know at least these rough numbers when doing capacity/performance planning of a hyper-converged vSAN solution.

With an IOPS limit of 5,000, 144 vDisks (18 test VMs x 8 data vDisks) at 5,000 IOPS each can sustain a response time of around 2 ms (32 KB I/O). The vSphere/vSAN infrastructure is designed for ~150 VMs, so that's perfectly balanced. We have two other VM Storage Policies (10,000 IOPS limit and 15,000 IOPS limit) for more demanding VMs hosting SQL Servers and other storage-intensive workloads.

That's about 720,000 IOPS (144 x 5,000) aggregated in total. Pretty neat for a 6-node vSAN cluster, isn't it? 

Observation 2 - Between roughly 25% and 40% of storage-related CPU usage goes to TCP network traffic

vSAN is a hyper-converged (compute, storage, network) software-defined storage that stripes data across ESXi hosts, thus heavily leveraging a standard Ethernet network and TCP/IP for transporting storage data across vSAN nodes (ESXi hosts). vSAN RAID (Redundant Array of Independent Disks) is actually RAIN (Redundant Array of Independent Nodes), therefore the network is highly utilized during heavy storage load. You can see the numbers above in the test results.

As I planned, designed, and implemented vSAN on Cisco UCS infrastructure with 100Gb networking (partitioned into 2x 32Gb FCoE, 2x 10Gb Ethernet, 2x 10Gb Ethernet, 2x 50Gb Ethernet), RDMA over Converged Ethernet (RoCE) would be great to use to decrease CPU requirements and further improve latency and I/O response time. It seems RoCE v2 is supported on vSphere 8.0 U3 for my network interface card, Cisco VIC 15230 (nenic driver version 2.0.11), but Cisco is not listed among the vendors supporting vSAN over RDMA. I will try to ask somebody at Cisco what the reason is and whether it is on their roadmap.
  


Sunday, November 17, 2024

ESXi update from cli

Step 1: upload the VMware-ESXi-8.0U3b-24280767-depot.zip file to a datastore accessible by the host.

esxcli software sources profile list -d /vmfs/volumes/[datastore]/VMware-ESXi-8.0U3b-24280767-depot.zip

esxcli software profile update -d "/vmfs/volumes/[datastore]/VMware-ESXi-8.0U3b-24280767-depot.zip" -p ESXi-8.0U3b-24280767-standard
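
Optionally, before running the real update, the host can be put into maintenance mode and the update checked with a dry run first. This is just a sketch; the datastore path is the same placeholder as above, and the reboot assumes the host has already been evacuated.

 esxcli system maintenanceMode set --enable true
 esxcli software profile update -d "/vmfs/volumes/[datastore]/VMware-ESXi-8.0U3b-24280767-depot.zip" -p ESXi-8.0U3b-24280767-standard --dry-run
 # run the real update (the command above without --dry-run), then reboot
 esxcli system shutdown reboot -r "ESXi 8.0U3b update"
 esxcli system maintenanceMode set --enable false   # after the host comes back up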


Saturday, November 16, 2024

VMware vCenter (VCSA) Update via shell command software-packages.py

Online update

cd /usr/lib/applmgmt/support/scripts

./software-packages.py stage --url --acceptEulas

./software-packages.py list --staged

./software-packages.py validate

./software-packages.py install
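
The same online update can also be run from the VCSA appliance shell (appliancesh) instead of the bash shell; a sketch, assuming the default VMware online repository is reachable:

 Command> software-packages stage --url --acceptEulas
 Command> software-packages list --staged
 Command> software-packages install --staged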

ISO update

Download the VCSA patch, which should end with FP.iso, from support.broadcom.com by selecting vCenter and the version.

Upload the file to a datastore and mount it to the VCSA VM through the CD/DVD Drive option.

Patch the VCSA from CLI.

Run the following commands

software-packages.py stage --iso

software-packages.py list --staged

software-packages.py install --staged

Reboot the VCSA VM.

This should patch the VCSA.

Thursday, November 14, 2024

Mount SFTP share via sshfs

 #!/bin/bash

sshfs david.pasek@gmail.com@sftp.virtix.cloud:./ ~/mnt/sftp -p 55022
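
For completeness, a small sketch of creating the mount point first and unmounting afterwards (the unmount command differs between Linux and BSD/macOS):

 mkdir -p ~/mnt/sftp                     # create the mount point before the first mount
 sshfs david.pasek@gmail.com@sftp.virtix.cloud:./ ~/mnt/sftp -p 55022
 fusermount -u ~/mnt/sftp                # unmount on Linux
 # umount ~/mnt/sftp                     # unmount on FreeBSD/macOS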

Wednesday, November 13, 2024

OpenNebula - VMware Alternative

Web Admin Management Interface (SunStone) is at https://[IP]:2616

Main Admin User Name: oneadmin

Default network is 172.16.100.0/24

Repo: https://downloads.opennebula.io/repo/

Tuesday, November 12, 2024

Linux Remote Desktop based on open-source | ThinLinc by Cendio


https://www.cendio.com/

Keywords: RDP

Monitoring VMware vSphere with Zabbix

Source: https://vmattroman.com/monitoring-vmware-vsphere-with-zabbix/

Zabbix is an open-source monitoring tool designed to oversee various components of IT infrastructure, including networks, servers, virtual machines, and cloud services. It operates using both agent-based and agentless monitoring methods. Agents can be installed on monitored devices to collect performance data and report back to a centralized Zabbix server.

Zabbix provides comprehensive integration capabilities for monitoring VMware environments, including ESXi hypervisors, vCenter servers, and virtual machines (VMs). This integration allows administrators to effectively track performance metrics and resource usage across their VMware infrastructure.

In this post, I will show you how to set up Zabbix monitoring for a VMware vSphere infrastructure.

Requirements:

  • Zabbix server
  • Access to the VMware vCenter Server

1. Create a zabbix service user in vCenter

First, let's create a service user in vCenter that will be used by the Zabbix server to collect data. To make life easier, in my lab setup the user zabbix@vsphere.local has full Administrator privileges, but Read-Only permissions should be enough.

1. In the vSphere Client choose Menu -> Administration -> Users and Groups. On the Users tab, select the domain vsphere.local and click the ADD button to add a new user.

2. Type a username and password. Click ADD to create the new user.

3. Switch to the Groups tab and select the Administrators group.

4. Find the new zabbix user, click on it, and save. The user is added to the Administrators group.

5. From the Hosts and Clusters view, choose the vCenter name and go to the Permissions tab. Click the Add button.

6. Choose the proper domain (vsphere.local), find the zabbix user, set the role to Administrator, and check Propagate to children. Click OK to grant the permissions.

2. Make changes on the Zabbix server

Next, we need to edit zabbix_server.conf. In this file we need to enable the VMware collector processes, which are necessary to start VMware monitoring.

FYI, I have installed Zabbix server in version 7.0.4.

1. Edit a configuration file zabbix_server.conf

vim /etc/zabbix/zabbix_server.conf

2. Find the StartVMwareCollectors parameter, delete the "#" before it, and change the value from 0 to at least 2.
Save the file and exit.

Besides StartVMwareCollectors, which is mandatory, it's possible to enable and modify additional VMware parameters (more details about them can be found in the Zabbix documentation):
VMwareCacheSize
VMwareFrequency
VMwarePerfFrequency
VMwareTimeout


3. Restart zabbix-server service.

systemctl restart zabbix-server

3. Configure VMware template on Zabbix

1. Log in to the Zabbix server GUI – http://zabbix_server/zabbix
Go to the Hosts section under the Monitoring tab.

2. Create a new “Host”. Click Create Host in the right upper corner.

3. In the Host tab provide the following details:
Host name – type a name for the system we want to monitor, here VMware Infrastructure;
Templates – type/find the template name "VMware" (more info about the VMware template can be found in the Zabbix documentation);
Host groups – find/type the "VMware(new)" host group.
Then, go to the Macros tab.

4. In the Macros tab you need to provide 3 values/macros. These macros describe the data needed to connect Zabbix to the VMware vCenter.

{$VMWARE.URL} – the VMware service (vCenter or ESXi hypervisor) SDK URL (https://servername/sdk) that we want to connect to;
{$VMWARE.USERNAME} – the VMware service username created in section 1;
{$VMWARE.PASSWORD} – the VMware service user password created in section 1.
Click the Add button.

5. The new Host is created and data collection is in progress.

6. Depending on the size of the infrastructure, data collection takes a different amount of time. Once configured, Zabbix will automatically discover VMs and begin collecting performance data. An overview of the latest data can be found on the Dashboard screen.

7. More specific and detailed data can be found in Latest data under the Monitoring tab.

In Host groups or Hosts, type (or click the "Select" button and choose) the name of the item you are looking for: the name of an ESXi host, virtual machine, vCenter, datastore, or all VMware information.

Zabbix can collect various metrics from VMware using its built-in templates. These metrics include:
– CPU usage
– Memory consumption
– Disk I/O statistics
– Network traffic
– Datastore capacity

Summary

In summary, integrating Zabbix with VMware provides a robust solution for monitoring virtualized environments, enhancing visibility into system performance and resource utilization while enabling timely alerts and responses to operational issues.


How to prevent waiting for NFS datastores to be restored during ESXi boot?

When your ESXi host refuses to boot for 1-2 hours because it is trying to mount NFS datastores that were removed long ago:

1. Reboot the ESXi host

2. Press Shift+O during boot

3. Append jumpstart.disable=restore-nfs-volumes to the end of the boot options line

4. Confirm with Enter
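
The boot option above is a one-time workaround. The permanent fix is to remove the stale NFS datastores from the host configuration once the host is up - a sketch (the volume name is a placeholder):

 esxcli storage nfs list                        # list configured NFS datastores
 esxcli storage nfs remove -v STALE_NFS_NAME    # remove the stale entry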

Backup and restore ESXi host configuration data using command line

 Source: https://vmattroman.com/backup_and_restore_esxi_host_configuration_data_using_command_line/


In some cases we need to reinstall an ESXi host. To avoid time-consuming server setup, we can quickly back up and restore the host configuration. To achieve this, there are three possible ways: the ESXi command line, vSphere CLI, or PowerCLI.


In this article I will show how to back up and restore host configuration data using the ESXi command line.

1. Backup ESXi host configuration

1. Enable SSH service on the ESXi host.

2. SSH to the ESXi host.

3. Synchronize the configuration changes with persistent storage using this command:

vim-cmd hostsvc/firmware/sync_config

4. Back up the configuration data for the ESXi host with this command:

vim-cmd hostsvc/firmware/backup_config

5. Copy the generated http:// address into a web browser, replacing the asterisk '*' with the FQDN or IP of your ESXi host, and download the file.

6. The downloaded file is the ESXi configuration backup (e.g., configBundle-vexpert-nuc.infra.home.tgz).
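
Alternatively, the backup file can be fetched from another machine with wget (a sketch only; use the exact URL printed by the backup_config command, with the asterisk replaced by the host's FQDN or IP - the hostname and token below are placeholders):

 wget "http://esxi01.example.com/downloads/<token>/configBundle-esxi01.example.com.tgz"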

2. Restore ESXi host configuration

1. Rename previously downloaded backup file from configBundle-vexpert-nuc.infra.home.tgz to configBundle.tgz

2. Put the host into maintenance mode with this command or from the web client:

vim-cmd hostsvc/maintenance_mode_enter

3. Copy configBundle.tgz to one of the available datastores on the host and reboot ESXi.

4. Then, move your backup file configBundle.tgz to /tmp

5. To restore the ESXi host configuration run this command:

vim-cmd hostsvc/firmware/restore_config 0

6. Exit from maintenance mode with a command:

vim-cmd hostsvc/maintenance_mode_exit

Monday, November 4, 2024

New SKUs / pricing (MSRP) for VMware available

A new pricebook is out, effective November 11, 2024:

The Essentials Plus SKU (VCF-VSP-ESPL-8) is going EOL as of November 11th; therefore, Enterprise Plus is coming back.

There is also a price adjustment for VVF (VMware vSphere Foundation).

Item Number          Description                                                  Price per Core per Year (MSRP, USD)
VCF-CLD-FND-5        VMware Cloud Foundation 5                                    $350,00
VCF-CLD-FND-EDGE     VMware Cloud Foundation Edge - For Edge Deployments Only     $225,00
VCF-VSP-ENT-PLUS     VMware vSphere Enterprise Plus - Multiyear                   $120,00
VCF-VSP-ENT-PLUS-1Y  VMware vSphere Enterprise Plus 1YR                           $150,00
VCF-VSP-FND-1Y       VMware vSphere Foundation 1-Year                             $190,00
VCF-VSP-FND-8        VMware vSphere Foundation 8, Multiyear                       $150,00
VCF-VSP-STD-8        VMware vSphere Standard 8                                    $50,00

What is ESXi Core Dump Size?

ESXi host Purple Screen of Death (PSOD) happens when VMkernel experiences a critical failure. This can be due to hardware issues, driver problems, etc. During the PSOD event, the ESXi hypervisor captures a core dump to help diagnose the cause of the failure. Here’s what happens during this process:

After a PSOD, ESXi captures a core dump, which includes a snapshot of the hypervisor memory and the state of the virtual machines. The core dump is stored based on the host configuration (core dump partition, file, or network), and it helps diagnose the cause of the critical failure by providing insights into the state of the system at the time of the crash. The core dump is crucial for troubleshooting and resolving the issues leading to a PSOD. In ESXi 6.7, the core dump was stored in a partition, but since ESXi 7 it is stored in a file.

For vSphere design, I would like to know the typical core dump file size in order to allocate optimal storage space for core dumps. Of course, the size of the core dump file depends on multiple factors, but the main factor should be the memory used by the vmKernel.

ESXi host memory usage is split into three buckets

  1. vmKernel memory usage (core hypervisor)
  2. Other memory usage
    • BusyBox Console including
      • Core BusyBox Utilities (e.g., ls, cp, mv, ps, top, etc.)
      • Networking and Storage Tools (ifconfig, esxcfg-nics, esxcfg-vswitch, esxcli, etc.)
      • Direct Console User Interface (DCUI)
      • Management Agents and Daemons (hostd, vpxa, network daemons like SSH, DNS, NTP, and network file copy aka NFC)
  3. Free memory

Here are data from three different ESXi hosts I have access to. 

ESXi, 8.0.3 (24022510) with 128 GB (131 008 MB) physical RAM

  • vmKernel memory usage:  747 MB
  • Other memory usage: 20 264 MB
  • Free memory: 109 997 MB

ESXi, 8.0.3 (24022510) with 256 GB (262 034 MB) physical RAM

  • vmKernel memory usage:  1544 MB
  • Other memory usage: 21 498 MB
  • Free memory: 238 991 MB

In vSphere 8.0.3, the core dump is configured to be stored as a 3.6 GB file located in ESX-OSData.
 [root@dp-esx02:~] esxcli system coredump file list  
 Path                                                   Active Configured    Size  
 ------------------------------------------------------------------------------------------------------- ------ ---------- ----------  
 /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  true    true 3882876928  

It is configured and active. 

 [root@dp-esx02:~] esxcli system coredump file get  
   Active: /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  
   Configured: /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  

The core dump file is 3.6 GB:
 [root@dp-esx02:~] ls -lah /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  
 -rw-------  1 root   root    3.6G Oct 29 13:07 /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  
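
If the design requires a larger (or differently located) core dump file, it can be pre-created and activated with esxcli. This is just a sketch - the datastore name and the 6144 MB size are examples only:

 esxcli system coredump file add -d DP-STRG02-Datastore01 -f bigdump -s 6144   # size is in MB
 esxcli system coredump file set --smart --enable true                         # activate the new file
 esxcli system coredump file list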

Now let's try the first PSOD and watch what happens. Below is the command to initiate the PSOD:
 vsish -e set /reliability/crashMe/Panic 1  

VMware Support will ask you for a zdump file (a VMware proprietary binary file), which can be generated with the esxcfg-dumppart command:
 [root@dp-esx02:~] esxcfg-dumppart --file --copy --devname /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile --zdumpname /vmfs/volumes/DP-STRG02-Datastore01/zdump-coredump.dp-esx02  
 Created file /vmfs/volumes/DP-STRG02-Datastore01/zdump-coredump.dp-esx02.1  
 [root@dp-esx02:~] ls -lah /vmfs/volumes/DP-STRG02-Datastore01/zdump-coredump.dp-esx02.1  
 -rw-r--r--  1 root   root   443.9M Oct 29 13:07 /vmfs/volumes/DP-STRG02-Datastore01/zdump-coredump.dp-esx02.1  
The extracted VMkernel zdump file is 443.9 MB.

Now let's try the second PSOD.
 vsish -e set /reliability/crashMe/Panic 1  


ESXi, 7.0.3 (23794027) with 512 GB (524 178 MB) physical RAM

  • vmKernel memory usage:  3 261 MB
  • Other memory usage: 369 029 MB
  • Free memory: 151 888 MB