Monday, November 4, 2024

What is ESXi Core Dump Size?

An ESXi host Purple Screen of Death (PSOD) occurs when the VMkernel experiences a critical failure, for example because of a hardware fault or a driver problem. During the PSOD event, the ESXi hypervisor captures a core dump to help diagnose the cause of the failure. Here’s what happens during this process:

After a PSOD, ESXi captures a core dump, which includes a snapshot of the hypervisor memory and the state of the virtual machines. The core dump is stored according to the host configuration (core dump partition, core dump file, or network), and it gives insight into the state of the system at the time of the crash. This makes it crucial for troubleshooting and resolving the issues that lead to a PSOD. In ESXi 6.7, the core dump was stored in a partition, but since ESXi 7 it is stored in a file.

For vSphere design, I would like to know the typical core dump file size so that I can allocate optimal storage space for core dumps. Of course, the size of the core dump file depends on multiple factors, but the main factor should be the memory used by the VMkernel.

ESXi host memory usage is split into three buckets:

  1. VMkernel memory usage (core hypervisor)
  2. Other memory usage
    • BusyBox Console including
      • Core BusyBox Utilities (e.g., ls, cp, mv, ps, top, etc.)
      • Networking and Storage Tools (ifconfig, esxcfg-nics, esxcfg-vswitch, esxcli, etc.)
      • Direct Console User Interface (DCUI)
      • Management Agents and Daemons (hostd, vpxa, network daemons like SSH, DNS, NTP, and network file copy aka NFC)
  3. Free memory

Here are data from three different ESXi hosts I have access to. 

ESXi, 8.0.3 (24022510) with 128 GB (131 008 MB) physical RAM

  • VMkernel memory usage: 747 MB
  • Other memory usage: 20 264 MB
  • Free memory: 109 997 MB

ESXi, 8.0.3 (24022510) with 256 GB (262 034 MB) physical RAM

  • VMkernel memory usage: 1 544 MB
  • Other memory usage: 21 498 MB
  • Free memory: 238 991 MB

In vSphere 8.0.3, the core dump is configured to be stored as a 3.6 GB file on ESX-OSData:
 [root@dp-esx02:~] esxcli system coredump file list
 Path                                                                                                    Active Configured       Size
 ------------------------------------------------------------------------------------------------------- ------ ---------- ----------
 /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile   true       true 3882876928

It is configured and active. 

 [root@dp-esx02:~] esxcli system coredump file get  
   Active: /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  
   Configured: /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  

The core dump file is 3.6 GB in size:
 [root@dp-esx02:~] ls -lah /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  
 -rw-------  1 root   root    3.6G Oct 29 13:07 /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile  
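
The two size figures above come in different units: esxcli reports bytes, while ls -lah prints a rounded, human-readable value. The following Python sketch converts one into the other. It assumes that ls -h uses binary (1024-based) units, which is consistent with the numbers shown here; the constant and function names are made up for illustration.

```python
# Size in bytes as reported by `esxcli system coredump file list` above.
DUMPFILE_BYTES = 3_882_876_928

MIB = 1024 ** 2  # bytes per mebibyte
GIB = 1024 ** 3  # bytes per gibibyte


def to_mib(size_bytes):
    """Convert a size in bytes to MiB."""
    return size_bytes / MIB


def to_gib(size_bytes):
    """Convert a size in bytes to GiB."""
    return size_bytes / GIB


print(f"{to_mib(DUMPFILE_BYTES):.0f} MiB")  # 3703 MiB
print(f"{to_gib(DUMPFILE_BYTES):.2f} GiB")  # 3.62 GiB
```

The file is exactly 3703 MiB, which ls rounds to 3.6G. In decimal units the same file would be about 3.88 GB, so it is worth being explicit about the unit when reserving storage.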

Now let's trigger the first PSOD and watch what happens. Below are the command that initiates the PSOD and the screenshot:
 vsish -e set /reliability/crashMe/Panic 1  

VMware Support will ask you for a zdump file (a VMware proprietary binary file), which can be generated with the esxcfg-dumppart command:
 [root@dp-esx02:~] esxcfg-dumppart --file --copy --devname /vmfs/volumes/66d993b7-e9cd83a8-b129-0025b5ea0e15/vmkdump/00000000-00E0-0000-0000-000000000008.dumpfile --zdumpname /vmfs/volumes/DP-STRG02-Datastore01/zdump-coredump.dp-esx02  
 Created file /vmfs/volumes/DP-STRG02-Datastore01/zdump-coredump.dp-esx02.1  
 [root@dp-esx02:~] ls -lah /vmfs/volumes/DP-STRG02-Datastore01/zdump-coredump.dp-esx02.1  
 -rw-r--r--  1 root   root   443.9M Oct 29 13:07 /vmfs/volumes/DP-STRG02-Datastore01/zdump-coredump.dp-esx02.1  
The extracted VMkernel zdump file is 443.9 MB in size.
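
To put the zdump size in perspective, it can be compared with the preallocated dump file and with the VMkernel memory usage. The Python sketch below does that for this single observation. It assumes that dp-esx02 is the 256 GB host listed earlier, that the "M" printed by ls -lah means MiB, and that the VMkernel memory figure is in the same unit; the names are illustrative, and the VMkernel figure was not necessarily sampled at the moment of the crash.

```python
# Figures copied from the output above. Treating all values as MiB is an
# assumption, as is mapping dp-esx02 to the 256 GB host.
ZDUMP_MIB = 443.9                          # extracted zdump file
DUMPFILE_MIB = 3_882_876_928 / 1024 ** 2   # preallocated dump file, in MiB
VMKERNEL_MIB = 1544.0                      # VMkernel memory usage (256 GB host)


def percent_of(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"zdump vs. dump file:       {percent_of(ZDUMP_MIB, DUMPFILE_MIB):.0f} %")
print(f"zdump vs. VMkernel memory: {percent_of(ZDUMP_MIB, VMKERNEL_MIB):.0f} %")
```

Under these assumptions the zdump takes roughly 12 % of the dump file and is roughly 29 % of the VMkernel memory usage. This is one data point from a deliberately triggered panic, so it should not be read as a general sizing rule.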

Now let's try the second PSOD.
 vsish -e set /reliability/crashMe/Panic 1  


ESXi, 7.0.3 (23794027) with 512 GB (524 178 MB) physical RAM

  • VMkernel memory usage: 3 261 MB
  • Other memory usage: 369 029 MB
  • Free memory: 151 888 MB
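
Since the VMkernel memory usage is expected to be the main factor for core dump size, it helps to see how it scales with physical RAM on the three hosts. The Python sketch below computes the VMkernel share of physical RAM and checks that the three buckets add up to the reported total. The numbers are copied from the lists above; the dictionary keys and the function name are made up for illustration.

```python
# Memory figures in MB, copied from the three hosts above.
# Keys and helper names are illustrative only.
HOSTS = {
    "8.0.3 / 128 GB": {"physical": 131_008, "vmkernel": 747, "other": 20_264, "free": 109_997},
    "8.0.3 / 256 GB": {"physical": 262_034, "vmkernel": 1_544, "other": 21_498, "free": 238_991},
    "7.0.3 / 512 GB": {"physical": 524_178, "vmkernel": 3_261, "other": 369_029, "free": 151_888},
}


def vmkernel_share_percent(host):
    """Return VMkernel memory usage as a percentage of physical RAM."""
    return 100.0 * host["vmkernel"] / host["physical"]


for name, host in HOSTS.items():
    bucket_sum = host["vmkernel"] + host["other"] + host["free"]
    print(f"{name}: VMkernel {vmkernel_share_percent(host):.2f} % of RAM, "
          f"buckets sum to {bucket_sum} MB of {host['physical']} MB")
```

On these three hosts the VMkernel share lies between about 0.57 % and 0.62 % of physical RAM, so it grows roughly in proportion to the installed memory. The buckets of the 256 GB host sum to 262 033 MB, 1 MB short of the reported total, which is presumably a rounding effect.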


