
Thursday, July 3, 2025

How to install and configure scanner in Linux Mint

Install scanner software

apt install sane sane-utils

Configure scanner 

Add the scanner's IP address to the file /etc/sane.d/net.conf, as shown in the screenshot below.


192.168.4.200 is the IP address of my scanner, which is reachable over the network.

Using the scanner

You can run xsane directly from terminal ...

xsane

... or find it and launch it from the desktop environment's application menu.

Saturday, June 28, 2025

How to Install and Configure NVIDIA Graphics Card in FreeBSD

 

[SKIP - NOT USED] Install driver for NVIDIA Graphics Card

pkg install nvidia-driver
sysrc kld_list+="nvidia nvidia-modeset"
sysrc linux_enable="YES" 

[SKIP - NOT USED] Configure the NVIDIA driver in a configuration file

cat >> /usr/local/etc/X11/xorg.conf.d/20-nvidia.conf << EOF
Section "Device"
    Identifier "Card0"
    Driver     "nvidia"
    BusID     "pci0:0:1:0"  
EndSection
EOF

[SKIP - NOT USED] NVIDIA configuration (it creates /etc/X11/xorg.conf)

pkg install nvidia-xconfig
nvidia-xconfig

Tuesday, June 17, 2025

How to get VMs with specific custom attribute?

Here is the one-liner to list VMs with the custom attribute "Last Backup" ...

Get-VM | Select-Object Name, @{N='LastBackup';E={($_.CustomFields | Where-Object {$_.Key -match "Last Backup"}).Value}} | Where-Object {$_.LastBackup -ne $null -and $_.LastBackup -ne ""}

and here is another one to count the number of such VMs ...

Get-VM | Select-Object Name, @{N='LastBackup';E={($_.CustomFields | Where-Object {$_.Key -match "Last Backup"}).Value}} | Where-Object {$_.LastBackup -ne $null -and $_.LastBackup -ne ""} | Measure-Object | Select-Object Count

 

How to get all VMs restarted by VMware vSphere HA? The PowerCLI one-liner below will do the magic ...

Get-VIEvent -MaxSamples 100000 -Start (Get-Date).AddDays(-1) -Type Warning | Where {$_.FullFormattedMessage -match "restarted"} | select CreatedTime,FullFormattedMessage | sort CreatedTime -Descending | Format-Table


Sunday, June 15, 2025

How to compress PDF file in Linux

I'm using Linux Mint with xsane to scan documents on my old but still good Canon MX350 printer/scanner. Scans are saved as huge PDF documents (for example 50 MB), and I would like to compress them so they consume much less disk space.

Install Ghostscript

apt install ghostscript

Compress the file input.pdf

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output_compressed.pdf input.pdf

Let's break down these options

  • -sDEVICE=pdfwrite: Tells Ghostscript to output a PDF file.
  • -dCompatibilityLevel=1.4: Sets the PDF version. Version 1.4 is quite old but widely compatible and often allows for good compression. You can try 1.5 or 1.6 for slightly more modern features and potentially better compression in some cases.
  • -dPDFSETTINGS=/ebook: This is the main compression control. The /ebook preset (images downsampled to about 150 dpi) usually gives a good balance between file size and quality.
  • -dNOPAUSE -dQUIET -dBATCH: These make Ghostscript run silently and non-interactively.
  • -sOutputFile=output_compressed.pdf: Specifies the name of the compressed output file.
  • input.pdf: original 50 MB PDF.

Lossy compression (322x) from 50 MB to 155 KB without any visible degradation is well worth it to keep cloud (Google Drive) costs low.
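The same Ghostscript call can be wrapped in a small script when many scans have to be compressed. The following is only a sketch: the function names gs_compress_command and compression_ratio are my own, and it merely rebuilds the option list explained above and does the ratio arithmetic.

```python
def gs_compress_command(src, dst, preset="ebook", compat="1.4"):
    """Build the Ghostscript argument list used in this post.

    preset is one of Ghostscript's PDFSETTINGS presets
    (screen, ebook, printer, prepress, default).
    """
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={compat}",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


def compression_ratio(original_bytes, compressed_bytes):
    """How many times smaller the compressed file is."""
    return original_bytes / compressed_bytes
```

The list returned by gs_compress_command("input.pdf", "output_compressed.pdf") can be passed to subprocess.run(..., check=True), and compression_ratio(50_000_000, 155_000) reproduces the roughly 322x figure mentioned above.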


Sunday, June 1, 2025

My VIM configuration file

My preferred editor in unix-like systems is vi or vim. VI is everywhere and VIM is improved for scripting and coding.

Below is my VIM config file /home/dpasek/.vimrc

 syntax on  
 filetype plugin indent on  

 " Show line numbers  
 set number  

 " Show relative line numbers (optional, good for motions like 5j/5k)  
 " set relativenumber  
 " Highlight matching parentheses  
 set showmatch  

 " Enable auto-indentation  
 set smartindent  
 set autoindent  

 " Use spaces instead of tabs, and set width (adjust to taste)  
 set expandtab  
 set tabstop=4  
 set shiftwidth=4  
 set softtabstop=4  

 " Show line and column in status line  
 set ruler  

 " Show partial command in bottom line  
 set showcmd  

 " Show a vertical line at column 80 (optional)  
 set colorcolumn=80  
 
 " Disable VIM mouse handling and keep it to terminal  
 set mouse=  

 " Enable persistent undo (requires directory)  
 set undofile  
 set undodir=~/.vim/undodir  
 
 " Make backspace behave sanely  
 set backspace=indent,eol,start  
 
 " Enable searching while typing  
 set incsearch  
 set hlsearch     " Highlight all matches  
 set ignorecase    " Case insensitive search...  
 set smartcase     " ...unless capital letter used  
 
 " Status line always visible  
 set laststatus=2  

 

Sunday, May 18, 2025

VMware VCF's SDDC Backup over sftp

You can do a native VCF SDDC Manager backup via SFTP protocol. SFTP is a file transfer protocol that operates over the SSH protocol. When using SFTP for VMware VCF's backup, you're effectively using the SSH protocol for transport.

For VCF older than 5.1, you have to allow the ssh-rsa algorithm for host key and user authentication on your SSH server.

This is configured in the SSH daemon configuration file (/etc/ssh/sshd_config) on your backup server, which should contain the following lines to allow the ssh-rsa algorithm for host key and user authentication.

# add ssh-rsa to the list of acceptable host key algorithms
HostKeyAlgorithms +ssh-rsa
 
# allow the ssh-rsa algorithm for user authentication
PubkeyAcceptedAlgorithms +ssh-rsa
 
 
This should not be necessary for SDDC Manager in VCF 5.1 and later.
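To double-check a backup server before pointing an older SDDC Manager at it, the two directives can be looked up programmatically. This is a minimal sketch with a hypothetical helper name (ssh_rsa_enabled); it only inspects the text of a single config file and ignores Include and Match blocks.

```python
def ssh_rsa_enabled(sshd_config_text):
    """Return the lower-cased sshd_config keywords whose value lists ssh-rsa.

    Only the two directives from this post are checked.
    """
    wanted = {"hostkeyalgorithms", "pubkeyacceptedalgorithms"}
    found = set()
    for raw in sshd_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword in wanted:
            # A leading "+" appends to the default algorithm list.
            algorithms = [a.strip() for a in value.lstrip("+").split(",")]
            if "ssh-rsa" in algorithms:
                found.add(keyword)
    return found
```

On a live server, running sshd -T as root prints the effective configuration and is the more reliable check; the helper above is only handy for reviewing a file offline.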
 

Friday, May 9, 2025

RaspberryPi - GPIO control over Web Interface

How to use RaspberryPi inputs and outputs? The easiest way is to use the GPIO pins directly on the RaspberryPi board.

Hardware

Raspberry Pi has 8 freely accessible GPIO ports that can be controlled. In the following picture they are colored green.

GPIO ports

Attention! GPIO pins operate at 3.3 V and do not tolerate 5 V! The maximum current is 16 mA! More GPIO ports can be made available by changing the configuration.

Software

First you need to install the lighttpd (or Apache) server and PHP5:
sudo groupadd www-data
sudo apt-get install lighttpd
sudo apt-get install php5-cgi
sudo lighty-enable-mod fastcgi
sudo adduser pi www-data
sudo chown -R www-data:www-data /var/www
In the lighttpd configuration you need to add:
"bin-path" => "/usr/bin/php5-cgi",
"socket" => "/tmp/php.socket"

Now you need to restart lighttpd:
sudo /etc/init.d/lighttpd force-reload

This will run our webserver with PHP.

Now we get to the actual GPIO control. The ports can be used as input and output. Everything needs to be done as root.

First you need to make the port accessible:
echo "17" > /sys/class/gpio/export

Then we set whether it is an input (in) or output (out):
echo "out" > /sys/class/gpio/gpio17/direction

Set the value like this:
echo 1 > /sys/class/gpio/gpio17/value

Read the status:
cat /sys/class/gpio/gpio17/value

This way we can control GPIO directly from the command line. If we use the web interface for control, we need to set the permissions on all ports so that they can be controlled by a user other than root.
chmod 666 /sys/class/gpio/gpio17/value
chmod 666 /sys/class/gpio/gpio17/direction
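The export / direction / value sequence above maps directly onto a few file writes. Below is a minimal sketch of a wrapper around those sysfs files; the class name SysfsGpio and its base parameter are my own invention, and base exists only so the code can be tried against a fake directory tree without a Raspberry Pi.

```python
from pathlib import Path


class SysfsGpio:
    """Thin wrapper around /sys/class/gpio/gpioN/{direction,value}."""

    def __init__(self, pin, base="/sys/class/gpio"):
        self.pin = int(pin)
        self.base = Path(base)
        self.pin_dir = self.base / f"gpio{self.pin}"

    def export(self):
        # Equivalent of: echo "17" > /sys/class/gpio/export
        if not self.pin_dir.exists():
            (self.base / "export").write_text(str(self.pin))

    def set_direction(self, direction):
        # Equivalent of: echo "out" > /sys/class/gpio/gpio17/direction
        if direction not in ("in", "out"):
            raise ValueError("direction must be 'in' or 'out'")
        (self.pin_dir / "direction").write_text(direction)

    def write(self, value):
        # Equivalent of: echo 1 > /sys/class/gpio/gpio17/value
        (self.pin_dir / "value").write_text("1" if value else "0")

    def read(self):
        # Equivalent of: cat /sys/class/gpio/gpio17/value
        return int((self.pin_dir / "value").read_text().strip())
```

On real hardware the calls would be SysfsGpio(17).export(), then set_direction("out") and write(1), run as root or after the chmod commands above. Note that the sysfs GPIO interface is deprecated on newer kernels, so this only applies to systems that still expose /sys/class/gpio.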

Saturday, May 3, 2025

How to create a template on XCP-ng with XenOrchestra

"In this post I will show you how to create a template in XenOrchestra and using an image we created and customized ourself. " ... full blog post is available at https://blog.bufanda.de/how-to-create-a-template-on-xcp-ng-with-xenorchestra/

Thursday, February 13, 2025

VMware vSAN ESA on Cisco UCS - TCP Connection Half Open Drop Rate

During the investigation of high disk response times in one VM using vSAN storage, I saw a strange vSAN metric (TCP Connection Half Open Drop Rate).

What is it?

I opened a support ticket with VMware Support (2025-02-13) and started my own troubleshooting in parallel.

 

vSAN ESA - TCP Connection Half Open Drop issue

Here is the screenshot of vSAN ESA - Half Open Drop Rate over 50% on some vSAN Nodes ...

vSAN ESA - Half Open Drop Rate over 50% on some vSAN Nodes

Physical infrastructure schema

Here is the physical infrastructure schema of VMware vSAN ESA cluster ...

The schema of Physical infrastructure

Virtual Networking schema

Here is the virtual networking schema of a VMware vSphere ESXi host (vSAN Node) participating in the vSAN ESA cluster ...

Virtual Networking of ESXi Host (vSAN Node)

vSAN Cluster state

  • ESX01 dcserv-esx05 192.168.123.21 (agent)  [56% half-open drop]
  • ESX02 dcserv-esx06 192.168.123.22 (backup) [98% half-open drop]
  • ESX03 dcserv-esx07 192.168.123.23 (agent)  [54% half-open drop]
  • ESX04 dcserv-esx08 192.168.123.24 (agent)  [0% half-open drop]
  • ESX05 dcserv-esx09 192.168.123.25 (master) [0% half-open drop] but once per some time (hour or so) 42% - 49% drop
  • ESX06 dcserv-esx10 192.168.123.26 (agent)  [0% half-open drop]

Do I have a problem? I’m not certain, but it doesn’t appear to be the case.

I have seen high virtual disk latency on a VM (a Docker host with a single NVMe vDisk) with a storage load of less than 12,000 IOPS (IOPS limit set to 25,000). That was the reason I checked the vSAN ESA infrastructure more deeply and found the TCP Half Open Drop "issue".

High vDisk (vNVMe) response times in first week of February

However, IOmeter on a Windows server with a single SCSI vDisk on the SCSI0:0 adapter is able to generate almost 25,000 IOPS @ 0.6 ms response time with a 28.5KB-100%_read-100%_random storage pattern and 12 workers (threads).

12 workers on SCSI vDisk - we see performance of 25,000 IOPS @ 0.6 ms response time
 
It is worth mentioning that approximately 2,600 IOPS (512 B I/O size) to 1,400 IOPS (1 MB I/O size) per storage worker is an "artificial" throughput limit not only of vSAN but of any shared enterprise storage, and for good reason (the explanation is another topic). It is therefore essential to use more workers (threads, outstanding I/Os) to achieve higher performance/throughput. Below is the performance result of a single worker (thread) with 4 KB I/O size.
 
Single worker (thread) with 4KB I/O size
 
So, let's use more workers (more storage threads = leveraging higher queue depth = higher parallelization) and test how many 28.5KB IOPS we can achieve on a single vDisk.
 
With 64 workers (IOmeter_64workers_28.5KB_IO_100%_read_100%_random) I can generate 108,000 IOPS @ 0.6 ms response time.
 
64 workers on SCSI vDisk we see performance of 108,000 IOPS @ 0.6 ms response time
 
It is important to mention that all the above tests were done on a SCSI vDisk on a PVSCSI adapter, which has a queue depth of 256, so performance can, if the storage subsystem allows it, theoretically scale up to 256 workers.
 
However, if we use 128 workers (IOmeter_64workers_28.5KB_IO_100%_read_100%_random), we can see that the storage subsystem does not handle it: performance of 98,000 IOPS is even lower than with 64 workers, and response time increases to 1.3 ms.
 
128 workers on SCSI vDisk we see performance of 98,300 IOPS @ 1.3 ms response time
 
If we use the same storage workload with 128 workers (IOmeter_64workers_28.5KB_IO_100%_read_100%_random) but with an NVMe vDisk instead of a SCSI vDisk, we can see that the storage subsystem can handle 108,000 IOPS @ 1.2 ms, but from a response time perspective this is still worse than 64 workers on the SCSI vDisk (1.2 ms vs 0.6 ms).
 
128 workers on NVMe vDisk we see performance of 108,000 IOPS @ 1.2 ms response time
 
If we test 64 workers on NVMe vDisk we see performance of 110,000 IOPS @ 0.6 ms response time.
 
64 workers on NVMe vDisk we see performance of 110,000 IOPS @ 0.6 ms response time
 
Anyway, all the tests above show pretty good storage performance on a vSAN ESA cluster experiencing TCP Connection Half Open Drop Rate.
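The relationship between workers, response time, and IOPS in these tests follows Little's law: throughput is roughly the number of outstanding I/Os divided by the response time. The sketch below checks that against the measured numbers. It assumes each IOmeter worker keeps exactly one I/O outstanding, which the test descriptions above do not state, so treat the result as a sanity check rather than a model.

```python
def expected_iops(outstanding_ios, response_time_ms):
    """Little's law: throughput = concurrency / latency."""
    return outstanding_ios / (response_time_ms / 1000.0)


# (label, workers, response time in ms, measured IOPS) taken from the tests above
measurements = [
    ("64 workers, SCSI vDisk", 64, 0.6, 108_000),
    ("128 workers, SCSI vDisk", 128, 1.3, 98_300),
    ("128 workers, NVMe vDisk", 128, 1.2, 108_000),
    ("64 workers, NVMe vDisk", 64, 0.6, 110_000),
]

for label, workers, latency_ms, measured in measurements:
    estimate = expected_iops(workers, latency_ms)
    print(f"{label}: estimated {estimate:,.0f} IOPS, measured {measured:,} IOPS")
```

Under that assumption the 64- and 128-worker results land within a few percent of the estimate, which fits the observation that doubling the workers from 64 to 128 only doubled the response time instead of adding throughput. The 12-worker run (25,000 IOPS @ 0.6 ms) comes out above the estimate, so either the displayed latency is rounded or the workers had more than one I/O in flight.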

Network Analysis - packet capturing

What is happening in a vSAN node (dcserv-esx06) in maintenance mode with all vSAN storage migrated out of the node?

[root@dcserv-esx06:/usr/lib/vmware/vsan/bin] pktcap-uw --uplink vmnic4 --capture UplinkRcvKernel,UplinkSndKernel -o - | tcpdump-uw -r - 'src host 192.168.123.22 and tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0'
The name of the uplink is vmnic4.
The session capture point is UplinkRcvKernel,UplinkSndKernel.
pktcap: The output file is -.
pktcap: No server port specifed, select 30749 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 30749.
pktcap: Main thread: 305300921536.
pktcap: Dump Thread: 305301452544.
pktcap: The output file format is pcapng.
pktcap: Recv Thread: 305301980928.
pktcap: Accept...
reading from file -pktcap: Vsock connection from port 1032 cid 2.
, link-type EN10MB (Ethernet), snapshot length 65535
09:19:52.104211 IP 192.168.123.22.52611 > 192.168.123.23.2233: Flags [SEW], seq 2769751215, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 401040956 ecr 0], length 0
09:20:52.142511 IP 192.168.123.22.55264 > 192.168.123.23.2233: Flags [SEW], seq 3817033932, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 1805625573 ecr 0], length 0
09:21:52.182787 IP 192.168.123.22.57917 > 192.168.123.23.2233: Flags [SEW], seq 2055691008, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 430011832 ecr 0], length 0
09:22:26.956218 IP 192.168.123.22.59456 > 192.168.123.23.2233: Flags [SEW], seq 3524784519, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 2597182302 ecr 0], length 0
09:22:52.225550 IP 192.168.123.22.60576 > 192.168.123.23.2233: Flags [SEW], seq 3089565460, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 378912106 ecr 0], length 0
09:23:52.397431 IP 192.168.123.22.63229 > 192.168.123.23.2233: Flags [SEW], seq 2552721354, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 2409421282 ecr 0], length 0
09:24:52.436734 IP 192.168.123.22.12398 > 192.168.123.23.2233: Flags [SEW], seq 3269754737, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3563144147 ecr 0], length 0
09:25:52.476565 IP 192.168.123.22.15058 > 192.168.123.23.2233: Flags [SEW], seq 1510936927, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 1972989571 ecr 0], length 0
09:26:52.515032 IP 192.168.123.22.17707 > 192.168.123.23.2233: Flags [SEW], seq 262766144, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3787605572 ecr 0], length 0
09:27:52.554904 IP 192.168.123.22.20357 > 192.168.123.23.2233: Flags [SEW], seq 2099691233, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 2472387791 ecr 0], length 0
09:28:52.598409 IP 192.168.123.22.23017 > 192.168.123.23.2233: Flags [SEW], seq 1560369055, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 688302913 ecr 0], length 0
09:29:52.641938 IP 192.168.123.22.25663 > 192.168.123.23.2233: Flags [SEW], seq 394113563, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3836880073 ecr 0], length 0
09:30:52.682276 IP 192.168.123.22.28221 > 192.168.123.23.2233: Flags [SEW], seq 4232787521, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 830544087 ecr 0], length 0
09:31:52.726506 IP 192.168.123.22.30871 > 192.168.123.23.2233: Flags [SEW], seq 3529232466, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3037414646 ecr 0], length 0
09:32:52.768689 IP 192.168.123.22.33520 > 192.168.123.23.2233: Flags [SEW], seq 3467993307, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3716244554 ecr 0], length 0
09:33:52.809641 IP 192.168.123.22.36184 > 192.168.123.23.2233: Flags [SEW], seq 2859309873, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 1556603624 ecr 0], length 0
09:34:52.849282 IP 192.168.123.22.38830 > 192.168.123.23.2233: Flags [SEW], seq 891574849, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 226049490 ecr 0], length 0
09:35:52.889434 IP 192.168.123.22.41487 > 192.168.123.23.2233: Flags [SEW], seq 1629372626, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 100385827 ecr 0], length 0
09:36:52.931192 IP 192.168.123.22.44140 > 192.168.123.23.2233: Flags [SEW], seq 3898717755, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3230029896 ecr 0], length 0
09:37:52.972758 IP 192.168.123.22.46788 > 192.168.123.23.2233: Flags [SEW], seq 3798420138, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 1400467195 ecr 0], length 0
09:38:53.013565 IP 192.168.123.22.49449 > 192.168.123.23.2233: Flags [SEW], seq 1759807546, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 1072184991 ecr 0], length 0
09:39:53.055394 IP 192.168.123.22.52096 > 192.168.123.23.2233: Flags [SEW], seq 2996482935, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3573008833 ecr 0], length 0
09:40:53.095123 IP 192.168.123.22.54754 > 192.168.123.23.2233: Flags [SEW], seq 103237119, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3275581229 ecr 0], length 0
09:41:53.136593 IP 192.168.123.22.57408 > 192.168.123.23.2233: Flags [SEW], seq 2105630912, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 1990595855 ecr 0], length 0
09:42:53.178033 IP 192.168.123.22.60054 > 192.168.123.23.2233: Flags [SEW], seq 4245039293, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 296668711 ecr 0], length 0
09:43:38.741557 IP 192.168.123.22.62070 > 192.168.123.23.2233: Flags [SEW], seq 343657957, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3406471577 ecr 0], length 0
09:43:53.219844 IP 192.168.123.22.62713 > 192.168.123.23.2233: Flags [SEW], seq 452468561, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3555078978 ecr 0], length 0
09:44:53.264107 IP 192.168.123.22.11779 > 192.168.123.23.2233: Flags [SEW], seq 3807775128, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3836709718 ecr 0], length 0
09:45:53.306117 IP 192.168.123.22.14431 > 192.168.123.23.2233: Flags [SEW], seq 3580778695, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3478626421 ecr 0], length 0
09:46:53.348438 IP 192.168.123.22.17083 > 192.168.123.23.2233: Flags [SEW], seq 1098229669, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 2219974257 ecr 0], length 0
09:47:53.386992 IP 192.168.123.22.19737 > 192.168.123.23.2233: Flags [SEW], seq 1338972264, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 708281300 ecr 0], length 0
09:48:53.426861 IP 192.168.123.22.22389 > 192.168.123.23.2233: Flags [SEW], seq 3973038592, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3153895628 ecr 0], length 0
09:49:53.469640 IP 192.168.123.22.25046 > 192.168.123.23.2233: Flags [SEW], seq 2367639206, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3155172682 ecr 0], length 0
09:50:53.510996 IP 192.168.123.22.27703 > 192.168.123.23.2233: Flags [SEW], seq 515312838, win 65535, options [mss 8960,nop,wscale 9,sackOK,TS val 3434645295 ecr 0], length 0
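The capture above shows a SYN roughly every 60 seconds, with two extra ones in between (09:22:26 and 09:43:38). To quantify that without reading timestamps by eye, the tcpdump text output can be parsed. This is a minimal sketch; the function name syn_intervals is hypothetical, and it assumes the default tcpdump line format shown above and a capture that does not cross midnight.

```python
import re
from collections import defaultdict
from datetime import datetime

SYN_LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]+)\]"
)


def syn_intervals(lines):
    """Group packets by (src, dst, dst port) and return the gaps in seconds."""
    stamps = defaultdict(list)
    for line in lines:
        match = SYN_LINE.match(line.strip())
        if not match:
            continue  # skip pktcap-uw banner lines and anything else
        ts = datetime.strptime(match["ts"], "%H:%M:%S.%f")
        key = (match["src"], match["dst"], int(match["dport"]))
        stamps[key].append(ts)
    gaps = {}
    for key, times in stamps.items():
        times.sort()
        gaps[key] = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return gaps
```

Feeding it the lines of a saved dump (for example open("/tmp/capture.txt"), a hypothetical file name) returns one list of gaps per source/destination/port triple, which makes the regular 60-second cadence and the occasional extra attempt easy to spot.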

How does TCP SYN/SYN-ACK behave between DCSERV-ESX06 and other vSAN nodes?

The ESXi command to sniff TCP SYN from DCSERV-ESX06 (192.168.123.22) to DCSERV-ESX07 (192.168.123.23) is

pktcap-uw --uplink vmnic4 --capture UplinkRcvKernel,UplinkSndKernel -o - | tcpdump-uw -r - 'src host 192.168.123.22 and dst host 192.168.123.23 and tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0'

Command to sniff TCP SYN-ACK is

pktcap-uw --uplink vmnic4 --capture UplinkRcvKernel,UplinkSndKernel -o - | tcpdump-uw -r - 'src host 192.168.123.23 and dst host 192.168.123.22 and tcp[tcpflags] & (tcp-syn|tcp-ack) = (tcp-syn|tcp-ack)' 
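The two filter expressions work on the TCP flags byte. As an illustration only (the helper names are mine), the same bit tests look like this in Python; the flag values are the standard TCP header bits.

```python
# Standard TCP flag bits in the flags byte of the TCP header
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << bit for bit in range(8))


def is_syn_without_ack(flags):
    """tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0"""
    return (flags & SYN) != 0 and (flags & ACK) == 0


def is_syn_ack(flags):
    """tcp[tcpflags] & (tcp-syn|tcp-ack) = (tcp-syn|tcp-ack)"""
    return (flags & (SYN | ACK)) == (SYN | ACK)
```

A packet printed by tcpdump as Flags [SEW] carries SYN, ECE and CWR (SYN | ECE | CWR), so it matches the first filter even though extra bits are set, while a plain SYN-ACK (SYN | ACK) matches only the second one.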

Here are the observations and screenshots from the sniffing exercise.

No new TCP connections have been initiated between DCSERV-ESX06 (backup vSAN node) and DCSERV-ESX05 (agent vSAN node) in some limited sniffing time (several minutes).

Between DCSERV-ESX06 (192.168.123.22, backup vSAN node) and DCSERV-ESX07 (192.168.123.23, agent vSAN node) new TCP Connection is established (SYN/SYN-ACK) every minute.

No new TCP connections have been initiated between DCSERV-ESX06 (192.168.123.22, backup vSAN node) and DCSERV-ESX08 (192.168.123.24, agent vSAN node) in some limited sniffing time (several minutes).

No new TCP connections have been initiated between DCSERV-ESX06 (192.168.123.22, backup vSAN node) and DCSERV-ESX09 (192.168.123.25, master vSAN node) in some limited sniffing time (several minutes).

No new TCP connections have been initiated between DCSERV-ESX06 (192.168.123.22, backup vSAN node) and DCSERV-ESX10 (192.168.123.26, agent vSAN node) in some limited sniffing time (several minutes).

Interesting observation

New TCP Connection between DCSERV-ESX06 (192.168.123.22, backup vSAN node) and DCSERV-ESX07 (192.168.123.23, agent vSAN node) is usually established (SYN/SYN-ACK) every minute.

Why is this happening only between DCSERV-ESX06 (backup node) and DCSERV-ESX07 (agent node) and not with other nodes? I do not know.

Further TCP network troubleshooting

The next step is to collect TCP SYN, TCP SYN/ACK, TCP stats, and NET stats on DCSERV-ESX06 (the most "problematic" vSAN node) and DCSERV-ESX10 (a non-"problematic" vSAN node) into files. I will capture data for one hour (60 minutes) to be able to compare the number of SYN and SYN/ACK packets with the TCP and network statistics.

Capturing of TCP SYN

timeout -t 3600 pktcap-uw --uplink vmnic4 --capture UplinkRcvKernel,UplinkSndKernel -o - | tcpdump-uw -r - 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0' > /tmp/dcserv-esx06_tcp-syn.dump

timeout -t 3600 pktcap-uw --uplink vmnic4 --capture UplinkRcvKernel,UplinkSndKernel -o - | tcpdump-uw -r - 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0' > /tmp/dcserv-esx10_tcp-syn.dump

Capturing of TCP SYN/ACK

timeout -t 3600 pktcap-uw --uplink vmnic4 --capture UplinkRcvKernel,UplinkSndKernel -o - | tcpdump-uw -r - 'tcp[tcpflags] & (tcp-syn|tcp-ack) = (tcp-syn|tcp-ack)' > /tmp/dcserv-esx06_tcp-syn_ack.dump

timeout -t 3600 pktcap-uw --uplink vmnic4 --capture UplinkRcvKernel,UplinkSndKernel -o - | tcpdump-uw -r - 'tcp[tcpflags] & (tcp-syn|tcp-ack) = (tcp-syn|tcp-ack)' > /tmp/dcserv-esx10_tcp-syn_ack.dump

Capturing of TCP Statistics

for i in $(seq 60); do { date; vsish  -e get /net/tcpip/instances/defaultTcpipStack/stats/tcp; }  >> /tmp/dcserv-esx06_tcp_stats; sleep 60; done

for i in $(seq 60); do { date; vsish  -e get /net/tcpip/instances/defaultTcpipStack/stats/tcp; }  >> /tmp/dcserv-esx10_tcp_stats; sleep 60; done 

Capturing of Network Statistics

net-stats captures 60 minutes of data: 30 sec x 120 times = 3600 sec = 60 min

for i in $(seq 120); do { date; net-stats -A -t WwQqihVv -i 30; } >> /tmp/dcserv-esx06_netstats ; done

for i in $(seq 120); do { date; net-stats -A -t WwQqihVv -i 30; } >> /tmp/dcserv-esx10_netstats ; done

Output Files Comparison

ESX06
tcpdump
15:48:32.422347 - 16:48:16.542078: 199 TCP SYN
15:49:16.434140 - 16:48:46.533262: 199 TCP SYN/ACK

Fri Mar  7 15:49:10 UTC 2025
tcp_statistics
   connattempt:253432751
   accepts:3996127
   connects:8341861
   drops:4778493
   conndrops:247894569
   minmssdrops:0
   closed:257671058

Fri Mar  7 16:48:10 UTC 2025
tcp_statistics
   connattempt:253587720
   accepts:3997071
   connects:8345071
   drops:4781004
   conndrops:248047267
   minmssdrops:0
   closed:257827086

tcp_statistics difference
   connattempt:154969
   accepts:944
   connects:3210
   drops:2511
   conndrops:152698
   minmssdrops:0
   closed:156028

ESX10
tcpdump
15:49:44.554242 - 16:49:16.544940: 179 TCP SYN
15:50:16.441776 - 16:49:54.142493: 185 TCP SYN/ACK

Fri Mar  7 15:50:49 UTC 2025
tcp_statistics
   connattempt:826534
   accepts:2278888
   connects:3105348
   drops:1414905
   conndrops:74
   minmssdrops:0
   closed:3338137

Fri Mar  7 16:49:49 UTC 2025
tcp_statistics
   connattempt:826864
   accepts:2279789
   connects:3106579
   drops:1415439
   conndrops:74
   minmssdrops:0
   closed:3339470

Difference
   connattempt:330
   accepts:901
   connects:1231
   drops:534
   conndrops:0
   minmssdrops:0
   closed:1333
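The differences above were calculated by hand. The sketch below does the same subtraction on two vsish snapshots and adds one ratio, conndrops divided by connattempt. The function names are hypothetical. In BSD-derived TCP stacks the conndrops counter is usually described as embryonic connections dropped; that this ratio is what vSAN displays as TCP Connection Half Open Drop Rate is my assumption, not something confirmed in this post.

```python
def parse_tcp_stats(text):
    """Parse 'name:value' counter lines; date and header lines are ignored."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and value.strip().isdigit():
            stats[name] = int(value)
    return stats


def diff_stats(before, after):
    """Counter increase between two snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


def conndrop_ratio(delta):
    """Share of connection attempts dropped before being established."""
    attempts = delta.get("connattempt", 0)
    return delta.get("conndrops", 0) / attempts if attempts else 0.0
```

With the ESX06 snapshots above the ratio is about 98.5% (152,698 of 154,969 attempts), and with the ESX10 snapshots it is 0%, which is in line with the percentages listed in the vSAN cluster state section.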
 

What does it mean? I don't know. I have a VMware support case open and am waiting for their analysis.

There were various calls with various parts of VMware support but here is the first meaningful response from VMware support (2025-04-03 - 50 days after opening a support ticket)

Your capture is highly filtered and many details are missing. Please consider the following points when collecting the capture:

  1. Use the pktcap-uw command and capture in .pcap format. Collecting all the data in a single file will help us trace packets to specific connections.
  2. Capture all TCP packets, not just SYN/SYN-ACK. Half-open drops are usually caused by RESET packets
  3. TCP uses the same set of statistics for the entire network stack. Therefore, we must collect packets from all vmk interfaces in the default network stack, or from a common uplink.

You can use a command similar to below one:

pktcap-uw --vmk <vmk> --proto 0x6 --dir 2 -o <file.pcap>
pktcap-uw --uplink <vmnic> --proto 0x6 --dir 2 -o <file.pcap>

Ok. No problem. Let's capture everything going through the uplink used by vSAN.

My vSAN ESA vmkernel interface is pinned to vmnic4, therefore I used the following command

cd /vmfs/volumes/MY-DATASTORE
pktcap-uw --uplink vmnic4 --proto 0x6 --dir 2 -o netdump.pcap

It is good to monitor datastore usage, as it dumps 30 GB of network traffic in 4 minutes.

Another meaningful communication with VMware support (2025-05-08 - 85 days after opening a support ticket)

VMware support asked me for another packet capture. They want a packet capture not only from the uplink used for vSAN traffic (vmnic4), but also from uplinks vmnic0, vmnic1, and vmnic5, which carry vSphere management traffic.

Below is the one-liner I used to capture network traffic and split it into ~2 GB (2,000 MB) files as requested by VMware support.

cd /vmfs/volumes/MY-DATASTORE 
 
timeout -t 360 pktcap-uw --uplink vmnic0 --proto 0x6 --dir 2 -o - | tcpdump-uw -r - -w vmnic0-pcap -C 2000 & \
timeout -t 360 pktcap-uw --uplink vmnic1 --proto 0x6 --dir 2 -o - | tcpdump-uw -r - -w vmnic1-pcap -C 2000 & \
timeout -t 360 pktcap-uw --uplink vmnic4 --proto 0x6 --dir 2 -o - | tcpdump-uw -r - -w vmnic4-pcap -C 2000 & \
timeout -t 360 pktcap-uw --uplink vmnic5 --proto 0x6 --dir 2 -o - | tcpdump-uw -r - -w vmnic5-pcap -C 2000 &

Explanation of the one-liner above:

  • timeout -t 360: Limits packet capturing to 6 minutes to keep the overall packet capture size below 30 GB.
  • -o -: Sends raw pcap data to stdout.
  • tcpdump-uw -r -: Reads from stdin.
  • -w vmnic0-pcap: Writes the capture to files with this base name (one base name per uplink).
  • -C 2000: Splits output files every 2000 MB (~2 GB).

I've sent this new packet capture to VMware Support again and waited for their response.

Another meaningful communication with VMware support (2025-05-15 - 92 days after opening a support ticket)

VMware response ...

Hello David,

Etcd is the misbehaving application. Looks like some of the hosts (100.68.81.23 and 100.68.81.21) dont have etcd configured and this host is trying to reach them. Can you help check why this configuration is missing on some of the hosts.

34 0.087251 0.000057000 100.68.81.23 100.68.81.22 2380 → 58192 [RST, ACK] Seq=0 Ack=2589825032 Win=0 Len=0 34
35 0.087370 0.000119000 100.68.81.23 100.68.81.22 2380 → 58193 [RST, ACK] Seq=0 Ack=1816019462 Win=0 Len=0 35
38 0.093287 0.000060000 100.68.81.21 100.68.81.22 2380 → 58194 [RST, ACK] Seq=0 Ack=3524013708 Win=0 Len=0 38
39 0.093407 0.000120000 100.68.81.21 100.68.81.22 2380 → 58195 [RST, ACK] Seq=0 Ack=2552292164 Win=0 Len=0 39
42 0.186674 0.000065000 100.68.81.23 100.68.81.22 2380 → 58196 [RST, ACK] Seq=0 Ack=428680618 Win=0 Len=0 42
43 0.186793 0.000119000 100.68.81.23 100.68.81.22 2380 → 58197 [RST, ACK] Seq=0 Ack=1113298373 Win=0 Len=0 43
46 0.193167 0.000056000 100.68.81.21 100.68.81.22 2380 → 58198 [RST, ACK] Seq=0 Ack=1739165024 Win=0 Len=0 46
47 0.193286 0.000119000 100.68.81.21 100.68.81.22 2380 → 58199 [RST, ACK] Seq=0 Ack=3827463043 Win=0 Len=0 47
50 0.286874 0.000073000 100.68.81.23 100.68.81.22 2380 → 58201 [RST, ACK] Seq=0 Ack=1641220058 Win=0 Len=0 50
51 0.286874 0.000000000 100.68.81.23 100.68.81.22 2380 → 58200 [RST, ACK] Seq=0 Ack=1825411290 Win=0 Len=0 51

./var/run/log/etcd.log:1556:2025-02-13T12:59:27Z Wa(4) etcd[28532348]: health check for peer 7312e1f21f195833 could not connect: dial tcp 100.68.81.21:2380: connect: connection refused
./var/run/log/etcd.log:1557:2025-02-13T12:59:30Z Wa(4) etcd[28532348]: health check for peer 5c34e4f236d566f0 could not connect: dial tcp 100.68.81.23:2380: connect: connection refused
./var/run/log/etcd.log:1558:2025-02-13T12:59:30Z Wa(4) etcd[28532348]: health check for peer 5c34e4f236d566f0 could not connect: dial tcp 100.68.81.23:2380: connect: connection refused
./var/run/log/etcd.log:1560:2025-02-13T12:59:32Z Wa(4) etcd[28532348]: health check for peer 7312e1f21f195833 could not connect: dial tcp 100.68.81.21:2380: connect: connection refused
./var/run/log/etcd.log:1561:2025-02-13T12:59:32Z Wa(4) etcd[28532348]: health check for peer 7312e1f21f195833 could not connect: dial tcp 100.68.81.21:2380: connect: connection refused
./var/run/log/etcd.log:1562:2025-02-13T12:59:35Z Wa(4) etcd[28532348]: health check for peer 5c34e4f236d566f0 could not connect: dial tcp 100.68.81.23:2380: connect: connection refused
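When the etcd.log excerpt is longer than a handful of lines, it helps to count the refused health checks per peer. The following is a minimal sketch with a hypothetical function name (refused_peers); the regular expression is written against the log lines quoted above and may need adjusting for other etcd versions.

```python
import re
from collections import Counter

REFUSED = re.compile(
    r"health check for peer (?P<peer>[0-9a-f]+) could not connect: "
    r"dial tcp (?P<addr>[\d.]+):(?P<port>\d+): connect: connection refused"
)


def refused_peers(log_lines):
    """Count 'connection refused' health checks per (peer id, address, port)."""
    counts = Counter()
    for line in log_lines:
        match = REFUSED.search(line)
        if match:
            counts[(match["peer"], match["addr"], int(match["port"]))] += 1
    return counts
```

Applied to the six log lines quoted above, it reports three refusals each for peer 7312e1f21f195833 at 100.68.81.21:2380 and peer 5c34e4f236d566f0 at 100.68.81.23:2380, the same two addresses that send the RST, ACK packets in the capture.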

My thought process ...
 
Interesting. Why is there any ETCD in my vSphere/vSAN deployment? AFAIK, ETCD is only used when vSphere with Tanzu (TKG, Supervisor Cluster, Workload Management) is enabled. But this is not my case. I have pure vSphere with vSAN enabled.
 
I was thinking how can I help VMware support to check why ETCD configuration is missing on some of the hosts? Well, I think there should not be any ETCD in my deployment. So, lets check the ETCD status on all 6 ESXi hosts in my cluster.
 
I used following three commands on each ESXi host ...
 
ls -la /var/run/log/etcd.log     # Does the etcd log file exist?
tail -f /var/run/log/etcd.log    # What is the last etcd.log entry?
ps | grep etcd                   # Is the etcd process running on the ESXi host?
 
... and summarized the findings below.

DCSERV-ESX05
etcd process: not running
last log entry: 2025-01-23T04:57:01Z In(6) etcd[19020602]: started streaming with peer 28f1baf9f89e1c97 (writer)

DCSERV-ESX06
etcd process: is running ... Why?
last log entry: 2025-05-15T21:05:20Z Wa(4) etcd[44266208]: health check for peer 5c34e4f236d566f0 could not connect: dial tcp 100.68.81.23:2380: connect: connection refused
 
DCSERV-ESX07
etcd process: not running
last log entry: 2024-12-18T17:26:22Z In(6) etcd[8404413]: started streaming with peer 549aa92459681df0 (writer)
 
DCSERV-ESX08
etcd process: not running
last log entry: 2024-11-25T15:26:45Z In(6) etcd[2115318]: stopped peer 71ecff499039aa21
 
DCSERV-ESX09
etcd process: is running ... Why?
last log entry: 2025-05-15T21:11:53Z Db(7) etcd[25597540]: start time = 2025-05-15 21:11:53.01956 +0000 UTC m=+20117.190157001, time spent = 120µs, remote = 100.68.81.25:28729, response type = /etcdserverpb.Cluster/MemberList, request count = -1, request size = -1, response count = -1, response size = -1, request content =
 
DCSERV-ESX10
etcd process: not running
last log entry: none, log file empty : -rw-------    1 root     root             0 Nov 21 15:35 /var/run/log/etcd.log

What does this all mean?

ETCD is running on two ESXi hosts: DCSERV-ESX06 and DCSERV-ESX09
 
TCP Connection Half Open Drop Rate is observed on three ESXi hosts: DCSERV-ESX05 (~55%), DCSERV-ESX06 (~98%), DCSERV-ESX07 (~55%)

The only common denominator is DCSERV-ESX06
 
It does not seem to correlate.
 
I would like to get answers to the following questions:
  • Why is ETCD running on two ESXi hosts when I have just vSphere and vSAN? There is no Tanzu (aka VMware vSphere Kubernetes Service) enabled.
  • I realized that the two running ETCDs could be associated with two vCLS Pods, and when consulting ChatGPT, I got the following answers
    • In 8.0.2 and newer, VMware started shifting vCLS to “vCLS Pods”, running containers inside the VM, using a small internal container runtime.
    • VMware uses ETCD inside these pods as part of the vCLS control plane
    • vCLS Pods communicate over port 2380, which is etcd’s peer port

I will share my findings and thoughts with VMware support and wait for their response, because we cannot trust ChatGPT and vendor support is the main authority for their product. 

Another meaningful communication with VMware support (2025-05-23 - 100 days after opening a support ticket)

VMware response ...

just to follow up on previous mail

I checked this internally, etcd can run even if WCP/TKG isn't in use, this could be a 3 etcd node cluster, so may not be running on some hosts,

The number of half open drops are increasing because the connection requests are being denied by the other host as the service is not currently running on them.

Can you send me the output of the below command on the vcenter

/usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status

Can you also upload a full vcenter log bundle along with the host logs

What is the /usr/lib/vmware/clusterAgent/bin/clusterAdmin command?

The clusterAdmin tool in VMware ESXi is a command-line utility used for managing and administering vSphere clustering functionality, particularly vSphere HA (High Availability) and DRS (Distributed Resource Scheduler) operations at the host level. This tool is part of the cluster agent infrastructure that runs on each ESXi host and handles communication between the host and vCenter Server for cluster-related operations. 

Primary Functions: 

  • Managing cluster membership and host participation in vSphere clusters
  • Configuring and troubleshooting vSphere HA settings on individual hosts
  • Handling cluster state information and synchronization
  • Managing resource pool configurations and DRS policies
  • Performing cluster-related diagnostic operations


Common Use Cases:

  • Troubleshooting cluster connectivity issues
  • Manually triggering cluster reconfiguration operations
  • Checking cluster agent status and health
  • Resetting cluster configuration when hosts become disconnected
  • Diagnosing HA or DRS failures


Typical Usage: The tool is usually invoked with various subcommands and parameters, such as:

  • Status checking operations
  • Configuration reset commands
  • Cluster membership management
  • Resource allocation adjustments

This utility is primarily intended for VMware support engineers and advanced administrators who need to perform low-level cluster troubleshooting or maintenance operations that aren't available through the vSphere Client interface. It's part of the internal clustering infrastructure and should be used carefully, typically only when directed by VMware support or when following specific troubleshooting procedures.

Well, that's the case here. The VMware support engineer (TSE) asked for the command output, so here are the outputs from all ESXi hosts in the vSphere/vSAN cluster ...

dcserv-esx05

[root@dcserv-esx05:~] /usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status
{
   "state": "hosted",
   "cluster_id": "5bab0e84-305e-4966-ae6e-b9386c6b19f3:domain-c2051",
   "is_in_alarm": false,
   "alarm_cause": "",
   "is_in_cluster": true,
   "members": {
      "available": false
   }

}
[root@dcserv-esx05:~]

dcserv-esx06

[root@dcserv-esx06:~] /usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status
{
   "state": "hosted",
   "cluster_id": "5bab0e84-305e-4966-ae6e-b9386c6b19f3:domain-c2051",
   "is_in_alarm": true,
   "alarm_cause": "Timeout",
   "is_in_cluster": true,
   "members": {
      "available": false
   }

}
[root@dcserv-esx06:~]

dcserv-esx07

[root@dcserv-esx07:~] /usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status
{
   "state": "hosted",
   "cluster_id": "5bab0e84-305e-4966-ae6e-b9386c6b19f3:domain-c2051",
   "is_in_alarm": false,
   "alarm_cause": "",
   "is_in_cluster": true,
   "members": {
      "available": false
   }

}
[root@dcserv-esx07:~]

dcserv-esx08

[root@dcserv-esx08:~] /usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status
{
   "state": "standalone",
   "cluster_id": "",
   "is_in_alarm": false,
   "alarm_cause": "",
   "is_in_cluster": false,
   "members": {
      "available": false
   }
}
[root@dcserv-esx08:~]

dcserv-esx09

[root@dcserv-esx09:~] /usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status
{
   "state": "hosted",
   "cluster_id": "5bab0e84-305e-4966-ae6e-b9386c6b19f3:domain-c2051",
   "is_in_alarm": false,
   "alarm_cause": "",
   "is_in_cluster": true,
   "members": {
      "available": true
   },

   "namespaces": [
      {
         "name": "root",
         "up_to_date": true,
         "members": [
            {
               "peer_address": "dcserv-esx09.dcserv.cloud:2380",
               "api_address": "dcserv-esx09.dcserv.cloud:2379",
               "reachable": true,
               "primary": "yes",
               "learner": false
            }
         ]
      }
   ]
}

[root@dcserv-esx09:~]

dcserv-esx10

[root@dcserv-esx10:~] /usr/lib/vmware/clusterAgent/bin/clusterAdmin cluster status
{
   "state": "standalone",
   "cluster_id": "",
   "is_in_alarm": false,
   "alarm_cause": "",
   "is_in_cluster": false,
   "members": {
      "available": false
   }
}
[root@dcserv-esx10:~]

It seems to me that the output above means the following:

  • 4 nodes (dcserv-esx05, dcserv-esx06, dcserv-esx07, dcserv-esx09) are in the cluster
  • only dcserv-esx09 has members available
  • dcserv-esx06 is in alarm state and alarm cause is Timeout
    • all other nodes are not in alarm state
  • when I check if etcd is running (ps | grep etcd), it runs only on following two ESXi hosts
    • dcserv-esx06, dcserv-esx09
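
The same summary can be derived mechanically from the JSON. The following is a minimal Python sketch, not a VMware tool: the STATUS strings are trimmed copies of the outputs shown above (only the fields used here are kept), and the function name summarize is my own.

```python
import json

# Trimmed copies of the `clusterAdmin cluster status` outputs shown above.
# Only the fields used below are kept; esx09's "namespaces" section is omitted.
HOSTED_OK = '{"state": "hosted", "is_in_alarm": false, "alarm_cause": "", "is_in_cluster": true, "members": {"available": false}}'
STANDALONE = '{"state": "standalone", "is_in_alarm": false, "alarm_cause": "", "is_in_cluster": false, "members": {"available": false}}'

STATUS = {
    "dcserv-esx05": HOSTED_OK,
    "dcserv-esx06": '{"state": "hosted", "is_in_alarm": true, "alarm_cause": "Timeout", "is_in_cluster": true, "members": {"available": false}}',
    "dcserv-esx07": HOSTED_OK,
    "dcserv-esx08": STANDALONE,
    "dcserv-esx09": '{"state": "hosted", "is_in_alarm": false, "alarm_cause": "", "is_in_cluster": true, "members": {"available": true}}',
    "dcserv-esx10": STANDALONE,
}


def summarize(raw_by_host):
    """Group hosts by the three fields discussed in the list above."""
    parsed = {host: json.loads(raw) for host, raw in raw_by_host.items()}
    return {
        "in_cluster": sorted(h for h, s in parsed.items() if s["is_in_cluster"]),
        "members_available": sorted(h for h, s in parsed.items() if s["members"]["available"]),
        "in_alarm": {h: s["alarm_cause"] for h, s in parsed.items() if s["is_in_alarm"]},
    }


summary = summarize(STATUS)
for key, value in summary.items():
    print(key, "->", value)
```

In practice the raw strings would come from running the command on each host (for example over SSH); that collection step is left out here.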

The VMware TSE mentioned that ... "etcd can run even if WCP/TKG isn't in use, this could be a 3 etcd node cluster". However, I see the etcd service running on only two of the six ESXi hosts. The TSE believes there should be 3 nodes running. This leads to the following questions ...

Q1: What is the purpose of a 3-node ETCD cluster in a vSphere/vSAN cluster?

Q2: Why are only 2 nodes running?
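
As general background to these questions (this is standard etcd/Raft behaviour, not something taken from the support case): an etcd cluster needs a majority of its members to be reachable. A minimal Python sketch of that arithmetic, with function names of my own choosing:

```python
def quorum(members):
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be down while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

A 3-member cluster keeps quorum with one member down, which is the usual reason for choosing three. Whether that reasoning applies to the cluster agent on these hosts is exactly what VMware support needs to confirm.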

Anyway, I do not understand the /usr/lib/vmware/clusterAgent/bin/clusterAdmin tool. It is a low-level internal VMware tool, so let's wait for the next follow-up from VMware Support.

System Logs from vCenter along with the host logs have been exported and uploaded to VMware Support Case. I'm looking forward to seeing if this will help VMware support to identify the root cause.

 

 

 

Wednesday, February 12, 2025

VMware vs OpenStack

Here are screenshots from a Canonical webcast.

Feature comparison


OpenStack technological stack

 

System containers (LXD) vs Application Containers (Docker)


 

 

 


Thursday, January 30, 2025

vSphere 8 consumption gui

Source: https://www.linkedin.com/posts/katarinawagnerova_vsphere-kubernetes-vms-ugcPost-7213567854271492099-ygOq?utm_source=share&utm_medium=member_ios

Infrastructure & Application Monitoring with Checkmk

Source: https://checkmk.com/ 


docker container run -dit -p 8080:5000 -p 8000:8000 --tmpfs /opt/omd/sites/cmk/tmp:uid=1000,gid=1000 -v monitoring:/omd/sites --name monitoring -v /etc/localtime:/etc/localtime:ro --restart always checkmk/check-mk-cloud:2.3.0p24
 
 

VCF - nested ESX

Source: https://mhvmw.wordpress.com/2024/12/29/part-iii-beginners-guide-using-nested-esxi-hosts-for-a-vcf-5-2-1-home-lab/

 

Shodan - Search Engine for the Internet of Everything

Search Engine for the Internet of Everything

https://www.shodan.io/


Shodan is the world's first search engine for Internet-connected devices. Discover how Internet intelligence can help you make better decisions.

Network Monitoring Made Easy

Within 5 minutes of using Shodan Monitor you will see what you currently have connected to the Internet within your network range and be setup with real-time notifications when something unexpected shows up.

ČRa new data center

Source: https://www.cra.cz/tiskove-centrum/datova-centra/cra-se-stanou-jednickou-mezi-provozovateli-datovych-center-ziskaly-uzemni-rozhodnuti-pro-nove-dc

CRA to become number one among data center operators after obtaining a zoning permit for a new DC

České Radiokomunikace (CRA) are in the final stages of preparing one of the most ambitious digital infrastructure projects in the Czech Republic, a new data center. Another significant step has been achieved: CRA have obtained a zoning permit. Within two years, one of the largest facilities of its kind, not only in the Czech Republic but also in Europe, will be built in Prague-Zbraslav, with a capacity of more than 2,500 server racks and a power input of 26 megawatts.

"The main attributes of our project are innovation, sustainability, efficiency, reliability, and security. Our goal is to bring to the Czech Republic large companies that have so far been unable to use data center services here for capacity reasons, given the size or occupancy of the existing facilities," says Miloš Mastník, CEO of České Radiokomunikace. "We now have a valid zoning permit, which means we can move forward again with the final preparations," he adds.

The data center will cover 5,622 m², with building dimensions of 320 × 45 meters, and will be built on revitalized land where three CRA medium-wave radio transmitters originally stood. It will offer 2,500 server positions (racks) with a power input of 26 MW supplied over two independent routes, for secure data storage and management. The premises can be adapted to the specific needs of individual customers. Each room will also have its own office and storage space, making the center a comprehensive solution for companies' technology needs.

The data center will meet the strictest technological and environmental standards. It will be powered entirely from renewable sources, specifically solar panels on the roof of the building. Thanks to its strategic location, an innovative cooling system with a GWP below 10, the use of waste heat, and optimized power capacity, operating efficiency will be top-class, with a PUE (Power Usage Effectiveness) of 1.25. For example, panel floors will be used for better air distribution and hygiene standards, which will improve cooling and allow power loads of up to 20 kW per rack without additional cooling.

CRA plan to meet LEED Gold certification and comply with ASHRAE standards; the project is being developed in line with ESG principles.

The project has gained the support of the Ministry of Industry and Trade, which signed a memorandum of understanding with CRA. The memorandum sets a framework for cooperation between the state and CRA, within their powers and applicable regulations, with the aim of supporting digital transformation, technology research and development, and providing the infrastructure needed for further economic growth.

CRA already operate eight data centers in the Czech Republic, for example at Žižkov, Strahov, and Cukrák in Prague, as well as in Brno, Ostrava, Pardubice, and Zlín. Demand for capacity keeps growing, so this spring CRA opened a new data hall at the Cukrák transmitter, bought the Lužice data center, and are preparing the modernization and expansion of DC Tower in Žižkov.

The Zbraslav data center is to be completed in 2026 in cooperation with the parent company Cordiant Digital Infrastructure. CRA plan to obtain building and other necessary permits from the various regulatory authorities in spring 2025. Construction itself will take approximately 24 months. Thanks to existing infrastructure, including a connection to the fiber-optic network, road access, and available power, the project can be realized quickly.

 

Tarsnap - Online backups for the truly paranoid

Source: http://www.tarsnap.com/

 

NAS Performance: NFS vs. SMB vs. SSHFS | Jake’s Blog

Source: https://blog.ja-ke.tech/2019/08/27/nas-performance-sshfs-nfs-smb.html 

NAS Performance: NFS vs. SMB vs. SSHFS

This is a performance comparison of the three most useful protocols for network file shares on Linux with the latest software. I have run sequential and random benchmarks and tests with rsync. The main reason for this post is that I could not find a proper test that includes SSHFS.

NAS Setup

The hardware side of the server is based on a Dell mainboard with an Intel i3-3220, a fairly old 2-core / 4-thread CPU. It also does not support the AES-NI extensions (which would increase AES performance noticeably), so the encryption happens completely in software.

As storage, two HDDs in BTRFS RAID1 were used. It does not make a difference though, because the tests are staged to almost always hit the cache on the server, so only the protocol performance counts.

I installed Fedora 30 Server on it and updated it to the latest software versions.

Everything was tested over a local Gigabit Ethernet Network. The client is a quadcore desktop machine running Arch Linux, so this should not be a bottleneck.

SSHFS (also known as SFTP)

Relevant package/version: OpenSSH_8.0p1, OpenSSL 1.1.1c, sshfs 3.5.2

OpenSSH is probably running anyway on all servers, so this is by far the simplest setup: just install sshfs (FUSE-based) on the clients and mount it. By default it is encrypted with ChaCha20-Poly1305. As a second test I chose AES128, because it is the most popular cipher; disabling encryption is not possible (without patching ssh). Then I added some mount options (suggested here) for convenience and ended up with:

sshfs -o Ciphers=aes128-ctr -o Compression=no -o ServerAliveCountMax=2 -o ServerAliveInterval=15 remoteuser@server:/mnt/share/ /media/mountpoint

NFSv4

Relevant package/version: Linux Kernel 5.2.8

The plaintext setup is also easy, specify the exports, start the server and open the ports. I used these options on the server: (rw,async,all_squash,anonuid=1000,anongid=1000)

And mounted with: mount.nfs4 -v nas-server:/mnt/share /media/mountpoint

But getting encryption to work can be a nightmare: first, setting up Kerberos is more complicated than other solutions, and then there is dealing with idmap on both server and client(s)… After that you can choose from different levels; I set sec=krb5p to encrypt all traffic for this test (most secure, slowest).

SMB3

Relevant package/version: Samba 4.10.6

The setup is mostly done by installing, creating the user DB, adding a share to smb.conf and starting the smb service. Encryption is disabled by default; for the encrypted test I set smb encrypt = required on the server globally. It then uses AES128-CCM (visible in smbstatus).

ID mapping on the client can simply be done as a mount option; I used this complete mount command:

mount -t cifs -o username=jk,password=xyz,uid=jk,gid=jk //nas-server/media /media/mountpoint

Test Methodology

The main test block was done with the flexible I/O tester (fio), written by Jens Axboe (current maintainer of the Linux block layer). It has many options, so I made a short script to run reproducible tests:

#!/bin/bash
OUT=$HOME/logs

fio --name=job-w --rw=write --size=2G --ioengine=libaio --iodepth=4 --bs=128k --direct=1 --filename=bench.file --output-format=normal,terse --output=$OUT/fio-write.log
sleep 5
fio --name=job-r --rw=read --size=2G --ioengine=libaio --iodepth=4 --bs=128K --direct=1 --filename=bench.file --output-format=normal,terse --output=$OUT/fio-read.log
sleep 5
fio --name=job-randw --rw=randwrite --size=2G --ioengine=libaio --iodepth=32 --bs=4k --direct=1 --filename=bench.file --output-format=normal,terse --output=$OUT/fio-randwrite.log
sleep 5
fio --name=job-randr --rw=randread --size=2G --ioengine=libaio --iodepth=32 --bs=4K --direct=1 --filename=bench.file --output-format=normal,terse --output=$OUT/fio-randread.log

The first two are classic sequential read/write tests, with a 128 KB block size and a queue depth of 4. The last two are small 4 KB random reads/writes, but with a queue depth of 32. The direct flag means direct I/O, to make sure that no caching happens on the client.

For the real-world tests I used rsync in archive mode (-rlptgoD) and its built-in measurements:

rsync --info=progress2 -a sshfs/TMU /tmp/TMU

Synthetic Performance

Sequential

sequential read diagram

Most are maxing out the network, the only one falling behind in the read test is SMB with encryption enabled, looking at the CPU utilization reveals that it uses only one core/thread, which causes a bottleneck here.
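
For reference, "maxing out the network" on Gigabit Ethernet means roughly the following TCP payload rate. This is a back-of-the-envelope Python sketch under my own assumptions (standard 1500-byte MTU, IPv4, no TCP options); none of these numbers come from the benchmark itself.

```python
LINE_RATE_BITS_PER_S = 1_000_000_000  # Gigabit Ethernet line rate

MTU = 1500           # assumed standard MTU (no jumbo frames)
WIRE_OVERHEAD = 38   # preamble + SFD 8, Ethernet header 14, FCS 4, inter-frame gap 12
IP_TCP_HEADERS = 40  # IPv4 header 20 + TCP header 20, assuming no TCP options

payload_per_frame = MTU - IP_TCP_HEADERS  # bytes of application data per frame
bytes_on_wire = MTU + WIRE_OVERHEAD       # bytes of line time consumed per frame

max_payload_mb_s = LINE_RATE_BITS_PER_S / 8 * payload_per_frame / bytes_on_wire / 1e6
print(f"Theoretical TCP payload ceiling: {max_payload_mb_s:.1f} MB/s")
```

Protocol overhead from SSH, NFS or SMB comes on top of this, so a result a little below that ceiling can still count as saturating the link.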

sequential write diagram

NFS handles the compute intensive encryption better with multiple threads, but using almost 200% CPU and getting a bit weaker on the write test.

SSHFS provides a surprisingly good performance with both encryption options, almost the same as NFS or SMB in plaintext! It also put less stress on the CPU, with up to 75% for the ssh process and 15% for sftp.

Random

4K random read diagram

On small random accesses NFS is the clear winner, even with encryption enabled very good. SMB almost the same, but only without encryption. SSHFS quite a bit behind.

4K random write diagram

NFS still the fastest in plaintext, but has a problem again when combining writes with encryption. SSHFS is getting more competitive, even the fastest from the encrypted options, overall in the mid.

random read latency diagram random read latency diagram

The latency mostly resembles the inverse of IOPS/bandwidth. The only notable point is the pretty good (low) write latency with encrypted NFS, which gets most requests done a bit faster than SSHFS in this case.

Real World Performance

This test consists of transferring a folder with rsync between the mounted share and a local tmpfs (RAM-backed). It contains the installation of a game (Trackmania United Forever) and is about 1.7 GB in size with 2929 files in total, so an average file size of about 600 KB, but not evenly distributed.
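
The average file size quoted above can be checked quickly. A small Python sketch; since the post does not say whether "GB" means decimal or binary units, both are computed:

```python
total_size_gb = 1.7  # folder size as stated in the post
file_count = 2929    # number of files as stated in the post

# Decimal interpretation: 1 GB = 10^9 bytes, 1 KB = 10^3 bytes.
avg_kb = total_size_gb * 1e9 / file_count / 1e3

# Binary interpretation: 1 GiB = 2^30 bytes, 1 KiB = 2^10 bytes.
avg_kib = total_size_gb * 2**30 / file_count / 2**10

print(f"average file size: {avg_kb:.0f} KB (decimal) or {avg_kib:.0f} KiB (binary)")
```

Either way the result is close to the rounded 600 KB figure.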

mixed read diagram mixed write diagram

After all no big surprises here, NFS fastest in plaintext, SSHFS fastest in encryption. SMB always somewhat behind NFS.

Conclusion

In trusted home networks NFS without encryption is the best choice on Linux for maximum performance. If you want encryption I would recommend SSHFS: it is a much simpler setup (compared to Kerberos), more CPU-efficient and often only slightly slower than plaintext NFS. Samba/SMB is also not too far behind, but only really makes sense in a mixed (Windows/Linux) environment.

Thanks for reading, I hope it was helpful.

 

Best DevOps tools

Source: https://www.virtualizationhowto.com/2025/01/best-containers-for-devops-in-2025/ 

Best Containers for DevOps in 2025

A look at the top Docker containers for DevOps in 2025. Streamline your code projects and automation with these cool and robust containers

I use a LOT of Docker containers in the home lab and in my DevOps journey to continually work with various code projects, automation, and just running applications in containers. There are myriads of DevOps containers to be aware of that provide a lot of value and can help you achieve various business and technical objectives. There are several DevOps containers that I want to share with you that I use. Let’s look at the best Docker containers for DevOps in 2025 and see which ones I am using.

Why run Docker Containers?

There may be a question as to why you would run containers for DevOps tools instead of VMs? That is a great question. Virtual Machines are still very important and provide the foundation for virtual infrastructure and container hosts. I don’t think they will go away for a long time. However, containers are my favorite way to run apps and DevOps solutions.

Docker

Docker containers allow you to easily spin up new applications in seconds and not minutes or hours. You can simply pull an application container and spin it up with a one-line docker command instead of having to install a VM operating system, install all the prerequisites, and meet all the requirements of the application, which might take a couple of hours total.

Instead, spin up a Docker container host on a virtual machine and then spin up your applications in containers on top of your container host.

Best Docker Containers for DevOps beginning in 2025

Below is my list of best Docker containers for DevOps in 2025 broken out in sections. You will note a few repeats in the sections as some solutions do more than one thing.

CI/CD:

  • GitLab
  • Jenkins
  • Gitea
  • ArgoCD

Container registries

  • GitLab
  • Gitea
  • Harbor

Secrets management

  • Hashicorp Vault
  • CyberArk Conjur
  • OpenBAO

Code Quality

  • Sonarqube
  • Trivy

Monitoring stack

  • Telegraf
  • InfluxDB
  • Prometheus
  • Grafana

Ingress

  • Nginx Proxy Manager
  • Traefik
  • Envoy by Lyft

CI/CD and Container Registries

GitLab

GitLab is the CI/CD solution and code management repo that I have been using to version my DevOps code in the home lab and in production environments. If you want to self-host your code repos and do extremely cool CI/CD pipelines for infrastructure as code, GitLab is a free solution that is easy to stand up in a Docker container.

Gitlab

You can use it to automate testing, builds, and deployment to your environments. You can also integrate third-party solutions in GitLab, which is a great way to extend what it can do.

Pros:

  • It is an all in one solution for DevOps and code
  • Good CI/CD pipeline features
  • Has third-party tools and integrations
  • Good community support

Cons:

  • Can be resource-intensive
  • Some features may be in the paid product
  • Is rumored to be in talks of a buyout by someone?

Docker Compose Code:

version: '3'
services:
  gitlab:
    image: 'gitlab/gitlab-ee:latest'
    restart: always
    hostname: 'gitlab.example.com'
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'http://gitlab.example.com'
    ports:
      - '80:80'
      - '443:443'
      - '22:22'
    volumes:
      - './config:/etc/gitlab'
      - './logs:/var/log/gitlab'
      - './data:/var/opt/gitlab'

Learn more about GitLab here: The most-comprehensive AI-powered DevSecOps platform | GitLab

Jenkins

Jenkins is an OSS tool that most know. It will come up in just about any DevOps conversation around a self-hosted code repo. Many have a love/hate relationship with Jenkins. It can literally do anything you want it to, which is a plus. But the downside is, it can literally do anything. You can use it to build your projects, test code, and deploy to your infrastructure.

Jenkins

It also has a ton of third-party apps you can integrate with the solution and the CI/CD pipeline. Just about every other DevOps solution has an integration with Jenkins so it is supported across the board.

Pros:

  • It has been around forever so great support
  • Active community
  • Distributed builds are supported
  • Everything seems to integrate with Jenkins

Cons:

  • Can be complex to set up and manage
  • Interface feels a little outdated

Docker Compose Code:

version: '3'
services:
  jenkins:
    image: 'jenkins/jenkins:lts'
    restart: always
    ports:
      - '8080:8080'
      - '50000:50000'
    volumes:
      - './jenkins_home:/var/jenkins_home'

Learn more about Jenkins here: Jenkins

Gitea

Gitea is a newcomer on the block. It has a modern feel about it, but isn’t as fully featured as other solutions like GitLab or Jenkins. It is easy to deploy and manage for Git repos. It has features that include issue tracking, CI/CD, and code reviews.

Gitea

Pros:

  • Lightweight and easy to configure
  • Has CI/CD pipelines
  • Lower resource requirements compared to other solutions

Cons:

  • Fewer features compared to other solutions like GitLab and Jenkins
  • Smaller community

Docker Compose Code:

version: '3'
services:
  gitea:
    image: 'gitea/gitea:latest'
    restart: always
    ports:
      - '3000:3000'
      - '222:22'
    volumes:
      - './gitea:/data'

Learn more about Gitea here: Gitea Official Website

ArgoCD

ArgoCD is a more Kubernetes-centric solution for GitOps. Its purpose is to supply continuous delivery for Kubernetes. It automates application deployment by tracking changes in a Git repository. It continuously monitors and synchronizes Kubernetes clusters which is a more proactive solution to make sure that applications are always deployed in the desired state.
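
The core idea of comparing the desired state in Git with the live state in the cluster can be illustrated with a few lines of Python. This is a conceptual sketch only: it is not ArgoCD's implementation or API, and the function name diff_state and the sample resources are hypothetical.

```python
def diff_state(desired, live):
    """Return the actions needed to make `live` match `desired`.

    Both arguments map a resource name to its spec (any comparable value).
    """
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))
    for name in live:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical example: what Git declares vs. what is running.
desired = {
    "web": {"image": "web:1.1", "replicas": 2},
    "api": {"image": "api:2.0", "replicas": 1},
}
live = {
    "web": {"image": "web:1.0", "replicas": 2},
    "worker": {"image": "worker:0.9", "replicas": 1},
}

actions = diff_state(desired, live)
print(actions)
```

A GitOps controller runs a comparison like this continuously and applies the resulting actions; whether resources missing from Git are actually deleted is a policy choice rather than a given.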

Argocd

Pros:

  • GitOps-centric
  • Real-time synchronization
  • Kubernetes native solutions

Cons:

  • Can be complex with GitOps and Kubernetes knowledge needed

Docker Compose Code: ArgoCD is usually installed using Kubernetes manifests or with Helm charts. So, not typically Docker Compose. Here is an example of a manifest:

apiVersion: v1
kind: Namespace
metadata:
  name: argocd
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: argocd-server
  namespace: argocd
---
apiVersion: v1
kind: Service
metadata:
  name: argocd-server
  namespace: argocd
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https
      port: 443
      targetPort: 8080
  selector:
    app: argocd-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
  namespace: argocd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: argocd-server
  template:
    metadata:
      labels:
        app: argocd-server
    spec:
      serviceAccountName: argocd-server
      containers:
        - name: argocd-server
          image: argoproj/argocd:v2.0.0
          ports:
            - containerPort: 8080
          command:
            - argocd-server
          args:
            - --staticassets
            - /shared/app
            - --repo-server
            - argocd-repo-server:8081
            - --dex-server
            - argocd-dex-server:5556
          volumeMounts:
            - name: static-files
              mountPath: /shared/app
      volumes:
        - name: static-files
          emptyDir: {}

Learn more about ArgoCD here: Argo CD – Declarative GitOps CD for Kubernetes (argo-cd.readthedocs.io).

Harbor

Harbor is a well-known container registry solution. It supports features that most want for their registries like role-based access control, image replication, multiple registries, vulnerability scans, and others.

Harbor registry

Pros:

  • Good security
  • Role-based access control (RBAC)
  • Image replication and vulnerability scanning

Cons:

  • More complex setup
  • No less than 6 containers for the solution
  • Requires additional resources

Docker Compose Code:

version: '3.5'
services:
  log:
    image: goharbor/harbor-log:v2.0.0
    restart: always
    volumes:
      - /var/log/harbor/:/var/log/docker/:z
  registry:
    image: goharbor/registry-photon:v2.0.0
    restart: always
  core:
    image: goharbor/harbor-core:v2.0.0
    restart: always
  portal:
    image: goharbor/harbor-portal:v2.0.0
    restart: always
  jobservice:
    image: goharbor/harbor-jobservice:v2.0.0
    restart: always
  proxy:
    image: goharbor/nginx-photon:v2.0.0
    restart: always

Learn more about Harbor registry here: Harbor (goharbor.io).

Secrets Management

HashiCorp Vault

The Vault solution allows you to store secrets and pull these dynamically when you are using IaC solutions like Terraform. You can store many types of secrets, including things like API keys, passwords, and certificates. It is easy to stand up as a solution in either Docker or Kubernetes.

Vault

It provides a secure way for code builds and other things like CI/CD to grab secrets on the fly from the secrets vault.
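
As an illustration of grabbing a secret on the fly, here is a minimal Python sketch that reads from Vault's HTTP API using only the standard library. It assumes a KV version 2 secrets engine mounted at secret/ and a valid token; the address, token and secret path in the example are placeholders, and the helper names are my own.

```python
import json
import urllib.request


def build_read_request(vault_addr, token, secret_path, mount="secret"):
    """Build a GET request for a KV v2 secret (assumes the engine is mounted at `mount`)."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{secret_path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def extract_secret(response_body):
    """KV v2 wraps the key/value pairs in data.data of the JSON response."""
    return json.loads(response_body)["data"]["data"]


def read_secret(vault_addr, token, secret_path):
    """Fetch and decode a secret; needs a reachable, unsealed Vault server."""
    with urllib.request.urlopen(build_read_request(vault_addr, token, secret_path)) as resp:
        return extract_secret(resp.read())


# Placeholder values for illustration only; nothing is sent over the network here.
req = build_read_request("http://127.0.0.1:8200", "example-token", "ci/deploy")
print(req.full_url)
```

In a pipeline the token would come from the CI system's own secret store or an auth method such as AppRole, not from a hard-coded string.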

Pros:

  • Secure secrets management
  • Dynamic secrets can be used
  • Audit logging

Cons:

  • It can get complex to build out
  • Learning curve

You can see my full blog post on how to install Hashicorp Vault inside Docker here: Hashicorp Vault Docker Install Guide.

Docker Compose Code:

version: '3.8'

services:
  vault:
    image: hashicorp/vault:latest
    container_name: vault
    ports:
      - "8200:8200"
    volumes:
      - ./config:/vault/config
      - ./data:/vault/file
    cap_add:
      - IPC_LOCK
    command: "vault server -config=/vault/config/vault-config.json"

vault-config.json

{
  "storage": {
    "file": {
      "path": "/vault/file"
    }
  },
  "listener": {
    "tcp": {
      "address": "0.0.0.0:8200",
      "tls_disable": true
    }
  },
  "ui": true
}

Learn more about Hashicorp Vault here: Vault by HashiCorp (vaultproject.io).

CyberArk Conjur

CyberArk Conjur provides a community edition for secrets management. It focuses on CI/CD pipelines. You can integrate various tools and platforms for credentials, API keys, and other secrets.

It has detailed audit logging and other features that can help with security.

Cyberark conjur

Pros:

  • Strong integration with DevOps tools
  • Robust access controls
  • Detailed auditing

Cons:

  • Added features may require enterprise version (paid)
  • Complicated setup and management for those not familiar with the solution

Docker Compose Code:

version: '3'
services:
  conjur:
    image: cyberark/conjur:latest
    restart: always
    environment:
      CONJUR_AUTHENTICATORS: authn
    ports:
      - "443:443"
    volumes:
      - ./conjur/data:/var/lib/conjur

Learn more about CyberArk Conjur here: Secrets Management | Conjur.

OpenBAO

If you are looking for a free and open source secrets management solution, then OpenBAO is one to try. It is from the Linux Foundation and allows you to store passwords and other secret information. Like Vault, you can use it to store things such as API keys, passwords, etc.

Openbao

Pros:

  • Simple solution that is lightweight
  • Encryption support and RBAC
  • Open-source and free

Cons:

  • Limited features
  • Smaller community

Docker Compose Code:

version: '3'
services:
  openbao:
    image: openbao/openbao:latest
    restart: always
    ports:
      - "8080:8080"
    volumes:
      - ./openbao/data:/var/openbao

Learn more about OpenBAO here: OpenBao | OpenBao.

Code Quality

SonarQube

You can use SonarQube for scanning code quality, and things like linting, etc. It can help do automatic code reviews and detect bugs in code. You can also use it as a vulnerability scanner and find code smells.

It supports many different programming and scripting languages. You can integrate it with CI/CD pipelines and have it give you a report of what it finds.

Sonarqube

Pros:

  • Code quality analysis
  • Multiple languages supported
  • Integrates with CI/CD

Cons:

  • Can be resource-intensive
  • Doesn’t support some languages like PowerShell

Docker Compose Code:

version: '3'
services:
  sonarqube:
    image: sonarqube:latest
    restart: always
    ports:
      - "9000:9000"
    volumes:
      - ./sonarqube/conf:/opt/sonarqube/conf
      - ./sonarqube/data:/opt/sonarqube/data
      - ./sonarqube/logs:/opt/sonarqube/logs
      - ./sonarqube/extensions:/opt/sonarqube/extensions

Learn more about SonarQube here: Code Quality, Security & Static Analysis Tool with SonarQube | Sonar (sonarsource.com).

Trivy

Trivy is another solution I have used that allows you to scan for vulnerabilities (CVEs) and also for misconfigurations in your code (IaC). You can scan repositories, artifacts, and container images, and you can even scan Kubernetes clusters.

Trivy code scanner

Take a look at the example Docker compose code below:

version: '3.8'

services:
  trivy:
    image: aquasec/trivy:latest
    container_name: trivy
    entrypoint: ["trivy"]
    volumes:
      - ./trivy-cache:/root/.cache
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - TRIVY_SEVERITY=HIGH,CRITICAL 
      - TRIVY_EXIT_CODE=1            
      - TRIVY_IGNORE_UNFIXED=true   
    command: --help #Replace this with what you want to scan like, "image <image-name>"

Learn more about Trivy on the official site here: Trivy.

Monitoring Stack

Telegraf

Telegraf collects and reports on metrics. It is part of the very well known “TICK” stack that many use for monitoring.

Telegraf

Pros:

  • Many plugins to extend its features
  • Lightweight
  • Integrates with various systems

Cons:

  • Requires configuration that is customized for different solutions
  • Learning curve

Docker Compose Code:

version: '3'
services:
  telegraf:
    image: telegraf:latest
    restart: always
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf

Learn more about Telegraf here: Telegraf Documentation (influxdata.com).

InfluxDB

InfluxDB is an open-source time series database. It is also part of the “TICK” stack. It is often used for housing metrics, events, and logs. There are many integrations with InfluxDB and you will find a lot of community projects using it.

InfluxDB

Pros:

  • Great for time-series data
  • High performance
  • Integrates with many solutions

Cons:

  • Can require significant CPU, memory, and disk depending on data volume
  • Complex queries may result in a learning curve

Docker Compose Code:

version: '3'
services:
  influxdb:
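    image: influxdb:2 # pinned to 2.x so the data path below matches
    restart: always
    ports:
      - "8086:8086"
    volumes:
      - ./influxdb/data:/var/lib/influxdb2 # InfluxDB 2.x stores data here; 1.x uses /var/lib/influxdb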
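
If you want the container to create its first user, organization, and bucket on startup, the official 2.x image supports setup through environment variables. Below is a minimal sketch assuming the influxdb:2 image. The username, password, organization, and bucket values are placeholders I made up, so change them for your environment.

```yaml
services:
  influxdb:
    image: influxdb:2
    restart: always
    ports:
      - "8086:8086"
    environment:
      # Runs the one-time initial setup on an empty data directory
      - DOCKER_INFLUXDB_INIT_MODE=setup
      # Placeholder credentials and names; replace before use
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=change-me-please
      - DOCKER_INFLUXDB_INIT_ORG=homelab
      - DOCKER_INFLUXDB_INIT_BUCKET=metrics
    volumes:
      - ./influxdb/data:/var/lib/influxdb2
      - ./influxdb/config:/etc/influxdb2
```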

Learn more about InfluxDB here: InfluxDB Time Series Data Platform | InfluxData.

Grafana

Grafana is the de facto open-source tool for visualizing data gathered by other solutions. It is commonly paired with data sources like InfluxDB and Prometheus. Combined with those tools, it makes a great open-source monitoring solution that can replace even enterprise products for dashboards.

Grafana

Pros:

  • Powerful for dashboarding and visualizing data
  • Many integrations
  • Intuitive interface
  • Thousands of community dashboards available

Cons:

  • Configuration may be complex depending on the integration
  • Learning curve

Docker Compose Code:

version: '3'
services:
  grafana:
    image: grafana/grafana:latest
    restart: always
    ports:
      - "3000:3000"
    volumes:
      - ./grafana/data:/var/lib/grafana
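
Since Telegraf, InfluxDB, and Grafana are usually run together, here is a rough sketch of the three services in a single Compose file. The network name "monitoring" is my own choice for illustration, and the sketch assumes you already have a telegraf.conf that writes to InfluxDB.

```yaml
services:
  influxdb:
    image: influxdb:2
    restart: always
    volumes:
      - ./influxdb/data:/var/lib/influxdb2
    networks:
      - monitoring

  telegraf:
    image: telegraf:latest
    restart: always
    depends_on:
      - influxdb
    volumes:
      # Assumes this file exists and points its output at http://influxdb:8086
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    restart: always
    depends_on:
      - influxdb
    ports:
      - "3000:3000"
    volumes:
      - ./grafana/data:/var/lib/grafana
    networks:
      - monitoring

networks:
  monitoring: # hypothetical network name
    driver: bridge
```

On a shared network the containers can reach each other by service name, so both Telegraf and the Grafana data source can use http://influxdb:8086 without publishing the InfluxDB port on the host.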

Learn more about Grafana here: Grafana: The open observability platform | Grafana Labs.

Ingress

Nginx Proxy Manager

Nginx Proxy Manager is a great solution that I use a lot in the home lab. It provides an easy way to add SSL termination to your Docker containers. Instead of configuring SSL inside each container you are hosting, you configure the SSL certificate in Nginx Proxy Manager and let it proxy requests to your containers over a shared Docker network.

Nginx Proxy Manager

Pros:

  • User-friendly
  • Lots of features
  • Easy SSL configuration for Docker containers

Cons:

  • Limited to Nginx features
  • May need more advanced configuration for complex setups

Docker Compose Code:

version: '3.8'
services:
  app:
    image: 'jc21/nginx-proxy-manager:latest'
    restart: unless-stopped
    ports:
      # These ports are in format <host-port>:<container-port>
      - '80:80' # Public HTTP Port
      - '443:443' # Public HTTPS Port
      - '81:81' # Admin Web Port
      # Add any other Stream port you want to expose
      # - '21:21' # FTP

    # Uncomment the next line if you uncomment anything in the section
    # environment:
      # Uncomment this if you want to change the location of
      # the SQLite DB file within the container
      # DB_SQLITE_FILE: "/data/database.sqlite"

      # Uncomment this if IPv6 is not enabled on your host
      # DISABLE_IPV6: 'true'

    volumes:
      - ./data:/data
      - ./letsencrypt:/etc/letsencrypt
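
To illustrate the proxy network idea, here is a minimal sketch where a backend container shares a network with Nginx Proxy Manager and publishes no ports of its own. The service names, the "proxy" network name, and the traefik/whoami test image are my own picks for the example.

```yaml
services:
  npm:
    image: 'jc21/nginx-proxy-manager:latest'
    restart: unless-stopped
    ports:
      - '80:80'
      - '443:443'
      - '81:81'
    volumes:
      - ./data:/data
      - ./letsencrypt:/etc/letsencrypt
    networks:
      - proxy

  # Example backend; no "ports" section, so it is only reachable through the proxy
  whoami:
    image: traefik/whoami:latest
    restart: unless-stopped
    networks:
      - proxy

networks:
  proxy: # hypothetical network name
    driver: bridge
```

In the admin UI on port 81 you would then add a proxy host that forwards to the hostname whoami on port 80 and attach the SSL certificate there.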

Learn more about Nginx Proxy Manager here: Nginx Proxy Manager.

Traefik

Similar to Nginx Proxy Manager, Traefik provides reverse proxy features for containers. It is also a load balancer and can automatically discover services and apply routing to your containers. It can manage SSL certificates as well, including provisioning them automatically from Let's Encrypt.

It is more difficult to use than Nginx Proxy Manager since most of the setup is done in Traefik configuration files, flags, and container labels, which can be tedious.

Traefik logo

Pros:

  • Automatic service discovery
  • Great integration with Docker and Kubernetes
  • Lightweight

Cons:

  • Configuration can be complicated
  • Certificates can be complex to get working
  • More complicated to use than Nginx Proxy Manager

Docker Compose Code:

version: '3'
services:
  traefik:
    image: traefik:v2.4
    restart: always
    command:
      - "--api.insecure=true" # serves the dashboard on :8080 without authentication; lab use only
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
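    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro # read-only is enough for service discovery
      # Traefik's docs describe the static configuration sources (file, CLI flags,
      # environment variables) as mutually exclusive. The flags above are used here,
      # so the file mount is left commented out.
      # - ./traefik/traefik.yml:/etc/traefik/traefik.yml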
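
The automatic service discovery works through labels on the containers you want to route to. Here is a minimal sketch, assuming Traefik v2 with the Docker provider. The whoami service and the whoami.localhost hostname are placeholders for illustration.

```yaml
services:
  traefik:
    image: traefik:v2.4
    command:
      - "--providers.docker=true"
      # Only route containers that opt in with traefik.enable=true
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

  whoami:
    image: traefik/whoami:latest
    labels:
      - "traefik.enable=true"
      # Router name "whoami" and the hostname are hypothetical
      - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
      - "traefik.http.routers.whoami.entrypoints=web"
```

Traefik watches the Docker socket, so starting or stopping a labeled container should add or remove its route without touching the Traefik service itself.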

Learn more about Traefik here: Traefik Labs.

Envoy by Lyft

Envoy is a reverse proxy that was originally developed by Lyft and is now part of the Cloud Native Computing Foundation (CNCF). It is built with distributed systems in mind. It is often deployed as a sidecar proxy in service meshes, but it can also be used as a standalone proxy.

Envoy proxy

Note the example Docker Compose code below:

version: '3.8'

services:
  envoy:
    image: envoyproxy/envoy:v1.26.0 
    container_name: envoy
    ports:
      - "9901:9901" # Admin interface
      - "10000:10000" # Example listener port
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml:ro 
    command: ["-c", "/etc/envoy/envoy.yaml"] 
    networks:
      - envoy-net
    restart: unless-stopped

networks:
  envoy-net:
    driver: bridge

Below is an example of the envoy.yaml configuration file:

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
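      - name: envoy.filters.network.http_connection_manager
        # Envoy v1.26 uses the v3 API, which requires typed_config with an explicit @type
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: AUTO
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: service_backend
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router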
  clusters:
  - name: service_backend
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080
admin:
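  access_log: # replaces access_log_path, which is deprecated in the v3 API
  - name: envoy.access_loggers.stdout
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog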
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
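
One thing to watch in the cluster definition: inside the container, 127.0.0.1 refers to the Envoy container itself. If your backend runs as another container on the same Docker network, a sketch like the following could replace the clusters section. It assumes a service named "backend" listening on port 8080, which is a hypothetical name for illustration.

```yaml
  clusters:
  - name: service_backend
    connect_timeout: 0.25s
    # STRICT_DNS resolves a hostname; STATIC only accepts IP addresses
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: backend # hypothetical Compose service name
                port_value: 8080
```

The backend service would need to be attached to the same envoy-net network so that the name resolves.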

Learn more about Envoy here: Envoy proxy.

Wrapping up

Hopefully this list of what I think are some of the best DevOps containers in 2025 will help you discover some solutions that you may not have used before. All of these solutions are a great way to start learning DevOps practices and workflows, and they can take your home lab or production environments to the next level.