It’s late in the evening and, oh boy, did we have fun troubleshooting UEFI **** again. This time it’s related to ESXi scripted installs with UEFI Secure Boot turned on. Chances are, if you stumbled over this post, you’re running into the same issue.
Since ESXi 6.0 it’s been possible to do scripted installs via PXE even on UEFI-based systems, and there have been good posts on this topic. Now it’s 2018, and I doubt anyone is still deploying anything that isn’t UEFI-based, although it would be nice if we could (you may have seen me bashing UEFI before).
So lately I’ve been working on a bunch of scripts to automate our ESXi deployments. Part of the story is that we wanted to use ESXi scripted installs to auto-configure our ESXi hosts based on a centrally versioned JSON configuration file. You just put the bare-metal host into your rack, attach the cables, turn it on and: boom. Done. The host installs, configures itself and shows up in vCenter, where the next part of our configuration suite kicks in.
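The post doesn’t show that JSON file, so here is a minimal sketch under made-up assumptions: a hypothetical hosts.json keyed by the MAC address of the PXE-booting NIC, from which the provisioning side picks the entry for the host that just powered on. The schema (hosts, hostname, vcenter_cluster) is invented for illustration and is not our actual format.

```python
import json

# Hypothetical layout of the central, versioned config file (illustrative schema only).
EXAMPLE_CONFIG = """
{
  "hosts": {
    "00:50:56:aa:bb:01": {"hostname": "esx01", "vcenter_cluster": "prod"},
    "00:50:56:aa:bb:02": {"hostname": "esx02", "vcenter_cluster": "prod"}
  }
}
"""


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and use ':' as separator so '00-50-56-AA-BB-01' matches too."""
    return mac.strip().lower().replace("-", ":")


def host_config(config_text: str, mac: str) -> dict:
    """Return the config entry for the host booting with the given MAC address.

    Raises KeyError for unknown hosts, so an unconfigured box never
    gets a half-configured install.
    """
    hosts = json.loads(config_text)["hosts"]
    by_mac = {normalize_mac(k): v for k, v in hosts.items()}
    return by_mac[normalize_mac(mac)]


if __name__ == "__main__":
    print(host_config(EXAMPLE_CONFIG, "00-50-56-AA-BB-02")["hostname"])
```

Failing hard on an unknown MAC is a deliberate choice here: a host that isn’t in the versioned file should not silently receive someone else’s settings.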
The most important part of the scripted install is the ks.cfg kickstart file, which ESXi loads from the location you pass via the kernelopt parameter in boot.cfg. Let me give you a quick example of what such a file could look like.
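Pointing the installer at the kickstart file is done with the ks= boot option on the kernelopt line of boot.cfg. A minimal sketch of patching that line, assuming you generate a boot.cfg per host on the PXE server; the function name and the URL below are hypothetical.

```python
def set_kickstart_url(boot_cfg: str, ks_url: str) -> str:
    """Point the kernelopt line of an ESXi boot.cfg at a kickstart file.

    Any existing ks=... option is replaced, other kernel options are kept.
    If there is no kernelopt line at all, one is appended.
    """
    out = []
    found = False
    for line in boot_cfg.splitlines():
        if line.startswith("kernelopt="):
            found = True
            # Drop an old ks= option, keep everything else (e.g. runweasel).
            opts = [o for o in line[len("kernelopt="):].split() if not o.startswith("ks=")]
            opts.append("ks=" + ks_url)
            line = "kernelopt=" + " ".join(opts)
        out.append(line)
    if not found:
        out.append("kernelopt=ks=" + ks_url)
    return "\n".join(out) + "\n"


# Usage with a hypothetical per-host kickstart URL:
# set_kickstart_url(open("boot.cfg").read(), "http://deploy.example.local/ks/esx01.cfg")
```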
vmaccepteula
rootpw [email protected]!
install --firstdisk=localesx,local --overwritevmfs
network --bootproto=dhcp --device=vmnic0

%post --interpreter=busybox
# Wait 30 seconds until firstboot scripts have been copied before rebooting
sleep 30; reboot

%firstboot --interpreter=python
import logging
logging.info("do some bootstrapping...")
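Since our hosts are configured from a JSON file, the kickstart itself can be rendered rather than hand-written. A minimal sketch that produces the example above from a settings dict; the keys root_password and mgmt_nic are hypothetical, and keeping a root password in a JSON file is only for illustration.

```python
import json

# Hypothetical per-host settings; in practice these would come from the central JSON file.
HOST_JSON = '{"root_password": "ChangeMe123!", "mgmt_nic": "vmnic0"}'


def render_kickstart(settings: dict) -> str:
    """Render a ks.cfg equivalent to the example above from a settings dict."""
    lines = [
        "vmaccepteula",
        "rootpw " + settings["root_password"],
        "install --firstdisk=localesx,local --overwritevmfs",
        "network --bootproto=dhcp --device=" + settings["mgmt_nic"],
        "",
        "%post --interpreter=busybox",
        "# Wait 30 seconds until firstboot scripts have been copied before rebooting",
        "sleep 30; reboot",
        "",
        "%firstboot --interpreter=python",
        "import logging",
        'logging.info("do some bootstrapping...")',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_kickstart(json.loads(HOST_JSON)), end="")
```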
Awesome, right? Besides options like vmaccepteula and rootpw, which map directly to steps of the interactive installer, it’s also possible to execute custom scripts at certain points of the installation (%pre, %post and %firstboot). Even better: not only shell scripts (busybox) but also Python scripts (Python 3.5.3 on ESXi 6.7) are supported. So you can do pretty much anything in a reliable way.
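One caveat with the %firstboot example: Python’s root logger defaults to the WARNING level, so the bare logging.info() call is silently dropped. A minimal sketch of a %firstboot body that logs somewhere you can actually read, written without f-strings so it stays compatible with Python 3.5; the log path is a hypothetical choice, not an ESXi default.

```python
import logging

# Hypothetical log location; pick whatever path suits your environment.
DEFAULT_LOG = "/var/log/firstboot-bootstrap.log"


def setup_logging(path=DEFAULT_LOG):
    """Return a logger that writes INFO and above to the given file."""
    logger = logging.getLogger("firstboot")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def main(path=DEFAULT_LOG):
    log = setup_logging(path)
    log.info("do some bootstrapping...")
    # ... host configuration steps would go here ...
    return 0
```

In the actual %firstboot section you would end the script with a call to main().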
Today I spent about 3 hours troubleshooting our kickstart script, searching for mistakes I had made in the script or elsewhere, because the %firstboot section just would not execute. Then I finally checked the kickstart log at /var/log/kickstart.log. God only knows why I wasted 3 hours and multiple test installs before checking the logs; people get stubborn when they want something done, I guess. Never mind. Here’s what I found in the logs:
INFO %firstboot beeing ignored when UEFI secure boot is turned on
Wait, WHAT? Okay, that’s a pretty clear error message. Let’s google that. Indeed, I’m not the only one: in this thread VMware engineer William Lam confirms that this is due to restrictions of the UEFI specification. Great. Another pseudo-security UEFI feature that nobody asked for.
Thankfully, William also provides a workaround: disable Secure Boot at install time and re-enable it afterwards. I verified this procedure and was happy to see my %firstboot script being executed without any issues. This time /var/log/kickstart.log presented me with a much nicer message:
INFO Running /var/lib/vmware/firstboot/001.firstboot_001
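To avoid my 3-hour detour, an automated rollout can check the kickstart log right away. A minimal sketch that classifies the log using the two messages quoted in this post; the matching is deliberately loose (case-insensitive keywords) because the exact wording may differ between ESXi builds, and the function names are hypothetical.

```python
def firstboot_status(log_text):
    """Classify kickstart.log contents as 'ignored', 'ran' or 'unknown'."""
    lines = [line.lower() for line in log_text.splitlines()]
    # Matches e.g. "INFO %firstboot beeing ignored when UEFI secure boot is turned on"
    if any("firstboot" in l and "ignored" in l and "secure boot" in l for l in lines):
        return "ignored"
    # Matches e.g. "INFO Running /var/lib/vmware/firstboot/001.firstboot_001"
    if any("running /var/lib/vmware/firstboot/" in l for l in lines):
        return "ran"
    return "unknown"


def check_log(path="/var/log/kickstart.log"):
    """Read the kickstart log on the host and classify it."""
    with open(path, errors="replace") as f:
        return firstboot_status(f.read())
```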
It’s not perfect: in our case it means a considerably more complicated automated rollout, where we have to disable Secure Boot via the Redfish API and enable it again later (let’s hope you’re not working with hardware templates). Still, it’s the best we’ve got at the moment, and since it’s an issue with the UEFI specification, I doubt we’ll get a better option anytime soon.
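For the Redfish part, the standard DMTF schema exposes a SecureBoot resource under each computer system with a writable SecureBootEnable property. A minimal sketch of toggling it, assuming basic auth and a system ID you already know; the BMC hostname, system ID and credentials are hypothetical, and real BMCs may additionally require an If-Match ETag and typically apply the change only after the next reboot.

```python
import base64
import json
import urllib.request


def secure_boot_request(bmc, system_id, enabled, user, password):
    """Build a Redfish PATCH that sets SecureBootEnable on one system."""
    url = "https://%s/redfish/v1/Systems/%s/SecureBoot" % (bmc, system_id)
    body = json.dumps({"SecureBootEnable": enabled}).encode("utf-8")
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
    )


def set_secure_boot(bmc, system_id, enabled, user, password):
    """Send the PATCH and return the HTTP status code.

    BMC certificates are often self-signed; configure TLS verification
    as your environment requires.
    """
    req = secure_boot_request(bmc, system_id, enabled, user, password)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The rollout then becomes: set_secure_boot(..., False), PXE-boot the scripted install, wait until %firstboot has run, set_secure_boot(..., True) and reboot once more.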