Maybe This Was a Good Week to Stop Sniffing Glue!

Support volume has mostly returned to a normal post-major-OS-release level (there's still a lot), so I've got a little time to talk about one of the reported problems that is, I think, of general interest. General nerdy interest, that is. It also gives you a little insight into what's happening behind the scenes as we make progress towards a GA release.

My APFS Backup Drive Isn't Showing Up in the Boot Menu!

While rare, this problem also occurs with HFS+, and is usually due to a drive that isn't responding correctly at boot time. Working around the issue typically involves attaching the drive after you reach the Option+boot menu: that way, the system and drive get a little more time to talk, and all works out.

But with APFS, a number of users reported that their drive never showed up in the Option+boot menu, even though it appeared in the Startup Disk Preference Pane, and the usual workarounds didn't help.

On top of that, if the user actually booted from the drive (from Recovery, the Startup Disk Preference Pane, or whatever), the drive would show up in Option+boot from then on, even after an Erase-then-copy backup...and even after deleting the various special APFS Preboot, Recovery and VM volumes.

Wait, What? C'mon.

I know! But it's true! And so it took a while to get it to happen in-house. But now that I've figured it out...it makes sense.

Doveryai no Proveryai

Additional investigation showed that you didn't actually have to start up from the drive. You merely had to select it in the Startup Disk Preference Pane. You could then switch back to your original drive without booting, and the drive would now always show up in Option+boot.

Given that, my initial thought (after WTF?) was that there was a new security enhancement at play. Perhaps, with the new "3rd-party applications can't set the startup drive" behavior in mind, Apple had taken another step, forcing users to select a drive as a startup drive using the Startup Disk Preference Pane at least once before it would work from Option+boot.

That sort of made sense, except that once a drive had been selected, it also showed up in Option+boot on other Macs, so there was no actual protection. So that wasn't it.

Schizoid Embolism?

As I mentioned above, once I had a drive that "worked", it would always work, whether Smart Updated or Erase-then-copied. I could even erase the volume with Disk Utility (which makes sense, since that's what SuperDuper! is doing, after all), and it would continue to show up in Option+boot (once a backup was made, of course).

Every one of these tests would take quite a while. Even with a minimal macOS High Sierra install, a test copy from scratch takes about 15 minutes, so each cycle was pretty costly in terms of time.

But, over time, I found that if I turned on Show All Devices in Disk Utility and erased the drive rather than the volume, the bad behavior returned. So, clearly, this was an outside-the-volume issue, but it followed the disk regardless of which Mac it was attached to. And that could only mean one thing.

EFI.

Don't Touch Me There

If you're not building a Hackintosh, you never have to deal with EFI. And while it's made the news lately due to some security issues with older Mac versions, it's not something you ever really hear about.

Basically, EFI stands for Extensible Firmware Interface (currently it's actually UEFI, but most people still say EFI). As that name implies, it's sort of a small operating system in the Mac's firmware (it takes the place of the old PC-style BIOS) that can do stuff like trusted boot, GUID/GPT partitioning, etc.

So, a device can supply programs that run in that environment when attached. And that stuff is stored in a hidden EFI partition on the drive.

For security reasons, normal applications can't touch EFI.

Openly #Blessed

Given that discovery, the next step was verifying that the Startup Disk Preference Pane was using bless to do its thing (it was), and then looking at all the files bless was reading and writing.

Sure enough, one of the files being read (although not written) was /usr/standalone/i386/apfs.efi, and its presence on the drive was not enough.

Time to hit the Open Source repository. (Which is super useful; thanks, Apple, for releasing this stuff, even if it's unbuildable and references private frameworks.)

Analyzing the code there showed that, indeed, bless was embedding an APFS driver into EFI using a private, privileged API that we couldn't (and wouldn't want to) use. Interestingly, it was being done during the processing of --setBoot, the option that actually makes a drive the current startup volume. So there we go!

Don't Do Me Like That (RIP TP)

Except SuperDuper! can't use --setBoot, because it gives an error: only Apple apps can use --setBoot.

Or can it?

The code that embeds apfs.efi into the container's EFI sits outside the block that sets the current startup drive. Which means the embedding happens regardless of whether setting the startup drive fails with an error.
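
To make that ordering concrete, here's a toy model in Python. This is not Apple's code, and the real bless source is structured differently: FakeDisk, set_boot and caller_is_apple are names made up for illustration. The only thing it's meant to show is that the driver gets embedded before the permission check fails.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDisk:
    """Hypothetical stand-in for a backup drive; not a real API."""
    efi_drivers: set = field(default_factory=set)
    is_startup_disk: bool = False


def set_boot(disk: FakeDisk, caller_is_apple: bool) -> int:
    """Toy version of --setBoot handling. Returns 0 on success, 1 on error."""
    # Runs unconditionally: it sits outside the block that sets the startup disk.
    disk.efi_drivers.add("apfs.efi")

    if not caller_is_apple:
        # Third-party callers get an error here...
        return 1

    disk.is_startup_disk = True
    return 0


backup = FakeDisk()
status = set_boot(backup, caller_is_apple=False)
# ...but the driver was embedded anyway, and the startup disk is unchanged.
print(status, sorted(backup.efi_drivers), backup.is_startup_disk)
# prints: 1 ['apfs.efi'] False
```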

So, by using an option that generates an error, --setBoot, we can get the EFI modified as needed. Adding --nextonly helps to minimize any potential side effects, too: it only sets up the next boot rather than making the selection permanent (and even that doesn't actually happen, since it requires privileges we don't have).
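
Roughly, the call looks like the sketch below (in Python for readability; it is an illustration, not what SuperDuper! actually runs). --setBoot and --nextonly are the bless options discussed above. The --folder path, the helper names, and the decision to simply return the exit status are assumptions made for the example.

```python
import subprocess


def build_bless_command(volume_root: str) -> list:
    """Build a bless invocation that is expected to fail for a third-party app."""
    # Assumption: the backup's blessed folder is the usual CoreServices directory.
    folder = volume_root.rstrip("/") + "/System/Library/CoreServices"
    return ["/usr/sbin/bless", "--folder", folder, "--setBoot", "--nextonly"]


def embed_apfs_driver(volume_root: str, run=subprocess.run) -> int:
    """Run bless for its side effect and return the exit status."""
    # check=False: a non-zero exit is expected here, so don't raise on it.
    result = run(build_bless_command(volume_root), check=False)
    return result.returncode


# Only prints the command line; nothing is executed here.
print(" ".join(build_bless_command("/Volumes/Backup")))
```

The point of returning the status instead of raising is that the "failure" is the expected outcome: what we're after is the side effect, not the startup disk selection.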

And, indeed, that solves the problem.

Spock Would Not Be Pleased

I'd argue that embedding the apfs.efi into the container's EFI should be done during the regular bless --folder operation, since the drive really isn't fully blessed without it, but I'm sure Apple had a reason to do it this way, even though it seems...illogical.

But logical or not, the multi-day investigation resulted in a workable fix, which will be in the next Beta, and obviously in the final release of v3.0 as well.

Back to it. Have a good weekend.