The past week's been spent delving into some pretty obscure problems. Special thanks, right at the top, to Jan, who spent a lot of time running special code that fixed some of this stuff. Owe you a beer, Jan.

Also, Beta 4 is linked at the bottom of this post, so if you want to just go there and not read how we got there, well, you won't hurt my feelings. Much. >sniff<

Heading to Entebbe

We had a report from a user that blessing a Thunderbolt drive wasn't working. The symptoms were exactly like the FireWire problem previously reported (see below), which really didn't make sense, given that a Thunderbolt drive should act like a regular SATA device, so it was back to reading a bunch of bless code to try to figure out what was going on.

I think I've figured this one out and, unfortunately, it looks like a bug in bless, at least in one case: RAID volumes.

You may remember that there are special volumes in an APFS container that are used for various purposes. One, Preboot, is responsible for booting tasks. When you bless a regular APFS volume, you're also configuring the Preboot volume in the container to support boot.

Now, one Preboot volume supports all the potentially bootable volumes within a given APFS container (there can be any number of them).

bless, when looking for the Preboot volume, sometimes can't find it, even when it's there. When this situation occurs, if you look at the 'verbose' bless info, you'll see, just before it fails (this is an example from a real user):

Returning booter information dictionary:
<CFBasicHash 0x7f94e2d2e5b0 [0x7fffaa32d5b0]>{type = mutable dict, count = 3, entries =>
    0 : <CFString 0x106669ad0 [0x7fffaa32d5b0]>{contents = "System Partitions"} = (
        disk0s1,
        disk2s1,
        disk3s1
    )
    1 : <CFString 0x10666a2b0 [0x7fffaa32d5b0]>{contents = "Data Partitions"} = (
        disk2s2,
        disk3s2
    )
    2 : <CFString 0x10666a2d0 [0x7fffaa32d5b0]>{contents = "Auxiliary Partitions"} = (
        disk2s3,
        disk3s3
    )
}

In this case, it's not seeing any preboot volume at all. But when we look at the output of diskutil, we can clearly see it's there, and has the right role:

+-- Container disk5 40B6CB66-CB84-4913-9D81-E99117C5118C
====================================================
APFS Container Reference:    disk5
Capacity Ceiling (Size):      750079967232 B (750.1 GB)
Capacity In Use By Volumes:  720822272 B (720.8 MB) (0.1% used)
Capacity Available:          749359144960 B (749.4 GB) (99.9% free)
|
+-< Physical Store disk4 0A426235-14D4-4F80-A334-DBA686914922
|  ---------------------------------------------------------
|  APFS Physical Store Disk:  disk4
|  Size:                      750079967232 B (750.1 GB)
|
+-> Volume disk5s1 EDEBF4F8-D55E-41A4-9B91-4C8284696EDA
|  ---------------------------------------------------
|  APFS Volume Disk (Role):  disk5s1 (No specific role)
|  Name:                      Backup Disk (Case-insensitive)
|  Mount Point:              /Volumes/Backup Disk
|  Capacity Consumed:        933888 B (933.9 KB)
|  Encrypted:                No
|
+-> Volume disk5s2 F8301391-7F37-4827-8189-AF830BA3D59A
|  ---------------------------------------------------
|  APFS Volume Disk (Role):  disk5s2 (Preboot)
|  Name:                      Preboot (Case-insensitive)
|  Mount Point:              Not Mounted
|  Capacity Consumed:        18489344 B (18.5 MB)
|  Encrypted:                No
|
+-> Volume disk5s3 A75B0F60-9626-4F96-9D94-5AD97155838F
|  ---------------------------------------------------
|  APFS Volume Disk (Role):  disk5s3 (Recovery)
|  Name:                      Recovery (Case-insensitive)
|  Mount Point:              Not Mounted
|  Capacity Consumed:        517365760 B (517.4 MB)
|  Encrypted:                No
|
+-> Volume disk5s4 5B344BEC-B85B-4373-97D3-081CEA467854
    ---------------------------------------------------
    APFS Volume Disk (Role):  disk5s4 (VM)
    Name:                      VM (Case-insensitive)
    Mount Point:              Not Mounted
    Capacity Consumed:        20480 B (20.5 KB)
    Encrypted:                No
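
If you'd like to run the same check on your own drive without eyeballing the listing, here's a minimal Python sketch that looks for a volume with the Preboot role in each container. It assumes the plist layout I'd expect from diskutil apfs list -plist: a Containers array whose entries carry ContainerReference and Volumes, with each volume having DeviceIdentifier and Roles. Treat those key names, and the preboot_volumes helper itself, as assumptions for illustration. This is not what bless or SuperDuper! actually does. The sample data just mirrors the listing above.

```python
import plistlib


def preboot_volumes(apfs_plist):
    """Map each container reference to the device identifiers of its Preboot volumes.

    The key names used here ("Containers", "ContainerReference", "Volumes",
    "DeviceIdentifier", "Roles") are assumptions about diskutil's plist output.
    """
    result = {}
    for container in apfs_plist.get("Containers", []):
        ref = container.get("ContainerReference", "unknown")
        result[ref] = [
            volume.get("DeviceIdentifier")
            for volume in container.get("Volumes", [])
            if "Preboot" in volume.get("Roles", [])
        ]
    return result


# Hand-built sample that mirrors the container shown above (not real tool output).
SAMPLE = {
    "Containers": [
        {
            "ContainerReference": "disk5",
            "Volumes": [
                {"DeviceIdentifier": "disk5s1", "Name": "Backup Disk", "Roles": []},
                {"DeviceIdentifier": "disk5s2", "Name": "Preboot", "Roles": ["Preboot"]},
                {"DeviceIdentifier": "disk5s3", "Name": "Recovery", "Roles": ["Recovery"]},
                {"DeviceIdentifier": "disk5s4", "Name": "VM", "Roles": ["VM"]},
            ],
        }
    ]
}

# Round-trip through plistlib to show the same code works on parsed plist bytes.
parsed = plistlib.loads(plistlib.dumps(SAMPLE))
print(preboot_volumes(parsed))
```

On a real Mac you'd feed it the bytes from running diskutil apfs list -plist (via subprocess, say) and pass them through plistlib.loads first. A container that maps to an empty list has no Preboot volume; one that lists a device, like disk5s2 here, has one that bless should have been able to find.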

The code that's having problems is in BLCreateBooterInformationDictionary.c in Apple's Open Source bless project. After some additional investigation, it looks like, in this case, if the APFS container is on an Apple RAID, bless can't find the Preboot volume and doesn't properly set up the container.

I've got one user's specific drive on order so I can test in his exact configuration here.

Of course, this doesn't explain every case we've seen, but at least we think we understand what causes this one.

Re-Fire the Main Course

I dug out a FireWire drive here and created an adapter centipede (USB-C to Thunderbolt, Thunderbolt to FireWire, FireWire to drive) and... I was able to successfully bless and boot from a FireWire drive hosting an APFS volume.

So, while there are some FireWire configurations that bless fails on, it's not a blanket failure. It doesn't look like it's only FireWire RAID drives (some weren't), either. So, we're still investigating.

At this point, I'd generally encourage you to use USB-3/USB-C/Thunderbolt drives for any Macs that support those standards. They're all faster than FW800, have a future (as much as any technology has a future), and work fine.

Connection Required?

We were getting weird intermittent errors on some user systems that, when correlated (an exhausting process, since you have to try to figure out the common elements between a bunch of totally random cases), made no sense: the situations where the copies would fail corresponded to a lack of internet access (whether due to proxies, down connections, down DNS, etc).

What's especially strange about that is that...apart from the version check (and resulting software update, if accepted), we don't access the network. And this was happening to these users at the end of the copy, during the bless action.

Long story short (and thanks to Chuck for running a bunch of tests for me), we use xpath to parse the XML returned by the -plist parameters to various tools (such as diskutil). And that XML has a DTD at the start of it that references apple.com - and xpath would try to fetch that DTD, fail, and return a blank result.

Surprisingly simple fix: delete that line from the XML. No more network access, proper result return, everything's happy.
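
For the curious, here's roughly what that fix looks like, sketched in Python and not in the shell and xpath we actually use. The strip_doctype and plist_string_for_key helpers and the sample XML are made up for illustration. The only thing taken from real life is the shape of Apple's plist DOCTYPE line. Python's ElementTree is here just to show that the cleaned-up text still parses and returns the right value; I'm not claiming it would go fetch the DTD the way xpath did.

```python
import re
import xml.etree.ElementTree as ET

# Matches a simple DOCTYPE declaration like the one at the top of Apple plists.
# Assumption: no internal subset (no "[ ... ]" section), which holds for plists.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>\s*", re.IGNORECASE)


def strip_doctype(xml_text):
    """Return xml_text with its first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub("", xml_text, count=1)


def plist_string_for_key(xml_text, key):
    """Return the text of the value following <key>key</key> in the top-level dict."""
    root = ET.fromstring(strip_doctype(xml_text))
    top = root.find("dict")
    if top is None:
        return None
    children = list(top)
    # A plist dict alternates <key> elements with their value elements.
    for key_elem, value_elem in zip(children[::2], children[1::2]):
        if key_elem.tag == "key" and key_elem.text == key:
            return value_elem.text
    return None


# Hand-written sample in the style of "-plist" output (not real tool output).
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" '
    '"http://www.apple.com/DTDs/PropertyList-1.0.dtd">\n'
    '<plist version="1.0">\n'
    "<dict>\n"
    "  <key>DeviceIdentifier</key>\n"
    "  <string>disk5s2</string>\n"
    "  <key>VolumeName</key>\n"
    "  <string>Preboot</string>\n"
    "</dict>\n"
    "</plist>\n"
)

print(strip_doctype(SAMPLE_XML))
print(plist_string_for_key(SAMPLE_XML, "DeviceIdentifier"))
```

The point is that nothing in the parse depends on the DTD, so dropping the declaration changes nothing about the result. It just removes the one reason the parser had to touch the network.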

Bold and Robust

Due to an extremely high level of coffee consumption, this new beta fixes those and a bunch of other things. So, enough reading about the details: time to get downloading.

Thanks, again, for helping out during this process. It's great to see it's working well for almost everyone, and satisfying to be able to resolve problems for those reporting them. Have at the new release, and let us know what you find!

SuperDuper! 3.0 B4