(I really wanted to embed the teaser trailer for Peter Greenaway's A Zed and Two Noughts above, but alas it's a tad NSFW. Really great movie, though.)

Content warning: some strong language is used in this post. Not without reason.

So, here's something that you might not expect me to say:

Because of the way APFS "file clones" work, no program operating at the file level, including SuperDuper!, can make an exact physical copy of every possible APFS volume.

That's right. There are cases where we can't make an exact copy of your APFS volume. And Time Machine can't either. Nothing can.

That doesn't mean the copy is bad! It just means it might not be as space efficient as the original.

Doomed! Doomed! (Well, maybe not so doomed.)

Remember back in this post where I talked about the demo where Craig showed how fast it was to copy a gigantic amount of data?

I explained back then it was because the files aren't being copied. Rather, APFS creates new directory entries for the files, but references the same data blocks. So nothing is copied, which is fast. This is documented in Apple's APFS Guide.

From the user's perspective, these are different files. They're not like hard linked files, where changing one copy changes the others (not that most users know what hard links are). As far as users are concerned, they're totally separate, even if, at the file system level, they share the same data.

In APFS, if one of the cloned files is changed, even by a single byte, that changed data 'splits off' from the rest, and the files are now physically, and not just logically, separate—some of the data blocks now have two copies: the original ones, and the modified ones.

This process continues as the files diverge further.

The amount of logical drive space taken by the copies is twice the original, of course. However, the amount of actual space taken is, effectively zero...until the files are changed. At which point the space taken is the original plus the number of modified blocks.

This is all handled for you by APFS. You don't really have to think about it.

Quantum Theory?

Until you do have to think about it.

Consider this case: you have a 1TB APFS drive, and three 333GB files, named A, B, and C. So the drive is nearly full.

You then create a folder, and copy the three files into that folder with Finder. Of course, you'd expect the copy to fail and the drive to fill...but it doesn't.

In fact, if you look at the volume's size with Get Info, you may be surprised to see it has the same amount of data on it as was there before you made the copies. But, if you look at Finder's size for the folder, you'll see you now have 2TB of data on a 1TB drive. It's like magic!

At least until you change one of the files.

But now, select those files and folders with Finder and try to copy them to another 1TB drive. What happens?

The drive fills.

A Shitty Analogy

You can't copy it to the same size drive! But why?

The reason is there's no (public) way to find out that two files are actually sharing the same data (they might even only be sharing some of the same data, as I explain above). So, when copied, the "clone" relationship is broken, as is the ten-pounds-of-shit-in-a-five-pound-bag magic. You now have a full ten pounds. It doesn't fit...so you end up covered in shit.

But What If You...

Yes, we know:

  • What if you kept track of checksums of every file on the drive, and then made "clones" for each file based on whether the files had the same data?

    Leaving aside how ungodly slow that would be (think about trying to match ten million files to each other via checksums every time you copied), remember that cloning operates at a block level, where some blocks may be shared and some may not be. At a file level, it just won't work.

  • How about using hard links?

    That won't work either: clones and hard links are not semantically equivalent at all, since changing one of the hard linked files would change all of them, by definition.

  • Just ask the file system!

    While there are APIs to create clones, there's nothing there to find out whether two files are clones... and also, the shared data is at the block level, so still, no.

  • Time Machine does it!

    Well, not really. Time Machine does seem to be able to determine if two files are clones (which I assume it's doing with private APIs, because I can't find any documented APIs to determine if two files are clones). When it backs up cloned files, it uses hard links to represent them (since HFS+ doesn't support clones, and Time Machine can only back up to HFS+ volumes), and when it restores, it checks to see if those files are clones (which it tracks in a special database), and restores them as clones to APFS...unless they're restored to an HFS+ volume, where all bets are off.

    But even in the best case, restoring to APFS, when files get 'separated' when they're changed, again, only the part of the file that was changed is separate. The other blocks are still shared. So even though they've jumped through hoops to maintain the clone relationship, there are lots of cases where Time Machine's own copies will increase in size too, and it happens more and more as the files diverge further.

  • You guys are so smart, you figure it out! Why are you asking me?

    Geez, don't get so defensive!

We're All in this Together

So, as you can see, given the low-level behavior, there's really no solution, even when you're Apple.

What does this mean for you? It means you can get in cases where data that fits on a source drive won't fit on a destination, even when the drives are exactly the same size.

To avoid problems, you need enough space to store the full logical size of the data (that is, with all the "clones" separated) when you copy, unless you're copying the entire container at a sector level.

We Good?

Again, this doesn't mean your backup isn't good! It is! It has all your data!

What it does mean is that the data isn't stored as efficiently on the backup. So, it might not fit on your drive when you back it up. And it also might not fit when you restore, if the backup ends up larger than the capacity of the source.

That's easy to check, and the solution is also pretty easy: have plenty of free space on your drives, folks. It's always been good advice, and given all this hidden behavior that happens with cloned files, it's even more important with APFS.

OK, back to the code mines...