PDA

View Full Version : Hangs while backing up to network mounted sparse image


acronce
09-09-2008, 11:52 AM
Hi all,

I'm trying to back up about 750 GB to a sparse image mounted from an AFP share on a ReadyNAS RAID5 array. The sparse image is encrypted with AES 128, and set with a maximum size of 900 GB. The machine I'm backing up from is running Leopard Server 10.5.4.

I've never successfully backed up the drive. After a couple of attempts, a failure pattern is emerging. Basically what happens is access to the mounted sparse image wedges somehow. When this first happened, Super Duper just hung. It couldn't make any progress, even after several hours.

I was able to quit Super Duper. But access to the sparse image was generally wedged. I couldn't eject the drive. And when I tried to run Disk Utility, it hung at start up. The diskimages-helper application was running and taking a fair amount of the processor (upwards of 20%).

Eventually whatever was holding its lock on the sparse image volume freed up and the volume was ejected. But this was hours later. Note that this happened on a server, so rebooting it during the work day was not an option.

After the above weirdness happened, we rebooted the server at a safe time and tried the backup again. Unfortunately we got another failure. Just after midnight, the Super Duper backup failed again, this time with an I/O error, according to the log.

But again the machine is in that weird state where the mounted disk image volume cannot be ejected, and Disk Utility wedges at launch. Also, when I tried to cd to /Volumes from the terminal, the terminal also wedged.

I've backed up other systems to other sparse image files before and I've never seen anything like this. Although I admit that previous backups of to sparse image files over the network have generally been smaller than 200 GB. And this is the first time that I've used this particular ReadyNAS hardware for this purpose.

I'm tempted to bail on using sparse images for this backup and just backup to a raw share point. Unfortunately I would lose encryption, which is not desirable given this situation.

Any thoughts on what could be going wrong here?

Best regards,
--
Allen Cronce

dnanian
09-09-2008, 01:03 PM
We're just asking the image to mount with hdiutil; I don't know why it would hang the diskimages-helper unless there was something pretty wrong.

Maybe try pointing at the mounted image rather than the file?

acronce
09-09-2008, 04:34 PM
We're just asking the image to mount with hdiutil; I don't know why it would hang the diskimages-helper unless there was something pretty wrong.

Maybe try pointing at the mounted image rather than the file?

Sorry, I wasn't clear enough. In all the failure cases, I specified the mounted image as the Super Duper destination. That means that Super Duper didn't even mount the image, the Finder did.

BTW, I created the sparse image manually using Disk Utility rather than let Super Duper create it. The image file is also AES encrypted, which I think I mentioned.

I've looked at this problem a little more. In both failure cases, the backup seems to get to around the same place before the bad stuff happens. The file that Super Duper is trying to copy, ironically, is an archived Super Duper backup sparse image file that's 160 GB.

Also, I've looked at the logs on the system. If I'm reading this right, it looks like the underlying AFP volume is disappearing right around the time that the backup fails. Here's the log entries (with volume names changed to protect the innocent):

Sep 9 00:00:01 my_server newsyslog[4773]: logfile turned over
Sep 9 00:14:33 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: doing reconnect on /Volumes/afp_share
Sep 9 00:14:33 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: connect to the server /Volumes/afp_share
Sep 9 00:14:38 my_server kernel[0]: AFP_VFS afpfs_Reconnect: posting to KEA EINPROGRESS for /Volumes/afp_share
Sep 9 00:14:38 my_server KernelEventAgent[60]: tid 00000000 received VQ_NOTRESP event (1)
Sep 9 00:14:38 my_server KernelEventAgent[60]: tid 00000000 type 'afpfs', mounted on '/Volumes/afp_share', from 'afp_00btPO00btPO00btPO00btPO-2.2d000005', not responding
Sep 9 00:14:38 my_server loginwindow[59]: 1 server now unresponsive
Sep 9 00:14:38 my_server KernelEventAgent[60]: tid 00000000 found 1 filesystem(s) with problem(s)
Sep 9 00:15:08 my_server loginwindow[59]: 1 server now unresponsive
Sep 9 00:15:33 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: Open Session failed 32 /Volumes/afp_share
Sep 9 00:15:33 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: sleep for 1 seconds and then try again
Sep 9 00:15:34 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: connect to the server /Volumes/afp_share
Sep 9 00:16:35 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: Open Session failed 32 /Volumes/afp_share
Sep 9 00:16:35 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: sleep for 2 seconds and then try again
Sep 9 00:16:37 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: connect to the server /Volumes/afp_share
Sep 9 00:17:37 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: Open Session failed 32 /Volumes/afp_share
Sep 9 00:17:37 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: sleep for 4 seconds and then try again
Sep 9 00:17:41 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: connect to the server /Volumes/afp_share
Sep 9 00:18:41 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: Open Session failed 32 /Volumes/afp_share
Sep 9 00:18:41 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: sleep for 8 seconds and then try again
Sep 9 00:18:49 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: connect to the server /Volumes/afp_share
Sep 9 00:19:49 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: Open Session failed 32 /Volumes/afp_share
Sep 9 00:19:49 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: sleep for 10 seconds and then try again
Sep 9 00:19:59 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: connect to the server /Volumes/afp_share
Sep 9 00:20:22 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: Logging in with uam 7 /Volumes/afp_share
Sep 9 00:20:23 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: Restoring session /Volumes/afp_share
Sep 9 00:20:24 my_server kernel[0]: AFP_VFS afpfs_DoReconnect: primary reconnect failed 5, trying secondary /Volumes/afp_share
Sep 9 00:20:24 my_server kernel[0]: AFP_VFS afpfsCheckOpenFiles: File <my_server backup.sparseimage> can not be reopened due to deny modes or byte range locks (file has exclusive access).
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server KernelEventAgent[60]: tid 00000000 received VQ_NOTRESP event (1)
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server loginwindow[59]: No servers unresponsive
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server /Applications/SuperDuper!.app/Contents/MacOS/SDCopy[2148]: open on /Volumes/my_server backup/backups/Treasury/image/Treasury Backup.sparseimage: Invalid argument
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: jnl: disk2s2: do_jnl_io: strategy err 0x5
Sep 9 00:20:24 my_server SuperDuper![533]: ***ERROR OCCURRED: ditto: /Volumes/my_server backup/backups/Treasury/image/Treasury Backup.sparseimage: Input/output error
Sep 9 00:20:24 my_server kernel[0]: jnl: disk2s2: write_journal_header: error writing the journal header!
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).
Sep 9 00:20:24 my_server kernel[0]:
Sep 9 00:20:24: --- last message repeated 1 time ---
Sep 9 00:20:24 my_server kernel[0]: jnl: disk2s2: do_jnl_io: strategy err 0x5
Sep 9 00:20:24 my_server kernel[0]: jnl: disk2s2: end_transaction: only wrote 0 of 25088 bytes to the journal!
Sep 9 00:20:24 my_server kernel[0]: disk2s2: 0x5 (UNDEFINED).

This has me thinking that either the ReadyNAS has an AFP bug, or that backing up large amounts of data to a volume that's actually a sparse image file on an AFP share is maybe not a good idea in general. It's one thing when the backing store is a local hard drive. The diskimages-helper, which I think "fronts" the volume for the image file, can rely on solid reliability.

But if the backing store is over the network, then there could be big problems if diskimages-helper can't reliably read or write the underlying file data. I can totally see low level file operations get blocked by an incomplete AFP transfer. Especially when we're transferring huge files (like image backups of other systems) to the sparse image volume.

I think I'm going to try backing up to just a simple share point on the ReadyNAS, rather than a mounted sparse image, to see if that makes a difference. Maybe this all comes down to a reliability problem with the ReadyNAS.

If this ends up being a ReadyNAS problem, then I apologize for the diversion.

Best regards,
--
Allen Cronce

dnanian
09-09-2008, 06:17 PM
We're not going to be able to back up directly to a network drive. But I'm using a ReadyNAS to store a pretty large image and haven't had any problems. Perhaps you should try using their SMB sharing for the image?

acronce
09-09-2008, 07:10 PM
We're not going to be able to back up directly to a network drive. But I'm using a ReadyNAS to store a pretty large image and haven't had any problems. Perhaps you should try using their SMB sharing for the image?

OK, I didn't realize that the disk image route was the only way for a network backup. I can try SMB.

BTW, what version of ReadyNAS firmware are you using? I posted on their forum and they pointed me at a new RAIDiator-4.01c1-p2 version that I'm going to try. I guess this is a prelease because it has to be downloaded from their site. It doesn't appear when you try to do a remote update.

Thanks in advance.

Best regards,
--
Allen Cronce

dnanian
09-09-2008, 08:03 PM
I have the currently released firmware (non-beta).

acronce
09-09-2008, 10:18 PM
I have the currently released firmware (non-beta).

Sorry to be a pest, but I have a couple more questions:

1. Are you using an SMB or AFP share for your sparse image backup?

2. What model ReadyNAS do you have?

The reason why I ask is that I was using the currently released firmware and AFP on a ReadyNAS 600. If that configuration works for you and not for me, then something else must be causing the problem.

Best regards,
--
Allen Cronce

dnanian
09-09-2008, 10:50 PM
I'm using AFP, on a ReadyNAS NV.

acronce
09-11-2008, 02:02 PM
I'm using AFP, on a ReadyNAS NV.
I did a manual firmware update to version v4.01c1-p2, which isn't available via their automatic remote update system. That seems to have solved the problem. I've backed up twice with no incident.

Sorry that this turned out to be a ReadyNAS issue.

Best regards,
--
Allen Cronce

dnanian
09-11-2008, 02:25 PM
Great news; thanks for the update, ALlen.