How to set up your Backup Software
Many of the popular backup programs offer a huge number of configuration options, which can be daunting for first-time users. This page covers some of the common configuration options you may want to use for your setup.
Files to Exclude from Backup
In general, you should always exclude automatically generated cache files from your backup set, as these will be regenerated after recovering the source data. Including them in the archive set also has the drawback that they change constantly, so they needlessly inflate your incremental backups.
Therefore, I recommend adding exclusions for the following files / folders:
- Folder Thumbnails
Thumbs.db (Windows XP)
ZbThumbnail.info (Canon ZoomBrowser)
When configuring a backup job, I have added a file name filter to exclude Thumbs.db. When you view a folder containing digital photos in Windows XP, the operating system creates a database of thumbnails for your files in the same folder. Any time that you add files or modify them, the Thumbs.db file will change once you re-open the folder for viewing.
- Adobe Stock Photos:
\My Documents\AdobeStockPhotos\Previous Searches
Some newer Adobe products such as Photoshop ship with a front-end for their searchable stock photo collection. I would recommend excluding the Previous Searches folder, as this set is easily recreated and may be constantly changing. Note that the AdobeStockPhotos directory also includes a Purchased Images folder, so you would not want to exclude the entire directory.
- Captured Video Files
These are all preserved elsewhere on physical tapes (miniDV), and so I already have a degree of redundancy or protection on these files. However, the most significant reason that I don't include these is their sheer size. As I have hundreds of gigabytes of video files in my system (for video editing projects), archiving these is impractical. In my case I keep these video files in their own folder, so I exclude the folder from the backup set.
- System Folders
\System Volume Information
One generally does not want to include system folders or files in a backup, as they do not recover well unless the entire OS is restored. Therefore, I leave these files for my boot drive image backups, which I do separately with a different tool.
- Installed Applications
As I already have original discs for all of my applications, along with a separate compressed version of these packed onto DVD-R, I do not need to include these in my main backup set. Just like for the system files, restoring installed applications from a non-image backup is often fruitless (as it relies on changes to the registry and other locations too), so I don't include these. Note that you should set up all of your software so that it stores configuration data (whenever possible) to a folder other than your boot / program files directories.
- Temporary Files & Caches
\MSOCache (Microsoft Office)
hiberfil.sys (Hibernation Memory Dump)
gobackio.bin (Norton GoBack)
pagefile.sys (Virtual Memory Page File)
The operating system and some programs create temporary directories on your data volume (the drive that you are backing up), so you may need to list these separately as exclusions. Several of the hidden system files listed above can be many gigabytes in size!
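These name and folder filters are simple in principle. As a rough sketch (not tied to any particular backup product), a filter pass over candidate files might look like the following, using the names listed above:

```python
import fnmatch

# Cache-file names discussed above (matched case-insensitively by name)
EXCLUDE_NAMES = ["thumbs.db", "zbthumbnail.info", "hiberfil.sys",
                 "pagefile.sys", "gobackio.bin"]
# Folder names excluded wherever they appear in the path
EXCLUDE_DIRS = ["system volume information", "previous searches", "msocache"]

def is_excluded(path):
    """Return True if the backup job should skip this file."""
    parts = path.lower().replace("\\", "/").split("/")
    name = parts[-1]
    if any(fnmatch.fnmatch(name, pat) for pat in EXCLUDE_NAMES):
        return True
    return any(d in parts[:-1] for d in EXCLUDE_DIRS)
```

A real product would also accept wildcard patterns (e.g. *.tmp), which is why fnmatch is used here rather than a plain name comparison.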
Compression Filters / Exclusions
Enabling compression on your backup set is almost always a good idea. If you are considering a backup mechanism to a remote / offsite location, then this will be all the more important as typical residential internet upload bandwidth is very limited.
Unfortunately, not all file types will compress well. In fact, file types that are already compressed in some form (such as JPEG images, MPEG videos, etc.) may even increase in size after compression. Trying to compress these files is not only a waste of time but could potentially negate the benefit of compressing in the first place. Therefore, the backup software must provide a means by which you can filter out certain file types from compression: all files will be compressed, except for the specified file types.
Typical compression exclusions for archives:
.7z .ace .arc .arj .cab .cdx .gz .gzip .jar .lzh .rar .tgz .zip .zoo
Typical compression exclusions for multimedia:
.avi .crw .cr2 .dng .gif .jpeg .jpg .mov .mp3 .mpeg .mpg .nef .png .swf .tif .tiff .wmv
In your environment, you may have additional file types that would need to be added to the above list.
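If you were implementing this filter yourself, the logic amounts to a simple extension lookup: deflate by default, store the known pre-compressed types. A hypothetical sketch (the function name is illustrative only):

```python
# File types that are already compressed; store them rather than deflate.
# The set mirrors the lists above -- extend it for your own environment.
NO_COMPRESS = {
    ".7z", ".ace", ".arc", ".arj", ".cab", ".cdx", ".gz", ".gzip", ".jar",
    ".lzh", ".rar", ".tgz", ".zip", ".zoo",
    ".avi", ".crw", ".cr2", ".dng", ".gif", ".jpeg", ".jpg", ".mov", ".mp3",
    ".mpeg", ".mpg", ".nef", ".png", ".swf", ".tif", ".tiff", ".wmv",
}

def should_compress(filename):
    """Deflate everything except known pre-compressed file types."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    return ext not in NO_COMPRESS
```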
ZIP Archive Compatibility
As mentioned elsewhere, I am a strong believer in using an industry-standard file format for your backup data. ZIP is widely used for this purpose, and most backup programs elect to rely on it, rather than creating their own proprietary format. But it should be noted that the type of ZIP files that your program creates may not be compatible with all ZIP utilities.
In particular, the original ZIP format does not allow for AES encryption. Therefore, if your backup utility supports this flavor of encryption, it is highly likely that you will not be able to open up the ZIP archives within Windows XP Explorer or WinRAR 3.6. Instead, you will need to use a more recent ZIP utility such as WinZip 9.
Is this practical?
Many backup programs use the industry standard ZIP archive format for storing backup data. By default, these programs often assume that you are going to be archiving a relatively small amount of files, so they generally default to creating a single ZIP archive for the entire backup.
With a 64-bit ZIP engine and a Windows XP operating system, this may mean that backup files can be generated in excess of 4GB! I discovered this the hard way when my first attempt at an 80GB full backup tried to create a single huge ZIP file in C:\Documents and Settings\<user>\Temp. Not everyone has 80GB free on their boot drive! Of course this is completely impractical... Not only does it place excessive demands on your temporary storage, but a file transfer of this size is very unlikely to ever complete successfully.
Therefore, it is strongly suggested that you enable ZIP splitting at much smaller file sizes. In my case, I select 250MB, which offers a good balance between limiting the number of files in the backup directory and keeping transfer times over a remote link reasonable. Enabling the split option will partition the backup set into as many segments as required.
ZIP Splitting and Recovery
When selecting a split size, you should also keep in mind how it will impact recovery operations. If you think that you are likely to use the recovery process frequently, then you must consider how much data download this might entail. For example, if you are backing up to a remote FTP server, your effective bandwidth may be very limited (e.g. 50 KB/sec). If you need to recover a single file from an incremental or full backup set, you will need to download the entire ZIP file that contains the file of interest. In most cases, this will mean downloading a file the size of the ZIP Splitting setting (250MB in my setup). At 50 KB/sec, that may take close to an hour and a half to download across a residential internet connection.
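The arithmetic behind that single-chunk estimate is straightforward, using the 250MB split size and 50 KB/sec figures above:

```python
split_size_kb = 250 * 1024       # one 250MB chunk, in KB
bandwidth_kb_per_sec = 50        # effective remote FTP rate from the example
minutes = split_size_kb / bandwidth_kb_per_sec / 60
print(round(minutes))            # about 85 minutes per chunk
```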
Now consider what you would need to download if you are trying to recover multiple files. If the files happen to be stored within the same ZIP archive (often the case if the files are in the same general folder hierarchy), then no additional download time may be required. But in many cases it could necessitate multiple ZIP archive downloads. In the worst case, a full recovery may not be practical via remote FTP download across a consumer internet connection. For this reason, I will simply pick up the NAS box from the offsite location whenever I need to do a large recovery -- I have this capability since I have a do-it-yourself offsite backup scheme.
ZIP Splitting can be accomplished in one of two ways: abrupt segmentation or independent splits.
Let's consider a backup job that would normally create a 400MB ZIP archive, and you have configured your split size to 250MB. Ideally, this would create two ZIP files, one of 250MB and the other of 150MB. The most basic way to accomplish this is to generate the 400MB file first, then truncate (chop off) at the 250MB mark and place the remainder of the file into another chunk. Because the two chunks will no longer be complete ZIP files, recovery is much more difficult and onerous. Recovery will require all chunks to be downloaded/recovered before they can be stitched together to create the original larger ZIP file, at which point the ZIP decompression and recovery can start. What happens if you have an 80GB backup split across 350 chunks? You will need all 350 chunks present and uncorrupted in order for a third-party ZIP utility to open the file.
With abrupt segmentation, you'll probably find that your backup software has created files that are exactly your split size, with names such as 1_E.z01 ... 1_E.z291. Notice that the file extension is not .zip.
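To recover from abrupt segmentation, every chunk must first be concatenated back into one large ZIP file before any ZIP utility can open it. A sketch of that rejoin step (chunk naming varies by product):

```python
import glob, shutil

def rejoin_chunks(pattern, output_path):
    """Concatenate split-archive chunks (e.g. 1_E.z01, 1_E.z02, ...) back
    into a single ZIP file. Every chunk must be present and uncorrupted
    before the result will open. Note: sorted() assumes zero-padded,
    equal-width chunk numbers; real products may name chunks differently."""
    chunks = sorted(glob.glob(pattern))
    with open(output_path, "wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as part:
                shutil.copyfileobj(part, out)
    return len(chunks)
```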
Because there is always the possibility of data corruption or loss within any data storage, I am not keen on the all-or-nothing characteristic. I would like to be able to download a single chunk (out of the 350) and open it up directly in a standalone ZIP utility. In order to do this, your backup program will need to have the capability of creating Independent Splits.
When Create Independent Splits is enabled, each chunk is a complete, valid ZIP archive. You no longer need to have all chunks in the archive available in order to perform a partial recovery operation. Without this option, a failure in one small part of your backup set can render the entire backup useless!
With independent splits, each file will have a .zip extension, and may not use up all of the split size specified.
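The independent-split concept can be illustrated with Python's standard zipfile module: start a fresh, complete archive whenever the current one passes the size cap. This is only a sketch of the idea, not any vendor's actual algorithm:

```python
import os, zipfile

def backup_independent_splits(files, out_prefix, max_bytes):
    """Pack files into a series of stand-alone ZIP archives. Each output
    is a complete, valid .zip on its own; a new archive is started once
    the current one grows past (roughly) max_bytes."""
    archives, part, archive = [], 1, None
    for path in files:
        if archive is None:
            name = "%s_%03d.zip" % (out_prefix, part)
            archives.append(name)
            archive = zipfile.ZipFile(name, "w", zipfile.ZIP_DEFLATED)
        archive.write(path)
        archive.close()  # flush, so the size check sees real bytes on disk
        if os.path.getsize(archives[-1]) >= max_bytes:
            archive, part = None, part + 1  # roll over to a new archive
        else:
            archive = zipfile.ZipFile(archives[-1], "a", zipfile.ZIP_DEFLATED)
    if archive is not None:
        archive.close()
    return archives
```

Any one of the resulting .zip files can be opened in a standalone ZIP utility without the others being present.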
Unfortunately, my first full backup (90GB) was performed without enabling this option, and this was important enough to me that I discarded my computer's 20 hours of effort and started the process over again!
I confirmed that I could open the archive files in WinZip 9, but they could not be opened in WinRAR 3.6 or Windows XP.
Note that you will need to explicitly turn on the split option. In my case, I changed it to 250MB chunks, which are manageable: the job only consumes 250MB in the temp folder at a time (the space is reused), and smaller chunks are less demanding on connection stability.
To get the full benefit of 128-bit encryption, it is suggested that the key be 32 characters long, 48 characters for 192-bit and 64 for 256-bit. Unless you are hyper-paranoid, shorter passwords should be fine.
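Those character counts follow from assuming roughly 4 bits of entropy per character, as with random hexadecimal digits:

```python
# Random hexadecimal characters carry 4 bits of entropy each, so a key
# of N bits calls for N/4 such characters.
for key_bits in (128, 192, 256):
    print(key_bits, "bits ->", key_bits // 4, "characters")
```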
Incremental Backup Comparison Criteria
When performing an incremental backup, many programs simply compare against the archive bit. I have written elsewhere about why this method is bad; the main points are that several programs may touch this bit, and that it offers no flexibility, as it is either set or cleared.
The better method is to use a catalog, and compare certain recorded file characteristics against those in the current view of the filesystem. These characteristics include modified time & date, size and possibly CRC32.
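A minimal sketch of the catalog approach: record the size, modified time, and (optionally) CRC32 of each file, and select a file for the next incremental only when the values on disk no longer match. Real products use their own catalog formats; this is purely illustrative:

```python
import os, zlib

def file_signature(path, with_crc=False):
    """The characteristics a catalog records for change detection."""
    st = os.stat(path)
    sig = {"size": st.st_size, "mtime": int(st.st_mtime)}
    if with_crc:
        crc = 0
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(65536), b""):
                crc = zlib.crc32(block, crc)
        sig["crc32"] = crc & 0xFFFFFFFF
    return sig

def needs_backup(path, catalog, with_crc=False):
    """A file joins the incremental set when it is new to the catalog or
    its recorded characteristics no longer match the filesystem."""
    return catalog.get(path) != file_signature(path, with_crc)
```

The CRC32 option matters for the IPTC case below: a metadata edit that preserves both size and modified time will only be caught by the checksum comparison.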
IPTC and Incremental Backups
In general, the time & date comparisons are sufficient for incremental backup criteria since most programs that modify files also update the last modified date timestamp. Unfortunately, some programs (such as photo catalog software / image browsers) can allow you to make modifications to the metadata (e.g. IPTC fields) without actually changing any other file characteristics.
IPTC metadata update didn't affect file size or last modified time
If you are updating the IPTC metadata within your photo catalog, you may need to run your backup program using CRC32 as a comparison criterion instead, or change the options within your image browsing software.
For more details, please see my article on Incremental Backup Methods.
Email Notification
If you are going to enable email notification (after completion of a backup job), it is worth enabling compression of the backup log (if it contains the list of files backed up) and using SSL for SMTP authentication. SMTP authentication via SSL ensures that your mail server password won't be sniffed en route.
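If your backup program instead lets you script the notification step, the same principle looks like this with Python's smtplib: connect over SSL so the credentials never cross the wire in the clear. The server name, addresses and credentials below are placeholders:

```python
import smtplib
from email.message import EmailMessage

def build_report(body, sender="backup@example.com", to="admin@example.com"):
    """Assemble the notification message (body would be the backup log)."""
    msg = EmailMessage()
    msg["Subject"] = "Backup job completed"
    msg["From"] = sender
    msg["To"] = to
    msg.set_content(body)
    return msg

def send_report(msg, host="mail.example.com", port=465):
    """SMTP over SSL: the login exchange with the mail server is encrypted,
    so the password cannot be sniffed en route."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login("username", "password")  # placeholder credentials
        server.send_message(msg)
```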
Online Backups & Bandwidth Throttling
If you are planning to use an online backup via FTP, then you will need to consider how this process might interfere with your normal ability to use your internet connection. A file transfer via FTP may swamp your web surfing and cause you to experience very long latencies / response times. This can be frustrating -- fortunately there is something you can do about it.
- Bandwidth Throttling in Software
Most programs that offer FTP backups also allow you to specify a maximum KB/sec for uploads and downloads. While this will reduce the network consumption of your backup program, it is less than ideal: when you are running your job unattended (overnight), it is desirable to operate at full rate, and it's only when you want to use the computer at the same time that you'd like to reduce the bandwidth consumption. This feature, however, is useful if you want to reduce your demands on the remote server / router (i.e. your friends!).
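Under the hood, software throttling typically just paces writes against a target rate. A simplified sketch of that pacing loop:

```python
import time

def throttled_copy(src, dst, max_kb_per_sec, chunk_size=8192):
    """Copy src to dst, sleeping whenever the transfer runs ahead of the
    configured KB/sec ceiling -- the essence of a software throttle."""
    start, sent = time.monotonic(), 0
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        dst.write(block)
        sent += len(block)
        expected = sent / (max_kb_per_sec * 1024.0)  # seconds this should take
        elapsed = time.monotonic() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
    return sent
```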
- Bandwidth Throttling in the Router
I much prefer a method that gives full performance to the backup software when you are not using the computer, but limits its demands as soon as you start using the internet for other activities. For this reason, I have configured my router to use QoS (Quality of Service) settings that give FTP transfers a lower priority than all other internet traffic. This can be done easily on many Linksys routers by adding port 21 to the QoS section (found under Applications & Gaming -> QoS). Make sure you set the Enable radio button at the top and then enable Optimize Gaming Applications with the settings shown below.
Enabling FTP throttling in Linksys Router
FTP Download for Website Administrators
If you are a website administrator, it is highly likely that your site runs on a remote server. You may have huge mySQL databases that provide much of your site's content, and of course these may be vulnerable to loss. While most backup programs don't offer FTP download directly, many can accomplish it if they support command-line script invocation: simply set your "before backup" command-line action to run an FTP script.
This simply runs an FTP script that executes a series of commands to a command-line FTP session. For example, the following ftp_dload.ftp script will log on to your website with username "username", password "password" and then download a file. The downloaded file(s) can then be copied over in your normal backup job. In addition, I have configured the website server to run a cron job that extracts and compresses the mySQL data.
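A script of this kind, fed to the Windows command-line FTP client with its script option (e.g. `ftp -s:ftp_dload.ftp`), might look like the following; the host name and file path are placeholders, and the second and third lines answer the login prompts:

```
open www.example.com
username
password
binary
get backups/db_dump.sql.gz
bye
```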