
Archiving Digital Photos

The following section details the extremely important issue of archiving. In the first part, I detail the reasoning and strategy. Then, I go on to discuss the automated workflow I use to ensure I don't have to worry about my hard-earned photo collection.

Methodology used to archive digital photo collection

My Previous Backup Method

  • Every day at 6:00 AM, my photo collection is mirrored onto another physical hard drive. The likelihood of two drives failing at the same time is low, so this ensures that I always have at least a recent backup to work from.
  • In addition, the entire collection is archived by incremental backup to DVD-R on Mondays and Fridays at 8:00 PM. If I leave a DVD-R in the drive, this backup runs automatically as well.
  • Two copies of the backup data are burned to DVD-R. One set is kept at home, and the other is kept at a remote location.
  • Every month or so, I perform a full or differential backup to DVD-R. Instead of relying on a large number of incremental backup sets, I prefer to restart the backup from a clean copy every so often.
  • At least one old backup set (the full backup plus any differential and incremental discs) is kept at a remote location, and earlier sets are destroyed.

The collection is defined as my entire hierarchy of photos in addition to the IMatch database (which contains all of the categories and tags).

My Current Backup Method

I had long searched for an affordable way to protect my collection without the hassle of burning DVDs and then transporting them off-site. The manual transport is the problematic part, as it is all too easy to be lazy and put it off. So, I created my own offsite backup for free! This has been a fantastic improvement, both in automation and peace of mind.

  • Every night, my photo collection is copied via FTP to an offsite hard drive automatically. This backup is incremental and preserves old version sets of photos and data. By offsite, I mean far away from my home!
  • The same process is performed on my documents, website and other files of importance.
  • All files are protected by encryption with AES before backup.
  • Once every month or so I do a full backup to an external drive.

The beauty of this setup is that it is completely automatic. I don't need to be involved in the process anymore (so I won't get lazy and forget), plus it protects me from disasters around the home (where onsite hard drives or DVDs could get damaged / stolen).

Please have a look at my article on FTP Offsite Backup. Note that I do not pay for a service -- this simply uses my own hard drive (NAS) stored at another location and configured appropriately.
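
To make the idea of an incremental backup that preserves old versions more concrete, here is a minimal sketch. It is not the FTP-based setup described above: it copies between two local directories, and the function name `backup_with_versions` and the `_versions` folder are hypothetical names chosen for illustration. It also assumes that a file counts as "changed" when its size differs or the source is newer than the backup copy.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_with_versions(source: Path, dest: Path, stamp: Optional[str] = None) -> list:
    """Copy new or changed files from source to dest.

    Before a changed file is overwritten, the old copy in dest is moved to
    dest/_versions/<stamp>/ so that earlier versions stay recoverable.
    Returns the relative paths of the files that were copied.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    versions = dest / "_versions" / stamp  # hypothetical layout for old versions
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        if dst_file.exists():
            src_stat, dst_stat = src_file.stat(), dst_file.stat()
            # Assumption: same size and a source that is not newer means "unchanged".
            unchanged = (src_stat.st_size == dst_stat.st_size
                         and int(src_stat.st_mtime) <= int(dst_stat.st_mtime))
            if unchanged:
                continue
            # Preserve the previous copy before it is overwritten.
            old = versions / rel
            old.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(dst_file), str(old))
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also keeps the modification time
        copied.append(rel)
    return copied
```

Calling `backup_with_versions(Path("photos"), Path("backup"))` once per night would copy only what changed since the previous run, while anything it replaces remains available under a dated `_versions` folder.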

Both backup processes also involve verification, reading back the data to ensure that everything copied to the medium without error.
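
As a rough illustration of what read-back verification involves, the sketch below compares every source file against its copy by reading both in full. Comparing SHA-256 digests is my assumption here; backup programs may use a byte-for-byte comparison or a different checksum. The names `file_digest` and `verify_copy` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, dest: Path) -> list:
    """Return relative paths of source files that are missing from dest or differ from it."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        # A file fails verification if the copy is absent or its contents differ.
        if not dst_file.is_file() or file_digest(src_file) != file_digest(dst_file):
            problems.append(rel)
    return problems
```

An empty list from `verify_copy` means every source file was found on the backup medium with identical contents.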

Backups: Full, Differential, Incremental, Drive Image and Mirror

Nearly all backup programs can be classified into two variants: those that create drive images and those that create file-based backups.

The difference in backup styles is important to understand as it has implications on how a photo catalog can be archived. The following summarizes the main differences between each style:

  • Drive Image
    Imaging is a process whereby an exact duplicate of the contents of a drive is created on another drive (or spanned across removable media). It should be noted that imaging is almost always done at the disk level, not at the individual file level. Therefore, with imaging, you generally don't have any options with regard to which directories or files you want to archive. This is a fundamental point, as it means that you must copy the entire drive, and cannot ignore certain directories or file types.
  • Full Backup
    Unlike imaging, a full backup is almost always file-based, and it copies a selection of the drive's directory hierarchy. It also usually supports exclusion rules (e.g. not copying over the Thumbs.db files), and the archive set can span multiple drives (e.g. some directories from drive C: and others from E:). A full backup copies over all of the files in the backup set, and it pays no attention to modification dates, archive bits, etc.
  • Differential Backup
    Differential backups copy over the files that have changed since the last full backup. To see why this is useful, consider what happens after a number of incremental backups: the backup data ends up spread across a large number of incremental sets in addition to the last full backup. Recovering the full set of files might then involve a large number of backup sets or discs, with the disadvantage that a failure in one disc might derail the entire restore process. When a differential set is created, all of these incremental discs are essentially "thrown out" and replaced by a single fresh set containing every change since the last full backup. If not many of the original files have changed, it's worth performing differential backups periodically. If a large percentage of the source files have changed, it is better to recreate the full backup set instead.
  • Incremental Backup
    Incremental backups copy over the files that have changed since the last backup, whether it was a full, differential or incremental backup. This has the advantage of creating the smallest and fastest backup to keep "up-to-date". Over time, it has the disadvantage of distributing the new or changed files over many backup sets or discs. An ideal strategy is to use an incremental set regularly and then a full or differential periodically.
  • Mirroring
    Mirrors simply keep a synchronized / up-to-date copy of a specified file hierarchy across multiple drives. A change on one drive (usually the master or source) is reflected in the destination. Some programs automatically monitor the source directory for changes; others need to be invoked periodically or scheduled. Unlike normal backups, mirroring has the disadvantage that corruption in the original set (the source) will eventually be copied over to the mirror destination set. A smarter mirror system will take an additional step with changed files and create a subdirectory to preserve the history of changes to each file. See the section on changesets.
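
The difference between full, differential and incremental backups comes down to which reference point is used when deciding whether a file needs to be copied. The following sketch shows that selection step only. It assumes that a file's modification time is compared against the time of the last full backup or the last backup of any kind; real backup programs may rely on archive bits or their own catalog instead. The function name `select_files` and its parameters are hypothetical.

```python
from pathlib import Path


def select_files(root: Path, mode: str,
                 last_full: float = 0.0, last_backup: float = 0.0) -> list:
    """Return the relative paths a backup run of the given mode would copy.

    mode        -- "full", "differential" or "incremental"
    last_full   -- time of the last full backup (seconds since the epoch)
    last_backup -- time of the most recent backup of any kind
    """
    if mode == "full":
        cutoff = None          # copy everything, ignore modification dates
    elif mode == "differential":
        cutoff = last_full     # everything changed since the last full backup
    elif mode == "incremental":
        cutoff = last_backup   # only what changed since the most recent backup
    else:
        raise ValueError(f"unknown backup mode: {mode!r}")

    selected = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if cutoff is None or path.stat().st_mtime > cutoff:
            selected.append(path.relative_to(root))
    return selected
```

Because a differential run always measures from the last full backup, each differential set supersedes the previous one, whereas each incremental set only adds to the chain that a restore has to walk through.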

Backup storage: Standard or Proprietary

Coming soon...


Coming soon:

  • Ultimate archiving strategies: RAID
  • Offsite FTP
  • Mirroring with a smart twist: changesets
  • Review: XXCopy
  • Media: CD-R, DVD-R, archival quality? Media lifespan?

 


Reader's Comments:

Please leave your comments or suggestions below!
2013-08-25 Michael
 Hi Calvin,

First off: your website is a true goldmine. I've been snooping around quite a bit the past couple of weeks. You might even get me interested in freediving or RC helis ;)

I am currently designing my own workflow and backup strategy for my digital photos. I would like to throw DVD-Rs in the mix for backup - just in case an EMP-bomb hits my part of the world and erases all harddisks. Do you have any suggestions on how to keep track of which file is on which dvd? Should I put the DVD number in the metadata of the file? Should I use some kind of catalog for this? I'm looking forward to reading your thoughts on this.
 Thanks! If you are looking to incorporate offline media (such as DVD-Rs), then I think the easiest approach is probably to leverage the capabilities of a photo catalog program, as the centralized database will keep track of which files are archived and on which media. However, if you want to approach this without a catalog database, then I think you'll want to minimize the amount of effort involved, as otherwise your methodology will undoubtedly become inconsistent once laziness sets in. Therefore, I would rule out the metadata approach. Instead, I would opt for the simpler approach of labeling the discs with approximate date ranges (write the start date, and only add the end date once the disc is full and finalized). On the disc, I would retain the same dated folder structure. This way you can easily find the disc associated with a particular date. The downside appears if you update a set of photos from an older date. To allow for that, you might consider not completely filling each disc: leave yourself a good half gigabyte and don't finalize the disc before moving on to the next one. Good luck!
2007-09-09 jorrit
 Hi,

Thanks for a very well written and exhaustive website. I stumbled upon it yesterday when I found out that my current backup program (GBM) claimed to have backed-up some of my crucial folders like my desktop, but 'forgot' to actually do it. :-0

The setup I have at home reflects somewhat your setup, with a few minor changes that make my life a bit more challenging, and I still feel I haven't fully cracked the puzzle yet.

My setup:
  • I work on a PC (laptop) where I do most of the work. I make a daily backup of the user files to the server.
  • In the background, I have a server running with a RAID 5 configuration.
    Unfortunately, the laptop hard disk is not large enough for all my past projects, so I only store the most recent versions on laptop, and the older projects are stored on the raid configuration only.
Now, saving the data on the raid does help in the case of a crashing hard disk, but doesn't address the issue about the single physical location. The other problem I have is of versioning: I work a lot with Adobe software, and their CS suite has a nasty habit of corrupting files while saving.

So, after implementing the RAID, the problem is now restated for me in:
  • how can I implement a system that allows me to easily go back to a previous version?
  • How can I backup the laptop including the versioning information to the RAID?
  • How can I backup the RAID (containing the backup of the laptop and the information that is stored on the RAID only) to a remote location while maintaining versioning information?
I haven't so far seen any articles addressing this issue, so I am wondering if I'm making life too difficult. Do I need to backup the "versioning information" at all if I actually have a daily backup? Should I simply mirror my laptop to the raid and make a daily backup from that?

Any suggestions or links or ideas are appreciated!
 I think your needs are fairly typical -- the problem is that the solution isn't always addressed completely by a single product.

Looking at your requirements, the need to "roll back" or "rewind" versions of the same file means that straight mirroring will not be sufficient. If there is a chance that you may corrupt your working file, or accidentally resave it, the mirrored version will be equally useless down the road.

Therefore, you will need additional backup software (or scripts) that preserve version sets. A regular incremental backup is what you need, not a true mirror. Yes, this will cost you extra storage space, but this can always be "purged" later on (by doing a full backup). I have outlined a couple of backup utilities with versioning.

I would recommend that you keep the incremental versioned sets in an offsite location. If you choose to do this the old way (back up to CD / DVD and then move the discs to another location), you just need to ensure that the backup program lets you recover an older incremental copy (i.e. a previous version). I much prefer automation here, so I use an automatic offsite backup to a NAS for this purpose. This automated backup uses encryption, preserves versions and solves the danger of onsite storage.

Once you have automated your offsite incremental scheme, then you are free to add your local mirroring (e.g. via RAID5) to provide some fast redundancy locally.

In answering your very last question, yes, I would simply make a daily incremental versioned backup from your RAID. You don't need to worry about the versioning information as this is what the backup software's database/catalog retains for you. Note that your backup utility will often place a copy of this catalog in the offsite location as well, so that you don't run the risk of losing it.

Hope this helps!
2005-07-31 
 

The subject of this article is archiving but what you explain is back up. I cannot figure why someone cannot create the software we all need. It needs to automatically copy the photos to thumbnail images, allow us to put descriptions to each photo or globally, then back up the master images to cd and set an index number on the cd.

Later when we search by keywords for our photos it will tell us which CD to insert to pull up the master image. Doesn't sound that tough but cannot find this software anywhere.

 

Perhaps I've used the term archive in a looser context. Yes, what I'm primarily talking about is backup. However, what you are describing can be done quite easily by many catalog programs on the market today. Many of them support offline images, which involves creating thumbnails, tagging them in a database and then marking the database with a unique identifier for the media for later retrieval.

Although some of these catalog apps don't actually do the burning process themselves, one generally just needs to insert the burned disk to have the catalog app take care of the rest. Trying to write a catalog application to do all these steps (and do them all well) is not easy, as it is clear that even just the catalog features are hard enough to get right, let alone add in the variable of burning to removable media. Perhaps more will become available that offer a good mix of all features integrated together.

2005-07-12 Justin
 

I was going to comment that hard drives can indeed be a viable backup solution if you implement RAID, but I see you have that planned as a topic already.

A couple of advantages to hard drives:

  1. They've been around for a long time and haven't changed all that much. Compare this to CDs, which are already near-obsolete, and even DVDs, which are starting to face competition from HD-DVDs. What if you had started using Iomega Jaz discs for backup a few years ago... better hope you can find a drive to read them! Now I'm not saying that suddenly we'll have drives that won't read a standard CD-R, but hard drive technology seems to move a bit more slowly.
  2. Per GB it's incredibly cheap. This makes it easy to implement things like RAID, and to have multiple drives with the same content on them in different locations. I have a couple of portable Firewire drives which I bring in once a week or so to backup everything onto and then keep them offsite.
  3. Since HD's are larger than media like DVD's, you don't have to shuffle around as many things. This might be a minor niggle but personally I prefer to have more stuff stored in one place (of course that means it's even more important to have more than one copy of that drive, in case one should fail). It also means less work in your catalog app, as you can get a lot more on just one volume.

I can definitely see where you are coming from with regards to hard drives, but I wouldn't count them out entirely. Implemented properly I think they can be a very valid solution.

 

Justin -- some great points... RAID is certainly a solution that I have been contemplating seriously, and the benefits that you mention are definitely significant. From my perspective, I think your point #3 is the one that might offer the biggest draw for most people. The more you can automate the process (remove the human element), the more likely you are to have a thorough and recent backup that leaves you unscathed. It doesn't take long before one's image collection outgrows what can reasonably be archived by hand (i.e. to removable media). Having a terabyte of storage at home, I'm currently experimenting with a few different approaches.

 
2005-01-11 Dustin
 

Going old school for a moment. I am in the process of archiving an entire library of negatives and contact sheets. The question I have is, how do people store these combinations? I'm looking for a forum that talks about how others do it. There are a lot of options and different opinions regarding this subject.

 

