Archiving for posterity and profit
By Andy McCourt
Once a digital image is captured – either by camera or scanner – it can live a variety of lives in different media but at some stage it needs to be archived and indexed for future use. Much debate abounds about the best method but, as Andy McCourt discusses here, some are more archival than others.
There are two archiving issues facing those who reproduce images in a professional capacity – storing and retrieving the image files and longevity of the finished result. The first is distinctly a digital issue, the latter an all-analogue affair, which we will discuss in the next issue of Digital Reproduction.
It is this unique nexus, where digital (data) meets analogue (paper, canvas etc), that makes the digital reproduction business so exciting. Since the dawn of time, humankind has sought ways to immortalise its art, records and creative output. Until about thirty years ago, this was all accomplished by committing images to a substrate (paper, vellum, film, papyrus or even cave walls), and then storing them. It is impossible to estimate the amount of precious recorded history and art that has been lost due to improper archiving.
Where professional photography and art reproduction are concerned, the digital age has brought both benefits and drawbacks. Film has long been the archive medium for these images, with negatives and transparencies being carefully stored away in acid-free sleeves, in air-tight containers and in low-humidity rooms. Even so, many a film archive or canister has been opened after years, only to reveal crumbling fragments of gelatin, silver and cellulose.
Fortunately, the art of making copies has protected the memory of many important images, and the digital age has facilitated this with its many archiving formats. But which ones are best for long-term, bullet-proof archiving of important image files?
LOCKING AWAY THE DATA
We all know that a digital image is data at its basic level: a set of instructions to the programme that opens it up and ‘re-assembles’ the file close to or exactly as it was captured. Some file formats, such as JPEG, compress the data in a lossy way, so detail discarded during compression cannot be recovered when the file is decompressed. For this reason, most professionals prefer to shoot RAW image data and deal with large image files for archiving. For example, the Hasselblad H2D-39 delivers 70MB uncompressed RAW files in 16-bit colour depth, or 8-bit 115MB TIFF files. Most professionals are moving towards RAW files exported directly to Adobe’s DNG (digital negative) format.
The initial capture is usually to an in-camera Compact Flash, SD, xD or similar storage card (unless the camera is tethered to a computer in a studio), or to a computer hard disk in the case of a scanned image. Whilst it is feasible to use professional-grade CF type cards from SanDisk, Lexar, Kingston and Seagate as an archive source, it is an expensive way to go about it, as 8GB cards cost around one thousand dollars. There are intermediate ‘transport’ stages of archiving such as hand-held viewers with internal storage and USB connection – Jobo and Epson make these handy devices, for instance. Then there are the myriad ‘stick’ storage devices that go up to a couple of gig and are small enough to hang around your neck. Often, however, the files are copied across to CD or DVD in the belief that this has protected the images forever.
DVD – DATA VANISHED = DISASTER!
Unfortunately, this is an assumption and one that could prove costly in years to come. CDs and DVDs may be cheap but are prone to read/write errors, and the reflective aluminium layer can oxidise over time, corrupting the data or rendering the discs unreadable. Some cheaper CDs and DVDs have been known to ‘fall apart’ after only 2 or 3 years. The better ones – such as Kodak Gold – are made to much higher standards and do indeed contain gold, so they are much more expensive. But even these should not be viewed as permanent archive storage in isolation from other forms of back-up.
In any case, even dual-layer DVDs hold only 8-9GB, or roughly 120 RAW files in the 70MB region. There is a new type of disc coming, though, called Blu-ray (as it uses a blue-violet laser instead of red), and this is said to hold up to 25GB of data, but its archive capabilities are not known at this stage.
Hard drives are a better option and these come in many shapes and sizes. USB hard drives can be bought relatively cheaply these days, from manufacturers such as Seagate, LaCie and others. My computer has a 40GB Seagate USB hard drive plugged into it as backup and its cost was around $200. One-terabyte (TB) HDDs are now available for about $1,000 and this is the equivalent of about 120 dual-layer DVDs, with much better access to the files. A busy professional photographer shooting 1,000 images a month could easily fill 1TB in a year.
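The capacity comparisons above are simple arithmetic, and a rough sketch makes them easy to rerun with your own figures. The 70MB file size and 1,000 images a month come from this article; the 8.5GB dual-layer DVD capacity and the decimal units are assumptions made for illustration.

```python
# Decimal units, as storage vendors usually quote them (an assumption).
MB = 10**6
GB = 10**9
TB = 10**12

RAW_FILE_BYTES = 70 * MB      # per-image figure quoted in the article
DVD_DL_BYTES = 8_500 * MB     # assumed nominal dual-layer DVD capacity (8.5GB)
HDD_BYTES = 1 * TB            # the 1TB hard drive discussed above
IMAGES_PER_MONTH = 1_000      # the busy photographer in the article


def files_that_fit(capacity_bytes, file_bytes=RAW_FILE_BYTES):
    """Whole number of files of a given size that fit on a medium."""
    return capacity_bytes // file_bytes


def months_to_fill(capacity_bytes, images_per_month=IMAGES_PER_MONTH,
                   file_bytes=RAW_FILE_BYTES):
    """Months of shooting needed to fill a medium."""
    return capacity_bytes / (images_per_month * file_bytes)


print(files_that_fit(DVD_DL_BYTES))          # RAW files per dual-layer DVD
print(round(HDD_BYTES / DVD_DL_BYTES))       # dual-layer DVDs per 1TB drive
print(round(months_to_fill(HDD_BYTES), 1))   # months to fill the 1TB drive
```

Under these assumptions a 1TB drive fills in a little over a year of shooting; larger files or a busier month shorten that quickly.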
But of course, even hard drives can crash, so it’s always a good idea to back up mission-critical data. Damon Rulach from Hasselblad distributor CR Kennedy advocates magnetic tape drives as the most robust method of archiving or backing up files: “Tape cartridges can hold up to 40GB and last up to 100 years,” he told the audience at his July ADAPT workshop. Sony’s tape drive costs around $1,500 and tapes about $90 each. Of course, if archiving tapes for years, it’s equally important to keep the reading device, as it may no longer be manufactured in years to come! It’s like Beta versus VHS in the old video tape days. Try finding a Beta player today!
RAIDING THE DATA VAULT
There is another method of archiving important data, straight out of the high-end server-based computing used in publishing houses and prepress departments where they deal with thousands of images daily. It’s called RAID, for ‘Redundant Array of Independent Disks.’ It is basically a stack of HDDs (hard drives) – anything from 3 to dozens – all interlinked and governed by the RAID software. RAID can be built into your desktop computer, set up as a stand-alone unit or even added as a plug-in box.
The beauty of RAID is the way it stores the data. It is ‘striped’ across all the drives so no single HDD is responsible for all the data, and there is a redundant drive or drives always available. If and when one of the drives fails, the RAID system sends out an alert, the crashed drive is replaced and – this is the magic part – all the data is rebuilt; nothing is lost. Not only that, but it does this in real time. If you have a RAID server with, say, 3 Mac operators working on images and a drive crashes, it rebuilds the data up to the point of the crash – all work in progress as well as the image library is saved.
How does RAID do this? It’s all in the RAID algorithms and probably best left up to the boffins who, as one wag put it, are ‘Raiders of the lost Quark.’ RAID is a click-on option when buying Dell computers online these days.
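For the curious, one widely used redundancy scheme is XOR parity, the idea behind RAID levels such as RAID 5. The sketch below is a simplified illustration only: it keeps the parity on a single extra ‘drive’ (real RAID 5 rotates it around the array), treats drives as byte strings in memory, and the function names are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data, n_data_drives):
    """Split data into one block per data drive and append a parity block."""
    block_len = -(-len(data) // n_data_drives)            # ceiling division
    padded = data.ljust(block_len * n_data_drives, b"\0")  # pad to a full stripe
    blocks = [padded[i * block_len:(i + 1) * block_len]
              for i in range(n_data_drives)]
    return blocks + [xor_blocks(blocks)]                   # last block is parity


def rebuild(stripe, failed_index):
    """Recreate the block of one failed drive from all the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


image = b"RAW image data worth keeping"
stripe = make_stripe(image, n_data_drives=3)   # 3 data drives + 1 parity drive

stripe[1] = None                               # drive 1 crashes
stripe[1] = rebuild(stripe, failed_index=1)    # replacement drive is rebuilt

recovered = b"".join(stripe[:3]).rstrip(b"\0")
print(recovered == image)
```

Because XOR-ing all the surviving blocks cancels out everything except the missing one, any single drive in the stripe can be lost and rebuilt. Losing two at once is beyond this simple scheme.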
So, the most bullet-proof method of storing valuable digital image files would be a RAID server, together with tape back-up where the tapes are stored in a fire-proof safe. For most of us, HDD storage with back-up will suffice, but don’t rely on copies to CDs and DVDs alone.
ARCHIVE ON THE INTERNET?
There is one more method for the discerning professional and that is ‘Data Silos’ or digital repositories where files are stored and managed by a third party who charges a fee for the service. Australian company Recall is one such firm and, although their main market is corporate and financial data, they could easily handle important image files too.
Of course, the internet itself is like a huge data repository and it is conceivable that images can be in a perpetual state of ‘storage’ in cyberspace. However, accessing big files is still slow even with broadband and, with ubiquitous access, copyright becomes an issue if original files are available. Google now offers a ‘let us look after your pictures’ service but this is aimed mostly at consumer markets – at this stage anyway.
One commentator estimates that there are about 20 Exabytes of information out there – and enough storage to keep it all. How much is an Exabyte? Well, after Megabytes of course we have the Gigabyte, which is 1,000 times greater. Going up the scale, with each stage 1,000 times greater than the previous one, it goes… Terabyte, Petabyte, Exabyte, Zettabyte then Yottabyte. Google is said to have been named after a mathematical term for a 1 followed by 100 zeros – a ‘googol.’ For those seeking Greek etymology: sorry – the story goes that a top mathematician showed this huge number to his nine-year-old nephew and asked ‘what’s this?’ ‘Googol’ was the answer!
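To put those prefixes in perspective, here is a small sketch of the scale, using the decimal steps of 1,000 described above. The 70MB RAW file size is borrowed from earlier in the article, and the names in the code are illustrative only.

```python
# Each prefix is 1,000 times the previous one (decimal convention).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def bytes_in(prefix):
    """Number of bytes in one <prefix>byte, e.g. bytes_in('giga') == 10**9."""
    return 1000 ** (PREFIXES.index(prefix) + 1)


GOOGOL = 10 ** 100   # a 1 followed by 100 zeros

for position, prefix in enumerate(PREFIXES, start=1):
    print(f"1 {prefix}byte = 10^{3 * position} bytes")

# How many 70MB RAW files would 20 exabytes hold?
raw_files = 20 * bytes_in("exa") // (70 * bytes_in("mega"))
print(f"{raw_files:,} RAW files in 20 exabytes")

# Even a yottabyte is nowhere near a googol bytes.
print(GOOGOL // bytes_in("yotta") == 10 ** 76)
```

A googol bytes would be 10^76 yottabytes, so the top of today’s scale does not come close.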
Whether a Googolbyte will ever exist probably does not matter to us in the professional digital image reproduction business… but it makes sense to back up your files to several media and use the most robust method for archiving and retrieval.
THE ARCHIVE DATA TRAIL
Leaving your RAW image files on two or more of these media will add protection to the data in the event of potential loss. As image libraries grow, indexing and retrieval software needs to be a consideration.
1 – The original – photographic subject or artwork
2 – CF card or similar – images stored and safe for now
3 – Portable storing/viewing device – immediate back-up of RAW file
4 – Computer hard disc – files transferred for viewing/editing
5 – CD or DVD burning – cheap and handy but short-term solution only
6 – External hard disc – selected and edited files stored here
7 – Server-based storage & retrieval, RAID – for big image libraries
8 – Magnetic Tape backup – Dependable, portable, protectable
9 – Data Silos – archival storage of critical images by third parties
10 – The internet – wherever it’s headed, it’s getting bigger
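Keeping files on two or more media only helps if each copy is known to be good. The following is a minimal sketch, not a finished backup tool: it copies one file to several destinations and checks each copy against a SHA-256 checksum of the original. The folder and file names are hypothetical stand-ins for an external hard disc and a RAID server, and the demonstration runs in a temporary directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large image files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive(source, destinations):
    """Copy source into every destination folder and verify each copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)          # copy2 also keeps file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Copy at {target} does not match the original")
        copies.append(target)
    return expected, copies


# Demonstration with hypothetical names, inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "shoot_0001.dng"
    original.write_bytes(b"stand-in for RAW image data")
    checksum, copies = archive(
        original, [Path(tmp) / "external_hdd", Path(tmp) / "raid_server"]
    )
    verified = all(sha256_of(copy) == checksum for copy in copies)

print(len(copies), verified)
```

Storing the checksum alongside each file in an index would also let the copies be re-checked years later, which is where indexing and retrieval software earns its keep.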