This is the first in a series of posts about putting metadata into scanned picture files, including why it's desirable, how I approach it, and how well it works. The series consists of:
Part 1: The Scanned Image Metadata Project (this post)
Part 2: Standards, Guidelines, and ExifTool
Part 3: Dealing with Timestamps
Part 6: The Metadata Removal Problem
Part 7: Thoughts after 4000+ Scans
Not long ago, my wife asked if I could find a particular photograph. I dug up what turned out to be a slide from 1992. The exercise reminded me that the bulk of our photographic history exists only in non-digital form: slides, prints, and negatives. That puts it one disaster away from annihilation. A fire, a flood, a theft, and we lose everything. Not that a sudden catastrophe is necessary. Slides, negatives, and prints degrade over time. Colors shift. Details fade.
I've known for many years that I should have our pictures scanned into digital form. In 2008, I looked down that road, but I was stymied by the challenge of storing metadata. Getting images into files is easy. Capturing the metadata for the pictures--who's in them, when and where they were taken, etc.--is anything but.
The image metadata problem is an old one. News photographers have long needed a way to electronically convey photos and associated information to their central offices. By 1991, there was a technical standard for it. Thirty-plus years later, you'd think we'd have a well-established, straightforward way to handle image metadata. You'd be wrong. As a comment at Stack Exchange Photography put it last month, "Image and video metadata is a complete hot mess."
There are two basic reasons for this. First, there are three overlapping standards for metadata storage. All are in broad use. Terminology and conventions within and among them are inconsistent and confusing. One standard's Description field is another standard's Caption_Abstract, for example, and that's sometimes referred to simply as Caption. It's different from the Title field, which is not to be confused with the UserComment field.
The second issue is that programs working with metadata layer on additional inconsistent and confusing names. It's not easy to remember that one standard's DateTimeOriginal field is called DateCreated in some programs, but DateCreated is completely different from CreateDate, which is the name some programs use for a field officially called DateTimeDigitized. Though the Title field is not the same as the Description field, File Explorer and Photo Viewer on Windows 10 sometimes show the value of the Description field with the label Title. Sometimes with the label Subject. Occasionally with both.
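To make the name game concrete, here is a minimal Python sketch of an alias table built only from the date-field clashes described above. The table layout and the `official_name` helper are illustrative assumptions, not part of any standard or program, and the table is far from complete.

```python
# Alias table built only from the name clashes described in the text.
# Keys are official field names; values are labels that some programs
# use for the same field. Illustrative, not exhaustive.
ALIASES = {
    "DateTimeOriginal": {"DateCreated"},
    "DateTimeDigitized": {"CreateDate"},
}


def official_name(label):
    """Return the official field name for a program's label, or None if unknown."""
    if label in ALIASES:
        return label
    for official, labels in ALIASES.items():
        if label in labels:
            return official
    return None


print(official_name("DateCreated"))  # DateTimeOriginal
print(official_name("CreateDate"))   # DateTimeDigitized
```

A real table would need an entry for every program you use, which is a fair measure of how tangled the naming is.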
Mastering the name game is one challenge. Dealing with redundancy is another. Each image file typically has three description fields, for example, one per standard. Do you write the same data into all three fields, thus ensuring consistency, but risking incoherence if one of the fields is edited, or do you write to only a single field and leave the other two blank? Sorry--trick question! Many programs automatically write to all three fields, even if you edit only one. At the same time, some programs that show descriptions read from only one of the fields, so if the one they look at is empty, you won't see anything, even if other description fields have information in them. Redundancy and potential inconsistency are, sadly, the only practical choice.
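If redundancy is unavoidable, it helps to be able to spot when the copies have drifted apart. The following is a rough sketch of such a check, assuming the metadata has already been read into a dictionary by some tool. The three tag names follow ExifTool-style naming and are my assumption here, as is the `description_status` helper.

```python
# Assumed names for the three description fields, one per standard.
# They follow ExifTool-style group:tag naming and are an assumption.
DESCRIPTION_TAGS = (
    "EXIF:ImageDescription",
    "IPTC:Caption-Abstract",
    "XMP:Description",
)


def description_status(tags):
    """Classify the description fields in a {tag name: text} mapping.

    Returns "empty" (no field has text), "consistent" (all three match),
    "partial" (some fields blank, the rest agree), or "conflict"
    (at least two non-blank fields disagree).
    """
    values = [(tags.get(name) or "").strip() for name in DESCRIPTION_TAGS]
    filled = {value for value in values if value}
    if not filled:
        return "empty"
    if len(filled) > 1:
        return "conflict"
    return "consistent" if all(values) else "partial"


print(description_status({"XMP:Description": "Lincoln Beach, 1980"}))  # partial
```

A "partial" result is the case where a program that reads only one field may show nothing, and "conflict" is the incoherence that editing a single field can cause.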
Little wonder that some people throw up their hands and look for a solution not involving embedded metadata. One approach is to store the metadata separately from the image, often using the image file's name as a key to look up in a spreadsheet or text file. For me, this is a non-starter. It's too easy for the image and the metadata to get separated. Another approach is to use an image's metadata as its file name. This is clumsy even in concept ("Joe, Bob, Sue, Fred at Lincoln Beach celebrating Bob's retirement 1980-07-16.jpg"), but a bigger problem is that it doesn't address photos stored in the cloud (where file names may not be visible) and photos sent via text message (where the sender's file name is not provided). Image file metadata is a mess, to be sure, but it's still the best of a bad lot.
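The weakness of the separate-file approach is easy to show in a few lines of Python. This is only a sketch: the `SIDECAR` table, the file names, and the `lookup` helper are all hypothetical.

```python
# Hypothetical external table: metadata keyed by image file name.
SIDECAR = {
    "scan_0042.jpg": {
        "description": "Bob's retirement party at Lincoln Beach",
        "date": "1980-07-16",
    },
}


def lookup(file_name):
    """Return the metadata stored for file_name, or None if the key is unknown."""
    return SIDECAR.get(file_name)


print(lookup("scan_0042.jpg") is not None)  # True: the original name still works
print(lookup("IMG_7731.jpg"))               # None: same picture, renamed in transit
```

Once the file is renamed, or travels without the table, the link between picture and metadata is gone.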
I want to store metadata about a scanned photo in its image file such that it will be easily accessible in any program that displays metadata. Unless expressly removed from the file, the metadata should stay with the image if it's copied, moved, emailed, texted, uploaded, or shared in the cloud. The comments written on the back of a physical photograph stay with the photo as it's moved about. Image metadata should do the same.
Achieving my goal requires figuring out the following:
- What metadata should be stored.
- Which metadata fields it should be stored in.
- How to put metadata into those fields.
- How to view metadata in an image file.
- How to preserve metadata when an image is moved around (e.g., emailed, texted, or uploaded).
In recent weeks, I've spent a lot of time wrestling with these issues. In subsequent blog posts, I'll explain what I've learned and the conclusions I've come to. Links to the full series are at the top of this post.
3 comments:
Wishing you the best of luck with your project with the hope that finally somebody will be able to devise a standard acceptable to all.
Looking forward to reading more about this. There are also some issues when you add metadata to images on one OS and transfer them to another OS. Not all data may be viewable.
@Avisenna: In my experience, the metadata you see is not dependent on the OS, but rather on the program you're using to view the metadata. Some programs show more than others. I'll address this issue in a later blog post.