Saturday, January 15, 2022

Image Metadata: My Approach

 This is part 4 of my series on metadata for scanned pictures.

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps

Part 4: My Approach (this post)

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans


 

In part 3, I mentioned that I use the standard "description" fields to hold what I truly know about when a picture was taken and developed. That's not all I put into these fields. I also include:

  • Descriptive text I have for the picture, e.g., written on the slide frame or the back of the picture. For the sample slide at right, it's "Tim Johnson's equipment". (Update 25 July 2023: per this blog post, I now also include the name of the set of pictures an image is from, if there is one.)
  • The source of the image, e.g., that it came from a slide.

The "description" fields are permitted to contain newlines, but I've found that many programs display only the first line of multi-line values.  I therefore put everything on a single line, and I use vertical bars to separate different pieces of information. For example, this is my "description" value for the sample slide:

Tim Johnson's equipment | Taken 7/1992 | Developed 8/1992 | Scanned 35mm slide

Some of this text is unique to the picture, some is boilerplate (e.g., "Taken" and "Developed"), and some is likely to be repeated in the metadata for other images (e.g., "Scanned 35mm slide"). Programs aimed at metadata entry often support templates that reinforce formatting decisions and reduce the need to enter information more than once. A template for the "description" fields for my 35mm slides could look like this,

??? | Taken ??? | Developed ??? | Scanned 35mm slide

 where "???" is placeholder text for slide-specific information that must be entered manually.

I explained in part 3 how I take what I know about when a picture was taken and turn it into a timestamp for the standard "when taken" metadata fields. I also explained that I expect the scanner to automatically write a "when scanned" timestamp into the image file. Scanners can also be configured to write a copyright notice into one or more of the standard "copyright" fields. Because that can be made fully automatic and might protect my interests, I do it.

The following, then, is the metadata I write and the complete names of the fields I write into. (In posts prior to this one, I've sometimes omitted the namespace specifier when discussing XMP fields.)

  • A description of the image, including the set it's from, what's in it, when it was taken, when it was developed, and the source that gave rise to it. This is written to Exif's ImageDescription, IPTC's Caption-Abstract, and XMP's dc:description fields.
  • A "when taken" timestamp. It goes into Exif's DateTimeOriginal, IPTC's DateCreated and TimeCreated, and XMP's photoshop:DateCreated fields.
  • A "when scanned" timestamp. It's written to Exif's DateTimeDigitized, IPTC's DigitalCreationDate and DigitalCreationTime, and XMP's xmp:CreateDate fields.
  • A copyright notice, which is put into Exif's Copyright, IPTC's CopyrightNotice, and XMP's dc:rights fields.

The guidelines from the Metadata Working Group specify that the corresponding Exif, IPTC, and XMP fields for descriptions, "when taken", "when digitized," and copyright should be kept in sync, so some (but not all) programs will update all three fields in a set if you write to any of them. With ExifTool, you can use the MWG composite fields Description, DateTimeOriginal, CreateDate, and Copyright to set a value for the fields in all three standards at once.

An alternative to explicitly writing the same values to fields in Exif, IPTC, and XMP is to write values to the fields for one of these standards, then copy them into the fields for the others. For example, scanning software and a GUI program could be used to write values to Exif fields, and ExifTool could be used to copy the Exif values into the metadata blocks for IPTC and XMP. Given a file named myScannedImage.jpg, this command would do the trick:

exiftool -ApplicationRecordVersion=4
-MWG:Description<EXIF:ImageDescription
-
MWG:DateTimeOriginal<EXIF:DateTimeOriginal
-
MWG:CreateDate<EXIF:CreateDate
-MWG:Copyright<EXIF:Copyright
myScannedImage.jpg

Minor variations on this command would use IPTC or XMP instead of Exif as the source of the fields to be copied. 

Yes, this looks like black magic, and no, I'm not going to explain how it works. (ExifTool has very comprehensive online documentation.) It looks even blacker when you type the command on a single line, which is how you'd typically do it. My point is that this approach guarantees consistency among Exif, IPTC, and XMP, yet requires manually entering information only for Exif. ExifTool can be applied to many files at once, so if you have lots of files with fields to copy, it can make quick work of a big job.

I want the metadata I embed in image files to be as widely and easily accessible as possible, so my approach is very conservative. I use only widely supported, standard fields, and I'm careful to put the same values into the Exif, IPTC, and XMP fields that are supposed to mirror one another.


No comments: