Wednesday, January 12, 2022

Image Metadata: Dealing with Timestamps

This is part 3 of my series on metadata for scanned pictures. 

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps (this post)

Part 4: My Approach

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans


 

Several image metadata fields store timestamps (i.e., dates and/or times). As you'll see, the field names can exasperate, but the bigger problem is that the fields request more precision than anybody with scanned images is likely to have.

The most important timestamp identifies when the picture was taken. Digital cameras know this down to the second, but for pictures from the age of film, such precision isn't available. For example, in my first post in this series, I mentioned a slide that my wife asked me to track down. What I'll call the sample slide is shown at right. I had the foresight to write on its frame when it was taken, but I wrote only July 1992. I don't know what day in July, and I certainly don't know the time.

Exif and XMP (but not IPTC) timestamps are permitted to omit unknown date and time information, but word on the net is that partial timestamps are uncommon and that they're likely to confuse programs that encounter them. Because I want my metadata to be compatible with as wide a variety of programs as possible, I've decided to avoid them. 

That commits me to providing a complete timestamp for each metadata field that wants one. But when I don't know the month (or the day or the time) when a picture was taken, what should I use for the values I don't have? The convention among image metadata-istas is to use the earliest permissible values: 1 for missing days and months and 00:00:00 for missing times. Per this convention, the timestamp for when the sample slide was taken is 1992:07:01 00:00:00.
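Here is a minimal Python sketch of that convention. The function name `pad_timestamp` and its parameters are hypothetical, chosen only for illustration; nothing here comes from a metadata standard or tool.

```python
from datetime import datetime

def pad_timestamp(year, month=None, day=None, time=None):
    """Return an Exif-style timestamp, filling unknown parts with the
    earliest permissible values (month 1, day 1, time 00:00:00)."""
    month = 1 if month is None else month
    day = 1 if day is None else day
    time = "00:00:00" if time is None else time
    # Round-trip through datetime so impossible dates (e.g. month 13) raise ValueError.
    stamp = datetime.strptime(f"{year:04d}:{month:02d}:{day:02d} {time}",
                              "%Y:%m:%d %H:%M:%S")
    return stamp.strftime("%Y:%m:%d %H:%M:%S")

print(pad_timestamp(1992, 7))  # the sample slide: 1992:07:01 00:00:00
```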

I'm not wild about this convention. When you order images chronologically, it has the effect of putting images with unknown months, days, or times in front of images with more detailed information. A picture known to have been taken on July 15, for example, is ordered after a picture known only to have been taken sometime in July. I was recently looking through scans of pictures from my wife's and my wedding and honeymoon, and the honeymoon pictures were listed before those from the wedding. That's because I know the date of the wedding, but on the honeymoon pictures, I noted only the month and year. It's been a long time since my wife and I got married, so I could be mis-remembering, but I'm pretty sure that the wedding came first.
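The ordering effect is easy to reproduce. In this small Python sketch, the two records are made-up examples, not real files:

```python
# Hypothetical records: (what is actually known, padded timestamp).
photos = [
    ("taken July 15, 1992", "1992:07:15 00:00:00"),
    ("taken sometime in July 1992", "1992:07:01 00:00:00"),
]

# Exif-style timestamps are zero-padded, so sorting the strings
# gives the same result as sorting chronologically.
ordered = sorted(photos, key=lambda photo: photo[1])

for known, stamp in ordered:
    print(stamp, "-", known)
```

The picture with the vaguer date comes out first, even though it may well have been taken after July 15.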

I believe it would make more sense to have images with missing information sit in the chronological back of the bus, i.e., to order them after the images with more specific information. That'd be easy to do (just use the latest valid value for unknown days, months, and times instead of the earliest), but I decided against it. In addition to running counter to convention, it's more error-prone. If you use the last day of the month as the day a picture was taken when you don't know the actual day, you have to deal with the fact that different months have different numbers of days, and the number of days in February depends on the year. When scanning photos, the date the picture was taken has to be entered manually, so the process should be as simple as possible. Setting unknown months and days to 1 is about as simple as it gets, and an "unknown time" value of 00:00:00 is a lot easier to enter than 23:59:59 (which is what you'd have to use for unknown times in order for them to follow known times).
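To illustrate the extra bookkeeping, here is a Python sketch of the "latest valid value" alternative. The function name `pad_latest` is hypothetical. A program can look up the length of a month with the standard library's `calendar.monthrange`, but a person typing dates by hand has to do that lookup mentally.

```python
import calendar

def pad_latest(year, month=None, day=None):
    """Fill unknown parts with the latest valid values instead of the earliest."""
    month = 12 if month is None else month
    if day is None:
        # monthrange returns (weekday of the 1st, number of days in the month);
        # the day count varies by month and, for February, by year.
        day = calendar.monthrange(year, month)[1]
    return f"{year:04d}:{month:02d}:{day:02d} 23:59:59"

print(pad_latest(1992, 2))  # 1992:02:29 23:59:59 (leap year)
print(pad_latest(1993, 2))  # 1993:02:28 23:59:59
```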

Only some of my slides and photos have annotations telling me when they were taken. For those that don't, I fall back on when they were developed. In the case of slides, that's typically marked on the slide frame. For the sample slide, the development date is August 1992. If I had no information about when the slide was taken, that's what I'd use.
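Expressed as code, the fallback is a single choice. This Python sketch uses a hypothetical function name and represents dates as (year, month) tuples purely for illustration:

```python
def date_for_timestamp(taken=None, developed=None):
    """Prefer the date the picture was taken; fall back on the development
    date; return None if neither is known."""
    return taken if taken is not None else developed

# The sample slide: taken July 1992, developed August 1992.
print(date_for_timestamp(taken=(1992, 7), developed=(1992, 8)))  # (1992, 7)
# A slide with only a development date on its frame.
print(date_for_timestamp(developed=(1992, 8)))                   # (1992, 8)
```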

This policy means that for an image whose metadata timestamp says it was taken on July 1, 1992, it's impossible to distinguish among these possibilities:

  • The picture was taken on July 1, 1992.
  • The picture was taken in July 1992, but I don't know which day.
  • I don't have information about when the picture was taken, but I know the film was developed in July 1992.

I address this ambiguity by putting what I actually know into the "description" metadata fields for the picture. These fields have different names in Exif, IPTC, and XMP. Exif uses ImageDescription. IPTC goes with Caption-Abstract. In XMP, the field is dc:description.

Many (but not all) programs that edit metadata tie these fields together. If you edit one, the others are updated automatically. ExifTool takes a different approach. There, if you write to one of the "description" fields, only that single field is affected. If you want to update them all (and you certainly want to keep them in sync!), you can write to the MWG composite field, Description. That propagates the change to all of Exif, IPTC, and XMP.

For the sample slide, I put this information into its description:

Taken 7/1992
Developed 8/1992
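
As a sketch of how such a description could be written through the MWG composite field, the Python below builds an ExifTool command line without running it. The file name `slide.jpg` and the function name are hypothetical, and it assumes that naming the MWG group in the tag (`-MWG:Description=...`) is enough for ExifTool to load its MWG tags; if that turns out not to hold in your version, check the ExifTool documentation for the `-use MWG` option.

```python
import shlex

def description_command(image_path, description):
    """Build (but do not run) an ExifTool command that writes the MWG
    composite Description field."""
    return ["exiftool", f"-MWG:Description={description}", image_path]

cmd = description_command("slide.jpg", "Taken 7/1992\nDeveloped 8/1992")
print(shlex.join(cmd))  # shell-quoted form, handy for checking before running
```

To apply the change for real, the list can be handed to `subprocess.run(cmd, check=True)` on a machine where ExifTool is installed.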

My policy implies that when I encounter an image file with a day of 1 for when it was taken, I have to check its description to find out what the 1 means. The metadata timestamp for when the picture was taken is an approximation. What's actually known is in the image's description. 

This approach generalizes to pictures where the "when taken" information is too vague to put into date/time format. For example, if I have nothing telling me when a picture was taken or developed, but I can guess that it was taken in the late 1970s, I can leave the "date taken" fields empty and write what I know in the description (e.g., "Taken in the late 1970s--look at those clothes!").

Naturally, "Date Taken" is not the name of a standard metadata field. That'd be too easy. The Exif field name is DateTimeOriginal. XMP calls it DateCreated. IPTC has two fields, one for the date (DateCreated) and one for the time (TimeCreated). Note that DateCreated in XMP is both a date and a time, but DateCreated in IPTC is just a date.

Programs manipulating metadata timestamps may or may not propagate changes in one field to the corresponding fields in other metadata blocks. In my experience, these timestamp fields get out of sync more easily than the description fields do.

ExifTool's approach to "date taken" mimics that for description information. Individual timestamp fields can be written, but it's also possible to write to an MWG composite field representing the three fields that should mirror one another. For the "date taken" timestamp, the composite field's name is DateTimeOriginal (the same name that Exif uses), so using ExifTool to write the MWG DateTimeOriginal field has the effect of putting a value into the corresponding "date taken" fields for Exif, IPTC, and XMP. 
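A Python sketch of such a write for the sample slide follows. As with any example here, the file name and function name are hypothetical, the command is only built and not executed, and it assumes that the `MWG:` group prefix causes ExifTool to load its MWG tags.

```python
import shlex

def date_taken_command(image_path, timestamp):
    """Build (but do not run) an ExifTool command that writes the MWG
    composite DateTimeOriginal field."""
    return ["exiftool", f"-MWG:DateTimeOriginal={timestamp}", image_path]

cmd = date_taken_command("slide.jpg", "1992:07:01 00:00:00")
print(shlex.join(cmd))
```

Afterwards, running `exiftool -a -G1 -time:all slide.jpg` should list the time-related fields by group, which is one way to check that the Exif, IPTC, and XMP values agree.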

The date and time when a picture was taken is typically the most important timestamp for a scanned image, but it might also be useful to know when the scan was performed. I expect scanners to be able to automatically insert this information into the metadata. I don't have any specific use for this timestamp, but since recording it should incur virtually no cost, I want to do it. You never know what information might be useful in the future.

The Exif field for when an image took digital form is DateTimeDigitized. IPTC again uses two fields, DigitalCreationDate and DigitalCreationTime. XMP calls it CreateDate. CreateDate is also the name of ExifTool's composite field for all these fields.

Note that the IPTC and XMP DateCreated fields refer to when a picture was taken. The XMP and ExifTool CreateDate fields refer to when it was digitized. I think this is a terminological train wreck, but, sadly, this is the only train in the station.
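
One way to keep the names straight is a small lookup table. This Python sketch records only the field names mentioned in this post; the dictionary layout and the `meaning_of` helper are hypothetical.

```python
# Field names as given in the text, grouped by what the timestamp means.
TIMESTAMP_FIELDS = {
    "taken": {
        "Exif": ("DateTimeOriginal",),
        "IPTC": ("DateCreated", "TimeCreated"),
        "XMP": ("DateCreated",),
        "ExifTool composite": ("DateTimeOriginal",),
    },
    "digitized": {
        "Exif": ("DateTimeDigitized",),
        "IPTC": ("DigitalCreationDate", "DigitalCreationTime"),
        "XMP": ("CreateDate",),
        "ExifTool composite": ("CreateDate",),
    },
}

def meaning_of(standard, field):
    """Return 'taken' or 'digitized' for a field name in a given standard."""
    for meaning, by_standard in TIMESTAMP_FIELDS.items():
        if field in by_standard.get(standard, ()):
            return meaning
    raise KeyError(f"{standard} has no timestamp field named {field}")

print(meaning_of("XMP", "DateCreated"))  # taken
print(meaning_of("XMP", "CreateDate"))   # digitized
```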

