Tuesday, January 18, 2022

Image Metadata: Viewing What I Wrote

This is part 5 of my series on metadata for scanned pictures. The series starts here.


Just because an image file contains metadata doesn't mean that the metadata is visible or recognizable as what it is. Lots of programs can display metadata. Each has its own quirks. I put only four pieces of metadata into my image files, but most of the programs I tested show only some of these. The fields that are displayed may be labeled differently from both the standard names and the names used by the program used to put the metadata into the file. Some programs apply a name from one standard to a field from a different one.

It is, as usual, a mess. The closer you look, the messier it gets. I've performed numerous experiments, and the stories I could tell...  

But I won't. The way to deal with the mess is to not look very closely. My goal is to produce image files with metadata that I can share with others. I already know how to view an image's metadata, so the real question is whether other people can see it. 

There's no reason to expect friends and family members, etc., to know anything about Exif, IPTC or XMP. However, they'll know descriptive text or a copyright statement when they see it, and if they see a date and time, they'll assume that's when the picture was taken. If they see another date and time that says something about when the picture was scanned or digitized, they are unlikely to be confused.

Inspired by Carl Seibert's survey of how different programs prioritize Exif, IPTC, and XMP when reading metadata, I examined a dozen programs to see how well they made the metadata visible for my sample side from part 3 (shown at right). Although a couple of the programs are aimed at more serious users, most of the 12 are stock apps that come as part of the operating system. They're the programs likely to be used by people with no special interest in metadata. All of the programs I looked at are free. 

The high-level takeaway is that the most important metadata stored in my scanned image files is pretty accessible for anybody who knows to look for it. Things could be better, but they're not bad. As such, my approach to embedding metadata in image files seems to be reasonable.

I scored each program I looked at on a 10-point scale. Points were awarded as follows:

  • 6 points if the image's metadata description is fully visible. If this requires making a window wider or putting a phone into landscape mode, that's fine. I used this description (from part 4 of this series) for testing:

Tim Johnson's equipment | Taken 7/1992 | Developed 8/1992 | Scanned 35mm slide

  • 3 points if the metadata description is partially visible, but can't be made fully visible. A partially visible description tells the person looking at the picture that descriptive information is present, but it's not as good as showing the entire description.

  • 2 points for showing the date when the picture was taken such that a viewer could reasonably assume that that's what the timestamp represents.

  • 1 point for displaying the copyright notice (even if it's only partially visible).

  • 1 point for showing the date and time scanned in a way that makes it recognizable as what it is.

I weight the description field heavily, because it contains the two most important pieces of metadata: what's in the picture and when it was taken. (Recall from part 3 that the "when taken" field holds only an approximation. The actual "when taken" information is part of the description.) If the description is visible, and especially if it's fully visible, that's all most people need.

I issue a big penalty for programs that engage in what I consider a grossly deceptive practice:

  • -6 points if the image's description metadata is not visible, but the program offers its own description field that, if used, stores the entered information, but not in the image file. In other words, a program loses 6 points if it offers a field that looks like an image's metadata field for a description, but isn't. 

Only one program incurred this penalty. I don't want to give anything away, so I'll just say that it carries a company name that rhymes with "Boogle".

The scores tell only part of the story. 10 means that a program can display all the metadata I store in a recognizable form, but it doesn't mean that getting it to do that is straightforward. For details, read the per-program overviews that follow.

Programs on Windows 10

Of the following six programs, three (Windows File Explorer, Windows Photo Viewer, and the Microsoft Photos App) are included with Windows. The other three (XnView MP, Adobe Bridge, and ExifTool) must be downloaded and installed separately.

Windows File Explorer and Windows Photo Viewer (Score: 6)

These two programs show image metadata the same way: on the Details tab of a file's Properties dialog. This dialog displays a limited-width view of the description (3 points) and copyright (1 point), as well as the "when taken" timestamp (2 points). There's no timestamp for when the image was scanned. The fact that the description is displayed twice and is labeled both Title and Subject is strange, but both fields are in the Description section of the tab, so I think things are clear enough. 

Both of these programs ship with Windows 10, but my understanding is that Photo Viewer is hidden in some installations in favor of the Photos app. From a metadata point of view, that's a big step backwards, as we'll see next.

Photos App (Score: 2) 

Clicking on "..." and selecting "ⓘ File Information" when viewing a photo in the Photos app brings up a panel with metadata information. Of the four fields I write into image files, only when the photo was taken is displayed (2 points). This is disappointing for a dedicated photos app, and it's notably worse than Windows Photo Viewer, which is the program the Photos app replaced.

XnView MP (Score: 10)

XnView MP is my default image viewer, and that was the case before I started worrying about metadata. Its score of 10 indicates that it shows all the information I put into image files, but the plethora of metadata viewing options takes some getting used to. 

Everything starts with the Edit menu, which includes entries for "Edit comment...", "Edit IPTC...", and "Edit XMP...". For purposes of viewing metadata, none of these is correct. What you want is "Properties..." (also on the Edit menu). Selecting it brings up a window with multiple tabs, including one for each of Exif, IPTC, XMP, and ExifTool.

The Exif tab does the best job of showing all the metadata I embed, with each of the four fields clearly labeled and near the top of the window. On its own, this tab scores a 10.

The IPTC-IIM tab also shows all the fields, but the timestamp for when the image was scanned is unrecognizable unless you know that the hexadecimal codes for the relevant timestamp fields are 0x3e and 0x3f. No "normal" person would know that, so the IPTC tab loses the point for showing the date/time scanned and ends up with a 9. 

The XMP tab shows everything, but I'd expect the similarity of the names for the "when taken" and "when scanned" fields (DateCreated and CreateDate) to sow confusion and uncertainty. I give the tab credit for neither, and it gets a 7.

The ExifTool tab shows the results of running the copy of ExifTool that's embedded inside XnView MP. The amount of information can be overwhelming, but everything's there. It's there three times, in fact, once each for Exif, IPTC, and XMP. Taken by itself, the ExifTool tab scores a 10, but the Exif tab remains the easier way to get the information.

Adobe Bridge (Score: 10)

Bridge is Adobe's free companion to Photoshop and Lightroom. It's designed to organize and manage photos, not to change their appearance. Using Bridge, you can view and edit metadata, but you can't change what a picture looks like. 

It's reasonable to expect people who use Bridge to have an above-average familiarity with image metadata.

Bridge's metadata panel is divided into several sections, including ones for Exif, IPTC IIM, IPTC Core, and IPTC Extension. XMP appears to be missing until you recall (from part 2) that IPTC Core and IPTC Extension are sometimes used synonymously with XMP. No single section shows all the fields I write, but everything is present: the IPTC-IIM and IPTC Core sections have the description, "when taken" timestamp, and copyright notice, and the Exif section has the "when scanned" timestamp.

ExifTool  (Score: 10)

ExifTool is a command line program, though GUIs have been built on top of it. It's the go-to power tool in the image metadata world, and it didn't take me long to regard it as the source of truth for metadata in image files. Different programs label the metadata they show in different ways, so when you look at a field value, it can be hard to know exactly what you're looking at. Some programs lie. The Preview App on MacOS, for example, has tabs for Exif and IPTC, but there are conditions under which the values on those tabs come from XMP! Since metadata in image files can be seen only with the aid of programs that know how to read it, how do you know which programs to trust? I trust ExifTool.

It's hard to imagine anybody using ExifTool without knowing about Exif, IPTC, XMP, and the various fields they offer. I therefore score ExifTool with the expectation that it's being used by somebody who brings a fair amount of metadata knowledge to the table. Such users can be expected to recognize the difference between DateCreated and CreateDate. With that in mind, ExifTool scores a 10.

ExifTool's output on the sample slide is an unwieldy 96 lines long if you let it show you everything (which is the default), but if you ask it for only the fields I put into it,

exiftool -S




         '.\The Brown Experience 1985-1993 031.jpg'

you get this in return:

Description: Tim Johnson's equipment | Taken 7/1992 | Developed 8/1992 | Scanned 35mm slide
Copyright: © 2022 Scott Meyers (smeyers@aristeia.com), all rights reserved.
DateTimeOriginal: 1992:07:01 00:00:00
CreateDate: 2022:01:14 17:54:46

The copyright symbol (©) is displayed incorrectly, but that's a problem with Windows PowerShell (where I ran the command), not ExifTool.

Programs on MacOS Big Sur

Each of the three programs I tested on MacOS is included with the operating system.

Finder (Score: 6)

Right-clicking on an image file in the Finder and choosing "Get Info" brings up this window:

It shows the full description in the metadata (6 points), but though timestamps are shown for when the file was created and last modified, there is no sign of the "when taken" and "when scanned" timestamps. The copyright notice is similarly missing. The Finder thus gets a score of 6.

Photos App (Score: 8)

Clicking the ⓘ while viewing a photo in the Photos app brings up its Info window:

It shows the full description (6 points) as well when the photo was taken (2 points), but the "when scanned" timestamp and the copyright notice are not shown. The score for the Photos app is 8.

Preview App (Score: 10)

Viewing image metadata with the MacOS Preview app reminds me of using XnView MP, but with a twist. With XnView MP, the Exif tab shows metadata from the Exif fields, and the IPTC tab shows metadata from the IPTC fields. That's not always the case with the MacOS Preview app. Regardless of how a tab is labeled, it may show metadata drawn from Exif, IPTC and XMP. That's disturbing, but, fortunately, irrelevant for my purposes. Writing the same metadata to corresponding fields in Exif, IPTC, and XMP means that it doesn't matter which field gets read. The Preview app's Exif tab, for example, shows when the photo was taken and when it was digitized (i.e., scanned). This information is correct for my image files, although it's actually pulled from the IPTC metadata instead of that for Exif.

On its own, this tab gets a score of 3: 2 for the date/time when the picture was taken, and 1 for when it was scanned.

The IPTC tab shows everything and thus gets a 10, though I take a dim view of the decision to display the date and time digitized between the date taken and the time taken:

The Preview app also has a TIFF tab. I don't know what kind of metadata this tab is supposed to show, but since all the tabs can show metadata from Exif, IPTC, and XMP, the labels don't really matter. Here's the TIFF tab for the sample slide. It shows the full description (6 points) and the copyright notice (1 point). The value it shows for the "Date Time" field corresponds neither to when the photo was taken nor to when it was scanned, so no points for that. The tab gets a score of 7.

The more I use the Preview app to look at image metadata, the less I like it. It right-justifies field names with respect to the center of the window, and it left-justifies field values with respect to that center, and, as you can see, this leads to a lot of wasted space on the left side of the window. I've often found that widening the window doesn't cause the text inside to be reformatted, so I've had to play games to get all the metadata properly displayed (e.g., force-close the app and then reopen it).

Programs on iOS 15

Photos App (Score: 8)

As of iOS 15, touching the ⓘ icon or swiping up while viewing an image displays the Info pane, which includes the image's full description (6 points) and the date and time it was taken (2 points). There's no sign of the copyright or "date scanned" metadata, so this app gets an 8.

Prior to iOS 15, accessing an image's metadata typically involved saving the image to the Files app, then using the Files app to view the embedded metadata. That continues to work on iOS 15, but it's more cumbersome, and my experience is that even though it displays more metadata fields than the Photos app's Info pane, it doesn't show any of the fields I write to my scanned image files. It would get a score of 0 if I officially evaluated it, but since I'm running iOS 15, I'm going to pretend I know nothing about the Files app workaround.

Google Photos App (Score: -4)

I'm generally impressed with Google's products and services, but the impression its iOS Photos app leaves on me is a depressing mixture of disbelief and anger. 

Pressing "..." while viewing a photo brings up its Info sheet:

It shows the "when taken" timestamp (2 points), but there's no sign of the "when scanned" timestamp, the copyright notice, or the description. Instead, there is an "Add description..." field, which, being empty, suggests that the image lacks a description. For my files, this is not just untrue, but triply untrue, because my scanned image files have description metadata in each of the Exif, IPTC, and XMP fields. As a company, Google knows this, because Google Photos in the cloud (see below) displays the embedded description. 

But that's not the heinous part. Should you, noting the the empty description field, succumb to temptation and put information into it, your text will not be stored in the metadata in the image file! Instead, the information you enter will be stored separately by Google. The same is true of any other edits you make on the Info sheet, e.g., "Add a location" or "Edit date & time". The Info sheet is a place to enter image metadata, but it's not a place to enter image metadata that will be stored inside the image!

This is reprehensible behavior. Hiding metadata present in a image while offering users the chance to add metadata that you'll keep private is...well, words fail me. But math doesn't. I slap on the -6 penalty for grossly deceptive practices, and Google's Photos app for iOS ends up with a record-setting low score of -4.

Cloud Services

There are lots of cloud-based photo storage services. I tested only Google Photos and iCloud Photos, and to be clear, I did it via their web browser interface, not via an app on a computer or mobile device. Among the many services I did not test are Facebook, Flickr, SmugMug, Amazon Photos, Microsoft Onedrive, Degoo, and photobucket. I welcome your comments about viewing image metadata using these services.

In a 2017 blog post, Caroline Guntur wrote,

Many cloud platforms and social media sites will not upload, or retain the [metadata] in your photos. Some will even strip the information completely upon download.

In a later post in this series, I will address what happens to metadata when you move image files around (e.g., upload or download them, email them, text or IM them, etc.). My testing shows that uploading an image to both Google Photos and iCloud Photos has no effect on its metadata--at least not for the four fields I care about. 

Google Photos (Score: 8)

Clicking the ⓘ symbol while viewing a photo opens its Info panel. That panel displays the full metadata description (6 points) as well as the "when taken" timestamp (2 points). The copyright and "when scanned" fields are missing, so the Google Photos cloud service scores an 8.

Like the Google Photos iPhone app, the Google Photos cloud service displays an inviting "Add a description" field at the top of the panel. As with the iPhone app, metadata you enter here is not stored in the image file, but instead in a Google database. 

Unlike the iPhone app, the description metadata already in the file is shown, albeit with the label "Other." Because Google Photos in the cloud displays the description metadata embedded in the file, there's less chance the person viewing the photo will think there's no description for it and will avail themselves of the "Add a description" field. I therefore withhold the six-point penalty here that I impose on Google's iPhone app.

iCloud Photos (Score: 2)

As far as I can tell, the only metadata visible for a photo viewed using the web browser interface to iCloud Photos is the date on which it was taken. It's displayed above the photo being viewed:

That yields a disappointing score of 2. Apple's apps on MacOS and iOS do notably better, and my impression from looking at Apple's support pages is that they expect you to use those apps as much as possible. If you don't have an Apple device, well, presumably that's an incentive for you to get one.

Saturday, January 15, 2022

Image Metadata: My Approach

 This is part 4 of my series on metadata for scanned pictures. The series starts here.


In part 3, I mentioned that I use the standard "description" fields to hold what I truly know about when a picture was taken and developed. That's not all I put into these fields. I also include:

  • Descriptive text I have for the picture, e.g., written on the slide frame or the back of the picture. For the sample slide at right, it's "Tim Johnson's equipment".
  • The source of the image, e.g., that it came from a slide.

The "description" fields are permitted to contain newlines, but I've found that many programs display only the first line of multi-line values.  I therefore put everything on a single line, and I use vertical bars to separate different pieces of information. For example, this is my "description" value for the sample slide:

Tim Johnson's equipment | Taken 7/1992 | Developed 8/1992 | Scanned 35mm slide

Some of this text is unique to the picture, some is boilerplate (e.g., "Taken" and "Developed"), and some is likely to be repeated in the metadata for other images (e.g., "Scanned 35mm slide"). Programs aimed at metadata entry often support templates that reinforce formatting decisions and reduce the need to enter information more than once. A template for the "description" fields for my 35mm slides could look like this,

??? | Taken ??? | Developed ??? | Scanned 35mm slide

 where "???" is placeholder text for slide-specific information that must be entered manually.

I explained in part 3 how I take what I know about when a picture was taken and turn it into a timestamp for the standard "when taken" metadata fields. I also explained that I expect the scanner to automatically write a "when scanned" timestamp into the image file. Scanners can also be configured to write a copyright notice into one or more of the standard "copyright" fields. Because that can be made fully automatic and might protect my interests, I do it.

The following, then, is the metadata I write and the complete names of the fields I write into. (In posts prior to this one, I've sometimes omitted the namespace specifier when discussing XMP fields.)

  • A description of the image, including what's in it, when it was taken, when it was developed, and the source that gave rise to it. This is written to Exif's ImageDescription, IPTC's Caption-Abstract, and XMP's dc:description fields.
  • A "when taken" timestamp. It goes into Exif's DateTimeOriginal, IPTC's DateCreated and TimeCreated, and XMP's photoshop:DateCreated fields.
  • A "when scanned" timestamp. It's written to Exif's DateTimeDigitized, IPTC's DigitalCreationDate and DigitalCreationTime, and XMP's xmp:CreateDate fields.
  • A copyright notice, which is put into Exif's Copyright, IPTC's CopyrightNotice, and XMP's dc:rights fields.

The guidelines from the Metadata Working Group specify that the corresponding Exif, IPTC, and XMP fields for descriptions, "when taken", "when digitized," and copyright should be kept in sync, so some (but not all) programs will update all three fields in a set if you write to any of them. With ExifTool, you can use the MWG composite fields Description, DateTimeOriginal, CreateDate, and Copyright to set a value for the fields in all three standards at once.

An alternative to explicitly writing the same values to fields in Exif, IPTC, and XMP is to write values to the fields for one of these standards, then copy them into the fields for the others. For example, scanning software and a GUI program could be used to write values to Exif fields, and ExifTool could be used to copy the Exif values into the metadata blocks for IPTC and XMP. Given a file named myScannedImage.jpg, this command would do the trick:

exiftool -ApplicationRecordVersion=4

Minor variations on this command would use IPTC or XMP instead of Exif as the source of the fields to be copied. 

Yes, this looks like black magic, and no, I'm not going to explain how it works. (ExifTool has very comprehensive online documentation.) It looks even blacker when you type the command on a single line, which is how you'd typically do it. My point is that this approach guarantees consistency among Exif, IPTC, and XMP, yet requires manually entering information only for Exif. ExifTool can be applied to many files at once, so if you have lots of files with fields to copy, it can make quick work of a big job.

I want the metadata I embed in image files to be as widely and easily accessible as possible, so my approach is very conservative. I use only widely supported, standard fields, and I'm careful to put the same values into the Exif, IPTC, and XMP fields that are supposed to mirror one another.

Wednesday, January 12, 2022

Image Metadata: Dealing with Timestamps

This is part 3 of my series on metadata for scanned pictures. The series starts here.


Several image metadata fields store timestamps (i.e., dates and/or times). As you'll see, the field names can exasperate, but the bigger problem is that the fields request more precision than anybody with scanned images is likely to have.

The most important timestamp identifies when the picture was taken. Digital cameras know this down to the second, but for pictures from the age of film, such precision isn't available. For example, in my first post in this series, I mentioned a slide that my wife asked me to track down. What I'll call the sample slide is shown at right. I had the foresight to write on its frame when it was taken, but I wrote only July 1992. I don't know what day in July, and I certainly don't know the time.

Exif and XMP (but not IPTC) timestamps are permitted to omit unknown date and time information, but word on the net is that partial timestamps are uncommon and that they're likely to confuse programs that encounter them. Because I want my metadata to be compatible with as wide a variety of programs as possible, I've decided to avoid them. 

That commits me to providing a complete timestamp for each metadata field that wants one. But when I don't know the month (or the day or the time) when a picture was taken, what should I use for the values I don't have? The convention among image metadata-istas is to use the earliest permissible values: 1 for missing days and months and 00:00:00 for missing times. Per this convention, the timestamp for when the sample slide was taken is 1992:07:01 00:00:00.

I'm not wild about this convention. When you order images chronologically, it has the effect of putting images with unknown months, days, or time in front of images with more detailed information. A picture known to have been taken on July 15, for example, is ordered after a picture known only to have been taken sometime in July. I was recently looking through scans of pictures from my wife's and my wedding and honeymoon, and the honeymoon pictures were listed before those from the wedding. That's because I know the date of the wedding, but on the honeymoon pictures, I noted only the month and year. It's been a long time since my wife and I got married, so I could be mis-remembering, but I'm pretty sure that the wedding came first.

I believe it would make more sense to have images with missing information sit in the chronological back of the bus, i.e., to order them after the images with more specific information. That'd be easy to do (just use the latest valid value for unknown days, months, and times instead of the earliest), but I decided against it. In addition to running counter to convention, it's more error-prone. If you use the last day of the month as the day a picture was taken when you don't know the actual day, you have to deal with the fact that different months have different numbers of days, and the number of days in February depends on the year. When scanning photos, the date the picture was taken has to be entered manually, so the process should be as simple as possible. Setting unknown months and days to 1 is about as simple as it gets, and an "unknown time" value of 00:00:00 is a lot easier to enter than 23:59:59 (which is what you'd have to use for unknown times in order for them to follow known times).

Only some of my slides and photos have annotations telling me when they were taken. For those that don't, I fall back on when they were developed. In the case of slides, that's typically marked on the slide frame. For the sample slide, the development date is August 1992. If I had no information about when the slide was taken, that's what I'd use.

This policy means that for an image whose metadata timestamp says it was taken on July 1, 1992, it's impossible to distinguish among these possibilities:

  • The picture was taken on July 1, 1992.
  • The picture was taken in July 1992, but I don't know which day.
  • I don't have information about when the picture was taken, but I know the film was developed in July 1992.

I address this ambiguity by putting what I actually know into the "description" metadata fields for the picture. These fields have different names in Exif, IPTC, and XMP. Exif uses ImageDescription. IPTC goes with Caption-Abstract. In XMP, the field is dc:description

Many (but not all) programs that edit metadata tie these fields together. If you edit one, the others are updated automatically. ExifTool takes a different approach. There, if you write to one of the "description" fields, only that single field is affected. If you want to update them all (and you certainly want to keep them in sync!), you can write to the MWG composite field, Description. That propagates the change to all of Exif, IPTC, and XMP.

For the sample slide, I put this information into its description:

Taken 7/1992
Developed 8/1992

My policy implies that when I encounter an image file with a day of 1 for when it was taken, I have to check its description to find out what the 1 means. The metadata timestamp for when the picture was taken is an approximation. What's actually known is in the image's description. 

This approach generalizes to pictures where the "when taken" information is too vague to put into date/time format. For example, if I have nothing telling me when a picture was taken or developed, but I can guess that it was taken in the late 1970s, I can leave the "date taken" fields empty and write what I know in the description (e.g., "Taken in the late 1970s--look at those clothes!") . 

Naturally, "Date Taken" is not the name of a standard metadata field. That'd be too easy. The Exif field name is DateTimeOriginal. XMP calls it DateCreated. IPTC has two fields, one for the date (DateCreated) and one for the time (TimeCreated). Note that DateCreated in XMP is both a date and a time, but DateCreated in IPTC is just a date.

Programs manipulating metadata timestamps may or may not propagate changes in one field to the corresponding fields in other metadata blocks. In my experience, it's easier for these fields to get out of sync than it is with description metadata.

ExifTool's approach to "date taken" mimics that for description information. Individual timestamp fields can be written, but it's also possible to write to an MWG composite field representing the three fields that should mirror one another. For the "date taken" timestamp, the composite field's name is DateTimeOriginal (the same name that Exif uses), so using ExifTool to write the MWG DateTimeOriginal field has the effect of putting a value into the corresponding "date taken" fields for Exif, IPTC, and XMP. 

The date and time when a picture was taken is typically the most important timestamp for a scanned image, but it might also be useful to know when the scan was performed. I expect scanners to be able to automatically insert this information into the metadata. I don't have any specific use for this timestamp, but since recording it should incur virtually no cost, I want to do it. You never know what information might be useful in the future.

The Exif field for when an image took digital form is DateTimeDigitized. IPTC again uses two fields, DigitalCreationDate and DigitalCreationTime. XMP calls it CreateDate. CreateDate is also the name of ExifTool's composite field for all these fields.

Note that the IPTC and XMP DateCreated fields refer to when a picture was taken. The XMP and ExifTool CreateDate fields refer to when it was digitized. I think this is a terminological train wreck, but, sadly, this is the only train in the station.

Tuesday, January 11, 2022

Image Metadata: Standards, Guidelines, and ExifTool

This is part 2 of my series on metadata for scanned pictures. The series starts here.

Image metadata is a field that loves abbreviations. A good entry point is the names of the three most important standards:
  • Exif (often written EXIF) was developed by camera manufacturers. It primarily addresses low-level information about a digital picture, such as the make and model of the camera used to take it, the exposure settings, the date and time it was taken, etc. However, it has a few fields for higher-level information, such as the copyright holder and a description of what's in the picture.
  • IPTC comes from news organizations and generally aims at higher-level issues, such as photo captions, lookup keywords, copyrights, and the like. The original "legacy" IPTC standard was IIM. It's known as IPTC-IIM. That was succeeded by IPTC-Core and IPTC-Extension, but IPTC-IIM is still widely used, so in practice, there are three IPTC standards to be aware of.
  • XMP was developed by Adobe as a more general approach to metadata than Exif and IPTC. XMP can represent all Exif and IPTC metadata, plus much more. XMP groups its fields into namespaces. The XMP Exif namespace, for example, provides fields for metadata defined by Exif. A particularly important namespace is Dublin Core, which defines fields applicable to more than just images, e.g., to audio, video, and printed information. Among these fields are those for copyright and descriptive information. Trivia lovers will delight in knowing that the Dublin in Dublin Core is in Ohio, not Ireland.
IPTC-Core and IPTC-Extension are implemented using XMP technology, so it is not uncommon to lump these three standards together, even though they're different. A consequence is that IPTC-IIM is often just called IPTC. Sometimes it's simply referred to as IIM.
XMP became an international standard in 2012. Because it's more expressive than Exif and is the underpinning of IPTC-Core and IPTC-Extension, I think many people assumed that XMP would quickly replace Exif and IPTC for metadata storage. Life would be simpler if it had. It'd also be simpler if Exif and IPTC covered disjoint sets of information. But XMP hasn't taken over, and Exif and IPTC have significant overlap, so many important metadata fields in an image file exist in three places: one in each of the Exif, IPTC, and XMP parts of the file. 

In 2008 and 2010, the Metadata Working Group (MWG) published a set of guidelines for programs that have to deal with this mess. Alas, if the (really outstanding) work by Carl Siebert in 2017-18 accurately reflects how current programs read and write redundant metadata fields, the guidelines have hardly brought order to the chaos. Different programs treat the equivalent Exif, IPTC, and XMP fields in different ways. The MWG itself appears to have given up. Its web site (www.metadataworkinggroup.org) is no longer reachable, which is why my link to the MWG guidelines uses the Internet Archive's Wayback Machine. 

All serious travelers through the image metadata wilderness eventually make their way to ExifTool. An astonishingly powerful program for metadata manipulation, its influence is such that when the name it uses for a standard field differs from the name used in the standard, the ExifTool name tends to dominate. For example, the Exif field holding the date and time when a digital image was created (e.g., the date/time when a photo is scanned) is called DateTimeDigitized. ExifTool calls that field CreateDate. Many metadata workers casually refer to Exif's DateTimeDigitized as CreateDate, in part because that's ExifTool's name for the field and in part because it's the name of the corresponding field in XMP. This can confuse the uninitiated (as I was not that long ago), because searching the Exif standard for CreateDate turns up nothing. 

Interestingly, ExifTool did not extend the use of  the name "CreateDate"to IPTC (i.e., to IPTC-IIM). IPTC splits the date and time of digitization into separate fields, DigitalCreationDate and DigitalCreationTime, and ExifTool uses the IPTC names for these fields.

Notwithstanding its name, ExifTool reads and writes more than Exif metadata. It also handles metadata defined by IPTC, XMP, and a variety of lesser standards. In addition, it offers "composite" fields derived from the MWG guidelines. These fields make it possible to simultaneously write to all the fields in Exif, IPTC, and XMP that are supposed to contain the same value. As you'll see in a later post, I take advantage of this capability when putting metadata into the files for my scanned pictures.

Monday, January 10, 2022

The Scanned Image Metadata Project

Not long ago, my wife asked if I could find a particular photograph. I dug up what turned out to be a slide from 1992. The exercise reminded me that the bulk of our photographic history exists only in non-digital form: slides, prints, and negatives. That puts it one disaster away from annihilation. A fire, a flood, a theft, and we lose everything. Not that a sudden catastrophe is necessary. Slides, negatives, and prints degrade over time. Colors shift. Details fade.

I've known for many years that I should have our pictures scanned into digital form. In 2008, I looked down that road, but I was stymied by the challenge of storing metadata. Getting images into files is easy. Capturing the metadata for the pictures--who's in them, when and where they were taken, etc.--is anything but. 

The image metadata problem is an old one. News photographers have long needed a way to electronically convey photos and associated information to their central offices. By 1991, there was a technical standard for it. Thirty-plus years later, you'd think we'd have a well-established, straightforward way to handle image metadata. You'd be wrong. As a comment at Stack Exchange Photography put it last month, "Image and video metadata is a complete hot mess."

There are two basic reasons for this. First, there are three overlapping standards for metadata storage. All are in broad use. Terminology and conventions within and among them are inconsistent and confusing. One standard's Description field is another standard's Caption_Abstract, for example, and that's sometimes referred to simply as Caption. It's different from the Title field, which is not to be confused with the UserComment field.

The second issue is that programs working with metadata layer on additional inconsistent and confusing names. It's not easy to remember that one standard's DateTimeOriginal field is called DateCreated in some programs, but DateCreated is completely different from CreateDate, which is the name some programs use for a field officially called DateTimeDigitized. Though the Title field is not the same as the Description field, File Explorer and Photo Viewer on Windows 10 sometimes show the value of the Description field with the label Title. Sometimes with the label Subject. Occasionally with both.

Mastering the name game is one challenge. Dealing with redundancy is another. Each image file typically has three description fields, for example, one per standard. Do you write the same data into all three fields, thus ensuring consistency, but risking incoherence if one of the fields is edited, or do you write to only a single field and leave the other two blank? Sorry--trick question! Many programs automatically write to all three fields, even if you edit only one. At the same time, some programs that show descriptions read from only one of the fields, so if the one they look at is empty, you won't see anything, even if other description fields have information in them. Redundancy and potential inconsistency are, sadly, the only practical choice.

Little wonder that some people throw up their hands and look for a solution not involving embedded metadata. One approach is to store the metadata separately from the image, often using the image file's name as a key to look up in a spreadsheet or text file. For me, this as a non-starter. It's too easy for the image and the metadata to get separated. Another approach is to use an image's metadata as its file name. This is clumsy even in concept ("Joe, Bob, Sue, Fred at Lincoln Beach celebrating Bob's retirement 1980-07-16.jpg"), but a bigger problem is that it doesn't address photos stored in the cloud (where file names may not be visible) and photos sent via text message (where the sender's file name is not provided). Image file metadata is a mess, to be sure, but it's still the best of a bad lot.

I want to store metadata about a scanned photo in its image file such that it will be easily accessible in any program that displays metadata. Unless expressly removed from the file, the metadata should stay with the image if it's copied, moved, emailed, texted, uploaded, or shared in the cloud. The comments written on the back of a physical photograph stay with the photo as it's moved about. Image metadata should do the same.

Achieving my goal requires figuring out the following:

  • What metadata should be stored.
  • Which metadata fields it should be stored in.
  • How to put metadata into those fields.
  • How to view metadata in an image file.
  • How to preserve metadata when an image is moved around (e.g., emailed, texted, uploaded, etc.)
In recent weeks, I've spent a lot of time wrestling with these issues. In subsequent blog posts, I'll explain what I've learned and the conclusions I've come to.