Tuesday, January 11, 2022

Image Metadata: Standards, Guidelines, and ExifTool

This is part 2 of my series on metadata for scanned pictures.

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool (this post)

Part 3: Dealing with Timestamps

Part 4: My Approach

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans


 

Image metadata is a field that loves abbreviations. A good entry point is the names of the three most important standards:
  • Exif (often written EXIF) was developed by camera manufacturers. It primarily addresses low-level information about a digital picture, such as the make and model of the camera used to take it, the exposure settings, the date and time it was taken, etc. However, it has a few fields for higher-level information, such as the copyright holder and a description of what's in the picture.
  • IPTC comes from news organizations and generally aims at higher-level issues, such as photo captions, lookup keywords, copyrights, and the like. The original "legacy" IPTC standard was IIM. It's known as IPTC-IIM. That was succeeded by IPTC-Core and IPTC-Extension, but IPTC-IIM is still widely used, so in practice, there are three IPTC standards to be aware of.
  • XMP was developed by Adobe as a more general approach to metadata than Exif and IPTC. XMP can represent all Exif and IPTC metadata, plus much more. XMP groups its fields into namespaces. The XMP Exif namespace, for example, provides fields for metadata defined by Exif. A particularly important namespace is Dublin Core, which defines fields applicable to more than just images, e.g., to audio, video, and printed information. Among these fields are those for copyright and descriptive information. Trivia lovers will delight in knowing that the Dublin in Dublin Core is in Ohio, not Ireland.
IPTC-Core and IPTC-Extension are implemented using XMP technology, so it is not uncommon to lump these three standards together, even though they're different. A consequence is that IPTC-IIM is often just called IPTC. Sometimes it's simply referred to as IIM.
 
XMP became an international standard in 2012. Because it's more expressive than Exif and is the underpinning of IPTC-Core and IPTC-Extension, I think many people assumed that XMP would quickly replace Exif and IPTC for metadata storage. Life would be simpler if it had. It'd also be simpler if Exif and IPTC covered disjoint sets of information. But XMP hasn't taken over, and Exif and IPTC have significant overlap, so many important metadata fields in an image file exist in three places: one in each of the Exif, IPTC, and XMP parts of the file. 

In 2008 and 2010, the Metadata Working Group (MWG) published a set of guidelines for programs that have to deal with this mess. Alas, if the (really outstanding) work by Carl Siebert in 2017-18 accurately reflects how current programs read and write redundant metadata fields, the guidelines have hardly brought order to the chaos. Different programs treat the equivalent Exif, IPTC, and XMP fields in different ways. The MWG itself appears to have given up. Its web site (www.metadataworkinggroup.org) is no longer reachable, which is why my link to the MWG guidelines uses the Internet Archive's Wayback Machine. 

All serious travelers through the image metadata wilderness eventually make their way to ExifTool. An astonishingly powerful program for metadata manipulation, its influence is such that when the name it uses for a standard field differs from the name used in the standard, the ExifTool name tends to dominate. For example, the Exif field holding the date and time when a digital image was created (e.g., the date/time when a photo is scanned) is called DateTimeDigitized. ExifTool calls that field CreateDate. Many metadata workers casually refer to Exif's DateTimeDigitized as CreateDate, in part because that's ExifTool's name for the field and in part because it's the name of the corresponding field in XMP. This can confuse the uninitiated (as I was not that long ago), because searching the Exif standard for CreateDate turns up nothing. 

Interestingly, ExifTool did not extend the use of  the name "CreateDate"to IPTC (i.e., to IPTC-IIM). IPTC splits the date and time of digitization into separate fields, DigitalCreationDate and DigitalCreationTime, and ExifTool uses the IPTC names for these fields.

Notwithstanding its name, ExifTool reads and writes more than Exif metadata. It also handles metadata defined by IPTC, XMP, and a variety of lesser standards. In addition, it offers "composite" fields derived from the MWG guidelines. These fields make it possible to simultaneously write to all the fields in Exif, IPTC, and XMP that are supposed to contain the same value. As you'll see in a later post, I take advantage of this capability when putting metadata into the files for my scanned pictures.

No comments: