Tuesday, January 25, 2022

Image Metadata: The Metadata Removal Problem

This is part 6 of my series on metadata for scanned pictures.

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps

Part 4: My Approach

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem (this post)

Part 7: Thoughts after 4000+ Scans


 

When I embarked on this project, I knew it'd be a challenge to figure out how to put metadata into image files. I expected that some programs would be better than others at showing the metadata I'd put in. But I didn't realize I'd have to contend with programs that silently strip metadata when you ask them to do something completely different. Caroline Guntur's blog post opened my eyes:

Many cloud platforms and social media sites will not upload, or retain the [metadata] in your photos. Some will even strip the information completely upon download.

So I can upload an image file with metadata, but the uploaded file might not have it. Or I can download a file with metadata, but the downloaded file might not have it. Ouch!

I shouldn't have been surprised. Especially on social media sites, photo metadata has acquired a reputation as a security and privacy risk. The GPS coordinates for where a photo was taken (typically included in the metadata by cell phones) have drawn particular attention. Some sites have responded by removing most or all metadata from uploaded images (sometimes while keeping it for their own use). That has drawn the ire of many photographers, who have been understandably unhappy about having, among other things, their embedded copyright notices removed from their pictures.

It got me to wondering: if uploading and downloading images may affect their metadata, what about other ways of moving files around? Is email safe? Texting? I decided to do some poking around.

I looked into two basic scenarios:

  • Upload/Download: Is metadata maintained in image files that are uploaded to a web site or cloud service and then downloaded? This scenario covers social media sites like Facebook, Instagram, and Twitter, as well as cloud storage platforms from Google, Apple, Amazon, etc.

  • Point-to-point Communication: Is metadata maintained in images sent via email, texting, or instant messaging (e.g., WhatsApp and Facebook Messenger)? And what about Airdrop, Apple's close-range wireless mechanism for transferring files from one device to another?

Upload/Download Scenarios

IPTC is not just the name of a metadata standard. It's also the abbreviation for the organization that created it: the International Press Telecommunications Council. Among its activities is looking out for the intellectual property rights of its members. One of the ways it does that is by checking how well a variety of web sites adhere to the IPTC's request that metadata in uploaded image files be left intact. Every three years since 2013, the IPTC has tested a variety of sites to see whether they retain four fields the IPTC considers particularly important: Caption/description, Creator, Copyright Notice, and Credit Line ("the 4Cs"). The latest results (from 2019) cover 16 sites and are here. I encourage you to read the report (it's not long), but the highlights are that "good" sites (i.e., those retaining the 4Cs) include Flickr, Google Photos and Drive, Dropbox, and Microsoft OneDrive. The "bad" sites (i.e., those not retaining the 4Cs) include Instagram, Facebook, and Twitter.

The IPTC's test results are interesting, but they're silent regarding the retention of the two timestamps I care about ("when taken" and "when scanned"), and they have nothing to say about Apple's iCloud, which I think is a serious omission. I decided to do some testing of my own.

It's useful to distinguish sites whose primary purpose is storage and accessibility from those whose primary purpose is sharing. Google Photos and Apple iCloud Photos, for example, push themselves as services that let you securely store your photos (and videos) in the cloud and have them accessible from all your devices. They support sharing photos with others, but that's not their primary purpose. You could easily make use of these services without ever sharing anything.

In contrast, the primary reason to upload photos to social media services like Facebook, Instagram, and Twitter is to share them with others. The purpose of uploading photographs is for other people to see them.

Sites for Storage and Accessibility

I uploaded an image file to the following services, then I downloaded it and checked to see if the Exif, IPTC, and XMP copies of the four fields I use (description, copyright, "when taken", and "when scanned") remained intact. My findings were consistent, both with one another and with the results of the IPTC's testing:

  • Google Photos: All my metadata was preserved.
  • iCloud Photos: All my metadata was preserved.
  • Google Drive: All my metadata was preserved.
  • iCloud Drive: All my metadata was preserved.
  • Microsoft OneDrive: All my metadata was preserved.
  • CrashPlan for Small Business: All my metadata was preserved.

This is reassuring. Storing an image file in cloud storage is unlikely to change its metadata. This is good news for those of us who believe in cloud-based backups.
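The upload-then-download check described above could be scripted along the following lines. This is a minimal Python sketch, not the procedure used for the tests reported here. It assumes ExifTool is installed and on the PATH, and it relies on ExifTool's -j (JSON output) option. The function names and the file names in the usage note are hypothetical.

```python
import json
import subprocess

# Tag names as ExifTool prints them for the four fields discussed in this post.
FIELDS = ("Description", "Copyright", "DateTimeOriginal", "CreateDate")


def read_fields(path):
    """Return {tag: value} for FIELDS, read via ExifTool's JSON output (-j).

    Assumes the exiftool executable is on the PATH.
    """
    result = subprocess.run(
        ["exiftool", "-j", path], capture_output=True, text=True, check=True
    )
    tags = json.loads(result.stdout)[0]  # -j prints a list with one object per file
    return {name: tags.get(name) for name in FIELDS}


def lost_or_changed(before, after):
    """Return the fields whose values differ between two {tag: value} dicts."""
    return {
        name: (before.get(name), after.get(name))
        for name in FIELDS
        if before.get(name) != after.get(name)
    }
```

Calling lost_or_changed(read_fields("original.jpg"), read_fields("downloaded.jpg")) yields an empty dictionary when all four fields survive the round trip. Because the tags are read without group names, this checks only one copy of each field, not the Exif, IPTC, and XMP copies separately.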

My experiments were based on the default behavior for these sites, and I suspect that's the case for the IPTC's, too. According to Consumer Reports, Flickr can be configured to omit metadata when images are downloaded, and it's possible that the same is true of other storage and accessibility sites. However, anybody who configures a site to omit metadata in downloaded images is hardly in a position to complain if images downloaded from that site lack metadata.

Sites for Sharing

Social media sites such as Facebook and Twitter are perhaps the best known sharing-oriented web sites, but the umbrella over such sites is broader than that. It also covers dating sites (e.g., Tinder and eHarmony) and sites for selling things (e.g., eBay and craigslist).

I didn't test how these sites handle image metadata, because others (e.g., Consumer Reports and Kaspersky, in addition to the IPTC) have covered this ground better than I could. They've all come to the same conclusion: social media and other sharing-based sites typically remove metadata from uploaded photographs.

Social media and other sharing-based sites are a poor choice if you want to share not just pictures, but also their metadata.

Point-to-Point Communication

The point-to-point communication mechanisms I considered are email, texting and instant-messaging, and Apple's Airdrop. I did little experimentation of my own, because this terrain has also been well explored by others.

On the email front, the consensus is that image files sent via email retain their metadata. I did a few simple tests, and my results showed the same: metadata was preserved.

Email can contain images either inline (i.e., displayed in the message itself) or as attachments. In 2020, Craig Ball published a blog post describing how inline images in email appeared to have no metadata, while attached images did. His investigation revealed that the inline images he received did, in fact, contain all the metadata in the images that had been sent, but the metadata somehow got stripped during the process of saving an inline image as an independent file. The blog post went on to explain how to work around the problem.

To see if I could reproduce his results, I emailed an image to myself twice, once as an attachment and once as an inline image. In both cases, I was able to see the metadata without any trouble. However, the email client I used was Thunderbird, whereas Ball used Gmail and Outlook. That could explain why we experienced different behaviors.

It's comforting that Ball's conclusion aligns with the consensus that images sent via email retain their metadata. At the same time, it's disturbing that extracting an inline image from a message may cause its metadata to be removed. Sigh.

But that's email. These days, more photos are probably sent by text or instant message. How does image metadata fare when communicated in those ways?

On the instant-messaging front, things are clear. I didn't run any tests myself, because the net community speaks with a single voice:

  • WhatsApp removes image metadata.
  • Facebook Messenger removes image metadata.
  • Signal removes image metadata.
  • Telegram removes image metadata.

There are ways to work around this behavior (e.g., by sending photos as documents), but the fact remains that these instant-messaging services redact photo metadata as a matter of policy.

When we shift from instant messaging to good, old-fashioned, ordinary texting, the air is fogged by the fact that smart phones typically obscure whether you're engaging in good, old-fashioned, ordinary texting. Users of the Messages app on Apple devices, for example, typically communicate with one another via iMessage. iMessage is an internet-based protocol that is quite different from the cell phone system's SMS/MMS technologies (which underlie good, old-fashioned texting). iMessage works only between Apple devices and only when an internet connection is available, so for texting to or from non-Apple devices or when internet access is lacking, the Messages app employs SMS/MMS. The protocol used for a particular sent message is indicated in Messages by the bubble color (blue for iMessage, green for SMS/MMS), but all incoming messages look the same (grey bubble), regardless of whether they were transmitted using iMessage or SMS/MMS.

This means that a text message sent or received using Messages might be a "normal" text (conveyed via SMS/MMS), but it might be an iMessage text, depending on whether the other party (or parties) in the conversation were using Apple devices and whether an internet connection was available. My understanding is that a similar bifurcation exists on Android devices, where the Google Messages app may send and receive messages using either RCS or SMS/MMS, depending on the capabilities of the parties' devices and those of their service providers.

The effect of texting on image metadata appears to be:

  • Photos sent using the iMessage protocol retain their metadata. This is both the wisdom of the net and my personal experience. Photos texted between Apple devices arrive with their metadata intact (unless the lack of an internet connection causes Messages to fall back on SMS/MMS).
  • Photos sent using the RCS protocol retain their metadata. It's harder to find information about RCS than iMessage, but the sources I consulted (e.g., here and here) agree on this point. Photos texted between devices running Android should arrive with their metadata intact (provided both sender and recipient(s) are using RCS).
  • Photos sent using SMS/MMS may retain their metadata. This is the scenario that applies to texts between different kinds of devices (e.g., between iOS and Android devices). Most (but not all) Internet sources I consulted said that MMS strips metadata. My favorite overview of the situation is by Dr. Neal Krawetz. His summary is that "the entire delivery process for texted pictures is just one bad handling process after another." I lack the expertise to evaluate the accuracy of his analysis, but it looks quite plausible, and it would explain the varying behavioral descriptions I found elsewhere on the internet. I feel confident in stating that photos transmitted via SMS/MMS might retain their metadata.

Stepping back from the details, we can say that instant messaging apps scrub metadata from photos, while photos sent by text may or may not have their metadata scrubbed. Texting photos between Apple devices is a good bet as regards metadata retention, but it's important to make sure that both sender and receiver see blue bubbles in the Messages app.
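The protocol choice and its consequences for metadata can be summarized as a small decision function. The Python sketch below only encodes the rules as described in this post. The function name, the "ios" and "android" labels, and the rcs_supported flag are made up for illustration, and real devices may decide differently.

```python
def texting_protocol(sender_os, recipient_os, internet_available, rcs_supported=False):
    """Guess which protocol carries a text, following the description above.

    sender_os / recipient_os: "ios" or "android" (hypothetical labels).
    rcs_supported: True if both parties' devices and carriers support RCS.
    """
    if sender_os == "ios" and recipient_os == "ios" and internet_available:
        return "iMessage"
    if sender_os == "android" and recipient_os == "android" and rcs_supported:
        return "RCS"
    return "SMS/MMS"  # the fallback in every other case


# Expected effect on photo metadata, as summarized in the text.
METADATA_OUTCOME = {
    "iMessage": "retained",
    "RCS": "retained",
    "SMS/MMS": "uncertain",
}
```

For example, texting_protocol("ios", "android", True) returns "SMS/MMS", for which METADATA_OUTCOME gives "uncertain".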

The final point-to-point communication mechanism I looked at is Apple's Airdrop. I'd always thought of Airdrop as simply a way to wirelessly copy a file from one Apple device to another, but that's not quite right. A standard file copy entails copying a sequence of bytes from one place to another. What the bytes represent (e.g., a document, an image, the state of a game) is immaterial. The copying program doesn't care what the bytes are for. It just copies them.

Copying an image file in that manner would copy the file's metadata, because the copying program wouldn't care that it's an image file. It would simply copy the bytes, just like it would with a document or a game state, etc. But that's not how Airdrop behaves. By default, metadata is removed from pictures that are Airdropped. This can be overridden by enabling the "All Photos Data" option, but it's a non-sticky setting, so it has to be explicitly enabled each time Airdrop is used to copy images from one device to another. 

Airdrop's "strip metadata by default" behavior makes it less convenient and less reliable for sharing photos with metadata than a simple file-copying program would be.
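To make the idea of a "simple file-copying program" concrete, here is a minimal Python sketch that copies a file byte for byte and confirms the copy is identical by comparing SHA-256 hashes. If every byte is the same, any embedded metadata is necessarily the same too. The function names are illustrative, and this says nothing about how Airdrop itself is implemented.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copy_is_identical(source, destination):
    """Copy source to destination byte for byte and report whether the hashes match."""
    shutil.copyfile(source, destination)  # copies contents only, with no interpretation
    return sha256_of(source) == sha256_of(destination)
```

The copying code never looks inside the file, which is exactly why it can't remove anything from it.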

Conclusion

Once you get metadata into an image file, you don't want to accidentally lose it, either for yourself or for those with whom you want to share it. The safest things you can do with image files (from the perspective of metadata retention) are to upload them to sites designed for storage and accessibility (as opposed to sharing) and to send them via email. The worst things you can do (again, from the perspective of metadata retention) are to upload them to sharing-oriented sites (e.g., social networks) or to text them using instant-messaging services.

Tuesday, January 18, 2022

Image Metadata: Viewing What I Wrote

This is part 5 of my series on metadata for scanned pictures.

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps

Part 4: My Approach

Part 5: Viewing What I Wrote (this post)

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans


 

Just because an image file contains metadata doesn't mean that the metadata is visible or recognizable as what it is. Lots of programs can display metadata. Each has its own quirks. I put only four pieces of metadata into my image files, but most of the programs I tested show only some of these. The fields that are displayed may be labeled differently from both the standard names and the names used by the program used to put the metadata into the file. Some programs apply a name from one standard to a field from a different one.

It is, as usual, a mess. The closer you look, the messier it gets. I've performed numerous experiments, and the stories I could tell...  

But I won't. The way to deal with the mess is to not look very closely. My goal is to produce image files with metadata that I can share with others. I already know how to view an image's metadata, so the real question is whether other people can see it. 

There's no reason to expect friends and family members, etc., to know anything about Exif, IPTC or XMP. However, they'll know descriptive text or a copyright statement when they see it, and if they see a date and time, they'll assume that's when the picture was taken. If they see another date and time that says something about when the picture was scanned or digitized, they are unlikely to be confused.

Inspired by Carl Seibert's survey of how different programs prioritize Exif, IPTC, and XMP when reading metadata, I examined a dozen programs to see how well they made the metadata visible for my sample slide from part 3 (shown at right). Although a couple of the programs are aimed at more serious users, most of the 12 are stock apps that come as part of the operating system. They're the programs likely to be used by people with no special interest in metadata. All of the programs I looked at are free.

The high-level takeaway is that the most important metadata stored in my scanned image files is pretty accessible for anybody who knows to look for it. Things could be better, but they're not bad. As such, my approach to embedding metadata in image files seems to be reasonable.

I scored each program I looked at on a 10-point scale. Points were awarded as follows:

  • 6 points if the image's metadata description is fully visible. If this requires making a window wider or putting a phone into landscape mode, that's fine. I used this description (from part 4 of this series) for testing:

Tim Johnson's equipment | Taken 7/1992 | Developed 8/1992 | Scanned 35mm slide

  • 3 points if the metadata description is partially visible, but can't be made fully visible. A partially visible description tells the person looking at the picture that descriptive information is present, but it's not as good as showing the entire description.

  • 2 points for showing the date when the picture was taken such that a viewer could reasonably assume that that's what the timestamp represents.

  • 1 point for displaying the copyright notice (even if it's only partially visible).

  • 1 point for showing the date and time scanned in a way that makes it recognizable as what it is.

I weight the description field heavily, because it contains the two most important pieces of metadata: what's in the picture and when it was taken. (Recall from part 3 that the "when taken" field holds only an approximation. The actual "when taken" information is part of the description.) If the description is visible, and especially if it's fully visible, that's all most people need.

I issue a big penalty for programs that engage in what I consider a grossly deceptive practice:

  • -6 points if the image's description metadata is not visible, but the program offers its own description field that, if used, stores the entered information, but not in the image file. In other words, a program loses 6 points if it offers a field that looks like an image's metadata field for a description, but isn't. 

Only one program incurred this penalty. I don't want to give anything away, so I'll just say that it carries a company name that rhymes with "Boogle".

The scores tell only part of the story. 10 means that a program can display all the metadata I store in a recognizable form, but it doesn't mean that getting it to do that is straightforward. For details, read the per-program overviews that follow.
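The scoring rules above amount to simple arithmetic, which the following Python sketch spells out. The function and parameter names are invented for illustration; only the point values come from the rules listed above.

```python
def score(description="none", when_taken=False, copyright_notice=False,
          when_scanned=False, deceptive_description_field=False):
    """Score a program on the 10-point scale described above.

    description: "full", "partial", or "none".
    deceptive_description_field: the program hides the embedded description
    but offers its own description field that isn't stored in the image file.
    """
    points = {"full": 6, "partial": 3, "none": 0}[description]
    points += 2 if when_taken else 0
    points += 1 if copyright_notice else 0
    points += 1 if when_scanned else 0
    # The penalty applies only when the embedded description is not shown.
    if deceptive_description_field and description == "none":
        points -= 6
    return points
```

A program that shows a partial description, the "when taken" date, and the copyright notice scores 3 + 2 + 1 = 6. One that shows only the "when taken" date and incurs the penalty scores 2 - 6 = -4.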

Programs on Windows 10

Of the following six programs, three (Windows File Explorer, Windows Photo Viewer, and the Microsoft Photos App) are included with Windows. The other three (XnView MP, Adobe Bridge, and ExifTool) must be downloaded and installed separately.

Windows File Explorer and Windows Photo Viewer (Score: 6)

These two programs show image metadata the same way: on the Details tab of a file's Properties dialog. This dialog displays a limited-width view of the description (3 points) and copyright (1 point), as well as the "when taken" timestamp (2 points). There's no timestamp for when the image was scanned. The fact that the description is displayed twice and is labeled both Title and Subject is strange, but both fields are in the Description section of the tab, so I think things are clear enough. 

Both of these programs ship with Windows 10, but my understanding is that Photo Viewer is hidden in some installations in favor of the Photos app. From a metadata point of view, that's a big step backwards, as we'll see next.

Photos App (Score: 2) 

Clicking on "..." and selecting "ⓘ File Information" when viewing a photo in the Photos app brings up a panel with metadata information. Of the four fields I write into image files, only when the photo was taken is displayed (2 points). This is disappointing for a dedicated photos app, and it's notably worse than Windows Photo Viewer, which is the program the Photos app replaced.

XnView MP (Score: 10)

XnView MP is my default image viewer, and that was the case before I started worrying about metadata. Its score of 10 indicates that it shows all the information I put into image files, but the plethora of metadata viewing options takes some getting used to. 

Everything starts with the Edit menu, which includes entries for "Edit comment...", "Edit IPTC...", and "Edit XMP...". For purposes of viewing metadata, none of these is correct. What you want is "Properties..." (also on the Edit menu). Selecting it brings up a window with multiple tabs, including one for each of Exif, IPTC, XMP, and ExifTool.

The Exif tab does the best job of showing all the metadata I embed, with each of the four fields clearly labeled and near the top of the window. On its own, this tab scores a 10.

The IPTC-IIM tab also shows all the fields, but the timestamp for when the image was scanned is unrecognizable unless you know that the hexadecimal codes for the relevant timestamp fields are 0x3e and 0x3f. No "normal" person would know that, so the IPTC tab loses the point for showing the date/time scanned and ends up with a 9. 

The XMP tab shows everything, but I'd expect the similarity of the names for the "when taken" and "when scanned" fields (DateCreated and CreateDate) to sow confusion and uncertainty. I give the tab credit for neither, and it gets a 7.

The ExifTool tab shows the results of running the copy of ExifTool that's embedded inside XnView MP. The amount of information can be overwhelming, but everything's there. It's there three times, in fact, once each for Exif, IPTC, and XMP. Taken by itself, the ExifTool tab scores a 10, but the Exif tab remains the easier way to get the information.

Adobe Bridge (Score: 10)

Bridge is Adobe's free companion to Photoshop and Lightroom. It's designed to organize and manage photos, not to change their appearance. Using Bridge, you can view and edit metadata, but you can't change what a picture looks like. 

It's reasonable to expect people who use Bridge to have an above-average familiarity with image metadata.

Bridge's metadata panel is divided into several sections, including ones for Exif, IPTC IIM, IPTC Core, and IPTC Extension. XMP appears to be missing until you recall (from part 2) that IPTC Core and IPTC Extension are sometimes used synonymously with XMP. No single section shows all the fields I write, but everything is present: the IPTC-IIM and IPTC Core sections have the description, "when taken" timestamp, and copyright notice, and the Exif section has the "when scanned" timestamp.

ExifTool  (Score: 10)

ExifTool is a command line program, though GUIs have been built on top of it. It's the go-to power tool in the image metadata world, and it didn't take me long to regard it as the source of truth for metadata in image files. Different programs label the metadata they show in different ways, so when you look at a field value, it can be hard to know exactly what you're looking at. Some programs lie. The Preview App on MacOS, for example, has tabs for Exif and IPTC, but there are conditions under which the values on those tabs come from XMP! Since metadata in image files can be seen only with the aid of programs that know how to read it, how do you know which programs to trust? I trust ExifTool.

It's hard to imagine anybody using ExifTool without knowing about Exif, IPTC, XMP, and the various fields they offer. I therefore score ExifTool with the expectation that it's being used by somebody who brings a fair amount of metadata knowledge to the table. Such users can be expected to recognize the difference between DateCreated and CreateDate. With that in mind, ExifTool scores a 10.

ExifTool's output on the sample slide is an unwieldy 96 lines long if you let it show you everything (which is the default), but if you ask it for only the fields I put into it,

exiftool -S `
         -mwg:description `
         -mwg:copyright `
         -mwg:datetimeoriginal `
         -mwg:createdate `
         '.\The Brown Experience 1985-1993 031.jpg'

you get this in return:

Description: Tim Johnson's equipment | Taken 7/1992 | Developed 8/1992 | Scanned 35mm slide
Copyright: © 2022 Scott Meyers (smeyers@aristeia.com), all rights reserved.
DateTimeOriginal: 1992:07:01 00:00:00
CreateDate: 2022:01:14 17:54:46

The copyright symbol (©) is displayed incorrectly, but that's a problem with Windows PowerShell (where I ran the command), not ExifTool.
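Output in this "Tag: value" form is easy to process further. The following Python sketch splits it into a dictionary and converts the timestamps to datetime objects. It assumes the exact line format shown above, with one tag per line and timestamps lacking fractional seconds and time zone offsets. The function names are illustrative.

```python
from datetime import datetime


def parse_short_output(text):
    """Parse "Tag: value" lines into a {tag: value} dict."""
    fields = {}
    for line in text.splitlines():
        if ": " in line:
            # Split at the first ": " only, so values may contain colons.
            tag, value = line.split(": ", 1)
            fields[tag.strip()] = value.strip()
    return fields


def parse_timestamp(value):
    """Convert a "YYYY:MM:DD HH:MM:SS" timestamp into a datetime."""
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")
```

Applied to the output above, parse_timestamp(fields["DateTimeOriginal"]) gives midnight on July 1, 1992, the approximate "when taken" value stored for the sample slide.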

Programs on MacOS Big Sur

Each of the three programs I tested on MacOS is included with the operating system.

Finder (Score: 6)

Right-clicking on an image file in the Finder and choosing "Get Info" brings up this window:

It shows the full description in the metadata (6 points), but though timestamps are shown for when the file was created and last modified, there is no sign of the "when taken" and "when scanned" timestamps. The copyright notice is similarly missing. The Finder thus gets a score of 6.

Photos App (Score: 8)

Clicking the ⓘ while viewing a photo in the Photos app brings up its Info window:

It shows the full description (6 points) as well as when the photo was taken (2 points), but the "when scanned" timestamp and the copyright notice are not shown. The score for the Photos app is 8.

Preview App (Score: 10)

Viewing image metadata with the MacOS Preview app reminds me of using XnView MP, but with a twist. With XnView MP, the Exif tab shows metadata from the Exif fields, and the IPTC tab shows metadata from the IPTC fields. That's not always the case with the MacOS Preview app. Regardless of how a tab is labeled, it may show metadata drawn from Exif, IPTC and XMP. That's disturbing, but, fortunately, irrelevant for my purposes. Writing the same metadata to corresponding fields in Exif, IPTC, and XMP means that it doesn't matter which field gets read. The Preview app's Exif tab, for example, shows when the photo was taken and when it was digitized (i.e., scanned). This information is correct for my image files, although it's actually pulled from the IPTC metadata instead of that for Exif.

On its own, this tab gets a score of 3: 2 for the date/time when the picture was taken, and 1 for when it was scanned.

The IPTC tab shows everything and thus gets a 10, though I take a dim view of the decision to display the date and time digitized between the date taken and the time taken:

The Preview app also has a TIFF tab. I don't know what kind of metadata this tab is supposed to show, but since all the tabs can show metadata from Exif, IPTC, and XMP, the labels don't really matter. Here's the TIFF tab for the sample slide. It shows the full description (6 points) and the copyright notice (1 point). The value it shows for the "Date Time" field corresponds neither to when the photo was taken nor to when it was scanned, so no points for that. The tab gets a score of 7.

The more I use the Preview app to look at image metadata, the less I like it. It right-justifies field names with respect to the center of the window, and it left-justifies field values with respect to that center, and, as you can see, this leads to a lot of wasted space on the left side of the window. I've often found that widening the window doesn't cause the text inside to be reformatted, so I've had to play games to get all the metadata properly displayed (e.g., force-close the app and then reopen it).

Programs on iOS 15

Photos App (Score: 8)

As of iOS 15, touching the ⓘ icon or swiping up while viewing an image displays the Info pane, which includes the image's full description (6 points) and the date and time it was taken (2 points). There's no sign of the copyright or "date scanned" metadata, so this app gets an 8.

Prior to iOS 15, accessing an image's metadata typically involved saving the image to the Files app, then using the Files app to view the embedded metadata. That continues to work on iOS 15, but it's more cumbersome, and my experience is that even though it displays more metadata fields than the Photos app's Info pane, it doesn't show any of the fields I write to my scanned image files. It would get a score of 0 if I officially evaluated it, but since I'm running iOS 15, I'm going to pretend I know nothing about the Files app workaround.

Google Photos App (Score: -4)

I'm generally impressed with Google's products and services, but the impression its iOS Photos app leaves on me is a depressing mixture of disbelief and anger. 

Pressing "..." while viewing a photo brings up its Info sheet:

It shows the "when taken" timestamp (2 points), but there's no sign of the "when scanned" timestamp, the copyright notice, or the description. Instead, there is an "Add description..." field, which, being empty, suggests that the image lacks a description. For my files, this is not just untrue, but triply untrue, because my scanned image files have description metadata in each of the Exif, IPTC, and XMP fields. As a company, Google knows this, because Google Photos in the cloud (see below) displays the embedded description. 

But that's not the heinous part. Should you, noting the empty description field, succumb to temptation and put information into it, your text will not be stored in the metadata in the image file! Instead, the information you enter will be stored separately by Google. The same is true of any other edits you make on the Info sheet, e.g., "Add a location" or "Edit date & time". The Info sheet is a place to enter image metadata, but it's not a place to enter image metadata that will be stored inside the image!

This is reprehensible behavior. Hiding metadata present in an image while offering users the chance to add metadata that you'll keep private is...well, words fail me. But math doesn't. I slap on the -6 penalty for grossly deceptive practices, and Google's Photos app for iOS ends up with a record-setting low score of -4.

Cloud Services

There are lots of cloud-based photo storage services. I tested only Google Photos and iCloud Photos, and to be clear, I did it via their web browser interface, not via an app on a computer or mobile device. Among the many services I did not test are Facebook, Flickr, SmugMug, Amazon Photos, Microsoft OneDrive, Degoo, and Photobucket. I welcome your comments about viewing image metadata using these services.

In a 2017 blog post, Caroline Guntur wrote,

Many cloud platforms and social media sites will not upload, or retain the [metadata] in your photos. Some will even strip the information completely upon download.

In a later post in this series, I will address what happens to metadata when you move image files around (e.g., upload or download them, email them, text or IM them, etc.). My testing shows that uploading an image to both Google Photos and iCloud Photos has no effect on its metadata--at least not for the four fields I care about. 

Google Photos (Score: 8)

Clicking the ⓘ symbol while viewing a photo opens its Info panel. That panel displays the full metadata description (6 points) as well as the "when taken" timestamp (2 points). The copyright and "when scanned" fields are missing, so the Google Photos cloud service scores an 8.

Like the Google Photos iPhone app, the Google Photos cloud service displays an inviting "Add a description" field at the top of the panel. As with the iPhone app, metadata you enter here is not stored in the image file, but instead in a Google database. 

Unlike the iPhone app, the description metadata already in the file is shown, albeit with the label "Other." Because Google Photos in the cloud displays the description metadata embedded in the file, there's less chance the person viewing the photo will think there's no description for it and will avail themselves of the "Add a description" field. I therefore withhold the six-point penalty here that I impose on Google's iPhone app.

iCloud Photos (Score: 2)

As far as I can tell, the only metadata visible for a photo viewed using the web browser interface to iCloud Photos is the date on which it was taken. It's displayed above the photo being viewed:

That yields a disappointing score of 2. Apple's apps on MacOS and iOS do notably better, and my impression from looking at Apple's support pages is that they expect you to use those apps as much as possible. If you don't have an Apple device, well, presumably that's an incentive for you to get one.

Saturday, January 15, 2022

Image Metadata: My Approach

 This is part 4 of my series on metadata for scanned pictures.

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps

Part 4: My Approach (this post)

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans


 

In part 3, I mentioned that I use the standard "description" fields to hold what I truly know about when a picture was taken and developed. That's not all I put into these fields. I also include:

  • Descriptive text I have for the picture, e.g., written on the slide frame or the back of the picture. For the sample slide at right, it's "Tim Johnson's equipment". (Update 25 July 2023: per this blog post, I now also include the name of the set of pictures an image is from, if there is one.)
  • The source of the image, e.g., that it came from a slide.

The "description" fields are permitted to contain newlines, but I've found that many programs display only the first line of multi-line values.  I therefore put everything on a single line, and I use vertical bars to separate different pieces of information. For example, this is my "description" value for the sample slide:

Tim Johnson's equipment | Taken 7/1992 | Developed 8/1992 | Scanned 35mm slide

Some of this text is unique to the picture, some is boilerplate (e.g., "Taken" and "Developed"), and some is likely to be repeated in the metadata for other images (e.g., "Scanned 35mm slide"). Programs aimed at metadata entry often support templates that reinforce formatting decisions and reduce the need to enter information more than once. A template for the "description" fields for my 35mm slides could look like this,

??? | Taken ??? | Developed ??? | Scanned 35mm slide

 where "???" is placeholder text for slide-specific information that must be entered manually.
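For anyone who'd rather script the template than type it, here's a minimal Python sketch of the same idea. The function name and its parameters are made up for illustration; only the vertical-bar layout and the "Taken"/"Developed" boilerplate come from the template above.

```python
def build_description(text, taken=None, developed=None, source=None):
    """Assemble a single-line, bar-separated description (hypothetical helper)."""
    parts = [text]
    if taken:
        parts.append(f"Taken {taken}")
    if developed:
        parts.append(f"Developed {developed}")
    if source:
        parts.append(source)
    # Collapse any newlines or runs of whitespace so the value stays on one
    # line, and skip pieces that are empty.
    cleaned = [" ".join(part.split()) for part in parts if part and part.strip()]
    return " | ".join(cleaned)


print(build_description("Tim Johnson's equipment", "7/1992", "8/1992",
                        "Scanned 35mm slide"))
```

Pieces that aren't known are simply left out, so a picture with no development date gets a shorter description rather than a dangling "Developed" label.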

I explained in part 3 how I take what I know about when a picture was taken and turn it into a timestamp for the standard "when taken" metadata fields. I also explained that I expect the scanner to automatically write a "when scanned" timestamp into the image file. Scanners can also be configured to write a copyright notice into one or more of the standard "copyright" fields. Because that can be made fully automatic and might protect my interests, I do it.

The following, then, is the metadata I write and the complete names of the fields I write into. (In posts prior to this one, I've sometimes omitted the namespace specifier when discussing XMP fields.)

  • A description of the image, including the set it's from, what's in it, when it was taken, when it was developed, and the source that gave rise to it. This is written to Exif's ImageDescription, IPTC's Caption-Abstract, and XMP's dc:description fields.
  • A "when taken" timestamp. It goes into Exif's DateTimeOriginal, IPTC's DateCreated and TimeCreated, and XMP's photoshop:DateCreated fields.
  • A "when scanned" timestamp. It's written to Exif's DateTimeDigitized, IPTC's DigitalCreationDate and DigitalCreationTime, and XMP's xmp:CreateDate fields.
  • A copyright notice, which is put into Exif's Copyright, IPTC's CopyrightNotice, and XMP's dc:rights fields.

The guidelines from the Metadata Working Group specify that the corresponding Exif, IPTC, and XMP fields for descriptions, "when taken", "when digitized," and copyright should be kept in sync, so some (but not all) programs will update all three fields in a set if you write to any of them. With ExifTool, you can use the MWG composite fields Description, DateTimeOriginal, CreateDate, and Copyright to set a value for the fields in all three standards at once.
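As a rough sketch of what writing through the composite fields could look like from a script, the Python below builds an ExifTool argument list that assigns the four MWG fields. The helper name and its parameters are invented for this example, and the -ApplicationRecordVersion=4 argument is simply carried over from the command shown later in this post; treat the whole thing as an illustration, not a tested workflow.

```python
def mwg_write_args(path, description=None, taken=None, scanned=None,
                   copyright_notice=None):
    """Build an ExifTool argument list that writes MWG composite fields.

    Hypothetical helper: nothing is executed here, the list is just returned.
    """
    # Carried over from the post's own command (an assumption on my part that
    # it belongs in a write like this one too).
    args = ["exiftool", "-ApplicationRecordVersion=4"]
    values = {
        "Description": description,
        "DateTimeOriginal": taken,
        "CreateDate": scanned,
        "Copyright": copyright_notice,
    }
    for tag, value in values.items():
        if value is not None:
            args.append(f"-MWG:{tag}={value}")
    args.append(path)
    return args


print(mwg_write_args(
    "myScannedImage.jpg",
    description="Tim Johnson's equipment | Taken 7/1992 | Developed 8/1992 | Scanned 35mm slide",
    taken="1992:07:01 00:00:00",
))
```

The returned list could be handed to subprocess.run(args, check=True) on a machine with ExifTool installed. Passing a list rather than one long string sidesteps shell quoting for values that contain spaces or vertical bars.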

An alternative to explicitly writing the same values to fields in Exif, IPTC, and XMP is to write values to the fields for one of these standards, then copy them into the fields for the others. For example, scanning software and a GUI program could be used to write values to Exif fields, and ExifTool could be used to copy the Exif values into the metadata blocks for IPTC and XMP. Given a file named myScannedImage.jpg, this command would do the trick:

exiftool -ApplicationRecordVersion=4
"-MWG:Description<EXIF:ImageDescription"
"-MWG:DateTimeOriginal<EXIF:DateTimeOriginal"
"-MWG:CreateDate<EXIF:CreateDate"
"-MWG:Copyright<EXIF:Copyright"
myScannedImage.jpg

Minor variations on this command would use IPTC or XMP instead of Exif as the source of the fields to be copied. 

Yes, this looks like black magic, and no, I'm not going to explain how it works. (ExifTool has very comprehensive online documentation.) It looks even blacker when you type the command on a single line, which is how you'd typically do it. My point is that this approach guarantees consistency among Exif, IPTC, and XMP, yet requires manually entering information only for Exif. ExifTool can be applied to many files at once, so if you have lots of files with fields to copy, it can make quick work of a big job.
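If you'd rather drive this from a script than type it, a minimal Python sketch follows. The arguments are the ones from the command above; the function names are invented, and the file names in the usage line are placeholders. Nothing here runs ExifTool; it only assembles the command.

```python
import shlex

# The copy arguments from the command above, one list element per argument.
COPY_ARGS = [
    "-ApplicationRecordVersion=4",
    "-MWG:Description<EXIF:ImageDescription",
    "-MWG:DateTimeOriginal<EXIF:DateTimeOriginal",
    "-MWG:CreateDate<EXIF:CreateDate",
    "-MWG:Copyright<EXIF:Copyright",
]


def copy_exif_command(paths):
    """Return the argument list for copying Exif values into IPTC and XMP."""
    return ["exiftool", *COPY_ARGS, *paths]


def one_line(paths):
    """Render the command as a single line with POSIX-shell quoting."""
    return shlex.join(copy_exif_command(paths))


print(one_line(["scan0001.jpg", "scan0002.jpg"]))
```

The one-line form shows why quoting matters: a shell would otherwise read each "<" as input redirection. Handing the list from copy_exif_command straight to subprocess.run avoids the shell entirely, so no quoting is needed in that case.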

I want the metadata I embed in image files to be as widely and easily accessible as possible, so my approach is very conservative. I use only widely supported, standard fields, and I'm careful to put the same values into the Exif, IPTC, and XMP fields that are supposed to mirror one another.


Wednesday, January 12, 2022

Image Metadata: Dealing with Timestamps

This is part 3 of my series on metadata for scanned pictures. 

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps (this post)

Part 4: My Approach

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans


 

Several image metadata fields store timestamps (i.e., dates and/or times). As you'll see, the field names can exasperate, but the bigger problem is that the fields request more precision than anybody with scanned images is likely to have.

The most important timestamp identifies when the picture was taken. Digital cameras know this down to the second, but for pictures from the age of film, such precision isn't available. For example, in my first post in this series, I mentioned a slide that my wife asked me to track down. What I'll call the sample slide is shown at right. I had the foresight to write on its frame when it was taken, but I wrote only July 1992. I don't know what day in July, and I certainly don't know the time.

Exif and XMP (but not IPTC) timestamps are permitted to omit unknown date and time information, but word on the net is that partial timestamps are uncommon and that they're likely to confuse programs that encounter them. Because I want my metadata to be compatible with as wide a variety of programs as possible, I've decided to avoid them. 

That commits me to providing a complete timestamp for each metadata field that wants one. But when I don't know the month (or the day or the time) when a picture was taken, what should I use for the values I don't have? The convention among image metadata-istas is to use the earliest permissible values: 1 for missing days and months and 00:00:00 for missing times. Per this convention, the timestamp for when the sample slide was taken is 1992:07:01 00:00:00.
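The convention is mechanical enough to sketch in a few lines of Python. The function name is made up; the only things taken from the text are the fill-in values (1 for a missing month or day, 00:00:00 for a missing time) and the colon-separated date layout.

```python
from datetime import datetime


def when_taken(year, month=None, day=None, hour=None, minute=None, second=None):
    """Fill unknown parts of a timestamp with the earliest permissible values."""
    filled = datetime(
        year,
        1 if month is None else month,
        1 if day is None else day,
        0 if hour is None else hour,
        0 if minute is None else minute,
        0 if second is None else second,
    )
    # datetime() also rejects impossible dates such as February 30.
    return filled.strftime("%Y:%m:%d %H:%M:%S")


print(when_taken(1992, 7))  # the sample slide: July 1992, day and time unknown
```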

I'm not wild about this convention. When you order images chronologically, it has the effect of putting images with unknown months, days, or times in front of images with more detailed information. A picture known to have been taken on July 15, for example, is ordered after a picture known only to have been taken sometime in July. I was recently looking through scans of pictures from my wife's and my wedding and honeymoon, and the honeymoon pictures were listed before those from the wedding. That's because I know the date of the wedding, but on the honeymoon pictures, I noted only the month and year. It's been a long time since my wife and I got married, so I could be mis-remembering, but I'm pretty sure that the wedding came first.

I believe it would make more sense to have images with missing information sit in the chronological back of the bus, i.e., to order them after the images with more specific information. That'd be easy to do (just use the latest valid value for unknown days, months, and times instead of the earliest), but I decided against it. In addition to running counter to convention, it's more error-prone. If you use the last day of the month as the day a picture was taken when you don't know the actual day, you have to deal with the fact that different months have different numbers of days, and the number of days in February depends on the year. When scanning photos, the date the picture was taken has to be entered manually, so the process should be as simple as possible. Setting unknown months and days to 1 is about as simple as it gets, and an "unknown time" value of 00:00:00 is a lot easier to enter than 23:59:59 (which is what you'd have to use for unknown times in order for them to follow known times).
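To make the trade-off concrete, here's a small Python sketch of the "latest valid value" alternative. The function name is invented, and the July 15, 1992 date is just an example; the point is that the fill-in day has to be computed per month and per year, which is exactly the extra work described above.

```python
import calendar


def latest_fill(year, month=None, day=None):
    """Fill unknown parts with the latest valid values (the rejected option)."""
    month = 12 if month is None else month
    if day is None:
        # monthrange returns (weekday of the 1st, number of days in the month),
        # so this copes with 30/31-day months and with leap-year Februaries.
        day = calendar.monthrange(year, month)[1]
    return f"{year:04d}:{month:02d}:{day:02d} 23:59:59"


month_only = latest_fill(1992, 7)       # "sometime in July 1992"
known_day = latest_fill(1992, 7, 15)    # a known day, unknown time
print(sorted([month_only, known_day]))  # the known day now sorts first
```

Timestamps in this year:month:day layout sort correctly as plain strings, so the month-only picture lands after the July 15 picture here, whereas with the earliest-value convention it would land before it.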

Only some of my slides and photos have annotations telling me when they were taken. For those that don't, I fall back on when they were developed. In the case of slides, that's typically marked on the slide frame. For the sample slide, the development date is August 1992. If I had no information about when the slide was taken, that's what I'd use.

This policy means that for an image whose metadata timestamp says it was taken on July 1, 1992, it's impossible to distinguish among these possibilities:

  • The picture was taken on July 1, 1992.
  • The picture was taken in July 1992, but I don't know which day.
  • I don't have information about when the picture was taken, but I know the film was developed in July 1992.

I address this ambiguity by putting what I actually know into the "description" metadata fields for the picture. These fields have different names in Exif, IPTC, and XMP. Exif uses ImageDescription. IPTC goes with Caption-Abstract. In XMP, the field is dc:description.

Many (but not all) programs that edit metadata tie these fields together. If you edit one, the others are updated automatically. ExifTool takes a different approach. There, if you write to one of the "description" fields, only that single field is affected. If you want to update them all (and you certainly want to keep them in sync!), you can write to the MWG composite field, Description. That propagates the change to all of Exif, IPTC, and XMP.
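A quick way to see whether the three description fields have drifted apart is to compare them directly. The Python sketch below assumes you already have a file's tags as a dictionary keyed by group and tag name, which I believe is the shape ExifTool's JSON output takes when run with -j -G; the exact key spellings are my assumption, and the function name is made up.

```python
# Assumed key names for the three "description" fields in group-qualified form.
DESCRIPTION_KEYS = (
    "EXIF:ImageDescription",
    "IPTC:Caption-Abstract",
    "XMP:Description",
)


def description_status(tags):
    """Report which description fields are missing and whether all three agree."""
    values = {key: tags.get(key) for key in DESCRIPTION_KEYS}
    missing = [key for key, value in values.items() if value in (None, "")]
    present = {value for value in values.values() if value not in (None, "")}
    # "In sync" means every field is filled in and they all hold the same text.
    return {"missing": missing, "in_sync": not missing and len(present) == 1}


sample = {
    "EXIF:ImageDescription": "Taken 7/1992",
    "XMP:Description": "Taken 7/1992",
}
print(description_status(sample))
```

For the sample dictionary, the report flags the IPTC field as missing, which is the kind of gap that writing to a single field can leave behind.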

For the sample slide, I put this information into its description:

Taken 7/1992
Developed 8/1992

My policy implies that when I encounter an image file with a day of 1 for when it was taken, I have to check its description to find out what the 1 means. The metadata timestamp for when the picture was taken is an approximation. What's actually known is in the image's description. 

This approach generalizes to pictures where the "when taken" information is too vague to put into date/time format. For example, if I have nothing telling me when a picture was taken or developed, but I can guess that it was taken in the late 1970s, I can leave the "date taken" fields empty and write what I know in the description (e.g., "Taken in the late 1970s--look at those clothes!").

Naturally, "Date Taken" is not the name of a standard metadata field. That'd be too easy. The Exif field name is DateTimeOriginal. XMP calls it DateCreated. IPTC has two fields, one for the date (DateCreated) and one for the time (TimeCreated). Note that DateCreated in XMP is both a date and a time, but DateCreated in IPTC is just a date.

Programs manipulating metadata timestamps may or may not propagate changes in one field to the corresponding fields in other metadata blocks. In my experience, it's easier for these fields to get out of sync than it is with description metadata.

ExifTool's approach to "date taken" mimics that for description information. Individual timestamp fields can be written, but it's also possible to write to an MWG composite field representing the three fields that should mirror one another. For the "date taken" timestamp, the composite field's name is DateTimeOriginal (the same name that Exif uses), so using ExifTool to write the MWG DateTimeOriginal field has the effect of putting a value into the corresponding "date taken" fields for Exif, IPTC, and XMP. 

The date and time when a picture was taken is typically the most important timestamp for a scanned image, but it might also be useful to know when the scan was performed. I expect scanners to be able to automatically insert this information into the metadata. I don't have any specific use for this timestamp, but since recording it should incur virtually no cost, I want to do it. You never know what information might be useful in the future.

The Exif field for when an image took digital form is DateTimeDigitized. IPTC again uses two fields, DigitalCreationDate and DigitalCreationTime. XMP calls it CreateDate. CreateDate is also the name of ExifTool's composite field for all these fields.

Note that the IPTC and XMP DateCreated fields refer to when a picture was taken. The XMP and ExifTool CreateDate fields refer to when it was digitized. I think this is a terminological train wreck, but, sadly, this is the only train in the station.
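When the names start to blur, a lookup table helps. The Python sketch below records the field names mentioned in this post, using the spellings given here, and includes a small made-up helper that reports which concept a given name refers to in which standard.

```python
# Field names as given in this post; each entry maps a standard to its field(s).
FIELD_NAMES = {
    "description": {
        "Exif": ("ImageDescription",),
        "IPTC": ("Caption-Abstract",),
        "XMP": ("dc:description",),
        "ExifTool MWG": ("Description",),
    },
    "when taken": {
        "Exif": ("DateTimeOriginal",),
        "IPTC": ("DateCreated", "TimeCreated"),
        "XMP": ("DateCreated",),
        "ExifTool MWG": ("DateTimeOriginal",),
    },
    "when digitized": {
        "Exif": ("DateTimeDigitized",),
        "IPTC": ("DigitalCreationDate", "DigitalCreationTime"),
        "XMP": ("CreateDate",),
        "ExifTool MWG": ("CreateDate",),
    },
}


def meanings_of(name):
    """List the (concept, standard) pairs in which a field name appears."""
    return [
        (concept, standard)
        for concept, by_standard in FIELD_NAMES.items()
        for standard, names in by_standard.items()
        if name in names
    ]


print(meanings_of("DateCreated"))
print(meanings_of("CreateDate"))
```

Looking up the two near-identical names side by side shows the train wreck at a glance: one belongs to "when taken" and the other to "when digitized".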


Tuesday, January 11, 2022

Image Metadata: Standards, Guidelines, and ExifTool

This is part 2 of my series on metadata for scanned pictures.

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool (this post)

Part 3: Dealing with Timestamps

Part 4: My Approach

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans


 

Image metadata is a field that loves abbreviations. A good entry point is the names of the three most important standards:
  • Exif (often written EXIF) was developed by camera manufacturers. It primarily addresses low-level information about a digital picture, such as the make and model of the camera used to take it, the exposure settings, the date and time it was taken, etc. However, it has a few fields for higher-level information, such as the copyright holder and a description of what's in the picture.
  • IPTC comes from news organizations and generally aims at higher-level issues, such as photo captions, lookup keywords, copyrights, and the like. The original "legacy" IPTC standard was IIM. It's known as IPTC-IIM. That was succeeded by IPTC-Core and IPTC-Extension, but IPTC-IIM is still widely used, so in practice, there are three IPTC standards to be aware of.
  • XMP was developed by Adobe as a more general approach to metadata than Exif and IPTC. XMP can represent all Exif and IPTC metadata, plus much more. XMP groups its fields into namespaces. The XMP Exif namespace, for example, provides fields for metadata defined by Exif. A particularly important namespace is Dublin Core, which defines fields applicable to more than just images, e.g., to audio, video, and printed information. Among these fields are those for copyright and descriptive information. Trivia lovers will delight in knowing that the Dublin in Dublin Core is in Ohio, not Ireland.

IPTC-Core and IPTC-Extension are implemented using XMP technology, so it is not uncommon to lump these three standards together, even though they're different. A consequence is that IPTC-IIM is often just called IPTC. Sometimes it's simply referred to as IIM.
 
XMP became an international standard in 2012. Because it's more expressive than Exif and is the underpinning of IPTC-Core and IPTC-Extension, I think many people assumed that XMP would quickly replace Exif and IPTC for metadata storage. Life would be simpler if it had. It'd also be simpler if Exif and IPTC covered disjoint sets of information. But XMP hasn't taken over, and Exif and IPTC have significant overlap, so many important metadata fields in an image file exist in three places: one in each of the Exif, IPTC, and XMP parts of the file. 

In 2008 and 2010, the Metadata Working Group (MWG) published a set of guidelines for programs that have to deal with this mess. Alas, if the (really outstanding) work by Carl Seibert in 2017-18 accurately reflects how current programs read and write redundant metadata fields, the guidelines have hardly brought order to the chaos. Different programs treat the equivalent Exif, IPTC, and XMP fields in different ways. The MWG itself appears to have given up. Its web site (www.metadataworkinggroup.org) is no longer reachable, which is why my link to the MWG guidelines uses the Internet Archive's Wayback Machine.

All serious travelers through the image metadata wilderness eventually make their way to ExifTool. An astonishingly powerful program for metadata manipulation, its influence is such that when the name it uses for a standard field differs from the name used in the standard, the ExifTool name tends to dominate. For example, the Exif field holding the date and time when a digital image was created (e.g., the date/time when a photo is scanned) is called DateTimeDigitized. ExifTool calls that field CreateDate. Many metadata workers casually refer to Exif's DateTimeDigitized as CreateDate, in part because that's ExifTool's name for the field and in part because it's the name of the corresponding field in XMP. This can confuse the uninitiated (as I was not that long ago), because searching the Exif standard for CreateDate turns up nothing. 

Interestingly, ExifTool did not extend the use of the name "CreateDate" to IPTC (i.e., to IPTC-IIM). IPTC splits the date and time of digitization into separate fields, DigitalCreationDate and DigitalCreationTime, and ExifTool uses the IPTC names for these fields.

Notwithstanding its name, ExifTool reads and writes more than Exif metadata. It also handles metadata defined by IPTC, XMP, and a variety of lesser standards. In addition, it offers "composite" fields derived from the MWG guidelines. These fields make it possible to simultaneously write to all the fields in Exif, IPTC, and XMP that are supposed to contain the same value. As you'll see in a later post, I take advantage of this capability when putting metadata into the files for my scanned pictures.

Monday, January 10, 2022

The Scanned Image Metadata Project

This is the first in a series of posts about putting metadata into scanned picture files, including why it's desirable, how I approach it, and how well it works. The series consists of: 

Part 1: The Scanned Image Metadata Project (this post)

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps

Part 4: My Approach

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans


 

Not long ago, my wife asked if I could find a particular photograph. I dug up what turned out to be a slide from 1992. The exercise reminded me that the bulk of our photographic history exists only in non-digital form: slides, prints, and negatives. That puts it one disaster away from annihilation. A fire, a flood, a theft, and we lose everything. Not that a sudden catastrophe is necessary. Slides, negatives, and prints degrade over time. Colors shift. Details fade.

I've known for many years that I should have our pictures scanned into digital form. In 2008, I looked down that road, but I was stymied by the challenge of storing metadata. Getting images into files is easy. Capturing the metadata for the pictures--who's in them, when and where they were taken, etc.--is anything but. 

The image metadata problem is an old one. News photographers have long needed a way to electronically convey photos and associated information to their central offices. By 1991, there was a technical standard for it. Thirty-plus years later, you'd think we'd have a well-established, straightforward way to handle image metadata. You'd be wrong. As a comment at Stack Exchange Photography put it last month, "Image and video metadata is a complete hot mess."

There are two basic reasons for this. First, there are three overlapping standards for metadata storage. All are in broad use. Terminology and conventions within and among them are inconsistent and confusing. One standard's Description field is another standard's Caption-Abstract, for example, and that's sometimes referred to simply as Caption. It's different from the Title field, which is not to be confused with the UserComment field.

The second issue is that programs working with metadata layer on additional inconsistent and confusing names. It's not easy to remember that one standard's DateTimeOriginal field is called DateCreated in some programs, but DateCreated is completely different from CreateDate, which is the name some programs use for a field officially called DateTimeDigitized. Though the Title field is not the same as the Description field, File Explorer and Photo Viewer on Windows 10 sometimes show the value of the Description field with the label Title. Sometimes with the label Subject. Occasionally with both.

Mastering the name game is one challenge. Dealing with redundancy is another. Each image file typically has three description fields, for example, one per standard. Do you write the same data into all three fields, thus ensuring consistency, but risking incoherence if one of the fields is edited, or do you write to only a single field and leave the other two blank? Sorry--trick question! Many programs automatically write to all three fields, even if you edit only one. At the same time, some programs that show descriptions read from only one of the fields, so if the one they look at is empty, you won't see anything, even if other description fields have information in them. Redundancy and potential inconsistency are, sadly, the only practical choice.

Little wonder that some people throw up their hands and look for a solution not involving embedded metadata. One approach is to store the metadata separately from the image, often using the image file's name as a key to look up in a spreadsheet or text file. For me, this is a non-starter. It's too easy for the image and the metadata to get separated. Another approach is to use an image's metadata as its file name. This is clumsy even in concept ("Joe, Bob, Sue, Fred at Lincoln Beach celebrating Bob's retirement 1980-07-16.jpg"), but a bigger problem is that it doesn't address photos stored in the cloud (where file names may not be visible) and photos sent via text message (where the sender's file name is not provided). Image file metadata is a mess, to be sure, but it's still the best of a bad lot.

I want to store metadata about a scanned photo in its image file such that it will be easily accessible in any program that displays metadata. Unless expressly removed from the file, the metadata should stay with the image if it's copied, moved, emailed, texted, uploaded, or shared in the cloud. The comments written on the back of a physical photograph stay with the photo as it's moved about. Image metadata should do the same.

Achieving my goal requires figuring out the following:

  • What metadata should be stored.
  • Which metadata fields it should be stored in.
  • How to put metadata into those fields.
  • How to view metadata in an image file.
  • How to preserve metadata when an image is moved around (e.g., emailed, texted, uploaded, etc.)

In recent weeks, I've spent a lot of time wrestling with these issues. In subsequent blog posts, I'll explain what I've learned and the conclusions I've come to. Links to the full series are at the top of this post.