The View from Aristeia

Saturday, July 20, 2024

Anthropic's Claude Aces my German Grammar Checker Tests

Yesterday I published the results of my latest testing of German grammar checking systems. Unlike my first round of tests, I included LLMs, in particular ChatGPT, Gemini, and Copilot. I had a nagging feeling that I should include Anthropic's Claude, too, but I didn't find out about Claude until near the end of my testing, and I wanted to be finished, so I decided to worry about Claude later. This was a terrible decision. Later turned out to be only a few hours after I'd published the article, when I unexpectedly found myself with enough free time to play around with the system. Claude proceeded to not only outperform every other system I'd tested, it aced my set of tests with a perfect score.

My test set is hardly exhaustive, but no other system has managed to find and correct all 21 errors in the set. Kudos to Claude and Anthropic!

Here are the updated results of my testing after the addition of Claude to the list:

Claude: 84 points (100% of possible)
ChatGPT: 76 (90%)
LanguageTool: 70 (83%)
Scribbr/Quillbot: 50 (60%)
Gemini: 48 (57%)
Sapling: 41 (49%)
Copilot: 40 (48%)
Rechtschreibprüfung24/Korrekturen: 39 (46%)
Google Docs: 34 (40%)
GermanCorrector: 31 (37%)
Online-Spellcheck: 26 (31%)
Microsoft Word 2010: 16 (19%)

Friday, July 19, 2024

German Grammar Checkers Revisited

Update: I have since added Anthropic's Claude to my tested systems, and it outperformed every system I discuss below. Details here.

A couple of months ago, I blogged about how I'd (superficially) tested a number of grammar checking tools for German. A comment from jbridge introduced me to the idea of using ChatGPT as a grammar checker, and in the course of exploring that option, I expanded my set of tests to make it a little less superficial. That led to new insights, so it seems like it's time for a German grammar checking tool follow-up.

If you're not familiar with my original post, I suggest you read it.

As before, I'm testing only free tools. I generally test without signing up for or logging into any accounts, but for ChatGPT, I created a free account and logged in so I'd have access to the more powerful ChatGPT 4o rather than the no-account-required ChatGPT 3.5.

Update on the Tools I Looked At Last Time

I remarked last time that Scribbr and QuillBot are sister companies using the same underlying technology, but Scribbr found more errors. That is no longer the case. In my most recent testing, they produce identical results, so we can speak of Scribbr/QuillBot as a single system. Unfortunately, the way this uniformity was achieved was by bringing Scribbr down to QuillBot's level rather than moving QuillBot up to Scribbr's. Even so, Scribbr/QuillBot remains the second-best grammar-checking tool I tested (after LanguageTool). It continues to find errors that LanguageTool misses, so my default policy remains to use both.

My prior test set consisted of six individual sentences. This time around, I added a letter I had written. It's about 870 words long (a little under two pages), and, as I found out to my chagrin, contains a variety of grammatical errors. Checking such a letter is more representative of how grammar checkers are typically employed. Just as we normally spell-check documents instead of single sentences, grammar checkers are usually applied to paragraphs or more.

That had an immediate effect on my view of DeepL Write. I noted in my original review that it's really a text-rewrite tool rather than a grammar checker. On single sentences, you can use it to find grammatical errors, but when I gave it a block of text, it typically got rid of my mistakes by rephrasing things to the point where the word choices I'd made had been eliminated. I no longer consider it reasonable to view DeepL Write as a grammar-checking tool. It's still useful, and I still use it; I just don't use it to look for mistakes in my grammar.

Adding the letter to my set of tests also led me to remove TextGears and Duden Mentor from consideration, because they both have a 500-character input limit. My letter runs more than ten times that, about 5300 characters. Eliminating these systems is little loss, because, as I noted in my original post, GermanCorrector produces the same results as TextGears, but it doesn't have the input limit. As for Duden Mentor, it flags errors, but it doesn't offer ways to fix them. I find this irritating. Using it is like working with someone who, when you ask if they know what time it is, says "Yes."

These changes don't really matter, because when I submitted my augmented test set (i.e., sentences from last time plus the new letter) to the systems from my original post, LanguageTool and Scribbr/QuillBot so far outperformed everybody else, I can't think of a reason not to use them. I'll provide numbers later, but first we need to talk about LLMs.

ChatGPT and other LLMs

In his comment on my original blog post, jbridge pointed out that ChatGPT 4o found and fixed all the errors in my test sentences. That was better than any of the systems I'd tested. Microsoft's LLM, Copilot, produced equally unblemished results. Google's Gemini, however, found and fixed mistakes in only four of the six sentences.

I asked all the systems (both conventional and LLM) to take a look at my letter. ChatGPT found and fixed 13 of 15 errors, the best performance of the group. LanguageTool also found 13 of the 15 mistakes, but ChatGPT fixed all 13, while LanguageTool proposed correct fixes for only 11.

Given ChatGPT's stellar performance, I was excited to see what Copilot and Gemini could do. Copilot kept up with ChatGPT until it quit--which was when it hit its 2000-character output limit. This was less than halfway through the letter, so Copilot found and fixed only four of the 15 errors.

Gemini's output was truncated at about 4600 characters, which is better than Copilot, but still less than the full length of the text. Gemini was able to find and fix eight of the 15 errors. This is notably fewer than ChatGPT, and not just because Gemini quit too early. In the text that Gemini processed, it missed three mistakes that ChatGPT caught.

I had expected that the hype surrounding ChatGPT was mostly hype and that the performance of ChatGPT, Copilot, and Gemini would be more or less equivalent. If their performance on my letter is any indication, ChatGPT is currently much better than the offerings from Microsoft and Google when it comes to checking German texts for grammatical errors. It also does better than LanguageTool, the best-performing non-LLM system, as well as the combination of LanguageTool and Scribbr/QuillBot together. It's really quite impressive.

However, just because it's better doesn't mean it's preferable. Read on.

Learning from LLMs

My interest in grammar checkers is two-fold. Sure, I want to eliminate errors in my writing (e.g., Email and text messages), thus sparing the people who receive it some of the kinks in my non-native German, but I also want to learn to make fewer mistakes. Knowing what I'm doing wrong and how to fix it will help me get there. At least I hope it will.

Traditional (non-LLM) grammar checkers are good at highlighting what's wrong and their suggestion(s) on how to fix it. This is what LanguageTool looks like on the second of my single-sentence tests. The problematic text is highlighted, and when you click on it, a box pops up with a suggested fix:

The other grammar checkers work in essentially the same way.

LLM systems are different. They can do all kinds of things, so if you want them to check a text for German grammar errors, you have to tell them that. This is the prompt I used before giving them the letter to check:

In the following text, correct the grammatical errors and highlight the changes you make.

For the most part, they did as they were told, but only for the most part. Sometimes they made changes without highlighting them. When that was the case, it was difficult for me to identify what they'd changed. It's hard to learn from a system that changes things without telling you, and it's annoying to learn from one that disregards the instructions you give it. But that's not the real problem.

The real problem is that LLMs can be extraordinarily good at doing something (e.g., correcting German grammar errors) and unimaginably bad at explaining what they've done. Here's ChatGPT's explanation of how it corrected the last of my single-sentence tests. (Don't worry if you don't remember the sentence, because it doesn't matter).

"die gute Flugverbindungen" is incorrect, as "Flugverbindungen" is plural and therefore "die" must be used, but it must also be "gute" instead of "gute".

That's nonsense on its own, but it also bears no relation to the change ChatGPT made to my text. Explanations from Gemini and Copilot were often in about the same league.

Lousy explanations may be the real problem, but they're not the whole problem. My experience with LLMs is that they complement their inability to explain what they're doing with a penchant for irreproducability. The nonsensical ChatGPT explanation above is what I got one time I had it check my test sentences, but I ran the tests more than once. On one occasion, I got this:

"Ort" is masculine, so the relative pronoun should be "der" to match the previous clause and maintain grammatical consistency.

This is absolutely correct. It's also completely different from ChatGPT's earlier explanation of the same change to the same sentence.

Copilot upped the inconsistency ante by dithering over the correctness of my single-sentence tests. The first time I had it look at them, it found errors in all six sentences. When I repeated the test some time later, it decreed that one sentence was correct as is. A while after that, Copilot was back to seeing mistakes in every sentence.

Using today's LLM systems to improve your German is like working with a skilled tutor with unpredictable mood swings. When they're good, they're very, very good, but when they're bad, they're awful.

I view the instability in LLM behavior as a sign of hope. The systems are evolving, and I'm confident that, over time, they'll get better and better. ChatGPT already finds and fixes more errors in my test set than any other system (see below). Its descriptions of what it's doing and why are sometimes delusional, but I have faith that its developers will get that under control. I have similar faith that other LLM systems will also improve, offering better results and increased consistency. For now, I'm sticking with LanguageTool and Scribbr/Quillbot, but I have no doubt that LLM technology will assume an increasingly important role in language learning. (LanguageTool says it's "AI-based," so for all I know, it already uses an LLM in some way.)

Systems and Scores

Update: As noted above, I have added Anthropic's Claude to my tested systems, and it outperformed everything in the list below. Details here.

I submitted the six single-sentence tests from my first post plus the new approximately-two-page letter to the grammar checkers in my first post as well as to ChatGPT, Gemini, and Copilot. As noted above, I disregarded the results from DeepL Write, TextGears, and Duden Mentor. I did the same for Studi-Kompass, which I was again unable to coax any error-reporting out of. I scored the remaining systems as in my original post, awarding up to four points for each error in the test set. A total of 84 points was possible: 24 from the single-sentence tests and 60 from the letter. These are the results:

ChatGPT: 76 points (90% of possible)
LanguageTool: 70 (83%)
Scribbr/Quillbot: 50 (60%)
Gemini: 48 (57%)
Sapling: 41 (49%)
Copilot: 40 (48%)
Rechtschreibprüfung24/Korrekturen: 39 (46%)
Google Docs: 34 (40%)
GermanCorrector: 31 (37%)
Online-Spellcheck: 26 (31%)
Microsoft Word 2010: 16 (19%)

Wednesday, June 19, 2024

How a Bad Monitor Port Thwarted my Move to macOS

In March, I decided to move from Windows to Mac. My goal was texting on the desktop, i.e., texting via my computer. Many people insist on texting, and I was able to communicate with them only through my iPhone. I wanted to be able to do it using my computer, as I did with email and WhatsApp.

Changing operating systems is always a production, but in thirty-plus years with Windows, I had never developed an affection for it, so my only real concern was that I'd have to give up the three-monitor setup I'd used for over a decade. I connect my monitors to a docking station made by Lenovo that's compatible with my Lenovo laptop. Apple doesn't make docking stations, but an associate at the local Apple store assured me I'd have no trouble using a third-party dock. He pointed me to Plugable. Pluagable recommended their UD-ULTC4K, and I ordered one for use with the MacBook Pro M3 I purchased.

I bought a bunch of stuff at this time. I got a new keyboard with Mac-specific keys. I got a new monitor offering HDMI and DisplayPort inputs, because one of my current monitors didn't have either, and those are the video outputs on the Plugable docking station I'd ordered. I purchased new video cables to connect everything.

As we'll see, the new monitor is the villain of this story. We'll call it the ViewSonic.

Missing Monitors and Restless Windows

A few days after I started using the Mac-based system, I wrote Plugable about an intermittent problem I was having. It occurred only after a period of inactivity (POI), i.e., when the displays turned off because I wasn't interacting with the system. When I started using the Mac after a POI, one or two monitors might fail to wake up. The Mac would then shuffle the windows from the "missing" monitors to the monitor(s) it detected. Most of the time, only one monitor went blank. It was rarely the ViewSonic.

Two weeks of debugging with Plugable followed, during which logs were collected, software was updated, cables were swapped, a replacement dock was issued, and we started all over. Then I reported what I came to call restless windows:

On some occasions, after all three screens come up, some of the windows that should be on one screen have been moved to another one.

Neither Plugable nor I had theories about how this could happen, nor did we have ideas for further debugging the missing-monitor problem, which hadn't gone away. We agreed that I'd return their dock, and they'd refund my money. I was disappointed at how things worked out, but Plugable acquitted itself exemplarily throughout.

I replaced the Plugable dock with TobenONE's UDS033. It also yielded intermittent disappearing monitors after a POI. TobenONE's interest in helping me debug the problem was minimal, and it vanished entirely when they found I'd purchased the dock from Amazon instead of from them. I returned it.

The manager at the local Apple store was sympathetic about the trouble I was having. She suggested swapping out the MacBook to see if that was causing the problems, extending Apple's return period to facilitate the swap. I ordered a replacement computer matching the one I already had.

There are two basic MacBook Pro M3 models. The M3 Pro, which is what I had, natively supports up to two external monitors. To connect three, you have to use a docking station employing a technology that, from what I can tell, fools a MacBook into thinking there are two external monitors when in fact there are three. The big cheese among such technologies is DisplayLink. Both Plugable and TobenONE use it. Internet sentiment towards DisplayLink is lukewarm, but if you need to connect three monitors to an M3 Pro MacBook, it's your primary choice.

The other MacBook Pro M3 model is the M3 Max. It's more expensive than the M3 Pro, but it natively supports up to three external monitors. I'd originally purchased an M3 Pro, and the replacement I'd ordered was also an M3 Pro, but while it was in transit, I realized that by upgrading to an M3 Max, I could eliminate the need for a docking station as well as DisplayLink, thus simplify the debugging of my problem. I canceled the M3 Pro replacement before it was delivered, and I ordered an M3 Max. Apple was unfazed by this, but my credit card company was on high alert, noting that my pricey orders from Apple were unprecedented and asking for confirmation each time.

I was excited about the M3 Max. With docks and DisplayLink out of the picture, surely my missing-monitor and restless-windows problems would disappear!

Um...no. Which is not a surprise, because I've already told you that the source of my display drama was a bad port on the ViewSonic. At the time, I didn't know that.

I connected my monitors to the M3 Max and arrayed them side by side. The ViewSonic was in the middle, because it was the newest and had the spiffiest specs. Not long afterward, I had the bizarre experience of returning to my computer after a POI and seeing that the windows on my left and right monitors had swapped! Online discussions (e.g., here and here and here) showed that I was not the first to experience this. I logged several instances before calling Apple. One tech remarked, "Yeah, that happens to me, too." A second told me, "Engineering has an open issue on that." There was every reason to believe that this was a MacBook problem.

Notice that the window-swapping behavior did not involve the ViewSonic. The restless windows afflicted only the side monitors. The ViewSonic sat quietly in the middle with an innocent look on its face.

Apple told me they'd look into the problem and get back to me. In the meantime, I logged what happened after every POI. On Day 1, the windows on the left and right monitors swapped six times out of seven POIs. On Day 2, eight of ten POIs resulted in window swaps.

Then things got strange.

The Music app often jumped from the right monitor to the middle one, even though other windows on the right monitor stayed put. Sometimes the windows from the right monitor moved to the left monitor, but the windows on the left monitor remained in place. (That's half a swap.) I started taking screen shots before POIs (i.e., just before I left the computer for a while) for comparison with what I saw after a POI. I found that my restless windows sometimes did more than just jump from one monitor to another. They might take on a different size or their position on the monitor might change. Or, as in this example, both:

Screen shots before POI (above) and after (below). The ViewSonic is the middle monitor, where nothing changes.

I gave up. I'd been battling missing monitors and restless windows for three months, and there was no end in sight. The Internet showed that others had the kinds of problems I did, and they couldn't solve them. Apple support reps told me they experienced the behavior I did, and I'd received no follow-up from Apple Engineering. I returned my MacBook on the last day of its return period. I was sad to do it, because I'd been just as pleased with texting on the desktop as I'd hoped, and I'd grown fond of photos taken on my iPhone magically appearing on the MacBook. Apple's reputation for integration isn't for nothing. But convenient texting and synchronized photos weren't enough to compensate for nondeterministic window sizing and placement each time I returned to my machine. I retreated to my Windows system to lick my wounds and consider my options.

Back to Windows

I dusted off my Lenovo laptop and its docking station (literally!), hooked everything up, and booted into Windows. I futzed around and rued the loss of my texting window, then went away for a bit. When I came back, I was stunned to see that one of the windows that had been on my left monitor was now on the middle monitor! Hoping I had somehow imagined it, I moved it back where it belonged and left the machine for another POI. It had not been my imagination. The window was not where I'd left it. In addition, a window on the left monitor was now a different size!

You know those creepy scenes in movies and TV shows where the protagonist unplugs a TV or a computer monitor to make sure it's off, but it turns back on, anyway? It was like that. The Mac was gone, and I was using the same old Windows system I'd been using for years. Restless windows were impossible. And yet...

It took me a while to realize that it wasn't the same system I'd been using for years. I hadn't swapped back in the monitor the ViewSonic had replaced. That made the ViewSonic the only component common to the Mac-based system I'd been using and the Windows-based system sitting before me--the only component common to all configurations where I'd experienced windows on walkabouts.

I swapped out the ViewSonic for the monitor it had replaced. Everything worked fine. Time after time, my Windows remained where I put them. It was apparent that the ViewSonic was anything but innocent.

Ports

There are two digital input ports on the ViewSonic: HDMI and DisplayPort (DP). (The monitor also has a VGA input, which is analog. That input isn't germane to the story, but it's worth taking a moment to marvel at the longevity of VGA, which debuted in 1987 and remains important enough that monitor vendors continue to support it.) I'd been using the ViewSonic's DP input, because the video output from the Lenovo docking station is DP, and I figured it would be better to go DP-to-DP than DP-to-HDMI.

The ViewSonic was clearly responsible for my weeks of video despair, but I didn't know if the problem lay with the monitor in general or with the DP port in particular. To find out, I swapped the ViewSonic back in to the system, this time connecting to the HDMI port. A zillion POI trials convinced me that the monitor worked fine with HDMI. The problem had to stem from the DP connection.

That connection has three parts: the DP port on the docking station, the DP port on the ViewSonic, and the cable between them. I'd successfully used the DP port on the dock when testing the ViewSonic's HDMI connection, so the dock's DP output was in the clear. That meant the source of my monitor madness was either the ViewSonic's DP port or the cable leading to it. I bought a new cable and connected to the ViewSonic's DP port. My windows were restless again. Two cables with the same behavior meant the cable wasn't the problem. The guilty party had to be the ViewSonic's DP port. It was the last suspect standing.

To really clinch the case, I'd need to replace the ViewSonic with an identical monitor and verify that everything works over a DisplayPort connection with the replacement monitor. I'm working with ViewSonic on that now.

Update: ViewSonic replaced my monitor, but the new monitor behaves the same as the old one: my windows wander when I use its DisplayPort input, but they stay put when I use its HDMI input. I can't explain this. The chance of both monitors being defective is nil. ViewSonic might be handling the DisplayPort protocol in a funky way, but if that were the case, I'd expect there to be lots of reports of the problem, and I don't see that. My windows wandered with four docking stations, three computers, and two operating systems, so the only common denominator is the ViewSonic monitor. And me, now that I think about it...

To me, the big mystery is how a bad port on one of three monitors in a system can, among other things, cause the window managers in two independent operating systems to swap windows on the monitors whose ports are not bad. If you have insight into this, please share!

Mac Thoughts

The case against the ViewSonic is rock-solid, but that doesn't mean the Mac is off the hook. The Internet reports of restless windows under macOS are still there, as are the comments from Apple's support reps acknowledging the problem. It could be that the ViewSonic is defective and Macs have unreliable multiple-monitor support. Testing this would require a fourth MacBook purchase (I wonder what my credit card issuer would think of that), replacing the ViewSonic with a known-good monitor, and seeing what happens.

It'd be easy enough to do, but I'm not sure I want to. During my three months with macOS, I devoted a great deal of time to debugging missing monitors and restless windows, but I also spent many hours familiarizing myself with the operating system and working within it. Desktop texting and synced photos were great, and moving from the M3 Pro to the M3 Max was the kind of smooth experience that drives home just how sadistic Microsoft's Windows-to-Windows migration process is. (My PC is 11 years old, in no small part because moving to a new machine is so painful.) macOS is a significant upgrade to Windows in important ways, especially if you have other Apple devices, e.g., an iPhone.

However, I found day-to-day life with macOS rather uncomfortable. The menu bar's fixed position at the top of the screen often means moving the mouse a large distance to get to it. I have 24-inch monitors, and those marathon moves got old quickly. Word on the Internet is that the decision to anchor the menu bar atop the screen dates to the original 1984 Macintosh computer. That machine had a nine-inch screen and ran one application at a time, always in full-screen mode.That's nothing like the world I live in. The Mac's fixed menu bar location feels like the 40-year-old design decision it apparently is. I don't think it's stood the test of time.

Keyboard shortcuts can reduce the need to go to the menu bar, I know, but there's a steep memorization curve for them, and not everything has a keyboard shortcut. I find Windows' per-window menu bar more usable.

macOS seems focused on applications, while Windows is built around windows. Under macOS, Command-Tab cycles through applications. The Windows equivalent Alt-Tab cycles through windows. If you have multiple windows for an application, they share a single entry in the macOS Command-Tab cycle. In the Windows Alt-Tab cycle, each window get its own entry. (macOS offers a way to cycle through all windows associated with an application (Command-↑/Command-↓ after Command-Tabbing to the application), but I find it cumbersome.) The application-based focus on the Mac is so pronounced, you can hide all windows associated with an application, an operation with no Windows counterpart, as far as I know.

The Windows approach makes more sense to me. I partition my work into windows, not applications. I often have multiple independent windows open in a single application, especially browsers and spreadsheets. It's hard for me to think of situations where I'd like to close all windows associated with an application, but it's easy for me to think of situations where I'd like to close only some windows associated with an application. During my time with macOS, I tried to find scenarios where hiding made sense, but I came up empty. I ended up minimizing windows under macOS, just like I did under Windows, and I missed the ability to easily cycle through windows when I wanted to interact with one.

I was surprised to find that I often found familiar content looking rather ugly under macOS. Messages in Thunderbird looked like everything was in bold face, while Excel spreadsheet content was so small, I had to bump the magnification up to 120% to comfortably view it. It's likely that there are configuration changes I could have made to address these issues, but I really expected that an Excel spreadsheet or a Thunderbird email message I'd created under Windows would look pretty much the same when viewed in the same application (often on the same monitors) under macOS.

On Windows, I run Excel 2010. On macOS, the closest I could get was Office 365. Excel 365 on macOS lacks customization options present in Excel 2010 under Windows. The Quick Access Toolbar (QAT) on the Mac is fixed above the ribbon, for example, while in my Windows version, you can move it below the ribbon--which I do. Some commands I've got on the QAT in my Windows version of Excel--Font Name, Font Size, and Insert Symbol--can't be put in the QAT in Excel 365 on macOS. These limitations are Microsoft's fault, not Apple's, but they still chafe.

A third-party Excel plug-in I use works quite differently and less conveniently under macOS than under Windows. That's neither Microsoft's nor Apple's fault, but it further detracts from the overall Excel experience on a Mac. That matters to me, because I use Excel a lot. Given enough time and effort, I'm sure I could get used to the Mac version of Excel or I could switch to a different spreadsheet program, but between an inconveniently located menu bar, an emphasis on applications over windows, ugly window content, and restricted Excel functionality, the total cost of texting on the desktop comes to a lot more than I'd expected.

Plan B

At least that approach to texting on the desktop does. There is another way. If my going to macOS is too much trouble, it's supposed to be possible to use remote desktop software to bring macOS to me. This requires a Mac in addition to a Windows machine, but I happen to have an old MacBook Air floating around (as it were). I should be able to run a remote desktop server on the MacBook and a remote desktop client on Windows, thus giving me a way to use apps on the Mac--notably iMessage--from Windows. With suitable remote desktop support, I should be able to copy and paste from one machine to another, and the net effect should be pretty close to running Mac apps locally.

The remote desktop software most frequently mentioned for this is Google Remote Desktop (GRD). I gave it a quick try, and, well, if you accidentally make the Mac both the GRD server and client, you end up with a screen that looks like this:

Obviously, I have more work to do.

Wednesday, May 29, 2024

Five Years of no EV for Me

Five years ago this week my search for an electric compact SUV ended with me buying a conventional gas-powered car. I disliked the car (a Nissan Rogue) within a month after buying it, and I've been on the lookout for an electric replacement ever since.

Over the years, I've vented my frustrations with the EV market in a number of blog posts, sometimes focusing on their high cost and sometimes on the lack of models with the basic features I'm looking for: all-wheel drive, an openable moonroof, a 360-degree camera, and an EPA range of at least 235 miles. In November, I discussed the luxury-car-level pricing of the only two cars that meet these criteria: the Nissan Ariya and the Volvo XC40 Recharge. Since then, the only things that have changed are the name of the Volvo (now called the EX40) and the elimination of the prospect of Chinese imports pushing down EV prices. (The US government has adopted a policy of keeping them out of the market.)

In the meantime, what I'm looking for in an EV has evolved a bit. I still want all-wheel drive, an openable moonroof, and an all-around camera, but I now want a car on the shorter end of the compact SUV spectrum. My Rogue is 185 inches long. I'd prefer no more than 180 inches. (Tesla's Model Y is 187 inches. Ford's Mustang Mach-E is 186. VW's ID.4 is 181.)

My thinking about range has also changed. EV ranges can't touch those of gas-burners, so I understood that distance driving in an EV requires planning. But this didn't strike me as a problem. If you're driving, say, 400 miles to get from Point A to Point B, stopping for a half hour at 200 miles to recharge isn't a hardship. 200 miles represents 3-4 hours of driving, and who doesn't want to stop at that point to stretch one's legs, use the bathroom, grab a snack, etc?

A recent trip made me realize that not all long drives consist of extended driving sessions. My wife and I put 400 miles on a tank of gas while meandering along the southern Oregon coast. Most of our driving sessions were under an hour, because we stopped at various beaches (including, of course, Meyers Creek Beach, which is at the mouth of Myers Creek) and small coastal and inland towns (e.g., Bandon, Coquille, Port Orford, and Gold Beach). Many of the places we stopped had no facilities of any kind, much less charging stations.

According to the Oregon Clean Vehicle Rebate Program EV Charging Station Map, the charging options in Gold Beach, where we spent one night, consist of one 120V charging outlet for E-bikes and one 120V outlet for use by hotel and shop patrons on the opposite side of the river from where we were staying. A two-port charging station at a motel we weren't staying at is noted as "coming soon." The source for this information is shown as PlugShare, but the PlugShare web site shows only the "coming soon" charger, so it's possible that there aren't any charging options in Gold Beach at all.

This casts the "planning" aspect of traveling by EV in a different light. If you're off the beaten track and poking along in a gas-powered car, stopping here and there as the whim moves you, you can take refueling opportunities for granted. (There are four gas stations in Gold Beach.) In an EV, you may have to actively seek out recharging options. You really do have to plan.

I haven't yet decided what that means for me as a potential EV owner. It's a simple fact that EV ranges are notably lower than ICE ranges, and it's an equally simple fact that the EV charging infrastructure is much less well developed than the gas station network. For the foreseeable future, buying an EV means accepting those facts and finding ways to cope with them. If I really want to own an EV, I'll have to figure out how to do that.

I probably have plenty of time. The only EV that satisfies my basic criteria and fulfills my new not-longer-than-180-inches criterion is the Volvo EX40 (née XC 40 Recharge). MSRP as I'd like it equipped is nearly $62,000, which is about $20,000 more than my budget.

However, I have a gas-powered riding lawn mower that's not likely to last a lot longer. I'm already thinking of replacing it with an electric version. It could be that the form my first EV takes will be that of a machine you sit on top of and cut grass with.

Monday, May 20, 2024

German Grammar Checkers

I can speak some German. I'll never be fluent, but I can usually get by. Sadly, I make a lot of grammatical errors. It'd be nice to have a tool that could help me find and eliminate them. Syntax and grammar are structured things, seemingly tailor-made for algorithmic analysis. Surely there is software that can analyze my sentences, point out places where I've broken the rules, and tell me how to fix things!

There is. I recently tested more than a dozen programs and web sites that offer this service. The results were less impressive than I'd expected. On my (tiny and unrepresentative) set of sentences containing errors, most tools failed to find most of them. For errors that were found, it was common for the suggested fixes to be wrong.

I found these sites to offer the most useful results:

LanguageTool describes itself as an AI-based spelling, style, and grammar checker. My sense is that the focus is on spelling and grammar, not style. I've found it to do a pretty good job, though there are errors it misses.
Scribbr bills itself simply as a grammar checker. It also produces good results, though a hair below those of LanguageTool.
DeepL Write claims that its AI approach yields "perfect spelling, grammar, and punctuation" and provides alternative phrasings that "sound fluent, professional, and natural." This means it may rewrite your text to not just eliminate mistakes, but also to make it sound different (presumably better). In my experience, it does a very good job of finding and eliminating errors, but it's sometimes difficult to determine whether it changed something because it's incorrect or because it just felt like rewording it.

In daily use, I generally feed my writing to both LanguageTool and Scribbr, because they're fast, and each sometimes finds mistakes the other misses. If I'm extra-motivated, I also turn to DeepL Write. I've found it to identify mistakes the others miss. I don't use DeepL Write all the time, because I find it annoying to have to tease out whether it changed something on the grounds of correctness or stylistic whim.

In addition to these sites, I also (very cursorily) tested the following systems. I found them to produce notably inferior results. I've listed them in order of decreasing performance, based on my (really limited) tests:

QuillBot is a sister company to Scribbr that presumably uses the same underlying technology. I found that the two systems generally give identical results. There are exceptions, however, and in those cases, I found that Scribbr did a better job.
Google Docs can be configured to check spelling and grammar as you type. In my testing, it delivered mediocre results.
Sapling also produces mediocre results, but it often says "Sign in to see premium edit." I didn't do that, so I can't comment on its premium edits.
Microsoft Word, like Google Docs, can be configured to check for spelling and grammar errors as you type. On my tests using Word from Microsoft 365, its coverage was inferior to Google's.
Rechtschreibprüfung24 and Korrekturen produced the same results in my testing, so it's possible that they use the same underlying (and unimpressive) checking engine.
TextGears and GermanCorrector also produced the same results on my tests, so it's possible that they share a checking engine. The results are similar enough to those from Rechtschreibprüfung24 and Korrekturen that it's conceivable that all four use the same underlying technology. In addition, OnlineKorrektor.de looks and acts identically to GermanCorrector, so it could be that there are two URLs for a single underlying checker.
Duden Mentor is the only system I tested that flags the errors it finds, but doesn't offer suggestions on how to fix them.
Online-Spellcheck couples its poor ability to find mistakes with a checking speed that is notably worse than its competitors. In addition, it replaces its input window with an output window, so you can't just paste new text in to check something different.
Studi-Kompass found none of the errors in my tests. That suggests that it wasn't working or that I was doing something wrong.

I must reiterate that my testing was very limited, so my conclusions are tenuous. If you know of more comprehensive comparisons of German grammar checkers, please share what you know in the comments!

My testing focused on incorrect articles, because that's a problem area for me. I used the following test sentences, where I've boldfaced the part of each sentence that's wrong. I realize that if you know German, you will recognize what's wrong without my help, and if you don't know German, you'll just see randomly boldfaced text, but I can't resist the Siren's call of the boldface error indicator.

Das Tisch sieht gut aus.
Ich gehe im Küche.
Ich bin in die Küche.
Ich will einen Ort finden, die schön aussieht.
Beim Check-in haben wir die Größe des Lobbys bewundert.
Schließlich habe ich mich entschlossen, dass ich einen Ort finden musste, der zwischen Singapur und den USA liegt (d.h., der auf dem Heimweg ist), und die gute Flugverbindungen hat.

I invented sentences 1-3 as representing common simple errors. Sentences 4-6 are from or are variations on things I've actually written.

I scored the systems' results as follows:

2 points if the error was found.
2 more points if only one fix was suggested and it was correct; 1 more point if more than one fix was suggested, but the correct one was among them.
-1 point if only incorrect fixes were suggested.
-1 point if rewrites were suggested beyond what was in error. (This is designed to penalize DeepL Write for mixing error corrections and stylistic rewrites.)

If a system found the error in a test sentence and suggested the proper fix (and it didn't suggest anything else), it got the full four points. If it found the error, but it didn't suggest the proper fix, or if it muddied the water with rewrites unrelated to the error, it got between one and three points, depending on the details of what it did.

A perfect score for the set of six sentences would be 24 points. The best any system did was LanguageTool, which got 21. Scribbr was close behind at 20 points. DeepL Write got 19. Then there was a gap until QuillBot's 16 points. Google Docs scored 14, Sapling 13, and Microsoft Word 10. Rechtschreibprüfung24, Korrekturen, TextGears, and GermanCorrector/OnlineKorrektor.de clumped together with 6 points, which is one reason I suspect they may all be using the same checking technology. Duden Mentor also got 6, but its behavior is quite different from the other systems with that score. Online-Spellcheck got 5 points. Studi-Kompass got none, but, as I noted above, my guess is that either the system wasn't working or I was doing something wrong.

Tuesday, February 20, 2024

Tracking Travel

I like to travel. I've been a few places. I have a map on the wall with pins where I've been. Old School, but I started it before the Internet existed. I'd like to move it into the digital era. Looking into that led me to Most Traveled People (MTP) and NomadMania (NM). Both offer the ability to generate maps of places visited based on data you enter. I tried both. The NM data entry process was so slow and cumbersome, I gave up. MTP worked better. The map it produced showing the countries I'd been to makes me look pretty well traveled:

This is terribly misleading. Country-level granularity means that if you visit only a single place in a big country (e.g., the USA, Canada, Russia), the map makes it look like you visited the whole thing. Recognizing this, both MTP and NM break the world into much smaller regions, 1500 in the case of MTP and 1301 in the case of NM. My MTP region map is not just less impressive, it's frankly a little depressing for somebody who feels like he's been around:

The region-based approach is better than one based on countries, but as I was entering the data for my travels in the United States, I found that MTP treats a few states as multiple regions. California, for example, comprises four regions, and Texas three. (NM does the same thing.) Like most states, Oregon--my state--is a single region, and my state pride was wounded at the idea that Colorado is broken into east and west, and Georgia into north and south, yet all of Oregon is thrown into a single basket. Eastern and western Oregon differ greatly in terms of geography, climate, economics, politics, and culture. Having been to one of them doesn't mean you've been to the other in any meaningful way.

Breaking the world into regions and tracking who's been where is good for ranking people in terms of how geographically widespread their travels have been. Such rankings are the bread and butter of MTP and NM. I was surprised to find that I'm a comparative couch potato. Kayak tells me that since I started using it in 2011, I've traveled nearly 500 days, flown over a half million miles, and been in 17 time zones, yet those trips plus my pre-Kayak travels let me lay claim to barely 10% of MTP's 1500 regions. With the paltry 43 countries I've visited, I'm not even half way to qualifying for the Travelers' Century Club. From the perspective of competitive travel, I might as well not even have a passport.

Fortunately, I'm not out to engage in big-league travel competition. I just want a digital approach to tracking where I've been. For that purpose, I'm thinking a custom Google Map with digital push-pins is the way to go. It's basically the same thing I've got on my wall now, except in digital form.

Tuesday, November 28, 2023

Pumpkin Pie Cutters

Some years ago, I got it into my head that just as there are cookie cutters for cookies, there should be pie cutters for pumpkin pie. I bought the deepest tree-shaped cookie cutters I could find, thinking I could stack them and produce festive pie pieces for a holiday party. It didn't go as planned. I couldn't get the stacking to work, and the result of using just one cutter was kind of a disaster:

Nevertheless, proof of concept!

I found that the KindredDesignsCA shop at Etsy offered custom-made 3D-printed cookie cutters. They agreed to make extra-deep cutters for me, one in the shape of a tree, another in the shape of a snowflake. The cutters worked great, except that once I'd extracted a piece of pumpkin pie with a cutter, the pie stuck inside the cutter. I had KindredDesignsCA make plungers so that I could push the pie out of the cutters:

I was so pleased with the result (shown at the top of this post), I started looking for new pie-cutter-shape ideas. For reasons not worth going into, I hit upon the idea of US states, and the next thing I knew, I was looking at pieces of pumpkin pie that looked like California, Texas, and Minnesota:

This year I decided that for Thanksgiving dinner, it would be nice to have pieces of pumpkin pie that looked like turkeys and pumpkins, so KindredDesignsCA again did their 3D magic for me:

I learned a few new things from these latest cutters. One was that it's a bad idea to try to get too detailed. Check out the well-defined beak in the turkey cutter below...

...and compare it to the poorly-defined or broken-off beaks in the pieces of turkey-shaped pie above. We're sculpting with pumpkin pie here, so just because you can produce a cutter with well-defined details doesn't mean you can get those details to be retained in the pieces of pie you cut.

On the other hand, I was worried about the narrow strips of pie for the turkey's legs holding together, and they came out fine.

So far, I've employed these cutters only for pumpkin pie, but a friend and I were musing about what else they could be used for. Ideas include sponge cake, gingerbread, pancakes, hamburger patties, ice cream sandwiches, and gelatin. Plus cookies, of course. In the end, they're just overly-deep cookie cutters with plungers.

As far as I know, the only drawback to these cutters is that they require hand washing. The material used for the 3D printing has a comparatively low melting point, so if they were to be put into a dishwasher, you'd likely end up with pie cutter goo all over everything.

Saturday, November 4, 2023

Electric Cars are Still Luxury Goods :-(

More than three years ago, I blogged about EVs being luxury goods. Some 13 months ago, I showed that the Nissan Ariya--the only electric compact SUV with the basic features I demand (all-wheel drive, an openable moonroof, a 360-degree camera, and an EPA range of at least 235 miles)--came with a 52% price premium vis-a-vis a comparable gas-powered Nissan Rogue. That difference put the Ariya solidly in luxury car territory.

In the intervening year, the Ariya has gone from forthcoming to present on dealer lots, and just yesterday Volvo made it possible to configure and price the 2024 XC40 Recharge for the US market. It joins the Ariya in offering the fundamental features I insist on. The XC40 comes in both battery- and gasoline-powered versions, so it makes it easy to measure the cost of going electric.

The intervening year has also seen a big jump in interest rates:

Average 60-month new car loan rate (per https://bit.ly/3QUl7ER)

The concomitant reduction in demand for new cars has changed the market. I decided to recheck the EV price premium by again comparing the cost of the Nissan Rogue with the equivalently-equipped Nissan Ariya. This time I checked not just MSRPs, but also prices at cars.com. I did two searches at cars.com. The first was nationwide, i.e., for the best price I could find anywhere. The second was "near me," which means within about 100 miles of Portland, Oregon. I then repeated the experiment for the Volvo XC40 (gas-powered) and the Volvo XC40 Recharge (batteries). For the Nissans, my data are for the 2023 model year, because the 2024s aren't out yet. The Volvo data are for the 2024 model year.

This is what I found:

The Ariya continues to have an MSRP about 50% higher than the equivalently-equipped Rogue, and this doesn't change when looking for real cars within 100 miles of me. If I expand my search to the entire country, the price premium drops to 41%, but it still represents a difference of nearly $14,000. It's also an artificial cost differential, because the lowest-priced Rogue is in Arizona, while the cheapest Ariya is in Illinois.

Volvo is a premium brand, so MSRP pricing for its its Rogue equivalent, the XC40, starts 26% higher than the Nissan. Going electric from there (to the XC40 Recharge) demands a relatively modest 26% premium, but the result is 58% above the MSRP for the Rogue. Within the Volvo line, the premium to go electric is only 26%, but the price increase I care about--from an ICE-powered compact SUV of any make to a similarly-equipped EV of any make--is nearly 60%. That's far above the 25% I consider acceptable.

Lest you think I'm not taking government tax credits and rebates into account in pricing the Ariya and the XC40 Recharge, I actually am. Neither qualifies for the federal $7500 tax credit (which is fictional for most people, anyway), and my state's program for EV rebates stopped accepting applications months ago, because it ran out of money.

To me, the most interesting aspect of the pricing data is the smallness of the differential between the Ariya and the XC40 Recharge. Here's the table above with a line added showing the premium you pay for choosing Volvo over Nissan (i.e., the XC40 Recharge over the Ariya):

Regardless of whether you look at MSRPs or prices at cars.com, the Volvo costs no more than 8% more than the Nissan. I've never been able to figure out what makes premium brands premium, but if Volvo has it and Nissan doesn't, I'd expect that to motivate many buyers to choose the XC40 Recharge over the Ariya.

As for me, I'll continue to bide my time and hope that the EV industry eventually comes out with a compact SUV with the features I want at a price that's no more than about 25% beyond the cost of a comparable ICE vehicle.

Monday, August 21, 2023

Nav App Audio Conflict Resolution

You're tootling down the road using a navigation app connected to your car's infotainment system, and you're approaching a turn in the route. You've enabled voice navigation, so the app should use your car's audio to tell you what to do. But what if the audio system is already in use? What should happen if your nav app and another audio source want to use your car's audio system at the same time?

Different nav apps solve this UX design question in different ways. Some resolutions seem obvious. Some are clever. Some are so bad, it's hard to believe they made it into production.

The stakes can be high. If two apps want to use the audio system at the same time, typically only one will succeed. If it's the nav app, you might miss an AMBER Alert on the radio while you're next to a car matching the description of one with a kidnapped child. If it's not the nav app, you might miss a freeway exit and have to drive many miles before the next one.

I compared the resolution of audio conflicts for Google Maps and Apple Maps under iOS 16.6 CarPlay on a 2019 Nissan Rogue. I don't know whether what I found is representative of other nav apps, other cars, or other phone-car interface systems (e.g., Android Auto).

The scenarios I checked were what happens when your nav app wants to speak and:

A streaming app is playing.
The car radio is playing.
A phone call is in progress.
You're talking to your phone.
Your phone is talking to you, e.g., Siri is responding to a query or command.

This is what I found.

Nav App vs. Streaming App

The streaming apps I tested (Pandora and Simple Radio) can be paused, which is characteristic of every streaming app I'm familiar with. It seems obvious to me that the proper behavior when a nav app wants to talk is to pause the streaming app, have the nav app talk, then resume the streaming app. Google sort of agrees, because that's what Google Maps does when the streaming app is Pandora. However, when the streaming app is Simple Radio, Google mutes it instead of pausing it. I don't know the reason for this difference.

Apple Maps has behavior that's not just different from Google's, it's incomprehensibly bad. When Apple Maps wants to talk while a streaming app is playing, it just starts talking. The streaming app continues to stream. If what's being streamed is spoken audio, you have two voices talking at the same time! Neither can be understood. It's hard to imagine a worse approach than this.

Apple would probably argue that I'm mischaracterizing what it does. Apple Maps employs audio ducking, whereby the volume of the streaming app is reduced when Apple Maps speaks. In concept, that's not unreasonable, but in my experiments, the ducking effect kicks in only when the volume of the streaming app crosses a loudness threshold. This threshold is far above my usual listening level. I had to go out of my way to elicit the ducking effect. When I did, I found that the volume of the ducked audio was still high enough to compete with Apple Maps' spoken directives.

To summarize:

Google Maps: Pause stream, speak, resume stream.
Apple Maps: Speak while stream plays. At high stream volumes, duck stream while speaking.

I think Apple's approach shows promise, but its implementation needs considerable refinement. For me, both the threshold and ducked volumes are too high. I wish I could configure them. With that said, comments on the Internet make clear that many people listening to spoken audio (e.g., podcasts) would prefer pausing over ducking, anyway.

Nav App vs. Car Radio

Radio is a different kind of streaming beast, because it's not pausable. That requires that Google change its approach to audio conflicts. Not so for Apple, which sticks to its guns and issues navigation instructions on top of whatever's playing on the radio. If you happen to be listening to a talk show or the play-by-play of a sporting event, you've again got dueling voices, and there's a good chance you'll understand neither. I'm generally a fan of UX consistency, but Apple's here is of Emerson's foolish variety.

Google's approach--to mute the radio while Google Maps speaks--is a lot more reasonable. There's a tiny chance you'll miss something important (e.g., an AMBER Alert), but it's more likely you'll just miss a snippet of a song, commercial, or host's blather.

We thus have:

Google Maps: Mute radio, speak, un-mute radio.
Apple Maps: Speak while radio plays. At high radio volumes, duck radio while speaking.

Given their current implementations, Google Maps' behavior is vastly preferable to Apple Maps', but I think a blended approach would be an improvement to both. A choice between muting and configurable ducking would be very attractive.

Nav App vs. Phone Call

While you're on the phone. Google is polite enough not to wrest control of your car's speakers from the person you're talking to, but it has no Plan B. If you don't notice the visual navigational directive on the CarPlay screen, you're out of luck: Google Maps remains silent if you're in a phone call.

Apple doesn't need a Plan B, because its Plan A is so good. When Apple Maps wants to speak, but you're on the phone, it issues a chime--a subtle indication that it'd like to tell you something, but it can't. It's your cue to look at the CarPlay screen to see what you need to do. I find it works well.

So:

Google Maps: Don't speak or make any other sound.
Apple Maps: Don't speak, but issue a chime.

What I admire about Apple's approach is that it takes advantage of the fact that you have a screen; CarPlay requires it. A chime is enough to tell you to look at it, but it's not so intrusive that it interrupts the flow of a conversation. Well done, Apple!

Nav App vs. Dictation

If a nav app wants to speak while you're talking to your phone (e.g., issuing a command to Siri or dictating a text message), the audio conflict is between you and your nav app. Google Maps bursts in like an excited child who, unable to restrain itself, says its piece without regard for the fact that you're already talking. It's rude, not to mention a sure-fire mechanism for derailing your train of thought.

Google Maps' interruption also aborts whatever you're dictating, e.g., the command you're issuing or the text you're dictating. This means you have to start over after the excited child has said its piece (and hope that it produces no further audio interruptions before you're finished).

Apple Maps' chime strategy would work well here, but Apple inexplicably adopts Google's speak-no-evil policy and remains silent. If you're dictating and Apple Maps wants to tell you something, the only way you'll know is by looking at the CarPlay screen. It's a bitter pill to swallow after the cleverness of Apple's phone call conflict resolution.

This is a case where both nav apps offer disappointing and frustrating behavior:

Google Maps: Interrupt and abort dictation, speak.
Apple Maps: Don't speak or make any other sound.

Nav App vs. Phone Voice Response

If your phone is talking to you when Apple Maps wants to speak, Apple employs the chime approach again. It works as well here as it did in the phone-call scenario, and it really raises the question of why Apple didn't apply it in the dictation case. You hear the chime, you know to look at your CarPlay screen, but you can continue to listen to your phone. It just works.

Google Maps treats your phone like a radio station. It mutes your phone's voice while it issues navigational instructions, then it turns your phone's voice on again. This is a curious decision. It means you could miss important information, such as a crucial part of an email message. I'd expect a nav app to treat a phone's voice like a streaming app and pause it while the nav app speaks. Perhaps there's no CarPlay API to pause Siri output...?

Recap:

Google Maps: Mute phone voice, speak, un-mute phone voice.
Apple Maps: Don't speak, but issue a chime.

If you have additional information on how nav apps handle audio conflicts, please let me know in the comments below.

Tuesday, July 25, 2023

Image Metadata: Thoughts after 4000+ Scans

This is part 7 of my series on metadata for scanned pictures.

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps

Part 4: My Approach

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem

Part 7: Thoughts after 4000+ Scans (this post)

In January 2022, I explained in parts 1-4 of this series why and how I store metadata in image files containing scans of, e.g., 35mm slides. I failed to mention that the strategy I came up with was largely based on merely thinking about the problem. At the time, I had made relatively few scans. However, I had contracted with Chris Harmon at Scan-Slides to scan several thousand old slides, so I had to specify how the metadata was to be handled. The approach I developed was what I directed Scan-Slides to do.

In the 18 months since then, I've received the image files from Chris, worked with the metadata, and added scans of a few other types of objects (e.g., drawings, notes, and letters). My perspective on image file metadata is now based not just on thinking, but also on experience. It's a good time to reflect on how well my metadata strategy has held up.

Metadata Entry

One thing has held up very well: the decision to have Scan-Slides do the initial metadata entry. It was clear from the get-go that this was going to be demanding work. It required deciphering hand-written descriptive information on slide trays and slide frames and, for each scan, entering the information into the appropriate metadata fields in the proper format. Several slide sets were from overseas trips, so many descriptions refer to locations in or people from places that look, well, foreign, e.g., "Church at Ste Mere Eglise" and "Sabine & Sylvie - Fahrradreise in Langeland."

Copying the "when developed" date off slide frames proved unexpectedly challenging. Although many dates were clearly printed, some were debossed rather than printed, and those were harder to read. Some timestamps were mis-aligned with the slide frames they were printed on, leading to dates that were partially missing, e.g., the lower half of the text or its left or right side was not present. Some frames had no development date on them, but it was hard to distinguish that from faint timestamp ink or debossed dates that were only slightly indented.

Chris handled description and development date issues with patience and equanimity. His accuracy was impressive. There were some errors, but that's to be expected when you're copying descriptions from thousands of slides involving places, people, and languages you don't know. I don't believe anyone would have done better.

It's something of a miracle that he was willing to do the metadata entry at all. Of the half-dozen slide scanning services I contacted, he was the only one who didn't reject it out of hand. There was an extra fee for the work, of course, but it was money well spent. It allowed me to devote my energies to aspects of the project that only I could do, e.g., correcting metadata errors and organizing image sets.

Four Metadata Fields

Using only four metadata fields (Description, When Taken, When Scanned, and Copyright) has proven sufficient for my needs. Populating those fields hasn't been burdensome. I feel like I got this part of metadata storage right.

The "When Taken" Problem

Using the Description metadata fields to store definitive information about when a picture was taken and storing an approximation of that in the "when taken" metadata fields has worked acceptably, but this remains a thorny issue. I still think it's prudent to follow convention and assign unknown timestamp components the earliest permissible value, but I continue to find it counterintuitive that images with less precise timestamps chronologically precede images with more accurate information.

The vaguer the "when taken" data, the less satisfying the "use the earliest permissible timestamp values" approach. Google Photos--one of the most important programs I use with digital images--insists on sorting photo libraries chronologically, so there's no escaping the tyranny of "when taken" timestamps. (Albums in Google Photos have sorting options other than chronological, but for a complete photo library, sort-by-date is the only choice.) For images lacking "when taken" metadata, Google Photos uses the file creation date. This is typically off by decades for scans of my slides and photos, so I've found that omitting "when taken" metadata is worse than putting in even a poor approximation.

Overall, my "put the truth in the Description fields (where you know programs will ignore it) and an approximation in the 'when taken' fields (where you know programs will treat it as gospel)" approach is far from satisfying, but I don't know of a better one. If you do, please let me know.

Making Image File Metadata Standalone

I originally believed I was storing all image metadata in image files, but I was mistaken. Inadvertently, I stored some of the metadata in the file system. Scans of the slides from my 1976 trip to Iceland, for example, are in a directory named Iceland 1976, and the files are named Iceland 1976 001.jpg through Iceland 1976 146.jpg. The metadata in the image files indicate what's in the images and when the slides were taken, but they don't indicate that they were taken in Iceland or that they were from a single trip. That information is present only in the image file names and the fact that they share a common directory.

Going forward, I plan to include such overarching information about picture sets in each of the scans in the set. That will make the Description metadata in each image file longer and more cumbersome, but it will also make each image file self-contained. As things stand now, if a copy of Iceland 1976 095.jpg (shown) somehow got renamed and removed from the other files in the set (e.g., because it was sent to someone using iMessage), there would be no way to extract from the file that this picture was taken in Iceland and was part of the set of slides I took during that trip. My updated approach will rectify that.

Putting All Metadata into Image Files

The slides I had scanned were up to 70 years old. Over the decades, slides got moved from one collection to another. Some got mis-labeled or mis-filed. As I was reviewing the scans, I often found myself wondering whether a picture I was looking at was related to other pictures I'd had scanned. My parents may have made multiple trips to Crater Lake or Yosemite National Parks in the 1950s and 1960s, for example, but I'm not sure, because there's not enough descriptive information on the slides (or the boxes they were in) to know.

More than once I wished I could find a way to reconstitute the set of slides that came from a particular roll of film. I think this is often possible. In addition to hints in the images themselves (e.g., what people are wearing, what's in the background, etc.), slide frames are made of different materials and often include slide numbers, the film type, and the name of the processing lab. Development date information, if present, is either printed or debossed, and, when printed, the ink has a particular color (typically black or red). All these things can be used to determine whether two potentially related slides are likely to have come from the same roll of film.

I briefly went down the road of creating a spreadsheet summarizing these factors for various sets of slides, but it's a gargantuan job. It didn't take long for me to stop, slap myself, and reiterate that I was working on family photos, not cataloging historic imagery for scholarly research. Nevertheless, I think it would be nice to have more metadata about slide frames in the image files. Whether Chris (or anybody else) would agree to enter such information as part of the scanning process, I don't know.

Dealing with Google Photos

Words cannot describe how useful I find Google Photos' search capability. Despite the effort I've invested adding descriptive metadata to scanned image files and the time I've taken to help Google Photos (GP) identify faces of family and friends, it's not uncommon for me to search for images with visual characteristics that are neither faces nor described in the metadata, e.g., "green house" or "wall sconce". Such searches turn up what I'm looking for remarkably often. The alternative--browsing my library's tens of thousands of photos--is impractical. This makes Google Photos indispensable. That has some implications.

When Scan-Slides started delivering image files, I plopped them into a directory on my Windows machine where GP automatically uploads new pictures. I then went about the task of reviewing and, in some cases, revising the metadata in the scans. In many cases, this involved adjusting the "when taken" metadata, either because the information I'd given Scan-Slides was incorrect (e.g., mis-labeled slide trays) or because Scan-Slides had made an error when entering the data. I also revised Description information to make it more comprehensive (e.g., adding names of people who hadn't been mentioned on the slide frames) or to impose consistency in wordings, etc. The work was iterative, and I often used batch tools to edit many files at once.

Unbeknownst to me, GP uploaded each revised version of each image file I saved. And why not? Two image files differing only in metadata are different image files! By the time I realized what was happening, GP had dutifully uploaded as many as a half dozen versions of my scans. I wanted only the most recent version in each set of replicates, but Google offers virtually no tools for identifying identical images with different metadata. The fact that I'd often changed the "when taken" metadata during my revisions and that GP always sorts photo libraries chronologically meant that different versions of the same image were often nowhere near one another.

The lesson, to borrow a term from Buffy the Vampire Slayer, is that iteratively revising image file metadata and having Google Photos automatically upload new image files are un-mixy things.

I told GP to stop monitoring the directory where I put my scans, spent great chunks of time eradicating the thousands of scan files GP had uploaded, and resolved to manually upload my scans only after I was sure the metadata they contained was stable. A side-effect was that I could no longer rely on GP acting as an automatic online backup mechanism for my image files, but since I have a separate cloud-based backup system in place, that didn't concern me.

Multiple-Sided Objects

Scannable objects have two sides: the front and the back. For photographs, it often makes sense to scan both sides, because information about a photo is commonly written on the back. (If scans of slides included not just the film, but also the slide frames, it would make sense to scan both sides, thus providing a digital record of slide numbers, film types, and processing labs, etc., that I mentioned above.)

Scanning both sides of a two-sided object (e.g., the front and back of a photograph) yields two image files. That's a problem. Separate image files can get, well, separated, in which case you could find yourself with a scan of only one side of an object. Preventing this requires finding a way to merge multiple images together.

I say multiple images, because there might be more than two. Consider a letter consisting of two pieces of paper, each of which has writing on both sides. A scan of the complete letter requires four images: one for each side of each piece of paper. If the letter has an envelope, and if both sides of it are also scanned, the single letter object (i.e., the letter plus its envelope) would yield six different images.

Some file formats support storing more than one "page" (i.e., image) in a file. TIFF is probably the most common. Unfortunately, TIFFs, even when compressed, are much larger than the corresponding JPGs--about three times larger, in my experiments. More importantly, TIFF isn't among the file formats supported by Google Photos. When multi-page TIFFs are uploaded to GP, GP displays only the first page. For me, that's a deal breaker.

It's natural to consider PDF, but PDF isn't an image file format, so it doesn't offer image file metadata fields. In addition, PDF isn't supported by GP. Attempts to upload PDFs to Google Photos fail.

My approach to the multiple-images-in-one-file problem is to address only double-sided objects (e.g., photographs, postcards, etc.). I handle them by digitally putting the front and back scans side by side and saving the result as a new image file (as at right). A command-line program, ImageMagick, makes this easy. The metadata for the new file is copied from the first of the files that are appended, i.e., from the image for the front of the photograph.

I haven't yet had to deal with objects that require more than two scans, e.g., the hypothetical four-page letter and its envelope. My current thought is that the proper treatment for those is to ignore Google Photos and just use PDF. I'm guessing that most such objects will be largely textual (as would typically be the case for letters), in which case OCR and text searches will be more important than image file metadata and image searches.