Saturday, July 5, 2025

Convertible EV Spot Check

It's been more than two years since I last looked into the availability of all-electric convertibles, so I thought I'd do a quick check to see where things stand now. Here's a rundown (in alphabetical order) on the models I mentioned in that post, plus a couple of others I've added since then:

All in all, a pretty gloomy picture for those of us who yearn for an EV analogue to the Miata. Sigh.

Tuesday, May 27, 2025

The Dismal Failure of LLMs as EV Search Aids

For six years, I've been in the market for an electric compact SUV. I haven't yet found one with the features I want at a price I'm willing to pay. (As a rule, missing features have been a bigger impediment than price.) My last review took place about a year ago, so I decided it was time to look again.

This time I experimented with seven LLM-based chatbots as search assistants. I gave each the following prompt:

List all the fully electric compact SUVs for sale in the United States that have all-wheel drive, an openable moonroof or sunroof, an all-around (i.e., 360-degree) camera, an EPA range of at least 250 miles, and are no more than 180 inches in length.

My assistants were the unpaid versions of these systems (listed in the order in which I happened to test them):

The results were eye-opening. None of the systems listed the only vehicle that fulfills the criteria (the Volvo EX40), and all but one listed vehicles that violate the requirements. Worse performance is hard to imagine. The false positives waste your time pursuing dead ends, while the false negatives imply that no qualified EVs exist, even though one does.

Complete failure was averted by one system (You.com) mentioning, almost as an afterthought, the Volvo XC40 Recharge. That car was renamed the EX40 last year, but searching for the old name will quickly lead you to the new name, and that will finally put you on the trail of the only car that satisfies my criteria.

[Update 5/28/25: Per the comments on this post, paid versions of at least ChatGPT and Claude produce much better results than the ones I experienced. I've added links to the conversations I had with the various unpaid chatbots in discussions below.] 

The chatbots failed in a variety of ways (the links are to my conversations with the chatbots):

  • ChatGPT said "here are the models that meet all requirements," then listed five EVs and their specs. For four of the five, the displayed specs were contrary to the requirements, meaning ChatGPT "knew" (to the extent that LLMs "know" things) that these cars shouldn't have been listed. For the fifth car, one of the specs it listed was simply incorrect.There was no mention of the Volvo EX40.
  • Perplexity's behavior was similar to ChatGPT's: it claimed to list cars fulfilling the criteria, then "knowingly" listed ones that don't. The twist was that two of the three EVs Perplexity listed--the Rivian R2 and the Toyota C-HR EV--don't exist yet. (If they did, the Rivian would exceed the length constraint, and it looks like the Toyota would likely fail the openable-roof test.) Perplexity made no mention of the Volvo EX40.
  • Claude listed only one car, Tesla's Model Y, saying it "clearly meets all your requirements. It offers ... a panoramic glass roof ... and measures approximately 187 inches in length. However, this exceeds your 180-inch length requirement." Its dithering on the car's length was disappointing, and its failure to distinguish a panoramic glass roof from one that opens was worse. There was no mention of the Volvo EX40.
  • Gemini's response began with "Here's a breakdown of current and upcoming electric compact SUVs and how they stack up against your requirements," which was not what I had asked for. It listed five cars and their specs, noting for each vehicle the specs that violated my criteria. It ultimately concluded, "there may not be any currently available fully electric compact SUVs that precisely meet all conditions." Except there is, of course.
  • Copilot produced a refreshingly short response featuring a very nicely formatted table that summarized the two EVs it said satisfied my requirements. Neither does, though Copilot showed no specs that reflect that, so it may not have "known" it was wrong. There was no mention of the Volvo EX40.
  • You.com started with this rather confusing statement: "Based on the search results provided, there is no direct information listing fully electric compact SUVs in the United States that meet [your] criteria." It then launched into an explanation of how to perform my own search (<eyeroll/>). Then came a surprise. It introduced the Volvo XC40 Recharge and showed how it satisfied my requirements, though it seemed unsure of itself: "The Volvo XC40 Recharge appears to meet all your criteria. However, I recommend verifying [everything]." Ultimately, You.com found the rabbit in the hat and pulled it out, but its response was confusing and disjointed, and it referred to the rabbit by an obsolete name. 
  • Mistral followed Copilot's lead in producing a short, clear response built around a well-formatted table of information that was often incorrect or inconsistent with my requirements.There was no mention of the Volvo EX40.

As a group, the systems produced responses rife with claims that were incorrect, inconsistent, and/or incomplete. The last of these is the most disturbing. Six of the seven systems didn't mention the only SUV fulfilling the requirements. The one that did hid it after an explanation of how to do your own search, and even then it referred to that car by an outdated name.

There is a lot of work to be done before LLM chatbots are reliable search assistants.

Monday, May 26, 2025

Three Experiences with Video and AI

Finding an Old TV Episode

I recently found myself wondering about a TV episode I saw decades ago. I had only the haziest memory of it, so I threw this at Gemini:

I'm looking for an episode from the original TV show The Outer Limits or The Twilight Zone. The story is about a man with a robotic hand that he has to add fingers to in order to increase its ability to help him figure out what is happening. Do you know this episode? 

Gemini did, correctly identifying it as "Demon with a Glass Hand" from the 1960s series, The Outer Limits. Googling for that yielded a link to the episode at The Internet Archive, which I downloaded and added to my Plex server. 

Less than 15 minutes elapsed between the time I thought about the episode and the time I had it in my video library. It's not the best television content in the world, but I marvel at how easily I was able to track down and watch a show from 60 years ago based on only a very sketchy memory.

Upscaling the Episode

"Demon with a Glass Hand" isn't terribly compelling, but that doesn't mean it shouldn't look good. Unfortunately, 1963 TV was SD, and these days we're used to a lot better resolution than the 496 x 368 I got from the Internet Archive. 

Earlier this year, I purchased a copy of VideoProc Converter AI to experiment with upscaling low-resolution 8mm family videos I'd had digitized. The results were impressive on everything except faces, which the upscaling process tended to turn into grotesque caricatures of the people behind them. But hope springs eternal, so I decided to see what VideoProc could do with "Demon with a Glass Hand." 

Invoking the program yielded a message excitedly telling me that a new version was, you know, faster and better, and I should upgrade immediately. It was free, so I did, but I didn't expect that V3 would be noticeably better than V2. When was the last time a program upgrade lived up to its PR?

In this case, I think it does. Check it out:

Upscaling is an interesting challenge, because it involves fabricating information (pixels) not present in the original images. Simple interpolation doesn't do a very good job, and VideoProc's V2 AI-based approach fell apart on faces. V3's faces aren't perfect, but I think they're good enough for casual viewing, and that's an impressive accomplishment.

Looking Forward

A few days ago, Andrei Alexandrescu brought my attention to this reddit post featuring a synthesized video by Ari Kuschnir using Google's Veo. The clip takes advantage of Veo's new ability to generate audio tracks, including dialogue and singing. I find the clip pretty amazing. There are legitimate questions about how Veo was trained and how its output could be used for ill, but I prefer to focus on the technical progress it represents and the creative promise it offers. 

Incongruously, I was reminded of the Veo demo after viewing another old TV episode I barely remembered, one Gemini identified from this prompt:

I'm now thinking of a different episode, again from The Twilight Zone or The Outer Limits. It involves a man who goes to a store to custom-order a woman. He chooses eye color, etc. Any idea which episode this is?

Again Gemini knew what I was looking for ("I Sing the Body Electric" from the original The Twilight Zone), Google found a downloadable link to it, and I had it on my Plex only a few minutes after issuing the query. 

The episode is quite terrible (much worse than "Demon with a Glass Hand"), but I liked the hopeful ending. Not the part summarizing grandma's data collection and sharing policy ("Everything you ever said or did, everything you ever laughed or cried about, I'll share with the other machines"), but the optimistic sentiment behind Rod Serling's closing voiceover:

Who's to say at some distant moment, there might be an assembly line producing a gentle product in the form of a grandmother, whose stock in trade is love?

For countries with an aging population requiring increasingly attentive personal care, I'd expect that gentle, loving robots rolling off an assembly line could be a pretty attractive prospect.

Saturday, July 20, 2024

Anthropic's Claude Aces my German Grammar Checker Tests

Yesterday I published the results of my latest testing of German grammar checking systems. Unlike my first round of tests, I included LLMs, in particular ChatGPT, Gemini, and Copilot. I had a nagging feeling that I should include Anthropic's Claude, too, but I didn't find out about Claude until near the end of my testing, and I wanted to be finished, so I decided to worry about Claude later. This was a terrible decision. Later turned out to be only a few hours after I'd published the article, when I unexpectedly found myself with enough free time to play around with the system. Claude proceeded to not only outperform every other system I'd tested, it aced my set of tests with a perfect score

My test set is hardly exhaustive, but no other system has managed to find and correct all 21 errors in the set. Kudos to Claude and Anthropic!

Here are the updated results of my testing after the addition of Claude to the list:


Friday, July 19, 2024

German Grammar Checkers Revisited

Update: I have since added Anthropic's Claude to my tested systems, and it outperformed every system I discuss below. Details here.

 A couple of months ago, I blogged about how I'd (superficially) tested a number of grammar checking tools for German. A comment from jbridge introduced me to the idea of using ChatGPT as a grammar checker, and in the course of exploring that option, I expanded my set of tests to make it a little less superficial. That led to new insights, so it seems like it's time for a German grammar checking tool follow-up.

If you're not familiar with my original post, I suggest you read it.

As before, I'm testing only free tools. I generally test without signing up for or logging into any accounts, but for ChatGPT, I created a free account and logged in so I'd have access to the more powerful ChatGPT 4o rather than the no-account-required ChatGPT 3.5.

Update on the Tools I Looked At Last Time

I remarked last time that Scribbr and QuillBot are sister companies using the same underlying technology, but Scribbr found more errors. That is no longer the case. In my most recent testing, they produce identical results, so we can speak of Scribbr/QuillBot as a single system. Unfortunately, the way this uniformity was achieved was by bringing Scribbr down to QuillBot's level rather than moving QuillBot up to Scribbr's. Even so, Scribbr/QuillBot remains the second-best grammar-checking tool I tested (after LanguageTool). It continues to find errors that LanguageTool misses, so my default policy remains to use both.

My prior test set consisted of six individual sentences. This time around, I added a letter I had written. It's about 870 words long (a little under two pages), and, as I found out to my chagrin, contains a variety of grammatical errors. Checking such a letter is more representative of how grammar checkers are typically employed. Just as we normally spell-check documents instead of single sentences, grammar checkers are usually applied to paragraphs or more.

That had an immediate effect on my view of DeepL Write. I noted in my original review that it's really a text-rewrite tool rather than a grammar checker. On single sentences, you can use it to find grammatical errors, but when I gave it a block of text, it typically got rid of my mistakes by rephrasing things to the point where the word choices I'd made had been eliminated. I no longer consider it reasonable to view DeepL Write as a grammar-checking tool. It's still useful, and I still use it; I just don't use it to look for mistakes in my grammar.

Adding the letter to my set of tests also led me to remove TextGears and Duden Mentor from consideration, because they both have a 500-character input limit. My letter runs more than ten times that, about 5300 characters. Eliminating these systems is little loss, because, as I noted in my original post, GermanCorrector produces the same results as TextGears, but it doesn't have the input limit. As for Duden Mentor, it flags errors, but it doesn't offer ways to fix them. I find this irritating. Using it is like working with someone who, when you ask if they know what time it is, says "Yes."

These changes don't really matter, because when I submitted my augmented test set (i.e., sentences from last time plus the new letter) to the systems from my original post, LanguageTool and Scribbr/QuillBot so far outperformed everybody else, I can't think of a reason not to use them. I'll provide numbers later, but first we need to talk about LLMs.

ChatGPT and other LLMs

In his comment on my original blog post, jbridge pointed out that ChatGPT 4o found and fixed all the errors in my test sentences. That was better than any of the systems I'd tested. Microsoft's LLM, Copilot, produced equally unblemished results. Google's Gemini, however, found and fixed mistakes in only four of the six sentences.

I asked all the systems (both conventional and LLM) to take a look at my letter. ChatGPT found and fixed 13 of 15 errors, the best performance of the group. LanguageTool also found 13 of the 15 mistakes, but ChatGPT fixed all 13, while LanguageTool proposed correct fixes for only 11.  

Given ChatGPT's stellar performance, I was excited to see what Copilot and Gemini could do. Copilot kept up with ChatGPT until it quit--which was when it hit its 2000-character output limit.  This was less than halfway through the letter, so Copilot found and fixed only four of the 15 errors. 

Gemini's output was truncated at about 4600 characters, which is better than Copilot, but still less than the full length of the text. Gemini was able to find and fix eight of the 15 errors. This is notably fewer than ChatGPT, and not just because Gemini quit too early. In the text that Gemini processed, it missed three mistakes that ChatGPT caught. 

I had expected that the hype surrounding ChatGPT was mostly hype and that the performance of ChatGPT, Copilot, and Gemini would be more or less equivalent. If their performance on my letter is any indication, ChatGPT is currently much better than the offerings from Microsoft and Google when it comes to checking German texts for grammatical errors. It also does better than LanguageTool, the best-performing non-LLM system, as well as the combination of LanguageTool and Scribbr/QuillBot together. It's really quite impressive.

However, just because it's better doesn't mean it's preferable. Read on.

Learning from LLMs

My interest in grammar checkers is two-fold. Sure, I want to eliminate errors in my writing (e.g., Email and text messages), thus sparing the people who receive it some of the kinks in my non-native German, but I also want to learn to make fewer mistakes. Knowing what I'm doing wrong and how to fix it will help me get there. At least I hope it will.

Traditional (non-LLM) grammar checkers are good at highlighting what's wrong and their suggestion(s) on how to fix it. This is what LanguageTool looks like on the second of my single-sentence tests. The problematic text is highlighted, and when you click on it, a box pops up with a suggested fix:

The other grammar checkers work in essentially the same way.

LLM systems are different. They can do all kinds of things, so if you want them to check a text for German grammar errors, you have to tell them that. This is the prompt I used before giving them the letter to check:

In the following text, correct the grammatical errors and highlight the changes you make.

For the most part, they did as they were told, but only for the most part. Sometimes they made changes without highlighting them. When that was the case, it was difficult for me to identify what they'd changed. It's hard to learn from a system that changes things without telling you, and it's annoying to learn from one that disregards the instructions you give it. But that's not the real problem.

The real problem is that LLMs can be extraordinarily good at doing something (e.g., correcting German grammar errors) and unimaginably bad at explaining what they've done. Here's ChatGPT's explanation of how it corrected the last of my single-sentence tests. (Don't worry if you don't remember the sentence, because it doesn't matter).

"die gute Flugverbindungen" is incorrect, as "Flugverbindungen" is plural and therefore "die" must be used, but it must also be "gute" instead of "gute".

That's nonsense on its own, but it also bears no relation to the change ChatGPT made to my text. Explanations from Gemini and Copilot were often in about the same league.

Lousy explanations may be the real problem, but they're not the whole problem. My experience with LLMs is that they complement their inability to explain what they're doing with a penchant for irreproducability. The nonsensical ChatGPT explanation above is what I got one time I had it check my test sentences, but I ran the tests more than once. On one occasion, I got this:

"Ort" is masculine, so the relative pronoun should be "der" to match the previous clause and maintain grammatical consistency.

This is absolutely correct. It's also completely different from ChatGPT's earlier explanation of the same change to the same sentence. 

Copilot upped the inconsistency ante by dithering over the correctness of my single-sentence tests. The first time I had it look at them, it found errors in all six sentences. When I repeated the test some time later, it decreed that one sentence was correct as is. A while after that, Copilot was back to seeing mistakes in every sentence.

Using today's LLM systems to improve your German is like working with a skilled tutor with unpredictable mood swings. When they're good, they're very, very good, but when they're bad, they're awful.

I view the instability in LLM behavior as a sign of hope. The systems are evolving, and I'm confident that, over time, they'll get better and better. ChatGPT already finds and fixes more errors in my test set than any other system (see below). Its descriptions of what it's doing and why are sometimes delusional, but I have faith that its developers will get that under control. I have similar faith that other LLM systems will also improve, offering better results and increased consistency. For now, I'm sticking with LanguageTool and Scribbr/Quillbot, but I have no doubt that LLM technology will assume an increasingly important role in language learning. (LanguageTool says it's "AI-based," so for all I know, it already uses an LLM in some way.)

Systems and Scores

Update: As noted above, I have added Anthropic's Claude to my tested systems, and it outperformed everything in the list below. Details here

I submitted the six single-sentence tests from my first post plus the new approximately-two-page letter to the grammar checkers in my first post as well as to ChatGPT, Gemini, and Copilot. As noted above, I disregarded the results from DeepL Write, TextGears, and Duden Mentor. I did the same for Studi-Kompass, which I was again unable to coax any error-reporting out of. I scored the remaining systems as in my original post, awarding up to four points for each error in the test set. A total of 84 points was possible: 24 from the single-sentence tests and 60 from the letter. These are the results:

Wednesday, June 19, 2024

How a Bad Monitor Port Thwarted my Move to macOS

In March, I decided to move from Windows to Mac. My goal was texting on the desktop, i.e., texting via my computer. Many people insist on texting, and I was able to communicate with them only through my iPhone. I wanted to be able to do it using my computer, as I did with email and WhatsApp.

Changing operating systems is always a production, but in thirty-plus years with Windows, I had never developed an affection for it, so my only real concern was that I'd have to give up the three-monitor setup I'd used for over a decade. I connect my monitors to a docking station made by Lenovo that's compatible with my Lenovo laptop. Apple doesn't make docking stations, but an associate at the local Apple store assured me I'd have no trouble using a third-party dock. He pointed me to Plugable. Pluagable recommended their UD-ULTC4K, and I ordered one for use with the MacBook Pro M3 I purchased.

I bought a bunch of stuff at this time. I got a new keyboard with Mac-specific keys. I got a new monitor offering HDMI and DisplayPort inputs, because one of my current monitors didn't have either, and those are the video outputs on the Plugable docking station I'd ordered. I purchased new video cables to connect everything.

As we'll see, the new monitor is the villain of this story. We'll call it the ViewSonic.

Missing Monitors and Restless Windows

A few days after I started using the Mac-based system, I wrote Plugable about an intermittent problem I was having. It occurred only after a period of inactivity (POI), i.e., when the displays turned off because I wasn't interacting with the system. When I started using the Mac after a POI, one or two monitors might fail to wake up. The Mac would then shuffle the windows from the "missing" monitors to the monitor(s) it detected. Most of the time, only one monitor went blank. It was rarely the ViewSonic. 

Two weeks of debugging with Plugable followed, during which logs were collected, software was updated, cables were swapped, a replacement dock was issued, and we started all over. Then I reported what I came to call restless windows:

On some occasions, after all three screens come up, some of the windows that should be on one screen have been moved to another one.

Neither Plugable nor I had theories about how this could happen, nor did we have ideas for further debugging the missing-monitor problem, which hadn't gone away. We agreed that I'd return their dock, and they'd refund my money. I was disappointed at how things worked out, but Plugable acquitted itself exemplarily throughout.

I replaced the Plugable dock with TobenONE's UDS033. It also yielded intermittent disappearing monitors after a POI. TobenONE's interest in helping me debug the problem was minimal, and it vanished entirely when they found I'd purchased the dock from Amazon instead of from them. I returned it.

The manager at the local Apple store was sympathetic about the trouble I was having. She suggested swapping out the MacBook to see if that was causing the problems, extending Apple's return period to facilitate the swap. I ordered a replacement computer matching the one I already had.

There are two basic MacBook Pro M3 models. The M3 Pro, which is what I had, natively supports up to two external monitors. To connect three, you have to use a docking station employing a technology that, from what I can tell, fools a MacBook into thinking there are two external monitors when in fact there are three. The big cheese among such technologies is DisplayLink. Both Plugable and TobenONE use it. Internet sentiment towards DisplayLink is lukewarm, but if you need to connect three monitors to an M3 Pro MacBook, it's your primary choice.

The other MacBook Pro M3 model is the M3 Max. It's more expensive than the M3 Pro, but it natively supports up to three external monitors. I'd originally purchased an M3 Pro, and the  replacement I'd ordered was also an M3 Pro, but while it was in transit, I realized that by upgrading to an M3 Max, I could eliminate the need for a docking station as well as DisplayLink, thus simplify the debugging of my problem. I canceled the M3 Pro replacement before it was delivered, and I ordered an M3 Max. Apple was unfazed by this, but my credit card company was on high alert, noting that my pricey orders from Apple were unprecedented and asking for confirmation each time. 

I was excited about the M3 Max. With docks and DisplayLink out of the picture, surely my missing-monitor and restless-windows problems would disappear! 

Um...no. Which is not a surprise, because I've already told you that the source of my display drama was a bad port on the ViewSonic. At the time, I didn't know that.

I connected my monitors to the M3 Max and arrayed them side by side. The ViewSonic was in the middle, because it was the newest and had the spiffiest specs. Not long afterward, I had the bizarre experience of returning to my computer after a POI and seeing that the windows on my left and right monitors had swapped! Online discussions (e.g., here and here and here) showed that I was not the first to experience this. I logged several instances before calling Apple. One tech remarked, "Yeah, that happens to me, too." A second told me, "Engineering has an open issue on that." There was every reason to believe that this was a MacBook problem.

Notice that the window-swapping behavior did not involve the ViewSonic. The restless windows afflicted only the side monitors. The ViewSonic sat quietly in the middle with an innocent look on its face.

Apple told me they'd look into the problem and get back to me. In the meantime, I logged what happened after every POI. On Day 1, the windows on the left and right monitors swapped six times out of seven POIs. On Day 2, eight of ten POIs resulted in window swaps. 

Then things got strange.

The Music app often jumped from the right monitor to the middle one, even though other windows on the right monitor stayed put. Sometimes the windows from the right monitor moved to the left monitor, but the windows on the left monitor remained in place. (That's half a swap.) I started taking screen shots before POIs (i.e., just before I left the computer for a while) for comparison with what I saw after a POI. I found that my restless windows sometimes did more than just jump from one monitor to another. They might take on a different size or their position on the monitor might change. Or, as in this example, both:

Screen shots before POI (above) and after (below). The ViewSonic is the middle monitor, where nothing changes.

I gave up. I'd been battling missing monitors and restless windows for three months, and there was no end in sight. The Internet showed that others had the kinds of problems I did, and they couldn't solve them. Apple support reps told me they experienced the behavior I did, and I'd received no follow-up from Apple Engineering. I returned my MacBook on the last day of its return period. I was sad to do it, because I'd been just as pleased with texting on the desktop as I'd hoped, and I'd grown fond of photos taken on my iPhone magically appearing on the MacBook. Apple's reputation for integration isn't for nothing. But convenient texting and synchronized photos weren't enough to compensate for nondeterministic window sizing and placement each time I returned to my machine. I retreated to my Windows system to lick my wounds and consider my options.

Back to Windows

I dusted off my Lenovo laptop and its docking station (literally!), hooked everything up, and booted into Windows. I futzed around and rued the loss of my texting window, then went away for a bit. When I came back, I was stunned to see that one of the windows that had been on my left monitor was now on the middle monitor! Hoping I had somehow imagined it, I moved it back where it belonged and left the machine for another POI. It had not been my imagination. The window was not where I'd left it. In addition, a window on the left monitor was now a different size! 

You know those creepy scenes in movies and TV shows where the protagonist unplugs a TV or a computer monitor to make sure it's off, but it turns back on, anyway? It was like that. The Mac was gone, and I was using the same old Windows system I'd been using for years. Restless windows were impossible. And yet...

It took me a while to realize that it wasn't the same system I'd been using for years. I hadn't swapped back in the monitor the ViewSonic had replaced. That made the ViewSonic the only component common to the Mac-based system I'd been using and the Windows-based system sitting before me--the only component common to all configurations where I'd experienced windows on walkabouts.

I swapped out the ViewSonic for the monitor it had replaced. Everything worked fine. Time after time, my Windows remained where I put them. It was apparent that the ViewSonic was anything but innocent.

Ports

There are two digital input ports on the ViewSonic: HDMI and DisplayPort (DP). (The monitor also has a VGA input, which is analog. That input isn't germane to the story, but it's worth taking a moment to marvel at the longevity of VGA, which debuted in 1987 and remains important enough that monitor vendors continue to support it.) I'd been using the ViewSonic's DP input, because the video output from the Lenovo docking station is DP, and I figured it would be better to go DP-to-DP than DP-to-HDMI. 

The ViewSonic was clearly responsible for my weeks of video despair, but I didn't know if the problem lay with the monitor in general or with the DP port in particular. To find out, I swapped the ViewSonic back in to the system, this time connecting to the HDMI port. A zillion POI trials convinced me that the monitor worked fine with HDMI. The problem had to stem from the DP connection.

That connection has three parts: the DP port on the docking station, the DP port on the ViewSonic, and the cable between them. I'd successfully used the DP port on the dock when testing the ViewSonic's HDMI connection, so the dock's DP output was in the clear. That meant the source of my monitor madness was either the ViewSonic's DP port or the cable leading to it. I bought a new cable and connected to the ViewSonic's DP port. My windows were restless again. Two cables with the same behavior meant the cable wasn't the problem. The guilty party had to be the ViewSonic's DP port. It was the last suspect standing. 

To really clinch the case, I'd need to replace the ViewSonic with an identical monitor and verify that everything works over a DisplayPort connection with the replacement monitor. I'm working with ViewSonic on that now.

Update: ViewSonic replaced my monitor, but the new monitor behaves the same as the old one: my windows wander when I use its DisplayPort input, but they stay put when I use its HDMI input. I can't explain this. The chance of both monitors being defective is nil. ViewSonic might be handling the DisplayPort protocol in a funky way, but if that were the case, I'd expect there to be lots of reports of the problem, and I don't see that. My windows wandered with four docking stations, three computers, and two operating systems, so the only common denominator is the ViewSonic monitor. And me, now that I think about it...

To me, the big mystery is how a bad port on one of three monitors in a system can, among other things, cause the window managers in two independent operating systems to swap windows on the monitors whose ports are not bad. If you have insight into this, please share!

Mac Thoughts

The case against the ViewSonic is rock-solid, but that doesn't mean the Mac is off the hook. The Internet reports of restless windows under macOS are still there, as are the comments from Apple's support reps acknowledging the problem. It could be that the ViewSonic is defective and Macs have unreliable multiple-monitor support. Testing this would require a fourth MacBook purchase (I wonder what my credit card issuer would think of that), replacing the ViewSonic with a known-good monitor, and seeing what happens. 

It'd be easy enough to do, but I'm not sure I want to. During my three months with macOS, I devoted a great deal of time to debugging missing monitors and restless windows, but I also spent many hours familiarizing myself with the operating system and working within it. Desktop texting and synced photos were great, and moving from the M3 Pro to the M3 Max was the kind of smooth experience that drives home just how sadistic Microsoft's Windows-to-Windows migration process is. (My PC is 11 years old, in no small part because moving to a new machine is so painful.) macOS is a significant upgrade to Windows in important ways, especially if you have other Apple devices, e.g., an iPhone.

However, I found day-to-day life with macOS rather uncomfortable. The menu bar's fixed position at the top of the screen often means moving the mouse a large distance to get to it. I have 24-inch monitors, and those marathon moves got old quickly. Word on the Internet is that the decision to anchor the menu bar atop the screen dates to the original 1984 Macintosh computer. That machine had a nine-inch screen and ran one application at a time, always in full-screen mode.That's nothing like the world I live in. The Mac's fixed menu bar location feels like the 40-year-old design decision it apparently is. I don't think it's stood the test of time.

Keyboard shortcuts can reduce the need to go to the menu bar, I know, but there's a steep memorization curve for them, and not everything has a keyboard shortcut. I find Windows' per-window menu bar more usable. 

macOS seems focused on applications, while Windows is built around windows. Under macOS, Command-Tab cycles through applications. The Windows equivalent Alt-Tab cycles through windows. If you have multiple windows for an application, they share a single entry in the macOS Command-Tab cycle. In the Windows Alt-Tab cycle, each window get its own entry. (macOS offers a way to cycle through all windows associated with an application (Command-↑/Command-↓ after Command-Tabbing to the application), but I find it cumbersome.) The application-based focus on the Mac is so pronounced, you can hide all windows associated with an application, an operation with no Windows counterpart, as far as I know.

The Windows approach makes more sense to me. I partition my work into windows, not applications. I often have multiple independent windows open in a single application, especially browsers and spreadsheets. It's hard for me to think of situations where I'd like to close all windows associated with an application, but it's easy for me to think of situations where I'd like to close only some windows associated with an application. During my time with macOS, I tried to find scenarios where hiding made sense, but I came up empty. I ended up minimizing windows under macOS, just like I did under Windows, and I missed the ability to easily cycle through windows when I wanted to interact with one.

I was surprised to find that I often found familiar content looking rather ugly under macOS. Messages in Thunderbird looked like everything was in bold face, while Excel spreadsheet content was so small, I had to bump the magnification up to 120% to comfortably view it. It's likely that there are configuration changes I could have made to address these issues, but I really expected that an Excel spreadsheet or a Thunderbird email message I'd created under Windows would look pretty much the same when viewed in the same application (often on the same monitors) under macOS. 

On Windows, I run Excel 2010. On macOS, the closest I could get was Office 365. Excel 365 on macOS lacks customization options present in Excel 2010 under Windows. The Quick Access Toolbar (QAT) on the Mac is fixed above the ribbon, for example, while in my Windows version, you can move it below the ribbon--which I do. Some commands I've got on the QAT in my Windows version of Excel--Font Name, Font Size, and Insert Symbol--can't be put in the QAT in Excel 365 on macOS. These limitations are Microsoft's fault, not Apple's, but they still chafe.

A third-party Excel plug-in I use works quite differently and less conveniently under macOS than under Windows. That's neither Microsoft's nor Apple's fault, but it further detracts from the overall Excel experience on a Mac. That matters to me, because I use Excel a lot. Given enough time and effort, I'm sure I could get used to the Mac version of Excel or I could switch to a different spreadsheet program, but between an inconveniently located menu bar, an emphasis on applications over windows, ugly window content, and restricted Excel functionality, the total cost of texting on the desktop comes to a lot more than I'd expected.

Plan B

At least that approach to texting on the desktop does. There is another way. If my going to macOS is too much trouble, it's supposed to be possible to use remote desktop software to bring macOS to me. This requires a Mac in addition to a Windows machine, but I happen to have an old MacBook Air floating around (as it were). I should be able to run a remote desktop server on the MacBook and a remote desktop client on Windows, thus giving me a way to use apps on the Mac--notably iMessage--from Windows. With suitable remote desktop support, I should be able to copy and paste from one machine to another, and the net effect should be pretty close to running Mac apps locally. 

The remote desktop software most frequently mentioned for this is Google Remote Desktop (GRD). I gave it a quick try, and, well, if you accidentally make the Mac both the GRD server and client, you end up with a screen that looks like this:

Obviously, I have more work to do.

Wednesday, May 29, 2024

Five Years of no EV for Me

Five years ago this week my search for an electric compact SUV ended with me buying a conventional gas-powered car. I disliked the car (a Nissan Rogue) within a month after buying it, and I've been on the lookout for an electric replacement ever since. 

Over the years, I've vented my frustrations with the EV market in a number of blog posts, sometimes focusing on their high cost and sometimes on the lack of models with the basic features I'm looking for: all-wheel drive, an openable moonroof, a 360-degree camera, and an EPA range of at least 235 miles. In November, I discussed the luxury-car-level pricing of the only two cars that meet these criteria: the Nissan Ariya and the Volvo XC40 Recharge. Since then, the only things that have changed are the name of the Volvo (now called the EX40) and the elimination of the prospect of Chinese imports pushing down EV prices. (The US government has adopted a policy of keeping them out of the market.)

In the meantime, what I'm looking for in an EV has evolved a bit. I still want all-wheel drive, an openable moonroof, and an all-around camera, but I now want a car on the shorter end of the compact SUV spectrum. My Rogue is 185 inches long. I'd prefer no more than 180 inches. (Tesla's Model Y is 187 inches. Ford's Mustang Mach-E is 186. VW's ID.4 is 181.)

My thinking about range has also changed. EV ranges can't touch those of gas-burners, so I understood that distance driving in an EV requires planning. But this didn't strike me as a problem. If you're driving, say, 400 miles to get from Point A to Point B, stopping for a half hour at 200 miles to recharge isn't a hardship. 200 miles represents 3-4 hours of driving, and who doesn't want to stop at that point to stretch one's legs, use the bathroom, grab a snack, etc? 

A recent trip made me realize that not all long drives consist of extended driving sessions. My wife and I put 400 miles on a tank of gas while meandering along the southern Oregon coast. Most of our driving sessions were under an hour, because we stopped at various beaches (including, of course, Meyers Creek Beach, which is at the mouth of Myers Creek) and small coastal and inland towns (e.g., Bandon, Coquille, Port Orford, and Gold Beach). Many of the places we stopped had no facilities of any kind, much less charging stations. 

According to the Oregon Clean Vehicle Rebate Program EV Charging Station Map, the charging options in Gold Beach, where we spent one night, consist of one 120V charging outlet for E-bikes and one 120V outlet for use by hotel and shop patrons on the opposite side of the river from where we were staying. A two-port charging station at a motel we weren't staying at is noted as "coming soon." The source for this information is shown as PlugShare, but the PlugShare web site shows only the "coming soon" charger, so it's possible that there aren't any charging options in Gold Beach at all.

This casts the "planning" aspect of traveling by EV in a different light. If you're off the beaten track and poking along in a gas-powered car, stopping here and there as the whim moves you, you can take refueling opportunities for granted. (There are four gas stations in Gold Beach.) In an EV, you may have to actively seek out recharging options. You really do have to plan

I haven't yet decided what that means for me as a potential EV owner. It's a simple fact that EV ranges are notably lower than ICE ranges, and it's an equally simple fact that the EV charging infrastructure is much less well developed than the gas station network. For the foreseeable future, buying an EV means accepting those facts and finding ways to cope with them. If I really want to own an EV, I'll have to figure out how to do that.

I probably have plenty of time. The only EV that satisfies my basic criteria and fulfills my new not-longer-than-180-inches criterion is the Volvo EX40 (née XC 40 Recharge). MSRP as I'd like it equipped is nearly $62,000, which is about $20,000 more than my budget.

However, I have a gas-powered riding lawn mower that's not likely to last a lot longer. I'm already thinking of replacing it with an electric version. It could be that the form my first EV takes will be that of a machine you sit on top of and cut grass with.

Monday, May 20, 2024

German Grammar Checkers

I can speak some German. I'll never be fluent, but I can usually get by. Sadly, I make a lot of grammatical errors. It'd be nice to have a tool that could help me find and eliminate them. Syntax and grammar are structured things, seemingly tailor-made for algorithmic analysis. Surely there is software that can analyze my sentences, point out places where I've broken the rules, and tell me how to fix things!

There is. I recently tested more than a dozen programs and web sites that offer this service. The results were less impressive than I'd expected. On my (tiny and unrepresentative) set of sentences containing errors, most tools failed to find most of them. For errors that were found, it was common for the suggested fixes to be wrong. 

I found these sites to offer the most useful results:

  • LanguageTool describes itself as an AI-based spelling, style, and grammar checker. My sense is that the focus is on spelling and grammar, not style. I've found it to do a pretty good job, though there are errors it misses.
  • Scribbr bills itself simply as a grammar checker. It also produces good results, though a hair below those of LanguageTool.
  • DeepL Write claims that its AI approach yields "perfect spelling, grammar, and punctuation" and provides alternative phrasings that "sound fluent, professional, and natural." This means it may rewrite your text to not just eliminate mistakes, but also to make it sound different (presumably better). In my experience, it does a very good job of finding and eliminating errors, but it's sometimes difficult to determine whether it changed something because it's incorrect or because it just felt like rewording it.

In daily use, I generally feed my writing to both LanguageTool and Scribbr, because they're fast, and each sometimes finds mistakes the other misses. If I'm extra-motivated, I also turn to DeepL Write. I've found it to identify mistakes the others miss. I don't use DeepL Write all the time, because I find it annoying to have to tease out whether it changed something on the grounds of correctness or stylistic whim.

In addition to these sites, I also (very cursorily) tested the following systems. I found them to produce notably inferior results. I've listed them in order of decreasing performance, based on my (really limited) tests:

  • QuillBot is a sister company to Scribbr that presumably uses the same underlying technology. I found that the two systems generally give identical results. There are exceptions, however, and in those cases, I found that Scribbr did a better job.
  • Google Docs can be configured to check spelling and grammar as you type. In my testing, it delivered mediocre results.
  • Sapling also produces mediocre results, but it often says "Sign in to see premium edit." I didn't do that, so I can't comment on its premium edits.
  • Microsoft Word, like Google Docs, can be configured to check for spelling and grammar errors as you type. On my tests using Word from Microsoft 365, its coverage was inferior to Google's.
  • Rechtschreibprüfung24 and Korrekturen produced the same results in my testing, so it's possible that they use the same underlying (and unimpressive) checking engine.
  • TextGears and GermanCorrector also produced the same results on my tests, so it's possible that they share a checking engine. The results are similar enough to those from Rechtschreibprüfung24 and Korrekturen that it's conceivable that all four use the same underlying technology. In addition, OnlineKorrektor.de looks and acts identically to GermanCorrector, so it could be that there are two URLs for a single underlying checker.
  • Duden Mentor is the only system I tested that flags the errors it finds, but doesn't offer suggestions on how to fix them.
  • Online-Spellcheck couples its poor ability to find mistakes with a checking speed that is notably worse than its competitors. In addition, it replaces its input window with an output window, so you can't just paste new text in to check something different.
  • Studi-Kompass found none of the errors in my tests. That suggests that it wasn't working or that I was doing something wrong.

I must reiterate that my testing was very limited, so my conclusions are tenuous. If you know of more comprehensive comparisons of German grammar checkers, please share what you know in the comments!

My testing focused on incorrect articles, because that's a problem area for me. I used the following test sentences, where I've boldfaced the part of each sentence that's wrong. I realize that if you know German, you will recognize what's wrong without my help, and if you don't know German, you'll just see randomly boldfaced text, but I can't resist the Siren's call of the boldface error indicator.

  1. Das Tisch sieht gut aus.
  2. Ich gehe im Küche.
  3. Ich bin in die Küche.
  4. Ich will einen Ort finden, die schön aussieht.
  5. Beim Check-in haben wir die Größe des Lobbys bewundert.
  6. Schließlich habe ich mich entschlossen, dass ich einen Ort finden musste, der zwischen Singapur und den USA liegt (d.h., der auf dem Heimweg ist), und die gute Flugverbindungen hat.

I invented sentences 1-3 as representing common simple errors. Sentences 4-6 are from or are variations on things I've actually written.

I scored the systems' results as follows:

  • 2 points if the error was found.
  • 2 more points if only one fix was suggested and it was correct; 1 more point if more than one fix was suggested, but the correct one was among them.
  • -1 point if only incorrect fixes were suggested.
  • -1 point if rewrites were suggested beyond what was in error.  (This is designed to penalize DeepL Write for mixing error corrections and stylistic rewrites.)

If a system found the error in a test sentence and suggested the proper fix (and it didn't suggest anything else), it got the full four points. If it found the error, but it didn't suggest the proper fix, or if it muddied the water with rewrites unrelated to the error, it got between one and three points, depending on the details of what it did.

A perfect score for the set of six sentences would be 24 points. The best any system did was LanguageTool, which got 21. Scribbr was close behind at 20 points. DeepL Write got 19. Then there was a gap until QuillBot's 16 points. Google Docs scored 14, Sapling 13, and Microsoft Word 10. Rechtschreibprüfung24, Korrekturen, TextGears, and GermanCorrector/OnlineKorrektor.de clumped together with 6 points, which is one reason I suspect they may all be using the same checking technology. Duden Mentor also got 6, but its behavior is quite different from the other systems with that score. Online-Spellcheck got 5 points. Studi-Kompass got none, but, as I noted above, my guess is that either the system wasn't working or I was doing something wrong.

 

Tuesday, February 20, 2024

Tracking Travel

I like to travel. I've been a few places. I have a map on the wall with pins where I've been. Old School, but I started it before the Internet existed. I'd like to move it into the digital era. Looking into that led me to Most Traveled People (MTP) and NomadMania (NM). Both offer the ability to generate maps of places visited based on data you enter. I tried both. The NM data entry process was so slow and cumbersome, I gave up. MTP worked better. The map it produced showing the countries I'd been to makes me look pretty well traveled:

This is terribly misleading. Country-level granularity means that if you visit only a single place in a big country (e.g., the USA, Canada, Russia), the map makes it look like you visited the whole thing. Recognizing this, both MTP and NM break the world into much smaller regions, 1500 in the case of MTP and 1301 in the case of NM. My MTP region map is not just less impressive, it's frankly a little depressing for somebody who feels like he's been around:
The region-based approach is better than one based on countries, but as I was entering the data for my travels in the United States, I found that MTP treats a few states as multiple regions. California, for example, comprises four regions, and Texas three. (NM does the same thing.) Like most states, Oregon--my state--is a single region, and my state pride was wounded at the idea that Colorado is broken into east and west, and Georgia into north and south, yet all of Oregon is thrown into a single basket. Eastern and western Oregon differ greatly in terms of geography, climate, economics, politics, and culture. Having been to one of them doesn't mean you've been to the other in any meaningful way.

Breaking the world into regions and tracking who's been where is good for ranking people in terms of how geographically widespread their travels have been. Such rankings are the bread and butter of MTP and NM. I was surprised to find that I'm a comparative couch potato. Kayak tells me that since I started using it in 2011, I've traveled nearly 500 days, flown over a half million miles, and been in 17 time zones, yet those trips plus my pre-Kayak travels let me lay claim to barely 10% of MTP's 1500 regions. With the paltry 43 countries I've visited, I'm not even half way to qualifying for the Travelers' Century Club. From the perspective of competitive travel, I might as well not even have a passport.

Fortunately, I'm not out to engage in big-league travel competition. I just want a digital approach to tracking where I've been. For that purpose, I'm thinking a custom Google Map with digital push-pins is the way to go. It's basically the same thing I've got on my wall now, except in digital form.


Tuesday, November 28, 2023

Pumpkin Pie Cutters

Some years ago, I got it into my head that just as there are cookie cutters for cookies, there should be pie cutters for pumpkin pie. I bought the deepest tree-shaped cookie cutters I could find, thinking I could stack them and produce festive pie pieces for a holiday party. It didn't go as planned. I couldn't get the stacking to work, and the result of using just one cutter was kind of a disaster:
Nevertheless, proof of concept! 

I found that the KindredDesignsCA shop at Etsy offered custom-made 3D-printed cookie cutters. They agreed to make extra-deep cutters for me, one in the shape of a tree, another in the shape of a snowflake. The cutters worked great, except that once I'd extracted a piece of pumpkin pie with a cutter, the pie stuck inside the cutter. I had KindredDesignsCA make plungers so that I could push the pie out of the cutters:

I was so pleased with the result (shown at the top of this post), I started looking for new pie-cutter-shape ideas. For reasons not worth going into, I hit upon the idea of US states, and the next thing I knew, I was looking at pieces of pumpkin pie that looked like California, Texas, and Minnesota:
This year I decided that for Thanksgiving dinner, it would be nice to have pieces of pumpkin pie that looked like turkeys and pumpkins, so KindredDesignsCA again did their 3D magic for me:
I learned a few new things from these latest cutters. One was that it's a bad idea to try to get too detailed. Check out the well-defined beak in the turkey cutter below...

...and compare it to the poorly-defined or broken-off beaks in the pieces of turkey-shaped pie above. We're sculpting with pumpkin pie here, so just because you can produce a cutter with well-defined details doesn't mean you can get those details to be retained in the pieces of pie you cut.  

On the other hand, I was worried about the narrow strips of pie for the turkey's legs holding together, and they came out fine.

So far, I've employed these cutters only for pumpkin pie, but a friend and I were musing about what else they could be used for. Ideas include sponge cake, gingerbread, pancakes, hamburger patties, ice cream sandwiches, and gelatin. Plus cookies, of course. In the end, they're just overly-deep cookie cutters with plungers.

As far as I know, the only drawback to these cutters is that they require hand washing. The material used for the 3D printing has a comparatively low melting point, so if they were to be put into a dishwasher, you'd likely end up with pie cutter goo all over everything.



Saturday, November 4, 2023

Electric Cars are Still Luxury Goods :-(

More than three years ago, I blogged about EVs being luxury goods. Some 13 months ago, I showed that the Nissan Ariya--the only electric compact SUV with the basic features I demand (all-wheel drive, an openable moonroof, a 360-degree camera, and an EPA range of at least 235 miles)--came with a 52% price premium vis-a-vis a comparable gas-powered Nissan Rogue. That difference put the Ariya solidly in luxury car territory.

In the intervening year, the Ariya has gone from forthcoming to present on dealer lots, and just yesterday Volvo made it possible to configure and price the 2024 XC40 Recharge for the US market. It joins the Ariya in offering the fundamental features I insist on. The XC40 comes in both battery- and gasoline-powered versions, so it makes it easy to measure the cost of going electric.

The intervening year has also seen a big jump in interest rates:

Average 60-month new car loan rate (per https://bit.ly/3QUl7ER)

The concomitant reduction in demand for new cars has changed the market. I decided to recheck the EV price premium by again comparing the cost of the Nissan Rogue with the equivalently-equipped Nissan Ariya. This time I checked not just MSRPs, but also prices at cars.com. I did two searches at cars.com. The first was nationwide, i.e., for the best price I could find anywhere. The second was "near me," which means within about 100 miles of Portland, Oregon. I then repeated the experiment for the Volvo XC40 (gas-powered) and the Volvo XC40 Recharge (batteries). For the Nissans, my data are for the 2023 model year, because the 2024s aren't out yet. The Volvo data are for the 2024 model year.

This is what I found:

The Ariya continues to have an MSRP about 50% higher than the equivalently-equipped Rogue, and this doesn't change when looking for real cars within 100 miles of me. If I expand my search to the entire country, the price premium drops to 41%, but it still represents a difference of nearly $14,000. It's also an artificial cost differential, because the lowest-priced Rogue is in Arizona, while the cheapest Ariya is in Illinois.

Volvo is a premium brand, so MSRP pricing for its its Rogue equivalent, the XC40, starts 26% higher than the Nissan. Going electric from there (to the XC40 Recharge) demands a relatively modest 26% premium, but the result is 58% above the MSRP for the Rogue. Within the Volvo line, the premium to go electric is only 26%, but the price increase I care about--from an ICE-powered compact SUV of any make to a similarly-equipped EV of any make--is nearly 60%. That's far above the 25% I consider acceptable.

Lest you think I'm not taking government tax credits and rebates into account in pricing the Ariya and the XC40 Recharge, I actually am. Neither qualifies for the federal $7500 tax credit (which is fictional for most people, anyway), and my state's program for EV rebates stopped accepting applications months ago, because it ran out of money.

To me, the most interesting aspect of the pricing data is the smallness of the differential between the Ariya and the XC40 Recharge. Here's the table above with a line added showing the premium you pay for choosing Volvo over Nissan (i.e., the XC40 Recharge over the Ariya): 

 
Regardless of whether you look at MSRPs or prices at cars.com, the Volvo costs no more than 8% more than the Nissan. I've never been able to figure out what makes premium brands premium, but if Volvo has it and Nissan doesn't, I'd expect that to motivate many buyers to choose the XC40 Recharge over the Ariya. 

As for me, I'll continue to bide my time and hope that the EV industry eventually comes out with a compact SUV with the features I want at a price that's no more than about 25% beyond the cost of a comparable ICE vehicle.


Monday, August 21, 2023

Nav App Audio Conflict Resolution

You're tootling down the road using a navigation app connected to your car's infotainment system, and you're approaching a turn in the route. You've enabled voice navigation, so the app should use your car's audio to tell you what to do. But what if the audio system is already in use? What should happen if your nav app and another audio source want to use your car's audio system at the same time?

Different nav apps solve this UX design question in different ways. Some resolutions seem obvious. Some are clever. Some are so bad, it's hard to believe they made it into production.

The stakes can be high. If two apps want to use the audio system at the same time, typically only one will succeed. If it's the nav app, you might miss an AMBER Alert on the radio while you're next to a car matching the description of one with a kidnapped child. If it's not the nav app, you might miss a freeway exit and have to drive many miles before the next one

I compared the resolution of audio conflicts for Google Maps and Apple Maps under iOS 16.6 CarPlay on a 2019 Nissan Rogue. I don't know whether what I found is representative of other nav apps, other cars, or other phone-car interface systems (e.g., Android Auto).

The scenarios I checked were what happens when your nav app wants to speak and:

  • A streaming app is playing.
  • The car radio is playing.
  • A phone call is in progress.
  • You're talking to your phone.
  • Your phone is talking to you, e.g., Siri is responding to a query or command.

This is what I found.

Nav App vs. Streaming App

The streaming apps I tested (Pandora and Simple Radio) can be paused, which is characteristic of every streaming app I'm familiar with. It seems obvious to me that the proper behavior when a nav app wants to talk is to pause the streaming app, have the nav app talk, then resume the streaming app. Google sort of agrees, because that's what Google Maps does when the streaming app is Pandora. However, when the streaming app is Simple Radio, Google mutes it instead of pausing it. I don't know the reason for this difference.

Apple Maps has behavior that's not just different from Google's, it's incomprehensibly bad. When Apple Maps wants to talk while a streaming app is playing, it just starts talking. The streaming app continues to stream. If what's being streamed is spoken audio, you have two voices talking at the same time! Neither can be understood. It's hard to imagine a worse approach than this.

Apple would probably argue that I'm mischaracterizing what it does. Apple Maps employs audio ducking, whereby the volume of the streaming app is reduced when Apple Maps speaks. In concept, that's not unreasonable, but in my experiments, the ducking effect kicks in only when the volume of the streaming app crosses a loudness threshold. This threshold is far above my usual listening level. I had to go out of my way to elicit the ducking effect. When I did, I found that the volume of the ducked audio was still high enough to compete with Apple Maps' spoken directives.

 To summarize:

  • Google Maps: Pause stream, speak, resume stream.
  • Apple Maps: Speak while stream plays. At high stream volumes, duck stream while speaking.

I think Apple's approach shows promise, but its implementation needs considerable refinement. For me, both the threshold and ducked volumes are too high. I wish I could configure them. With that said, comments on the Internet make clear that many people listening to spoken audio (e.g., podcasts) would prefer pausing over ducking, anyway.

Nav App vs. Car Radio

Radio is a different kind of streaming beast, because it's not pausable. That requires that Google change its approach to audio conflicts. Not so for Apple, which sticks to its guns and issues navigation instructions on top of whatever's playing on the radio. If you happen to be listening to a talk show or the play-by-play of a sporting event, you've again got dueling voices, and there's a good chance you'll understand neither. I'm generally a fan of UX consistency, but Apple's here is of Emerson's foolish variety.

Google's approach--to mute the radio while Google Maps speaks--is a lot more reasonable. There's a tiny chance you'll miss something important (e.g., an AMBER Alert), but it's more likely you'll just miss a snippet of a song, commercial, or host's blather.

We thus have:

  • Google Maps: Mute radio, speak, un-mute radio.
  • Apple Maps: Speak while radio plays. At high radio volumes, duck radio while speaking.
Given their current implementations, Google Maps' behavior is vastly preferable to Apple Maps', but I think a blended approach would be an improvement to both. A choice between muting and configurable ducking would be very attractive.

Nav App vs. Phone Call

While you're on the phone. Google is polite enough not to wrest control of your car's speakers from the person you're talking to, but it has no Plan B. If you don't notice the visual navigational directive on the CarPlay screen, you're out of luck: Google Maps remains silent if you're in a phone call.

Apple doesn't need a Plan B, because its Plan A is so good. When Apple Maps wants to speak, but you're on the phone, it issues a chime--a subtle indication that it'd like to tell you something, but it can't. It's your cue to look at the CarPlay screen to see what you need to do. I find it works well.

So:

  • Google Maps: Don't speak or make any other sound.
  • Apple Maps: Don't speak, but issue a chime.

What I admire about Apple's approach is that it takes advantage of the fact that you have a screen; CarPlay requires it. A chime is enough to tell you to look at it, but it's not so intrusive that it interrupts the flow of a conversation. Well done, Apple!

Nav App vs. Dictation

If a nav app wants to speak while you're talking to your phone (e.g., issuing a command to Siri or dictating a text message), the audio conflict is between you and your nav app. Google Maps bursts in like an excited child who, unable to restrain itself, says its piece without regard for the fact that you're already talking. It's rude, not to mention a sure-fire mechanism for derailing your train of thought. 

Google Maps' interruption also aborts whatever you're dictating, e.g., the command you're issuing or the text you're dictating. This means you have to start over after the excited child has said its piece (and hope that it produces no further audio interruptions before you're finished).

Apple Maps' chime strategy would work well here, but Apple inexplicably adopts Google's speak-no-evil policy and remains silent. If you're dictating and Apple Maps wants to tell you something, the only way you'll know is by looking at the CarPlay screen. It's a bitter pill to swallow after the cleverness of Apple's phone call conflict resolution.

This is a case where both nav apps offer disappointing and frustrating behavior:

  • Google Maps: Interrupt and abort dictation, speak.
  • Apple Maps: Don't speak or make any other sound.

Nav App vs. Phone Voice Response

If your phone is talking to you when Apple Maps wants to speak, Apple employs the chime approach again. It works as well here as it did in the phone-call scenario, and it really raises the question of why Apple didn't apply it in the dictation case. You hear the chime, you know to look at your CarPlay screen, but you can continue to listen to your phone. It just works.

Google Maps treats your phone like a radio station. It mutes your phone's voice while it issues navigational instructions, then it turns your phone's voice on again. This is a curious decision. It means you could miss important information, such as a crucial part of an email message. I'd expect a nav app to treat a phone's voice like a streaming app and pause it while the nav app speaks. Perhaps there's no CarPlay API to pause Siri output...?

Recap:

  • Google Maps: Mute phone voice, speak, un-mute phone voice.
  • Apple Maps: Don't speak, but issue a chime.

If you have additional information on how nav apps handle audio conflicts, please let me know in the comments below.