The View from Aristeia: 2024

Saturday, July 20, 2024

Anthropic's Claude Aces my German Grammar Checker Tests

Yesterday I published the results of my latest testing of German grammar checking systems. Unlike in my first round of tests, I included LLMs, in particular ChatGPT, Gemini, and Copilot. I had a nagging feeling that I should include Anthropic's Claude, too, but I didn't find out about Claude until near the end of my testing, and I wanted to be finished, so I decided to worry about Claude later. This was a terrible decision. Later turned out to be only a few hours after I'd published the article, when I unexpectedly found myself with enough free time to play around with the system. Claude not only outperformed every other system I'd tested, it aced my set of tests with a perfect score.

My test set is hardly exhaustive, but no other system has managed to find and correct all 21 errors in the set. Kudos to Claude and Anthropic!

Here are the updated results of my testing after the addition of Claude to the list:

Claude: 84 points (100% of possible)
ChatGPT: 76 (90%)
LanguageTool: 70 (83%)
Scribbr/QuillBot: 50 (60%)
Gemini: 48 (57%)
Sapling: 41 (49%)
Copilot: 40 (48%)
Rechtschreibprüfung24/Korrekturen: 39 (46%)
Google Docs: 34 (40%)
GermanCorrector: 31 (37%)
Online-Spellcheck: 26 (31%)
Microsoft Word 2010: 16 (19%)

Friday, July 19, 2024

German Grammar Checkers Revisited

Update: I have since added Anthropic's Claude to my tested systems, and it outperformed every system I discuss below. Details are in my July 20 post.

A couple of months ago, I blogged about how I'd (superficially) tested a number of grammar checking tools for German. A comment from jbridge introduced me to the idea of using ChatGPT as a grammar checker, and in the course of exploring that option, I expanded my set of tests to make it a little less superficial. That led to new insights, so it seems like it's time for a German grammar checking tool follow-up.

If you're not familiar with my original post, I suggest you read it.

As before, I'm testing only free tools. I generally test without signing up for or logging into any accounts, but for ChatGPT, I created a free account and logged in so I'd have access to the more powerful ChatGPT 4o rather than the no-account-required ChatGPT 3.5.

Update on the Tools I Looked At Last Time

I remarked last time that Scribbr and QuillBot are sister companies using the same underlying technology, but Scribbr found more errors. That is no longer the case. In my most recent testing, they produce identical results, so we can speak of Scribbr/QuillBot as a single system. Unfortunately, this uniformity was achieved by bringing Scribbr down to QuillBot's level rather than raising QuillBot to Scribbr's. Even so, Scribbr/QuillBot remains the second-best grammar-checking tool I tested (after LanguageTool). It continues to find errors that LanguageTool misses, so my default policy remains to use both.

My prior test set consisted of six individual sentences. This time around, I added a letter I had written. It's about 870 words long (a little under two pages), and, as I found out to my chagrin, contains a variety of grammatical errors. Checking such a letter is more representative of how grammar checkers are typically employed. Just as we normally spell-check documents instead of single sentences, grammar checkers are usually applied to paragraphs or more.

That had an immediate effect on my view of DeepL Write. I noted in my original review that it's really a text-rewrite tool rather than a grammar checker. On single sentences, you can use it to find grammatical errors, but when I gave it a block of text, it typically got rid of my mistakes by rephrasing things to the point where the word choices I'd made had been eliminated. I no longer consider it reasonable to view DeepL Write as a grammar-checking tool. It's still useful, and I still use it; I just don't use it to look for mistakes in my grammar.

Adding the letter to my set of tests also led me to remove TextGears and Duden Mentor from consideration, because they both have a 500-character input limit. My letter runs more than ten times that, about 5300 characters. Eliminating these systems is little loss, because, as I noted in my original post, GermanCorrector produces the same results as TextGears, but it doesn't have the input limit. As for Duden Mentor, it flags errors, but it doesn't offer ways to fix them. I find this irritating. Using it is like working with someone who, when you ask if they know what time it is, says "Yes."

These changes don't really matter, because when I submitted my augmented test set (i.e., sentences from last time plus the new letter) to the systems from my original post, LanguageTool and Scribbr/QuillBot outperformed everybody else by so much that I can't think of a reason not to use them. I'll provide numbers later, but first we need to talk about LLMs.

ChatGPT and other LLMs

In his comment on my original blog post, jbridge pointed out that ChatGPT 4o found and fixed all the errors in my test sentences. That was better than any of the systems I'd tested. Microsoft's LLM, Copilot, produced equally unblemished results. Google's Gemini, however, found and fixed mistakes in only four of the six sentences.

I asked all the systems (both conventional and LLM) to take a look at my letter. ChatGPT found and fixed 13 of 15 errors, the best performance of the group. LanguageTool also found 13 of the 15 mistakes, but ChatGPT fixed all 13, while LanguageTool proposed correct fixes for only 11.

Given ChatGPT's stellar performance, I was excited to see what Copilot and Gemini could do. Copilot kept up with ChatGPT until it quit--which was when it hit its 2000-character output limit. This was less than halfway through the letter, so Copilot found and fixed only four of the 15 errors.

Gemini's output was truncated at about 4600 characters, which is better than Copilot, but still less than the full length of the text. Gemini was able to find and fix eight of the 15 errors. This is notably fewer than ChatGPT, and not just because Gemini quit too early. In the text that Gemini processed, it missed three mistakes that ChatGPT caught.

I had expected that the buzz surrounding ChatGPT was mostly hype and that the performance of ChatGPT, Copilot, and Gemini would be more or less equivalent. If their performance on my letter is any indication, ChatGPT is currently much better than the offerings from Microsoft and Google when it comes to checking German texts for grammatical errors. It also does better than LanguageTool, the best-performing non-LLM system, as well as the combination of LanguageTool and Scribbr/QuillBot together. It's really quite impressive.

However, just because it's better doesn't mean it's preferable. Read on.

Learning from LLMs

My interest in grammar checkers is twofold. Sure, I want to eliminate errors in my writing (e.g., email and text messages), thus sparing the people who receive it some of the kinks in my non-native German, but I also want to learn to make fewer mistakes. Knowing what I'm doing wrong and how to fix it will help me get there. At least I hope it will.

Traditional (non-LLM) grammar checkers are good at highlighting what's wrong and suggesting how to fix it. This is what LanguageTool looks like on the second of my single-sentence tests. The problematic text is highlighted, and when you click on it, a box pops up with a suggested fix:

The other grammar checkers work in essentially the same way.

LLM systems are different. They can do all kinds of things, so if you want them to check a text for German grammar errors, you have to tell them that. This is the prompt I used before giving them the letter to check:

In the following text, correct the grammatical errors and highlight the changes you make.

For the most part, they did as they were told, but only for the most part. Sometimes they made changes without highlighting them. When that was the case, it was difficult for me to identify what they'd changed. It's hard to learn from a system that changes things without telling you, and it's annoying to learn from one that disregards the instructions you give it. But that's not the real problem.

The real problem is that LLMs can be extraordinarily good at doing something (e.g., correcting German grammar errors) and unimaginably bad at explaining what they've done. Here's ChatGPT's explanation of how it corrected the last of my single-sentence tests. (Don't worry if you don't remember the sentence; it doesn't matter.)

"die gute Flugverbindungen" is incorrect, as "Flugverbindungen" is plural and therefore "die" must be used, but it must also be "gute" instead of "gute".

That's nonsense on its own, but it also bears no relation to the change ChatGPT made to my text. Explanations from Gemini and Copilot were often in about the same league.

Lousy explanations may be the real problem, but they're not the whole problem. My experience with LLMs is that they complement their inability to explain what they're doing with a penchant for irreproducibility. The nonsensical ChatGPT explanation above is what I got one time I had it check my test sentences, but I ran the tests more than once. On one occasion, I got this:

"Ort" is masculine, so the relative pronoun should be "der" to match the previous clause and maintain grammatical consistency.

This is absolutely correct. It's also completely different from ChatGPT's earlier explanation of the same change to the same sentence.

Copilot upped the inconsistency ante by dithering over the correctness of my single-sentence tests. The first time I had it look at them, it found errors in all six sentences. When I repeated the test some time later, it decreed that one sentence was correct as is. A while after that, Copilot was back to seeing mistakes in every sentence.

Using today's LLM systems to improve your German is like working with a skilled tutor with unpredictable mood swings. When they're good, they're very, very good, but when they're bad, they're awful.

I view the instability in LLM behavior as a sign of hope. The systems are evolving, and I'm confident that, over time, they'll get better and better. ChatGPT already finds and fixes more errors in my test set than any other system (see below). Its descriptions of what it's doing and why are sometimes delusional, but I have faith that its developers will get that under control. I have similar faith that other LLM systems will also improve, offering better results and increased consistency. For now, I'm sticking with LanguageTool and Scribbr/QuillBot, but I have no doubt that LLM technology will assume an increasingly important role in language learning. (LanguageTool says it's "AI-based," so for all I know, it already uses an LLM in some way.)

Systems and Scores

Update: As noted above, I have added Anthropic's Claude to my tested systems, and it outperformed everything in the list below. Details are in my July 20 post.

I submitted the six single-sentence tests from my first post plus the new approximately-two-page letter to the grammar checkers in my first post as well as to ChatGPT, Gemini, and Copilot. As noted above, I disregarded the results from DeepL Write, TextGears, and Duden Mentor. I did the same for Studi-Kompass, which I was again unable to coax any error-reporting out of. I scored the remaining systems as in my original post, awarding up to four points for each error in the test set. A total of 84 points was possible: 24 from the single-sentence tests and 60 from the letter. These are the results:

ChatGPT: 76 points (90% of possible)
LanguageTool: 70 (83%)
Scribbr/QuillBot: 50 (60%)
Gemini: 48 (57%)
Sapling: 41 (49%)
Copilot: 40 (48%)
Rechtschreibprüfung24/Korrekturen: 39 (46%)
Google Docs: 34 (40%)
GermanCorrector: 31 (37%)
Online-Spellcheck: 26 (31%)
Microsoft Word 2010: 16 (19%)
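For anyone who wants to check the arithmetic, here's a minimal Python sketch that re-derives the percentages from the point totals. It assumes each percentage is simply points divided by the 84 possible (21 errors at up to four points each), rounded to the nearest whole percent; that assumption reproduces every figure in the list, but it's my reconstruction, not a description of how the numbers were originally computed.

```python
# Re-derive the "% of possible" figures from the point totals listed above.
# Assumption: percentage = points / 84, rounded to the nearest whole percent.
ERRORS = 6 + 15          # errors in the six test sentences + errors in the letter
POINTS_PER_ERROR = 4     # up to four points awarded per error
MAX_POINTS = ERRORS * POINTS_PER_ERROR  # 84 = 24 (sentences) + 60 (letter)

SCORES = {
    "ChatGPT": 76,
    "LanguageTool": 70,
    "Scribbr/QuillBot": 50,
    "Gemini": 48,
    "Sapling": 41,
    "Copilot": 40,
    "Rechtschreibprüfung24/Korrekturen": 39,
    "Google Docs": 34,
    "GermanCorrector": 31,
    "Online-Spellcheck": 26,
    "Microsoft Word 2010": 16,
}


def percent_of_possible(points: int, max_points: int = MAX_POINTS) -> int:
    """Express a point total as a whole-number percentage of max_points."""
    return round(100 * points / max_points)


if __name__ == "__main__":
    # Print the systems from highest to lowest score, in the list's format.
    for name, points in sorted(SCORES.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {points} ({percent_of_possible(points)}%)")
```

Running it prints each system in the same "name: points (percent%)" form used in the list.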

Wednesday, June 19, 2024

How a Bad Monitor Port Thwarted my Move to macOS

In March, I decided to move from Windows to Mac. My goal was texting on the desktop, i.e., texting via my computer. Many people insist on texting, and I was able to communicate with them only through my iPhone. I wanted to be able to do it using my computer, as I did with email and WhatsApp.

Changing operating systems is always a production, but in thirty-plus years with Windows, I had never developed an affection for it, so my only real concern was that I'd have to give up the three-monitor setup I'd used for over a decade. I connect my monitors to a docking station made by Lenovo that's compatible with my Lenovo laptop. Apple doesn't make docking stations, but an associate at the local Apple store assured me I'd have no trouble using a third-party dock. He pointed me to Plugable. Plugable recommended their UD-ULTC4K, and I ordered one for use with the MacBook Pro M3 I purchased.

I bought a bunch of stuff at this time. I got a new keyboard with Mac-specific keys. I got a new monitor offering HDMI and DisplayPort inputs, because one of my current monitors didn't have either, and those are the video outputs on the Plugable docking station I'd ordered. I purchased new video cables to connect everything.

As we'll see, the new monitor is the villain of this story. We'll call it the ViewSonic.

Missing Monitors and Restless Windows

A few days after I started using the Mac-based system, I wrote Plugable about an intermittent problem I was having. It occurred only after a period of inactivity (POI), i.e., when the displays turned off because I wasn't interacting with the system. When I started using the Mac after a POI, one or two monitors might fail to wake up. The Mac would then shuffle the windows from the "missing" monitors to the monitor(s) it detected. Most of the time, only one monitor went blank. It was rarely the ViewSonic.

Two weeks of debugging with Plugable followed, during which logs were collected, software was updated, cables were swapped, a replacement dock was issued, and we started all over. Then I reported what I came to call restless windows:

On some occasions, after all three screens come up, some of the windows that should be on one screen have been moved to another one.

Neither Plugable nor I had theories about how this could happen, nor did we have ideas for further debugging the missing-monitor problem, which hadn't gone away. We agreed that I'd return their dock, and they'd refund my money. I was disappointed at how things worked out, but Plugable acquitted itself exemplarily throughout.

I replaced the Plugable dock with TobenONE's UDS033. It also yielded intermittent disappearing monitors after a POI. TobenONE's interest in helping me debug the problem was minimal, and it vanished entirely when they found I'd purchased the dock from Amazon instead of from them. I returned it.

The manager at the local Apple store was sympathetic about the trouble I was having. She suggested swapping out the MacBook to see if that was causing the problems, extending Apple's return period to facilitate the swap. I ordered a replacement computer matching the one I already had.

There are two basic MacBook Pro M3 models. The M3 Pro, which is what I had, natively supports up to two external monitors. To connect three, you have to use a docking station employing a technology that, from what I can tell, fools a MacBook into thinking there are two external monitors when in fact there are three. The big cheese among such technologies is DisplayLink. Both Plugable and TobenONE use it. Internet sentiment towards DisplayLink is lukewarm, but if you need to connect three monitors to an M3 Pro MacBook, it's your primary choice.

The other MacBook Pro M3 model is the M3 Max. It's more expensive than the M3 Pro, but it natively supports up to three external monitors. I'd originally purchased an M3 Pro, and the replacement I'd ordered was also an M3 Pro, but while it was in transit, I realized that by upgrading to an M3 Max, I could eliminate the need for a docking station as well as DisplayLink, thus simplifying the debugging of my problem. I canceled the M3 Pro replacement before it was delivered, and I ordered an M3 Max. Apple was unfazed by this, but my credit card company was on high alert, noting that my pricey orders from Apple were unprecedented and asking for confirmation each time.

I was excited about the M3 Max. With docks and DisplayLink out of the picture, surely my missing-monitor and restless-windows problems would disappear!

Um...no. Which is not a surprise, because I've already told you that the source of my display drama was a bad port on the ViewSonic. At the time, I didn't know that.

I connected my monitors to the M3 Max and arrayed them side by side. The ViewSonic was in the middle, because it was the newest and had the spiffiest specs. Not long afterward, I had the bizarre experience of returning to my computer after a POI and seeing that the windows on my left and right monitors had swapped! Several online discussions showed that I was not the first to experience this. I logged several instances before calling Apple. One tech remarked, "Yeah, that happens to me, too." A second told me, "Engineering has an open issue on that." There was every reason to believe that this was a MacBook problem.

Notice that the window-swapping behavior did not involve the ViewSonic. The restless windows afflicted only the side monitors. The ViewSonic sat quietly in the middle with an innocent look on its face.

Apple told me they'd look into the problem and get back to me. In the meantime, I logged what happened after every POI. On Day 1, the windows on the left and right monitors swapped six times out of seven POIs. On Day 2, eight of ten POIs resulted in window swaps.

Then things got strange.

The Music app often jumped from the right monitor to the middle one, even though other windows on the right monitor stayed put. Sometimes the windows from the right monitor moved to the left monitor, but the windows on the left monitor remained in place. (That's half a swap.) I started taking screen shots before POIs (i.e., just before I left the computer for a while) for comparison with what I saw after a POI. I found that my restless windows sometimes did more than just jump from one monitor to another. They might take on a different size or their position on the monitor might change. Or, as in this example, both:

Screen shots before POI (above) and after (below). The ViewSonic is the middle monitor, where nothing changes.

I gave up. I'd been battling missing monitors and restless windows for three months, and there was no end in sight. The Internet showed that others had the kinds of problems I did, and they couldn't solve them. Apple support reps told me they experienced the behavior I did, and I'd received no follow-up from Apple Engineering. I returned my MacBook on the last day of its return period. I was sad to do it, because I'd been just as pleased with texting on the desktop as I'd hoped, and I'd grown fond of photos taken on my iPhone magically appearing on the MacBook. Apple's reputation for integration isn't for nothing. But convenient texting and synchronized photos weren't enough to compensate for nondeterministic window sizing and placement each time I returned to my machine. I retreated to my Windows system to lick my wounds and consider my options.

Back to Windows

I dusted off my Lenovo laptop and its docking station (literally!), hooked everything up, and booted into Windows. I futzed around and rued the loss of my texting window, then went away for a bit. When I came back, I was stunned to see that one of the windows that had been on my left monitor was now on the middle monitor! Hoping I had somehow imagined it, I moved it back where it belonged and left the machine for another POI. It had not been my imagination. The window was not where I'd left it. In addition, a window on the left monitor was now a different size!

You know those creepy scenes in movies and TV shows where the protagonist unplugs a TV or a computer monitor to make sure it's off, but it turns back on, anyway? It was like that. The Mac was gone, and I was using the same old Windows system I'd been using for years. Restless windows were impossible. And yet...

It took me a while to realize that it wasn't the same system I'd been using for years. I hadn't swapped back in the monitor the ViewSonic had replaced. That made the ViewSonic the only component common to the Mac-based system I'd been using and the Windows-based system sitting before me--the only component common to all configurations where I'd experienced windows on walkabouts.

I swapped out the ViewSonic for the monitor it had replaced. Everything worked fine. Time after time, my windows remained where I put them. It was apparent that the ViewSonic was anything but innocent.

Ports

There are two digital input ports on the ViewSonic: HDMI and DisplayPort (DP). (The monitor also has a VGA input, which is analog. That input isn't germane to the story, but it's worth taking a moment to marvel at the longevity of VGA, which debuted in 1987 and remains important enough that monitor vendors continue to support it.) I'd been using the ViewSonic's DP input, because the video output from the Lenovo docking station is DP, and I figured it would be better to go DP-to-DP than DP-to-HDMI.

The ViewSonic was clearly responsible for my weeks of video despair, but I didn't know if the problem lay with the monitor in general or with the DP port in particular. To find out, I swapped the ViewSonic back into the system, this time connecting to the HDMI port. A zillion POI trials convinced me that the monitor worked fine with HDMI. The problem had to stem from the DP connection.

That connection has three parts: the DP port on the docking station, the DP port on the ViewSonic, and the cable between them. I'd successfully used the DP port on the dock when testing the ViewSonic's HDMI connection, so the dock's DP output was in the clear. That meant the source of my monitor madness was either the ViewSonic's DP port or the cable leading to it. I bought a new cable and connected to the ViewSonic's DP port. My windows were restless again. Two cables with the same behavior meant the cable wasn't the problem. The guilty party had to be the ViewSonic's DP port. It was the last suspect standing.

To really clinch the case, I'd need to replace the ViewSonic with an identical monitor and verify that everything works over a DisplayPort connection with the replacement monitor. I'm working with ViewSonic on that now.

Update: ViewSonic replaced my monitor, but the new monitor behaves the same as the old one: my windows wander when I use its DisplayPort input, but they stay put when I use its HDMI input. I can't explain this. The chance of both monitors being defective is nil. ViewSonic might be handling the DisplayPort protocol in a funky way, but if that were the case, I'd expect there to be lots of reports of the problem, and I don't see that. My windows wandered with four docking stations, three computers, and two operating systems, so the only common denominator is the ViewSonic monitor. And me, now that I think about it...

To me, the big mystery is how a bad port on one of three monitors in a system can, among other things, cause the window managers in two independent operating systems to swap windows on the monitors whose ports are not bad. If you have insight into this, please share!

Mac Thoughts

The case against the ViewSonic is rock-solid, but that doesn't mean the Mac is off the hook. The Internet reports of restless windows under macOS are still there, as are the comments from Apple's support reps acknowledging the problem. It could be that the ViewSonic is defective and Macs have unreliable multiple-monitor support. Testing this would require a fourth MacBook purchase (I wonder what my credit card issuer would think of that), replacing the ViewSonic with a known-good monitor, and seeing what happens.

It'd be easy enough to do, but I'm not sure I want to. During my three months with macOS, I devoted a great deal of time to debugging missing monitors and restless windows, but I also spent many hours familiarizing myself with the operating system and working within it. Desktop texting and synced photos were great, and moving from the M3 Pro to the M3 Max was the kind of smooth experience that drives home just how sadistic Microsoft's Windows-to-Windows migration process is. (My PC is 11 years old, in no small part because moving to a new machine is so painful.) macOS is a significant upgrade over Windows in important ways, especially if you have other Apple devices, e.g., an iPhone.

However, I found day-to-day life with macOS rather uncomfortable. The menu bar's fixed position at the top of the screen often means moving the mouse a large distance to get to it. I have 24-inch monitors, and those marathon moves got old quickly. Word on the Internet is that the decision to anchor the menu bar atop the screen dates to the original 1984 Macintosh computer. That machine had a nine-inch screen and ran one application at a time, always in full-screen mode. That's nothing like the world I live in. The Mac's fixed menu bar location feels like the 40-year-old design decision it apparently is. I don't think it's stood the test of time.

Keyboard shortcuts can reduce the need to go to the menu bar, I know, but there's a steep memorization curve for them, and not everything has a keyboard shortcut. I find Windows' per-window menu bar more usable.

macOS seems focused on applications, while Windows is built around windows. Under macOS, Command-Tab cycles through applications. The Windows equivalent, Alt-Tab, cycles through windows. If you have multiple windows for an application, they share a single entry in the macOS Command-Tab cycle. In the Windows Alt-Tab cycle, each window gets its own entry. (macOS does offer a way to cycle through all windows associated with an application: Command-↑/Command-↓ after Command-Tabbing to the application. I find it cumbersome.) The application-based focus on the Mac is so pronounced that you can hide all windows associated with an application, an operation with no Windows counterpart, as far as I know.

The Windows approach makes more sense to me. I partition my work into windows, not applications. I often have multiple independent windows open in a single application, especially browsers and spreadsheets. It's hard for me to think of situations where I'd like to close all windows associated with an application, but it's easy for me to think of situations where I'd like to close only some windows associated with an application. During my time with macOS, I tried to find scenarios where hiding made sense, but I came up empty. I ended up minimizing windows under macOS, just like I did under Windows, and I missed the ability to easily cycle through windows when I wanted to interact with one.

I was surprised at how often familiar content looked rather ugly under macOS. Messages in Thunderbird looked like everything was in bold face, while Excel spreadsheet content was so small that I had to bump the magnification up to 120% to view it comfortably. It's likely that there are configuration changes I could have made to address these issues, but I really expected that an Excel spreadsheet or a Thunderbird email message I'd created under Windows would look pretty much the same when viewed in the same application (often on the same monitors) under macOS.

On Windows, I run Excel 2010. On macOS, the closest I could get was Office 365. Excel 365 on macOS lacks customization options present in Excel 2010 under Windows. The Quick Access Toolbar (QAT) on the Mac is fixed above the ribbon, for example, while in my Windows version, you can move it below the ribbon--which I do. Some commands I've got on the QAT in my Windows version of Excel--Font Name, Font Size, and Insert Symbol--can't be put in the QAT in Excel 365 on macOS. These limitations are Microsoft's fault, not Apple's, but they still chafe.

A third-party Excel plug-in I use works quite differently and less conveniently under macOS than under Windows. That's neither Microsoft's nor Apple's fault, but it further detracts from the overall Excel experience on a Mac. That matters to me, because I use Excel a lot. Given enough time and effort, I'm sure I could get used to the Mac version of Excel or I could switch to a different spreadsheet program, but between an inconveniently located menu bar, an emphasis on applications over windows, ugly window content, and restricted Excel functionality, the total cost of texting on the desktop comes to a lot more than I'd expected.

Plan B

At least that approach to texting on the desktop does. There is another way. If my going to macOS is too much trouble, it's supposed to be possible to use remote desktop software to bring macOS to me. This requires a Mac in addition to a Windows machine, but I happen to have an old MacBook Air floating around (as it were). I should be able to run a remote desktop server on the MacBook and a remote desktop client on Windows, thus giving me a way to use apps on the Mac--notably iMessage--from Windows. With suitable remote desktop support, I should be able to copy and paste from one machine to another, and the net effect should be pretty close to running Mac apps locally.

The remote desktop software most frequently mentioned for this is Google's Chrome Remote Desktop (CRD). I gave it a quick try, and, well, if you accidentally make the Mac both the CRD server and client, you end up with a screen that looks like this:

Obviously, I have more work to do.

Wednesday, May 29, 2024

Five Years of no EV for Me

Five years ago this week, my search for an electric compact SUV ended with me buying a conventional gas-powered car. I disliked the car (a Nissan Rogue) within a month of buying it, and I've been on the lookout for an electric replacement ever since.

Over the years, I've vented my frustrations with the EV market in a number of blog posts, sometimes focusing on their high cost and sometimes on the lack of models with the basic features I'm looking for: all-wheel drive, an openable moonroof, a 360-degree camera, and an EPA range of at least 235 miles. In November, I discussed the luxury-car-level pricing of the only two cars that meet these criteria: the Nissan Ariya and the Volvo XC40 Recharge. Since then, the only things that have changed are the name of the Volvo (now called the EX40) and the elimination of the prospect of Chinese imports pushing down EV prices. (The US government has adopted a policy of keeping them out of the market.)

In the meantime, what I'm looking for in an EV has evolved a bit. I still want all-wheel drive, an openable moonroof, and an all-around camera, but I now want a car on the shorter end of the compact SUV spectrum. My Rogue is 185 inches long. I'd prefer no more than 180 inches. (Tesla's Model Y is 187 inches. Ford's Mustang Mach-E is 186. VW's ID.4 is 181.)

My thinking about range has also changed. EV ranges can't touch those of gas-burners, so I understood that distance driving in an EV requires planning. But this didn't strike me as a problem. If you're driving, say, 400 miles to get from Point A to Point B, stopping for a half hour at 200 miles to recharge isn't a hardship. That's 3-4 hours of driving, and who doesn't want to stop at that point to stretch their legs, use the bathroom, grab a snack, etc.?

A recent trip made me realize that not all long drives consist of extended driving sessions. My wife and I put 400 miles on a tank of gas while meandering along the southern Oregon coast. Most of our driving sessions were under an hour, because we stopped at various beaches (including, of course, Meyers Creek Beach, which is at the mouth of Myers Creek) and small coastal and inland towns (e.g., Bandon, Coquille, Port Orford, and Gold Beach). Many of the places we stopped had no facilities of any kind, much less charging stations.

According to the Oregon Clean Vehicle Rebate Program EV Charging Station Map, the charging options in Gold Beach, where we spent one night, consist of one 120V charging outlet for E-bikes and one 120V outlet for use by hotel and shop patrons on the opposite side of the river from where we were staying. A two-port charging station at a motel we weren't staying at is noted as "coming soon." The source for this information is shown as PlugShare, but the PlugShare web site shows only the "coming soon" charger, so it's possible that there aren't any charging options in Gold Beach at all.

This casts the "planning" aspect of traveling by EV in a different light. If you're off the beaten track and poking along in a gas-powered car, stopping here and there as the whim strikes you, you can take refueling opportunities for granted. (There are four gas stations in Gold Beach.) In an EV, you may have to actively seek out recharging options. You really do have to plan.

I haven't yet decided what that means for me as a potential EV owner. It's a simple fact that EV ranges are notably lower than ICE ranges, and it's an equally simple fact that the EV charging infrastructure is much less well developed than the gas station network. For the foreseeable future, buying an EV means accepting those facts and finding ways to cope with them. If I really want to own an EV, I'll have to figure out how to do that.

I probably have plenty of time. The only EV that satisfies my basic criteria and fulfills my new not-longer-than-180-inches criterion is the Volvo EX40 (née XC40 Recharge). MSRP as I'd like it equipped is nearly $62,000, which is about $20,000 more than my budget.

However, I have a gas-powered riding lawn mower that's not likely to last a lot longer. I'm already thinking of replacing it with an electric version. It could be that the form my first EV takes will be that of a machine you sit on top of and cut grass with.

Monday, May 20, 2024

German Grammar Checkers

I can speak some German. I'll never be fluent, but I can usually get by. Sadly, I make a lot of grammatical errors. It'd be nice to have a tool that could help me find and eliminate them. Syntax and grammar are structured things, seemingly tailor-made for algorithmic analysis. Surely there is software that can analyze my sentences, point out places where I've broken the rules, and tell me how to fix things!

There is. I recently tested more than a dozen programs and web sites that offer this service. The results were less impressive than I'd expected. On my (tiny and unrepresentative) set of sentences containing errors, most tools failed to find most of the errors. When they did find an error, the suggested fix was often wrong.

I found these sites to offer the most useful results:

LanguageTool describes itself as an AI-based spelling, style, and grammar checker. My sense is that the focus is on spelling and grammar, not style. I've found it to do a pretty good job, though there are errors it misses.
Scribbr bills itself simply as a grammar checker. It also produces good results, though a hair below those of LanguageTool.
DeepL Write claims that its AI approach yields "perfect spelling, grammar, and punctuation" and provides alternative phrasings that "sound fluent, professional, and natural." This means it may rewrite your text to not just eliminate mistakes, but also to make it sound different (presumably better). In my experience, it does a very good job of finding and eliminating errors, but it's sometimes difficult to determine whether it changed something because it's incorrect or because it just felt like rewording it.

In daily use, I generally feed my writing to both LanguageTool and Scribbr, because they're fast, and each sometimes finds mistakes the other misses. If I'm extra-motivated, I also turn to DeepL Write. I've found it to identify mistakes the others miss. I don't use DeepL Write all the time, because I find it annoying to have to tease out whether it changed something on the grounds of correctness or stylistic whim.

In addition to these sites, I also (very cursorily) tested the following systems. I found them to produce notably inferior results. I've listed them in order of decreasing performance, based on my (really limited) tests:

QuillBot is a sister company to Scribbr that presumably uses the same underlying technology. I found that the two systems generally give identical results. There are exceptions, however, and in those cases, I found that Scribbr did a better job.
Google Docs can be configured to check spelling and grammar as you type. In my testing, it delivered mediocre results.
Sapling also produces mediocre results, but it often says "Sign in to see premium edit." I didn't do that, so I can't comment on its premium edits.
Microsoft Word, like Google Docs, can be configured to check for spelling and grammar errors as you type. On my tests using Word from Microsoft 365, its coverage was inferior to Google's.
Rechtschreibprüfung24 and Korrekturen produced the same results in my testing, so it's possible that they use the same underlying (and unimpressive) checking engine.
TextGears and GermanCorrector also produced the same results on my tests, so it's possible that they share a checking engine. The results are similar enough to those from Rechtschreibprüfung24 and Korrekturen that it's conceivable that all four use the same underlying technology. In addition, OnlineKorrektor.de looks and acts identically to GermanCorrector, so it could be that there are two URLs for a single underlying checker.
Duden Mentor is the only system I tested that flags the errors it finds, but doesn't offer suggestions on how to fix them.
Online-Spellcheck couples its poor ability to find mistakes with a checking speed that is notably worse than its competitors. In addition, it replaces its input window with an output window, so you can't just paste new text in to check something different.
Studi-Kompass found none of the errors in my tests. That suggests that it wasn't working or that I was doing something wrong.

I must reiterate that my testing was very limited, so my conclusions are tenuous. If you know of more comprehensive comparisons of German grammar checkers, please share what you know in the comments!

My testing focused on incorrect articles, because that's a problem area for me. I used the following test sentences, where I've boldfaced the part of each sentence that's wrong. I realize that if you know German, you will recognize what's wrong without my help, and if you don't know German, you'll just see randomly boldfaced text, but I can't resist the Siren's call of the boldface error indicator.

Das Tisch sieht gut aus.
Ich gehe im Küche.
Ich bin in die Küche.
Ich will einen Ort finden, die schön aussieht.
Beim Check-in haben wir die Größe des Lobbys bewundert.
Schließlich habe ich mich entschlossen, dass ich einen Ort finden musste, der zwischen Singapur und den USA liegt (d.h., der auf dem Heimweg ist), und die gute Flugverbindungen hat.

I invented sentences 1-3 as representing common simple errors. Sentences 4-6 are from or are variations on things I've actually written.

I scored the systems' results as follows:

2 points if the error was found.
2 more points if only one fix was suggested and it was correct; 1 more point if more than one fix was suggested, but the correct one was among them.
-1 point if only incorrect fixes were suggested.
-1 point if rewrites were suggested beyond what was in error. (This is designed to penalize DeepL Write for mixing error corrections and stylistic rewrites.)

If a system found the error in a test sentence and suggested the proper fix (and it didn't suggest anything else), it got the full four points. If it found the error, but it didn't suggest the proper fix, or if it muddied the water with rewrites unrelated to the error, it got between one and three points, depending on the details of what it did.
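The rubric is simple enough to express as code. What follows is an illustrative sketch, not the process behind the scores reported here. The `SentenceResult` record, the `score_sentence` and `total_score` functions, and the choice to compare fixes as exact strings are all hypothetical. The rubric also doesn't say what happens when a system misses the error. The sketch assumes that no fix-related points apply in that case, although the rewrite penalty still can.

```python
from dataclasses import dataclass


@dataclass
class SentenceResult:
    """What one checker did with one test sentence (hypothetical record type)."""
    found_error: bool     # did it flag the erroneous part of the sentence?
    fixes: list[str]      # suggested replacements for the flagged text
    correct_fix: str      # the replacement that actually fixes the error
    extra_rewrites: bool  # did it also change text that wasn't in error?


def score_sentence(r: SentenceResult) -> int:
    """Score one sentence under the rubric; the maximum is 4 points."""
    score = 0
    # Assumption: fixes only count when the error itself was found.
    if r.found_error:
        score += 2                      # 2 points for finding the error
        if len(r.fixes) == 1 and r.fixes[0] == r.correct_fix:
            score += 2                  # a single fix, and it's correct
        elif len(r.fixes) > 1 and r.correct_fix in r.fixes:
            score += 1                  # the correct fix is among several
        elif r.fixes:
            score -= 1                  # only incorrect fixes were offered
        # No fixes at all (a flag-only checker): no adjustment.
    if r.extra_rewrites:
        score -= 1                      # penalty for unrelated rewrites
    return score


def total_score(results: list[SentenceResult]) -> tuple[int, float]:
    """Return (points, percent of the 4-points-per-sentence maximum)."""
    points = sum(score_sentence(r) for r in results)
    possible = 4 * len(results)
    return points, 100 * points / possible


# "Das Tisch sieht gut aus." should become "Der Tisch sieht gut aus."
print(score_sentence(SentenceResult(True, ["Der"], "Der", False)))  # 4
print(score_sentence(SentenceResult(True, ["Der"], "Der", True)))   # 3
print(score_sentence(SentenceResult(True, [], "Der", False)))       # 2
```

With six sentences, the maximum from `total_score` is 24 points. A system that finds every error and suggests only the right fixes earns all of them. Each DeepL-style stylistic rewrite costs a point.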

A perfect score for the set of six sentences would be 24 points. LanguageTool did best, with 21. Scribbr was close behind at 20 points. DeepL Write got 19. Then there was a gap until QuillBot's 16 points. Google Docs scored 14, Sapling 13, and Microsoft Word 10. Rechtschreibprüfung24, Korrekturen, TextGears, and GermanCorrector/OnlineKorrektor.de clumped together with 6 points, which is one reason I suspect they may all be using the same checking technology. Duden Mentor also got 6, but its behavior is quite different from the other systems with that score. Online-Spellcheck got 5 points. Studi-Kompass got none, but, as I noted above, my guess is that either the system wasn't working or I was doing something wrong.

Tuesday, February 20, 2024

Tracking Travel

I like to travel. I've been a few places. I have a map on the wall with pins where I've been. Old School, but I started it before the Internet existed. I'd like to move it into the digital era. Looking into that led me to Most Traveled People (MTP) and NomadMania (NM). Both offer the ability to generate maps of places visited based on data you enter. I tried both. The NM data entry process was so slow and cumbersome, I gave up. MTP worked better. The map it produced showing the countries I'd been to makes me look pretty well traveled:

This is terribly misleading. Country-level granularity means that if you visit only a single place in a big country (e.g., the USA, Canada, Russia), the map makes it look like you visited the whole thing. Recognizing this, both MTP and NM break the world into much smaller regions, 1500 in the case of MTP and 1301 in the case of NM. My MTP region map is not just less impressive, it's frankly a little depressing for somebody who feels like he's been around:

The region-based approach is better than one based on countries, but as I was entering the data for my travels in the United States, I found that MTP treats a few states as multiple regions. California, for example, comprises four regions, and Texas three. (NM does the same thing.) Like most states, Oregon--my state--is a single region, and my state pride was wounded at the idea that Colorado is broken into east and west, and Georgia into north and south, yet all of Oregon is thrown into a single basket. Eastern and western Oregon differ greatly in terms of geography, climate, economics, politics, and culture. Having been to one of them doesn't mean you've been to the other in any meaningful way.

Breaking the world into regions and tracking who's been where is good for ranking people in terms of how geographically widespread their travels have been. Such rankings are the bread and butter of MTP and NM. I was surprised to find that I'm a comparative couch potato. Kayak tells me that since I started using it in 2011, I've traveled nearly 500 days, flown over a half million miles, and been in 17 time zones, yet those trips plus my pre-Kayak travels let me lay claim to barely 10% of MTP's 1500 regions. With the paltry 43 countries I've visited, I'm not even halfway to qualifying for the Travelers' Century Club. From the perspective of competitive travel, I might as well not even have a passport.

Fortunately, I'm not out to engage in big-league travel competition. I just want a digital approach to tracking where I've been. For that purpose, I'm thinking a custom Google Map with digital push-pins is the way to go. It's basically the same thing I've got on my wall now, except in digital form.
