Upshot: My EV options remain lousy, they'll probably get worse, and AI chatbots are still terrible search assistants.
Roughly once a year I survey the field of electric vehicle (EV) offerings to see what's available that satisfies my basic criteria. Those criteria are summarized in a prompt I started feeding to various AI chatbots last year. (Before that, I surveyed the field manually; my first blog post on the topic was in 2020.) Here's the prompt:
List all the fully electric compact SUVs for sale in the United States that have all-wheel drive, an openable moonroof or sunroof, an all-around (i.e., 360-degree) camera, an EPA range of at least 250 miles, and are no more than 180 inches in length.
My experiences with the seven chatbots I tested last year were so bad, I wrote only about them and skipped over the fact that only one EV met my criteria. This year I'll write about both EVs and chatbots, but the story's pretty much the same across the board: nothing's improved on the EV front, and even though I tested two additional chatbots this year, their collective performance as search assistants remains terrible.
Compact SUV EVs
Let's start with the EV situation. Last year, the only car that satisfied my criteria was the Volvo EX40. That remains the case this year. Part of the problem is that my length limit of 180 inches excludes almost all compact SUVs. However, upping it to 185 inches (near the top end for the segment) doesn't help much, because my desire for an openable roof is almost as constraining as the 180-inch limitation. If we assume that manufacturers produce the cars the market wants, it's not unreasonable to conclude that the underlying problem here is me, not the EV industry.
But I think there's more to the story. My chatbot prompt includes only my basic criteria, not all of them. I think a look at the bigger picture makes me look, well, somewhat less unreasonable.
Consider the Volvo EX40, the sole EV satisfying my basic criteria. A secondary criterion is that for EVs with CarPlay, it has to be wireless CarPlay. If a manufacturer refuses to support CarPlay on principle (as is the case with GM and Rivian), that's one thing, but if they offer only wired CarPlay, that just makes me mad. Wireless CarPlay is now standard on cars costing well under $30,000 (e.g., Hyundai Kona, Kia K4), and the fact that many EVs offer it means there's no EV-related technical limitation. Volvo is a luxury brand, and a nicely equipped EX40's MSRP runs nearly $60,000. Yet its CarPlay is wired. I use CarPlay virtually every time I get in my car. For sixty grand, I'm not going to dig my phone out of my pocket and fumble with a cord every time I drive. Nor am I going to buy a third-party adapter and hope for the best. What I'm going to do is turn up my nose at the car and the inexplicable management that produced it.
Something else I'm not going to do is put up with a car that gives me trouble. I expect my cars to work. Always. Completely. One of the ways I try to ensure that is by paying attention to Consumer Reports' reliability ratings. CR is far from perfect, but I believe that, unlike virtually every other source that reports on cars, it is independent of the industry and bases its evaluations on reasonably objective data. No car gets on my short list unless it's got the CR "Recommended" badge (which denotes statistical reliability, safety, and owner satisfaction). The EX40 doesn't have the badge.
Nor do the Audi Q4 E-Tron and the Kia EV6. These cars pass my basic screening if I up the length limit to 185 inches. Both offer openable roofs. They also offer wireless CarPlay. But CR's reliability score for the Q4 E-Tron is 27 out of 100, and for the EV6 it's 25. People purchase these cars, so there are definitely buyers not put off by such poor reliability numbers. I'm not one of them.
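The screening described in this section can be written down as a small filter, which makes it easier to see how each extra requirement shrinks the list. What follows is only a sketch in Python: the field names are my own, the treatment of CarPlay (reject wired-only, tolerate none at all) is one reading of the secondary criterion above, and the three vehicles are hypothetical placeholders with made-up numbers, not real specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EV:
    name: str
    awd: bool
    openable_roof: bool
    camera_360: bool
    epa_range_mi: int
    length_in: float
    carplay: str          # "wireless", "wired", or "none" (assumed encoding)
    cr_recommended: bool  # has the Consumer Reports "Recommended" badge


def meets_basic(ev: EV, max_length_in: float = 180.0) -> bool:
    """Basic criteria from the search prompt; the length cap is adjustable."""
    return (
        ev.awd
        and ev.openable_roof
        and ev.camera_360
        and ev.epa_range_mi >= 250
        and ev.length_in <= max_length_in
    )


def makes_short_list(ev: EV, max_length_in: float = 180.0) -> bool:
    """Basic criteria plus the secondary ones: no wired-only CarPlay, CR badge."""
    return meets_basic(ev, max_length_in) and ev.carplay != "wired" and ev.cr_recommended


# Hypothetical placeholder vehicles. These are NOT real cars or real specs.
CANDIDATES = [
    EV("Hypothetical A", True, True, True, 260, 175.0, "wired", False),
    EV("Hypothetical B", True, True, True, 280, 183.0, "wireless", False),
    EV("Hypothetical C", True, False, True, 300, 178.0, "wireless", True),
]

basic_180 = [ev.name for ev in CANDIDATES if meets_basic(ev)]
basic_185 = [ev.name for ev in CANDIDATES if meets_basic(ev, 185.0)]
short_list = [ev.name for ev in CANDIDATES if makes_short_list(ev, 185.0)]

print("Basic criteria, 180 in:", basic_180)
print("Basic criteria, 185 in:", basic_185)
print("Short list, 185 in:", short_list)
```

With these placeholders, relaxing the length limit adds one car to the basic list, but the short list stays empty, which is the shape of the problem described above.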
Driver Monitoring Systems and the False Positive Problem
In time, the EX40 could get wireless CarPlay, and more EVs could build up badge-worthy reliability scores, but time brings a new issue with it: ineluctable driver monitoring systems (DMSes). The 2021 Infrastructure Investment and Jobs Act requires the NHTSA to develop rules that, when they go into effect, will mandate that cars monitor drivers for signs of impairment (e.g., being drunk, high, or drowsy) and take steps to prevent driving under those conditions.
The NHTSA has not finalized the rules, so manufacturers don't have to do anything yet, but they seem to be preparing for the future by installing DMSes (e.g., steering wheel torque sensors and driver-monitoring cameras) on many models. Among compact SUV EVs, for example, the IONIQ 5, the Toyota C-HR, and the Lexus RZ all have DMS cameras in at least some trims.
In principle, I have no objection to DMSes. There are privacy issues to be ironed out, but I think the bigger challenge is getting past the problem of false positives: cars that detect driver impairment or lack of engagement when there isn't any. False positives lead to misleading, distracting, and possibly hazardous audible and/or visual and/or haptic alerts.
My personal experience with DMSes has not been encouraging:
- I've disabled the torque-based "hands on the wheel" systems in cars I've owned, because I've found it tiring to have to fight the automated steering associated with lane-keep-assist so that my car won't incorrectly complain that I'm not holding the wheel.
- A DMS-camera-based system in a car I was test driving issued an invalid "hands on the wheel" warning on the freeway. The warning was threefold--audible, visual, and haptic--and constituted a distraction that would be dangerous in any car, but was especially so in one I wasn't familiar with.
- Blind spot warning systems are fabulously naive. I have two cars with them, and they go off frequently when I'm in the outer lane of a two-turn-lane configuration. They also chirp from time to time for no apparent reason, even when there is no other car around.
- Emergency braking systems occasionally flash giant BRAKE! warnings (often accompanied by an audible alert) when there is nothing to brake for. These generally disappear immediately after they appear, presumably because the system realized it made an error, but they are disconcerting for the driver, who finds himself (or, in the case of my wife, herself) desperately looking around for the phantom obstacle he (or she) allegedly overlooked.
- My wife and I each have cars with parking sensors that beep nearly every time we back out of our garage. They fail to distinguish between "there is something near the path your car is taking" and "there is something in the path your car is taking." It's an irritating enough start to nearly every trip that I've disabled the rear sensors.
I believe that automobile manufacturers are installing DMSes and other safety systems before they are sufficiently mature. When such systems are optional (e.g., can be easily disabled), I have no objection, but the DMSes on an increasing number of cars cannot be practically avoided. Driver-facing cameras, for example, typically cannot be turned off and issue repeated warnings if they are covered.
In my view, deploying un-disableable safety systems with high false-positive rates is irresponsible. This is an area where Consumer Reports and I have quite different opinions. From what I can tell, CR doesn't consider the spurious warnings associated with false positives a safety risk. I do. I also think that the irritation and frustration stemming from false-positive-based warnings constitute hazards in their own right. Drivers paying a continual annoyance tax in the form of baseless warnings can't possibly be as attentive as they would be if their cars didn't incessantly cry wolf.
Automakers don't see things my way, of course, so even without a regulatory mandate, my expectation is that in the coming years, always-on DMSes will become the norm (for both electric and petroleum-powered vehicles). Unless these systems' false-positive rates improve dramatically, that will be reason enough for me to avoid buying.
Convertible EVs
Since I last wrote about EV convertibles 11 months ago, nothing has changed. Well, that's not quite true. The starting price of the only EV convertible you can buy in the United States, the Maserati GranCabrio Folgore, has increased from $206,700 to $209,350. Perhaps you snapped one up earlier this year when Maserati discounted the price by $85,000, thus reducing the cost of the 2025 GranCabrio Folgore to a trifling $123,000.
AI Chatbots as Search Assistants
Last year I fed my EV search prompt to seven AI chatbots. The results were terrible. This year I added two additional systems (DeepSeek and Grok) and tested a bit more extensively. The results were still terrible. (I tested only the free versions of these systems. It's possible that the paid versions would produce better results.) Here's the full list of systems I tested:
- ChatGPT
- Claude
- Copilot
- DeepSeek
- Gemini
- Grok
- Mistral
- Perplexity
- You.com
And here's my prompt again:
List all the fully electric compact SUVs for sale in the United States that have all-wheel drive, an openable moonroof or sunroof, an all-around (i.e., 360-degree) camera, an EPA range of at least 250 miles, and are no more than 180 inches in length.
I submitted this prompt under various conditions. (Details follow.) None of the systems consistently identified the Volvo EX40 as the only car that satisfies my criteria. Most of them never mentioned the car at all, not even by its former name, the XC40 Recharge. These systems--the utter and complete failures--are:
- ChatGPT
- Claude
- Copilot
- You.com
- Mistral
- Grok
Some systems failed by saying no cars satisfy my criteria. Some failed by listing only cars that don't satisfy the criteria. Some failed in both ways, depending on how I tested them.
Testing the systems was complicated by the fact that I play around with chatbots on an ongoing basis as a way of getting a sense for what they can do. While playing around with them on the topic of EVs, I sometimes tell them things or lead them to find out things they'd initially overlooked. For example, if a chatbot responds to my prompt and doesn't mention the EX40, I might say:
What about the Volvo EX40?
That generally causes the chatbot to recognize its error. However, the chatbot has thereby been "polluted" in its interactions with me (its developers would probably characterize this as learning), so feeding it my EV search prompt again as a test wouldn't lend any insight into how the chatbot might respond for other people. For this blog post, I tried to test the chatbots in ways that would prevent their recognizing that they were interacting with me. Where possible, I didn't log in, and in my browsers, I tried to shake off cookies that could identify me.
For chatbots where no login is required, I accessed them in anonymous browser windows using two browsers and two computers. With one of the browsers and one of the computers, I'd had virtually no interactions with any AI chatbots. For chatbots where I had to log in, I preceded my EV search prompt with this:
For the following question, forget every past chat we have ever had. Treat it as a completely new request from a completely new user. Do you understand?
My testing began only after the chatbot confirmed its understanding and compliance.
I ultimately employed this "forget your history with me" approach after logging in to all the systems, even the ones where I was able to test without logging in. I thus probed each system under as many of these conditions as I could:
- An anonymous Firefox window on Windows 11, no chatbot login.
- An anonymous Edge window on Windows 11, no chatbot login.
- An anonymous Firefox window on macOS Sonoma, no chatbot login.
- Logged in to the chatbot and primed with the "forget our history" prompt.
One aspect of the chatbots' terrible performance was the range of responses that even a single chatbot produced depending on the environment of the test. Perplexity, one of the three systems that mentioned the Volvo EX40 in response to my prompt, took home gold for the greatest breadth of inconsistency:
| Prompt Environment | EVs Listed by Perplexity |
|---|---|
| Anonymous Firefox window, Windows 11. | Chevrolet Equinox EV, Ford Mustang Mach-E, Tesla Model Y, Volvo EX30, Volvo EX40. |
| Anonymous Edge window, Windows 11. | VW ID.4, Hyundai IONIQ 5, Kia EV6, Tesla Model Y, Volvo EX30. |
| Anonymous Firefox window, macOS Sonoma. | Ford Mustang Mach-E, Tesla Model Y, Volvo EX30 Twin Motor Performance. |
| Logged in and told to forget all history. | Hyundai IONIQ 5, Volvo EX30. |
| Anonymous Firefox window, Windows 11 (repeat of first test). | Volvo EX30. |
The first and last lines of the table correspond to the same test conditions, but the responses are markedly different. Perplexity's behavior isn't just inconsistent, it's irreproducibly inconsistent. The other chatbots exhibited similarly inconsistent behavior. ChatGPT, for example, produced four different sets of results under the conditions listed in the table--hardly better than Perplexity's five.
The table shows that in only one of the five prompt environments did Perplexity indicate that the EX40 satisfied the search criteria. That's terrible, although arguably better than the six chatbots that never mentioned the car at all. DeepSeek was less bad, indicating that the EX40 was the sole qualifying EV in two of the three prompt environments in which I tested it. (DeepSeek doesn't permit anonymous use.) One of the responses was quite roundabout, however; you can view it here. Amusingly, DeepSeek's source for the EX40 satisfying my criteria is my blog post from last year, and one of the sources it cited when responding in my "forget your history with me" environment was a conversation I had with Mistral during last year's testing!
There was a single bright spot in my trials, and it was Gemini. Last year Gemini struck out, but check out its performance this time around:
| Prompt Environment | EVs Listed by Gemini |
|---|---|
| Anonymous Firefox window, Windows 11. | No cars satisfy the criteria |
| Anonymous Edge window, Windows 11. | Volvo EX40 |
| Anonymous Firefox window, macOS Sonoma. | Volvo EX40 |
| Logged in and told to forget all history. | Volvo EX40 |
| Anonymous Firefox window, Windows 11 (repeat of first test). | Volvo XC40 Recharge (aka EX40) |
It correctly identified the Volvo EX40 as the only car satisfying the criteria in three of the five test cases. In an additional case, it listed the XC40 Recharge, but it pointed out that it had been renamed EX40. Interestingly, the case it failed in was the first test in an anonymous window (the top line in the table). A repeat of this test in a new anonymous window (the bottom line in the table) yielded a different response. This second response is essentially correct. Whether you view that as good news or bad depends on how you feel about inconsistency. Personally, I am not a fan.
Gemini's performance is encouraging, but it's only one of nine systems I tried. My testing indicates that the conclusion I reached last year remains valid: AI chatbots, as a group, are a long way from being reliable search assistants.
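The tallies quoted for the Perplexity and Gemini tables can be reproduced with a short script. This is just a sketch in Python: the run data is transcribed from the two tables, while the function name, the summary fields, and the choice to offer an optional alias map (so that "XC40 Recharge" can count as the EX40) are my own assumptions about how to score things.

```python
# The one car that satisfies the criteria, per the analysis earlier in this post.
TARGET = frozenset({"Volvo EX40"})

# Optional normalization: treat the car's former name as the same answer.
ALIASES = {"Volvo XC40 Recharge": "Volvo EX40"}

# One list per table row, in table order.
PERPLEXITY = [
    ["Chevrolet Equinox EV", "Ford Mustang Mach-E", "Tesla Model Y", "Volvo EX30", "Volvo EX40"],
    ["VW ID.4", "Hyundai IONIQ 5", "Kia EV6", "Tesla Model Y", "Volvo EX30"],
    ["Ford Mustang Mach-E", "Tesla Model Y", "Volvo EX30 Twin Motor Performance"],
    ["Hyundai IONIQ 5", "Volvo EX30"],
    ["Volvo EX30"],
]
GEMINI = [
    [],  # "No cars satisfy the criteria"
    ["Volvo EX40"],
    ["Volvo EX40"],
    ["Volvo EX40"],
    ["Volvo XC40 Recharge"],
]


def summarize(runs, aliases=None):
    """Count distinct answers and how often the target car shows up."""
    aliases = aliases or {}
    sets = [frozenset(aliases.get(name, name) for name in run) for run in runs]
    return {
        "distinct_responses": len(set(sets)),
        "runs_naming_target": sum(1 for s in sets if TARGET <= s),
        "runs_exactly_target": sum(1 for s in sets if s == TARGET),
    }


print("Perplexity:", summarize(PERPLEXITY))
print("Gemini (strict names):", summarize(GEMINI))
print("Gemini (XC40 Recharge counted as EX40):", summarize(GEMINI, ALIASES))
```

Scored strictly, Gemini gets three exact answers out of five. Counting the renamed car as correct raises that to four, and the two anonymous Firefox runs on Windows 11 still disagree with each other either way.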
Aside: Chatbots and Image Generation for this Post
Before writing this post, my thinking was that it was about continuing failure. The continuing failure of the EV industry to offer a compact SUV that satisfies my criteria, the continuing failure of that industry to build EV convertibles for the US market, and the continuing failure of AI chatbots to acquit themselves as competent search assistants. As such, I thought it'd be fitting to start the post with a chatbot-generated image of failure, and I gave the following prompt to each system I tested:
I'm writing a blog post about an undertaking that ended in complete failure. Create an image suitable to accompany this blog post.
You.com, Mistral, and DeepSeek don't generate images, but the others do, so I had six pictures to choose from. None of them really wowed me, although this one from Gemini isn't bad:
Grok's image made me smile, because Grok is developed by xAI, which is owned by SpaceX. To them, I guess this is what failure looks like:
Because none of the generated images felt right, I prompted again:
Try something like a cake that's turned out completely wrong.
I got back pictures that matched the prompt, including this charming one from Copilot,
but none struck me as embodying the fundamental theme of this post.
I then hit upon the idea of an animated GIF showing a plane spiraling into the ground, and a Google search--remember those?--led me to this image, which was used in The Atlantic's article on, ironically, how to avoid spiraling into the ground. I'd have happily used the picture, but it's copyrighted, and I try to respect copyrights. So I fell back on the sincerest form of flattery and asked the chatbots to create a similar image:
Generate an image of a plane spiraling into the ground.
Three systems (Perplexity, Claude, and Copilot) refused my request on the grounds that they don't generate violent imagery. ChatGPT, Gemini, and Grok had no such compunction, but the pictures they generated were nowhere near as good as the one accompanying The Atlantic's article. (Score: Humans 1, Chatbots 0.) ChatGPT didn't seem to realize that a plane spiraling into the ground hasn't hit the ground yet,
while Grok didn't seem to understand that a plane spiraling into the ground is actually heading towards the ground:
In the end, I decided to go with no image at all, although I'd have used the one from The Atlantic had it not been copyrighted.




