AI may not care whether humans live or die, but tools like ChatGPT will still affect life-and-death decisions — once they become a standard tool in the hands of doctors. Some are already experimenting with ChatGPT to see if it can diagnose patients and choose treatments. Whether this is good or bad hinges on how doctors use it.
GPT-4, the latest model behind ChatGPT, can get a perfect score on medical licensing exams. When it gets something wrong, there’s often a legitimate medical dispute over the answer. It’s even good at tasks we thought took human compassion, such as finding the right words to deliver bad news to patients.
These systems are developing image processing capacity as well. At this point you still need a real doctor to palpate a lump or assess a torn ligament, but AI could read an MRI or CT scan and offer a medical judgment. Ideally AI wouldn’t replace hands-on medical work but enhance it — and yet we’re nowhere near understanding when and where it would be practical or ethical to follow its recommendations.
And it’s inevitable that we will use it to guide our own healthcare decisions, just the way we’ve been leaning on “Dr. Google” for years. Despite more information at our fingertips, public health experts this week blamed an abundance of misinformation for our relatively short life expectancy, a problem that might get better or worse with GPT-4.
Andrew Beam, a professor of biomedical informatics at Harvard, has been amazed at GPT-4’s feats, but told me he can get it to give him vastly different answers by subtly changing the way he phrases his prompts. For example, it won’t necessarily ace medical exams unless you tell it to ace them by, say, telling it to act as if it’s the smartest person in the world.
He said that all it’s really doing is predicting what words should come next — an autocomplete system. And yet it looks a lot like thinking.
“The amazing thing, and the thing I think few people predicted, was that a lot of tasks that we think require general intelligence are autocomplete tasks in disguise,” he said. That includes some forms of medical reasoning.
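To make the autocomplete idea concrete, here is a minimal sketch in Python of a toy next-word predictor. It is an illustration only, not how GPT-4 works internally: GPT-4 uses a large neural network, whereas this toy simply counts which word most often follows another in a tiny made-up corpus. The function names `train_bigrams` and `autocomplete` and the sample sentences are hypothetical, chosen for this example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def autocomplete(follows, prompt, max_words=5):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never seen with a successor
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up training text, purely for illustration.
corpus = (
    "the patient has a fever . "
    "the patient has a cough . "
    "the patient has a fever and a rash ."
)
model = train_bigrams(corpus)
print(autocomplete(model, "the patient", max_words=3))  # the patient has a fever
```

Even this crude counter completes "the patient" with the most common continuation in its training text. The systems Beam describes do something far more sophisticated, but the basic task of choosing a likely next word is the same.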
The whole class of technology, large language models, is supposed to deal exclusively with language, but users have discovered that teaching them more language helps them to solve ever more complex math equations. “We don’t understand that phenomenon very well,” said Beam. “I think the best way to think about it is that solving systems of linear equations is a special case of being able to reason about a large amount of text data in some sense.”
Isaac Kohane, a physician and chairman of the biomedical informatics program at Harvard Medical School, had a chance to start experimenting with GPT-4 last fall. He was so impressed that he rushed to turn the experience into a book, The AI Revolution in Medicine: GPT-4 and Beyond, co-authored with Microsoft’s Peter Lee and former Bloomberg journalist Carey Goldberg. One of the most obvious benefits of AI, he told me, would be in helping reduce or eliminate the hours of paperwork that now keep doctors from spending enough time with patients, a burden that often leads to burnout.
But he’s also used the system to help him make diagnoses as a pediatric endocrinologist. In one case, he said, a baby was born with ambiguous genitalia, and GPT-4 recommended a hormone test followed by a genetic test, which pinpointed the cause as 11-hydroxylase deficiency. “It diagnosed it not just by being given the case in one fell swoop, but asking for the right workup at every given step,” he said.
For him, the value was in offering a second opinion — not replacing him — but its performance raises the question of whether getting just the AI opinion is still better than nothing for patients who don’t have access to top human experts.
Like a human doctor, GPT-4 can be wrong, and not necessarily honest about the limits of its understanding. “When I say it ‘understands,’ I always have to put that in quotes because how can you say that something that just knows how to predict the next word actually understands something? Maybe it does, but it’s a very alien way of thinking,” he said.
You can also get GPT-4 to give different answers by asking it to pretend it’s a doctor who considers surgery a last resort, versus a less-conservative doctor. But in some cases, it’s quite stubborn: Kohane tried to coax it to tell him which drugs would help him lose a few pounds, and it was adamant that no drugs were recommended for people who were not more seriously overweight.
Despite its amazing abilities, patients and doctors shouldn’t lean on it too heavily or trust it too blindly. It may act like it cares about you, but it probably doesn’t. ChatGPT and its ilk are tools that will take great skill to use well, but exactly which skills they will require isn’t yet well understood.
Even those steeped in AI are scrambling to figure out how this thought-like process is emerging from a simple autocomplete system. The next version, GPT-5, will be even faster and smarter. We’re in for a big change in how medicine gets practiced — and we’d better do all we can to be ready.