AI may ace multiple-choice medical exams, but it still stumbles when faced with changing clinical information, according to new research in the New England Journal of Medicine.
University of Alberta neurology resident Liam McCoy evaluated how well large language models perform clinical reasoning: the ability to sort through symptoms, order the right tests, weigh new information and reach the correct conclusion about what's wrong with a patient.
He found that advanced AI models struggle to update their judgment in response to new or uncertain information, and they often fail to recognize when a piece of information is completely irrelevant. In fact, some recent improvements designed to strengthen AI reasoning have made the models' overconfidence worse.
It all means that while AI may do really well on medical licensing exams, there's a lot more to being a good doctor than instantly recalling facts, says McCoy.