// Hacker Noon · 16 February 2026
This AI Scored 67% in the US Medical Exam And Here's Why That Matters
Researchers at Google Research and DeepMind introduce MultiMedQA, a broad medical benchmark, and Med-PaLM, a medically aligned large language model. Flan-PaLM sets new records on medical exams, including 67.6% on USMLE-style questions. But human evaluation reveals safety gaps. With instruction promp...
@hacker-noon · Linh Dao Smooke

hackernoon.com