// Hacker Noon · 16 February 2026

This AI Scored 67% in the US Medical Exam And Here's Why That Matters

Researchers at Google Research and DeepMind introduce MultiMedQA, a broad medical benchmark, and Med-PaLM, a medically aligned large language model. Flan-PaLM sets new records on medical exams, including 67.6% on USMLE-style questions. But human evaluation reveals safety gaps. With instruction promp...

@hacker-noon · Linh Dao Smooke
