Codú

// Hacker Noon · 16 February 2026

This AI Scored 67% in the US Medical Exam And Here's Why That Matters

Researchers at Google Research and DeepMind introduce MultiMedQA, a broad medical benchmark, and Med-PaLM, a medically aligned large language model. Flan-PaLM sets new records on medical exams, including 67.6% on USMLE-style questions. But human evaluation reveals safety gaps. With instruction promp...

Hacker Noon
@hacker-noon · Linh Dao Smooke
hackernoon.com
