HealthBench: OpenAI’s New Standard for Evaluating AI in Healthcare

How do you know if an AI model is actually good at medicine? It’s a harder question than it sounds. Answering USMLE-style exam questions correctly is one thing. Responding well to a panicked neighbor asking, in Swahili, what to do about an unresponsive elderly man with no known health history is another. HealthBench, released by OpenAI in […]