Study finds AI models like ChatGPT perform poorly in realistic medical conversations despite scoring well on tests.
Researchers from Harvard Medical School and Stanford University found that although AI models like ChatGPT perform well on standardized medical tests, they are far less effective in real-world medical conversations.
The study used a new evaluation framework, CRAFT-MD, which simulates real-world clinical interactions.
The AI models struggled to gather patient information and reach accurate diagnoses, highlighting the need for more realistic testing methods before these tools are used in clinical settings.