AI startup Galileo Technologies ranks Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5 Flash, and Alibaba's Qwen2-72B-Instruct at the top of its Hallucination Index benchmark.

AI startup Galileo Technologies has ranked midrange and open-source large language models highly in a new benchmark, the Hallucination Index. The benchmark evaluates 22 leading generative AI models and measures their accuracy across three task collections. Anthropic's Claude 3.5 Sonnet topped the ranking, while Google's Gemini 1.5 Flash performed best on cost. Alibaba's Qwen2-72B-Instruct was the top-performing open-source model.

3 articles: SiliconANGLE, SD Times, PYMNTS.com
