AI startup Galileo Technologies ranks Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5 Flash, and Alibaba's Qwen2-72B-Instruct at the top of its Hallucination Index benchmark.
AI startup Galileo Technologies has ranked midrange and open-source large language models highly in its new benchmark, the Hallucination Index. The benchmark evaluates 22 leading generative AI models and measures their accuracy across three collections of tasks. Anthropic's Claude 3.5 Sonnet topped the overall ranking, Google's Gemini 1.5 Flash performed best when cost was taken into account, and Alibaba's Qwen2-72B-Instruct was the top-performing open-source model.