Apple researchers find LLMs rely more on pattern-matching than logical reasoning, struggling with complex questions.
Apple researchers have raised concerns about the mathematical reasoning abilities of large language models (LLMs), finding that their responses vary significantly with slight changes to the input. This suggests LLMs rely more on probabilistic pattern-matching than on true logical reasoning. To assess these capabilities more rigorously, they introduced the GSM-Symbolic benchmark, which generates many variants of grade-school math problems from symbolic templates; it reveals that LLMs struggle as questions grow more complex, highlighting their limitations in reliable reasoning.
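To make the template-based evaluation idea concrete, here is a minimal Python sketch of how such a benchmark can probe robustness: instantiate one problem template with many different names and numbers, then measure how consistently a model solves the variants. The template, `make_variant`, `accuracy`, and the `solve` callable are illustrative assumptions, not Apple's actual code or the paper's templates.

```python
import random

# Illustrative sketch only: a model that truly reasons should solve every
# variant of the same underlying problem; a pattern-matcher's accuracy
# will swing with the surface details.

TEMPLATE = (
    "{name} picks {x} apples on Monday and {y} apples on Tuesday. "
    "How many apples does {name} have in total?"
)

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Instantiate the template with randomized surface details."""
    name = rng.choice(["Sophie", "Liam", "Mei", "Omar"])
    x, y = rng.randint(2, 40), rng.randint(2, 40)
    question = TEMPLATE.format(name=name, x=x, y=y)
    return question, x + y  # question text plus ground-truth answer

def accuracy(solve, n_variants: int = 100, seed: int = 0) -> float:
    """Score a model callable `solve(question) -> int` across variants."""
    rng = random.Random(seed)
    correct = sum(
        solve(q) == ans
        for q, ans in (make_variant(rng) for _ in range(n_variants))
    )
    return correct / n_variants
```

Running `accuracy` over many sampled variants, rather than a single fixed question, is what exposes the variance the researchers describe: a drop in score across variants signals sensitivity to superficial input changes rather than failure on the underlying math.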