GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models - V2EX


I came across an interesting paper, https://arxiv.org/abs/2410.05229, but there hasn't been much discussion of it on V2EX.

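The paper questions whether the GSM benchmark is a sound measure of reasoning. Using a few tricks, it substitutes different numbers into the original problems and inserts distracting clauses. It finds that the "reasoning" ability of all the major models drops significantly, in some cases by as much as half.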
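To make the setup concrete, here is a minimal Python sketch of generating problem variants by number substitution, optionally with a distracting clause. Everything in it is an assumption for illustration: the template, the names, the distractor sentence, and the `naive_solver` baseline are invented here and are not the paper's templates or code. The toy solver simply sums every number in the question, so it is perfect on clean variants and always wrong once a numeric distractor appears. It caricatures brittle pattern matching and is not a model of how LLMs actually behave.

```python
import random
import re

# Hypothetical GSM-style template, invented for illustration (not from the paper).
TEMPLATE = ("{name} picks {x} apples on Monday and {y} apples on Tuesday. "
            "{distractor}How many apples does {name} have in total?")

# Hypothetical distractor: mentions a number but does not change the answer.
DISTRACTOR = "{k} of the apples are a bit smaller than average. "

NAMES = ["Sophie", "Liam", "Mei", "Omar"]


def make_variant(rng: random.Random, with_distractor: bool = False) -> tuple[str, int]:
    """Fill the template with freshly sampled values; return (question, answer)."""
    name = rng.choice(NAMES)
    x = rng.randint(5, 50)
    y = rng.randint(5, 50)
    distractor = ""
    if with_distractor:
        distractor = DISTRACTOR.format(k=rng.randint(1, min(x, y)))
    question = TEMPLATE.format(name=name, x=x, y=y, distractor=distractor)
    # Ground truth is recomputed from the sampled values, so it stays correct.
    return question, x + y


def naive_solver(question: str) -> int:
    """Toy stand-in for a pattern matcher: sums every number in the text."""
    return sum(int(n) for n in re.findall(r"\d+", question))


def accuracy(solver, n: int, with_distractor: bool, seed: int = 0) -> float:
    """Fraction of n generated variants that the solver answers correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        question, answer = make_variant(rng, with_distractor)
        correct += int(solver(question) == answer)
    return correct / n


if __name__ == "__main__":
    print(accuracy(naive_solver, 100, with_distractor=False))  # 1.0
    print(accuracy(naive_solver, 100, with_distractor=True))   # 0.0
```

Because each variant's answer is recomputed from the sampled values, the same template yields many distinct but equally valid test items, which is what makes drops in accuracy across variants measurable.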

