@siriussilen Is 1,800 task types really that many? They are almost all fairly traditional tasks like rewriting, translation, reasoning, and extraction; the most common real-world uses, coding and role playing, are missing.
Moreover, in real consumer-facing scenarios, users rarely write prompts that carefully. Instead of "Please give me a detailed introduction to what a large language model is", a real user's prompt is more likely to be "whats a large language model", possibly with unclear semantics and punctuation. If you train only on data with such neatly written inputs, robustness to real human input will be poor. This is not meant to disparage Orca's work, but rather to say that when mixing data, one should pay particular attention to task diversity and input diversity, instead of relying on a single dataset.
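One cheap way to get that input diversity is to augment well-formed prompts with rougher, user-like variants before training. Below is a minimal, hypothetical sketch (the `casualize` function and its perturbation rules are my own illustration, not anything from Orca or Blossom): it randomly drops trailing punctuation, lowercases text, and strips polite framing words to mimic how real users type.

```python
import random
import re

def casualize(prompt: str, rng: random.Random) -> str:
    """Produce a rougher variant of a well-formed prompt,
    mimicking casual real-user input (toy augmentation)."""
    variant = prompt
    # Sometimes drop trailing punctuation, as casual users do.
    if rng.random() < 0.8:
        variant = variant.rstrip("?!.。?!")
    # Sometimes lowercase the whole prompt.
    if rng.random() < 0.5:
        variant = variant.lower()
    # Always strip polite framing words (English-only toy rule).
    variant = re.sub(r"\b(please|kindly)\s+", "", variant,
                     flags=re.IGNORECASE)
    return variant

if __name__ == "__main__":
    rng = random.Random(0)
    formal = "Please explain in detail what a large language model is."
    # Generate several noisy variants of the same instruction.
    for _ in range(5):
        print(casualize(formal, rng))
```

A real pipeline would go further (typos, abbreviations, mixed languages, dropped words), but even this kind of surface perturbation moves the input distribution closer to what actual users send.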
In fact, Orca's "good" performance mostly exists only under the carefully written prompts in the paper; switch to casual input and it falls apart.
Example:
prompt: whats large lm
blossom: "LM" typically stands for "language model." In the context of artificial intelligence and natural language processing, a large LM refers to a language model that has been trained on a large amount of data and has a high number of parameters. These models are designed to understand and generate human-like text based on the input they receive. (The response continues; the rest is omitted.)
orca2: I'm sorry, I don't understand your question. Could you please rephrase it or provide more context?
Another case: a reasoning task with a logic trap, which it likewise fails.
prompt: Joe has 10 apples. He washed 3 apples. How many apples does he have now?
blossom: Joe initially had 10 apples, and after washing 3 apples, he still has 10 apples. Washing the apples does not reduce their quantity.
orca2: Given:
Joe has 10 apples
He washed 3 apples
Goal: How many apples does he have now?
Steps:
To find the number of apples Joe has now, we need to subtract the number of washed apples from the number of original apples.
We can use the subtraction algorithm to do this.
10
- 3
----
7
(Some content omitted here.)
### Final answer: Joe has 7 apples now.