1 point | by philipswood 3 hours ago
This suggests that while LLMs are not stochastic parrots, they're nowhere near human-level intelligence yet.
> ...in this work we tested 7 state-of-the-art LLMs on simple comprehension questions targeting short sentences, purposefully setting an extremely low bar for the evaluation of the models.
> Systematic testing showed that the performance of these LLMs lags behind that of humans both quantitatively and qualitatively, providing further confirmation that tasks that are easy for humans are not always easily developed in AI. We argue that these results invite further reflection about the standards of evaluation we adopt for claiming human-likeness in AI.
Tasks like:
> John deceived Mary and Lucy was deceived by Mary. In this context, did Mary deceive Lucy?
> Franck read to himself and John read to himself, Anthony and Franck. In this context, was Franck read to?