A fresh set of benchmarks is needed to assess artificial intelligence's real-world knowledge, according to experts.
Artificial intelligence (AI) models have shown impressive performance on law exams, answering multiple-choice, short-answer, and essay questions as well as humans [1]. However, they struggle with real-world legal tasks.
Some lawyers have learned this the hard way: several have been fined for filing AI-generated court briefs that misstated legal principles and cited non-existent cases.
Experts such as Chaudhri, principal scientist at Knowledge Systems Research in Sunnyvale, California, emphasize the need for a clearer understanding of what AI systems actually know.
A new kind of Turing test, they argue, could help specialists evaluate AI's real-world knowledge and improve its performance in practical applications.