"How to game the METR plot" by shash42
About this listen
14 prompts ruled AI discourse in 2025
The METR horizon length plot was an excellent idea: it proposed measuring the length of tasks models can complete (in terms of estimated human hours needed) instead of accuracy. I'm glad it shifted the community toward caring about long-horizon tasks. They are a better measure of automation impacts and economic outcomes (for example, labor laws are often based on the number of hours worked).
However, I think we are overindexing on it far too much, especially the AI Safety community, which makes huge updates to timelines and research priorities based on it. I suspect (from many anecdotes, including roon's) the METR plot has influenced significant investment [...]
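For listeners unfamiliar with the metric, a horizon length can be sketched roughly as follows: fit a logistic curve of task success against the log of estimated human completion time, then read off the task length at which predicted success crosses a threshold such as 50%. The snippet below is a simplified illustration under that assumption, not METR's actual pipeline (which involves weighting and other details not covered in this excerpt). The names `fit_logistic`, `time_horizon`, and `toy_results`, and all the data, are hypothetical and synthetic.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(success) = sigmoid(a + b * x) with Newton's method; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
            h00 += w             # entries of the 2x2 (negative) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def time_horizon(results, p=0.5):
    """Task length (in minutes) at which the fitted success probability equals p.

    results: list of (human_minutes, success) pairs, success being 0 or 1.
    """
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [float(success) for _, success in results]
    mean_x = sum(xs) / len(xs)
    # Centering the log-times keeps the Newton steps well conditioned.
    a, b = fit_logistic([x - mean_x for x in xs], ys)
    logit = math.log(p / (1.0 - p))
    return 2.0 ** (mean_x + (logit - a) / b)

# Synthetic toy data (an assumption for illustration): four attempts per task
# length, with success becoming rarer as tasks get longer.
toy_results = []
for minutes, wins in zip([1, 2, 4, 8, 16, 32, 64], [4, 3, 3, 2, 1, 1, 0]):
    toy_results += [(minutes, 1)] * wins + [(minutes, 0)] * (4 - wins)

print("50% horizon (minutes):", time_horizon(toy_results, 0.5))
print("80% horizon (minutes):", time_horizon(toy_results, 0.8))
```

Because the fit depends only on which tasks succeed and how long humans take on them, the resulting horizon is sensitive to the mix of tasks in the benchmark, which is the kind of dependence the post goes on to examine.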
---
Outline:
(01:24) 14 prompts ruled AI discourse in 2025
(04:58) To improve METR horizon length, train on cybersecurity contests
(07:12) HCAST Accuracy alone predicts log-linear trend in METR Horizon Lengths
---
First published:
December 20th, 2025
Source:
https://www.lesswrong.com/posts/2RwDgMXo6nh42egoC/how-to-game-the-metr-plot
---
Narrated by TYPE III AUDIO.