
METR
METR researches, develops and runs cutting-edge tests of AI capabilities, including broad autonomous capabilities and the ability of AI systems to accelerate AI R&D. We also study potential AI behavior …
Five hours of expert level autonomy: METR’s Claude Opus 4.5’s ...
5 hours ago · A new result from the AI evaluation nonprofit METR has pushed the conversation around autonomous AI systems into new territory. According to METR’s latest reporting, Claude Opus 4.5 …
METR: Claude Opus 4.5 has a 50% task completion time horizon ...
1 day ago · METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year — We estimate that, on …
METR - Wikipedia
In March 2025, METR published a paper noting that the length of software engineering tasks that the leading AI model could complete had a doubling time of around 7 months between 2019 and 2024.
Anthropic's models beat o3 in some time-horizon tests | METR ...
In measurements using our set of multi-step software and reasoning tasks, Anthropic's Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively. Note ...
Exponential AI Progress Defies Slowdown Claims: METR and ...
Sep 29, 2025 · A new analysis brings hard numbers from METR and OpenAI evaluations and shows models already completing 2-hour tasks with meaningful success rates, with GPT-5 and Claude …
METR - AI Wiki - Artificial Intelligence Wiki
METR's mission centers on understanding and quantifying the risks posed by increasingly capable autonomous AI systems. The organization serves as an independent third-party evaluator for major …
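
The figures quoted in the snippets above (a 50% time horizon of about 4 hours 49 minutes for Claude Opus 4.5, about 80 minutes for Claude 4 Opus, and a doubling time of around 7 months) can be related with a little arithmetic. The following is a minimal sketch, not METR's methodology: the function names are hypothetical, the inputs are the point estimates as quoted, and the 7-month doubling time was reported for 2019 to 2024, so using it for projection here is an illustrative assumption.

```python
import math


def horizon_ratio(new_minutes: float, old_minutes: float) -> float:
    """How many times longer the newer model's 50% time horizon is."""
    return new_minutes / old_minutes


def doublings(ratio: float) -> float:
    """Number of doublings needed to grow by the given ratio."""
    return math.log2(ratio)


def projected_horizon(start_minutes: float, months_elapsed: float,
                      doubling_months: float = 7.0) -> float:
    """Horizon after months_elapsed under steady exponential growth.

    doubling_months defaults to the ~7-month figure quoted above; treating
    that trend as constant is an assumption made only for illustration.
    """
    return start_minutes * 2 ** (months_elapsed / doubling_months)


# Point estimates as quoted in the snippets (minutes).
opus_4_5_minutes = 4 * 60 + 49   # about 4 hours and 49 minutes
opus_4_minutes = 80              # about 80 minutes

ratio = horizon_ratio(opus_4_5_minutes, opus_4_minutes)
print(f"ratio: {ratio:.2f}x, doublings: {doublings(ratio):.2f}")
print(f"80 min after 7 months at a 7-month doubling time: "
      f"{projected_horizon(opus_4_minutes, 7):.0f} min")
```

On these numbers the ratio is about 3.6, consistent with the "more than double" wording in the snippet. The comparison ignores the uncertainty around each estimate.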