  1. METR

    METR researches, develops and runs cutting-edge tests of AI capabilities, including broad autonomous capabilities and the ability of AI systems to accelerate AI R&D. We also study potential AI behavior …

  2. Five hours of expert level autonomy: METR’s Claude Opus 4.5’s ...

    5 hours ago · A new result from the AI evaluation nonprofit METR has pushed the conversation around autonomous AI systems into new territory. According to METR’s latest reporting, Claude Opus 4.5 …

  3. METR: Claude Opus 4.5 has a 50% task completion time horizon ...

    1 day ago · METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year — We estimate that, on …

  4. METR - Wikipedia

    In March 2025, METR published a paper noting that the length of software engineering tasks that the leading AI model could complete had a doubling time of around 7 months between 2019 and 2024.

  5. Anthropic's models beat o3 in some time-horizon tests | METR ...

    In measurements using our set of multi-step software and reasoning tasks, Anthropic's Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively. Note ...

  6. Exponential AI Progress Defies Slowdown Claims: METR and ...

    Sep 29, 2025 · A new analysis brings hard numbers from METR and OpenAI evaluations and shows models already completing 2-hour tasks with meaningful success rates, with GPT-5 and Claude …

  7. METR - AI Wiki - Artificial Intelligence Wiki

    METR's mission centers on understanding and quantifying the risks posed by increasingly capable autonomous AI systems. The organization serves as an independent third-party evaluator for major …