
Claude Beats GPT-5 in Real-World Work Tasks, According to OpenAI Study

A surprising new study from OpenAI itself shows that Anthropic's Claude Opus 4.1 outperformed GPT-5 in practical workplace scenarios.

Test Results: Claude Takes the Lead

The study evaluated AI models on real-world business tasks, including:

  • Responding to unhappy customer emails
  • Optimizing table layouts
  • Auditing prices

Performance Rankings (score, higher is better):
  1. Claude Opus 4.1 - 47.6%
  2. GPT-5 high - 38.8%
  3. o3 high - 34.1%
  4. Gemini 2.5 Pro - 25.5%
  5. Grok 4 - 24.3%
  6. GPT-4o - 12.4% (lowest score)

Where Claude Excels

Claude demonstrated superior performance in:

  • Public services
  • Healthcare
  • Social assistance

The Bottom Line

OpenAI's own study suggests that Claude Opus 4.1 currently leads in practical workplace applications. Its score of 47.6% on real-world tasks puts it 8.8 percentage points ahead of OpenAI's GPT-5 high (38.8%).

Source: TechRadar