Claude Opus 4 Is Here: Anthropic's Most Powerful Model Yet

6/7/2026

Anthropic dropped Claude Opus 4 this week, and the benchmarks are impressive. But as always, real-world usage tells a different story than leaderboards.

Let me start with the numbers. On SWE-bench, the standard for measuring AI coding capabilities, Claude Opus 4 scored 67.8 percent. That is about 12 points higher than the previous best. On HumanEval, it hit 96.2 percent. On GPQA, graduate-level science questions, it scored 84.5 percent.

Impressive? Yes. But here is what actually matters.

I have been using Claude Opus 4 through Claude Code for the past few days, and the most noticeable improvement is context handling. The 200K token context window is not new for Claude models, but Opus 4 actually uses it better. I tested this by loading an entire Django project into context, about 15,000 lines of code across 40 files, and asking detailed questions about cross-cutting concerns. It handled it without losing track.

The second thing I noticed is better rejection of bad instructions. When I deliberately asked Claude Opus 4 to write code with a known security vulnerability, it refused and explained why. This is a significant improvement over previous models that would sometimes follow harmful instructions if they were well-framed.

The code quality itself is excellent. Claude Opus 4 generates cleaner, more idiomatic code than its predecessors. React components, Python async patterns, Rust error handling. It produces code that looks like an experienced developer wrote it.

Pricing is the same as Claude Opus 3.5: 15 dollars per 1M input tokens, 75 dollars per 1M output tokens. For most developers, this translates to roughly 30-60 dollars per month with regular Claude Code usage.

The model is available immediately through Claude Code and the Anthropic API. If you have been on the fence about trying Claude Code, Opus 4 might be the reason to jump in.