GPT-5 Review: Raw Performance, Real Testing

Engineering Superpowers: Coding with GPT-5

Let’s begin where GPT-5 truly shines: software development.

OpenAI claims GPT-5 is “a true coding collaborator.” Its reported benchmark scores, 74.9% on SWE-bench Verified and 88% on Aider Polyglot, support this claim. And indeed, the model displays an impressive understanding of:

Existing codebases
Bug fixing
Refactoring
Generating multi-file applications

🎮 Real-World Test 1: 2D Space Game

We asked GPT-5 to build a self-contained 2D space game in which a spaceship avoids and destroys asteroids. The model generated complete HTML + JS code, respecting prompt constraints, and the result worked—with basic functionality.

However, there were clear issues:

No runtime instructions provided
UI lacked polish
Movement was janky
Code wasn’t modular

Takeaway: GPT-5 can build basic games from scratch but still requires a developer’s hand for refinement and deployment. It’s more of an “accelerator” than a “one-click creator.”
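
One common cause of janky movement in simple games is tying position updates to the frame count rather than to elapsed time. The review does not show the generated code, so the following is only a hypothetical Python sketch of the general fix, not GPT-5's output: the `Ship` class, the `step` function, and the speed and playfield values are all illustrative names and numbers. It scales movement by the time since the last frame and keeps the update in a small unit that can be tested on its own, which also addresses the modularity complaint.

```python
from dataclasses import dataclass


@dataclass
class Ship:
    """Hypothetical ship state; fields are illustrative, not from the generated game."""
    x: float
    y: float
    vx: float = 0.0  # horizontal velocity in pixels per second
    vy: float = 0.0  # vertical velocity in pixels per second


def step(ship: Ship, dt: float, width: float, height: float) -> Ship:
    """Advance the ship by dt seconds and clamp it to the playfield."""
    new_x = min(max(ship.x + ship.vx * dt, 0.0), width)
    new_y = min(max(ship.y + ship.vy * dt, 0.0), height)
    return Ship(new_x, new_y, ship.vx, ship.vy)


# One 1/30 s frame and two 1/60 s frames should cover the same distance.
start = Ship(x=100.0, y=100.0, vx=120.0)
one_slow_frame = step(start, 1 / 30, 800, 600)
two_fast_frames = step(step(start, 1 / 60, 800, 600), 1 / 60, 800, 600)
print(one_slow_frame.x, two_fast_frames.x)
```

Because the distance travelled depends on `dt` rather than on how many frames were drawn, the ship moves at the same speed on slow and fast machines. The same idea carries over to a browser game loop that receives a timestamp on each frame.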

Tool Use & Agentic Workflows

This is where GPT-5 separates itself from GPT-4 and other predecessors.

It can chain tools, reason across steps, and manage multiple agents and tools simultaneously. In one test, we asked it to build a Lego-style brutalist building editor, and it generated a two-file solution (index.html and main.js) using modular DOM elements.

While the output code was syntactically correct and well-commented, the app failed to render anything useful when served. Despite the failure, this revealed GPT-5’s ability to think in multi-file, componentized architecture, a big leap over past models.
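
The review does not say how the two files were served. As one possible way to reproduce a test like this, assuming index.html and main.js sit in the same directory, Python's standard-library `http.server` module can serve them locally. The `make_server` helper name and the default port are illustrative choices, not part of the original test setup.

```python
import functools
import http.server


def make_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build a local static-file server rooted at `directory` (not started yet)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   server = make_server(".")
#   server.serve_forever()
```

Serving over HTTP rather than opening index.html directly from disk matters for some browser features, such as JavaScript modules, so it is a reasonable first thing to rule out when a generated app renders nothing.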

Agentic Control

GPT-5 introduces intelligent agentic behavior:

Parallel tool execution
Sequencing without instruction loss
Coherent memory across steps
Error-handling improvements
Recovery from incomplete inputs

This makes GPT-5 viable for autonomous workflows, potentially challenging startups focused on multi-agent orchestration. Expect to see GPT-5 integrated into production tools for task automation, ETL pipelines, and data engineering orchestration very soon.
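
To make "parallel tool execution" and "error handling" concrete, here is a minimal Python sketch of the pattern an orchestration layer around such a model might use. It is not GPT-5's internal mechanism and not an OpenAI API: the tool functions `fetch_weather` and `flaky_tool` and the `run_tools` helper are hypothetical stand-ins that only simulate work.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

Tool = Callable[[str], Awaitable[str]]


# Hypothetical tools; a real workflow would call external services here.
async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.01)
    return f"weather for {city}"


async def flaky_tool(_: str) -> str:
    await asyncio.sleep(0.01)
    raise RuntimeError("tool unavailable")


async def run_tools(calls: Dict[str, Tuple[Tool, str]]) -> Dict[str, Dict[str, Any]]:
    """Run every tool call concurrently and record failures instead of aborting."""
    names = list(calls)
    results = await asyncio.gather(
        *(tool(arg) for tool, arg in calls.values()),
        return_exceptions=True,  # a failing tool does not cancel the others
    )
    report: Dict[str, Dict[str, Any]] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            report[name] = {"ok": False, "error": str(result)}
        else:
            report[name] = {"ok": True, "value": result}
    return report


report = asyncio.run(
    run_tools({"weather": (fetch_weather, "Oslo"), "broken": (flaky_tool, "x")})
)
print(report)
```

Collecting each result or error under the tool's name lets the next step decide whether to retry, skip, or continue, which is the kind of recovery behavior the list above describes.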

Real-World Strategy Task

To test high-level reasoning, we gave GPT-5 a CEO crisis prompt: navigate ambiguity, communicate under pressure, and make irreversible decisions.

The output? Stunningly good.

GPT-5 delivered a response that balanced:

Immediate accountability
Public trust building
Long-term restructuring
Ethical transparency

It synthesized corporate ethics, risk communication, and regulatory realism into one coherent action plan. This wasn’t just filler—it was a genuine masterclass in leadership communication.

GPT-5: Guardrails & Humor

We tested GPT-5’s moderation system with an intentionally humorous (borderline inappropriate) prompt. The model’s response was measured, respectful, and displayed mature filtering without being patronizing. It acknowledged humor while preserving social boundaries.

No hallucinations. No trolling. Just balanced guidance.

Multilingual & Obscure Language Translation

We asked GPT-5 to translate the phrase “Sometimes you just have to let go” into more than a dozen languages, including Urdu, Saraiki, Malagasy, and ancient runes.

GPT-5:

Handled well-known languages with near-native phrasing
Managed obscure and regional languages with structurally accurate, if slightly semantically shallow, translations
Generated random language output with plausible structure

This suggests broader multilingual coverage and a better capacity to handle edge cases, which is great news for global applications.

GPT-5 Weak Spots & Limitations

Let’s be clear: GPT-5 is not flawless.

Load latency made testing difficult
Code often lacks execution instructions
Multimodal capabilities are not fully live
Certain UI components felt underbaked
Creative outputs (e.g., humor, fiction) remain solid, but not significantly stronger than GPT-4

Developer Experience for GPT-5

From the API side:

New controls include reasoning effort, response style, and custom tool support
Output format flexibility: plain text, no JSON required
Multiple model sizes allow cost-performance trade-offs
Early API access appears to come through the Playground first, with the ChatGPT rollout staggered
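
As a rough illustration of how these controls might be combined, the Python sketch below assembles a request body without sending anything over the network. The field names (`reasoning`, `text`, `verbosity`), the tier labels, and the model identifiers are assumptions made for illustration; check them against the current API reference before relying on them. `build_request` and `MODEL_TIERS` are hypothetical helper names.

```python
from typing import Any, Dict

# Assumed model identifiers for the "multiple model sizes" mentioned above.
MODEL_TIERS = {"high": "gpt-5", "balanced": "gpt-5-mini", "cheap": "gpt-5-nano"}


def build_request(
    prompt: str,
    tier: str = "balanced",
    effort: str = "medium",
    verbosity: str = "medium",
) -> Dict[str, Any]:
    """Assemble a request body; nothing is sent over the network."""
    if tier not in MODEL_TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return {
        "model": MODEL_TIERS[tier],          # cost-performance trade-off
        "input": prompt,
        "reasoning": {"effort": effort},     # assumed name for reasoning effort
        "text": {"verbosity": verbosity},    # assumed name for response style
    }


request = build_request("Summarize this log", tier="cheap", effort="low", verbosity="low")
print(request)
```

Keeping the tier choice in one lookup table makes it easy to move a workload to a smaller or larger model without touching the rest of the calling code.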

Final Verdict: Intentional Intelligence

If GPT-4 was a generalist, GPT-5 is a specialist with initiative.

It acts with intention, follows complex instructions closely, and excels at structured, multi-step reasoning. It’s not just reactive—it feels agentic, like it’s trying to solve your problem, not just respond to your input.

That said, don’t expect it to “just work” every time. It’s still an assistant, not a replacement.

One-Word Summary: Deliberate

GPT-5 delivers deliberate, goal-oriented intelligence. It’s precise, powerful, and collaborative—but demands thoughtful prompting to unlock its full potential.

Have you tried using GPT-5 yet? What’s your verdict?


By Andrew Miller
