The Inference Price War Is Here: AMD Runs GLM-5.2 at 2x Cheaper Than Blackwell

2026-07-04 · 4 min read

Wafer published benchmark numbers yesterday that should change how you think about inference costs. Running GLM-5.2 on AMD MI355X hardware, they hit 2,626 tokens per second per node at 2.4 requests per second, with sub-5-second time-to-first-token and a 100 percent success rate. The headline is not the raw throughput. It is the economics: 80 percent of B200 performance at less than half the cost per GPU.
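To make the cost claim concrete, here is a minimal sketch of the arithmetic behind it. The function name and the 0.50 cost ratio are assumptions for illustration: the benchmark only says "less than half the cost," so 0.50 is an upper bound on the cost ratio and the result is a lower bound on the advantage.

```python
def relative_perf_per_dollar(perf_ratio: float, cost_ratio: float) -> float:
    """Performance per dollar of one GPU relative to a baseline GPU.

    perf_ratio: candidate performance divided by baseline performance.
    cost_ratio: candidate cost per GPU divided by baseline cost per GPU.
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return perf_ratio / cost_ratio


# 0.80 comes from the reported "80 percent of B200 performance".
# 0.50 is an assumed upper bound taken from "less than half the cost".
advantage = relative_perf_per_dollar(perf_ratio=0.80, cost_ratio=0.50)
print(f"MI355X performance per dollar vs. B200: at least {advantage:.2f}x")
```

Under those assumptions the MI355X delivers at least 1.6 times the performance per dollar of a B200, and the gap widens as the real cost ratio drops further below one half.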





Bella