The Inference Price War Is Here: AMD Runs GLM-5.2 at 2x Cheaper Than Blackwell
Wafer published benchmark numbers yesterday that should change how you think about inference costs. Running GLM-5.2 on AMD MI355X hardware, they hit 2,626 tokens per second per node at 2.4 requests per second, with sub-5 second time-to-first-token and 100 percent success rate. The headline number is not the raw throughput. It is that they achieved 80 percent of B200 performance at less than half the cost per GPU.
Full article content is being processed. Check back soon for the complete story with analysis and key takeaways.
In the meantime, browse our latest articles for more AI, crypto, and tech coverage.