Down and out with Cerebras Code

September 15, 2025

Out of Fireworks and into the fire

However, my start with Cerebras’s hosted Qwen was nothing like what I had experienced (for a lot more money) on Fireworks, another provider. Initially, Cerebras’s Qwen didn’t even work in my CLI. It also didn’t seem to work in Roo Code or any other tool I knew how to use. After taking my bug report, Cerebras told me the problem was my code. The same CLI that worked with Fireworks, Claude, GPT-4.1 and GPT-5, o3, and Qwen hosted by Qwen/Alibaba was at fault, said Cerebras. To be fair, my log did include misleading artifacts from when Cerebras fragmented the stream, emitting stream parts as separate messages (which Cerebras still does on occasion). However, this has generally been their approach: don’t fix their so-called OpenAI compatibility; blame the client, adapt the client, or both. I took up the challenge and adapted my CLI, but it took a lot of workarounds. This was a massive contrast with Fireworks. I had issues with Fireworks when it started, too, and showed them my debug output; they immediately acknowledged the problem (occasionally it would spit out corrupt native tool calls instead of OpenAI-style output) and fixed it overnight. Cerebras, by contrast, repeatedly claimed their infrastructure was working perfectly and all requests were successful, in direct contradiction to most of the commentary on their Discord.

Feeling like I had finally cracked the nut after three weeks of on-and-off testing and adapting, I grabbed a second Cerebras Code Max account when the window opened again. This was after discovering that for part of the time, Cerebras had charged me for a Max account but given me a Pro account. They fixed it but offered no compensation for the days my service was set to Pro rather than Max. The shortfall is difficult to prove because their analytics console is broken, in part because it reports measurements in local time while the limits are in UTC.

Then I did the math. One Cerebras Code Max account is limited to 120 million tokens per day and costs four times as much as a Cerebras Code Pro account. The Pro account is limited to 24 million tokens per day, so four of them add up to 96 million tokens. However, the Pro account is limited to 300k tokens per minute, compared to 400k for the Max. Using Cerebras is a bit frustrating. For 10 to 20 seconds, it really flies; then you hit the cap on tokens per minute, and it throws 429 errors (too many requests) until the minute is up. If your coding tool is smart, it will just retry with an exponential back-off. If not, it will break the stream. So, had I bought four Pro accounts, I could have had 1,200,000 TPM in theory (4 × 300k), three times the Max account’s 400k for the same money and a much better value.
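For readers whose tools do not retry on their own, here is a minimal sketch of the exponential back-off mentioned above. It is not taken from my CLI or from any Cerebras client: `RateLimited`, `backoff_delay`, and `call_with_backoff` are hypothetical names, the 60-second cap is an assumption based on the per-minute limit, and the random jitter is a common addition rather than something any provider requires.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 (too many requests) response."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound, in seconds, for the wait before retry `attempt` (0-based).

    The bound doubles on every attempt and is capped at `cap`; 60 seconds is
    assumed here because the token limit is counted per minute.
    """
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 6,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on RateLimited with exponential back-off."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            # Jitter: wait a random time between 0 and the exponential bound
            # so that parallel workers do not all retry at the same instant.
            sleep(random.uniform(0, backoff_delay(attempt, base, cap)))
    raise AssertionError("unreachable")
```

In a real tool, `request` would wrap the HTTP call and raise `RateLimited` when the response status is 429. The `sleep` parameter exists only so the waiting can be replaced in tests.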
