This article originally appeared on Medium. Tim O’Brien has given us permission to repost here on Radar.
When you’re working with AI tools like Cursor or GitHub Copilot, the real power isn’t just having access to different models; it’s knowing when to use them. Some jobs are OK with Auto. Others need a stronger model. And sometimes you need to bail and switch models rather than keep spending money on a complex problem with a lower-quality model. If you don’t, you’ll waste both time and money.
And this is the missing discussion in code generation. There are a few “camps” here: the majority of people writing about this appear to view it as a fantastical and fun “vibe coding” experience, and a few people out there are trying to use this technology to ship real products. If you are in that last category, you’ve probably started to realize that you can spend a fantastic amount of money if you don’t have a strategy for model selection.
Let’s make it very specific: if you sign up for Cursor, drop $20/month on a subscription using Auto, and you are happy with the output, there’s not much to worry about. But if you are starting to run agents in parallel and are paying for token consumption atop a monthly subscription, this post will make sense. In my own experience, a single developer working alone can easily spend $200–$300/day (or four times that figure) if they are trying to tackle a project and have opted for the most expensive model.
And if you are a company and you give your developers unlimited access to these tools, get ready for some surprises.
My Escalation Ladder for Models…
- Start here: Auto. Let Cursor route to a strong model with good capacity. If output quality degrades or the model gets stuck in a loop, escalate. (Cursor explicitly says Auto selects among premium models and will switch when output is degraded.)
- Medium-complexity tasks: Sonnet 4/GPT‑5/Gemini. Use for focused tasks on a handful of files: robust unit tests, targeted refactors, API remodels.
- Heavy lift: Sonnet 4 – 1 million. If I need to do something that requires more context, but I still don’t want to pay top dollar, I’ve been starting to move up to models that don’t quickly max out on context.
- Ultraheavy lift: Opus 4/4.1. Use this when the task spans multiple projects or requires long context and careful reasoning, then switch back once the big move is done. (Anthropic positions Opus 4 as a deep‑reasoning, long‑horizon model for coding and agent workflows.)
Auto works great, but there are times when you can sense that it’s chosen the wrong model. If you use these models enough, you can recognize Gemini Pro output by its verbosity, or the ChatGPT models by the way they go about solving a problem.
I’ll admit that my heavy and ultraheavy choices here are biased toward the models I’ve had more experience with; your own experience might vary. Still, you should have a similar escalation list of your own. Start with Auto and only upgrade if you need to; otherwise, you are going to learn some lessons about how much this costs.
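To make the ladder concrete, here is a minimal sketch of it as a lookup you could keep next to your notes. Everything in it is an assumption for illustration: the tier names are labels for the rungs above (not real API model IDs), `TaskSignals` is a hypothetical set of judgments you would make by hand, and Cursor exposes no such function.

```python
from dataclasses import dataclass

# Rungs of the ladder, cheapest first. Labels only, not real model identifiers.
LADDER = ["auto", "sonnet-4", "sonnet-4-1m", "opus-4.1"]


@dataclass
class TaskSignals:
    """Hypothetical signals you judge by hand before picking a model."""
    multi_file: bool = False          # focused work on a handful of files
    needs_long_context: bool = False  # more context than a standard window
    spans_projects: bool = False      # crosses several projects
    output_degraded: bool = False     # quality dropped or the model is looping


def pick_tier(signals: TaskSignals, current: str = "auto") -> str:
    """Return the lowest rung that fits, moving up one rung if output degraded."""
    if signals.spans_projects:
        base = "opus-4.1"
    elif signals.needs_long_context:
        base = "sonnet-4-1m"
    elif signals.multi_file:
        base = "sonnet-4"
    else:
        base = "auto"
    # Stay at least as high as the rung already in use for this task.
    idx = max(LADDER.index(base), LADDER.index(current))
    if signals.output_degraded and idx < len(LADDER) - 1:
        idx += 1
    return LADDER[idx]
```

The point of writing it down is the default: the function returns `"auto"` unless a specific signal argues for more, and it never climbs more than one rung at a time on a failure.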
Watch Out for “Thinking” Model Costs
Some models support explicit “thinking” (longer reasoning). Useful, but more expensive. Cursor’s docs note that enabling thinking on specific Sonnet versions can count as two requests under team request accounting, and in the individual plans, the same idea translates to more tokens burned. In short, thinking mode is excellent, but use it only when you need it.
And when do you need it? My rule of thumb is that when I already understand what needs to be done, such as when I’m asking for a unit test to be polished or a method to be written in the pattern of another, I usually don’t need a thinking model. On the other hand, if I’m asking it to analyze a problem and propose various options for me to choose from, or (something I do often) when I’m asking it to challenge my decisions and play devil’s advocate, I’ll pay the premium for the best model.
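A rough sketch of that rule of thumb, plus the request arithmetic behind it. The task labels are hypothetical names for the cases described above, and the multiplier of 2 simply mirrors the “can count as two requests” note; check your own plan before relying on it.

```python
# Hypothetical task labels for the cases described above.
OPEN_ENDED = {"analyze_problem", "propose_options", "devils_advocate"}

# Assumption: a thinking call counts double, per the team accounting note above.
THINKING_MULTIPLIER = 2


def wants_thinking(task_kind: str) -> bool:
    """Pay for thinking only when the task is open-ended."""
    return task_kind in OPEN_ENDED


def requests_used(plain_calls: int, thinking_calls: int,
                  multiplier: int = THINKING_MULTIPLIER) -> int:
    """Total requests billed when thinking calls cost `multiplier` each."""
    return plain_calls + thinking_calls * multiplier
```

With these assumptions, 40 plain calls and 10 thinking calls are billed as 60 requests, so a small share of thinking calls can account for a third of the total.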
Max Mode and When to Use It
If you need giant context windows or extended reasoning (e.g., sweeping changes across 20+ files), Max Mode can help, but it will consume more usage. Make Max Mode a temporary tool, not your default. If you find yourself constantly needing Max Mode turned on, there’s a good chance you’re “overapplying” this technology.
If it needs to consume a million tokens for hours on end? That’s usually a hint that you need another programmer. More on that later, but what I’ve seen too often are managers who think this is like the “vibe coding” they’re witnessing. Spoiler alert: Vibe coding is that thing that people do in presentations because it takes five minutes to make a silly video game. It’s 100% not programming, and to use codegen, here’s the secret: You have to understand how to program.
Max Mode and thinking models are not a shortcut, and neither are they a replacement for good programmers. If you think they are, you are going to be paying top dollar for code that will one day have to be rewritten by a good programmer using these same tools.
Most Important Tip: Watch Your Bill as It Happens
The most important tip is to regularly monitor your utilization and usage fees in Cursor, since they appear within a minute or two of running something. You can see usage by the minute, the number of tokens consumed, and in some cases, how much you’re being charged beyond your subscription. Make a habit of checking a couple of times a day, and during heavy sessions ideally every half hour. This helps you catch runaway costs, like spending $100 an hour, before they get out of hand, which is entirely possible if you’re running many parallel agents or doing resource-intensive work. Paying attention ensures you stay in control of both your usage and your bill.
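If you want a number rather than a gut feeling, a burn-rate check is easy to sketch. This assumes you jot down the cumulative charge from the usage page by hand at each check; it does not call any Cursor API, and the function names and the $100/hour limit are illustrative choices.

```python
from datetime import datetime, timedelta


def hourly_burn(samples, window=timedelta(minutes=30)):
    """Dollars per hour over the most recent window.

    samples: list of (timestamp, cumulative_dollars), oldest first,
    copied by hand from the usage page.
    """
    if len(samples) < 2:
        return 0.0
    latest_time, latest_cost = samples[-1]
    # Keep only the samples that fall inside the window.
    recent = [s for s in samples if latest_time - s[0] <= window]
    start_time, start_cost = recent[0]
    hours = (latest_time - start_time).total_seconds() / 3600
    if hours == 0:
        return 0.0
    return (latest_cost - start_cost) / hours


def over_budget(samples, limit_per_hour=100.0):
    """True when the recent burn rate reaches the hourly limit."""
    return hourly_burn(samples) >= limit_per_hour
```

For example, going from $0 to $60 of extra charges across a half-hour session is a rate of $120 an hour, which is exactly the kind of runaway cost a half-hourly check is meant to catch.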
Keep Track and Avoid Loops
The other thing you need to do is keep track of what works and what doesn’t. Over time, you’ll find it’s very easy to make mistakes, and the models themselves can sometimes fall into loops. You may give an instruction, and instead of resolving it, the system keeps running the same process over and over. If you’re not paying attention, you can burn through a lot of tokens, and a lot of money, without actually getting sound output. That’s why it’s important to watch your sessions closely and be ready to interrupt if something looks like it’s stuck.
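One crude way to put a number on “looks stuck” is to count repeats in the recent history of what the agent did. This is only a sketch: the action strings are whatever you choose to note down, and the window of 6 and the threshold of 3 are arbitrary assumptions to tune.

```python
from collections import Counter


def looks_stuck(actions, window=6, max_repeats=3):
    """True if one action shows up max_repeats or more times in the last `window` actions."""
    recent = actions[-window:]
    if not recent:
        return False
    # most_common(1) returns the single most frequent action and its count.
    _, count = Counter(recent).most_common(1)[0]
    return count >= max_repeats
```

Calling it with a history such as `["run tests: fail", "edit a.py", "run tests: fail", "edit a.py", "run tests: fail", "edit a.py"]` returns `True`, which is a cue to interrupt and look, not proof that the session is wasted.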
Another pitfall is pushing the models beyond their limits. There are tasks they can’t handle well, and when that happens, it’s tempting to keep rephrasing the request and asking again, hoping for a better result. In practice, that often leads to the same cycle of failure, except you’re footing the bill for every attempt. Knowing where the boundaries are and when to stop is essential.
A practical way to stay on top of this is to maintain a running diary of what worked and what didn’t. Record prompts, outcomes, and notes about efficiency so you can learn from experience instead of repeating expensive mistakes. Combined with keeping an eye on your live usage metrics, this habit will help you refine your approach and avoid wasting both time and money.
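The diary doesn’t need to be fancy. Here is a minimal sketch that appends one JSON line per attempt and totals the cost by outcome; the field names, the file layout, and the outcome labels are all assumptions, and the cost figures are ones you enter yourself.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_entry(path, prompt, model, outcome, cost_usd, notes=""):
    """Append one diary entry as a JSON line and return it."""
    entry = {
        "when": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "model": model,
        "outcome": outcome,    # e.g. "worked" or "failed"
        "cost_usd": cost_usd,  # entered by hand from the usage page
        "notes": notes,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def cost_by_outcome(path):
    """Sum cost per outcome so expensive failures stand out."""
    totals = {}
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entry = json.loads(line)
                key = entry["outcome"]
                totals[key] = totals.get(key, 0.0) + entry["cost_usd"]
    return totals
```

A plain append-only file is a deliberate choice here: it is quick enough to update mid-session, and a total per outcome is usually all you need to see which kinds of requests keep costing money without paying off.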