The following is Part 2 of 3 from Addy Osmani's original post "Context Engineering: Bringing Engineering Discipline to Prompts." Part 1 can be found here.
Great context engineering strikes a balance—include everything the model truly needs but avoid irrelevant or excessive detail that could distract it (and drive up cost).
As Andrej Karpathy described, context engineering is a delicate mix of science and art.
The "science" part involves following certain principles and techniques to systematically improve performance. For example, if you're doing code generation, it's almost scientific that you should include relevant code and error messages; if you're doing question answering, it's logical to retrieve supporting documents and provide them to the model. There are established techniques like few-shot prompting, retrieval-augmented generation (RAG), and chain-of-thought prompting that we know (from research and trial) can boost results. There's also a science to respecting the model's constraints—every model has a context length limit, and overstuffing that window can not only increase latency/cost but potentially degrade the quality if the important pieces get lost in the noise.
Karpathy summed it up well: "Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down."
So the science is in techniques for selecting, pruning, and formatting context optimally. For instance, using embeddings to find the most relevant docs to include (so you're not inserting unrelated text) or compressing long histories into summaries. Researchers have even catalogued failure modes of long contexts—things like context poisoning (where an earlier hallucination in the context leads to further errors) or context distraction (where too much extraneous detail causes the model to lose focus). Knowing these pitfalls, a good engineer will curate the context carefully.
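To make the selection step concrete, here's a minimal sketch of embedding-based context selection under a token budget. It assumes you already have an `embed(text)` function from whatever embedding model you use (hypothetical here), and uses a crude characters-to-tokens heuristic:

```python
# Minimal sketch: rank candidate docs by embedding similarity to the query
# and keep only what fits a rough token budget. `embed` is a placeholder for
# whatever embedding model you use.
import numpy as np

def select_context(query: str, docs: list[str], embed, budget_tokens: int = 2000) -> list[str]:
    q = embed(query)
    scored = []
    for doc in docs:
        v = embed(doc)
        sim = float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9))
        scored.append((sim, doc))
    selected, used = [], 0
    for _, doc in sorted(scored, reverse=True):
        cost = len(doc) // 4  # crude chars-to-tokens heuristic
        if used + cost <= budget_tokens:
            selected.append(doc)
            used += cost
    return selected
```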
Then there's the "art" side—the intuition and creativity born of experience.
This is about knowing LLMs' quirks and subtle behaviors. Think of it like a seasoned programmer who "just knows" how to structure code for readability: An experienced context engineer develops a feel for how to structure a prompt for a given model. For example, you might sense that one model tends to do better if you first outline a solution approach before diving into specifics, so you include an initial step like "Let's think step by step…" in the prompt. Or you notice that the model often misunderstands a particular term in your domain, so you preemptively clarify it in the context. These aren't in a manual—you learn them by observing model outputs and iterating. This is where prompt-crafting (in the old sense) still matters, but now it's in service of the larger context. It's similar to software design patterns: There's science in knowing common solutions but art in knowing when and how to apply them.
Let's explore a few common strategies and patterns context engineers use to craft effective contexts:
Retrieval of relevant knowledge: One of the most powerful techniques is retrieval-augmented generation. If the model needs facts or domain-specific data that isn't guaranteed to be in its training memory, have your system fetch that information and include it. For example, if you're building a documentation assistant, you might vector-search your documentation and insert the top matching passages into the prompt before asking the question. This way, the model's answer will be grounded in real data you provided rather than in its sometimes outdated internal knowledge. Key skills here include designing good search queries or embedding spaces to get the right snippet and formatting the inserted text clearly (with citations or quotes) so the model knows to use it. When LLMs "hallucinate" facts, it's often because we failed to provide the actual fact—retrieval is the antidote to that.
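As a hedged sketch, a documentation assistant might assemble its prompt like this. The `search_docs` function is a stand-in for whatever vector-store query you use, assumed here to return passages with "text" and "source" fields:

```python
# Sketch of RAG prompt assembly. `search_docs` stands in for your vector
# store's query method, assumed to return dicts with "text" and "source".
def build_rag_prompt(question: str, search_docs) -> str:
    passages = search_docs(question, top_k=3)
    context = "\n\n".join(
        f'[{i}] (source: {p["source"]})\n"{p["text"]}"'
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using ONLY the passages below, citing them by "
        "number. If the answer is not in them, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```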
Few-shot examples and role instructions: This hearkens back to classic prompt engineering. If you want the model to output something in a particular style or format, show it examples. For instance, to get structured JSON output, you might include a couple of example inputs and outputs in JSON in the prompt, then ask for a new one. Few-shot context effectively teaches the model by example. Likewise, setting a system role or persona can guide tone and behavior ("You are an expert Python developer helping a user…"). These techniques are staples because they work: They bias the model toward the patterns you want. In the context-engineering mindset, prompt wording and examples are just one part of the context, but they remain important. In fact, you could say prompt engineering (crafting instructions and examples) is now a subset of context engineering—it's one tool in the toolkit. We still care a lot about phrasing and demonstrative examples, but we're also doing all these other things around them.
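Here's a small example of a few-shot prompt for structured JSON output; the task and field names are invented for illustration:

```python
# Invented example task: convert bug reports to JSON. The two worked examples
# teach the model the exact output shape before it sees the new input.
FEW_SHOT_HEADER = (
    "You are an expert assistant. Convert each bug report into JSON with "
    'keys "component", "severity", and "summary".\n\n'
    "Input: The login page crashes with a 500 error when the password is empty.\n"
    'Output: {"component": "auth", "severity": "high", "summary": "500 on empty password"}\n\n'
    "Input: Typo on the pricing page: 'anual' should be 'annual'.\n"
    'Output: {"component": "site", "severity": "low", "summary": "Typo: anual -> annual"}\n\n'
)

def make_prompt(report: str) -> str:
    return FEW_SHOT_HEADER + f"Input: {report}\nOutput:"
```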
Managing state and memory: Many applications involve multiple turns of interaction or long-running sessions. The context window isn't infinite, so a major part of context engineering is deciding how to handle conversation history or intermediate results. A common technique is summary compression—after every few interactions, summarize them and use the summary going forward instead of the full text. For example, Anthropic's Claude assistant automatically does this when conversations get lengthy, to avoid context overflow. (You'll see it produce a "[Summary of previous discussion]" that condenses earlier turns.) Another tactic is to explicitly write important facts to an external store (a file, database, etc.) and then later retrieve them when needed rather than carrying them in every prompt. This is like an external memory. Some advanced agent frameworks even let the LLM generate "notes to self" that get stored and can be recalled in future steps. The art here is figuring out what to keep, when to summarize, and how to resurface past information at the right moment. Done well, it lets an AI maintain coherence over very long tasks—something that pure prompting would struggle with.
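A minimal sketch of summary compression, assuming a generic `llm(prompt)` callable (any chat-completion wrapper would do):

```python
# Minimal sketch of summary compression; `llm` is any prompt-in, text-out callable.
def compress_history(turns: list[str], llm, max_recent: int = 10) -> list[str]:
    if len(turns) <= max_recent:
        return turns
    older, recent = turns[:-max_recent], turns[-max_recent:]
    summary = llm(
        "Summarize this conversation, keeping decisions, open questions, "
        "and facts that will be needed later:\n\n" + "\n".join(older)
    )
    # Carry forward the condensed summary instead of the full transcript.
    return [f"[Summary of previous discussion]\n{summary}"] + recent
```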
Tool use and environmental context: Modern AI agents can use tools (e.g., calling APIs, running code, web browsing) as part of their operations. When they do, each tool's output becomes new context for the next model call. Context engineering in this scenario means instructing the model when and how to use tools and then feeding the results back in. For example, an agent might have a rule: "If the user asks a math question, call the calculator tool." After using it, the result (say 42) is inserted into the prompt: "Tool output: 42." This requires formatting the tool output clearly and maybe adding a follow-up instruction like "Given this result, now answer the user's question." A lot of work in agent frameworks (LangChain, etc.) is essentially context engineering around tool use—giving the model a list of available tools, along with syntactic guidelines for invoking them, and templating how to incorporate results. The key is that you, the engineer, orchestrate this dialogue between the model and the external world.
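Here's a toy sketch of that round trip with a single calculator tool; `llm()` is again a hypothetical model call, and the `CALL` protocol is invented for illustration:

```python
# Toy sketch of a single tool round trip. `llm` is a hypothetical model call,
# and the CALL protocol is invented for illustration.
import re

def calculator(expr: str) -> str:
    # Demo only: never eval untrusted input in real code.
    return str(eval(expr, {"__builtins__": {}}))

def answer_with_tools(question: str, llm) -> str:
    prompt = (
        "If the question needs arithmetic, reply exactly: CALL calculator: <expression>\n"
        f"Question: {question}"
    )
    reply = llm(prompt)
    match = re.match(r"CALL calculator:\s*(.+)", reply)
    if match:
        result = calculator(match.group(1))
        # Feed the tool result back in as new context, with a follow-up instruction.
        prompt += f"\nTool output: {result}\nGiven this result, now answer the user's question."
        reply = llm(prompt)
    return reply
```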
Information formatting and packaging: We've touched on this, but it deserves emphasis. Often you have more information than fits or is useful to include fully. So you compress or format it. If your model is writing code and you have a large codebase, you might include just function signatures or docstrings rather than whole files, to give it context. If the user query is verbose, you might highlight the main question at the end to focus the model. Use headings, code blocks, tables—whatever structure best communicates the data. For example, rather than "User data: [massive JSON]… Now answer question." you might extract the few fields needed and present "User's Name: X, Account Created: Y, Last Login: Z." This is easier for the model to parse and also uses fewer tokens. In short, think like a UX designer, but your "user" is the LLM—design the prompt for its consumption.
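For example, a sketch of distilling a large user record down to the three fields the model actually needs (field names assumed):

```python
# Sketch: distill a big user record down to just the fields the model needs.
# The field names are assumed for illustration.
import json

def format_user_context(raw_json: str) -> str:
    user = json.loads(raw_json)
    return (
        f"User's Name: {user.get('name')}\n"
        f"Account Created: {user.get('created_at')}\n"
        f"Last Login: {user.get('last_login')}"
    )
```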
The impact of these techniques is huge. When you see an impressive LLM demo solving a complex task (say, debugging code or planning a multistep process), you can bet it wasn't just a single clever prompt behind the scenes. There was a pipeline of context assembly enabling it.
For instance, an AI pair programmer might implement a workflow like the following (a minimal sketch in code follows the list):
- Search the codebase for relevant code.
- Include those code snippets in the prompt with the user's request.
- If the model proposes a fix, run tests in the background.
- If tests fail, feed the failure output back into the prompt for the model to refine its solution.
- Loop until tests pass.
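Assuming `search_code`, `llm`, and `run_tests` as placeholders for your retrieval step, model call, and test harness, the loop might look like this:

```python
# Minimal sketch of the loop above. `search_code`, `llm`, and `run_tests` are
# placeholders for your retrieval step, model call, and test harness.
def fix_bug(request: str, search_code, llm, run_tests, max_iters: int = 5) -> str:
    snippets = search_code(request)            # 1. retrieve relevant code
    feedback = ""
    for _ in range(max_iters):
        prompt = (
            f"User request: {request}\n\n"
            f"Relevant code:\n{snippets}\n"    # 2. snippets + request in prompt
            + (f"\nPrevious attempt failed with:\n{feedback}\n" if feedback else "")
            + "\nPropose a fix as a unified diff."
        )
        patch = llm(prompt)
        passed, output = run_tests(patch)      # 3. run tests on the proposed fix
        if passed:
            return patch                       # 5. loop ends when tests pass
        feedback = output                      # 4. feed failures back into context
    raise RuntimeError("No passing fix within the iteration budget")
```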
Each step has carefully engineered context: The search results, the test outputs, etc., are each fed into the model in a controlled way. It's a far cry from "just prompt an LLM to fix my bug" and hoping for the best.
The Challenge of Context Rot
As we get better at assembling rich context, we run into a new problem: Context can actually poison itself over time. This phenomenon, aptly termed "context rot" by developer Workaccount2 on Hacker News, describes how context quality degrades as conversations grow longer and accumulate distractions, dead ends, and low-quality information.
The pattern is frustratingly common: You start a session with a well-crafted context and clear instructions. The AI performs beautifully at first. But as the conversation continues—especially if there are false starts, debugging attempts, or exploratory rabbit holes—the context window fills with increasingly noisy information. The model's responses gradually become less accurate and more confused, or it starts hallucinating.

Why does this happen? Context windows aren't just storage—they're the model's working memory. When that memory gets cluttered with failed attempts, contradictory information, or tangential discussions, it's like trying to work at a desk covered in old drafts and unrelated papers. The model struggles to identify what's currently relevant versus what's historical noise. Earlier errors in the conversation can compound, creating a feedback loop where the model references its own poor outputs and spirals further off track.
This is especially problematic in iterative workflows—exactly the kind of complex tasks where context engineering shines. Debugging sessions, code refactoring, document editing, or research projects naturally involve false starts and course corrections. But each failed attempt leaves traces in the context that can interfere with subsequent reasoning.
Practical strategies for managing context rot include:
- Context pruning and refresh: Workaccount2's solution is "I work around it by regularly making summaries of instances, and then spinning up a new instance with fresh context and feed in the summary of the previous instance." This approach preserves the essential state while discarding the noise. You're essentially doing garbage collection for your context. (A small sketch of this pattern follows the list.)
- Structured context boundaries: Use clear markers to separate different phases of work. For example, explicitly mark sections as "Previous attempts (for reference only)" versus "Current working context." This helps the model understand what to prioritize.
- Progressive context refinement: After significant progress, consciously rebuild the context from scratch. Extract the key decisions, successful approaches, and current state, then start fresh. It's like refactoring code—occasionally you need to clean up the accumulated cruft.
- Checkpoint summaries: At regular intervals, have the model summarize what's been accomplished and what the current state is. Use these summaries as seeds for fresh context when starting new sessions.
- Context windowing: For very long tasks, break them into phases with natural boundaries where you can reset context. Each phase gets a clean start with only the essential carry-over from the previous phase.
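A minimal sketch of the pruning-and-refresh pattern referenced above, again assuming a generic `llm(prompt)` callable:

```python
# Minimal sketch of pruning and refresh ("garbage collection" for context);
# `llm` is again a generic prompt-in, text-out callable.
def refresh_session(history: list[str], llm) -> list[str]:
    checkpoint = llm(
        "Summarize this working session. Keep key decisions, successful "
        "approaches, and the current state. Drop failed attempts and dead ends.\n\n"
        + "\n".join(history)
    )
    # Seed a brand-new session with only the distilled state.
    return [f"Current working context (checkpoint):\n{checkpoint}"]
```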
This challenge also highlights why "just dump everything into the context" isn't a viable long-term strategy. Like good software architecture, good context engineering requires intentional information management—deciding not just what to include but also when to exclude, summarize, or refresh.
AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, Coding for the Agentic World, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you'll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It's free to attend. Register now to save your seat.