Context engineering is the science of providing LLMs with the right context to maximize performance. When you work with LLMs, you typically create a system prompt, asking the LLM to perform a certain task. However, when working with LLMs from a programmer's perspective, there are more elements to consider. You have to determine what other data you can feed your LLM to improve its ability to perform the task you asked it to do.
In this article, I'll discuss the science of context engineering and how you can apply context engineering techniques to improve your LLM's performance.
You can also read my articles on Reliability for LLM Applications and Document QA using Multimodal LLMs.
Definition
Before I start, it's important to define the term context engineering. Context engineering is essentially the science of deciding what to feed into your LLM. This could, for example, be:
- The system prompt, which tells the LLM how to act
- Document data fetched using RAG vector search
- Few-shot examples
- Tools
The closest previous description of this has been the term prompt engineering. However, prompt engineering is a less descriptive term, considering it implies only changing the system prompt you are feeding the LLM. To get maximum performance out of your LLM, you have to consider all the context you are feeding into it, not only the system prompt.
Motivation
My initial motivation for this article came from reading this Tweet by Andrej Karpathy.
I really agreed with the point Andrej made in this tweet. Prompt engineering is definitely an important science when working with LLMs. However, prompt engineering doesn't cover everything we input into LLMs. In addition to the system prompt you write, you also have to consider elements such as:
- Which data you should insert into your prompt
- How you fetch that data
- How to only provide relevant information to the LLM
- Etc.
I'll discuss all of these points throughout this article.
API vs Console usage
One important distinction to clarify is whether you are using the LLMs from an API (calling them with code), or via the console (for example, via the ChatGPT website or application). Context engineering is definitely important when working with LLMs through the console; however, my focus in this article will be on API usage. The reason for this is that when using an API, you have more options for dynamically changing the context you are feeding the LLM. For example, you can do RAG, where you first perform a vector search, and only feed the LLM the most important bits of information, rather than the entire database.
These dynamic changes are not available in the same way when interacting with LLMs through the console; thus, I'll focus on using LLMs through an API.
Context engineering techniques
Zero-shot prompting
Zero-shot prompting is the baseline for context engineering. Doing a task zero-shot means the LLM is performing a task it hasn't seen before. You are essentially only providing a task description as context for the LLM. For example, providing an LLM with a long text and asking it to classify the text into class A or B, according to some definition of the classes. The context (prompt) you are feeding the LLM might look something like this:
You are an expert text classifier, and tasked with classifying texts into
class A or class B.
- Class A: The text contains a positive sentiment
- Class B: The text contains a negative sentiment
Classify the text: {text}
Depending on the task, this can work very well. LLMs are generalists and are able to perform most simple text-based tasks. Classifying a text into one of two classes will usually be a simple task, and zero-shot prompting will thus usually work quite well.
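When calling an LLM from code, the zero-shot context is typically assembled with a simple template. The sketch below shows one way to do this; `build_zero_shot_prompt` is a hypothetical helper for illustration, not part of any library, and sending the resulting string to your LLM API is left out:

```python
# Zero-shot prompting: the only context is a task description plus the
# input text. The template mirrors the prompt shown above.
ZERO_SHOT_TEMPLATE = """You are an expert text classifier, and tasked with classifying texts into
class A or class B.
- Class A: The text contains a positive sentiment
- Class B: The text contains a negative sentiment
Classify the text: {text}"""


def build_zero_shot_prompt(text: str) -> str:
    """Fill the task-description template with the text to classify."""
    return ZERO_SHOT_TEMPLATE.format(text=text)


prompt = build_zero_shot_prompt("I loved this product!")
# `prompt` would then be sent to the LLM via whatever API you use.
```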
Few-shot prompting
This infographic highlights how to perform few-shot prompting:

The follow-up from zero-shot prompting is few-shot prompting. With few-shot prompting, you provide the LLM with a prompt similar to the one above, but you also provide it with examples of the task it will perform. This added context will help the LLM improve at performing the task. Following up on the prompt above, a few-shot prompt might look like:
You are an expert text classifier, and tasked with classifying texts into
class A or class B.
- Class A: The text contains a positive sentiment
- Class B: The text contains a negative sentiment
{text 1} -> Class A
{text 2} -> Class B
Classify the text: {text}
You can see I've provided the model with some examples of the task.
Few-shot prompting works well because you are providing the model with examples of the task you are asking it to perform. This usually increases performance.
You can imagine this works well on humans as well. If you ask a human to do a task they've never done before, just by describing the task, they might perform decently (of course, depending on the difficulty of the task). However, if you also provide the human with examples, their performance will usually improve.
Overall, I find it useful to think of LLM prompts as if I'm asking a human to perform a task. Imagine that instead of prompting an LLM, you simply provide the text to a human, and you ask yourself the question:
Given this prompt, and no other context, will the human be able to perform the task?
If the answer is no, you should work on clarifying and improving your prompt.
I also want to mention dynamic few-shot prompting, considering it's a technique I've had a lot of success with. Traditionally, with few-shot prompting, you have a fixed list of examples you feed into every prompt. However, you can often achieve higher performance using dynamic few-shot prompting.
Dynamic few-shot prompting means selecting the few-shot examples dynamically when creating the prompt for a task. For example, suppose you are asked to classify a text into classes A and B, and you already have a list of 200 texts and their corresponding labels. You can then perform a similarity search between the new text you are classifying and the example texts you already have. Continuing, you can measure the vector similarity between the texts and only choose the most similar texts (out of the 200 texts) to feed into your prompt as context. This way, you are providing the model with more relevant examples of how to perform the task.
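A minimal sketch of dynamic few-shot selection follows. A real system would embed the texts and rank by cosine similarity over vectors; here the standard library's `SequenceMatcher` stands in as a toy similarity measure, and the four labeled examples are invented, so the whole thing stays self-contained:

```python
from difflib import SequenceMatcher

# A stand-in for the list of already-labeled texts (invented for illustration).
labeled_examples = [
    ("The service was excellent and fast", "Class A"),
    ("I will never shop here again", "Class B"),
    ("Absolutely loved the experience", "Class A"),
    ("The product broke after one day", "Class B"),
]


def select_examples(new_text, examples, k=2):
    """Pick the k labeled examples most similar to the text being classified."""
    return sorted(
        examples,
        key=lambda ex: SequenceMatcher(None, new_text, ex[0]).ratio(),
        reverse=True,
    )[:k]


def build_few_shot_prompt(new_text, examples):
    """Prepend the selected examples, then ask for the new classification."""
    shots = "\n".join(f"{text} -> {label}" for text, label in examples)
    return f"{shots}\nClassify the text: {new_text}"


chosen = select_examples("I loved the experience", labeled_examples)
prompt = build_few_shot_prompt("I loved the experience", chosen)
```

Swapping `SequenceMatcher` for real embedding vectors changes only `select_examples`; the overall shape of the technique stays the same.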
RAG
Retrieval augmented generation (RAG) is a well-known technique for increasing the knowledge of LLMs. Assume you already have a database consisting of thousands of documents. You now receive a question from a user, and have to answer it, given the knowledge within your database.
Unfortunately, you can't feed the entire database into the LLM. Even though we now have LLMs such as Llama 4 Scout with a 10-million token context window, databases are usually much larger. You therefore have to find the most relevant information in the database to feed into your LLM. RAG does this similarly to dynamic few-shot prompting:
- Perform a vector search
- Find the documents most similar to the user question (the most similar documents are assumed to be the most relevant)
- Ask the LLM to answer the question, given the most similar documents
By performing RAG, you are doing context engineering by only providing the LLM with the most relevant data for performing its task. To improve the performance of the LLM, you can work on the context engineering by improving your RAG search. This can, for example, be done by improving the search to find only the most relevant documents.
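The retrieval steps above can be sketched end-to-end. The bag-of-words `embed` function below is a toy stand-in for a real embedding model, and the three documents are invented, but the shape of the technique (vector similarity search, then feeding only the top documents to the LLM) is the same:

```python
import math
import re
from collections import Counter

# A stand-in for the document database (invented for illustration).
documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping to Europe usually takes five to seven business days.",
    "Support is available by email around the clock.",
]


def embed(text):
    # Toy "embedding": bag-of-words counts instead of a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=1):
    """Vector search: return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


question = "What is the refund policy for returns?"
context = retrieve(question, documents)
prompt = f"Answer the question given this context:\n{context[0]}\n\nQuestion: {question}"
```

Only the retrieved document reaches the prompt; the rest of the database never consumes context length.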
You can read more about RAG in my article about creating a RAG system for your personal data:
Tools (MCP)
You can also provide the LLM with tools to call, which is an important part of context engineering, especially now that we see the rise of AI agents. Tool calling today is often done using the Model Context Protocol (MCP), a concept started by Anthropic.
AI agents are LLMs capable of calling tools and thus performing actions. An example of this could be a weather agent. If you ask an LLM without access to tools about the weather in New York, it will not be able to provide an accurate response. The reason for this is naturally that information about the weather needs to be fetched in real time. To do this, you can, for example, give the LLM a tool such as:
@tool
def get_weather(city):
    # code to retrieve the current weather for a city
    return weather
If you give the LLM access to this tool and ask it about the weather, it can then search for the weather for a city and give you an accurate response.
Providing tools for LLMs is incredibly important, as it significantly enhances the abilities of the LLM. Other examples of tools are:
- Searching the internet
- A calculator
- Searching via the Twitter API
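To make the snippet above concrete, here is one way the dispatch side of tool calling can look. The `@tool` decorator, the registry, and the hardcoded weather response are all assumptions for illustration; in practice a framework or an MCP server handles this plumbing and the model returns the tool call as structured output:

```python
# Registry mapping tool names to callables the "model" may invoke.
TOOLS = {}


def tool(fn):
    """Decorator that registers a function as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> str:
    # A real tool would call a weather API; hardcoded for the sketch.
    return f"Sunny, 25C in {city}"


def run_tool_call(call):
    """Dispatch a tool call of the form {'name': ..., 'arguments': {...}}."""
    return TOOLS[call["name"]](**call["arguments"])


# Pretend the LLM responded with this structured tool call:
result = run_tool_call({"name": "get_weather", "arguments": {"city": "New York"}})
```

The tool's result is then fed back into the LLM's context so it can formulate its final answer.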
Topics to consider
In this section, I make a few notes on what you should consider when creating the context to feed into your LLM.
Utilization of context length
The context length of an LLM is an important consideration. As of July 2025, you can feed most frontier model LLMs with over 100,000 input tokens. This provides you with a lot of options for how to utilize this context. You have to consider the tradeoff between:
- Including a lot of information in a prompt, thus risking some of the information getting lost in the context
- Missing some important information in the prompt, thus risking the LLM not having the required context to perform a specific task
Usually, the only way to figure out the balance is to test your LLM's performance. For example, with a classification task, you can check the accuracy given different prompts.
If I discover the context to be too long for the LLM to work effectively, I sometimes split a task into several prompts. For example, having one prompt summarize a text, and a second prompt classify the text summary. This can help the LLM utilize its context effectively and thus improve performance.
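A sketch of this two-prompt chain is shown below. `call_llm` is a hypothetical placeholder for your actual LLM API call; here it just echoes the instruction so the example runs without credentials:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: in practice, send `prompt` to your LLM API and return its reply.
    return f"<llm response to: {prompt.splitlines()[0]}>"


def summarize_then_classify(long_text: str) -> str:
    """Split one long-context task into two shorter prompts."""
    # Prompt 1: compress the long input into a short summary.
    summary = call_llm(f"Summarize the following text in two sentences:\n{long_text}")
    # Prompt 2: classify using only the summary, keeping the context short.
    return call_llm(f"Classify this summary as Class A or Class B:\n{summary}")


label = summarize_then_classify("The movie was long but the ending made it worth watching.")
```

Each prompt in the chain sees a short, focused context instead of one overloaded prompt.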
Furthermore, providing too much context to the model can have a significant downside, as I describe in the next section.
Context rot
Last week, I read an interesting article about context rot. The article discussed the fact that increasing the context length lowers LLM performance, even though the task difficulty does not increase. This implies that:
Providing an LLM with irrelevant information will decrease its ability to perform tasks successfully, even when task difficulty does not increase.
The point here is essentially that you should only provide relevant information to your LLM. Providing other information decreases LLM performance (i.e., performance is not invariant to input length).
Conclusion
On this article, I’ve mentioned the subject of context engineering, which is the method of offering an LLM with the precise context to carry out its activity successfully. There are a number of methods you may make the most of to replenish the context, akin to few-shot prompting, RAG, and instruments. These are all highly effective methods you need to use to considerably enhance an LLM’s capability to carry out a activity successfully. Moreover, you even have to contemplate the truth that offering an LLM with an excessive amount of context additionally has downsides. Growing the variety of enter tokens reduces efficiency, as you would examine within the article about context rot.