# [Prompting for Instruction Following](https://chatgpt.com/canvas/shared/6825ebe022148191bceb9fa5473a34eb)
## Overview
GPT-4.1 represents a significant shift in how developers should structure prompts for reliable, predictable, and consistent behavior. Unlike earlier models, which often inferred intent liberally, GPT-4.1 adheres to instructions in a far more literal, detail-sensitive manner. This brings both increased control and greater responsibility for developers: well-designed prompts yield exceptional results, while ambiguous or conflicting instructions can produce brittle or unexpected behavior.
This guide outlines best practices, real-world examples, and design patterns to fully utilize GPT-4.1’s instruction-following improvements across a variety of applications. It is structured to help you:
* Understand GPT-4.1’s instruction handling behavior
* Design high-integrity prompt scaffolds
* Debug prompt failures and mitigate ambiguity
* Align instructions with OpenAI’s guidance around tool usage, task persistence, and planning
This file is designed to stand alone for practical use and is fully aligned with the broader `openai-cookbook-pro` repository.
## Why Instruction-Following Matters
Instruction following is central to:
* **Agent behavior**: models acting in multi-step environments must reliably interpret commands
* **Tool use**: execution hinges on clearly-defined tool invocation criteria
* **Support workflows**: factual grounding depends on accurate boundary adherence
* **Security and safety**: systems must not misinterpret prohibitions or fail to enforce policy constraints
With GPT-4.1’s shift toward literal interpretation, instruction scaffolding becomes the primary control interface.
## GPT-4.1 Instruction Characteristics
### 1. **Literal Compliance**
GPT-4.1 follows instructions with minimal assumption. If a step is missing or unclear, the model is less likely to “fill in” or guess the user’s intent.
* **Previous behavior**: interpreted vague prompts broadly
* **Current behavior**: waits for or requests clarification
This improves safety and traceability but also increases fragility in loosely written prompts.
### 2. **Order-Sensitive Resolution**
When instructions conflict, GPT-4.1 favors those listed **last** in the prompt. This means developers should order rules hierarchically:
* General rules go early
* Specific overrides go later
Example:
```markdown
# Instructions
- Do not guess if unsure
- Use your knowledge if a tool isn’t available
- If both options are available, prefer the tool
```
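The last-rule-wins behavior above can be illustrated with a small sketch. The `resolve` helper below is hypothetical, written only to mirror how a later, more specific rule overrides an earlier, general one:

```python
# Minimal sketch: given ordered rules, the last rule touching a topic
# "wins", mirroring GPT-4.1's order-sensitive conflict resolution.
def resolve(rules: list[str], topic: str) -> str:
    """Return the last rule mentioning `topic`; later rules override."""
    matches = [r for r in rules if topic in r.lower()]
    return matches[-1] if matches else ""

rules = [
    "Do not guess if unsure",
    "Use your knowledge if a tool isn't available",
    "If both options are available, prefer the tool",
]
print(resolve(rules, "tool"))  # the later, more specific rule wins
```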
### 3. **Format-Aware Behavior**
GPT-4.1 performs better with clearly formatted instructions. Prefer structured formats:
* Markdown with headers and lists
* XML with nested tags
* Structured sections like `# Steps`, `# Output Format`
Poorly formatted, unsegmented prompts lead to instruction bleed and undesired merging of behaviors.
## Recommended Prompt Structure
Organize your prompt using a structure that mirrors OpenAI’s internal evaluation standards.
### 📁 Standard Sections
```markdown
# Role and Objective
# Instructions
## Sub-categories for Specific Behavior
# Workflow Steps (Optional)
# Output Format
# Examples (Optional)
# Final Reminder
```
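These sections can also be assembled programmatically so the order stays fixed across revisions. The `build_prompt` helper and `SECTION_ORDER` list below are illustrative assumptions, not part of any SDK:

```python
# Sketch: assemble a prompt from named sections in a fixed canonical
# order, skipping any section that was not supplied.
SECTION_ORDER = [
    "Role and Objective",
    "Instructions",
    "Workflow Steps",
    "Output Format",
    "Examples",
    "Final Reminder",
]

def build_prompt(sections: dict[str, str]) -> str:
    parts = []
    for name in SECTION_ORDER:
        body = sections.get(name)
        if body:
            parts.append(f"# {name}\n{body.strip()}")
    return "\n\n".join(parts)

prompt = build_prompt({
    "Role and Objective": "You are a customer service assistant.",
    "Final Reminder": "Use provided tools and context only.",
})
```

Because omitted sections simply disappear, one template function can serve both minimal and fully scaffolded prompts.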
### Example Prompt Template
```markdown
# Role and Objective
You are a customer service assistant. Your job is to resolve user issues efficiently, using tools when needed.
# Instructions
- Greet the user politely.
- Use a tool before answering any account-related question.
- If unsure how to proceed, ask the user for clarification.
- If a user requests escalation, refer them to a human agent.
## Output Format
- Always use a friendly tone.
- Format your answer in plain text.
- Include a summary at the end of your response.
## Final Reminder
Do not rely on prior knowledge. Use provided tools and context only.
```
## Instruction Categories
### 1. **Task Definition**
Clearly state the model’s job in the opening lines. Be explicit:
✅ “You are an assistant that reviews and edits legal contracts.”
🚫 “Help with contracts.”
### 2. **Behavioral Constraints**
List what the model must or must not do:
* Must call tools before responding to factual queries
* Must ask for clarification if user input is incomplete
* Must not provide financial or legal advice
### 3. **Response Style**
Define tone, length, formality, and structure.
* “Keep responses under 250 words.”
* “Avoid lists unless asked.”
* “Use a neutral tone.”
### 4. **Tool Use Protocols**
Models may hallucinate tool calls or fabricate tool outputs unless explicitly guided:
* “If you don’t have enough information to use a tool, ask the user for more.”
* “Always confirm tool usage before responding.”
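On the application side, the first protocol can be enforced with a pre-call guard. The `ready_to_call` helper below is a hypothetical sketch, not an OpenAI API:

```python
# Hypothetical guard: refuse to invoke a tool until all required
# arguments are present; surface the missing ones so the model (or app)
# can ask the user instead of guessing.
def ready_to_call(required: list[str], provided: dict) -> tuple[bool, list[str]]:
    missing = [k for k in required if provided.get(k) in (None, "")]
    return (not missing, missing)

ok, missing = ready_to_call(["account_id", "date"], {"account_id": "A-123"})
# ok is False; missing == ["date"] -> ask the user before calling the tool
```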
## Debugging Instruction Failures
Instruction-following failures often stem from the following:
### Common Causes
* Ambiguous rule phrasing
* Conflicting instructions (e.g., both asking to guess and not guess)
* Implicit behaviors expected, not stated
* Overloaded instructions without formatting
### Diagnosis Steps
1. Read the full prompt in sequence
2. Identify potential ambiguity
3. Reorder to clarify precedence
4. Break complex rules into atomic steps
5. Test with structured evals
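Step 2 can be partially automated. The `find_conflicts` helper below is a deliberately crude, illustrative lint pass that only catches direct "do not X" / "X" contradictions between rule pairs:

```python
# Naive sketch: flag rule pairs where one rule is the direct negation
# of another ("do not guess ..." vs. "guess ...").
def find_conflicts(rules: list[str]) -> list[tuple[str, str]]:
    normalized = [r.lower().lstrip("- ").strip() for r in rules]
    conflicts = []
    for i, a in enumerate(normalized):
        for b in normalized[i + 1:]:
            for neg, pos in ((a, b), (b, a)):
                if (neg.startswith("do not ")
                        and neg[len("do not "):] in pos
                        and not pos.startswith("do not")):
                    conflicts.append((neg, pos))
    return conflicts

print(find_conflicts(["Do not guess if unsure", "Guess if unsure"]))
```

A real lint pass would need semantic matching; this only demonstrates that conflicting instructions are mechanically detectable.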
## Instruction Layering: The 3-Tier Model
When designing prompts for multi-step tasks, layer your instructions in tiers:
| Tier | Layer Purpose | Example |
| ---- | --------------------------- | ------------------------------------------ |
| 1 | Role Declaration | “You are an assistant for legal tasks.” |
| 2 | Global Behavior Constraints | “Always cite sources.” |
| 3 | Task-Specific Instructions | “In contracts, highlight ambiguous terms.” |
Each layer helps disambiguate behavior and provides a fallback structure if downstream instructions fail.
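The three tiers can be assembled into one prompt with a small sketch. The `layer_instructions` helper and its section headers are assumptions for illustration:

```python
# Sketch: emit the three tiers in order, so task-specific rules appear
# last and take precedence under order-sensitive resolution.
def layer_instructions(role: str, global_rules: list[str],
                       task_rules: list[str]) -> str:
    lines = ["# Role", role, "", "# Global Constraints"]
    lines += [f"- {r}" for r in global_rules]
    lines += ["", "# Task Instructions"]
    lines += [f"- {r}" for r in task_rules]
    return "\n".join(lines)

prompt = layer_instructions(
    "You are an assistant for legal tasks.",
    ["Always cite sources."],
    ["In contracts, highlight ambiguous terms."],
)
```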
## Long Context Instruction Handling
In prompts exceeding 50,000 tokens:
* Place **key instructions** both **before and after** the context.
* Use format anchors (`# Instructions`, `<rules>`) to signal boundaries.
* Avoid relying solely on the top-of-prompt instructions.
GPT-4.1 is trained to respect these placements, especially when consistent structure is maintained.
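The before-and-after placement can be sketched as a simple "sandwich" template. The `sandwich` helper and the `<context>` tag name below are illustrative assumptions:

```python
# Sketch: repeat key instructions before and after a long context block
# so neither end of the prompt is missed.
def sandwich(instructions: str, context: str) -> str:
    return (
        f"# Instructions\n{instructions}\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"# Instructions (repeated)\n{instructions}"
    )
```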
## Literal vs. Flexible Models
| Capability | GPT-3.5 / GPT-4-turbo | GPT-4.1 |
| ---------------------- | --------------------- | --------------- |
| Implicit inference | High | Low |
| Literal compliance | Moderate | High |
| Prompt flexibility | Higher tolerance | Lower tolerance |
| Instruction debug cost | Lower | Higher |
GPT-4.1 performs better **when prompts are precise**. Treat prompt engineering as API design — clear, testable, and version-controlled.
## Tips for Designing Instruction-Sensitive Prompts
### ✔️ DO:
* Use structured formatting
* Scope behaviors into separate bullet points
* Use examples to anchor expected output
* Rewrite ambiguous instructions into atomic steps
* Add conditionals explicitly (e.g., “if X, then Y”)
### ❌ DON’T:
* Assume the model will “understand what you meant”
* Use overloaded sentences with multiple actions
* Rely on invisible or implied rules
* Assume formatting styles (e.g., bullets) are optional
## Example: Instruction-Controlled Code Agent
```markdown
# Objective
You are a code assistant that fixes bugs in open-source projects.
# Instructions
- Always use the tools provided to inspect code.
- Do not make edits unless you have confirmed the bug’s root cause.
- If a change is proposed, validate using tests.
- Do not respond unless the patch is applied.
## Output Format
1. Description of bug
2. Explanation of root cause
3. Tool output (e.g., patch result)
4. Confirmation message
## Final Note
Do not guess. If you are unsure, use tools or ask.
```
> For a complete walkthrough, see `/examples/code-agent-instructions.md`
## Instruction Evolution Across Iterations
As your prompts grow, preserve instruction integrity using:
* Versioned templates
* Structured diffs for instruction edits
* Commented rules for traceability
Example diff:
```diff
- Always answer user questions.
+ Only answer user questions after validating tool output.
```
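The standard library's `difflib` can generate diffs like the one above for a prompt changelog:

```python
# Sketch: produce a unified diff between two prompt versions, suitable
# for a changelog entry.
import difflib

v1 = ["Always answer user questions."]
v2 = ["Only answer user questions after validating tool output."]
diff = list(difflib.unified_diff(v1, v2, lineterm=""))
# body lines carry "-" / "+" prefixes, matching the diff shown above
```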
Maintain a changelog for prompts as you would with source code. This ensures instructional integrity during collaborative development.
## Testing and Evaluation
Prompt engineering is empirical. Validate instruction design using:
* **A/B tests**: Compare variants with and without behavioral scaffolds
* **Prompt evals**: Use deterministic queries to test edge case behavior
* **Behavioral matrices**: Track compliance with instruction categories
Example matrix:
| Instruction Category | Prompt A Pass | Prompt B Pass |
| -------------------- | ------------- | ------------- |
| Ask if unsure | ✅ | ❌ |
| Use tools first | ✅ | ✅ |
| Avoid sensitive data | ❌ | ✅ |
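A matrix like this can be scored programmatically. The sketch below assumes each cell is a boolean pass/fail (matching the ✅/❌ cells above):

```python
# Sketch: per-variant pass rates over a behavioral matrix.
matrix = {
    "Ask if unsure":        {"A": True,  "B": False},
    "Use tools first":      {"A": True,  "B": True},
    "Avoid sensitive data": {"A": False, "B": True},
}

def pass_rate(results_by_rule: dict, variant: str) -> float:
    results = [row[variant] for row in results_by_rule.values()]
    return sum(results) / len(results)  # booleans sum as 0/1
```

Tracking these rates across prompt versions turns instruction compliance into a regression metric.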
## Final Reminders
GPT-4.1 is exceptionally effective **when paired with well-structured, comprehensive instructions**. Follow these principles:
* Instructions should be modular and auditable.
* Avoid unnecessary repetition, but reinforce critical rules.
* Use formatting styles that clearly separate content.
* Assume literalism — write prompts as if programming a function, not chatting with a person.
Every prompt is a contract. GPT-4.1 honors that contract, but only if written clearly.
## See Also
* [`Agent Workflows`](../agent_design/swe_bench_agent.md)
* [`Prompt Format Reference`](../reference/prompting_guide.md)
* [`Long Context Strategies`](../examples/long-context-formatting.md)
* [`OpenAI 4.1 Prompting Guide`](https://platform.openai.com/docs/guides/prompting)
For questions, suggestions, or prompt design contributions, submit a pull request to `/examples/instruction-following.md` or open an issue in the main repo.