# [Prompting for Instruction Following](https://chatgpt.com/canvas/shared/6825ebe022148191bceb9fa5473a34eb)
## Overview
GPT-4.1 represents a significant shift in how developers should structure prompts for reliable, deterministic, and consistent behavior. Unlike earlier models, which often inferred intent liberally, GPT-4.1 adheres to instructions in a far more literal, detail-sensitive manner. This brings both increased control and greater responsibility for developers: well-designed prompts yield exceptional results, while ambiguous or conflicting instructions can produce brittle or unexpected behavior.
This guide outlines best practices, real-world examples, and design patterns to fully utilize GPT-4.1’s instruction-following improvements across a variety of applications. It is structured to help you:
* Understand GPT-4.1’s instruction handling behavior
* Design high-integrity prompt scaffolds
* Debug prompt failures and mitigate ambiguity
* Align instructions with OpenAI’s guidance around tool usage, task persistence, and planning
This file is designed to stand alone for practical use and is fully aligned with the broader `openai-cookbook-pro` repository.
## Why Instruction-Following Matters
Instruction following is central to:
* **Agent behavior**: models acting in multi-step environments must reliably interpret commands
* **Tool use**: execution hinges on clearly defined tool invocation criteria
* **Support workflows**: factual grounding depends on accurate boundary adherence
* **Security and safety**: systems must not misinterpret prohibitions or fail to enforce policy constraints
With GPT-4.1’s shift toward literal interpretation, instruction scaffolding becomes the primary control interface.
## GPT-4.1 Instruction Characteristics
### 1. **Literal Compliance**
GPT-4.1 follows instructions with minimal assumption. If a step is missing or unclear, the model is less likely to “fill in” or guess the user’s intent.
* **Previous behavior**: interpreted vague prompts broadly
* **Current behavior**: waits for or requests clarification
This improves safety and traceability but also increases fragility in loosely written prompts.
### 2. **Order-Sensitive Resolution**
When instructions conflict, GPT-4.1 favors those listed **last** in the prompt. This means developers should order rules hierarchically:
* General rules go early
* Specific overrides go later
Example:
```markdown
# Instructions
- Do not guess if unsure
- Use your knowledge if a tool isn’t available
- If both options are available, prefer the tool
```
### 3. **Format-Aware Behavior**
GPT-4.1 performs better with clearly formatted instructions. Prefer structured formats:
* Markdown with headers and lists
* XML with nested tags
* Structured sections like `# Steps`, `# Output Format`
Poorly formatted, unsegmented prompts lead to instruction bleed and undesired merging of behaviors.
## Recommended Prompt Structure
Organize your prompt using a structure that mirrors the section layout recommended in OpenAI’s GPT-4.1 prompting guide.
### 📁 Standard Sections
```markdown
# Role and Objective
# Instructions
## Sub-categories for Specific Behavior
# Workflow Steps (Optional)
# Output Format
# Examples (Optional)
# Final Reminder
```
### Example Prompt Template
```markdown
# Role and Objective
You are a customer service assistant. Your job is to resolve user issues efficiently, using tools when needed.
# Instructions
- Greet the user politely.
- Use a tool before answering any account-related question.
- If unsure how to proceed, ask the user for clarification.
- If a user requests escalation, refer them to a human agent.
# Output Format
- Always use a friendly tone.
- Format your answer in plain text.
- Include a summary at the end of your response.
# Final Reminder
Do not rely on prior knowledge. Use provided tools and context only.
```
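To wire a template like this into an application, pass it as the system message of a request. Below is a minimal sketch using the official `openai` Python SDK; the model identifier and user message are illustrative, not prescriptive.

```python
# Minimal sketch: sending the template above as a system message.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """\
# Role and Objective
You are a customer service assistant. Your job is to resolve user issues efficiently, using tools when needed.

# Instructions
- Greet the user politely.
- Use a tool before answering any account-related question.
- If unsure how to proceed, ask the user for clarification.
"""

response = client.chat.completions.create(
    model="gpt-4.1",  # assumed model identifier
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Why was my card declined?"},
    ],
)
print(response.choices[0].message.content)
```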
## Instruction Categories
### 1. **Task Definition**
Clearly state the model’s job in the opening lines. Be explicit:
✅ “You are an assistant that reviews and edits legal contracts.”
🚫 “Help with contracts.”
### 2. **Behavioral Constraints**
List what the model must or must not do:
* Must call tools before responding to factual queries
* Must ask for clarification if user input is incomplete
* Must not provide financial or legal advice
### 3. **Response Style**
Define tone, length, formality, and structure.
* “Keep responses under 250 words.”
* “Avoid lists unless asked.”
* “Use a neutral tone.”
### 4. **Tool Use Protocols**
Models may hallucinate tool calls or fabricate tool inputs unless explicitly guided (see the sketch after this list):
* “If you don’t have enough information to use a tool, ask the user for more.”
* “Always confirm tool usage before responding.”
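Declaring tools through the API’s `tools` parameter, instead of describing them in prose, gives the model a schema it can call rather than invent. The sketch below uses the `openai` Python SDK’s function-calling interface; the `get_account_status` tool and its fields are hypothetical.

```python
# Sketch: register a tool schema so the model calls it instead of guessing.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_account_status",  # hypothetical tool
            "description": "Look up the current status of a customer account.",
            "parameters": {
                "type": "object",
                "properties": {
                    "account_id": {
                        "type": "string",
                        "description": "The customer's account identifier.",
                    }
                },
                "required": ["account_id"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4.1",  # assumed model identifier
    messages=[
        {
            "role": "system",
            "content": "Use a tool before answering account questions. "
            "If you lack the information a tool needs, ask the user for it.",
        },
        {"role": "user", "content": "Is my account active?"},
    ],
    tools=tools,
)

# Without an account_id, a well-instructed model should ask for one
# rather than fabricating tool arguments.
message = response.choices[0].message
print(message.tool_calls or message.content)
```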
## Debugging Instruction Failures
Instruction-following failures usually stem from a handful of prompt-design issues.
### Common Causes
* Ambiguous rule phrasing
* Conflicting instructions (e.g., instructing the model both to guess and not to guess)
* Implicit behaviors expected, not stated
* Overloaded instructions without formatting
### Diagnosis Steps
1. Read the full prompt in sequence
2. Identify potential ambiguity
3. Reorder to clarify precedence
4. Break complex rules into atomic steps
5. Test with structured evals (see the sketch below)
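For step 5, a deterministic mini-eval can pair fixed inputs with predicates the reply must satisfy. This is a sketch, not a full eval framework; the cases and checks are illustrative, and `temperature=0` reduces but does not eliminate nondeterminism.

```python
# Sketch: a tiny deterministic eval for instruction compliance.
from openai import OpenAI

client = OpenAI()

# Each case: (fixed user input, predicate the reply must satisfy).
CASES = [
    ("Fix my account.", lambda reply: "?" in reply),                  # should ask a clarifying question
    ("I want to escalate.", lambda reply: "human" in reply.lower()),  # should mention a human agent
]

def run_eval(system_prompt: str) -> float:
    passed = 0
    for user_input, check in CASES:
        reply = client.chat.completions.create(
            model="gpt-4.1",  # assumed model identifier
            temperature=0,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_input},
            ],
        ).choices[0].message.content
        passed += bool(check(reply))
    return passed / len(CASES)

print(f"Pass rate: {run_eval('Ask for clarification when input is incomplete.'):.0%}")
```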
## Instruction Layering: The 3-Tier Model
When designing prompts for multi-step tasks, layer your instructions in tiers:
| Tier | Layer Purpose | Example |
| ---- | --------------------------- | ------------------------------------------ |
| 1 | Role Declaration | “You are an assistant for legal tasks.” |
| 2 | Global Behavior Constraints | “Always cite sources.” |
| 3 | Task-Specific Instructions | “In contracts, highlight ambiguous terms.” |
Each layer helps disambiguate behavior and provides a fallback structure if downstream instructions fail.
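In code, the tiers can live as separate strings and be assembled at request time, which keeps each layer independently versionable and testable. A minimal sketch, with all strings illustrative:

```python
# Sketch: compose a system prompt from the three tiers.
TIER_1_ROLE = "You are an assistant for legal tasks."
TIER_2_GLOBAL = "- Always cite sources.\n- Ask for clarification if a request is ambiguous."
TIER_3_TASK = "- In contracts, highlight ambiguous terms."

def build_system_prompt(role: str, global_rules: str, task_rules: str) -> str:
    return (
        f"# Role and Objective\n{role}\n\n"
        f"# Instructions\n{global_rules}\n\n"
        f"## Task-Specific Instructions\n{task_rules}\n"
    )

print(build_system_prompt(TIER_1_ROLE, TIER_2_GLOBAL, TIER_3_TASK))
```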
## Long Context Instruction Handling
In prompts exceeding 50,000 tokens:
* Place **key instructions** both **before and after** the context.
* Use format anchors (`# Instructions`, `<rules>`) to signal boundaries.
* Avoid relying solely on the top-of-prompt instructions.
GPT-4.1 is trained to respect these placements, especially when consistent structure is maintained.
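The sandwich layout can be built mechanically. The sketch below repeats the instruction block before and after the context and uses `<context>` tags as format anchors; the rules and placeholder document are illustrative.

```python
# Sketch: place key instructions both before and after a long context block.
INSTRUCTIONS = """\
# Instructions
- Answer only from the material inside <context>.
- If the answer is not in the context, say so explicitly.
"""

def build_long_context_prompt(document: str, question: str) -> str:
    return (
        f"{INSTRUCTIONS}\n"
        f"<context>\n{document}\n</context>\n\n"
        f"{INSTRUCTIONS}\n"  # repeated after the context, per the guidance above
        f"# Question\n{question}\n"
    )

document = "(stand-in for tens of thousands of tokens of source text)"
print(build_long_context_prompt(document, "Who owns the IP under this contract?"))
```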
## Literal vs. Flexible Models
| Capability | GPT-3.5 / GPT-4-turbo | GPT-4.1 |
| ---------------------- | --------------------- | --------------- |
| Implicit inference | High | Low |
| Literal compliance | Moderate | High |
| Prompt flexibility | Higher tolerance | Lower tolerance |
| Instruction debug cost | Lower | Higher |
GPT-4.1 performs better **when prompts are precise**. Treat prompt engineering as API design — clear, testable, and version-controlled.
## Tips for Designing Instruction-Sensitive Prompts
### ✔️ DO:
* Use structured formatting
* Scope behaviors into separate bullet points
* Use examples to anchor expected output
* Rewrite ambiguous instructions into atomic steps
* Add conditionals explicitly (e.g., “if X, then Y”)
### ❌ DON’T:
* Assume the model will “understand what you meant”
* Use overloaded sentences with multiple actions
* Rely on invisible or implied rules
* Assume formatting styles (e.g., bullets) are optional
## Example: Instruction-Controlled Code Agent
```markdown
# Objective
You are a code assistant that fixes bugs in open-source projects.
# Instructions
- Always use the tools provided to inspect code.
- Do not make edits unless you have confirmed the bug’s root cause.
- If a change is proposed, validate using tests.
- Do not respond unless the patch is applied.
# Output Format
1. Description of bug
2. Explanation of root cause
3. Tool output (e.g., patch result)
4. Confirmation message
# Final Note
Do not guess. If you are unsure, use tools or ask.
```
> For a complete walkthrough, see `/examples/code-agent-instructions.md`
## Instruction Evolution Across Iterations
As your prompts grow, preserve instruction integrity using:
* Versioned templates
* Structured diffs for instruction edits
* Commented rules for traceability
Example diff:
```diff
- Always answer user questions.
+ Only answer user questions after validating tool output.
```
Maintain a changelog for prompts as you would with source code. This ensures instructional integrity during collaborative development.
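One lightweight way to get versioning and diffability together is a registry of immutable prompt versions kept next to the changelog. A sketch, with structure and names illustrative:

```python
# Sketch: edits land as new versions, never as mutations of old ones.
PROMPTS = {
    "support-agent": {
        "1.0.0": "Always answer user questions.",
        "1.1.0": "Only answer user questions after validating tool output.",
    }
}

def get_prompt(name: str, version: str | None = None) -> str:
    versions = PROMPTS[name]
    if version is None:
        version = max(versions)  # equal-width semver strings sort lexicographically
    return versions[version]

assert get_prompt("support-agent") == get_prompt("support-agent", "1.1.0")
```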
## Testing and Evaluation
Prompt engineering is empirical. Validate instruction design using:
* **A/B tests**: Compare variants with and without behavioral scaffolds
* **Prompt evals**: Use deterministic queries to test edge case behavior
* **Behavioral matrices**: Track compliance with instruction categories
Example matrix:
| Instruction Category | Prompt A Pass | Prompt B Pass |
| -------------------- | ------------- | ------------- |
| Ask if unsure | ✅ | ❌ |
| Use tools first | ✅ | ✅ |
| Avoid sensitive data | ❌ | ✅ |
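A matrix like this can be generated rather than filled in by hand. The sketch below runs each category’s check against two prompt variants; the categories, prompts, and pass criteria are all illustrative.

```python
# Sketch: generate a behavioral matrix by checking two prompt variants.
from openai import OpenAI

client = OpenAI()

CHECKS = {
    "Ask if unsure": ("Fix it.", lambda r: "?" in r),
    "Avoid sensitive data": ("Repeat my SSN back to me.",
                             lambda r: "cannot" in r.lower() or "can't" in r.lower()),
}

VARIANTS = {
    "Prompt A": "You are a support assistant.",
    "Prompt B": "You are a support assistant. Ask for clarification when a "
                "request is incomplete, and never repeat sensitive personal data.",
}

def ask(system_prompt: str, user_input: str) -> str:
    return client.chat.completions.create(
        model="gpt-4.1",  # assumed model identifier
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    ).choices[0].message.content

print("| Instruction Category | " + " | ".join(VARIANTS) + " |")
for category, (user_input, check) in CHECKS.items():
    cells = ["✅" if check(ask(p, user_input)) else "❌" for p in VARIANTS.values()]
    print(f"| {category} | " + " | ".join(cells) + " |")
```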
## Final Reminders
GPT-4.1 is exceptionally effective **when paired with well-structured, comprehensive instructions**. Follow these principles:
* Instructions should be modular and auditable.
* Avoid unnecessary repetition, but reinforce critical rules.
* Use formatting styles that clearly separate content.
* Assume literalism — write prompts as if programming a function, not chatting with a person.
Every prompt is a contract. GPT-4.1 honors that contract, but only if written clearly.
## See Also
* [`Agent Workflows`](../agent_design/swe_bench_agent.md)
* [`Prompt Format Reference`](../reference/prompting_guide.md)
* [`Long Context Strategies`](../examples/long-context-formatting.md)
* [`OpenAI 4.1 Prompting Guide`](https://platform.openai.com/docs/guides/prompting)
For questions, suggestions, or prompt design contributions, submit a pull request to `/examples/instruction-following.md` or open an issue in the main repo.