# [Prompt Engineering Reference Guide for GPT-4.1](https://chatgpt.com/canvas/shared/6825f88d7170819180b56e101e8b9d31)
## Overview
This reference guide consolidates OpenAI’s latest findings and recommendations for effective prompt engineering with GPT-4.1. It is designed for developers, researchers, and applied AI engineers who need reliable, reproducible results in both experimental and production settings. The techniques presented here are rooted in empirical validation across use cases ranging from agent workflows to structured tool integration, long-context processing, and instruction-following optimization.
This document emphasizes concrete prompt patterns, scaffolding techniques, and deployment-tested prompt modularity.
## Key Prompting Concepts
### 1. Instruction Literalism
GPT-4.1 follows instructions **more precisely** than its predecessors. Developers should:
* Avoid vague or underspecified prompts
* Be explicit about desired behaviors, output formats, and prohibitions
* Expect literal compliance with phrasing, including negations and scope restrictions
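For instance, compare a vague instruction with an explicit rewrite of it:
```text
Vague:    Summarize this document.
Explicit: Summarize this document in exactly three bullet points.
          Do not quote the source verbatim. Do not mention pricing.
```
Because GPT-4.1 complies literally, the explicit version carries its own output format and prohibitions.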
### 2. Planning Induction
GPT-4.1 does not natively plan before answering but can be prompted to simulate step-by-step reasoning.
**Template:**
```text
Think carefully step by step. Break down the task into manageable parts. Then begin.
```
Place planning prompts before the task begins and reinforce them between reasoning phases.
### 3. Agentic Harnessing
Use GPT-4.1’s enhanced persistence and tool adherence by specifying three types of reminders:
* **Persistence**: “Keep working until the problem is fully resolved.”
* **Tool usage**: “Use available tools to inspect files—do not guess.”
* **Planning enforcement**: “Plan and reflect before and after every function call.”
When placed at the top of a system prompt, these reminders markedly increase the model’s task completion rate.
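A minimal system-prompt header combining all three reminders:
```markdown
# Agent Reminders
- Keep working until the problem is fully resolved. Only yield when done.
- Use available tools to inspect files—do not guess.
- Plan and reflect before and after every function call.
```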
## Prompt Structure Blueprint
A recommended modular scaffold:
```markdown
# Role and Objective
You are a [role] tasked with [goal].
# Instructions
- Bullet-point rules or constraints
- Output format expectations
- Prohibited topics or phrasing
# Workflow (Optional)
1. Step-by-step plan
2. Reflection checkpoints
3. Tool interaction order
# Reasoning Strategy (Optional)
Describes how the model should analyze input or context before generating output.
# Output Format
JSON, Markdown, YAML, or prose specification
# Examples (Optional)
Demonstrates expected input/output behavior
```
This format increases predictability and flexibility during live prompt debugging and iteration.
## Long-Context Prompting
GPT-4.1 supports up to **1M token inputs**, enabling:
* Multi-document ingestion
* Codebase-wide searches
* Contextual re-ranking and synthesis
### Strategies:
* **Repeat instructions at top and bottom** (see the sketch after this list)
* **Use markdown/XML tags** for structure
* **Insert reasoning checkpoints every 5–10k tokens**
* **Avoid JSON when embedding large documents**
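A minimal sketch of the instruction sandwich for a long document, using XML tags for structure:
```markdown
# Instructions
Summarize each section; flag contradictions between sections.

<doc id="annual-report">
... long source material ...
</doc>

# Instructions (repeated)
Summarize each section; flag contradictions between sections.
```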
**Effective Delimiters:**
| Format | Use Case |
| -------- | ------------------------------------ |
| Markdown | General sectioning |
| XML | Hierarchical document parsing |
| Title/ID | Multi-document input structuring |
| JSON | Code/tool tasks only; avoid for text |
## Tool-Calling Integration
### Schema-Based Tool Usage
Define tools in the OpenAI `tools` field, not inline. Provide:
* **Name** (clear and descriptive)
* **Parameters** (structured JSON)
* **Usage examples** (in `# Examples` section, not in `description`)
**Tool Example:**
```json
{
"name": "get_user_info",
"description": "Fetches user details from the database",
"parameters": {
"type": "object",
"properties": {
"user_id": { "type": "string" }
},
"required": ["user_id"]
}
}
```
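As a sketch, the schema above can be registered via the `tools` field of the OpenAI Python SDK’s Chat Completions interface (the user message is illustrative):
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Wrap the schema above in the Chat Completions `tools` format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_user_info",
        "description": "Fetches user details from the database",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "string"}},
            "required": ["user_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What is user 42's email?"}],
    tools=tools,
)

# If the model elected to call the tool, arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```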
### Prompt Reinforcement:
```markdown
# Tool Instructions
- Use tools before answering factual queries
- If info is missing, request input from user
```
### Failure Mitigation:
| Issue | Fix |
| ------------------ | ------------------------------------------- |
| Null tool calls | Prompt: “Ask for missing info if needed” |
| Over-calling tools | Add reasoning delay + post-call reflection |
| Missed call | Add output format block and trigger keyword |
## Instruction Following Optimization
GPT-4.1 is optimized for literal and structured instruction parsing. Improve reliability with:
### Multi-Tiered Rules
Use layers:
* `# Instructions`: High-level
* `## Response Style`: Format and tone
* `## Error Handling`: Edge case mitigation
### Ordered Workflows
Use numbered sequences to enforce step-by-step logic.
**Prompt Snippet:**
```markdown
# Instructions
- Greet the user
- Request missing parameters
- Avoid repeating exact phrasing
- Escalate on request
# Workflow
1. Confirm intent
2. Call tool
3. Reflect
4. Respond
```
## Chain-of-Thought Prompting (CoT)
Chain-of-thought prompting induces linear, step-by-step reasoning. It works best for:
* Logic puzzles
* Multi-hop QA
* Comparative analysis
**CoT Example:**
```text
Let’s think through this. First, identify what the question is asking. Then examine context. Finally, synthesize an answer.
```
**Advanced Prompt (Modular):**
```markdown
# Reasoning Strategy
1. Query analysis
2. Context selection
3. Evidence synthesis
# Final Instruction
Think step by step using the strategy above.
```
## Failure Modes and Fixes
| Problem | Mitigation |
| ------------------ | -------------------------------------------------------------- |
| Tool hallucination | Require tool call block, validate schema |
| Early termination | Add: "Do not yield until goal achieved." |
| Verbose repetition | Add paraphrasing constraint and variation list |
| Overcompliance | If model follows a sample phrase verbatim, instruct to vary it |
## Evaluation Strategy
Prompt effectiveness should be evaluated across:
* **Instruction adherence**
* **Tool utilization accuracy**
* **Reasoning coherence**
* **Failure mode frequency**
* **Latency and cost tradeoffs**
### Recommended Methodology:
* Create a test suite with edge-case prompts
* Log errors and model divergence cases
* Use eval tags (`# Eval:`) in prompts for meta-analysis
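A minimal harness sketch for this methodology; `run_prompt` is a hypothetical stand-in for your own model-call wrapper:
```python
def run_prompt(prompt: str) -> str:
    # Hypothetical stand-in: replace with your real model call.
    return "Please provide the trial document to summarize."

# Edge-case suite: each case pairs a prompt with a pass/fail check.
TEST_CASES = [
    # Required input is absent; the model should ask, not guess.
    ("missing_input",
     "Summarize the attached trial.",
     lambda out: "provide" in out.lower() or "missing" in out.lower()),
]

def run_suite() -> None:
    failures = []
    for name, prompt, check in TEST_CASES:
        output = run_prompt(prompt)
        if not check(output):
            failures.append((name, output[:200]))  # log divergence cases
    print(f"{len(TEST_CASES) - len(failures)}/{len(TEST_CASES)} passed")
    for name, excerpt in failures:
        print(f"FAIL {name}: {excerpt}")

run_suite()
```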
## Delimiter Comparison Table
| Delimiter Type | Format Example     | GPT-4.1 Performance             |
| -------------- | ------------------ | ------------------------------- |
| Markdown       | `## Section Title` | Excellent                       |
| XML            | `<doc>` tags       | Excellent                       |
| JSON           | `{"text": "..."}`  | High (in code), Poor (in prose) |
| Pipe-delimited | `TITLE \| CONTENT` | Moderate                        |
### Best Practice:
Use Markdown or XML for general structure; JSON for code/tools only.
## Example: Prompt Debugging Workflow
### Step 1: Identify Goal
E.g., summarizing medical trial documents with context weighting.
### Step 2: Draft Prompt Template
```markdown
# Objective
Summarize each trial based on outcome clarity and trial scale.
# Workflow
1. Parse hypothesis/result
2. Score for clarity
3. Output structured summary
# Output Format
{"trial_id": ..., "clarity_score": ..., "summary": ...}
```
### Step 3: Insert Sample
```json
{"trial_id": "T01", "clarity_score": 8, "summary": "Well-documented results..."}
```
### Step 4: Validate Output
Ensure model adheres to output format, logic, and reasoning instructions.
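A minimal validation sketch for this step, assuming the JSON format from Step 2 (the 0–10 clarity scale is an assumption):
```python
import json

REQUIRED_KEYS = {"trial_id", "clarity_score", "summary"}

def validate_summary(raw: str) -> bool:
    """Return True if the model output parses as JSON and matches the schema."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(record, dict)
        and REQUIRED_KEYS <= record.keys()
        and isinstance(record["clarity_score"], int)
        and 0 <= record["clarity_score"] <= 10  # assumed 0-10 scale
    )

assert validate_summary('{"trial_id": "T01", "clarity_score": 8, "summary": "..."}')
```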
## Summary: Prompt Engineering Heuristics
| Technique | When to Use |
| -------------------------- | ----------------------------------- |
| Instruction Bullets | All prompts |
| Chain-of-Thought | Any task requiring logic or steps |
| Workflow Lists | Multiphase reasoning tasks |
| Tool Block | Any prompt using API/tool calls |
| Reflection Reminders | Long context, debugging, validation |
| Dual Instruction Placement | Long documents (>100K tokens) |
## Final Notes
Prompt engineering is empirical, not theoretical. Every use case is different. To engineer effectively with GPT-4.1:
* Maintain modular, versioned prompt templates
* Use structured instructions and output formats
* Enforce explicit planning and tool behavior
* Iterate prompts based on logs and evals
**Start simple. Add structure. Evaluate constantly.**
This guide is designed to be expanded. Use it as your baseline and evolve it as your systems scale.