# Tool Use and Integration

## Overview

GPT-4.1 introduces robust capabilities for working with tools directly through the OpenAI API’s `tools` parameter. Rather than relying solely on the model's internal knowledge, developers can now extend functionality, reduce hallucination, and enforce reliable workflows by integrating explicitly defined tools into their applications.

This document offers a comprehensive guide for designing and deploying tool-augmented applications using GPT-4.1. It includes best practices for tool registration, prompting strategies, tool schema design, usage examples, and debugging common tool invocation failures. Each section is modular and designed to help you build reliable systems that scale across contexts, task types, and user interfaces.


## What is a Tool in GPT-4.1?

A **tool** is an explicitly defined function or utility passed to the GPT-4.1 API, allowing the model to trigger predefined operations such as:

* Running code or bash commands
* Retrieving documents or structured data
* Performing API calls
* Applying file patches or diffs
* Looking up user account information

Tools are defined in a structured JSON schema and passed via the `tools` parameter. When the model determines a tool is required, it emits a function call rather than plain text. This enables **precise execution**, **auditable behavior**, and **tight application integration**.


## Why Use Tools?

| Benefit                        | Description                                                                |
| ------------------------------ | -------------------------------------------------------------------------- |
| **Reduces hallucination**      | Encourages the model to call real-world functions instead of guessing      |
| **Improves traceability**      | Tool calls are logged and interpretable as function outputs                |
| **Enables complex workflows**  | Offloads parts of the task to external systems (e.g., shell, Python, APIs) |
| **Enhances compliance**        | Limits model responses to grounded tool outputs                            |
| **Improves agent performance** | Required for persistent, multi-turn agentic workflows                      |


## Tool Definition: The Schema

Tools are defined as JSON objects. In the Chat Completions API, each entry has `"type": "function"` and a nested `function` object that includes:

* `name`: A short, unique identifier
* `description`: A concise explanation of what the tool does and when to use it
* `parameters`: A standard JSON Schema describing the expected input

### Example: Python Execution Tool

```json
{
  "type": "function",
  "function": {
    "name": "python",
    "description": "Run Python code or terminal commands in a secure environment.",
    "parameters": {
      "type": "object",
      "properties": {
        "input": {
          "type": "string",
          "description": "The code or command to run"
        }
      },
      "required": ["input"]
    }
  }
}
```

### Best Practices for Schema Design

* Use clear names: `run_tests`, `lookup_policy`, `apply_patch`
* Keep descriptions actionable: Describe *when* and *why* to use
* Minimize complexity: Use shallow parameter objects where possible
* Use enums or constraints to reduce ambiguous calls, as in the sketch below
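
For example, an enum keeps a parameter's values within a known set. A sketch of a hypothetical `lookup_policy` tool (the category names are illustrative):

```python
# Hypothetical policy-lookup tool; the enum constrains "category"
# so the model cannot invent unsupported values.
lookup_policy_tool = {
    "type": "function",
    "function": {
        "name": "lookup_policy",
        "description": "Retrieve the policy document for a given category. Use before answering policy questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": ["billing", "refunds", "privacy"],
                    "description": "Which policy document to retrieve",
                }
            },
            "required": ["category"],
        },
    },
}
```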


## Registering Tools in the API

In the Python SDK:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=chat_history,
    tools=[python_tool, get_user_info_tool],
    tool_choice="auto",
)
```

Set `tool_choice` to:

* `"auto"`: Let the model decide whether and when to call a tool
* A specific tool, e.g. `{"type": "function", "function": {"name": "python"}}`: Force that call
* `"none"`: Prevent tool usage (useful for testing)
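
When the model does call a tool, the response carries structured call objects rather than text. A minimal round trip with the `python_tool` defined above (the `run_python` executor is hypothetical):

```python
import json

message = response.choices[0].message
if message.tool_calls:
    chat_history.append(message)  # keep the assistant's call in the transcript
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = run_python(args["input"])          # hypothetical executor
        chat_history.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
    # Return the tool output so the model can produce its final answer
    followup = client.chat.completions.create(
        model="gpt-4.1",
        messages=chat_history,
        tools=[python_tool, get_user_info_tool],
    )
```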


## Prompting for Tool Use

### Tool Use Prompting Guidelines

To guide GPT-4.1 toward proper tool usage:

* **Don’t rely on the model to infer when to call a tool.** Tell it explicitly when tools are required.
* **Prompt for failure cases**: Tell the model what to do when it lacks information (e.g., “ask the user” or “pause”).
* **Avoid ambiguity**: Be clear about tool invocation order and data requirements.

### Example Prompt Snippet

```markdown
Before answering any user question about billing, check if the necessary context is available.
If not, use the `lookup_policy_document` tool to find relevant information.
Never answer without citing a retrieved document.
```

### Escalation Pattern

```markdown
If the tool fails to return the necessary data, ask the user for clarification.
If the user cannot provide it, explain the limitation and pause further action.
```


## Tool Use in Agent Workflows

Tool usage is foundational to agent design in GPT-4.1.

### Multi-Stage Task Example: Bug Fix Agent

```markdown
1. Use `read_file` to inspect code
2. Analyze and plan a fix
3. Use `apply_patch` to update the file
4. Use `run_tests` to verify changes
5. Reflect and reattempt if needed
```

Each tool call is logged as a JSON event and can be parsed programmatically.
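
In code, this workflow is an outer loop that keeps dispatching calls until the model answers in plain text. A sketch, assuming a hypothetical `handlers` dict mapping tool names to your implementations:

```python
import json

def run_agent(client, chat_history, tools, handlers, max_turns=10):
    """Drive the model until it stops requesting tool calls (sketch)."""
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model="gpt-4.1", messages=chat_history, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # final answer, no more tools needed
        chat_history.append(message)
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            result = handlers[call.function.name](**args)  # dispatch to your code
            chat_history.append(
                {"role": "tool", "tool_call_id": call.id, "content": str(result)}
            )
    raise RuntimeError("Agent did not finish within max_turns")
```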


## Apply Patch: Recommended Format

One of the most powerful GPT-4.1 patterns is **patch generation** using a diff-like format.

### Patch Structure

```bash
apply_patch <<"EOF"
*** Begin Patch
*** Update File: path/to/file.py
@@ def function():
-    old_code()
+    new_code()
*** End Patch
EOF
```

### Tool Behavior

* No line numbers required
* Context determined by `@@` anchors and 3 lines of code before/after
* Errors must be handled gracefully and logged
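
One way to register such a tool is to spell the patch format out in the description so the model reproduces it reliably. A sketch (the parameter name is illustrative):

```python
apply_patch_tool = {
    "type": "function",
    "function": {
        "name": "apply_patch",
        "description": (
            "Apply a patch to a source file. The patch must start with "
            "'*** Begin Patch', contain '*** Update File: <path>' sections "
            "with @@ context anchors, and end with '*** End Patch'."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "patch": {
                    "type": "string",
                    "description": "The full patch text in the format above",
                }
            },
            "required": ["patch"],
        },
    },
}
```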

See `/examples/apply_patch/` for templates and error-handling techniques.


## Tool Examples by Use Case

| Use Case              | Tool Name       | Description                                |
| --------------------- | --------------- | ------------------------------------------ |
| Execute code          | `python`        | Runs code or shell commands                |
| Apply file diff       | `apply_patch`   | Applies a patch to a source file           |
| Fetch document        | `lookup_policy` | Retrieves structured policy text           |
| Get user account data | `get_user_info` | Fetches user account info via phone number |
| Log analytics         | `log_event`     | Sends metadata to your analytics platform  |


## Error Handling and Recovery

Tool failure is inevitable in complex systems. Plan for it.

### Guidelines for GPT-4.1

* Detect and summarize tool errors
* Ask for missing input
* Retry if safe
* Escalate to user if unresolvable

### Prompt Pattern: Failure Response

```markdown
If a tool fails with an error, summarize the issue clearly for the user.
Only retry if the cause of failure is known and correctable.
If not, explain the problem and ask the user for next steps.
```
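
The same policy can be enforced on the application side before results are returned to the model. A sketch, with transient error types standing in for whatever your tools actually raise:

```python
RETRYABLE = (TimeoutError, ConnectionError)  # adjust to your tools' error types

def call_with_retry(tool_fn, args, max_retries=1):
    """Run a tool, retrying only on known-transient failures (sketch)."""
    for attempt in range(max_retries + 1):
        try:
            return {"ok": True, "result": tool_fn(**args)}
        except RETRYABLE as err:
            if attempt == max_retries:
                return {"ok": False, "error": f"transient failure: {err}"}
        except Exception as err:
            # Unknown cause: don't retry; surface the error for escalation
            return {"ok": False, "error": str(err)}
```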


## Tool Debugging and Logging

Enable structured logging to track model-tool interactions:

* **Log call attempts**: Include input parameters and timestamps
* **Log success/failure outcomes**: Include model reflections
* **Log retry logic**: Show how failures were handled

This creates full traceability for AI-involved actions.

### Sample Tool Call Log (JSON)

```json
{
  "tool_name": "run_tests",
  "input": "!python3 -m unittest discover",
  "result": "3 tests passed, 1 failed",
  "timestamp": "2025-05-15T14:32:12Z"
}
```
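
A small helper that appends records in this shape to a JSONL file (a sketch; extend the record with whatever fields your pipeline needs):

```python
import json
from datetime import datetime, timezone

def log_tool_call(tool_name, tool_input, result, path="tool_calls.jsonl"):
    """Append one structured tool-call record as a JSON line."""
    record = {
        "tool_name": tool_name,
        "input": tool_input,
        "result": result,
        "timestamp": datetime.now(timezone.utc)
            .isoformat(timespec="seconds").replace("+00:00", "Z"),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```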


## Tool Evaluation and Performance Monitoring

Track tool usage metrics:

* **Tool Call Rate**: How often a tool is invoked
* **Tool Completion Rate**: How often tools finish without failure
* **Tool Contribution Score**: Impact on final task completion
* **Average Attempts per Task**: Retry behavior over time

Use this data to refine prompting and improve tool schema design.
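
With logs in the JSONL format above, the first two metrics reduce to simple counting. A sketch that assumes each record carries a boolean `success` field (not shown in the sample log; add it when you log outcomes):

```python
import json
from collections import Counter

def tool_metrics(path="tool_calls.jsonl"):
    """Per-tool call counts and completion rates from a JSONL log (sketch)."""
    calls, successes = Counter(), Counter()
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            name = record["tool_name"]
            calls[name] += 1
            successes[name] += bool(record.get("success"))
    return {
        name: {"calls": n, "completion_rate": successes[name] / n}
        for name, n in calls.items()
    }
```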


## Common Pitfalls and Solutions

| Issue                        | Likely Cause                                   | Solution                                             |
| ---------------------------- | ---------------------------------------------- | ---------------------------------------------------- |
| Tool called with empty input | Missing required parameter                     | Prompt model to validate input presence              |
| Tool ignored                 | Tool not described clearly in schema or prompt | Add clear instruction for when to use tool           |
| Repeated failed calls        | No failure mitigation logic                    | Add conditionals to check and respond to tool errors |
| Model mixes tool names       | Ambiguous tool naming                          | Use short, specific, unambiguous names               |


## Combining Tools with Instructions

When combining tools with detailed instruction sets:

* Include a `# Tools` section in your system prompt
* Define when and why each tool should be used
* Link tool calls to reasoning steps in `# Workflow`

### Example Combined Prompt

```markdown
# Role
You are a bug-fix agent using provided tools to solve code issues.

# Tools
- `read_file`: Inspect code files
- `apply_patch`: Apply structured diffs
- `run_tests`: Validate code after changes

# Instructions
1. Always start with file inspection
2. Plan before making changes
3. Test after every patch
4. Do not finish until all tests pass

# Output
Include patch summaries, test outcomes, and current status.
```


## Tool Testing Templates

Create test cases that validate:

* Input formatting
* Response validation
* Prompt-tool alignment
* Handling of edge cases

Use both synthetic and real examples:

```markdown
## Tool Call Test: run_tests
**Input**: Code with known error
**Expected Output**: Test failure summary
**Follow-up Behavior**: Retry with fixed patch
```
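
In code, the same case can assert that a forced call yields arguments valid against the tool's schema. A pytest-style sketch using the third-party `jsonschema` package and the `python_tool` and `client` from earlier:

```python
import json
import jsonschema  # pip install jsonschema

def test_python_tool_arguments_match_schema():
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Run the project's test suite."}],
        tools=[python_tool],
        tool_choice={"type": "function", "function": {"name": "python"}},
    )
    call = response.choices[0].message.tool_calls[0]
    args = json.loads(call.function.arguments)
    # Raises jsonschema.ValidationError if the arguments don't match
    jsonschema.validate(args, python_tool["function"]["parameters"])
```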


## Tool Choice Design

Choose between model-directed and developer-directed tool invocation:

| Mode          | Behavior                                    | Use Case                           |
| ------------- | ------------------------------------------- | ---------------------------------- |
| `auto`        | Model decides whether and when to use tools | General assistants, exploration    |
| `none`        | Model cannot use tools                      | Testing model reasoning only       |
| A named tool  | Developer forces an immediate call to that tool | Known pipeline steps, unit testing |

Choose based on control needs and task constraints.


## Summary: Best Practices for Tool Integration

| Area             | Best Practice                                            |
| ---------------- | -------------------------------------------------------- |
| Tool Naming      | Use action-based, unambiguous names                      |
| Prompt Structure | Clearly define when and how tools should be used         |
| Tool Invocation  | Register tools in API, not in plain prompt text          |
| Failure Handling | Provide instructions for retrying or asking the user     |
| Schema Design    | Use JSON Schema with constraints to reduce invalid input |
| Evaluation       | Track tool call success rate and contribution to outcome |


## Further Exploration

* [`Designing Agent Workflows`](./Designing%20Agent%20Workflows.md)
* [`Prompting for Instruction Following`](./Prompting%20for%20Instruction%20Following.md)
* [`Long Context Strategies`](./Long%20Context.md)

For community templates and tool libraries, explore the `/tools/` and `/examples/` directories in the main repository.


For contributions, open a pull request or file an issue referencing `/tools/Tool Use and Integration.md`.