<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://wenbo97.github.io/prompt-context-patterns/feed.xml" rel="self" type="application/atom+xml" /><link href="https://wenbo97.github.io/prompt-context-patterns/" rel="alternate" type="text/html" /><updated>2026-04-29T09:52:22+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/feed.xml</id><title type="html">Prompt, Context &amp;amp; Agent Orchestration</title><subtitle>Practical patterns for writing stable prompts, engineering context, and orchestrating AI agents.</subtitle><entry xml:lang="zh"><title type="html">142 AI Agent Prompt Patterns</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog-zh/" rel="alternate" type="text/html" title="142 AI Agent Prompt Patterns" /><published>2026-04-29T00:00:00+00:00</published><updated>2026-04-29T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog-zh</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog-zh/"><![CDATA[<p>142 prompt engineering patterns distilled from 500+ real-world AI agent plugins. Not theory: every pattern has a name, the problem it solves, and a ready-to-use prompt snippet.</p>

<hr />

<h2 id="为什么做这个">为什么做这个</h2>

<p>市面上的”awesome prompts”集合大多面向 ChatGPT 用户的一次性问答。<strong>这个目录面向构建 AI agent 和多步骤插件的开发者</strong> — prompt 稳定性、技能间协调、防御性模式才是关键。</p>

<p>这些模式来自对 500+ 个生产插件的分析，涵盖：DevOps 自动化、安全分析、代码迁移、事件响应、部署编排等。每个模式至少在 3 个独立插件中出现才被收录。</p>

<hr />

<h2 id="目录结构">目录结构</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>catalog/
├── catalog-index.md              ← 总索引（全部 142 个模式）
├── categories/                   ← 按功能分组
│   ├── patterns-structural-scaffolding.md
│   ├── patterns-input-output-contracts.md
│   ├── patterns-execution-control.md
│   ├── patterns-knowledge-and-context.md
│   ├── patterns-agent-orchestration.md
│   ├── patterns-safety-and-trust.md
│   ├── patterns-quality-and-feedback.md
│   └── ...（共 18 个文件）
├── techniques/                   ← 深度指南
│   ├── token-level-techniques.md    ← 基于熵理论的 9 个技巧
│   ├── anti-laziness.md             ← 防止 agent 偷懒的 8 种策略
│   ├── skill-architecture.md        ← 技能打包与组合
│   ├── branching-stability.md       ← 分支逻辑的可靠性
│   ├── reference-skip-playbook.md   ← 强制 agent 阅读引用文件
│   └── good-vs-bad-template.md      ← 好坏 prompt 对比
└── standards/                    ← 评审框架
    ├── quality-standards.md         ← P0/P1/P2 严重度分级
    └── review-checklist.md          ← 9 维度 prompt 评审
</code></pre></div></div>

<hr />

<h2 id="12-个模式分类">12 个模式分类</h2>

<table>
  <thead>
    <tr>
      <th>#</th>
      <th>分类</th>
      <th>模式数</th>
      <th>覆盖内容</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>结构脚手架</td>
      <td>15</td>
      <td>阶段门控、决策树、边界标签</td>
    </tr>
    <tr>
      <td>2</td>
      <td>输入/输出契约</td>
      <td>12</td>
      <td>Schema 约束、格式锁定、校验</td>
    </tr>
    <tr>
      <td>3</td>
      <td>执行控制</td>
      <td>14</td>
      <td>尝试上限、停止条件、重试逻辑</td>
    </tr>
    <tr>
      <td>4</td>
      <td>知识与上下文</td>
      <td>12</td>
      <td>SSOT 注册表、按需加载、缓存层</td>
    </tr>
    <tr>
      <td>5</td>
      <td>Agent 编排</td>
      <td>11</td>
      <td>子 agent 派发、并行执行、交接</td>
    </tr>
    <tr>
      <td>6</td>
      <td>安全与信任</td>
      <td>10</td>
      <td>护栏、禁止操作、升级门控</td>
    </tr>
    <tr>
      <td>7</td>
      <td>质量与反馈</td>
      <td>9</td>
      <td>自审查、证据门控、置信度评分</td>
    </tr>
    <tr>
      <td>8</td>
      <td>高级 I/O 与领域</td>
      <td>10</td>
      <td>领域路由、多模态、Schema 演进</td>
    </tr>
    <tr>
      <td>9</td>
      <td>高级编排</td>
      <td>8</td>
      <td>DAG 执行、共识、群体模式</td>
    </tr>
    <tr>
      <td>10</td>
      <td>高级质量</td>
      <td>7</td>
      <td>回归检测、漂移监控</td>
    </tr>
    <tr>
      <td>11</td>
      <td>高级安全</td>
      <td>8</td>
      <td>数据分类、审计追踪、合规</td>
    </tr>
    <tr>
      <td>12</td>
      <td>高级工作流</td>
      <td>10</td>
      <td>部署门控、回滚、状态机</td>
    </tr>
  </tbody>
</table>

<p>还有补充分类：Karpathy 行为模式、Claude Code 平台模式、开源技能模式、Gap-fill 模式。</p>

<hr />

<h2 id="举例模式-23--带上限的修复循环">举例：模式 23 — 带上限的修复循环</h2>

<p><strong>问题：</strong> AI agent 修复构建错误时可能无限循环，或者太早放弃。</p>

<p><strong>模式：</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## 停止条件（穷举）</span>

修复循环只在以下条件满足时停止：

| 条件 | 动作 |
|------|------|
| (a) 构建成功 | 返回成功 |
| (b) 尝试次数达到 N | 返回失败，附带剩余错误 |
| (c) 会话死亡 | 返回 session_dead |

没有其他理由可以停止。"错误太多"不行，
"超出范围"不行，"无法修复"也不行。
</code></pre></div></div>

<p><strong>为什么有效：</strong> 消除了 agent 合理化提前退出的倾向。穷举表格不留歧义 — agent 无法发明第 4 个停止条件。</p>
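
<p>As a concrete illustration only, here is a minimal harness sketch of the same contract. The <code class="language-plaintext highlighter-rouge">run_build</code>, <code class="language-plaintext highlighter-rouge">apply_fix</code>, and <code class="language-plaintext highlighter-rouge">session_alive</code> callables are hypothetical placeholders for your own build, repair, and session checks; the point is that the loop can exit through exactly the three tabled conditions and nothing else:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: attempt-capped repair loop. The three callables are placeholders
# supplied by the surrounding harness; run_build returns (ok, errors).
def repair_loop(run_build, apply_fix, session_alive, max_attempts=5):
    errors = []
    for attempt in range(1, max_attempts + 1):
        if not session_alive():                  # (c) session dies
            return {"status": "session_dead", "attempts": attempt}
        ok, errors = run_build()
        if ok:                                   # (a) build succeeds
            return {"status": "success", "attempts": attempt}
        apply_fix(errors)                        # not a stop condition: repair, retry
    # (b) attempt counter reached N
    return {"status": "failed", "remaining_errors": errors}
</code></pre></div></div>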

<hr />

<h2 id="举例模式-45--指令式写前审查">举例：模式 45 — 指令式写前审查</h2>

<p><strong>问题：</strong> Agent 写入错误的配置变更，破坏生产行为。</p>

<p><strong>模式：</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>每次编辑前，逐条检查护栏：

| # | 检查项 | 通过 | 失败 |
|---|--------|------|------|
| G1 | 抑制是否有范围？ | 在具体条目上 | 全局范围 |
| G2 | 覆写是否必要？ | 默认值不满足 | 默认值已经可以 |
| G3 | 是否创建了配套文件？ | 文件对存在 | 孤立条件 |

任何护栏返回"失败" → 不写入。先修正。
</code></pre></div></div>

<p><strong>为什么有效：</strong> 在”决定做什么”和”执行”之间强制暂停。表格格式让每项检查独立可评估 — agent 无法在连续叙述中跳过某一项。</p>
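
<p>The same gate is easy to express in harness code. A minimal sketch, assuming each guardrail is a predicate over the pending edit (the predicate bodies are illustrative, not taken from the catalog):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: pre-write guardrail gate. Each entry mirrors one table row;
# the edit is written only if every guardrail passes.
GUARDRAILS = {
    "G1_suppression_scoped": lambda edit: edit.get("scope") == "specific_item",
    "G2_override_needed":    lambda edit: not edit.get("default_sufficient", False),
    "G3_companion_created":  lambda edit: edit.get("companion_file") is not None,
}

def pre_write_review(edit):
    failures = [name for name, check in GUARDRAILS.items() if not check(edit)]
    return {"write": not failures, "revise": failures}  # any FAIL blocks the write
</code></pre></div></div>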

<hr />

<h2 id="技巧亮点">技巧亮点</h2>

<h3 id="token-级技巧9-个">Token 级技巧（9 个）</h3>

<p>基于 LLM 实际处理 token 的方式 — 不是直觉。例如：<strong>决策树在分支逻辑上优于自然语言</strong>，因为树结构把注意力集中在一条路径上，而自然语言同时分散注意力到所有条件。</p>

<h3 id="防偷懒策略8-种">防偷懒策略（8 种）</h3>

<p>Agent 会跳过引用文件的阅读、把多步流程压缩成捷径、”记住”而不是重新读取。防偷懒指南记录了 8 种系统性防御，从强制读取门控到渐进式披露。</p>

<h3 id="prompt-评审框架">Prompt 评审框架</h3>

<p>9 个维度（清晰度、确定性、安全性、可测试性……）的结构化评审流程，配合 P0/P1/P2 严重度分级。设计用于 agent prompt 的同行评审。</p>

<hr />

<h2 id="如何使用">如何使用</h2>

<ol>
  <li><strong>构建新技能？</strong> 浏览<a href="/prompt-context-patterns/catalog/">目录索引</a>找到匹配的模式</li>
  <li><strong>调试不稳定行为？</strong> 查看<a href="/prompt-context-patterns/catalog/categories/patterns-execution-control-zh">执行控制</a>和<a href="/prompt-context-patterns/catalog/techniques/anti-laziness-zh">防偷懒</a></li>
  <li><strong>评审别人的 prompt？</strong> 使用<a href="/prompt-context-patterns/catalog/standards/review-checklist-zh">评审清单</a></li>
  <li><strong>学习 prompt 工程？</strong> 从 <a href="/prompt-context-patterns/catalog/techniques/token-level-techniques-zh">Token 级技巧</a>开始</li>
</ol>

<hr />

<h2 id="许可">许可</h2>

<p>MIT。可自由用于你的 agent、插件和项目。</p>]]></content><author><name></name></author><category term="patterns" /><category term="catalog" /><summary type="html"><![CDATA[从 500+ 个真实 AI agent 插件中提炼的 142 个提示词工程模式。不是理论，每个模式都有名字、解决的问题、和可直接使用的 prompt 片段。 为什么做这个 市面上的”awesome prompts”集合大多面向 ChatGPT 用户的一次性问答。这个目录面向构建 AI agent 和多步骤插件的开发者 — prompt 稳定性、技能间协调、防御性模式才是关键。 这些模式来自对 500+ 个生产插件的分析，涵盖：DevOps 自动化、安全分析、代码迁移、事件响应、部署编排等。每个模式至少在 3 个独立插件中出现才被收录。 目录结构 catalog/ ├── catalog-index.md ← 总索引（全部 142 个模式） ├── categories/ ← 按功能分组 │ ├── patterns-structural-scaffolding.md │ ├── patterns-input-output-contracts.md │ ├── patterns-execution-control.md │ ├── patterns-knowledge-and-context.md │ ├── patterns-agent-orchestration.md │ ├── patterns-safety-and-trust.md │ ├── patterns-quality-and-feedback.md │ └── ...（共 18 个文件） ├── techniques/ ← 深度指南 │ ├── token-level-techniques.md ← 基于熵理论的 9 个技巧 │ ├── anti-laziness.md ← 防止 agent 偷懒的 8 种策略 │ ├── skill-architecture.md ← 技能打包与组合 │ ├── branching-stability.md ← 分支逻辑的可靠性 │ ├── reference-skip-playbook.md ← 强制 agent 阅读引用文件 │ └── good-vs-bad-template.md ← 好坏 prompt 对比 └── standards/ ← 评审框架 ├── quality-standards.md ← P0/P1/P2 严重度分级 └── review-checklist.md ← 9 维度 prompt 评审 12 个模式分类 # 分类 模式数 覆盖内容 1 结构脚手架 15 阶段门控、决策树、边界标签 2 输入/输出契约 12 Schema 约束、格式锁定、校验 3 执行控制 14 尝试上限、停止条件、重试逻辑 4 知识与上下文 12 SSOT 注册表、按需加载、缓存层 5 Agent 编排 11 子 agent 派发、并行执行、交接 6 安全与信任 10 护栏、禁止操作、升级门控 7 质量与反馈 9 自审查、证据门控、置信度评分 8 高级 I/O 与领域 10 领域路由、多模态、Schema 演进 9 高级编排 8 DAG 执行、共识、群体模式 10 高级质量 7 回归检测、漂移监控 11 高级安全 8 数据分类、审计追踪、合规 12 高级工作流 10 部署门控、回滚、状态机 还有补充分类：Karpathy 行为模式、Claude Code 平台模式、开源技能模式、Gap-fill 模式。 举例：模式 23 — 带上限的修复循环 问题： AI agent 修复构建错误时可能无限循环，或者太早放弃。 模式： ## 停止条件（穷举） 修复循环只在以下条件满足时停止： | 条件 | 动作 | |------|------| | (a) 构建成功 | 返回成功 | | (b) 尝试次数达到 N | 返回失败，附带剩余错误 | | (c) 会话死亡 | 返回 session_dead | 没有其他理由可以停止。"错误太多"不行， "超出范围"不行，"无法修复"也不行。 为什么有效： 消除了 agent 合理化提前退出的倾向。穷举表格不留歧义 — agent 无法发明第 4 个停止条件。 举例：模式 45 — 指令式写前审查 问题： Agent 写入错误的配置变更，破坏生产行为。 模式： 每次编辑前，逐条检查护栏： | # | 检查项 | 通过 | 失败 | |---|--------|------|------| | G1 | 抑制是否有范围？ | 在具体条目上 | 全局范围 | | G2 | 覆写是否必要？ | 默认值不满足 | 默认值已经可以 | | G3 | 是否创建了配套文件？ | 文件对存在 | 孤立条件 | 任何护栏返回"失败" → 不写入。先修正。 为什么有效： 在”决定做什么”和”执行”之间强制暂停。表格格式让每项检查独立可评估 — agent 无法在连续叙述中跳过某一项。 技巧亮点 Token 级技巧（9 个） 基于 LLM 实际处理 token 的方式 — 不是直觉。例如：决策树在分支逻辑上优于自然语言，因为树结构把注意力集中在一条路径上，而自然语言同时分散注意力到所有条件。 防偷懒策略（8 种） Agent 会跳过引用文件的阅读、把多步流程压缩成捷径、”记住”而不是重新读取。防偷懒指南记录了 8 种系统性防御，从强制读取门控到渐进式披露。 Prompt 评审框架 9 个维度（清晰度、确定性、安全性、可测试性……）的结构化评审流程，配合 P0/P1/P2 严重度分级。设计用于 agent prompt 的同行评审。 如何使用 构建新技能？ 浏览目录索引找到匹配的模式 调试不稳定行为？ 查看执行控制和防偷懒 评审别人的 prompt？ 使用评审清单 学习 prompt 工程？ 从 Token 级技巧开始 许可 MIT。可自由用于你的 agent、插件和项目。]]></summary></entry><entry><title type="html">142 Prompt Patterns for AI Agent Development</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog/" rel="alternate" type="text/html" title="142 Prompt Patterns for AI Agent Development" /><published>2026-04-29T00:00:00+00:00</published><updated>2026-04-29T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog/"><![CDATA[<p>A categorized catalog of 142 prompt engineering patterns — extracted from 500+ real-world AI agent plugins. Not theory. Every pattern has a name, a problem it solves, and a concrete prompt snippet.</p>

<hr />

<h2 id="why-this-exists">Why This Exists</h2>

<p>Most “awesome prompt” collections target ChatGPT users writing one-off queries. <strong>This catalog targets developers building AI agents and multi-step plugins</strong> — where prompt stability, inter-skill coordination, and defensive patterns matter.</p>

<p>The patterns were extracted by analyzing 500+ production plugins across categories: DevOps automation, security analysis, code migration, incident response, deployment orchestration, and more. Each pattern appeared in at least 3 independent plugins before inclusion.</p>

<hr />

<h2 id="catalog-structure">Catalog Structure</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>catalog/
├── catalog-index.md              ← Master index (all 142 patterns)
├── categories/                   ← Patterns grouped by function
│   ├── patterns-structural-scaffolding.md
│   ├── patterns-input-output-contracts.md
│   ├── patterns-execution-control.md
│   ├── patterns-knowledge-and-context.md
│   ├── patterns-agent-orchestration.md
│   ├── patterns-safety-and-trust.md
│   ├── patterns-quality-and-feedback.md
│   └── ... (18 files total)
├── techniques/                   ← Deep-dive guides
│   ├── token-level-techniques.md    ← 9 techniques grounded in entropy theory
│   ├── anti-laziness.md             ← 8 strategies to prevent agent shortcutting
│   ├── skill-architecture.md        ← Skill packaging and composition
│   ├── branching-stability.md       ← Branch logic reliability
│   ├── reference-skip-playbook.md   ← Force agents to read references
│   └── good-vs-bad-template.md      ← Side-by-side prompt comparison
└── standards/                    ← Review frameworks
    ├── quality-standards.md         ← P0/P1/P2 severity grading
    └── review-checklist.md          ← 9-dimension prompt review
</code></pre></div></div>

<hr />

<h2 id="the-12-pattern-categories">The 12 Pattern Categories</h2>

<table>
  <thead>
    <tr>
      <th>#</th>
      <th>Category</th>
      <th>Patterns</th>
      <th>What It Covers</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>Structural Scaffolding</td>
      <td>15</td>
      <td>Phase gates, decision trees, boundary tags</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Input/Output Contracts</td>
      <td>12</td>
      <td>Schema enforcement, format locks, validation</td>
    </tr>
    <tr>
      <td>3</td>
      <td>Execution Control</td>
      <td>14</td>
      <td>Attempt limits, stop conditions, retry logic</td>
    </tr>
    <tr>
      <td>4</td>
      <td>Knowledge &amp; Context</td>
      <td>12</td>
      <td>SSOT registries, on-demand loading, cache layers</td>
    </tr>
    <tr>
      <td>5</td>
      <td>Agent Orchestration</td>
      <td>11</td>
      <td>Sub-agent dispatch, parallel execution, handoffs</td>
    </tr>
    <tr>
      <td>6</td>
      <td>Safety &amp; Trust</td>
      <td>10</td>
      <td>Guardrails, prohibited actions, escalation gates</td>
    </tr>
    <tr>
      <td>7</td>
      <td>Quality &amp; Feedback</td>
      <td>9</td>
      <td>Self-review, evidence gates, confidence scoring</td>
    </tr>
    <tr>
      <td>8</td>
      <td>Advanced I/O &amp; Domain</td>
      <td>10</td>
      <td>Domain routing, multi-modal, schema evolution</td>
    </tr>
    <tr>
      <td>9</td>
      <td>Advanced Orchestration</td>
      <td>8</td>
      <td>DAG execution, consensus, swarm patterns</td>
    </tr>
    <tr>
      <td>10</td>
      <td>Advanced Quality</td>
      <td>7</td>
      <td>Regression detection, drift monitoring</td>
    </tr>
    <tr>
      <td>11</td>
      <td>Advanced Safety</td>
      <td>8</td>
      <td>Data classification, audit trails, compliance</td>
    </tr>
    <tr>
      <td>12</td>
      <td>Advanced Workflow</td>
      <td>10</td>
      <td>Deployment gates, rollback, state machines</td>
    </tr>
  </tbody>
</table>

<p>Plus supplementary categories: Karpathy behavioral patterns, Claude Code platform patterns, open-source skill patterns, and gap-fill patterns.</p>

<hr />

<h2 id="example-pattern-23--attempt-capped-repair-loop">Example: Pattern 23 — Attempt-Capped Repair Loop</h2>

<p><strong>Problem:</strong> An AI agent fixing build errors might loop forever or give up too early.</p>

<p><strong>Pattern:</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## Stop Conditions (Exhaustive)</span>

The repair loop stops ONLY when ONE of these is met:

| Condition | Action |
|-----------|--------|
| (a) Build succeeds | Return success |
| (b) Attempt counter reaches N | Return failed with remaining errors |
| (c) Session dies | Return session_dead |

No other condition justifies stopping. Not "too many errors",
not "beyond scope", not "unfixable."
</code></pre></div></div>

<p><strong>Why it works:</strong> Eliminates the agent’s natural tendency to rationalize early exit. The exhaustive table leaves no ambiguity — the agent cannot invent a 4th stop condition.</p>
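
<p>To make the contract concrete, here is a minimal harness sketch of the loop, assuming the build, repair, and session checks are injected as callables (all three names are illustrative placeholders, not part of the pattern itself):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: attempt-capped repair loop. The three callables are placeholders
# supplied by the surrounding harness; run_build returns (ok, errors).
def repair_loop(run_build, apply_fix, session_alive, max_attempts=5):
    errors = []
    for attempt in range(1, max_attempts + 1):
        if not session_alive():                  # (c) session dies
            return {"status": "session_dead", "attempts": attempt}
        ok, errors = run_build()
        if ok:                                   # (a) build succeeds
            return {"status": "success", "attempts": attempt}
        apply_fix(errors)                        # not a stop condition: repair, retry
    # (b) attempt counter reached N
    return {"status": "failed", "remaining_errors": errors}
</code></pre></div></div>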

<hr />

<h2 id="example-pattern-45--directive-based-pre-write-review">Example: Pattern 45 — Directive-Based Pre-Write Review</h2>

<p><strong>Problem:</strong> Agent writes incorrect config changes that break production behavior.</p>

<p><strong>Pattern:</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Before EVERY edit, evaluate each guardrail:

| # | Check | PASS | FAIL |
|---|-------|------|------|
| G1 | Is suppression scoped? | On specific item | Blanket scope |
| G2 | Is override needed? | Default insufficient | Default works fine |
| G3 | Is companion created? | Paired files exist | Orphaned condition |

If ANY guardrail returns FAIL → do NOT write. Revise first.
</code></pre></div></div>

<p><strong>Why it works:</strong> Forces a mandatory pause between “decide what to do” and “do it.” The table format means each check is independently evaluable — the agent can’t skip one by flowing past it in prose.</p>
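
<p>In harness code the gate reduces to a handful of predicates over the pending edit. A minimal sketch (the predicate bodies are illustrative, not taken from the catalog):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: pre-write guardrail gate. Each entry mirrors one table row;
# the edit is written only if every guardrail passes.
GUARDRAILS = {
    "G1_suppression_scoped": lambda edit: edit.get("scope") == "specific_item",
    "G2_override_needed":    lambda edit: not edit.get("default_sufficient", False),
    "G3_companion_created":  lambda edit: edit.get("companion_file") is not None,
}

def pre_write_review(edit):
    failures = [name for name, check in GUARDRAILS.items() if not check(edit)]
    return {"write": not failures, "revise": failures}  # any FAIL blocks the write
</code></pre></div></div>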

<hr />

<h2 id="techniques-highlights">Techniques Highlights</h2>

<h3 id="token-level-techniques-9-techniques">Token-Level Techniques (9 techniques)</h3>

<p>Grounded in how LLMs actually process tokens — not intuition. Example: <strong>Decision trees beat prose for branching logic</strong> because tree structure concentrates attention on one path, while prose spreads attention across all conditions simultaneously.</p>

<h3 id="anti-laziness-strategies-8-strategies">Anti-Laziness Strategies (8 strategies)</h3>

<p>Agents skip reference reads, collapse multi-step procedures into shortcuts, and “remember” instead of re-reading. The anti-laziness guide documents 8 systematic defenses, from mandatory read gates to progressive disclosure.</p>

<h3 id="prompt-review-framework">Prompt Review Framework</h3>

<p>A structured review process with 9 dimensions (clarity, determinism, safety, testability…) and P0/P1/P2 severity grading. Designed for peer review of agent prompts — not just self-review.</p>

<hr />

<h2 id="how-to-use">How to Use</h2>

<ol>
  <li><strong>Building a new skill?</strong> Scan the <a href="/prompt-context-patterns/catalog/">catalog index</a> for patterns that match your problem</li>
  <li><strong>Debugging unstable behavior?</strong> Check <a href="/prompt-context-patterns/catalog/categories/patterns-execution-control">execution control</a> and <a href="/prompt-context-patterns/catalog/techniques/anti-laziness">anti-laziness</a></li>
  <li><strong>Reviewing someone’s prompt?</strong> Use the <a href="/prompt-context-patterns/catalog/standards/review-checklist">review checklist</a></li>
  <li><strong>Learning prompt engineering?</strong> Start with <a href="/prompt-context-patterns/catalog/techniques/token-level-techniques">token-level techniques</a></li>
</ol>

<hr />

<h2 id="license">License</h2>

<p>MIT. Use these patterns in your own agents, plugins, and projects.</p>]]></content><author><name></name></author><category term="patterns" /><category term="catalog" /></entry><entry xml:lang="zh"><title type="html">Pattern: Decision Trees Beat Prose for Branching Logic in AI Prompts</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern-zh/" rel="alternate" type="text/html" title="Pattern: Decision Trees Beat Prose for Branching Logic in AI Prompts" /><published>2026-04-19T00:00:00+00:00</published><updated>2026-04-19T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern-zh</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern-zh/"><![CDATA[<p>Telling an AI what to do in natural-language paragraphs works most of the time. But when the logic branches (“if X then Y, otherwise Z”), prose prompts produce <strong>inconsistent outputs across runs</strong>, even when the input is identical.</p>

<p><strong>Decision trees fix this.</strong> Replace prose branching logic with a visual tree structure, and the model follows it deterministically.</p>

<hr />

<h2 id="问题">问题</h2>

<p>考虑一个事故响应系统：AI 需要判断严重等级、分配响应人员、选择缓解措施、生成结构化的响应计划。规则很复杂：4 个严重等级、6 种根因类别、基于层级的人员分配、升级条件、通信截止时间。</p>

<p>我们为同一个任务写了两种提示词 — 一种是自然语言段落（约 140 行），一种是决策树（约 260 行）— 用 <code class="language-plaintext highlighter-rouge">claude -p</code> 各跑了 <strong>20 次</strong>，输入完全相同。</p>

<h2 id="结果各-20-次运行">结果（各 20 次运行）</h2>

<h3 id="主要缓解措施">主要缓解措施</h3>

<table>
  <thead>
    <tr>
      <th>自然语言</th>
      <th>决策树</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>20 次中有 <strong>15 种不同写法</strong></td>
      <td>20 次中只有 <strong>1 种写法</strong></td>
    </tr>
  </tbody>
</table>

<p>自然语言输出 — 每次都不同：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>第 1 次:  "Enable graceful degradation on Cosmos DB (cached responses, reduced functionality) while investigating root cause"
第 2 次:  "Enable graceful degradation on Cosmos DB (cached responses, reduced functionality for non-critical read paths)"
第 3 次:  "Enable graceful degradation mode on Cosmos DB East US 2"
第 4 次:  "Enable graceful degradation on Cosmos DB East US 2 to serve cached/reduced-functionality responses while..."
...共 15 种不同表达
</code></pre></div></div>

<p>决策树输出 — 每次相同：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>第 1-20 次:  "enable_graceful_degradation"
</code></pre></div></div>

<h3 id="次要措施">次要措施</h3>

<table>
  <thead>
    <tr>
      <th>自然语言</th>
      <th>决策树</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>20 种不同写法</strong>（零重复）</td>
      <td><strong>2 种</strong>（19 次 <code class="language-plaintext highlighter-rouge">hotfix</code>，1 次 <code class="language-plaintext highlighter-rouge">failover_to_secondary</code>）</td>
    </tr>
  </tbody>
</table>

<p>自然语言的次要措施达到了 <strong>0% 的可复现性</strong> — 每一次运行都产生独特的句子。</p>

<h3 id="完整对比">完整对比</h3>

<table>
  <thead>
    <tr>
      <th>维度</th>
      <th>自然语言（20 次）</th>
      <th>决策树（20 次）</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>严重等级</td>
      <td>20/20 SEV2</td>
      <td>20/20 SEV2</td>
    </tr>
    <tr>
      <td>主要措施（唯一值数量）</td>
      <td><strong>15</strong></td>
      <td><strong>1</strong></td>
    </tr>
    <tr>
      <td>次要措施（唯一值数量）</td>
      <td><strong>20</strong></td>
      <td><strong>2</strong></td>
    </tr>
    <tr>
      <td>响应人员角色组合</td>
      <td>2 种（大小写不一致：<code class="language-plaintext highlighter-rouge">on-call</code> vs <code class="language-plaintext highlighter-rouge">On-Call</code>）</td>
      <td>1 种（一致）</td>
    </tr>
    <tr>
      <td>“避免操作”数量</td>
      <td>不一致（16 次 2 项，4 次 3 项）</td>
      <td>一致（20 次都是 2 项）</td>
    </tr>
    <tr>
      <td>升级触发条件</td>
      <td>3 条，一致</td>
      <td>3 条，一致</td>
    </tr>
  </tbody>
</table>

<h2 id="这意味着什么">这意味着什么</h2>

<p>两种方式都做出了<strong>正确的决策</strong> — SEV2、启用优雅降级、分配 4 名响应人员。差异在于<strong>输出的确定性</strong>。</p>

<p>如果下游代码这样写：</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">response</span><span class="p">[</span><span class="s">"mitigation_plan"</span><span class="p">][</span><span class="s">"primary_action"</span><span class="p">][</span><span class="s">"action"</span><span class="p">]</span> <span class="o">==</span> <span class="s">"enable_graceful_degradation"</span><span class="p">:</span>
    <span class="n">execute_graceful_degradation</span><span class="p">()</span>
</code></pre></div></div>

<ul>
  <li><strong>Decision tree prompt</strong>: matches 20/20 times</li>
  <li><strong>Prose prompt</strong>: matches <strong>0/20 times</strong> (the action is a free-form sentence that never matches exactly)</li>
</ul>

<p>This is not a theoretical problem. Any system that parses AI output, whether an automation pipeline, agent orchestration, or tool calling, breaks when the output format is unpredictable.</p>
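
<p>Seen from the parsing side, downstream dispatch is typically an exact-match lookup over the tree’s leaf strings, which is why a free-form sentence can never work. A minimal sketch (the handler bodies are illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: exact-match dispatch over the leaf vocabulary. The dict keys are the
# only strings the prompt is allowed to emit; free-form prose never hits a key.
HANDLERS = {
    "rollback":                    lambda: print("rolling back"),
    "disable_feature_flag":        lambda: print("disabling feature flag"),
    "hotfix":                      lambda: print("applying hotfix"),
    "enable_graceful_degradation": lambda: print("enabling graceful degradation"),
}

response = {"mitigation_plan": {"primary_action": {"action": "enable_graceful_degradation"}}}
action = response["mitigation_plan"]["primary_action"]["action"]
HANDLERS[action]()  # raises KeyError for any phrasing outside the vocabulary
</code></pre></div></div>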

<h2 id="模式">模式</h2>

<p>核心思路：把这种写法：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>如果根因是部署问题且服务支持即时回滚，优先选择回滚，因为它能以最小风险恢复到最后已知良好状态。
如果根因是部署问题但回滚不可用，则考虑关闭功能标志（如果变更在功能标志后面）。
如果没有功能标志，则进行热修复。
</code></pre></div></div>

<p>替换成这种：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>根因类别？
├─ 部署问题
│   └─ 服务支持即时回滚？
│       ├─ 是 → action: "rollback"
│       └─ 否
│           └─ 变更在功能标志后面？
│               ├─ 是 → action: "disable_feature_flag"
│               └─ 否 → action: "hotfix"
</code></pre></div></div>

<p>同样的逻辑。树版本在每个叶子节点给出了<strong>精确的输出字符串</strong>，缩进编码了决策路径。</p>

<h2 id="为什么有效">为什么有效</h2>

<ol>
  <li><strong>逐步收窄注意力。</strong> 模型每次只评估一个条件，而不是同时持有所有规则。</li>
  <li><strong>提供精确的输出文本。</strong> 叶子节点包含字面量 action 字符串 — 模型直接复制，而不是自己编写新句子。</li>
  <li><strong>用空间编码层级关系。</strong> 缩进 token（空白字符）编码了父子关系。LLM 在训练中从数百万份代码文件、YAML 配置、目录树中学到了这种模式。</li>
</ol>

<h2 id="什么时候用--什么时候不用">什么时候用 / 什么时候不用</h2>

<p><strong>用决策树：</strong></p>
<ul>
  <li>提示词有分支逻辑（if/else、switch/case）</li>
  <li>输出要被代码解析（JSON 字段、action 名称、状态值）</li>
  <li>需要多次运行的一致性</li>
  <li>构建 agent 编排或自动化流水线</li>
</ul>

<p><strong>不需要决策树：</strong></p>
<ul>
  <li>开放式创意任务（变化是好事）</li>
  <li>没有分支逻辑</li>
  <li>输出只给人类阅读（措辞变化无所谓）</li>
</ul>

<h2 id="自己试试">自己试试</h2>

<p>克隆仓库然后运行：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd eval</span>/decision-tree-ab
bash run.sh 20           <span class="c"># 20 次运行，默认模型</span>
bash run.sh 10 haiku     <span class="c"># 10 次运行，haiku 模型</span>
python3 analyze.py       <span class="c"># 分析结果</span>
</code></pre></div></div>

<p>包含三个场景：简单（部署控制器）、歧义（未知服务器状态）、复杂（200+ 行的事故响应提示词）。</p>
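
<p>If you only want the headline uniqueness counts, the measurement is small enough to re-derive by hand. A sketch, assuming one JSON output per run under <code class="language-plaintext highlighter-rouge">runs/</code> (an assumed layout, not necessarily what analyze.py uses):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: count unique values of one output field across runs.
# Assumes runs/*.json, one model output per file (the layout is an assumption).
import json
from collections import Counter
from pathlib import Path

actions = Counter(
    json.loads(p.read_text())["mitigation_plan"]["primary_action"]["action"]
    for p in sorted(Path("runs").glob("*.json"))
)
print(len(actions), "unique primary actions across", sum(actions.values()), "runs")
print(actions.most_common())
</code></pre></div></div>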

<h2 id="延伸阅读">延伸阅读</h2>

<ul>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices">Anthropic: Prompt Engineering Best Practices</a></li>
  <li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic: Effective Context Engineering</a></li>
</ul>]]></content><author><name></name></author><category term="patterns" /><category term="decision-tree" /></entry><entry><title type="html">Pattern: Decision Trees Beat Prose in AI Prompts</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern/" rel="alternate" type="text/html" title="Pattern: Decision Trees Beat Prose in AI Prompts" /><published>2026-04-19T00:00:00+00:00</published><updated>2026-04-19T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern/"><![CDATA[<p>When you tell an AI what to do using natural language paragraphs, it works — most of the time. But when the logic has branches (“if X then Y, otherwise Z”), prose prompts produce <strong>inconsistent outputs across runs</strong> — even when the input is identical.</p>

<p><strong>Decision trees fix this.</strong> Replace prose branching logic with a visual tree structure, and the model follows it deterministically.</p>

<hr />

<h2 id="the-problem">The Problem</h2>

<p>Consider an incident response system. The AI must triage incidents, assign responders, choose mitigation actions, and produce a structured plan. The rules are complex: 4 severity levels, 6 root cause categories, tier-based responder assignment, escalation conditions, communication deadlines.</p>

<p>We wrote two prompts for the same task — one as prose paragraphs (~140 lines), one as decision trees (~260 lines) — and ran each <strong>20 times</strong> with identical input using <code class="language-plaintext highlighter-rouge">claude -p</code>.</p>

<h2 id="results-20-runs-each">Results (20 Runs Each)</h2>

<h3 id="primary-mitigation-action">Primary Mitigation Action</h3>

<table>
  <thead>
    <tr>
      <th>Prose</th>
      <th>Decision Tree</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>15 unique phrasings</strong> out of 20 runs</td>
      <td><strong>1 phrasing</strong> out of 20 runs</td>
    </tr>
  </tbody>
</table>

<p>Prose outputs looked like this — every run different:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Run 1:  "Enable graceful degradation on Cosmos DB (cached responses, reduced functionality) while investigating root cause"
Run 2:  "Enable graceful degradation on Cosmos DB (cached responses, reduced functionality for non-critical read paths)"
Run 3:  "Enable graceful degradation mode on Cosmos DB East US 2"
Run 4:  "Enable graceful degradation on Cosmos DB East US 2 to serve cached/reduced-functionality responses while..."
...15 unique variants total
</code></pre></div></div>

<p>Decision tree — every run identical:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Run 1-20:  "enable_graceful_degradation"
</code></pre></div></div>

<h3 id="secondary-action">Secondary Action</h3>

<table>
  <thead>
    <tr>
      <th>Prose</th>
      <th>Decision Tree</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>20 unique phrasings</strong> (zero repeats)</td>
      <td><strong>2 variants</strong> (19× <code class="language-plaintext highlighter-rouge">hotfix</code>, 1× <code class="language-plaintext highlighter-rouge">failover_to_secondary</code>)</td>
    </tr>
  </tbody>
</table>

<p>Prose achieved <strong>0% reproducibility</strong> on secondary actions. Every single run produced a unique sentence.</p>

<h3 id="full-comparison-table">Full Comparison Table</h3>

<table>
  <thead>
    <tr>
      <th>Dimension</th>
      <th>Prose (20 runs)</th>
      <th>Decision Tree (20 runs)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Severity classification</td>
      <td>20/20 SEV2</td>
      <td>20/20 SEV2</td>
    </tr>
    <tr>
      <td>Primary action (unique values)</td>
      <td><strong>15</strong></td>
      <td><strong>1</strong></td>
    </tr>
    <tr>
      <td>Secondary action (unique values)</td>
      <td><strong>20</strong></td>
      <td><strong>2</strong></td>
    </tr>
    <tr>
      <td>Responder role sets</td>
      <td>2 variants (capitalization: <code class="language-plaintext highlighter-rouge">on-call</code> vs <code class="language-plaintext highlighter-rouge">On-Call</code>)</td>
      <td>1 (consistent)</td>
    </tr>
    <tr>
      <td>Actions-to-avoid count</td>
      <td>inconsistent (16× two items, 4× three items)</td>
      <td>consistent (20× two items)</td>
    </tr>
    <tr>
      <td>Escalation triggers</td>
      <td>3/3 consistent</td>
      <td>3/3 consistent</td>
    </tr>
  </tbody>
</table>

<h2 id="what-this-means-in-practice">What This Means in Practice</h2>

<p>Both approaches made the <strong>same correct decisions</strong> — SEV2, enable graceful degradation, assign 4 responders. The difference is <strong>output determinism</strong>.</p>

<p>If your downstream code does:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">response</span><span class="p">[</span><span class="s">"mitigation_plan"</span><span class="p">][</span><span class="s">"primary_action"</span><span class="p">][</span><span class="s">"action"</span><span class="p">]</span> <span class="o">==</span> <span class="s">"enable_graceful_degradation"</span><span class="p">:</span>
    <span class="n">execute_graceful_degradation</span><span class="p">()</span>
</code></pre></div></div>

<ul>
  <li><strong>Decision tree prompt</strong>: works 20/20 times</li>
  <li><strong>Prose prompt</strong>: works <strong>0/20 times</strong> (action is a free-form sentence, never matches)</li>
</ul>

<p>This isn’t a theoretical problem. Any system that parses AI output — automation pipelines, agent orchestration, tool calling — breaks when the output format is unpredictable.</p>
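
<p>From the parser’s point of view, dispatch is usually an exact-match lookup over the tree’s leaf strings, which is why a free-form sentence can never work. A minimal sketch (the handler bodies are illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: exact-match dispatch over the leaf vocabulary. The dict keys are the
# only strings the prompt is allowed to emit; free-form prose never hits a key.
HANDLERS = {
    "rollback":                    lambda: print("rolling back"),
    "disable_feature_flag":        lambda: print("disabling feature flag"),
    "hotfix":                      lambda: print("applying hotfix"),
    "enable_graceful_degradation": lambda: print("enabling graceful degradation"),
}

response = {"mitigation_plan": {"primary_action": {"action": "enable_graceful_degradation"}}}
action = response["mitigation_plan"]["primary_action"]["action"]
HANDLERS[action]()  # raises KeyError for any phrasing outside the vocabulary
</code></pre></div></div>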

<h2 id="the-pattern">The Pattern</h2>

<p>Here’s the core idea. Replace this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>If the root cause is a bad deployment and the service supports instant 
rollback, always prefer rollback over other mitigations because it restores 
the last known good state with minimal risk. If the root cause is a bad 
deployment but rollback is not available, then consider feature flag 
disablement if the change is behind a feature flag. If no feature flag 
exists, proceed with hotfix.
</code></pre></div></div>

<p>With this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Root cause category?
├─ Bad deployment
│   └─ Service supports instant rollback?
│       ├─ YES → action: "rollback"
│       └─ NO
│           └─ Change behind feature flag?
│               ├─ YES → action: "disable_feature_flag"
│               └─ NO  → action: "hotfix"
</code></pre></div></div>

<p>Same logic. The tree version gives the model an <strong>exact string to output</strong> at each leaf node, and the indentation encodes the decision path spatially.</p>

<h2 id="why-it-works">Why It Works</h2>

<ol>
  <li><strong>Narrows attention at each step.</strong> The model evaluates one condition at a time instead of holding all rules in attention simultaneously.</li>
  <li><strong>Provides exact output text.</strong> Leaf nodes contain the literal action string — the model copies it rather than composing a new sentence.</li>
  <li><strong>Encodes hierarchy spatially.</strong> Indentation tokens (whitespace) encode parent-child relationships. LLMs learned this pattern from millions of code files, YAML configs, and directory trees during training.</li>
</ol>

<h2 id="when-to-use--when-not-to">When to Use / When Not To</h2>

<p><strong>Use decision trees when:</strong></p>
<ul>
  <li>Your prompt has branching logic (if/else, switch/case)</li>
  <li>Output is parsed by code (JSON fields, action names, status values)</li>
  <li>You need consistency across multiple runs</li>
  <li>You’re building agent orchestration or automation pipelines</li>
</ul>

<p><strong>Skip decision trees when:</strong></p>
<ul>
  <li>The task is open-ended creative work (variation is desirable)</li>
  <li>There’s no branching logic</li>
  <li>Output is read by humans only (phrasing variation doesn’t matter)</li>
</ul>

<h2 id="try-it-yourself">Try It Yourself</h2>

<p>Clone the repo and run:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd eval</span>/decision-tree-ab
bash run.sh 20           <span class="c"># 20 runs, default model</span>
bash run.sh 10 haiku     <span class="c"># 10 runs, haiku model</span>
python3 analyze.py       <span class="c"># analyze results</span>
</code></pre></div></div>

<p>Three scenarios are included: simple (deployment controller), ambiguous (unknown server status), and complex (incident response with 200+ line prompts).</p>
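
<p>The headline uniqueness counts are also easy to re-derive by hand. A sketch, assuming one JSON output per run under <code class="language-plaintext highlighter-rouge">runs/</code> (an assumed layout, not necessarily analyze.py’s):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: count unique values of one output field across runs.
# Assumes runs/*.json, one model output per file (the layout is an assumption).
import json
from collections import Counter
from pathlib import Path

actions = Counter(
    json.loads(p.read_text())["mitigation_plan"]["primary_action"]["action"]
    for p in sorted(Path("runs").glob("*.json"))
)
print(len(actions), "unique primary actions across", sum(actions.values()), "runs")
print(actions.most_common())
</code></pre></div></div>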

<h2 id="further-reading">Further Reading</h2>

<ul>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices">Anthropic: Prompt Engineering Best Practices</a></li>
  <li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic: Effective Context Engineering</a></li>
</ul>]]></content><author><name></name></author><category term="patterns" /><category term="decision-tree" /></entry><entry xml:lang="zh"><title type="html">Prompt Engineering Patterns: Writing Stable, Reliable AI Prompts</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns-zh/" rel="alternate" type="text/html" title="Prompt Engineering Patterns: Writing Stable, Reliable AI Prompts" /><published>2026-04-19T00:00:00+00:00</published><updated>2026-04-19T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns-zh</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns-zh/"><![CDATA[<p>A practical guide to writing stable, predictable AI prompts.</p>

<hr />

<h2 id="核心原则降低不确定性">核心原则：降低不确定性</h2>

<p>AI 模型每生成一个词，都在很多候选词中做选择。<strong>你的提示词结构直接决定了模型在关键节点是”犹豫”还是”确定”。</strong></p>

<ul>
  <li>确定性高 → 行为稳定</li>
  <li>确定性低 → 行为飘忽</li>
</ul>

<p><strong>下面所有技巧都在做同一件事：让模型在关键决策点更确定。</strong></p>

<h3 id="简单例子">简单例子</h3>

<p>你的提示词需要模型根据环境决定是否询问用户。</p>

<p><strong>自然语言版本：</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>如果在 CI 环境中且没有参数，使用默认值。
如果在交互模式中有参数，直接运行。
如果在交互模式中没有参数，询问用户。
</code></pre></div></div>

<p>模型读完后，3 条规则同时争夺注意力。它需要回头扫描才能拼出答案。</p>

<p><strong>Decision-tree version:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## CI environment?
├─ YES
│  └─ Arguments provided?
│     ├─ YES → use arguments, execute
│     └─ NO  → use defaults, execute
└─ NO (interactive mode)
   └─ Arguments provided?
      ├─ YES → use arguments, execute
      └─ NO  → ask the user
</code></pre></div></div>

<p>Once the model has determined “CI = YES” and “arguments = NO”, its attention lands on this one line:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>│     └─ NO  → use defaults, execute
              ↑ attention concentrates here
</code></pre></div></div>

<p>The answer is right there; no backtracking needed. Certainty is high.</p>

<h3 id="一句话总结">一句话总结</h3>

<blockquote>
  <p>决策树让模型看几个附近的词就知道该做什么。自然语言让它扫一整段才能拼出答案。搜索范围越小，结果越确定。</p>
</blockquote>

<hr />

<h2 id="模式-1决策树替代自然语言">模式 1：决策树替代自然语言</h2>

<p>对有分支逻辑的指令，用可视化树结构替代文字描述。</p>

<h3 id="不好自然语言">不好：自然语言</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>如果在 CI 环境中且没有参数，使用默认值。如果在交互模式中有参数，
直接运行。如果在交互模式中没有参数，询问用户。
</code></pre></div></div>

<h3 id="好决策树">好：决策树</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## $ARGUMENTS 非空？
├─ 是 → 解析参数，直接执行，不交互
└─ 否
   ## $CI 或 $CLAUDE_NONINTERACTIVE 已设置？
   ├─ 是 → 使用 &lt;defaults&gt; 的值，直接执行
   └─ 否 → 询问用户缺少的参数，然后执行
</code></pre></div></div>

<p><strong>为什么有效：</strong> 缩进编码了层级关系。模型在训练中见过大量缩进结构（代码、YAML、目录树），学会了”缩进越深 = 子条件”。自然语言没有这种空间编码。</p>

<hr />

<h2 id="模式-2锚定给起点">模式 2：锚定（给起点）</h2>

<p>给模型一个具体的起点，不让它凭空发挥。</p>

<h3 id="不好无锚定">不好：无锚定</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>生成一个部署脚本。
</code></pre></div></div>

<h3 id="好用模板锚定">好：用模板锚定</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>基于这个模板生成部署脚本：
&lt;template&gt;
#!/bin/bash
set -euo pipefail
ENV="${1:?Usage: deploy.sh &lt;env&gt;}"
# ... 你的步骤
&lt;/template&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> 模板的内容直接参与模型的注意力计算。模型的输出会被”拉向”模板的风格，而不是从”部署脚本”这个笼统概念中随机生成。</p>

<hr />

<h2 id="模式-3认知卸载把思考步骤写出来">模式 3：认知卸载（把思考步骤写出来）</h2>

<p>把模型本来需要隐式推理的步骤显式地写出来。</p>

<h3 id="不好隐式推理">不好：隐式推理</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>分析这段代码的性能问题并修复。
</code></pre></div></div>

<h3 id="好显式步骤">好：显式步骤</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;analysis_steps&gt;
1. 找出所有循环和递归
2. 标注每个的时间复杂度
3. 标记 O(n²) 或更高的
4. 为每个标记的部分提出优化方案
&lt;/analysis_steps&gt;
按顺序执行这些步骤。
</code></pre></div></div>

<p><strong>为什么有效：</strong> LLM 没有真正的工作记忆。把中间步骤写出来等于给了”外部记忆”——每一步只需要看上一步的输出，不需要从头推导。</p>

<p>决策树 = 分支逻辑的认知卸载。
思维链 = 推理过程的认知卸载。
同一个原理，不同应用。</p>

<hr />

<h2 id="模式-4注意力局部性把相关的放在一起">模式 4：注意力局部性（把相关的放在一起）</h2>

<p>相关信息在文本中应该靠近。越近的词获得越强的注意力。</p>

<h3 id="不好规则离目标太远">不好：规则离目标太远</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;rules&gt;永远不要删除生产数据库&lt;/rules&gt;
...（中间隔了 500 个词）...
&lt;task&gt;清理过期数据&lt;/task&gt;
</code></pre></div></div>

<h3 id="好规则紧挨目标">好：规则紧挨目标</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;task&gt;
清理过期数据
&lt;constraint&gt;永远不要删除生产数据库&lt;/constraint&gt;
&lt;/task&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> Transformer 的注意力理论上是全局的，但实际上有位置偏好——近的词注意力更强。把约束放在它约束的动作旁边，不要放在远处的”通用规则”里。</p>

<hr />

<h2 id="模式-5指令-动作绑定一条指令一个动作">模式 5：指令-动作绑定（一条指令一个动作）</h2>

<p>每条指令应该尽可能直接对应一个可执行动作。</p>

<h3 id="不好一句话多个动作">不好：一句话多个动作</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>检查代码风格问题并修复，然后运行测试确保通过。
</code></pre></div></div>

<h3 id="好一条指令--一个动作">好：一条指令 = 一个动作</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. 运行：`eslint --fix src/`
2. 运行：`npm test`
3. 如果测试失败 → 读错误输出，修复问题，回到步骤 2
</code></pre></div></div>

<p><strong>为什么有效：</strong> 模型把一条清晰指令映射到一个工具调用的可靠性，远高于从一个长句中提取多个隐含动作。</p>

<hr />

<h2 id="模式-6输出格式预设给输出一个形状">模式 6：输出格式预设（给输出一个”形状”）</h2>

<p>给模型一个输出的结构，它来填内容。</p>

<h3 id="不好开放式">不好：开放式</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>分析这个 PR 的风险。
</code></pre></div></div>

<h3 id="好结构约束">好：结构约束</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;output_schema&gt;
- risk_level: high | medium | low
- affected_files: [列表]
- rollback_plan: [字符串]
- requires_review: true | false
&lt;/output_schema&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> 结构定义就像”铁轨”。生成每个字段值时，模型的注意力被字段名强力引导，大幅减少偏离。</p>
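
<p>The payoff is that downstream code can check the reply mechanically. A minimal validator sketch using the field names from the schema above (the checks themselves are illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: validate a parsed model reply against the &lt;output_schema&gt; above.
ALLOWED_RISK = {"high", "medium", "low"}

def validate(reply: dict) -> list:
    errors = []
    if reply.get("risk_level") not in ALLOWED_RISK:
        errors.append("risk_level must be high | medium | low")
    if not isinstance(reply.get("affected_files"), list):
        errors.append("affected_files must be a list")
    if not isinstance(reply.get("rollback_plan"), str):
        errors.append("rollback_plan must be a string")
    if not isinstance(reply.get("requires_review"), bool):
        errors.append("requires_review must be true or false")
    return errors  # empty list means the reply matches the shape
</code></pre></div></div>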

<hr />

<h2 id="模式-7负空间说不要的同时说要">模式 7：负空间（说”不要”的同时说”要”）</h2>

<p>告诉模型不该做什么时，永远同时告诉它该做什么。</p>

<h3 id="不好只说不要">不好：只说不要</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>不要直接修改数据库。
不要跳过测试。
不要用 sudo。
</code></pre></div></div>

<h3 id="好不要--替代方案">好：不要 + 替代方案</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;boundaries&gt;
- 数据库变更 → 生成迁移文件，不要执行原始 SQL
- 需要验证 → 运行完整测试套件再继续，不要跳过
- 需要提权 → 请求用户确认，不要用 sudo
&lt;/boundaries&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> “不要做 X” 只压制了某些输出，但没有推动任何替代方案。模型知道不往哪走，但不知道往哪走。同时给出替代方案就能同时压制错误路径并推动正确路径。</p>

<hr />

<h2 id="模式-8xml-标签做语义分区">模式 8：XML 标签做语义分区</h2>

<p>Claude 的训练数据中包含 XML 标签。用它们来划分提示词的不同部分。</p>

<h3 id="推荐的提示词结构">推荐的提示词结构</h3>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;context&gt;</span>
模型需要了解的背景信息。
<span class="nt">&lt;/context&gt;</span>

<span class="nt">&lt;parameters&gt;</span>
输入参数，包含类型、默认值、来源。
<span class="nt">&lt;/parameters&gt;</span>

<span class="nt">&lt;decision_tree&gt;</span>
可视化的分支逻辑，每个叶子有明确动作。
<span class="nt">&lt;/decision_tree&gt;</span>

<span class="nt">&lt;examples&gt;</span>
<span class="nt">&lt;example&gt;</span>
<span class="nt">&lt;input&gt;</span>...<span class="nt">&lt;/input&gt;</span>
<span class="nt">&lt;thinking&gt;</span>模型应该遵循的逐步推理<span class="nt">&lt;/thinking&gt;</span>
<span class="nt">&lt;output&gt;</span>...<span class="nt">&lt;/output&gt;</span>
<span class="nt">&lt;/example&gt;</span>
<span class="nt">&lt;/examples&gt;</span>

<span class="nt">&lt;boundaries&gt;</span>
不要做什么 + 应该做什么。
<span class="nt">&lt;/boundaries&gt;</span>

<span class="nt">&lt;output_schema&gt;</span>
期望的输出格式。
<span class="nt">&lt;/output_schema&gt;</span>
</code></pre></div></div>

<p><strong>为什么有效：</strong> XML 标签创建硬性的语义边界。模型把不同标签内的内容当作独立的区块，减少指令、示例、约束之间的互相干扰。</p>

<hr />

<h2 id="模式-9带推理过程的示例">模式 9：带推理过程的示例</h2>

<p>让模型看到<strong>怎么想</strong>，不只是<strong>输出什么</strong>。</p>

<h3 id="不好只有输入输出">不好：只有输入/输出</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;example&gt;
&lt;input&gt;deploy staging&lt;/input&gt;
&lt;output&gt;已部署到 staging。&lt;/output&gt;
&lt;/example&gt;
</code></pre></div></div>

<h3 id="好输入--思考过程--输出">好：输入 + 思考过程 + 输出</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;example&gt;
&lt;input&gt;deploy staging&lt;/input&gt;
&lt;thinking&gt;
1. 提供了参数："staging" → 非空 → 跳过用户交互
2. 环境 "staging" 有效（匹配 staging|production）
3. 未检测到 CI 变量 → 但有参数 → 静默执行
4. 执行部署到 staging
&lt;/thinking&gt;
&lt;output&gt;已成功部署到 staging。&lt;/output&gt;
&lt;/example&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> 示例中的 <code class="language-plaintext highlighter-rouge">&lt;thinking&gt;</code> 模式会被泛化到模型自己的推理中。它学到的是推理方式，不只是输出格式。</p>

<hr />

<h2 id="各模式之间的关系">各模式之间的关系</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>行为稳定性
    ↑
决策点的确定性高
    ↑
注意力分布集中
    ↑
提示词中的词语空间排列
    ↑
┌──────────┬──────────┬──────────┬──────────┐
│ 决策树    │ 注意力   │ 认知卸载  │ 输出格式  │
│          │ 局部性    │          │ 预设     │
├──────────┼──────────┼──────────┼──────────┤
│ 锚定     │ 指令-动作 │ 负空间    │ XML      │
│          │ 绑定      │          │ 标签     │
├──────────┼──────────┼──────────┼──────────┤
│ 带推理的  │          │          │          │
│ 示例     │          │          │          │
└──────────┴──────────┴──────────┴──────────┘

所有技巧都在做同一件事：
改变生成时注意力在各个词上的分布。
</code></pre></div></div>

<hr />

<h2 id="参考资料">参考资料</h2>

<ul>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices">Anthropic 提示词工程最佳实践</a></li>
  <li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic 上下文工程</a></li>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags">XML 标签使用指南</a></li>
  <li><a href="https://code.claude.com/docs/en/skills">Claude Code Skills 文档</a></li>
  <li><a href="https://github.com/Piebald-AI/claude-code-system-prompts">Claude Code 系统提示词（社区）</a></li>
  <li><a href="https://github.com/travisvn/awesome-claude-skills">Claude Skills 精选合集</a></li>
</ul>]]></content><author><name></name></author><category term="prompt-engineering" /><category term="patterns" /><summary type="html"><![CDATA[一份写出稳定、可预测的 AI 提示词的实用指南。]]></summary></entry><entry><title type="html">Prompt Engineering Patterns for Claude Code Skills</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns/" rel="alternate" type="text/html" title="Prompt Engineering Patterns for Claude Code Skills" /><published>2026-04-19T00:00:00+00:00</published><updated>2026-04-19T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns/"><![CDATA[<p>A practical guide to writing stable, predictable skill prompts — grounded in how LLMs actually process tokens.</p>

<hr />

<h2 id="core-principle-reduce-conditional-entropy">Core Principle: Reduce Conditional Entropy</h2>

<p>Every token an LLM generates is sampled from a probability distribution over candidates. <strong>Your prompt’s structure directly controls how sharp or diffuse that distribution is at each decision point.</strong></p>

<ul>
  <li>Sharp distribution (low entropy) → deterministic behavior</li>
  <li>Diffuse distribution (high entropy) → unstable, unpredictable behavior</li>
</ul>

<p><strong>Everything below is a technique for sharpening the distribution at the moments that matter.</strong></p>

<h3 id="what-conditional-entropy-actually-means">What “conditional entropy” actually means</h3>

<p>Every time the model generates a token, it faces a set of candidates, each with a probability. If probabilities are spread evenly (e.g., 10 candidates at 10% each), the model is “hesitating” — that’s high entropy. If one candidate sits at 90% and the rest are negligible, the model is “certain” — that’s low entropy.</p>

<p><strong>The way you write your prompt directly determines whether the model hesitates or commits at critical decision points.</strong></p>

<h3 id="concrete-example">Concrete example</h3>

<p>Suppose your skill needs to decide whether to ask the user for input, based on the environment.</p>

<p><strong>Prose version:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>If running in CI with no arguments, use defaults.
If interactive with arguments, run directly.
If interactive without arguments, ask the user.
</code></pre></div></div>

<p>After reading this, the model needs to generate its next action. Here’s what’s happening inside attention:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"CI"              → line 1, beginning     ← attention must look back
"no arguments"    → line 1, middle        ← attention must look back
"use defaults"    → line 1, end           ← attention must look back
"interactive"     → line 2, beginning     ← also competing for attention
"ask the user"    → line 3, end           ← also competing for attention

5 conditions scattered across 3 lines — which should I attend to?
</code></pre></div></div>

<p>Attention is spread across multiple positions → no single condition gets enough weight → the model is uncertain about which path to take → <strong>high entropy → unstable behavior</strong>.</p>

<p><strong>Tree version:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## CI environment?
├─ YES
│  └─ Has arguments?
│     ├─ YES → use arguments, execute
│     └─ NO  → use defaults, execute
└─ NO (interactive)
   └─ Has arguments?
      ├─ YES → use arguments, execute
      └─ NO  → ask user
</code></pre></div></div>

<p>After the model determines “CI = YES” and “arguments = NO”, its attention is here:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>│     └─ NO  → use defaults, execute
              ↑
              attention is concentrated on this line
</code></pre></div></div>

<p>The tokens “use defaults, execute” are <strong>right next to the cursor</strong>. No need to look back anywhere. The model is nearly 100% certain what to do next → <strong>very low entropy → deterministic behavior</strong>.</p>

<h3 id="feel-it-in-numbers">Feel it in numbers</h3>

<p>Prose version — probability distribution at the decision point:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use arguments, execute:  35%
use defaults, execute:   30%
ask user:                25%
other:                   10%
</code></pre></div></div>

<p>Three options sit close together in probability, so sampling noise decides which path wins. Run 10 times and expect several runs to take the wrong branch.</p>

<p>Tree version — at the correct branch leaf:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use defaults, execute:   92%
use arguments, execute:   5%
other:                    3%
</code></pre></div></div>

<p>One option dominates. Run 10 times, 0–1 deviations.</p>
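
<p>To put numbers on “sharp vs diffuse”, here is a minimal sketch that computes the Shannon entropy of the two illustrative distributions above (the percentages are invented for illustration, not measured logits):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p &gt; 0)

prose = [0.35, 0.30, 0.25, 0.10]  # three actions compete
tree  = [0.92, 0.05, 0.03]        # one action dominates

print(f"prose: {entropy_bits(prose):.2f} bits")  # ~1.88
print(f"tree:  {entropy_bits(tree):.2f} bits")   # ~0.48
</code></pre></div></div>

<p>Roughly 1.9 bits of hesitation versus 0.5: by this (invented) measure, the tree layout leaves the model about a quarter of the uncertainty at the decision point.</p>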

<h3 id="why-indentation-is-information">Why indentation is information</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>└─ NO (interactive)
   └─ Has arguments?
      └─ NO  → ask user
</code></pre></div></div>

<p>The indentation (0 spaces, 3 spaces, 6 spaces) becomes whitespace tokens after tokenization. These whitespace tokens encode <strong>hierarchy</strong> — the model has seen massive amounts of indented structures (source code, YAML, directory trees) during training and has learned: deeper indent = child of parent condition.</p>

<p>Prose has no such spatial encoding. In “If interactive but arguments provided” — the subordination between “interactive” and “arguments provided” must be inferred from natural language grammar alone. That inference itself costs attention and introduces uncertainty.</p>

<h3 id="one-line-summary">One-line summary</h3>

<blockquote>
  <p>A tree lets the model look at a few nearby tokens to know what to do. Prose forces it to scan an entire paragraph to piece together the answer. The smaller the search radius, the more certain the outcome.</p>
</blockquote>

<hr />

<h2 id="1-visual-decision-trees-over-prose">1. Visual Decision Trees over Prose</h2>

<p>Claude follows visual tree structures more reliably than prose descriptions of the same logic, because each branch terminates with an explicit action.</p>

<h3 id="why-it-works">Why it works</h3>

<p>Prose scatters conditions across a sentence. The model must attend to multiple distant tokens simultaneously, diluting attention. A tree places the relevant condition and its action adjacent in the token sequence — the model only needs to look at nearby tokens to know what to do.</p>

<h3 id="bad-prose">Bad: prose</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>If running in CI with no arguments, use defaults. If interactive with arguments,
run directly. If interactive without arguments, ask the user.
</code></pre></div></div>

<h3 id="good-decision-tree">Good: decision tree</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Is $ARGUMENTS non-empty?
├─ YES → parse arguments, execute directly, no interaction
└─ NO
   ## Is $CI or $CLAUDE_NONINTERACTIVE set?
   ├─ YES → use values from &lt;defaults&gt;, execute directly
   └─ NO  → ask user for missing parameters, then execute
</code></pre></div></div>

<h3 id="why-indentation-matters">Why indentation matters</h3>

<p>Indentation tokens encode hierarchy. Models have seen massive amounts of indented structures (code, YAML, directory trees) during training and have learned that deeper indent = child of parent condition. Prose has no such spatial encoding — the model must infer nesting from natural language grammar, which costs attention and introduces uncertainty.</p>
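
<p>If you generate these trees programmatically, a small recursive renderer keeps the box-drawing characters and indentation consistent. A minimal sketch (<code class="language-plaintext highlighter-rouge">render_tree</code> is a hypothetical helper, not part of any skill API):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def render_tree(question, branches, indent=""):
    """Render {answer: action or (question, branches)} as a box-drawing tree."""
    lines = [f"{indent}## {question}"]
    items = list(branches.items())
    for i, (answer, outcome) in enumerate(items):
        last = i == len(items) - 1
        connector = "└─" if last else "├─"
        child_indent = indent + ("   " if last else "│  ")
        if isinstance(outcome, tuple):  # nested (question, branches)
            sub_question, sub_branches = outcome
            lines.append(f"{indent}{connector} {answer}")
            lines.extend(render_tree(sub_question, sub_branches, child_indent).splitlines())
        else:  # leaf: explicit action
            lines.append(f"{indent}{connector} {answer} → {outcome}")
    return "\n".join(lines)

print(render_tree("Is $ARGUMENTS non-empty?", {
    "YES": "parse arguments, execute directly, no interaction",
    "NO": ("Is $CI or $CLAUDE_NONINTERACTIVE set?", {
        "YES": "use values from &lt;defaults&gt;, execute directly",
        "NO": "ask user for missing parameters, then execute",
    }),
}))
</code></pre></div></div>

<p>Every leaf ends in an explicit action by construction; the renderer makes the “each branch terminates with an action” rule impossible to violate.</p>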

<hr />

<h2 id="2-grounding-anchoring">2. Grounding (Anchoring)</h2>

<p>Give the model a concrete starting point instead of letting it sample from an infinite space.</p>

<h3 id="bad-unanchored">Bad: unanchored</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Generate a deployment script.
</code></pre></div></div>

<h3 id="good-anchored-with-template">Good: anchored with template</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Generate a deployment script based on this template:
&lt;template&gt;
#!/bin/bash
set -euo pipefail
ENV="${1:?Usage: deploy.sh &lt;env&gt;}"
# ... your steps here
&lt;/template&gt;
</code></pre></div></div>

<p><strong>Why:</strong> Template tokens directly participate in attention — the model’s output is “pulled toward” the template’s distribution rather than sampling from the generic concept of “deployment script.”</p>

<hr />

<h2 id="3-cognitive-offloading">3. Cognitive Offloading</h2>

<p>Externalize reasoning steps that the model would otherwise have to perform implicitly.</p>

<h3 id="bad-implicit-reasoning-required">Bad: implicit reasoning required</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Analyze this code's performance issues and fix them.
</code></pre></div></div>

<h3 id="good-explicit-steps-provided">Good: explicit steps provided</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;analysis_steps&gt;
1. Identify all loops and recursion
2. Annotate each with time complexity
3. Flag anything O(n²) or higher
4. Propose optimization for each flagged section
&lt;/analysis_steps&gt;
Execute these steps in order.
</code></pre></div></div>

<p><strong>Why:</strong> LLMs have no true working memory: any intermediate result must either be re-derived or live in the context window as tokens. Writing out intermediate steps provides “external memory” — each step only needs to attend to the previous step’s output, not derive everything from scratch.</p>

<p>Decision trees = cognitive offloading for branching logic.
Chain-of-thought = cognitive offloading for reasoning.
Same principle, different applications.</p>
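
<p>The same offloading idea applies when you drive the model from code: run one step per call and feed each result back in, so the context window serves as the working memory. A rough sketch, where <code class="language-plaintext highlighter-rouge">call_model</code> stands in for whatever client function you use (hypothetical, not a real API):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ANALYSIS_STEPS = [
    "Identify all loops and recursion in the code below.",
    "Annotate each with its time complexity.",
    "Flag anything O(n²) or higher.",
    "Propose an optimization for each flagged section.",
]

def analyze(code, call_model):
    """One call per step; earlier answers become the external memory."""
    notes = []
    for step in ANALYSIS_STEPS:
        memory = "\n\n".join(notes)  # everything derived so far
        notes.append(call_model(f"{memory}\n\nCode:\n{code}\n\nTask: {step}"))
    return notes[-1]
</code></pre></div></div>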

<hr />

<h2 id="4-attention-locality">4. Attention Locality</h2>

<p>Related information should be close together in the token sequence. Closer tokens get higher attention weights in practice.</p>

<h3 id="bad-rule-far-from-its-target">Bad: rule far from its target</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;rules&gt;Never delete production databases&lt;/rules&gt;
... (500 tokens of other content) ...
&lt;task&gt;Clean up expired data&lt;/task&gt;
</code></pre></div></div>

<h3 id="good-rule-adjacent-to-its-target">Good: rule adjacent to its target</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;task&gt;
Clean up expired data
&lt;constraint&gt;Never delete production databases&lt;/constraint&gt;
&lt;/task&gt;
</code></pre></div></div>

<p><strong>Why:</strong> Transformer attention is theoretically global but has positional bias — nearby tokens receive stronger attention scores. Place constraints next to the actions they constrain, not in a distant “general rules” section.</p>
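
<p>When prompts are assembled from pieces, you can enforce locality mechanically by attaching each constraint to the task it governs instead of pooling everything in a global rules section. A tiny sketch of that convention (hypothetical helper, not a framework API):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def task_block(task, constraints):
    """Emit each constraint adjacent to the action it constrains."""
    inner = "\n".join(f"&lt;constraint&gt;{c}&lt;/constraint&gt;" for c in constraints)
    return f"&lt;task&gt;\n{task}\n{inner}\n&lt;/task&gt;"

print(task_block("Clean up expired data",
                 ["Never delete production databases"]))
</code></pre></div></div>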

<hr />

<h2 id="5-token-action-binding">5. Token-Action Binding</h2>

<p>Each instruction should map as directly as possible to one executable action.</p>

<h3 id="bad-multiple-implicit-actions-in-one-sentence">Bad: multiple implicit actions in one sentence</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Check code style issues and fix them then run tests and make sure they pass.
</code></pre></div></div>

<h3 id="good-one-instruction--one-action">Good: one instruction = one action</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. Run: `eslint --fix src/`
2. Run: `npm test`
3. If tests fail → read error output, fix the issue, go to step 2
</code></pre></div></div>

<p><strong>Why:</strong> The model maps one clear instruction to one tool call far more reliably than it extracts multiple implied actions from a run-on sentence.</p>
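
<p>A useful side effect: the more directly each instruction maps to an action, the easier it is to lift the deterministic steps out of the prompt into a plain harness, reserving the model for the judgment-heavy branch. A sketch under that assumption (<code class="language-plaintext highlighter-rouge">fix_with_model</code> is a hypothetical callback):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import subprocess

def run(cmd):
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

def lint_and_test(fix_with_model, max_attempts=3):
    """Steps 1–3 from the prompt above, with bounded retries."""
    run("eslint --fix src/")              # step 1: one instruction, one action
    for _ in range(max_attempts):
        result = run("npm test")          # step 2
        if result.returncode == 0:
            return True
        fix_with_model(result.stdout + result.stderr)  # step 3: hand failure to the model
    return False
</code></pre></div></div>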

<hr />

<h2 id="6-schema-priming">6. Schema Priming</h2>

<p>Give the model an output “shape” and it fills in the content.</p>

<h3 id="bad-open-ended">Bad: open-ended</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Analyze this PR's risk.
</code></pre></div></div>

<h3 id="good-schema-constrained">Good: schema-constrained</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;output_schema&gt;
- risk_level: high | medium | low
- affected_files: [list]
- rollback_plan: [string]
- requires_review: true | false
&lt;/output_schema&gt;
</code></pre></div></div>

<p><strong>Why:</strong> Schema tokens act as “rails” during decoding. When generating each field value, the model’s attention is strongly guided by the schema key names, drastically reducing drift.</p>
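
<p>A schema you prime with is also a schema you can check against. A minimal validator for the example shape above (field names taken from the snippet; the checks are an illustrative sketch, not a library):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def validate_risk_report(report):
    """Return a list of schema violations (empty list = valid)."""
    errors = []
    if report.get("risk_level") not in {"high", "medium", "low"}:
        errors.append("risk_level must be high | medium | low")
    if not isinstance(report.get("affected_files"), list):
        errors.append("affected_files must be a list")
    if not isinstance(report.get("rollback_plan"), str):
        errors.append("rollback_plan must be a string")
    if not isinstance(report.get("requires_review"), bool):
        errors.append("requires_review must be true | false")
    return errors

assert validate_risk_report({
    "risk_level": "low",
    "affected_files": ["src/app.ts"],
    "rollback_plan": "revert the release tag",
    "requires_review": False,
}) == []
</code></pre></div></div>

<p>On a violation, feed the error list straight back to the model; the same schema tokens that primed generation then guide the retry.</p>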

<hr />

<h2 id="7-negative-space-explicit-alternatives">7. Negative Space (Explicit Alternatives)</h2>

<p>When telling the model what NOT to do, always provide what TO DO instead.</p>

<h3 id="bad-negation-only">Bad: negation only</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Don't modify the database directly.
Don't skip tests.
Don't use sudo.
</code></pre></div></div>

<h3 id="good-negation--alternative-path">Good: negation + alternative path</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;boundaries&gt;
- Database changes → generate a migration file, never execute raw SQL
- Validation needed → run full test suite before continuing, never skip
- Elevated permissions → request user confirmation, never use sudo
&lt;/boundaries&gt;
</code></pre></div></div>

<p><strong>Why:</strong> “Don’t do X” only suppresses certain token sequences but doesn’t boost any alternative. The model knows where not to go but not where to go → unstable. Providing the alternative simultaneously suppresses the wrong path and boosts the right one.</p>
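
<p>This rule is easy to lint for. A small check that flags boundary lines which suppress without redirecting (it assumes the “negation → alternative” arrow convention from the good example above; the function is an illustrative sketch):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NEGATIONS = ("never", "don't", "do not", "avoid")

def lint_boundaries(boundaries):
    """Flag negation-only lines that offer no alternative action."""
    flagged = []
    for line in boundaries.strip().splitlines():
        lowered = line.lower()
        if any(word in lowered for word in NEGATIONS) and "→" not in line:
            flagged.append(line.strip())
    return flagged

print(lint_boundaries("""
Don't modify the database directly.
Database changes → generate a migration file, never execute raw SQL
"""))  # flags only the first line
</code></pre></div></div>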

<hr />

<h2 id="8-xml-tags-for-semantic-boundaries">8. XML Tags for Semantic Boundaries</h2>

<p>Claude’s training data included XML tags, so it handles them reliably. Use them to delineate prompt sections.</p>

<h3 id="recommended-structure-for-skill-prompts">Recommended structure for skill prompts</h3>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;context&gt;</span>
Background information the model needs to understand the domain.
<span class="nt">&lt;/context&gt;</span>

<span class="nt">&lt;parameters&gt;</span>
Inputs with types, defaults, and sources.
<span class="nt">&lt;/parameters&gt;</span>

<span class="nt">&lt;decision_tree&gt;</span>
Visual branching logic with explicit leaf actions.
<span class="nt">&lt;/decision_tree&gt;</span>

<span class="nt">&lt;examples&gt;</span>
<span class="nt">&lt;example&gt;</span>
<span class="nt">&lt;input&gt;</span>...<span class="nt">&lt;/input&gt;</span>
<span class="nt">&lt;thinking&gt;</span>Step-by-step reasoning the model should follow<span class="nt">&lt;/thinking&gt;</span>
<span class="nt">&lt;output&gt;</span>...<span class="nt">&lt;/output&gt;</span>
<span class="nt">&lt;/example&gt;</span>
<span class="nt">&lt;/examples&gt;</span>

<span class="nt">&lt;boundaries&gt;</span>
What not to do + what to do instead.
<span class="nt">&lt;/boundaries&gt;</span>

<span class="nt">&lt;output_schema&gt;</span>
Expected output shape.
<span class="nt">&lt;/output_schema&gt;</span>
</code></pre></div></div>

<p><strong>Why:</strong> XML tags create hard semantic boundaries. The model treats content inside different tags as distinct sections, reducing cross-contamination between instructions, examples, and constraints.</p>

<hr />

<h2 id="9-few-shot-with-embedded-reasoning">9. Few-Shot with Embedded Reasoning</h2>

<p>Show the model HOW to think, not just WHAT to output.</p>

<h3 id="bad-inputoutput-pairs-only">Bad: input/output pairs only</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;example&gt;
&lt;input&gt;deploy staging&lt;/input&gt;
&lt;output&gt;Deployed to staging.&lt;/output&gt;
&lt;/example&gt;
</code></pre></div></div>

<h3 id="good-input--thinking--output">Good: input + thinking + output</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;example&gt;
&lt;input&gt;deploy staging&lt;/input&gt;
&lt;thinking&gt;
1. Arguments provided: "staging" → non-empty → skip user interaction
2. Environment "staging" is valid (matches staging|production)
3. No CI variable detected → but args present → proceed silently
4. Execute deployment to staging
&lt;/thinking&gt;
&lt;output&gt;Deployed to staging successfully.&lt;/output&gt;
&lt;/example&gt;
</code></pre></div></div>

<p><strong>Why:</strong> The <code class="language-plaintext highlighter-rouge">&lt;thinking&gt;</code> pattern inside few-shot examples gets generalized into the model’s own extended thinking blocks. It learns the reasoning pattern, not just the output pattern.</p>
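
<p>A side benefit of tagged examples: the same tags make the model’s replies machine-parseable. A minimal sketch that pulls a tagged section back out (regex-based for illustration; a real parser should handle missing or nested tags more carefully):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

def extract_tag(reply, tag):
    """Return the contents of &lt;tag&gt;...&lt;/tag&gt; in the reply, or None."""
    match = re.search(rf"&lt;{tag}&gt;(.*?)&lt;/{tag}&gt;", reply, re.DOTALL)
    return match.group(1).strip() if match else None

reply = ("&lt;thinking&gt;args non-empty → skip interaction&lt;/thinking&gt;"
         "&lt;output&gt;Deployed to staging.&lt;/output&gt;")
print(extract_tag(reply, "output"))    # Deployed to staging.
print(extract_tag(reply, "thinking"))  # args non-empty → skip interaction
</code></pre></div></div>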

<hr />

<h2 id="putting-it-all-together-skill-prompt-template">Putting It All Together: Skill Prompt Template</h2>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">my-skill</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">One line that tells Claude WHEN to activate this skill</span>
<span class="na">context</span><span class="pi">:</span> <span class="s">default</span>
<span class="na">allowed-tools</span><span class="pi">:</span> <span class="s">Bash(specific-commands*), Write, Edit</span>
<span class="nn">---</span>

<span class="nt">&lt;context&gt;</span>
What this skill does and why it exists.
Domain-specific background if needed.
<span class="nt">&lt;/context&gt;</span>

<span class="nt">&lt;parameters&gt;</span>
<span class="p">-</span> target: from $ARGUMENTS, or ask user
<span class="p">-</span> env_mode: from $CI or $CLAUDE_NONINTERACTIVE, default "interactive"
<span class="nt">&lt;/parameters&gt;</span>

<span class="nt">&lt;decision_tree&gt;</span>
<span class="gu">## Has $ARGUMENTS?</span>
├─ YES → parse into <span class="sb">`target`</span>, skip interaction
└─ NO
   ## Is env_mode non-interactive?
   ├─ YES → use defaults from <span class="nt">&lt;defaults&gt;</span>, proceed
   └─ NO  → ask user for <span class="sb">`target`</span>, then proceed
<span class="nt">&lt;/decision_tree&gt;</span>

<span class="nt">&lt;defaults&gt;</span>
<span class="p">-</span> target: "staging"
<span class="nt">&lt;/defaults&gt;</span>

<span class="nt">&lt;steps&gt;</span>
<span class="p">1.</span> Validate <span class="sb">`target`</span> against allowed values (staging | production)
<span class="p">2.</span> Run preflight checks: <span class="sb">`npm test`</span>
<span class="p">3.</span> If tests fail → stop, report error, do NOT proceed
<span class="p">4.</span> Execute deployment to <span class="sb">`target`</span>
<span class="p">5.</span> Verify deployment health
<span class="nt">&lt;/steps&gt;</span>

<span class="nt">&lt;examples&gt;</span>
<span class="nt">&lt;example&gt;</span>
<span class="nt">&lt;input&gt;</span>/my-skill production<span class="nt">&lt;/input&gt;</span>
<span class="nt">&lt;thinking&gt;</span>
<span class="p">1.</span> $ARGUMENTS = "production" → non-empty → use directly
<span class="p">2.</span> target = "production" → valid
<span class="p">3.</span> Run tests → pass
<span class="p">4.</span> Deploy to production
<span class="nt">&lt;/thinking&gt;</span>
<span class="nt">&lt;output&gt;</span>Deployed to production. Health check passed.<span class="nt">&lt;/output&gt;</span>
<span class="nt">&lt;/example&gt;</span>
<span class="nt">&lt;/examples&gt;</span>

<span class="nt">&lt;boundaries&gt;</span>
<span class="p">-</span> Never deploy without passing tests → run <span class="sb">`npm test`</span> first, abort on failure
<span class="p">-</span> Never modify .env files → read config from environment variables only
<span class="p">-</span> Never run with sudo → request user confirmation for elevated actions
<span class="nt">&lt;/boundaries&gt;</span>
</code></pre></div></div>
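
<p>If you keep many skills in one repo, the frontmatter is easy to sanity-check in CI. A rough sketch (the required keys are just the ones shown in the template above; real skill loaders may expect more or different fields):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import yaml  # pip install pyyaml

REQUIRED_KEYS = {"name", "description"}

def check_skill_file(text):
    """Parse the '---'-delimited frontmatter and report missing keys."""
    parts = text.split("---", 2)
    if len(parts) &lt; 3:
        return ["missing frontmatter block"]
    meta = yaml.safe_load(parts[1]) or {}
    return [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - meta.keys())]
</code></pre></div></div>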

<hr />

<h2 id="summary-of-relationships">Summary of Relationships</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Behavioral Stability
      ↑
Low Conditional Entropy at Decision Points
      ↑
Sharp Attention Distribution
      ↑
Token Spatial Arrangement in Prompt
      ↑
┌─────────────┬──────────────┬───────────────┬──────────────┐
│ Decision    │ Attention    │ Cognitive     │ Schema       │
│ Trees       │ Locality     │ Offloading    │ Priming      │
├─────────────┼──────────────┼───────────────┼──────────────┤
│ Grounding   │ Token-Action │ Negative      │ XML          │
│ (Anchoring) │ Binding      │ Space         │ Boundaries   │
├─────────────┼──────────────┼───────────────┼──────────────┤
│ Few-Shot w/ │              │               │              │
│ Reasoning   │              │               │              │
└─────────────┴──────────────┴───────────────┴──────────────┘

All techniques manipulate the same thing:
how attention is distributed across tokens at generation time.
</code></pre></div></div>

<hr />

<h2 id="references">References</h2>

<ul>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices">Anthropic Prompting Best Practices</a></li>
  <li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic Context Engineering</a></li>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags">XML Tags Guide</a></li>
  <li><a href="https://code.claude.com/docs/en/skills">Claude Code Skills Docs</a></li>
  <li><a href="https://github.com/Piebald-AI/claude-code-system-prompts">Claude Code System Prompts (community)</a></li>
  <li><a href="https://github.com/travisvn/awesome-claude-skills">Awesome Claude Skills</a></li>
</ul>]]></content><author><name></name></author><category term="prompt-engineering" /><category term="patterns" /><summary type="html"><![CDATA[A practical guide to writing stable, predictable skill prompts — grounded in how LLMs actually process tokens.]]></summary></entry></feed>