<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://wenbo97.github.io/prompt-context-patterns/feed.xml" rel="self" type="application/atom+xml" /><link href="https://wenbo97.github.io/prompt-context-patterns/" rel="alternate" type="text/html" /><updated>2026-04-29T09:52:22+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/feed.xml</id><title type="html">Prompt, Context &amp;amp; Agent Orchestration</title><subtitle>Practical patterns for writing stable prompts, engineering context, and orchestrating AI agents.</subtitle><entry xml:lang="zh"><title type="html">142 AI Agent Prompt Patterns</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog-zh/" rel="alternate" type="text/html" title="142 AI Agent Prompt Patterns" /><published>2026-04-29T00:00:00+00:00</published><updated>2026-04-29T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog-zh</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog-zh/"><![CDATA[<p>142 prompt engineering patterns distilled from 500+ real-world AI agent plugins. Not theory: every pattern has a name, the problem it solves, and a ready-to-use prompt snippet.</p>

<hr />

<h2 id="为什么做这个">为什么做这个</h2>

<p>市面上的”awesome prompts”集合大多面向 ChatGPT 用户的一次性问答。<strong>这个目录面向构建 AI agent 和多步骤插件的开发者</strong> — prompt 稳定性、技能间协调、防御性模式才是关键。</p>

<p>这些模式来自对 500+ 个生产插件的分析，涵盖：DevOps 自动化、安全分析、代码迁移、事件响应、部署编排等。每个模式至少在 3 个独立插件中出现才被收录。</p>

<hr />

<h2 id="目录结构">目录结构</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>catalog/
├── catalog-index.md              ← 总索引（全部 142 个模式）
├── categories/                   ← 按功能分组
│   ├── patterns-structural-scaffolding.md
│   ├── patterns-input-output-contracts.md
│   ├── patterns-execution-control.md
│   ├── patterns-knowledge-and-context.md
│   ├── patterns-agent-orchestration.md
│   ├── patterns-safety-and-trust.md
│   ├── patterns-quality-and-feedback.md
│   └── ...（共 18 个文件）
├── techniques/                   ← 深度指南
│   ├── token-level-techniques.md    ← 基于熵理论的 9 个技巧
│   ├── anti-laziness.md             ← 防止 agent 偷懒的 8 种策略
│   ├── skill-architecture.md        ← 技能打包与组合
│   ├── branching-stability.md       ← 分支逻辑的可靠性
│   ├── reference-skip-playbook.md   ← 强制 agent 阅读引用文件
│   └── good-vs-bad-template.md      ← 好坏 prompt 对比
└── standards/                    ← 评审框架
    ├── quality-standards.md         ← P0/P1/P2 严重度分级
    └── review-checklist.md          ← 9 维度 prompt 评审
</code></pre></div></div>

<hr />

<h2 id="12-个模式分类">12 个模式分类</h2>

<table>
  <thead>
    <tr>
      <th>#</th>
      <th>分类</th>
      <th>模式数</th>
      <th>覆盖内容</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>结构脚手架</td>
      <td>15</td>
      <td>阶段门控、决策树、边界标签</td>
    </tr>
    <tr>
      <td>2</td>
      <td>输入/输出契约</td>
      <td>12</td>
      <td>Schema 约束、格式锁定、校验</td>
    </tr>
    <tr>
      <td>3</td>
      <td>执行控制</td>
      <td>14</td>
      <td>尝试上限、停止条件、重试逻辑</td>
    </tr>
    <tr>
      <td>4</td>
      <td>知识与上下文</td>
      <td>12</td>
      <td>SSOT 注册表、按需加载、缓存层</td>
    </tr>
    <tr>
      <td>5</td>
      <td>Agent 编排</td>
      <td>11</td>
      <td>子 agent 派发、并行执行、交接</td>
    </tr>
    <tr>
      <td>6</td>
      <td>安全与信任</td>
      <td>10</td>
      <td>护栏、禁止操作、升级门控</td>
    </tr>
    <tr>
      <td>7</td>
      <td>质量与反馈</td>
      <td>9</td>
      <td>自审查、证据门控、置信度评分</td>
    </tr>
    <tr>
      <td>8</td>
      <td>高级 I/O 与领域</td>
      <td>10</td>
      <td>领域路由、多模态、Schema 演进</td>
    </tr>
    <tr>
      <td>9</td>
      <td>高级编排</td>
      <td>8</td>
      <td>DAG 执行、共识、群体模式</td>
    </tr>
    <tr>
      <td>10</td>
      <td>高级质量</td>
      <td>7</td>
      <td>回归检测、漂移监控</td>
    </tr>
    <tr>
      <td>11</td>
      <td>高级安全</td>
      <td>8</td>
      <td>数据分类、审计追踪、合规</td>
    </tr>
    <tr>
      <td>12</td>
      <td>高级工作流</td>
      <td>10</td>
      <td>部署门控、回滚、状态机</td>
    </tr>
  </tbody>
</table>

<p>还有补充分类：Karpathy 行为模式、Claude Code 平台模式、开源技能模式、Gap-fill 模式。</p>

<hr />

<h2 id="举例模式-23--带上限的修复循环">举例：模式 23 — 带上限的修复循环</h2>

<p><strong>问题：</strong> AI agent 修复构建错误时可能无限循环，或者太早放弃。</p>

<p><strong>模式：</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## 停止条件（穷举）</span>

修复循环只在以下条件满足时停止：

| 条件 | 动作 |
|------|------|
| (a) 构建成功 | 返回成功 |
| (b) 尝试次数达到 N | 返回失败，附带剩余错误 |
| (c) 会话死亡 | 返回 session_dead |

没有其他理由可以停止。"错误太多"不行，
"超出范围"不行，"无法修复"也不行。
</code></pre></div></div>

<p><strong>为什么有效：</strong> 消除了 agent 合理化提前退出的倾向。穷举表格不留歧义 — agent 无法发明第 4 个停止条件。</p>
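
<p>As a concrete illustration only, here is a minimal harness sketch of the same contract. The <code class="language-plaintext highlighter-rouge">run_build</code>, <code class="language-plaintext highlighter-rouge">apply_fix</code>, and <code class="language-plaintext highlighter-rouge">session_alive</code> callables are hypothetical placeholders for your own build, repair, and session checks; the point is that the loop can exit through exactly the three tabled conditions and nothing else:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: attempt-capped repair loop. The three callables are placeholders
# supplied by the surrounding harness; run_build returns (ok, errors).
def repair_loop(run_build, apply_fix, session_alive, max_attempts=5):
    errors = []
    for attempt in range(1, max_attempts + 1):
        if not session_alive():                  # (c) session dies
            return {"status": "session_dead", "attempts": attempt}
        ok, errors = run_build()
        if ok:                                   # (a) build succeeds
            return {"status": "success", "attempts": attempt}
        apply_fix(errors)                        # not a stop condition: repair, retry
    # (b) attempt counter reached N
    return {"status": "failed", "remaining_errors": errors}
</code></pre></div></div>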

<hr />

<h2 id="举例模式-45--指令式写前审查">举例：模式 45 — 指令式写前审查</h2>

<p><strong>问题：</strong> Agent 写入错误的配置变更，破坏生产行为。</p>

<p><strong>模式：</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>每次编辑前，逐条检查护栏：

| # | 检查项 | 通过 | 失败 |
|---|--------|------|------|
| G1 | 抑制是否有范围？ | 在具体条目上 | 全局范围 |
| G2 | 覆写是否必要？ | 默认值不满足 | 默认值已经可以 |
| G3 | 是否创建了配套文件？ | 文件对存在 | 孤立条件 |

任何护栏返回"失败" → 不写入。先修正。
</code></pre></div></div>

<p><strong>为什么有效：</strong> 在”决定做什么”和”执行”之间强制暂停。表格格式让每项检查独立可评估 — agent 无法在连续叙述中跳过某一项。</p>
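
<p>The same gate is easy to express in harness code. A minimal sketch, assuming each guardrail is a predicate over the pending edit (the predicate bodies are illustrative, not taken from the catalog):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: pre-write guardrail gate. Each entry mirrors one table row;
# the edit is written only if every guardrail passes.
GUARDRAILS = {
    "G1_suppression_scoped": lambda edit: edit.get("scope") == "specific_item",
    "G2_override_needed":    lambda edit: not edit.get("default_sufficient", False),
    "G3_companion_created":  lambda edit: edit.get("companion_file") is not None,
}

def pre_write_review(edit):
    failures = [name for name, check in GUARDRAILS.items() if not check(edit)]
    return {"write": not failures, "revise": failures}  # any FAIL blocks the write
</code></pre></div></div>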

<hr />

<h2 id="技巧亮点">技巧亮点</h2>

<h3 id="token-级技巧9-个">Token 级技巧（9 个）</h3>

<p>基于 LLM 实际处理 token 的方式 — 不是直觉。例如：<strong>决策树在分支逻辑上优于自然语言</strong>，因为树结构把注意力集中在一条路径上，而自然语言同时分散注意力到所有条件。</p>

<h3 id="防偷懒策略8-种">防偷懒策略（8 种）</h3>

<p>Agent 会跳过引用文件的阅读、把多步流程压缩成捷径、”记住”而不是重新读取。防偷懒指南记录了 8 种系统性防御，从强制读取门控到渐进式披露。</p>

<h3 id="prompt-评审框架">Prompt 评审框架</h3>

<p>9 个维度（清晰度、确定性、安全性、可测试性……）的结构化评审流程，配合 P0/P1/P2 严重度分级。设计用于 agent prompt 的同行评审。</p>

<hr />

<h2 id="如何使用">如何使用</h2>

<ol>
  <li><strong>构建新技能？</strong> 浏览<a href="/prompt-context-patterns/catalog/">目录索引</a>找到匹配的模式</li>
  <li><strong>调试不稳定行为？</strong> 查看<a href="/prompt-context-patterns/catalog/categories/patterns-execution-control-zh">执行控制</a>和<a href="/prompt-context-patterns/catalog/techniques/anti-laziness-zh">防偷懒</a></li>
  <li><strong>评审别人的 prompt？</strong> 使用<a href="/prompt-context-patterns/catalog/standards/review-checklist-zh">评审清单</a></li>
  <li><strong>学习 prompt 工程？</strong> 从 <a href="/prompt-context-patterns/catalog/techniques/token-level-techniques-zh">Token 级技巧</a>开始</li>
</ol>

<hr />

<h2 id="许可">许可</h2>

<p>MIT。可自由用于你的 agent、插件和项目。</p>]]></content><author><name></name></author><category term="patterns" /><category term="catalog" /><summary type="html"><![CDATA[从 500+ 个真实 AI agent 插件中提炼的 142 个提示词工程模式。不是理论，每个模式都有名字、解决的问题、和可直接使用的 prompt 片段。 为什么做这个 市面上的”awesome prompts”集合大多面向 ChatGPT 用户的一次性问答。这个目录面向构建 AI agent 和多步骤插件的开发者 — prompt 稳定性、技能间协调、防御性模式才是关键。 这些模式来自对 500+ 个生产插件的分析，涵盖：DevOps 自动化、安全分析、代码迁移、事件响应、部署编排等。每个模式至少在 3 个独立插件中出现才被收录。 目录结构 catalog/ ├── catalog-index.md ← 总索引（全部 142 个模式） ├── categories/ ← 按功能分组 │ ├── patterns-structural-scaffolding.md │ ├── patterns-input-output-contracts.md │ ├── patterns-execution-control.md │ ├── patterns-knowledge-and-context.md │ ├── patterns-agent-orchestration.md │ ├── patterns-safety-and-trust.md │ ├── patterns-quality-and-feedback.md │ └── ...（共 18 个文件） ├── techniques/ ← 深度指南 │ ├── token-level-techniques.md ← 基于熵理论的 9 个技巧 │ ├── anti-laziness.md ← 防止 agent 偷懒的 8 种策略 │ ├── skill-architecture.md ← 技能打包与组合 │ ├── branching-stability.md ← 分支逻辑的可靠性 │ ├── reference-skip-playbook.md ← 强制 agent 阅读引用文件 │ └── good-vs-bad-template.md ← 好坏 prompt 对比 └── standards/ ← 评审框架 ├── quality-standards.md ← P0/P1/P2 严重度分级 └── review-checklist.md ← 9 维度 prompt 评审 12 个模式分类 # 分类 模式数 覆盖内容 1 结构脚手架 15 阶段门控、决策树、边界标签 2 输入/输出契约 12 Schema 约束、格式锁定、校验 3 执行控制 14 尝试上限、停止条件、重试逻辑 4 知识与上下文 12 SSOT 注册表、按需加载、缓存层 5 Agent 编排 11 子 agent 派发、并行执行、交接 6 安全与信任 10 护栏、禁止操作、升级门控 7 质量与反馈 9 自审查、证据门控、置信度评分 8 高级 I/O 与领域 10 领域路由、多模态、Schema 演进 9 高级编排 8 DAG 执行、共识、群体模式 10 高级质量 7 回归检测、漂移监控 11 高级安全 8 数据分类、审计追踪、合规 12 高级工作流 10 部署门控、回滚、状态机 还有补充分类：Karpathy 行为模式、Claude Code 平台模式、开源技能模式、Gap-fill 模式。 举例：模式 23 — 带上限的修复循环 问题： AI agent 修复构建错误时可能无限循环，或者太早放弃。 模式： ## 停止条件（穷举） 修复循环只在以下条件满足时停止： | 条件 | 动作 | |------|------| | (a) 构建成功 | 返回成功 | | (b) 尝试次数达到 N | 返回失败，附带剩余错误 | | (c) 会话死亡 | 返回 session_dead | 没有其他理由可以停止。"错误太多"不行， "超出范围"不行，"无法修复"也不行。 为什么有效： 消除了 agent 合理化提前退出的倾向。穷举表格不留歧义 — agent 无法发明第 4 个停止条件。 举例：模式 45 — 指令式写前审查 问题： Agent 写入错误的配置变更，破坏生产行为。 模式： 每次编辑前，逐条检查护栏： | # | 检查项 | 通过 | 失败 | |---|--------|------|------| | G1 | 抑制是否有范围？ | 在具体条目上 | 全局范围 | | G2 | 覆写是否必要？ | 默认值不满足 | 默认值已经可以 | | G3 | 是否创建了配套文件？ | 文件对存在 | 孤立条件 | 任何护栏返回"失败" → 不写入。先修正。 为什么有效： 在”决定做什么”和”执行”之间强制暂停。表格格式让每项检查独立可评估 — agent 无法在连续叙述中跳过某一项。 技巧亮点 Token 级技巧（9 个） 基于 LLM 实际处理 token 的方式 — 不是直觉。例如：决策树在分支逻辑上优于自然语言，因为树结构把注意力集中在一条路径上，而自然语言同时分散注意力到所有条件。 防偷懒策略（8 种） Agent 会跳过引用文件的阅读、把多步流程压缩成捷径、”记住”而不是重新读取。防偷懒指南记录了 8 种系统性防御，从强制读取门控到渐进式披露。 Prompt 评审框架 9 个维度（清晰度、确定性、安全性、可测试性……）的结构化评审流程，配合 P0/P1/P2 严重度分级。设计用于 agent prompt 的同行评审。 如何使用 构建新技能？ 浏览目录索引找到匹配的模式 调试不稳定行为？ 查看执行控制和防偷懒 评审别人的 prompt？ 使用评审清单 学习 prompt 工程？ 从 Token 级技巧开始 许可 MIT。可自由用于你的 agent、插件和项目。]]></summary></entry><entry><title type="html">142 Prompt Patterns for AI Agent Development</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog/" rel="alternate" type="text/html" title="142 Prompt Patterns for AI Agent Development" /><published>2026-04-29T00:00:00+00:00</published><updated>2026-04-29T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/29/prompt-pattern-catalog/"><![CDATA[<p>A categorized catalog of 142 prompt engineering patterns — extracted from 500+ real-world AI agent plugins. Not theory. Every pattern has a name, a problem it solves, and a concrete prompt snippet.</p>

<hr />

<h2 id="why-this-exists">Why This Exists</h2>

<p>Most “awesome prompt” collections target ChatGPT users writing one-off queries. <strong>This catalog targets developers building AI agents and multi-step plugins</strong> — where prompt stability, inter-skill coordination, and defensive patterns matter.</p>

<p>The patterns were extracted by analyzing 500+ production plugins across categories: DevOps automation, security analysis, code migration, incident response, deployment orchestration, and more. Each pattern appeared in at least 3 independent plugins before inclusion.</p>

<hr />

<h2 id="catalog-structure">Catalog Structure</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>catalog/
├── catalog-index.md              ← Master index (all 142 patterns)
├── categories/                   ← Patterns grouped by function
│   ├── patterns-structural-scaffolding.md
│   ├── patterns-input-output-contracts.md
│   ├── patterns-execution-control.md
│   ├── patterns-knowledge-and-context.md
│   ├── patterns-agent-orchestration.md
│   ├── patterns-safety-and-trust.md
│   ├── patterns-quality-and-feedback.md
│   └── ... (18 files total)
├── techniques/                   ← Deep-dive guides
│   ├── token-level-techniques.md    ← 9 techniques grounded in entropy theory
│   ├── anti-laziness.md             ← 8 strategies to prevent agent shortcutting
│   ├── skill-architecture.md        ← Skill packaging and composition
│   ├── branching-stability.md       ← Branch logic reliability
│   ├── reference-skip-playbook.md   ← Force agents to read references
│   └── good-vs-bad-template.md      ← Side-by-side prompt comparison
└── standards/                    ← Review frameworks
    ├── quality-standards.md         ← P0/P1/P2 severity grading
    └── review-checklist.md          ← 9-dimension prompt review
</code></pre></div></div>

<hr />

<h2 id="the-12-pattern-categories">The 12 Pattern Categories</h2>

<table>
  <thead>
    <tr>
      <th>#</th>
      <th>Category</th>
      <th>Patterns</th>
      <th>What It Covers</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>Structural Scaffolding</td>
      <td>15</td>
      <td>Phase gates, decision trees, boundary tags</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Input/Output Contracts</td>
      <td>12</td>
      <td>Schema enforcement, format locks, validation</td>
    </tr>
    <tr>
      <td>3</td>
      <td>Execution Control</td>
      <td>14</td>
      <td>Attempt limits, stop conditions, retry logic</td>
    </tr>
    <tr>
      <td>4</td>
      <td>Knowledge &amp; Context</td>
      <td>12</td>
      <td>SSOT registries, on-demand loading, cache layers</td>
    </tr>
    <tr>
      <td>5</td>
      <td>Agent Orchestration</td>
      <td>11</td>
      <td>Sub-agent dispatch, parallel execution, handoffs</td>
    </tr>
    <tr>
      <td>6</td>
      <td>Safety &amp; Trust</td>
      <td>10</td>
      <td>Guardrails, prohibited actions, escalation gates</td>
    </tr>
    <tr>
      <td>7</td>
      <td>Quality &amp; Feedback</td>
      <td>9</td>
      <td>Self-review, evidence gates, confidence scoring</td>
    </tr>
    <tr>
      <td>8</td>
      <td>Advanced I/O &amp; Domain</td>
      <td>10</td>
      <td>Domain routing, multi-modal, schema evolution</td>
    </tr>
    <tr>
      <td>9</td>
      <td>Advanced Orchestration</td>
      <td>8</td>
      <td>DAG execution, consensus, swarm patterns</td>
    </tr>
    <tr>
      <td>10</td>
      <td>Advanced Quality</td>
      <td>7</td>
      <td>Regression detection, drift monitoring</td>
    </tr>
    <tr>
      <td>11</td>
      <td>Advanced Safety</td>
      <td>8</td>
      <td>Data classification, audit trails, compliance</td>
    </tr>
    <tr>
      <td>12</td>
      <td>Advanced Workflow</td>
      <td>10</td>
      <td>Deployment gates, rollback, state machines</td>
    </tr>
  </tbody>
</table>

<p>Plus supplementary categories: Karpathy behavioral patterns, Claude Code platform patterns, open-source skill patterns, and gap-fill patterns.</p>

<hr />

<h2 id="example-pattern-23--attempt-capped-repair-loop">Example: Pattern 23 — Attempt-Capped Repair Loop</h2>

<p><strong>Problem:</strong> An AI agent fixing build errors might loop forever or give up too early.</p>

<p><strong>Pattern:</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## Stop Conditions (Exhaustive)</span>

The repair loop stops ONLY when ONE of these is met:

| Condition | Action |
|-----------|--------|
| (a) Build succeeds | Return success |
| (b) Attempt counter reaches N | Return failed with remaining errors |
| (c) Session dies | Return session_dead |

No other condition justifies stopping. Not "too many errors",
not "beyond scope", not "unfixable."
</code></pre></div></div>

<p><strong>Why it works:</strong> Eliminates the agent’s natural tendency to rationalize early exit. The exhaustive table leaves no ambiguity — the agent cannot invent a 4th stop condition.</p>
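
<p>To make the contract concrete, here is a minimal harness sketch of the loop, assuming the build, repair, and session checks are injected as callables (all three names are illustrative placeholders, not part of the pattern itself):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: attempt-capped repair loop. The three callables are placeholders
# supplied by the surrounding harness; run_build returns (ok, errors).
def repair_loop(run_build, apply_fix, session_alive, max_attempts=5):
    errors = []
    for attempt in range(1, max_attempts + 1):
        if not session_alive():                  # (c) session dies
            return {"status": "session_dead", "attempts": attempt}
        ok, errors = run_build()
        if ok:                                   # (a) build succeeds
            return {"status": "success", "attempts": attempt}
        apply_fix(errors)                        # not a stop condition: repair, retry
    # (b) attempt counter reached N
    return {"status": "failed", "remaining_errors": errors}
</code></pre></div></div>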

<hr />

<h2 id="example-pattern-45--directive-based-pre-write-review">Example: Pattern 45 — Directive-Based Pre-Write Review</h2>

<p><strong>Problem:</strong> Agent writes incorrect config changes that break production behavior.</p>

<p><strong>Pattern:</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Before EVERY edit, evaluate each guardrail:

| # | Check | PASS | FAIL |
|---|-------|------|------|
| G1 | Is suppression scoped? | On specific item | Blanket scope |
| G2 | Is override needed? | Default insufficient | Default works fine |
| G3 | Is companion created? | Paired files exist | Orphaned condition |

If ANY guardrail returns FAIL → do NOT write. Revise first.
</code></pre></div></div>

<p><strong>Why it works:</strong> Forces a mandatory pause between “decide what to do” and “do it.” The table format means each check is independently evaluable — the agent can’t skip one by flowing past it in prose.</p>
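
<p>In harness code the gate reduces to a handful of predicates over the pending edit. A minimal sketch (the predicate bodies are illustrative, not taken from the catalog):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: pre-write guardrail gate. Each entry mirrors one table row;
# the edit is written only if every guardrail passes.
GUARDRAILS = {
    "G1_suppression_scoped": lambda edit: edit.get("scope") == "specific_item",
    "G2_override_needed":    lambda edit: not edit.get("default_sufficient", False),
    "G3_companion_created":  lambda edit: edit.get("companion_file") is not None,
}

def pre_write_review(edit):
    failures = [name for name, check in GUARDRAILS.items() if not check(edit)]
    return {"write": not failures, "revise": failures}  # any FAIL blocks the write
</code></pre></div></div>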

<hr />

<h2 id="techniques-highlights">Techniques Highlights</h2>

<h3 id="token-level-techniques-9-techniques">Token-Level Techniques (9 techniques)</h3>

<p>Grounded in how LLMs actually process tokens — not intuition. Example: <strong>Decision trees beat prose for branching logic</strong> because tree structure concentrates attention on one path, while prose spreads attention across all conditions simultaneously.</p>

<h3 id="anti-laziness-strategies-8-strategies">Anti-Laziness Strategies (8 strategies)</h3>

<p>Agents skip reference reads, collapse multi-step procedures into shortcuts, and “remember” instead of re-reading. The anti-laziness guide documents 8 systematic defenses, from mandatory read gates to progressive disclosure.</p>

<h3 id="prompt-review-framework">Prompt Review Framework</h3>

<p>A structured review process with 9 dimensions (clarity, determinism, safety, testability…) and P0/P1/P2 severity grading. Designed for peer review of agent prompts — not just self-review.</p>

<hr />

<h2 id="how-to-use">How to Use</h2>

<ol>
  <li><strong>Building a new skill?</strong> Scan the <a href="/prompt-context-patterns/catalog/">catalog index</a> for patterns that match your problem</li>
  <li><strong>Debugging unstable behavior?</strong> Check <a href="/prompt-context-patterns/catalog/categories/patterns-execution-control">execution control</a> and <a href="/prompt-context-patterns/catalog/techniques/anti-laziness">anti-laziness</a></li>
  <li><strong>Reviewing someone’s prompt?</strong> Use the <a href="/prompt-context-patterns/catalog/standards/review-checklist">review checklist</a></li>
  <li><strong>Learning prompt engineering?</strong> Start with <a href="/prompt-context-patterns/catalog/techniques/token-level-techniques">token-level techniques</a></li>
</ol>

<hr />

<h2 id="license">License</h2>

<p>MIT. Use these patterns in your own agents, plugins, and projects.</p>]]></content><author><name></name></author><category term="patterns" /><category term="catalog" /></entry><entry xml:lang="zh"><title type="html">Pattern: Decision Trees Beat Prose for Branching Logic in AI Prompts</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern-zh/" rel="alternate" type="text/html" title="Pattern: Decision Trees Beat Prose for Branching Logic in AI Prompts" /><published>2026-04-19T00:00:00+00:00</published><updated>2026-04-19T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern-zh</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern-zh/"><![CDATA[<p>Telling an AI what to do in natural-language paragraphs works most of the time. But when the logic branches (“if X then Y, otherwise Z”), prose prompts produce <strong>inconsistent outputs across runs</strong>, even when the input is identical.</p>

<p><strong>Decision trees fix this.</strong> Replace prose branching logic with a visual tree structure, and the model follows it deterministically.</p>

<hr />

<h2 id="问题">问题</h2>

<p>考虑一个事故响应系统：AI 需要判断严重等级、分配响应人员、选择缓解措施、生成结构化的响应计划。规则很复杂：4 个严重等级、6 种根因类别、基于层级的人员分配、升级条件、通信截止时间。</p>

<p>我们为同一个任务写了两种提示词 — 一种是自然语言段落（约 140 行），一种是决策树（约 260 行）— 用 <code class="language-plaintext highlighter-rouge">claude -p</code> 各跑了 <strong>20 次</strong>，输入完全相同。</p>

<h2 id="结果各-20-次运行">结果（各 20 次运行）</h2>

<h3 id="主要缓解措施">主要缓解措施</h3>

<table>
  <thead>
    <tr>
      <th>自然语言</th>
      <th>决策树</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>20 次中有 <strong>15 种不同写法</strong></td>
      <td>20 次中只有 <strong>1 种写法</strong></td>
    </tr>
  </tbody>
</table>

<p>自然语言输出 — 每次都不同：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>第 1 次:  "Enable graceful degradation on Cosmos DB (cached responses, reduced functionality) while investigating root cause"
第 2 次:  "Enable graceful degradation on Cosmos DB (cached responses, reduced functionality for non-critical read paths)"
第 3 次:  "Enable graceful degradation mode on Cosmos DB East US 2"
第 4 次:  "Enable graceful degradation on Cosmos DB East US 2 to serve cached/reduced-functionality responses while..."
...共 15 种不同表达
</code></pre></div></div>

<p>决策树输出 — 每次相同：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>第 1-20 次:  "enable_graceful_degradation"
</code></pre></div></div>

<h3 id="次要措施">次要措施</h3>

<table>
  <thead>
    <tr>
      <th>自然语言</th>
      <th>决策树</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>20 种不同写法</strong>（零重复）</td>
      <td><strong>2 种</strong>（19 次 <code class="language-plaintext highlighter-rouge">hotfix</code>，1 次 <code class="language-plaintext highlighter-rouge">failover_to_secondary</code>）</td>
    </tr>
  </tbody>
</table>

<p>自然语言的次要措施达到了 <strong>0% 的可复现性</strong> — 每一次运行都产生独特的句子。</p>

<h3 id="完整对比">完整对比</h3>

<table>
  <thead>
    <tr>
      <th>维度</th>
      <th>自然语言（20 次）</th>
      <th>决策树（20 次）</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>严重等级</td>
      <td>20/20 SEV2</td>
      <td>20/20 SEV2</td>
    </tr>
    <tr>
      <td>主要措施（唯一值数量）</td>
      <td><strong>15</strong></td>
      <td><strong>1</strong></td>
    </tr>
    <tr>
      <td>次要措施（唯一值数量）</td>
      <td><strong>20</strong></td>
      <td><strong>2</strong></td>
    </tr>
    <tr>
      <td>响应人员角色组合</td>
      <td>2 种（大小写不一致：<code class="language-plaintext highlighter-rouge">on-call</code> vs <code class="language-plaintext highlighter-rouge">On-Call</code>）</td>
      <td>1 种（一致）</td>
    </tr>
    <tr>
      <td>“避免操作”数量</td>
      <td>不一致（16 次 2 项，4 次 3 项）</td>
      <td>一致（20 次都是 2 项）</td>
    </tr>
    <tr>
      <td>升级触发条件</td>
      <td>3 条，一致</td>
      <td>3 条，一致</td>
    </tr>
  </tbody>
</table>

<h2 id="这意味着什么">这意味着什么</h2>

<p>两种方式都做出了<strong>正确的决策</strong> — SEV2、启用优雅降级、分配 4 名响应人员。差异在于<strong>输出的确定性</strong>。</p>

<p>如果下游代码这样写：</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">response</span><span class="p">[</span><span class="s">"mitigation_plan"</span><span class="p">][</span><span class="s">"primary_action"</span><span class="p">][</span><span class="s">"action"</span><span class="p">]</span> <span class="o">==</span> <span class="s">"enable_graceful_degradation"</span><span class="p">:</span>
    <span class="n">execute_graceful_degradation</span><span class="p">()</span>
</code></pre></div></div>

<ul>
  <li><strong>Decision tree prompt</strong>: matches 20/20 times</li>
  <li><strong>Prose prompt</strong>: matches <strong>0/20 times</strong> (the action is a free-form sentence that never matches exactly)</li>
</ul>

<p>This is not a theoretical problem. Any system that parses AI output, whether an automation pipeline, agent orchestration, or tool calling, breaks when the output format is unpredictable.</p>
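
<p>Seen from the parsing side, downstream dispatch is typically an exact-match lookup over the tree’s leaf strings, which is why a free-form sentence can never work. A minimal sketch (the handler bodies are illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: exact-match dispatch over the leaf vocabulary. The dict keys are the
# only strings the prompt is allowed to emit; free-form prose never hits a key.
HANDLERS = {
    "rollback":                    lambda: print("rolling back"),
    "disable_feature_flag":        lambda: print("disabling feature flag"),
    "hotfix":                      lambda: print("applying hotfix"),
    "enable_graceful_degradation": lambda: print("enabling graceful degradation"),
}

response = {"mitigation_plan": {"primary_action": {"action": "enable_graceful_degradation"}}}
action = response["mitigation_plan"]["primary_action"]["action"]
HANDLERS[action]()  # raises KeyError for any phrasing outside the vocabulary
</code></pre></div></div>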

<h2 id="模式">模式</h2>

<p>核心思路：把这种写法：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>如果根因是部署问题且服务支持即时回滚，优先选择回滚，因为它能以最小风险恢复到最后已知良好状态。
如果根因是部署问题但回滚不可用，则考虑关闭功能标志（如果变更在功能标志后面）。
如果没有功能标志，则进行热修复。
</code></pre></div></div>

<p>替换成这种：</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>根因类别？
├─ 部署问题
│   └─ 服务支持即时回滚？
│       ├─ 是 → action: "rollback"
│       └─ 否
│           └─ 变更在功能标志后面？
│               ├─ 是 → action: "disable_feature_flag"
│               └─ 否 → action: "hotfix"
</code></pre></div></div>

<p>同样的逻辑。树版本在每个叶子节点给出了<strong>精确的输出字符串</strong>，缩进编码了决策路径。</p>

<h2 id="为什么有效">为什么有效</h2>

<ol>
  <li><strong>逐步收窄注意力。</strong> 模型每次只评估一个条件，而不是同时持有所有规则。</li>
  <li><strong>提供精确的输出文本。</strong> 叶子节点包含字面量 action 字符串 — 模型直接复制，而不是自己编写新句子。</li>
  <li><strong>用空间编码层级关系。</strong> 缩进 token（空白字符）编码了父子关系。LLM 在训练中从数百万份代码文件、YAML 配置、目录树中学到了这种模式。</li>
</ol>

<h2 id="什么时候用--什么时候不用">什么时候用 / 什么时候不用</h2>

<p><strong>用决策树：</strong></p>
<ul>
  <li>提示词有分支逻辑（if/else、switch/case）</li>
  <li>输出要被代码解析（JSON 字段、action 名称、状态值）</li>
  <li>需要多次运行的一致性</li>
  <li>构建 agent 编排或自动化流水线</li>
</ul>

<p><strong>不需要决策树：</strong></p>
<ul>
  <li>开放式创意任务（变化是好事）</li>
  <li>没有分支逻辑</li>
  <li>输出只给人类阅读（措辞变化无所谓）</li>
</ul>

<h2 id="自己试试">自己试试</h2>

<p>克隆仓库然后运行：</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd eval</span>/decision-tree-ab
bash run.sh 20           <span class="c"># 20 次运行，默认模型</span>
bash run.sh 10 haiku     <span class="c"># 10 次运行，haiku 模型</span>
python3 analyze.py       <span class="c"># 分析结果</span>
</code></pre></div></div>

<p>包含三个场景：简单（部署控制器）、歧义（未知服务器状态）、复杂（200+ 行的事故响应提示词）。</p>
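
<p>If you only want the headline uniqueness counts, the measurement is small enough to re-derive by hand. A sketch, assuming one JSON output per run under <code class="language-plaintext highlighter-rouge">runs/</code> (an assumed layout, not necessarily what analyze.py uses):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: count unique values of one output field across runs.
# Assumes runs/*.json, one model output per file (the layout is an assumption).
import json
from collections import Counter
from pathlib import Path

actions = Counter(
    json.loads(p.read_text())["mitigation_plan"]["primary_action"]["action"]
    for p in sorted(Path("runs").glob("*.json"))
)
print(len(actions), "unique primary actions across", sum(actions.values()), "runs")
print(actions.most_common())
</code></pre></div></div>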

<h2 id="延伸阅读">延伸阅读</h2>

<ul>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices">Anthropic: Prompt Engineering Best Practices</a></li>
  <li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic: Effective Context Engineering</a></li>
</ul>]]></content><author><name></name></author><category term="patterns" /><category term="decision-tree" /></entry><entry><title type="html">Pattern: Decision Trees Beat Prose in AI Prompts</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern/" rel="alternate" type="text/html" title="Pattern: Decision Trees Beat Prose in AI Prompts" /><published>2026-04-19T00:00:00+00:00</published><updated>2026-04-19T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/decision-tree-pattern/"><![CDATA[<p>When you tell an AI what to do using natural language paragraphs, it works — most of the time. But when the logic has branches (“if X then Y, otherwise Z”), prose prompts produce <strong>inconsistent outputs across runs</strong> — even when the input is identical.</p>

<p><strong>Decision trees fix this.</strong> Replace prose branching logic with a visual tree structure, and the model follows it deterministically.</p>

<hr />

<h2 id="the-problem">The Problem</h2>

<p>Consider an incident response system. The AI must triage incidents, assign responders, choose mitigation actions, and produce a structured plan. The rules are complex: 4 severity levels, 6 root cause categories, tier-based responder assignment, escalation conditions, communication deadlines.</p>

<p>We wrote two prompts for the same task — one as prose paragraphs (~140 lines), one as decision trees (~260 lines) — and ran each <strong>20 times</strong> with identical input using <code class="language-plaintext highlighter-rouge">claude -p</code>.</p>

<h2 id="results-20-runs-each">Results (20 Runs Each)</h2>

<h3 id="primary-mitigation-action">Primary Mitigation Action</h3>

<table>
  <thead>
    <tr>
      <th>Prose</th>
      <th>Decision Tree</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>15 unique phrasings</strong> out of 20 runs</td>
      <td><strong>1 phrasing</strong> out of 20 runs</td>
    </tr>
  </tbody>
</table>

<p>Prose outputs looked like this — every run different:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Run 1:  "Enable graceful degradation on Cosmos DB (cached responses, reduced functionality) while investigating root cause"
Run 2:  "Enable graceful degradation on Cosmos DB (cached responses, reduced functionality for non-critical read paths)"
Run 3:  "Enable graceful degradation mode on Cosmos DB East US 2"
Run 4:  "Enable graceful degradation on Cosmos DB East US 2 to serve cached/reduced-functionality responses while..."
...15 unique variants total
</code></pre></div></div>

<p>Decision tree — every run identical:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Run 1-20:  "enable_graceful_degradation"
</code></pre></div></div>

<h3 id="secondary-action">Secondary Action</h3>

<table>
  <thead>
    <tr>
      <th>Prose</th>
      <th>Decision Tree</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>20 unique phrasings</strong> (zero repeats)</td>
      <td><strong>2 variants</strong> (19× <code class="language-plaintext highlighter-rouge">hotfix</code>, 1× <code class="language-plaintext highlighter-rouge">failover_to_secondary</code>)</td>
    </tr>
  </tbody>
</table>

<p>Prose achieved <strong>0% reproducibility</strong> on secondary actions. Every single run produced a unique sentence.</p>

<h3 id="full-comparison-table">Full Comparison Table</h3>

<table>
  <thead>
    <tr>
      <th>Dimension</th>
      <th>Prose (20 runs)</th>
      <th>Decision Tree (20 runs)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Severity classification</td>
      <td>20/20 SEV2</td>
      <td>20/20 SEV2</td>
    </tr>
    <tr>
      <td>Primary action (unique values)</td>
      <td><strong>15</strong></td>
      <td><strong>1</strong></td>
    </tr>
    <tr>
      <td>Secondary action (unique values)</td>
      <td><strong>20</strong></td>
      <td><strong>2</strong></td>
    </tr>
    <tr>
      <td>Responder role sets</td>
      <td>2 variants (capitalization: <code class="language-plaintext highlighter-rouge">on-call</code> vs <code class="language-plaintext highlighter-rouge">On-Call</code>)</td>
      <td>1 (consistent)</td>
    </tr>
    <tr>
      <td>Actions-to-avoid count</td>
      <td>inconsistent (16× two items, 4× three items)</td>
      <td>consistent (20× two items)</td>
    </tr>
    <tr>
      <td>Escalation triggers</td>
      <td>3/3 consistent</td>
      <td>3/3 consistent</td>
    </tr>
  </tbody>
</table>

<h2 id="what-this-means-in-practice">What This Means in Practice</h2>

<p>Both approaches made the <strong>same correct decisions</strong> — SEV2, enable graceful degradation, assign 4 responders. The difference is <strong>output determinism</strong>.</p>

<p>If your downstream code does:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">response</span><span class="p">[</span><span class="s">"mitigation_plan"</span><span class="p">][</span><span class="s">"primary_action"</span><span class="p">][</span><span class="s">"action"</span><span class="p">]</span> <span class="o">==</span> <span class="s">"enable_graceful_degradation"</span><span class="p">:</span>
    <span class="n">execute_graceful_degradation</span><span class="p">()</span>
</code></pre></div></div>

<ul>
  <li><strong>Decision tree prompt</strong>: works 20/20 times</li>
  <li><strong>Prose prompt</strong>: works <strong>0/20 times</strong> (action is a free-form sentence, never matches)</li>
</ul>

<p>This isn’t a theoretical problem. Any system that parses AI output — automation pipelines, agent orchestration, tool calling — breaks when the output format is unpredictable.</p>
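
<p>From the parser’s point of view, dispatch is usually an exact-match lookup over the tree’s leaf strings, which is why a free-form sentence can never work. A minimal sketch (the handler bodies are illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: exact-match dispatch over the leaf vocabulary. The dict keys are the
# only strings the prompt is allowed to emit; free-form prose never hits a key.
HANDLERS = {
    "rollback":                    lambda: print("rolling back"),
    "disable_feature_flag":        lambda: print("disabling feature flag"),
    "hotfix":                      lambda: print("applying hotfix"),
    "enable_graceful_degradation": lambda: print("enabling graceful degradation"),
}

response = {"mitigation_plan": {"primary_action": {"action": "enable_graceful_degradation"}}}
action = response["mitigation_plan"]["primary_action"]["action"]
HANDLERS[action]()  # raises KeyError for any phrasing outside the vocabulary
</code></pre></div></div>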

<h2 id="the-pattern">The Pattern</h2>

<p>Here’s the core idea. Replace this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>If the root cause is a bad deployment and the service supports instant 
rollback, always prefer rollback over other mitigations because it restores 
the last known good state with minimal risk. If the root cause is a bad 
deployment but rollback is not available, then consider feature flag 
disablement if the change is behind a feature flag. If no feature flag 
exists, proceed with hotfix.
</code></pre></div></div>

<p>With this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Root cause category?
├─ Bad deployment
│   └─ Service supports instant rollback?
│       ├─ YES → action: "rollback"
│       └─ NO
│           └─ Change behind feature flag?
│               ├─ YES → action: "disable_feature_flag"
│               └─ NO  → action: "hotfix"
</code></pre></div></div>

<p>Same logic. The tree version gives the model an <strong>exact string to output</strong> at each leaf node, and the indentation encodes the decision path spatially.</p>

<h2 id="why-it-works">Why It Works</h2>

<ol>
  <li><strong>Narrows attention at each step.</strong> The model evaluates one condition at a time instead of holding all rules in attention simultaneously.</li>
  <li><strong>Provides exact output text.</strong> Leaf nodes contain the literal action string — the model copies it rather than composing a new sentence.</li>
  <li><strong>Encodes hierarchy spatially.</strong> Indentation tokens (whitespace) encode parent-child relationships. LLMs learned this pattern from millions of code files, YAML configs, and directory trees during training.</li>
</ol>

<h2 id="when-to-use--when-not-to">When to Use / When Not To</h2>

<p><strong>Use decision trees when:</strong></p>
<ul>
  <li>Your prompt has branching logic (if/else, switch/case)</li>
  <li>Output is parsed by code (JSON fields, action names, status values)</li>
  <li>You need consistency across multiple runs</li>
  <li>You’re building agent orchestration or automation pipelines</li>
</ul>

<p><strong>Skip decision trees when:</strong></p>
<ul>
  <li>The task is open-ended creative work (variation is desirable)</li>
  <li>There’s no branching logic</li>
  <li>Output is read by humans only (phrasing variation doesn’t matter)</li>
</ul>

<h2 id="try-it-yourself">Try It Yourself</h2>

<p>Clone the repo and run:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd eval</span>/decision-tree-ab
bash run.sh 20           <span class="c"># 20 runs, default model</span>
bash run.sh 10 haiku     <span class="c"># 10 runs, haiku model</span>
python3 analyze.py       <span class="c"># analyze results</span>
</code></pre></div></div>

<p>Three scenarios are included: simple (deployment controller), ambiguous (unknown server status), and complex (incident response with 200+ line prompts).</p>
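
<p>The headline uniqueness counts are also easy to re-derive by hand. A sketch, assuming one JSON output per run under <code class="language-plaintext highlighter-rouge">runs/</code> (an assumed layout, not necessarily analyze.py’s):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: count unique values of one output field across runs.
# Assumes runs/*.json, one model output per file (the layout is an assumption).
import json
from collections import Counter
from pathlib import Path

actions = Counter(
    json.loads(p.read_text())["mitigation_plan"]["primary_action"]["action"]
    for p in sorted(Path("runs").glob("*.json"))
)
print(len(actions), "unique primary actions across", sum(actions.values()), "runs")
print(actions.most_common())
</code></pre></div></div>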

<h2 id="further-reading">Further Reading</h2>

<ul>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices">Anthropic: Prompt Engineering Best Practices</a></li>
  <li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic: Effective Context Engineering</a></li>
</ul>]]></content><author><name></name></author><category term="patterns" /><category term="decision-tree" /></entry><entry xml:lang="zh"><title type="html">Prompt Engineering Patterns: Writing Stable, Reliable AI Prompts</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns-zh/" rel="alternate" type="text/html" title="Prompt Engineering Patterns: Writing Stable, Reliable AI Prompts" /><published>2026-04-19T00:00:00+00:00</published><updated>2026-04-19T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns-zh</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns-zh/"><![CDATA[<p>A practical guide to writing stable, predictable AI prompts.</p>

<hr />

<h2 id="核心原则降低不确定性">核心原则：降低不确定性</h2>

<p>AI 模型每生成一个词，都在很多候选词中做选择。<strong>你的提示词结构直接决定了模型在关键节点是”犹豫”还是”确定”。</strong></p>

<ul>
  <li>确定性高 → 行为稳定</li>
  <li>确定性低 → 行为飘忽</li>
</ul>

<p><strong>下面所有技巧都在做同一件事：让模型在关键决策点更确定。</strong></p>

<h3 id="简单例子">简单例子</h3>

<p>你的提示词需要模型根据环境决定是否询问用户。</p>

<p><strong>自然语言版本：</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>如果在 CI 环境中且没有参数，使用默认值。
如果在交互模式中有参数，直接运行。
如果在交互模式中没有参数，询问用户。
</code></pre></div></div>

<p>模型读完后，3 条规则同时争夺注意力。它需要回头扫描才能拼出答案。</p>

<p><strong>Decision-tree version:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## CI environment?
├─ YES
│  └─ Arguments provided?
│     ├─ YES → use arguments, execute
│     └─ NO  → use defaults, execute
└─ NO (interactive mode)
   └─ Arguments provided?
      ├─ YES → use arguments, execute
      └─ NO  → ask the user
</code></pre></div></div>

<p>Once the model has determined “CI = YES” and “arguments = NO”, its attention lands on this one line:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>│     └─ NO  → use defaults, execute
              ↑ attention concentrates here
</code></pre></div></div>

<p>The answer is right there; no backtracking needed. Certainty is high.</p>

<h3 id="一句话总结">一句话总结</h3>

<blockquote>
  <p>决策树让模型看几个附近的词就知道该做什么。自然语言让它扫一整段才能拼出答案。搜索范围越小，结果越确定。</p>
</blockquote>

<hr />

<h2 id="模式-1决策树替代自然语言">模式 1：决策树替代自然语言</h2>

<p>对有分支逻辑的指令，用可视化树结构替代文字描述。</p>

<h3 id="不好自然语言">不好：自然语言</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>如果在 CI 环境中且没有参数，使用默认值。如果在交互模式中有参数，
直接运行。如果在交互模式中没有参数，询问用户。
</code></pre></div></div>

<h3 id="好决策树">好：决策树</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## $ARGUMENTS 非空？
├─ 是 → 解析参数，直接执行，不交互
└─ 否
   ## $CI 或 $CLAUDE_NONINTERACTIVE 已设置？
   ├─ 是 → 使用 &lt;defaults&gt; 的值，直接执行
   └─ 否 → 询问用户缺少的参数，然后执行
</code></pre></div></div>

<p><strong>为什么有效：</strong> 缩进编码了层级关系。模型在训练中见过大量缩进结构（代码、YAML、目录树），学会了”缩进越深 = 子条件”。自然语言没有这种空间编码。</p>

<hr />

<h2 id="模式-2锚定给起点">模式 2：锚定（给起点）</h2>

<p>给模型一个具体的起点，不让它凭空发挥。</p>

<h3 id="不好无锚定">不好：无锚定</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>生成一个部署脚本。
</code></pre></div></div>

<h3 id="好用模板锚定">好：用模板锚定</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>基于这个模板生成部署脚本：
&lt;template&gt;
#!/bin/bash
set -euo pipefail
ENV="${1:?Usage: deploy.sh &lt;env&gt;}"
# ... 你的步骤
&lt;/template&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> 模板的内容直接参与模型的注意力计算。模型的输出会被”拉向”模板的风格，而不是从”部署脚本”这个笼统概念中随机生成。</p>

<hr />

<h2 id="模式-3认知卸载把思考步骤写出来">模式 3：认知卸载（把思考步骤写出来）</h2>

<p>把模型本来需要隐式推理的步骤显式地写出来。</p>

<h3 id="不好隐式推理">不好：隐式推理</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>分析这段代码的性能问题并修复。
</code></pre></div></div>

<h3 id="好显式步骤">好：显式步骤</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;analysis_steps&gt;
1. 找出所有循环和递归
2. 标注每个的时间复杂度
3. 标记 O(n²) 或更高的
4. 为每个标记的部分提出优化方案
&lt;/analysis_steps&gt;
按顺序执行这些步骤。
</code></pre></div></div>

<p><strong>为什么有效：</strong> LLM 没有真正的工作记忆。把中间步骤写出来等于给了”外部记忆”——每一步只需要看上一步的输出，不需要从头推导。</p>

<p>决策树 = 分支逻辑的认知卸载。
思维链 = 推理过程的认知卸载。
同一个原理，不同应用。</p>

<hr />

<h2 id="模式-4注意力局部性把相关的放在一起">模式 4：注意力局部性（把相关的放在一起）</h2>

<p>相关信息在文本中应该靠近。越近的词获得越强的注意力。</p>

<h3 id="不好规则离目标太远">不好：规则离目标太远</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;rules&gt;永远不要删除生产数据库&lt;/rules&gt;
...（中间隔了 500 个词）...
&lt;task&gt;清理过期数据&lt;/task&gt;
</code></pre></div></div>

<h3 id="好规则紧挨目标">好：规则紧挨目标</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;task&gt;
清理过期数据
&lt;constraint&gt;永远不要删除生产数据库&lt;/constraint&gt;
&lt;/task&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> Transformer 的注意力理论上是全局的，但实际上有位置偏好——近的词注意力更强。把约束放在它约束的动作旁边，不要放在远处的”通用规则”里。</p>

<hr />

<h2 id="模式-5指令-动作绑定一条指令一个动作">模式 5：指令-动作绑定（一条指令一个动作）</h2>

<p>每条指令应该尽可能直接对应一个可执行动作。</p>

<h3 id="不好一句话多个动作">不好：一句话多个动作</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>检查代码风格问题并修复，然后运行测试确保通过。
</code></pre></div></div>

<h3 id="好一条指令--一个动作">好：一条指令 = 一个动作</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. 运行：`eslint --fix src/`
2. 运行：`npm test`
3. 如果测试失败 → 读错误输出，修复问题，回到步骤 2
</code></pre></div></div>

<p><strong>为什么有效：</strong> 模型把一条清晰指令映射到一个工具调用的可靠性，远高于从一个长句中提取多个隐含动作。</p>

<hr />

<h2 id="模式-6输出格式预设给输出一个形状">模式 6：输出格式预设（给输出一个”形状”）</h2>

<p>给模型一个输出的结构，它来填内容。</p>

<h3 id="不好开放式">不好：开放式</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>分析这个 PR 的风险。
</code></pre></div></div>

<h3 id="好结构约束">好：结构约束</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;output_schema&gt;
- risk_level: high | medium | low
- affected_files: [列表]
- rollback_plan: [字符串]
- requires_review: true | false
&lt;/output_schema&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> 结构定义就像”铁轨”。生成每个字段值时，模型的注意力被字段名强力引导，大幅减少偏离。</p>
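
<p>The payoff is that downstream code can check the reply mechanically. A minimal validator sketch using the field names from the schema above (the checks themselves are illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: validate a parsed model reply against the &lt;output_schema&gt; above.
ALLOWED_RISK = {"high", "medium", "low"}

def validate(reply: dict) -> list:
    errors = []
    if reply.get("risk_level") not in ALLOWED_RISK:
        errors.append("risk_level must be high | medium | low")
    if not isinstance(reply.get("affected_files"), list):
        errors.append("affected_files must be a list")
    if not isinstance(reply.get("rollback_plan"), str):
        errors.append("rollback_plan must be a string")
    if not isinstance(reply.get("requires_review"), bool):
        errors.append("requires_review must be true or false")
    return errors  # empty list means the reply matches the shape
</code></pre></div></div>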

<hr />

<h2 id="模式-7负空间说不要的同时说要">模式 7：负空间（说”不要”的同时说”要”）</h2>

<p>告诉模型不该做什么时，永远同时告诉它该做什么。</p>

<h3 id="不好只说不要">不好：只说不要</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>不要直接修改数据库。
不要跳过测试。
不要用 sudo。
</code></pre></div></div>

<h3 id="好不要--替代方案">好：不要 + 替代方案</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;boundaries&gt;
- 数据库变更 → 生成迁移文件，不要执行原始 SQL
- 需要验证 → 运行完整测试套件再继续，不要跳过
- 需要提权 → 请求用户确认，不要用 sudo
&lt;/boundaries&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> “不要做 X” 只压制了某些输出，但没有推动任何替代方案。模型知道不往哪走，但不知道往哪走。同时给出替代方案就能同时压制错误路径并推动正确路径。</p>

<hr />

<h2 id="模式-8xml-标签做语义分区">模式 8：XML 标签做语义分区</h2>

<p>Claude 的训练数据中包含 XML 标签。用它们来划分提示词的不同部分。</p>

<h3 id="推荐的提示词结构">推荐的提示词结构</h3>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;context&gt;</span>
模型需要了解的背景信息。
<span class="nt">&lt;/context&gt;</span>

<span class="nt">&lt;parameters&gt;</span>
输入参数，包含类型、默认值、来源。
<span class="nt">&lt;/parameters&gt;</span>

<span class="nt">&lt;decision_tree&gt;</span>
可视化的分支逻辑，每个叶子有明确动作。
<span class="nt">&lt;/decision_tree&gt;</span>

<span class="nt">&lt;examples&gt;</span>
<span class="nt">&lt;example&gt;</span>
<span class="nt">&lt;input&gt;</span>...<span class="nt">&lt;/input&gt;</span>
<span class="nt">&lt;thinking&gt;</span>模型应该遵循的逐步推理<span class="nt">&lt;/thinking&gt;</span>
<span class="nt">&lt;output&gt;</span>...<span class="nt">&lt;/output&gt;</span>
<span class="nt">&lt;/example&gt;</span>
<span class="nt">&lt;/examples&gt;</span>

<span class="nt">&lt;boundaries&gt;</span>
不要做什么 + 应该做什么。
<span class="nt">&lt;/boundaries&gt;</span>

<span class="nt">&lt;output_schema&gt;</span>
期望的输出格式。
<span class="nt">&lt;/output_schema&gt;</span>
</code></pre></div></div>

<p><strong>为什么有效：</strong> XML 标签创建硬性的语义边界。模型把不同标签内的内容当作独立的区块，减少指令、示例、约束之间的互相干扰。</p>

<hr />

<h2 id="模式-9带推理过程的示例">模式 9：带推理过程的示例</h2>

<p>让模型看到<strong>怎么想</strong>，不只是<strong>输出什么</strong>。</p>

<h3 id="不好只有输入输出">不好：只有输入/输出</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;example&gt;
&lt;input&gt;deploy staging&lt;/input&gt;
&lt;output&gt;已部署到 staging。&lt;/output&gt;
&lt;/example&gt;
</code></pre></div></div>

<h3 id="好输入--思考过程--输出">好：输入 + 思考过程 + 输出</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;example&gt;
&lt;input&gt;deploy staging&lt;/input&gt;
&lt;thinking&gt;
1. 提供了参数："staging" → 非空 → 跳过用户交互
2. 环境 "staging" 有效（匹配 staging|production）
3. 未检测到 CI 变量 → 但有参数 → 静默执行
4. 执行部署到 staging
&lt;/thinking&gt;
&lt;output&gt;已成功部署到 staging。&lt;/output&gt;
&lt;/example&gt;
</code></pre></div></div>

<p><strong>为什么有效：</strong> 示例中的 <code class="language-plaintext highlighter-rouge">&lt;thinking&gt;</code> 模式会被泛化到模型自己的推理中。它学到的是推理方式，不只是输出格式。</p>

<hr />

<h2 id="各模式之间的关系">各模式之间的关系</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>行为稳定性
    ↑
决策点的确定性高
    ↑
注意力分布集中
    ↑
提示词中的词语空间排列
    ↑
┌──────────┬──────────┬──────────┬──────────┐
│ 决策树    │ 注意力   │ 认知卸载  │ 输出格式  │
│          │ 局部性    │          │ 预设     │
├──────────┼──────────┼──────────┼──────────┤
│ 锚定     │ 指令-动作 │ 负空间    │ XML      │
│          │ 绑定      │          │ 标签     │
├──────────┼──────────┼──────────┼──────────┤
│ 带推理的  │          │          │          │
│ 示例     │          │          │          │
└──────────┴──────────┴──────────┴──────────┘

所有技巧都在做同一件事：
改变生成时注意力在各个词上的分布。
</code></pre></div></div>

<hr />

<h2 id="参考资料">参考资料</h2>

<ul>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices">Anthropic 提示词工程最佳实践</a></li>
  <li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic 上下文工程</a></li>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags">XML 标签使用指南</a></li>
  <li><a href="https://code.claude.com/docs/en/skills">Claude Code Skills 文档</a></li>
  <li><a href="https://github.com/Piebald-AI/claude-code-system-prompts">Claude Code 系统提示词（社区）</a></li>
  <li><a href="https://github.com/travisvn/awesome-claude-skills">Claude Skills 精选合集</a></li>
</ul>]]></content><author><name></name></author><category term="prompt-engineering" /><category term="patterns" /><summary type="html"><![CDATA[一份写出稳定、可预测的 AI 提示词的实用指南。]]></summary></entry><entry><title type="html">Prompt Engineering Patterns for Claude Code Skills</title><link href="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns/" rel="alternate" type="text/html" title="Prompt Engineering Patterns for Claude Code Skills" /><published>2026-04-19T00:00:00+00:00</published><updated>2026-04-19T00:00:00+00:00</updated><id>https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns</id><content type="html" xml:base="https://wenbo97.github.io/prompt-context-patterns/2026/04/19/prompt-engineering-patterns/"><![CDATA[<p>A practical guide to writing stable, predictable skill prompts — grounded in how LLMs actually process tokens.</p>

<hr />

<h2 id="core-principle-reduce-conditional-entropy">Core Principle: Reduce Conditional Entropy</h2>

<p>Every token an LLM generates is sampled from a probability distribution over candidates. <strong>Your prompt’s structure directly controls how sharp or diffuse that distribution is at each decision point.</strong></p>

<ul>
  <li>Sharp distribution (low entropy) → deterministic behavior</li>
  <li>Diffuse distribution (high entropy) → unstable, unpredictable behavior</li>
</ul>

<p><strong>Everything below is a technique for sharpening the distribution at the moments that matter.</strong></p>

<h3 id="what-conditional-entropy-actually-means">What “conditional entropy” actually means</h3>

<p>Every time the model generates a token, it faces a set of candidates, each with a probability. If probabilities are spread evenly (e.g., 10 candidates at 10% each), the model is “hesitating” — that’s high entropy. If one candidate sits at 90% and the rest are negligible, the model is “certain” — that’s low entropy.</p>

<p><strong>The way you write your prompt directly determines whether the model hesitates or commits at critical decision points.</strong></p>

<h3 id="concrete-example">Concrete example</h3>

<p>Suppose your skill needs to decide whether to ask the user for input, based on the environment.</p>

<p><strong>Prose version:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>If running in CI with no arguments, use defaults.
If interactive with arguments, run directly.
If interactive without arguments, ask the user.
</code></pre></div></div>

<p>After reading this, the model needs to generate its next action. Here’s what’s happening inside attention:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"CI"              → line 1, beginning     ← attention must look back
"no arguments"    → line 1, middle        ← attention must look back
"use defaults"    → line 1, end           ← attention must look back
"interactive"     → line 2, beginning     ← also competing for attention
"ask the user"    → line 3, end           ← also competing for attention

5 conditions scattered across 3 lines — which should I attend to?
</code></pre></div></div>

<p>Attention is spread across multiple positions → no single condition gets enough weight → the model is uncertain about which path to take → <strong>high entropy → unstable behavior</strong>.</p>

<p><strong>Tree version:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## CI environment?
├─ YES
│  └─ Has arguments?
│     ├─ YES → use arguments, execute
│     └─ NO  → use defaults, execute
└─ NO (interactive)
   └─ Has arguments?
      ├─ YES → use arguments, execute
      └─ NO  → ask user
</code></pre></div></div>

<p>After the model determines “CI = YES” and “arguments = NO”, its attention is here:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>│     └─ NO  → use defaults, execute
              ↑
              attention is concentrated on this line
</code></pre></div></div>

<p>The tokens “use defaults, execute” are <strong>right next to the cursor</strong>. No need to look back anywhere. The model is nearly 100% certain what to do next → <strong>very low entropy → deterministic behavior</strong>.</p>

<h3 id="feel-it-in-numbers">Feel it in numbers</h3>

<p>Prose version — probability distribution at the decision point:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use arguments, execute:  35%
use defaults, execute:   30%
ask user:                25%
other:                   10%
</code></pre></div></div>

<p>Three options sit close together in probability, so sampling noise decides which path wins. Run 10 times and expect several runs to take the wrong branch.</p>

<p>Tree version — at the correct branch leaf:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use defaults, execute:   92%
use arguments, execute:   5%
other:                    3%
</code></pre></div></div>

<p>One option dominates. Run 10 times, 0–1 deviations.</p>
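
<p>To put numbers on “sharp vs diffuse”, here is a minimal sketch that computes the Shannon entropy of the two illustrative distributions above (the percentages are invented for illustration, not measured logits):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p &gt; 0)

prose = [0.35, 0.30, 0.25, 0.10]  # three actions compete
tree  = [0.92, 0.05, 0.03]        # one action dominates

print(f"prose: {entropy_bits(prose):.2f} bits")  # ~1.88
print(f"tree:  {entropy_bits(tree):.2f} bits")   # ~0.48
</code></pre></div></div>

<p>Roughly 1.9 bits of hesitation versus 0.5: by this (invented) measure, the tree layout leaves the model about a quarter of the uncertainty at the decision point.</p>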

<h3 id="why-indentation-is-information">Why indentation is information</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>└─ NO (interactive)
   └─ Has arguments?
      └─ NO  → ask user
</code></pre></div></div>

<p>The indentation (0 spaces, 3 spaces, 6 spaces) becomes whitespace tokens after tokenization. These whitespace tokens encode <strong>hierarchy</strong> — the model has seen massive amounts of indented structures (source code, YAML, directory trees) during training and has learned: deeper indent = child of parent condition.</p>

<p>Prose has no such spatial encoding. In “If interactive but arguments provided” — the subordination between “interactive” and “arguments provided” must be inferred from natural language grammar alone. That inference itself costs attention and introduces uncertainty.</p>

<h3 id="one-line-summary">One-line summary</h3>

<blockquote>
  <p>A tree lets the model look at a few nearby tokens to know what to do. Prose forces it to scan an entire paragraph to piece together the answer. The smaller the search radius, the more certain the outcome.</p>
</blockquote>

<hr />

<h2 id="1-visual-decision-trees-over-prose">1. Visual Decision Trees over Prose</h2>

<p>Claude follows visual tree structures more reliably than prose descriptions of the same logic, because each branch terminates with an explicit action.</p>

<h3 id="why-it-works">Why it works</h3>

<p>Prose scatters conditions across a sentence. The model must attend to multiple distant tokens simultaneously, diluting attention. A tree places the relevant condition and its action adjacent in the token sequence — the model only needs to look at nearby tokens to know what to do.</p>

<h3 id="bad-prose">Bad: prose</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>If running in CI with no arguments, use defaults. If interactive with arguments,
run directly. If interactive without arguments, ask the user.
</code></pre></div></div>

<h3 id="good-decision-tree">Good: decision tree</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Is $ARGUMENTS non-empty?
├─ YES → parse arguments, execute directly, no interaction
└─ NO
   ## Is $CI or $CLAUDE_NONINTERACTIVE set?
   ├─ YES → use values from &lt;defaults&gt;, execute directly
   └─ NO  → ask user for missing parameters, then execute
</code></pre></div></div>

<h3 id="why-indentation-matters">Why indentation matters</h3>

<p>Indentation tokens encode hierarchy. Models have seen massive amounts of indented structures (code, YAML, directory trees) during training and have learned that deeper indent = child of parent condition. Prose has no such spatial encoding — the model must infer nesting from natural language grammar, which costs attention and introduces uncertainty.</p>
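
<p>If you generate these trees programmatically, a small recursive renderer keeps the box-drawing characters and indentation consistent. A minimal sketch (<code class="language-plaintext highlighter-rouge">render_tree</code> is a hypothetical helper, not part of any skill API):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def render_tree(question, branches, indent=""):
    """Render {answer: action or (question, branches)} as a box-drawing tree."""
    lines = [f"{indent}## {question}"]
    items = list(branches.items())
    for i, (answer, outcome) in enumerate(items):
        last = i == len(items) - 1
        connector = "└─" if last else "├─"
        child_indent = indent + ("   " if last else "│  ")
        if isinstance(outcome, tuple):  # nested (question, branches)
            sub_question, sub_branches = outcome
            lines.append(f"{indent}{connector} {answer}")
            lines.extend(render_tree(sub_question, sub_branches, child_indent).splitlines())
        else:  # leaf: explicit action
            lines.append(f"{indent}{connector} {answer} → {outcome}")
    return "\n".join(lines)

print(render_tree("Is $ARGUMENTS non-empty?", {
    "YES": "parse arguments, execute directly, no interaction",
    "NO": ("Is $CI or $CLAUDE_NONINTERACTIVE set?", {
        "YES": "use values from &lt;defaults&gt;, execute directly",
        "NO": "ask user for missing parameters, then execute",
    }),
}))
</code></pre></div></div>

<p>Every leaf ends in an explicit action by construction; the renderer makes the “each branch terminates with an action” rule impossible to violate.</p>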

<hr />

<h2 id="2-grounding-anchoring">2. Grounding (Anchoring)</h2>

<p>Give the model a concrete starting point instead of letting it sample from an infinite space.</p>

<h3 id="bad-unanchored">Bad: unanchored</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Generate a deployment script.
</code></pre></div></div>

<h3 id="good-anchored-with-template">Good: anchored with template</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Generate a deployment script based on this template:
&lt;template&gt;
#!/bin/bash
set -euo pipefail
ENV="${1:?Usage: deploy.sh &lt;env&gt;}"
# ... your steps here
&lt;/template&gt;
</code></pre></div></div>

<p><strong>Why:</strong> Template tokens directly participate in attention — the model’s output is “pulled toward” the template’s distribution rather than sampling from the generic concept of “deployment script.”</p>

<hr />

<h2 id="3-cognitive-offloading">3. Cognitive Offloading</h2>

<p>Externalize reasoning steps that the model would otherwise have to perform implicitly.</p>

<h3 id="bad-implicit-reasoning-required">Bad: implicit reasoning required</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Analyze this code's performance issues and fix them.
</code></pre></div></div>

<h3 id="good-explicit-steps-provided">Good: explicit steps provided</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;analysis_steps&gt;
1. Identify all loops and recursion
2. Annotate each with time complexity
3. Flag anything O(n²) or higher
4. Propose optimization for each flagged section
&lt;/analysis_steps&gt;
Execute these steps in order.
</code></pre></div></div>

<p><strong>Why:</strong> LLMs have no true working memory: any intermediate result must either be re-derived or live in the context window as tokens. Writing out intermediate steps provides “external memory” — each step only needs to attend to the previous step’s output, not derive everything from scratch.</p>

<p>Decision trees = cognitive offloading for branching logic.
Chain-of-thought = cognitive offloading for reasoning.
Same principle, different applications.</p>
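
<p>The same offloading idea applies when you drive the model from code: run one step per call and feed each result back in, so the context window serves as the working memory. A rough sketch, where <code class="language-plaintext highlighter-rouge">call_model</code> stands in for whatever client function you use (hypothetical, not a real API):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ANALYSIS_STEPS = [
    "Identify all loops and recursion in the code below.",
    "Annotate each with its time complexity.",
    "Flag anything O(n²) or higher.",
    "Propose an optimization for each flagged section.",
]

def analyze(code, call_model):
    """One call per step; earlier answers become the external memory."""
    notes = []
    for step in ANALYSIS_STEPS:
        memory = "\n\n".join(notes)  # everything derived so far
        notes.append(call_model(f"{memory}\n\nCode:\n{code}\n\nTask: {step}"))
    return notes[-1]
</code></pre></div></div>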

<hr />

<h2 id="4-attention-locality">4. Attention Locality</h2>

<p>Related information should be close together in the token sequence. Closer tokens get higher attention weights in practice.</p>

<h3 id="bad-rule-far-from-its-target">Bad: rule far from its target</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;rules&gt;Never delete production databases&lt;/rules&gt;
... (500 tokens of other content) ...
&lt;task&gt;Clean up expired data&lt;/task&gt;
</code></pre></div></div>

<h3 id="good-rule-adjacent-to-its-target">Good: rule adjacent to its target</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;task&gt;
Clean up expired data
&lt;constraint&gt;Never delete production databases&lt;/constraint&gt;
&lt;/task&gt;
</code></pre></div></div>

<p><strong>Why:</strong> Transformer attention is theoretically global but has positional bias — nearby tokens receive stronger attention scores. Place constraints next to the actions they constrain, not in a distant “general rules” section.</p>
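
<p>When prompts are assembled from pieces, you can enforce locality mechanically by attaching each constraint to the task it governs instead of pooling everything in a global rules section. A tiny sketch of that convention (hypothetical helper, not a framework API):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def task_block(task, constraints):
    """Emit each constraint adjacent to the action it constrains."""
    inner = "\n".join(f"&lt;constraint&gt;{c}&lt;/constraint&gt;" for c in constraints)
    return f"&lt;task&gt;\n{task}\n{inner}\n&lt;/task&gt;"

print(task_block("Clean up expired data",
                 ["Never delete production databases"]))
</code></pre></div></div>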

<hr />

<h2 id="5-token-action-binding">5. Token-Action Binding</h2>

<p>Each instruction should map as directly as possible to one executable action.</p>

<h3 id="bad-multiple-implicit-actions-in-one-sentence">Bad: multiple implicit actions in one sentence</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Check code style issues and fix them then run tests and make sure they pass.
</code></pre></div></div>

<h3 id="good-one-instruction--one-action">Good: one instruction = one action</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. Run: `eslint --fix src/`
2. Run: `npm test`
3. If tests fail → read error output, fix the issue, go to step 2
</code></pre></div></div>

<p><strong>Why:</strong> The model maps one clear instruction to one tool call far more reliably than it extracts multiple implied actions from a run-on sentence.</p>
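
<p>A useful side effect: the more directly each instruction maps to an action, the easier it is to lift the deterministic steps out of the prompt into a plain harness, reserving the model for the judgment-heavy branch. A sketch under that assumption (<code class="language-plaintext highlighter-rouge">fix_with_model</code> is a hypothetical callback):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import subprocess

def run(cmd):
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

def lint_and_test(fix_with_model, max_attempts=3):
    """Steps 1–3 from the prompt above, with bounded retries."""
    run("eslint --fix src/")              # step 1: one instruction, one action
    for _ in range(max_attempts):
        result = run("npm test")          # step 2
        if result.returncode == 0:
            return True
        fix_with_model(result.stdout + result.stderr)  # step 3: hand failure to the model
    return False
</code></pre></div></div>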

<hr />

<h2 id="6-schema-priming">6. Schema Priming</h2>

<p>Give the model an output “shape” and it fills in the content.</p>

<h3 id="bad-open-ended">Bad: open-ended</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Analyze this PR's risk.
</code></pre></div></div>

<h3 id="good-schema-constrained">Good: schema-constrained</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;output_schema&gt;
- risk_level: high | medium | low
- affected_files: [list]
- rollback_plan: [string]
- requires_review: true | false
&lt;/output_schema&gt;
</code></pre></div></div>

<p><strong>Why:</strong> Schema tokens act as “rails” during decoding. When generating each field value, the model’s attention is strongly guided by the schema key names, drastically reducing drift.</p>
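
<p>A schema you prime with is also a schema you can check against. A minimal validator for the example shape above (field names taken from the snippet; the checks are an illustrative sketch, not a library):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def validate_risk_report(report):
    """Return a list of schema violations (empty list = valid)."""
    errors = []
    if report.get("risk_level") not in {"high", "medium", "low"}:
        errors.append("risk_level must be high | medium | low")
    if not isinstance(report.get("affected_files"), list):
        errors.append("affected_files must be a list")
    if not isinstance(report.get("rollback_plan"), str):
        errors.append("rollback_plan must be a string")
    if not isinstance(report.get("requires_review"), bool):
        errors.append("requires_review must be true | false")
    return errors

assert validate_risk_report({
    "risk_level": "low",
    "affected_files": ["src/app.ts"],
    "rollback_plan": "revert the release tag",
    "requires_review": False,
}) == []
</code></pre></div></div>

<p>On a violation, feed the error list straight back to the model; the same schema tokens that primed generation then guide the retry.</p>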

<hr />

<h2 id="7-negative-space-explicit-alternatives">7. Negative Space (Explicit Alternatives)</h2>

<p>When telling the model what NOT to do, always provide what TO DO instead.</p>

<h3 id="bad-negation-only">Bad: negation only</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Don't modify the database directly.
Don't skip tests.
Don't use sudo.
</code></pre></div></div>

<h3 id="good-negation--alternative-path">Good: negation + alternative path</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;boundaries&gt;
- Database changes → generate a migration file, never execute raw SQL
- Validation needed → run full test suite before continuing, never skip
- Elevated permissions → request user confirmation, never use sudo
&lt;/boundaries&gt;
</code></pre></div></div>

<p><strong>Why:</strong> “Don’t do X” only suppresses certain token sequences but doesn’t boost any alternative. The model knows where not to go but not where to go → unstable. Providing the alternative simultaneously suppresses the wrong path and boosts the right one.</p>
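
<p>This rule is easy to lint for. A small check that flags boundary lines which suppress without redirecting (it assumes the “negation → alternative” arrow convention from the good example above; the function is an illustrative sketch):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NEGATIONS = ("never", "don't", "do not", "avoid")

def lint_boundaries(boundaries):
    """Flag negation-only lines that offer no alternative action."""
    flagged = []
    for line in boundaries.strip().splitlines():
        lowered = line.lower()
        if any(word in lowered for word in NEGATIONS) and "→" not in line:
            flagged.append(line.strip())
    return flagged

print(lint_boundaries("""
Don't modify the database directly.
Database changes → generate a migration file, never execute raw SQL
"""))  # flags only the first line
</code></pre></div></div>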

<hr />

<h2 id="8-xml-tags-for-semantic-boundaries">8. XML Tags for Semantic Boundaries</h2>

<p>Claude’s training data included XML tags, so it handles them reliably. Use them to delineate prompt sections.</p>

<h3 id="recommended-structure-for-skill-prompts">Recommended structure for skill prompts</h3>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;context&gt;</span>
Background information the model needs to understand the domain.
<span class="nt">&lt;/context&gt;</span>

<span class="nt">&lt;parameters&gt;</span>
Inputs with types, defaults, and sources.
<span class="nt">&lt;/parameters&gt;</span>

<span class="nt">&lt;decision_tree&gt;</span>
Visual branching logic with explicit leaf actions.
<span class="nt">&lt;/decision_tree&gt;</span>

<span class="nt">&lt;examples&gt;</span>
<span class="nt">&lt;example&gt;</span>
<span class="nt">&lt;input&gt;</span>...<span class="nt">&lt;/input&gt;</span>
<span class="nt">&lt;thinking&gt;</span>Step-by-step reasoning the model should follow<span class="nt">&lt;/thinking&gt;</span>
<span class="nt">&lt;output&gt;</span>...<span class="nt">&lt;/output&gt;</span>
<span class="nt">&lt;/example&gt;</span>
<span class="nt">&lt;/examples&gt;</span>

<span class="nt">&lt;boundaries&gt;</span>
What not to do + what to do instead.
<span class="nt">&lt;/boundaries&gt;</span>

<span class="nt">&lt;output_schema&gt;</span>
Expected output shape.
<span class="nt">&lt;/output_schema&gt;</span>
</code></pre></div></div>

<p><strong>Why:</strong> XML tags create hard semantic boundaries. The model treats content inside different tags as distinct sections, reducing cross-contamination between instructions, examples, and constraints.</p>

<hr />

<h2 id="9-few-shot-with-embedded-reasoning">9. Few-Shot with Embedded Reasoning</h2>

<p>Show the model HOW to think, not just WHAT to output.</p>

<h3 id="bad-inputoutput-pairs-only">Bad: input/output pairs only</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;example&gt;
&lt;input&gt;deploy staging&lt;/input&gt;
&lt;output&gt;Deployed to staging.&lt;/output&gt;
&lt;/example&gt;
</code></pre></div></div>

<h3 id="good-input--thinking--output">Good: input + thinking + output</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;example&gt;
&lt;input&gt;deploy staging&lt;/input&gt;
&lt;thinking&gt;
1. Arguments provided: "staging" → non-empty → skip user interaction
2. Environment "staging" is valid (matches staging|production)
3. No CI variable detected → but args present → proceed silently
4. Execute deployment to staging
&lt;/thinking&gt;
&lt;output&gt;Deployed to staging successfully.&lt;/output&gt;
&lt;/example&gt;
</code></pre></div></div>

<p><strong>Why:</strong> The <code class="language-plaintext highlighter-rouge">&lt;thinking&gt;</code> pattern inside few-shot examples gets generalized into the model’s own extended thinking blocks. It learns the reasoning pattern, not just the output pattern.</p>
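
<p>A side benefit of tagged examples: the same tags make the model’s replies machine-parseable. A minimal sketch that pulls a tagged section back out (regex-based for illustration; a real parser should handle missing or nested tags more carefully):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

def extract_tag(reply, tag):
    """Return the contents of &lt;tag&gt;...&lt;/tag&gt; in the reply, or None."""
    match = re.search(rf"&lt;{tag}&gt;(.*?)&lt;/{tag}&gt;", reply, re.DOTALL)
    return match.group(1).strip() if match else None

reply = ("&lt;thinking&gt;args non-empty → skip interaction&lt;/thinking&gt;"
         "&lt;output&gt;Deployed to staging.&lt;/output&gt;")
print(extract_tag(reply, "output"))    # Deployed to staging.
print(extract_tag(reply, "thinking"))  # args non-empty → skip interaction
</code></pre></div></div>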

<hr />

<h2 id="putting-it-all-together-skill-prompt-template">Putting It All Together: Skill Prompt Template</h2>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">my-skill</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">One line that tells Claude WHEN to activate this skill</span>
<span class="na">context</span><span class="pi">:</span> <span class="s">default</span>
<span class="na">allowed-tools</span><span class="pi">:</span> <span class="s">Bash(specific-commands*), Write, Edit</span>
<span class="nn">---</span>

<span class="nt">&lt;context&gt;</span>
What this skill does and why it exists.
Domain-specific background if needed.
<span class="nt">&lt;/context&gt;</span>

<span class="nt">&lt;parameters&gt;</span>
<span class="p">-</span> target: from $ARGUMENTS, or ask user
<span class="p">-</span> env_mode: from $CI or $CLAUDE_NONINTERACTIVE, default "interactive"
<span class="nt">&lt;/parameters&gt;</span>

<span class="nt">&lt;decision_tree&gt;</span>
<span class="gu">## Has $ARGUMENTS?</span>
├─ YES → parse into <span class="sb">`target`</span>, skip interaction
└─ NO
   ## Is env_mode non-interactive?
   ├─ YES → use defaults from <span class="nt">&lt;defaults&gt;</span>, proceed
   └─ NO  → ask user for <span class="sb">`target`</span>, then proceed
<span class="nt">&lt;/decision_tree&gt;</span>

<span class="nt">&lt;defaults&gt;</span>
<span class="p">-</span> target: "staging"
<span class="nt">&lt;/defaults&gt;</span>

<span class="nt">&lt;steps&gt;</span>
<span class="p">1.</span> Validate <span class="sb">`target`</span> against allowed values (staging | production)
<span class="p">2.</span> Run preflight checks: <span class="sb">`npm test`</span>
<span class="p">3.</span> If tests fail → stop, report error, do NOT proceed
<span class="p">4.</span> Execute deployment to <span class="sb">`target`</span>
<span class="p">5.</span> Verify deployment health
<span class="nt">&lt;/steps&gt;</span>

<span class="nt">&lt;examples&gt;</span>
<span class="nt">&lt;example&gt;</span>
<span class="nt">&lt;input&gt;</span>/my-skill production<span class="nt">&lt;/input&gt;</span>
<span class="nt">&lt;thinking&gt;</span>
<span class="p">1.</span> $ARGUMENTS = "production" → non-empty → use directly
<span class="p">2.</span> target = "production" → valid
<span class="p">3.</span> Run tests → pass
<span class="p">4.</span> Deploy to production
<span class="nt">&lt;/thinking&gt;</span>
<span class="nt">&lt;output&gt;</span>Deployed to production. Health check passed.<span class="nt">&lt;/output&gt;</span>
<span class="nt">&lt;/example&gt;</span>
<span class="nt">&lt;/examples&gt;</span>

<span class="nt">&lt;boundaries&gt;</span>
<span class="p">-</span> Never deploy without passing tests → run <span class="sb">`npm test`</span> first, abort on failure
<span class="p">-</span> Never modify .env files → read config from environment variables only
<span class="p">-</span> Never run with sudo → request user confirmation for elevated actions
<span class="nt">&lt;/boundaries&gt;</span>
</code></pre></div></div>
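
<p>If you keep many skills in one repo, the frontmatter is easy to sanity-check in CI. A rough sketch (the required keys are just the ones shown in the template above; real skill loaders may expect more or different fields):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import yaml  # pip install pyyaml

REQUIRED_KEYS = {"name", "description"}

def check_skill_file(text):
    """Parse the '---'-delimited frontmatter and report missing keys."""
    parts = text.split("---", 2)
    if len(parts) &lt; 3:
        return ["missing frontmatter block"]
    meta = yaml.safe_load(parts[1]) or {}
    return [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - meta.keys())]
</code></pre></div></div>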

<hr />

<h2 id="summary-of-relationships">Summary of Relationships</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Behavioral Stability
      ↑
Low Conditional Entropy at Decision Points
      ↑
Sharp Attention Distribution
      ↑
Token Spatial Arrangement in Prompt
      ↑
┌─────────────┬──────────────┬───────────────┬──────────────┐
│ Decision    │ Attention    │ Cognitive     │ Schema       │
│ Trees       │ Locality     │ Offloading    │ Priming      │
├─────────────┼──────────────┼───────────────┼──────────────┤
│ Grounding   │ Token-Action │ Negative      │ XML          │
│ (Anchoring) │ Binding      │ Space         │ Boundaries   │
├─────────────┼──────────────┼───────────────┼──────────────┤
│ Few-Shot w/ │              │               │              │
│ Reasoning   │              │               │              │
└─────────────┴──────────────┴───────────────┴──────────────┘

All techniques manipulate the same thing:
how attention is distributed across tokens at generation time.
</code></pre></div></div>

<hr />

<h2 id="references">References</h2>

<ul>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices">Anthropic Prompting Best Practices</a></li>
  <li><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic Context Engineering</a></li>
  <li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags">XML Tags Guide</a></li>
  <li><a href="https://code.claude.com/docs/en/skills">Claude Code Skills Docs</a></li>
  <li><a href="https://github.com/Piebald-AI/claude-code-system-prompts">Claude Code System Prompts (community)</a></li>
  <li><a href="https://github.com/travisvn/awesome-claude-skills">Awesome Claude Skills</a></li>
</ul>]]></content><author><name></name></author><category term="prompt-engineering" /><category term="patterns" /><summary type="html"><![CDATA[A practical guide to writing stable, predictable skill prompts — grounded in how LLMs actually process tokens.]]></summary></entry></feed>