Category 1: Structural Scaffolding

How a skill prompt is organized — the skeleton that holds everything together.

Related foundational techniques: Schema Priming, XML Tags for Semantic Boundaries (see prompt-engineering-for-skills.md)


Pattern 1: YAML Frontmatter Metadata Block

Prevalence: ~100% of skills (2,290+ files)

Related patterns: Activation Scope, Tool Routing Tables

What it is: A structured YAML block at the top of every SKILL.md that declares the skill’s identity, trigger conditions, tool permissions, arguments, and (in advanced cases) input/output schemas. The frontmatter is parsed by the platform, not by the LLM — it controls what the skill CAN do before the model even sees the body.

When to use:

Positive Example

---
name: specification-writing
description: "Use when writing a product spec, feature spec, API contract, agent task spec, or any other specification where a zero-question document is required. Encodes outcome-first methodology, acceptance criteria taxonomy, scope boundary protocol, executor context model, and ambiguity resolution framework."
version: "1.3.0"
type: "codex"
tags: ["Problem Shaping", "Execution"]
created: "2026-02-19"
valid_until: "2026-08-19"
derived_from: "shared/toolkits/skills/specification_writing.md"
tested_with: ["Claude Sonnet 4.6", "Claude Opus 4.6"]
license: "MIT"
capability_summary: "Produces a Zero-Question Specification where every assumption is surfaced, every ambiguity is resolved or explicitly marked TBD, every acceptance criterion is binary-testable, and every scope boundary names the adjacent capability it excludes."
input_schema:
  feature_or_capability: "string  what is being specified"
  spec_type: "enum[product_feature, api_contract, agent_task, process_workflow, infrastructure_migration, research_discovery]"
  outcome: "string  the desired user or business outcome"
  executor: "string  optional, who will build this"
  constraints: "string  optional, technical constraints, dependencies, timeline"
  prior_context: "string  optional, existing docs, prior specs, stakeholder decisions"
output_schema:
  outcome_definition: "What success looks like, with binary-testable criteria"
  scope: "In-scope items with explicit out-of-scope boundaries"
  acceptance_criteria: "Binary-testable criteria per requirement, tagged by confidence"
  dependencies: "External dependencies, blocking decisions, TBD items with owners"
  failure_conditions: "What happens when things go wrong, with mitigation strategies"
  assumption_registry: "Load-bearing assumptions with confidence annotations"
  zero_question_score: "Computed completeness score after Step 6 audit"
  self_critique: ">=3 genuine weaknesses in this specification"
---

Why this works: Every field serves a purpose. The description tells the platform exactly when to trigger. input_schema and output_schema create a contract — the model knows what it receives and what it must produce. Version tracking, test coverage (tested_with), and expiration (valid_until) support operational hygiene. The schema fields use typed definitions (enum[...], string — description) that double as documentation.

Negative Example

---
name: spec-writer
description: Writes specs
---

Write a specification for whatever the user asks about. Make it detailed and comprehensive.

Why this fails: The description is too vague — the platform can’t distinguish this from any other writing skill, leading to false activations. No input/output schema means the model invents different structures each run. No version, no expiration, no test record — impossible to track regressions. The body is a single vague sentence that gives the model no structure to follow.
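The gaps listed above can be caught mechanically before a skill ships. The following is a minimal sketch of such a check, not the platform's actual parser: the required-field list, the word-count threshold, and the function names are assumptions made for illustration, and the line-based parsing only reads top-level keys rather than full YAML.

```python
import re

# Assumed policy for this sketch; a real platform may require different fields.
REQUIRED_FIELDS = ("name", "description", "version", "input_schema", "output_schema")
MIN_DESCRIPTION_WORDS = 8  # arbitrary threshold chosen for illustration


def split_frontmatter(text):
    """Return (frontmatter, body) for a file that opens with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a frontmatter block")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("frontmatter block is never closed")


def top_level_fields(frontmatter):
    """Collect top-level 'key: value' pairs; indented (nested) lines are skipped."""
    fields = {}
    for line in frontmatter.splitlines():
        match = re.match(r"^([A-Za-z_][\w-]*):\s*(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip().strip('"')
    return fields


def lint_skill(text):
    """Return a list of human-readable problems found in the frontmatter."""
    frontmatter, _body = split_frontmatter(text)
    fields = top_level_fields(frontmatter)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    description = fields.get("description", "")
    if description and len(description.split()) < MIN_DESCRIPTION_WORDS:
        problems.append("description too short to act as a trigger")
    return problems


NEGATIVE_EXAMPLE = """---
name: spec-writer
description: Writes specs
---

Write a specification for whatever the user asks about.
"""

for problem in lint_skill(NEGATIVE_EXAMPLE):
    print(problem)
```

Run against the negative example, the check reports the three missing fields and the two-word description. A real linter would use a full YAML parser and would also validate the nested schema entries.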


Pattern 2: Phased/Stepped Execution Flow

Prevalence: ~54% of skills (1,245 files)

Related patterns: Confirmation Gates, Progress Feedback, Workflow Mode Branching

What it is: Breaking the skill into numbered, sequential phases or steps that must be executed in order. Each phase has a clear goal, specific actions, and defined outputs. This is the dominant structural pattern for non-trivial skills.

When to use:

Positive Example

## Phase 0: Language Detection

**Goal**: Determine the project's primary language to route to the correct pipeline

**Actions**:
1. Invoke `detect-project-language` skill
2. Read `.analysis/detect_project_language/language_detection.json`
3. If `primaryLanguage == "csharp"` → proceed to **C# Pipeline**
4. If `primaryLanguage == "powershell"` → proceed to **PowerShell Pipeline**
5. If `primaryLanguage == "unknown"` → stop and report to user

---

## C# Phase 1: Discovery

**Goal**: Understand file(s) that need to be reviewed

**Actions**:
1. Create todo list with all C# phases
2. Find .cs files under the target folder or belonging to the project

---

## C# Phase 2: Understand Codebase Architectures

**Goal**: Determine if the project is an internal infrastructure service and generate architecture context

**Actions**:
1. Check if INPUT's directory contains **src/sources** — if yes, it is internal infrastructure code
2. Use `understand-codebase-architecture` skill to understand the high-level architecture
3. Write an `architecture_{INPUT}.md` file to `.analysis/architecture/`

---

## C# Phase 3: Taint Analysis

**Goal**: Run taint analysis on each file and write security reports in parallel

**Actions**:
1. Launch taint-analyzer agents in parallel based on number of files, up to 64 agents

Why this works: Each phase has a named Goal (what), numbered Actions (how), and clear transition logic (when to proceed, when to stop). The Goal/Actions split means the model understands intent even if it needs to adapt the specific commands. Phase numbering creates a progress tracker the model can reference.

Negative Example

Analyze the code for security issues. First figure out what language it is, then look for
vulnerabilities. Check for taint analysis issues and also understand the architecture.
Write reports for everything you find. Make sure to be thorough and check all the files.

Why this fails: All actions are compressed into a single paragraph with no sequencing. The model can’t tell what depends on what: does architecture understanding come before or after taint analysis? There’s no stopping condition for language detection failure. “Be thorough” is not an executable instruction. Different runs may execute the steps in a different order and with different thoroughness.
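One way to see what the phased layout buys is to model it as data. The sketch below is a hypothetical illustration: the `Phase` class, the phase names, and the handler functions are invented here. Each phase carries a goal and returns the name of the next phase, or `None` to stop. Only the C# route is sketched, so any other language value stops with a report.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Phase:
    name: str
    goal: str
    run: Callable[[dict], Optional[str]]  # returns the next phase name, or None to stop


def detect_language(ctx):
    # Stand-in for reading language_detection.json; the value comes from ctx here.
    if ctx.get("primaryLanguage") == "csharp":
        return "csharp_discovery"
    ctx["report"] = "no pipeline for detected language: stopping"
    return None


def discovery(ctx):
    ctx["files"] = [f for f in ctx["all_files"] if f.endswith(".cs")]
    return "csharp_taint"


def taint(ctx):
    ctx["report"] = f"analyzed {len(ctx['files'])} file(s)"
    return None


PHASES = [
    Phase("language_detection", "Route to the correct pipeline", detect_language),
    Phase("csharp_discovery", "Find the files to review", discovery),
    Phase("csharp_taint", "Analyze each file and report", taint),
]


def run_phases(phases, start, ctx):
    """Execute phases by following each phase's declared transition."""
    by_name = {phase.name: phase for phase in phases}
    trace = []
    current = start
    while current is not None:
        phase = by_name[current]
        trace.append(phase.name)
        current = phase.run(ctx)
    return trace


ctx = {"primaryLanguage": "csharp", "all_files": ["a.cs", "b.cs", "notes.md"]}
print(run_phases(PHASES, "language_detection", ctx), ctx["report"])
```

The returned trace plays the role of the progress tracker: it records which phases ran and in what order, which the single-paragraph version cannot provide.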

Phase 1 as mandatory research (mcp-builder / skill-creator)

A specific variant appears repeatedly in Anthropic’s mcp-builder and skill-creator skills: Phase 1 is reserved for exhaustive research, and no implementation may begin until the Phase 1 outputs exist.

## Phase 1 (MANDATORY RESEARCH — no code yet)

Goal: produce `research.md` covering every API surface we'll touch.

Actions:
1. Fetch the upstream protocol docs (full read, not just headers).
2. Fetch the SDK reference for the target language; read every type used.
3. Fetch 2+ reference implementations; note divergent patterns.
4. Write `research.md`:
     - API surface table (endpoint, params, return shape)
     - Auth flow
     - Rate limits and pagination
     - Known pitfalls + error semantics

Phase 1 is COMPLETE only when `research.md` exists AND covers every item the
plan in Phase 2 will need. If Phase 2 needs something `research.md` doesn't
have, return to Phase 1.

The “no code yet” prohibition and the closing loop (“return to Phase 1”) are load-bearing — they prevent the common failure mode where the model writes 80% of an MCP server, hits an undocumented edge case, and then can’t recover without restarting.
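The completion rule and the return loop amount to a small check, sketched below under loud assumptions. The function names and topic list are hypothetical, and case-insensitive substring matching is only a crude stand-in for judging whether `research.md` really covers a topic.

```python
import tempfile
from pathlib import Path


def missing_research(research_path, required_topics):
    """Return the topics Phase 2 needs that research.md lacks.

    If the file does not exist, every topic counts as missing.
    """
    path = Path(research_path)
    if not path.is_file():
        return list(required_topics)
    text = path.read_text(encoding="utf-8").lower()
    return [topic for topic in required_topics if topic.lower() not in text]


def next_phase(research_path, required_topics):
    """Stay in (or return to) Phase 1 until nothing is missing."""
    return "phase_1" if missing_research(research_path, required_topics) else "phase_2"


TOPICS = ["auth flow", "rate limits", "pagination"]  # hypothetical Phase 2 needs

with tempfile.TemporaryDirectory() as tmp:
    research = Path(tmp) / "research.md"
    before = next_phase(research, TOPICS)    # file absent
    research.write_text("## Auth flow\n## Rate limits\n", encoding="utf-8")
    partial = next_phase(research, TOPICS)   # pagination not covered yet
    research.write_text("## Auth flow\n## Rate limits\n## Pagination\n", encoding="utf-8")
    complete = next_phase(research, TOPICS)

print(before, partial, complete)
```

The same function serves as both the entry gate and the loop back: calling it again whenever Phase 2 discovers a new need sends execution back to Phase 1.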


Pattern 3: Workflow Mode Branching

Prevalence: ~5% of skills (100-150 files)

Related patterns: Phased Execution, Intent Classification, $ARGUMENTS Pattern

What it is: Defining multiple execution modes within a single skill, each with different phase flows, guardrails, and output depth. The mode is selected based on the user’s role, arguments, or context.

When to use:

Positive Example

## Mode: Partner vs OCE

| | **Partner mode** (default) | **OCE mode** |
|---|---|---|
| **User** | Customer/partner team | Cosmic App Deployment OCE |
| **Phases** | 0 → 1 → 2(a,b,c,f Step 1, Step 4) → 4 → 8 | 0 → 1 → 2 → 3 → 4 → 4.5 → 5 → 6 → 7 → 8 |
| **Gates** | None — runs end-to-end uninterrupted | G1, G2, G3, G4 |
| **IncidentTracker correlation** | Mandatory — runs automatically | Mandatory — runs automatically |
| **Phase 4.5 (deep dive)** | Skipped | On demand |
| **Phase 5 (mitigations)** | Skipped (platform-internal) | Full |
| **Phase 6/7 (PRs/TSGs)** | Skipped | On demand |
| **Report content** | Partner-friendly: error summary, runbook links, customer actions only | Full: all buckets, platform analysis, code traces |
| **R8/R9 rules** | Disabled — no hypothesis prompts | Enabled |

Why this works: A comparison table makes mode differences instantly scannable. Each dimension (phases, gates, report content, rule activation) is explicitly specified for both modes — no ambiguity about what’s included or excluded. The default mode is marked, so the model doesn’t need to guess when no mode is specified.

Negative Example

If the user is a partner, make the output simpler and skip the technical details.
If the user is an OCE engineer, include everything. Use your judgment about what
to include based on who you think the user is.

Why this fails: “Simpler” and “everything” are undefined. “Use your judgment” creates non-deterministic behavior, so different runs may include or exclude different sections. There’s no way to verify which mode was selected or whether the right phases executed. The model must infer the user’s role rather than having it declared.
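The comparison table in the positive example is effectively a lookup structure with a declared default. A minimal sketch of that idea follows. The `Mode` class and `select_mode` function are hypothetical, only three of the table's dimensions are encoded, and the "on demand" phases of OCE mode are listed without modeling how they are triggered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    phases: tuple
    gates: tuple
    hypothesis_rules: bool  # whether the R8/R9 rules are enabled


MODES = {
    "partner": Mode(
        phases=("0", "1", "2(a,b,c,f Step 1, Step 4)", "4", "8"),
        gates=(),
        hypothesis_rules=False,
    ),
    "oce": Mode(
        # 4.5, 6 and 7 are "on demand" in the table; listed here without that nuance.
        phases=("0", "1", "2", "3", "4", "4.5", "5", "6", "7", "8"),
        gates=("G1", "G2", "G3", "G4"),
        hypothesis_rules=True,
    ),
}
DEFAULT_MODE = "partner"


def select_mode(requested=None):
    """Return the declared mode, falling back to the default; never guess."""
    if requested is None:
        return MODES[DEFAULT_MODE]
    key = requested.strip().lower()
    if key not in MODES:
        raise ValueError(f"unknown mode {requested!r}; expected one of {sorted(MODES)}")
    return MODES[key]


print(select_mode().phases)
print(select_mode("OCE").gates)
```

An unrecognized mode raises an error instead of being interpreted, which is the behavior the "use your judgment" version cannot guarantee.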


Pattern 4: $ARGUMENTS Variable Pattern

Prevalence: ~7% of skills (169 files)

Related patterns: Configuration Persistence, Intent Classification

What it is: Using the platform-injected $ARGUMENTS placeholder to receive user input at invocation time, then parsing it for structured data, flags, and options.

When to use:

Positive Example

## Arguments
$ARGUMENTS

Parse the arguments. Also check for these flags:
- `--json` — generate structured JSON output (see Step 7.8)
- `--force` — force a full review, ignoring cache and skipping early returns
- `--api-only` — skip local repo detection, use ADO API for everything

Why this works: The flags are enumerated with exact names and clear effects. Each flag maps to a specific behavioral change referenced later in the skill (Step 7.8 for JSON output). The model knows exactly what to look for in the arguments string and what each flag means.

Negative Example

The user will provide some arguments. Parse them and figure out what they want.
Handle any options they might pass.

Why this fails: No flag names are defined, so the model invents its own argument syntax. “Figure out what they want” is an open-ended NLU task, not argument parsing. There’s no connection between parsed flags and skill behavior — even if the model parses correctly, it doesn’t know what to do with the results.
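Enumerating the flags turns the task into ordinary argument parsing. The sketch below shows roughly what the model is being asked to do with the `$ARGUMENTS` string in the positive example. The option names on the right-hand side of `KNOWN_FLAGS` and the choice to collect unknown flags rather than reject them are assumptions of this illustration.

```python
import shlex

# Flag names come from the positive example; the option keys are made up here.
KNOWN_FLAGS = {
    "--json": "json_output",
    "--force": "force_full_review",
    "--api-only": "api_only",
}


def parse_arguments(raw):
    """Split an arguments string into known flags, positionals, and unknown flags."""
    options = {name: False for name in KNOWN_FLAGS.values()}
    positional, unknown = [], []
    for token in shlex.split(raw):
        if token in KNOWN_FLAGS:
            options[KNOWN_FLAGS[token]] = True
        elif token.startswith("--"):
            unknown.append(token)  # surfaced to the caller, never silently interpreted
        else:
            positional.append(token)
    return {"options": options, "positional": positional, "unknown": unknown}


print(parse_arguments("1234 --json --force"))
```

Each key in `options` then maps to one behavioral switch later in the skill, which is the link between parsing and behavior that the negative example lacks.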


Pattern 151: HARD-GATE Block Tag

Prevalence: Multi-source (3+ uses across superpowers): brainstorming, using-superpowers, executing-plans (varying tag names: HARD-GATE, SUBAGENT-STOP, BLOCKING)

Related patterns: Negative Constraints, Confirmation Gates, Iron-Law Inviolable Rule Framing

What it is: A visually distinct, all-caps tag (HARD-GATE, SUBAGENT-STOP, BLOCKING) wrapping a rule that the agent must not pass without satisfying. Distinct from Pattern 6 (which is about what to negate) — this is the visual marker that elevates a single rule above ordinary prose.

Positive Example

<HARD-GATE>
Before invoking any tool that writes to disk, you MUST have either:
  (a) explicit user approval in this turn, OR
  (b) a confirmation gate that has already fired

If neither condition holds: STOP. Output the proposed write as a plan and
ask for approval. Do not call the tool.

This gate cannot be passed by reasoning that the write is "trivial",
"reversible", or "what the user clearly wants".
</HARD-GATE>

Why this works: The XML-style tag creates a strong attention anchor, because angle-bracket tags usually mark structural boundaries in the markup a model has seen. All-caps HARD-GATE is rare in normal prose, so it doesn’t blend in. The anti-loophole clause names the most common rationalizations and rules them out in advance. When multiple skills use the same tag pattern, the model meets one consistent cross-skill convention instead of a new one per skill.

Negative Example

Important: before writing to disk, make sure you have approval. Don't write
without checking first.

Why this fails: The rule reads like ordinary instructional prose — the model can paraphrase it as “be careful with writes” and continue. No visual anchor means it doesn’t survive attention competition with surrounding context.
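Stripped of its framing, the gate in the positive example is a two-condition predicate whose result cannot be changed by any justification. The sketch below makes that explicit. The tool names and the function itself are hypothetical, and in practice the rule is enforced by the prompt, not by code like this.

```python
WRITE_TOOLS = {"write_file", "edit_file"}  # hypothetical names of disk-writing tools


def check_write_gate(tool_name, user_approved_this_turn, confirmation_gate_fired, rationale=""):
    """Decide whether a tool call may proceed under the hard gate.

    `rationale` is accepted but deliberately ignored: arguments such as
    "trivial" or "reversible" never open the gate.
    """
    if tool_name not in WRITE_TOOLS:
        return {"allowed": True, "action": "call_tool"}
    if user_approved_this_turn or confirmation_gate_fired:
        return {"allowed": True, "action": "call_tool"}
    return {"allowed": False, "action": "output_plan_and_ask_for_approval"}


print(check_write_gate("write_file", False, False, rationale="the change is trivial"))
```

The unused `rationale` parameter mirrors the anti-loophole clause: the reasoning is heard, but it is not an input to the decision.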


Pattern 152: DOT-Graph Decision Flow Embedded in Prompt

Prevalence: Multi-source (5+ skills): triage, code-review, verify, finishing-a-development-branch, and others

Related patterns: Workflow Mode Branching, Intent Classification, Chart Decision Tree with Anti-Pattern Guards

What it is: A small Graphviz/DOT-syntax graph embedded directly in the skill body that encodes the decision flow as labeled edges between named states. Unlike a Markdown decision tree (ASCII branches), the DOT format is more compact, supports cycles, and reads naturally as a state machine.

Positive Example

## Triage State Machine

```dot
digraph triage {
  start -> classify [label="incoming issue"];
  classify -> bug [label="repro steps + error"];
  classify -> feature [label="behaviour change request"];
  classify -> question [label="usage / how-to"];
  classify -> clarify [label="ambiguous"];

  bug -> severity [label="reproduced locally"];
  severity -> p0 [label="data loss / outage"];
  severity -> p1 [label="user-blocking"];
  severity -> p2 [label="degraded"];

  clarify -> classify [label="user answered"];
  question -> close [label="answer + link docs"];
}
```

Apply the graph: start at `start`, follow the edge whose label matches the
input, end at a terminal node (p0, p1, p2, feature, close). If no edge matches,
the input is out-of-scope — escalate.

Why this works: DOT syntax is unambiguous, and the graph reads as a state machine. Cycles (clarify → classify) handle iteration naturally, which an indented Markdown tree cannot express. The labeled edges name the condition for each transition, not just the next node, so the model knows which branch to take.

Negative Example

Triage workflow:
- If it's a bug, find the severity
- For features, route to product
- For questions, answer or link docs
- If unclear, ask the user

Why this fails: No explicit start state, no terminal states, no cycle structure. The model can re-enter the wrong branch and not know it. “If unclear, ask the user” has no transition back into the flow once the user answers.
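To make the state-machine reading concrete, the triage graph from the positive example can be transcribed into a transition table and walked. This is an illustrative sketch only: the `walk` function is hypothetical, and exact string matching on edge labels stands in for the model's judgment about which label fits the input.

```python
# Edges transcribed from the DOT graph: state -> {edge label -> next state}.
TRANSITIONS = {
    "start": {"incoming issue": "classify"},
    "classify": {
        "repro steps + error": "bug",
        "behaviour change request": "feature",
        "usage / how-to": "question",
        "ambiguous": "clarify",
    },
    "bug": {"reproduced locally": "severity"},
    "severity": {"data loss / outage": "p0", "user-blocking": "p1", "degraded": "p2"},
    "clarify": {"user answered": "classify"},
    "question": {"answer + link docs": "close"},
}
TERMINAL = {"p0", "p1", "p2", "feature", "close"}


def walk(labels, state="start"):
    """Follow edge labels from `state`; stop at a terminal node or escalate."""
    path = [state]
    for label in labels:
        if state in TERMINAL:
            break
        edges = TRANSITIONS.get(state, {})
        if label not in edges:
            path.append("escalate")  # no matching edge: input is out of scope
            return path
        state = edges[label]
        path.append(state)
    return path


print(walk([
    "incoming issue", "ambiguous", "user answered",
    "repro steps + error", "reproduced locally", "user-blocking",
]))
```

The example path passes through `clarify` and back into `classify` before ending at `p1`, which is the re-entry that the bulleted version has no way to describe.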