Skill Architecture Patterns — Lessons from Open-Source Repos

Structural patterns for how skills are organized, packaged, and composed — extracted from Anthropic’s official skills repo and ComposioHQ’s awesome-claude-skills.

Data sources:

Anthropic official skills: 17 official skills, marketplace.json, template/SKILL.md
Community curated skills: 32 unique + 832 Composio skills, CONTRIBUTING.md
Extraction date: 2026-04-15

1. The Canonical Skill Structure

Both repos converge on this minimal structure:

skill-name/
├── SKILL.md          # REQUIRED: YAML frontmatter + instructions
├── LICENSE.txt        # OPTIONAL: per-skill license
├── scripts/           # OPTIONAL: helper scripts (Python, Bash, JS)
├── reference/         # OPTIONAL: domain knowledge files
├── templates/         # OPTIONAL: output templates
├── examples/          # OPTIONAL: example inputs/outputs
├── agents/            # OPTIONAL: sub-agent definitions
├── assets/            # OPTIONAL: fonts, images, schemas
└── themes/            # OPTIONAL: pre-built configurations

SKILL.md Anatomy

---
name: lowercase-hyphenated-name
description: One line explaining WHEN to activate and WHAT it does
license: "Complete terms in LICENSE.txt" | "Proprietary" | omitted
---

# Skill Title (optional — some skills skip this)

[Body: instructions, workflows, constraints, examples]

Key Design Decisions

Decision	Anthropic Official	Community
`name` format	`lowercase-hyphenated`	`lowercase-hyphenated` (enforced by CONTRIBUTING.md)
`description` purpose	Trigger condition for Claude	Trigger condition (same intent)
`license` field	15/17 skills include it	~10% include it
Body size	30-590 lines	30-500 lines
Sections	Varies by complexity	5-section template recommended

2. Description Writing — The Trigger Mechanism

The description field is the most important line in any skill. It determines when Claude activates the skill. Analysis of 50+ descriptions reveals distinct patterns:

Pattern A: Action + Domain + Trigger Keywords

description: >-
  Guide for creating high-quality MCP servers that enable LLMs to
  interact with external services. Use when building MCP servers to
  integrate external APIs.

Structure: {what it does}. {when to use it}.

Pattern B: Capability List

description: >-
  Toolkit for interacting with and testing local web applications
  using Playwright. Supports verifying frontend functionality,
  debugging UI behavior, capturing browser screenshots, and viewing
  browser logs.

Structure: {primary capability}. Supports {capability}, {capability}, {capability}.

Pattern C: Negative Boundary

description: >-
  Suite of tools for creating elaborate HTML artifacts using React,
  Tailwind CSS, shadcn/ui. Use for complex artifacts requiring state
  management — not for simple single-file HTML artifacts.

Structure: {what it does}. Use for {scope} — not for {anti-scope}.

Pattern D: Emotional Hook (Community)

description: >-
  Automatically creates user-facing changelogs from git commits.
  Turns hours of manual changelog writing into minutes of automated
  generation.

Structure: {what it does}. {value proposition — time saved}.

Best Practice

Combine A + C: State the action, the domain, the trigger condition, AND the negative boundary.

description: >-
  Migrate .NET Framework projects to .NET Core multi-targeting.
  Use when the user says "start migration" or provides a .csproj path.
  NOT for greenfield .NET projects or projects already on .NET Core.

3. The Five-Section Community Template

From awesome-claude-skills/CONTRIBUTING.md — the recommended sections for any skill:

# {Skill Name}

## When to Use
- [Trigger conditions — bullet list]
- Keywords: [comma-separated trigger words]

## What This Skill Does
- [Capabilities — bullet list]
- [Limitations — explicit boundaries]

## How to Use
### Basic Usage
[Simplest invocation]

### Advanced Usage
[With options/flags]

## Example
[Complete, realistic input → output]

## Tips
- [Power user advice]
- [Common pitfalls]
- [Related workflows]

Analysis: Which Sections Matter Most?

Section	Impact on Quality	Prevalence
How to Use (steps)	Critical — provides the execution path	~90% of skills
Example (populated)	High — grounds the output format	~65% of skills
When to Use (scope)	Medium — prevents mis-activation	~40% of skills
Tips (pitfalls)	Medium — prevents common errors	~50% of skills
What This Skill Does	Low — redundant with description	~30% of skills

4. Marketplace Packaging

Anthropic’s marketplace.json groups 17 skills into 3 installable packages:

{
  "plugins": [
    {
      "name": "document-skills",
      "skills": ["xlsx", "docx", "pptx", "pdf"]
    },
    {
      "name": "example-skills",
      "skills": ["algorithmic-art", "brand-guidelines", "canvas-design", ...]
    },
    {
      "name": "claude-api",
      "skills": ["claude-api"]
    }
  ]
}

Grouping Strategy

Strategy	When to Use	Example
By domain	Skills share resources (schemas, helpers)	document-skills: xlsx + docx + pptx + pdf
By audience	Skills target same user type	example-skills: creative + testing + building
Standalone	Skill is self-contained and specialized	claude-api

The document-skills package shares:

39 XSD schemas (OOXML validation)
Python helper scripts (file manipulation)
Office helpers (style management, XML editing)

This reduces total package size vs. 4 independent skills.

5. Script Integration Patterns

Pattern A: Self-Documenting Script (–help)

Always run `python scripts/with_server.py --help` before first use.
Use the script as a black box — do not modify it.

Used by: webapp-testing, skill-creator

Pattern B: Init + Build Pipeline

Run `bash scripts/init-artifact.sh` (creates project structure)
[User develops in the created structure]
Run `bash scripts/bundle-artifact.sh` (produces single output file)

Used by: web-artifacts-builder, artifacts-builder

Pattern C: Validation Script

Before packaging, run validation:
`python scripts/package_skill.py --validate`
Checks: frontmatter present, description non-empty, no broken references

Used by: skill-creator

Pattern D: Data Processing Script

Use bundled scripts for file processing:
- `scripts/read_pdf.py` — extract text from PDF
- `scripts/create_pdf.py` — generate PDF with reportlab
- `scripts/merge_pdfs.py` — combine multiple PDFs

Used by: pdf, docx, pptx, xlsx

6. Reference File Organization

Flat References (simple skills)

mcp-builder/
├── SKILL.md
└── reference/
    ├── mcp_best_practices.md
    ├── node_mcp_server.md
    ├── python_mcp_server.md
    └── evaluation.md

Hierarchical References (complex skills)

claude-api/
├── SKILL.md
├── shared/           # Cross-language
│   ├── models.md
│   ├── prompt-caching.md
│   └── tool-use-concepts.md
├── python/
│   └── claude-api/
│       ├── README.md
│       ├── streaming.md
│       └── tool-use.md
├── typescript/
│   └── claude-api/
│       ├── README.md
│       └── ...
└── [5 more languages]

Theme/Variant References

theme-factory/
├── SKILL.md
└── themes/
    ├── arctic-frost.md
    ├── ocean-depths.md
    ├── sunset-boulevard.md
    └── [7 more themes]

Key Insight: Reference Files as Progressive Disclosure Tier 3

All reference files follow the Progressive Disclosure pattern (Pattern 100):

SKILL.md tells the model WHICH file to read
The model reads the file only when entering the relevant phase/workflow
If the workflow doesn’t require that file, it’s never loaded

7. Sub-Agent Architecture

Only 2 of 49 non-Composio skills use sub-agents, both from Anthropic:

skill-creator: 3 Sub-Agents

Agent	Role	When Spawned	Memory
`grader`	Evaluate test results	After each eval run	Isolated (no skill context)
`comparator`	Blind A/B comparison	After two versions run	Isolated (doesn’t know version labels)
`analyzer`	Extract improvements	After comparison	Has both transcripts + results

doc-coauthoring: 1 Sub-Agent

Agent	Role	When Spawned	Memory
naive-reader	Fresh-eyes review	After document draft	Isolated (only has the document)

Key Architectural Rule

Sub-agents get LESS context, not MORE. Their value comes from what they DON’T know:

Grader doesn’t know the skill’s intent → judges output purely
Comparator doesn’t know version labels → eliminates bias
Naive-reader doesn’t know the writing process → finds genuine confusion

8. Composio’s Universal Template

832 skills use a near-identical template with only 3 variables:

Variable	Example Values
`{service_name}`	“slackbot”, “atlassian”, “googledrive”
`{toolkit_name}`	“SLACKBOT”, “ATLASSIAN”, “GOOGLEDRIVE”
`{specific_tools}`	Varies — some skills list tools, most don’t

Template Quality Tiers

Tier	Count	Characteristic
Generic	~780	3-step (Search → Connect → Execute), no specific tools listed
Enriched	~50	Lists specific tool names, parameter tables, common patterns
Custom	~2	Fully custom content beyond the template

Lesson: Templates Enable Scale, Not Quality

Composio achieved 832 skills by templating. But the generic skills are low-quality — they provide no domain-specific guidance. The enriched variants (like googledocs-automation) that list specific tools and pitfalls are significantly more useful.

For our plugin: Templates are useful for generating boilerplate (PR descriptions, branch names) but the high-value content (error handling rules, fix strategies) must be hand-crafted per domain.

9. Cross-Repo Comparison Summary

Dimension	Anthropic Official	Community	Our Plugin (migration plugin)
Skills count	17	32 unique + 832 template	21 skills + 3 agents + 4 commands
Avg SKILL.md size	~200 lines	~120 lines	~150 lines
Progressive Disclosure	Explicit 3-tier	Rare	Explicit (`<always-loaded>` / `<on-demand>`)
Sub-agents	2 skills use them	0 skills	3 agents (build-repair, ado, dependency)
Script bundling	5 skills (Python, Bash)	0 skills	2 skills (Python)
Eval framework	Built-in (skill-creator)	None	dev-prompt-review (custom)
Safety boundaries	Provider guards, anti-slop	Minimal	Extensive (7 guardrails, 4 tiers, 99 patterns)
Domain knowledge	SDK docs, font catalogs	Algorithm internals, SaaS APIs	Error handling rules, package maps, project maps
Output verification	Visual QA sub-agent	None	Build loop (7 attempts)

What We Can Learn From Each

From Anthropic Official:

Progressive Disclosure is worth formalizing (we already do this)
Anti-Slop naming is powerful for creative/generative output
Blind A/B comparison for skill quality evaluation
Visual QA for generated artifacts

From Community:

Clarifying Questions are universally adopted — we should ensure Phase 0 does this
Populated Examples are more effective than abstract schemas
The 5-section template is a good minimum bar for any new skill
Emotional hooks in descriptions increase human engagement

From Composio:

Universal templates enable scale but not quality
The Search → Connect → Execute pattern is a powerful abstraction for any dynamic API
Known Pitfalls sections prevent repeated failures across users