Galen Guan

How We Adapted Quality Playbook into Hermes: Extraction, Not Duplication

When you encounter a well-crafted third-party skill with nearly 10,000 installs and a 479-line instruction set, the instinct is to bring the whole thing over. More is better, right?

Wrong. The art of skill adaptation is subtraction, not addition. It's about extracting what's genuinely novel and discarding what your existing ecosystem already handles. This is the story of how we adapted Andrew Stellman's quality-playbook-generator — and ended up creating two skills from a 479-line behemoth while leaving behind four redundant artifacts.

The Source Material

Quality Playbook Generator (v1.2.0, 9.5K installs from github/awesome-copilot) is an impressive piece of work. Its 479-line SKILL.md orchestrates a multi-phase process that produces six files: a quality constitution, functional tests, a code review protocol, an integration test protocol, a "Council of Three" spec audit, and an AGENTS.md bootstrap.

The core innovation is what the skill calls "finding skeletons" — systematically searching for defensive code patterns as evidence of past failures. Every try/except, null check, and retry loop is the codebase whispering its history. This forensic approach to quality is genuinely novel and language-agnostic. It works for Python, Java, Scala, TypeScript, Go, and Rust.
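To make the "finding skeletons" idea concrete, here is a minimal sketch of a scan for the three pattern families named above (exception handling, null checks, retry logic). The category names, the regexes, and the sample snippet are illustrative assumptions, not the patterns shipped with the original skill.

```python
import re
from collections import Counter

# Hypothetical pattern table: names and regexes are assumptions chosen for
# illustration. They are deliberately loose so they fire across languages.
DEFENSIVE_PATTERNS = {
    "exception_handling": re.compile(r"^\s*(try\s*:|except\b|catch\s*\(|\}\s*catch\b)"),
    "null_check": re.compile(r"\bis\s+(not\s+)?None\b|[!=]==?\s*(null|nil)\b"),
    "retry_logic": re.compile(r"retr(y|ies)", re.IGNORECASE),
}


def find_defensive_patterns(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, category, stripped_line) for every matching line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for category, pattern in DEFENSIVE_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, category, line.strip()))
    return findings


def summarize(findings) -> Counter:
    """Count findings per category."""
    return Counter(category for _, category, _ in findings)


# Made-up sample input, only here to exercise the scan.
SAMPLE = '''\
def fetch(client, url, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.get(url)
        except TimeoutError:
            continue
        if response is not None:
            return response
    return None
'''

if __name__ == "__main__":
    for lineno, category, text in find_defensive_patterns(SAMPLE):
        print(f"{lineno:>3}  {category:<20} {text}")
```

Each hit is only a lead: the useful step is asking which past failure made someone write that guard.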

But the full playbook also generates four protocol documents that our existing skill graph already handles:

  • Code review protocols → covered by code-review-and-quality
  • Integration test protocols → covered by testing-patterns
  • Spec audit methodology → covered by security-auditor and code-reviewer
  • Quality constitution → overlaps with domain-glossary and documentation-and-adrs

Duplicating these into standalone protocols would create a parallel quality system competing with — instead of integrating into — our existing graph. That's the anti-pattern we explicitly avoid.

The Methodology: Cost → Extract → Compose → Integrate

Third-Party Skill Adaptation Pipeline — The Hermes Methodology

Our third-party skill adaptation process follows a strict sequence:

Step 1: Cost Model Check (Before Source Review)

This comes first, not last. Quality Playbook requires no API keys, no cloud services, no subscription fees. It operates entirely on local code. Score: Free SaaS. Proceed.

If this step had revealed paid API requirements, we would have stopped immediately. This is the Skywork rule: seven skills were integrated and later removed from our library because a $19.99/month subscription disqualified them, regardless of their 7.9/10 technical score.
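The ordering matters more than the check itself, so a small sketch may help. It assumes a hypothetical CostProfile record and adapt function; neither is part of Hermes. The point it illustrates is that the technical score is never consulted when the cost gate fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostProfile:
    """Hypothetical summary of what a skill needs in order to run."""
    needs_api_key: bool
    needs_cloud_service: bool
    monthly_fee_usd: float


def cost_gate(profile: CostProfile) -> bool:
    """Pass only if the skill runs with no keys, no cloud services, no fees."""
    return not (
        profile.needs_api_key
        or profile.needs_cloud_service
        or profile.monthly_fee_usd > 0
    )


def adapt(name: str, profile: CostProfile, technical_score: Optional[float] = None) -> str:
    # The gate runs before any source review; technical_score is accepted
    # but intentionally ignored when the gate fails.
    if not cost_gate(profile):
        return f"{name}: stop (fails cost model)"
    return f"{name}: proceed to source review"


if __name__ == "__main__":
    print(adapt("quality-playbook", CostProfile(False, False, 0.0)))
    print(adapt("paid-skill", CostProfile(False, False, 19.99), technical_score=7.9))
```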

Step 2: Source Review and Security Audit

Read every line of the 479-line SKILL.md. Check for: hardcoded credentials, external data flows, license compatibility, and privilege escalation vectors. Quality Playbook is clean — MIT licensed by Andrew Stellman, no network calls, no credential handling. The only "dependency" is the assumption that an AI agent can run grep and read files.
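Part of this review can be pre-screened mechanically before the line-by-line read. The sketch below flags lines that look like credentials, external data flows, or privilege escalation. The regexes and flag names are assumptions made for illustration, and a hit only means "look here", since a URL in a skill file may be a harmless documentation link. License compatibility still has to be checked by hand.

```python
import re

# Illustrative red-flag patterns; these are assumptions, not an official
# Hermes checklist, and they will produce false positives.
RED_FLAGS = {
    "hardcoded_credential": re.compile(
        r"(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
    "external_data_flow": re.compile(r"https?://|\b(curl|wget)\b", re.IGNORECASE),
    "privilege_escalation": re.compile(r"\bsudo\b|\bchmod\s+777\b", re.IGNORECASE),
}


def audit_skill_text(text: str) -> dict[str, list[int]]:
    """Return, per red flag, the 1-based line numbers that need human review."""
    hits = {name: [] for name in RED_FLAGS}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RED_FLAGS.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits


def is_clean(hits: dict[str, list[int]]) -> bool:
    """True when no red flag matched anywhere."""
    return not any(hits.values())


if __name__ == "__main__":
    sample = "Run grep over the project and read the matching files."
    print(is_clean(audit_skill_text(sample)))
```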

Step 3: Multi-Dimensional Evaluation

Score each deliverable against existing alternatives:

| Deliverable | Novelty | Existing Coverage | Action |
| --- | --- | --- | --- |
| Defensive pattern audit | High: unique forensic approach | None | EXTRACT |
| Fitness-to-purpose scenarios | High: grounded in code evidence | None | EXTRACT |
| QUALITY.md constitution | Medium: overlaps with domain docs | domain-glossary, documentation-and-adrs | DISCARD (integrate pointers) |
| Functional test generation | Low: well covered | test-driven-development, testing-patterns | DISCARD |
| Code review protocol | Low: well covered | code-review-and-quality | DISCARD |
| Council of Three audit | Medium: multi-model idea is interesting | code-reviewer, security-auditor | DISCARD (reference approach in existing skills) |

Step 4: Tier Mapping

The extracted forensic methodology naturally maps to two tiers:

  1. Atom: defensive-code-audit — Single-purpose, deterministic grep-and-classify workflow. Language-agnostic patterns, risk domain classification, scenario generation format. No sub-skill dependencies.

  2. Molecule: codebase-quality-forensics — Chains defensive-code-audit with existing domain-glossary, code-review-and-quality, and documentation-and-adrs. Orchestrates the full forensic pipeline: discovery → defensive pattern mining → quality posture report → presentation and iteration. Declares requires_skills and feeds_into edges.

Step 5: Rewrite, Never Copy-Paste

Third-party skill descriptions are routinely keyword-stuffed (the quality-playbook description runs over 800 characters and lists every possible trigger phrase). Hermes descriptions are concise: a single "Use when..." sentence under 200 characters.

The body rewrite is equally important. The original 479-line SKILL.md includes detailed instructions for generating PowerPoint-style reports and browser-based review viewers, features that require infrastructure we don't have and wouldn't use. Our adapted versions are 150 and 250 lines respectively, each focused on execution, not presentation.

Step 6: Script Handling

Quality Playbook includes no scripts — only reference files (defensive_patterns.md, schema_mapping.md, constitution.md, etc.). These are methodological guidance, not executable code. Our adaptation embeds the grep patterns directly in the skill body rather than externalizing them as reference files. The patterns are the skill — hiding them behind a reference pointer adds indirection without value.

Step 7: Graph Integration

The final and most critical step: wiring the new skills into the existing graph.

defensive-code-audit (atom)
    feeds_into → codebase-quality-forensics

codebase-quality-forensics (molecule)
    requires_skills → [defensive-code-audit, code-review-and-quality, domain-glossary, documentation-and-adrs]
    feeds_into → [coding-build-ship, testing-patterns]

The feeds_into edges are the integration points. coding-build-ship can now reference forensic findings before implementation. testing-patterns can generate regression tests for discovered failure modes. The new skills don't compete with the existing graph — they extend it.
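The edges above can be modelled as a small dependency graph and checked for cycles. This is a sketch under stated assumptions: the Skill dataclass and helper names are hypothetical, and it reads a feeds_into edge from A to B as "B depends on A", which is our interpretation rather than a documented Hermes rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    """Hypothetical in-memory view of a skill's graph edges."""
    name: str
    requires_skills: list = field(default_factory=list)
    feeds_into: list = field(default_factory=list)


SKILLS = [
    Skill("defensive-code-audit", feeds_into=["codebase-quality-forensics"]),
    Skill(
        "codebase-quality-forensics",
        requires_skills=[
            "defensive-code-audit",
            "code-review-and-quality",
            "domain-glossary",
            "documentation-and-adrs",
        ],
        feeds_into=["coding-build-ship", "testing-patterns"],
    ),
]


def dependency_map(skills) -> dict:
    """Map each skill name to the set of skills it depends on."""
    deps = {}
    for skill in skills:
        deps.setdefault(skill.name, set()).update(skill.requires_skills)
        for target in skill.feeds_into:
            # Assumption: the downstream skill depends on the one feeding it.
            deps.setdefault(target, set()).add(skill.name)
    return deps


def find_cycle(deps: dict) -> Optional[list]:
    """Depth-first search; return one cycle as a list of names, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(deps.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(deps):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(dependency_map(SKILLS)))
```

Skills that appear only as requirements (for example domain-glossary) are treated as leaves here because their own edges are not listed in this article.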

What We Left Behind (And Why That's Good)

The original playbook generated six files. We adapted exactly one capability (defensive pattern mining) and one workflow (forensic pipeline orchestration). Everything else was already covered:

  • Functional tests → test-driven-development handles test generation with RED-GREEN-REFACTOR discipline. Adding parallel functional test generation from a different philosophy creates confusion about which approach to use.

  • Code review protocol → code-review-and-quality already includes guardrails (line numbers, read bodies, grep before claiming). A parallel review protocol document adds documentation overhead without new capability.

  • Integration test protocol → testing-patterns covers integration test strategy. The quality-playbook approach of generating markdown protocol documents is a presentation choice, not a capability, and not one we've adopted.

  • Council of Three audit → The multi-model audit idea is genuinely interesting but requires infrastructure (three different AI models, independent runs, results merging). Our code-reviewer and security-auditor skills cover the same surface area with less ceremony.

  • Quality constitution → domain-glossary maintains domain terminology across sessions. documentation-and-adrs records architectural decisions. A standalone QUALITY.md adds fragmentation without new information.

The lesson: third-party skills often conflate "what we do" with "how we present what we do." The methodology is valuable; the presentation format is an implementation detail. Extract the methodology, integrate it into your existing graph, and discard the presentation artifacts.

Results and Verification

After creation, both skills pass the composition integrity checks:

  • Frontmatter starts at byte 0 with valid --- delimiters
  • name fields are lowercase + hyphens, under 64 characters
  • defensive-code-audit has empty requires_skills: [] (correct for atom)
  • codebase-quality-forensics lists four dependency skills (correct for molecule)
  • No circular dependencies in the extended graph
  • Both skills appear in skills_list under software-development
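The frontmatter checks in this list, together with the "Use when..." description rule from Step 5, are simple enough to script. The following is a minimal sketch, not the actual Hermes validator: it parses only flat "key: value" lines with the standard library, and the sample frontmatter (including its description text) is invented for the example.

```python
import re

# Lowercase letters separated by single hyphens, per the naming rule above.
NAME_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def split_frontmatter(raw: bytes) -> dict:
    """Parse flat 'key: value' frontmatter that must begin at byte 0."""
    if not raw.startswith(b"---\n"):
        raise ValueError("frontmatter must start at byte 0 with '---'")
    text = raw.decode("utf-8")
    end = text.find("\n---", 3)
    if end == -1:
        raise ValueError("missing closing '---' delimiter")
    fields = {}
    for line in text[4:end].splitlines():
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "-", "#")):
            fields[key.strip()] = value.strip()
    return fields


def check_skill(raw: bytes) -> list:
    """Return a list of human-readable problems; empty means all checks pass."""
    try:
        fields = split_frontmatter(raw)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    name = fields.get("name", "")
    if not NAME_RE.fullmatch(name):
        problems.append("name must be lowercase words joined by hyphens")
    if len(name) >= 64:
        problems.append("name must be under 64 characters")
    description = fields.get("description", "").strip("\"'")
    if not description.startswith("Use when"):
        problems.append("description should start with 'Use when'")
    if len(description) >= 200:
        problems.append("description must be under 200 characters")
    return problems


# Invented sample; the description wording is a placeholder.
GOOD = b"""---
name: defensive-code-audit
description: Use when auditing a codebase for defensive patterns that hint at past failures.
requires_skills: []
---
Skill body goes here.
"""

if __name__ == "__main__":
    print(check_skill(GOOD))
```

A leading blank line or byte-order mark before the first delimiter fails the byte-0 check, which is the failure this rule exists to catch.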

The adaptation is also verified against our security checklist:

  • No auth tokens, network calls, or external dependencies
  • No telemetry, tracking, or analytics code
  • File access limited to local project directories
  • MIT license permits integration

The final state: two tightly focused skills that extend the forensic and quality surface area of our existing graph. No duplication. No fragmentation. No parallel quality systems competing for attention. Just clean, composable extension of what was already there.

This is the adaptation methodology we'll apply to every future third-party skill integration. Extract what's novel. Discard what your graph already handles. Wire the extraction into the graph with explicit edges. Ship.