Galen Guan

Agent Skills Ecosystem Survey: What 8 Categories Reveal About AI Agent Evolution

The agent skills ecosystem is growing fast. The skills.sh registry now tracks thousands of skills across hundreds of categories. But installation counts don't tell the whole story. A skill with 180K installs might be superseded by a better architectural pattern. A skill with 100 installs might contain the exact methodology your project needs.

Over the past week, we surveyed eight skill categories: skill-creator, tmux, documentation/readme/api-docs, refactoring/code review, git/changelog/release, web-search/extraction, Playwright testing, and skill discovery. Here's what we learned — and what it says about the state of AI agent tooling.

The Survey Scorecard

[Figure: Skills Ecosystem Survey Scorecard, adoption results]

The Survey Framework

Each category was evaluated across five dimensions:

| Dimension | Questions |
| --- | --- |
| Installation Velocity | How many installs? Growing or plateauing? |
| Architectural Fit | Does it compose with other skills or stand alone? |
| Capability Gap | Does our existing skill graph already cover this? |
| Cost Model | Free? Freemium? Paid-only with automatic disqualification? |
| Adaptation Potential | Can we extract novel methodology while discarding redundancy? |

The cost model filter is non-negotiable. Paid-only skills are automatically skipped regardless of technical merit — a hard rule learned from integrating and then removing seven Skywork skills.
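To make the evaluation order concrete, here is a minimal sketch of how the five dimensions could be turned into a triage function. Only the cost-model hard filter comes from the rule above; the `Candidate` fields, the `triage` name, the ordering of the remaining checks, and the result labels are illustrative assumptions, not the procedure we actually ran.

```python
from dataclasses import dataclass
from enum import Enum


class CostModel(Enum):
    FREE = "free"
    FREEMIUM = "freemium"
    PAID_ONLY = "paid-only"


@dataclass
class Candidate:
    """One surveyed skill, scored on the five dimensions (hypothetical shape)."""
    name: str
    installs: int             # installation velocity (raw count only)
    composes: bool            # architectural fit
    covered_locally: bool     # capability gap: True if our graph already covers it
    cost: CostModel           # cost model
    novel_methodology: bool   # adaptation potential


def triage(c: Candidate) -> str:
    # Hard filter first: paid-only skills are skipped regardless of merit.
    if c.cost is CostModel.PAID_ONLY:
        return "skip (paid)"
    # Assumed ordering from here on: redundancy is checked before fit.
    if c.covered_locally:
        return "extract methodology" if c.novel_methodology else "skip (redundant)"
    return "adopt" if c.composes else "evaluate further"


# A paid-only skill is rejected before install counts are even considered.
paid = Candidate("web-search", 8400, False, True, CostModel.PAID_ONLY, False)
print(triage(paid))
```

Putting the cost check first mirrors the "non-negotiable" wording: no combination of the other four dimensions can rescue a paid-only skill.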

Category 1: Skill Creator (3 versions surveyed)

Anthropic Skills (@skill-creator) — 182.8K installs

The dominant standard. A 485-line SKILL.md with a rigorous evaluation pipeline: parallel subagent testing, baseline comparisons, quantitative benchmarking, and description optimization via 20-eval-query sets. Architectural weakness: a flat structure with no dependency graph.

OpenClaw Skill Creator (@skill-creator) — 2.8K installs (in steipete/clawdis)

Part of a massive 50+ skill library. Rich metadata (emoji, OS constraints, auto-install instructions). But same architectural limitation: standalone skills with no composition model.

Codex Skill Creator (@skill-creator) — 1.3K installs (in openai/skills)

Lightweight with init_skill.py and package_skill.py automation. Philosophy: "Codex is already very smart — only add context it doesn't have." Clean but narrow.

Verdict: All three are superseded by Hermes Skill Graph 2.0's tiered composition model. The evaluation methodology is valuable as inspiration, but the flat architecture limits leverage. We keep the forensic approach (defensive pattern mining) and discard the standalone evaluation pipeline.

Category 2: Tmux Session Control

steipete/clawdis@tmux — 2.8K installs

Clean, focused skill for remote-controlling tmux sessions: send-keys, capture-pane, session management. Designed for monitoring Claude Code sessions in tmux.
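For readers unfamiliar with the pattern, the following sketch shows the kind of tmux invocations such a skill wraps. It is not taken from the skill itself: the helper names and the session name are hypothetical, and the only things assumed about tmux are the standard `send-keys` (with `-l` for literal text) and `capture-pane` (with `-p` to print and `-S` to set the start line) subcommands.

```python
import subprocess


def send_text_cmds(session: str, text: str) -> list[list[str]]:
    """Build the commands to type text into a tmux session, then press Enter.

    The text is sent with -l (literal) so that a word such as "Enter" inside
    it is typed out rather than interpreted as a key name. The Enter key is
    sent as a separate command.
    """
    return [
        ["tmux", "send-keys", "-t", session, "-l", text],
        ["tmux", "send-keys", "-t", session, "Enter"],
    ]


def capture_cmd(session: str, history_lines: int = 200) -> list[str]:
    """Build the command that prints the pane contents, including scrollback."""
    return ["tmux", "capture-pane", "-p", "-t", session, "-S", f"-{history_lines}"]


def read_pane(session: str) -> str:
    """Run capture-pane and return what is on screen (requires tmux installed)."""
    result = subprocess.run(capture_cmd(session), capture_output=True, text=True, check=True)
    return result.stdout
```

Separating the literal text from the Enter keypress is one plausible reading of "safe sending" for interactive TUIs; the skill may do this differently.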

The skill itself is well-written — 170 lines, clear when-to-use/when-not-to-use, safe sending patterns for interactive TUIs. But the capability gap analysis is decisive: Hermes' native terminal(background=true) with process polling provides equivalent functionality without the tmux dependency layer. For long-running Claude Code sessions, terminal(pty=true) handles interactive I/O directly.

Verdict: Skip. Hermes' terminal tool is architecturally superior — one tool handles interactive, background, and foreground execution without an external multiplexer. The tmux skill is valuable for OpenClaw/Claude Code users who need to manage sessions across SSH disconnects; Hermes doesn't have that constraint.

Category 3: Documentation / README / API-Docs

This category was unexpectedly sparse. The top results: sgcarstrends/readme-updates (61 installs), ruvnet/ruflo@api-docs (57 installs), uinaf/skills@docs (36 installs). None crossed the 100-install threshold that signals community validation.

The ecosystem gap is interesting. Documentation generation is a task every developer needs, yet no dominant skill has emerged. Possible explanations:

  • Documentation is too context-specific for a generic skill
  • AI agents handle README/doc generation well with basic tools
  • The fragmentation of documentation formats (Markdown, reStructuredText, AsciiDoc, Javadoc, Sphinx) makes a unified skill difficult

Verdict: No adoption targets. Our existing domain-glossary handles terminology documentation, documentation-and-adrs handles architectural decision records, and raw content authoring is well within an agent's baseline capabilities.

Category 4: Refactoring / Code Review

github/awesome-copilot@review-and-refactor — 9.5K installs

The standout in this category, and part of the same repository as quality-playbook-generator, the one skill this survey ended up adopting. The refactoring methodology emphasizes a systematic approach over mechanical transformations.

Verdict: Partially superseded. Our code-review-and-quality skill handles review protocols. clean-code-solid-security defines refactoring quality baselines. django-monolith-to-package-refactor and vue-service-extraction handle language-specific refactoring patterns. The review-and-refactor skill's value is its systematic methodology, which we've already absorbed into our tiered approach.

Category 5: Git / Changelog / Release

paperclipai/paperclip@release-changelog — 106 installs

Barely above the threshold. The Paperclip ecosystem is interesting — it's a control plane for AI-agent companies with governance-aware release management. But at 106 installs and tied to a specific platform, it's not general-purpose enough for adoption.

The broader observation: changelog generation and release management are surprisingly underserved. The next results, johnlindquist/claude@changelog (33 installs) and moonshotai/kimi-cli@gen-changelog (26 installs), are smaller still. No skill has broken 1K installs.

Verdict: No adoption. Our git-safe-commit-push handles commit discipline. Release management is handled through worktree orchestration (teardown = commit + push + merge). The changelog/release automation niche remains open — potentially a future atom.

Category 6: Web Search / Content Extraction

inference-skills/skills@web-search — 8.4K installs

The dominant skill in this category, but immediately disqualified under our cost model rules. It requires inference.sh CLI (belt) which depends on Tavily Search and Exa APIs — both paid services beyond their initial credit tiers.

The skill's description is keyword-stuffed: 300+ characters listing every possible use case. The actual functionality: five apps (Tavily Search, Tavily Extract, Exa Search, Exa Answer, Exa Extract), all requiring paid API access.

Verdict: Automatic SKIP (paid API dependency). Our web-research-via-curl skill provides free DuckDuckGo-based research. daily-news-brief handles structured news collection. The web-search category confirms a pattern: paid API-dependent skills may dominate installation counts, but free alternatives exist that integrate more cleanly into agent workflows.

Category 7: Testing / Playwright

bobmatnyc/claude-mpm-skills@playwright-e2e-testing — 2.7K installs

The top result (2.7K installs) turned out to be mislabeled — the repository actually contains mcp-protocol-builder, not a Playwright testing skill. This is a skills.sh cataloging error.
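A mismatch like this can be caught mechanically before spending time on a listing. The sketch below compares the skill name in a registry listing against the `name` field of the skill's SKILL.md frontmatter. It assumes the file opens with a `---`-delimited frontmatter block containing a `name:` line; the function names and the sample file contents are hypothetical, not the actual contents of the repository.

```python
from typing import Optional


def frontmatter_name(skill_md: str) -> Optional[str]:
    """Return the `name` value from a leading frontmatter block, if present."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":      # end of frontmatter
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip().strip("'\"")
    return None


def listing_matches(listing: str, skill_md: str) -> bool:
    """Check that 'owner/repo@skill-name' agrees with the file's declared name."""
    _, _, listed_name = listing.rpartition("@")
    return frontmatter_name(skill_md) == listed_name


# Hypothetical file contents, for illustration only.
sample = """---
name: mcp-protocol-builder
description: Example description
---
# Body
"""
print(listing_matches("bobmatnyc/claude-mpm-skills@playwright-e2e-testing", sample))
```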

Other results: alinaqi/claude-bootstrap@playwright-testing (937 installs), manutej/luxor-claude-marketplace@playwright-visual-testing (559 installs). Neither crosses 1K installs.

The broader pattern: Playwright testing skills are fragmented. Multiple low-install variants compete without consensus. This suggests Playwright testing is either (a) too project-specific for a generic skill, or (b) handled adequately by the Playwright CLI directly without a skill wrapper.

Verdict: No adoption. Our webapp-testing skill covers browser-based testing. The test-driven-development skill provides test methodology. Adding a Playwright-specific skill without clear architectural superiority would fragment our testing approach.

Category 8: Find Skills (Already Local)

find-skills — Already installed

This skill is already in our local library and functioning well. It provides the npx skills find workflow for discovering skills across the ecosystem. No adaptation needed — it's a discovery tool, not a capability skill.

Verdict: Keep as-is. Continue using for ecosystem discovery.

The Scorecard

| Category | Top Install Count | Action | Reason |
| --- | --- | --- | --- |
| Skill Creator | 182.8K | Superseded | Hermes 2.0 tiered composition > flat |
| Tmux | 2.8K | Skip | terminal(background=true) handles this |
| Docs/Readme | 61 | Skip | No mature skill exists |
| Refactor/Review | 9.5K | Partially absorbed | Existing skills cover most surface area |
| Changelog/Release | 106 | Skip | Underserved niche, no mature skill |
| Web Search | 8.4K | SKIP (paid) | Paid API dependency |
| Playwright | 2.7K (mislabeled) | Skip | Fragmented, no dominant skill |
| Find Skills | N/A | Keep | Already working, discovery tool |
| Quality Playbook | 9.5K | ADOPTED | Extracted forensic methodology |

The only adoption from this survey was quality-playbook-generator — and even that was extracted, not copied. Its forensic approach (defensive pattern mining, fitness-to-purpose scenarios) was genuinely novel and integrated cleanly into our existing graph. Everything else was either redundant, paid, or too immature.

What This Says About the Ecosystem

Three patterns emerged:

1. Installation Count ≠ Architectural Quality

The skill with the most installs (Anthropic skill-creator, 182.8K) had the most fundamental architectural limitation: no composition model. High installs reflect ecosystem dominance, not architectural sophistication.

2. Paid APIs Dominate "Power" Categories

Web search, image generation, video processing — the categories that require external services are dominated by paid-API skills. The free alternatives are less polished but integrate more cleanly into agent workflows without credential management overhead.

3. The Ecosystem Is Fragmented Below 1K Installs

Below ~1,000 installs, skills are overwhelmingly personal projects with 10-100 users. These aren't "bad" skills — they're just not general-purpose enough to cross the community adoption threshold. The lesson: before creating a new skill, check if the niche is actually underserved or simply too narrow to ever reach broad adoption.
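The two thresholds used throughout this survey (100 installs as the community-validation signal, 1K as the fragmentation line) can be applied to registry-style counts with a few lines of code. This is a rough sketch: the function names and tier labels are ours, and the cutoffs are the informal ones from this article rather than anything skills.sh defines.

```python
def parse_installs(text: str) -> int:
    """Convert a registry-style count such as '182.8K' or '937' to an integer."""
    text = text.strip().upper().replace(",", "")
    if text.endswith("K"):
        return round(float(text[:-1]) * 1000)
    return int(text)


def tier(installs: int) -> str:
    """Bucket an install count using this survey's informal thresholds."""
    if installs < 100:
        return "below validation threshold"
    if installs < 1000:
        return "fragmented"
    return "established"


# Counts taken from the categories surveyed above.
for label, count in [("skill-creator", "182.8K"), ("release-changelog", "106"), ("readme-updates", "61")]:
    print(label, parse_installs(count), tier(parse_installs(count)))
```

A tier says nothing about fit: the "established" bucket includes the skills we marked as superseded or skipped on cost.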

Recommendations for Skill Builders

If you're building skills for the agent ecosystem:

  1. Design for composition before publication. A skill that declares its dependencies and consumers is worth 10 standalone skills. The graph edge metadata (requires_skills, feeds_into) should be part of every SKILL.md from day one.

  2. Separate methodology from presentation. Quality Playbook's forensic approach is valuable. Its PowerPoint-style report generation is not. Extract the former, discard the latter.

  3. Cost model is part of architecture. If your skill requires a paid API, document it in the frontmatter. If the cost is unsustainable for individual developers, consider whether a free alternative path exists.

  4. Don't create skills for tool wrappers. A "Playwright testing" skill that just wraps npx playwright test adds indirection without value. Create skills for methodology — the thinking behind the tool use, not the tool invocation itself.
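As an illustration of the first recommendation, the sketch below treats `requires_skills` and `feeds_into` as graph edges, checks for dependencies that are not installed, and derives a load order. The skill names are borrowed from this article, but the edges between them are invented for the example, and the dictionary layout is an assumption about how parsed SKILL.md metadata might look. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical metadata: names are from this article, edges are made up.
SKILLS = {
    "clean-code-solid-security": {"requires_skills": [], "feeds_into": ["code-review-and-quality"]},
    "code-review-and-quality": {
        "requires_skills": ["clean-code-solid-security"],
        "feeds_into": ["git-safe-commit-push"],
    },
    "git-safe-commit-push": {"requires_skills": [], "feeds_into": []},
}


def missing_dependencies(skills: dict) -> dict:
    """Map each skill to the required skills that are not present in the graph."""
    missing = {}
    for name, meta in skills.items():
        absent = [dep for dep in meta.get("requires_skills", []) if dep not in skills]
        if absent:
            missing[name] = absent
    return missing


def load_order(skills: dict) -> list:
    """Return skills ordered so every dependency comes before its consumers.

    Raises graphlib.CycleError if the declared edges form a cycle.
    """
    graph = {name: set(meta.get("requires_skills", [])) for name, meta in skills.items()}
    # feeds_into is the reverse edge: the consumer depends on the producer.
    for name, meta in skills.items():
        for consumer in meta.get("feeds_into", []):
            graph.setdefault(consumer, set()).add(name)
    return list(TopologicalSorter(graph).static_order())


print(missing_dependencies(SKILLS))
print(load_order(SKILLS))
```

Declaring both edge directions is redundant in a closed graph, but it lets a skill name a consumer that was published later without editing the consumer's own file.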

The agent skills ecosystem is maturing from "more tools" to "better thinking." The skills worth adopting are the ones that encode judgment, not just commands. That's the lesson this survey reinforced: sometimes the best skill isn't the one with the most installs, but the one that teaches the agent why — not just how.