v3.22.0
SkillCheck Methodology
Phase I · Build to Spec
v1.0–1.4 · Dec 2025 · 3 sources
10 categories · 1–10
Categories
- 01 Structure
- 02 Body
- 03 Naming
- 04 Semantics
- 05 Anti-Slop
- 06 Visual
- 07 Security
- 08 Quality
- 09 Token
- 10 Enterprise
Phase II · From Practice
v3.10–3.14 · Mar 2026 · 1 source
8 categories · 11–18
Categories
- 11 Quality Pro
- 12 Workflow
- 13 Ref. Integrity
- 14 Eval Readiness
- 15 Orch. Safety
- 16 Autonomy
- 17 Composability
- 18 Observability
Phase III · From the Field
v3.16–3.17 · Apr 2026 · 2 sources
3 categories · 19–21
Categories
- 19 Design Pattern
- 20 Trigger Collision
- 21 Eval Kit
Phase IV · Ecosystem
v3.18.0 · Apr 2026 · 3 sources
1 category · 22
Category
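A single category, but a decisive one. It is anchored by two arXiv papers, Hasan et al. and Wang et al., that measure description quality across MCP servers: how often tool descriptions contain smells, and how often agents select standard-compliant descriptions.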
Phase V · Marketplace
v3.20.0 · Apr 2026 · 3 sources
3 categories · 23–25
Categories
- 23 Agent Integration Readiness
- 24 Marketplace Governance
- 25 Memory Governance
Phase VI · Agentic Safety
v3.21.0 · Jun 2026 · 1 source
1 category · 26
Category
Checks a skill against the OWASP Top 10 for Agentic Applications. Eight deterministic risks run free and local; three intent-based risks are graded by your own AI judge, no API key required.
Phase VII · Lived-In-Ness
v3.22.0 · Jun 2026 · 1 source
1 category · 27
Category
Reads a skill's git history to answer what the SKILL.md can't: is anyone actually using this, or was it written once and abandoned? Scored in its own domain, so a one-commit skill can still be excellent.
Empirical Anchors
Hasan et al. · description smell rate
97% of 856 MCP tools across 103 servers contained at least one description smell.
Wang et al. · selection probability
72% vs 20%
Across 10,831 MCP servers, standard-compliant descriptions get picked by an agent 72% of the time. Non-compliant descriptions get picked 20% of the time.
Starting score: 100
Critical −20 · Warning −5 · Suggestion −1
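The scoring rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SkillCheck's actual implementation. The function name `skill_score`, the severity strings, and the `floor` parameter are hypothetical. The sketch also assumes that deductions are applied once per finding and that the score is clamped at zero, neither of which the source confirms.

```python
from collections import Counter

# Deduction per finding severity, taken from the scoring line above.
DEDUCTIONS = {"critical": 20, "warning": 5, "suggestion": 1}
STARTING_SCORE = 100


def skill_score(findings, floor=0):
    """Return the score left after subtracting one deduction per finding.

    `findings` is an iterable of severity strings. Clamping at `floor`
    is an assumption; the source does not say whether scores can go negative.
    """
    counts = Counter(findings)
    unknown = set(counts) - set(DEDUCTIONS)
    if unknown:
        raise ValueError(f"unknown severity: {sorted(unknown)}")
    total = sum(DEDUCTIONS[severity] * n for severity, n in counts.items())
    return max(floor, STARTING_SCORE - total)


# One critical, two warnings, one suggestion: 100 - 20 - 5 - 5 - 1
example = ["critical", "warning", "warning", "suggestion"]
print(skill_score(example))  # 69
```

Under these assumptions a skill with no findings keeps the full 100, and five or more critical findings alone bring it to the floor.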