
Every check traces back to a source

Every check category is built from published specs, peer-reviewed papers, and practitioner experience, not opinions. When we add a check, we document what informed it and why it holds up.

Built on
Anthropic · W3C WCAG 2.2 · OWASP Top 10 Agentic · OpenAI · Google ADK · arXiv (Hasan et al.) · arXiv (Wang et al.) · MCP Protocol
Design philosophy

Validate against standards, not opinions

Every check traces to a published spec, a research paper, or documented practitioner experience. When we add one, we document what informed it and why it matters.

Progressive depth

Checks run in layers: structure first, then semantics, content quality, security, and agent readiness, and finally whether the knowledge is substantive or merely well-formatted filler.

Reward quality, don't just punish

Strengths such as a strong gotchas section, concrete code references, or clear error handling show up as positive signals in your report, not just as the absence of penalties.

Free for structure, Pro for substance

Free tells you if the skill is built correctly. Pro tells you if it's built well. Every finding carries a severity: critical, warning, suggestion, or strength.

How the checks evolved

Each phase grounded the checks in new independent evidence.

SkillCheck started from one lab's guidelines. Later rounds added practitioner field observations, cross-lab methodology, peer-reviewed academic research, and the OWASP agentic security catalogue. Each round made the checks harder to game.

How a check runs

Pre-compiled patterns, applied line by line

Regular expressions scan the skill content, skipping code blocks and frontmatter. Compound patterns require multiple signals on the same line to fire. Every result carries a severity that feeds the scoring engine.

Match → strength
// Consequence pattern (Pro)
Input: "Never call HTTP inside transactions; we had a 3-hr outage"
Match: imperative + consequence
Finding: strength · Knowledge density

No match → skipped
// Hollow content
Input: "Follow team standards."
Match: none (no compound signal)
Finding: none emitted
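The compound-pattern mechanics can be sketched in Python. Everything here is an illustrative assumption: the keyword lists, the `scan` function, and the finding label are hypothetical, not SkillCheck's actual rules. The sketch shows the three behaviors described above: patterns compiled once up front, frontmatter and fenced code skipped, and a finding emitted only when both signals appear on the same line.

```python
import re

# Hypothetical keyword lists for illustration; not SkillCheck's real rule set.
IMPERATIVE = re.compile(r"\b(never|always|must|do not|don't)\b", re.IGNORECASE)
CONSEQUENCE = re.compile(r"\b(outage|incident|broke|corrupt\w*|data loss|we had)\b", re.IGNORECASE)


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line_number, finding) pairs for lines where BOTH signals fire."""
    findings = []
    lines = text.splitlines()
    in_frontmatter = bool(lines) and lines[0].strip() == "---"
    in_code = False
    for i, line in enumerate(lines, start=1):
        stripped = line.strip()
        if in_frontmatter:
            # Skip everything up to and including the closing '---'.
            if i > 1 and stripped == "---":
                in_frontmatter = False
            continue
        if stripped.startswith("```"):
            in_code = not in_code  # each fence line opens or closes a code block
            continue
        if in_code:
            continue
        # Compound rule: an imperative AND a consequence on the same line.
        if IMPERATIVE.search(line) and CONSEQUENCE.search(line):
            findings.append((i, "strength: knowledge density"))
    return findings
```

On the two inputs shown above, `scan` returns one finding for the outage line and an empty list for "Follow team standards.", because that line carries neither signal.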
How checks evaluate

Three methods, used across both tiers

Structural and pattern checks are reproducible: same input, same finding. Judgment checks have a wider tolerance band; the criteria are published so you can predict the outcome.

Structural

Present or absent

Required fields, file references, secrets, token counts. Exact: pass or fail, no ambiguity.

Structure · Body · Security · Token · Trigger Collision
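A structural check reduces to a presence test. The sketch below assumes the required fields are `name` and `description` (the keys Anthropic's SKILL.md format asks for) and parses only flat `key: value` frontmatter without a YAML library. `parse_frontmatter` and `check_required` are hypothetical names, and SkillCheck's real field list may differ.

```python
# Assumed required fields, modeled on Anthropic's SKILL.md frontmatter.
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between leading '---' delimiters (no nested YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as having no frontmatter


def check_required(text: str) -> list[str]:
    """Emit one critical finding per required field that is missing or empty."""
    fields = parse_frontmatter(text)
    return [f"critical: missing frontmatter field '{name}'"
            for name in REQUIRED_FIELDS if not fields.get(name)]
```

The same input always yields the same findings, which is what makes this class of check reproducible.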
Pattern

Named patterns

Anti-slop phrases, density signals, design patterns, governance checklists. Inspectable: read the rules and predict the outcome.

Naming · Anti-Slop · Enterprise · OWASP det. · Knowledge Density
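Named patterns are inspectable because each rule pairs a public name with a regex and a severity. The two rules below are invented examples, not SkillCheck's anti-slop list, and `Rule`, `RULES`, and `apply_rules` are hypothetical names. The point is that anyone reading `RULES` can predict exactly which lines will match.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # public rule name shown in the report
    pattern: re.Pattern  # compiled once, reused on every line
    severity: str        # "critical" | "warning" | "suggestion" | "strength"


# Hypothetical anti-slop rules for illustration only.
RULES = [
    Rule("anti-slop/buzzword",
         re.compile(r"\b(seamless(ly)?|synerg\w*)\b", re.IGNORECASE),
         "warning"),
    Rule("anti-slop/vague-scope",
         re.compile(r"\bas (needed|appropriate)\b", re.IGNORECASE),
         "suggestion"),
]


def apply_rules(line: str) -> list[tuple[str, str]]:
    """Return (rule name, severity) for every named rule that matches the line."""
    return [(r.name, r.severity) for r in RULES if r.pattern.search(line)]
```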
Judgment

Reading comprehension

Contradictions, workflow clarity, subagent specificity, autonomy boundaries. Graded with a rubric built on published criteria.

Semantics · Workflow · Autonomy Design · OWASP grader
Scoring model

Skills start at 100. Findings subtract. Strengths surface.

Every finding carries a severity. Strengths add nothing to the score but appear as positive signals in your report: proof that you built the skill well, not merely that you avoided mistakes.

Critical · −20 · Structural violation. Must fix before shipping.
Warning · −5 · Quality gap. Should fix.
Suggestion · −1 · Minor improvement. Nice to fix.
Strength · +0 · Positive signal. No penalty; shown in the report as proof of quality.
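The deductions reduce to simple arithmetic. A skill with one critical, two warnings, one suggestion, and one strength scores 100 − 20 − 2×5 − 1 = 69. The sketch below encodes the severity table; the `score` function is a hypothetical name, and clamping the result at 0 is an assumption not stated in the scoring model.

```python
# Deductions per severity, taken from the scoring table; strengths deduct nothing.
PENALTY = {"critical": 20, "warning": 5, "suggestion": 1, "strength": 0}


def score(severities: list[str]) -> int:
    """Start at 100 and subtract each finding's penalty (floor at 0 is assumed)."""
    return max(0, 100 - sum(PENALTY[s] for s in severities))
```

Because strengths carry a zero penalty, adding them never changes the number; they only change what the report shows.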
Two tiers, one standard
Free

Validates shape

Does the skill have the right structure, fields and sections? Free tells you what's missing. Open source, no install, no API key.

Pro · $79 lifetime

Validates substance

Is the content inside those sections actually good? Pro tells you whether what's there is real, checking security, slop, agent readiness, and governance.

Independence

SkillCheck is an independent project, not affiliated with, endorsed by, or officially connected to Anthropic, OpenAI, Google, or any other AI lab. Research from those organizations informed specific check categories, as documented in the phases above. The implementation, scoring and quality judgments are SkillCheck's own.

The substance layer is Pro

Free proves your skill is built correctly.
Pro proves it's built well.