Methodology

How OKR Orca scores your OKR.

Seven quality criteria, six anti-pattern checks. Each criterion targets one specific failure mode. The scoring is automatic; the logic behind each score is visible, so you can argue with it, teach from it, and improve against it.

The rubric is opinionated. It will flag some genuinely decent OKRs as incomplete because a baseline is missing or an alignment reference is not stated. That is a feature. The goal is not perfect OKRs by some abstract standard. It is OKRs that actually change outcomes rather than producing well-formatted planning theatre.

The 7 criteria

O1 Clarity

Does the Objective name a specific customer and a specific scope?

Vague beneficiaries produce vague KRs downstream. If the Objective does not say who benefits or from what, the team cannot prioritise between the many ways of moving in the stated direction.

Passes (scores 2)
"Internal backend engineers stop losing time to environment failures" names a customer and a scope. No ambiguity about who benefits.
Fails (scores 0)
"Improve the developer experience" could mean internal engineers, external API consumers, or both. "Experience" covers everything and describes nothing.
O2 Timebox

Is there an explicit date or quarter?

An Objective without a timebox cannot be tracked. Teams defer the hard conversation about whether they are on track because there is no date to be on track against.
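
The three score levels for this criterion (an explicit date or quarter scores 2, "this year" scores 1, "soon" or no time reference scores 0) can be sketched as a small pattern check. This is an illustration, not OKR Orca's actual implementation: the function name `score_timebox` and both patterns are assumptions, and the sketch recognises only quarter labels and ISO dates, not written-out dates.

```python
import re

# Hypothetical patterns, assumed for illustration only.
# Explicit reference: a quarter label (optionally with a year) or an ISO date.
EXPLICIT = re.compile(
    r"\bQ[1-4]\b(?:\s+\d{4})?|\b\d{4}-\d{2}-\d{2}\b",
    re.IGNORECASE,
)
# Loose reference: the rubric's own partial example, "this year".
LOOSE = re.compile(r"\bthis year\b", re.IGNORECASE)


def score_timebox(objective: str) -> int:
    """Return 2 for an explicit date or quarter, 1 for 'this year', else 0."""
    if EXPLICIT.search(objective):
        return 2
    if LOOSE.search(objective):
        return 1
    return 0


print(score_timebox("Cut first-order time by end of Q3 2026"))
print(score_timebox("Cut first-order time this year"))
print(score_timebox("Cut first-order time soon"))
```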

Passes (scores 2)
"By end of Q3 2026" is an explicit reference that creates a review moment and a deadline.
Fails or partial
"This year" scores 1. "Soon" and objectives with no time reference score 0.
O3 Strategy

Is the Objective problem-framed, with no solution prescribed in the text?

A team that writes solution-first objectives has usually skipped the problem definition step. If the solution changes mid-quarter, the Objective becomes false. Problem-framed Objectives survive pivots.

Passes (scores 2)
"Cut the time it takes customers to complete their first order" names a problem and a direction without specifying features, platforms, or methods.
Fails (scores 0)
"Launch the self-service checkout portal so customers can place orders faster" embeds the portal as the answer before any work has started.
KR Outcome form

Does the Key Result follow the structure "who does what by how much"?

Output-verbs (launch, migrate, deliver, create, build, implement) score 0. A metric with a vague actor scores 1. The full "who + does what + by how much" structure scores 2. Applied per Key Result.
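
The zero-score case is the easiest part of this criterion to check mechanically. The sketch below is an illustration rather than OKR Orca's actual implementation: it flags a Key Result whose first word is one of the output verbs listed above. The function name `leads_with_output_verb` is hypothetical, and looking only at the first word is an assumed simplification. Telling a score of 1 from a score of 2 still needs a judgement about the actor and the metric.

```python
# Output verbs taken from the criterion text above.
OUTPUT_VERBS = {"launch", "migrate", "deliver", "create", "build", "implement"}


def leads_with_output_verb(key_result: str) -> bool:
    """Return True if the first word of the Key Result is an output verb.

    Assumed heuristic: only the first word is inspected, after stripping
    surrounding quotes and punctuation and lowercasing it.
    """
    words = key_result.strip().split()
    if not words:
        return False
    first = words[0].strip("\"'.,:;").lower()
    return first in OUTPUT_VERBS


print(leads_with_output_verb("Launch checkout improvements by end of Q3"))
print(leads_with_output_verb(
    "New customers complete checkout without contacting support, from 34% to 52%"
))
```

A Key Result that trips this check scores 0 for Outcome form under the rubric; one that does not still has to be read for a named actor and a measurable range.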

Passes (scores 2)
"New customers complete checkout without contacting support, from 34% to 52%" has a named actor, a specific behaviour, and a measurable range.
Fails (scores 0)
"Launch checkout improvements by end of Q3" is work, not a result. The outcome version asks what changes for customers after the launch.
KR Measurability

Does the KR include both a baseline and a target?

One present, one missing scores 1. Neither scores 0. Both, plus an implied or named data source, scores 2. If the baseline is unknown, the correct OKR is to instrument the metric first, not to improve it.
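
A rough way to picture this scoring is a check for the "from X to Y" shape plus a named source. The sketch below is an assumption-heavy illustration, not the tool's real logic: the function name `score_measurability` and the patterns are hypothetical, an implied data source cannot be detected by pattern matching, and treating "baseline and target but no named source" as a score of 1 is this sketch's own choice, which the rubric text does not settle.

```python
import re

# A number with optional decimals and optional percent sign, e.g. "2.1%" or "34".
NUM = r"\d+(?:\.\d+)?%?"

# Hypothetical patterns, assumed for illustration only.
BASELINE_AND_TARGET = re.compile(rf"\bfrom\s+{NUM}\s+to\s+{NUM}", re.IGNORECASE)
ONE_SIDE = re.compile(rf"\b(?:from|to)\s+{NUM}", re.IGNORECASE)
NAMED_SOURCE = re.compile(r"\bsource\s*:", re.IGNORECASE)


def score_measurability(key_result: str) -> int:
    """Score a Key Result 0, 1, or 2 for baseline, target, and data source."""
    if BASELINE_AND_TARGET.search(key_result):
        # Assumption: without a named source the KR stays at 1, because an
        # implied source cannot be recognised by a regular expression.
        return 2 if NAMED_SOURCE.search(key_result) else 1
    if ONE_SIDE.search(key_result):
        return 1  # only a baseline or only a target is present
    return 0      # neither number is present


print(score_measurability(
    "Session-to-signup conversion moves from 2.1% to 3.5% "
    "(source: GA4, 30-day rolling average)"
))
print(score_measurability("Increase conversion rate to 3.5%"))
```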

Passes (scores 2)
"Session-to-signup conversion moves from 2.1% to 3.5% (source: GA4, 30-day rolling average)" tells you the current state, the target, and where to find the number.
Partial (scores 1)
"Increase conversion rate to 3.5%" has no baseline, so you cannot know if the market simply moved the number.
A1 Alignment

Does the OKR set reference its parent objective or the strategy it contributes to?

Alignment is not just governance overhead; it is the mechanism that connects team effort to organisational outcomes. Without a stated link, the work may be well-intentioned and still optimise the wrong thing.

Passes (scores 2)
"Contributes to company OKR: Become the lowest-friction checkout experience in our category" states the link explicitly rather than assuming it.
Fails (scores 0)
An OKR set with no reference to anything above it scores 0, regardless of how well-constructed the KRs are.
C1 Completeness

Are there placeholders in the OKR set?

Anything marked X%, TBD, (owner), (tbc), or "numbers tbd" scores 0. A placeholder is a deferred decision. Submitting an OKR with placeholders is submitting a draft as a commitment.
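
Placeholder markers like these are easy to scan for. The following is a minimal sketch under stated assumptions, not OKR Orca's implementation: the names `find_placeholders` and `score_completeness` are hypothetical, and the pattern covers only the markers named above (X% or Y%, TBD, tbc, and a bare "(owner)"), matched case-insensitively.

```python
import re

# Hypothetical pattern covering the markers named in the criterion text.
PLACEHOLDER = re.compile(
    r"\b[XY]%"        # unfilled percentages such as "X%" or "Y%"
    r"|\bTBD\b"       # "TBD", including "numbers tbd"
    r"|\bTBC\b"       # "tbc", including "(tbc)"
    r"|\(owner\)",    # an owner slot left as the literal "(owner)"
    re.IGNORECASE,
)


def find_placeholders(okr_text: str) -> list[str]:
    """Return every placeholder marker found, in order of appearance."""
    return [match.group(0) for match in PLACEHOLDER.finditer(okr_text)]


def score_completeness(okr_text: str) -> int:
    """Return 0 if any placeholder marker is present, otherwise 2."""
    return 0 if find_placeholders(okr_text) else 2


print(find_placeholders("Reduce load time from X% to Y%"))
print(score_completeness("Increase NPS from X to Y (owner: TBD)"))
```

Returning the matched markers rather than a bare score keeps the reasoning visible, in line with the rubric's aim of showing why a set scored the way it did.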

Passes (scores 2)
Every field populated with real numbers, real owners, and real data sources.
Fails (scores 0)
"Increase NPS from X to Y (owner: TBD)" creates the appearance of measurability without the substance.

The 6 anti-patterns

Output-as-KR

A KR that describes work your team does rather than a change that happens in the world. The verb is the tell: migrate, launch, deliver, build, implement.

"Migrate 100% of orders to the new OMS by Q3." The outcome of migration might be speed, reliability, or error reduction. Write the KR about that instead.
Impact-as-KR

A KR so high-level and lagging that no single team can control it. A team that writes this kind of KR cannot tell at week 6 whether they are contributing or bystanders.

"Increase annual revenue by 20%." Revenue is the result of many teams' work. Find the specific behaviour one level down.
Vanity metric

A plausible-sounding number that does not connect to a specific actor or behaviour. Easy to move without moving the thing that matters.

"Increase engagement by 25%." Engagement of what, by whom, on which surface? Name the actor: "Email subscribers who click a product card, from 6% to 11%."
Placeholder

A KR with unknown numbers committed as though they were known. If the baseline is unknown, the KR is a wish. Instrument the metric first.

"Reduce load time from X% to Y%." No baseline, no target. This is a direction, not a result.
Binary milestone

A pass/fail milestone that tells you whether something happened, not whether it worked. Usually an Output-as-KR in disguise.

"100% of teams onboarded to the new framework." If the onboarding was supposed to reduce planning cycle time, measure that.
Task-list-in-disguise

Three or more KRs that are really one project plan. Inputs, not results. A set with seven KRs where two do the heavy lifting and five are there for coverage is a set with five hidden tasks.

"Assign two engineers. Create the mapping document. Get sign-off from Legal." These describe effort. Compress to one or two KRs about the outcome.

The "so what?" test

Before committing, ask three questions. The first applies to the KR set as a whole, the other two to each KR. Any "no" means a rewrite is needed.

Question 1
If all KRs turn green, is the Objective obviously achieved? If not, the KRs are not tightly coupled to the Objective. Something is missing.
Question 2
If this KR turns red, does it signal a real problem the team must act on? If the answer is "we'd notice but carry on," the KR is not important enough to be in the set.
Question 3
Does the team actually control this metric? If it can move due to factors entirely outside the team's influence, it is a weak signal for team performance.

The test surfaces the gap between activity and outcome. Most OKR problems are visible the moment you ask these three questions.

How the score is computed

Each of the 7 criteria scores 0, 1, or 2. KR-level criteria (Outcome form, Measurability) apply per Key Result. The total raw score is normalised to a 0-100 percentage.

| Score range | Tier | What it means |
| --- | --- | --- |
| 0-20 | Rewrite | Core structural failures. The OKR cannot be tracked as written. Start over. |
| 21-40 | Reframe | Premise is off. Multiple criteria fail. Reshape the Objective or the KR set before tweaking. |
| 41-60 | Refine | Working shape, but gaps will bite mid-quarter. Sharpen specific KRs. |
| 61-80 | Solid | Solid foundation. A few criteria need sharpening before commitment. |
| 81-100 | Ship | All criteria met or nearly met. This is a committable OKR. |

Tiers are diagnostic signals, not grades. A score of 42 means specific criteria are dragging the set down. The per-criterion breakdown shows which ones and why.
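
The normalisation and tier lookup can be sketched in a few lines. This is a hedged illustration, not the tool's code: the names `normalise` and `tier_for` are hypothetical, the function takes a flat list of 0/1/2 scores and leaves open how per-KR scores are combined (the text above does not specify this), and rounding to the nearest whole percentage is an assumption.

```python
# Upper bound of each tier, taken from the score table above.
TIERS = [(20, "Rewrite"), (40, "Reframe"), (60, "Refine"), (80, "Solid"), (100, "Ship")]


def normalise(scores: list[int]) -> int:
    """Convert a list of 0/1/2 scores to a 0-100 percentage.

    Assumption: the raw total is divided by the maximum possible total
    (2 per score) and rounded to the nearest whole number.
    """
    if not scores or any(s not in (0, 1, 2) for s in scores):
        raise ValueError("scores must be a non-empty list of 0, 1, or 2")
    return round(100 * sum(scores) / (2 * len(scores)))


def tier_for(percentage: int) -> str:
    """Return the tier name for a whole-number percentage from 0 to 100."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    for upper_bound, name in TIERS:
        if percentage <= upper_bound:
            return name
    raise ValueError("unreachable for percentages in range")


# One score per criterion, in an arbitrary illustrative order.
example = [2, 2, 2, 2, 1, 0, 2]
pct = normalise(example)
print(pct, tier_for(pct))
```

Keeping the per-criterion list alongside the percentage matters more than the number itself, since the breakdown is what shows which criteria pulled the score down.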

OKR Orca by Frederik Metz. No backend. No tracking. Your key stays in your browser.