What is the purpose of the Robovations Score?

A compact descriptive signal for maturity and reliability, not a recommendation.

What is the core principle of the Robovations Score?

Robovations Score ≠ endorsement.

Classification framework · v3.0 · 5 dimensions

The Robovations Score: one number, built from five weighted reads of a robot.

A single headline number only earns its place if the weights behind it are visible. The Robovations Score is the weighted average of five dimensions, each scored 0 to 100, published with its full breakdown on every robot page.

Revision v3.0
January 20, 2026
426 robots scored

The five dimensions

Segment width reflects weight. Autonomy carries the largest share because it is the most consequential variable for a robot.

Weight contribution

Weight	Dimension	What it reads
30%	Autonomy	What it does on its own
25%	Reliability	Does it hold up over time
15%	Maintenance	Effort the owner must give back
15%	Value	What the price actually returns
15%	Privacy	What the robot sends home

Heaviest weight: Autonomy. Equal-weighted trio: Maintenance, Value, Privacy.

Scale: 0 – 100. Weighted average of five reads.

Revision: v3.0. Published Jan 20, 2026.

Applied to: 426 robots. Breakdown on every robot page.

Median score: 63 · Capable. Across all scored robots.

What the score measures

One number, published with its arithmetic.

A headline score is only useful if the path to it is visible. Every Robovations Score is the weighted average of the same five dimensions, using the same weights, scored by the same rubric. The weights are argued on this page; the per-robot scores are on each robot page; the weighted arithmetic is the same for every product in the database.

A score is not a recommendation. A robot with a 72 is not “better than” a 68 any more than a 200-page book is better than a 180-page book. The score summarizes a relationship between a robot and a set of concerns that matter in a household. Two robots with the same score can serve different households better. The dimensions and weights exist so a reader can tell whether the score was built on the things that matter to them.

The five dimensions

What goes into a score.

Each dimension is scored 0 to 100 from its own sources and rubric. The weights are set by how consequential each is to ownership, and how hard it is for a buyer to verify before purchase.

30%

Autonomy

What the robot does without a human in the loop: mapping, recovery, edge-case behavior, unattended run time. Anchored to the Autonomy Ladder level.

Read from: Owner footage, firmware notes, FCC filings, sustained first-90-day reports.

25%

Reliability

Whether the robot holds up across ownership: mean time between interventions, failure modes, part availability, and firmware cadence after launch.

Read from: Owner reports at six, twelve, and twenty-four months; warranty escalation threads; recall filings.

The equal-weighted trio

15%

Maintenance

The effort the robot asks of the owner: cleaning, part swaps, consumable cadence, recalibration.

Read from: Manual specs, owner upkeep diaries, consumable pricing histories, third-party part availability.

15%

Value

What the price actually returns. Total cost over three years against the capability delivered.

Read from: MSRP history, subscription terms, documented feature lockouts, long-term owner cost logs.

15%

Privacy

What the robot sends home, to whom, on what schedule, and with what owner controls.

Read from: Privacy policies, published network-traffic analyses, app permission audits, jurisdictional filings.

The rubric

Evidence to score band, dimension by dimension.

A worked example shows the arithmetic. The rubric shows the inputs. For each of the five dimensions, these are the observable thresholds that move a robot between bands. Two analysts applying the same rubric to the same evidence basket should land in the same band.

Autonomy

Excellent (80–100): Anchored at Autonomy Ladder Level IV+. Owner footage shows novel-obstacle reasoning across multiple unscripted scenes. Recovery from failure modes occurs without human input.
Capable (60–79): Verified Level III. Completes full task cycles in known environments. Owner-recorded edge cases resolve with light intervention rather than rescue.
Adequate (40–59): Level II or weak Level III. Starts tasks autonomously but stalls regularly on edge cases documented in owner reports.
Limited (0–39): Level I or supervised Level II. Requires near-continuous oversight, teleoperation, or scripted-only operation.

Reliability

Excellent (80–100): At least eighteen months in market, no open recalls, fewer than three documented firmware regressions, sustained parts availability, owner-reported failure rate under 10%.
Capable (60–79): At least twelve months in market, isolated regressions resolved, parts pipeline functional, owner-reported failure rate under 20%.
Adequate (40–59): Six to twelve months of owner data, mixed regression record, owner-reported failure rate 20–35%.
Limited (0–39): Active recall, manufacturer abandonment, failure rate above 35%, or no replacement parts.
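To make the Reliability thresholds concrete, here is a minimal sketch of how the quantifiable parts of the rubric could be checked in code. It is an illustration under stated assumptions, not the publication's scoring logic: the `ReliabilityEvidence` fields and the `reliability_band` function are hypothetical names, the Limited disqualifiers are assumed to be checked first, and the qualitative criteria (for example "isolated regressions resolved" or a "mixed regression record") are left to the analyst. Platforms with under six months of owner data are assumed to follow the Provisional path rather than this function.

```python
from dataclasses import dataclass


@dataclass
class ReliabilityEvidence:
    """Hypothetical container for the measurable Reliability inputs."""
    months_in_market: int
    failure_rate: float        # owner-reported, as a fraction (0.12 means 12%)
    firmware_regressions: int  # documented regressions
    open_recall: bool
    abandoned: bool            # manufacturer abandonment
    parts_available: bool


def reliability_band(e: ReliabilityEvidence) -> str:
    """Map measurable evidence to a rubric band (assumed evaluation order)."""
    # Any single Limited condition is treated as disqualifying (assumption).
    if e.open_recall or e.abandoned or not e.parts_available or e.failure_rate > 0.35:
        return "Limited"
    if (e.months_in_market >= 18
            and e.firmware_regressions < 3
            and e.failure_rate < 0.10):
        return "Excellent"
    if e.months_in_market >= 12 and e.failure_rate < 0.20:
        return "Capable"
    # Everything else that is not disqualified falls to Adequate (assumption).
    return "Adequate"


mature = ReliabilityEvidence(24, 0.05, 1, False, False, True)
recalled = ReliabilityEvidence(30, 0.05, 0, True, False, True)
print(reliability_band(mature), reliability_band(recalled))
```

The function returns a band, not a number; placing a robot within the band's 20 or 40 point range remains an analyst judgment on the full evidence basket.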

Maintenance

Excellent (80–100): Self-empties and self-cleans. Annual consumables under $50. No recalibration cadence beyond annual.
Capable (60–79): Self-empty or self-clean, not both. Consumables $50–150 per year. Quarterly attention.
Adequate (40–59): Manual empty and clean. Consumables $150–300 per year. Monthly attention.
Limited (0–39): Frequent manual intervention. Consumables above $300 per year. Weekly maintenance.

Value

Excellent (80–100): Three-year total cost of ownership (TCO) returns capability at least 20% above the category median. No subscription gates on core features. MSRP transparent and stable.
Capable (60–79): Three-year TCO within 20% of the category median. Optional subscription, not gating core function. MSRP stable.
Adequate (40–59): Three-year TCO returns capability below the category median, or a subscription gates at least one core feature.
Limited (0–39): Multiple core features behind subscription, MSRP volatility, or undisclosed running costs.

Privacy

Excellent (80–100): Local-only operating mode available. Data flows fully disclosed. App permissions minimal. No third-party telemetry partners.
Capable (60–79): Cloud required for some features but disclosed in plain language. App permissions reasonable. Data residency in a jurisdiction with adequate consumer privacy law.
Adequate (40–59): Cloud required for core function. Telemetry partners not fully disclosed. App permissions over-broad.
Limited (0–39): Mandatory cloud upload of video or audio. Opaque telemetry. No clear data-deletion path, or jurisdictional concerns.

For platforms under twelve months in market

How we score Reliability when the owner-data window is too short.

Reliability draws from owner reports at six, twelve, and twenty-four months. New platforms have less of this data on file. We score them against a pre-launch evidence basket — manufacturer documentation, FCC filings, peer-reviewed teardowns, beta-program reports, and the owner data that does exist — then mark the headline score Provisional. The Provisional band is widened toward the category median to reflect the uncertainty. A score moves out of Provisional once the six-month owner window accrues, at which point the rubric above takes over.

How a number becomes a label

The bands, and one worked example.

The 0 to 100 score maps to four bands, each with a threshold. The bands are argued on this page, printed on every robot page, and never vary by category.

Limited (0 – 39): Fails one or more floor-level reads. Category is consistent with a Falls Short readiness state.

Adequate (40 – 59): Works at the level of the category median. Usable, with documented tradeoffs.

Capable (60 – 79): Above the median on most dimensions, sometimes at the cost of one. Most household-ready robots land here.

Excellent (80 – 100): Above the median on every dimension. Rare. Typically a robot that has matured across two or more firmware cycles.

Worked example

How the Mamibot W120-T scores.

Overall: Adequate · 54 / 100 (weighted across 5 dimensions)

Autonomy: 38 / 100

Reliability: 55 / 100

Maintenance: 60 / 100

Value: 60 / 100

Privacy: 70 / 100
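The arithmetic behind that 54 can be reproduced in a few lines. The sketch below is illustrative rather than the publication's own scoring code: the names `WEIGHTS`, `BANDS`, `weighted_score`, and `band_for` are hypothetical, and rounding half up to a whole number is an assumption, since the page does not state a rounding rule.

```python
# Weights in percentage points, as published for framework v3.0.
WEIGHTS = {
    "Autonomy": 30,
    "Reliability": 25,
    "Maintenance": 15,
    "Value": 15,
    "Privacy": 15,
}

# Lower bound of each band, highest first.
BANDS = [(80, "Excellent"), (60, "Capable"), (40, "Adequate"), (0, "Limited")]


def weighted_score(scores: dict) -> int:
    """Weighted average of the five dimension scores, rounded half up (assumed)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five dimensions")
    # Sum of weight * score is in hundredths of a point because weights sum to 100.
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return (total + 50) // 100


def band_for(score: int) -> str:
    """Return the band label for a 0 to 100 headline score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, label in BANDS:
        if score >= floor:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


# Dimension scores from the Mamibot W120-T worked example.
mamibot_w120t = {
    "Autonomy": 38,
    "Reliability": 55,
    "Maintenance": 60,
    "Value": 60,
    "Privacy": 70,
}

overall = weighted_score(mamibot_w120t)
print(overall, band_for(overall))  # 54 Adequate
```

The unrounded weighted average is 11.40 + 13.75 + 9.00 + 9.00 + 10.50 = 53.65, which lands in the Adequate band whether or not it is rounded.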

Pre-Release Assessment

A robot that has not shipped cannot be measured against a rubric built for shipping robots.

The Robovations Score above is built from owner-reported failure rates, firmware-iteration history, published privacy policies, and consumable-cadence data. None of those exist for a robot that has not shipped. Scoring a pre-release robot against the Robovations Score rubric would produce a number detached from the criteria it claims to measure.

So a pre-release robot does not get a Robovations Score. It gets a Pre-Release Assessment instead: a parallel evaluation built from five dimensions that can be honestly evaluated before shipping. The two frameworks live side by side and a robot is scored under exactly one. Lifecycle stage decides which.

Which evaluation applies

Lifecycle stage	Definition	Evaluation framework	Label on the page
Pre-release	Announced, not yet shipping to consumers	Pre-Release Assessment	“Pre-Release Assessment · 58 of 100”
Provisional	Shipping under 6 months, thin owner data	Robovations Score	“Robovations Score · 68 of 100 · Provisional”
Verified	Shipping 6+ months, sufficient evidence	Robovations Score	“Robovations Score · 72 of 100”
Discontinued	Withdrawn from market	Robovations Score (frozen)	“Robovations Score · 70 of 100 · Discontinued [date]”
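The table above amounts to a small decision rule. The following sketch shows one way it could be expressed; every name in it (`months_between`, `lifecycle_stage`, `page_label`) is hypothetical. Three assumptions are worth flagging: "sufficient evidence" is an editorial call passed in as a boolean, a robot past six months without sufficient evidence is assumed to stay Provisional, and the ISO date format in the Discontinued label is a guess, since the page shows only "[date]".

```python
from datetime import date
from typing import Optional


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def lifecycle_stage(today: date,
                    shipped_on: Optional[date] = None,
                    discontinued_on: Optional[date] = None,
                    sufficient_evidence: bool = False) -> str:
    """Pick the lifecycle stage that decides which framework applies."""
    if discontinued_on is not None and discontinued_on <= today:
        return "Discontinued"
    if shipped_on is None or shipped_on > today:
        return "Pre-release"
    if months_between(shipped_on, today) >= 6 and sufficient_evidence:
        return "Verified"
    return "Provisional"


def page_label(stage: str, score: int,
               discontinued_on: Optional[date] = None) -> str:
    """Format the headline label in the pattern shown in the table."""
    if stage == "Pre-release":
        return f"Pre-Release Assessment · {score} of 100"
    label = f"Robovations Score · {score} of 100"
    if stage == "Provisional":
        return f"{label} · Provisional"
    if stage == "Discontinued":
        # Date format is an assumption; the page shows only "[date]".
        return f"{label} · Discontinued {discontinued_on.isoformat()}"
    return label


today = date(2026, 1, 20)
stage = lifecycle_stage(today, shipped_on=date(2025, 10, 1))
print(page_label(stage, 68))  # Robovations Score · 68 of 100 · Provisional
```

Keeping the stage decision separate from the label formatting mirrors the rule that a robot is scored under exactly one framework at a time.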

The five Pre-Release Assessment dimensions.

Each one is evaluable from public information before the product ships. Weights sum to 100% and are front-loaded on the dimensions most predictive at the pre-release stage.

25% weight

Manufacturer Track Record

What it asks: Has this maker shipped what they demoed before, on schedule, with the promised capability? Recall history?
Evidence source: Prior products from the same manufacturer (their j-series, S-series, prior humanoid generations). Anchor against their previous two or three releases.

Higher: Roborock's Saros line, iRobot's j-series (shipped on schedule with promised capability): 75–85

Lower: A first-time manufacturer with no track record: 30–45

25% weight

Engineering Plausibility

What it asks: How novel is the mechanical approach? Are the demoed capabilities achievable in shipping form, or do they require breakthroughs not yet in evidence?
Evidence source: Demo footage analysis, patent filings, prior-art comparison, manufacturer technical disclosures.

Higher: A familiar form factor with one new feature: 70–85

Lower: A genuinely new mechanical approach (wheel-legs, soft actuators, articulated arm on a vacuum): 30–50

20% weight

Demonstrated vs Claimed

What it asks: What has been shown to actually work in published demos, versus what is claimed in marketing? Is there a gap?
Evidence source: CES coverage, manufacturer videos, third-party event reporting, owner footage where available.

Higher: Aligned, with demos covering the claims: 70–85

Lower: Large gap (heavy marketing, scripted demos, key claims unsupported): 20–40

15% weight

Path to Consumer Readiness

What it asks: Is there an announced ship date? Pricing? Distribution? Regulatory status?
Evidence source: Press releases, manufacturer roadmaps, FCC filings, certifications, retail listings.

Higher: Announced, dated, priced, and listed with a retailer: 75–85

Lower: Pure CES vapor with no commercial signal: 0–20

15% weight

Open Questions

What it asks: How many fundamental capability questions still need answering before the product can be honestly evaluated? Inverted: fewer open questions means a higher score.
Evidence source: Editorial assessment against the five-dimension Robovations Score rubric. The questions themselves are listed on each pre-release robot page.

Higher: Only minor specs unknown: 80+

Lower: Fundamental capability questions remain: 25–45

What the headline number means.

Pre-Release Assessment bands are distinct from the Robovations Score bands (Limited / Adequate / Capable / Excellent) so a reader instantly knows which framework they are reading.

0–25: Speculative

Minimal evidence, fundamental unknowns. The product may not be real.

26–45: Concept Stage

Meaningful gaps in plausibility or readiness. Direction is set, execution is not.

46–60: Plausible

Defensible path forward with material open questions. The publication can take a position.

61–75: Promising

Strong signals across most dimensions. Specific risks are named and bounded.

76–100: Pre-Order Ready

High confidence the product will ship as demoed. Narrow remaining uncertainty.
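For readers who want to see how the five pre-release weights and the bands above might combine, here is a minimal sketch. It assumes the headline is a weighted average of the five dimension scores, which the stated weights and the "of 100" label suggest but the page does not spell out. The function names, the round-half-up rule, and the sub-scores for the example robot are all hypothetical and do not describe any real product.

```python
# Pre-Release Assessment weights in percentage points, as listed above.
PRE_RELEASE_WEIGHTS = {
    "Manufacturer Track Record": 25,
    "Engineering Plausibility": 25,
    "Demonstrated vs Claimed": 20,
    "Path to Consumer Readiness": 15,
    "Open Questions": 15,
}

# Lower bound of each Pre-Release Assessment band, highest first.
PRE_RELEASE_BANDS = [
    (76, "Pre-Order Ready"),
    (61, "Promising"),
    (46, "Plausible"),
    (26, "Concept Stage"),
    (0, "Speculative"),
]


def pre_release_score(scores: dict) -> int:
    """Weighted average of the five dimensions, rounded half up (assumed)."""
    if set(scores) != set(PRE_RELEASE_WEIGHTS):
        raise ValueError("scores must cover exactly the five dimensions")
    total = sum(PRE_RELEASE_WEIGHTS[k] * scores[k] for k in PRE_RELEASE_WEIGHTS)
    return (total + 50) // 100


def pre_release_band(score: int) -> str:
    """Return the Pre-Release Assessment band for a 0 to 100 score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, label in PRE_RELEASE_BANDS:
        if score >= floor:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


# Hypothetical sub-scores for an imaginary first-time manufacturer.
example = {
    "Manufacturer Track Record": 40,
    "Engineering Plausibility": 45,
    "Demonstrated vs Claimed": 60,
    "Path to Consumer Readiness": 80,
    "Open Questions": 40,
}

score = pre_release_score(example)
print(score, pre_release_band(score))  # 51 Plausible
```

The band table is kept separate from the Robovations Score bands on purpose, matching the page's point that the two frameworks never share labels.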

What happens when a pre-release robot ships.

When a pre-release product becomes available to consumers, three things happen in sequence:

1. The pre-release watch fires. A periodic check runs against every pre-release robot in the database, looking for a shipping announcement from the manufacturer or a reputable outlet. When one is detected, the editorial team is notified.
2. The robot transitions to Provisional. The Pre-Release Assessment is archived for historical reference. The first Robovations Score is computed under the standard five-dimension rubric, with a “Provisional” flag while owner data is still thin (the first six months in market).
3. The transition itself becomes a tracker entry. “First Robovations Score available” is logged on the tracker so readers who followed the pre-release classification see the transition explicitly, not silently.

The Pre-Release Assessment never converts into the Robovations Score. They measure different things. The transition is a switch, not an interpolation.

Common questions

What readers actually ask about the score.

I. Why one number if classification is not ranking?

The score is a summary, not an ordering. It exists so a reader can see at a glance whether a robot sits at the household-ready floor, the median, or the top of its category. It never appears without the five-dimension breakdown beside it, so a reader can tell whether the headline was built on the reads that matter to them.

II. Why those five dimensions and not more?

Five is the smallest set that covers the distinct reads a household cannot verify after purchase. More dimensions dilute the arithmetic without adding signal. Fewer dimensions collapse reads that ought to be separate, like maintenance effort and reliability, which move independently.

III. Why does Autonomy weigh more than Privacy?

Autonomy is the load-bearing read for a robot. If the robot does not do the job without help, the other reads matter less. Privacy is weighted 15%, up from 10% in v2, on the argument that what a robot sends home is the second most consequential read a household cannot verify after purchase.

IV. Why does a pre-release robot not get a Robovations Score?

The Robovations Score is built from owner-reported failure rates, firmware-iteration history, published privacy policies, and consumable-cadence data. None of those exist before a product ships. Scoring a pre-release robot against this rubric would produce a number detached from the criteria it claims to measure. Pre-release robots get a Pre-Release Assessment instead, which scores them on dimensions that can be evaluated honestly before shipping (manufacturer track record, engineering plausibility, demonstrated vs claimed capability, path to consumer readiness, open questions).

V. What happens when a pre-release robot ships?

The Pre-Release Assessment is archived and the first Robovations Score is computed under the standard five-dimension rubric, initially flagged "Provisional" while owner data is still thin. A tracker entry is logged so readers who followed the pre-release classification see the transition. The two evaluations measure different things, so the switch is exact rather than interpolated.

VI. Can a robot score well in every dimension and still be Falls Short?

Rarely, but yes. The score measures the robot. Readiness measures the robot-in-market, which includes availability, support, and manufacturer commitments. A robot with a good score can still land in Falls Short if the manufacturer has abandoned it or if a safety recall is outstanding.

VII. Can I see every score change over time?

Yes. Every robot page keeps a dated score history, with the reason each change was made. Weight revisions to the framework, like the v3.0 rebalance in January 2026, are logged on the framework change log and trigger a full rescoring across all products.

Relationship to other frameworks

What this score borrows, and where it departs.

A descriptive score that earns the reader’s trust has to say where it sits relative to the prior art. The Robovations Score is not a derivative of any single existing framework, but it does take ideas from several and depart from each in places that matter for consumer robotics.

Consumer Reports / Wirecutter aggregate scores

What we borrow

A single weighted headline summarizing several reads, published with the underlying breakdown.

Where we depart

We do not test in a lab. The dimensions are scored from owner data, manufacturer documentation, and source-traceable evidence rather than benchtop measurements.

SAE J3016 (vehicle automation levels)

What we borrow

The principle that autonomy is a spectrum with discrete behavioral breakpoints. Anchors our Autonomy dimension, which is keyed to the Autonomy Ladder.

Where we depart

J3016 governs a vehicle on a road. We classify a robot in a household, where the operating envelope is messier and the safety floor is different.

ANSI/HFES 400 — Human Readiness Levels (HRL)

What we borrow

The distinction between capability and human-readiness as separate axes. Our Human Readiness Criteria sit on a parallel track to the Score for the same reason.

Where we depart

HRL is a procurement framework for systems acquisition. Our HRC describes whether a household can adopt a robot today. The names are similar; the intended user is different.

ISO 13482 (personal-care robot safety)

What we borrow

The premise that safety, privacy, and operational autonomy are distinct concerns that need to be scored independently.

Where we depart

ISO 13482 is a conformity standard. The Robovations Score is a descriptive signal — we do not certify, and a robot scoring well here is not certified to ISO 13482.


Framework v3.0 · published Jan 20, 2026 · applied to 426 scored robots