What robots actually collect from your home
A modern autonomous vacuum is a sensor platform first, a cleaning appliance second. Inside your home, these robots gather information that would have seemed invasive just a few years ago. The data pipeline typically includes home maps generated from LiDAR or visual simultaneous localization and mapping (SLAM), schedule patterns that reveal when occupants are home or away, voice commands recorded for control and training, carpet and floor-type classification, and on some models, video clips from front-facing cameras. Roborock and Dreame devices log network information including WiFi SSIDs and signal strength; Ecovacs models transmit room-level activity summaries; iRobot discloses that it collects floor plan imagery for Genius map features. This is not hypothetical. The 2022 incident in which iRobot test images captured inside homes were inadvertently shared during machine learning training exposed the scope of what these devices capture: multiple angles of living spaces, floor plans, and personal objects, all collected as part of routine operation. MIT Technology Review documented the incident in December 2022, revealing that iRobot had been collecting these images as a normal part of its Genius feature training data without explicit opt-in beyond the device’s privacy policy.
Modern vacuum robots collect extensive data about your home layout, schedules, and voice patterns, often transmitting it to cloud servers. Local processing was once standard; today many features that used to run on the device sit behind subscriptions and proprietary cloud ecosystems. Understanding where your data actually flows is the foundation of an informed purchase.
Where the data actually lives and who processes it
The destination of this data is where the privacy model diverges most sharply across manufacturers. iRobot servers process image data in the United States for Genius training; the company has committed to manual review of training images since the 2022 incident. Roborock maintains cloud servers in China as the primary processing location for map optimization and AI training, with a separate US endpoint available for customers in North America. Dreame similarly uses Chinese cloud infrastructure alongside optional local processing for certain features. Ecovacs routes user data through servers in China and multiple international data centers depending on regional regulations. Samsung and Eufy take comparatively lighter cloud approaches: Samsung’s data primarily stays local to the device or syncs only map updates, while Eufy advertises offline-first operation for core navigation and cleaning, with cloud access as an optional feature for remote monitoring. The critical gap is that even when manufacturers provide privacy policies describing this flow, the policies often do not disclose which data is retained, how long it is stored, or who has access beyond internal teams. iRobot’s privacy policy describes data collection but provides no retention schedules. Roborock’s privacy policy is similarly vague on deletion timelines, stating only that data is kept “as long as necessary” for service delivery.
Subscription gates on features that used to work offline
A significant shift, in both privacy and pricing, is the gating of formerly offline capabilities behind subscription tiers. The most documented case is iRobot’s Genius mapping, which was originally a fully local feature in Roomba models up through the j7 series. Starting with the j9 and later models, Genius map editing, no-go zones, and room selection require an iRobot+ subscription, moving map processing from your home network to iRobot’s cloud infrastructure. iRobot’s support documentation confirms this gate on remote mapping features. Roborock employs a similar model: basic cleaning is entirely local, but multifloor mapping, room merging, and advanced cleaning schedules route data to Roborock’s cloud. Dreame restricts advanced room selection and multi-level optimization to cloud users. The trade-off is not accidental. By moving these features to the cloud, manufacturers gain two advantages: they can train machine learning models on aggregate data from thousands of homes, and they can create a recurring revenue stream from what were previously one-time hardware capabilities. From a privacy standpoint, the effect is that the cheapest option also keeps the most processing local, while paying for a subscription increases the amount of data leaving your home.
Documented incidents and what they reveal
The 2022 iRobot incident remains the clearest documented case of how this data pipeline can fail. MIT Technology Review’s reporting showed that iRobot had contracted with a third-party labeling firm to annotate home images for Genius training. The labeling firm had access to unredacted home photos. When iRobot subsequently shifted to a different labeling approach, historical images were stored in an improperly secured environment accessible via simple URL guessing, leading to exposure of multiple homes’ floor plans and contents. The incident did not involve a breach in the traditional sense; it was a failure of data governance around third-party contractors. It revealed that iRobot’s own policies did not require masking or anonymization of training data before shipping to labelers. Since then, iRobot has stated it manually reviews all training images and uses in-house labeling, but the incident established that vacuum makers do engage external parties in processing home imagery. Roborock, Dreame, and Ecovacs have not disclosed similar incidents, but they have also not published equivalent transparency reports. The absence of a reported incident does not mean the data is more secure; it may reflect less public scrutiny or less willingness to disclose problems. The Mozilla Foundation’s “Privacy Not Included” project has flagged multiple robot vacuum manufacturers for vague data practices, though its assessments predate the 2026 landscape and should be treated as a baseline rather than a current picture.
The Matter standard and the promise of local-first operation
The Connectivity Standards Alliance’s Matter protocol is often cited as a path toward decentralized robot control that reduces reliance on proprietary clouds. Matter enables local communication between IoT devices and hubs without requiring internet routing for every command. A Matter-certified vacuum could theoretically accept navigation commands from a local smart home hub, execute cleaning jobs, and report status without transmitting detailed telemetry to the manufacturer’s cloud. The CSA’s Matter specification and profiles include standards for robotic vacuum control that support local operation. However, the current generation of commercial vacuums shows that Matter adoption does not automatically reduce cloud dependency. Even when a robot supports Matter, manufacturers have continued to route features like training data feedback, map optimization, and predictive maintenance through cloud channels. Roborock’s latest models support Matter for basic control but still require cloud connectivity for advanced mapping features. Dreame’s Matter support similarly coexists with optional cloud synchronization. The architecture does not force a choice; manufacturers can offer both local Matter control and cloud services simultaneously. This means Matter represents a floor for local operation, not a ceiling on cloud use. If you disable cloud on a robot that supports Matter, you retain basic autonomous operation; if you enable the manufacturer’s cloud, you gain more aggressive learning and remote features on top.
Comparing manufacturer postures and what “works without cloud” means
Three broad postures exist across the current generation of Level III vacuums. Eufy stands at the local-first extreme: its RoboVac X10 Pro Omni runs full navigation, cleaning, and obstacle avoidance offline; cloud access is optional and limited to remote monitoring. Samsung’s Bespoke Jet Bot Combo AI similarly preserves core autonomy offline, uploading only activity summaries and app-control payloads. iRobot and Roborock occupy the cloud-integrated middle ground, where basic mapping and scheduling work locally but advanced features require cloud opt-in; Dreame shares this model. Ecovacs is the most cloud-dependent, with more aggressive telemetry even for basic operation, though it does not require a subscription for core cleaning. The phrase “works without cloud” is technically true for all of these; what differs is how much capability you surrender. A Roborock S8 MaxV Ultra without cloud access cleans effectively but loses room selection, multifloor mapping, and cloud-tied voice-control features. An Ecovacs Deebot X11 OmniCyclone loses remote monitoring and some predictive maintenance signals. An iRobot Roomba Combo j9+ without iRobot+ retains basic cleaning but cannot store no-go zones or room maps in the app. No manufacturer currently offers its full feature set with the cloud switched off; there is always a trade-off between offline autonomy and feature completeness.
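The comparison above can be restated as a small lookup table. The Python sketch below is illustrative only: `OfflineProfile` and `models_losing` are hypothetical names introduced here, and the feature lists simply repeat what this section says each example model gives up. Nothing in it is drawn from vendor documentation or verified against it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OfflineProfile:
    """What one example model gives up offline, as described in the text."""
    model: str
    posture: str  # "local-first", "cloud-integrated", or "cloud-dependent"
    lost_offline: tuple[str, ...]


# Restates this section's claims; treat every entry as an assumption.
# For the Roomba, "offline" here means "without iRobot+", as in the text.
PROFILES = (
    OfflineProfile("Eufy RoboVac X10 Pro Omni", "local-first",
                   ("remote monitoring",)),
    OfflineProfile("Roborock S8 MaxV Ultra", "cloud-integrated",
                   ("room selection", "multifloor mapping", "voice control")),
    OfflineProfile("iRobot Roomba Combo j9+", "cloud-integrated",
                   ("no-go zones", "room maps")),
    OfflineProfile("Ecovacs Deebot X11 OmniCyclone", "cloud-dependent",
                   ("remote monitoring", "predictive maintenance")),
)


def models_losing(feature: str) -> list[str]:
    """Return the example models the text says lose `feature` offline."""
    return [p.model for p in PROFILES if feature in p.lost_offline]


if __name__ == "__main__":
    for feature in ("remote monitoring", "room selection"):
        print(feature, "->", models_losing(feature))
```

A model missing from a result only means this section does not mention that loss; it is not a guarantee that the feature works offline on that model.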
Practical guidance: understanding your data flow and making tradeoffs
The decision is not whether to accept data collection, but what type and quantity of collection you can tolerate. First, identify what matters most to you: if remote monitoring and voice control are required, cloud connectivity is mandatory. If you prioritize offline operation, Eufy and Samsung provide the clearest path, accepting that their voice features and remote capabilities are less developed than Roborock’s or iRobot’s. If you choose a cloud-integrated robot, disable the manufacturer’s cloud synchronization if that option exists in the app settings, understanding that you are losing map learning and advanced scheduling. Second, examine the jurisdiction and data retention. iRobot’s US-based processing offers some regulatory clarity under US privacy law, though no deletion timeline. Roborock and Dreame’s Chinese cloud processing is subject to Chinese data governance rules, which differ significantly from EU and US frameworks. Ecovacs publishes multiple regional endpoints, which suggests awareness of this friction but does not inherently make data more private. Third, assume that if a manufacturer can train machine learning models on your data, it will. This is not inherently wrongful, but it is a fact worth acknowledging. The 2022 iRobot incident shows that even with good intentions, third-party contractors and storage failures happen. If you are uncomfortable with your home imagery or floor plans being used for AI training anywhere, the only reliable safeguard is to avoid models that collect images (in this comparison, Roborock and iRobot) and choose ones that map with LiDAR alone (available from Dreame, Ecovacs, Eufy, and Samsung), confirming the sensor list of the specific model, since camera hardware varies within a brand. Finally, remember that privacy policies are not enforcement mechanisms. They describe what manufacturers say they do, not what actually happens in practice.
The most useful approach is to combine published privacy policies with manufacturer track records, third-party audits if available, and your own comfort with each company’s disclosure practices.
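The first and third steps above amount to a filter over brands, which can be sketched in a few lines of Python. This is a minimal sketch, not a buying tool: `shortlist` is a hypothetical helper, and the brand groupings are copied from this section as assumptions. Camera hardware varies by model, so the groupings should be checked against the spec sheet of any specific robot.

```python
# Brand groupings as stated in this section; they are assumptions of the
# sketch, not verified facts about every model a brand sells.
ALL_BRANDS = {"iRobot", "Roborock", "Dreame", "Ecovacs", "Eufy", "Samsung"}
LOCAL_FIRST = {"Eufy", "Samsung"}
LIDAR_ONLY = {"Dreame", "Ecovacs", "Eufy", "Samsung"}


def shortlist(prioritize_offline: bool, avoid_image_collection: bool) -> list[str]:
    """Narrow the brand list using the two preferences discussed above."""
    candidates = set(ALL_BRANDS)
    if prioritize_offline:
        # Keep only the brands the text describes as local-first.
        candidates &= LOCAL_FIRST
    if avoid_image_collection:
        # Keep only the brands the text groups as LiDAR-only.
        candidates &= LIDAR_ONLY
    # Case-insensitive sort so "iRobot" is ordered alphabetically.
    return sorted(candidates, key=str.lower)


if __name__ == "__main__":
    print(shortlist(prioritize_offline=True, avoid_image_collection=True))
    print(shortlist(prioritize_offline=False, avoid_image_collection=True))
```

Jurisdiction and retention, the second step, are left out on purpose: the text gives no retention figures to encode, so that judgment stays with the reader.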



