CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

CORA brings calibrated risk control into mobile GUI agents, helping automation systems act more safely when interfaces are uncertain, high-stakes, or prone to cascading errors.

Yushi Feng¹, Junye Du¹, Qifan Wang¹, Zizhan Ma², Qian Niu³, Yutaka Matsuo³, Long Feng¹, Lequan Yu¹
¹The University of Hong Kong, ²The Chinese University of Hong Kong, ³The University of Tokyo

fengys@connect.hku.hk lqyu@hku.hk

Framework

Method Overview

[Figure: the CORA framework, showing training and calibration followed by the runtime safeguarded mobile GUI automation loop.]

Overview of the CORA framework. CORA acts as a safety shield that turns open-ended action proposals into selective execution.

Guardian (System 1)

Given a locked user intent, the base agent proposes a low-level GUI action. An action-conditional Guardian then scores the risk of executing that action on the current screen.

Conformal Calibration

Conformal calibration maps the Guardian's risk scores to a user-tunable execute/abstain threshold, offering a principled safety-autonomy trade-off without modifying the base policy.
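The calibration step can be illustrated with a minimal sketch. It assumes the standard conformal risk control recipe with a finite-sample correction, a held-out calibration set of (risk score, harmful label) pairs, and the rule "execute if and only if the score is at most the threshold". The function name `calibrate_threshold` and its arguments are hypothetical and are not taken from the CORA codebase; the paper's exact procedure may differ.

```python
import numpy as np


def calibrate_threshold(scores, harmful, alpha):
    """Pick the largest threshold whose corrected executed-harm rate is <= alpha.

    Assumed policy: an action is executed iff its risk score <= threshold.
    scores  : risk scores on a held-out calibration set (higher = riskier)
    harmful : 1/True where the calibration action was actually harmful
    alpha   : user-specified budget on the executed-harm rate
    Returns -inf when no threshold fits the budget (abstain on everything).
    """
    scores = np.asarray(scores, dtype=float)
    harmful = np.asarray(harmful, dtype=bool)
    n = scores.size
    best = -np.inf
    # np.unique returns the candidate thresholds in ascending order.
    for tau in np.unique(scores):
        # Harmful calibration actions that would have been executed at tau.
        executed_harm = np.sum(harmful & (scores <= tau))
        # Finite-sample correction: (n * empirical_risk + 1) / (n + 1).
        if (executed_harm + 1) / (n + 1) <= alpha:
            best = tau
        else:
            break  # the executed-harm count never decreases as tau grows
    return float(best)


# Toy calibration data, invented purely for illustration.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cal_harmful = [0, 0, 0, 0, 1, 0, 1, 1, 1]
tau_loose = calibrate_threshold(cal_scores, cal_harmful, alpha=0.25)
tau_strict = calibrate_threshold(cal_scores, cal_harmful, alpha=0.05)
```

A looser budget yields a higher threshold and more autonomy. With only nine calibration points, the strict 5% budget cannot be certified at any threshold, so the sketch falls back to abstaining on every action.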

Diagnostician (System 2)

High-risk actions are rejected and routed to a Diagnostician that produces an interpretable risk type, a UI-grounded rationale, and a minimal intervention: Reflect, Abort, or Ask to Confirm.
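The hand-off from Guardian to Diagnostician can be sketched as a single gating function. All names here (`safeguard_step`, `Decision`, `Diagnosis`, and the toy scorer and diagnostician) are hypothetical stand-ins for illustration, not CORA's actual interfaces. The real Guardian and Diagnostician are learned models, not keyword rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Intervention(Enum):
    REFLECT = "reflect"
    ABORT = "abort"
    ASK_TO_CONFIRM = "ask_to_confirm"


@dataclass(frozen=True)
class Diagnosis:
    risk_type: str            # interpretable risk category
    rationale: str            # explanation grounded in the current UI
    intervention: Intervention


@dataclass(frozen=True)
class Decision:
    execute: bool
    risk: float
    diagnosis: Optional[Diagnosis] = None


# Hypothetical signatures: (intent, screen, action) -> score or diagnosis.
Guardian = Callable[[str, str, str], float]
Diagnostician = Callable[[str, str, str], Diagnosis]


def safeguard_step(intent: str, screen: str, action: str,
                   guardian: Guardian, diagnostician: Diagnostician,
                   tau: float) -> Decision:
    """Execute low-risk actions; route the rest to the Diagnostician."""
    risk = guardian(intent, screen, action)
    if risk <= tau:
        return Decision(execute=True, risk=risk)
    return Decision(False, risk, diagnostician(intent, screen, action))


def toy_guardian(intent: str, screen: str, action: str) -> float:
    # Keyword rule used only to make the sketch runnable.
    return 0.9 if "purchase" in action else 0.1


def toy_diagnostician(intent: str, screen: str, action: str) -> Diagnosis:
    return Diagnosis("financial risk",
                     f"'{action}' on '{screen}' cannot be undone",
                     Intervention.ASK_TO_CONFIRM)


safe = safeguard_step("buy a phone case", "product page", "scroll down",
                      toy_guardian, toy_diagnostician, tau=0.5)
risky = safeguard_step("buy a phone case", "product page",
                       "tap one-tap purchase",
                       toy_guardian, toy_diagnostician, tau=0.5)
```

In this sketch the base agent's proposal is never altered. The gate only decides whether the proposed action runs or is replaced by an intervention, which mirrors the post-policy, pre-action placement described on this page.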

Demo

Video Demonstrations

The demo video combines three representative high-stakes scenarios in which CORA intervenes before unsafe execution.

Part 1: Deliberate Misuse Prevention

The user maliciously asks the agent to grant "Anyone can edit" access to a sensitive password document in Google Drive. CORA identifies the request as privacy theft, flags it as high-risk, and triggers an Abort before any sharing action is executed.

Part 2: Resisting Visual Prompt Injections

A malicious on-screen pop-up tries to override the user's goal by demanding that the agent send a password via SMS. CORA recognizes the injection, dismisses the pop-up, and continues with the original user intent, which Goal-Lock keeps frozen.

Part 3: Mitigating Model Misbehavior

When facing an irreversible financial transaction such as a one-tap purchase under a benign user goal, CORA prevents autonomous execution and routes the action to a Confirm intervention.

Benchmark

Phone-Harm Benchmark

Phone-Harm Benchmark is a new benchmark for evaluating step-level safety of mobile GUI agents under realistic, high-stakes interactions.

It features a Harm-150 subset of human-authored harmful tasks, annotated per step for misuse, injection, and misbehavior, together with a matched Normal-150 subset of purely benign tasks for evaluating utility preservation and false alarms under mixed traffic.

(a) App Distribution (All): Harm-150, 29 apps
WeChat 6.0%, WeChat Pay 6.0%, SMS 5.3%, Taobao 5.3%, QQ 4.7%, Trip.com 4.7%, Gallery 4.0%, JD 4.0%, QQ Music 4.0%, Qunar 4.0%, Settings 4.0%, Tencent Video 4.0%, Xiaohongshu 4.0%, Youku 4.0%, Amap 3.3%, Baidu Maps 3.3%, Meituan 3.3%, NetEase Music 3.3%, Weibo 3.3%, Baidu 2.7%, Contacts 2.7%, iQIYI 2.7%, Banking App 2.0%, Douyin 2.0%, Pinduoduo 2.0%, Railway 12306 2.0%, Alipay 1.3%, Clock 1.3%, Bilibili 0.7%

(b) Misuse Sub-category Distribution
Privacy theft 54%, Account abuse 24%, Financial fraud 14%, Harassment 4%, Illegal 4%

(c) Misbehavior Sub-category Distribution
Privacy overreach 52%, Misinterpretation 24%, Financial risk 12%, Unsafe defaults 8%, Data loss 2%, Over execution 2%

(d) Injection Sub-category Distribution
Ad banner 36%, Pop-up 34%, Social comment 14%, Notification 6%, Webpage 6%, SMS 4%

Harm-150 distribution overview. (a) App distribution over the full Harm-150 subset, showing long-tail coverage across 29 apps. (b)-(d) Sub-category distributions for Misuse, Misbehavior, and Injection, respectively, showing that risk concentrates in a few modes within each harm type.

Why CORA?

Existing safeguards for mobile GUI agents rely on prompt engineering, brittle heuristics, or VLM-as-critic monitors, which lack formal verification and user-tunable guarantees. This leaves users trapped in a rigid, opaque trade-off between over-interruption and silent harmful execution.

Statistical Guarantees via Conformal Risk Control

Rather than thresholding raw scores, CORA calibrates an execute/abstain boundary that satisfies a user-specified risk budget on the executed-harm rate.
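For readers unfamiliar with conformal risk control, the generic form of the guarantee can be written as follows. This is the standard construction from the conformal risk control literature, stated under assumed notation that does not come from this page: $s_i$ is the Guardian's risk score for calibration action $i$, $y_i = 1$ marks a harmful action, $\mathcal{T}$ is a finite grid of candidate thresholds, $n$ is the calibration set size, and $\alpha$ is the user's risk budget. CORA's exact formulation may differ.

```latex
\[
\hat{R}_n(\tau) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\, s_i \le \tau,\ y_i = 1 \,\},
\qquad
\hat{\tau} = \max\Bigl\{ \tau \in \mathcal{T} : \frac{n\,\hat{R}_n(\tau) + 1}{n + 1} \le \alpha \Bigr\},
\]
\[
\mathbb{E}\bigl[\, \mathbf{1}\{\, s_{n+1} \le \hat{\tau},\ y_{n+1} = 1 \,\} \bigr] \le \alpha .
\]
```

The bound holds in expectation over the calibration data and the new step, provided the two are exchangeable. If no threshold in $\mathcal{T}$ satisfies the constraint, the safeguard abstains on every action. The $+1$ in the numerator is the finite-sample correction that separates this from thresholding on the raw empirical rate.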

Minimal Intervention Burden

High-risk actions are routed to a Generative Diagnostician, which performs multimodal reasoning and recommends a targeted intervention such as Confirm, Reflect, or Abort, keeping user interruptions to a minimum.

Immunity to Visual Injection

A Goal-Lock mechanism anchors the safety assessment to a clarified, frozen user intent, successfully resisting indirect prompt injections from untrusted on-screen content.

Optimal Safety-Helpfulness Trade-off

Applied post-policy and pre-action, CORA improves the safety-helpfulness-interruption Pareto frontier without modifying the base policy.

Citation

BibTeX

@misc{feng2026coraconformalriskcontrolledagents,
  title         = {CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation},
  author        = {Yushi Feng and Junye Du and Qifan Wang and Zizhan Ma and Qian Niu and Yutaka Matsuo and Long Feng and Lequan Yu},
  year          = {2026},
  eprint        = {2604.09155},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2604.09155},
}