WebTactix

Semantic Tree-Guided Parallel Multi-Agent Planning for Web Task
Overview figure of WebTactix
Overview of WebTactix. WebTactix is a semantic tree-guided parallel-search framework for solving web tasks. It iteratively explores the web environment by (i) simplifying observations into a task-focused AxTree view, (ii) expanding multiple candidate actions in parallel and assigning a planning agent to each resulting page to produce fact-based summaries and executable next-step plans, (iii) selecting the most promising branch via a decision agent that uses grounded evidence stored in the semantic tree, and (iv) executing candidate plans in parallel tabs and recording their outcomes back into the tree. Memory management operations (Partially Done, Data Extraction) increase the information density of the stored history and reduce redundant browsing during long-horizon tasks.

Method

Task Preprocessing

A constraint agent converts a user request into explicit constraints C = {c_1, c_2, ..., c_m}, each describing a factual requirement (e.g., a range, quantity, or format), so that progress and stopping conditions can be checked against them directly.
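As an illustration of how explicit constraints make stopping conditions checkable, here is a minimal sketch. The `Constraint` class, the `unmet` helper, and the example facts are all hypothetical and are not taken from the WebTactix code; in the actual system the constraints are produced by an agent rather than written by hand.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Constraint:
    """One factual requirement c_i (hypothetical representation)."""
    description: str
    check: Callable[[Dict[str, object]], bool]


def unmet(constraints: List[Constraint], facts: Dict[str, object]) -> List[str]:
    """Return descriptions of constraints the collected facts do not satisfy."""
    return [c.description for c in constraints if not c.check(facts)]


# Example constraint set covering a range, a quantity, and a format.
constraints = [
    Constraint("price between 20 and 50",
               lambda f: 20 <= f.get("price", -1) <= 50),
    Constraint("exactly 3 items",
               lambda f: f.get("count") == 3),
    Constraint("date formatted as YYYY-MM-DD",
               lambda f: bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}",
                                           str(f.get("date", ""))))),
]

facts_ok = {"price": 30, "count": 3, "date": "2024-01-05"}
facts_bad = {"price": 99, "count": 3, "date": "2024-01-05"}
```

With this representation, a task can stop once `unmet(constraints, facts)` returns an empty list, and any remaining entries name exactly what is still missing.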

Simplified Observation

WebTactix converts the raw accessibility tree into a text-based AxTree with unique indices for interactive elements, then simplifies it:

Semantic Tree Memory

WebTactix maintains a semantic tree G = (V, E). Each node in V represents a visited webpage state, and each directed edge in E corresponds to an executable plan proposed by a planning agent. After a plan is executed, the resulting pages become new nodes, and their fact-based summaries are stored to guide future branch selection.
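The tree described above can be sketched as a small data structure. This is only one possible layout under assumed names (`Node`, `SemanticTree`, `add_child`, `path_to`); the example URLs and plan strings are invented for illustration and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A visited webpage state with its fact-based summary."""
    node_id: int
    url: str
    summary: str = ""
    parent: Optional[int] = None


@dataclass
class SemanticTree:
    nodes: Dict[int, Node] = field(default_factory=dict)
    # Directed edge (parent_id, child_id) -> plan that produced the child.
    edges: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def add_root(self, url: str, summary: str = "") -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, url, summary)
        return node_id

    def add_child(self, parent_id: int, plan: str, url: str, summary: str) -> int:
        """Record the page reached by executing `plan` from `parent_id`."""
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, url, summary, parent=parent_id)
        self.edges[(parent_id, node_id)] = plan
        return node_id

    def path_to(self, node_id: int) -> List[str]:
        """Plans executed from the root to reach `node_id`, in order."""
        plans: List[str] = []
        node = self.nodes[node_id]
        while node.parent is not None:
            plans.append(self.edges[(node.parent, node.node_id)])
            node = self.nodes[node.parent]
        return list(reversed(plans))


tree = SemanticTree()
root = tree.add_root("https://shop.example/")
results = tree.add_child(root, "type 'kettle' into [12] and submit",
                         "https://shop.example/search", "Three kettles listed.")
detail = tree.add_child(results, "click [4] first result",
                        "https://shop.example/item/1", "Price and rating shown.")
```

Storing the plan on the edge and the summary on the node keeps the evidence for each branch separate from the action that led to it, which is what a later selection step would need to compare branches.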

Parallel Planning

At each step, WebTactix expands multiple candidate actions from the current page in parallel. A planning agent assigned to each resulting page produces (i) a fact-based page summary and (ii) multiple executable next-step plans grounded to AxTree indices. Plans are generated from a high-level action space:

Fact-based Decision + Recovery

A decision agent compares candidate nodes using their grounded summaries and outgoing plans, then chooses the next branch. When needed, it can (i) generate a short reflection to re-plan on the same state (Reflex), or (ii) reselect a previously unchosen node from a global queue (Reselect) to continue exploration without restarting search.
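The Reselect mechanism can be pictured as a global priority queue of candidates that were not chosen. The sketch below is an assumption-laden illustration: the `Frontier` class is hypothetical, and the numeric scores stand in for the decision agent's judgement, which in WebTactix comes from comparing grounded summaries rather than from a fixed number.

```python
import heapq
from typing import List, Optional, Tuple


class Frontier:
    """Global queue of unchosen candidate nodes (hypothetical helper)."""

    def __init__(self) -> None:
        # Entries are (-score, insertion_order, node_id); heapq is a min-heap,
        # so negating the score pops the highest-scoring candidate first.
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0

    def choose(self, candidates: List[Tuple[str, float]]) -> Optional[str]:
        """Pick the best candidate and queue the rest for later reselection."""
        if not candidates:
            return None
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        for node_id, score in ranked[1:]:
            heapq.heappush(self._heap, (-score, self._counter, node_id))
            self._counter += 1
        return ranked[0][0]

    def reselect(self) -> Optional[str]:
        """Return the best previously unchosen node, or None if exhausted."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


frontier = Frontier()
chosen = frontier.choose([("a", 0.2), ("b", 0.9), ("c", 0.5)])
fallback = frontier.reselect()
```

Because unchosen nodes stay in the queue across steps, a dead end on the current branch can be recovered by calling `reselect()` instead of restarting the search from the root.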

Plan Execution

Case Study

The paper presents two examples of how WebTactix solves real web tasks through parallel exploration and semantic-tree-guided selection: a product comparison task on a shopping website, and a review inspection task in which parallel exploration and Reflex summarize facts across branches.

Case study figure of WebTactix
Case study figure. Top: product comparison on a shopping website. Bottom: review inspection task, where parallel exploration and Reflex summarize facts across branches to identify dissatisfied users.

Results

Leaderboard: https://docs.google.com/spreadsheets/d/1M801lEpBbKSNwP-vDBkC_pF7LdyGU1f_ufZb_NWNBZQ/edit?usp=sharing

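| Site | Tasks | Correct | Fail | N/A | Success Rate (%) |
|---|---|---|---|---|---|
| SHOPPING_ADMIN | 182 | 143 | 38 | 1 | 79.01 |
| MAP | 109 | 79 | 30 | 0 | 72.48 |
| SHOPPING | 187 | 139 | 47 | 1 | 74.73 |
| REDDIT | 106 | 89 | 14 | 3 | 86.41 |
| GITLAB | 180 | 118 | 54 | 8 | 68.60 |
| MULTISITE | 48 | 26 | 22 | 0 | 54.17 |
| TOTAL | 812 | 594 | 205 | 13 | 74.34 |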
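Notes: “N/A” denotes tasks without a valid evaluation outcome in the run logs. The reported success rates are consistent with excluding these tasks from the denominator, i.e. Correct / (Correct + Fail); for example, SHOPPING_ADMIN is 143 / 181 = 79.01%.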
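The success rates in the table can be reproduced from the counts. The short check below assumes the rate is Correct / (Correct + Fail), with N/A tasks left out of the denominator; this formula is inferred from the numbers and is not stated on the page. The function name `success_rate` is illustrative.

```python
# (tasks, correct, fail, n/a) per site, copied from the results table.
rows = {
    "SHOPPING_ADMIN": (182, 143, 38, 1),
    "MAP": (109, 79, 30, 0),
    "SHOPPING": (187, 139, 47, 1),
    "REDDIT": (106, 89, 14, 3),
    "GITLAB": (180, 118, 54, 8),
    "MULTISITE": (48, 26, 22, 0),
}


def success_rate(correct: int, fail: int) -> float:
    """Percentage of correct tasks among those with a valid outcome."""
    return round(100 * correct / (correct + fail), 2)


per_site = {site: success_rate(c, f) for site, (_, c, f, _) in rows.items()}

# Column totals, then the overall rate computed the same way.
total_tasks = sum(t for t, _, _, _ in rows.values())
total_correct = sum(c for _, c, _, _ in rows.values())
total_fail = sum(f for _, _, f, _ in rows.values())
total_na = sum(n for _, _, _, n in rows.values())
overall = success_rate(total_correct, total_fail)
```

Under this assumption every per-site value and the TOTAL row match the table, whereas dividing by the full task count would give, for instance, 143 / 182 = 78.57% for SHOPPING_ADMIN instead of the reported 79.01%.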