WebTactix

Semantic Tree-Guided Parallel Multi-Agent Planning for Web Task

Overview of WebTactix. WebTactix is a semantic tree-guided parallel-search framework for solving web tasks. It explores the web environment iteratively by (i) simplifying each observation into a task-focused AxTree view, (ii) expanding multiple candidate actions in parallel and assigning a planning agent to each resulting page to produce a fact-based summary and executable next-step plans, (iii) selecting the most promising branch with a decision agent that relies on the grounded evidence stored in the semantic tree, and (iv) executing the candidate plans in parallel tabs and recording their outcomes back into the tree. Memory management operations (Partially Done, Data Extraction) condense the interaction history and reduce redundant browsing during long-horizon tasks.

Method

Task Preprocessing

A constraint agent converts a user request into explicit constraints C = {c1, c2, ..., cm}, each describing a factual requirement (e.g., a range, quantity, or format), so that progress and stopping conditions can be checked against concrete criteria.
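One way to picture the constraint set is as a list of checkable predicates over the facts collected so far. The sketch below is illustrative only: the `Constraint` class, the `unmet` helper, and the example constraints are hypothetical names and values, not part of WebTactix.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Constraint:
    """One explicit requirement c_i (hypothetical representation)."""
    name: str
    kind: str  # e.g. "range", "quantity", "format"
    check: Callable[[Dict[str, Any]], bool]  # predicate over collected facts


def unmet(constraints: List[Constraint], facts: Dict[str, Any]) -> List[str]:
    """Return the names of constraints the collected facts do not yet satisfy."""
    return [c.name for c in constraints if not c.check(facts)]


# Assumed example request: "find at least 3 items priced at most 50".
constraints = [
    Constraint("price_range", "range", lambda f: 0 < f.get("price", -1) <= 50),
    Constraint("item_count", "quantity", lambda f: f.get("count", 0) >= 3),
]

# The task can stop once no constraint remains unmet.
remaining = unmet(constraints, {"price": 20})
```

With only a price recorded, `remaining` still lists `item_count`, which is the kind of explicit progress signal the constraints are meant to provide.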

Simplified Observation

WebTactix converts the raw accessibility tree into a text-based AxTree with unique indices for interactive elements, then simplifies it:

Cross-page deduplication: after navigation, keep only newly introduced interactive elements to reduce redundant content.
Structure-aware rewriting: compress repetitive structures (tables, comments, product lists) by keeping headers and the first k items; remaining items can be retrieved during Data Extraction.
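The two simplification steps above can be sketched as plain list operations. This is a minimal illustration that assumes each AxTree element is already a text line; the function names and the wording of the omission marker are hypothetical.

```python
from typing import List, Set


def dedupe_across_pages(previous: Set[str], current: List[str]) -> List[str]:
    """Keep only elements that were not on the previous page, preserving order."""
    return [element for element in current if element not in previous]


def compress_repeated(header: str, items: List[str], k: int = 3) -> List[str]:
    """Keep the header and the first k items of a repetitive structure."""
    kept = [header] + items[:k]
    omitted = len(items) - k
    if omitted > 0:
        # Assumed marker; the omitted rows would be fetched during Data Extraction.
        kept.append(f"... ({omitted} more items omitted)")
    return kept


new_elements = dedupe_across_pages(
    {"[1] link Home", "[2] link Cart"},
    ["[1] link Home", "[2] link Cart", "[7] button Add to Cart"],
)
table_view = compress_repeated("Name | Price", ["a", "b", "c", "d", "e"], k=2)
```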

Semantic Tree Memory

WebTactix maintains a semantic tree G=(V,E). Each node represents a visited webpage state. Each directed edge corresponds to an executable plan proposed by a planning agent. After execution, the resulting pages become new nodes, and their fact-based summaries are stored to guide future branch selection.
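A minimal sketch of such a tree, assuming nodes are keyed by integer ids and each node remembers the plan that produced it. The class and method names (`Node`, `SemanticTree`, `expand`, `plans_to`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A visited page state with its fact-based summary."""
    node_id: int
    url: str
    summary: str
    parent: Optional[int] = None
    plan_from_parent: Optional[str] = None  # the edge label (executed plan)


@dataclass
class SemanticTree:
    nodes: Dict[int, Node] = field(default_factory=dict)
    children: Dict[int, List[int]] = field(default_factory=dict)

    def add_root(self, url: str, summary: str) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, url, summary)
        self.children[node_id] = []
        return node_id

    def expand(self, parent_id: int, plan: str, url: str, summary: str) -> int:
        """Record the page reached by executing `plan` from `parent_id`."""
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, url, summary, parent_id, plan)
        self.children[node_id] = []
        self.children[parent_id].append(node_id)
        return node_id

    def plans_to(self, node_id: int) -> List[str]:
        """Plans along the path from the root to `node_id`, in execution order."""
        plans: List[str] = []
        node = self.nodes[node_id]
        while node.parent is not None:
            plans.append(node.plan_from_parent)
            node = self.nodes[node.parent]
        return plans[::-1]
```

Storing the plan on each child makes it cheap to recover the action history needed to reach a given state again.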

Parallel Planning

At each step, WebTactix expands multiple candidate actions from the current page in parallel. A planning agent assigned to each resulting page produces (i) a fact-based page summary and (ii) multiple executable next-step plans grounded to AxTree indices. Plans are generated from a high-level action space:

Page judgement (e.g., decide whether to go_back).
Memory management (e.g., Partially Done, Data Extraction).
Web operation (click/input/select grounded to AxTree indices).
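The three action categories above could be represented as a small typed structure, with a check that web operations point at an index that actually exists in the current AxTree. The names `ActionKind`, `Plan`, and `is_grounded` are hypothetical; the source does not specify a concrete plan format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class ActionKind(Enum):
    PAGE_JUDGEMENT = "page_judgement"        # e.g. go_back
    MEMORY_MANAGEMENT = "memory_management"  # e.g. Partially Done, Data Extraction
    WEB_OPERATION = "web_operation"          # click / input / select


@dataclass(frozen=True)
class Plan:
    kind: ActionKind
    name: str
    target: Optional[int] = None  # AxTree index, used by web operations
    text: Optional[str] = None    # typed text or selected option, if any


def is_grounded(plan: Plan, axtree_indices: Set[int]) -> bool:
    """A web operation is grounded only if its target index is on the page."""
    if plan.kind is ActionKind.WEB_OPERATION:
        return plan.target in axtree_indices
    return True  # other action kinds do not reference a page element
```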

Fact-based Decision + Recovery

A decision agent compares candidate nodes using their grounded summaries and outgoing plans, then chooses the next branch. When needed, it can (i) generate a short reflection to re-plan on the same state (Reflex), or (ii) reselect a previously unchosen node from a global queue (Reselect) to continue exploration without restarting search.
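Reselect can be pictured as popping the best remaining candidate from a global priority queue. The sketch below assumes each unchosen node carries a numeric score from the decision agent; that score, and the `ReselectQueue` class itself, are assumptions made for illustration, since the source does not say how the queue is ordered.

```python
import heapq
from typing import List, Optional, Tuple


class ReselectQueue:
    """Global queue of previously unchosen nodes (hypothetical sketch)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, int]] = []
        self._counter = 0  # tie-breaker: earlier pushes win on equal score

    def push(self, node_id: int, score: float) -> None:
        # heapq is a min-heap, so the score is negated to pop the highest first.
        heapq.heappush(self._heap, (-score, self._counter, node_id))
        self._counter += 1

    def pop_best(self) -> Optional[int]:
        """Return the highest-scoring unchosen node, or None if none remain."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = ReselectQueue()
queue.push(node_id=1, score=0.2)
queue.push(node_id=2, score=0.9)
queue.push(node_id=3, score=0.5)
next_node = queue.pop_best()
```

Because the queue is global, exploration can jump to a promising node anywhere in the tree instead of restarting the search from the root.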

Plan Execution

Partially Done: compress completed steps and confirmed facts into a concise summary to reduce long-history interference.
Data Extraction: iteratively interact with the page (optionally using Python for data processing) to return a structured task result, recorded as a single action.
Parallel tabs: execute multiple plans for the selected node in parallel tabs by replaying required history to reach the corresponding state, then recording outcomes back into the semantic tree.
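The parallel-tab step can be sketched with `asyncio`: each simulated tab first replays the history needed to reach the selected state, then runs its own plan. No real browser is driven here; `run_in_tab` is a hypothetical stand-in and the `asyncio.sleep(0)` calls only mark where browser actions would happen.

```python
import asyncio
from typing import Any, Dict, List


async def run_in_tab(history: List[str], plan: str) -> Dict[str, Any]:
    """Stand-in for one browser tab: replay history, then execute the plan."""
    for _step in history:
        await asyncio.sleep(0)  # placeholder for replaying one earlier action
    await asyncio.sleep(0)      # placeholder for executing the new plan
    return {"plan": plan, "replayed": len(history)}


async def execute_parallel(history: List[str], plans: List[str]) -> List[Dict[str, Any]]:
    """Run every plan in its own simulated tab; results follow the order of `plans`."""
    return await asyncio.gather(*(run_in_tab(history, p) for p in plans))


results = asyncio.run(
    execute_parallel(["click [3]"], ["click [7]", "input [9] 'laptop'"])
)
```

Each entry in `results` corresponds to one outcome that would be written back into the semantic tree as a new node.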

Case Study

The paper presents two examples of how WebTactix solves real web tasks through parallel exploration and semantic-tree-guided selection: a product comparison task on a shopping website and a review inspection task.

Case study figure. Top: product comparison on a shopping website. Bottom: review inspection task, where parallel exploration and Reflex summarize facts across branches to identify dissatisfied users.

Results

Leaderboard: https://docs.google.com/spreadsheets/d/1M801lEpBbKSNwP-vDBkC_pF7LdyGU1f_ufZb_NWNBZQ/edit?usp=sharing

Site	Tasks	Correct	Fail	N/A	Success Rate (%)
SHOPPING_ADMIN	182	143	38	1	79.01
MAP	109	79	30	0	72.48
SHOPPING	187	139	47	1	74.73
REDDIT	106	89	14	3	86.41
GITLAB	180	118	54	8	68.60
MULTISITE	48	26	22	0	54.17
TOTAL	812	594	205	13	74.34

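Notes: “N/A” denotes tasks without a valid evaluation outcome in the run logs. The reported success rates match Correct / (Tasks − N/A), so N/A tasks are excluded from the denominator.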
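As a quick check of the table's arithmetic, the snippet below recomputes each success rate from the Tasks, Correct, and N/A columns, dividing by the number of tasks with a valid outcome. The `success_rate` helper is a hypothetical name; the numbers are copied from the table.

```python
def success_rate(tasks: int, correct: int, na: int) -> float:
    """Percentage of correct tasks among those with a valid evaluation outcome."""
    return round(100 * correct / (tasks - na), 2)


# (tasks, correct, n/a, reported success rate) copied from the results table
rows = {
    "SHOPPING_ADMIN": (182, 143, 1, 79.01),
    "MAP": (109, 79, 0, 72.48),
    "SHOPPING": (187, 139, 1, 74.73),
    "REDDIT": (106, 89, 3, 86.41),
    "GITLAB": (180, 118, 8, 68.60),
    "MULTISITE": (48, 26, 0, 54.17),
    "TOTAL": (812, 594, 13, 74.34),
}

# Sites whose reported rate differs from the recomputed one (expected: none).
mismatches = [
    site
    for site, (tasks, correct, na, reported) in rows.items()
    if success_rate(tasks, correct, na) != reported
]
```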