The Architecture of Agency: Advanced Planning and Dependency Resolution in LLM Tool Orchestration

I. Foundational Principles of Tool Orchestration

The integration of large language models (LLMs) into autonomous systems marks a significant architectural shift in artificial intelligence. This evolution moves past simple applications, such as Retrieval-Augmented Generation (RAG), toward agentic systems where the LLM functions as the central "brain" or orchestrator. This orchestrator is responsible for decomposing complex user requests into executable, multi-step plans by coordinating specialized external functions, often referred to as tools \(t_i\) or APIs.

I.A. The Agentic Paradigm and the Shift from Retrieval to Planning

While LLMs excel at generating natural language and performing System I tasks—such as pattern recognition and flexible generation—they frequently encounter challenges with long-horizon, structured reasoning, which is classified as System II cognition. This limitation creates a planning bottleneck, leading to the generation of unreliable plans, especially as the required number of steps scales up. The fundamental tension arises from bridging the LLM's inherently non-deterministic nature with the strict, deterministic requirements of external tools (functions, APIs) that operate under a fixed input/output contract.

To resolve this issue, robust agentic systems integrate formal planning methods. This strategy effectively offloads the rigid, constraint-adherence aspects of System II cognition to established symbolic AI constructs. By constraining the LLM's actions through formalized mechanisms, the overall system retains the LLM's generative strengths while achieving the necessary structure and validity required for execution.

I.B. Formalizing the Tool Dependency Graph (TDG)

The prerequisite for any structured execution is the formalization of relationships between available tools through a Tool Dependency Graph (TDG), defined as \(G = (T, E)\).

Definition: Let \(T = \{t_1, t_2, \ldots, t_n\}\) be the finite set of available tools, representing the node set of the graph. A directed edge \((t_i, t_j) \in E \subseteq T \times T\) indicates a dependency where the output or functionality of tool \(t_i\) is required as a necessary input or precondition for the execution of tool \(t_j\).

Effective tool planning requires treating each tool execution as an action that transforms the system's current state \(s \in S\) into a subsequent state \(s' \in S\), where \(S\) is the state space. The application of tool \(t_i\) defines a transition function \(\text{Apply}(t_i, s) = s'\). The dependencies within the graph ensure a logically valid trajectory through this state space, moving from the initial conditions defined by the user request to the final goal state.

Traditional planning often relies on topological sorting of the TDG, which is generally structured as a Directed Acyclic Graph (DAG). However, topological sorting yields only a sequential, total order—a significant limitation in dynamic, resource-constrained environments. This approach fails to incorporate crucial operational metrics such as opportunities for parallel execution, computational cost, latency considerations, or the contextual necessity of a tool.
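To make this limitation concrete, consider a minimal sketch (hypothetical tool names, Python's standard `graphlib`): a topological sort flattens the TDG into one sequential order even though independent branches could proceed concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical TDG: each tool maps to the set of tools it depends on.
# "geocode"/"get_weather" form one branch; "search_flights" is independent of both.
tdg = {
    "geocode": set(),
    "get_weather": {"geocode"},
    "search_flights": set(),
    "build_itinerary": {"get_weather", "search_flights"},
}

# Topological sorting yields a single total order...
print(list(TopologicalSorter(tdg).static_order()))

# ...even though the graph itself admits parallel execution at each step:
ts = TopologicalSorter(tdg)
ts.prepare()
while ts.is_active():
    ready = ts.get_ready()          # every tool whose dependencies are already satisfied
    print("could run concurrently:", ready)
    ts.done(*ready)
```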

Consequently, the definition of a TDG edge is evolving from a simple functional prerequisite to an explicit causal link. Modeling dependencies causally is essential for failure recovery and system robustness. When a subsequent tool fails, the agent can efficiently diagnose whether the failure was due to an internal execution error or an incomplete or faulty state change produced by a preceding tool.

II. Methodologies for Robust TDG Construction

The reliability and scalability of agentic planning hinge critically on the accuracy and completeness of the TDG. Construction methodologies range from manual symbolic definition to dynamic neuro-symbolic inference.

II.A. Explicit Symbolic Modeling (Precondition-Effect Formalism)

The most rigorous approach involves defining tools using a formal symbolic framework, often adapted from classical automated planning, such as the Precondition-Effect-Result (P-E-R) schema.

Each tool \(t_i \in T\) is formally described by three components:

\[t_i = \langle \text{Pre}(t_i), \text{Add}(t_i), \text{Del}(t_i) \rangle\]

where \(\text{Pre}(t_i)\) is the set of conditions that must hold in the current state for \(t_i\) to be executable, \(\text{Add}(t_i)\) is the set of facts its execution makes true, and \(\text{Del}(t_i)\) is the set of facts it makes false.

The state transition is then formally defined as:

\[s' = \text{Apply}(t_i, s) = (s \setminus \text{Del}(t_i)) \cup \text{Add}(t_i)\]
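The following is a minimal sketch of this transition semantics, using sets of string facts to stand in for the symbolic state; the tool and fact names are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    pre: frozenset[str]   # Pre(t): facts that must hold before execution
    add: frozenset[str]   # Add(t): facts made true by execution
    dele: frozenset[str]  # Del(t): facts made false by execution ("del" is reserved in Python)

def apply(tool: Tool, state: frozenset[str]) -> frozenset[str]:
    """Compute s' = (s \\ Del(t)) | Add(t), defined only when Pre(t) is a subset of s."""
    if not tool.pre <= state:
        raise ValueError(f"Precondition of {tool.name} not satisfied in current state")
    return (state - tool.dele) | tool.add

# Illustrative tool: fetching a report requires an auth token and consumes a quota slot.
fetch_report = Tool(
    name="fetch_report",
    pre=frozenset({"auth_token", "quota_available"}),
    add=frozenset({"report_downloaded"}),
    dele=frozenset({"quota_available"}),
)

s0 = frozenset({"auth_token", "quota_available"})
s1 = apply(fetch_report, s0)   # {'auth_token', 'report_downloaded'}
```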

A notable advancement is Precondition Grounding, where the LLM planner autonomously generates and validates these necessary preconditions during execution. This promotes scalability and furnishes formal, explicit feedback for recovery mechanisms during failure. For constraint enforcement, frameworks leverage machine-readable schema definitions, such as JSONSchema, to guarantee structural output. The description fields within these schemas function as a "semantic compass," guiding the LLM's internal reasoning.
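As an illustration, a tool contract might be published as a JSONSchema fragment along the following lines (the field layout is a generic sketch, not any particular framework's format); the `description` strings are precisely what serves as the planner's semantic compass.

```python
get_weather_schema = {
    "name": "get_weather",
    "description": "Return current weather for a location that has already been geocoded.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude":  {"type": "number", "description": "WGS84 latitude from the geocoding tool"},
            "longitude": {"type": "number", "description": "WGS84 longitude from the geocoding tool"},
            "units":     {"type": "string", "enum": ["metric", "imperial"]},
        },
        "required": ["latitude", "longitude"],
    },
}

# Structural enforcement: reject argument objects that violate the schema.
from jsonschema import validate  # third-party package: pip install jsonschema
validate(instance={"latitude": 37.77, "longitude": -122.42},
         schema=get_weather_schema["parameters"])
```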

Furthermore, integrating formal language constraints, as demonstrated by the Formal-LLM framework, allows an external automaton to supervise the LLM's generation process, significantly increasing plan validity and executability. This architecture positions the LLM not just as an interpreter of a planning domain but as a synthesizer that translates unstructured requirements into the formal constraints needed for structural enforcement.

II.B. Automated and LLM-Inferred Graph Generation (Neuro-Symbolic Approaches)

Manually defining P-E-R schemas for large toolsets is resource-intensive. Automated neuro-symbolic methods address this by inferring dependency relationships from tool metadata and observed behavior.

GNN-Based Dependency Inference

Frameworks like GTool construct a task-specific TDG for efficient tool selection. The Tool Graph Retriever (TGR) trains a discriminator to identify dependencies and utilizes graph encoding techniques to derive better tool representations for optimal retrieval and planning. This structural analysis of tool metadata provides a critical mechanism for preemptively managing tool conflicts and redundancy.

The Interactive Graph Discovery Agent (IGDA) uses an LLM pipeline to iteratively propose experimental tool executions and updates the dependency graph \(\hat{G}_R\) based on binary feedback (success or failure) over \(R\) rounds, aiming to minimize the distance to the ground truth graph \(G^*\).
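The sketch below is not IGDA itself, only a schematic of the discovery pattern it describes: propose a candidate edge, run the implied two-step experiment, and keep or discard the edge based on binary feedback. `run_experiment` is a placeholder for a sandboxed execution harness, and the random proposal policy stands in for an LLM's ranked proposals.

```python
import itertools
import random
from typing import Callable

def discover_edges(
    tools: list[str],
    run_experiment: Callable[[str, str], bool],  # True if t_j succeeds when fed t_i's output
    rounds: int,
) -> set[tuple[str, str]]:
    """Schematic iterative estimation of a dependency graph from binary feedback."""
    candidates = list(itertools.permutations(tools, 2))
    edges: set[tuple[str, str]] = set()
    for _ in range(rounds):
        if not candidates:
            break
        # Placeholder proposal policy; an LLM would rank candidates using prior knowledge.
        t_i, t_j = candidates.pop(random.randrange(len(candidates)))
        if run_experiment(t_i, t_j):   # binary feedback: did the dependent call succeed?
            edges.add((t_i, t_j))
    return edges
```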

Causal Structure Learning

Beyond static metadata, dependencies can be derived dynamically from historical execution data. Structural Causal Action (SCA) models analyze action traces to learn a Causal Action Matrix (\(M\)), which explicitly encodes the causal influence of preceding states and actions on subsequent decisions. This causal knowledge enables the LLM to adjust action probabilities based on learned dynamics, offering a mechanism for continuous self-improvement of the TDG.

If execution monitoring reveals that a tool link has become unreliable, the causal influence weight for that path is automatically degraded, effectively implementing implicit runtime pruning based on observed reliability.
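One simple way to realize this degradation (an illustrative heuristic, not the SCA formulation itself) is an exponential moving average over observed outcomes, so that repeated failures shrink an edge's influence weight until it falls below a pruning threshold.

```python
def update_edge_weight(weight: float, success: bool, alpha: float = 0.2) -> float:
    """Exponentially weighted reliability estimate for a TDG edge, kept in [0, 1]."""
    observation = 1.0 if success else 0.0
    return (1 - alpha) * weight + alpha * observation

w = 0.9                                   # initially trusted edge
for outcome in [False, False, True, False]:
    w = update_edge_weight(w, outcome)    # a run of failures degrades the weight
print(round(w, 3))                        # well below the initial 0.9; prune if under a threshold
```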

| Methodology | Principle | TDG Output | Key Advantages | Primary Disadvantage |
|---|---|---|---|---|
| PDDL/P-E-R Schema | Formal symbolic logic | Explicit, static Precondition-Effect links | Guarantees logical validity and debuggability | Highly labor-intensive; poor scalability |
| LLM-Inferred Semantics (GTool, AutoGraph) | Neuro-symbolic inference (GNNs) | Request-specific graph, semantic embeddings | Automates discovery, handles incomplete data | Accuracy depends heavily on LLM reasoning and training data |
| Causal Structure Learning | Dynamic behavioral analysis | Weighted causal matrix (\(M\)) of state-action influence | Learns dynamic dependencies from execution history, supports reliability/recovery | Requires extensive, reliable execution data; complex training |

Table I: Comparison of Tool Dependency Graph Construction Methodologies

III. Beyond Topological Sort: Advanced Dependency Resolution Algorithms

The inadequacies of simple linear scheduling derived from topological sort necessitate the use of advanced planning algorithms optimized for efficiency, scale, and feasibility assurance.

III.A. Maximizing Parallelism (Partial Order Planning - POP)

Topological sorting imposes a total, sequential execution order \(\Pi: t_1 \to t_2 \to \cdots \to t_n\), which inherently limits efficiency. Partial Order Planning (POP), also known as partial-order causal-link (POCL) planning, generates a plan \(\Pi\) that is only a partially ordered set of steps, maximizing opportunities for concurrent execution. The plan's integrity is maintained by explicit causal links.

Causal Link: A causal link is a triple \(\langle t_i, c, t_j \rangle\), where \(t_i\) is the producer step that achieves condition \(c\) (i.e., \(c \in \text{Add}(t_i)\)), \(t_j\) is the consumer step that requires \(c\) as a precondition (i.e., \(c \in \text{Pre}(t_j)\)), and the link implies the ordering constraint \(t_i \prec t_j\).

A critical challenge is managing threats—an intermediate action \(t_k\) that "clobbers" the link if \(t_i \prec t_k \prec t_j\) and \(c \in \text{Del}(t_k)\). POP algorithms must resolve such threats by imposing additional ordering constraints on \(t_k\) (i.e., forcing \(t_k \prec t_i\) or \(t_j \prec t_k\)).
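The sketch below expresses the threat check over the same Pre/Add/Del sets introduced in Section II.A (names are illustrative); resolving a detected threat amounts to adding one of the two ordering constraints returned by `resolutions`.

```python
from typing import NamedTuple

class CausalLink(NamedTuple):
    producer: str   # t_i: the step that achieves condition c
    condition: str  # c
    consumer: str   # t_j: the step that requires condition c

def threatens(link: CausalLink, t_k: str, delete_list: set[str],
              may_run_between: bool) -> bool:
    """t_k threatens <t_i, c, t_j> if it may be ordered between them and deletes c."""
    return may_run_between and link.condition in delete_list

def resolutions(link: CausalLink, t_k: str) -> list[tuple[str, str]]:
    """Ordering constraints (before, after) that remove the threat."""
    return [(t_k, link.producer),   # demotion:  t_k before t_i
            (link.consumer, t_k)]   # promotion: t_j before t_k
```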

Modern agent frameworks, such as LangGraph and AutoGen's GraphFlow, utilize graph-based execution control (DiGraphs) that leverage this partial ordering to support structured parallelism, conditional branching, and looping behaviors. The LLM Compiler approach generates a full, static "Plan-to-Execute" upfront, pre-optimizing the sequence for parallelism.
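A framework-agnostic sketch of this execution style (not the LangGraph or AutoGen API) combines `graphlib` with `asyncio`: at each step, every tool whose predecessors have completed is launched concurrently. Tool bodies are stubs.

```python
import asyncio
from graphlib import TopologicalSorter

async def call_tool(name: str) -> str:
    await asyncio.sleep(0.1)        # stand-in for a real tool or API call
    return f"{name}: ok"

async def execute_partial_order(tdg: dict[str, set[str]]) -> dict[str, str]:
    ts = TopologicalSorter(tdg)
    ts.prepare()
    results: dict[str, str] = {}
    while ts.is_active():
        ready = ts.get_ready()      # all tools whose dependencies have finished
        outputs = await asyncio.gather(*(call_tool(t) for t in ready))
        results.update(zip(ready, outputs))
        ts.done(*ready)
    return results

tdg = {"geocode": set(), "search_flights": set(),
       "get_weather": {"geocode"}, "build_itinerary": {"get_weather", "search_flights"}}
print(asyncio.run(execute_partial_order(tdg)))
```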

III.B. Managing Hierarchical Complexity (HTN Planning)

For large-scale, complex tasks, Hierarchical Task Network (HTN) Planning offers superior management of complexity. HTN planning recursively decomposes non-primitive tasks \(T_{NP}\) into smaller sub-tasks via domain methods until the plan consists entirely of primitive actions \(T_P\), which correspond directly to tool calls in \(T\). This hierarchical structure intrinsically simplifies global planning by enabling the system to focus on high-level skills and objectives, ensuring scalability.

The LLM is deployed to drive this decomposition process, generating a sequence of high-level sub-problems that can be refined iteratively or solved in parallel using a Divide and Conquer strategy.
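A toy illustration of that decomposition loop follows; the method library and task names are invented for the example, and a real HTN planner would also check preconditions and allow multiple methods per task.

```python
# Hypothetical method library: each non-primitive task expands to an ordered list of sub-tasks.
METHODS: dict[str, list[str]] = {
    "plan_trip": ["book_transport", "book_lodging"],
    "book_transport": ["search_flights", "purchase_ticket"],
    "book_lodging": ["search_hotels", "reserve_room"],
}

def decompose(task: str) -> list[str]:
    """Recursively expand non-primitive tasks; primitives (tool calls) pass through unchanged."""
    if task not in METHODS:
        return [task]
    plan: list[str] = []
    for sub_task in METHODS[task]:
        plan.extend(decompose(sub_task))
    return plan

print(decompose("plan_trip"))
# ['search_flights', 'purchase_ticket', 'search_hotels', 'reserve_room']
```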

III.C. Guaranteeing Feasibility (Constraint Satisfaction Problems - CSP)

To move agentic planning from "statistically likely to be correct" to "guaranteed valid," tool selection and ordering must be framed as a Constraint Satisfaction Problem (CSP). A CSP is formally defined as a triple \(\langle X, D, C \rangle\), where \(X = \{x_1, \ldots, x_m\}\) is a set of decision variables (e.g., which tool occupies each slot of the plan), \(D\) assigns each variable \(x_i\) a finite domain \(D_i\) of admissible values (the candidate tools or arguments), and \(C\) is a set of constraints restricting which combinations of values the variables may jointly take (dependency ordering, budget limits, mutual exclusions).

The objective is to find an assignment of values to \(X\) such that all constraints in \(C\) are satisfied. LLMs play a crucial role by automating the translation of the natural language problem description into a CSP blueprint, defining the required variables, constraints, and objective functions. This constraint-compliant linguistic optimization is essential because LLMs prioritize statistical likelihood over absolute logical consistency, and the integration of formal constraints prevents the generation of invalid or non-executable plans.
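A deliberately tiny brute-force illustration of this framing (not a production solver, and with invented tools and costs): the variables are plan slots, the domain is the tool set, and the constraints encode dependency ordering and a budget cap.

```python
from itertools import permutations

COST = {"geocode": 1, "get_weather": 2, "book_hotel": 5}   # illustrative per-tool costs
DEPENDS_ON = {"get_weather": "geocode"}                    # get_weather requires geocode earlier
BUDGET = 9

def satisfies(plan: tuple[str, ...]) -> bool:
    # C1: every dependency appears before the tool that needs it.
    for dependent, prerequisite in DEPENDS_ON.items():
        if dependent in plan and (prerequisite not in plan
                                  or plan.index(prerequisite) > plan.index(dependent)):
            return False
    # C2: total cost stays within budget.
    return sum(COST[t] for t in plan) <= BUDGET

# X = ordered plan slots, D = the available tools, C = satisfies().
valid_plans = [p for p in permutations(COST) if satisfies(p)]
print(valid_plans)
```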

III.D. Reliability through Structured Failure Recovery

The inherent non-determinism and fragility of LLMs necessitate formalized recovery mechanisms. Structured Reflection introduces a trainable mechanism for error correction. When a tool execution fails, the agent diagnoses the error based on formal evidence (e.g., failure to meet a grounded precondition) and proposes a corrected, executable follow-up tool call \(t_i'\). This approach transforms the process of moving "from error to repair" into a controlled, optimizable loop.
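A schematic of that loop is shown below; `execute`, `diagnose`, and `repair` are placeholders for the tool runtime and the LLM reflection calls, and the attempt budget keeps the loop controlled.

```python
def run_with_reflection(call, execute, diagnose, repair, max_attempts: int = 3):
    """Execute a tool call; on failure, diagnose against formal evidence and retry a repaired call."""
    for _ in range(max_attempts):
        result, error = execute(call)      # error carries evidence, e.g. the violated precondition
        if error is None:
            return result
        evidence = diagnose(call, error)   # formal analysis of why the call failed
        call = repair(call, evidence)      # corrected, executable follow-up call t_i'
    raise RuntimeError("tool call could not be repaired within the attempt budget")
```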

IV. Contextual Reasoning and Dynamic Execution Plan Optimization

Contextual reasoning allows the agent to integrate real-time constraints and performance metrics into the TDG, transforming the graph into a dynamic execution model by weighting edges and pruning nodes based on task requirements.

IV.A. Dynamic Graph Pruning and Tool Relevance

Contextual information enables the planner to adapt the execution topology based on the specific user request, dynamically removing tools that are available but contextually irrelevant. The Adaptive Graph Pruning (AGP) framework achieves this through a dual optimization strategy: hard pruning, which removes tool nodes that are irrelevant to the current request from the graph entirely, and soft pruning, which down-weights the remaining edges according to their contextual importance.

This task-adaptive approach significantly reduces planning complexity and token consumption. Advanced frameworks use a Tool Dependency Heterogeneous Graph (TDHG), whose node embeddings fuse static API schema structure with historical invocation behavior; this context (reliability, cost) guides a heuristic search strategy toward the most efficient and dependable tool sequence.
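One plausible scoring rule for such a fused edge weight (an illustrative heuristic, not a specific framework's formula) blends static schema compatibility with historical reliability and a latency penalty; the heuristic search then prefers high-scoring edges.

```python
def edge_score(schema_match: float, success_rate: float, latency_ms: float,
               latency_budget_ms: float = 2000.0,
               w_static: float = 0.4, w_reliability: float = 0.4, w_cost: float = 0.2) -> float:
    """Fuse static structure and observed behavior into a single edge weight in [0, 1]."""
    cost_term = max(0.0, 1.0 - latency_ms / latency_budget_ms)
    return w_static * schema_match + w_reliability * success_rate + w_cost * cost_term

# A well-matched but historically flaky edge scores lower than a reliable alternative.
print(edge_score(schema_match=0.9, success_rate=0.5, latency_ms=800))   # ~0.68
print(edge_score(schema_match=0.8, success_rate=0.95, latency_ms=800))  # ~0.82
```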

IV.B. Cost-Aware Planning and Scheduling Optimization

In production environments, execution costs are key variables that must be minimized alongside successful task completion.

Modeling Sequential Costs

Cost-aware planning is a sequential decision problem where the selection of tool \(t_i\) affects the budget for subsequent actions \(t_{i+1}, \ldots\). Let \(C(t_i)\) be the computational cost (e.g., latency, token consumption) of executing tool \(t_i\). The agent seeks to minimize the total expected cost of the plan \(\Pi = \langle t_1, \ldots, t_k \rangle\):

\[\min_{\Pi} \; \mathbb{E}\left[ \sum_{i=1}^{k} C(t_i) \right]\]

Reinforcement Learning (RL) algorithms are used to fine-tune the LLM policy, training the agent to generate plans that consider the cascading effect of current tool choices.
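Before reaching for RL, the objective itself can be illustrated by an exhaustive comparison of alternative tool chains that reach the same goal, selecting the one with minimum total cost under the additive model above (tool names and costs are invented).

```python
# Two alternative chains that reach the same goal state.
CANDIDATE_PLANS = [
    ("geocode", "weather_premium_api", "build_itinerary"),   # low latency, high API fee
    ("geocode", "weather_free_api", "build_itinerary"),      # higher latency, no fee
]
COST = {"geocode": 1.0, "weather_premium_api": 6.0,
        "weather_free_api": 2.5, "build_itinerary": 1.0}

def plan_cost(plan: tuple[str, ...]) -> float:
    return sum(COST[t] for t in plan)

best = min(CANDIDATE_PLANS, key=plan_cost)
print(best, plan_cost(best))   # the cheaper chain wins under a purely additive cost model
```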

Minimizing Latency with Critical Path Method (CPM)

By assigning estimated execution durations \(D(t_i)\) to each tool node, the TDG can be analyzed using the Critical Path Method (CPM). The critical path \(P_{\text{crit}}\) identifies the longest sequence of dependent activities, determining the minimum total execution time \(T_{CP}\):

\[T_{CP} = \max_{P \in \text{Paths}(G)} \sum_{t_i \in P} D(t_i)\]

CPM identifies the critical tasks that must be prioritized. For maximizing inference throughput, techniques resembling shortest-job-first (SJF) scheduling are employed, using learning-to-rank models to prioritize requests by their predicted remaining generation lengths.
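The critical path of a small TDG can be computed with a single longest-path pass over the DAG in topological order; durations below are illustrative.

```python
from graphlib import TopologicalSorter

DURATION = {"geocode": 0.3, "get_weather": 1.2, "search_flights": 2.0, "build_itinerary": 0.5}
TDG = {"geocode": set(), "search_flights": set(),
       "get_weather": {"geocode"}, "build_itinerary": {"get_weather", "search_flights"}}

# Earliest finish time of each tool = its duration plus the latest finish among its dependencies.
finish: dict[str, float] = {}
for tool in TopologicalSorter(TDG).static_order():
    finish[tool] = DURATION[tool] + max((finish[d] for d in TDG[tool]), default=0.0)

t_cp = max(finish.values())
print(f"T_CP = {t_cp}")   # 2.5 here: the 'search_flights' -> 'build_itinerary' chain dominates
```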

| Criteria | Metric/Mechanism | Algorithm Applied | Impact on TDG |
|---|---|---|---|
| Computational cost | Token overhead, latency, API fees | Reinforcement Learning (RL), Dynamic Programming (DP) | Edge weighting (cost-function minimization), sequential decision policy refinement |
| Execution time | Critical path duration, tool I/O latency | Critical Path Method (CPM) | Identifies bottleneck nodes/edges, prioritizes parallelization |
| Tool relevance | Task-specific necessity, redundancy | Adaptive Graph Pruning (AGP), hard/soft pruning | Dynamic topology construction; elimination of irrelevant nodes/edges \(T \to T'\) |
| Reliability/Quality | Historical success rate, QoC (accuracy, variance) | Relative Reputation (RR), causal matrix adjustment | Edge weighting (trust/risk maximization), tool selection filtering |

Table III: Contextual Optimization Criteria and Techniques

V. Conflict Resolution and Robustness in Multi-Tool Systems

Conflicts are inevitable in multi-tool systems and must be managed through structured protocols and sophisticated arbitration.

V.A. Identifying and Classifying Tool Conflicts

Conflicts typically fall into three classes: resource conflicts, in which multiple tools or agents contend for the same limited resource (rate limits, shared state, budget); knowledge conflicts, in which tools return contradictory information about the same entity; and goal conflicts, in which individual agent objectives are misaligned with one another or with the global task objective.

V.B. Strategic Resolution Mechanisms

Conflict management moves beyond simple failure detection to strategic resolution. In multi-agent systems, resolution is often achieved through negotiation protocols, for example bidding schemes in which agents compete for contested resources or task assignments, thereby dynamically managing contention and allocation.

Alternatively, irreconcilable conflicts are resolved through rule-based arbitration. These policies, which can be custom-defined or automatically generated by LLMs, dictate prioritization. For safety-critical systems, LLM-generated rules have demonstrated effectiveness in operating over execution plans to enforce compliance and mitigate high-risk actions.
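A minimal illustration of rule-based arbitration follows; the rules themselves are invented and would, in practice, be authored by operators or synthesized by the LLM. Ordered rules are evaluated against a proposed action and the first match decides.

```python
from typing import Callable

# Ordered arbitration rules: (predicate over a proposed action, verdict). First match wins.
RULES: list[tuple[Callable[[dict], bool], str]] = [
    (lambda a: a.get("irreversible", False) and not a.get("human_approved", False), "block"),
    (lambda a: a.get("estimated_cost", 0) > 100, "escalate"),
    (lambda a: True, "allow"),                      # default rule
]

def arbitrate(action: dict) -> str:
    for predicate, verdict in RULES:
        if predicate(action):
            return verdict
    return "allow"

print(arbitrate({"tool": "delete_records", "irreversible": True}))   # -> 'block'
```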

V.C. Enhancing Reliability with Auditable Execution and Recovery

Due to the inherent non-determinism of LLMs, the creation of reliable, auditable execution trails is critical for high-stakes applications. Modeling tool and agent interactions as a causal graph enables precise causal tracing, allowing the system to isolate and trace failures back to the specific originating agent or interaction responsible for the faulty state change, supporting deep root cause analysis in cascaded workflows.
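In this setting, tracing reduces to walking the recorded interaction graph backwards from the faulty step; the sketch below collects every upstream ancestor of a failed node (the graph and step names are illustrative).

```python
# Recorded causal graph of one workflow run: step -> set of parent steps that produced its inputs.
PARENTS = {
    "summarize_report": {"fetch_report"},
    "fetch_report": {"authenticate", "resolve_account"},
    "resolve_account": {"authenticate"},
    "authenticate": set(),
}

def upstream_causes(failed: str) -> set[str]:
    """All ancestor steps whose outputs could have contributed to the faulty state."""
    seen: set[str] = set()
    stack = list(PARENTS.get(failed, set()))
    while stack:
        step = stack.pop()
        if step not in seen:
            seen.add(step)
            stack.extend(PARENTS.get(step, set()))
    return seen

print(upstream_causes("summarize_report"))   # candidate root causes for deeper analysis
```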

VI. Conclusion and Future Directions

VI.A. Synthesis: The Neuro-Symbolic Planning Stack

The development of high-performance agentic systems demands an integrated, neuro-symbolic architecture that leverages the strengths of both LLMs and formal planning methods. This synthesis requires:

  1. Automated Modeling: Utilizing LLMs to infer dynamic dependencies (semantic or causal) and synthesize formal constraints (such as PDDL or rigorous JSONSchema descriptions) for TDG construction, overcoming the labor-intensive nature of manual formalization
  2. Flexible Planning: Employing advanced algorithms like Partial Order Planning (POP) and Hierarchical Task Networks (HTN) to maximize parallel execution and manage task complexity, moving beyond the constraints of sequential topological sorting
  3. Dynamic Optimization: Integrating contextual metrics—including cost, execution time (CPM), and historical reliability—to weight the TDG, utilizing scheduling techniques (RL, DP) to generate resource-optimized execution plans that are not just valid, but economically and temporally efficient

The integration of constraint satisfaction principles, achieved through formal language supervision (e.g., Formal-LLM), is the mechanism that ensures plan validity, transforming LLM planning output from probabilistic suggestions to guaranteed executable plans, essential for enterprise reliability.

VI.B. Open Challenges and Research Trajectories

While significant advancements have been made in TDG construction and resolution, several key challenges remain for generalizable, robust deployment:

  1. Standardization of Tool Metadata: A universal adoption of explicit Precondition-Effect-Result (P-E-R) schemas, coupled with standardized fields for cost metadata and dynamic reliability scores, is required to facilitate large-scale, automated TDG inference and cross-framework optimization
  2. Real-time Cost Prediction Accuracy: Further research is needed to improve the accuracy of learning-to-rank models and RL frameworks in predicting dynamic operational costs (e.g., variable token consumption under load or API latency variance) for proactive and highly efficient scheduling
  3. Generalizing Self-Correction: Developing structured reflection and error recovery techniques that are highly generalizable across a diverse array of tool failure modes and domain types is necessary to reduce the brittleness often observed in multi-turn tool interaction

About the Author: Gordi (Ghodrat) Aalipour is a Principal Applied Scientist at Gusto, specializing in agentic AI systems, LLM orchestration, and knowledge graph integration. With a background in both mathematics and computer science, he brings a unique perspective to the theoretical foundations and practical implementations of modern AI systems.