This is the full educational implementation of the example from the article "How an agent decides what to do next (Planning vs Reactive)".
If you haven't read the article yet, start there. Here the focus is only on the code and on how the two strategies behave.
## What this example demonstrates

- How the same task is executed with two approaches: Planning and Reactive
- How the planning agent first builds a plan and, on failure, rebuilds it
- How the reactive agent chooses the next action after each result
- Why reactive tends to be more resilient to flakes, while planning is easier to control
## Project structure

```text
foundations/
└── planning-vs-reactive/
    └── python/
        ├── main.py            # runs both strategies and compares the results
        ├── llm.py             # simple decision layer: plan / replan / next action
        ├── planning_agent.py  # agent with an upfront plan
        ├── reactive_agent.py  # agent that acts based on the situation
        ├── tools.py           # tools + deterministic flake for learning
        └── requirements.txt
```
`tools.py` intentionally includes a controlled (deterministic) failure, so the difference between the approaches is reproducible on every run.
## How to run

1. Clone the repository and enter the example folder:

   ```shell
   git clone https://github.com/AgentPatterns-tech/agentpatterns.git
   cd agentpatterns/foundations/planning-vs-reactive/python
   ```

2. Install dependencies (this example has no external packages):

   ```shell
   pip install -r requirements.txt
   ```

3. Run the comparison:

   ```shell
   python main.py
   ```
## What we build in code

We create two robots and give them the same task.

- the first robot builds a plan first and then follows it
- the second robot decides the next step during execution
- when something fails, we watch who adapts faster

It is like two trips: one travels with a prepared map, the other finds its way along the road.
## Code
### `tools.py` — tools with a deterministic flake
```python
from typing import Any


def make_initial_state(user_id: int) -> dict[str, Any]:
    # Deterministic flake config for teaching: orders fails once, then succeeds.
    return {
        "user_id": user_id,
        "_flaky": {
            "orders_failures_left": 1,
            "balance_failures_left": 0,
        },
    }


def fetch_profile(state: dict[str, Any]) -> dict[str, Any]:
    user_id = state["user_id"]
    return {
        "profile": {
            "user_id": user_id,
            "name": "Anna",
            "tier": "pro",
        }
    }


def fetch_orders(state: dict[str, Any]) -> dict[str, Any]:
    flaky = state["_flaky"]
    if flaky["orders_failures_left"] > 0:
        flaky["orders_failures_left"] -= 1
        return {"error": "orders_api_timeout"}
    return {
        "orders": [
            {"id": "ord-1001", "total": 49.9, "status": "paid"},
            {"id": "ord-1002", "total": 19.0, "status": "shipped"},
        ]
    }


def fetch_balance(state: dict[str, Any]) -> dict[str, Any]:
    flaky = state["_flaky"]
    if flaky["balance_failures_left"] > 0:
        flaky["balance_failures_left"] -= 1
        return {"error": "billing_api_unavailable"}
    return {"balance": {"currency": "USD", "value": 128.4}}


def build_summary(state: dict[str, Any]) -> dict[str, Any]:
    profile = state.get("profile")
    orders = state.get("orders")
    balance = state.get("balance")
    if not profile or not orders or not balance:
        return {"error": "not_enough_data_for_summary"}
    text = (
        f"User {profile['name']} ({profile['tier']}) has "
        f"{len(orders)} recent orders and balance {balance['value']} {balance['currency']}."
    )
    return {"summary": text}
```
### `llm.py` — a simple educational decision layer
```python
from typing import Any

DEFAULT_PLAN = ["fetch_profile", "fetch_orders", "fetch_balance", "build_summary"]


def create_plan(task: str) -> list[str]:
    # Learning version: fixed starter plan keeps behavior easy to reason about.
    _ = task
    return DEFAULT_PLAN.copy()


def replan(task: str, state: dict[str, Any], failed_step: str, error: str) -> list[str]:
    # Learning version: rebuild plan from missing data in state.
    _ = task, failed_step, error
    remaining: list[str] = []
    if "profile" not in state:
        remaining.append("fetch_profile")
    if "orders" not in state:
        remaining.append("fetch_orders")
    if "balance" not in state:
        remaining.append("fetch_balance")
    if "summary" not in state:
        remaining.append("build_summary")
    return remaining


def choose_next_action(task: str, state: dict[str, Any]) -> str:
    # Learning version: one-step-at-a-time policy driven by current state.
    _ = task
    if "profile" not in state:
        return "fetch_profile"
    # If orders just failed, fetch other missing data first.
    if state.get("last_error") == "orders_api_timeout" and "balance" not in state:
        return "fetch_balance"
    if "orders" not in state:
        return "fetch_orders"
    if "balance" not in state:
        return "fetch_balance"
    return "build_summary"
```
`llm.py` makes no external API calls here. This is an intentional simplification for learning: it makes the difference between planning and reactive easier to see.
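To see the reactive policy sidestep a failed dependency, the snippet below repeats `choose_next_action` verbatim so it runs on its own:

```python
from typing import Any


# Standalone copy of choose_next_action from llm.py, so this snippet is runnable.
def choose_next_action(task: str, state: dict[str, Any]) -> str:
    _ = task
    if "profile" not in state:
        return "fetch_profile"
    # If orders just failed, fetch other missing data first.
    if state.get("last_error") == "orders_api_timeout" and "balance" not in state:
        return "fetch_balance"
    if "orders" not in state:
        return "fetch_orders"
    if "balance" not in state:
        return "fetch_balance"
    return "build_summary"


# Empty state: the policy starts with the profile.
print(choose_next_action("summary task", {}))  # fetch_profile

# Profile present, orders just timed out: the policy defers the retry
# and fetches the balance first instead of hammering the failing API.
state = {"profile": {"name": "Anna"}, "last_error": "orders_api_timeout"}
print(choose_next_action("summary task", state))  # fetch_balance
```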
### `planning_agent.py` — plan first, then execute
```python
from typing import Any

from llm import create_plan, replan
from tools import build_summary, fetch_balance, fetch_orders, fetch_profile, make_initial_state

TOOLS = {
    "fetch_profile": fetch_profile,
    "fetch_orders": fetch_orders,
    "fetch_balance": fetch_balance,
    "build_summary": build_summary,
}


def run_planning_agent(task: str, user_id: int, max_steps: int = 8) -> dict[str, Any]:
    state = make_initial_state(user_id)
    plan = create_plan(task)
    trace: list[str] = [f"Initial plan: {plan}"]
    step = 0

    while plan and step < max_steps:
        action = plan.pop(0)
        step += 1
        trace.append(f"[{step}] action={action}")

        tool = TOOLS.get(action)
        if not tool:
            trace.append(f"unknown_action={action}")
            state["last_error"] = f"unknown_action:{action}"
            continue

        result = tool(state)
        trace.append(f"result={result}")

        if "error" in result:
            state["last_error"] = result["error"]
            trace.append("planning: replan after failure")
            plan = replan(task, state, failed_step=action, error=result["error"])
            trace.append(f"new_plan={plan}")
            continue

        state.update(result)
        state.pop("last_error", None)

    if "summary" in state:
        return {"mode": "planning", "done": True, "steps": step, "state": state, "trace": trace}
    return {"mode": "planning", "done": False, "steps": step, "state": state, "trace": trace}
```
The planning agent makes one strategic decision upfront and only falls back to rebuilding the plan when a step fails.
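The replanning step can be exercised in isolation. This sketch inlines a simplified copy of the logic (the `task`, `failed_step`, and `error` arguments are dropped, since the learning version ignores them), so it runs standalone:

```python
from typing import Any


# Simplified standalone copy of the replan logic from llm.py:
# rebuild the plan from whatever data is still missing in state.
def replan(state: dict[str, Any]) -> list[str]:
    remaining: list[str] = []
    for key, step in [
        ("profile", "fetch_profile"),
        ("orders", "fetch_orders"),
        ("balance", "fetch_balance"),
        ("summary", "build_summary"),
    ]:
        if key not in state:
            remaining.append(step)
    return remaining


# After fetch_profile succeeded and fetch_orders timed out,
# the rebuilt plan keeps only the steps that still need to run.
print(replan({"profile": {"name": "Anna"}, "last_error": "orders_api_timeout"}))
# ['fetch_orders', 'fetch_balance', 'build_summary']
```

The failed step naturally reappears at the front of the new plan, which is why the planning agent retries `fetch_orders` immediately in the sample trace below.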
### `reactive_agent.py` — a decision at every step
```python
from typing import Any

from llm import choose_next_action
from tools import build_summary, fetch_balance, fetch_orders, fetch_profile, make_initial_state

TOOLS = {
    "fetch_profile": fetch_profile,
    "fetch_orders": fetch_orders,
    "fetch_balance": fetch_balance,
    "build_summary": build_summary,
}


def run_reactive_agent(task: str, user_id: int, max_steps: int = 8) -> dict[str, Any]:
    state = make_initial_state(user_id)
    trace: list[str] = []

    for step in range(1, max_steps + 1):
        if "summary" in state:
            return {"mode": "reactive", "done": True, "steps": step - 1, "state": state, "trace": trace}

        action = choose_next_action(task, state)
        trace.append(f"[{step}] action={action}")

        tool = TOOLS.get(action)
        if not tool:
            trace.append(f"unknown_action={action}")
            state["last_error"] = f"unknown_action:{action}"
            continue

        result = tool(state)
        trace.append(f"result={result}")

        if "error" in result:
            state["last_error"] = result["error"]
            continue

        state.update(result)
        state.pop("last_error", None)

    return {"mode": "reactive", "done": False, "steps": max_steps, "state": state, "trace": trace}
```
The reactive agent does not cling to an initial plan. It evaluates the state after every action and picks the next step from the current state.
### `main.py` — comparing the two approaches
```python
from planning_agent import run_planning_agent
from reactive_agent import run_reactive_agent

TASK = "Prepare a short account summary for user_id=42 with profile, orders, and balance."
USER_ID = 42


def print_result(result: dict) -> None:
    print(f"\n=== {result['mode'].upper()} ===")
    print(f"done={result['done']} | steps={result['steps']}")
    print("summary:", result["state"].get("summary"))
    print("\ntrace:")
    for line in result["trace"]:
        print("  ", line)


def main() -> None:
    planning = run_planning_agent(task=TASK, user_id=USER_ID)
    reactive = run_reactive_agent(task=TASK, user_id=USER_ID)
    print_result(planning)
    print_result(reactive)


if __name__ == "__main__":
    main()
```
### `requirements.txt`

```text
# No external dependencies for this learning example.
```
## Sample output
```text
=== PLANNING ===
done=True | steps=5
summary: User Anna (pro) has 2 recent orders and balance 128.4 USD.

trace:
   Initial plan: ['fetch_profile', 'fetch_orders', 'fetch_balance', 'build_summary']
   [1] action=fetch_profile
   result={'profile': {'user_id': 42, 'name': 'Anna', 'tier': 'pro'}}
   [2] action=fetch_orders
   result={'error': 'orders_api_timeout'}
   planning: replan after failure
   new_plan=['fetch_orders', 'fetch_balance', 'build_summary']
   [3] action=fetch_orders
   result={'orders': [{'id': 'ord-1001', 'total': 49.9, 'status': 'paid'}, {'id': 'ord-1002', 'total': 19.0, 'status': 'shipped'}]}
   [4] action=fetch_balance
   result={'balance': {'currency': 'USD', 'value': 128.4}}
   [5] action=build_summary
   result={'summary': 'User Anna (pro) has 2 recent orders and balance 128.4 USD.'}

=== REACTIVE ===
done=True | steps=5
summary: User Anna (pro) has 2 recent orders and balance 128.4 USD.

trace:
   [1] action=fetch_profile
   result={'profile': {'user_id': 42, 'name': 'Anna', 'tier': 'pro'}}
   [2] action=fetch_orders
   result={'error': 'orders_api_timeout'}
   [3] action=fetch_balance
   result={'balance': {'currency': 'USD', 'value': 128.4}}
   [4] action=fetch_orders
   result={'orders': [{'id': 'ord-1001', 'total': 49.9, 'status': 'paid'}, {'id': 'ord-1002', 'total': 19.0, 'status': 'shipped'}]}
   [5] action=build_summary
   result={'summary': 'User Anna (pro) has 2 recent orders and balance 128.4 USD.'}
```
Note: the example is deliberately deterministic for learning.
On every run `fetch_orders` fails exactly once, which is why the trace reproduces stably.
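The flake is just a counter stored in the shared state. A standalone copy of `fetch_orders` (with a trimmed payload) shows the fail-once-then-succeed behavior:

```python
from typing import Any


# Standalone, payload-trimmed copy of fetch_orders from tools.py:
# the counter in state forces exactly one failure, then the tool succeeds.
def fetch_orders(state: dict[str, Any]) -> dict[str, Any]:
    flaky = state["_flaky"]
    if flaky["orders_failures_left"] > 0:
        flaky["orders_failures_left"] -= 1
        return {"error": "orders_api_timeout"}
    return {"orders": [{"id": "ord-1001"}]}


state = {"_flaky": {"orders_failures_left": 1}}
print(fetch_orders(state))  # first call fails with the timeout
print(fetch_orders(state))  # second call returns the orders
```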
## What you see in practice

| | Planning agent | Reactive agent |
|---|---|---|
| When steps are chosen | Up front (the plan) | After each action |
| Reaction to failure | Rebuilds the plan | Immediately picks a new step |
| Predictability | Higher | Lower |
| Flake resilience | Medium | Usually higher |
## What to change in this example

- Change `orders_failures_left` in `make_initial_state` from `1` to `2` and watch how the trace changes
- Add a separate limit on the number of `replan` calls in the planning agent
- Add a "never repeat the same action 3 times in a row" rule to the reactive agent
- Set `balance_failures_left = 1` and see which agent recovers faster from two different failures
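For the "no repeats" exercise, one possible sketch (the `guard_repeats` helper and its signature are an invention for illustration, not part of the repo):

```python
from collections import deque


# Hypothetical guard for the reactive agent: returns True when choosing
# `action` would make it the `limit`-th identical action in a row.
def guard_repeats(history: deque, action: str, limit: int = 3) -> bool:
    recent = list(history)[-(limit - 1):]
    return len(recent) == limit - 1 and all(a == action for a in recent)


history = deque(maxlen=8)
history.extend(["fetch_orders", "fetch_orders"])
print(guard_repeats(history, "fetch_orders"))   # True -> block the third repeat
print(guard_repeats(history, "fetch_balance"))  # False -> allowed
```

In `run_reactive_agent` you would append each chosen action to such a history and, when the guard fires, either pick a fallback action or abort with a clear error in the trace.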
## Full code on GitHub

The repository contains the complete version of this demo: two agent strategies, shared tools, and step tracing.

View the full code on GitHub ↗