Elephants don't play chess

A classic in AI and robotics by Rodney Brooks (MIT professor, co-founder of iRobot):

There is an alternative route to Artificial Intelligence that diverges from the directions pursued under that banner for the last thirty some years. The traditional approach has emphasized the abstract manipulation of symbols, whose grounding in physical reality has rarely been achieved. We explore a research methodology which emphasizes ongoing physical interaction with the environment as the primary source of constraint on the design of intelligent systems. We show how this methodology has recently had significant successes on a par with the most successful classical efforts. We outline plausible future work along these lines which can lead to vastly more ambitious systems.

I'm particularly interested in how such biological techniques can be applied to software. Rather than performing rigid, omnipotent symbolic reasoning, autonomous agents could instead react to and act within their environment, with global software behavior emerging from those local interactions.


Automata theory

While I have some sympathy for Brooks's thesis, there is a connection with languages here that I've not seen emphasised. The subsumption architecture he proposed was based on finite state automata (augmented with timers) and so is surely only capable of responding to regular sequences of environmental stimuli. This seems like quite a strong theoretical limitation, unless I am missing something?

I think the implementation

I think the implementation he's put forth is a bit dated (this is from 1990, after all!). You can build a subsumption architecture using more powerful primitives; YinYang has an object system that can encode state machines but is fundamentally more powerful than that. However, the basic idea is that grounded intelligence can emerge from a bunch of simple behaviors, possibly lots of state machines.
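As a toy illustration of that basic idea - a hedged sketch in plain Python, not YinYang's object system or anything from Brooks's papers - a single grounded behavior can be nothing more than a tiny state machine that maps current perception straight to a motor command:

    # One reflex behavior as a finite state machine: drive forward until the
    # bumper fires, then turn away until the bumper clears.
    # (All names are illustrative placeholders.)
    class BumpAndTurn:
        def __init__(self):
            self.state = "FORWARD"

        def step(self, bumper_hit):
            if self.state == "FORWARD":
                if bumper_hit:
                    self.state = "TURNING"
                    return "turn_left"
                return "drive_forward"
            # TURNING
            if not bumper_hit:
                self.state = "FORWARD"
                return "drive_forward"
            return "turn_left"

    reflex = BumpAndTurn()
    print(reflex.step(bumper_hit=False))   # drive_forward
    print(reflex.step(bumper_hit=True))    # turn_left

A single machine like this is trivial; the claim is that competence comes from wiring many of them together, not from the expressive power of any one machine.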

Subsumption + extras == subsumption?

Sure, you can add symbolic processing to subsumption to make it more powerful (in the form of a stack, tape, etc), but wasn't that exactly what Brooks was arguing against? Seems to me that "lots of state machines" is no more expressive than a single state machine.

Right, but "a lot of state

Right, but "a lot of state machines" is more modular than a single state machine. To me, a subsumption architecture doesn't address a problem of power, it addresses a problem of modularity in encoding intelligent behavior.

I don't think that's how

I don't think that's how subsumption was pitched, and indeed, as I understand it, lack of modularity seems to have been one of the factors that led to development of hybrid architectures that brought back a level of symbolic reasoning. I think the most powerful aspect of Brooks's work is not subsumption, but the critique of attempts to build general "perception", "reasoning" etc. capabilities as if these were separable concerns.

Modularity

Brooks's subsumption architecture is 'modular' in the sense that the system is described by wiring together components with narrow interfaces and relatively rich internal behavior. (That is, you could break development work down into a bunch of modules.) Unfortunately, the model is neither extensible nor compositional nor scalable, so you don't gain much advantage from that modularity.

I agree with the vision Brooks is pitching - that 'intelligent' behavior emerges from coordination of simpler behaviors, that there must be a tight relationship between perception and action - these are beliefs I also hold. Symbolic reasoning can be involved on the path, so long as that reasoning remains modular. Multi-agent systems fulfill this general vision without the limitations imposed by the subsumption architecture.

Unfortunately, the model is

Unfortunately, the model is neither extensible nor compositional nor scalable, so you don't gain much advantage from that modularity.

Indeed, not really modular at all.

Edit: To clarify, I don't believe that modularity was a strong distinguishing feature of subsumption. It's not as if preceding systems weren't modular. It's just that they were often based on a top-down a priori decomposition into grand general components with labels like "reasoning system", "perception", "executive" etc., which didn't really make much sense when you came to solve real problems (in robotics; they were often very useful in other domains). The "revolution" of subsumption and behaviour-based AI was that you really wanted decomposition on a per-behaviour/task basis, and so you can't design a general perception (etc) module without first deciding what it is you want to perceive, and why.

The arguments against symbolic reasoning seem to me to be more about minimising the amount of potentially stale information kept about the world, which is nicely achieved by reactive planning (e.g., teleo-reactive programs), without throwing the symbolic baby out with the bathwater.
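For concreteness, here is a hedged sketch (plain Python, with hypothetical predicates and actions) of the teleo-reactive idea: an ordered list of condition-action rules, continuously re-evaluated against fresh perception, so very little stale world state ever accumulates:

    # The first rule whose condition holds fires; rules are ordered so that
    # higher rules are closer to the goal and lower rules work toward
    # enabling the rules above them.
    def tr_step(rules, percept):
        for condition, action in rules:
            if condition(percept):
                return action
        raise RuntimeError("no rule applies")

    # Hypothetical goal: dock at the charger.
    rules = [
        (lambda p: p["at_charger"],      "dock"),
        (lambda p: p["charger_visible"], "drive_toward_charger"),
        (lambda p: True,                 "wander"),
    ]

    print(tr_step(rules, {"at_charger": False, "charger_visible": True}))
    # -> drive_toward_charger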

Modularity of Subsumption

Granted, subsumption is about a particular form of modular decomposition (into smaller, independent behaviors) whereas the symbolic reasoning techniques tended to 'modularize' (if you can call it that) into cross-cutting aspects.

But I believe that this modularity of behavior, eliminating that cross-cutting reasoning, is among the relevant distinguishing features. After all, there are ways we could support cross-cutting tweaks to our 'behavior' decompositions, yet the subsumption architecture doesn't support you in these patterns.

The arguments against symbolic reasoning seem to me to be more about minimising the amount of potentially stale information kept about the world

That wasn't my impression. 'Symbolic reasoning' as a mechanism for system control requires a fairly comprehensive model of both the world and the robot's capabilities. The motivation Brooks describes is to skip this step and create a tighter binding between perception and action. Memory and stale sensory information and structured values (on the wires) still fit into the subsumption architecture; the issue was that the robot should not possess a model of itself nor of how its own behaviors affect the world - i.e. that a robot's behavior model fundamentally decomposes into 'doing and reacting' as opposed to 'imagining'.

I don't believe symbolic

I don't believe symbolic reasoning necessarily means a full-blown world model, although that was perhaps the direction things were heading, based on STRIPS planners and the like. My reading of Brooks's work, with its emphasis on a tight connection between perception and action, was that it ruled out even modest amounts of state, such as a pushdown stack, but perhaps that is an overly constrained reading. Certainly, advances since subsumption have overcome these limitations, so perhaps I'm being too critical.

Definition of Subsumption Architecture:

The original introduction to the subsumption architecture, A Robust Layered Control System for a Mobile Robot, should help clarify its emphasis and the reasoning behind it. [with some paraphrasing]:

Rather than slice the problem on the basis of internal workings of the solution, we slice the problem on the basis of desired external manifestations of the robot control system. To this end, we have defined a number of levels of competence for an autonomous robot. A level of competence is an informal specification of a desired class of behaviors for a robot over all environments it will encounter.

  1. Avoid contact with objects
  2. Wander aimlessly without hitting things
  3. "Explore" the world by seeing places in the distance that seem reachable, and heading for them
  4. Build a map of the environment and plan routes from one place to another
  5. Notice changes in the "static" environment
  6. Reason about the world in terms of identifiable objects and perform tasks related to certain objects
  7. Formulate and execute plans which involve changing the state of the world in some desirable way
  8. Reason about the behavior of objects in the world and modify plans accordingly

The key idea of levels of competence is that we can simply add a new layer to an existing stack to move to the next higher level of overall competence. We start by building a complete robot system that achieves level 1 competence. It is debugged thoroughly. We never alter that system. Next we build another control layer. It is able to examine data from the level 1 system, and is also permitted to inject data into the internal interfaces of level 1, suppressing the normal data flow. The level 1 system continues to run unaware of the layer above it, which sometimes interferes with its data paths.

The same process is repeated to achieve higher levels of competence. We call this architecture a subsumption architecture.

Basically, we can understand the subsumption architecture as starting with low-level, jellyfish-like reflex behaviors, then progressively introducing higher layers that can suppress those reflexes in favor of more intelligent behavior. The higher layer, in turn, can also siphon off the original data and decisions of the lower layer, allowing influence in both directions. At the highest levels, it is imagined that we would eventually have truly intelligent behaviors.

The goal is that "the system can be partitioned at any level, and the layers below form a complete operational control system".
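A hedged sketch of that layering in Python (hypothetical names, nothing from Brooks's actual implementation): level 1 is a complete control system by itself, and level 2 runs on top, suppressing level 1's output when it wants to, without level 1 ever being altered or made aware of it:

    def level1_avoid(percept):
        # Level 1 competence: avoid contact with objects.
        return "turn_away" if percept["obstacle_near"] else "halt"

    def level2_wander(percept, lower_command):
        # Level 2 competence: wander aimlessly without hitting things.
        # It may suppress the lower command, but never reaches inside level 1.
        if lower_command == "turn_away":
            return lower_command           # let the reflex win
        return "drive_random_heading"      # suppress 'halt' and wander instead

    def control(percept, top_level=2):
        cmd = level1_avoid(percept)
        if top_level >= 2:
            cmd = level2_wander(percept, cmd)
        return cmd

    # Partition at any level: with top_level=1 the robot is still a complete
    # (if boring) operational control system.
    print(control({"obstacle_near": False}, top_level=1))  # halt
    print(control({"obstacle_near": True},  top_level=2))  # turn_away
    print(control({"obstacle_near": False}, top_level=2))  # drive_random_heading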

In Elephants don't play Chess, the description of a subsumption architecture is much briefer:

A subsumption program is built on a computational substrate that is organized in a series of incremental layers, each, in the general case, connecting perception to action. In our case, the substrate is networks of finite state machines augmented with timing elements.

The Elephants paper also describes some changes from the original subsumption architecture to achieve a bit more vertical modularity - i.e. the introduction of 'behaviors' "as a way of grouping AFSMs into more manageable units with the capability for whole units being selectively activated or deactivated". ("Behaviors act as abstraction barriers; one behavior cannot reach inside another.") When I earlier said that subsumption is 'modular', I was speaking of this 'new' behavior-level modularity.

Brooks's definitions in both papers suggest that the choice of FSMs is orthogonal to the subsumption architecture. There are, of course, plenty of valid reasons to favor FSMs - e.g. fixed space costs, real-time decisions, easy debugging and system validation. Similarly, a glance at the higher envisioned 'layers' should make it clear that rich world-modeling was not being excluded from Brooks's vision, rather that we should have no need to 'plan' our reflex behaviors, and should put most of the planning at a much higher level than precise motion control.
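To make the 'augmented' part concrete - again a hedged Python sketch rather than the actual substrate from the paper - here is one AFSM whose behavior is driven partly by a timing element rather than by fresh perception:

    # Back-up reflex: after a bump, reverse for a fixed number of ticks, then
    # go quiet again and let other machines' signals pass through.
    class BackUpAFSM:
        BACKUP_TICKS = 20                   # the timing element

        def __init__(self):
            self.state = "IDLE"
            self.timer = 0

        def step(self, bumper_hit):
            if self.state == "IDLE":
                if bumper_hit:
                    self.state, self.timer = "BACKING", self.BACKUP_TICKS
                    return "reverse"
                return None                 # no output this tick
            self.timer -= 1                 # BACKING: timer-driven, not percept-driven
            if self.timer <= 0:
                self.state = "IDLE"
                return None
            return "reverse"

    # The Elephants paper's 'behaviors' group such machines into units that
    # can be activated or deactivated as a whole.
    behaviors = {"escape": [BackUpAFSM()]}
    active = {"escape": True}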

In his planning paper, his argument seems to be with respect to his subsumption architecture (which was developed a year earlier) - i.e. his argument amounts to: "there seems to be no 'level of competence' where a generic planner wouldn't better be broken into finer grained competencies". As I argue below, this was before behavior-level modularity (which was developed two years later) and does not consider the utility of 'observing' (and potentially 'subsuming') planned behaviors within and across layers.

Similarly, a glance at the

Similarly, a glance at the higher envisioned 'layers' should make it clear that rich world-modeling was not being excluded from Brooks's vision, rather that we should have no need to 'plan' our reflex behaviors, and should put most of the planning at a much higher level than precise motion control.

Right, but is that really much different to any previous architecture? I'm not aware of any that proposed planning for precise motion control.

This view also seems at odds with Brooks's views expressed elsewhere, e.g., in Intelligence without Reason where he argues (p2):

An observer can legitimately talk about an agent's beliefs and goals, even though the agent need not manipulate symbolic data structures at run time. A formal grounding in semantics can be compiled away.

(My emphasis). It seems to me that an agent that does "not manipulate symbolic data structures at run time" is precisely limited to finite state automata, as even a simple stack is a symbolic data structure.

Differences from previous architectures

is that really much different to any previous architecture?

I would expect so, since Brooks (in his Robust Layered Control paper) is arguing against a 'horizontal' decomposition, where planning is in the middle between sensors and actuators (i.e. such that if you removed planning, you'd no longer have a working system). However, I haven't actually poked through history to verify that what Brooks is discouraging was in common use at the time.

view also seems at odds with Brooks's views expressed elsewhere

Elsewhen, too. That paper is five years after the Robust Layered paper, which mentions planning, and is more in accordance with the Planning paper mentioned by Z-bo below. A person's vision can change in five years. (I know mine did! new pair of glasses, a new paradigm, et al.)

In any case I expect that Brooks, like most intelligent people, easily partitions his views: on one hand he has an understanding of the subsumption architecture that allows the possibility of planning at higher levels; on the other hand, he has since developed an argument for why we don't need symbolic planners at any level. Brooks hasn't expressed any confusion that blends the two views, so why should you?

You can see below what I think about Brooks' 1991 arguments. My position on planning also doesn't really require that agents 'manipulate symbolic data structures', just the pervasive ability to look ahead a bit and act on expected future conditions (similar to LL(k) parsers).

I may be wrong, but I

I may be wrong, but I recollect being told that the Intelligence without Reason and Intelligence without Representation papers were written several years before they were published. Still, I take your point that subsumption per se doesn't preclude planning and symbolic reasoning.

I've been interested for some time in applying automata and language theory to agents as I've not seen that approach emphasised before and it appears to offer much relevant theory when comparing agent architectures, allowing categorisation into "regular agents", "context free agents" and so forth. Your work also looks interesting. Do you blog or publish papers I could read?

Automata and Language Theory for Agents

I've been interested for some time in applying automata and language theory to agents as I've not seen that approach emphasised before

That surprises me. There is a ton of material out there, e.g. regarding multi-agent grammar systems. I provide a few links at the bottom of an earlier post.

categorisation into "regular agents", "context free agents" and so forth

I can't say I've spent any time trying to classify individual agents. I like multi-agent systems, but agent-oriented systems (which generally emphasize rich features and inheritance rules for individual agents) somehow really put me off. I.e. I feel the entire effort with BDI agents is wasted. The value for multi-agent systems comes not from the individual agent, but rather from the relationships between agents and properties of composition.

Do you blog or publish papers I could read?

No, though I'm in the process of starting a blog since you're about the tenth person who's asked.

Grammars and agents

Yes, my interest is much more on individual agents (BDI or otherwise). My impression has been that grammar formalisms have mostly been applied to coordination and communication in multi-agent systems (and tangentially related areas such as plan recognition), whereas I'm more interested in the expressivity of individual agents. I'll pursue your references though, as I'd be happy to be proved wrong.

Good luck with the blog. I look forward to it. On the MAS front I'd be interested in your take on formalisms like Cohen & Levesque's joint intentions work (and Bratman's theoretical underpinnings) or the SharedPlans approach, but this is probably getting too OT for this thread.

Planning is Just a Way of Avoiding Figuring Out What To Do Next

Sean, since you are into Kodu now, you may also like to read:

Planning is Just a Way of Avoiding Figuring Out What To Do Next

Abstract. The idea of planning and plan execution is just an intuition based decomposition. There is no reason it has to be this way. Most likely in the long term, real empirical evidence from systems we know to be built that way (from designing them like that) will determine whether it's a very good idea or not. Below that level we get a choice of whether to slot in another planner or to place a program which does the right thing. Why stop there? Maybe we can go up the hierarchy and eliminate the planners there too. To do this we must move from a state based way of reasoning to a process based way of acting.

I've read all of Brooks'

I've read all of Brooks' older papers and find them fascinating. Why can't we write papers like that anymore? Grr....

Sharing Plans is a Fine Basis for Wait-Free Coordination

Brooks argues in this paper that "plans provide a useful level of abstraction for a designer or observer of a system, but provide nothing to a robot operationally". Yet a year earlier, in A robust layered control system for a mobile robot, he describes the initial subsumption architecture - which essentially describes a robot operationally as a wired-up set of observers.

That contradiction is not lost on me. Tossing out planners because they seem artificial in some architectures: baby. bathwater. both gone.

I've been developing a few programming models (temporal RDP and temporal generative grammars) that use 'plans' quite pervasively. In these models, every input to an agent is actually a 'plan of input', and every output is actually a 'plan of output'. In a sense, every agent becomes a planner, albeit one limited to local reasoning from inputs. For stateful agents, we must also compute an agent's 'planned' state.

The system operates under a useful constraint: an agent's plan of output becomes its actual behavior unless there are corresponding changes to the agent's plan of input - that is, action remains tightly coupled to perception. However, unlike in a simple messaging model, agents may effectively anticipate one another and coordinate future behaviors, without any need for expensive waits, handshakes, timeouts, and synchronization protocols. If one agent 'inhibits' another, even that is planned out in advance, allowing an agent to know 'when' it will be inhibited and therefore plan accordingly.

A practical implementation of this still requires some delays. However, these become logical delays in the planned behavior, adequately modeling the time it takes to communicate changes and compute a new plan. Agents never need to actually wait upon one another.

With plans directly accessible to agent behaviors, we can also anticipate the future - i.e. base present behavior upon predicted future observations. The cost, of course, is accuracy - since our predictions might change on us. Still, for robotics and many other domains, this is a major boon: there are a lot of problems where correcting for the occasional error is an acceptable tradeoff for improved latency.

Another advantage is efficiency - i.e. when an observation changes predictably but rapidly, we don't need to communicate periodic updates. We just communicate the prediction, then communicate an update when our predictions change. In the case of sensor data, we might need to filter it through a world model, but we could 'predict' trajectories of moving objects and be accurate for tens of seconds at a time, potentially saving two orders of magnitude or more in computation costs relative to periodic updates.
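A hedged sketch of that efficiency argument (plain Python, not the author's actual temporal RDP implementation): publish a constant-velocity prediction instead of every observation, and publish again only when reality drifts outside a tolerance:

    def predict(prediction, t):
        # Evaluate a (t0, position, velocity) prediction at time t.
        t0, pos, vel = prediction
        return pos + vel * (t - t0)

    def track(observations, tolerance=0.5):
        # observations: list of (t, position); returns the predictions published.
        published, current, prev = [], None, None
        for t, pos in observations:
            if current is None or abs(predict(current, t) - pos) > tolerance:
                vel = 0.0 if prev is None else (pos - prev[1]) / (t - prev[0])
                current = (t, pos, vel)
                published.append(current)   # one message instead of one per tick
            prev = (t, pos)
        return published

    # A steadily rolling object needs only two messages: an initial fix and a
    # corrected prediction once its velocity is apparent.
    obs = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0), (4, 4.1)]
    print(len(track(obs)), "messages for", len(obs), "observations")  # 2 for 5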

Brooks's objection was more against using a separate, intelligent 'planner' that somehow has global knowledge of the world and how to manipulate the system. My vision binds planning tightly to the behavior model, and keeps it internally focused (an agent only plans its own behavior). This doesn't contradict the argument Brooks makes for objecting to planning. However, I certainly reject the title of his paper: planning is not just a way to avoid figuring out what to do next.

You can present memories of

You can present memories of the past and predictions about the future as perception. You could create different kinds of perceptions that aggregate and extrapolate; e.g., using reaction diffusion to compute paths to some locations (anti-objects). In any case, you can enhance perception so that the planning aspect of your behavior is embedded in perception rather than being an explicit part of the behavior. You can see some of this happening in Kodu's support for engrams where a robot can "remember" where a target is even after it leaves the robot's line of sight.
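A hedged sketch of that last point (plain Python, not Kodu's actual engram machinery): the perceptual layer remembers the last place the target was seen, so the behavior keeps 'perceiving' it after line of sight is lost:

    class TargetMemory:
        def __init__(self):
            self.last_seen = None

        def perceive(self, raw_sighting):
            # raw_sighting: (x, y) when the target is in view, else None.
            if raw_sighting is not None:
                self.last_seen = raw_sighting
            return self.last_seen      # remembered position stands in for perception

    memory = TargetMemory()
    print(memory.perceive((3, 4)))   # (3, 4) - in sight
    print(memory.perceive(None))     # (3, 4) - out of sight, recalled from memory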

I'm not really getting RDP. How can you encode planning as an input and output... and how can that be elegant and simple? I'd really like to see some concrete examples.

Re: 'enhanced' perception

You can present memories of the past and predictions about the future as perception.

True. Unfortunately, it is not trivial to do so in a compositional manner that works effectively across multiple agents and stateful behaviors, much less across independently developed libraries and services. All you need is a 'quick' dip in a Turing tarpit - anyone can do it, but sane people won't put more than a toe in... i.e. in practice, you'd end up with very 'shallow' predictions and coordination, similar to how developers in OO languages often achieve very 'shallow' reactivity via observer pattern.

To achieve the aforementioned features in practice requires model-level support. Presenting 'planned' futures as 'enhanced' perception is essentially what I do for temporal RDP... just, pervasively and correctly.

How can you encode planning as an input and output?

'Planning' (verb) is encoded in the transition from input(s) to output(s). The inputs and outputs effectively are 'plans' (noun).

The plans in temporal RDP are very simplistic - no support for 'choice' or 'contingency', just a linear relationship between time and state. I rely on reactive behaviors or more explicit constraint expressions to support contingency.
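To make that concrete - a hedged sketch, not the actual temporal RDP representation - a plan of this kind can be nothing more than sorted (time, value) keyframes with linear interpolation, and 'planning' is then a pure transform from input plan to output plan:

    def sample(plan, t):
        # Linearly interpolate a plan of [(t0, v0), (t1, v1), ...] at time t.
        if t <= plan[0][0]:
            return plan[0][1]
        for (t0, v0), (t1, v1) in zip(plan, plan[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        return plan[-1][1]

    def behavior(input_plan):
        # A trivial 'planning' step: the output lags the input by one time unit.
        return [(t + 1, v) for t, v in input_plan]

    plan_in = [(0, 0.0), (2, 1.0), (4, 1.0)]
    plan_out = behavior(plan_in)
    print(sample(plan_out, 3))   # 1.0 - the planned state, one unit behind the input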

a robot can "remember" where a target is even after it leaves the robot's line of sight

World-models, histories, learning databases and the like are powerful, stateful techniques. I use them. They synergize very well with the ability to anticipate future inputs. But practical use of such techniques is essentially orthogonal to the 'pure' (stateless) use of planning, and solve a different set of problems.

I'm not really getting RDP.

This isn't the place to explain it. My use of planning in temporal RDP could be adapted to other communication and computation models.

Autonomous rational agents...

is the mainstream research focus in much of symbolic AI today, concerned with how agents decide what to do and how they go about achieving it. Look at Rao's work on the Belief-Desire-Intention model of rational agency, e.g., Rao, 1995, BDI agents: From theory to practice.

When applied to robotics, this approach becomes well-grounded in physical reality.
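For readers unfamiliar with BDI, here is a crude sketch of the usual deliberation loop (plain Python; not Rao's operational semantics or any AgentSpeak implementation): revise beliefs from percepts, generate options from unachieved desires, commit to an intention, and act on it via a plan library:

    def bdi_step(beliefs, desires, intentions, percept, plan_library):
        beliefs.update(percept)                              # belief revision
        options = [d for d in desires if not beliefs.get(d)] # unachieved desires
        if not intentions and options:
            intentions.append(options[0])                    # commit to one intention
        if intentions:
            goal = intentions[0]
            action = plan_library[goal](beliefs)             # means-ends reasoning
            if action is None:                               # goal achieved
                intentions.pop(0)
                return "noop"
            return action
        return "noop"

    # Hypothetical single-goal agent: get to the charger.
    plan_library = {
        "at_charger": lambda b: None if b.get("at_charger") else "drive_to_charger"
    }
    beliefs, desires, intentions = {}, ["at_charger"], []
    print(bdi_step(beliefs, desires, intentions, {"at_charger": False}, plan_library))
    # -> drive_to_charger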