This is the draft discussing the philosophy underlying the research on learning systems when the IIIA started a long-term research line in Machine Learning. The contents of this paper owe a lot to discussions with Ramon Lopez de Mantaras and Walter van de Velde, but the opinions expressed in the draft are those of the author and need not be, and indeed are not all, shared by these or other persons.
This is a Draft
IIIA - Institut d'Investigació en Intel·ligència Artificial,
CSIC - Spanish Scientific Research Council,
Campus Universitat Autònoma de Barcelona,
08193 Bellaterra, Catalonia, Spain.
Voice: +34 3 5809570;
Fax: +34 3 5809661;
"Memory, reason, and free will were considered, in the ancient times, the three higher forces of human soul, almost to the level of the Holy Trinity" (S Lenz). It may seem strange, from today's stance, that memory were held in such high esteem. Particularly, in Artificial Intelligence free will was methodologically unaccounted for, reason was considered the core of intelligence following the common sense definition that "reason is the intellectual faculty by which conclusions are drawn from premises" (The Concise Oxford Dictionary), and memory was relegated to the computer metaphor of data storage accessing.
In Artificial Intelligence there was no place for notions like the capacity of memory and the weight of memory, the aids of memory and the gaps of memory, the fidelity of memory and the reconstruction of experiences, even though all the time there was plenty of place for the notions of knowledge, inference, deduction, deducibility, and even time and learning. The metaphor of computer memory split the organization of inference (the procedure) and of knowledge (the contents) off from the organization of memory (the structure). In designing new systems, inference and knowledge posed new problems and requirements, and new achievements were made to overcome them, while all the time memory remained a subservient issue, organized just to meet those requirements.
Instead, by facing the problems and requirements posed by memory organization itself, new light can be shed on the inference and knowledge issues in Artificial Intelligence. This is the theme of this draft, which builds on a trend or tradition of AI that has not carelessly neglected the role of memory organization in the architecture of cognition, and reaches towards a research programme in which the interplay of inference, learning, and memory can offer new insights of scientific relevance and technical interest.
The idée-force of the MMP is that the notion of memory has to be considered of paramount importance in AI research. Furthermore, this new active role of memory is evolving not only in computational models of intelligence but also in computational models of massive parallelism and concurrency. As will be seen in this report, the memory-based approaches in AI are the ones most easily adapted to the new wave of massively parallel computer architectures. This is remarkable for two reasons:
a) Parallel processing challenges computational methods, algorithms, and techniques; only slowly will these aspects be adapted to the parallelism of the future. Memory-based approaches, instead, seem to accept the change and take advantage of it.
b) The history of AI research shows that AI development has been heavily influenced by the existing computer architectures. For this reason, AI has continually pushed forward innovations in the design of computer architectures and programming languages. With the advent of massive parallelism, instead, AI methods have been seemingly unable to profit from this gain in computational power: the Connection Machine and the AI applications that run on it are paradigmatic. One argument holds that symbolic reasoning and AI techniques are essentially sequential, and therefore cannot profit from the parallelism trend. As a solution it has been proposed to focus on subsymbolic computation in order to gain from parallel computing [STEELS]. The theme of this report is that this is not necessarily so, although the subsymbolic approach is, for other reasons, also promising (see [SAMS]); by developing the experience-based tradition in AI research, as the MMP project proposes, AI research can profit from the coming trend of massive parallelism and even push forward new requirements that will innovate computer architectures and language design.
i) learning is absent or is just some external module that could be interfaced with a given problem-solving architecture, i.e. it is orthogonal to problem-solving research. Decomposing problems into subproblems is a good strategy for research, but it is necessary in the long run to be able to integrate them again. It is significant that only machine learning projects based on an architecture for general intelligence (e.g. [PRODIGY, SOAR, THEO]) have recently attempted this integration of learning techniques with problem-solving architectures, while mainstream AI remains stuck on the knowledge-acquisition bottleneck and the brittleness issue.
ii) memory plays no active role, being merely a receptacle of inputted or inferred information. By `no active role' I just mean that memory is not viewed or designed as doing anything particularly interesting or intelligent. A tentative explanation of this passive role of memory is that AI researchers have applied the metaphor of the von Neumann computer architecture's memory (which is a particular kind of repository with an address-indexing mechanism). Instead, other AI approaches have conceptualized memory using the metaphor of memory as seen by psychology and the cognitive sciences, as we will presently see.
In fact there has been another tradition in AI, more related to psychology, from Minsky and Schank until now with the connectionist research in associative memory and learning; from the Perceptron to the Society of Mind, this experience-based tradition has come across a number of foundational notions for AI, like stereotypes (frames, scripts), remindings, analogy formation, and cases. An underlying idea has been that solving a problem may not be a matter of exploring search spaces sequentially, but has to do with recognition, pattern-matching, and spreading activation in a massively parallel way. Because of this underlying idea the notion of a stereotypical structure has been central to these approaches, as well as notions like indexing and pattern matching, all of them notions which are viewed by the search-oriented tradition as mere artifacts used for expediency reasons by sequential information-processing methods.
We will develop these intuitions in the following sections for AI and machine learning approaches.
Explanation-based learning, currently the most widely pursued machine learning approach, has also been adopted by memory-based learning. As MOPs are fossilized plans, XPs (explanation patterns) are fossilized explanations [Schank 89]. The explanations packaged in an XP may come from the agent's experiences or from common knowledge (like "too many cooks spoil the broth"). A case-based explainer can use the old parts of an XP that fit the new case, and can recognize and modify the parts that do not apply.
The kernel of this approach is that an intelligent agency does not work from first principles (unless it is imperative) or by applying a method to solve a task, but has recourse to memories of past experiences that can be adapted to try to solve the current situation. Notice here that learning and problem solving are always situated. What I mean by `situated' is that learning and problem solving depend essentially on the current situation and on the situations encountered in the past. As a consequence, the notion of an agent is central, whereas in the problem-solving tradition the method is the central idea and there exists no agent and no situations in which this agent evolves. We will expound the idea of situated learning in [[section]]7.
The Society of Mind [MINSKY] is a theory of intelligence that integrates Minsky's previous work, like frames or K-lines. As pointed out in [MINSKY], to pose the problematic of memory is also to raise the questions of how knowledge is represented, stored, retrieved, and used. The Society of Mind assumes that mind is composed of a collection of mindless, highly interconnected, concurrently active components called agents. An agency is any assembly of parts that can achieve some effect as a unit (regardless of how it is achieved). The theory of memory in The Society of Mind is based on a type of agent called Knowledge-lines. A K-line, in essence, is responsible for memorizing how the system solved a task by making a list of the agents involved in a given activity. In this theory re-membering is a process that activates the agents attached to the given K-line by some previous `learning' episode. In the Society of Mind, memory is an unspecific word that includes re-membering, re-collecting, re-minding, and re-cognizing, but all of them are understood as taking place in the reproduction of a former partial mental state. This process is carried out by memoizers (agents that can reset an agency into some previously useful state) and recognizers (agents that become active in response to some pattern of input signals).
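To make this mechanism concrete, here is a minimal Python sketch of the K-line idea as described above; the class names (`Agent`, `KLine`) and the episode-recording function are hypothetical illustrations of re-membering as the reproduction of a partial mental state, not Minsky's formulation.

```python
class Agent:
    """A mindless component of the society; it can be active or inactive."""
    def __init__(self, name):
        self.name = name
        self.active = False

class KLine:
    """Memoizer: records which agents were active in a successful episode."""
    def __init__(self, attached_agents):
        self.attached_agents = list(attached_agents)

    def activate(self):
        # Re-membering: reproduce the former partial mental state by
        # re-activating every agent attached to this K-line.
        for agent in self.attached_agents:
            agent.active = True

def memorize_episode(all_agents):
    """Build a K-line from the agents that were active when a task was solved."""
    return KLine(agent for agent in all_agents if agent.active)
```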
This aspect of learning involves transforming a successful problem-solving process into a memory structure amenable to reuse for the same kind of problems. The metaphors of `compilation' and `operationalization' have been used to describe limited versions of this learning. From our point of view these metaphors are misleading; moreover, this type of learning only makes sense if we commit our approach to several assumptions, such as assuming that solving a problem involves recognizing the problem type, retrieving and adapting methods used for similar problems, etc. We will explore in [[section]]6 the particular commitments of MMP for these aspects of learning regarding (i) the reproduction of former partial mental states and (ii) the coding and representation of partial mental states.
Another important notion relevant to the MMP approach is accumulation learning: "The accumulation of a myriad subagents: We learn many different ways to achieve each kind of goal." (p. 308). Accumulation is "a type of learning based on collecting examples of an idea without attempting to describe what they have in common." Let us also recall that "Human thought is not based on any single and uniform kind of `logic', but upon a myriad processes, scripts, stereotypes, critics and censors." (p. 184). These two remarks show why the massive aspect of MMP is also conceptually a requirement for a machine learning approach (we could even substitute, in MMP, Myriad for the first M, Massive).
While these theoretical statements are similar to ours, Chandrasekaran's research approach is completely different from the experience-based paradigm. His motivation comes from research on generic tasks for problem-solving in expert systems, and his main difficulty has been that generic tasks have proved to be less flexible than needed. The solution he advocates is implementing the generic tasks in a general-purpose problem-solver like SOAR (which is not memory-based, but a later section is dedicated to discussing SOAR). In this way, the problem-solver conquers the brittleness problem by allowing the generic tasks to degrade gracefully into weak search methods.
Chandrasekaran's approach is too vague to become a specific architecture; indeed, at the Workshop [ESF] it was presented as an intellectual tool for integrating different architectures and methods by means of memory, and it is not part of his research programme. What is needed for a research approach is the sort of specific commitments that MMP takes in sections [[section]]3 to [[section]]6. But first we will review the work that has similar commitments, such as CYC, case-based reasoning, and discovery systems.
The problem with starting to code a concept is that it depends on many other concepts that are absent at the beginning. It might have happened that this coding was without end, but as the knowledge base grew large the set of primitives began to converge. The focus is on CYC being able to cope with 99% of the common cases; for instance, temporal issues are represented with a simple grammar of 50 relations (such as ends-during). CYC uses a score of inference methods (inheritance, automatic classification, slot-value, subsumption, Horn clause rules, etc.) instead of a single general inference procedure. From 1984 to date (1989) half a million units have been built in CYC, and this number is expected to increase by two orders of magnitude by the end of 1994.
The key position of CYC is that the absence of a consensus-reality KB, what we may call a memory of world facts, is the major source of problems in diverse areas, causing the knowledge-acquisition bottleneck, the brittleness of expert systems, the semantic problems in natural language understanding, and the difficulties with analogy usage in machine learning.
The MMP approach is complementary to the CYC goals. Of course, we have stressed the necessity of a massive memory, but attacking a scope like that of CYC is unachievable. In fact, the advantage of MMP is that it could swiftly profit (from 1994 on) from the results of encoding, in a frame-like manner, a vast amount of memory about mundane knowledge. This advantage shows how MMP connects to current (or future) breakthroughs in AI research. Other connections of MMP to this approach are manifest in [[section]]2.2.2, since CYC is a research descendant of the EURISKO discovery learning program, and also in the whole of [[section]]6 as regards inference and knowledge representation.
Finally, both CYC and MMP research converge on the identification and implementation of so-called intermediate inference methods or cognitive clichés. Intermediate methods lie between general-purpose weak methods and domain-dependent methods. The features that characterize intermediate methods are: a) they are domain-independent; b) they are not general, but can be reused across most domains; and c) they are not weak, but are powerful methods that can be used when domains exhibit certain characteristics.
For instance, in a task where there is knowledge about possible observations that specifies the cost of observations and their a priori degree of informativeness, the problem solver can use the LESSER-COST-FILTER and the INFORMATION-LIKELIHOOD-FILTER clichés. In order to combine the two criteria embodied by these filters the problem solver can use, for instance, an evidence-combination cliché that weights the relative importance of cost and informativeness. See [PS CLICHéS] for a more detailed explanation of problem-solving clichés.
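As a rough illustration of how such filter clichés might be combined (the function names, the weighting scheme, and the example observations below are hypothetical, not taken from [PS CLICHéS]):

```python
def lesser_cost_filter(observations, max_cost):
    """Keep only observations whose cost is below a threshold."""
    return [o for o in observations if o["cost"] <= max_cost]

def information_likelihood_filter(observations, min_info):
    """Keep only observations whose a priori informativeness is high enough."""
    return [o for o in observations if o["informativeness"] >= min_info]

def evidence_combination(observations, w_cost=0.4, w_info=0.6):
    """Rank observations by a weighted trade-off of informativeness against cost."""
    def score(o):
        return w_info * o["informativeness"] - w_cost * o["cost"]
    return sorted(observations, key=score, reverse=True)

# Example: choose the next observation to make in a diagnosis task.
candidates = [
    {"name": "temperature", "cost": 1.0, "informativeness": 0.3},
    {"name": "biopsy",      "cost": 9.0, "informativeness": 0.9},
    {"name": "blood test",  "cost": 2.0, "informativeness": 0.7},
]
best = evidence_combination(information_likelihood_filter(candidates, 0.5))[0]
```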
There is no common agreement nowadays about the nature and feasibility of these intermediate methods, although several research efforts can be seen as sharing this underlying approach, which is called cognitive clichés in MMP and is further explained in [[section]]7.2. In CYC, for instance, some two dozen intermediate inference methods have been identified and implemented. In the course of the MMP project the learning methods are to be analyzed and expressed in the form of cognitive clichés that specify, for given domain features, which inference methods may be useful to apply.
The different dimensions along which the learning methods can be defined (intensiveness of knowledge use, case- vs. generalization-based approaches, etc.) are contemplated in MMP, for it is our tenet that intelligent behavior emerges from the use of a myriad of different methods. The MMP position is not a syncretist proposal: the use of any method is a function of past experience and of the epistemological features of the domain and task environment, modulated by cognitive-economy reasons. This formulation of the experience-based paradigm, developed in [[section]]4-7, is a proposal as regards how to combine the different learning methods, according to their features, in a comprehensive framework. Nevertheless, the MMP approach takes some options as to which basic method features the others can be built upon: the explicit representation of past experiences (cases) and the view of generalizations (rules) as a subtype of experiences (cases), for instance. Let us go further into this topic.
Case-based and analogical reasoning have no clear-cut distinction and I will use the terms interchangeably[*]. The basic mechanism of case-based reasoning (CBR) is the memorization of experiential episodes and their organization (indexing) in memory so that they can be recovered when new `similar' situations arise, and adapted so as to be useful in the new situation. If we compare CBR with EBL, we see that EBL's basic mechanism relies on a normative model of a domain and its purpose is building up a generalization of instances of a concept; CBR, instead, has an adaptive model of a domain and generalizes the situations (the indexing) in which a case is worth remembering and reusing.
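The following Python sketch illustrates the basic CBR cycle just described (retrieve a similar case by its indices, adapt it, and memorize the new episode); the case structure and the index-overlap similarity are hypothetical simplifications.

```python
def similarity(case, situation):
    """Crude similarity: proportion of the situation's indices shared with the case."""
    shared = set(case["indices"]) & set(situation["indices"])
    return len(shared) / max(len(situation["indices"]), 1)

def retrieve(memory, situation):
    """Recover the most similar past episode from memory."""
    return max(memory, key=lambda case: similarity(case, situation))

def adapt(case, situation):
    """Placeholder adaptation: reuse the old solution, noting the differences."""
    differences = set(situation["indices"]) - set(case["indices"])
    return {"solution": case["solution"], "adapted_for": differences}

def solve_and_learn(memory, situation):
    old_case = retrieve(memory, situation)
    new_solution = adapt(old_case, situation)
    # Memorize the new episode, indexed by the features of the situation.
    memory.append({"indices": situation["indices"], "solution": new_solution["solution"]})
    return new_solution
```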
The major commitments of the SOAR architecture are:
(1) uniform task representation by problem spaces,
(2) universal subgoaling: any decision can be the object of goal-directed attention; in particular, the system can reflect on its own problem solving to take a control decision in a goal-directed way,
(3) uniform representation of all long-term memory by a production system,
(4) knowledge to control search expressed by preferences,
(5) all goals arise to cope with impasses, i.e. when there is a lack of knowledge about what to do next,
(6) the basic problem-solving methods arise directly from knowledge of the task. Weak methods like hill-climbing, means-ends analysis, etc., are realized by adding control knowledge in the form of productions (without the need for a procedural representation of each method).
(7) Continuous learning by experience through chunking. SOAR automatically learns by caching the results of achieving its subgoals as new productions.
The SOAR approach is in some aspects close to MMP and, in others, very far from it. It is close to the experience-based paradigm in that learning is continuously in action during problem-solving, and problem-solving is realized by productions that have been chunked (learned) previously. It departs from the MMP approach in that SOAR has no commitment to memory as such: there is a long-term memory where productions are stored and retrieved when needed by the problem-solver.
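As a rough, hypothetical sketch of the chunking idea (not SOAR's actual implementation), the code below caches the result of a resolved impasse as a new production keyed by the conditions that led to it, so the same impasse never recurs:

```python
class ProductionMemory:
    """Long-term memory as a set of conditions -> result productions."""
    def __init__(self):
        self.productions = {}

    def match(self, conditions):
        """Fire a production directly if its conditions are already known."""
        return self.productions.get(frozenset(conditions))

    def chunk(self, conditions, result):
        """Cache the result of achieving a subgoal as a new production."""
        self.productions[frozenset(conditions)] = result

def solve(memory, conditions, subgoal_solver):
    cached = memory.match(conditions)
    if cached is not None:
        return cached                      # no impasse: knowledge is directly available
    result = subgoal_solver(conditions)    # impasse: subgoal solved by deliberate search
    memory.chunk(conditions, result)       # learning: the impasse will not arise again
    return result
```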
There is a most interesting point in SOAR, namely universal subgoaling (point 2). We want to state here that universal subgoaling achieves, by means of SOAR's capability for introspection, what Minsky calls the Recursion Principle:
Recursion Principle: When a problem splits into smaller parts, then unless one can apply the mind's full power to each subpart, one's intellect will get dispersed and leave less cleverness for each new task. (p. 161 [Minsky]).
The Recursion Principle poses to MMP the requirement that the representation language used have some introspection capabilities. This issue is discussed in [[section]]6, but here SOAR is a good example to show (a) that introspection is a way to apply the full problem-solving capability to a part of the problem-solving process itself, and (b) that the exact type of introspection depends on the organization of each system, which in SOAR is the capability to express the system's internal problems explicitly as goals (i.e. in a way that allows the system's problem-solving capabilities to be applied to them). The justification in MMP for including introspection as a research issue is that it has to provide the same kind of capability; furthermore, we argue that memory is the proper place in which introspection is a relevant issue (meaning it can be advantageous), rather than programming languages (where introspection has been shown to provide limited advantages). The reason is that, having past problem-solving experiences uniformly represented as memory structures, learning can take place, at a meta-level, over these experiences, which form the object-level.
In the sense that THEO aims at an architecture that learns from experience in problem-solving, it is close to the MMP proposal. Even more so at a technical level, since a) it also uses a frame-based representation, and b) memorization, in the form of caching of experiences with dependency records, is a basic mechanism for learning and for the whole system. What is shared between the THEO and MMP approaches, at a theoretical level, is the basic commitment to learning from experience: an intelligent agent plans or solves a problem only when it has to, but for routine or familiar situations it just knows what to do by means of what it has learnt from its past history.
However, between the MMP and THEO approaches there is one basic difference at the theoretical level and another at the technical level, the second entailed by the first. The theoretical difference is that MMP has some commitments about memory and remembering that THEO lacks. The only commitment THEO has in this respect is the caching of past inferences with a dependency mechanism that assures those cached inferences will be applied whenever they are still applicable, and THEO will resort to problem-solving when old inferences are not applicable. The technical difference resulting from this is that the basic experience of THEO can be applied only to new situations that are exactly the same, while remembering in MMP, when an exact match is lacking, restores the most similar past experience, which can then (maybe) be adapted in order to be applied to the new situation. This can be formulated as the basic difference between case-based and rule-based learning, as discussed in [[section]]7.2.2 and in [Clark 89].
For instance, in [THEO] it is shown how it learns the weight of a `thing': THEO learns, by a chunking mechanism, that if something is 10*10*10, has density 5, and is under PYRAMID1, then its weight is 8000. THEO caches the result (weight 8000) and the rule with their corresponding dependencies. But this experience is directly useful only for anything that is 10*10*10, has density 5, and is under PYRAMID1. Imagine the system encounters something that is 8*9*11 and is not under PYRAMID1. THEO cannot use the old experience directly; it can only apply the inference method again and recompute the new weight. The MMP approach, on the other hand, specifies that the spontaneous inference of the system remembers the old episode with a degree of mismatch, so it knows that, with some approximation, it can assume that the weight is close to 8000. Assume means that it is a non-deductive inference that may be wrong, for instance if the density is very different from 5. The difference is that for MMP this behavior is the basic thing, while for THEO to reuse this kind of experience some case-based learning would have to be implemented on top of THEO and used instead of chunking.
There is another issue in this example, namely learning when it is useful to learn or to use one specific inference method (a goal for both MMP and THEO). Imagine MMP assumes something weighs close to 8000. If the goal of the system is to pick it up and carry it somewhere, and the system can pick up anything up to 20000, then computing the exact value is needless, while if the limit weight it can pick up is 9000 it is surely useful to use some other, more accurate method for inferring the actual weight. The distinction between spontaneous and deliberate inference in MMP helps to integrate common-sense reasoning (like assuming the weight is close to 8000) with deliberate problem solving, and remembering offers the baseline for investigating what to learn and when to learn.
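A minimal Python sketch of this behavior, with hypothetical episode and mismatch definitions: spontaneous inference retrieves the nearest stored episode and assumes its cached answer with a degree of mismatch, and deliberate inference recomputes the exact value only when the assumption is not safe for the current goal (the safety factor of 2 is an arbitrary illustration).

```python
def mismatch(old, new):
    """Degree of mismatch between two episodes (0 = identical): mean relative distance."""
    keys = ("x", "y", "z", "density")
    return sum(abs(old[k] - new[k]) / max(old[k], 1e-9) for k in keys) / len(keys)

def spontaneous_weight(memory, thing):
    """Remember the most similar episode and assume its cached weight."""
    best = min(memory, key=lambda episode: mismatch(episode, thing))
    return best["weight"], mismatch(best, thing)

def deliberate_weight(thing):
    """Exact, deliberate recomputation from first principles."""
    return thing["x"] * thing["y"] * thing["z"] * thing["density"]

def weight_for_goal(memory, thing, lift_limit):
    guess, degree = spontaneous_weight(memory, thing)
    if guess * (1 + 2 * degree) < lift_limit:   # assumption leaves a comfortable margin
        return guess
    return deliberate_weight(thing)             # otherwise pay for accurate inference

memory = [{"x": 10, "y": 10, "z": 10, "density": 5, "weight": 8000}]
thing = {"x": 8, "y": 9, "z": 11, "density": 5}
weight_for_goal(memory, thing, lift_limit=20000)   # the assumption suffices
weight_for_goal(memory, thing, lift_limit=9000)    # recompute the exact weight
```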
i) theoretically memory is something to be accounted for in AI, and in this report we have been showing that it sheds light on other aspects of knowledge, learning and reasoning,
ii) computational models of intelligence have a strong connection with computational assumptions, and memory as a computational metaphor is changing strongly as AI pushes towards massively parallel and concurrent computational models.
Maybe this second aspect is less developed in the report so far, so it is time to make explicit that this newer, more active role of memory has evolved in research on parallel computation architectures out of the need to understand and build more comprehensive models of hardware, algorithms, and programming techniques. The massive parallelism trend offers great expectations of high performance, higher by orders of magnitude. Nevertheless it challenges all current theoretical models and know-how in computer science and AI as to how to actually take advantage of that potential power.
The Connection Machine is a very illustrative example that we will use. The Connection Machine offers the computational power of a hundred thousand processors, and in some way it has been conceptualized as an active memory. The metaphor of an active memory aims at showing that in the Connection Machine memory and processing are tightly integrated in the hundred thousand memory-processing units, and that the old computational metaphor of memory is no longer usable here. The remarkable thing is that it has been difficult to find AI problems that can exploit all the computational power of the Connection Machine [WALTZ]. The reason is that parallel processing challenges computational methods, algorithms, and techniques, and MMP intends to show that memory-based AI approaches can accompany the parallelism trend and take advantage of it. In fact, the most successful applications of the Connection Machine have been (apart from simulation tasks) in problem-solving tasks that use memory-intensive approaches [WALTZ]. The new version of the Connection Machine being released (1990) contains one million synchronous memory-processor units, and the challenge will keep pushing higher in the future: fine-grained hypercubes scheduled for 1992, like Mosaic at Caltech and the J-Machine at MIT [J-Machine], will offer up to 64000 asynchronous computing nodes.
The Connection Machine is mentioned not only as an example to support the MMP view. In fact, the most outstanding aspect is that AI has continually pushed forward innovations in the design of computer architectures and programming languages. With the advent of massive parallelism, instead, AI methods have been seemingly unable to profit from this gain in computational power: the Connection Machine and the AI applications that run on it are paradigmatic. One argument holds that symbolic reasoning and AI techniques are essentially sequential, and therefore cannot profit from the parallelism trend.
We argue that what is needed is to devise new approaches for computational models of intelligence. One approach is connectionism and other subsymbolic approaches [SAMS]. The MMP option, instead, is that higher-level symbolic techniques can also be used, for instance the marker-passing models of spreading-activation techniques in [[section]]5. The reason supporting this option taken by MMP is that symbolic processing is not essentially sequential. Rather, we claim that logic-based, deduction-oriented approaches to inference can be considered essentially sequential, in the sense that parallelism will not greatly change their efficiency and will be applied only in the implementation of some aspects of those techniques. Moreover, we claim that non-deductive inference, as advocated by the experiential trend in AI from Schank to Minsky, is essentially parallel, in the sense that it can immediately profit from parallelism in all aspects, from implementation to radically new models of inferencing.
This mechanism for recovering old partial mental states is close to Boltzmann machines and other connectionist approaches; for instance, [uKLONE] shows how a frame-based language can be compiled into a connectionist network that hill-climbs an energy function in order to compute a relevant context and resolve ambiguities.
Our claim has four parts:
i) that what is essential for recovering old partial mental states is this kind of approximate matching that gives not an optimal solution but a good enough solution
ii) that the underlying mechanism is some kind of hill-climbing of an energy function (for instance, simulated annealing)
iii) that the connectionist network implementation is just one of several possible embodiments of the formal mechanism in (ii); another is, for instance, a marker-passing implementation of spreading activation
iv) and that the essential notion is that of spreading activation and the subsequent context reconstruction
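A toy Python sketch of points (i) and (ii), under the assumption that a partial mental state is a set of active concepts and that an energy function scores how well a candidate context covers the cue and hangs together; simulated annealing settles on a good-enough, not necessarily optimal, context.

```python
import math, random

def energy(context, cue, associations):
    """Lower energy = the context covers the cue and is internally associated."""
    uncovered = len(cue - context)
    incoherence = sum(1 for a in context for b in context
                      if a < b and (a, b) not in associations)
    return uncovered + 0.1 * incoherence

def recover_context(cue, concepts, associations, steps=1000):
    """Approximate matching by simulated annealing over subsets of concepts."""
    context, temperature = set(cue), 1.0
    for _ in range(steps):
        candidate = set(context)
        candidate.symmetric_difference_update({random.choice(list(concepts))})
        delta = energy(candidate, cue, associations) - energy(context, cue, associations)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            context = candidate          # accept downhill moves, occasionally uphill
        temperature *= 0.995             # cool down: fewer uphill moves over time
    return context
```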
a) hardware architecture (multicomputers, data-parallelism architectures, data-flow machines, rewriting rules machines, etc.)
b) programming languages
b.1) computation models (actors, dataflow, etc)
b.2) programming concepts and constructs (implicit or explicit parallelism, futures, message-passing, etc)
c) algorithmic and programming techniques
c.1) parallel algorithms (new algorithms vs. parallelization of existing algorithms)
c.2) new AI and ML techniques and approaches vs. parallelization of existing techniques and approaches (these include ML techniques, inference methods, knowledge representation schemes, etc)
Different research efforts can be characterized depending on the options taken at these three levels. The actors model [HEWITT], for instance, assumes a fine-grained multicomputer at level (a), and develops essentially level (b) with (i) the actor model of concurrent computation, and (ii) a hierarchy of low-level, high-level (Lisp-like), and representation languages. The result of this approach would be that, whatever parallelism there is at level (c), the supporting layers will profit from all possible parallelism. Of course, new techniques and algorithms can provide more parallelism to take advantage of. The data-parallelism research around the Connection Machine focuses on new techniques for AI problem-solving in a massively parallel way, and on some new constructs for programming languages.
Our proposal is that the experience-based approaches of AI can easily be parallelized, specifically memory-based systems that use non-deductive inference techniques. We propose to study and use marker-passing spreading activation in MMP because a) it has been recently used in a parallel implementation of memory-based techniques [KOLODNER 88], and b) it is a possible means to integrate memory-based symbolic reasoning with subsymbolic inference [HENDLER 89], offering some interesting and complementary interaction with the SAMS project [SAMS]. Moreover, [uKLONE] shows that the power of a KLONE-like representation language can be achieved by a parallel connectionist implementation, so another research issue linking MMP to SAMS would be the different ways in which symbolic and subsymbolic inference can be combined and implemented.
Now, regarding parallelism in AI systems, the notions of spontaneous and deliberate inference are explained in depth in [[section]]7, but let me just state for the moment a conjecture that MMP may show correct or wrong: I see spontaneous inference essentially as a kind of non-deductive, highly parallel inference, while I tend to see deliberate inference essentially as a kind of deductive, sequential inference. In the next section we will discuss these issues by means of the example of case-based learning.
* each case-memory is a concurrent process, `awakened' by the spread of activation, and each case-memory carries out its own activity according to what it knows
* it is a cooperative activity when a case-memory computes the adequate activation marker to be sent to the neighboring memory structures that are relevant, arousing them as well
* it is a competitive activity when degrees of activation function as a selectivity mechanism to choose the most relevant memory-structures
* the last two points are akin to the Society of Mind theory on active agents. The K-lines theory of memory also can fit here.
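A hypothetical Python sketch of such a marker-passing process (not the [HENDLER 89] or [KOLODNER 88] algorithms): aroused memory structures cooperatively pass attenuated activation to their relevant neighbors, and the most activated structures win the competition.

```python
from collections import defaultdict

def spread_activation(network, sources, decay=0.5, threshold=0.1, rounds=3):
    """network maps each memory structure to the neighbors it considers relevant.
    Cooperative phase: aroused structures pass attenuated markers to neighbors.
    Competitive phase: structures are ranked by the total activation received."""
    activation = defaultdict(float)
    for source in sources:
        activation[source] = 1.0
    frontier = dict(activation)
    for _ in range(rounds):
        next_frontier = defaultdict(float)
        for node, level in frontier.items():
            for neighbor in network.get(node, ()):
                passed = level * decay
                if passed >= threshold:          # markers that are too weak die out
                    next_frontier[neighbor] += passed
        for node, level in next_frontier.items():
            activation[node] += level
        frontier = next_frontier
    return sorted(activation.items(), key=lambda item: item[1], reverse=True)
```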
All these tasks can be seen as an instance of sense disambiguation or context finding[*]. We define sense disambiguation as a task whose input is an underconstrained description and whose output is a set (maybe a singleton) of the most plausible models for that description. A model here is intended to mean any network of interconnected concepts. In language or speech processing it is clear that this is an instance of what can be called sense disambiguation or context finding. A case-based diagnosis system can also be seen as parsing a sentence (the patient description) and finding the most plausible model or context. Of course, the description changes over time, as the system asks questions whose answers change that description: the task then involves more than merely context finding as a diagnosis process, but at each step the diagnosis process involves context finding.
The necessity of being introspective was shown, in the section on SOAR, to be required by Minsky's Recursion Principle, and in [[section]]7.2 it is argued for and some examples are given of the usage of meta-level and reflective capabilities. That the representation is frame-based we would argue for just on historical grounds, in the sense that other kinds of languages might be used, like [TYPICAL], based on mutable functions (a functionalization of frames, see [NOGOOD]). The historical argument is that much experience-based AI research has been frame-based, from the first formulation of frames by Minsky to the EURISKO, CYC, and THEO systems, and including the dynamic-memory approaches of Schank, Riesbeck, and Charniak. This historical reason is nonetheless well-founded: frame-based languages treat frames as stereotypical organizations of events, perceptions, etc., and address other fundamental issues in memory organization like expectations, hierarchical packaging, and spontaneous inferencing. Notwithstanding, MMP has to consider alternatives as well, for instance the mutable-functions approach of [TYPICAL].
In the following, we use some concepts developed by a variety of authors, like K Haase [ARLO], D Lenat [RLL], and Chandrasekaran [CHANDRA], but since we make use of them we will unavoidably be re-defining these concepts (like cognitive economy, spontaneous and deliberate inference, etc.) for our own purposes in MMP. The sources will be referred to when used, but the concepts will be used freely.
Inference in logic has to do only with the validity of inference results, and therefore representation has nothing to do with logic. On the other hand, inference in AI has also to do with the usage (utility, cost, frequency, etc.) of inferences, and representation is a mechanism that reifies a class of inferences in an organization such that, in some sense, the results of those inferences are already there. It is like the distinction between knowing and reasoning: between knowing "this is a system of equations" and doing some reasoning to solve the system of equations. The first is immediate and the second is a goal-oriented, deliberate activity. All AI systems draw, at some point, this distinction. From this perspective, inferences in AI are classified into two types: spontaneous inference and deliberate inference. The next section [[section]]6.2 is dedicated to them.
The distinction between representation and inference is obscured in most AI systems because it is fixed. Only in learning systems, where representation is seen to change according to experience, is it easier to show that most of learning has to do with incorporating a problem-solving episode into memory in such a way that it is ready for future use in a direct way, without the need to repeat the whole process. This is particularly evident for case-based and analogy-driven learning systems. This issue we will refer to as cognitive economy, following Lenat's [RLL] usage. Cognitive economy is the principle that systems with learning or adaptation reorganize their representations according to their experience of the usage of inferences in some domain. Cognitive economy may use straightforward mechanisms, such as caching results, and complex mechanisms, such as remindings, generalizations, and indexing in case-based systems. Also, as developed in [[section]]7.2, learning about what is useful to retain (memorize, represent) and what is to be done when solving a problem, according to environment and domain features, is part of cognitive economy. See in [[section]]7.2, and specifically in [[section]]7.2.4, an example of how cognitive economy can drive a learning system from purely exemplar-based learning to learning based on a hierarchy of generalized descriptions.
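As the simplest illustration of this principle, the following hypothetical Python sketch caches the results of an expensive inference so that repeated queries become immediate, i.e. the result is already there; remindings, generalization, and indexing are more elaborate cognitive-economy mechanisms along the same line.

```python
class CachingInferencer:
    """Reorganizes its representation (a cache of past results) according to usage."""
    def __init__(self, infer):
        self.infer = infer          # expensive, deliberate inference procedure
        self.cache = {}             # memory structure built from experience

    def query(self, question):
        if question in self.cache:      # spontaneous: the result is already there
            return self.cache[question]
        answer = self.infer(question)   # deliberate: derive it step by step
        self.cache[question] = answer   # cognitive economy: retain it for next time
        return answer
```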
The mutual necessity and embedding of deliberate and spontaneous inference is general. As an example, case-based reasoning needs both kinds of inference: spontaneous inference for recalling old situations and their solutions from the current problem, and, when there is some difference between the current situation and an old situation or solution, deliberate inference to adapt the old patterns so that the old solution is effectively applicable to the new pattern. This same example shows how deliberate inference is embedded inside a process of spontaneous inference. As an example of spontaneous inference embedded in deliberate inference, we can point to XPs (explanation patterns), where a difference between old and new patterns is interpreted as an anomaly whose type is recognized by the system (a spontaneous inference) and the repair strategy associated with that type is instantiated [Schank 89].
The Society of Mind makes a similar distinction between `direct' reaction behavior and goal-oriented behavior.
A "goal-driven" system does not seem to react directly to the stimuli or situations it encounters. Instead, it treats the things it finds as objects to exploit, avoid, or ignore, as it were concerned with something else that doesn't yet exist. [MINSKY].
MMP characterizes spontaneous inference by several features:
a) spontaneous inference is automatic and immediate, built-in
b) spontaneous inference is non-deductive
c) spontaneous inference is massively parallel
d) spontaneous inference never builds large mental structures [ARLO]
Spontaneous inference occurs quickly, with no visible complex intermediate steps. As we stated, quickly is a relevant notion in the characterization of AI systems since we are dealing with cognitive-economy issues: spontaneous inference encompasses all a system `knows' or `has learned'. It is non-deductive in the sense that it is not guaranteed to be sound or complete, since usually it will get an approximate answer or a partial answer. It is massively parallel, or at least MMP claims it can be implemented so in most cases: from reminding of cases to phrase parsing. Spontaneous inference never builds large or complex mental structures; it merely completes existing ones, like guessing the name of a woman from her husband's name, or her telephone number. In fact, this last aspect is a corollary of (a), but it clearly characterizes spontaneous as opposed to deliberate inference, which in fact is mainly occupied in building up new representations.
Recognizing this fact is crucial, and it is not surprising that indexing (or classifying) is so broadly used in AI systems. For instance, in KLONE-like languages it is called realization, and in case-based systems reminding. Our argument is that experience-based approaches always have to do with recognition, but sometimes this is obscured by more logic-centered aspects. For instance, in KLONE the expressive power of the language makes term subsumption NP-hard, the reason being that KLONE aims at assuring deductive properties of inference. Instead, uKLONE uses a non-deductive inference mechanism that may give answers that are not true in all possible models (in the model-theoretic sense) but are true for the most likely model, achieving inferences (and failures) closer to human reasoning [uKLONE]. The properties of KLONE blur the distinction between spontaneous and deliberate inference, while uKLONE remains in spirit akin to activation-spreading inference or case-based systems: they recognize (remember) a solution to a problem that may not be completely correct, and then the system simply adapts the solution or tries another one.
Goal-oriented (or intentional, or purposive) behavior is modelled in the Society of Mind by the Difference-Engine: a difference between the ideal (or goal) state and the actual state arouses some subagents whose activities tend to diminish the difference that aroused them. We can try from here another characterization of deliberate and spontaneous inference: spontaneous inference seems to work on similarities between the actual state and an internal representation, while deliberate inference seems to work on differences between the actual state and an internal representation (the ideal). In the Society of Mind example the Difference-Engine works on differences, while the subagents are aroused because of the similarity of an actual state (that representing the difference) with the representation they have of their arousing situations.
Also in the XP example [Schank 89], explanation patterns are spontaneously inferred to explain a current situation, but if some mismatch occurs, this sets up a goal to be solved by deliberate inference. The mismatch type selects the repair strategies to be tried, and this selection again is spontaneous inference based on the mismatch state. It may seem that in this view every inference can be considered spontaneous, but it is not so. When the XPs solve the mismatch, new knowledge structures are constructed in such a way that, when the same problem (or a similar one) is encountered, the new knowledge structures embody the direct method for solving it (spontaneous inference, therefore), and the mismatch state and the method that solved it (that which formed the deliberate inference) will not recur. This last part takes us directly to the learning issue in the next section.
Another example of this process is case-based learning: after solving a new situation instance, this instance is to be transformed and coded in memory as a new case instance. Such a transformation involves a) a reorganization of the memory to integrate the new case, and b) the coding of the relevant aspects of the solved case, both (a) and (b) in order to achieve the goals of the transformation, namely (i) that any similar case can be recognized and (ii) that the past experience is ready to be used for solving a new case.
In fact, most differences among case-based systems acting on different domains lie in the type of representation change each effects in order to transform an experienced event into a memory structure that can be used easily and effectively on new occasions. Most case-based reasoners perform little work on this part, typically storing a case with its indices, and sometimes the problem-solving trace, while most of the work is done on the fly, adapting the retrieved case to solve the new situation. Now, the amount of adaptation, and hence of deliberate inference, basically depends on the differences between the stored case and the new situation. In routine task environments the role of deliberate inference is reduced to a minimum, while in other domains spontaneous inference recognizes and retrieves skeletal solutions that are then adapted systematically on each new occasion.
We want to stress the primacy of environment (and history) in the MMP approach to machine learning. In [[section]]6 we have already established the relationship of learning and environment as regards representation and cognitive economy: learning always involves some kind of change of representation in order for the system to adapt itself to the process of problem-solving in a historical environment. But there is a second aspect in which the environment plays a role at the knowledge level: there exist a myriad of methods to achieve a goal, and learning which method (or a new one) is best in an environment crucially depends on how the methods relate to the environment's epistemological features.
For instance, in [Davis 88] there is a knowledge-level account of several systems of model-based reasoning for circuit troubleshooting. All the systems use the same generic method, called Generate-Test-Debug, but each system in fact uses a different variation of the method that depends on the type of knowledge the method is able to use. The same method can in fact be seen as a bunch of similar methods that range from the weakest and most generally applicable to the most informed about circuit behavior, and hence more efficient and less general. The creation of new methods, or the selection and adaptation of methods to a given environment, is a fundamental part of learning that works upon epistemological features such as cost and availability of data, time constraints, knowledge of the typicality of situations, etc.
Our proposal will be that methods for tasks can be realized as clichés with meta-level inference capabilities. A cliché is "(...) used in this paper to refer to a standard method for dealing with a task - a lemma or a partial solution" [Waters 81]. This concept originated in the representation of programming knowledge in [Waters 81] and was later extended to the notion of cognitive clichés: "Cognitive clichés are highly exploitable, domain independent, formally specified properties of representations: not a theory of representation themselves, they are properties of domains and the representations of domains. (...) Examples of clichés are continuity, ordering partitions, equivalence classes or symmetry" [TYPICAL]. The essence of the idea is that a cliché is a reusable, typical method for solving a task in different domains, and in [MM2] the necessity of meta-level inference capabilities for clichés is argued for. In this section we develop this issue by means of an example, describing two ML methods and how they are viewed as clichés and memory structures in MMP.
For instance, a study like [Clark 89], in which ML techniques are described as inference done at a meta-level, is necessary in order to show the similarity of rule-based and case-based learning techniques. By means of a meta-level specification of techniques it can be seen that systems realizing variations of both techniques have certain aspects in common (like generalization) and specific differences (like whether generalization is performed beforehand or on demand) that imply certain trade-offs (for instance, regarding generalization of examples, memory size, computational effort, reorganization of memory, etc.).
Our proposal is that meta-level inference is not only appropriate for knowledge-level description of systems and techniques but also for their implementation.
On the one hand, since in MMP these ML techniques have to be memory structures, it is natural that they are seen as inferences that act from a meta-level (how to learn) upon the object-level (what is learned).
On the other hand, having learning techniques as first-class objects in MMP enables the system to perform higher-order learning: it can be learned in which situations it is better to use a given learning technique (depending on the features of the domain), and variations of learning techniques can even be synthesized from subparts of existing learning techniques (like modifying when generalization is done, e.g. eagerly or on demand, for a new domain with certain features).
Specifically, our first approach for realizing this proposal will be to consider each ML technique as a cliché in which a meta-level organizes its subparts (like generalization, partial pattern-matching, evidence combination, prototypicality, etc) as other clichés. Of course, a first step in MMP will be to realize the different ML techniques as clichés. We will use in this section the analysis of [Clark 89] as an example to illustrate the meta-level proposal.
i) the bias with which they select concept descriptions during search: case-based systems prefer concept descriptions that group examples in a space defined by boundaries of equal distance in some metric, while rule-based systems prefer hyper-polygons with edges parallel to the feature axes of the example space.
ii) the computational approach of realizing generalization at runtime or before runtime (case-based and rule-based systems respectively).
However, actual ML systems of both kinds take into account resource constraints (like memory size, efficiency, etc.), and this has led to implementing variations of each technique that share common aspects with the other. For instance, rule-based systems, in order to implement a degree of membership of an example in a concept, need to incorporate a matcher producing measurements of concept membership, as is done in all case-based systems. Conversely, case-based systems implement generalization at runtime when solving a new example, but some variations gain efficiency by pre-computing some generalizations, much as rule-based systems do. Other trade-offs involve example memorization: at one extreme there is the pure memory-based approach storing all cases (but this consumes memory and slows matching), and at the other extreme there are pure rule-based systems that do not store any example (which makes incremental learning impossible).
So we see that both approaches have common problems, which [Clark 89] summarizes as:
a) deciding what information is useful to summarize and compile in order to be retrieved from memory when solving a problem.
b) since both retain in some way previous examples, both need to decide which and how much information can be discarded.
The important point of raising a technique from a procedural implementation to a meta-level is that it is a means for the system to reason about the content of its learning and about the learning method as well. For instance, the meta-level can easily reason about the number of solved cases, and use only runtime generalization and storage of all examples while this number is small (to avoid computing generalizations that would have to be discarded or modified constantly), and only start pre-computing generalizations (and not storing all examples) when a good sample of examples has been solved by the system. As another example, the system can try small modifications of these clichés, changing some criterion at the meta-level, and adapt or discover methods that are more efficient for a specific domain or environment. In the long run, the system can reason about the features of different domains (like incomplete information) and environments (like time constraints) and the different learning clichés it applied to them, eventually learning the best methods according to the environment's and the domain's epistemological features.
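A hypothetical Python sketch of this meta-level reasoning: a learning cliché whose meta-level inspects the number of solved cases and switches from lazy, purely case-based behavior to eager pre-computation of generalizations; the threshold and the subpart names are illustrative only.

```python
class LearningCliche:
    """A learning technique whose subparts are controlled from a meta-level."""
    def __init__(self, generalize, switch_after=50):
        self.cases = []                 # object-level: what has been learned
        self.generalizations = []
        self.generalize = generalize    # subpart cliché: builds rules from cases
        self.switch_after = switch_after

    def eager(self):
        """Meta-level criterion: pre-compute only once a good sample exists."""
        return len(self.cases) >= self.switch_after

    def learn(self, case):
        self.cases.append(case)
        if self.eager():
            # Rule-based regime: compile the experience into generalizations.
            self.generalizations = self.generalize(self.cases)

    def solve(self, situation, match_rule, match_case):
        if self.eager() and self.generalizations:
            return match_rule(self.generalizations, situation)
        return match_case(self.cases, situation)   # lazy, case-based regime
```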
Since we have seen that between case-based and rule-based learning there is in fact a continuum of ways to solve the same issues, we now develop this point in the MMP view. The distinction between how much is pre-computed and how much is computed on the fly is in fact the question of what is to be inferred and memorized so that it is already there, as spontaneous inference, for the situations that follow. The more that is pre-computed, the more needs to be known about the specific goals and features of the domain, as rule-based learning systems (and also EBL systems) have shown. When the domain is less systematic, or less well known, MMP hypothesizes that more purely case-based learning takes place, but that as the system gains experience of the domain and discovers some regularities that are useful, it then shifts to more pre-computed ("operationalized") representations through new learning techniques.
The spontaneous vs. deliberate inference trade-off is a main research issue for MMP: how to have pure case-based learning and several variations of case-based and rule-based learning for several tasks, but especially how learning may be realized so that one moves from one type of learning to the other, and in which types of domain this happens. As stated, this implies analyzing different learning `algorithms' at the knowledge level and implementing them as clichés, with their meta-level inference implementing those algorithms.
For instance, an experiment to be undertaken in MMP can be learning the pronunciation of words from their spelling. It has been shown [MBL] that in English a purely memory-based approach, without any generalization, can work at levels of efficiency comparable to other approaches. MMP hypothesizes that this is due to the structure of the English language domain, and that, trying out this same experiment in Catalan, Spanish, and Flemish, the type of learning should be different. In Catalan and Spanish it would be more efficient to discover the abstract rules that map spelling to pronunciation; especially in Spanish they should be straightforward, and therefore learned faster than the Catalan rules. These experiments can show how to learn by changing the best representation for spontaneous inference, and therefore also the learning technique used, and show how this depends on domain features and cognitive-economy aspects.
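As a rough sketch of the purely memory-based end of that spectrum (in the spirit of [MBL], but not their algorithm), the following Python code predicts the phoneme for a letter by retrieving the most similar letter-in-context window among the stored examples, with no generalization step at all; the window size and overlap measure are arbitrary choices.

```python
def windows(word, phonemes, size=3):
    """Pair each letter (with its left/right context) with its phoneme."""
    padded = "#" * size + word + "#" * size
    for i, phoneme in enumerate(phonemes):
        yield padded[i:i + 2 * size + 1], phoneme

def overlap(a, b):
    """Similarity = number of positions where the two windows agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

class MemoryBasedPronouncer:
    def __init__(self):
        self.memory = []                      # all training windows, stored verbatim

    def train(self, word, phonemes):
        self.memory.extend(windows(word, phonemes))

    def pronounce(self, word):
        result = []
        for window, _ in windows(word, [None] * len(word)):
            _, phoneme = max(self.memory, key=lambda m: overlap(m[0], window))
            result.append(phoneme)            # phoneme of the nearest stored window
        return result
```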
[Carbonell] J G Carbonell (1989), Introduction: Paradigms for Machine Learning, Artificial Intelligence, 40, 1-9.
[CHANDRA] B Chandrasekaran (1986), Generic Tasks in Knowledge-based Reasoning: High-Level Building Blocks for Expert System Design, IEEE Expert, Vol. 1.
[Clark 89] P Clark (1989), A Comparison of Rule and Exemplar-based Learning Systems, in P B Brazdil and K Konolige (Eds.) Machine Learning, Meta-Reasoning and Logics, Kluwer Academic Publishers.
[CLICHé] David Chapman (1986), Cognitive Clichés, MIT AI Working Paper.
David Chapman (1983), Naive Problem Solving and Naive Mathematics, MIT AIL Working Paper 249.
[CYC] D B Lenat (1989), When will machines learn?, Machine Learning, 4,9-11.
[Davis 88] R Davis & W C Hamscher, Model-based Reasoning: Troubleshooting, MIT AIL Memo 1059.
[ESF] L. Steels & D. McDermott (Eds.) Expert Systems Foundations: Conversations and Comments, Draft Edition (1989).
[EURISKO] D. Lenat (1983), Eurisko: A Program Which Learns New Heuristics and Domain Concepts, Artificial Intelligence, 21.
[FIKES] R Fikes (1989) Report on the Knowledge System Development Tools and Languages, AI Magazine, 10, 3.
[FRAME] M Minsky (1975), A Framework for Representing Knowledge, in P H Winston (Ed.) Psychology of Computer Vision McGraw-Hill
[Galambos 86] J A Galambos, R P Abelson, J B Black (Eds.), (1986) Knowledge Structures, Lawrence Erlbaum Associates.
[HAMMOND] K J Hammond (1986), Case-based planning: An integrated theory of planning, learning, and memory. Ph.D. Thesis, Yale University.
[HENDLER 88] J Hendler, B Israel (1988), A highly parallel implementation of a marker passing algorithm, University of Maryland, UMMIACS TR 88-62, CS Dept TR 2089.
[HENDLER 89] J Hendler (1989), Marker-passing over microfeatures: towards a hybrid symbolic/connectionist model, Cognitive Science,
[iPSC 85] Intel Scientific Computers, iPSC User's Guide, Order No. 175455-001, Santa Clara, CA.
[J-Machine] W J Dally (1989), Concurrent Computer Architecture, in A K Noor (Ed.) Parallel Computations and Their Impact on Mechanics, The American Society of Mechanical Engineers, Vol 86.
[KOLODNER] J L Kolodner (1987), Extending Problem Solving Capabilities Through Case-Based Inference, Proc. Fourth International Machine Learning Workshop.
J L Kolodner & R M Kolodner (1987), Using Experience in Clinical Problem Solving: Introduction and Framework. IEEE Transactions on Systems, Man, and Cybernetics.
J L Kolodner & R E Cullingford (1986), Towards a Memory Architecture that Supports Reminding. Proc. Eighth Annual Conference of the Cognitive Science Society.
[Langley 89] P Langley (1989), Machine Learning.
[MBL] C Stanfill and D Waltz (1986), Toward Memory-Based Reasoning, Communications of the ACM, 29(12).
C Stanfill and D Waltz (1986), The Memory Based Learning Paradigm, Proc. Case-Based Reasoning Workshop (DARPA). Morgan-Kaufmann Publ., San Mateo, CA.
[uKLONE] M Derthick (1988), Mundane Reasoning by Parallel Constraint Satisfaction, Ph D Thesis and CMU-CS-88-182, Carnegie-Mellon University, Pittsburgh.
[MINSKY] M Minsky (1986), The Society of Mind, Simon & Schuster.
[MM2] Enric Plaza (1990), Modularity and Meta-Level Part II
[NOGOOD] K Haase (1987), Why Representation Languages are No Good, MIT AI Memo 943.
[PS CLICHéS] E Plaza (1990), Problem Solving Clichés, Report de Recerca GRIAL 90/9.
[PRODIGY] S Minton, J Carbonell, C Knoblock, D Kuokka, O Etzioni, Y Gil (1989), Explanation-based learning: A problem solving perspective, Artificial Intelligence, 40, 63-118.
[RIESBECK] C K Riesbeck (1986), Direct Memory Access Parsing. in J L Kolodner & C K Riesbeck (Eds.) Experience, Memory and Reasoning. Lawrence Erlbaum Associates, Hilldale NJ.
[RLL] R Greiner (1980), RLL-1: A Representation Language Language. Working Paper 80-9, Stanford Heuristic Programming Project.
[TYPICAL] K Haase (1987), TYPICAL: A Knowledge Representation System for Automated Discovery and Inference, MIT AI TR 988.
[SAMS] Selforganising and Analogical Models for Subsymbolic Computation. The SAMS Project Proposal
[SCHANK82] R C Schank (1982), Dynamic Memory: A Theory of Learning in Computers and People, Cambridge University Press.
[SCHANK89] R C Schank & D B Leake (1989). Case-based explanation, Artificial Intelligence, 40, p.353-385.
[SOAR] J Laird, A Newell, P S Rosenbloom (1986), SOAR: An architecture for general intelligence, CMU-CS-86-171.
[STEELS] L Steels (1989), Steps toward common-sense, ECAI-89.
[WALTZ] Waltz (1989), Personal communication, Amalfi (Italy).
[Waters 81] R C Waters (1981), The Programmer's Apprentice: A Session with KBEmacs, IEEE Trans. Software Eng., 11, 11, pp. 1296-1320.