Diagrams as Models: An Architecture for Diagrammatic Representation and Reasoning
B. Chandrasekaran
MBR-2004
Collaborators: Unmesh Kurup, Bonny Banerjee, John Josephson, Vivek Bharathan
Robert Winkler



Different Uses of “Model”

• In Logic: A Domain as Model of a Description
  – The term is used in two related ways:
    • Given A, a set of sentences (axioms), a domain D is a model of A if the actual properties of D satisfy A (under appropriate mappings of the symbols and entities).
      – Arithmetic is a model of Peano’s Axioms
      – Plane Geometry is a model of Euclidean Axioms
    • A somewhat different, purely syntactic sense, in assigning truth values to elements of the Herbrand Universe.
• In Philosophy and Practice of Science: A Description as a Model of a Domain
  • A description (a set of equations) is a model of a domain D if the description can be used to predict phenomena in D
    – Maxwell’s Equations model electromagnetic phenomena; the Newtonian model vs. the Einsteinian model, etc.
• In AI/Cog Sci, the sense in “mental model of agents” is the second sense
  – The opposite senses are especially annoying when we wish to study the reasoning of agents
• I won’t even get into the sense of “my car was 2003 model.” ☺

Models and Thinking

• Model of domain D:
  – A representation of D that provides information about D under appropriate use of the model
• Applies both to external physical models and mental models
• All thinking, all of AI, is model-based
  – “If the organism carries a small-scale model of external reality and of its possible actions within its head, it is able to try out various alternatives, conclude and react to future situations before they arise...” (Kenneth Craik 1943)
  – Thinking involves not only using a model to make predictions, but also creating models of the world.

Model Frameworks Characterized by Two Associated Schemes

• A model can be useful only when we have two schemes to go with it:
  – A model creation/transformation scheme
    • Ways of creating a model to represent aspects of D, and also to change it when aspects of D change.
  – A scheme to extract information from the model

Dominant Model Framework in AI/Cog Sci

• Language of Thought Hypothesis
  – Predicate structure of language: symbols stand for individuals, relations, and connectives
    • On(A,B) ^ On(B,C), A Is-a B
  – AI, both of the logicist and non-logicist kind, is also committed to it.
    • Modeled after what is taken to be the structure of natural language.
• Great power: generality, ability to connect distal elements of the representation on an as-needed basis
• Model creation: representing truths (or approximations thereof) as predicates that hold in D.
  – Model transformation: apply rules that describe domain change
• Information extraction: inference (not necessarily deductive), based on rule application by symbol matching.
• The role of perception in these theories is to supply information about the world in the form of symbolic expressions, not to play a role in reasoning as such.

The Standard Story in AI

[Figure: block diagram with the components External World/Representation, Perception, Memory, Problem Solver (with Goals), and Action, and the links Propositions from Memory, Propositions from Perception, and Propositions describing action.]

But Phenomenology of Inner Selves Suggests …

• … experience of images in multiple modalities
  – Visual ones, especially, play a role in reasoning and problem solving
    – Given A is to the left of B, and C is to the right of B, what is the relation of A to C?
    – Imagine taking a step forward, a step to the right, and a step back. Where are you with respect to the starting point?
  – Experience of applying perception to an image to extract the needed information
  – The phenomenology is independent of the controversies about the “true” nature of mental images
  – The logic of problem solving is as if perceptions are applied to images
    • The images play a certain functional role in problem solving

Agent’s Cognitive State as Multi-Modal

• The most general account of models used by an agent, then, is that of multiple modes:
  – Language-like expressions
    • with the information extraction operation being inference; and
  – Images in various perceptual modalities,
    • with information extraction operators being the corresponding perceptions applicable to the modalities.

External Representations

• Models can be and often are external
  • 3-D architectural and molecular models
  • Diagrams
  • A chemist solving a chemistry problem by solving a subproblem by mixing up chemicals and seeing what happens
    – Physics of the world automatically takes care of transformations
    – External perception takes care of information extraction
    – “Situated cognition”
      • Where the external world is its own representation
    – Good: If perception is more or less free, information extraction by perception saves us from having to “think.” Less strain on STM, not only for information storage, but for storing and manipulating rules for inference-making.
    – Bad: The model is an instance, so we need to be careful about generalizations.

Diagrammatic Reasoning Research: Many Payoffs

• Diagrams are a wonderful window into the larger multi-modal cognitive architecture

• Practical importance, in view of their ubiquity in everyday and professional reasoning

• Advances help not only in theory-making and in building automated agents, but also provide a basis for improvements in visual interfaces.

Scope of Diagrams for First Round

• Neutrality with respect to external vs. internal
  – Certainly applies to external diagrams, but depending on point of view can be taken to apply to internal diagrams
• Simple diagrams:
  – B&W (i.e., no grey scale carrying information, no color) and static
• Framework can be extended

How do Diagrams Help?

• Makes relevant information highly salient for visual perception to pick up

[Figure: Minard’s map of Napoleon’s Army in Russia, as redone by Mary A. Pender]

• Emergent objects and relations:
  – A diagram created to represent information I also represents certain implications of I.
    • A diagram created to represent Left(A,B), Right(C,B) automatically represents the relation between A and C.
    • Free ride (Shimojima)

[Figure: diagram with the labels A, B, and C. Caption: Intersection point, and the various segments, are emergent objects. The causal structure of physical space does the work for us.]
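The free ride can be illustrated with a minimal sketch, assuming a one-dimensional "diagram" in which each label is given an x-coordinate. The function names `draw` and `left_of` and the coordinate scheme are hypothetical, chosen only for illustration; they are not taken from the talk.

```python
def draw(facts):
    """Place labels on a line so that every Left/Right fact holds.

    facts: list of (relation, x, y) meaning relation(x, y),
    e.g. ("Left", "A", "B") reads "A is to the left of B".
    Returns a dict mapping each label to an x-coordinate.
    """
    # Normalise every fact to a pair (a, b) meaning "a is left of b".
    pairs = [(x, y) if rel == "Left" else (y, x) for rel, x, y in facts]
    remaining = {label for pair in pairs for label in pair}
    order = []
    while remaining:
        # A label can be placed next if nothing unplaced must be to its left.
        free = sorted(
            label for label in remaining
            if not any(b == label and a in remaining for a, b in pairs)
        )
        if not free:
            raise ValueError("the facts cannot all hold on one line")
        order.append(free[0])
        remaining.remove(free[0])
    return {label: float(i) for i, label in enumerate(order)}


def left_of(diagram, a, b):
    """Perception: read the relation directly off the coordinates."""
    return diagram[a] < diagram[b]


# Only Left(A,B) and Right(C,B) are drawn; Left(A,C) is never stated.
diagram = draw([("Left", "A", "B"), ("Right", "C", "B")])
print(left_of(diagram, "A", "C"))
```

The relation between A and C is read off the drawing without any inference rule being applied. The equal spacing between the labels is an artifact of the drawing procedure and carries no information about the domain.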

The Downside

• Overspecificity:
  – Not all information available for perception is “true”, so we need to be careful
    • It makes no sense to “see” that C is farther to the right of B than A is to the left of B
    • Not that generalization is not possible, but it requires keeping track of the representational commitments
  – We can still use the diagram to say that for all A, B, and C, if …, even though the diagram just represents one instance out of all possible such instances.

Some Issues in the Logic of Diagrams

• Warrant for Generalization
  – A specific diagram, such as that on the right, is one model of the given (e.g., of the statements Left(A,B) ^ Right(C,B)).

[Figure: diagram with the labels A, B, and C]

  – A specific proof technique that may be called model-based generalization is used.

S1
S2
S3 ---- has model M1, S4 is true in M1, generalizing,
S4
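What the generalization step has to guarantee can be sketched by brute force: a conclusion read off one diagram is safe only if it holds in every model of the premises. The sketch below enumerates all left-to-right orderings of the labels. This exhaustive check is a substitute used for illustration, not the model-based generalization technique named on the slide, and the function names are hypothetical.

```python
from itertools import permutations


def holds(fact, order):
    """Check one fact in one model, where a model is a left-to-right ordering."""
    rel, x, y = fact
    ix, iy = order.index(x), order.index(y)
    return ix < iy if rel == "Left" else ix > iy


def models(premises, labels):
    """All orderings of the labels in which every premise holds."""
    return [o for o in permutations(labels) if all(holds(p, o) for p in premises)]


def true_in_all_models(premises, conclusion, labels):
    """A conclusion generalizes only if it holds in every model of the premises."""
    found = models(premises, labels)
    return bool(found) and all(holds(conclusion, m) for m in found)


labels = ("A", "B", "C")

# Left(A,B) ^ Right(C,B): Left(A,C) holds in every model, so it generalizes.
safe = true_in_all_models(
    [("Left", "A", "B"), ("Right", "C", "B")], ("Left", "A", "C"), labels
)

# Left(A,B) ^ Left(A,C): one drawing may show B left of C, another may not.
unsafe = true_in_all_models(
    [("Left", "A", "B"), ("Left", "A", "C")], ("Left", "B", "C"), labels
)
print(safe, unsafe)
```

In the second case a single diagram would still show some relation between B and C, which is exactly the kind of overspecific information that must not be generalized.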

Logic of Diagrams, contd.

• Barwise and Etchemendy

– They want to treat a diagram as a two-dimensional “sentence,” just as P(A,B) is a one-dimensional sentence.

– They wish to develop “rewrite rules” that will transform diagrammatic sentences into diagrammatic sentences, so that certain diagrammatic “proofs” will be allowed in logic

• Such as certain proofs in Set Theory using Venn or Euler diagrams.

– This view is not one based on a diagram as a physical model of a set of propositions.

• It is still syntactic transformation, just as traditional rewrite rules are.

Our Group’s Focus

• How diagrammatic representations are used in conjunction with predicate-symbolic representations, i.e., multimodally, in problem solving
  – Flexible integration of information generation modes
    • Use whichever modality is most useful for solving a given subtask

[Figure: architecture diagram with the components Diagram, DRS, Action Routines, Perceptual Routines, Problem Solver, Symbolic Repr., and Inference Rules. Caption: Bi-Modal State.]

Power of Opportunism

• Flexible integration:
  – Information might be obtained in one step from the diagram by perception;
  – The information so obtained might be combined in the next step with other information in memory to support applying an inference rule to information in symbolic form,
    • which might result in changes to the diagram,
    • which might in turn give rise to emergent objects and relations that can be picked up by perception,
    • … and so on.
  – Each modality supplies the information that it is best suited for.
  – Reduction in search by a factor of 300 in geometry theorem proving
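The opportunistic use of two modalities can be sketched as follows, assuming a state that holds symbolic facts alongside a diagram of labeled points. `BimodalState`, `ask`, and the single `left_of` perception are hypothetical names chosen for this illustration; the system described in the talk is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]


@dataclass
class BimodalState:
    facts: Set[Fact] = field(default_factory=set)                      # symbolic part
    diagram: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # label -> (x, y)


def ask(state: BimodalState, query: Fact):
    """Answer a query with whichever modality can supply it.

    Returns (answer, source), where source names the modality used.
    """
    if query in state.facts:
        return True, "memory"
    pred, a, b = query
    if pred == "left_of" and a in state.diagram and b in state.diagram:
        answer = state.diagram[a][0] < state.diagram[b][0]
        if answer:
            # Perceived information becomes available to later inference.
            state.facts.add(query)
        return answer, "perception"
    return False, "unknown"


state = BimodalState(
    facts={("is_a", "A", "vehicle")},
    diagram={"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0)},
)
first = ask(state, ("left_of", "A", "C"))
second = ask(state, ("left_of", "A", "C"))
print(first, second)
```

The first query is answered by perception on the diagram; the second finds the same fact already in symbolic form, so the two modalities feed each other across steps.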

Diagrams Abstract

• A diagram is a special kind of “image”
  – Distinction between a terrain photograph and a map of the same region
  – Diagrams are spatial representations by and for agents
    • They reflect representational choices by an agent: some things are abstracted away, some spatial aspects are taken to be representational while others are not, etc.

Focus in the Talk

• Issues Not Considered Here
  – How the agent decides what and how to represent in a diagram
  – Issues related to how aspects of non-spatial domains may be represented by diagrams
• Focus will be on architectural issues
  – Representation and coordination between modalities

Three Levels of Perception

• The first is figure-ground discrimination, seeing the world as composed of objects and their shapes
  – With diagrams, this dimension corresponds to perceiving it as composed of objects (or figures) as 2-d shapes.
• The second dimension is that of seeing as, e.g., recognizing an object as a telephone, or a figure in a diagram as a triangle.
  – Output is a symbol that names the object or figure
• The third dimension is relational: seeing that an object is to the left of another object, taller than another, is part of another, etc.
  – Produces symbol structures whose symbols refer to spatial relations.
• The first is the only one whose output is intrinsically spatial

DRS

• DRS represents the output of Level 1.
  – That is, DRS is not simply a collection of marks or pixels, nor is it a collection of symbolic representations of propositions about the scene
  – DRS sees the diagram as a configuration of diagrammatic objects,
    • each of which represents some element in the domain,
    • selected spatial properties of each of which are intended to represent properties of that element.

DRS: Representing Diagrams

• Distinction between physical diagrams and intended spatial representation.
  – The alphabetical symbols and the icons are part of the physical diagram, not DRS. They appear as symbolic annotations in DRS.
  – The arrows are regions in the physical diagram, but the intended spatial representation is the axial curve, not the region. The arrowhead is a symbolic annotation of directionality.
    • DRS represents it as a curve.
  – The hatches in the regions symbolize a no-go region. The DRS of the regions does not have the hatches, but the symbol is attached to the region object
  – Attitude to geometric figures in a diagram during theorem-proving

Abstract Diagram

• The abstract diagram in DRS is to the physical diagram as the abstract (intended) predicate relation is to P(A,B)

DRS: Functional Equivalent of Diagram

• A diagram is a configuration of diagrammatic objects, each of which is one of {point, curve, region}.
  – Points in curves and regions, and curves in regions, may be identified as distinguished objects.
• The DRS assigns an internal label to each diagrammatic object, and associates with each object its spatial specification.

Object, Label A, type: Curve, EndPoint1: D, EndPoint2: E
Object, Label C, type: Region, Periphery: F
Object, Label D, type: Point
Object, Label E, type: Point
Object, Label C, type: Point
Object, Label F, type: Curve
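The object listing above suggests a simple data structure. The following is a minimal sketch, assuming coordinate lists as the spatial specification; the class names, field names, and coordinates are hypothetical and are not the implementation described in the talk. Because labels must be unique here, the sketch includes only one object labeled C.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
KINDS = {"point", "curve", "region"}


@dataclass
class DiagramObject:
    label: str                      # internal label assigned by the DRS
    kind: str                       # one of KINDS
    spec: List[Coord]               # spatial specification (assumed: coordinates)
    annotations: Dict[str, str] = field(default_factory=dict)  # symbolic annotations


class DRS:
    """A configuration of labeled diagrammatic objects."""

    def __init__(self) -> None:
        self.objects: Dict[str, DiagramObject] = {}

    def add(self, label: str, kind: str, spec: List[Coord], **annotations: str) -> DiagramObject:
        if kind not in KINDS:
            raise ValueError(f"unknown object type: {kind}")
        if label in self.objects:
            raise ValueError(f"label already in use: {label}")
        obj = DiagramObject(label, kind, list(spec), dict(annotations))
        self.objects[label] = obj
        return obj

    def labels_of(self, kind: str) -> List[str]:
        return sorted(l for l, o in self.objects.items() if o.kind == kind)


drs = DRS()
drs.add("D", "point", [(0.0, 0.0)])
drs.add("E", "point", [(4.0, 0.0)])
drs.add("A", "curve", [(0.0, 0.0), (4.0, 0.0)], endpoint1="D", endpoint2="E")
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
drs.add("F", "curve", square + [square[0]])            # closed curve
drs.add("C", "region", square, periphery="F", meaning="no-go")
print(drs.labels_of("curve"))
```

Symbolic annotations such as the no-go marking stay attached to the region object, while the hatching of the physical diagram has no counterpart in the structure.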

DRS - continued

• A DRS representation is not a pixel-level description. It corresponds to the agent’s perceptual experience after a figure-ground organization is already made and the array is already interpreted as objects.
  – None of the operations on DRS correspond to “image processing” operations.
• DRS is more abstract than a physical diagram.
  – Points and curves in physical diagrams vs. points and curves in DRS.
  – Iconic aspects are not present in DRS

Perception Operators

• Spatial properties of objects
  – Length of a curve, e.g.
  – That a curve is a straight line, a region is a triangle, etc.
• Emergent object identification
  – Point, curve, and region objects that are created when diagrammatic objects are declared.
• Emergent relations
  – Inside, touches, left-of (a,b,pov),
  – Subsumption relations
  – Angular relations
  – …. (domain-independent … domain-specific.)
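A few of these operators can be sketched on coordinate data, assuming curves are polylines and regions are polygons given as vertex lists. The signatures are hypothetical. In particular, `pov` is assumed here to be the direction the viewer faces; the talk does not say how the point of view is encoded.

```python
import math


def curve_length(points):
    """Spatial property: length of a polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def is_straight(points, tol=1e-9):
    """Spatial property: every interior point lies on the line through the ends."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return all(
        abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) <= tol
        for x, y in points[1:-1]
    )


def left_of(a, b, pov=(0.0, 1.0)):
    """Emergent relation: a is to the left of b for a viewer facing along pov."""
    lx, ly = -pov[1], pov[0]          # the viewer's left is pov rotated 90 degrees
    return (a[0] - b[0]) * lx + (a[1] - b[1]) * ly > 0


def inside(point, polygon):
    """Emergent relation: point-in-polygon test by ray casting."""
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(curve_length([(0.0, 0.0), (3.0, 4.0)]), inside((1.0, 1.0), square))
```

Each operator returns a number or a truth value, that is, symbolic information extracted from spatial data, which is how perception feeds the symbolic side of the problem state.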

Action Operators

• Create objects satisfying properties or relations
  – E.g., a point to the left of a region, a point on a curve, ...
  – A curve such that it connects points A and B, and avoids region R.
• Action operators make use of perception operators to satisfy constraints.
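The dependence of action operators on perception operators can be sketched as generate-and-test: the action proposes candidate objects and a perception operator decides when the constraint is met. The names and the search strategy below are hypothetical illustrations, assuming regions are polygons given as vertex lists.

```python
def left_of_region(point, polygon):
    """Perception: the point lies to the left of every vertex of the region."""
    return point[0] < min(x for x, _ in polygon)


def create_point_left_of(polygon, step=1.0, max_steps=1000):
    """Action: create a point to the left of a region.

    Starts at the region's vertex average and moves left in fixed steps
    until the perception operator reports that the constraint holds.
    """
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    candidate = (cx, cy)
    for _ in range(max_steps):
        if left_of_region(candidate, polygon):
            return candidate
        candidate = (candidate[0] - step, candidate[1])
    raise ValueError("no point found within max_steps")


region = [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)]
point = create_point_left_of(region)
print(point)
```

The same pattern would extend to a curve that connects two points while avoiding a region, with a curve-region intersection test taking the place of `left_of_region`.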

Open-ended

• Perception and action operators are an open-ended set

• However, a large number of domain-independent, reusable perceptual routines (PRs) and action routines (ARs) can be identified.

• We have implemented a basic, reusable set.

Example: Information Fusion for Entity Re-identification

• System receives a new report about a sighting of entity, say T3, of type T

• Has to decide if the new sighting is the same as any of the entities in its database of earlier sightings, or an entirely new entity
  – Reasoning has to integrate information from different sources (database of sightings, capabilities of vehicles, sensor reports, terrain and map information) to make the decision

• Uses a generic fusion engine for generating, evaluating, combining, and modifying hypotheses about the entity

• Diagrammatic Reasoning is used to handle spatial aspects: Possible routes, whether routes intersect sensor fields, how to modify routes to avoid sensor fields

Problem Setup

[Figure: problem setup with the components Goals, Knowledge, problem solving engine (Fusion Engine), Diagram, DRS, Perception routines, and Action routines, and a map labeled Area of Interest (AOI).]

The start: On receipt of the sighting of T3, the problem solver (the Fusion Engine, FE, in this case) queries the entity database for entities of the same type in the Area of Interest (AOI), and gets back two vehicles, T1 and T2, with their types, locations, and times of sightings.

Examining Possible Routes

• Fusion Engine (FE) asks Diagrammatic Reasoner (DR) to identify routes that T1 and T2 might have taken to get to T3.

• FE rules out the longer route in each case as too long, based on time elapsed, leaving one route each.

• FE asks the database for information on tunnels, storage depots, etc., from which a new vehicle might have appeared. The new-vehicle hypothesis is ruled out.

Failure of Expectation Critiques: Crossing Sensor Fields

FE identifies from the database two sensor fields in the AOI, and information that neither of them reported any sightings. The fields are added to the diagram.

FE asks DR if Route 1 intersects a sensor field. PR identifies Sensor_Field2.

FE asks if the route could be modified so as to avoid the sensor field. DR tries it and says yes.

Repeat Failure of Expectation Critique for T2

• However, DR says this time that the route cannot be modified to avoid the sensor field.

• Note: All these activities (the diagram representation, applying the various perceptions to the diagrammatic objects, creating diagrammatic objects such as routes that satisfy certain perceptual constraints) are done inside the computer. The corresponding objects are displayed for users, but the display does not play a role in the reasoning itself (except when a user introduces new diagrammatic objects, which are then transferred to the program).

• FE now proposes T1, along Route 1, as the most likely hypothesis.
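The sensor-field critique can be sketched as follows, assuming routes are polylines and sensor fields are circles given as (center, radius). All coordinates, the circular field shape, and the pre-supplied detour for T1 are invented for illustration; in the system described above the diagrammatic reasoner generates the modified route itself.

```python
import math


def dist_point_segment(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def route_crosses(route, sensor_field):
    """Perception: does any segment of the route enter the circular field?"""
    center, radius = sensor_field
    return any(dist_point_segment(center, a, b) <= radius for a, b in zip(route, route[1:]))


def surviving_hypotheses(candidates, silent_fields):
    """Keep entities with at least one route that avoids every silent sensor field."""
    return sorted(
        name for name, routes in candidates.items()
        if any(not any(route_crosses(r, f) for f in silent_fields) for r in routes)
    )


sensor_field1 = ((5.0, -2.0), 1.0)
sensor_field2 = ((5.0, 2.0), 1.0)
route1 = [(0.0, 4.0), (10.0, 0.0)]                  # T1 to the sighting of T3
route1_detour = [(0.0, 4.0), (5.0, 6.0), (10.0, 0.0)]
route2 = [(0.0, -4.0), (10.0, 0.0)]                 # T2 to the sighting, no detour available

candidates = {"T1": [route1, route1_detour], "T2": [route2]}
print(surviving_hypotheses(candidates, [sensor_field1, sensor_field2]))
```

Because neither sensor field reported a sighting, a hypothesis survives only if some route for that entity stays clear of both fields, which leaves T1 in this invented configuration.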

Generalizing Cognitive State

• Problem state is bi-modal
  – DRS is the functional equivalent of an external diagram, just as a “mental image” is the functional equivalent of a diagram.
    • Or, DRS *is* the mental image for a computational agent.
  – At any stage in problem solving, part of the information is in predicate-symbolic form, and part in functional diagrammatic form
  – Whichever information extraction operator is most expedient can be deployed.
• Initial implementation of an extension to Soar, the cognitive architecture, in which each state is bimodal.
  – Partial relief for the Frame Problem

[Figure: block diagram with the components External World/Representation, Action, Perceptual Systems, EPS1, EPS2, …, EPSn, IPS1, IPS2, …, IPSn, Conceptual/Symbolic Problem Solver (with Goals), and Memory. Caption: Cognitive state of agent is multimodal.]

Advantages to an Agent of Multi-Modal State and Memory

• Not all the symbolic propositions corresponding to a memory have to be extracted and stored in memory for future use
  – An “abstract” image may be retrieved and task-specific information extraction can be made.

Concluding Remarks

• Cognition is built on perception, not simply in the obvious sense that perception gives us information about the world, but in the sense that our memories and our problem solving use perceptual representations.

• Diagrammatic reasoning is a great window into studying a large number of issues if we take the above view seriously.