
Chapter 2: Inflating an Artist’s Sketch

Visual Intelligence: How We Create What We See
Donald D. Hoffman

Setting the Scene

• The chapter begins with the story of a boy born blind.
• William Cheselden, a famed surgeon, gave the boy artificial pupils in each eye.
• After the surgery, the boy associated objects with how they feel rather than how they appear.
• The boy asked which was lying, feeling or seeing.
• The boy also confirmed Berkeley’s theory that one born blind could not, upon being cured, immediately recognize shapes.
• According to Berkeley, we do not see shape and space; we feel them. Through experience, we associate colored patches with the shapes and faces we feel.

The Necker Cube

• Published by Swiss naturalist Louis Albert Necker in 1832.
• We see a 3D shape but feel a 2D shape.
• Notice there are two cubes.
• You flip between seeing one and the other, one with A in front and the other with B in front.

The Necker Cube: Four Main Questions

1. Where is cube B when you are seeing cube A?
2. Where is cube A when you are seeing cube B?
3. Where are cubes A and B when you look away from the figures?
4. When you see cube A and your friend at the same time sees cube B, who is right?

Possible Responses:

1. Don’t worry. The Necker cube does not occur in nature. It tricks the eye in unnatural ways and reveals nothing about normal vision. The questions are pointless.

2. Cubes? What cubes? I don’t feel any cubes when I touch this figure.

3. You construct the cubes you see. You aren’t aware, perhaps, of your construction process, but you are aware of the result.

Response (1) is unclear as to what defines “natural” and “unnatural.” It also dismisses the value of studying simple systems.

Response (2) refers to the principle discussed before: I see a cube but do not feel one, so one sense is lying. However, dismissing the illusory cubes from further study is the wrong strategy.

Response (3) is the response most cognitive scientists and researchers give. This is the response we want.

According to Response (3)…

Where is cube B when you see cube A?
A: If, when you see A, you construct A but not B, then B is nowhere. It may be that when you see A you also construct B but don’t see it.

Where are cubes A and B when you look away from the figures?
A: When you don’t view the figure, you don’t construct the cubes. So the cubes are nowhere.

Picky Minds

• Hertha Kopfermann discovered in 1930 that we are picky about which figures we see as cubes.

• The middle cube is the Necker cube. It is easily seen in 3D.
• Most viewers see the Kopfermann cubes (left and right) initially in 2D, then see a 3D cube.
• Why do we see the Necker cube so easily but not the Kopfermann cubes?

Answer:

First, why is it easy to see the Necker cube in 3D? Because each time you open your eyes you see three dimensions in images that have only two, and you do so according to rules. The image at the eye (like a drawing) is always two-dimensional. You construct the third dimension. You construct depth according to rules, and these rules deter you from seeing the Kopfermann cubes in 3D. Simply put, your visual system is biased. It constructs only those 3D interpretations that conform to its rules. So, you face a principled ambiguity each time you need to see depth.

Seeing Depth

The fundamental problem of seeing depth:
The image at the eye has two dimensions; therefore it has countless interpretations in three dimensions.
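The ambiguity can be illustrated with a small sketch. It assumes a simple orthographic projection (depth is just dropped), which is a simplification chosen for illustration and not a model taken from the chapter; the function name `project` and the square of image points are hypothetical.

```python
import numpy as np

def project(points_3d):
    """Orthographic projection onto the image plane: drop the depth (z) coordinate."""
    return np.asarray(points_3d, dtype=float)[:, :2]

# Four image points (the corners of a square drawn in the 2D image).
image = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

rng = np.random.default_rng(0)
# Build many different 3D scenes by attaching arbitrary depths to the same image points.
scenes = [
    np.column_stack([image, rng.uniform(-5, 5, size=len(image))])
    for _ in range(1000)
]

# Every scene projects to exactly the same image ...
all_same_image = all(np.array_equal(project(s), image) for s in scenes)
# ... yet the scenes differ from one another in 3D.
distinct_scenes = len({s.tobytes() for s in scenes})

print(all_same_image, distinct_scenes)
```

Any assignment of depths gives a valid 3D interpretation of the same image, which is why additional rules are needed to pick one.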

Rules of Construction

• Critical to vision. Without them, vision would be a huge mess.
• We employ many rules simultaneously. Often, one rule overrides another, or rules compromise.
• They are not explicitly written down in the mind, but are implicit in its workings.
• In general, you are not aware of these rules.
• You learn them early in life through visual experience.

The Fundamental Role of Visual Rules:
You construct visual worlds from ambiguous images in conformance to visual rules.

First Example

Rule of Generic Views:
Construct only those visual worlds for which the image is a stable (i.e., generic) view.

To visualize: Consider the following vertex:

• We wish to give the drawing a 3D interpretation.
• We could construct “chopsticks” in which these two segments do not meet at their tips.
• There is a gap between the tips, but it is not visible because of the viewing angle.

Rotating one’s head would reveal the gap or a crossing:

So, for the chopsticks, it’s an accident of viewpoint if the tips of the two segments appear to coincide. Accidents of viewpoint are hard to come by. The probability that any given image falling at the eye is an accidental view is almost zero. Therefore, you reject any interpretation that requires that your current image be an accidental view. That, in essence, is the rule of generic views.
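How rare accidental views are can be estimated with a rough Monte Carlo sketch. It assumes orthographic projection, uniformly random viewing directions, and a hypothetical tolerance below which the two tips look as if they touch; none of these choices come from the chapter.

```python
import numpy as np

def accidental_fraction(gap, tolerance, n_views=200_000, seed=0):
    """Fraction of random viewing directions from which two separated tips
    appear to coincide (image separation below `tolerance`).

    `gap` is the 3D vector between the two chopstick tips.
    """
    rng = np.random.default_rng(seed)
    # Uniformly random unit viewing directions (normalized Gaussian samples).
    d = rng.normal(size=(n_views, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    # Remove the component of the gap along the line of sight;
    # what remains is the gap as it appears in the image.
    along = d @ gap
    in_image = gap - along[:, None] * d
    separation = np.linalg.norm(in_image, axis=1)
    return float(np.mean(separation < tolerance))

gap = np.array([0.0, 0.0, 1.0])  # tips one unit apart in 3D
for tol in (0.1, 0.03, 0.01):
    # Under these assumptions the exact fraction is 1 - sqrt(1 - t**2),
    # with t = tol / |gap|, which shrinks to zero as the tolerance shrinks.
    print(tol, accidental_fraction(gap, tol))
```

The tighter the tolerance for "coinciding", the smaller the share of viewpoints that produce the accident, which is the intuition behind rejecting interpretations that need one.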

First Two Rules

Rule 1: Always interpret a straight line in an image as a straight line in 3D.

Rule 2: If the tips of two lines coincide in an image, then always interpret them as coinciding in 3D.

For Example:

Three lines run through the middle, joining six vertices. According to Rule 1, each line must be interpreted as straight in space, without any corners. If we change our view of this cube slightly, we obtain a generic view and will see a cube:

Alternate Explanation: Symmetry

However, perhaps it’s not the rule of generic views that keeps you from seeing the Kopfermann cubes, but that the drawings have simple and symmetric interpretations as 2D figures.

Two competing explanations: generic views and symmetry. Both account for our failure to see Kopfermann cubes. We can check which prevails here.

Check a view that is symmetric and generic. Symmetry predicts that since the view is symmetric, you should see it as 2D. But the rule of generic views predicts that since the view is generic, you see it as 3D.

Symmetric and Generic

• If you see a 3D shape, then the rule of generic views dominates your visual construction.
• If you see a 2D figure, symmetry dominates your visual construction.
• We can perform a second test with unsymmetric and non-generic figures.

Unsymmetric and Non-Generic

On the left, we see a generic view of a 3D shape. On the right, we see a nearby view of the same shape, only non-generic. If you more easily see three dimensions on the left, then this suggests that the rule of generic views, not symmetry, here dominates your visual construction.

Devil’s Triangle

Rule 1 says to see each line in this figure as a line in space. Rule 2 says to see each vertex as a vertex in space. But the triangle on the left is not generic, and Rules 1 and 2 only apply to generic views. You construct an impossible 3D interpretation rather than violate this rule. By covering the non-generic parts, however, you construct a legitimate 3D interpretation.

Trading Towers

Note the long diagonals running through the center. When you see the tall, thin towers, these diagonals bend in the center. This breaks Rule 1: no bends allowed! When you see the short, wide towers, then the vertical and horizontal lines that cross in the center bend, again breaking Rule 1. Your visual intelligence searches for an alternative to breaking this rule.

Attached Boxes

We see a small box resting on top of a big box. Notice the vertical lines along the front right edge: in the image they are colinear. In 3D, though, they do not appear colinear. The top box appears to sit well behind the front of the bottom box. Your interpretation breaks the rule of generic views.

Third Rule

Rule 3: Always interpret lines colinear in an image as colinear in 3D.

Reasoning: If two lines in space are not colinear, then they only appear colinear accidentally.

Why do we break this rule of generic views with the boxes? One explanation is that our vision takes gravity into account. Objects under gravity don’t float in midair. But this doesn’t seem to be the main rule here. Try explaining the occurrence with proximity.

Necker Cube with Bubbles

The four bubbles appear to have different depths, two in front and two behind. If you cover the cube but not the bubbles, the bubbles appear coplanar. The bubbles inherit their depths from the cube. The rule of inheritance is proximity: each bubble inherits its depth from the portion of the cube nearest to it in the figure.

Fourth Rule

Rule 4: Interpret elements nearby in an image as nearby in 3D.
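One way to picture the proximity rule is as a nearest-neighbor lookup in the image. The sketch below is only an illustration under that assumption; the function `inherit_depth`, the vertex positions, and the depth values are hypothetical and not taken from the figure.

```python
import numpy as np

def inherit_depth(element_xy, anchor_xy, anchor_depth):
    """Give each image element the depth of the anchor nearest to it in the image."""
    element_xy = np.asarray(element_xy, dtype=float)
    anchor_xy = np.asarray(anchor_xy, dtype=float)
    anchor_depth = np.asarray(anchor_depth, dtype=float)
    # Pairwise 2D distances, shape (n_elements, n_anchors).
    dist = np.linalg.norm(element_xy[:, None, :] - anchor_xy[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    return anchor_depth[nearest]

# Hypothetical image positions of four cube vertices and their constructed depths
# (0 = front face, 1 = back face).
cube_xy = [[0, 0], [2, 0], [1, 1], [3, 1]]
cube_depth = [0.0, 0.0, 1.0, 1.0]

# Hypothetical bubble positions in the same image.
bubbles_xy = [[0.1, 0.2], [2.9, 0.8], [1.9, -0.1], [1.2, 1.1]]

depths = inherit_depth(bubbles_xy, cube_xy, cube_depth)
print(depths)  # two bubbles inherit the front depth, two the back depth
```

If the cube is removed there are no anchors to inherit from, which matches the observation that the bubbles then appear coplanar.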

So far, we have focused on figures composed entirely of linear structures. Let’s now consider a structure of curves, like the doughnut.

The Doughnut

Rules 1-3 cannot help you construct the figure. But the rule of generic views, which dictates Rules 1-3, dictates other rules that can help with the doughnut. For instance, it dictates:

Rule 5: Always interpret a curve that is smooth in an image as smooth in 3D.

Reason for Rule 5

The reason is simple: If you interpret such a curve as not smooth in 3D, then a slight change of view would destroy its smooth appearance.

Notice the two curves in the middle of the figure meet at two points.

You see two T-junctions (upside down T’s), with the stem appearing to lie behind the cap.

Why don’t we interpret the stem and cap the other way, with the stem in front and cap behind? If we did so, a gap would appear and would violate the rule of generic views.

Why not interpret the stem and cap as having the same depth, so that they form a real T? This satisfies the rule of generic views, but it would violate the rule of projection. This rule describes how three dimensions can be smashed into two dimensions. Like the rule of generic views, the rule of projection is what we call a “megarule.” The rule of projection dictates linear perspective, natural perspective, and specific rules to interpret the silhouettes of smooth objects. With these rules, we properly interpret a T-junction.

Surface Normals

If you were to walk around on a smooth surface, like a sphere, you would notice that the surface changes orientation as you walk. Imagine surface normals rising from the surface as you walk. Normals along the line of sight appear as dots, and nearby normals as short lines. This effect is called foreshortening.
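Foreshortening can be made concrete with a short sketch. It assumes orthographic projection along a fixed line of sight, an assumption made here for simplicity; the function name `projected_length` is hypothetical.

```python
import numpy as np

def projected_length(normal, line_of_sight=(0.0, 0.0, 1.0)):
    """Length of a unit surface normal as it appears in the image.

    The component along the line of sight is invisible, so the image length
    equals the sine of the angle between the normal and the line of sight.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    s = np.asarray(line_of_sight, dtype=float)
    s = s / np.linalg.norm(s)
    in_image = n - np.dot(n, s) * s
    return float(np.linalg.norm(in_image))

# Normals on a sphere, tilted away from the line of sight by increasing angles.
for angle_deg in (0, 15, 45, 90):
    a = np.radians(angle_deg)
    n = (np.sin(a), 0.0, np.cos(a))
    print(angle_deg, round(projected_length(n), 3))
```

A normal pointing straight along the line of sight has image length zero (a dot), and normals tilted slightly away from it appear as short lines.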

T-Junction Interpretation

The solid curves depict the doughnut’s rim. This depiction works because your visual system constructs its 3D interpretation with a bias toward interpreting image curves, where possible, as rims of smooth objects.

Rule 6: Where possible, interpret a curve in an image as the rim of a surface in 3D.

T-Junction Interpretation, cont.

The dashed curves on the doughnut indicate points where the line of sight grazes the surface. They just don’t happen to be visible from the current view. What we can now see in this figure is that the T-junction is the point where one part of the full rim, namely the cap of the T, starts to conceal another part of the full rim, namely the stem. The dashed contours show the concealed portion of the full rim. This is true for T-junctions formed by projecting any smooth surface.

Rule 7: Where possible, interpret a T-junction in an image as a point where the full rim conceals itself: the cap conceals the stem.

The Rites of Spring and Silhouettes

Rules 5-7 dictate much about 3D shape construction, but not all. Consider: The Rites of Spring silhouettes have no texture or shading, and have no color but black, yet convey a 3D illusion.

A theory by Jan Koenderink explains how you can construct curved shapes when you view silhouettes such as these.

Principal Directions and Curvatures:

Imagine a cylinder. The slope along its side changes rapidly as one goes from side to side (2), but remains constant as one goes up and down (1). (1) and (2) are principal directions, and they have corresponding principal curvatures. In (1), the principal curvature is 0. In (2), the principal curvature is that of the circle at the top and bottom of the cylinder (for a circle of radius r, the curvature is 1/r). In the cylinder, (1) and (2) are perpendicular to each other. In the end, we discover that principal directions are always perpendicular to each other, no matter the surface. (Derived originally by Leonhard Euler.)
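The cylinder example can be checked numerically. The sketch below estimates principal curvatures and directions from finite differences of a parametrized surface; the parametrization, the radius `r=2.0`, and the helper names are assumptions introduced for illustration, and the sign of the curvature depends on which way the normal is taken to point.

```python
import numpy as np

def cylinder(u, v, r=2.0):
    """Cylinder of radius r around the z axis: u goes around, v goes up and down."""
    return np.array([r * np.cos(u), r * np.sin(u), v])

def principal_curvatures(surface, u, v, h=1e-4):
    """Estimate principal curvatures and 3D principal directions at (u, v)."""
    p = surface(u, v)
    # First and second partial derivatives by central differences.
    Xu = (surface(u + h, v) - surface(u - h, v)) / (2 * h)
    Xv = (surface(u, v + h) - surface(u, v - h)) / (2 * h)
    Xuu = (surface(u + h, v) - 2 * p + surface(u - h, v)) / h**2
    Xvv = (surface(u, v + h) - 2 * p + surface(u, v - h)) / h**2
    Xuv = (surface(u + h, v + h) - surface(u + h, v - h)
           - surface(u - h, v + h) + surface(u - h, v - h)) / (4 * h**2)
    normal = np.cross(Xu, Xv)
    normal /= np.linalg.norm(normal)
    # First and second fundamental forms.
    first_form = np.array([[Xu @ Xu, Xu @ Xv], [Xu @ Xv, Xv @ Xv]])
    second_form = np.array([[Xuu @ normal, Xuv @ normal],
                            [Xuv @ normal, Xvv @ normal]])
    # Eigenvalues of the shape operator are the principal curvatures.
    shape_operator = np.linalg.solve(first_form, second_form)
    curvatures, vecs = np.linalg.eig(shape_operator)
    # Map eigenvectors from (u, v) coordinates to unit 3D tangent directions.
    directions = [vecs[0, i] * Xu + vecs[1, i] * Xv for i in range(2)]
    directions = [d / np.linalg.norm(d) for d in directions]
    return curvatures, directions

k, dirs = principal_curvatures(cylinder, 0.7, 1.3)
print(np.abs(k))            # close to 1/r and 0 (in some order)
print(dirs[0] @ dirs[1])    # close to 0: the directions are perpendicular
```

For the radius used here, the magnitudes come out near 0.5 and 0, matching direction (2) and direction (1) in the description above.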

Possible Curves

According to Koenderink:

Rule 8: Interpret each convex point on a bound as a convex point on a rim.

Rule 9: Interpret each concave point on a bound as a saddle point on a rim.
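Rules 8 and 9 can be sketched as a labeling pass over a closed 2D outline. The example below approximates the bound by a polygon listed counterclockwise and uses the sign of the turn at each vertex; the polygon approximation, the function `classify_bound`, and the sample outline are assumptions made for illustration, not Koenderink's own procedure.

```python
import numpy as np

def classify_bound(points):
    """Label each vertex of a closed, counterclockwise outline.

    A left turn is a convex point of the bound (Rule 8: convex rim point);
    a right turn is a concave point of the bound (Rule 9: saddle rim point).
    """
    p = np.asarray(points, dtype=float)
    prev_edge = p - np.roll(p, 1, axis=0)    # edge arriving at each vertex
    next_edge = np.roll(p, -1, axis=0) - p   # edge leaving each vertex
    # z component of the 2D cross product: positive for a left turn.
    turn = prev_edge[:, 0] * next_edge[:, 1] - prev_edge[:, 1] * next_edge[:, 0]
    return ["convex" if t > 0 else "saddle" if t < 0 else "straight" for t in turn]

# Hypothetical outline with one dent at the vertex (2, 1).
outline = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
labels = classify_bound(outline)
print(labels)
```

Only the dented vertex is labeled as a saddle point; every other vertex of this outline is labeled convex.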

Back to the Doughnut

Koenderink’s rules restrict how one can interpret the doughnut.

The outer bound of the figure is convex at each point. By Rule 8, you interpret it as a convex rim. The inner bound of the figure is concave at each point. By Rule 9, you interpret it as saddle points on a rim. Almost done with the doughnut, but we need one more rule!

Tenth Rule

Rule 10: Construct surfaces in 3D that are as smooth as possible.

“As smooth as possible” is ambiguous – skip justification for now.

So, by Rules 8 and 9, the doughnut is convex on the outside and saddle-shaped on the inside. In between, a surface is constructed which, according to Rule 10, changes smoothly from convex to saddle. The points where it switches are indicated by the dashed curve. At these points, the surface is neither convex nor saddle, but is curved in one principal direction and straight in the other (like a cylinder).

To Sum:

• The outer portion of the doughnut is convex.
• The inner portion of the doughnut is saddle.
• The dashed curve (where convex meets saddle) is cylindrical.
• Nowhere is the surface of the doughnut concave.
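This summary agrees with the geometry of an ideal torus, which can be checked with a small sketch. It assumes the doughnut is a torus of revolution with ring radius R and tube radius r (hypothetical values below), uses the standard principal curvatures of such a torus with the normal pointing outward, and measures the angle v around the tube from the outermost circle.

```python
import math

def torus_principal_curvatures(v, R=3.0, r=1.0):
    """Principal curvatures of a torus at tube angle v (v = 0 is the outer equator)."""
    k_tube = 1.0 / r                                  # around the tube
    k_ring = math.cos(v) / (R + r * math.cos(v))      # around the ring
    return k_tube, k_ring

def classify(v, R=3.0, r=1.0, tol=1e-9):
    """Label the surface type at tube angle v from the signs of the curvatures."""
    k1, k2 = torus_principal_curvatures(v, R, r)
    gauss = k1 * k2
    if abs(gauss) < tol:
        return "cylindrical"   # curved in one principal direction, straight in the other
    if gauss < 0:
        return "saddle"
    return "convex" if k1 > 0 else "concave"

print(classify(0.0))           # outer portion
print(classify(math.pi))       # inner portion
print(classify(math.pi / 2))   # top circle, where convex meets saddle
```

Because the curvature around the tube is always positive under this convention, no point is ever labeled concave.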

This also explains why we see three dimensions in The Rites of Spring.

Our visual intelligence goes further than that: we construct 3D interpretations of images that advanced computing machines, like the Imager for Mars Pathfinder, cannot interpret.
