sims 202, marti hearst metadata, objects, relations: similarities and differences and cognitive...
Post on 20-Dec-2015
TRANSCRIPT
SIMS 202, Marti Hearst
MetaData, Objects, Relations: Similarities and Differences
and
Cognitive Aspects of Categorization
SIMS 202, Lecture 10
Fall, 1997
Prof. Marti Hearst
UCB SIMS 202
Today: Four Related Questions
  Why are we learning about
    metadata
    database design
    object oriented systems?
  How are these related to one another?
  How are these different from one another?
  Why is it hard to define/design these things?
    What cognitive science is
    What cogsci tells us about categorization
Why are we learning about metadata, database design and OO systems?
  Information organization
  These are all ways to handle complexity, by imposing structure and order on messy data
  Each is useful in a different way
How is the Relational Model Related to the Object Oriented Model?
  Let’s start with a re-description of objects.
  Objects are instantiated classes
  Classes have attributes
  Attribute is the TYPE of information (kind of like a data type in a programming language)
  Attributes have VALUES that fit their TYPE
    attribute TYPE: integer, VALUE: 9
    attribute TYPE: suit, VALUE: club, heart, spade, diamond
    attribute TYPE: name, VALUE: Juanita, Dekai, Laura
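A minimal Python sketch of the vocabulary on this slide. The class name `Card` and the attribute names `rank` and `suit` are hypothetical, chosen only to match the integer and suit examples above.

```python
from dataclasses import dataclass
from enum import Enum


class Suit(Enum):
    """An attribute TYPE whose legal VALUES are the four suits."""
    CLUB = "club"
    HEART = "heart"
    SPADE = "spade"
    DIAMOND = "diamond"


@dataclass
class Card:
    """A class: it declares attributes and the TYPE of each one."""
    rank: int    # attribute TYPE: integer
    suit: Suit   # attribute TYPE: suit


# An object is an instantiated class: each attribute now holds a VALUE.
nine_of_clubs = Card(rank=9, suit=Suit.CLUB)
print(nine_of_clubs.rank, nine_of_clubs.suit.value)
```

The class fixes which attributes exist and what type each has; the object supplies one concrete value per attribute.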
Attributes vs. Classes
  How do we make this distinction?
  Say we are clothing manufacturers.
    Fur is a class
    Animal is an attribute
  Say we are naturalists.
    Animal is a class
    Fur is an attribute
Garment Makers vs. Naturalists
  Class Fur
    Animal: fox, rabbit, sable
    Color: red, black, white
    Texture: silky, thick, coarse
    Garment_type: coat, stole, hat
  Class Animal
    Outer_Covering: fur, skin, scales
    Number_of_limbs: 4, 6, 8
    Circulatory_System: cold_blooded, hot_blooded
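The two views can be written side by side. This is only an illustrative sketch: the attribute names follow the slide, while the sample objects `stole` and `fox` are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Fur:
    """Garment maker's view: Fur is the class, animal is an attribute."""
    animal: str        # fox, rabbit, sable
    color: str         # red, black, white
    texture: str       # silky, thick, coarse
    garment_type: str  # coat, stole, hat


@dataclass
class Animal:
    """Naturalist's view: Animal is the class, fur is an attribute value."""
    outer_covering: str      # fur, skin, scales
    number_of_limbs: int     # 4, 6, 8
    circulatory_system: str  # cold_blooded, hot_blooded


stole = Fur(animal="fox", color="red", texture="silky", garment_type="stole")
fox = Animal(outer_covering="fur", number_of_limbs=4,
             circulatory_system="hot_blooded")
print(stole.animal, fox.outer_covering)
```

The same two words, "fur" and "animal" (here "fox"), appear in both models, but each swaps roles between class and attribute value.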
Attributes vs. Classes
  This example showed that one user’s classes are another user’s attributes.
Let’s Revisit Relations
  Table contains rows of data
  The data has attribute types
  Can perform certain operations:
    select (pick out rows)
    project (pick out columns)
    join (match up 2 or more tables’ data)
    add (add a new row)
    delete (delete a row)
    update (change a value within a row)
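The six operations can be tried with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `fur` and `price` tables and every value in them are invented for illustration, and SQL is only one way to express the relational operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fur (animal TEXT, color TEXT, texture TEXT)")
conn.execute("CREATE TABLE price (animal TEXT, dollars INTEGER)")

# add (add new rows)
conn.executemany("INSERT INTO fur VALUES (?, ?, ?)",
                 [("fox", "red", "silky"),
                  ("rabbit", "white", "thick"),
                  ("sable", "black", "silky")])
conn.executemany("INSERT INTO price VALUES (?, ?)",
                 [("fox", 300), ("rabbit", 50)])

# select (pick out rows) combined with project (pick out one column)
silky = conn.execute(
    "SELECT animal FROM fur WHERE texture = 'silky' ORDER BY animal"
).fetchall()

# update (change a value within a row)
conn.execute("UPDATE fur SET color = 'gray' WHERE animal = 'rabbit'")

# delete (delete a row)
conn.execute("DELETE FROM fur WHERE animal = 'sable'")

# join (match up rows of two tables on a shared column)
joined = conn.execute(
    "SELECT fur.animal, fur.color, price.dollars "
    "FROM fur JOIN price ON fur.animal = price.animal "
    "ORDER BY fur.animal"
).fetchall()

print(silky)   # rows that were silky before the delete
print(joined)  # one combined row per animal present in both tables
```

None of these operations belongs to a particular table; the same six work on any table.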
Relations vs. Objects
  ER Diagram:
    Entity = Class
    Attribute = Attribute
    Relation ~ Method
  Relational Table
    Table ~ Class
    Row ~ Instantiated Object of Class
    Column = Attribute TYPE
    Value in (row,column) = Attribute VALUE
    Name ~ Primary Key
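One way to see the table/class correspondence is to turn a few objects into rows. The `Garment` class and its two instances are hypothetical; `fields` and `astuple` are standard helpers from Python's `dataclasses` module.

```python
from dataclasses import dataclass, fields, astuple


@dataclass
class Garment:
    name: str      # plays the role of the primary key
    material: str
    color: str


objects = [Garment("G1", "fur", "red"), Garment("G2", "cotton", "blue")]

# Column = attribute TYPE: one column per declared attribute of the class.
columns = [f.name for f in fields(Garment)]

# Row ~ instantiated object: one tuple of attribute VALUES per object.
rows = [astuple(g) for g in objects]

print(columns)
for row in rows:
    print(row)
```

The mapping is only approximate (hence the "~" on the slide): the rows carry the values but none of the methods or identity of the original objects.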
Relations vs. Objects
  There are no Class-specific Methods in the Relational Model
    There are general-purpose methods on all data:
      update (change), select, delete, add, join, project
    The Relation in the ER diagram indicates how to set up the tables so they can be easily joined
  There is no unique Object Id (Address) in the Relational Model
    Can only access an “instantiated object” by combinations of its “attribute values”
    Normalization can cause the object representation to be spread out across several tables
  No encapsulated data in the Relational Model
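The point about object identity can be illustrated in a few lines. This is a sketch, not a statement about any particular database: the `Garment` class is hypothetical, and a plain list of tuples stands in for a table.

```python
from dataclasses import dataclass


@dataclass
class Garment:
    material: str
    color: str


a = Garment("fur", "red")
b = Garment("fur", "red")

# Object model: equal attribute values, but still two distinct objects.
same_values = (a == b)   # dataclass equality compares attribute values
same_object = (a is b)   # identity check: are these the same object?

# Relational stand-in: a row can only be reached through its values.
table = [("fur", "red"), ("fur", "red"), ("cotton", "blue")]
matches = [row for row in table if row == ("fur", "red")]

print(same_values, same_object, len(matches))
```

Without a key column, the two matching rows cannot be told apart, which is why tables usually include a primary key among the attribute values.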
Garment Maker vs. Garment Maker
  Class Fur
    Animal: fox, rabbit, sable
    Color: red, black, white
    Texture: silky, thick, coarse
    Garment_type: coat, stole, hat
  Class Garment
    Material: fur, cotton, wool
    Color: red, black, brown, white, blue
    Garment_Type: coat, stole, hat
  Problem: match color to material
Nesting Attributes and Classes
  Class Garment
    Material:
      Class Fur
        Animal: fox, rabbit, sable
        Color: red, black, white
        Texture: silky, thick, coarse
      Class Cotton
        Color: red, blue, white, brown, black
        Thread_Count: 100, 200
    Garment_type: stole, coat, hat, t-shirt
  Attributes often must be nested
  Alternative: two subclasses of Garment
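A sketch of the nested design in Python, assuming the material is stored as an object inside the garment. The class and attribute names follow the slide; the sample garments are invented.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Fur:
    animal: str
    color: str
    texture: str


@dataclass
class Cotton:
    color: str
    thread_count: int


@dataclass
class Garment:
    # Nested attribute: the value of `material` is itself an object,
    # so each material carries its own legal colors and extra attributes.
    material: Union[Fur, Cotton]
    garment_type: str


coat = Garment(Fur("sable", "black", "silky"), "coat")
shirt = Garment(Cotton("blue", 200), "t-shirt")
print(coat.material.color, shirt.material.thread_count)
```

Because color now lives inside the material, the "match color to material" problem goes away: a fur garment can only have a color that the Fur class allows.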
Normalization and Nesting
  In the Relational Model, Normalization “flattens out” the Nesting
  Why?
    Normalization makes certain kinds of access more efficient and makes updates less likely to go wrong
  Why isn’t this confusing in the OO model?
    Key: Relational and OO are used in different situations
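A small sketch of what "flattening" means, under the assumption that each nested material is moved to its own table and referenced by an id. The data and the names `material_table` and `garment_table` are invented for illustration.

```python
# Nested, object-style representation: the material is repeated inside
# every garment that uses it.
garments = [
    {"garment_type": "coat",
     "material": {"kind": "fur", "animal": "sable", "color": "black"}},
    {"garment_type": "hat",
     "material": {"kind": "fur", "animal": "sable", "color": "black"}},
]

material_table = {}  # (kind, animal, color) -> material_id
garment_table = []   # (garment_type, material_id)

for g in garments:
    m = g["material"]
    key = (m["kind"], m["animal"], m["color"])
    # Reuse the id if this material was already stored, else assign a new one.
    material_id = material_table.setdefault(key, len(material_table) + 1)
    garment_table.append((g["garment_type"], material_id))

print(material_table)
print(garment_table)
```

After flattening, the sable fur is described once, so a change to it is made in one place; the cost is that a garment is now spread over two tables and must be reassembled with a join.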
Relations vs. Objects
  Objects
    Nomads, doing their own thing, rugged individualists, use one-at-a-time
    Example: program running on a printer
  Relations:
    Packed into apartments, lots and lots of items all being lined up in one place for easy comparison
    Queries: Find all X that have Y and are > Z
    Example: Everyone’s phone bill in the U.S.
Relations vs. Objects
  Can you have a table of objects?
  Can you have an object that has a table?
Metadata vs. Objects
  MetaData like the Dublin Core is simple
    Much like the name, attribute parts of a Class
    No methods
  MetaData like AACRII is messier
    A bunch of rules about how to deal with the exceptions
    Law deals quite a bit with exceptions
    Computer Science tries as hard as possible to abstract away or ignore exceptions
Why are we learning about this old library stuff?
  The computer science tradition is good at abstracting away details.
  The computer science tradition is not good at describing detail and convoluted exceptions.
  The library tradition can teach us something useful about how to describe complex data.
  Think about how these bibliographic examples can be applied to other domains (maybe a test question!!!)
Metadata vs. Relational Model
  Relational model makes use of Metadata
    The description of the database is often called a Schema
    The Schema is a kind of Metadata description
  Main differences:
    Exceptions not handled well in the relational model either
    Relational model focus is on the system design
    Metadata focus is on the description of the data, independent of a computer system
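To make "the Schema is a kind of Metadata" concrete, here is a sketch that asks SQLite to describe one of its own tables. The `garment` table is hypothetical; `PRAGMA table_info` is a SQLite-specific command, and other database systems expose their schema in other ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE garment (name TEXT PRIMARY KEY, material TEXT, color TEXT)"
)

# Each result row describes one column:
# (cid, name, type, notnull, dflt_value, pk)
schema = conn.execute("PRAGMA table_info(garment)").fetchall()

# Keep just the column name and declared type: data about the data.
columns = [(row[1], row[2]) for row in schema]
print(columns)
```

Note that this description exists before any garment has been stored, and that it is tied to one particular system, which is the contrast the slide draws with library-style metadata.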
Fresh Topic: Why is this Stuff Hard?
  These are all variations on Categorization
  Categorization is an important topic in:
    Philosophy
    Language/Linguistics
    Psychology
  How does the human mind do categorization?
What’s In a Sentence?
  “A sentence is not a verbal snapshot or movie of an event. In framing an utterance, you have to abstract away from everything you know, or can picture, about a situation, and present a schematic version which conveys the essentials. In terms of grammatical marking, there is not enough time in the speech situation for any language to allow for the marking of everything which could possibly be significant to the message.”
    Dan Slobin, in Language Acquisition: The state of the art, 1982
Approximating Meaning
  Defining attributes
    A weak approximation to meanings and concepts
  Defining methods
    A weak approximation to how these meanings interact and change
  Necessary and Sufficient Conditions
    Example: A prime number is an integer divisible only by itself and 1.
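A category defined by necessary and sufficient conditions can be written as a yes/no test. The sketch below uses trial division and assumes the conventional extra requirement that a prime be greater than 1, which the slide's wording does not state; the function name `is_prime` is our own.

```python
def is_prime(n: int) -> bool:
    """Return True exactly when n meets the defining conditions."""
    if n < 2:
        # Assumption: the usual convention that primes are greater than 1.
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False  # divisible by something other than 1 and itself
        d += 1
    return True


print([n for n in range(1, 18) if is_prime(n)])
```

The test returns only True or False, with no notion of one member being a better example than another. That is what a classical category looks like.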
Properties of Categorization
  Family Resemblance:
    Members of a category may be related to one another without all members having any properties in common that define the category.
  Centrality:
    Some members of a category may be “better examples” of that category than others.
Centrality
  A category: Prime Numbers
    Definition: An integer divisible only by itself and 1
    Examples: 1, 2, 3, 5, 7, 11, 13, 17, …
  A very clear-cut category. Or is it?
  Can one number be “more prime” than another?
  CENTRALITY: some members of a category may be “better examples” than others
Definition of Game
  Famous example by Wittgenstein
  Classic categories: clear boundaries defined by common properties
  Counterexample: Game
  No common properties shared by all games
    card games, ball games, Olympic games, children’s games
    competition: not in ring-around-the-rosie
    skill: not in dice games
    luck: not in chess
  No fixed boundary; can be extended to new games
    video games
  Alternative: Concepts related by Family Resemblances
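Family resemblance can be checked mechanically on a toy data set. The games and feature sets below are invented for illustration (only some come from the slide); the sketch shows a category in which no feature is shared by every member, yet every member overlaps with some other member.

```python
from itertools import combinations

# Hypothetical feature sets, not an authoritative analysis of these games.
games = {
    "chess": {"competition", "skill", "board"},
    "dice": {"competition", "luck"},
    "ring-around-the-rosie": {"children", "amusement"},
    "solitaire": {"skill", "luck", "amusement", "cards"},
}

# Classical test: is there any feature that ALL members share?
common = set.intersection(*games.values())

# Family resemblance: which pairs of members share at least one feature?
overlapping_pairs = [(a, b) for a, b in combinations(games, 2)
                     if games[a] & games[b]]

print(common)
print(overlapping_pairs)
```

With this data the common set is empty, so no single defining property exists, while chess resembles dice, dice resembles solitaire, and solitaire resembles ring-around-the-rosie.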
Characteristic Features
  Perceived degree of category membership has to do with which features define the category.
  Members usually do not have ALL the necessary features, but have some subset.
  Those members that have more of the central features are seen as more central members.
  People have conceptions of typical members.
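One crude way to model graded membership is to count how many characteristic features a member has. Everything here is an assumption made for illustration: the feature list, the feature sets for each bird, and the idea that every feature counts equally.

```python
# Hypothetical characteristic features of the category "bird".
bird_features = {"flies", "sings", "has_feathers", "lays_eggs", "small"}

# Hypothetical feature sets for three members of the category.
members = {
    "robin": {"flies", "sings", "has_feathers", "lays_eggs", "small"},
    "duck": {"flies", "has_feathers", "lays_eggs"},
    "chicken": {"has_feathers", "lays_eggs"},
}


def typicality(features: set) -> float:
    """Fraction of the characteristic features that a member possesses."""
    return len(features & bird_features) / len(bird_features)


# Members ordered from most central to least central.
ranking = sorted(members, key=lambda name: typicality(members[name]),
                 reverse=True)
print(ranking)
```

All three are birds, yet the score orders them from most to least typical. This is the sense in which a member with more of the central features is a more central member.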
Properties of Categorization
  Basic-level Categories:
    Categories are organized into a hierarchy from the most general to the most specific, but the level that is most cognitively basic is “in the middle” of the hierarchy
  Basic-level Primacy:
    Basic-level categories are functionally primary with respect to factors including ease of cognitive processing (learning, reasoning, recognition, etc).
Levels of Abstraction
  Brown 1958, 65, Berlin et al., 1972, 73
  Folk biology:
    unique beginner: plant, animal
    life form: tree, bush, flower
    generic name: pine, oak, maple, elm
    specific name: Ponderosa pine, white pine
    varietal name: western Ponderosa pine
  No overlap between levels
  Level 3 (generic name) is basic
  Level 3 corresponds to genus
Characteristics of Basic-level Categories
  Language
    People name things more readily at basic level
    Name learned earliest in childhood
    Languages have simpler names at basic level
    Sounds like the “real name”
    Name used more frequently
      Strange to call a dime a coin, a metal object
    Names used in neutral context
      There’s a dog on the porch.
      There’s a terrier on the porch.
Characteristics of Basic-level Categories
  Concepts
    Things perceived more holistically at basic level (rather than by parts)
    No difference in how people interact with the concept between basic and more specific levels
    Things are remembered more readily at basic level
    Folk biology categories correspond accurately to scientific biological categories only at the basic level
Superordinate and Subordinate Levels
  SUPERORDINATE   animal    furniture
  BASIC LEVEL     dog       chair
  SUBORDINATE     terrier   rocker
  Children take longer to learn superordinate
  Superordinate not associated with mental images or motor actions
Typicality and Characteristic Features
  Some categories have clear boundaries, but have graded membership
  What is a good example of a bird?
  Examples from language:
    A robin is a bird.
    A chicken is a bird.
    A bat is a bird.
  Takes longer for people to say the second is true and the third is false
  Features characterize the category
  How many typical features does the object possess?
Characteristic Features
  Is a cat on a mat a cat?
  Is a dead cat a cat?
  Is a photo of a cat a cat?
  Is a cat with three legs a cat?
  Is a cat that barks a cat?
  Is a cat with a dog’s brain a cat?
  Is a cat with every cell replaced by a dog’s cells a cat?
Polysemy
  Most words have more than one sense
    that dog has floppy ears
    good ear for jazz
    three ears of corn
  Homonymy: same word, different meaning
  Polysemy: different senses of same word
Category Structure and Polysemy
  Category membership is determined by shared subsets of features
  Different senses of a word reflect differences in which attributes are shared
  This is reflected in language by polysemy
    related meaning, but slightly different
    Example: bank
      the building, the institution, the notion of where money is stored
Metonymy
  Use one aspect of something to stand for the whole
    The building stands for the institution of the bank.
    Newscast: “The White House released new figures today.”
    Waitperson: “The ham sandwich spilled his drink.”
Synonymy
  Different ways of expressing related concepts
  Examples
    cat, feline, Siamese cat
    Overlaps with basic, subordinate level
  Synonyms are almost never “true”
    used in different contexts
    have different implications
    This is a point of contention.
Thesauri
  Polysemy: same word, different senses of meaning
    slightly different concepts expressed similarly
  Synonyms: different words, related senses of meaning
    different ways to express similar concepts
  Thesauri help draw all these together
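A toy thesaurus can be sketched as a mapping from concepts to the words that express them. The concept labels and entries are made up for illustration and follow the "ear" and "cat" examples from the earlier slides; real thesauri are far richer.

```python
from collections import defaultdict

# Hypothetical thesaurus: concept -> set of terms that express it.
concept_terms = {
    "ear (body part)": {"ear"},
    "ear (musical sensitivity)": {"ear"},
    "ear (of corn)": {"ear"},
    "cat": {"cat", "feline"},
}

# Invert the mapping: term -> set of concepts it can express.
term_concepts = defaultdict(set)
for concept, terms in concept_terms.items():
    for term in terms:
        term_concepts[term].add(concept)

# Polysemy: one word, several senses.
polysemous = sorted(t for t, c in term_concepts.items() if len(c) > 1)

# Synonymy: one concept, several words.
synonym_sets = [sorted(terms) for terms in concept_terms.values()
                if len(terms) > 1]

print(polysemous)
print(synonym_sets)
```

Looking at the mapping in both directions is what lets a thesaurus draw the two phenomena together: from a word to its senses, and from a sense to its words.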
Summary
  Processes of categorization underlie many of the issues having to do with information organization
  Categorization is messier than our computer systems would like
  Human categories have graded membership, consisting of family resemblances.
  Family resemblance is expressed in part by which subset of features are shared
  It is also determined by underlying understandings of the world that do not get represented in most systems