13 Other Types of Databases Oracle, M/MUMPS, X.500/LDAP and Search Engines, + Lessons from GM “Architects Day” MIS 304 Winter 2006

Upload: hayley-codrington

Post on 01-Apr-2015

Goals for this class

• Identify tools to help evaluate database products.
• Understand the role of other data management architectures.
• Understand the features of the MUMPS data structure.
• Understand the structure of the X.500/LDAP directory standards.
• Understand the Linear Associative Model.

Database Evaluation

• Requirements, Requirements, Requirements
• Do the evaluation!
• Make it as realistic as possible
• Use outside tools

Transaction Processing Performance Council (TPC)

• www.tpc.org

M/MUMPS

There are 2 basic ways to organize data

• The tree
• The table

“M” a.k.a. MUMPS

• Massachusetts General Hospital Utility Multi-Programming System

• The ANSI standard version is now called simply “M”.

• “Multidimensional” database.
  – http://www.cache.com

The MUMPS Data Structure

• In traditional programming languages: SOMETHING(X,Y) or SOMETHING(1,2,3,4…) or SALES(1,1,1,1) = 42

• In MUMPS: Sales(region,salesman,product,time)
  Example: TotalSales = Sales(east,Fred,clocks,Q1) + Sales(east,Ed,clocks,Q1)

Note that the indexes on the “array” are word valued, not number valued.
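
As a rough illustration of word-valued subscripts, a Python dict keyed by tuples of strings behaves much like the Sales array on this slide. This is only a sketch: the helper name set_sales and the sales figures are made up, and a real M system stores such arrays persistently rather than in memory.

```python
# A dict keyed by tuples of strings stands in for a MUMPS-style sparse array.
# Only the cells that are actually set take up space.
sales = {}


def set_sales(region, salesman, product, period, amount):
    """Store one cell of the four-dimensional Sales array (hypothetical helper)."""
    sales[(region, salesman, product, period)] = amount


# Made-up figures for illustration only.
set_sales("east", "Fred", "clocks", "Q1", 30)
set_sales("east", "Ed", "clocks", "Q1", 12)

total_sales = sales[("east", "Fred", "clocks", "Q1")] + sales[("east", "Ed", "clocks", "Q1")]
print(total_sales)  # 42
```

Because the subscripts are ordinary strings, no lookup table from names to array positions is needed.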

So What Does the Query Look Like?

FOR region = EAST to WEST
  FOR salesman = Adams to Smith
    FOR product = 1111 to 9999
      FOR time = Jan to Dec
        TOTALSALES = TOTALSALES + SALES(region,salesman,product,time)
      NEXT time
    NEXT product
  NEXT salesman
NEXT region
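
The same total can be sketched in Python. Instead of looping over every possible subscript, this version walks only the cells that exist and filters them by range, which is one plausible reading of how a sparse array would be queried. The data, the function name total_sales, and the choice to filter only on salesman and product are assumptions for illustration.

```python
# Made-up cells of a sparse Sales array: (region, salesman, product, month) -> amount.
sales = {
    ("EAST", "Adams", 1111, "Jan"): 100,
    ("EAST", "Smith", 2222, "Feb"): 50,
    ("WEST", "Jones", 9999, "Dec"): 25,
    ("WEST", "Young", 1111, "Jan"): 999,  # "Young" sorts after "Smith", so it is skipped
}


def total_sales(cells, salesman_lo, salesman_hi, product_lo, product_hi):
    """Sum the stored cells whose salesman and product fall inside the given ranges."""
    total = 0
    for (region, salesman, product_id, month), amount in cells.items():
        in_salesman_range = salesman_lo <= salesman <= salesman_hi
        in_product_range = product_lo <= product_id <= product_hi
        if in_salesman_range and in_product_range:
            total += amount
    return total


result = total_sales(sales, "Adams", "Smith", 1111, 9999)
print(result)  # 175
```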

Directory Services: A Special Database Case

MIS 304 Fall 2005

Class Goal

• Understand the application of Naming to network management.

• Understand the idea of a classification hierarchy.

• Understand Lightweight Directory Access Protocol (LDAP) and its application.

The Case for Directories

• The “Net” has become increasingly complex.

• More need than ever to work across organizational boundaries.

• Wouldn’t it be great if everything had a unique and understandable name?

What’s in a Name

• People
• Buildings
• Computers
• Printers
• Locations
• Objects (computer)
• Roads
• Vehicles
• Rooms
• Stock locations
• Truck wells
• Servers

The Goal

If you can name it and locate it you can manage it.

What’s in a Name

• A name draws a distinction between two things (G. Spencer-Brown, Laws of Form, Dutton, 1979).

• To take advantage of human processing capabilities, names should be “friendly”.

Taxonomy

• The study of the general principles of scientific classification.

• A way to organize anything into hierarchical categories based on characteristics.

• Used widely in Biological Sciences.

Taxonomy Example

[Figure: classification tree with a Root node, two Kingdom nodes (Animal, Plant), and two Phyla nodes (both labeled Vertebrate)]

Taxonomy Example in Biology

• Kingdom
• Phylum (in animals) or Division (in plants)
• Class
• Order
• Family
• Genus
• Species

Taxonomy Example in IS

[Figure: directory tree. Root; countries C = US, C = DE, C = CA; organizations O = EDS, O = Ford, O = DaimlerChrysler, O = BCE Emergis; organizational units OU = People, OU = Automotive; surnames S = Morin (twice); given names G = Janice, G = Gary]

X.500

• Originally part of the Open Systems Interconnection (OSI) network suite.

• Defines the directory structure on an OSI network.

• Modified to run over TCP/IP networks (Internet).

Tags

• C = Country
• O = Organization
• OU = Organizational Unit
• L = Locality
• G = Given Name
• S = Surname

Person Identifier

• C = CA
• O = BCE Emergis
• OU = Automotive
• S = Morin
• G = Gary
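
The tagged components above can be joined into a single name string. The sketch below follows the LDAP convention of writing the most specific component first, separated by commas. The function name build_dn is made up, the tags G and S are taken from these slides, and escaping of special characters in values is deliberately left out.

```python
def build_dn(path):
    """Join (tag, value) pairs, given root-first, into a leaf-first name string.

    Hypothetical helper: values containing commas or other special
    characters would need escaping, which this sketch does not do.
    """
    return ",".join(f"{tag}={value}" for tag, value in reversed(path))


# The person identifier from the slide, listed from the root downward.
person = [
    ("C", "CA"),
    ("O", "BCE Emergis"),
    ("OU", "Automotive"),
    ("S", "Morin"),
    ("G", "Gary"),
]

dn = build_dn(person)
print(dn)  # G=Gary,S=Morin,OU=Automotive,O=BCE Emergis,C=CA
```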

Person Identifier

• Because of object “inheritance” each level inherits the attributes of the preceding level.

Database Structure

Root
  O = Ford
    Assembly
    Visteon
  O = DaimlerChrysler
  O = BCE Emergis
    Automotive

• Can be either hierarchical or relational.
• If it’s relational, what’s the key?

O                 OU          S
BCE Emergis       Automotive  Morin
Daimler Chrysler  Chrysler    Smith
EDS               US          Cutler
Ford              Assembly    Jones
Ford              Visteon     Morin
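
One way to explore the key question is to test which column combinations are unique in the sample rows. This is a minimal sketch using only the five rows on the slide; the function name is_unique is made up, and uniqueness in a small sample says nothing certain about a full directory.

```python
from itertools import combinations

COLUMNS = ("O", "OU", "S")
ROWS = [
    ("BCE Emergis", "Automotive", "Morin"),
    ("Daimler Chrysler", "Chrysler", "Smith"),
    ("EDS", "US", "Cutler"),
    ("Ford", "Assembly", "Jones"),
    ("Ford", "Visteon", "Morin"),
]


def is_unique(column_names):
    """Return True if no two sample rows share the same values in these columns."""
    indexes = [COLUMNS.index(name) for name in column_names]
    projected = [tuple(row[i] for i in indexes) for row in ROWS]
    return len(set(projected)) == len(projected)


# Try every combination of one, two, and three columns.
for size in range(1, len(COLUMNS) + 1):
    for combo in combinations(COLUMNS, size):
        print(combo, is_unique(combo))
```

In this sample neither O nor S is unique on its own (Ford and Morin each appear twice). OU happens to be unique here, but two organizations could easily both have an "Automotive" unit, so a composite key is the safer choice.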

Distinguished Name

• A globally unique string of characters.
• Almost everything has problems.
  – Mohamed Chang?
  – SSN?
  – An E-Mail address?
• You almost always have a “messy” key.

Lookup in SQL

• SELECT * FROM DIRECTORY WHERE c = 'US' AND o = 'Ford' AND s = 'Morin'

• Where is DIRECTORY?
• SQL may not be the ideal answer.
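
For comparison, here is what the lookup looks like when the directory really is a single local table. The sketch uses Python's built-in sqlite3 module with an in-memory database; the table layout and the two sample rows are assumptions for illustration, not data from a real directory.

```python
import sqlite3

# An in-memory database stands in for "wherever DIRECTORY lives".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (c TEXT, o TEXT, ou TEXT, s TEXT, g TEXT)")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO directory VALUES (?, ?, ?, ?, ?)",
    [
        ("US", "Ford", "People", "Morin", "Janice"),
        ("CA", "BCE Emergis", "Automotive", "Morin", "Gary"),
    ],
)

# Parameter placeholders avoid quoting mistakes in the WHERE clause.
rows = conn.execute(
    "SELECT * FROM directory WHERE c = ? AND o = ? AND s = ?",
    ("US", "Ford", "Morin"),
).fetchall()
print(rows)
conn.close()
```

The query itself is easy. The hard part is the question on the slide: a cross-company directory is spread over many servers, and plain SQL has no built-in notion of where the table lives.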

LDAP

• X.500 was getting really messy.
• Most organizations did not need all of the features.
• Some U of M students wrote the Lightweight Directory Access Protocol.
• Defines how to connect to and query an X.500-style database with much less overhead.
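
An LDAP query is expressed as a search filter string in prefix notation, where "&" means AND. The sketch below only builds such a string; it does not talk to a server. The function names are made up, and the attribute names c, o, and s follow the tags used in these slides (real schemas often name the surname attribute sn).

```python
# Characters that must be escaped inside an LDAP filter value.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_value(value):
    """Escape the special characters of an LDAP search filter value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def and_filter(**conditions):
    """Build an AND filter such as (&(c=US)(o=Ford)) from keyword arguments."""
    parts = "".join(
        f"({attr}={escape_value(val)})" for attr, val in conditions.items()
    )
    return f"(&{parts})"


query = and_filter(c="US", o="Ford", s="Morin")
print(query)  # (&(c=US)(o=Ford)(s=Morin))
```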

LDAP Examples

• Microsoft Exchange/Outlook
• Lotus Notes
• Novell NDS
• Netscape browser
• OpenLDAP http://www.openldap.org
• WAX500, MAX500, XAX500

Logical Extensions

• Once you can name it, locate it, and have a way of querying it, just extend the idea to any object.

Communities of Interest

• ITU-T Recommendation X.521 (adopted for LDAP by the Internet Engineering Task Force) describes a “person” object.

• AIAG has a guideline to describe Companies and Locations.

Example 1

• cn = ITM Centerline
• ou = locations
• o = arius.com
• street = 25999 Lawrence Ave
• l = Centerline
• st = Michigan
• c = us
• postalCode = 48015-0303
• buildingNumberOfFloors = 1

Example 2

• cn = Detroit Medical Center Helipad
• ou = locations
• landingStripType = concrete
• landingStripElevation = 630 ft
• landingStripAirportID = 5MI0
• l = Detroit
• st = Michigan
• street = 420 St. Antoine
• c = us

Naming Objects

• Computer objects are somewhat different from physical things.

• Human readability is less of an issue; lookup speed matters more.

OSI ASN.1

• A notation for describing data structures.

• Uses an Object Identifier (OID) and a short text description to identify levels of the tree.

• If a labeled node is a leaf in a tree then it is an object and contains a value.

Example 1.3.6.1.2.2

Root
  CCITT (0)
  ISO (1)
    ORG (3)
      DOD (6)
        Internet (1)
          Directory (1)
          Management (2)
            MIB-I (1)
            MIB-II (2)
          Experimental (3)
          Private (4)
  Joint-ISO-CCITT (2)
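
Reading a dotted identifier is a walk down the tree, one number per level. The sketch below resolves an identifier against a small nested dict. Only a handful of well-known arcs are registered (iso 1, org 3, dod 6, internet 1, mgmt 2); anything else is left as a number. The names OID_TREE and resolve are made up for illustration.

```python
# A tiny, hand-built fragment of the identifier tree. Each node has a short
# text label and a dict of numbered children.
OID_TREE = {
    "name": "root",
    "children": {
        0: {"name": "ccitt", "children": {}},
        1: {"name": "iso", "children": {
            3: {"name": "org", "children": {
                6: {"name": "dod", "children": {
                    1: {"name": "internet", "children": {
                        2: {"name": "mgmt", "children": {}},
                    }},
                }},
            }},
        }},
        2: {"name": "joint-iso-ccitt", "children": {}},
    },
}


def resolve(oid):
    """Translate a dotted identifier into labels, keeping unknown arcs as numbers."""
    node = OID_TREE
    labels = []
    for arc in (int(part) for part in oid.split(".")):
        # Once we fall off the known tree, node stays None for the rest of the walk.
        node = node["children"].get(arc) if node else None
        labels.append(node["name"] if node else str(arc))
    return ".".join(labels)


print(resolve("1.3.6.1.2.2"))  # iso.org.dod.internet.mgmt.2
```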

So What?

• You can build a cross company directory.
  – Names are agreed on by a common standards body (AIAG)
  – Common Query Language (LDAP)

• Each organization keeps its own information current.

• Extensions are easy to add.

Search Engines and the Associative Retrieval Model: A New Kind of Database?

MIS 304 Fall 2004

Goals for this class

• Understand what a linear associative retrieval model is.

There are 2 basic ways to organize data?

• The tree
• The table

And…
• A Matrix of Associations?

The Problem to be Solved

• The Internet has a large number of documents linked together with the documents spread out physically across many web servers.

• How do you find anything?

One solution

• Build a data structure that indexes the pages.
• The structure is populated by searching individual pages with a “bot”, a program that surfs the web returning the text of the many pages there.

• The pages returned by the bot are processed into a special kind of database.

A Simple Document Index Structure

• Create a matrix with the index terms on one axis and the documents containing them on the other.
  – Leave out words like a, the, and, it…
  – Assign a number to each term and document.
  – Call this matrix C

         Doc 1  Doc 2  Doc 3  Doc 4  Doc 5
Term 1     1      1      1      0      1
Term 2     0      1      1      0      0
Term 3     1      0      0      1      1

Coordinate Retrieval

• Now suppose we want to take all of the documents we have retrieved from the web and query our C matrix for where a term occurs in a document.

• We can do this by creating a 1×t matrix Q of the terms (t) we want to search for. If we then normalize C so that each row sums to 1, we get a 1×d matrix R with a score for every document (d):

R = QC
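
A worked version of R = QC, using the matrix C from the index slide and plain Python lists. The query Q = [1, 0, 1] (search for terms 1 and 3) and the helper names are assumptions for illustration. The row normalization divides each row by its sum and would fail on a term that appears in no document.

```python
# Term-by-document matrix C from the index slide (rows are terms, columns are documents).
C = [
    [1, 1, 1, 0, 1],  # Term 1
    [0, 1, 1, 0, 0],  # Term 2
    [1, 0, 0, 1, 1],  # Term 3
]


def normalize_rows(matrix):
    """Scale each row so that it sums to 1."""
    return [[cell / sum(row) for cell in row] for row in matrix]


def vec_times_matrix(vec, matrix):
    """Multiply a 1 x t row vector by a t x d matrix, giving a 1 x d row vector."""
    columns = range(len(matrix[0]))
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in columns]


Q = [1, 0, 1]  # assumed query: Term 1 and Term 3
R = vec_times_matrix(Q, normalize_rows(C))
print([round(score, 3) for score in R])  # [0.583, 0.25, 0.25, 0.333, 0.583]
```

Documents 1 and 5 contain both query terms, so they score highest.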

Discussion

• This is good as far as it goes, but…
• This does nothing to help us where there are more complex relationships between the terms.
• Synonyms are a good example.
• Suppose you are writing a document and don’t want to use the same word to describe something over and over again, so you use a synonym.
• The probability that both words occur in the same document is greatly increased.

Inter-term Relationships

• Suppose we want to include these inter-term relationships in our search.

• We need a Thesaurus.

Transform

• Now look through the table and create a matrix of the number of times terms occur together in a document.

        Term1  Term2  Term3
Term1     4      2      2
Term2     2      2      0
Term3     2      0      3
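
With a 0/1 index matrix, counting how often two terms share a document is the same as multiplying C by its own transpose. The sketch below does that with plain lists, using C from the index slide; the function name cooccurrence is made up.

```python
# Term-by-document matrix C from the index slide.
C = [
    [1, 1, 1, 0, 1],  # Term 1
    [0, 1, 1, 0, 0],  # Term 2
    [1, 0, 0, 1, 1],  # Term 3
]


def cooccurrence(matrix):
    """Return the term-by-term matrix of shared-document counts (C times C transposed)."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in matrix]
        for row_i in matrix
    ]


counts = cooccurrence(C)
print(counts)  # [[4, 2, 2], [2, 2, 0], [2, 0, 3]]
```

The result matches the table above. The diagonal holds the number of documents each term appears in.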

Normalization Matrix

• Normalize the transform table so that each cell is the “cost” of two terms occurring together. Call that matrix L.

        Term1  Term2  Term3
Term1   .125   .125   .33
Term2   .125   .50    0
Term3   .33    0      .33

Query Vector

• Now create a vector of the terms you want to search for.

Term 1  Term 2  Term 3
  1       0       1

Now the Math

Multiply the query vector Q by the normalized transform table L and the index term table C, and you get a vector R that contains a ranking of the documents from 0 to 1.

R = QLC
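
The product can be checked step by step with plain lists. This sketch uses L, C, and Q exactly as printed on the earlier slides and multiplies left to right: first QL, then (QL)C. C is left unnormalized here, which is an assumption, so the scores are raw and not scaled to the 0 to 1 range.

```python
# Normalized transform table L, as printed on the slide.
L = [
    [0.125, 0.125, 0.33],
    [0.125, 0.50, 0.0],
    [0.33, 0.0, 0.33],
]

# Term-by-document matrix C from the index slide (unnormalized in this sketch).
C = [
    [1, 1, 1, 0, 1],  # Term 1
    [0, 1, 1, 0, 0],  # Term 2
    [1, 0, 0, 1, 1],  # Term 3
]

Q = [1, 0, 1]  # query vector: Term 1 and Term 3


def vec_times_matrix(vec, matrix):
    """Multiply a row vector by a matrix, giving a row vector."""
    columns = range(len(matrix[0]))
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in columns]


QL = vec_times_matrix(Q, L)   # query expanded through the term associations
R = vec_times_matrix(QL, C)   # one raw score per document
print([round(score, 3) for score in R])  # [1.115, 0.58, 0.58, 0.66, 1.115]
```

Note that QL gives Term 2 a nonzero weight (0.125) even though the query did not ask for it. That is the thesaurus effect: documents 2 and 3 pick up extra score through the association between Term 1 and Term 2.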

Results

• The result vector.

• The documents with the highest values are the most likely to be relevant to our search.

       Doc 1  Doc 2  Doc 3  Doc 4  Doc 5
Rank     0     .5     .3     .6     .1

Discussion

• The matrix created by multiplying L and C becomes a new kind of structure: a matrix of “associations” between documents and terms, and among the terms themselves.

• This may be the only other way of organizing data besides the table and the tree.

• You can extend this by creating a new structure, a normalized document-by-document (d×d) matrix, that takes into account associations between documents (e.g. chapters or authors).

• This falls into the category called “Connectionist” models, which includes Neural Networks.

A Model of Consciousness

• Some have even gone so far as to say this may be one of the structures in a conscious brain. (Kanerva, 1988)

• Do some thought experiments on your own “associative” brain by trying some stream of consciousness exercises.

Linear Associative Retrieval Model

• Giuliano and Jones, Linear Associative Retrieval, Vistas in Information Handling, Spartan Press, 1962.

• Hough, The Control of Complex Systems, Progress in Cybernetics and Systems Research, Halsted Press, 1975.

• Kanerva, Sparse Distributed Memory, MIT Press, 1988.

The Future

• More of the same
  – There is a lot of pent up inertia
  – SQL is a pretty good programming language
• More XML
  – There is no stopping this train.
• More AI/Connectionist/Associative tools
• Bigger and bigger databases