9 database and the internet mis 304 winter 2006. 9 2 review of last class object oriented technology...
TRANSCRIPT
9
Database and the Internet
MIS 304 Winter 2006
2
9
Review of Last Class
• Object Oriented technology provides an alternative to the “Relational” database model.
• OO combines the data and the operations in one package.
• UML provides an alternative modeling language for OO and Relational databases.
3
9
Class Goals
• Understand the role of databases in the Internet world.
• Understand the technology that support the use of databases over the Internet.– Internet protocols– Database technology– Design issues
• Understand the fundamentals of XML• Understand why XML is such a rapidly
growing technology.
4
9
Characteristic and Benefits of Internet Technologies
• Hardware and Software Independence.• Common and simple network protocols
and user interface.• Location independence.• Rapid deployment and manageable costs.
5
9
Independence
• Common network protocols let disparate equipment and programs communicate as though they were the same.
• Separating the network into layers lets the appropriate equipment or programs handle the each job.
6
9TCP/IP
Foundation of the Internet
• Every device (node) has an address.– Dotted syntax 255.255.255.255– Assigned name cottageland.net
• The lowest level of interaction is the “Socket”.– Sockets have a “port” number– Ports are assigned to “protocols”
• Higher level protocols use sockets to communicate via formatted messages.
• Most of the messages are simple English.– SMTP (E-mail) “HELLO”
7
9
Common Protocols
• SMTP - Simple Mail Transport Protocol• FTP - File Transfer Protocol• Telnet – Terminal Emulation• LDAP - Lightweight Directory Access
Protocol• HTTP - Hypertext Transport Protocol
– The World Wide Web– SOAP
8
9
Rapid Deployment
• Location Independence– One world wide network means that in
theory you can put your applications anywhere and run it from anywhere.
• Standards mean:– Software tool cost remain low.– Programmers and managers are easier to
find.– Tools do a lot more work for the
programmer and manager.– No network specific costs.– Hardware and software can be optimized
9
9A Sample Of Applications That Use Internet Technology
10
9Intranets and Extranets
• Intranets– An Intranet is a locally owned and operated Internet
whose access is carefully controlled. Its objective is to enhance company operations through improved data access management.
– Intranets provide a platform for the development of new systems in a timely and cost-effective manner.
– Intranets are relatively easy to set up and to implement at the technical level. Once implemented, intranet services tend to grow exponentially.
• Extranets– If an intranet extends beyond a single corporate entity,
it is known as an extranet.– An extranet extends the intranets to the corporation’s
value chain.
11
9Intranet/Extranet Components
12
9
Intranet Architecture
• Web server
• Electronic mail
• Document search
• File Transfer Protocol (FTP)
• News or discussion groups
• Workflow and team collaboration
• Web-to-database services
• Transaction processing
• Directory, security, and authentication services
• Firewalls and proxy servers
• Load balancing and caching
• Web-to-host access
Common Intranet Services
13
9Basic Components Of The World Wide Web
14
9
Web Building Blocks
• HTTP: The communications protocol that Web uses.
• HTML: The mechanism for coding the display pages.
• URL: An address for every web page.• Hyperlink: A way to link an addresses to a
Web page.• Web Server: A system to format and
present Web pages on request from a…• Web Browser: A program for converting
HTML to a graphical display on a local device.
15
9
Web Building Blocks
• Web pages can be:– Static, coded once and stored on the
server.– Dynamic, created “on the fly” each time a
client asks for one.
In either case the place to store content is in a database!
16
9
Common Gateway Interface
• Web pages are bi-directional– HTTP Get fetches a page at a specified
address.– HTTP Put sends a formatted page back to
the server.
• CGI the Common Gateway Interface:– A standard way of sending formatted
requests to the server for action.– Runs programs on the server end to do
work.
17
9
Intranet Architecture
• Server-Side Extensions: Web-to-Database Middleware
– A server-side extension is a program that interacts directly with the Web server to handle specific types of requests.
– A database server-side extension program is also known as Web-to-database middleware.
18
9Web-To-Database Middleware (ColdFusion)
19
9
Two Web Server Interfaces
• Common Gateway Interface (CGI)
– The CGI uses script files that perform specific functions based on the client’s parameters that are passed to the Web servers.
– The script file is a small program containing commands written in some programming language (e.g., PERL, C++, or Visual Basic)
• Application programming interfaces (APIs)
– APIs are a newer Web server interface standard that is much more efficient and faster than CGI scripts.
– APIs are implemented as shared code or as dynamic-link libraries (DLLs) in the Windoz environment.
20
9The API And CGI Web Server Interfaces
21
9Intranet Architecture
• Open Database Connectivity (ODBC)– ODBC is Microsoft’s implementation of a superset of the
SQL Access Group Call-Level-Interface (CLI) standard for database access.
22
9
23
9
Intranet Architecture
• The Web Browser
– The Web browser is located in the client computer and it is the end user interface to the Web.
– The Web browser’s job is to interpret the HTML or XML code that it receives from the Web server and to present the different page components in a standard way.
– The browser’s interpretation and presentation capabilities are not sufficient to develop Web-based applications, requiring plug-ins and other client-side extensions.
24
9
HTML
<HTML> <HEAD> <TITLE> The Title </TITLE></HEAD> <BODY> Lots of HTML </BODY></HTML>
25
9Intranet Architecture
• Client-Side Extensions
– Plug-ins
• A plug-in is an external application that is automatically invoked by the browser when needed.
• The plug-in is OS specific.
• The plug-in is associated with a data object to allow the Web server to properly handle data that are not originally supported.
26
9
Java & JavaScript
• Java
– Java is an object-oriented programming language developed by Sun Microsystems that run on top of the Web browser software.
– Java applications are compiled and stored in the Web server.
– Calls to Java routines are embedded inside the HTML page.
• JavaScript
– JavaScript, developed by Netscape, is a scripting language that allows Web authors to design interactive sites.
– JavaScript code is embedded in the Web pages.
– The embedded JavaScript is downloaded with the Web page and is activated when a specific event takes place.
27
9Active X & VB Script
• Active X is Microsoft’s alternative to Java. It is a specification for writing programs that will run inside the Microsoft client browser.
• Active X extends the browser by adding “controls” to Web pages. These controls can be downloaded from the Web server and let the user manipulate data inside the browser.
• VBScript is another Microsoft product that is used to extend the browser's functionality.
• VBScript is derived from Visual Basic.
• VBScript code is embedded inside an HTML page and this code is activated by triggering events such as clicking on a link.
28
9
So Let’s Look at Some Web Pages
• First impressions– No one SQL statement can generate all of this
data.– There are implied hierarchies in the display– Images are a major component
• More detailed– Tables aren’t very “Normalized”
29
9
How Can We Execute SQL Queries From a Web Page?
30
9
Web-to-DB - ColdFusion
• An example of a Web-to-DB tool is Cold Fusion– A product from Adobe (the PDF folks)
• ColdFusion application middleware can be used to:
– Connect to and query a database from a Web page.
– Present database data in a Web page, using various formats.
– Create dynamic Web search pages.
– Create Web pages to insert, update, and delete database data.
31
9
Web-to-DB - ColdFusion
• Define required and optional relationships.
• Define required and optional form fields.
• Enforce referential integrity in form fields.
• Use simple and nested queries and form select fields to represent business rules.
32
9How ColdFusion Works
33
9The RobCor Database’s Relational Schema
34
9
Web-to-DB - ColdFusion
• Creating a Simple Query with CFQuery and CFOutput
– Tasks:
• Query the database, using standard SQL to retrieve a data set that contains all records found in the VENDOR table.
• Format all of the records generated in Step 1 in HTML to let them be included in the page that is returned to the client browser.
– See Script 14.1 for the script and Figure 14.10 for the output.
35
9A Simple Query Using CFQUERY And CFOUTPUT
36
9The CH14-1.CFM Script Output
37
9CFQUERY With Tabular CFOUTPUT
38
9The CH14-2.CFM Script Output
39
9
Web-to-DB - ColdFusion
• Creating a Simple Query with CFQuery and CFTable
– Task:
• Perform the same query with the result presented in tabular format.
– See Script 14-3 and Figure 14.12
40
9CFQUERY With CFTABLE
41
9The CH14-3.CFM Script Output
42
9Internet DB Systems:
Special Considerations
• What Data Types Are Supported?– How does one store and extract data objects such as
documents, pictures, and movies through a Web browser?
– How much overhead will be created by the storage of binary objects in the database? How robust must the DBMS be to handle binary object transactions? What are the limitations for extended or OLE data types? How many extended or OLE data type fields can tables have?
43
9Internet DB Systems:
Special Considerations
– Does the client browser support the data type of the object you are trying to access? Are the necessary plug-ins available? Is there a way to automatically translate documents from their native format to HTML?
– Does the DBMS support Very Large Databases? What about transaction speed? How many users are going to access the database? How often?
44
9Internet DB Systems:
Special Considerations
• Data Security
– Security can be implemented in the Web server, the database and in the networking infrastructure.
– At the Web server level, most Web clients and servers can perform secure transactions by using encryption routines at the TCP/IP protocol level.
– At the SQL level, administrators can use the GRANT and REVOKE commands to assign access restrictions to tables and/or to specific SQL commands.
– Web-to-database middleware vendors usually have several security mechanisms available to interface with databases.
45
9Internet DB Systems:
Special Considerations
• Transaction Management
– The designers must ensure proper transaction management support at the database server level since the Web does not support the concept of database transaction:
• The Web cannot maintain an open line between the client and the database server.
• The mechanics of a recovery from incomplete or corrupted database transactions require that the client must maintain an open communications line with the database server.
46
9Internet DB Systems:
Special Considerations
• De-normalization of Database Tables
– The Web environment does not support multitable (parent-child) data entry.
– Although implementing the parent/child data entry is not impossible in a Web environment, its final outcome is less than optimum, counterintuitive, less user-friendly, and prone to errors.
– Web programming languages such as Java, JavaScript, or VBScript can be used to create the required Web interfaces.
9
XMLExtensible Markup
Language
MIS 304 Winter 2006
48
9
Where to start?
• Well what’s missing from HTML?
• What do we want or need to add?
49
9
Missing Components
• Document Type• Structure• Validation
50
9
Document Type
Well first there is only one kind of document in HTML a Web Page and a cool as the web is it should be easy to see that there is more than one type of document out there and they all have slightly different requirements. Ideally you would like to have a way to specify and encode different document types.
51
9
Structure
Second, HTML has a very limited Structure. The ideal systems would be able to represent tree and graph structures as well as forms.
52
9
Validation
Third, there is no way to validate HTML at the client end. This is usually not a problem because HTML is simple, but validation becomes a bigger issue when the documents become more complex. The ideal client application can check for structural validity of the documents.
53
9
The new markup language must…
• Be usable over the Internet Support a wide variety of
applications Be SGML compatible Be easy to write Be easy to process by programs Have no “optional features” Be human-legible and clear Be designed quickly Have a formal and concise design
54
9
XML documents contain
• Character data– Unicode (ISO 10646)– Captures all of the Western and Eastern character sets.
• Elements and escapements– <! ---- Sample XML ---- >
• Processing instructions– Special instructions to the client processor– <?XML VERSION=”1.0” ?>
• Elements– Start tag, body, end tag– The author controls the tag names– <p> </p>– <p><Name>Mickey Mouse </Name></p>
• Document Type Definition Markup
55
9
Tags continued
• Tags can be arranged in lots of different combinations but in particular you probably want to arrange them in either hierarchical or relational style (or both).
56
9
Attributes
• Optional Attribute, Value pairs• Example
<person firstname=”Mickey” surname=”Mouse”><address> Disneyland</address><e-mail>[email protected]</e-mail>
</person>
57
9
Document Type Declaration
<!DOCTYPE person SYSTEM “person.dtd” [ <!ENTITY mickey [email protected]
58
9
Document Type Definition
• Identifies the syntax of the XML flavor being used.
• Meta-information about document contents– Valid element names– Valid attribute names and values– How elements nest with one another
• The DTD is typically stored in a separate document
• The DTD says nothing about document semantics
59
9
XML
• Well-Formed– Conforms to basic XML syntax– Can be parsed without regard to DTD
• Valid– Well formed– Conforms to DTD
60
9
DTD Element Declarations
• Specifies valid element and its contents
• Uses regular expressions to define valid contents
<!ELEMENT br EMPTY> //empty element
<!ELEMENT p ANY> //allows everything<!ELEMENT person
(firstname | surname | address | e-mail) * >
61
9
A Huge Side Benefit
• A structured document is very much like a database file!
• One problem with “Groupware” has been that the documents tend to be BLOBS.– Binary Large Objects– MS Word Structure– PDF Structure
• Now instead of using BLOBS you can reach into the documents and pull out data.
62
9
Applications of XML
• As a Database “substitute”• Resource Description Framework
(RDF)• Content Description Format (CDF)• XSL Extensible Style Languages• XML/EDI (Electronic Data Interchange)
63
9
Database Substitute
• Many databases are small sets of information that only needs to be organized.
• Data interchange between databases is a particular problem.
• Census Data Example
64
9
Two Ways of Organizing Data
H ierarchical
Level 1
Level 2 Level 2 Level 2
Level 1 Level 1
RootAttribute1 Attribute2
Entity 1
Entity 2
Entity 3
Entity 4
Entity 5
RelationalHierarchical
65
9
So what about XML?
• XML is just a way of organizing data based on marking up the logical structures in the data or document.
• A Database is nothing if not a way of logically organizing data so it seems a simple step to go from a document markup to a Database.
• For a while people were calling XML documents “Data Islands”
66
9
RDB vs XML Terminology
• The Table itself is an Entity.• The Rows are Instances or Entity Instances.• The Columns are Attributes.• Each table has a primary Key• Foreign keys can be used to link tables
• XML Tags are also called Elements.• XML also uses the Attributes name to
describe tag values.
67
9Example of Primary and Foreign Keys
PartNo PartName Quantity Cost VendorNum
147589 Bolt 1066 .03 100002
148995 Washer 1550 .01 100003
151000 Spring 5034 1.00 100001
164601 Bolt 2507 .75 100002
199472 Wingnut 1952 .25 100002
VendorNum VendorName City StPr
100001 ABC Spring Troy MI
100002 XYZZY Fastener Toledo OH
100003 Boom Stamping Guelph ON
Primary Key
Foreign Key
68
9
Entity Relationship (ER) Diagram
Part
PartNoPartNameQuantity
CostVendorNum
Vendor
VendorNumVendorName
CityStPr
1
M
69
9
RDBMS Normalization
• To be properly “normalized” all of the tables in a relational database must be organized so that all of the attributes of the table are “facts” about the primary key.
• Attributes in the table should not be facts about any attribute other than the primary key attribute(s).
• A highly normalized database can assure the designer that he/she has removed all of the redundancy possible from the tables.
70
9Hierarchical XML
<Root><Level-1> Identifier 1
<Level-2> Data Element </Level-2></Level-1>
<Level-1> Identifier 2
<Level-2> Data Element </Level-2></Level-1>
<Level-1> Identifier 3
<Level-2> Data Element </Level-2></Level-1>
</Root>
71
9
Transform<Root>
<Level-1> Identifier 1
<Level-2> Data Element </Level-2>
</Level-1>
<Level-1> Identifier 2
<Level-2> Data Element </Level-2>
</Level-1>
<Level-1> Identifier 3
<Level-2> Data Element </Level-2>
</Level-1>
</Root>
Level 2
Level 1
Level 2
Level 1
Level 2
Level 1
Root
72
9
Relational Style XML<Entity>
<Instance>
<Attribute-1> Information </Attribute-1>
<Attribute-2> More info </Attribute-2>
<Attribute-3> Even more </Attribute-3>
</Instance>
<Instance>
<Attribute-1> Information </Attribute-1>
<Attribute-2> More info </Attribute-2>
<Attribute-3> Even more </Attribute-3>
</Instance></Entity>
73
9
Transform
<Entity>
<Instance><Attribute-1> Information </Attribute-
1><Attribute-2> More info </Attribute-2><Attribute-3> Even more </Attribute-
3>
</Instance><Instance>
<Attribute-1> Information </Attribute-1>
<Attribute-2> More info </Attribute-2><Attribute-3> Even more </Attribute-
3>
</Instance></Entity>
Attribute-1 Attribute-2
Instance 1
Instance 2
Instance 3
Instance 4
Instance 5
74
9Mixed Style
<Root>
<Level-0>
<Level-1> Identifier 1
<Level-2.1> Data Element 2.1 </Level-2.1>
<Level-2.2> Data Element 2.2 </Level-2.2>
<Level-2.3> Data Element 2.3</Level-2.3>
</Level-1>
<Level-1> Identifier 2
<Level-2.1> Data Element 2.1 </Level-2.1>
<Level-2.2> Data Element 2.2 </Level-2.2>
<Level-2.3> Data Element 2.3 </Level-2.3>
</Level-1> </Level-0>
</Root>
75
9
What does this transform into?
76
9
Linked Entities
Level-0
PrimaryKey
Level-1
PrimaryKeyForeignKeyLevel-2.1Level-2.2Level-2.3
1M
77
9
Impact Relational Database Development
• This little trickery has not been lost on the XML community or the relational database developers.– Many traditional RDBMS vendors now support the
import and export of transactions and tables in XML Format.
– Many XML tool vendors now have RDBMS table export functions in their tools.
– There is a lively exchange of ideas between the camps.
78
9
XML Spy Demo
• XML Spy is a product of Altova Corp.• http://www.altova.com
79
9
RDBMS – XML Issues
• One of the major issues is the type of identifiers that each system uses. A legal XML tag name may not be a legal RDBMS Entity or Attribute name and vice versa.
• Complex XML documents may not “normalize” well.
80
9Queries
• A major impetus to relational database development has been the need to query the contents of the database for reporting and analysis.
• This has led to the continued development of the Structured Query Language (SQL) for relational databases.
• Traditional hierarchical databases like IMS relied on the programming language to implement the data manipulation tools.
• Since XML is “hierarchical” it looks like we are going backwards!
81
9The Example Two Table RDB
PartNo PartName Quantity Cost VendorNum
147589 Bolt 1066 .03 100002
148995 Washer 1550 .01 100003
151000 Spring 5034 1.00 100001
164601 Bolt 2507 .75 100002
199472 Wingnut 1952 .25 100002
VendorNum VendorName City StPr
100001 ABC Spring Troy MI
100002 XYZZY Fastener Toledo OH
100003 Boom Stamping Guelph ON
Primary Key
Foreign Key
82
9
Relational SQL
• SQL> SELECT PartNum, Cost FROM Part WHERE Cost > .25
• SQL> SELECT PartNum, VendorName, CostFROM Part, VendorWHERE Cost > .25 ANDPart.VendorNum = Vendor.VendorNum
PartNo Cost
151000 1.00
164601 .75
PartNo VendorName Cost
151000 ABC Spring 1.00
164601 XYZZY Fastener .75
83
9
XML Queries
• If you had an XML formatted document you would like to be able to do the same thing.
• This may not be so easy as there is no XML equivalent to SQL.
• Worse, the things that make SQL work like Primary and Foreign Keys aren’t present in XML.
(In database speak XML has no underlying “Relational Algebra”)
84
9
Navigating the wild XML document
• An XML document is just that, a document.• We know what to do with documents you open
them with an “Editor”– Editors have text manipulation tools like “Find”
and “Replace”.– Editors can Cut and Paste parts of the document
from one location to another in the document or into another document.
– The problem is that Editors are manually operated and to be useful a database needs to run on automatic.
85
9
XPath
• Xpath is an XML enabling technology that allows us to navigate the XML document “Tree” and to discover and manipulate its contents.
• Xpath consists of three parts– A transversal direction called an Axis
– A node test which selects nodes along the path.
– An algorithmic predicate used to refine the the exact node selected.
86
9
Axis
• Think of an ant “Sauntering” about the XML document “Tree”.
• Xpath treats the XML document as a tree of nodes.• The ant visits the various nodes and reports on its
location when it finds what it is looking for.• Xpath has two types of location paths “relative” and
“absolute”.– Relative paths proceed from another node.
– Absolute paths proceed from the root.
87
9
Node Positions
• Child• Descendent• Parent• Ancestor• Following-sibling• Proceeding-sibling
• Following• Preceding• Attribute• Self• Descendent-or-self• Ancestor-or-self
88
9Relative vs Absolute
• Relative
child::title/child::paragraph
selects all the children of the division wherever you happen to be.
• Absolute
/child::book/child::title/child::paragraph
selects the same thing except only if they are immediate children of the “book” document.
89
9
XML Snippet
<chapter> <title> I Discover Moses and the Bulrushers </title> <paragraph lang=“en-Twain”> <quote> You don’t know about me without you have read a book name of The Adventures of Tom Sawyer; but that aint no matter. </quote> </paragraph> <paragraph> The above quote is from Mark Twain </paragraph></chapter>
90
9
XPath Examples
• You can select all of the child elements of the fragment, assuming Chapter is the current node:
child::*
• If you want the first paragraph element you could use:
child::paragraph[position()=1]
• Select element
child::element-name = element-name
91
9XPath Core Functions
• Node Set Functions– Number last()– Number posiiton ()– Number count(node-set)– Node-set id(object)– String local-name()– String namespace-uri– String name()
• Boolean Functions– Boolean Boolean()– Boolean not()– Boolean true()– Boolean false()– Boolean lang()
• String Functions– String string(object)– String concat()– Boolean starts-with()– Boolean contains()– String substring-before()– String substring-after()– String substring()– Number string-length()– String normalize-space()– String translate()
• Number Functions– Number object()– Number sum()– Number floor()– Number ceiling()– Number round()
92
9
Mix and match
• XPath can be combined with other W3C XML standards like XSL, XSLT, Xpointer and programming languages like:– Java
– Java script
– Visual Basic
– ASP
– C/C++
– C#
93
9
XPath Issues
• For those of you who started their careers writing Cobol this looks all too familiar.
• You are required to “Navigate” the document using the XPath API.
• XPath statements can be very verbose so the standard allows lots of abbreviations.
• The RDBMS was supposed to free us from all of this!
94
9
XML DBMS
• Another approach to this problem is to build a complete DBMS around an XML document(s).
• When used a mid-tier applications between traditional a RDBMS and the end application they can offer improved performance.
• Possible candidate applications for XML DBMS include:– Employee Portals
– Catalog Repository
– Membership databases
95
9
XML DBMS
• Three approaches:– Map XML onto a traditional RDBMS.
– Native XML storage.
– Unstructured data management.
• Several companies offer such products.– Ipedo
– Tamino (Sofware AG)
– Coherity
• Most use unique tools as well as the standard W3C tool standards.
96
9
XML DBMS Value Proposition
• Pluses– Less translation to
and from the traditional RDBMS.
– Data is highly structured.
– More flexible programming.
• Minuses– Native XML is a
redundancy Nightmare.
– Traditional database indexing is gone.
– Lower performance.
– Less mature management tools.
97
9
The Future
• XQuery– An XML based Query language from the
W3C with XPath as a subset.
• SQLX sometimes called SQL/XML is a proposed standard from ISO.– Allows you to query XML documents using
an SQL like syntax.
98
9
XSL Extensible Style Sheets
• Maps XML to HTML• Rule based mappings• XML + XSL + HTML4.0 = Industrial
strength publishing• Different devices require different
formats. (Cell Phones, PDA, Car Dashboard)
99
9
Example XML EDI
<?XML version=“1.0” encoding=“UTF-8 ?><PurchaseOrder Version=“4010”><PurchaseOrderHeader>
<TransactionSetHeader X12.ID=“850”><TransactionSetIDCode code=“1234”/><TransactionSetNumber>00001</
TransactionSetNumber><ConventionRef>123456</ConventionRef>
</TransactionSetHeader><BeginningSegment>
<PurposeTypeCode Code=“SU Status Update”/><OrderTypeCode Code=“KN Purchase Order”/><ReleaseNumber>MTB 98765</ReleaseNumber>
</BeginningSegment>