lecture 2 database management systems. data models and their categories schemas, instances, and...

Download Lecture 2 Database Management Systems. Data Models and Their Categories Schemas, Instances, and States Three-Schema Architecture Data Independence DBMS

If you can't read please download the document

Upload: magdalene-mosley

Post on 26-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

  • Slide 1
  • Lecture 2 Database Management Systems
  • Slide 2
  • Data Models and Their Categories Schemas, Instances, and States Three-Schema Architecture Data Independence DBMS Languages and Interfaces Database System Utilities and Tools Centralized and Client-Server Architectures Classification of DBMSs Research topics for Graduate-term-paper Slide 2- 2
  • Slide 3
  • Architecture? A general term to describe buildings, physical structures, and computer-based systems E.g., DBMS Architecture DBMS has also architecture Monolithic client/server Client module Server module Slide 2- 3
  • Slide 4
  • Brunelleschis dome LOCKHEED F-117A NIGHT HAWK Slide 2- 4
  • Slide 5
  • Data Model? A set of concepts to describe the structure of a database, the operations for manipulating these structures, the constraints that the database should obey. Provides proper means to describe the design of a database at the Physical level Logical level View levels Slide 2- 5
  • Slide 6
  • Data Model Structure? By structure data and data types and entities (e.g., record, table) relationships among data Constraints on data These are restrictions/conditions on valid data; must be enforced at all times E.g., Each department must have at least 5 employees A student can not have more than one advisor The salary of employee cannot exceed the salary of his/her immediate supervisors Slide 2- 6
  • Slide 7
  • Data Model Operations (behaviors) operations that are used for specifying database retrievals and updates by referring to the constructs of the data model Operations on the data model may include basic model operations e.g. generic insert, delete, update user-defined operations e.g. compute_student_gpa, update_inventory Slide 2- 7
  • Slide 8
  • Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data (ERD) (Also called entity-based or object-based data models.) Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer (e.g., access control) Implementation (representational) data models: Provide concepts that fall between the above two used by many commercial DBMS implementations (e.g. relational data models, and OO-Model) Slide 2- 8
  • Slide 9
  • Database Schema: The description or specification of a database Includes descriptions of the database structure data types the constraints on the database. Schema Diagram? An illustrative display of (most aspects of) a database schema. Schema Construct? A component of the schema or an object within the schema, e.g., STUDENT, COURSE. Slide 2- 9
  • Slide 10
  • Slide 2- 10
  • Slide 11
  • Database State: The actual data stored in a database at a particular moment in time. includes the collection of all the data in the database. Also called database instance (snapshot). The term instance is also applied to individual database components, e.g. record instance table instance Slide 2- 11
  • Slide 12
  • Slide 2- 12
  • Slide 13
  • Initial Database State? Refers to the database state when it is initially loaded into the system Valid State? A state that satisfies the structure and constraints of the database. Slide 2- 13
  • Slide 14
  • Major Distinction The database schema changes very infrequently. The database state changes very frequently Schema are stored in the DBMS catalog Schema is also called intension. State is also called extension. Slide 2- 14
  • Slide 15
  • Proposed to support DBMS characteristics of: Self-describing nature of DBMS Program-data independence. Support of multiple views of the data Not explicitly used in commercial DBMS products, but has been useful in explaining database system organization Slide 2- 15
  • Slide 16
  • Defines DBMS schemas at three levels: Internal schema: This level describe physical storage structures and access paths (e.g., indexes). Typically uses a physical data model. Conceptual schema this level describe the structure and constraints for the whole database for a community of users. Uses a conceptual (implementation) data model. External schemas This level describe the various user views. Usually uses the same data model as the conceptual schema. Slide 2- 16
  • Slide 17
  • Slide 2- 17
  • Slide 18
  • Mappings among schema levels are needed to transform requests and data. Programs refer to an external schema are mapped (i.e., translated) by the DBMS to the internal schema for execution. Data extracted from the internal DBMS level is reformatted to match the users external view e.g., formatting the results of an SQL query for display in a Web page Slide 2- 18
  • Slide 19
  • Three-schema architecture is used to achieve data independence changing schema at one level without changing the schema at the next higher level Slide 2- 19
  • Slide 20
  • Logical Data Independence: change the conceptual schema without having to change the external schemas and their associated application programs E.g., view should not be changed if the base tables are changed Physical Data Independence: change the internal schema without having to change the conceptual schema. examples the internal schema may be changed whenever certain file structures are reorganized new indexes are created to improve database performance Sli de 2- 20
  • Slide 21
  • When a schema at a lower level is changed, only the mappings between this schema and its higher-level schemas need to be changed. The higher-level schemas themselves are unchanged. i.e., the application programs need not be changed since they refer to the external schemas. Slide 2- 21
  • Slide 22
  • Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database; Also used to define internal and external schemas Storage Definition Language (SDL) SDL is typically realized via DBMS commands provided to the DBA and database designers View Definition Language (VDL) Used to define internal and external schemas. Slide 2- 22
  • Slide 23
  • Data Manipulation Language (DML): Used to specify database retrievals and updates Retrieval of information Insertion of new information Deletion and/or modification of information stored in the DB Two main types DMLs High-level (nonprocedural) DML stand-alone DML commands can be applied directly (called a query language). Low level (procedural ) DML embedded in a general-purpose programming language (host language), such as COBOL, C, C++, or Java. Used to retrieve individual records/objects from database Slide 2- 23
  • Slide 24
  • Programmers interface for embedding DML in a programming languages: Embedded Approach: embedded SQL (for C, C++, etc.), and SQLJ (for Java) Procedure Call Approach: JDBC (Java Database Connectivity )for Java, ODBC (Open Database Connectivity )for other programming languages Database Programming Language Approach: e.g. Oracles PL/SQL: a programming language based on SQL; language incorporates SQL and its data types as integral components Slide 2- 24
  • Slide 25
  • Menu-based: popular for browsing on the web Forms-based: designed for nave users Graphics-based: Point and Click, Drag and Drop, etc. Natural language: Requests in written English Combinations of the above: For example, both menus and forms used extensively in Web database interfaces Slide 2- 25
  • Slide 26
  • Speech as Input and Output Web Browser as an interface Parametric interfaces: e.g., bank tellers using function keys. Interfaces for the DBA: Creating user accounts, granting authorizations Setting system parameters Changing schemas or access paths Slide 2- 26
  • Slide 27
  • To perform certain functions such as: Loading data stored in files into a database. Includes data conversion tools Backing up the database periodically on tape Reorganizing database file structures Report generation utilities Performance monitoring utilities. Other functions, sorting, user monitoring, data compression, etc. Slide 2- 27
  • Slide 28
  • Data dictionary / repository: Used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc. Slide 2- 28
  • Slide 29
  • Application Development Environments and CASE (Computer-Aided Software Engineering) tools: Examples: PowerBuilder (Sybase) JBuilder (Borland) JDeveloper 10G (Oracle) Slide 2- 29
  • Slide 30
  • Slide 2- 30
  • Slide 31
  • Centralized DBMS Architecture: Combines everything into single system including DBMS software, hardware, application programs, user interface processing software. User can connect through a remote terminal however, all processing is done at centralized site. E.g., time-sharing systems (good old days!) Slide 2- 31
  • Slide 32
  • Slide 2- 32
  • Slide 33
  • Clients (users) can access the specialized servers as needed Provide GUI and local processing Server? Specialized Servers with Specialized functions Print server (share printer) File server (share files) DBMS server (share DBMS) Web server (share web-pages) Email server Slide 2- 33
  • Slide 34
  • Slide 2- 34
  • Slide 35
  • Provide appropriate interfaces through a client software module to access and utilize the various server resources. Clients may be diskless machines or PCs or Workstations with disks with only the client software installed. Connected to the servers via some form of a network. (LAN: local area network, wireless network, etc.) Slide 2- 35
  • Slide 36
  • Provides database query and transaction services to the clients Relational DBMS servers are often called SQL servers Query servers Transaction servers Applications running on clients uses Application Program Interface (API) to access server databases via standard interface such as: ODBC: Open Database Connectivity standard JDBC: for Java programming access Client and Server must install appropriate client and server modules software for ODBC or JDBC Slide 2- 36
  • Slide 37
  • A client program may connect to several DBMSs, sometimes called the data sources. E.g., data sources can be files non-DBMS software that manages data. Other variations of clients are possible (thin, fat, p2p, SOA, etc) e.g., in some object DBMSs, more functionality is transferred to clients including data dictionary functions, optimization and recovery across multiple servers, etc. Slide 2- 37
  • Slide 38
  • Common for Web applications Intermediate Layer called Application Server or Web Server: Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server Used as a channel for sending partially processed data between the database server and the client. Three-tier Architecture Enhances Security: Database server only accessible via middle tier Clients cannot directly access database server Slide 2- 38
  • Slide 39
  • Slide 2- 39
  • Slide 40
  • Based on the data model used Classical: Relational, Network, Hierarchical. Emerging: Object-oriented, Object-relational. Based on Users Single-user multi-user Based on Distributeness Centralized Distributed Slide 2- 40
  • Slide 41
  • Based on Homogeneity Homogeneous DDBMS Heterogeneous DDBMS Federated or Multi-database Systems (Autonomy, distributeness, heterogeneously) Slide 2- 41
  • Slide 42
  • Based on Cost from free open-source systems to configurations costing millions of dollars Examples of free relational DBMSs: MySQL, PostgreSQL Slide 2- 42
  • Slide 43
  • Network Model Hierarchical Model Relational Model Object-oriented Data Models Object-Relational Models Slide 2- 43
  • Slide 44
  • Network Model: The first network DBMS was implemented by Honeywell in 1964-65 (IDS System). Adopted heavily due to the support by CODASYL (Conference on Data Systems Languages) (CODASYL - DBTG report of 1971). Slide 2- 44
  • Slide 45
  • Slide 2- 45
  • Slide 46
  • Advantages: Network Model is able to model complex relationships and represents semantics of add/delete on the relationships. Can handle most situations for modeling using record types and relationship types. Slide 2- 46
  • Slide 47
  • Disadvantages: Navigational and procedural nature of processing Database contains a complex array of pointers that thread through a set of records. Little scope for automated query optimization Slide 2- 47
  • Slide 48
  • Hierarchical Data Model: Initially implemented as a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems. IBMs IMS product had (and still has) a very large customer base worldwide Hierarchical model was formalized based on the IMS system Other systems based on this model: System 2k (SAS inc.) Slide 2- 48
  • Slide 49
  • Advantages: Simple to construct and operate Corresponds to a number of natural hierarchically organized domains, e.g., organization (org) chart Language is simple: Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc. Disadvantages: Navigational and procedural nature of processing Database is visualized as a linear arrangement of records Little scope for "query optimization" Slide 2- 49
  • Slide 50
  • Relational Model: Proposed in 1970 by E.F. Codd (IBM), first commercial system in 1981-1982. Now in several commercial products e.g. DB2, ORACLE, MS SQL Server, SYBASE, INFORMIX Several free open source implementations, e.g. MySQL, PostgreSQL SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99, SQL3, Slide 2- 50
  • Slide 51
  • Object-oriented Data Models: Several models have been proposed for implementing in a database system. One set comprises models of persistent O-O Programming Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE). Additionally, systems like O2, IRIS (at HP- used in Open OODB). Chapters 11-12 describe this model. Slide 2- 51
  • Slide 52
  • Object-Relational Models: Most Recent Trend. Started with Informix Universal Server. Relational systems incorporate concepts from object databases leading to object-relational. Exemplified in the versions of Oracle-10i, DB2, and SQL Server and other DBMSs. Standards included in SQL-99 and expected to be enhanced in future SQL standards. Slide 2- 52
  • Slide 53
  • For Graduate Students
  • Slide 54
  • Report Format IEEE/ACM Format 8-10 pages One column Fully Citied References must be completed (i.e., authors, title, source, page, year, etc) Use the same format for References No dangling references HTTP: must have title Tables/Figures must be numbered and briefly explained Sli de 2- 54
  • Slide 55
  • Paper Title Name Affiliation (e.g., UND) Email Paper content Abstract 100-100 words Introduction Background Related works Approach Discussion Conclusion and Future work Acknowledgments (if any) References Appendix Sli de 2- 55
  • Slide 56
  • Cloud Computing? One of the biggest IT game changer Estimates market growth rates for public cloud products and services is 20% growth annually $110-120 B in 2012 in just U.S. Cloud Computing and Databases as a service (DaaS) DBMS vs. HDBMS/DDBMS vs. CLOUD COMPUTING If and how youre adopting cloud services Which applications youre using for the cloud The benefits youre realizing and challenges youre facing When to use cloud computing (CC)? When not to use cloud computing (CC)? Sli de 2- 56
  • Slide 57
  • What are the quality attributes of cloud computing from cloud user (consumer) and cloud providers (producer) point of view. What are the architectural design decision regarding CC? What are the risks associated with CC? Type of analysis and tradeoff techniques to analyze the quality of CC? Transition of DB and its deployment on CC How to provide service elasticity, co-locateability, and other cloud related qualities in CC? Sli de 2- 57
  • Slide 58
  • Big DATA? Referred to data (structured and unstructured) that is measured in exabytes and beyond Approaches Hadoop (NOSQL) to support Distributed application using distributed file system and MapRedure Big and High Performance Database Mgt System solutions (MPP DBMS) What are the technologies being applied to big data massively parallel-processing databases, search-based applications data-mining grids, distributed file systems, distributed databases, cloud based infrastructure (Saas, Daas, Paas, etc.) What are the challenges of using (e.g., CC, RDBMS, DDBMS, ) to search, analyze, etc. of BIG Data? Sli de 2- 58
  • Slide 59
  • Data Models and Their Categories History of Data Models Schemas, Instances, and States Three-Schema Architecture Data Independence DBMS Languages and Interfaces Database System Utilities and Tools Centralized and Client-Server Architectures Classification of DBMSs Graduate Research Topics Big Data, Cloud Computing, etc. Slide 2- 59