    Database Trends and Directions: Current Challenges and Opportunities

    George Feuerlicht 1,2

    1 Department of Information Technology, University of Economics, Prague, W. Churchill Sq. 4, Prague, Czech Republic

    2 Faculty of Engineering and Information Technology, University of Technology, Sydney, P.O. Box 123 Broadway, Sydney, NSW 2007, Australia

    jiri@it.uts.edu.au


    Abstract. Database management has undergone more than four decades of evolution, producing a vast body of research and an extensive array of technology solutions. The database research community and the software industry have responded to numerous challenges resulting from changes in user requirements and to opportunities presented by hardware advances. The relational database approach, as represented by SQL databases, has been particularly successful and is one of the most durable paradigms in computing. The most recent database challenges include internet-scale databases that serve hundreds of millions of users and cloud databases that use novel techniques for managing massive amounts of data. In this paper we review the evolution of database management systems over the last four decades and then focus on the most recent database developments, discussing the research and implementation challenges presented by modern database applications.

    Keywords: Relational Databases, Object-Relational Databases, NoSQL Databases

    1 Introduction

    Databases, in particular relational databases, are a ubiquitous part of today's computing environment. Database management systems (DBMSs) support a wide variety of applications, from business and scientific applications to, more recently, various types of internet and electronic commerce applications. DBMSs are a core technology in most organizations today and run the mission-critical applications that banks, hospitals, airlines, and most other types of organizations rely on for their day-to-day operation. Over the last three decades, relational DBMS technology has proven to be highly adaptable and has evolved to accommodate new application requirements and the ever-increasing size and complexity of data. However, there are indications that some recently emerging data-intensive applications (e.g. internet search) cannot be satisfactorily addressed using existing DBMS technology, and some experts argue that significant innovation (a new database paradigm) is needed to overcome the limitations of the current generation of database technology.

    J. Pokorny, V. Snasel, K. Richta (Eds.): Dateso 2010, pp. 163-174, ISBN 978-80-7378-116-3.

    The combination of inexpensive, high-capacity storage and the prevalence of digital devices (digital cameras, sound recorders, video recorders, mobile phones, RFID readers, and various types of sensors) is creating a deluge of digital information. According to a recent article in the Economist [1], the amount of data collected by various sensors, computers, and devices is growing at a compound annual rate of 60%. A 2008 study by International Data Corporation (IDC) predicted that over a thousand exabytes of digital data would be generated in 2010 [2]. Scientific applications in astronomy, earth sciences, and other fields (e-science) tend to produce massive amounts of data; a well-documented example is the Large Hadron Collider at CERN [3], which generates 40 terabytes of data every second. Storing and analyzing such volumes of data represents an insurmountable challenge for the current generation of database technology. Another relatively recent development that may require a revision of current database paradigms is the emergence of internet-scale applications (e.g. search engines, social networking applications, and cloud computing services) that typically process petabytes of data, use thousands of servers, and serve millions of users who demand sub-second access to information. Companies like Google, Facebook, Amazon, and eBay manipulate petabytes of data every day. For example, Facebook handles 20 petabytes of data, managing 20 billion photographs in 4 different resolutions and growing by 2 billion photographs per month. The Facebook database serves 600,000 photographs per second to a base of 300 million active users [4]. Google manages vast amounts of semi-structured data: billions of URLs with associated internet content, crawl metadata, geographic objects (roads, satellite images, etc.), and hundreds of terabytes of satellite image data, with hundreds of millions of users and thousands of queries per second [5]. The scale and level of functionality required by such big data applications were not anticipated by commercially available DBMSs, and internet companies were almost invariably forced to develop their own database solutions. Even more traditional database applications, however, manage increasingly large volumes of data; for example, the retail chain WalMart handles more than one million transactions per hour and manages databases with more than 2.5 petabytes of data.
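    To make the growth figures above more concrete, the short Python sketch below works through two pieces of arithmetic they imply: what a 60% compound annual growth rate means over several years (including its doubling time), and how the cited Facebook upload growth compares with the cited serving rate. The numbers are the ones quoted in the paragraph; the 30-day month used to convert the monthly figure into a per-second rate is an assumption made only for illustration, and the helper names are hypothetical.

```python
import math

def growth_factor(rate: float, years: int) -> float:
    """Multiplicative growth after `years` at a compound annual `rate` (e.g. 0.60)."""
    return (1.0 + rate) ** years

def doubling_time(rate: float) -> float:
    """Years needed for the volume to double at a compound annual `rate`."""
    return math.log(2.0) / math.log(1.0 + rate)

# Figures quoted in the text.
CAGR = 0.60                      # 60% compound annual growth of collected data [1]
PHOTOS_ADDED_PER_MONTH = 2e9     # Facebook: 2 billion new photographs per month [4]
PHOTOS_SERVED_PER_SEC = 600_000  # Facebook: 600,000 photographs served per second [4]

# Assumption: a 30-day month, used only to convert the monthly figure to a rate.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

def main() -> None:
    print(f"5-year growth factor at 60%: {growth_factor(CAGR, 5):.2f}x")
    print(f"doubling time at 60%: {doubling_time(CAGR):.2f} years")

    # Average rate at which new photographs arrive, under the 30-day assumption.
    uploads_per_sec = PHOTOS_ADDED_PER_MONTH / SECONDS_PER_MONTH
    print(f"average new photographs per second: {uploads_per_sec:.0f}")

    # Reads (photographs served) per write (photograph added).
    ratio = PHOTOS_SERVED_PER_SEC / uploads_per_sec
    print(f"photographs served per photograph added: {ratio:.0f}")

if __name__ == "__main__":
    main()
```

    With these inputs the sketch reports a growth factor of about 10.49x over five years, a doubling time of about 1.47 years, roughly 772 new photographs per second, and about 778 photographs served for every one added, which suggests a heavily read-dominated workload.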

    It is estimated that structured data constitutes only about 5% of the total volume of generated data; the rest of this digital universe is in semi-structured or unstructured form, which makes it more difficult to manage and to extract meaningful information from. This massive increase in the volume and complexity of data is challenging available database management techniques and technologies and is forcing a re-evaluation of the direction of database research. Some fundamental questions arise, including what constitutes a database application. Can applications that search petabytes of unstructured data (e.g. Web pages) using thousands of servers working in parallel be classified as database applications?
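    The question above concerns searching unstructured data that is spread across many servers working in parallel. The Python sketch below illustrates the basic scatter-gather pattern behind such searches, under simplifying assumptions: the "servers" are simulated as threads, each holding a hypothetical in-memory shard of documents, and a plain case-insensitive substring match stands in for a real search engine's indexing and ranking. It illustrates the pattern only and does not describe any particular company's system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical shard: document URL -> document text.
# In a real deployment each shard would live on a separate server.
Shard = Dict[str, str]

def search_shard(shard: Shard, term: str) -> List[str]:
    """Scan one shard and return URLs of documents containing `term` (case-insensitive)."""
    needle = term.lower()
    return [url for url, text in shard.items() if needle in text.lower()]

def parallel_search(shards: List[Shard], term: str) -> List[str]:
    """Scatter the query to every shard in parallel, then gather and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        # Scatter: one task per shard; list() collects every shard's partial result.
        partial_results = list(pool.map(search_shard, shards, [term] * len(shards)))
    # Gather: merge the per-shard hits into one sorted result list.
    return sorted(url for hits in partial_results for url in hits)

if __name__ == "__main__":
    shards = [
        {"http://example.org/a": "Relational databases and SQL",
         "http://example.org/b": "Photo sharing at scale"},
        {"http://example.org/c": "Cloud databases for massive data",
         "http://example.org/d": "Sensor readings"},
    ]
    print(parallel_search(shards, "databases"))
    # prints ['http://example.org/a', 'http://example.org/c']
```

    Each shard is scanned independently, so adding shards (and servers) spreads the scanning work without changing the query logic; the merge step is the only point where partial results meet.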

    In this paper we first review the past achievements of database research and technology solutions (section 2) and then discuss the research challenges and opportunities created by new types of database applications (section 3). Section 4 presents our conclusions.


    2 Evolution of Database Technology

    While the origin of commercial database management systems can be traced to the hierarchical and CODASYL (Conference on Data Systems Languages) databases of the 1960s and 1970s, it was the emergence of relational DBMSs during the 1980s that started a revolution in data management. The simplicity and elegance of the relational model proposed by E.F. Codd in 1970 [6] resulted in an unprecedented volume of research activity and the emergence of highly successful relational DBMS (RDBMS) implementations. Relational databases are a rare example of a theoretical model preceding and guiding the implementation of technologies. Codd is often credited with turning the previously black art of data management into an engineering discipline, providing a blueprint for the design and implementation of databases and the foundation of modern database technology. The basic idea of the relational model is to represent data as two-dimensional tables with well-defined properties and to use a high-level query language for data access. This remarkably simple set of ideas, based on the underlying relational theory, had a major impact on the development of database technology over the following two decades. Relational databases solved two major interrelated problems of the earlier database approaches. The first achievement was to decouple the database from application programs by providing effective support for data independence. The second, equally important achievement of the relational approach was to free database application developers from the burden of programming navigational access to database records by introducing a non-procedural query language.
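    The contrast above between programming navigational access to records and using a non-procedural query language can be illustrated with Python's built-in sqlite3 module. In the sketch below, the navigational function uses only single-record lookups and does the "follow the link" logic itself, loosely mimicking record-at-a-time access in pre-relational systems (a simplified analogy, not the actual CODASYL interface). The declarative function states only which result is wanted and leaves the choice of access path to the DBMS. The tables, columns, and data are hypothetical.

```python
import sqlite3
from typing import List

def build_db() -> sqlite3.Connection:
    """Create a small in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                           dept_id INTEGER REFERENCES dept(dept_id));
        INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
        INSERT INTO emp  VALUES (10, 'Alice', 1), (11, 'Bob', 2), (12, 'Carol', 1);
    """)
    return conn

def navigational(conn: sqlite3.Connection, dept_name: str) -> List[str]:
    """Record-at-a-time style: the program itself decides how to traverse the data."""
    result = []
    for emp_name, dept_id in conn.execute("SELECT name, dept_id FROM emp").fetchall():
        # Follow the employee -> department 'link' by hand, one record at a time.
        row = conn.execute("SELECT name FROM dept WHERE dept_id = ?", (dept_id,)).fetchone()
        if row is not None and row[0] == dept_name:
            result.append(emp_name)
    return sorted(result)

def declarative(conn: sqlite3.Connection, dept_name: str) -> List[str]:
    """Non-procedural style: describe the result; the DBMS chooses the access path."""
    rows = conn.execute(
        """SELECT e.name FROM emp AS e JOIN dept AS d ON e.dept_id = d.dept_id
           WHERE d.name = ? ORDER BY e.name""",
        (dept_name,),
    ).fetchall()
    return [name for (name,) in rows]

if __name__ == "__main__":
    conn = build_db()
    print(navigational(conn, "Research"))  # ['Alice', 'Carol']
    print(declarative(conn, "Research"))   # ['Alice', 'Carol']
```

    Both functions return the same answer, but only the navigational one hard-codes how the records are traversed; in the declarative version that decision is left to the DBMS's query processor.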

    A number of different relational languages were proposed following Codd's original description of the relational model, notably QUEL, the query language of the Ingres DBMS developed at the University of California, Berkeley, and IBM's Structured Query Language (SQL), developed at the IBM San Jose Research Laboratory. The next major milestone in the evolution of relational databases was the acceptance by ANSI (American National Standards Institute) of a subset of IBM's SQL as the first version of the standard relational database language, SQL-86. Although SQL-86 lacked many important features of the relational model as originally proposed by Codd, including key aspects such as referential integrity and domains, it quickly became universally accepted as the database language for relational DBMSs. The shortcomings of SQL-86 were largely rectified in subsequent releases of the SQL standard (SQL-89, SQL-92), and SQL has evolved from a relatively simple language into a comprehensive database language implemented in all significant
