Emerging Database Technologies

Upload: rao

Post on 08-Apr-2018

  • 8/7/2019 Emerging Database Technologies

    1/44

    Dinesh Rao

    1. Mobile Database
    2. Multimedia Database
    3. Geographic Information Systems (GIS)
    4. Genome Data Management

    1. Mobile Database

    • Portable devices and wireless technology led to mobile computing.
    • Portable computing devices and wireless communication allow clients to access data from anywhere, at any time.
    • Several hardware and software problems must be solved to fully exploit mobile computing; database recovery is one example.
    • The hardware problems are the more difficult ones:
      - Wireless coverage
      - Battery life
      - Changes in network topology
      - Wireless transmission speed

    Mobile Computing Architecture

    Mobile Ad-Hoc Network (MANET)

    • In a MANET, co-located mobile units do not need to communicate via a fixed network; instead, they form their own network using cost-effective technologies such as Bluetooth.
    • Mobile units are responsible for routing their own data, effectively acting as base stations as well as clients.
    • A MANET must be robust enough to handle changes in network topology, such as the arrival or departure of mobile units.
    • A MANET can be classified as a peer-to-peer (P2P) architecture.

    Characteristics of Mobile Environments - 1

    • Communication latency
    • Intermittent connectivity
    • Limited battery life
    • Changing client location

    All of these characteristics impact data management in mobile computing.

    Characteristics of Mobile Environments - 2

    • The server may not be able to reach the client, or vice versa.
    • Proxies can be added to the client and the server to cache updates while a connection is not available.
    • Once the connection is available again, the proxy automatically forwards the cached updates to their destination.
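
The proxy idea on this slide can be sketched in a few lines of Python. This is only an illustrative store-and-forward buffer, not an implementation taken from the slides: the class name `UpdateProxy`, its methods, and the dictionary-shaped updates are all hypothetical.

```python
from collections import deque


class UpdateProxy:
    """Buffers updates while the link is down and forwards them on reconnect."""

    def __init__(self, send):
        self._send = send          # callable that delivers one update (assumed)
        self._pending = deque()    # updates cached while disconnected
        self.connected = False

    def submit(self, update):
        """Forward immediately if connected, otherwise cache the update."""
        if self.connected:
            self._send(update)
        else:
            self._pending.append(update)

    def on_connect(self):
        """Mark the link as up and flush cached updates in arrival order."""
        self.connected = True
        while self._pending:
            self._send(self._pending.popleft())

    def on_disconnect(self):
        """Mark the link as down; later updates are cached again."""
        self.connected = False


delivered = []
proxy = UpdateProxy(delivered.append)
proxy.submit({"id": 1, "qty": 5})   # cached: link is down
proxy.submit({"id": 2, "qty": 7})   # cached
proxy.on_connect()                  # both cached updates are forwarded now
proxy.submit({"id": 3, "qty": 1})   # forwarded immediately
```

A real proxy would also have to persist the queue and resolve conflicting updates, which this sketch leaves out.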

    Characteristics of Mobile Environments - 3

    • The latency involved in wireless communication makes scalability a problem.
      - Latency increases the time needed to service each client request, so the server can handle fewer clients.
    • Servers can use broadcasting to address this problem.
    • Broadcasting reduces the load on the server, because clients do not have to maintain active connections to it.
      - Example: weather broadcasting.

    Characteristics of Mobile Environments - 4

    • Client mobility also poses many data management challenges:
      - Servers must keep track of client locations in order to route messages to them efficiently.
      - Client data should be stored in the network location that minimizes the traffic necessary to access it.
      - Moving between cells must be transparent to the client.
    • Client mobility also enables new, location-based applications.

    Data Management Issues

    Mobile databases can be distributed under two possible scenarios:
    1. The entire database is distributed mainly among the wired components, possibly with full or partial replication. Management is done in the fixed hosts, with additional functionality.
    2. The database is distributed among wired and wireless components. Management is done in both the fixed hosts and the mobile units.

    Data Management Issues

    • Data distribution and replication (caching)
    • Transaction models
    • Query processing (where is the data located?)
    • Recovery and fault tolerance
    • Mobile database design
    • Location-based services
    • Division of labor
    • Security

    Application: Intermittently Synchronized Databases

    Insert/Update Data

    2. Multimedia Database

    • In the years ahead, multimedia information systems are expected to dominate our daily lives.

    • DBMSs have been constantly adding to the types of data they support.
    • Today, many types of multimedia data are available in current systems:
      - Text
      - Graphics
      - Images
      - Animation
      - Video
      - Audio

    Nature of Multimedia Applications

    • Multimedia data may be stored, delivered, and utilized in many different ways.
    • Applications may be categorized based on their data management characteristics:
      - Repository applications: a large amount of multimedia data, as well as metadata, is stored for retrieval purposes.
      - Presentation applications: simple multimedia viewing of video or audio data.
      - Collaborative work using multimedia information: for example, engineers may execute a complex design task by merging drawings, fitting subjects to design constraints, and generating new documentation, change notifications, and so forth.

    Data Management Issues - 1

    • Multimedia applications dealing with thousands of images, documents, audio and video segments, and free text data depend critically on:
      - Appropriate modeling of the structure and content of the data.
      - Designing appropriate database schemas for storing and retrieving multimedia information.

    Data Management Issues - 2

    • Multimedia information systems are very complex and embrace a large set of issues:
      - Modeling: complex objects, and dealing with a large number of data types (for example, graphics).
      - Design: the conceptual, logical, and physical design of multimedia databases has not been addressed fully and remains an area of active research.
      - Storage: storing multimedia data on standard disk devices presents problems of representation, compression, mapping to device hierarchies, archiving, and buffering during input/output operations. DBMSs have introduced the BLOB (Binary Large Object) type for this purpose.

    Data Management Issues - 3

    • Multimedia information systems are very complex and embrace a large set of issues (cont.):
      - Queries and retrieval: the database way of retrieving information is based on query languages and internal index structures.
      - Performance: for multimedia applications involving only documents and text, performance constraints are subjectively determined by the user. For applications involving video playback or audio-video synchronization, physical limitations dominate.

    Multimedia Database Applications

    • Documents and records management
    • Knowledge dissemination
    • Education and training
    • Marketing, advertising, retailing, entertainment, and travel
    • Real-time control and monitoring

    3. Geographic Information Systems (GIS)

    • Geographic information systems (GIS): a systematic integration of hardware and software for capturing, storing, displaying, updating, manipulating, and analyzing spatial data.

    • GIS data can be divided into two formats:
      - Vector data represents geometric objects such as points, lines, and polygons.
      - Raster data is characterized as an array of points, where each point represents the value of an attribute for a real-world location.
    • Informally, a raster image is an n-dimensional array in which each entry is a unit of the image and represents an attribute.
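
To make the two formats concrete, here is a minimal Python sketch. The coordinates, the elevation attribute, and the helper `cell_value` are invented for illustration; real GIS software uses far richer structures.

```python
# Vector data: geometric objects described by their coordinates.
point = (2.0, 3.0)
line = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # ring closed implicitly

# Raster data: a 2-dimensional array where each cell holds the value of an
# attribute (here, a hypothetical elevation in meters) for one location.
elevation = [
    [120, 125, 130],
    [118, 122, 128],
    [115, 119, 124],
]


def cell_value(raster, row, col):
    """Return the attribute stored for the raster cell at (row, col)."""
    return raster[row][col]


sample = cell_value(elevation, 1, 2)  # attribute of the cell in row 1, column 2
```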

    Characteristics of Data in GIS

    • Several aspects of geographic objects need to be considered:
      - Location
      - Temporality
      - Complex spatial features
      - Object ID
      - Data quality

    Characteristics of Data in GIS

    • The geographic context, topological relations, and other spatial relationships are fundamentally important for defining spatial integrity rules.

    Constraints in GIS

    • Topology integrity: deals with the behavior of features and the spatial relationships between them.
    • Semantic integrity: deals with the meaning.
    • User-defined integrity: business rules.
    • Temporal: punctual and durable.

    Conceptual Data Models for GIS

    • A brief look at the common conceptual models for storing spatial data in GIS.
    • Some conceptual data models:
      - Raster data model: used for analytical applications.
      - Vector data model: analysis is done using a well-defined set of tools.

    Conceptual Data Models for GIS

    • Some conceptual data models (cont.):
      - Network model: defines how lines connect to each other at a point. The rules are stored in a connectivity table. An everyday application is optimizing a school bus route.
      - TIN (Triangular Irregular Network) data model: a vector-based approach that models surfaces by connecting sample points as vertices of triangles.
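
As a rough illustration of the network model, the sketch below stores a connectivity table and searches it for a route. The segment names, node labels, and function names are hypothetical. Breadth-first search is used only as a simple stand-in: it finds the route with the fewest segments, whereas real route optimization would weight segments by length or travel time.

```python
from collections import deque

# Hypothetical connectivity table: each line (street segment) joins two nodes.
segments = {
    "s1": ("A", "B"),
    "s2": ("B", "C"),
    "s3": ("A", "D"),
    "s4": ("D", "C"),
    "s5": ("C", "E"),
}


def build_adjacency(segments):
    """Turn the connectivity table into a node -> neighbors mapping."""
    adjacency = {}
    for start, end in segments.values():
        adjacency.setdefault(start, []).append(end)
        adjacency.setdefault(end, []).append(start)
    return adjacency


def fewest_segments_route(segments, source, target):
    """Breadth-first search for a route that crosses the fewest segments."""
    adjacency = build_adjacency(segments)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            route = []
            while node is not None:   # walk back to the source
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # target is not reachable from source


route = fewest_segments_route(segments, "A", "E")
```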

    DBMS Enhancements for GIS

    • Until the mid-1990s, GIS was based mainly on file-based systems.
    • No transfer standards were defined, which limited vendors in terms of sharing.
    • Only the attributes of a geo-structure were stored in a DBMS; the spatial features were kept in files and linked to the attributes.
    • As a result, GIS could not take full advantage of commercial RDBMSs.
    • Vendors have since released database extensions to support GIS data, such as DB2 Spatial Extender, Oracle Spatial, and Oracle Locator.
    • These extensions allow the user to store, manage, and retrieve geo-objects.

    GIS Standards and Operations

    Standard spatial analysis operations:
    • Distance: returns the shortest distance between any two points in two geometries.
    • Buffer: returns a geometry that represents all points whose distance from the given geometry is less than or equal to a given distance.
    • Convex hull.
    • Union.
    • And more.
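
The Distance and Buffer definitions above can be illustrated with a small Python sketch. It assumes, for simplicity, that a geometry is just a finite list of (x, y) points in a plane; spatial extensions also handle lines and polygons. The function names `distance` and `in_buffer` and the sample coordinates are hypothetical.

```python
import math


def distance(geom_a, geom_b):
    """Shortest distance between any point of geom_a and any point of geom_b."""
    return min(math.dist(p, q) for p in geom_a for q in geom_b)


def in_buffer(point, geom, radius):
    """True if point is within radius of geom, i.e. lies inside its buffer."""
    return distance([point], geom) <= radius


wells = [(0.0, 0.0), (3.0, 4.0)]
towns = [(6.0, 8.0), (10.0, 0.0)]

nearest = distance(wells, towns)              # closest well/town pair
close = in_buffer((3.0, 5.0), wells, 1.0)     # exactly 1.0 away from (3.0, 4.0)
far = in_buffer((3.0, 6.0), wells, 1.0)       # 2.0 away from the nearest well
```

Rather than materializing the buffer as a new geometry, the sketch only tests membership in it, which is enough to show the "less than or equal to a given distance" rule.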

    GIS Standards and Operations

    CREATE TABLE STATES (
        Sname       VARCHAR(50) NOT NULL,
        State_shape POLYGON     NOT NULL,
        Country     VARCHAR(50) NOT NULL,
        PRIMARY KEY (Sname),
        FOREIGN KEY (Country) REFERENCES COUNTRIES (Cname)
    );

    SELECT Sname
    FROM STATES
    WHERE AREA(State_shape) > 50000;
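
To show what an area-based filter like the query above computes, here is a Python sketch using the shoelace formula for a simple polygon in planar coordinates. This is an assumption made for illustration: the slides do not say how AREA is implemented, and a spatial extension would typically account for the coordinate system. The state names and shapes are made up.

```python
def polygon_area(vertices):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical rows of the STATES table: name -> polygon vertices.
states = {
    "Squareland": [(0, 0), (300, 0), (300, 300), (0, 300)],
    "Smallville": [(0, 0), (100, 0), (100, 100), (0, 100)],
}

# Equivalent of: SELECT Sname FROM STATES WHERE AREA(State_shape) > 50000
large_states = [name for name, shape in states.items()
                if polygon_area(shape) > 50000]
```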

    Future of GIS

    There are some challenges in developing GIS applications:
    • Data sources
    • Data models
    • Standards
    • Mobile GIS
    • Specialized DBMSs for GIS

    4. Genome Data Management

    Biological Sciences and Genetics (1):

    The biological sciences encompass an enormous variety of information. Environmental science gives us a view of how species live and interact in a world filled with natural phenomena. Histology and cell biology delve into the tissue and cellular levels and provide knowledge about the inner structure and function of the cell. This wealth of information, which has been generated, classified, and stored for centuries, has only recently become a major application of database technology.

    Genetics has emerged as an ideal field for the application of information technology. In a broad sense, it can be thought of as the construction of models based on information about genes (which can be defined as units of heredity) and populations, and the seeking out of relationships in that information.

    Biological Sciences and Genetics (2):

    The study of genetics can be divided into three branches:
    1. Mendelian genetics is the study of the transmission of traits between generations.
    2. Molecular genetics is the study of the chemical structure and function of genes at the molecular level.
    3. Population genetics is the study of how genetic information varies across populations of organisms.

    Biological data exhibits many special characteristics that make the management of biological information a particularly challenging problem. The characteristics listed here relate to biological information, with a focus on the multidisciplinary field called bioinformatics that has emerged. Bioinformatics addresses the management of genetic information, with special emphasis on DNA sequence analysis.

    Applications of bioinformatics span the design of targets for drugs, the study of mutations and related diseases, anthropological investigations of the migration patterns of tribes, and therapeutic treatments.

    Characteristic 1: Biological data is highly complex when compared with most other domains or applications.
    Characteristic 2: The amount and range of variability in data is high.
    Characteristic 3: Schemas in biological databases change at a rapid pace.

    Characteristic 4: Representations of the same data by different biologists will likely be different (even when using the same system).
    Characteristic 5: Most users of biological data do not require write access to the database; read-only access is adequate.
    Characteristic 6: Most biologists are not likely to have knowledge of the internal structure of the database or of schema design.
    Characteristic 7: The context of data gives added meaning for its use in biological applications.
    Characteristic 8: Defining and representing complex queries is extremely important to the biologist.
    Characteristic 9: Users of biological information often require access to old values of the data, particularly when verifying previously reported results.

    GenBank

    • As of release 135.0 in April 2003, GenBank contained over 31 billion nucleotide bases in more than 24 million sequences from over 100,000 species, with roughly 1,400 new organisms being added each month.
    • The database size in flat-file format was over 100 GB uncompressed and had been doubling every 15 months.
    • Data is exchanged on a daily basis through an international collaboration with the European Molecular Biology Laboratory (EMBL) in the U.K. and the DNA Data Bank of Japan (DDBJ).
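
The "doubling every 15 months" figure implies exponential growth, which the following Python sketch spells out. It is a pure extrapolation from the two numbers on this slide (100 GB, 15 months), not a reported GenBank size; the function name and parameters are hypothetical.

```python
def projected_size_gb(months_elapsed, start_gb=100.0, doubling_months=15.0):
    """Size after months_elapsed if the doubling rate stayed constant."""
    return start_gb * 2 ** (months_elapsed / doubling_months)


after_15_months = projected_size_gb(15)   # one doubling period
after_30_months = projected_size_gb(30)   # two doubling periods
```

Under this assumption the size follows start_gb * 2^(t / 15), with t in months since April 2003.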

    • Other limited data sources (e.g., three-dimensional structure and Online Mendelian Inheritance in Man (OMIM)) have been added recently by reformatting the existing OMIM and PDB databases and redesigning the structure of the GenBank system to accommodate these new data sets.
    • The system is maintained as a combination of flat files, relational databases, and files containing Abstract Syntax Notation One (ASN.1).
    • The average user of the database is not able to access the structure of the data directly for querying or other functions, although complete snapshots of the database are available for export in a number of formats, including ASN.1.

    | Database name | Major content | Initial technology | Current technology | DB problem areas | Primary data types |
    |---|---|---|---|---|---|
    | GenBank | DNA/RNA sequence, protein | Text files | Flat file/ASN.1 | Schema browsing, schema evolution, linking to other DBs | Text, numeric, some complex types |
    | OMIM | Disease phenotypes and genotypes, etc. | Index cards/text files | Flat file/ASN.1 | Unstructured, free-text entries, linking to other DBs | Text |
    | GDB | Genetic map linkage data | Flat file | Relational | Schema expansion/evolution, complex objects, linking to other DBs | Text, numeric |
    | ACEDB | Genetic map linkage data, sequence data (non-human) | OO | OO | Schema expansion/evolution, linking to other DBs | Text, numeric |
    | HGMDB | Sequence and sequence variants | Flat file, application specific | Flat file, application specific | Schema expansion/evolution, linking to other DBs | Text |
    | EcoCyc | Biochemical reactions and pathways | OO | OO | Locked into class hierarchy, schema evolution | Complex types, text, numeric |

    Thanks