cxquery (chamois xquery) and its applications

48
2005-09-26 H.S.Yong, EWU. 1 CXquery (Chamois Xquer y) and its Applications Hwan-Seung Yong ( 용 용용 , 용 용용 ) Dept. of Computer Science and Engineering 梨梨梨梨梨梨梨 (Ewha Womans Univ.) Seoul, 梨梨梨梨

Upload: zeus-webster

Post on 02-Jan-2016

16 views

Category:

Documents


0 download

DESCRIPTION

CXquery (Chamois Xquery) and its Applications. Hwan-Seung Yong ( 용 환승 , 龍 煥昇 ) Dept. of Computer Science and Engineering 梨花女子大學校 (Ewha Womans Univ.) Seoul, 大韓民國. Contents. Motivations of CXquery: Structure Agnostic Query Query Processing Issues Developing CXquery System Experience - PowerPoint PPT Presentation

TRANSCRIPT

  • CXquery (Chamois Xquery) and its ApplicationsHwan-Seung Yong ( , )

    Dept. of Computer Science and Engineering (Ewha Womans Univ.)Seoul,

  • ContentsMotivations of CXquery: Structure Agnostic QueryQuery Processing IssuesDeveloping CXquery System ExperienceCXquery to Xquery ConversionCXquery to XML Stream Query ProcessingFinal Remarks

  • What is CXqueryCXquery: Chamois XqueryChamoisProject name for Component Based Knowledge Engineering System Framework in Ewha lead by Dr. Won Kim [IEEE 2002]Chamois is an antelope name living Alps MountainThis animal requires short steps to leap highCXquery is same with Xquery except oneWe dont need Xpath composing conditions.Only use element/attribute name

  • BackgroundIn RDBquery is made using schema (relation name, attribute name) with Constant Values for condition checkingSchema is relatively simple structureEasy to learn query and can be used by end userIn OO and ORDBSchema have complex structureSo query composition and design is very hard taskOnly professionals doIn XML, what happen?

  • XML and Query issuesHard to compose like OO/OR caseTry to design SQL like XML query until nowXquery, Xpath: W3C standardIs it SQL like? XML even allow data with no schema (DTD unknown)How do we make query?Natural language query for RDBFor easy of useHow about natural language query for XML?Or how about semi-natural language query

  • Some aspects on XMLXML is a meta language for encoding domain information (book, movie, music, product, company, math, chemistry etc)There need XML standard for each domain worldwide (MathML, CML, BioXML etc)But not yet enoughThis means data from equal domain can be encoded using different DTDThere can be many kind of movie DTDs, music DTDs.Xquery have to follow DTDs, so same query can be expressed by different Xquery

  • DTD Design Choices of same dataElement representationAttribute representationNested representationCombinations of the above

  • Element representation: 1st DTD TypeXquery depends on DTD structure elements representation

  • Attribute representation: 2nd DTD Typefor $m in doc ()//moviewhere $m/*[@genre = action] and $m/*[@year = 1994] and $m/*[actor = Jean Reno]return $m/*/@title attributes representation

  • Nested representation: Third DTD Typefor $y in doc ()//yearwhere $y[text() = 1994] // genre[text() = action] // actor[text()= Jean Reno]return $y//title/text() 1994 America action Leon Luc Besson Jean Reno

    nested representation

  • Combinations: Fourth DTD Typefor $g in doc ()//genrewhere $g[@* = action] // year[*= 1994] // actor[text() = Jean Reno]return $g//title/text()

    1994 Leon

    nested + attributes + elementsrepresentation

  • Independence and DBMSBut in XML in heterogeneous(?) distributed environment Each Xquery seriously depends on its DTDWithout defining single DTD and XML data conversion, we have to make different Xquery

  • Real SQL like XML queryRather use XPath expressions/continent/country/state/city/name = Kyoto //city/name=Kyoto//city//* = Kyoto//city/@name=KyotoJust use name=KyotoJust use element or attribute name instead of XpathFind information about city named KyotoNatural query requires heav semantic processing

  • CXquery ApproachAssumption User have to know exact tag name (Element/Attribute) and valuesUser didnt know the structure (DTD) of XML Query ExampleSearch for movie titles whose genre is action, release year is 1994, and whose stars include Jean Reno genre = action and year = 1994 and actor = Jean Reno for $t in doc()//titlewhere genre = action and year = 1994 and actor = Jean Renoreturn $tApply to XQuery

  • ContentsMotivations of CXquery: Structure Agnostic QueryQuery Processing IssuesDeveloping CXquery System ExperienceCXquery to Xquery ConversionCXquery to XML Stream Query ProcessingFinal Remarks

  • Four Query Processing IssuesFirst, similarity matching is requiredIn an environment where the schema or DTD of XML documents is not precisely known or fuzzy (approximate) search is done, even the precise names of the elements and attributes may not be known. Use thesaurus based matching E.g) for the element names actor, genre, and year, the query processor may also need to search for names such as performer, category, and date, respectively.

  • Query Processing IssuesSecond, heterogeneous representation of same content in XMLintervening elements and/or an attribute between an element name and its corresponding value. Figure (a): One example DTD: element representationFigure (b): type intervenes between genre and action, name intervenes between actor and Jean Reno, and yyyy intervenes between year and 1994. Figure (c), genre, year, and actor are represented as attributesFigure (d), genre, year, and actor are represented as elements but their values are represented as attribute values.This introduces significant implementation difficultiesProcessor should consider all possible representations.

  • (a)(b)(c)(d)

  • Query Processing IssuesThird, intervening elements ()and/or an attribute between an element and its corresponding valueleads to semantic uncertainty in the association between the element and the value.

    Jean Reno is the value associated with the element or attribute family of actor. blind binding of actor to Jean Reno is possble, declare that the search predicate actor = Jean Reno is trueEx) Jean Reno Semantic correctness may be in question !!!

  • Query Processing IssuesFourth, identification of nearest common ancestor (NCA) is neededof all element and attribute names that appear in the search predicates For query-processing optimizationFor preventing erroneous results

  • Query Processing IssuesHowever, the problem is difficult since the structure of the XML hierarchy is not specified in CXqueryEx) NCA of year, genre, and actor

  • ContentsMotivations of CXquery: Structure Agnostic QueryQuery Processing IssuesDeveloping CXquery System ExperienceCXquery to Xquery ConversionCXquery to XML Stream Query ProcessingFinal Remarks

  • One Approach to support CXquery

    Implementation genre =action AND year =1993 AND actor =Tommy Lee JonesStructure??Condition clause:Result clause:title

  • One Approach to support CXqueryImplementation of XML Server based on CXquerySpecial Indexing is usedNode index: all element and attribute nameValue index: all constant value in XMLAll node and value numbering to find their structural relationshipIndices are stored using RDB

    Performance evaluation shows promising result. [ISMIS 2005]

  • One ApproachQuery processor should drive all paths among the names and values. Identification of name and value relationshipIdentification of relationship between namesClassification of all possible paths XML can have is investigatedgenre =action AND year = 1993 AND actor =Tommy Lee Jones

  • To search all possible paths, node numbering scheme is used for each node in XML One Approach

    Leon Luc Besson Jean Reno Natalie Portman . .

  • Node numbering to identify relationship

  • Processing flow diagram overview Implement an XML-server to evaluate the performance of the query expression

  • ContentsMotivations of CXquery: Structure Agnostic QueryQuery Processing IssuesDeveloping CXquery System ExperienceCXquery to Xquery ConversionCXquery to XML Stream Query ProcessingFinal Remarks

  • CXquery to Xquery ConversionSystem Diagram OverviewCXQueryUserCXQuery to XqueryConverterXML Server

    (eXist 1.0)DTD 1 XMLDTD 2XMLXML DBResult XML DocumentDTD/ResultXquery

  • CXquery to Xquery Converter Set of Xquery should be generated for one CXquery based on number of different DTDFor $c in doc()Where genre=action AND actor=Jean RenoReturn titleFor $c in /moviesWhere $c/genre=action AND $c/actor=Jean RenoReturn $c/titleCXQueryXQuery

  • Xquery for each DTD type

  • ContentsMotivations of CXquery: Structure Agnostic QueryQuery Processing IssuesDeveloping CXquery System ExperienceCXquery to Xquery ConversionCXquery to XML Stream Query ProcessingFinal Remarks

  • System Flow diagramInputCXQueryFileProcessingYfilter XML Stream EngineOutputPath GeneratorCXQuery ConverterXMLXML StreamDTD FileDTD Path SetCXQueriesXML SteamXpath Queris

  • CXquery to Xquery Conversion/cities/city/name/cities/city/latitude/cities/city/population/cities/city/located_at/cities/city[@is_country_cap]/cities/city[@is_state_cap]/cities/city/population[@year]/cities/city/located_at[@watertype]path_mondial-cities.txtCXQ1:is_country_cap="yes" or latitudeCXQ2:car_code="MK and area="25333"CXQ3:name="Caspian Sea" or area="17000"CXQ4:latitudeCXQ5:ethnicgroupsCXQ6:nameCXQ7:country="Korea"(a) DTD Path Set(b) CXQuery/cities/city/name/cities/city/latitude/cities/city[@is_country_cap](c) Xpath ser from (a) and (b)/cities/city/latitude/cities/city[@is_country_cap=yes]/cities/city[name=Caspian Sea]/cities/city/name(d) Xquery Generation

  • Xquery Conversion for each DTDs

  • Implementation ResultCXquery Example

  • Matching process of CXquery with Path Set

  • Xquery Conversion results

  • Example 6 CXquery

  • Converted 13 Xquery

  • CXquery for distributed XML serversIn heterogeneous DBMS environmentSingle standard schema is required in central serverQuery translation is requiredQuery on Standard schema translated into sites schemaDistributed CXquery environmentWe dont need standard XML schema but collection of Each Sites DTD is enoughUser only compose query using CXqueryCXquery has DTD neutral propertyCentral site then convert CXquery to sites Xquery and collect result.

  • Heterogeneous XML stream query processingStream data is increasingRSS stream, news stream, stock trading, sensor stream, multimedia stream etc.Stream processing engine is neededHandle large number of heterogeneous XML stream concurrentlyHow do we use single stream query on this multiple heterogeous streamsQuery translation for each stream and processing differently?Apply single CXquery to multiple heterogeneous stream.

  • Final Remarks CXquery having no path is introduced This area of research need more works from now onTechnical issues for future researchElement/Attribute Value Association is required to solve Semantic ambiguity problemName = Kyoto vs name=Tanaka vs name = Winter Sonata

  • Final Remarks Possible approachesDefine DTD tag name more specificallyCityname = Kyoto vs person-name=Tanaka vs Movie-name = Winter Sonata System can resolve domain conflict exactly through data mining etc.Kyoto represents city name, Tanaka represents Person name etc.User specify exact domain name for all constantsName = Kyoto[City], name=Tanaka[Person] name=Winter Sonata[Movie]XML extension is required

  • Thank you for your attention Questions?

  • ReferencesISMIS 2005] Wol Young Lee, Hwan Seung Yong, "A Query Expression and Processing Technique for an XML Search Engine," ISMIS 2005: 15th International Symposium on Methodologies for Intelligent Systems, Saratoga Springs, NY, USA, May 2005.pp.266-275. [JOT 2004b]Won Kim, Wol Young Lee, Hwan Seung Yong, "On Query-Processing Issues for Non-Navigational Queries for XML," in Journal of Object Technology, Vol.3, No. 10, November-December 2004, pp. 19-26. [JOT 2004a] Won Kim, Wol Young Lee, Hwan Seung Yong, On Supporting Structure-Agnostic Queries for XML, in Journal of Object Technology, Vol.3, No.7, July-August 2004, pp.27-35 , [JOT 2002] Won Kim et al., "The Chamois Reconfigurable Data-Mining Architecture, " Journal of Object Technology, Vol. 1, No. 2, July-August 2002, pp.2-10. [IEEE 2002] Won Kim et al., "Chamois: A Component-Based Knowledge Engineering Framework," IEEE Computer, Vol. 35, No. 5, May 2002, pp. 46-54.

    For the work, first of all, we design a semiFor example if we want to search movie title of whichwe express as follows.This expression represents only names of data and their values regardless of structure.We call names of data mandatory components.Second, we compute all possible paths among mandatory components in advance because we express only mandatory components and their values.All possible paths are divided into two cases. One is a case of using only mandatory components, another is a case of inserting other components between mandatory components.This figure shows paths of two cases.And then we develop .For instance, we model this documents as a treeAnd then we assign a unique identifier to each component in a tree and store the identifiers and components into a relational database. And then we make a query processor return query results of X-NPE.I have implemented an XML server to evaluate the performance of Experimental system architecture is the following. A processor for XML documents and X-NPE and conventional path-based query processor.The processor for XML documents consists of document parser and index constructor.The processor for query expressions consists of parser, SQL translator, query processor, and result constructor.