a new semantic model with applications in a multimedia database system



CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2009; 21:691–704
Published online 1 September 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.1373

A new semantic model with applications in a multimedia database system

Qing Li1,∗,†, Na Li1, Liping Wang2 and Xiaoping Sun3

1Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Hong Kong, China
2School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
3Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

SUMMARY

When people realized that relational databases fall short in supporting advanced applications including multimedia data management due to the limited modeling power of the relational data model, researchers went ahead with devising semantic, object-oriented data models in the 1980s (until early 1990s). While the later commercial development of database systems has led to the so-called object-relational databases since late 1990s, such a marriage of the two does not actually solve the problems encountered by multimedia data management. In this paper, we present a new semantic multimedia database model based on an extension to the traditional ANSI/SPARC three-level architecture, attempting to cater for the unique requirements of multimedia data management. Various facilities of this semantic multimedia database model are described and discussed. A number of applications have been developed based on this new model, and in this paper we shall describe some of these including recipe modeling and graph mining for retrieval. Copyright © 2008 John Wiley & Sons, Ltd.

Received 7 June 2008; Accepted 7 June 2008

KEY WORDS: multimedia database; MediaView; recipe model; similarity measure

∗Correspondence to: Qing Li, Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Hong Kong, China.

†E-mail: [email protected]

Contract/grant sponsor: City University of Hong Kong; contract/grant numbers: 7001815, 7001956
Contract/grant sponsor: National Basic Research Program of China; contract/grant number: 2003CB317000

Copyright © 2008 John Wiley & Sons, Ltd.


1. INTRODUCTION

With the significant growth of multimedia data such as images and video clips on the Web, how to manage, retrieve and navigate these data precisely and efficiently has become a critical problem. The dynamic nature of multimedia is fundamentally different from that of traditional alphanumeric data, whose semantics is explicit, unique, and self-contained. This distinction explains why traditional data models fail to characterize the semantics of multimedia data. For example, in a conventional (strongly typed) object-oriented model, each object statically belongs to exactly one type, which prescribes the attributes and behaviors of the object. This conflicts with the context-dependent nature of a media object, which needs to switch dynamically among various types depending on the specific context. Moreover, a conventional object model can hardly capture media independence, which requires media objects of different types to have some attributes and methods defined in common.

The inability to model multimedia semantics severely undermines the usefulness of a database in supporting semantics-intensive multimedia applications. This problem is the major motivation for MediaView as an extended object-oriented view mechanism. As illustrated in Figure 1, MediaView bridges this ‘semantic gap’ by introducing, above the traditional three-schema database architecture, an additional layer constituted by a set of modeling constructs named media views. Each media view, defined as an extended object view, formulates a customized context in which the dynamic and elusive semantics of media objects are properly interpreted.

A brief comparison of some existing constructs in an object model with our MediaView is given in Section 2. The basic concepts of MediaView are defined in Section 3, where view construction, evolution and customization are presented. In Section 4 we demonstrate a real-world application, namely a multimedia recipe analysis system, which is modeled by MediaView. Section 5 presents several experimental results obtained on our system, and Section 6 concludes the paper.

Figure 1. MediaView as a ‘semantic bridge’.


2. COMPARISON TO RELATED WORK

In the following, we compare MediaView with three existing constructs of object models (class, object view, and composite object) in order to clarify the position of our work in the framework of object models.

Class: Similar to the extent of a class, a media view contains a set of objects as its members, and it can apply (member-level) properties to them to describe their structural and behavioral characteristics. However, a media view differs from a class in several respects, in particular:

1. a media view can accommodate heterogeneous objects, whereas a class only holds a set of uniform objects.
2. a media view can only dynamically include/exclude objects that are instances of source class(es), and does not create new objects.
3. while an object must belong to exactly one class, it can be included in an arbitrary number of media views.
4. a media view models the semantic relationships, and consequently the interaction, between its members, which is not supported by a class.
5. the global feature of a media view is captured by its view-level properties, another feature not supported by a class.

Object view: Over the past decade, numerous object-oriented view mechanisms have been proposed (e.g. [1,2]). Generally, an object view can be regarded as a virtual class derived by a query over classes [3]. In fact, an object view is almost a class except that its instances are selected from the instances of other classes, and in this regard it is closer (compared with a class) to our MediaView. However, except for point 2, the remaining points on the difference between a media view and a class hold for a conventional object view as well. Furthermore, with the ability to assign new properties to its members, a media view is more powerful than a conventional view, whose properties are inherited or derived from classes (e.g. deriving the area of a circle object from its diameter).

Admittedly, with these new features added, a media view can hardly be classified as an object view from a conventional point of view (and MediaView is no longer just a view mechanism), although our initial thought was to adapt an object view for multimedia data. In this paper, we keep the term ‘view’ on the grounds that (i) structurally, media views sit between the conceptual schema and the applications, the position where views usually reside, and (ii) functionally, they are used to provide a customized view of the data for a certain application.

Composite object: From another perspective, a media view can be regarded as an extended composite object, which maintains two lists of object references: one list keeps the members of the media view, and the other keeps all the relationships (which are implemented as objects) between members. As a composite object, a media view naturally allows dynamic insertion/removal of its members and relationships. The view-level properties correspond to the properties of the composite object. The major difference between them, however, is that a media view can define properties for its members, whereas a composite object cannot.

3. FUNDAMENTALS OF MEDIAVIEW

A media view named $MV_i$ is represented by a tuple of four elements:

$$MV_i = \langle M_i, P^v_i, P^m_i, R_i \rangle$$


Figure 2. Example of classes and a media view.

1. $M_i$ is a set of objects that are included in $MV_i$ as its members. Each object $o \in M_i$ belongs to a certain source class, and different members of $MV_i$ may belong to different source classes.

2. $P^v_i$ is a set of view-level properties (attributes and methods) applied to $MV_i$ itself.

3. $P^m_i$ is a set of member-level properties (attributes and methods), which are applied to all the members of $MV_i$.

4. $R_i$ is a set of relationships, and each $r \in R_i$ is of the form $\langle o_j, o_k, t \rangle$, which denotes a relationship of type $t$ between members $o_j$ and $o_k$ in $MV_i$.

The relationship between classes and a media view is exemplified in Figure 2. A set of classes is defined to model media objects of different types (Figure 2(a)). Figure 2(b) illustrates an example media view called DBMS. Each member of this media view is a media object that is about a specific DBMS product, such as a JPEG image illustrating a DBMS. Different from the properties defined in their source classes, their properties in the media view focus on the semantic aspects of media objects. Moreover, a view-level property, definition, is used to describe the global property of the media view itself (i.e. the definition of a DBMS). Different types of semantic relationships may exist between the view members. For example, the ‘speech-slide’ relationship between the Speech object and the Slide object denotes that the speech accompanies the slide.
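To make the four-element tuple more concrete, the following Python sketch models a media view as a plain container. It is only an illustration under stated assumptions: the class names `MediaObject` and `MediaView`, the method names, the object identifiers, the `topic` member-level property and the text of the `definition` value are all hypothetical and do not come from the MediaView implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple


@dataclass(frozen=True)
class MediaObject:
    """A media object that lives in a source class (e.g. Image, Speech, Slide)."""
    oid: str
    source_class: str


@dataclass
class MediaView:
    """Container mirroring MV_i = <M_i, P^v_i, P^m_i, R_i>."""
    name: str
    members: Set[MediaObject] = field(default_factory=set)                  # M_i
    view_props: Dict[str, Any] = field(default_factory=dict)                # P^v_i
    member_props: Dict[MediaObject, Dict[str, Any]] = field(default_factory=dict)  # P^m_i values
    relationships: Set[Tuple[MediaObject, MediaObject, str]] = field(default_factory=set)  # R_i

    def include(self, obj: MediaObject, **props: Any) -> None:
        """Include an existing object; the view never creates objects itself."""
        self.members.add(obj)
        self.member_props[obj] = dict(props)

    def exclude(self, obj: MediaObject) -> None:
        """Exclude a member together with the relationships that involve it."""
        self.members.discard(obj)
        self.member_props.pop(obj, None)
        self.relationships = {r for r in self.relationships if obj not in (r[0], r[1])}

    def relate(self, a: MediaObject, b: MediaObject, rel_type: str) -> None:
        """Record a relationship <o_j, o_k, t> between two members."""
        if a not in self.members or b not in self.members:
            raise ValueError("both objects must be members of the view")
        self.relationships.add((a, b, rel_type))


# Heterogeneous members drawn from different source classes.
speech = MediaObject("speech-01", "Speech")
slide = MediaObject("slide-01", "Slide")
image = MediaObject("img-01", "Image")

dbms = MediaView("DBMS", view_props={"definition": "software for managing databases"})
for obj in (speech, slide, image):
    dbms.include(obj, topic="a DBMS product")  # hypothetical member-level property
dbms.relate(speech, slide, "speech-slide")
```

Because `MediaObject` is hashable and owned by its source class rather than by the view, the same object can be included in any number of views, which matches the contrast with classes drawn in Section 2.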

3.1. MediaView construction

We synthesize existing information processing technologies to construct the links between MediaView and concrete media data. A multi-system approach is used in our framework, and previous query results are accumulated. Furthermore, we attach the MediaView Engine to various keyword-based CBIR systems in order to acquire, from these well-designed IR technologies, the knowledge of semantic links between media contents and contexts (queries), as Figure 3 shows.


Figure 3. The system architecture for MediaView construction.

Figure 4. Projection in a sub semantic space.

Because different queries may vary greatly with the liberty of choosing query keywords, we need an approach to organize this knowledge into a logical structure for future use. Semantics in the MediaView framework is organized by following WordNet [4], where a variety of semantic relationships are defined between word meanings, represented as pointers between synsets. WordNet is divided into five categories: noun, verb, adjective, adverb, and function word. The hyponymy relationship organizes the meanings of nouns into a hierarchical structure. A context can be represented by a single concept, e.g. ‘flower’, or by a combination of concepts, e.g. ‘Van Gogh’s painting’. We call a context that can be represented by a single concept a simple context. The collection of media views corresponding to all simple contexts, organized according to the hierarchical structure of WordNet, constitutes the basic architecture of the MediaView framework. These media views are called common media views.

A multi-dimensional semantic space exists under a concept, denoted as ‘superconcept’ in Figure 4, if there are several sub-concepts related to that concept. For example, the concept ‘Season’ has a 4-dimensional semantic space [‘spring’, ‘summer’, ‘autumn’, ‘winter’]. If we know that some media object is relevant to a superconcept, it must be relevant to at least one of the sub-concepts.

Figure 5. Two feedback sources of MediaView.

3.2. MediaView evolution through feedback

The media views stored in the database are accumulated along with the processes of user interaction. Two kinds of feedback, system-feedback and user-feedback, are utilized in the evolution (Figure 5).

System-feedback: The multiple retrieval systems become a feedback source to evolve the MediaView engine. By analyzing the retrieval result of each query, we learn more about the semantics of the retrieved media objects.

User-feedback: Users’ identification of the relevant and irrelevant results of a query helps the systems to improve the search performance.

On the belief that the more often a media object is considered relevant, the more confidence it carries of being relevant to the context, we provide a fuzzy logic-based evolution mechanism to accumulate the effects of both system and user feedback.

If a media object is selected to match a sub-concept, it is certainly a match for all of its super-concepts. Hence, the effect of a media object is propagated bottom-up to update the confidence of its upper concepts.
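The accumulation and bottom-up propagation described above can be sketched as follows. The text does not give the actual fuzzy update rule, so this sketch assumes one: each positive feedback event carries a weight in [0, 1] and is combined with the stored confidence by the probabilistic sum (a common fuzzy OR). The `PARENT` hierarchy fragment, the weights and the object identifier are hypothetical, and negative (irrelevant) feedback is not modeled here.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical hyponymy fragment: sub-concept -> super-concept.
PARENT: Dict[str, Optional[str]] = {
    "spring": "season",
    "summer": "season",
    "autumn": "season",
    "winter": "season",
    "season": None,
}

# confidence[concept][object_id] is a value in [0, 1].
confidence: Dict[str, Dict[str, float]] = defaultdict(dict)


def fuzzy_or(a: float, b: float) -> float:
    """Probabilistic sum; an assumed choice of fuzzy OR, not the paper's rule."""
    return a + b - a * b


def record_feedback(concept: str, obj_id: str, weight: float) -> None:
    """Accumulate one positive feedback event and propagate it to all super-concepts."""
    node: Optional[str] = concept
    while node is not None:
        old = confidence[node].get(obj_id, 0.0)
        confidence[node][obj_id] = fuzzy_or(old, weight)
        node = PARENT.get(node)


record_feedback("spring", "img42", 0.3)  # e.g. a system-feedback event
record_feedback("spring", "img42", 0.5)  # e.g. a user-feedback event
```

With this rule, repeated feedback can only raise the confidence and never pushes it above 1, which is one way to realize the belief that an object judged relevant more often should carry more confidence.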

3.3. MediaView customization

A two-level MediaView framework is provided for view customization (Figure 6). The first level is composed of common media views, which are permanent in the system and accumulated from the common knowledge of all users. The second level consists of customized media views, which are generated from the common views to accommodate users’ personal interests.


4. CASE STUDY: MULTIMEDIA RECIPE DATABASE

The utilization of our MediaView mechanism can best be demonstrated through concrete real-world applications. In this section, we present a case study of applying our MediaView mechanism to a Web-based recipe management system named RecipeView.

Figure 6. Two-level MediaView framework.

Figure 7. A recipe’s components.

In recent years, there has been an explosive proliferation of recipe data (Figure 7) on the Web. Recipes have distinct characteristics that make most (if not all) of the conventional data models unsuitable/inapplicable for such data, including:

• Loosely structured: Recipes are usually semi-/loosely structured data, with various levels of detail and granularity. Some are more detailed/wordy than others.

• Multimedia flavored: Many recipes on the Web are supplied with pictures showing the ingredients, audio/videos introducing the cooking procedure, and/or photos illustrating the final result.

• Behavior oriented: Recipes are not only data-intensive, but also behavior oriented, in that the main part of a recipe is about the procedure to follow in cooking a dish.

• Constraints bound: Recipes are usually also bound by various constraints, which are applicable to either individual actions or a sequence of actions.

4.1. Recipe model

We adopt and extend the MediaView definition to comprehensively describe recipe data. Here a recipe $R$ is modeled and represented by a media view in the form of three elements:

$$R = \langle M, RP, SP \rangle$$

where

(a) $M = \{M_i \mid i = 1 \ldots m\}$ is a set of ingredients, each given as $M_i = \langle MID, MP \rangle$, where $MID$ is a unique identity of a member ingredient and $MP$ is a set of member-level properties (and functions) such as the name and image of the ingredient. An ingredient $M_i$ belongs to one of three classes: Main, Minor and Seasoning;

(b) $RP$ is a set of recipe-level properties (and functions) applied to $R$ itself, such as cooking style, video clips of the cooking procedure and images of the dish of the recipe;

(c) $SP = (V, E, Time, Cons, Ingr)$ is a ‘Cooking Graph’, which is a labeled directed graph describing the whole cooking procedure of making a dish by following the recipe $R$. In particular,

1. $V = \{v_i \mid i = 1 \ldots n\}$ is a set of vertices. Each $v_i$ represents a cooking action associated with a unique timestamp $Time(v_i)$ indicating the start time of $v_i$, as well as a set of cooking action constraints $Cons(v_i)$ that should be satisfied when the action of $v_i$ takes place. The label $L(v_i)$ is set to the action name of $v_i$.

2. $E$ is a set of edges on $V$, which describes the temporal execution flow of cooking actions. An edge $(v_i, v_j)$ indicates that action $v_j$ should take place after the completion of action $v_i$. These directed edges are named ‘action flows’. Each $(v_i, v_j) \in E$ is associated with cooking transition constraints $Cons(v_i, v_j)$, which indicate the conditions that should be satisfied for the flow to take place.

3. Each vertex $v_i$ is associated with a set of ingredients $Ingr(v_i)$ that should be added when the action of $v_i$ takes place. The ingredients can be raw or the output of other actions; we use $O(v_i)$ to represent the output ingredients of $v_i$. These inputs and outputs for the vertices are called ‘ingredient flows’.

Table I shows the cooking procedure of a sample recipe named ‘Stir-fried beef with broccoli’, which was crawled from the Web by a RecipeCrawler [5,6]. The cooking procedure is parsed into basic actions and their related properties.


Table I. Cooking procedure of ‘Stir-fried beef with broccoli’.

Step#  Recipe cooking procedure in steps
1      Cut beef into thin slices. Marinate with cornstarch and soy sauce for 30 minutes.
2      Mix oyster sauce, soy sauce, cornstarch and water. Cut broccoli into thin slices and crush garlic.
3      Heat oil. Add beef and stir-fry when the oil is medium-hot.
4      Remove the beef when it is nearly cooked.
5      Heat oil. When the oil is hot, add the crushed garlic and stir-fry briefly.
6      Add the broccoli, salt and sugar, stir-fry briefly. Add 1/2 cup water and simmer for 4–5 minutes.
7      Add the beef, the mixed sauce and stir quickly. Then remove.

Figure 8. Cooking graph of ‘Stir-fried beef with broccoli’.

Its cooking graph is illustrated in Figure 8 according to the definition of $SP$. Each vertex represents an action such as ‘cut’ and ‘stir-fry’, while the temporal execution sequence is identified by the action flows. Here we set $Time(v_i) < Time(v_j)$ for any $i < j$. The ingredient flow indicates that the ingredients needed in an action are provided by the preceding actions or directly from the raw ingredients $M$. Here beef and broccoli are considered Main ingredients, garlic a Minor ingredient, and salt/soy sauce/oyster sauce Seasoning ingredients.
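As an illustration of the cooking graph $SP = (V, E, Time, Cons, Ingr)$, the sketch below stores vertices, action flows and ingredient flows in plain dictionaries and encodes only step 1 of Table I. The class name `CookingGraph`, its method names, the vertex ids and the intermediate ingredient name "sliced beef" are assumptions made for this example; the encoding is one reading of the step text, not a reproduction of Figure 8.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class CookingGraph:
    labels: Dict[str, str] = field(default_factory=dict)        # L(v): action name
    time: Dict[str, int] = field(default_factory=dict)          # Time(v): start order
    cons: Dict[str, List[str]] = field(default_factory=dict)    # Cons(v): action constraints
    ingr: Dict[str, Set[str]] = field(default_factory=dict)     # Ingr(v): ingredients added
    # Action flows (v_i, v_j) mapped to their transition constraints Cons(v_i, v_j).
    action_flows: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)
    # Ingredient flows as (producer, consumer, ingredient) triples.
    ingredient_flows: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_action(self, vid: str, label: str,
                   ingredients: Iterable[str] = (),
                   constraints: Iterable[str] = ()) -> None:
        self.labels[vid] = label
        self.time[vid] = len(self.time) + 1  # timestamps follow insertion order
        self.cons[vid] = list(constraints)
        self.ingr[vid] = set(ingredients)

    def add_action_flow(self, src: str, dst: str,
                        constraints: Iterable[str] = ()) -> None:
        if self.time[src] >= self.time[dst]:
            raise ValueError("an action flow must go forward in time")
        self.action_flows[(src, dst)] = list(constraints)

    def add_ingredient_flow(self, src: str, dst: str, ingredient: str) -> None:
        """The output of src becomes an input ingredient of dst."""
        self.ingredient_flows.add((src, dst, ingredient))
        self.ingr[dst].add(ingredient)


# Step 1 of Table I: cut the beef, then marinate it for 30 minutes.
g = CookingGraph()
g.add_action("v1", "cut", ingredients={"beef"})
g.add_action("v2", "marinate", ingredients={"cornstarch", "soy sauce"},
             constraints=["30 minutes"])
g.add_action_flow("v1", "v2")
g.add_ingredient_flow("v1", "v2", "sliced beef")
```

Keeping action flows and ingredient flows in separate collections means a pair of vertices can be connected by both kinds of edge at once, which is the multi-edge situation handled later by dummy vertex insertion.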

4.2. Recipe similarity calculation

As the recipe model is built on MediaView, the facilities of MediaView such as view operators and semantic space can also be applied after some modifications. In this section, we mainly introduce how to establish the semantic connections in terms of ‘similarity’ between two recipes.


In total, four factors are considered when calculating the similarity of two recipes $R_1$ and $R_2$:

1. Recipe document.
2. Cooking graph.
3. Recipe image (the weight of this factor is set lowest among the four factors).
4. User preference/feedback (if available).

Document similarity is the most popular and widely investigated similarity measurement in a large part of information retrieval research. Among the various document similarity algorithms, the baseline model is the vector space model (VSM), originally introduced by Gerald Salton and his associates [7–9]. In the VSM, the keywords or index terms are viewed as basis vectors in a linear vector space, and each document is represented as a vector in that space [10]. Search engines adopting the VSM calculate the similarity between documents by operating on those vectors. A traditional and direct measure is the cosine of the angle formed between two vectors. Another issue in the VSM is vector representation. According to Salton [7], each term weight in a document vector is the product of a local and a global parameter, known as the term frequency-inverse document frequency (tf-idf) model. On the other hand, content-based image retrieval has recently received attention, with several systems published for this purpose [11].

By taking advantage of the structure of cooking graphs, we further advocate a novel graph-based similarity calculation method. With it, we are able to retrieve recipes that are similar in terms of cooking process even though their names may be totally different.

Suppose $SP_1$ is the cooking graph of recipe $R_1$ and $SP_2$ is the cooking graph of recipe $R_2$. $SP_1$ and $SP_2$ share $m$ subgraphs $SP_{S_i}$ ($i = 1 \ldots m$), where the discovery of shared subgraphs is discussed in the following subsection. The basic structure similarity of two recipes $R_1$ and $R_2$ is then calculated as

$$sim(SP_1, SP_2) = \sum_{i=1}^{m} |E_{S_i}| \left( \alpha |E_{SA_i}| + \beta |E_{SI_i}| \right) \cdot \log_2 \frac{N}{d_{S_i}} \qquad (1)$$

where $|E_{S_i}|$ is the number of edges (including both action and ingredient edges) of the shared subgraph $SP_{S_i}$; $|E_{SA_i}|$ and $|E_{SI_i}|$ are the numbers of action and ingredient edges in $SP_{S_i}$, respectively, so that $|E_{S_i}| = |E_{SA_i}| + |E_{SI_i}|$. $N$ is the total number of recipes, and $d_{S_i}$ is the number of cooking graphs that contain the subgraph $SP_{S_i}$. The term $\log_2 (N / d_{S_i})$ is the inverse subgraph frequency, which makes rare common subgraphs more important than frequent ones. $\alpha$ and $\beta$ are the weights of $|E_{SA_i}|$ and $|E_{SI_i}|$, respectively. In particular, the sequence of some actions can be changed without affecting the final cooking result. The factor $|E_{S_i}|$ gives higher weights to large subgraphs. Intuitively, the larger the subgraphs that $SP_1$ and $SP_2$ share, the more similar the two graphs are to each other.
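A direct transcription of the structure similarity in Eq. (1) is sketched below to show how the edge counts and the inverse subgraph frequency combine. The function name `structure_similarity`, the tuple layout of the input and the default weights of 1.0 are assumptions for illustration (the text does not state the weight values used), and the numbers in the example are made up rather than taken from the experiments.

```python
import math
from typing import Iterable, Tuple


def structure_similarity(shared: Iterable[Tuple[int, int, int]],
                         n_recipes: int,
                         alpha: float = 1.0,
                         beta: float = 1.0) -> float:
    """Sum Eq. (1) over the shared subgraphs.

    Each item of `shared` is (action_edges, ingredient_edges, d), where d is
    the number of cooking graphs in the collection containing that subgraph.
    """
    total = 0.0
    for n_action, n_ingredient, d in shared:
        n_edges = n_action + n_ingredient                 # |E_S| = |E_SA| + |E_SI|
        weighted = alpha * n_action + beta * n_ingredient
        isf = math.log2(n_recipes / d)                    # inverse subgraph frequency
        total += n_edges * weighted * isf
    return total


# Two shared subgraphs in a toy collection of N = 8 recipes:
#   subgraph 1: 2 action edges, 1 ingredient edge, found in 2 graphs
#   subgraph 2: 1 action edge, 1 ingredient edge, found in 4 graphs
score = structure_similarity([(2, 1, 2), (1, 1, 4)], n_recipes=8)
# 3 * 3 * log2(4) + 2 * 2 * log2(2) = 18 + 4 = 22
```

The example shows the intended behavior of the formula: the larger and rarer first subgraph contributes 18 of the 22 points, while a subgraph contained in every cooking graph would contribute nothing because its logarithm is zero.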

4.2.1. Discovering shared subgraphs

If a subgraph $SP_S$ occurs in both cooking graphs $SP_1$ and $SP_2$, then we say that $SP_1$ and $SP_2$ share the subgraph $SP_S$. To make existing subgraph mining algorithms such as FSG [12] applicable to mining our shared subgraphs, several extensions are made so as to support directed graphs in which multiple edges can exist between a given pair of vertices; such extensions include the following:

1. A ‘dummy’ vertex $v_{A1}$ (as shown in Figure 9) is added between two vertices $v_1$ and $v_2$ if there is an action edge from $v_1$ to $v_2$. Its label is $L(v_{A1}) = L(v_1) +$ ‘ ’ $+ L(v_2)$.


Figure 9. Dummy vertex insertion.

Figure 10. Edge direction check.

Therefore, all edges belong to just one type of edge. If there is an edge directly from $v_1$ to $v_2$, this edge represents an ingredient flow from $v_1$ to $v_2$. For a dummy vertex $v_{A1}$, if there exist both an edge from $v_1$ to $v_{A1}$ and an edge from $v_{A1}$ to $v_2$, then there is an action flow from $v_1$ to $v_2$.

2. The FSG algorithm is applied to the modified cooking graphs to find shared subgraphs.

3. After performing FSG, several steps are conducted for restoration, as detailed below:

(a) As the derived subgraphs are undirected, graph matching using Ullmann’s algorithm [13] and an edge direction check (as shown in Figure 10) need to be performed.

(b) Remove dummy vertices.

(c) Remove duplicated patterns.
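The dummy vertex insertion of step 1 and the dummy vertex removal of step 3(b) can be sketched as a pair of graph transformations. This is only an illustration: the function names, the `A0`, `A1`, ... id scheme for dummy vertices (assumed not to clash with real vertex ids) and the single-space label separator are assumptions, and the FSG mining, Ullmann matching and edge direction check themselves are not implemented here.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def insert_dummy_vertices(labels: Dict[str, str],
                          action_edges: Iterable[Edge],
                          ingredient_edges: Iterable[Edge],
                          sep: str = " ") -> Tuple[Dict[str, str], List[Edge]]:
    """Turn a graph with two edge types into one with a single edge type."""
    new_labels = dict(labels)
    edges: List[Edge] = list(ingredient_edges)  # direct edges remain ingredient flows
    for k, (u, v) in enumerate(action_edges):
        dummy = f"A{k}"                          # hypothetical id scheme
        new_labels[dummy] = labels[u] + sep + labels[v]
        edges.append((u, dummy))                 # v1 -> vA
        edges.append((dummy, v))                 # vA -> v2
    return new_labels, edges


def remove_dummy_vertices(edges: Iterable[Edge],
                          dummies: Set[str]) -> Tuple[List[Edge], List[Edge]]:
    """Recover (action_edges, ingredient_edges) from the single-type graph."""
    edges = list(edges)
    incoming = {d: u for (u, d) in edges if d in dummies}
    outgoing = {d: v for (d, v) in edges if d in dummies}
    action = sorted((incoming[d], outgoing[d])
                    for d in dummies if d in incoming and d in outgoing)
    ingredient = sorted((u, v) for (u, v) in edges
                        if u not in dummies and v not in dummies)
    return action, ingredient


# 'cut' is followed by 'marinate' and also feeds an ingredient into it,
# so the original graph has two edges between the same pair of vertices.
labels = {"v1": "cut", "v2": "marinate"}
new_labels, single_type_edges = insert_dummy_vertices(
    labels, action_edges=[("v1", "v2")], ingredient_edges=[("v1", "v2")])
action, ingredient = remove_dummy_vertices(single_type_edges, {"A0"})
```

After the transformation no pair of vertices is joined by more than one edge, which is the property a simple-graph miner needs, and the round trip returns the original action and ingredient edges.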

5. EXPERIMENTS

We collected 103 Chinese recipe documents (51 Guangdong-style dishes and 52 Sichuan-style dishes) from the Web, containing dirty data such as typographical errors, misspelled terms, and abbreviations. Using the MediaView model, we converted the plain-text format to the cooking graph format illustrated in the previous section. Note that both recipe documents and cooking graphs have been pre-processed for retrieval, and we ignore the information carried by images, since it is too limited to be useful. In this section, we implement graph-based similarity (as described in Section 4.2) and text similarity (using the baseline VSM) on our database RecipeView, on the basis of which similar recipes can be retrieved by ranking the results. We then evaluate the performance of both algorithms using the traditional precision and recall measures. Finally, we present a comprehensive case to give readers a better understanding of the whole system.

In RecipeView, the recipe documents contain 696 unique words in total. After conversion, the cooking graphs consist of 34.4 vertices on average (ranging from 15 to 56) and 45.2 edges on average (ranging from 20 to 80).

We adopt the most common evaluation measures in information retrieval [14], precision and recall, to measure the effectiveness of structure similarity and VSM-based document similarity. Recall is the ratio of the number of relevant documents/graphs retrieved for a given query to the number of relevant documents/graphs for that query in the database. Precision is the ratio of the number of relevant documents/graphs retrieved to the total number of documents/graphs retrieved. In our system, we use the notion of search by example (the system searches for the recipes most similar to an example recipe given by the user). Therefore, the query here is simply an example among the documents/graphs. Figure 11 shows the precision and recall curve for graph similarity and text similarity. The solid curve represents graph similarity; the dotted curve represents text similarity.

Figure 11. Precision and recall curve for graph similarity and text similarity.

From Figure 11, we can conclude that our novel graph similarity measure achieves far better retrieval performance than text similarity. At the same recall rate, the former has approximately twice the precision of the latter in the recall interval from 0.2 to 0.9. This improvement is mainly attributed to the cooking graph structure. By abstracting the workflow of a recipe document into a graph, the cooking graph carries more information that semantically describes the procedure, compared with the simple statistical representation used by the VSM. Two recipes can be very similar in cooking procedure while having different major ingredients, e.g. ‘Kung Pao Pork’ and ‘Kung Pao Chicken’. However, the VSM may not give such a pair a high similarity value, because of the different terms that each recipe uses. It may also suffer from the correlation problem mentioned in [15]. Another reason may lie in the different pre-processing we conducted. For the VSM, we only remove function words, such as ‘a’ and ‘the’, as well as HTML tags, whereas the graph conversion is done manually, thereby avoiding some of the dirty data. Nevertheless, the retrieval result (which is ranked by similarity) shows that the top recipes ranked by the graph similarity method are very similar to the example, which is not the case for the text similarity method. This observation further convinces us that the use of the cooking graph is the major reason for the performance improvement.

Figure 12. System screenshot: the target recipe (left window) with its most similar recipe (right window), and other top 10 similar recipes (shown on the right column).

To provide a more complete picture of our system, we give an example in Figure 12, which shows a query ‘Fried Spareribs in Orange Juice’ submitted by a user and the result obtained by the system. In particular, the system finds the recipe upon receiving the query string, and the corresponding recipe graph is then retrieved and displayed in the left window. Meanwhile, similar recipes are listed in the right column with 10 recipes per page. Clicking on one of them displays the corresponding cooking graph in the right window. Common parts between the two recipes are highlighted in orange. If a user moves the mouse across the vertices or edges, popup information is displayed indicating the associated constraints. For the two cooking graphs in Figure 12, it can be seen that the two recipes are rather similar to each other (except for the main ingredient and some minor ingredients), even though their names are totally unrelated. Moreover, by carefully examining the top 10 recipes returned, we find that they all belong to the same cooking style and have some important ingredients in common. This demonstrates the effectiveness of the frequent common pattern-based graph matching, on the basis of which interesting and semantically meaningful results are obtained.
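For reference, the precision and recall measures used in the evaluation above can be computed for a single search-by-example query as sketched below. The function name `precision_recall`, the cut-off parameter `k` and the recipe identifiers are hypothetical; how relevance was judged in the experiments is not stated in the text, so the relevant set here is simply given.

```python
from typing import Sequence, Set, Tuple


def precision_recall(ranked: Sequence[str],
                     relevant: Set[str],
                     k: int) -> Tuple[float, float]:
    """Precision and recall of the top-k items of a similarity ranking."""
    retrieved = ranked[:k]
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Ranking returned for one example recipe, most similar first.
ranked = ["r3", "r7", "r1", "r9", "r4"]
relevant = {"r3", "r1", "r8"}   # recipes judged relevant to the example

p, r = precision_recall(ranked, relevant, k=4)
# Top 4 contains r3 and r1: precision = 2/4, recall = 2/3.
```

Sweeping `k` from 1 to the size of the collection and plotting precision against recall yields a curve of the kind shown in Figure 11.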


6. CONCLUSION AND FUTURE WORK

In this paper, we have presented MediaView as a new multimedia database model with an extension to the traditional ANSI/SPARC three-level architecture, so as to cater for the unique requirements of multimedia data management. Various facilities of MediaView have been described and discussed. A number of applications have been developed based on this new model, and we have described some of these in this paper, including recipe modeling and graph mining for retrieval. We have built a fully fledged personalized recipe recommendation system, RecipeView, in which similar recipes are ranked and recommended to users. We have shown that, by using the recipe model and graph mining, our retrieval performance is far better than that of the traditional VSM-based text similarity method.

ACKNOWLEDGEMENTS

The authors would like to thank Mr. Jun Yang (now at CMU) and Professor Guozhu Dong (Wright State University) for their precious contributions to some parts of the project reported in this paper.

REFERENCES

1. Rundensteiner EA. Multiview: A methodology for supporting multiple views in object-oriented databases. Proceedings of the 18th International Conference on Very Large Data Bases (VLDB’92), Vancouver, British Columbia, Canada. Morgan Kaufmann Publishers: Los Altos, CA, U.S.A., 1992; 187–198.

2. Scholl MH, Laasch C, Tresch M. Updatable views in object-oriented databases. Proceedings of the 2nd International Conference on Deductive and Object-Oriented Databases (DOOD), vol. 566. Delobel C, Kifer M, Yasunga Y (eds.). Springer: Munchen, FRG, 1991.

3. Abiteboul S, Bonner A. Objects and views. SIGMOD ’91: Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data. ACM: New York, NY, U.S.A., 1991; 238–247.

4. Miller GA. WordNet: A lexical database for English. Communications of the ACM 1995; 38(11):39–41.

5. Li Y, Meng X, Wang L, Li Q. RecipeCrawler: Collecting recipe data from www incrementally. Conference on Web-Age Information Management (WAIM) (Lecture Notes in Computer Science, vol. 4016), Yu JX, Kitsuregawa M, Leong HV (eds.). Springer: Berlin, 2006; 263–274.

6. Wang L. Cookrecipe - Towards a versatile and fully-fledged recipe analysis and learning system. PhD Thesis, Department of Computer Science, City University of Hong Kong, January 2008.

7. Salton G. The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice-Hall: Englewood Cliffs, NJ, 1971.

8. Salton G. Dynamic Information and Library Processing. Prentice-Hall, Inc.: Upper Saddle River, NJ, U.S.A., 1975.

9. Salton G, Mcgill MJ. Introduction to Modern Information Retrieval. McGraw-Hill, Inc.: New York, NY, U.S.A., 1986.

10. Wong SKM, Ziarko W, Wong PCN. Generalized vector spaces model in information retrieval. SIGIR ’85: Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM: New York, NY, U.S.A., 1985; 18–25.

11. Smeulders AWM, Worring M, Santini S, Gupta A, Jain R. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000; 22(12):1349–1380.

12. Kuramochi M, Karypis G. Frequent subgraph discovery. ICDM ’01: Proceedings of the 2001 IEEE International Conference on Data Mining. IEEE Computer Society: Washington, DC, U.S.A., 2001; 313–320.

13. Ullmann JR. An algorithm for subgraph isomorphism. Journal of ACM 1976; 23(1):31–42.

14. Frakes WB, Baeza-Yates RA (eds.). Information Retrieval: Data Structures & Algorithms. Prentice-Hall: Englewood Cliffs, NJ, 1992.

15. Raghavan VV, Wong SKM. A critical analysis of vector space model for information retrieval. Journal of the American Society for Information Science 1999; 37(5):279–287.
