GeoCharacters
creating visual character representations with geo-tagged photographs

Bram Huijten
5814669

Bachelor thesis
Credits: 9 EC
Bachelor Opleiding Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
Dr. Frank Nack
Intelligent Systems Lab Amsterdam (ISLA)
Institute for Informatics
University of Amsterdam
Science Park 107
1098 XG Amsterdam

June 25th, 2010



Table of contents:

1. Introduction
2. Theoretical context and related work
   2.1 Images and Flickr tags
   2.2 WordNet
   2.3 Storytelling applications
   2.4 An essence of a story
3. Method: idea, algorithm and implementation
4. Results
5. Discussion
6. Conclusion and recommendations for future work
7. References


Abstract: In this thesis I present an algorithm for constructing coherent visual representations of story characters from existing photographs with geotags. The core idea of the algorithm is that by searching pairwise for two opposing characters, a protagonist and an antagonist, good results can be achieved by looking at the coherency and contrast of the tags assigned to pictures. An implementation of the algorithm that uses WordNet to measure the semantic similarity between tags is discussed, results are shown, and the responses to a small user test are presented, which show that respondents perceived the generated visual representations as more coherent and more logical than collages created using only tag-based image retrieval.

1. Introduction

In recent years, we have witnessed a surge in the number of people that use smartphones and we have seen how these devices have turned into capable mini-computers that come equipped with relatively large screens, a digital camera, a GPS receiver and a fast mobile connection to the world wide web. Combined with the huge popularity of social networks and photo- and video-sharing sites, this opens up enticing possibilities for both research and applications, as there are vast amounts of up-to-date information in the cloud that users can access while on the go.

There are currently multiple research projects underway at the University of Amsterdam that work towards a combined goal: an interactive mobile application that can generate location-aware stories for users. Such an application can have many uses: it can present users with information in creative and enticing ways and can let them learn new things about a location. Developing it is also interesting as a way to push the boundaries of AI research, because many different problems have to be overcome to build it successfully.

Creating a story is a difficult process and requires creative and associative skills as well as a sound understanding of narrative structures and a healthy dose of common knowledge. It is one of those cognitive tasks that are difficult to model appropriately. In this project I have focussed on a specific part of this task of creating a (visual) story: the selection of specific photos from a large set of existing photos that can be used to represent characters in a story.

People are able to quickly interpret images and assign meaning to them. They can also see links between different photos based on both the semantic and stylistic properties of an image. Humans also possess knowledge about other humans and about characters, knowledge derived from all the stories they have heard, seen and read. They combine all these capabilities and this knowledge when they perform a task such as the one I am trying to accomplish in this project. Can a computer perform such a task successfully? In other words:

Is it possible, using existing photographs taken at a specific location, to generate a pair of visual character representations that represent two characters on opposing sides of a conflict and that are perceived as coherent and meaningful by humans?

In the rest of this thesis I will argue that this is indeed possible by presenting an implementation of an algorithm I developed that uses geotagged Flickr images and WordNet-based semantic comparisons of image tags. I will first discuss related work and research relevant to this problem in section 2. Then I will discuss the algorithm I have developed and the implementation I have used in section 3. The results are presented in section 4 and discussed in section 5. Section 6 contains the conclusions that can be drawn from this work and recommendations for future work. The thesis concludes with a list of references in section 7.

2. Theoretical context and related work

2.1 Images and Flickr tags

Everyone who has tried to search for images via Google knows that the results are often surprising and unpredictable. The same holds for a Flickr image query: images are returned based on the tags given by users, and the problem with this is that the relation between the tags and the perceived content of the picture can sometimes seem quite arbitrary. Goodrum's overview of image retrieval techniques (Goodrum, 2000) is somewhat dated but accurately describes an important problem: "The textual representation of images is problematic because images convey information relating to what is actually depicted in the image as well as what the image is about. ... For example, an image may be of a glass of wine, but be about the Christian mass."

This is a fundamental problem that arises when one tries to describe images. A common saying is that a picture says more than a thousand words, which means there is bound to be a semantic gap when one tries to capture the contents and the meaning of a picture in a few words. The problem is amplified further on a site where content is both uploaded and tagged by users. These users do not share a set of well-defined terms with which they tag pictures; the tags given are not always accurate and can be heavily biased towards a certain interpretation. How accurate are these tags? And how accurate are the location tags that photos are given?

Hollenstein et al. (Hollenstein & Purves, 2010) describe how they used Flickr tags to define fuzzy notions such as the centre of a city. They use a combination of user-assigned place tags and GPS coordinate tags to mark city centres on a map, and find that the tags are accurate enough to achieve good results. They do note, however, that the number of tags assigned to a particular photo varies widely and that the quality of the tags varies greatly from user to user. Despite this, they find that the location tag is accurate for 86% of the pictures.

Nov et al. show in (Nov, Naaman, & Ye, 2008) that users have different goals and motivations when assigning tags. It makes a difference if they are tagging for themselves, for friends or for strangers. They state that the fact that pictures on a site like Flickr are a way of expressing yourself within the online community has a positive effect on both the number and the quality of the tags.

2.2 WordNet

In section 3 I will discuss how I have used WordNet to implement my algorithm. WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. WordNet is a well-known tool in the field of natural language processing and has been used in many research projects; more information can be found in (Fellbaum, 1998). An interesting article about using WordNet to enhance image retrieval techniques with a method called the 'Semantic Similarity Retrieval Method' is (Varelas, Voutsakis, Raftopoulou, Petrakis, & Milios, 2005), which shows that WordNet can be a useful tool in a task related to the one at hand.


2.3 Storytelling applications

The automatic generation of stories has been a subject of research for quite some time. But the first question that needs to be asked is: what is a story? Some people define a story as a series of facts and events, but I find this to be a very broad definition. Jhala makes a distinction between narratives and stories in (Jhala, 2008):

"Narrative discourse is a recounting of events occurring in a story world through a communicative medium (in my case a 3D graphical world). As described above, narratives are about communicating transformations in (fictional) world states into a final state through a sequence of events. Certain narratives are called "stories" while others are not. These narratives, that have stories, contain properties like settings, objects, characters, and characters' beliefs, desires, plans, and goals in the story world."

Different models are used for generating text-based stories. One example is the agent-based model, which models all characters in a story and determines their interactions based upon their internal goals. Another is the event-based model, in which stories are generated as a series of events. A comparative overview of three programs that generate creative text-based stories is given in (Sharples, 2004). An important criterion for comparing the three programs is the number of different stories a program can generate and whether each story is fundamentally different from previously generated stories.

During the research for this thesis I found a number of other projects whose goal was to create an application that generates visual stories, but these use specially prepared sets of pictures or video fragments, or user-generated content tagged using a specific tagging method.

Balabanović et al. (Balabanović, Chu, & Wolff, 2000) provide a way for users to tell stories using their own photographs. Fujita and Arikawa (Fujita & Arikawa, 2009) describe an application that assists users in creating a spatially organised slideshow of pictures on a map. Multisilta et al. (Multisilta & Mäenpää, 2008) try to generate stories from user-generated video content. They note that it is difficult to create coherent stories but that better results are achieved when they opt for "a narrative structure based on jazz music as a matrix for the story generator." Another interesting project that is closely related to this work is the Narranotation project described in (Van Kemenade, Overgoor, & Van der Weij, 2010): an application that generates an interactive story-journey for a user, letting the user travel to different locations and see augmented-reality graffiti at these locations.

2.4 An essence of a story

In drama theory, the foundations of a narrative have been studied as far back as Aristotle's Poetics (one of many translations is (Else, 1967)). One of the things he notes when analysing the theatre of his time is that the stories often revolve around two main characters that are engaged in a fundamental conflict. In the beginning of the story, something occurs that causes a confrontation between these characters to become unavoidable. The tension is built up until a big clash occurs, after which the conflict has been resolved, either by one of the two characters defeating the other, or by a previously unexpected reconciliation. These two main characters of a story are called the protagonist and the antagonist.

This simplification does not do justice to the more complex analysis of Aristotle, nor does it encompass all stories, but it does provide a very simple basis for a story. And this basis is exactly what I needed for my approach to creating visual character representations.


3. Method: idea, algorithm and implementation

Because of the lack of precision and completeness in the tags given to a photograph, we cannot precisely know the semantic contents of a picture, just as we have little knowledge of its atmosphere or style. This poses problems when we want to use these pictures to generate a story, because we cannot accurately know what a picture represents. The methods developed for text-based story generation are therefore hard to use when creating a visual story with existing pictures: they all model something that then has to be shown with the available pictures, and this will not work if we cannot be sure what a picture represents.

I therefore decided to begin with the material and not with a story model or structure. After examining a number of photos and their tags, I found that even if it is not feasible to accurately derive the meaning and suitability of a picture from its tags, it may be possible to determine with reasonable accuracy whether a picture contrasts with another picture, based on the tags they are given. And a contrast between pictures can be used to represent a conflict.

The central idea of my algorithm is to use the tags of portraits to find two portraits that are suitable to use as the protagonist and antagonist of a visual story. The tags are used to find contrasting pictures, and the most distinctive of them are used to model the goal of each character. These distinctive tags are then used to find additional pictures that visualise one side of the conflict and, at the same time, the internal and external world of the character.

To do this, I measure the semantic similarity between the tags of pictures to determine the coherency of a picture's tags and the contrast of these tags with the tags of other pictures. Both coherency and contrast between two synsets are measured using the Wu-Palmer method integrated in WordNet: it returns a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their least common subsumer (most specific ancestor node). This yields a similarity value between 0.0 and 1.0. When measuring contrast, I simply subtract this similarity value from 1.0.
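The coherency and contrast measures can be sketched as follows. This is a minimal illustration of the scoring logic only: the `similarity` argument stands in for the Wu-Palmer measure (in NLTK, `synset.wup_similarity(other)`), and the exact-match toy similarity used below is purely illustrative, not the thesis code.

```python
from itertools import combinations

def coherency(synsets, similarity):
    """Average pairwise similarity between the synsets of one picture."""
    pairs = list(combinations(synsets, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

def contrast(synsets_a, synsets_b, similarity):
    """Average (1.0 - similarity) over all cross-picture synset pairs."""
    pairs = [(a, b) for a in synsets_a for b in synsets_b]
    if not pairs:
        return 0.0
    return sum(1.0 - similarity(a, b) for a, b in pairs) / len(pairs)

# Toy similarity: 1.0 for identical terms, 0.0 otherwise. Wu-Palmer would
# instead return graded values in [0.0, 1.0] based on taxonomy depth.
toy_sim = lambda a, b: 1.0 if a == b else 0.0
```

With the real Wu-Palmer measure, partially related senses contribute graded scores rather than the 0/1 of the toy function.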

The complete algorithm is described in table 1. The tools I used to implement my algorithm are the Natural Language Toolkit for Python (see (Bird, Loper, & Klein, 2009)), the Python flickrapi library and WordNet (see (Fellbaum, 1998)). The following values of the variables in the algorithm were found to achieve the best results in a variety of locations:

• n, the initial number of portraits returned by the query: 40
• m, the minimum number of tags (with a matching synset) to consider a portrait: 8
• x, the number of related pictures searched for: 50
• y, the minimum number of tags (with a matching synset) to consider a related photo: 6

Another thing to note is that the GPS locations given as input to the program needed to be converted to Flickr place ids to obtain enough search results; this is a quirk of the way Flickr handles geotags in a query. Query results were ordered by 'interestingness', which prevented getting many photos from one recent upload and helped retrieve pictures of better quality.
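As a sketch, the initial portrait query might be assembled like this. The parameter names follow Flickr's `photos.search` API, but this helper function and the exact parameter set are my own assumption rather than the thesis code.

```python
def portrait_query_params(place_id, n=40):
    """Arguments for the initial Flickr photos.search portrait query.

    place_id is assumed to come from a prior flickr.places.findByLatLon
    call, since raw GPS coordinates returned too few results.
    """
    return {
        'place_id': place_id,
        'tags': 'portrait,portraiture',
        'tag_mode': 'any',               # match either tag
        'extras': 'tags',                # include each photo's tags in the reply
        'sort': 'interestingness-desc',  # avoids bulk recent uploads; better quality
        'per_page': n,                   # n = 40 in the reported experiments
    }
```

The resulting dict would be passed to a `flickrapi` search call; the 'interestingness' ordering is what keeps a single user's recent bulk upload from dominating the portraitlist.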


The GeoCharacters Algorithm:

• Search for n photos taken at the location that contain the tag 'portrait' or 'portraiture' and store these photos with their tags in a portraitlist.
• For each photo in the portraitlist:
  • Try to find a matching synset in WordNet for each tag.
• Discard pictures with fewer than m tags.
• For each photo in the portraitlist:
  • Determine the coherency of its synsets.
  • Determine the contrast of its list of synsets with the list of synsets of all other photos in the portraitlist.
• Determine which combination of two pictures yields the highest value for: contrast between the two pictures * coherency of picture 1 * coherency of picture 2.
• Select these pictures as the protagonist (P) and the antagonist (A).
• For both P and A:
  • For each synset in its list:
    • Determine the average contrast to the other synsets in its list; this is the internal contrast of this synset.
    • Determine the average contrast to the synsets in the list of the other character; this is the external contrast.
  • Select the 3 synsets with the highest value for (internal contrast * external contrast).
• Search for x photos that contain 1 or more of the 3 distinctive terms of the protagonist and store these photos with their tags.
• For each of these photos:
  • Try to find a matching synset in WordNet for each tag.
• Discard pictures with fewer than y tags.
• For each remaining picture:
  • Determine the coherency of its synsets (Coh).
  • Determine the average similarity of its synsets with all synsets of the protagonist (Psim).
  • Determine the average similarity of its synsets with all synsets of the antagonist (Asim).
• Rank these pictures according to ((Psim - Asim) * Coh).
• Associate all pictures with a value greater than zero with the protagonist, in order of the ranking, and discard the rest.
• Search for x photos that contain 1 or more of the 3 distinctive terms of the antagonist and store these photos with their tags.
• For each of these photos:
  • Try to find a matching synset in WordNet for each tag.
• Discard pictures with fewer than y tags.
• For each remaining picture:
  • Determine the coherency of its synsets (Coh).
  • Determine the average similarity of its synsets with all synsets of the protagonist (Psim).
  • Determine the average similarity of its synsets with all synsets of the antagonist (Asim).
• Rank these pictures according to ((Psim - Asim) * Coh).
• Associate all pictures with a value less than zero with the antagonist, in order of the ranking, starting with the pictures with the smallest value, and discard the rest.

Table 1: the GeoCharacters algorithm.
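The pair-selection step in table 1 can be sketched as follows; `coherency` and `contrast` are assumed to be the tag-based measures described above, and this helper is an illustrative reconstruction rather than the thesis code.

```python
from itertools import combinations

def select_pair(pictures, coherency, contrast):
    """Pick the protagonist/antagonist pair from a list of synset lists.

    Maximises: contrast(p1, p2) * coherency(p1) * coherency(p2).
    Returns the indices of the two chosen pictures.
    """
    best_pair, best_score = None, float('-inf')
    for (i, a), (j, b) in combinations(enumerate(pictures), 2):
        score = contrast(a, b) * coherency(a) * coherency(b)
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair
```

Multiplying contrast by both coherency values means a pair only scores highly when each portrait is internally consistent *and* the two portraits oppose each other, which is exactly the protagonist/antagonist intuition.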


While working on my algorithm, I expected that adding antonyms to the search tags could enhance performance. But when I implemented a function that searched for the antonym relation of all synsets related to all pictures, I discovered that it found only very few antonym relations. This can be explained by the fact that the large majority of Flickr tags are nouns, which usually do not have an antonym; another possible reason is a shortage of properly defined antonym relations in WordNet. Whatever the reason, it became apparent that expanding my algorithm to use antonyms would not yield much improvement, so I did not pursue it.
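The antonym experiment can be illustrated with a small coverage check. The `antonyms_of` argument stands in for a WordNet lookup (in NLTK, iterating over `synset.lemmas()` and each `lemma.antonyms()`); the toy lexicon below is my own illustration of why noun-heavy tag sets yield so few hits.

```python
def antonym_coverage(tags, antonyms_of):
    """Fraction of tags for which at least one antonym relation is found."""
    if not tags:
        return 0.0
    hits = sum(1 for t in tags if antonyms_of(t))
    return hits / len(tags)

# Toy lexicon: antonym relations mostly exist for adjectives ('light'/'dark'),
# while typical Flickr tags are nouns ('church', 'cathedral') with no antonym.
TOY_ANTONYMS = {'light': ['dark'], 'day': ['night']}
```

On a mostly-noun tag set such as `['light', 'church', 'cathedral', 'woman']` the toy lexicon covers only one tag in four, mirroring the sparse antonym hits observed with real Flickr tags.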

4. Results

In this section, representative samples of the output generated by my implementation are shown. Figure 1 shows three protagonist-antagonist pairs as generated by the algorithm when given GPS tags from the centres of three large British cities: Birmingham, London and Liverpool. Table 2 lists the synsets associated with these pictures, as well as the terms that were deemed the most distinctive by the algorithm. Figures 2 to 7 show the final output of the algorithm: the portrait of the protagonist or antagonist and a set of up to nine images that best matched these portraits.

I conducted a small online survey in which users were presented with four sets of photos: two were generated by my algorithm and can be seen in figures 2 and 3. The two other sets were generated by selecting random portraits taken at the location and then searching for photos that contained one or more of the tags of these portraits; these results were linked to the photos in a random order. These 'normal' sets generated by tag-based search can be seen in appendix 1.

Users were asked to rate the coherency of the set of pictures next to a portrait on a five-point scale, with 1 meaning the images appeared very incoherent and 5 meaning they appeared very coherent. They were then asked to rate how logical the combination of the portrait with the adjoining pictures seemed to them, also on a five-point scale, with 1 denoting very illogical and 5 denoting very logical.

Users were finally asked which of these four sets of characters they would choose to represent two characters engaged in a conflict if they were creating a story. As it was a small-scale survey, the responses of only 8 respondents have been evaluated; the results are shown in table 3. The respondents were young Dutch males and females in their mid-twenties. Only five respondents correctly interpreted and answered the last question, but 80% of these selected the pair of characters generated by the algorithm to use in their story.


Figure 1: the protagonist and antagonist found by the algorithm at three different locations (Birmingham, London and Liverpool)


Birmingham
  Protagonist: [Synset('people.n.01'), Synset('eliot.n.01'), Synset('day.n.01'), Synset('photography.n.01'), Synset('fuji.n.01'), Synset('flash.n.01'), Synset('five.n.01'), Synset('birmingham.n.01'), Synset('coloring_material.n.01')]
  Selected: five, photography, fuji
  Antagonist: [Synset('woman.n.01'), Synset('sunlight.n.01'), Synset('religion.n.01'), Synset('mother.n.01'), Synset('curate.n.01'), Synset('light.a.01'), Synset('religion.n.01'), Synset('church.n.01'), Synset('cathedral.n.01'), Synset('birmingham.n.01')]
  Selected: sunlight, church, cathedral

London
  Protagonist: [Synset('dinner.n.01'), Synset('ceremony.n.01'), Synset('evening.n.01'), Synset('cabaret.n.01'), Synset('baseball_club.n.01'), Synset('comedy.n.01'), Synset('comedian.n.01'), Synset('phase.n.01'), Synset('shadow.n.01'), Synset('audience.n.01')]
  Selected: shadow, baseball_club, dinner
  Antagonist: [Synset('bus.n.01'), Synset('red.n.01'), Synset('london.n.01'), Synset('england.n.01'), Synset('city.n.01'), Synset('united_kingdom.n.01'), Synset('museum.n.01'), Synset('conveyance.n.03'), Synset('eleven.n.01'), Synset('front.n.01'), Synset('radiator.n.01'), Synset('wicket.n.04')]
  Selected: red, eleven, front

Liverpool
  Protagonist: [Synset('erasmus.n.01'), Synset('liverpool.n.01'), Synset('united_kingdom.n.01'), Synset('baseball_club.n.01'), Synset('portrayal.n.01'), Synset('people.n.01'), Synset('england.n.01'), Synset('europe.n.01'), Synset('europa.n.01')]
  Selected: baseball_club, europa, erasmus
  Antagonist: [Synset('liverpool.n.01'), Synset('beatles.n.01'), Synset('festival.n.01'), Synset('music.n.01'), Synset('culture.n.01'), Synset('urban.a.01'), Synset('street.n.01'), Synset('people.n.01')]
  Selected: festival, music, beatles

Table 2: the synsets generated from the tags of the protagonist and antagonist for each city, and the synsets that were deemed the most distinctive by the algorithm.

                       GeoCharacters results   Tag-based search results
Perceived coherency    4.07                    2.57
Perceived logicality   3.71                    2.93

Table 3: the results from the online survey, based on 8 respondents.


Figure 2: The Birmingham protagonist and the matching pictures as found by the algorithm


Figure 3: The Birmingham antagonist and the matching pictures as found by the algorithm


Figure 4: The Liverpool protagonist and the matching pictures as found by the algorithm (no more matching pictures were found)

Figure 5: The Liverpool antagonist and the matching pictures as found by the algorithm (no more matching pictures were found)

Figure 6: The London protagonist and the matching pictures as found by the algorithm (no more matching pictures were found)

Figure 7: The London antagonist and the matching pictures as found by the algorithm


5. Discussion

A number of things become apparent when interpreting these results. First of all, the results from the online survey are encouraging and seem to indicate that the algorithm does indeed generate visual representations of characters that are perceived as more coherent and more logical than tag-based search clusters of pictures. A larger test would be needed to obtain statistically significant results.

The fact that 80% of the respondents would choose the protagonist-antagonist combination generated by the algorithm for use in a story can be seen as an indication that the pairwise, contrast-based searching pays off. People seem to recognise the contrast between the sets of pictures, but further research is necessary to substantiate this claim.

It is also important to note that the results of the algorithm are not consistent. In some locations very few pictures, or pictures with strange tags, are found. Note the illogical set of tags given to the picture of the policemen, and note that the distinctive tags selected for a picture are not always the most apparent choices. I do not think this is a problem, because the criteria humans use to group photos can also be unclear and associative.

In figure 6 we see the same picture three times. This was caused by the fact that the same picture was uploaded to Flickr three times, with a different Flickr id for each upload. Also note that the algorithm was sometimes able to find only a few pictures that it could link to a portrait. This happens when the distinctive terms used to search for pictures of the protagonist and antagonist are too rare to produce enough results, or when they are too common, though there are more possible causes for this behaviour. It is intended behaviour, however: the number of pictures linked to the protagonist and antagonist is not fixed and can vary greatly from set to set. This, I think, is meaningful and can be used in different ways, as I will explain in my recommendations for future work.

The use of WordNet also strongly affects the results. Strange keywords such as 'baseball_club' associated with some photographs can be explained by the fact that this is the first synset for the word 'club' defined in WordNet, and the current implementation uses the first sense in WordNet when it tries to match tags to synsets. It would therefore be useful to implement a form of word sense disambiguation when mapping tags to synsets; this should have a positive effect on the performance of the algorithm. Another thing to note is that all other similarity measures in WordNet were also tested before settling on the Wu-Palmer measure, as it yielded the best results.
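A simplified Lesk-style disambiguator, sketched below, illustrates the suggested fix: instead of taking WordNet's first sense, pick the sense whose gloss overlaps most with the picture's other tags. The sense inventory here is hand-written for illustration; a real implementation would take glosses from WordNet (e.g. via NLTK's `nltk.wsd.lesk`).

```python
def disambiguate(context_tags, senses):
    """Pick the (name, gloss) sense whose gloss shares the most words with
    the picture's other tags; on a tie, the first sense wins, which mirrors
    the current first-sense behaviour as a fallback."""
    context = set(context_tags)
    return max(senses, key=lambda s: len(context & set(s[1].split())))[0]

# Hand-written senses for the ambiguous tag 'club' (illustrative only).
CLUB_SENSES = [
    ('baseball_club.n.01', 'a team of professional baseball players'),
    ('club.n.02', 'a formal association of people with similar interests'),
    ('nightclub.n.01', 'a spot for late night entertainment with music and dancing'),
]
```

For example, co-occurring tags such as 'music' and 'entertainment' would pull 'club' towards the nightclub sense rather than 'baseball_club'.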

6. Conclusion and recommendations for future work

The results presented here are based upon a limited number of locations and responses from only a few users. An obvious extension of the current work would be to conduct a large-scale user evaluation of results generated at a large number of different locations. This would provide more insight into the robustness of the algorithm and would allow its performance to be validated statistically.

One of the strengths of the basic GeoCharacters algorithm for finding protagonist-antagonist pairs is that it can be used with any semantic similarity measure. It would be interesting to compare the current implementation with implementations that use a different method to measure semantic similarity between tags. Another possibility is expanding the current algorithm to include visual feature representations of the pictures, which could lead to even more coherent sets of pictures.

And of course, the next big step in this project is constructing a story that uses these characters. An interesting aspect of the current implementation is that the number of pictures associated with the protagonist and the number associated with the antagonist is not fixed: some portraits can be matched to more pictures than others. These numbers can carry semantic meaning and can be used in different ways. One obvious way would be to use the number of pictures associated with the protagonist and the antagonist to determine who wins the conflict of the story, letting the character with the most associated pictures win.
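The suggested use of the picture counts could be as simple as the sketch below; the function name and the tie rule (reading a tie as the reconciliation ending mentioned in section 2.4) are my own illustration, not part of the thesis.

```python
def conflict_winner(n_protagonist_pics, n_antagonist_pics):
    """Resolve the story's conflict by picture count: the character with
    more associated pictures wins; a tie could signal reconciliation."""
    if n_protagonist_pics > n_antagonist_pics:
        return 'protagonist'
    if n_antagonist_pics > n_protagonist_pics:
        return 'antagonist'
    return 'reconciliation'
```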

Another interesting extension of the current work would be to find a compelling way to present the pictures such that they form a story. Given that we have two characters and a conflict, it should be possible to create a simple story that pits the two characters against one another, builds the tension of the conflict and finally resolves it. I would recommend trying a simple approach using known techniques from comics, such as those discussed in (McCloud, 1994).

The current implementation should be seen as a constructive first step towards a larger goal. Though further research is required, early results indicate that the basic idea of visually representing protagonist-antagonist pairs of characters using geotagged photographs, by measuring the semantic coherency and contrast of their tags, is a viable one. It will be interesting to see how the characters generated by this algorithm will be used in future projects and how the algorithm will be expanded.

7. References

Balabanović, M., Chu, L. L., & Wolff, G. J. (2000). Storytelling with digital photographs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

Bird, S., Loper, E., & Klein, E. (2009). Natural Language Toolkit. http://www.nltk.org

Else, G. F. (1967). Aristotle: Poetics. University of Michigan Press.

Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.

Fujita, H., & Arikawa, M. (2009). A story authoring tool with mapped photo collections. In Proceedings of the International Cartographic Conference (ICC 2009).

Goodrum, A. A. (2000). Image information retrieval: An overview of current research. Informing Science, 3(2), 63-66.

Hollenstein, L., & Purves, R. (2010). Exploring place through user-generated content: Using Flickr to describe city cores. Journal of Spatial Information Science.

Jhala, A. H. (2008). Cinematic Discourse Generation.

McCloud, S. (1994). Understanding Comics.

Multisilta, J., & Mäenpää, M. (2008). Mobile video stories. In Proceedings of the 3rd International Conference on Digital Interactive Media in Entertainment and Arts.

Nov, O., Naaman, M., & Ye, C. (2008). What drives content tagging: The case of photos on Flickr. In Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems.

Sharples, M. (2004). Three computer-based models of storytelling: BRUTUS, MINSTREL and MEXICA. Knowledge-Based Systems, 17(1), 15-29. doi:10.1016/S0950-7051(03)00048-0

Van Kemenade, P., Overgoor, J., & Van der Weij, B. (2010). Narranotation: Honours Project Report.

Varelas, G., Voutsakis, E., Raftopoulou, P., Petrakis, E. G. M., & Milios, E. E. (2005). Semantic similarity methods in WordNet and their application to information retrieval on the Web. In Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management.


Appendix 1:

Shown below in figures 1 and 2 are the sets of pictures generated by tag-based search that were used as comparative material in the online survey:

Figure 1: London protagonist generated using simple tag-based search

Figure 2: London antagonist generated using simple tag-based search
