
ORIGINAL ARTICLE

Mobile video literacy: negotiating the use of a new visual technology

Alexandra Weilenmann · Roger Säljö · Arvid Engström

Received: 23 November 2011 / Accepted: 28 September 2012 / Published online: 26 September 2013
© Springer-Verlag London 2013

Pers Ubiquit Comput (2014) 18:737–752
DOI 10.1007/s00779-013-0703-x

A. Weilenmann (✉) · R. Säljö
University of Gothenburg, Gothenburg, Sweden
e-mail: [email protected]

R. Säljö
e-mail: [email protected]

A. Engström
Mobile Life Centre, Stockholm University, Stockholm, Sweden
e-mail: [email protected]

Abstract In this article, we examine the practice of learning to produce video using a new visual technology. Drawing upon a design intervention at a science centre, where a group of teenagers tried a new prototype technology for live mobile video editing, we show how the participants struggle with both the content and the form of producing videos, i.e., what to display and how to do it in a comprehensible manner. We investigate the ways in which video literacy practices are negotiated as ongoing accomplishments and explore the communicative and material resources relied upon by participants as they create videos. Our results show that the technology is instrumental in this achievement and that as participants begin to master the prototype, they start to focus more on the narrative aspects of communicating the storyline of a science centre exhibit. The participants are explicitly concerned with such issues as how to create a comprehensible storyline for an assumed audience, what camera angles to use, how to cut and other aspects of the production of a video. We consider these observed activities to be candidate steps in an emerging mobile video literacy trajectory that involves developing a capacity to document and argue by means of this specific medium.

Keywords Mobile video · Camera phones · Video literacy · Media literacy · Video analysis · Science centres

1 Introduction

Media play a vital role in the development, storage and communication of human experiences and knowledge. Through history, media have taken different forms—all the way from Upper Paleolithic cave paintings via clay tablets, parchment rolls, printed books, to present-day smartphones and digital tablets, to mention but a few examples of powerful media with important social impact, each in their own time. The process of documentation through such inventions implies externalizing experiences and presenting them in visuographic form for others to share [8]. This creates what Donald refers to as an "external memory field", where information can be preserved over time and be accessible for later use. Through the capacity to develop technologies for preserving information and human experiences, the social memory of a society may expand without limits. Externalization of human experiences through documentary practices, in turn, relies on the use of inscriptions of various kinds: images, scripts, number systems, graphs and so on. Such inscriptions are often complex and have to be taught through explicit instruction. Human development, thus, is co-determined by such intellectual technologies, which, when appropriated, provide us with the means of representing and analyzing the world, and, of course, communicating what we have found to others.

Traditionally, the term literacy has been used to refer to the skills that have to do with reading, writing and the use of texts. However, recently, many scholars have argued that the developments taking place through digital media make it necessary to reconsider the nature of literacy and literacy practices in contemporary society, where texts no longer are as dominant as they have been in the past. Kress [24] argues that the screen has become an important arena for representing the world and that the screen affords different kinds of interpretive practices than texts. For instance, texts are read in a linear fashion, while screens are not read in this manner. Jenkins [19] uses the term transmedia literacy to refer to the interactions between media that characterize many activities, where narratives and stories emerge "across multiple media platforms with each new text making a distinctive and valuable contribution to the whole". Jenkins [20] further discusses how to foster new media literacies and highlights participatory skills such as appropriation and performance, besides being competent in searching, reading and judging media content. Lemke [27] suggests that transmedia literacy practices are now essential elements of how people engage in meaning making in many situations.

The concept of media literacy, whatever specification one prefers, is an interesting point of departure for analyzing how people appropriate technologies for purposes of meaning making and participation in media practices. Media technologies—paper and pencil, typewriters, cameras, video, computer software of various kinds—make it possible to represent and communicate about the world in increasingly complex manners. At the same time, mastery of such tools presupposes that one has experience of how they may be used to design messages. In the context of classical literacy, for instance, one has to learn the alphabet and grammatical and other principles of how to construe intelligible messages. As one becomes more advanced, the learning trajectory has to include insights into genres and audience expectations; a scholarly piece of writing has to be designed on the basis of principles that are different from those that apply to a fantasy story or a piece of news. To be put in an authoring position—i.e., to act as a designer of media messages—generally requires more knowledge than to be a consumer. It is easier to read a novel than to write one, and it is easier to watch a cartoon than to make one. As Gillmor argues, "being literate in today's world means more than just smarter consumption, however actively you do that. Being literate is also about creating, contributing and collaborating" [15, p. 60].

The aim of this article is to explore such learning trajectories that concern what we refer to as mobile video literacy. Mobile phones are ubiquitous, and they are becoming increasingly powerful when it comes to documenting and even editing videos. Given the availability of such resources, it may be expected that they will be used quite widely in many settings, for instance, in the context of education and during leisure activities. Recording and editing, in our opinion, may be seen as examples of media literacy practices that have to do with how to design messages suitable for different purposes. The skills that are relevant for such activities have to be learned, and such learning implies both mastery of the technologies and increasing one's familiarity with principles of how to represent events in a manner congenial with relevant genres of communication.

In the present article, we focus on one such new media practice. Our primary interests concern how new users develop skills when engaging with a new video technology. Based on a design intervention at a science centre, where a group of teenagers worked with a new prototype for live video editing on mobile phones, we discuss the challenges involved when learning to use this new technology. Being novices to both the technology and the setting they are in, the participants struggle with designing a comprehensible documentation and rendition of the activities in the science centre.

1.1 Camera phones and video work

Following the proliferation of camera phones and digital cameras, video and photography are now readily available and easy to share in many situations. This has led to new practices of documentation and sharing of experiences as they happen. 'Camera phones enable an expanded field for chronicling and displaying self and viewpoint to others in a new kind of everyday visual storytelling' [33, p. 17]. While the body of work on the use of digital cameras and smartphones is expanding in the research literature, we still know relatively little about the actual work that goes into creating and sharing visual experiences with these devices, and about how such skills develop. It has been noted that camera phones 'have yet to be given the attention they deserve by researchers (…). For example, if one compares research done on, or involving, camera phone videos with research done with/on Flickr, YouTube and/or Facebook, it is striking how little has been done on the former, which is an enormously important global phenomenon' [7, p. 96]. Similarly, it has been argued that there is a lack of empirical studies of the editing of visual images in specific settings [14].

The present study is concerned with amateurs being introduced to a new technology. While video editing skills amongst professionals have been investigated in detail (e.g. [6, 12, 25, 31, 32]), there has been less attention paid to the appropriation of these skills amongst novice users [11]. With the development of new video recording and editing technologies, the possibilities for amateurs to do editing work are increasing. Basic video recording and editing software is now included in most mobile phones, making video tools accessible for non-professional users. These phones also allow for easy online sharing and thus for the straightforward dissemination of videos.

Kirk et al. [23] show how video production is an explicitly social process in all aspects of its use, and how its end use is a key driver in video production. For 'lightweight' devices, such as camera phones (as opposed to 'heavyweight' video recorders), video is typically captured spontaneously, shared in the moment and primarily meaningful in the context of that shared experience. Editing after the event is regarded as cumbersome and happens very rarely, which opens up a design space for more live media editing and sharing tools. Lehmuskallio et al. [26] confirm the differences between the 'lightweight', unplanned activity of shooting with a camera phone, on the one hand, and, on the other hand, the use of more 'heavyweight' camcorder and editing practices. They further suggest that mobile video practice is more closely related to snapshot photography than to traditional videography and filmmaking. David [7] reports similar observations of everyday video practices and their consequences, noticing the spontaneous character of video production:

    Repositories of digital pocket videos often tell stories that feel like old spaghetti western films. Most of the home camera phone videos lack dramatic action. While the spectator just wants to see what will happen next, the takes are long and the desert dry. Camera phones are an appurtenance of everyday life, which we rarely storyboard. The images so produced, therefore, tend to be spontaneous, at least in their content. (ibid., 95–96)

The properties and affordances of live streaming video have been explored in diverse contexts such as groups of friends [34], visual production in nightclubs [10] and emergency response work [4]. Following the emergence of early online services for live broadcasting from mobile phones, the content on these services has been studied [9, 12]. These studies point to some early trends and user practices, but at the same time, they reveal that we are dealing with an immature medium, as it were, where the relevant literacy practices are not yet stable. All but the most advanced users struggle with capturing and presenting engaging content in a timely manner while filming live.

Editing and collaborative production have been suggested as means to overcome this and to communicate the skills relevant for producing more engaging visual stories suited to the needs of various practices and settings. Iacucci et al. [18] provide an early investigation of how camera phones are used to enhance a shared spectator experience. They emphasize how spectators co-experience events in groups and how mobile imaging can be a participatory practice enhancing the spectator experience on-site, rather than merely documenting it. Other work with similar aims has drawn on socially produced video to produce an enhanced post hoc record of a live music event [22, 38]. Jokela [21] presents an early prototype and design principles for editing on a mobile device. The collaboration around the mixing of mobile video feeds has been explored in prototype designs and live productions, revealing new affordances as well as design challenges when transferring professional production methods and tools to non-professional users [3, 10, 11].

As pointed out above, live video and editing facilities add to the possibilities of collaboration around a mutually shared final product [11]. The live feature allows sharing in the moment, but at the same time, the access to a live video feed leads to an added possibility of sharing events outside of the local context. Designing experiences for a distant audience, as we will show, involves considering the awareness of the gaze of others:

    Camera phones makes (sic) ubiquitous visual access to others possible. In other words, the gaze of others is always present as a potentiality, leading to a heightened sense of visual awareness and a growing centrality of images in the ongoing social exchanges of everyday life. [33]

This awareness of a presumed audience has a number of consequences for the camerawork. Engström et al. [11] argue that 'Video technologies support a type of collaborative gaze in which camera users act as proxy viewers on behalf of someone else: the eventual viewer of broadcast content'. They call this 'mediated looking' and define three different forms or positions from which to look: (a) looking editorially (selecting what shot to choose), (b) looking together as a team of camerapersons and (c) looking on behalf of others. The latter category involves not just looking as would a presumed audience but includes activities when the cameraperson is looking on behalf of the editor. Licoppe and Morel [29] specifically bring out the affordances of the mobility of the device: how everything within the video stream is available for scrutiny and therefore can be potentially relevant to the ongoing conversation.

In many of the previous studies of the work that goes into editing video or selecting shots to display for an audience, live or otherwise, the focus is on professional settings, with a professional camera crew. In this study, as in that of [11] above, we focus on novice users of this technology in an informal learning environment. As argued in the introduction, new forms of literacy will involve more advanced forms of participation and production of media; thus, we need to move on to study contexts beyond everyday usages of these media technologies. One such context, which this paper is concerned with, is informal learning environments, more specifically science centres. As we will show, such a setting allows for an engagement with the content of the exhibits using the technology, in order to create a comprehensible narrative from the visit. The participants are not only new to the technology; they are also new to the setting they are recording. This makes their task quite different from that of a professional team, and it also makes it different from everyday, amateur usage of this medium.

Our point of departure is to analyse 'video as a practical accomplishment' [32]. Following this, we take 'an analytical interest in the way in which coherent images are assembled, as well as the way in which interactional order—as it is witnessable, accountable, and intelligible for members (not only researchers)—is a social accomplishment made possible through technological resources within the social practices of video recording and video editing' (ibid., 69–70). In doing so, we focus not so much on the finished product, the film that the participants produce, but rather on the work behind the scenes, the video production practices. It has been noted that the common approach to studying media productions is to base the analysis on 'the finished products, rather than the processes through which these products were assembled. Consequently, the design rationales of the original composers of the objects are not directly available to researchers, but have to be inferred' (Greiffenhagen in press). In our analysis, we intend to make some of these practices explicit. In addition, we argue that this set-up is particularly useful in rendering available for analysis the development of skills needed to use this new video technology in productive manners.

2 Design intervention: the use of mobile video at a science centre

As part of a larger study on the use of mobile technologies in science centres and museums, we experimented with mobile video as a means of giving visitors new ways of documenting, experiencing and sharing their visits. Below, we briefly explain the study set-up and the technology.

2.1 The study

The Mobile Vision Mixer, developed by Mobile Life Centre as a research prototype, provides a real-time multimedia environment through which mobile users can collaborate to make and broadcast live video using only their mobile phones. In this multimedia environment, several mobile cameras record, while a director monitors the different video feeds on a mobile phone screen and selects which one to show 'on air'. The final, edited output video can then be viewed directly online or accessed on a later occasion.
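The paper does not document the prototype's implementation, but the division of labour it describes (several phones streaming, one director cutting between feeds, a single broadcast output) can be illustrated with a minimal sketch. All names below (CameraFeed, MobileVisionMixerSketch, cut_to, tick) are hypothetical and introduced only for illustration; the real system streams live video over a network, whereas the sketch only captures the role division between camerapersons and the director.

    from dataclasses import dataclass, field
    from typing import Any, Dict, List, Optional, Tuple

    @dataclass
    class CameraFeed:
        """One camera phone streaming frames to the mixer (hypothetical model)."""
        camera_id: int
        frames: List[Any] = field(default_factory=list)  # stand-in for incoming video frames

        def push(self, frame: Any) -> None:
            self.frames.append(frame)

        def latest(self) -> Optional[Any]:
            return self.frames[-1] if self.frames else None

    class MobileVisionMixerSketch:
        """Director's side: monitor several live feeds and choose which one is 'on air'."""

        def __init__(self, feeds: List[CameraFeed]) -> None:
            self.feeds: Dict[int, CameraFeed] = {f.camera_id: f for f in feeds}
            self.on_air: int = next(iter(self.feeds))    # start with the first camera
            self.broadcast: List[Tuple[int, Any]] = []   # the edited output programme

        def cut_to(self, camera_id: int) -> None:
            """The director selects which input feed goes into the live broadcast."""
            if camera_id in self.feeds:
                self.on_air = camera_id

        def tick(self) -> None:
            """Append the newest frame of the selected feed to the broadcast output."""
            frame = self.feeds[self.on_air].latest()
            if frame is not None:
                self.broadcast.append((self.on_air, frame))

    # Example: four cameras, the director cuts from camera 1 to camera 3 mid-take.
    cameras = [CameraFeed(i) for i in range(1, 5)]
    mixer = MobileVisionMixerSketch(cameras)
    cameras[0].push("wide shot of the exhibit")
    mixer.tick()
    mixer.cut_to(3)
    cameras[2].push("close-up of the face scanner screen")
    mixer.tick()
    print(mixer.broadcast)  # [(1, 'wide shot ...'), (3, 'close-up ...')]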

A group of seven teenagers, aged 13–17, were invited to Universeum, a large science centre in the city of Gothenburg, Sweden, to participate in an evaluation of the Mobile Vision Mixer. In the data used here, participants were asked to focus on one specific exhibit in an exhibition called CrimeLab. This exhibit focuses on the mechanisms behind face recognition technology. Inside the exhibit, the person stands in front of a face scanner, which scans the face and then informs him/her when the scan is complete. The person can then move to another room, where a reader is located and the system will recognize the face. Upon recognition, the participant gets access to the room—the face is thus used as a key. In this way, the activity called for a certain type of sequentiality, involving several steps at two different locations within the exhibition area.

Participants were asked to make a short video, using the Mobile Vision Mixer, to explain the exhibit to their classmates who had not visited the exhibition. No other instructions on how to go about making the video were given, except briefly explaining how to manage the prototype. Participants' output videos could be viewed live online.

This study has dual aims: one is to evaluate this prototype and to receive feedback about its design as part of a larger study on mobile live media [10–12], and the other is to place this particular technology in an informal educational setting to allow for a discussion of its potential benefits as a documentary practice to be used for instructional purposes. The latter aim is the focus of this paper.

It should be noted that the prototype during the evaluation presented users with challenges that are common in field testing of prototypes; there were several issues with the reliability and stability of the system. For instance, it turned out that the system did not work entirely satisfactorily indoors because of a slow connection to the servers, which were located elsewhere. This meant that during the evaluation, the system had to be restarted several times, and the participants had to wait. The participants used this time to discuss strategies for making the videos; these discussions were recorded and are part of the analysis. The most relevant technological problem to mention is the delay in the video being displayed on the screen of the editor: there was a lag before the content currently being recorded by the cameraperson became visible on the editor's screen. This proved to be a challenge for the editor when selecting between the feeds.
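To make the consequence of this delay concrete, the following hypothetical fragment shows why the editor's cut is always based on slightly stale footage. The constant DELAY_SECONDS and the function editor_preview are invented for illustration; the actual magnitude of the delay is not reported in the paper.

    # Hypothetical illustration of the preview delay: the frame the editor
    # sees lags behind what the camera is capturing at that moment.
    DELAY_SECONDS = 2.0  # assumed network/server round-trip; not reported in the paper

    def editor_preview(timestamped_frames, now):
        """Return the newest frame that has already reached the editor's screen."""
        visible = [(t, f) for (t, f) in timestamped_frames if t <= now - DELAY_SECONDS]
        return visible[-1] if visible else None

    # At t = 10 s the camera is on a close-up, but the editor still sees the
    # wide shot captured at t = 7 s, so a cut made 'now' selects older content.
    frames = [(7.0, "wide shot"), (9.5, "close-up of the screen")]
    print(editor_preview(frames, now=10.0))  # -> (7.0, 'wide shot')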

Also, in field evaluations, it is common that the participants are positive towards the technology being evaluated, simply to please the researchers [5]. However, for our current purposes, it is not necessary to consider the extent to which the participants 'liked or disliked' the Mobile Vision Mixer as such. We are simply focusing on the practical accomplishments of the participants as they were struggling to use this imperfect and sometimes cumbersome new video technology.

The evaluation at the science centre was carried out during a single day. However, a large body of material forms the background to the particular study presented here. First, an extensive period of fieldwork, including observations at this and other exhibits in the science centre, carried out by the first author and two students, precedes this data collection. Second, regarding the technology, complementary evaluations of the same prototype were made during one full day at a large skate park. Skating, being more of a leisure activity, allowed for an interesting comparative case. Third, the prototype has a number of predecessors, which have been evaluated and reported on elsewhere [10, 11]. In all, this background material allows for a rich understanding of the different ways in which novice users dealt with the technology in this particular setting (Figs. 1, 2, 3).

The group members were allowed to negotiate what roles to take. They worked in pairs—two persons for each camera phone and two persons as editor, handling one camera. They agreed that the boy (the younger brother of one of the girls) should be the assistant, a term they coined for the person interacting with the exhibits while the others documented it. The reason for the set-up of having two people collaborating around each phone was to get access to their negotiations.

2.2 Methodological approach

In order to get a closer look at what constitutes 'mobile video literacy' and its trajectory, we chose a micro-oriented approach to see how video literacy practices emerge and are negotiated as ongoing accomplishments. For these purposes, we rely upon the form of video analysis which has its roots in ethnomethodology [13] and Conversation Analysis [35]. Being well-known perspectives in much previous work on video in professional practices, these approaches allow us to focus on video as a situated practice and accomplishment [32]. Rather than focusing on video literacy as a larger concept and a competence which develops in a population over time as it becomes more used to video technologies, we focus on video literacy as a 'members' concern', as an interactional achievement in the activity they are currently engaged in. The participants are explicitly concerned with such issues as how to create a comprehensible storyline for an assumed audience, what camera angles to use, how to cut and other aspects of the production of a video. What this means methodologically is that these activities are observable in the ongoing practical achievement of making these videos, and it is also possible to scrutinise how these skills develop on a micro-level. By focusing on one instance of a group exploring a new device, we can follow in detail how they develop, in a sequence of repeated shootings, competence in terms of using the technology to present content.

In some sense, then, we are doing video analysis of video analysis, in our focus on members' work with this visual technology:

    This fuels an interest in a praxeological analysis of ordinary and professional video practices, and of videos as locally organized accomplishments. Moreover, looking at video as practice reveals the skilled glance on social interaction which is embodied in looking through the camera: video-makers' local orientation to the organizational features of interaction is exhibited in the very way in which they shoot, arrange and edit the video [32, p. 68].

Fig. 1 The editor (to the right) is looking at the display of her phone while making the selection between the four different input videos. In the background, we see one of the camerapersons (left) and the person doing the test, the assistant

Fig. 2 The mixer version with the four different live video streams that the editor can choose from

Our video analysis renders visible the participants' analysis, their displayed understanding, of the exhibit. In the process of organizing the video work, and negotiating it continuously through shootings, retakes and so on, the participants display their understanding of the content of the exhibit and what is relevant to present to others. In this article, we are not focusing on how the participants discover features of exhibits (cf. [39]), since they have already explored the basic functionality of the face recognition exhibit before making the shootings. Rather, we focus on how they display their understanding of the exhibit in the videos they make of the exhibit.

We rely on two sets of videos: those recorded by us as researchers, and those recorded by the participants in the activity. As pointed out by Mondada [31], this is an important distinction since 'video is not a transparent document but an embodied accomplishment, integrating the recording and the analysis of the recorded event' (ibid., 60). In the videos recorded by us as researchers, we made a selection of what was relevant to focus on in each particular situation, and in this selection, our understanding and analysis of the event were displayed. It is, however, the participants' understanding, displayed in their work of making selections of what to shoot, and the assessments of previous shootings as well as negotiations about upcoming shootings, that is in focus in this article.

3 Unpacking mobile video literacy

Drawing upon previous work, we have tried to argue that new forms of literacy skills will emerge as new media technologies are introduced and taken up by users. In the following, we will examine in more detail the ways in which such video literacy practices are negotiated as ongoing accomplishments. By focusing on one particular example from a group of visitors to a science centre exploring a new system, we can follow in detail how the participants deal with the challenges they encounter and become more competent in terms of using the technology to present the content of the exhibits in the science centre.

The idea behind this design intervention is that participants must, in order to make a comprehensible video, to some extent understand both the technology and the content they are presenting, and this is visible in how they negotiate the use of this new mobile technology (cf. [40]). These negotiations were verbal and gestural, and relied upon their placements within the exhibit as well as material resources. Working as a camera team, the task of the camerapersons is to provide useable and complementary footage to the editor, whose role is to assemble the individual shots into a compelling sequence. Broth [6] describes the reflexive work of the camerapersons and editor as proposal-acceptance, where the default is for each cameraperson to propose quality shots, communicated non-verbally through their visual output.

Fig. 3 The facial recognition exhibit at Universeum

Through a number of excerpts from the data, we will show how the participants are (a) negotiating the storyline, (b) negotiating the performance of the actions to be recorded and (c) negotiating the camerawork so as to capture the actions in a relevant and comprehensible way. We present three excerpts from a longer piece of interaction where participants (a) first plan how they work to produce the recording, (b) then capture a video sequence, (c) evaluate it and, finally, (d) repeat steps (a) and (b) a second time.

3.1 Negotiating the storyline

In this example from the design intervention, the person performing a test of the face recognition system, here called the assistant (whose face is visible in Fig. 1), is talking to the cameraperson responsible for documenting his activity. The following conversation occurs before the test, when they are discussing how to coordinate the work between the camerapersons. Here, we can see how the participants struggle with the issue of how the story of the exhibit should be told (the form) at the same time as they are deciding what to show (the content).

Previously, they have agreed that the assistant¹ should do what they call 'the experiment'; i.e., use the face recognition system, first to scan his face and then move on to use it to open the door to the 'hotel room'. However, the assistant is not clear about what to do once he has entered the room—'What the hell should I do when I get in', he asks (lines 100–101). The cameraperson's answer 'then there will be someone filming you' displays a focus on the form, the presentation, but not an awareness of the fact that there has to be 'something' to film, an element of a story. The assistant therefore reinforces his point, asking what he should do 'in there' for something more to happen in the storyline. The cameraperson, however, does not provide such a next event. Instead, she says that he can 'chill' because 'then it's done then you are in' (lines 108–109). When the assistant is 'in' the room, they have reached the end of the sequence, the storyline that they are working with, and nothing more needs to be recorded.

The cameraperson's final remark that 'it is the experiment that we are going to show, not that you are walking into a room' displays an orientation to the story they are creating, and to the final product of their work. The result should focus on the experiment, as she calls it, which is the procedure of testing the face recognition exhibit. However, in displaying the exhibit, the act of 'walking into the room' has to be made part of the storyline. In fact, it is crucial as it is the final part of the procedure and thus the end, or 'punchline', of the storyline.

However, as it turns out later, after they have performed and recorded a first sequence, the assistant actually asks a reasonable question when wondering what to do upon entering the room. They encounter a problem in the creation of a narrative around this exhibit because the actual entering of the room goes too quickly and is perceived as uneventful. Performing 'getting into the room' has to be done more slowly so as to render it a filmable activity. This will be discussed in the following section.

Excerpt 1 The assistant (to the left) outside the hotel room, being filmed by Cameraperson 4. Notice Cameraperson 3 behind the glass wall, glancing towards the others, ready to film inside the room as the assistant enters. A Assistant, C4 Cameraperson 4

¹ This is a term the participants themselves used to describe the person who was interacting with the exhibits while being documented.

3.2 Negotiating how to perform the actions to be recorded

The following conversation takes place after the recording of the first trial at the face recognition exhibit. The participants briefly assess the work that was just done and agree that the actions were performed too quickly to be acceptable. They decide to have another go and discuss strategies for improving their work the second time.

Capturing the full sequence of the facial recognition involves moving over the exhibition space, from the small room where the first system is placed, to the entrance of the 'hotel room', where the face is used as a key to open the door. The participants are consequently encountering the problem of capturing the mobility of the assistant as he moves between these two locations. Capturing his movements presents some challenges. Right after the first sequence has been shot, one of the editors says to one of the camerapersons that 'it almost feels like you didn't really follow' (lines 201–203). She thereby verbally shares an evaluation of what was made available on the mixer screen. The assistant comes out of the room, the cameraperson is told to come out as well, and they all gather around to discuss how the shooting went.

Excerpt 2 Cameraperson 4 makes a quick vertical movement with her phone, mimicking the short period of time she had her phone up to record the assistant's actions. E1 Editor 1, E2 Editor 2, C2 Cameraperson 2, C4 Cameraperson 4, CX unidentifiable cameraperson, X unknown speaker

Cameraperson 4 says that 'it went very fast' and does a quick vertical movement with her phone, as if again performing the video recording. In the quickness of this exaggerated gesture, she displays the short period of time that she needed to do her part of recording the assistant's movements. Cameraperson 2 agrees and also uses her body as a resource to visualize the procedure she went through to capture the video. She does a restrained form of running-on-the-spot, holding the phone up as if recording. She suggests that 'he should walk a bit slower next time' (lines 215–216).

The discussion then moves on to negotiating how the performance should be made when doing another sequence. All participants go back to their positions again. The editor discusses with one of the camerapersons how to make the recordings. This involves showing 'all the steps that happen', as Editor 1 explains to the camerapersons. Before counting down to start again, it is emphasized once more that the event should unfold slowly.

To sum this example up, we have seen how the participants, after having filmed a first sequence, are negotiating the pacing of the performance, i.e., how quickly the actions should be carried out in order to be comprehensible. As a result of the assistant performing the walking too quickly, the cameraperson's work has to be done quickly as well, something that they discuss as a problem. So the negotiations of the actions to be recorded are tightly interwoven with the bodily performance of the camerapersons, who have to keep up with the actions and capture them, something that could analogously be called the pacing of camera movements. A trained camera operator would typically rehearse the camera movements at the same time as the performer rehearses their movements. But to the novices seen here, this is an emergent feature that is not evident before engaging in the camerawork. Although arguably not a part of their everyday understanding of video recording, these skills are attainable and begin to develop in just a few repeated attempts, as the participants engage with the technology in this setting. The awareness of one's own movements as a cameraperson, and the ability to plan and negotiate them in parallel with the covered action, is part of becoming video literate in the broader and more involved sense we argue for here.

3.3 Negotiating the camerawork: taking the audience perspective

As was discussed in the background section, 'camera phones make ubiquitous visual access to others possible' [33, p. 17], and this implies that 'the gaze of others is always present as a potentiality' (ibid.). When capturing a video, there is an awareness of the potential of sharing it. In the scenario at the science centre, the participants were told to imagine an audience of their classmates, who were not present but who would watch this video remotely and be able to make sense of the exhibitions from the video presentations. The participants' orientation to such an imagined audience is visible in their negotiations of the video work. In this section, we focus on this negotiation prior to broadcast, and how it is sequentially organized to provide the editor with a useable multicamera set-up. The detailed work of the ongoing camerawork and live editing is not covered here (cf. [6, 11]).

How, then, is this orientation to an audience visible in the material? In order to understand the following example, there is a need to briefly explain some details about Swedish pronouns. Here, the participants are switching between using du (singular second-person you—tu in French), ni (plural second-person you—vous) and man (generic you—on). This means that when the cameraperson says what is translated into English as 'You have to make sure that you see the screen', it is not ambiguous to the participants, as it is in English, what these two instances of you are referring to. This is of analytical relevance, since the selection of pronoun displays an orientation on behalf of the cameraperson to a presumed audience: her taking the perspective of an outside 'you' watching the final product.

The sequence begins with general instructions on how the recordings should be made, e.g., 'you (singular) should film so that one sees clearly what he is doing then'. Later on in the sequence, we see how she moves on to giving more direct feedback on the actual camerawork, telling the cameraperson to adjust the view and asking her to film from another angle. The pictures in the excerpt are taken from the camera video made by Cameraperson 1.

Excerpt 3 E Editor 1, C1 Cameraperson 1, X unknown speaker, FRS voice from face recognition system


In this conversation, their orientation to a presumed audience is visible in the repeated use of formulations like third person 'man' ('one') in Swedish, e.g., 'otherwise one doesn't get the whole idea'. A salient example of how the main editor displays an orientation to the audience, and her looking on behalf of the audience, is found on lines 209–213. Here, the editor starts saying 'you have to s—' (singular you), then interrupts herself and reformulates it as 'one has to see'. In fact, both are relevant formulations, as the cameraperson, in order to make it available for an audience, also has to see it herself. Also, this 'one' is not just the audience but includes the editor doing the looking on behalf of the remote audience (cf. [11]).

In this excerpt, in contrast to the previous ones, we focus on the video streams provided as responses to the editor's verbal instructions about the recordings when preparing to begin the shooting. In a series of photos from the video, we see how the cameraperson presents a set of interpretations of the editor's instructions. In each shot here, she tries out a new angle or zoom level that the editor can then reject or accept. As Ivarsson ([17], p. 178) points out, 'Similar to how a filmmaker can guide the visual attention of a viewer, by zooming in on an object, the zoom can become a conversational move in a situation like this'. By simply looking at the pictures in this excerpt, we see that the focus moves from capturing the person interacting with the exhibit (the assistant) to a close-up of the screen next to the face recognition scanner.

So there are clearly some tensions in the views on how to best capture the face recognition system to make it comprehensible to an audience. The cameraperson begins by filming the person from the front; visible behind him is a sign with information about the exhibit. This is responded to by the editor with 'one has to see what's going on'. This is the same, relatively vague, formulation that she used previously. The cameraperson now pans over the other exhibit sign behind the assistant (not visible in the pictures above), suggesting that this sign should be in the shot as well. The editor disagrees and now formulates herself more precisely: 'you (singular) have to film what they do on the screen'. Emphasizing 'screen', she displays that this is the interpretation of how to capture 'what's going on' as part of a storyline, rather than merely showing the sign, as the cameraperson suggested.

The next candidate for framing is then proposed [6] by the cameraperson (Picture 2), recording the assistant's back and parts of the screen. The editor is not happy with this solution either, emphasizing 'no', and continues to say that 'you (singular) have to see what's on the screen and that' (lines 235–236). Picture 3 is then provided by the cameraperson as a candidate stream, with the screen being visible alongside the face recognition scanner. The editor continues her phrase, explaining why the screen needs to be shown: 'or else one won't get the whole idea'. Here, she clearly displays an orientation to a storyline and a presumed audience, who needs to have certain visual material presented to them in order to 'get the idea' of this particular exhibition. There is then a short sequence when the editor seems to lose her patience with the giggling cameraperson and tells her to focus. The final candidate shot (Picture 4) displays a close-up of the screen, and the editor seems to approve as they agree to finally begin the shooting.

To sum up, in this example, we have seen that there is a specific sequencing of the turn-taking involved in arranging the video streams. The cameraperson proposes different candidates for shots (Pictures 1–4 in the excerpt), receives feedback from the editor and then rearranges the camera until they agree on a framing. Then the recording can begin. In these stepwise instructions, the editor is verbally formulating how to present the visual material, and in doing so is orienting to the storyline and the audience.

In this example, the negotiation concerns the planning of one cameraperson's camerawork. When this is in place, the editor assesses that the camera team is set up to perform the broadcast. The final set-up, with four distinct, complementary shots covering the action being proposed to the editor, is shown in Fig. 4 (right). By contrast, the left image (Fig. 4) shows the set-up before the negotiation, giving the editor less optimal footage to use in editing the story. This is one example where we can see their skills as a collaborative team evolving during the course of the trial.

4 Discussion

We have explored issues of how content and form are interrelated when doing video work, illustrating some features of the process of learning to engage in this practice of producing a video. We consider these observations as candidate steps in an emerging 'mobile video literacy' trajectory, i.e., an emerging capacity to document and argue by means of a specific medium. The notion of mobile video literacy is developed further below. We then discuss the communicative and material resources relied upon by the participants in the video work, and move on to discuss how the visual elements are instrumental in the creation of the storyline, and how parallel temporalities and the mobility of people and the technology provide challenges but also resources for the participants when producing a documentation of the exhibit.

4.1 Mobile video literacy

In this paper, we take the concept of media literacy as a point of departure for analysing how people appropriate technologies for purposes of meaning making and participation in media practices. As media technologies, like the Mobile Vision Mixer studied here, are becoming more accessible, it becomes apparent that something more is needed than mere access to the tools; they also have to be skilfully put to use for certain purposes in certain contexts. Also, these technologies often reach out to audiences beyond the current situation, allowing for online participation with people in other contexts. In this way, a mobile video literacy involves an attention to creating a narrative and a storyline that is comprehensible beyond the current context. This was evident in our study when the participants oriented to their imagined audience, producing videos that displayed an understanding of what was a 'filmable' event and what was not.

Producing something is one part of the participatory process involving media technologies like this one. Gillmor [15] argues that the term media literacy should move beyond connotations of smart consumption and incorporate participation and collaboration practices. In this study, we have seen how the producers are continuously displaying an awareness of the consumers, for instance, when discussing angles that will be understandable to an audience. In doing so, it is reasonable to assume that they draw on their everyday experiences of viewing visual media (what would be considered media literacy in the traditional sense), but combine these experiences with working out how to practically produce such viewable content in the situation at hand. The combined activity, in the live production situation, is a process of developing media literacy in Gillmor's more participatory sense.

The element of liveness as a part of the media format enables engagement and communication around the content between local and remote participants. This media format allows for a process that is less linear than in more traditional uses of video, where a video is shot in one context, edited and shown to an audience at a later point in time. Here, because of the live editing and broadcasting, the production phase differs radically. This allows for two forms of collaboration around the context being documented: the local collaboration around the production, which we have focused on in this paper, and collaboration with an audience watching the product as it is being broadcast. Thus, the product (i.e., the live broadcast) is a starting point, not an end result. In a related study from another museum, we saw how live uploading of images from the museum was responded to with comments through social media channels, which then effectively changed the material the participants uploaded [41]. While we did not include the consumers of the material being produced in this study, the scenario would easily lend itself to this type of interaction around the produced content.

The example shows how a cameraperson interprets the editor's broad direction to let their viewers 'see what is going on' and tries out camera movements that could guide the viewer, camera moves of a more conversational [17] nature. Developing such skills through practice is arguably another part of a new video literacy, as they acknowledge the product—the live video broadcast—as a starting point and a topic of conversation rather than a finished product to be consumed passively.

In the final product, the production work is (or should be) invisible. Today, many people are used to consuming the products of video editing, but they are not necessarily familiar with the process it takes to get there. Novices encounter challenges involving how to coordinate the recordings without being heard or seen in the final product, how to choose between different angles, when to zoom and how to design a presentation for a specific purpose. They encounter situations that are crucial for producing live video, e.g., mobility and temporality issues (see below). Skills of these kinds have obvious similarities with those that have to be developed in the context of literacy; one has to learn skills of analysis and composition but also a range of skills that relate to specific technologies. We saw how our participants realized certain problems only when they encountered them as part of the practical accomplishments of doing the video work. Thus, mobile video literacy emerges and develops when engaging with these media tools.

Fig. 4 a In this picture, we see the four streams provided by the camerapersons to the editor before negotiating the set-up. b The four streams provided by the camerapersons after the discussion in Excerpt 3, right before beginning to shoot the sequence. The two top pictures have now been readjusted

Although the Mobile Vision Mixer is a prototype in development, the types of systems that we are discussing here could be one way to support an integration of media literacy practices in educational settings. As argued by Lewis et al. [28], "[w]hile educators have harnessed the web to develop formal e-learning platforms, many are struggling to unleash the power of social media to support learning. In part, this is due to perceived difficulties in integrating its emergent fluid forms and meanings into highly structured learning environments" [28, p. 4]. In this way, the media literacy we are describing here poses challenges to the educational system.

We have contrasted broader approaches to media literacy with an update of the term that accounts for the current shift in digital media towards involvement, sharing and production practices [15]. We argue that as the notion of literacy shifts towards participation and the ability to produce media content, rather than just consuming it, and as the tools for production become more powerful and diverse, the skills needed to participate will be increasingly medium specific. We aim to contribute to the understanding of one such medium-specific literacy and how it may evolve. In the following sections, we outline some of the elements of what constitutes a mobile video literacy, as one part of the broader media literacy described in the literature. We draw on this new approach to media literacy and our present material on the use of tools for live video production, and describe three dimensions—visual storytelling, parallel temporalities and mobility—that are key in attaining a mobile video literacy.

4.2 Creating a storyline using visual resources

We have seen how issues arise around what to document and how to do it when trying to produce a relevant storyline of an exhibit. The participants worked to place themselves in the storyline of the exhibit, both in terms of performing actions in a timely and relevant manner, and in terms of camerawork to display the content in a coherent way. The Mobile Vision Mixer is a visual technology, so clearly the participants mainly rely upon visual resources when creating their storyline. There is a challenge involved here in the sense that the visual production has to be formulated verbally when negotiating how and what to film. We have seen how such verbal formulations as capturing 'what's going on' have led to some confusion, which is resolved by monitoring and revising the visual output stream from the camera. In other words, the participants resort to the verbal level in order to negotiate and overcome obstacles.

Interestingly enough, the participants do not introduce any verbal commentary to explain what is shown in the video. However, there are discussions and frequent reminders about not talking during the recording, since this will be heard on the video. So the end result is a silent movie, where all that is heard is the face recognition system's output (and some noise from nearby exhibits and other visitors).

Thus, the storyline is produced using visual resources only. The storyline could have been created with a voice-over, a verbal narrative. However, the participants did not choose to add that in the material we present here. Neither before nor after any of the retakes of the face recognition sequence is there any discussion about whether to include commentary or not. However, in another example from the study, not presented in this article, the participants agree that they need to verbalize parts of the exhibit during the recording. This happens when filming an exhibit which is in itself more text-based, a lie detector. Before the initiation of the recording, the cameraperson tells the assistant that they need to read out the questions the lie-detector system provides, and that the answers have to be spoken out loud as well. During the actual filming, she then reads the questions and has to remind the assistant, who is answering the questions, to speak out loud and not just respond silently by pressing the button. The button pressing will not be visible on the screen, so here she shows an orientation to an audience and a need to verbalize the otherwise invisible, textual elements of the exhibit. However, in none of the shootings with this group did the participants create a separate narrative voice-over to explain the content of the film.

There could be a number of reasons why this does not happen. First, the Mobile Vision Mixer is a visual technology; the most salient characteristic of a mobile video tool is clearly its visual elements, the moving camera footage. Since the technology was new to the participants, it called for complex visual work in order to handle the four live video streams and merge them into one. Second, creating a storyline involving a voice-over would create yet another work task, thus adding complexity. It would involve negotiating the content of the verbal narrative, some sort of script, which would have to fit the other actions taking place. It would also call for an extra role to be introduced, that of a narrator. Third, it would introduce yet another temporality to the video work: the talk would have to be timed with the actions taking place. The camera movements and the actor's (the assistant's) movements would have to be matched with the delivery and content of the commentary, further increasing the complexity. Presumably, this is a set-up that would require more practice and experience, not least in timing the verbal commentary with the other actions.

These are a few aspects of the creation of a visual narrative using a visual technology. With this particular technology, visual elements are particularly important, but, as noted, they are negotiated and handled verbally in the production process. Unpacking the storyline and making it appear clear in the edited video is one important element for the participants to learn.

4.3 Parallel temporalities

In the particular video technology used in this study, live transmission, as opposed to post-editing, is the main feature. The whole storyline has to be completed before assessment and, if considered necessary, before any retake. Two parallel temporalities are at work and have to be coordinated: that of the performed actions, constituting the storyline, and that of the camerawork, the recording of those actions. The participants have to consider the temporality of the performance at the same time as they consider their own movements to capture the events. Moreover, the recording is a collaborative achievement, which adds to the complexity and calls for coordination between the different camerapersons. This is the role of the directing. The directing cannot be done during the actual filming; it has to be done before the shooting. So before the recording begins, the director gives instructions on how the camera team should handle the timing and sequentiality of the events and the camerawork.
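
To make this live constraint concrete, the following is a minimal, hypothetical sketch in Python of how a live mix reduces several parallel camera feeds to a single output stream. It illustrates the general principle only and is not the implementation of the Mobile Vision Mixer prototype [37]; the feed simulation, the cut_plan structure and all names are invented for the example. The point it demonstrates is that cut decisions are applied frame by frame as the feeds arrive, so earlier choices cannot be revised afterwards, unlike in post-editing.

# Minimal, hypothetical sketch: four simulated camera feeds are reduced to one
# output stream by switching, frame by frame, to the currently selected camera.
# Illustrative only; not the Mobile Vision Mixer code.

import itertools
from dataclasses import dataclass


@dataclass
class Frame:
    camera_id: int     # which phone produced the frame
    timestamp: float   # capture time in seconds
    payload: str       # stand-in for the actual image data


def camera_feed(camera_id, fps=25):
    """Simulate an endless live feed from one camera phone."""
    for n in itertools.count():
        yield Frame(camera_id, n / fps, f"cam{camera_id}-frame{n}")


def live_mix(feeds, cut_plan, n_frames):
    """Produce one output stream by cutting between feeds in real time.

    cut_plan maps a frame index to the camera selected from that point on.
    Because the mix is live, a cut affects only frames broadcast after it;
    frames already broadcast cannot be revised.
    """
    selected = 0
    output = []
    for i in range(n_frames):
        selected = cut_plan.get(i, selected)     # the editor's current choice
        frames = [next(feed) for feed in feeds]  # all feeds advance in parallel
        output.append(frames[selected])          # only one frame is broadcast
    return output


if __name__ == "__main__":
    feeds = [camera_feed(cid) for cid in range(4)]
    # The director's plan: start on camera 0, cut to camera 2, then to camera 1.
    cut_plan = {0: 0, 50: 2, 120: 1}
    for frame in live_mix(feeds, cut_plan, n_frames=150)[::50]:
        print(frame.camera_id, round(frame.timestamp, 2), frame.payload)

Running the sketch prints which camera each sampled output frame came from, showing that the storyline of the output stream is fixed at the moment of transmission.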

When they start to unpack the situation, after having done a couple of rounds of filming, the participants discover features of the complexity of these parallel temporalities. Dealing with parallel temporalities is thus a developing skill that we have seen emerge throughout these data. It is not until they have tried the technology that they get a sense of the potential problems in timing the performance with the camerawork. In the first shooting, for instance, the cameraperson did not manage to keep up with the assistant walking across the room. This led to a discussion about making a retake, in which the assistant was instructed to walk more slowly in order to be captured in a suitable manner. These challenges become apparent only when using the technology, after the embodied experience of having moved across the room to follow the assistant's movements. This links to another point we wish to make in the discussion: how the participants rely upon different forms of mobility, and how mobility constitutes a part of the visual literacy skills they need to grasp.

4.4 Mobility

Because of the mobility of all parts of this system, i.e., the capturing as well as the mixing camera phones, the participants can easily move around to document different parts of the exhibit. The placement and mobility of the camerapersons are crucial aspects of the documentation process. As the participants become more familiar with the technology, they realize that they need to distribute themselves across the exhibition area in order to cover different aspects of the event. The storyline of the face recognition documentation builds on the local mobility [2] of the assistant, who moves from the initial face scanning to the unlocking of the 'hotel room'. The assistant, who performs the procedures of the exhibit, has to move in order to carry out the different steps of the exhibition activity being filmed, and consequently the camerapersons have to be in different locations to document these steps. When placing themselves within the storyline of the exhibit, they are also placing themselves within the space of the exhibition area where the relevant parts of the storyline take place. Thus, when recording the entry into the 'hotel room', one cameraperson is placed inside the room and one outside, in order to capture the assistant's movements through the space of the storyline. This is something that evolved after a couple of trials. In the beginning, everybody stood next to each other, recording the same object, but after some time they started to discuss how to distribute themselves to capture the activity from different locations. This illustrates how the participants learn about some of the considerations that have to be taken into account when making a video.

Related to mobility is the timing of movements. Movements throughout the storyline have to be performed at a certain pace. The participants progressively fine-tune the movements of their bodies through space, as well as those of the camera. This is an embodied experience, where awareness of their bodily mobility is raised once the participants have tried a first round of filming. They then realize that the pacing of the performance has to be changed, and that the camerapersons need to keep up with that pacing in order to be at the right location, at the right time, to document it. In relating her experiences of the first round, one of the camerapersons also uses her body as a resource to show the challenge in this documentation process.


Another related aspect of the mobility of the device is the micro-mobility [30] of the camera phones, where we saw how fine adjustments of the camera angles were negotiated among the participants. The micro-mobility of the video recording devices allows a moment-by-moment assessment of what is visible on the screen, meaning that certain aspects of the local environment are made available for scrutiny [29]. In this case, the participants do not get feedback from viewers, who are not part of this study scenario, but they do receive feedback from each other. Primarily, it is the camerapersons who receive feedback on various aspects of the camerawork from the editors. This feedback can be given as the editor monitors the changes the cameraperson is making in real time, looking at, for example, a new angle as it is being produced. The editor can then ask the cameraperson to adjust the angle, in much the same way that participants in a Skype conversation were found to ask each other to adjust the camera angle to get a better view (ibid.).
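
As an illustration of this feedback loop, the sketch below shows one way the editor-to-cameraperson exchange could be modelled in software. It is a hypothetical example under our own assumptions: in the prototype studied here such feedback was given through talk and monitoring, and the AdjustmentRequest and FeedbackChannel names are invented for the illustration.

# Hypothetical sketch of an editor-to-cameraperson feedback loop: the editor,
# monitoring a live preview, sends short adjustment requests that the
# cameraperson picks up and acknowledges. Invented for illustration; not part
# of the prototype studied in the article.

from dataclasses import dataclass
from queue import Queue
from typing import Optional


@dataclass
class AdjustmentRequest:
    camera_id: int
    instruction: str          # e.g. "pan left", "hold the shot"
    acknowledged: bool = False


class FeedbackChannel:
    """A one-way channel from the editor to one cameraperson."""

    def __init__(self):
        self._queue = Queue()

    def send(self, request: AdjustmentRequest) -> None:
        """Called by the editor while monitoring the live preview."""
        self._queue.put(request)

    def receive(self) -> Optional[AdjustmentRequest]:
        """Called by the cameraperson; returns the next pending request, if any."""
        if self._queue.empty():
            return None
        request = self._queue.get()
        request.acknowledged = True   # the cameraperson has seen the request
        return request


if __name__ == "__main__":
    channel = FeedbackChannel()
    channel.send(AdjustmentRequest(camera_id=2,
                                   instruction="pan left to keep the assistant in frame"))
    pending = channel.receive()
    if pending:
        print(f"camera {pending.camera_id}: {pending.instruction}")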

5 Conclusion

Our observations illustrate some steps in the development of literacy skills that relate to a new technology for documenting, organizing and communicating information and accounts of events. We have shown how students as a group encounter problems that have to do with accommodating to the technology (recording and editing), with designing a storyline (i.e., providing an interesting rendition of the events documented), and with how to coordinate these two dimensions. In this learning trajectory, we see how they struggle with critical elements and develop insights and criteria for how the product of their work should appear. The resistance they encountered served as obstacles that generated negotiations, which, in turn, topicalized issues that had to be dealt with, including, for instance, how to make a segment of video intelligible and interesting to an audience. The discussions about difficulties make the problem definitions and solutions learnable for the participants. Participating in such practices, and in the co-occurring analyses, increases the likelihood that skills will develop and that learners will be able to make use of more complex affordances of the technology in question. Drawing on this approach to media literacy and on our material on the use of tools for live video production, we have described three dimensions that are key to attaining a mobile video literacy: visual storytelling, parallel temporalities and mobility.

As a learning task, the appropriation of these tool-mediated literacy practices relies on collective work and the sharing of experiences in situ. Most likely, it is also important to play different roles in such work in order to experience the production process from different perspectives and responsibilities. In comparison with more traditional learning tasks, an interesting feature of this task is that it has a distinctive performative quality [36]. Students are held accountable for producing a relevant rendition of an event rather than for giving back something that is already known, and they have to struggle with both form and content. This is a transformation of learning tasks in many settings that follows from the increasing use of digital technologies, where information is documented and readily available. The ability to produce informative renditions and productive accounts, of which mobile video literacy is one case in point, is the expected outcome of such learning. As Kress [24] points out, learning becomes 'design' rather than reproduction.

Appendix: Transcription notations

Based on Jefferson’s transcript notation, as related in [1].

Well Emphasis is indicated by underlining

e:hhh: Colon indicates prolonged segment

(0.3) A pause, timed in tenths of a second

(.) Pause shorter than one-tenth of a second

Overlap [] Simultaneous (overlapping) speech

- Interrupted speech

hhh Outbreath

.hh Inbreath

[what\ Spoken faster

�yes� ‘Degree’ signs enclose quieter speech

YES Capitals are spoken louder than surrounding

talk

wha- Interrupted, cut-off speech

References

1. Atkinson JM, Heritage J (1985) Structures of social action: studies in conversation analysis. Cambridge University Press, Cambridge
2. Bellotti V, Bly S (1996) Walking away from the desktop computer: distributed collaboration and mobility in a product design team. In: Ackerman MS (ed) Proceedings of CSCW '96. ACM, New York, pp 209–218
3. Bentley FR, Groble M (2009) TuVista: meeting the multimedia needs of mobile sports fans. In: Proceedings of MM '09, Beijing, China, 19–24 Oct, pp 471–480
4. Bergstrand F, Landgren J (2011) Visual reporting in time-critical work: exploring video use in emergency response. In: Proceedings of MobileHCI, pp 415–424
5. Brown B, Reeves S, Sherwood S (2011) Into the wild: challenges and opportunities for field trial methods. In: Proceedings of CHI. ACM Press
6. Broth M (2009) Seeing through screens, hearing through speakers: managing distant studio space in television control room interaction. J Pragmat 41(10):1998–2016


7. David G (2010) Camera phone images, videos and live streaming: a contemporary visual trend. Vis Stud 25(1):89–98
8. Donald M (1991) Origins of the modern mind: three stages in the evolution of culture and cognition. Harvard University Press, Cambridge
9. Dougherty A (2011) Live-streaming mobile video: production as civic engagement. In: Proceedings of MobileHCI, pp 425–434
10. Engström A, Esbjörnsson M, Juhlin O (2008) Mobile collaborative live video mixing. In: Proceedings of MobileHCI. ACM Press, pp 157–166
11. Engström A, Perry M, Juhlin O (2012) Amateur vision and recreational orientation: creating live video together. In: Proceedings of CSCW
12. Engström A, Juhlin O, Perry M, Broth M (2010) Temporal hybridity: mixing live video footage with instant replay in real time. In: Proceedings of CHI '10. ACM Press, pp 1495–1504
13. Garfinkel H (1967) Studies in ethnomethodology. Prentice-Hall, Englewood Cliffs, NJ
14. Gilje Ø (2011) Working in tandem with editing tools: iterative meaning-making in filmmaking practices. Vis Commun 10(1):45–62
15. Gillmor D (2010) Mediactive. Dan Gillmor (Creative Commons)
16. Greiffenhagen C (forthcoming) Visual grammar in practice: negotiating the arrangement of speech bubbles in storyboards. Semiotica
17. Ivarsson J (2010) Developing the construction sight: architectural education and technological change. Vis Commun 9:171
18. Iacucci G, Oulasvirta A, Salovaara A, Sarvas R (2005) Supporting the shared experience of spectators through mobile group media. In: Proceedings of GROUP. ACM Press, pp 207–216
19. Jenkins H (2006) Convergence culture: where old and new media collide. New York University Press, New York
20. Jenkins H (2009) Confronting the challenges of participatory culture: media education for the 21st century. MIT Press, Cambridge
21. Jokela T, Lehikoinen JT, Korhonen H (2008) Mobile multimedia presentation editor: enabling creation of audio-visual stories on mobile devices. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI '08). ACM, New York, NY, pp 63–72
22. Kennedy L, Naaman M (2009) Less talk more rock: automated organisation of community-contributed collections of concert videos. In: Proceedings of WWW 2009, pp 311–320
23. Kirk D, Sellen A, Harper R, Wood K (2007) Understanding videowork. In: Proceedings of ACM CHI, pp 61–70
24. Kress G (2003) Literacy in the new media age. Routledge, New York
25. Laurier E, Brown B (2011) The reservations of the editor: the routine work of showing and knowing the film in the edit suite. J Soc Semiot 21(2):239–257
26. Lehmuskallio A, Sarvas R (2008) Snapshot video: everyday photographers taking short video-clips. In: Proceedings of NordiCHI '08, pp 257–265
27. Lemke JL (1998) Metamedia literacy: transforming meanings and media. In: Reinking D, McKenna MC, Labbo LD, Kieffer RD (eds) Handbook of literacy and technology: transformations in a post-typographic world. Erlbaum, Mahwah, pp 283–301
28. Lewis S, Pea R, Rosen J (2010) Collaboration with mobile media: shifting from 'participation' to 'co-creation'. In: Proceedings of WMUTE. IEEE, pp 112–116
29. Licoppe C, Morel J (2009) The collaborative work of producing meaningful shots in mobile video telephony. In: Proceedings of MobileHCI '09. ACM Press, pp 254–263
30. Luff P, Heath C (1998) Mobility in collaboration. In: Proceedings of CSCW '98. ACM Press, pp 305–314
31. Mondada L (2003) Working with video: how surgeons produce video records of their actions. Vis Stud 18(1):58–73
32. Mondada L (2009) Video recording practices and the reflexive constitution of the interactional order: some systematic uses of the split-screen technique. Hum Stud 32(1):67–99
33. Okabe D (2004) Emergent social practices, situations and relations through everyday camera phone use. Paper presented at Mobile Communication and Social Change, international conference on mobile communication, Seoul, Korea, 18–19 Oct 2004
34. Reponen E (2008) Live @ Dublin: mobile phone live video group communication experiment. In: Proceedings of EUROITV '08
35. Sacks H, Schegloff EA (1992) Lectures on conversation. In: Jefferson G (ed), vol 1. Blackwell, Oxford, p 2
36. Säljö R (2010) Digital tools and challenges to institutional traditions of learning: technologies, social memory and the performative nature of learning. J Comput Assist Learn 26(2):43–64
37. Toussi R, Zoric G, Engström A, Juhlin O (in submission) Mobile vision mixer: a system for collaborative live mobile video production. Submitted manuscript, Mobile Life Centre
38. Vihavainen S, Mate S et al (2011) We want more: human–computer collaboration in mobile social video remixing of music concerts. In: Proceedings of CHI 2011. ACM Press, pp 287–294
39. vom Lehn D, Heath C (2006) Discovering exhibits: video-based studies of interaction in museums and science centres. In: Knoblauch H, Schnettler B, Raab J, Soeffner H (eds) Video analysis: methodology and methods: qualitative audiovisual data analysis in sociology. Peter Lang, New York, NY, pp 101–113
40. Weilenmann A (2001) Negotiating use: making sense of mobile technology. Pers Ubiquit Comput 5:137–145
41. Weilenmann A, Hillman T, Jungselius B (2013) Instagram at the museum: communicating the museum experience through social photo sharing. In: Proceedings of the conference on human factors in computing systems. ACM Press
