
Volume 3, Number 2
June 2011

Published by the Association for Computing Machinery
Special Interest Group on Multimedia

ISSN 1947-4598
http://sigmm.org/records

Table of Contents

Volume 3, Number 2
Editorial
Awards for SIGMM members
  IEEE I&M Technical Award
MediaEval 2011
SIGMM Education Column
Call for Bids: ACM Multimedia 2014
MPEG Internet Video Coding
PhD thesis abstracts
  Dinesh Babu Jayagopi
  Katayoun Farrahi
  Kimiaki Shirahama
  Pinaki Sinha
  Radu Andrei Negoescu
Event and publication reports
Calls for contributions
  Calls for SIGMM Sponsored and Co-sponsored Events
  Calls for Events held in cooperation with SIGMM
  Other multimedia-related Events
Free Material
  Huawei/3DLife Dataset for ACM Multimedia 2011 Grand Challenge
Job Opportunities
  PhD Open Position at Telecom ParisTech: New representation and compression framework for 3D video
Back matter
  Notice to Authors
  Impressum


SIGMM Records, Volume 3, Number 2, June 2011

Editorial

Dear Member of the SIGMM Community,

Welcome to the second issue of the SIGMM Records in 2011!

Learn about five newly concluded PhD theses in the multimedia area. Summaries of the work of five young researchers are included in this issue of the Records to raise your interest, and to point you to the right places to look for further information.

You can stay up to date with MediaEval, the benchmarking initiative by PetaMedia, which has published new tasks that will be answered in September 2011. MMSys 2011, on the other hand, has already been held in February. You can read about the experience in this issue of the Records, and follow the direct links to papers in the ACM Digital Library. You can, of course, also find links to the latest issues of TOMCCAP and MMSJ.

The Education Column presents a new and exciting book that is meant as an introductory textbook on multimedia computing.

You will find a summary of the last MPEG meeting, which is meant as an introduction to our new column on developments in MPEG, which will feature in future issues of the Records.

If you intend to host ACM Multimedia in North America in 2014, you should take note of the call for bids. The requirements and links to the relevant documents are presented in this issue.

We can also report that one of our colleagues has been granted an IEEE I&M Technical Award, and you can learn about several other opportunities for your research or career.

The Editors:
Stephan Kopf
Viktor Wendel
Lei Zhang
Wei Tsang Ooi
Carsten Griwodz

Awards for SIGMM members

IEEE I&M Technical Award

Professor El Saddik was awarded the IEEE Instrumentation and Measurement Society Technical Award for his outstanding contributions to multimedia computing.

Dr. El Saddik's research focus is establishing a framework of software tools to define an ambient-based collaborative haptic audio-visual environment (C-HAVE), thereby making the exchange of data in that environment fully interactive and personalized, and allowing users on the Internet to experience an enhanced telepresence made up of images, sounds, and touch.

The I&M Society presented its Society Awards at I2MTC 2011 in Binjiang, Hangzhou, China.

MediaEval 2011

Author: Martha Larson
URL: http://www.multimediaeval.org

MediaEval (http://www.multimediaeval.org/) is a benchmarking initiative that offers tasks promoting research and innovation on multimodal approaches to multimedia annotation and retrieval. Its focus is on speech, language, context and social aspects of multimedia, in addition to visual content. The MediaEval 2011 benchmarking season culminates with the MediaEval 2011 Workshop, held on 1-2 September at Santa Croce in Fossabanda, Pisa, Italy. The workshop is an official satellite event of Interspeech 2011 (http://www.interspeech2011.org), the 12th Annual Conference of the International Speech Communication Association (ISCA).

Santa Croce (photo: Flickr User Marius B)

Currently, the 2011 season of MediaEval is under way. For each task, participants receive a task definition, task data and accompanying resources (dependent on task) such as shot boundaries, keyframes, visual features, speech transcripts and social metadata. Participants are tackling the following tasks in the 2011 MediaEval season:

Genre Tagging: Given a set of genre tags (how-to, interview, review, etc.) and a video collection, participants are required to automatically assign genre tags to each video based on a combination of modalities, i.e., speech, metadata, audio and visual. (Data: Creative Commons internet video, multiple languages, mostly English)

Rich Speech Retrieval: Given a set of queries and a video collection, participants are required to automatically identify relevant jump-in points into the video based on a combination of modalities, i.e., speech, metadata, audio and visual. The task can be approached as a multimodal task, but also as strictly a speech search task. (Data: Creative Commons internet video, multiple languages, mostly English)

Spoken Web Search: This task involves searching FOR audio content WITHIN audio content USING an audio content query. It is particularly interesting for speech researchers in the area of spoken term detection. (Data: Audio from four different Indian languages -- English, Hindi, Gujarati and Telugu. Each of the ca. 400 data items is an 8 kHz audio file, 4-30 seconds in length)
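Query-by-audio matching of this kind is commonly approached with dynamic time warping (DTW) over frame-level acoustic features. The following is our own minimal sketch, not part of the task description; it uses 1-D values standing in for real feature vectors:

```python
# Minimal dynamic time warping (DTW) distance, a classic building block
# for matching a spoken audio query against audio content. Illustrative
# sketch only: real systems compare sequences of acoustic feature
# vectors (e.g. MFCCs), not scalars.

def dtw_distance(query, reference):
    """DTW distance between two 1-D feature sequences."""
    n, m = len(query), len(reference)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning query[:i] with reference[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - reference[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stay on reference frame
                                 cost[i][j - 1],      # stay on query frame
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# A time-stretched version of a pattern stays close under DTW,
# while a flat, different pattern is farther away.
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]   # same shape, spoken more slowly
c = [2.0, 2.0, 2.0, 2.0, 2.0]
print(dtw_distance(a, b) < dtw_distance(a, c))  # True
```

The appeal for spoken term detection is that DTW tolerates differences in speaking rate between query and reference without any language-specific training.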

Affect Task (Violent Scenes Detection): This task requires participants to deploy multimodal features to automatically detect portions of movies containing violent material. Any features automatically extracted from the video, including the subtitles, can be used by participants. (Data: A set of ca. 15 Hollywood movies that must be purchased by the participants)

Social Event Detection: This task requires participants to discover events and detect media items that are related to either a specific social event or an event-class of interest. By social events we mean that the events are planned by people, attended by people, and that the social media are captured by people. (Data: A large set of URLs of videos and images together with their associated metadata)

Placing Task: This task involves automatically assigning geo-coordinates to Flickr videos using one or more of: Flickr metadata, visual content, audio content, social information. (Data: Creative Commons Flickr data, predominantly English language)
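Placing-style tasks are typically scored by the great-circle distance between predicted and ground-truth coordinates. As an illustration (our sketch, not part of the official task definition; the city coordinates are only examples), a haversine helper:

```python
# Great-circle (haversine) distance between two latitude/longitude
# points, the usual basis for scoring geo-coordinate predictions.
# Illustrative sketch; not taken from the MediaEval evaluation code.

import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# e.g. a video predicted in Pisa against a ground truth in Florence
d = haversine_km(43.7228, 10.4017, 43.7696, 11.2558)
print(round(d))  # 69 (km)
```

A prediction can then be counted as correct at several distance thresholds (say, within 1 km, 10 km, 100 km), giving a precision curve rather than a single score.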

MediaEval is coordinated by the EU FP7 PetaMedia Network of Excellence (http://www.petamedia.eu) and also by the ICT Labs of the EIT (http://eit.ictlabs.eu/), and is made possible by the many projects, institutions and researchers that contribute to the organization of the individual tasks.

For more information on MediaEval, please contact Martha Larson: [email protected].

SIGMM Education Column

Author: Wei Tsang Ooi
URL: http://www.sigmm.org/Education

In this issue's SIGMM Education Column, we highlight a new textbook on multimedia computing, written by two active members of the SIGMM community, Ramesh Jain and Gerald Friedland. The textbook, titled ``Introduction to Multimedia Computing,'' is targeted at senior undergraduate and beginning graduate students. It is scheduled to appear in 2011 and is published by Cambridge University Press.

``Introduction to Multimedia Computing'' adopts a comprehensive approach in presenting the field of multimedia computing. Instead of introducing multimedia computing based on the media types (image, audio, video), this book considers multimedia as a fundamental and unique discipline that must use all media necessary to solve problems, and presents a unified introduction to the field by focusing on the fundamental techniques and mathematical foundations of multimedia computing. For instance, the book introduces lossy compression by presenting the principles of quantization and differential coding, before showing how these principles are applied to the compression of image, audio, and video.

Even though the theoretical foundations are very similar, multimedia researchers and developers are usually divided along specific medium boundaries, such as speech processing, natural language processing or computer vision. The ways humans process information, as well as the processing power of modern computers, suggest that by handling different media synergistically, one may understand even an individual medium better and may design more practical engineering systems. Integrated processing promises improved robustness in many situations and is closer to what humans do. This textbook provides such a unified perspective on multimedia computing for computer scientists. The authors hope that people interested in multimedia will rise above the media boundaries and will worry more about the content than the medium.

Of course, a field as broad as multimedia cannot be covered comprehensively in one book, as it would require covering large parts of mathematics, physics, physiology, psychology, and electrical engineering, in addition to computer science topics. The authors therefore adopt a different approach: They introduce the main concepts in a ``capsule'' form and provide pseudo-code for algorithms such as ADPCM encoding or the Paeth Predictor. Very often, chapters point to unexplored possibilities and the historical reasons for them (e.g., patent quarrels), encouraging the student to experiment on their own. For example, given the Weber-Fechner law of logarithmic perception, why are the TV formats encoding colors and brightness linearly? Further literature and web references are provided, allowing the reader to explore the topics beyond the coverage of the book.
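To give a flavour of the kind of algorithm the book presents in pseudo-code, here is our own small sketch (not taken from the textbook) of the Paeth Predictor as defined in the PNG specification:

```python
# The Paeth Predictor (PNG specification, filter type 4): predict a
# pixel value from its left (a), above (b) and upper-left (c)
# neighbours. Our illustrative sketch, not the textbook's pseudo-code.

def paeth_predictor(a, b, c):
    """Return the neighbour (a, b or c) closest to the gradient estimate."""
    p = a + b - c          # initial gradient estimate
    pa = abs(p - a)        # distance from the estimate to each neighbour
    pb = abs(p - b)
    pc = abs(p - c)
    # pick the closest neighbour; ties favour a, then b, then c
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

# In PNG filtering, each byte is encoded as its difference from the
# prediction; smooth image regions then yield many small residuals,
# which the subsequent entropy coder compresses well.
actual = 22
prediction = paeth_predictor(20, 25, 24)   # left=20, above=25, upper-left=24
residual = actual - prediction
print(prediction, residual)  # 20 2
```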

Topics covered in this textbook include fundamentals of sound and light, compression, authoring, content analysis, retrieval, HCI, as well as privacy and security.

The authors of the textbook have made draft chapters of the book available online for comments. The chapters can be downloaded from http://www.mm-creole.org.

In the future, the website will provide additions to the book chapters, which will dynamically evolve as the field progresses, support for the exercises, as well as further and updated information on the community.

Call for Bids: ACM Multimedia 2014

Author: Mohan Kankanhalli (National University of Singapore)

Required Bid Documents:

Two documents are required:

1. Bid Proposal: This document outlines all of the details except the budget. The proposal should contain:

a. The organizing team: Names and brief bios of General Chairs, Program Chairs and Local Arrangements Chairs. Names and brief bios of at least one chair (out of the two) each for Workshops, Panels, Video, Brave New Ideas, Interactive Arts, Open Source Software Competition, Multimedia Grand Challenge, Tutorials, Doctoral Symposium, Preservation and Technical Demos.

It is the responsibility of the General Chairs to obtain the consent of all of the proposed team members.

Please note that the SIGMM Executive Committee may suggest changes in the team composition for the winning bids.

b. The Venue: the details of the proposed conference venue, including the location, layout and facilities. The layout should facilitate maximum interaction between the participants. It should provide the normal required facilities for multimedia presentations, including internet access.

Please note that the 2014 ACM Multimedia Conference will be held in North America.

c. Accommodation: the bids should indicate a range of accommodations catering for student, academic and industry attendees, with easy as well as quick access to the conference venue. Indicative costs should be provided. Indicative figures for lunches/dinners and local transport costs for the location must be provided.

d. Accessibility: the venue should be easily accessible to participants from the Americas, Europe and Asia (the primary sources of attendees). Indicative cost of travel from these major destinations should be provided.

e. Other aspects:
i. commitments from the local government and organizations
ii. committed financial and in-kind sponsorships
iii. institutional support for local arrangement chairs
iv. a conference date in October/November which does not clash with any major holidays or other major related conferences
v. social events to be held with the conference
vi. possible venue(s) for the TPC Meeting
vii. any innovations to be brought into the conference
viii. cultural/scenic/industrial attractions

2. Tentative Budget: The entire cost of holding the conference, with realistic estimated figures, should be provided. This template budget sheet should be used for this purpose:

http://www.comp.nus.edu.sg/~mohan/ACMMM_Budget_Template.xls

Please note that the sheet is quite detailed and you may not have all of the information. Please try to fill it in as much as possible. All committed sponsorships for conference organization, meals, student subsidy and awards must be highlighted.

Please note that estimated registration costs for ACM members, non-members and students will be required for preparing the budget.

Estimates of the number of attendees will also be required.

Bid Evaluation Procedure:

Bids will be evaluated on the basis of:

1. Quality of the Organizing Team (both technical strengths and conference organization experience)
2. Quality of the Venue (facilities and accessibility)
3. Affordability of the Venue (travel, stay and registration) to the participants
4. Viability of the Budget: Since SIGMM fully sponsors this conference and it does not have reserves, the aim is to minimize the probability of making a loss and maximize the chances of making a small surplus.

The winning bid will be decided by the SIGMM Executive Committee by vote.

Bid Submission Procedure:

Please upload the two required documents and any other supplementary material to a website. The General Chairs should then email the formal intent to host, along with the URL of the website holding the bid documents, to the SIGMM Chair ([email protected]) and the Director of Conferences ([email protected]) by Oct 1, 2011.

Time-line:

Oct 01, 2011: Bid URL to be submitted to the SIGMM Chair and the Director of Conferences
Oct 2011: Bids open for viewing by the SIGMM Executive Committee; feedback about any missing information to bidders
Nov 15, 2011: Bids open for viewing by all SIGMM members
Nov 29, 2011: 10-minute presentation of each bid at ACM Multimedia 2011
Nov 30, 2011: Decision by the SIGMM Executive Committee

Please note that there is a separate conference organization procedure which kicks in for the winning bids; its details can be seen at http://www.acm.org/sigs/volunteer_resources/conference_manual.

MPEG Internet Video Coding

Author: Christian Timmerer
URL: http://multimediacommunication.blogspot.com/search/label/MPEG

At its 96th meeting, MPEG issued two documents in the area of Internet video coding which are publicly available:

• Draft Call for Proposals (CfP) for Internet Video Coding Technologies

• Requirements for Internet Video Coding Technologies

Draft Call for Proposals (CfP) for Internet Video Coding Technologies

The draft CfP comprises requirements for a proposal (i.e., on the actual submission), information on the evaluation, source code & IPR details, and the timeline. In particular, the aim of this work item is to address the diversified needs of the Internet:

• To satisfy the requirements of this application domain, MPEG will evaluate the submissions and will develop a specification (the Standard) that MPEG expects shall include a profile qualified as "Option-1 licensing" and may include other profiles.

The timeline for the Call for Proposals is as follows:

• Draft Call-for-Proposals ready: 2011/03

• Final Call-for-Proposals issued: 2011/07

• Proposals received and evaluation starts: 2011/10

Option-1 Codec specification development plan:

• Committee Draft: 2012/07

• Draft International Standard: 2013/01

• Final Draft International Standard: 2013/07

Requirements for Internet Video Coding Technologies

Requirements fall into the following major categories:

• IPR requirements

• Technical requirements

• Implementation complexity requirements

Interestingly, the standard shall provide better compression performance than MPEG-2, and possibly comparable to the AVC baseline profile. The resolution shall be from QVGA to HD, and various color spaces, color sampling, and bit-depth coding shall be supported. Other technical requirements include (the usual ones) high perceptual quality, random access, support for trick modes, network friendliness, error resilience, video buffer management, bitstream scalability, transcoding, and an overlay channel. Finally, the implementation complexity shall allow for real-time encoding and decoding on both stationary and mobile devices.

PhD thesis abstracts

Dinesh Babu Jayagopi
Computational Modeling of Face-to-Face Social Interaction

The computational modeling of face-to-face interactions using nonverbal behavioral cues is an emerging and relevant problem in social computing. Studying face-to-face interactions in small groups helps in understanding the basic processes of individual and group behavior, and in improving team productivity and satisfaction in the modern workplace. Apart from the verbal channel, nonverbal behavioral cues form a rich communication channel through which people infer - often automatically and unconsciously - emotions, relationships, and traits of fellow members.

There exists a solid body of knowledge about small groups and the multimodal nature of the nonverbal phenomenon in social psychology and nonverbal communication. However, the problem has only recently begun to be studied in the multimodal processing community. A recent trend is to analyze these interactions in the context of face-to-face group conversations, using multiple sensors, and to make inferences automatically without the need of a human expert. These problems can be formulated in a machine learning framework involving the extraction of relevant audio and video features and the design of supervised or unsupervised learning models. While attempting to bridge social psychology, perception, and machine learning, certain factors have to be considered. Firstly, various group conversation patterns emerge at different time-scales. For example, turn-taking patterns evolve over shorter time scales, whereas dominance or group-interest trends get established over larger time scales. Secondly, a set of audio and visual cues that are not only relevant but also robustly computable needs to be chosen. Thirdly, unlike typical machine learning problems where ground truth is well defined, interaction modeling involves data annotation that needs to factor in interannotator variability. Finally, principled ways of integrating the multimodal cues have to be investigated.

In the thesis, we have investigated individual social constructs in small groups like dominance and status (two facets of the so-called vertical dimension of social relations). In the first part of this work, we have investigated how dominance perceived by external observers can be estimated by different nonverbal audio and video cues, and how it is affected by annotator variability, the estimation method, and the exact task involved. In the second part, we jointly study perceived dominance and role-based status to understand whether dominant people are the ones with high status, and whether dominance and status in small group conversations can be automatically explained by the same nonverbal cues. We employ speaking activity, visual activity, and visual attention cues for both works.

In the second part of the thesis, we have investigated group social constructs using both supervised and unsupervised approaches. We first propose a novel framework to characterize groups. The two-layer framework consists of an individual layer and a group layer. At the individual layer, the floor-occupation patterns of the individuals are captured. At the group layer, the identity information of the individuals is not used. We define group cues by aggregating individual cues over time and person, and use them to classify group conversational contexts - cooperative vs. competitive and brainstorming vs. decision-making. We then propose a framework to discover group interaction patterns using probabilistic topic models. An objective evaluation of our methodology, involving human judgment and multiple annotators, showed that the learned topics indeed are meaningful, and also that the discovered patterns resemble prototypical leadership styles - autocratic, participative, and free-rein - proposed in social psychology.

Advisor(s): Daniel Gatica-Perez (thesis supervisor)
SIGMM member(s): Daniel Gatica-Perez
http://library.epfl.ch/theses/?nr=4986

Social Computing group

http://www.idiap.ch/~gatica/research-team.html

Social computing is an emerging research domain focused on the automatic sensing, analysis, and interpretation of human and social behavior from sensor data. Through microphones and cameras in multi-sensor spaces, mobile phones, and the web, sensor data depicting human behavior can increasingly be obtained at large scale - longitudinally and population-wise. The research group integrates models and methods from multimedia signal processing and information systems, statistical machine learning, and ubiquitous computing, and applies knowledge from social sciences to address questions related to the discovery, recognition, and prediction of short-term and long-term behavior of individuals, groups, and communities in real life. This can range from people at work having meetings, to users of social media sites, to people with mobile phones in urban environments. The group's research methods are aimed at creating ethical, personally and socially meaningful applications that support social interaction and communication, in the contexts of work, leisure, healthcare, and creative expression.

Katayoun Farrahi
A Probabilistic Approach to Socio-Geographic Reality Mining

As we live our daily lives, our surroundings know about it. Our surroundings consist of people, but also of our electronic devices. Our mobile phones, for example, continuously sense our movements and interactions. This socio-geographic data could be continuously captured by hundreds of millions of people around the world and promises to reveal important behavioral clues about humans in a manner never before possible. Mining patterns of human behavior from large-scale mobile phone data has a deep potential impact on society. For example, by understanding a community's movements and interactions, appropriate measures may be put in place to prevent the threat of an epidemic. The study of such human-centric massive datasets requires advanced mathematical models and tools. In this thesis, we investigate probabilistic topic models as unsupervised machine learning tools for large-scale socio-geographic activity mining.

We first investigate two types of probabilistic topic models for large-scale location-driven phone data mining. We propose a methodology based on Latent Dirichlet Allocation, followed by the Author Topic Model, for the discovery of dominant location routines mined from the MIT Reality Mining data set containing the activities of 97 individuals over the course of a 16-month period. We investigate the many possibilities of our proposed approach in terms of activity modeling, including differentiating users with highly varying and less varying lifestyles and determining when a user's activities fluctuate from the norm over time. We then consider both location and interaction features, from cell tower connections and Bluetooth, in single and multimodal forms for routine discovery, where the daily routines discovered contain information about the interactions of the day in addition to the locations visited. We also propose a method for the prediction of missing multimodal data based on Latent Dirichlet Allocation. We further consider a supervised approach for day type and student type classification using similar socio-geographic features.

We then propose two new probabilistic approaches to alleviate some of the limitations of Latent Dirichlet Allocation for activity modeling. Long-duration activities and activities of varying time duration cannot be modeled with the initially proposed methods due to problems with input and model parameter size explosion. We first propose a Multi-Level Topic Model as a method to incorporate multiple time duration sequences into a probabilistic generative topic model. We then propose the Pairwise-Distance Topic Model as an approach to address the problem of modeling long-duration activities with topics.

Finally, we consider an application of our work to the study of influencing factors in human opinion change with mobile sensor data. We consider the Social Evolution Project Reality Mining dataset, and investigate other mobile phone sensor features, including communication logs. We consider the difference in behaviors of individuals who change political opinion and those who do not. We combine several types of data to form multimodal exposure features, which express the exposure of individuals to others' political opinions. We use the previously defined methodology based on Latent Dirichlet Allocation to define each group's behaviors in terms of their exposure to opinions, and determine statistically significant features which differentiate those who change opinions from those who do not. We also consider the difference in exposure features of individuals whose interest in politics increases versus those whose does not. Overall, this thesis addresses several important issues in the recent body of work called Computational Social Science. Investigations principled on mathematical models and multiple types of mobile phone sensor data are performed to mine real-life human activities in large-scale scenarios.
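To illustrate the kind of representation such topic-model approaches start from (our own sketch with invented time bins and location labels, not the thesis' actual features): each user-day can be turned into a bag of coarse (time-block, location) words, and Latent Dirichlet Allocation then clusters such documents into routine "topics".

```python
# Hypothetical sketch: turning one user-day of coarse phone-derived
# location labels into a bag of (time-block, location) words -- the
# document representation a topic model such as LDA can cluster into
# daily routines. Binning and label names are illustrative only.

from collections import Counter

TIME_BLOCKS = ["night", "morning", "afternoon", "evening"]  # 6-hour bins

def day_to_document(hourly_locations):
    """Map 24 hourly location labels (e.g. 'home', 'work', 'other')
    to a bag of coarse (time-block, location) words."""
    words = []
    for hour, loc in enumerate(hourly_locations):
        block = TIME_BLOCKS[hour // 6]   # which 6-hour block this hour is in
        words.append(f"{block}:{loc}")
    return Counter(words)

# A stereotypical workday: home at night, work during the day, home evening.
day = ["home"] * 6 + ["work"] * 12 + ["home"] * 6
doc = day_to_document(day)
print(doc["night:home"], doc["morning:work"])  # 6 6
```

Each document is then one row of a word-count matrix over the (time-block, location) vocabulary; days with similar routines share topics, which is what makes routine discovery across 97 users tractable.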

Advisor(s): Daniel Gatica-Perez, thesis supervisor

SIGMM member(s): Daniel Gatica-Perez
DOI: 10.5075/epfl-thesis-5018
http://library.epfl.ch/theses/?nr=5018
http://www.idiap.ch/~kfarrahi/finalthesis.pdf

Kimiaki Shirahama
Intelligent Video Processing Using Data Mining Techniques

Due to the rapidly increasing video data on the Web,much research effort has been devoted to develop videoretrieval methods which can efficiently retrieve videosof interest. Considering the limited man-power, it ismuch expected to develop retrieval methods which usefeatures automatically extracted from videos. However,since features only represent physical contents (e.g.color, edge, motion, etc.), retrieval methods requireknowledge of how to use/integrate features for retrievingvideos relevant to queries. To obtain such knowledge,this thesis concentrates on `video data mining' wherevideos are analyzed using data mining techniques whichextract previously unknown, interesting patterns inunderlying data. Thereby, patterns for retrieving relevantshots to queries are extracted as explicit knowledge.Queries can be classified into three types. For thefirst type of queries, a user can find keywords suitablefor retrieving relevant videos. For the second type ofqueries, the user cannot find such keywords due tothe lexical ambiguity, but can provide some examplevideos. For the final type of queries, the user hasneither keywords nor example videos. Thus, thisthesis develops a video retrieval system with `multi-modal' interfaces by implementing three video datamining methods to support each of the above threequery types. For the first query type, the systemprovides a `Query-By-Keyword' (QBK) interface wherepatterns which characterize videos relevant to certainkeywords are extracted. For the second query type,a `Query-By-Example' (QBE) interface is providedwhere relevant videos are retrieved based on theirsimilarities to example videos provided by the user.So, patterns for defining meaningful shot similaritiesare extracted using example videos. 
For the final query type, a `Query-By-Browsing' (QBB) interface isdevised where abnormal video editing patterns aredetected to characterize impressive segments in videos,so that the user can browse these videos to findkeywords or example videos. Finally, to improve retrieveperformance, the integration of QBK and QBE isexplored where informations from text and image/videomodalities are interchanged using knowledge basewhich represents relations among semantic contents.The developed video data mining methods and theintegration method are summarized as follows. Themethod for the QBK interface focuses that a certainsemantic content is presented by concatenating several


shots taken by different cameras. Thus, this method extracts `sequential patterns' that relate adjacent shots relevant to certain keyword queries. Such patterns are extracted by connecting characteristic features in adjacent shots. However, the extraction of sequential patterns is computationally expensive, because a huge number of feature sequences have to be examined as candidate patterns. Hence, time constraints are adopted to eliminate semantically irrelevant sequences of features. The method for the QBE interface addresses the large variation among relevant shots: even for the same query, relevant shots contain significantly different features due to varied camera techniques and settings. Thus, `rough set theory' is used to extract multiple patterns that characterize different subsets of example shots. Although this pattern extraction requires counter-example shots to compare against the example shots, these are not provided by the user. Hence, `partially supervised learning' is used to collect counter-example shots from the large set of shots left behind in the database. In particular, to characterize the boundary between relevant and irrelevant shots, the method collects counter-example shots that are as similar to the example shots as possible. The method for the QBB interface assumes that impressive actions of a character are presented by abnormal video editing patterns. For example, thrilling actions of the character are presented by shots with very short durations, while his/her romantic actions are presented by shots with very long durations. Based on this, the method detects `bursts': patterns consisting of abnormally short or long durations of the character's appearance. The method first performs a probabilistic time-series segmentation to divide a video into segments characterized by distinct patterns of the character's appearance, and then examines whether each segment contains a burst or not.
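The burst-detection idea can be sketched in a few lines. This is a hypothetical illustration only: the thesis uses probabilistic time-series segmentation, whereas here a simple z-score test on log shot durations stands in for that machinery, and the function name, threshold and numbers are invented for the example.

```python
# Toy stand-in for burst detection: flag shots whose durations are
# abnormally short or long relative to the rest of the video.
import math

def detect_bursts(durations, z_thresh=2.0):
    """Return indices of shots with abnormally short/long durations."""
    logs = [math.log(d) for d in durations]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    std = math.sqrt(var) or 1.0  # guard against all-equal durations
    return [i for i, x in enumerate(logs) if abs(x - mean) / std > z_thresh]

# Shots of ~2 s with one very long and one very short outlier.
durations = [2.1, 1.9, 2.0, 2.2, 30.0, 2.0, 1.8, 0.1, 2.1]
print(detect_bursts(durations))  # → [4, 7]
```

Working on log durations keeps a single very long shot from dominating the mean, which is the kind of asymmetry shot-duration data tends to have.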
The integration of QBK and QBE is achieved by constructing a `video ontology' in which concepts such as Person, Car and Building are organized into a hierarchical structure. Specifically, this structure is constructed by considering the generalization/specialization relations among concepts and their co-occurrences in the same shots. Based on the video ontology, concepts related to a keyword query are selected by tracing the hierarchical structure. Shots in which few of the selected concepts are detected are filtered out, and QBE is then performed on the remaining shots. Experimental results validate the effectiveness of all the developed methods. In the future, the multi-modal video retrieval system will be extended with a `Query-By-Gesture' (QBG) interface based on virtual reality techniques. This will enable a user to create example shots for arbitrary queries by synthesizing his/her gestures, 3DCG and background images.
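The ontology-based filtering step can be illustrated with a toy sketch. The ontology table, shot labels and `min_hits` threshold below are all invented for illustration; the thesis builds its hierarchy from generalization/specialization relations and co-occurrence statistics rather than a hand-written mapping.

```python
# Hypothetical one-level "video ontology": keyword -> related concepts.
ONTOLOGY = {
    "Vehicle": ["Car", "Bus"],
    "Street": ["Car", "Person", "Building"],
}

def related_concepts(keyword):
    """Concepts reachable from the keyword (one hierarchy level here)."""
    return set(ONTOLOGY.get(keyword, [])) | {keyword}

def filter_shots(shots, keyword, min_hits=2):
    """Keep shots in which enough keyword-related concepts were detected."""
    rel = related_concepts(keyword)
    return [sid for sid, detected in shots.items() if len(rel & detected) >= min_hits]

shots = {
    "shot1": {"Car", "Building", "Person"},
    "shot2": {"Tree", "Sky"},
    "shot3": {"Car", "Street"},
}
print(sorted(filter_shots(shots, "Street")))  # → ['shot1', 'shot3']
```

QBE would then run only on the surviving shots, which is the point of the integration: the keyword cheaply prunes the search space before the more expensive example-based matching.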

Advisor(s): Prof. Dr. Kuniaki Uehara (supervisor)
SIGMM member(s): Kimiaki Shirahama

http://www.ai.cs.kobe-u.ac.jp/~kimi/papers/shirahama_thesis.pdf

CS 24 Uehara Laboratory at Graduate School of System Informatics, Kobe University

http://www.ai.cs.scitec.kobe-u.ac.jp/

Our research group aims at developing fundamental and practical technologies to utilize knowledge extracted from multimedia data. To this end, we are conducting research in broad areas of artificial intelligence, more specifically machine learning, video data mining, time-series data analysis, information retrieval, trend analysis, knowledge discovery, etc., typically with large amounts of data. As part of these research efforts, we are developing a multi-modal video retrieval system in which different media, such as text, image, video, and audio, are analyzed using machine learning and data mining techniques. We formulate video retrieval as a classification problem that discriminates between shots relevant and irrelevant to a query. Various techniques, such as rough set theory, partially supervised learning, multi-task learning, and Hidden Markov Models (HMMs), are applied to this classification. Recently, we began to develop a gesture-based video retrieval system in which information from various sensors, including RGB cameras, depth sensors, and magnetic sensors, is fused using virtual reality and computer vision techniques. In addition, transfer learning and collaborative filtering are utilized to refine the video annotation. Another pillar of our research group is concerned with deeper analysis of natural language text. Our primary focus is to distill both the explicit and implicit information contained therein. The former is generally seen as the problems of information extraction, question answering, passage retrieval, and annotation, and the latter as hypothesis discovery and text mining. Explicit information is directly described in text but not readily accessible to computers, as it is embedded in complex human language. On the other hand, implicit information cannot be found in a single document and is only understood by synthesizing knowledge fragments scattered across a large number of documents. We take statistical natural language processing (NLP)-


and machine learning-based approaches, such as kernel-based online learning and transductive transfer learning, to tackle these problems. As described above, the common foundation underlying our research methodologies is machine learning, which requires ever more computing power as large-scale data become increasingly available and algorithms grow more complex. To deal with this, we are also engaged in developing parallel machine learning frameworks using MapReduce, MPI, Cell, and GPGPU. These works are ongoing and will be shared with the research community soon. More details about our research group can be found on our web site at http://www.ai.cs.scitec.kobe-u.ac.jp.

Pinaki Sinha
Automatic Summarization of Personal Photo Collections

Photo taking and sharing devices (e.g., smart phones and digital cameras) have become extremely popular in recent times. Photo enthusiasts today capture moments of their personal lives using these devices. This has resulted in huge collections of photos stored in various personal archives. The exponential growth of online social networks and web-based photo sharing platforms has added fuel to this fire. According to recent estimates [46], three billion photos are uploaded to the social network Facebook per month. This photo data overload has created some major challenges. One of them is the automatic generation of representative overviews of large photo collections.

Manual browsing of photo corpora is not only tedious but also time-inefficient. Hence, the development of an automatic photo summarization system is not only a research challenge but also a practical one. In this dissertation, we present a principled approach for generating size-constrained overview summaries of large personal photo collections. We define a photo summary as an extractive subset that is a good representative of the larger photo set. We propose three properties that an effective summary should satisfy: Quality, Diversity and Coverage. Modern digital photos come with heterogeneous content and context data. We propose models that combine this multimodal data to compute the summary properties. Our summarization objective is modeled as an optimization of these properties. Further, the summarization framework can integrate user preferences in the form of inputs. Thus, different summaries may be generated from the same corpus to accommodate preference variations among users. A traditional way of intrinsic evaluation in information retrieval is to compare the retrieved result set with a manually generated ground truth. However, given the variability of human behavior in selecting appealing photos, it may be difficult and non-intuitive to generate a unique ground truth summary of a larger data corpus. Due to the personal nature of the dataset, only the contributor of a particular photo corpus can plausibly summarize it (since personal photos typically come with lots of background personal knowledge). While considerable effort has been directed towards the evaluation of annotation and ranking in multimedia, relatively few experiments have been done to evaluate photo summaries. We conducted extensive user studies on the summarization of photos from single life events. The experiments showed certain uniformity and some diversity of user preferences in generating and evaluating photo summaries. We also posit that photo summaries should serve the twin objectives of information discovery and reuse. Based on this assumption, we propose novel objective metrics which enable us to evaluate summaries of large personal photo corpora without user-generated ground truths. We also create a dataset of personal photos along with a host of contextual data, which can be helpful in future research. Our experiments show that the proposed summarization properties and framework can indeed be used to generate effective summaries. This framework can be extended to include other types of information (e.g., social ties among multiple users present in a dataset) and to create personalized photo summaries.
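A size-constrained objective of this kind lends itself to a greedy sketch. The scoring below is a hypothetical stand-in for the dissertation's models: quality is taken as a given scalar, diversity is the distance to already-chosen photos, coverage is left out for brevity, and the weights and data are arbitrary.

```python
# Toy greedy summarizer: pick k photos trading off quality and diversity.
def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def summarize(photos, k, w_quality=1.0, w_diversity=1.0):
    """Greedy selection: score = quality + min distance to chosen photos."""
    summary = []
    while len(summary) < k:
        best = max(
            (p for p in photos if p not in summary),
            key=lambda p: w_quality * p[0]
            + w_diversity * min((distance(p[1], s[1]) for s in summary), default=0.0),
        )
        summary.append(best)
    return summary

photos = [
    (0.9, (0.0, 0.0)),  # high quality
    (0.8, (0.1, 0.0)),  # near-duplicate of the first
    (0.5, (5.0, 5.0)),  # different scene
]
print(summarize(photos, 2))  # → [(0.9, (0.0, 0.0)), (0.5, (5.0, 5.0))]
```

Note how the near-duplicate loses to the lower-quality but distinct photo, which is exactly the Quality/Diversity trade-off the abstract describes; user preferences could enter as different weight settings.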

Advisor(s): Professor Ramesh Jain (supervisor), Professor Sharad Mehrotra (committee member), Professor Padhraic Smyth (committee member), Professor Deva Ramanan (committee member)
SIGMM member(s): Ramesh Jain


http://www.ics.uci.edu/~psinha/research/thesis/pinaki_thesis.pdf

Radu Andrei Negoescu
Modeling and understanding communities in online social

The amount of multimedia content is constantly increasing, and people interact with each other and with content on a daily basis through social media systems. The goal of this thesis was to model and understand emerging online communities that revolve around multimedia content, more specifically photos, by using large-scale data and probabilistic models in a quantitative approach. The dissertation has four contributions. First, using data from two online photo management systems, this thesis examined different aspects of the behavior of the users of these systems pertaining to the uploading and sharing of photos with other users and online groups. Second, probabilistic topic models were used to model online entities, such as users and groups of users, and the newly proposed representations were shown to be useful for further understanding such entities, as well as to have practical applications in search and recommendation scenarios. Third, by jointly modeling users from two different social photo systems, it was shown that differences at the level of vocabulary exist, and different sharing behaviors can be observed. Finally, by modeling online user groups as entities in a topic-based model, hyper-communities were discovered in an automatic fashion based on various topic-based representations. These hyper-communities were shown, through both an objective evaluation and a subjective evaluation with a number of users, to be generally homogeneous, and therefore likely to constitute a viable exploration technique for online communities.
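The topic-based representation of groups can be sketched as follows. The topic vectors, group names and similarity threshold are invented for illustration; in the thesis these distributions come from probabilistic topic models learned on large-scale data, not from hand-made tables.

```python
# Toy topic-based group representations and a similarity-based
# "hyper-community" lookup around a seed group.
def cosine(a, b):
    """Cosine similarity between two topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

groups = {
    "hdr_lovers":   [0.8, 0.1, 0.1],  # mix over e.g. [landscape, portrait, macro]
    "mountains":    [0.7, 0.2, 0.1],
    "studio_shots": [0.1, 0.8, 0.1],
}

def hyper_community(seed, groups, threshold=0.9):
    """All groups whose topic distribution is close to the seed's."""
    return sorted(g for g, v in groups.items() if cosine(groups[seed], v) >= threshold)

print(hyper_community("hdr_lovers", groups))  # → ['hdr_lovers', 'mountains']
```

The same vector representation supports the search and recommendation uses mentioned in the abstract: ranking groups for a user is just a nearest-neighbor query in topic space.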

Advisor(s): Daniel Gatica-Perez (supervisor)
SIGMM member(s): Daniel Gatica-Perez

DOI: 10.5075/epfl-thesis-5059

Social computing group

http://www.idiap.ch/~gatica/research-team.html

Our recent work has investigated methods to analyze small groups at work in multisensor spaces, populations of mobile phone users in urban environments, and online communities in social media.

Event and publication reports

MMSJ, Volume 17, Issue 1, February 2011

Editor-in-Chief: Thomas Plagemann
URL: http://www.springer.de/
Published: February 2011

Papers

• Cristian Hesselman, Pablo Cesar and David Geerts: Guest editorial: Networked television

• Z. Avramova, D. De Vleeschauwer, S. Wittevrongel and H. Bruneel: Performance analysis of a caching algorithm for a catch-up television service

• Marcelo G. Manzato, Danilo B. Coimbra and Rudinei Goularte: An enhanced content selection mechanism for personalization of video news programmes

• Nairon S. Viana and Vicente F. de Lucena: iDTV Home Gateway convergence: an open software model integrating the Ginga middleware and the OSGi framework

• Morten Lindeberg, Stein Kristiansen, Thomas Plagemann and Vera Goebel: Challenges and techniques for video streaming over mobile ad hoc networks

MMSJ, Volume 17, Issue 2, March 2011

Editor-in-Chief: Thomas Plagemann
URL: http://www.springer.de/
Published: March 2011

Papers


• Arnar Ólafsson, Björn Þór Jónsson, Laurent Amsaleg and Herwig Lejsek: Dynamic behavior of balanced NV-trees

• Lei Xie, Zhong-Hua Fu, Wei Feng and Yong Luo: Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news

• Chung-Hua Chu, De-Nian Yang, Ya-Lan Pan and Ming-Syan Chen: Stabilization and extraction of 2D barcodes for camera phones

• Mohammed Belkhatir: A three-level architecture for bridging the image semantic gap

• Gorka Marcos, Arantza Illarramendi, Igor G. Olaizola and Julián Flórez: A middleware to enhance current multimedia retrieval systems with content-based functionalities

ACM Multimedia Systems Conference 2011

Conference Chairs: Mark Claypool, Christian Timmerer, Ketan Mayer-Patel, Ali C. Begen
Event location: San Jose, CA, USA
Event date: February 23-25, 2011
URL: http://www.mmsys.org
Sponsored by ACM SIG Multimedia, SIGOPS, SIGCOMM and Cisco
Reported by Mark Claypool, Kristian Evensen, Carsten Griwodz

The second MMSys conference was held in San Jose, CA, in February on the Cisco campus. It saw a multiplication of the number of participants compared to the inaugural conference and was considered a great success by organizers and participants. We attribute the growth of the conference to several factors: MMSys became better known in the community after replacing MMCN; co-chair Christian Timmerer connected MMSys to the MPEG community by organizing a special session called "Multimedia Transport: DASH"; MMSys offered a dataset track that allows researchers to share the often-ignored work of collecting data and get credit for it; and conference fees were kept extremely low thanks to the sponsorship of our host, Cisco. A great addition was the provision of live feeds from MMSys to remote participants through Cisco's WebEx and Cisco TV.

With 15 papers accepted for the main conference, 5 long and 4 short for the special session, 5 for the dataset track, 2 keynotes and demos by Cisco, MMSys was arranged as a 3-day event. The keynotes were held by Alain Fiocco of Cisco and Mark Watson of Netflix. The themes of the conference covered multimedia systems, real-time support for multimedia, modeling of multimedia systems, mobile multimedia systems, multimedia databases, networked games, virtual and augmented worlds, and several more. A special issue containing papers relevant to MMT: DASH is currently in preparation. Readers should note that slides from all presentations and the keynotes are available from the MMSys web page, http://www.mmsys.org.

For the audience, MMSys started with a keynote from Cisco, where Alain Fiocco presented what Cisco sees as the future of the Internet. Unsurprisingly, this includes coping with the increased amount of traffic generated by video. As more and more multimedia devices get access to the Internet, they believe the demand on the core network will increase significantly and smarter solutions than what we have today will be needed. One suggestion was that more logic and advanced devices will be placed in the network.

The theme of the first session was wireless and mobile, where the papers combined wireless networks with the capabilities of modern mobile devices (like GPS, cameras and so on). For example, the paper "GPS-aided Recognition-based User Tracking System with Augmented Reality in Extreme Large-scale Areas" was about using GPS to improve augmented reality. The second session was titled "Networking" and contained three papers. Then, three papers on QoS and data transmission were presented, before several datasets were introduced in the last session. These will be hosted at the MMSys website and include traces from virtual worlds and a visual search dataset.

Day two began with a keynote by Mark Watson from Netflix. He spoke about their HTTP-streaming solution and gave details on and insight into how it


works, and what challenges they are currently trying to solve. Several of these challenges concern understanding video quality, but also multipath streaming.

MMSys continued with the first set of DASH papers. It was a bit difficult to follow the presentations without prior knowledge. However, if you have an interest in the future of dynamic adaptive streaming, the papers are worth checking out. We were then shown several impressive Cisco demos, before the final session of the day took place. This session was about system performance, and included a very interesting presentation about using flash memory as video-on-demand storage, documented in the paper "Impact of Flash Memory on Video-on-Demand Storage: Analysis of Tradeoffs". The authors had performed several experiments evaluating the performance of different combinations of storage, and concluded that while SSD was unsuitable as main storage, it was very efficient when combined with a traditional HDD. When used as a cache, the SSD performed better than HDD+DRAM, and was cheaper as well. The first session of the third day was about encoding, while the conference finished with a second session on DASH.

During the three days of the conference, we had several interesting discussions and got many ideas. After the final presentation, everyone agreed that it had been a good conference with interesting presentations and lots of cool people. If you are interested in multimedia systems, MMSys is highly recommended.

Papers

Wireless and mobile

• Wei Guan, Suya You, Ulrich Neumann: GPS-aided recognition-based user tracking system with augmented reality in extreme large-scale areas

• Jia Hao, Seon Ho Kim, Sakire Arslan Ay, Roger Zimmermann: Energy-efficient mobile video management using smartphones

• Navin K. Sharma, David E. Irwin, Prashant J. Shenoy, Michael Zink: MultiSense: fine-grained multiplexing for steerable camera sensor networks

Networking

• Lawrence Stewart, David A. Hayes, Grenville Armitage, Michael Welzl, Andreas Petlund: Multimedia-unfriendly TCP congestion control and home gateway queue management

• Mukundan Venkataraman, Mainak Chatterjee: Effects of internet path selection on video-QoE

• Kristian Evensen, Dominik Kaspar, Carsten Griwodz, Pål Halvorsen, Audun Hansen, Paal Engelstad: Improving the performance of quality-adaptive video streaming over multiple heterogeneous access networks

Data transmission and QoS

• Zixia Huang, Wanmin Wu, Klara Nahrstedt, Raoul Rivas, Ahsan Arefin: SyncCast: synchronized dissemination in multi-site interactive 3D tele-immersion

• Kuan-Ta Chen, Chen-Chi Wu, Yu-Chun Chang, Chin-Laung Lei: Quantifying QoS requirements of network services: a cheat-proof framework

• Dominik Seiler, Ernst Juhnke, Ralph Ewerth, Manfred Grauer, Bernd Freisleben: Efficient data transmission between multimedia web services via aspect-oriented programming

Dataset track

• Yichuan Wang, Cheng-Hsin Hsu, Jatinder Pal Singh, Xin Liu: Network traces of virtual worlds: measurements and applications

• Martin Ellis, Colin Perkins, Dimitrios P. Pezaros: End-to-end and network-internal measurements of real-time traffic to residential users

• Vijay R. Chandrasekhar, David M. Chen, Sam S. Tsai, Ngai-Man Cheung, Huizhong Chen, Gabriel Takacs, Yuriy Reznik, Ramakrishna Vedantham, Radek Grzeszczuk, Jeff Bach, Bernd Girod: The Stanford mobile visual search data set

• Yeng-Ting Lee, Kuan-Ta Chen, Yun-Maw Cheng, Chin-Laung Lei: World of Warcraft avatar history dataset

• Ricardo A. Calix, Gerald M. Knapp: Affect corpus 2.0: an extension of a corpus for actor level emotion magnitude detection

Modern media transport 1

• Thomas Stockhammer: Dynamic adaptive streaming over HTTP: standards and design principles

• Luca De Cicco, Saverio Mascolo, Vittorio Palmisano: Feedback control for adaptive live video streaming

• Saamer Akhshabi, Ali C. Begen, Constantine Dovrolis: An experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP

• Chenghao Liu, Imed Bouazizi, Moncef Gabbouj: Rate adaptation for adaptive HTTP streaming

System performance

• Moonkyung Ryu, Hyojun Kim, Umakishore Ramachandran: Impact of flash memory on video-on-demand storage: analysis of tradeoffs

• Samamon Khemmarat, Renjie Zhou, Lixin Gao, Michael Zink: Watching user generated videos with prefetching


• Kewin O. Stoeckigt, Hai L. Vu, Philip Branch: Dynamic codec with priority for voice over IP in WLAN

Encoding and repair

• Khiem Quang Minh Ngo, Ravindra Guntur, Wei Tsang Ooi: Adaptive encoding of zoomable video streams based on user access pattern

• Osama Abboud, Thomas Zinner, Konstantin Pussep, Sabah Al-Sabea, Ralf Steinmetz: On the impact of quality adaptation in SVC-based P2P video-on-demand systems

• David Varodayan, Wai-tian Tan: Error-resilient live video multicast using low-rate visual quality feedback

Modern media transport 2

• Robert Kuschnig, Ingo Kofler, Hermann Hellwagner: Evaluation of HTTP-based request-response streams for internet video streaming

• Yago Sánchez de la Fuente, Thomas Schierl, Cornelius Hellge, Thomas Wiegand, Dohy Hong, Danny De Vleeschauwer, Werner Van Leekwijck, Yannick Le Louédec: iDASH: improved dynamic adaptive streaming over HTTP using scalable video coding

• Cyril Concolato, Jean Le Feuvre, Romain Bouqueau: Usages of DASH for rich media services

• Christopher Müller, Christian Timmerer: A test-bed for the dynamic adaptive streaming over HTTP featuring session mobility

• Frank Hartung, Sinan Kesici, Daniel Catrein: DRM protected dynamic adaptive HTTP streaming

TOMCCAP, Volume 7, Issue 1, January 2011

Editor-in-Chief: Ralf Steinmetz
URL: http://tomccap.acm.org/
Sponsored by ACM SIG Multimedia
Published: January 2011

Papers

• Marek Meyer, Christoph Rensing, Ralf Steinmetz: Multigranularity reuse of learning resources

• Samia Bouyakoub, Abdelkader Belkhir: SMIL builder: An incremental authoring tool for SMIL documents

• M. Anwar Hossain, Pradeep K. Atrey, Abdulmotaleb El Saddik: Modeling and assessing quality of information in multisensor multimedia monitoring systems

• Jianke Zhu, Steven C. H. Hoi, Michael R. Lyu, Shuicheng Yan: Near-duplicate keyframe retrieval by semi-supervised learning and nonrigid image matching

• Cheng-Hsin Hsu, Mohamed Hefeeda: A framework for cross-layer optimization of video streaming in wireless networks

• Surendar Chandra, Xuwen Yu: An empirical analysis of serendipitous media sharing among campus-wide wireless users

TOMCCAP, Volume 7, Issue 2, February 2011

Editor-in-Chief: Ralf Steinmetz
URL: http://tomccap.acm.org/
Sponsored by ACM SIG Multimedia
Published: February 2011

Papers

• Ajay Gopinathan, Zongpeng Li: Optimal layered multicast

• Cheng-Hsin Hsu, Mohamed Hefeeda: Using simulcast and scalable video coding to efficiently control channel switching delay in mobile TV broadcast networks

• Yohan Jin, Balakrishnan Prabhakaran: Knowledge discovery from 3D human motion streams through semantic dimensional reduction

• Wei Cheng, Wei Tsang Ooi, Sebastien Mondet, Romulus Grigoras, Géraldine Morin: Modeling progressive mesh streaming: Does data dependency matter?

• Susmit Bagchi: A fuzzy algorithm for dynamically adaptive multimedia streaming

• Cheng-Hsin Hsu, Mohamed Hefeeda: Statistical multiplexing of variable-bit-rate videos streamed to mobile devices

Calls for contributions

Calls for SIGMM Sponsored and Co-sponsored Events

International ACM Workshop on Multimedia in Forensics and Intelligence

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://madm.dfki.de/mifor2011/

MiFor 2011 offers a forum for bringing solutions from multimedia research into forensics and intelligence.


International Workshop on Automated Media Analysis and Production for Novel TV Services

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://aiempro2011.inria.fr/

The workshop aims at exploring the application and evaluation of automated information extraction techniques and audiovisual content analysis tools to support future media production for novel TV services.

The Third ACM SIGMM Workshop on Social Media

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://www.cais.ntu.edu.sg/~wsm2011/

The workshop seeks contributions on various aspects of social media, including papers on theory, methodology, algorithms and issues associated with social media content creation, modeling, manipulation, content analysis, information extraction, storage, search, learning and mining.

ACM Workshop on Social and Behavioral Networked Media Access

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://www.elec.qmul.ac.uk/mmv/sbnma/

The first aim of this workshop is to address the question of how multimedia content analysis can be combined with information derived from behavioral modeling and social interactions in order to improve personalized content distribution. The second relates to the way in which personalized services can be offered to users in a real-time, ubiquitous, seamless and comprehensive way.

Workshop on Sparse Representation for Event Detection in Multimedia

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://research.microsoft.com/en-us/um/people/zhang/SRED11/

The goal of this workshop is to model, detect and recognise events using sparsity analysis, and to explore applications that make use of sparse learning for event analysis in the context of multimedia data.

Third International Workshop on Social Signal Processing

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://sspnet.eu/2011/05/wssp/

We seek to attract contributions representing the state-of-the-art efforts to develop algorithms that can process naturally occurring human social communication, decode communicative intent, and generate the appropriate socially-adept responses.

International ACM Workshop on Multimedia Access to 3D Human Objects

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://www-rech.telecom-lille1.eu/ma3ho/

This workshop aims at taking a leap forward in the emerging research area of multimedia access to 3D human objects, bringing together basic research in 3D graphics, 3D recognition and retrieval.

The ACM International Workshop on Medical Multimedia Analysis and Retrieval

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://web2.utc.edu/~swf134/Service/MMAR2011.htm

The goal of this workshop is to bring together researchers in the area of medical multimedia analysis and retrieval.

Story Representation Mechanism and Context

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://www.srmc2011.org/


This half-day workshop will foster discussion on issues related to story generation, representation, discovery and understanding.

International ACM Workshop on Ubiquitous Meta User Interfaces

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://www.dai-labor.de/Ubi-MUI2011/

The aim of this workshop is to bring together different research groups to foster the development of highly intuitive, multimedia-supported meta user interfaces that bring transparency, predictability, and control into intelligent environments.

Third ACM International Workshop on Multimedia Technologies for Distance Learning

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://mtdl2011.mine.tku.edu.tw/

The workshop will provide a public forum for researchers and practitioners to exchange new ideas and information regarding advancements in the state of the art of emerging multimedia and e-learning related issues from multidisciplinary perspectives.

International Workshop on Interactive Multimedia on Mobile and Portable Devices

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://www.eecs.qmul.ac.uk/~cfshan/IMMPD.html

This workshop will bring together researchers from both academia and industry in domains including computer vision, audio and speech processing, machine learning, pattern recognition, communications, human-computer interaction, and media technology to share and discuss recent advances in interactive multimedia.

First International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies

Full paper Deadline: July 11, 2011
Event location: Scottsdale, AZ, USA
Event date: Nov 28 - Dec 1, 2011
URL: http://mirum11.tudelft.nl/

The workshop aims to gather experts from the Music Information Retrieval community and neighboring fields at a premier multimedia venue, to initiate a cross-disciplinary dialogue on open challenges in the field of Music Information Retrieval with user-centered and/or multimodal strategies.

3DLife/Huawei ACM MM Grand Challenge 2011

Full paper Deadline: August 6th, 2011
Event location: Scottsdale, Arizona
Event date: Nov 28th - Dec 1st, 2011
URL: http://www.3dlife-noe.eu/3DLife/emc2/grand-challenge/

EMC2 has teamed up with Huawei to present a Grand Challenge at ACM Multimedia 2011 on "Realistic Interaction in Online Virtual Environments". The challenge provides a captured dataset consisting of recordings of a number of Salsa dancers from a variety of modalities, together with ground-truth annotations of the choreographies.

ACM Multimedia Systems (MMSys)

Full paper Deadline: September 19, 2011
Event location: Chapel Hill, NC, USA
Event date: February 22-24, 2012
URL: http://www.mmsys.org/

The ACM Multimedia Systems conference provides a forum for researchers, engineers, and scientists to present and share their latest research findings in multimedia systems. While research about specific aspects of multimedia systems is regularly published in the various proceedings and transactions of the networking, operating system, real-time system, and database communities, MMSys aims to cut across these domains in the context of multimedia data types.

Calls for Events held in cooperation with SIGMM

International Workshop on Network and Systems Support for Games (NetGames)

Full paper Deadline: July 10, 2011
Event location: Ottawa, Canada

Event date: October 6-7, 2011
URL: http://www.discover.uottawa.ca/netgames2011/

NetGames brings together researchers and practitioners from both academia and industry to present the latest research results and challenges of today's networked games.

International Conference on Advances in Mobile Computing & Multimedia (MoMM)

Full paper Deadline: July 15, 2011
Event location: Hue City, Vietnam
Event date: December 5-7, 2011
URL: http://www.iiwas.org/conferences/momm2011/

MoMM is a leading international conference for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all mobile computing and multimedia related areas.

Asia Information Retrieval Societies Conference

Full paper Deadline: July 17, 2011
Event location: Dubai, United Arab Emirates
Event date: December 18-20, 2011
URL: http://www.uowdubai.ac.ae/airs2011/

AIRS aims to bring together researchers and developers to exchange new ideas and latest achievements in the field of information retrieval. The scope of the conference covers applications, systems, technologies and theory aspects of information retrieval in text, audio, image, video, and multimedia data.

Other multimedia-related Events

The 18th International Conference on MultiMedia Modeling (MMM)

Full paper Deadline: July 22, 2011
Event location: Klagenfurt, Austria
Event date: January 4-6, 2012
URL: http://mmm2012.org/

The conference calls for research papers reporting original investigation results and demonstrations in areas related to multimedia modeling technologies and applications.

IEEE International Symposium on Multimedia

Full paper Deadline: July 6, 2011
Event location: Dana Point, CA, USA
Event date: December 5-7, 2011
URL: http://ism.eecs.uci.edu/

ISM is an international forum for researchers to exchange information regarding advances in the state of the art and practice of multimedia computing, as well as to identify the emerging research topics and define the future of multimedia computing.

Free Material

Huawei/3DLife Dataset for ACM Multimedia 2011 Grand Challenge

URL: http://www.acmmm11.org/content-huawei3dlife-challenge.html

Huawei/3DLife have just released an exciting dataset to the community as part of our coordination of the ACM MM Grand Challenge entitled "Realistic Interaction in Online Virtual Environments". The dataset consists of recordings and ground truths of multiple Salsa dancers from a variety of modalities, which include microphones, cameras, and inertial and depth sensors.

Job Opportunities

PhD Open Position at Telecom ParisTech: New representation and compression framework for 3D video

Employer: Institut Telecom - Telecom ParisTech
Valid until: Autumn 2011
More info: http://www.tsi.telecom-paristech.fr/mm/en

3D video is widely perceived as the next major advancement in video technology. However, current solutions based on stereoscopic vision exploit only limited depth cues. In this PhD thesis, we will explore a new representation and compression framework for 3D video, with the potential to support all depth cues and to greatly enhance the user experience.

The candidate must hold a master's degree in electrical engineering, computer science or equivalent. The candidate will have a strong knowledge of image and video processing, a solid basis in mathematics, and good programming skills in Matlab or C/C++. The candidate will be fluent in English.

The doctoral research work will be carried out within the Multimedia Group (http://www.tsi.telecom-paristech.fr/mm/en/), in the Signal and Image Processing Department at Telecom ParisTech. Telecom ParisTech (http://www.telecom-paristech.fr/eng/telecom-paristech.html), one of France's top five graduate engineering schools, is considered the leading French school in Information and Communication Technology (ICT).

The duration of the PhD program is 3 years. The candidate will be supported by a fellowship for the whole duration, with a competitive salary. The starting date is expected in Fall 2011.

To apply:

Applicants should send a motivation letter and a complete CV to Dr. Frederic Dufaux ([email protected]). Applications will be considered until the position is filled.

Back matter

Notice to Contributing Authors to SIG Newsletters

By submitting your article for distribution in this Special Interest Group publication, you hereby grant to ACM the following non-exclusive, perpetual, worldwide rights:

• to publish in print on condition of acceptance by the editor

• to digitize and post your article in the electronic version of this publication

• to include the article in the ACM Digital Library and in any Digital Library related services

• to allow users to copy and distribute the article for noncommercial, educational or research purposes

However, as a contributing author, you retain copyright to your article and ACM will refer requests for republication directly to you.

Impressum

Editor-in-Chief

Carsten Griwodz, Simula Research Laboratory

Editors

Stephan Kopf, University of Mannheim
Viktor Wendel, Darmstadt University of Technology
Lei Zhang, Microsoft Research Asia
Wei Tsang Ooi, National University of Singapore
