TRANSCRIPT
Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition
Seungwhan Moon, Suyoun Kim, Haohan Wang (Carnegie Mellon University)
MMML'15 Workshop

Objective
Leverage source domain data to improve the target domain task.
Input: imbalanced (e.g. in label space) multimodal parallel datasets for training (e.g. source: audio, target: video).
[Figure: at train time, audio covers labels A-Z while video covers only A-M; at test time, the video recognition network sees video labels A-Z, some unforeseen during training.]
Output: a robust deep neural network for the target task (a video recognition network).
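As a concrete illustration of this label-space imbalance, here is a minimal Python sketch; the label sets follow the poster's audio A-Z / video A-M example, and everything else is illustrative:

```python
import string

# Source modality (audio) covers the full label space A-Z;
# target modality (video) covers only the partial space A-M.
labels_audio = list(string.ascii_uppercase)        # 26 labels: A-Z
labels_video = list(string.ascii_uppercase[:13])   # 13 labels: A-M

# Parallel (correspondent) instances exist where the modalities overlap;
# source-only instances carry labels the target network never sees in training.
parallel_labels    = [y for y in labels_audio if y in set(labels_video)]
source_only_labels = [y for y in labels_audio if y not in set(labels_video)]

print(parallel_labels)     # ['A', ..., 'M']
print(source_only_labels)  # ['N', ..., 'Z']
```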
Applications
Multimodal tasks with imbalanced datasets:
• Audio-visual recognition
  - Lip-reading recognition
  - In-video action recognition
• Multi-lingual natural language learning
  - Rare language text classification
• Text-image joint learning
Our Approach
Fine-tune a target network with source instances transferred at intermediate layers.
[Figure: two deep networks side by side, one taking audio data and one taking video data, each ending in an output label layer; circled markers ①-④ indicate the steps of the transfer procedure below.]
① Train a separate model for each modality (audio and video); define the activation at the i-th layer (see the equation after this list).
② Learn a transfer function using source-target correspondent instances.
③ Transfer auxiliary source data to the target network, and compute activations at upper layers.
④ Fine-tune the target network with the transferred source instances.
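The activation definition in step ① did not survive transcription; a standard feed-forward form, stated here as an assumption, is $a^{(i)} = f(W^{(i)} a^{(i-1)} + b^{(i)})$, where $f$ is the layer nonlinearity. Below is a minimal PyTorch sketch of steps ①-④ under that assumption; the network sizes, the choice of transfer layer, and the linear form of the transfer function g are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D_AUDIO, D_VIDEO, D_HID, N_LABELS = 80, 120, 64, 26  # illustrative sizes

# ① One network per modality, split at the (assumed) transfer layer i:
#    lower layers compute a^(i); upper layers map a^(i) to output labels.
audio_lower = nn.Sequential(nn.Linear(D_AUDIO, D_HID), nn.ReLU())
audio_upper = nn.Linear(D_HID, N_LABELS)
video_lower = nn.Sequential(nn.Linear(D_VIDEO, D_HID), nn.ReLU())
video_upper = nn.Linear(D_HID, N_LABELS)
# (... train each modality's network on its own data as usual ...)

# ② Transfer function g: audio activation at layer i -> video activation
#    at layer i, fit on source-target correspondent (parallel) instances.
g = nn.Linear(D_HID, D_HID)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)

def fit_transfer_step(x_audio, x_video):
    with torch.no_grad():
        a_src = audio_lower(x_audio)   # source activation at layer i
        a_tgt = video_lower(x_video)   # target activation at layer i
    loss = F.mse_loss(g(a_src), a_tgt)
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()

# ③+④ Map source-only audio instances into the target network at layer i,
#      then fine-tune the target network's upper layers on them.
opt_ft = torch.optim.Adam(video_upper.parameters(), lr=1e-4)

def finetune_step(x_audio_only, y):
    with torch.no_grad():
        a_transferred = g(audio_lower(x_audio_only))  # ③ transferred activation
    logits = video_upper(a_transferred)               # ④ upper layers forward
    loss = F.cross_entropy(logits, y)
    opt_ft.zero_grad(); loss.backward(); opt_ft.step()
    return loss.item()
```

Fine-tuning only the upper layers here mirrors the interpretation below: the more reliable the transfer, the fewer layers need fine-tuning.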
Results
Label Space Setup
  Train:      audio: full;  video: partial
  Fine-tune:  video: full (+transferred)
  Test:       video: full
Datasets: AV_Letters (26 labels), Stanford (49 labels)
Interpretation
• Intractable or less reliable transfer: fine-tune more layers; more reliable transfer: fine-tune fewer layers.
• Performance is soft-upper-bounded by feature mapping accuracy.
• Trade-off between transfer reliability and the number of layers to fine-tune (a sketch of probing this follows below).
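One way to make this trade-off concrete, offered as an assumption rather than the poster's method: measure the transfer function's reconstruction error on held-out parallel pairs at each candidate layer, since that mapping accuracy soft-upper-bounds performance. The helper below reuses the hypothetical g, audio_lower, and video_lower names from the sketch above.

```python
import torch
import torch.nn.functional as F

def mapping_error(g, audio_to_layer, video_to_layer, x_audio, x_video):
    """Held-out MSE between mapped source activations and true target
    activations at one candidate transfer layer (lower is more reliable)."""
    with torch.no_grad():
        return F.mse_loss(g(audio_to_layer(x_audio)),
                          video_to_layer(x_video)).item()

# Comparing mapping_error across candidate layers lets one pick the transfer
# layer: a reliable (low-error) mapping leaves fewer layers to fine-tune.
```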
Future work
• Comparison with state-of-the-art transfer learning methods (heterogeneous transfer, deep shared representation, etc.)
• Artificial construction of target modality instances via top-down inference, using source modality instances