DeViSE: A Deep Visual-Semantic Embedding Model Presenters: Ji Gao, Fandi Lin

Post on 10-Feb-2022

Motivation

Visual recognition systems struggle when the number of categories is large.

● Insufficient labeled training data

● Blurred distinction between classes

How do we improve predictions of unknown categories?

Background

N-way discrete classifiers

● Labels treated as unrelated

● Semantic information not captured

Result: These systems cannot make zero-shot predictions without additional information, such as text data.
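A minimal sketch of why treating labels as unrelated loses semantic information: one-hot label vectors are all mutually orthogonal, whereas dense label embeddings can place related labels close together. The label names and embedding values below are made up for illustration and do not come from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# N-way classifier view: each label is an unrelated one-hot target.
one_hot = {
    "tiger shark": [1.0, 0.0, 0.0],
    "blue shark": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}

# Embedding view: toy 2-d vectors (hypothetical values) where the two
# shark labels point in nearly the same direction.
toy_embedding = {
    "tiger shark": [0.9, 0.1],
    "blue shark": [0.8, 0.2],
    "car": [-0.1, 1.0],
}

# Every pair of distinct one-hot labels has similarity 0.
one_hot_sim = cosine(one_hot["tiger shark"], one_hot["blue shark"])

# In the embedding space, related labels score higher than unrelated ones.
shark_sim = cosine(toy_embedding["tiger shark"], toy_embedding["blue shark"])
car_sim = cosine(toy_embedding["tiger shark"], toy_embedding["car"])

print(one_hot_sim, shark_sim, car_sim)
```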

Related Work

WSABIE: Linear map from image features to the embedding space. Used only the training labels.

Socher et al.: Linear map from image features to the embedding space, combined with outlier detection. Evaluated with only 8 known and 2 unknown classes.

Other work demonstrating zero-shot classification relies on curated information.

Proposed Method

Combine a traditional visual model with a language model.

Proposed Method

1. Train a language model for semantic information

2. At the same time, train a CNN for images

3. Initialize the combined model using the pre-trained parameters

4. Train the combined model

Skip-gram language model

● Source: Efficient Estimation of Word Representations in Vector Space, ICLR 2013

● Skip-gram: a generalization of n-grams in which the words need not be adjacent, so intervening words may be skipped

● Skip-gram model: train a neural network that, given a word, predicts the words near it
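As a rough illustration of "predict nearby words", the sketch below generates the (center word, context word) training pairs that a skip-gram model would be trained on. The function name `skipgram_pairs` and the `window` parameter are assumptions for this example, not names from the slides or from any library, and the neural network itself is left out.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window`
    positions of each center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Toy sentence, window of one word on each side.
pairs = skipgram_pairs(["the", "tiger", "shark", "swims"], window=1)
for center, context in pairs:
    print(center, "->", context)
```

Each pair becomes one training example: the model sees the center word and is asked to assign high probability to the context word.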

Skip-gram language model

Learn the relationship between labels.

● Data: 5.7 million documents (5.4 billion words) extracted from wikipedia.org

CNN model

● AlexNet

● Winner of ILSVRC 2012

● 5 convolutional layers

Combined model

Use a linear embedding layer to map the 4,096-d features extracted just before the softmax layer to the dimensionality of the language model (500-d or 1,000-d).

Loss function:
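The formula from this slide is not preserved in the transcript. The DeViSE paper (Frome et al., NIPS 2013) describes a hinge rank loss that pushes the projected image closer to its true label vector than to any other label vector by a margin. The sketch below is one reading of that loss in plain Python: `M` stands for the linear embedding layer, `visual_feat` for the CNN features, and `label_vecs` for the language-model vectors; all toy values, the tiny dimensions, and the function names are assumptions for illustration.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hinge_rank_loss(M, visual_feat, label_vecs, true_label, margin=0.1):
    """loss = sum over j != label of
           max(0, margin - t_label . (M v) + t_j . (M v))"""
    projected = matvec(M, visual_feat)           # image mapped into word space
    true_score = dot(label_vecs[true_label], projected)
    loss = 0.0
    for name, t_j in label_vecs.items():
        if name == true_label:
            continue
        loss += max(0.0, margin - true_score + dot(t_j, projected))
    return loss

# Toy setup: 3-d "visual" features mapped to a 2-d "semantic" space.
M = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
feat = [0.9, 0.2, 0.5]
labels = {"shark": [1.0, 0.0], "car": [0.0, 1.0]}  # unit-norm label vectors

loss_correct = hinge_rank_loss(M, feat, labels, "shark")  # image matches label
loss_wrong = hinge_rank_loss(M, feat, labels, "car")      # image far from label
print(loss_correct, loss_wrong)
```

The loss is zero when the true label already beats every other label by at least the margin, and grows as other labels score closer to or above it. In training, only the gradient with respect to `M` (and later the CNN weights) would be needed; that part is omitted here.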

Experiment

Tasks:

● Image classification

● Zero-shot image classification

Experiment: With the same label set (not zero-shot)

Baselines:

● AlexNet

● Random embedding: AlexNet plus random vectors as label embeddings (instead of the language model's vectors)

Experiment: Zero-shot

Datasets:

● 2-hop: labels within two tree hops of the ImageNet 2012 training labels in the ImageNet hierarchy

● 3-hop: labels within three tree hops of the training labels

● ImageNet2011: labels in ImageNet2011 that do not appear in ImageNet2012
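A minimal sketch of how a zero-shot prediction can be made once images are mapped into the word-embedding space: project the image, then rank candidate labels, including labels never seen with images during training, by similarity to the projection. The vectors, label names, and the `rank_labels` helper below are hypothetical toy values for illustration, and cosine similarity is used as the assumed ranking measure.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_labels(projected_image, label_vecs):
    """Return label names sorted from most to least similar."""
    return sorted(
        label_vecs,
        key=lambda name: cosine(projected_image, label_vecs[name]),
        reverse=True,
    )

# Labels that had no training images; their vectors would come from the
# language model alone (toy 2-d values here).
unseen_labels = {
    "hammerhead shark": [0.8, 0.3],
    "sports car": [-0.2, 0.95],
}

# Output of the visual model plus linear embedding layer for one image
# (toy value standing in for a real projection).
projected = [0.75, 0.35]

ranking = rank_labels(projected, unseen_labels)
print(ranking)
```

Because the candidate set is just a dictionary of label vectors, new labels can be added at prediction time without retraining the visual model.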

Experiment: Zero-shot

Comparison with a pure CNN:

Experiment: Zero-shot

Comparison with previous zero-shot results:

Conclusion

DeViSE achieves state-of-the-art performance on the classification task and can also make zero-shot predictions.

It is suitable for large amounts of data and can handle labels that have too few training examples.

It shows the power of combining image and semantic data.
