an#old#ar(ficial#intelligence#dream#that comes#true:# …bernardi/slides/lavi_tutorial... ·...
TRANSCRIPT
![Page 1: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/1.jpg)
An old Ar(ficial Intelligence dream that comes true:
Merging language and vision modali(es Raffaella Bernardi University of Trento
![Page 2: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/2.jpg)
An old AI dream
A. Turing, Compu(ng machinery and intelligence, Mind 59, pp. 433-‐460, 1950
Need of: • Natural Language Processing (NLP) • Knowledge Representa(on • Reasoning • …
![Page 3: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/3.jpg)
AI
Knowledge Representa(on Planning
Machine Learning
Natural Language Processing
computer vision
Reasoning
Robot
Social Intelligence
Crea(vity
![Page 4: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/4.jpg)
Natural Language Processing (NLP):
• Part of Speech Tagging (PoS) • Syntax • Seman(cs • Discourse • Dialogue
![Page 5: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/5.jpg)
Distribu(onal Seman(cs The meaning of a word is given by its context
![Page 6: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/6.jpg)
Distribu(onal Seman(cs: coun(ng words distribu(on
Words are represented by vectors harvested from a corpus of texts by coun(ng word co-‐occurences.
![Page 7: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/7.jpg)
Distribu(onal Seman(cs: Predict the context
The vector represen(ng a word is obtained by learning to predict its nearby words. (Mikolov et al, 2013)
![Page 8: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/8.jpg)
Seman(c Rela(onship Mikolov et al. NIPS 2013
![Page 9: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/9.jpg)
Pause: Neural Network
It's a composi(on of func(ons (neurons) that goes from an n-‐dimensional vector to class scores.
Each neuron receives some inputs, performs a dot product and op(onally follows it with a non linearity. On the last (fully-‐connected) layer, they have a loss func(on (e.g., So]max).
![Page 10: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/10.jpg)
Pause: Recurrent NN
Tradi(onal neural networks cannot use the informa(on about previous inputs to inform later ones. • Recurrent neural networks (RNNs) address this issue: They are networks with loops in them, allowing informa(on to persist. They work well with short dependencies.
• Long Short Term Memory (LSTM) are a special kind of RNN, capable of learning long-‐term dependencies.
![Page 11: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/11.jpg)
LSTM: Sentence representa(on
Star(ng from word2vec word representa(ons or from the plain words, obtain the sentence representa(on via LSTM:
![Page 12: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/12.jpg)
Distribu(onal Seman(cs: A successful story..
Lexical meaning • Synonyms • Concept categoriza(on (eg. car ISA vehicle) • Selec(onal preferences (e.g. eat chocolate vs. *eat
sympathy) • Rela(on classifica(on (exam-‐anxiety CAUSE-‐EFFECT
rela(on) • Salient proper(es (car-‐wheels)
Composi5onality: Phrase and Sentence • Similarity • Entailment
![Page 13: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/13.jpg)
Distribu(onal Seman(cs: .. but Grounding Problem
Grounding language representa(on into the world: point to the reference of our mental representa(on.
![Page 14: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/14.jpg)
Computer Vision: From pixels to Meaning
![Page 15: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/15.jpg)
Computer Vision: Abstract Features
![Page 16: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/16.jpg)
CV tradi(onal tasks: Objects
Image classifica(on:
Object localiza(on:
From objects to scene classifica(on
![Page 17: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/17.jpg)
CV first important revolu(on: ImageNet
ImageNet: • Stanford Vision Lab, Stanford University & Princeton University.
• Image database organized according to the WordNet hierarchy.
• Challenges: 2007-‐present • AMT: 48,940 annotators from 167 countries • 15M images • 22K categories of objects
![Page 18: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/18.jpg)
CV second important revolu(on: Convolu(onal Neural Networks
ImageNet Classifica(on with Deep Convolu(onal Neural Networks Alex Krizhevsky, Ilya Sutskever and Georey E. Hinton, 2012 • 2012: Krizhevsky outperformed the
other systems using CNN • 2013: half of the systems used CNN • 2014: All of the systems used CNN.
![Page 19: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/19.jpg)
CNN: Hierarchy of features
![Page 20: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/20.jpg)
CNN: off-‐the-‐shelf vector representa(on
• Train a CNN on a vision task (e.g. AlexNet on ImageNet) • Do a forward pass given an image input • Transfer one or more layers (e.g. FC7 or C5)
![Page 21: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/21.jpg)
Language and Vision
Language and Visual Spaces can be combined!
Cogni(ve Angle: Language and Vision Representa(ons
must be combined!
Applied Angle: Combining Language and Vision Representa(ons
gives very useful
![Page 22: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/22.jpg)
Language and Vision
• Mul(modal Tasks: – Exploit language to improve on tradi(onal CV tasks – Exploit vision to improve on tradi(onal NLP tasks – New Mul(modal Tasks
• Mul(modal Representa(ons: – learned separately and translated one into the other – learned separately and concatenated – learned jointly
![Page 23: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/23.jpg)
Mul(modal Tasks: Improve tradi(onal CV tasks
Not a lemon, it's more probable a tennis ball. -‐-‐ Info come from a KB (word similarity list, extracted from internet Google Sets). Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, S. Belongie (ICCV 2007) Objects in Context.
Use of Corpora for Ac(on Recogni(on. Thu Le Dieu, Jasper Uijlings and R. Bernardi (2010, 2011)
![Page 24: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/24.jpg)
Mul(modal Tasks: Improve tradi(onal NLP tasks
E. Bruni, G.B. Tran and M. Baroni (GEMS 2011, ACL 2012, Journal of AI 2014), E. Bruni, G. Boleda, M. Baroni and N. Tran (ACL 2012)
![Page 25: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/25.jpg)
Mul(modal Vector Spaces
Kiros et al. 2014
![Page 26: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/26.jpg)
New Mul(modal Tasks: Cross-‐Modal Mapping
Lazaridou, Bruni and Baroni ACL 2014
![Page 27: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/27.jpg)
New Mul(modal Tasks: Image Cap(oning (IC)
• Datasets: Flickr, Pascal, MS-‐COCO (164K images, 5 cap(ons each) • Survey: Automa(c Descrip(on Genera(on from Images: A Survey of Models,
Datasets, and Evalua(on Measures, Bernardi et al. JAIR 2016 • Very good talk: by Karpathy (2015):
Limita5ons: • Evalua(on Measures: Bleu, Rouge, etc. but not precise. • No reasoning
![Page 28: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/28.jpg)
New Mul(modal Tasks: Visual Ques(on Answering (VQA)
Limita5ons: • Language prior problem: Blind models perform preky well (50% accuracy on COCO-‐
VQA!). è But see development of new real image datasets: VQA2, TDIUC
Datasets: DAQUAR 2014, COCO-‐QA, VQA, Visual7W, Visual Genome, VisWiz Survey: Visual Ques(on Answering: A Survey of Methods and Datasets Wu et ali, (2016)
![Page 29: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/29.jpg)
New Mul(modal Tasks Image-‐Text Aiignment Datasets: Faces in the Wild, Flickr
30k En((es, VRD, Visual Genome Duygulu et al 2002, Barnard et al 2003, Berg et al 2004, Plummer et al 2015, Karpathy and Fei-‐Fei 2015, Zhu et al 2015, Krishna et al 2016, Lu et al 2016
Referring Expressions Datasets: D-‐TUNA Corpus, Referit Game Dataset, Referit Game MS-‐COCO Mitchell et al 2013, Fitzgerald et al 2013, Kazemzadeh et al 2014, Mao et al 2015, Yu et al 2016, Hu et al 2016, Yu et al 2017, Nagaraja et al 2016, Fang et al 2015
Credits: Vicente Ordóñez-‐Román
![Page 30: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/30.jpg)
New Mul(modal Tasks Diagnos(c Datasets: FOIL
Shekhar et al ACL 2017: hkps://foilunitn.github.io/
![Page 31: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/31.jpg)
New Mul(modal Tasks Diagnos(c Dataset: CLEVR
Jonhson et al CVRP 2017: hkps://cs.stanford.edu/people/jcjohns/clevr/
![Page 32: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/32.jpg)
New Mul(modal Tasks: Diagnos(c Datasets: NLVR
Suhr et al ACL 2017: hkps://github.com/clic-‐lab/nlvr
![Page 33: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/33.jpg)
Other more recent
New Mul(modal Tasks: • Spoken VQA
• Mul(modal Machine Transla(on • Image Genera(on • Visual Dialogue • Visual Story Telling (Huang et al. 2016) • Ques(on Genera(on (Mostafazadeh et al 2016, Jain et al 2017) • Explana(on (Park et al. 2018), Counter-‐factual (Hendricks et al.
2018), Inferences (Iyyer et al. 2017), Entailment (Vu et al. 2018) • Emo(on recogni(on, You et al. 2016 • Learning to quan(fy (vague quan(fiers, exact numbers). Pezzelle et
al. 2016, 2017, 2018 • …………
![Page 34: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/34.jpg)
Visual Dialogue: GuessWhat?! game
• Collected by de Vries et al 2017 via AMT
• Two par(cipants see an image (from MS-‐COCO).
• 155K dialogues about 66K different images
• Av. of QA per game: 5.2 • 84.6% of the games are
completed successfully
See also: Visual Dialog hkps://visualdialog.org
![Page 35: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/35.jpg)
Mul(modal Representa(on Multimodal Distributional Semantics Bruni, Tran and Baroni (2014)
Combining Language and Vision with a Multimodal Skipgram Model Lazaridou, Phan and Baroni (2015)
![Page 36: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/36.jpg)
Basic Mul(modal Models: Point-‐wise mul(plica(on
![Page 37: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/37.jpg)
What has the community gained? • Aken(on Networks • Hierarchical Co-‐aken(on • Bokom-‐up Top-‐down aken(on
• Composi(onality • Mul(-‐modal Pooling
Bokom-‐Up and Top-‐Down Aken(on Anderson et al., CVPR 18
Mul(modal Compact Bilinear Pooling Fukui et al., EMNLP 16
Neural Module Networks Andreas et al., CVPR 16
Hierarchical Ques(on-‐Image Co-‐Aken(on Lu et al., NIPS 16
Stacked Aken(on Networks Yang et al., CVPR 16
Credits: Aishwarya Agrawal
![Page 38: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/38.jpg)
Cuung-‐edge fancy models: Learning Paradigms
• Adversarial learning • Reinforcement Learning • Coopera(ve Learning • …
![Page 39: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/39.jpg)
Surveys
• ACL 2017: hkps://www.cs.cmu.edu/~morency/MMML-‐Tutorial-‐ACL2017.pdf
• COLING 2018hkps://arxiv.org/abs/1806.06371
![Page 40: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/40.jpg)
Some research groups • Stanford Vision Lab Le Fei Fei hkp://vision.stanford.edu/ • MIT: Antonio Torralba hkp://web.mit.edu/torralba/www/ • University of North Carolina. Tamara Berg hkp://www.tamaraberg.com/ • Virginia University Devi Parikh hkps://filebox.ece.vt.edu/~parikh/CVL.html • CLIC hkp://clic.cimec.unitn.it/lavi/ • Edinburgh University (M. Lapata, F. Keller ) • University of Sheffild Lucia Specia
hkp://staffwww.dcs.shef.ac.uk/people/L.Specia/ • Universitat Pompeu Fabra, COLT group, Gemma Boleda:
hkp://gboleda.utcompling.com/
• Facebook FAIR • Google DeepMind • More on the iV&L Net Cost Ac(on
hkp://www.cost.eu/COST_Ac(ons/ict/Ac(ons/IC1307
![Page 41: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/41.jpg)
LaVi @ UniTn
• Learning the meaning of Quan(fiers from Language and Vision: hkps://quan(t-‐clic.github.io/
• Visually Grounded Talking Agents (in collabora(on with UvA): hkps://vista-‐unitn-‐uva.github.io/
• Grounded TE (in collabora(on with Malta): hkps://github.com/claudiogreco/coling18-‐gte
On going work: • Be Different for Be Beker (with SAP) • Con(nual learning
![Page 42: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/42.jpg)
UniTN LaVi People
Ionut (-‐>Barcelona) Sandro Ravi
Aliia
Claudio
Alberto Aurelie me
![Page 43: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#](https://reader033.vdocuments.site/reader033/viewer/2022041808/5e55bbfd08d95e4fed281610/html5/thumbnails/43.jpg)
References on tasks • Jain, U., Zhang, Z., Schwing, A.G.: Crea(vity: Genera(ng Diverse Ques(ons using Varia(onal
Autoencoders. In: CVPR. (2017) 5415-‐5424 • Li, Y., Huang, C., Tang, X., Change Loy, C.: Learning to Disambiguate by Asking Discrimina(ve
Ques(ons. In: Proceedings of the IEEE Interna(onal Conference on Computer Vision. (2017) 3419-‐3428
• Vondrick, C., Oktay, D., Pirsiavash, H., Torralba, A.: Predic(ng mo(va(ons of ac(ons by leveraging text. In: Proceedings of the IEEE Conference on Computer Vision and Pakern Recogni(on. (2016) 2997-‐3005
• Park, D.H., Hendricks, L.A., Akata, Z., Rohrbach, A., Schiele, B., Darrell, T., Rohrbach, M.: Mul(modal Explana(ons: Jus(fying Decisions and Poin(ng to the Evidence. In: 31st IEEE Conference on Computer Vision and Pakern Recogni(on. (2018)
• Hendricks, L.A., Hu, R., Darrell, T., Akata, Z.: Genera(ng Counterfactual Explana(ons with Natural Language. arXiv preprint arXiv:1806.09809 (2018)
• You, Q., Luo, J., Jin, H., Yang, J.: Building a Large Scale Dataset for Image Emo(on Recogni(on: The Fine Print and The Benchmark. In: AAAI. (2016), 308-‐314
• Yu, L., Park, E., Berg, A.C., Berg, T.L.: Visual madlibs: Fill in the blank descrip(on genera(on and ques(on answering. In: Proceedings of the IEEE interna(onal conference on computer vision. (2015) 2461-‐2469
• Iyyer, M., Manjunatha, V., Guha, A., Vyas, Y., Boyd-‐Graber, J.L., Daume III, H., Davis, L.S.: The Amazing Mysteries of the Guker: Drawing Inferences Between Panels in Comic Book Narra(ves. In: CVPR. (2017) 6478-‐6487
è For a rather extensive overview see Pezzelle et al. SiVL 2018