
Page 1: Computer Vision & Pattern Recognition · Saturday, July 22, 2017

A publication by

A Word of Welcome from: Anthony Hoogs, General Chair, CVPR 2017

Saturday 22 · Computer Vision & Pattern Recognition

Presenting work by: Derya Akkaynak, Silvia Vinyes, Roey Mechrez, Dan Xu

EXPO: Modar Alaoui, CEO of Eyeris

Women in Computer Vision: Ilke Demir, Facebook

Tobi’s Picks and More…


These are Tobi’s picks for today, Saturday 22. Don’t miss them!

Morning:
S1-1A.8, 09:28, page 5 of the Pocket Guide: On Compressing Deep Models by Low Rank and Sparse Decomposition
P1-1.46, 10:30, page 7 of the Pocket Guide: Detecting Masked Faces in the Wild With LLE-CNNs
P1-1.73, 10:30, page 8 of the Pocket Guide: Subspace Clustering via Variance Regularized Ridge Regression

Afternoon:
S1-2B.13, 13:30, page 12 of the Pocket Guide: Crossing Nets: Combining GANs and VAEs With a Shared Latent Space for Hand Pose Estimation

S1-2C.30, 13:30, page 13 of the Pocket Guide: TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
O1-2B.22, 14:17, page 13 of the Pocket Guide: Disentangled Representation Learning GAN for Pose-Invariant Face Recognition
P1-2.83, 15:00, page 15 of the Pocket Guide: Stacked Generative Adversarial Networks

Tobi’s Picks

For today, Saturday 22

2 · Saturday

Tobias Baumgartner - Yahoo CV&ML team

“I received my M.Sc. in CS with a focus on ML and CV from RWTH Aachen and started my career at a small computer vision startup in Berkeley, CA: IQ Engines. Shortly after I joined, we got acquired by Yahoo, where I now work as a Sr. Research Engineer. We build the image intelligence behind various Y! properties like Flickr, Tumblr and Y! Mail, as well as doing independent research. Currently I am mainly focused on face detection, recognition, clustering, and attribute extraction, and am looking forward to gaining new inspiration for my research at CVPR’17.”

“Besides papers, while on Oahu I want to soak in as much nature as possible. I will be exploring the coastline by bicycle (I am especially curious to ride around the North Shore) and try to run as many hikes as the evening hours allow. Quick heads-up: Diamond Head closes at 6 p.m. Overall I want to get in 50 miles of running, as I prepare for my first ultramarathon.”


Aloha, CVPR!

Welcome to CVPR 2017, and welcome back to CVPR Daily, the magazine launched one year ago by CVPR and RSIP Vision at CVPR 2016 in Las Vegas.

Also in this CVPR 2017 edition, you will read great stories, reports and articles about the impressive scientific work being presented here in Honolulu, Hawaii. Today and for three more days, we will do our best to present our readers a little bit of the immense talent on display at the most important computer vision event of the year. Please do not miss the word of welcome from the General Chair on page 6.

On behalf of CVPR, RSIP Vision and Computer VisionNews, I wish you an awesome CVPR 2017!

Ralph Anzarouth
Editor, Computer Vision News
Marketing Manager, RSIP Vision

Summary

02 Tobi’s Picks
04 Derya Akkaynak
06 Anthony Hoogs, General Chair, CVPR
10 Modar Alaoui, CEO of Eyeris
14 Ilke Demir, Women in Computer Vision
20 Silvia Vinyes

CVPR Daily
Publisher: RSIP Vision
Copyright: RSIP Vision. All rights reserved. Unauthorized reproduction is strictly forbidden.
Our editorial choices are fully independent from CVPR.

3 · Saturday


Derya Akkaynak is a postdoctoral researcher at the University of Haifa, jointly with the Interuniversity Institute for Marine Sciences in Eilat.

Derya (whose name means ‘ocean’ in both Persian and Turkish) told us about the work she is presenting today, titled “What Is the Space of Attenuation Coefficients in Underwater Computer Vision?”. Her poster is about improving color reconstruction and color acquisition in images collected underwater by underwater robots or divers. Derya, who is an oceanographer and mechanical engineer (not a computer scientist by training), is working on understanding how light propagates underwater and how it gets captured on camera sensors, to find out how we can compensate, in an accurate and objective way, for the colors that are lost. Her paper leverages decades’ worth of data from optical oceanography to improve underwater computer vision algorithms, bridging the two fields.

A main challenge of her work is to validate the mathematical models they build and see if they actually work in an underwater setting. This requires several dives, a lot of equipment, and a lot of hardware; sometimes things go wrong, or the results are unexpected, and then they have to go back to the model and adjust it. “So it’s a constant iteration between work on the computer and work in the sea,” Derya says, which is different from most computer vision fields, where work is mostly done on a computer.

Talking about previous work, Derya explains that there is an existing system of equations for underwater image formation, and that everybody uses these equations. However, Derya and her co-authors looked at how these equations were derived and found that, due to two simplifying assumptions, errors are introduced that affect people’s work in color reconstruction. So instead of using what was commonly accepted, Derya and her co-authors questioned it, were then able to highlight the weaknesses, and offered a better solution.

We asked Derya if she thinks that we can one day see underwater images just like we see things in normal life. She hypothesizes that this might be possible with a lot of specialised equipment, because we now understand very well what happens to light and how cameras capture light underwater.

“The other deep in computer vision”

Derya Akkaynak · 4 · Saturday

What Is the Space of Attenuation Coefficients in Underwater Computer Vision?


What was important and key in their work, and their biggest contribution, is that they have brought over eight decades of knowledge from optical oceanography into computer vision. In computer vision, researchers estimated some coefficients that they use to correct colors. “But they never checked if those coefficients actually make sense given ocean conditions,” Derya says. In oceanography, it is known that light attenuation changes by place and by time, which oceanographers have mapped for the last eight decades with various instruments, from very simple to very technical. So what they have done in this work, together with Tali Treibitz, is to bridge computer vision and oceanography.

Derya also told us about what she thinks is the biggest misconception about underwater imaging: “In almost 99% of the underwater computer vision papers I read, people state that light attenuates unevenly underwater; red attenuates faster than blue or green.” And this is exactly what happens in waters that are dominated by plankton (small plants that drift in the water), she explains. But if you go to coastal water, where there is contamination from rivers, soil, sediment and other non-organic substances, blue actually attenuates faster than red.

Do you want to learn more about Derya’s work? Visit her poster today (Saturday) after 10:30, in Poster Session P1-1.

Derya Akkaynak · 5 · Saturday

So it’s possible to undo that effect, but she is unsure whether it will become commonplace enough that you can just put on a mask and have it compensate for everything. She thinks such a mask would probably complicate diving a bit, and might be more interesting for commercial applications.

“Our next step is now to derive a new system of equations for underwater image formation,” Derya says, “to compensate for the two weaknesses that we found.” They want to be able to tell people what kind of errors they should expect when they use the old equations, so they can then judge how good their results are when all the errors are taken out.

For their work, Derya and her co-authors used image formation equations that are known to be used for camera and image simulation.

“Our work has implications for those working with other scattering media, such as milk and wine”


It must have been very exciting to prepare for this day.

Very exciting! We’ve been planning this for three years. We did not plan to be in Hawaii; we planned to be in Puerto Rico. Switching a year ago was somewhat expensive but not really disruptive; it wasn’t that bad, because most of the logistical preparation and the detail-level work hadn’t really begun yet. Switching to here primarily meant rearranging rooms, coming out here for a site visit, and so on… Primarily, we’ve been increasing the professionalism of the conference through support contractors. This year we brought on an EXPO support contractor called Hall Erickson, who has been doing very good things for us in managing the EXPO. We have C to C Events, who have been growing with us and manage registration, conference logistics, hotel contracts, and all of that. As a general chair, my main job is to make sure that overall things are getting done, make sure that we didn’t forget anything, and track the important things, but mostly there is a group of probably 25 or 30 people now who really run the conference across the whole range of things that we do. I need to make sure that those people get assigned and know their jobs.

I understand that this is going to be the biggest CVPR conference ever. Do you have any numbers that we can share with our readers?

As of this moment, we have 4,880 registrations. Last year, there were 3,650, I believe; the increase is a little more than 33%. We usually get a bunch of walk-up registrations as well, so we may get over 5,000. This increase has actually been planned for. If you look at the year-over-year growth of CVPR over the last few years, it has been typically

“The survey is data, and data speaks very loudly”

Anthony Hoogs · 6 · Saturday

A Word of Welcome from the General Chair

Anthony Hoogs is the Senior Director of Computer Vision at Kitware, a small software R&D company that does a lot of computer vision. For the CVPR conference, he is one of three general chairs, the one in charge of the convention center and the EXPO logistics here in Hawaii.


about 15% each year. We expected a bit of a bump coming to Hawaii, and the growth rate is probably more than linear. We planned ahead and reserved enough space, the entire convention center, for up to 6,000 people. We could have handled that. Handling 5,000 should not be a problem, and we have enough space for the posters and the growth of the EXPO.

The EXPO has grown by almost 30% this year in the number of companies, and it is almost 80% bigger in terms of sponsorship dollars. Everything is bigger. We have had more than 30% more submissions and about 30% more papers in the program, which means more poster space. There are more workshops: we received 63 workshop proposals and accepted, I think, 44. I think there are 21 tutorials. Everything is bigger, but we’ve been able to accommodate it all, so it shouldn’t be crowded.

What are the novelties of this year?

This year we have a few primary new elements in the conference that perhaps have been overdue, and some of them have been talked about. All of them were discussed repeatedly in our original proposal to the community three years ago and have been reinforced since. First and foremost, we went with three parallel tracks for the oral sessions. The conference has had two parallel tracks since 1991, when there were about 270 people. We have not grown the number of tracks as other conferences have done over the years (some, like ICCV, have not either), but we felt it was time to expand the number of tracks and have correspondingly more oral presentations. We are doing that for two days, actually; the final day and a half of the conference is just two tracks, as it has been.

The other new element is the three-and-a-half-day conference. The main conference lasts four days, as it has recently, which has been very popular in our surveys. However, we now have a half day on the third day, which gives people a bit of a break and a chance to go out and enjoy Hawaii. A conference in this venue certainly means we are going to tempt people to skip a few sessions; this time we thought we would give them an explicit opportunity to head off and enjoy, without guilt.

… to enjoy authentic aloha spirit!

Indeed, there’s lots of aloha spirit here. The convention center and the people are very Hawaiian; they’ve been very friendly and very accommodating. So far, working with the convention center has been great. There are two other interesting things this year that are novel. One is the EXPO and the poster session. There have been a few experiments with how those two should interact. This year, as at CVPR 2015, we put the posters in the middle of the EXPO, so the companies are in a ring surrounding the posters. The difference between this year and 2015 is that we are much bigger now. In 2015, there were about 50 companies exhibiting; now we have 125, and we have almost twice the number of posters, so it’s a huge arrangement. We hope it works. There is going to be a lot of flow. The EXPO is going to be open during the poster sessions.

Anthony Hoogs · 7 · Saturday


Lastly, we have what we call “the industry spotlights”. On the EXPO floor, there is an area for this. It’s just the size of a 20x20 booth, but we have a full program throughout the entire conference, during lunch and the poster sessions, of companies giving 10-minute talks. The program is online, and every company in the EXPO is given the opportunity to participate. Most of them do, so we have a running highlight for each company. You can see it on the program; I suggest you look at it and just drop by. It will happen during lunch and also during the posters. It also includes a panel session bringing together people from a number of companies to talk about key issues in our field…

….which will be moderated by me…

Exactly!

What is your advice to a first-time attendee at CVPR? They will have thousands of presentations and lectures. What is your advice?

First, welcome to CVPR to all of the first-timers. We don’t get the kind of growth we’ve seen without a lot of people who have never been to CVPR each year. We have a conference survey that we do every year. We make it available to all attendees, typically towards the end of the conference, and then it’s open for weeks afterwards. I very strongly encourage everyone to fill out this survey, because this is how we understand what people liked and what didn’t go well, so that we can plan the conference for next year. This is instrumental in us choosing a four-day conference versus a three-day conference, for example, or choosing to go with invited speakers versus oral spotlights. For all kinds of experiments done over the past few years, and for new ideas, we use the survey to judge whether people would like them to happen or continue.

Last year, about a quarter of the registered people responded to the survey. We would like that percentage to be higher, but according to it, 45% of the people were new to CVPR. There might have been a bias in the sampling, but nevertheless, that’s a huge percentage. We expect this year might be comparable or even higher, at

“The industry spotlights”

Anthony Hoogs · 8 · Saturday


least in terms of the response to the surveys. So, there’s a lot of you out there. I think for all the new folks, it can be very overwhelming. I would advise you to treat it like a tasting menu.

Make sure that you get to a few of the oral sessions to see the high quality of our selective long oral presentations and some of the short orals. I would advise you to drift around through the poster sessions and get a feel for them by highlighting the posters that you want to see. The poster sessions can be overwhelming: each one will have about 112 posters. There are going to be two poster sessions each day, but posters will only be up for one morning or one afternoon session. I would advise you to figure out which posters and topic areas are of most interest to you and target those, rather than getting lost in the crowd. There is all kinds of navigational help, like an online floor plan, so you can find the posters that you want to get to. The EXPO should be quite a technology fest. Most of the major internet companies are here; virtually every computer vision company is here; many or most companies invested in machine learning are here. The EXPO grows in size and dynamism every year. It’s a very exciting group of demos and presentations that companies are making. It’s quite large, so I would sample all of it.

Don’t forget the luau! The luau is a Hawaiian party on Sunday night. This is the one major sponsored social event that comes with your registration. It’s down on the beach near the hotels, and it should be a great time: a traditional Hawaiian luau with entertainment, native dancing, torch juggling, and things like that.

I would close by saying, please consider staying for the last workshop day. It’s the day after the main conference, the sixth day, but we gave people a half day off on the fourth day partly so that they might be more likely to stay for that last day. Of course, Hawaii calls and people want to go, but there are still a lot of great workshops and interesting talks happening all the way through the final closing of that workshop day. I would also encourage everyone to fill out that survey. It really is how we understand what the community wants, what they like, and what they didn’t like. The new aspects of the conference will be represented on the survey, such as whether you like three parallel tracks or whether you want four or five parallel tracks.

Please fill that out. That’s how we know whether you like the poster sessions, the EXPO layout, and things like that. That is how you make your opinion heard. The community has spoken very loudly on many of these issues, with 95% voting in favor of the EXPO, invited speakers, and so on. You can always send an email, but the survey is data, and data speaks very loudly.

Anthony Hoogs · 9 · Saturday

“The poster sessions can be overwhelming”

“Don’t forget the luau!”


Yes, there are a lot of companies that do a lot of different things. What makes us special, and I think that’s your question, is, number one, that because we focus on face analytics and emotion recognition today, we provide the largest number of analytics, or deliverables, ever extracted from the face. That includes seven emotions, age, gender, head pose, eye tracking, eye openness, gaze estimation, all of that good stuff, plus attention, distraction, drowsiness, microsleep, and so on. I may have just mentioned 15 or 20 here; you can imagine what the full list is. So not only do we do all of this in one single, all-in-one SDK, but we have also built our own proprietary deep learning-based methodology, primarily using convolutional neural networks for image processing. And our methodology was built with embeddability in mind: when we built the algorithm, we wanted to make sure that it does not only sit in a big desktop environment with a ton of GPUs. We want to literally put it onto a Raspberry Pi and other low-processing, low-memory, low-power devices; embedded devices particularly.

And why are you specifically soproficient in that?

Because we started initially with hand-tuned features, before deep learning even came along, and we were literally pushing the ceiling as far as accuracy, as far as what we could do with all of the traditional machine learning techniques. When deep learning came along, we knew exactly what we were missing, and that the big leap is about the extra 5% or 3%, or whatever, so you can have the extra accuracy under different lighting conditions, in different environments, or in what we

“We are hardware agnostic, camera sensor agnostic,gender and authenticity agnostic…”

Modar Alaoui, Eyeris · 10 · Saturday

Modar, please tell us about Eyeris.

We are a deep learning-based vision AI company. We do computer vision work for human understanding; particularly, right now, face analytics and emotion recognition, because we believe that we can derive the richest source of information about human behavioural understanding through faces. Some of our additional product roadmap items are to complement the face analytics and emotion recognition with body pose estimation and understanding, action recognition, activity recognition, and so on.

Would you agree that this segment already has a significant number of actors working on this?

Modar Alaoui, founder and CEO of Eyeris


call different uncontrolled environments in general. So the methodology that we created, which is proprietary, kind of bridges between the two, if you will. But it depends on where the algorithm needs to be deployed. If it needs to be deployed on some GPU with massive amounts of processing power and all of that, then we use deep learning-specific algorithms. And if it needs to be deployed on a Raspberry Pi, then we leverage some sort of, what I would call, a hybrid between the two, so that I can remain a little bit vague about our methodology.

Well, this requires a bit of a story now. How did the idea come to you in the first place, at such an early stage?

So, my background is in measuring user behaviour with technology. In early 2008, I launched a company that was doing gender- and age-group-targeted advertisement using computer vision: using age and gender recognition on a private in-store TV network deployed in airport shopping centres and places like that. The idea was basically putting out advertisements based on gender in airports, when you are within 10 or 15 feet of the display. We needed more cutting-edge tools for measuring the audience, and we weren’t satisfied with age and gender, people counting, dwell time, and all of that. We started looking into emotions, and we exhibited as the first emotion recognition company ever at CVPR in Portland in 2014, and we literally were among a dozen

companies like Amazon and the like, and we were the only startup. So that’s basically when the idea came along. We had a good prototype, we showed it, and then we wanted to pursue a software-only kind of play, where we would provide everything that we had learned, plus everything that we started to learn about emotions, combined into a single piece of software. Not only do we have everything now, but we are also hardware agnostic, camera sensor agnostic, gender and authenticity agnostic. We trained on over three million images and videos of data that we collected ourselves initially, because there was nothing publicly available. We spent the better part of two years collecting data regularly, and we still do that every day at our lab in San Jose.

In one of the talks today, I heard that most data used to train such systems is collected without the person’s permission, or even after receiving a denial to collect that information. I am sure that you have an answer to that?

[We laugh] Of course! We hire between 120 and 150 people into our San Jose lab every single month. We have the largest dataset ever collected inside a vehicle with people’s consent. We keep everyone’s record. Not only a record: we have self-reported age, gender, ethnicity, background, height, weight, and so on from every person, with their IDs attached to release forms, as we call them. That gives us full authority to do whatever we want with the data. And what we do with it, really, is not to post it on our website, but to train our algorithm on it and leverage it

SaturdayModar Alaoui - Eyeris 11


as ground truth. And before training, of course, we have to label the data, and so on.

What about the segment in general? Do such companies as I mentioned earlier exist?

They do exist, unfortunately. Unfortunately, they do exist. There is a grey zone where people go out on the internet and collect one million images from Flickr, or from people’s profile pictures on Facebook, or from other social media. And those people, yes, have not granted permission for this company to collect them, but indirectly they somehow gave permission to the media platform to show their images publicly on the internet to anybody interested in that profile or that image. That is why I say it is grey: it is indirectly given to the platform owner, but it is not necessarily direct authorisation to analyse the images, or even to train on them. So my take on this is that it is fine to the extent that the algorithms trained on this data remain for non-commercial purposes. The second that an algorithm trained on this data becomes commercial, I believe that there is a bit of a bridge being crossed there.

That means that this grey area ismuch cheaper and much easier...

… and scalable, yes.

That means that you forced yourself to give up those three advantages in order to be sure that what you do is not in a grey area but in the white area.

That is right. We did that for two reasons. The first is the reason you just mentioned, but also because the algorithm that we are producing is for a very hyper-targeted environment and set-up. We go after two different verticals. One is autonomous vehicles and driver monitoring, or what we call occupant monitoring AI for highly automated vehicles; that includes the driver, of course, and all the other passengers. The second vertical is social robotics.

But I’m not going to talk too much about social robotics, because that’s not really why we collected all of that data. We collected the data inside the vehicle, and there is no publicly available data out there on the internet today that you can leverage

Modar Alaoui, Eyeris · 12 · Saturday

A sample of Eyeris’ all-in-one face-reading AI deliverables for highly automated vehicles (HAVs)

“The grey area is much cheaper, much easier… and scalable”


to train an algorithm with different lighting conditions, non-uniform partial lighting, different head poses, the person looking at different areas of the windshield or different parts of the car, the camera placed in different positions, and so on. We had to literally create all of that, also for the benefit of creating data that we can leverage to do whatever we want with it and to train the algorithm for the purpose of selling it commercially.

What did you come to do at CVPR 2017?

So, we are not only exhibiting here. I have a ton of meetings with a number of companies which I’m pleasantly surprised to see here, including startups and even larger companies. We are exhibiting and looking at some papers; potentially we’ll be voting for some of the papers that we are looking at, computer vision papers particularly, relevant to behaviour understanding. And my team is here to scout around, meet and mingle with professionals in the field, and potentially hire a couple of people who would be great for our mission and our company.

Does that mean that people can meet you and your team at the EXPO?

That’s right, anytime during the next four days. Otherwise, they can easily drop us an email and we can make something happen.

You can meet the wonderful team ofEyeris at booth 651.

Modar Alaoui, Eyeris · 13 · Saturday

“Scout around…”

Dear reader,

Would you like to subscribe to Computer Vision News and receive it for free in your mailbox every month?

FREE SUBSCRIPTION

Subscription Form (click here, it’s free)


Ilke Demir is a research scientist doing her postdoc at Facebook.

Ilke, where did you study?

I did my PhD at Purdue University. I also did my Masters at Purdue, in the Computer Science department, and I did my undergrad at the Middle East Technical University in Turkey.

Where was it more pleasant to study?

I don’t know. My universities alwayshad a special place for me.

Why did you choose this field?

Since my childhood, I always had that thing where I assembled and disassembled things. I was curious how things work. My father and I had tasks to solve things, build circuits, or see how things work… little experimental things.

Is he an engineer?

He is actually a helicopter pilot. Now he is in the private sector, but he used to be in the military.

So he’s very much aware of technical things?

Yes. In order to have their license, pilots need to have a major from a university, so his background is in electrical engineering.

Since childhood, we used to do all those little activities. After that, in middle school, I pretty much knew that

Women in Computer Vision

Ilke Demir · 14 · Saturday

“Why should gender matter?”


I wanted to study computing. I was really good at math; all my teachers used to say that I should get a better education and devote myself to math. I decided in high school to study computer science and electrical engineering. The Middle East Technical University is the best university in Turkey. There is a big entrance exam in Turkey: you enter all of the universities through this exam, and approximately 1.5 million students take it every year. I had a really good rank in the entrance exam, so I could choose any department I wanted in the university.

It seems like you were always encouraged to study this field.

That’s right. In middle school and high school, we had math olympiads and computer science olympiads. In middle school, I won a bronze medal in math. After that, I was super dedicated. In high school, my seniors, who are now doing awesome things in Silicon Valley, taught me coding. In my country, we didn’t have coding and computer classes in high school, at least in my time, so they taught me coding for those olympiads. I felt like I could do anything with coding.

Why did you decide to come to America from Turkey?

My adviser at Purdue was the best in what I wanted to study, which was 3D reconstruction. At that time, I was more focused on robotic vision because of my internship in the Cowan Research Lab. I really wanted to be a robotics vision researcher. What my adviser at Purdue was doing at that time was reconstruction and appearance editing. That means using camera projectors to change a vase with flowers to a vase with checkerboard patterns and so on. It’s visual, but it’s real. I wanted to work with him on this project, so I wrote to him to ask if I could work with him. I never got a reply, but that happens of course - everyone applies everywhere. I said I was going to work with him anyway. The university had already accepted me, so I knew I was going to Purdue. I did my whole PhD there.

How did you end up working with Facebook?

Together with my adviser at Purdue, I was giving a course at SIGGRAPH on inverse procedural modeling. Ramesh Raskar is my supervisor at Facebook; he came from the MIT Media Lab. He is the professor that everyone is running after. He came to our course at SIGGRAPH. He thought the idea was



applicable to the project he had, and he liked my presentation - because it’s a three-hour-and-fifteen-minute course, an afternoon course with two professors and myself.

It seems like everything has worked out well for you. Are you lucky to have everything work your way?

I don’t think it’s luck. I know that some things worked out because certain people happened to be in my life along the way. I worked hard on those things as well. As I told you, my advisor didn’t accept me at first when I wanted to work with him. Then I took his class and showed him I was really interested in graphics. Then in my second semester - not even the first semester - he asked me to be his research assistant. You can overcome anything.

What is your current work?

We are trying to understand the world from satellite images. We started that work by extracting roads from satellite images. The big problem is that 75% of the world is unmapped. By that I mean, if you want a parcel to be delivered to an address, and the address that you put on Amazon, for example, is “Go to that market and behind that market, take the second street. Then you will see a temple there. It’s the second house on the right.” That is the address. I can’t process what is behind the temple. You cannot have those addresses.

The main challenge that we are trying to solve is an automatic, generative approach to map all those places, to give everyone a unique street address that they can use for all of the location services, urban infrastructure, connectivity efforts, etc. Our work at CVPR was studying satellite images and processing them with a deep learning network to extract roads. Then we segment those roads to have individual streets. Then we name those streets automatically based on some distance fields, according to a four-field alphanumeric addressing scheme.

What is particularly challenging in doing that?

Every part had its own challenge. We experimented with different deep learning networks. We experimented with different region generation algorithms. Technically, it wasn’t that challenging, but after that, having someone use the system is the most challenging part. We can have the best address scheme, but if nobody is using it, then why do you have it, right? We have some partnerships with companies and government agencies in developing areas about using our addresses. Currently, we get really good feedback. For example, in some user studies, we see a 25% reduction in arrival times. If you are using the traditional addresses, it takes you hours longer to find the place that you need.




[laughs] Even taxi drivers cannot find my address sometimes.

Do you enjoy what you do?

To be a PhD student, you need to be dedicated. To be dedicated, you need to love what you are doing. Even this work that I’m doing with Facebook is kind of related to my PhD, which is in inverse procedural modeling. I see the world as if I can proceduralize everything. This is a branch of it in which I proceduralize satellite images.

If you could proceduralize anything in the world, what would it be?

I’ve heard that question before! I would want to proceduralize evolution. That would include everything we have learned nowadays for humanity. It would include our learning process; it would include our growing process. It would include us, both semantically and geometrically. Just imagine that you are being proceduralized. You know how your finger grows from a cell. All of those are based on a grammar. So just imagine that you could express human evolution with one set of rules, with one procedure, one grammar. That would be so awesome.

Does it mean that everything is predetermined?

It can be a stochastic grammar; it doesn’t need to be predetermined. But it needs to be based on some rules.

So is there no power governing this world?

The power belongs to the person who is able to extract that grammar.

What do you think about us giving a voice to women in computer vision?

Ideally, the world should be equal. You shouldn’t be focusing on women. Why not ask white males what they think, for example, right? But we live in a society where this is needed. I admire and appreciate what you are doing. I want to thank you for that. I know that when I speak, people hear me, but I


know that it’s not the same for all women in computer vision and computer science. I witnessed so many engineers and higher-ups not having their voices heard, being talked over, or not being looked in the eye because of their gender. That’s why you need to continue this.

I have actually heard two different approaches. I’ve heard that it might put women on a lower level. Others have said that you do need this positive discrimination; otherwise, you wouldn’t be heard at all. What do you think?

I think positive discrimination is still needed, at least to encourage young girls. We keep seeing those numbers increasing, but we are still so far from the ideal equality. We need to do everything in our power. I’m not saying we should reduce the quality of the work because of gender. I’m not saying positive discrimination does not affect the overall quality of something. But if you can make those trade-offs to empower a minority - which does not have to be women; it can be race, age, or any minority - if you can empower a minority to have some part in the decisions or in the process in the work environment, then that is really something you should have.

How can we improve the behavior of people to help make this change?

I think everyone, in all interactions, should be seen as who they are. They should be abstracted from what they are and everything before the conversation. It should be a human-to-human conversation. Your title, status, and achievements shouldn’t matter when you have a conversation in formal and informal settings. Those abstractions include everything: gender, class, race, etc.

Do you think men can do that when they are talking to a woman?

Why not? I don’t see why not. Why should gender matter?



Saturday · Jitendra Malik · 19

It is always fascinating to follow a lecture given by Jitendra Malik. This year, at the Deep Learning for Robotic Vision workshop, he shared with us great quotes, good humor and new insights from his recent work with Google.


Silvia Vinyes is starting the fourth year of her PhD at Imperial College London. Yesterday she presented her poster at the Computer Vision in Sports workshop.

Silvia’s work is about a method that she has developed for action recognition in tennis. What she and her co-authors are trying to do is recognise the fine-grained action of a tennis player. This includes, for example, specific types of serves - flat serve, kick serve - or different types of backhand or forehand. The idea is that “in the future, this information can be used to build models of tennis”, Silvia explains.

In current models of tennis, you usually have only the position of the player and the position of the ball, and then you build a temporal model from that. Silvia thinks that if you know the specific action a player is performing, you can build an even more accurate model. Additionally, when you watch a match, maybe you want to know which actions a player is weaker or stronger in, or which kinds of strategies he uses. In that case, you would want to distinguish between the first and second serve, and see which kind of action he is performing. By automatically collecting the type of action, instead of doing everything manually, even more data for these kinds of tasks can be collected.

There are also some challenges that Silvia told us about. One difficulty is that there are many things that change: the viewpoint from which you see the player, the type of surface they are playing on, or the illumination in the video. Even how a player performs an action changes, because the movement they do is not exact. Using this information, you could even think of another application, Silvia says: if you know the way a specific player serves and he changes his way of serving, maybe you can see that something is going on, like an injury.

The way Silvia and her co-authors approached the problem is to use a deep neural network that was trained on an independent dataset to extract some features. “So we let the network decide which features are important in the image”, she tells us, and then they used these features to tell which action is being played, using another neural network.

“In the dataset there were professional and amateur players, and the model performed differently for each group”

Silvia Vinyes · 20 · Saturday

Deep Learning for Domain-Specific Action Recognition in Tennis


The next steps Silvia sees for their work are to test the approach on other, more challenging datasets. In the dataset they used, the background and illumination are changing, but it’s not actually a real-world setting, since the players were not actually playing a match. At the same time, the viewpoint in these videos was not changing that much, so it would be interesting to see if they can achieve the same results from different viewpoints. “In an ideal world what I would like to have is more data, but for this type of problem the data is restricted”, Silvia says, and adds that “this is why it’s interesting to work in this type of setting”. The data is restricted because it is a very specific problem, she tells us. If you’re trying, for example, to detect which sport is being played, there are datasets with a lot of data available, because you can just go to YouTube and collect videos of different sports. But for fine-grained tennis actions, the data is much more restricted. One of the things Silvia is particularly happy about is that they used a dataset that is publicly available, since in the area of sports action recognition, most of the time people use their own data and it’s hard to compare the results. Silvia hopes this encourages people to use the same dataset and compare to their approach.


The first network they used is the Inception network from Google. On top of this, they built a three-layer LSTM. Silvia tells us about an interesting thing they found after training the model: in the dataset they used, there were professional and amateur players, and the model performed differently for each group. Also, when the model was trained on one group, it performed well when tested on that group, but worse on the other. She says it will be interesting to see why it performs better on one or the other.


Roey Mechrez and Itamar Talmi will be presenting today their work “Template Matching With Deformable Diversity Similarity”: a novel measure for image similarity named Deformable Diversity Similarity (DDIS), based on the diversity of feature matches between images.

Methods for measuring the similarity between images are a key component in many computer vision applications such as template matching, symmetry detection, tracking, counting objects and image retrieval.

The key contribution of this work is a novel similarity measure that is robust to complex deformations, significant background clutter, and occlusions. The authors demonstrated the use of DDIS for template matching and showed that it outperforms the previous state of the art in detection accuracy while improving computational complexity.

The main challenge of this development, according to Roey, was to give the model robustness in the face of all kinds of different template matching problems: “It is hard to model background clutter, deformations and occlusions, and yet we want to have a similarity measure which is able to detect the best match. That is why our method relies on both local appearance and geometric information that jointly lead to a powerful approach for similarity. It is

“A trophy held by the One Direction band is correctly detected in the target image, regardless of the occlusion: as I said, we only need one direction...”

22 · Roey Mechrez · Saturday

Roey Mechrez is a PhD candidate at the Technion in Haifa, Israel

Template Matching With Deformable Diversity Similarity



based on two properties of the Nearest Neighbor (NN) field of matches between points in two compared images. The first is that the diversity of Nearest Neighbor matches forms a strong cue for similarity. The second key idea behind DDIS is to explicitly consider the deformation implied by the Nearest Neighbor field. The idea of allowing deformations while accounting for them in the matching measure is highly advantageous for similarity.”
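To make the two properties concrete, here is a toy, one-directional DDIS-like score - a simplified sketch, not the authors’ implementation. Each window patch is matched to its nearest template patch; a matched template patch counts once no matter how often it is hit (diversity), and its contribution is down-weighted by how far the match is displaced from its expected position (deformation). Descriptors as plain tuples and index offset as a stand-in for spatial displacement are both simplifying assumptions:

```python
def ddis_score(template, window):
    """Toy one-directional Deformable Diversity Similarity (sketch).

    template, window: equal-length lists of feature vectors (tuples),
    assumed sampled on the same grid so that index offset stands in
    for spatial deformation.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = {}  # template index -> smallest deformation among its matches
    for i, w in enumerate(window):
        # nearest-neighbour match of this window patch into the template
        j = min(range(len(template)), key=lambda k: dist(w, template[k]))
        deform = abs(j - i)
        best[j] = min(best.get(j, deform), deform)
    # diversity: each matched template patch counts once, down-weighted
    # by the deformation of its least-deformed match
    return sum(1.0 / (1.0 + d) for d in best.values())
```

Identical patch sets give the maximum score; a window that collapses onto a single template patch scores far lower even though every one of its patches has a perfect-distance match - that is the diversity cue at work, with deformation softening the penalty for shifted but plausible matches.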

The team has continued working on the problem of image similarity, pursuing new research directions for DDIS. Roey concludes with a story: “Compared to the previous state of the art, called Best-Buddies Similarity, we claim that we need only one direction of the nearest neighbor field and there is no need for both directions. To make that point, one of the figures in our paper shows a trophy held by the One Direction band, which is correctly detected in the target image, regardless of the occlusion: as I said, we only need one direction...”

Do you want to learn more? Roey and Itamar will present their work today (Saturday) at 13:30 in Kalākaua Ballroom C.


“It’s also useful for other computer vision problems, involving continuous variables”

24 · Dan Xu · Saturday

Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation

Dan Xu is a PhD candidate at the University of Trento, Italy, and he also holds a research assistant position at the Chinese University of Hong Kong. He is presenting today his work on Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation. The co-authors of this paper are Elisa Ricci, who works at FBK, a research institute in Italy; Wanli Ouyang, a research assistant professor at the Chinese University of Hong Kong; and Xiaogang Wang, an associate professor at the Chinese University of Hong Kong. The last author is Nicu Sebe, a full professor at the University of Trento.

Their paper is about developing an end-to-end deep learning system to recover 3D depth information from a single RGB image. There are two novel aspects of this work, Dan tells us. First, previous work on this topic used graphical models, namely CRFs, but they only consider a single scale. “But multi-scale information has been proven very effective for a lot of computer vision problems”, Dan explains, “so we thought that in our task, the multi-scale information would also be useful”. So they decided to use the multi-scale information derived from deep networks, and to combine this information they propose a multi-scale (instead of just a single-scale) CRF. The second novelty is that they demonstrated the effectiveness of their framework by plugging the proposed multi-scale CRFs into different front-end convolutional neural networks. They show that the model and implementation of Dan Xu’s method can be used for other tasks as well, as long as they involve continuous variables. “So this is not only useful for monocular depth estimation, it’s also useful for other computer vision problems involving continuous variables”, Dan says - for example, crowd counting, where the number of people in a crowd has to be counted.
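To give a flavor of the idea - combining depth estimates from several scales and then refining them jointly - here is a hypothetical one-dimensional toy. A confidence-weighted average of the scales plays the role of the unary term, and a mean-field-style loop balances it against neighboring depths. The function name, weights, and update rule are illustrative stand-ins for this sketch; the paper’s model is a learned multi-scale continuous CRF integrated into the network, not this hand-written smoother:

```python
def fuse_depth(scales, weights, alpha=0.5, iters=20):
    """Fuse per-pixel depth estimates from several scales (all resampled
    to a common length), then smooth with neighbour-coupled updates."""
    n = len(scales[0])
    total = sum(weights)
    # unary term: confidence-weighted average of the multi-scale estimates
    obs = [sum(w * s[i] for w, s in zip(weights, scales)) / total
           for i in range(n)]
    d = obs[:]
    for _ in range(iters):
        new = []
        for i in range(n):
            nbrs = [d[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # balance the observation against the neighbouring depths
            new.append((obs[i] + alpha * sum(nbrs)) / (1 + alpha * len(nbrs)))
        d = new
    return d
```

A spiky estimate gets pulled toward its neighbours but remains anchored by the multi-scale observation, which is roughly the trade-off a continuous CRF over depth values encodes.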

If you want to learn more, you can attend Dan’s spotlight presentation today (Saturday) at 9:20.


Saturday · Presentations · 25

At the workshop The Bright and Dark Sides of Computer Vision: Challenges and Opportunities for Privacy and Security, Vinay Uday Prabhu presented: Vulnerability of Deep Learning Based Gait Biometric Recognition to Adversarial Perturbations. These are adversarial perturbations in a non-computer-vision setting: accelerometric and sensor data.

At the workshop Computer Vision for Microscopy Image Analysis, Avisek Lahiri presented: Generative Adversarial Learning for Reducing Manual Annotation in Semantic Segmentation on Large Scale Microscopy Images. That is, how to reduce labor in computer vision annotations using adversarial (GAN) learning.


Improve your vision with

The only magazine covering all the fields of the computer vision and image processing industry

A publication by

Subscribe (click here, it’s free)

Mapillary · RE•WORK · Gauss Surgical