18-551 Fall 2008 Group 1 Final Report

C.A.V.E.S.: Content Aware Video Expansion and Scaling
Fall 2008, Group 1
Aneeb Qureshi ([email protected]), Gregory Tress ([email protected]), David Xiang ([email protected])
1. Problem

Since the proliferation of consumer television technology in the 1940s and 1950s, the American television industry has used the 4:3 aspect ratio as a standard for filming and broadcasting, established by the National Television System Committee. This standard has guided both television manufacturers and content providers. More recently, with the increased availability of high-definition content, the nationwide switch to digital broadcast television, and the introduction of low-cost digital television sets, consumers have demonstrated an interest in viewing content in the 16:9 "widescreen" aspect ratio. This aspect ratio was used by the Advanced Television Systems Committee as part of its standard for high-definition television and as the basis for new standardized television resolutions. Many televisions on the market today are designed around the 16:9 aspect ratio, and television studios are transitioning toward digital widescreen filming of content. For consumers, digital television generally results in a higher-quality picture, with increased resolution and decreased visible interference.

At the same time, this transition creates a backwards-compatibility issue. The majority of existing television content has already been filmed in 4:3, and in order to display 4:3 content on a 16:9 television, the consumer must choose how to adjust the aspect ratio on the television itself. Widescreen televisions usually offer a set of user-selectable options for this purpose. One option is to maintain the original aspect ratio of the content and center it on the screen. Because the 16:9 screen is significantly wider than a 4:3 video frame of the same height, black bars appear on the sides of the video; this phenomenon is known as "pillarboxing." As a result, the consumer loses the benefit of the wider screen; in fact, a widescreen television with the same viewable surface area as a traditional 4:3 television will yield a smaller representation of the same video. A second option is to horizontally "stretch" the 4:3 video, forcing it to fill the entire screen. This results in a significant and noticeable distortion. Yet another option is to "zoom" the 4:3 video, cutting off the top and bottom of the frame, so that the video fills the entire screen without distorting the content. However, this zoom functionality is generally unintelligent and cuts off important parts of the video. Consumers will ultimately continue to face aspect-ratio difficulties as long as 4:3 content is broadcast. Currently, there is no simple way to view 4:3 content on a 16:9 television while both utilizing the entire screen and avoiding noticeable distortion.
In this paper we detail a system that intelligently converts video content from one aspect ratio to another. In the case of the television aspect ratio problem described above, this system modifies a 4:3 input video in such a way that the resulting output fills the entire screen of a 16:9 television without appearing severely distorted. While our motivation stems specifically from the 4:3 to 16:9 conversion problem, the system is generalized in such a way that it can convert between any two aspect ratios. Our system does not operate on actual television content in real time, but it still functions as a valid proof of concept, since our algorithm can serve as the basis for a consumer-end device that does perform this function. Such a device, in the form of a set-top box, would require adequate hardware and a modification to the parameters of our algorithm to operate in real time. In our case, we can still perform a 4:3 to 16:9 conversion at lower resolution and lower frame rate to demonstrate the effectiveness of the system. With the prevalence of mobile devices and web-based video in a variety of physical resolutions, there are many possible applications of aspect ratio conversion beyond television-specific content.
Numerous methods have been explored for content-aware image resizing. For video in particular, there has been increasing research in video retargeting. Video retargeting relies on solving a large system of linear equations in order to produce the desired output aspect ratio. As we will detail in a later section, video retargeting is not suitable for the C67 DSK due to the large number of computations and memory accesses it requires. Instead, we use a modified version of seam carving that takes into account the temporal dependency between frames. By using seam carving instead of video retargeting techniques, we trade quality for speed. Considering the limited power of the C67 DSK, this tradeoff is a suitable sacrifice.
2. Novelty

This project relates closely to Fall 2007 Group 6's project, Content Aware Image Resizing, since a video is simply a series of images. However, the algorithm described in their project cannot be directly applied to video, because large artifacts occur. When the traditional seam carving method is applied to each frame of a video independently, the result is jerky: parts of the frame appear to jump from one area to another, usually shaking left and right sporadically. This occurs because the seams in one frame are unrelated to the seams in the next frame. Hence, when seams move by enough pixels between two given frames, the viewer observes this jerky effect. Throughout the paper, we will refer to a less extreme version of the jerky artifact as a wavy artifact, in which only certain segments of the frame experience mild temporal distortion in their expansion.

With that in mind, we have added new improvements to the seam carving algorithm that allow it to function properly for video sequences. These tweaks are described in detail in Section 3. In addition, our project has an enormous amount of data to transfer between the DSK and the PC. This introduces memory and speed problems that are not present in Group 6's project.

In regard to creating the prominence scores, we have fully adopted Group 6's face detection algorithm. We felt that face detection should not be the primary focus of our project and testing, so we decided not to explore other face detection algorithms.
3. Algorithms

Edge Detection

The first step in creating the prominence scores for a frame involves detecting high-energy areas using edge detection. Edge detection can be implemented simply by correlating an image with kernels that detect changes in color between adjacent pixels. In order to simplify calculations, the RGB image is converted to grayscale. This reduces the number of calculations and memory accesses to a third of the original amount. Although there are many operators that perform edge detection, we use the Sobel operator because it only uses 3-by-3 kernels. Smaller kernels are more favorable because they reduce the number of calculations: the most times a pixel can be operated on by a 3-by-3 kernel is 9, while for a 4-by-4 kernel a pixel can be operated on up to 16 times, so the number of operations per pixel increases by 7. Considering that the kernel is correlated with a 320-by-240 frame (76,800 pixels), saving 7 operations per pixel is significant.

Figure 3.1: Sobel operator kernels

The Sobel operator uses the kernels defined in Figure 3.1 and the L2-norm to implement edge detection [1]. The L2-norm is usually computed as the square root of the sum of squares of the two kernel outputs. However, a faster method is to approximate the L2-norm by the sum of the absolute values of the kernel outputs; this allows the process to run even faster on the DSK. As noted in Figure 3.1, each kernel represents the gradient in either the x-direction or the y-direction. In the actual implementation, the two kernels are correlated at the same time, meaning that we only need to iterate through the image once.
When implementing the Sobel operator on the DSK, we skip computations for the "zero" elements in the kernels to save cycles. In addition, we do not multiply values by one, as it is a useless computation. The output of edge detection from the DSK is shown in Figure 3.2.

Figure 3.2: Edge detection
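The single-pass correlation with both kernels and the L1 approximation of the L2-norm can be sketched in Python/NumPy. This is an illustrative sketch, not the DSK C code; the function name and the zeroed-border handling are our own assumptions.

```python
import numpy as np

# Sobel kernels as in Figure 3.1: x-direction gradient and its transpose.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.int32)
SOBEL_Y = SOBEL_X.T

def sobel_energy(gray):
    """Gradient magnitude approximated as |Gx| + |Gy| in one image pass.

    `gray` is a 2-D uint8 array; border pixels are left at zero
    (an assumption, since the report does not specify border handling).
    """
    h, w = gray.shape
    g = gray.astype(np.int32)
    out = np.zeros((h, w), dtype=np.int32)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            win = g[r - 1:r + 2, c - 1:c + 2]
            gx = np.sum(win * SOBEL_X)      # both kernels applied in the
            gy = np.sum(win * SOBEL_Y)      # same iteration over the image
            out[r, c] = abs(gx) + abs(gy)   # L1 approximation of the L2-norm
    return out
```

On the DSK the same loop additionally skips the zero kernel entries and the multiplications by one, as described above.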
Face Detection

Face detection is an important component of the prominence scores because it ensures that faces are not distorted. In many cases, faces can lack detail (depending on how close the face is to the camera). When this occurs, edge detection fails to assign high energy to the face. Face detection therefore makes up for the shortcomings of edge detection alone.

As noted earlier, we have fully adopted the face detection algorithm of Group 6 from Fall 2007. The approach consists of three sequential stages: creating a binary image via YCbCr thresholding, opening and closing via erosion and dilation, and blob detection and rejection. The face detection process is shown in Figure 3.3.

Figure 3.3: Face detection block diagram
1: YCbCr Thresholding

YCbCr is a color space that is often used in digital video systems. Y represents the brightness component, Cb represents the blue chroma component, and Cr represents the red chroma component of the image. The face detection algorithm that we are using differentiates faces from the rest of the image based on skin tone, so we only use the Cb and Cr components in our thresholding step. It may be possible to incorporate the Y component to allow the thresholding to identify faces in more situations, but again, we did not want to make face detection the focus of our research, so we simply adopted Group 6's algorithm. The conversion from RGB to YCbCr is done with the Matlab command rgb2ycbcr(). We then use the YCbCr color space to create a binary image. The binary image is high for pixels in the range 100 < Cb < 133 and 140 < Cr < 165.
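The thresholding step amounts to one vectorized comparison per chroma plane. A minimal Python/NumPy sketch of the Matlab step (function name ours) follows; it assumes the Cb and Cr planes have already been produced by rgb2ycbcr() or an equivalent conversion.

```python
import numpy as np

def skin_mask(cb, cr):
    """Binary 'skin' image from the chroma planes.

    cb and cr are uint8 Cb/Cr planes; a pixel is marked high when
    100 < Cb < 133 and 140 < Cr < 165, the thresholds adopted from
    Group 6's algorithm.
    """
    return (cb > 100) & (cb < 133) & (cr > 140) & (cr < 165)
```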
2: Morphological Opening and Closing

Before explaining the process of opening and closing, we have to understand erosion and dilation. Both erosion and dilation are performed on binary images.

Erosion removes noise and other small artifacts by moving a structuring element throughout the image and finding overlaps between the structuring element and high pixels. In each overlap, the central pixel stays high and all other pixels become low [5].

Dilation is the opposite process of erosion. Dilation is performed after erosion in order to fill in any holes that may have been created by setting high pixels to low in the erosion process. In dilation, we assert high pixels over the entire area of the structuring element for each high pixel in the image; all other pixels become low [5].

When we apply image opening, we erode and then dilate the image with a structuring element of size 9 by 9 pixels. This means that image opening removes artifacts smaller than 9 by 9 pixels. In image closing, we dilate and then erode the image with a structuring element of size 7 by 7 pixels. Image closing is used to fill holes, like eyes and lips, which are usually a different color than regular skin [5]. Notice that in image opening we first erode, while in image closing we first dilate. Besides the different structuring elements, these are the key differences between image opening and closing. Both structuring element sizes were found by trial and error by Group 6 from Fall 2007.
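The opening and closing pipeline can be sketched with standard binary morphology in Python/NumPy. This is a naive reference sketch (function names ours, square structuring elements with all-ones, zero padding at the borders assumed), not the actual project code.

```python
import numpy as np

def erode(img, k):
    """Binary erosion with a k-by-k square structuring element:
    a pixel stays high only if its whole k-by-k neighborhood is high."""
    h, w = img.shape
    r = k // 2
    pad = np.pad(img, r, mode='constant', constant_values=False)
    out = np.zeros_like(img, dtype=bool)
    for y in range(h):
        for x in range(w):
            out[y, x] = pad[y:y + k, x:x + k].all()
    return out

def dilate(img, k):
    """Binary dilation: a pixel becomes high if any pixel in its
    k-by-k neighborhood is high."""
    h, w = img.shape
    r = k // 2
    pad = np.pad(img, r, mode='constant', constant_values=False)
    out = np.zeros_like(img, dtype=bool)
    for y in range(h):
        for x in range(w):
            out[y, x] = pad[y:y + k, x:x + k].any()
    return out

def open_close(mask):
    """Opening with a 9x9 element (erode, then dilate) removes blobs
    smaller than 9x9; closing with a 7x7 element (dilate, then erode)
    fills holes such as eyes and lips."""
    opened = dilate(erode(mask, 9), 9)
    return erode(dilate(opened, 7), 7)
```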
3: Blob Detection and Rejection

At this point, any noise originally present should be eliminated, and we should be left with a series of blobs. We now need to determine which blobs represent faces and which are simply false detections. Using the Matlab function regionprops(), we can easily determine the width, height, and area of each blob. We then reject all blobs that meet any of the following conditions: width < 20, width > 80, height < 25, or height > 150. These blobs are rejected on the grounds that they are probably too small or too large to be a face. Lastly, we examine the width-height ratio of the remaining blobs. If a blob's width-height ratio is between 0.5 and 0.9, we continue processing; otherwise, the blob is rejected. If a blob passes the width-height ratio test, we then look at its density, defined as area / (width * height). If the density is less than 0.5, we reject the blob; otherwise, we have successfully detected a face.
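The rejection rules above can be collected into a single predicate. The following Python sketch takes the width, height, and area of one blob (as reported by regionprops() or similar); whether the boundary values themselves pass is our reading of the report's inequalities.

```python
def accept_blob(width, height, area):
    """Return True if a blob passes all face-candidate tests."""
    # Size limits: probably too small or too large to be a face.
    if width < 20 or width > 80 or height < 25 or height > 150:
        return False
    # Width-height ratio must be between 0.5 and 0.9.
    if not (0.5 <= width / height <= 0.9):
        return False
    # Density = area / (width * height); reject if less than 0.5.
    return area / (width * height) >= 0.5
```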
Limitations

This face detection algorithm is very limited. Because it is based on color thresholding, it is highly dependent on the lighting conditions of the given frame. With that in mind, this algorithm will not work in every environment. We have noticed that it works best indoors and often fails in outdoor scenarios. An alternative approach would be a feature-based face detection algorithm. Feature-based face detection is independent of the color of the subject and would therefore work under any lighting conditions. We originally looked at feature-based algorithms and decided not to implement them because they require a very large training set in order to work correctly. To demonstrate successful face detection, Figure 3.4 and Figure 3.5 each show faces of different sizes detected in different videos. In both cases, the region determined to be a face spreads slightly outside the actual boundaries of the face due to the coloring of the subject's shirt.

Figure 3.4: Input frame and resulting face detection output (with primitive edge detection shown as a guide)
Figure 3.5: Input frame and resulting face detection output (with primitive edge detection shown as a guide)
Motion Detection

The motion algorithm used in the CAVES project is a block-based image thresholding algorithm proposed by Liu et al. [2]. The goal of motion scoring in video expansion is to give importance to regions of high motion. Assuming that these regions are ones of high interest, the purpose of detecting movement is to provide a better viewing experience for expanded videos. The scores from the motion detector are the third contributor to the prominence test matrix.

The motion algorithm provides illumination-independent change detection. The discussion of the algorithm is broken into two sections. The first part discusses special values known as circular shift moments and shows how they provide change detection in a noise-free case. The second part applies an additional decision rule on top of this to cope with the effects of noise.

1: Change Detection with Circular Shift Moments (CSM)

For the calculations in this algorithm, the 24 bpp images are averaged to give an 8 bpp grayscale image; the mapping from RGB to grayscale is simply an average of the red, green, and blue components. The image is then partitioned into N x N pixel square blocks. For our implementation, we choose N to be 10 for our 320x240 images. This results in 768 possible areas of motion within one frame. Choosing smaller values of N resulted in significantly slower computation times and less accurate decisions by the motion detector. For example, using an N value of 5 results in 3072 possible areas of motion per frame. For such small values of N, the image sequences are too noisy, and the detector reports nearly constant motion even when only noise is present. For values larger than 10, we found that the motion algorithm would not return specific enough results.
For example, choosing an N value of 20 results in only 192 areas of motion. Even though computation was faster, the square regions were too big to give an accurate estimate of the motion regions. Choosing N to be 10 for our frame size gave the most promising results.

For every square area of interest, there is a predefined x-direction circular shift moment and a y-direction circular shift moment. Please refer to [2] for the equations. With these equations, CSM-based change detection can then be applied to detect motion. The steps are as follows. According to the equations given in [2], calculate both the x- and y-direction circular shift moments for every square block in a reference frame, and decide upon a predetermined threshold. For every square block of the kth frame in the image sequence, calculate its x- and y-direction circular shift moments. Finally, declare that a change occurs in a square block if the absolute difference of either the x- or y-direction circular shift moments between the kth frame and the reference frame is greater than the threshold; otherwise, there is no motion in that square. Repeat by re-initializing the reference frame and moving on to the next one.
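The block partitioning and per-block thresholding can be illustrated with the following Python/NumPy sketch. Note an important caveat: the actual CSM equations are only given in [2], so as a stand-in this sketch scores each block by its mean absolute gray-level difference from the reference frame, a plainly simpler statistic; the function name and threshold value are ours.

```python
import numpy as np

def motion_blocks(ref, cur, n=10, threshold=8.0):
    """Simplified block-based change detector (stand-in for the CSM test).

    ref and cur are 2-D grayscale frames whose dimensions are multiples
    of n. A block is flagged as motion when the mean absolute gray-level
    difference against the reference exceeds the threshold. For a 320x240
    frame with n = 10 this yields a 24 x 32 map of 768 blocks, matching
    the block count quoted above.
    """
    h, w = ref.shape
    diff = np.abs(cur.astype(np.float64) - ref.astype(np.float64))
    # Average the difference inside each n-by-n block.
    blocks = diff.reshape(h // n, n, w // n, n).mean(axis=(1, 3))
    return blocks > threshold   # boolean (h//n) x (w//n) motion map
```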
2: Change Detection with CSM to Cope with Noise

The method proposed in [2] to deal with noise in the video is quite simple. Consider a situation where nothing in the actual video content changes, but pixel values change as a result of noise. Under a noise-corrupted situation, the gray level at a certain position will be the gray level of that same position in the previous frame plus noise. We assume that this noise is additive white Gaussian noise (AWGN) and can be reasonably characterized by its mean and variance. We assume the mean of the noise in our videos is zero with some variance; the variance is what will be estimated in order to cope with the noise. Furthermore, the noise is assumed to be independent between pixels.

The goal of these calculations is to adjust the threshold from Part 1 intelligently so that it copes with the noise properly. In the hypothetical noise-free case, the circular shift moments of a given square block in two consecutive frames must be identical provided that there is no content change. In practice, the effects of AWGN make these two CSM moments differ even though the scenes are the same.

The process is simple. For a given reference frame and a kth frame, determine the one 10x10 square region which exhibits the least change in its circular shift moments, where "change" is defined as the sum of the absolute differences of both the x- and y-direction CSM moments. In other words, pick the one square in the entire frame which changes the least. Accumulate the grayscale levels in this area in both the reference frame and the current frame and form a ratio, which we call the variation factor. This value is then used to estimate the noise variance. The equations and detailed sequential steps are outlined thoroughly in [2]. A basic summary is as follows. Find the N x N square in a frame which changes the least between two given frames. Assume that this change is due entirely to noise. Calculate the variance of the presumed AWGN and account for it in the predetermined threshold. Any change contributes to the noise variance estimate, which makes the threshold more stringent. Thus, the final steps are exactly the same as those outlined in Part 1, except that the threshold is now stricter (higher) in order to filter out noise.
Results

The motion algorithm is implemented entirely in Matlab. The results of the algorithm are good, with room for improvement. Motion is detected in square blocks in the areas where it should be; we verified this by viewing the videos, observing what actually moves, and comparing that to the blocks that are detected. All character movement, body movement, and mobile objects are detected well. Results were less accurate during periods of intense camera movement, when large portions of the background change even though this is not "important" motion. Nonetheless, in the presence of motion, more weight is added to that particular area and is accounted for during the seam carving process. A detailed discussion of how motion contributed to the seam carving results appears in Section 8. An example of motion detection on sequential frames is shown in Figure 3.6. In the figure, the two sequential input frames have just a few subtle visible differences, but the motion detection algorithm easily detects the changes in the position of the subject's lightsaber, arm, and head.
Figure 3.6: Two sequential input frames and the resulting motion detection output (with primitive edge detection shown as a guide)
Seam Carving

Seam carving uses the prominence scores to generate an energy map, which shows the "total cost" of a seam at the bottom boundary (for a top-down energy approach). We cannot simply use the prominence scores to determine which seams to add, because each prominence score is independent of the others; we cannot make an educated decision about where to start seams just from examining one row. This is the motivation behind creating the energy map.
Energy Map

The energy map is calculated with a top-down approach, meaning that the total cost of adding a seam is found in the last (bottom) row of the frame. The energy map is created directly from the prominence scores. The very top row of the energy map is equivalent to the top row of the prominence map. From the second row to the last row, each pixel's value is the sum of its prominence score and the maximum value of the three adjacent pixels above it. This algorithm produces values in the last row which carry information from every other row; it is naturally implemented with dynamic programming. Figure 3.7 shows an example energy map.

Figure 3.7: A prominence matrix with motion, gradient, and face visible (left) and its corresponding energy map (right)
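The recurrence above can be sketched as a short dynamic program in Python/NumPy. This is an illustrative sketch (function name ours); border columns, which have only two neighbors above, reuse the directly-above value, an assumption since the report does not specify border handling. The maximum over the three pixels above follows the report's description.

```python
import numpy as np

def energy_map(prominence):
    """Top-down cumulative energy map.

    Row 0 copies the prominence scores; every later pixel adds its own
    prominence score to the maximum of the three adjacent values in the
    row above, so the bottom row holds the total cost of a seam ending
    at each column.
    """
    p = prominence.astype(np.float64)
    h, w = p.shape
    m = p.copy()
    for r in range(1, h):
        above = m[r - 1]
        left = np.roll(above, 1)            # value up-and-left
        left[0] = above[0]                  # border: no left neighbor
        right = np.roll(above, -1)          # value up-and-right
        right[-1] = above[-1]               # border: no right neighbor
        m[r] += np.maximum(above, np.maximum(left, right))
    return m
```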
Seams

As mentioned earlier, we use the energy map to determine where to place seams. We simply choose the 100 lowest energy values in the bottom row of the energy map and make these the starting positions for the seams. This differs from seam removal: when seams are being removed, they are removed one at a time, and after each seam is removed, the energy map is recalculated. That is an extremely slow process; by the same token, video expansion is much more suitable for implementation on a DSK. The reason we choose the 100 lowest energy values at once, rather than calculating one seam at a time, is to avoid artifacting. Consider calculating seams one at a time. The first seam would be calculated and then duplicated, causing the image width to increase by 1 pixel. We would then try to find the next seam. Because the addition of the first seam simply copies the pixels to its left, the energy matrix barely changes; hence, when trying to find the second seam, the algorithm will select approximately the first seam again. This process would continue until all 100 seams are chosen.
In the end, the viewer would notice severe artifacting from duplicating the same pixels 100 times. The artifacting problem is shown in Figure 3.8.

Figure 3.8: Seam carving artifacting for expansion [3]

Thus, by choosing all 100 seams at the same time, without altering the energy map, we are able to avoid this artifacting problem (as shown in Figure 3.9). After choosing the 100 starting positions, we simply backtrack through the energy map to determine each entire seam. This results in seams that are both connective and monotonic [3], meaning that a seam can only have one pixel per row and each pixel of the seam must be adjacent to the seam's pixels in the rows above and below it. This makes sense, as we want to uniformly change the width of all rows.

Figure 3.9: Proper seam carving for expansion [3]
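The select-then-backtrack step can be sketched as follows in Python/NumPy (an illustrative sketch, names ours). Each backtracking step moves to whichever of the up-to-three pixels above carries the largest cumulative value, mirroring the maximum used when the map was built; because the step is at most one column per row, the result is connective and monotonic.

```python
import numpy as np

def route_seams(m, num_seams=100):
    """Route seams from a cumulative energy map m (rows x cols).

    All starting columns are the lowest bottom-row energies, chosen at
    once before any pixels are duplicated, which avoids re-selecting
    the same seam repeatedly. Each seam is returned as one column index
    per row, top to bottom.
    """
    h, w = m.shape
    starts = np.argsort(m[-1], kind='stable')[:num_seams]
    seams = []
    for col in starts:
        c = int(col)
        seam = [c]
        for r in range(h - 1, 0, -1):
            lo, hi = max(c - 1, 0), min(c + 1, w - 1)
            # Step to the neighbor above that produced the cumulative value.
            c = lo + int(np.argmax(m[r - 1, lo:hi + 1]))
            seam.append(c)
        seams.append(seam[::-1])
    return seams
```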
The method described so far creates seams solely dependent on the energy map. In order to eliminate the jerky artifact and lessen the wavy artifact, we add new constraints between frames. We first process the first frame of a video as described above. For each frame thereafter, we add a constraint based on the previous frame. For example, for the second frame, we start with the energy map for the second frame and the seams used in the first frame. We then add a constraint for calculating the seams in the second frame: the seams in the second frame must be within 3 pixels to the right or 3 pixels to the left of the seams from the first frame. The value of 3 pixels was found through experimentation and gave the best qualitative results. After restraining the seams in the second frame to this 6-pixel window, we allow the seams to change depending on the energy of the second frame. By restricting the seams of the current frame based on the previous frame, we create a temporal dependency which completely eliminates the jerky artifact.
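The temporal constraint itself is just a per-row clamp of each seam column to a window around the previous frame's seam. The sketch below (names ours) shows only the constraint check; in the full algorithm the constraint is applied while routing the seams against the current frame's energy.

```python
def clamp_seam_to_window(prev_seam, new_seam, window=3):
    """Enforce the temporal constraint between two frames.

    Each row's seam column in the current frame is kept within `window`
    pixels (3 by default, found by experimentation) of the previous
    frame's seam column in that row.
    """
    return [min(max(c, p - window), p + window)
            for c, p in zip(new_seam, prev_seam)]
```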
Problems

By limiting how much seams can change between two frames, we encounter problems when important regions of the video move quickly between frames. For instance, if a person is on the right side of frame 1 and moves progressively to the left side by frame 10, we often see artifacting. This occurs because the seams cannot move as fast as the high-energy region is moving. When this happens, the high-energy content becomes distorted because it runs into the seams. This could possibly be avoided by not fixing the range of the seam's movability window; in future work, we could calculate the range in a content-aware manner based on the total energy of the seam. Future work and current problems are discussed in depth in Section 8.
One method we use to ameliorate this problem is keyframing. A keyframe is a frame that has no additional constraints beyond the energy map for seam calculations. If we keyframe at content-aware points throughout the video, the seams get a chance to fully "reset": essentially, the seams are allowed to move to their proper locations, disregarding temporal dependency. This fixes the issue of a person running into seams, as we can simply keyframe at that instant. However, determining when to keyframe is a task in itself. We naively keyframe when there is a large energy change between frames: when the energy ratio between the current frame and the previous frame reaches either 200% or 50%, we make the current frame a keyframe. The values 200% and 50% were found by examining the energy ratios of a high-motion 50-frame video. By keyframing only at these large changes, we avoid keyframing successive frames, which would break our temporal dependency.

To further build on energy ratios, we also change the seam's movability window when the energy ratio is large (or small) enough. If the energy ratio is between 126% and 200% or between 50% and 74%, we widen the window from 6 pixels to 50 pixels. The idea is that we would like to keyframe at such a change in energy, but changes of this size occur too many times throughout the video to allow keyframing at each one. Hence, we instead allow more movement for the seams so that they are able to "jump" more. This enhances the wavy artifact, but it allows seams to keep up with moving high-importance, high-energy areas. Again, these percentage ratios were found from the same high-motion 50-frame video.
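The two energy-ratio rules can be combined into one per-frame decision. The sketch below is our own formulation of the report's thresholds; in particular, how the exact boundary values (50%, 74%, 126%, 200%) are assigned to each band is an assumption, with the keyframe rule taking precedence at the edges.

```python
def seam_window_policy(prev_energy, cur_energy):
    """Decide keyframing and the seam movability window per frame.

    Using the report's thresholds:
      ratio >= 200% or <= 50%        -> keyframe (seams reset, no window)
      126%-200% or 50%-74%           -> widen the window from 6 to 50 px
      otherwise                      -> keep the normal 6-pixel window
    Returns ('keyframe', None) or ('constrained', window_in_pixels).
    """
    ratio = cur_energy / prev_energy
    if ratio >= 2.0 or ratio <= 0.5:
        return ('keyframe', None)
    if 1.26 <= ratio < 2.0 or 0.5 < ratio <= 0.74:
        return ('constrained', 50)
    return ('constrained', 6)
```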
4. Brief System Overview

The following section provides a brief overview of the CAVES system. For more details on the processing and data transfer between the PC and the DSK, please refer to Section 7: Processing, Speeds and Data Rates.

From start to finish, we process an input video on the local machine and output its 420x240 24 bpp representation. The whole process can be broken down sequentially into steps with different PC and DSK responsibilities. Figure 4.1 shows a basic representation of these steps. The details of the data flow are described further in this section.

Figure 4.1: Simplified data flow with primary algorithm steps shown

In our system layout, different programs are used in different parts of the data flow (Figure 4.2). Matlab is used for processing on the PC, but a separate network server, written in C, also runs on the PC and handles all communication with the DSK. Both Matlab and the network server read from and write to a common set of image and video files on the PC, but the timing is asynchronous. In the data flow figures in this section, the network server component is excluded for clarity, but it is still used for data transfer.
Figure 4.2: Core components used in data transfer and processing
1) PC: The computer takes the input video and performs preliminary processing entirely in Matlab. This includes breaking the video down into separate frames, rescaling, recoloring, and preparing the data for transfer to the DSK. In addition, Matlab preprocesses every frame of the video by running it through the face detection and motion detection algorithms. The results are accumulated into a "partial PT" (prominence test) matrix, which is stored on the PC and sent to the DSK later. (Figure 4.3)

Figure 4.3: Matlab preprocessing

3) DSK: The DSK takes each frame along with its partial PT matrix and computes the gradient. After adding the gradient matrix to the partial PT matrix, the DSK holds the final prominence matrix for that particular frame. The DSK computes the energy matrix from this prominence matrix and feeds the energy matrix into the seam calculations. All the seams are then routed for this frame. Remember that CAVES calculates seams based on the previous frame's seams along with the current frame's energy. The DSK expands the frame with these calculated seams and sends the expanded frame back to the PC. For debugging and viewing purposes, we also send back the expanded frame with red seams, the full PT matrix, and the resulting energy map. (Figure 4.4)
Figure 4.4: DSK processing
To deliver the necessary data to the DSK, a separate PC network server program is used, as described earlier. This program is very simple and performs only network, file-handling, and format-conversion operations. It is not shown in the simplified data flow in Figure 4.4 but is present in the system. The Matlab preprocessing runs relatively fast (about 5 frames per second), and each frame is saved to a file on the PC's local drive. The DSK processing is started simultaneously and runs asynchronously from the Matlab processing. The PC network server reads each file recently created by Matlab, converts it to RGB, and sends it to the DSK. When the DSK finishes processing each frame, the PC network server receives the RGB output, converts it to BMP, and saves it on the local PC drive. The expanded frame, the expanded frame with red seams, the final PT matrix, and the energy matrix are each saved as independent BMP files for each frame in a designated folder structure on the PC. Figure 4.5 shows the generalized layout of data delivery to and from the DSK.

Figure 4.5: Detail of PC-DSK data flow for each data segment

4) PC: The computer receives everything from Step 3, reassembles it into a video, and saves it to the local machine. All the information is viewable in the GUI. (Figure 4.6)
Figure 4.6: Matlab post-processing and GUI
This is a quick summary of the basic system layout of our project. Again, please note that an in-depth analysis of this data flow is included in Section 7: Processing, Speeds and Data Rates.

5. Graphical User Interface

Figure 5.1: GUI overview
For the CAVES project, a graphical user interface was created to allow users to easily run the video resizing algorithm and view the results (Figure 5.1). By displaying a variety of intermediate steps and allowing various setting changes, this GUI was extremely helpful in our debugging process and allows us to analyze the various steps of the algorithm. It is also very convenient for demonstrating the behavior of the algorithm in a practical way. The user interface is divided into two main sections: retargeting controls and frame viewing.

Retargeting Controls and Settings:

The retargeting controls and settings are located in the top left of the GUI. (Figure 5.2)

Figure 5.2: Retargeting controls and settings

Control Buttons:

The first step in using the interface is to use the Open Video button to open a video in Matlab. These videos are local to the machine and are read into variables in the Matlab workspace. The second step is to use Extract to determine which sections of the video are to be processed and to perform all the preprocessing required so that the sequence can be sent to the DSK. The exact processing details are discussed in Section 7: Processing, Speeds, and Data Rates. Finally, the Retarget button establishes the connection between the PC and the DSK hardware unit in order to process the video.
Active Algorithms:

The PT matrix that we calculate per frame depends on the gradient of the frame along with motion and face detection. By default, all three of these attributes are enabled and contribute to the prominence scoring, as they are all important. We have added a feature that allows the user to choose specifically which of these contributors are used in the PT matrix calculations. This enables us to analyze how well the individual parts are working.
Frame Range:

The user can specify the range of frames in the video that he or she would like to retarget. This is useful for debugging, as we sometimes analyze particular sequences in videos. It is also helpful for viewing, as users may only want to process certain sections of the video.
I/O Height, Width, and Aspect Ratio:

These fields are not writable by the user; they merely display the height, width, and aspect ratio of the input and output video. We have hard-coded the input video to be 320x240 pixels and the output video to be 420x240 pixels. This is shown on the GUI to make the user aware of the changes we are making to the video sequence. If the input video does not have an aspect ratio of 4:3, its converted size will be smaller than 320x240. The correct converted input dimensions and the corresponding output dimensions with 100 seams are displayed, preserving the original aspect ratio of the video during resizing. However, we did encounter some problems with retargeting behavior when using input videos that were not originally 4:3. The problems were caused by various discrepancies between the hard-coded 320x240 dimensions and the actual dimensions of the resized input. Some debugging would be necessary to fix these problems, but we were not particularly concerned with these cases because most of the videos we used for testing and demonstration were 4:3. In addition, we did not have time to make the input or output dimension settings user-customizable. Future work in this area should allow the user to specify arbitrary input and output aspect ratios.
Frame View:
Input Video:
We display the video sequence of our input video. It is important to note that all video sequences are rescaled and recolored to 320x240 24bpp.
Output Video:
This is the final product that CAVES outputs. Our final video is 420x240 24bpp and is the retargeted version of the input video to its left.
Prominence:
For the complete sequence of input frames, we display the prominence matrix of each frame. Remember that the prominence matrix consists of all the checked attributes in the Active Algorithms box. This includes the gradient, face, and motion by default but may be changed accordingly. The resulting prominence can then be viewed.
Energy:
For the complete sequence of input frames, we display the energy map, which is calculated with respect to the prominence. This is useful for seeing how energy changes in different sections of a frame and between different frames. Changing properties in the Active Algorithms box also enables us to analyze how our various algorithms contribute to the energy matrix. The energy matrix is thus used to route seams.
Seams:
For the complete sequence of input frames, we display the retargeted video at its new aspect ratio with the seams drawn in red. This allows the user to see exactly how each of the seams was routed with respect to the energy matrix.
Scroll Bar/Animate:
In the middle of the GUI, there is a scroll bar and an animate button. The scroll bar scrolls between the frames of the video sequence, while the animate button runs through the frames and plays them as a movie. This is essential for seeing exactly how our algorithm is performing on a video sequence.
Note: In Matlab, behind the scenes, we are not playing a movie; the GUI is stepping through all the frames sequentially. The CAVES program outputs the rescaled input video along with the final output video sequence in a video-formatted file on the local machine. The prominence, energy, and seams are output as a sequence of images and are not reconstructed into a movie file. These are all written neatly into a directory tree.
7. Processing, Speeds, and Data Rates
Video and Image Formats
We chose to use an input video size of 320x240 pixels and an output video size of 420x240 pixels for the system. The input size was chosen because it is exactly a 4:3 aspect ratio and because it is a standardized display resolution, known as Quarter VGA. This is a common size for mobile video displays and is a popular resolution for videos found on the web, including those on the YouTube website. The output size was chosen because it is very close to 16:9 and allows exactly 100 columns to be added to the video frame. A true 16:9 frame would require approximately 426.67 columns, but this is not an integer, and there is no standardized widescreen frame size defined at a height of 240 pixels. For the most part, the code of the system is generalized to allow the frame size to be changed relatively easily if necessary. Using the system with larger standardized resolutions (e.g., 640x480) would require significantly longer processing and network transfer times. We decided that our default size of 320x240 was large enough to visibly demonstrate the results of the seam carving algorithm in a practical way while minimizing overall processing and transfer times.
The PC decoding of the input video is performed entirely in Matlab. Any video format and codec supported by Matlab can be processed. When the video is decoded, its frames are resized to 320x240, which is a 4:3 aspect ratio. If the input video is not 4:3, its existing aspect ratio is preserved and blank data is added around the content so that it fits completely inside a 320x240 frame without aspect ratio distortion. Matlab then saves these frames individually as BMP image files with 24bpp color locally on the PC. We decided to use BMP files because they do not require decompression and we are not concerned with storage space on the PC. A BMP file with 24bpp contains 8 bits each of red, green, and blue color information for each pixel. It also contains header data of variable length and data padding between rows of the image for byte alignment purposes. In order to minimize memory access on the DSK, to eliminate unnecessary format parsing on the DSK, and to reduce network transfer time, we transform the BMP files to a simpler RGB format on the PC before each frame is sent. This RGB format has no header and no internal padding. The DSK performs all processing on 24bpp RGB data (in the case of the input frame) or 8bpp grayscale data (in the case of the PT). When the DSK processing is complete, the RGB output is transferred to the PC. The PC saves this output for each frame as a 420x240 24bpp BMP file by creating the appropriate header information and adding data padding between rows of the image as necessary.
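The BMP-to-RGB transformation described above amounts to stripping the header and per-row padding. Our actual conversion is in the PC-side C/Matlab code; the sketch below is a minimal Python model, assuming an uncompressed, bottom-up 24bpp BMP like the ones our PC code writes.

```python
import struct

def bmp_to_raw_rgb(bmp_bytes):
    """Strip the BMP header and per-row padding, leaving bare 24bpp RGB data.

    Assumes an uncompressed, bottom-up 24bpp BMP. Row order is preserved as
    stored (bottom row first), matching the headerless RGB layout described
    above.
    """
    # The pixel-data offset lives at byte 10 of the file header (little-endian),
    # and width/height at bytes 18 and 22 of the info header.
    data_offset = struct.unpack_from("<I", bmp_bytes, 10)[0]
    width, height = struct.unpack_from("<ii", bmp_bytes, 18)
    row_bytes = width * 3                 # 24bpp = 3 bytes per pixel
    padded = (row_bytes + 3) & ~3         # BMP rows are padded to 4-byte multiples
    raw = bytearray()
    for r in range(abs(height)):
        start = data_offset + r * padded
        raw += bmp_bytes[start:start + row_bytes]
    return bytes(raw)
```

The inverse direction (RGB back to BMP, for the 420x240 outputs) just re-creates the header and re-inserts the padding.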
We use Matlab's built-in video processing tools to create AVI files from the bitmap images. The GUI allows the user to select a specific range of frames to retarget from the input video. Matlab creates a duplicate video of only the selected range of frames from the input so that it can be easily compared to the output. When the DSK has finished processing all frames, Matlab converts the output bitmaps into an output video. We chose to use uncompressed video output from Matlab to eliminate distortion caused by compression. We are not concerned with storage space on the PC, so this was not a problem. If the output videos need to be saved permanently or moved, or if storage space is an issue, the compression used by Matlab can easily be changed by adjusting a single parameter in Matlab's movie2avi function call.
Color Depth
Our initial implementation of seam carving used an 8bpp input frame. This color depth was chosen to minimize DSK processing time due to fewer memory accesses. To change the color depth of an input video, we added a simple pixel-by-pixel conversion to our PC code which saved 3 bits of red data, 3 bits of green data, and 2 bits of blue data, in accordance with the standard 8-bit truecolor scheme. In our initial testing, we found that videos originally encoded with higher color depth were converted correctly but experienced a noticeable reduction in quality. This reduction is inevitable because of the smaller number of colors that can be represented with 8bpp compared to other common color depths such as 16bpp or 24bpp. We eventually decided to use 24bpp rather than 8bpp in the DSK processing to eliminate any significant reduction in quality. When retargeting videos that are already significantly compressed, the use of 24bpp is not visibly different from 8bpp. In the future, it would be possible to add support for both formats and allow the user to select high-quality or low-quality conversion as one of the settings. This would require additional code support on the DSK, primarily in the gradient calculation and the applySeams() function, to handle both formats. The use of 24bpp not only requires additional memory accesses on the DSK but also requires additional data to be sent over the network. Ultimately, we decided that the higher quality of the video outweighed the cost of longer processing time and network transfer time.
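The 3-3-2 quantization described above keeps the top bits of each channel and packs them into one byte. A minimal sketch of that per-pixel conversion (our actual version was in the PC code; this Python model is illustrative):

```python
def rgb24_to_rgb332(r, g, b):
    """Quantize one 24bpp pixel to 8-bit 3-3-2 truecolor.

    Keeps the top 3 bits of red and green and the top 2 bits of blue,
    packed as RRRGGGBB (the standard 8-bit truecolor layout)."""
    return (r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6)

def rgb332_to_rgb24(p):
    """Approximate inverse, expanding 3-3-2 back to 24bpp for display.
    The discarded low bits are the source of the visible quality loss."""
    return ((p & 0xE0), (p & 0x1C) << 3, (p & 0x03) << 6)
```

Round-tripping a pixel through these two functions makes the quality loss concrete: the low 5-6 bits of each channel are simply gone.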
DSK Memory and Paging
Our retargeting system operates sequentially on one frame at a time. The only information that needs to be saved on the DSK from one frame to the next is the location of the seams. The functions that change the seam locations (getSeams() and getInitialSeams()) operate in place and simply overwrite the seams of the past frame with the seams of the current frame. Note that seams are formed and updated from left to right and may not overlap. The only temporal constraint on a seam is based on its own location in the past frame, not the location of any other seams. The seam to its left in the current frame provides a spatial constraint. Thus, no extra seam data has to be saved.
Besides the seam data, all other data is overwritten by each subsequent set of frame data. We defined a simple struct called pixel containing three char fields for red, green, and blue. This was used to make the coding more intuitive for 24bpp frames. The following table details the storage requirements for the various sets of data on the DSK.
Data          Type             Dimensions  Size in bytes
Input frame   pixel (3 bytes)  320x240     230,400
PT            char (1 byte)    320x240     76,800
Gradient      char (1 byte)    320x240     76,800
Energy        float (4 bytes)  320x240     307,200
Output frame  pixel (3 bytes)  420x240     302,400
Seams         short (2 bytes)  100x240     48,000
(Total)                                    1,041,600
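The sizes above follow directly from the dimensions and element types; a quick arithmetic check:

```python
def size_bytes(width, height, bytes_per_element):
    """Storage for one w x h array of fixed-size elements."""
    return width * height * bytes_per_element

fields = {
    "input frame":  size_bytes(320, 240, 3),  # pixel struct: 3 chars
    "PT":           size_bytes(320, 240, 1),  # char
    "gradient":     size_bytes(320, 240, 1),  # char
    "energy":       size_bytes(320, 240, 4),  # float
    "output frame": size_bytes(420, 240, 3),  # pixel struct
    "seams":        size_bytes(100, 240, 2),  # short, 100 seams x 240 rows
}
total = sum(fields.values())  # 1,041,600 bytes, just over 1 MB
```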
Approximately 1 MB of data is stored on the DSK. All of these data fields are stored in external memory. Processing is performed directly on the data in external memory and no paging is used. The L2 cache is set to 32 KB. In an early implementation of our algorithm, we fully implemented paging for all processing. At the time, we were using 8bpp color and were only performing seam carving functions on the DSK, not prominence or energy functions. DMA was used for all page transfers. The maximum size of the memory workspace that could fit on-chip was approximately 120 KB with the L2 cache turned off. In this implementation, we did not notice a perceptible improvement in overall system speed compared to the same implementation without paging. Later, we upgraded the frame color depth to 24bpp and simultaneously added gradient and energy calculations to the DSK. These changes required adjustments to the paging mechanism. Due to the added code complexity of paging, the need to handle extra boundary cases in all of the processing functions, and the minimal difference in observed processing time, we chose to eliminate paging in our final implementation. After more detailed testing, we found that network transfer time and various inefficiencies in the PC code (which were later fixed) were contributing to a slow overall system speed. It is reasonable to assume that paging would help memory access time in our final algorithm implementation, but we could not re-implement paging due to time constraints.
One simple oversight of our memory management was that very little on-chip data is used except for automatic variables in functions and a few small permanent arrays. Given that the remaining on-chip memory available is close to 100 KB, one of the data fields could easily be moved from external memory to internal memory simply by changing one line of code. The seam data is a good candidate for this because of its small size (48 KB) and its consistent use in multiple functions. We did not realize this until after performing all of our timing measurements, so the timing results in this report are based on fully external memory allocation. Anyone who uses our code in the future can easily make this change to on-chip memory if paging is not desired.
Network
Foreachframe,thedatasentfromthePCtotheDSKiscomprisedofthescaledinputframeandthepartialPT
matrix, which is comprised of weighted face detection and motion detection scores. Both sets of data are
320x240pixels.Theinputframeis24bppandthePTis8bppgrayscale.Theresultingsizeoftheinputframeis
320 x240x3=230,400bytesand the sizeof thePT is320 x240x1=76,800. The total amountofdata to
transferfromthePCtotheDSKperframeis307,200bytes.ProfilingtheDSKcodefornetworkreceivingcaused
thedatatransfertofail,sowecouldnotdeterminethistimeexactly.Basedonthe2.5MB/sinboundspeedlimit
oftheDSK,weestimatethatthistransfertakes123ms.Thefollowingtabledetailsthetimerequirements.
Data         Size in bytes            Cycles                Time
Input frame  320 x 240 x 3 = 230,400  20,700,000 estimated  92 ms estimated
Partial PT   320 x 240 x 1 = 76,800   6,975,000 estimated   31 ms estimated
(Total)      307,200                  27,675,000 estimated  123 ms estimated
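The estimates in the table follow from the 2.5 MB/s inbound limit; the cycle columns additionally assume the DSK clock rate (taken here as 225 MHz, which is consistent with the cycle and time figures throughout this section). A quick check:

```python
INBOUND_BPS = 2.5e6   # 2.5 MB/s inbound limit of the DSK
CLOCK_HZ = 225e6      # DSK clock rate (assumed; consistent with our tables)

def transfer_ms(nbytes, bytes_per_sec=INBOUND_BPS):
    """Transfer time in milliseconds at a given link speed."""
    return nbytes / bytes_per_sec * 1000.0

frame_bytes = 320 * 240 * 3          # 230,400 bytes
pt_bytes = 320 * 240 * 1             # 76,800 bytes
total_ms = transfer_ms(frame_bytes + pt_bytes)       # ~123 ms per frame
total_cycles = (total_ms / 1000.0) * CLOCK_HZ        # ~27.6M cycles
```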
The data sent from the DSK to the PC comprises four output images. The primary output is the expanded video frame itself, which is 420x240 and 24bpp. The other outputs are the final PT matrix, the energy matrix, and the expanded frame with visible seams drawn in red. The PT and the energy are 320x240 because they are based on the input image. The expanded frame with visible seams is 420x240. All output data is sent as 24bpp. This makes the process of converting from RGB to BMP consistent for all output images. In the future, it would be possible to reduce network transfer time simply by sending the final PT matrix and the energy matrix as 8bpp, since the data is grayscale. This would also require additional grayscale-to-BMP conversion code to be written on the PC, with appropriate adjustments to the BMP header data to create an 8bpp BMP file. Alternatively, the PC could convert the grayscale data to 24bpp RGB and subsequently convert it to 24bpp BMP. We made the initial decision to make all output data 24bpp for consistency, not knowing how this added data transfer would impact our overall system speed. Ultimately, network transfer time was observed to be more significant than we expected, but due to time constraints we did not have the opportunity to create the additional code necessary to handle 8bpp output conversion. When sending data from the DSK to the PC, we were able to use DSK code profiling to perform precise measurements. These measurements essentially confirmed the 10 MB/s outbound speed limit of the DSK. The table below shows the DSK-to-PC data sizes and time requirements.
Data                  Size in bytes            Cycles               Time
Output frame          420 x 240 x 3 = 302,400  6,350,700 measured   28.2 ms measured
Output frame + seams  420 x 240 x 3 = 302,400  6,350,700 measured   28.2 ms measured
Final PT              320 x 240 x 3 = 230,400  4,833,900 measured   21.5 ms measured
Energy                320 x 240 x 3 = 230,400  4,833,900 measured   21.5 ms measured
(Total)               1,065,600                22,369,200 measured  99.4 ms measured
The total data size is just over 1 MB, and the total time is about 100 ms. This matches the predicted 10 MB/s. If the final PT and energy were sent as 8bpp instead, they would each require 76,800 bytes of data transfer, or about 7.7 ms each. This would reduce the total outbound transfer time to an estimated 72 ms per frame, or about 72% of the present transfer time. Note that in a practical application of our system, such as a consumer-end video retargeting device, the only necessary output would be the actual output frame, which has only a 28 ms transfer time. If the behavior of the algorithm was determined to be adequate, it would be relatively easy to add an option to the system that would allow it to send only the output frame to the PC. This would increase the overall speed of the system by eliminating information about how the algorithm is working. We did not add this option because we always wanted to have the additional information about the behavior of the algorithm available to us.
Matlab Processing
The full PT is made up of face detection, motion detection, and the image gradient. We implemented the face detection and motion detection algorithms in Matlab as part of the pre-processing of each frame. Since these two detection algorithms contribute a part of the PT score for each pixel, we call the output a "partial PT." When the retargeting process is started, the array of input frames selected by the user is already saved locally in Matlab's memory. The Matlab code is then responsible for performing the face detection and motion detection for each frame and saving the resulting data in a file on the PC hard drive. In addition to the existing BMP file, each frame now has a corresponding partial PT file. The data is an 8bpp grayscale representation of the partial PT. These files will later be read by the network server program and sent to the DSK as needed.
In our measurements, we found that Matlab is able to perform preprocessing at a rate of about 5 frames per second. This speed includes the time required for face detection and motion detection and for saving the file to the hard drive. We did not perform any significant optimizations by hand to the Matlab algorithms since their performance was already adequate on the PC.
After the DSK has finished processing all frames, Matlab reads all BMP files of expanded frames and assembles them into a Matlab-native video structure. The final expanded video is saved to the local drive with the built-in movie2avi() function. Because Matlab can read and assemble the expanded frames relatively quickly, we designed the system to wait until all expanded frames were available before starting to read them. This could easily be changed such that, after it completes preprocessing all frames, Matlab starts to asynchronously read the expanded frames while the DSK is still processing. In this scenario, after the final frame is expanded by the DSK, the time required to finish assembling the video in Matlab would be significantly reduced. We believed this issue was relatively unimportant because, for most videos, the total video assembly time is small. For the cases in which this time is significant, it is still relatively small compared to the overall system processing time.
DSK Processing
The processing of each frame on the DSK is performed in distinct stages. The first stage is to complete the PT matrix. The face detection and motion detection results, which are calculated in Matlab and sent to the DSK, are already saved in memory. The only remaining component of the PT is the gradient, which is calculated on the DSK and added to the existing PT. Once the full PT is formed, the energy is calculated top-down on the frame. Seam carving is based solely on the energy matrix, not the input frame itself. Each seam is drawn in the frame from the bottom up. This is logical because bitmap images are customarily stored on disk and in memory row by row, starting with the bottom row of the image. This row ordering is preserved in the RGB representation of each frame on the DSK. The energy calculation is performed in the opposite direction of the seam carving algorithm so that the seam carving algorithm can "see ahead" of its current position. This ultimately allows us to use a greedy algorithm when carving seams because all the necessary information for directing the seam at each pixel is contained nearby in the energy matrix. The seams are then applied to the input frame to create the expanded frame.
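The two opposing passes can be sketched as follows. This is a simplified Python model of our C code (the frame is stored top-down here for readability, and `pt` stands in for the completed PT matrix); it shows why the greedy bottom-up walk works once the top-down pass has summarized everything above each pixel.

```python
def get_energy(pt):
    """Top-down pass: cumulative energy of the cheapest 8-connected path
    from the top row to each pixel, as in standard seam carving."""
    h, w = len(pt), len(pt[0])
    energy = [row[:] for row in pt]
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(0, j - 1), min(w, j + 2)
            energy[i][j] += min(energy[i - 1][lo:hi])
    return energy

def get_seam(energy):
    """Bottom-up pass: start at the cheapest pixel of the bottom row and
    greedily follow the minimum cumulative energy upward. Because the
    top-down pass "looked ahead" for us, this greedy walk recovers a
    minimal seam without any search."""
    h, w = len(energy), len(energy[0])
    j = min(range(w), key=lambda c: energy[h - 1][c])
    seam = [j]
    for i in range(h - 2, -1, -1):
        lo, hi = max(0, j - 1), min(w, j + 2)
        j = min(range(lo, hi), key=lambda c: energy[i][c])
        seam.append(j)
    seam.reverse()      # seam[i] = column of the seam in row i
    return seam
```

On the DSK the same logic runs on the bottom-up row ordering described above, and multiple non-overlapping seams are routed left to right.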
In general, each major function in the DSK algorithms consists of a constant number of memory accesses for each pixel. Given the fact that the functions tend to take the same amount of time for different input frames and videos, it is reasonable to assume that a significant portion of each stage is limited by memory accesses, although the L2 cache increases the speed. The table below details the cycle counts that we measured for each stage of our algorithm. These cycle counts did vary by several thousand from frame to frame but were always very close.
Function    Cycles (approximate)  Time
gradient    1,850,000             8.2 ms
getEnergy   3,540,000             15.7 ms
getSeams    4,060,000             18.1 ms
applySeams  2,820,000             12.5 ms
(Total)     12,280,000            54.6 ms
In some cases, the function getInitialSeams() is called in place of getSeams(). The only difference between these functions is the presence of temporal dependency in getSeams(). When getInitialSeams() is called, the algorithm is executed the same way but the previous seam locations are not used. The cycle count we observed for getInitialSeams() was 3,950,000, or about 110,000 cycles less than getSeams(). This decrease is partially due to the smaller number of memory accesses because the previous seam data does not have to be read. Since the actual time difference is less than 0.5 ms, we consider the difference to be negligible and simply count the values listed in the above table as average-case times.
Performance Optimization
To improve DSK performance, we changed some of the compiler settings in addition to manually modifying our code to make it more optimal. Once our testing demonstrated that the algorithm behavior was correct, we turned off the memory alias safety compiler option. We expected that this would significantly help performance because of the heavy use of heap pointers in function calls and calculations. In reality, this adjustment only significantly increased the speed of the gradient calculation. Figure 7.1 shows the cycle comparison between the stages of the algorithm with alias safety on (the default setting) and off (which we used in our final implementation).
Figure 7.1: Cycle comparison with and without alias safety
Computation time is spread relatively evenly between the primary algorithm functions, so there is no obvious bottleneck in the algorithm. The slowest functions in cycle time are getSeams and getInitialSeams, but we expect this because of the complexity of the seam routing process and the need for many memory accesses (sometimes repeating) in the seam data and energy data. From our initial estimates, we can see that the compiler is able to optimize our functions significantly. For example, the number of external memory accesses (all occurring as 1-byte reads) in the gradient() function alone is over 1.2 million. If each byte was actually read from external memory at each corresponding lookup in the code, this would require nearly 7 million cycles for memory access alone, given an access time of 5.6 cycles. Alternatively, if each byte was read from internal memory with an access time of 1.5 cycles, memory accesses would require 1.8 million cycles. Since the entire optimized gradient() function requires just over 1.8 million cycles, we can assume that a combination of the L1 and L2 caches minimizes the external memory access time for this function. In getEnergy(), we counted about 2.4 million bytes of external memory accesses, suggesting 3.6 million cycles required if internal memory is used (1.5 cycles per byte), and the cycle time of the function is about 3.5 million. It is obvious from these results that both the L1 and L2 caches are used heavily in our processing. Similar cache optimizations occur for the other functions.
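The bounds quoted above come from simple per-byte arithmetic using our measured access times; a quick check:

```python
EXTERNAL_CYCLES_PER_BYTE = 5.6   # external-memory access time, per 1-byte read
INTERNAL_CYCLES_PER_BYTE = 1.5   # on-chip access time

def access_cycles(n_reads, cycles_per_byte):
    """Total cycles spent on memory access for n 1-byte reads."""
    return n_reads * cycles_per_byte

gradient_reads = 1.2e6
worst = access_cycles(gradient_reads, EXTERNAL_CYCLES_PER_BYTE)  # ~6.7M cycles
best = access_cycles(gradient_reads, INTERNAL_CYCLES_PER_BYTE)   # 1.8M cycles
# The measured gradient() total (~1.85M cycles) sits at the internal-memory
# bound, which is why we attribute the speedup to the L1/L2 caches.
energy_bound = access_cycles(2.4e6, INTERNAL_CYCLES_PER_BYTE)    # 3.6M cycles
```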
After verifying our algorithm behavior, we revised much of our source code by unrolling conditional statements and boundary cases from within important loops when possible. For the getEnergy() function, this change alone reduced the cycle count by over 50% from our initial implementation. We also eliminated unnecessary memory accesses in the getSeams() and getInitialSeams() functions by establishing a validity check for possible seam routes before comparing their energy measurements. Based on the compiler's suggestions, we added the code speculation option (-mh2) and removed the debug option (-g) to maximize speed in the final tests. Optimization level 3 was used for the project and no settings were changed on a per-file basis. We did not need to worry about negative effects from aggressive optimization because no interrupt handling is performed in our code.
As a final gauge of our DSK performance, we examined the assembly code generated by the compiler. From this we observed that nearly all of the primary processing loops in the algorithm were scheduled with 3 or 4 iterations in parallel. For those that were not scheduled in parallel, we were able to change the code further by hand, making minor adjustments to create additional parallelism. The only exception to this was the getEnergy() method, which contains many intermediate variables and calculations, and for which the compiler could not find a schedule with any iterations in parallel regardless of our adjustments. Still, we were satisfied with this result.
System Speed
Video and image conversion is not counted in our timing of the system, since these times are relatively small and since we simply used the built-in Matlab functions as needed. For our purposes, we assumed that the set of input BMP files is already prepared and that the system only needs to output a BMP file for each frame without considering video reassembly.
The retargeting system runs at about 2.7 frames per second. The corresponding time per frame is about 370 ms, which is broken down in detail in Figure 7.2. This includes the time for reading the input file from the local drive and saving the output file to the local drive. These file management times, in addition to the time required for BMP-to-RGB and RGB-to-BMP conversion, are counted in the PC overhead time. As described above, Matlab preprocessing of each frame occurs asynchronously and is not counted here. Network transfer times were more significant than we originally expected and contributed 220 ms, or about 60% of the time per frame.
Figure 7.2: Average time for retargeting a single frame
DSK processing time is divided into two categories in Figure 7.2. Primary processing refers to the algorithm functions described in the DSK Processing section earlier. These are the functions necessary to generate a single expanded output frame. In addition, the DSK performs some secondary operations to deliver the PT, energy, and visible-seam outputs to the PC. One design choice we made was to send all output data to the PC as 24bpp. This allowed us to use the same RGB-to-BMP conversion function for all outputs on the PC, which simplified our coding but resulted in slower network transfer, as described earlier. In addition, it requires extra time on the DSK to convert the PT matrix (of type char) and the energy matrix (of type float) to RGB. It would be possible to speed up the system by eliminating these DSK-side conversions, but we did not have time to make these changes. The other part of the secondary DSK processing is a second instance of the applySeams() function, which draws the seams in red instead of expanding existing pixels in the frame. This is memory intensive and performs mostly identical memory copies as the original call to applySeams(). As a result, calling both functions in series as we do is inefficient, but we wanted to keep the ability to call one or the other independently if necessary. The more efficient alternative would be to draw the seams over the existing output frame after it is sent to the PC, so that the frame does not have to be expanded again. Alternatively, the DSK could simply send the seam data and the PC could draw the seams on top of the expanded frame. Either of these approaches could be implemented in the future, but our approach is the most robust, at the cost of speed.
Once again, all of the secondary DSK processing, a significant part of the PC overhead, and a significant part of the DSK-to-PC transfer could be eliminated in a practical application of the system because the only important result is the expanded frame itself. To estimate the speed increase in this scenario, we calculate that the new DSK-to-PC transfer time would be 28 ms, and we know that the secondary DSK processing would be eliminated completely. The PC overhead would be approximately cut in half, to 35 ms conservatively. The resulting total time per frame would then be about 240 ms, a 35% reduction from the current measured time. This would allow a system throughput of about 4 frames per second. We did not test this scenario because we were more concerned with the behavior of the algorithm than with ideal-case speed upgrades.
8. Problems, Issues, and Future Work
ISSUE:
The output is "wavy" due to moving seams within unimportant regions.
DISCUSSION:
A variety of different videos displayed wavy behavior within seam regions after being processed by CAVES.
Within these unimportant regions, the seams tend to move around quite noticeably and have an adverse effect on the overall output video display experience. The main reason behind this is energy changes due to noise from the video itself. As we rescale input videos to our predetermined height and width, the video quality is not ideal. The existing compression in the videos we used for testing was also significant enough to cause amplified noise in our PT calculations. Subsequently, there is enough noise to cause the energy regions in seam areas to change such that the seams themselves are forced to move based on the nature of the algorithm. The seams change in order to follow the least-energy paths, but these paths appear and disappear with the video noise. As a result, our seams tend to exhibit a wavy effect at times in unimportant regions.
A possible improvement could be to account for this video noise within the energy calculation algorithm itself or to filter out the video noise before processing the frame. This way, the energy matrix calculations could be more accurate and only display legitimate energy changes, not random fluctuations in the input video sequence.
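One possible prefilter, which we did not implement in CAVES, is a small box blur applied to each grayscale frame before the gradient is taken; this suppresses the single-pixel compression noise that would otherwise perturb the energy matrix. A minimal sketch:

```python
def box_blur(frame):
    """3x3 box blur on a grayscale frame (list of lists of ints), a simple
    way to suppress compression noise before the gradient/energy
    calculation. Edge pixels average over whichever neighbors exist."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = count = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        total += frame[ni][nj]
                        count += 1
            out[i][j] = total // count
    return out
```

A median filter would reject impulse noise even more strongly; either way, the goal is for the energy matrix to reflect legitimate content rather than frame-to-frame fluctuation.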
ISSUE:
Seams cannot move fast enough to accommodate fast changes in video sequences such as character movement.
DISCUSSION:
When testing videos with CAVES, we were forced to set bounds on seam movement in order to provide temporal smoothness to the video sequences. Without this temporal dependency in seams, video sequences displayed choppy results as seams relocated themselves around the image too drastically. Such choppy behavior was unacceptable in terms of the final viewing experience, thus making temporal dependency an essential technique. Nonetheless, this customization came at a cost. During periods of extreme movement in the video sequences, characters sometimes run into the seams and become distorted. Due to our constraints, there is no way for the seam to move out of the way fast enough. As expected, the algorithm does detect the energy change and the seams shift accordingly in time. However, in this time the important regions of our video become distorted. There are a couple of potential fixes that we will discuss in order to better accommodate this.
The first potential fix is to make the seam restraint range change with respect to motion. Currently, motion only adds to the prominence matrix, which is fed into our energy function. The added motion does have its own contribution, as it introduces more energy into specific sections of movement, but it does not govern the range of the seams. A new system could take motion into account specifically and change the range of specific seam movement dynamically. Ideally, this would result in seams with a large range of motion in areas where the algorithm detects large quantities of movement.
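A sketch of this idea follows. It is hypothetical rather than part of CAVES: the `motion_score` input would come from the existing motion detector, and the parameter names and default ranges are illustrative. The point is that the per-row clamp on a seam's column widens where local motion is high.

```python
def clamp_seam(new_col, prev_col, motion_score,
               base_range=1, max_range=8):
    """Limit how far a seam column may move from its position in the
    previous frame, widening the allowed range where motion is high.

    motion_score is assumed normalized to [0, 1]. base_range plays the
    role of the fixed bound CAVES uses today; max_range is the widest
    bound allowed in fast-moving regions. All names are illustrative.
    """
    allowed = base_range + round(motion_score * (max_range - base_range))
    lo, hi = prev_col - allowed, prev_col + allowed
    return min(max(new_col, lo), hi)
```

With motion_score near zero this reduces to the current fixed temporal bound; near one, the seam is free to jump out of a moving character's way.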
A second potential fix for this scenario is to force the redrawing of seams in these specific areas of interest. Let us consider an example where a character or object moves into an area densely populated by seams and thus becomes distorted. A proposed fix is to redraw the seams which cause this distortion. If a character moves into a group of seams, that group of seams will be recalculated and drawn in a different area of the frame, away from the movement. Since only a small section of seams is being recalculated at a time, this will not hinder the viewing experience. This is another potential improvement upon our current methods to draw seams.
ISSUE:
During scenes with a high number of important regions, certain areas inevitably become distorted.
DISCUSSION:
This is a problem with our implementation, which has a tough time handling videos with many important regions. The root of the issue is that we have chosen to hard-code the number of seams and expand each seam by exactly one pixel. Aside from this approach, there are a variety of other methods which we have thought about but did not implement in the CAVES project.
The potential solution lies in the fact that it is not necessary to expand a seam by only one pixel. Since we decided to expand our image by one hundred pixels, it was not required that we route one hundred specific seams. For example, fifty seams could have been chosen and added twice instead of once. This still expands the image by one hundred pixels but draws fewer seams, which could result in less distortion. Furthermore, smarter techniques could be developed to expand low-energy seams more often than higher-energy seams. This way, the overall number of seams could be reduced but the expansion resizing could still be the same. This would be another potential fix for the issue of inevitable distortion in "busy" frames.
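A sketch of the energy-weighted variant described above (hypothetical; the per-seam energies would come from the existing energy matrix): allocate the 100 pixels of expansion across fewer seams, giving low-energy seams higher multiplicity.

```python
def seam_multiplicities(seam_energies, total_expand=100):
    """Distribute total_expand duplicate columns across the given seams,
    biased toward low-energy seams.

    Each seam gets at least one copy; the remaining copies go to the
    cheapest seams first, cycling if necessary. Returns per-seam
    multiplicities summing to total_expand. Illustrative only; CAVES
    currently routes 100 seams with multiplicity 1.
    """
    n = len(seam_energies)
    assert 0 < n <= total_expand
    mult = [1] * n
    extra = total_expand - n
    # Rank seams from cheapest to most expensive.
    order = sorted(range(n), key=lambda i: seam_energies[i])
    for k in range(extra):
        mult[order[k % n]] += 1
    return mult
```

For example, routing 50 seams with `total_expand=100` gives every seam multiplicity 2, matching the simpler fifty-seams-added-twice scheme.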
9. Final Work Schedule
We chose to create the GUI and the PC-DSK network infrastructure early in the project, in parallel with the research and design of our core algorithms, so that we could easily observe the behavior of the DSK code as we transitioned from the Matlab implementation of the algorithm to the C implementation. This allowed us to verify the behavior of the PT and energy calculations and was particularly useful in the early stages of debugging our seam carving algorithm. Since the components of our system are very modular, it was straightforward to test each part of the data flow either in Matlab or on the DSK itself. We did not encounter any significant obstacles in the Matlab or C implementations of the functions, so we were able to closely follow our original timeframe and to use the last few weeks to tweak the behavior of the algorithm and work on optimization rather than debug behavioral issues in the system. The table below shows the weekly breakdown of tasks we accomplished.
Week         Task                                                  Person
October 12   Matlab gradients and face detection                   Dave
             PC video/image processing                             Greg
             Retargeting algorithm research                        Aneeb
October 19   GUI creation                                          Greg
             Matlab motion detection                               Dave
             Retargeting algorithm research                        Aneeb
October 26   Matlab retargeting implementation                     Aneeb
             Network infrastructure                                Greg
             Face detection and motion detection improvements      Dave
November 2   Seam carving implementation on DSK                    Everyone
November 9   Energy calculation on DSK                             Everyone
             Network improvements                                  Greg
November 16  Seam carving behavioral adjustment and optimization   Greg, Dave
             Energy calculation optimization                       Aneeb
November 23  DSK algorithm tweaks; complete system testing         Everyone
November 30  Network and PC code fixes                             Greg
             Convert full videos for demonstration                 Aneeb
             Final code optimizations and complete system testing  Dave
10. References
[1] L. Wolf, M. Guttmann, and D. Cohen. Non-homogeneous Content-driven Video-retargeting. IEEE Trans. on Image Processing, 2007.
    – Video retargeting based on large computations through systems of linear equations
[2] S.-C. Liu, C.-W. Fu, and S. Chang. Statistical change detection with moments under time-varying illumination. IEEE Trans. on Image Processing, 1998.
    – Motion detection algorithm
[3] S. Avidan and A. Shamir. Seam carving for content-aware image resizing. SIGGRAPH, 2007.
    – Original seam carving explanation; we use this information as the basis of our algorithms
[4] S. Avidan, A. Shamir, and M. Rubinstein. Improved Seam Carving for Video Retargeting. ACM SIGGRAPH, 2008.
    – Idea of forward energy calculations and suggestions for temporal considerations
[5] D. Marius, S. Pennathur, and K. Rose. Face Detection Using Color Thresholding, and Eigenimage Template Matching. (n.d.)
    – Face detection algorithm resource
[6] Z. Wolkowicki, J. He, and M. Gonzalez-Rivero. Content Aware Image Resizing. 18-551 Fall 2007 Group 6, 2007.
    – General resource; adopted Wolkowicki's face detection algorithm