
1152 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 8, AUGUST 2008

Privacy Protected Surveillance Using Secure Visual Object Coding

Karl Martin, Student Member, IEEE, and Konstantinos N. Plataniotis, Senior Member, IEEE

Abstract—This paper presents the Secure Shape and Texture SPIHT (SecST-SPIHT) scheme for secure coding of arbitrarily shaped visual objects. The scheme can be employed in a privacy protected surveillance system, whereby visual objects are encrypted so that the content is only available to authorized personnel with the correct decryption key. The secure visual object coder employs shape and texture set partitioning in hierarchical trees (ST-SPIHT) along with a novel selective encryption scheme for efficient, secure storage and transmission of visual object shape and textures. The encryption is performed in the compressed domain and does not affect the rate-distortion performance of the coder. A separate parameter for each encrypted object controls the strength of the encryption versus required processing overhead. Security analyses are provided, demonstrating the confidentiality of both the encrypted and unencrypted portions of the secured output bit-stream, effectively securing the entire object shape and texture content. Experimental results show that no object details are revealed to attackers who do not possess the correct decryption key. Using typical parameter values and output bit-rates, the SecST-SPIHT coder is shown to require encryption on less than 5% of the output bit-stream, a significant reduction in computational overhead compared to “whole content” encryption schemes.

Index Terms—Encryption, privacy protection, security, set partitioning in hierarchical trees (SPIHT), shape adaptive coding, shape and texture coding, surveillance, visual object coding, wavelet-based coding.

I. INTRODUCTION

Video surveillance of both public and private spaces is expanding at an ever-increasing rate. Consequently, individuals are increasingly concerned about the invasiveness of such ubiquitous surveillance and fear that their privacy is at risk. The demands of law enforcement agencies to prevent and prosecute criminal activity and the need for private organizations to protect against unauthorized activities on their premises are often seen to be in conflict with the privacy requirements of individuals.

Manuscript received October 25, 2007; revised March 7, 2008. First published June 17, 2008; current version published August 29, 2008. This work was supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) under the Network for Effective Collaboration Technologies through Advanced Research (NECTAR) Project. This paper was recommended by Guest Editor I. Ahmad.

The authors are with The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, Multimedia Laboratory, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2008.927110

In order to address this, we propose a secure visual object coder, Secure Shape and Texture Set Partitioning in Hierarchical Trees (SecST-SPIHT). The SecST-SPIHT scheme codes the shape and texture of arbitrarily shaped visual objects in the same fashion as ST-SPIHT [1], employing a shape-adaptive discrete wavelet transform (SA-DWT) variant [2] and a modified SPIHT algorithm [3] offering progressive/embedded bit-rate output. The proposed scheme incorporates a novel selective encryption algorithm, utilizing a stream cipher to encrypt a small portion of the output bit-stream. The activation of the cipher is controlled by intelligent bit-classification instructions received from the coder. The scheme efficiently and effectively secures the entire shape and texture of the object and ensures that the object cannot be reconstructed without provision of the correct decryption key; no object details are revealed without providing the exact, correct decryption key. At typical output bit-rates and choice of security parameter, the encryption operation is performed on less than 5% of the output code bits; the remaining unencrypted code bits cannot be decoded due to their dependence on the correct interpretation of the encrypted portion of the code. The progressive/embedded nature of the coder allows the output bit-rate to be varied without affecting the total number of encrypted bits or reducing security.

The SecST-SPIHT secure coder can be employed in surveillance systems where the capture of certain sensitive visual objects may be considered privacy invasive (e.g., face and body images). If the sensitive objects are detected and segmented from the surveillance footage, SecST-SPIHT can be employed to simultaneously code and encrypt them, thus ensuring their confidentiality against those who do not possess the decryption key. Unlike privacy protection schemes which irreversibly blur or obscure the private data, the encrypted, coded objects produced by SecST-SPIHT can be decrypted and decoded when required for investigative purposes. The key management policies should reflect the privacy protection policies of the surveillance system. That is, only the appropriate authorities should possess the decryption key (e.g., law enforcement investigating an incident). In this scenario, the personnel who regularly monitor the surveillance system would not possess the decryption key. A policy decision can be made whether such personnel can access just the shape (but not texture) of the encrypted objects so as to balance the efficacy of the surveillance system while still maintaining the privacy of encrypted objects.

In enterprise environments, the key may be tied to the subject’s identity (e.g., through RFID-based tokens), thus giving control of the private content to the subject. Alternatively, the key may be only provided to senior management in an organization so that the privacy of individuals is protected during regular monitoring by security personnel. The proposed selective encryption procedure makes the scheme suitable for real-time applications where significant processing resources are requisitely consumed for coding of the video stream and traditional “whole content” encryption may be computationally infeasible [4]. While the SecST-SPIHT secure coder is particularly suitable for privacy protection applications, it can be used to secure objects of any nature; SecST-SPIHT can be utilized in any scenario where arbitrarily shaped visual objects must be kept confidential via a secret key. In this way, SecST-SPIHT operates as a generic secure object coder.

One class of existing schemes addressing privacy protection in video surveillance employs scrambling, obscuring, or masking techniques to protect the identity of the subjects [5]–[8]. In these schemes, the visual texture data of the subject’s face or whole body are discarded or irreversibly transformed. Where SecST-SPIHT stores the encrypted, coded object data, these schemes disallow the use of the content for future investigative purposes and ultimately limit the efficacy of the surveillance system in which they are utilized. In [5], the subject’s body image is masked, revealing only a silhouette. However, such a silhouette may still allow identification of the subject via biometric modalities such as gait [9]. Similarly, in [6], the focus is on removing appearance information while retaining structural information about the body in order to assess behavior. The approach in [7] is to “de-identify” face images so that facial recognition software cannot be used to reliably identify the subject, but enough facial features remain so that the image could still be used for detecting behavior. In this so-called $k$-Same approach, face images are clustered based on a distance metric, and the images replaced by a representative image generated by averaging of components based on pixels or eigenvectors. This approach, however, does not obscure the whole body image, and again, the original data is discarded and cannot be retrieved by authorized users. In [8], colored markers are worn by subjects who wish to have their face obscured in a particular surveillance environment. Employing AdaBoost to learn the marker’s color model and particle filtering to track the marker from frame to frame, the subject is tracked in real time and an elliptical mask placed over the head region. However, the scheme may not be practical in public scenarios as it requires subjects to “opt-out” through the use of the colored marker.

Another class of privacy protection schemes attempts to separate private features from the input signal and secure them in a fashion so that they may still be retrieved for future use [10]–[13]. The proposed SecST-SPIHT scheme falls under this category as the subject’s image is coded and encrypted as an arbitrarily shaped object and can be retrieved with provision of the correct decryption key. In [10], a region of interest (ROI) is defined for face data within a frame, and the corresponding coefficients are downshifted in order to be coded and protected in a separate quality layer using Motion JPEG 2000 [14]. However, using a traditional nonshape-adaptive wavelet transform, the wavelet-domain separation of ROI content only allows for rough separation of content in the spatial domain, thus disallowing the precise object versus background separation possible with object-based coding. The computer vision approach of [11] provides three policy-dependent options for hiding privacy data: summarization, transformation (obscuration), and encryption. In the case of encrypted output, traditional encryption is applied to the entire private data stream, which is computationally infeasible in many digital video surveillance systems. The scheme proposed in [12] embeds the private information of subjects as an encrypted watermark within the surveillance frames. However, the private data is limited to rectangular regions of the image frame and the utilization of traditional encryption and watermarking may be computationally burdensome. In [13], a reversible wavelet-domain scrambling is performed on ROI-defined private data, thus allowing subsequent retrieval of the private data by authorized users. This approach, as in [10], does not allow explicit spatial domain separation of the object of interest and the background, and the ROI shape is not secured. Furthermore, the scrambling is performed before compression, resulting in a modest reduction in coding performance [13].

A variety of image and video content protection schemes exist for entertainment applications [15], [16]. The techniques employed generally place an emphasis on standards compliance to ensure compatibility with the plethora of existing consumer devices and content delivery systems. However, these techniques may not be directly applicable to privacy-protected surveillance applications, where system operators may demand a greater level of confidentiality over the content and the system must support a mechanism for separation of private content while still maintaining the efficacy of the surveillance system. The schemes in [15] use efficient encryption or shuffling of variable-length codeword concatenations to secure MPEG-4 video streams while maintaining format compliance. However, entire frames are secured and hence these schemes cannot be used to secure only private data in surveillance applications. Furthermore, some image details may be reconstructed through error concealment techniques [15]. In [16], MPEG-4 video objects are secured through selective encryption of object descriptors (ODs). This approach, however, offers very limited security since only meta-data is secured and none of the actual object content is encrypted.

The proposed SecST-SPIHT secure coder offers an efficient solution for protection of private data in surveillance video. The object-based coding approach allows for explicit separation of a subject’s shape and texture from background imagery, offering a finer level of content granularity not present in ROI-based schemes. The selective encryption algorithm is designed to minimize processing overhead by encrypting the minimum amount of output code bits required to decode the original object shape and texture. The analysis provided in this paper verifies the security of the proposed scheme and offers insight into the general design of selective encryption approaches for embedded coders.

The remainder of this paper is organized as follows. In Section II, the SecST-SPIHT scheme is described in detail. Security analysis of SecST-SPIHT is provided in Section III. In Section IV, experimental results are provided and analyzed for various object inputs and parameters. Finally, the paper is concluded in Section V.

II. SECURE SHAPE AND TEXTURE SPIHT CODING SCHEME

The Secure ST-SPIHT (SecST-SPIHT) coding and decoding system is shown in Fig. 1. It employs the ST-SPIHT scheme for coding arbitrarily shaped visual objects [1] (see the Appendix), with a novel selective encryption algorithm that utilizes a stream cipher to encrypt specific bits in the output bit-stream. Any stream cipher may be chosen for this, provided that it is sufficiently secure for the application at hand; that is, SecST-SPIHT is only as secure as the stream cipher it utilizes. The shape and texture of the input object are coded in parallel, producing a single partially encrypted, embedded bit-stream which can be progressively decoded with provision of the correct decryption key; the resultant bit-stream may be truncated at an arbitrary point to produce a lower bit-rate output. The selective encryption offers an efficient alternative to complete content encryption, which can be computationally burdensome in full color image and video applications. The data-dependent decoding algorithm makes the unencrypted portion of the bit-stream effectively impossible to locate or interpret. Furthermore, the bits chosen for encryption represent the most significant components of the coded object, ensuring complete confidentiality of the visual data from those without the correct decryption key. Since encryption is performed during the output stage, SecST-SPIHT offers the same rate-distortion performance and embedded/progressive output properties as ST-SPIHT [1]. The proposed system describes secure coding of still visual objects but can easily be extended to the frames of a video object sequence in a fashion similar to Motion JPEG 2000 [14] or using 3-D transform domain representations [17].

Fig. 1. System-level diagram of the SecST-SPIHT coding and decoding scheme.

The input consists of two components: 1) a full-color (texture) image, represented as a two-dimensional (2-D) matrix of three-component RGB color samples, each indexed by the spatial position of the pixel and by the component in the red, green, or blue color channel, and 2) a binary (shape mask) image of the same dimensions, represented as a 2-D matrix of binary values in which one value denotes spatial positions “inside” the object and the other denotes spatial positions “outside” the object. The object is preprocessed by first converting the texture to a luminance-chrominance color space. Subsequently, texture positions outside the object are set to zero in every color channel.
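The preprocessing step described above can be illustrated with a minimal Python sketch. The specific color transform is an assumption: the full-range BT.601 (JFIF) RGB-to-YCbCr conversion is used here purely for illustration, as is the convention that a mask value of 1 marks positions inside the object. The function names `rgb_to_ycbcr` and `preprocess` are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) RGB -> YCbCr.

    Assumption: this particular transform is chosen only for illustration.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (y, cb, cr)


def preprocess(texture, mask):
    """Convert the texture and zero every position outside the object.

    texture: list of rows, each a list of (R, G, B) tuples.
    mask:    list of rows of 0/1 values; 1 = inside the object (assumed).
    """
    out = []
    for tex_row, mask_row in zip(texture, mask):
        out.append([
            rgb_to_ycbcr(*pixel) if inside else (0.0, 0.0, 0.0)
            for pixel, inside in zip(tex_row, mask_row)
        ])
    return out
```

Zeroing the outside positions after the color conversion means every channel handed to the wavelet transform carries no background information.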

Each color channel of the texture is subsequently transformed using an in-place lifting shape-adaptive discrete wavelet transform (SA-DWT) with global subsampling [1], [2], creating the vectorial field of transform coefficients. The in-place SA-DWT allows the spatial domain shape mask to remain unmanipulated and coded directly [1].

Fig. 2. SecST-SPIHT coder.

The SecST-SPIHT coder, shown in Fig. 2, employs an ST-SPIHT coder and selectively encrypts the output bit-stream using a stream cipher $f(b; k_e)$, applied to individual bits $b$ using the private key $k_e$. The ST-SPIHT algorithm is utilized to code the input shape and texture as well as to provide intelligent bit classification instructions to the stream cipher. The details of the SPIHT and ST-SPIHT coding algorithms are summarized in the Appendix; full details and analysis can be found in [1].

A. Selective Encryption for ST-SPIHT

The SecST-SPIHT selective encryption algorithm is a novel extension of the scheme proposed in [18] for regular SPIHT. By extending the selective encryption principle to object-based coding, the encryption of arbitrary image regions is achieved. We denote the ST-SPIHT bit-stream as the ordered set of bits $B$. The bit-stream can be divided into ordered subsets $B_n$, one per coding iteration, where $B_n$ is the set of bits obtained during the coding iteration for bit-plane $n$ (i.e., representing the value $2^n$), starting from $n_{\max}$, the highest bit-plane at which coding is initiated. Each $B_n$ can be further subdivided into three ordered sets: the bits obtained during the first phase of the sorting pass, where coefficients in the LIP are tested for significance; the bits obtained during the second phase of the sorting pass, where entire trees are tested for significance; and the bits obtained during the refinement pass.

The set of bits from the first phase of the sorting pass is composed of shape-test bits, significance bits, and sign bits. Similarly, the set of bits from the second phase of the sorting pass is composed of significance bits and sign bits for individual coefficients, significance bits for trees, and shape-test bits for both individual coefficients and trees. This decomposition of the bit-stream is shown in Fig. 3.

The SecST-SPIHT encryption scheme uses an encryption function $f(b; k_e)$ to encrypt only the shape-test bits and the significance bits of individual coefficients, from both phases of the sorting pass, and only during the first $K$ coding iterations. The key $k_e$ enforces the confidentiality of the data by preventing entities without the correct matching decryption key, $k_d$, from correctly decrypting the data.1 The parameter $K$ is controlled by the user at the time of encryption/encoding to determine the number of coding iterations to be encrypted. Increasing $K$ results in more bits being encrypted and greater security, with the tradeoff of greater computational overhead. The specific bits are selectively chosen since they represent the object shape information and the significance information of individual coefficients. The coefficient sign bits (from both phases of the sorting pass) remain unencrypted since their values do not affect the coder/decoder execution path. Similarly, the significance bits relating to entire trees remain unencrypted since they do not affect specific coefficient reconstruction values.

1In the case where $f(b; k_e)$ implements a symmetric key cipher, $k_e = k_d$.

Fig. 3. Composition of subset $B_n$ of ST-SPIHT bit-stream for $n > \lambda$.

Fig. 4. SecST-SPIHT decoder.

The encryption function $f(b; k_e)$ must be implemented using a stream cipher since the decoder (Fig. 4) must decode individual bits and instruct the decryption function whether each subsequent bit requires decryption or not; the use of a block cipher would prevent the decoder from correctly determining which bits in the output bit-stream are part of the cipher block. However, the system is flexible in that any bit-level stream cipher may be used, employing either symmetric private keys or public–private key pairs.

For ease of notation, we introduce the controlled encryption function, which maps a bit $b$ produced at bit-plane $n$ to

$$\begin{cases} f(b; k_e), & \text{if } n > n_{\max} - K \\ b, & \text{otherwise.} \end{cases} \qquad (1)$$

Hence, the encryption function is only activated for the first $K$ iterations of the coding algorithm, after which the input bits are passed through, unencrypted. The complete description of the SecST-SPIHT routine and the Secure SCS subroutine (SecSCS) follows. The reader should refer to the SPIHT and ST-SPIHT background material provided in the Appendix for a description of the notation employed.

SecST-SPIHT Coder:

Input: , , , ,
1) Initialization: Find initial quantization level ; set ; set ; set .
2) Sorting pass:
   2.1. For each :
      2.1.1. If not coded yet then output ;
      2.1.2. If then:
         Output ;
         If then move to the LSP and output the sign of ;
      2.1.3. If then remove from the LIP;
   2.2. For each entry : [If “type-A” entry, ; If “type-B” entry, ]
      2.2.1. If and shape not completely coded, then:
         If not coded yet then output ;
         If then remove from the LIS and move on to next entry in the LIS (go to Step 2.2);
         If then:
            - If not coded yet then output ;
            - If and then run ;
      2.2.2. If shape completely coded and then remove from the LIS and move on to next entry in the LIS (go to Step 2.2);
      2.2.3. If “type-A” entry and :
         Output ;
         If then:
            - For each :
               * Output ;
               * If then add to the LSP and output sign of ;
               * If and not coded yet, then output ;
               * If then add to the LIP;
            - If then move to the end of the LIS as “type-B” entry; else, remove from the LIS;
      2.2.4. If “type-B” entry and :
         Output ;
         If then:
            - Add each to the end of the LIS as “type-A” entry;
            - Remove from the LIS.
3) Refinement pass: For each , except those found significant in the current sorting pass, output the most significant bit of ;
4) Quantization-step update: Decrement by 1 and go to Step 2.

Secure Shape Code Set (SecSCS) Subroutine:

Input: set with root , , ,
1) If is “type-A” entry:
   1.1. For each :
      1.1.1. If not coded yet then output ;
      1.1.2. If then:
         If not coded yet then output ;
         If terminate processing of ;
         If then:
            - If not coded yet then output ;
            - If then go to Step 1 treating as new “type-A” input;
2) If is “type-B” entry:
   2.1. For each , go to Step 1 treating as new “type-A” input;
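The controlled encryption function of (1), through which the coder passes its output bits, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keystream is a toy construction (SHA-256 in counter mode) standing in for whichever stream cipher is chosen, and the names `keystream_bits`, `ControlledCipher`, `n_max`, and the `selected` flag (marking a shape-test bit or an individual-coefficient significance bit) are hypothetical.

```python
import hashlib


def keystream_bits(key: bytes):
    """Toy keystream: SHA-256 in counter mode, yielded one bit at a time.

    Assumption: a stand-in for a real stream cipher, for illustration only.
    """
    counter = 0
    while True:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for shift in range(7, -1, -1):
                yield (byte >> shift) & 1
        counter += 1


class ControlledCipher:
    """Encrypts a bit only if it is a selected class AND n > n_max - K."""

    def __init__(self, key: bytes, n_max: int, K: int):
        self._ks = keystream_bits(key)
        self._n_max = n_max
        self._K = K
        self.encrypted_count = 0  # number of bits actually encrypted

    def process(self, bit: int, n: int, selected: bool) -> int:
        # First K iterations are bit-planes n_max, n_max - 1, ..., n_max - K + 1.
        if selected and n > self._n_max - self._K:
            self.encrypted_count += 1
            return bit ^ next(self._ks)  # XOR with the next keystream bit
        return bit  # passed through unencrypted
```

Because the cipher is a bitwise XOR, running the same `process` calls over the ciphertext with a fresh `ControlledCipher` built from the same key undoes the encryption; the keystream only advances on bits that are actually encrypted, which is why coder and decoder must agree on every bit classification.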

The coding operation is typically terminated when a specified rate or distortion criterion is met. While SecST-SPIHT allows for coding to be terminated before the shape has been losslessly coded, typical rate criteria and values of the shape code-level parameter $\lambda$ will result in complete lossless coding of the shape. Also, the coder may be instructed not to code the shape in situations where, for example, the shape is implicitly available via the shape of another object which surrounds the object to be coded (e.g., a background object).

The SecST-SPIHT decoder follows exactly the same execution path as the coder and only requires basic initialization information (i.e., a small set of coding parameters, the number of wavelet transform levels, and whether the shape was coded) to interpret the output bit-stream. Provided with the correct decryption key, $k_d$, the decoder decodes the bit-stream and instructs the decryption function as to whether each subsequent bit should be decrypted or passed through, unencrypted. Since the first bit is always a shape-test bit (generated from the first iteration of step 2.1.1), it must always be decrypted. An alternative approach to implementing the coder and decoder would be to set the total number of bits to encrypt, rather than $K$. Encryption would only be activated until this criterion is met; accordingly, provided with this parameter, the decoder can determine which bits in the output bit-stream require decryption.

It should be noted that SecST-SPIHT is backward compatible such that, when the input shape fills the entire rectangular bounding box, the coding operation is identical to traditional SPIHT [3] and the selective encryption algorithm operates the same as in [18]. Also, the selective encryption may be applied “offline” to an object already coded using ST-SPIHT. Using an ST-SPIHT decoder to interpret the bit-stream, the equivalent bit classification instructions can be generated as in the SecST-SPIHT coder, and the appropriate bits replaced with encrypted versions. A system protocol for managing and compositing objects is beyond the scope of this work and is addressed in other works, e.g., [19].
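The alternative, budget-based activation mentioned above can be sketched as follows. This is a hedged illustration only: `BudgetCipher`, its `budget` parameter, and the `selected` flag are hypothetical names, and the keystream comes from Python's `random.Random`, which is not cryptographically secure and is used here solely to keep the example short.

```python
import random


class BudgetCipher:
    """Encrypts selected bits until a fixed bit budget is exhausted."""

    def __init__(self, key: int, budget: int):
        # Toy keystream source; NOT secure, a placeholder for a real stream cipher.
        self._rng = random.Random(key)
        self._remaining = budget

    def process(self, bit: int, selected: bool) -> int:
        if selected and self._remaining > 0:
            self._remaining -= 1
            return bit ^ self._rng.getrandbits(1)
        return bit  # budget spent or bit class not selected: pass through
```

A decoder given the same key and budget makes the identical encrypt-or-pass decision for each bit, so it can recover the original stream by calling `process` on the received bits in order.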

III. SECURITY ANALYSIS OF SecST-SPIHT

The SecST-SPIHT selective encryption ensures the confidentiality of the coded visual object data in two ways: 1) securing the most significant portion of the bit-stream using a secret cryptographic key and a stream cipher and 2) making the unencrypted portion of the bit-stream impossible to decode since its location and the state of the decoder cannot be determined without correct decryption and decoding of the encrypted portion.

As noted in the previous section, encryption is performed on the shape-test bits and the individual-coefficient significance bits output during the first $K$ coding iterations. This represents a partial bit-plane and shape encryption performed on the visual object in the SA-DWT domain, with the choice of $K$ determining to how many bit-planes selective encryption is applied. A coefficient will have its most significant bit (MSB) encrypted if the coefficient is found significant during the first $K$ coding iterations. Also, if the coefficient is part of the luminance SA-DWT LL subband, it is placed in the LIP upon initialization of the coder and hence will also have each bit encrypted in the bit-planes above its MSB. In other words, for luminance LL subband coefficients, the higher order bits are also encrypted, until the bit-plane at which the coefficient is found significant, or $K$ coding iterations have passed. Alternatively, if the coefficient is contained in a spatial orientation tree, it will have one or more bits encrypted if it has been removed from the tree and placed in the LIP during the first $K$ coding iterations. This occurs if the parent of the coefficient has other descendants found significant during the first $K$ coding iterations, before the coefficient itself is found significant. Taking the parent coordinates of the coefficient as given by the color spatial orientation tree definition [20], we define the set of coordinates of “parental descendants” of the coefficient as all the descendants of the coefficient’s parent, not including the coefficient itself. Hence, if a parental descendant is found significant during the first $K$ coding iterations, before the coefficient itself, then the coefficient will be placed in the LIP during the first $K$ coding iterations and will have encrypted bits in the intervening bit-planes. The net effect of this is that a nonsignificant coefficient will still have one or more of its bits encrypted if it is located in the region of significant coefficients; thus the partial encryption can be seen to be applied in general regions of significance.
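The notion of “parental descendants” can be made concrete with a small Python sketch. It is an illustration under assumptions, not the paper's definition in code: the tree is a generic parent-to-children dictionary rather than the color spatial orientation tree of [20], coefficients are integers, and the names `descendants`, `parental_descendants`, `msb_plane`, and `enters_lip_early` are hypothetical. The final check is a simplified reading of the condition described above.

```python
def descendants(children, node):
    """All descendants of `node` in a tree given as {parent: [children]}."""
    found, stack = [], list(children.get(node, []))
    while stack:
        cur = stack.pop()
        found.append(cur)
        stack.extend(children.get(cur, []))
    return found


def parental_descendants(children, parent_of, node):
    """Descendants of the node's parent, excluding the node itself."""
    return [d for d in descendants(children, parent_of[node]) if d != node]


def msb_plane(value):
    """Bit-plane of the most significant bit of |value| (None for zero)."""
    return abs(value).bit_length() - 1 if value else None


def enters_lip_early(coeffs, children, parent_of, node, n_max, K):
    """True if some parental descendant becomes significant within the first
    K iterations (bit-planes above n_max - K) and before `node` itself."""
    own = msb_plane(coeffs[node])
    own = -1 if own is None else own
    planes = [msb_plane(coeffs[d])
              for d in parental_descendants(children, parent_of, node)]
    best = max((p for p in planes if p is not None), default=-1)
    return best > own and best > n_max - K
```

For a small coefficient surrounded by large neighbours in the same tree, `enters_lip_early` returns True once $K$ is large enough to cover the neighbours' MSB plane, which mirrors the observation that encryption spreads over regions of significance.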

In addition to the partial bit-plane encryption of the texture coefficients, the output of each shape test is encrypted, effectively encrypting the entire shape code during the first $K$ iterations. If $K$ is large enough that the shape is completely coded within the first $K$ iterations, then the complete, lossless shape code is encrypted. The choice of $K$ should be made to ensure that the number of bits finally encrypted is sufficient to make it computationally infeasible to perform a brute-force, exhaustive search attack over all possible sequences.

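As with SPIHT and ST-SPIHT, the SecST-SPIHT coder and decoder follow a data-dependent execution path. This means that the correct interpretation of a given bit in the output bit-stream requires complete knowledge of all previous significance test and shape-test bits. The result is that an attacker cannot in fact locate the bits in the output bit-stream which are not encrypted. To demonstrate the difficulty encountered by a cryptanalyst attempting to determine which bits are unencrypted, we use $b_i$ to denote the $i$th bit in the set of bits output during the first phase of the sorting pass of a given coding iteration. According to the SecST-SPIHT coder definition, considering the initial coding iterations in which the shape is still being coded, it is known a priori that the first bit is a shape-test bit

$$b_1 \in \{\text{shape-test bits}\}. \qquad (2)$$

However, classification of the second bit depends on the first bit

$$b_2 \in \begin{cases} \{\text{significance bits}\}, & \text{if } b_1 \text{ indicates a position inside the object} \\ \{\text{shape-test bits}\}, & \text{otherwise} \end{cases} \qquad (3)$$

and, consequently, classification of the third bit depends on the first and second bits

$$b_3 \in \begin{cases} \{\text{sign bits}\}, & \text{if } b_1 \text{ indicates inside and } b_2 \text{ indicates significance} \\ \{\text{significance bits}\}, & \text{if } b_1 \text{ indicates outside and } b_2 \text{ indicates inside} \\ \{\text{shape-test bits}\}, & \text{otherwise.} \end{cases} \qquad (4)$$

This can be generalized as follows:

$$b_{i+1} \in \begin{cases} \{\text{sign bits}\}, & \text{if } b_i \in \{\text{significance bits}\} \text{ and } b_i \text{ indicates significance} \\ \{\text{significance bits}\}, & \text{if } b_i \in \{\text{shape-test bits}\} \text{ and } b_i \text{ indicates inside} \\ \{\text{shape-test bits}\}, & \text{otherwise} \end{cases} \qquad (5)$$

for each bit index $i$ within the set. From (5), it is evident that the bits can in fact be treated as the ordered set of coded transition instructions in a Markov chain. The classification of $b_i$, indicating the $i$th state in the chain, must be known along with the value of $b_i$ (the transition instruction) in order to determine the classification of $b_{i+1}$ (the $(i+1)$th state in the chain). Since the value of $b_i$ indicates only the transition and not the state itself, it is clear that all previous bits $b_j$, $j < i$, must be known in order to classify $b_{i+1}$ and determine whether it is unencrypted. Similar arguments can be made for the bits output during the second phase of the sorting pass. Hence, without the correct decryption key, not only do the encrypted bits remain confidential, but the locations of the unencrypted bits cannot be determined and are thus also confidential.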
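The chain behind (5) can be imitated with a toy Python model. The sketch rests on several assumptions that are not taken from the paper: it covers only the first phase of the sorting pass, a bit value of 1 is taken to mean “inside” for a shape-test bit and “significant” for a significance bit, and the keystream is a toy SHA-256 counter construction. The names `keystream`, `next_class`, `classify`, `encrypt`, and `decrypt` are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int):
    """Toy keystream of `length` bits (SHA-256 counter mode; illustration only)."""
    out, counter = [], 0
    while len(out) < length:
        for byte in hashlib.sha256(key + counter.to_bytes(4, "big")).digest():
            out.extend((byte >> s) & 1 for s in range(8))
        counter += 1
    return out[:length]


def next_class(cls: str, bit: int) -> str:
    """State transition: class of the next bit given the current class and value."""
    if cls == "shape":
        return "sig" if bit == 1 else "shape"
    if cls == "sig":
        return "sign" if bit == 1 else "shape"
    return "shape"  # a sign bit is always followed by a shape-test bit


def classify(plain_bits):
    """Class of every bit, derived from the plaintext values alone."""
    classes, cls = [], "shape"  # the first bit is always a shape-test bit
    for b in plain_bits:
        classes.append(cls)
        cls = next_class(cls, b)
    return classes


def encrypt(plain_bits, key: bytes):
    ks = iter(keystream(key, len(plain_bits)))
    # Sign bits pass through; shape-test and significance bits are XORed.
    return [(b ^ next(ks)) if c != "sign" else b
            for b, c in zip(plain_bits, classify(plain_bits))]


def decrypt(cipher_bits, key: bytes):
    ks = iter(keystream(key, len(cipher_bits)))
    plain, cls = [], "shape"
    for c in cipher_bits:
        b = (c ^ next(ks)) if cls != "sign" else c
        plain.append(b)
        cls = next_class(cls, b)  # classification depends on the decrypted value
    return plain
```

Note that `decrypt` can only decide whether a given position is encrypted after correctly recovering every earlier bit; a single wrong keystream bit changes the state sequence, so later unencrypted sign bits can no longer be located.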

In attacking the encrypted portion of the bit-stream, the cryptanalyst may attempt to recreate the Markov chain and perform statistical analyses so that the original bits could be correctly predicted, with probability greater than one half, from previous bits, thus aiding an exhaustive search attack. While recreating such an attack is beyond the scope of this paper, the efficiency of the coding algorithm [1], [3] implies that the entropy of each bit is close to one and thus that the prediction probability is close to one half, regardless of the additional contextual information offered by the previous states in the decoded chain. However, if a more conservative estimate of the per-bit entropy is made, then $K$ can simply be increased to increase the number of encrypted bits in order to ensure that an exhaustive search remains computationally infeasible. Also, it should be noted that, as with traditional cryptographic systems, the decryption key, $k_d$, should also be long enough to defend against a brute-force attack over the key space.
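The tradeoff between per-bit predictability and the number of encrypted bits can be made concrete with a short calculation. The sketch below is an assumption-laden illustration: it models each encrypted bit as independently predictable with a hypothetical probability `p`, uses the binary entropy function to estimate the effective search space, and the function names and the target of 128 effective bits in the usage note are illustrative choices only.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a binary source predicted correctly with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def effective_key_bits(num_encrypted_bits: int, p: float) -> float:
    """Effective size (in bits) of the exhaustive search over the encrypted bits."""
    return num_encrypted_bits * binary_entropy(p)


def bits_needed(target_bits: float, p: float) -> int:
    """Encrypted bits required to keep at least `target_bits` of effective entropy."""
    h = binary_entropy(p)
    if h == 0.0:
        raise ValueError("fully predictable bits provide no security")
    return math.ceil(target_bits / h)
```

For example, under this model a prediction probability of 0.5 gives `bits_needed(128, 0.5) == 128`, whereas a more pessimistic 0.75 raises the requirement to 158 encrypted bits, which corresponds to choosing a larger $K$.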

Alternatively, an attacker may attempt to locate the unencrypted portion of the bit-stream produced after the first $K$ coding iterations, since it is known that all bits in this portion are unencrypted and may reveal important image features if correctly decoded. The starting position of this portion is determined by the total number of bits (both encrypted and unencrypted) in the first $K$ coding iterations. An attack on the unencrypted portion may be attractive if determining this starting position is computationally simpler than an exhaustive search over the encrypted bits; the attacker may then view this approach as offering a greater probability of success in revealing image details. However, even with knowledge of the starting position, the state of the LSP, LIP, and LIS lists and the shape decoding remain unknown without correct decryption and decoding of the first $K$ coding iterations. This means that, while the initial bits in the unencrypted portion may be correctly classified by the attacker, it cannot be determined which coordinates within the SA-DWT representation of the object the coded bits correspond to. Ultimately, the attacker will not be able to determine any image details from the unencrypted portion without correct decryption and decoding of the encrypted portion.

In summary, the SecST-SPIHT secure coder achieves confidentiality by encrypting the most significant portion of the bit-stream as well as obfuscating the unencrypted portion. We note that the scheme in [21] applies a similar approach for zero-tree wavelet-coded rectangular images, except that an a priori design choice is made to restrict encryption to the lowest two frequency subbands (i.e., the top two levels in the spatial orientation trees). This approach does not allow for the data-dependent distribution of significant coefficients and is inflexible across applications that require input images of different sizes with varying numbers of wavelet decomposition levels. In contrast, the selective encryption in SecST-SPIHT follows the data-dependent execution path of the coder, ensuring that the most significant coefficients, regardless of location, are partially encrypted, and that the initial portion of the bit-stream is always partially encrypted. Furthermore, SecST-SPIHT offers the user parameter $K$, which controls how many coding iterations are considered for encryption. This allows flexibility to meet the security requirements of the application at hand. In practice, even a single encrypted coding iteration always results in a sufficient number of bits being encrypted to prevent a successful brute-force attack (see Table I); that is, the number of encrypted bits meets the current standard for the minimum length of “strong” binary keys. However, it is possible that the states of the LSP, LIP, and LIS lists may not be sufficiently random after a single coding iteration, potentially aiding a brute-force attack. As such, it is recommended to choose $K \geq 2$ to protect against intelligent attacks. For critical applications where security is of greater importance than processing overhead, practitioners may choose $K > 2$.

TABLE I
NUMBER OF BITS ENCRYPTED FOR THE TEST OBJECTS USING DIFFERENT VALUES OF $K$ AND $\lambda = n_{\max} - 2$

Fig. 5. “Surveillance1” test object. (a) Original frame. (b) Segmented object. (c) Rectangular segmented object.

Fig. 6. “Surveillance2” test object. (a) Original frame. (b) Segmented object. (c) Rectangular segmented object.
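The brute-force argument above can be illustrated with a minimal sketch. It assumes the attacker must try bit patterns uniformly at random over the encrypted bits, and it uses 128 bits as the "strong key" threshold purely as an illustrative assumption (chosen to match the 128-bit cipher key used in the experiments); the function names are hypothetical.

```python
def expected_brute_force_trials(num_encrypted_bits: int) -> int:
    """Expected number of guesses in an exhaustive search over n unknown bits.

    On average, half of the 2**n candidate patterns are tried before a hit.
    """
    if num_encrypted_bits < 1:
        raise ValueError("need at least one encrypted bit")
    return 2 ** (num_encrypted_bits - 1)


def meets_key_strength(num_encrypted_bits: int, min_bits: int = 128) -> bool:
    """Check the encrypted-bit count against a minimum strength threshold.

    min_bits=128 is an assumed threshold, not a value taken from the paper.
    """
    return num_encrypted_bits >= min_bits
```

A count taken from Table I for a given $K$ can be passed to `meets_key_strength` to check whether that setting clears the chosen threshold.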

IV. EXPERIMENTAL RESULTS

The analyses provided in Section III demonstrate the security of the SecST-SPIHT coder. However, the efficacy of such a scheme must also be demonstrated via subjective visual evaluation to ensure that the secured object details remain confidential. Also, the computational requirements of the scheme must be evaluated via empirical measurement of processing times. In this section, we input a variety of sample visual objects to the SecST-SPIHT coder and evaluate the output generated when the user does not provide the correct decryption key. The performance of the proposed scheme is judged on its ability to obscure the original visual object features as well as its ability to achieve processing times less than those achieved with “whole content” encryption. The security-level parameter $K$ and shape code-level parameter $\lambda$ are varied to determine their effect on the processing times and the resultant number of encrypted bits as a portion of the whole bit-stream. The rate-distortion performance of SecST-SPIHT is identical to that of ST-SPIHT, which is examined in detail in [1] and will not be covered here.

The chosen input visual test objects are shown in Figs. 5 and 6. The “surveillance1” and “surveillance2” objects were extracted from actual surveillance video frames using motion-based segmentation. For comparison, the same objects are included with bounding box shape representations, simulating the case where “coarse” segmentation is applied. This may be the case in some real-time or low-resolution applications where accurate segmentation is infeasible. The coder accepts an arbitrary binary segmentation map so that any segmentation algorithm can be employed, depending on the requirements of the application. All frames are in 8-b per channel RGB CIF format (352 × 288), with the “surveillance1” and “surveillance2” objects occupying 10.9% and 25.7% of the frame, respectively, for the accurate segmentation, and 20.9% and 42.7%, respectively, for the coarse bounding box segmentation.
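The relation between an accurate segmentation map and its coarse bounding box version can be sketched as follows. This is an illustrative sketch only: the mask is assumed to be a list of rows of 0/1 values, and the helper names are hypothetical rather than part of the coder.

```python
def bounding_box_mask(mask):
    """Return a rectangular mask covering the tight bounding box of a binary mask."""
    height, width = len(mask), len(mask[0])
    rows = [r for r in range(height) if any(mask[r])]
    cols = [c for c in range(width) if any(mask[r][c] for r in range(height))]
    if not rows:
        # Empty object: nothing to cover.
        return [[0] * width for _ in range(height)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [
        [1 if top <= r <= bottom and left <= c <= right else 0 for c in range(width)]
        for r in range(height)
    ]


def coverage(mask):
    """Fraction of the frame occupied by the object mask."""
    total = len(mask) * len(mask[0])
    return sum(sum(row) for row in mask) / total
```

Applying `coverage` to both masks of the same object gives the kind of occupancy comparison reported above, where the bounding box always covers at least as much of the frame as the accurate mask.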

In all test cases, the SecST-SPIHT coder utilized the CDF 9/7 biorthogonal wavelet filters [22] with a four-level transform, and an output code bit-rate of 2.4 b-per-object-pixel (including the shape code, where applicable). Since the progressive/embedded output property of ST-SPIHT is maintained, the output code may be arbitrarily truncated to achieve a lower bit-rate at the cost of greater texture distortion.² If lossless coding of the texture is required, integer-to-integer wavelet filters [23] and color transforms can be utilized and the coder instructed to code all of the transform domain bit-planes [1]. The HC-128 software-based cipher was employed as a realistic example of a modern stream cipher [24], using a 128-bit randomly generated key. However, any stream cipher that is sufficiently secure for the application at hand can be utilized.
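The way a stream cipher can be applied to only part of a coded bit-stream is sketched below. This is a minimal illustration under stated assumptions, not the SecST-SPIHT implementation: the keystream is a stand-in built from SHA-256 in counter mode (HC-128 is not in the Python standard library), and `encrypt_flags` is a hypothetical input marking which output bits are selected for encryption, whereas the actual coder selects them along its data-dependent execution path.

```python
import hashlib


def keystream_bits(key: bytes, count: int) -> list:
    """Return `count` pseudo-random bits (stand-in keystream, NOT HC-128)."""
    bits = []
    counter = 0
    while len(bits) < count:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for shift in range(7, -1, -1):
                bits.append((byte >> shift) & 1)
        counter += 1
    return bits[:count]


def selective_xor(bits, encrypt_flags, key: bytes) -> list:
    """XOR the keystream into flagged bits only.

    The output has the same length and ordering as the input, so the
    bit-rate is unchanged. Applying the function twice with the same key
    and flags restores the original bits.
    """
    ks = iter(keystream_bits(key, sum(1 for f in encrypt_flags if f)))
    return [b ^ next(ks) if flag else b for b, flag in zip(bits, encrypt_flags)]
```

Because the XOR touches only the flagged positions, the cipher cost scales with the number of encrypted bits rather than with the length of the whole bit-stream.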

Figs. 7 and 8 show sample output using the test objects. In all cases, encryption is performed during the first two coding iterations ($K = 2$). In the cases where the shape is coded and encrypted with the object texture, the shape code is completed in the third iteration ($\lambda = n_{\max} - 2$). Figs. 7 and 8 show the decrypted/decoded output “surveillance” objects/frames when (a) and (d) the correct decryption key is provided, (b) and (e) the incorrect decryption key is provided, and (c) and (f) the incorrect decryption key is provided, but the shape is available externally and only the texture is coded and encrypted. We note that the shape may be implicitly provided externally via a background object which surrounds the given object. This is not equivalent to simply turning off encryption (but still coding) for the shape bits since, in that case, the unencrypted shape bits would still be difficult for an attacker to locate and decode amongst the other encrypted content. On the other hand, providing the shape externally gives direct access to the content and allows decoding of the texture in reference to the provided shape. Also in Figs. 7 and 8, the rectangular bounding box versions of the decrypted/decoded objects (“surveillance1”-rect and “surveillance2”-rect) are shown for when (g) and (j) the correct decryption key is provided, (h) and (k) the incorrect decryption key is provided, and (i) and (l) the incorrect decryption key is provided, but the bounding box shape is available externally and only the texture is coded. In all cases where the incorrect key is provided, the textural content is completely obscured; no object details can be seen. For the cases (b)/(e)/(h)/(k), where the shape is coded and encrypted with the texture, the shape is also completely obscured. In order to reconstruct the frame without revealing the object shape mask, the background is transmitted as a full frame, with the missing texture information behind the object filled in using prior frames.

Comparing the output of the accurately segmented objects with the bounding box segmented objects, it can be seen that the same level of obscuration is achieved when the shape is coded and encrypted (i.e., comparing (e) and (k) in Figs. 7 and 8). However, in the cases where the shape has been provided externally, the accurate segmentation [(f) in Figs. 7 and 8] may reveal silhouette details which could be used to identify subjects [9]. In contrast, the coarse bounding box [(l) in Figs. 7 and 8] completely obscures the actual shape of the object. The tradeoff in this case is that the liberal nature of the bounding box segmentation map results in a large portion of the frame being obscured, reducing the ability to monitor general activities that occur in the frame.

² At most bit-rates and choices of $\lambda$, the shape will be coded losslessly.

Fig. 7. “Surveillance1” [(a)–(f)] and “Surveillance1”-rect [(g)–(l)] test object/frame decoded and decrypted output ($K = 2$): (a), (d), (g), (j) with correct key, (b), (e), (h), (k) with incorrect key, and (c), (f), (i), (l) with incorrect key and shape provided externally.

Fig. 8. “Surveillance2” [(a)–(f)] and “Surveillance2”-rect [(g)–(l)] test object/frame decoded and decrypted output ($K = 2$): (a), (d), (g), (j) with correct key, (b), (e), (h), (k) with incorrect key, and (c), (f), (i), (l) with incorrect key and shape provided externally.

Fig. 9 shows the fraction of the output code bits that are encrypted versus the number of coding iterations during which encryption is performed ($K$). The total number of output code bits corresponds to a bit-rate of 2.4 b-per-object-pixel [including the shape code for Fig. 9(b)–(d)]. Fig. 9(a) shows the case where the shape is not coded; Fig. 9(b)–(d) show the cases where the shape code is completed during the first, second, and third coding iterations ($\lambda = n_{\max}$, $\lambda = n_{\max} - 1$, and $\lambda = n_{\max} - 2$), respectively. In Fig. 9(a), the effect of varying $K$ can clearly be seen, with the fraction of the output code being encrypted rising with $K$. The fraction remains small for all considered $K$, ranging from approximately 0.2% to 1.6%. In Fig. 9(b)–(d), a large jump in the portion of the bit-stream that is encrypted is observed once $K$ is set sufficiently high to ensure that the shape is completely encrypted, i.e., once $K$ reaches the iteration in which the shape code is completed. When $K$ is raised above this point, the effect is more subtle, since at low output bit-rates the shape code represents a significant portion of the bit-stream. With the shape completely encrypted, the actual percentage of the output code that is encrypted is largely controlled by the portion which is the shape code. If the user wishes to keep the level of encryption to a minimum for the purpose of computational efficiency, $\lambda$ should be set low enough to disperse the shape code further into the bit-stream, and $K$ should be set so that only the initial portion of the shape code is encrypted. In this case, $\lambda$ should be chosen so that $K$ can still be set high enough to encrypt a minimum number of bits to achieve a minimum desired level of security; for example, as in Figs. 7 and 8, $K = 2$ and $\lambda = n_{\max} - 2$ (i.e., shape code completed in the third coding iteration). The drawback of this approach is that the shape cannot be completely, losslessly decoded until later in the output bit-stream, possibly resulting in lossy shape reconstruction in very low bit-rate scenarios.

It should be noted that although Figs. 7 and 8 (b), (e), (h), and (k) show cases where the shape is only partially encrypted (i.e., encryption stops before the iteration in which the shape code is completed), the shape is still entirely obscured. Extending encryption through the iteration in which the shape code is completed (i.e., entirely encrypting the shape) does not provide any further visual obscuration of the shape. Hence, justification for employing greater $K$ should be based purely on the cryptanalysis and not on visual inspection.

Table I shows the number of bits encrypted for $\lambda = n_{\max} - 2$ and different $K$. As in Fig. 9(d), there is a jump at the iteration at which the remaining shape code is generated and encrypted ($K = 3$). With this choice of $\lambda$, a small $K$ (e.g., $K = 2$, as in Figs. 7 and 8) can be chosen, since the number of bits encrypted is large enough to prevent a brute-force exhaustive search attack over the encrypted bits, but still represents minimal processing overhead, with less than 5% of the output bit-stream encrypted for a bit-rate of 2.4 b-per-object-pixel. We note that, for a given object and chosen $K$ (i.e., a fixed number of encrypted bits), if the output bit-rate is decreased, the percentage of the output bits that are encrypted rises in inverse proportion. This is necessary to ensure the confidentiality of the coded information, regardless of output bit-rate or reconstruction quality; that is, $K$ should be chosen based on the security requirements, independent of the image quality employed by the system.

Fig. 9. Fraction of bits encrypted versus the security-level parameter $K$ (number of encrypted coding iterations) for different $\lambda$ (shape code levels). The total bits in the code corresponds to a bit-rate of 2.4 b-per-object-pixel. (a) Shape not coded. (b) Shape code completed in first iteration ($\lambda = n_{\max}$). (c) Shape code completed in second iteration ($\lambda = n_{\max} - 1$). (d) Shape code completed in third iteration ($\lambda = n_{\max} - 2$).
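The dependence of the encrypted fraction on the output bit-rate can be made concrete with a short sketch. The frame size and the 10.9% occupancy of “surveillance1” come from the text above; the encrypted-bit count used here is a hypothetical placeholder, not a value from Table I, and the function name is illustrative.

```python
def encrypted_fraction(encrypted_bits: int, object_pixels: int,
                       bits_per_object_pixel: float) -> float:
    """Fraction of the output bit-stream that is encrypted.

    For a fixed object and fixed K, encrypted_bits is constant, so the
    fraction is inversely proportional to the output bit-rate.
    """
    total_bits = object_pixels * bits_per_object_pixel
    return encrypted_bits / total_bits


frame_pixels = 352 * 288                      # CIF frame
object_pixels = round(0.109 * frame_pixels)   # "surveillance1", accurate mask
hypothetical_encrypted_bits = 1000            # placeholder, NOT from Table I

for rate in (2.4, 1.2, 0.6):
    share = encrypted_fraction(hypothetical_encrypted_bits, object_pixels, rate)
    print(f"{rate} b-per-object-pixel: {share:.2%} encrypted")
```

Halving the bit-rate doubles the encrypted share, which is why $K$ is best chosen from the security requirement rather than from a target percentage.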

The results in Fig. 9 show that use of the rectangular bounding box segmentation mask results in no appreciable difference in the fraction of bits encrypted when compared with the accurate segmentation map. However, Table I shows that the absolute number of bits encrypted increases in the range of approximately 10%–20% for the rectangular bounding box. This is a direct result of the bounding box containing more pixels than the accurate segmentation mask.

Table II shows the processing time in seconds for different values of $K$, as well as with no encryption (baseline ST-SPIHT) and whole content encryption (encryption of the entire ST-SPIHT bit-stream). The coding and encryption were performed on a Windows XP-based machine, using an Intel Core 2 Duo E6600 processor at 2.4 GHz. As can be seen, for the tested values of $K$, the processing time compared to the case of no encryption is increased negligibly (less than 5%). In contrast, encrypting the entire content results in processing times that are between 15% and 75% greater than those achieved with no encryption. It is clear that the partial encryption approach is justified as a method for processing efficiency when a software-based stream cipher is employed. In an environment where multiple surveillance streams must be processed simultaneously, the processing time savings achieved by SecST-SPIHT in comparison to whole content encryption can be critical.

TABLE II
PROCESSING TIMES IN SECONDS FOR CODING AND ENCRYPTION USING DIFFERENT VALUES OF $K$ AND $\lambda = n_{\max} - 2$

It should be noted that the property of SecST-SPIHT to disperse the shape code within the texture code is inherited from ST-SPIHT. With the execution path of the texture decoding dependent on the shape code, the two portions of the code cannot be separated without correct decryption of all encrypted bits.

V. CONCLUSION

The SecST-SPIHT secure visual object coder was presented, offering an efficient solution for privacy protection of subjects in digital video surveillance systems. Provided with segmented, arbitrarily shaped visual objects, SecST-SPIHT securely codes both the shape and texture, ensuring confidentiality through the use of a private decryption key. In contrast to privacy protection systems that simply discard the subject’s visual details via masking or blurring, SecST-SPIHT allows complete recovery of the data if the correct decryption key is provided. This is necessary in applications where the visual data may be required for future investigative purposes. Furthermore, by encrypting the object shape, subject recognition based on silhouette characteristics is prevented. Additionally, the SecST-SPIHT secure coder offers all the features of the ST-SPIHT visual object coder [1], namely, efficient and progressive/embedded parallel coding of the object shape and texture.

The parameter $K$ offers the user control over a variable level of application-dependent security. In effect, increasing $K$ increases the portion of the output bit-stream that is encrypted by performing encryption for a greater number of coding iterations. In practice, $K$ can be chosen to ensure that the number of encrypted bits is high enough to protect against a brute-force, exhaustive search attack over the encrypted portion of the bit-stream. It was shown that $K = 2$ was generally sufficient. The remaining unencrypted portion of the bit-stream cannot be decoded, since the data-dependent execution of the decoder requires complete knowledge of the prior (encrypted) portion of the bit-stream.

The provided secure coding algorithm operates on individual visual object input frames, but may be extended to video sequences using techniques similar to Motion JPEG 2000 [14] or 3-D transform-domain representations [17]. Alternatively, motion compensation may be employed to reduce the size of the shape and texture coded for subsequent frames. Consequently, for a given $K$, the number of encrypted bits for subsequent encrypted object frames would also be very low. However, confidentiality of those object frames would not be compromised, since correct decoding would require decryption of the previous frames, thus extending the data-dependent, partial encryption paradigm into the temporal dimension.


SecST-SPIHT is well suited as a privacy-enhancing technology for surveillance-intensive environments. However, the coder can be employed in any number of applications where the confidentiality and efficient coding of arbitrarily shaped visual objects are required.

APPENDIX

SPIHT AND SHAPE AND TEXTURE SPIHT CODING

The SPIHT rectangular image coder [3] is in the class of wavelet-based zero-tree coders, where the discrete wavelet coefficients of an image are organized into spatial orientation trees (SOT) and undergo iterative bit-plane coding. The SOTs and an ordered partitioning algorithm exploit the so-called self-similarity of the wavelet coefficients across subbands, producing a highly efficient code in a rate-distortion sense. The symmetric coding and decoding algorithms depend on three lists that store the state of each coefficient with respect to the current bit-plane being coded: the list of insignificant pixels (LIP), the list of significant pixels (LSP), and the list of insignificant sets (LIS). The algorithm employs the following definitions to reference individual coefficients as well as sets of coefficients: a “type-A” entry in the LIS refers to all of the descendants of a given coordinate; a “type-B” entry refers to all of the descendants of that coordinate excluding its direct offspring. The algorithm also references the set of all luminance LL subband coefficient coordinates and the significance test at a given bit-plane, as defined in [3]. The coordinates are defined as in Section II, with the color index derived from the SPIHT modification proposed in [20] for color images.
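The type-A and type-B sets can be sketched for a single-channel coefficient array as follows. This is a simplified illustration that assumes the usual 2×2 parent-child rule of zero-tree coders and ignores both the special handling of LL subband roots in [3] and the color linkage of [20]; the function names are hypothetical.

```python
def offspring(i, j, height, width):
    """Direct offspring of (i, j) under the 2x2 parent-child rule (simplified)."""
    candidates = [(2 * i, 2 * j), (2 * i, 2 * j + 1),
                  (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]
    # Drop children outside the array, and never list a node as its own child.
    return [(r, c) for r, c in candidates
            if r < height and c < width and (r, c) != (i, j)]


def descendants(i, j, height, width):
    """Type-A set: offspring, their offspring, and so on."""
    found = []
    stack = offspring(i, j, height, width)
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(offspring(node[0], node[1], height, width))
    return found


def type_b_set(i, j, height, width):
    """Type-B set: all descendants excluding the direct offspring."""
    direct = set(offspring(i, j, height, width))
    return [p for p in descendants(i, j, height, width) if p not in direct]
```

For example, in an 8×8 array the coordinate (1, 1) has four offspring and sixteen grandchildren, so its type-A set has twenty entries and its type-B set has sixteen.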

The ST-SPIHT extension to SPIHT codes both the shape and texture of arbitrarily shaped visual objects in parallel to produce one unified, embedded output bit-stream [1]. The texture coding in ST-SPIHT follows a natural extension of SPIHT with the same SOT definition as in [3], along with the modification for color images proposed in [20]. The SOTs are first formed using all coordinates inside the bounding box of the object; the binary shape mask is used to describe which nodes are inside the object and which are outside.

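The same input object definition and preprocessing steps described in Section II are used. We define two complementary sets: the set of all coordinates inside the object and the set of all coordinates outside the object; i.e., every coordinate belongs to exactly one of the two sets. Unique to the ST-SPIHT algorithm are a series of three shape test functions. The “pixel test” function identifies whether a coordinate is inside or outside the shape and is defined as follows:

$$\text{pixel test} = \begin{cases} 1, & \text{if the coordinate is inside the object} \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

The “set-discard test” function identifies sets of coefficients that are entirely outside the object:

$$\text{set-discard test} = \begin{cases} 1, & \text{if every coordinate in the set is outside the object} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where the argument is a given set of coefficients. Finally, the “set-retain test” function identifies sets of coefficients that are entirely inside the object:

$$\text{set-retain test} = \begin{cases} 1, & \text{if every coordinate in the set is inside the object} \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$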
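The three shape tests can be sketched directly over a binary shape mask. This is an illustrative sketch only: the mask is assumed to be a list of rows of 0/1 values, the 1/0 return convention is an assumption rather than something confirmed by the source, and the function names are hypothetical.

```python
def pixel_test(mask, coord):
    """Return 1 if the coordinate lies inside the object, else 0 (assumed convention)."""
    r, c = coord
    return 1 if mask[r][c] else 0


def set_discard_test(mask, coords):
    """Return 1 if every coordinate in the set lies outside the object, else 0."""
    return 1 if all(not mask[r][c] for r, c in coords) else 0


def set_retain_test(mask, coords):
    """Return 1 if every coordinate in the set lies inside the object, else 0."""
    return 1 if all(mask[r][c] for r, c in coords) else 0
```

A set that straddles the object boundary fails both set tests, which is the case in which a coder would have to examine its members individually.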

The ST-SPIHT coding routine requires the shape code level parameter, $\lambda$, to be input. This defines the quantization level at which the routine forces the coding of not-yet-coded shape mask pixels. This is done by applying the subroutine “Shape Code Set” (SCS) to the appropriate trees. The complete algorithm codes the shape and texture information in parallel, producing an embedded bit-stream that can be decoded to produce progressive shape and texture reconstruction. By lowering $\lambda$, the shape code becomes further dispersed in the output bit-stream, delaying the point at which the shape can be completely, losslessly decoded. At very low output bit-rates, lowering $\lambda$ allows greater emphasis to be placed on the texture, at the cost of lossy shape reconstruction [1]. The decoder follows the same data-dependent execution path as the coder based on interpretation of the output bit-stream.

REFERENCES

[1] K. Martin, R. Lukac, and K. N. Plataniotis, “SPIHT-based coding of the shape and texture of arbitrarily shaped visual objects,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 10, pp. 1196–1208, Oct. 2006.

[2] S. Li and W. Li, “Shape-adaptive discrete wavelet transforms for arbitrarily shaped visual object coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 4, pp. 725–743, Aug. 2000.

[3] A. Said and W. A. Pearlman, “A new fast and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 243–250, Jun. 1996.

[4] B. Furht, D. Socek, and A. M. Eskicioglu, “Fundamentals of multimedia encryption techniques,” in Multimedia Security Handbook. Boca Raton, FL: CRC, 2004, ch. 3.

[5] S. Tansuriyavong and S. Hanaki, “Privacy protection by concealing person in circumstantial video image,” in Proc. Workshop on Perceptive User Interfaces, 2001, vol. 4, pp. 1–4.

[6] D. Chen, Y. Chang, R. Yan, and J. Yang, “Tools for protecting the privacy of specific individuals in video,” EURASIP J. Adv. Signal Process., vol. 2007, pp. 1–9, 2007.

[7] E. M. Newton, L. Sweeney, and B. Malin, “Preserving privacy by de-identifying face images,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 2, pp. 232–243, Feb. 2005.

[8] J. Schiff, M. Meingast, D. K. Mulligan, S. Sastry, and K. Goldberg, “Respectful cameras: Detecting visual markers in real-time to address privacy concerns,” in Proc. Int. Conf. on Intell. Robots Syst., Nov. 2007, pp. 971–978.

[9] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “A full-body layered deformable model for automatic model-based gait recognition,” EURASIP J. Adv. Signal Process., vol. 2008, 2008, article ID 261317.

[10] I. Martinez-Ponte, X. Desurmont, J. Meesen, and J.-F. Delaigle, “Robust human face hiding ensuring privacy,” in Proc. Int. Workshop on Image Analysis for Multimedia Interactive Services, 2005.

[11] A. Senior, S. Pankanti, A. Hampapur, L. Brown, Y.-L. Tian, A. Ekin, J. Connell, C. F. Shu, and M. Lu, “Enabling video privacy through computer vision,” IEEE Security Privacy, vol. 3, no. 3, pp. 50–57, May–Jun. 2005.

[12] W. Zhang, S. S. Cheung, and M. Chen, “Hiding privacy information in video surveillance system,” in Proc. IEEE Int. Conf. Image Process., 2005, vol. 3, pp. 868–871.

[13] F. Dufaux, M. Ouaret, Y. Abdeljaoued, A. Navarro, F. Vergnenegre, and T. Ebrahimi, “Privacy enabling technology for video surveillance,” in Image Processing for Military and Security Applications, S. S. Agaian and S. A. Jassim, Eds., 2006, vol. 6250, Proc. SPIE, pp. 1–12.

[14] ISO/IEC 15444-3:2007 Information Technology—JPEG 2000 Image Coding System: Motion JPEG 2000, JTC 1/SC 29/WG 1, ISO/IEC Std., 2007.

[15] J. Wen, M. Severa, W. Zeng, M. H. Luttrell, and W. Jin, “A format-compliant configurable encryption framework for access control of video,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 545–557, Jun. 2002.

[16] P. C. Wang and T.-W. Hou, “An AV object oriented encryption algorithm for MPEG-4 streams,” in Proc. Int. Conf. on Multimedia and Expo, Jun. 2004, pp. 971–974.

[17] G. Minami, Z. Xiong, A. Wang, and S. Mehrotra, “3-D wavelet coding of video with arbitrary regions of support,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 9, pp. 1063–1068, Sep. 2001.

[18] K. Martin, R. Lukac, and K. N. Plataniotis, “Efficient encryption of wavelet-based coded color images,” Pattern Recognition, vol. 38, no. 7, pp. 1111–1115, 2005.

[19] Information Technology—Coding of Audio-Visual Objects—MPEG-4 Part 11: Scene Description and Application Engine (14496-11), ISO/IEC JTC1/SC29 WG11 International Standard, 2005.

[20] A. A. Kassim and W. S. Lee, “Embedded color image coding using SPIHT with partially linked spatial orientation trees,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 2, pp. 203–206, Feb. 2003.

[21] H. Cheng and X. Li, “Partial encryption of compressed images and videos,” IEEE Trans. Signal Process., vol. 48, no. 8, pp. 2439–2451, Aug. 2000.

[22] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform,” IEEE Trans. Image Process., vol. 1, pp. 205–220, Apr. 1992.

[23] R. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo, “Wavelet transforms that map integers to integers,” Appl. Comput. Harmon. Anal., vol. 5, no. 3, pp. 322–369, 1998.

[24] Ecrypt Stream Cipher Project: HC-128, European Network of Excellence for Cryptology [Online]. Available: http://www.ecrypt.eu.org/stream/hcp3.html

Karl Martin (S’00) received the B.A.Sc. degree in engineering science and the M.A.Sc. degree in electrical engineering from the University of Toronto, Toronto, ON, Canada, in 2001 and 2003, respectively, where he is currently working toward the Ph.D. degree.

His research interests include multimedia security, multimedia processing, wavelet-based image coding, object-based coding, and CFA processing.

Mr. Martin is a member of the IEEE Signal Processing Society, the IEEE Communications Society, and the IEEE Circuits and Systems Society. He has been a technical reviewer for numerous journals and conferences. Since 2003, he has held the position of Vice-Chair of the Signal Processing Chapter, IEEE Toronto Section.

Konstantinos N. Plataniotis (S’90–M’92–SM’03) received the B.Eng. degree in computer engineering from the University of Patras, Patras, Greece, in 1988 and the M.S. and Ph.D. degrees in electrical engineering from the Florida Institute of Technology (Florida Tech), Melbourne, in 1992 and 1994, respectively.

He is an Associate Professor with The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada, an Adjunct Professor with the School of Computer Science, Ryerson University, Toronto, and a member of The University of Toronto’s Knowledge Media Design Institute. His research interests include biometrics, communications systems, multimedia systems, and signal and image processing.

Dr. Plataniotis was the 2005 recipient of IEEE Canada’s Outstanding Engineering Educator Award “for contributions to engineering education and inspirational guidance of graduate students” and the corecipient of the 2006 IEEE TRANSACTIONS ON NEURAL NETWORKS Outstanding Paper Award for the paper (published in 2003) entitled “Face recognition using kernel direct discriminant analysis algorithms.” He is a Registered Professional Engineer in the Province of Ontario and a member of the Technical Chamber of Greece.
