

Page 1: [IEEE 2013 BRICS Congress on Computational Intelligence & 11th Brazilian Congress on Computational Intelligence (BRICS-CCI & CBIC) - Ipojuca, Brazil (2013.9.8-2013.9.11)] 2013 BRICS

Analyzing Ensemble Systems For Protected Biometric Data

Isaac de L. Oliveira Filho and Otaciana G. R. Santiago

Department of Informatics,

State University of RN

Natal, RN - Brazil, 59063-200, +55-84-3207-8789

E-mail: {isaacoliveira, otacianarezende}@uern.br

Anne M. P. Canuto and Benjamín R. C. Bedregal

Department of Informatics and Applied Mathematics,

Federal University of RN

Natal, RN - Brazil, 59072-970, +55-84-3215-3815

E-mail:{anne, bedregal}@dimap.ufrn.br

Abstract—In this paper, we propose a comparative analysis of cryptography and transformation functions used as biometric (signature) template protection methods. The main goal is to investigate the increase in the security of the biometric dataset as well as the performance of the protected dataset in biometric-based systems. We use well-established structures for pattern recognition (ensemble systems) on unprotected and protected datasets to measure the performance of the biometric template protection methods used in this research. The results allow us to identify the most secure protection method that still keeps an acceptable accuracy level.

I. INTRODUCTION

Currently, there are different modalities for identifying individuals in identification systems. A user can use, for instance, passwords, key phrases, identification numbers, etc. These modalities are referred to as traditional approaches and they present some disadvantages, such as a low security level, easily discovered information and keys that can be very common. In particular, the traditional approaches cannot prevent the original information from being lost or stolen. However, in these scenarios, a user can generate another password, login or key phrase to be authenticated by a system.

An alternative approach, based on biometric modalities, can be used in authentication systems. An authentication system that recognizes a biometric characteristic (modality) of a user is called a biometric-based authentication system. Examples of biometric characteristics are iris and fingerprints. However, these modalities are not revocable and, therefore, it is extremely necessary to increase the security and robustness of biometric-based identification systems. These features should be preserved because they are unique for each person and they increase the reliability, convenience and universality of the identification systems [1]. For instance, the authors of [2] state that security is even more important for biometric-based identification systems than for non-biometric ones, since a biometric is permanently associated with a user and cannot be revoked or canceled if compromised. Thus, if a user's biometric characteristics are stolen, this information cannot be recovered.

Several protection techniques for biometric data have been proposed in the security area. For instance, the use of signature template protection systems was first considered in [3], which was based on the biometric cryptosystem approach (key generation cryptosystem). Further research on biometric security can be found in [4], where an adaptation of the fuzzy vault for signature protection was proposed. This adaptation uses a quantized set of maxima and minima of the temporal functions mixed with chaff points in order to provide security.

In the biometric protection area, the template protection methods can be divided into Cancellable and Bio-Cryptosystem methods. The former have been increasingly applied to address such security issues. Cancellable (also known as Transformation Method) commonly refers to the application of non-invertible and repeatable modifications to the original biometric templates. The latter use protection methods like Fuzzy Vault [5], Fuzzy Commitment [6] and Biohashing [7].

Another solution for unprotected biometric data is the use of traditional protection methods (cryptosystems), such as cryptography or hash functions. These protection methods cause several classification problems when they are applied to biometric modalities. This can occur because the cryptosystems shuffle the biometric dataset more effectively. Thus, the biometric-based authentication systems cannot classify or capture the relationship among the protected patterns. Therefore, we want to show that the relationship among the original samples is broken when the most secure protection methods are used.

In this research, we use four protection methods. The first one is the Papílio Cryptosystem (Cipher) [8], because it offers the same security level as AES and RSA according to [9]. The second and third ones are MD5 (Hash Function) [10] and SHA-1 (Hash Function) [11], which are among the best-known hash functions in the literature and are considered highly recommended since they increase the security of biometric datasets. Finally, the fourth method is the Transformation method (BioConvolving) [12], which was chosen due to its high security and accuracy levels on biometric datasets. We aim at comparing the performance of the Papílio Cryptosystem, the MD5 and SHA-1 hash functions and the Transformation method (BioConvolving) regarding the accuracy level in biometric-based identification systems. In fact, it is important to verify whether applying the traditional protection methods and the transformation methods to a biometric dataset keeps the same accuracy level that the ensemble systems achieve on the unprotected biometric dataset.

This paper is divided into eight sections and organized as follows. Some research studies related to the subject of this paper are presented in Sections II and III. Ensemble systems are described in Section IV, while Section V presents the proposed methodology used in this paper. Methods and materials are presented in Section VI, while Section VII presents and discusses the experimental results. Finally, in Section VIII, some final observations about this work are presented.

1st BRICS Countries Congress on Computational Intelligence

978-1-4799-3194-1/13 $31.00 © 2013 IEEE

DOI 10.1109/BRICS-CCI.&.CBIC.2013.97

2013 BRICS Congress on Computational Intelligence & 11th Brazilian Congress on Computational Intelligence

DOI 10.1109/BRICS-CCI-CBIC.2013.103

II. INCREASING SECURITY IN BIOMETRIC DATA

In the context of biometric data, the unauthorized copying of stored data is probably the most dangerous threat to users' privacy and security. Biometric templates must be stored in a protected way in order to offer security to biometric-based identification systems. In [13], template protection methods were broadly divided into two classes:

1) In a biometric cryptosystem, some public information about the biometric template is stored. This public information is usually referred to as helper data, and hence biometric cryptosystems are also known as helper data-based methods. Biometric cryptosystems can be further classified as key binding and key generation systems, depending on how the helper data is obtained;

2) In the feature transformation approach, a transformation function (f) is applied to the biometric template (T) and only the transformed template (f(T)) is stored in the database. The parameters of the transformation function can be derived from a random key (K) or password. This random key might or might not be used in the transformation function, defining it as a key-dependent or key-independent function.

The feature transformation schemes can be further categorized as salting or as non-invertible transformations. In salting, the transformation function (f) is invertible, while f is (as implied by the name) non-invertible in the non-invertible transformations. In this work, we focus on the use of non-invertible transformation functions. Hence, hereafter the terms transformation function and template protection refer to the non-invertible transformation function.

In [3], the transformation scheme operates on a set of parametric features extracted from the acquired dynamic signatures. A hash function was applied to the binary representation of the characteristic and some statistical properties of the enrollment signatures were explored. This method provides protection for the signature templates, although the cancellability property was not considered.

A technique for cancellable biometric data that merges a set of user-specific random vectors with biometric features is known as BioHashing [14]. This method has been applied to fingerprints, palmprints and face images. In [15], an adaptation of the BioHashing method for signature templates was proposed.

The main disadvantage of BioHashing is its significant performance degradation when the legitimate token is stolen and used by an attacker claiming to be the legitimate user. Moreover, in [16], an improved version of the BioHashing approach, where the procedure is iterated many times to increase the system security, has also been employed to protect signature templates.

In the field of non-invertible transformations, some relevant works can also be found in [12], [17], [18], [19], [20]. In [12], a signature template protection scheme called BioConvolving is presented and discussed. In this scheme, non-invertible transformations are applied to a set of signature sequences. However, the sole use of the transformation method still allows the original dataset to be used in the classification process (using a coding/decoding algorithm to obtain the original dataset), making it vulnerable to being stolen. Therefore, it is necessary to find a balance between the use of these traditional protection methods and the accuracy levels. We can break the relationship among the pattern characteristics through the use of a strong cryptosystem, in an attempt to provide security and to achieve a minimal rate of correct classification at the same time.

III. TEMPLATE PROTECTION METHOD, CRYPTOSYSTEM AND HASH FUNCTIONS

The traditional protection methods can be broadly categorized as Cryptography and Hash Functions (MD5 and SHA-1). In addition, there is the Transformation method (BioConvolving), which is strongly recommended since it shows good performance on biometric data, as reported in the literature we reviewed.

A. Papílio Cryptosystem

In the literature, there are several cryptosystems, such as AES [21] and RSA [22]. However, we use the Papílio cryptosystem [8] in this paper. Papílio is a Feistel cipher in which the round function F is computed by the Modified Viterbi algorithm [8], whose parameters are n/s (codification rate), Q (number of bits for the avalanche effect), m (coder memory size) and the generator polynomial. Currently, blocks of any size in bits are supported. The key size is 128 bits, although it could be variable, and the number of rounds may vary between zero and sixteen.

The Papílio decryption process is essentially the same as the encryption process, although the sub-keys are employed in reverse order. The same F function is used in every round of both processes and it is considered the main component of the cipher. The encryption and decryption processes always begin by dividing the text block into two halves. The right half is used as the input of the F function, and the left half is combined with the output of F in an XOR operation. The result of this XOR operation is the input of the F function in the next round, and so on up to the last round.

The size (number of bits) of the resulting encrypted text is the same as that of the plaintext (the original file), which is an advantage of the Papílio method. Moreover, it is possible to obtain a ciphertext (encrypted text) after each completed round. A high level of diffusion and confusion can be achieved by varying the number of rounds and through the operation modes ECB, CBC, CFB and OFB.
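The Feistel structure described in this subsection can be illustrated with a minimal sketch. It is not an implementation of Papílio: the round function below is a stand-in built on SHA-256 (an assumption made only for illustration), whereas Papílio computes F with the Modified Viterbi algorithm, and the names `round_function`, `feistel`, `encrypt` and `decrypt` are hypothetical. The sketch only shows the two properties stated in the text: the ciphertext has the same size as the plaintext, and decryption reuses the same routine with the sub-keys in reverse order.

```python
import hashlib


def round_function(half, subkey):
    """Stand-in F function (SHA-256 based); NOT Papilio's Modified Viterbi F."""
    return hashlib.sha256(subkey + half).digest()[:len(half)]


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def feistel(block, subkeys):
    """Run one Feistel pass over `block` using the sub-keys in the given order."""
    if len(block) % 2 != 0 or len(block) > 64:
        raise ValueError("sketch expects an even block length of at most 64 bytes")
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for subkey in subkeys:
        # The right half feeds F; the left half is XORed with F's output.
        left, right = right, xor_bytes(left, round_function(right, subkey))
    # Swap the halves at the end so the same routine also decrypts.
    return right + left


def encrypt(block, subkeys):
    return feistel(block, subkeys)


def decrypt(block, subkeys):
    # Same process as encryption, with the sub-keys in reverse order.
    return feistel(block, list(reversed(subkeys)))
```

Because the halves are only swapped and XORed, the stand-in F never needs to be inverted, which is why any function, including a non-invertible one, can play the role of F in this structure.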

B. BioConvolving

The BioConvolving transformation method was originally proposed in [12]. The main aim of this function is to divide each original sequence into W non-overlapping segments, according to a randomly selected transformation key d. Then, the transformed functions are obtained by performing a linear convolution between the obtained segments. BioConvolving can be described as follows.

1) Randomly select $W - 1$ values $d_j$ between 1 and 99 and arrange them in increasing order in a vector $\mathbf{d} = [d_0, \ldots, d_W]$, keeping $d_0 = 0$ and $d_W = 100$. The vector $\mathbf{d}$ represents the key of the employed transformation.

2) Convert the values $d_j$ according to the relation $b_j = \mathrm{round}\left(\frac{d_j}{100}\, n\right)$, $j = 0, \ldots, W$, where $n$ is the number of attributes and round denotes the nearest integer;

3) Divide the original sequence $\Gamma \in \mathbb{R}^n$ into $W$ segments $\Gamma(q)$ of length $N_q = b_q - b_{q-1}$, each lying in the interval $[b_{q-1}, b_q]$;

4) Apply the linear convolution to the segments $\Gamma(q)$, $q = 1, \ldots, W$, in order to obtain the transformed function

$$f = \Gamma(1) * \cdots * \Gamma(W) \qquad (1)$$

As can be observed, the length of the transformed functions is equal to K = N − W + 1 due to the convolution operation in (1), where N = n is the length of the original functions, so it is almost the same as the original length. A final signal normalization, aimed at obtaining transformed functions with zero mean and unit standard deviation, is then applied. Different realizations can be obtained from the same original functions simply by varying the size or the values of the key d.
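The four steps and the final normalization can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation of [12]: the function names `make_key`, `convolve` and `bioconvolve` are hypothetical, the key values are drawn as distinct integers, the normalization uses the population standard deviation, and a key that produces an empty segment is simply rejected.

```python
import random
import statistics


def make_key(w, rng):
    """Step 1: W - 1 ordered values in 1..99, framed by d_0 = 0 and d_W = 100."""
    inner = sorted(rng.sample(range(1, 100), w - 1))
    return [0] + inner + [100]


def convolve(a, b):
    """Full linear convolution of two sequences (length len(a) + len(b) - 1)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def bioconvolve(sequence, key):
    """Steps 2-4 plus normalization for one sequence of n attributes."""
    n = len(sequence)
    # Step 2: map the percentages d_j to segment boundaries b_j.
    bounds = [round(d / 100 * n) for d in key]
    # Step 3: split the sequence into W non-overlapping segments.
    segments = [sequence[bounds[q - 1]:bounds[q]] for q in range(1, len(bounds))]
    if any(len(segment) == 0 for segment in segments):
        raise ValueError("key produces an empty segment")
    # Step 4: convolve all segments together, as in equation (1).
    result = [float(v) for v in segments[0]]
    for segment in segments[1:]:
        result = convolve(result, segment)
    # Final normalization to zero mean and unit standard deviation.
    mean = statistics.fmean(result)
    std = statistics.pstdev(result)
    return [(v - mean) / std for v in result]
```

For a sequence with n = 18 attributes and the key [0, 40, 100] (W = 2), the boundaries are [0, 7, 18], the segments have lengths 7 and 11, and the output has n − W + 1 = 17 values.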

C. MD5 and SHA-1 Hash Functions

The MD5 Message-Digest Algorithm [10] is simple to implement, and provides a "fingerprint" or message digest of a message of arbitrary length. It is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of $2^{64}$ operations, and that the difficulty of coming up with any message having a given message digest is on the order of $2^{128}$ operations. The MD5 algorithm has been carefully scrutinized for weaknesses. It is, however, a relatively new algorithm and further security analysis is of course justified, as is the case with any new proposal of this sort.

SHA-1 [11] is used to compute a message digest for a message or data file that is provided as input. The message should be considered a bit string, and its length is the number of bits in the message. If the number of bits is a multiple of 8, the message can be represented in hex format. The message is padded so that its total length becomes a multiple of 512 bits. In summary, a "1" followed by m "0"s followed by a 64-bit integer are appended to the end of the message to produce a padded message of length 512 ∗ n. The 64-bit integer is the length of the original message. The padded message is then processed by SHA-1 as n 512-bit blocks.
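As a small illustration of these two functions, the sketch below hashes a feature vector with Python's standard `hashlib` module and computes the padded message length implied by the SHA-1 padding rule. How the authors serialized each signature pattern before hashing is not stated in the paper, so the `struct.pack` encoding and the names `digest_features` and `padded_length_bits` are assumptions made for illustration.

```python
import hashlib
import struct


def digest_features(features, algorithm="md5"):
    """Hash a feature vector; the little-endian double encoding is an assumption."""
    payload = struct.pack("<%dd" % len(features), *features)
    return hashlib.new(algorithm, payload).hexdigest()


def padded_length_bits(message_bits):
    """Length after appending a '1' bit, m '0' bits and a 64-bit length field."""
    return ((message_bits + 1 + 64 + 511) // 512) * 512


if __name__ == "__main__":
    a = [12.5, 3.0, 7.25]
    b = [12.5, 3.0, 7.26]  # differs from `a` only in the last attribute
    for name in ("md5", "sha1"):
        print(name, digest_features(a, name), digest_features(b, name))
```

The MD5 digest always has 128 bits (32 hex characters) and the SHA-1 digest 160 bits (40 hex characters), regardless of the input size. Two nearly identical feature vectors yield unrelated digests, which is the effect the paper refers to as breaking the relationship among the original samples.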

IV. ENSEMBLE SYSTEMS

In a typical ensemble architecture, a new input pattern is presented to all K components (individual classifiers). An ensemble can be built with different types of classifiers (Heterogeneous) or with classifiers of the same type (Homogeneous). The individual classifiers provide their outputs and send them to a combination method, which is responsible for providing the final output of the system.

In pattern classification problems, the answers are usually combined via voting, and in regression problems they are combined via simple averaging [23].

Ensemble systems can be seen as a two-step decision making process, where the first step is related to the decision at the level of the individual classifiers while the second step refers to the decision of the combination method. When designing ensembles, two main issues are important: the choice of the ensemble components and the choice of the combination methods that will be used. In relation to the first issue, the members of an ensemble are chosen and implemented. The ideal situation is a set of base classifiers with uncorrelated errors, which would be combined in such a way as to minimize the effect of these failures. In other words, the base classifiers should be diverse among themselves.

Once a set of classifiers has been created and selected, the next step is to choose an effective way of combining their outputs. Choosing the best combination method for an ensemble requires exhaustive testing. There are three main strategies of combination methods: fusion-based, selection-based, and hybrid methods. The main difference among these strategies is the number of classifiers used in the combination procedure. For instance, all individual classifiers are taken into consideration in the fusion-based strategy, while only one classifier is taken into consideration in the selection-based strategy. As its name suggests, the hybrid strategy mixes the two other strategies, using selection if and only if the best classifier is really good at classifying the testing pattern. Otherwise, a fusion procedure is used [23].
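Two fusion-based combination methods used later in the paper, Sum and Majority Voting, can be sketched as follows. This is a minimal illustration, assuming that each individual classifier outputs one score per class; the function names `sum_rule` and `majority_vote` are hypothetical and the tie-breaking behaviour is an arbitrary choice of this sketch.

```python
from collections import Counter


def sum_rule(scores):
    """Fusion by Sum: add the class scores of all classifiers, pick the largest."""
    n_classes = len(scores[0])
    totals = [sum(row[c] for row in scores) for c in range(n_classes)]
    return max(range(n_classes), key=totals.__getitem__)


def majority_vote(scores):
    """Fusion by Majority Voting: each classifier votes for its top class."""
    votes = [max(range(len(row)), key=row.__getitem__) for row in scores]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Three classifiers, two classes (illustrative scores, not from the paper).
    scores = [
        [0.90, 0.10],
        [0.40, 0.60],
        [0.45, 0.55],
    ]
    print(sum_rule(scores), majority_vote(scores))
```

In this illustrative input the two rules disagree: the summed scores favour class 0 (1.75 against 1.25), while two of the three classifiers vote for class 1. This is one reason why the choice of the combination method has to be tested for each problem.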

V. THE PROPOSED METHODOLOGY

In order to carry out this investigation, we propose a methodology defined by four steps, which is shown in Figure 1. In the first step, we select the methods used to protect the biometric dataset. The second step consists of applying the template protection methods to the original biometric dataset. In step 3, we apply the Homogeneous and Heterogeneous ensemble systems to the transformed biometric datasets. Finally, in step 4, the results are analyzed.

In step 1, we select the template protection methods used in this work, which are: Papílio Cryptosystem, BioConvolving, MD5 and SHA-1. The original dataset described in Section VI was chosen for this analysis.

In step 2, we applied each template protection method (MD5, SHA-1, BioConvolving and Papílio) to the original dataset, generating four protected datasets. The dataset generated with MD5 was named MD5Dataset, the dataset generated with SHA-1 was named SHA-1Dataset, the dataset encrypted with the Papílio Cryptosystem was called PapílioDataset and the dataset transformed with BioConvolving was named TransDataset. Moreover, all patterns of each protected dataset had to be converted into floating-point notation in order to be used in the Weka machine learning tool.


In step 3, we applied the ensemble structures (Homogeneous and Heterogeneous) to all four protected biometric datasets as well as to the original dataset. Finally, in step 4, the results from the application of the protection methods to the biometric data are analyzed and compared in terms of accuracy and security level.

Fig. 1. Architecture of the Experiments

VI. METHODS AND MATERIALS

The on-line signature dataset was an in-house dataset called OriginalDataset; the data was collected from 100 users with 10 samples per user, making a total of 1000 signature samples. The dataset had 18 attributes describing aspects such as execution time, signature width and height, among others.

Two different ensemble structures were used for each system size: Heterogeneous (which combines different types of classification algorithms as individual classifiers) and Homogeneous (which combines classification algorithms of the same type). As there were several possible configurations for each structure, we report the average accuracy delivered by all configurations within the corresponding structure. The chosen base classifiers were Neural Networks, Decision Tree and k-NN (Nearest Neighbor) [24]. In addition, these individual classifiers were combined using four common combination methods (meta-classifiers): Sum, Majority Voting, Support Vector Machine (SVM) and k-NN [23].

In order to obtain a better estimation, 10-fold cross-validation was applied to all the ensembles as well as to the individual classifiers. Thus, all the accuracy results presented in this paper refer to the mean over 10 different test sets. In these cases, a validation set was used to train these methods, and each ensemble structure was composed of 3, 6 or 12 base classifiers.

We used the WEKA tool [25] to apply the ensemble systems to the protected and unprotected biometric datasets. For the parameter setting of the combination methods, we opted for the simplest version of these methods. Therefore, we used k-NN with k = 1 and SVM with a polynomial kernel and c = 1. Finally, a statistical test was applied to compare the accuracy of the classification systems. We used the hypothesis test (t-test), which involves testing two learned hypotheses on identical test sets. In this investigation, we used the two-tailed t-test with a confidence level of 95% (α = 0.05).
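The comparison of two classification systems over the same 10 folds can be sketched with a paired t statistic. This is a minimal illustration, assuming the test is applied to the per-fold accuracy differences; the per-fold values below are made-up numbers, not results from this paper, and the names `paired_t_statistic` and `T_CRITICAL_DF9` are hypothetical. The critical value 2.262 is the two-tailed Student's t threshold for α = 0.05 with 9 degrees of freedom.

```python
import math
import statistics

# Two-tailed critical value of Student's t for alpha = 0.05 and 9 degrees of freedom.
T_CRITICAL_DF9 = 2.262


def paired_t_statistic(acc_a, acc_b):
    """t statistic of the per-fold accuracy differences between two systems."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean = statistics.fmean(diffs)
    std = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean / (std / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical per-fold accuracies (%) on identical test sets.
    system_a = [88, 90, 87, 89, 91, 88, 90, 89, 87, 91]
    system_b = [78, 80, 79, 77, 81, 79, 80, 78, 76, 82]
    t = paired_t_statistic(system_a, system_b)
    print(round(t, 3), abs(t) > T_CRITICAL_DF9)
```

When the absolute value of the statistic exceeds the critical value, the difference between the two systems is considered statistically significant at the 95% confidence level.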

VII. COMPARATIVE RESULTS

According to step 3 of the proposed methodology, we applied the ensemble systems to five datasets (OriginalDataset, TransDataset, PapílioDataset, MD5Dataset and SHA-1Dataset) and the results are shown in Tables I, II, III, IV and V, respectively. In these tables, we report the accuracy level and standard deviation of the classification systems (individual classifiers - Ind, and ensemble systems combined with Sum, Majority Voting - MV, k-NN and SVM). It is important to emphasize that, since we used different individual classifiers to construct the homogeneous and heterogeneous ensembles, they have different accuracy values.

Table I (OriginalDataset) shows that the results were satisfactory for the original dataset. The use of ensemble systems had a positive effect on the accuracy of the classification systems, increasing their performance in most cases when compared to the individual classifiers. When comparing both ensemble structures, the heterogeneous ensembles obtained better results than the homogeneous ones. This fact shows the importance of having diversity (heterogeneity) in the base classifiers: heterogeneous structures contain classifiers with different specialties that contribute to solving the problem.

When applying these ensembles to TransDataset (Table II), the accuracy level decreased in all cases, as expected (these decreases were statistically significant). It shows that the use of the transformation functions made the decision process more complex and decreased the general accuracy level. However, these results can still be considered satisfactory, since the accuracy level is close to, or above, 75% in most of the cases. This fact leads to an important observation: the protection (transformation method) applied to the signature dataset is not really strong. In fact, we cannot obtain the original dataset by using this transformation function. However, it is still possible to use the transformed dataset to obtain a satisfactory classification result.

When applying these ensembles to PapílioDataset (Table III), the accuracy level decreased drastically in all cases (all these decreases were statistically significant), as expected. It shows that the Papílio Cryptosystem made the decision process more complex than BioConvolving did. Therefore, it is not possible to use this protected dataset directly in biometric authentication systems. In other words, this protection method applied to a signature dataset is really strong, but the encrypted patterns in PapílioDataset cannot be classified directly. For example, in Table III the best accuracy among all ensemble types was 8.18%, with a standard deviation of 2.80%.

We can observe that the application of these ensembles toMD5Dataset dataset (Table IV) also decreased the accuracylevels sharply. It shows that the MD5 Hash function madethe decision process more complex and decreased the generalaccuracy level like Papılio Cryptosystem. However, if weanalyze this situation more carefully, it is possible to observethat the results of the MD5 were slightly worse than the Papılio

589589589

Page 5: [IEEE 2013 BRICS Congress on Computational Intelligence & 11th Brazilian Congress on Computational Intelligence (BRICS-CCI & CBIC) - Ipojuca, Brazil (2013.9.8-2013.9.11)] 2013 BRICS

TABLE I. RESULTS OF THE ENSEMBLES ON ORIGINALDATASET

OriginalDatasetSize 3 Ind Sum Voting k-NN SVM

Het 82.55 ± 5.9 87.8 ±5.1 86.56±5.96 85.47± 4.26 88.56 ± 2.24

Hom 81.41 ±7.21 83.47±5.32 82.1±6.83 79.13± 8.3 83.37 ± 5.77

Size 6 Ind Sum Voting k-NN SVM

Het 81.66±5.57 88.29±6.26 86.14±6.86 87.26±4.82 89.5±2.55

Hom 80.59±7.70 84.03±5.18 81.30±6.51 79.54±6.94 83.43±5.82

Size 12 Ind Sum Voting k-NN SVM

Het 81.49±5.99 88.04±6.17 87.46±6.78 87.46±4.74 89.09±2.49

Hom 81.19±8.22 83.90±5.67 82.83±6.51 79.77±7.49 82.37±6.75

TABLE II. RESULTS OF THE ENSEMBLES ON TRANSDATASET (ACCURACY % ± STANDARD DEVIATION)

Size  Type  Ind          Sum          Voting       k-NN         SVM
3     Het   74.01±5.18   76.41±9.57   75.23±8.65   74.49±6.81   78.47±4.12
3     Hom   72.67±5.51   74.33±7.00   72.97±5.89   69.33±10.18  73.43±7.76
6     Het   73.25±5.01   76.66±9.44   74.84±9.09   75.74±6.62   79.33±4.25
6     Hom   71.92±5.21   74.37±7.04   72.00±5.24   69.07±8.05   73.33±7.85
12    Het   72.35±5.19   76.63±8.82   75.99±9.81   75.88±6.60   79.02±3.53
12    Hom   72.02±11.33  74.50±12.33  73.30±11.61  68.93±16.25  72.50±13.77

TABLE III. RESULTS OF THE ENSEMBLES WHEN APPLIED TO PAPILIODATASET (ACCURACY % ± STANDARD DEVIATION)

Size  Type  Ind          Sum          Voting       k-NN         SVM
3     Het   5.59±2.26    7.61±2.87    4.30±2.00    5.23±1.00    5.89±1.52
3     Hom   5.41±1.92    5.43±1.98    5.00±1.80    4.53±2.75    5.90±3.46
6     Het   5.42±2.02    8.18±2.80    5.86±2.15    5.87±1.34    6.44±1.50
6     Hom   5.36±1.82    5.70±2.20    4.90±1.51    4.73±2.26    4.80±2.86
12    Het   5.28±1.89    6.43±2.95    5.97±2.27    5.37±1.37    6.80±1.67
12    Hom   5.42±1.93    5.40±2.00    5.57±2.11    7.20±2.43    4.97±3.78

TABLE IV. RESULTS OF THE ENSEMBLES WHEN APPLIED TO MD5DATASET (ACCURACY % ± STANDARD DEVIATION)

Size  Type  Ind          Sum          Voting       k-NN         SVM
3     Het   5.26±1.65    7.06±1.88    4.16±1.47    4.86±2.18    5.77±2.79
3     Hom   4.93±1.51    5.39±1.61    4.70±1.26    3.60±2.13    4.23±3.02
6     Het   4.79±1.50    7.54±1.91    5.20±1.51    4.83±1.77    5.41±2.83
6     Hom   4.65±1.44    5.30±1.62    4.40±1.05    3.70±2.51    4.10±2.86
12    Het   4.91±1.55    7.37±1.75    5.96±1.79    4.54±2.43    5.69±2.91
12    Hom   4.96±1.57    5.49±1.69    5.29±1.52    6.57±3.50    4.40±3.29

cryptosystem ones, but these results can be considered similar from a statistical point of view.

Finally, we applied the ensemble systems to the biometric dataset protected with the SHA-1 method (Table V), which also showed a sharp decrease in the accuracy levels. These results were very similar to those for the MD5 hash: the SHA-1 function also made the decision process more complex and lowered the overall accuracy, as the Papílio cryptosystem did. A closer look shows that the MD5 and SHA-1 methods yielded lower accuracy than the Papílio cryptosystem, but these results can be considered similar from a statistical point of view.

As with MD5, this protected dataset cannot be used in biometric authentication systems. In other words, the protection applied to the signature dataset is strong, and the protected patterns in SHA-1Dataset cannot be classified directly.
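One way to see why hashed patterns cannot be classified directly is the avalanche behaviour of cryptographic hash functions: two inputs that differ only slightly produce digests that differ in roughly half of their bits, so the similarity that a classifier relies on is not preserved. The sketch below is illustrative only. The two feature strings are hypothetical and are not taken from the signature dataset, and serializing features as comma-separated text is an assumption made for the example.

```python
import hashlib


def hamming_fraction(a: bytes, b: bytes) -> float:
    """Fraction of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (8 * len(a))


def digest_distance(algorithm: str, first: str, second: str) -> float:
    """Bit-level distance between the digests of two text inputs."""
    d1 = hashlib.new(algorithm, first.encode("utf-8")).digest()
    d2 = hashlib.new(algorithm, second.encode("utf-8")).digest()
    return hamming_fraction(d1, d2)


# Hypothetical feature vectors: two samples that differ in a single digit.
sample_a = "0.512,0.731,0.118,0.904"
sample_b = "0.512,0.731,0.119,0.904"

for algorithm in ("md5", "sha1"):
    distance = digest_distance(algorithm, sample_a, sample_b)
    print(f"{algorithm}: {distance:.2f} of digest bits differ")
```

For nearly identical inputs the printed fractions are expected to lie near 0.5, which is what unrelated random bit strings would give. This is consistent with the near-collapse in accuracy reported for MD5Dataset and SHA-1Dataset.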

Indeed, when only the traditional protection methods are considered, the results of the ensemble systems were very similar. They show that Papílio performs at the same level as the hash functions in terms of resistance to breaking. Therefore, these results are considered good for Papílio.

VIII. CONCLUSION

In this research, we investigated the use of traditional protection methods (the Papílio cryptosystem, MD5 and SHA-1) and a transformation method (BioConvolving) on an original biometric dataset, and compared them with respect to accuracy and security.

The results showed that the accuracy of the ensemble systems decreased drastically on the traditionally protected biometric datasets. With BioConvolving, the ensemble systems achieved better results than with the traditional protection methods; however, BioConvolving maintained a relationship between protected and unprotected patterns. This is a problem from the security point of view, because the transformed patterns still keep some characteristics of the original patterns. On the other hand,


TABLE V. RESULTS OF THE ENSEMBLES WHEN APPLIED TO SHA-1DATASET (ACCURACY % ± STANDARD DEVIATION)

Size  Type  Ind          Sum          Voting       k-NN         SVM
3     Het   4.19±1.84    5.93±2.54    3.37±1.66    3.56±1.87    4.31±2.39
3     Hom   4.19±1.84    2.05±1.99    3.76±1.67    3.73±3.44    3.70±3.38
6     Het   4.14±1.75    6.60±2.53    4.37±1.91    3.39±1.74    4.31±2.49
6     Hom   4.20±1.76    4.23±1.88    4.00±1.49    3.70±3.40    3.63±3.70
12    Het   4.19±1.82    6.25±2.44    4.76±2.12    3.40±1.70    4.23±2.67
12    Hom   5.59±2.03    5.75±2.13    5.15±1.78    6.23±3.99    4.85±4.45

the traditional protection methods increased the security of these datasets. Thus, the results show the importance of using ensembles on biometric datasets, and the good accuracy rates obtained by these systems should be considered relevant.
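The trade-off between accuracy and protection can be summarized by comparing the best accuracy reported in each table with the best accuracy on the unprotected data. The short sketch below uses only the highest mean value from each of Tables I to V; the dictionary keys and the helper name `accuracy_drop` are introduced here for illustration and are not part of the original experiments.

```python
# Best mean accuracy (%) reported in each table (Tables I to V).
best_accuracy = {
    "OriginalDataset": 89.50,  # Table I, Het, size 6, SVM
    "TransDataset": 79.33,     # Table II, Het, size 6, SVM
    "PapilioDataset": 8.18,    # Table III, Het, size 6, Sum
    "MD5Dataset": 7.54,        # Table IV, Het, size 6, Sum
    "SHA-1Dataset": 6.60,      # Table V, Het, size 6, Sum
}


def accuracy_drop(protected: str, baseline: str = "OriginalDataset") -> float:
    """Loss in percentage points relative to the unprotected baseline."""
    return best_accuracy[baseline] - best_accuracy[protected]


# Drop for every protected dataset, in percentage points.
drops = {
    name: accuracy_drop(name)
    for name in best_accuracy
    if name != "OriginalDataset"
}

for name, drop in drops.items():
    print(f"{name}: {drop:.2f} percentage points below the original")
```

With these values, BioConvolving costs about 10 percentage points of accuracy, whereas each of the three traditional methods costs more than 80.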

Therefore, this investigation allows us to propose three hypotheses. First, the cryptosystem and hash functions used in this work can be considered strong in terms of security, since classification performance is drastically reduced even when more elaborate classification structures such as ensemble systems are used. Second, the transformation method (BioConvolving) used in this research showed a better accuracy level than the traditional protection methods. Finally, the ensemble structure softens the differing characteristics of each classifier, which could be observed when we applied the classifiers separately to each encrypted dataset. Therefore, based on the application of the encryption process to the three modified datasets, we can confirm that the security of a cipher encryption method and the efficiency of the classification method are inversely related. However, the BioConvolving method is a good alternative for biometric-based authentication systems that accept the unprotected biometric dataset. For future work, we can apply other ciphers and use this process to investigate their integration with template protection methods such as Fuzzy Vault and Fuzzy Commitment.

REFERENCES

[1] J. C. H. Bringera and B. Kindarji, “The best of both worlds: Applying secure sketches to cancellable biometrics,” Science of Computer Programming, vol. 74(1-2), pp. 43–51, 2008.

[2] I. L. Oliveira Filho, B. R. C. Bedregal, and A. M. Canuto, “An investigation of ensemble systems applied to encrypted and cancellable biometric data,” in Artificial Neural Networks and Machine Learning – ICANN 2012, ser. Lecture Notes in Computer Science, A. Villa, W. Duch, P. Erdi, F. Masulli, and G. Palm, Eds., vol. 7553. Springer Berlin Heidelberg, 2012, pp. 180–188.

[3] C. Vielhauer, R. Steinmetz, and A. Mayerhofer, “Biometric hash based on statistical features of online signatures,” in Pattern Recognition, 2002. Proceedings. 16th International Conference on, vol. 1, 2002, pp. 123–126.

[4] M. Freire-Santos, J. Fierrez-Aguilar, and J. Ortega-Garcia, “Cryptographic key generation using handwritten signature,” in Biometric Technology for Human Identification III, SPIE. Int Society for Optical Engineering, United States, 2006.

[5] A. Juels and M. Sudan, “A fuzzy vault scheme,” Proc. IEEE International Symposium on Information Theory, p. 408, 2002.

[6] A. Juels and M. Wattenberg, “A fuzzy commitment scheme,” in Proceedings of the 6th ACM conference on Computer and communications security, ser. CCS ’99. New York, NY, USA: ACM, 1999, pp. 28–36.

[7] A. T. B. Jin, D. N. C. Ling, and A. Goh, “Biohashing: two factor authentication featuring fingerprint data and tokenised random number,” Pattern Recognition, vol. 37, no. 11, pp. 2245–2255, 2004.

[8] F. S. de Araujo, K. D. N. Ramos, B. R. C. Bedregal, and I. S. Silva, “Papílio cryptography algorithm,” in Proceedings of the First international conference on Computational and Information Science, ser. CIS’04. Berlin, Heidelberg: Springer-Verlag, 2004, pp. 928–933.

[9] I. de Lima Oliveira Filho, “Criptoanalise diferencial do papilio,” Master’s thesis, Universidade Federal do Rio Grande do Norte, 2010.

[10] R. Rivest, “The MD5 message-digest algorithm,” United States, 1992.

[11] D. Eastlake, 3rd and P. Jones, “US secure hash algorithm 1 (SHA1),” United States, 2001.

[12] E. Maiorana, M. Martinez-Diaz, P. Campisi, J. Ortega-Garcia, and A. Neri, “Template protection for HMM-based on-line signature authentication,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW, Jun. 2008, pp. 1–6.

[13] A. K. Jain, K. Nandakumar, and A. Nagar, “Biometric template security,” vol. 2008. New York, NY, United States: Hindawi Publishing Corp., Jan. 2008, pp. 113:1–113:17. [Online]. Available: http://dx.doi.org/10.1155/2008/579416

[14] A. Teoh, A. Goh, and D. Ngo, “Random multispace quantization as an analytic mechanism for biohashing of biometric and random identity inputs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 1892–1901, Dec. 2006.

[15] W. Yip, A. Goh, D. Ngo, and A. Teoh, “Generation of replaceable cryptographic keys from dynamic handwritten signatures,” in Advances in Biometrics, ser. Lecture Notes in Computer Science, D. Zhang and A. Jain, Eds. Springer Berlin / Heidelberg, 2005, vol. 3832, pp. 509–515.

[16] A. Lumini and L. Nanni, “An improved biohashing for human authentication,” Pattern Recognition, vol. 40, no. 3, pp. 1057–1065, 2007.

[17] E. Maiorana, P. Campisi, J. Ortega-Garcia, and A. Neri, “Cancelable biometrics for HMM-based signature recognition,” in IEEE International Conference on Biometrics: Theory, Applications and Systems, BTAS, 2008, pp. 1–6.

[18] P. Campisi, E. Maiorana, and A. Neri, On-Line Signature-Based Authentication: Template Security Issues and Countermeasures. John Wiley & Sons, Inc., 2009, pp. 497–538.

[19] L. Nanni, E. Maiorana, A. Lumini, and P. Campisi, “Combining local, regional and global matchers for a template protected on-line signature verification system,” Expert Systems with Applications, vol. 37, no. 5, pp. 3676–3684, 2010.

[20] E. Maiorana, P. Campisi, and A. Neri, “Template protection for dynamic time warping based biometric signature authentication,” in Proceedings of the 16th international conference on Digital Signal Processing, ser. DSP’09. Piscataway, NJ, USA: IEEE Press, 2009, pp. 526–531.

[21] J. Daemen, S. Borg, and V. Rijmen, The Design of Rijndael: AES - The Advanced Encryption Standard, Springer-Verlag, Ed., 2002.

[22] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, no. 2, pp. 120–126, 1978.

[23] L. I. Kuncheva, Combining Pattern Classifiers, Methods and Algorithms. John Wiley and Sons, Inc., Hoboken, New Jersey, 2004.

[24] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Elsevier, 2005.

[25] I. H. Witten, E. Frank, L. Trigg, M. Hall, G. Holmes, and S. J. Cunningham, “Weka: Practical machine learning tools and techniques with Java implementations,” 1999.
