
Pattern Recognition 58 (2016) 172–189


A multi-objective approach towards cost effective isolated handwritten Bangla character and digit recognition

Ritesh Sarkhel, Nibaran Das*, Amit K. Saha, Mita Nasipuri
Computer Science and Engineering Department, Jadavpur University, Kolkata 700032, India

ARTICLE INFO

Article history:
Received 22 September 2015
Received in revised form 22 March 2016
Accepted 13 April 2016
Available online 22 April 2016

Keywords:
Feature set
Region sampling
Handwritten character recognition
Multi-objective evolutionary algorithm
Harmony search
NSGA-II
AFS theory

http://dx.doi.org/10.1016/j.patcog.2016.04.010
0031-3203/© 2016 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel./fax: +91 3324146766.
E-mail address: [email protected] (N. Das).

ABSTRACT

Identifying the most informative local regions of a handwritten character image is necessary for a robust handwritten character recognition system, but it is a difficult task. Performing it at the minimum possible cost is more challenging still, because two independent, apparently contradicting objectives must be optimized simultaneously: maximizing the recognition accuracy and minimizing the associated recognition cost. A multi-objective approach is therefore required. In the present work, two popular multi-objective optimization algorithms, (1) a Non-Dominated Sorting Harmony-Search Algorithm (NSHA) and (2) the Non-Dominated Sorting Genetic Algorithm-II (NSGA-II, Deb et al., 2002 [18]), are employed separately for region sampling. The method then objectively selects the most informative set of local regions, using the framework of Axiomatic Fuzzy Set (AFS) theory, from the sets of pareto-optimal solutions provided by the multi-objective region sampling algorithms. The system has been evaluated separately on two isolated handwritten Bangla datasets, (1) a dataset of randomly mixed handwritten Bangla Basic and Compound characters and (2) a dataset of handwritten Bangla numerals, with an SVM based classifier, using a feature set containing convex-hull based features and CG based quad-tree partitioned longest-run based local features extracted from the selected local regions. The results show a significant increase in recognition accuracy and a decrease in recognition cost for all the datasets. The present system thus introduces a cost effective approach towards isolated handwritten character recognition.

© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Optical Character Recognition (OCR) is an active area of research. While there are many systems commercially available for recognizing printed text [1–4], their success is yet to be extended to handwritten characters. Several reasons can be cited to explain this apparent anomaly. The shape and size of handwritten characters vary from one individual to another. They may even vary for a single individual from time to time, depending on various factors. These challenges make the task of recognizing handwritten characters very difficult. Researchers all around the world have proposed several methods [5] for handwritten character recognition, but most of them are focused on Roman scripts [6], concentrating on English and other European languages. Among Asian languages, Chinese [7], Japanese and Korean are dominant in the literature. Indian scripts like Malayalam, Tamil, Telugu, and Hindi have started to get the attention of researchers during the past decade [8,9], but development of OCR for the complete Bangla script [10] has not received much attention from researchers until recently. Bangla is the second most popular script in India and the fifth most popular script in the world [11]. The Bangla alphabet contains some of the most intricate and complex characters, which differ from one another only by a single period, a modifier ref, or an upper horizontal line or Matra, as shown in an example in Fig. 1. The Bangla alphabet contains about 50 Basic characters (11 vowels and 39 consonants) and more than 334 Compound characters [12]. Samples of a few Bangla Basic and Compound characters are shown in Fig. 2.

Fig. 1. Similarity in shape and size between different Bangla characters. (a) Bangla Basic character ‘ ’ (b) Bangla Basic character ‘ ’.

Fig. 2. Samples of handwritten Bangla characters.

R. Sarkhel et al. / Pattern Recognition 58 (2016) 172–189 173

One of the most common approaches taken up by OCR researchers is zoning, i.e. dividing the character image into several zones or local regions [13] and generating the invariant local feature set by extracting features from every local region. There are several different zoning methods [13] mentioned in the literature, but most of them can be classified into two major categories: static [4,10] and dynamic zoning methods [13]. Static zoning methods divide a handwritten character image into a fixed set of overlapping or non-overlapping windows, where the number of windows is fixed. Basu et al. used a static zoning method in [14]: they sub-divided the image of each handwritten numeral into 9 fixed-sized, overlapping local regions and extracted longest-run based features from each sub-region. On the other hand, dynamic zoning methods sub-divide a handwritten character image into local regions by dynamically creating windows based on some statistical or topological feature of that specific character. Cao et al. [8] proposed a similar technique to generate a hierarchical feature-space based on a quin-tree partition of the character image, where zones were dynamically created based on the centroid of the contour segment of the character residing in the parent zone. Das et al. [10,15] have used a GA based selection mechanism to find an optimal set of local regions for recognition of handwritten Bangla numerals.
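As a rough illustration of static zoning, the sketch below splits an H × W image into a fixed grid of windows with optional overlap. The function name `static_zones`, the 3 × 3 grid and the overlap margin are assumptions made for illustration; they are not the exact windows used in [14].

```python
def static_zones(height, width, rows=3, cols=3, overlap=0):
    """Return (top, left, bottom, right) windows covering a height x width image.

    Bounds are inclusive pixel indices. `overlap` widens every window by that
    many pixels on each side (clipped to the image), so neighbours overlap.
    """
    zones = []
    for r in range(rows):
        for c in range(cols):
            top = r * height // rows
            bottom = (r + 1) * height // rows - 1
            left = c * width // cols
            right = (c + 1) * width // cols - 1
            zones.append((
                max(0, top - overlap),
                max(0, left - overlap),
                min(height - 1, bottom + overlap),
                min(width - 1, right + overlap),
            ))
    return zones


# Hypothetical 32 x 32 image split into 9 overlapping windows.
zones = static_zones(32, 32, rows=3, cols=3, overlap=2)
print(len(zones))   # 9
print(zones[0])     # (0, 0, 11, 11)
```

The number of windows is fixed in advance and does not depend on the character being processed, which is what distinguishes this scheme from the dynamic zoning methods described above.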

In those papers, the researchers have emphasized achieving better recognition accuracies, but the associated recognition costs incurred in the process were not taken into consideration. For example, Das et al. presented a two pass approach towards handwritten character recognition in [10], which produced a significant increase in the recognition accuracy, but at a recognition cost almost 8.5 times the average per character recognition cost incurred by the traditional single pass approach towards handwritten character recognition. This may prove to be undesirable to users who want to use such a system for real-life applications. An extensive study of recognition accuracy versus associated recognition cost is undertaken in our experimental setup to investigate the scope of a practical optical character recognition system, in terms of both recognition accuracy and associated recognition cost.

In the present work, a multi-objective approach towards optical character recognition (OCR) is proposed, which attempts to find a trade-off between the recognition accuracy achieved by the system and its associated recognition costs. In real life applications of an OCR system, an insignificant increase in recognition accuracy at the expense of high recognition cost may not be acceptable to the users of the system. In such cases, a multi-objective approach can provide the user with a set of good solutions. In the present work, the framework of a novel, multi-objective isolated handwritten character recognition system is proposed. There are several variants of multi-objective Evolutionary Algorithms [16] present in the literature. A Non-dominated Sorting Harmony-search Algorithm (NSHA [17]) based region sampling method and a Non-dominated Sorting Genetic Algorithm-II (NSGA-II [18]) based region sampling method are introduced in our present work. These two multi-objective region sampling algorithms mark one of the contributions of the present work. Both of the region sampling algorithms are employed over the decision space separately. These algorithms have two objective functions: (1) maximizing handwritten character recognition accuracy and (2) minimizing associated recognition costs. In our experimental setup, recognition accuracy is measured using an SVM based classifier and recognition costs are measured by: (i) the average time taken by the recognition system to recognize each handwritten character in the test-set and (ii) the number of local regions used to represent each handwritten character in the test-set. The two sets of pareto-optimal solutions provided by these two algorithms are then combined using Axiomatic Fuzzy Set (AFS) Theory [19]. The AFS theory based approach to objectively combine the pareto-optimal solutions provided by the multi-objective algorithms is a further contribution of the present work.

The proposed method tries to find an objective solution over the decision space, while providing an optimal trade-off between recognition accuracy and corresponding recognition costs, making it suitable for use in practical applications. The present work has been evaluated on datasets of isolated handwritten Bangla characters and handwritten Bangla numerals separately. Results from these experiments have been compared with some of the other popular handwritten character recognition methods present in the literature, to demonstrate its superiority.

The rest of the paper is organized as follows: Section 2 presents a brief overview of multi-objective evolutionary algorithm based region sampling techniques; Section 3 introduces the basics of Axiomatic Fuzzy Set (AFS) theory; Section 4 describes the featureset; Section 5 discusses our present work in detail; and Section 6 presents the experimental results. Finally, a brief conclusion is drawn based on the results gathered from the experiments.

2. Motivation behind using multi-objective evolutionary algorithms for region sampling

Region sampling based OCR systems try to identify the most discriminative set of local regions from handwritten character images. The easiest approach is to exhaustively enumerate every possible combination of local regions until the best combination is found. This approach, however, takes exponential time. Therefore, Evolutionary Algorithm based meta-heuristic approaches are generally employed, so that a good enough solution is found within a reasonable amount of time.

Most of the proposed methods present in the handwritten character recognition literature [11], however, focus only on increasing the accuracy of the recognition system. Sometimes such increments in recognition accuracy may come at the expense of increased recognition cost. In such a scenario, where moderate increments in recognition accuracy are achieved by incurring high recognition costs, the solutions provided by a recognition system may not be acceptable to the end-user. A handwritten character recognition system, to be used in real-life applications, should not just increase the average recognition accuracy on its test dataset but keep a check on the associated recognition cost as well. Our present work uses Multi-objective Evolutionary Algorithms to find such a set of solutions. The multi-objective Evolutionary Algorithm based region sampling technique, used in our experimental setup, tries to identify the most informative set of local regions by heuristically searching the solution space, while incurring the minimum possible recognition cost.

To formalize, in our experimental setup, the multi-objective region sampling techniques try to optimize two separate objectives simultaneously, which are: (a) maximizing the recognition accuracy and (b) minimizing the cost associated with the recognition process, as described before. The recognition cost is represented in terms of two major factors: (i) the number of local regions needed to represent the handwritten characters and (ii) the average time taken by the proposed system to recognize each handwritten character. The resulting set of solutions, also called the pareto-optimal solutions [20], presents the users of the system with a set of choices. Users may choose a single solution from the pareto-optimal set, based on their real-time requirements and domain expertise.
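The notion of a pareto-optimal set used above can be sketched in a few lines. In the example, each candidate solution is a tuple (accuracy, time per character, number of regions); the tuple layout, the function names and the numbers are invented for illustration and are not results from our experiments.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on every objective and
    strictly better on at least one. Accuracy is maximized; time and region
    count are minimized."""
    acc_a, time_a, reg_a = a
    acc_b, time_b, reg_b = b
    no_worse = acc_a >= acc_b and time_a <= time_b and reg_a <= reg_b
    better = acc_a > acc_b or time_a < time_b or reg_a < reg_b
    return no_worse and better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Hypothetical (accuracy %, ms per character, regions used) triples.
candidates = [
    (95.0, 4.0, 12),
    (94.0, 3.0, 9),
    (93.0, 3.5, 10),   # dominated by (94.0, 3.0, 9)
    (90.0, 2.0, 5),
]
front = pareto_front(candidates)
print(front)   # [(95.0, 4.0, 12), (94.0, 3.0, 9), (90.0, 2.0, 5)]
```

None of the three surviving solutions is better than another on every objective at once, so the choice among them is left to the user.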

There are several variants of multi-objective Evolutionary Algorithms [21] present in the literature. In our present work, we have used a widely popular multi-objective Evolutionary Algorithm, the Non-dominated Sorting Genetic Algorithm-II (NSGA-II), proposed by Deb et al. [18]. Since its introduction, NSGA-II has been applied in various applications such as water network design [22], construction management [23], economics [24], population planning [25] etc. To the best of our knowledge, the performance of NSGA-II in a region sampling based handwritten character recognition system is yet to be explored. In the present work, an NSGA-II based region sampling algorithm is proposed. We have also included an extensive study of another multi-objective Evolutionary Algorithm in our experimental setup. The Harmony Search Algorithm is a relatively new, music inspired meta-heuristic algorithm, proposed by Geem et al. [26,44]. It has been shown [27] to provide better performance than Genetic Algorithms for some applications. Several variants of multi-objective Harmony Search Algorithms [28] are presented in the literature. They have been used to address problems such as economic/environmental dispatch [29], the optimal power flow problem [17] etc., but to the best of our knowledge, the performance of a multi-objective Harmony Search Algorithm in a region sampling based handwritten character recognition system is yet to be explored. Here, we have introduced a Non-dominated Sorting Harmony Search Algorithm (NSHA) based region sampling technique for this purpose.

Both the NSGA-II and the NSHA based region sampling techniques, proposed in our present work, have been tested extensively on publicly available datasets of isolated handwritten Bangla characters and isolated handwritten Bangla numerals. Details about the datasets used in the present work are provided in Section 4. Using multiple region sampling algorithms and comparing their results helps us to reach a conclusive decision about the performance of the proposed system, as the results become independent of the specificities of any particular algorithm in consideration.

In our present work, an Axiomatic Fuzzy Set (AFS) Theory based fuzzy logic is proposed to combine the candidate sets of pareto-optimal solutions extracted from the multi-objective region sampling algorithms used in our experimental setup and finally return a single solution. AFS Theory has been employed in our experimental setup because its framework provides much greater flexibility [19] in computing the interaction between a set of local regions. It is also easier to compute the combined class-separating power of a set of local regions, compared to other fuzzy logic based systems. Important aspects of a fuzzy logic based system, such as the definition of the membership function and the logical operations between a set of elements in the fuzzy featureset, are already defined [19] within the framework of AFS theory.

3. A Brief overview on axiomatic set theory based fuzzy logic

A priori identification of the set of local regions containing the most informative set of features is a difficult task. Local region or feature selection methods, described in the literature, can be broadly classified into two categories: wrappers and filters [30]. Wrappers identify the most informative set of features from the discourse with the help of a training dataset and an efficient learning algorithm, whereas filters use some kind of heuristic to identify the feature set that has the most promise. For applications like handwritten character recognition, the merit of a local region is interpreted by its contribution to the recognition system's class-separability power. Although there has been some literature that uses entropy based fuzzy interpretation [31] of local features in feature-ranking techniques for recognition of handwritten Hindi numerals, fuzzification of features to determine the feature set with the most class-separability power for handwritten character recognition has not been addressed much. A collection of individually good features does not ensure a good feature set [32]. Finding out all the possible combinations of local features is also a difficult and time consuming task [33]. Our present work uses the framework of AFS theory [19] based algebra to define fuzzy semantics, membership functions and logical operations on a set of local regions to identify the subset of local regions which has the most class-separating power.

3.1. AFS algebra based fuzzy logic

AFS algebra mainly focuses on two things: how fuzzy sets are created, i.e. the definition of membership functions, and how logical operations on fuzzy sets are defined. The concepts extracted from a given dataset depend strongly on the observed data [19]. For example, the concept of "heavy" (in terms of a person's weight) is not interpreted the same way in a dojo of Japanese Sumo wrestlers as it would be in a Ballet studio on Broadway. Strong additional background knowledge of the dataset is thus necessary to extract a concept.

A fuzzy concept is defined based on one or more features of the dataset. Let $M$ be the set of fuzzy concepts, $M = \{m_1, m_2, \ldots, m_n\}$, where a concept $m_i$ is defined on a feature $X_i \in F$. $M$ can be viewed as a building block, containing the elementary concepts associated to each feature. Every possible concept $Y$ on $X$ can be formulated and represented using $M$ as

$$Y = \sum_{j \in J} \left( \prod_{m_i \in C_j} m_i \right), \quad \text{where } m_i \in C_j \text{ and } C_j \subseteq M. \tag{1}$$

3.1.1. Definitions

Let $M$ be a non-empty set. Then the set $EM^*$ can be defined as

$$EM^* = \left\{ \sum_{j \in J} \left( \prod_{m_i \in C_j} m_i \right) \;\middle|\; C_j \in 2^M,\ j \in J,\ J \text{ may be any non-empty index set} \right\}. \tag{2}$$

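A binary relation $R$ between two fuzzy concepts is defined as follows. For $\sum_{j \in J} \left( \prod_{m_i \in C_j} m_i \right),\ \sum_{i \in I} \left( \prod_{m_k \in D_i} m_k \right) \in EM^*$,

$$\left[ \sum_{j \in J} \left( \prod_{m_i \in C_j} m_i \right) \right] R \left[ \sum_{i \in I} \left( \prod_{m_k \in D_i} m_k \right) \right] \iff \begin{cases} \forall\, C_j\ (j \in J)\ \exists\, D_i\ (i \in I) \text{ such that } C_j \supseteq D_i \\ \forall\, D_i\ (i \in I)\ \exists\, C_j\ (j \in J) \text{ such that } D_i \supseteq C_j \end{cases} \tag{3}$$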

It is clear from (3) that $R$ is an equivalence relation and that the semantics of $\sum_{j \in J} \left( \prod_{m_i \in C_j} m_i \right)$ and $\sum_{i \in I} \left( \prod_{m_k \in D_i} m_k \right)$ are equivalent under $R$.
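The relation $R$ of Eq. (3) can be checked mechanically. The following sketch assumes that a sum of products is stored as a list of frozensets, one frozenset per product term; this representation and the function name `equivalent` are illustrative choices, not part of the AFS framework itself.

```python
def equivalent(lhs, rhs):
    """Relation R of Eq. (3) between two sums of products.

    Each argument is a collection of frozensets; every frozenset is one
    product term C_j (or D_i) of simple concepts.
    """
    # Every C_j must contain some D_i, and every D_i must contain some C_j.
    every_c_covers_some_d = all(any(c >= d for d in rhs) for c in lhs)
    every_d_covers_some_c = all(any(d >= c for c in lhs) for d in rhs)
    return every_c_covers_some_d and every_d_covers_some_c


m1m2_plus_m1 = [frozenset({"m1", "m2"}), frozenset({"m1"})]
just_m1 = [frozenset({"m1"})]
just_m2 = [frozenset({"m2"})]

print(equivalent(m1m2_plus_m1, just_m1))   # True
print(equivalent(just_m1, just_m2))        # False
```

In the first case the term $m_1 m_2$ is redundant next to $m_1$, so the two expressions fall into the same equivalence class under $R$.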

The quotient $EM^*/R$ is called $EM$, and each

$$\sum_{i \in I} \left( \prod_{m_k \in D_i} m_k \right) \in EM \tag{4}$$

is called a fuzzy concept.

3.2. AFS structure

AFS structure formalizes a mathematical description of the data structure used by the AFS algebra, a completely distributive lattice [19] generated by the fuzzy datasets and the concepts behind them. An AFS structure can be defined as a triplet $(M, \tau, X_0)$.

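3.2.1. Definition

Let $X_0$ and $M$ be two finite sets, and let $\tau$ be a relation defined as $\tau: X_0 \times X_0 \to 2^M$. $\tau$ is called an AFS structure if it satisfies the following two axioms:

$$\begin{aligned} &(a)\ \forall (x_0, y_0) \in X_0 \times X_0:\ \tau(x_0, y_0) \subseteq \tau(x_0, x_0) \\ &(b)\ \forall (x_0, y_0), (y_0, z_0) \in X_0 \times X_0:\ \tau(x_0, y_0) \cap \tau(y_0, z_0) \subseteq \tau(x_0, z_0) \end{aligned} \tag{5}$$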

In this definition, $X_0$ is the universe of discourse, $M$ is the concept set and $\tau$ is the axiomatic structure. In real world applications, $\tau$ can be defined as follows:

$$\tau(x_0, y_0) = \{ m \mid m \in M,\ x_0 R_m y_0 \} \in 2^M, \tag{6}$$

where $R_m$ represents the binary relation of the simple concept $m \in M$, and $x_0 R_m y_0$ means that the degree of $x_0$ belonging to attribute $m$ is larger than or equal to that of $y_0$.
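To make Eq. (6) concrete, the sketch below builds $\tau$ from a small table of degrees and verifies axioms (a) and (b) by brute force. The dictionary layout, the function names and the sample degrees are assumptions chosen for illustration only.

```python
from itertools import product


def tau(data, x, y):
    """tau(x, y): the concepts m for which x's degree is >= y's degree."""
    return frozenset(m for m in data[x] if data[x][m] >= data[y][m])


def satisfies_afs_axioms(data):
    """Check axioms (a) and (b) of the AFS structure on every sample tuple."""
    samples = list(data)
    for x, y in product(samples, repeat=2):
        if not tau(data, x, y) <= tau(data, x, x):                      # axiom (a)
            return False
    for x, y, z in product(samples, repeat=3):
        if not (tau(data, x, y) & tau(data, y, z)) <= tau(data, x, z):  # axiom (b)
            return False
    return True


# Hypothetical degrees of three samples on two simple concepts.
data = {
    "x1": {"m1": 0.9, "m2": 0.2},
    "x2": {"m1": 0.4, "m2": 0.7},
    "x3": {"m1": 0.4, "m2": 0.1},
}
print(sorted(tau(data, "x1", "x2")))   # ['m1']
print(satisfies_afs_axioms(data))      # True
```

Both axioms hold for any such table, because $\tau(x_0, x_0)$ always contains every concept and the comparison "larger than or equal to" is transitive.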

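The membership function of a fuzzy concept $\gamma = \sum_{j \in J} \left( \prod_{m_i \in C_j} m_i \right)$ is defined as follows:

$$\theta_\gamma = \sup_{j \in J} \prod_{\gamma \in C_j} \mathcal{M}_\gamma \left( C_j^{\tau}(x_0) \right), \quad \text{where } C^{\tau}(x_0) = \{ z_0 \in X_0 \mid \tau(x_0, z_0) \supseteq C \}. \tag{7}$$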

$C^{\tau}(x_0)$ is the set of all elements whose degree of belonging to $\prod_{m \in C} m$ is less than or equal to that of $x_0$, and $\mathcal{M}_\gamma(\cdot)$ is a function whose value always lies between 0 and 1.

4. Design of the featureset

4.1. Dataset of the experiment

The proposed method has been evaluated on four publicly available benchmark [12,34] datasets. A brief overview of the datasets used in the present work is given in Table 1. More details about these databases, such as sample collection and preparation techniques, can be found in the sources listed in the 'Reference' column of Table 1.

Table 1
Datasets used in the present work.

| Index | Name of the dataset | Dataset type | Number of classes | Number of training samples | Number of test samples | Reference |
|---|---|---|---|---|---|---|
| DB1 | CMATERdb 3.1.1 | Isolated, handwritten Bangla numerals | 10 | 4000 | 2000 | [35,36] |
| DB2 | ISI handwritten Bangla numeral database | Isolated, handwritten Bangla numerals | 10 | 19392 | 4000 | [37,38] |
| DB3 | CMATERdb 3.2.1 | Isolated, handwritten Bangla Basic characters | 50 | 12000 | 3000 | [35,36] |
| DB4 | CMATERdb 3.2.1 + CMATERdb 3.1.3.3 | Randomly mixed, isolated, handwritten Bangla Basic and Compound characters | 384 | 46919 | 11661 | [36,39] |

4.2. Design of the feature set

The extracted features used in the present work can be classified into two categories: global and local. While the number of global features extracted from a handwritten character or digit image is fixed, the number of local features may vary based on the number of local regions considered. Global features are extracted from the whole image, whereas local features are extracted from a sub-region or local region [13] of the image being considered. Both global and local features are used in the feature set so that the inherent distinguishing pattern of a character or digit can be sufficiently quantified.

4.2.1. Global features

The number of global features used in our experimental setup is 175, out of which 155 are convex-hull based features and the rest are quad-tree based longest-run features. Convex hull based features, proposed by Das et al. [40], are used in the present work. The feature set includes maximum dcp, total number of rows having dcp > 0, average dcp, number of visible bays, etc. CG based quad-tree partitioned longest-run features were suggested by Basu et al. in the year 2009. Longest-run features extracted from the root node (1 × 4 = 4) and the first-level child nodes (4 × 4 = 16) of the resultant quad-tree also contribute to the global feature set (4 + 16 = 20). Details about the feature extraction techniques can be found in reference [41].

4.2.2. Local features

The features extracted from the second level of child nodes of the CG based quad-tree [42] contribute to the local feature set. The number of nodes in the second level of the quad-tree is 16 (4² = 16). Longest-run based features are extracted from each local region represented by a node in the quad-tree, along four axes: horizontal, vertical and the two diagonals. Each local region is thus represented by 4 local features. Hence, the total number of local features in the feature set of our present work is 64 (16 × 4 = 64) and the aggregate number of features in the integrated feature set of the experimental setup is 239 (175 + 64 = 239).


A brief overview of the feature extraction techniques employed in our present work is presented in Fig. 3.
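The bookkeeping behind the feature counts can be illustrated with a small sketch of a CG based quad-tree. It assumes that each node is split at the (integer) centre of gravity of its foreground pixels and falls back to the geometric centre for an empty node; the function names, the clamping rule and the toy 16 × 16 image are our own assumptions, and the exact partitioning rule of [42] may differ.

```python
def center_of_gravity(img, top, left, bottom, right):
    """Mean (row, col) of foreground pixels in the inclusive window; falls
    back to the geometric centre when the window holds no foreground."""
    rows = cols = count = 0
    for i in range(top, bottom + 1):
        for j in range(left, right + 1):
            if img[i][j]:
                rows, cols, count = rows + i, cols + j, count + 1
    if count == 0:
        return (top + bottom) // 2, (left + right) // 2
    return rows // count, cols // count


def cg_quadtree_levels(img, depth=2):
    """Return a list of levels; level d holds 4**d windows (top, left, bottom, right).
    Assumes every window that gets split is at least 2 pixels high and wide."""
    levels = [[(0, 0, len(img) - 1, len(img[0]) - 1)]]
    for _ in range(depth):
        children = []
        for top, left, bottom, right in levels[-1]:
            ci, cj = center_of_gravity(img, top, left, bottom, right)
            # Clamp the split point so that all four children are non-empty.
            ci = min(max(ci, top), bottom - 1)
            cj = min(max(cj, left), right - 1)
            children += [(top, left, ci, cj), (top, cj + 1, ci, right),
                         (ci + 1, left, bottom, cj), (ci + 1, cj + 1, bottom, right)]
        levels.append(children)
    return levels


# Hypothetical 16 x 16 binary image with a regular diagonal pattern.
img = [[1 if (i + j) % 3 == 0 else 0 for j in range(16)] for i in range(16)]
levels = cg_quadtree_levels(img, depth=2)

DIRECTIONS = 4                 # horizontal, vertical and two diagonals
CONVEX_HULL_FEATURES = 155     # count stated in Section 4.2.1
global_count = CONVEX_HULL_FEATURES + DIRECTIONS * (len(levels[0]) + len(levels[1]))
local_count = DIRECTIONS * len(levels[2])
print(global_count, local_count, global_count + local_count)   # 175 64 239
```

Levels 0 and 1 of the tree feed the global feature set, while the 16 windows at level 2 are the local regions among which the region sampling algorithms choose.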

5. Present work

As discussed before, the objectives of our present work are threefold: (a) proposing a region sampling methodology that returns a subset of local regions containing the most discriminating features, while incurring the minimum possible recognition cost; (b) evaluating the performance of the proposed method on two separate, publicly available datasets of Bangla handwritten characters; and (c) comparing the performance of the proposed method against some of the popular contemporaries present in the literature. Fig. 4 presents a block diagram of the proposed system.

5.1. Definitions and notations

Let $I_{H \times W}$ be a 2D array that denotes a digital image $M$ of dimensions $H \times W$, such that $I_{H \times W} = \{ f(i, j) \mid 0 \le i \le H - 1 \text{ and } 0 \le j \le W - 1 \}$, where $f(i, j)$ denotes the intensity of the pixel at position $(i, j)$. In our present work, only binary images of Bangla handwritten characters and numerals are considered. For a binary image, $f(i, j) \in \{0, 1\}$. As discussed before, a 2D binary image is assumed to be a combination of a number of overlapping/non-overlapping regions $R_k$, $k \in \{1, 2, \ldots, n\}$, i.e. $M = \bigcup_k R_k$. Here the regions $R_k$, $k \in \{1, 2, \ldots, n\}$, are rectangular in shape with edges parallel to the corresponding edges of $M$. Thus, a region $R_k$ is defined by the pixel-pairs at its top-left and bottom-right corners, i.e. $R_k \equiv \{ i_k^{TL}, j_k^{TL}, i_k^{BR}, j_k^{BR} \}$, where $(i_k^{TL}, j_k^{TL})$ denotes the pixel at the top-left corner of the region and $(i_k^{BR}, j_k^{BR})$ denotes the pixel at the bottom-right corner of the region. The goal of the present work is to find a subset $M_i \subseteq M$, such that the recognition cost of the character image (when described by the set of regions $M_i$) is minimal and the recognition accuracy achieved by the proposed system is maximal.

5.2. Region sampling methodology

Fig. 3. Feature extraction techniques implemented under present work.

As discussed before, the local feature set used in the present work comprises longest-run based features [15] computed along 4 directions, extracted from the second level of the CG based quad-tree partition of a sample handwritten character or digit image. Let $i$ denote the partition level of a CG based quad-tree, where $i \in \{0, 1, 2\}$, and let $S_i$ denote the set of local regions at the $i$-th level of the quad-tree; then $S_i = \{ R_{ij} \mid j \in \{0, 1, \ldots, 4^i - 1\} \}$. Clearly, a sample image $M$ can be defined as $M = \bigcup_j R_{ij}$. Let the set of local features extracted from the region $R_{ij}$ of $M$ be denoted by $V_{ij}$, where $V_{ij} = \{ F_{ij}^{H}, F_{ij}^{V}, F_{ij}^{D1}, F_{ij}^{D2} \}$. $F_{ij}^{H}$, $F_{ij}^{V}$, $F_{ij}^{D1}$ and $F_{ij}^{D2}$ denote the values of the longest-run based feature extracted from the local region $R_{ij}$ of the handwritten character or numeral image along its four major axes, i.e. the horizontal axis, the vertical axis, principal diagonal axis-1 (south-west diagonal) and principal diagonal axis-2 (north-east diagonal) respectively, as shown in Fig. 8. The global feature set of $M$ is defined as $G = \left( \bigcup_{i \in \{0, 1\},\, j} V_{ij} \right) \cup CF$, where $CF$ denotes the convex-hull based features extracted from the entire image. The local feature set of $M$ is defined as $L = \bigcup_{i = 2,\, j} V_{ij}$. The fitness value of a set of regions $L_m \subset L$ is denoted by $f(L_m \cup G)$, where $f(\cdot)$ denotes the recognition accuracy achieved by an SVM based classifier on the sample handwritten character or digit image, described by only the set of local regions $L_m$.
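A possible computation of the four values in $V_{ij}$ is sketched below. It assumes one common reading of the longest-run feature: for every pixel line in a given direction, take the longest run of consecutive foreground pixels, and sum these lengths over all lines of the region. The function names, the labels `D1`/`D2` and the absence of any normalization are assumptions; the definition in [15] should be consulted for the exact form.

```python
def longest_run(line):
    """Length of the longest run of consecutive foreground (1) pixels."""
    best = current = 0
    for pixel in line:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


def directional_lines(region):
    """Group the pixel lines of a rectangular region by direction."""
    h, w = len(region), len(region[0])
    rows = [list(r) for r in region]
    cols = [[region[i][j] for i in range(h)] for j in range(w)]
    # Diagonals with constant j - i, then diagonals with constant i + j.
    diag1 = [[region[i][i + k] for i in range(h) if 0 <= i + k < w]
             for k in range(-(h - 1), w)]
    diag2 = [[region[i][k - i] for i in range(h) if 0 <= k - i < w]
             for k in range(h + w - 1)]
    return {"H": rows, "V": cols, "D1": diag1, "D2": diag2}


def longest_run_features(region):
    """Sum of per-line longest runs along each of the four directions."""
    return {axis: sum(longest_run(line) for line in lines)
            for axis, lines in directional_lines(region).items()}


# Hypothetical 3 x 4 binary region.
region = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
print(longest_run_features(region))   # {'H': 5, 'V': 5, 'D1': 4, 'D2': 5}
```

Applying `longest_run_features` to each of the 16 second-level regions would give the 64 local feature values from which $L_m$ is drawn.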

The present work performs region sampling in two phases. In the first phase of the proposed system, a non-dominated sorting harmony-search algorithm (NSHA) based region sampling method and a non-dominated sorting genetic algorithm (NSGA-II by Deb et al. [18]) based region sampling method are employed on the dataset separately, and the non-dominated points are extracted from both of the pareto-optimal fronts. In the final phase, a quality consensus is performed among the extracted points. Local regions with a majority of votes (more than 50% vote share) are preserved separately and the rest of the candidate regions are reinterpreted as fuzzy features with the help of AFS theory [43]. The subset of regions with the most class-separating power is selected. Finally, the union of the fuzzily selected subset of local regions and the set of previously separated regions is returned. Fig. 4 shows a schematic diagram of the proposed system.
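The voting step of the final phase can be sketched as follows. The function name `split_by_vote`, the representation of each non-dominated point as a set of region indices and the sample points are assumptions made for illustration; the AFS based selection among the remaining candidates is not shown.

```python
from collections import Counter


def split_by_vote(points):
    """Split regions into those chosen by more than half of the points and
    the remaining candidates that appear in at least one point."""
    votes = Counter(region for point in points for region in set(point))
    majority = {r for r, v in votes.items() if v / len(points) > 0.5}
    candidates = set(votes) - majority
    return majority, candidates


# Hypothetical non-dominated points pooled from both pareto-optimal fronts.
points = [{0, 3, 5}, {3, 5, 9}, {3, 7}, {2, 3, 5}]
majority, candidates = split_by_vote(points)
print(sorted(majority))     # [3, 5]
print(sorted(candidates))   # [0, 2, 7, 9]
```

In this toy example regions 3 and 5 would be preserved directly, while regions 0, 2, 7 and 9 would be passed on to the AFS based selection.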

5.2.1. The Non-dominated Sorting Genetic Algorithm (NSGA-II) based region sampling methodology

Fig. 4. Schematic representation of the integrated system developed under present work.

Since its introduction, NSGA-II [18] has been widely used in various applications such as water network design [22], construction management [23], economics [24], population planning [25] etc. In our present work, we have investigated the efficacy of NSGA-II for the selection of the most discriminating set of local regions from a handwritten character or digit image, while incurring the minimum possible recognition cost. In the following section, an NSGA-II based region sampling methodology is introduced for this purpose.

(1) Initialization: The algorithm is initialized with an empty set ß; upon termination ß contains the non-dominated points extracted from the pareto-optimal front. The population size is fixed at the number of possible non-trivial subset cardinalities of the local region set. In our present work, the population size of the NSGA-II based region sampling methodology is 15. The crossover probability (pc) is set to 0.9 and the mutation probability (pm) is set to 1/(total number of local regions), i.e. 0.0625 in our present work, as suggested by Deb et al. [18].

(2) Identification of the pareto-optimal points in search-space: Through reproduction and natural selection, the algorithm heuristically tries to identify the set of local regions having the most class-separability power, while incurring minimal recognition cost. The region selection method is described in detail in Appendix A as Algorithm 1.

(3) Objective functions: The algorithm has two objectives: (1) minimization of the average recognition cost per handwritten character/digit and (2) maximization of the recognition accuracy on the test dataset, using an SVM based classifier. The recognition cost is expressed by (i) the average per character recognition time and (ii) the number of local regions used to represent each handwritten character/digit.

Fig. 5. Binary operators used in a genetic algorithm.

Fig. 6. Crowding distance of point P is the perimeter of the cuboid shown in dashed line.


To formalize, the optimization problem can be described as follows:

$$\begin{cases} \text{Maximize } f_A & (O1) \\ \text{Minimize } f_T & (O2) \\ \text{Minimize } f_R & (O3) \end{cases}$$

with respect to the following constraints:

$$\begin{cases} f_A \ge 0 & (C1) \\ f_T > 0 & (C2) \\ f_R = |V_R| & (C3) \\ 1 \le f_R < 16 & (C4) \end{cases}$$

where $V_R$ denotes the region-vector representing the encoding of local regions returned by the algorithm, $f_A$ denotes the recognition accuracy achieved by the SVM based classifier using the features extracted from the global featureset $G$ and the local featureset $F_R$ extracted from $V_R$ (as discussed in Section 4.2), $f_T$ denotes the average per character recognition time achieved by the proposed system and $f_R$ denotes the number of local regions used to represent each handwritten character/digit.

(4) Termination criteria: The algorithm terminates once it has successfully reproduced 25 generations of the initial population. Hence, the maximum number of iterations (NI) of the algorithm is 25.

Pseudocode of the proposed NSGA-II based region sampling algorithm, together with a detailed analysis of the algorithm, is given in Appendix A as Algorithm 1.
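A minimal sketch of how a candidate can be encoded and checked against constraints (C1) to (C4), together with a crowding-distance measure in the style of NSGA-II [18] for ranking the points of one front. The 16-bit region vector, the helper names and the sample objective values are assumptions; in the actual system $f_A$ and $f_T$ come from training and timing the SVM based classifier, which is not reproduced here.

```python
import math

NUM_REGIONS = 16
P_MUTATION = 1 / NUM_REGIONS   # 0.0625, as stated in step (1)


def is_feasible(region_vector, f_a, f_t):
    """Constraints (C1)-(C4): f_A >= 0, f_T > 0, f_R = |V_R|, 1 <= f_R < 16."""
    f_r = sum(region_vector)   # number of selected regions
    return (len(region_vector) == NUM_REGIONS
            and f_a >= 0 and f_t > 0 and 1 <= f_r < NUM_REGIONS)


def crowding_distance(front):
    """Crowding distance of each objective tuple in one non-dominated front.
    Boundary points of every objective receive an infinite distance."""
    n = len(front)
    distance = [0.0] * n
    for k in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][k])
        low, high = front[order[0]][k], front[order[-1]][k]
        distance[order[0]] = distance[order[-1]] = math.inf
        if high == low:
            continue
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            distance[order[pos]] += gap / (high - low)
    return distance


# Hypothetical candidate using regions 3 and 5 only.
vector = [1 if r in (3, 5) else 0 for r in range(NUM_REGIONS)]
print(is_feasible(vector, f_a=92.5, f_t=3.1))   # True

# Hypothetical front of (f_A, f_T, f_R) tuples.
front = [(95.0, 4.0, 12), (94.0, 3.0, 9), (90.0, 2.0, 5)]
print(crowding_distance(front))                 # [inf, 3.0, inf]
```

Points with a larger crowding distance lie in sparser parts of the front and are preferred when two candidates share the same non-domination rank.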

5.2.2. A Non-dominated Sorting Harmony-search (NSHA) based region sampling methodology

Harmony search is a derivative-free, nature-inspired meta-heuristic algorithm, proposed by Geem et al. [22] in the year 2001. It mimics a musician's journey towards a better state of harmony. Since then, it has been used extensively by researchers for various applications [45,46]. Several variations of the multi-objective harmony search algorithm have also been proposed by researchers [28], but the performance of a multi-objective harmony search algorithm in a region sampling based handwritten character recognition system is yet to be explored. To address this, a non-dominated sorting harmony-search based region sampling method is proposed in our present work. The proposed method uses the non-dominated sorting and crowding-distance based ranking technique, introduced by Deb et al. in [18], to extract the pareto-optimal solutions from the solution space. The exploration (harmony memory consideration rate or HMCR) and exploitation (pitch adjustment rate or PAR) parameters used by this algorithm are dynamically adjusted with respect to the current generation of the population.

In our experimental setup, a comparative study between the proposed algorithm and a basic harmony search based region sampling method was performed on the dataset of randomly mixed handwritten Bangla Basic and Compound characters [39,12]. Our present work has provided a 14.2965% increase in recognition accuracy and a 6.25% decrease in the associated recognition cost, compared to the basic harmony search based region sampling method [47].

(1) Initialization: The algorithm is initialized with an empty set ß; upon termination, ß contains the non-dominated points extracted from the pareto-optimal front. Harmony memory size (HMS) is fixed at the number of possible non-trivial subset cardinalities of the set of local regions. In the present work, HMS is set to 15. The parameters harmony memory consideration rate (HMCR) and pitch adjustment rate (PAR) are self-adaptive. The value of HMCR at the t-th population generation is denoted by HMCR_t and the value of PAR at the t-th population generation is denoted by PAR_t, defined as follows:

$$\mathrm{HMCR}_t = \mathrm{HMCR}_{\min} + (\mathrm{HMCR}_{\max} - \mathrm{HMCR}_{\min}) / t \times NI \qquad (5)$$

$$\mathrm{PAR}_t = \mathrm{PAR}_{\max} - (\mathrm{PAR}_{\max} - \mathrm{PAR}_{\min}) / t \times NI \qquad (6)$$

PARmax and PARmin denote the maximum and minimum values of the pitch adjustment rate (PAR), whereas HMCRmax and HMCRmin denote the maximum and minimum values of the harmony memory consideration rate (HMCR). The values of HMCRmax and HMCRmin are set to 1.0 and 0.9 respectively, whereas the values of PARmax and PARmin are 1.0 and 0.0 respectively, as suggested by Pan et al. [48].

(2) Identification of the pareto-optimal points in search-space: The algorithm heuristically searches for the most informative set of local regions that will incur minimal recognition cost. The region selection method employed in the present work is described in detail in Appendix A as Algorithm 2.

Fig. 7. Analogy of musical improvisation and optimization technique in Harmony Search Method.

Fig. 8. Directional vector model of a region in residual regions-space.

(3) Objective functions: The algorithm has two objectives: (1) minimization of the average recognition cost per handwritten character/digit and (2) maximization of the recognition accuracy on the test dataset, using an SVM-based classifier. The recognition cost is expressed by (i) the average per-character recognition time and (ii) the number of local regions used to represent each handwritten character/digit, as in the NSGA-II based region sampling method described before. The optimization problem can therefore be formalized in the same way as for the NSGA-II based region sampling algorithm.

(4) Termination criteria: The algorithm terminates once it has produced 25 generations from the initial population. Hence, the maximum number of iterations (NI) of the algorithm is 25. Pseudocode and a detailed analysis of the proposed NSHA based region sampling algorithm are given in Appendix A as Algorithm 2.

Both algorithms proposed in our experimental setup have been extensively tested on a dataset of randomly mixed handwritten Bangla Basic and Compound characters and, separately, on a dataset of Bangla numerals. The results of the experiments are shown in Section 6. It is to be noted that both algorithms discussed above are developed independently of any specific dataset and can be applied to any region sampling based pattern recognition task.
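The non-dominated filtering on which both region samplers rely can be illustrated with a minimal sketch. It assumes each candidate solution is summarized by the triplet (recognition accuracy, average per-character recognition time, number of local regions); the function names and the sample values are hypothetical and are not taken from the authors' implementation. The quadratic scan shown here is the simplest way to extract the first front, not the fast non-dominated sort of Deb et al. [18].

```python
from typing import List, Tuple

# (accuracy in %, recognition time in ms, number of local regions)
Solution = Tuple[float, float, int]


def dominates(p: Solution, q: Solution) -> bool:
    """Return True if p is no worse than q on every objective and strictly
    better on at least one (accuracy is maximized, both costs are minimized)."""
    no_worse = p[0] >= q[0] and p[1] <= q[1] and p[2] <= q[2]
    strictly_better = p[0] > q[0] or p[1] < q[1] or p[2] < q[2]
    return no_worse and strictly_better


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    return [p for p in solutions
            if not any(dominates(q, p) for q in solutions)]


# Hypothetical candidate solutions, for illustration only.
candidates = [
    (97.8, 16.5, 11),
    (97.2, 17.6, 16),
    (97.0, 15.0, 10),
    (96.0, 16.0, 12),
]
front = pareto_front(candidates)
```

In this toy input the second candidate is dominated by the first and the fourth by the third, so only two points remain on the front.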

5.3. Integrating the pareto-optimal solutions using AFS theory based fuzzy logic

Applying NSHA and NSGA-II to a test dataset of handwritten Bangla characters/numerals (discussed in Sections 5.2.1 and 5.2.2) produces two pareto-optimal sets of local region subsets. Let the pareto-optimal set returned by NSHA be denoted by ß1 and the pareto-optimal set returned by NSGA-II be denoted by ß2. The present work uses a novel region sampling method to objectively choose the most discriminative set of regions, using the information gathered from ß1 and ß2. This is one of the major contributions of the present work.

It has been shown in [21] that elitism helps accelerate the convergence of a multi-objective evolutionary algorithm. To choose a solution from the sets of candidate solutions in ß1 and ß2, a quality consensus is performed. The regions with majority consensus are excluded from the region-space for further consideration and maintained separately henceforth. Let this set of regions be denoted by E. The resultant region space VR forms the vocabulary for fuzzy concepts and semantics; it is called the residual region-space. Clearly, VR = E^c.

As previously discussed in Section 3, one of the main motivations for using fuzzy logic is the better interpretability of complex concepts, which models the human ability to make decisions with imprecise understanding or a lack of background knowledge. In the final phase of the proposed method, the residual region-space is reinterpreted as a fuzzy feature set. The steps are discussed in the following sections.

5.3.1. Scalarization of the residual feature set

As discussed earlier in Section 4.2.2, longest-run features are extracted along four axes from each local region. Considering a unit vector along the direction of each of the four axes, h (horizontal), v (vertical), d1 (principal diagonal 1) and d2 (principal diagonal 2), each region can be represented as a vector of the longest run of black pixels (shown in Fig. 3b) along each of the four directions. The first step of the final phase of the proposed method is scalarization of the local feature set extracted from each region. Let the scalarized feature set be denoted by S. The proposed directional vector model of longest-run based features used in the present work is shown in Fig. 8.

In the present work, each local region is scalarized by representing it as a vector of directional components.

Table 2
Values of the parameters used in our experimental framework.

| Region sampling methodology | Parameter values used in the present work |
|---|---|
| NSHA [17] based region sampling methodology | HMCRmax = 1.0, HMCRmin = 0.9, PARmax = 1.0, PARmin = 0.0, POP = 15, NI = 25 |
| NSGA-II [18] based region sampling methodology | pm = 0.0625, pc = 0.9, POP = 15, NI = 25 |

Example:
Suppose the feature set extracted from a local region R is given below:

| Vertical | Horizontal | Principal Diagonal 1 | Principal Diagonal 2 |
|---|---|---|---|
| A | B | C | D |

Using the proposed vector model, the region R can be represented as

$$R = A\,v + B\,h + C\,d_1 + D\,d_2 \qquad (7)$$

Using vector decomposition, (7) can be rewritten as

$$R = A\,v + B\,h + C\,(h \sin 45^\circ - v \cos 45^\circ) + D\,(h \sin 45^\circ + v \cos 45^\circ)$$

which yields

$$R = (A - C \cos 45^\circ + D \cos 45^\circ)\,v + (B + C \sin 45^\circ + D \sin 45^\circ)\,h \qquad (8)$$

Hence, the local region R can be scalarized to

$$\sqrt{(A - C \cos 45^\circ + D \cos 45^\circ)^2 + (B + C \sin 45^\circ + D \sin 45^\circ)^2}.$$

In the feature set S, there will be a scalarized feature against each of the regions of the residual region-space.
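As a worked illustration of the scalarization in (8), the following minimal sketch computes the magnitude of the decomposed vector for one region. The function name `scalarize_region` and the sample values are hypothetical; only the formula itself comes from the text above.

```python
import math


def scalarize_region(a: float, b: float, c: float, d: float) -> float:
    """Scalarize a region whose longest-run features are a (vertical),
    b (horizontal), c (principal diagonal 1) and d (principal diagonal 2),
    following the decomposition in Eq. (8)."""
    cos45 = math.cos(math.radians(45))
    sin45 = math.sin(math.radians(45))
    v_component = a - c * cos45 + d * cos45
    h_component = b + c * sin45 + d * sin45
    # Magnitude of the resultant vector in the (v, h) plane.
    return math.hypot(v_component, h_component)


# Hypothetical longest-run values for a single local region.
scalar = scalarize_region(3.0, 4.0, 0.0, 0.0)
```

With both diagonal components set to zero the expression reduces to the Euclidean norm of (A, B). When C and D are equal, their vertical contributions cancel and only the horizontal component grows.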

5.3.2. Defining the fuzzy feature set

The fuzzy features are defined such that they reflect the proximity of a sample character image to its true class with respect to all the other possible classes. A feature f ∈ S is represented as the fuzzy feature u_f. The value of u_f for a handwritten character/digit image x ∈ X is denoted by u_f^x. It is defined as follows:

$$u_f^x = \frac{\left( \dfrac{\mathrm{spread}(class_x)}{x_f - mean_f^{class_x}} \right)}{\sum_{i \in C} \left( \dfrac{\mathrm{spread}(i)}{x_f - mean_f^{i}} \right)} \times v_f \qquad (9)$$

where x_f is the value of the feature f for element x, class_x is the true class of x, C denotes the set of all possible classes, mean_f^i is the mean of the values of the feature f in class i, v_f is the vote share of the local region corresponding to the feature f in the previously performed quality consensus on the pareto-optimal sets extracted from both NSGA-II and NSHA, and spread(i) denotes the spread of the elements in class i (the standard deviation is used for this measure in the present work).

It has been mentioned in the literature [49] that associating too many fuzzy sets and rules can make the fuzzy model hard to interpret, thus defeating the purpose of fuzzifying the original featureset in the first place. In the present work, a one-to-one mapping is maintained between a feature and its corresponding fuzzy feature; therefore, the dimensionality of the fuzzy featureset does not exceed that of the original featureset. The cardinalities of the fuzzy feature set UF and the scalarized feature set S are the same, as the mapping between UF and S is one-to-one.

In a system with more than two elements, the overall interaction between a set of elements is best understood by investigating the interaction between a pair of elements and then their combination with other elements. Similarly, in the present work, a subset of features is selected by investigating the interaction between a pair of features first and then combining them with other features, such that the resultant subset of features with the best overall interaction between them is returned.

5.3.3. Computing the class-separation power of a set of fuzzy features

Using AFS theory [43], the membership function of a pair of fuzzy features u1, u2 ∈ UF can be calculated by computing θ_{u1u2}(x) (as shown in Section 3). The membership function denotes the apparent class-separation power of the combination of the two features u1, u2. Another metric for class-separation is introduced in the present work, denoted by norm(u1, u2). It is defined in the following way:

$$\mathrm{norm}(u_1, u_2) = \frac{\sum_{x \in X} \theta_{u_1 u_2}(x)}{|X|} \qquad (10)$$

5.3.4. Selecting the set of regions with the most class-separating power

To consider all possible combinations of local regions, a 16 × 16 matrix M is formed (the number of local regions in our experimental setup is 16), where each row and column corresponds to a local region from the residual region-space. M[i,j] denotes the combined class-separation power of the i-th and j-th features from the fuzzy feature set UF, measured by the metric defined in (10). The diagonal elements are set to zero to nullify the effect of combining the i-th feature with itself. The task of selecting the most informative set of local regions now reduces to finding the maximum-sum k × k non-trivial sub-matrix of M. The sub-matrix is formed such that if the i-th row is selected, the i-th column must also be selected [50], which essentially means that the i-th region is being considered. The k × k maximum-sum sub-matrix returns the best k features from the fuzzy featureset. As there is a one-to-one mapping between the original featureset and the fuzzy featureset, this identifies the k best features from the residual featureset. Each feature in the residual featureset has a one-to-one mapping with a local region in the residual region-space; therefore, the k best local regions from the residual region-space are obtained at the end of this method. Pseudocode of the algorithm described above is given in Appendix A as Algorithm 3.

Finally, the union of the set of k selected regions and the elitist set of regions E, which was previously separated and maintained, is returned as the best candidate solution.
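The selection of the maximum-sum k × k sub-matrix, under the constraint that the same indices are used for rows and columns, can be sketched by exhaustive enumeration. This is an assumed brute-force strategy chosen for clarity, not necessarily the procedure of [50]; the function name and the 4 × 4 toy matrix are hypothetical. With 16 regions the enumeration stays small (at most 12,870 subsets, reached at k = 8).

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def best_k_regions(m: Sequence[Sequence[float]], k: int) -> Tuple[Tuple[int, ...], float]:
    """Return the k indices whose principal sub-matrix of m has the largest
    sum, together with that sum. Diagonal entries are ignored, which matches
    setting them to zero."""
    n = len(m)
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and the number of regions")
    best_subset: Tuple[int, ...] = tuple(range(k))
    best_score = float("-inf")
    for subset in combinations(range(n), k):
        score = sum(m[i][j] for i in subset for j in subset if i != j)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical pairwise class-separation scores for four regions.
toy_matrix: List[List[float]] = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.3, 0.8],
    [0.1, 0.3, 0.0, 0.4],
    [0.2, 0.8, 0.4, 0.0],
]
selected, total = best_k_regions(toy_matrix, 3)
```

The final candidate solution would then be the union of the selected regions with the elitist set E, as described above.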

Fig. 9. Pareto-optimal surface for the dataset (DB4) of randomly mixed handwritten Bangla Basic and Compound characters.

Fig. 10. Pareto-optimal surface for the dataset (DB2) of handwritten Bangla digits.


6. Experimental results

The objectives of the present work are: (a) increasing the recognition accuracy and (b) decreasing the corresponding recognition cost for the recognition of isolated Bangla handwritten characters. The recognition cost is represented in our experimental setup as (i) the average per-character recognition time and (ii) the number of local regions used to represent each handwritten digit/character. The integrated system shown in Fig. 4 is implemented for the experimental setup. As discussed in Table 1 of Section 4.1, the performance of the system is evaluated on four separate datasets: (a) a dataset of handwritten Bangla numerals (DB1), (b) a dataset of handwritten Bangla numerals (DB2), (c) a dataset of handwritten Bangla Basic characters (DB3), and (d) a randomly mixed dataset of handwritten Bangla Basic and Compound characters (DB4). In our experimental setup, LIBSVM [51], an open-source tool based on SVM [52], is used for classification tasks. Values of the different parameters used in our experimental setup are listed in Table 2. The experiments are performed on an Intel(R) Core(TM) i5 processor with a 3.10 GHz clock frequency and 8 GB RAM.

As shown in Fig. 4, in the first phase of our experimental setup, pareto-optimal solutions are extracted from the final population of the evolutionary algorithms (NSHA and NSGA-II). These solutions are plotted as an ordered triplet of [recognition accuracy, average per-character recognition time, number of local regions used to describe each handwritten character/digit] over the solution-space. Figs. 9 and 10 show the resultant representative pareto-optimal surfaces for two of the larger datasets (DB2 and DB4 in Table 1) used in our experimental setup.

Finally, in the second phase, a quality consensus is performed among the regions extracted from the pareto-surfaces, and the regions with majority consensus are separated. The minority regions form the residual region-space. The local regions can now be selected from the residual region-space based on their class-separability power, using AFS theory based fuzzy logic (as discussed in Section 5.3), such that the recognition accuracy achieved by the proposed system is maximized.

A comparative analysis of the maximum recognition accuracies achieved by employing NSHA and NSGA-II separately on all of the datasets used in the present work, and by their fuzzy recombination, is shown in Fig. 11.

The final results returned by the system are shown in Tables 3 and 4. Dataset indexes are the same as in Table 1. The results in Tables 3 and 4 show that the proposed method provides an increase in recognition accuracy and a decrease in recognition cost for all of the datasets. For the dataset of randomly mixed Bangla Basic and Compound characters, the average per-character recognition time taken by the classifier has slightly increased. This may be attributed to the greater number of support vectors needed to define the Maximal Marginal Hyperplane [52] for the dataset. This apparent contradiction highlights the difficulties involved in handwritten character recognition. However, the overall performance of the proposed system on the datasets used in our experimental setup is good.

[Fig. 11 panels: bar charts of recognition accuracy (%) achieved by NSHA, NSGA-II and the present work on each dataset.]
(a) DB1, handwritten Bangla digits: NSHA 97.3, NSGA-II 97.3, present work 97.8.
(b) DB2, handwritten Bangla digits: NSHA 98.1, NSGA-II 98.15, present work 98.23.
(c) DB3, handwritten Bangla Basic characters: NSHA 86.3095, NSGA-II 86.6363, present work 87.28.
(d) DB4, randomly mixed handwritten Bangla Basic and Compound characters: NSHA 86.3095, NSGA-II 86.6363, present work 86.6478.

Fig. 11. A comparative analysis of the maximum recognition accuracy achieved by the proposed system.


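Table 3
Final results on the dataset of handwritten Bangla numerals.

Table 3.1

| Dataset index | Dataset type | Methodology | Recognition accuracy (%) by SVM classifier (A) | Average per character recognition time (ms) (B) | Number of local regions used (C) |
|---|---|---|---|---|---|
| DB1 | Isolated handwritten Bangla numerals | The present work | 97.8 | 16.531 | 11 |
| DB1 | Isolated handwritten Bangla numerals | Without region sampling | 97.15 | 17.57 | 16 |
| DB1 | Isolated handwritten Bangla numerals | Improvement using present work (%) | 0.67 | 5.913 | 31.25 |

Table 3.2

| Dataset index | Dataset type | Methodology | Recognition accuracy (%) by SVM classifier (A) | Average per character recognition time (ms) (B) | Number of local regions used (C) |
|---|---|---|---|---|---|
| DB2 | Isolated handwritten Bangla numerals | The present work | 98.23 | 30.6872 | 14 |
| DB2 | Isolated handwritten Bangla numerals | Without region sampling | 97.5 | 30.72 | 16 |
| DB2 | Isolated handwritten Bangla numerals | Improvement using present work (%) | 0.73 | 0.107 | 12.5 |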

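Table 4
Final results on the dataset of handwritten Bangla characters.

Table 4.1

| Dataset index | Dataset type | Methodology | Recognition accuracy (%) by SVM classifier (A) | Average per character recognition time (ms) (B) | Number of local regions used (C) |
|---|---|---|---|---|---|
| DB3 | Isolated handwritten Bangla Basic characters | The present work | 87.28 | 28.96 | 14 |
| DB3 | Isolated handwritten Bangla Basic characters | Without region sampling | 84.5750 | 30.02 | 16 |
| DB3 | Isolated handwritten Bangla Basic characters | Improvement using present work (%) | 3.1983 | 3.531 | 12.5 |

Table 4.2

| Dataset index | Dataset type | Methodology | Recognition accuracy (%) by SVM classifier (A) | Average per character recognition time (ms) (B) | Number of local regions used (C) |
|---|---|---|---|---|---|
| DB4 | Randomly mixed isolated handwritten Bangla Basic and Compound characters | The present work | 86.6478 | 90.6675 | 15 |
| DB4 | Randomly mixed isolated handwritten Bangla Basic and Compound characters | Without region sampling | 85.33 | 85.52 | 16 |
| DB4 | Randomly mixed isolated handwritten Bangla Basic and Compound characters | Improvement using present work (%) | 1.3178 | −6.019 | 6.25 |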


A brief comparative analysis of the proposed method with some of the contemporary methods in the literature is presented in Table 5. Competing methods are tested on the same datasets [12,38] used in our experimental setup, and their performance is compared on two evaluation metrics: (a) the maximum recognition accuracy achieved by the method after repeating the experiment 25 times and (b) the reduction in recognition cost using the method. In our experimental setup, the recognition accuracy of a competing method is computed using an SVM based classifier, whereas the reduction in recognition cost is computed as the sum of the average reduction in per-character recognition time and the average reduction in the number of local regions needed to represent each character. The local regions used to represent each character are produced by a two-level CG based quad-tree partitioning [41]. The proposed method achieved (1) 97.8% recognition accuracy and 37.163% reduction in recognition cost, (2) 98.23% recognition accuracy and 12.61% reduction in recognition cost, (3) 87.28% recognition accuracy and 16.031% reduction in recognition cost, and (4) 86.6478% recognition accuracy and 0.231% reduction in recognition cost for the datasets DB1, DB2, DB3 and DB4 respectively.
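The recognition-cost figures quoted above can be reproduced from the entries of Tables 3 and 4 with a short calculation. The sketch below assumes that each reduction is measured relative to the run without region sampling and that the two percentages are simply added, as the text states; the function names are hypothetical.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Percentage by which value is lower than baseline (negative if higher)."""
    return (baseline - value) / baseline * 100.0


def cost_reduction(time_base: float, time_new: float,
                   regions_base: int, regions_new: int) -> float:
    """Sum of the reduction in recognition time and in number of regions."""
    return (percent_reduction(time_base, time_new)
            + percent_reduction(regions_base, regions_new))


# Values taken from Table 3.1 (DB1) and Table 4.2 (DB4).
db1 = cost_reduction(17.57, 16.531, 16, 11)
db4 = cost_reduction(85.52, 90.6675, 16, 15)
```

For DB1 the two terms are about 5.913% and 31.25%, giving about 37.163%. For DB4 the recognition time rises by about 6.019% while the region count falls by 6.25%, leaving a net reduction of about 0.231%.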

To assess whether the improvement offered by the proposed method is statistically significant, significance tests are performed on the results shown in Table 5. In our experimental setup, paired t-tests at the 5% significance level have been performed. Results of the paired t-tests are shown in Table 6; indexes are the same as in Table 5. The results in Table 6 indicate that the superior performance of the present work is statistically significant for 9 out of 10 competing methods in the case of Bangla characters and 10 out of 11 competing methods in the case of Bangla digits.
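For readers unfamiliar with the test, the paired t statistic can be computed as in the following minimal sketch. The per-run accuracies are hypothetical placeholders, not values from the experiments, and the p-value lookup against the t distribution is omitted.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of the paired differences a[i] - b[i].
    Raises ZeroDivisionError if all differences are identical."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical accuracies of two methods over three repeated runs.
t_value = paired_t_statistic([3.0, 5.0, 7.0], [2.0, 3.0, 4.0])
```

The statistic has n − 1 degrees of freedom, where n is the number of paired runs.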

In spite of the various precautions taken to disambiguate similarly shaped handwritten Bangla characters, some of the characters and digits in the test dataset were misclassified by the proposed methodology, owing to the variety of writing styles and the complex shapes of different Bangla characters and digits. This highlights the challenges inherent in handwritten character recognition. Fig. 12 shows some of the characters correctly classified and misclassified by the proposed method.

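Table 5
Comparative analysis of the proposed method with contemporaries.

Table 5.1

| Index | Work reference | Database | Type of data | Classification scheme | Recognition accuracy (%) | Reduction in recognition cost (%) |
|---|---|---|---|---|---|---|
| A1 | Hassan et al. [53] | DB1 | Isolated handwritten Bangla numerals | SRC | 94.00 | 0 |
| A2 | Hassan et al. [54] | DB1 | Isolated handwritten Bangla numerals | KNN | 96.70 | 0 |
| A3 | Das et al. [55] | DB1 | Isolated handwritten Bangla numerals | SVM | 97.70 | 0 |
| A4 | Das et al. [15] | DB1 | Isolated handwritten Bangla numerals | SVM | 97.70 | 3.9 |
| A5 | Roy et al. [56] | DB1 | Isolated handwritten Bangla numerals | SVM | 93.30 | 0 |
| A6 | Xu et al. [57] | DB1 | Isolated handwritten Bangla numerals | Bayesian Network | 87.50 | 0 |
| A7 | Present work | DB1 | Isolated handwritten Bangla numerals | SVM | 97.80 | 37.163 |
| B1 | Wen et al. [58] | DB2 | Isolated handwritten Bangla numerals | SVM | 75.05 | 0 |
| B2 | Bhattacharya et al. [34] | DB2 | Isolated handwritten Bangla numerals | SVM | 84.50 | −170.11 |
| B3 | Basu et al. [42] | DB2 | Isolated handwritten Bangla numerals | MLP | 67.70 | 0 |
| B4 | Das et al. [15] | DB2 | Isolated handwritten Bangla numerals | SVM | 80.58 | −100 |
| B5 | Roy et al. [56] | DB2 | Isolated handwritten Bangla numerals | SVM | 87.26 | −846.59 |
| B6 | Present work | DB2 | Isolated handwritten Bangla numerals | SVM | 98.23 | 12.61 |

Table 5.2

| Index | Work reference | Database | Type of data | Classification scheme | Recognition accuracy (%) | Reduction in recognition cost (%) |
|---|---|---|---|---|---|---|
| C1 | Basu et al. [59] | DB3 | Isolated handwritten Bangla Basic characters | SVM | 80.58 | −200 |
| C2 | Roy et al. [60] | DB3 | Isolated handwritten Bangla Basic characters | SVM | 86.40 | 13.52 |
| C3 | Das et al. [61] | DB3 | Isolated handwritten Bangla Basic characters | SVM | 82.27 | 0 |
| C4 | Bhowmick et al. [62] | DB3 | Isolated handwritten Bangla Basic characters | MLP | 84.23 | 0 |
| C5 | Sarkhel et al. [47] | DB3 | Isolated handwritten Bangla Basic characters | SVM | 86.533 | 8.583 |
| C6 | Present work | DB3 | Isolated handwritten Bangla Basic characters | SVM | 87.28 | 16.031 |
| D1 | Das et al. [61] | DB4 | Randomly mixed, isolated handwritten Bangla Basic and Compound characters | SVM | 75.05 | 0 |
| D2 | Roy et al. [60] | DB4 | Randomly mixed, isolated handwritten Bangla Basic and Compound characters | SVM | 84.50 | −170.11 |
| D3 | Basu et al. [14] | DB4 | Randomly mixed, isolated handwritten Bangla Basic and Compound characters | MLP | 67.70 | 0 |
| D4 | Basu et al. [59] | DB4 | Randomly mixed, isolated handwritten Bangla Basic and Compound characters | SVM | 80.58 | −100 |
| D5 | Das et al. [10] | DB4 | Randomly mixed, isolated handwritten Bangla Basic and Compound characters | SVM | 86.96 | −846.59 |
| D6 | Present work | DB4 | Randomly mixed, isolated handwritten Bangla Basic and Compound characters | SVM | 86.6478 | 0.231 |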

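Table 6
Results of paired t-test at 5% significance level.

Table 6.1

| Index | Database | Type of data | p-Value | t-Value | HrH0 |
|---|---|---|---|---|---|
| A1 | DB1 | Bangla digits | 0.0348 | 5.2197 | Yes |
| A2 | DB1 | Bangla digits | 0.0472 | 4.4388 | Yes |
| A3 | DB1 | Bangla digits | 0.4597 | 0.7454 | No |
| A4 | DB1 | Bangla digits | 0.0009 | 3.5355 | Yes |
| A5 | DB1 | Bangla digits | 0.0001 | 18.6852 | Yes |
| A6 | DB1 | Bangla digits | 0.0012 | 3.4333 | Yes |
| B1 | DB2 | Bangla digits | 0.0002 | 66.65 | Yes |
| B2 | DB2 | Bangla digits | 0.0001 | 24104 | Yes |
| B3 | DB2 | Bangla digits | 0.0004 | 47.44 | Yes |
| B4 | DB2 | Bangla digits | 0.0448 | 2.212 | Yes |
| B5 | DB2 | Bangla digits | 0.0114 | 9.297 | Yes |

Table 6.2

| Index | Database | Type of data | p-Value | t-Value | HrH0 |
|---|---|---|---|---|---|
| C1 | DB3 | Bangla Basic characters | 0.0310 | 2.2222 | Yes |
| C2 | DB3 | Bangla Basic characters | 0.0077 | 2.7828 | Yes |
| C3 | DB3 | Bangla Basic characters | 0.0168 | 2.4773 | Yes |
| C4 | DB3 | Bangla Basic characters | 0.0001 | 6.1000 | Yes |
| C5 | DB3 | Bangla Basic characters | 0.0492 | 2.0182 | Yes |
| D1 | DB4 | Randomly mixed Bangla Basic and Compound characters | 0.0112 | 9.361 | Yes |
| D2 | DB4 | Randomly mixed Bangla Basic and Compound characters | 0.0015 | 25.52 | Yes |
| D3 | DB4 | Randomly mixed Bangla Basic and Compound characters | 0.0122 | 8.954 | Yes |
| D4 | DB4 | Randomly mixed Bangla Basic and Compound characters | 0.0801 | 3.209 | No |
| D5 | DB4 | Randomly mixed Bangla Basic and Compound characters | 0.0001 | 139.4 | Yes |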


Fig. 12. Samples of correctly classified and misclassified characters by the present work.

Fig. 13. Samples of broken and degraded Bangla handwritten characters and digits which were correctly classified by the present work.


It is to be noted that the present work can handle broken and degraded characters. Fig. 13 shows some such characters present in the datasets used in our experiments. All these samples were correctly recognized by the proposed method, despite the apparent discontinuities in the shapes of the characters. This may be attributed to the similarities in the sub-images generated at the first and second levels of depth of the quad-tree partitioning of the handwritten character/digit image. Interested readers can find more details regarding this in [10].

7. Conclusion

A novel region sampling method based on multi-objective evolutionary algorithms is presented here. The proposed method tries to identify the set of most discriminative local regions while incurring minimal recognition cost. A non-dominated sorting harmony search based region sampling method and a non-dominated sorting genetic algorithm based region sampling methodology have been employed in our experimental setup. It provided 1.3178% and 0.73% increase in recognition accuracy, while decreasing the associated recognition cost by 12.607% and 0.234% for a dataset of handwritten Bangla characters [35] and a dataset of handwritten Bangla digits [38] respectively. Both of the multi-objective evolutionary region sampling algorithms used in the experimental setup have been developed independently of any particular dataset. The results extracted from the pareto-optimal surfaces of both algorithms are fuzzily combined to return the best compromise solution. The multi-objective region sampling algorithms and the AFS theory based fuzzy region selection methodology proposed in the current work mark one of the contributions of this paper. Also, to the best of our knowledge, this is the first work which tries to find a trade-off between maximizing the achieved recognition accuracy and minimizing the associated recognition cost for a handwritten character recognition system. The experimental results have shown great promise in this approach. It opens up a new frontier of developing efficient optical character recognition systems, which can be used in various real-life applications.

Conflict of interest

None declared.

Acknowledgment

The authors are thankful to CVPR unit, ISI, Kolkata for providing the dataset for the experiment. The authors are also thankful to "Centre for Microprocessor Application for Training Education and Research" and Department of Computer Science & Engineering, Jadavpur University, Kolkata for kindly providing the resources and infrastructural facilities that helped to complete this work.

Appendix A

To describe the algorithms in more detail, pseudocode of the algorithms developed in the present work is given here. In this section, the pseudocode of the proposed NSGA-II [18] based region sampling methodology, the NSHA [17] based region sampling methodology and the AFS theory based local region selection methodology is given as Algorithm 1, Algorithm 2 and Algorithm 3 respectively.
Algorithm 1. A non-dominated sorting genetic algorithm based region sampling methodology.

Input: Initial feature set extracted from the Bangla handwritten character/digit database.
Output: A set of non-dominated points extracted from the pareto-optimal front.
Initialize the parameters: population size (POP), crossover probability (pc), mutation probability (pm), maximum number of iterations (NI).

Begin
    ß = Φ    /* ß denotes the pareto-optimal set */
    for (i = 1 to NI) {
        C = ß    /* C denotes the population of the current generation, represented by a set of ordered pairs (recognition accuracy, recognition cost) */
        for (j = 1 to POP-1) {    /* POP denotes the maximum possible population size */
            if (population is empty) {
                Initialize the population with a set S containing 2 sets of j randomly selected local regions.
                C = C ∪ S
            }
            Generate a uniform random number r1 ∈ (0, 1).
            if (r1 ≤ pc) {    // pc denotes the crossover probability
                Randomly select two parents from the current generation.
                /* These two parents constitute the mating pool. */
                Randomly select a crossover point and perform a single-point crossover operation. Let the two offspring generated by this operation be denoted by O1 and O2.
                Update population: C = C ∪ O1 ∪ O2
            }
            Generate a uniform random number r2 ∈ (0, 1).
            if (r2 ≤ pm) {    // pm denotes the mutation probability
                Randomly select a parent from the current generation. Make a copy of this parent for the mutation operation. Perform mutation on this parent (shown in Fig. 5).
                Let the mutated parent be denoted by MP.
                C = C ∪ MP
            }
        }
        for the population in C {
            Perform fast non-dominated sort and assign non-domination rank, as suggested by Deb et al. [16]. For a parent p ∈ C, the non-domination rank of p is the number of individuals in C that dominate p.
            To preserve diversity among the pareto-optimal points, a metric called the crowding-distance (shown in Fig. 6) is also integrated when trying to find the non-dominated fronts.
            Using these two metrics, i.e. non-domination rank and crowding-distance, non-dominated fronts are computed based on a partial order relation denoted by ≺n. For any p, q ∈ C, p ≺n q implies [(p_rank < q_rank) or ((p_rank = q_rank) and (p_distance > q_distance))]. The points of the pareto-optimal front are represented by the set ß.
        }
    }
    return ß    /* ß contains the pareto-optimal set of points */
End
Analysis of the algorithm

Using the metaheuristic search ability of the algorithm, the pareto-optimal front offers several alternative solutions for the user to choose from. The time complexity of each iteration of the algorithm is as follows:

(1) Time complexity for non-dominated sorting is: $O(M(2N)^2)$;
(2) Time required for crowding-distance assignment is: $O(M(2N)\log(2N))$;
(3) Sorting on ≺n requires time of the order of: $O(2N\log(2N))$.

In aggregate, the total time required for each iteration of the algorithm is of the order of $O(MN^2)$, where N is the total number of local regions in the local feature set and M is the number of objectives to be optimized. This is a significant reduction from experimenting with all the possible combinations of local regions, which would have taken time of the order of $O(2^N)$.
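The crowding-distance assignment and the crowded comparison ≺n used in the ranking step can be sketched as follows. This follows the standard formulation of Deb et al. as commonly described; the function names and the three sample points are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import math
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Crowding distance of every point in one non-dominated front.
    Boundary points of each objective receive an infinite distance."""
    n = len(front)
    if n == 0:
        return []
    distance = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        low, high = front[order[0]][m], front[order[-1]][m]
        distance[order[0]] = distance[order[-1]] = math.inf
        if high == low:
            continue  # all points share this objective value
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][m] - front[order[pos - 1]][m]
            distance[order[pos]] += gap / (high - low)
    return distance


def crowded_less(p_rank: int, p_dist: float, q_rank: int, q_dist: float) -> bool:
    """p precedes q: lower rank wins, ties are broken by larger distance."""
    return p_rank < q_rank or (p_rank == q_rank and p_dist > q_dist)


# Hypothetical two-objective front, for illustration only.
distances = crowding_distance([(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)])
```

Each objective requires one sort of the front, which is the source of the log factor in the crowding-distance term above.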

Algorithm 2. A self-adaptive non-dominated sorting harmony-search region sampling methodology.

Input: Initial feature set extracted from the Bangla handwritten character/digit database.
Output: A set of non-dominated points extracted from the pareto-optimal front.
Initialize the parameters: harmony memory size (HMS), minimum value of HMCR (HMCRmin), maximum value of HMCR (HMCRmax), minimum value of PAR (PARmin), maximum value of PAR (PARmax), maximum number of iterations (NI).

Begin
    ß = Φ    /* ß denotes the pareto-optimal set */
    for (i = 1 to NI) {
        C = ß    /* C denotes the population of the current generation, represented by a set of ordered pairs (recognition accuracy, recognition cost) */
        for (j = 1 to HMS-1) {    /* HMS denotes the maximum possible population size */
            if (population is empty) {
                Initialize the population with a set S containing 2 sets of j randomly selected local regions.
                C = C ∪ S
            }
            Generate a uniform random number r1 ∈ (0, 1).
            if (r1 ≤ HMCRi) {    /* The value of HMCRi is set by (5) */
                Randomly select a parent from the current generation. Make a copy of this parent for pitch adjustment.
                Generate a uniform random number r2 ∈ (0, 1).
                if (r2 ≤ PARi) {    /* The value of PARi is set by (6) */
                    Randomly select a region from the previously chosen parent, i.e. a note from a harmony.
                    Replace the selected region with a region selected randomly from the set of local regions that are not included in the chosen parent.
                    Let the pitch-adjusted parent be denoted by P.
                    C = C ∪ P
                }
            }
            else {
                Initialize a parent T with j randomly selected regions such that T ∉ C.
                C = C ∪ T
            }
        }
        for the population in C {
            Perform fast non-dominated sort and assign non-domination rank, as suggested by Deb et al. [16]. For a parent p ∈ C, the non-domination rank of p is the number of individuals in C that dominate p.
            To preserve diversity among the pareto-optimal points, a metric called the crowding-distance (shown in Fig. 6) is also integrated when trying to find the non-dominated fronts.
            Using these two metrics, i.e. non-domination rank and crowding-distance, non-dominated fronts are computed based on a partial order relation denoted by ≺n. For any p, q ∈ C, p ≺n q implies [(p_rank < q_rank) or ((p_rank = q_rank) and (p_distance > q_distance))]. The points of the pareto-optimal front are represented by the set ß.
        }
    }
    return ß    /* ß contains the pareto-optimal set of points */
End
Analysis of the algorithm

The proposed method searches for the most informative set of local regions in the search space of (recognition accuracy, recognition cost) and returns the pareto-optimal set. The time complexity of each iteration of the algorithm is as follows:

(1) Time complexity for non-dominated sorting is: $O(M(2N)^2)$;
(2) Time required for crowding-distance assignment is: $O(M(2N)\log(2N))$;
(3) Sorting on ≺n requires time of the order of: $O(2N\log(2N))$.

In aggregate, the total time required for each iteration of the algorithm is of the order of $O(MN^2)$, where N is the total number of local regions in the local feature set and M is the number of objectives to be optimized. This is a significant reduction from experimenting with all the possible combinations of local regions, which would have taken time of the order of $O(2^N - 1)$.
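One improvisation step of the inner loop of Algorithm 2 can be sketched as below. The sketch assumes that a harmony is a set of region indices and that randomness comes from a caller-supplied `random.Random` instance; the function name and parameters are hypothetical, and the check that a freshly generated harmony is not already in C is omitted for brevity.

```python
import random
from typing import FrozenSet, List, Optional


def improvise(memory: List[FrozenSet[int]], n_regions: int, size: int,
              hmcr: float, par: float, rng: random.Random) -> Optional[FrozenSet[int]]:
    """Produce one candidate harmony, or None when a parent is drawn from
    memory but the pitch-adjustment test fails (nothing is added then)."""
    all_regions = set(range(n_regions))
    if memory and rng.random() <= hmcr:
        # Memory consideration: copy a randomly chosen parent.
        harmony = set(rng.choice(memory))
        if rng.random() > par:
            return None
        outside = sorted(all_regions - harmony)
        if not outside:
            return None
        # Pitch adjustment: swap one region for one not in the parent.
        harmony.remove(rng.choice(sorted(harmony)))
        harmony.add(rng.choice(outside))
        return frozenset(harmony)
    # Otherwise start a new harmony with `size` random regions.
    return frozenset(rng.sample(sorted(all_regions), size))


rng = random.Random(0)
parent = frozenset({0, 1, 2})
adjusted = improvise([parent], 16, 3, hmcr=1.0, par=1.0, rng=rng)
fresh = improvise([parent], 16, 4, hmcr=0.0, par=1.0, rng=rng)
```

A pitch-adjusted harmony differs from its parent in exactly one region, so the subset cardinality is preserved across the step.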

Algorithm 3. An AFS theory based feature selection technique from the residual featureset.

Input: The set of training samples X, original featureset F.
Output: The most discriminating subset of features from the residual featureset.

Begin
    for x ∈ X    // X is the set of training samples
        Calculate u_f^x according to x_f;    // f ∈ F (the original set of features) and u_f^x denotes the fuzzy feature for the element x, as shown in Eq. (9)
    for ui, uj ∈ UF    // UF denotes the fuzzy featureset
        Calculate θ_{u1u2}(x);    // As shown in Section 3
    for ui, uj ∈ UF
        Calculate norm(u1, u2);    // As shown in Eq. (10)
    Create matrix M;
    Find maximum sum k × k sub-matrix from M;    // The k × k sub-matrix is chosen from M such that if the i-th row is selected, the i-th column will also have to be selected.
End


References

[1] A. Ul-Hasan, S. Bin Ahmed, F. Rashid, F. Shafait, T.M. Breuel, Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks, in: Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, 2013, pp. 1061–1065.

[2] A. Ray, S. Rajeswar, S. Chaudhury, Text recognition using deep BLSTM networks, in: Proceedings of the 2015 Eighth International Conference on Advances in Pattern Recognition (ICAPR), 2015, pp. 1–6.

[3] A. Park, Offline text recognition without intraword character segmentation based on two-dimensional low frequency discrete Fourier transforms, 24-May-1994.

[4] A. Vinciarelli, S. Bengio, H. Bunke, Offline recognition of unconstrained handwritten texts using HMMs and statistical language models, IEEE Trans. Pattern Anal. Mach. Intell. 26 (6) (2004) 709–720.

[5] H. Fujisawa, Forty years of research in character and document recognition—an industrial perspective, Pattern Recognit. 41 (8) (2008) 2435–2446.

[6] R.M. Bozinovic, S.N. Srihari, Off-line cursive script word recognition, IEEE Trans. Pattern Anal. Mach. Intell. 11 (1) (1989) 68–83.

[7] P.-K. Wong, C. Chan, Off-line handwritten Chinese character recognition as a compound Bayes decision problem, IEEE Trans. Pattern Anal. Mach. Intell. 20 (9) (1998) 1016–1023.

[8] J. Cao, M. Ahmadi, M. Shridhar, Recognition of handwritten numerals with multiple feature and multistage classifier, Pattern Recognit. 28 (2) (1995) 153–160.

[9] S.V. Rajashekararadhya, P.V. Ranjan, Efficient zone based feature extraction algorithm for handwritten numeral recognition of four popular south Indian scripts, J. Theor. Appl. Inf. Technol. 4 (12) (2008) 1171–1181.

[10] N. Das, R. Sarkar, S. Basu, P.K. Saha, M. Kundu, M. Nasipuri, Handwritten Bangla character recognition using a soft computing paradigm embedded in two pass approach, Pattern Recognit. 48 (6) (2014) 2054–2071.

[11] U. Pal, B.B. Chaudhuri, Indian script character recognition: a survey, Pattern Recognit. 37 (9) (2004) 1887–1899.

[12] R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, CMATERdb1: a database of unconstrained handwritten Bangla and Bangla–English mixed script document image, Int. J. Doc. Anal. Recognit. 15 (1) (2011) 71–83.

[13] D. Impedovo, G. Pirlo, Zoning methods for handwritten character recognition: a survey, Pattern Recognit. 47 (3) (2014) 969–981.

[14] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, Handwritten ‘Bangla’ alphabet recognition using an MLP based classifier, in: Proceedings of the 2nd National Conference on Computer Processing of Bangla – 2005, 2005, pp. 285–291.

[15] N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application, Appl. Soft Comput. 12 (5) (2012) 1592–1606.

[16] K. Deb, Multi-objective optimization using evolutionary algorithms: an introduction, 2011, pp. 1–24.

[17] S. Sivasubramani, K.S.S. Swarup, Multi-objective harmony search algorithm for optimal power flow problem, Int. J. Electr. Power Energy Syst. 33 (3) (2011) 745–752.

[18] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197.

[19] L. Xiaodong, The fuzzy theory based on AFS algebras and AFS structure, J. Math. Anal. Appl. 217 (2) (1998) 459–478.

[20] V. Chankong, Y.Y. Haimes, Multiobjective Decision Making: Theory and Methodology, North Holland, New York, 1983.

[21] E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results, Evol. Comput. 8 (2) (2000) 173–195.

[22] M. Atiquzzaman, S.-Y. Liong, X. Yu, Alternative decision making in water distribution network with NSGA-II, J. Water Resour. Plan. Manag. 132 (2) (2006) 122–126.

[23] E. Fallah-Mehdipour, O. Bozorg Haddad, M.M. Rezapour Tabari, M.A. Mariño, Extraction of decision alternatives in construction management projects: application and adaptation of NSGA-II and MOPSO, Expert Syst. Appl. 39 (3) (2012) 2794–2803.

[24] S. Dhanalakshmi, S. Kannan, K. Mahadevan, S. Baskar, Application of modified NSGA-II algorithm to combined economic and emission dispatch problem, Int. J. Electr. Power Energy Syst. 33 (4) (2011) 992–1002.

[25] S. Kannan, S. Baskar, J.D. McCalley, P. Murugan, Application of NSGA-II algorithm to generation expansion planning, IEEE Trans. Power Syst. 24 (1) (2009) 454–461.

[26] K.S. Lee, Z.W. Geem, A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice, Comput. Methods Appl. Mech. Eng. 194 (36–38) (2005) 3902–3933.

[27] X.S. Yang, Harmony search as a metaheuristic algorithm, Stud. Comput. Intell. 191 (2009) 1–14.

[28] J.R. Germ, Multiobjective Harmony Search Algorithm Proposals, vol. 281, 2011, pp. 51–67.

[29] S. Sivasubramani, K.S. Swarup, Environmental/economic dispatch using multi-objective harmony search algorithm, Electr. Power Syst. Res. 81 (9) (2011) 1778–1785.

[30] I. Tsamardinos, C.F. Aliferis, Towards principled feature selection: relevancy, filters, and wrappers, in: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.

[31] M. Hanmandlu, A.V. Nath, A.C. Mishra, V.K. Madasu, Fuzzy Model Based Recognition of Handwritten Hindi Numerals using Bacterial Foraging, in: Proceedings of the 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007), 2007, pp. 309–314.

[32] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. 3 (2003) 1157–1182.

[33] M. Ramze Rezaee, B. Goedhart, B.P.F. Lelieveldt, J.H.C. Reiber, Fuzzy feature selection, Pattern Recognit. 32 (12) (1999) 2011–2019.

[34] U. Bhattacharya, B.B. Chaudhuri, Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals, IEEE Trans. Pattern Anal. Mach. Intell. 31 (3) (2009) 444–457.

[35] cmaterdb – CMATERdb: The pattern recognition database repository – Google Project Hosting. [Online]. Available: https://code.google.com/p/cmaterdb/. (accessed: 31.01.15).

[36] R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, CMATERdb1: a database of unconstrained handwritten Bangla and Bangla–English mixed script document image, 2012, pp. 71–83.

[37] S.K. Parui, K. Guin, U. Bhattacharya, B.B. Chaudhuri, Online Handwritten Bangla Character Recognition Using HMM 3. Analysis of strokes in handwritten, IEEE, 2008, pp. 1–4.

[38] ISI Image Databases of Handwritten Isolated Bangla Numerals. [Online]. Available: ⟨http://www.isical.ac.in/~ujjwal/download/BanglaNumeral.html⟩. (accessed: 20.04.15).

[39] CMATERdb3.1.3.3 – cmaterdb – Handwritten Bangla Compound character image database – CMATERdb: The pattern recognition database repository – Google Project Hosting.

[40] N. Das, S. Basu, R. Sarkar, M. Kundu, M. Nasipuri, D. Kumar Basu, An improved feature descriptor for recognition of handwritten Bangla alphabet, in: Proceedings of the ICSIP 2009, 2009, pp. 451–454.

[41] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D. Kumar Basu, A novel framework for automatic sorting of postal documents with multi-script address blocks, Pattern Recognit. 43 (10) (2010) 3507–3521.

[42] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, Recognition of numeric postal codes from multi-script postal address blocks, in: Pattern Recognition and Machine Intelligence, Springer, 2009, pp. 381–386.

[43] X. Liu, W. Pedrycz, Axiomatic Fuzzy Set Theory and Its Applications, Springer, Berlin, Heidelberg, 2009.

[44] Z.W. Geem, J.H. Kim, G.V. Loganathan, A new heuristic optimization algorithm: harmony search, Simulation 76 (2) (2001) 60–68.

[45] S. Salcedo-Sanz, A. Pastor-Sánchez, J. Del Ser, L. Prieto, Z.W. Geem, A coral reefs optimization algorithm with harmony search operators for accurate wind speed prediction, Renew. Energy 75 (2015) 93–101.

[46] Z.W. Geem, J.-H. Kim, Wastewater treatment optimization for fish migration using harmony search, Math. Probl. Eng. 2014 (2014) 1–5.

[47] R. Sarkhel, A. Saha, N. Das, An enhanced harmony search method for Bangla handwritten character recognition using region sampling, in: Proceedings of the 2nd IEEE International Conference on Recent Trends in Information Systems (ReTIS-15), 2016, p. (in press).

[48] Q.-K. Pan, P.N. Suganthan, J.J. Liang, M.F. Tasgetiren, A local-best harmony search algorithm with dynamic subpopulations, Eng. Optim. 42 (2) (2010) 101–117.

[49] Y. Li, Z.-F. Wu, Fuzzy feature selection based on min–max learning rule and extension matrix, Pattern Recognit. 41 (1) (2008) 217–226.

[50] A. Roy, N. Das, R. Sarkar, S. Basu, M. Kundu, An Axiomatic Fuzzy Set Theory Based Feature Selection Methodology for Handwritten Numeral Recognition.

[51] C. Chang, C. Lin, LIBSVM: A Library for Support Vector Machines, vol. 2 (3), 2011.

[52] V.N. Vapnik, An overview of statistical learning theory, IEEE Trans. Neural Netw. 10 (5) (1999) 988–999.

[53] H.A. Khan, A. Al Helal, K.I. Ahmed, Handwritten Bangla digit recognition using Sparse Representation Classifier, in: Proceedings of the 2014 International Conference on Informatics, Electronics & Vision (ICIEV), 2014, pp. 1–6.

[54] T. Hassan, H.A. Khan, Handwritten Bangla numeral recognition using Local Binary Pattern, in: Proceedings of the 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), 2015, pp. 1–4.

[55] N. Das, J.M. Reddy, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, D.K. Basu, A statistical–topological feature combination for recognition of handwritten numerals, Appl. Soft Comput. 12 (8) (2012) 2486–2495.

[56] A. Roy, N. Mazumder, N. Das, R. Sarkar, S. Basu, M. Nasipuri, A new quad tree based feature set for recognition of handwritten Bangla numerals, in: AICERA 2012 – Annual International Conference on Emerging Research Areas: Innovative Practices and Future Trends, 2012, pp. 1–6.

[57] J. Xu, J. Xu, Y. Lu, Handwritten Bangla digit recognition using hierarchical Bayesian network, in: Proceedings of the 2008 3rd International Conference on Intelligent System and Knowledge Engineering, 2008, vol. 1, pp. 1096–1099.

[58] Y. Wen, Y. Lu, P. Shi, Handwritten Bangla numeral recognition system and its application to postal automation, Pattern Recognit. 40 (1) (2007) 99–107.

[59] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, A hierarchical approach to recognition of handwritten Bangla characters, Pattern Recognit. 42 (7) (2009) 1467–1484.

[60] A. Roy, N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, Region selection in handwritten character recognition using artificial bee colony optimization, in: Proceedings of the Third International Conference on Emerging Applications of Information Technology (EAIT–2012), 2012, pp. 183–186.

[61] N. Das, B. Das, R. Sarkar, S. Basu, M. Kundu, Handwritten Bangla basic and compound character recognition using MLP and SVM classifier, J. Comput. 2 (2) (2010) 109–115.

[62] S.K.P.T.K. Bhowmik, U. Bhattacharya, Recognition of Bangla Handwritten Characters Using an MLP Classifier Based on Stroke Features, Springer, Berlin, Heidelberg, 2004.

Ritesh Sarkhel received his B.C.S.E degree from Jadavpur University in 2012. He worked as an R&D Engineer in Samsung Research Institute, Noida from 2012 to 2014. He is currently pursuing the M.C.S.E degree at Jadavpur University. His areas of current research interest are OCR of handwritten text, optimization techniques and computer vision.

Nibaran Das received his B.Tech degree in Computer Science and Technology from Kalyani Govt. Engineering College under Kalyani University, in 2003. He received his M.C.S.E. degree from Jadavpur University, in 2005. He received his Ph.D. (Engg.) degree thereafter from Jadavpur University, in 2012. He joined J.U. as a lecturer in 2006. His areas of current research interest are OCR of handwritten text, Bengali fonts, optimization techniques and image processing. He has been an editor of the Bengali monthly magazine “Computer Jagat” since 2005.

Amit K. Saha received his B.Tech degree in Information Technology from WBUT, in 2011. He received his M.T.C.T. degree from Jadavpur University, in 2015. His areas of current research interest are OCR of handwritten text, Nature Inspired Computing and Multi-Objective Optimization.

Mita Nasipuri received her B.E.Tel.E., M.E.Tel.E., and Ph.D. (Engg.) degrees from Jadavpur University, in 1979, 1981 and 1990, respectively. Prof. Nasipuri has been a faculty member of J.U. since 1987. Her current research interests include image processing, pattern recognition, and multimedia systems. She is a senior member of the IEEE, U.S.A., Fellow of I.E. (India) and W.B.A.S.T., Kolkata, India.
