
Infrared and visible image fusion using a

novel deep decomposition method

Hui Li Xiao-Jun Wu *

Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence,

Jiangnan University, Wuxi, China, 214122.

Abstract: Infrared and visible image fusion is an important problem in image fusion tasks and has been widely applied in many fields. To better preserve the useful information of the source images, in this paper we propose an effective image fusion framework using a novel deep decomposition method based on Latent Low-Rank Representation (LatLRR); this decomposition method is named DDLatLRR. Firstly, LatLRR is utilized to learn a project matrix which is used to extract salient features. Then, the base part and multi-level detail parts are obtained by DDLatLRR. With adaptive fusion strategies, the fused base part and the fused detail parts are reconstructed. Finally, the fused image is obtained by combining the fused base part and the fused detail parts. Experimental comparison with other fusion methods shows that the proposed algorithm achieves better fusion performance than state-of-the-art methods in both subjective and objective evaluation. The code of our fusion method is available at https://github.com/exceptionLi/imagefusion_deepdecomposition .

Keywords: image fusion; deep decomposition; latent low-rank representation; infrared image; visible image

1 Introduction

In the field of multi-sensor image fusion, infrared and visible image fusion is an important task. It has been widely used in many applications, such as surveillance, object detection and target recognition. The main purpose of image fusion is to generate a single image which contains the complementary information from multiple images of the same scene[1]. In infrared and visible image fusion, a key problem is how to extract the salient objects from the infrared and visible images, and many fusion methods have been proposed in recent years.

The most commonly used methods in image fusion are multi-scale transforms, such as the discrete wavelet transform (DWT)[2], contourlet transform[3], shift-invariant shearlet transform[4] and quaternion wavelet transform[5]. Because conventional transform methods do not have enough detail-preservation ability, Luo et al.[6] proposed a fusion method based on contextual statistical similarity and the nonsubsampled shearlet transform, which can obtain the local structure information of the source images. For infrared and visible image fusion, Bavirisetti et al.[7] proposed a fusion method based on two-scale decomposition and saliency detection: a mean filter and a median filter are used to extract the base layers and detail layers, visual saliency is used to obtain weight maps, and the fused image is obtained by combining these parts. Besides the above methods, Zhang et al.[8] proposed a morphological gradient based fusion method.


This method uses different morphological gradient operators to obtain the focus region, the defocus region and the focus boundary region, respectively. The fused image is then obtained with an appropriate fusion strategy.

Among representation learning based algorithms, the most common methods are based on sparse representation (SR). Zong et al.[9] proposed a novel medical image fusion method based on SR, in which Histogram of Oriented Gradients (HOG) features are used to classify the image patches and several sub-dictionaries are learned; the l1-norm and a choose-max strategy are utilized to reconstruct the fused image. In addition, there are many algorithms that combine SR with other tools, such as pulse coupled neural networks (PCNN)[10], low-rank representation (LRR)[11] and the shearlet transform[12]. Moreover, joint sparse representation[13] and cosparse representation[14] were also applied to the image fusion field.

With the rise of deep learning, deep features of the source images are used to reconstruct the fused image. In [15], Yu Liu et al. proposed a fusion method based on convolutional sparse representation (CSR). CSR differs from deep learning methods, but the features it extracts are multi-scale and multi-layer, much like deep features. In addition, Yu Liu et al.[16] also proposed a convolutional neural network (CNN) based fusion method, in which image patches containing different blur versions are used to train a network and obtain a decision map; the fused image is then obtained from the decision map and the source images. In ICCV 2017, Prabhakar et al.[17] proposed a novel CNN-based fusion framework for the multi-exposure image fusion task. However, these deep learning based methods still have drawbacks: the network is difficult to train when the training data are insufficient, which is especially true for infrared and visible image fusion, and CNN-based methods only work on a specific image fusion task.

To address these drawbacks, Li et al.[18] proposed a novel fusion framework based on a pretrained network (VGG-19[19]). Firstly, the detail parts and base parts are obtained by an optimization method[20]. An averaging strategy is used to fuse the base parts, and a deep learning framework is utilized to obtain the fused detail parts: VGG-19 extracts multi-layer deep features from the detail parts, and with a multi-layer fusion strategy and a choose-max operator the final fused detail parts are obtained. The final fused image is reconstructed by combining the fused detail parts and the fused base parts.

Although the SR and deep learning based methods obtain good fusion performance, they still have drawbacks: 1) in SR based methods, dictionary learning is a very time-consuming operation, especially online dictionary learning; 2) in the deep feature based method[18], the decomposition method is very simple, and different decomposition methods lead to different fusion performance.

So in this paper, we propose a novel fusion framework based on a deep decomposition method (DDLatLRR) for infrared and visible image fusion. Firstly, a project matrix is learned by Latent Low-Rank Representation (LatLRR)[21]; this matrix maps the input data to a salient feature space. Secondly, image patches are obtained by dividing the source images with a sliding window technique, and these patches are stretched into a source matrix in which each column represents one image patch. DDLatLRR decomposes the source matrix level by level to extract salient features, which are named the detail parts, and a base part. Then, the fused detail parts are reconstructed by an adaptive fusion strategy and a reshape operator, and the fused base part is obtained by an averaging strategy. Finally, the fused image is reconstructed by combining the fused detail parts and the fused base part. Compared with state-of-the-art fusion methods, our fusion framework achieves better fusion performance in both subjective and objective evaluation.

This paper is structured as follows. In Section 2, we give a brief introduction to related works. In Section 3, the proposed fusion framework based on a deep decomposition method is introduced in detail. Section 4 describes how the project matrix is learned. The experimental results are shown in Section 5. Finally, Section 6 draws the conclusions.

2 Related works

Latent Low-Rank Representation:

In 2010, Liu et al.[22] proposed the LRR theory, in which the input data matrix itself is chosen as the dictionary; however, this method cannot achieve good performance when the input data are insufficient or corrupted. So in 2011 the authors proposed the LatLRR theory[21], with which both the low-rank structure and the salient structure can be extracted from raw data.

In reference [21], the LatLRR problem is reduced to solving the following optimization problem,

$$\min_{Z,L,E} \|Z\|_* + \|L\|_* + \lambda\|E\|_1, \quad s.t. \; X = XZ + LX + E, \qquad (1)$$

where $\lambda > 0$ is the balance coefficient, $\|\cdot\|_*$ denotes the nuclear norm, which is the sum of the singular values of a matrix, and $\|\cdot\|_1$ is the $l_1$-norm. $X$ denotes the observed data matrix, $Z$ contains the low-rank coefficients, $L$ is a project matrix which gives the salient coefficients, and $E$ is the sparse noise matrix. Eq.(1) is solved by the inexact Augmented Lagrangian Multiplier (ALM) method[22], and the salient component $LX$ is then obtained from Eq.(1).

Fig.1. Latent low-rank representation. The observed data matrix X, low-rank coefficients Z and project matrix L

Why choose the project matrix L:

As shown in Fig.1, assume that a source image is divided into M image patches, each of size $n \times n$, and let $N = n \times n$. X indicates the observed matrix, whose columns are the vectorized image patches. The size of Z depends on the number of image patches, which in turn depends on the size of the source image; if Z were used in the fusion framework, the low-rank coefficients would have to be recomputed for every image in the test phase, which is time consuming.

However, the size of the project matrix L is related only to the image patch size. Therefore, once the project matrix has been learned by LatLRR, it can be used to process other images of arbitrary size.

So in our fusion method, Eq.(1) is used to learn a project matrix L from training data (infrared and visible images). Observed data matrices X are then decomposed into detail parts and base parts by the pre-trained project matrix; the details of this operation are introduced in the next section.

3 The Proposed Fusion Method

In this section, the proposed fusion method is presented in detail. The deep decomposition method and the fusion strategies for the detail parts and the base part are presented in the following subsections.

Assume that $I_1$ and $I_2$ indicate the input images (infrared and visible); the index $k \in \{1, 2\}$ is independent of the type of input image, and the fusion algorithm is the same when k > 2. The framework of our fusion algorithm is shown in Fig.2.

Fig.2. The framework of the proposed fusion method. Source images are decomposed into detail parts ($V_{d_k}^{1:r}$) and a base part ($I_{b_k}^{r}$). Then, with adaptive fusion strategies, the fused image ($I_f$) is reconstructed from the fused detail parts ($I_{d_f}^{1:r}$) and the fused base part ($I_{b_f}$).

As shown in Fig.2, the input image $I_k$ is divided into many image patches by a sliding window technique with overlapping (the step is one pixel). These image patches are reshuffled into a source matrix in which each column represents one image patch. The detail parts and the base parts are calculated by Eq.(2),

$$V_{d_k}^{i} = L \times P(I_{b_k}^{i-1}), \quad I_{b_k}^{i} = I_{b_k}^{i-1} - R(V_{d_k}^{i}), \qquad (2)$$
$$s.t. \; I_{b_k}^{0} = I_k, \; k \in \{1,2\}, \; i = 1,2,\cdots,r$$

where r denotes the number of decomposition levels and L denotes the project matrix learned by LatLRR. $V_{d_k}^{i}$ is the detail part matrix decomposed from the previous base part $I_{b_k}^{i-1}$, $P(\cdot)$ denotes the sliding window and reshuffling operator, and $R(\cdot)$ denotes the operator which reconstructs the detail image from a detail part matrix. As shown in Eq.(2), the detail parts are generated by L, $P(\cdot)$ and the input $I_{b_k}^{i-1}$; the new base part is then obtained by subtracting the detail image from the input.
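To make Eq.(2) concrete, the sketch below (an illustrative NumPy rendering, not the authors' released MATLAB code) implements one decomposition level: $P(\cdot)$ as overlapping patch extraction, the projection by a pre-learned matrix L, and $R(\cdot)$ as placing patches back into the image plane. How overlapping contributions are merged inside $R(\cdot)$ is not stated in the text, so averaging is assumed here.

```python
import numpy as np

def extract_patches(img, n, stride=1):
    """P(.): slide an n-by-n window over img (stride 1 = overlapping) and
    stack the vectorised patches as the columns of a matrix."""
    H, W = img.shape
    cols = [img[y:y + n, x:x + n].reshape(-1)
            for y in range(0, H - n + 1, stride)
            for x in range(0, W - n + 1, stride)]
    return np.stack(cols, axis=1)                      # shape (n*n, num_patches)

def reconstruct_image(V, img_shape, n, stride=1):
    """R(.): put each column of V back at its patch position; overlapping
    contributions are averaged (an assumption, the paper does not specify)."""
    H, W = img_shape
    acc, cnt = np.zeros((H, W)), np.zeros((H, W))
    j = 0
    for y in range(0, H - n + 1, stride):
        for x in range(0, W - n + 1, stride):
            acc[y:y + n, x:x + n] += V[:, j].reshape(n, n)
            cnt[y:y + n, x:x + n] += 1.0
            j += 1
    return acc / np.maximum(cnt, 1.0)

def dlatlrr_level(base_prev, L, n, stride=1):
    """One DLatLRR level, Eq.(2): V_d = L * P(I_b); new base = I_b - R(V_d)."""
    Vd = L @ extract_patches(base_prev, n, stride)
    base = base_prev - reconstruct_image(Vd, base_prev.shape, n, stride)
    return Vd, base
```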

After r levels of decomposition, the input image $I_k$ is decomposed into r detail part matrices $V_{d_k}^{1:r}$ and one base part $I_{b_k}^{r}$. For each pair of corresponding detail part matrices, an adaptive fusion strategy is used to fuse them column by column, and r fused detail images $I_{d_f}^{1:r}$ are obtained. The fused detail images are calculated by Eq.(3),

$$I_{d_f}^{i} = R\left(FS(V_{d_1}^{i}, V_{d_2}^{i})\right), \quad i = 1, 2, \cdots, r \qquad (3)$$

where r is the number of decomposition levels, $R(\cdot)$ denotes the operator which reconstructs a salient feature image from a detail part matrix, and $FS(\cdot)$ is the fusion strategy, which will be introduced in the next subsection.

Because the base part contains more contour and brightness information, a weighted average strategy is utilized in our fusion method to obtain the fused base part. After the fused detail images and the fused base part have been obtained by these adaptive strategies, the fused image is reconstructed from them.

In the next subsections, the deep decomposition method, the fusion strategies and the reconstruction are presented in detail.

3.1 Deep Decomposition based on LatLRR(DDLatLRR)

Firstly, we introduce the LatLRR based decomposition method (DLatLRR). As discussed for Eq.(2), once the project matrix L has been learned by LatLRR, DLatLRR can use it to extract detail parts and base parts from input images. The process of DLatLRR is shown in Fig.3.

Fig.3. The process of DLatLRR.

In our decomposition method, the input image $I_{b_k}^{i-1}$ is divided into image patches and reshuffled into vectors by the operator $P(\cdot)$. Then the detail part matrix $V_{d_k}^{i}$ is calculated from the project matrix L and $P(I_{b_k}^{i-1})$. The salient features are shown in the detail image $R(V_{d_k}^{i})$, which is reconstructed by $R(\cdot)$. For a single level (r = 1) of our decomposition method, the detail part and base part are calculated by Eq.(2).

If r > 1, a multi-level decomposition is used and the deep decomposition method (DDLatLRR) is obtained. The framework of DDLatLRR is shown in Fig.4.

Fig.4. Deep decomposition based on LatLRR (DDLatLRR). DLatLRR indicates one level of our decomposition method, and r denotes the number of decomposition levels.

In Fig.4, $I_k$ denotes the source image, and $V_{d_k}^{i}$ and $I_{b_k}^{i}$ denote the detail part matrix and the base part produced by DLatLRR at level $i = 1,2,\cdots,r$. As Fig.4 shows, each new level decomposes the previous base part with DLatLRR, so an r-level decomposition yields r detail part matrices $V_{d_k}^{1:r}$ and one base part $I_{b_k}^{r}$, and $I_k$ can be reconstructed by adding $I_{b_k}^{r}$ and the r detail images. Adaptive strategies are then utilized to fuse the detail parts and the base part.
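Written on top of the single-level function from the previous sketch, the whole multi-level decomposition is a short loop; again this is a hypothetical NumPy illustration rather than the authors' released implementation.

```python
def ddlatlrr(img, L, n, r, stride=1):
    """DDLatLRR: apply DLatLRR r times, each time to the previous base part.
    Returns the detail part matrices V_d^{1:r} and the final base part I_b^r."""
    details, base = [], img.astype(float)
    for _ in range(r):
        Vd, base = dlatlrr_level(base, L, n, stride)   # one level of Eq.(2)
        details.append(Vd)
    # By construction, img == base + sum of the r reconstructed detail images.
    return details, base
```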

3.2 Fusion Strategies

Once the detail parts and the base part are obtained by DDLatLRR, we choose adaptive strategies

to fuse these parts.

3.2.1 For base part

The base parts of the input images contain more common features, redundant information and brightness information. So, in our fusion method, we use a weighted average strategy to obtain the fused base part, which is calculated by Eq.(4),

$$I_{b_f}(x, y) = w_{b_1} I_{b_1}^{r}(x, y) + w_{b_2} I_{b_2}^{r}(x, y), \quad s.t. \; w_{b_1} = w_{b_2} = 0.5, \qquad (4)$$

where $(x, y)$ denotes the corresponding position in the base parts $I_{b_1}^{r}$, $I_{b_2}^{r}$ and the fused base part $I_{b_f}$.


3.2.2 For detail parts

In contrast to the base part, the detail parts preserve more structural information and salient features, so the fusion strategy $FS(\cdot)$ for the detail parts must be chosen more carefully. In our method, the nuclear norm is used to calculate the weight of each pair of corresponding image patches obtained by $P(\cdot)$ from the input images. This strategy is shown in Fig.5.

Fig.5. Fusion strategy based on the nuclear norm. 'reshape' means the operator which reverses a vector back into an image patch, $\|\cdot\|_*$ indicates the nuclear norm, $w_{d_k}^{i,j}$ indicates the weight for each column, and $V_{d_f}^{i}$ denotes the fused detail part matrix.

In our fusion strategy, $V_{d_k}^{i,j}$ and $V_{d_f}^{i,j}$ denote the j-th columns of the detail part matrix $V_{d_k}^{i}$ and of the fused detail part matrix $V_{d_f}^{i}$, respectively, where i is the decomposition level and $k \in \{1,2\}$. Firstly, for each pair of corresponding columns, the weights $w_{d_k}^{i,j}$ are calculated by Eq.(5),

$$w_{d_k}^{i,j} = \frac{\hat{w}_{d_k}^{i,j}}{\sum_{p=1}^{P} \hat{w}_{d_p}^{i,j}}, \quad s.t. \; \hat{w}_{d_k}^{i,j} = \left\|re(V_{d_k}^{i,j})\right\|_*, \; P = 2, \qquad (5)$$

where $re(\cdot)$ indicates the reshape operator which reconstructs an image patch from the vector $V_{d_k}^{i,j}$, and $\|\cdot\|_*$ indicates the nuclear norm, which calculates the sum of the singular values of a matrix.

Then the fused detail part vector $V_{d_f}^{i,j}$ is obtained from the weights $w_{d_p}^{i,j}$ and the detail part vectors $V_{d_p}^{i,j}$, as shown in Eq.(6),

$$V_{d_f}^{i,j} = \sum_{p=1}^{P} w_{d_p}^{i,j} \times V_{d_p}^{i,j}, \quad P = 2. \qquad (6)$$

This fusion strategy is applied at every level, so r fused detail part matrices $V_{d_f}^{1:r}$ are obtained. Each fused detail image $I_{d_f}^{i}$ is then calculated by Eq.(7),

$$I_{d_f}^{i} = R(V_{d_f}^{i}), \qquad (7)$$

where $R(\cdot)$ indicates the reconstruction operator discussed before.
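A compact NumPy sketch of Eqs.(5)-(7), built on the patch operators introduced earlier, could look as follows; the function name and the zero-weight guard are our own additions for illustration.

```python
def fuse_details_nuclear(Vd1, Vd2, img_shape, n, stride=1):
    """Fuse two detail part matrices column by column (Eqs. 5-7): the weight of
    each column is the nuclear norm of the corresponding n x n patch."""
    Vdf = np.zeros_like(Vd1)
    for j in range(Vd1.shape[1]):
        w1 = np.linalg.norm(Vd1[:, j].reshape(n, n), ord='nuc')   # ||re(.)||_*
        w2 = np.linalg.norm(Vd2[:, j].reshape(n, n), ord='nuc')
        s = w1 + w2
        w1, w2 = (0.5, 0.5) if s < 1e-12 else (w1 / s, w2 / s)    # Eq. (5)
        Vdf[:, j] = w1 * Vd1[:, j] + w2 * Vd2[:, j]               # Eq. (6)
    return reconstruct_image(Vdf, img_shape, n, stride)           # Eq. (7)
```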


3.3 Reconstruction

Once we have the fused detail images and the fused base part, the fused image is generated by Eq.(8),

$$I_f(x, y) = I_{b_f}(x, y) + \sum_{i=1}^{r} I_{d_f}^{i}(x, y). \qquad (8)$$
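Putting the previous sketches together, an end-to-end fusion of one infrared/visible pair could be written as below; this remains an illustrative NumPy outline under the assumptions stated earlier, not the authors' MATLAB implementation.

```python
def fuse_images(ir, vis, L, n, r=2, stride=1):
    """Decompose both sources with DDLatLRR, average the base parts (Eq. 4),
    fuse each detail level with the nuclear-norm strategy (Eqs. 5-7) and add
    everything back together (Eq. 8)."""
    d1, b1 = ddlatlrr(ir, L, n, r, stride)
    d2, b2 = ddlatlrr(vis, L, n, r, stride)
    fused = 0.5 * b1 + 0.5 * b2                       # Eq. (4), w_b1 = w_b2 = 0.5
    for Vd1, Vd2 in zip(d1, d2):
        fused += fuse_details_nuclear(Vd1, Vd2, ir.shape, n, stride)
    return fused
```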

4 Learning the Project Matrix 𝑳

As discussed before, a project matrix L is learned by LatLRR, and the size of L depends only on the image patch size. The training data[23] are shown in Fig.6 and contain five pairs of infrared and visible images; the first row shows the infrared images and the second row the visible images.

Fig.6. Five pairs of infrared and visible images, which are used to learn L by LatLRR.

In the learning phase, all of these images are divided into image patches by a sliding window technique without overlapping. We choose three different image patch sizes, $n \times n$ with $n \in \{8, 16, 32\}$. Then 1200 image patches are randomly chosen to generate an input matrix X in which each column contains all pixels of one image patch, so the size of X is $N \times M$ with $N = n \times n$ and $M = 1200$.

As discussed in Section 2, the project matrix L is learned by LatLRR and ALM; in LatLRR, $\lambda$ is set to 0.4. With the three image patch sizes, three project matrices L are obtained, of size $64 \times 64$, $256 \times 256$ and $1024 \times 1024$, respectively.

If the same image patch size ($n \times n$) is then used in the sliding window technique for the test images, the learned L can be reused to extract their features.
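For readers who want to reproduce the learning step, a minimal NumPy sketch of the inexact ALM scheme for Eq.(1) is given below. The update rules follow the standard LRR/LatLRR ALM derivation in [21, 22]; the penalty schedule (mu, rho), the tolerance and the iteration cap are illustrative assumptions, and the authors' released code should be treated as the reference implementation.

```python
import numpy as np

def learn_latlrr_projection(X, lam=0.4, tol=1e-6, max_iter=500):
    """Sketch of inexact ALM for LatLRR (Eq. 1):
        min ||Z||_* + ||L||_* + lam*||E||_1  s.t.  X = XZ + LX + E.
    X holds one vectorised image patch per column; only L is returned."""
    def svt(M, tau):                               # singular value thresholding
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def shrink(M, tau):                            # soft thresholding (l1 prox)
        return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

    N, M = X.shape
    Z, L, E = np.zeros((M, M)), np.zeros((N, N)), np.zeros_like(X)
    Y1, Y2, Y3 = np.zeros_like(X), np.zeros((M, M)), np.zeros((N, N))
    mu, mu_max, rho = 1e-6, 1e6, 1.1
    inv_Z = np.linalg.inv(np.eye(M) + X.T @ X)     # constant system matrices
    inv_L = np.linalg.inv(np.eye(N) + X @ X.T)

    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)             # auxiliary variable for ||Z||_*
        S = svt(L + Y3 / mu, 1.0 / mu)             # auxiliary variable for ||L||_*
        Z = inv_Z @ (X.T @ (X - L @ X - E) + J + (X.T @ Y1 - Y2) / mu)
        L = ((X - X @ Z - E) @ X.T + S + (Y1 @ X.T - Y3) / mu) @ inv_L
        E = shrink(X - X @ Z - L @ X + Y1 / mu, lam / mu)
        R1 = X - X @ Z - L @ X - E                 # constraint residual
        Y1 += mu * R1                              # multiplier updates
        Y2 += mu * (Z - J)
        Y3 += mu * (L - S)
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(Z - J).max(), np.abs(L - S).max()) < tol:
            break
    return L
```

For an 8 x 8 patch size, X would be a 64 x 1200 matrix and the returned L a 64 x 64 project matrix, matching the sizes reported above.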

5 Experimental Results and Analysis

The aims of our experiments are to explain why the nuclear norm and the overlapping operator are used in our method, and to evaluate the fusion performance in comparison with other existing fusion methods.


5.1 Experimental setting

Our testing data are available at [24] and were collected from [25] and [26]. There are 21 pairs of infrared and visible images; a sample of them is shown in Fig.7.

Fig.7. Five pairs of source images. The top row contains infrared images, and the second row contains visible

images.

Firstly, we discuss why the overlapping operator and the nuclear norm are chosen for the fusion strategy. For the non-overlapping case, three project matrices ($L_8$, $L_{16}$, $L_{32}$) are used, corresponding to decomposition sizes of $8 \times 8$, $16 \times 16$ and $32 \times 32$.

For the overlapping operator, however, every pixel generates an image patch; with a patch size of $32 \times 32$ the input source matrix becomes extremely large and even a single decomposition level is very slow. So only the $8 \times 8$ and $16 \times 16$ decomposition sizes are used with the overlapping operator.

For the deep decomposition method, the number of decomposition levels ranges from 1 to 4, i.e. r = 1, 2, 3, 4.

For comparison, nine recent and classical fusion methods are chosen to perform the same

experiment, including: cross bilateral filter method(CBF)[27], discrete cosine harmonic wavelet

transform method(DCHWT)[28], joint sparse representation-based method(JSR)[29], the JSR model with

saliency detection fusion method(JSRSD)[13], the gradient transfer fusion method(GTF)[30], weighted

least square optimization-based method(WLS)[25], convolutional sparse representation(ConvSR)[15],

VGG-19 and multi-layers fusion strategy-based method(VggML)[18] and a CNN-based fusion

method(DeepFuse)[17].

For a quantitative comparison between our method and the other fusion methods, seven quality metrics are utilized: entropy (En); mutual information (MI); Qabf[31], which reflects the quality of visual information obtained from the fusion of the input images; FMI_dct and FMI_w[32], which calculate fast mutual information (FMI) for discrete cosine and wavelet features, respectively; a modified structural similarity SSIM_a[18]; and MS_SSIM[33], a modified structural similarity which focuses on structural information. Larger values of these metrics indicate better fusion performance.

All the experiments are implemented in MATLAB R2017b on a 2.8 GHz Intel(R) Core(TM) i5-8400 CPU with 16 GB RAM.

5.2 Why choose nuclear-norm

In this experiment, both the non-overlapping and the overlapping operator are used to divide the testing images with the sliding window technique, and the $l_1$-norm and the nuclear norm are compared as weighting schemes. The $l_1$-norm based fusion strategy is shown in Fig.8.

Fig.8. Fusion strategy based on the $l_1$-norm; $\|\cdot\|_1$ indicates the $l_1$-norm.
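The only difference between the two strategies is how the raw per-column weight of Eq.(5) is measured; a small helper (hypothetical naming, building on the NumPy sketches above) makes the contrast explicit.

```python
def column_weight(column, n, norm='nuclear'):
    """Raw weight of one detail-part column before normalisation (Eq. 5):
    nuclear norm of the reshaped n x n patch (our strategy), or the plain
    l1-norm of the vector (the Fig. 8 baseline), which ignores patch structure."""
    if norm == 'nuclear':
        return np.linalg.norm(column.reshape(n, n), ord='nuc')
    return np.abs(column).sum()
```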

The average values of the seven quality metrics over the 21 fused images obtained by our method with the non-overlapping operator and the two norm-based strategies are shown in Table 1, and Table 2 shows the corresponding values for the overlapping operator. $L_8$, $L_{16}$ and $L_{32}$ indicate that L is learned with an image patch size of $8 \times 8$, $16 \times 16$ and $32 \times 32$, respectively. With the overlapping operator the input matrix for $32 \times 32$ patches would be extremely large and the fusion very slow, so $L_{32}$ is not used there.

And in our tables, the best and second-best values are denoted in bold and red, respectively.

Table 1 The average values of the seven metrics for the 21 fused images obtained by our method with the non-overlapping operator and the two norms.

Non-overlapping              En       MI        Qabf     FMI_dct  FMI_w    SSIM_a   MS_SSIM
l1-norm       L8   level-1   6.20354  12.40708  0.38865  0.38595  0.40588  0.77641  0.87710
                   level-2   6.22697  12.45395  0.40803  0.36816  0.40117  0.77376  0.87760
                   level-3   6.24534  12.49069  0.41648  0.35635  0.39954  0.77195  0.87649
                   level-4   6.25966  12.51932  0.42051  0.34838  0.39927  0.77086  0.87494
              L16  level-1   6.21365  12.42731  0.40794  0.38040  0.40357  0.77271  0.88806
                   level-2   6.24910  12.49820  0.44080  0.35591  0.39256  0.76579  0.89484
                   level-3   6.27704  12.55409  0.45630  0.33877  0.38487  0.76019  0.89725
                   level-4   6.29874  12.59748  0.46306  0.32631  0.37932  0.75595  0.89776
              L32  level-1   6.20968  12.41936  0.39557  0.37721  0.39942  0.77327  0.88553
                   level-2   6.24695  12.49390  0.42609  0.34732  0.38379  0.76650  0.89219
                   level-3   6.27939  12.55878  0.44334  0.32513  0.37142  0.76025  0.89521
                   level-4   6.30711  12.61421  0.45275  0.30891  0.36169  0.75485  0.89660
nuclear-norm  L8   level-1   6.20768  12.41537  0.40342  0.38946  0.40813  0.77563  0.88093
                   level-2   6.23431  12.46861  0.42628  0.37347  0.40334  0.77255  0.88336
                   level-3   6.25529  12.51057  0.43494  0.36214  0.40116  0.77047  0.88322
                   level-4   6.27169  12.54337  0.43833  0.35383  0.40030  0.76919  0.88211
              L16  level-1   6.21417  12.42834  0.41765  0.38391  0.40652  0.77234  0.88931
                   level-2   6.25079  12.50157  0.45573  0.36196  0.39716  0.76491  0.89754
                   level-3   6.28051  12.56103  0.47401  0.34606  0.38998  0.75868  0.90125
                   level-4   6.30429  12.60857  0.48237  0.33409  0.38451  0.75379  0.90285
              L32  level-1   6.20754  12.41508  0.39922  0.37922  0.40077  0.77323  0.88493
                   level-2   6.24321  12.48642  0.43149  0.35092  0.38599  0.76648  0.89161
                   level-3   6.27483  12.54966  0.44965  0.32948  0.37403  0.76020  0.89485
                   level-4   6.30227  12.60454  0.45990  0.31356  0.36458  0.75469  0.89652

Table 2 The average values of the seven metrics for the 21 fused images obtained by our method with the overlapping operator and the two norms.

Overlapping                  En       MI        Qabf     FMI_dct  FMI_w    SSIM_a   MS_SSIM
l1-norm       L8   level-1   6.19811  12.39622  0.38558  0.39976  0.41494  0.77830  0.87583
                   level-2   6.41292  12.82583  0.49336  0.39962  0.42733  0.76452  0.91792
                   level-3   6.68753  13.37506  0.48276  0.38757  0.43273  0.70326  0.92864
                   level-4   6.57591  13.15182  0.37264  0.33866  0.41473  0.60962  0.88210
              L16  level-1   6.21917  12.43833  0.42405  0.40229  0.41881  0.77632  0.89272
                   level-2   6.45665  12.91330  0.52704  0.40060  0.43029  0.75556  0.94603
                   level-3   6.75216  13.50432  0.49299  0.38505  0.43449  0.68956  0.93230
                   level-4   6.98423  13.96846  0.35764  0.36075  0.43119  0.60014  0.86392
nuclear-norm  L8   level-1   6.20284  12.40569  0.40245  0.40343  0.41767  0.77772  0.88031
                   level-2   6.42438  12.84876  0.50919  0.40609  0.43028  0.76149  0.92561
                   level-3   6.70183  13.40366  0.49195  0.39115  0.43481  0.69669  0.93517
                   level-4   6.60205  13.20411  0.37112  0.34020  0.41535  0.60163  0.88644
              L16  level-1   6.21857  12.43714  0.43640  0.40623  0.42214  0.77573  0.89399
                   level-2   6.45277  12.90554  0.53936  0.40784  0.43479  0.75241  0.94796
                   level-3   6.74924  13.49849  0.49937  0.39279  0.43944  0.68260  0.93285
                   level-4   6.98267  13.96534  0.35134  0.36741  0.43546  0.59045  0.86227

From Table 1 and Table 2, the nuclear norm obtains most of the best and second-best values of the seven quality metrics in both the non-overlapping and the overlapping settings. For non-overlapping, the nuclear norm obtains four best values (Qabf, FMI_dct, FMI_w, MS_SSIM) and five second-best values (all except FMI_dct and FMI_w). For overlapping, the nuclear norm still has the advantage in both best and second-best values.

These results show that the nuclear norm achieves better performance with or without the overlapping operator. The $l_1$-norm based strategy considers only the magnitudes of the detail parts and ignores structural information, whereas the nuclear norm, the sum of the singular values of a matrix, takes the structural information of an image patch into account. Another reason is that our fusion framework is based on LatLRR, which is itself formulated with the nuclear norm. Therefore, the nuclear norm is used in our fusion framework.

5.3 Why choose overlapping operator

In this section, we analyze the influence of the overlapping operator on our method, with the fusion strategy based on the nuclear norm. The same seven quality metrics are used to evaluate the performance; the values are shown in Table 3, and the best and second-best values are denoted in bold and red, respectively.

Table 3 The average values of the seven metrics for the 21 fused images obtained by our method with the nuclear norm and the non-overlapping or overlapping operator.

nuclear-norm                    En       MI        Qabf     FMI_dct  FMI_w    SSIM_a   MS_SSIM
Non-overlapping  L8   level-1   6.20768  12.41537  0.40342  0.38946  0.40813  0.77563  0.88093
                      level-2   6.23431  12.46861  0.42628  0.37347  0.40334  0.77255  0.88336
                      level-3   6.25529  12.51057  0.43494  0.36214  0.40116  0.77047  0.88322
                      level-4   6.27169  12.54337  0.43833  0.35383  0.40030  0.76919  0.88211
                 L16  level-1   6.21417  12.42834  0.41765  0.38391  0.40652  0.77234  0.88931
                      level-2   6.25079  12.50157  0.45573  0.36196  0.39716  0.76491  0.89754
                      level-3   6.28051  12.56103  0.47401  0.34606  0.38998  0.75868  0.90125
                      level-4   6.30429  12.60857  0.48237  0.33409  0.38451  0.75379  0.90285
                 L32  level-1   6.20754  12.41508  0.39922  0.37922  0.40077  0.77323  0.88493
                      level-2   6.24321  12.48642  0.43149  0.35092  0.38599  0.76648  0.89161
                      level-3   6.27483  12.54966  0.44965  0.32948  0.37403  0.76020  0.89485
                      level-4   6.30227  12.60454  0.45990  0.31356  0.36458  0.75469  0.89652
Overlapping      L8   level-1   6.20284  12.40569  0.40245  0.40343  0.41767  0.77772  0.88031
                      level-2   6.42438  12.84876  0.50919  0.40609  0.43028  0.76149  0.92561
                      level-3   6.70183  13.40366  0.49195  0.39115  0.43481  0.69669  0.93517
                      level-4   6.60205  13.20411  0.37112  0.34020  0.41535  0.60163  0.88644
                 L16  level-1   6.21857  12.43714  0.43640  0.40623  0.42214  0.77573  0.89399
                      level-2   6.45277  12.90554  0.53936  0.40784  0.43479  0.75241  0.94796
                      level-3   6.74924  13.49849  0.49937  0.39279  0.43944  0.68260  0.93285
                      level-4   6.98267  13.96534  0.35134  0.36741  0.43546  0.59045  0.86227

From Table 3, the overlapping-based framework obtains all the best and second-best values in comparison with the non-overlapping-based framework. These results indicate that with the overlapping operator our deep decomposition method preserves more information in the detail parts, which improves the performance of our fusion method. Therefore, the overlapping operator is used to divide the input images in our deep decomposition method.


5.4 Subjective evaluation

Due to space limits, we only show the fused results for one pair of source images ("street"). These results are obtained by the nine existing fusion methods and by our algorithm (DDLatLRR) with different L and different decomposition levels, and are shown in Fig.9.

Fig.9. Experiment on the "street" images. (a) Infrared image; (b) visible image; (c) CBF; (d) DCHWT; (e) JSR; (f) JSRSD; (g) GTF; (h) WLS; (i) ConvSR; (j) VggML; (k) DeepFuse; (l)-(o) DDLatLRR ($L_8$) with levels 1 to 4; (p)-(s) DDLatLRR ($L_{16}$) with levels 1 to 4.

As can be seen from Fig.9, the fused images obtained by CBF and DCHWT contain more artifacts and their salient features are not clear. The fused images obtained by JSR, JSRSD and GTF contain many ringing artifacts around the salient features, and the detail information is also not clear. In contrast, the fused images obtained by WLS, ConvSR, VggML, DeepFuse and the proposed fusion method contain more salient features and preserve more detail information.

On the other hand, as the decomposition level increases, the salient features are enhanced by our fusion method, as shown in Fig.9(n), Fig.9(o) and Fig.9(r), Fig.9(s).

5.5 Objective Evaluation

To further demonstrate the effectiveness of our fusion method, the seven quality metrics are also used to evaluate the fusion performance of the nine existing fusion methods and our algorithm. The average values of the seven quality metrics over the 21 pairs of source images are shown in Table 4; the best and second-best values are denoted in bold and red, respectively.


Table 4 The average values of the seven quality metrics for the 21 pairs of source images.

                         En       MI        Qabf     FMI_dct  FMI_w    SSIM_a   MS_SSIM
CBF(2013)                6.85749  13.71498  0.43961  0.26309  0.32350  0.59957  0.70879
DCHWT(2012)              6.56777  13.13553  0.46592  0.38568  0.40147  0.73132  0.84326
JSR(2013)                6.72263  12.72654  0.32306  0.14236  0.18506  0.54073  0.75523
JSRSD(2017)              6.72057  13.38575  0.32281  0.14253  0.18498  0.54127  0.75517
GTF(2016)                6.63433  13.26865  0.41037  0.39787  0.41038  0.70016  0.80844
WLS(2017)                6.64071  13.28143  0.50077  0.33103  0.37662  0.72360  0.93349
ConvSR(2016)             6.25869  12.51737  0.53485  0.34640  0.34640  0.75335  0.90281
VggML(2018)              6.18260  12.36521  0.36818  0.40463  0.41684  0.77799  0.87478
DeepFuse(2017)           6.69935  13.39869  0.43797  0.41501  0.42477  0.72882  0.93353
DDLatLRR  L8   level-1   6.20284  12.40569  0.40245  0.40343  0.41767  0.77772  0.88031
               level-2   6.42438  12.84876  0.50919  0.40609  0.43028  0.76149  0.92561
               level-3   6.70183  13.40366  0.49195  0.39115  0.43481  0.69669  0.93517
               level-4   6.60205  13.20411  0.37112  0.34020  0.41535  0.60163  0.88644
          L16  level-1   6.21857  12.43714  0.43640  0.40623  0.42214  0.77573  0.89399
               level-2   6.45277  12.90554  0.53936  0.40784  0.43479  0.75241  0.94796
               level-3   6.74924  13.49849  0.49937  0.39279  0.43944  0.68260  0.93285
               level-4   6.98267  13.96534  0.35134  0.36741  0.43546  0.59045  0.86227

In Table 4, the proposed method achieves five best values (En, MI, Qabf, FMI_w and MS_SSIM) and three second-best values (FMI_dct, FMI_w and SSIM_a). These values indicate that the fused images obtained by our method are more natural and contain fewer artifacts. From the objective evaluation, our fusion method has better fusion performance than the compared methods.

Regarding EN and MI, we notice that CBF obtains the second-best values because its fused images contain more noise and artifacts, as shown in Fig.9. Our fusion method, in contrast, obtains the best EN and MI values because the salient features are enhanced as the decomposition level increases, so our fusion method may also have a feature enhancement ability.

6 Conclusions

In this paper, we proposed a novel infrared and visible image fusion method based on a deep decomposition method (DDLatLRR). Firstly, the training data are used to learn a project matrix L by LatLRR. Then DDLatLRR is utilized to decompose the source images into detail parts and a base part; in DDLatLRR, the source images are divided into image patches by a sliding window technique with the overlapping operator, and after r levels of decomposition, r detail parts and one base part are obtained. For the base part, a weighted-average strategy is used to generate the fused base part, and a nuclear-norm based fusion strategy is utilized to fuse the detail parts. Finally, the fused image is reconstructed by adding the fused base part and the fused detail parts. We evaluate the proposed method both subjectively and objectively, and the experimental results show that it exhibits better performance than the compared methods.

References:

[1] S. Li, X. Kang, L. Fang, J. Hu, and H. Yin, “Pixel-level image fusion: A survey of the state of

the art,” Inf. Fusion, vol. 33, pp. 100–112, 2017.

[2] A. Ben Hamza, Y. He, H. Krim, and A. Willsky, “A Multiscale Approach to Pixel-level Image

Fusion,” Integr. Comput. Aided. Eng., vol. 12, pp. 135–146, 2005.

[3] S. Yang, M. Wang, L. Jiao, R. Wu, and Z. Wang, “Image fusion based on a new contourlet packet,”

Inf. Fusion, vol. 11, no. 2, pp. 78–84, 2010.

[4] L. Wang, B. Li, and L. F. Tian, “EGGDD: An explicit dependency model for multi-modal medical

image fusion in shift-invariant shearlet transform domain,” Inf. Fusion, vol. 19, no. 1, pp. 29–

37, 2014.

[5] H. Pang, M. Zhu, and L. Guo, “Multifocus color image fusion using quaternion wavelet transform,”

2012 5th Int. Congr. Image Signal Process. CISP 2012, 2012.

[6] X. Luo, Z. Zhang, B. Zhang, and X. Wu, “Image Fusion with Contextual Statistical Similarity and

Nonsubsampled Shearlet Transform,” IEEE Sens. J., vol. 17, no. 6, pp. 1760–1771, 2017.

[7] D. P. Bavirisetti and R. Dhuli, “Two-scale image fusion of visible and infrared images using

saliency detection,” Infrared Phys. Technol., vol. 76, pp. 52–64, 2016.

[8] Y. Zhang, X. Bai, and T. Wang, “Boundary finding based multi-focus image fusion through multi-

scale morphological focus-measure,” Inf. Fusion, 2017.

[9] J. Zong and T. Qiu, "Medical image fusion based on sparse representation of classified image patches," Biomed. Signal Process. Control, vol. 34, pp. 195–205, 2017.

[10] X. Lu, B. Zhang, Y. Zhao, H. Liu, and H. Pei, “The infrared and visible image fusion algorithm

based on target separation and sparse representation,” Infrared Phys. Technol., vol. 67, pp.

397–407, 2014.

[11] H. Li and X.-J. Wu, “Multi-focus Noisy Image Fusion using Low-Rank Representation,” 2018.

[12] M. Yin, P. Duan, W. Liu, and X. Liang, “A novel infrared and visible image fusion algorithm based

on shift-invariant dual-tree complex shearlet transform and sparse representation,”

Neurocomputing, vol. 226, no. November 2016, pp. 182–191, 2017.

[13] C. H. Liu, Y. Qi, and W. R. Ding, “Infrared and visible image fusion method based on saliency

detection in sparse domain,” Infrared Phys. Technol., vol. 83, pp. 94–102, 2017.

[14] R. Gao, S. A. Vorobyov, and H. Zhao, “Image fusion with cosparse analysis operator,” IEEE Signal

Process. Lett., vol. 24, no. 7, pp. 943–947, 2017.

[15] Y. Liu, X. Chen, R. K. Ward, and J. Wang, “Image Fusion with Convolutional Sparse

Representation,” IEEE Signal Process. Lett., vol. 23, no. 12, pp. 1882–1886, 2016.

[16] Y. Liu, X. Chen, H. Peng, and Z. Wang, “Multi-focus image fusion with a deep convolutional neural

network,” Inf. Fusion, vol. 36, pp. 191–207, 2017.

[17] K. R. Prabhakar, V. S. Srikar, and R. V. Babu, “DeepFuse: A Deep Unsupervised Approach for

Exposure Fusion with Extreme Exposure Image Pairs,” in Proceedings of the IEEE International

Conference on Computer Vision, 2017, pp. 4724–4732.

[18] H. Li, X.-J. Wu, and J. Kittler, “Infrared and Visible Image Fusion using a Deep Learning

Framework,” in arXiv preprint arXiv:1804.06992, 2018.

[19] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.

[20] S. Li, X. Kang, and J. Hu, “Image fusion with guided filtering,” IEEE Trans. Image Process.,

vol. 22, no. 7, pp. 2864–2875, 2013.

[21] G. Liu and S. Yan, “Latent Low-Rank Representation for Subspace Segmentation and Feature

Extraction,” in Proceedings of the IEEE International Conference on Computer Vision, 2011, pp.

1615–1621.

[22] G. Liu, Z. Lin, and Y. Yu, “Robust Subspace Segmentation by Low-Rank Representation,” in

Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 663–670.

[23] H. Li, Training data,

https://github.com/exceptionLi/imagefusion_deepdecomposition/tree/master/training_data. 2018.

[24] H. Li, Testing data,

https://github.com/exceptionLi/imagefusion_deepdecomposition/tree/master/IV_images. 2018.

[25] J. Ma, Z. Zhou, B. Wang, and H. Zong, “Infrared and visible image fusion based on visual saliency

map and weighted least square optimization,” Infrared Phys. Technol., vol. 82, pp. 8–17, 2017.

[26] Alexander Toet et al., TNO Image Fusion Dataset.

https://figshare.com/articles/TN_Image_Fusion_Dataset/1008029. 2014.

[27] B. K. Shreyamsha Kumar, “Image fusion based on pixel significance using cross bilateral filter,”

Signal, Image Video Process., 2015.

[28] B. K. Shreyamsha Kumar, “Multifocus and multispectral image fusion based on pixel significance

using discrete cosine harmonic wavelet transform,” Signal, Image Video Process., vol. 7, no. 6,

pp. 1125–1143, 2013.

[29] Q. Zhang, Y. Fu, H. Li, and J. Zou, “Dictionary learning method for joint sparse representation-

based image fusion,” Opt. Eng., vol. 52, no. 5, p. 057006, 2013.

[30] J. Ma, C. Chen, C. Li, and J. Huang, “Infrared and visible image fusion via gradient transfer and

total variation minimization,” Inf. Fusion, vol. 31, pp. 100–109, 2016.

[31] C. S. Xydeas and V. Petrovic, “Objective image fusion performance measure,” Electron. Lett.,

2000.

[32] M. Haghighat and M. A. Razian, “Fast-FMI: Non-reference image fusion metric,” in 8th IEEE

International Conference on Application of Information and Communication Technologies, AICT 2014 -

Conference Proceedings, 2014.

[33] K. Ma, K. Zeng, and Z. Wang, "Perceptual quality assessment for multi-exposure image fusion," IEEE Trans. Image Process., vol. 24, no. 11, pp. 3345–3356, 2015.