
7/25/2019 Low Complexity Detection Algorithms in Large-Scale MIMO Systems

1536-1276 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TWC.2015.2495163, IEEE Transactions on Wireless Communications


Low Complexity Detection Algorithms in Large-Scale MIMO Systems

Ali Elghariani, Member, IEEE and Michael Zoltowski, Fellow, IEEE

School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906

Email: [email protected] and [email protected]

In this contribution, we present low-complexity detection algorithms for large-scale MIMO systems. These algorithms achieve significantly better bit error rate (BER) performance than known heuristic algorithms in the large-scale MIMO literature, such as Likelihood Ascent Search (LAS) and Reactive Tabu Search (RTS), especially at higher-order modulations. The proposed techniques are developed from the conventional Quadratic Programming (QP) detector. The first technique runs two stages of a QP detector and combines interference cancellation with shadow area constraints of the constellation in a novel way. The second technique is based on the Branch and Bound search tree algorithm. The efficacy of the proposed algorithms is investigated for various QAM modulations. Computer simulations show that the proposed algorithms outperform the LAS and RTS algorithms in both uncoded and turbo-coded BER performance, especially at higher QAM levels. Their complexity does not change significantly as the modulation level increases. We also develop a low-complexity extension of the QP detector for iterative detection and decoding in the QPSK case.

Index Terms: Large-scale MIMO, Quadratic Programming, Two-stage Quadratic Programming, Branch and Bound, Complexity, Iterative Detection and Decoding.

I. INTRODUCTION

A large-scale multi-input multi-output (MIMO) system (also called a massive MIMO system) uses a large number of antennas at the transmitter and/or receiver. Such systems are one of the main components of future 5G wireless communication systems [2]. The capacity of a MIMO system can be scaled up by installing more antennas at the transmitter and/or receiver, which helps meet the demands of high-data-rate applications [3], [4], [5]. Interest in these systems raises challenges in several design aspects, such as channel estimation, antenna correlation, hardware implementation, and detection complexity [6], [5]. A critical design challenge in a large-scale MIMO system is to build a detector that stays reliable and computationally efficient even as the number of antennas grows very large or the modulation order increases.

Many linear detectors and near-maximum-likelihood (near-ML) detectors have been proposed for conventional MIMO systems. However, they become noncompetitive in large-scale systems for two reasons. First, the computational complexity of some of them becomes exponential, as with Sphere Decoding (SD) and its variants [7], [8], [9]. Second, the performance of others worsens as the number of antennas increases. Examples are minimum mean square error (MMSE) detection, MMSE with ordered successive interference cancellation (MMSE-OSIC) [10], Chase [11], QR Decomposition combined with an M algorithm (QRDM) [12], and Fixed Sphere Decoding (FSD) [13], [14].

A preliminary version of this work was presented at IEEE WCNC 2015 [1], where only one algorithm was considered. In this paper, further algorithms are considered, together with extensive simulation results.

Various algorithms in the large-scale MIMO literature exhibit large-system behavior, meaning that their BER performance improves as the number of antennas increases. Examples are the family of Likelihood Ascent Search (LAS) detectors and the reactive tabu search (RTS) detectors.

LAS detectors for large-scale MIMO systems have been proposed in [15], [16], [7]. They successively search the local neighborhood of a good initial vector, such as the MMSE vector. They show near-single-antenna AWGN performance, especially when hundreds of antennas are used, with an average per-received-vector complexity of $O(N_t^3)$. Here $N_t = N_r$, where $N_t$ and $N_r$ denote the numbers of transmit and receive antennas, respectively. LAS detectors have been generalized to higher-order modulations, but they still suffer from performance deterioration as the modulation order increases. They also require a very large number of antennas, on the order of hundreds, to achieve near-single-antenna unfaded performance, and this number grows with the modulation level [16].

The RTS algorithm has also been proposed for large-scale MIMO systems with various QAM modulations [17], [13], [18]. It is a heuristic combinatorial optimization technique that forces the search to visit several neighborhood solutions and then chooses the one with the best ML metric. It achieves near-ML performance with much lower complexity than ML detection and SD, especially for low-order modulations. However, its computational complexity rises significantly as the QAM level increases, and its performance deteriorates at the same time.

In this article, we propose three algorithms for the large-scale MIMO detection problem. They can provide near-single-antenna AWGN performance with only tens of antennas, and their average complexity stays nearly constant over all modulation orders.

The first algorithm is the conventional quadratic programming (QP) detector, which reformulates the ML problem as a quadratic optimization problem. We show in this paper that it performs better than the LAS detector with no major increase in average complexity. We also show that its complexity does not grow significantly from low-order to high-order modulations. QP detectors have already been studied in conventional MIMO systems [19], [20], [21], [22], [23]. However, their performance has not been seriously compared with existing heuristic algorithms, especially in large-scale MIMO. In this work, we present performance and complexity comparisons of QP-based detectors with existing techniques, and we point out that QP-based detectors belong to the family of detectors that exhibit large-system behavior.

The second proposed algorithm improves the performance of the first algorithm with only a minor increase in complexity. It uses two stages of QP detection together with a successive interference cancellation strategy. This strategy relies on a shadow area constraint [24] to measure symbol reliability.

The third algorithm uses the Branch and Bound (BB) search tree algorithm to further improve the solution of the conventional QP detector. We do not perform the standard BB search as in [25], [26]. Instead, we propose a reduced and controlled version that offers a flexible trade-off between performance and complexity. Only a few nodes of the BB tree are explored, based on two criteria: one reduces the depth of the tree and the other reduces its width. This idea combines techniques we previously proposed in [27], [28]. The complexity of this algorithm is still high when $N_t$ is large. We reduce it dramatically, although only at high SNR, with a new pruning rule. The rule is based on the difference between the cost function of the integer problem and that of its relaxed problem at each node of the BB search tree.

To the best of our knowledge, the two proposed algorithms are new and have not been presented in the literature before, especially in conjunction with QP detectors or large-scale MIMO systems. In addition to these two algorithms, the contributions of this paper include:

(i) Reducing the complexity of the standard QP solver with no major loss in performance. This reduction is then used to implement the two proposed techniques.

(ii) Investigating the performance of the proposed techniques with a more realistic MIMO channel, namely the spatially correlated Kronecker model.

(iii) Presenting a low-complexity method that generates soft information from QP-based detectors, which can be used to implement an iterative detection and decoding receiver.

II. SYSTEM MODEL

Consider a MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas. The system employs the spatial multiplexing transmission known as Vertical Bell Laboratories Layered Space-Time (V-BLAST) [29], [10]. At the transmitter, the information bits generated by the source are mapped to symbols of a given alphabet. The mapped complex symbols are demultiplexed into $N_t$ separate, independent data streams, which form the transmitted signal vector $\tilde{\mathbf{x}} = [\tilde{x}_1, \ldots, \tilde{x}_{N_t}]^T \in \mathbb{C}^{N_t \times 1}$. The general MIMO channel model is:

$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\tilde{\mathbf{x}} + \tilde{\mathbf{n}} \qquad (1)$$

The terms in (1) are defined as follows:

- $\tilde{\mathbf{y}} = [\tilde{y}_1, \ldots, \tilde{y}_{N_r}]^T \in \mathbb{C}^{N_r \times 1}$ is the received signal vector at all $N_r$ antennas.
- $\tilde{\mathbf{H}} \in \mathbb{C}^{N_r \times N_t}$ is the flat-fading channel gain matrix, whose entries are modeled as $\mathcal{CN}(0, 1)$.
- $\tilde{\mathbf{n}}$ is the receiver AWGN vector, whose entries are modeled as i.i.d. $\mathcal{CN}(0, \sigma^2)$.

A more realistic MIMO channel is considered later in Section IV. The tilde in (1) distinguishes the complex model from the real-valued model introduced in the next section. We assume ideal channel estimation and synchronization at the receiver.
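The sketch below shows how one received vector of model (1) could be simulated using Python's standard `random` module. The function names `crandn` and `simulate_vblast` are hypothetical. A $\mathcal{CN}(0, \sigma^2)$ sample is drawn by giving its real and imaginary parts independent $N(0, \sigma^2/2)$ values, so that $E|n|^2 = \sigma^2$.

```python
import math
import random

def crandn(rng, var=1.0):
    """Draw one CN(0, var) sample: real and imaginary parts are i.i.d. N(0, var / 2)."""
    sd = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))

def simulate_vblast(x, nr, sigma2, rng):
    """Return (H, y, n) for y = H x + n as in (1).

    H is Nr x Nt with CN(0, 1) entries and n has CN(0, sigma2) entries;
    x holds the Nt transmitted complex symbols.
    """
    nt = len(x)
    H = [[crandn(rng) for _ in range(nt)] for _ in range(nr)]
    n = [crandn(rng, sigma2) for _ in range(nr)]
    y = [sum(H[r][c] * x[c] for c in range(nt)) + n[r] for r in range(nr)]
    return H, y, n

if __name__ == "__main__":
    rng = random.Random(0)
    H, y, n = simulate_vblast([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j], nr=4, sigma2=0.1, rng=rng)
    print(len(H), len(H[0]), len(y))
```

Seeding a dedicated `random.Random` instance keeps the channel and noise draws reproducible across runs.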

III. PROPOSED ALGORITHMS

A. Formulation of the Problem

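The ML problem of model (1) is equivalent to Euclidean distance minimization and can be expressed as:

$$\hat{\tilde{\mathbf{x}}} = \arg\min_{\tilde{\mathbf{x}} \in \tilde{\Omega}^{N_t}} \|\tilde{\mathbf{y}} - \tilde{\mathbf{H}}\tilde{\mathbf{x}}\|_2^2 \qquad (2)$$

where $\tilde{\Omega}^{N_t}$ is the set of all possible $N_t$-dimensional complex candidate vectors for the transmitted vector $\tilde{\mathbf{x}}$. The equivalent real-valued model of (1) is:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \qquad (3)$$

$$\mathbf{y} = \begin{bmatrix} \Re\{\tilde{\mathbf{y}}\} \\ \Im\{\tilde{\mathbf{y}}\} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} \Re\{\tilde{\mathbf{x}}\} \\ \Im\{\tilde{\mathbf{x}}\} \end{bmatrix}, \quad \mathbf{n} = \begin{bmatrix} \Re\{\tilde{\mathbf{n}}\} \\ \Im\{\tilde{\mathbf{n}}\} \end{bmatrix}, \quad \mathbf{H} = \begin{bmatrix} \Re\{\tilde{\mathbf{H}}\} & -\Im\{\tilde{\mathbf{H}}\} \\ \Im\{\tilde{\mathbf{H}}\} & \Re\{\tilde{\mathbf{H}}\} \end{bmatrix} \qquad (4)$$

In this real-valued model, the real parts of the complex data symbols are mapped to $[x_1, \ldots, x_{N_t}]$ and the imaginary parts are mapped to $[x_{N_t+1}, \ldots, x_{2N_t}]$. The equivalent ML detection problem of the real model can now be written as

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \Omega^{2N_t}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2,$$

where $\Omega = \{-\sqrt{C}+1, \ldots, -1, 1, \ldots, \sqrt{C}-1\}$ and $C$ is the QAM constellation size. Each element of this real set can be mapped to a nonnegative integer with the linear transformation $z = (x + (\sqrt{C}-1))/2$. The above ML problem then simplifies to the following optimization problem:

$$\hat{\mathbf{z}} = \arg\min_{\mathbf{z} \in \Psi^{2N_t}} \left\{ \tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right\} \qquad (5)$$

where:

- $\Psi = \{0, 1, 2, \ldots, \sqrt{C}-1\}$;
- $\mathbf{Q} = \mathbf{H}^T\mathbf{H}$ is a symmetric positive semidefinite matrix;
- $\mathbf{b} = -\mathbf{H}^T(\mathbf{y} + (\sqrt{C}-1)\mathbf{H}\mathbf{1})/2$;
- $\mathbf{1} = [1, 1, \ldots, 1]^T$ is a column vector of dimension $2N_t \times 1$.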
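The sketch below builds the real-valued model (4) from a complex channel and forms $\mathbf{Q}$ and $\mathbf{b}$ of (5). It is written in plain Python, and the helper names are hypothetical. It then enumerates every candidate of a tiny 16QAM example and checks that $\tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z}$ differs from $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2/8$ only by a constant, so (5) and the real-valued ML problem share the same minimizer. The factor 1/8 comes from substituting $\mathbf{x} = 2\mathbf{z} - (\sqrt{C}-1)\mathbf{1}$. This is a derivation check, not part of the detector.

```python
import itertools

def complex_to_real(H_c, y_c):
    """Build the real-valued model (4): y = [Re; Im], H = [[Re, -Im], [Im, Re]]."""
    nr, nt = len(H_c), len(H_c[0])
    H = [[0.0] * (2 * nt) for _ in range(2 * nr)]
    for r in range(nr):
        for c in range(nt):
            h = H_c[r][c]
            H[r][c], H[r][c + nt] = h.real, -h.imag
            H[r + nr][c], H[r + nr][c + nt] = h.imag, h.real
    y = [v.real for v in y_c] + [v.imag for v in y_c]
    return H, y

def qp_terms(H, y, C):
    """Q = H^T H and b = -H^T (y + (sqrt(C) - 1) H 1) / 2, as in (5)."""
    s = int(round(C ** 0.5)) - 1
    m, n = len(H), len(H[0])
    Q = [[sum(H[k][i] * H[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    y_shift = [y[k] + s * sum(H[k]) for k in range(m)]          # y + (sqrt(C) - 1) H 1
    b = [-sum(H[k][i] * y_shift[k] for k in range(m)) / 2.0 for i in range(n)]
    return Q, b

def qp_cost(Q, b, z):
    """Objective of (5): 0.5 z^T Q z + b^T z."""
    n = len(z)
    quad = sum(z[i] * Q[i][j] * z[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(b[i] * z[i] for i in range(n))

def ml_cost(H, y, z, C):
    """||y - H x||^2 with x = 2 z - (sqrt(C) - 1), the inverse of z = (x + sqrt(C) - 1) / 2."""
    s = int(round(C ** 0.5)) - 1
    x = [2 * zi - s for zi in z]
    return sum((y[k] - sum(H[k][j] * x[j] for j in range(len(x)))) ** 2 for k in range(len(y)))

def cost_offsets(H, y, C):
    """qp_cost - ml_cost / 8 for every candidate z in Psi^{2Nt}; all values should coincide."""
    Q, b = qp_terms(H, y, C)
    levels = range(int(round(C ** 0.5)))
    return [qp_cost(Q, b, z) - ml_cost(H, y, z, C) / 8.0
            for z in itertools.product(levels, repeat=len(H[0]))]
```

For a $2 \times 2$ complex channel with 16QAM, `cost_offsets` evaluates all $4^4 = 256$ real candidates. The spread between its largest and smallest entries should be at the level of floating-point rounding.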

B. Algorithm I: A QP Detector (Review)

One way to approximate the solution of (5) is to use QP solvers that relax the integer constraints. Problem (5) can thus be relaxed to the following:

$$\arg\min_{\mathbf{z}} \ \tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \quad \text{subject to} \quad \mathbf{0} \preceq \mathbf{z} \preceq (\sqrt{C}-1)\mathbf{1} \qquad (6)$$

Here $\mathbf{0}$ is a $2N_t \times 1$ vector of all zeros. The constraints $\mathbf{0} \preceq \mathbf{z} \preceq (\sqrt{C}-1)\mathbf{1}$ are box constraints on all elements of $\mathbf{z}$: each element (symbol) of $\mathbf{z}$ is lower bounded by 0 and upper bounded by $\sqrt{C}-1$. Problem (6) is a convex QP minimization problem.

A unique global continuous solution $\bar{\mathbf{z}}$ can be obtained with efficient interior-point solvers of reduced computational complexity [30]. Interior-point solvers matter here because, in practice, they converge in an essentially constant number of iterations, independent of the problem dimension [31]. This property is attractive from a complexity point of view, especially as the number of antennas increases.

Solving (6) yields a $2N_t$-dimensional solution vector $\bar{\mathbf{z}} = [\bar{z}_1, \ldots, \bar{z}_{2N_t}]^T \in \mathbb{R}^{2N_t}$ and a scalar cost function value $f(\bar{\mathbf{z}})$. If all elements of $\bar{\mathbf{z}}$ satisfy the integer constraints, then $\bar{\mathbf{z}}$ is the optimum solution of both (5) and (6). In general, the integer solution is obtained by quantizing $\bar{\mathbf{z}}$ to the nearest point of the constellation set $\Psi$:

$$\hat{z}_i = \mathcal{Q}[\bar{z}_i], \quad i = 1, 2, \ldots, 2N_t \qquad (7)$$

where $\mathcal{Q}[\cdot]$ quantizes to the appropriate constellation levels of the set $\Psi$.

In the next subsections, we improve the QP detector for large MIMO systems by further analyzing problem (6) in two ways: first, two-stage QP detection with interference cancellation, and second, the BB search tree [32], [25], [33]. Previous work [22], [23] showed that a randomized rounding technique performs better than simple rounding as in (7), at an additional complexity of order $O(N_t^2)$. This technique can still be combined with any of our proposed algorithms.
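Algorithm I can be sketched as follows. To keep the example short and free of dependencies, a projected-gradient box-QP solver replaces the interior-point method. This swap is an assumption: projected gradient converges more slowly and does not share the iteration-count behavior described above. The helper names are hypothetical.

```python
def solve_box_qp(Q, b, lo, hi, iters=2000):
    """Approximately minimize 0.5 z^T Q z + b^T z subject to lo <= z_i <= hi.

    Projected gradient descent with step 1/L, where L (the largest absolute
    row sum of Q) upper-bounds the largest eigenvalue of Q. This replaces the
    interior-point solver of the text only to keep the sketch short.
    """
    n = len(b)
    L = max(sum(abs(q) for q in row) for row in Q)
    z = [(lo + hi) / 2.0] * n                  # start at the center of the box
    for _ in range(iters):
        grad = [sum(Q[i][j] * z[j] for j in range(n)) + b[i] for i in range(n)]
        z = [min(hi, max(lo, z[i] - grad[i] / L)) for i in range(n)]   # step, then project
    return z

def quantize(z, num_levels):
    """Eq. (7): map each entry to the nearest integer in {0, ..., num_levels - 1}."""
    return [min(num_levels - 1, max(0, int(round(v)))) for v in z]

def qp_detect(Q, b, C, iters=2000):
    """Algorithm I sketch: solve the relaxation (6), then quantize with (7)."""
    top = int(round(C ** 0.5)) - 1             # sqrt(C) - 1, the upper box bound
    z_bar = solve_box_qp(Q, b, 0.0, float(top), iters)
    return quantize(z_bar, top + 1), z_bar
```

With noise-free observations and a full-rank $\mathbf{H}$, the unconstrained minimizer of (6) is the transmitted $\mathbf{z}$ itself. In that case the relaxation followed by rounding recovers it exactly. With noise, rounding every entry at once is exactly the weakness that Algorithm II addresses.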

C. Algorithm II: A Two-Stage QP Detector

This algorithm runs two stages of QP detection with interference cancellation to improve the detection of unreliable symbols, that is, the non-integer values of $\bar{\mathbf{z}}$ in (6). A drawback of Algorithm I is that it quantizes all symbols simultaneously, regardless of their reliability. Algorithm II therefore applies interference cancellation together with a symbol reliability measure based on shadow area constraints.

Before the quantization in (7), a shadow area is introduced between the integer points of the constellation set (similar to [24]). Any $\bar{z}_i$ that falls in this shadow area is considered unreliable. In other words, a variable of the continuous solution of (6) whose distance from its nearest integer is greater than or equal to $\delta$ is considered noisy. Such variables need a second stage of QP detection after the interference of the more reliable symbols has been canceled. We denote the positions of these unreliable symbols by the index set $\mathcal{J}$.

Variables with small fractional parts ($< \delta$), or with purely integer values, can be quantized immediately, and their quantized values are taken as the integer solution of both (6) and (5). Their effect is then canceled out to improve the solution for the noisy variables. The index set of these variables is denoted by $\mathcal{I}$ and is obtained with the following criterion:

$$\mathcal{I} = \{\, i \in \{1, 2, \ldots, 2N_t\} : |\bar{z}_i - \lfloor \bar{z}_i \rceil| < \delta \,\} \qquad (8)$$

where $\lfloor x \rceil$ denotes rounding $x$ to the nearest integer. Note that $0 < \delta \leq 0.5$. For a large $\delta$ (e.g., $\delta > 0.4$), most symbols pass the integer condition even when they are far from their nearest integers. With a moderate $\delta$, interference cancellation can improve the detection of some symbols, especially at high SNR.

We optimize $\delta$ for both minimum BER and complexity across various SNRs using simulation experiments, since an analytical optimization appears cumbersome. We found that the optimum $\delta$ lies around 0.2 to 0.3 for various QAM levels. A summary of the steps of Algorithm II is given in Table I. In the sequel, we refer to this algorithm as 2QP.
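The steps of Table I can be sketched as follows. The same projected-gradient stand-in replaces `quadprog`, which is an assumption and not the solver used in the paper. The shadow-area threshold is called `delta` here. `two_stage_qp` (a hypothetical name) returns the detected vector together with the reliable and unreliable index sets, using 0-based indices.

```python
def solve_box_qp(Q, b, lo, hi, iters=2000):
    """Projected-gradient stand-in for quadprog on (6): min 0.5 z^T Q z + b^T z, lo <= z_i <= hi."""
    n = len(b)
    L = max(sum(abs(q) for q in row) for row in Q)   # upper bound on the largest eigenvalue of Q
    z = [(lo + hi) / 2.0] * n
    for _ in range(iters):
        grad = [sum(Q[i][j] * z[j] for j in range(n)) + b[i] for i in range(n)]
        z = [min(hi, max(lo, z[i] - grad[i] / L)) for i in range(n)]
    return z

def two_stage_qp(Q, b, C, delta=0.25, iters=2000):
    """2QP sketch following Table I (0-based indices)."""
    top = int(round(C ** 0.5)) - 1
    n = len(b)
    z_bar = solve_box_qp(Q, b, 0.0, float(top), iters)                    # step 2
    I = [i for i in range(n) if abs(z_bar[i] - round(z_bar[i])) < delta]  # step 3, eq. (8)
    J = [i for i in range(n) if i not in I]                                # step 5
    z_hat = [0] * n
    for i in I:                                                            # step 4
        z_hat[i] = min(top, max(0, int(round(z_bar[i]))))
    if J:
        Q2 = [[Q[j][k] for k in J] for j in J]                             # step 6: Q(J, J)
        b2 = [sum(Q[i][j] * z_hat[i] for i in I) + b[j] for j in J]        # step 7
        z_j = solve_box_qp(Q2, b2, 0.0, float(top), iters)                 # step 8
        for j, v in zip(J, z_j):                                           # step 9
            z_hat[j] = min(top, max(0, int(round(v))))
    return z_hat, I, J
```

A toy two-dimensional case illustrates the benefit. Take $\mathbf{Q} = [[1, 0.9], [0.9, 1]]$ and $\mathbf{b}$ chosen so that the relaxed optimum is $(1.2, 1.45)$. Plain rounding gives $(1, 1)$. With `delta=0.25`, the second stage keeps $\hat{z}_1 = 1$, re-solves for the unreliable entry (relaxed value 1.63), and returns $(1, 2)$, which has the lower cost in (5).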

TABLE I
A Two-stage QP Algorithm

1: Input: $\mathbf{Q}$, $\mathbf{b}$
2: $\bar{\mathbf{z}}$ = quadprog($\mathbf{Q}$, $\mathbf{b}$) from (6)
3: Find $\mathcal{I}$ that satisfies $|\bar{\mathbf{z}} - \lfloor \bar{\mathbf{z}} \rceil| < \delta$
4: $\hat{\mathbf{z}}(\mathcal{I}) = \mathcal{Q}[\bar{\mathbf{z}}(\mathcal{I})]$
5: Find the set of indices $\mathcal{J}$
6: Find $\mathbf{Q}' = \mathbf{Q}(\mathcal{J}, \mathcal{J})$, and
7: $\mathbf{b}' = \mathbf{Q}(\mathcal{I}, \mathcal{J})^T \hat{\mathbf{z}}(\mathcal{I}) + \mathbf{b}(\mathcal{J})$
8: $\bar{\mathbf{z}}(\mathcal{J})$ = quadprog($\mathbf{Q}'$, $\mathbf{b}'$) from (9)
9: $\hat{\mathbf{z}}(\mathcal{J}) = \mathcal{Q}[\bar{\mathbf{z}}(\mathcal{J})]$

D. Algorithm III: A Controlled-Size BB Search Tree

In this section, we first briefly review the standard BB algorithm. We then introduce our proposed approximations, which control the size of the BB search tree and reduce its computational complexity.

Branch and Bound algorithm: BB is a search-tree-based algorithm that recursively forces the non-integer values of $\bar{\mathbf{z}}$ in (6) to become integers. It does so using a search tree structure [32], [35], as shown in Fig. 1. The input problem to the BB search tree is problem (6), whose optimum continuous solution and cost function value are denoted $\bar{\mathbf{z}}^{(0)}$ and $f(\bar{\mathbf{z}}^{(0)})$, respectively. The remaining nodes of the tree are denoted in the same way according to their node numbers, as depicted in Fig. 1.

The standard BB search starts by solving problem (6) at node 0 and then checks all elements of $\bar{\mathbf{z}}^{(0)}$. If they satisfy the integer constraints, node 0 needs no further exploration, because $\bar{\mathbf{z}}^{(0)}$ is the optimum integer solution of problem (6) and also of (5) [32]. Otherwise, for example when $\bar{\mathbf{z}}^{(0)}$ contains some symbols with fractional parts, the BB algorithm splits the problem at node 0 into two subproblems. It does this by adding two mutually exclusive and exhaustive constraints, as shown in Fig. 1. The new subproblems are called child nodes, and the original problem is called the parent node.

The relaxed problems at nodes 1 and 2 are the same as (6), except that the upper and lower bounds of the branching variable (say, variable $i$) are replaced with $z_i \leq \lfloor \bar{z}_i^{(0)} \rfloor$ and $z_i \geq \lceil \bar{z}_i^{(0)} \rceil$, respectively. That is, the problems at nodes 1 and 2 of level one can be written as:

$$\arg\min_{\mathbf{z}} \ \tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \quad \text{subject to} \quad \mathbf{0} \preceq \mathbf{z} \preceq (\sqrt{C}-1)\mathbf{1}, \ \ z_i \leq \lfloor \bar{z}_i^{(0)} \rfloor \qquad (11)$$

$$\arg\min_{\mathbf{z}} \ \tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \quad \text{subject to} \quad \mathbf{0} \preceq \mathbf{z} \preceq (\sqrt{C}-1)\mathbf{1}, \ \ z_i \geq \lceil \bar{z}_i^{(0)} \rceil \qquad (12)$$

where $z_i$ is called the branching variable at index $i$ ($1 \leq i \leq 2N_t$). $\lfloor \bar{z}_i^{(0)} \rfloor$ denotes the largest integer smaller than or equal to $\bar{z}_i^{(0)}$, and $\lceil \bar{z}_i^{(0)} \rceil$ denotes the smallest integer greater than or equal to $\bar{z}_i^{(0)}$.

There are various strategies for choosing the branching variable [35]. In this paper, we choose the simplest one, which branches a node at its first non-integer variable. Solving the new subproblems with the interior-point algorithm returns $(\bar{\mathbf{z}}^{(1)}, f(\bar{\mathbf{z}}^{(1)}))$ for node 1 and $(\bar{\mathbf{z}}^{(2)}, f(\bar{\mathbf{z}}^{(2)}))$ for node 2. If these solutions do not satisfy the integer constraints, each subproblem is branched into two more subproblems. Branching continues until the optimal integer solution is found; see [33] and [35] for details.

Two important pruning rules are used with the BB algorithm:

1) Whenever the cost function value of a node exceeds a known upper bound $f^{(up)}$, the node is pruned, because no better solution can be expected from the subtree below it. The upper bound can be initialized to a very large value, such as $\infty$, or computed from any available integer solution, such as the ZF or MMSE solution.

2) As mentioned above, if the solution of a node satisfies the integer constraints, no branching is needed and the node is pruned.

In this paper, we focus on the Breadth First (BF) search strategy [36], which explores the nodes of the tree level by level, as depicted in Fig. 1. We prefer this strategy because it suits our proposed approximations well. In general, applying standard BB to (6) can reach the ML solution, as shown in our previous work [27], [28]. However, in large-scale MIMO the problem dimension is $2N_t$, which makes the standard algorithm computationally expensive. Simplifications are therefore needed.

Fig. 1 Representation of the Breadth First BB search tree
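The branching step (11)-(12) and the two pruning rules can be written as small helper functions. In this sketch, each node's box is kept as lists of lower and upper bounds, and the function names are hypothetical. A relaxed entry within `tol` of an integer is treated as an integer. This tolerance is an assumption that any floating-point solver needs.

```python
import math

def first_fractional(z, tol=1e-6):
    """Index of the first non-integer entry of z (the branching rule used in the text), or None."""
    for i, v in enumerate(z):
        if abs(v - round(v)) > tol:
            return i
    return None

def branch(lo, hi, z, tol=1e-6):
    """Split a node's box into the children (11) and (12) on the first fractional entry of z."""
    i = first_fractional(z, tol)
    if i is None:
        return []                       # pruning rule 2: integer solution, no branching needed
    hi_left = list(hi)
    hi_left[i] = math.floor(z[i])       # child (11): z_i <= floor(z_i)
    lo_right = list(lo)
    lo_right[i] = math.ceil(z[i])       # child (12): z_i >= ceil(z_i)
    return [(list(lo), hi_left), (lo_right, list(hi))]

def keep_node(f_node, f_up):
    """Pruning rule 1: a node survives only if its relaxed cost does not exceed f(up)."""
    return f_node <= f_up
```

The two children are mutually exclusive and, over the integers, exhaustive: every integer point of the parent box satisfies exactly one of $z_i \leq \lfloor \bar{z}_i \rfloor$ and $z_i \geq \lceil \bar{z}_i \rceil$.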

Proposed Approximations: The algorithm proposed in this section adds the following three approximations to the standard BB algorithm.

1) Depth reduction: Instead of continuing down the search tree until the optimum integer solution is found, this approximation stops the BB search at a predefined level (layer) $L$, even if the optimum integer solution has not yet been reached. We denote the number of nodes in the stopping level $L$ by $m_L$. The solutions and cost function values of the nodes in this level are $\bar{\mathbf{z}}_L^{(p)}$ and $f(\bar{\mathbf{z}}_L^{(p)})$, respectively, where $p = 1, \ldots, m_L$. The approximate integer solution is the quantized solution of the node with the minimum cost function value at the stopping level $L$:

$$\hat{\mathbf{z}} = \mathcal{Q}[\bar{\mathbf{z}}_L^{(t)}], \quad t = \arg\min_{p} f(\bar{\mathbf{z}}_L^{(p)}), \quad p = 1, \ldots, m_L \qquad (13)$$

This approximation rests on a property of the standard BB algorithm: each time the algorithm moves down one layer of the tree, the branching rule brings at least one node closer to the optimum integer solution. In other words, along the path that leads to the optimum integer solution, the absolute difference between $\bar{\mathbf{z}}$ and its quantized version keeps shrinking. For example, in Fig. 1, assume that the standard BB algorithm finds the optimum integer solution at node 14, along the path through nodes 0, 2, 6, and 14. Then

$$|\bar{\mathbf{z}}^{(14)} - \mathcal{Q}[\bar{\mathbf{z}}^{(14)}]| \leq |\bar{\mathbf{z}}^{(6)} - \mathcal{Q}[\bar{\mathbf{z}}^{(6)}]| \leq |\bar{\mathbf{z}}^{(2)} - \mathcal{Q}[\bar{\mathbf{z}}^{(2)}]| \leq |\bar{\mathbf{z}}^{(0)} - \mathcal{Q}[\bar{\mathbf{z}}^{(0)}]|.$$

2) Width reduction: Instead of exploring all nodes at every level of the search tree, this approximation keeps only the $M$ nodes most likely to lead to the optimum solution and discards (prunes) the rest. Nodes are ranked by their cost function values. To accomplish this, we adopt the concept of the M algorithm [37], a breadth-first algorithm widely used in the QRDM technique for conventional MIMO systems [38].

3) To shorten simulation time and reduce the number of visited nodes (and hence the computations), we propose a further approximation for the BB(L,M) search tree. At any node of the tree, it compares two cost function values: that of the relaxed problem, using the continuous solution $\bar{\mathbf{z}}$, and that of the integer problem, using the rounded solution $\hat{\mathbf{z}}$. Whenever this difference is small according to the criterion below, we take the relaxed continuous solution as the integer solution. This adds one more pruning rule to the BB algorithm, because more integer solutions become available in the tree. As a result, the number of visited nodes drops significantly, especially at high SNR.

Following the notation of this section, we denote the optimum continuous solution of the relaxed problem at node $k$ by $\bar{\mathbf{z}}^{(k)}$ and its objective function value by $f(\bar{\mathbf{z}}^{(k)})$. Here $k = 0, 1, 2, \ldots, N_v$, and $N_v$ is the number of visited nodes in the search tree. Similarly, we denote the quantized solution of the same node $k$ by $\mathcal{Q}[\bar{\mathbf{z}}^{(k)}]$ and its cost function value by $f(\mathcal{Q}[\bar{\mathbf{z}}^{(k)}])$. The approximation is then:

$$\bar{\mathbf{z}}^{(k)} = \begin{cases} \mathcal{Q}[\bar{\mathbf{z}}^{(k)}] & \text{if } |f(\bar{\mathbf{z}}^{(k)}) - f(\mathcal{Q}[\bar{\mathbf{z}}^{(k)}])| \leq \epsilon\,|f(\bar{\mathbf{z}}^{(k)})| \\ \bar{\mathbf{z}}^{(k)} & \text{otherwise} \end{cases} \qquad (14)$$

where $|\cdot|$ denotes the absolute value and $\epsilon > 0$ is a small number that can be tuned to trade performance against complexity. The larger $\epsilon$ is, the lower both the performance and the complexity. The standard BB algorithm corresponds to $\epsilon = 0$. This approximation differs from the one in [25], which prunes a node only if its cost value is close to the best available upper bound.

In the sequel, we refer to Algorithm III as BB(L,M), where $L$ is the stopping level of the search tree and $M$ is the number of nodes kept at each level. A summary of BB(L,M) is given in Table II.
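The steps summarized in Table II can be sketched as follows. This is a simplified illustration that rests on several assumptions:

- the relaxed problems are solved with projected gradient descent instead of an interior-point method;
- the branching variable is the first non-integer entry, and indices are 0-based;
- the threshold of (14) is called `eps`;
- names such as `bb_lm` are hypothetical.

With a very large depth and width and `eps=0`, the search reduces to the standard breadth-first BB. This allows a sanity check against exhaustive search on a tiny problem.

```python
import itertools
import math

def solve_box_qp(Q, b, lo, hi, iters=1000):
    """Minimize 0.5 z^T Q z + b^T z s.t. lo[i] <= z[i] <= hi[i] (projected-gradient stand-in)."""
    n = len(b)
    L = max(sum(abs(q) for q in row) for row in Q)   # upper bound on the largest eigenvalue of Q
    z = [(l + h) / 2.0 for l, h in zip(lo, hi)]
    for _ in range(iters):
        grad = [sum(Q[i][j] * z[j] for j in range(n)) + b[i] for i in range(n)]
        z = [min(hi[i], max(lo[i], z[i] - grad[i] / L)) for i in range(n)]
    return z

def cost(Q, b, z):
    """f(z) = 0.5 z^T Q z + b^T z, the objective of (5) and (6)."""
    n = len(z)
    quad = sum(z[i] * Q[i][j] * z[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(bi * zi for bi, zi in zip(b, z))

def brute_force(Q, b, C):
    """Exhaustive search of (5); feasible only for tiny problems."""
    levels = range(int(round(math.sqrt(C))))
    return list(min(itertools.product(levels, repeat=len(b)), key=lambda z: cost(Q, b, z)))

def bb_lm(Q, b, C, L_stop, M, eps=0.0, tol=1e-6):
    """BB(L, M) sketch following Table II: depth limit L_stop, width M, pruning rule (14)."""
    s = int(round(math.sqrt(C))) - 1
    n = len(b)
    f_up, z_hat = math.inf, None                           # step 1
    node_list = [([0.0] * n, [float(s)] * n)]              # step 3: root = box of problem (6)
    level = 0                                              # step 4
    while node_list:                                       # step 5
        survivors = []
        for lo, hi in node_list:                           # steps 6-8
            z = solve_box_qp(Q, b, lo, hi)
            f = cost(Q, b, z)
            if f > f_up:                                   # step 9: bound pruning
                continue
            zq = [min(s, max(0, int(round(v)))) for v in z]
            is_int = all(abs(v - q) <= tol for v, q in zip(z, zq))
            if is_int or abs(f - cost(Q, b, zq)) <= eps * abs(f):   # step 11, rule (14)
                f_up, z_hat = f, zq                        # step 12
            else:
                survivors.append((f, z, lo, hi))           # step 13
        if not survivors:                                  # step 16: every node pruned
            break
        survivors.sort(key=lambda node: node[0])
        survivors = survivors[:M]                          # step 17: width reduction
        if level == L_stop:                                # step 18: depth reduction, eq. (13)
            return [min(s, max(0, int(round(v)))) for v in survivors[0][1]]
        node_list = []
        for _, z, lo, hi in survivors:                     # steps 20-22: branch each survivor
            i = next(k for k in range(n) if abs(z[k] - round(z[k])) > tol)
            hi_left = list(hi)
            hi_left[i] = float(math.floor(z[i]))           # child (11)
            lo_right = list(lo)
            lo_right[i] = float(math.ceil(z[i]))           # child (12)
            node_list += [(lo, hi_left), (lo_right, hi)]
        level += 1                                         # step 24
    return z_hat
```

A larger `eps` accepts near-integer relaxed solutions earlier. This tends to reduce the number of visited nodes at some cost in accuracy, mirroring the trade-off described for (14). Small `L_stop` and `M` cap the tree size directly.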

E. Complexity Analysis

Most of the computation in the QP detector is spent in the interior-point algorithm, which iteratively searches for a point where the Karush-Kuhn-Tucker (KKT) conditions of the optimization problem (6) hold. As shown in [30] and [39], each iteration of the interior-point algorithm boils down to solving a system of linear equations, which requires a matrix inversion. The complexity of one interior-point iteration is therefore of order $O(N_t^3)$, and $O(nN_t^3)$ for $n$ iterations. In practice, the number of iterations the interior-point algorithm needs is almost always constant, independent of the problem dimension [31]. This is very attractive for high-dimensional optimization problems.

In our simulations with the standard interior-point algorithm, the average number of iterations was 6, 7, 8, and 9 for QPSK, 16QAM, 64QAM, and 256QAM, respectively, for various numbers of antennas. In this work, we reduce these numbers to 2, 4, 5, and 6, respectively, without major loss in performance. The idea is as follows. The QP detector only approximates the continuous solution provided by the interior-point algorithm before quantizing it. Terminating the interior-point algorithm early can therefore speed up convergence to the integer solution. The early termination, applied before the quantization step in (7), is achieved by relaxing the convergence tolerance.

TABLE II
BB(L,M) Algorithm Summary

1: Initialize node LIST = empty, and $f^{(up)} = \infty$
2: Insert the values of $L$ and $M$
3: Initialize the search by adding Problem (6) to the node LIST
4: Initialize tree level $l = 0$ (root node level)
5: while (node LIST is not empty) do
6:   for loop $m = 1 : m_l$
7:     Pick a problem from the node LIST (call it problem $P^{(m)}$)
8:     Solve $P^{(m)}$ to obtain $\bar{\mathbf{z}}^{(m)}$ and $f(\bar{\mathbf{z}}^{(m)})$
9:     if $f(\bar{\mathbf{z}}^{(m)}) > f^{(up)}$: prune node $m$ and delete it from the LIST
10:    else if $f(\bar{\mathbf{z}}^{(m)}) \leq f^{(up)}$, then
11:      if $\bar{\mathbf{z}}^{(m)}$ is all integer or satisfies the if condition in (14),
12:        update $f^{(up)} = f(\bar{\mathbf{z}}^{(m)})$ and $\hat{\mathbf{z}} = \mathcal{Q}[\bar{\mathbf{z}}^{(m)}]$
13:      else keep the node problem in the node LIST, end if
14:    end if
15:  end for loop
16:  if all nodes in level $l$ are pruned, GOTO 25
17:  else select the $M$ nodes with the minimum $f(\bar{\mathbf{z}}^{(m)})$ in level $l$ and delete the rest, end if
18:  if $l = L$, then $\hat{\mathbf{z}} = \mathcal{Q}[\bar{\mathbf{z}}^{(t)}]$, $t = \arg\min_p f(\bar{\mathbf{z}}^{(p)})$,
19:    empty the node LIST, then GOTO 25
20:  else expand the selected $M$ nodes by branching each node problem into two new subproblems
21:    Push the new subproblems into the node LIST
22:    Delete the original $M$ nodes from the node LIST
23:  end if
24:  Set $l = l + 1$
25: end while

The second algorithm requires more computation than the first because of its second QP stage. Fortunately, the second QP problem is much smaller than the first, especially at medium to high SNR and when the parameter $\delta$ is optimized. As a result, Algorithms I and II have nearly the same computational complexity when the number of antennas becomes large. The interior-point algorithm in the second stage has complexity of order $O(n|\mathcal{J}|^3)$. The total complexity of Algorithm II is therefore of order $O(nN_t^3 + n|\mathcal{J}|^3)$.

Finally, the proposed controlled-size BB algorithm needs more computation than the first two algorithms, because a QP must be solved at every visited node of the search tree. Its total complexity can thus be of order $O(N_v n N_t^3)$ per received vector, where $N_v$ is the number of visited nodes in the proposed BB search tree. In large-scale MIMO systems, $n \ll N_t$. $N_v$ depends on both $L$ and $M$; from simulations, $N_v \approx LM$ at low SNR, whereas $N_v \ll LM$ at high SNR. BB(L,M) therefore requires roughly $N_v$ times the complexity of 2QP.
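The following rough worked example is an illustration under stated assumptions, not a measured result. Consider the QPSK case with $N_t = 32$ examined in Section IV. The first QP stage has real dimension $2N_t = 64$, and the average second-stage size is about 6 at $10^{-3}$ BER. Assume that the cost of each interior-point iteration grows as the cube of the problem dimension, with the same constant and the same $n$ in both stages. The relative overhead of the second stage of 2QP is then

$$\frac{n\,|\mathcal{J}|^3}{n\,(2N_t)^3} = \frac{6^3}{64^3} = \frac{216}{262144} \approx 8.2 \times 10^{-4},$$

i.e., less than 0.1% of the first stage. This is consistent with the statement that Algorithms I and II have nearly the same complexity for large $N_t$.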

The complexity of the proposed algorithms does not change significantly across QAM modulations. The small variation that exists comes from the different numbers of interior-point iterations required for each modulation. For instance, 256QAM requires on average about 3 times as many interior-point iterations as QPSK. This is an important advantage of QP-based detectors over other algorithms in the large-scale MIMO literature, such as RTS, R3TS [18], and Fixed Complexity SD [14]. The complexity of these algorithms varies greatly as the modulation order changes from low to high. For example, it grows by a factor on the order of 100 between QPSK and 64QAM for R3TS [18], and by even more for FSD.

As shown in [15], the complexity per received vector of MMSE-LAS is of order $O(N_t^3) + O(N_t^3)$: one $O(N_t^3)$ term for the MMSE initial vector and one for the LAS procedure. The extra complexity of QP and 2QP over MMSE-LAS therefore comes from the $n$ interior-point iterations of the QP detector. BB(L,M) requires approximately $nN_v$ times the complexity of MMSE-LAS.

IV. SIMULATION RESULTS

In this section, we present simulation results for uncoded and coded large-scale MIMO systems in a block flat-fading channel with $N_t = N_r$ for various QAM levels. The receiver is assumed to have perfect channel state information. We refer to our proposed algorithms as QP for Algorithm I, 2QP for Algorithm II, and BB(L,M) for Algorithm III.

We compare our proposed algorithms with other detectors, including MMSE, MMSE-OSIC [10], MMSE-LAS [15], MIV-LAS [16], and RTS [17]. MIV-LAS is a LAS algorithm that uses three initial vectors: matched filter (MF), zero forcing (ZF), and MMSE. The multiple-symbol-update LAS algorithm [7] gains only slightly over MMSE-LAS, so we limit the LAS comparison to MIV-LAS and MMSE-LAS. For a fair comparison, all detection techniques are implemented using the real-valued system model (3).

A. Optimizing $\delta$, $\epsilon$, and the Number of Iterations $n$

Figs. 2a and 2b show, using QPSK modulation as an example, that a good choice of $\delta$ can significantly improve the performance of the 2QP detector over the conventional QP detector. In this $32 \times 32$ MIMO example, values of $\delta$ between 0.25 and 0.3 give the best performance among the values tested. For instance, with $\delta = 0.25$, 2QP gains 2 dB over QP at $10^{-3}$ BER.

The problem size of the second stage of 2QP decreases as $\delta$ increases (see Fig. 2b), especially at high SNR. In general, it is far below the size of the first stage, which is $2N_t$. This keeps the complexity of 2QP close to that of the QP detector. For example, for QPSK with $N_t = 32$ and $\delta = 0.25$ at $10^{-3}$ BER, the second stage of 2QP has an average size of 6, compared with 64 in the first stage.

Fig. 2c shows that $\delta = 0.25$ is a good value across various QAM modulations, SNRs, and $N_t$. It also matches the heuristic value $\delta = \delta_{max}/2$. We therefore use $\delta = 0.25$ in the rest of the paper.

As mentioned in Section III-E, the main computational burden of the QP detector is the interior-point solver. We proposed to reduce this burden by terminating the algorithm early, which reduces the number of iterations $n$. We simulated both the QP and 2QP detectors for QPSK and 16QAM with various numbers of interior-point iterations. Figs. 3a and 3b show that 2 iterations for QPSK and 4 iterations for 16QAM are the minimum numbers that guarantee no major loss in BER performance. The same procedure gave a minimum of 5 iterations for 64QAM and 6 for 256QAM.

We used the same approach to optimize $\epsilon$ in the BB(L,M) algorithm for each modulation level and found $\epsilon = 0.01$ for QPSK, $\epsilon = 0.001$ for 16QAM, and $\epsilon = 0.0001$ for both 64QAM and 256QAM. These optimized numbers of iterations and values of $\epsilon$ are used in the rest of the simulation experiments.

B. Uncoded BER Performance vs. SNR

We choose a relatively large number of antennas, $N_t = N_r = 32$, to demonstrate the performance of our techniques. Fig. 4 presents the average uncoded BER performance of $32 \times 32$ MIMO with QPSK, 16QAM, 64QAM, and 256QAM. Both 2QP and BB(L,M) improve on the QP detector at all displayed SNRs and for all QAM modulations. Comparing 2QP with BB(L,M), for example BB(16,2) in Fig. 4a, 2QP outperforms BB(16,2) for QPSK by 0.5 dB at $10^{-3}$ BER, with even lower complexity. However, 2QP falls steadily behind BB(16,2) as the modulation order increases (see Figs. 4b, 4c, and 4d).

A more detailed simulation of the BB(L,M) algorithm is shown in Fig. 5 for 16QAM as an example. It shows that the BER performance improves as L increases. For instance, BB(4,4) outperforms BB(2,4), and BB(8,4) outperforms BB(4,4). The same figure also shows that the diversity of the system increases with L. Increasing the width of the BB tree can also improve performance, as in the case of BB(16,4) over BB(16,2); however, in some cases extending the width of BB(L,M) does not improve performance and only adds complexity, as shown in the same figure for BB(16,4) and BB(16,6). Note that in this paper we did not focus on finding the optimum values of L and M; we only show that some pairs, such as BB(16,2) and BB(4,4), are good choices to demonstrate how the algorithm works. For large Nt, especially with higher QAM levels, choosing L = Nt/2 and M = 2 is enough to outperform the other existing algorithms.

Fig. 4 shows that the advantages of the QP-based detectors become apparent when higher-order modulations are used. In the QPSK simulation shown in Fig. 4a, RTS outperforms all of our proposed techniques, and MMSE-LAS and MIV-LAS also outperform QP and BB(8,4) at certain SNRs. On the other hand, Figs. 4b, c, and d show that as the modulation level increases, QP, 2QP, and BB(L,M) become steadily superior to RTS and LAS at all displayed SNRs. For example, in Fig. 4d, the QP detector, whose BER is an upper bound on that of 2QP and BB(L,M), provides a 5 dB improvement over MMSE-LAS and a 3 dB improvement over RTS at $10^{-2}$ BER. The performance of RTS was improved using a hybrid of RTS and Belief Propagation (RTS-BP) in [40], but this only achieved a 1.6 dB improvement at $10^{-3}$ BER with 16QAM (see Fig. 3 in [40]), while our algorithms 2QP and BB(32,4) provide improvements of 2 dB and 3 dB over RTS, respectively. It is worth noting that the performance of our proposed algorithms can be further improved by combining them with the LAS or RTS algorithms, i.e., by using the output vector of QP, 2QP, or BB(L,M) as the initial vector of LAS or RTS. Simulation results for this claim are not shown extensively here, but two examples, QP with LAS using QPSK and BB(32,4) with LAS using 16QAM, are shown in Figs. 4a and 5, respectively.

Fig. 2 A two-stage QP detector: (a) QPSK BER performance of a 32 × 32 MIMO system vs. average received SNR (dB); (b) problem size of the second stage; (c) QPSK, 16QAM, and 256QAM BER performance vs. the parameter value for various Nt and SNRs.

Fig. 3 The effect of reducing interior-point (IP) iterations on the BER performance of a 32 × 32 MIMO system with QPSK and 16QAM: (a) QP detector; (b) two-stage QP detector. "Standard IP" denotes the standard interior-point algorithm.

Figs. 6a, b, c, and d present a sample of complexity computations in terms of the average number of real operations versus Nt, measured at relatively low and relatively high SNR for both 16QAM and 256QAM. The important observations from these figures are as follows: (i) The complexities of QP and 2QP are almost the same, with 2QP having the advantage of superior performance. (ii) The computational complexity of the QP and 2QP detectors is not significantly higher than that of the MMSE-LAS detector; however, their performance is substantially better, especially at higher QAM modulations. For example, for a 256QAM 32 × 32 MIMO system, 2QP has a 7 dB improvement over MMSE-LAS (see Fig. 4d), while requiring only about twice the computations of MMSE-LAS. (iii) With 256QAM, 2QP requires fewer computations than RTS, with even better performance (see Fig. 4d). (iv) At fixed Nt = Nr, the complexity of QP, 2QP, and BB(L,M) does not change significantly from 16QAM to 256QAM. (v) At relatively high SNR, the difference in complexity between BB(4,4), BB(16,2) and QP, 2QP is small, while at relatively low SNR the difference is clearly noticeable. (vi) At low SNR and lower-order modulation, such as 16QAM, RTS requires fewer computations than BB(L,M); however, at higher SNR and higher modulation order, such as 256QAM, BB(L,M) requires fewer computations. This is due to the effect of the pruning rule (14), which becomes pronounced at high SNR. (vii) Even though the complexity of RTS is close to that of BB(4,4) and BB(16,2) at 256QAM with low SNR, BB(4,4) and BB(16,2) significantly outperform RTS in BER.

C. Uncoded BER Performance vs. Nt

In Figs. 7, 8, and 9, we plot the uncoded BER performance as a function of Nt = Nr for various detectors at average received SNRs of 15 dB, 26 dB, and 39 dB for QPSK, 16QAM, and 256QAM, respectively. We compare the proposed algorithms against MMSE-LAS, RTS, MMSE-OSIC, and QRDM. MF and MMSE are also plotted for reference.

Fig. 4 Uncoded BER performance of a 32 × 32 MIMO system vs. average received SNR (dB): (a) QPSK; (b) 16QAM; (c) 64QAM; (d) 256QAM.

Fig. 5 16QAM BER performance of a 32 × 32 MIMO system using BB(L,M) with various depths L and widths M. The improvement is clear as the search moves to a deeper level.

In the case of QPSK, Fig. 7 shows that MMSE-LAS provides better performance than QP and BB(4,4) for Nt ≥ 30 and Nt ≥ 40, respectively, while it is inferior to 2QP at all displayed Nt. BB(L,M) can outperform MMSE-LAS if more levels are considered in the BB(L,M) search tree, as in the case of BB(16,2). Similarly, RTS outperforms QP, BB(4,4), and 2QP at all considered Nt; however, at higher values of L, such as 16, RTS is inferior to BB(L,M) when Nt < 20. On the other hand, for higher-order QAM (see Figs. 8 and 9), our algorithms clearly outperform the LAS and RTS algorithms.

An interesting result regarding the 2QP algorithm across various QAM modulations is that, although it requires lower complexity than BB(L,M), it achieves superior performance in some ranges of Nt. For example, with QPSK it outperforms BB(4,4) and BB(16,2) for Nt > 10 and Nt > 28, respectively. At higher QAM modulations, the value of Nt at which 2QP starts to outperform BB(L,M) increases (see Figs. 8 and 9).

We observe a flooring behavior in the BB(L,M) performance. This is because, as Nt increases, we keep the same depth L, which is not enough to correct the additional errors. This effect can be reduced if L increases adaptively with Nt. Fig. 7 shows that the effect is reduced when BB(16,2) is replaced by BB(2Nt,2).

MMSE-OSIC performs well only at smaller Nt: using QPSK, it performs better than QP for Nt ≤ 12; using 16QAM, it performs better than QP and 2QP for Nt ≤ 16; and using 256QAM, interestingly, it performs better than QP, 2QP, and BB(4,4) for Nt ≤ 45, although it requires more computations. In general, MMSE-OSIC starts to exhibit a high error floor as Nt increases, which is in line with the results shown in [15].

Fig. 6 Average complexity in terms of the number of real arithmetic operations vs. Nt: (a) 16QAM at 19 dB SNR; (b) 16QAM at 26 dB SNR; (c) 256QAM at 35 dB SNR; (d) 256QAM at 45 dB SNR.

The reduced-complexity search tree algorithms that have been studied for conventional MIMO, such as Fixed SD (FSD) [14], K-best SD, and QRDM, demonstrate poor performance in large-scale MIMO systems [41]. As an example, we present the performance of the QRDM algorithm for QPSK with M = 4 and 16QAM with M = 16. It can be seen that QRDM with M equal to the QAM constellation size provides the best performance for Nt < 10, which is the ML performance; however, as Nt grows, its BER performance deteriorates because the reduced QRDM search space becomes increasingly small compared with the ML search space.
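For reference, a generic QRD-M detector for a real-valued model can be sketched as follows: H is QR-decomposed, the layers are detected from the last one upward, and only the M partial paths with the smallest accumulated metric survive at each layer. The sketch works on a PAM alphabet per real dimension (e.g., {-3, -1, 1, 3} for 16QAM) without layer ordering, so its M is not directly comparable to the M used with the complex model above; these choices and the function names are assumptions. An exhaustive ML detector is included only as a check: when M ≥ |A|^(n-1), no path is pruned before the last layer, so QRD-M returns the ML solution.

```python
import itertools
import numpy as np

def qrdm_detect(H, y, alphabet, M):
    """QRD-M detection for a real-valued model y = Hx + n with x_k drawn from alphabet."""
    n = H.shape[1]
    Qm, R = np.linalg.qr(H)                  # reduced QR: H = Qm R, R is n x n upper-triangular
    yt = Qm.T @ y                            # rotated observation
    paths = [(0.0, [])]                      # (accumulated metric, symbols of layers k+1..n-1)
    for k in range(n - 1, -1, -1):           # detect the last layer first
        expanded = []
        for metric, syms in paths:
            # interference from the already-detected layers k+1..n-1
            interf = sum(R[k, k + 1 + j] * s for j, s in enumerate(syms))
            for a in alphabet:
                e = yt[k] - interf - R[k, k] * a
                expanded.append((metric + e * e, [a] + syms))
        expanded.sort(key=lambda p: p[0])
        paths = expanded[:M]                 # M-algorithm pruning: keep the M best paths
    return np.array(paths[0][1], dtype=float)

def ml_detect(H, y, alphabet):
    """Exhaustive ML reference: argmin ||y - Hx||^2 over all alphabet^n vectors."""
    cands = itertools.product(alphabet, repeat=H.shape[1])
    return np.array(min(cands, key=lambda x: np.sum((y - H @ np.array(x)) ** 2)), dtype=float)

rng = np.random.default_rng(3)
A = (-3.0, -1.0, 1.0, 3.0)
H = rng.standard_normal((4, 4))
x = rng.choice(A, size=4)
y = H @ x + 0.3 * rng.standard_normal(4)
print(qrdm_detect(H, y, A, M=4), ml_detect(H, y, A))
```

Because the number of surviving paths is fixed at M while the ML search space grows as |A|^n, the fraction of the tree that QRD-M explores shrinks rapidly with the number of layers.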

D. Turbo Coded BER Performance

In this subsection, we evaluate the turbo coded BER performance of the QP-based detectors compared with the MMSE, MMSE-LAS, and RTS detectors. A 32 × 32 MIMO system is examined with both 16QAM and 256QAM, and with a rate-1/3 turbo decoder performing 10 iterations. A hard-decision output vector from each detector is fed as the input to the turbo decoder; performance can be improved if a soft-decision output vector is fed instead. Fig. 10 demonstrates that, similar to the uncoded BER performance, the turbo coded BER performance of the QP-based detectors surpasses that of the RTS and LAS detectors as the modulation order increases. With 16QAM, RTS outperforms QP and 2QP by about 1.5 dB and 0.5 dB, respectively, at $10^{-2}$ BER, while with 256QAM, QP and 2QP outperform RTS by 4 dB and 4.5 dB, respectively. Nt = Nr = 32 with 16QAM and a rate-1/3 turbo code corresponds to a spectral efficiency of 32 × 1/3 × 4 = 42.67 bit/s/Hz, which becomes 85.33 bit/s/Hz when 256QAM is used. The theoretical minimum SNR required to achieve these spectral efficiencies is shown in Fig. 10.
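The spectral-efficiency figures quoted above follow from multiplying the number of transmit streams, the code rate, and the number of bits per QAM symbol. A minimal Python check (the function name is hypothetical) is:

```python
import math
from fractions import Fraction

def spectral_efficiency(nt, code_rate, qam_order):
    """Information bits per channel use: Nt streams x code rate x log2(QAM order)."""
    return nt * code_rate * int(math.log2(qam_order))

se_16 = spectral_efficiency(32, Fraction(1, 3), 16)     # exactly 128/3
se_256 = spectral_efficiency(32, Fraction(1, 3), 256)   # exactly 256/3
print(round(float(se_16), 2), round(float(se_256), 2))  # 42.67 85.33
```

Using `Fraction` keeps the rate-1/3 products exact, so rounding happens only when the result is printed.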

E. Effect of MIMO Spatial Correlation

In this subsection, we investigate the performance of the 2QP detector in a more realistic MIMO channel. We adopt a spatially correlated MIMO fading model based on the Kronecker product model [42], [43], where the complex MIMO channel matrix can be written as

$$H = R_r^{1/2} A_{iid} R_t^{1/2} \tag{15}$$

where $R_r$ and $R_t$ are the correlation matrices of the receive and transmit antennas, respectively, and $A_{iid}$ represents an i.i.d. (independent and identically distributed) Rayleigh fading channel matrix. This model assumes that the fading statistics of the transmit and receive arrays are independent. In this paper, the correlation matrices of the signals at both the transmit and receive sides are computed based on the distance between antenna elements [44], [45]. Also, this model does not take into account the structure of the scattering environment between the transmitter and receiver.
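A minimal sketch of drawing a channel from the Kronecker model (15) is given below. The paper computes $R_r$ and $R_t$ from the antenna spacing following [44], [45]; since those formulas are not reproduced here, the sketch substitutes the common exponential correlation model $R[i, j] = \rho^{|i-j|}$ as a stand-in, and $\rho$ and the helper names are assumptions. Matrix square roots are taken through an eigendecomposition, which is valid because correlation matrices are symmetric positive semidefinite.

```python
import numpy as np

def exponential_corr(n, rho):
    """Stand-in correlation model R[i, j] = rho ** |i - j| (assumed, not the model of [44], [45])."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sqrtm_psd(R):
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def kronecker_channel(Rr_half, Rt_half, rng):
    """One draw of H = Rr^{1/2} A_iid Rt^{1/2}, with A_iid having i.i.d. CN(0, 1) entries."""
    nr, nt = Rr_half.shape[0], Rt_half.shape[0]
    A = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2.0)
    return Rr_half @ A @ Rt_half

rng = np.random.default_rng(0)
Rr, Rt = exponential_corr(4, 0.5), exponential_corr(4, 0.5)
Rr_half, Rt_half = sqrtm_psd(Rr), sqrtm_psd(Rt)
trials = 20000
acc = np.zeros((4, 4), dtype=complex)
for _ in range(trials):
    H = kronecker_channel(Rr_half, Rt_half, rng)
    acc += H @ H.conj().T
print(np.round(acc.real / (trials * 4), 2))   # Monte Carlo estimate, close to Rr
```

The final average is a quick sanity check: $E[HH^H] = \mathrm{tr}(R_t) R_r$, and $\mathrm{tr}(R_t) = N_t$ for a unit-diagonal $R_t$, so $HH^H/N_t$ should average to $R_r$.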

The BER performance of the 2QP detector is considered here for illustration only. In this simulation, we consider a 16 × 16 MIMO system using 16QAM modulation for both i.i.d. fading and spatially correlated fading. The distance between antenna elements is taken to be 0.4λ (a mild correlation scenario). The effect of spatial correlation is examined for both the uncoded and the rate-1/3 turbo coded BER performance. Fig. 11 shows that there is a clear performance loss with correlated fading. For instance, at $10^{-2}$ uncoded BER, 2QP with correlated fading suffers a 6 dB degradation compared with i.i.d. fading, while for the turbo coded BER, 2QP with correlated fading exhibits a 4 dB degradation compared with i.i.d. fading. To alleviate the degradation due to correlation, we increased the dimension of the receive array, similar to the work in [7]. Fig. 11 shows that increasing the number of receive antennas by just one (i.e., Nr = 17) can dramatically alleviate this degradation. For instance, in the 16 × 17 scenario, the difference in performance at $10^{-2}$ uncoded BER is 1 dB, compared with 6 dB in the 16 × 16 scenario, whereas with turbo coding the difference reduces to 0.6 dB.

Fig. 7 QPSK BER performance vs. Nt at SNR = 15 dB.

Fig. 8 16QAM BER performance vs. Nt at SNR = 26 dB.

Fig. 9 256QAM BER performance vs. Nt at SNR = 39 dB.

Fig. 10 16QAM and 256QAM turbo coded BER performance of a rate-1/3 32 × 32 MIMO system vs. average received SNR (dB); the theoretical minimum SNRs of 3.25 dB and 9.75 dB are marked.

Fig. 11 Uncoded and rate-1/3 turbo coded BER performance of a 2QP detector with 16QAM in i.i.d. fading as well as in spatially correlated MIMO fading, for both 16 × 16 and 16 × 17 cases (uncoded SISO AWGN shown for reference).

V. ITERATIVE DETECTION AND DECODING USING A QP DETECTOR

In this section, the aim is to develop a turbo-equalization-type receiver using a QP detector. In the previous sections, the performance of QP-based detectors was studied in terms of uncoded/coded BER. To improve the performance of such detectors in the low SNR regime, a turbo-equalization-type receiver can be used, in which a detector and a decoder exchange soft information with each other iteratively, a scheme called iterative detection and decoding (IDD), until a stopping criterion is reached [46]. There are two challenges in using QP in an IDD setting. First, how to incorporate the a priori information provided by the channel decoder, in the form of log-likelihood ratios (LLRs), into the QP optimization problem (6). Second, how to make the QP detector provide soft information, in the form of LLRs, so that it can be used as a priori information by the soft-input soft-output channel decoder. This section addresses these challenges and presents an implementation and performance study for large-scale MIMO in a spatial multiplexing setup. We use the same technique as in [47] to incorporate a priori information into the QP optimization problem; however, we further propose to reduce the number of optimization problems needed to compute the LLRs by using local neighborhood solutions. Model (3) is used for the analysis, with the focus on QPSK modulation. A receiver block diagram with turbo equalization is shown in Fig. 12.

Fig. 12 Receiver side for MIMO IDD with a QP detector

Consider QPSK symbols, mapped from coded and interleaved bits, that are transmitted over a MIMO flat fading channel. At the receiver side, the complex channel model is transformed into an equivalent real-valued one, as shown in Section III-A. The real parts of the complex data symbols are mapped to $[x_1, \ldots, x_{N_t}]$, and the imaginary parts to $[x_{N_t+1}, \ldots, x_{2N_t}]$, where each bit $x_i \in \{-1, +1\}$, $i = 1, \ldots, 2N_t$. Therefore, the a posteriori LLR for bit $x_i$ is:

$$L_{post}(x_i) = \ln \frac{p(x_i = +1 \mid y, H)}{p(x_i = -1 \mid y, H)}, \quad i = 1, \ldots, 2N_t \tag{16}$$

Using Bayes' theorem, Eq. (16) can be written as:

$$L_{post}(x_i) = \ln \sum_{x \in \mathcal{X}_i^{+1}} p(y \mid x, H) P(x) - \ln \sum_{x \in \mathcal{X}_i^{-1}} p(y \mid x, H) P(x) \tag{17}$$

where $\mathcal{X}_i^{-1}$ is the set of all possible vectors $x$ satisfying $x_i = -1$ (and $\mathcal{X}_i^{+1}$ is defined similarly). $P(x)$ is the a priori probability of $x$, which, in the case of turbo equalization, is delivered by the outer channel decoder in the form of a priori LLRs:

$$L_a(x_i) = \ln \frac{p(x_i = +1)}{p(x_i = -1)}, \quad i = 1, \ldots, 2N_t \tag{18}$$

If the noise in the system is white Gaussian with variance $\sigma^2$ per real dimension, the probability density function $p(y \mid x, H)$ is proportional to $\exp(-\|y - Hx\|^2 / (2\sigma^2))$. Substituting this into (17) and applying the max-log approximation $\ln(\sum_i \exp(a_i)) \approx \max_i \{a_i\}$ [48], Eq. (17) can be simplified to:

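$$L_{post}(x_i) \approx \min_{x \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2} \|y - Hx\|^2 - \ln P(x) \right\} - \min_{x \in \mathcal{X}_i^{+1}} \left\{ \frac{1}{2\sigma^2} \|y - Hx\|^2 - \ln P(x) \right\} \tag{19}$$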
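To see what the max-log approximation does, the sketch below computes, for a tiny real-valued system with a uniform prior (so the $\ln P(x)$ terms cancel), both the exact LLR of (17) and its max-log form (19) by enumerating all $2^n$ candidate vectors. Exhaustive enumeration is feasible only for very small $n$ and is used here purely as an illustration; the function name is hypothetical.

```python
import itertools
import numpy as np

def llr_exact_and_maxlog(H, y, sigma2):
    """Exact LLRs (17) and max-log LLRs (19) for x in {-1, +1}^n under a uniform prior."""
    n = H.shape[1]
    X = np.array(list(itertools.product((-1.0, 1.0), repeat=n)))   # all 2^n candidates
    # log-likelihood of each candidate, up to a constant common to all of them
    loglik = -np.sum((y[None, :] - X @ H.T) ** 2, axis=1) / (2.0 * sigma2)
    exact, maxlog = np.empty(n), np.empty(n)
    for i in range(n):
        plus = X[:, i] > 0
        exact[i] = np.logaddexp.reduce(loglik[plus]) - np.logaddexp.reduce(loglik[~plus])
        maxlog[i] = loglik[plus].max() - loglik[~plus].max()
    return exact, maxlog

rng = np.random.default_rng(2)
H = rng.standard_normal((4, 4))
y = H @ np.array([1.0, -1.0, -1.0, 1.0]) + 0.5 * rng.standard_normal(4)
exact, maxlog = llr_exact_and_maxlog(H, y, sigma2=0.25)
print(np.round(exact, 2), np.round(maxlog, 2))
```

Since $\ln \sum_{k=1}^{K} e^{a_k}$ lies between $\max_k a_k$ and $\max_k a_k + \ln K$, and each hypothesis set here has $K = 2^{n-1}$ members, the max-log error per bit is at most $(n-1)\ln 2$.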

To find the relation between the a priori probability $P(x)$ and $L_a$, we follow the same work and assumptions as in [46] and [48]. Thus, (19) can be written as:

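$$L_{post}(x_i) \approx \min_{x \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2} \|y - Hx\|^2 - \frac{1}{2} x^T L_a \right\} - \min_{x \in \mathcal{X}_i^{+1}} \left\{ \frac{1}{2\sigma^2} \|y - Hx\|^2 - \frac{1}{2} x^T L_a \right\} \tag{20}$$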

where $x = [x_1, \ldots, x_{2N_t}]^T$ is the vector of all interleaved bits, and $L_a = [L_a(1), \ldots, L_a(2N_t)]^T$ is the vector of a priori LLRs of all interleaved bits. Focusing on the first term on the right-hand side of (20) and reformulating it as a QP problem, we get:

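$$\min_{x \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2} \|y - Hx\|^2 - \frac{1}{2} x^T L_a \right\} = \min_{z \in \Omega,\; z_i = 0} \; \frac{1}{2} z^T Q z + b^T z \tag{21}$$

where $\Omega = \{0, 1\}^{2N_t}$, $Q = H^T H$, $z = \frac{x + 1}{2}$, and $b = -\frac{1}{2} H^T (y + H\mathbf{1}) - \frac{\sigma^2}{4} L_a$, with $\mathbf{1}$ the all-ones vector. The result of (21) can be applied to both terms of (20), and after relaxing the integer constraints, Equation (20) becomes: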

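$$L_{post}(x_i) \approx \min_{0 \le z \le 1,\; z_i = 0} \left\{ \frac{1}{2} z^T Q z + b^T z \right\} - \min_{0 \le z \le 1,\; z_i = 1} \left\{ \frac{1}{2} z^T Q z + b^T z \right\} \tag{22}$$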
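The substitution $z = (x + 1)/2$ behind (21) can be checked numerically. The sketch below (hypothetical helper names) builds $Q$ and $b$ as defined after (21) and evaluates, for every binary $z$, the objective of (20) at $x = 2z - 1$ scaled by $\sigma^2/4$ minus the QP objective $\frac{1}{2} z^T Q z + b^T z$. The spread of these differences is numerically zero, i.e., the two objectives differ only by a constant, so both sides of (21) have the same minimizer, and before relaxation the difference of the two QP minima in (22) equals $\sigma^2/4$ times the corresponding difference in (20).

```python
import itertools
import numpy as np

def objective_20(H, y, sigma2, La, x):
    """Term minimized in (20): ||y - Hx||^2 / (2 sigma^2) - x^T La / 2."""
    r = y - H @ x
    return r @ r / (2.0 * sigma2) - 0.5 * x @ La

def qp_terms(H, y, sigma2, La):
    """Q and b as defined after (21), for the substitution z = (x + 1) / 2."""
    Q = H.T @ H
    b = -0.5 * H.T @ (y + H @ np.ones(H.shape[1])) - 0.25 * sigma2 * La
    return Q, b

rng = np.random.default_rng(1)
H, y = rng.standard_normal((6, 4)), rng.standard_normal(6)
La, sigma2 = rng.standard_normal(4), 0.3
Q, b = qp_terms(H, y, sigma2, La)

gaps = []
for bits in itertools.product((0.0, 1.0), repeat=4):
    z = np.array(bits)
    qp_value = 0.5 * z @ Q @ z + b @ z
    gaps.append(0.25 * sigma2 * objective_20(H, y, sigma2, La, 2.0 * z - 1.0) - qp_value)
print(max(gaps) - min(gaps))   # numerically zero: the objectives differ by a constant
```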

Equation (22) shows that evaluating the LLR of a single bit requires solving two QP problems of length $2N_t - 1$ each. The LLR computation for all $2N_t$ bits therefore requires solving a total of $4N_t$ QP problems, which is computationally expensive. Thus, as in [47], we first solve the following problem without any bit constraints,

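$$\hat{z} = \mathcal{Q}\left[ \arg\min_{0 \le z \le 1} \; \frac{1}{2} z^T Q z + b^T z \right] \tag{23}$$

where $\mathcal{Q}[\cdot]$ denotes element-wise quantization to $\{0, 1\}$; second, we solve the same problem again $2N_t$ times with bit constraints, as follows:

$$\min_{z} \; \frac{1}{2} z^T Q z + b^T z \quad \text{s.t.} \quad 0 \le z \le 1, \; z_i = \mathrm{xor}(\hat{z}_i, 1), \quad i = 1, \ldots, 2N_t \tag{24}$$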

The cost function values resulting from the minimization problems in (23) and (24) are substituted back into (22) to find $L_{post}(x_i)$. This reduces the number of problems to be solved to $2N_t + 1$.
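The $(2N_t + 1)$-problem scheme of (23) and (24) can be sketched as follows. To keep the example short and self-contained, the box-constrained QPs are solved with a plain projected-gradient method instead of an interior-point solver, and the cost of (23) is taken to be the minimum value of its relaxed problem; both choices, and all names, are assumptions for illustration. The resulting LLRs are in the QP scale of (22).

```python
import numpy as np

def box_qp(Q, b, fixed=None, iters=500):
    """Projected gradient for min 0.5 z^T Q z + b^T z s.t. 0 <= z <= 1, with optional
    equality constraints given as {index: value}. A simple stand-in for an interior-point solver."""
    step = 1.0 / np.linalg.eigvalsh(Q).max()       # 1 / Lipschitz constant of the gradient
    z = np.full(len(b), 0.5)
    free = np.ones(len(b), dtype=bool)
    for i, v in (fixed or {}).items():
        z[i], free[i] = v, False
    for _ in range(iters):
        g = Q @ z + b
        z[free] = np.clip(z[free] - step * g[free], 0.0, 1.0)
    return z, 0.5 * z @ Q @ z + b @ z

def llr_2nt_plus_1(Q, b):
    """LLRs in the QP scale of (22): one relaxed solve as in (23), then one solve per bit as in (24)."""
    z_relaxed, cost_23 = box_qp(Q, b)
    z_hat = (z_relaxed > 0.5).astype(float)        # element-wise quantization to {0, 1}
    llr = np.empty(len(b))
    for i in range(len(b)):
        _, cost_24 = box_qp(Q, b, fixed={i: 1.0 - z_hat[i]})   # z_i = xor(z_hat_i, 1)
        # (22): minimum with z_i = 0 minus minimum with z_i = 1
        llr[i] = cost_24 - cost_23 if z_hat[i] == 1.0 else cost_23 - cost_24
    return llr

llr = llr_2nt_plus_1(np.eye(2), np.array([-0.95, 0.05]))
print(llr)
```

Here $Q = I$ and $b = [-0.95, 0.05]^T$ are what (21) gives for $H = I$, $y = [0.9, -1.1]^T$, and $L_a = 0$; the sketch returns LLRs of approximately $[0.45, -0.55]$ after $2N_t + 1 = 3$ QP solves.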

As shown in [46], exchanging extrinsic information (rather than a posteriori information) between the channel detector and the channel decoder is more effective in improving the performance of the turbo equalization receiver. Thus, the required extrinsic information is calculated as follows:

$$L_E(x_i) = L_{post}(x_i) - L_a(x_i) \tag{25}$$

Although the above technique may suit conventional small MIMO systems, where the QP size is small, it is not computationally efficient for large-scale MIMO systems. For instance, with Nt = 64 and QPSK modulation, 129 QP optimization problems need to be solved to evaluate the LLRs of 128 bits using (23) and (24). Therefore, in this section, we propose a simple algorithm that solves only one optimization problem and then constructs the neighborhood set of solutions of the vector $\hat{z}$ to compute the LLR of each bit. It can be summarized in the following steps:

1) Solve the QP problem (23) once to find $\hat{z}$.

2) Then, instead of solving problem (24) $2N_t$ times, construct the closest neighborhood solutions of $\hat{z}$, as described below.

3) Use the list of solution vectors given by $\hat{z}$ and its neighborhood in (20) or (22) to compute $L_{post}(x_i)$.
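The three steps can be sketched as below, assuming step 1 has already produced the quantized solution $\hat{z}$, mapped back to $\hat{x} = 2\hat{z} - 1 \in \{-1, +1\}^{2N_t}$. The candidate list holds $\hat{x}$ and its $2N_t$ one-bit-flip neighbors (the $\pm 1$ counterpart of the construction in (26)), and the max-log LLR of (20) is evaluated over this list only, so no further QP is solved. Because each bit takes both values within the list ($\hat{x}$ itself and the neighbor that flips that bit), every LLR is finite. The function name is hypothetical.

```python
import numpy as np

def neighborhood_llr(H, y, sigma2, La, x_hat):
    """Max-log LLRs of (20) evaluated over x_hat and its one-bit-flip neighbours."""
    x_hat = np.asarray(x_hat, dtype=float)
    n = x_hat.size
    cands = np.tile(x_hat, (n + 1, 1))                   # row 0 is x_hat itself
    cands[np.arange(1, n + 1), np.arange(n)] *= -1.0     # row j + 1 flips coordinate j
    resid = y[None, :] - cands @ H.T
    cost = np.sum(resid ** 2, axis=1) / (2.0 * sigma2) - 0.5 * cands @ La
    llr = np.empty(n)
    for i in range(n):
        minus = cands[:, i] < 0                          # both values of bit i are present
        llr[i] = cost[minus].min() - cost[~minus].min()
    return llr

La = np.array([0.5, 1.0])
llr = neighborhood_llr(np.eye(2), np.array([0.9, -1.1]), 0.5, La, x_hat=[1, -1])
extrinsic = llr - La                                     # extrinsic LLRs as in (25)
print(llr, extrinsic)
```

Each of the $2N_t + 1$ candidates costs only one residual evaluation, which is what replaces the $2N_t$ constrained QP solves of (24).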

A neighborhood solution can be constructed as follows. Let the alphabet set for QPSK modulation be $\{0, 1\}$, so the symbol neighborhood of 0, $\mathcal{N}(0)$, is $\{1\}$, and $\mathcal{N}(1)$ is $\{0\}$. A vector neighbor of $\hat{z}$ is a vector that differs from $\hat{z}$ in just one coordinate; hence, there are $2N_t$ neighbor vectors of $\hat{z}$. Let the neighbor vectors be $z_{nb} = [z^{(1)}, \ldots, z^{(j)}, \ldots, z^{(2N_t)}]$, where $z^{(j)} = [z^{(j)}_1, \ldots, z^{(j)}_i, \ldots, z^{(j)}_{2N_t}]^T$, $i, j = 1, \ldots, 2N_t$, and

$$z^{(j)}_i = \begin{cases} \hat{z}_i, & i \neq j \\ \mathcal{N}(\hat{z}_i), & i = j \end{cases} \tag{26}$$

The simulation in this section uses a soft-in soft-out rate-1/2 convolutional channel decoder based on the BCJR algorithm. On the transmit side, a convolutional encoder (rate R = 1/2, generator polynomials [133 171], and constraint length 7) is used with a random interleaver, and a QPSK large-scale MIMO system with Nt = Nr = 16 and 64 is considered. The number of iterations represents the number of times the soft-input soft-output MIMO detector and the soft-input soft-output channel decoder are used.

Fig. 13 demonstrates the BER performance of three IDD iterations when the soft-in soft-out QP detector is used. It can be seen that as the number of iterations increases, a lower BER is obtained for both Nt = 16 and Nt = 64, though the difference in performance between Nt = 16 and Nt = 64 becomes visible only at higher iteration numbers, such as iteration 3. The uncoded and convolutionally coded cases are also plotted in the same figure to point out the advantages of IDD at low SNR. The coded performance of 16 × 16 and 64 × 64 represents the case where a hard-decision QP detector is followed by a hard-decision Viterbi decoder. As expected, the performance difference between hard decision and soft decision (represented by iteration 1 of IDD) is about 2 dB. Note that in this figure, the large-system behavior between 16 × 16 and 64 × 64 can be observed in both the uncoded and coded cases, whereas in IDD it can be observed only at higher iteration numbers.

The performance of our proposed technique for reducing the LLR computations is shown in Fig. 14 for Nt = 16 and Nt = 64, where the LLRs are computed from (23) together with the set of neighborhood solutions. This is compared with the case where the LLRs are computed from multiple QP computations ((23) and (24)). When Nt is relatively small, such as 16, the performance of the two techniques becomes very close as the number of iterations increases, as in the case of iteration 3 in Fig. 14a. For relatively large Nt, such as 64, the performance of the proposed technique is quite similar to that of the multiple-QP technique, and it becomes even slightly better at the third iteration, as shown in Fig. 14b. This may be due to the large-system effect, which appears more clearly at Nt = 64, because the proposed technique combines the QP solution with a LAS-like local neighborhood search when computing the LLRs.

Fig. 13 BER performance of IDD using a QP detector for QPSK 16 × 16 and 64 × 64 MIMO systems with a rate-1/2 convolutional code vs. average received SNR (dB), together with the uncoded and hard-decision coded cases (LLR computations are based on (23) and (24)).

VI. CONCLUSION

This paper proposes low-complexity detection algorithms that are suitable for large-scale MIMO with higher QAM modulations. The proposed algorithms are based on the QP detector. They improve the performance of the conventional QP detector with better trade-offs between complexity and performance. At high SNR and higher QAM modulations, the proposed algorithms outperform the LAS and RTS algorithms in both coded and uncoded BER performance. At large Nt = Nr, the 2QP algorithm is more suitable than BB(L,M) in terms of performance and complexity, while BB(L,M) provides better performance at relatively small Nt = Nr. This paper also demonstrated that QP-based detectors can be used for iterative detection and decoding with low complexity.

REFERENCES

[1] A. Elghariani and M. Zoltowski, "A quadratic programming-based detector for large-scale MIMO systems," accepted in IEEE Wireless Communications and Networking Conference (WCNC), 2015.

[2] F. Boccardi, R. W. Heath Jr, A. Lozano, T. L. Marzetta, and P. Popovski, "Five disruptive technology directions for 5G," IEEE Communications Magazine, vol. 52, no. 2, pp. 74-80, 2014.

[3] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, no. 3, pp. 311-335, 1998.

[4] H. Bolcskei, "Principles of MIMO-OFDM wireless systems," Chapter in CRC Handbook on Signal Processing for Communications, M. Ibnkahla, Ed., 2004.

[5] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, "Scaling up MIMO: Opportunities and challenges with very large arrays," IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40-60, 2013.

[6] J. Hoydis, K. Hosseini, S. t. Brink, and M. Debbah, "Making smart use of excess antennas: Massive MIMO, small cells, and TDD," Bell Labs Technical Journal, vol. 18, no. 2, pp. 5-21, 2013.

[7] S. K. Mohammed, A. Zaki, A. Chockalingam, and B. S. Rajan, "High-rate space-time coded large-MIMO systems: Low-complexity detection and channel estimation," IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 6, 2009.

[8] M. O. Damen, H. El Gamal, and G. Caire, "On maximum-likelihood detection and the search for the closest lattice point," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2389-2402, 2003.

[9] M. Hansen, B. Hassibi, A. G. Dimakis, and W. Xu, "Near-optimal detection in MIMO systems using Gibbs sampling," in Global Telecommunications Conference, GLOBECOM. IEEE, 2009, pp. 1-6.

[10] P. W. Wolniansky, G. J. Foschini, G. Golden, and R. A. Valenzuela, "V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel," in URSI International Symposium on Signals, Systems, and Electronics, ISSSE 98. IEEE, 1998, pp. 295-300.

[11] D. W. Waters and J. R. Barry, "The Chase family of detection algorithms for multiple-input multiple-output channels," in Global Telecommunications Conference, GLOBECOM'04, vol. 4. IEEE, 2004, pp. 2635-2639.






Fig. 14 IDD BER performance with reduced LLR computation (LLRs from neighbor solutions vs. LLRs from (23) and (24)): (a) QPSK 16 × 16 MIMO; (b) QPSK 64 × 64 MIMO.

[12] J. Yue, K. J. Kim, J. D. Gibson, and R. A. Iltis, Channel estimation and

data detection for mimo-ofdm systems, in Global Telecommunications

Conference, GLOBECOM03, vol. 2. IEEE, 2003, pp. 581585.[13] N. Srinidhi, S. K. Mohammed, A. Chockalingam, and B. S. Rajan,

Near-ml signal detection in large-dimension linear vector channels

using reactive tabu search, arXiv preprint arXiv:0911.4640, 2009.

[14] L. G. Barbero and J. S. Thompson, Fixing the complexity of the sphere

decoder for mimo detection, IEEE Trans. Wireless Communications,

vol. 7, no. 6, pp. 21312142, 2008.

[15] K. Vishnu Vardhan, S. K. Mohammed, A. Chockalingam, and B. Sun-

dar Rajan, A low-complexity detector for large mimo systems and mul-

ticarrier cdma systems,IEEE Journal Selected Area on Communication ,

vol. 26, no. 3, 2008.

[16] P. Li and R. D. Murch, Multiple output selection-las algorithm in large

mimo systems, IEEE Communications Letters, vol. 14, no. 5, pp. 399

401, 2010.

[17] B. S. Rajan, S. K. Mohammed, A. Chockalingam, and N. Srinidhi,

Low-complexity near-ml decoding of large non-orthogonal stbcs usingreactive tabu search, in International Symposium Info. Theory. IEEE,

2009, pp. 19931997.

[18] T. Datta, N. Srinidhi, A. Chockalingam, and B. S. Rajan, Random-

restart reactive tabu search algorithm for detection in large-mimo sys-

tems, IEEE Communications Letters, vol. 14, no. 12, pp. 11071109,

2010.

[19] F. A. Bhatti, S. A. Khan, S. Ur Rehman, and F. Rasool, Mimo ofdm

signal detection using quadratic programming, in 14th International

Multitopic Conference (INMIC). IEEE, 2011, pp. 323328.

[20] Y. Zhang, W. Lu, and T. Gulliver, Integer qp relaxation based algorithms

for ici reduction in ofdm systems, inCanadian Conference on Electrical

and Computer Engineering, CCECE. IEEE, 2007, pp. 184187.

[21] A. Mobasher, M. Taherzadeh, R. Sotirov, and A. K. Khandani, A near-

maximum-likelihood decoding algorithm for mimo systems based on

semi-definite programming, IEEE Trans. Info. Theory, vol. 53, no. 11,pp. 38693886, 2007.

[22] Z.-q. Luo, W.-k. Ma, A.-C. So, Y. Ye, and S. Zhang, Semidefinite

relaxation of quadratic optimization problems, IEEE Signal Processing

Magazine, vol. 27, no. 3, pp. 2034, 2010.

[23] W.-K. Ma, C.-C. Su, J. Jalden, T.-H. Chang, and C.-Y. Chi, The

equivalence of semidefinite relaxation mimo detectors for higher-order

qam, IEEE Journal of Selected Topics in Signal Processing, vol. 3,

no. 6, pp. 10381052, 2009.

[24] P. Li, R. C. de Lamare, and R. Fa, Multiple feedback successive

interference cancellation detection for multiuser mimo systems, IEEE

Trans. on Wireless Communications, vol. 10, no. 8, pp. 24342439, 2011.

[25] Z. Li, Y. Cai, and M. Ni, Low complexity mimo detection based on

branch and bound algorithm, in IEEE 18th International Symposium on

Personal, Indoor and Mobile Radio Communications (PIMRC). IEEE,

2007, pp. 15.

[26] A. Murugan, H. El Gamal, M. Damen, and G. Caire, A unified frame-work for tree search decoding: rediscovering the sequential decoder,

IEEE Trans. Info. Theory, vol. 52, no. 3, pp. 933953, 2006.

[27] A. Elghariani and M. D. Zoltowski, Branch and bound algorithm for

code spread ofdm, in Statistical Signal Processing Workshop (SSP).

IEEE, 2012, pp. 844847.

[28] A. Elghariani and M. Zoltowski, Branch and bound with m algorithm

for near optimal mimo detection with higher order qam constellation, in

MILITARY COMMUNICATIONS CONFERENCE (MILCOM). IEEE,

2012, pp. 15.

[29] G. J. Foschini, Layered space-time architecture for wireless commu-

nication in a fading environment when using multi-element antennas,

Bell labs technical journal, vol. 1, no. 2, pp. 4159, 1996.

[30] C. V. Rao, S. J. Wright, and J. B. Rawlings, Application of interior-point

methods to model predictive control,Journal of optimization theory and

applications, vol. 99, no. 3, pp. 723757, 1998.[31] J. Gondzio, Interior point methods 25 years later, EJOR, vol. 218,

no. 3, pp. 587601, 2012.

[32] E. Lawler and D. Wood, Branch-and-bound methods: A survey,

Operations research, pp. 699719, 1966.

[33] W. Zhang, Branch and bound search algorithm and their computational

complexity, Document of USC/Information Sciences Institute, May

1996., Tech. Rep., 1996.

[34] L.-L. Yang, Using multi-stage mmse detection to approach optimum

error performance in multiantenna mimo systems, in Vehicular Tech-

nology Conference Fall (VTC 2009-Fall), 2009 IEEE 70th. IEEE, 2009,

pp. 15.

[35] J. Clausen, Branch and bound algorithms-principles and examples,

Department of Computer Science, University of Copenhagen, pp. 130,

1999.

[36] T. Ibaraki, Theoretical comparisons of search strategies in branch-and-bound algorithms, International Journal of Parallel Programming,

vol. 5, no. 4, pp. 315344, 1976.

[37] J. Anderson and S. Mohan, “Sequential coding algorithms: A survey and cost analysis,” IEEE Transactions on Communications, vol. 32, no. 2, pp. 169–176, 1984.

[38] J. Zhang and K. Kim, “Near-capacity MIMO multiuser precoding with QRD-M algorithm,” in Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers (Asilomar). IEEE, 2005, pp. 1498–1502.

[39] M. Lau, S. Yue, K. Ling, and J. Maciejowski, “A comparison of interior point and active set methods for FPGA implementation of model predictive control,” in Proc. European Control Conference, 2009, pp. 156–160.

[40] T. Datta, N. Srinidhi, A. Chockalingam, and B. S. Rajan, “A hybrid RTS-BP algorithm for improved detection of large-MIMO M-QAM signals,” in National Conference on Communications (NCC).

[41] P. Svac, F. Meyer, E. Riegler, and F. Hlawatsch, “Soft-heuristic detectors for large MIMO systems,” IEEE Trans. Signal Processing, vol. 61, pp. 4573–4586, 2013.

[42] Y. S. Cho, J. Kim, W. Y. Yang, and C. G. Kang, MIMO-OFDM Wireless Communications with MATLAB. John Wiley & Sons, 2010.

[43] H. Ozcelik, M. Herdin, W. Weichselberger, J. Wallace, and E. Bonek, “Deficiencies of ‘Kronecker’ MIMO radio channel model,” Electronics Letters, vol. 39, no. 16, pp. 1209–1210, 2003.

[44] J. W. Wallace and M. A. Jensen, “Modeling the indoor MIMO wireless channel,” IEEE Transactions on Antennas and Propagation, vol. 50, no. 5, pp. 591–599, 2002.

[45] M. A. Jensen and J. W. Wallace, “A review of antennas and propagation for MIMO wireless communications,” IEEE Transactions on Antennas and Propagation, vol. 52, no. 11, pp. 2810–2824, 2004.

[46] M. Tuchler, R. Koetter, and A. C. Singer, “Turbo equalization: principles and new results,” IEEE Trans. Communications, vol. 50, no. 5, pp. 754–767, 2002.

[47] B. Steingrimsson, Z.-Q. Luo, and K. M. Wong, “Soft quasi-maximum-likelihood detection for multiple-antenna wireless channels,” IEEE Transactions on Signal Processing, vol. 51, no. 11, pp. 2710–2719, 2003.

[48] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Transactions on Communications, vol. 51, no. 3, pp. 389–399, 2003.

Ali Elghariani received both the B.S. and M.S. degrees in Electrical and Electronic Engineering from the University of Tripoli in 1999 and 2008, respectively, and the Ph.D. in Communications, Networking, and Signal Processing from the School of Electrical and Computer Engineering at Purdue University, West Lafayette, in 2014. He worked in industry for several years before starting his Ph.D. He is currently a lecturer in the Department of Electrical and Electronic Engineering, University of Tripoli, Libya.

During 2013, he was a system engineer intern with Qualcomm, Inc., San Diego. He received an IEEE MILCOM conference travel grant award in 2012. His current research interests are signal detection and channel estimation in large-scale MIMO systems, symbol-spreading OFDM systems, turbo equalization, and the application of quadratic programming optimization techniques in wireless communications.

Michael Zoltowski received both the B.S. and M.S. degrees in Electrical Engineering with highest honors from Drexel University in 1983 and the Ph.D. in Systems Engineering from the University of Pennsylvania in 1986. In Fall 1986, he joined the faculty of Purdue University, where he currently holds an Endowed Chaired Professorship in Electrical and Computer Engineering. In this capacity, he received the Ruth and Joel Spira Outstanding Teacher Award for 1990–1991, the 2001–2002 Wilfred Hesselberth Award for Teaching Excellence, and the Engineering Distance Education Award for 2012. In 2001, he was named a University Faculty Scholar by Purdue University. On 25 September 2008, he became the Thomas J. and Wendy Engibous Professor of Electrical and Computer Engineering, an Endowed Chair conferred by the Board of Trustees of Purdue University. Prof. Zoltowski is a co-recipient of a 2014 IEEE Globecom Best Paper Award and a 21st Humantech Paper Award: Silver Prize sponsored by Samsung. He is also the recipient of a 2002 Technical Achievement Award from the IEEE Signal Processing Society. In addition, he served as a 2003 Distinguished Lecturer for the IEEE Signal Processing Society. He is a Fellow of IEEE. He is a recipient of the 2006 Distinguished Alumni Award from Drexel University. Prof. Zoltowski is a co-recipient of the IEEE Communications Society 2001 Leonard G. Abraham Prize Paper Award in the Field of Communications Systems. He is also the recipient of the IEEE Signal Processing Society's 1991 Paper Award, the Fred Ellersick MILCOM Award for Best Paper in the Unclassified Technical Program at the 1998 IEEE Military Communications Conference, and a Best Paper Award at the 2000 IEEE International Symposium on Spread Spectrum Techniques and Applications. In addition, from 1998 to 2001, Dr. Zoltowski served as an elected Member-at-Large of the Board of Governors and Secretary of the IEEE Signal Processing Society. From 2003 to 2005, he served on the Awards Board of the IEEE Signal Processing Society and also served as the Area Editor in charge of Feature Articles for the IEEE Signal Processing Magazine. Within the IEEE Signal Processing Society, he has been a member of the Technical Committee for the Statistical Signal and Array Processing Area, the Technical Committee for DSP Education, and the Technical Committee on Signal Processing for Communications (SPCOM). From 2003 to 2004, he served as Vice-Chair of the Technical Committee on Sensor and Multichannel (SAM) Processing, and he served as Chair for 2005–2006. He has served as an Associate Editor for both the IEEE Transactions on Signal Processing and the IEEE Communications Letters. He was Technical Chair for the 2006 IEEE Sensor Array and Multichannel Workshop. He served as Vice-President for Awards & Membership for the IEEE Signal Processing Society, 2008–2010.
