Low Complexity Detection Algorithms in Large-Scale MIMO Systems


  • 7/25/2019 Low Complexity Detection Algorithms in Large-Scale MIMO Systems

    1536-1276 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See

    http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information:

    10.1109/TWC.2015.2495163, IEEE Transactions on Wireless Communications


    Low Complexity Detection Algorithms in Large-Scale MIMO

    Systems

    Ali Elghariani, Member, IEEE, and Michael Zoltowski, Fellow, IEEE

    School of Electrical and Computer Engineering Purdue University, West Lafayette IN 47906

    Email: [email protected] and [email protected]

    In this contribution, we present low-complexity detection algorithms for large-scale MIMO systems that achieve significantly better bit error rate (BER) performance than known heuristic algorithms in the large-scale MIMO literature, such as the Likelihood Ascent Search (LAS) and Reactive Tabu Search (RTS) algorithms, especially at higher-order modulations. The proposed techniques are developed from the conventional Quadratic Programming (QP) detector. The first one is based on performing two stages of a QP detector with a novel combination of both interference cancellation and shadow area constraints of the constellation. The second one is based on the Branch and Bound search tree algorithm. The efficacy of the proposed algorithms is investigated at various QAM modulations. Computer simulations show that the proposed algorithms outperform the LAS and RTS algorithms in both uncoded and turbo coded BER performance, especially at higher QAM levels, with no significant change in complexity as the modulation level increases. Also, an extension of the QP detector for iterative detection and decoding is developed for the case of QPSK using a low complexity approach.

    Index Terms: Large-scale MIMO, Quadratic Programming, Two-stage Quadratic Programming, Branch and Bound, Complexity, Iterative Detection and Decoding.

    I. INTRODUCTION

    A large-scale multi-input multi-output (MIMO) (or a so-

    called Massive MIMO) system in which a large number of

    antennas is used at the transmitter and/or receiver is one

    of the main components of the future 5G wireless commu-

    nication systems [2]. The capacity of this MIMO system

    can be scaled up by installing more antennas at the trans-

    mitter and/or receiver to fulfill the demands for high data

    rate applications [3], [4], [5]. The interest in these systems

    poses challenges in several design aspects, such as channel estimation, antenna correlation, hardware implementation, and

    detection complexity [6],[5]. In particular, a critical design

    challenge in a large-scale MIMO system is to design a reliable

    and computationally efficient detector even if the number of

    antennas grows very large or the modulation order increases.

    There have been many linear detectors and near-Maximum

    Likelihood detectors proposed in the literature of conventional

    MIMO systems; however, they become noncompetitive when

    [Footnote: A preliminary version of this work was presented at IEEE WCNC 2015 [1], in which only one algorithm is considered. In this paper, further algorithms are considered with extensive simulation results.]

    used to serve large-scale systems. One reason is because their

    computational complexity becomes exponential, such as the

    case of Sphere Decoding (SD) and its variants [7], [8], [9].

    Another reason is because the performance worsens as the

    number of antennas increases, such as the cases of minimum

    mean square error (MMSE), MMSE with ordered successive

    interference cancellation (MMSE-OSIC) [10], Chase [11], QR Decomposition combined with an M algorithm (QRDM) [12],

    and Fixed Sphere Decoding (FSD) [13], [14].

    Various algorithms have been presented in the literature of

    large-scale MIMO that exhibit a large-system behavior where

    the BER performance improves as the number of antennas

    increases, such as the family of Likelihood Ascent Search

    (LAS) detectors and the reactive tabu search (RTS) detectors.

    LAS detectors have been proposed in [15], [16], [7] for

    large-scale MIMO systems. They are based on successively

    searching the local neighborhood of some good initial vectors,

    such as MMSE vector. They show near-single antenna AWGN

    performance, especially when hundreds of antennas are used,

    with an average per-received vector complexity of $O(N_t^3)$,

    where $N_t = N_r$, and $N_t$ and $N_r$ denote the number of transmit and receive antennas, respectively. LAS detectors have

    been generalized for higher order modulations; however, they

    still suffer from performance deterioration as the modulation

    order increases. They also require a very large number of

    antennas, in the order of hundreds, to achieve near-single

    antenna unfaded performance. This number increases as the

    modulation level increases [16]. The RTS algorithm has also

    been proposed for large-scale MIMO systems with various

    QAM modulations in [17], [13], [18]. It is a heuristic-based

    combinatorial optimization technique that forces the search to

    visit several neighborhood solutions and then choose the ML

    solution among them. It achieves near-ML performance withmuch lower complexity compared to ML and SD, especially in

    low-order modulations; however, its computational complexity

    scales up significantly with increasing QAM levels accompa-

    nied by performance deterioration.

    In this article, three potential algorithms are proposed for a

    large-scale MIMO detection problem. They can provide near-

    single antenna AWGN performance with only tens of antennas

    and with nearly constant average complexity over all modu-

    lation orders. The first algorithm is simply the conventional

    quadratic programming detector in which the ML problem is

    reformulated using a quadratic optimization problem. We show

    in this paper that it provides better performance than the LAS

    detector with no major increase in average complexity. We

    also show that its complexity does not grow significantly from

    a low-order to a high-order modulation. While QP detectors

    have already been studied in conventional MIMO systems

    [19], [20], [21], [22], [23], their performance comparisons

    with existing heuristic algorithms (especially in large-scale

    MIMO) have not been seriously considered. In this work, we

    present the performance and complexity comparisons of QP-

    based detectors with existing techniques and point out that they

    are among the family of detectors that exhibit large-system

    behavior.

    The second proposed algorithm improves the performance

    of the first algorithm with a minor complexity increase. The

    improvement is based on the use of two stages of QP detector

    with a successive interference cancellation strategy that utilizes

    a shadow area constraint [24] to measure symbol reliability.

    Finally, the third algorithm uses the Branch and Bound (BB)

    search tree algorithm to further improve the solution of the

    conventional QP detector. In this algorithm, we do not perform

    the standard BB search tree as in [25], [26]; rather, a reduced and controlled version is proposed to provide a flexible trade-off

    between performance and complexity. A few nodes are

    explored in the BB tree based on two criteria: one reduces

    the depth of the BB tree and the other reduces the width of

    the BB tree. This idea is based on combining our previously

    proposed techniques which were used in [27], [28]. Although

    the complexity of this algorithm is still high when Nt is large

    at all SNRs, we reduced it dramatically (although, only at

    high SNR) by applying a new pruning rule based on the

    difference between the cost function of the integer problem

    and its relaxed problem in each node of the BB search tree.

    To the best of our knowledge the two proposed algorithms

    are new and have not been presented in the literature before, especially in conjunction with QP detectors or large-scale

    MIMO systems. In addition to these two algorithms, the

    contribution of this paper includes: (i) Reducing complexity

    of the standard QP solver with no major loss in performance.

    This reduction is then used in implementing the two proposed

    techniques. (ii) Investigating the performance of the proposed

    techniques with a more realistic MIMO channel (the spatially

    correlated Kronecker Model). And (iii) presenting a low com-

    plexity method that generates soft information from QP-based

    detectors that can be used to implement an iterative detection

    and decoding receiver.

    II. SYSTEM MODEL

    Consider a MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas employing a spatial multiplexing transmission
    known as Vertical Bell Laboratories Layered Space-Time
    (V-BLAST) [29], [10]. At the transmitter side, the information
    is generated in the source and mapped to symbols of different
    alphabets. The mapped complex symbols are demultiplexed
    into $N_t$ separate independent data streams with a transmitted
    signal vector $\tilde{x} = [\tilde{x}_1, \ldots, \tilde{x}_{N_t}]^T \in \mathbb{C}^{N_t \times 1}$. The general
    MIMO channel model is:

    $$\tilde{y} = \tilde{H}\tilde{x} + \tilde{n} \qquad (1)$$

    where $\tilde{y} = [\tilde{y}_1, \ldots, \tilde{y}_{N_r}]^T \in \mathbb{C}^{N_r \times 1}$ is the received signal
    vector at all $N_r$ antennas, $\tilde{H} \in \mathbb{C}^{N_r \times N_t}$ denotes the flat fading channel gain matrix whose entries are modeled as
    $\mathcal{CN}(0, 1)$, and $\tilde{n}$ represents the receiver AWGN noise vector whose entries are modeled as i.i.d. $\mathcal{CN}(0, \sigma^2)$. A more realistic MIMO channel will be considered later in Section IV. The tilde
    symbol in (1) is used to distinguish the complex model from
    the real model shown in the next section. We assume ideal
    channel estimation and synchronization at the receiver end.

    III. PROPOSED ALGORITHMS

    A. Formulation of the Problem

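    The ML problem of model (1), which is equivalent to Euclidean distance minimization, can be expressed as:

    $$\hat{\tilde{x}} = \arg\min_{\tilde{x} \in \tilde{\Omega}^{N_t}} \|\tilde{y} - \tilde{H}\tilde{x}\|_2^2 \qquad (2)$$

    where $\tilde{\Omega}^{N_t}$ is the set of all possible $N_t$-dimensional complex candidate vectors of the transmitted vector $\tilde{x}$. The equivalent real-valued model of (1) is:

    $$y = Hx + n \qquad (3)$$

    $$y = \begin{bmatrix} \Re\{\tilde{y}\} \\ \Im\{\tilde{y}\} \end{bmatrix}, \quad x = \begin{bmatrix} \Re\{\tilde{x}\} \\ \Im\{\tilde{x}\} \end{bmatrix}, \quad n = \begin{bmatrix} \Re\{\tilde{n}\} \\ \Im\{\tilde{n}\} \end{bmatrix}, \quad H = \begin{bmatrix} \Re\{\tilde{H}\} & -\Im\{\tilde{H}\} \\ \Im\{\tilde{H}\} & \Re\{\tilde{H}\} \end{bmatrix} \qquad (4)$$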
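    The stacking in (3) and (4) can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`to_real_model`, `matvec`) and the 2x2 channel values are illustrative assumptions, not taken from the paper.

```python
def to_real_model(H_c, y_c):
    """Stack real and imaginary parts as in the real-valued model (3)-(4)."""
    n_r, n_t = len(H_c), len(H_c[0])
    H = [[0.0] * (2 * n_t) for _ in range(2 * n_r)]
    for r in range(n_r):
        for c in range(n_t):
            h = H_c[r][c]
            H[r][c] = h.real                # upper-left block:   Re{H}
            H[r][c + n_t] = -h.imag         # upper-right block: -Im{H}
            H[r + n_r][c] = h.imag          # lower-left block:   Im{H}
            H[r + n_r][c + n_t] = h.real    # lower-right block:  Re{H}
    y = [v.real for v in y_c] + [v.imag for v in y_c]
    return H, y


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


# Hypothetical noiseless 2x2 example with 16-QAM symbols.
H_c = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.7j, 2 - 0.5j]]
x_c = [3 - 1j, -1 + 3j]
y_c = [sum(h * x for h, x in zip(row, x_c)) for row in H_c]

H, y = to_real_model(H_c, y_c)
x = [v.real for v in x_c] + [v.imag for v in x_c]   # [Re{x}; Im{x}]
y_check = matvec(H, x)                              # should reproduce y
```

    The real-valued product `y_check` reproduces the stacked complex product, which is why all detectors in the sequel can work on the $2N_t$-dimensional real model.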

    In this real-valued system model, the real part of the complex data symbols is mapped to $[x_1, \ldots, x_{N_t}]$ and the imaginary part of these symbols is mapped to $[x_{N_t+1}, \ldots, x_{2N_t}]$. Now the equivalent ML detection problem of the real model can be written as $\hat{x} = \arg\min_{x \in \Omega^{2N_t}} \|y - Hx\|_2^2$, where the set $\Omega = \{-\sqrt{C}+1, \ldots, -1, 1, \ldots, \sqrt{C}-1\}$, and $C$ is the QAM constellation size. Each element of this real set can be transformed to a non-negative integer using the following linear transformation:
    $z = (x + (\sqrt{C}-1))/2$. The above ML problem can be simplified to
    the following optimization problem:

    $$\hat{z} = \arg\min_{z \in \Lambda^{2N_t}} \left\{ \tfrac{1}{2} z^T Q z + b^T z \right\} \qquad (5)$$

    where $\Lambda = \{0, 1, 2, \ldots, \sqrt{C}-1\}$, $Q = H^T H$ is a symmetric positive semidefinite matrix, $b = -H^T(y + (\sqrt{C}-1)H\mathbf{1})/2$, and $\mathbf{1} = [1, 1, \ldots, 1]^T$ is a column vector of dimension $(2N_t \times 1)$.
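    As a sanity check on (5), the sketch below builds $Q$ and $b$ for a small real-valued system and verifies that the quadratic cost differs from the Euclidean distance only by a scale factor of 8 and a constant that does not depend on $z$. The function names and the numerical values are assumptions made for illustration.

```python
import math


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def build_qp(H, y, C):
    """Return Q = H^T H, b = -H^T (y + (sqrt(C)-1) H 1) / 2 and the shifted y."""
    u = math.isqrt(C) - 1                            # sqrt(C) - 1
    Ht = [list(col) for col in zip(*H)]              # transpose of H
    Q = [[sum(p * q for p, q in zip(ci, cj)) for cj in Ht] for ci in Ht]
    y_shift = [yi + u * sum(row) for yi, row in zip(y, H)]   # y + u * H * 1
    b = [-0.5 * v for v in matvec(Ht, y_shift)]
    return Q, b, y_shift


def cost(Q, b, z):
    """Objective of (5): 0.5 z^T Q z + b^T z."""
    Qz = matvec(Q, z)
    return 0.5 * sum(v * w for v, w in zip(z, Qz)) + sum(p * v for p, v in zip(b, z))


# Hypothetical real-valued example: 16-QAM levels {-3, -1, 1, 3}.
C = 16
H = [[1.0, 0.5], [-0.4, 2.0]]
x = [3, -1]
y = [2.6, -3.4]                                      # H x plus a small noise term
z = [(xi + math.isqrt(C) - 1) // 2 for xi in x]      # linear map to {0, 1, 2, 3}

Q, b, y_shift = build_qp(H, y, C)
distance = sum((yi - hi) ** 2 for yi, hi in zip(y, matvec(H, x)))
from_qp = 8 * cost(Q, b, z) + sum(v * v for v in y_shift)
```

    Because the offset term is the same for every candidate $z$, minimizing the quadratic objective over the integer set selects the same vector as minimizing the Euclidean distance.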

    B. Algorithm I: A QP Detector (Review)

    One way to approximate the solution of (5) is to use QP

    solvers that rely on relaxing the integer constraints. Thus,

    problem (5) can be relaxed to the following:

    $$\arg\min_{z} \; \tfrac{1}{2} z^T Q z + b^T z \quad \text{subject to } \mathbf{0} \le z \le (\sqrt{C}-1)\mathbf{1} \qquad (6)$$

    where $\mathbf{0}$ represents a $2N_t \times 1$ vector of all zeros and the constraints $\mathbf{0} \le z \le (\sqrt{C}-1)\mathbf{1}$ represent the box constraints of all elements of $z$, i.e. each element (symbol) of $z$ is lower
    bounded by 0 and upper bounded by $\sqrt{C}-1$. This form of an optimization problem is a convex QP minimization problem.

    A unique global continuous solution $z^{*}$ can be obtained using

    efficient interior-point solvers with reduced computational

    complexity [30]. The importance of using an interior-point

    solver is that in practice, the interior-point algorithm converges

    in a number of iterations that is constant, independent of

    the problem dimension [31]. This becomes attractive from

    a complexity point of view, especially when the number of

    antennas increases. Solving (6) provides a $2N_t$-dimensional
    solution vector $z^{*} = [z_1^{*}, \ldots, z_{2N_t}^{*}]^T \in \mathbb{R}^{2N_t}$ and a scalar cost function value $f(z^{*})$. If all elements of $z^{*}$ satisfy the integer
    constraints, then $z^{*}$ is the optimum solution for problems (5)
    and (6). In general, the integer solution of (6) is provided by
    quantizing $z^{*}$ to the nearest element of the constellation set $\Lambda = \{0, 1, \ldots, \sqrt{C}-1\}$, that is:

    $$\hat{z}_i = Q[z_i^{*}], \quad i = 1, 2, \ldots, 2N_t \qquad (7)$$

    where $Q[\cdot]$ is a quantization function to the appropriate constellation levels of the set $\Lambda$. In the next subsections, we propose

    improvements to the QP detector in a large MIMO system

    through performing further analysis to the problem (6) using

    first, two-stage QP detection with interference cancellation,

    and second, the concept of the BB search tree [32], [25],

    [33]. It is worth noting that in the previous work [22],

    [23], a randomization rounding technique is shown to provide

    better performance than simple rounding (as in (7)), but with

    additional complexity of the order $O(N_t^2)$. This technique,

    however, can still be used with any of our proposed algorithms.
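    Algorithm I can be sketched end to end as follows. This is an illustration under stated assumptions: the paper solves (6) with an interior-point solver, whereas this sketch swaps in a plain cyclic coordinate-descent routine (`solve_box_qp`, a hypothetical helper) so that it runs with the standard library only. The channel and received values are made up.

```python
import math


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def build_qp(H, y, C):
    """Q and b of problem (5), plus the largest level u = sqrt(C) - 1."""
    u = math.isqrt(C) - 1
    Ht = [list(col) for col in zip(*H)]
    Q = [[sum(p * q for p, q in zip(ci, cj)) for cj in Ht] for ci in Ht]
    y_shift = [yi + u * sum(row) for yi, row in zip(y, H)]
    b = [-0.5 * v for v in matvec(Ht, y_shift)]
    return Q, b, u


def solve_box_qp(Q, b, lo, hi, sweeps=200):
    """Minimise 0.5 z'Qz + b'z subject to lo <= z <= hi.

    Assumption: cyclic coordinate descent stands in for the interior-point
    solver of the paper; it needs strictly positive diagonal entries in Q.
    """
    n = len(b)
    z = [0.5 * (l + h) for l, h in zip(lo, hi)]
    for _ in range(sweeps):
        for i in range(n):
            # Exact minimiser along coordinate i, clipped to the box.
            r = b[i] + sum(Q[i][j] * z[j] for j in range(n) if j != i)
            z[i] = min(max(-r / Q[i][i], lo[i]), hi[i])
    return z


def qp_detect(H, y, C):
    """Relax (5) to (6), solve, then quantise as in (7)."""
    Q, b, u = build_qp(H, y, C)
    n = len(b)
    z_cont = solve_box_qp(Q, b, [0.0] * n, [float(u)] * n)
    z_hat = [min(max(int(round(v)), 0), u) for v in z_cont]   # Q[.] in (7)
    x_hat = [2 * zi - u for zi in z_hat]                      # back to QAM levels
    return x_hat, z_cont


# Hypothetical example: x = [-3, 1] sent over H with a small noise term.
H = [[2.0, 0.3], [0.1, 1.5]]
y = [-5.6, 1.1]
x_hat, z_cont = qp_detect(H, y, C=16)
```

    In this example the relaxed solution lies close to the integer point, so simple rounding recovers the transmitted levels. The two algorithms that follow address the cases where it does not.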

    C. Algorithm II: A Two-Stage QP Detector

    The idea of this algorithm is to implement two stages of

    QP detection with interference cancellation to further improve

    the detection of the unreliable symbols (non-integer values of

    z in (6)). One drawback of Algorithm I is that all symbols

    are quantized simultaneously, irrespective of their reliabilities.

    Therefore, in this algorithm we use the concept of interference
    cancellation with symbol reliability that is based on shadow
    area constraints. A shadow area between positive integers of
    the constellation set (similar to [24]) is introduced before performing quantization in (7). Any $z_i^{*}$ that falls in this
    shadow area is considered unreliable. In other words, from
    the continuous solution of (6), the variables with fractions
    that are far from their nearest integers by a value greater
    than or equal to $\delta$ are considered noisy, and therefore, need
    another stage of QP detection after interference cancellation
    of the more reliable symbols. We denote the positions of these
    unreliable symbols by the set of indices $J$. On the other hand, the variables with small fractions ($< \delta$) or purely integers can
    be immediately quantized and their values are considered the
    optimum integer solution for both (6) and (5). Thus, their
    effects need to be canceled out so that the solution of the noisy
    variables can be improved. The set of indices that represent the positions of these integer variables is denoted as $I$, and can be estimated using this criterion:

    $$I = \{ i : i \in \{1, 2, \ldots, 2N_t\}, \; |z_i^{*} - \lfloor z_i^{*} \rceil| < \delta \} \qquad (8)$$

    where $\lfloor x \rceil$ is the rounding operation of $x$ to the nearest integer. Note that 0 < [...] 0.4), most of the symbols will pass the integer condition, even though
    they might be far from their nearest integers. With this $\delta$,
    interference cancellation may improve the detection of some
    symbols, especially in the high SNR regime. In this algorithm,
    $\delta$ is optimized based on both minimum BER and complexity
    across various SNRs using simulation experiments, since the
    analytical optimization seems cumbersome. We found that the
    optimum $\delta$ is around 0.2 to 0.3 for various QAM levels. A summary of the Algorithm II steps is shown in Table I. Note

    in the sequel, we refer to this algorithm as 2QP.
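    The steps of Table I can be sketched as follows. This is a minimal illustration, assuming a coordinate-descent routine (`solve_box_qp`, hypothetical) in place of the `quadprog` interior-point call, and with $Q$, $b$ and $\delta$ chosen by hand so that one symbol is reliable and one is not.

```python
def solve_box_qp(Q, b, lo, hi, sweeps=200):
    """Cyclic coordinate descent for min 0.5 z'Qz + b'z, lo <= z <= hi.

    Assumption: a simple stand-in for the interior-point solver.
    """
    n = len(b)
    z = [0.5 * (l + h) for l, h in zip(lo, hi)]
    for _ in range(sweeps):
        for i in range(n):
            r = b[i] + sum(Q[i][j] * z[j] for j in range(n) if j != i)
            z[i] = min(max(-r / Q[i][i], lo[i]), hi[i])
    return z


def quantize(z, u):
    """Round each entry to the nearest level in {0, ..., u}."""
    return [min(max(int(round(v)), 0), u) for v in z]


def two_stage_qp(Q, b, u, delta=0.25):
    n = len(b)
    z = solve_box_qp(Q, b, [0.0] * n, [float(u)] * n)            # step 2
    z_hat = quantize(z, u)                                       # step 4
    I = [i for i in range(n) if abs(z[i] - round(z[i])) < delta] # step 3, (8)
    J = [i for i in range(n) if i not in I]                      # step 5
    if J:
        Q_red = [[Q[r][c] for c in J] for r in J]                # step 6
        # Step 7: cancel the contribution of the reliable symbols.
        b_red = [b[j] + sum(Q[i][j] * z_hat[i] for i in I) for j in J]
        z_J = solve_box_qp(Q_red, b_red,
                           [0.0] * len(J), [float(u)] * len(J))  # step 8
        for j, v in zip(J, quantize(z_J, u)):                    # step 9
            z_hat[j] = v
    return z_hat, I, J


# Hypothetical problem whose relaxed solution is [1.2, 1.45].
Q = [[2.0, 1.0], [1.0, 2.0]]
b = [-3.85, -4.1]
z_hat, I, J = two_stage_qp(Q, b, u=3)
```

    Here plain rounding of the relaxed solution would give [1, 1] with cost -4.95, while fixing the reliable first symbol and re-solving for the second gives [1, 2] with cost -5.05, which is the lower value of the objective in (5).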

    TABLE I: A Two-stage QP Algorithm

    1  Input: $Q$, $b$
    2  $z^{*}$ = quadprog($Q$, $b$) from (6)
    3  Find $I$ that satisfies $|z^{*} - \lfloor z^{*} \rceil| < \delta$
    4  $\hat{z}(I) = Q[z^{*}(I)]$
    5  Find set of indices $J$
    6  Find $\bar{Q} = Q(J, J)$, and
    7  $\bar{b} = Q(I, J)^T \hat{z}(I) + b(J)$
    8  $z^{*}(J)$ = quadprog($\bar{Q}$, $\bar{b}$) from (9)
    9  $\hat{z}(J) = Q[z^{*}(J)]$

    D. Algorithm III: A Controlled Size BB Search Tree

    In this section, we start with a quick review of the standard
    BB algorithm, then we introduce our proposed approximations
    that help control the size of the BB search tree and reduce its
    computational complexity.

    Branch and Bound algorithm: a search tree-based algorithm
    that successively forces non-integer values of $z^{*}$ in (6) to be integers in a recursive way. It does so using a
    search tree structure [32], [35], as shown in Fig. 1. The input

    problem to the BB search tree is problem (6). Its optimum

    continuous solution and cost function are denoted as $z^{*(0)}$
    and $f(z^{*(0)})$, respectively. The rest of the nodes in the BB search tree are denoted the same way, related to their node
    numbers, as depicted in Fig. 1. The basic idea of the standard
    BB search tree is that it starts by solving problem (6) at
    node 0 and then checks all the solution elements of $z^{*(0)}$.
    If they satisfy the integer constraints, then there is no need to further explore node 0 for a better solution because $z^{*(0)}$ is the optimum integer solution for problem (6), and also for (5) [32].
    Alternatively, if they are not all integers, for example when
    $z^{*(0)}$ contains some symbols with fractions, the BB algorithm splits the problem in node 0 into two subproblems by adding

    two mutually exclusive and exhaustive constraints, as shown

    in Fig. 1. The new subproblems are called children nodes,

    and the original problem is called the parent node. The new

    relaxed problems at nodes 1 and 2 are similar to (6) except
    that the upper and lower bounds of the branching variable (say
    variable $i$) are replaced with $z_i \le \lfloor z_i^{*(0)} \rfloor$ and $z_i \ge \lceil z_i^{*(0)} \rceil$,
    respectively. That is, problems at node 1 and node 2 of level
    one can be written as:

    $$\arg\min_{z} \; \tfrac{1}{2} z^T Q z + b^T z \quad \text{subject to } \mathbf{0} \le z \le (\sqrt{C}-1)\mathbf{1}, \text{ and } z_i \le \lfloor z_i^{*(0)} \rfloor \qquad (11)$$

    $$\arg\min_{z} \; \tfrac{1}{2} z^T Q z + b^T z \quad \text{subject to } \mathbf{0} \le z \le (\sqrt{C}-1)\mathbf{1}, \text{ and } z_i \ge \lceil z_i^{*(0)} \rceil \qquad (12)$$

    where $z_i$ is called the branching variable at index $i$ ($0 < i \le 2N_t$), and $\lfloor z_i^{*(0)} \rfloor$ ($\lceil z_i^{*(0)} \rceil$) denotes the largest (smallest)
    integer smaller (greater) than or equal to $z_i^{*(0)}$. There are
    various strategies for choosing the branching variable [35],
    but in this paper we choose the simplest one, which branches a node at the first non-integer variable. Now solving these

    new subproblems again using the interior-point algorithm,

    returns $(z^{*(1)}, f(z^{*(1)}))$ and $(z^{*(2)}, f(z^{*(2)}))$ for nodes 1 and

    2, respectively. If the solutions to these subproblems do not

    satisfy the integer constraints, each of them will be branched

    into two more subproblems and the process of branching will

    continue until the optimal integer solution is found; see more

    details in [33] and [35]. Two important pruning rules are used

    with the BB algorithm: 1) for any node in the tree, whenever its

    cost function value is greater than a known upper bound $f^{(up)}$,
    this node is pruned because no better solution is expected from
    the subtree below this node. The initial value of the upper
    bound can be taken as a very large value, such as $\infty$, or can be computed from any available integer solution, such as ZF or

    MMSE solutions. And 2) as mentioned above, if the solution

    of any node satisfies the integer constraints, then no branching

    is needed and the node is pruned.
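    The branching step in (11) and (12) and the first pruning rule can be sketched as below. This is a minimal illustration in which a node is represented only by its lower and upper bound vectors; the function names and the example values are assumptions.

```python
import math


def first_fractional(z, tol=1e-6):
    """Index of the first non-integer entry of z, or None if all are integer."""
    for i, v in enumerate(z):
        if abs(v - round(v)) > tol:
            return i
    return None


def branch(lo, hi, z, tol=1e-6):
    """Split a node into two children by tightening one bound, as in (11)-(12)."""
    i = first_fractional(z, tol)
    if i is None:
        return []                                # integer solution: no branching
    left_hi = list(hi)
    left_hi[i] = float(math.floor(z[i]))         # child 1: z_i <= floor(z_i)
    right_lo = list(lo)
    right_lo[i] = float(math.ceil(z[i]))         # child 2: z_i >= ceil(z_i)
    return [(list(lo), left_hi), (right_lo, list(hi))]


def should_prune(f_node, f_up):
    """Pruning rule 1: drop a node whose relaxed cost exceeds the upper bound."""
    return f_node > f_up


# Hypothetical node: 16-QAM box [0, 3]^3 and a relaxed solution with fractions.
lo, hi = [0.0, 0.0, 0.0], [3.0, 3.0, 3.0]
children = branch(lo, hi, [1.0, 2.4, 0.7])
```

    The two children differ from the parent only in the bounds of the branching variable, so each child is again a box-constrained QP of the form (6) and can be passed to the same solver.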

    In this paper, we focus on the Breadth First (BF) search
    strategy [36], where the nodes of the tree are explored level
    by level as depicted in Fig. 1. We prefer this strategy because
    it suits our proposed approximations well. In general,
    applying standard BB to (6) can lead to the ML solution, as is
    shown in our previous work [27], [28]. However, our system
    of interest is large-scale MIMO, where the dimension of the
    problem is $2N_t$. This makes the standard algorithm computationally
    expensive, and thus simplifications are needed.

    Fig. 1. Representation of the Breadth First BB search tree.

    Proposed Approximations: our proposed algorithm in this section

    relies on adding the following three approximations to the

    standard BB algorithm.

    1) Depth reduction: Instead of finishing the search tree all
    the way down until the optimum integer solution is found,
    this approximation forces the BB search tree to stop at a
    predefined level (layer) $L$, even if the optimum integer solution has not been reached yet. We denote the number of nodes
    in the stopping level, $L$, as $m_L$. Thus, the solutions and the
    corresponding cost function values of the existing nodes in
    this level are $z_L^{*(p)}$ and $f(z_L^{*(p)})$, respectively, where $p = 1, \ldots, m_L$.
    Therefore, the approximated integer solution is the quantized
    version of the solution that corresponds to the minimum cost
    function at the stopping level $L$:

    $$\hat{z} = Q[z^{*(t)}], \quad t = \arg\min_{p} f(z^{*(p)}), \quad p = 1, \ldots, m_L \qquad (13)$$

    This approximation is based on the concept of the standard
    BB algorithm, where every time the algorithm moves down
    one layer in the tree, at least one node comes closer to the
    optimum integer solution due to the branching rule. In other
    words, the nodes located in the path that leads to the optimum
    integer solution have the following property: the absolute value
    of the difference between $z^{*}$ and its quantized version becomes
    smaller and smaller. For example, in Fig. 1, assume that
    the optimum integer solution found using the standard BB
    algorithm is in node 14, and the path that leads to this node
    passes through nodes 0, 2, 6, and 14; then $|z^{*(14)} - Q[z^{*(14)}]| \le |z^{*(6)} - Q[z^{*(6)}]| \le |z^{*(2)} - Q[z^{*(2)}]| \le |z^{*(0)} - Q[z^{*(0)}]|$.

    2) Width Reduction: Instead of exploring all nodes in every

    level of the search tree, this approximation explores only the $M$

    most probable nodes that may lead to the optimum solution,

    while the rest are discarded (pruned). The selection criterion is

    based on the cost function as a metric. To accomplish this, we

    adopt the concept of the M algorithm [37], which is a breadth-

    first algorithm that is widely used in the QRDM technique for

    conventional MIMO systems [38].
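    The depth and width reductions amount to two small selection steps, sketched below. The node representation (a dictionary with a relaxed cost and a relaxed solution) and the numbers are assumptions for illustration only.

```python
def keep_best(nodes, M):
    """Width reduction: keep the M nodes with the smallest relaxed cost."""
    return sorted(nodes, key=lambda node: node["cost"])[:M]


def decide_at_stop_level(nodes, u):
    """Depth reduction, rule (13): quantise the minimum-cost node at level L."""
    best = min(nodes, key=lambda node: node["cost"])
    return [min(max(int(round(v)), 0), u) for v in best["z"]]


# Hypothetical nodes of one tree level for a 16-QAM problem (levels 0..3).
level_nodes = [
    {"cost": -4.2, "z": [1.1, 2.6]},
    {"cost": -5.0, "z": [0.9, 2.0]},
    {"cost": -3.1, "z": [2.0, 2.0]},
]
survivors = keep_best(level_nodes, M=2)
z_hat = decide_at_stop_level(survivors, u=3)
```

    With $M = 2$ the node of cost -3.1 is discarded, and if this level is the stopping level $L$, the decision is the rounded solution of the node with cost -5.0.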

    3) For faster simulation time and a reduced number of
    visited nodes (hence fewer computations), we further propose
    another approximation in conjunction with the BB(L,M)
    search tree. This approximation depends on the difference
    between the cost function value of the relaxed problem (using the
    continuous solution, $z^{*}$) and the cost function value of the
    integer problem (using the rounded solution, $\hat{z}$) of any node in the tree. The idea is that whenever this difference is small
    (based on some criteria), we can approximate the relaxed
    continuous solution to be the integer solution. This adds one
    more pruning rule to the BB algorithm because more integer
    solutions are going to be available in the tree. Hence, it reduces
    the number of visited nodes significantly, especially at high
    SNR. Following the same notation in this section, we denote
    the optimum continuous solution of the relaxed problem of a node $k$ by $z^{*(k)}$ and its objective function value as $f(z^{*(k)})$,
    where $k = 0, 1, 2, \ldots, N_v$ and $N_v$ is the number of visited nodes
    in the search tree. Similarly, we denote the quantized optimum
    continuous solution of the same node, $k$, by $Q[z^{*(k)}]$, and its cost function value as $f(Q[z^{*(k)}])$. Thus, the approximation is:

    $$z^{*(k)} = \begin{cases} Q[z^{*(k)}] & \text{if } |f(z^{*(k)}) - f(Q[z^{*(k)}])| \le \epsilon \, |f(z^{*(k)})| \\ z^{*(k)} & \text{otherwise} \end{cases} \qquad (14)$$

    where $|(\cdot)|$ represents the absolute value operation, and $\epsilon$ is a small number $> 0$, which can be optimized based on a trade-off
    between performance and complexity. The larger $\epsilon$ is, the
    lower both the performance and the complexity. In the
    standard BB algorithm $\epsilon = 0$. This approximation is different from the one in [25], which prunes the node only if its cost
    value is close to the best available upper bound.
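    The test in (14) can be sketched as a single function. The helper names, the threshold values and the example problem are assumptions; the relaxed solution is taken as given rather than computed by a solver.

```python
def cost(Q, b, z):
    """Objective 0.5 z^T Q z + b^T z shared by the relaxed and integer problems."""
    Qz = [sum(q * v for q, v in zip(row, z)) for row in Q]
    return 0.5 * sum(v * w for v, w in zip(z, Qz)) + sum(p * v for p, v in zip(b, z))


def quantize(z, u):
    return [min(max(int(round(v)), 0), u) for v in z]


def round_if_close(Q, b, z, u, eps):
    """Rule (14): accept the rounded solution when its cost is within a
    relative gap eps of the relaxed cost; otherwise keep the relaxed one."""
    z_q = quantize(z, u)
    f_relaxed, f_rounded = cost(Q, b, z), cost(Q, b, z_q)
    close = abs(f_relaxed - f_rounded) <= eps * abs(f_relaxed)
    return (z_q, True) if close else (list(z), False)


# Hypothetical node: relaxed cost -5.2825, rounded cost -4.95 (gap about 6.3%).
Q = [[2.0, 1.0], [1.0, 2.0]]
b = [-3.85, -4.1]
z_relaxed = [1.2, 1.45]
loose = round_if_close(Q, b, z_relaxed, u=3, eps=0.1)
tight = round_if_close(Q, b, z_relaxed, u=3, eps=0.01)
```

    With the looser threshold the node is treated as integer and is not branched further; with the tighter threshold it stays in the tree. This is the performance versus complexity trade-off that the text attributes to the choice of threshold.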

    Note that in the sequel, we refer to Algorithm III as BB(L,M),

    where L is the stopping level of the search tree and M is the

    number of nodes maintained in each level. The summary of

    BB(L,M) is shown in Table II.

    TABLE II: BB(L,M) Algorithm Summary

    1  Initialize node LIST = empty, and $f^{(up)} = \infty$
    2  Insert the values of $L$ and $M$
    3  Initialize search by adding Problem (6) to the node LIST
    4  Initialize tree level $l = 0$ (root node level)
    5  while (node LIST is not empty) do
    6    for loop $m = 1 : m_l$
    7      Pick problem from node LIST (call it problem $P^{(m)}$)
    8      Solve $P^{(m)}$ to obtain $z^{*(m)}$ and $f(z^{*(m)})$.
    9      if $f(z^{*(m)}) > f^{(up)}$; prune node $m$ and delete it from the LIST
    10     else if $f(z^{*(m)}) \le f^{(up)}$, then
    11       if $z^{*(m)}$ is all integer or satisfies the if condition in (14),
    12         update $f^{(up)} = f(z^{*(m)})$, and $\hat{z} = Q[z^{*(m)}]$
    13       else keep node problem in the node LIST, end if
    14     end if
    15   end for loop
    16   if all nodes in level $l$ are pruned, GOTO 25
    17   else Select the first $M$ nodes that have the minimum $f(z^{*(m)})$ in level $l$, and delete the rest, end if
    18   if $l = L$, then $\hat{z} = Q[z^{*(t)}]$, $t = \arg\min_p f(z^{*(p)})$,
    19     empty node LIST, then GOTO 25
    20   else expand the selected $M$ nodes by branching each node problem into two new sub-problems
    21     Push the new sub-problems into the node LIST
    22     Delete the original $M$ nodes from the node LIST
    23   end if
    24   Set $l = l + 1$
    25 end while

    E. Complexity Analysis

    The main ingredient of the computations in the QP detector
    is the interior-point algorithm, which finds a point where
    the Karush-Kuhn-Tucker (KKT) conditions hold for the optimization
    problem (6) in an iterative manner. As shown in [30] and [39], each iteration of the interior-point algorithm
    boils down to solving a system of linear equations where it
    is required to perform a matrix inversion in every iteration.
    Therefore, the complexity of one interior-point iteration is in
    the order of $O(N_t^3)$, and becomes $O(nN_t^3)$ for $n$ iterations. In
    practice, the interior-point algorithm converges in a number of iterations
    which is almost always a constant, independent of the problem
    dimension [31]. This is very attractive in high dimensional
    optimization problems. From our simulation experiments, we
    found that when using the standard interior-point algorithm,
    the average number of iterations required for various numbers
    of antennas is 6, 7, 8, and 9, when the symbol mapping is

    QPSK, 16QAM, 64QAM, and 256QAM, respectively. In this

    work, we further reduce the number of iterations to 2, 4, 5, and

    6 without major loss in performance. The idea is as follows:

    since the QP detector approximates the continuous solution provided by the interior-point algorithm, an early termination

    to the interior-point algorithm can speed up the convergence

    to the integer solution. The early termination, which is done

    before applying quantization step in (7), can be achieved by

    relaxing the tolerance constraints of the convergence.

    The second algorithm requires more computations than the

    first algorithm, due to the presence of the second stage of QP.

    Fortunately, the problem size of the second QP is much smaller

    than the first, especially for medium to high SNR and when

    the parameter $\delta$ is optimized. This makes the computational
    complexity of Algorithms I and II nearly the same when
    the number of antennas becomes large. The interior-point
    algorithm in the second stage requires complexity in the order of $O(n|J|^3)$. Therefore, the total complexity of Algorithm II is in the order of $O(nN_t^3 + n|J|^3)$.

    Finally, the proposed controlled-size BB algorithm needs

    more computations compared to the first two algorithms

    because of the computations needed in every node of the

    search tree. Thus, the total complexity can be in the order
    of $O(N_v n N_t^3)$ per received vector, where $N_v$ is the number of
    visited nodes in the proposed BB search tree. In large-scale
    MIMO systems, $n \ll N_t$ and $N_v$ is a function of both the $L$ and $M$ values of the tree (approximately, from simulations, $N_v \approx LM$ at low SNR, whereas $N_v \ll LM$ at high SNR). Therefore,

    BB(L,M) requires nearly Nv-times the complexity of 2QP.
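    To make these orders of magnitude concrete, the following sketch evaluates the three expressions with constants dropped. The chosen values of $N_t$, $n$, $|J|$ and $N_v$ are hypothetical and serve only to show how the terms compare.

```python
def qp_cost(n, n_t):
    """Order-of-magnitude proxy for Algorithm I: n * Nt^3."""
    return n * n_t ** 3


def two_stage_cost(n, n_t, j_size):
    """Proxy for Algorithm II: n * Nt^3 + n * |J|^3."""
    return n * n_t ** 3 + n * j_size ** 3


def bb_cost(n, n_t, n_v):
    """Proxy for BB(L,M): Nv * n * Nt^3."""
    return n_v * n * n_t ** 3


# Hypothetical setting: 32 antennas, 4 iterations, 8 unreliable symbols,
# 12 visited nodes.
n_t, n, j_size, n_v = 32, 4, 8, 12
ratio_2qp = two_stage_cost(n, n_t, j_size) / qp_cost(n, n_t)
ratio_bb = bb_cost(n, n_t, n_v) / qp_cost(n, n_t)
```

    In this setting the second QP stage adds about 1.6% to the cost of a single QP, while the tree search scales the cost by the number of visited nodes.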

    For various QAM modulations, the complexity of the

    proposed algorithms does not change significantly. In fact,

    the small variation in complexity is due to the difference

    in the number of interior-point iterations required for each

    modulation case. For instance, the average number of interior-

    point iterations required by 256QAM modulation is about

    3 times higher than that of QPSK modulation. This is an

    important advantage for the QP-based detectors compared to

    other algorithms in the literature of large-scale MIMO, such

    as RTS, R3TS [18], and Fixed Complexity SD [14], which

    require a large variation in complexity when the modulation

    order changes from low to high (e.g. it is in the order of 100

    times between QPSK and 64QAM for R3TS [18], and more

    than that for FSD).

    As shown in [15], the complexity per received vector of

    MMSE-LAS is on the order of $O(N_t^3) + O(N_t^3)$: one $O(N_t^3)$ due to the MMSE initial vector, and one $O(N_t^3)$ due to the LAS procedures. Therefore, the extra complexity needed by QP and 2QP over MMSE-LAS arises from the number of interior-point iterations, n, of the QP detector. Moreover, BB(L,M) requires approximately $nN_v$ times the complexity of MMSE-LAS.
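    As a rough numeric illustration of the orders quoted in this subsection, the following sketch evaluates the leading terms for one configuration. It is only a sketch: constant factors are ignored, the cost of Algorithm I is taken as n·Nt^3 (inferred from the Algorithm II expression), each O(Nt^3) term of MMSE-LAS is counted as exactly Nt^3, and the function name and the sample values of n, |J|, and Nv are illustrative assumptions rather than measurements from the paper.

```python
def leading_order_costs(n_t, n_iter, j_size, n_visited):
    """Leading-order operation counts taken from the complexity expressions.

    Constant factors are dropped, so only the ratios are meaningful.
    """
    qp = n_iter * n_t**3                  # Algorithm I (assumed): O(n N_t^3)
    two_qp = qp + n_iter * j_size**3      # Algorithm II: O(n N_t^3 + n |J|^3)
    bb = n_visited * n_iter * n_t**3      # Algorithm III: O(N_v n N_t^3)
    mmse_las = 2 * n_t**3                 # MMSE-LAS: O(N_t^3) + O(N_t^3)
    return {"QP": qp, "2QP": two_qp, "BB": bb, "MMSE-LAS": mmse_las}


# Illustrative values only: 32 antennas, 4 interior-point iterations,
# a second-stage problem of size 6, and 8 visited BB nodes.
costs = leading_order_costs(n_t=32, n_iter=4, j_size=6, n_visited=8)
for name, value in costs.items():
    ratio = value / costs["MMSE-LAS"]
    print(f"{name:9s} {value:>9d}  ({ratio:.2f} x MMSE-LAS)")
```

    With a small second-stage size, the 2QP term n|J|^3 is negligible next to n·Nt^3, which is the point made in the text about Algorithms I and II.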

    IV. SIMULATION RESULTS

    In this section, we show simulation results for an uncoded

    and a coded large-scale MIMO system in a block flat fading

    channel with Nt = Nr for various QAM levels, assuming

    perfect knowledge of channel state information at the receiver.

    We refer to our proposed algorithms as QP for Algorithm

    I, 2QP for Algorithm II, and BB(L,M) for Algorithm III.

    We compare our proposed algorithms with other detectors

    including MMSE, MMSE-OSIC [10], MMSE-LAS [15], MIV-

    LAS [16], and RTS [17]. MIV-LAS is a LAS algorithm that

    uses three initial input vectors (matched filter (MF), zero-forcing (ZF), and MMSE). Since the performance gain of a

    multiple symbol update LAS algorithm [7] over MMSE-LAS

    is small, we limit our comparison to MIV-LAS and MMSE-

    LAS only. For fair comparison between various detection

    techniques, all implementations are done using the real system

    model shown in (3).

    A. Optimizing the 2QP Parameter and the Number of Iterations n

    Figs. 2a and 2b demonstrate, as an example with QPSK modulation, that the choice of the 2QP parameter can significantly improve the performance of the 2QP detector over the conventional QP detector. In this example of 32 × 32 MIMO, a parameter value between 0.25 and 0.3 provides the best performance. For instance, with a value of 0.25, 2QP has a 2 dB improvement over QP at $10^{-3}$ BER. The problem size of the second stage of the 2QP detector decreases as the parameter value increases (see Fig. 2b), especially at high SNR, and in general it is far below the size of the first stage, which is 2Nt. This makes the complexity of the 2QP detector close to that of the QP detector. For example, in the QPSK case with Nt = 32 and a parameter value of 0.25 at $10^{-3}$ BER, the average size of the second stage of 2QP is 6 compared to 64 in the first stage. For various QAM modulations, SNRs, and Nt, Fig. 2c demonstrates that 0.25 is a good optimized value, which also corresponds to the heuristic choice of half the parameter's maximum value. Thus, it is used in the rest of the paper.

    As we mentioned in section III-E, the main computational

    burden in the QP detector comes from the interior-point

    solver. We proposed to reduce its computations by forcing

    the algorithm to perform early termination, thus reducing the

    number of iterations, n. We performed simulation experiments

    using both QP and 2QP detectors for QPSK and 16 QAM

    modulations with various interior-point iterations. Figs. 3a

    and 3b show that 2 and 4 iterations for QPSK and 16QAM

    modulations, respectively, are the minimum numbers that

    guarantee no major loss in BER performance. The same

    reduction procedures were done for 64QAM and 256QAM

    and the minimum number of iterations was found to be 5

    for 64QAM and 6 for 256QAM. The same idea is used to optimize the parameter of the BB(L,M) algorithm for various modulation levels, and we found values of 0.01 for QPSK, 0.001 for 16QAM, and 0.0001 for both 64QAM and 256QAM. These optimized iteration counts and parameter values will be used in the rest of the simulation experiments.

    B. Uncoded BER Performance vs. SNR

    We choose a relatively large number of antennas, such

    as Nt = Nr = 32, to demonstrate the performance of our

    techniques. In Fig. 4, we present the average uncoded BER

    performance for 32 × 32 MIMO with QPSK, 16QAM, 64QAM, and 256QAM modulations. Fig. 4 shows that both 2QP and

    BB(L,M) algorithms improve the performance of the QP

    detector at all displayed SNRs and at all QAM modulations.

    When comparing 2QP with BB(L,M), say BB(16,2), as in Fig.

    4a, 2QP performs better than BB(16,2) in QPSK with a 0.5 dB improvement at $10^{-3}$ BER and with even lower complexity. On the other hand, 2QP steadily becomes worse than BB(16,2) as the modulation order increases (see Figs. 4b, c, d).

    A more detailed simulation of the BB(L,M) algorithm

    is shown in Fig. 5 for 16QAM as an example. It shows

    that as L increases, the BER performance improves. For

    instance, BB(4,4) outperforms BB(2,4), and BB(8,4) outper-

    forms BB(4,4). From the same figure, it can be observed

    that the diversity of the system increases with increasing L.

    Increasing the width of the BB tree can also improve the

    performance, such as the case of BB(16,4) over BB(16,2);

    however, in some cases extending the width of BB(L,M) does not improve performance and only adds complexity, as shown in the same figure with the cases of BB(16,4) and BB(16,6). Note that in this paper we did not focus on finding the optimum values of L and M; we only show that some pairs, such as BB(16,2) and BB(4,4), are good choices to demonstrate how the algorithm works. For large Nt, especially with higher QAM levels, it is enough to pick L = Nt/2 and M = 2 to outperform the other existing algorithms.

    Fig. 4 shows that the advantages of the QP-based detectors take effect when higher-order modulations are used. In the QPSK simulation shown in Fig. 4a, RTS outperforms all of our proposed techniques, and MMSE-LAS and MIV-LAS also outperform QP and BB(8,4) at certain SNRs. On


    Fig. 2 A two-stage QP detector: (a) QPSK BER performance vs. average received SNR (dB), 32 × 32 MIMO; (b) problem size of the second stage vs. average received SNR (dB), QPSK 32 × 32 MIMO; (c) QAM BER performance vs. the 2QP parameter, with QPSK, 16QAM, and 256QAM curves for several Nt and SNR values

    Fig. 3 The effect of reducing interior-point iterations on the BER performance in a 32 × 32 MIMO system (BER vs. average received SNR in dB, QPSK and 16QAM): (a) QP detector; (b) two-stage QP detector. Standard IP is the standard interior-point algorithm

    the other hand, from Figs. 4b, c, d where the modulation level

    increases, QP, 2QP, and BB(L,M) steadily become superior to

    RTS and LAS at all displayed SNRs. For example, in Fig.

    4d, the QP detector, which provides an upper bound on the BER of 2QP and BB(L,M), provides a 5 dB improvement over MMSE-LAS and a 3 dB improvement over RTS at $10^{-2}$ BER. The performance of RTS was improved using a hybrid of RTS and Belief Propagation (RTS-BP) in [40], but this only achieved a 1.6 dB improvement at $10^{-3}$ BER with 16QAM (see Fig. 3 in [40]), while our algorithms 2QP and BB(32,4) provide improvements of 2 dB and 3 dB over RTS, respectively. It is worth noting that the performance of our proposed algorithms can be further improved when combined with the LAS or RTS algorithms, by using the output vector of QP, 2QP, or BB(L,M) as the initial vector of LAS or RTS. The simulation results for this claim are not extensively shown here, but two examples, QP with LAS using QPSK and BB(32,4) with LAS using 16QAM, are shown in Figs. 4a and 5, respectively.

    Figs. 6a, b, c, d present a sample of complexity computa-

    tions in terms of the average number of real operations versus

    Nt measured at relatively low SNR and relatively high SNR

    for both 16QAM and 256QAM. The important observations

    from these figures are as follows: (i) The complexities of QP and 2QP are nearly the same, with 2QP having the advantage of superior performance. (ii) There is no significant increase in the computational complexity of the QP and 2QP detectors over the MMSE-LAS detector; however, their performance is substantially better, especially at higher QAM modulations. For example, at 256QAM 32 × 32 MIMO, 2QP has a 7 dB improvement over MMSE-LAS (see Fig. 4d), while it only requires about double the computations of MMSE-LAS. (iii) At 256QAM modulation, 2QP requires fewer computations than RTS, with even better performance (see Fig. 4d). (iv) At fixed Nt = Nr, the complexity of QP, 2QP, and BB(L,M) does not change significantly from 16QAM to 256QAM. (v) At relatively high SNR, the difference in complexity between BB(4,4), BB(16,2) and QP, 2QP is small, while at relatively low SNR the difference is clearly noticeable. (vi) At low SNR and low-order modulation, such as 16QAM, RTS requires fewer computations than BB(L,M); however, at higher SNR and higher modulation order, such as 256QAM, BB(L,M) requires fewer computations. This is due to the effect of the pruning rule of (14), which becomes clear at high SNR. (vii) Even though the complexity of RTS is close to that of BB(4,4) and BB(16,2) at 256QAM with low SNR, BB(4,4) and BB(16,2) significantly outperform RTS in BER performance.

    C. Uncoded BER Performance vs. Nt

    In Figs. 7, 8, and 9, we plot an uncoded BER performance

    as a function of Nt = Nr, for various detectors at an average

    received SNR of 15 dB, 26 dB, and 39 dB for QPSK, 16QAM

    and 256QAM, respectively. We compare the proposed algo-

    rithms against MMSE-LAS, RTS, MMSE-OSIC, and QRDM.


    Fig. 4 Uncoded BER performance of a 32 × 32 MIMO system (BER vs. average received SNR in dB): (a) QPSK; (b) 16QAM; (c) 64QAM; (d) 256QAM

    Fig. 5 16QAM BER performance using BB(L,M) in a 32 × 32 MIMO system (BER vs. average received SNR in dB). Improvement is clear as the search moves to a deeper level

    MF and MMSE are also plotted for reference.

    In the case of QPSK, Fig. 7 shows that MMSE-LAS

    provides better performance than QP and BB(4,4) at Nt ≥ 30 and Nt ≥ 40, respectively, while it is completely inferior to 2QP at all displayed Nt. BB(L,M) can outperform MMSE-LAS if

    more levels are considered in the BB(L,M) search tree, such as

    the case of BB(16,2). Similarly, RTS outperforms QP, BB(4,4)

    and 2QP at all considered Nt; however, at higher values of L,

    such as 16, RTS is inferior to BB(L,M) when Nt < 20. On the

    other hand, as we go for higher QAMs (see Figs. 8, and 9),

    our algorithms clearly outperform LAS and RTS algorithms.

    An interesting result regarding the 2QP algorithm, across

    various QAM modulations, is that although it requires lower

    complexity than BB(L,M), it has superior performance in some

    ranges of Nt. For example, in QPSK, it outperforms BB(4,4) and BB(16,2) at Nt > 10 and Nt > 28, respectively. At higher

    QAM modulations, the value of Nt at which 2QP starts to

    outperform BB(L,M) is increased (see Figs. 8 and 9).

    We observe a flooring behavior with respect to BB(L,M)

    performance. This is due to the fact that while we increase

    Nt, we keep the same depth, L, which is not enough to reduce

    more errors. This effect can be reduced if L is adaptively

    increasing with increasing Nt. Fig. 7 shows that this effect

    is reduced when BB(16,2) is replaced by BB(2Nt,2).

    MMSE-OSIC performs well only at smaller Nt: using QPSK, it performs better than QP at Nt ≤ 12; using 16QAM, it performs better than QP and 2QP at Nt ≤ 16; using 256QAM, interestingly, it performs better than QP, 2QP, and BB(4,4) at Nt ≤ 45; however, it requires more computations. In general, MMSE-OSIC starts to exhibit a high error floor as Nt increases, which is in line with the results shown in [15].

    The reduced complexity search tree algorithms that are

    studied in conventional MIMO, such as Fixed SD (FSD)

    [14], K-best SD, and QRDM, demonstrate poor performance

    in large-scale MIMO systems [41]. We present here, as an

    example, the performance of the QRDM algorithm for both

    QPSK with M=4 and 16QAM with M=16. It can be seen that

    QRDM with M equal to the QAM constellation size can provide


    Fig. 6 Average complexity in terms of the number of real operations vs. Nt: (a) 16QAM at 19 dB SNR; (b) 16QAM at 26 dB SNR; (c) 256QAM at 35 dB SNR; (d) 256QAM at 45 dB SNR

    the best performance at Nt < 10, which is the ML performance;

    however, as Nt gets higher, the BER performance deteriorates

    due to the fact that the QRDM reduced search space becomes

    smaller than the ML search space.

    D. Turbo Coded BER Performance

    In this subsection, we evaluate the turbo coded BER per-

    formance of the QP-based detectors compared to MMSE,

    MMSE-LAS, and RTS detectors. A 32 × 32 MIMO system is examined with both 16QAM and 256QAM, and with a rate-1/3 turbo decoder of 10 iterations. A hard-decision output vector from each detector is fed as an input to the turbo decoder. Performance can be improved if a soft-decision output vector is fed instead. Fig. 10 demonstrates that, similar to the uncoded BER performance, the turbo coded BER performance of the QP-based detectors surpasses that of the RTS and LAS detectors as the modulation order increases. In 16QAM turbo coded performance, RTS outperforms QP and 2QP by about 1.5 and 0.5 dB, respectively, at $10^{-2}$ BER, while in 256QAM, QP and 2QP outperform RTS by 4 and 4.5 dB, respectively. The Nt = Nr = 32 system with 16QAM and rate-1/3 turbo coding corresponds to a spectral efficiency of 32 × 1/3 × 4 = 42.67 bit/sec/Hz. It becomes 85.33 bit/sec/Hz when 256QAM is used. The theoretical minimum SNR required to achieve this capacity is shown in Fig. 10.
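    The spectral efficiency figures quoted above follow from multiplying the number of transmit streams, the code rate, and the number of bits per QAM symbol. A minimal sketch of that arithmetic is given below; the function name is an illustrative assumption.

```python
from math import log2


def spectral_efficiency(n_t, code_rate, constellation_size):
    """Transmit streams x code rate x bits per symbol, in bit/sec/Hz."""
    return n_t * code_rate * log2(constellation_size)


print(round(spectral_efficiency(32, 1 / 3, 16), 2))    # 16QAM, rate 1/3
print(round(spectral_efficiency(32, 1 / 3, 256), 2))   # 256QAM, rate 1/3
```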

    E. Effect of MIMO Spatial Correlation

    In this section, we investigate the performance of the 2QP

    detector in a more realistic MIMO channel. We adopt a

    spatially correlated MIMO fading model using the Kronecker

    product model [42],[43], where the complex MIMO channel

    matrix can be written as:

    \[ \mathbf{H} = \mathbf{R}_r^{1/2} \mathbf{A}_{iid} \mathbf{R}_t^{1/2} \tag{15} \]

    where $\mathbf{R}_r$ and $\mathbf{R}_t$ are the correlation matrices for the receive antennas and transmit antennas, respectively, while $\mathbf{A}_{iid}$ represents an i.i.d. (independent and identically distributed)

    Rayleigh fading channel matrix. This model assumes that

    the fading statistics of the transmit and receive arrays are

    independent. In this paper, the correlation matrices of the

    signals at both the transmit and receive sides are computed

    based on the distance between antenna elements [44], [45].

    Also, this model does not take into account the structure of

    the scattering environment between transmitter and receiver.
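    A channel draw under the Kronecker model in (15) can be sketched as follows. This is a sketch under stated assumptions, not the simulation setup of the paper: the exponential correlation model rho^|i-j| is a hypothetical stand-in for the distance-based correlation matrices of [44], [45], and a Cholesky factor C with C C^T = R is used in place of the symmetric square root R^(1/2), which gives the same channel statistics for i.i.d. Gaussian entries. All function names are illustrative.

```python
import math
import random


def exp_corr_matrix(size, rho):
    """Hypothetical exponential correlation model: R[i][j] = rho**|i-j|."""
    return [[rho ** abs(i - j) for j in range(size)] for i in range(size)]


def cholesky(r):
    """Lower-triangular c with c c^T = r (r symmetric positive definite)."""
    n = len(r)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                c[i][j] = math.sqrt(r[i][i] - s)
            else:
                c[i][j] = (r[i][j] - s) / c[j][j]
    return c


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kronecker_channel(n_r, n_t, rho_r, rho_t, rng):
    """Draw H = C_r A_iid C_t^T, with C C^T = R standing in for R^(1/2)."""
    c_r = cholesky(exp_corr_matrix(n_r, rho_r))
    c_t = cholesky(exp_corr_matrix(n_t, rho_t))
    s = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    a_iid = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_t)]
             for _ in range(n_r)]
    return matmul(matmul(c_r, a_iid), transpose(c_t))


h = kronecker_channel(n_r=4, n_t=4, rho_r=0.5, rho_t=0.5,
                      rng=random.Random(0))
print(len(h), len(h[0]))
```

    Setting both correlation coefficients to zero makes the factors identity matrices, so the draw reduces to the i.i.d. Rayleigh case.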

    The BER performance of the 2QP detector is only consid-

    ered here for illustration. In this simulation, we consider a 16


    Fig. 7 QPSK BER performance vs. Nt at SNR = 15 dB

    Fig. 8 16QAM BER performance vs. Nt at SNR = 26 dB

    Fig. 9 256QAM BER performance vs. Nt at SNR = 39 dB

    × 16 MIMO system using 16QAM modulation for both i.i.d. fading and spatially correlated fading. The distance between antenna elements is taken to be 0.4λ (mild correlation scenario). The effect of spatial correlation is examined for both uncoded and rate-1/3 turbo coded BER performance. Fig. 11 shows that there is a clear performance loss with correlated fading. For instance, at $10^{-2}$ uncoded BER, 2QP with correlated fading degrades by 6 dB compared to i.i.d. fading, while for turbo coded BER, 2QP with correlated fading degrades by 4 dB compared to i.i.d. fading. To alleviate the degradation from correlation, we increased the dimension of the receive array, similar to the work in [7]. Fig. 11 shows that increasing the number of receive antennas by just one (i.e., Nr = 17) can dramatically alleviate this degradation. For instance, in the 16 × 17 scenario,

    Fig. 10 16QAM and 256QAM turbo coded BER performance with a rate-1/3 32 × 32 MIMO system (BER vs. average received SNR in dB; annotated SNR values: 3.25 dB and 9.75 dB)

    the difference in performance at $10^{-2}$ uncoded BER is 1 dB compared to 6 dB in the 16 × 16 scenario, whereas with turbo coded performance the difference reduces to 0.6 dB.

    Fig. 11 Uncoded/coded BER performance of a 2QP detector in i.i.d. fading as well as in correlated MIMO fading for both the 16 × 16 and 16 × 17 cases (BER vs. average received SNR in dB, 16QAM)

    V. ITERATIVE DETECTION AND DECODING USING A QP DETECTOR

    In this section, the aim is to develop a turbo equalization-

    type receiver using a QP detector. In the previous sections,

    the performance of QP-based detectors was studied based on uncoded/coded BER. In order to improve the performance of such detectors in a low SNR regime, a turbo equalization-type receiver can be used, in which a detector and a decoder exchange soft information with each other in an iterative manner (called iterative detection and decoding (IDD)) until a stopping criterion is reached [46]. There are two challenges

    in using QP in an IDD setting. First, how to incorporate a

    priori information provided by the channel decoder, in the

    form of Log-Likelihood Ratio (LLR), into the QP optimization

    problem (6). Second, how to make the QP detector provide

    soft information, in the form of LLR, so that it can be used

    as a priori information to the soft-input soft-output channel

    decoder. Addressing these challenges with implementation and

    performance study will be presented in this section for large-

    scale MIMO in a spatial multiplexing setup. We use the same

    technique in [47] to incorporate a priori information into the QP


    optimization problem; however, we further propose to reduce

    the number of optimization problems needed to compute LLR

    using local neighborhood solutions. Model (3) is used for the

    analysis, with the focus on the QPSK modulation. A receiver

    block diagram with turbo equalization is shown in Fig. 12.

    Fig. 12 Receiver side for MIMO IDD with a QP detector

    Consider QPSK symbols, which are mapped from coded and

    interleaved bits, to be transmitted over a MIMO flat fading channel. At the receiver side the complex channel model is transformed to a real equivalent one, as shown in section III-A. The real part of the complex data symbols is mapped to $[x_1, \ldots, x_{N_t}]$, and the imaginary part of these symbols is mapped to $[x_{N_t+1}, \ldots, x_{2N_t}]$, where bit $x_i \in \{-1, +1\}$, $i = 1, \ldots, 2N_t$. Therefore, the a posteriori LLR for bit $x_i$ is:

    \[ L_{post}(x_i) = \ln \frac{p(x_i = +1 \mid \mathbf{y}, \mathbf{H})}{p(x_i = -1 \mid \mathbf{y}, \mathbf{H})}, \quad i = 1, \ldots, 2N_t \tag{16} \]

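    Using Bayes' theorem, Eq. (16) can be written as:

    \[ L_{post}(x_i) = \ln \sum_{\mathbf{x} \in \mathcal{X}_i^{+1}} p(\mathbf{y} \mid \mathbf{x}, \mathbf{H}) P(\mathbf{x}) - \ln \sum_{\mathbf{x} \in \mathcal{X}_i^{-1}} p(\mathbf{y} \mid \mathbf{x}, \mathbf{H}) P(\mathbf{x}) \tag{17} \]

    where $\mathcal{X}_i^{\pm 1}$ is the set of all possible vectors $\mathbf{x}$ satisfying $x_i = \pm 1$. $P(\mathbf{x})$ is the vector of a priori probabilities, which in the case of turbo equalization, is delivered by the outer channel decoder in the form of an a priori LLR, as follows:

    \[ L_a(x_i) = \ln \frac{p(x_i = +1)}{p(x_i = -1)}, \quad i = 1, \ldots, 2N_t \tag{18} \]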

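    If the noise in the system is white Gaussian, the probability density function $p(\mathbf{y} \mid \mathbf{x}, \mathbf{H})$ can be represented by $\frac{1}{2\pi\sigma^2} \exp\left(-\frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2}{2\sigma^2}\right)$. This can be used in (17), and with the aid of the max-log approximation ($\ln(\sum_i \exp(\alpha_i)) \approx \max_i \{\alpha_i\}$) [48], Eq. (17) can be simplified to:

    \[ L_{post}(x_i) \approx \min_{\mathbf{x} \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \ln[P(\mathbf{x})] \right\} - \min_{\mathbf{x} \in \mathcal{X}_i^{+1}} \left\{ \frac{1}{2\sigma^2} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \ln[P(\mathbf{x})] \right\} \tag{19} \]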
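    The max-log approximation used to reach (19) replaces a log-sum of exponentials by its largest exponent. The short sketch below compares the exact value with the approximation for a handful of exponents; the numeric values are illustrative assumptions. The approximation error is never negative and never exceeds the logarithm of the number of terms.

```python
import math


def log_sum_exp(values):
    """Exact ln(sum_i exp(a_i)), computed stably by factoring out the max."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def max_log(values):
    """Max-log approximation: ln(sum_i exp(a_i)) is replaced by max_i a_i."""
    return max(values)


metrics = [-1.2, -7.5, -9.0, -15.3]   # illustrative exponents
exact = log_sum_exp(metrics)
approx = max_log(metrics)
print(exact, approx, exact - approx)
```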

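    In order to find the relation between the vector of a priori probabilities $P(\mathbf{x})$ and $\mathbf{L}_a$, we follow the same work and assumptions in [46] and [48]. Thus, (19) can be written as:

    \[ L_{post}(x_i) \approx \min_{\mathbf{x} \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \frac{1}{2}\mathbf{x}^T\mathbf{L}_a \right\} - \min_{\mathbf{x} \in \mathcal{X}_i^{+1}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \frac{1}{2}\mathbf{x}^T\mathbf{L}_a \right\} \tag{20} \]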

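    where $\mathbf{x} = [x_1, \ldots, x_{2N_t}]^T$ is the vector of all interleaved bits, and $\mathbf{L}_a = [L_a(1), \ldots, L_a(2N_t)]^T$ is the vector of a priori LLRs of all interleaved bits. Focusing on the first term on the right side of (20) and reformulating it as a QP problem, we get:

    \[ \min_{\mathbf{x} \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \frac{1}{2}\mathbf{x}^T\mathbf{L}_a \right\} = \min_{\mathbf{z} \in \mathcal{Z}_i^{0}} \left\{ \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right\} \tag{21} \]

    where $\mathcal{Z} = \{0, 1\}^{2N_t}$, $\mathbf{Q} = \mathbf{H}^T\mathbf{H}$, $\mathbf{z} = \frac{\mathbf{x} + \mathbf{1}}{2}$, and $\mathbf{b} = -\frac{1}{2}\mathbf{H}^T(\mathbf{y} + \mathbf{H}\mathbf{1}) - \frac{\sigma^2}{4}\mathbf{L}_a$. The two sides of (21) agree up to the positive scale factor $\sigma^2/4$ and an additive constant, neither of which changes the minimizer. The result of (21) can be applied to both terms of (20), and with relaxing the integer constraints, Equation (20) becomes:

    \[ L_{post}(x_i) \approx \min_{\mathbf{0} \leq \mathbf{z} \leq \mathbf{1},\, z_i = 0} \left\{ \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right\} - \min_{\mathbf{0} \leq \mathbf{z} \leq \mathbf{1},\, z_i = 1} \left\{ \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right\} \tag{22} \]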

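    Equation (22) shows that to evaluate the LLR for one bit, it is required to solve two QP problems of length $2N_t - 1$ each. The LLR computations for these $2N_t$ bits require solving a total of $4N_t$ QP problems, which is computationally expensive. Thus, as in [47], we first solve the following problem without any bit constraints,

    \[ \hat{\mathbf{z}} = \mathcal{Q}\left[ \arg\min_{\mathbf{0} \leq \mathbf{z} \leq \mathbf{1}} \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right] \tag{23} \]

    and second, we solve the same problem again $2N_t$ times with bit constraints as follows:

    \[ \min_{\mathbf{z}} \; \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \quad \text{s.t.} \quad \mathbf{0} \leq \mathbf{z} \leq \mathbf{1}, \; z_i = \text{xor}(\hat{z}_i, 1), \; i = 1, \ldots, 2N_t \tag{24} \]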

    The cost function values that result from the minimization

    problems in (23) and (24) are substituted back in (22) to

    find Lpost(xi). This idea reduces the number of problems to

    be solved to 2Nt+ 1.
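    A brute-force check can make the reformulation from (20) to (21) concrete. The sketch below is not part of the proposed method: it enumerates all bit vectors of a tiny real-valued system, evaluates the max-log LLR once from the bracketed cost in (20) and once from the QP objective in (21) with binary z, and compares the two. It assumes b = -(1/2) H^T (y + H 1) - (sigma^2/4) L_a, under which the QP objective equals sigma^2/4 times the cost in (20) plus a constant, so the z-domain difference is rescaled by 4/sigma^2. The function names and the values of H, y, L_a, and sigma^2 are illustrative assumptions.

```python
import itertools


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cost_x(x, y, h, l_a, sigma2):
    """Bracketed term of (20): ||y - Hx||^2 / (2 sigma^2) - x^T L_a / 2."""
    r = [yi - hi for yi, hi in zip(y, mat_vec(h, x))]
    return dot(r, r) / (2 * sigma2) - 0.5 * dot(x, l_a)


def qp_terms(y, h, l_a, sigma2):
    """Q = H^T H and b = -H^T (y + H 1) / 2 - (sigma^2 / 4) L_a."""
    h_t = [list(col) for col in zip(*h)]
    q = [[dot(ci, cj) for cj in h_t] for ci in h_t]
    c = [yi + sum(row) for yi, row in zip(y, h)]   # y + H 1
    b = [-0.5 * dot(col, c) - 0.25 * sigma2 * la
         for col, la in zip(h_t, l_a)]
    return q, b


def cost_z(z, q, b):
    """Objective of (21): z^T Q z / 2 + b^T z."""
    return 0.5 * dot(z, mat_vec(q, z)) + dot(b, z)


def llr_exhaustive(y, h, l_a, sigma2):
    """Max-log LLRs by enumeration, in the x domain and in the z domain."""
    n = len(h[0])
    q, b = qp_terms(y, h, l_a, sigma2)
    llr_x, llr_z = [], []
    for i in range(n):
        best_x = {-1: float("inf"), 1: float("inf")}
        best_z = {0: float("inf"), 1: float("inf")}
        for z in itertools.product((0, 1), repeat=n):
            x = [2 * zi - 1 for zi in z]           # z = (x + 1) / 2
            best_x[x[i]] = min(best_x[x[i]], cost_x(x, y, h, l_a, sigma2))
            best_z[z[i]] = min(best_z[z[i]], cost_z(z, q, b))
        llr_x.append(best_x[-1] - best_x[1])
        llr_z.append((4 / sigma2) * (best_z[0] - best_z[1]))
    return llr_x, llr_z


h = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.3], [0.0, -0.2, 1.0]]
y = [0.9, -1.1, 0.7]
l_a = [0.4, -0.2, 0.0]
llr_x, llr_z = llr_exhaustive(y, h, l_a, sigma2=0.5)
print(llr_x)
print(llr_z)
```

    The two printed lists agree to within floating-point error. The enumeration grows as 2^(2Nt), which is why it is only usable as a check on very small systems.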

    As shown in [46], the exchange of extrinsic information

    between the channel detector and the channel decoder is more

    effective in improving performance of the turbo equalization

    receiver. Thus, the required extrinsic information can be cal-

    culated, as follows:

    \[ L_E(x_i) = L_{post}(x_i) - L_a(x_i) \tag{25} \]

    Although the above technique may suit the conventional

    small MIMO systems because the size of the QP is small,

    it is not computationally efficient for the large-scale MIMO

    system. For instance, if Nt = 64 with QPSK modulation, 129

    QP optimization problems need to be computed to evaluate

    LLR for 128 bits (i.e., using (23) and (24)). Therefore, in this section, we propose a simple algorithm that solves only one optimization problem and then finds the neighborhood set of solutions to the vector $\hat{\mathbf{z}}$ to compute LLR per bit. It can be summarized in the following steps:

    1) Solve the QP problem (23) one time to find $\hat{\mathbf{z}}$.

    2) Then, instead of solving problem (24) $2N_t$ times, construct the closest neighborhood solutions of $\hat{\mathbf{z}}$, as described below.

    3) The list of solution vectors provided by $\hat{\mathbf{z}}$ and its neighborhood is used in (20) or (22) to compute $L_{post}(\mathbf{x})$.

    The construction of a neighborhood solution can be done

    according to the following way: Let the alphabet set for QPSK


    modulation be {0, 1}, so the symbol neighborhood of 0 (i.e., $N(0)$) is {1}, and $N(1)$ is {0}. The vector neighborhood of $\hat{\mathbf{z}}$ is the set of vectors that differ from $\hat{\mathbf{z}}$ in just one coordinate; hence, there will be $2N_t$ neighbor vectors to $\hat{\mathbf{z}}$. Let the neighbor vectors be $\mathbf{z}_{nb} = [\mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(j)}, \ldots, \mathbf{z}^{(2N_t)}]$, where $\mathbf{z}^{(j)} = [z_1^{(j)}, \ldots, z_i^{(j)}, \ldots, z_{2N_t}^{(j)}]^T$, $i, j = 1, \ldots, 2N_t$, and

    \[ z_i^{(j)} = \begin{cases} \hat{z}_i & \text{for } i \neq j \\ N(\hat{z}_i) & \text{for } i = j \end{cases} \tag{26} \]

    The simulation of this section is implemented using a soft-in

    soft-out rate-1/2 convolutional channel decoder that is based on
    the BCJR algorithm. On the transmit side, a convolutional encoder
    (rate R = 1/2, generator polynomials [133 171], constraint length 7)
    is used with a random interleaver in a QPSK large-scale MIMO system
    with Nt = Nr = 16 and Nt = Nr = 64. The number of iterations is the
    number of times the soft-input soft-output MIMO detector and the
    soft-input soft-output channel decoder are used.
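
    For concreteness, the transmit-side encoder described above can be sketched as below. The sketch assumes the common convention that [133 171] are octal generators whose most significant tap multiplies the current input bit, and that the encoder starts in the all-zero state; the paper does not spell out these details, and the helper name `conv_encode` is hypothetical. The random interleaver and any trellis termination are omitted.

```python
def conv_encode(bits, generators=(0o133, 0o171), constraint_length=7):
    """Feedforward convolutional encoder, rate 1/len(generators), zero initial state."""
    mask = (1 << constraint_length) - 1
    window = 0  # K-bit window; the newest input bit sits in the top position
    coded = []
    for b in bits:
        # Shift older bits down and place the new bit at the top of the window.
        window = ((window >> 1) | (int(b) << (constraint_length - 1))) & mask
        for g in generators:
            # Each output bit is the parity of the tapped window positions.
            coded.append(bin(window & g).count("1") % 2)
    return coded


# A single 1 followed by zeros reads out the generator taps on the two streams.
impulse_response = conv_encode([1, 0, 0, 0, 0, 0, 0])
```

    With this tap ordering, the even-indexed outputs of the impulse response spell out 133 (octal) and the odd-indexed outputs spell out 171 (octal), which is a quick way to check the convention against another encoder.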

    Fig. 13 demonstrates the BER performance of three iterations of IDD
    when the soft-in soft-out QP detector is used. It can be seen that
    as the number of iterations increases, a lower BER is obtained for
    both cases of Nt = 16 and Nt = 64, though the difference in
    performance between Nt = 16 and Nt = 64 can be seen only at higher
    iteration numbers, such as iteration 3. The uncoded and
    convolutionally coded cases are also plotted in the same figure to
    point out the advantages of IDD at low SNR. The coded performance of
    16 × 16 and 64 × 64 represents the case where a hard-decision QP
    detector is followed by a hard-decision Viterbi decoder. As expected,
    the performance difference between the hard decision and soft
    decision (represented by iteration number 1 of IDD) is about 2 dB.
    Note that in this figure, the large-system behavior between 16 × 16
    and 64 × 64 can be observed in both uncoded and coded cases; however,
    in IDD, it can be observed at higher iteration numbers.

    The performance of our proposed technique for reducing LLR
    computations is shown in Fig. 14 for Nt = 16 and Nt = 64, where LLR
    is computed based on (23) with the set of neighborhood solutions.
    This is compared to the case where LLR is computed based on multiple
    QP computations ((23) and (24)). When Nt is relatively small, such as
    16, the performance of the two techniques becomes very close as the
    number of iterations increases, as in iteration 3 in Fig. 14a. For
    relatively large Nt, such as 64, the performance of the proposed
    technique is quite similar to that of the multiple QP computation
    technique, and it becomes even slightly better at the third
    iteration, as shown in Fig. 14b. This may be due to the large-system
    effect that appears more clearly at Nt = 64, because the proposed
    technique combines the QP technique with some sort of LAS technique
    in computing LLR.

    Fig. 13 BER performance of IDD using a QP detector (LLR computations are based on (23) and (24)). [Plot: BER versus average received SNR in dB for QPSK MIMO with a QP detector and a rate-1/2 convolutional code; curves for uncoded, coded, and IDD iterations 1 to 3, each for 16x16 and 64x64.]

    VI. CONCLUSION

    This paper proposes low complexity detection algorithms
    that are suitable for large-scale MIMO with higher QAM
    modulations. The proposed algorithms are based on the QP
    detector. They improve the performance of the conventional
    QP detector with better trade-offs between complexity and
    performance. At high SNR and higher QAM modulations, the
    proposed algorithms outperform LAS and RTS algorithms in
    both coded and uncoded BER performance. At large Nt = Nr,
    the 2QP algorithm is more suitable in terms of performance
    and complexity than BB(L,M), while BB(L,M) provides better
    performance at relatively small Nt = Nr. This paper also
    demonstrated that QP-based detectors can be used for iterative
    detection and decoding with low complexity.

    Fig. 14 IDD BER performance with reduced LLR computation: (a) 16 × 16, (b) 64 × 64. [Plots: BER versus average received SNR in dB for QPSK 16x16 and 64x64 MIMO IDD, iterations 1 to 3, comparing LLR using neighbor solutions with LLR using (23) and (24).]

    Ali Elghariani received both the B.S. and M.S. degrees in Electrical
    and Electronic Engineering from the University of Tripoli in 1999 and
    2008, respectively, and the Ph.D. in Communications, Networking, and
    Signal Processing from the School of Electrical and Computer
    Engineering at Purdue University, West Lafayette, in 2014. He worked
    in industry for several years before he started his Ph.D. Currently
    he is a lecturer at the Department of Electrical and Electronic
    Engineering, University of Tripoli, Libya. During 2013 he was a
    system engineer intern with Qualcomm, Inc. at San Diego. He was the
    recipient of an IEEE MILCOM conference travel grant award in 2012.
    His current research interests are signal detection and channel
    estimation in large-scale MIMO systems, symbol spreading OFDM
    systems, turbo equalization, and the application of quadratic
    programming optimization techniques in wireless communications.

    Michael Zoltowski received both the B.S. and M.S. degrees in
    Electrical Engineering with highest honors from Drexel University in
    1983 and the Ph.D. in Systems Engineering from the University of
    Pennsylvania in 1986. In Fall 1986, he joined the faculty of Purdue
    University, where he currently holds an Endowed Chaired Professorship
    in Electrical and Computer Engineering. In this capacity, he was the
    recipient of the Ruth and Joel Spira Outstanding Teacher Award for
    1990-1991, the 2001-2002 Wilfred Hesselberth Award for Teaching
    Excellence, and the Engineering Distance Education Award for 2012. In
    2001, he was named a University Faculty Scholar by Purdue University.
    On 25 September 2008, he became the Thomas J. and Wendy Engibous
    Professor of Electrical and Computer Engineering, an Endowed Chair
    conferred by the Board of Trustees of Purdue University. Prof.
    Zoltowski is a co-recipient of a 2014 IEEE Globecom Best Paper Award
    and a 21st Humantech Paper Award: Silver Prize sponsored by Samsung.
    He is also the recipient of a 2002 Technical Achievement Award from
    the IEEE Signal Processing Society. In addition, he served as a 2003
    Distinguished Lecturer for the IEEE Signal Processing Society. He is
    a Fellow of IEEE. He is a recipient of the 2006 Distinguished Alumni
    Award from Drexel University. Prof. Zoltowski is a co-recipient of
    the IEEE Communications Society 2001 Leonard G. Abraham Prize Paper
    Award in the Field of Communications Systems. He is also the
    recipient of the IEEE Signal Processing Society's 1991 Paper Award,
    the Fred Ellersick MILCOM Award for Best Paper in the Unclassified
    Technical Program at the 1998 IEEE Military Communications
    Conference, and a Best Paper Award at the 2000 IEEE International
    Symposium on Spread Spectrum Techniques and Applications. In
    addition, from 1998 to 2001, Dr. Zoltowski served as an elected
    Member-at-Large of the Board of Governors and Secretary of the IEEE
    Signal Processing Society. From 2003 to 2005, he served on the Awards
    Board of the IEEE Signal Processing Society and also served as the
    Area Editor in charge of Feature Articles for the IEEE Signal
    Processing Magazine. Within the IEEE Signal Processing Society, he
    has been a member of the Technical Committee for the Statistical
    Signal and Array Processing Area, the Technical Committee for DSP
    Education, and the Technical Committee on Signal Processing for
    Communications (SPCOM). From 2003 to 2004, he served as Vice-Chair of
    the Technical Committee on Sensor and Multichannel (SAM) Processing,
    and served as Chair for 2005-2006. He has served as an Associate
    Editor for both the IEEE Transactions on Signal Processing and the
    IEEE Communications Letters. He was Technical Chair for the 2006 IEEE
    Sensor Array and Multichannel Workshop. He served as Vice-President
    for Awards & Membership for the IEEE Signal Processing Society,
    2008-2010.