1 david brewster building, edinburgh, eh14 4as, uk and 2 … · 2018-10-08 · projected gradient...

Projected gradient descent algorithms for quantum state tomography

Eliot Bolduc1, George Knee2, Erik Gauger1, Jonathan Leach1∗1 SUPA, Institute of Photonics and Quantum Sciences, Heriot-Watt University,

David Brewster Building, Edinburgh, EH14 4AS, UK and2 Department of Physics, University of Warwick, Coventry, CV4 7AL, UK

(Dated: October 8, 2018)

Accurate quantum tomography is a vital tool in both fundamental and applied quantum science. Itis a task that involves processing a noisy measurement record in order to construct a reliable estimateof an unknown quantum state, and is central to quantum computing, metrology, and communication.To date, many different approaches to quantum state estimation have been developed, yet no onemethod fits all applications, and all fail relatively quickly as the dimensionality of the unknownstate grows. In this work, we suggest that projected gradient descent is a method that can evadesome of these shortcomings. We present three novel tomography algorithms that use projectedgradient descent and compare their performance with state-of-the-art alternatives, i.e. the dilutediterative algorithm and convex programming. Our results find in favour of the general class ofprojected gradient descent methods due to their speed, applicability to large states, and the rangeof conditions in which they perform.

INTRODUCTION

The reconstruction of an unknown quantum state –known as quantum tomography – is a fundamental taskin quantum information science, where a myriad of newtechnologies promise to exploit special features of quan-tum states in order to enhance communication, metrol-ogy, and computation. Since the quantum state repre-sents maximal information about a physical system, allphysical properties can be calculated from it. Checkingfor the existence for a highly entangled state, a statewhich can violate a Bell inequality, or even the initialstate required in a gate-based quantum computer arethus just some examples of the importance of inferringthe quantum state from laboratory data. As experimen-tal methods progress, the complexity of quantum sys-tems that can be well controlled in the laboratory grows.In recent times, for example, various groups have beenable to demonstrate quantum control of a high numberof qubits [1–3]. To gain an idea of the challenge of statereconstruction, one need only consider that the numberof real parameters required to describe the joint state ofn qubits scales as order 22n. Alternatively, the orbitalangular momentum of single photons, for example, is asingle degree of freedom with a large amount of internalstructure: it has recently been characterised via the re-construction of a 100,000 dimensional statevector [4–6].Quite apart from the challenges presented by preparationand measurement of quantum states, tackling the statereconstruction problem in the face of such complexitycalls for sophisticated data processing techniques, whichare the focus of this paper.

Tomography experiments involve subjecting a sys-tem, described by some unknown quantum state, to awell-defined measurement procedure and recording themeasurement outcome. The central tenets of quantumtheory place severe restrictions on one’s ability to charac-

terise an unknown quantum system given measurementsmade on only a single copy. One assumes, therefore, ac-cess to a large but finite number of copies of a system,all prepared in an identical quantum state. As the com-plexity of the quantum state grows, the number of detec-tor counts for each state parameter necessarily shrinks.Particularly in optical systems, the prevalence of low de-tection efficiency exacerbates this problem, leaving thetomographer with a noisy data-set from which to makeher inferences. In an idealised situation where all noise(including statistical) is absent, the true state ρtrue canbe found exactly. Here we concentrate on the more re-alistic situation, and assume only that the measurementprocedure is informationally complete (that is to say, themeasurement record contains a nonzero amount of in-formation available about each quantum state parame-ter), and turn our attention to the question of processingthis data toward the most likely estimate of the unknownstate.

In most cases, quantum tomographers must employnumerical techniques to search for the best estimate pos-sible, given the data that has been collected. In thiswork, we analyse the algorithmic method of projectedgradient descent as applied to quantum tomography, andbenchmark it against a number of existing methods. Thestate-of-the-art with regards to full quantum tomographymethods include the diluted iterative algorithm (DIA)[7, 8] and convex programming [9]. Both methods ben-efit from a theoretical guarantee: that they converge tothe maximum likelihood (ML) state ρML (discussed be-low). However, the DIA has been observed to convergeslowly [10, 11], and convex programming solvers such asSDPT3 and SeDuMi are known to require computationaltime that scales poorly with non-sparse matrix dimen-sionality [12, 13].

A non-iterative quantum tomography method wasdevised by Smolin et al., who showed that, if the mea-

arX

iv:1

612.

0953

1v1

[qu

ant-

ph]

30

Dec

201

6

2

surement operators are traceless and the noise is of theGaussian type, the constrained maximum likelihood stateρML can be retrieved in a single projection step from theunconstrained maximum likelihood state χML, which canbe found very quickly using linear inversion [14]. Withan implementation of this method’s core algorithm usinga GPU, Guo et al. were able to recover a simulated 14-qubit density matrix [15]. However, the restrictive aboveconditions motivate the search for more broadly applica-ble efficient techniques.

Projected gradient descent (PGD) methods are emerg-ing as promising candidates for quantum tomography[10, 16]. We present three PGD algorithms that con-verge towards the maximum likelihood quantum state:projected gradient descent with backtracking (PGDB)[10, 16], Fast Iterative Shrinkage-Thresholding Algorithm(FISTA) [17] and projected gradient descent with mo-mentum (PGDM). We provide evidence that they con-verge faster than both DIA and SDPT3, and scale morefavourably than SDPT3. We also find that the PGDmethods are very versatile in that one can model a widevariety of types of noise; in particular, the case of ill-conditioned measurements.

Goncalves et al. [10] and Shang et al. [16] have both re-cently discussed PGDB as an efficient technique for quan-tum tomography: however, the algorithmic variants weintroduce here can significantly outperform it. Further-more, Shang et al. considered Pauli measurements, forwhich the technique from Ref. [14] is highly efficient. It istherefore vital to study the performance of projected gra-dient descent outside of this realm to establish its trueusefulness. We here confirm that PGD methods con-tinue to exhibit excellent properties in scenarios wherethe technique from Ref. [14] is not applicable.

The remainder of this paper is structured as follows:After giving an introduction to quantum tomography andthe general idea of projected gradient descent, we layout three PGD algorithms and discuss their performance:namely, convergence profiles and running time. To thebest of our knowledge, FISTA has never previously beenapplied to quantum tomography and PGDM is a novel al-gorithm altogether. DIA and SDPT3, considered as cur-rent state-of-the-art algorithms, will serve as benchmarksby which the PGD approach will be judged. Finally, wereport on the results of state reconstruction using pseudo-experimental data, i.e. simulations of realistic quantumtomographic experiments with noise. We consider threefigures of merit for quantum tomographic techniques: theconvergence time and the fidelity between the recoveredstate and the actual one ρtrue. Over a broad range ofHilbert space dimensions, we run the algorithms multi-ple times, each time with a randomly generated densitymatrices with fixed purity [18], and record the runningtimes and fidelities.

(a)

ρ1

ρ2

ρ3

ρ4

ρML

descend

project

(b)

ρ1

ρ2

ρ3

ρ4

ρ5

ρMLρ6

cost function

cost

func

tion

0 1 2 3 4 5 6iteration number

inside physical spaceoutside physical space

(c)

physical space

unconstrainedminimum

ρML

FIG. 1: a) Schematic showing the physical space as a convex subsetof the set of unconstrained matrices, The minimum of the costfunction is often outside of the physical space. b) Illustration ofthe PGD process for a qubit. A step in the gradient direction canyield a non physically-allowed density matrix, but the projectionbrings the estimate back into the desired search space, (i.e. theBloch sphere in the case of a qubit). c) This process lowers thecost function, and is repeated until progress is sufficiently smalland the final density matrix estimate is as close as desired to themaximum likelihood state ρML.

Quantum tomography

The Born rule, pi = Tr(Πiρ), being the central equa-tion of quantum theory, encodes the probability pi toobtain a certain measurement outcome given a particu-lar quantum state. It involves a quantum state ρ, whichtakes the form of a d× d positive-semidefinite matrix ofunit trace, and an Hermitian measurement operator Πi.Quantum tomography is essentially the process of findingthe density matrix whose calculated outcome probabili-ties (for an informationally complete set of N operators)most closely match the experimentally observed data.

The probabilities pi are of course not directly ob-servable, only the number of clicks ni recorded in a realmeasurement device after a finite number of trials. In theabsence of noise, the probabilities relate to the numberof clicks through a multiplicative factor r: ni = rpi. Inreal situations, there is a discrepancy between rpi and nidue to i) technical noise in the measurement device andii) statistical uncertainty. Furthermore, if the noise isparticularly severe, the matrix reconstructed with naivemethods (such as linear inversion) will fail to qualify as aphysical quantum state: the positivity or unit-trace prop-erties can be violated. Multiple techniques have thereforebeen developed that allow one to search for an estimatematrix that is guaranteed to be physical. Examples in-clude searching for the Cholesky factor T (where ρ = TT †

is guaranteed positive) and using a Lagrange multiplier(to ensure unit trace) [19]. However, searching in thefactored space can often lead to an ill-conditioned prob-lem and slow convergence [16]. As we evidence, there are

3

advantages to be had by instead allowing the search totemporarily wander into unphysical territory.

The measurement operators, the expectation valuespi and the detector clicks ni can be stacked into a matrixand two vectors, respectively:

A =

vec(Π1)T

...

vec(ΠN )T

, p =

p1...pN

, and n =

n1...nN

,

(1)where N is the total number of projectors. With theabove notation, the expectation value vector reads p =A vec(ρ). The computation of this vector takes O(Nd2)floating-point operations in general, but a lower compu-tational complexity can be achieved when the operatorsoriginate from tensor products [16]. The accuracy of themaximum likelihood state depends significantly on thecondition number (which is the ratio of maximum singu-lar value to minimum singular value) of the measurementmatrix A [20]. High condition numbers, which corre-spond to ill-conditioned measurement matrices, arise inthe fields of detector tomography [21, 22] and supercon-ducting artificial atoms [23].

The multiplicative factor r can readily be estimated ifat least a subset Z of the measurement matrix A formsa POVM, in which case the sum of the probabilities be-longing to the set Z,

∑pZ , is independent of the state

ρ. For example, if the measurement operators in Z forma basis, the sum in question amounts to unity. For an ar-bitrary POVM, the best estimate for the multiplicationfactor is given by

r =d

NZ

∑j∈Z

nj , (2)

where NZ is the number of projectors in the POVM.Moreover, the average number of clicks per outcome isr/d, and the total number of clicks for this POVM isrNZ/d.

Summary of PGD algorithms

The distance between pi and ni/r is to be consid-ered as a ‘cost function’ C(ρ) in the sense of numeri-cal optimisation. In the minimisation of a cost function,PGD algorithms are useful when one seeks a solution ina proper subset of a larger search space. Since physi-cal quantum states, represented as unit-trace positive-semidefinite matrices, exist in a (convex) subset of the(convex) set of d × d matrices, quantum tomography isindeed a problem of this kind. Projected gradient de-scent is an iterative procedure with two substeps. Start-ing with a well-chosen physical state, first a step is takenin the downhill direction of the cost function, which hasthe chance to result in a nonphysical matrix. Second, to

bring the estimate back within the constrained, physicalspace, we project it to the closest point in the solutionspace (for example, using a matrix norm). This two stepprocess is then repeated until the cost function convergestowards a low enough value. Since we are searching overa convex set, as long as the cost function is a strictlyconvex function of ρ, there will be a unique solution thatminimises it. Fig. 1 shows the evolution of the densitymatrix estimate of a qubit through six iterations of PGD.

Maximum likelihood

We have yet to specifically define the figure of meritfor closeness between the estimated probabilities and theoutcome frequencies. Maximum likelihood analysis pro-vides a principled way to derive such a figure of merit.When doing statistical estimation, it is necessary to op-erate within a statistical model or belief system: oneapproach is Bayesian estimation, which works with it-eratively updating such beliefs using Bayes’ rule. Here,however, the beliefs are encoded in the likelihood func-tion for a multinomial experiment:

L(ρ) ∝∏i

pnii . (3)

Maximizing this function – i.e. finding the quantumstate ρ that makes the observed data the most likely– is the most widely applied philosophy for tomogra-phy [7, 19, 24, 25]. Since the maximum-likelihood stateρML = maxρ L(ρ) is also the minimum of − logL, we canproceed by minimising the second function, and we mayignore any scale or shift by a constant that is independentof ρ. We therefore define the cost function

C(ρ) = − logL(ρ) (4)

that we seek to minimise. When the number of tri-als is large, this is well approximated by the Poisson-approximated Gaussian likelihood function C(ρ) ≈− logLP (ρ) = νTν with νi = (rpi − ni)/

√ni. Assum-

ing Poisson-distributed data, the variance for outcomei is the number of clicks ni. Hence νi corresponds tothe ratio of the error to the expected error on outcomei. The true density matrix gives an expected negativelog-likelihood per outcome C/N of unity because of thenoise on the outcomes ni. A value greater than unity in-dicates a poor density matrix estimate or an incompletenoise model, whereas a value smaller than unity is a signof overfitting to noise. In general, the maximum likeli-hood density matrix overfits the data slightly [26], butone cannot achieve a better estimate in the absence ofprior knowledge.

We are now ready to detail the algorithms for recon-structing the density matrix. In all of the following al-gorithms, the completely mixed state ρ0 = I/d will serve

4

as the starting point. Our selection of four iterative al-gorithms are then defined by a recursion relation relatingthe density matrix at the next iteration to the matrix atthe current iteration.

RESULTS

Projected Gradient Descent Algorithms

The process of any PGD algorithm involves steps inthe general gradient direction, interspersed with leapsback into the constrained set [10, 16]. The simplest suchalgorithm can be written in a single line [4]:

ρk+1 = P[ρk − δ∇C(ρk)], (5)

where δ is a step size and P[·] is a projection onto theset of unit-trace positive matrices, seeking the ‘closest’unit-trace positive matrix to its argument. Various ap-proaches can be used to to establish what ‘closest’ means(including operator norms, see Supplementary Informa-tion). We adopt the simplex projection P[·]→ S[·], whichessentially consists of transforming the eigenvalues of thedensity matrix such that the sum is unity trace and theyare all positive [27]. If the multiplicative factor r is knownor computed in advance, the version described in detailin Ref. [10] applies, otherwise the projection must in-stead be performed over the space of positive matrices,preserving the trace of the argument [14].

We now proceed to discuss three PGD algorithmswhich are all are extensions of Eq. (5).

Projected Gradient Descent with Momentum

Here we augment the basic PGD algorithm of Eq. (5)with a technique borrowed the momentum-aided gradientdescent method from the field of machine learning [28].This technique stores a running weighted-average Mk ofthe log-likelihood gradient. This running average pro-vides a memory of previous descent directions which isused to better estimate the next descent step. The coreof this algorithm reads

Mk+1 = ζkMk − γk∇C(ρk) ,

ρk+1 = S(ρk +Mk) ,(6)

where ζk codes for the level of ‘inertia’ for the descentdirection, and γk is the step size. In general, these meta-parameters depend on the iteration number k, but canalso be set as constants throughout the algorithm. Weprovide full pseudo-code for this and the other algorithmsin the Methods section, as well as Matlab implementa-tions.

Fast Iterative Shrinkage-Thresholding Algorithm

FISTA was first developed in the context of image de-noising [17], but here we introduce the method for usein quantum state tomography with adapted refinements.In this implementation of PGD, the change in the iter-ate ρk is not always in the descent direction, i.e. thelog-likelihood function can go up, but as we shall see itdescends much faster on average than the basic gradientdescent algorithm from Eq. (5). The core of the algo-rithm is given by

ρk+1 = S[ρk +

k − 2

k + 1(ρk − ρk−1)− δ∇C(ρk)

], (7)

where δ is a step size.

Projected gradient descent with backtracking

This PGD method has recently been applied to quan-tum tomography simulations in Ref. [10]. A similar vari-ant has been studied in Ref. [16], where the authors re-port on a hybrid method between the DIA and PGD. ThePGDB algorithm, whose main characteristic consists oftrying to find the maximum step size that reduces thenegative log-likelihood, can be written as

ρk+1 = (1− α)ρk + αS [ρk −∇C(ρk)] , (8)

where α is a parameter to be loosely optimised at eachstep through backtracking. Each iteration of this algo-rithm is guaranteed to lower the negative log-likelihoodunless a stationary point is reached, in which case thestopping criterion is satisfied.

Benchmark algorithms

Diluted Iterative Algorithm

The diluted iterative algorithm (DIA) is based on thegradient of the log-likelihood function. The algorithmcan be simply stated with the following two iterativeequations [7, 8]

Rk = −H−1/2[∇C]H−1/2 ,

ρk =(I + εRk)ρk−1(I + εRk)

Tr[(I + εRk)ρk−1(I + εRk)],

(9)

where H =∑i Πi/

∑i pi. The variable ε is optimised at

every iteration, such that it minimises the log-likelihoodfunction, and can be implemented in various ways [8, 29].The matrix H reduces to the identity (up to a constant)when all the measurement operators form a POVM. TheDIA leaves the density matrix estimate ρk positive atevery iteration.

5

0 200 400 600 800 1000 1200

106

100

102

10 4

106

100

102

10 4

0 200 400 600 800 1000 1200

C/N

C/N

Time (s ) Time (s )

0 10 20 30

102

100

101

102

100

101

0 20 60 10040 80Time (s )

C/N

Time (s )

C/N

(b) (a)

benchmark algorithms

projected gradient

PGDM

FISTA

PGDB

DIA

SDPT3

descent algorithms

FIG. 2: Typical curves of the cost function versus running time. These simulations are performed on six-qubit systems using (a) Paulimeasurements as indicated by the vectors on the Bloch sphere and (b) an ill-conditioned measurement matrix. The global minimum ofthe negative log-likelihood is expected to be at C/N ≈ 1, around the top of the grey regions. Only for PGDM and FISTA can the costfunction go up as a function of iteration number before reaching the ML state. The total running time for PGDB correlates highly withthe measurement matrix condition number.

Number of qubits

(b)

2Number of qubits

10-2

100

102

10 4

Tim

e (s

)

(a)

3 4 5 6 7 8

10 -2

10 0

10 2

10 4

2 3 4 5 6 7 8

Tim

e (s

)benchmark algorithms

projected gradient

PGDM

FISTA

PGDB

DIA

SDPT3

descent algorithms

FIG. 3: Average running time to reach (sufficiently close to) ρML for various system sizes. The measurement matrix is made of (a) Paulimeasurements and (b) bases relatively close to each other, as indicated on the inset. The colored areas are bounded by one standarddeviation above and below the mean running time. The gradient-based techniques have a computational complexity of O(Nd2) whileSDPT3 converges in time O(N2d2).

It has been observed that the DIA converges quicklyfor the first few iterations and converges very slowly later[10, 11, 16, 21]. In the Results section, we corroboratethese observations.

Semidefinite programming

As we emphasised above, quantum tomography is oftenequivalent to minimising a convex function over a convexset. In the field of numerical optimisation, a problemis considered effectively solved if it can be cast into thisform, partly because of the powerful and efficient algo-rithms and software packages that are available, and alsobecause of the guarantee of global optimality for the so-lutions that they find. Such a software package thereforemakes a natural benchmark for quantum tomography al-

gorithms, with the the understanding that (because oftheir general purpose nature) they are not optimised forfull tomography and unlikely to be as fast as other meth-ods. We use the CVX software environment; and theSDPT3 solver, which is an example of an infeasible path-following algorithm.

Simulation

We perform quantum tomography simulations onmulti-qubit systems with all of the techniques mentionedin the previous section. When using canonical Pauli mea-surements, all of the techniques are found to work well inthat they all recover ρML. Since the simulations consis-tently reach high likelihoods in practice, we concentrateon the total computation time. The exit criterion for all

6

techniques – except for SDPT3 whose code we do notchange – is such that when the average gradients of thelast 20 iterations is sufficiently small, the optimisationprocedures are terminated.

Examples of convergence curves for 6-qubit systemsusing Pauli measurements are shown in Fig. 2. The de-tails of the implementations and simulated data are pro-vided in the Methods section. As already remarked inRefs. [10, 11, 16, 21], the DIA displays fast convergencein the first few iterations but requires many more to fi-nally satisfy the exit criterion. SDPT3 converges in be-tween only 10 and 15 iterations, but each iteration has acomputational complexity of O(N2d2), rendering it slowin high dimensions.

We put the tomographic techniques to the test us-ing ill-conditioned measurement matrices [20, 21, 23], seeFig. 2 b). If the measurement operators are limited to arestricted region of the Hilbert space, the condition num-ber of the measurement matrix is greater than unity, andthe error on the final density matrix estimate will nec-essarily increase [20]. Here, the measurement matrix isbuilt using the bases

[0 1]; [1 0]; [cos(β/2), sin(β/2)]; [sin(β/2), cos(β/2)];

[cos(β/2), i sin(β/2)]; [sin(β/2), i cos(β/2)](10)

with β = π/4 for regular canonical Pauli operators andβ = π/3 for the ill-conditioned case; see the Bloch sphereson Fig. 2 for an illustration of these vectors.

Goncalves et al. provide a proof of the monotonicity ofC for PGDB, that is to say that the cost function neverincreases in this algorithm. By contrast, PGDM andFISTA are both algorithms for which the cost functionmay increase from one iteration to the next, but interest-ingly, this tends to speed up their performance relative toPGDB with regards to the ill-conditioned measurementmatrices.

We show the running time of each algorithm as a func-tion of Hilbert space dimensionality (number of qubits)in Fig. 3. The speed-up of PGDM over PGDB for high-dimensions is present in both panels of Fig. 3, but partic-ularly pronounced for the quantum state reconstructiontask based on the ill-conditioned measurement matrices,where PGDM is about ten times faster than PGDB onaverage for the eight and seven-qubit cases.

The running times of PGDM and FISTA algorithmsare more resilient to a condition number change thanPGDB, due to the fact that the number of PGDM andFISTA iterations required to reach ρML grows very littleas a function of the measurement matrix condition num-ber. Fig. 4 a) illustrates this dependence. The semidefi-nite programming technique does not depend on the con-dition number: in our simulations, SDPT3 always tookabout 15 iterations to reach ρML.

There exists a monotonic relationship between ill-posedness and the accuracy of the recovered state: for

PGDM

SPDT3

PGDBFISTA

Run

ning

tim

e (s

)

100

101

102

103a)

cos 2 β

0 0.5 1

0.4 0.5 0.6

3

4

5×10-3

10-3

10-2

10 -1

100

Infid

elity

b)

cos 2 β

0 0.5 1

FIG. 4: Scatter plot of (a) the time performance and (b) theinfidelity between the recovered five-qubit states and the true statesas a function of the ill-posedness of the measurement matrix. ThePGDB method saturates in (a) because it reaches the maximumnumber of iterations set in the program. The inset in (b) showsthat the methods converge towards the same fidelity, indicatingthat they reach ρML. The results of FISTA differ slightly becausethis method oscillates around ρML for a large number of iterations.The number of events per outcome is set to 104.

a fixed number of events per measurement, the more ill-posed the measurement matrix is, the lower the fidelitybetween the recovered state and the true one. This re-lationship is illustrated in Fig. 4b, where the verticalaxis corresponds to one minus the fidelity f(ρ1, ρ2) =Tr(√√

ρ1ρ2√ρ1) between the recovered density matrix

and the true state. The extreme cases, i.e. cos2 β = 0and cos2 β = 1, correspond to mutually unbiased basesand a single basis measurement, respectively. We show azoomed subset of this data around the intermediate an-gle, i.e. cos2 π/4 = 0.5, that gives statistical evidencethat the projected gradient descent techniques consis-tently converge to ρML.

7

DISCUSSION

A key advantage of the PGD techniques is their ver-satility. They successfully and quickly converge to themaximum likelihood state in a wide range of cases, re-gardless of the desired accuracy and whether the mea-surement matrix is well- or ill-conditioned. PGDM andFISTA – our new PGD algorithms – are shown to beespecially well suited for ill-conditioned problems.

Quantum tomography includes three main subfields:detector, process and state tomography. State tomogra-phy algorithms are straightforwardly transferable to pro-cess tomography, but applying the same algorithms to de-tector tomography with success is not trivial. The prob-lem of detector tomography lies in characterising an un-known detector POVM from an informationally completeset of known states. If the tomographer probes an opti-cal detector with coherent states, the problem of detectortomography is ill-conditioned, and like the density ma-trix, the POVM elements must be positive-semidefinite[22]. Currently, the state-of-the-art algorithms to solvethis problem are semidefinite program solvers such as Se-DuMi [21]. Because PGDM and FISTA perform well inthe case of ill-conditioned measurement matrices, thesealgorithms hold great promise for optical detector tomog-raphy. One avenue for future work is the application ofthese two algorithms to the characterisation of detectorPOVMs in high dimensions.

In summary, the PGD techniques have proven theirworth in that they all converge towards the maximumlikelihood density matrix reliably. The different PGDtechniques complement each other such that PGDB isfastest in low dimensions and, according to our simula-tions, PGDM is fastest beyond five-qubit systems. Fur-ther, we find that the PGD techniques reach ρML sig-nificantly faster than the DIA and SDPT3 in the vastmajority of scenarios, thus surpassing the state-of-the-art techniques with regards to assumption-free quantumstate tomography.

Methods

Rank-1 projectors

To reduce the memory requirements, we chose to userank-1 measurement operators in the simulation. Insteadof matrix operators, the measurements take the form ofd-dimensional vectors. Given rank-1 projectors |φi〉〈φi|,the Born rule is written pi = 〈φi|ρ|φi〉 and the measure-ment matrix takes the following form

A =

〈φ1|...〈φN |

. (11)

In this compact notation, the measurement matrix is(N × d)-dimensional, thus requiring d times less RAMmemory compared to the full-rank case.

The gradient of the Gaussian negative log-likelihoodfunction is compactly written as

∇ logLG(ρ) = 2G†A∗, (12)

where the elements of the G matrix are defined: Gi,j =Ai,j(rpi − ni). The computation of the above gradienttakes O(Nd2) floating-point operations. This is the gra-dient that we use in the PGD algorithms for the simula-tions.

For all simulations in the main text, the average num-ber of events per outcome r/d is set to 104. Densitymatrices were chosen randomly in the Haar sense but al-ways a with purity of 0.5. All simulations were performedon a single thread on an Intel Xeon Haswell processor.

PGD algorithms

The pseudo code for PGDM, FISTA and PGDB isgiven in Algorithms 1, 2 and 3. The symbol ◦ corre-sponds to the Hadamard product (or element-wise mul-tiplication).

Algorithm 1 PGDM

1: k = 02: Initial estimate and momentum matrix: ρ0 = I, M0 = 03: currentMagnitude = dlog10 CP (ρ0)e4: Set step size and inertia: γ = (2rd)−1, ζ = 0.955: while

∑20j=1 |CP (ρj)− CP (ρj−1)| > 10−5 do

6: Projection: ρk = S(ρk)7: Estimate probabilities: pk =

∑j [A ◦ (A∗ρk)]i,j

8: Calculate log-likelihood: CP (ρk) = νTν/N9: Compute gradient: ∇CG(ρk) = 2G†A∗

10: currentMagnitude = dlog10 CP (ρk)e)11: if currentMagnitude < previousMagnitude then12: Update inertia: ζk = (1− (1− ζk) ∗ 0.95)13: previousMagnitude = currentMagnitude

14: Update momentum: Mk+1 = ζkMk − γ∇C(ρk)15: Update density matrix: ρk+1 = ρk +Mk+1

16: k = k + 1

17: Final projection: ρfinal = S(ρk+1)18: Return ρfinal

∗ email: [email protected][1] W.-B. Gao, C.-Y. Lu, X.-C. Yao, P. Xu, O. Guhne,

A. Goebel, Y.-A. Chen, C.-Z. Peng, Z.-B. Chen, and J.-W. Pan, Nature Phys. 6, 331 (2010).

[2] P. Schindler, J. T. Barreiro, T. Monz, V. Nebendahl,D. Nigg, M. Chwalla, M. Hennrich, and R. Blatt, Science332, 1059 (2011).

mailto:[email protected]

8

Algorithm 2 FISTA

1: k = 02: Initial estimate and momentum matrix: ρ0 = I, M0 = 03: Set step size: δ = (10d)−1

4: while∑20

j=1 |CP (ρj)− CP (ρj−1)| > 10−6 do

5: Projection: ρk = S(ρk)6: Estimate probabilities: pk =

∑j [A ◦ (A∗ρk)]i,j

7: Calculate log-likelihood: CP (ρk) = νTν/N8: Compute gradient: ∇CG(ρk) = 2G†A∗9: ρk+1 = ρk + (k − 2)(ρk − ρk−1)(k + 1)−1 − δ∇CG(ρk)

10: k = k + 1


Algorithm 3 PGDB

1: k = 02: Initial estimate and momentum matrix: ρ0 = I, M0 = 03: Set metaparameters: µ = 1, ` = 10−4

4: while∑20

j=1 |C(ρj)− C(ρj−1)| > 10−5 do

5: Projection: ρk = S(ρk)6: Estimate probabilities: pki =

∑j [A ◦ (A∗ρk)]i,j

7: Calculate log-likelihood: C(ρk) = νTν/N8: Compute gradient: ∇C(ρk) = 2P †A∗9: ρ′k = S(ρk − µ−1∇CG(ρk))

10: D = ρ′k − ρk11: Line search initialisation: αk = 112: C′G(ρk) = CG(ρk) + `αkTr[D∇CG(ρk)]13: while CG(ρk + αkD) > C′G(ρk) do14: Line search: αk = αk/215: C′G(ρk) = CG(ρk) + `αkTr[D∇CG(ρk)]

16: Update density matrix: ρk+1 = ρk + αkDk

17: k = k + 1


[3] X.-C. Yao, T.-X. Wang, P. Xu, H. Lu, G.-S. Pan, X.-H.Bao, C.-Z. Peng, C.-Y. Lu, Y.-A. Chen, and J.-W. Pan,Nature Comm. 6, 225 (2012).

[4] E. Bolduc, G. Gariepy, and J. Leach, Nature Comm. 7(2016).

[5] M. Malik, M. Mirhosseini, M. Lavery, J. Leach, M. J.Padgett, and R. W. Boyd, Nat Commun 5, 3115 (2014).

[6] X. C. Yao, T. X. Wang, P. Xu, H. Lu, G. S. Pan, X. H.Bao, C.-Z. Peng, C.-Y. Lu, Y.-A. Chen, and J.-W. Pan,Nature Comm. 6, 225 (2012).

[7] J. Rehacek, Z. Hradil, and M. Jezek, Phys Rev A 63,040303 (2001).

[8] J. Rehacek, Z. Hradil, E. Knill, and A. I. Lvovsky, PhysRev A 75, 042108 (2007).

[9] M. Grant, S. Boyd, and Y. Ye (2008).[10] D. S. Goncalves, M. A. Gomes-Ruggiero, and C. Lavor,

Optimization Methods and Software 31, 328 (2016).[11] G. B. Silva, S. Glancy, and H. M. Vasconcelos, arXiv

preprint arXiv:1604.00321 (2016).

[12] K.-C. Toh, M. J. Todd, and R. H. Tutuncu, OptimizationMethods and Software 11, 545 (1999).

[13] J. F. Sturm, Optimization Methods and Software 11, 625(1999).

[14] J. A. Smolin, J. M. Gambetta, and G. Smith, Phys RevLett 108, 070502 (2012).

[15] Z. H. Guo, H.-S. Zhong, Y. Tian, D. Dong, B. Qi, L. Li,Y. Wang, F. Nori, G.-Y. Xiang, C.-F. Li, et al., New J.Phys. 18, 083036 (2016).

[16] J. Shang, Z. Zhang, and H. K. Ng, arXiv preprintarXiv:1609.07881 (2016).

[17] A. Beck and M. Teboulle, SIAM Journal on Imaging Sci-ences 2, 183 (2009).

[18] K. Zyczkowski, K. A. Penson, I. Nechita, and B. Collins,Journal of Mathematical Physics 52, 062201 (2011).

[19] K. Banaszek, G. M. D’Ariano, M. G. A. Paris, and M. F.Sacchi, Phys Rev A 61, 10304 (2000).

[20] A. Miranowicz, K. Bartkiewicz, J. Perina Jr, M. Koashi,N. Imoto, and F. Nori, Phys Rev A 90, 062123 (2014).

[21] A. Feito, J. S. Lundeen, H. Coldenstrodt-Ronge, J. Eis-ert, M. B. Plenio, and I. A. Walmsley, New J. Phys. 11,093038 (2009).

[22] J. S. Lundeen, A. Feito, H. Coldenstrodt-Ronge, K. L.Pregnell, C. Silberhorn, T. C. Ralph, J. Eisert, M. B.Plenio, and I. A. Walmsley, Nature Phys. 5, 27 (2009).

[23] R. Bianchetti, S. Filipp, M. Baur, J. M. Fink, C. Lang,L. Steffen, M. Boissonneault, A. Blais, and A. Wallraff,Phys Rev Lett 105, 223601 (2010).

[24] M. S. Kaznady and D. F. James, Phys Rev A 79, 022109(2009).

[25] D. F. James, P. G. Kwiat, W. J. Munro, and A. G. White,Phys Rev A 64, 052312 (2001).

[26] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P.Flannery, Numerical recipes in C, vol. 2 (Cambridge uni-versity press Cambridge, 1996).

[27] C. Michelot, Journal of Optimization Theory and Appli-cations 50, 195 (1986).

[28] I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton,ICML (3) 28, 1139 (2013).

[29] D. S. Goncalves, M. A. Gomes-Ruggiero, and C. Lavor,Quantum Information & Computation 14, 966 (2014).

Acknowledgements

E.B. acknowledges the financial support of theFQRNT, grant #176729. J.L. acknowledges the finan-cial support of the Engineering and Physical Sciences Re-search Council (EPSRC, UK, Grants EP/M006514/1 andEP/M01326X/1). G.C.K. was supported by the RoyalCommission for the Exhibition of 1851. E.M.G. acknowl-edges support from the Royal Society of Edinburgh andthe Scottish Government.

http://arxiv.org/abs/1604.00321

http://arxiv.org/abs/1609.07881

1 david brewster building, edinburgh, eh14 4as, uk and 2 … · 2018-10-08 · projected gradient...

Documents