
arXiv:1910.02071v1 [quant-ph] 4 Oct 2019

Quantum Hamiltonian-Based Models & the Variational Quantum Thermalizer Algorithm

Guillaume Verdon,1,2,3,∗ Jacob Marks,1,4,† Sasha Nanda,1,5 Stefan Leichenauer,1 and Jack Hidary1

1 X, The Moonshot Factory, Mountain View, CA
2 Institute for Quantum Computing, University of Waterloo, ON, Canada
3 Department of Applied Mathematics, University of Waterloo, ON, Canada
4 Department of Physics, Stanford University, Stanford, CA
5 Division of Physics, Mathematics and Astronomy, California Institute of Technology, CA

(Dated: October 7, 2019)

We introduce a new class of generative quantum-neural-network-based models called Quantum Hamiltonian-Based Models (QHBMs). In doing so, we establish a paradigmatic approach for quantum-probabilistic hybrid variational learning, where we efficiently decompose the tasks of learning classical and quantum correlations in a way which maximizes the utility of both classical and quantum processors. In addition, we introduce the Variational Quantum Thermalizer (VQT) for generating the thermal state of a given Hamiltonian and target temperature, a task for which QHBMs are naturally well-suited. The VQT can be seen as a generalization of the Variational Quantum Eigensolver (VQE) to thermal states: we show that the VQT converges to the VQE in the zero temperature limit. We provide numerical results demonstrating the efficacy of these techniques in illustrative examples. We use QHBMs and the VQT on Heisenberg spin systems, we apply QHBMs to learn entanglement Hamiltonians and compression codes in simulated free Bosonic systems, and finally we use the VQT to prepare thermal Fermionic Gaussian states for quantum simulation.

I. INTRODUCTION

As near-term quantum devices move beyond the point of classical simulability, also known as quantum supremacy [1], it is of utmost importance for us to discover new applications for Noisy Intermediate-Scale Quantum devices [2] which will be available and ready in the next few years. Among the most promising applications for near-term devices are Quantum Machine Learning (QML) [3–14], Quantum Simulation (QS) [15–20], and Quantum-Enhanced Optimization (QEO) [9, 21–28]. Recent advances in these three areas have been dominated by a class of algorithms called hybrid quantum-classical variational algorithms. In these algorithms, a classical computer aids the quantum computer in a search over a parameterized class of circuits. These parameterized quantum circuits are sometimes called quantum neural networks [3–5]. Key to the success of quantum-classical algorithms is hybridization: in the near term, quantum computers will be used as co-processors for classical devices. The work in this paper proposes a new way to hybridize certain quantum simulation and QML tasks in a way that fully takes advantage of the strengths of both devices.

The rise of variational quantum algorithms can be traced back to the invention and implementation of the Variational Quantum Eigensolver [16], which sparked a Cambrian explosion of works in the field of near-term algorithms. Similarly, in this paper, not only do we introduce a direct generalization of the VQE, but we introduce the first member of a new class of algorithms, which we call quantum-probabilistic hybrid variational algorithms: a combination of classical probabilistic variational inference [29–35] and quantum variational algorithms.

∗ Equal contribution; contact: [email protected]
† Equal contribution

More specifically, in this paper we focus on the task of generative modelling of mixed quantum states. It is generally accepted that one must employ a quantum-based representation to efficiently learn the quantum correlations of a pure quantum state, as classical representations of quantum states scale poorly in the presence of quantum correlations such as entanglement [36]. Mixed quantum states, arising from probabilistic mixtures of pure quantum states, generally exhibit both classical and quantum correlations. One must therefore learn a hybridized representation featuring both quantum correlations and classical correlations.

Within the new paradigm of quantum-probabilistic hybrid machine learning, we introduce a class of models called Quantum Hamiltonian-Based Models (QHBMs). These models are constructed as thermal states (i.e., quantum exponential distributions) of a parameterized modular Hamiltonian. As a first set of applications for this class of models, we explore the learning of unknown quantum mixed states, given access to several copies (quantum samples) of this quantum-probabilistic distribution. As a second class of applications for QHBMs, we consider the task of generating the thermal state of a quantum system given knowledge of the system's Hamiltonian.

Before proceeding with the main body of the paper, let us establish a broader context from the point of view of classical machine learning. The field of machine learning (ML) has seen several breakthroughs in recent years. These successes have often been attributed to the rapid advancements in deep learning [37]. In deep learning, one learns representations of data distributions. Such representations can consist of a neural network [38], a tensor network [39, 40], or more generally a parameterized network of compositions of differentiable mappings. The parameters are trained by optimizing some metric of success called the loss function, often via gradient descent with backpropagation of errors [41]. One of the major tasks in unsupervised deep learning is so-called generative modelling of a distribution.

In discriminative machine learning models, a model learns the conditional probability of a target, Y, given the observation x, i.e., P(Y|X = x). Generative models take on the task of learning the joint probability distribution P(X, Y), enabling a trained generative model to generate new data that looks approximately like the training data. Generative models have been used for many unsupervised tasks, including generating new datapoints akin to a dataset, inpainting [42], denoising [43], superresolution [44], compression [45], and much more. Perhaps the most widespread generative techniques in classical machine learning are Generative Adversarial Networks (GANs) [46], which pit a generative network and an adversarial network against each other, and variational autoencoders (VAEs) [30], which maximize a variational lower bound to the log likelihood of the data in order to increase the probability that the model generates the dataset and, hence, similar datapoints. Both architectures have demonstrated great successes and have remained competitive with each other in terms of performance and scalability [47, 48].

Recently, a third type of generative algorithm — the generalized form of an Energy-Based Model (EBM) — has been gaining traction in the classical machine learning community. This new architecture, derived as a generalization of early energy-based architectures of neural networks such as Boltzmann machines, has been shown to be competitive with GANs and VAEs for generative tasks at large scales [49]. The EBM approach draws its inspiration from distributions encountered in physics, namely thermal (exponential) distributions, where the probability of a given sample is proportional to the exponential of a certain function, called the energy. Instead of sampling the exponential distribution of a fixed energy function, EBMs have a parameterization over a hypothesis space of energy functions, and the parameters which maximize the likelihood of the dataset (equivalently, minimize the relative entropy to it) can be found via optimization. Boltzmann machines, a type of energy-based model with an energy function inspired by Ising models, have long been in use. The recent innovation, however, has been to use a neural network in order to have a more general and flexible parameterization over the space of energy functions. The differentiability of neural networks is then used to generate thermal samples via Stochastic Gradient Langevin Dynamics [50]. The construction presented in this paper is analogous to this energy-based construction, generalized to the quantum domain.

II. QUANTUM HAMILTONIAN-BASED MODELS

In order to represent the hybrid quantum-classical statistics of mixed states, the QHBM ansatz is structured in terms of a "simple" (i.e., nearly classical) parameterized latent mixed state which is passed through a unitary quantum neural network to produce a visible mixed state. In this section we will introduce the general framework of QHBMs before proceeding to training and examples in subsequent sections. The flow of classical and quantum information in a QHBM is illustrated in Figure 1, and background on quantum neural networks is provided in Appendix A.

The variational latent distribution ρθ with variational parameters θ is constructed in such a way that the preparation of ρθ is simple, from a quantum computational standpoint.¹ Quantum correlations are incorporated through the unitary U(φ) with model parameters φ.

Our complete variational mixed state is then

ρθφ = U(φ) ρθ U†(φ). (1)

We call this state the variational visible state. It is the output of the inference mechanism of our composite model, and will be either a thermal state or a learned approximation to a target mixed state, depending on the task.

A. Modular Hamiltonians and the Exponential Ansatz

Quantum Hamiltonian-Based Models are quantum analogues of classical energy-based models, an analogy which will be made clear in this section.

Without loss of generality, we can consider the latent distribution to be a thermal state of a certain Hamiltonian:

ρθ = (1/Zθ) e^{−Kθ}, Zθ = tr[e^{−Kθ}]. (2)

We call this Kθ the latent modular Hamiltonian, and it is one of a class of operators parameterized by the latent variational parameters θ. Here Zθ = tr[e^{−Kθ}] is the model partition function, which is, notably, also parameterized by the latent variational parameters. Now, given this form for the latent state, notice that the variational visible state is a thermal state of a related Hamiltonian:

ρθφ = (1/Zθ) e^{−U(φ)KθU†(φ)} ≡ (1/Zθ) e^{−Kθφ}, (3)

where we define the model modular Hamiltonian as Kθφ ≡ U(φ)KθU†(φ), which is parameterized by both the latent variational parameters θ and the model parameters φ. Thus, we see that our QHBM ansatz represents a class of quantum distributions which are thermal states of parameterized Hamiltonians. As we will see below, this exponential structure is useful for computing relative entropies of the model against target distributions.

¹ We define this more precisely a few paragraphs below when giving examples of structures for the latent space distribution.

We note that the above structure is in direct analogy with classical energy-based models. In such models, the variational distribution is of the exponential form pθ(x) = (1/Zθ) e^{−Eθ(x)}, where Zθ ≡ ∑_x e^{−Eθ(x)} and the energy function Eθ(x) is parameterized by a neural network. The network is trained so that the samples from pθ mimic those of a target data distribution.

In place of a parameterized classical energy function, we have a parameterized modular Hamiltonian operator, and the variational model is a thermal state of this operator. This justifies why we call it a Quantum Hamiltonian-Based Model instead of simply an Energy-Based Model. The thermal state of the Hamiltonian is designed to replicate the quantum statistics of the target data.

We will distinguish two distinct tasks within the QHBM framework. The first is generative learning of a target mixed state, given access to copies of said mixed state. We call this task Quantum Modular Hamiltonian Learning (QMHL). As described above, we should think of this task as finding an effective modular Hamiltonian for the target state. The second task involves being given a target Hamiltonian, and generating approximate thermal states of some temperature with respect to that Hamiltonian. In our framework, this means that we variationally learn an effective modular Hamiltonian which reproduces the statistics of the target thermal state. We call this task Variational Quantum Thermalization (VQT). We will treat QMHL and VQT in depth in Sections III A and III B, respectively.

Before discussing in more detail the structure of the latent space, let us draw attention to two quantities which will be important for the training process in both VQT and QMHL: the partition function and the von Neumann entropy of the model. Comparing equations (2) and (3), we see that the partition function Zθ is the same for both the latent state and the visible state. Furthermore, the von Neumann entropies of the latent and visible states are also identical, due to the invariance of von Neumann entropy under unitary action:

S(ρθ) = S(U(φ) ρθ U†(φ)) = S(ρθφ). (4)

We will leverage this in our training of the QHBM, where efficient computation or estimation of the latent state entropy will give us knowledge of the visible state entropy.
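To make equations (1)–(4) concrete, here is a minimal numerical sketch (our illustration, not code from the paper) that builds a visible state from a diagonal latent thermal state and a random unitary standing in for a trained quantum neural network, then checks the entropy invariance of equation (4).

```python
# Minimal numpy sketch (our illustration): build the visible state of Eq. (1)
# from a diagonal latent thermal state and a random unitary, then verify the
# entropy invariance asserted in Eq. (4).
import numpy as np

rng = np.random.default_rng(0)

def von_neumann_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]              # drop numerical zeros
    return -np.sum(evals * np.log(evals))

d = 8                                          # toy Hilbert-space dimension
k = rng.normal(size=d)                         # eigenvalues of a diagonal K_theta
p = np.exp(-k) / np.exp(-k).sum()              # Boltzmann weights e^{-K}/Z
rho_latent = np.diag(p)

# Haar-like random unitary standing in for the trained QNN U(phi).
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
rho_visible = U @ rho_latent @ U.conj().T      # Eq. (1)

assert np.isclose(von_neumann_entropy(rho_latent),
                  von_neumann_entropy(rho_visible))   # Eq. (4)
```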

B. Structure of the Latent Space

In this section we will discuss the form of the latent model ρθ. We consider a good choice of a latent space ansatz to be one that is quantumly simple, i.e., of low complexity for the quantum computer to prepare. The two types of latent distributions employed in this paper will be either a factorized latent state or a general classical latent space distribution. Let us now introduce these two cases and dive further into the specifics of each.

FIG. 1. Quantum-classical information flow diagram for hybrid quantum-probabilistic inference for a Quantum Hamiltonian-Based Model with a general classical latent distribution. Here, we have unitaries Vx which map the quantum computer's initial state |0⟩ to the computational basis state |x⟩ corresponding to the sampled value x ∼ pθ(x) on a given run.

1. Factorized Latent Space Models

A first choice for the latent space structure is a factorized latent state, which is a latent state of the form

ρθ = ⊗_{j=1}^N ρj(θj), (5)

where the total quantum system is separated into N smaller-dimensional subsystems, each with its own set of variational parameters θj, and the mixed states ρj(θj) of the subsystems are uncorrelated.

This structure has several useful benefits. Notice that, due to this tensor product structure, the latent modular Hamiltonian becomes a sum of modular Hamiltonians of the subsystems,

Kθ ≡ ∑_j Kj(θj), ρθ = ⊗_{j=1}^N (1/Zθj) e^{−Kj(θj)}, (6)

where Zθj = tr[e^{−Kj(θj)}] is the jth subsystem partition function, and Kj(θj) the jth subsystem modular Hamiltonian. This sum decomposition becomes useful when estimating expectation values of the modular Hamiltonian, as it becomes a sum of expectation values of the subsystems' modular Hamiltonians,

⟨Kθ⟩ = ∑_j ⟨Kj(θj)⟩. (7)

From the above expression, we see that the partition function is a product of the subsystem partition functions, Zθ = ∏_j Zθj, and hence the logarithm of the partition function becomes a sum:

log(Zθ) = ∑_{j=1}^N log(Zθj). (8)

Furthermore, the entropy of the latent state (and hence of the visible state, via equation (4)) becomes additive over the entropies of the subsystems:

S(ρθ) = ∑_{j=1}^N S(ρj(θj)). (9)

This is convenient, as estimating N entropies of states in d-dimensional Hilbert spaces is generally much simpler than computing the entropy of a state in a d^N-dimensional space.
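A short sketch (ours, under the factorization assumption above) of why equations (8) and (9) make the classical side cheap: log Zθ and S(ρθ) are assembled from N small subsystem computations, never touching the full Hilbert space.

```python
# Sketch: for the factorized latent state of Eq. (6), log Z_theta and
# S(rho_theta) are sums over subsystems, Eqs. (8)-(9), so only small matrices
# are ever manipulated classically.
import numpy as np

def subsystem_log_Z_and_entropy(k_j):
    """k_j: eigenvalues of one subsystem modular Hamiltonian K_j(theta_j)."""
    log_Z_j = np.log(np.sum(np.exp(-k_j)))
    p = np.exp(-k_j - log_Z_j)                 # subsystem Boltzmann weights
    return log_Z_j, -np.sum(p * np.log(p))

rng = np.random.default_rng(1)
ks = [rng.normal(size=2) for _ in range(10)]   # 10 qubit subsystems (d_j = 2)

log_Z = sum(subsystem_log_Z_and_entropy(k)[0] for k in ks)   # Eq. (8)
S     = sum(subsystem_log_Z_and_entropy(k)[1] for k in ks)   # Eq. (9)
print(log_Z, S)
```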

Another feature of a factorized state is that the number of parameters used to describe such a distribution is linear in the number of subsystems N. The precise number of parameters depends on the structure of the states within each subsystem. There are many possibilities, and we will present some concrete examples below.

Finally, by learning a completely decorrelated (in terms of both entanglement and classical correlations) representation in latent space, we are effectively learning a representation which has a natural orthogonal basis for its latent space, allowing for latent space interpolation, independent component analysis, compression code learning, principal component analysis, and many other spin-off applications. In classical machine learning, this is known as a disentangled representation [51], and there have been several recent works addressing this machine learning task [52].

2. Examples of Latent Subsystem Modular Hamiltonians

For the QHBMs considered in this paper, as explored in depth in Section IV, we use several different types of latent subsystem modular Hamiltonians within the factorized-latent-state framework of equation (6). We review them here.

The first class of modular Hamiltonians employed in our experiments are qudit operators [53] diagonal in the computational basis. In our particular numerics in Section IV A, we use two-level systems, i.e., qubits. The more general qudit case is akin to the eigenbasis representation of the Hamiltonian of a multi-level atom,

Kj(θj) = ∑_{k=1}^{dj} θjk |k⟩⟨k|_j, (10)

where the eigenvalues {θjk}_{k=1}^{dj} of this Hamiltonian form the set of variational parameters of the distribution. The latent distribution of each subsystem is then effectively a softmax [54] function of the eigenvalues of this Hamiltonian, and can be considered equivalent to a general N-outcome multinoulli [29] distribution. The total number of parameters for such a latent space parameterization with factorized subsystems scales as the sum of the subsystem dimensions, ∑_{j=1}^N dj, which is much smaller than for the most general (non-factorized) latent distribution, whose number of parameters scales as the product ∏_{j=1}^N dj, the total dimension of the system's Hilbert space.
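As a concrete reading of equation (10), a small sketch (ours) of the qudit latent distribution: the subsystem probabilities are simply a softmax of the negated variational eigenvalues.

```python
# Sketch of the qudit latent distribution implied by Eq. (10): the diagonal of
# rho_j(theta_j) is a softmax over the negated eigenvalues theta_jk.
import numpy as np

def qudit_latent_probs(theta_j):
    """Multinoulli p_k = e^{-theta_jk} / sum_k' e^{-theta_jk'}."""
    w = np.exp(-np.asarray(theta_j))
    return w / w.sum()

theta_j = [0.1, 1.3, 2.0]            # hypothetical d_j = 3 eigenvalues
print(qudit_latent_probs(theta_j))   # diagonal entries of rho_j(theta_j)
```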

The second type of modular Hamiltonian we use, in Section IV B, is the number operator of a continuous-variable quantum mode (qumode), or harmonic oscillator,

Kj(θj) = (θj/2)(x̂_j² + p̂_j²) = θj (â†_j â_j + 1/2). (11)

The exponential distribution of such a modular Hamiltonian is a single-mode thermal state [55], which has a Wigner phase space representation as a symmetric Gaussian. The single parameter per mode, θj, modulates the variance of the Gaussian. This single-mode thermal state is the closest quantum analogue of a latent product of Gaussians, which is the standard choice of latent distribution in several variational inference models in classical machine learning, including variational autoencoders [30]. Gaussian states of such quantum modes are very natural to prepare on continuous-variable quantum computers [56], and are also emulatable on digital quantum computers [13].
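The following sketch (ours, using standard harmonic-oscillator identities with ħ = 1; not the paper's code) shows how the single parameter θj of equation (11) controls both the thermal occupation and the Gaussian's variance.

```python
# Sketch (standard identities, hbar = 1): for the modular Hamiltonian of
# Eq. (11), the latent mode is a thermal state whose mean photon number and
# symmetric-Gaussian quadrature variance are set by theta_j alone.
import numpy as np

def thermal_mode(theta_j):
    n_bar = 1.0 / np.expm1(theta_j)   # Bose-Einstein occupation 1/(e^theta - 1)
    variance = n_bar + 0.5            # quadrature variance = (1/2) coth(theta/2)
    return n_bar, variance

for theta in [0.1, 1.0, 10.0]:
    print(theta, thermal_mode(theta))  # large theta_j -> vacuum variance 1/2
```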

Finally, as will be explored in Section IV C, our third type of latent subsystem modular Hamiltonian is the particle number operator for Fermions, Kj(θj) = θj ĉ†_j ĉ_j.

3. General Classical Latent Space Models

In some cases, a decorrelated (disentangled) representation for the latent space is not possible, or is incapable of yielding accurate results. In classical deep learning, this factorized latent space prior assumption has been one of the main reasons that VAEs with uncorrelated Gaussian priors perform worse than GANs on several tasks. To remedy this situation, several classical algorithms for more accurate variational inference of latent space distributions have emerged, notably including Neural ODEs [35], normalizing flow-based models [33], and neural energy-based models [49].

This same generalization can be applied to QHBMs, moving from the factorized latent space structure of (6) to more general distributions. This should be understood as a delegation of parts of the representation to a classical device running classical probabilistic inference. The job of the classical device is to sample from the classical latent space distribution.

In the most general formulation, the latent state can be represented as

ρ(θ) = ∑_{x∈Ω} pθ(x) |x⟩⟨x|, (12)

where the summation is formal, and runs over the index set Ω of some basis in Hilbert space, which we call the computational basis. Here the probability distribution pθ(x) is generally a parameterized class of variational probability distributions over the computational basis domain Ω. Since a general categorical (multinoulli) distribution would need a number of parameters which scales as |Ω|, the dimension of the global Hilbert space, which is generally exponentially large (e.g., |Ω| = 2^N for N qubits), one needs a more efficient parameterization of the latent distribution. This is where classical algorithms for probabilistic and variational inference come into play.

The classical computer is thus tasked with the variational learning of this classical distribution. There is a plethora of techniques for variational inference to choose from, as listed above. The key feature we need from such classical algorithms is the ability to estimate the log-likelihood of the model distribution, the gradients of the (log) partition function, and the entropy. These quantities will become useful in various generative learning tasks; see Section III for more details.

The most natural fit for our needs is the modern variant of classical energy-based models [49]. In EBMs, a neural network Eθ : Ω → ℝ parameterizes the energy map from the computational basis to a real value, i.e., for any value x ∈ Ω it can produce the corresponding energy Eθ(x). Furthermore, due to the easy differentiability of neural networks, one can readily compute gradients of the energy function, ∇θEθ(x) and ∇xEθ(x). To leverage these gradients, one uses Langevin dynamics or Hamiltonian Monte Carlo to generate samples x ∼ pθ(x) according to the exponential distribution of the energy function,

pθ(x) = (1/Zθ) e^{−Eθ(x)}, (13)

where Zθ ≡ ∑_{x∈Ω} e^{−Eθ(x)} is the partition function. As a straightforward feature of this approach, given an ensemble of points sampled from a different distribution x ∼ q(x), one can use the neural network to evaluate expectation values of the energy with respect to this distribution, E_{x∼q(x)}[Eθ(x)]. Additionally, this approach can provide an estimate and/or gradients of the entropy of the distribution, S(pθ) = −∑_{x∈Ω} pθ(x) log pθ(x). Given the abilities of the classical model above, one can use the classical algorithm to generate the latent mixed state ρ(θ) = ∑_{x∈Ω} pθ(x)|x⟩⟨x| by sampling x ∼ pθ and preparing the computational basis state |x⟩⟨x| after each sample. Furthermore, we can define the modular Hamiltonian for this latent variational model as

Kθ = ∑_{x∈Ω} Eθ(x) |x⟩⟨x|. (14)

Thus one can estimate expectation values of this operator by feeding standard basis measurement results through the neural network function Eθ(x) and averaging. We have now outlined all the ingredients needed for inference and training of this type of hybrid QHBM.
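The estimation step just described is easy to sketch. Below is our toy stand-in (not the paper's implementation): bitstrings from standard-basis measurements are pushed through a classical energy function and averaged to estimate ⟨Kθ⟩ of equation (14); `toy_energy` is a hypothetical quadratic energy model standing in for a neural network Eθ.

```python
# Sketch (our toy stand-in): estimating <K_theta> of Eq. (14) by pushing
# standard-basis measurement outcomes through a classical energy function and
# averaging. `toy_energy` is a hypothetical quadratic (pseudo-Boltzmann-machine)
# energy standing in for a neural network E_theta.
import numpy as np

def toy_energy(x, theta):
    # Hypothetical quadratic energy on a bitstring x.
    return x @ theta @ x

rng = np.random.default_rng(2)
n_bits = 6
theta = rng.normal(scale=0.1, size=(n_bits, n_bits))

# Placeholder bitstrings standing in for measurement results of the visible state.
samples = rng.integers(0, 2, size=(1000, n_bits))

K_estimate = np.mean([toy_energy(x, theta) for x in samples])
print(K_estimate)
```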

Generally, given a target mixed state σD which we want to generatively model, we know that there exists a diagonal representation of this mixed state:

∀ σD, ∃ WD : W†_D D_D W_D = σD, D_D = ∑_x λx |x⟩⟨x|, (15)

where tr(D_D) = 1 and WD is a unitary operator. Thus the approach outlined above has, in principle, the capacity to represent any mixed state. The challenge remains to pick a proper prior for both the unitary quantum neural network U(φ) and for the parameterization of the classical latent distribution pθ(x) (or, equivalently, the latent energy function Eθ(x)). A good ansatz is one which uses knowledge about the physics of the system to form a prior for both the latent space and the unitary transformation. Although it may be tempting to use a general multinoulli distribution for the latent distribution combined with a universal random quantum neural network for the unitary ansatz, the model capacity in this case is far too large: one encounters not only exponential overhead for parameter estimation of the high-dimensional multinoulli distribution, but also the quantum version of the no-free-lunch theorem [4]. In Section IV, we see several scenarios where the structure of the latent space and the unitary transformation are well-adapted to the complexity of the situation at hand.

III. QUANTUM GENERATIVE MODELLING OF MIXED & THERMAL STATES

In this section, we introduce two types of generative tasks for Quantum Hamiltonian-Based Models which are also valid for more general quantum-probabilistic models. The first task is quantum modular Hamiltonian learning (QMHL), where one effectively learns the logarithm (modular Hamiltonian) of an unknown data mixed state. Once trained, one is then able to reproduce copies of this unknown mixed state via an exponential (thermal) distribution of the learned modular Hamiltonian. The second task is dual to the first: given a known Hamiltonian, learn to produce copies of the thermal state of this Hamiltonian. While both tasks fall in the broad class of generative modelling, the first is a quantum machine learning task, while the second may be considered a quantum simulation task.

As we have already detailed how to construct several variants of the QHBM with a variety of latent space structures in Section II — including details on how to compute expectations of the modular Hamiltonian, generate samples of the latent state, and compute quantities such as the partition function and entropy of the model — we will focus here on the general framework of QMHL and the VQT without too many specifics about how to compute the various quantities involved. For more details on gradients of the loss functions introduced here, see Appendix B.


A. Quantum Modular Hamiltonian Learning of Mixed Quantum States

Analogous to the classical case, quantum generative models seek to learn to replicate the statistics and correlation structure of a given quantum data distribution from which we have a finite number of samples. QHBMs make this learning process feasible by positing a form that inherently separates the tasks of learning the classical and quantum correlations in the learned representation of a quantum distribution.

Typical classical datasets are a set D of samples drawn from some underlying distribution d ∼ pD. The sampled dataset distribution is simply the categorical distribution over the datapoints d ∈ D. A quantum dataset generally can be a similar mix of various datapoints, in this case quantum states σd observed at different frequencies (inverse probabilities). The effective mixed state representing this dataset is then σD = ∑_{d∈D} pd σd, where we obtain the mixed state σd with probability pd from our dataset D. As modelling a mixture of states is effectively equivalent to the task of modelling one mixed state, we will focus on simply learning to approximate a single data mixed state σD using a variational quantum-probabilistic model. We assume that we have access to several copies of this data mixed state directly as quantum data available in quantum memory.

FIG. 2. Information flow for the process of training the QHBM applied to Modular Hamiltonian Learning. Grayscale indicates classical information processing happening on a classical device, while colored registers and operations are stored and executed on the quantum device. The θ parameters determine the latent space distribution and thus the modular Hamiltonian Kθ. From the known latent distribution, one can produce estimates of the parameterized log partition function log Zθ on the classical device. One then applies the inverse unitary quantum neural network U†(φ) and estimates the expectation value of the modular Hamiltonian Kθ at the output via several runs of inference on the quantum device and measurement. The partition function and modular expectation are then combined to yield the quantum variational cross entropy loss function (20).

Thus, in complete generality, we consider the problem of learning a mixed state σD, and let ρθφ be our model's candidate approximation to σD, where θ, φ are variational parameters. In classical probabilistic machine learning, one minimizes the Kullback-Leibler divergence (relative entropy), an approach known as "expectation propagation" [29]. We directly generalize this method to quantum-probabilistic machine learning, now aiming to minimize the quantum relative entropy between the quantum data distribution and our quantum model's candidate distribution, subject to variations of the parameters. Thus we aim to find

argmin_{θ,φ} D(σD‖ρθφ), (16)

where D(σ‖ρ) = tr(σ log σ) − tr(σ log ρ) is the quantum relative entropy. Due to the positivity of relative entropy, the optimum of the above cost function is achieved if and only if the variational distribution of our model is equal to the quantum data distribution:

D(σD‖ρθφ) ≥ 0, D(σD‖ρθφ) = 0 ⟺ σD = ρθφ. (17)

We can use this as a variational principle: the relative entropy is our loss function, and we can find optimal parameters of the model θ∗, φ∗ such that ρθ∗φ∗ ≈ σD. As we will see below, the use of QHBMs becomes crucial for the tractability of evaluating the relative entropy, making its use as a loss function possible. Details on the gradient computation of the loss are discussed in Appendix B.

Now, for a target quantum data mixed state σD and a QHBM variational state ρθφ of the form described in Equation (3), our goal is to minimize the forward relative entropy,

D(σD‖ρθφ) = −S(σD) + tr(σD Kθφ) + log(Zθ). (18)

Notice that the first term, S(σD) = −tr(σD log σD), the entropy of the data distribution, is independent of our variational parameters, and hence, for optimization purposes, irrelevant to include. This is convenient, as the entropy of the dataset is a priori unknown. We can thus use the last two terms as our loss function for QMHL. These terms are known as the quantum cross entropy between our data state and our model,

Lqmhl(θ, φ) ≡ −tr(σD log ρθφ) = tr(σD Kθφ) + log(Zθ). (19)

We call this loss function the quantum variational cross entropy loss. The first term can be understood as the expectation value of the model's modular Hamiltonian with respect to the data mixed state. In order to estimate this modular energy expectation, it is first useful to express the modular Hamiltonian of the model in terms of the latent modular Hamiltonian: Kθφ = U(φ)KθU†(φ). Then, by the cyclicity of the trace, we can rewrite the loss as

Lqmhl(θ, φ) = tr([U†(φ) σD U(φ)] Kθ) + log(Zθ). (20)


To better understand this, we define the pulled-back data state σD,φ ≡ U†(φ) σD U(φ), which is the state obtained by feeding our quantum data through the unitary quantum neural network circuit in reverse (or rather, its inverse). Thus, this term in the loss is equivalent to the expectation value of the latent modular Hamiltonian with respect to the pulled-back data, ⟨Kθ⟩_{σD,φ}. The flow of quantum and classical information for the task of Modular Hamiltonian Learning is depicted in Figure 2. See Section II for details on estimating the partition function, depending on the chosen latent space structure.
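A density-matrix-level sketch of the loss (ours; exact linear algebra stands in for the sampling-based estimates a quantum device would produce): pull the data state back through U†(φ), take the expectation of a diagonal latent modular Hamiltonian, and add log Zθ, as in equation (20).

```python
# Sketch (ours) of the QMHL loss, Eq. (20), for a diagonal latent modular
# Hamiltonian. Evaluated at the true generating parameters, the loss returns
# the data entropy (this is Eq. (21) below).
import numpy as np

def qmhl_loss(sigma_data, U, k_diag):
    """k_diag: eigenvalues of the diagonal latent modular Hamiltonian K_theta."""
    log_Z = np.log(np.sum(np.exp(-k_diag)))
    sigma_pulled = U.conj().T @ sigma_data @ U      # pulled-back data state
    return np.real(np.trace(sigma_pulled @ np.diag(k_diag))) + log_Z

rng = np.random.default_rng(3)
d = 4
k_true = rng.normal(size=d)
U_true, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
p = np.exp(-k_true); p /= p.sum()
sigma_data = U_true @ np.diag(p) @ U_true.conj().T

print(qmhl_loss(sigma_data, U_true, k_true))    # equals the data entropy...
print(-np.sum(p * np.log(p)))                   # ...S(sigma_D)
```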

Finally, an interesting feature of this choice of cross entropy loss function is that in the limit where the model approximates the data state, i.e., when the relative entropy in (17) converges to zero, the loss function itself converges to the entropy of the data:

Lqmhl(θ, φ) → S(σD) as ρθφ → σD. (21)

This means that, as a by-product of the learning process, after convergence of the training of our model, we are automatically provided with an estimate of the entropy of our data distribution, simply by observing the value to which the loss has converged. This has wide-ranging implications for the potential use of QHBMs and QMHL, combined with quantum simulation, to estimate entropies and various information-theoretic quantities using quantum computers. We discuss this further in Section V.

B. Variational Quantum Thermalizer Algorithm: Quantum Simulation of Thermal States

In this section we formally introduce the Variational Quantum Thermalizer (VQT) algorithm, the free energy variational principle behind it, how it relates to the Variational Quantum Eigensolver (VQE) [16] in a certain limit, and how one can use QHBMs for this task.

The variational quantum thermalization task can be described as follows: given a Hamiltonian H and a target inverse temperature β = 1/T, we wish to generate an approximation to the thermal state

σβ = (1/Zβ) e^{−βH}, Zβ = tr(e^{−βH}). (22)

Here Zβ is known as the thermal partition function. Our strategy will be to phrase this quantum simulation task as a quantum-probabilistic variational learning task. Suppose we have a quantum-probabilistic ansatz for this thermal state, ρθφ, with parameters θ, φ. We can consider the relative entropy between our variational model and this thermal state,

D(ρθφ‖σβ) = −S(ρθφ) − tr(ρθφ log σβ) = −S(ρθφ) + β tr(ρθφ H) + log Zβ. (23)

This is known as the quantum relative free energy of our model with respect to the target Hamiltonian H. The reason it is called a relative free energy is that it is the difference of free energies, F(ξ) ≡ tr(Hξ) − (1/β)S(ξ), up to a factor of β, between our ansatz state and the true thermal state:

D(ρθφ‖σβ) = βF(ρθφ) − βF(σβ).

Further note that the positivity of relative entropy implies that the minimum of the free energy is achieved by the thermal state:

D(ρθφ‖σβ) = 0 ⟹ F(ρθφ) = F(σβ) and ρθφ = σβ.

Thus, given a variational ansatz for the thermal state, by minimizing the free energy as our loss function,

Lvqt(θ, φ) = βF(ρθφ) = β tr(ρθφ H) − S(ρθφ),

we find optimal parameters θ∗, φ∗ such that ρθ∗φ∗ ≈ σβ.
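A sketch (ours) of this free-energy loss, evaluated exactly for a small system; the last line checks the fixed-point property discussed below, namely that at the true thermal state the loss equals −log Zβ.

```python
# Sketch (ours) of the VQT loss: L_vqt = beta * tr(rho H) - S(rho). At the true
# thermal state it equals -log Z_beta.
import numpy as np
from scipy.linalg import expm

def vqt_loss(rho, H, beta):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    entropy = -np.sum(evals * np.log(evals))
    return beta * np.real(np.trace(rho @ H)) - entropy

rng = np.random.default_rng(4)
H = rng.normal(size=(4, 4)); H = (H + H.T) / 2   # toy Hermitian Hamiltonian
beta = 1.7
sigma = expm(-beta * H)
Z = np.trace(sigma).real
sigma /= Z                                       # exact thermal state, Eq. (22)

print(vqt_loss(sigma, H, beta), -np.log(Z))      # should agree
```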

FIG. 3. Information flow for the process of inference and training for the QHBM applied to VQT. Grayscale indicates classical information processing happening on a classical device, while colored registers and operations are stored and executed on the quantum device. Here we focus on a general case for the latent variational distribution. The θ parameters determine the latent space distribution. From this distribution, one can compute the entropy Sθ classically. Using samples from the latent distribution x ∼ pθ(x), one applies a quantum operation to prepare the state |x⟩⟨x| via the unitary Vx from the initial state of the quantum device. One then applies the unitary quantum neural network U(φ) and estimates the expectation value of the Hamiltonian H at the output via several runs of classical sampling of x and measurement. The entropy and energy expectation are then combined into the free energy loss for optimization.

Now that we have our loss function, we can briefly examine how one could use a QHBM ansatz for the above. The great advantage of the QHBM structure here is that the entropy of the variational model distribution is that of the latent distribution, as pointed out in equation (4). As the entropy of the latent variational distribution is stored on the classical computer and assumed to be known a priori, only the evaluation of the energy expectation (the first term in the above) requires the quantum computer. Thus, the number of runs required to estimate the loss function should be similar to the number of runs required for the Variational Quantum Eigensolver and other variational algorithms whose loss only depends on expectation values [15].

Note that, in the case where our state converges to the true thermal state upon convergence of the training, the value of our loss function gives us the free energy of the thermal state, which is proportional to the log of the thermal partition function:

Lvqt(θ, φ) → βF(σβ) = −log Zβ as ρθφ → σβ.

This is effectively a variational free energy principle, and the basis of the VQT.

As a final note, let us see how we recover the Variational Quantum Eigensolver and its variational principle in the limit of low temperature. If we divide our loss function by β, i.e., directly minimize the free energy, L(θ, φ) = (1/β)Lvqt(θ, φ) = F(ρθφ), then in the limit of zero temperature,

L(θ, φ) → ⟨H⟩θφ as β → ∞,

we recover the loss function for the ground state Variational Quantum Eigensolver (VQE), and thus we recover the ground state variational principle. Note that in the limit of zero temperature there is no need for latent space parameters, as there is no entropy and the latent state is unitarily equivalent to any starting pure state.

Thus, the VQT is truly the most natural generalization of the VQE to non-zero temperature states.
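This limit is easy to verify numerically; a short check (ours) that the free energy of the exact thermal state approaches the ground-state energy as β grows:

```python
# Numerical check (ours) of the zero-temperature limit: F(sigma_beta) -> E0 as
# beta -> infinity, recovering the VQE variational principle.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
H = rng.normal(size=(6, 6)); H = (H + H.T) / 2
E0 = np.linalg.eigvalsh(H)[0]                     # ground-state energy

for beta in [1.0, 5.0, 25.0]:
    F = -np.log(np.trace(expm(-beta * H)).real) / beta   # F(sigma_beta)
    print(beta, F, E0)                            # F approaches E0
```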

IV. APPLICATIONS & EXPERIMENTS

Our framework is general enough to apply in many situations of interest in quantum computing [57], quantum communication [58], quantum sensing [59], and quantum simulation [60–75]. In this section we focus on applications to quantum simulation and quantum communication, which illustrate the QHBM framework for spin/qubit, Bosonic, and Fermionic systems.

In the Bosonic and Fermionic examples, we restrict ourselves to Gaussian states for two purposes: first, these highlight the ways in which our framework is well-suited to utilizing problem structure; and second, classical methods exist for simulating quantum systems restricted to these types of states, so they serve as a verification and benchmarking of our methods. This allows us to reach system sizes which would not be simulable on classical computers were we to simulate the wavefunction in the Hilbert space directly. However, we emphasize that Quantum Hamiltonian-Based Models apply to a much broader class of quantum states.

Gaussian states are the class of thermal (and ground) states of all Hamiltonians that are quadratic in creation and annihilation operators (in second quantization). On the Fermionic side, this class contains a variety of tight-binding models, including topological models like the Su-Schrieffer-Heeger (SSH) model [76–78] and mean-field BCS superconducting states. Bosonic Gaussian states include coherent and squeezed states, and already enable (when coupled with measurement) a plethora of applications in quantum communication, including quantum teleportation and quantum key distribution. In addition, Gaussian states are interesting in their own right, and play a significant role in quantum information theory [79–81], as well as quantum field theory in curved spacetime [82] and quantum cosmology [83].

Gaussian states are special in that our latent-space factorization postulate is justified physically in addition to holding numerically. As a consequence of Williamson's theorem [84], we are guaranteed that for any thermal state, there exists a Gaussian transformation that decouples all subsystems in latent space (a procedure often referred to as normal mode decomposition [85]). For Gaussian systems, we can also work directly with 2N × 2N covariance matrices rather than density matrices.

A. Variational Quantum Thermalizer for the 2D Heisenberg Model

As a demonstration of the effectiveness of the VQT for spin systems (qubits), we learn a thermal state of the two-dimensional Heisenberg model, which exhibits spin frustration despite its simplicity. To be precise, we consider the model

Hheis = ∑_{⟨ij⟩_h} Jh S_i · S_j + ∑_{⟨ij⟩_v} Jv S_i · S_j, (24)

where h (v) denotes horizontal (vertical) bonds, and ⟨·⟩ represents nearest-neighbor pairs. As laid out in Section II, we use a factorized latent space distribution. Even though we are considering a two-dimensional model, we simulate the VQT using an imagined one-dimensional array of qubits. As such, we parameterize our quantum neural network with only single-qubit rotations and two-qubit rotations between adjacent qubits in our simulated one-dimensional quantum computer.
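For classical verification at small sizes, the Hamiltonian of equation (24) can be built explicitly. Below is our construction (not the paper's code), using spin operators S = σ/2 and Kronecker products:

```python
# Sketch (our construction): the 2D Heisenberg Hamiltonian of Eq. (24) as an
# explicit matrix on an Nx x Ny grid of qubits, with spin operators S = sigma/2.
import numpy as np

sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2

def two_site(op, i, j, n):
    """Tensor op into sites i and j of an n-qubit identity string."""
    mats = [np.eye(2)] * n
    mats[i] = mats[j] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def heisenberg_2d(Nx, Ny, Jh, Jv):
    n = Nx * Ny
    idx = lambda x, y: x + Nx * y
    H = np.zeros((2**n, 2**n), dtype=complex)
    for y in range(Ny):
        for x in range(Nx):
            for dx, dy, J in [(1, 0, Jh), (0, 1, Jv)]:   # horizontal / vertical bonds
                if x + dx < Nx and y + dy < Ny:
                    i, j = idx(x, y), idx(x + dx, y + dy)
                    H += J * sum(two_site(s, i, j, n) for s in (sx, sy, sz))
    return H

H = heisenberg_2d(2, 2, Jh=1.0, Jv=0.6)   # the parameters used in Fig. 4
```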

In particular, our parameterized unitary is composed of three layers of gates. Each layer consists of an arbitrary single-qubit rotation of the form exp[i(φ_j^{(1)} X_j + φ_j^{(2)} Y_j + φ_j^{(3)} Z_j)] applied to each qubit, and a two-qubit rotation exp[i(φ_j^{(4)} X_j X_{j+1} + φ_j^{(5)} Y_j Y_{j+1} + φ_j^{(6)} Z_j Z_{j+1})] applied to a complete set of neighboring qubits. The neighbors acted upon by two-qubit gates are staggered in sequential layers, and no parameters are shared within or between layers. So in the first layer, the first and second qubits, third and fourth, and so on, are coupled, while in the second layer it is the second and third qubits, fourth and fifth, etc., which are coupled. In addition, we do not assume periodic boundary conditions for our qubits, and as such do not include any gates between the first and last qubits.

While we use relative entropy as our cost function, we also look at the trace distance and state fidelity of the reconstructed target states, as defined in Appendix C. As the modular Hamiltonian is not unique, in modular Hamiltonian learning we choose to compare the thermal states generated from the target and learned Hamiltonians. In Appendix D, we investigate the performance of the factorized latent space ansatz on systems that in general do not factorize.

TABLE I. QHBM Experiments Ingredients

Physical System | Qubits (Spins) | Gaussian Bosons | Gaussian Fermions
Base Quantities | S_i, S_j | x̂_j, p̂_k | ĉ_{2j−1}, ĉ_{2j}
Representation | ρ | ΓB = (1/2) Tr(ρ{ξ, ξᵀ}) | ΓF = (i/2) Tr(ρ[ξ, ξᵀ])
Ansatz | ρθφ = U(φ) ρ(θ) U†(φ) | Sᵀ(φ) Γθ S(φ) | Oᵀ(φ) Γθ O(φ)
Latent Ansatz | ρ(θ) = ⊗_{j=1}^N diag(1 − pj(θj), pj(θj)) | Γθ = ⊕_{j=1}^N diag(νj(θj), νj(θj)) | Γθ = ⊕_{j=1}^N [ 0 λj(θj) ; −λj(θj) 0 ]

FIG. 4. Density matrix visualizations for the thermal state of a two-dimensional Heisenberg model. Left: target thermal state of the Hamiltonian; Right: VQT reconstruction after 200 training steps. Specific model parameters are Nx = Ny = 2, at β = 2.6, Jx = 1.0, Jy = 0.6. Performance is sustained for larger systems, but harder to visualize due to sparsity of the density matrix elements.

In Fig. 4, we show the density matrix elements of the target thermal state, and the reconstructed density matrix elements learned from the corresponding Hamiltonian via the VQT after 200 variational steps. A few iterations later, the differences become too small to visualize, and the procedure then converges.

Figure 5 shows mean values and 95% confidence intervals for three performance metrics over 100 VQT training iterations, for 100 randomly initialized sets of variational parameters. This illustrates that our model trains generically and is not strongly dependent on the initial parameters. Furthermore, while our model was only trained to minimize relative entropy, it is worth noting that it successfully learns with respect to trace distance and fidelity as well.

B. Learning Quantum Compression Codes

Continuous Variable (CV) systems offer an intriguing alternative to discrete-variable quantum information processing [55]. In this context, our parameterized ansatz finds a decoupled representation of a subsystem of quantum modes in latent space, and the entropy is effectively described by the modes' effective temperatures. High-temperature modes contribute most to the entropy, and low-temperature modes contain relatively little information. This decorrelated structure could be used to devise quantum approximate compression codes.

FIG. 5. Performance metrics for VQT ansatz reconstruction of the thermal state of a two-dimensional anti-ferromagnetic Heisenberg model with Nx = Ny = 2, at β = 0.5, Jx = 1.0, Jy = 1.0. Upper: training loss (free energy of ansatz minus free energy of target state); Center: trace distance; Lower: mixed state fidelity. Solid line denotes the mean, and shaded region represents the 95% confidence interval. Model trained on 100 randomly chosen sets of initial parameters picked uniformly from the unit interval.

We recall that for Bosonic Gaussian states, the problem of dimensionality is made tractable by working with the phase space quadratures x̂ and p̂ defined in Appendix E. The real symmetric covariance matrix ΓB associated to a mixed state ρ is given by the matrix elements

Γ^{ab}_B = (1/2) Tr(ρ {ξ^a, ξ^b}), (25)

where each component of ξ is one of the x̂ or p̂ quadratures.

We also remind the reader that Gaussian transformations act as symplectic transformations on the covariance matrix, and that the symplectic transformation that diagonalizes ΓB,

S ΓB Sᵀ = ⊕_{j=1}^{Nb} diag(νj, νj), (26)

also diagonalizes the Hamiltonian. In this context, modular Hamiltonian learning consists of finding the symplectic diagonalization of the Hamiltonian given quantum access to the covariance matrix. For more details on Bosonic Gaussian quantum information, see Appendix E.

To keep our method as close to implementation as possible, we directly parameterize the space of symplectic matrices in terms of the effective action of single-mode phase shifts and squeezers, and two-mode beam splitters, as detailed in Appendix F and in [86].

We consider a translationally invariant linear harmonic chain,

Hlhc = ∑_j (ω x̂²_j + p̂²_j + 2χ x̂_j x̂_{j+1}), (27)

where x̂_{N+1} = x̂_1, as in [85]. At the point ω = 2|χ|, the system is critical and the operator Hlhc becomes unbounded from below.

First, we find the ground state of a long chain (we use 200 sites) close to criticality. We then cut the chain at two points and keep one of the two parts of the bipartition, performing a partial trace over the rest of the system. Given this reduced density matrix for the subsystem, we use QMHL to find an approximate representation of a modular Hamiltonian that would generate it as a thermal state.

FIG. 6. Compression of the latent space representation for the reduced state of a harmonic chain near criticality (ω = 1.0, χ = 0.499), with Modular Hamiltonian Learning. Subsystem of 10 modes traced out from a full chain of length 200. Plots visualize the absolute value of the difference between reconstructed and true covariance matrix elements, at single-shot compression ratios of 0.1, 0.4, 0.7 and 0.9. Color scale is relative to the largest matrix element of the true covariance matrix.

Our ansatz for QHBMs is perfectly suited for latent space compression. The entropy of our ansatz depends only on the latent space parameters, where we have uncoupled oscillators at different effective temperatures. Compression consists of systematically setting low-temperature oscillators to their ground state.

By running our circuit forward (encoding), compressing, and then backward (decoding), we see that for the reduced state of the chain, even when we remove almost all of the latent mode information, we can still approximately reconstruct the original state.

FIG. 7. Modular modes of a reduced harmonic chain. Blue diamonds denote Modular Hamiltonian Learning results, and the dashed red line denotes the exact result from the Williamson decomposition. Modes are arranged in order of increasing eigenvalue (from top to bottom, and from left to right). The system under consideration was a harmonic chain of length 100 lattice sites (subsystem of 12 sites), with ω = 1, χ = −0.2.

In the case of Bosonic Gaussian systems, the columns of the diagonalizing symplectic transformation are the modular modes of the reduced system. Thus, QHBMs are essentially directly learning these modes. In Fig. 7, we show modular mode reconstructions (after shifting and rescaling) for a translationally invariant harmonic chain away from criticality.

C. Preparation of Fermionic Thermal States

In recent years, great progress has been achieved in using quantum systems to simulate quantum dynamics. This quantum simulation has been achieved in a variety of physical implementations, including ultracold atomic gases in optical lattices [87], trapped ions [88], and superconducting circuits [89]. QHBMs are applicable to any of these implementations, where the VQT could be used to prepare an approximate thermal state. Moreover, modular Hamiltonian learning could facilitate the simulation of out-of-equilibrium time evolution, going beyond what is feasible with classical methods.

As a proof of principle of the application of QHBMs to these problems, we use the VQT to learn an efficient preparation of d-wave superconducting thermal states. The BCS mean-field theory Hamiltonian is quadratic in creation and annihilation operators, so we restrict our attention to Fermionic Gaussian states. These states are useful in a variety of quantum chemistry applications, where they are often desired as initial states for further quantum processing routines [90]. They have also been used to model impurities in metals [91, 92], which are key to understanding much of the collective phenomena in strongly-correlated materials [93–95].

In analogy with the Bosonic Gaussian case considered above, we remind the reader that for Fermionic Gaussian systems, the real antisymmetric covariance matrix ΓF associated to a mixed state ρ is given by the matrix elements

Γ^{ab}_F = (i/2) Tr(ρ [ξ^a, ξ^b]), (28)

where now ξ = (ĉ_1, . . . , ĉ_{2N}) is a vector of Majorana operators derived from the system's physical creation and annihilation operators.

Fermionic Gaussian transformations act as orthogonal transformations on covariance matrices, and the same orthogonal transformation that diagonalizes the covariance matrix,

O ΓF Oᵀ = ⊕_{j=1}^{Nf} [ 0 −λj ; λj 0 ], (29)

also diagonalizes the Hamiltonian. For more details, see Appendix G.
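As in the Bosonic case, the normal-form values of equation (29) can be verified classically; a sketch (ours) exploiting the fact that a real antisymmetric matrix has purely imaginary eigenvalues in ±iλj pairs:

```python
# Sketch (ours): the values lambda_j in the normal form of Eq. (29) can be read
# off from the spectrum of the real antisymmetric Gamma_F, whose eigenvalues
# come in pairs +/- i*lambda_j -- useful for checking a learned diagonalization.
import numpy as np

def fermionic_normal_form_values(Gamma_F):
    evals = np.linalg.eigvals(Gamma_F)        # purely imaginary, +/- i*lambda_j
    return np.sort(np.abs(evals))[::2]

# Example: a single decoupled Majorana pair with a hypothetical lambda = 0.46.
G = np.array([[0.0, 0.46], [-0.46, 0.0]])
print(fermionic_normal_form_values(G))        # -> [0.46]
```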

The VQT uses minimization of the relative entropy to learn the orthogonal transformation concurrently with the effective frequencies of the decoupled Fermionic modes, essentially learning a representation of the Bogoliubov transformation. However, implementing this on a discrete-variable quantum computer requires the additional initialization steps of applying a qubit-to-Fermion transformation, such as the Jordan-Wigner transformation, and then transforming to the basis of Majorana modes. The orthogonal transformation itself can be directly parameterized in terms of Givens rotations [96] on pairs of modes. This in turn allows for the analytic computation and propagation of variable gradients through the quantum circuit.

For the completely general BCS Hamiltonian and interpretations of its constituent terms, we refer the reader to Appendix H. For present purposes, we consider the case of uniform Cooper pairing, i.e., ∆ij = ∆, and without loss of generality we set the chemical potential to zero, µ = 0, leading to a Hamiltonian of the form

H_{d_{x²−y²}} = −t ∑_{⟨i,j⟩,σ} (â†_{i,σ} â_{j,σ} + â†_{j,σ} â_{i,σ}) − ∆ ∑_{⟨i,j⟩} (â†_{i,↑} â†_{j,↓} − â†_{i,↓} â†_{j,↑} + h.c.). (30)

While the d-wave Hamiltonian is Gaussian, the Cooper pairing induces terms such as â†_{1,↑} â†_{2,↓}. As a result, we switch to the Majorana basis before learning a state preparation circuit. See Appendix G for details. In Figure 8, we illustrate the training of the VQT on a typical d-wave thermal state. The snapshots on the bottom are covariance matrices output by the partially trained QNN at intervals of 20 learning steps. After 150 steps, the target covariance matrix and the VQT reconstruction are indiscernible by fidelity, entropy and trace distance. We note that in classical simulations of QHBMs for Gaussian Fermions, our models were able to learn successfully for much larger systems (up to 50 Fermions), but the covariance matrices became too large to clearly discern individual matrix elements as in Figure 8.

FIG. 8. VQT training for a d-wave thermal state with Nx = Ny = 2, with β = 1, t = 0.3, ∆ = 0.2. Upper left: VQT training loss (free energy) versus training iteration. Upper right: depiction of matrix elements of the target covariance matrix in the Majorana basis. Below: snapshot visualizations of matrix elements of the variationally parameterized covariance matrix in the Majorana basis at various points in the training process.

V. DISCUSSION & FUTURE WORK

Recent literature in quantum machine learning and quantum-classical variational algorithms has focused on learning pure quantum state approximations to the ground or excited states of quantum systems. If we wish to simulate and model the physics of realistic quantum systems, we must allow for the possibility of interplay between classical and quantum correlations.

In this paper we introduced the general formalism of QHBMs. Previous techniques for learning mixed quantum states have been tailored specifically for low-rank density matrices [11]. Our method, which minimizes relative entropy rather than Hilbert-Schmidt distance, is generic and can be applied to mixed and thermal states of any rank. It has the additional benefits of yielding estimates of mixed state entropy, free energy, and the diagonalizing transformation of the target system, the last of which enables modular time evolution and facilitates full quantum simulation of a previously unknown system. This further opens up the possibility of using quantum machine learning to compute state entropies of analytically intractable systems. Considering the vast body of literature exploring the computation of entropies in


physical quantum systems [97], this novel application of quantum-classical hybrid computing could be quite significant.

Latent space factorization, while not obligatory for QHBM, leads to widespread applications based on learning decorrelated (factorized) quantum representations, akin to classical disentangled representations [51]. In the context of unsupervised learning, learning a disentangled latent space representation could be useful for tasks such as latent space interpolation [98] and latent space vector arithmetic, quantum principal component analysis [99], clustering [100], and more generally entropy-based quantum compression as demonstrated in Section IV B. Such compression could also be used for unsupervised pre-training in discriminative learning [101]. Progressively learning to compress layer-wise while growing the depth of the unitary quantum neural network would yield a procedure akin to layer-wise pretraining in classical machine learning [101]. Compression could also be used for denoising quantum data, something previous approaches to quantum autoencoders were not suitable for [102].

In the case of using a diagonal latent space Hamiltonian for QMHL, we are effectively learning the eigenbasis of the data density matrix.

The great advantage over previously proposed variational quantum algorithms for quantum state diagonalization is that our method employs the relative entropy rather than a Hilbert-Schmidt metric, thus allowing us to learn to diagonalize much higher-rank quantum states. Furthermore, learning to diagonalize a quantum density matrix is closely related to the Quantum Principal Component Analysis algorithm [103] for classical data, and to other related quantum machine learning algorithms [8, 104]. Our approach could be seen as a variational alternative to these algorithms, circumventing the need for a very long quantum circuit for the quantum state exponentiation [105], which has been deemed intractable even for far-term quantum computers when compiled [106]. This does not remove the requirement of the state preparation (which requires QRAM or a special oracle [107]). However, in the case of quantum data, this method does not need access to such exotic components, and has the potential to demonstrate a quantum advantage for learning the unitary which diagonalizes either a quantum Hamiltonian or a quantum density matrix.

Future iterations on this work could modify the loss function, potentially introducing a quantum variant of the Evidence Lower Bound (ELBO) for the quantum relative entropy loss. This would, in turn, yield a quantum form of Variational Autoencoders (VAEs) [30]. Both QHBMs and VAEs could potentially be used to learn an effective mixed state representation of an unknown quantum state from partial quantum tomographic data [108]. We aim to tackle this Quantum Neural State Tomography with QHBMs in future work.

In a near-term future iteration, we plan to fully combine classical neural network-based Energy-Based Models [49] with our QHBM. Recent papers have demonstrated the power of hybridized quantum-classical neural networks trained using hybrid quantum-classical backpropagation, originally introduced in [13] and later iterated upon in [109]. Given previous successes in combining classical differentiable models and quantum variational circuits, we anticipate this implementation to be straightforward given the framework laid out in this paper. Since such a full hybridization would allow us to go beyond full-rank multinoulli estimation for the general classical latent distribution QHBM, this would be the piece which would unlock the scalability of this algorithm to highly non-trivial large-scale quantum systems. We plan to explore this in upcoming work.

QHBM present several key advantages relative to other forms of unsupervised quantum learning with quantum neural networks, such as quantum GANs [110]. GANs (both quantum and classical) are notoriously difficult to train, and once trained, difficult to extract physical quantities from. QHBM represent physical quantities more directly, and as such are much more suitable for applications that involve physical quantum data. From our preliminary numerics, our framework also appears to train very robustly and with few iterations. Furthermore, QHBM require less quantum circuit depth during training than quantum GANs, which require both a quantum generator and a quantum discriminator. The latter is a key consideration for possible implementation on near-term intermediate scale quantum devices [2].

A key point is that in Quantum Modular Hamiltonian Learning (QMHL) and VQT, we are not just learning a distribution for the modular Hamiltonian and thermal state, respectively. Rather, we are learning efficient approximate parameterizations of these quantities with our quantum circuit. In the context of VQT, this gives us the ability to directly prepare a thermal state on our quantum computer using polynomial resources, from knowledge of a corresponding Hamiltonian. For Modular Hamiltonian Learning, once an efficient ansatz has learned the optimal value of parameters such that its output approximates our data mixed state, this gives us the ability to reproduce as many copies of the learned mixed state distribution as desired. In addition, the modular Hamiltonian itself provides invaluable information related to topological properties, thermalization, and non-equilibrium dynamics.

In particular, Modular Hamiltonian Learning gives one access to the eigenvalues of the density matrix and the unitary that diagonalizes the Modular Hamiltonian. Applying the QNN to a quantum state brings it into the eigenbasis of the Modular Hamiltonian. In this basis, an exponentiation of the diagonal latent modular Hamiltonian (of which we have a classical description) implements modular time evolution. We can apply the inverse QNN to return to the original computational basis.
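A minimal dense-matrix sketch of this modular-evolution recipe (all names are ours; U stands for the learned QNN unitary as a matrix, and k for the classical vector of eigenvalues of the diagonal latent modular Hamiltonian):

```python
import numpy as np

# Modular time evolution: rotate into the learned eigenbasis with U^dagger,
# apply the diagonal phase exp(-i t K_latent), and rotate back with U.
def modular_evolution(state, U, k, t):
    phases = np.exp(-1j * t * k)            # diagonal evolution in the latent basis
    return U @ (phases * (U.conj().T @ state))
```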

In the same vein, QMHL provides the ability to probe a system at different temperatures, something that mixed state learning on its own does not. Given access to samples from a thermal state at some temperature, one can thus generate typical samples from the same system at another temperature by learning the modular Hamiltonian and systematically changing the latent space parameters.

Finally, we underscore the generality of our framework. Just as classical EBMs are inspired by physics but can be applied to problems far abstracted from real physical systems, so too can our QHBM be employed for any task which involves learning a quantum distribution.

VI. ACKNOWLEDGEMENTS

Numerical experiments in this paper were executed using a custom combination of Cirq [111], TensorFlow [112], and OpenFermion [90]. GV, JM, and SN would like to thank the team at X for the hospitality and support during their respective Quantum@X residencies where this work was completed. X, formerly known as Google[x], is part of the Alphabet family of companies, which includes Google, Verily, Waymo, and others (www.x.company). GV acknowledges funding from NSERC.

[1] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, Characterizing quantum supremacy in near-term devices, Nature Physics 14, 595 (2018).

[2] J. Preskill, Quantum computing in the NISQ era and beyond, Quantum 2, 79 (2018).

[3] E. Farhi and H. Neven, Classification with quantum neural networks on near term processors, arXiv preprint arXiv:1802.06002 (2018).

[4] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018).

[5] G. Verdon, M. Broughton, and J. Biamonte, A quantum algorithm to train neural networks using low-depth circuits, arXiv preprint arXiv:1712.05304 (2017).

[6] P.-L. Dallaire-Demers and N. Killoran, Quantum generative adversarial networks, Physical Review A 98, 012324 (2018).

[7] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Quantum circuit learning, Physical Review A 98, 032309 (2018).

[8] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).

[9] G. Verdon, M. Broughton, J. R. McClean, K. J. Sung, R. Babbush, Z. Jiang, H. Neven, and M. Mohseni, Learning to learn with quantum neural networks via classical neural networks, arXiv preprint arXiv:1907.05415 (2019).

[10] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. Sornborger, and P. J. Coles, Quantum-assisted quantum compiling, Quantum 3, 140 (2019).

[11] R. LaRose, A. Tikku, E. O'Neel-Judy, L. Cincio, and P. J. Coles, Variational quantum state diagonalization, npj Quantum Information 5, 8 (2019).

[12] M. Schuld, A. Bocharov, K. Svore, and N. Wiebe, Circuit-centric quantum classifiers, arXiv preprint arXiv:1804.00633 (2018).

[13] G. Verdon, J. Pye, and M. Broughton, A universal training algorithm for quantum deep learning, arXiv preprint arXiv:1806.09729 (2018).

[14] M. Benedetti, E. Lloyd, and S. Sack, Parameterized quantum circuits as machine learning models, arXiv preprint arXiv:1906.07682 (2019).

[15] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, The theory of variational hybrid quantum-classical algorithms, New Journal of Physics 18, 023023 (2016).

[16] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, A variational eigenvalue solver on a photonic quantum processor, Nature Communications 5, 4213 (2014).

[17] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets, Nature 549, 242 (2017).

[18] O. Higgott, D. Wang, and S. Brierley, Variational quantum computation of excited states, Quantum 3, 156 (2019).

[19] J. R. McClean, Z. Jiang, N. Rubin, R. Babbush, and H. Neven, Decoding errors using quantum subspace expansions (2019).

[20] R. M. Parrish, E. G. Hohenstein, P. L. McMahon, and T. J. Martínez, Quantum computation of electronic transitions using a variational quantum eigensolver, Physical Review Letters 122, 230401 (2019).

[21] E. Farhi, J. Goldstone, and S. Gutmann, A quantum approximate optimization algorithm, arXiv preprint arXiv:1411.4028 (2014).

[22] E. Farhi and A. W. Harrow, Quantum supremacy through the quantum approximate optimization algorithm, arXiv preprint arXiv:1602.07674 (2016).

[23] Z. Jiang, E. Rieffel, and Z. Wang, A QAOA-inspired circuit for Grover's unstructured search using a transverse field, arXiv preprint arXiv:1702.0257 (2017).

[24] G. Verdon, J. M. Arrazola, K. Bradler, and N. Killoran, A quantum approximate optimization algorithm for continuous problems, arXiv preprint arXiv:1902.00409 (2019).

[25] Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel, XY-mixers: analytical and numerical results for QAOA, arXiv preprint arXiv:1904.09314 (2019).

[26] F. G. Brandao, M. Broughton, E. Farhi, S. Gutmann, and H. Neven, For fixed control parameters the quantum approximate optimization algorithm's objective function value concentrates for typical instances, arXiv preprint arXiv:1812.04170 (2018).

[27] Z. Jiang, E. G. Rieffel, and Z. Wang, Near-optimal quantum circuit for Grover's unstructured search using a transverse field, Physical Review A 95, 062317 (2017).

[28] L. Li, M. Fan, M. Coram, P. Riley, and S. Leichenauer, Quantum optimization with a novel Gibbs objective function and ansatz architecture search, arXiv preprint arXiv:1909.07621 (2019).

[29] K. P. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, 2012).

[30] D. P. Kingma and M. Welling, Auto-encoding variational Bayes, arXiv preprint arXiv:1312.6114 (2013).

[31] R. M. Neal, Probabilistic Inference Using Markov Chain Monte Carlo Methods (Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, 1993).

[32] M. Betancourt, A conceptual introduction to Hamiltonian Monte Carlo, arXiv preprint arXiv:1701.02434 (2017).

[33] D. J. Rezende and S. Mohamed, Variational inference with normalizing flows, arXiv preprint arXiv:1505.05770 (2015).

[34] C. Louizos and M. Welling, Multiplicative normalizing flows for variational Bayesian neural networks, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70 (JMLR.org, 2017) pp. 2218–2227.

[35] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, Neural ordinary differential equations, in Advances in Neural Information Processing Systems (2018) pp. 6571–6583.

[36] M. M. Wilde, Quantum Information Theory (Cambridge University Press, 2013).

[37] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature 521, 436 (2015).

[38] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61, 85 (2015).

[39] W. J. Huggins, P. Patil, B. Mitchell, K. B. Whaley, and M. Stoudenmire, Towards quantum machine learning with tensor networks, Quantum Science and Technology (2018).

[40] C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, B. Fontaine, Y. Zou, J. Hidary, G. Vidal, and S. Leichenauer, TensorNetwork: A library for physics and machine learning, arXiv preprint arXiv:1905.01330 (2019).

[41] Y. LeCun, D. Touresky, G. Hinton, and T. Sejnowski, A theoretical framework for back-propagation, in Proceedings of the 1988 Connectionist Models Summer School, Vol. 1 (CMU, Pittsburgh, PA: Morgan Kaufmann, 1988) pp. 21–28.

[42] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do, Semantic image inpainting with deep generative models, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017) pp. 5485–5493.

[43] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in Proceedings of the 25th International Conference on Machine Learning (ACM, 2008) pp. 1096–1103.

[44] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-realistic single image super-resolution using a generative adversarial network, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017) pp. 4681–4690.

[45] L. Theis, W. Shi, A. Cunningham, and F. Huszar, Lossy image compression with compressive autoencoders, arXiv preprint arXiv:1703.00395 (2017).

[46] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, in Proceedings of the International Conference on Neural Information Processing Systems (2014) pp. 2672–2680.

[47] A. Razavi, A. v. d. Oord, and O. Vinyals, Generating diverse high-fidelity images with VQ-VAE-2, arXiv preprint arXiv:1906.00446 (2019).

[48] A. Brock, J. Donahue, and K. Simonyan, Large scale GAN training for high fidelity natural image synthesis, arXiv preprint arXiv:1809.11096 (2018).

[49] Y. Du and I. Mordatch, Implicit generation and generalization in energy-based models, arXiv preprint arXiv:1903.08689 (2019).

[50] M. Welling and Y. Whye Teh, Bayesian learning via stochastic gradient Langevin dynamics, in ICML'11: Proceedings of the 28th International Conference on Machine Learning (2011) pp. 681–688.

[51] Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1798 (2013).

[52] C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner, Understanding disentangling in beta-VAE, arXiv preprint arXiv:1804.03599 (2018).

[53] J. Marks, T. Jochym-O'Connor, and V. Gheorghiu, Comparison of memory thresholds for planar qudit geometries, New Journal of Physics 19, 113022 (2017).

[54] D. Koller, U. Lerner, and D. Angelov, A general algorithm for approximate inference and its application to hybrid Bayes nets, in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (Morgan Kaufmann Publishers Inc., 1999) pp. 324–333.

[55] A. Serafini, Quantum Continuous Variables: A Primer of Theoretical Methods (CRC Press, 2017).

[56] S. Lloyd and S. L. Braunstein, Quantum computation over continuous variables, in Quantum Information with Continuous Variables (Springer, 1999) pp. 9–17.

[57] M. A. Nielsen and I. Chuang, Quantum Computation and Quantum Information (2002).

[58] N. Gisin and R. Thew, Quantum communication, Nature Photonics 1, 165 (2007).

[59] C. L. Degen, F. Reinhard, and P. Cappellaro, Quantum sensing, Reviews of Modern Physics 89, 035002 (2017).

[60] I. Kassal, S. P. Jordan, P. J. Love, M. Mohseni, and A. Aspuru-Guzik, Polynomial-time quantum algorithm for the simulation of chemical dynamics, Proceedings of the National Academy of Sciences 105, 18681–18686 (2008).

[61] I. Kassal, J. D. Whitfield, A. Perdomo-Ortiz, M.-H. Yung, and A. Aspuru-Guzik, Simulating chemistry using quantum computers, Annual Review of Physical Chemistry 62, 185–207 (2011).

[62] J. D. Whitfield, J. Biamonte, and A. Aspuru-Guzik, Simulation of electronic structure Hamiltonians using quantum computers, Molecular Physics 109, 735–750 (2011).

[63] H. Wang, S. Kais, A. Aspuru-Guzik, and M. R. Hoffmann, Quantum algorithm for obtaining the energy spectrum of molecular systems, Phys. Chem. Chem. Phys. 10, 5388 (2008).


[64] B. P. Lanyon, J. D. Whitfield, G. G. Gillett, M. E. Goggin, M. P. Almeida, I. Kassal, J. D. Biamonte, M. Mohseni, B. J. Powell, M. Barbieri, et al., Towards quantum chemistry on a quantum computer, Nature Chemistry 2, 106–111 (2010).

[65] A. Aspuru-Guzik, A. D. Dutoi, P. J. Love, and M. Head-Gordon, Simulated quantum computation of molecular energies, Science 309, 1704 (2005).

[66] H. Lamm and S. Lawrence, Simulation of nonequilibrium dynamics on a quantum computer, Physical Review Letters 121, 170501 (2018).

[67] M. Endres, M. Cheneau, T. Fukuhara, C. Weitenberg, P. Schauß, C. Gross, L. Mazza, M. C. Banuls, L. Pollet, I. Bloch, and S. Kuhr, Observation of correlated particle-hole pairs and string order in low-dimensional Mott insulators, Science 334, 200 (2011).

[68] M. Cheneau, P. Barmettler, D. Poletti, M. Endres, P. Schauß, T. Fukuhara, C. Gross, I. Bloch, C. Kollath, and S. Kuhr, Light-cone-like spreading of correlations in a quantum many-body system, Nature 481, 484–487 (2012).

[69] T. Fukuhara, S. Hild, J. Zeiher, P. Schauß, I. Bloch, M. Endres, and C. Gross, Spatially resolved detection of a spin-entanglement wave in a Bose-Hubbard chain, Phys. Rev. Lett. 115, 035302 (2015).

[70] S. De Franceschi, R. Hanson, W. G. van der Wiel, J. M. Elzerman, J. J. Wijpkema, T. Fujisawa, S. Tarucha, and L. P. Kouwenhoven, Out-of-equilibrium Kondo effect in a mesoscopic device, Phys. Rev. Lett. 89, 156801 (2002).

[71] H. E. Tureci, M. Hanl, M. Claassen, A. Weichselbaum, T. Hecht, B. Braunecker, A. Govorov, L. Glazman, A. Imamoglu, and J. von Delft, Many-body dynamics of exciton creation in a quantum dot by optical absorption: A quantum quench towards Kondo correlations, Phys. Rev. Lett. 106, 107402 (2011).

[72] C. Latta, F. Haupt, M. J. Hanl, A. Weichselbaum, M. Claassen, W. Wuester, P. Fallahi, S. Faelt, L. Glazman, J. von Delft, H. E. Tureci, and A. Imamoglu, Quantum quench of Kondo correlations in optical absorption, Nature 474, 627 (2011).

[73] A. M. Kaufman, M. E. Tai, A. Lukin, M. Rispoli, R. Schittko, P. M. Preiss, and M. Greiner, Quantum thermalization through entanglement in an isolated many-body system, Science 353, 794–800 (2016).

[74] G. P. Berman, V. Y. Rubaev, and G. M. Zaslavsky, The problem of quantum chaos in a kicked harmonic oscillator, Nonlinearity 4, 543 (1991).

[75] L. M. Sieberer, T. Olsacher, A. Elben, M. Heyl, P. Hauke, F. Haake, and P. Zoller, Digital quantum simulation, Trotter errors, and quantum chaos of the kicked top, npj Quantum Information 5 (2019).

[76] A. J. Heeger, S. Kivelson, J. R. Schrieffer, and W. P. Su, Solitons in conducting polymers, Rev. Mod. Phys. 60, 781 (1988).

[77] W. P. Su, J. R. Schrieffer, and A. J. Heeger, Soliton excitations in polyacetylene, Phys. Rev. B 22, 2099 (1980).

[78] W. P. Su, J. R. Schrieffer, and A. J. Heeger, Solitons in polyacetylene, Phys. Rev. Lett. 42, 1698 (1979).

[79] S. Bravyi, Lagrangian representation for fermionic linear optics (2004), arXiv:quant-ph/0404180 [quant-ph].

[80] S. Bravyi and R. Koenig, Classical simulation of dissipative fermionic linear optics (2011), arXiv:1112.2184 [quant-ph].

[81] F. de Melo, P. Cwiklinski, and B. M. Terhal, The power of noisy fermionic quantum computation, New Journal of Physics 15, 013015 (2013).

[82] B. S. DeWitt, Quantum field theory in curved spacetime, Physics Reports 19, 295 (1975).

[83] V. Mukhanov and S. Winitzki, Introduction to Quantum Effects in Gravity (Cambridge University Press, 2007).

[84] J. Williamson, On the algebraic problem concerning the normal forms of linear dynamical systems, American Journal of Mathematics 58, 141–163 (1936).

[85] A. Serafini, Quantum Continuous Variables (CRC Press, 2017).

[86] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd, Continuous-variable quantum neural networks (2018), arXiv:1806.06871 [quant-ph].

[87] I. Bloch, Ultracold quantum gases in optical lattices, Nature Physics 1, 23 (2005).

[88] D. Kielpinski, C. Monroe, and D. J. Wineland, Architecture for a large-scale ion-trap quantum computer, Nature 417, 709 (2002).

[89] R. Barends, J. Kelly, A. Megrant, A. Veitia, D. Sank, E. Jeffrey, T. C. White, J. Mutus, A. G. Fowler, B. Campbell, et al., Superconducting quantum circuits at the surface code threshold for fault tolerance, Nature 508, 500 (2014).

[90] J. R. McClean, K. J. Sung, I. D. Kivlichan, X. Bonet-Monroig, Y. Cao, C. Dai, E. S. Fried, C. Gidney, B. Gimby, P. Gokhale, T. Häner, T. Hardikar, V. Havlíček, O. Higgott, C. Huang, J. Izaac, Z. Jiang, W. Kirby, X. Liu, S. McArdle, M. Neeley, T. O'Brien, B. O'Gorman, I. Ozfidan, M. D. Radin, J. Romero, N. Rubin, N. P. D. Sawaya, K. Setia, S. Sim, D. S. Steiger, M. Steudtner, Q. Sun, W. Sun, D. Wang, F. Zhang, and R. Babbush, OpenFermion: The electronic structure package for quantum computers, arXiv preprint arXiv:1710.07629 (2017).

[91] S. Bravyi and D. Gosset, Complexity of quantum impurity problems, Communications in Mathematical Physics 356, 451 (2017).

[92] B. Bauer, D. Wecker, A. J. Millis, M. B. Hastings, and M. Troyer, Hybrid quantum-classical approach to correlated materials, Phys. Rev. X 6, 031045 (2016).

[93] P. W. Anderson, Localized magnetic states in metals, Phys. Rev. 124, 41 (1961).

[94] R. Schmidt, J. D. Whalen, R. Ding, F. Camargo, G. Woehl, S. Yoshida, J. Burgdörfer, F. B. Dunning, E. Demler, H. R. Sadeghpour, and T. C. Killian, Theory of excitation of Rydberg polarons in an atomic quantum gas, Phys. Rev. A 97, 022707 (2018).

[95] F. Camargo, R. Schmidt, J. D. Whalen, R. Ding, G. Woehl, S. Yoshida, J. Burgdörfer, F. B. Dunning, H. R. Sadeghpour, E. Demler, and T. C. Killian, Creation of Rydberg polarons in a Bose gas, Phys. Rev. Lett. 120, 083401 (2018).

[96] D. Wecker, M. B. Hastings, N. Wiebe, B. K. Clark, C. Nayak, and M. Troyer, Solving strongly correlated electron models on a quantum computer, Phys. Rev. A 92, 062318 (2015).

[97] H. Casini and M. Huerta, Entanglement entropy in free quantum field theory, Journal of Physics A: Mathematical and Theoretical 42, 504007 (2009).


[98] P. Bojanowski, A. Joulin, D. Lopez-Paz, and A. Szlam, Optimizing the latent space of generative networks, arXiv preprint arXiv:1707.05776 (2017).

[99] S. Wold, K. Esbensen, and P. Geladi, Principal component analysis, Chemometrics and Intelligent Laboratory Systems 2, 37 (1987).

[100] T. Hofmann, Unsupervised learning by probabilistic latent semantic analysis, Machine Learning 42, 177 (2001).

[101] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio, Why does unsupervised pre-training help deep learning?, Journal of Machine Learning Research 11, 625 (2010).

[102] J. Romero, J. P. Olson, and A. Aspuru-Guzik, Quantum autoencoders for efficient compression of quantum data, Quantum Science and Technology 2, 045001 (2017).

[103] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum principal component analysis, Nature Physics 10, 631 (2014).

[104] P. Rebentrost, M. Mohseni, and S. Lloyd, Quantum support vector machine for big data classification, Physical Review Letters 113, 130503 (2014).

[105] T. R. Bromley and P. Rebentrost, Batched quantum state exponentiation and quantum Hebbian learning, Quantum Machine Intelligence 1, 31 (2019).

[106] A. Scherer, B. Valiron, S.-C. Mau, S. Alexander, E. van den Berg, and T. E. Chapuran, Concrete resource analysis of the quantum linear-system algorithm used to compute the electromagnetic scattering cross section of a 2D target, Quantum Information Processing 16, 60 (2017).

[107] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum random access memory, Physical Review Letters 100, 160501 (2008).

[108] G. D'Ariano and P. L. Presti, Quantum tomography for measuring experimentally the matrix elements of an arbitrary quantum operation, Physical Review Letters 86, 4195 (2001).

[109] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, Evaluating analytic gradients on quantum hardware, Physical Review A 99, 032331 (2019).

[110] S. Lloyd and C. Weedbrook, Quantum generative adversarial networks for learning and loading random distributions, arXiv preprint arXiv:1804.09139 (2018).

[111] Cirq: A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits, https://github.com/quantumlib/Cirq.

[112] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems (2015), software available from tensorflow.org.

[113] D. Gottesman, Stabilizer codes and quantum error correction, Ph.D. thesis, California Institute of Technology (1997).

[114] M. Suzuki, Fractal decomposition of exponential operators with applications to many-body theories and Monte Carlo simulations, Physics Letters A 146, 319 (1990).

[115] E. Campbell, Random compiler for fast Hamiltonian simulation, Physical Review Letters 123, 070503 (2019).

[116] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Appl. Math. Ser. 55 (1965).

[117] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, Evaluating analytic gradients on quantum hardware, arXiv preprint arXiv:1811.11184 (2018).

[118] A. Harrow and J. Napp, Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms, arXiv preprint arXiv:1901.05374 (2019).

[119] Z. Jiang, K. J. Sung, K. Kechedzhi, V. N. Smelyanskiy, and S. Boixo, Quantum algorithms to simulate many-body physics of correlated fermions, Phys. Rev. Applied 9, 044036 (2018).

[120] M. A. de Gosson, Symplectic Methods in Harmonic Analysis and in Mathematical Physics (Springer, 2011).

[121] G. Ortiz, J. E. Gubernatis, E. Knill, and R. Laflamme, Quantum algorithms for fermionic simulations, Phys. Rev. A 64, 022319 (2001).

[122] J. D. Whitfield, J. Biamonte, and A. Aspuru-Guzik, Simulation of electronic structure Hamiltonians using quantum computers, Molecular Physics 109, 735 (2011).

[123] J. T. Seeley, M. J. Richard, and P. J. Love, The Bravyi-Kitaev transformation for quantum computation of electronic structure, The Journal of Chemical Physics 137, 224109 (2012).

[124] S. B. Bravyi and A. Y. Kitaev, Fermionic quantum computation, Annals of Physics 298, 210 (2002).

[125] R. C. Ball, Fermions without fermion fields, Phys. Rev. Lett. 95, 176407 (2005).

[126] F. Verstraete and J. I. Cirac, Mapping local Hamiltonians of fermions to local Hamiltonians of spins, Journal of Statistical Mechanics: Theory and Experiment 2005, P09012 (2005).

[127] J. D. Whitfield, V. Havlíček, and M. Troyer, Local spin operators for fermion simulations, Phys. Rev. A 94, 030301 (2016).

Appendix A: Quantum Neural Networks and Gradients

A Quantum Neural Network can generally be written as a product of layers of unitaries in the form

U(\phi) = \prod_{\ell=1}^{L} V^{\ell}\, U^{\ell}(\phi^{\ell}),   (A1)

where the \ell th layer of the QNN consists of the product of V^{\ell}, a non-parametric unitary, and U^{\ell}(\phi^{\ell}), a unitary with (possibly) multiple variational parameters (note that the superscripts here represent indices rather than exponents). The multi-parametric unitary of a given layer can itself generally be comprised of multiple unitaries \{U^{\ell}_j(\phi^{\ell}_j)\}_{j=1}^{M_\ell} applied in parallel:

U^{\ell}(\phi^{\ell}) \equiv \bigotimes_{j=1}^{M_\ell} U^{\ell}_j(\phi^{\ell}_j),   (A2)


where I_\ell represents the set of indices corresponding to the \ell th layer. Finally, each of these unitaries U^{\ell}_j can be expressed as the exponential of some generator g^{\ell}_j, which itself can be any Hermitian operator on n qubits (and is thus expressible as a linear combination of n-qubit Paulis),

U^{\ell}_j(\phi^{\ell}_j) = e^{-i\phi^{\ell}_j g^{\ell}_j}, \qquad g^{\ell}_j = \sum_{k=1}^{K_{j\ell}} \beta^{j\ell}_k P_k,   (A3)

where P_k \in \mathcal{P}_n (Paulis on n qubits [113]) and \beta^{j\ell}_k \in \mathbb{R} for all k, j, \ell. For a given j and \ell, in the case where all the Pauli terms commute, i.e. [P_k, P_m] = 0 for all m, k such that \beta^{j\ell}_m, \beta^{j\ell}_k \neq 0, one can simply decompose the unitary into a product of exponentials of each term,

U^{\ell}_j(\phi^{\ell}_j) = \prod_k e^{-i\phi^{\ell}_j \beta^{j\ell}_k P_k}.   (A4)

Otherwise, in instances where the various terms do not commute, one may apply a Trotter-Suzuki decomposition of this exponential [114], or other quantum simulation methods [115].
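For concreteness, the layered structure of Eqs. (A1)–(A4) can be written down directly in Cirq [111]. The sketch below is illustrative only: the CZ ladder standing in for the non-parametric V^{\ell}, the single-qubit rotations standing in for U^{\ell}(\phi^{\ell}), and the symbol names are our arbitrary choices.

```python
import cirq
import sympy

# A two-layer instance of the ansatz of Eq. (A1): each layer is a fixed
# entangling unitary V^l (a CZ ladder, chosen arbitrarily) followed by a
# parameterized product of single-qubit exponentials U^l(phi^l).
qubits = cirq.LineQubit.range(4)
phi = [[sympy.Symbol(f'phi_{l}_{j}') for j in range(4)] for l in range(2)]

def layer(l):
    yield [cirq.CZ(a, b) for a, b in zip(qubits, qubits[1:])]       # V^l
    yield [cirq.rx(phi[l][j]).on(q) for j, q in enumerate(qubits)]  # U^l(phi^l)

circuit = cirq.Circuit(layer(l) for l in range(2))
```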

In order to optimize the parameters of an ansatz from equation (A1), we need a cost function to optimize. In the case of standard variational quantum algorithms, this cost function is most often chosen to be the expectation value of a cost Hamiltonian,

f(\phi) = \langle H \rangle_{\phi} \equiv \langle \Psi_0 | U^{\dagger}(\phi)\, H\, U(\phi) | \Psi_0 \rangle,   (A5)

where |\Psi_0\rangle is the input state to the parametric circuit. In general, the cost Hamiltonian can be expressed as a linear combination of operators, e.g. in the form

H = \sum_{k=1}^{N} \alpha_k h_k \equiv \boldsymbol{\alpha} \cdot \boldsymbol{h},   (A6)

where we defined a vector of coefficients \boldsymbol{\alpha} \in \mathbb{R}^N and a vector of N operators \boldsymbol{h}. Often this decomposition is chosen such that each of these sub-Hamiltonians is in the n-qubit Pauli group, h_k \in \mathcal{P}_n. The expectation value of this Hamiltonian is then generally evaluated via quantum expectation estimation, i.e. by taking the linear combination of expectation values of each term,

f(\phi) = \langle H \rangle_{\phi} = \sum_{k=1}^{N} \alpha_k \langle h_k \rangle_{\phi} \equiv \boldsymbol{\alpha} \cdot \boldsymbol{h}_{\phi},   (A7)

where we introduced the vector of expectation values \boldsymbol{h}_{\phi} \equiv \langle \boldsymbol{h} \rangle_{\phi}. In the case of non-commuting terms, the various expectation values \langle h_k \rangle_{\phi} are estimated over separate runs.

Now that we have established how to evaluate the loss function, let us describe how to obtain gradients of the cost function with respect to the parameters. A simple approach is to use finite-difference methods, for example, the central difference method,

\partial_k f(\phi) = \frac{1}{2\varepsilon}\left[f(\phi + \varepsilon \Delta_k) - f(\phi - \varepsilon \Delta_k)\right] + \mathcal{O}(\varepsilon^2),   (A8)

which, in the case where there are M continuous parameters, involves 2M evaluations of the objective function, each evaluation varying the parameters by \varepsilon in some direction, thereby giving us an estimate of the gradient of the function with a precision \mathcal{O}(\varepsilon^2). Here \Delta_k is a unit-norm perturbation vector in the kth direction of parameter space, (\Delta_k)_j = \delta_{jk}. In general, one may use lower-order methods, such as forward difference with \mathcal{O}(\varepsilon) error from M + 1 objective queries [3], or higher-order methods, such as a five-point stencil method, with \mathcal{O}(\varepsilon^4) error from 4M queries [116].

As recently pointed out in various works [117, 118], given knowledge of the form of the ansatz (e.g. as in (A3)), one can measure the analytic gradients of the expectation value of the circuit for Hamiltonians which have a single term in their Pauli decomposition (A3) (or, alternatively, if the Hamiltonian has a spectrum \pm\lambda for some positive \lambda). For multi-term Hamiltonians, a method to obtain the analytic gradients using a linear combination of unitaries is proposed in [117]. Recent methods use a stochastic estimation rule to reduce the number of runs needed and keep the optimization efficient [118]. In the numerics featured in this paper, we simply used first-order finite (central) difference methods for our gradients.
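As a hedged sketch of the single-term case only (not the multi-term scheme of [117]): under the convention U(\phi) = e^{-i\phi P} of Eq. (A3) with P a single Pauli word (eigenvalues \pm 1), the expectation value is a sinusoid in 2\phi, and its derivative obeys an exact two-point rule.

```python
import numpy as np

# Exact two-point parameter-shift rule for a gate exp(-i phi P), with P a
# single Pauli word: df/dphi = f(phi + pi/4) - f(phi - pi/4).
def parameter_shift_grad(f, phi, k):
    delta = np.zeros_like(phi, dtype=float)
    delta[k] = np.pi / 4
    return f(phi + delta) - f(phi - delta)
```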

Appendix B: Gradients of QHBMs for Generative Modelling

1. Gradients of model parameters

Using our notation for the pulled-back data state \sigma_{D,\phi} \equiv U^{\dagger}(\phi)\, \sigma_D\, U(\phi), the gradient with respect to the model parameters \phi will be given by

\partial_{\phi_j} \mathcal{L}(\theta, \phi) = \partial_{\phi_j} \mathrm{tr}(K_{\theta}\, \sigma_{D,\phi}) = \partial_{\phi_j} \langle K_{\theta} \rangle_{\sigma_{D,\phi}},   (B1)

which is simply the gradient of the expectation value of the latent modular Hamiltonian with respect to the data state pushed through the reverse quantum neural network U^{\dagger}(\phi). Taking gradients of the expectation value of a multi-term observable is the typical scenario encountered in regular VQE and quantum neural networks; there exist multiple strategies to obtain these gradients (see Appendix A).

2. Gradients of variational parameters

For gradients of the loss function with respect to the model parameters \theta, note that we need to take gradients of both the modular Hamiltonian terms and the partition function. For gradients of the modular Hamiltonian expectation term with respect to the variational parameters \theta, we change the observable whose expectation value we are taking; this is not as common in the literature on quantum neural networks, but it is fairly simple:


\partial_{\theta_m} \langle K_{\theta} \rangle_{\sigma_{D,\phi}} = \langle \partial_{\theta_m} K_{\theta} \rangle_{\sigma_{D,\phi}} = \langle \partial_{\theta_m} K_m(\theta_m) \rangle_{\sigma_{D,\phi}},   (B2)

i.e., it is simply the expectation value of the gradient. Note that here we assumed the latent modular Hamiltonian was of the form

K_{\theta} = \sum_m K_m(\theta_m),   (B3)

which is valid both for a diagonal latent multinoulli variable (K_m = |m\rangle\langle m|) and for a factorized latent distribution, in which case each K_m can be understood as a subsystem Hamiltonian.

For simple cases where the parameterized modular Hamiltonian is simply a scalar parameter times a fixed operator, i.e., K_m(\theta_m) = \theta_m M_m, the above simplifies to

\partial_{\theta_m} \langle K_{\theta} \rangle_{\sigma_{D,\phi}} = \langle M_m \rangle_{\sigma_{D,\phi}} = \mathrm{tr}(M_m\, \sigma_{D,\phi}),   (B4)

which is a simple expectation value.

Now, to get the gradient of the partition function, given our assumed knowledge of each of the simple modular Hamiltonians, we should have an analytical expression for the partition function which we can use. Notice that

\log Z_{\theta} = \sum_j \log Z_j(\theta_j)   (B5)
             = \sum_j \log\!\left[\mathrm{tr}\!\left(e^{-K_j(\theta_j)}\right)\right]
             = \sum_j \log\!\Big(\sum_{k_j} e^{-k_j(\theta_j)}\Big),

where the k_j(\theta_j) are the eigenvalues of the jth parameterized modular Hamiltonian K_j(\theta_j).

We can use this form to derive the gradient of this term with respect to the model parameters:

\partial_{\theta_m} \log Z_{\theta} = \partial_{\theta_m} \log\!\left[\mathrm{tr}\!\left(e^{-K_m(\theta_m)}\right)\right]   (B6)
                  = \frac{1}{Z_m(\theta_m)}\, \partial_{\theta_m} \mathrm{tr}\!\left(e^{-K_m(\theta_m)}\right)
                  = -\frac{1}{Z_m(\theta_m)}\, \mathrm{tr}\!\left(e^{-K_m(\theta_m)}\, \partial_{\theta_m} K_m(\theta_m)\right).

Once again, for simple parameterized modular Hamiltonians of the form K_m(\theta_m) = \theta_m M_m, this formula becomes

\partial_{\theta_m} \log Z_{\theta} = -\frac{1}{Z_m(\theta_m)}\, \mathrm{tr}\!\left(e^{-\theta_m M_m} M_m\right),   (B7)

which is straightforward to evaluate analytically on a classical computer given knowledge of the spectrum of M_m. Technically, there is no need to use the quantum computer to evaluate this part of the gradient.
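For the one-parameter case of Eq. (B7), both the log-partition function and its gradient reduce to sums over the known spectrum of M_m. A minimal classical sketch (the example eigenvalues below are ours, corresponding e.g. to a single Pauli):

```python
import numpy as np

# log Z_m and its gradient, Eq. (B7), for K_m = theta_m * M_m, given the
# eigenvalues of M_m.
def log_Z_and_grad(theta_m, eigvals_M):
    w = np.exp(-theta_m * eigvals_M)   # Boltzmann weights over the spectrum
    Z = w.sum()
    grad = -(w @ eigvals_M) / Z        # -(1/Z) tr(exp(-theta M) M)
    return np.log(Z), grad

logZ, dlogZ = log_Z_and_grad(0.7, np.array([-1.0, 1.0]))
```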

Thus, we have seen how to take gradients of all terms of the loss function. By simply training on the quantum cross entropy between the data and our model, we can theoretically fully train our generative model using gradient-based techniques.

Appendix C: Distance Metrics

The trace distance is defined to be

T(\rho, \sigma) := \frac{1}{2}\,\mathrm{Tr}\!\left[\sqrt{(\rho - \sigma)^{\dagger}(\rho - \sigma)}\right],   (C1)

and the generic mixed state fidelity is calculated as

F(\rho, \sigma) := \left[\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\,\sqrt{\rho}}\right]^2.   (C2)

Appendix D: Investigation of Ansatz Performance

In the case of Gaussian states, we are guaranteed that a factorized latent space solution exists. For qubit systems with inter-qubit coupling, this is not the case. Of course, fully parameterizing both the unitary and the classical distribution should allow for perfect reconstruction, and hybrid quantum-classical algorithms should enable the efficient extraction of typical samples. That said, it is interesting to see just how much representational power the product ansatz actually has when applied to non-Gaussian systems. To this end, we investigate its performance both as a function of temperature and in comparison to the actual performance of the general ansatz, which has a fully parameterizable diagonal distribution.

Our simulations are small in scale and restricted in the classes of models considered, but promising nonetheless. In Figure 9, we show trace distance and fidelity for a one-dimensional Heisenberg model with transverse and longitudinal magnetic fields,

H = -\sum_{\langle ij \rangle} \boldsymbol{S}_i \cdot \boldsymbol{S}_j + \sum_j (h_x S^x_j + h_z S^z_j),   (D1)

where the presence of perpendicular field terms induces quantum fluctuations. In the limit \beta \to 0, the thermal fluctuations completely dominate the quantum ones, and the resulting thermal state is completely mixed. The essentially perfect reconstruction in that limit, with little variance, reflects the high approximate degeneracy of the solution space. In the opposite limit, \beta \to \infty, the thermal state approaches the ground state. Classical correlations vanish and the state becomes pure. In this limit, we achieve asymptotically ideal reconstruction with high probability, but occasionally the variational procedure gets stuck in local minima, resulting in the high variance around the mean.
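For reference, the model of Eq. (D1) is small enough to construct densely. The sketch below (NumPy/SciPy; the parameter values mirror the caption of Figure 9 and are purely illustrative) builds the four-qubit Hamiltonian and its thermal state:

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Y = np.array([[0., -1j], [1j, 0.]])
Z = np.diag([1., -1.])
S = [X / 2, Y / 2, Z / 2]                      # spin-1/2 operators

def embed(op, site, n):
    """Place a single-site operator at `site` in an n-qubit chain."""
    return reduce(np.kron, [op if k == site else I2 for k in range(n)])

n, hx, hz, beta = 4, 0.3, 0.2, 1.3
H = np.zeros((2**n, 2**n), dtype=complex)
for i in range(n - 1):                         # -sum_<ij> S_i . S_j
    H -= sum(embed(s, i, n) @ embed(s, i + 1, n) for s in S)
for j in range(n):                             # field terms of Eq. (D1)
    H += hx * embed(S[0], j, n) + hz * embed(S[2], j, n)

rho = expm(-beta * H)
rho /= np.trace(rho)                           # target thermal state
```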

In the intermediate regime, both quantum and classical correlations are important, and the lack of generality in the factorized latent space results in slight performance losses with respect to a generic ansatz. For large systems, the fidelity and trace distance will not necessarily perform well on models trained to optimize relative entropy.

In the reconstruction tables for fidelity, II(a), and trace distance, II(b), QMHL was performed at inverse temperature \beta = 1.3, which is in the general region where we


[Figure 9: two panels, trace distance (top) and fidelity (bottom), plotted against inverse temperature \beta; see caption below.]

FIG. 9. Performance metrics for QMHL with factorized latent space ansatz on the thermal state of a fixed Hamiltonian at different temperatures. The solid dark line denotes the mean, and the shaded translucent region denotes the 95% confidence interval. The model under consideration is a four-qubit one-dimensional Heisenberg model with J = -1.0 and transverse and longitudinal magnetic fields h_x = 0.3, h_z = 0.2; 50 trial initializations were chosen for each inverse temperature \beta.

TABLE II. Reconstruction metrics for QMHL with latent space product (factorized) and general ansatz, computed for a family of random coupling models on a four-qubit chain after convergence or 200 training steps. For each of the 10 model Hamiltonians, 10 random initializations were chosen for each ansatz. Mean, min, and max are with respect to these initializations, and the values reported represent averages over the class of Hamiltonians.

(a) Fidelity

Ansatz      Mean   Min    Max
Factorized  0.871  0.752  0.919
General     0.935  0.881  0.950

(b) Trace Distance

Ansatz      Mean   Min    Max
Factorized  0.221  0.172  0.367
General     0.173  0.153  0.249

expect the most drastic deviation from the factorized latent space assumption. We see that the performance loss as compared to the generic ansatz is relatively small. Random models were chosen from the class

H = \sum_{\langle ij \rangle} (\alpha_{ij} X_i X_j + \beta_{ij} Y_i Y_j + \gamma_{ij} Z_i Z_j) + \sum_j \boldsymbol{h}_j \cdot \boldsymbol{S}_j,   (D2)

where \alpha_{ij}, \beta_{ij}, \gamma_{ij}, \boldsymbol{h}_j were all sampled independently from the uniform distribution on the interval [-1, 1] \subset \mathbb{R}, and allowed to differ on each site and edge.

Appendix E: Continuous Variable Gaussian Quantum Information

By Wick's theorem, Gaussian states (Bosonic or Fermionic) are completely specified by their first and second moments. Mathematically, this means that all higher-order correlation functions (n-point correlators) can be written as sums of products of second-order correlators:

\langle \xi_{a_1} \cdots \xi_{a_n} \rangle = \sum \prod \langle \xi_{a_j} \xi_{b_j} \rangle.   (E1)

For thermal states, the mean displacement is zero and the density matrix \rho is completely characterized by the covariance matrix \Gamma (of size 2N \times 2N for systems of N Bosons).

The 'position' and 'momentum' quadratures, defined by

x_j = \frac{1}{\sqrt{2}}(a_j + a^{\dagger}_j),   (E2)

p_j = \frac{i}{\sqrt{2}}(a_j - a^{\dagger}_j),   (E3)

are conjugate variables, satisfying the canonical Bosonic commutation relations [x_j, p_k] = 2i\delta_{jk}. If we combine all of the quadratures into a vector of operators, \xi = (x_1, \ldots, x_N, p_1, \ldots, p_N), then the Bosonic commutation relations take the form

[\xi_a, \xi_b] = i\,\Omega_{ab},   (E4)

where

\Omega = \begin{bmatrix} 0 & I_N \\ -I_N & 0 \end{bmatrix}   (E5)

is a 2N \times 2N skew-symmetric matrix known as the symplectic form, and I_N is the N \times N identity matrix.

Bosonic Gaussian transformations are then transformations that preserve the symplectic form, i.e. S \in \mathrm{Sp}(2N_b), where

S^T \Omega S = \Omega.   (E6)

The uncertainty principle (encoded in the commutation relations) constrains the covariance matrix to \Gamma_B \geq i\Omega.

In symplectic diagonal form, we now have independent effective oscillators, and the entropy takes the simplified form

S_B(\sigma) = \sum_j \left[\frac{\omega_j}{e^{\omega_j} - 1} - \log(1 - e^{-\omega_j})\right],   (E7)

where \omega_j is the frequency of the jth effective oscillator, and \beta, \hbar, and k_B are set to unity.


The logarithm of the partition function also takes a simplified form:

\log Z_{\theta} = \sum_j \left[-\frac{\omega_j}{2} - \log(1 - e^{-\omega_j})\right].   (E8)

Appendix F: Gaussian Model Parameterizations

In Section IV, we utilize the Gaussian nature of the problems to better parameterize the unitary transformations. Here, we explain those parameterizations and how one may operationally implement such parameterized transformations on various quantum computing platforms.

1. Fermions

As mentioned in the main text, Fermionic Gaussian transformations are given by orthogonal transformations acting on Fermionic modes. Multiple methods exist for parameterizing the class of orthogonal transformations. In classical simulation of QHBM, one can take advantage of a consequence of the Lie algebra-Lie group correspondence: the space of real orthogonal matrices of size N is spanned by e^A, where A is obtained by anti-symmetrizing a lower-diagonal matrix with N(N - 1)/2 non-zero entries.
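A minimal NumPy/SciPy sketch of this classical-simulation parameterization (the function name is ours):

```python
import numpy as np
from scipy.linalg import expm

# N(N-1)/2 parameters fill a strictly lower-triangular matrix, which is
# anti-symmetrized and exponentiated into a (special) orthogonal matrix.
def orthogonal_from_params(params, N):
    A = np.zeros((N, N))
    A[np.tril_indices(N, k=-1)] = params   # strictly lower triangle
    A -= A.T                               # antisymmetric generator
    return expm(A)                         # exp of antisymmetric is orthogonal

O = orthogonal_from_params(np.random.randn(6), N=4)
assert np.allclose(O @ O.T, np.eye(4))
```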

For QHBM on a real quantum computer, it is preferable to use Givens rotations on pairs of modes. The matrix describing this two-mode mixing is a rotation,

G(\phi) = \begin{pmatrix} \cos(\phi) & -\sin(\phi) \\ \sin(\phi) & \cos(\phi) \end{pmatrix}.   (F1)

Such a rotation can be generalized to include complex phases, but that is unnecessary when staying within the confines of the special orthogonal group. The space of such orthogonal transformations can be systematically decomposed into sequences of Givens rotations which act on pairs of modes, as detailed in [119].

2. Bosons

By the Bloch-Messiah decomposition [120], any symplectic matrix S of size 2N can be brought into the form

S = O_1 D O_2,   (F2)

where O_1 and O_2 are real orthogonal and symplectic, and D is a diagonal matrix of the form \mathrm{diag}(d_1, d_2, \ldots, d_N, d_1^{-1}, d_2^{-1}, \ldots, d_N^{-1}).

For classical simulation of QHBM, it is useful to decompose this further. Every real orthogonal symplectic matrix,

O = \begin{bmatrix} X & Y \\ -Y & X \end{bmatrix} \in \mathbb{R}^{2N \times 2N},

can be associated to a unitary U = X + iY \in \mathbb{C}^{N \times N}. The space of such unitaries is spanned by exponentiating anti-Hermitian matrices. Each such anti-Hermitian generator is described by N(N - 1)/2 complex parameters, and can be generated by anti-symmetrizing (with Hermitian conjugate) a complex-valued lower-diagonal matrix. We can thus classically parameterize the space of symplectic transformations by reading the sequence of decompositions backwards, starting from lower-diagonal matrices. Altogether, a symplectic matrix for an N-mode system may be described by 2N^2 - N real parameters.
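A classical sketch of this chain of maps, under the conventions above (function name and parameter packing are our choices):

```python
import numpy as np
from scipy.linalg import expm

# Complex strictly-lower-triangular parameters -> anti-Hermitian generator
# -> unitary U = X + iY -> real orthogonal symplectic O = [[X, Y], [-Y, X]].
def orthogonal_symplectic(params, N):
    L = np.zeros((N, N), dtype=complex)
    L[np.tril_indices(N, k=-1)] = params   # N(N-1)/2 complex parameters
    U = expm(L - L.conj().T)               # unitary from anti-Hermitian generator
    X, Y = U.real, U.imag
    return np.block([[X, Y], [-Y, X]])
```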

In quantum optical setups, O_1 and O_2 represent passive transformations, such as those performed by beam splitters and phase shifters, and D can be implemented via a sequence of single-mode squeezers, as explained in [85].

Appendix G: Gaussian Fermionic Quantum Systems

In this appendix, we give a more detailed prescription for how one could implement QHBM for Fermionic Gaussian systems. While we employ the Jordan-Wigner (JW) transformation [121, 122] to symbolically map Fermionic operators to qubit operators, we note that other mappings, like the Bravyi-Kitaev (BK) [123, 124] or Ball-Verstraete-Cirac (BVC) [125–127] transformations, also suffice as long as appropriate parameterizations are chosen.

In the JW transformation, effective Fermionic creation and annihilation operators are induced by combining qubit operators,

a_k = \frac{1}{2}(X_k + iY_k)\, Z_1 \cdots Z_{k-1},

a^{\dagger}_k = \frac{1}{2}(X_k - iY_k)\, Z_1 \cdots Z_{k-1},   (G1)

where X_k, Y_k, and Z_k are Pauli operators on qubit k. In this language, it is natural to consider the occupation of the kth effective Fermionic mode,

n_k \equiv c^{\dagger}_k c_k = \frac{1}{2}(1 - Z_k),   (G2)

and we see that we can 'measure' the occupation of the mode just by measuring the Pauli-Z operator on the corresponding qubit.

In making this prescription, we are implicitly defining an ordering on the qubits. For one-dimensional systems, this is natural, and all nearest-neighbor effective Fermionic hopping gates are local in qubit operators:

c^{\dagger}_k c_{k+1} + c^{\dagger}_{k+1} c_k = \frac{1}{2}(X_k X_{k+1} + Y_k Y_{k+1}).   (G3)

For higher-dimensional systems, methods have been developed that handle the extra phases involved.


For each pair of physical Fermionic creation and annihilation operators, a_j, a^{\dagger}_j, we can associate a pair of virtual Majorana modes, c_{2j-1}, c_{2j}, according to

c_{2j-1} = a_j + a^{\dagger}_j,   (G4)

c_{2j} = i(a_j - a^{\dagger}_j).   (G5)

The canonical transformation from physical Fermionic operators to Majorana operators is a local rotation in operator space for each Fermion individually. While the defining equations for the Majoranas are cosmetically identical (up to a multiplicative constant) to those defining the Bosonic conjugate variables p and q, these Majorana Fermions satisfy the canonical Fermionic anti-commutation relations, \{c_k, c_l\} = 2\delta_{kl}.
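A quick numerical sanity check of these relations, using OpenFermion [90] for the single-mode JW matrices:

```python
import numpy as np
from openfermion import FermionOperator, get_sparse_operator

# Check {c_k, c_l} = 2 delta_kl for the Majorana pair of Eqs. (G4)-(G5).
a = get_sparse_operator(FermionOperator('0'), n_qubits=1).toarray()
c1 = a + a.conj().T           # c_1 = a + a^dagger
c2 = 1j * (a - a.conj().T)    # c_2 = i (a - a^dagger)
assert np.allclose(c1 @ c2 + c2 @ c1, np.zeros((2, 2)))
assert np.allclose(c1 @ c1, np.eye(2)) and np.allclose(c2 @ c2, np.eye(2))
```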

In the Majorana basis, any Fermionic Gaussian Hamiltonian takes the generic form

H_F = i \sum_{i,j}^{2N_f} h_{ij}\, c_i c_j + E,   (G6)

where N_f is the total number of Fermions, and E is some constant energy shift associated with the transformation to Majorana modes.

In this basis, we can approach the Fermionic problem in a similar manner to the Bosonic problem: we want to decompose the system (now written in terms of Majoranas) into independent effective Fermionic oscillators. Said another way, we want to find a set of Fermionic operators c_1, \ldots, c_{2N_f} and a corresponding set of energies \varepsilon_1, \ldots, \varepsilon_{2N_f} such that

H_F = \sum_{j=1}^{2N_f} \varepsilon_j\, c^{\dagger}_j c_j + \mathrm{const}.   (G7)

As mentioned in the main text, this diagonalization can be performed by an orthogonal transformation on the covariance matrix associated with the thermal state.

In general, an orthogonal transformation on Fermionic modes must also contain particle-hole transformations, but working in the Majorana basis these are not necessary. A generic orthogonal matrix requires no more than O(N_f^2) such Givens rotations.

In one-dimensional systems, the JW transformation makes for a remarkably simple implementation of Givens rotations, consisting of two CNOT gates and one controlled-phase rotation. For higher-dimensional systems, techniques exist for dealing with the phases that arise in the JW mapping. Moreover, Givens rotations are very close to the physical operations performed on real quantum devices.

Finally, the analytic derivative of a Givens rotation is

G'(\phi) = G\!\left(\phi + \frac{\pi}{2}\right),   (G8)

meaning that gradients can be computed easily using exact methods, by physically measuring expectation values with parameters shifted by \pi/2.

Appendix H: D-wave Superconductivity

The mean-field BCS Hamiltonian has become a paradigmatic phenomenological model in the study of superconductors.

The generic model is described by the Hamiltonian

H_{d_{x^2-y^2}} = -t \sum_{\langle i,j \rangle, \sigma} (a^{\dagger}_{i,\sigma} a_{j,\sigma} + a^{\dagger}_{j,\sigma} a_{i,\sigma}) + \mu \sum_{i,\sigma} a^{\dagger}_{i,\sigma} a_{i,\sigma} - \sum_{\langle i,j \rangle} \Delta_{ij} (a^{\dagger}_{i,\uparrow} a^{\dagger}_{j,\downarrow} - a^{\dagger}_{i,\downarrow} a^{\dagger}_{j,\uparrow} + a_{j,\downarrow} a_{i,\uparrow} - a_{j,\uparrow} a_{i,\downarrow}),   (H1)

defined for Fermions (electrons) on a two-dimensional lattice of size N_x by N_y sites, where a_j (a^{\dagger}_j) is a Fermionic annihilation (creation) operator on lattice site j. Each site hosts two spin-orbitals (one for spin up and one for spin down), giving a total of 2 \times N_x \times N_y Fermionic modes. Here, \mu represents a chemical potential that sets the filling density, t denotes the nearest-neighbor hopping strength, and \Delta_{ij} parameterizes the superconducting gap (less formally, it incentivizes the formation of Cooper pairs).