
COMPUTATION OF MATRIX NORMS WITH APPLICATIONS TO ROBUST OPTIMIZATION

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Operations Research and System Analysis

Daureen Steinberg

SUBMITTED TO THE SENATE OF THE TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY

TAMUZ 5765 HAIFA JULY 2005


The research thesis was done under the supervision of Prof. A. S. Nemirovski in the Faculty of Industrial Engineering and Management.

The generous financial help of the Technion is gratefully acknowledged.


Contents

1 Introduction
  1.1 Matrix Norm problem: setting and motivation
    1.1.1 Matrix Norm problem
    1.1.2 Robust Optimization and Matrix Norm problem
  1.2 Basic facts on matrix norms
    1.2.1 Norms on Euclidean spaces
    1.2.2 Induced norms of linear mappings
    1.2.3 Induced norms and quadratic maximization
  1.3 Solvability status of the Matrix Norm problem: known results
    1.3.1 When Pp,r is known to be easy
    1.3.2 When Pp,r is known to be difficult
    1.3.3 Known approximation results for Pp,r
  1.4 Overview of the results

2 Complexity of the Matrix Norm problem
  2.1 NP-Hardness: preliminaries
    2.1.1 Generic optimization problems: instances, data vectors and sizes
    2.1.2 ε-solutions
    2.1.3 Model of computations, solution algorithms, complexity and polynomial time solvability
  2.2 Computationally intractable optimization problems
    2.2.1 Combinatorial Complexity Theory: problem classes NP and P
    2.2.2 NP-hard combinatorial problems
    2.2.3 Difficult problems of Continuous Optimization
  2.3 NP-hardness of the Matrix Norm Problem
    2.3.1 The strategy
    2.3.2 Demonstrating A.1
    2.3.3 Demonstrating A.2

3 Approximating ‖A‖p,r in the case of 1 ≤ r ≤ 2 ≤ p ≤ ∞
  3.1 Semidefinite Relaxation bound on ‖A‖p,r
    3.1.1 Derivation of the bound
    3.1.2 Processing the bound
  3.2 Quality of the bound
    3.2.1 The idea
    3.2.2 The main result
    3.2.3 Discussion
    3.2.4 Evaluating tightness of the bound
  3.3 Exactness of the bound for nonnegative matrices

4 Approximating ‖A‖p,r in the entire range of p, r
  4.1 Developing tools
  4.2 Interpolating the norm bound
    4.2.1 Preliminaries
    4.2.2 Interpolating the norm bound
    4.2.3 A rough summary

5 Bounding ‖A‖‖·‖p,|·|∞
  5.1 Motivation: Robust Semidefinite Programming
    5.1.1 Preliminaries
    5.1.2 Approximate Robust Counterparts
    5.1.3 Robust Counterpart of uncertain Linear Matrix Inequality and Norm bounding
  5.2 Bounding ‖A‖p,∞

References


List of Figures

4.1 Partition of the domain S = {(α, β) : 0 ≤ α, β ≤ 1} (square ABCD, α along the X-axis, β along the Y-axis, M = (1/2, 1/2))

4.2 Sample graphs of Ω(α, n, β, m) as a function of α, β ∈ [0, 1]; α along axis AB, β along axis AD.

Abstract

The Thesis is devoted to investigating the problem of computing the norm

‖A‖E,F = max { ‖Ax‖F : x ∈ E, ‖x‖E ≤ 1 }

of a linear mapping x ↦ Ax acting from a finite-dimensional normed space (E, ‖·‖E) to a finite-dimensional normed space (F, ‖·‖F). This problem is important and interesting in its own right, and especially so due to its role in Robust Optimization. We mainly focus on the case where (E, ‖·‖E) = (R^n, ‖·‖p) and (F, ‖·‖F) = (R^m, ‖·‖r), so that A can be identified with an m × n matrix; the associated norm ‖A‖E,F is denoted by ‖A‖p,r. There are three simple cases ((p = 1, 1 ≤ r ≤ ∞), (r = ∞, 1 ≤ p ≤ ∞), and p = r = 2) where ‖·‖p,r is easy to compute. We conjecture that these are the only three cases where Pp,r is not NP-hard, and prove that Pp,r is NP-hard whenever 1 ≤ r < p ≤ ∞. We further focus on building efficiently computable upper bounds on ‖·‖p,r. Our first result in this direction is a refinement of Nesterov's theorems (see [21] and Chapter 13.2 in [22]) stating that in the case of 1 ≤ r ≤ 2 ≤ p ≤ ∞ a natural semidefinite relaxation upper bound Ψp,r(A) on ‖A‖p,r is tight within the absolute constant factor 1/((2√3)/π − 2/3) ≈ 2.29 (which can be reduced to √(π/2) ≈ 1.25 when p = 2 or when r = 2). We develop a novel technique for quantifying the quality of the bound Ψp,r and demonstrate that in a wide range of values of p, r, n, m this bound is essentially less conservative than Nesterov's results suggest. We also prove that the bound Ψp,r coincides with ‖A‖p,r when A has nonnegative entries. Next, we develop a simple interpolation technique allowing to extend the efficiently computable upper bound Ψp,r(A) on ‖A‖p,r from its original domain 1 ≤ r ≤ 2 ≤ p ≤ ∞ to the entire range 1 ≤ p, r ≤ ∞ of values of p, r, and show that the extended bound is tight within a factor depending on p, n, r, m and never exceeding O(1)(max(m, n))^{25/128}. Our analysis demonstrates that this factor does not exceed 9.48 for all p, r, provided that m, n ≤ 100,000. Finally, we apply our interpolation technique to bound from above the norm of a linear mapping A acting from (R^n, ‖·‖p) to the space S^m of symmetric matrices equipped with the standard matrix norm – a situation of significant interest for Robust Semidefinite Programming. We demonstrate that for mappings A which are "well-structured" in a certain precise sense, the norm in question admits an efficiently computable upper bound tight within the factor O(1)n^{1/4}.


Chapter 1

Introduction

In this chapter, we formulate and motivate the goals of our research, outline the relevant background and present a summary of our results.

1.1 Matrix Norm problem: setting and motivation

1.1.1 Matrix Norm problem.

In the Thesis, we focus on the Matrix Norm problem as follows:

Let E, H be finite-dimensional real vector spaces equipped with norms ‖·‖E, ‖·‖H, respectively, and let L(E, H) be the space of linear mappings from E to H; from the algebraic viewpoint, L(E, H) is just the space of m × n real matrices, where m and n are the dimensions of H, E, respectively. The norms ‖·‖E, ‖·‖H on E and H induce the norm ‖·‖E,H on the space L(E, H):

‖A‖E,H = max { ‖Ax‖H : ‖x‖E ≤ 1 }.

The Matrix Norm problem is to compute ‖A‖E,H given A.

Computing ‖A‖E,H is the problem of maximizing a convex function fA(x) = ‖Ax‖H over a convex solid {x : ‖x‖E ≤ 1}, so that no universal efficient (i.e., polynomial time) solution algorithms for the problem are known. Whether such an algorithm exists depends on the particular choice of the underlying norms ‖·‖E, ‖·‖H, and we shall discuss this issue at length later; for the time being, it suffices to say that, except for a few particular cases, the Matrix Norm problem is NP-hard. When the latter is the case, we intend to look for the "second best" option, that is, for an approximating algorithm for ‖A‖E,H – a polynomial time algorithm which, given on input A, ‖·‖E, ‖·‖H, computes an upper bound ΨE,H(A) on ‖A‖E,H. The quality of such an algorithm is quantified by its tightness factor γ which, by definition, is the smallest constant c such that the inequality

‖A‖E,H ≤ ΨE,H(A) ≤ c‖A‖E,H

holds true for all A ∈ L(E, H). It is clear that the tightness factor is always ≥ 1; the closer it is to 1, the better the underlying approximation algorithm.

Our interest in the Matrix Norm problem comes mainly from the role the problem plays in Robust Optimization – a novel optimization methodology aimed at handling optimization problems with uncertain data.


1.1.2 Robust Optimization and Matrix Norm problem

Robust Optimization: the paradigm

The data in numerous real-world optimization problems are not known exactly. It turns out that data uncertainty, even a small one (usually simply ignored when solving optimization problems), can make the "nominal" optimal solution (the one corresponding to the nominal data) completely meaningless from the practical viewpoint. Indeed, there are cases (see examples in [2, 5, 7, 1]) when quite small data perturbations can make the nominal solution severely infeasible and thus practically meaningless. The Robust Optimization methodology originating from [2, 3, 16, 17] and rapidly developing since then (see, e.g., [4, 5, 6, 7, 1, 8, 9, 10, 11, 12, 13, 14] and references therein) is aimed at overcoming this difficulty. Here an "uncertain optimization problem" is defined as a family

P = { min_x { f(x) : F(x, ζ) ≤ 0 } : ζ ∈ U }

of optimization programs – instances – with common decision vector x ∈ R^n, common objective f(x)^{1)} and data ζ varying in a given uncertainty set U. A point x ∈ R^n is called a robust feasible solution of P if x is feasible for all instances, i.e., if

F(x, ζ) ≤ 0  ∀ζ ∈ U.

Finally, the robust optimal solution of the uncertain optimization problem P is defined as the robust feasible solution with the smallest possible value of the objective, i.e., as the usual optimal solution to the optimization problem

min_x { f(x) : F(x, ζ) ≤ 0 ∀ζ ∈ U }.   (RC)

Discussions and examples presented in [2, 4, 16, 17, 3, 5] demonstrate that in many cases the robust optimal solution is the most natural candidate for the role of "optimal solution to an uncertain optimization problem".

There is, however, a significant a priori difficulty with the Robust Optimization methodology. By definition, the robust optimal solution is the usual optimal solution to the semi-infinite (that is, with infinitely many constraints) optimization problem (RC), called the Robust Counterpart (RC) of the uncertain problem P, and a semi-infinite optimization program can be "computationally intractable" even when the objective and all realizations of the constraints are "quite nice". Although in some important cases (e.g., the one of uncertain Linear Programming) the RC can be converted equivalently into a computationally tractable optimization program, there are important cases (e.g., uncertain Semidefinite Programming) when all instances of P are "easy", while the RC of P is NP-hard. In these cases, the Robust Counterpart methodology proposes replacing the RC of the problem by its "computationally tractable" approximations. We are about to demonstrate that the problem of recognizing whether the RC of an uncertain optimization program is computationally tractable, as well as the problem of building a "reasonably good" tractable approximation of the RC when the RC itself is intractable, are closely related to the Matrix Norm problem.

1) Extensions to the case when the objective is also affected by the data and thus varies from instance to instance are quite straightforward.


Robust Optimization and matrix norms

In many important applications of Robust Optimization the situation is as follows:

1. The constraint F(x, ζ) ≤ 0 expresses the fact that the image of x under an affine mapping

x ↦ B_ζ(x) = B_ζ x + b_ζ

should belong to a given cone K (the nonnegative orthant in the case of Linear Programming, the cone of positive semidefinite matrices in the case of Semidefinite Programming, etc.)

2. B_ζ, b_ζ depend affinely on the data ζ:

[B_ζ, b_ζ] = [B^n, b^n] + Σ_{ℓ=1}^{L} ζ_ℓ [B^ℓ, b^ℓ];   (1.1.1)

here [B^n, b^n] is the "nominal data", and ζ is the "perturbation vector".

3. The uncertainty set U is a convex solid in E ≡ R^L, symmetric w.r.t. the origin, or, which is the same,

U = U_ρ = {ζ ∈ E ≡ R^L : ‖ζ‖E ≤ ρ},

where ‖ζ‖E is a given norm on the space of perturbations and ρ > 0 is a given "perturbation level". The most interesting cases here are:

• ‖·‖E = ‖·‖∞ ("interval uncertainty" – every entry of the perturbation vector varies in a given interval centered at the origin);

• ‖·‖E = ‖·‖2 ("ellipsoidal uncertainty" – a natural way to model random perturbations, see [4, 3]).

4. The objective f(x) is a simple convex function (usually just a linear one).

With these assumptions, the Robust Counterpart (RC) of the uncertain problem in question is a semi-infinite convex optimization problem with an efficiently computable objective. It is well-known that all we need in order to solve such a problem efficiently is the possibility to solve efficiently the associated Analysis problem:

(Anal): Given a candidate solution x, check whether x is robust feasible, i.e., whether

B_ζ(x) ∈ K  ∀ζ ∈ U.

Note that (Anal) is nothing but the problem of checking whether the image U_x of the uncertainty set U under the affine mapping

ζ ↦ A_x(ζ) ≡ Σ_{ℓ=1}^{L} ζ_ℓ [B^ℓ x + b^ℓ] + [B^n x + b^n]

(the mapping depends on x as on a parameter) is contained in the cone K; here we write A_x ζ for the linear part Σ_ℓ ζ_ℓ [B^ℓ x + b^ℓ] of the mapping and a_x for its constant part B^n x + b^n. Since U_x is symmetric w.r.t. a_x, to ask whether U_x is contained in K is the same as to ask whether the image of U under the linear mapping ζ ↦ A_x ζ is contained in the compact convex set Q_x = (K − a_x) ∩ (a_x − K), which is symmetric w.r.t. the origin. Now, the question


Whether the image of the set U = {ζ : ‖ζ‖E ≤ ρ} under a given linear mapping ζ ↦ Aζ is contained in a given convex compact set Q ⊂ R^M symmetric w.r.t. the origin?

is nothing but the question of computing the norm ‖A‖E,H of the matrix A (where ‖·‖H is the norm with unit ball Q): AU ⊂ Q if and only if ‖A‖E,H ≤ ρ^{−1}. It follows that results on efficient computation/bounding of matrix norms can be used in a straightforward way to solve efficiently the Robust Counterparts/Approximate Robust Counterparts of uncertain optimization programs.

To illustrate this point, consider two generic examples.

Example 1: Robust approximation in ‖·‖r. Recall that the standard ‖·‖s-norm, 1 ≤ s ≤ ∞, on R^n is defined as

‖(x_1, ..., x_n)^T‖_s = ( Σ_{i=1}^{n} |x_i|^s )^{1/s} for 1 ≤ s < ∞,   ‖(x_1, ..., x_n)^T‖_∞ = max_i |x_i|.   (1.1.2)

For a linear mapping from E = R^n to H = R^m, that is, for an m × n matrix A, and assuming that E, H are equipped with ‖·‖p, ‖·‖r, respectively, the corresponding norm of the mapping A is

‖A‖E,H ≡ ‖A‖p,r ≡ max_x { ‖Ax‖r : x ∈ R^n, ‖x‖p ≤ 1 }.   (1.1.3)

Consider the following well-known problem:

Given an m × n matrix B, an m-vector b and r ∈ [1, ∞], find the best possible approximation of b by a linear combination of the columns of B, i.e., solve the convex optimization problem

min_{τ,x} { τ : ‖Bx − b‖r ≤ τ }.   (1.1.4)

This problem arises in numerous applications; in these applications, the "values of interest" for r usually are r = ∞ (Tschebyshev approximation), r = 2 (Least Squares) and r = 1 (‖·‖1-approximation). Now, in many important applications (like synthesis of filters and arrays of antennae, see [19, 7]), the decision variables in (1.1.4) are characteristics of certain physical devices. In reality these characteristics somehow drift around their computed values rather than remain exactly equal to these values. Examples in [7] demonstrate that even small "implementation errors" of this type can completely destroy the quality of the usual optimal solution to (1.1.4). Thus, in many applications of (1.1.4) there is a necessity to "immunize" the solution against implementation errors. Assuming the most natural multiplicative model of implementation errors,

x_i ↦ (1 + ζ_i) x_i,  |ζ_i| ≤ ρ,   (1.1.5)

a natural way to immunize a solution against these errors is to pass from (1.1.4) to the problem

min_{τ,x} { τ : ‖Bx − b‖r ≤ τ, ‖B Diag{1 + ζ_1, ..., 1 + ζ_n} x − Bx‖r ≤ θτ ∀(ζ : ‖ζ‖∞ ≤ ρ) },   (1.1.6)

where θ is the "safety parameter" of order 1 (say, θ = 0.1). The role of the semi-infinite constraint in (1.1.6) is to impose an upper bound on how much the implementation errors can affect the quality of approximation; with this constraint, for a feasible solution (x, τ), perturbations (1.1.5) cannot make the quantity ‖Bx − b‖r larger than (1 + θ) times the "nominal quality" τ of the solution.

The Analysis problem associated with (1.1.6) is, essentially, to verify, given x, τ, whether

‖B(I_n + Diag{ζ})x − Bx‖r ≤ θτ  ∀(ζ ∈ R^n : ‖ζ‖∞ ≤ ρ),

where Diag{a}, a ∈ R^n, is the diagonal matrix with diagonal entries a_1, ..., a_n, and I_n is the unit n × n matrix. In other words, we need to check whether

‖B Diag{x}‖∞,r ≤ (θ/ρ) τ.   (1.1.7)

We see that to solve efficiently the problem of interest (1.1.6) is the same as to be able to compute efficiently the ‖·‖∞,r-norm of the m × n matrix A = B Diag{x}. It is known (see Section 1.3.3 for a detailed explanation and references) that computing ‖A‖∞,r is easy when r = ∞ (Tschebyshev approximation) and is NP-hard when r = 2 (the Least Squares case); however, in the latter case there exists an efficiently computable upper bound Ψ∞,2(A) on ‖A‖∞,2 which coincides with ‖A‖∞,2 up to the factor √(π/2) ≈ 1.25; this bound is

Ψ∞,2(A) = max_{X∈S^n} { √(Tr(AXA^T)) : X ⪰ 0, X_ii ≤ 1, i = 1, ..., n }
        = min_{µ∈R^n, ν∈R} { (1/2)[ν + Σ_{i=1}^{n} µ_i] :
              [ Diag{µ}   A^T ]
              [ A         νI_m ] ⪰ 0 },   (1.1.8)

where S^n is the space of symmetric n × n matrices, Tr(B) is the trace of a square matrix B, and X ⪰ 0 means that X is a symmetric positive semidefinite matrix.

Replacing in the Analysis problem the "computationally intractable" matrix norm ‖·‖∞,2 with its efficiently computable approximation Ψ∞,2(·), we arrive at an efficiently solvable approximation of the (Least Squares version of) problem (1.1.6):

min_{τ,x} { τ : ‖Bx − b‖2 ≤ τ, ‖B Diag{x}‖∞,2 ≤ (θ/ρ)τ }   (I[θ])

⇓

min_{τ,x,µ,ν} { τ : ‖Bx − b‖2 ≤ τ, ν + Σ_i µ_i ≤ 2(θ/ρ)τ,
    [ Diag{µ}      Diag{x}B^T ]
    [ B Diag{x}    νI_m       ] ⪰ 0 }   (II[θ])

Here the NP-hard problem (I[θ]) is equivalent to the Least Squares version of (1.1.6), while problem (II[θ]) is a computationally tractable approximation of (I[θ]) (and thus of the problem of interest (1.1.6)). The quality of the approximation is as follows: a feasible solution to (II[θ]) is feasible for (I[θ]), and "nearly vice versa": a feasible solution to (I[θ/1.25]) is feasible for (II[θ]). Recalling the role of the safety parameter θ, we see that from the modelling viewpoint a 25% change in the value of the parameter makes no essential difference; consequently, problem (II[θ]), for all practical purposes, is a quite satisfactory substitute for the problem of interest.
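As an illustration of how (1.1.8) is used computationally, here is a minimal sketch of evaluating Ψ∞,2(A). It is not part of the Thesis, and it assumes the numpy and cvxpy packages (with a default SDP-capable solver) purely for demonstration; the brute-force value is included only to exhibit the guarantee ‖A‖∞,2 ≤ Ψ∞,2(A) ≤ √(π/2)‖A‖∞,2 on a toy instance.

    # Sketch (illustration only): the semidefinite bound (1.1.8) on ||A||_{inf,2}.
    import numpy as np
    import cvxpy as cp
    from itertools import product

    def psi_inf_2(A):
        m, n = A.shape
        mu, nu = cp.Variable(n), cp.Variable()
        # LMI of (1.1.8): [[Diag(mu), A^T], [A, nu*I_m]] is positive semidefinite
        M = cp.bmat([[cp.diag(mu), A.T], [A, nu * np.eye(m)]])
        prob = cp.Problem(cp.Minimize(0.5 * (nu + cp.sum(mu))), [M >> 0])
        prob.solve()
        return prob.value

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 4))
    # ||A||_{inf,2} by brute force: the maximum of the convex function ||Ax||_2
    # over the unit box is attained at one of its 2^n vertices
    exact = max(np.linalg.norm(A @ np.array(v)) for v in product([-1.0, 1.0], repeat=4))
    print(exact, psi_inf_2(A))   # exact <= Psi <= 1.2533... * exact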

We see that although the Matrix Norm problem, especially in its simplest "‖·‖p,r" setting, looks "purely academic", the related results admit immediate practical applications. This is even more so for more "specialized" settings of the problem, as is seen from the following

Example 2. Uncertain Semidefinite Programming. Consider an uncertain semidefinite program

{ min_x { c^T x : B_0[ζ] + Σ_{j=1}^{n} x_j B_j[ζ] ⪰ 0 } : ‖ζ‖p ≤ ρ };   (1.1.9)


in this problem, B_j[ζ], j = 0, 1, ..., n, are symmetric matrices affinely depending on the perturbation vector ζ ∈ R^L, and P ⪰ Q means that the matrices P, Q are symmetric and their difference P − Q is positive semidefinite. The most interesting cases of (1.1.9) are those of p = ∞ ("interval uncertainty") and p = 2 ("ellipsoidal uncertainty", which in many cases is a natural model of random perturbations, see [4]). Taking into account the extremely important role played by Semidefinite Programming in Control, Structural Design, Signal Processing, etc., problem (1.1.9) is of definite applied interest.

The Analysis problem associated with the Robust Counterpart of (1.1.9) is the problem

Given m × m symmetric matrices C_0, C_1, ..., C_L, check whether the implication

‖ζ‖p ≤ ρ ⇒ Σ_{ℓ=1}^{L} ζ_ℓ C_ℓ ⪯ C_0

takes place.

When verifying this implication, one can assume w.l.o.g. that C_0 is positive definite (otherwise the implication definitely is not valid, provided that a certain linear combination of the matrices C_ℓ is of full rank). When C_0 ≻ 0, the implication is equivalent to

‖ζ‖p ≤ ρ ⇒ −I ⪯ Σ_{ℓ=1}^{L} ζ_ℓ C̄_ℓ ⪯ I,  C̄_ℓ = C_0^{−1/2} C_ℓ C_0^{−1/2}.   (1.1.10)

Now consider the linear mapping

ζ ↦ Aζ ≡ Σ_{ℓ=1}^{L} ζ_ℓ C̄_ℓ

acting from the space E ≡ R^L equipped with the norm ‖·‖E ≡ ‖·‖p to the space H ≡ S^m of symmetric m × m matrices equipped with the standard matrix norm ‖B‖H ≡ ‖B‖2,2 = σ_1(B), where σ_1(B) ≥ ... ≥ σ_n(B) are the singular values of an m × n matrix B. It is immediately seen that implication (1.1.10) is equivalent to the relation

‖A‖_{p,(2,2)} ≡ ‖A‖E,H ≤ 1/ρ,

so that to process the (approximate) Robust Counterparts of the uncertain semidefinite problem (1.1.9) is, essentially, the same as to compute/approximate the norm ‖A‖_{p,(2,2)} of a linear operator acting from (R^L, ‖·‖p) to (S^m, ‖·‖2,2). It is known ([3, 8], [6]) that although it can be NP-hard to compute the norm ‖·‖_{p,(2,2)} exactly (this is the case, e.g., when p = 2), there exist nontrivial efficiently computable upper bounds on the norm, and thus there exist nontrivial approximate Robust Counterparts of uncertain semidefinite programs (1.1.9).

1.2 Basic facts on matrix norms

Here we list the well-known facts on matrix norms to be used in the sequel.


1.2.1 Norms on Euclidean spaces

Let E be a Euclidean space (that is, a finite-dimensional linear space over the reals equipped with an inner product 〈·, ·〉). Recall that a norm on E is a real-valued function ‖·‖ on E with the following properties:

‖x‖ ≥ 0 ∀x ∈ E; ‖x‖ = 0 ⇔ x = 0   [positivity]
‖λx‖ = |λ| ‖x‖ ∀(x ∈ E, λ ∈ R)   [homogeneity and symmetry]
‖x + y‖ ≤ ‖x‖ + ‖y‖ ∀(x, y ∈ E)   [triangle inequality]

It is well-known that every norm on E is a convex function. Every norm ‖·‖ on E induces the conjugate norm ‖·‖*:

‖ξ‖* = max_x { 〈ξ, x〉 : x ∈ E, ‖x‖ ≤ 1 },

and the norm conjugate to ‖·‖* is the original norm ‖·‖. From the definition of conjugation it follows that

∀(x, y ∈ E) : |〈x, y〉| ≤ ‖x‖ ‖y‖*.   (1.2.1)

Particular norms of primary interest for us are as follows:

I. Norms ‖·‖_s on R^n. As always, R^n is the space of n-dimensional real column vectors equipped with the standard inner product 〈a, b〉 = a^T b = Σ_{i=1}^{n} a_i b_i. For s ∈ [1, ∞], the norm ‖·‖_s on R^n is given by (1.1.2). The basic properties of the norms of the family are as follows:

(a) (‖·‖_s)* = ‖·‖_{s*},  s* = s/(s − 1)  [⇔ 1/s + 1/s* = 1]
(b) ∀(x, y ∈ R^n, s ∈ [1, ∞]) : |x^T y| ≤ ‖x‖_s ‖y‖_{s*}   [Hölder Inequality]
(c) 1 ≤ s ≤ r ≤ ∞ ⇒ ‖x‖_r ≤ ‖x‖_s ≤ n^{1/s − 1/r} ‖x‖_r ∀x ∈ R^n
(1.2.2)

It is convenient to parameterize the norms ‖·‖_s by the parameter α = 1/s varying in [0, 1]. With this parameterization, (1.2.2.a) becomes

(‖·‖_{1/α})* = ‖·‖_{1/(1−α)}.   (1.2.3)

Besides this, we have the following

Proposition 1.1 Let 0 ≠ a ∈ R^n. Then the function

f(α) = ln(‖a‖_{1/α}) : [0, 1] → R

is convex, nondecreasing and Lipschitz continuous on [0, 1] with Lipschitz constant ln n.

Proof of this well-known fact is as follows. Monotonicity and Lipschitz continuity, with constant ln(n), of f(·) are readily given by (1.2.2.c). To prove convexity, we may w.l.o.g. assume that a_i ≠ 0 for all i ≤ n. Now let 0 ≤ α ≤ β ≤ 1 and let γ = λα + (1 − λ)β, where λ ∈ (0, 1); we should prove that f(γ) ≤ λf(α) + (1 − λ)f(β); since f, as we already know, is continuous on [0, 1], it suffices to prove the latter inequality in the case of α > 0. Assuming α > 0 and setting µ = γ/(λα), ν = γ/((1 − λ)β), so that 1/µ + 1/ν = 1 and µ, ν ∈ [1, ∞], we have

f(γ) = γ ln( Σ_i |a_i|^{1/γ} ) = γ ln( Σ_i |a_i|^{λ/γ} · |a_i|^{(1−λ)/γ} )
     ≤ γ ln( (Σ_i |a_i|^{λµ/γ})^{1/µ} (Σ_i |a_i|^{(1−λ)ν/γ})^{1/ν} )   [by the Hölder inequality]
     = (γ/µ) ln( Σ_i |a_i|^{1/α} ) + (γ/ν) ln( Σ_i |a_i|^{1/β} ) = λf(α) + (1 − λ)f(β),


as claimed.
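A quick numerical illustration of Proposition 1.1 (not part of the Thesis; numpy is assumed): sample f(α) = ln ‖a‖_{1/α} on a grid and test monotonicity, discrete convexity and the Lipschitz bound.

    # Sketch (illustration only): checking Proposition 1.1 on a random vector.
    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.standard_normal(10)

    def f(alpha):
        if alpha == 0.0:                  # 1/alpha = infinity: the max-norm
            return np.log(np.max(np.abs(a)))
        return alpha * np.log(np.sum(np.abs(a) ** (1.0 / alpha)))

    alphas = np.linspace(0.0, 1.0, 101)
    vals = np.array([f(al) for al in alphas])
    h = alphas[1] - alphas[0]
    assert np.all(np.diff(vals) >= -1e-10)             # nondecreasing
    assert np.all(np.diff(vals, 2) >= -1e-8)           # convex (second differences)
    assert np.max(np.abs(np.diff(vals))) <= np.log(a.size) * h + 1e-8   # Lipschitz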

Spectral norms on the space of matrices R^{m×n}. Let R^{m×n} be the space of m × n real matrices equipped with the inner product 〈X, Y〉 = Tr(XY^T) = Σ_{i=1}^{m} Σ_{j=1}^{n} X_ij Y_ij. The spectral s-norms |·|_s on R^{m×n}, 1 ≤ s ≤ ∞, are defined by the relation

|X|_s = ‖σ(X)‖_s,

where σ_1(X) ≥ σ_2(X) ≥ ... ≥ σ_n(X) are the singular values of X. It is well-known that these functions indeed are norms; their basic properties are similar to those of the ‖·‖_s-norms, specifically,

(a) (|·|_s)* = |·|_{s*},  s* = s/(s − 1)  [⇔ 1/s + 1/s* = 1]
(b) ∀(X, Y ∈ R^{m×n}, s ∈ [1, ∞]) : |〈X, Y〉| ≤ |X|_s |Y|_{s*}
(c) 1 ≤ s ≤ r ≤ ∞ ⇒ |X|_r ≤ |X|_s ≤ (min[m, n])^{1/s − 1/r} |X|_r ∀X ∈ R^{m×n}
(d) ∀(X ∈ R^{m×n}, s ∈ [1, ∞]) : |X|_s = |X^T|_s
(e) 0 ≠ A ∈ R^{m×n} ⇒ ln |A|_{1/α} is convex, nondecreasing and Lipschitz continuous, with constant ln(min[m, n]), on [0, 1]
(1.2.4)

When deriving (1.2.4.c-e) from (1.2.2.c) and Proposition 1.1, one should take into account that the number of nonzero singular values of an m × n matrix does not exceed min[m, n], and that the vectors of singular values of A and A^T differ from each other only by adding/deleting zero entries.
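Computationally, the spectral s-norm is just the ‖·‖_s-norm of the vector of singular values; a short sketch (numpy assumed, illustration only) which also checks property (1.2.4.d):

    # Sketch (illustration only): |X|_s = ||sigma(X)||_s.
    import numpy as np

    def spectral_norm(X, s):
        sigma = np.linalg.svd(X, compute_uv=False)
        return np.linalg.norm(sigma, ord=s)   # ord=np.inf gives |X|_inf = sigma_1(X)

    X = np.arange(6.0).reshape(2, 3)
    print(spectral_norm(X, 1), spectral_norm(X, 2), spectral_norm(X, np.inf))
    assert np.isclose(spectral_norm(X, 3), spectral_norm(X.T, 3))   # (1.2.4.d)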

1.2.2 Induced norms of linear mappings

Let E, H be two Euclidean spaces equipped with respective inner products 〈·, ·〉_E, 〈·, ·〉_H. We can associate with the pair E, H the space L(E, H) of linear mappings from E to H; this is also a finite-dimensional linear space. After we equip E and H with bases, the elements of L(E, H) can be identified with matrices of size dim H × dim E, i.e., with elements of R^{m×n}, where m = dim H and n = dim E. When E and H are the arithmetic spaces R^n, R^m with the standard inner products x^T y, these spaces are from the very beginning equipped with "canonical" bases, and in this case we can say that L(E, H) is the space of m × n matrices R^{m×n}.

A pair of norms ‖·‖_E on E and ‖·‖_H on H induces a norm on L(E, H), specifically, the norm

‖A‖_{‖·‖_E,‖·‖_H} = max_{x∈E} { ‖Ax‖_H : ‖x‖_E ≤ 1 }.   (1.2.5)

Equivalently: ‖A‖_{‖·‖_E,‖·‖_H} is the smallest constant C such that

‖Ax‖_H ≤ C ‖x‖_E ∀x ∈ E.

When the norms ‖·‖_E, ‖·‖_H are clear from the context, we will shorten the notation ‖A‖_{‖·‖_E,‖·‖_H} to ‖A‖_{E,H}. We shall be especially interested in the norms ‖·‖_{p,r} on R^{m×n} = L(R^n, R^m) induced in the outlined manner by the pair of norms ‖·‖_p on R^n and ‖·‖_r on R^m; these norms are given by relation (1.1.3).

Induced norms and conjugation

A property of induced norms important in the sequel is the relation between the induced norms of a linear mapping and of its conjugate.

The conjugate of a linear mapping A ∈ L(E, H) is the mapping A* given by the identity

〈h, Ae〉_H = 〈A*h, e〉_E  ∀(e ∈ E, h ∈ H).   (1.2.6)

It is well-known that this identity, for every given A, indeed defines a unique linear mapping A*; the mapping A ↦ A* is a one-to-one linear mapping of L(E, H) onto L(H, E) which is self-inverse: (A*)* = A. When identifying linear mappings from E to H with their matrices in orthonormal bases {e_j}, {h_i} of E, H, the matrix representing A* in the pair of bases ({h_i}, {e_j}) is the transpose of the matrix representing A in the pair of bases ({e_j}, {h_i}). In particular, when E = R^n, H = R^m, so that L(E, H) is exactly R^{m×n}, conjugation becomes the mapping A ↦ A^T : R^{m×n} → R^{n×m}.

The relation between the induced norms of a mapping A ∈ L(E, H) and of its conjugate A* ∈ L(H, E) is given by the following

Proposition 1.2 Let E, H be Euclidean spaces equipped with norms ‖·‖_E, ‖·‖_H, and let ‖·‖*_E, ‖·‖*_H be the norms conjugate to these norms. For every A ∈ L(E, H) one has

‖A‖_{‖·‖_E,‖·‖_H} = max { 〈h, Ae〉_H : e ∈ E, ‖e‖_E ≤ 1, h ∈ H, ‖h‖*_H ≤ 1 }.   (1.2.7)

In particular,

A ∈ L(E, H) ⇒ ‖A‖_{‖·‖_E,‖·‖_H} = ‖A*‖_{‖·‖*_H,‖·‖*_E},   (1.2.8)

and

∀(A ∈ R^{m×n}, p, r ∈ [1, ∞]) : ‖A‖_{p,r} = ‖A^T‖_{r*,p*},  p* = p/(p − 1),  r* = r/(r − 1),   (1.2.9)

or, equivalently,

A ∈ R^{m×n}, α, β ∈ [0, 1] ⇒ ‖A‖_{1/α, 1/β} = ‖A^T‖_{1/(1−β), 1/(1−α)}.   (1.2.10)

Proof of this well-known fact is immediate: since ‖·‖_H = (‖·‖*_H)*, we have

‖Ae‖_H = max_{h∈H} { 〈h, Ae〉_H : ‖h‖*_H ≤ 1 };

this relation combines with ‖A‖_{‖·‖_E,‖·‖_H} = max_e { ‖Ae‖_H : ‖e‖_E ≤ 1 } to imply (1.2.7). (1.2.8) immediately follows from (1.2.7) and the identity (‖·‖*)* = ‖·‖:

‖A*‖_{‖·‖*_H,‖·‖*_E} = max { 〈e, A*h〉_E : h ∈ H, ‖h‖*_H ≤ 1, e ∈ E, (‖e‖*_E)* = ‖e‖_E ≤ 1 }   [by (1.2.7) as applied to A*]
 = max { 〈h, Ae〉_H : h ∈ H, ‖h‖*_H ≤ 1, e ∈ E, ‖e‖_E ≤ 1 }   [by definition of A*]
 = ‖A‖_{‖·‖_E,‖·‖_H}   [by (1.2.7)].

Finally, (1.2.10) is given by (1.2.8) due to (‖·‖_{1/α})* = ‖·‖_{1/(1−α)}, see (1.2.3).

1.2.3 Induced norms and quadratic maximization

Relation (1.2.7) demonstrates that computing the induced norm of a linear mapping A ∈ L(E, H) is equivalent to solving a problem of bilinear maximization:

‖A‖_{‖·‖_E,‖·‖_H} = max { 〈h, Ae〉_H : e ∈ E, h ∈ H, ‖h‖*_H ≤ 1, ‖e‖_E ≤ 1 }.   (1.2.11)

All known algorithms for approximating induced norms are based on further reducing (1.2.11) to a problem of quadratic maximization.


Reducing (1.2.11) to quadratic maximization

Let L = H × E be the product of Euclidean spaces E, H, so that vectors from L are orderedpairs (h, e) with h ∈ H, e ∈ E and naturally defined linear operations, and the inner producton L is

〈(h, e), (h′, e′)〉L = 〈h, h′〉H + 〈e, e′〉E .

Given norms ‖ · ‖E , ‖ · ‖H on E and H, we associate with them the norm

‖(h, e)‖L = max[‖h‖∗H , ‖e‖E ]

on L. Finally, with a linear mapping A ∈ L(E, H) we associate the mapping S[A] ∈ L(L,L)given by

S[A](h, e) = (Ae,A∗h).

Note that when identifying E with Rn, n = dimE and H with Rm, m = dimH, by choosingorthonormal bases ej in E, hi in H, we can equip L with the orthonormal basis (hi, 0) ∪(0, ej), thus identifying L with Rm+n, L(E, H) with Rm×n and L(L,L) with R(m+n)×(m+n).With these identifications, A ∈ L(E, H) becomes an m × n matrix, and S[A] becomes the

symmetric matrix

[A

AT

].
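A small numeric aside (numpy assumed, illustration only): under this identification the eigenvalues of S[A] are, up to padding with zeros, plus/minus the singular values of A, which is why quadratic maximization over S[A] captures the bilinear form of (1.2.11).

    # Sketch (illustration only): eigenvalues of [[0, A], [A^T, 0]].
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 5))
    S = np.block([[np.zeros((3, 3)), A], [A.T, np.zeros((5, 5))]])
    eigs = np.sort(np.linalg.eigvalsh(S))              # symmetric eigensolver
    sigma = np.sort(np.linalg.svd(A, compute_uv=False))
    assert np.allclose(eigs[-3:], sigma)               # top eigenvalues = singular values
    assert np.allclose(eigs[:3], -sigma[::-1])         # bottom ones are their negatives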

The quadratic maximization representation of ‖·‖_{‖·‖_E,‖·‖_H} is given by the following

Proposition 1.3 One has

‖A‖_{‖·‖_E,‖·‖_H} = (1/2) max_{u∈L} { 〈u, S[A]u〉_L : ‖u‖_L ≤ 1 }.   (1.2.12)

Proof of this well-known fact is immediate: setting u = (h, e) and taking into account the definitions of ‖·‖_L and S[A], the right hand side of (1.2.12) becomes

(1/2) max_{e∈E, h∈H} { 〈h, Ae〉_H + 〈A*h, e〉_E : ‖h‖*_H ≤ 1, ‖e‖_E ≤ 1 }
 = (1/2) max_{e∈E, h∈H} { 2〈h, Ae〉_H : ‖h‖*_H ≤ 1, ‖e‖_E ≤ 1 } = ‖A‖_{‖·‖_E,‖·‖_H},

where we used 〈A*h, e〉_E = 〈h, Ae〉_H, and the concluding equality is given by (1.2.11).

There are also two particular cases when computing the induced norm can be reduced to maximizing a nonnegative quadratic form over a solid symmetric w.r.t. the origin; we shall see that these cases are especially well suited for approximating the norm. The cases in question are as follows:

A. ‖·‖_H is the Euclidean norm ‖·‖2 on H = R^m. In this case we have

‖A‖_{‖·‖_E,‖·‖2} = max_{e:‖e‖_E≤1} ‖Ae‖2 = max_{e:‖e‖_E≤1} √(〈Ae, Ae〉_H) = √( max_{e:‖e‖_E≤1} 〈e, A*Ae〉_E ).   (1.2.13)

Note that the resulting mapping A*A ∈ L(E, E) is symmetric positive semidefinite.


B. ‖·‖_E is the Euclidean norm ‖·‖2 on E = R^n. In this case we invoke (1.2.8) to get

‖A‖_{‖·‖2,‖·‖_H} = ‖A*‖_{‖·‖*_H,‖·‖2},

whence, as we have already seen,

‖A‖_{‖·‖2,‖·‖_H} = √( max_{h:‖h‖*_H≤1} 〈h, AA*h〉_H ),   (1.2.14)

and the mapping AA* ∈ L(H, H) is symmetric positive semidefinite.

1.3 Solvability status of the Matrix Norm problem: known results

The goal of this Section is to summarize known results on the solvability status of the Matrix Norm problem. We restrict ourselves to the case when the problem of interest is to compute ‖A‖p,r, where A is an m × n real matrix and p, r ∈ [1, ∞] (see (1.1.3)), which is the problem we primarily focus on in the Thesis. When speaking about the solvability status of the problem, we treat p, r as fixed parameters and m, n, A ∈ R^{m×n} as the data specifying an instance; in other words, we consider a two-parametric family Pp,r, p, r ∈ [1, ∞], of generic problems, where the instances of Pp,r are specified by the data m, n, A ∈ R^{m×n}, and an instance of Pp,r given by particular data (m, n, A) requires to compute ‖A‖p,r. The questions we are interested in are

(a) what are the "easy" (polynomially solvable, computationally tractable) members of our 2-parametric family of generic problems and what are the "difficult" (NP-hard) ones, and

(b) how well one can approximate, in an efficient way, ‖A‖p,r in the case when Pp,r is difficult.

Our local goal is to outline known results on (a) and (b). It should be stressed that here we consider the problems Pp,r in "full generality", that is, without imposing restrictions on the data like entry-wise nonnegativity of A or a specific sparsity pattern in A. We remark also that since we are speaking about problems with real data, in the sequel the notions of polynomial-time solvability/NP-hardness are understood in the sense of Real Arithmetic Complexity Theory (see, e.g., [1], Chapter 5, or Chapter 2).

1.3.1 When Pp,r is known to be easy

To the best of our knowledge, the only cases when Pp,r is known to be polynomially solvable are the following three:

1. p = 1. In this case, ‖A‖1,r = max_{1≤j≤n} ‖A_j‖_r, where A_j is the j-th column of A. This is a particular case of the following immediate observation:

(*) For A ∈ L(R^n, H), ‖A‖_{‖·‖1,‖·‖_H} = max_{1≤j≤n} ‖A_j‖_H, where A_j is the image of the j-th standard basic orth e_j of R^n under the mapping A.

Indeed, ‖A‖_{‖·‖1,‖·‖_H} is the maximum of the convex function ‖Ax‖_H over the unit ball of the ‖·‖1-norm; since the latter set is the convex hull of the vectors ±e_j, j = 1, ..., n, the maximum is achieved at one of these points.


2. r = ∞. In this case, ‖A‖p,∞ = max_{1≤i≤m} ‖A_i‖_{p*}, where A_i^T are the rows of A and p* = p/(p − 1). This case is "symmetric" to the first one, due to the following simple observation:

(!) The "computability status" of problem Pp,r is exactly the same as the computability status of problem Pr*,p*,

which is an immediate consequence of the identity (1.2.8):

∀A ∈ L(E, H) : ‖A‖_{‖·‖_E,‖·‖_H} = ‖A*‖_{‖·‖*_H,‖·‖*_E};

as a result, (*) implies that

(**) For A ∈ L(E, R^m), ‖A‖_{‖·‖_E,‖·‖∞} = max_{1≤i≤m} ‖A_i‖*_E, where A_i is the image of the i-th standard basic orth e_i of R^m under the mapping A*.

In the sequel, we refer to the problems Pp,r and Pr*,p* as symmetric to each other.

3. p = r = 2. It is well-known that in this case ‖A‖2,2 is the maximal singular value of A, and the singular values are efficiently computable. Note that the case in question is "self-symmetric": when p = r = 2, the problem Pr*,p* (which is always equivalent to Pp,r) is just Pp,r itself.

In the sequel, we refer to the problems Pp,r just listed as "trivial", and to the remaining problems Pp,r (those which are not known to be polynomially solvable) as "nontrivial" ones.
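The three trivial cases admit one-line implementations; the following sketch (numpy assumed, illustration only) computes ‖A‖1,r via (*), ‖A‖p,∞ via (**), and ‖A‖2,2 as the largest singular value.

    # Sketch (illustration only): the three polynomially solvable cases.
    import numpy as np

    def norm_1_r(A, r):                  # p = 1: max column r-norm, by (*)
        return max(np.linalg.norm(A[:, j], ord=r) for j in range(A.shape[1]))

    def norm_p_inf(A, p):                # r = inf: max row p*-norm, by (**)
        p_star = np.inf if p == 1 else p / (p - 1)
        return max(np.linalg.norm(A[i, :], ord=p_star) for i in range(A.shape[0]))

    def norm_2_2(A):                     # p = r = 2: maximal singular value
        return np.linalg.svd(A, compute_uv=False)[0]

    A = np.array([[1.0, -2.0, 0.0], [3.0, 0.5, -1.0]])
    print(norm_1_r(A, 2), norm_p_inf(A, 2), norm_2_2(A))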

1.3.2 When Pp,r is known to be difficult

To the best of our knowledge, the only cases when Pp,r is known to be difficult are the following three:

1. p = ∞, r = 1;

2. p = ∞, r = 2;

3. p = 2, r = 1.

The key case here is the first one, and this case is difficult even when A is restricted to be symmetric positive semidefinite. Indeed, from Proposition 1.2 we know that for every A ∈ R^{m×n} one has

‖A‖∞,1 = max { y^T Ax : ‖x‖∞ ≤ 1, ‖y‖∞ ≤ 1 }.   (1.3.1)

When A is restricted to be symmetric and positive semidefinite, this bilinear maximization problem is known to be equivalent to a quadratic maximization problem due to the relation

A ∈ S^n, A ⪰ 0 ⇒ ‖A‖p,p* = max_{‖x‖p≤1} x^T Ax.   (1.3.2)

The latter well-known relation can be verified as follows: by Proposition 1.2 we always have

‖A‖p,p* = max_{‖x‖p≤1, ‖y‖p≤1} y^T Ax.

When A is symmetric, we have

y^T Ax = z^T Az − w^T Aw,  z = (1/2)(x + y),  w = (1/2)(x − y);

when x, y run through the unit ‖·‖p-ball, z, w do not leave the same ball, whence

‖A‖p,p* = max_{‖x‖p≤1, ‖y‖p≤1} y^T Ax ≤ max_{‖z‖p≤1, ‖w‖p≤1} [z^T Az − w^T Aw] ≤ max_{‖z‖p≤1} z^T Az − min_{‖w‖p≤1} w^T Aw.

When A is further restricted to be positive semidefinite, so that min_{‖w‖p≤1} w^T Aw = 0, the latter inequality becomes

‖A‖p,p* ≤ max_{‖z‖p≤1} z^T Az,

while for every square matrix A one has the opposite inequality max_{‖z‖p≤1} z^T Az ≤ max_{‖x‖p≤1, ‖y‖p≤1} y^T Ax = ‖A‖p,p* (set x = y = z). Thus, for symmetric positive semidefinite A we indeed have ‖A‖p,p* = max_{‖z‖p≤1} z^T Az.
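For a small PSD matrix, relation (1.3.2) with p = ∞ is easy to confirm by brute force over the vertices of the unit box (numpy assumed, illustration only):

    # Sketch (illustration only): for PSD A, bilinear and quadratic maxima coincide.
    import numpy as np
    from itertools import product

    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4))
    A = B.T @ B                                    # symmetric positive semidefinite
    verts = [np.array(v) for v in product([-1.0, 1.0], repeat=4)]
    quad = max(x @ A @ x for x in verts)           # max_{||x||_inf<=1} x^T A x
    bilin = max(y @ A @ x for x in verts for y in verts)   # ||A||_{inf,1} by (1.3.1)
    assert np.isclose(quad, bilin)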

Relation (1.3.1) says that problem P∞,1 is at least as difficult as the problem of maximizing a positive semidefinite quadratic form over the unit box. This latter problem, in turn, is at least as difficult as the well-known MAXCUT problem:

MAXCUT: Given a complete n-node graph with nonnegative integer weights of the arcs, partition the nodes into two sets in a way which maximizes the total weight of the arcs starting in one of the sets and ending in the other.

Indeed, denoting by a_ij = a_ji, 1 ≤ i, j ≤ n, i ≠ j, the weight of arc (i, j), and setting a_ii = 0, it is easily seen (see, e.g., [1]) that MAXCUT is equivalent to the quadratic maximization problem

(1/4) max_{‖x‖∞≤1} x^T Lx,  L_ij = −a_ij for i ≠ j,  L_ii = Σ_k a_ik,

and the resulting matrix L is symmetric positive semidefinite.

We see that the generic problem of computing ‖A‖∞,1 for a symmetric positive semidefinite matrix A is at least as difficult as MAXCUT, and the latter problem indeed is quite difficult: it is NP-hard to solve; more than this, it is NP-hard to approximate its optimal value within a once for ever fixed relative accuracy, like 0.04, and this remains NP-hard even when instead of deterministic algorithms we are allowed to use randomized ones (see [22], Chapter 13). The conclusion is that the problem of computing ‖·‖∞,1 is NP-hard already in the case when A is restricted to be symmetric positive semidefinite, and even when an approximation of relative accuracy 0.04 is sought.
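The reduction is easy to state in code: the following sketch (numpy assumed, illustration only) builds L from a toy 3-node weight matrix and recovers the maximum cut weight as (1/4) max_{‖x‖∞≤1} x^T Lx, the maximum being attained at a ±1 labelling of the nodes.

    # Sketch (illustration only): MAXCUT as quadratic maximization over the box.
    import numpy as np
    from itertools import product

    a = np.array([[0.0, 2.0, 1.0],
                  [2.0, 0.0, 3.0],
                  [1.0, 3.0, 0.0]])          # symmetric weights, zero diagonal
    L = np.diag(a.sum(axis=1)) - a           # L_ii = sum_k a_ik, L_ij = -a_ij
    cut = max(x @ L @ x for x in
              (np.array(v) for v in product([-1.0, 1.0], repeat=3))) / 4.0
    print(cut)   # 5.0: the cut {1} vs {0, 2} picks up the weights 2 + 3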

"Computational intractability" of computing ‖A‖∞,1 for symmetric positive semidefinite A implies the same property for the problems P∞,2 and P2,1 – the second and the third difficult cases of Pp,r in the above list. Indeed, given a positive semidefinite matrix A, we can efficiently represent it as A = B^T B, even with the additional restriction that B be lower/upper triangular or symmetric positive semidefinite. By (1.2.13) we have

‖A‖∞,1 = ‖B‖∞,2²;

thus, if P∞,2 were computationally tractable, so would be the problem of computing ‖A‖∞,1 for 0 ⪯ A ∈ S^n, which, as we know, is not the case. Moreover, from the above remarks it follows that the problem of computing ‖B‖∞,2 is intractable already when B is restricted to be lower/upper triangular or symmetric positive semidefinite, and even when an approximation within a once for ever fixed small relative accuracy, like 0.02, is sought. Finally, problem P2,1 has exactly the same "computational status" as P∞,2 "by symmetry" (that is, by observation (!) in Section 1.3.1).


1.3.3 Known approximation results for Pp,r

All known approximation results for nontrivial problems Pp,r are straightforward consequences of the results of Yu. Nesterov ([21], [22], Chapter 13.2) on convex relaxations of quadratic maximization problems, and are as follows:

1. Problem P∞,2 (and the problem P2,1 symmetric to it):

Proposition 1.4 For A ∈ R^{m×n}, the efficiently computable quantity

Ψ∞,2(A) = (1/2) min_{µ∈R^n, ν∈R} { ν + Σ_{i=1}^{n} µ_i :
    [ Diag{µ}   A^T ]
    [ A         νI_m ] ⪰ 0 }   (1.3.3)

is an upper bound on ‖A‖∞,2 = ‖A^T‖2,1, and this bound is tight within the factor √(π/2):

‖A‖∞,2 ≤ Ψ∞,2(A) ≤ √(π/2) ‖A‖∞,2.   (1.3.4)

This is a straightforward consequence of the following fact:

Theorem 1.1 [Nesterov's π/2 Theorem, [21]] Let Q be an n × n symmetric positive semidefinite matrix. Then the efficiently computable quantity

N(Q) = max_X { Tr(QX) : X ∈ S^n, X ⪰ 0, X_ii ≤ 1, i = 1, ..., n } = min_µ { Σ_i µ_i : Diag{µ} ⪰ Q }   (1.3.5)

is an upper bound on the quantity ‖Q‖∞,1 = max_{‖x‖∞≤1} x^T Qx, and this bound is tight within the factor π/2:

max_{‖x‖∞≤1} x^T Qx ≤ N(Q) ≤ (π/2) max_{‖x‖∞≤1} x^T Qx.   (1.3.6)

Nesterov's π/2 Theorem was originally proved via the "random hyperplane" technique originating from the famous MAXCUT-related paper [18] of Goemans and Williamson; it can also be obtained from the more general Matrix Cube Theorem [8]. It is easily seen that the quantity Ψ∞,2(A) is nothing but √(N(A^T A)); since ‖A‖∞,2 = √(‖A^T A‖∞,1), (1.3.4) follows from (1.3.6) (where we set Q = A^T A) by taking square roots of N(Q) and ‖Q‖∞,1 = max_{‖x‖∞≤1} x^T Qx.
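A minimal computational sketch of N(Q) (cvxpy/numpy assumed, illustration only), together with a brute-force evaluation of max_{‖x‖∞≤1} x^T Qx on a small instance, so that (1.3.6) can be observed directly:

    # Sketch (illustration only): Nesterov's semidefinite bound (1.3.5).
    import numpy as np
    import cvxpy as cp
    from itertools import product

    rng = np.random.default_rng(4)
    B = rng.standard_normal((4, 4))
    Q = B.T @ B                                        # PSD instance
    X = cp.Variable((4, 4), PSD=True)
    N = cp.Problem(cp.Maximize(cp.trace(Q @ X)), [cp.diag(X) <= 1]).solve()
    exact = max(x @ Q @ x for x in
                (np.array(v) for v in product([-1.0, 1.0], repeat=4)))
    print(exact, N)   # exact <= N <= (pi/2) * exact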

2. Problems Pp,r with 1 ≤ r ≤ 2 ≤ p ≤ ∞:

Proposition 1.5 Let A ∈ R^{m×n} and let 1 ≤ r ≤ 2 ≤ p ≤ ∞. Then the efficiently computable quantity

Ψp,r(A) = (1/2) min_{µ∈R^n, ν∈R^m} { ‖µ‖_{p/(p−2)} + ‖ν‖_{r/(2−r)} :
    [ Diag{µ}   A^T     ]
    [ A         Diag{ν} ] ⪰ 0 }   (1.3.7)

is an upper bound on ‖A‖p,r, and this bound is tight within the factor 1/((2√3)/π − 2/3) = 2.2936...:

‖A‖p,r ≤ Ψp,r(A) ≤ (1/((2√3)/π − 2/3)) ‖A‖p,r.   (1.3.8)


In addition, in the case of r = 2 one has

‖A‖p,2 ≤ Ψp,2(A) ≤ √(p − 1) ‖A‖p,2.   (1.3.9)

This is an immediate consequence of Nesterov's Theorems 13.2.4 and 13.2.5 in [22], Chapter 13.2. Note that (1.3.9) is better than (1.3.8) when p > 2 is close to 2; moreover, when p → 2 + 0, the tightness factor of the bound Ψp,2, as stated by (1.3.9), approaches 1 uniformly in the sizes m, n of A, which is in full accordance with the fact that ‖A‖2,2 is efficiently computable.
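For completeness, here is a sketch of evaluating the bound (1.3.7) (cvxpy/numpy assumed, illustration only); the norm parameters p/(p − 2) and r/(2 − r) are read as 1 when p = ∞ or r = 1, and as ∞ when p = 2 or r = 2.

    # Sketch (illustration only): the bound (1.3.7) for 1 <= r <= 2 <= p <= inf.
    import numpy as np
    import cvxpy as cp

    def psi(A, p, r):
        m, n = A.shape
        mu, nu = cp.Variable(n), cp.Variable(m)
        q1 = 1.0 if np.isinf(p) else (p / (p - 2) if p > 2 else np.inf)
        q2 = 1.0 if r == 1 else (r / (2 - r) if r < 2 else np.inf)
        M = cp.bmat([[cp.diag(mu), A.T], [A, cp.diag(nu)]])
        prob = cp.Problem(cp.Minimize(0.5 * (cp.norm(mu, q1) + cp.norm(nu, q2))),
                          [M >> 0])
        prob.solve()
        return prob.value

    A = np.array([[1.0, -2.0], [0.5, 3.0]])
    print(psi(A, np.inf, 2.0))   # an upper bound on ||A||_{inf,2}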

1.4 Overview of the results

The main results of the Thesis are as follows.

NP-hardness of Pp,r with p > r. It is natural to conjecture that the three situations listed in Section 1.3.1 are the only cases when Pp,r is polynomially solvable. While we have no proof of this conjecture in its full generality, we do prove that it holds true when p > r. This "negative" result is derived in Chapter 2.

Improved approximation results for Pp,r when 1 ≤ r ≤ 2 ≤ p. In Chapter 3, we refine Nesterov's results stated in Proposition 1.5. Our major result here (Theorem 3.2) states that in the case of 1 ≤ r ≤ 2 ≤ p ≤ ∞ the efficiently computable upper bound Ψp,r(A) on ‖A‖p,r, A ∈ R^{m×n}, given by (1.3.7), satisfies the relations

‖A‖p,r ≤ Ψp,r(A) ≤ Θ(p, n, r, m) ‖A‖p,r,  Θ(p, n, r, m) = min[ Υ(p, n)/α(r), Υ(r*, m)/α(p*) ],  p* = p/(p − 1),  r* = r/(r − 1),   (1.4.1)

where

α(s) = ( ∫ |t|^s (2π)^{−1/2} exp{−t²/2} dt )^{1/s} = √2 ( Γ((s + 1)/2)/√π )^{1/s},
Υ(s, k) = min[ α(s), √(2 ln(k + 1)) ],

and Γ(·) is the Euler Gamma function.

Note that the tightness factor of the bound Ψp,r(A), as given by (1.4.1), is, in an appropriate range of values of p, r, m, n, better than the one given by (1.3.8), (1.3.9) (for a detailed analysis, see Section 3.2.3). The technique underlying Theorem 3.2, which originates from [1], still uses randomization, but differs from the "random hyperplane" technique underlying Nesterov's results.
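The ingredients of (1.4.1) are elementary to evaluate numerically; a sketch (numpy and scipy assumed, illustration only):

    # Sketch (illustration only): alpha(s) and Upsilon(s, k) of (1.4.1).
    import numpy as np
    from scipy.special import gamma

    def alpha(s):
        # s-th absolute moment of a standard Gaussian, to the power 1/s
        return np.sqrt(2.0) * (gamma((s + 1.0) / 2.0) / np.sqrt(np.pi)) ** (1.0 / s)

    def upsilon(s, k):
        return min(alpha(s), np.sqrt(2.0 * np.log(k + 1.0)))

    print(alpha(1.0), alpha(2.0))   # sqrt(2/pi) = 0.7979..., and exactly 1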

Exactness of the bound for a nonnegative matrix A. Along with the outlined results on the tightness factor of the bound Ψp,r(A) for "general" A, we show that in the particular case of a nonnegative A the bound coincides with ‖A‖p,r (Theorem 3.3).

Approximation results for Pp,r with "general" p, r. All aforementioned approximation results for Pp,r deal with the "good case" – the one when 1 ≤ r ≤ 2 ≤ p ≤ ∞. In Chapter 4 we "interpolate" these approximations to the entire range 1 ≤ p, r ≤ ∞ of the values of p, r. In contrast to the "good case", where the quality of approximation is independent of p, r and the sizes of A (see (1.3.8)), the tightness factor of the interpolated approximations grows with the sizes m, n of A. This growth, however, is "moderate": the tightness factor is never worse than O(1)(max[m, n])^{25/128}, the worst case corresponding to p = 16/11, r = 4 and p = 4/3, r = 16/5, see Section 4.2.3. To get an impression of these results, note that the tightness factor of our efficiently computable upper bound on ‖A‖p,r is < 2.5 for m × n matrices with m, n ≤ 100 and is < 9.5 for matrices with as many as 100,000 rows and columns.

Approximation results for ‖A‖_{‖·‖p,|·|∞}. The results mentioned so far deal with the problems Pp,r. In Chapter 5, we consider the problem of computing the induced norm of a linear mapping

x ↦ Ax = Σ_{j=1}^{n} x_j A_j : R^n → S^m

in the case when the argument space R^n is equipped with ‖·‖p, and the image space S^m is equipped with the standard spectral norm |·|∞ (cf. Example 2 in Section 1.1.2). Our main result here is in developing an efficiently computable upper bound Ψp(A) on ‖A‖_{‖·‖p,|·|∞} such that

‖A‖_{‖·‖p,|·|∞} ≤ Ψp(A) ≤ n^{(p−1)/p²} (min[ϑ(µ), n])^{(p−1)/p} ‖A‖_{‖·‖p,|·|∞},  µ = max_{1≤j≤n} Rank(A_j),   (1.4.2)

where ϑ(k) is a certain universal function such that

ϑ(1) = 1,  ϑ(2) = π/2,  ϑ(k) ≤ √(πk/2) for k ≥ 3   (1.4.3)

(Theorem 5.2). Note that aside from the trivial case p = 1 (covered by observation (*) in Section 1.3.1), the only known results on approximating ‖A‖_{‖·‖p,|·|∞} deal with the cases of p = 2 and p = ∞ ([6], Theorem 6.2.2) and p = ∞ (Matrix Cube Theorem, [8]). The results of [6] state that certain efficiently computable upper bounds Φ2(A), Ψ∞(A) on ‖A‖_{‖·‖2,|·|∞}, resp. ‖A‖_{‖·‖∞,|·|∞}, satisfy the inequalities

(a) ‖A‖_{‖·‖2,|·|∞} ≤ Φ2(A) ≤ √(min[m, n]) ‖A‖_{‖·‖2,|·|∞},
(b) ‖A‖_{‖·‖∞,|·|∞} ≤ Ψ∞(A) ≤ √(mn) ‖A‖_{‖·‖∞,|·|∞}.   (1.4.4)

The quality bound (1.4.4.b) was improved significantly in [8], where it is shown that

‖A‖_{‖·‖∞,|·|∞} ≤ Φ∞(A) ≤ ϑ(µ) ‖A‖_{‖·‖∞,|·|∞},   (1.4.5)

with µ and ϑ(·) given by (1.4.2) and (1.4.3). Note that in the case of µ ≪ min[n, m²/n] (which is typical in the situation of Example 2, Section 1.1.2), the tightness factor of our bound Ψ2(A), as stated by (1.4.2), is much better than the tightness factor of the bound Φ2(A) given by (1.4.4.a). When p = ∞, (1.4.2) recovers the factor of (1.4.5). Note that both Ψp(A) and (1.4.2) are given by interpolation, similar to the one developed in Chapter 4, between the case of p = ∞, covered by the Matrix Cube Theorem, and the trivial case of p = 1.


Chapter 2

Complexity of the Matrix Norm problem

In this Chapter, we demonstrate that the problem of computing the matrix norm ‖A‖p,r = max { ‖Ax‖r : x ∈ R^n, ‖x‖p ≤ 1 } of an m × n matrix A is NP-hard, provided that ∞ ≥ p > r ≥ 1.

2.1 NP-Hardness: preliminaries

In this Section, we present the "complexity framework" to be used in the sequel; this framework is borrowed from [1], Chapter 5.

2.1.1 Generic optimization problems: instances, data vectors and sizes

A generic optimization problem P is a collection of instances – optimization programs of the form

(p) : max_x { f(p)(x) : x ∈ X(p) ⊂ R^{n(p)} },

where n(p) is the design dimension of instance (p), X(p) is the feasible set of the instance, and f(p)(x) : X(p) → R is the instance's objective.

It is assumed that an instance of P is identified by a finite-dimensional real data vector Data(p); the dimension of this vector is called the size of the instance:

Size(p) = dim Data(p).

For example, the problem of computing the ‖·‖p,r-norm of a real m × n matrix A can be considered as a generic optimization problem Pp,r with instances of the form

(p) : max_x { f(p)(x) ≡ ‖Ax‖r : x ∈ X(p) ≡ {x ∈ R^n : ‖x‖p ≤ 1} };

here the data vector is

Data(p) = (m, n, A_11, A_21, ..., A_m1, A_12, ..., A_m2, ..., A_1n, ..., A_mn),

and the size of the instance is

Size(p) = 2 + mn.


To avoid technical elaborations unnecessary for our purposes, we assume from now on that all instances of P have nonempty, closed and bounded feasible sets, and that the objectives are continuous on these sets (which implies the solvability of every instance). Note that these requirements are met by the matrix norm problems Pp,r.

2.1.2 ε-solutions

In what follows, we adjust the definitions from [1], Chapter 5, to the case of generic problems whose instances have simple feasible sets (as is the case for our problem of interest Pp,r). Specifically, given a generic optimization problem P and ε > 0, we call a vector x_ε ∈ R^{n(p)} an ε-solution of an instance (p) ∈ P if x_ε is feasible for the instance (i.e., x_ε ∈ X(p)) and

max_{x∈X(p)} f(p)(x) − f(p)(x_ε) ≤ ε.

The quantity

Digits(p, ε) = ln( (Size(p) + ‖Data(p)‖1 + ε²) / ε )   (2.1.1)

is called the number of accuracy digits in an ε-solution to an instance (p) ∈ P. Note that this quantity grows as ln(1/ε) when ε → +0, in full accordance with the intuition of what "the number of accuracy digits" should be; the specific numerator in (2.1.1) (which becomes irrelevant when ε is small) is motivated by technical reasons.

2.1.3 Model of computations, solution algorithms, complexity and polynomial time solvability

We assume that instances of P are solved on a Real Arithmetic computer – an idealized computer capable of storing reals and operating with them; specifically, it can perform precisely the four arithmetic operations, comparisons, and computation of elementary functions like sin, √ and exp with real operands, every operation of this type ("an elementary operation") taking unit time. Now, a solution algorithm A for P is a code for the Real Arithmetic computer with the following properties. When solving an instance (p) ∈ P, the computer gets on input the data vector Data(p) of the instance and a required accuracy ε > 0, and starts to execute code A on these data. The code should ensure that after finitely many elementary operations the computation terminates and an ε-solution to the instance is returned. The complexity C_A(p, ε) of finding an ε-solution to an instance (p) ∈ P by solution algorithm A is, by definition, the running time (the number of elementary operations) of executing A on the data Data(p), ε. A solution algorithm A is called polynomial time if this complexity is bounded by a polynomial in the size of (p) and the number of accuracy digits in the resulting solution: for certain constants C_A, α, β,

C_A(p, ε) ≤ C_A Size^α(p) Digits^β(p, ε)  ∀((p) ∈ P, ε > 0).

Finally, P is called polynomially solvable ("computationally tractable") if P admits a polynomial time solution algorithm.

There are good reasons to treat the notion of polynomial solvability of a generic optimization problem as a good formalization of the informal property of being "efficiently solvable".


2.2 Computationally intractable optimization problems

Treating polynomial time solvability of a generic optimization problem as a synonym of "computational tractability" of the problem, we would be supposed to qualify a generic problem as "computationally intractable" if it does not admit a polynomial time solution algorithm. However, at the present level of our knowledge, no examples of natural generic optimization problems which are provably intractable are known, and at present "computational intractability" of a generic continuous optimization problem P means a weaker fact, specifically, NP-hardness of P. The latter notion takes its origin in Combinatorial Optimization, and we start with outlining it.

2.2.1 Combinatorial Complexity Theory: problem classes NP and P

In Combinatorial Optimization, we are interested in solving generic problems with instances ofthe form

given a, check whether there exists x such that P(a, x) = 1,

where

• a is a finite binary word – the data of the instance;

• candidate solutions x are finite binary words;

• P (a, x) is a characteristic for the generic problem in question predicate – function definedon all pairs of finite binary words and taking values 0, 1.

A generic combinatorial problem P is said to belong to class NP, if

1. The associated predicate P(a, x) is polynomially computable – there exists a Turing machine which, given on input finite binary words a, x, computes P(a, x) in time polynomial in the length length(a) + length(x) of the input; here length(s) is the number of binary digits in a finite binary word s.

Instead of a Turing machine, we could speak about a computer capable of storing finite binary words and carrying out standard logical operations with bits. Note that such a computer is capable of carrying out arithmetic operations with rational numbers (represented as pairs of integers – the numerators and denominators of fractions representing the rationals). The major difference with the Real Arithmetic computer, where every arithmetic operation takes unit time, is that now an arithmetic operation with rational operands takes time polynomial in the "lengths" of the operands.

2. The lengths of meaningful candidate solutions are polynomially bounded in terms of the length of the data. In other words, we can associate with P a positive integer χ such that

P(a, x) = 1 ⇒ length(x) ≤ χ(1 + length(a))^χ.    (2.2.1)

Note that there is a conceptually very simple way to solve instances of an NP-generic problem P. Indeed, given the data a of an instance, we could take one by one all finite binary words x satisfying (2.2.1) and compute P(a, x). As a result, we either find a solution to the instance, or arrive at the correct conclusion that the instance has no solutions at all; the latter happens when all x satisfying (2.2.1) are checked and no solution is found. This "brute force" approach


is, however, impractical, since the number of candidate solutions satisfying (2.2.1) is exponential in length(a), and so is the running time of the outlined procedure. What we would like to have is a polynomial time solution algorithm for P – a code for a Turing machine (or for the outlined computer with bitwise operations) which, as applied to the data a of an instance of P, returns in time polynomial in length(a) a solution to the instance, or correctly reports that no solution exists. NP-combinatorial problems which admit polynomial time solution algorithms are called CCT-polynomially solvable ("CCT" stands for "Combinatorial Complexity Theory"), and their class is denoted by P. There are good reasons to think of polynomial solvability of a combinatorial problem as the "theoretical equivalent" of the informal property of the problem being efficiently solvable.

2.2.2 NP-hard combinatorial problems

One of the major results of Combinatorial Complexity Theory establishes the existence of universal ("NP-complete") problems – generic NP-problems P such that polynomial solvability of P would imply polynomial solvability of every NP-problem, and thus the equality P = NP. Whether the latter equality indeed takes place is the major open question in Computer Science; the conjecture is that the equality does not take place, so that there indeed exist "computationally intractable" generic problems in NP. The reason for this conjecture is that quite a lot of NP-complete problems are of definite practical interest (like Travelling Salesman, Boolean Programming, numerous Scheduling problems, etc.); although over the years these numerous problems were attacked by thousands of top-rate researchers, no efficient solution algorithms for them were found. Since all these problems are equivalent to each other as far as efficient solvability is concerned, the huge total research effort invested in them makes the very existence of polynomial time algorithms for them highly unlikely. As a result, at the present level of our knowledge, NP-complete problems and NP-hard ones (those at least as complicated as NP-complete problems) are thought of as being computationally intractable.

2.2.3 Difficult problems of Continuous Optimization

“Real Arithmetic Complexity Theory” – the one based upon the Real Arithmetic model of computation – borrows the results on NP-completeness in order to conclude that certain generic problems of continuous optimization are computationally intractable “modulo the conjecture that P ≠ NP”. Since the “P ≠ NP” conjecture is a common belief, such a conclusion usually is worded merely as computational intractability (or NP-hardness) of the corresponding continuous problems. The underlying reasoning is as follows. Consider a generic optimization problem P and assume that the objectives of instances from P are everywhere defined and polynomially computable (that is, for every (p) ∈ P and every x ∈ R^{n(p)}, the real f_{(p)}(x) can be computed in a polynomial in Size(p) number of operations of Real Arithmetic). Assume that there exists an NP-combinatorial problem Q with the following properties:

1. Q is NP-complete;

2. Q can be reduced to P in the following sense:

There exists a polynomial time (in the sense of Combinatorial Complexity Theory!) algorithm M which, given on input the data vector Data(q) of an instance (q) ∈ Q, converts it into a triple Data(p[q]), ε(q), µ(q) comprised of the


data vector of an instance (p[q]) ∈ P, a positive rational ε(q) and a rational µ(q) such that (p[q]) is solvable and

— if (q) is unsolvable, then the value of the objective of (p[q]) at every ε(q)-solution to this problem is ≤ µ(q) − ε(q);

— if (q) is solvable, then the value of the objective of (p[q]) at every ε(q)-solution to this problem is ≥ µ(q) + ε(q).

We claim that in the case in question we have all reasons to qualify P as a "computationally intractable" problem. Assume, on the contrary, that P admits a polynomial time solution algorithm B, and let us see what happens if we apply this algorithm to solve (p[q]) within accuracy ε(q). Since (p[q]) is solvable, the method must produce an ε(q)-solution x̄ to (p[q]). With an additional "polynomial time effort" we may compute the value of the objective of (p[q]) at x̄ (recall that the objectives of instances from P are assumed to be polynomially computable). Now we can compare the resulting value of the objective with µ(q); by the definition of reducibility, if this value is ≤ µ(q), then q is unsolvable, otherwise q is solvable. Thus, we get a correct "Real Arithmetic" solvability test for Q. What is the (Real Arithmetic) running time of this test? By the definition of a Real Arithmetic polynomial time algorithm, it is bounded by a polynomial in s(q) = Size(p[q]) and

d(q) = Digits((p[q]), ε(q)) = ln( (Size(p[q]) + ‖Data(p[q])‖_1 + ε²(q)) / ε(q) ).

Now note that if ℓ = length(Data(q)), then the total number of bits in Data(p[q]) and in ε(q) is bounded by a polynomial in ℓ (since the transformation Data(q) ↦ (Data(p[q]), ε(q), µ(q)) takes CCT-polynomial time). It follows that both s(q) and d(q) are bounded by polynomials in ℓ, so that our "Real Arithmetic" solvability test for Q takes a number of arithmetic operations which is polynomial in length(Data(q)).

Recall that Q was assumed to be an NP-complete generic problem, so that it would be "highly improbable" to find a CCT-polynomial time solvability test for this problem, while we have managed to build such a test, with the only (but important!) difference that our test is a Real Arithmetic one – it uses "incomparably more powerful" elementary operations. Now, a "reasonable" Real Arithmetic algorithm – one which can be used in actual computations – must be tolerant to "small rounding errors". Specifically, such an algorithm, as applied to a pair ((p), ε), should be capable of "saying" to the computer: "I need to work with reals with such and such number of binary digits before and after the dot, and I need all elementary operations with these reals to be precise within the same number of accuracy digits", and the algorithm should preserve its performance and accuracy guarantees, provided that the computer meets the indicated requirement. Moreover, for a "reasonable" Real Arithmetic algorithm the aforementioned "number of digits before and after the dot" must be polynomial in Size(p) and Digits(p, ε) 1). With these assumptions, our polynomial time Real Arithmetic solvability test can easily be converted into a CCT-polynomial time solvability test for Q, which – once again – hardly could exist. Thus, a Real Arithmetic polynomial time algorithm for P hardly could exist as well.

1) In fact, this property normally is included in the very definition of a Real Arithmetic polynomial time algorithm; we prefer to skip these boring technicalities and to work with a simplified definition.


2.3 NP-hardness of the Matrix Norm Problem

2.3.1 The strategy

We are about to apply the scheme presented in Section 2.2.3 in order to demonstrate that the problem P_{p,r} of computing ‖A‖_{p,r} is NP-hard, provided that ∞ ≥ p > r ≥ 1. Specifically, we intend to reduce to P_{p,r} the following NP-complete problem:

Stones: Given n positive integers a_j ("weights of the stones"), check whether the equation

∑_j a_j x_j = 0

has a solution with x_j ∈ {−1, 1}, j = 1, ..., n ("whether the stones can be partitioned into two groups of equal weights").

The data in this problem is given by the collection (n, a_1, ..., a_n); the problem clearly is in NP, and it is well known that it is NP-complete.
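To see what the "brute force" test of Section 2.2.1 amounts to for Stones, here is a minimal Python sketch (ours, for illustration only): it enumerates all 2^n sign vectors, so its running time is exponential in the bit length of the data, exactly as discussed above.

    from itertools import product

    def stones_solvable(a):
        # brute force: is there x in {-1, 1}^n with sum_j a_j x_j = 0 ?
        return any(sum(aj * xj for aj, xj in zip(a, x)) == 0
                   for x in product((-1, 1), repeat=len(a)))

    print(stones_solvable([3, 1, 1, 2, 2, 1]))  # True: 3 + 2 = 1 + 1 + 2 + 1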

Our plan is as follows: given the data n, a = (a_1, ..., a_n) of the Stones problem, we associate with these data and with δ > 0 the symmetric n × n matrix

A_δ = I − δ a a^T.

Observe that

‖A_δ‖_{p,r} = max_x { ‖A_δ x‖_r : ‖x‖_p ≤ 1 } = n^{−1/p} C^{1/r}(A_δ),
C(A_δ) = max_x { ‖A_δ x‖_r^r : ‖x‖_p ≤ n^{1/p} }.    (2.3.1)

Efficient computability of ‖A_δ‖_{p,r} is clearly exactly the same as efficient computability of C(A_δ), so that all we need is to verify that the latter problem is difficult. To this end, observe that if the instance Stones(a) of Stones corresponding to the data a is "positive", that is, the equation ∑_j a_j x_j = 0 has a solution x̄ with entries ±1, then, since a^T x̄ = 0,

C(A_δ) ≥ ‖A_δ x̄‖_r^r = ‖x̄ − δ (a^T x̄) a‖_r^r = ‖x̄‖_r^r = µ(a) ≡ n.

We shall prove that when δ = δ(a) > 0 is chosen properly, then

A.1. If Stones(a) is "negative" (i.e., the corresponding equation in variables ±1 has no solution), then

C(A_δ) < µ(a) − ε(a)    (2.3.2)

with a positive rational ε(a);

A.2. Both δ(a) and ε(a) can be obtained from the data a of Stones by a CCT-polynomial time algorithm.

Taken together, A.1 and A.2 say that the NP-complete problem Stones admits a polynomial time reduction, as defined in Section 2.2.3, to P_{p,r}, so that the latter problem is computationally intractable.
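The construction itself is easy to reproduce numerically; in the following sketch (ours; δ is an arbitrary small number rather than the carefully calibrated δ(a) of Section 2.3.2) a "positive" instance indeed yields ‖A_δ x̄‖_r^r = n at a ±1 solution x̄:

    import numpy as np

    def A_delta(a, delta):
        # the reduction matrix A_delta = I - delta * a a^T
        a = np.asarray(a, dtype=float)
        return np.eye(len(a)) - delta * np.outer(a, a)

    a = np.array([3.0, 1.0, 1.0, 2.0, 2.0, 1.0])
    xbar = np.array([1.0, -1.0, -1.0, 1.0, -1.0, -1.0])  # a^T xbar = 0
    assert a @ xbar == 0
    r, delta = 1.5, 1e-3
    # A_delta xbar = xbar - delta*(a^T xbar)*a = xbar, so the value is n = 6
    print(np.sum(np.abs(A_delta(a, delta) @ xbar) ** r), len(a))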

2.3.2 Demonstrating A.1

Claim A.1 relates to the case when a is an integral vector such that a^T x ≠ 0 whenever x is a vector with coordinates ±1; till the end of this Section, we assume that a satisfies this condition.


Step 1. Let us set

S_n = { x ∈ R^n : ‖x‖_p ≤ n^{1/p} };
X_n = { x ∈ R^n : x_j ∈ {−1, 1}, j = 1, ..., n } ⊂ S_n;
X_n^γ = { x ∈ S_n : | |x_j| − 1 | ≤ γ, j = 1, ..., n }.    (2.3.3)

Lemma 2.1 Let

θ(r) = r(r − 1) max[ (1/2)^{r−2}, (3/2)^{r−2} ]

and let

(a) 0 ≤ γ ≤ min[ r θ^{−1}(r), (2‖a‖_1)^{−1}, (16 ‖a‖_2² n^{3/2})^{−1} ],
(b) 0 < δ ≤ min[ (4 n^{3/2} ‖a‖_2²)^{−1}, (16 θ(r) ‖a‖_2⁴ n³)^{−1} ].    (2.3.4)

Then

x ∈ X_n^γ ⇒ ‖A_δ x‖_r^r ≤ n − (r/4) δ.    (2.3.5)

Proof. Let x ∈ X_n^γ, and let x̄ be a point with coordinates ±1 such that |x_j − x̄_j| ≤ γ, j = 1, ..., n; let y_j = x_j x̄_j, so that |x_j| = y_j = 1 + γ_j with |γ_j| ≤ γ. We have

|δ (a^T x) a_j| ≤ δ ‖a‖_2² ‖x‖_2 ≤ δ ‖a‖_2² n^{3/2},

where the concluding inequality is readily given by the fact that x ∈ S_n, whence |x_j| ≤ n for all j and, in particular,

‖x‖_2 ≤ n^{3/2}.    (2.3.6)

Invoking (2.3.4.b), we get |δ(a^T x)a_j| ≤ 1/4, while 5/4 ≥ y_j ≥ 3/4 by (2.3.4.a). On the segment [1/2, 3/2] the function g(s) = s^r is twice continuously differentiable with second derivative not exceeding θ(r) = r(r − 1) max[(1/2)^{r−2}, (3/2)^{r−2}], whence (using x_j = y_j x̄_j)

|(A_δ x)_j|^r = |x_j − δ(a^T x)a_j|^r = |y_j − δ(a^T x)a_j x̄_j|^r = (y_j − δ(a^T x)a_j x̄_j)^r
  ≤ y_j^r − r δ(a^T x) a_j y_j^{r−1} x̄_j + (1/2) θ(r) δ² (a^T x)² a_j²
  ≤ y_j^r − r δ(a^T x) a_j x_j + [ r δ |a^T x| |a_j| |y_j^{r−1} − y_j| + (1/2) θ(r) δ² (a^T x)² a_j² ]

⇒ ‖A_δ x‖_r^r ≤ ‖x‖_r^r − r δ (a^T x)² + ∑_j [ r δ |a^T x| |a_j| |y_j^{r−1} − y_j| + (1/2) θ(r) δ² (a^T x)² a_j² ].    (2.3.7)

Since y_j ∈ [1 − γ, 1 + γ] ⊂ [1/2, 3/2], we have

|y_j^{r−1} − y_j| ≤ |y_j^{r−1} − 1| + |y_j − 1| ≤ r^{−1} θ(r) γ² + γ ≤ 2γ,

where the concluding inequality is given by (2.3.4.a). Therefore the concluding inequality in (2.3.7) implies that

‖A_δ x‖_r^r ≤ ‖x‖_r^r − r δ (a^T x)² + 2 r δ γ |a^T x| ‖a‖_∞ + (1/2) θ(r) δ² |a^T x|² ‖a‖_2².    (2.3.8)

Now observe that a^T x̄ is an integer different from 0, that is, |a^T x̄| ≥ 1. At the same time, |x_j − x̄_j| ≤ γ for all j, whence

|a^T x| ≥ |a^T x̄| − γ ‖a‖_1 ≥ 1/2    (2.3.9)


(we have used (2.3.4.a)). Further,

2 r γ |a^T x| ‖a‖_∞ ≤ 2 r γ ‖a‖_2² ‖x‖_2 ≤ 2 r γ ‖a‖_2² n^{3/2} ≤ r/8    (2.3.10)

(we have used (2.3.4.a) and (2.3.6)) and

(1/2) θ(r) δ |a^T x|² ‖a‖_2² ≤ (1/2) θ(r) δ ‖a‖_2⁴ ‖x‖_2² ≤ (1/2) θ(r) δ ‖a‖_2⁴ n³ ≤ r/8    (2.3.11)

(we have used (2.3.4.b) and (2.3.6)). Combining (2.3.8) – (2.3.11), we arrive at (2.3.5).

Step 2. Now let us prove the following simple fact (which is the only component in the proof of NP-hardness of P_{p,r} where the assumption p > r is used):

Lemma 2.2 Let

(a) γ ∈ (0, 1/2),
(b) δ ∈ [0, n^{−3/2} ‖a‖_2^{−1}],    (2.3.12)

and let x ∈ S_n \ X_n^γ. Then

‖x‖_r^r ≤ n − ν_{p,r}(γ),  ν_{p,r}(γ) = { (1/2) κ(1−κ) n^{κ−2} 2^{2−2p} γ², p < ∞;  1 − (1 − γ)^r, p = ∞ },  κ = r/p,    (2.3.13)

and consequently

‖A_δ x‖_r^r ≤ n − ν_{p,r}(γ) + r (2n²)^{r−1} n^{7/2} ‖a‖_2 δ.    (2.3.14)

Proof. Let us first prove (2.3.13).

(2.3.13), case of p < ∞: Assuming that p < ∞, let y_j = |x_j|^p, κ = r/p. Observe that

(a) ∑_j y_j ≤ n,
(b) κ ∈ (0, 1),
(c) ∃ j* : |y_{j*} − 1| ≥ 2^{1−p} γ,
(d) ‖x‖_r^r = ∑_j y_j^κ.    (2.3.15)

Indeed, (a) is readily given by x ∈ S_n, (b) follows from the fact that 1 ≤ r < p < ∞, and (d) is evident. To prove (c), note that since x ∉ X_n^γ, there exists j* such that ||x_{j*}| − 1| > γ, that is, y_{j*} = |x_{j*}|^p = (1 + ∆)^p, where ∆ = |x_{j*}| − 1 satisfies ∆ ≥ −1 and |∆| > γ. If ∆ > 0, we have y_{j*} − 1 ≥ (1 + γ)^p − 1 ≥ pγ, which is even more than is required in (c). When ∆ < 0, we have 1 − y_{j*} = 1 − (1 + ∆)^p ≥ 1 − (1 − γ)^p ≥ p(1 − γ)^{p−1}γ, where the concluding inequality is given by the fact that the function s^p is convex on the ray s ≥ 0, so that s^p + p s^{p−1}(1 − s) ≤ 1^p = 1 for all s ≥ 0. Since γ ∈ (0, 1/2), we conclude that when ∆ < 0, we have 1 − y_{j*} ≥ p(1 − 1/2)^{p−1}γ = p · 2^{1−p}γ ≥ 2^{1−p}γ, and (c) follows.

Setting ∆_j = y_j − 1, we have by (d):

‖x‖_r^r = ∑_j (1 + ∆_j)^κ = ∑_j [ 1 + κ ∆_j − (1/2) κ(1−κ) (1 + ξ_j ∆_j)^{κ−2} ∆_j² ]


with properly chosen ξ_j ∈ (0, 1), whence, taking into account that 0 ≤ y_j ≤ n due to x ∈ S_n and that κ ∈ (0, 1),

‖x‖_r^r ≤ ∑_j [ 1 + κ ∆_j − (1/2) κ(1−κ) n^{κ−2} ∆_j² ] = n + κ ∑_j (y_j − 1) − (1/2) κ(1−κ) n^{κ−2} ∑_j ∆_j² ≤ n − (1/2) κ(1−κ) n^{κ−2} 2^{2−2p} γ²,

where the second term on the right is ≤ 0 by (2.3.15.a), and ∑_j ∆_j² ≥ ∆_{j*}² ≥ 2^{2−2p} γ² by (2.3.15.c); this is what is required in (2.3.13).

(2.3.13), case of p = ∞: Setting y_j = |x_j|, we have 0 ≤ y_j ≤ 1 due to x ∈ S_n; besides this, there exists j* such that y_{j*} ≤ 1 − γ due to x ∉ X_n^γ. We now have

‖x‖_r^r = ∑_j y_j^r ≤ (n − 1) + (1 − γ)^r = n − [1 − (1 − γ)^r],

as required in (2.3.13). Relation (2.3.13) is proven.

(2.3.14): We have ‖A_δ x‖_r = ‖x − δ(a^T x)a‖_r ≤ ‖x‖_r + δ|a^T x| ‖a‖_r ≤ ‖x‖_r + δ ‖a‖_2 ‖x‖_2 ‖a‖_r ≤ ‖x‖_r + δ ‖a‖_2 n^{7/2}, where the concluding inequality is given by the fact that |x_j| ≤ n for all j due to x ∈ S_n. Thus,

‖A_δ x‖_r ≤ ‖x‖_r + δ ‖a‖_2 n^{7/2}  and  ‖x‖_r ≤ n².    (2.3.16)

Invoking (2.3.12.b), we conclude that ‖A_δ x‖_r ≤ 2n² and ‖x‖_r ≤ n². On the segment 0 ≤ t ≤ 2n² the function f(t) = t^r is Lipschitz continuous with constant r(2n²)^{r−1}, so that (2.3.16) implies that

‖A_δ x‖_r^r ≤ ‖x‖_r^r + r (2n²)^{r−1} n^{7/2} ‖a‖_2 δ,

which combines with (2.3.13) to imply (2.3.14).

Step 3. Now we can complete the derivation of A.1. Let us set

γ = γ(a) = min[ 1/2, r θ^{−1}(r), (2‖a‖_1)^{−1}, (16 ‖a‖_2² n^{3/2})^{−1} ]    (2.3.17)

(see (2.3.4)). Further, let δ = δ(a) be a rational number of the form 2^{−k} with integer k such that

(1/4) min[ ν_{p,r}(γ(a)) / (2 r (2n²)^{r−1} n^{7/2} ‖a‖_2), (4 n^{3/2} ‖a‖_2²)^{−1}, (16 θ(r) ‖a‖_2⁴ n³)^{−1} ] ≤ δ(a) ≤ min[ ν_{p,r}(γ(a)) / (2 r (2n²)^{r−1} n^{7/2} ‖a‖_2), (4 n^{3/2} ‖a‖_2²)^{−1}, (16 θ(r) ‖a‖_2⁴ n³)^{−1} ]    (2.3.18)

(see (2.3.4) and (2.3.14)). By Lemma 2.1, for x ∈ S_n ∩ X_n^γ we have

‖A_δ x‖_r^r ≤ n − (r/4) δ,

while by Lemma 2.2, for x ∈ S_n \ X_n^γ we have

‖A_δ x‖_r^r ≤ n − (1/2) ν_{p,r}(γ(a)).


It follows that when Stones(a) is "negative", we have

C(A_δ) ≡ max_x { ‖A_δ x‖_r^r : ‖x‖_p ≤ n^{1/p} } ≤ n − ω(a),  ω(a) ≡ min[ (r/4) δ(a), (1/2) ν_{p,r}(γ(a)) ].    (2.3.19)

Choosing ε(a) = 2^{−k(a)} with integer k(a) in such a way that ω(a)/4 ≤ ε(a) ≤ ω(a), we arrive at A.1.

2.3.3 Demonstrating A.2

From the description of δ(a) and ε(a) as presented in the previous Section it is clear that both quantities are rational numbers of the form 2^{−k} with integer k ∈ [floor(log₂(s)) − 1, floor(log₂(s))], where s is given by an explicit formula, easy to compute in the Real Arithmetic model of computation, involving a, p, r and n = dim a. Note also that from the formulas for the quantities s underlying δ(a) and ε(a) it follows that 1 ≥ s ≥ c (n ‖a‖_∞)^{−d} with c > 0 and d > 0 depending solely on p, r. Since for evident reasons n is at most the binary length length(a) of the data of Stones(a) (indeed, every one of the n entries of the vector a requires at least one bit to be represented), we get 1 ≥ s ≥ c·length^{−2d}(a) with c > 0 depending solely on p, r. From these observations it is immediately seen that, given a, one can compute the quantities s underlying δ(a) and ε(a) within relative accuracy 0.25 in a number N(a) of bit operations which is polynomial in length(a); after the corresponding s is computed within relative accuracy 0.25, we can immediately recover δ(a) and ε(a) in a polynomial in length(a) number of bit-wise operations. After the rational δ(a) is computed, it clearly takes a polynomial in length(a) number of bit-wise operations to build the entries of the rational matrix A_{δ(a)} = I − δ(a) a a^T. A.2 is proven, and thus NP-hardness of P_{p,r} in the case of ∞ ≥ p > r ≥ 1 is demonstrated.


Chapter 3

Approximating ‖A‖_{p,r} in the case of 1 ≤ r ≤ 2 ≤ p ≤ ∞

In this Chapter, we refine the results of Nesterov, stated in Proposition 1.5, on approximating ‖A‖_{p,r} in the "good case", where 1 ≤ r ≤ 2 ≤ p ≤ ∞.

3.1 Semidefinite Relaxation bound on ‖A‖p,r

3.1.1 Derivation of the bound

Nesterov’s upper bound Ψ_{p,r}(A) on ‖A‖_{p,r} (see (1.3.7)) is readily given by the standard Semidefinite Relaxation scheme. The derivation is as follows.

Notation. For a k×k matrix M, let dg(M) denote the k-dimensional column vector comprised of the diagonal entries of M. For a vector y ∈ R^k, let |y| be the vector with the coordinates |y_i|, and [y]² the vector with the coordinates y_i². Finally, for 0 < s < 1 let ‖y‖_s be defined by exactly the same formula

‖y‖_s = ( ∑_i |y_i|^s )^{1/s}

as in the case of s ∈ [1, ∞); note that ‖y‖_s is not a norm when 0 < s < 1 – the only property of a norm which fails to hold in this case is the triangle inequality. Moreover, it is immediately seen that the function ‖y‖_s, 0 < s < 1, is concave in the domain y ≥ 0.


The construction. Let A ∈ R^{m×n}, and let A_i^T, i = 1, ..., m, be the rows of A. For 1 ≤ p, r ≤ ∞ we have by the definition of the induced norm:

‖A‖_{p,r} = max_x { ‖Ax‖_r : ‖x‖_p ≤ 1 }
 = max_x { √(‖[Ax]²‖_{r/2}) : ‖x‖_p ≤ 1 }    [since clearly ‖y‖_r = √(‖[y]²‖_{r/2})]
 = max_x { ( ∑_{i=1}^m (A_i^T x x^T A_i)^{r/2} )^{1/r} : √(‖dg(xx^T)‖_{p/2}) ≤ 1 }    [since ‖x‖_p = √(‖[x]²‖_{p/2}) = √(‖dg(xx^T)‖_{p/2})]
 = max_{x∈R^n} { √(‖dg(A(xx^T)A^T)‖_{r/2}) : ‖dg(xx^T)‖_{p/2} ≤ 1 }
 = max_{X∈S^n} { √(‖dg(AXA^T)‖_{r/2}) : ‖dg(X)‖_{p/2} ≤ 1, X = xx^T for some x ∈ R^n }    (a)
 ≤ max_X { √(‖dg(AXA^T)‖_{r/2}) : X ∈ S^n, X ⪰ 0, ‖dg(X)‖_{p/2} ≤ 1 },    (b)

where the concluding ≤ comes from the fact that a matrix of the form xx^T always is symmetric positive semidefinite, so that in passing from the maximization in (a) to the one in (b) we can only increase the optimal value. We arrive at the inequality

‖A‖_{p,r} ≤ Ψ_{p,r}(A) ≡ max_X { √(‖dg(AXA^T)‖_{r/2}) : X ∈ S^n_+, ‖dg(X)‖_{p/2} ≤ 1 },  S^n_+ = { X ∈ S^n : X ⪰ 0 }.    (3.1.1)

By construction, the resulting inequality is valid for all p, r ∈ [1, ∞]. Now consider the "good case" where 1 ≤ r ≤ 2 ≤ p ≤ ∞. Then the optimization problem in (3.1.1) is efficiently solvable. Indeed, since p ≥ 2, the feasible set of the problem is cut out of the "computationally tractable" convex set S^n_+ by the explicit convex constraint ‖dg(X)‖_{p/2} ≤ 1. Further, since r ≤ 2, the objective in our maximization problem is an explicit concave function of X ⪰ 0 (since ‖y‖_{r/2}, and then also √(‖y‖_{r/2}), is a concave function of y ≥ 0). As a result, the problem can be solved to any desired accuracy ε > 0 at an arithmetic cost (that is, number of elementary operations of exact Real Arithmetic) which is polynomial in the sizes m, n of A and the "number of accuracy digits" ln(m n ε^{−1} ∑_{i,j} |A_{ij}|) by, e.g., the Ellipsoid method (see, e.g., the general theory presented in [1], Chapter 5).

3.1.2 Processing the bound

The definition of the bound Ψ_{p,r}(A) as given by (3.1.1) looks different from definition (1.3.7) of the similarly denoted quantity in Chapter 1. We are about to prove that both definitions are equivalent to each other.

Theorem 3.1 Let A ∈ R^{m×n}, and let 1 ≤ r ≤ 2 ≤ p ≤ ∞. Then the quantity Ψ_{p,r}(A) given by (3.1.1) can be represented as

Ψ_{p,r}(A) = (1/2) min_{µ,ν} { ‖µ‖_{p/(p−2)} + ‖ν‖_{r/(2−r)} : [ Diag(µ)  A^T ; A  Diag(ν) ] ⪰ 0 }.    (3.1.2)

Proof. 1°. It is easily seen that both sides in (3.1.2) are continuous in r, p in the domain 1 ≤ r ≤ 2 ≤ p ≤ ∞, so that it suffices to prove the relation when 1 < r < 2 < p < ∞.

We start with the following


Lemma 3.1 Let s ∈ (2, ∞), and let s₊ = s/(s−2), s∗ = s/(s−1). Then for every nonnegative vector d ∈ R^m one has

inf_λ { ∑_i d_i/λ_i : λ > 0, ‖λ‖_{s₊} ≤ 1 } = ‖d‖_{s∗/2}.    (3.1.3)

Proof. The optimization problem in (3.1.3) is convex, so that every Karush-Kuhn-Tucker point of the problem is an optimal solution. Assuming w.l.o.g. that d_i > 0 for all i and rewriting the nonlinear constraint equivalently as ∑_i λ_i^{s₊} ≤ 1, a KKT point λ should satisfy the relations

d_i / λ_i² = γ s₊ λ_i^{s₊−1},

where γ ≥ 0 is the Lagrange multiplier of the nonlinear constraint. From these equations γ > 0, so that the nonlinear constraint is active at every KKT point. Thus, at a KKT point we should have λ_i = θ d_i^{1/(s₊+1)} with θ > 0 such that ∑_i λ_i^{s₊} = 1. It follows that

θ = ( ∑_i d_i^{s₊/(s₊+1)} )^{−1/s₊}

and

λ_i = d_i^{1/(s₊+1)} / ( ∑_i d_i^{s₊/(s₊+1)} )^{1/s₊}.

Reversing our computations, we see that the latter formulas indeed define a KKT point, and thus an optimal solution to the optimization problem in (3.1.3). The resulting optimal value is

∑_i d_i/λ_i = θ^{−1} ∑_i d_i^{s₊/(s₊+1)} = ( ∑_i d_i^{s₊/(s₊+1)} )^{(s₊+1)/s₊} = ‖d‖_{s₊/(s₊+1)} = ‖d‖_{s∗/2}

(note that s₊/(s₊+1) = s/(2s−2) = s∗/2).
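The closed-form solution is easy to verify numerically; the following sketch (ours) evaluates the objective of (3.1.3) at the KKT point and compares it with ‖d‖_{s∗/2} and with random feasible λ:

    import numpy as np

    rng = np.random.default_rng(0)
    s = 3.7
    s_plus, s_star = s / (s - 2), s / (s - 1)
    d = rng.random(6) + 0.1

    # KKT point: lambda_i ~ d_i^{1/(s_plus+1)}, normalized so sum lambda^{s_plus} = 1
    lam = d ** (1 / (s_plus + 1))
    lam /= (lam ** s_plus).sum() ** (1 / s_plus)

    value = (d / lam).sum()
    rhs = (d ** (s_star / 2)).sum() ** (2 / s_star)  # ||d||_{s*/2}
    print(value, rhs)  # the two numbers coincide

    for _ in range(3):  # random feasible lambda give larger objective values
        mu = rng.random(6) + 0.1
        mu /= (mu ** s_plus).sum() ** (1 / s_plus)
        assert (d / mu).sum() >= value - 1e-9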

2°. We now have

(Ψ_{p,r}(A))² = max_X { ‖dg(AXA^T)‖_{r/2} : X ∈ S^n, X ⪰ 0, ‖dg(X)‖_{p/2} ≤ 1 }    [by (3.1.1)]
 = max_{X⪰0, ‖dg(X)‖_{p/2}≤1} inf_{λ>0, ‖λ‖_{r/(2−r)}≤1} ∑_i (AXA^T)_{ii}/λ_i    [Lemma 3.1 with s = r/(r−1)],

whence, invoking the von Neumann Lemma,

(Ψ_{p,r}(A))² = inf_{Λ=Diag(λ)≻0, ‖λ‖_{r/(2−r)}≤1} max_{X⪰0, ‖dg(X)‖_{p/2}≤1} Tr(AXA^T Λ^{−1}) = inf_{Λ=Diag(λ)≻0, ‖λ‖_{r/(2−r)}≤1} max_{X⪰0, ‖dg(X)‖_{p/2}≤1} Tr(X [A^T Λ^{−1} A]).    (3.1.4)

Lemma 3.2 Let B ∈ S^n, let Q be a k×n matrix of rank n, and let s ∈ [1, ∞], s∗ = s/(s−1). Then

max_X { Tr(XB) : X ⪰ 0, ‖dg(QXQ^T)‖_s ≤ 1 } = min_ζ { ‖ζ‖_{s∗} : Q^T Diag(ζ) Q ⪰ B } = min_ζ { ‖ζ‖_{s∗} : Q^T Diag(ζ) Q ⪰ B, ζ ≥ 0 }.    (3.1.5)


Proof. Since Q is of rank n, both sides in the first equality in (3.1.5) are continuous in s, so that it suffices to prove this equality in the case of s ∈ (1, ∞). When ζ is such that Q^T Diag(ζ) Q ⪰ B, then for every X ⪰ 0 with ‖dg(QXQ^T)‖_s ≤ 1 one has

Tr(XB) ≤ Tr(X Q^T Diag(ζ) Q) = Tr(QXQ^T Diag(ζ)) = ζ^T dg(QXQ^T) ≤ ‖dg(QXQ^T)‖_s ‖ζ‖_{s∗} ≤ ‖ζ‖_{s∗}

(we have used the Hölder inequality). Thus, the value ‖ζ‖_{s∗} of the objective of the optimization problem in the right hand side of the first equality in (3.1.5) – let this problem be called (P) – is ≥ the value of the objective of the left hand side problem at every feasible solution X of the latter problem, so that the left hand side in the first equality in (3.1.5) is ≤ the right hand side. To prove the inverse inequality, note that if X∗ is an optimal solution to the (clearly solvable, due to Rank(Q) = n) optimization problem in the left hand side of (3.1.5), then Tr(X∗B) ≥ 0 and therefore we may assume w.l.o.g. that ‖dg(QX∗Q^T)‖_s = 1. By the optimality conditions, there exists ω ≥ 0 such that the matrix

G ≡ ∇_X [ −Tr(XB) + ω ‖dg(QXQ^T)‖_s ] |_{X=X∗} ∈ S^n

satisfies the relation

⟨G, X − X∗⟩_{S^n} ≡ Tr(G(X − X∗)) ≥ 0  ∀ X ⪰ 0.

Setting X = X∗ + Y with Y ⪰ 0, we see that Tr(GY) ≥ 0 whenever Y ⪰ 0, whence G ⪰ 0; in particular, Tr(GX∗) ≥ 0 due to X∗ ⪰ 0. Setting X = 0, we get −Tr(GX∗) ≥ 0, while, as we have just seen, Tr(GX∗) ≥ 0. We see that

G ⪰ 0 and Tr(GX∗) = 0.    (3.1.6)

By the origin of G, we have

G = −B + ω Q^T Diag(ζ∗) Q,    (3.1.7)

where ζ∗ is the gradient of ‖·‖_s at the point dg(QX∗Q^T) ≥ 0; it follows that ζ∗ satisfies

ζ∗ ≥ 0,  ‖ζ∗‖_{s∗} = 1,  ζ∗^T dg(QX∗Q^T) = ‖dg(QX∗Q^T)‖_s = 1.    (3.1.8)

From (3.1.7) and the first relation in (3.1.6) it follows that

Q^T Diag(ωζ∗) Q ⪰ B,

while the second relation in (3.1.6) combines with (3.1.7) to imply the first equality in the following chain:

Tr(X∗B) = ω Tr(X∗ Q^T Diag(ζ∗) Q) = ω Tr([QX∗Q^T] Diag(ζ∗)) = ω ζ∗^T dg(QX∗Q^T) = ω,

where the second and the third equalities are evident and the fourth one is given by (3.1.8). Thus, the left hand side in (3.1.5) is equal to ω. Now, by (3.1.8), ζ ≡ ωζ∗ is a feasible solution to the minimization problem (P) in the right hand side of (3.1.5), and the value of the objective at this solution, by the same (3.1.8), equals ω, that is, the left hand side of (3.1.5). Since (P) is a minimization problem, the left hand side in (3.1.5) is ≥ the right hand side. We have seen that the opposite inequality also holds true, thus, the first equality


in (3.1.5) is true and ωζ∗ is an optimal solution to (P). This solution is nonnegative by (3.1.8), which implies the second equality in (3.1.5).

3°. In view of Lemma 3.2 (where one sets Q = I_n, B = A^T Λ^{−1} A and s = p/2), relation (3.1.4) implies that

(Ψ_{p,r}(A))² = inf_{Λ=Diag(λ)≻0, ‖λ‖_{r/(2−r)}≤1} max_{X⪰0, ‖dg(X)‖_{p/2}≤1} Tr(X [A^T Λ^{−1} A])
 = inf_{Λ=Diag(λ)≻0, ‖λ‖_{r/(2−r)}≤1} min_{ζ≥0} { ‖ζ‖_{p/(p−2)} : Diag(ζ) ⪰ A^T Λ^{−1} A }
 = inf_{λ>0, ζ≥0} { ‖ζ‖_{p/(p−2)} : [ Diag(ζ)  A^T ; A  Diag(λ) ] ⪰ 0, ‖λ‖_{r/(2−r)} ≤ 1 },

where the concluding equality is given by the Schur Complement Lemma:

A symmetric block matrix [ P  Q^T ; Q  R ] with R ≻ 0 is positive semidefinite iff the matrix P − Q^T R^{−1} Q is so.

We have arrived at the relation

(Ψ_{p,r}(A))² = inf_{λ>0, ζ≥0} { ‖ζ‖_{p/(p−2)} : [ Diag(ζ)  A^T ; A  Diag(λ) ] ⪰ 0, ‖λ‖_{r/(2−r)} ≤ 1 }.

The optimal value in the right hand side problem clearly is the same as in the problem where the constraint ‖λ‖_{r/(2−r)} ≤ 1 is replaced with ‖λ‖_{r/(2−r)} = 1, so that

(Ψ_{p,r}(A))² = inf_{λ>0, ζ≥0} { ‖ζ‖_{p/(p−2)} : [ Diag(ζ)  A^T ; A  Diag(λ) ] ⪰ 0, ‖λ‖_{r/(2−r)} = 1 }.    (3.1.9)

We are about to prove that the latter relation implies our target equality (3.1.2). To this end, note that the optimization problem in the right hand side of (3.1.2) clearly has the same optimal value Opt(P) as the optimization problem

min_{µ≥0, ν>0} { ‖µ‖_{p/(p−2)} + ‖ν‖_{r/(2−r)} : [ Diag(µ)  A^T ; A  Diag(ν) ] ⪰ 0 }.    (P)

Now, if (µ, ν) is a feasible solution to (P), then, by the Schur Complement Lemma, so is the "scaled solution" (θ^{−1}µ, θν), where θ > 0. It follows that the feasible set of (P) is exactly the set of all scalings of "normalized" feasible solutions to the problem, that is, of feasible solutions (ζ, λ) with ‖λ‖_{r/(2−r)} = 1. In other words,

Opt(P) = inf_{θ>0, ζ≥0, λ>0} { θ^{−1} ‖ζ‖_{p/(p−2)} + θ : [ Diag(ζ)  A^T ; A  Diag(λ) ] ⪰ 0, ‖λ‖_{r/(2−r)} = 1 }
 = inf_{ζ≥0, λ>0} { inf_{θ>0} [ θ^{−1} ‖ζ‖_{p/(p−2)} + θ ] : [ Diag(ζ)  A^T ; A  Diag(λ) ] ⪰ 0, ‖λ‖_{r/(2−r)} = 1 }
 = inf_{ζ≥0, λ>0} { 2 √(‖ζ‖_{p/(p−2)}) : [ Diag(ζ)  A^T ; A  Diag(λ) ] ⪰ 0, ‖λ‖_{r/(2−r)} = 1 }
 = 2 Ψ_{p,r}(A),


where the concluding equality is given by (3.1.9). The resulting equality Opt(P) = 2Ψ_{p,r}(A) is nothing but the target relation (3.1.2), since, as we remember, the right hand side in (3.1.2) equals (1/2)Opt(P).

Remark 3.1 An important feature of the representation (3.1.2) is that the Matrix Inequality constraint in the right hand side of (3.1.2) is affine in (µ, ν, A). As a result, not only is the bound Ψ_{p,r}(A) efficiently computable – it also can be optimized efficiently in A. Specifically, assume that A = A[u] depends affinely on design parameters u varying in a set U given by an explicit system of convex inequalities (S). Then the problem of minimizing Ψ_{p,r}(A[u]) over u ∈ U is the explicit convex optimization problem

min_{u,µ,ν} { (1/2) [ ‖µ‖_{p/(p−2)} + ‖ν‖_{r/(2−r)} ] : [ Diag(µ)  A^T[u] ; A[u]  Diag(ν) ] ⪰ 0, u ∈ U }.

Similarly, an upper bound Ψ_{p,r}(A) ≤ t on Ψ_{p,r}(A) defines a computationally tractable convex set in the space of pairs (A, t), so that one can efficiently optimize convex objectives under this constraint.
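To illustrate the Remark, here is a hedged CVXPY sketch of the right hand side of (3.1.2) (ours; it assumes 2 < p < ∞ and 1 < r < 2, so that the norm indices p/(p−2) and r/(2−r) are finite – the degenerate cases p = ∞, r = 1 give index 1):

    import cvxpy as cp
    import numpy as np

    def psi_dual(A, p, r):
        # representation (3.1.2): minimize (1/2)(||mu||_{p/(p-2)} + ||nu||_{r/(2-r)})
        # over the LMI [[Diag(mu), A^T], [A, Diag(nu)]] >= 0, affine in (mu, nu, A)
        m, n = A.shape
        mu, nu = cp.Variable(n), cp.Variable(m)
        lmi = cp.bmat([[cp.diag(mu), A.T],
                       [A, cp.diag(nu)]]) >> 0
        obj = 0.5 * (cp.norm(mu, p / (p - 2)) + cp.norm(nu, r / (2 - r)))
        cp.Problem(cp.Minimize(obj), [lmi]).solve()
        return obj.value

    A = np.random.randn(4, 5)
    print(psi_dual(A, p=4.0, r=1.5))  # agrees with the primal value of (3.1.1)

Since the LMI is affine in A, replacing the constant A above by an affine CVXPY expression A[u] of design variables u (plus convex constraints describing u ∈ U) turns this sketch into the design problem of the Remark.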

Remark 3.2 Theorem 3.1 demonstrates that the bound Ψ_{p,r}(A) is "intelligent enough" to recognize the identity

‖A‖_{p,r} = ‖A^T‖_{r∗,p∗},  p∗ = p/(p−1),  r∗ = r/(r−1)

(cf. (1.2.9)). Indeed, when 1 ≤ r ≤ 2 ≤ p ≤ ∞, we have 1 ≤ p∗ ≤ 2 ≤ r∗ ≤ ∞ as well, and (3.1.2) as applied to A and to A^T results in

Ψ_{p,r}(A) = min_{µ,ν} { (1/2) [ ‖µ‖_{p/(p−2)} + ‖ν‖_{r/(2−r)} ] : [ Diag(µ)  A^T ; A  Diag(ν) ] ⪰ 0 },    (∗)
Ψ_{r∗,p∗}(A^T) = min_{µ,ν} { (1/2) [ ‖µ‖_{r∗/(r∗−2)} + ‖ν‖_{p∗/(2−p∗)} ] : [ Diag(µ)  A ; A^T  Diag(ν) ] ⪰ 0 }.    (∗∗)

Since the symmetric block matrices [ P  Q^T ; Q  R ] and [ R  Q ; Q^T  P ], P ∈ S^k, R ∈ S^ℓ, are rotations of each other:

[ P  Q^T ; Q  R ] = S^T [ R  Q ; Q^T  P ] S,  S = [ 0  I_ℓ ; I_k  0 ],

a matrix of the form [ Diag(µ)  A^T ; A  Diag(ν) ] is ⪰ 0 if and only if the matrix [ Diag(ν)  A ; A^T  Diag(µ) ] is so. Taking into account that p∗/(2−p∗) = p/(p−2) and r∗/(r∗−2) = r/(2−r), we see that problems (∗) and (∗∗) differ only in notation, so that

Ψ_{p,r}(A) = Ψ_{r∗,p∗}(A^T).    (3.1.10)

3.2 Quality of the bound

We are about to present one of the major results of our Thesis, which quantifies the quality of the bound Ψ_{p,r}(A).


3.2.1 The idea

The idea underlying our derivation originates from [1] and is very simple. The bound Ψ_{p,r}(A) as given by (3.1.1) is the optimal value in an optimization problem with a symmetric positive semidefinite n×n matrix X in the role of the design variable. Treating an optimal solution X∗ of the problem as the covariance matrix of a Gaussian n-dimensional random vector ζ with zero mean, let us look at the quantities

Q_A(r) = E{ ‖Aζ‖_r² },  q_A(p) = E{ ‖ζ‖_p² },    (3.2.1)

where E stands for expectation. Assuming X∗ ≠ 0 (this is the only nontrivial case) and setting

ψ_{p,r}(A) = √( Q_A(r)/q_A(p) ),    (3.2.2)

we get

E{ ‖Aζ‖_r² − ψ²_{p,r}(A) ‖ζ‖_p² } = 0,

so that there exists a realization ζ̄ ≠ 0 of ζ such that ‖Aζ̄‖_r² − ψ²_{p,r}(A) ‖ζ̄‖_p² ≥ 0, whence ‖A‖_{p,r} ≥ ψ_{p,r}(A). Thus, the quantity ψ_{p,r}(A) is a lower bound on ‖A‖_{p,r}. As we shall see, this lower bound is "not too far" from the upper bound Ψ_{p,r}(A) on the same quantity, so that Ψ_{p,r}(A) is a "not too bad" upper bound on ‖A‖_{p,r}.

3.2.2 The main result

To implement the outlined plan as applied to a given matrix A ∈ R^{m×n}, we may w.l.o.g. assume that A ≠ 0 (indeed, in the case of A = 0 we clearly have Ψ_{p,r}(A) = 0, so that in this case the upper bound on ‖A‖_{p,r} definitely is precise).

Preliminaries. As we have already mentioned, the optimization problem in (3.1.1), that is, the problem

[Ψ_{p,r}(A) ≡] Opt(P) = max_X { √(‖dg(AXA^T)‖_{r/2}) : X ∈ S^n, X ⪰ 0, ‖dg(X)‖_{p/2} ≤ 1 },    (3.2.3)

clearly is solvable. We denote by X∗ an optimal solution to the problem. Since A ≠ 0, we clearly have Opt(P) > 0, so that X∗ ≠ 0. In the sequel, we denote by ζ a Gaussian n-dimensional random vector with zero mean and covariance matrix X∗, and set η = Aζ. Note that η is a Gaussian m-dimensional random vector with zero mean and covariance matrix

E{ηη^T} = E{Aζζ^T A^T} = A E{ζζ^T} A^T = A X∗ A^T.

We set

σ_j = √((X∗)_{jj}), j = 1, ..., n  [⇒ ζ_j ∼ N(0, σ_j²)],
ρ_i = √((A X∗ A^T)_{ii}), i = 1, ..., m  [⇒ η_i ∼ N(0, ρ_i²)].

Finally, for s > 0 let

α(s) = ( ∫_{−∞}^{∞} |t|^s (1/√(2π)) exp{−t²/2} dt )^{1/s} = √2 ( Γ((s+1)/2) / √π )^{1/s},  0 < s < ∞,    (3.2.4)

where Γ(t) = ∫_0^∞ x^{t−1} exp{−x} dx is the Euler Γ-function. Note that α(s) is a continuous nondecreasing function of s ∈ (0, ∞) such that α(1) = √(2/π), α(2) = 1 and α(s) = (1 + o(1)) √(s/e) as s → ∞.
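The function α(s) is straightforward to evaluate through the Γ-function; here is a small Python sketch (ours) checking the listed values and the asymptotics:

    import numpy as np
    from scipy.special import gammaln

    def alpha(s):
        # alpha(s) = (E|t|^s)^{1/s}, t ~ N(0,1), via (3.2.4):
        # alpha(s) = sqrt(2) * (Gamma((s+1)/2) / sqrt(pi))^{1/s}
        return np.sqrt(2.0) * np.exp((gammaln((s + 1) / 2) - 0.5 * np.log(np.pi)) / s)

    print(alpha(1.0), np.sqrt(2 / np.pi))  # both 0.79788...
    print(alpha(2.0))                      # 1.0
    s = 1e4
    print(alpha(s) / np.sqrt(s / np.e))    # tends to 1 as s grows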

Bounding q_A(p) from above. 1) Assuming 2 ≤ p < ∞, we have

q_A(p) ≡ E{ ‖ζ‖_p² } = E{ ( ∑_j |ζ_j|^p )^{2/p} }
 ≤ ( E{ ∑_j |ζ_j|^p } )^{2/p}    [by Jensen's Inequality, since t^{2/p} is concave in t ≥ 0]
 = ( ∑_j σ_j^p α^p(p) )^{2/p} = α²(p) ( ∑_j σ_j^p )^{2/p} = α²(p) ‖dg(X∗)‖_{p/2}    [since σ_j² = (X∗)_{jj}]
 ≤ α²(p)    [since X∗ is feasible for (3.2.3)].

We arrive at the relation

q_A(p) ≤ α²(p),  2 ≤ p < ∞.    (3.2.5)

2) We can write ζ_j = σ_j ξ_j with ξ_j ∼ N(0, 1). Then

q_A(p) ≡ E{ ( ∑_{j=1}^n σ_j^p |ξ_j|^p )^{2/p} } ≤ E{ ( ∑_{j=1}^n σ_j^p )^{2/p} ( max_{1≤j≤n} |ξ_j| )² }
 = ( ∑_j (X∗)_{jj}^{p/2} )^{2/p} E{ max_{1≤j≤n} ξ_j² }    [since σ_j² = (X∗)_{jj}]
 ≤ E{ max_{1≤j≤n} ξ_j² }    [since X∗ is feasible for (3.2.3)].

It is well known that if ξ_j, j = 1, ..., n, are N(0, 1) random variables, perhaps dependent, then

E{ max_{1≤j≤n} ξ_j² } ≤ 2 ln(n + 1).

Thus,

A ∈ R^{m×n} ⇒ q_A(p) ≤ Υ²(p, n),  Υ(p, n) = min[ α(p), √(2 ln(n + 1)) ].    (3.2.6)


Bounding Q_A(r) from below. We have

Q_A(r) = E{ ‖η‖_r² } = E{ ( ∑_i |η_i|^r )^{2/r} }
 ≥ ( E{ ∑_i |η_i|^r } )^{2/r}    [by Jensen's Inequality, since t^{2/r} is convex in t ≥ 0]
 = ( ∑_i ρ_i^r α^r(r) )^{2/r}    [since η_i ∼ N(0, ρ_i²)]
 = α²(r) ‖dg(A X∗ A^T)‖_{r/2} = α²(r) Opt²(P)    [since X∗ is optimal for (P)],

whence

Q_A(r) ≥ α²(r) Opt²(P).    (3.2.7)

The result. From (3.2.6) and (3.2.7) it follows that

ψ_{p,r}(A) ≡ √( Q_A(r)/q_A(p) ) ≥ ( α(r)/Υ(p, n) ) Opt(P).    (3.2.8)

In fact, we have proved the following

Theorem 3.2 Let 1 ≤ r ≤ 2 ≤ p ≤ ∞ and A ∈ R^{m×n}. Then

‖A‖_{p,r} ≤ Ψ_{p,r}(A) ≤ Θ(p, n, r, m) ‖A‖_{p,r},  Θ(p, n, r, m) ≡ min[ Υ(p, n)/α(r), Υ(r∗, m)/α(p∗) ],  p∗ = p/(p−1),  r∗ = r/(r−1),    (3.2.9)

where α(·) is given by (3.2.4) and

Υ(s, k) = min[ α(s), √(2 ln(k + 1)) ].

Proof. The left inequality in (3.2.9) states that Ψ_{p,r}(A) is an upper bound on ‖A‖_{p,r}, which indeed is the case. We have seen that ψ_{p,r}(A) is a lower bound on ‖A‖_{p,r}, whence (3.2.8) implies that Ψ_{p,r}(A) ≤ (Υ(p, n)/α(r)) ‖A‖_{p,r}. Applying this inequality to A^T, r∗, p∗ in the role of A, p, r, respectively, we get Ψ_{r∗,p∗}(A^T) ≤ (Υ(r∗, m)/α(p∗)) ‖A^T‖_{r∗,p∗}. Taking into account that ‖A‖_{p,r} = ‖A^T‖_{r∗,p∗} (see (1.2.9)) and that Ψ_{p,r}(A) = Ψ_{r∗,p∗}(A^T) (see (3.1.10)), we conclude that Ψ_{p,r}(A) ≤ (Υ(r∗, m)/α(p∗)) ‖A‖_{p,r}. We have proved the right inequality in (3.2.9).

3.2.3 Discussion

In light of Theorem 3.2, the quality of the efficiently computable upper bound Ψ_{p,r}(A) on ‖A‖_{p,r}, 1 ≤ r ≤ 2 ≤ p ≤ ∞, A ∈ R^{m×n}, can be described as follows.

1. Recall that α(s) grows with s > 0 and Υ(s, k) grows with s. It follows that the tightness factor Θ(p, n, r, m) of the bound grows with p ∈ [2, ∞] and decreases with r ∈ [1, 2], so that it is never worse than in the case of p = ∞, r = 1. Even in this worst case the tightness factor is not that bad – it increases pretty slowly with m and n:

Θ(p, n, r, m) ≤ Θ(∞, n, 1, m) = √( π ln(min[m + 1, n + 1]) ).    (3.2.10)


Remark 3.3 In fact there is no quality deterioration at all as m, n → ∞, since by Nesterov's result (Proposition 1.5) we have

Ψ_{p,r}(A) ≤ Θ∗ ‖A‖_{p,r},  Θ∗ ≡ 1/( 2√3/π − 2/3 ) = 2.2936...,

for all m, n. In view of this inequality, (3.2.9) can in fact be strengthened to

‖A‖_{p,r} ≤ Ψ_{p,r}(A) ≤ Θ̄(p, n, r, m) ‖A‖_{p,r},  Θ̄(p, n, r, m) ≡ min[ Υ(p, n)/α(r), Υ(r∗, m)/α(p∗), Θ∗ ],  p∗ = p/(p−1),  r∗ = r/(r−1),    (3.2.11)

and Θ̄(p, n, r, m) clearly is bounded in the domain 1 ≤ r ≤ 2 ≤ p ≤ ∞ uniformly in m, n.

2. When either p is bounded away from ∞, or r is bounded away from 1, the tightness factor Θ is uniformly bounded in the remaining parameters:

[2 ≤] min[ p, r/(r−1) ] ≤ σ < ∞ ⇒ Θ(p, n, r, m) ≤ √(πσ).    (3.2.12)

3. When p → 2 + 0 and r → 2 − 0, the tightness factor Θ(p, n, r, m) tends to 1 uniformly in m, n, and Θ(2, n, 2, m) = 1, so that Ψ_{2,2}(A) = ‖A‖_{2,2} for all A.

Comparison with known results on the quality of the bound. As explained in Chapter 1, Section 1.3.3, known results on the quality of the bound Ψ_{p,r}(A) deal with three situations:

A. p = ∞, r = 2 (and the symmetric situation p = 2, r = 1), where Proposition 1.4 states that

Ψ_{∞,2}(A) ≤ √(π/2) ‖A‖_{∞,2};    (3.2.13)

B. The general "good case" 1 ≤ r ≤ 2 ≤ p ≤ ∞, where Proposition 1.5 states that

Ψ_{p,r}(A) ≤ Θ∗ ‖A‖_{p,r};    (3.2.14)

C. The case of 2 ≤ p < ∞, r = 2, where Proposition 1.5 states, in addition to (3.2.14), that

Ψ_{p,2}(A) ≤ √(p − 1) ‖A‖_{p,2}.    (3.2.15)

Case A: Observe that Θ(p, n, r, m) ≤ Υ(r∗, m)/α(p∗) ≤ α(r∗)/α(p∗), whence

Θ(p, n, 2, m) ≤ α(2)/α(p∗) = 1/α(p∗).    (A)

When p grows from 2 to ∞, α(p∗) decreases from α(2) = 1 to α(1) = √(2/π), so that the right hand side in (A) grows from 1 to √(π/2). Thus, Theorem 3.2 recovers Nesterov's result (3.2.13) and, moreover, extends it from the case of p = ∞, r = 2 to the case of p ∈ [2, ∞], r = 2, with the right hand side in (A) being smaller the smaller p is.

Case B: While the technique underlying Theorem 3.2 is unable to recover (3.2.14), the Theorem still improves the latter relation in a reasonably large range of values of p, r, namely, when p or r are not too far from 2. Specifically, we have seen that as p → 2 + 0, the tightness factor


Θ(p, n, r, m) converges to 1 uniformly in the remaining parameters, and similarly for r → 2 − 0. In contrast to this, the tightness factor of the bound Ψ_{p,r}(A) as given by (3.2.14) is independent of p, r.

Case C: Since Θ(p, n, 2, m) ≤ α(p)/α(2) = α(p), Theorem 3.2 states that

p ≥ 2 ⇒ Ψ_{p,2}(A) ≤ α(p) ‖A‖_{p,2}.

When p > 2, the ratio α(p)/√(p − 1), while being bounded away from 0, is < 1:

[Figure: plot of the ratio α(p)/√(p − 1) versus p ≥ 2 (shown for p up to 100); the ratio stays between 0.6 and 1.]

Thus, in the case in question Theorem 3.2 slightly improves (3.2.15), and that improvement is uniform in m, n.

3.2.4 Evaluating tightness of the bound

Theorem 3.2 presents a guaranteed upper bound Θ(p, n, r, m) on the ratio Θ_{p,r}(A) = Ψ_{p,r}(A)/‖A‖_{p,r}; the actual ratio depends on A, and a natural question is how one could evaluate this ratio for a given A. Here one can use the same idea which led us to Theorem 3.2: generate a given large number N, say, N = 100 or N = 1000, of realizations ζ^ℓ, ℓ = 1, ..., N, of the random vector ζ considered in Section 3.2.1, that is, of the Gaussian random vector in R^n with zero mean and covariance matrix X∗, an optimal solution to the optimization problem in (3.1.1) (in actual computations, of course, X∗ will be a high-accuracy approximation of this optimal solution). Since every one of the ratios ρ_ℓ = ‖Aζ^ℓ‖_r/‖ζ^ℓ‖_p is a lower bound on ‖A‖_{p,r}, so is the (random) quantity

ρ^N = max_ℓ ρ_ℓ,

whence the quantity

Θ^N(A) = Ψ_{p,r}(A)/ρ^N

is a valid (random) upper bound on the ratio Θ(A) = Ψ_{p,r}(A)/‖A‖_{p,r} to be evaluated; Θ^N(A) is the "quality guarantee" yielded by the outlined randomized algorithm. From the reasoning which led us to Theorem 3.2 (see (3.2.6) – (3.2.8)) we know that

E{ ‖Aζ‖_r² − (α(r)/Υ(p, n))² Ψ²_{p,r}(A) ‖ζ‖_p² } ≥ 0,

≤ 0,


so that with N large it is highly improbable that all the ratios ρ_ℓ will be smaller than (α(r)/Υ(p, n)) Ψ_{p,r}(A). In other words, we have all reasons to expect that the quality guarantee Θ^N(A) will typically be better (that is, smaller) than the theoretical quality guarantee Θ(p, n, r, m).

Note that this construction is very similar to the famous MAXCUT-related "random hyperplane" technique of Goemans and Williamson [18]. The latter technique is designed for the case of p = ∞, r = 1 and uses, instead of the ratios ρ_ℓ, the ratios

ρ̄_ℓ = ‖Aζ̄^ℓ‖_1/‖ζ̄^ℓ‖_∞ = ‖Aζ̄^ℓ‖_1,

where ζ̄^ℓ is the vector with the coordinates sign(ζ^ℓ_j).

There are several ways to improve the practical behaviour of the outlined randomized bounding scheme for Θ_{p,r}(A), namely, as follows (a small computational sketch follows this list):

1) We can look at both the ratios ρ_ℓ and ρ̄_ℓ and use, as the lower bound on ‖A‖_{p,r}, the quantity

ρ̄^N = max_ℓ max[ ρ_ℓ, ρ̄_ℓ ].

Moreover, we could further enrich the family of "test vectors" z (those which we use to compute the lower bounds ‖Az‖_r/‖z‖_p on ‖A‖_{p,r}), including in the family all the vectors [ζ^ℓ]^α with coordinates sign(ζ^ℓ_j) |ζ^ℓ_j|^α, where α runs through a finite grid on [0, 1], say, the grid {0, 0.1, 0.2, ..., 1}. Note that α = 1 results in [ζ^ℓ]^α = ζ^ℓ, while α = 0 results in [ζ^ℓ]^α = ζ̄^ℓ.

2) We can use our randomized scheme to bound from below both ‖A‖_{p,r} and ‖A^T‖_{r∗,p∗} (here, as always, s∗ = s/(s−1)), and take, as the resulting lower bound on ‖A‖_{p,r}, the best (the largest) of the results yielded by these two computations. Although the quantities to be bounded from below in these two computations are identical, the computations themselves are completely different, and it may happen that the second of them yields much better results than the first.

With the outlined modifications, our randomized scheme for bounding ‖A‖_{p,r} from below seems to demonstrate that the ratio Ψ_{p,r}(A)/‖A‖_{p,r} usually is much closer to 1 than its theoretical upper bound Θ(p, n, r, m) (or even the improved bound Θ̄(p, n, r, m), see (3.2.11)). To get an impression of this phenomenon, we present here the histogram of the observed upper bounds on


the ratio Ψ_{∞,1}(A)/‖A‖_{∞,1} in a sample of 100 randomly generated 10×10 matrices:

[Figure: histogram of the observed upper bounds on Ψ_{∞,1}(A)/‖A‖_{∞,1} for 100 randomly generated 10×10 matrices; all observed values lie between 1 and roughly 1.18.]

We see that the observed ratios are much better than their theoretical upper bounds

Θ(∞, 10, 1, 10) = 2.7446...,  Θ̄(∞, 10, 1, 10) = 2.2936...

3.3 Exactness of the bound for nonnegative matrices

There is a simple and important case where the upper bound Ψ_{p,r}(A) on ‖A‖_{p,r} in fact coincides with ‖A‖_{p,r} – this is the case when all entries of A are nonnegative or, slightly more generally, when there exist diagonal matrices L, R with diagonal entries ±1 such that the entries of LAR are nonnegative.

Theorem 3.3 Let 1 ≤ r ≤ 2 ≤ p ≤ ∞, let A ∈ R^{m×n}, and let there exist diagonal matrices L ∈ S^m, R ∈ S^n with diagonal entries ±1 such that the entries of LAR are nonnegative. Then

Ψ_{p,r}(A) = ‖A‖_{p,r}.    (3.3.1)

Proof. Let Ā = LAR, so that Ā is a matrix with nonnegative entries; note that A = LĀR, since L² = I_m, R² = I_n. Let also X∗ be an optimal solution to the optimization problem in (3.1.1), so that

(a) ‖dg(X∗)‖_{p/2} ≤ 1,
(b) Ψ_{p,r}(A) = √(‖dg(A X∗ A^T)‖_{r/2}) = √(‖dg(LĀR X∗ R Ā^T L)‖_{r/2}) = √(‖dg(Ā R X∗ R Ā^T)‖_{r/2}).    (3.3.2)

Let x ∈ R^n be the vector with the coordinates x_j = √((X∗)_{jj}), j = 1, ..., n; note that (3.3.2.a) says that ‖x‖_p ≤ 1. Note that since the matrix Y = R X∗ R is ⪰ 0, we have Z_{ij} ≡ |Y_{ij}| ≤ √(Y_{ii} Y_{jj}) = √((X∗)_{ii} (X∗)_{jj}) = x_i x_j. Taking into account that Ā is nonnegative, we conclude that

0 ≤ dg(Ā Y Ā^T) ≤ dg(Ā Z Ā^T) ≤ dg(Ā xx^T Ā^T) = [Āx]²,

39

Page 48: COMPUTATION OF MATRIX NORMS WITH APPLICATIONS TO …nemirovs/Daureen.pdf1.1 Matrix Norm problem: setting and motivation 1.1.1 Matrix Norm problem. In the Thesis, we focus on the Matrix

and thus

‖Āx‖_r = √(‖dg(Ā xx^T Ā^T)‖_{r/2}) ≥ √(‖dg(Ā Y Ā^T)‖_{r/2}) = Ψ_{p,r}(A),

where the concluding equality is (3.3.2.b). Thus, ‖x‖_p ≤ 1 and ‖Āx‖_r ≥ Ψ_{p,r}(A), whence, setting y = Rx and taking into account that R, L are diagonal matrices with diagonal entries ±1, we get ‖y‖_p ≤ 1 and ‖Ay‖_r = ‖[LĀR][Rx]‖_r = ‖LĀx‖_r = ‖Āx‖_r ≥ Ψ_{p,r}(A). Thus, we have built a vector y with ‖y‖_p ≤ 1 such that ‖Ay‖_r ≥ Ψ_{p,r}(A), whence ‖A‖_{p,r} ≥ Ψ_{p,r}(A). Since the opposite inequality is always true, we conclude that Ψ_{p,r}(A) = ‖A‖_{p,r}.

Remark 3.4 In fact, for a matrix A of the structure considered in Theorem 3.3, the norm ‖A‖_{p,r} is efficiently computable in the range 1 ≤ r ≤ p < ∞ of values of p, r, which is wider, as far as finite r, p are concerned, than the range considered in Theorem 3.3. Specifically, in the notation from the proof of the Theorem, we clearly have ‖A‖_{p,r} = ‖Ā‖_{p,r} and Ā_{ij} = |A_{ij}|; thus, when computing the norm of A, we can assume w.l.o.g. that A_{ij} ≥ 0. In this case we clearly have ‖Ax‖_r ≤ ‖A|x|‖_r, where |x| is the vector with the coordinates |x_j|; note that ‖x‖_p = ‖|x|‖_p. In other words,

‖A‖_{p,r}^r = max_{x: ‖x‖_p ≤ 1} ‖A|x|‖_r^r = max_{u ≥ 0, ∑_i u_i ≤ 1} ∑_i ( ∑_j A_{ij} u_j^{1/p} )^r    [u_j = |x_j|^p].

Since p ≥ r and A_{ij} ≥ 0, the function

∑_i ( ∑_j A_{ij} u_j^{1/p} )^r

is concave on the domain u ≥ 0, and therefore it can be efficiently maximized over the simplex {u ≥ 0, ∑_j u_j ≤ 1}; thus, ‖A‖_{p,r} is efficiently computable (as the optimal value of an efficiently solvable convex optimization problem).


Chapter 4

Approximating ‖A‖_{p,r} in the entire range of p, r

Recall that the approximation results for the problem P_{p,r} presented so far deal with a certain sub-region R of the entire range R₊ = {(p, r) : 1 ≤ p, r ≤ ∞} of the parameters p, r. In this Chapter, we "interpolate" these results from R to R₊. To avoid trivialities, from now on we assume that the matrix A in question is nonzero.

4.1 Developing tools

It is convenient for us to pass from the parameters p, r defining the induced norm ‖A‖_{p,r} of an m×n real matrix A to the parameters α = 1/p, β = 1/r; with this parameterization, the entire range of the values of the norm parameters becomes the unit square S = {(α, β) : 0 ≤ α, β ≤ 1}. Given a nonzero m×n matrix A, we set

φ_A(α, β) = ln ‖A‖_{1/α, 1/β}.    (4.1.1)

Well-known properties of this function, to be heavily exploited in the sequel, are summarized in the following

Proposition 4.1 Let A ∈ R^{m×n}, A ≠ 0. Then

(i) The function φ_A(α, β) is well defined and continuous on the square S;

(ii) One has

φ_A(α, β) = φ_{A^T}(1 − β, 1 − α);    (4.1.2)

(iii) The function φ_A(α, β) is

(iii.1) convex, nondecreasing and Lipschitz continuous, with constant ln m, in β ∈ [0, 1] for every α ∈ [0, 1];

(iii.2) convex, nonincreasing and Lipschitz continuous, with constant ln n, in α ∈ [0, 1] for every β ∈ [0, 1].

Proof. (i) is evident. (ii) is nothing but identity (1.2.10). To prove (iii.1), note that by the definition of the norm one has

φ_A(α, β) = max_{x: ‖x‖_{1/α} ≤ 1} f_x(β),  f_x(β) = ln(‖Ax‖_{1/β}),


where, by definition, ln(0) = −∞. We see that for α fixed, φ_A(α, β) is the maximum of a family of functions f_x(β). Every one of these functions is either ≡ −∞ (this is the case when Ax = 0), or is a nondecreasing, convex and Lipschitz continuous, with constant ln m, function of β ∈ [0, 1] (see Proposition 1.1). Since the supremum of a family of convex nondecreasing functions with common Lipschitz constant L is convex, monotone and Lipschitz continuous with constant L, (iii.1) follows. Finally, (iii.2) follows from (iii.1) by virtue of (4.1.2).

Our main "instrument" in the sequel will be the following simple observation.

Lemma 4.1 Let f(x) be a monotone, convex and Lipschitz continuous, with constant L, function on a segment [a, b], a ≤ b, and let c = λa + (1 − λ)b with λ ∈ (0, 1). Then

λf(a) + (1 − λ)f(b) − L(b − a)λ(1 − λ) ≤ f(c) ≤ λf(a) + (1 − λ)f(b).    (4.1.3)

Proof. The right inequality in (4.1.3) is given by the convexity of f. Let us prove the left inequality. Denoting by f′(a) the right derivative of f at a and by f′(b) the left derivative of f at b, and taking into account that f is convex on [a, b], we have

f(c) ≥ f(a) + f′(a)(c − a) = f(a) + (1 − λ)f′(a)(b − a),
f(c) ≥ f(b) + f′(b)(c − b) = f(b) − λf′(b)(b − a).

Multiplying the first inequality by λ, the second by 1 − λ, and summing up the results, we get

f(c) ≥ λf(a) + (1 − λ)f(b) + λ(1 − λ)(b − a)[f′(a) − f′(b)].

Since f is monotone, both f′(a) and f′(b) are of the same sign, so that f′(a) − f′(b) ≥ −max[|f′(a)|, |f′(b)|]. Since f is Lipschitz continuous with constant L on [a, b], we further have max[|f′(a)|, |f′(b)|] ≤ L. Thus, f(c) ≥ [λf(a) + (1 − λ)f(b)] − Lλ(1 − λ)(b − a).

4.2 Interpolating the norm bound

4.2.1 Preliminaries

For the time being, we have already built an efficiently computable upper bound Ψ_A(α, β) on the norm ‖A‖_{1/α,1/β} of an m×n matrix A on a part of the square S = {(α, β) : 0 ≤ α, β ≤ 1}. This upper bound is defined as follows (see Fig. 4.1, to which we shall refer many times in the sequel).

A. On the segment {α = 1, 0 ≤ β ≤ 1} (red segment BC on Fig. 4.1) the bound Ψ_A(α, β) is given by

Ψ_A(1, β) = max_{1≤j≤n} ‖A_j‖_{1/β},    (4.2.1)

where A_j is the j-th column of the matrix A. This segment corresponds to the first of the cases, listed in Section 1.3.1, where ‖A‖_{·,·} is efficiently computable, and the bound Ψ_A(1, β) is exactly the corresponding norm ‖A‖_{1,1/β} of A (see Section 1.3.1). Consequently, on the segment in question the quality of the bound is as good as it could be:

Ψ_A(1, β)/‖A‖_{1,1/β} = Ω(1, n, β, m) ≡ 1.    (4.2.2)


[Figure 4.1: Partition of the domain S = {(α, β) : 0 ≤ α, β ≤ 1} (square ABCD, α along the X-axis, β along the Y-axis, M = (1/2, 1/2)); the labeled points P, Q, R, S are the midpoints of the sides, and U, U′, U″, V, V′, V″ are auxiliary points used in the interpolation arguments below.]

B. On the segment {0 ≤ α ≤ 1, β = 0} (red segment AB on Fig. 4.1) the bound Ψ_A(α, β) is given by

Ψ_A(α, 0) = max_{1≤i≤m} ‖A^i‖_{1/(1−α)},    (4.2.3)

where (A^i)^T is the i-th row of A. This segment corresponds to the second of the cases, listed in Section 1.3.1, where ‖A‖_{·,·} is efficiently computable, and the bound Ψ_A(α, 0) is exactly the corresponding norm ‖A‖_{1/α,∞} (a sketch of the two formulas (4.2.1), (4.2.3) appears after item C below). Here again the bound is as good as it could be:

Ψ_A(α, 0)/‖A‖_{1/α,∞} = Ω(α, n, 0, m) ≡ 1.    (4.2.4)

C. In the square {0 ≤ α ≤ 1/2, 1/2 ≤ β ≤ 1} (blue square SMRD on Fig. 4.1) the bound is

Ψ_A(α, β) ≡ Ψ_{1/α,1/β}(A),    (4.2.5)

see (3.1.1), (3.1.2), (3.1.10). This square corresponds to the "good case" 1 ≤ r ≡ 1/β ≤ 2 ≤ p ≡ 1/α ≤ ∞ considered in Chapter 3, and in this square the quality of the bound can be quantified according to Remark 3.3:

Ψ_A(α, β)/‖A‖_{1/α,1/β} ≤ Ω(α, n, β, m) ≡ Θ̄(1/α, n, 1/β, m) ≤ 2.2936...    (4.2.6)

(see (3.2.11) and Theorem 3.2). In the square in question, the tightness factor Ω, while being bounded from above by the absolute constant Θ∗, depends on α, β, m, n and is > 1, except for the case α = β = 1/2 (red point M on Fig. 4.1), where the bound is exact; the latter point corresponds to the third of the situations, listed in Section 1.3.1, where ‖A‖_{·,·} is easy to compute.
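The two boundary bounds (4.2.1) and (4.2.3) are one-liners to compute; here is a minimal Python sketch (ours; β = 0 corresponds to r = ∞, and α = 1 to p = 1):

    import numpy as np

    def psi_alpha1(A, beta):
        # (4.2.1): Psi_A(1, beta) = ||A||_{1, 1/beta} = max column (1/beta)-norm
        r = np.inf if beta == 0 else 1.0 / beta
        return max(np.linalg.norm(A[:, j], ord=r) for j in range(A.shape[1]))

    def psi_beta0(A, alpha):
        # (4.2.3): Psi_A(alpha, 0) = ||A||_{1/alpha, oo} = max row (1/(1-alpha))-norm
        q = np.inf if alpha == 1 else 1.0 / (1.0 - alpha)
        return max(np.linalg.norm(A[i, :], ord=q) for i in range(A.shape[0]))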


4.2.2 Interpolating the norm bound

Relations (4.2.1), (4.2.3), (4.2.5) define a certain efficiently computable upper bound Ψ_A(α, β) on the norm ‖A‖_{1/α,1/β} on a certain part S∗ of the unit square S (blue and red areas on Fig. 4.1). Our current goals are

(a) to extend Ψ_A(·, ·) to the remaining part of S in such a way that the extended function Ψ_A(α, β) still is an efficiently computable upper bound on ‖A‖_{1/α,1/β} everywhere on S;

(b) to evaluate the quality of the extension, that is, to build an upper bound Ω(α, n, β, m) on the quantity sup_{A∈R^{m×n}} Ψ_A(α, β)/‖A‖_{1/α,1/β} (note that on S∗ the quantity Ω has already been defined, see (4.2.2), (4.2.4), (4.2.6)).

We can somewhat simplify our task by noting that the identity

‖A‖_{1/α,1/β} = ‖A^T‖_{1/(1−β),1/(1−α)}

allows us to restrict our attention to the part S₋ = {0 ≤ α, β ≤ 1, α + β ≤ 1} (triangle ABD on Fig. 4.1). Indeed, after both Ψ_A(α, β) and Ω(α, n, β, m) are extended onto this triangle, we can further extend these quantities from S₋ to the entire S by setting, for (α, β) ∈ S₊ ≡ S\S₋ = {0 ≤ α, β ≤ 1, α + β > 1}:

Ψ_A(α, β) = Ψ_{A^T}(1 − β, 1 − α),
Ω(α, n, β, m) = Ω(1 − β, m, 1 − α, n).    (4.2.7)

(4.2.7)

Formally speaking, (4.2.7) redefines Ψ_A(·, ·) and Ω on a part of S₊, specifically, on the segment {α = 1, 0 ≤ β ≤ 1} (red segment BC on Fig. 4.1) and on the triangle {0 ≤ α ≤ 1/2 ≤ β ≤ 1, α + β > 1} (blue triangle MRD on Fig. 4.1), which belong to the "original" domain of definition S∗ of the quantities in question. In fact, there is no collision at all, since Ψ_A and Ω as defined in Section 4.2.1 possess the required symmetry due to the relations

Ψ_{p,r}(A) = Ψ_{r/(r−1), p/(p−1)}(A^T),  Θ̄(p, n, r, m) = Θ̄(r/(r−1), m, p/(p−1), n),  1 ≤ r ≤ 2 ≤ p ≤ ∞

(see (3.1.10), (3.2.11), (3.2.9)) and

Ψ_A(1, β) = Ψ_{A^T}(1 − β, 0),  Ω(1, n, β, m) = Ω(1 − β, m, 0, n) = 1

(see (4.2.1), (4.2.3), (4.2.2), (4.2.4)).

In view of the outlined remarks, it suffices to achieve goals (a), (b) in the domain S₋ of values of α, β, that is, in the triangle ABD on Fig. 4.1. We shall extend Ψ_A(·, ·), Ω(·, n, ·, m) from S∗ to the triangle S₋ in two stages: first, to the square {0 ≤ α, β ≤ 1/2} (square APMS on Fig. 4.1), and then to the triangle {1/2 ≤ α ≤ 1, 0 ≤ β ≤ 1/2, α + β ≤ 1} (triangle PBM on Fig. 4.1).

Extending ΨA, Ω to the square APMS

Let (α, β) be a point of the square APMS where the quantities Ψ, Ω are not defined yet, that is, let

α ∈ [0, 1/2], β ∈ (0, 1/2)    (4.2.8)

(point U on Fig. 4.1). Setting

λ ≡ λ(β) = 1 − 2β,

we have β = λ·0 + (1 − λ)·(1/2). By item (iii.1) of Proposition 4.1, the function φ_A(α, s) = ln(‖A‖_{1/α,1/s}), regarded as a function of s, is convex, monotone and Lipschitz continuous, with constant ln m, on the segment [0, 1/2]. Applying Lemma 4.1, we conclude that

φ_A(α, β) ≤ λ φ_A(α, 0) + (1 − λ) φ_A(α, 1/2) ≤ φ_A(α, β) + (ln m / 2) λ(1 − λ),

or, which is the same,

‖A‖_{1/α,1/β} ≤ (‖A‖_{1/α,∞})^{1−2β} (‖A‖_{1/α,2})^{2β} ≤ m^{β(1−2β)} ‖A‖_{1/α,1/β}.    (4.2.9)

Now recall that the quantities ‖A‖_{1/α,∞} and ‖A‖_{1/α,2} admit efficiently computable upper bounds Ψ_A(α, 0), Ψ_A(α, 1/2); the first of these bounds is exact (Ω(α, n, 0, m) = 1), while the second is tight within the factor Ω(α, n, 1/2, m) ≤ √(π/2). Therefore the first inequality in (4.2.9) implies that

0 ≤ α, β ≤ 1/2 ⇒ ‖A‖_{1/α,1/β} ≤ Ψ_A(α, β) ≡ Ψ_A^{1−2β}(α, 0) Ψ_A^{2β}(α, 1/2),    (4.2.10)

and

Ψ_A^{1−2β}(α, 0) Ψ_A^{2β}(α, 1/2) ≤ Ω^{1−2β}(α, n, 0, m) Ω^{2β}(α, n, 1/2, m) (‖A‖_{1/α,∞})^{1−2β} (‖A‖_{1/α,2})^{2β} ≤ Ω^{2β}(α, n, 1/2, m) m^{β(1−2β)} ‖A‖_{1/α,1/β}

[here Ω^{1−2β}(α, n, 0, m) = 1, and the concluding ≤ is given by the second inequality in (4.2.9)]. Thus, when 0 ≤ α, β ≤ 1/2, we have

Ψ_A(α, β)/‖A‖_{1/α,1/β} ≤ Ω(α, n, β, m) ≡ m^{β(1−2β)} Ω^{2β}(α, n, 1/2, m).    (4.2.11)

Note that (4.2.10) extends the quantity Ψ_A(α, β), previously defined only on the top and the bottom sides of the square APMS, to the entire square, and this extension ensures requirement (a). Indeed, (4.2.10) says that Ψ_A(α, β) is an upper bound on ‖A‖_{1/α, 1/β} everywhere in the square {0 ≤ α, β ≤ 1/2}, and this bound clearly is efficiently computable along with our “old” bounds Ψ_A(α, 0) and Ψ_A(α, 1/2), 0 ≤ α ≤ 1/2.

Similarly, (4.2.11) extends the upper bound Ω(α, n, β, m) on the ratio Ψ_A(α, β)/‖A‖_{1/α, 1/β} from the top and the bottom sides of the square APMS to the entire square. Since Ω^{2β}(α, n, 1/2, m) ≤ √(π/2) and β(1−2β) ≤ 1/8 (the maximum is attained at β = 1/4), we have

0 ≤ α, β ≤ 1/2 ⇒ Ω(α, n, β, m) ≤ √(π/2) m^{β(1−2β)} ≤ √(π/2) m^{1/8}. (4.2.12)

Note that both Ψ_A(α, β) and Ω(α, n, β, m) are continuous in α, β on their new domain.

Extending Ψ_A, Ω to the triangle PBM

As a result of the previous stage, we have extended Ψ_A(α, β) to the segment {α = 1/2, 0 ≤ β ≤ 1/2} (segment PM on Fig. 4.1):

(a) ‖A‖_{2, 1/β} ≤ Ψ_A(1/2, β) ≡ Ψ_A^{1−2β}(1/2, 0) Ψ_A^{2β}(1/2, 1/2),
(b) Ψ_A(1/2, β)/‖A‖_{2, 1/β} ≤ Ω(1/2, n, β, m) ≡ m^{β(1−2β)} (4.2.13)


(the concluding formula in (4.2.13.b) is due to Ω(1/2, n, 1/2, m) = 1). We can now extend Ψ_A to the triangle {1/2 ≤ α ≤ 1, 0 ≤ β ≤ 1/2, α + β ≤ 1} (triangle PBM on Fig. 4.1) by an interpolation completely similar to the one used at the previous stage. Specifically, given a point (α, β) in the triangle (point V on Fig. 4.1) and setting

λ = λ(α) = 2(1 − α),

so that

α = λ·(1/2) + (1 − λ)·1,

and taking into account item (iii.1) of Proposition 4.1 and Lemma 4.1 as applied to the restriction of ψ_A(s, β) to the segment 1/2 ≤ s ≤ 1 (segment V′V′′ on Fig. 4.1), we get

‖A‖_{1/α, 1/β} ≤ ‖A‖_{2, 1/β}^{λ} ‖A‖_{1, 1/β}^{1−λ} ≤ n^{λ(1−λ)/2} ‖A‖_{1/α, 1/β}. (4.2.14)

Recalling that we already have at our disposal efficiently computable upper bounds Ψ_A(1/2, β) on ‖A‖_{2, 1/β} (see (4.2.13.a)) and Ψ_A(1, β) on ‖A‖_{1, 1/β} (the latter bound is in fact exact), (4.2.14) implies that

1/2 ≤ α ≤ 1, 0 ≤ β ≤ 1/2, α + β ≤ 1 ⇒ ‖A‖_{1/α, 1/β} ≤ Ψ_A(α, β) ≡ Ψ_A^{2(1−α)}(1/2, β) Ψ_A^{2α−1}(1, β) (4.2.15)

and

Ψ_A^{2(1−α)}(1/2, β) Ψ_A^{2α−1}(1, β)
≤ Ω^{2(1−α)}(1/2, n, β, m) Ω^{2α−1}(1, n, β, m) (‖A‖_{2, 1/β})^{2(1−α)} (‖A‖_{1, 1/β})^{2α−1}
 [we have used (4.2.13.b) and (4.2.2); note that Ω^{2α−1}(1, n, β, m) = 1]
≤ Ω^{2(1−α)}(1/2, n, β, m) n^{(1−α)(2α−1)} ‖A‖_{1/α, 1/β},

where the concluding ≤ is given by the second inequality in (4.2.14). Thus, when 1/2 ≤ α ≤ 1, 0 ≤ β ≤ 1/2 and α + β ≤ 1, we have

Ψ_A(α, β)/‖A‖_{1/α, 1/β} ≤ Ω(α, n, β, m) ≡ n^{(1−α)(2α−1)} Ω^{2(1−α)}(1/2, n, β, m). (4.2.16)

Relation (4.2.15) extends the quantity Ψ_A(α, β), previously defined only on the left and the right sides of the square PBQM, to the triangle PBM, and this extension ensures requirement (a). Indeed, (4.2.15) says that Ψ_A(α, β) is an upper bound on ‖A‖_{1/α, 1/β} everywhere in the triangle, and this bound clearly is efficiently computable along with our “old” bounds Ψ_A(1/2, β) and Ψ_A(1, β), 0 ≤ β ≤ 1/2.

Similarly, (4.2.16) extends the upper bound Ω(α, n, β, m) on the ratio Ψ_A(α, β)/‖A‖_{1/α, 1/β} from the left and the right sides of the square PBQM to the triangle PBM. Since Ω^{2(1−α)}(1/2, n, β, m) ≤ m^{2β(1−2β)(1−α)} by (4.2.13), we have

1/2 ≤ α ≤ 1, 0 ≤ β ≤ 1/2, α + β ≤ 1 ⇒ Ω(α, n, β, m) ≤ m^{2β(1−2β)(1−α)} n^{(1−α)(2α−1)}. (4.2.17)

We have finally extended Ψ_A and Ω to the entire triangle ABD (and thus, via (4.2.7), to the entire square S = {0 ≤ α, β ≤ 1}).
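To make the two-stage construction concrete, here is a minimal sketch (ours, not the thesis' code) that assembles the extended bound on S− from the three “edge” bounds of Section 4.2.1 via (4.2.10) and (4.2.15), and completes it to the whole square via (4.2.7). The callables psi_b0, psi_bh, psi_a1 are our illustrative placeholders for the bounds Ψ_A(α, 0), Ψ_A(α, 1/2), Ψ_A(1, β), assumed supplied by the caller:

```python
# Hedged sketch of the two-stage interpolation; all names are ours.
#   psi_b0(alpha) = Psi_A(alpha, 0)   -- exact bound on ||A||_{1/alpha, infty},
#   psi_bh(alpha) = Psi_A(alpha, 1/2) -- bound on ||A||_{1/alpha, 2},
#   psi_a1(beta)  = Psi_A(1, beta)    -- exact bound on ||A||_{1, 1/beta}.

def psi_on_triangle(alpha, beta, psi_b0, psi_bh, psi_a1):
    """Extended bound Psi_A(alpha, beta) on S- = {alpha + beta <= 1}."""
    if alpha <= 0.5:
        # (4.2.10): geometric interpolation between beta = 0 and beta = 1/2
        return psi_b0(alpha) ** (1 - 2 * beta) * psi_bh(alpha) ** (2 * beta)
    # alpha > 1/2: form Psi_A(1/2, beta) by (4.2.10), cf. (4.2.13.a), then
    # interpolate in alpha between alpha = 1/2 and alpha = 1 by (4.2.15)
    psi_half = psi_b0(0.5) ** (1 - 2 * beta) * psi_bh(0.5) ** (2 * beta)
    return psi_half ** (2 * (1 - alpha)) * psi_a1(beta) ** (2 * alpha - 1)

def psi_on_square(alpha, beta, edges_A, edges_AT):
    """Extended bound on the whole square S; edges_A / edges_AT are the
    (psi_b0, psi_bh, psi_a1) triples for A and for its transpose."""
    if alpha + beta <= 1:
        return psi_on_triangle(alpha, beta, *edges_A)
    # (4.2.7): Psi_A(alpha, beta) = Psi_{A^T}(1 - beta, 1 - alpha)
    return psi_on_triangle(1 - beta, 1 - alpha, *edges_AT)
```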


[Three surface plots of Ω over the unit square are omitted here. Panel titles and maxima: Ω(α, 100, β, 100) with max_{0≤α,β≤1} Ω = 2.4582...; Ω(α, 100000, β, 20000) with max_{0≤α,β≤1} Ω = 8.3607...; Ω(α, 100000, β, 100000) with max_{0≤α,β≤1} Ω = 9.4746...]

Figure 4.2: Sample graphs of Ω(α, n, β, m) as a function of α, β ∈ [0, 1]; α along axis AB, β along axis AD.

4.2.3 A rough summary

We have defined an efficiently computable upper bound Ψ_A(α, β) on the norm ‖A‖_{1/α, 1/β} of an m × n matrix in the entire range 0 ≤ α, β ≤ 1 of the values of α, β. A rough summary of our results on the quality of this bound is as follows (for the sake of simplicity, we bound this quality in terms of the larger size k = max[m, n] of A). The ratio Ψ_A(α, β)/‖A‖_{1/α, 1/β}

— equals 1 when α = 1, as well as when β = 0 (red areas on Fig. 4.1);

— is bounded from above by the absolute constant Θ∗ = 2.2936... in the domain 0 ≤ α ≤ 1/2 ≤ β ≤ 1 (the blue square on Fig. 4.1);

— is bounded from above by the quantity √(π/2) k^{1/8} in the domains 0 ≤ α, β ≤ 1/2 and 1/2 ≤ α, β ≤ 1 (squares APMS and MQCR on Fig. 4.1). In the case m = n = k of square matrices, the “most difficult” situations in these domains (those for which Ω(α, k, β, k) grows with k as O(1)k^{1/8}) are those of β = 1/4, 0 ≤ α ≤ 1/2, and of α = 3/4, 1/2 ≤ β ≤ 1;

— is bounded from above by the quantity k^γ in the domain 0 ≤ β ≤ 1/2 ≤ α ≤ 1 (the square PBQM on Fig. 4.1), where (cf. (4.2.17))

γ = max_{α,β} { 2β(1−2β)(1−α) + (1−α)(2α−1) : α + β ≤ 1, 0 ≤ β ≤ 1/2 ≤ α ≤ 1 } = 25/128 = 0.1953125

(the inner maximum in β is attained at β = 1/4, which reduces the objective to (1−α)(2α − 3/4), maximized at α = 11/16). In the case m = n = k of square matrices, the “most difficult” situations here (those where Ω(α, k, β, k) grows with k as O(1)k^γ) are the symmetric to each other cases of α = 11/16, β = 1/4 and α = 3/4, β = 5/16.

Sample graphs of the quantity Ω(α, n, β, m) are presented on Fig. 4.2.


Chapter 5

Bounding ‖A‖_{‖·‖_p, |·|_∞}

In this Chapter, we build an efficiently computable upper bound on the norm ‖A‖_{‖·‖_p, |·|_∞} of a linear mapping acting from (R^n, ‖·‖_p) to the space R^{m×n} of m × n matrices equipped with the matrix norm |A|_∞ = ‖A‖_{2,2}. From now on, we simplify the notation ‖A‖_{‖·‖_p, |·|_∞} to ‖A‖_{p,∞}; to avoid collisions with the similar notation ‖A‖_{p,q} = ‖A‖_{‖·‖_p, ‖·‖_q} for a norm of a matrix, we will use script capital letters to denote elements of L(R^n, R^{m×n}), while keeping, as always, the Roman capitals to denote matrices (that is, elements of L(R^p, R^q)).

5.1 Motivation: Robust Semidefinite Programming

5.1.1 Preliminaries

Recall that the Robust Optimization (RO) methodology considers a family P of instances – usual optimization problems of common structure (including the objective), with the data of the constraints running through a given uncertainty set U. RO associates with such a family its Robust Counterpart (RC) – a semi-infinite optimization program where one seeks to minimize the original (common for all instances) objective over robust feasible solutions – solutions which are feasible for all instances of P. Even with “nice” – convex and efficiently solvable – instances, the RC of P has infinitely many (convex) constraints and as such can be intractable. Whether this is indeed the case depends on the structure of the instances. For the structures of primary interest (uncertain Linear/Conic Quadratic/Semidefinite problems), the situation with the “tractability status” of the RC is as follows.

1. In the simplest case of uncertain Linear Programming, where the instances of P are linear programs min_x {c^T x : Ax ≥ b} with common objective and common number of linear inequality constraints, and the uncertainty set U is a set in the space of the data [A, b] of the instances, the RC of P is computationally tractable, provided that U is a computationally tractable convex set (e.g., a set given by finitely many explicit convex constraints) [3, 4, 16, 17];

2. In the case of uncertain Conic Quadratic Programming, where the instances of P are conic quadratic programs

min_x { c^T x : ‖A_i x − b_i‖_2 ≤ c_i^T x − d_i, i = 1, ..., m }

with common objective, common number m of conic quadratic constraints and instance-independent dimensions of b_i, i = 1, ..., m, and the data ξ = {A_i, b_i, c_i, d_i}_{i=1}^m running through a convex uncertainty set U in the space R^N of appropriately chosen dimension N (equal to the total number of entries in the data), effective solvability of the RC depends on the geometry of U. Specifically, the RC is computationally tractable when U is an ellipsoid, and may become computationally intractable for U as simple as a box in the space R^N of the data.

3. The most difficult case is that of uncertain Semidefinite Programming, where the instances are of the form

min_{x∈R^n} { c^T x : A_i[x] ≡ A_i^0 + Σ_j x_j A_i^j ⪰ 0, i = 1, ..., m }, (5.1.1)

where A_i^j ∈ S^{ν_i}. It is always assumed that all instances of P have a common “structure” (c, m, ν_1, ..., ν_m), while the remaining data {A_i^j ∈ S^{ν_i}, j = 0, ..., n}_{i=1}^m run through a given uncertainty set U in the corresponding data space R^N (with N equal to the number Σ_{i=1}^m dim S^{ν_i} of free entries in the uncertain data).

In the case of an uncertain Semidefinite problem P, the RC, aside from a few very special cases [2, 1], is computationally intractable. In particular, the latter is the case already for U as simple as an ellipsoid [20, 6].

5.1.2 Approximate Robust Counterparts

When the RC of an uncertain problem P is computationally intractable and thus essentially useless, the RO methodology suggests to use the “second best” – an approximate RC, defined as follows [3]:

Assume, as is usually the case in applications, that the uncertainty set U of the uncertain problem in question belongs to a parametric family of the form

U_ρ = ξ∗ + ρW,

where ξ∗ is the “nominal” data, W is a convex compact set in the space R^N of the data, symmetric w.r.t. the origin – the set of data perturbations of magnitude ≤ 1 – and ρ ≥ 0 is the “uncertainty level”, so that ρ = 0 corresponds to no uncertainty at all, and the larger ρ, the larger the uncertainty set. W.l.o.g. we may assume that the actual uncertainty set U is U_1. With this approach, both the uncertain problem and its robust counterpart become members, corresponding to ρ = 1, of parameterized problem families P = {P_ρ}_{ρ≥0}, RC = {RC_ρ}_{ρ≥0}.

Now, a computationally tractable optimization problem

min_{x,u} { c^T x : S(x, u) ≤ 0 } (R̂C)

is called an approximate Robust Counterpart of RC_1 if it has the same objective c^T x as the RC (and therefore as all the instances of P), and the projection X̂ of the feasible set of (R̂C) onto the space of x-variables is contained in the feasible set X_1 of the “true” Robust Counterpart. In other words, replacing the intractable RC with its tractable approximation, we still are “on the safe side” – the x-components of feasible solutions to the approximation are robust feasible solutions of P = P_1.

The quality of an approximate RC is quantified by its tightness factor – the infimum θ of those ρ ≥ 1 for which X̂ (this set is contained in X_1 by the definition of an approximation) contains the feasible set X_ρ of the RC RC_ρ. Thus, an approximate RC of P which is tight within a factor θ is “at least as conservative” as the true RC of the problem, and is “at most as conservative” as the true RC of the problem with a θ times larger uncertainty level. When the tightness factor is moderate, the approximate RC in question is, from the practical viewpoint, a reasonable substitute for the true RC.

Results on “tight” approximate RCs are known primarily for uncertain Conic Quadratic programming [9]. They state that when the uncertainty set U_ρ is the intersection of a finite number K of concentric ellipsoids in the data space R^N:

U_ρ = { ξ = ξ∗ + ∆ξ ∈ R^N : P∆ξ = 0, ∆ξ^T S_k ∆ξ ≤ ρ², k = 1, ..., K },

where S_k ⪰ 0 and Σ_k S_k ≻ 0, and the right hand side data c_i, d_i in the conic quadratic constraints are certain (i.e., P∆ξ = 0 implies that ∆c_i = 0, ∆d_i = 0 for all i; this assumption can be somewhat relaxed), then the RC admits a computationally tractable approximation tight up to the factor O(1)√(ln(K + dim x)) (which, for all practical purposes, is at most a moderate constant).

There are no similar approximation results for uncertain Semidefinite programming, aside from two exceptions. The first exception is the results of [8], where U is a box of a special type (we shall present these results in full detail later). The second exception is the results of [6], corresponding to the case when U is a general-type box or a general-type ellipsoid in R^N. For both cases, [6] offers approximate RCs with tightness factors O(1)√(min[N, ν]), where ν = max_i ν_i is the largest row size of the matrices A_i^0, i = 1, ..., m (ν_i is also called the size of the LMI (Linear Matrix Inequality) A_i^0 + Σ_j x_j A_i^j ⪰ 0).

xjAji º 0).

5.1.3 Robust Counterpart of uncertain Linear Matrix Inequality and Normbounding

Observe that the RC of uncertain Semidefinite problem P with instances of the form (5.1.1) is a“constraint-wise” notion: when passing from P to its RC, we preserve the objective and replaceevery one of LMIs

A0i +

j

xjAji º 0

with its Robust Counterpart

A0i +

n∑

j=1

xjAji º 0∀[A0

i , ..., Ani ] ∈ Ui,

where Ui is the projection of the uncertainty set U on the subspace of the data of i-th LMIAi[x] º 0. Now, replacing the RC’s of every one of the uncertain LMIs with their θ-tightapproximate RC’s, we arrive at θ-tight approximate RC of P. Thus, we may focus on theproblem of building an approximate RC

A0 +∑

j

xjAj º 0 ∀[A0, ..., An] ∈ U (5.1.2)


of a single uncertain LMI with A^j ∈ S^ν.

Assume, as is usually the case in applications, that the uncertain data [A^0, ..., A^n] in (5.1.2) is affinely parameterized by a perturbation vector ξ ∈ R^k:

U = { [A^0, ..., A^n] = [A^0_∗, ..., A^n_∗] + Σ_{ℓ=1}^k ξ_ℓ [A^0_ℓ, ..., A^n_ℓ] : ξ ∈ V }, (5.1.3)

where V is a convex solid (a convex compact set with a nonempty interior) in R^k centered at the origin, or, which is the same, V is the unit ball of a certain norm ‖·‖ on R^k:

V = { ξ ∈ R^k : ‖ξ‖ ≤ 1 }.

We are about to demonstrate that

(!) The Analysis problem associated with (5.1.2), that is, the problem of checking whether a given x is or is not feasible for (5.1.2), is equivalent to the problem of computing the (‖·‖, |·|_∞)-norm of an appropriate mapping A_x : R^k → S^ν.

Indeed, substituting in the left hand side of (5.1.2) the representation of the uncertain data via the perturbation vector, (5.1.2) becomes

Σ_{ℓ=1}^k ξ_ℓ A_ℓ[x] ⪯ A[x]  ∀ (ξ ∈ R^k : ‖ξ‖ ≤ 1), (5.1.4)

where A[x] and A_ℓ[x], ℓ = 1, ..., k, are affine matrix-valued functions taking values in S^ν and readily given by the parameterization of the data via ξ:

A[x] = A^0_∗ + Σ_{j=1}^n x_j A^j_∗,  A_ℓ[x] = −A^0_ℓ − Σ_{j=1}^n x_j A^j_ℓ.

Now, an evident necessary condition for (5.1.4) to be valid at a given point x is

A[x] ⪰ 0;

moreover, it is easily seen that in fact the restriction of A[x] onto the linear span of the image spaces of A_ℓ[x], ℓ = 1, ..., k, should be positive definite. In order to avoid unessential technicalities, let us strengthen this condition to

A[x] ≻ 0, (5.1.5)

and let us ask ourselves when (5.1.4) is indeed valid, given that (5.1.5) holds true. The answer is immediate: in the case of (5.1.5), relation (5.1.4) holds true if and only if

A_x ξ ≡ Σ_{ℓ=1}^k ξ_ℓ A_x^ℓ ⪯ I_ν  ∀ (ξ : ‖ξ‖ ≤ 1),  where A_x^ℓ = A^{−1/2}[x] A_ℓ[x] A^{−1/2}[x], (5.1.6)

or, which is clearly the same, if and only if

−I_ν ⪯ A_x ξ ⪯ I_ν  ∀ (ξ : ‖ξ‖ ≤ 1). (5.1.7)

The two-sided inequality (5.1.7) says exactly that |A_x ξ|_∞ ≤ 1 whenever ‖ξ‖ ≤ 1, that is, that ‖A_x‖_{‖·‖, |·|_∞} ≤ 1, and we have arrived at the following simple


Proposition 5.1 Let the set V in (5.1.3) be the unit ball of a norm ‖·‖ on R^k, and let a design vector x ∈ R^n be such that (5.1.5) holds true (which is “nearly necessary” for the validity of (5.1.4)). Then x is feasible for the Robust Counterpart (5.1.4) of the uncertain LMI in question if and only if

‖A_x‖_{‖·‖, |·|_∞} ≤ 1,

where A_x ∈ L(R^k, S^ν) is given by (5.1.6).
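As a purely numerical illustration (ours, not part of the thesis), forming the “columns” A_x^ℓ of the mapping A_x from (5.1.6) takes one eigendecomposition of A[x]; here A_of_x and Al_of_x are assumed to be user-supplied callables returning A[x] and the list [A_1[x], ..., A_k[x]]:

```python
import numpy as np

def columns_of_Ax(A_of_x, Al_of_x, x):
    """The "columns" A_x^l = A[x]^{-1/2} A_l[x] A[x]^{-1/2} of (5.1.6);
    assumes A[x] is symmetric positive definite, cf. (5.1.5)."""
    A = A_of_x(x)
    w, U = np.linalg.eigh(A)                          # A = U diag(w) U^T, w > 0
    A_inv_half = U @ np.diag(1.0 / np.sqrt(w)) @ U.T  # A[x]^{-1/2}
    return [A_inv_half @ Al @ A_inv_half for Al in Al_of_x(x)]
```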

Proposition 5.1 reduces the problem of checking whether a given x is feasible for (5.1.4) to computing the associated norm of the linear mapping A_x. When the latter problem is intractable, but we have at our disposal a tight efficiently computable upper bound Ψ(·) on the norm ‖·‖_{‖·‖, |·|_∞}, the efficiently verifiable constraint

Ψ(A_x) ≤ 1 (5.1.8)

is an approximation of the RC (5.1.4) of the uncertain LMI in question (up to the fact that (5.1.8) is not necessarily a convex constraint on x), and the tightness factor of this approximation is at least as good as the tightness factor of the upper bound Ψ(·) on the norm ‖·‖_{‖·‖, |·|_∞}. Thus, building an approximate RC of our uncertain LMI reduces to bounding the latter norm from above. Below we investigate this problem in the case when ‖·‖ is ‖·‖_p, 1 ≤ p ≤ ∞, that is, when the norm to be bounded is ‖A‖_{p,∞}.

5.2 Bounding ‖A‖_{p,∞}

In the sequel, we focus on building an efficiently computable upper bound on the quantity ‖A‖_{p,∞}, where

Aξ = Σ_{ℓ=1}^k ξ_ℓ A_ℓ : R^k → S^ν; (5.2.1)

here the A_ℓ are symmetric ν × ν matrices (the “columns” of the linear mapping A).

Computational status of the problem. Not much is known about the computational status of the problem of computing ‖A‖_{p,∞} for a generic A ∈ L(R^k, S^ν). Of course, the problem is easy when p = 1 (see Section 1.3.1):

‖A‖_{1,∞} = max_{1≤ℓ≤k} |A_ℓ|_∞ ≡ max_{1≤ℓ≤k} ‖A_ℓ‖_{2,2}. (5.2.2)
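In code, (5.2.2) is a one-liner: for symmetric A_ℓ the spectral norm |A_ℓ|_∞ is the largest eigenvalue magnitude. A minimal sketch (ours):

```python
import numpy as np

def norm_1_inf(columns):
    """||A||_{1,infty} per (5.2.2): the largest spectral norm |A_l|_inf
    among the symmetric "columns" A_l of the mapping A."""
    return max(np.abs(np.linalg.eigvalsh(A)).max() for A in columns)
```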

This is the only case where the problem is known to be easy; on the other hand, the only known cases where the problem is provably NP-hard are those of p = 2 [20] and p = ∞ [8]. When p = ∞, the problem is NP-hard already when the “columns” A_ℓ are restricted to be of rank ≤ 2. The latter case covers the MAXCUT problem [8], which implies, in particular, that it is NP-hard to approximate ‖A‖_{∞,∞} within 4% accuracy (see Section 1.3.2), even when the A_ℓ are restricted to be of rank ≤ 2.

Known approximation results for the problem of computing ‖A‖_{p,∞} are restricted to the cases of p = 2 and p = ∞.

• For the case of p = 2, an efficiently computable upper bound on ‖A‖_{2,∞} was proposed in [6], where it is proven that the tightness factor of this bound does not exceed min[k^{1/2}, ν^{1/2}].


• For the case of p = ∞, an efficiently computable upper bound on ‖A‖_{∞,∞} was proposed in [6], where it was proven that the bound is tight within the factor √(kν). This result was significantly improved in [8], where it was proven that the tightness factor of the bound in question does not exceed O(1)√µ, where µ is the maximal rank of the matrices A_ℓ.

Strategy. Summarizing the above discussion, we have at our disposal efficiently computable upper bounds on ‖A‖_{p,∞} for the cases of p = 1 (where the bound in fact is exact) and of p = ∞, and our strategy will be to “interpolate” these bounds from the endpoints 1, ∞ of the range of p to the entire range of p, similarly to what was done in Chapter 4.

Before implementing our strategy, we present the results of [8] on bounding ‖A‖_{∞,∞}.

Matrix Cube Theorem and bounding ‖A‖_{∞,∞}. Similarly to Section 5.1.3, we start from the following simple observation:

(O): The relation ‖A‖_{∞,∞} ≤ t is equivalent to the fact that

Σ_ℓ ξ_ℓ A_ℓ ⪯ tI  ∀ (ξ : ‖ξ‖_∞ ≤ 1). (5.2.3)

A simple sufficient condition for the validity of (5.2.3) is given by the following immediate observation:

Lemma 5.1 Assume that there exist matrices X_ℓ ∈ S^ν such that

X_ℓ ⪰ A_ℓ, X_ℓ ⪰ −A_ℓ, ℓ = 1, ..., k,

and

Σ_ℓ X_ℓ ⪯ tI_ν.

Then (5.2.3) is valid.

Indeed, with X_ℓ as in the premise of the Lemma we have ±A_ℓ ⪯ X_ℓ, whence ξ_ℓ A_ℓ ⪯ |ξ_ℓ| X_ℓ, so that Σ_ℓ ξ_ℓ A_ℓ ⪯ Σ_ℓ X_ℓ ⪯ tI for all ξ with ‖ξ‖_∞ ≤ 1.

Lemma 5.1 implies the following simple

Corollary 5.1 The optimal value of the explicit convex optimization problem

Ψ_∞(A) = min_{t, X_1, ..., X_k} { t : X_ℓ ⪰ A_ℓ, X_ℓ ⪰ −A_ℓ, ℓ = 1, ..., k, Σ_ℓ X_ℓ ⪯ tI }, (5.2.4)

where the A_ℓ are the “columns” of the linear mapping Aξ = Σ_ℓ ξ_ℓ A_ℓ : R^k → S^ν, is an upper bound on ‖A‖_{∞,∞}.
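Problem (5.2.4) is an explicit semidefinite program, so Ψ_∞(A) can be computed by any SDP solver. The following sketch is ours (it assumes the cvxpy modeling package is available) and takes the list of symmetric “columns” A_ℓ as numpy arrays:

```python
import numpy as np
import cvxpy as cp

def psi_infty(columns):
    """Upper bound Psi_infty(A) on ||A||_{infty,infty} via the SDP (5.2.4)."""
    nu = columns[0].shape[0]
    t = cp.Variable()
    Xs = [cp.Variable((nu, nu), symmetric=True) for _ in columns]
    cons = []
    for X, A in zip(Xs, columns):
        cons += [X >> A, X >> -A]         # X_l >= A_l and X_l >= -A_l
    cons += [sum(Xs) << t * np.eye(nu)]   # sum_l X_l <= t*I
    cp.Problem(cp.Minimize(t), cons).solve()
    return t.value
```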

The Matrix Cube Theorem [8] quantifies the quality of the bound Ψ_∞(A) in terms of the maximal rank of the matrices A_ℓ. The result is as follows:


Theorem 5.1 [Matrix Cube Theorem, [8]] Consider a linear mapping Aξ = Σ_ℓ ξ_ℓ A_ℓ : R^k → S^ν, and let

µ = max_{1≤ℓ≤k} Rank(A_ℓ).

Then

‖A‖_{∞,∞} ≤ Ψ_∞(A) ≤ ϑ(µ) ‖A‖_{∞,∞}, (5.2.5)

where Ψ_∞(A) is given by (5.2.4), and ϑ(s) is a universal nondecreasing function of a positive integral argument s such that

ϑ(1) = 1, ϑ(2) = π/2, ϑ(4) = 2, and ϑ(s) ≤ √(πs/2) for all s. (5.2.6)

We add to this the following simple observation:

Lemma 5.2 One always has

Ψ_∞(A) ≤ k ‖A‖_{∞,∞}. (5.2.7)

Indeed, let r = max_{1≤ℓ≤k} |A_ℓ|_∞ ≡ ‖A‖_{1,∞}. Since ‖ξ‖_1 ≥ ‖ξ‖_∞, we have r ≤ ‖A‖_{∞,∞}. Now, the optimization problem in the right hand side of (5.2.4) has a feasible solution X_ℓ = rI, ℓ = 1, ..., k, t = kr, so that Ψ_∞(A) ≤ kr ≤ k ‖A‖_{∞,∞}. We have arrived at the following

Corollary 5.2 Let µ be the maximal rank of the “columns” A_ℓ of a linear mapping Aξ = Σ_ℓ ξ_ℓ A_ℓ : R^k → S^ν. Then

‖A‖_{∞,∞} ≤ Ψ_∞(A) ≤ Ω(µ, k) ‖A‖_{∞,∞},  Ω(µ, k) = min[ϑ(µ), k] ≤ O(1) min[√µ, k] ≤ O(1) min[√ν, k]. (5.2.8)
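The factor Ω(µ, k) of (5.2.8) is explicit enough to evaluate directly; a tiny helper (ours), using the values and the universal bound from (5.2.6):

```python
import math

def theta_upper(s):
    """Upper bound on the universal function theta(s) of (5.2.6):
    exact at s = 1, 2, 4, otherwise the bound sqrt(pi*s/2)."""
    exact = {1: 1.0, 2: math.pi / 2, 4: 2.0}
    return exact.get(s, math.sqrt(math.pi * s / 2))

def omega(mu, k):
    """The tightness factor Omega(mu, k) = min[theta(mu), k] of (5.2.8)."""
    return min(theta_upper(mu), k)
```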

Implementing the strategy. We start with the following observation (cf. Proposition 4.1):

Proposition 5.2 Let Aξ = Σ_ℓ ξ_ℓ A_ℓ : R^k → S^ν be nontrivial (not all A_ℓ are zeros), and let

f(α) = ln(‖A‖_{1/α, ∞}) : [0, 1] → R.

Then f is a well-defined, convex, nonincreasing function of α ∈ [0, 1], and this function is Lipschitz continuous with the constant ln k.

Proof. By symmetry, we have

ln(‖A‖_{1/α, ∞}) = ln(‖A^∗‖_{|·|^∗_∞, ‖·‖_{1/(1−α)}}) = max_{A∈S^ν} { f_A(α) ≡ ln(‖A^∗A‖_{1/(1−α)}) : |A|^∗_∞ ≤ 1 };

here |·|^∗_∞ is the norm conjugate to |·|_∞ (that is, |·|^∗_∞ ≡ |·|_1). By Proposition 1.1, every one of the functions f_A(α) is either identically −∞, or is a convex, nonincreasing and Lipschitz continuous, with constant ln k, function of α ∈ [0, 1]. It follows that the latter properties are shared by f(α) = sup_{A: |A|^∗_∞ ≤ 1} f_A(α).


Invoking Lemma 4.1, we extract from Proposition 5.2 that

p ∈ [1, ∞] ⇒ ‖A‖_{p,∞} ≤ ‖A‖_{1,∞}^{1/p} ‖A‖_{∞,∞}^{(p−1)/p} ≤ k^{(p−1)/p²} ‖A‖_{p,∞}, (5.2.9)

whence, invoking Corollary 5.2,

‖A‖_{p,∞} ≤ Ψ_p(A) ≡ ‖A‖_{1,∞}^{1/p} Ψ_∞^{(p−1)/p}(A) ≤ k^{(p−1)/p²} Ω^{(p−1)/p}(µ, k) ‖A‖_{p,∞},  µ = max_{1≤ℓ≤k} Rank(A_ℓ), (5.2.10)

where Ψ_∞(A) is given by (5.2.4) and Ω(µ, k) is given by (5.2.8). Note that the bound Ψ_p(A) is efficiently computable along with ‖A‖_{1,∞} ≡ max_ℓ |A_ℓ|_∞. We have arrived at the following result, which is the main result of this Chapter:

Theorem 5.2 Let Aξ = Σ_ℓ ξ_ℓ A_ℓ be a linear mapping from R^k to S^ν, and let

µ = max_{1≤ℓ≤k} Rank(A_ℓ).

Then ‖A‖_{p,∞} admits an efficiently computable upper bound Ψ_p(A) given by (5.2.10), and the tightness factor of this bound does not exceed

Θ(µ, k, p) = k^{(p−1)/p²} (min[ϑ(µ), k])^{(p−1)/p} ≤ O(1) k^{(p−1)/p²} (min[√µ, k])^{(p−1)/p} ≤ O(1) k^{(p−1)/p²} (min[√ν, k])^{(p−1)/p}, (5.2.11)

where the universal function ϑ(s) satisfies (5.2.6).
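Assembling the main bound is now mechanical: Ψ_p(A) of (5.2.10) combines the exact ‖A‖_{1,∞} with Ψ_∞(A). A sketch (ours), reusing norm_1_inf and psi_infty from the snippets above:

```python
import math

def psi_p(columns, p):
    """Upper bound Psi_p(A) = ||A||_{1,infty}^{1/p} * Psi_infty(A)^{(p-1)/p},
    cf. (5.2.10); exact for p = 1, and equal to Psi_infty(A) for p = inf."""
    r = norm_1_inf(columns)   # ||A||_{1,infty}, see (5.2.2)
    if p == 1:
        return r
    if math.isinf(p):
        return psi_infty(columns)
    return r ** (1.0 / p) * psi_infty(columns) ** ((p - 1.0) / p)
```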

Discussion. A. Let us see how large the tightness factor Θ can be. In view of (5.2.6), we have

Θ(µ, p, k) ≤ O(1) k^{(p−1)/p²} (min[√ν, k])^{(p−1)/p}.

Maximizing the right hand side over p ∈ [1, ∞], it follows that

Θ(µ, p, k) ≤ O(1) · { k^{1/4} ν^{1/4 + (ln ν)/(16 ln k)} ≤ k^{1/4} ν^{3/8} when ν ≤ k²;  k when ν > k² }. (5.2.12)

In the most interesting case of p = 2, we have

Θ(µ, 2, k) ≤ O(1) · { k^{1/4} ν^{1/4} when ν ≤ k²;  k^{3/4} when ν > k² }.

In the case of p = 2, k = O(1)ν, our tightness factor coincides, within a factor O(1), with the tightness factor √(min[ν, k]) of the bound on ‖A‖_{2,∞} presented in [6] (note that this bound is different from Ψ_2(A)). In contrast to this, when k ≪ ν or k ≫ ν, our tightness factor is essentially worse than the one for the bound from [6]. Note, however, that the analysis we have carried out so far deals with the worst case µ = ν, which is not what happens in the applications we are primarily interested in, that is, applications in uncertain Semidefinite Programming. In these latter applications, especially those coming from Robust Control, µ usually is a moderate absolute constant, like 2 or 4 (see [8]), while ν can be large. When µ = O(1), bound (5.2.11) becomes

Θ(µ, p, k) ≤ O(1) k^{(p−1)/p²} ≤ O(1) k^{1/4},

where the “worst case” corresponds to p = 2; in this worst case our tightness factor is O(1)k^{1/4}, which is much better than the tightness factor √(min[k, ν]) of the bound from [6], provided that ν ≫ √k, which is nearly always the case in applications.

B. Let us see what our bounding scheme for ‖A‖_{p,∞} yields in the context of uncertain Semidefinite Programming (Section 5.1.3), where we are interested in building a tractable approximation of the RC of an uncertain LMI, that is, of the semi-infinite constraint

Σ_{ℓ=1}^k ξ_ℓ A_ℓ[x] ⪯ A[x]  ∀ (ξ : ‖ξ‖_p ≤ 1), (5.2.13)

in the domain

X = { x : A[x] ≻ 0 };

here x ∈ R^n is a design vector, and A[x], A_1[x], ..., A_k[x] are symmetric ν × ν matrices affinely depending on x. As was explained in Section 5.1.3, given an efficiently computable upper bound Φ(A) on the norm ‖A‖_{p,∞} of a linear mapping A ∈ L(R^k, S^ν), we can approximate (5.2.13) in X with the efficiently verifiable constraint

A[x] ≻ 0 & Φ(A_x) ≤ 1, (5.2.14)

where

A_x ξ = Σ_ℓ ξ_ℓ A^{−1/2}[x] A_ℓ[x] A^{−1/2}[x];

this approximation is safe, in the sense that every feasible solution of the approximation is feasible for (5.2.13), and the tightness factor of the approximation does not exceed that of the bound Φ(·).

We are interested in the case when the bound Φ(·) in question is Ψ_p(·). In this case, the approximation becomes the system of constraints

(a) −sA[x] ⪯ A_ℓ[x] ⪯ sA[x], ℓ = 1, ..., k,
(b.1) Y_ℓ ⪰ A_ℓ[x], Y_ℓ ⪰ −A_ℓ[x], ℓ = 1, ..., k,
(b.2) Σ_ℓ Y_ℓ ⪯ tA[x],
(c) s^{1/p} t^{(p−1)/p} ≤ 1,
(d) A[x] ≻ 0 (5.2.15)

in variables x, s, t, Y_1, ..., Y_k, in the sense that a candidate solution x ∈ R^n satisfies (5.2.14) if and only if x can be extended, by properly chosen s, t, Y_1, ..., Y_k, to a feasible solution of (5.2.15).

Indeed, when x ∈ X, (a) can be equivalently rewritten as −sI ⪯ A^{−1/2}[x] A_ℓ[x] A^{−1/2}[x] ⪯ sI, ℓ = 1, ..., k, that is, (a) says exactly that s ≥ ‖A_x‖_{1,∞}. Similarly, setting X_ℓ = A^{−1/2}[x] Y_ℓ A^{−1/2}[x], we see that (b) says exactly that t ≥ Ψ_∞(A_x). With these observations, the possibility to extend a given x ∈ X to a feasible solution of (a–c) is equivalent to the inequality Ψ_p(A_x) ≡ ‖A_x‖_{1,∞}^{1/p} Ψ_∞^{(p−1)/p}(A_x) ≤ 1.

A severe drawback of approximation (5.2.15) is that it is not a system of convex constraints in all the variables involved, due to the presence of the products sA[x], tA[x] and of constraint (c), which defines a nonconvex set in the quadrant {s, t ≥ 0}. As a result, the projection X̂ of the feasible set of the system onto the space of x-variables can be a nonconvex set, which makes it unclear how to achieve the ultimate goal associated with the approximate RC, which is to optimize a given convex objective over X̂. There is, however, a way to circumvent, to some extent, this severe drawback. Specifically, note that as far as the projection X̂ of the feasible set of (5.2.15) onto the space of x-variables is concerned (and this projection is all we are interested in), we lose nothing when making (c) an equality rather than an inequality, that is, when setting s = t^{1−p}. The resulting system of constraints in the variables x, t, Y_1, ..., Y_k is

(a) −t^{1−p} A[x] ⪯ A_ℓ[x] ⪯ t^{1−p} A[x], ℓ = 1, ..., k,
(b.1) Y_ℓ ⪰ A_ℓ[x], Y_ℓ ⪰ −A_ℓ[x], ℓ = 1, ..., k,
(b.2) Σ_ℓ Y_ℓ ⪯ tA[x],
(d) A[x] ≻ 0. (5.2.16)

When t is treated not as a variable but as a fixed parameter, (5.2.16) becomes a system of explicit convex constraints on the variables x, Y_1, ..., Y_k, so that we can efficiently optimize a given (convex) objective f(x) over the projection X̂^t of the feasible set of this system onto the space of x-variables. Now we can choose a “resolution” γ > 1, say, γ = 1.5, and approximate X̂ by the union of the sets X̂^t over t's of the form γ^q, M_− ≤ q ≤ M_+, where the integers M_−, M_+ are chosen in such a way that the segment [γ^{M_−}, γ^{M_+}] definitely covers all values of t which might be of interest. In applications, it usually is not a big deal to choose appropriate M_−, M_+; moreover, when γ is not too close to 1, one usually can ensure that K = M_+ − M_− + 1 is a moderate integer. With this approach, the problem of optimizing a given convex objective over X̂ reduces to solving K efficiently solvable problems of optimizing the objective over X̂^{γ^q}, M_− ≤ q ≤ M_+, and choosing the best – with the smallest value of the objective – among the optimal solutions to these K problems. It is easily seen that the tightness factor of the resulting approximation scheme is at most by the factor γ larger than the tightness factor of the underlying upper bound Ψ_p(·).
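A hedged sketch of this scheme (ours; all names and the data layout are illustrative assumptions): solve_fixed_t assembles, for a fixed t, the convex system (5.2.16) in cvxpy, with (d) strengthened to A[x] ⪰ εI, and minimizes c^T x over it; grid_scheme scans the grid t = γ^q and keeps the best solution found.

```python
import numpy as np
import cvxpy as cp

def solve_fixed_t(t, p, c, A0, Ajs, Al0s, Aljs, eps=1e-6):
    """Minimize c^T x over the set cut out by (5.2.16) for a fixed t > 0.

    A[x] = A0 + sum_j x_j*Ajs[j]; A_l[x] = Al0s[l] + sum_j x_j*Aljs[l][j];
    all matrices are symmetric nu x nu numpy arrays (illustrative layout)."""
    n, nu = len(Ajs), A0.shape[0]
    x = cp.Variable(n)
    A = A0 + sum(x[j] * Ajs[j] for j in range(n))
    s = t ** (1 - p)                                # (c) taken as equality
    cons = [A >> eps * np.eye(nu)]                  # (d), strengthened
    Ys = []
    for Al0, Alj in zip(Al0s, Aljs):
        Al = Al0 + sum(x[j] * Alj[j] for j in range(n))
        cons += [s * A - Al >> 0, s * A + Al >> 0]  # (a)
        Y = cp.Variable((nu, nu), symmetric=True)
        cons += [Y - Al >> 0, Y + Al >> 0]          # (b.1)
        Ys.append(Y)
    cons += [t * A - sum(Ys) >> 0]                  # (b.2)
    prob = cp.Problem(cp.Minimize(c @ x), cons)
    prob.solve()
    return prob.value, x.value

def grid_scheme(p, c, data, q_lo, q_hi, gamma=1.5):
    """Scan t = gamma^q, q_lo <= q <= q_hi, keeping the best feasible solution."""
    best_val, best_x = np.inf, None
    for q in range(q_lo, q_hi + 1):
        val, xq = solve_fixed_t(gamma ** q, p, c, *data)
        if val is not None and val < best_val:
            best_val, best_x = val, xq
    return best_val, best_x
```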


Bibliography

[1] A. Ben-Tal, A. Nemirovski, Lectures on Modern Convex Optimization, SIAM-MPS Series on Optimization, 2001.

[2] A. Ben-Tal, A. Nemirovski, “Stable Truss Topology Design via Semidefinite Programming” – SIAM Journal on Optimization 7 (1997), pp. 991-1016.

[3] A. Ben-Tal, A. Nemirovski, “Robust Convex Optimization” – Mathematics of Operations Research 23 (1998).

[4] A. Ben-Tal, A. Nemirovski, “Robust solutions to uncertain linear programs” – OR Letters 25 (1999), pp. 1-13.

[5] Ben-Tal, A., and Nemirovski, A., “Robust solutions of Linear Programming problems contaminated with uncertain data”, Mathematical Programming v. 88 (2000), 411-424.

[6] Ben-Tal, A., El Ghaoui, L., and Nemirovski, A., “Robust Semidefinite Programming” – R. Saigal, H. Wolkowicz, L. Vandenberghe, Eds., Handbook on Semidefinite Programming, Kluwer Academic Publishers, 2000, 139-162.

[7] Ben-Tal, A., and Nemirovski, A., “Robust Optimization — Methodology and Applications”, Mathematical Programming Series B, v. 92 (2002), 453-480.

[8] Ben-Tal, A., and Nemirovski, A., “On tractable approximations of uncertain linear matrix inequalities affected by interval uncertainty”, SIAM Journal on Optimization v. 12 (2002), 811-833.

[9] Ben-Tal, A., Nemirovski, A., and Roos, C., “Robust solutions of uncertain quadratic and conic-quadratic problems”, SIAM Journal on Optimization v. 13 (2002), 535-560.

[10] Ben-Tal, A., Nemirovski, A., Roos, C., “Extended Matrix Cube Theorems with applications to µ-Theory in Control” – Mathematics of Operations Research v. 28 (2003), 497-523.

[11] Ben-Tal, A., Goryashko, A., Guslitzer, E., and Nemirovski, A., “Adjustable Robust Solutions of Uncertain Linear Programs”, Mathematical Programming v. 99 (2004), 351-376.

[12] Bertsimas, D., Pachamanova, D., Sim, M., “Robust Linear Optimization under General Norms” – Operations Research Letters, 2004.

[13] Bertsimas, D., Sim, M., “The Price of Robustness”, Operations Research 52 (2004), 35-53.

[14] Bertsimas, D., Sim, M., “Robust Discrete Optimization and Network Flows”, Mathematical Programming Series B, 98 (2003), 49-71.

[15] Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H., “Second-Order Cone Programming”, Linear Algebra and its Applications v. 284 (1998), 193-228.

[16] L. El-Ghaoui, H. Lebret, “Robust solutions to least-square problems with uncertain data matrices” – SIAM J. of Matrix Anal. and Appl. 18 (1997), 1035-1064.

[17] L. El-Ghaoui, F. Oustry, H. Lebret, “Robust solutions to uncertain semidefinite programs” – SIAM J. on Optimization 9 (1998), 33-52.

[18] M.X. Goemans, D.P. Williamson, “Improved approximation algorithms for Maximum Cut and Satisfiability problems using semidefinite programming” – Journal of the ACM 42 (1995), 1115-1145.

[19] Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H., “Second-Order Cone Programming”, Linear Algebra and its Applications v. 284 (1998), 193-228.

[20] Margalit, T., “Robust Convex Programming with Applications to Portfolio Selection”, M.Sc. Thesis, Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology, 1997.

[21] Yu. Nesterov, “Semidefinite relaxation and non-convex quadratic optimization” – Optimization Methods and Software 12 (1997), 1-20.

[22] R. Saigal, H. Wolkowicz, L. Vandenberghe, Eds., Handbook on Semidefinite Programming, Kluwer Academic Publishers, 2000.
