From Matrix to Tensor: The Transition to Numerical Multilinear Algebra
Lecture 1. Introduction to Tensor Computations
Charles F. Van Loan
Cornell University
The Gene Golub SIAM Summer School 2010, Selva di Fasano, Brindisi, Italy
⊗ Transition to Computational Multilinear Algebra ⊗ Lecture 1. Introduction to Tensor Computations
What is a Tensor?
Definition
An order-d tensor A ∈ IR^{n1×···×nd} is a real d-dimensional array A(1:n1, ..., 1:nd) where the index range in the k-th mode is from 1 to nk.
Low-Order Tensors
A scalar is an order-0 tensor.
A vector is an order-1 tensor.
A matrix is an order-2 tensor.
We will use calligraphic font to designate tensors that have order 3 or greater, e.g., A, B, C, etc.
Where Might They Come From?
Discretization.
A(i, j, k, ℓ) might house the value of f(w, x, y, z) at (w, x, y, z) = (w_i, x_j, y_k, z_ℓ).
Multiway Analysis.
A(i, j, k, ℓ) is a value that captures an interaction between four variables/factors.
You Have Seen Them Before
Images
A color picture is an m-by-n-by-3 tensor:
A(:, :, 1) = red pixel values
A(:, :, 2) = green pixel values
A(:, :, 3) = blue pixel values
You Have Seen Them Before
Block Matrices
A 6-by-6 matrix partitioned into 2-by-2 blocks:

$$
A = \left[\begin{array}{cc|cc|cc}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16}\\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26}\\ \hline
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36}\\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46}\\ \hline
a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56}\\
a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66}
\end{array}\right]
$$

Matrix entry $a_{45}$ is the (2,1) entry of the (2,3) block:

$$a_{45} \;\Leftrightarrow\; \mathcal{A}(2,3,2,1)$$
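This block-to-tensor correspondence is easy to check numerically. A minimal NumPy sketch (my own, not from the lecture; NumPy indices are 0-based, so the 1-based tensor entry A(2,3,2,1) becomes T[1,2,1,0]); the matrix entries a_ij are encoded as 10i + j so that a45 shows up as the value 45:

```python
import numpy as np

# 6-by-6 matrix with entries a_ij = 10*i + j (1-based i, j)
A = np.array([[10*i + j for j in range(1, 7)] for i in range(1, 7)])

# View A as a 3-by-3 grid of 2-by-2 blocks: T[p, q, r, s] is the
# (r, s) entry of block (p, q), all 0-based.
T = A.reshape(3, 2, 3, 2).transpose(0, 2, 1, 3)

# a45 = (2,1) entry of the (2,3) block, i.e. T[1, 2, 1, 0] in 0-based terms
print(T[1, 2, 1, 0])  # -> 45
```

The reshape splits each axis into (block index, within-block index) and the transpose moves both block indices to the front.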
The Decomposition Paradigm in Matrix Computations
Typical:
Convert the given problem into an equivalent easy-to-solve problem by using the "right" matrix decomposition.
$$PA = LU, \quad Ly = Pb, \quad Ux = y \;\Longrightarrow\; Ax = b$$
Also Typical:
Uncover hidden relationships by computing the "right" decomposition of the data matrix.

$$A = U\Sigma V^T \;\Longrightarrow\; A \approx \sum_{i=1}^{\hat r} \sigma_i u_i v_i^T$$
Two Natural Questions
Question 1.
Can we solve tensor problems by converting them to equivalent, easy-to-solve problems using a tensor decomposition?
Question 2.
Can we uncover hidden patterns in tensor data by computing an appropriate tensor decomposition?
We explore the decomposition issue for 2-by-2-by-2 tensors...
The Singular Value Decomposition
The 2-by-2 case...

$$
\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}
=
\begin{bmatrix} u_{11} & u_{12}\\ u_{21} & u_{22} \end{bmatrix}
\begin{bmatrix} \sigma_1 & 0\\ 0 & \sigma_2 \end{bmatrix}
\begin{bmatrix} v_{11} & v_{12}\\ v_{21} & v_{22} \end{bmatrix}^T
= \sigma_1 \begin{bmatrix} u_{11}\\ u_{21} \end{bmatrix}\begin{bmatrix} v_{11}\\ v_{21} \end{bmatrix}^T
+ \sigma_2 \begin{bmatrix} u_{12}\\ u_{22} \end{bmatrix}\begin{bmatrix} v_{12}\\ v_{22} \end{bmatrix}^T
$$

A reshaped presentation of the same thing...

$$
\begin{bmatrix} a_{11}\\ a_{21}\\ a_{12}\\ a_{22} \end{bmatrix}
= \sigma_1 \begin{bmatrix} v_{11}u_{11}\\ v_{11}u_{21}\\ v_{21}u_{11}\\ v_{21}u_{21} \end{bmatrix}
+ \sigma_2 \begin{bmatrix} v_{12}u_{12}\\ v_{12}u_{22}\\ v_{22}u_{12}\\ v_{22}u_{22} \end{bmatrix}
$$
The Kronecker Product
Definition As Applied to 2-vectors
$$
\begin{bmatrix} x_1\\ x_2 \end{bmatrix} \otimes \begin{bmatrix} y_1\\ y_2 \end{bmatrix}
= \begin{bmatrix} x_1 y_1\\ x_1 y_2\\ x_2 y_1\\ x_2 y_2 \end{bmatrix}
$$

2-by-2 SVD in Kronecker Terms...

$$
\begin{bmatrix} a_{11}\\ a_{21}\\ a_{12}\\ a_{22} \end{bmatrix}
= \sigma_1 \begin{bmatrix} v_{11}\\ v_{21} \end{bmatrix} \otimes \begin{bmatrix} u_{11}\\ u_{21} \end{bmatrix}
+ \sigma_2 \begin{bmatrix} v_{12}\\ v_{22} \end{bmatrix} \otimes \begin{bmatrix} u_{12}\\ u_{22} \end{bmatrix}
$$
Reshaping
Appreciate the duality...

$$
\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}
= \sigma_1 \, u_1 v_1^T + \sigma_2 \, u_2 v_2^T
$$

$$
\begin{bmatrix} a_{11}\\ a_{21}\\ a_{12}\\ a_{22} \end{bmatrix}
= \sigma_1 \, (v_1 \otimes u_1) + \sigma_2 \, (v_2 \otimes u_2)
$$

where $U = [\,u_1 \; u_2\,]$ and $V = [\,v_1 \; v_2\,]$.
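The duality is easy to verify numerically. A small NumPy check (my own sketch; np.kron plays the role of ⊗ and flatten(order='F') is the column-major vec):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
U, s, Vt = np.linalg.svd(A)        # A = U * diag(s) * Vt

# Matrix form: A = sigma1*u1*v1' + sigma2*u2*v2'
B = s[0]*np.outer(U[:, 0], Vt[0]) + s[1]*np.outer(U[:, 1], Vt[1])

# Vector form: vec(A) = sigma1*(v1 kron u1) + sigma2*(v2 kron u2)
vecA = A.flatten(order='F')        # [a11, a21, a12, a22]
b = s[0]*np.kron(Vt[0], U[:, 0]) + s[1]*np.kron(Vt[1], U[:, 1])

print(np.allclose(A, B), np.allclose(vecA, b))  # -> True True
```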
Write A ∈ IR^{2×2×2} as a minimal sum of rank-1 tensors.
What's a rank-1 tensor?

If $\mathcal{R} = (r_{ijk})$ has unit rank, then $r_{ijk} = f_i g_j h_k$.

If $\mathcal{R} \in {\rm I\!R}^{2\times2\times2}$ has unit rank, then there exist $f, g, h \in {\rm I\!R}^{2}$ such that

$$
\begin{bmatrix} r_{111}\\ r_{211}\\ r_{121}\\ r_{221}\\ r_{112}\\ r_{212}\\ r_{122}\\ r_{222} \end{bmatrix}
= \begin{bmatrix} h_1\\ h_2 \end{bmatrix}
\otimes \begin{bmatrix} g_1\\ g_2 \end{bmatrix}
\otimes \begin{bmatrix} f_1\\ f_2 \end{bmatrix}
$$
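A quick NumPy illustration of this fact (my own sketch): build a unit-rank tensor from f, g, h with an outer product and check that its column-major vec is exactly h ⊗ g ⊗ f:

```python
import numpy as np

rng = np.random.default_rng(0)
f, g, h = rng.standard_normal((3, 2))          # three random 2-vectors

# Unit-rank tensor: R[i, j, k] = f[i] * g[j] * h[k]
R = np.einsum('i,j,k->ijk', f, g, h)

# Column-major vec lists entries with the first index moving fastest:
# r111, r211, r121, r221, r112, r212, r122, r222
r = R.flatten(order='F')

print(np.allclose(r, np.kron(h, np.kron(g, f))))  # -> True
```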
Write A ∈ IR^{2×2×2} as a minimal sum of rank-1 tensors.
Find the thinnest possible $X, Y, Z \in {\rm I\!R}^{2\times r}$ so that

$$
\begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
= \sum_{k=1}^{r} z_k \otimes y_k \otimes x_k
$$

where

$$X = [x_1 | \cdots | x_r] \qquad Y = [y_1 | \cdots | y_r] \qquad Z = [z_1 | \cdots | z_r]$$
Write A ∈ IR^{2×2×2} as a minimal sum of rank-1 tensors.
A Surprising Fact

If

$$
\begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
= \texttt{randn(8,1)}
$$

then r = 2 79% of the time and r = 3 21% of the time.

Compare to...

If A = randn(n,n), then 100% of the time rank(A) = n.

What are the "full rank" 2-by-2-by-2 tensors?
The 79/21 Property
Interesting Fact
If the $a_{ijk}$ are randn, then

$$
\det\left( \begin{bmatrix} a_{111} & a_{121}\\ a_{211} & a_{221} \end{bmatrix}
- \lambda \begin{bmatrix} a_{112} & a_{122}\\ a_{212} & a_{222} \end{bmatrix} \right) = 0
$$

has real distinct roots 79% of the time and complex conjugate roots 21% of the time.

What is the connection between this generalized eigenvalue problem and the 2-by-2-by-2 tensor rank problem?
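The 79/21 split can be reproduced by Monte Carlo. A NumPy sketch of the experiment (my own translation of the idea, not the lecture's Matlab script): draw the two slices, form the roots of det(F − λB) = 0 as the eigenvalues of B⁻¹F, and count how often they are real:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000
real_count = 0
for _ in range(N):
    F = rng.standard_normal((2, 2))            # front slice A(:,:,1)
    B = rng.standard_normal((2, 2))            # back slice  A(:,:,2)
    # det(F - lambda*B) = 0  <=>  lambda is an eigenvalue of inv(B)*F
    lam = np.linalg.eigvals(np.linalg.solve(B, F))
    if np.all(np.abs(np.asarray(lam).imag) < 1e-12):
        real_count += 1
print(real_count / N)   # roughly 0.79
```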
The 79% Situation
(Real Distinct Eigenvalues)

There exist nonsingular $X = [\,x_1 \; x_2\,]$ and $Y = [\,y_1 \; y_2\,]$ so that

$$
\begin{bmatrix} a_{111} & a_{121}\\ a_{211} & a_{221} \end{bmatrix}
= X \begin{bmatrix} \alpha_1 & 0\\ 0 & \alpha_2 \end{bmatrix} Y^T
= \alpha_1 x_1 y_1^T + \alpha_2 x_2 y_2^T
$$

$$
\begin{bmatrix} a_{112} & a_{122}\\ a_{212} & a_{222} \end{bmatrix}
= X \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} Y^T
= x_1 y_1^T + x_2 y_2^T
$$

i.e.,

$$
\begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221} \end{bmatrix}
= \alpha_1 (y_1 \otimes x_1) + \alpha_2 (y_2 \otimes x_2)
\qquad
\begin{bmatrix} a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
= (y_1 \otimes x_1) + (y_2 \otimes x_2)
$$
The 79% Situation
Stack 'em:

$$
\begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
= \begin{bmatrix} \alpha_1\\ 1 \end{bmatrix} \otimes (y_1 \otimes x_1)
+ \begin{bmatrix} \alpha_2\\ 1 \end{bmatrix} \otimes (y_2 \otimes x_2)
$$

A has rank 2.
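The rank-2 construction can be checked numerically. A NumPy sketch (my own, under the stated assumption that the generalized eigenvalues are real and distinct): the columns of X are eigenvectors of F·B⁻¹, and Y is recovered from B = XYᵀ:

```python
import numpy as np

rng = np.random.default_rng(3)
while True:
    A = rng.standard_normal((2, 2, 2))
    F, B = A[:, :, 0], A[:, :, 1]                    # front and back slices
    alpha, X = np.linalg.eig(F @ np.linalg.inv(B))   # F*inv(B) = X*diag(alpha)*inv(X)
    if np.all(np.abs(np.asarray(alpha).imag) < 1e-10):   # the 79% case
        break
alpha, X = alpha.real, X.real
Y = np.linalg.solve(X, B).T                          # from B = X * Y'

# Stack 'em: vec(A) = [alpha1;1] kron y1 kron x1 + [alpha2;1] kron y2 kron x2
a = sum(np.kron(np.array([alpha[k], 1.0]), np.kron(Y[:, k], X[:, k]))
        for k in range(2))
print(np.allclose(a, A.flatten(order='F')))          # -> True
```

The eigenvector scaling is irrelevant because Y absorbs it: Yᵀ = X⁻¹B makes both slice identities hold exactly.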
The 21% Situation
(Complex Conjugate Eigenvalues)

There exist nonsingular $X = [\,x_1 \; x_2\,]$ and $Y = [\,y_1 \; y_2\,]$ so that

$$
\begin{bmatrix} a_{111} & a_{121}\\ a_{211} & a_{221} \end{bmatrix}
= X \begin{bmatrix} \alpha_1 & \alpha_2\\ -\alpha_2 & \alpha_1 \end{bmatrix} Y^T
= \alpha_1 (x_1 y_1^T + x_2 y_2^T) + \alpha_2 (x_1 y_2^T - x_2 y_1^T)
$$
$$
= \alpha_1 (x_1 + x_2)(y_1 + y_2)^T - (\alpha_1 - \alpha_2)\, x_1 y_2^T - (\alpha_1 + \alpha_2)\, x_2 y_1^T
$$

$$
\begin{bmatrix} a_{112} & a_{122}\\ a_{212} & a_{222} \end{bmatrix}
= X \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} Y^T
= x_1 y_1^T + x_2 y_2^T
= (x_1 + x_2)(y_1 + y_2)^T - x_1 y_2^T - x_2 y_1^T
$$
The 21% Situation
(Complex Conjugate Eigenvalues)

$$
\begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
= \begin{bmatrix} \alpha_1\\ 1 \end{bmatrix} \otimes (y_1 + y_2) \otimes (x_1 + x_2)
- \begin{bmatrix} \alpha_1 - \alpha_2\\ 1 \end{bmatrix} \otimes y_2 \otimes x_1
- \begin{bmatrix} \alpha_1 + \alpha_2\\ 1 \end{bmatrix} \otimes y_1 \otimes x_2
$$

A has rank 3.
Matlab: 2-by-2-by-2 Tensor Rank
% Script L1
% Estimates the prob that randn(2,2,2) has rank 2:
N = 1000000; count = 0;
for eg = 1:N
    A = randn(2,2,2);
    A_front = [A(1,1,1) A(1,2,1); A(2,1,1) A(2,2,1)];
    A_back  = [A(1,1,2) A(1,2,2); A(2,1,2) A(2,2,2)];
    [S,T] = qz(A_front, A_back, 'real');
    if S(2,1) == 0
        count = count + 1;
    end
end
ProbARank2 = count/N
Problem 1.1. What happens if we use rand instead of randn in the script L1?

Problem 1.2. Explain why the script L1 generates the same results regardless of the choice for A_front and A_back:

$$\text{Choice 1:} \quad A_{\text{front}} = \begin{bmatrix} a_{111} & a_{121}\\ a_{211} & a_{221} \end{bmatrix} \quad A_{\text{back}} = \begin{bmatrix} a_{112} & a_{122}\\ a_{212} & a_{222} \end{bmatrix}$$

$$\text{Choice 2:} \quad A_{\text{front}} = \begin{bmatrix} a_{111} & a_{112}\\ a_{211} & a_{212} \end{bmatrix} \quad A_{\text{back}} = \begin{bmatrix} a_{121} & a_{122}\\ a_{221} & a_{222} \end{bmatrix}$$

$$\text{Choice 3:} \quad A_{\text{front}} = \begin{bmatrix} a_{111} & a_{112}\\ a_{121} & a_{122} \end{bmatrix} \quad A_{\text{back}} = \begin{bmatrix} a_{211} & a_{212}\\ a_{221} & a_{222} \end{bmatrix}$$
Matlab: Generating Tensors
% A 2-by-3-by-4 random tensor...
A = randn(2,3,4);
% Dimensions can be specified by a vector...
n = [2,3,4];
A = randn(n);
% The functions ones and zeros...
A = ones(n);
A = zeros(n);
% Size and reshaping...
n = size(A);
N = prod(n);
a = reshape(A,N,1);
B = reshape(a,n(length(n):-1:1));
Matlab: Tensor Operations
% Applying a function to each component...
n = [2,3,4];
A = randn(n);
for i1 = 1:n(1)
    for i2 = 1:n(2)
        for i3 = 1:n(3)
            A(i1,i2,i3) = sin(A(i1,i2,i3));
        end
    end
end
% In lieu of the triple loop...
A = sin(A);
% The usual cast of characters...
A = floor(A); A = ceil(A); A = round(A);
A = real(A); A = imag(A);
A = sqrt(A); A = abs(A);
Matlab: Tensor Operations
n = [2,3,4];
A = randn(n);
B = randn(n);
% These operations are legal ...
C = 3*A;
C = -A;
C = A+1;
C = A.^2;
% These operations are legal if A and B
% have the same dimension ...
C = A + B;
C = A./B;
C = A.*B;
C = A.^B;
Nearness Problems
The Nearest Rank-1 Matrix Problem
If $A \in {\rm I\!R}^{n_1 \times n_2}$ has SVD $A = U\Sigma V^T$, then $B_{\rm opt} = \sigma_1 u_1 v_1^T$ minimizes $\phi(B) = \| A - B \|_F$ subject to rank(B) = 1.

The Nearest Rank-1 Tensor to A ∈ IR^{2×2×2}

Find unit 2-norm vectors $u, v, w \in {\rm I\!R}^2$ and a scalar $\sigma$ so that $\| a - \sigma \cdot w \otimes v \otimes u \|_2$ is minimized, where

$$
a = \begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
$$
A Highly Structured Nonlinear Optimization Problem
It depends on four parameters...

$$
\phi(\sigma, \theta_1, \theta_2, \theta_3) =
\left\| a - \sigma \begin{bmatrix} \cos\theta_3\\ \sin\theta_3 \end{bmatrix}
\otimes \begin{bmatrix} \cos\theta_2\\ \sin\theta_2 \end{bmatrix}
\otimes \begin{bmatrix} \cos\theta_1\\ \sin\theta_1 \end{bmatrix} \right\|_2
= \left\| \begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
- \sigma \cdot \begin{bmatrix} c_3 c_2 c_1\\ c_3 c_2 s_1\\ c_3 s_2 c_1\\ c_3 s_2 s_1\\ s_3 c_2 c_1\\ s_3 c_2 s_1\\ s_3 s_2 c_1\\ s_3 s_2 s_1 \end{bmatrix} \right\|_2
$$

where $c_i = \cos(\theta_i)$, $s_i = \sin(\theta_i)$, $i = 1\!:\!3$.
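A quick NumPy sanity check of this parametrization (my own sketch): the vector σ·w⊗v⊗u with u = (c1, s1), v = (c2, s2), w = (c3, s3) reproduces the eight products in exactly the order shown:

```python
import numpy as np

sigma, t1, t2, t3 = 2.0, 0.3, 1.1, -0.7
c = [np.cos(t) for t in (t1, t2, t3)]
s = [np.sin(t) for t in (t1, t2, t3)]
u, v, w = (np.array([c[i], s[i]]) for i in range(3))

model = sigma * np.kron(w, np.kron(v, u))      # sigma * (w kron v kron u)
stack = sigma * np.array([c[2]*c[1]*c[0], c[2]*c[1]*s[0],
                          c[2]*s[1]*c[0], c[2]*s[1]*s[0],
                          s[2]*c[1]*c[0], s[2]*c[1]*s[0],
                          s[2]*s[1]*c[0], s[2]*s[1]*s[0]])
print(np.allclose(model, stack))               # -> True
```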
A Highly Structured Nonlinear Optimization Problem
and it can be reshaped...

$$
\phi = \left\| \begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
- \sigma \cdot \begin{bmatrix} c_3 c_2 c_1\\ c_3 c_2 s_1\\ c_3 s_2 c_1\\ c_3 s_2 s_1\\ s_3 c_2 c_1\\ s_3 c_2 s_1\\ s_3 s_2 c_1\\ s_3 s_2 s_1 \end{bmatrix} \right\|_2
= \left\| \begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
- \begin{bmatrix} c_3 c_2 & 0\\ 0 & c_3 c_2\\ c_3 s_2 & 0\\ 0 & c_3 s_2\\ s_3 c_2 & 0\\ 0 & s_3 c_2\\ s_3 s_2 & 0\\ 0 & s_3 s_2 \end{bmatrix}
\begin{bmatrix} x_1\\ y_1 \end{bmatrix} \right\|_2
\qquad
\begin{bmatrix} x_1\\ y_1 \end{bmatrix} = \sigma \cdot \begin{bmatrix} c_1\\ s_1 \end{bmatrix} = \sigma \cdot \begin{bmatrix} \cos(\theta_1)\\ \sin(\theta_1) \end{bmatrix}
$$

Idea: Improve σ and θ1 by minimizing with respect to x1 and y1, holding θ2 and θ3 fixed.
A Highly Structured Nonlinear Optimization Problem
and it can be reshaped...

$$
\phi = \left\| \begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
- \sigma \cdot \begin{bmatrix} c_3 c_2 c_1\\ c_3 c_2 s_1\\ c_3 s_2 c_1\\ c_3 s_2 s_1\\ s_3 c_2 c_1\\ s_3 c_2 s_1\\ s_3 s_2 c_1\\ s_3 s_2 s_1 \end{bmatrix} \right\|_2
= \left\| \begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
- \begin{bmatrix} c_3 c_1 & 0\\ c_3 s_1 & 0\\ 0 & c_3 c_1\\ 0 & c_3 s_1\\ s_3 c_1 & 0\\ s_3 s_1 & 0\\ 0 & s_3 c_1\\ 0 & s_3 s_1 \end{bmatrix}
\begin{bmatrix} x_2\\ y_2 \end{bmatrix} \right\|_2
\qquad
\begin{bmatrix} x_2\\ y_2 \end{bmatrix} = \sigma \cdot \begin{bmatrix} c_2\\ s_2 \end{bmatrix} = \sigma \cdot \begin{bmatrix} \cos(\theta_2)\\ \sin(\theta_2) \end{bmatrix}
$$

Idea: Improve σ and θ2 by minimizing with respect to x2 and y2, holding θ1 and θ3 fixed.
A Highly Structured Nonlinear Optimization Problem
and it can be reshaped...

$$
\phi = \left\| \begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
- \sigma \cdot \begin{bmatrix} c_3 c_2 c_1\\ c_3 c_2 s_1\\ c_3 s_2 c_1\\ c_3 s_2 s_1\\ s_3 c_2 c_1\\ s_3 c_2 s_1\\ s_3 s_2 c_1\\ s_3 s_2 s_1 \end{bmatrix} \right\|_2
= \left\| \begin{bmatrix} a_{111}\\ a_{211}\\ a_{121}\\ a_{221}\\ a_{112}\\ a_{212}\\ a_{122}\\ a_{222} \end{bmatrix}
- \begin{bmatrix} c_2 c_1 & 0\\ c_2 s_1 & 0\\ s_2 c_1 & 0\\ s_2 s_1 & 0\\ 0 & c_2 c_1\\ 0 & c_2 s_1\\ 0 & s_2 c_1\\ 0 & s_2 s_1 \end{bmatrix}
\begin{bmatrix} x_3\\ y_3 \end{bmatrix} \right\|_2
\qquad
\begin{bmatrix} x_3\\ y_3 \end{bmatrix} = \sigma \cdot \begin{bmatrix} c_3\\ s_3 \end{bmatrix} = \sigma \cdot \begin{bmatrix} \cos(\theta_3)\\ \sin(\theta_3) \end{bmatrix}
$$

Idea: Improve σ and θ3 by minimizing with respect to x3 and y3, holding θ1 and θ2 fixed.
Alternating Least Squares
A Common Framework for Tensor-Related Optimization

Choose a subset of the unknowns such that if they are (temporarily) fixed, then we are presented with an easy-to-solve problem in the remaining unknowns.

By choosing different subsets, cycle through all the unknowns.

Repeat until converged.

"Easy-to-solve" usually means "linear."
Problem 1.3. Write a Matlab function [sigma,theta] = NearestRank1(A) that takes a tensor A ∈ IR^{2×2×2} and uses alternating least squares to produce an estimate of a nearest rank-1 approximant. What is a good starting value? The linear least squares problems that need to be solved are highly structured. Explain and exploit.

Problem 1.4. Fix θ3. How would you choose σ, θ1, and θ2 so that

$$
\phi(\sigma, \theta_1, \theta_2, \theta_3) =
\left\| a - \sigma \begin{bmatrix} \cos\theta_3\\ \sin\theta_3 \end{bmatrix}
\otimes \begin{bmatrix} \cos\theta_2\\ \sin\theta_2 \end{bmatrix}
\otimes \begin{bmatrix} \cos\theta_1\\ \sin\theta_1 \end{bmatrix} \right\|_2
$$

is minimized?
Summary of Lecture 1.
Key Words
Reshaping is about turning a matrix into a differently sized matrix, or into a tensor, or into a vector.

The Kronecker product is an operation between two matrices that produces a highly structured block matrix.

A rank-1 tensor can be reshaped into a multiple Kronecker product of vectors.

A tensor decomposition expresses a given tensor in terms of simpler tensors.

The alternating least squares approach to multilinear LS is based on solving a sequence of linear LS problems, each obtained by freezing all but a subset of the unknowns.
What is the Course About?
The Lectures...
Lecture 1. Introduction to Tensor Computations
Lecture 2. Tensor Unfoldings
Lecture 3. Transpositions, Kronecker Products, Contractions
Lecture 4. Tensor-Related Singular Value Decompositions
Lecture 5. The CP Representation and Rank
Lecture 6. The Tucker Representation
Lecture 7. Other Decompositions and Nearness Problems
Lecture 8. Multilinear Rayleigh Quotients
Lecture 9. The Curse of Dimensionality
Lecture 10. Special Topics
What is the Course About?
The Next Big Thing
Scalar-Level Thinking

⇓ 1960's (the factorization paradigm: LU, LDL^T, QR, UΣV^T, etc.)

Matrix-Level Thinking

⇓ 1980's (cache utilization, parallel computing, LAPACK, etc.)

Block Matrix-Level Thinking

⇓ 2000's (new applications, factorizations, data structures, nonlinear analysis, optimization strategies, etc.)

Tensor-Level Thinking
What is the Course About?
The Curse of Dimensionality
In Matrix Computations, to say that A ∈ IR^{n1×n2} is "big" is to say that both n1 and n2 are big.

In Tensor Computations, to say that A ∈ IR^{n1×···×nd} is "big" is to say that n1·n2···nd is big, and this need not require big nk, e.g., n1 = n2 = ··· = n1000 = 2.
Algorithms that scale with d will induce a transition...
Matrix-Based Scientific Computation
⇓
Tensor-Based Scientific Computation
The “Geometry” of the Tensor Research Community
(Diagram: a map of the communities involved in tensor research, connected by arrows: Applications, Nonlinear Optimization, Matrix Computations, Multilinear Algebra, Nonlinear Analysis, Statistics, Programming Languages, Decompositions, and Software Libraries.)