
Lecture 16. Manifold Learning
COMP90051 Statistical Machine Learning
Semester 2, 2017. Lecturer: Andrey Kan
Copyright: University of Melbourne. Swiss roll image: Evan-Amos, Wikimedia Commons, CC0

This lecture
• Introduction to manifold learning
  ∗ Motivation
  ∗ Focus on data transformation
• Unfolding the manifold
  ∗ Geodesic distances
  ∗ Isomap algorithm
• Spectral clustering
  ∗ Laplacian eigenmaps
  ∗ Spectral clustering pipeline


Manifold Learning

Recovering a low-dimensional data representation non-linearly embedded within a higher-dimensional space


The limitation of k-means and GMM
• The k-means algorithm can find spherical clusters
• GMM can find elliptical clusters
• These algorithms will struggle in cases like this

[Figure from Murphy: k-means clustering result vs. the desired result]

Focusing on data geometry
• We are not dismissing the k-means algorithm yet, but we are going to put it aside for a moment
• One approach to the problem on the previous slide would be to introduce improvements to algorithms such as k-means
• Instead, let's focus on the geometry of the data and see if we can transform the data to make it amenable to simpler algorithms
  ∗ Recall the "transform the data vs. modify the model" discussion in supervised learning


Non-linear data embedding
• Recall the example with 3D GPS coordinates that denote a car's location on a 2D surface
• In a similar example, consider coordinates of items on a picnic blanket, which is approximately a plane
  ∗ In this example, the data resides on a plane embedded in 3D
• A low-dimensional surface can be quite curved in a higher-dimensional space
  ∗ A plane of dough (2D) in a Swiss roll (3D)

Picnic blanket image: Theo Wright, Flickr, CC2. Swiss roll image: Evan-Amos, Wikimedia Commons, CC0

Key assumption: It's simpler than it looks!
• Key assumption: high-dimensional data actually resides in a lower-dimensional space that is locally Euclidean
• Informally, the manifold is a subset of points in the high-dimensional space that locally looks like a low-dimensional space


Manifold example
• Informally, the manifold is a subset of points in the high-dimensional space that locally looks like a low-dimensional space
• Example: arc of a circle
  ∗ consider a tiny bit of the circumference (2D): it can be treated as a line (1D)

[Figure: nearby points A, B and C on the arc, whose distances satisfy approximately AC ≈ AB + BC]


๐‘™๐‘™-dimensional manifoldโ€ข Definition from Guillemin and Pollack, Differential Topology, 1974

โ€ข A mapping ๐‘“๐‘“ on an open set ๐‘ˆ๐‘ˆ โŠ‚ ๐‘น๐‘น๐‘š๐‘š is called smooth if it has continuous partial derivatives of all orders

โ€ข A map ๐‘“๐‘“:๐‘‹๐‘‹ โ†’ ๐‘น๐‘น๐‘™๐‘™ is called smooth if around each point ๐’™๐’™ โˆˆ ๐‘‹๐‘‹ there is an open set ๐‘ˆ๐‘ˆ โŠ‚ ๐‘น๐‘น๐‘š๐‘š and a smooth map ๐น๐น:๐‘ˆ๐‘ˆ โ†’ ๐‘น๐‘น๐‘™๐‘™ such that ๐น๐นequals ๐‘“๐‘“ on ๐‘ˆ๐‘ˆ โˆฉ ๐‘‹๐‘‹

โ€ข A smooth map ๐‘“๐‘“:๐‘‹๐‘‹ โ†’ ๐‘Œ๐‘Œ of subsets of two Euclidean spaces is a diffeomorphism if it is one to one and onto, and if the inverse map ๐‘“๐‘“โˆ’1:๐‘Œ๐‘Œ โ†’ ๐‘‹๐‘‹ is also smooth. ๐‘‹๐‘‹ and ๐‘Œ๐‘Œ are diffeomorphic if such a map exists

โ€ข Suppose that ๐‘‹๐‘‹ is a subset of some ambient Euclidean space ๐‘น๐‘น๐‘š๐‘š. Then ๐‘‹๐‘‹ is an ๐‘™๐‘™-dimensional manifold if each point ๐’™๐’™ โˆˆ ๐‘‹๐‘‹ possesses a neighbourhood ๐‘‰๐‘‰ โŠ‚ ๐‘‹๐‘‹ which is diffeomorphic to an open set ๐‘ˆ๐‘ˆ โŠ‚๐‘น๐‘น๐‘™๐‘™


Manifold examples
• A few examples of manifolds are shown below
• In all cases, the idea is that (hopefully) once the manifold is "unfolded", analysis such as clustering becomes easy
• How to "unfold" a manifold?


Geodesic Distances and Isomap

A non-linear dimensionality reduction algorithm that preserves locality information using geodesic distances


General idea: Dimensionality reduction
• Find a lower-dimensional representation of the data that preserves distances between points (MDS)
• Do visualisation, clustering, etc. on the lower-dimensional representation. Problems?

[Figure: points A and B shown in the original space and in the lower-dimensional representation]


"Global distances" vs. geodesic distances
• "Global distances" cause a problem: we may not want to preserve them
• We are interested in preserving distances along the manifold (geodesic distances)

[Figure: points C and D, showing the straight-line ("global") distance and the geodesic distance along the manifold]
Images: ClkerFreeVectorImages and Kaz @pixabay.com (CC0)


MDS and similarity matrix
• In essence, "unfolding" a manifold is achieved via dimensionality reduction, using methods such as MDS
• Recall that the input of an MDS algorithm is a similarity (aka proximity) matrix, where each element w_ij denotes how similar data points i and j are
• Replacing distances with geodesic distances simply means constructing a different similarity matrix without changing the MDS algorithm
  ∗ Compare this to the idea of modular learning in kernel methods
• As you will see shortly, there is a close connection between similarity matrices and graphs; on the next slide, we review basic definitions from graph theory


Refresher on graph terminology
• A graph is a tuple G = {V, E}, where V is a set of vertices, and E ⊆ V × V is a set of edges. Each edge is a pair of vertices
  ∗ Undirected graph: pairs are unordered
  ∗ Directed graph: pairs are ordered
• Graphs model pairwise relations between objects
  ∗ Similarity or distance between the data points
• In a weighted graph, each edge between vertices i and j has an associated weight w_ij
  ∗ Weights capture the strength of the relation between objects


Weighted adjacency matrix
• We will consider weighted undirected graphs with non-negative weights w_ij ≥ 0. Moreover, we will assume that w_ij = 0 if and only if vertices i and j are not connected
• The degree of a vertex i ∈ V is defined as

$$\deg(i) \equiv \sum_{j=1}^{n} w_{ij}$$

• A weighted undirected graph can be represented with a weighted adjacency matrix W that contains the weights w_ij as its elements


Similarity graph models data geometry
• Geodesic distances can be approximated using a graph in which vertices represent data points
• Let d(i, j) be the Euclidean distance between the points in the original space
• Option 1: define some local radius ε. Connect vertices i and j with an edge if d(i, j) ≤ ε
• Option 2: define a nearest-neighbour threshold k. Connect vertices i and j if i is among the k nearest neighbours of j OR j is among the k nearest neighbours of i
• Set the weight of each edge to d(i, j) (a sketch of both construction options follows below)

[Figure: approximating the geodesic distance between points C and D with a path in the similarity graph]
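As a concrete illustration of the two construction options, here is a minimal NumPy sketch; the function name, signature and the choice to return a dense weight matrix are illustrative assumptions rather than part of the lecture.

```python
import numpy as np

def similarity_graph(X, epsilon=None, k=None):
    """Weighted adjacency matrix of a similarity graph (0 means "not connected").

    X is an (n, m) array of points. Use either a local radius epsilon (option 1)
    or a nearest-neighbour threshold k (option 2).
    """
    n = X.shape[0]
    # Pairwise Euclidean distances d(i, j)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    connected = np.zeros((n, n), dtype=bool)
    if epsilon is not None:                      # option 1: epsilon-neighbourhood graph
        connected = D <= epsilon
    if k is not None:                            # option 2: symmetrised k-NN graph
        nn = np.argsort(D, axis=1)[:, 1:k + 1]   # k nearest neighbours, skipping self
        for i in range(n):
            connected[i, nn[i]] = True
        connected = connected | connected.T      # i in kNN(j) OR j in kNN(i)
    np.fill_diagonal(connected, False)           # no self-loops
    return np.where(connected, D, 0.0)           # edge weight = d(i, j)
```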


Computing geodesic distances
• Given the similarity graph, compute shortest paths between each pair of points
  ∗ E.g., using the Floyd-Warshall algorithm in O(n³)
• Set the geodesic distance between vertices i and j to the length (sum of weights) of the shortest path between them
• Define a new similarity matrix based on geodesic distances
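A minimal sketch of this step, assuming the weight matrix produced by the similarity_graph sketch above (zeros meaning "not connected"); in practice a library routine such as scipy.sparse.csgraph.shortest_path could be used instead.

```python
import numpy as np

def geodesic_distances(W):
    """Approximate geodesic distances from a weighted adjacency matrix W.

    W[i, j] > 0 is an edge weight (the Euclidean distance d(i, j));
    W[i, j] == 0 means i and j are not connected. Floyd-Warshall, O(n^3).
    """
    n = W.shape[0]
    G = np.where(W > 0, W, np.inf)   # non-edges start with infinite distance
    np.fill_diagonal(G, 0.0)
    for k in range(n):               # allow paths through intermediate vertex k
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    return G                         # G[i, j] = length of the shortest path from i to j
```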



Isomap: summary
1. Construct the similarity graph
2. Compute shortest paths
3. Geodesic distances are the lengths of the shortest paths
4. Construct a similarity matrix using geodesic distances
5. Apply MDS
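To complete the pipeline, here is a minimal sketch of the final step using classical MDS (double-centring of the squared geodesic distances followed by an eigendecomposition); the function name and the choice of classical MDS are illustrative assumptions, and a connected graph (no infinite distances) is assumed.

```python
import numpy as np

def isomap_embedding(G, l=2):
    """Classical MDS applied to an (n, n) geodesic distance matrix G.

    Returns an (n, l) array whose rows are the low-dimensional coordinates.
    """
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centring matrix
    B = -0.5 * J @ (G ** 2) @ J                    # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:l]            # keep the l largest
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))
```

In practice, an off-the-shelf implementation such as sklearn.manifold.Isomap bundles the graph construction, shortest-path and MDS steps.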



Spectral Clustering

A spectral graph theory approach to non-linear dimensionality reduction


Data processing pipelines
• The Isomap algorithm can be considered a pipeline in the sense that it combines different processing blocks, such as graph construction and MDS
• Here MDS serves as a core sub-routine of Isomap
• Spectral clustering is similar to Isomap in that it also comprises a few standard blocks, including k-means clustering
• In contrast to Isomap, spectral clustering uses a different non-linear mapping technique called the Laplacian eigenmap


Spectral clustering algorithm
1. Construct a similarity graph, and use the corresponding adjacency matrix as a new similarity matrix
  ∗ Just as in Isomap, the graph captures local geometry and breaks long-distance relations
  ∗ Unlike Isomap, the adjacency matrix is used "as is"; shortest paths are not used
2. Map data to a lower-dimensional space using Laplacian eigenmaps on the adjacency matrix
  ∗ This uses results from spectral graph theory
3. Apply k-means clustering to the mapped points


Similarity graph for spectral clustering
• Again, we start by constructing a similarity graph. This can be done in the same way as for Isomap (but there is no need to compute the shortest paths)
• Recall that option 1 was to connect points that are closer than ε, and option 2 was to connect points within a k-nearest-neighbour neighbourhood
• There is also an option 3, usually considered for spectral clustering. Here all points are connected to each other (the graph is fully connected). The weights are assigned using a Gaussian kernel (aka heat kernel) with width parameter σ

$$w_{ij} = \exp\left(-\frac{1}{\sigma}\left\lVert \boldsymbol{x}_i - \boldsymbol{x}_j \right\rVert^2\right)$$
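A minimal sketch of option 3, assuming NumPy; the function name and the convention of zeroing the diagonal (no self-loops) are illustrative choices.

```python
import numpy as np

def heat_kernel_adjacency(X, sigma=1.0):
    """Fully connected similarity graph (option 3) with Gaussian/heat-kernel weights.

    w_ij = exp(-||x_i - x_j||^2 / sigma), following the formula above.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / sigma)
    np.fill_diagonal(W, 0.0)   # no self-loops
    return W
```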


Graph Laplacian
• Recall that W denotes the weighted adjacency matrix, which contains all weights w_ij
• Next, the degree matrix D is defined as a diagonal matrix with the vertex degrees on the diagonal. Recall that a vertex degree is deg(i) = ∑_{j=1}^n w_ij
• Finally, another special matrix associated with each graph is called the unnormalised graph Laplacian and is defined as L ≡ D − W
  ∗ For simplicity, here we introduce spectral clustering using the unnormalised Laplacian. In practice, it is common to use a Laplacian normalised in a certain way, e.g., L_norm ≡ I − D⁻¹W, where I is the identity matrix
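A minimal sketch of these definitions in NumPy; the function name is illustrative, and a symmetric weight matrix with no isolated vertices is assumed.

```python
import numpy as np

def graph_laplacians(W):
    """Degree matrix D, unnormalised Laplacian L = D - W, and L_norm = I - D^{-1} W."""
    degrees = W.sum(axis=1)                  # deg(i) = sum_j w_ij
    D = np.diag(degrees)
    L = D - W                                # unnormalised graph Laplacian
    L_norm = np.eye(W.shape[0]) - np.diag(1.0 / degrees) @ W   # assumes all degrees > 0
    return D, L, L_norm
```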


Laplacian eigenmaps
• Laplacian eigenmaps, a central sub-routine of spectral clustering, is a non-linear dimensionality reduction method
• Similar to MDS, the idea is to map the original data points x_i ∈ R^m, i = 1, …, n to a set of low-dimensional points z_i ∈ R^l, l < m, that "best represent" the original data
• Laplacian eigenmaps use a similarity matrix W rather than the original data coordinates as a starting point
  ∗ Here the similarity matrix W is the weighted adjacency matrix of the similarity graph
• Earlier, we have seen examples of how the "best represent" criterion is formalised in MDS methods
• Laplacian eigenmaps use a different criterion, namely the aim is to minimise (subject to some constraints)

$$\sum_{i,j} \left\lVert \boldsymbol{z}_i - \boldsymbol{z}_j \right\rVert^2 w_{ij}$$


Alternative representation of the mapping
• This minimisation problem is solved using results from spectral graph theory
• Instead of the mapped points z_i, the output can be viewed as a set of n-dimensional vectors f_j, j = 1, …, l. The solution eigenmap is expressed in terms of these f_j
  ∗ For example, if the mapping is onto a 1D line, f_1 = f is just a collection of coordinates for all n points
  ∗ If the mapping is onto 2D, f_1 is a collection of all the first coordinates, and f_2 is a collection of all the second coordinates
• For illustrative purposes, we will consider a simple example of mapping to 1D


Problem formulation for 1D eigenmap
• Given an n × n similarity matrix W, our aim is to find a 1D mapping f, such that f_i is the coordinate of the mapped i-th point. We are looking for a mapping that minimises

$$\frac{1}{2}\sum_{i,j} \left(f_i - f_j\right)^2 w_{ij}$$

• Clearly, for any f this can be minimised by multiplying f by a small constant, so we need to introduce a scaling constraint, e.g., ‖f‖² = f′f = 1
• Next, recall that L ≡ D − W


Re-writing the objective in vector form

$$\begin{aligned}
\frac{1}{2}\sum_{i,j}\left(f_i - f_j\right)^2 w_{ij}
&= \frac{1}{2}\sum_{i,j}\left(f_i^2 w_{ij} - 2 f_i f_j w_{ij} + f_j^2 w_{ij}\right) \\
&= \frac{1}{2}\left(\sum_{i=1}^{n} f_i^2 \sum_{j=1}^{n} w_{ij} - 2\sum_{i,j} f_i f_j w_{ij} + \sum_{j=1}^{n} f_j^2 \sum_{i=1}^{n} w_{ij}\right) \\
&= \frac{1}{2}\left(\sum_{i=1}^{n} f_i^2 \deg(i) - 2\sum_{i,j} f_i f_j w_{ij} + \sum_{j=1}^{n} f_j^2 \deg(j)\right) \\
&= \sum_{i=1}^{n} f_i^2 \deg(i) - \sum_{i,j} f_i f_j w_{ij} \\
&= \boldsymbol{f}'\boldsymbol{D}\boldsymbol{f} - \boldsymbol{f}'\boldsymbol{W}\boldsymbol{f} \\
&= \boldsymbol{f}'\boldsymbol{L}\boldsymbol{f}
\end{aligned}$$
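As a quick numerical check of the identity just derived (purely illustrative, using a small random symmetric weight matrix):

```python
import numpy as np

# Check that (1/2) * sum_ij (f_i - f_j)^2 w_ij equals f' L f on a random example
rng = np.random.default_rng(0)
n = 5
W = rng.random((n, n))
W = (W + W.T) / 2            # symmetric weights
np.fill_diagonal(W, 0.0)     # no self-loops
L = np.diag(W.sum(axis=1)) - W

f = rng.random(n)
lhs = 0.5 * sum((f[i] - f[j]) ** 2 * W[i, j] for i in range(n) for j in range(n))
rhs = f @ L @ f
print(np.isclose(lhs, rhs))  # expected: True
```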


Laplace meets Lagrange
• Our problem becomes to minimise f′Lf, subject to f′f = 1. Recall the method of Lagrange multipliers. Introduce a Lagrange multiplier λ, and set the derivatives of the Lagrangian to zero

$$\begin{aligned}
\mathcal{L} &= \boldsymbol{f}'\boldsymbol{L}\boldsymbol{f} - \lambda\left(\boldsymbol{f}'\boldsymbol{f} - 1\right) \\
2\boldsymbol{f}'\boldsymbol{L}' - 2\lambda\boldsymbol{f}' &= \boldsymbol{0} \\
\boldsymbol{L}\boldsymbol{f} &= \lambda\boldsymbol{f}
\end{aligned}$$

• The latter is precisely the definition of an eigenvector, with λ being the corresponding eigenvalue!
• Critical points of our objective function f′Lf = (1/2) ∑_{i,j} (f_i − f_j)² w_ij are eigenvectors of L
• Note that the function is actually minimised by the all-ones eigenvector 𝟏 (which has eigenvalue 0), and this is not useful. Therefore, for a 1D mapping we use an eigenvector with the second smallest eigenvalue

Déjà vu?


Laplacian eigenmaps: summary
• Start with points x_i ∈ R^m. Construct a similarity graph using one of the 3 options
• Construct the weighted adjacency matrix W (do not compute shortest paths) and the corresponding Laplacian matrix L
• Compute the eigenvectors of L, and arrange them in the order of the corresponding eigenvalues 0 = λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_n
• Take the eigenvectors corresponding to λ_2 to λ_{l+1} as f_1, …, f_l, l < m, where each f_j corresponds to one of the new dimensions
• Combine all vectors into an n × l matrix, with the f_j in its columns. The mapped points are the rows of this matrix
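A minimal sketch of this summary in NumPy; the function name is illustrative, and the unnormalised Laplacian and a connected graph are assumed.

```python
import numpy as np

def laplacian_eigenmap(W, l=2):
    """Map n points to l dimensions given a weighted adjacency matrix W.

    Rows of the returned (n, l) matrix are the mapped points.
    """
    L = np.diag(W.sum(axis=1)) - W         # unnormalised Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    # Skip the first (constant) eigenvector with eigenvalue 0; take the next l
    return eigvecs[:, 1:l + 1]
```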


Spectral clustering: summary
1. Construct a similarity graph
2. Map data to a lower-dimensional space using Laplacian eigenmaps on the adjacency matrix
3. Apply k-means clustering to the mapped points (see the sketch below)

[Figure: spectral clustering result]
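Pulling the blocks together, a minimal end-to-end sketch, assuming a fully connected heat-kernel graph (option 3) and scikit-learn's KMeans for the final step; the function name and default parameters are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, n_clusters=2, sigma=1.0, l=2):
    """Minimal spectral clustering: similarity graph -> Laplacian eigenmap -> k-means."""
    # 1. Fully connected similarity graph with heat-kernel weights (option 3)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / sigma)
    np.fill_diagonal(W, 0.0)
    # 2. Laplacian eigenmap on the adjacency matrix (unnormalised Laplacian)
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)
    Z = eigvecs[:, 1:l + 1]                # skip the constant eigenvector
    # 3. k-means clustering on the mapped points
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
```

scikit-learn's sklearn.cluster.SpectralClustering offers a full implementation of this kind of pipeline.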


This lecture
• Introduction to manifold learning
  ∗ Motivation
  ∗ Focus on data transformation
• Unfolding the manifold
  ∗ Geodesic distances
  ∗ Isomap algorithm
• Spectral clustering
  ∗ Laplacian eigenmaps
  ∗ Spectral clustering pipeline
