


Graph Signal Processing for Geometric Data and Beyond: Theory and Applications

Wei Hu, Member, IEEE, Jiahao Pang, Member, IEEE, Xianming Liu, Member, IEEE, Dong Tian, Senior Member, IEEE, Chia-Wen Lin, Fellow, IEEE, and Anthony Vetro, Fellow, IEEE

Abstract—Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to the irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited, while Graph Signal Processing (GSP)—a fast-developing field in the signal processing community—enables processing signals that reside on irregular domains and plays a critical role in numerous applications of geometric data, from low-level processing to high-level analysis. To further advance research in this field, we provide the first timely and comprehensive overview of GSP methodologies for geometric data in a unified manner, bridging the connections between geometric data and graphs, among the various geometric data modalities, and with spectral/nodal graph filtering techniques. We also discuss the recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP. We conclude with a brief discussion of open problems and challenges.

Index Terms—Graph Signal Processing (GSP), Geometric Data, Riemannian Manifold, Depth Image, Point Cloud

    I. INTRODUCTION

RECENT advances in depth sensing, laser scanning and image processing have enabled convenient acquisition and extraction of geometric data from real-world scenes, which can be digitized and formatted in a number of different ways. Efficiently representing, processing and analyzing geometric data is central to a wide range of applications, from augmented and virtual reality [1], [2] to autonomous driving [3] and surveillance/monitoring applications [4].

Geometric data may be represented in various data formats. It has been recognized by Adelson et al. [5] that different representations of a scene can be expressed as approximations of the plenoptic function, a high-dimensional mathematical representation that provides complete information about any point within a scene and how it changes when observed from different positions. This connection among the different scene representations has also been embraced

Wei Hu is with the Wangxuan Institute of Computer Technology, Peking University, Beijing, China (e-mail: [email protected]).

Jiahao Pang and Dong Tian are with InterDigital, Princeton, NJ, USA (e-mail: [email protected]; [email protected]).

Xianming Liu is with the Harbin Institute of Technology, Harbin, China (e-mail: [email protected]).

Chia-Wen Lin is with the Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan (e-mail: [email protected]).

Anthony Vetro is with Mitsubishi Electric Research Laboratories, Cambridge, MA, USA (e-mail: [email protected]).

Corresponding author: Chia-Wen Lin.

and reflected in the work plans for the development of the JPEG Pleno standardization framework [6]. In this paper, we mainly consider explicit representations of geometry, which directly describe the underlying geometry, but the framework and techniques extend to implicit representations of geometry, in which the underlying geometry is present in the data but needs to be inferred, e.g., from camera data. Examples of explicit geometric representations include 2D geometric data (e.g., depth maps), 3D geometric data (e.g., point clouds and meshes), and 4D geometric data (e.g., dynamic point clouds). Examples of implicit geometric representations include camera-based inputs, e.g., multiview video. For many cases of interest that aim to render immersive imagery of a scene, the focus will be on dense representations of geometry. However, there are also some applications of interest that benefit from sparse representations of geometry, such as human activity analysis, in which the geometry of the human body can be represented with few data points.

Traditional image/video processing techniques assume sampling patterns over regular grids and have limitations when dealing with the wide range of geometric data formats, some of which have irregular sampling patterns. To overcome the limitations of traditional techniques, Graph Signal Processing (GSP) techniques have been proposed and developed in recent years to process signals that reside over connected graph nodes [7]–[9]. For geometric data, each sample is denoted by a graph node, and the associated 3D coordinate (or depth) is the signal to be analyzed. The underlying surface of geometric data provides an intrinsic graph connectivity or graph topology. The graph-based representation has several advantages over conventional representations in that it is more compact and accurate, and also structure-adaptive, since it naturally captures geometric characteristics in the data, such as piece-wise smoothness (PWS) [10].

A unified framework of GSP for geometric data is illustrated in Fig. 1, in which we highlight how geometric data and graph operators are counterparts in the context of Riemannian manifolds. Given functions on Riemannian manifolds (surfaces), geometric data are discrete samples of the functions representing the geometry of objects. Correspondingly, graph operators are discrete counterparts of the functionals on Riemannian manifolds, which are capable of manipulating the 3D geometry data. Furthermore, it has been shown that graph operators converge to functionals on Riemannian manifolds under certain constraints [11], [12], while graph regularizers converge to smooth functionals on Riemannian manifolds [13], [14]. Hence, GSP tools are naturally advantageous for geometric data processing by representing the underlying topology of geometry on graphs.

arXiv:2008.01918v1 [cs.CV] 5 Aug 2020

Fig. 1: Illustration of a unified framework of GSP for geometric data processing.

The graph operator is typically constructed based on domain knowledge or driven by training data, as shown in Fig. 1. It essentially specifies a graph filtering process, which can be performed either in the spectral domain (i.e., graph transform domain) [15] or the nodal domain (i.e., spatial domain) [16]. Nodal-domain approaches typically avoid eigendecomposition for fast computation over large-scale data while still relying on spectral analysis to provide insights [17]. A nodal-domain method might also be specified through a graph regularizer to enforce graph-signal smoothness [18]–[20]. Sparsity and smoothness are two widely used domain models. Additionally, Graph Neural Networks (GNNs) have been developed to enable inference of graph signals, including geometric data [21].

In practice, GSP for geometric data plays a critical role in numerous applications, from low-level processing, such as restoration and compression, to high-level analysis. The processing of geometric data includes denoising, enhancement and resampling, as well as compression such as the point cloud coding standardized in MPEG¹ and JPEG Pleno², while the analysis of geometric data addresses supervised or unsupervised feature learning for classification, segmentation, detection and generation. These applications are unique relative to the use of GSP techniques for other data in terms of the signal model and processing methods.

The remainder of this paper is organized as follows. Section II reviews basic concepts in GSP, the graph Fourier transform, as well as the interpretation of graph variation operators in both the discrete and continuous domains. Section III introduces the graph representation of geometric data based on their characteristics, along with problems and applications of geometric data to be discussed throughout the paper. Then, we elaborate on spectral-domain GSP methods for geometric data in Section IV and nodal-domain GSP methods in Section V. GNNs for geometric data are presented in Section VI,

¹https://mpeg.chiariglione.org/standards/mpeg-i/point-cloud-compression
²https://jpeg.org/jpegpleno/

where we provide a potential interpretation of GNNs from the perspective of GSP. Finally, Section VII concludes this paper with remarks on open problems.

    II. REVIEW: GRAPH SIGNAL PROCESSING

    A. Graph Variation Operators and Graph Signal

We denote a graph $G = \{V, E, A\}$, which is composed of a vertex set $V$ of cardinality $|V| = N$, an edge set $E$ connecting vertices, and an adjacency matrix $\mathbf{A}$. Each entry $a_{i,j}$ in $\mathbf{A}$ represents the weight of the edge between vertices $i$ and $j$, which often captures the similarity between adjacent vertices. In geometric data processing, we often consider an undirected graph with non-negative edge weights, i.e., $a_{i,j} = a_{j,i} \ge 0$.

Among variation operators in GSP, we focus on the commonly used graph Laplacian matrix. The combinatorial graph Laplacian [7] is defined as $\mathbf{L} := \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is the degree matrix, a diagonal matrix with $d_{i,i} = \sum_{j=1}^{N} a_{i,j}$. Given real and non-negative edge weights in an undirected graph, $\mathbf{L}$ is real, symmetric, and positive semi-definite [22]. The symmetrically normalized version is $\mathbf{L}_{\mathrm{sym}} := \mathbf{D}^{-1/2}\mathbf{L}\mathbf{D}^{-1/2}$, and the random walk graph Laplacian is $\mathbf{L}_{\mathrm{rw}} := \mathbf{D}^{-1}\mathbf{L}$; these are often used for theoretical analysis or in neural networks due to their normalization property.
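As a concrete sanity check, the three Laplacian variants can be computed directly from a toy adjacency matrix. The graph below (a 4-cycle with unit weights) is an illustrative choice, not an example from the paper:

```python
import numpy as np

# Toy undirected graph on N = 4 vertices with non-negative edge weights.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])

D = np.diag(A.sum(axis=1))                   # degree matrix, d_ii = sum_j a_ij
L = D - A                                    # combinatorial graph Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt          # symmetrically normalized Laplacian
L_rw = np.linalg.inv(D) @ L                  # random-walk graph Laplacian

# For non-negative weights, L is symmetric and positive semi-definite:
eigvals = np.linalg.eigvalsh(L)
assert np.allclose(L, L.T)
assert eigvals.min() > -1e-10
```

Note that the rows of $\mathbf{L}$ sum to zero by construction, which is why the constant signal is always an eigenvector with eigenvalue $0$.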

A graph signal is a function that assigns a scalar or vector to each vertex. For simplicity, we consider $x: V \to \mathbb{R}$, such as the intensity on each vertex of a mesh. We denote graph signals as $\mathbf{x} \in \mathbb{R}^N$, where $x_i$ represents the signal value at the $i$-th vertex.

    B. Graph Fourier Transform

Because $\mathbf{L}$ is a real symmetric matrix, it admits an eigendecomposition $\mathbf{L} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top$, where $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_N]$ is an orthonormal matrix containing the eigenvectors $\mathbf{u}_i$, and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$ consists of the eigenvalues $\{\lambda_1 = 0 \le \lambda_2 \le \ldots \le \lambda_N\}$. We refer to the eigenvalue $\lambda_i$ as the graph frequency/spectrum, with a smaller eigenvalue corresponding to a lower graph frequency.


Fig. 2: Geometric data and their graph representations. The graphs of the patches enclosed in red squares are shown at the bottom; the vertices are colored by the corresponding graph signals. (a) 2D depth map [23]. (b) 3D point cloud [24]. (c) 4D dynamic point cloud [25], where the temporal edges of a point P are also shown.

For any graph signal $\mathbf{x} \in \mathbb{R}^N$ residing on the vertices of $G$, its graph Fourier transform (GFT) $\hat{\mathbf{x}} \in \mathbb{R}^N$ is defined as [15]

$$\hat{\mathbf{x}} = \mathbf{U}^\top \mathbf{x}. \tag{1}$$

The inverse GFT follows as

$$\mathbf{x} = \mathbf{U}\hat{\mathbf{x}}. \tag{2}$$
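A minimal numerical illustration of Eqs. (1) and (2); the path graph and the signal values below are arbitrary choices for demonstration:

```python
import numpy as np

# GFT on a path graph with 4 vertices.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(A.sum(1)) - A

# Eigendecomposition L = U Lam U^T; eigh returns eigenvalues in
# ascending order, i.e., from low to high graph frequency.
lam, U = np.linalg.eigh(L)

x = np.array([0.9, 1.1, 1.0, 3.0])   # a graph signal, one value per vertex
x_hat = U.T @ x                      # forward GFT, Eq. (1)
x_rec = U @ x_hat                    # inverse GFT, Eq. (2)
assert np.allclose(x, x_rec)         # U is orthonormal, so the GFT is invertible
```

Since $\mathbf{U}$ is orthonormal, the transform also preserves the signal energy (a Parseval relation), which underlies the energy-compaction arguments used later in the paper.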

With an appropriately constructed graph that captures the signal structure well, the GFT leads to a compact representation of the graph signal in the spectral domain, which is beneficial for geometric data processing such as reconstruction and compression. Other GFT definitions, such as [26]–[29], can also be employed without changing the framework.

    C. Interpretation of Graph Variation Operators

    The graph variation operators have various interpretations,both in the discrete domain and the continuous domain.

In the discrete domain, we can interpret graph Laplacian matrices as precision matrices under Gaussian-Markov Random Fields (GMRFs) from a probabilistic perspective, and thus further show that the GFT approximates the Karhunen-Loève transform (KLT) for signal decorrelation under GMRFs. As discussed in [30], there is a one-to-one correspondence between the precision matrices of different classes of GMRFs and types of graph Laplacian matrices. For instance, the combinatorial graph Laplacian corresponds to the precision matrix of an attractive, DC-intrinsic GMRF. Further, as the eigenvectors of the precision matrix (the inverse of the covariance matrix) constitute the basis of the KLT, the GFT approximates the KLT under a family of statistical processes, as proved in different ways in [10], [31]–[33]. This indicates that the GFT is approximately the optimal linear transform for signal decorrelation, which is beneficial to the compression of geometric data, as will be discussed in Section IV-C2.
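The GFT-KLT connection can be checked on a toy example. The sketch below assumes a GMRF whose precision matrix is $\mathbf{L} + \delta\mathbf{I}$ with a small hypothetical constant $\delta$ (a common regularized model, not the paper's exact construction); the covariance then shares eigenvectors with $\mathbf{L}$, so the KLT basis coincides with the GFT basis up to sign and ordering:

```python
import numpy as np

# Toy graph with distinct Laplacian eigenvalues.
A = np.array([[0., 1., 0.],
              [1., 0., 2.],
              [0., 2., 0.]])
L = np.diag(A.sum(1)) - A
delta = 0.1                                     # hypothetical regularization

lam, U = np.linalg.eigh(L)                      # GFT basis
cov = np.linalg.inv(L + delta * np.eye(3))      # GMRF covariance
mu, V = np.linalg.eigh(cov)                     # KLT basis (eigvecs of covariance)

# Same eigenvectors up to sign and reversed ordering: the alignment
# matrix |U^T V| is a permutation matrix.
align = np.abs(U.T @ V)
assert np.allclose(align @ align.T, np.eye(3), atol=1e-6)
```

Reversal of the ordering is expected: the largest covariance eigenvalue (the first KLT component) corresponds to the smallest precision eigenvalue, i.e., the lowest graph frequency.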

In the continuous domain, instead of viewing a neighborhood graph as inherently discrete, it can be treated as a discrete approximation of a Riemannian manifold [11], [34]. Thus, as the number of vertices on a graph increases, the graph converges to a Riemannian manifold. In this scenario, each observation of a graph signal is a discrete sample of a continuous signal (function) defined on the manifold. Note that not all graph signals can be interpreted in the continuous domain: voting patterns in a social network or paper information in a citation network are inherently discrete. With a focus on geometric data, which are indeed signals captured from a continuous surface, we have a continuous-domain interpretation of graph signals as discrete samples of a continuous function (Fig. 1). The link between neighborhood graphs and Riemannian manifolds enables us to process geometric data with tools from differential geometry and variational methods [12]. For instance, the graph Laplacian operator converges to the Laplace-Beltrami operator on the continuous manifold when the number of samples tends to infinity. Hence, without direct access to the underlying geometry (surface), it is still possible to infer the properties of the geometry based on its discrete samples.
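This convergence can be illustrated on the unit circle, where the Laplace-Beltrami eigenvalues are known to be $k^2$: rescaling the closed-form eigenvalues of the cycle-graph Laplacian by the squared sample spacing recovers them as $N$ grows. (This toy check is our illustration, not an experiment from the paper.)

```python
import numpy as np

# Sample N points uniformly on the unit circle, connect neighbors into a
# cycle graph, and compare rescaled graph-Laplacian eigenvalues with the
# Laplace-Beltrami eigenvalues k^2 on the circle.
def cycle_laplacian_eigs(N, num=4):
    k = np.arange(N)
    # Closed-form eigenvalues of the (circulant) cycle-graph Laplacian.
    lam = 2.0 - 2.0 * np.cos(2.0 * np.pi * k / N)
    h = 2.0 * np.pi / N                   # arc length between adjacent samples
    return np.sort(lam)[:num] / h**2      # rescale by the sample spacing

approx = cycle_laplacian_eigs(2000, num=4)
exact = np.array([0., 1., 1., 4.])        # k^2, multiplicity two for k >= 1
assert np.allclose(approx, exact, atol=1e-2)
```

Increasing $N$ tightens the match, mirroring the statement that the graph Laplacian converges to the Laplace-Beltrami operator as the number of samples tends to infinity.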

    III. GRAPH REPRESENTATIONS OF GEOMETRIC DATA

In this section, we elaborate on the graph representations of geometric data, which arise from the unique characteristics of geometric data and serve as the basis of GSP for geometric data processing. We also discuss and compare with non-graph representations, which helps in understanding the advantages and insights of graph representations.

A. Problems and Challenges of Geometric Data

There exist various problems associated with geometric data, e.g., noise, holes (incomplete data), compression artifacts, large data size, and irregular sampling. For instance, due to inherent limitations in the sensing capabilities and the viewpoints that are acquired, geometric data often suffer from noise and holes, which affect subsequent rendering or downstream inference tasks since the underlying structures are deformed.

These problems must be accounted for in the diverse range of applications that rely on geometric data, including processing (e.g., restoration and enhancement), compression, and analysis (e.g., classification, segmentation, and recognition). Some representative geometric datasets, along with the corresponding application scenarios, are summarized in Table I.

TABLE I: Representative Geometric Datasets and Relevant Application Scenarios.

| Geometric Data Format | Datasets | Contents | Typical Applications/Tasks |
|---|---|---|---|
| 2D depth map | FlyingThings3D [35] | Synthetic scene | Stereo matching, depth completion |
| | Middlebury [23], Tsukuba [36] | Indoor scene | |
| | KITTI [37] | Driving scene | |
| 3D point cloud | Stanford 3D Scanning Repository [38], Benchmark [39] | Single object | 3D telepresence, surface reconstruction |
| | MPEG Sequences [40], Microsoft Sequences [25] | Single person | |
| | ShapeNet [41], ModelNet [42] | Single object | Classification, part segmentation |
| | Stanford Large-Scale 3D Indoor Spaces Dataset [43], ScanNet [24] | Indoor scene | Semantic/instance segmentation |
| | KITTI [37], WAYMO Open Dataset [44] | Driving scene | Semantic/instance segmentation |
| 4D dynamic point cloud | MPEG Sequences [40], Microsoft Sequences [25] | Single person | 3D telepresence, compression |
| | KITTI [37], Semantic KITTI [45] | Driving scene | Semantic/instance segmentation, detection |

We assert that the chosen representation of geometric data is critically important in addressing these problems and applications. Next, we discuss the characteristics of geometric data, which lay the foundation for using graphs for the representation.

    B. Characteristics of Geometric Data

Geometric data represent the geometric structure underlying the surfaces of objects and scenes in the 3D world, and have unique characteristics that capture structural properties.

For example, 2D depth maps characterize the per-pixel physical distance between objects in a 3D scene and the sensor, which usually consists of sharp boundaries and smooth interior surfaces—referred to as piece-wise smoothness (PWS) [10]—as shown in Fig. 2(a). The PWS property is suitable to be described by a graph, where most edge weights are 1 for smooth surfaces and a few weights are 0 for discontinuities across sharp boundaries. Such a graph construction leads to a compact representation in the GFT domain, where most energy is concentrated in low-frequency components for the description of smooth surfaces [10].
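A one-dimensional sketch of such PWS-aware edge weights; the Gaussian kernel and the value of sigma are illustrative choices, not the paper's exact scheme:

```python
import numpy as np

# Edge weights on a 1-D slice of a depth map: weights near 1 inside
# smooth regions, near 0 across a sharp depth boundary.
depth = np.array([1.00, 1.01, 1.02, 5.00, 5.01, 5.02])  # two smooth pieces
sigma = 0.1                                              # hypothetical kernel width
diffs = np.diff(depth)
weights = np.exp(-diffs**2 / (2 * sigma**2))  # Gaussian kernel on depth gaps

assert np.all(weights[[0, 1, 3, 4]] > 0.99)   # smooth interior: w close to 1
assert weights[2] < 1e-6                      # across the boundary: w close to 0
```

The resulting weights approximate the 1/0 pattern described above, so a GFT built from them concentrates the signal energy in the low frequencies of each smooth piece.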

3D geometric data such as point clouds form omnidirectional representations of a geometric structure in the 3D world. As shown in Fig. 2(b), the underlying surface of 3D geometric data often exhibits PWS, as given by the normals of the data [46].

In 4D geometric data such as dynamic point clouds, there is a great deal of consistency along the temporal dimension [47], [48], as shown in Fig. 2(c). In contrast to conventional video data, the temporal correspondences in dynamic point clouds are difficult to track, mainly because 1) the sampling pattern may vary from frame to frame due to the irregular sampling; and 2) the number of points in each frame of a dynamic point cloud sequence may vary significantly.

    C. Non-Graph Representations of Geometric Data

Non-graph representations of geometric data are often quantized onto regular grids. For instance, depth maps are usually represented as standard gray-level images, while 3D point clouds are quantized onto regular voxels [49] or projected onto a set of images from multiple viewpoints [50]. These non-graph representations enable the processing and analysis of geometric data with classical methods proposed for Euclidean data such as images, videos, and regular 3D data.

However, such representations have the following limitations: 1) the representations are sometimes redundant, e.g., a voxel-based representation for point clouds still needs to represent unoccupied space without any point, leading to unnecessarily large data; 2) the representations are sometimes inaccurate, e.g., due to quantization loss introduced by voxels or projection errors when the point clouds are represented by a set of images; and 3) the representations are often deficient in capturing the underlying geometric structure explicitly.

    D. Graph Representations of Geometric Data

In contrast, graphs provide compact, accurate, and structure-adaptive representations for geometric data, which further inspire new insights and understanding.

To represent geometric data on a graph $G = \{V, E, A\}$, we consider points in the data (e.g., pixels in depth maps, points in point clouds and meshes) as vertices $V$ with cardinality $N$. Further, for the $i$-th point, we represent the coordinate and possibly associated attributes $(\mathbf{p}_i, \mathbf{a}_i)$ of each point as the graph signal on each vertex, where $\mathbf{p}_i \in \mathbb{R}^2$ or $\mathbf{p}_i \in \mathbb{R}^3$ represents the 2D or 3D coordinate of the $i$-th point (e.g., 2D for depth maps, and 3D for point clouds), and $\mathbf{a}_i$ represents associated attributes, such as depth values, RGB colors, reflection intensities, and surface normals. To ease mathematical computation, we denote the graph signal of all vertices by a matrix,

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_N^\top \end{bmatrix} \in \mathbb{R}^{N \times d}, \tag{3}$$

where the $i$-th row vector $\mathbf{x}_i^\top = [\mathbf{p}_i^\top\ \mathbf{a}_i^\top] \in \mathbb{R}^{1 \times d}$ represents the graph signal on the $i$-th vertex and $d$ denotes the dimension of the graph signal.

To capture the underlying structure, we use the edges $E$ in the graph to describe the pairwise relationships between points, which are encoded in the adjacency matrix $\mathbf{A}$, as reviewed in Section II. The construction of $\mathbf{A}$, i.e., graph construction, is crucial to characterizing the underlying topology of geometric data. We classify existing graph construction methods mainly into two families: 1) model-based graph construction, which builds graphs with models from domain knowledge [51], [52]; and 2) learning-based graph construction, which infers/learns the underlying graph from geometric data [53], [54].

Model-based graph construction for geometric data often assumes edge weights are inversely proportional to the affinities in coordinates, as in a K-nearest-neighbor graph (K-NN graph) and an $\epsilon$-neighborhood graph ($\epsilon$-N graph). In a K-NN graph, two vertices are connected by an edge when their Euclidean distance is among the K smallest Euclidean distances from one point to the others, while in an $\epsilon$-N graph, two vertices are connected if their Euclidean distance is smaller than a given threshold $\epsilon$. A K-NN graph tends to maintain a constant vertex degree, which may lead to a more stable algorithm implementation, while an $\epsilon$-N graph lets the vertex degree reflect the local point density, leading to more physical interpretability. Though these graphs exhibit manifold convergence properties [11], [12], it remains challenging to efficiently estimate the sparsification parameters such as K and $\epsilon$ given finite and non-uniformly sampled data.
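The two model-based constructions can be sketched in a few lines; the toy points and the values of K and epsilon below are illustrative choices:

```python
import numpy as np

# K-NN graph: connect each point to its K nearest neighbors, then
# symmetrize so the graph is undirected.
def knn_adjacency(points, K):
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-loops
    A = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :K]          # K nearest neighbors per point
    for i, nbrs in enumerate(idx):
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)                   # symmetrize (undirected graph)

# eps-N graph: connect all pairs closer than eps.
def eps_adjacency(points, eps):
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return ((d < eps) & (d > 0)).astype(float)

pts = np.array([[0., 0.], [0., 1.], [0., 2.], [10., 0.]])
A_knn = knn_adjacency(pts, K=1)   # every point gets a neighbor, even the outlier
A_eps = eps_adjacency(pts, eps=1.5)  # the far-away point stays isolated
```

The outlier at (10, 0) shows the trade-off described above: the K-NN graph forces an edge to it (constant degree), while the $\epsilon$-N graph leaves it disconnected (degree follows density).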

In learning-based graph construction, the underlying graph topology is inferred or optimized from geometric data in terms of a certain optimization criterion, such as enforcing low-frequency representations of observed signals. For example, given a single or partial observation, [20] optimizes a distance metric from relevant feature vectors on vertices by minimizing the graph Laplacian regularizer, leading to learned edge weights. Besides, edge weights can be trainable in an end-to-end learning manner [55]. Also, general graph learning methodologies can apply to the graph construction of geometric data [53], [54].

IV. SPECTRAL-DOMAIN GSP METHODS FOR GEOMETRIC DATA

Based on the aforementioned graph representations, we now elaborate on GSP methodologies for geometric data, starting from the spectral-domain methods that offer spectral interpretations.

    A. Basic Principles

Spectral-domain methods represent geometric data in a graph transform domain and perform filtering on the resulting transform coefficients. While various graph transforms exist, we focus our discussion on the Graph Fourier Transform (GFT) discussed in Section II-B, without loss of generality.

Let the frequency response of a graph spectral filter be denoted by $\hat{h}(\lambda_k)$, $k = 1, \ldots, N$; then graph spectral filtering takes the form

$$\mathbf{Y} = \mathbf{U} \begin{bmatrix} \hat{h}(\lambda_1) & & \\ & \ddots & \\ & & \hat{h}(\lambda_N) \end{bmatrix} \mathbf{U}^\top \mathbf{X}. \tag{4}$$

This filtering first transforms the geometric data $\mathbf{X}$ into the GFT domain $\mathbf{U}^\top\mathbf{X}$, performs filtering on each eigenvalue (i.e., the spectrum of the graph), and finally projects back to the spatial domain via the inverse GFT to acquire the filtered output $\mathbf{Y}$.
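Eq. (4) translates directly into code. The sketch below checks the pipeline with an all-pass response, which must return the input unchanged; the triangle graph and signal are toy choices:

```python
import numpy as np

# Spectral filtering per Eq. (4): Y = U diag(h(lam_1), ..., h(lam_N)) U^T X.
def graph_spectral_filter(L, X, h):
    lam, U = np.linalg.eigh(L)                 # GFT basis and graph frequencies
    X_hat = U.T @ X                            # forward GFT
    X_hat_filtered = h(lam)[:, None] * X_hat   # apply the per-frequency response
    return U @ X_hat_filtered                  # inverse GFT back to the nodal domain

A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
L = np.diag(A.sum(1)) - A
X = np.array([[0., 1.], [2., 3.], [4., 5.]])   # N x d graph-signal matrix
Y = graph_spectral_filter(L, X, lambda lam: np.ones_like(lam))  # all-pass
assert np.allclose(Y, X)
```

Any of the responses discussed next (ideal low-pass, Haar-like low-/high-pass) can be plugged in as the function `h`.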

As discussed in Section II-C, the GFT leads to a compact representation of geometric data if the constructed graph captures the underlying topology well. Based on the GFT representation, the key issue is to specify the $N$ graph frequency responses $\{\hat{h}(\lambda_k)\}_{k=1}^{N}$ to operate on the geometric data; these filters should be designed according to the specific task. Widely used filters include low-pass and high-pass graph spectral filters, which will be discussed further in the next subsection.

Due to the computational complexity of graph transforms, which often involve full eigendecomposition, this class of methods is either dedicated to small-scale geometric data or applied in a divide-and-conquer manner. For instance, one may divide a point cloud into regular cubes and perform graph spectral filtering on individual cubes separately. Also, one may deploy a fast algorithm for the GFT (e.g., the fast GFT in [28]) to accelerate the spectral filtering process.

    B. Representative Graph Spectral Filtering

1) Low-Pass Graph Spectral Filtering: Analogous to processing digital images on a regular 2D grid, we can use a low-pass graph filter to capture the rough shape of geometric data and attenuate noise, under the assumption that signals are smooth in the associated data domain. In practice, a geometric signal (e.g., coordinates, normals) is inherently smooth with respect to the underlying graph, where high-frequency components are likely to be generated by fine details or noise. Hence, we can perform geometric data smoothing via a low-pass graph filter, essentially leading to a smoothed representation in the underlying manifold.

One intuitive realization is an ideal low-pass graph filter, which completely eliminates all graph frequencies above a given bandwidth while keeping those below unchanged. The graph frequency response of an ideal low-pass graph filter with bandwidth $b$ is

$$\hat{h}(\lambda_k) = \begin{cases} 1, & k \le b, \\ 0, & k > b, \end{cases} \tag{5}$$


Fig. 3: Low-pass approximation of the point cloud Bunny. Plot (a) is the original point cloud with 35,947 points. Plots (b), (c), and (d) show the low-pass approximations with 10, 100, and 400 graph frequency components, respectively. Plot (e) presents the main graph spectral distribution, with frequencies higher than 500 omitted as the corresponding magnitudes are around zero.

    which projects the input geometric data into a bandlimitedsubspace by removing components corresponding to largeeigenvalues (i.e., high-frequency components).

The smoothed result provides a bandlimited approximation of the original geometric data. Fig. 3 demonstrates an example of the bandlimited approximation of the 3D coordinates of the point cloud Bunny (35,947 points) [38] with 10, 100 and 400 graph frequencies, respectively. Specifically, we construct a K-NN graph (K = 10) on the point cloud and compute the corresponding GFT. Then we set the respective bandwidth for low-pass filtering as in (4) and (5). One can observe that the first 10 low-frequency components are able to represent the rough shape, with finer details becoming more apparent with additional graph frequencies. This validates the assertion that the GFT achieves energy compaction for geometric data.
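To make this procedure concrete, the following is a minimal NumPy sketch of such a bandlimited approximation on a small synthetic point cloud; the helper `knn_laplacian`, the Gaussian edge weights and all sizes are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def knn_laplacian(points, k=10, sigma=1.0):
    """Combinatorial Laplacian of a k-NN graph with Gaussian edge weights."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]        # skip the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                       # symmetrize the k-NN graph
    return np.diag(W.sum(1)) - W

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3))               # stand-in for a point cloud

Lap = knn_laplacian(points, k=10)
lam, U = np.linalg.eigh(Lap)                     # ascending eigenvalues: GFT basis

b = 20                                           # bandwidth of the ideal filter
h = (np.arange(len(lam)) < b).astype(float)      # ĥ(λk) = 1 for k ≤ b, else 0
approx = U @ (h[:, None] * (U.T @ points))       # filter each coordinate signal

print(approx.shape)                              # low-pass approximation, (200, 3)
```

Increasing the bandwidth b monotonically reduces the approximation error, mirroring the progression from Fig. 3(b) to 3(d).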

Another simple choice is a Haar-like low-pass graph filter, as discussed in [56], with the graph frequency response

    ĥ(λk) = 1− λk/λmax, (6)

where λmax = λN is the maximum eigenvalue, used for normalization. As λk−1 ≤ λk, we have ĥ(λk−1) ≥ ĥ(λk). As such, low-frequency components are preserved while high-frequency components are attenuated.

2) High-Pass Graph Spectral Filtering: In contrast to low-pass filtering, high-pass filtering eliminates low-frequency components and detects large variations in geometric data, such as geometric contours or texture variations. A simple design is a Haar-like high-pass graph filter with the following graph frequency response:

    ĥ(λk) = λk/λmax. (7)

As λk−1 ≤ λk, we have ĥ(λk−1) ≤ ĥ(λk). This indicates that low-frequency components are attenuated while high-frequency components are preserved.

3) Graph Spectral Filtering with a Desired Distribution: We may also design a desired spectral distribution and then fit graph filter coefficients to this distribution. For example, an L-length polynomial graph filter takes the form of a diagonal matrix:

ĥ(Λ) = diag( ∑_{k=0}^{L−1} ĥk λ1^k, . . . , ∑_{k=0}^{L−1} ĥk λN^k ), (8)

where Λ is a diagonal matrix containing the eigenvalues of the graph Laplacian Ł as discussed in Section II-B, and the ĥk are the filter coefficients. If the desired response of the i-th graph frequency is ci, we let

ĥ(λi) = ∑_{k=0}^{L−1} ĥk λi^k = ci, (9)

and solve a set of linear equations to obtain the graph filter coefficients ĥk. An alternative way to construct such a graph filter is via the Chebyshev polynomial coefficients introduced in [15].
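The fitting step of (9) can be sketched as a least-squares solve over a Vandermonde matrix of eigenvalues; the eigenvalue grid and target response below are arbitrary toy choices of ours, not taken from the paper.

```python
import numpy as np

lam = np.linspace(0.0, 2.0, 50)              # assumed eigenvalues of Ł
c = np.exp(-2.0 * lam)                       # desired responses ci (toy target)

L_taps = 6                                   # filter length L
V = np.vander(lam, L_taps, increasing=True)  # V[i, k] = λi**k, as in eq. (9)
h_coef, *_ = np.linalg.lstsq(V, c, rcond=None)

fitted = V @ h_coef
print(np.max(np.abs(fitted - c)))            # small fitting error
```

With more eigenvalues than coefficients the system is overdetermined, so least squares replaces an exact solve; a short filter already fits this smooth target closely.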

    C. Applications in Geometric Data

Having discussed graph spectral filtering, we now review some representative applications of spectral-domain GSP methods for geometric data, including restoration and compression.

1) Geometric Data Restoration: Low-pass graph spectral filtering is often designed for geometric data restoration such as denoising. As demonstrated in the example of Fig. 3, clean geometric data such as point clouds are dominated by low-frequency components in the GFT domain. Hence, a carefully designed low-pass filter is able to remove high-frequency components that are likely introduced by noise or outliers.

Based on this principle, Hu et al. proposed depth map denoising by iterative thresholding in the GFT domain [57]. To jointly exploit the local smoothness and non-local self-similarity of a depth map, they cluster self-similar patches and compute an average patch, from which a graph is deduced to describe correlations among adjacent pixels. The self-similar patches are then transformed into the same GFT domain, where the GFT basis is computed from the derived correlation graph. Finally, iterative thresholding in the GFT domain is performed as the ideal low-pass graph filter in (5) to enforce group sparsity.

Rosman et al. also proposed spectral point cloud denoising based on the non-local framework [58]. Similar to Block-Matching 3D filtering (BM3D) [59], they group similar surface patches into a collaborative patch and compute the graph Laplacian from this grouping. They then perform shrinkage in the GFT domain via a low-pass filter similar to (5), which denoises the collaborative patch.

In contrast, high-pass graph filtering can be used to detect contours in 3D point cloud data, as these are usually represented by high-frequency components. For instance, Chen et al. proposed a high-pass graph-filtering-based resampling strategy to highlight contours for large-scale point cloud visualization; the same technique can also be used to extract key points for accurate 3D registration [56].

2) Geometric Data Compression: Transform-based coding is generally a low-pass filtering approach. When coding piece-wise smooth geometric data, the GFT produces small or zero high-frequency components since it does not filter across boundaries, thus leading to a compact representation in the transform domain. Further, as discussed in Section II-C, the GFT approximates the KLT in terms of optimal signal decorrelation under a family of statistical processes.

Graph transform coding is suitable for depth maps due to their piece-wise smoothness. Shen et al. first introduced a graph-based representation for depth maps that is adaptive to depth discontinuities, transforming the depth map into the GFT domain for compression and outperforming traditional DCT coding [51]. Variants of this work include [60], [61]. To further exploit the piece-wise smoothness of depth maps, Hu et al. proposed a multi-resolution compression framework, where boundaries are encoded at the original high resolution to preserve sharpness, and smooth surfaces are encoded at low resolution for greater efficiency [10], [62]. It is also shown in [10] that the GFT approximates the KLT under a model specifically designed to characterize piece-wise smooth signals. Other graph transforms for depth map coding include Generalized Graph Fourier Transforms (GGFTs) [32] and lifting transforms on graphs [63].

3D point clouds also exhibit a certain piece-wise smoothness in both geometry and attributes. Zhang et al. first proposed using graph transforms for attribute compression of static point clouds [52], where graphs are constructed over local neighborhoods in the point cloud by connecting nearby points, and the attributes are treated as graph signals. The graph transform decorrelates the signal and was found to be much more efficient than traditional octree-based coding methods. Other follow-up work includes graph transforms for sparse point clouds [64], [65], graph transforms with optimized Laplacian sparsity [66], normal-weighted graph transforms [67], the Gaussian Process Transform (GPT) [68], and graph transforms for the enhancement layer [69].

In 4D dynamic point clouds, motion estimation becomes necessary to remove temporal redundancy [47], [70], [71]. Thanou et al. represented the time-varying geometry of dynamic point clouds with a set of graphs, and considered the 3D positions and color attributes of the point clouds as signals on the vertices of the graphs [47]. Motion estimation is then cast as a feature matching problem between successive graphs based on spectral graph wavelets. Dynamic point cloud compression remains a challenging task, as each frame is irregularly sampled without any explicit temporal point-wise correspondence with neighboring frames.

V. NODAL-DOMAIN GSP METHODS FOR GEOMETRIC DATA

    A. Basic Principles

In contrast to spectral-domain GSP methods, this class of methods performs filtering on geometric data locally in the nodal domain, which is often computationally efficient and thus amenable to large-scale data.

Let Nn,p be the set of p-hop neighborhood nodes of the n-th vertex, whose cardinality often varies with n. Nodal-domain filtering is typically defined as a linear combination of the signals on local neighboring vertices:

yn := ∑_{j∈Nn,p} hn,j xj, (10)

where hn,j denotes the coefficients of the graph filter. Since Nn,p is node-dependent, hn,j needs to be properly defined according to n.

Typically, hn,j may be parameterized as a function of the adjacency matrix A:

    y = h(A)x, (11)

    where

h(A) = ∑_{k=0}^{K−1} hk A^k = h0 I + h1 A + . . . + hK−1 A^{K−1}. (12)

Here hk is the k-th filter coefficient, which quantifies the contribution from the k-hop neighbors, and K is the length of the graph filter. By definition, A^k determines the k-hop neighborhood; thus a higher order corresponds to a larger filtering range in the graph vertex domain. When A operates on a graph signal, it computes, at each vertex, a weighted average of the neighboring signal values, which is essentially a low-pass filter.

A can be replaced by other graph operators such as the graph Laplacian Ł:

h(Ł) = ∑_{k=0}^{K−1} hk Ł^k = h0 I + h1 Ł + . . . + hK−1 Ł^{K−1}. (13)

When Ł operates on a graph signal, it sums up, at each vertex, the signal differences between the vertex and its neighbors, which is essentially a high-pass filter.
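A minimal sketch of nodal-domain filtering with a polynomial filter as in (12), accumulating powers of A through repeated matrix-vector products rather than forming A^k explicitly; the graph and filter taps are illustrative only.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # path graph on 4 nodes
x = np.array([1.0, 0.0, 0.0, 0.0])          # impulse at node 0
h = [0.5, 0.3, 0.2]                         # K = 3 filter taps

y = np.zeros_like(x)
Akx = x.copy()                              # A**0 @ x
for hk in h:
    y += hk * Akx                           # accumulate hk * (A**k @ x)
    Akx = A @ Akx                           # next power via a mat-vec only

print(y)  # → [0.7 0.3 0.2 0. ]
```

Each tap hk spreads the impulse one extra hop, which is exactly the "larger filtering range for higher order" behavior described above.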

B. Nodal-domain Optimization

Besides direct filtering as in (10) or (11), nodal-domain methods often employ graph priors for regularization. Graph Smoothness Regularizers (GSRs), which introduce prior knowledge about smoothness of the underlying graph signal, play a critical role in a wide range of inverse problems, such as depth map denoising [13], [57], point cloud denoising [18], [20], and inpainting [75].

1) Formulation: In general, restoring a geometric datum x with a signal prior, e.g., a GSR, is formulated as the following maximum a posteriori optimization problem:

x⋆ = arg min_x ‖y − H(x)‖₂² + µ · GSR(x, G), (14)

where y is the observed signal and H(·) is a degradation operator (e.g., down-sampling) defined over x. The first term in (14) is a data fidelity term; µ ∈ R balances the importance between the data fidelity term and the signal prior.

Next, we discuss two classes of commonly used GSRs, the Graph Laplacian Regularizer (GLR) and the Graph Total Variation (GTV), as well as techniques to solve (14) with these priors. A comparison of the properties of different GSRs is summarized in Table II.


TABLE II: Properties of different Graph Smoothness Regularizers (GSRs).

GSR                                    Math Expression                     Dynamic   Typical Solver               Typical Works
Graph Laplacian Regularizer (GLR)      ∑_{i∼j} ai,j · (xi − xj)²           No        Direct solver / CG to (17)   [13], [18], [20]
Reweighted GLR (RGLR)                  ∑_{i∼j} ai,j(xi, xj) · (xi − xj)²   Yes       Proximal gradient            [46]
Graph Total Variation (GTV)            ∑_{i∼j} ai,j · |xi − xj|            No        Primal-dual method           [72], [73]
Reweighted GTV (RGTV)                  ∑_{i∼j} ai,j(xi, xj) · |xi − xj|    Yes       ADMM                         [74]

2) Graph Laplacian Regularizer (GLR): The most commonly used GSR is the GLR. Given a graph signal x residing on the vertices of G, with the graph encoded in the graph Laplacian Ł, the GLR is expressed as

x⊤Łx = ∑_{i∼j} ai,j · (xi − xj)², (15)

where i ∼ j means vertices i and j are connected, implying the underlying points on the geometry are highly correlated; ai,j is the corresponding element of the adjacency matrix A. The signal x is smooth with respect to G if the GLR is small, as connected vertices xi and xj must be similar for a large edge weight ai,j, while for a small ai,j, xi and xj may differ significantly. This prior also possesses an interpretation in the frequency domain:

x⊤Łx = ∑_{k=1}^{N} λk x̂k², (16)

where λk is the k-th eigenvalue of Ł, and x̂k is the k-th GFT coefficient. In other words, x̂k² is the energy in the k-th graph frequency for the geometric data x. Thus, a small x⊤Łx means that most of the signal energy is concentrated in the low-frequency components.

When we employ the GLR as the prior in (14) and assume H(·) is differentiable, (14) exhibits a closed-form solution. For simplicity, assume H = I (e.g., as in the denoising case); then setting the derivative of (14) to zero yields

x⋆ = (I + µŁ)⁻¹ y, (17)

which is a set of linear equations that can be solved directly or with conjugate gradient (CG) [76]. As Ł is a high-pass operator, the solution in (17) is essentially an adaptive low-pass filtering of the observation y. This is also indicated by the corresponding graph spectral response:

    ĥ(λk) = 1/(1 + µλk), (18)

which is a low-pass filter, since smaller λk's correspond to lower frequencies. As described in Section IV-B1, this low-pass filtering leads to smoothed geometric data with the underlying shape retained.
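As a toy illustration of (17), the following sketch denoises a piece-wise constant signal on a path graph; the graph, noise level and µ are our own choices, not from the paper.

```python
import numpy as np

n = 50
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0            # path graph, unit weights
Lap = np.diag(W.sum(1)) - W

rng = np.random.default_rng(1)
clean = np.where(np.arange(n) < n // 2, 0.0, 1.0)  # piece-wise constant signal
y = clean + 0.2 * rng.normal(size=n)               # noisy observation

mu = 2.0
x_star = np.linalg.solve(np.eye(n) + mu * Lap, y)  # eq. (17); CG also works

# Adaptive low-pass filtering: the denoised signal is closer to the clean one.
print(np.linalg.norm(x_star - clean) < np.linalg.norm(y - clean))
```

For large graphs one would solve the same system with CG on a sparse Ł rather than a dense direct solve.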

Further, as discussed in Section II-C, the graph Laplacian operator converges to the Laplace-Beltrami operator on the geometry in the continuous manifold as the number of samples tends to infinity. We can thus also interpret the GLR from a continuous manifold perspective. According to [12], given a Riemannian manifold M (or surface) and a set of N points uniformly sampled on M, an ε-neighborhood graph G can be constructed with each vertex corresponding to one sample on M. For a function x on the manifold M and its discrete samples x on the graph G (a graph signal), under mild conditions,

lim_{N→∞, ε→0} x⊤Łx ∼ (1/|M|) ∫_M ‖∇M x(s)‖₂² ds, (19)

where ∇M is the gradient operator on the manifold M, and ds is the natural volume element of M [12]. In other words, the GLR converges to a smoothness functional defined on the associated Riemannian manifold. The relationship (19) reveals that the GLR essentially regularizes graph signals with respect to the underlying manifold geometry, which justifies its usefulness [77].

In the aforementioned GLR, the graph Laplacian Ł is fixed, which does not promote reconstruction of a target signal with discontinuities unless the corresponding edge weights are very small. The GLR is thus extended to the Reweighted GLR (RGLR) in [13], [74], [78] by treating Ł as a learnable function of the graph signal x. The RGLR is defined as

x⊤Ł(x)x = ∑_{i∼j} ai,j(xi, xj) · (xi − xj)², (20)

where ai,j(xi, xj) can be learned from the data. There are now two optimization variables, x and ai,j, which can be optimized alternately via proximal gradient [79].

It has been shown in [74] that minimizing the RGLR iteratively can promote piece-wise smoothness in the reconstructed graph signal x, assuming that the edge weights are appropriately initialized. Since geometric data often exhibit piece-wise smoothness as discussed in Section III-B, the RGLR helps to promote this property in the reconstruction process.

3) Graph Total Variation (GTV): Another popular line of GSRs generalizes the well-known Total Variation (TV) regularizer [80] to graph signals, leading to the Graph Total Variation (GTV) and its variants. The GTV is defined as [81]:

‖x‖GTV = ∑_{i∼j} ai,j · |xi − xj|, (21)

where ai,j is fixed during the optimization. Since the GTV is non-differentiable, (14) with this prior has no closed-form solution, but it can be solved via existing optimization methods such as the primal-dual algorithm [82].
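A toy computation contrasting the GTV (21) with the quadratic GLR (15) on a path graph with unit weights; the two signals are our own examples, chosen to show why the absolute-difference penalty is friendlier to sharp discontinuities.

```python
import numpy as np

n = 8
step = np.where(np.arange(n) < n // 2, 0.0, 1.0)   # sharp discontinuity
ramp = np.arange(n) / (n - 1)                      # smooth transition

def gtv(x):   # eq. (21) on a path graph: Σ |xi − xj| over consecutive nodes
    return np.abs(np.diff(x)).sum()

def glr(x):   # eq. (15) on the same graph: Σ (xi − xj)²
    return (np.diff(x) ** 2).sum()

print(gtv(step), gtv(ramp))   # both ≈ 1.0: GTV does not over-penalize the step
print(glr(step), glr(ramp))   # 1.0 vs ≈ 0.14: GLR strongly penalizes the step
```

Both signals rise by the same total amount, yet the GLR charges the step roughly seven times more than the ramp, so a GLR-regularized solution tends to blur edges that a GTV-regularized one preserves.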

Instead of using a fixed A, Bai et al. extended the conventional GTV to the Reweighted GTV (RGTV) [74], where the graph weights depend on x:

‖x‖RGTV = ∑_{i∼j} ai,j(xi, xj) · |xi − xj|. (22)

This can be solved by ADMM [83] or the algorithm proposed in [74].


Fig. 4: Point cloud denoising results with Gaussian noise σ = 0.04 for Quasimoto [39]: (a) the ground truth; (b) the noisy point cloud; (c) the denoised result of graph spectral low-pass (LP) filtering that we implement according to (5); (d) the denoised result of the nodal-domain GSP method in [18]; (e) the denoised result of the nodal-domain GSP method in [20].

The work of [74] also provides spectral interpretations of the GTV and RGTV by rewriting them as ℓ1-Laplacian operators on a graph. The spectral analysis demonstrates that the GTV is a stronger PWS-preserving filter than the GLR, and that the RGTV has desirable properties including robustness to noise and blur and the promotion of sharpness. Hence, the RGTV is advantageous for boosting the piece-wise smoothness of geometric data.

    C. Applications in Geometric Data

In the following, we review a few works on geometric data restoration with nodal-domain GSP methods. We first present applications recovering geometric data with the simple yet effective GLR, and then extend our scope to more advanced graph smoothness regularizers.

1) Geometric Data Restoration with the GLR: To cope with various geometric data restoration problems, GLR-based methods place more emphasis on the choice of the neighborhood graph and the algorithm design. For instance, to remove additive white Gaussian noise (AWGN) from depth images, Pang and Cheung [13] adopted the formulation in (14) with the GLR. To understand the behavior of the GLR for 2D depth images, [13] performs an analysis in the continuous domain, leading to an ε-neighborhood graph (Section III-D) that not only smooths out noise but also sharpens edges.

Zeng et al. [18] applied the GLR to point cloud denoising. In contrast to [13], they first formulated the denoising problem with a low-dimensional manifold model (LDMM) [84]. The LDMM prior suggests that clean point cloud patches are samples from a low-dimensional manifold embedded in a high-dimensional space, though it is non-trivial to minimize the dimension of a Riemannian manifold. With (19) and tools from differential geometry [85], it is possible to "convert" the LDMM signal prior to the GLR. Hence, the problem of minimizing the manifold dimension is approximated by iteratively solving a quadratic program with the GLR.

Instead of constructing the underlying graph with pre-defined edge weights from hand-crafted parameters, Hu et al. proposed feature graph learning by minimizing the GLR with the Mahalanobis distance metric matrix M as a variable, assuming a feature vector is available per node [20]. The graph Laplacian Ł then becomes a function of M, i.e., Ł(M). A fast algorithm is presented and applied to point cloud denoising, where the graph for each set of self-similar patches is computed from 3D coordinates and surface normals as features.

2) Geometric Data Restoration with Other GSRs: Despite the simplicity of the GLR, applying it to geometric data restoration involves sophisticated graph construction or algorithmic procedures. This has motivated the development of other geometric data restoration methods using various GSRs tailored to specific restoration tasks.

To remove noise from point clouds, the method proposed in [73] first assumes smoothness in the gradient ∇G Y of the point cloud Y on a graph G, leading to a Tikhonov regularization GSRTik(Y) = ‖∇G Y‖₂², which is equivalent to the simple GLR. The method further assumes the underlying manifold of the point cloud to be piece-wise smooth rather than smooth, and accordingly replaces the Tikhonov regularization with the GTV regularization (21), i.e., GSRTV(Y) = ‖∇G Y‖₁. In [72], Elmoataz et al. also applied the GTV to mesh filtering to simplify 3D geometry.

In [46], Dinesh et al. applied the RGTV (22) to regularize surface normals for point cloud denoising, where the edge weight between two nodes is a function of the normals. Moreover, they established a linear relationship between normals and 3D point coordinates via bipartite graph approximation for ease of optimization. For point cloud inpainting, Hu et al. [75] applied a modified GTV called the Anisotropic GTV (AGTV) as a metric to measure the similarity of point cloud patches.

3) Restoration with Nodal-domain Filtering: Solving optimization problems can be formidable and sometimes even impractical. An alternative strategy for geometric data recovery is to perform filtering in the nodal domain as discussed in Section V-A. Examples include point cloud re-sampling [56] and depth image enhancement [86]. Essentially, nodal-domain filtering aims at "averaging" the samples of a graph signal adaptively, either locally or non-locally.


Fig. 5: Graph convolution operations: (a) spectral graph convolution; (b) spatial graph convolution (aggregating 1-hop and 2-hop neighbors).

    D. Discussion on Spectral- and Nodal-Domain GSP Methods

There exists a close connection between spectral-domain and nodal-domain methods; as discussed earlier, there is a correspondence between spatial graph filters and their graph spectral responses. As an example, consider the filter in (13), where Ł^k = (UΛU⊤)^k = UΛ^kU⊤ since U⊤U = I. It follows that (13) can be rewritten as

h(Ł) = U h(Λ) U⊤, (23)

which corresponds to a graph spectral filter with h(Λ) as the spectral response in (4). Another example is the spectral response of the solution to a GLR-regularized optimization problem, as presented in (18).
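This equivalence is easy to verify numerically. Below is a small check, with an arbitrary toy graph and filter taps of our choosing, that a polynomial filter h(Ł) applied in the nodal domain matches U h(Λ) U⊤ applied in the spectral domain.

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Lap = np.diag(W.sum(1)) - W
h = [1.0, -0.4, 0.05]                        # taps of h(Ł) = I − 0.4Ł + 0.05Ł²

nodal = sum(hk * np.linalg.matrix_power(Lap, k) for k, hk in enumerate(h))

lam, U = np.linalg.eigh(Lap)
resp = sum(hk * lam ** k for k, hk in enumerate(h))   # h(λ) per graph frequency
spectral = U @ np.diag(resp) @ U.T                    # eq. (23)

print(np.allclose(nodal, spectral))          # → True
```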

In addition, some nodal-domain graph filtering methods are approximations of spectral-domain filtering, including polynomial approximations of graph spectral filters such as Chebyshev polynomials [15]–[17] for depth map enhancement, as well as lifting transforms on graphs [63] for depth map coding.

Comparing the two classes of methods: as mentioned earlier, spectral methods entail higher computational complexity due to eigen-decomposition, whereas spatial methods avoid this complexity and are thus more amenable to large-scale geometric data. Also, the graph transforms employed in spectral methods are mostly global transforms, which capture global features, whereas spatial methods are often applied to capture local features of graph signals. Taking point cloud denoising as an example, Fig. 4 shows denoising results comparing one graph spectral low-pass filtering method that we implement according to (5) with two state-of-the-art nodal-domain GSP methods [18], [20]. We can observe from these results that the nodal-domain methods reconstruct local structural features better, including fine components such as the tobacco pipe.

VI. GRAPH NEURAL NETWORKS FOR GEOMETRIC DATA

The aforementioned GSP methods are model-based, built upon domain knowledge and the characteristics of geometric data. In contrast, learning-based methods infer filter parameters in a data-driven manner, as in the recently developed field of geometric deep learning [21].

Convolutional Neural Networks (CNNs) have proven extremely effective for a wide range of imaging tasks, but were designed to process data defined on regular grids, where the spatial relationships between data samples (e.g., top, bottom, left and right) are uniquely defined. To leverage such networks for geometric data, some prior works transform irregular geometric data into regular 3D voxel grids or collections of 2D images before feeding them to a neural network [87], [88], or impose certain symmetries in the network computation (e.g., PointNet [89], PointNet++ [90], PointCNN [91]). As discussed in Section III-C, such non-graph representations are sometimes redundant, inaccurate or deficient in their structural description.

In contrast, GSP provides efficient filtering and sampling of such data with insightful spectral interpretations, which makes it possible to generalize the key operations of a neural network (e.g., convolution and pooling) to irregular geometric data. For example, graph convolution can be defined as graph filtering in either the spectral or the spatial domain. This leads to the recently developed Graph Neural Networks (GNNs) (see [21] and references therein), which generalize CNNs to unstructured data. The input features at each point (vertex) of a GNN are usually coordinates, laser intensities or colors, while the features at each edge are usually geometric similarities between the two connected points.

In the remainder of this section, we analyze GNNs from the perspective of GSP for geometric data, focusing on the key graph convolution operators in the spectral and nodal domains. This permits greater interpretability of GNNs.

A. Spectral Graph Convolution

As there is no clear definition of shift invariance over graphs in the nodal domain, one may define graph convolution in the spectral domain via graph transforms according to the Convolution Theorem. That is, the graph convolution of a signal f ∈ Rᴺ and a filter g ∈ Rᴺ in the spectral domain with respect to the underlying graph G can be expressed as the element-wise product of their graph transforms:

g ⋆G f = U(U⊤g ⊙ U⊤f), (24)

where U is the GFT basis and ⊙ denotes the Hadamard product. Letting gθ = diag(U⊤g), the graph convolution can be simplified as

g ⋆G f = U gθ U⊤ f. (25)

The key difference among various spectral GNN methods is the choice of the filter gθ, which captures the holistic appearance of the geometry. In an early work [92], gθ = Θ is a learnable diagonal matrix.
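The definitions (24) and (25) can be checked numerically on a random weighted graph; everything below is an illustrative toy with random signals, not a trained network.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(6, 6))
W = np.abs(B + B.T) * (1 - np.eye(6))        # random symmetric weighted graph
Lap = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(Lap)                 # GFT basis

f = rng.normal(size=6)                       # graph signal
g = rng.normal(size=6)                       # graph filter

conv24 = U @ ((U.T @ g) * (U.T @ f))         # eq. (24): Hadamard in GFT domain
g_theta = np.diag(U.T @ g)                   # gθ = diag(U⊤g)
conv25 = U @ g_theta @ U.T @ f               # eq. (25)

print(np.allclose(conv24, conv25))           # → True
```

A learnable spectral filter as in [92] simply replaces the fixed diagonal g_theta with trainable parameters.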

As schematically shown in Fig. 5(a), the spectral-domain graph convolution (25) is essentially the spectral graph filtering defined in (4) if the diagonal entries of gθ are the graph frequency responses ĥ(λk). As gθ is often learned so as to adapt to various tasks, it is analogous to graph spectral filtering with a desired distribution as discussed in Section IV-B3. Hence, we are able to interpret spectral-domain graph convolution via spectral graph filtering for geometric data processing.

Fig. 6: Example applications of Graph Neural Networks (GNNs) on geometric data: (a) point cloud detection (KITTI) [37]; (b) point cloud segmentation (ShapeNet) [41]; (c) point cloud segmentation (S3DIS) [43]; (d) point cloud classification (ModelNet) [42]; (e) RGB+Depth segmentation (NYUD2) [106]; (f) point cloud generation (ShapeNet) [41]; (g) single image to 3D mesh (ShapeNet) [41].

It was noted earlier that the eigen-decomposition required by spectral-domain graph convolution incurs relatively high computational complexity. One may instead parameterize the filter with a smooth spectral transfer function Θ(Λ) [93]. One choice is to represent Θ(Λ) as a K-degree polynomial, such as the Chebyshev polynomial, which approximates the graph kernel well [15]:

Θ(Λ) = ∑_{k=0}^{K−1} Tk(Λ̃), (26)

where Tk(·) denotes the Chebyshev polynomial, defined by T0(Λ̃) = I, T1(Λ̃) = Λ̃, and Tk(Λ̃) = 2Λ̃Tk−1(Λ̃) − Tk−2(Λ̃). Λ̃ denotes the eigenvalues normalized into [−1, 1], the domain on which the Chebyshev polynomials are defined.

    Combining (25) and (26), we have

g ⋆G f = UΘ(Λ)U⊤f (27)
       ≈ U ∑_{k=0}^{K−1} Tk(Λ̃) U⊤f = ∑_{k=0}^{K−1} Tk(L̃) f, (28)

where L̃ = UΛ̃U⊤ is the normalized graph Laplacian. This leads to the well-known ChebNet [94]. If only the 1-degree Chebyshev polynomial is considered, it essentially reduces to the GCN [95].

The spectrum-free convolution in (28) is essentially one class of the nodal-domain graph filtering presented in (13).
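The recurrence underlying (28) can be sketched in a few lines. The Chebyshev coefficients θk below are arbitrary choices of ours (the unweighted sum in (26) corresponds to θk = 1), and no eigenvectors are used in the filtering loop; only an eigenvalue extreme is computed to rescale the spectrum into [−1, 1].

```python
import numpy as np

W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)    # 4-cycle graph
Lap = np.diag(W.sum(1)) - W

lmax = np.linalg.eigvalsh(Lap).max()
L_tilde = 2.0 * Lap / lmax - np.eye(4)       # eigenvalues rescaled into [−1, 1]

f = np.array([1.0, 0.0, 0.0, 0.0])           # impulse signal
theta = [0.6, 0.3, 0.1]                      # K = 3 Chebyshev coefficients

T_prev, T_curr = f, L_tilde @ f              # T0(L̃)f and T1(L̃)f
out = theta[0] * T_prev + theta[1] * T_curr
for th in theta[2:]:
    T_next = 2.0 * L_tilde @ T_curr - T_prev # three-term recurrence
    out = out + th * T_next
    T_prev, T_curr = T_curr, T_next

print(out)                                   # filtered signal, K mat-vecs total
```

In a ChebNet layer the θk become trainable weights, while the recurrence itself is unchanged.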

    B. Spatial Graph Convolution

Analogous to the convolution in CNNs, spatial graph convolution aggregates information from neighboring vertices to capture local geometric structure in the spatial domain, leading to feature propagation over adjacent vertices that enforces the smoothness of geometric data to some extent [96]–[98]. From the perspective of GSP, such graph convolution filters over the neighborhood of each vertex are essentially nodal-domain graph filters.

In their pioneering work [96], Simonovsky et al. generalized the convolution operator from regular grids to arbitrary graphs in the spatial domain. The filtered signal at each vertex is computed as a weighted sum of the signals in its neighborhood, where each filtering weight is conditioned on the respective edge label to retain structural information. As a representative spatial method on point clouds, Wang et al. introduced the concept of edge convolution [98], which generates edge features characterizing the relationship between each point and its neighbors. The edge convolution exploits local geometric structure and can be stacked to learn global geometric properties. Let xi ∈ Rᵈ and xj ∈ Rᵈ denote the graph signal on the i-th and j-th vertex, respectively; the output of edge convolution is:

x′i = Ψ_{(i,j)∈E} h(xi, xj) ∈ Rᵈ, (29)

where E is the set of edges and h(·, ·) is a generic edge feature function, implemented by a neural network. Ψ is a generic aggregation function, which could be the summation or maximum operation. The operation of (29) is demonstrated in Fig. 5(b), where we could consider not only the 1-hop neighbors but also 2-hop neighbors or more.

The edge convolution is also similar to nodal-domain graph filtering in that both aggregate neighboring information; however, the edge convolution specifically models each pairwise relationship with a non-parametric function. [98] also proposed to dynamically construct a K-NN graph in each layer of the network, where the distance metric of the K-NN graph is the Euclidean distance in the high-dimensional feature space instead of the original 3D space.
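A NumPy sketch of the edge convolution (29), using the common DGCNN-style choice h(xi, xj) = ReLU(W1 xi + W2 (xj − xi)) with max aggregation Ψ; the random (untrained) weights and the k-NN construction here are illustrative only, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))                 # 30 points with d = 3 features
k, d_out = 5, 8
W1 = rng.normal(size=(3, d_out))             # random (untrained) weights
W2 = rng.normal(size=(3, d_out))

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]    # k nearest neighbors per point

out = np.empty((len(X), d_out))
for i in range(len(X)):
    edge_feat = X[i] @ W1 + (X[nbrs[i]] - X[i]) @ W2   # h(xi, xj) per neighbor
    out[i] = np.maximum(edge_feat, 0.0).max(axis=0)    # ReLU, then Ψ = max

print(out.shape)                             # one d_out-dim feature per point
```

Stacking such layers, with the k-NN graph recomputed in feature space as in [98], yields progressively more global descriptors.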

Further, instead of using fixed handcrafted weight functions in each layer, Monti et al. proposed parametric kernels with learnable parameters [97]. In particular, a Gaussian kernel is employed as the edge weight function, with a learnable mean and covariance matrix, which essentially amounts to graph learning. Subsequently, a diversity of graph learning methods has been proposed in GNNs for adaptively inferring the underlying graph topology of geometric data [55], [99], [100].

    C. Applications to Geometric Data

GNNs have achieved success in both the analysis and synthesis of geometric data. As demonstrated in Fig. 6, popular analysis problems include point cloud classification and segmentation [98], [101]–[103], LiDAR point cloud detection for autonomous driving [104], RGBD segmentation [105], [106], as well as skeleton-based action recognition [100], [107], [108]. Example synthesis problems include point cloud generation [109] and image-to-mesh generation [110]. In the following, we discuss some applications to geometric data with both spectral and spatial GNNs.

1) Spectral Geometric Data Learning: Previous works often employ the Chebyshev expansion as in [94] for graph convolution in spectral geometric data learning, along with various regularizations or graph constructions.

Te et al. proposed the Regularized Graph Convolutional Neural Network (RGCNN) [102], one of the first approaches to utilize GNNs for point cloud segmentation, which regularizes each layer by the GLR with spectral smoothing functionality. Such regularization is robust to both low density and noise in point clouds.

The Adaptive Graph Convolution Network (AGCN) proposed in [55] learns a task-driven graph adaptively during training and leverages distance metric learning to parameterize the similarity between two vertices on the graph. This approach was shown to improve object classification from LiDAR point clouds.

Wang et al. proposed local spectral graph convolution [111] on point clouds, which uses standard unparametrized Fourier kernels with learnable filter coefficients and devises a recursive clustering and pooling strategy for aggregating information within clusters of nodes.

2) Spatial Geometric Data Learning: Most existing GNNs for geometric data are based on spatial methods. We discuss applications of spatial geometric data learning from two perspectives: supervised learning with data labels and unsupervised learning without the supervision of labels.

Supervised Learning. This class of methods presents various kernel designs and graph constructions. Li et al. [112] modeled the spatial distribution of a point cloud by building a Self-Organizing Map (SOM) and performing hierarchical feature extraction on individual points and SOM nodes. KCNet [113] learns local structures of point clouds based on kernel correlation as an affinity measure between the neighboring points of a target point and kernel points, which has a clear geometric interpretation. LDGCNN [114] removes the transformation network and links the hierarchical features from different layers in DGCNN [98]. Deformable kernels with various shapes and weights are learned for point cloud analysis in [115].

    To learn the graph structure adaptively in GNNs, graph attention networks (GATs) [116] introduce the attention mechanism to implicitly specify adaptive weights to different nodes in a neighborhood. GATs are further exploited in [99] for point cloud semantic segmentation. GLCN [117] learns a non-negative function that represents the pairwise relationship between two vertices using an attention mechanism via a single-layer neural network, and optimizes the network by minimizing a graph learning loss. We may also treat human skeleton data as a sparse point cloud, and construct graphs over the skeleton data by connecting human body joints both spatially and temporally [107], [118], [119]. Given this construction, several attempts have been made to learn spatio-temporal graphs in order to infer the action or activity based on the spatio-temporal correlation in each skeleton instance [100], [108], [120].
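In GAT-style attention, an edge score is computed from the projected features of the two endpoint nodes and then normalized by a softmax over each node's neighborhood. A minimal single-head numpy sketch (dense adjacency mask; assumes every node has at least one neighbor, e.g., a self-loop; names and shapes are illustrative):

```python
import numpy as np

def attention_weights(H, A, W, a, slope=0.2):
    """GAT-style attention: e_ij = LeakyReLU(a^T [W h_i ; W h_j]) on edges
    of mask A, normalized by a softmax over each node's neighborhood."""
    Z = H @ W                                  # projected features, one row per node
    n = Z.shape[0]
    E = np.full((n, n), -np.inf)               # -inf masks out non-edges
    for i in range(n):
        for j in range(n):
            if A[i, j]:
                e = a @ np.concatenate([Z[i], Z[j]])
                E[i, j] = e if e > 0 else slope * e   # LeakyReLU
    alpha = np.exp(E - E.max(axis=1, keepdims=True))  # stable softmax
    alpha[~A.astype(bool)] = 0.0
    return alpha / alpha.sum(axis=1, keepdims=True)
```

Because the weights are computed from node features rather than fixed by the geometry, the effective graph adapts to the task during training.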

    Unsupervised Learning. While existing GNNs for geometric data are mostly trained in a supervised fashion, they require a large amount of labeled data to learn graph representations, incurring high labeling cost especially for large-scale graphs. Hence, for some applications it is important to train graph feature representations in an unsupervised manner, e.g., in order to adapt to downstream learning tasks on graphs.

    In particular, several works have emerged in recent years that perform unsupervised geometric data learning based on Auto-Encoders (AEs) [121], one of the most representative methods for unsupervised feature learning. Yang et al. proposed an end-to-end deep auto-encoder on point clouds, FoldingNet [122], with a folding-based decoder that mimics the process of sampling point clouds from 2D surfaces lying in a 3D space. Further, Chen et al. introduced a trainable graph structure to refine the reconstruction from FoldingNet [123]. This graph structure flexibly regularizes the spatial relationships among 3D points and enables reconstruction of more complex 3D shapes. Other AE-based works include multi-task auto-encoding for learning geometric features [124] and differentiable manifold auto-encoding for point cloud reconstruction [125].
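A folding operation in a FoldingNet-style decoder concatenates each point of a fixed 2D grid with the global codeword and maps it through a shared MLP to a 3D point. A simplified single-fold sketch in numpy (weights, shapes, and the two-layer MLP are illustrative; FoldingNet applies two successive folds):

```python
import numpy as np

def fold_once(grid, codeword, W1, b1, W2, b2):
    """One folding operation: concatenate each 2D grid point with the global
    codeword and map it through a shared two-layer MLP to a 3D point."""
    m = grid.shape[0]
    inp = np.hstack([grid, np.tile(codeword, (m, 1))])  # (m, 2 + d)
    h = np.maximum(inp @ W1 + b1, 0.0)                  # shared ReLU layer
    return h @ W2 + b2                                  # (m, 3) folded points
```

Intuitively, the MLP learns a continuous deformation of the 2D grid into the 3D surface encoded by the codeword, which is why the decoder mimics sampling a point cloud from a 2D surface.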

    The above AE-based methods reconstruct the data of a point cloud at the decoder, which is also referred to as Auto-Encoding Data (AED). In contrast, Gao et al. proposed to auto-encode node-wise transformations that can be applied to a point cloud for transformation-equivariant representation learning [103], referred to as Auto-Encoding Transformation (AET). The goal is to capture intrinsic patterns of the graph structure under both global and local transformations, which benefits downstream tasks such as point cloud classification and segmentation.


    Another class of methods is based on Generative Adversarial Networks (GANs) [126]. For instance, Valsesia et al. studied localized generative models for point clouds based on graph convolution [109]. The key idea is to learn domain and localized features simultaneously, where the features capture local dependencies between a point and its neighbors. Shu et al. proposed a multi-class 3D point cloud generation method, tree-GAN [127], and introduced a tree-structured graph convolution network (TreeGCN) as the generator for tree-GAN. With graph convolutions performed within a tree, tree-GAN utilizes ancestor information to boost the feature representation.

    Besides the AE-based and GAN-based methods, some works formulate a pretext learning task, where a target objective can be computed without any supervision. For example, Zhang et al. proposed to learn features from an unlabeled point cloud dataset by using part contrasting and object clustering as self-supervision with GNNs [128].

VII. CONCLUSIONS AND OPEN ISSUES

    We present a generic GSP framework for geometric data, from theory to applications. Distinguished from other graph signals, geometric data are discrete samples of 3D surfaces, which exhibit unique characteristics such as piecewise smoothness that can be compactly, accurately, and adaptively represented on graphs. Hence, GSP is naturally advantageous for the processing and analysis of geometric data, with interpretations in both the discrete domain and the continuous domain of Riemannian manifolds. In particular, we discuss spectral-domain GSP methods and nodal-domain GSP methods, as well as their relation. Further, we elaborate on GNNs for geometric data, where we provide insight from the perspective of GSP, highlighting that the basic graph convolution operation is essentially graph spectral or nodal filtering. We anticipate this interpretation will inspire future research on more principled GNN designs that leverage key GSP concepts and theory.

    While existing GSP methods have achieved success in a variety of applications involving geometric data processing and analysis, quite a few challenges remain. Some open problems and potential future research directions include:

    • Time-varying geometric data processing and analysis: Unlike regularly sampled videos, 4D geometric data are characterized by irregularly sampled points, both spatially and temporally, and the number of points in each time instance may also vary. This makes it challenging to establish temporal correspondences and exploit temporal information. While some work has been done on 4D point cloud compression [47], [71] and restoration [48] with GSP, complex scenarios with fast motion remain challenging to address. Also, there is little work on the analysis of 4D geometric data, and dynamic datasets for such problems are lacking.

    • Large-scale geometric data processing and analysis: It is important yet challenging to realize computation-efficient, and even real-time, processing and analysis of large-scale geometric data. For instance, a 4D dynamic point cloud may contain millions of points in each frame, and the point cloud of a scene may reach hundreds of gigabytes or even a few terabytes. Processing or analyzing such large-scale geometric data efficiently for real-time applications remains an open problem.

    • Implicit geometric data processing: While not discussed in detail, the presented GSP framework embraces the processing of geometric information that is implicitly contained in the data, e.g., multiview representations. For instance, Maugey et al. [129] proposed a graph-based representation of geometric information based on multiview images. While this work aims at more efficient compression, we believe such representations can potentially be leveraged for a wide range of inference tasks as well.

    • Integration with multiple modalities: Combining geometric data with other modalities has the potential to improve performance. Taking autonomous driving as an example, sensor measurements from multiple modalities, including camera and LiDAR, can be fused for robust detection, classification, or prediction. However, techniques that can exploit the characteristics across multiple modalities for high-level learning have not been fully explored. We believe that many of the GSP techniques developed for geometric data can be extended to the processing of multiple modalities.

    • Unsupervised/self-supervised geometric data learning: As mentioned in Section VI-C2, unsupervised/self-supervised geometric data learning is crucial in practical applications and provides insights into intrinsic representation learning. While several efforts have been made based on AEs, GANs, or pretext learning tasks, this line of research is still in an early phase. Effective unsupervised/self-supervised learning methods that fully exploit the characteristics of geometric data remain an active area of study.

REFERENCES
[1] G. C. Burdea and P. Coiffet, Virtual Reality Technology. John Wiley & Sons, 2003.
[2] D. Schmalstieg and T. Hollerer, Augmented Reality: Principles and Practice. Addison-Wesley Professional, 2016.
[3] S. Chen, B. Liu, C. Feng, C. Vallespi-Gonzalez, and C. Wellington, "3D point cloud processing and learning for autonomous driving," IEEE Signal Process. Mag., 2020.
[4] C. Benedek, "3D people surveillance on range data sequences of a rotating LiDAR," Pattern Recognit. Lett., vol. 50, pp. 149–158, 2014.
[5] E. H. Adelson and J. R. Bergen, "The plenoptic function and the elements of early vision," MIT Press, 1991.
[6] T. Ebrahimi, S. Fossel, F. Pereira, and P. Schelkens, "JPEG Pleno: Toward an efficient representation of visual reality," IEEE Multimedia, vol. 23, no. 4, pp. 14–20, 2016.
[7] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013.
[8] A. Sandryhaila and J. M. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, 2013.
[9] A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst, "Graph signal processing: Overview, challenges, and applications," Proc. IEEE, vol. 106, no. 5, pp. 808–828, 2018.
[10] W. Hu, G. Cheung, A. Ortega, and O. C. Au, "Multiresolution graph Fourier transform for compression of piecewise smooth images," IEEE Trans. Image Process., vol. 24, no. 1, pp. 419–433, 2015.
[11] D. Ting, L. Huang, and M. Jordan, "An analysis of the convergence of graph Laplacians," in Proc. Int. Conf. Mach. Learn., 2010, pp. 1079–1086.
[12] M. Hein, J.-Y. Audibert, and U. v. Luxburg, "Graph Laplacians and their convergence on random neighborhood graphs," J. Mach. Learn. Res., vol. 8, pp. 1325–1368, 2007.
[13] J. Pang and G. Cheung, "Graph Laplacian regularization for image denoising: Analysis in the continuous domain," IEEE Trans. Image Process., vol. 26, no. 6, pp. 1770–1785, 2017.
[14] M. Hein, "Uniform convergence of adaptive graph-based regularization," in Proc. Int. Conf. Comput. Learn. Theory, 2006, pp. 50–64.
[15] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Appl. Comput. Harmonic Anal., vol. 30, no. 2, pp. 129–150, 2011.
[16] A. Gadde, S. K. Narang, and A. Ortega, "Bilateral filter: Graph spectral interpretation and extensions," in Proc. IEEE Int. Conf. Image Process., 2013, pp. 1222–1226.
[17] D. Tian, H. Mansour, A. Knyazev, and A. Vetro, "Chebyshev and conjugate gradient filters for graph image denoising," in Proc. IEEE Int. Conf. Multimedia Expo Workshop, 2014, pp. 1–6.
[18] J. Zeng, G. Cheung, M. Ng, J. Pang, and Y. Cheng, "3D point cloud denoising using graph Laplacian regularization of a low dimensional manifold model," IEEE Trans. Image Process., vol. 29, pp. 3474–3489, December 2019.
[19] M. Rossi and P. Frossard, "Geometry-consistent light field super-resolution via graph-based regularization," IEEE Trans. Image Process., vol. 27, no. 9, pp. 4207–4218, 2018.
[20] W. Hu, X. Gao, G. Cheung, and Z. Guo, "Feature graph learning for 3D point cloud denoising," IEEE Trans. Signal Process., vol. 68, pp. 2841–2856, 2020.
[21] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, "Geometric deep learning: Going beyond Euclidean data," IEEE Signal Process. Mag., vol. 34, no. 4, pp. 18–42, 2017.
[22] F. R. Chung and F. C. Graham, Spectral Graph Theory. American Mathematical Soc., 1997, no. 92.
[23] "Middlebury stereo datasets," http://vision.middlebury.edu/stereo/data/, accessed October 12, 2019.
[24] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, "ScanNet: Richly-annotated 3D reconstructions of indoor scenes," in Proc. CVPR, 2017, pp. 5828–5839.
[25] Q. Cai and P. A. Chou, "Microsoft voxelized upper bodies: a voxelized point cloud dataset," ISO/IEC JTC1/SC29 WG11 input document m38673/M72012, May 2016.
[26] J. A. Deri and J. M. Moura, "Spectral projector-based graph Fourier transforms," IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp. 785–795, 2017.
[27] B. Girault, A. Ortega, and S. S. Narayanan, "Irregularity-aware graph Fourier transforms," IEEE Trans. Signal Process., vol. 66, no. 21, pp. 5746–5761, 2018.
[28] L. Le Magoarou, R. Gribonval, and N. Tremblay, "Approximate fast graph Fourier transforms via multilayer sparse approximations," IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 2, pp. 407–420, 2017.
[29] K.-S. Lu and A. Ortega, "Fast graph Fourier transforms based on graph symmetry and bipartition," IEEE Trans. Signal Process., vol. 67, no. 18, pp. 4855–4869, 2019.
[30] H. E. Egilmez, E. Pavez, and A. Ortega, "Graph learning from data under Laplacian and structural constraints," IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp. 825–841, 2017.
[31] C. Zhang and D. Florêncio, "Analyzing the optimality of predictive transform coding using graph-based models," IEEE Signal Process. Lett., vol. 20, no. 1, pp. 106–109, 2012.
[32] W. Hu, G. Cheung, and A. Ortega, "Intra-prediction and generalized graph Fourier transform for image coding," IEEE Signal Process. Lett., vol. 22, no. 11, pp. 1913–1917, 2015.
[33] C. Zhang, P. Chou, and D. Florencio, "Graph signal processing: A probabilistic framework," https://sigport.org/sites/default/files/graphSP prob.pdf, 2016.
[34] A. Singer, "From graph to manifold Laplacian: The convergence rate," Appl. Comput. Harmonic Anal., vol. 21, no. 1, pp. 128–134, 2006.
[35] "FlyingThings3D datasets," https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html, accessed October 12, 2019.
[36] "Tsukuba datasets," https://home.cvlab.cs.tsukuba.ac.jp/dataset, accessed October 12, 2019.
[37] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Proc. CVPR, 2012, pp. 3354–3361.
[38] M. Levoy, J. Gerth, B. Curless, and K. Pull, "The Stanford 3D scanning repository," http://www-graphics.stanford.edu/data/3dscanrep, 2005 (accessed September 2017).
[39] M. Berger, J. A. Levine, L. G. Nonato, G. Taubin, and C. T. Silva, "A benchmark for surface reconstruction," ACM Trans. Graphics, vol. 32, no. 2, p. 20, 2013.
[40] T. M. Eugene d'Eon, Bob Harrison, and P. A. Chou, "8i voxelized full bodies, version 2: A voxelized point cloud dataset," ISO/IEC JTC1/SC29/WG11 m40059/M74006, Jan. 2017.
[41] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su et al., "ShapeNet: An information-rich 3D model repository," arXiv preprint arXiv:1512.03012, 2015.
[42] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, "3D ShapeNets: A deep representation for volumetric shapes," in Proc. CVPR, 2015, pp. 1912–1920.
[43] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese, "3D semantic parsing of large-scale indoor spaces," in Proc. CVPR, 2016, pp. 1534–1543.
[44] "Waymo open dataset," https://github.com/waymo-research/waymo-open-dataset, accessed October 12, 2019.
[45] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, "SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences," in Proc. IEEE Int. Conf. Comput. Vis., 2019.
[46] C. Dinesh, G. Cheung, and I. V. Bajić, "Point cloud denoising via feature graph Laplacian regularization," IEEE Trans. Image Process., vol. 29, pp. 4143–4158, 2020.
[47] D. Thanou, P. A. Chou, and P. Frossard, "Graph-based compression of dynamic 3D point cloud sequences," IEEE Trans. Image Process., vol. 25, no. 4, pp. 1765–1778, 2016.
[48] Z. Fu, W. Hu, and Z. Guo, "3D dynamic point cloud inpainting via temporal consistency on graphs," in Proc. IEEE Int. Conf. Multimedia Expo, July 2020.
[49] R. Schnabel and R. Klein, "Octree-based point-cloud compression," SPBG, vol. 6, pp. 111–120, 2006.
[50] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, "Multi-view 3D object detection network for autonomous driving," in Proc. CVPR, July 2017.
[51] G. Shen, W.-S. Kim, S. K. Narang, A. Ortega, J. Lee, and H. Wey, "Edge-adaptive transforms for efficient depth map coding," in Proc. Picture Coding Symp., 2010, pp. 566–569.
[52] C. Zhang, D. Florencio, and C. Loop, "Point cloud attribute compression with graph transform," in Proc. IEEE Int. Conf. Image Process., 2014, pp. 2066–2070.
[53] X. Dong, D. Thanou, M. Rabbat, and P. Frossard, "Learning graphs from data: A signal representation perspective," IEEE Signal Process. Mag., vol. 36, no. 3, pp. 44–63, 2019.
[54] G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro, "Connecting the dots: Identifying network structure via graph signal processing," IEEE Signal Process. Mag., vol. 36, no. 3, pp. 16–43, 2019.
[55] R. Li, S. Wang, F. Zhu, and J. Huang, "Adaptive graph convolutional neural networks," in Proc. AAAI Conf. Artif. Intell., 2018.
[56] S. Chen, D. Tian, C. Feng, A. Vetro, and J. Kovačević, "Fast resampling of three-dimensional point clouds via graphs," IEEE Trans. Signal Process., vol. 66, no. 3, pp. 666–681, 2017.
[57] W. Hu, X. Li, G. Cheung, and O. Au, "Depth map denoising using graph-based transform and group sparsity," in Proc. IEEE Workshop Multimedia Signal Process., 2013, pp. 1–6.
[58] G. Rosman, A. Dubrovina, and R. Kimmel, "Patch-collaborative spectral point-cloud denoising," Comput. Graphics Forum, vol. 32, no. 8, 2013, pp. 1–12.
[59] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising with block-matching and 3D filtering," in Image Process.: Algorithms Syst. Neural Netw. Mach. Learn., vol. 6064, 2006, p. 606414.
[60] G. Cheung, W.-S. Kim, A. Ortega, J. Ishida, and A. Kubota, "Depth map coding using graph based transform and transform domain sparsification," in Proc. IEEE Workshop Multimedia Signal Process., 2011, pp. 1–6.
[61] W.-S. Kim, S. K. Narang, and A. Ortega, "Graph based transforms for depth video coding," in Proc. IEEE Int. Conf. Acoustics Speech Signal Process., 2012, pp. 813–816.
[62] W. Hu, G. Cheung, X. Li, and O. Au, "Depth map compression using multi-resolution graph-based transform for depth-image-based rendering," in Proc. IEEE Int. Conf. Image Process., 2012, pp. 1297–1300.
[63] Y.-H. Chao, A. Ortega, W. Hu, and G. Cheung, "Edge-adaptive depth map coding with lifting transform on graphs," in Proc. Picture Coding Symp., 2015, pp. 60–64.
[64] R. A. Cohen, D. Tian, and A. Vetro, "Attribute compression for sparse point clouds using graph transforms," in Proc. IEEE Int. Conf. Image Process., 2016, pp. 1374–1378.
[65] ——, "Point cloud attribute compression using 3-D intra prediction and shape-adaptive transforms," in Proc. Data Compression Conf., 2016, pp. 141–150.
[66] Y. Shao, Z. Zhang, Z. Li, K. Fan, and G. Li, "Attribute compression of 3D point clouds using Laplacian sparsity optimized graph transform," in Proc. IEEE Vis. Commun. Image Process., 2017, pp. 1–4.
[67] Y. Xu, W. Hu, S. Wang, X. Zhang, S. Wang, S. Ma, and W. Gao, "Cluster-based point cloud coding with normal weighted graph Fourier transform," in Proc. IEEE Int. Conf. Acoustics Speech Signal Process., 2018, pp. 1753–1757.
[68] R. L. de Queiroz and P. A. Chou, "Transform coding for point clouds using a Gaussian process model," IEEE Trans. Image Process., vol. 26, no. 7, pp. 3507–3517, 2017.
[69] P. de Oliveira Rente, C. Brites, J. Ascenso, and F. Pereira, "Graph-based static 3D point clouds geometry coding," IEEE Trans. Multimedia, vol. 21, no. 2, pp. 284–299, 2018.
[70] A. Anis, P. A. Chou, and A. Ortega, "Compression of dynamic 3D point clouds using subdivisional meshes and graph wavelet transforms," in Proc. IEEE Int. Conf. Acoustics Speech Signal Process., 2016, pp. 6360–6364.
[71] Y. Xu, W. Hu, S. Wang, X. Zhang, S. Wang, S. Ma, Z. Guo, and W. Gao, "Predictive generalized graph Fourier transform for attribute compression of dynamic point clouds," arXiv preprint arXiv:1908.01970, 2019.
[72] A. Elmoataz, O. Lezoray, and S. Bougleux, "Nonlocal discrete regularization on weighted graphs: A framework for image and manifold processing," IEEE Trans. Image Process., vol. 17, no. 7, pp. 1047–1060, 2008.
[73] Y. Schoenenberger, J. Paratte, and P. Vandergheynst, "Graph-based denoising for time-varying point clouds," in Proc. 3DTV Conf., 2015, pp. 1–4.
[74] Y. Bai, G. Cheung, X. Liu, and W. Gao, "Graph-based blind image deblurring from a single photograph," IEEE Trans. Image Process., vol. 28, no. 3, pp. 1404–1418, 2018.
[75] W. Hu, Z. Fu, and Z. Guo, "Local frequency interpretation and non-local self-similarity on graph for point cloud inpainting," IEEE Trans. Image Process., vol. 28, no. 8, pp. 4087–4100, 2019.
[76] O. Axelsson and G. Lindskog, "On the rate of convergence of the preconditioned conjugate gradient method," Numerische Mathematik, vol. 48, no. 5, pp. 499–523, 1986.
[77] M. Hein and M. Maier, "Manifold denoising," in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 561–568.
[78] X. Liu, G. Cheung, X. Wu, and D. Zhao, "Random walk graph Laplacian-based smoothness prior for soft decoding of JPEG images," IEEE Trans. Image Process., vol. 26, no. 2, pp. 509–524, 2016.
[79] N. Parikh and S. Boyd, "Proximal algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 123–231, 2013.
[80] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, vol. 60, no. 1-4, pp. 259–268, Nov. 1992. [Online]. Available: http://dx.doi.org/10.1016/0167-2789(92)90242-F
[81] C. Couprie, L. Grady, L. Najman, J.-C. Pesquet, and H. Talbot, "Dual constrained TV-based regularization on graphs," SIAM J. Imaging Sciences, vol. 6, no. 3, pp. 1246–1273, 2013.
[82] X. Zhang, M. Burger, and S. Osher, "A unified primal-dual algorithm framework based on Bregman iteration," J. Sci. Comput., vol. 46, no. 1, pp. 20–46, 2011.
[83] B. Wahlberg, S. Boyd, M. Annergren, and Y. Wang, "An ADMM algorithm for a class of total variation regularized estimation problems," IFAC Proceedings Volumes, vol. 45, no. 16, pp. 83–88, 2012.
[84] S. Osher, Z. Shi, and W. Zhu, "Low dimensional manifold model for image processing," SIAM J. Imaging Sci., vol. 10, no. 4, pp. 1669–1690, 2017.
[85] S. Helgason, Differential Geometry, Lie Groups, and Symmetric Spaces. Academic Press, 1979.
[86] Y. Wang, A. Ortega, D. Tian, and A. Vetro, "A graph-based joint bilateral approach for depth enhancement," in Proc. IEEE Int. Conf. Acoustics Speech Signal Process., 2014, pp. 885–889.
[87] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, "Multi-view convolutional neural networks for 3D shape recognition," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 945–953.
[88] C. Ma, Y. Guo, J. Yang, and W. An, "Learning multi-view representation with LSTM for 3-D shape recognition and retrieval," IEEE Trans. Multimedia, vol. 21, no. 5, pp. 1169–1182, 2018.
[89] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, "PointNet: Deep learning on point sets for 3D classification and segmentation," in Proc. CVPR, 2017, pp. 652–660.
[90] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, "PointNet++: Deep hierarchical feature learning on point sets in a metric space," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5099–5108.
[91] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, and B. Chen, "PointCNN: Convolution on X-transformed points," in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 820–830.
[92] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," in Proc. Int. Conf. Learn. Rep., 2014.
[93] M. Henaff, J. Bruna, and Y. LeCun, "Deep convolutional networks on graph-structured data," arXiv preprint arXiv:1506.05163, 2015.
[94] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 3844–3852.
[95] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. Int. Conf. Learn. Rep., April 2017.
[96] M. Simonovsky and N. Komodakis, "Dynamic edge-conditioned filters in convolutional neural networks on graphs," in Proc. CVPR, 2017, pp. 3693–3702.
[97] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein, "Geometric deep learning on graphs and manifolds using mixture model CNNs," in Proc. CVPR, 2017, pp. 5115–5124.
[98] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, "Dynamic graph CNN for learning on point clouds," ACM Trans. Graphics, vol. 38, no. 5, pp. 1–12, 2019.
[99] L. Wang, Y. Huang, Y. Hou, S. Zhang, and J. Shan, "Graph attention convolution for point cloud semantic segmentation," in Proc. CVPR, 2019, pp. 10296–10305.
[100] X. Gao, W. Hu, J. Tang, J. Liu, and Z. Guo, "Optimized skeleton-based action recognition via sparsified graph regression," in Proc. ACM Int. Conf. Multimedia, 2019, pp. 601–610.
[101] Y. Zhang and M. Rabbat, "A graph-CNN for 3D point cloud classification," in Proc. IEEE Int. Conf. Acoustics Speech Signal Process., 2018, pp. 6279–6283.
[102] G. Te, W. Hu, A. Zheng, and Z. Guo, "RGCNN: Regularized graph CNN for point cloud segmentation," in Proc. ACM Int. Conf. Multimedia, 2018, pp. 746–754.
[103] X. Gao, W. Hu, and G.-J. Qi, "GraphTER: Unsupervised learning of graph transformation equivariant representations via auto-encoding node-wise transformations," in Proc. CVPR, June 2020.
[104] W. Shi and R. Rajkumar, "Point-GNN: Graph neural network for 3D object detection in a point cloud," in Proc. CVPR, 2020, pp. 1711–1719.
[105] X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun, "3D graph neural networks for RGBD semantic segmentation," in Proc. ICCV, 2017, pp. 5199–5208.
[106] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," in Proc. ECCV, 2012, pp. 746–760.
[107] S. Yan, Y. Xiong, and D. Lin, "Spatial temporal graph convolutional networks for skeleton-based action recognition," in Proc. AAAI Conf. Artif. Intell., February 2018.
[108] B. Li, X. Li, Z. Zhang, and F. Wu, "Spatio-temporal graph routing for skeleton-based action recognition," in Proc. AAAI Conf. Artif. Intell., 2019.
[109] D. Valsesia, G. Fracastoro, and E. Magli, "Learning localized generative models for 3D point clouds via graph convolution," in Proc. Int. Conf. Learn. Rep., 2019.
[110] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang, "Pixel2Mesh: Generating 3D mesh models from single RGB images," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 52–67.
[111] C. Wang, B. Samari, and K. Siddiqi, "Local spectral graph convolution for point set feature learning," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 52–66.
[112] J. Li, B. M. Chen, and G. Hee Lee, "SO-Net: Self-organizing network for point cloud analysis," in Proc. CVPR, 2018, pp. 9397–9406.
[113] Y. Shen, C. Feng, Y. Yang, and D. Tian, "Mining point cloud local structures by kernel correlation and graph pooling," in Proc. CVPR, 2018, pp. 4548–4557.
[114] K. Zhang, M. Hao, J. Wang, C. W. de Silva, and C. Fu, "Linked dynamic graph CNN: Learning on point cloud via linking hierarchical features," arXiv preprint arXiv:1904.10014, 2019.
[115] Z.-H. Lin, S.-Y. Huang, and Y.-C. F. Wang, "Convolution in the cloud: Learning deformable kernels in 3D graph convolution networks for point cloud analysis," in Proc. CVPR, 2020, pp. 1800–1809.
[116] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," in Proc. Int. Conf. Learn. Rep., April 2018.
[117] B. Jiang, Z. Zhang, D. Lin, J. Tang, and B. Luo, "Semi-supervised learning with graph learning-convolutional networks," in Proc. CVPR, 2019, pp. 11313–11320.
[118] K. Cheng, Y. Zhang, X. He, W. Chen, J. Cheng, and H. Lu, "Skeleton-based action recognition with shift graph convolutional network," in Proc. CVPR, June 2020.
[119] X. Zhang, C. Xu, and D. Tao, "Context aware graph convolution for skeleton-based action recognition," in Proc. CVPR, 2020, pp. 14333–14342.
[120] L. Shi, Y. Zhang, J. Cheng, and H. Lu, "Skeleton-based action recognition with multi-stream adaptive graph convolutional networks," arXiv preprint arXiv:1912.06971, 2019.
[121] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. Int. Conf. Mach. Learn., 2008, pp. 1096–1103.
[122] Y. Yang, C. Feng, Y. Shen, and D. Tian, "FoldingNet: Point cloud auto-encoder via deep grid deformation," in Proc. CVPR, 2018, pp. 206–215.
[123] S. Chen, C. Duan, Y. Yang, D. Li, C. Feng, and D. Tian, "Deep unsupervised learning of 3D point clouds via graph topology inference and filtering," IEEE Trans. Image Process., vol. 29, pp. 3183–3198, 2019.
[124] K. Hassani and M. Haley, "Unsupervised multi-task feature learning on point clouds," in Proc. Int. Conf. Comput. Vis., 2019, pp. 8160–8171.
[125] S. Luo and W. Hu, "Differentiable manifold reconstruction for point cloud denoising," in Proc. ACM Int. Conf. Multimedia, October 2020.
[126] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[127] D. W. Shu, S. W. Park, and J. Kwon, "3D point cloud generative adversarial network based on tree s