
On the Intrinsic Dimensionality of Image Representations (Supplementary Material)

Sixue Gong, Vishnu Naresh Boddeti, Anil K. Jain
Michigan State University, East Lansing, MI 48824

{gongsixu, vishnu, jain}@msu.edu

In this supplementary material we include: (1) Section 1: direct training of low-dimensional representations; (2) Section 2: intrinsic dimensionality estimates from the baseline approaches [4, 3]; (3) Section 3: evaluation of the baseline dimensionality reduction techniques on the LFW and IJB-C datasets; (4) Section 4: derivations of the intrinsic dimensionality estimation process; (5) Section 5: RMSE and fitting plots for the graph distance based approach [1]; (6) Section 6: intrinsic dimensionality estimation, and learning and visualizing the learned projections, on the Swiss Roll dataset; and (7) Section 7: visualizations of the embeddings using t-SNE [2].

1. Direct Training

Our findings in this paper, that many current DNN representations can be significantly compressed, naturally raise the question: can we directly learn embedding functions that yield compact and discriminative embeddings in the first place? Taigman et al. [6] studied this problem in the context of learning face embeddings and noted that a compact feature space creates a bottleneck in the information flow to the classification layer, which increases the difficulty of optimizing the network when training from scratch. Given the significant developments in network architectures and optimization tools since then, we attempt to learn highly compact embeddings directly from raw data using current best practices, while circumventing the chicken-and-egg problem of not knowing the target intrinsic dimensionality before learning the embedding function.

We train the Inception ResNet V1 [5] on the CASIA-WebFace [8] dataset, building off the publicly available implementation at https://github.com/davidsandberg/facenet, for embeddings of different sizes. Figure 1 shows the ROC curves on the LFW and IJB-C datasets. The models suffer a significant loss in performance as we decrease the dimensionality of the embeddings. In comparison, the proposed DeepMDS based dimensionality reduction retains its discriminative ability even at high levels of compression. These results call for the development of algorithms that can directly learn compact and effective image representations.

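To make the setup concrete, the sketch below shows one way to expose the embedding dimensionality as a parameter. It is a minimal PyTorch-style illustration rather than the TensorFlow implementation we build on; the EmbeddingHead name, the 1792-dim backbone feature width, and the loop structure are our assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingHead(nn.Module):
    """Projects backbone features to an L2-normalized embedding of size `dim`."""
    def __init__(self, in_features: int, dim: int):
        super().__init__()
        self.fc = nn.Linear(in_features, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.fc(x), p=2, dim=1)

# One model per target dimensionality; small `dim` values act as an
# information bottleneck ahead of the classification layer.
for dim in [512, 256, 128, 64, 32, 16, 10]:
    head = EmbeddingHead(in_features=1792, dim=dim)  # 1792 is an assumed backbone width
    features = torch.randn(8, 1792)                  # stand-in for backbone outputs
    embeddings = head(features)                      # shape: (8, dim), unit-norm rows
```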

Figure 1: ROC curves on the LFW and IJB-C datasets for the Inception ResNet V1 [5] model trained with different embedding dimensionalities on the CASIA-WebFace [8] dataset. Axes: False Accept Rate (%) vs. Verification Rate (%). Legend: (a) LFW: 512D: 94.14%, 256D: 93.19%, 128D: 88.16%, 64D: 72.92%, 32D: 47.23%, 16D: 22.74%, 10D: 11.14%; (b) IJB-C: 512D: 57.53%, 256D: 54.61%, 128D: 46.82%, 64D: 34.36%, 32D: 22.03%, 16D: 11.81%, 12D: 7.61%.


2. Intrinsic Dimensionality Estimation

Table 1 and Table 2 report the ID estimates from the k-nearest neighbor approach [3] and IDEA [4], respectively, for the different representation models and datasets that we consider. These approaches are known to underestimate the intrinsic dimensionality [7]. We observe the same here: the ID estimates from these baselines are lower than the estimates of the graph distance based approach [1] that we use. A simplified sketch of a k-NN style estimate follows Table 2.

Table 1: Intrinsic Dimensionality: KNN [3]

Representation  Dataset       k=4  k=7  k=9  k=15
FaceNet-128     LFW            10   10   11    11
FaceNet-128     IJB-C          10   10    9     9
FaceNet-512     LFW             8    8    8     9
FaceNet-512     IJB-C          10   10    9     9
SphereFace      LFW             6    7    7     8
SphereFace      IJB-C           6    6    5     5
ResNet-101      ImageNet-100   25   20   19    16

Table 2: Intrinsic Dimensionality: IDEA [4]

Representation  Dataset       k=4  k=7  k=9  k=15
FaceNet-128     LFW            14   13   13    12
FaceNet-128     IJB-C          14   11   10     9
FaceNet-512     LFW            12   10   10    10
FaceNet-512     IJB-C          14   11   10     9
SphereFace      LFW            10    9    9     9
SphereFace      IJB-C           8    7    6     5
ResNet-101      ImageNet-100   21   21   20    20
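As a rough illustration of how such k-NN estimates behave, the snippet below fits the growth rate of the mean k-th neighbor distance, which scales roughly as k^(1/d) on a d-dimensional manifold. This is a simplified slope estimator in the spirit of [3], not the full Pettis et al. procedure with its bias corrections; the function name and defaults are ours:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_id_estimate(X: np.ndarray, k_max: int = 15) -> float:
    """Crude intrinsic-dimension estimate: the mean distance to the k-th
    neighbor grows roughly as k**(1/d), so the slope of log(r_k) vs. log(k)
    approximates 1/d."""
    dists, _ = NearestNeighbors(n_neighbors=k_max + 1).fit(X).kneighbors(X)
    r_k = dists[:, 1:].mean(axis=0)          # column 0 is each point itself
    ks = np.arange(1, k_max + 1)
    slope = np.polyfit(np.log(ks), np.log(r_k), 1)[0]
    return 1.0 / slope

# Sanity check on data with a known dimension:
X = np.random.default_rng(0).normal(size=(2000, 3))
print(knn_id_estimate(X))  # close to 3 for a 3-D Gaussian cloud
```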

3. Intrinsic Dimension Mapping

In this section we present results of DeepMDS on the LFW (BLUFR) dataset and of the baseline dimensionality reduction methods for mapping from the ambient to the intrinsic space. Figure 2 shows the face verification ROC curves of DeepMDS on the LFW dataset for the FaceNet-128, FaceNet-512 and SphereFace representation models. Figure 3 shows the face verification ROC curves of Principal Component Analysis on the IJB-C and LFW (BLUFR) datasets for all three representation models. Similarly, Fig. 4 and Fig. 5 show the face verification ROC curves of the Isomap and Denoising Autoencoder baselines, respectively.
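The PCA and Isomap baselines can be reproduced with standard scikit-learn components; the sketch below is a minimal illustration on stand-in features, not our exact experimental configuration (the random data, the n_neighbors value, and the target dimensions are assumptions, and the denoising autoencoder baseline is omitted):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 512))  # stand-in for 512-D face embeddings

for m in [64, 32, 16]:
    Z_pca = PCA(n_components=m).fit_transform(X)                     # linear projection
    Z_iso = Isomap(n_neighbors=15, n_components=m).fit_transform(X)  # geodesic MDS
    # downstream: score verification ROC curves on the reduced features
```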

4. Intrinsic Dimensionality Estimation (Derivations)

We first show the derivation for estimating the intrinsic dimensionality m that minimizes the RMSE with respect to an m-hypersphere:

$$\min_{m} \int_{r_{max}-2\sigma}^{r_{max}} \left\| \log\frac{\hat{p}_{\mathcal{M}}(r)}{\hat{p}_{\mathcal{M}}(r_{max})} - (m-1)\log\left(\sin\left[\frac{\pi r}{2 r_{max}}\right]\right) \right\|_2^2 \, dr$$

First we estimate $\sigma$ for the m-hypersphere by approximating the distribution $\hat{p}_{\mathcal{M}}(r)$ with a univariate Gaussian around the mode of $p_{\mathcal{M}}(r)$. Given samples $S = \{r_1, \ldots, r_T\}$ from the distribution $p(r)$, the variance around the mode can be estimated as $\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}(r_t - r_{max})^2$, where $r_{max}$ is the radius at the mode of $\hat{p}_{\mathcal{M}}(r)$. Then we plot $\log\frac{\hat{p}_{\mathcal{M}}(r)}{\hat{p}_{\mathcal{M}}(r_{max})}$ against $\log\left(\sin\left[\frac{\pi r}{2 r_{max}}\right]\right)$ and solve the following least-squares fit problem:

$$\min_{m} \sum_{r_i \in S,\; r_{max}-2\sigma \le r_i \le r_{max}} \left(y_i - (m-1)\,x_i\right)^2$$

where $y_i = \log\frac{\hat{p}_{\mathcal{M}}(r_i)}{\hat{p}_{\mathcal{M}}(r_{max})}$ and $x_i = \log\left(\sin\left[\frac{\pi r_i}{2 r_{max}}\right]\right)$.

In the case of comparison to a Gaussian distribution, the intrinsic dimensionality can also be estimated by comparing to the geodesic distance distribution of points sampled from a Gaussian:

$$\min_{d} \int_{r_{max}-2\sigma}^{r_{max}} \left\| \log\frac{p(r)}{p(r_{max})} + (d-1)\frac{r^2}{4\sigma^2} \right\|_2^2 \, dr \tag{1}$$

The solution of this optimization problem can be found by following the same procedure described above for the m-hypersphere.
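This fit translates directly into a few lines of code. The sketch below is our minimal rendering of the hypersphere fit, using a histogram as the density estimate and fitting over the histogram bins falling in the window [r_max - 2*sigma, r_max] rather than the raw samples; the function name and bin count are assumptions:

```python
import numpy as np

def estimate_id_hypersphere(r: np.ndarray, bins: int = 100) -> float:
    """Least-squares fit of the intrinsic dimension m from geodesic distance
    samples `r`, using the m-hypersphere model:
        log p(r)/p(r_max) ~ (m - 1) * log sin(pi * r / (2 * r_max))."""
    density, edges = np.histogram(r, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    i_mode = np.argmax(density)
    r_max, p_max = centers[i_mode], density[i_mode]
    sigma = np.sqrt(np.mean((r - r_max) ** 2))   # variance around the mode
    # restrict the fit to [r_max - 2*sigma, r_max], as in the integral above
    mask = (centers >= r_max - 2 * sigma) & (centers <= r_max) & (density > 0)
    y = np.log(density[mask] / p_max)
    x = np.log(np.sin(np.pi * centers[mask] / (2 * r_max)))
    slope = np.sum(x * y) / np.sum(x * x)        # closed-form 1-D least squares
    return 1.0 + slope                           # the slope estimates (m - 1)
```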

5. Intrinsic Dimensionality Estimation Fitting

Figure 6 shows the distribution of geodesic distances $p(r)$ for each of the datasets and representation models. Figure 7 shows the plot of $\log\frac{\hat{p}_{\mathcal{M}}(r)}{\hat{p}_{\mathcal{M}}(r_{max})}$ vs. $\log\frac{r}{r_{max}}$, as we vary the number of neighbors $k$, for the SphereFace representation model on the LFW and IJB-C datasets and for ResNet-34 on the ImageNet dataset.

6. Swiss Roll

In this section we consider the swiss roll dataset as a means of providing visual validation of the estimated intrinsic space on a known dataset. First we estimate the intrinsic dimensionality of the swiss roll, and then we learn a low-dimensional mapping from the ambient 3-dim space to the intrinsic space (Figure 9). We sample 2000 points from the swiss roll and use these points for the experiments. For this dataset, the intrinsic dimensionality estimate is 2 (see Figure 8), which is indeed the ground-truth intrinsic dimensionality of the swiss roll.
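A minimal version of this experiment, assuming scikit-learn's swiss roll generator, a k-nearest-neighbor graph with k = 7, and the estimate_id_hypersphere fit sketched in Section 4:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=2000, random_state=0)   # ambient 3-dim points
G = kneighbors_graph(X, n_neighbors=7, mode='distance')  # k-NN graph
D = shortest_path(G, method='D', directed=False)         # graph (geodesic) distances
r = D[np.triu_indices_from(D, k=1)]                      # unique pairwise distances
r = r[np.isfinite(r)]                                    # drop disconnected pairs, if any
print(estimate_id_hypersphere(r))                        # the fit in Figure 8 yields m = 2
```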

7. Visualizing Embeddings

Our main objective for training the mapping network is to preserve the distances between samples in the given dataset.


Figure 2: DeepMDS: face verification on the LFW (BLUFR) dataset for the (a) FaceNet-128, (b) FaceNet-512 and (c) SphereFace embeddings. Legend: (a) 128D: 93.90%, 64D: 94.16%, 32D: 92.41%, 16D: 90.39%, 10D: 85.66%; (b) 512D: 98.19%, 256D: 98.19%, 128D: 98.19%, 64D: 98.16%, 32D: 98.10%, 16D: 97.71%, 10D: 96.85%; (c) 512D: 96.74%, 256D: 96.73%, 128D: 96.44%, 64D: 96.50%, 32D: 96.31%, 16D: 95.95%, 10D: 92.33%.

Figure 3: PCA: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings. Panels and legends: (a) IJB-C: FaceNet-128: 128D: 42.57%, 64D: 45.45%, 32D: 43.33%, 16D: 29.37%, 12D: 21.11%; (b) IJB-C: FaceNet-512: 512D: 69.32%, 256D: 72.49%, 128D: 72.49%, 64D: 72.49%, 32D: 71.80%, 16D: 58.44%, 12D: 46.58%; (c) IJB-C: SphereFace: 512D: 71.26%, 256D: 50.99%, 128D: 52.14%, 64D: 55.33%, 32D: 49.39%, 16D: 33.78%; (d) LFW: FaceNet-128: 128D: 93.90%, 64D: 94.86%, 32D: 90.31%, 16D: 66.07%, 10D: 36.38%; (e) LFW: FaceNet-512: 512D: 98.19%, 256D: 98.19%, 128D: 98.19%, 64D: 98.19%, 32D: 97.61%, 16D: 87.19%, 10D: 64.50%; (f) LFW: SphereFace: 512D: 96.74%, 256D: 96.75%, 128D: 96.80%, 64D: 91.74%, 32D: 66.38%, 16D: 32.67%, 10D: 16.04%.

It is critical that the projected features preserve the local relationships and the overall structure of the embedding space. We visualize the lower-dimensional features projected by DeepMDS alongside the original 128-D FaceNet-128 feature vectors. Fig. 10 illustrates the embeddings of the entire LFW dataset. In Fig. 11, we select twenty subjects from LFW. Fig. 10a and Fig. 11a visualize 2-dim DeepMDS embeddings learned from the original 128-dim FaceNet-128 features. Fig. 10b and Fig. 11b visualize the 10-dim intrinsic-space embeddings learned from the same features; we use t-SNE [2] to visualize the 10-dim space in two dimensions. We observe that DeepMDS preserves local neighborhood relationships between images in the low-dimensional intrinsic space as well as the global structure of the embedding space.
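A minimal sketch of the visualization step, assuming the 10-dim DeepMDS features are available as a NumPy array (here random stand-ins) and using scikit-learn's t-SNE; the perplexity value is an assumption:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 10))  # stand-in for 10-dim DeepMDS embeddings
Z_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Z)
# scatter Z_2d with one color per identity to inspect neighborhood structure
```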

References

[1] D. Granata and V. Carnevale. Accurate estimation of the intrinsic dimension using graph distances: Unraveling the geometric complexity of datasets. Scientific Reports, 6:31377, 2016.
[2] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
[3] K. W. Pettis, T. A. Bailey, A. K. Jain, and R. C. Dubes. An intrinsic dimensionality estimator from near-neighbor information. IEEE Transactions on Pattern Analysis and Machine Intelligence, (1):25–37, 1979.
[4] A. Rozza, G. Lombardi, C. Ceruti, E. Casiraghi, and P. Campadelli. Novel high intrinsic dimensionality estimators. Machine Learning, 89(1-2):37–65, 2012.
[5] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI Conference on Artificial Intelligence, volume 4, page 12, 2017.
[6] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. Web-scale training for face identification. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[7] P. J. Verveer and R. P. W. Duin. An evaluation of intrinsic dimensionality estimators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(1):81–86, 1995.
[8] D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. arXiv:1411.7923, 2014.


Figure 4: Isomap: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings. Panels and legends: (a) IJB-C: FaceNet-128: 128D: 42.57%, 64D: 42.36%, 32D: 35.11%, 16D: 30.24%, 12D: 21.29%; (b) IJB-C: FaceNet-512: 512D: 69.32%, 256D: 69.24%, 128D: 66.45%, 64D: 62.31%, 32D: 55.88%, 16D: 54.19%, 12D: 45.96%; (c) IJB-C: SphereFace: 512D: 71.26%, 256D: 72.68%, 128D: 70.94%, 64D: 70.34%, 32D: 55.26%, 16D: 34.02%; (d) LFW: FaceNet-128: 128D: 93.90%, 64D: 95.05%, 32D: 94.42%, 16D: 90.81%, 10D: 81.61%; (e) LFW: FaceNet-512: 512D: 98.19%, 256D: 98.46%, 128D: 98.71%, 64D: 98.65%, 32D: 98.34%, 16D: 96.43%, 10D: 90.81%; (f) LFW: SphereFace: 512D: 96.74%, 256D: 92.88%, 128D: 93.18%, 64D: 95.00%, 32D: 95.31%, 16D: 89.47%, 10D: 77.31%.

Figure 5: Denoising Autoencoder: face verification on the IJB-C and LFW (BLUFR) datasets for the FaceNet-128, FaceNet-512 and SphereFace embeddings. Panels and legends: (a) FaceNet-128: 128D: 42.57%, 64D: 22.90%, 32D: 22.86%, 16D: 5.59%, 12D: 4.28%; (b) FaceNet-512: 512D: 69.32%, 256D: 60.87%, 128D: 58.64%, 64D: 59.26%, 32D: 52.54%, 16D: 43.06%, 12D: 27.70%; (c) SphereFace: 512D: 71.26%, 256D: 10.88%, 128D: 18.80%, 64D: 20.26%, 32D: 8.02%, 16D: 1.84%; (d) FaceNet-128: 128D: 93.90%, 64D: 86.65%, 32D: 88.74%, 16D: 84.14%, 10D: 65.81%; (e) FaceNet-512: 512D: 98.19%, 256D: 96.28%, 128D: 88.88%, 64D: 58.54%, 32D: 34.46%, 16D: 30.15%, 10D: 17.61%; (f) SphereFace: 512D: 96.74%, 256D: 77.80%, 128D: 32.95%, 64D: 32.04%, 32D: 11.73%, 16D: 27.53%, 10D: 6.73%.

Figure 6: Distribution of geodesic distances for different representation models and datasets. Axes: Geodesic Distance (r) vs. P(r). Panels: (a) FaceNet-128 (LFW, IJB-C), (b) FaceNet-512 (LFW, IJB-C), (c) SphereFace (LFW, IJB-C), (d) ResNet-34 (ImageNet).


Figure 7: $\log\frac{\hat{p}_{\mathcal{M}}(r)}{\hat{p}_{\mathcal{M}}(r_{max})}$ vs. $\log\frac{r}{r_{max}}$ plots as the number of neighbors k is varied, for different representation models and datasets. Each panel overlays the representation's curve with the Gaussian and hypersphere fits at the estimated m: (a) SphereFace on LFW: m=10 (K=4), m=11 (K=7), m=13 (K=9), m=9 (K=15); (b) SphereFace on IJB-C: m=14 (K=4), m=14 (K=7), m=16 (K=9), m=16 (K=15); (c) ResNet-34 on ImageNet: m=16 (K=4), m=18 (K=7), m=19 (K=9), m=23 (K=15).

Figure 8: Intrinsic dimensionality of the Swiss Roll. Panels: (a) histogram of geodesic distances; (b) $\log\frac{\hat{p}_{\mathcal{M}}(r)}{\hat{p}_{\mathcal{M}}(r_{max})}$ vs. $\log\frac{r}{r_{max}}$ with K=7, fit by both the Gaussian (m=2) and hypersphere (m=2) models; (c) root mean squared error vs. dimension (K=7).


Figure 9: Swiss Roll: (a) the original 2000 points from the swiss roll manifold, (b) the 2-dim intrinsic space estimated by Isomap, and (c) the 2-dim intrinsic space estimated by our proposed method DeepMDS. In both cases, the blue and black points, and correspondingly the green and red points, are close together in both the intrinsic and ambient spaces.

Figure 10: Visualization of embeddings on the LFW dataset. Panels: (a) 2-D, (b) 10-D (intrinsic).


Figure 11: Visualization of embeddings on the LFW dataset: a subset of images from Fig. 10, consisting of 20 subjects. Panels: (a) 2-D, (b) 10-D (intrinsic).