Linear functional estimation under multiplicative measurement errors

Sergio Brenner Miguel* (Ruprecht-Karls-Universität Heidelberg), Fabienne Comte** (Université de Paris), Jan Johannes* (Ruprecht-Karls-Universität Heidelberg)

December 1, 2021

Abstract: We study the non-parametric estimation of the value ϑ(f) of a linear functional evaluated at an unknown density function f with support on R+ based on an i.i.d. sample with multiplicative measurement errors. The proposed estimation procedure combines the estimation of the Mellin transform of the density f and a regularisation of the inverse of the Mellin transform by a spectral cut-off. In order to bound the mean squared error we distinguish several scenarios characterised through different decays of the upcoming Mellin transforms and the smoothness of the linear functional. In fact, we identify scenarios where a non-trivial choice of the upcoming tuning parameter is necessary and propose a data-driven choice based on a Goldenshluger-Lepski method. Additionally, we show minimax-optimality of the estimator over Mellin-Sobolev spaces.

Keywords: Linear functional model, multiplicative measurement errors, Mellin transform, Mellin-Sobolev space, minimax theory, inverse problem, adaptation

AMS 2000 subject classifications: Primary 62G05; secondary 62F10, 62C20.

*Institut für Angewandte Mathematik, MΛTHEMΛTIKON, Im Neuenheimer Feld 205, D-69120 Heidelberg, Germany, e-mail: {brennermiguel|johannes}@math.uni-heidelberg.de
**CNRS, MAP5 UMR 8145, F-75006 Paris, France, e-mail: [email protected]

arXiv:2111.14920v1 [math.ST] 29 Nov 2021


1 Introduction
In this paper we are interested in estimating the value ϑ(f) of a linear functional evaluated at an unknown density f : R+ → R+ of a positive random variable X, when only Y = XU is observable for a multiplicative measurement error U, a positive random variable independent of X with known density g. In this situation the density fY of Y is given by
fY(y) = [f ∗ g](y) = ∫_0^∞ f(x) g(y/x) x^{-1} dx, y ∈ R+,

where ∗ denotes multiplicative convolution. Therefore, the estimation of f, and hence of ϑ(f),
using an i.i.d. sample Y1, . . . , Yn from fY is called a multiplicative deconvolution problem, which is an inverse problem. Vardi [1989] and Vardi and Zhang [1992] introduce and intensively study multiplicative censoring, which corresponds to the particular multiplicative deconvolution problem with multiplicative error U uniformly distributed on [0, 1]. Multiplicative censoring is a common challenge in survival analysis, as explained and motivated in Van Es et al. [2000]. The estimation of the cumulative distribution function of X is considered in Vardi and Zhang [1992] and Asgharian and Wolfson [2005]. Series expansion methods are studied in Andersen and Hansen [2001], treating the model as an inverse problem. Density estimation in a multiplicative censoring model is considered in Brunel et al. [2016] using a kernel estimator and a convolution power kernel estimator. Assuming a uniform error distribution on an interval [1−α, 1+α] for α ∈ (0, 1), Comte and Dion [2016] analyze a projection density estimator with respect to the Laguerre basis. Belomestny et al. [2016] study a beta-distributed error U.

The multiplicative measurement error model covers all three variations of multiplicative censoring. It was considered by Belomestny and Goldenshluger [2020] for point-wise density estimation. The key to the analysis of multiplicative deconvolution is the multiplication theorem, which for a density fY = f ∗ g and the Mellin transforms M[fY], M[f] and M[g] (defined below) states M[fY] = M[f]M[g]. Exploiting the multiplication theorem, Belomestny and Goldenshluger [2020] introduce a kernel density estimator of f, allowing more generally X and U to take also negative values. Moreover, they point out that transforming the data by applying the logarithm is a special case of their estimation strategy. Note that by applying the logarithm the model Y = XU becomes log(Y) = log(X) + log(U), and hence multiplicative convolution becomes (additive) convolution for the log-transformed data. As a consequence, the density of log(X) is first estimated employing usual strategies for non-parametric (additive) deconvolution problems (see for example Meister [2009]) and then transformed back into an estimator of f. Thereby, regularity conditions commonly used in (additive) deconvolution problems are imposed on the density of log(X), which, however, are difficult to interpret as regularity conditions on the density f. Furthermore, the analysis of a global risk of an estimator of f using this naive approach is challenging, as Comte and Dion [2016] point out. The global estimation of the density under multiplicative measurement errors is considered in
Brenner Miguel et al. [2021] using the Mellin transform and a spectral cut-off regularisation of its inverse to define an estimator of the unknown density f. Brenner Miguel [2021] studies global density estimation under multiplicative measurement errors for multivariate random variables, while the global estimation of the survival function can be found in Brenner Miguel and Phandoidaen [2021]. In this paper we estimate the value ϑ(f) of a known linear functional of the unknown density f by plugging in the estimator of f proposed by Brenner Miguel et al. [2021]. In additive deconvolution, linear functional estimation has been studied for instance by Butucea and Comte [2009], Mabon [2016] and Pensky [2017], to mention only a few. In the literature, the most studied examples of linear functional estimation are point-wise estimation of the unknown density f, the survival function, the cumulative distribution function (c.d.f.) and the Laplace transform of f. These examples are particular cases of our general setting. More precisely, we show below that in each of those examples the quantity of interest can be written as a linear functional of the form
ϑ(f) := (2π)^{-1} ∫_R Ψ(−t) M[f](t) dt,
where Ψ : R → C is a known function and M[f] denotes the Mellin transform of f. Exploiting properties of the Mellin transform, we characterise the underlying inverse problem and natural regularity conditions which borrow ideas from the inverse problems community (see e.g. Engl et al. [2000]). More precisely, we identify conditions on the decay of the Mellin transforms of f and g and of the function Ψ which ensure that our estimator is well-defined. We illustrate those conditions by different scenarios. The proposed estimator, however, involves a tuning parameter, and we specify when this parameter has to be chosen non-trivially. For that case, we propose a data-driven choice of the tuning parameter inspired by the work of Goldenshluger and Lepski [2011], who consider data-driven bandwidth selection in kernel density estimation. We establish an oracle inequality for the plug-in spectral cut-off estimator under fairly mild assumptions on the error density g. Moreover, we show that uniformly over Mellin-Sobolev spaces the proposed estimator is minimax-optimal. The paper is organised in the following way: in Section 2 we develop the data-driven plug-in estimator and introduce our basic assumptions. We state an oracle-type upper bound for the mean squared error of the plug-in spectral cut-off estimator with fully data-driven choice of the tuning parameter. In Section 3 we state an upper bound, uniform over Mellin-Sobolev spaces, for the mean squared error of the plug-in spectral cut-off estimator with optimal tuning parameter realising a squared-bias-variance trade-off, as well as lower bounds for the point-wise estimation of the unknown density f, the survival function and the c.d.f. The proofs can be found in the appendix.
2 Data-driven estimation
Notation: For a weight function ω : Ω → R+ on a measurable set Ω ⊆ R and p ≥ 1, set ‖h‖^p_{Lp(ω)} := ∫_Ω |h(x)|^p ω(x) dx. Denote by Lp(ω) the set of all measurable functions from Ω to C with finite Lp(ω)-norm. In the case p = 2, let ⟨h1, h2⟩_{L2(ω)} := ∫_Ω h1(x) h̄2(x) ω(x) dx for h1, h2 ∈ L2(ω) be the corresponding weighted scalar product, where h̄2 denotes the complex conjugate of h2. With a slight abuse of notation, x^a with a ∈ R denotes the weight function x ↦ x^a, and we write ‖·‖_{x^a} := ‖·‖_{L2_{R+}(x^a)}, respectively ⟨·, ·⟩_{x^{2c-1}} := ⟨·, ·⟩_{L2_{R+}(x^{2c-1})}. Further, we use the abbreviation Lp_R := Lp_R(ω) for the unweighted Lp space, i.e. with ω(x) = 1 for all x ∈ R, and analogously Lp_{R+}. For a measurable function h : R → C we denote by ‖h‖_{∞,ω} the essential supremum of the function x ↦ |h(x)|ω(x).
Mellin transform: Let c ∈ R. For two functions h1, h2 ∈ L1_{R+}(x^{c-1}) and any y ∈ R+ we have ∫_0^∞ |h1(x)h2(y/x)x^{-1}| dx < ∞, which allows us to define their multiplicative convolution h1 ∗ h2 : R+ → C through

(h1 ∗ h2)(y) := ∫_0^∞ h1(y/x) h2(x) x^{-1} dx, y ∈ R+. (2.1)

For a proof sketch of h1 ∗ h2 ∈ L1_{R+}(x^{c-1}) and of the following properties we refer to Brenner Miguel [2021]. If in addition h1 ∈ L2_{R+}(x^{2c-1}) (respectively h2 ∈ L2_{R+}(x^{2c-1})), then h1 ∗ h2 ∈ L2_{R+}(x^{2c-1}), too. For h ∈ L1_{R+}(x^{c-1}) we define its Mellin transform Mc[h] : R → C at the development point c ∈ R by

Mc[h](t) := ∫_0^∞ x^{c-1+it} h(x) dx, t ∈ R. (2.2)
One key property of the Mellin transform, which makes it so appealing for multiplicative deconvolution problems, is the multiplication theorem, which for h1, h2 ∈ L1_{R+}(x^{c-1}) states

Mc[h1 ∗ h2](t) = Mc[h1](t) Mc[h2](t), t ∈ R. (2.3)
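The multiplication theorem (2.3) can be checked empirically: for independent X and U with Y = XU it states that E[Y^{c-1+it}] = E[X^{c-1+it}] E[U^{c-1+it}]. The following minimal Monte Carlo sketch makes this visible; the concrete choices (X standard exponential, U uniform on (0,1), c = 1, t = 1, sample size and tolerance) are purely illustrative and not part of the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
X = rng.exponential(1.0, n)      # sample of X with density f
U = rng.uniform(0.0, 1.0, n)     # sample of the error U with density g
Y = X * U                        # observed sample, density fY = f * g

c, t = 1.0, 1.0                  # development point c and frequency t
s = c - 1 + 1j * t

# Mc[fY](t) = E[Y^{c-1+it}] versus Mc[f](t) * Mc[g](t)
lhs = np.mean(Y ** s)
rhs = np.mean(X ** s) * np.mean(U ** s)
print(abs(lhs - rhs))            # small, up to Monte Carlo error
```

Both sides are Monte Carlo approximations of the corresponding Mellin transforms, so the discrepancy is of order n^{-1/2}.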
Making use of the Fourier transform, the domain of definition of the Mellin transform can be extended to L2_{R+}(x^{2c-1}). To do so, let φ : R → R+, x ↦ exp(−2πx), and denote by φ^{-1} : R+ → R its inverse. Note that the diffeomorphisms φ, φ^{-1} map Lebesgue null sets to Lebesgue null sets. Consequently, the map Φc : L2_{R+}(x^{2c-1}) → L2_R with h ↦ φ^c · (h ∘ φ) is a well-defined isomorphism, and we denote by Φc^{-1} : L2_R → L2_{R+}(x^{2c-1}) its inverse. For h ∈ L2_{R+}(x^{2c-1}) the Mellin transform Mc[h] : R → C developed in c ∈ R is defined through

Mc[h](t) := (2π) F[Φc[h]](t) for any t ∈ R.
Here, F : L2_R → L2_R with H ↦ (t ↦ F[H](t) := lim_{k→∞} ∫_{-k}^{k} exp(−2πitx) H(x) dx) denotes the Plancherel-Fourier transform, where the limit is understood in the L2_R sense. Due to this definition, several properties of the Mellin transform can be deduced from well-known Fourier theory. In particular, for any h ∈ L1_{R+}(x^{c-1}) ∩ L2_{R+}(x^{2c-1}) we have

Mc[h](t) = ∫_0^∞ x^{c-1+it} h(x) dx for any t ∈ R, (2.4)

which coincides with the common definition of the Mellin transform given in Paris and Kaminski [2001].
Example 2.1. Now let us give a few examples of Mellin transforms of commonly considered distribution families.
(i) Beta distribution: admits a density gb(x) := 1_{(0,1)}(x) b(1−x)^{b-1} for b ∈ N and x ∈ R+. Then we have gb ∈ L2_{R+}(x^{2c-1}) ∩ L1_{R+}(x^{c-1}) for any c > 0 and

Mc[gb](t) = ∏_{j=1}^{b} j/(c−1+j+it), t ∈ R.
(ii) Scaled Log-Gamma distribution: given by its density g_{μ,a,λ}(x) = (exp(λμ)/Γ(a)) x^{−λ−1}(log(x)−μ)^{a-1} 1_{(e^μ,∞)}(x) for a, λ, x ∈ R+ and μ ∈ R. Then, for c < λ+1, it holds g_{μ,a,λ} ∈ L2_{R+}(x^{2c-1}) ∩ L1_{R+}(x^{c-1}) and

Mc[g_{μ,a,λ}](t) = exp(μ(c−1+it)) (λ−c+1−it)^{-a}, t ∈ R.

Note that g_{μ,1,λ} is the density of a Pareto distribution with parameters e^μ and λ, and g_{0,a,λ} is the density of a Log-Gamma distribution.
(iii) Gamma distribution: admits a density gd(x) = (x^{d-1}/Γ(d)) exp(−x) 1_{R+}(x) for d, x ∈ R+. Then, for c > −d+1, we have gd ∈ L2_{R+}(x^{2c-1}) ∩ L1_{R+}(x^{c-1}) and

Mc[gd](t) = Γ(c+d−1+it)/Γ(d), t ∈ R.
(iv) Weibull distribution: admits a density gm(x) = m x^{m-1} exp(−x^m) 1_{R+}(x) for m, x ∈ R+. For c > −m+1, Mc[gm] is well-defined and

Mc[gm](t) = ((c−1+it)/m) Γ((c−1+it)/m), t ∈ R.
(v) Log-Normal distribution: admits a density g_{μ,λ}(x) = (√(2π) λ x)^{-1} exp(−(log(x)−μ)²/(2λ²)) 1_{R+}(x) for λ, x ∈ R+ and μ ∈ R. Mc[g_{μ,λ}] is well-defined for any c ∈ R and it holds

Mc[g_{μ,λ}](t) = exp(μ(c−1+it)) exp(λ²(c−1+it)²/2), t ∈ R.
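The closed-form expressions of Example 2.1 are easy to validate numerically. As a sketch for case (v), the following compares a Monte Carlo approximation of Mc[g_{μ,λ}](t) = E[X^{c-1+it}] for Log-Normal X with the stated formula; the parameter values, sample size and tolerance are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, lam = 0.3, 0.7               # illustrative Log-Normal parameters
c, t = 1.0, 1.5
s = c - 1 + 1j * t

Z = rng.standard_normal(500_000)
X = np.exp(mu + lam * Z)         # X ~ Log-Normal(mu, lam)

mc_empirical = np.mean(X ** s)   # Monte Carlo estimate of Mc[g](t) = E[X^{c-1+it}]
mc_closed = np.exp(mu * s) * np.exp(lam ** 2 * s ** 2 / 2)
print(abs(mc_empirical - mc_closed))
```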
Due to this construction, the Mellin transform Mc : L2_{R+}(x^{2c-1}) → L2_R is itself an isomorphism, and we denote by Mc^{-1} : L2_R → L2_{R+}(x^{2c-1}) its inverse. If H ∈ L1_R ∩ L2_R, then the inverse Mellin transform is explicitly expressed through

Mc^{-1}[H](x) = (2π)^{-1} ∫_R x^{−c−it} H(t) dt, x ∈ R+. (2.5)
Furthermore, a Plancherel-type equation holds for the Mellin transform. Precisely, for all h1, h2 ∈ L2_{R+}(x^{2c-1}) we have

⟨h1, h2⟩_{x^{2c-1}} = (2π)^{-1} ⟨Mc[h1], Mc[h2]⟩_{L2_R} and ‖h1‖²_{x^{2c-1}} = (2π)^{-1} ‖Mc[h1]‖²_{L2_R}. (2.6)
Linear functional: In the following paragraph we introduce the linear functional, motivate it through a collection of examples and determine sufficient conditions ensuring that the considered objects are well-defined. We then define an estimator based on the empirical Mellin transform and the multiplication theorem for Mellin transforms. Let c ∈ R and f ∈ L2_{R+}(x^{2c-1}). In the sequel we are interested in estimating the linear functional

ϑ(f) := (2π)^{-1} ∫_R Ψ(−t) Mc[f](t) dt (2.7)

for a function Ψ : R → C such that Ψ(t) is the complex conjugate of Ψ(−t) for any t ∈ R and such that ΨMc[f] ∈ L1_R. The latter is fulfilled if Ψ ∈ L2_R. Nevertheless, a more detailed analysis of the decay of Mc[f] and Ψ allows us to ensure the integrability in less restrictive situations. Before we present an estimator for ϑ(f), let us briefly illustrate our general approach by typical examples.
Illustration 2.2. We study in the sequel point-wise estimation at a given point xo ∈ R+ in the following four examples.
(i) Density: Considering the evaluation f(xo) of f at the point xo, if Mc[f] ∈ L1_R then we have f(xo) = Mc^{-1}[Mc[f]](xo) = ϑ(f) with Ψ(t) := xo^{−c+it}, t ∈ R, where Ψ(t) is indeed the complex conjugate of Ψ(−t) for all t ∈ R.
(ii) Cumulative distribution function: Considering the evaluation F(xo) = ∫_0^{xo} f(x)dx of the c.d.f. F at the point xo, define for c < 1 the function ψ(x) := x^{1−2c} 1_{(0,xo)}(x), x ∈ R+, which belongs to L1_{R+}(x^{c-1}) ∩ L2_{R+}(x^{2c-1}). Setting

Ψ(t) := Mc[ψ](t) = ∫_0^{xo} x^{−c+it} dx = (1−c+it)^{-1} xo^{1−c+it},

we get ϑ(f) = F(xo) by an application of the Plancherel equality.
(iii) Survival function: Considering the evaluation S(xo) = ∫_{xo}^∞ f(x)dx of the survival function S at the point xo, define the function ψ(x) := x^{1−2c} 1_{(xo,∞)}(x), x ∈ R+, which for c > 1 belongs to L1_{R+}(x^{c-1}) ∩ L2_{R+}(x^{2c-1}). Setting

Ψ(t) := Mc[ψ](t) = ∫_{xo}^∞ x^{−c+it} dx = −(1−c+it)^{-1} xo^{1−c+it},

we get ϑ(f) = S(xo) by an application of the Plancherel equality.
(iv) Laplace transform: Given the evaluation L(xo) = ∫_0^∞ exp(−xo x) f(x)dx of the Laplace transform L at the point xo, define for c < 1 the function ψ(x) := x^{1−2c} exp(−xo x), x ∈ R+, which belongs to L1_{R+}(x^{c-1}) ∩ L2_{R+}(x^{2c-1}). Setting

Ψ(t) := Mc[ψ](t) = ∫_0^∞ x^{−c+it} exp(−xo x) dx = xo^{c−1−it} Γ(1−c+it),

we get ϑ(f) = L(xo) by an application of the Plancherel equality.
It is worth stressing that in all four examples introduced in Illustration 2.2 the quantity of interest is independent of the choice of the model parameter c ∈ R. However, the conditions on c ∈ R given in Illustration 2.2 and the assumption f ∈ L2_{R+}(x^{2c-1}) ensure that the representation of ϑ(f) is well-defined, and hence are essential for our estimation strategy. Consequently, we present the upcoming theory for almost arbitrary choices of c ∈ R.
Remark 2.3. Consider Illustration 2.2. Since S = 1 − F, there is an elementary connection between the estimation of the survival function and the estimation of the c.d.f. For example, from a c.d.f. estimator F̂(xo) we can deduce a survival function estimator S̃(xo) through S̃(xo) := 1 − F̂(xo) with the same risk, that is, E_{fY}((S̃(xo) − S(xo))²) = E_{fY}((F̂(xo) − F(xo))²). Thus we can define for any c ≠ 1 a survival function (respectively c.d.f.) estimator using the results of (ii) and (iii) in Illustration 2.2.
Estimation strategy: To define an estimator of the quantity ϑ(f), we make use of the multiplication theorem (2.3), as is common for deconvolution problems. To do so, let f ∈ L2_{R+}(x^{2c-1}) ∩ L1_{R+}(x^{c-1}) and g ∈ L1_{R+}(x^{c-1}). Then we deduce Mc[fY](t) = Mc[f](t)Mc[g](t) for all t ∈ R by application of the multiplication theorem. Under the mild assumption that Mc[g](t) ≠ 0 for all t ∈ R, we conclude that Mc[f](t) = Mc[fY](t)/Mc[g](t) for all t ∈ R and rewrite (2.7) as

ϑ(f) = (2π)^{-1} ∫_R Ψ(−t) Mc[fY](t)/Mc[g](t) dt. (2.8)
A naive approach is to replace in (2.8) the quantity Mc[fY] by its empirical counterpart M̂c(t) := n^{-1} ∑_{j=1}^{n} Y_j^{c-1+it}, t ∈ R. However, the resulting integral is not well-defined, since ΨM̂c/Mc[g] is generally not integrable. We ensure integrability by introducing an additional spectral cut-off regularisation, which leads to the following estimator:

ϑ̂_k := (2π)^{-1} ∫_{-k}^{k} Ψ(−t) M̂c(t)/Mc[g](t) dt for any k ∈ R+. (2.9)
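To make the procedure concrete, the following sketch implements the spectral cut-off estimator (2.9) for the c.d.f. functional of Illustration 2.2(ii) under multiplicative censoring, where U is uniform on (0, 1) so that Mc[g](t) = (c + it)^{-1}. All concrete choices (X standard exponential, c = 0.75, xo = 1, cut-off k = 5, sample size and integration grid) are illustrative, and the integral is approximated by a simple midpoint rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
X = rng.exponential(1.0, n)           # unknown density f (standard exponential)
U = rng.uniform(0.0, 1.0, n)          # known error density g (uniform on (0,1))
Y = X * U                             # observed sample

c, x_o, k, dt = 0.75, 1.0, 5.0, 0.02  # development point, evaluation point, cut-off, grid width
t = np.arange(-k, k, dt) + dt / 2     # midpoint grid on [-k, k]

# empirical Mellin transform  Mc_hat(t) = n^{-1} sum_j Y_j^{c-1+it}
logY, powY = np.log(Y), Y ** (c - 1.0)
Mc_hat = np.array([np.mean(powY * np.exp(1j * tt * logY)) for tt in t])

Psi_minus_t = x_o ** (1 - c - 1j * t) / (1 - c - 1j * t)  # Psi(-t) for the c.d.f. functional
Mg = 1.0 / (c + 1j * t)                                   # Mc[g](t) for uniform U

# spectral cut-off estimator (2.9)
theta_hat = (np.sum(Psi_minus_t * Mc_hat / Mg) * dt).real / (2 * np.pi)
print(theta_hat, 1 - np.exp(-x_o))    # estimate of F(x_o) vs. the true value
```

With these choices the estimate is typically within a few hundredths of the true value F(xo) = 1 − e^{−xo}; note that the known error density enters only through its Mellin transform Mc[g].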
The following proposition shows that the estimator is consistent for a suitable choice of the cut-off parameter k ∈ R+. We denote by E^n_{fY} the expectation corresponding to the distribution P^n_{fY} of (Y1, . . . , Yn) and use the abbreviation E_{fY} := E^1_{fY}. Analogously, we define E^n_f and E_f.
Proposition 2.4. For c ∈ R assume that f ∈ L2_{R+}(x^{2c-1}), ΨMc[f] ∈ L1_R and E_{fY}(Y_1^{2(c-1)}) < ∞. Then, for any k ∈ R+,

E^n_{fY}((ϑ̂_k − ϑ(f))²) ≤ ‖1_{[k,∞)}ΨMc[f]‖²_{L1_R} + E_{fY}(Y_1^{2(c-1)}) n^{-1} ‖1_{[-k,k]}Ψ/Mc[g]‖²_{L1_R}. (2.10)

If additionally ‖g‖_{∞,x^{2c-1}} < ∞ and σ := E_f(X_1^{2(c-1)}) < ∞, then

E^n_{fY}((ϑ̂_k − ϑ(f))²) ≤ ‖1_{[k,∞)}ΨMc[f]‖²_{L1_R} + ‖g‖_{∞,x^{2c-1}} σ n^{-1} ∆_{Ψ,g}(k), where ∆_{Ψ,g}(k) := ∫_{-k}^{k} |Ψ(t)/Mc[g](t)|² dt. (2.11)

Choosing now a sequence of spectral cut-off parameters (k_n)_{n∈N} such that k_n → ∞ and ‖1_{[-k_n,k_n]}Ψ/Mc[g]‖²_{L1_R} n^{-1} → 0 (respectively ∆_{Ψ,g}(k_n) n^{-1} → 0) implies that ϑ̂_{k_n} is a consistent estimator of ϑ(f), that is, E^n_{fY}((ϑ̂_{k_n} − ϑ(f))²) → 0 for n → ∞. We note that the additional assumption ‖g‖_{∞,x^{2c-1}} < ∞ is fulfilled by many error densities and is thus rather weak.
Remark 2.5. Despite the fact that the first bound (2.10) only requires a finite second moment of Y_1^{c-1}, in many cases we have ∆_{Ψ,g}(k) ‖1_{[-k,k]}Ψ/Mc[g]‖^{-2}_{L1_R} → 0 for k → ∞, implying that the bound of the variance term in (2.11) increases more slowly in k than the bound presented in (2.10). It is worth stressing that there exist cases where the opposite effect occurs. For instance, let the error U be Log-Normal distributed with parameters μ = 0, λ = 1, see Example 2.1. Then sup_{y∈R+} y^{2c-1} g(y) = sup_{z∈R} (√(2π)λ)^{-1} exp(2(c−1)z) exp(−(z−μ)²/(2λ²)) < ∞. Thus, if E(X_1^{2(c-1)}) < ∞, both bounds are finite and, following the argumentation of Butucea and Tsybakov [2008], one can see that in the special case of point-wise density estimation the inequality presented in (2.10) is more favourable than the inequality presented in (2.11).
For the upcoming theory, we will focus on the second bound of Proposition 2.4. Assuming that ‖g‖_{∞,x^{2c-1}} < ∞ allows us to state that the growth of the second summand, also referred to as the variance term, is determined by the growth of ∆_{Ψ,g}(k) as k goes to infinity.
The parametric case: In this paragraph we determine when Proposition 2.4 implies a parametric rate for the estimator. To be precise, only two scenarios can occur.

(P) sup_{k∈R+} ∆_{Ψ,g}(k) = ‖Ψ/Mc[g]‖²_{L2_R} < ∞, i.e. the second summand in (2.11) is uniformly bounded in k and hence of order n^{-1}. Then for all sufficiently large values of k ∈ R+ the bias term is negligible with respect to the parametric rate n^{-1}.

(NP) sup_{k∈R+} ∆_{Ψ,g}(k) = ‖Ψ/Mc[g]‖²_{L2_R} = ∞, i.e. the second summand is unbounded and hence necessitates an optimal choice of the parameter k ∈ R+ realising a squared-bias-variance trade-off.
Our aim is now to characterise when the case (P) occurs. To do so, we start by introducing a typical characterisation of the decay of the error density and of the decay of the function Ψ, starting with the error density. Let us first revisit Example 2.1 to analyse the decay of the presented densities.
Example 2.6 (Example 2.1 continued).
(i) Beta distribution: For c > 0 and b ∈ N we have Mc[gb](t) = ∏_{j=1}^{b} j/(c−1+j+it) and thus

c_{g,c}(1+t²)^{-b/2} ≤ |Mc[gb](t)| ≤ C_{g,c}(1+t²)^{-b/2}, t ∈ R,

where c_{g,c}, C_{g,c} > 0 are positive constants depending only on g and c.
(ii) Scaled Log-Gamma distribution: For λ, a ∈ R+, μ ∈ R and c < λ+1 we have Mc[g_{μ,a,λ}](t) = exp(μ(c−1+it))(λ−c+1−it)^{-a} for t ∈ R and thus

c_{g,c}(1+t²)^{-a/2} ≤ |Mc[g_{μ,a,λ}](t)| ≤ C_{g,c}(1+t²)^{-a/2}, t ∈ R,

where c_{g,c}, C_{g,c} > 0 are positive constants depending only on g and c.
(iii) Gamma distribution: For d ∈ R+ and c > −d+1 we have Mc[gd](t) = Γ(c+d−1+it)/Γ(d) for t ∈ R and thus

c_{g,c}(1+t²)^{(c+d−1.5)/2} exp(−|t|π/2) ≤ |Mc[gd](t)| ≤ C_{g,c}(1+t²)^{(c+d−1.5)/2} exp(−|t|π/2)

for t ∈ R, where c_{g,c}, C_{g,c} > 0 are positive constants depending only on g and c.
(iv) Weibull distribution: Let m ∈ R+ and c > −m+1. We have Mc[gm](t) = ((c−1+it)/m) Γ((c−1+it)/m) for t ∈ R and thus

c_{g,c}(1+t²)^{(2c−2−m)/(2m)} exp(−|t|π/(2m)) ≤ |Mc[gm](t)| ≤ C_{g,c}(1+t²)^{(2c−2−m)/(2m)} exp(−|t|π/(2m))

for t ∈ R, where c_{g,c}, C_{g,c} > 0 are positive constants depending only on g and c.
(v) Log-Normal distribution: Let λ ∈ R+, μ ∈ R and c ∈ R. We have Mc[g_{μ,λ}](t) = exp(μ(c−1+it)) exp(λ²(c−1+it)²/2) for t ∈ R and thus

c_{g,c} exp(−λ²t²/2) ≤ |Mc[g_{μ,λ}](t)| ≤ C_{g,c} exp(−λ²t²/2)

for t ∈ R, where c_{g,c}, C_{g,c} > 0 are positive constants depending only on g and c.
Motivated by Example 2.1, we distinguish between smooth and super smooth error densities, staying in the terminology of Fan [1991], Belomestny and Goldenshluger [2020] or Brenner Miguel et al. [2021]. An error density g is called smooth if there exist γ, c_{g,c}, C_{g,c} > 0 such that

c_{g,c}(1+t²)^{-γ/2} ≤ |Mc[g](t)| ≤ C_{g,c}(1+t²)^{-γ/2}, t ∈ R, ([G1])

and it is referred to as super smooth if there exist λ, ρ, c_{g,c}, C_{g,c} > 0 and γ ∈ R such that

c_{g,c}(1+t²)^{-γ/2} exp(−λ|t|^ρ) ≤ |Mc[g](t)| ≤ C_{g,c}(1+t²)^{-γ/2} exp(−λ|t|^ρ), t ∈ R. ([G2])
On the other hand, to calculate the growth of ∆_{Ψ,g}, we specify the decay of Ψ. Similarly to the error density g, we consider the case of a smooth Ψ, i.e. there exist c_{Ψ,c}, C_{Ψ,c} > 0 and p ≥ 0 such that

c_{Ψ,c}(1+t²)^{-p/2} ≤ |Ψ(t)| ≤ C_{Ψ,c}(1+t²)^{-p/2}, t ∈ R, ([Ψ1])

and a super smooth Ψ, i.e. there exist μ, R, c_{Ψ,c}, C_{Ψ,c} > 0 and p ∈ R such that

c_{Ψ,c}(1+t²)^{-p/2} exp(−μ|t|^R) ≤ |Ψ(t)| ≤ C_{Ψ,c}(1+t²)^{-p/2} exp(−μ|t|^R), t ∈ R. ([Ψ2])
As we see in the following Illustration the examples of Ψ considered in Illustration 2.2 do fit into these two cases.
Illustration 2.7 (Illustration 2.2 continued).
(i) Point-wise density estimation: We have |Ψ(t)| = xo^{-c} and thus p = 0 in the sense of [Ψ1].

(ii) Point-wise cumulative distribution function estimation: We have |Ψ(t)| = xo^{1-c}/((1−c)²+t²)^{1/2} and thus p = 1 in the sense of [Ψ1].

(iii) Point-wise survival function estimation: We have |Ψ(t)| = xo^{1-c}/((1−c)²+t²)^{1/2} and thus p = 1 in the sense of [Ψ1].

(iv) Laplace transform estimation: We have |Ψ(t)| = xo^{c-1}|Γ(1−c+it)|; by the Stirling formula, |Γ(1−c+it)| behaves like (1+t²)^{(1−2c)/4} exp(−|t|π/2) for |t| → ∞, and thus p = c − 1/2, μ = π/2 and R = 1 in the sense of [Ψ2].
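For case (iv), the super smooth decay can be made explicit at the development point c = 1/2, where the classical identity |Γ(1/2 + it)|² = π/cosh(πt) is available: then |Ψ(t)| = xo^{-1/2}|Γ(1/2 + it)| decays like exp(−|t|π/2) with p = 0, matching [Ψ2] with μ = π/2 and R = 1. A short numpy check of this decay (the grid values are illustrative):

```python
import numpy as np

t = np.array([3.0, 6.0, 9.0])
# |Gamma(1/2 + it)| via the exact identity |Gamma(1/2+it)|^2 = pi / cosh(pi*t)
abs_gamma = np.sqrt(np.pi / np.cosh(np.pi * t))
ratio = abs_gamma * np.exp(np.pi * t / 2)   # remove the exp(-pi*|t|/2) decay
print(ratio)                                # stabilises at sqrt(2*pi) ~ 2.5066
```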
Having introduced the typical terminology for deconvolution settings, we can now state when the function ∆_{Ψ,g} is bounded. We summarise the collection of scenarios in the following proposition.
Proposition 2.8. Assume that for a c ∈ R it holds that f ∈ L2_{R+}(x^{2c-1}), ΨMc[f] ∈ L1_R, σ := E_f(X_1^{2(c-1)}) < ∞ and ‖g‖_{∞,x^{2c-1}} < ∞. Then, in the cases

(i) [Ψ1] and [G1] with 2p − 2γ > 1;

(ii) [Ψ2] and [G1]; or

(iii) [Ψ2] and [G2] with (R > ρ), (R = ρ, μ > λ) or (R = ρ, μ = λ, 2p − 2γ > 1),

we get sup_{k∈R+} ∆_{Ψ,g}(k) < ∞. Furthermore, for all k ∈ R+ sufficiently large we have

E^n_{fY}((ϑ̂_k − ϑ(f))²) ≤ C(Ψ, g, σ) n^{-1}.
The proof of Proposition 2.8 is a straightforward calculation and is thus omitted. For our four examples of Ψ, we get a parametric rate for the estimation of the survival function and the cumulative distribution function if the error density fulfils [G1] with γ < 1/2, and a parametric rate for the estimation of the Laplace transform if the error density fulfils [G1] with γ > 0, or if g fulfils [G2] with (ρ < 1), (ρ = 1, λ < π/2) or (ρ = 1, λ = π/2, γ < −c).
The non-parametric case: We now focus on the case (NP), that is sup_{k∈R+} ∆_{Ψ,g}(k) = ∞, which occurs in several situations. In this scenario the first summand in the bound of Proposition 2.4 is decreasing in k while the second summand is increasing and unbounded. A choice of the parameter k ∈ R+ realising an optimal trade-off is thus non-trivial. We therefore define a data-driven procedure for the choice of the parameter k ∈ R+, inspired by the work of Goldenshluger and Lepski [2011]. In fact, let us reduce the set of possible parameters to 𝒦_n := {k ∈ N : ‖g‖_{∞,x^{2c-1}} ∆_{Ψ,g}(k) ≤ n, k ≤ n^{1/2}(log n)^{-2}} and denote K_n := max 𝒦_n. We further introduce the variance term up to a (log n)-factor,
V(k) := χ ‖g‖_{∞,x^{2c-1}} σ ∆_{Ψ,g}(k) (log n) n^{-1},
where χ > 0 is a numerical constant specified below and σ := E_f(X_1^{2(c-1)}). Based on a comparison of the estimators constructed above, an estimator of the bias term is given by

A(k) := sup_{k' ∈ ⟧k, K_n⟧} ((ϑ̂_{k'} − ϑ̂_k)² − V(k'))_+,

where ⟧a, b⟧ := (a, b] ∩ N for a, b ∈ R+. Analogously, we define ⟦a, b⟧ := [a, b] ∩ N and ⟦a, b⟦ := [a, b) ∩ N. Since the term σ in V(k) depends on the unknown density f, and hence is itself unknown, we replace it by a plug-in estimator σ̂ based on the sample, σ̂ := n^{-1} ∑_{j=1}^{n} Y_j^{2(c-1)}/M_{2c-1}[g](0), exploiting that E_{fY}(Y_1^{2(c-1)}) = σ M_{2c-1}[g](0). That is, we estimate V(k) and A(k) by

V̂(k) := 2χ ‖g‖_{∞,x^{2c-1}} σ̂ ∆_{Ψ,g}(k) (log n) n^{-1} and Â(k) := sup_{k' ∈ ⟧k, K_n⟧} ((ϑ̂_{k'} − ϑ̂_k)² − V̂(k'))_+.
Below we study the fully data-driven estimator ϑ̂_{k̂} of ϑ(f) with

k̂ := arg min_{k ∈ 𝒦_n} (Â(k) + V̂(k)).
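The selection rule can be sketched generically: given the family of estimates (ϑ̂_k) and penalties V̂(k) on the grid of admissible cut-offs, Â(k) compares each estimator with all larger cut-offs and k̂ minimises Â + V̂. A minimal sketch with toy input values (the numbers below are illustrative and not derived from data):

```python
import numpy as np

def select_cutoff(theta, V):
    """Goldenshluger-Lepski choice: k_hat = argmin_k A(k) + V(k), where
    A(k) = max_{k' > k} ((theta[k'] - theta[k])**2 - V[k'])_+ ."""
    K = len(theta)
    A = np.zeros(K)
    for k in range(K):
        A[k] = max(
            (max((theta[kp] - theta[k]) ** 2 - V[kp], 0.0) for kp in range(k + 1, K)),
            default=0.0,
        )
    return int(np.argmin(A + V))

# toy example: the estimates stabilise while the penalty keeps growing,
# so the rule picks the smallest cut-off after stabilisation
theta = np.array([0.00, 0.50, 0.60, 0.61])
V = np.array([0.001, 0.01, 0.10, 1.00])
print(select_cutoff(theta, V))   # prints 1
```

The comparison term Â(k) mimics the unknown bias: once the estimates have stabilised, increasing k further only inflates the penalty V̂.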
Theorem 2.9. For c ∈ R assume that f ∈ L2_{R+}(x^{2c-1}), ΨMc[f] ∈ L1_R, E_{fY}(Y_1^{8(c-1)}) < ∞ and ‖g‖_{∞,x^{2c-1}} < ∞. Then, for χ ≥ 72, it holds that

E_{fY}((ϑ(f) − ϑ̂_{k̂})²) ≤ C1 inf_{k ∈ 𝒦_n} (‖1_{[k,∞)}ΨMc[f]‖²_{L1_R} + ∆_{Ψ,g}(k)(log n)n^{-1}) + C2 n^{-1},

where C1 is a positive numerical constant and C2 is a positive constant depending on Ψ, g and E_{fY}(Y_1^{8(c-1)}).
The proof of Theorem 2.9 is postponed to the appendix. Let us briefly comment on the moment assumptions of Theorem 2.9. For c ∈ R close to one, the apparently high moment assumption E_{fY}(Y_1^{8(c-1)}) < ∞ is rather weak. For point-wise density estimation, compare Illustration 2.2, this assumption is always true if c = 1. For point-wise survival function estimation (respectively cumulative distribution function estimation), c = 1 cannot be fulfilled, but arbitrary values of c ∈ R close to one are possible. As already mentioned, for point-wise density estimation the assumption ΨM1[f] ∈ L1_R implies that M1[f] ∈ L1_R. For c = 1, we see that ‖g‖_{∞,x} < ∞ is fulfilled for many examples of error densities.
3 Minimax theory
In the following section we develop the minimax theory for the plug-in spectral cut-off estimator under the assumptions [G1] and [Ψ1]. Over the Mellin-Sobolev spaces we derive an upper bound for all linear functionals satisfying assumption [Ψ1]. We state a lower bound for each of the cases (i)-(iii) of Illustration 2.2 separately, that is, point-wise estimation of the density f, the survival function S and the cumulative distribution function F. We finish this section by motivating the regularity spaces through their analytical implications.
Upper bound: Let us restrict ourselves to the scenario where [G1] and [Ψ1] hold with 2p − 2γ < 1. Here one can state that there exists a constant C_{Ψ,g} > 0 such that ∆_{Ψ,g}(k) ≤ C_{Ψ,g} k^{2γ−2p+1}.
Now let us consider the bias term. To do so, we introduce the Mellin-Sobolev spaces at the development point c ∈ R by

W^s_c(R+) := {h ∈ L2_{R+}(x^{2c-1}) : |h|²_{s,c} := ‖(1+t²)^{s/2} Mc[h]‖²_{L2_R} < ∞} (3.1)

with corresponding ellipsoids W^s_c(L) := {h ∈ W^s_c(R+) : |h|²_{s,c} ≤ L}. We denote the subset of densities by

D^{s,c,L}_{R+} := {f ∈ W^s_c(L) : f is a density, E_f(X_1^{2c-2}) ≤ L}. (3.2)
Using this construction we get the following result as a direct consequence.
Theorem 3.1. Assume that for a c ∈ R, [G1] holds for g and [Ψ1] holds for Ψ. Additionally, assume that ‖g‖_{∞,x^{2c-1}} < ∞ and f ∈ D^{s,c,L}_{R+}. Setting for any s > 1/2 − p the cut-off parameter kn := n^{1/(2s+2γ)} then implies

E_{fY}((ϑ̂_{kn} − ϑ(f))²) ≤ C_{L,s,Ψ,g,c} n^{−(2s+2p−1)/(2s+2γ)},

where C_{L,s,Ψ,g,c} > 0 is a constant depending on L, s, p, Ψ, γ, c and ‖g‖_{∞,x^{2c-1}}.
Proof of Theorem 3.1. Evaluating the upper bound of Proposition 2.4 under [G1] and [Ψ1], we have for the variance term ‖g‖_{∞,x^{2c-1}} σ ∆_{Ψ,g}(k) n^{-1} ≤ C_{Ψ,g,L} k^{2γ−2p+1} n^{-1}, while for the bias term, by the Cauchy-Schwarz inequality,

‖1_{[k,∞)}ΨMc[f]‖²_{L1_R} ≤ C_{L,c} ∫_{|t|≥k} |Ψ(t)|²(1+t²)^{-s} dt ≤ C_{L,c,s,Ψ} k^{−2s−2p+1}.

Now choosing kn := n^{1/(2s+2γ)} balances both terms, leading to the rate n^{−(2s+2p−1)/(2s+2γ)}.

The assumption s > 1/2 − p implies that ΨMc[f] ∈ L1_R by a simple calculation, which can be found in the proof of Theorem 3.1 in the appendix. Before considering the lower bounds, let us illustrate the last theorem using our examples (i) to (iii) of Illustration 2.2.
Illustration 3.2.
(i) Point-wise density estimation: Since p = 0, we assume s > 1/2 = 1/2 − p. In this scenario Theorem 3.1 implies

sup_{f ∈ D^{s,c,L}_{R+}} E_{fY}((f̂_{kn}(xo) − f(xo))²) ≤ C_{L,s,Ψ,g,c} n^{−(2s−1)/(2s+2γ)}.

(ii) Point-wise cumulative distribution function estimation: We have p = 1, and hence for any s > 0 it holds s > 1/2 − p. Recall that for γ < 1/2 we are in the parametric case, where we choose k ∈ R+ sufficiently large. For γ > 1/2 we deduce from Theorem 3.1 for any c < 1 that

sup_{f ∈ D^{s,c,L}_{R+}} E_{fY}((F̂_{kn}(xo) − F(xo))²) ≤ C_{L,s,Ψ,g,c} n^{−(2s+1)/(2s+2γ)}.
(iii) Point-wise survival function estimation: We have p = 1, and hence for any s > 0 it holds s > 1/2 − p. Recall that for γ < 1/2 we are in the parametric case, where we choose k ∈ R+ sufficiently large. For γ > 1/2 we deduce from Theorem 3.1 for any c > 1 that

sup_{f ∈ D^{s,c,L}_{R+}} E_{fY}((Ŝ_{kn}(xo) − S(xo))²) ≤ C_{L,s,Ψ,g,c} n^{−(2s+1)/(2s+2γ)}.
In example (i) the sign of c has a strong impact on the upper bound. In fact, for c > 0 it appears that the estimation at a point xo close to 0 is harder than for bigger values of xo, while c < 0 has the opposite effect. Further, in (ii) and (iii), i.e. estimation of the c.d.f. and of the survival function, the estimator of the c.d.f. seems to behave better close to 0 than the survival function estimator. We stress that in Remark 2.3 we already mentioned that one can use an estimator of the survival function to construct an estimator of the c.d.f. and vice versa. The results of Illustration 3.2 thus suggest estimating the survival function directly or through the c.d.f. estimator, according to whether xo ∈ R+ is far from or close to 0.
Remark 3.3. Belomestny and Goldenshluger [2020] derive for point-wise density estimation a rate of n^{−2s/(2s+2γ+1)} under similar assumptions on the error density g. However, they consider Hölder-type regularity classes rather than Mellin-Sobolev spaces, which are of a global nature. Even though the rates in Illustration 3.2 seem to be less sharp compared to Belomestny and Goldenshluger [2020], they cannot be improved, as shown by the lower bounds below.
Additionally, if γ ≥ 1 we have kn := n^{1/(2s+2γ)} ≤ n^{1/2}(log n)^{-2} for n sufficiently large, and thus kn ∈ 𝒦_n. We can deduce the following corollary by applying arguments similar to those in the proof of Theorem 3.1 to Theorem 2.9. We therefore omit its proof.
Corollary 3.4. Assume that for a c ∈ R, [G1] holds for g and [Ψ1] holds for Ψ. Further, let E_{fY}(Y_1^{8(c-1)}) < ∞, ‖g‖_{∞,x^{2c-1}} < ∞ and f ∈ D^{s,c,L}_{R+} for an s > 1/2 − p. Then

E_{fY}((ϑ̂_{k̂} − ϑ(f))²) ≤ C_{f,g,Ψ} log(n) n^{−(2s+2p−1)/(2s+2γ)},

where C_{f,g,Ψ} > 0 is a constant depending on L, s, p, Ψ, γ, c, E_{fY}(Y_1^{8(c-1)}) and ‖g‖_{∞,x^{2c-1}}.
To show that the presented rates of Theorem 3.1 cannot be improved over the whole Mellin-Sobolev ellipsoids, we give a lower bound result for the cases (i)-(iii) in the following paragraph.
Lower bound: For the following part we need an additional assumption on the error density g. In fact, we assume that g has bounded support, that is, g(x) = 0 for all x > d for some d ∈ R+. For the sake of simplicity we set d = 1. Further, we assume that there exist c'_g, C'_g ∈ R+ such that

c'_g(1+t²)^{-γ/2} ≤ |M_{1/2}[g](t)| ≤ C'_g(1+t²)^{-γ/2} for |t| → ∞. ([G1'])
For technical reasons, we restrict ourselves to the case c > 1/2.
Theorem 3.5. Let s, γ ∈ N and assume that [G1] and [G1'] hold. Then there exist constants C_{g,xo,i}, L_{s,g,xo,c,i} > 0, i ∈ ⟦1, 3⟧, such that

(i) Point-wise density estimation: for all L ≥ L_{s,g,xo,c,1}, n ∈ N and for any estimator f̂(xo) of f(xo) based on an i.i.d. sample (Y_j)_{j ∈ ⟦1,n⟧},

sup_{f ∈ D^{s,c,L}_{R+}} E_{fY}((f̂(xo) − f(xo))²) ≥ C_{g,xo,1} n^{−(2s−1)/(2s+2γ)}.

(ii) Point-wise survival function estimation: for all L ≥ L_{s,g,xo,c,2}, n ∈ N and for any estimator Ŝ(xo) of S(xo) based on an i.i.d. sample (Y_j)_{j ∈ ⟦1,n⟧},

sup_{f ∈ D^{s,c,L}_{R+}} E_{fY}((Ŝ(xo) − S(xo))²) ≥ C_{g,xo,2} n^{−(2s+1)/(2s+2γ)}.

(iii) Point-wise cumulative distribution function estimation: for all L ≥ L_{s,g,xo,c,3}, n ∈ N and for any estimator F̂(xo) of F(xo) based on an i.i.d. sample (Y_j)_{j ∈ ⟦1,n⟧},

sup_{f ∈ D^{s,c,L}_{R+}} E_{fY}((F̂(xo) − F(xo))²) ≥ C_{g,xo,3} n^{−(2s+1)/(2s+2γ)}.
We stress that in the multiplicative censoring model the family $(g_k)_{k \in \mathbb{N}}$ of Beta(1, k) densities fulfils both assumptions [G1] and [G1'].
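For $g_k$ a Beta(1, k) density, $g_k(x) = k(1-x)^{k-1}$ on $(0,1)$, the Mellin transform admits the closed form $\mathcal{M}_{1/2}[g_k](t) = k!/\prod_{j=0}^{k-1}(1/2+j+it)$, so the decay exponent in [G1] and [G1'] is $\gamma = k$. A small sketch (the grid of evaluation points is an arbitrary illustrative choice) checks this polynomial decay numerically:

```python
import math

def mellin_beta_1k(t, k):
    """M_{1/2}[g_k](t) for g_k ~ Beta(1, k): k! / prod_{j=0}^{k-1} (1/2 + j + it)."""
    val = complex(math.factorial(k))
    for j in range(k):
        val /= complex(0.5 + j, t)
    return val

k = 3
for t in (10.0, 100.0, 1000.0):
    # |M_{1/2}[g_k](t)| * (1 + t^2)^{k/2} should stabilise at a positive constant,
    # confirming the decay exponent gamma = k in [G1] and [G1'].
    print(t, abs(mellin_beta_1k(t, k)) * (1.0 + t * t) ** (k / 2))
```

The printed ratios approach $k! = 6$ for $k = 3$, i.e. the two-sided polynomial bound of [G1'] holds with constants close to $k!$.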
Regularity assumptions While in the theory of inverse problems the definition of the Mellin–Sobolev spaces is quite natural, we stress that elements of these spaces can be characterised through their analytical properties. In Brenner Miguel et al. [2021] one can find a characterisation of $\mathbb{W}^s_1(\mathbb{R}_+)$. Since the generalisation to the spaces $\mathbb{W}^s_c(\mathbb{R}_+)$ is straightforward, we only state the result; the proof for the case $c = 1$ can be found in Brenner Miguel et al. [2021].
Proposition 3.6. Let $s \in \mathbb{N}$. Then $f \in \mathbb{W}^s_c(\mathbb{R}_+)$ if and only if $f$ is $(s-1)$-times continuously differentiable, $f^{(s-1)}$ is locally absolutely continuous with derivative $f^{(s)}$, and $\omega^j f^{(j)} \in \mathbb{L}^2_{\mathbb{R}_+}(\omega^{2c-1})$ for all $j \in \llbracket 0, s \rrbracket$.
Appendix
A Proofs of section 2
Useful inequality The next inequality is stated in the following form in Comte [2017], based on a similar formulation in Birgé and Massart [1998].
Lemma A.1 (Bernstein inequality). Let $X_1, \dots, X_n$ be independent random variables and set $T_n(X) := \sum_{i=1}^{n}(X_i - \mathbb{E}(X_i))$. Assume that $n^{-1}\sum_{i=1}^{n}\mathbb{E}(|X_i|^m) \le \frac{m!}{2}\, v^2\, b^{m-2}$ for all $m \ge 2$. Then for any $\eta > 0$,

$$\mathbb{P}\big(|T_n(X)| \ge n\eta\big) \le 2\exp\Big(-\frac{n\eta^2}{2(v^2 + b\eta)}\Big).$$

If the $X_i$ are identically distributed, the previous condition can be replaced by $\mathrm{Var}(X_1) \le v^2$ and $|X_1| \le b$.
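As a sanity check of the lemma (the sample size, threshold and the bounded uniform model are illustrative choices, using the i.i.d. case with $\mathrm{Var}(X_1) \le v^2$, $|X_1| \le b$), one can compare the Bernstein bound with Monte Carlo tail frequencies:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, eta = 200, 20_000, 0.1
b, v2 = 1.0, 1.0 / 12.0            # |X_i| <= 1 and Var(X_i) = 1/12 for U(0, 1)

x = rng.uniform(0.0, 1.0, size=(reps, n))
t_n = np.abs((x - 0.5).sum(axis=1))          # |T_n(X)| for centred uniforms
empirical = (t_n >= n * eta).mean()          # Monte Carlo tail frequency
bernstein = 2.0 * np.exp(-n * eta**2 / (2.0 * (v2 + b * eta)))
print(empirical, bernstein)                  # empirical tail <= Bernstein bound
```

The empirical frequency stays below the (typically loose) exponential bound, as the lemma guarantees for the true tail probability.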
Proof of Proposition 2.4. Let us denote for any $k \in \mathbb{R}_+$ the expectation $\vartheta_k := \mathbb{E}^n_{f_Y}(\hat\vartheta_k)$, which leads to the usual squared-bias–variance decomposition

$$\mathbb{E}^n_{f_Y}\big((\hat\vartheta_k - \vartheta(f))^2\big) = (\vartheta_k - \vartheta(f))^2 + \mathrm{Var}^n_{f_Y}(\hat\vartheta_k). \qquad (A.1)$$

Consider the first summand in (A.1). An application of the Fubini–Tonelli theorem implies

$$(\vartheta_k - \vartheta(f))^2 \le \big\|\mathbb{1}_{[k,\infty)}\,\Psi\,\mathcal{M}_c[f]\big\|^2_{\mathbb{L}^1_{\mathbb{R}}}.$$
Consider now the second summand in (A.1). The bound in (2.10) then follows from the inequality

$$\mathrm{Var}^n_{f_Y}(\hat\vartheta_k) \le \frac{1}{n}\,\mathbb{E}_{f_Y}\Big(\Big|\frac{1}{2\pi}\int_{-k}^{k}\frac{\Psi(t)}{\mathcal{M}_c[g](t)}\,Y_1^{c-1+it}\,dt\Big|^2\Big) \le \frac{\sigma\,\|g\|_{\infty,x^{2c-1}}}{2\pi n}\int_{-k}^{k}\Big|\frac{\Psi(t)}{\mathcal{M}_c[g](t)}\Big|^2\,dt,$$

where for the last step we use that for any $y > 0$,

$$y^{2c-1} f_Y(y) = \int_0^\infty (y/x)^{2c-1}\, g(y/x)\, x^{2c-2} f(x)\,dx \le \|g\|_{\infty,x^{2c-1}}\,\mathbb{E}_f\big(X_1^{2(c-1)}\big) = \|g\|_{\infty,x^{2c-1}}\,\sigma.$$
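The decomposition (A.1) is the elementary identity "MSE = squared bias + variance". The sketch below (the estimator and distribution are arbitrary illustrative choices, not the paper's setting) verifies it on empirical moments, for which the identity holds exactly by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 50, 100_000

# (X_bar)^2 is a biased estimator of theta^2 with bias Var(X_bar) = theta^2 / n
samples = rng.exponential(theta, size=(reps, n))
est = samples.mean(axis=1) ** 2

mse = np.mean((est - theta**2) ** 2)          # left-hand side of (A.1)
bias2 = (est.mean() - theta**2) ** 2          # squared bias
var = est.var()                               # population variance of the estimates
print(mse, bias2 + var)                       # the two sides of (A.1) coincide
```

The Monte Carlo bias also matches the analytic value $\mathrm{Var}(\bar X) = \theta^2/n = 0.08$ up to simulation noise.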
Proof of Theorem 2.9. Let us set $\vartheta := \vartheta(f)$. By the definition of $\hat k$ it follows for any $k \in \mathcal{K}_n$ that

$$(\vartheta - \hat\vartheta_{\hat k})^2 \le 2(\vartheta - \hat\vartheta_k)^2 + 2(\hat\vartheta_k - \hat\vartheta_{\hat k})^2 \le 2(\vartheta - \hat\vartheta_k)^2 + 2(\hat\vartheta_k - \hat\vartheta_{k\wedge\hat k})^2 + 2(\hat\vartheta_{k\wedge\hat k} - \hat\vartheta_{\hat k})^2 \le 2(\vartheta - \hat\vartheta_k)^2 + 4\big(\hat A(k) + \hat V(k)\big).$$
Consider $\hat A(k)$: we have by a straightforward calculation $\hat A(k) \le A(k) + \sup_{k' \in \mathcal{K}_n}(V(k') - \hat V(k'))_+$ and thus

$$A(k) \le \sup_{k' \in \llbracket k, K_n \rrbracket}\Big(3(\hat\vartheta_{k'} - \vartheta_{k'})^2 + 3(\hat\vartheta_{k} - \vartheta_{k})^2 - V(k')\Big)_+ + 3\sup_{k' \in \llbracket k, K_n \rrbracket}(\vartheta_k - \vartheta_{k'})^2.$$

By the monotonicity of $V(k)$ we deduce that for $k < k'$ it holds $V(k') \ge \frac{1}{2}V(k) + \frac{1}{2}V(k')$, which simplifies the first term to

$$\sup_{k' \in \llbracket k, K_n \rrbracket}\Big(3(\hat\vartheta_{k'} - \vartheta_{k'})^2 - \tfrac{1}{2}V(k')\Big)_+ + \Big(3(\hat\vartheta_{k} - \vartheta_{k})^2 - \tfrac{1}{2}V(k)\Big)_+,$$

while the latter summand can be bounded for any $k' \in \llbracket k, K_n \rrbracket$ by

$$(\vartheta_k - \vartheta_{k'})^2 \le \Big(\frac{1}{2\pi}\int_{k \le |t| \le k'} |\Psi(t)\,\mathcal{M}_c[f](t)|\,dt\Big)^2.$$
Further we have $(\vartheta - \hat\vartheta_k)^2 \le 2(\vartheta - \vartheta_k)^2 + 2(\vartheta_k - \hat\vartheta_k)^2$, which implies

$$(\vartheta - \hat\vartheta_{\hat k})^2 \le C\Big((\vartheta - \vartheta_k)^2 + V(k) + \sup_{k' \in \llbracket k, K_n \rrbracket}(\vartheta_k - \vartheta_{k'})^2\Big) + 4\sup_{k' \in \mathcal{K}_n}\big(V(k') - \hat V(k')\big)_+ + 26\sup_{k' \in \llbracket k, K_n \rrbracket}\Big((\hat\vartheta_{k'} - \vartheta_{k'})^2 - \tfrac{1}{6}V(k')\Big)_+.$$
To control the last term we split the centred arithmetic mean $\hat\vartheta_{k'} - \vartheta_{k'}$ into two parts, applying to one a Bernstein inequality, cf. Lemma A.1, and standard techniques to the other. For a positive sequence $(c_n)_{n \in \mathbb{N}}$ and $t \in \mathbb{R}$ introduce

$$\hat{\mathcal{M}}_c(t) = n^{-1}\sum_{j=1}^{n}\Big(Y_j^{c-1+it}\,\mathbb{1}_{(0,c_n]}(Y_j^{c-1}) + Y_j^{c-1+it}\,\mathbb{1}_{(c_n,\infty)}(Y_j^{c-1})\Big) =: \hat{\mathcal{M}}_{c,1}(t) + \hat{\mathcal{M}}_{c,2}(t).$$

Split the centred arithmetic mean $\hat\vartheta_{k'} - \vartheta_{k'} = \nu_{k',1} + \nu_{k',2}$, where

$$\nu_{k',i} := \frac{1}{2\pi}\int_{-k'}^{k'}\frac{\Psi(t)}{\mathcal{M}_c[g](t)}\Big(\hat{\mathcal{M}}_{c,i}(t) - \mathbb{E}_{f_Y}\big(\hat{\mathcal{M}}_{c,i}(t)\big)\Big)\,dt.$$
Thus we have

$$(\vartheta - \hat\vartheta_{\hat k})^2 \le C\Big((\vartheta - \vartheta_k)^2 + V(k) + \sup_{k' \in \llbracket k, K_n \rrbracket}(\vartheta_k - \vartheta_{k'})^2\Big) + 4\sup_{k' \in \mathcal{K}_n}\big(V(k') - \hat V(k')\big)_+ + 52\sup_{k' \in \mathcal{K}_n}\Big(\nu_{k',1}^2 - \tfrac{1}{12}V(k')\Big)_+ + 52\sup_{k' \in \mathcal{K}_n}\nu_{k',2}^2.$$

The claim of the theorem then follows from the following lemma.
Lemma A.2. Under the assumptions of Theorem 2.9, with $c_n := \sqrt{n^{1/2}\,\sigma\,\|g\|_{\infty,x^{2c-1}}\,\log(n)/42}$, the following hold:

(i) $\mathbb{E}_{f_Y}\Big(\sup_{k' \in \mathcal{K}_n}\big(\nu_{k',1}^2 - \tfrac{1}{12}V(k')\big)_+\Big) \le \dfrac{C}{n}$,

(ii) $\mathbb{E}_{f_Y}\Big(\sup_{k' \in \mathcal{K}_n}\nu_{k',2}^2\Big) \le \dfrac{C_{\Psi,g,\sigma}\,\mathbb{E}_{f_Y}(Y_1^{8(c-1)})}{n}$,

(iii) $\mathbb{E}_{f_Y}\Big(\sup_{k' \in \mathcal{K}_n}\big(V(k') - \hat V(k')\big)_+\Big) \le \dfrac{C\big(\mathbb{E}(X_1^{4(c-1)}), \sigma\big)}{n}$.
Proof of Lemma A.2. To prove (i), we see that

$$\mathbb{E}_{f_Y}\Big(\sup_{k' \in \mathcal{K}_n}\big(\nu_{k',1}^2 - \tfrac{1}{12}V(k')\big)_+\Big) \le \sum_{k' \in \mathcal{K}_n}\int_0^\infty \mathbb{P}_{f_Y}\Big(\nu_{k',1}^2 \ge \tfrac{V(k')}{12} + x\Big)\,dx.$$
Now our aim is to apply the Bernstein inequality, Lemma A.1. To do so, we define for $y > 0$ the function

$$h_k(y) := \frac{1}{2\pi}\int_{-k}^{k}\frac{\Psi(t)}{\mathcal{M}_c[g](t)}\, y^{it}\,dt,$$

which leads to

$$n\,\nu_{k,1} = \sum_{j=1}^{n}\Big(Y_j^{c-1}\,\mathbb{1}_{(0,c_n]}(Y_j^{c-1})\,h_k(Y_j) - \mathbb{E}_{f_Y}\big(Y_1^{c-1}\,\mathbb{1}_{(0,c_n]}(Y_1^{c-1})\,h_k(Y_1)\big)\Big).$$

Since $|h_k(y)| \le \frac{1}{2\pi}\int_{-k}^{k}\big|\frac{\Psi(t)}{\mathcal{M}_c[g](t)}\big|\,dt \le \sqrt{k\,\Delta_{\Psi,g}(k)}$, with $\Delta_{\Psi,g}(k) := \frac{1}{2\pi}\int_{-k}^{k}|\Psi(t)/\mathcal{M}_c[g](t)|^2\,dt$, this implies $|Y_j^{c-1}\,\mathbb{1}_{(0,c_n]}(Y_j^{c-1})\,h_k(Y_j)| \le c_n\sqrt{k\,\Delta_{\Psi,g}(k)} =: b$. Further,

$$\mathrm{Var}_{f_Y}\big(Y_1^{c-1}\,\mathbb{1}_{(0,c_n]}(Y_1^{c-1})\,h_k(Y_1)\big) \le \mathbb{E}_{f_Y}\big(Y_1^{2c-2}\,h_k^2(Y_1)\big) \le \|g\|_{\infty,x^{2c-1}}\,\sigma\,\Delta_{\Psi,g}(k) =: v^2.$$
Therefore the Bernstein inequality yields, for any $x \ge 0$,

$$\mathbb{P}_{f_Y}\Big(|\nu_{k,1}| \ge \sqrt{\tfrac{V(k)}{12} + x}\Big) \le 2\exp\Big(-\frac{n}{4v^2}\Big(\frac{V(k)}{12} + x\Big)\Big) + 2\exp\Big(-\frac{n}{8b}\Big(\sqrt{\tfrac{V(k)}{12}} + \sqrt{x}\Big)\Big),$$

using the concavity of the square root. We thus have to bound the four upcoming factors. In fact,

$$\frac{n}{4v^2}\,\frac{V(k)}{12} \ge \frac{3}{2}\log(n) \qquad\text{and}\qquad \frac{n}{8b}\,\sqrt{\frac{V(k)}{12}} = \frac{n\,\sqrt{V(k)}}{8\sqrt{12}\, c_n\sqrt{k\,\Delta_{\Psi,g}(k)}} \ge \frac{3}{2}\log(n)$$

by the definition of $\mathcal{K}_n$ and $c_n = \sqrt{n^{1/2}\,\sigma\,\|g\|_{\infty,x^{2c-1}}\,\log(n)/42}$, while the factors depending on $x$ are integrable,

$$\int_0^\infty \exp\Big(-\frac{nx}{4v^2}\Big)\,dx = \frac{4v^2}{n} \qquad\text{and}\qquad \int_0^\infty \exp\Big(-\frac{n\sqrt{x}}{8b}\Big)\,dx = \frac{128\, b^2}{n^2}.$$

Combining these bounds with the cardinality of $\mathcal{K}_n$, we conclude

$$\mathbb{E}_{f_Y}\Big(\sup_{k' \in \mathcal{K}_n}\big(\nu_{k',1}^2 - \tfrac{1}{12}V(k')\big)_+\Big) \le \frac{C}{n}.$$
For part (ii) we have $|\nu_{k',2}| \le (2\pi)^{-1}\int_{-k'}^{k'}|\Psi(t)|\,|\mathcal{M}_c[g](t)|^{-1}\,|\hat{\mathcal{M}}_{c,2}(t) - \mathbb{E}_{f_Y}(\hat{\mathcal{M}}_{c,2}(t))|\,dt$, implying with the Cauchy–Schwarz inequality that

$$\mathbb{E}_{f_Y}\Big(\sup_{k' \in \mathcal{K}_n}\nu_{k',2}^2\Big) \le \frac{1}{2\pi}\int_{-K_n}^{K_n}\Big|\frac{\Psi(t)}{\mathcal{M}_c[g](t)}\Big|^2\,dt\;\mathbb{E}_{f_Y}\big(Y_1^{2(c-1)}\,\mathbb{1}_{(c_n,\infty)}(Y_1^{c-1})\big) \le C_{\Psi,g}\, n^{1/2}\,\mathbb{E}_{f_Y}\big(Y_1^{(c-1)(2+u)}\big)\, c_n^{-u}$$

for any $u \in \mathbb{R}_+$. Choosing $u = 6$ leads to $\mathbb{E}_{f_Y}\big(\sup_{k' \in \mathcal{K}_n}\nu_{k',2}^2\big) \le C_{\Psi,g,\sigma}\,\mathbb{E}_{f_Y}(Y_1^{8(c-1)})\, n^{-1}$.
To show inequality (iii), we first define the event $\Omega := \{|\hat\sigma - \sigma| < \frac{\sigma}{2}\}$. On $\Omega$ we have $\frac{\sigma}{2} \le \hat\sigma \le \frac{3}{2}\sigma$, which implies $V(k) \le \hat V(k) \le 3V(k)$ and hence

$$\mathbb{E}_{f_Y}\Big(\sup_{k' \in \mathcal{K}_n}\big(V(k') - \hat V(k')\big)_+\Big) \le 2\chi\,\mathbb{E}_{f_Y}\big(|\hat\sigma - \sigma|\,\mathbb{1}_{\Omega^c}\big) \le 2\chi\,\frac{\mathrm{Var}_{f_Y}(\hat\sigma)}{\sigma}$$

by application of the Cauchy–Schwarz and the Markov inequality. This implies the claim.
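The selection rule analysed in Theorem 2.9 and Lemma A.2 can be mimicked numerically. The toy version below (the penalty constant, candidate grid and model are illustrative choices, not the paper's calibration of the data-driven penalty) selects the cut-off by minimising $A(k) + V(k)$ for point-wise density estimation under uniform multiplicative errors:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
y = rng.exponential(1.0, n) * rng.uniform(0.0, 1.0, n)   # Y = X * U, f(x) = exp(-x)

def theta_hat(k, x_o=1.0, c=1.0, m=201):
    """Spectral cut-off estimate of f(x_o); M_c[g](t) = 1/(c + it) for uniform U."""
    t = np.linspace(-k, k, m)
    m_hat = np.exp(1j * np.outer(np.log(y), t)).mean(axis=0)   # empirical Mellin, c = 1
    integrand = x_o ** (-c - 1j * t) * (c + 1j * t) * m_hat
    dt = t[1] - t[0]
    return np.real(integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dt / (2.0 * np.pi)

grid = [float(k) for k in range(1, 16)]                  # candidate cut-offs
est = {k: theta_hat(k) for k in grid}
# illustrative penalty proportional to the variance term (1/2pi) int (1+t^2) dt / n
V = {k: 2.0 * (k + k**3 / 3.0) / n for k in grid}
# A(k) = sup_{k' >= k} ((theta_hat_{k'} - theta_hat_k)^2 - V(k'))_+
A = {k: max(max((est[kp] - est[k]) ** 2 - V[kp], 0.0) for kp in grid if kp >= k)
     for k in grid}
k_sel = min(grid, key=lambda k: A[k] + V[k])
print(k_sel, est[k_sel], np.exp(-1.0))
```

Large cut-offs are ruled out by the growing penalty $V$, small ones by the comparison term $A$, which is the Goldenshluger–Lepski trade-off behind the adaptive choice $\hat k$.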
B Proofs of section 3
Proof of Theorem 3.5. We first outline the main steps of the proof. We construct two densities $f_o, f_1$ in $\mathcal{D}^{s,c,L}_{\mathbb{R}_+}$ by a perturbation with a small bump, such that the difference $(\vartheta(f_1) - \vartheta(f_o))^2$ and the Kullback–Leibler divergence of their induced distributions can be bounded from below and above, respectively. The claim then follows by applying Theorem 2.5 in Tsybakov [2008].

We use the following construction, which we present first. We set $f_o(x) := \exp(-x)$ for $x \in \mathbb{R}_+$. Let $\mathcal{C}^\infty_c(\mathbb{R})$ be the set of all infinitely differentiable functions with compact support and let $\psi \in \mathcal{C}^\infty_c(\mathbb{R})$ be a function with support in $[-1,1]$, $\int_{-1}^{1}\psi(x)\,dx = 0$, $\psi^{(\gamma-1)}(0) \neq 0$ and $\psi^{(\gamma)}(0) \neq 0$, and define for $j \in \mathbb{N}_0$ the finite constant $C_{j,\infty} := \max(\|\psi^{(l)}\|_\infty,\, l \in \llbracket 0, j \rrbracket)$. For each $x_o \in \mathbb{R}_+$ and $h \in (0, x_o/2)$ (to be selected below) we define the bump function $\psi_{h,x_o}(x) := \psi\big(\frac{x-x_o}{h}\big)$, $x \in \mathbb{R}$. Let us further define the operator $S: \mathcal{C}^\infty_c(\mathbb{R}) \to \mathcal{C}^\infty_c(\mathbb{R})$ with $S[f](x) = -x f^{(1)}(x)$ for all $x \in \mathbb{R}$, and set $S^1 := S$ and $S^n := S \circ S^{n-1}$ for $n \in \mathbb{N}$, $n \ge 2$. Now, for $j \in \mathbb{N}$, set

$$\psi_{j,h,x_o}(x) := S^j[\psi_{h,x_o}](x) = (-1)^j\sum_{i=1}^{j} c_{i,j}\, x^i\, h^{-i}\,\psi^{(i)}\Big(\frac{x-x_o}{h}\Big)$$

for $x \in \mathbb{R}_+$ and constants $c_{i,j} \ge 1$. For a bump amplitude $\delta > 0$ and $\gamma \in \mathbb{N}$ we define

$$f_1(x) = f_o(x) + \delta\, h^{\gamma+s-1/2}\,\psi_{\gamma,h,x_o}(x)\, x^{-1}, \qquad x \in \mathbb{R}_+. \qquad\text{(B.1)}$$

The survival function $S_o$ corresponding to $f_o$ is given by $S_o(x) = \exp(-x)$ for $x \in \mathbb{R}_+$, while $F_o(x) = 1 - \exp(-x)$. The survival function $S_1$ and the cumulative distribution function $F_1$ corresponding to $f_1$ are then given by

$$S_1(x) = S_o(x) + \delta\, h^{\gamma+s-1/2}\,\psi_{\gamma-1,h,x_o}(x), \qquad F_1(x) = F_o(x) - \delta\, h^{\gamma+s-1/2}\,\psi_{\gamma-1,h,x_o}(x), \qquad x \in \mathbb{R}_+.$$

To ensure that $S_1$, respectively $F_1$, is a survival function, respectively a cumulative distribution function, it is sufficient to show that $f_1$ is a density.
Lemma B.1. For any $0 < \delta < \delta_o(\psi, \gamma, x_o) := \exp(-3x_o/2)\,(3x_o/2)^{-\gamma}\,(C_{\gamma,\infty}\, c_\gamma)^{-1}$, the function $f_1$ defined in (B.1) is a density, where $c_\gamma := \sum_{i=1}^{\gamma} c_{i,\gamma}$.
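A simplified numerical check of this construction (the operator $S^\gamma$ is omitted, and the mollifier-derivative bump, grid and parameter values are illustrative choices): adding a compactly supported perturbation with vanishing integral to $f_o(x) = \exp(-x)$ leaves the total mass at 1, and a small enough amplitude $\delta$ keeps the perturbed function non-negative.

```python
import numpy as np

def psi(u):
    """Smooth bump derivative supported on (-1, 1); integrates to zero."""
    out = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    v = u[inside]
    # derivative of the mollifier exp(-1/(1 - u^2))
    out[inside] = np.exp(-1.0 / (1.0 - v**2)) * (-2.0 * v) / (1.0 - v**2) ** 2
    return out

x_o, h, delta, s = 1.0, 0.25, 0.05, 2.0
x = np.linspace(1e-6, 20.0, 400_001)
f_o = np.exp(-x)
f_1 = f_o + delta * h ** (s - 0.5) * psi((x - x_o) / h)   # simplified analogue of (B.1)

dx = x[1] - x[0]
mass = (f_1.sum() - 0.5 * (f_1[0] + f_1[-1])) * dx         # trapezoidal rule
print(mass, f_1.min())                                     # ~1 and non-negative
```

Because the bump has vanishing integral, the perturbation never changes the total mass; only the non-negativity constrains $\delta$, exactly as in Lemma B.1.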
Further, one can show that these functions lie inside the ellipsoids $\mathcal{D}^{s,c,L}_{\mathbb{R}_+}$ for $L$ big enough. This is captured in the following lemma.

Lemma B.2. Let $s \in \mathbb{N}$ and $c > 1/2$. Then, for all $L \ge L_{s,c,\gamma,\delta,\psi,x_o} > 0$, the functions $f_o$ and $f_1$, as in (B.1), belong to $\mathcal{D}^{s,c,L}_{\mathbb{R}_+}$.
For the sake of simplicity we denote, for a function $\varphi \in \mathbb{L}^2_{\mathbb{R}_+}$, its multiplicative convolution with $g$ by $\tilde\varphi := \varphi * g$.
Lemma B.3. Let $h \le h_o(\psi, \gamma)$. Then

1. $(S_1(x_o) - S_o(x_o))^2 = (F_1(x_o) - F_o(x_o))^2 \ge \dfrac{c^2_{\gamma-1,\gamma-1}}{2}\,\delta^2\,\psi^{(\gamma-1)}(0)^2\, h^{2s+1}$,

2. $(f_1(x_o) - f_o(x_o))^2 \ge \dfrac{c^2_{\gamma,\gamma}}{2}\,\delta^2\,\psi^{(\gamma)}(0)^2\, h^{2s-1}$, and

3. $\mathrm{KL}(\tilde f_1, \tilde f_o) \le C(g, x_o, f_o)\,\|\psi\|^2\,\delta^2\, h^{2s+2\gamma}$, where $\mathrm{KL}$ is the Kullback–Leibler divergence.
Selecting $h = n^{-1/(2s+2\gamma)}$, it follows that

$$n\,\mathrm{KL}(\tilde f_1, \tilde f_o) \le C^{(2)}_{g,x_o,\psi,f_o,\delta}\, n\, h^{2s+2\gamma} = C^{(2)}_{g,x_o,\psi,f_o,\delta} < \tfrac{1}{8}$$

for all $\delta \le \delta_1(g, x_o, \psi, f_o)$. Thereby, we can use Theorem 2.5 of Tsybakov [2008], which in turn implies for any estimator $\hat f$ of $f$,

$$\sup_{f \in \mathcal{D}^{s,c,L}_{\mathbb{R}_+}} \mathbb{P}\big((\hat f(x_o) - f(x_o))^2 \ge C^{(1)}_{\psi,\delta,\gamma}\, h^{2s-1}\big) \ge c > 0,$$

and analogously for any estimators $\hat S(x_o)$ and $\hat F(x_o)$,

$$\sup_{f \in \mathcal{D}^{s,c,L}_{\mathbb{R}_+}} \mathbb{P}\big((\hat S(x_o) - S(x_o))^2 \ge C^{(1)}_{\psi,\delta,\gamma}\, h^{2s+1}\big) \ge c > 0 \quad\text{and}\quad \sup_{f \in \mathcal{D}^{s,c,L}_{\mathbb{R}_+}} \mathbb{P}\big((\hat F(x_o) - F(x_o))^2 \ge C^{(1)}_{\psi,\delta,\gamma}\, h^{2s+1}\big) \ge c > 0.$$

Note that the constant $C^{(1)}_{\psi,\delta,\gamma}$ depends only on $\psi$, $\gamma$ and $\delta$; hence it is independent of the parameters $s$, $L$ and $n$. The claim of Theorem 3.5 then follows by Markov's inequality, which completes the proof.
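The reduction to two hypotheses behind Theorem 2.5 of Tsybakov [2008] rests on the fact that no test separates $P_0$ and $P_1$ with total error smaller than $1 - \mathrm{TV}(P_0, P_1) \ge 1 - \sqrt{\mathrm{KL}(P_1, P_0)/2}$. A minimal numerical check of this chain (two Gaussian hypotheses, an arbitrary illustrative choice, not the paper's construction):

```python
import math

# Two hypotheses P0 = N(0, 1), P1 = N(mu, 1): KL(P1, P0) = mu^2 / 2, and the
# likelihood-ratio test with threshold mu/2 attains the minimal total error
# alpha + beta = 1 - TV(P0, P1) = 2 * (1 - Phi(mu / 2)).
def phi_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu = 1.0
kl = mu**2 / 2.0
min_total_error = 2.0 * (1.0 - phi_cdf(mu / 2.0))
pinsker_lower = 1.0 - math.sqrt(kl / 2.0)
print(min_total_error, pinsker_lower)   # 0.617... >= 0.5
```

Keeping the Kullback–Leibler divergence of the two hypotheses bounded, as in the display above, therefore forces any estimator to miss one of them with positive probability.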
Proofs of the lemmata
Proof of Lemma B.1. For any $h \in \mathcal{C}^\infty_c(\mathbb{R}_+)$ it holds that $S[h] \in \mathcal{C}^\infty_c(\mathbb{R}_+)$ and thus $S^j[h] \in \mathcal{C}^\infty_c(\mathbb{R}_+)$ for any $j \in \mathbb{N}$. Further, for $h \in \mathcal{C}^\infty_c(\mathbb{R}_+)$ we have $\int_{-\infty}^{\infty} h^{(1)}(x)\,dx = 0$, which implies that for any $\delta > 0$ we have $\int_0^\infty f_1(x)\,dx = 1$.

By construction (B.1) the function $\psi_{h,x_o}$ has support in $[x_o/2, 3x_o/2]$. Since $\mathrm{supp}(S[h]) \subseteq \mathrm{supp}(h)$ for all $h \in \mathcal{C}^\infty_c(\mathbb{R}_+)$, the function $\psi_{\gamma,h,x_o}$ has support in $[x_o/2, 3x_o/2]$, too. First, for $x \notin [x_o/2, 3x_o/2]$ we have $f_1(x) = \exp(-x) > 0$. Further, for $x \in [x_o/2, 3x_o/2]$,

$$f_1(x) = f_o(x) + \delta\, h^{s+\gamma-1/2}\, x^{-1}\,\psi_{\gamma,h,x_o}(x) \ge \exp(-3x_o/2) - \delta\,(3x_o/2)^{\gamma}\, C_{\gamma,\infty}\, c_\gamma,$$

since $\|\psi_{j,h,x_o}\|_\infty \le (3x_o/2)^j\, C_{j,\infty}\, c_j\, h^{-j}$ for any $s \ge 1$ and $j \in \mathbb{N}$, where $c_j := \sum_{i=1}^{j} c_{i,j}$. Now choosing $\delta \le \delta_o(\psi, \gamma, x_o) := \exp(-3x_o/2)\,(3x_o/2)^{-\gamma}\,(C_{\gamma,\infty}\, c_\gamma)^{-1}$ ensures $f_1(x) \ge 0$ for all $x \in \mathbb{R}_+$.
Proof of Lemma B.2. Our proof starts with the observation that for all $t \in \mathbb{R}$ and $c > 0$,

$$\mathcal{M}_c[f_o](t) \sim t^{c-1/2}\exp(-\pi|t|/2), \qquad |t| \ge 2,$$

by applying the Stirling formula, compare Belomestny and Goldenshluger [2020]. Thus for every $s \in \mathbb{N}$ there exists $L_{s,c}$ such that $|f_o|^2_{s,c} \le L$ for all $L \ge L_{s,c}$. Next we consider $|f_o - f_1|_{s,c}$. We have $|f_o - f_1|^2_{s,c} = \delta^2 h^{2s+2\gamma-1}\,|\omega^{-1}\psi_{\gamma,h,x_o}|^2_{s,c}$, where $|\,\cdot\,|_{s,c}$ is defined in (3.1). Now since $\mathrm{supp}(\psi_{\gamma,h,x_o}) \subset [x_o/2, 3x_o/2]$ and $\psi_{\gamma,h,x_o} \in \mathcal{C}^\infty_c(\mathbb{R}_+)$, its Mellin transform is well-defined for any $c \in \mathbb{R}$. By integration by parts we see that for any $\phi \in \mathcal{C}^\infty_c(\mathbb{R}_+)$ and $t, c \in \mathbb{R}$,

$$\mathcal{M}_c[S[\phi]](t) = (c + it)\,\mathcal{M}_c[\phi](t),$$

and thus $|\mathcal{M}_{c-1}[\psi_{\gamma+s,h,x_o}](t)|^2 = ((c-1)^2 + t^2)^s\,|\mathcal{M}_{c-1}[\psi_{\gamma,h,x_o}](t)|^2$, which yields

$$|\omega^{-1}\psi_{\gamma,h,x_o}|^2_{s,c} \le C_c\int_{-\infty}^{\infty}\big|\mathcal{M}_{c-1}[\psi_{\gamma+s,h,x_o}](t)\big|^2\,dt = C_c\int_{x_o/2}^{3x_o/2} x^{2c-3}\,\big|\psi_{\gamma+s,h,x_o}(x)\big|^2\,dx$$

by the Parseval formula, cf. (2.6), which implies $|\omega^{-1}\psi_{\gamma,h,x_o}|^2_{s,c} \le C_{c,x_o}\,\|\psi_{\gamma+s,h,x_o}\|^2$. Now applying the Jensen inequality leads to

$$\|\psi_{\gamma+s,h,x_o}\|^2 \le C_{\gamma,s}\sum_{j=1}^{\gamma+s}\int_{x_o/2}^{3x_o/2} x^{2j}\, h^{-2j}\,\psi^{(j)}\Big(\frac{x-x_o}{h}\Big)^2\,dx \le C_{\gamma,s,x_o}\, h^{-2\gamma-2s+1}\, C_{\infty,\gamma+s}.$$

Thus $|f_o - f_1|^2_{s,c} \le C_{c,s,\gamma,\delta,\psi,x_o}$ and $|f_1|^2_{s,c} \le 2(|f_o - f_1|^2_{s,c} + |f_o|^2_{s,c}) \le 2(C_{c,s,\gamma,\delta,\psi,x_o} + L_{s,c}) =: L_{s,c,\gamma,\delta,\psi,x_o,1}$.

Now let us consider the moment condition. First we see that $\int_0^\infty x^{2(c-1)} f_o(x)\,dx = \Gamma(2c-1) =: C_c'$ is finite for $c > 1/2$, while

$$\int_0^\infty x^{2(c-1)}\,|f_1 - f_o|(x)\,dx = \delta\, h^{s+\gamma-1/2}\int_{x_o/2}^{3x_o/2} x^{2c-3}\,\big|\psi_{\gamma,h,x_o}(x)\big|\,dx \le C_{s,c,\gamma,\delta,\psi,x_o}.$$

Thus we have $\mathbb{E}_{f_o}(X^{2(c-1)}),\, \mathbb{E}_{f_1}(X^{2(c-1)}) \le C_c' + C_{s,c,\gamma,\delta,\psi,x_o} =: L_{s,c,\gamma,\delta,\psi,x_o,2}$. Choosing now $L_{s,c,\gamma,\delta,\psi,x_o} = \max(L_{s,c,\gamma,\delta,\psi,x_o,1}, L_{s,c,\gamma,\delta,\psi,x_o,2})$ shows the claim.
Proof of Lemma B.3. First we see that

$$(S_o(x_o) - S_1(x_o))^2 = (F_o(x_o) - F_1(x_o))^2 = \delta^2\, h^{2s+2\gamma-1}\,\big(\psi_{\gamma-1,h,x_o}(x_o)\big)^2$$

and that

$$\big(\psi_{\gamma-1,h,x_o}(x_o)\big)^2 = \Big(\sum_{i=1}^{\gamma-1} c_{i,\gamma-1}\, x_o^i\, h^{-i}\,\psi^{(i)}(0)\Big)^2 \ge \frac{1}{2}\, c^2_{\gamma-1,\gamma-1}\, h^{-2\gamma+2}\,\psi^{(\gamma-1)}(0)^2$$

for $h$ small enough, since the summand with $i = \gamma - 1$ dominates as $h \to 0$. For $h < h_o(\gamma, \psi)$ we thus obtain

$$(S_o(x_o) - S_1(x_o))^2 \ge \frac{c^2_{\gamma-1,\gamma-1}}{2}\,\delta^2\,\psi^{(\gamma-1)}(0)^2\, h^{2s+1}.$$

In analogy, we can show that

$$(f_o(x_o) - f_1(x_o))^2 = \delta^2\, h^{2\gamma+2s-1}\,\big(\psi_{\gamma,h,x_o}(x_o)\big)^2 \ge c_{\psi,\gamma}\, h^{2s-1}.$$
For the second part, using $\mathrm{KL}(\tilde f_1, \tilde f_o) \le \chi^2(\tilde f_1, \tilde f_o) := \int_{\mathbb{R}_+}(\tilde f_1(x) - \tilde f_o(x))^2/\tilde f_o(x)\,dx$, it is sufficient to bound the $\chi^2$-divergence. We notice that $\tilde f_1 - \tilde f_o$ has support in $[0, 3x_o/2]$, since $f_1 - f_o$ has support in $[x_o/2, 3x_o/2]$ and $g$ has support in $[0, 1]$. In fact, for $y > 3x_o/2$,

$$\tilde f_1(y) - \tilde f_o(y) = \int_y^\infty (f_1 - f_o)(x)\, x^{-1}\, g(y/x)\,dx = 0.$$

Since $f_o$ is monotonically decreasing, we can deduce that $\tilde f_o$ is monotonically decreasing, too, since for $x_2 \ge x_1 \in \mathbb{R}_+$,

$$\tilde f_o(x_2) = \int_0^1 g(x)\, x^{-1}\, f_o(x_2/x)\,dx \le \int_0^1 g(x)\, x^{-1}\, f_o(x_1/x)\,dx = \tilde f_o(x_1).$$

Moreover, $\tilde f_o$ is strictly positive since the integrand is strictly positive. We conclude therefore that there exists a constant $c_{f_o,x_o,g} > 0$ such that $\tilde f_o(x) \ge \tilde f_o(3x_o/2) \ge c_{f_o,x_o,g} > 0$ for all $x \in (0, 3x_o/2]$. Thus

$$\chi^2(\tilde f_1, \tilde f_o) \le \tilde f_o(3x_o/2)^{-1}\,\|\tilde f_1 - \tilde f_o\|^2 = \tilde f_o(3x_o/2)^{-1}\,\delta^2\, h^{2s+2\gamma-1}\,\big\|\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}\big\|^2.$$
Let us now consider $\|\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}\|^2$. In a first step we see, by application of the Plancherel equality, cf. (2.6), that

$$\big\|\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}\big\|^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\big|\mathcal{M}_{1/2}[\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}](t)\big|^2\,dt.$$

Now for $t \in \mathbb{R}$, the multiplication theorem for Mellin transforms yields $\mathcal{M}_{1/2}[\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}](t) = \mathcal{M}_{1/2}[g](t)\cdot\mathcal{M}_{1/2}[\omega^{-1}S^\gamma[\psi_{h,x_o}]](t)$. Again we have $\mathcal{M}_{1/2}[\omega^{-1}S^\gamma[\psi_{h,x_o}]](t) = (-1/2 + it)^\gamma\,\mathcal{M}_{-1/2}[\psi_{h,x_o}](t)$. Together with assumption [G1'] we get

$$\big\|\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}\big\|^2 \le \frac{C_1(g)}{2\pi}\int_{-\infty}^{\infty}\big|\mathcal{M}_{-1/2}[\psi_{h,x_o}](t)\big|^2\,dt = C_1(g)\,\big\|\omega^{-1}\psi_{h,x_o}\big\|^2 \le C(g, x_o)\, h\,\|\psi\|^2.$$

We thus have $\mathrm{KL}(\tilde f_1, \tilde f_o) \le \dfrac{C(g, x_o)\,\|\psi\|^2}{\tilde f_o(3x_o/2)}\,\delta^2\, h^{2s+2\gamma}$.
References
K. E. Andersen and M. B. Hansen. Multiplicative censoring: density estimation by a series expansion approach. Journal of Statistical Planning and Inference, 98(1-2):137–155, 2001.
M. Asgharian and D. B. Wolfson. Asymptotic behavior of the unconditional npmle of the length-biased survivor function from right censored prevalent cohort data. The Annals of Statistics, 33(5):2109–2131, 2005.
D. Belomestny and A. Goldenshluger. Nonparametric density estimation from observations with multiplicative measurement errors. In Annales de l’Institut Henri Poincaré, Probabil- ités et Statistiques, volume 56, pages 36–67. Institut Henri Poincaré, 2020.
D. Belomestny, F. Comte, and V. Genon-Catalot. Nonparametric Laguerre estimation in the multiplicative censoring model. Electronic Journal of Statistics, 10(2):3114–3152, 2016.
L. Birgé and P. Massart. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli, 4(3):329–375, 1998. ISSN 1350-7265. doi: 10.2307/3318720. URL https://doi.org/10.2307/3318720.
S. Brenner Miguel. Anisotropic spectral cut-off estimation under multiplicative measurement errors. Preprint arXiv:2107.02120, 2021.
S. Brenner Miguel and Phandoidaen. Multiplicative deconvolution in survival analysis under dependency. Preprint arXiv:2107.05267, 2021.
S. Brenner Miguel, F. Comte, and J. Johannes. Spectral cut-off regularisation for density estimation under multiplicative measurement errors. Electronic Journal of Statistics, 15(1): 3551–3573, 2021.
E. Brunel, F. Comte, and V. Genon-Catalot. Nonparametric density and survival function estimation in the multiplicative censoring model. Test, 25(3):570–590, 2016.
C. Butucea and F. Comte. Adaptive estimation of linear functionals in the convolution model and applications. Bernoulli, 15(1):69–98, 2009.
C. Butucea and A. B. Tsybakov. Sharp optimality in density deconvolution with dominating bias. i. Theory of Probability & Its Applications, 52(1):24–39, 2008.
F. Comte. Nonparametric estimation. Master and Research. Spartacus-Idh, Paris, 2017.
F. Comte and C. Dion. Nonparametric estimation in a multiplicative censoring model with symmetric noise. Journal of Nonparametric Statistics, 28(4):768–801, 2016.
H. W. Engl, M. Hanke-Bourgeois, and A. Neubauer. Regularization of inverse problems. Kluwer Acad. Publ., 2000.
J. Fan. On the optimal rates of convergence for nonparametric deconvolution problems. The Annals of Statistics, pages 1257–1272, 1991.
A. Goldenshluger and O. Lepski. Bandwidth selection in kernel density estimation: Ora- cle inequalities and adaptive minimax optimality. The Annals of Statistics, 39:1608–1632, 2011.
G. Mabon. Adaptive deconvolution of linear functionals on the nonnegative real line. Journal of Statistical Planning and Inference, 178:1–23, 2016.
A. Meister. Density deconvolution. In Deconvolution Problems in Nonparametric Statistics, pages 5–105. Springer, 2009.
R. B. Paris and D. Kaminski. Asymptotics and Mellin–Barnes Integrals, volume 85. Cambridge University Press, 2001.
M. Pensky. Minimax theory of estimation of linear functionals of the deconvolution density with or without sparsity. The Annals of Statistics, 45(4):1516–1541, 2017.
A. B. Tsybakov. Introduction to nonparametric estimation. Springer Publishing Company, Incorporated, 2008.
B. Van Es, C. A. Klaassen, and K. Oudshoorn. Survival analysis under cross-sectional sam- pling: length bias and multiplicative censoring. Journal of Statistical Planning and Infer- ence, 91(2):295–312, 2000.
Y. Vardi. Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika, 76(4):751–761, 1989.
Y. Vardi and C.-H. Zhang. Large sample study of empirical distributions in a random- multiplicative censoring model. The Annals of Statistics, pages 1022–1039, 1992.