

arXiv:cs/0601080v1 [cs.IT] 18 Jan 2006

On Measure-Theoretic Definitions of Generalized Information Measures and Maximum Entropy Prescriptions

Ambedkar Dukkipati, M Narasimha Murty and Shalabh Bhatnagar

Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India.

E-mail: [email protected], [email protected], [email protected]

Abstract. Though the Shannon entropy of a probability measure $P$, defined as $-\int_X \frac{dP}{d\mu}\ln\frac{dP}{d\mu}\,d\mu$ on a measure space $(X,\mathfrak{M},\mu)$, does not qualify itself as an information measure (it is not a natural extension of the discrete case), maximum entropy (ME) prescriptions in the measure-theoretic case are consistent with those of the discrete case. In this paper, we study the measure-theoretic definitions of generalized information measures and discuss the ME prescriptions. We present two results in this regard: (i) we prove that, as in the case of classical relative-entropy, the measure-theoretic definitions of the generalized relative-entropies, Rényi and Tsallis, are natural extensions of their respective discrete cases; (ii) we show that the ME prescriptions of measure-theoretic Tsallis entropy are consistent with the discrete case.

    PACS numbers:

    Corresponding author


    1. Introduction

The Shannon measure of information was developed essentially for the case when the random variable takes a finite number of values. However, in the literature one often encounters an extension of Shannon entropy in the discrete case to the case of a one-dimensional random variable with density function $p$ in the form (e.g. [1, 2])
\[
S(p) = -\int_{-\infty}^{+\infty} p(x)\ln p(x)\,dx .
\]

This entropy in the continuous case, as a pure mathematical formula (assuming convergence of the integral and absolute continuity of the density $p$ with respect to the Lebesgue measure), resembles Shannon entropy in the discrete case, but it cannot be used as a measure of information. First, it is not a natural extension of Shannon entropy in the discrete case, since it is not the limit of the sequence of finite discrete entropies corresponding to pmfs which approximate the pdf $p$. Second, it need not be positive. In spite of these shortcomings, one can still use the continuous entropy functional in conjunction with the principle of maximum entropy, where one wants to find a probability density function that has greater uncertainty than any other distribution satisfying a given set of constraints. Thus, in this use of the continuous measure one is interested in it as a measure of relative uncertainty, and not of absolute uncertainty. This is where one can relate maximization of Shannon entropy to the minimization of Kullback-Leibler relative-entropy (see [3, pp. 55]).

Indeed, during the early stages of the development of information theory, the important paper by Gelfand, Kolmogorov and Yaglom [4] called attention to the case of defining the entropy functional on an arbitrary measure space $(X,\mathfrak{M},\mu)$. In this respect, the Shannon entropy of a probability density function $p : X \to \mathbb{R}^{+}$ can be written as
\[
S(p) = -\int_{X} p(x)\ln p(x)\,d\mu .
\]

One can see from the above definition that the concept of the entropy of a pdf is a misnomer: there is always another measure $\mu$ in the background. In the discrete case considered by Shannon, $\mu$ is the cardinality measure [1, pp. 19]; in the continuous case considered by both Shannon and Wiener, $\mu$ is the Lebesgue measure, cf. [1, pp. 54] and [5, pp. 61, 62]. All entropies are defined with respect to some measure $\mu$, as Shannon and Wiener both emphasized in [1, pp. 57, 58] and [5, pp. 61, 62] respectively.

This case was studied independently by Kallianpur [6] and Pinsker [7], and perhaps others, guided by the earlier work of Kullback [8], where one would define entropy in terms of Kullback-Leibler relative-entropy. Unlike Shannon entropy, the measure-theoretic definition of KL-entropy is a natural extension of the definition in the discrete case.

In this paper we present the measure-theoretic definitions of generalized information measures and show that, as in the case of KL-entropy, the measure-theoretic definitions of the generalized relative-entropies, Rényi and Tsallis, are natural extensions of their respective discrete cases. We also discuss the ME prescriptions for generalized entropies and show that the ME prescriptions of measure-theoretic Tsallis entropy are consistent with the discrete case, as is true for measure-theoretic Shannon entropy.

(The counting or cardinality measure $\mu$ on a measurable space $(X,\mathfrak{M})$, when $X$ is a finite set and $\mathfrak{M} = 2^{X}$, is defined as $\mu(E) = \#E$, $\forall E \in \mathfrak{M}$.)

Rigorous studies of the Shannon and KL entropy functionals in measure spaces can be found in the papers by Ochs [9] and by Masani [10, 11]. Basic measure-theoretic aspects of classical information measures can be found in [7, 12, 13].

We review the measure-theoretic formalisms for classical information measures in § 2 and extend these definitions to generalized information measures in § 3. In § 4 we present the ME prescription for Shannon entropy, followed by the prescriptions for Tsallis entropy in § 5. We revisit the measure-theoretic definitions of generalized entropic functionals in § 6 and present some results.

2. Measure-Theoretic Definitions of Classical Information Measures

2.1. Discrete to Continuous

Let $p : [a, b] \to \mathbb{R}^{+}$ be a probability density function, where $[a, b] \subset \mathbb{R}$. That is, $p$ satisfies
\[
p(x) \geq 0, \quad \forall x \in [a, b], \qquad \text{and} \qquad \int_{a}^{b} p(x)\, dx = 1 .
\]

In trying to define entropy in the continuous case, the expression of Shannon entropy was automatically extended by replacing the sum in the discrete case by the corresponding integral. We obtain, in this way, Boltzmann's H-function (also known as differential entropy in information theory),
\[
S(p) = -\int_{a}^{b} p(x)\ln p(x)\,dx . \tag{1}
\]
But the continuous entropy given by (1) is not a natural extension of the definition in the discrete case, in the sense that it is not the limit of the finite discrete entropies corresponding to a sequence of finer partitions of the interval $[a, b]$ whose norms tend to zero. We can show this by a counterexample. Consider a uniform probability distribution on the interval $[a, b]$, having the probability density function
\[
p(x) = \frac{1}{b-a} , \quad x \in [a, b] .
\]
The continuous entropy (1) in this case will be
\[
S(p) = \ln(b-a) .
\]
On the other hand, let us consider a finite partition of the interval $[a, b]$ composed of $n$ equal subintervals, and let us attach to this partition the finite discrete uniform probability distribution, whose corresponding entropy will be, of course,
\[
S_{n}(p) = \ln n .
\]

  • 8/8/2019 0601080v1

    4/20

    4

Obviously, if $n$ tends to infinity, the discrete entropy $S_n(p)$ tends to infinity too, and not to $\ln(b-a)$; therefore $S(p)$ is not the limit of $S_n(p)$ as $n$ tends to infinity. Further, one can observe that $\ln(b-a)$ is negative when $b-a < 1$.
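The failure of convergence is easy to see numerically. The sketch below is an added illustration (the endpoints $a$ and $b$ are arbitrary choices, not taken from the text): it discretizes the uniform density on $[a, b]$ into $n$ equal cells and compares the discrete entropy $S_n(p) = \ln n$ with the continuous entropy $\ln(b-a)$.

```python
import numpy as np

# Discretize the uniform density on [a, b] into n equal subintervals and
# compare the discrete Shannon entropy S_n(p) = ln(n) with the continuous
# entropy S(p) = ln(b - a).  The endpoints are arbitrary illustrative values.
a, b = 0.0, 0.5                              # note b - a < 1, so ln(b - a) < 0

for n in (10, 100, 1000, 10000):
    pmf = np.full(n, 1.0 / n)                # uniform pmf on the n cells
    S_n = -np.sum(pmf * np.log(pmf))         # discrete entropy, equals ln n
    print(f"n = {n:6d}:  S_n = {S_n:8.4f}")

print(f"ln(b - a) = {np.log(b - a):.4f}")    # the value S_n fails to approach
```

The discrete entropies grow without bound, while the continuous entropy is a fixed (here negative) number.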

Thus, strictly speaking, the continuous entropy (1) cannot represent a measure of uncertainty, since uncertainty should in general be positive. It is only for the discrete entropy that one can prove the properties which qualify it as a good measure of the information (or uncertainty) supplied by a random experiment; since the continuous entropy is not the limit of the discrete entropies, these properties cannot be extended to it.

Also, in physical applications, the coordinate $x$ in (1) represents an abscissa, a distance from a fixed reference point, and so has the dimensions of length. Since the density function $p(x)$ specifies the probability of an event $[c, d) \subset [a, b]$ as $\int_{c}^{d} p(x)\,dx$, and probabilities are dimensionless, one has to assign $p(x)$ the dimensions $(\text{length})^{-1}$. Now, since for $0 \leq z < 1$ one has the series expansion
\[
-\ln(1-z) = z + \tfrac{1}{2}z^{2} + \tfrac{1}{3}z^{3} + \ldots , \tag{2}
\]
it is necessary that the argument of the logarithm function in (1) be dimensionless. Hence the formula (1) is seen to be dimensionally incorrect, since the argument of the logarithm on its right hand side has the dimensions of a probability density [14]. Although Shannon [15] used the formula (1), he did note its lack of invariance with respect to changes in the coordinate system.

In the context of the maximum entropy principle, Jaynes [16] addressed this problem and suggested the formula
\[
S(p) = -\int_{a}^{b} p(x) \ln \frac{p(x)}{m(x)}\, dx , \tag{3}
\]
in place of (1), where $m(x)$ is a prior function. Note that when $m(x)$ is a probability density function, (3) is, up to the sign, nothing but the relative-entropy of $p$ with respect to $m$. However, if we choose $m(x) = c$, a constant (e.g. [17]), we get
\[
S(p) = \widetilde{S}(p) + \ln c ,
\]
where $\widetilde{S}(p)$ refers to the continuous entropy (1) (indeed, $-\int_{a}^{b} p \ln\frac{p}{c}\,dx = -\int_{a}^{b} p\ln p\,dx + \ln c$). Thus, maximization of $S(p)$ is equivalent to maximization of $\widetilde{S}(p)$. Further discussion on the estimation of probability density functions by the ME principle in the continuous case can be found in [18, 17, 19].

Prior to that, Kullback [8] too had suggested that in the measure-theoretic definition of entropy, instead of examining the entropy corresponding to only one given measure, one should compare the entropy across a whole class of measures.

2.2. Classical information measures

Let $(X,\mathfrak{M},\mu)$ be a measure space; $\mu$ need not be a probability measure unless otherwise specified. The symbols $P$, $R$ will denote probability measures on the measurable space $(X,\mathfrak{M})$, and $p$, $r$ will denote $\mathfrak{M}$-measurable functions on $X$. An $\mathfrak{M}$-measurable function $p : X \to \mathbb{R}^{+}$ is said to be a probability density function (pdf) if $\int_{X} p\, d\mu = 1$.


In this general setting, the Shannon entropy $S(p)$ of a pdf $p$ is defined as follows [20].

Definition 2.1. Let $(X,\mathfrak{M},\mu)$ be a measure space and let the $\mathfrak{M}$-measurable function $p : X \to \mathbb{R}^{+}$ be a pdf. The Shannon entropy of $p$ is defined as
\[
S(p) = -\int_{X} p \ln p \, d\mu , \tag{4}
\]
provided the integral on the right exists.

The entropy functional $S(p)$ defined in (4) can be referred to as the entropy of the probability measure $P$, in the sense that the measure $P$ is induced by $p$, i.e.,
\[
P(E) = \int_{E} p(x)\, d\mu(x) , \quad \forall E \in \mathfrak{M} . \tag{5}
\]
This reference is consistent because the probability measure $P$ can be identified $\mu$-a.e. by the pdf $p$.

Further, the definition of the probability measure $P$ in (5) allows us to write the entropy functional (4) as
\[
S(p) = -\int_{X} \frac{dP}{d\mu} \ln \frac{dP}{d\mu}\, d\mu , \tag{6}
\]
since (5) implies $P \ll \mu$, and the pdf $p$ is the Radon-Nikodym derivative of $P$ with respect to $\mu$.

Now we proceed to the definition of the Kullback-Leibler relative-entropy, or KL-entropy, for probability measures.

Definition 2.2. Let $(X,\mathfrak{M})$ be a measurable space and let $P$ and $R$ be two probability measures on $(X,\mathfrak{M})$. The Kullback-Leibler relative-entropy (KL-entropy) of $P$ relative to $R$ is defined as
\[
I(P\|R) =
\begin{cases}
\displaystyle \int_{X} \ln \frac{dP}{dR}\, dP & \text{if } P \ll R ,\\[2mm]
+\infty & \text{otherwise.}
\end{cases} \tag{7}
\]
The divergence inequality $I(P\|R) \geq 0$, with $I(P\|R) = 0$ if and only if $P = R$, can be shown in this case too. The KL-entropy (7) can also be written as
\[
I(P\|R) = \int_{X} \frac{dP}{dR} \ln \frac{dP}{dR}\, dR . \tag{8}
\]
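As a quick sanity check (an added illustration, not part of the paper's development), on a finite set both expressions (7) and (8) reduce to the familiar discrete KL-divergence; the two measures below are arbitrary examples.

```python
import numpy as np

# For two probability measures P and R on a three-point set, dP/dR is the
# ratio of point masses, and the two expressions (7) and (8) for I(P||R) agree.
P = np.array([0.5, 0.3, 0.2])
R = np.array([0.4, 0.4, 0.2])

dPdR = P / R                                   # Radon-Nikodym derivative dP/dR
I_7 = np.sum(np.log(dPdR) * P)                 # (7): integral of ln(dP/dR) dP
I_8 = np.sum(dPdR * np.log(dPdR) * R)          # (8): integral of (dP/dR) ln(dP/dR) dR
print(I_7, I_8)                                # identical and non-negative
```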

Let $\mu$ be a $\sigma$-finite measure on $(X,\mathfrak{M})$ such that $P \ll \mu$ and $R \ll \mu$. Since $\mu$ is $\sigma$-finite, by the Radon-Nikodym theorem there exist non-negative $\mathfrak{M}$-measurable functions $p : X \to \mathbb{R}^{+}$ and $r : X \to \mathbb{R}^{+}$, unique $\mu$-a.e., such that
\[
P(E) = \int_{E} p\, d\mu , \quad \forall E \in \mathfrak{M} , \tag{9}
\]

and
\[
R(E) = \int_{E} r\, d\mu , \quad \forall E \in \mathfrak{M} . \tag{10}
\]
(Say $p$ and $r$ are two pdfs and $P$ and $R$ are the corresponding induced measures on the measurable space $(X,\mathfrak{M})$ such that $P$ and $R$ are identical, i.e., $\int_{E} p\, d\mu = \int_{E} r\, d\mu$, $\forall E \in \mathfrak{M}$. Then $p \overset{\text{a.e.}}{=} r$ and hence $\int_{X} p \ln p\, d\mu = \int_{X} r \ln r\, d\mu$. Further, if a nonnegative measurable function $f$ induces a measure $\nu$ on the measurable space $(X,\mathfrak{M})$ with respect to a measure $\mu$, defined as $\nu(E) = \int_{E} f\, d\mu$, $\forall E \in \mathfrak{M}$, then $\nu \ll \mu$; the converse is given by the Radon-Nikodym theorem [21, pp. 36, Theorem 1.40(b)].)

The pdfs $p$ and $r$ in (9) and (10) (they are indeed pdfs) are the Radon-Nikodym derivatives of the probability measures $P$ and $R$ with respect to $\mu$, respectively, i.e., $p = \frac{dP}{d\mu}$ and $r = \frac{dR}{d\mu}$. Now one can define the relative-entropy of a pdf $p$ with respect to a pdf $r$ as follows.+

Definition 2.3. Let $(X,\mathfrak{M},\mu)$ be a measure space and let the $\mathfrak{M}$-measurable functions $p, r : X \to \mathbb{R}^{+}$ be two pdfs. The KL-entropy of $p$ relative to $r$ is defined as
\[
I(p\|r) = \int_{X} p(x) \ln \frac{p(x)}{r(x)}\, d\mu(x) , \tag{11}
\]
provided the integral on the right exists.

As we have mentioned earlier, the KL-entropy (11) exists if the two densities are absolutely continuous with respect to one another. On the real line the same definition can be written as
\[
I(p\|r) = \int_{\mathbb{R}} p(x) \ln \frac{p(x)}{r(x)}\, dx ,
\]
which exists if the densities $p(x)$ and $r(x)$ share the same support. Here, and in the sequel, we use the conventions
\[
\ln 0 = -\infty, \qquad \ln \frac{a}{0} = +\infty \ \text{ for any } a \in \mathbb{R},\ a > 0, \qquad 0 \cdot (\pm\infty) = 0 . \tag{12}
\]

Now we turn to the definition of the entropy functional on a measure space. The entropy functional in (6) is defined for a probability measure that is induced by a pdf. By the Radon-Nikodym theorem, one can define Shannon entropy for any arbitrary $\mu$-continuous probability measure as follows.

Definition 2.4. Let $(X,\mathfrak{M},\mu)$ be a $\sigma$-finite measure space. The entropy of any $\mu$-continuous probability measure $P$ ($P \ll \mu$) is defined as
\[
S(P) = -\int_{X} \ln \frac{dP}{d\mu}\, dP . \tag{13}
\]

Properties of the entropy of a probability measure in Definition 2.4 are studied in detail by Ochs [9] under the name generalized Boltzmann-Gibbs-Shannon entropy. In the literature one can find notation of the form $S(P|\mu)$ for the entropy functional in (13), viz., the entropy of a probability measure, to stress the role of the measure $\mu$ (e.g. [9, 20]). Since all the information measures we define are with respect to the measure $\mu$ on $(X,\mathfrak{M})$, we omit $\mu$ in the entropy functional notation.

By assuming that $\mu$ is a probability measure in Definition 2.4, one can relate Shannon entropy with Kullback-Leibler entropy as
\[
S(P) = -I(P\|\mu) . \tag{14}
\]
+ This follows from the chain rule for the Radon-Nikodym derivative:
\[
\frac{dP}{dR} \overset{\text{a.e.}}{=} \frac{dP}{d\mu} \left( \frac{dR}{d\mu} \right)^{-1} .
\]


Note that when $\mu$ is not a probability measure, the divergence inequality $I(P\|\mu) \geq 0$ need not be satisfied.

A note on the $\sigma$-finiteness of the measure $\mu$: in the definition of the entropy functional we assumed that $\mu$ is a $\sigma$-finite measure. This condition was used by Ochs [9], Csiszar [22] and Rosenblatt-Roth [23] to tailor the measure-theoretic definitions. For all practical purposes and for most applications this assumption is satisfied. (See [9] for a discussion of the physical interpretation of a measurable space $(X,\mathfrak{M})$ with a $\sigma$-finite measure $\mu$ for entropy measures of the form (13), and of the relaxation of the $\sigma$-finiteness condition.) By relaxing this condition, more universal definitions of entropy functionals are studied by Masani [10, 11].

2.3. Interpretation of Discrete and Continuous Entropies in terms of KL-entropy

First, let us consider the discrete case of $(X,\mathfrak{M},\mu)$, where $X = \{x_1, \ldots, x_n\}$, $\mathfrak{M} = 2^{X}$ and $\mu$ is a probability measure. Let $P$ be any probability measure on $(X,\mathfrak{M})$. Then $\mu$ and $P$ can be specified as follows:
\[
\mu : \ \mu_k = \mu(\{x_k\}) \geq 0, \ k = 1, \ldots, n , \qquad \sum_{k=1}^{n} \mu_k = 1 ,
\]
and
\[
P : \ P_k = P(\{x_k\}) \geq 0, \ k = 1, \ldots, n , \qquad \sum_{k=1}^{n} P_k = 1 .
\]

The probability measure $P$ is absolutely continuous with respect to the probability measure $\mu$ if $\mu_k = 0$ implies $P_k = 0$ for any $k = 1, \ldots, n$. The corresponding Radon-Nikodym derivative of $P$ with respect to $\mu$ is given by
\[
\frac{dP}{d\mu}(x_k) = \frac{P_k}{\mu_k} , \quad k = 1, \ldots, n .
\]
The measure-theoretic entropy $S(P)$ in (13), in this case, can be written as
\[
S(P) = -\sum_{k=1}^{n} P_k \ln \frac{P_k}{\mu_k} = \sum_{k=1}^{n} P_k \ln \mu_k - \sum_{k=1}^{n} P_k \ln P_k .
\]

If we take the referential probability measure $\mu$ to be the uniform probability distribution on the set $X$, i.e. $\mu_k = \frac{1}{n}$, we obtain
\[
S(P) = S_n(P) - \ln n , \tag{15}
\]
where $S_n(P)$ denotes the Shannon entropy of the pmf $P = (P_1, \ldots, P_n)$ and $S(P)$ denotes the measure-theoretic entropy in the discrete case.
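A small numerical check of (15), added here as an illustration with an arbitrary example pmf (not taken from the paper):

```python
import numpy as np

# Discrete case with a uniform reference probability measure mu_k = 1/n:
# the measure-theoretic entropy S(P) = -sum_k P_k ln(P_k / mu_k) = -I(P||mu)
# should equal S_n(P) - ln n, i.e. relation (15).
P = np.array([0.1, 0.2, 0.3, 0.4])             # arbitrary example pmf
n = len(P)
mu = np.full(n, 1.0 / n)

S_measure = -np.sum(P * np.log(P / mu))        # measure-theoretic entropy (13)
S_n = -np.sum(P * np.log(P))                   # Shannon entropy of the pmf
print(S_measure, S_n - np.log(n))              # equal, as claimed in (15)
```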

Now let us consider the continuous case of $(X,\mathfrak{M},\mu)$, where $X = [a, b] \subset \mathbb{R}$, $\mathfrak{M}$ is the set of Lebesgue measurable subsets of $[a, b]$, and $\mu$ is a probability measure that is absolutely continuous with respect to the Lebesgue measure. In this case $\mu$ and $P$ can be specified as follows:
\[
\mu : \ \mu(x) \geq 0, \ \forall x \in [a,b], \qquad \mu(E) = \int_{E} \mu(x)\, dx, \ \forall E \in \mathfrak{M}, \qquad \int_{a}^{b} \mu(x)\, dx = 1 ,
\]
and
\[
P : \ P(x) \geq 0, \ \forall x \in [a,b], \qquad P(E) = \int_{E} P(x)\, dx, \ \forall E \in \mathfrak{M}, \qquad \int_{a}^{b} P(x)\, dx = 1 .
\]

Note the abuse of notation in the above specification of the probability measures $\mu$ and $P$, where we have used the same symbols for both the measures and their pdfs.

The probability measure $P$ is absolutely continuous with respect to the probability measure $\mu$ if $\mu(x) = 0$ on a set of positive Lebesgue measure implies that $P(x) = 0$ on the same set. The Radon-Nikodym derivative of the probability measure $P$ with respect to the probability measure $\mu$ is then
\[
\frac{dP}{d\mu}(x) = \frac{P(x)}{\mu(x)} .
\]

The measure-theoretic entropy $S(P)$ in this case can be written as
\[
S(P) = -\int_{a}^{b} P(x) \ln \frac{P(x)}{\mu(x)}\, dx .
\]
If we take the referential probability measure $\mu$ to be the uniform distribution, i.e. $\mu(x) = \frac{1}{b-a}$, $\forall x \in [a,b]$, then we obtain
\[
S(P) = S_{[a,b]}(P) - \ln(b-a) ,
\]
where $S_{[a,b]}(P)$ denotes the Shannon entropy (1) of the pdf $P(x)$, $x \in [a,b]$, and $S(P)$ denotes the measure-theoretic entropy in the continuous case.

Hence, one can conclude that the measure-theoretic entropy $S(P)$, defined for a probability measure $P$ on the measure space $(X,\mathfrak{M},\mu)$, is equal to the Shannon entropy in both the discrete and the continuous case up to an additive constant, when the reference measure $\mu$ is chosen as a uniform probability distribution. On the other hand, one can see that the measure-theoretic KL-entropy in the discrete and continuous cases reduces exactly to its discrete and continuous definitions.

Further, from (14) and (15), we can write Shannon entropy in terms of Kullback-Leibler relative-entropy as
\[
S_n(P) = \ln n - I(P\|\mu) . \tag{16}
\]
Thus, Shannon entropy appears as being (up to an additive constant) the variation of information when we pass from the initial uniform probability distribution to the new probability distribution given by $P_k \geq 0$, $\sum_{k=1}^{n} P_k = 1$, as any such probability distribution is obviously absolutely continuous with respect to the uniform discrete probability distribution. Similarly, by (14) and the continuous-case relation of § 2.3 between Shannon entropy and relative-entropy, we can write the Boltzmann H-function in terms of relative-entropy as
\[
S_{[a,b]}(p) = \ln(b-a) - I(P\|\mu) . \tag{17}
\]

Therefore, the continuous entropy, or Boltzmann H-function, $S(p)$ may be interpreted as being (up to an additive constant) the variation of information when we pass from the initial uniform probability distribution on the interval $[a, b]$ to the new probability measure defined by the probability density function $p(x)$ (any such probability measure is absolutely continuous with respect to the uniform probability distribution on the interval $[a, b]$).
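The continuous-case relation (17) can likewise be checked numerically. The sketch below is an added illustration: it uses an arbitrary linear density on $[a, b]$ and a simple midpoint rule to approximate the integrals.

```python
import numpy as np

# Check of (17): for a pdf p on [a, b] and mu the uniform probability measure
# on [a, b], S_[a,b](p) = ln(b - a) - I(p||mu).  The linear density is an
# arbitrary illustrative choice; integrals use a midpoint rule on a fine grid.
a, b, N = 0.0, 2.0, 100000
dx = (b - a) / N
x = a + (np.arange(N) + 0.5) * dx             # midpoint grid, so p(x) > 0
p = 2.0 * (x - a) / (b - a) ** 2              # linear pdf, integrates to 1
u = 1.0 / (b - a)                             # uniform reference density

S_ab = -np.sum(p * np.log(p)) * dx            # Boltzmann H-function (1)
I_p_mu = np.sum(p * np.log(p / u)) * dx       # KL-entropy I(p||mu), cf. (11)
print(S_ab, np.log(b - a) - I_p_mu)           # agree up to discretization error
```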

Thus, KL-entropy equips one with a unitary interpretation of both the discrete entropy and the continuous entropy: one can utilize Shannon entropy in the continuous case, as well as Shannon entropy in the discrete case, both being interpreted as the variation of information when we pass from the initial uniform distribution to the corresponding probability measure.

Also, since the measure-theoretic entropy is equal to the discrete and continuous entropies up to an additive constant, the ME prescriptions of measure-theoretic Shannon entropy are consistent with both the discrete case and the continuous case.

3. Measure-Theoretic Definitions of Generalized Information Measures

We begin with a brief note on the notation and assumptions used. We define all the information measures on the measurable space $(X,\mathfrak{M})$, and the default reference measure is $\mu$ unless otherwise stated. To avoid clumsy formulations, we will not distinguish between functions differing on a $\mu$-null set only; nevertheless, we can work with equations between $\mathfrak{M}$-measurable functions on $X$ if they are stated as valid only $\mu$-almost everywhere ($\mu$-a.e. or a.e.). Further, we assume that all the quantities of interest exist, and we assume implicitly the $\sigma$-finiteness of $\mu$ and the $\mu$-continuity of the probability measures whenever required. Since these assumptions occur repeatedly in the various definitions and formulations, they will not be mentioned in the sequel. With these assumptions we do not distinguish between an information measure of a pdf $p$ and that of the corresponding probability measure $P$; hence, although we state the definitions of information measures for pdfs, we use the corresponding definitions for probability measures as well, whenever it is convenient or required, with the understanding that $P(E) = \int_{E} p\, d\mu$, the converse being due to the Radon-Nikodym theorem, where $p = \frac{dP}{d\mu}$. In both cases we have $P \ll \mu$.

First we consider the Rényi generalizations. The measure-theoretic definition of Rényi entropy can be given as follows.

Definition 3.1. The Rényi entropy of a pdf $p : X \to \mathbb{R}^{+}$ on a measure space $(X,\mathfrak{M},\mu)$ is defined as
\[
S_{\alpha}(p) = \frac{1}{1-\alpha} \ln \int_{X} p(x)^{\alpha}\, d\mu(x) , \tag{18}
\]
provided the integral on the right exists and $\alpha \in \mathbb{R}$, $\alpha > 0$, $\alpha \neq 1$.

The same can be defined for any $\mu$-continuous probability measure $P$ as
\[
S_{\alpha}(P) = \frac{1}{1-\alpha} \ln \int_{X} \left( \frac{dP}{d\mu} \right)^{\alpha-1} dP . \tag{19}
\]

On the other hand, the Rényi relative-entropy can be defined as follows.

Definition 3.2. Let $p, r : X \to \mathbb{R}^{+}$ be two pdfs on a measure space $(X,\mathfrak{M},\mu)$. The Rényi relative-entropy of $p$ relative to $r$ is defined as
\[
I_{\alpha}(p\|r) = \frac{1}{\alpha-1} \ln \int_{X} \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) , \tag{20}
\]
provided the integral on the right exists and $\alpha \in \mathbb{R}$, $\alpha > 0$, $\alpha \neq 1$.

The same can be written in terms of probability measures as
\[
I_{\alpha}(P\|R) = \frac{1}{\alpha-1} \ln \int_{X} \left( \frac{dP}{dR} \right)^{\alpha-1} dP
= \frac{1}{\alpha-1} \ln \int_{X} \left( \frac{dP}{dR} \right)^{\alpha} dR , \tag{21}
\]
whenever $P \ll R$; $I_{\alpha}(P\|R) = +\infty$ otherwise. Further, if $\mu$ in (19) is a probability measure, then
\[
S_{\alpha}(P) = -I_{\alpha}(P\|\mu) . \tag{22}
\]
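On a finite set with a uniform reference probability measure, (19), (21) and (22) reduce to elementary sums; the following check is an added illustration with arbitrary choices of $P$ and $\alpha$.

```python
import numpy as np

# Renyi entropy (19) of a pmf P relative to a uniform reference probability
# measure mu, the Renyi relative-entropy (21) I_alpha(P||mu), and relation (22).
alpha = 0.5                                    # arbitrary order, alpha != 1
P = np.array([0.1, 0.2, 0.3, 0.4])             # arbitrary example pmf
mu = np.full(len(P), 1.0 / len(P))

moment = np.sum(P**alpha * mu**(1.0 - alpha))  # integral of (dP/dmu)^(alpha-1) dP
S_alpha = np.log(moment) / (1.0 - alpha)       # (19)
I_alpha = np.log(moment) / (alpha - 1.0)       # (21)
print(S_alpha, -I_alpha)                       # equal, as stated in (22)
```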

Tsallis entropy in the measure-theoretic setting can be defined as follows.

Definition 3.3. The Tsallis entropy of a pdf $p$ on $(X,\mathfrak{M},\mu)$ is defined as
\[
S_{q}(p) = \int_{X} p(x) \ln_{q} \frac{1}{p(x)}\, d\mu(x) = \frac{1 - \int_{X} p(x)^{q}\, d\mu(x)}{q-1} , \tag{23}
\]
provided the integral on the right exists and $q \in \mathbb{R}$, $q > 0$, $q \neq 1$.

Here $\ln_{q}$ in (23) is referred to as the $q$-logarithm and is defined as $\ln_{q} x = \frac{x^{1-q}-1}{1-q}$ ($x > 0$, $q \in \mathbb{R}$, $q \neq 1$). The same can be defined for a $\mu$-continuous probability measure $P$ and can be written as
\[
S_{q}(P) = \int_{X} \ln_{q} \left( \frac{dP}{d\mu} \right)^{-1} dP . \tag{24}
\]

The definition of Tsallis relative-entropy is given below.

Definition 3.4. Let $(X,\mathfrak{M},\mu)$ be a measure space and let $p, r : X \to \mathbb{R}^{+}$ be two probability density functions. The Tsallis relative-entropy of $p$ relative to $r$ is defined as
\[
I_{q}(p\|r) = -\int_{X} p(x) \ln_{q} \frac{r(x)}{p(x)}\, d\mu(x) = \frac{\int_{X} \frac{p(x)^{q}}{r(x)^{q-1}}\, d\mu(x) - 1}{q-1} , \tag{25}
\]
provided the integral on the right exists and $q \in \mathbb{R}$, $q > 0$, $q \neq 1$.

The same can be written for two probability measures $P$ and $R$ as
\[
I_{q}(P\|R) = -\int_{X} \ln_{q} \left( \frac{dP}{dR} \right)^{-1} dP , \tag{26}
\]
whenever $P \ll R$; $I_{q}(P\|R) = +\infty$ otherwise. If $\mu$ in (24) is a probability measure, then
\[
S_{q}(P) = -I_{q}(P\|\mu) . \tag{27}
\]
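The corresponding discrete check for the Tsallis quantities (again an added illustration with arbitrary $P$ and $q$) verifies both forms of (23) and the relation (27):

```python
import numpy as np

def ln_q(x, q):
    """q-logarithm ln_q(x) = (x**(1-q) - 1) / (1 - q); recovers ln(x) as q -> 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

q = 2.0                                        # arbitrary entropic index, q != 1
P = np.array([0.1, 0.2, 0.3, 0.4])             # arbitrary example pmf
mu = np.full(len(P), 1.0 / len(P))             # uniform reference probability measure
p = P / mu                                     # density dP/dmu

S_q_first  = np.sum(P * ln_q(1.0 / p, q))              # first form in (23)/(24)
S_q_second = (1.0 - np.sum(p**q * mu)) / (q - 1.0)     # second form in (23)
I_q = (np.sum(p**q * mu) - 1.0) / (q - 1.0)            # (25) with r = dmu/dmu = 1
print(S_q_first, S_q_second, -I_q)                     # all equal, cf. (27)
```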


We also have
\[
\frac{\partial S}{\partial \langle u_m \rangle} = \lambda_m , \quad m = 1, \ldots, M . \tag{35}
\]
Equations (34) and (35) are referred to as the thermodynamic equations.

5. ME prescription for Tsallis Entropy

The great success of Tsallis entropy is attributed to the power-law distributions one can derive as maximum entropy distributions by maximizing Tsallis entropy with respect to moment constraints. But there are subtleties involved in the choice of constraints one would use in the ME prescriptions of these entropy functionals. These subtleties are still part of a major discussion in the nonextensive formalism [24, 25, 26].

In the nonextensive formalism, maximum entropy distributions are derived with respect to constraints that differ from (28), the constraints used for classical information measures. Constraints of the form (28) are inadequate for handling the serious mathematical difficulties that arise (see [27]). To handle these difficulties, constraints of the form
\[
\frac{\int_{X} u_m(x)\, p(x)^{q}\, d\mu(x)}{\int_{X} p(x)^{q}\, d\mu(x)} = \langle u_m \rangle_q , \quad m = 1, \ldots, M \tag{36}
\]

are proposed. Equation (36) can be considered as the expectation with respect to the modified probability measure $P^{(q)}$ (it is indeed a probability measure), defined as
\[
P^{(q)}(E) = \left( \int_{X} p(x)^{q}\, d\mu \right)^{-1} \int_{E} p(x)^{q}\, d\mu . \tag{37}
\]
The measure $P^{(q)}$ is known as the escort probability measure.
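The escort measure is easy to construct explicitly. The sketch below is an added illustration (the truncated-exponential density, the interval and the value of $q$ are arbitrary assumptions): it builds $P^{(q)}$ on a grid and compares the $q$-expectation (36) with the ordinary expectation.

```python
import numpy as np

# Escort density (37) and q-expectation (36) for an arbitrary pdf on [0, 5],
# with mu the Lebesgue measure approximated by a midpoint grid.
q = 0.8
a, b, N = 0.0, 5.0, 100000
dx = (b - a) / N
x = a + (np.arange(N) + 0.5) * dx
p = 2.0 * np.exp(-2.0 * x) / (1.0 - np.exp(-2.0 * (b - a)))   # truncated exponential pdf
u = x                                                          # observable u(x) = x

c = np.sum(p**q) * dx                        # integral of p^q dmu
escort = p**q / c                            # escort density of P^(q), eq. (37)
u_q = np.sum(u * p**q) * dx / c              # q-expectation <u>_q, eq. (36)
print(np.sum(escort) * dx)                   # 1.0: P^(q) is indeed a probability measure
print(u_q, np.sum(u * p) * dx)               # q-expectation vs. ordinary expectation
```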

The variational principle for Tsallis entropy maximization with respect to the constraints (36) can be written as
\[
L(x,\lambda,\alpha) = \int_{X} \ln_{q} \frac{1}{p(x)}\, dP(x)
 - \alpha \left( \int_{X} dP(x) - 1 \right)
 - \sum_{m=1}^{M} \lambda_m^{(q)} \left( \int_{X} p(x)^{q-1} \big( u_m(x) - \langle u_m \rangle_q \big)\, dP(x) \right) , \tag{38}
\]
where the parameters $\lambda_m^{(q)}$ can be defined in terms of the true Lagrange parameters $\lambda_m$ as
\[
\lambda_m^{(q)} = \left( \int_{X} p(x)^{q}\, d\mu \right)^{-1} \lambda_m , \quad m = 1, \ldots, M . \tag{39}
\]

Writing $e_q^{\,x} = [1 + (1-q)x]^{\frac{1}{1-q}}$ for the $q$-exponential (the inverse function of $\ln_q$), the maximum entropy distribution in this case can be written as
\[
p(x) = \frac{\left[ 1 - (1-q) \left( \int_{X} p(x)^{q}\, d\mu \right)^{-1} \sum_{m=1}^{M} \lambda_m \big( u_m(x) - \langle u_m \rangle_q \big) \right]^{\frac{1}{1-q}}}{\bar{Z}_q} , \tag{40}
\]
or, equivalently,
\[
p(x) = \frac{ e_q^{\, -\left( \int_{X} p(x)^{q}\, d\mu \right)^{-1} \sum_{m=1}^{M} \lambda_m \left( u_m(x) - \langle u_m \rangle_q \right) } }{\bar{Z}_q} , \tag{41}
\]


where
\[
\bar{Z}_q = \int_{X} e_q^{\, -\left( \int_{X} p(x)^{q}\, d\mu \right)^{-1} \sum_{m=1}^{M} \lambda_m \left( u_m(x) - \langle u_m \rangle_q \right) }\, d\mu(x) . \tag{42}
\]
The maximum Tsallis entropy in this case satisfies
\[
S_q = \ln_q \bar{Z}_q , \tag{43}
\]
while the corresponding thermodynamic equations can be written as
\[
\frac{\partial}{\partial \lambda_m} \ln_q \widehat{Z}_q = -\langle u_m \rangle_q , \quad m = 1, \ldots, M , \tag{44}
\]
\[
\frac{\partial S_q}{\partial \langle u_m \rangle_q} = \lambda_m , \quad m = 1, \ldots, M , \tag{45}
\]
where
\[
\ln_q \widehat{Z}_q = \ln_q \bar{Z}_q - \sum_{m=1}^{M} \lambda_m \langle u_m \rangle_q . \tag{46}
\]
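To make the self-referential structure of (40)-(43) concrete, here is a minimal numerical sketch. It assumes a single constraint $u(x) = x$, takes $\mu$ to be the Lebesgue measure on $[0, 10]$ represented by a grid, and fixes $q$ and the Lagrange parameter $\lambda$ arbitrarily; none of these choices come from the paper. The self-consistent quantities $\int p^q\,d\mu$ and $\langle u \rangle_q$ are obtained by fixed-point iteration of (40), after which relation (43) is checked numerically.

```python
import numpy as np

# Fixed-point sketch of the Tsallis ME distribution (40)-(42) with one
# constraint u(x) = x on [0, 10]; q and lam are arbitrary illustrative values.
q, lam = 0.7, 0.5
a, b, N = 0.0, 10.0, 20000
dx = (b - a) / N
x = a + (np.arange(N) + 0.5) * dx
u = x

p = np.full(N, 1.0 / (b - a))                      # start from the uniform density
for _ in range(1000):
    c = np.sum(p**q) * dx                          # integral of p^q dmu
    u_q = np.sum(u * p**q) * dx / c                # escort expectation, eq. (36)
    bracket = 1.0 - (1.0 - q) * (lam / c) * (u - u_q)
    f = np.where(bracket > 0.0, bracket, 0.0) ** (1.0 / (1.0 - q))   # numerator of (40)
    Z_bar = np.sum(f) * dx                         # normalizer, eq. (42)
    p_new = f / Z_bar
    if np.max(np.abs(p_new - p)) < 1e-12:
        p = p_new
        break
    p = p_new

S_q = (1.0 - np.sum(p**q) * dx) / (q - 1.0)        # Tsallis entropy (23) of the fixed point
lnq_Z = (Z_bar ** (1.0 - q) - 1.0) / (1.0 - q)     # ln_q of the normalizer
print(S_q, lnq_Z)                                  # approximately equal, cf. (43)
```

At the fixed point the two printed numbers agree (up to the discretization and the convergence tolerance), which is exactly the content of (43) for this discretized example.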

6. Measure-Theoretic Definitions: Revisited

It is well known that, unlike Shannon entropy, Kullback-Leibler relative-entropy in the discrete case can be extended naturally to the measure-theoretic case. In this section we show that this fact is true for generalized relative-entropies too. Rényi relative-entropy on the continuous valued space $\mathbb{R}$ and its equivalence with the discrete case was studied by Rényi [28]. Here we present the result in the measure-theoretic case and conclude that both measure-theoretic definitions, of Tsallis and Rényi relative-entropies, are equivalent to their discrete cases.

We also present a result pertaining to the ME of measure-theoretic Tsallis entropy. We show that the ME of Tsallis entropy in the measure-theoretic case is consistent with the discrete case.

6.1. On Measure-Theoretic Definitions of Generalized Relative-Entropies

Here we show that generalized relative-entropies in the discrete case can be naturally extended to the measure-theoretic case, in the sense that the measure-theoretic definitions can be obtained as limits of sequences of finite discrete entropies of pmfs which approximate the pdfs involved. We call such a sequence of pmfs an approximating sequence of pmfs of a pdf. To formalize these aspects we need the following lemma.

Lemma 6.1. Let $p$ be a pdf defined on a measure space $(X,\mathfrak{M},\mu)$. Then there exists a sequence of simple functions $\{f_n\}$ (we refer to them as the approximating sequence of simple functions of $p$) such that $\lim_{n\to\infty} f_n = p$ and each $f_n$ can be written as
\[
f_n(x) = \frac{1}{\mu(E_{n,k})} \int_{E_{n,k}} p\, d\mu , \quad \forall x \in E_{n,k} , \ k = 1, \ldots, m(n) , \tag{47}
\]


where $(E_{n,1}, \ldots, E_{n,m(n)})$ is the measurable partition corresponding to $f_n$ (the notation $m(n)$ indicates that $m$ varies with $n$). Further, each $f_n$ satisfies
\[
\int_{X} f_n\, d\mu = 1 . \tag{48}
\]
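Before turning to the proof, a small numerical illustration of the construction (added here; the pdf, the interval and the grid are arbitrary assumptions): the grid below plays the role of $(X,\mathfrak{M},\mu)$ with $\mu$ approximated by equal cell weights, and $f_n$ averages $p$ over the dyadic level sets that appear in (47) and in the proof below.

```python
import numpy as np

# Approximate a pdf p on [0, 1] by the simple functions f_n of Lemma 6.1:
# f_n is constant on each level set E_{n,k} = p^{-1}([k/2^n, (k+1)/2^n)) and on
# the tail set F_n = p^{-1}([n, inf)), taking the average value of p there.
M = 100000
dx = 1.0 / M
x = (np.arange(M) + 0.5) * dx
p = 3.0 * x**2                                 # arbitrary pdf on [0, 1]

for n in (1, 2, 4, 6):
    f_n = np.zeros_like(p)
    # label each grid point by its level set: k = floor(p * 2^n) if p < n, else -1 (tail)
    idx = np.where(p < n, np.floor(p * 2**n), -1).astype(int)
    for k in np.unique(idx):
        E = (idx == k)
        f_n[E] = (np.sum(p[E]) * dx) / (np.sum(E) * dx)   # cell average, as in (47)
    print(n, np.sum(f_n) * dx, np.max(np.abs(f_n - p)))   # integral stays 1; f_n -> p
```

The printed integral stays equal to 1, in accordance with (48), and the maximum deviation from $p$ shrinks as $n$ grows, illustrating the pointwise convergence claimed in the lemma.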

Proof. Define a sequence of simple functions $\{f_n\}$ as
\[
f_n(x) =
\begin{cases}
\dfrac{1}{\mu\big(p^{-1}\big(\big[\frac{k}{2^{n}},\frac{k+1}{2^{n}}\big)\big)\big)} \displaystyle\int_{p^{-1}\left(\left[\frac{k}{2^{n}},\frac{k+1}{2^{n}}\right)\right)} p\, d\mu ,
 & \text{if } \dfrac{k}{2^{n}} \leq p(x) < \dfrac{k+1}{2^{n}} , \ k = 0, 1, \ldots, n2^{n}-1 ,\\[4mm]
\dfrac{1}{\mu\big(p^{-1}([n,\infty))\big)} \displaystyle\int_{p^{-1}([n,\infty))} p\, d\mu ,
 & \text{if } n \leq p(x) .
\end{cases} \tag{49}
\]
Each $f_n$ is indeed a simple function and can be written as
\[
f_n = \sum_{k=0}^{n2^{n}-1} \left( \frac{1}{\mu(E_{n,k})} \int_{E_{n,k}} p\, d\mu \right) \chi_{E_{n,k}}
 + \left( \frac{1}{\mu(F_n)} \int_{F_n} p\, d\mu \right) \chi_{F_n} , \tag{50}
\]
where $E_{n,k} = p^{-1}\left(\left[\frac{k}{2^{n}}, \frac{k+1}{2^{n}}\right)\right)$, $k = 0, \ldots, n2^{n}-1$, and $F_n = p^{-1}([n,\infty))$. Since $\int_{E} p\, d\mu < \infty$ for any $E \in \mathfrak{M}$, we have $\int_{E_{n,k}} p\, d\mu = 0$ whenever $\mu(E_{n,k}) = 0$, for $k = 0, \ldots, n2^{n}-1$. Similarly, $\int_{F_n} p\, d\mu = 0$ whenever $\mu(F_n) = 0$. Now we show that $\lim_{n\to\infty} f_n = p$, point-wise.

First assume that $p(x) < \infty$. Then $\exists\, n \in \mathbb{Z}^{+}$ such that $p(x) \leq n$. Also $\exists\, k \in \mathbb{Z}^{+}$, $0 \leq k \leq n2^{n}-1$, such that $\frac{k}{2^{n}} \leq p(x) < \frac{k+1}{2^{n}}$.