    Data Evaluation in High-Performance Liquid Chromatography-Diode-Array Detection-Fluorescence Detection by Information Theory

    Victor David* and Andrei Medvedovici

    Department of Analytical Chemistry, Faculty of Chemistry, University of Bucharest, Sos. Panduri, No. 90, 76.235, Section 5, Bucharest, Romania

    Received November 7, 1999

    The concepts of information theory were applied to the high-performance liquid chromatography (HPLC) technique with diode-array (DAD) and/or fluorescence (FLD) detection. The information amount for a complete analysis can be computed as a function of analytical parameters such as the number of analytes, the concentration level, and the standard deviation of determinations. By means of the proposed method, the information content of a qualitative and quantitative analysis accomplished by HPLC-DAD-FLD was estimated, and sensitivity was optimized by taking into account the maximum information content, while the detection limit was estimated by considering that at this concentration level the information content approaches zero.


    Evaluation of analytical data usually focuses on the precision and accuracy of determinations during an analytical process. Depending on their magnitudes and on a criterion chosen to satisfy a quality standard,1,2 these statistical parameters can be used to optimize experimental parameters so as to obtain maximum information from the process. Information theory is thus deeply involved in analytical chemistry, mainly as a consequence of the analogy between an analytical process and a communication process.3 Moreover, computer data processing in-line with the analytical instrument can be used to evaluate the analytical information. Techniques such as GC-MS, GC-FTIR, HPLC-MS, or HPLC-DAD (DAD = diode-array detection), which report the analytical result as a list of components identified in a sample together with the so-called similarity index, can easily be discussed in terms of information theory. In a previous paper we proposed a new method for the estimation of analytical data from GC-MS, with very good results in the case of very complex mixtures.4

    High-performance liquid chromatography (HPLC) with one or two detectors, diode-array detection (DAD) and fluorescence detection (FLD), is an analytical technique widely applied to the separation and determination of nonvolatile compounds in various mixtures. Owing to recent developments in data acquisition and processing, the analytical results can be reported from both qualitative and quantitative points of view, and can therefore be characterized by means of information theory. By applying information theory to the evaluation of data from a given analytical process, we usually estimate its information amount (content) quantitatively as a function of the analytical parameters and optimize those parameters to obtain as much information as possible. Up to the present, there have been several attempts in the literature to apply information theory to HPLC techniques. According to Matsuda et al.,5,6 information theory can be used to select the most efficient HPLC conditions for dissolution testing of multi-ingredient pharmaceuticals within a total chromatographic optimization procedure. Information content, sample complexity, physicochemical detectors, and chromatographic techniques have been investigated in the analysis of plant extracts.7 Even the homogeneity of the distribution of an analyte in a matrix has been characterized in terms of information theory.8

    Nevertheless, there is as yet no comprehensive treatment of this analytical technique by means of information theory. The purpose of the present study is therefore to apply this theory to the field in order to measure the information content obtained from such a determination.


    An analytical process, compared with a communication process, can be represented by means of the following two fields of probabilities:9

    $$\text{sample (a priori): } \{X_i,\ p_i;\ i = 1, 2, \ldots, n\} \;\xrightarrow{\ \text{analytical process}\ }\; \text{result (a posteriori): } \{X_i,\ q_i;\ i = 1, 2, \ldots, n\}$$

    In this representation the main role is played by the events $X_i$, whose number is denoted by $n$. In HPLC-DAD-FLD an event $X_i$ can be a certain compound to be identified with the aid of its UV-vis absorption spectrum, a value of the measured absorbance (in DAD), or a value of the emission intensity (in FLD). From the information point of view, the main role of the HPLC process is to improve the a posteriori probabilities assigned to each chemical compound that may be present in the sample, owing to the separation that takes place during the analysis.

    * Corresponding author. E-mail:

    J. Chem. Inf. Comput. Sci. 2000, 40, 976-980

    10.1021/ci990139x CCC: $19.00 © 2000 American Chemical Society. Published on Web 06/02/2000

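    With these two probability fields, the information entropies can be computed by Shannon's formula, taking $0 \log_2 0 = 0$:

    $$H(p_i) = -p_i \log_2 p_i - (1 - p_i)\log_2(1 - p_i), \qquad H(q_i) = -q_i \log_2 q_i - (1 - q_i)\log_2(1 - q_i)$$

    The information content, $H_i$, is the difference between the two entropies:

    $$H_i = H(p_i) - H(q_i)$$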
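    As a minimal sketch, assuming that each event $X_i$ is a two-outcome event (present or absent), as the case analysis that follows implies, Shannon's formula reduces to the binary entropy in bits and the information content is the a priori entropy minus the a posteriori entropy. The function names `binary_entropy` and `information_content` are illustrative and do not come from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a present/absent event with probability p.

    Uses the convention 0 * log2(0) = 0, so certain events (p = 0 or 1)
    carry zero entropy.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def information_content(p_prior: float, q_post: float) -> float:
    """H_i = H(p_i) - H(q_i): a priori entropy minus a posteriori entropy."""
    return binary_entropy(p_prior) - binary_entropy(q_post)


# Maximum prior uncertainty resolved completely: exactly 1 bit.
print(information_content(0.5, 1.0))  # 1.0
# Partial gain: the posterior moves from 0.5 toward certainty.
print(round(information_content(0.5, 0.9), 3))  # 0.531
```

    A positive value means the analysis reduced the uncertainty about $X_i$; a negative value means it increased it.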

    Depending on the values of $p_i$ and $q_i$, one of the following situations, each with its own significance, can be encountered.10

    (1) $p_i = 0$; $q_i = 0$ ($H_i = 0$). In this case the process confirms our prior knowledge about the analyte $X_i$: without any doubt, it cannot be found in the sample. The process confirms that the sample does not contain analyte $X_i$, and thus it does not carry any information with regard to $X_i$.

    (2) $p_i = 1$; $q_i = 1$ ($H_i = 0$). This situation is similar to the one above, but with the opposite conclusion: the analyte $X_i$ is undoubtedly present in the sample.

    (3) $p_i = 1$; $q_i = 0$ ($H_i$ questionable). This is a wrong supposition that assigns to the analyte $X_i$ a certain presence in the sample, while the determination does not confirm this presence. In this rare case it is difficult to estimate the information content of the process.

    (4) $p_i = 0$; $q_i = 1$ ($H_i$ questionable). This is the mirror image of the situation above: the supposition that $X_i$ is absent from the sample turns out to be false after the determination.

    (5) $0 < p_i = q_i < 1$ ($H_i = 0$). The uncertainty about the presence of compound $X_i$ in the sample has not changed, so the analytical process is ineffective for the determination of this compound.

    (6) $0 < p_i < q_i < 0.5$ ($H_i < 0$). Although the probability that $X_i$ is present in the sample has increased, the uncertainty about this compound has also increased, leading to a negative information content of the determination.

    (7) $0 < q_i < p_i < 0.5$ ($H_i > 0$). The certainty that $X_i$ is not present in the sample has increased, and consequently the information content of the determination is positive.

    (8) $0.5 < p_i < q_i < 1$ ($H_i > 0$). The certainty that $X_i$ is present in the sample has increased.

    (9) $0.5 < q_i < p_i < 1$ ($H_i < 0$). The uncertainty about the presence of $X_i$ in the sample has increased.
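    The sign pattern of cases (5)-(9) can be checked numerically. This sketch assumes the binary form of Shannon's entropy for each event; the $(p_i, q_i)$ pairs are arbitrary values chosen inside each case's range, not data from the paper. Note that below 0.5 a positive $H_i$ requires the a posteriori probability to fall under the a priori one ($q_i < p_i$), because the entropy grows toward $p = 0.5$.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary Shannon entropy in bits, taking 0 * log2(0) as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# One illustrative (p_i, q_i) pair inside the range of each case (5)-(9).
# These are arbitrary picks, not measurements from the paper.
CASES = {
    5: (0.3, 0.3),  # 0 < p_i = q_i < 1
    6: (0.1, 0.4),  # 0 < p_i < q_i < 0.5
    7: (0.4, 0.1),  # 0 < q_i < p_i < 0.5
    8: (0.6, 0.9),  # 0.5 < p_i < q_i < 1
    9: (0.9, 0.6),  # 0.5 < q_i < p_i < 1
}

# H_i = H(p_i) - H(q_i) for each case.
results = {case: binary_entropy(p) - binary_entropy(q) for case, (p, q) in CASES.items()}

for case, h in results.items():
    p, q = CASES[case]
    print(f"case ({case}): p_i={p}, q_i={q}, H_i={h:+.3f} bits")
```

    Cases (6) and (9) come out negative, (7) and (8) positive, and (5) exactly zero.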

    The main conclusion arising from these situations is that an analytical process is useful for determining an analyte $X_i$ if and only if the a posteriori probability $q_i$ is closer to 0 or 1 than the a priori probability $p_i$. In the case of maximum uncertainty before the analysis, i.e., $p_i = 1/2$ for all events $i = 1, 2, \ldots, n$, and no uncertainty after the analysis, i.e., $q_i = 0$ or $1$ for all events $i = 1, 2, \ldots, n$, the information amount is $n$ bits. More generally, if the uncertainty of the a posteriori probability field is zero, the information amount obtained from the analysis equals the a priori information entropy.
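    Both statements can be illustrated with a short sketch. It assumes binary entropy per event and, as an additional assumption, that the events are independent so that their information contents simply add; `total_information` is a hypothetical helper name.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary Shannon entropy in bits, taking 0 * log2(0) as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def total_information(priors: list[float], posteriors: list[float]) -> float:
    """Sum of H_i = H(p_i) - H(q_i) over all events X_1..X_n.

    Assumes independent events, so the per-event information adds up.
    """
    if len(priors) != len(posteriors):
        raise ValueError("need one a posteriori probability per event")
    return sum(binary_entropy(p) - binary_entropy(q) for p, q in zip(priors, posteriors))


n = 5
# Maximum a priori uncertainty, no a posteriori uncertainty: n bits.
print(total_information([0.5] * n, [1.0, 0.0, 1.0, 1.0, 0.0]))  # 5.0

# Zero a posteriori uncertainty: the result equals the a priori entropy.
priors = [0.2, 0.5, 0.7]
print(total_information(priors, [0.0, 1.0, 1.0]) == sum(binary_entropy(p) for p in priors))  # True
```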


    The semiquantitative analysis of a sample consists of setting up a number of intervals $[C_j(\text{lower}), C_j(\text{upper})]$ within which the concentration of the analyzed component of the sample lies. These concentration intervals, within which the concentration of a certain compound $X_i$ identified in a sample by DAD detection may lie, can be represented in practice as follows:

    The probability $q_{ij}$ that the concentration of component $i$ lies in interval $j$ is

    Therefore, the information amount corresponding to the probability field of a semiquantitative analysis for one component $i$, with no uncertainty in the final result, is given by

    For instance, if a semiquantitative analysis for a certain compound is carried out down to the parts-per-million level ($10^{-4}\%$, i.e., six decades below 100%) by means of DAD detection, the information content of this analysis is 2.57 bits, whereas if the analysis is carried out down to the parts-per-billion level (nine decades) by means of FLD detection, the value is 3.14 bits.
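    A rough way to reproduce the order of magnitude of these numbers, assuming decade-wide concentration intervals counted down from 100% and equal a priori probabilities for all intervals, is $\log_2$ of the number of intervals. This is an assumption, not the paper's stated formula: it gives about 2.58 and 3.17 bits, close to but not identical with the reported 2.57 and 3.14 bits, so the paper's exact probability model may differ. Both helper names are hypothetical.

```python
import math


def decade_intervals(lowest_percent: float, highest_percent: float = 100.0) -> int:
    """Number of decade-wide concentration intervals between two levels (in %).

    Hypothetical helper: assumes intervals [10 %, 100 %], [1 %, 10 %], ...
    down to the lowest level the method reports.
    """
    return round(math.log10(highest_percent / lowest_percent))


def semiquantitative_information(n_intervals: int) -> float:
    """Bits gained if all intervals are equally likely a priori and the
    result places the concentration in exactly one of them."""
    return math.log2(n_intervals)


for label, level in (("ppm (DAD)", 1e-4), ("ppb (FLD)", 1e-7)):
    m = decade_intervals(level)
    print(f"{label}: {m} intervals, {semiquantitative_information(m):.2f} bits")
```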

    The information content corresponding to the events of a semiquantitative analysis for all the components of the sample has the expression
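    Under the additional assumptions that the components are independent and that each one's intervals are equiprobable, the total semiquantitative information is simply the sum of the per-component values. The sketch below uses a hypothetical three-component sample and a hypothetical helper name; it is not the paper's expression.

```python
import math


def total_semiquantitative_information(intervals_per_component: list[int]) -> float:
    """Sum of log2(m_i) over components, assuming equiprobable intervals for
    each component and independent components (so information adds)."""
    return sum(math.log2(m) for m in intervals_per_component)


# Hypothetical sample: two components screened down to ppm (6 decades, DAD)
# and one down to ppb (9 decades, FLD).
print(round(total_semiquantitative_information([6, 6, 9]), 2))  # 8.34
```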

    By carrying out a quantitative analysis for a certain component, a confidence interval $\bar{c} \pm k\sigma$ is set up, within which the concentration of the analyzed component can be situated; here $\sigma$ is the standard deviation of the analytical result $\bar{c}$, and $k$ is a number, set after the analysis, which gives the width of the confidence interval in $\sigma$ units (usually $k = 3$). The concentration of a certain component $i$ after the analysis has been carried out is $\bar{c}_{ij} \pm k_{ij}\sigma_{ij}$, with $\bar{c}_{ij} \in [C_j(\text{lower}), C_j(\text{upper})]$ and $j$ running over the concentration intervals of the semiquantitative analysis. Hence, before a quantitative analysis has been carried out, the events of the probability field are concentration microintervals of width $2k_{ij}\sigma_{ij}$, with $j$ running over the concentration intervals and $i = 1, \ldots, n$. Moreover, the events of the probability field corresponding to a quantitative analysis depend on the events of the semiquantitative analysis.11
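    The microinterval picture lends itself to a simple counting sketch. Assuming, as a simplification rather than the paper's formula, that a semiquantitative interval is split into microintervals of width $2k\sigma$ that are all equally likely a priori, the quantitative result singles out one of them and yields $\log_2$ of their number. The coverage factor (the number of standard deviations, usually 3) is called `k` here; the concentration interval [1, 10] and the σ values are hypothetical. The second call shows the information dropping to zero once the confidence interval is as wide as the whole interval, which echoes the detection-limit criterion stated in the abstract.

```python
import math


def quantitative_information(c_lower: float, c_upper: float, sigma: float, k: float = 3.0) -> float:
    """Bits gained when a result c +/- k*sigma is obtained inside [c_lower, c_upper].

    Counting model (an assumption): the interval holds (c_upper - c_lower) / (2*k*sigma)
    equiprobable microintervals and the analysis picks out one of them.
    """
    if sigma <= 0 or k <= 0:
        raise ValueError("sigma and k must be positive")
    width = 2.0 * k * sigma
    if c_upper - c_lower <= width:
        # Confidence interval as wide as the interval itself: no gain in this model.
        return 0.0
    return math.log2((c_upper - c_lower) / width)


# Hypothetical interval [1, 10] (arbitrary concentration units).
print(round(quantitative_information(1.0, 10.0, sigma=0.05), 2))  # 4.91
print(quantitative_information(1.0, 10.0, sigma=1.5))  # 0.0
```

    Smaller standard deviations give more, narrower microintervals and hence more information, which is why precision enters the information balance directly.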

    According to this procedure, the semiquantitative

