kuba tatarkiewicz phd, r&d global director · © © 2019 horiba, ltd. all rights reserved.2019...

19
© 2019 HORIBA, Ltd. All rights reserved. © 2019 HORIBA, Ltd. All rights reserved. 1 How to present and compare data obtained by particle tracking analysis and other related methods Kuba Tatarkiewicz PhD, R&D Global Director 3-6-2019

Upload: others

Post on 04-Apr-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

  • © 2019 HORIBA, Ltd. All rights reserved.© 2019 HORIBA, Ltd. All rights reserved. 1

    How to present and compare dataobtained by particle tracking analysis

    and other related methods

    Kuba Tatarkiewicz PhD, R&D Global Director

    3-6-2019

  • © 2019 HORIBA, Ltd. All rights reserved. 2

    • Measuring sizes of individual particleshydrodynamic diameter dh > d0 due to diluent drag

    • Counting tracked particles in a given volumeinvestigated volume depends on size and refractive index

    • Testing small volume of a sample• statistical process of estimating particle size distribution• volume tested about 100,000x smaller than sample volume

    Particle tracking analysis

  • © 2019 HORIBA, Ltd. All rights reserved. 3

    • Definition of CN: counts per volume [part/mL]other defs: mass CM [mg/mL] or volume CV [μL/mL]

    • Typically CN is given for a range of sizes e.g. between 100 nm and 1 μm diameters

    • Hence use of plots with size bins• bin widths not necessarily equal• some software do not support unequal bins (notably Excel)

    Concentration measurements

  • © 2019 HORIBA, Ltd. All rights reserved. 4

    • Error in counting proportional to this applies to total counts and individual bin counts

    • Number of bins depends on expected errorsN=10,000 and 100 bins, average error 10 counts/bin or 10%

    • To obtain “nice”, smooth distributions:• decrease number of bins and use wider bins• use narrow bins to calculate parameters of a distribution

    Statistical considerations

    𝑁𝑁

  • © 2019 HORIBA, Ltd. All rights reserved. 5

    Example plots of size distributions

    Counts Density of PSDlogarithmic bins

    Cumulative

    0

    200

    400

    600

    800

    1,000

    1,200

    1,400

    1,600

    0 400 800 1,200 1,600 2,000

    Cou

    nts

    per b

    in [N

    ]

    Diameter [nm]

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    70%

    80%

    90%

    100%

    10 100 1,000

    Cum

    ulat

    ive

    coun

    ts

    Diameter [nm]

    D50=217 nm

  • © 2019 HORIBA, Ltd. All rights reserved. 6

    • Definition of a bin:• d ∈ [bi, bi+1[ where i=1,…,N• equivalent definition: bi ≤ d < bi+1 (non-overlapping bins)

    • typically b1 = 0 and bN = max size to be measured• Strange bins definition encountered:

    Binning

  • © 2019 HORIBA, Ltd. All rights reserved. 7

    • Concentration is just a rational number, also per bin• Histogram uses area to represent a distribution• With unequal bins natural measurable is:

    density of particle size distribution (PSD)units: number of particles per bin width and

    per investigated volume [part/nm/mL]total concentration ≡ area of a histogram

    What is really plotted

  • © 2019 HORIBA, Ltd. All rights reserved. 8

    • When plotting concentration vs. size,DO NOT connect points*

    • Total concentration is a sum of numbers, not an integral• When plotting density of PSD, one can connect

    middle of bins ☛ linear approximation• Total concentration CN is then

    an area under the curve

    Dimensional analysis

    *There’s no data between points – by definition of a bin

  • © 2019 HORIBA, Ltd. All rights reserved. 9

    • Create a list of measured sizestypically diameters of nanoparticles are given in [nm]

    • Bin those sizes in specific binningtypically logarithmic binning

    (bi+2 – bi+1 ) / (bi+1 – bi ) = const

    • Calculate density of PSD [part/nm/mL]; V0 needed• Plot as a histogram of density of PSD• Calculate parameters of a distribution (AV, SD, D50 ...)

    Practical procedure

  • © 2019 HORIBA, Ltd. All rights reserved. 10

    Example of real data in logarithmic binning

    Concentration from 50 nm to 700 nm = area of histogram of density of PSD

    Den

    sity

    of P

    SD

  • © 2019 HORIBA, Ltd. All rights reserved. 11

    • Most standards of size are mono-dispersethere are no standards for concentration…

    • Typically real life samples are poly-dispersee.g. multiple sizes or continuous distributions

    • Natural samples like sea water or blood:• highly poly-disperse with dominating small particles

    • Junge distribution N(d) ∼ d-x where x=3.5÷4

    Real data vs. standards

  • © 2019 HORIBA, Ltd. All rights reserved. 12

    • Fitting unknown distribution of sizes • Assumed binomial(s) distribution• Height (or area) of different peaks

    is NOT conserved during fitting(not invariant of any fitting procedure)

    • Looks great but lacks numerical accuracy• Artificial peaks can be created

    Fitted data problem (NanoSight FTLA)

    cf. van der Pol et al. J Thrombosis and Haemostasis, 12, 1182 (2014)

  • © 2019 HORIBA, Ltd. All rights reserved.

    • Concentration

    • Average size

    • Standard deviation

    • D50

    13

    Parametric description of distributions

    Dk =1

    𝑁𝑁𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡�𝑖𝑖=1

    𝑘𝑘

    𝑛𝑛𝑖𝑖 definition Dk = 0.5☛ k☛ bk equal D50

    𝑆𝑆𝑆𝑆 =1

    𝑁𝑁𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡�𝑖𝑖=1

    𝑁𝑁

    𝑛𝑛𝑖𝑖 ∗ 𝑑𝑑𝑡𝑡𝑎𝑎𝑎𝑎𝑎𝑎𝑡𝑡𝑎𝑎𝑎𝑎 − 𝑏𝑏𝑖𝑖 +𝐵𝐵𝑖𝑖2

    2

    Bi = (bi+1 - bi) 𝑁𝑁𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 = �𝑖𝑖=1

    𝑁𝑁𝑛𝑛𝑖𝑖𝐵𝐵𝑖𝑖𝐵𝐵𝑖𝑖 = �

    𝑖𝑖=1

    𝑁𝑁

    𝑛𝑛𝑖𝑖

    𝑑𝑑𝑡𝑡𝑎𝑎𝑎𝑎𝑎𝑎𝑡𝑡𝑎𝑎𝑎𝑎 =1

    𝑁𝑁𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡�𝑖𝑖=1

    𝑁𝑁

    𝑛𝑛𝑖𝑖 ∗ (𝑏𝑏𝑖𝑖 +𝐵𝐵𝑖𝑖2

    )

  • © 2019 HORIBA, Ltd. All rights reserved. 14

    D50 & mode depend on binning!

    most accurate

    which one is mode?which one is mode?

  • © 2019 HORIBA, Ltd. All rights reserved. 15

    • Parametric description using AV and SD

    is not accurate or unique!

    Anscombe’s quartet – same AV and SD, various shapes

    • Even higher moments do not give enough info

    • Basic experimental question:How similar are two measured distributions?

    Lies, damned lies, and statistics

  • © 2019 HORIBA, Ltd. All rights reserved. 16

    Kolmogorov-Smirnov statistics

    max

    Non-parametric test:Kolmogorov-Smirnov statistics

    dav = 256 nm, SD = 145 nm, CV = 0.57dav = 163 nm, SD = 68 nm, CV = 0.42

    Comparing parameters:

    Sheet1

    DA,BalphaDA,B,αReject?

    0.23350.050.0338yes

  • © 2019 HORIBA, Ltd. All rights reserved. 17

    Anderson-Darling statistics

    Area between two cumulatives is a good measure of distance between two or more distributions with unknown shape.

  • © 2019 HORIBA, Ltd. All rights reserved. 18

    • Normalize area between cumulativesextreme case: a) particles at 10 nm, b) particles at 1000 nm

    • Distance is a number from the range [0,1]same distributions have distance 0extreme case – distance 1

    • If areas are calculated between distributions in same normalization, it can be a good measure

    Distance between distributions

    d

    cum

    ulat

    ive

  • © 2019 HORIBA, Ltd. All rights reserved.

    DankeБольшое спасибо

    Grazie

    감사합니다

    Obrigado谢谢

    ขอบคณุครบั

    ありがとうございました

    धन्यवाद

    Cảm ơn

    Tack ska ni ha

    Thank you

    Merci

    Gracias

    Dziękuję

    Σας ευχαριστούμε

    நன்ற

    19

    How to present and compare data�obtained by particle tracking analysis�and other related methodsParticle tracking analysisConcentration measurementsStatistical considerationsExample plots of size distributionsBinningWhat is really plottedDimensional analysisPractical procedureExample of real data in logarithmic binningReal data vs. standardsFitted data problem (NanoSight FTLA)Parametric description of distributionsD50 & mode depend on binning!Lies, damned lies, and statisticsKolmogorov-Smirnov statisticsAnderson-Darling statisticsDistance between distributionsSlide Number 19