MChem Computing and Chemistry [B14SC3] Data Analysis for Beginners…or How to avoid disasters when writing up your research work… or On the Meaning of Life, the Universe and Everything! Lecture #42 Dr Roderick Ferguson – Summer 2008


Page 1: MChem Computing and Chemistry [B14SC3]

MChem Computing and Chemistry [B14SC3]

“Data Analysis for Beginners…” or

• How to avoid disasters when writing up your research work…

or

• On the Meaning of Life, the Universe and Everything!

Lecture #42 Dr Roderick Ferguson – Summer 2008

Page 2: MChem Computing and Chemistry [B14SC3]

The “data analysis” assignment…

This is in 3 parts.

Some useful information, and a copy of the MS Word document, can be found at:

http://www.eps.hw.ac.uk/~cherrf/B14SC3

(this link is also available from my chemistry staff home page)

Page 3: MChem Computing and Chemistry [B14SC3]

The “data analysis” assignment…

Now, if you are really stuck and can’t see how to get started with the 2nd part of this… then I will be available over the next 2 days, i.e. Wed 4th and Thurs 5th June, to offer some help. My office is now DB 2.49.

But you should have enough knowledge to be able to attempt this yourselves!

Page 4: MChem Computing and Chemistry [B14SC3]

Data Analysis for Beginners…

1. Introduction to Data Analysis.

2. Linear Least Squares.

3. Nonlinear Least Squares.

4. Theoretical Models (and Maths).

5. Errors (and what to do with them!).

Page 5: MChem Computing and Chemistry [B14SC3]

Data Analysis for Beginners…

1. Introduction to Data Analysis.

2. Linear Least Squares.

3. Nonlinear Least Squares.

4. Theoretical Models (and Maths).

5. Errors (and what to do with them!).

Page 6: MChem Computing and Chemistry [B14SC3]

Introduction to Data Analysis

Why do you need to do it?

• Data Analysis is an essential skill for a professional scientist today.

• Many modern instruments can generate large quantities of numerical data which require some sort of analysis and/or theoretical interpretation.

Page 7: MChem Computing and Chemistry [B14SC3]

Introduction to Data Analysis

How can you do it?

• Most people now have access to very powerful Desktop Computers…

• There are many software tools that can be used to analyse numerical data…

• Today we will focus on what can be done with something that is readily available, i.e. Microsoft Excel!

Page 8: MChem Computing and Chemistry [B14SC3]

Introduction to Data Analysis

An historical aside

• This was not always true…

• Modern research workers have no idea of what performing data analysis was like before the (micro)computer revolution that started in the 1980s…

Page 9: MChem Computing and Chemistry [B14SC3]

Introduction to Data Analysis

An historical aside (cont)

• To do data analysis, you had to have access to a large “mainframe” computer…

• You also had to learn at least one computer programming language…

• And you also had to type in both your numerical data and analysis program onto punched paper cards!

Page 10: MChem Computing and Chemistry [B14SC3]

Introduction to Data Analysis

• There are also many pitfalls and traps that the new research worker can very easily fall into!

• Thus some background knowledge on both the how and the why aspects is required…

• Also, it is never a good idea to use something without first understanding how it works!

Page 11: MChem Computing and Chemistry [B14SC3]

Introduction to Data Analysis

At first, your reactions will probably be fear and confusion when you try to do Data Analysis…

Page 12: MChem Computing and Chemistry [B14SC3]

Introduction to Data Analysis

• However, Don’t Panic!

• Because it’s easier than you think…

Page 13: MChem Computing and Chemistry [B14SC3]

Data Analysis for Beginners…

1. Introduction to Data Analysis.

2. Linear Least Squares.

3. Nonlinear Least Squares.

4. Theoretical Models (and Maths).

5. Errors (and what to do with them!).

Page 14: MChem Computing and Chemistry [B14SC3]

Linear Least Squares

Whilst performing data analysis, you will

encounter the following terms…

• “Best Fit”

• “Goodness of Fit”

• “Residuals”

• “Sum of Squares”

What do they mean?

Page 15: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – an example

We have two columns of data and want to see if there is a LINEAR relationship between them…

point #    X      Y
   1      1.1    1.4
   2      2.0    1.8
   3      2.9    2.2
   4      4.2    2.8
   5      5.0    2.9

[Figure: scatter plot of Y (axis from 1.2 to 3.0) against X (axis from 1 to 5)]

Step 0 – Draw a Graph!

Page 16: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – an example

What would your idea of a good straight line fit to this data be?

[Figure: scatter plot of Y (axis from 1.2 to 3.0) against X (axis from 1 to 5)]

Page 17: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – an example

Can we give a more precise mathematical description for the idea of “Best Fit”?

Yes – we can!

We need some definitions. We’ll look at Residuals and the Sum of Squares (SS) or, more precisely, the “Sum of Squares of the Residuals”.

Page 18: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – an example

Back to our graph again… but now with the residuals added!

[Figure: scatter plot of Y (axis from 1.2 to 3.0) against X (axis from 1 to 5), with the residuals drawn in]

Page 19: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (1)

Consider the i’th data point with values (Xi, Yi) and suppose that the data can be described by the familiar straight line relationship

F = m X + C, where m is the slope and C is the intercept.

• Now, for each experimental Yi we calculate a theoretical Yi (which we’ll call Fi) by using the above equation, i.e. Fi = m Xi + C, for all of the data points.

Page 20: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (2)

• The difference between each calculated and experimental value of Y is called the Residual ie we have Ri = Yi – Fi for all i data points.

• Note that sometimes Ri will be positive and also sometimes it will be negative…

• How can we get an overall measure of how close the theoretical line is to our data?

• Clearly, it must have something to do with ALL of the residuals …

Page 21: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (3)

We define the Sum of Squares of the Residuals (or just SS) as:

$$SS = \sum_{i=1}^{N} R_i^2 = \sum_{i=1}^{N} (Y_i - F_i)^2$$

• This gives us a single quantity that measures how good a fit the straight line is to the data.

• Note also that SS will depend on both m and C
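The residual and Sum of Squares definitions can be tried out directly on the five data points from the worked example. This is only a minimal sketch: the function names and the two trial lines (m = 0.5, C = 0.5 and m = 0.4, C = 1.0) are made up for illustration and do not come from the lecture.

```python
# Data from the worked example (five X, Y points).
X = [1.1, 2.0, 2.9, 4.2, 5.0]
Y = [1.4, 1.8, 2.2, 2.8, 2.9]


def residuals(m, c, xs, ys):
    """Return R_i = Y_i - F_i with F_i = m*X_i + c for every data point."""
    return [y - (m * x + c) for x, y in zip(xs, ys)]


def sum_of_squares(m, c, xs, ys):
    """Return SS = sum of R_i**2 for the trial line with slope m, intercept c."""
    return sum(r * r for r in residuals(m, c, xs, ys))


# Two hypothetical trial lines: the second one lies closer to the data.
ss_rough = sum_of_squares(0.5, 0.5, X, Y)
ss_better = sum_of_squares(0.4, 1.0, X, Y)
print(ss_rough, ss_better)
```

A smaller SS means the trial line sits closer to the points overall, which is why SS can serve as the single measure of "goodness of fit".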

Page 22: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (3)

i.e. SS = SS(m,C), which means that SS is a function of two independent variables {m and C}, so that:

$$SS(m, C) = \sum_{i=1}^{N} (Y_i - m X_i - C)^2$$

From our original problem we have now got a new problem, i.e. we have created a Sum of Squares function, and we need to find values of m and C which minimise this function!

Page 23: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (4)

In other words,

• How do we find minimum values (or minima) of functions ?

• We need the help of Calculus – ie the part of Maths that deals with the rate of change of a function …

Best Fit => find Minimum of the SS function!

Page 24: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – Simple Calculus (1)

Recall a function of one variable:

[Figure: curve of F(x) with three tangent lines: one with slope +ve, one with slope -ve, and one with slope = 0 => a minimum!]

Page 25: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – Simple Calculus (2)

Function of one variable:

• The slope of the tangent line is given by the rate of change of F with x, dF/dx or the derivative of F.

• Furthermore, at a minimum (or maximum) value of the function F(x), the slope of the tangent line is zero ie dF/dx = 0

We can also define functions of more than one

variable…

Page 26: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – Simple Calculus (3)

Function of more than one variable:

• If F = F(x,y,z), a function of three variables x, y and z – then we can define 3 partial derivatives namely, ∂F/ ∂x, ∂F/∂y and ∂F/∂z.

You may be familiar with this notation from

your Thermodynamics studies…

Note that partial derivatives are very useful

in ALL branches of the Physical Sciences!

Page 27: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (5)

Recall our original problem of finding the minimum of the function SS(m,C):

• we need to find the values of m and C that make the two partial derivatives of SS vanish

• i.e. we need to solve the pair of equations:

$$\frac{\partial SS}{\partial m} = 0, \qquad \frac{\partial SS}{\partial C} = 0$$

Page 28: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (6)

• this is very easy to do for the straight line case

• Our SS function, SS(m,C), is given by the following (after some expansion!)

$$SS(m, C) = m^2 \sum X_i^2 + 2mC \sum X_i + N C^2 - 2m \sum X_i Y_i - 2C \sum Y_i + \sum Y_i^2$$

i.e. SS is of the general form:

$$SS(m, C) = A m^2 + 2H m C + B C^2 + 2E m + 2F C + G$$

Page 29: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (7)

For the straight line, the Sum of Squares function is a quadratic surface in m and C. Its contours are conic sections, i.e. a contour map of this surface will be a series of concentric ellipses.

[Figure: the SS surface plotted against m and C]

Page 30: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (8)

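When we perform the two differentiations, we get:

$$\frac{\partial SS}{\partial m} = 2m \sum X_i^2 + 2C \sum X_i - 2 \sum X_i Y_i = 0$$

$$\frac{\partial SS}{\partial C} = 2m \sum X_i + 2NC - 2 \sum Y_i = 0$$

or

$$\frac{\partial SS}{\partial m} = 2Am + 2HC + 2E$$

$$\frac{\partial SS}{\partial C} = 2Hm + 2BC + 2F$$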

Page 31: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – theory (9)

These two linear equations are very easy to solve

for both m and C…

(you can try doing this as an exercise… !)

=> any good pocket calculator can do linear least

squares fits to data!
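For anyone who wants to check their answer to the exercise, here is one possible sketch. Solving the two linear equations gives m = (N ΣXiYi - ΣXi ΣYi) / (N ΣXi² - (ΣXi)²) and C = (ΣYi - m ΣXi) / N. The function name fit_line is hypothetical, and the data are the five points from the earlier example.

```python
X = [1.1, 2.0, 2.9, 4.2, 5.0]
Y = [1.4, 1.8, 2.2, 2.8, 2.9]


def fit_line(xs, ys):
    """Solve dSS/dm = 0 and dSS/dC = 0 for the slope m and intercept c."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero only if all X values are identical
    m = (n * sxy - sx * sy) / det
    c = (sy - m * sx) / n
    return m, c


m, c = fit_line(X, Y)
print(f"m = {m:.4f}, C = {c:.4f}")
```

Only five running sums are needed (N, ΣX, ΣY, ΣX², ΣXY), which is why a pocket calculator can manage the job.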

Now, we’ll look at a few applications of

Linear Least Squares theory…

Page 32: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (1)

Linear least squares analysis can be extended to help with other problems.

Often, you will encounter polynomials, e.g.

$$P_n(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_n x^n$$

and we can set up an SS function for this n’th degree polynomial, i.e.

$$SS = \sum_i (y_i - f_i)^2 = SS(a_0, a_1, a_2, a_3, \dots, a_n)$$

Page 33: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (2)

For a polynomial of degree ‘n’, we have to solve the following system of ‘n+1’ linear equations:

$$\frac{\partial SS}{\partial a_0} = 0, \quad \frac{\partial SS}{\partial a_1} = 0, \quad \dots, \quad \frac{\partial SS}{\partial a_n} = 0$$

These equations can be solved by standard matrix methods using ‘Linear Algebra’.

The Microsoft EXCEL spreadsheet computer program can do linear least squares fits of this more general kind via the LINEST function.
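To see what "standard matrix methods" might look like in practice, the sketch below builds the n+1 linear equations for a polynomial and solves them by Gaussian elimination. It is an illustration only and is not how LINEST is implemented; the function names and the test data (generated from y = 1 + 2x + 3x²) are invented for the example.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def polyfit(xs, ys, degree):
    """Least squares coefficients [a0, a1, ..., an] of an n'th degree polynomial."""
    size = degree + 1
    # Setting dSS/da_k = 0 gives: sum_j a_j * sum_i x_i**(j+k) = sum_i y_i * x_i**k
    a = [[sum(x ** (j + k) for x in xs) for j in range(size)] for k in range(size)]
    b = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(size)]
    return solve_linear(a, b)


# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
coeffs = polyfit(xs, ys, 2)
print(coeffs)
```

For high degree polynomials these equations become badly conditioned, so a sketch like this is best kept to low degrees.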

Page 34: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (3)

Two uses of polynomials…

1) Calibration curves

In Chemistry, polynomial functions are often used

to construct calibration curves for some analytical

technique such as Mass Spectrometry

or Atomic Absorption

Here the instrument response is known to be

describable by a polynomial function

(usually a 3rd or 4th degree polynomial).

Page 35: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (4)

Two uses of polynomial functions…

2) Data Smoothing

Another application of polynomials and linear least

squares fitting is data smoothing and interpolation.

Example – X Ray scattering data from amorphous

polymer samples (Dr Arrighi).

These experiments can generate HUGE data files!

Page 36: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (5)

The Intensity vs angle and temperature data are

conveniently stored as 2 dimensional Excel

spreadsheets.

Also, the I(Q,T) vs Q plots for a fixed temperature,

T, are often found to be very noisy.

Quadratic polynomials can be used to ‘smooth’ the

data so that important features stand out.

How does this work?

Page 37: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (6)

First, here’s the original noisy data:

[Figure: plot of I(Q,T) against Q, original noisy data]

Page 38: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (7)

And now, here’s the smoothed data:

[Figure: plot of I(Q,T) against Q, smoothed data]

Page 39: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (8)

And here’s how smoothing works…

[Figure: diagram labelled “Original data point”, “Interpolated data point” and “Smoothing Polynomial”]
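One common way to do quadratic smoothing is sketched here: a quadratic is least squares fitted to each group of 5 neighbouring points and the middle point is replaced by the fitted value. For equally spaced data this reduces to the fixed Savitzky-Golay weights (-3, 12, 17, 12, -3)/35. Whether the lecture used exactly this window size or scheme is not stated, so treat the choice of 5 points and the sample intensities as assumptions.

```python
# Savitzky-Golay weights for a 5-point window and a quadratic polynomial.
# They assume the data points are equally spaced along the Q axis.
WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
NORM = 35.0


def smooth_quadratic(ys):
    """Replace each interior point by the value of its local quadratic fit."""
    smoothed = list(ys)  # the two points at each end are left unchanged
    for i in range(2, len(ys) - 2):
        window = ys[i - 2:i + 3]
        smoothed[i] = sum(w * y for w, y in zip(WEIGHTS, window)) / NORM
    return smoothed


# Hypothetical intensities: a smooth parabola plus one noisy spike at index 3.
clean = [0.1 * (q - 3) ** 2 for q in range(7)]
noisy = list(clean)
noisy[3] += 0.7
print(smooth_quadratic(noisy))
```

Data that already follow a quadratic pass through unchanged, while the spike is roughly halved. A genuine sharp feature would be flattened in the same way.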

Page 40: MChem Computing and Chemistry [B14SC3]

Linear Least Squares – applications (9)

Data Smoothing is a potentially risky operation…

You must be very careful when you do this…

Why?

Because you could be throwing away some vital

information – especially if you use too much data

smoothing!

Page 41: MChem Computing and Chemistry [B14SC3]

Data Analysis for Beginners…

1. Introduction to Data Analysis.

2. Linear Least Squares.

3. Nonlinear Least Squares.

4. Theoretical Models (and Maths).

5. Errors (and what to do with them!).

Page 42: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares

• Often we need to fit data to a more complicated nonlinear function…

• The sum of squares equations are then also nonlinear…

• and so we have to use other methods to solve the minimisation problem…

Page 43: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares

• Another feature of this type of problem is that you have to supply a reasonable starting guess for the parameters used in your theoretical model…

• You must have a feeling for the behaviour of your model function…

• Good idea to plot out both your data and model function on the SAME graph!

Page 44: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares

• By trying out several different sets of parameter values, you can get a rough idea of where a good starting guess is.

• Once a suitable starting set of parameters has been found, then there are several methods (algorithms) that can be used to locate the minimum in the SS function.

Page 45: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares Example

• For a good example of a Chemistry based nonlinear curve fitting exercise, one can look at First Order Chemical Kinetics.

• The concentration of a new molecule that is being produced in a chemical reaction that follows First Order Kinetics can be described as:-

Page 46: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares Example

$$c(t) = c_\infty [1 - \exp(-kt)]$$

here c(t) is the concentration at any time t, and c∞ is the final steady state concentration, i.e. c∞ = c(∞).

To see where this comes from, let’s look at the rate of change of c with time, i.e. dc/dt.

Page 47: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares Example

$$\frac{dc(t)}{dt} = k c_\infty \exp(-kt) = k [c_\infty - c(t)]$$

or equivalently

$$\frac{dc(t)}{dt} + k c(t) = k c_\infty$$

This is an example of a first order linear differential equation.

Page 48: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares Example

Here is a plot of the c(t) function:-

Page 49: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares Example

• For some models, one can transform a nonlinear function into a linear function by using some maths…

• e.g. if the model was c(t) = c0 exp(-kt) then taking logs would give a linear equation, ln c(t) = ln c0 - kt

• However, this trick does not work for our particular first order kinetics problem!
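The log trick for the simple decay model can be sketched as follows: fit a straight line to ln c against t, then read off k from the slope and c0 from the intercept. The function name and the data (generated with c0 = 2.0 and k = 0.3) are hypothetical. For the growth model c(t) = c∞[1 - exp(-kt)] the same idea would need ln(c∞ - c), which requires c∞ to be known in advance.

```python
import math


def fit_exponential_decay(ts, cs):
    """Fit c(t) = c0*exp(-k*t) by a straight line fit of ln(c) against t."""
    logs = [math.log(c) for c in cs]  # requires every c > 0
    n = len(ts)
    st = sum(ts)
    sl = sum(logs)
    stt = sum(t * t for t in ts)
    stl = sum(t * l for t, l in zip(ts, logs))
    slope = (n * stl - st * sl) / (n * stt - st * st)
    intercept = (sl - slope * st) / n
    return math.exp(intercept), -slope  # c0, k


# Hypothetical noise-free data generated with c0 = 2.0 and k = 0.3.
ts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
cs = [2.0 * math.exp(-0.3 * t) for t in ts]
c0, k = fit_exponential_decay(ts, cs)
print(c0, k)
```

With real (noisy) data, taking logs also changes the weighting of the points, so the result need not agree exactly with a direct nonlinear fit.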

Page 50: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares Example

• We need to set this up as a nonlinear least squares curve fitting problem.

• Get a rough idea of possible starting values for the parameters from the c(t) graph.

• Rate Constant, k, obtained from the initial slope at t = 0 (this slope equals k c∞)

• Steady State Concentration, c∞, from long time data.
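These two rules of thumb can be turned into a few lines of code. The sketch assumes the data start at (or very near) t = 0 and run long enough to reach the plateau; the function name and the sample data (generated with k = 0.5 and c∞ = 2.0) are illustrative only.

```python
import math


def starting_guesses(ts, cs):
    """Rough starting values (k0, c_inf0) for c(t) = c_inf*(1 - exp(-k*t))."""
    c_inf0 = cs[-1]  # plateau estimated from the last (long time) point
    # Initial slope from the first two points; dc/dt at t = 0 equals k*c_inf.
    slope0 = (cs[1] - cs[0]) / (ts[1] - ts[0])
    return slope0 / c_inf0, c_inf0


# Hypothetical noise-free data generated with k = 0.5 and c_inf = 2.0.
ts = [0.0, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
cs = [2.0 * (1.0 - math.exp(-0.5 * t)) for t in ts]
k0, c_inf0 = starting_guesses(ts, cs)
print(k0, c_inf0)
```

The guesses do not need to be accurate, only close enough for the search method to find its way to the minimum.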

Page 51: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares Example

Parameter estimation…

Page 52: MChem Computing and Chemistry [B14SC3]

Nonlinear Least Squares Example

• A knowledge of the Chemistry or Physics behind any curve fitting problem can be very useful in helping you to pick good starting parameter values!

• Also, once you have got a curve fit to your data you need to be able to:

a) interpret the results and

b) estimate errors in your parameters.

Page 53: MChem Computing and Chemistry [B14SC3]

Questions to ask about first…

Before starting to look at your particular curve fitting problem, you should ask yourself the following questions:

1 Do I have enough data points? (Straight line plots need at least 5 data points)

2 Can my data be fitted to a linear model? (e.g. straight line, polynomial or using transformed datasets)

Page 54: MChem Computing and Chemistry [B14SC3]

Questions to ask about first…

3 Am I using too many parameters?

Remember Occam’s Razor – assume as few hypotheses in your theory as possible, i.e. Keep it Simple!

There is a danger of overparametrising a model by using a 5th degree polynomial when only a 2nd degree quadratic would do…

Page 55: MChem Computing and Chemistry [B14SC3]

Questions to ask about first…

and a very important final question…

4 Do I really understand how nonlinear least squares curve fitting works?

We’ll look at this now.

Page 56: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

Here, we try to describe our data by some more complicated function

y = f(x, p1, p2, p3, … pm)

and we have to find best fit values for the ‘m’ parameters {p1, p2, p3, … pm}

The SS function now becomes

$$SS = \sum_i (y_i - f_i)^2 = SS(p_1, p_2, p_3, \dots, p_m)$$

Page 57: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

where fi = f(xi, p1, p2, p3, … pm) is the value of f(x) at the i’th data point xi

• The system of equations that we get by setting the ‘m’ partial derivatives of SS equal to zero is now found to be nonlinear

• The SS function itself may now have more than one minimum value

Page 58: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

• We must try and find the lowest possible minimum value, i.e. the global minimum.

• This is a lot trickier to do than what we have been used to with the previous linear least squares fits.

But don’t panic! There are ways to deal with this situation.

Page 59: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

• We use a search method to explore the multidimensional (hyperspace) surface defined by the SS function.

• We need a starting value for our parameters and we may also need expressions for the partial derivatives (∂f/∂pj) of our model function f.

• These are used to find the fastest route to the minimum.

Page 60: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

Search methods can include:

• Steepest Descent

• Parabolic Surface Approximation or Quadratic Form.

• Simplex Algorithm

• Genetic Algorithms

Page 61: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

• In practice, most good programs will use a combination of the first two methods

• e.g. the Levenberg–Marquardt Algorithm.

• The search method will continue looking for the minimum (iterating) until some suitable stopping criterion has been met.
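To make the idea of "iterate until a stopping criterion is met" concrete, here is a bare-bones Levenberg–Marquardt style loop for the two parameter kinetics model c(t) = c∞[1 - exp(-kt)]. It is a teaching sketch, not the code used by any particular program: the damping schedule (divide or multiply by 10), the stopping tolerance and the test data (generated with k = 0.5, c∞ = 2.0) are all assumptions.

```python
import math


def model(t, k, c_inf):
    return c_inf * (1.0 - math.exp(-k * t))


def sum_of_squares(ts, cs, k, c_inf):
    return sum((c - model(t, k, c_inf)) ** 2 for t, c in zip(ts, cs))


def levenberg_marquardt(ts, cs, k, c_inf, max_iter=100, tol=1e-12):
    """Minimise SS(k, c_inf) from a starting guess; returns (k, c_inf, SS)."""
    lam = 1e-3  # damping: large -> steepest descent, small -> Gauss-Newton
    ss = sum_of_squares(ts, cs, k, c_inf)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r using the analytic partial derivatives.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for t, c in zip(ts, cs):
            e = math.exp(-k * t)
            d_k = c_inf * t * e  # df/dk
            d_c = 1.0 - e        # df/dc_inf
            r = c - model(t, k, c_inf)
            a11 += d_k * d_k
            a12 += d_k * d_c
            a22 += d_c * d_c
            g1 += d_k * r
            g2 += d_c * r
        # Solve the damped 2x2 system for the parameter step.
        b11 = a11 * (1.0 + lam)
        b22 = a22 * (1.0 + lam)
        det = b11 * b22 - a12 * a12
        step_k = (g1 * b22 - g2 * a12) / det
        step_c = (g2 * b11 - g1 * a12) / det
        new_ss = sum_of_squares(ts, cs, k + step_k, c_inf + step_c)
        if new_ss < ss:
            improvement = ss - new_ss
            k, c_inf, ss = k + step_k, c_inf + step_c, new_ss
            lam /= 10.0  # accept the step and trust the quadratic form more
            if improvement < tol * (1.0 + ss):  # stopping criterion
                break
        else:
            lam *= 10.0  # reject the step and move towards steepest descent
    return k, c_inf, ss


# Hypothetical noise-free data generated with k = 0.5 and c_inf = 2.0.
ts = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0]
cs = [model(t, 0.5, 2.0) for t in ts]
k_fit, c_inf_fit, ss_fit = levenberg_marquardt(ts, cs, 0.3, 1.5)
print(k_fit, c_inf_fit, ss_fit)
```

A step is only accepted when it lowers SS, so the search can never end up worse than its starting guess.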

Page 62: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

• The EXCEL spreadsheet ‘Solver’ add in uses a gradient based search (the GRG2 ‘Generalized Reduced Gradient’ code), which belongs to the same family as the ‘steepest descent’ method

• it evaluates the partial derivatives (∂f/∂pj) numerically, i.e. by approximation

• your model function, f, must be smooth and well behaved, e.g. no spiky or sharp bits!
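Evaluating a partial derivative "by approximation" usually means a finite difference: nudge one parameter a little, see how much f changes, and divide. The sketch below uses central differences as an illustration. The exact scheme and step size that Solver uses are not given in the lecture, so both are assumptions here, as are the function names.

```python
import math


def model(t, params):
    """First order kinetics model with params = [k, c_inf]."""
    k, c_inf = params
    return c_inf * (1.0 - math.exp(-k * t))


def numerical_partials(f, x, params, rel_step=1e-6):
    """Approximate df/dp_j for every parameter by central differences."""
    partials = []
    for j, p in enumerate(params):
        h = rel_step * max(abs(p), 1.0)  # step scaled to the parameter size
        up = list(params)
        down = list(params)
        up[j] = p + h
        down[j] = p - h
        partials.append((f(x, up) - f(x, down)) / (2.0 * h))
    return partials


# Compare with the analytic derivatives at a hypothetical point.
t, k, c_inf = 2.0, 0.5, 2.0
approx = numerical_partials(model, t, [k, c_inf])
exact = [c_inf * t * math.exp(-k * t), 1.0 - math.exp(-k * t)]
print(approx, exact)
```

This is also why spiky model functions cause trouble: a finite difference taken across a sharp corner says very little about the true slope.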

Page 63: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

Let’s now look at a two parameter model. This uses parameters p1 and p2, so we can write f = f(x, p1, p2)

• this could be our earlier First Order Chemical Kinetics example

• where x = time t, p1 = k, and p2 = c∞.

Page 64: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

Here’s a possible contour map of the Sum of Squares function SS(p1,p2)

[Figure: contour map of SS plotted against p1 and p2]

Page 65: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

Here’s the result of using a search method to look for the minima in the SS function:

[Figure: search path drawn on the contour map of SS against p1 and p2]

Page 66: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

• If it is computationally difficult or expensive to calculate function derivatives, then there are other search methods such as the SIMPLEX method.

• For a two parameter problem, this uses three points on the SS surface to define a Simplex shape which can be moved to hunt for a minimum…

Page 67: MChem Computing and Chemistry [B14SC3]

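The SIMPLEX method…

A 2 parameter simplex search

[Figure: SS surface plotted against p1 and p2]

Page 68: MChem Computing and Chemistry [B14SC3]

The SIMPLEX method…

Step #0

[Figure: simplex on the SS surface plotted against p1 and p2]

Page 69: MChem Computing and Chemistry [B14SC3]

The SIMPLEX method…

Step #1

[Figure: simplex on the SS surface plotted against p1 and p2]

Page 70: MChem Computing and Chemistry [B14SC3]

The SIMPLEX method…

Step #2

[Figure: simplex on the SS surface plotted against p1 and p2]

Page 71: MChem Computing and Chemistry [B14SC3]

The SIMPLEX method…

Step #3

[Figure: simplex on the SS surface plotted against p1 and p2]

Page 72: MChem Computing and Chemistry [B14SC3]

The SIMPLEX method…

Step #4

[Figure: simplex on the SS surface plotted against p1 and p2]

Page 73: MChem Computing and Chemistry [B14SC3]

The SIMPLEX method…

Step #6

[Figure: simplex on the SS surface plotted against p1 and p2]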

Page 74: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

• For a 3 parameter model, the simplex method would use 4 points (a tetrahedral simplex)

• For a model with ‘m’ parameters, we would use a simplex constructed from ‘m+1’ points on the ‘m’ dimensional SS hypersurface.
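The simplex search described on the last few slides can be sketched in code using a simplified version of the Nelder-Mead rules: reflect the worst point through the centroid of the others, expand if that works well, contract if it does not, and shrink the whole simplex as a last resort. The coefficients (1, 2, 0.5), the initial step size and the kinetics test data are conventional or invented choices, not details taken from the lecture.

```python
import math


def nelder_mead(func, start, step=0.2, max_iter=500, tol=1e-12):
    """Minimise func(params) with a simplex of len(start) + 1 points."""
    n = len(start)
    # Initial simplex: the start point plus one point shifted along each axis.
    simplex = [list(start)]
    for j in range(n):
        point = list(start)
        point[j] += step
        simplex.append(point)
    for _ in range(max_iter):
        simplex.sort(key=func)
        best, worst = simplex[0], simplex[-1]
        if abs(func(worst) - func(best)) < tol:
            break
        # Centroid of every point except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(scale):
            # Point on the line through the worst point and the centroid.
            return [c + scale * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if func(reflected) < func(best):
            expanded = along(2.0)
            simplex[-1] = expanded if func(expanded) < func(reflected) else reflected
        elif func(reflected) < func(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if func(contracted) < func(worst):
                simplex[-1] = contracted
            else:
                # Shrink every point halfway towards the best one.
                simplex = [[b + 0.5 * (p[j] - b) for j, b in enumerate(best)]
                           for p in simplex]
    simplex.sort(key=func)
    return simplex[0]


# Hypothetical noise-free kinetics data generated with k = 0.5, c_inf = 2.0.
TS = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0]
CS = [2.0 * (1.0 - math.exp(-0.5 * t)) for t in TS]


def ss(params):
    k, c_inf = params
    return sum((c - c_inf * (1.0 - math.exp(-k * t))) ** 2 for t, c in zip(TS, CS))


k_fit, c_inf_fit = nelder_mead(ss, [0.3, 1.5])
print(k_fit, c_inf_fit)
```

No derivatives of the model are needed anywhere, which is the attraction of the method. The price is usually more evaluations of the SS function than a gradient based search.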

Page 75: MChem Computing and Chemistry [B14SC3]

How does NLLSQ work?

• There are some situations where you need to curve fit several sets of data at the same time…

• This can be achieved by generalising the idea of the Sum of Squares to include several data sets, e.g.

$$SS = \sum_{k=1}^{M} \sum_{i=1}^{N(k)} (y_{i,k} - f_{i,k})^2$$
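In code, the generalised Sum of Squares is just a second loop over the data sets. The sketch assumes that every data set is described by the same model function with one shared set of parameters; the straight line model and the two small data sets are made up for the example.

```python
def global_sum_of_squares(model, params, datasets):
    """Sum squared residuals over M data sets that share one parameter set."""
    total = 0.0
    for xs, ys in datasets:  # data set k holds its own N(k) points
        total += sum((y - model(x, params)) ** 2 for x, y in zip(xs, ys))
    return total


def line(x, params):
    m, c = params
    return m * x + c


# Two hypothetical data sets of different length.
datasets = [
    ([0.0, 1.0, 2.0], [1.0, 3.1, 5.0]),
    ([0.5, 1.5], [2.0, 3.9]),
]
total = global_sum_of_squares(line, [2.0, 1.0], datasets)
print(total)
```

The single number returned can be handed to any of the search methods, so the shared parameters are fitted to all of the data at once.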

Page 76: MChem Computing and Chemistry [B14SC3]
Page 77: MChem Computing and Chemistry [B14SC3]

NLLSQ problem?

• There is one important problem that you will experience at some stage in your career when doing curve fitting…

• It is one which is not obvious at first sight, and can cause unnecessary grief to the inexperienced…

…especially when writing up your PhD!!

Page 78: MChem Computing and Chemistry [B14SC3]

NLLSQ problem?

• The technical description of this problem is “Overparametrisation of a Model”

• What happens is that one of your model parameters depends implicitly on some of the other parameters.

• This is not easy to spot at first!

Page 79: MChem Computing and Chemistry [B14SC3]

Overparametrisation…

An example

A 4th year project student was asked to fit their data to the following function:

$$f(x) = \frac{ax + b}{cx + d}$$

This seems to be a 4 parameter function.

Page 80: MChem Computing and Chemistry [B14SC3]

Overparametrisation…

To their surprise, they found that more than

one set of parameters gave rise to an

identical curve fit!

What is going on here?

Some further analysis is required…

Page 81: MChem Computing and Chemistry [B14SC3]

Overparametrisation…

Notice that for this function, we can divide both the top and the bottom of the fraction by ‘a’.

This yields the following expression for f:

$$f(x) = \frac{x + (b/a)}{(c/a)x + (d/a)}$$

Page 82: MChem Computing and Chemistry [B14SC3]

Overparametrisation…

and this can be written as:

$$f(x) = \frac{x + p_1}{p_2 x + p_3}$$

where p1 = b/a, p2 = c/a, and p3 = d/a.

This analysis shows that we really have a 3 parameter model and not a 4 parameter one!
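A quick numerical check makes the point. In the sketch below, two different sets of (a, b, c, d) that differ only by a common factor give exactly the same curve, and both agree with the reduced 3 parameter form. The parameter values are arbitrary illustrations, and the reduction assumes that a is not zero.

```python
def f4(x, a, b, c, d):
    """Four parameter form f(x) = (a*x + b) / (c*x + d)."""
    return (a * x + b) / (c * x + d)


def f3(x, p1, p2, p3):
    """Equivalent three parameter form f(x) = (x + p1) / (p2*x + p3)."""
    return (x + p1) / (p2 * x + p3)


# Two hypothetical parameter sets that differ by a common factor of 2.
set_one = (2.0, 1.0, 4.0, 3.0)
set_two = (4.0, 2.0, 8.0, 6.0)
a, b, c, d = set_one
reduced = (b / a, c / a, d / a)

for x in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(x, f4(x, *set_one), f4(x, *set_two), f3(x, *reduced))
```

Since any common factor gives the same curve, a fitting program has no way of choosing between these parameter sets.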

Page 83: MChem Computing and Chemistry [B14SC3]

Overparametrisation…

In more general terms

• We have some model function f = f(x, p1, p2, p3, p4), and, unknown to us, the parameter p3 is also a function of p1, p2 and p4, i.e. we have p3 = g(p1, p2, p4)

• This means that we are really dealing with a 3 parameter model, h(x, p1, p2, p4), where h = f(g(…)) and g is unknown!

Page 84: MChem Computing and Chemistry [B14SC3]

Overparametrisation…

From the perspective of the SS function,

we are not dealing with a true minimum

• instead we have a surface that is now more like a valley (the Grand Canyon effect).

• What is the answer to this problem?

Page 85: MChem Computing and Chemistry [B14SC3]

Overparametrisation Fixed!

Fix one of the parameters to a sensible value

and then let the others vary!

1 The assumption is that you have enough

information about your model to make a sensible choice for this parameter (and its value)

2 This may require you to perform different

additional experiments!

Page 86: MChem Computing and Chemistry [B14SC3]

Data Analysis for Beginners…

1. Introduction to Data Analysis.

2. Linear Least Squares.

3. Nonlinear Least Squares.

4. Theoretical Models (and Maths).

5. Errors (and what to do with them!).

Page 87: MChem Computing and Chemistry [B14SC3]

Theoretical Models and Maths

Some examples from my past (mainly polymer science).

• Polymer Physical Ageing data – uses KWW function, f(t) = exp[ -(t/τ)^β ]

• Deconvolution of ‘FTIR’ peaks: use of Gaussian functions to do peak area determination

Page 88: MChem Computing and Chemistry [B14SC3]

Theoretical Models and Maths

[Figure: FTIR peak constructed from 3 Gaussian functions; vertical axis from 0.0 to 100.0, horizontal axis from 1640 to 1800]

Might be used in Copolymer Composition Analysis…

Page 89: MChem Computing and Chemistry [B14SC3]

Theoretical Models and Maths

• Copolymer Reactivity Ratios determination

• Tg vs “wt fraction” plots => Kwei equation

• Asymmetric peaks fitted with Exponentially Modified Gaussian function, e.g. data from a Dynamic Mechanical Thermal Analyser (DMTA) experiment.

Page 90: MChem Computing and Chemistry [B14SC3]

Theoretical Models and Maths

• Neutron Scattering data fitted with Dawson’s Integral Function…

$$F(x) = e^{-x^2} \int_0^x e^{t^2} \, dt$$

Page 91: MChem Computing and Chemistry [B14SC3]

Using the Excel ‘Solver’ add in

Spreadsheet layout (simple example):

p1       p2       p3       SS
3.5      -2       0.4      2.1460

X        Y        F        Resid^2
0.0      2.92     3.50     0.3364
1.0      1.94     1.90     0.0016
2.0      1.55     1.10     0.2025
3.0      1.55     1.10     0.2025
4.0      2.09     1.90     0.0361
5.0      3.20     3.50     0.0900
6.0      4.77     5.90     1.2769

[Figure: chart titled “My Data”, plotted against X from 0.0 to 7.0, vertical axis from 0.00 to 7.00]

Page 92: MChem Computing and Chemistry [B14SC3]

Using the Excel ‘Solver’ add in

Spreadsheet layout (simple example):

1. Data, Fitting Function and Residuals clearly laid out

2. Parameter values and Sum of Squares clearly laid out

3. Graph of your data (points) and fitted function (curve). This gives you immediate visual feedback!!!

Page 93: MChem Computing and Chemistry [B14SC3]

Using the Excel ‘Solver’ add in

After invoking the Solver Dialog…

1. Set the Target Cell [$D$4]

2. Equal to ( )Max (*)Min ( ) Value of [ ]

3. By Changing Cells: [$A$4:$C$4]

4. Then hit the Solve button!

Page 94: MChem Computing and Chemistry [B14SC3]

Using the Excel ‘Solver’ add in

Spreadsheet layout (after Solver):

p1          p2          p3          SS
2.921429    -1.21607    0.253928    0.0035

X        Y        F        Resid^2
0.0      2.92     2.92     0.0000
1.0      1.94     1.96     0.0004
2.0      1.55     1.51     0.0020
3.0      1.55     1.56     0.0001
4.0      2.09     2.12     0.0009
5.0      3.20     3.19     0.0001
6.0      4.77     4.77     0.0000

[Figure: chart titled “My Data”, plotted against X from 0.0 to 7.0, vertical axis from 0.00 to 6.00]
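The Solver result can be cross-checked outside Excel. The F column in the two tables is consistent with the quadratic model F = p1 + p2 X + p3 X²; this is an inference from the tabulated numbers, as the slides do not state the model. Since a quadratic is linear in its parameters, the sketch below solves the three linear least squares equations directly rather than searching. The function names are hypothetical.

```python
X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.92, 1.94, 1.55, 1.55, 2.09, 3.20, 4.77]


def model(x, p1, p2, p3):
    # Assumed form, inferred from the tabulated F column.
    return p1 + p2 * x + p3 * x * x


def sum_of_squares(p1, p2, p3):
    return sum((y - model(x, p1, p2, p3)) ** 2 for x, y in zip(X, Y))


def fit_quadratic(xs, ys):
    """Solve the three linear least squares equations by Gaussian elimination."""
    rows = [[sum(x ** (j + k) for x in xs) for j in range(3)]
            + [sum(y * x ** k for x, y in zip(xs, ys))] for k in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(rows[r][k] * p[k] for k in range(r + 1, 3))
        p[r] = (rows[r][3] - tail) / rows[r][r]
    return p


ss_start = sum_of_squares(3.5, -2.0, 0.4)  # the starting guess in the first table
p1, p2, p3 = fit_quadratic(X, Y)
ss_fit = sum_of_squares(p1, p2, p3)
print(ss_start, (p1, p2, p3), ss_fit)
```

Agreement between the two routes is a useful sanity check that the spreadsheet has been set up correctly.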

Page 95: MChem Computing and Chemistry [B14SC3]

Good Practice…

Before any curve fit:

• Draw a Graph of your data!

• Can you evaluate the model function in a single spreadsheet cell?

• If not then you may need to use special techniques

Page 96: MChem Computing and Chemistry [B14SC3]

Good Practice…

Special spreadsheet techniques:

1 Use extra columns as workspace

2 Or consider writing your own User Defined Function by employing Excel’s built in programming language (VBA)

This is better if you are dealing with a large data set, and is less prone to errors when setting up your spreadsheet!

Page 97: MChem Computing and Chemistry [B14SC3]

Excel User Defined functions…

a) in the spreadsheet cell use:

“=MyFunc(A4,$B$1,$B$2,$B$3)” to evaluate a user function with 1 variable and 3 parameters.

The $’s mean you are using absolute references: these won’t change when copying cells!

b) open the VBA editor (with Alt+F11), insert a new Module (Insert > Module), and then use the following template code:

Page 98: MChem Computing and Chemistry [B14SC3]

Excel User Defined functions…

e.g.

Function MyFunc(x As Double, p1 As Double, p2 As Double, p3 As Double) As Double

    ' Model with 1 variable (x) and 3 parameters (p1, p2, p3)
    MyFunc = (x + p1) / (p2 * x + p3)

End Function

Page 99: MChem Computing and Chemistry [B14SC3]

Good Practice…

After any curve fit:

• ALWAYS look at the Residuals…

• Can give you a better idea of the quality of fit to your data!

• May indicate that a different model/theory is needed…

and also that your supervisor got it wrong!

Page 100: MChem Computing and Chemistry [B14SC3]

Data Analysis for Beginners…

1. Introduction to Data Analysis.

2. Linear Least Squares.

3. Nonlinear Least Squares.

4. Theoretical Models (and Maths).

5. Errors (and what to do with them!).

Page 101: MChem Computing and Chemistry [B14SC3]

Errors (and what to do with them!)

• This really needs a separate lecture!

• For Linear fits – use Excel’s LINEST function, as this will also report parameter errors

• For Nonlinear curve fits, this is more tricky and 3rd party add ins are required…

Page 102: MChem Computing and Chemistry [B14SC3]

Errors (and what to do with them!)

• Could use “Solver Aid”, which is a VBA Excel Macro which will estimate the errors in your fitted parameters

• This is available from the website of Robert de Levie, who also has a book…

“Advanced Excel for Scientific Data Analysis” (Oxford University Press)

Page 103: MChem Computing and Chemistry [B14SC3]

Errors (and what to do with them!)

• How does “Solver Aid” work?

• Need to look at what happens to the SS function near the minimum when all parameters are frozen apart from one

• This gives a function of one variable in the parameter of choice, eg p2

• the shape of this function near the minimum determines the error in p2
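The idea in these bullet points can be sketched numerically for the straight line example. All parameters except one are frozen at their fitted values, the curvature (second derivative) of SS along the remaining parameter is measured, and a sharper minimum means a smaller error. The formula used, σ² ≈ 2s² / (d²SS/dp²) with s² = SS/(N - number of parameters), is an assumption about how to turn curvature into an error bar; it is not taken from the “Solver Aid” source code.

```python
import math

X = [1.1, 2.0, 2.9, 4.2, 5.0]
Y = [1.4, 1.8, 2.2, 2.8, 2.9]


def ss(m, c):
    return sum((y - (m * x + c)) ** 2 for x, y in zip(X, Y))


def curvature(func, p, h=1e-3):
    """Second derivative of func at p by central differences."""
    return (func(p + h) - 2.0 * func(p) + func(p - h)) / (h * h)


# Best fit values from the two linear least squares equations.
n = len(X)
sx, sy = sum(X), sum(Y)
sxx = sum(x * x for x in X)
sxy = sum(x * y for x, y in zip(X, Y))
m_fit = (n * sxy - sx * sy) / (n * sxx - sx * sx)
c_fit = (sy - m_fit * sx) / n

# Freeze C at its fitted value and treat SS as a function of m only.
d2 = curvature(lambda m: ss(m, c_fit), m_fit)
variance = ss(m_fit, c_fit) / (n - 2)  # residual variance, 2 fitted parameters
sigma_m = math.sqrt(2.0 * variance / d2)
print(d2, sigma_m)
```

Freezing the other parameters ignores any correlation between them, so an estimate made this way can be too optimistic when the parameters are strongly correlated (as slope and intercept often are).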

Page 104: MChem Computing and Chemistry [B14SC3]

Conclusions…

We have considered:

• Introduction to Data Analysis.

• Linear Least Squares.

• Nonlinear Least Squares.

• Theoretical Models (and Maths).

• Errors (and what to do with them!).

Page 105: MChem Computing and Chemistry [B14SC3]

Conclusions

When you have mastered these skills,

you will then have started the journey to

becoming a Professional Scientist…

Have fun, and remember Dr Ferguson’s

42nd Law:-

DON’T PANIC!!