Estimating the accuracy of the approximation (surrogate) from assumption that error is due to...

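Estimating the accuracy of the approximation (surrogate)

Assuming the error is due to normally distributed, uncorrelated random variables, we obtain an estimate of the error standard deviation (called the standard error). This is the standard measure of accuracy:

$$\hat{\sigma}^2 = \frac{e^T e}{n_y - n_\beta}$$

where $e$ is the vector of residuals, $n_y$ is the number of data points, and $n_\beta$ is the number of coefficients.

The coefficient of multiple determination measures how much of the variability in the data is captured by the approximation:

$$R^2 = \frac{SS_r}{SS_y}, \qquad SS_r = \sum_{i=1}^{n_y} (\hat{y}_i - \bar{y})^2, \qquad SS_y = \sum_{i=1}^{n_y} (y_i - \bar{y})^2$$

The adjusted coefficient of multiple determination accounts for the fitting bias:

$$R_a^2 = 1 - (1 - R^2)\,\frac{n_y - 1}{n_y - n_\beta}$$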
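As a minimal sketch of how these three measures could be computed, the MATLAB snippet below fits a straight line by least squares and evaluates the standard error, R^2, and adjusted R^2. The data are the three strain/stress pairs from the Questions slide at the end of the deck; the choice of a line with intercept and all variable names (`sigma_hat`, `R2`, `R2a`, and so on) are assumptions made for illustration, not part of the original slides.

```matlab
% Accuracy measures for a least-squares line fit (illustrative sketch).
% Assumed data: the pairs (0,0), (1,1), (2,1) from the Questions slide.
x = [0; 1; 2];
y = [0; 1; 1];

X  = [ones(size(x)), x];    % basis functions: 1 and x
ny = numel(y);              % number of data points, n_y
nb = size(X, 2);            % number of coefficients, n_beta

b    = X \ y;               % least-squares coefficients
yhat = X * b;               % fitted values
e    = y - yhat;            % residuals

% Standard error: sigma_hat^2 = e'*e / (n_y - n_beta)
sigma_hat = sqrt((e' * e) / (ny - nb));

% Coefficient of multiple determination: R^2 = SS_r / SS_y
SSy = sum((y    - mean(y)).^2);
SSr = sum((yhat - mean(y)).^2);
R2  = SSr / SSy;

% Adjusted coefficient: R_a^2 = 1 - (1 - R^2)*(n_y - 1)/(n_y - n_beta)
R2a = 1 - (1 - R2) * (ny - 1) / (ny - nb);

fprintf('sigma_hat = %.4f, R2 = %.4f, R2a = %.4f\n', sigma_hat, R2, R2a);
```

With only three points and two coefficients, the adjusted value (0.5) falls well below R^2 (0.75), which shows how strongly the correction penalizes a fit with few degrees of freedom.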

Upload: jefferson-lynd

Post on 02-Apr-2015

TRANSCRIPT

  • Slide 1

Slide 2 Estimating the accuracy of the approximation (surrogate)

Assuming the error is due to normally distributed, uncorrelated random variables, we obtain an estimate of the error standard deviation (called the standard error). This is the standard measure of accuracy.
The coefficient of multiple determination measures how much of the variability in the data is captured by the approximation.
The adjusted coefficient of multiple determination accounts for the fitting bias.

Slide 3 Curve fit

    noise=randn(1,30);
    x=1:1:30;
    y=x+noise

    3.908 2.825 4.379 2.942 4.5314 5.7275 8.098 ... 25.84 27.47 27.00 30.96

    [p,s]=polyfit(x,y,1);
    yfit=polyval(p,x);
    plot(x,y,'+',x,x,'r',x,yfit,'b')

With dense data, the functional form is clear. The fit serves to filter out noise.

Slide 4 Example with y=0.1*x

    noise=randn(1,30);
    x=1:1:30;
    y=0.1*x+noise;
    xx=[ones(30,1),x'];
    [B,BINT,R,RINT,STATS] = regress(y',xx)

    STATS: 0.3016 12.0896 0.0017 1.7498

The four STATS entries returned by regress are R^2, the F statistic, its p-value, and the estimated error variance.

Slide 5 Estimating error in coefficients

Some coefficients are more accurately estimated than others.
The standard error of coefficient $b_i$ is $se(b_i) = \hat{\sigma}\sqrt{[(X^T X)^{-1}]_{ii}}$, where $X$ is the matrix of basis-function values at the data points.
The t-statistic is the ratio of a coefficient to its standard error; we would like it to be at least 2 in absolute value.
Coefficients that are poorly estimated may be dropped to improve the accuracy of predictions.
Dropping one coefficient changes the t-statistics of the others, so we need to iterate, dropping and adding coefficients.

Slide 6 Regression in Excel (add-in data analysis)

    Rand       Rand-0.5    x    y          fit        error
    0.764742    0.264742    1    1.264742   1.03539    0.03539
    0.258649   -0.24135     2    1.758649   2.031192   0.031192
    0.735026    0.235026    3    3.235026   3.026994   0.026994
    0.411036   -0.08896     4    3.911036   4.022797   0.022797
    ...
    0.674921    0.174921   24   24.17492   23.93884   -0.06116
    0.69481     0.19481    25   25.19481   24.93465   -0.06535
    0.647964    0.147964   26   26.14796   25.93045   -0.06955
    0.407839   -0.09216    27   26.90784   26.92625   -0.07375
    0.211674   -0.28833    28   27.71167   27.92205   -0.07795
    0.405013   -0.09499    29   28.90501   28.91786   -0.08214
    0.242633   -0.25737    30   29.74263   29.91366   -0.08634

Slide 7 Regression output

SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.999381
    R Square             0.998763
    Adjusted R Square    0.998719
    Standard Error       0.313962
    Observations         30

                  Coefficients   Standard Error   t Stat       P-value    Lower 95%   Upper 95%
    Intercept     0.039587       0.117570         0.336711     0.738845   -0.201245   0.280419
    X Variable 1  0.995802       0.006623         150.364623   2.93E-42    0.982237   1.009368

Slide 8 Output with y=0.1x

SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.969193
    R Square             0.939334
    Adjusted R Square    0.937168
    Standard Error       0.251021
    Observations         30

                  Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%
    Intercept     -0.19083       0.094            -2.03012    0.051942   -0.3834     0.0017
    X Variable 1   0.11025       0.005295         20.82177    1.41E-18    0.0994     0.1211

Slide 9 Example 3.2.1

Given the data below, use Microsoft Excel to fit linear and quadratic polynomials. Compare the standard errors and t-statistics of the coefficients.

    X   -2     0    1      2
    Y   -1.5   0    1.25   1.75

Slide 10 Linear fit

Slide 11 Quadratic fit

Slide 12 Graphical comparison

Slide 13 Cross validation

Error estimates based on model assumptions are vulnerable. For polynomial response surface approximations, the assumptions are rarely satisfied.
Cross validation divides the data into $n_g$ groups.
Fit the approximation to $n_g - 1$ groups and use the remaining group to estimate the error. Repeat for each group.
When each group consists of one point, the error measure is called PRESS (prediction error sum of squares).
Calculate the error at each point and then present the r.m.s. error.
It can be shown that the cross-validation error at point $i$ is $e_i / (1 - E_{ii})$, where $e_i$ is the ordinary residual and $E = X(X^T X)^{-1}X^T$, so no refitting is needed.
This shortcut can be used only if $X^T X$ is not ill-conditioned.

Slide 14 Questions

The pairs (0,0), (1,1), (2,1) represent strain (millistrains) and stress (ksi) measurements.
Estimate Young's modulus using the three commonly used error norms.
Estimate the error in Young's modulus using cross validation.
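The leave-one-out procedure from the cross-validation slide can be sketched in MATLAB as follows. This is one possible reading rather than the deck's own solution: it assumes a one-coefficient model, stress = E*strain through the origin, applied to the three pairs from the Questions slide, and all variable names (`b_loo`, `e_xv`, `e_short`) are hypothetical. It computes the cross-validation errors by refitting with each point left out, then checks them against the shortcut e_i/(1 - E_ii), where E = X*inv(X'*X)*X'.

```matlab
% Leave-one-out cross validation (PRESS) for a one-coefficient fit.
% Assumption: stress = E*strain, with no intercept.
x = [0; 1; 2];          % strain (millistrains)
y = [0; 1; 1];          % stress (ksi)
X = x;                  % design matrix with a single basis function
n = numel(y);

b = X \ y;              % least-squares coefficient from all points
e = y - X * b;          % ordinary residuals

% Direct approach: refit without point i, then predict at point i.
b_loo = zeros(n, 1);    % coefficient obtained with point i left out
e_xv  = zeros(n, 1);    % cross-validation error at point i
for i = 1:n
    keep = true(n, 1);
    keep(i) = false;
    b_loo(i) = X(keep, :) \ y(keep);
    e_xv(i)  = y(i) - X(i, :) * b_loo(i);
end

% Shortcut: no refitting, valid when X'*X is well conditioned.
E = X * ((X' * X) \ X');
e_short = e ./ (1 - diag(E));

PRESS  = sum(e_xv.^2);
rms_xv = sqrt(PRESS / n);
fprintf('b = %.3f, PRESS = %.3f, rms = %.4f\n', b, PRESS, rms_xv);
disp([b_loo, e_xv, e_short]);
```

The spread of the leave-one-out coefficients in `b_loo` gives a rough sense of how sensitive the fitted modulus is to each measurement. With more than one coefficient, `b_loo` would have to store a column per left-out point instead of a scalar.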