(c) (d) (b)

Upload: mathathlete

Post on 30-May-2018


  • 8/14/2019 (c) (d) (b)

    1/30

    In this chapter, students will learn:
    (a) concepts of scatter diagram, correlation coefficient and linear regression;
    (b) calculation and interpretation of the product moment correlation coefficient and of the equation of the least squares regression line;
    (c) interpolation and extrapolation;
    (d) use of a square, reciprocal or logarithmic transformation to achieve linearity.
    Note: Students are not required to learn
    (a) derivation of formulae;
    (b) hypothesis tests.

    1. Bivariate Data and Scatter Diagrams
    2. Product Moment Correlation Coefficient, r
    3. Regression Lines
    4. Interpolation and Extrapolation
    5. Linearization of Bivariate Data
    6. Miscellaneous Examples

    1. Bivariate Data and Scatter Diagrams
    The type of data with each observation having two measurements associated with it is called bivariate data.

    Eg 1
    x                                   y
    Age of a plant                      Quantity of fruit produced
    Height of students                  Weight of students
    Weight at the end of a spring       Length of the spring
    Diameter of stem of a plant         Average length of leaf of the plant
    No. of hrs spent studying           Marks achieved
    Time                                Temperature of cooling object


    Scatter Diagram
    The most common and convenient method of displaying a set of bivariate data is by means of a scatter diagram.

    We treat the bivariate pairs as a set of (x, y) coordinates and plot them as a graph to obtain a set of points. The scatter diagram will reveal the relationship between the two variables.

    Eg 2 The marks of a class of 10 students in a Mathematics examination are given in the table.

    Student                 A B C D E F G H I J
    x (mark in Paper 1)     12 84 50 42 33 50 69 81 50 :15
    y (mark in Paper 2)     31 u3 42 60 63 59 92'/3 ,10

    Use of GC to obtain Scatter Diagram
    GC:
    Step 1: Enter data


    Step 2: Plot the data (STAT PLOT)
    • Set Plot to 'ON': [1:Plot 1...] [ON] [ENTER]
    • Choose type of graph 'scatter plot'
    • Xlist: L1 (x-coordinates)
    • Ylist: L2 (y-coordinates)
    • Mark: Any
    • [TRACE]
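For readers without a GC to hand, the same idea (treat each pair as an (x, y) coordinate and mark it on a grid) can be sketched in plain Python. This is only an illustration, not part of the notes: the function name `text_scatter` and the grid size are hypothetical choices, and the marks used are the ten (x, y) pairs that appear in Eg 3b of these notes.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return the rows (top row first) of a coarse character-grid scatter plot."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate onto the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "X"  # flip so that larger y is drawn higher
    return ["".join(line) for line in grid]


# Mathematics (x) and Chemistry (y) marks, as tabulated in Eg 3b
maths = [18, 20, 30, 40, 46, 54, 60, 80, 88, 92]
chem = [42, 54, 60, 54, 62, 68, 80, 66, 80, 100]

rows = text_scatter(maths, chem)
print("\n".join(rows))
```

The printed grid rises from bottom left to top right, which is the pattern the next section describes as positive correlation.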


    Do it yourself
    Qn 1: The height and weight of a class of 10 students are given in the table below:

    Student              A     B     C     D     E     F     G     H     I     J
    x (Height in m)      1.5   1.58  1.6   1.61  1.65  1.72  1.73  1.78  1.8   1.85
    y (Weight in kg)     53    57    62    65    66    70    75    72    90    85

    Sketch a scatter diagram for the set of bivariate data given.

    Soln:


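    Analysis of Scatter Diagram

    [Figure: scatter diagram with points rising from left to right.]
    x and y related in this way are said to have a positive correlation (linear relationship), i.e. as x gets larger, y gets larger.

    [Figure: scatter diagram with points falling from left to right.]
    x and y related in this way are said to have a negative correlation (linear relationship), i.e. as x gets larger, y gets smaller.

    [Figure: scatter diagram with points showing no pattern.]
    No correlation (no clear relationship).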


    We will only be dealing with linear relationships. If points in the scatter diagram seem to lie near a straight line, we say that there is linear correlation between x and y.
    Note:
    • Scatter diagrams are used only for quantitative variables (e.g. height, mass, counts, etc).
    • Scatter diagrams can give us visual evidence of outliers.
    • Interpretation of the strength solely based on the scatter diagram is subjective and it can be deceiving when different scales for the axes are used.

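    [Figure: two scatter diagrams drawn with different axis scales. Scale: 1 to 5 (x-axis); 1 to 15 (y-axis). Scale: 1 to 50 (x-axis); 1 to 15 (y-axis).]

    2. Linear Product Moment Correlation Coefficient, r
    To measure the degree of linear relationship between two variables x and y (which is called correlation), a quantity called the product moment correlation coefficient, r, will be needed.
    The estimated product-moment correlation coefficient of a sample is given by:

    $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$ (Found in MF15)

    where $\bar{x}$ denotes the mean of all the x-values while $\bar{y}$ denotes the mean of all the y-values.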


    Important notes on the product-moment correlation coefficient, r
    1. $-1 \le r \le 1$, $r \in \mathbb{R}$
    2. Sign of r indicates the direction of linear correlation.
       r > 0: positive correlation


    [Figure: three scatter diagrams, labelled Diagram (a), Diagram (b) and Diagram (c).]

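    Eg 3a
    The marks in Mathematics (x) and Chemistry (y) obtained by ten randomly chosen JC 2 students were taken and the summarised data were given as follows:
    $\sum xy = 38640$, $\sum x^2 = 34464$, $\sum y^2 = 46820$, $\sum x = 528$, $\sum y = 666$
    Find the product moment correlation coefficient r and comment on the value of r obtained.

    Soln:
    $r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{38640 - \frac{(528)(666)}{10}}{\sqrt{\left(34464 - \frac{(528)^2}{10}\right)\left(46820 - \frac{(666)^2}{10}\right)}} = 0.863$ (correct to 3 sig fig)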
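The substitution in Eg 3a can be checked with a few lines of Python. This is a minimal sketch, reading the summarised data as the five sums quoted in the question with n = 10; the function name `pmcc_from_sums` is a hypothetical helper, not something from MF15 or the GC.

```python
from math import sqrt


def pmcc_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Product moment correlation coefficient from summarised data."""
    s_xy = sum_xy - sum_x * sum_y / n      # numerator of the MF15 formula
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


r = pmcc_from_sums(n=10, sum_x=528, sum_y=666,
                   sum_xy=38640, sum_x2=34464, sum_y2=46820)
print(round(r, 3))  # 0.863
```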


    Eg 3b
    The data in the above example is given in the table below instead of the summarised statistics. Find the product moment correlation coefficient r and comment on the value of r obtained.

    x    18   20   30   40   46   54   60   80   88   92
    y    42   54   60   54   62   68   80   66   80   100

    Soln:
    Use of GC to obtain r
    Step 1: Key in the data (x-values in L1, y-values in L2).
    Step 2: Turn diagnostics on.
    Step 3: LinReg(a+bx)

    [GC screen: LinReg output in the form y=a+bx, showing r=.8626339159]

    r = 0.863 (correct to 3 sig fig)
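As a cross-check on the GC output, r can also be computed directly from the ten pairs using the deviation form of the formula. The sketch below assumes the table values are x = 18, 20, ..., 92 and y = 42, 54, ..., 100 (they reproduce the sums quoted in Eg 3a); `pmcc` is a hypothetical helper name.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient from raw (x, y) pairs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


x = [18, 20, 30, 40, 46, 54, 60, 80, 88, 92]   # Mathematics marks
y = [42, 54, 60, 54, 62, 68, 80, 66, 80, 100]  # Chemistry marks

# The raw data agree with the summarised statistics of Eg 3a.
print(sum(x), sum(y), sum(a * b for a, b in zip(x, y)))

r = pmcc(x, y)
print(round(r, 3))
```

Both forms of the formula give the same r, so either the raw table or the summarised sums can be used.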


    Do it yourself
    Qn 2: The height and weight of a class of 10 students are given in the table below:

    Student              A     B     C     D     E     F     G     H     I     J
    x (Height in m)      1.5   1.58  1.6   1.63  1.65  1.72  1.73  1.78  1.8   1.85
    y (Weight in kg)     53    57    62    65    66    70    75    72    90    85

    Find the product moment correlation coefficient r and comment on the value of r obtained.

    Soln:

    3. Regression Lines
    r measures how well the data fits a linear model. If the fit is good, we can consider formulating an equation of a straight line to model the relationship. This straight line is called a regression line.

    (a) For any bivariate set of data, connecting variables x and y, there are always two uniquely defined regression lines.

    Least Squares method
    The line of best fit, also known as the regression line, is found based on the least squares method.
    (For a better understanding of this method, visit the website below:
    http://www.dynamicgeometry.com/JavaSketchpad/Gallery/Other_Explorations_and_Amusements/Least_Squares.html)


    Equation of the least squares regression line of y on x

    [Figure: "Least squares" regression line of y on x (minimizes y errors), showing the line y = a + bx.]

    y = a + bx (least squares regression line of y on x)
    y = a + bx is obtained by finding values of a and b such that $\sum e^2$ is minimum. (e is the difference between the observed and expected y, also known as residuals)

    Observed value of y: The y-coordinate of the point.
    Expected value of y: The corresponding y-coordinate on the line.
    e: The difference between the above two.

    $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$ and $y - \bar{y} = b(x - \bar{x})$ (in MF15), and $a = \bar{y} - b\bar{x}$.
    Thus, $y = \bar{y} + b(x - \bar{x})$.


    Note:
    • $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$
    • Regression line passes through $(\bar{x}, \bar{y})$, the mean of the set of bivariate data.
    • b is known as the estimated regression coefficient (slope of graph).
    • a is the y-intercept.
    • x is the independent variable (controlled) and y is the dependent variable.
    • Regression line is used for estimating y given x. (x is the independent variable)

    Eg 4a The marks in Mathematics (x) and Chemistry (y) obtained by ten randomly chosen JC 2 students were taken and the summarised data were given as follows

    Soln:
    Use of GC to obtain regression line
    Step 1: Key in the data using


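    Eg 4b Suppose that the table in Eg 4a is not given and the data is summarised as follows:
    $\sum xy = 38640$, $\sum x^2 = 34464$, $\sum y^2 = 46820$, $\sum x = 528$, $\sum y = 666$, n = 10
    (a) Find the equation of the estimated regression line of y on x.
    (b) Interpret the slope and y-intercept in the context of the question.
    (c) Predict the marks in Chemistry (y) of a JC2 student if he obtained 50 marks in Mathematics (x) using the regression line obtained in (a). Comment on the reliability of the score obtained.

    Soln:
    (a) $\bar{x} = \frac{\sum x}{n} = 52.8$, $\bar{y} = \frac{\sum y}{n} = 66.6$

    $b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{38640 - \frac{(528)(666)}{10}}{34464 - \frac{(528)^2}{10}} = 0.528$ (correct to 3 sig fig)

    Estimated regression line of y on x: $y - 66.6 = 0.528(x - 52.8)$, i.e. y = 38.7 + 0.528x

    (b) Slope: For an increase of 1 in the Mathematics score, there is an increase of 0.528 in the Chemistry score.
    y-intercept: A student is estimated to score 38.7 for Chemistry when he/she scores 0 for Mathematics.

    (c) When x = 50, y = 38.7 + 0.528(50) = 65.1 (correct to 3 sig fig)

    $r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{38640 - \frac{(528)(666)}{10}}{\sqrt{\left(34464 - \frac{(528)^2}{10}\right)\left(46820 - \frac{(666)^2}{10}\right)}} = 0.863$

    Since r = 0.863, it indicates a high positive linear correlation between Mathematics and Chemistry scores, and x = 50 lies within the range of the data. Hence, the predicted score is reliable.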
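The working for Eg 4b can be reproduced numerically. The sketch below is an illustration only: `regression_y_on_x` is a hypothetical helper, and the data range 18 to 92 used for the interpolation check is taken from the table of the same marks in Eg 3b, which is an assumption when only the summarised data are given.

```python
def regression_y_on_x(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the least squares regression line y = a + bx."""
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n          # a = y_bar - b * x_bar
    return a, b


a, b = regression_y_on_x(n=10, sum_x=528, sum_y=666, sum_xy=38640, sum_x2=34464)
print(round(a, 1), round(b, 3))            # intercept and slope

# Part (c): predicted Chemistry mark for a Mathematics mark of 50
x_new = 50
y_at_50 = a + b * x_new
print(round(y_at_50, 1))

# Interpolation check: is x_new inside the observed range of x?
x_min, x_max = 18, 92                      # assumed from the Eg 3b table
is_interpolation = x_min <= x_new <= x_max
print(is_interpolation)
```

A prediction is usually treated as reliable only when both conditions hold: |r| is close to 1 and the x-value is inside the data range.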


    Do it yourself
    Qn 3: The no. of hours spent studying for a particular subject in a week and the marks obtained for a test for 10 students are given in the table below:

    Student                       A B C D E F G H I J
    x (No. of hours per week)     5 7 8 10 1) 13 15 20 21
    y (Mark)                      53 57 62 66 70 75 72 90 85

    (a) Find the equation of the estimated regression line of y on x.
    (b) Interpret the slope and y-intercept in the context of the question.
    (c) Estimate the no. of hours a student needs to spend in order to achieve a mark of 80 in the test. Comment on the reliability of the value obtained.

    Soln:


    Equation of the least squares regression line of x on y

    [Figure: "Least squares" regression line of x on y (minimizes x errors), showing the line x = c + dy through $(\bar{x}, \bar{y})$.]

    x = c + dy (least squares regression line of x on y)
    x = c + dy is obtained by finding values of c and d such that $\sum e^2$ is minimum. (e is the difference between the observed and expected x, also known as residuals)

    Observed value of x: The x-coordinate of the point.
    Expected value of x: The corresponding x-coordinate on the line.
    e: The difference between the above two.

    Equation of regression line of x on y can be found using

    $d = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$ and $x - \bar{x} = d(y - \bar{y})$ (in MF15)

    Thus, $x = \bar{x} + d(y - \bar{y})$.

    Note:
    • $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$
    • Regression line passes through $(\bar{x}, \bar{y})$, the mean of the set of bivariate data.


    • d is known as the estimated regression coefficient (slope of graph).
    • c is the x-intercept.
    • y is the independent variable (controlled) and x is the dependent variable.
    • Regression line is used for estimating x given y. (y is the independent variable)

    Relationship between r and regression lines
    • A different line of regression will be obtained if we interchange the independent and dependent variables.

    [Figure: regression line of y on x and regression line of x on y drawn on the same axes.]

    Special case: (For r = ±1)
    y = a + bx (least squares regression line of y on x)
    x = c + dy (least squares regression line of x on y)
    If r = ±1, the two lines coincide.

    r = 1 (if both b and d are positive)
    r = -1 (if both b and d are negative)
    The larger the numerical value of r, the nearer the lines approach coincidence and the nearer the points are to having a linear trend.
    If the two lines are identical, x and y have a perfect linear relationship.


    No linear correlation: r = 0

    Eg 5a Find the regression lines of y on x and x on y for the data below and also calculate the product moment correlation coefficient.

    x    1    2    4    6    7    8    10
    y    10   14   12   13   15   12   13

    Soln:
    Step 1: Place x values in L1 and y values in L2.
    Step 2: To get product moment correlation coefficient.
    Step 3: To get regression line of y on x.


    LinReg(a+bx) L1,L2,Y1

    [GC screen: LinReg, y=a+bx, a=11.70403587, b=.1860986547, r²=.1430202624, r=.3781801983]

    y = 11.7 + 0.186x

    Step 4: To get regression line of x on y.

    LinReg(a+bx) L2,L1,Y2

    [GC screen: LinReg, y=a+bx, a=-4.342592593, b=.7685185185, r²=.1430202624, r=.3781801983]

    x = -4.34 + 0.769y, i.e. y = (x + 4.34)/0.769 (store this as Y2)

    Note: All above regression lines are stored in Y1, Y2 respectively so that the regression line can be obtained graphically. (Not really a must-do)

    Soln:
    Regression line y on x is y = 11.7 + 0.186x
    Regression line x on y is x = -4.34 + 0.769y
    Product moment correlation coefficient, r = 0.378
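The two GC runs in Eg 5a amount to applying the same least squares formula twice with the roles of the lists swapped. A minimal Python sketch of that idea follows; `least_squares` is a hypothetical helper name, and the data are the seven pairs in the Eg 5a table.

```python
def least_squares(us, vs):
    """Return (intercept, slope) of the least squares line of v on u."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


x = [1, 2, 4, 6, 7, 8, 10]
y = [10, 14, 12, 13, 15, 12, 13]

a, b = least_squares(x, y)   # y on x: y = a + bx   (like LinReg(a+bx) L1,L2)
c, d = least_squares(y, x)   # x on y: x = c + dy   (like LinReg(a+bx) L2,L1)

print(round(a, 1), round(b, 3))
print(round(c, 2), round(d, 3))
```

Swapping the arguments changes which errors are minimised (y errors for y on x, x errors for x on y), which is why the two lines differ unless r = ±1.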


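    Eg 5b Find the regression lines of y on x and x on y for the data below and also calculate the product moment correlation coefficient.
    $\sum x^2 = 270$, $\sum x = 38$, n = 7
    $\sum y^2 = 1147$, $\sum y = 89$, $\sum xy = 495$

    Soln:
    $\bar{x} = \frac{\sum x}{n} = 5.43$, $\bar{y} = \frac{\sum y}{n} = 12.7$

    Regression line y on x is: (used to find y when x is given)
    $b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{495 - \frac{(38)(89)}{7}}{270 - \frac{(38)^2}{7}} = 0.186$
    $y - 12.7 = 0.186(x - 5.43)$, i.e. y = 11.7 + 0.186x

    Regression line of x on y is: (used for finding x when y is given)
    $d = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{495 - \frac{(38)(89)}{7}}{1147 - \frac{(89)^2}{7}} = 0.769$
    $x - 5.43 = 0.769(y - 12.7)$, i.e. x = -4.34 + 0.769y

    $r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{495 - \frac{(38)(89)}{7}}{\sqrt{\left(270 - \frac{(38)^2}{7}\right)\left(1147 - \frac{(89)^2}{7}\right)}} = 0.378$

    (Compare these answers with those you obtained using GC)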


    Eg 6
    Given $\sum (x - \bar{x})(y - \bar{y}) = 25$, $\sum (x - \bar{x})^2 = 50$ and $\sum (y - \bar{y})^2 = 35$, calculate
    (i) the coefficient of regression for y on x, and
    (ii) the linear product moment correlation coefficient.
    Soln:
    (i) Coefficient of regression for y on x,
    (ii) Linear product moment correlation coefficient,
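When the three sums of deviations are given directly, as in Eg 6, both quantities follow from the deviation forms of the formulas in one step each. The sketch below reads the given values as 25, 50 and 35, which is an assumption about the printed figures; the variable names are illustrative.

```python
from math import sqrt

s_xy = 25   # sum of (x - x_bar)(y - y_bar), as read from the question
s_xx = 50   # sum of (x - x_bar)^2
s_yy = 35   # sum of (y - y_bar)^2

b = s_xy / s_xx                 # (i) regression coefficient of y on x
r = s_xy / sqrt(s_xx * s_yy)    # (ii) product moment correlation coefficient

print(b, round(r, 3))
```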

    Advantage of scatter diagram
    Outlier / suspect / anomaly affects computation of r.
    • Sketch scatter plot.

    [Figure: scatter plot with the regression line of y on x and an outlying point (x1, y1).]

    • Identify the outlier data pair (x1, y1).
    • Remove data (x1, y1) from GC.
    • Recalculate the correlation coefficient for the revised data.
    • Recalculate the line of regression of y on x for the revised data.
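The steps above can be sketched in Python to show how strongly a single outlier can affect r. Everything in this example is hypothetical: the eight data pairs are made up for illustration, and the outlier is picked by its position in the list, mirroring the notes where it is identified by eye from the scatter plot.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient from raw (x, y) pairs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: seven points close to a line, plus one outlier (8, 2.0)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 2.0]

r_before = pmcc(xs, ys)

outlier_index = 7                           # identified from the scatter plot
xs_revised = xs[:outlier_index] + xs[outlier_index + 1:]
ys_revised = ys[:outlier_index] + ys[outlier_index + 1:]

r_after = pmcc(xs_revised, ys_revised)
print(round(r_before, 3), round(r_after, 3))
```

With these made-up values the correlation moves from weak to almost perfect once the outlier is removed, which is why the scatter plot should be inspected before r is interpreted.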


    (ii) Eqn- ofrcgrcssion line o1_r on r:

    (iij)


    Eg 9
    The average densities of blackbirds (in pairs per thousand hectares) over very large areas of farmland and of woodland are shown, for the years 1976 to 1982, in the table below.

    Year                     1976 1977 1978 1979 1980 1981 1982
    Farmland density (y)     83 91 gl 86 tD2 l]l 98
    Woodland density (x)     313 342 366 350 31{t 400

    $\sum y$ = e r>7, $\sum y^2$ = 641?9, $\sum x$ = 2585, $\sum x^2$ = 964609, $\sum xy$ = 248579

    Counting blackbirds in woodland is easier than counting them in farmland. It is desired in future to determine only woodland density and hence use it to estimate farmland density.
    (i) Treating the years as providing independent pairs of observations, use the given data to estimate the linear regression equation of y on x relating the farmland and woodland densities.
    (ii) Given that the 1983 woodland density is 500, estimate the average farmland density for that year. Comment on the reliability of the estimation obtained.

    Soln: (i) Manual method

    _ lt , ) rlet


    Do it yourself
    Qn 4: The no. of hours spent studying for a particular subject in a week and the marks obtained for a test for 10 students are given in the table below:

    Student                       A    B    C    D    E    F    G    H    I    J
    x (No. of hours per week)     5    7    8    10   11   12   13   15   20   21
    y (Mark)                      55   60   62   63   66   73   75   74   89   84

    (a) Find the equation of the estimated regression line of y on x.
    (b) Estimate the no. of hours a student needs to spend in order to achieve full marks in the test. Comment on the reliability of the value obtained.

    Soln:

    Obtain the least squares estimates for α and β using an equation of the form
    (i) $y = \alpha + \beta \log_{10} x$ and
    (ii) $y = \alpha + \beta x^2$
    as a fit for the set of data shown above.
    Determine which equation is a better fit, giving reasons to support your answer.


    Soln: (i) $y = \alpha + \beta \log_{10} x$
    Key in the data for x, y and z into the GC, with the list for z defined as log(L1).

    From GC, y =                    Therefore α =

    Since the correlation coefficient for part (i) is larger than that in part (ii), there is a much better positive linear correlation. Therefore, $y = \alpha + \beta \log_{10} x$ is a better fit.
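The comparison described above (transform x, then compare the correlation coefficients of the two linearised models) can be sketched as follows. The data set for this example is not recoverable from the notes, so the seven pairs below are hypothetical values chosen to follow a logarithmic trend; `pmcc` is an illustrative helper.

```python
from math import log10, sqrt


def pmcc(us, vs):
    """Product moment correlation coefficient from raw pairs."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_vv = sum((v - v_bar) ** 2 for v in vs)
    return s_uv / sqrt(s_uu * s_vv)


# Hypothetical data (NOT from the notes), roughly y = 3 + 2 log10(x)
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [3.0, 3.6, 4.4, 5.0, 5.6, 6.4, 7.0]

z_log = [log10(x) for x in xs]   # model (i):  y against z = log10(x)
z_sq = [x ** 2 for x in xs]      # model (ii): y against z = x^2

r_log = pmcc(z_log, ys)
r_sq = pmcc(z_sq, ys)
print(round(r_log, 4), round(r_sq, 4))
```

The model whose transformed variable gives |r| closer to 1 is taken as the better fit; for these made-up values that is the logarithmic model.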


    6. Miscellaneous Examples
    Eg 11
    A random sample of eight pairs of values of x and y is used to obtain the following equations of the regression lines of y on x and of x on y respectively.
    $y = -\frac{7}{10}x + \frac{151}{10}$, $x = -\frac{7}{6}y + 20$
    Seven pairs of data are given in the table.

    x    10   11   12   11   17   14   19
    y    9    8    7    6    5    4    1

    Find the 8th pair of values of (x, y). Determine the value of the product moment correlation coefficient and comment on what its value implies about the 2 regression lines given above.

    Let Y be the value obtained by substituting a sample value of x into the equation of the regression line of y on x. Evaluate Y for each of the eight values of x and verify that $\sum (y - Y)^2 = 8.8$.
    For each of the sample values of x, Y' is given by Y' = a + bx, where $a \ne \frac{151}{10}$, $b \ne -\frac{7}{10}$. What can you say about the value of $\sum (y - Y')^2$?

    Soln:
    $y = -\frac{7}{10}x + \frac{151}{10}$ ...... (1)
    $x = -\frac{7}{6}y + 20$ ...... (2)

    Solve (1) and (2), we get $\bar{x} = 13$ and $\bar{y} = 6$.
    Since we need to find the 8th value, take n = 8, therefore $\sum x = 104$ and $\sum y = 48$.
    From the given data, $\sum x = 94$ and $\sum y = 40$.
    Therefore the 8th values are x = 10 and y = 8.

    Using GC, r = -0.904


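    Since r = -0.904, which is very close to -1, it indicates a high negative linear correlation between x and y. Hence the regression lines are very close.

    x                                        10     11     12     11     17     14     19     10
    y                                        9      8      7      6      5      4      1      8
    $Y = -\frac{7}{10}x + \frac{151}{10}$    8.1    7.4    6.7    7.4    3.2    5.3    1.8    8.1
    $(y - Y)^2$                              0.81   0.36   0.09   1.96   3.24   1.69   0.64   0.01

    $\sum (y - Y)^2 = 8.8$ (shown)

    Since the regression line of y on x is the line that minimises the sum of squared residuals, $\sum (y - Y')^2 > 8.8$ for any other line Y' = a + bx.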
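The whole of Eg 11 can be verified with exact fractions. The sketch below relies on two facts: both regression lines pass through the mean point, and r² equals the product of the two slopes with r taking their common sign. The second is a standard identity that the notes use implicitly rather than state, so treat it as an assumption of this sketch; the variable names are illustrative.

```python
from fractions import Fraction as F
from math import sqrt

# Regression lines as given in the question:
#   y on x: y = a + bx = 151/10 - (7/10)x
#   x on y: x = c + dy = 20 - (7/6)y
a, b = F(151, 10), F(-7, 10)
c, d = F(20), F(-7, 6)

# Both lines pass through (x_bar, y_bar): substitute y = a + bx into x = c + dy.
x_bar = (c + d * a) / (1 - d * b)
y_bar = a + b * x_bar

xs7 = [10, 11, 12, 11, 17, 14, 19]
ys7 = [9, 8, 7, 6, 5, 4, 1]
n = 8
x8 = n * x_bar - sum(xs7)        # the sum of all eight x-values is n * x_bar
y8 = n * y_bar - sum(ys7)

# r^2 = b * d, with r negative because both slopes are negative
r = -sqrt(b * d)

# Sum of squared residuals about the regression line of y on x
xs, ys = xs7 + [x8], ys7 + [y8]
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

print(x_bar, y_bar, x8, y8, round(r, 3), float(sse))
```

Using `Fraction` keeps the residual sum exact, so the check against 8.8 does not depend on floating-point rounding.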

    Eg 12
    The daily rate charged by a car-hire firm varies with the length of the hire period. The firm's brochure gives the following data:

    Hire Period, x Days     2 3 4 5 10 30 50
    Daily Rate, $y          149 119 115 11). 109 105 103 10i

    Calculate the value of the product-moment correlation coefficient.
    Give a sketch of the scatter diagram for the data, as shown on your calculator, and hence
    (i) comment on the suitability of finding the linear regression line of y on x,
    (ii) state, with a reason, which of the following models is appropriate.

    A: .y-n11'rr    B: 1=a16"    C: $y = a + \frac{b}{x}$    D: y = a + b ln x

    For the appropriate model, calculate the least squares estimates of a and b. Find also the product moment correlation coefficient and comment on the suitability of the model.


    Soln:
    • Enter the data into the GC as two lists (say x and y).
    • Press [2nd] [CATALOG] [D] and select DiagnosticOn.
    • With the command DiagnosticOn on the Home Screen, find any regression equation. (Follow previous example to find the regression equation.)
    Your screen shot should look like this:

    [GC screen: LinReg, y=ax+b, a=-.4986649635, b=128.5658301, r²=.317465457, r=-.56344...]

    Therefore r = -0.563 (correct to 3 places of decimals)
    Sketch the data as a scatter diagram.

    (i) The scatter plot of y and x shows that the relationship between y and x is non-linear. Moreover, the r value indicates a low negative linear correlation. Hence, the regression line of y on x is not suitable.
    (ii) It can be easily identified as C since y tends to a limit for larger values of x.
    Take $z = \frac{1}{x}$, i.e. y = a + bz. Draw the regression line of y on z.
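The reciprocal transformation can be sketched in Python as follows. The brochure figures are not fully legible in this copy, so the data below are hypothetical values generated from y = 100 + 48/x purely to show the mechanics; `least_squares` is an illustrative helper, not a GC command.

```python
def least_squares(us, vs):
    """Return (intercept, slope) of the least squares line of v on u."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


# Hypothetical data (NOT the brochure figures), generated from y = 100 + 48/x
xs = [1, 2, 4, 5, 10, 20, 50]
ys = [148.0, 124.0, 112.0, 109.6, 104.8, 102.4, 100.96]

zs = [1 / x for x in xs]          # transform: z = 1/x, so that y = a + bz
a, b = least_squares(zs, ys)      # regression line of y on z

print(round(a, 2), round(b, 2))
```

Here a is the limiting value that y approaches as x grows, which matches the reason given above for choosing model C.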


    Soln:
    Equation of the regression line of r on / :
    When r:300, r:
    It is not a suitable model as the concentration cannot be a negative value.
    (i) r = -0.994. There exists a high negative linear correlation.
    (ii) As t is an independent variable, regression line of y on t is appropriate.

    y = 4.62 - 0.0123t
    As r is close to -1, the regression lines of y on t and t on y are almost identical, therefore we can use y on t to estimate t.