chapter 3 - the science of physical geography

Upload: asmawi-bin-abdullah

Post on 10-Apr-2018

229 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    1/25

    Physical Geography Textbook

    76 The Science of P hysical Geography

    CHAPTER 3: The Science of Physical Geography

    (a). Scientific Method

    (b). The Hypothetico-Deductive Method

    (c). Concepts of Time and Space in Physical Geography

    (d). Study of Form or Process?

    (e). Descriptive Statistics

    (f). Hypothesis Testing

    (g). Inferential Statistics: Comparison of Sample Means

    (h). Inferential Statistics: Regression and Correlation

    http://www.physicalgeography.net/fundamentals/3a.htmlhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/fundamentals/3c.htmlhttp://www.physicalgeography.net/fundamentals/3d.htmlhttp://www.physicalgeography.net/fundamentals/3e.htmlhttp://www.physicalgeography.net/fundamentals/3f.htmlhttp://www.physicalgeography.net/fundamentals/3g.htmlhttp://www.physicalgeography.net/fundamentals/3h.htmlhttp://www.physicalgeography.net/fundamentals/3h.htmlhttp://www.physicalgeography.net/fundamentals/3g.htmlhttp://www.physicalgeography.net/fundamentals/3f.htmlhttp://www.physicalgeography.net/fundamentals/3e.htmlhttp://www.physicalgeography.net/fundamentals/3d.htmlhttp://www.physicalgeography.net/fundamentals/3c.htmlhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/fundamentals/3a.html
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    2/25

    Physical Geography Textbook

    77 The Science of P hysical Geography

    (a). Scientific Method

    Francis Bacon(1561-1626), a 17th century English philosopher, was the first individual to

    suggest a universal methodology forscience. Bacon believed that scientific method required

    aninductiveprocess of inquiry.Karl Popperlater refuted this idea in the 20th century.

    Popper suggested that science could only be done using adeductivemethodology. The nexttopic (3b) examines Karl Popper's recommended methodology for doing science more

    closely.

    Scienceis simply a way of acquiring knowledge about nature and the Universe. To practice

    science, one must follow a specific universal methodology. The central theme of this

    methodology is the testing of hypotheses. Ahypothesiscan be defined as a proposal intended

    to explain certain facts or observations that has not been formally tested. The overall goal of

    science is to better comprehend the world around us. Various fields of study, like physics,

    chemistry, biology, medicine and the earth sciences, have used science exclusively to expand

    their knowledge base. Science allows its practitioners to acquire knowledge using techniques

    that are both neutral and unbiased.

    The broadest, most inclusive goal of science is to understand (see Figure 3a-1).

    Understanding involves two interconnected processes: explanation and confirmation.

    Explanation is perhaps the most important basic goal of understanding. Explanation consists

    of explaining reality with a system of hypotheses, theories, and laws. Explanation may also

    relate observed phenomena to a system of empirical formulas, or link them to mechanisms

    that are hierarchically structured at both higher and lower levels of function. Atheorycan be

    defined as a collection of logical ideas that are used to explain something. The process of

    testing, refining, and re-testing hypotheses constructs theories. The nature of this

    confirmation process suggests that theories are rarely static..

    Figure 3a-1: Relationship between reality, theory, and understanding in science. This model

    suggests that we develop scientific theories to explain phenomena found in reality. Once a

    theory is established, it must be confirmedby re-examining reality to find contrary data. If

    contrary data is found, the theory is modified to include this new information and the

    confirmation process begins again. The process of validating theories is endless processbecause we can never assume that we have considered all possibilities.

    http://en.wikipedia.org/wiki/Sir_Francis_Baconhttp://en.wikipedia.org/wiki/Sir_Francis_Baconhttp://www.physicalgeography.net/physgeoglos/s.html#sciencehttp://www.physicalgeography.net/physgeoglos/s.html#sciencehttp://www.physicalgeography.net/physgeoglos/s.html#sciencehttp://www.physicalgeography.net/physgeoglos/i.html#inductionhttp://www.physicalgeography.net/physgeoglos/i.html#inductionhttp://www.physicalgeography.net/physgeoglos/i.html#inductionhttp://en.wikipedia.org/wiki/Karl_Popperhttp://en.wikipedia.org/wiki/Karl_Popperhttp://en.wikipedia.org/wiki/Karl_Popperhttp://www.physicalgeography.net/physgeoglos/d.html#deductionhttp://www.physicalgeography.net/physgeoglos/d.html#deductionhttp://www.physicalgeography.net/physgeoglos/d.html#deductionhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/physgeoglos/s.html#sciencehttp://www.physicalgeography.net/physgeoglos/s.html#sciencehttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/t.html#theoryhttp://www.physicalgeography.net/physgeoglos/t.html#theoryhttp://www.physicalgeography.net/physgeoglos/t.html#theoryhttp://www.physicalgeography.net/physgeoglos/t.html#theoryhttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/s.html#sciencehttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/physgeoglos/d.html#deductionhttp://en.wikipedia.org/wiki/Karl_Popperhttp://www.physicalgeography.net/physgeoglos/i.html#inductionhttp://www.physicalgeography.net/physgeoglos/s.html#sciencehttp://en.wikipedia.org/wiki/Sir_Francis_Bacon
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    3/25

    Physical Geography Textbook

    78 The Science of P hysical Geography

    Figure 3a-2: Facilitating tools involved explanation and confirmation.

    Explanation has two important secondary components: idealization and unification (see

    Figure 3a-2). Idealization may be considered to be the condensation of a body of empirical

    fact into a simple statement. In the process of condensation, some detail must be omitted andthe processes and phenomenon abstracted. Idealization may also involve isolating the

    phenomenon from other aspects of the system of interest. A second aspect of explanation is

    the unification of apparently unrelated phenomena in the same abstract or ideal system of

    concepts.

    Another minor goal of science is the confirmation of constructed models or theories

    associated with understanding. Confirmation is accomplished throughhypothesis testing,

    prediction, and by runningexperiments. The next topic (3b) examines these aspects of

    science in greater detail.

    (b). The Hypothetico-Deductive Method

    PhilosopherKarl Poppersuggested that it is impossible to prove a scientific theory true by

    means ofinduction, because no amount of evidence assures us that contrary evidence will

    not be found. Instead, Karl Popper proposed that proper science is accomplished by

    deduction. Deduction involves the process offalsification. Falsification is a particular

    specialized aspect ofhypothesis testing. It involves stating some output from theory in

    specific and then finding contrary cases using experiments or observations. The methodology

    proposed by Popper is commonly known as the hypothetico-deductive method.

    http://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/p.html#predictionhttp://www.physicalgeography.net/physgeoglos/p.html#predictionhttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.britannica.com/eb/article?eu=62405&tocid=0&query=karl%20popperhttp://www.britannica.com/eb/article?eu=62405&tocid=0&query=karl%20popperhttp://www.britannica.com/eb/article?eu=62405&tocid=0&query=karl%20popperhttp://www.physicalgeography.net/physgeoglos/i.html#inductionhttp://www.physicalgeography.net/physgeoglos/i.html#inductionhttp://www.physicalgeography.net/physgeoglos/i.html#inductionhttp://www.physicalgeography.net/physgeoglos/d.html#deductionhttp://www.physicalgeography.net/physgeoglos/d.html#deductionhttp://www.physicalgeography.net/physgeoglos/f.html#falsificationhttp://www.physicalgeography.net/physgeoglos/f.html#falsificationhttp://www.physicalgeography.net/physgeoglos/f.html#falsificationhttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/f.html#falsificationhttp://www.physicalgeography.net/physgeoglos/d.html#deductionhttp://www.physicalgeography.net/physgeoglos/i.html#inductionhttp://www.britannica.com/eb/article?eu=62405&tocid=0&query=karl%20popperhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/p.html#predictionhttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testing
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    4/25

    Physical Geography Textbook

    79 The Science of P hysical Geography

    Popper's version of scientific method first begins with the postulation of ahypothesis. A

    hypothesis is an educated guess or a theory that explains some phenomenon. The researcher

    then tries to prove or test this scientific theory false throughpredictionorexperimentation

    (seeFigure 3a-2).A prediction is a forecast or extrapolation from the current state of the

    systemof interest. Predictions are most useful if they can go beyond simple forecast. An

    experiment is a controlled investigation designed to evaluate the outcomes of causalmanipulations on some system of interest.

    To get a better understanding of the hypothetico-deductive method, we can examine the

    following geographic phenomena. In the brackish tidal marshes of the Pacific Coast of British

    Columbia and Washington, we find that the plants in these communities spatially arrange

    themselves in zones that are defined by elevation. Near the shoreline plant communities are

    dominated primarily by a single species known as Scirpus americanus. At higher elevations

    on the tidal marsh Scirpus americanus disappears and a species called Carex lyngbyei

    becomes widespread. The following hypothesis has been postulated to explain this unique

    phenomenon:

    The distribution of Scirpus americanus and Carex lyngbyei is controlled by their tolerances to

    the frequency of tidal flooding. Scirpus americanus is more tolerant of tidal flooding than

    Carex lyngbyei and as a result it occupies lower elevations on the tidal marsh. However,

    Scirpus americanus cannot survive in the zone occupied by Carex lyngbyei because not

    enough flooding occurs. Likewise, Carex lyngbyei is less tolerant of tidal flooding than

    Scirpus americanus and as a result it occupies higher elevations on the tidal marsh. Carex

    lyngbyei cannot survive in the zone occupied by Scirpus americanus because too much

    flooding occurs.

    According to Popper, to test this theory a scientist would now have to prove it false. As

    discussed above this can be done in two general ways: 1) predictive analysis; or 2) by way of

    experimental manipulation. Each of these methods has been applied to this problem and the

    results are described below.

    Predictive Analysis

    If the theory is correct, we should find that in any tidal marsh plant community that contains

    Scirpus americanus and Carex lyngbyei that the spatial distribution of these two species

    should be similar in all cases. This is indeed true. However, there could be some other causal

    factor, besides flooding frequency, that may be responsible for these unique spatial patterns.

    Experimental Manipulation

    If the two species are transplanted into the zones of the other they should not be able to

    survive. An actual transplant experiment found that Scirpus americanus can actually grow in

    the zone occupied by Carex lyngbyei, while Carex lyngbyei could also grow at lower Scirpus

    sites. However, this growth became less vigorous as the elevation became lower and at a

    certain elevation it could not grow at all. These results falsify the postulated theory. So the

    theory must be modified based on the results and tested again.

    The process of testing theories in science is endless. Part of this problem is related to the

    complexity of nature. Any one phenomenon in nature is influenced by numerous factors eachhaving its particular cause and effect. For this reason, one positive test result is not conclusive

    http://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/p.html#base_levelhttp://www.physicalgeography.net/physgeoglos/p.html#base_levelhttp://www.physicalgeography.net/physgeoglos/p.html#base_levelhttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/fundamentals/3a.htmlhttp://www.physicalgeography.net/fundamentals/3a.htmlhttp://www.physicalgeography.net/fundamentals/3a.htmlhttp://www.physicalgeography.net/physgeoglos/s.html#systemhttp://www.physicalgeography.net/physgeoglos/s.html#systemhttp://www.physicalgeography.net/physgeoglos/s.html#systemhttp://www.physicalgeography.net/fundamentals/3a.htmlhttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/p.html#base_levelhttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    5/25

    Physical Geography Textbook

    80 The Science of P hysical Geography

    proof that the phenomenon under study is explained. However, some tests are better than

    others and provide us with stronger confirmation. These tests usually allow for the isolation

    of the phenomena from the effects of causal factors. Manipulative experiments tend to be

    better than tests based on prediction in this respect.

    (c). Concepts of Time and Space in Physical Geography

    The concepts oftimeandspaceare very important for understanding the function of

    phenomena in the natural world. Time is important to Physical Geographers because the

    spatial patterns they study can often only be explained inhistoric terms. The measurement of

    time is notabsolute. Time is perceived by humans in arelative fashion by using human

    created units of measurement. Examples of human created units of time are the measurement

    of seconds, minutes, hours, and days.

    Geographers generally conceptualize two types of space.Concrete spacerepresents the realworld or environment.Abstract spacemodels reality in a way that distills much of the

    spatial information contained in the real world. Maps are an excellent example of abstract

    space. Finally, like time, space is also perceived by humans in arelative fashion by using

    human created units of measurement.

    Both time and space are variable in terms ofscale. As such, researchers of natural

    phenomena must investigate their subjects in the appropriate temporal and/or spatial scales.

    For example, an investigator studying a forest ecosystem will have to deal with completely

    different scales of time and space when compared to a researcher examining soil bacteria.

    The trees that make up a forest generally occupy large tracts of land. For example, the boreal

    forest occupies millions of hectares in Northern Canada and Eurasia. Temporally, these trees

    have life spans that can be as long as several hundred years. On the other hand, soil bacteria

    occupy much smaller spatial areas and have life spans that can be measured in hours and

    days.

    (d). Study of Form or Process?

    Physical Geography as a science is experiencing a radical change in philosophy. It is

    changing from a science that was highlydescriptive to one that is increasingly experimentalandtheoretical. This transition represents a strong desire by Physical Geographers to

    understand theprocesses that cause the patterns orforms we see in nature.

    Before 1950, the main purpose of research in Physical Geography was the description of the

    natural phenomena. Much of this description involved measurement for the purpose of

    gaining basic facts dealing withform or spatial appearance. Out of this research Physical

    Geographers determined such things as: the climatic characteristics for specific locations and

    regions of the planet; flow rates of rivers; soil characteristics for various locations on the

    Earth's surface; distribution ranges of plant and animal species; and calculations of the

    amount of freshwater stored in lakes, glaciers, rivers and the atmosphere. By the beginning of

    the 20th century Physical Geographers began to examine the descriptive data that wascollected, and started to ask questions related to why? Why is the climate of urban

    http://www.physicalgeography.net/physgeoglos/t.html#timehttp://www.physicalgeography.net/physgeoglos/t.html#timehttp://www.physicalgeography.net/physgeoglos/t.html#timehttp://www.physicalgeography.net/physgeoglos/s.html#spacehttp://www.physicalgeography.net/physgeoglos/s.html#spacehttp://www.physicalgeography.net/physgeoglos/s.html#spacehttp://www.physicalgeography.net/physgeoglos/c.html#concrete_spacehttp://www.physicalgeography.net/physgeoglos/c.html#concrete_spacehttp://www.physicalgeography.net/physgeoglos/c.html#concrete_spacehttp://www.physicalgeography.net/physgeoglos/a.html#abstract_spacehttp://www.physicalgeography.net/physgeoglos/a.html#abstract_spacehttp://www.physicalgeography.net/physgeoglos/a.html#abstract_spacehttp://www.physicalgeography.net/physgeoglos/s.html#scalehttp://www.physicalgeography.net/physgeoglos/s.html#scalehttp://www.physicalgeography.net/physgeoglos/s.html#scalehttp://www.physicalgeography.net/physgeoglos/s.html#scalehttp://www.physicalgeography.net/physgeoglos/a.html#abstract_spacehttp://www.physicalgeography.net/physgeoglos/c.html#concrete_spacehttp://www.physicalgeography.net/physgeoglos/s.html#spacehttp://www.physicalgeography.net/physgeoglos/t.html#time
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    6/25

    Physical Geography Textbook

    81 The Science of P hysical Geography

    environments different from the climate of rural? Why does hail only form in thunderstorms?

    Why are soils of the world's tropical regions nutrient poor? Why do humid and arid regions

    of the world experience different levels of erosion?

    In Physical Geography, and all other sciences, most questions that deal with why are usually

    queries aboutprocess. Some level of understanding about process can be derived from basicdescriptive data. Process is best studied, however, through experimental manipulation and

    hypothesis testing. By 1950, Physical Geographers were more interested in figuring out

    process than just collecting descriptive facts about the world. This attitude is even more

    prevalent today because of our growing need to understand how humans are changing the

    Earth and its environment.

    Finally, as mentioned above, a deeper understanding of process normally requires the use of

    hypothesis testing, experimental methods, and statistics. As a result, the standard

    undergraduate and graduate curriculum in Physical Geography exposes students to this type

    of knowledge so they can better ask the question why?

    (e). Descriptive Statistics

    Introduction

    Physical Geographers often collect quantitative information about natural phenomena to

    further knowledge in their field of interest. This collected data is then often analyzed

    statistically to provide the researcher with impartial and enlightening presentation, summary,

    and interpretation of the phenomena understudy. The most common statistical analysis

    performed on data involves the determination of descriptive characteristics like measures of

    central tendency and dispersion.

    It usually is difficult to obtain measurements of all the data available in a particular system of

    interest. For example, it may be important to determine the average atmospheric pressure

    found in the center of hurricanes. However, to make a definitive conclusion about a

    hurricane's central pressure with 100% confidence would require the measuring of all the

    hurricanes that ever existed on this planet. This type of measurement is called apopulation

    parameter. Under normal situations, the determination of population parameters is

    impossible, and we settle with a subset measure of the population commonly called an

    estimator. Estimators are determined by taking a representativesampleof the populationbeing studied.

    Samples are normally taken atrandom. Random sampling implies that each measurement in

    the population has an equal chance of being selected as part of the sample. It also ensures that

    the occurrence of one measurement in a sample in no way influences the selection of another.

    Sampling methods arebiasedif the recording of some influences the recording of others or if

    some members of the population are more likely to be recorded than others.

    http://www.physicalgeography.net/physgeoglos/p.html#population_parameterhttp://www.physicalgeography.net/physgeoglos/p.html#population_parameterhttp://www.physicalgeography.net/physgeoglos/p.html#population_parameterhttp://www.physicalgeography.net/physgeoglos/p.html#population_parameterhttp://www.physicalgeography.net/physgeoglos/e.html#estimatorhttp://www.physicalgeography.net/physgeoglos/e.html#estimatorhttp://www.physicalgeography.net/physgeoglos/s.html#samplehttp://www.physicalgeography.net/physgeoglos/s.html#samplehttp://www.physicalgeography.net/physgeoglos/s.html#samplehttp://www.physicalgeography.net/physgeoglos/r.html#randomhttp://www.physicalgeography.net/physgeoglos/r.html#randomhttp://www.physicalgeography.net/physgeoglos/r.html#randomhttp://www.physicalgeography.net/physgeoglos/r.html#randomhttp://www.physicalgeography.net/physgeoglos/s.html#samplehttp://www.physicalgeography.net/physgeoglos/e.html#estimatorhttp://www.physicalgeography.net/physgeoglos/p.html#population_parameterhttp://www.physicalgeography.net/physgeoglos/p.html#population_parameter
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    7/25

    Physical Geography Textbook

    82 The Science of P hysical Geography

    Measures of Central Tendency

    Collecting data to describe some phenomena of nature usually produces large arrays of

    numbers. Sometimes it is very useful to summarize these large arrays with a single parameter.Researchers often require a summary value that determines the center in a data sample's

    distribution. In orther words, a measure of the central tendency of the data set. The most

    common of these measures are themean, themedian, and themode.

    Table 3e-1 describes a 15-year series of number of days with precipitation in December for

    two fictitious locations. The following discussion describes the calculation of themean,

    median, andmodefor thissampledata set.

    Table 3e-1: Number of days with precipitation in December for Piney and Steinback, 1967-

    81.

    Year Piney Steinback

    1967 10 12

    1968 12 12

    1969 9 13

    1970 7 15

    1971 10 13

    1972 11 9

    1973 9 161974 10 11

    1975 9 12

    1976 13 13

    1977 8 10

    1978 9 9

    1979 10 13

    1980 8 14

    1981 9 15

    (Xi) 144 187

    N 15 15

    Themeanvalues of these two sets is determined by summing of the yearly values divided by

    the number of observations in each data set. In mathematical notation this calculation would

    be expressed as:

    mean( ) = (Xi)/N

    where Xiis the individual values, N is the number of values, and issigma, a sign used to

    show summation.

    http://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/s.html#samplehttp://www.physicalgeography.net/physgeoglos/s.html#samplehttp://www.physicalgeography.net/physgeoglos/s.html#samplehttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/s.html#samplehttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#mean
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    8/25

    Physical Geography Textbook

    83 The Science of P hysical Geography

    Thus, the calculate means for Piney and Steinback are:

    Pineymean= 10 (rounded off)

    Steinbackmean= 13 (rounded off)

    Themodeof a data series is that value that occurs withgreatest frequency. For Piney, the

    most frequent value is 9 which occurs five times. The mode for Steinback is 13.

    The third measure of central tendency is called themedian. The median is the middle value

    (or the average of the two middle values in an even series) of the data set when the

    observations are organized in ascending order. For the two locations in question, the medians

    are:

    Piney

    9, 9, 10, 11, 12, 12, 12, 13, 13, 13, 13, 14, 15, 15, 16

    median= 13

    Steinback

    7, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 11, 12, 13

    median= 9

    Measures of Dispersion

    Measures of central tendency provide no clue into how the observations are dispersed within

    the data set. Dispersion can be calculated by a variety of descriptive statistics including the

    range,variance, andstandard deviation. The simpest measure of dispersion is therange.

    Therangeis calculated by subtracting the smallest individual value from the largest. When

    presented together with themean, this statistic provides a measure of data set variability. The

    range, however, does not provide any understanding to how the data are distributed about themean. For this measurement, thestandard deviationis of value.

    The following information describes the calculation of therange,variance, and standard

    deviation for the data set in Table 3e-2.

    http://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#medianhttp://www.physicalgeography.net/physgeoglos/m.html#modehttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#mean
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    9/25

    Physical Geography Textbook

    84 The Science of P hysical Geography

    Table 3e-2: Dates of the first fall frost at Somewhere, USA, for an 11-year period.

    Day of First Frost * (Xi) Xi - (Xi - )2

    291 -8 64

    299 0 0279 -20 400

    302 3 9

    280 -19 361

    303 4 16

    299 0 0

    304 5 25

    307 8 64

    314 15 225313 14 196

    (Xi) = 3291

    = 3291/11 = 299

    (Xi - )2

    = 1360

    *The dates are given in year days, i.e., January 1st is day 1, January

    2nd is day 2, and so on throughout the year.

    Therangefor the data is set is derived by subtracting 279 (the smallest value) from 314 (thelargest value). Therangeis 35 days.

    The first step in the calculation ofstandard deviationis to determine thevarianceby

    obtaining the deviations of the individual values (Xi) from the mean ( ). The formula for

    variance(S2) is:

    S2

    = [ (Xi - )2] /(N-1)

    where is the summation sign, (Xi - )2 is calculated (third column), and N is the number of

    observations.Standard deviation(S) is merely thesquare root of thevariance( S2

    ).

    In the case of the Somewhere data, thestandard deviationis:

    S2

    = 1356 / 10

    S = 11.6 or 12 (to the nearest day)

    http://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/v.html#variancehttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/r.html#rangehttp://www.physicalgeography.net/physgeoglos/r.html#range
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    10/25

    Physical Geography Textbook

    85 The Science of P hysical Geography

    This value provides significant information about the distribution of data around the mean.

    For example:

    (a) Themean one samplestandard deviationcontains approximately 68% of the

    measurements in the data series.

    (b) Themean two samplestandard deviationscontains approximately 95% of the

    measurements in the data series.

    In Somewhere, the corresponding dates for fall frosts one and twostandard deviations

    from themean(day 299) are:

    Minus two standard deviations: 299 - 24 = 275

    Minus one standard deviation: 299 - 12 = 287

    Plus one standard deviation: 299 + 12 = 311

    Plus two standard deviations: 299 + 24 = 323

    The calculations above suggest that the chance of frost damage is only 2.5% on October 2nd

    (day 275), 16% on October 15th (day 287), 50% on October 27th (day 299), 84% on

    November 8th (day 311), and 97.5% on November 20th (day 323).

    (f). Hypothesis Testing

    In section3b, we discovered that an important component of scientific method was the

    testing ofhypotheseseither throughexperimentsorpredictiveforms of analysis. A

    hypothesis can be defined as a tentative assumption that is made for the purpose of empirical

    scientific testing. A hypothesis becomes a theory of science when repeated testing produces

    the same conclusion.

    In most cases, hypothesis testing involves the following structured sequence of steps. The

    first step is the formulation of anull hypothesis. The null hypothesis is the assumption that

    will be maintained by the researcher unless the analysis of data provides significant evidence

    to disprove it. The null hypothesis is denoted symbolically as H0. For example, here is aformulated null hypothesis related to the investigation of precipitation patterns over adjacent

    rural and urban land-use types:

    H0 : There is no difference in precipitation levels between urban and adjacent rural areas.

    The second step of hypothesis testing is to state thealternative hypothesis(H1). Researchers

    should structure their tests so that all outcomes are anticipated before the tests and that results

    can be clearly interpreted. Some tests may require the formulation of multiple alternative

    hypotheses. However, interpretation is most clear cut when the hypothesis is set up with only

    one alternative outcome. For the example dealing with precipitation patterns over adjacent

    rural and urban land-use types, the alternative might be:

    http://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/p.html#base_levelhttp://www.physicalgeography.net/physgeoglos/p.html#base_levelhttp://www.physicalgeography.net/physgeoglos/p.html#base_levelhttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/a.html#alternative_hypothesishttp://www.physicalgeography.net/physgeoglos/a.html#alternative_hypothesishttp://www.physicalgeography.net/physgeoglos/a.html#alternative_hypothesishttp://www.physicalgeography.net/physgeoglos/a.html#alternative_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/p.html#base_levelhttp://www.physicalgeography.net/physgeoglos/e.html#experimenthttp://www.physicalgeography.net/physgeoglos/h.html#hypothesishttp://www.physicalgeography.net/fundamentals/3b.htmlhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/m.html#meanhttp://www.physicalgeography.net/physgeoglos/s.html#standard_deviationhttp://www.physicalgeography.net/physgeoglos/m.html#mean
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    11/25

    Physical Geography Textbook

    86 The Science of P hysical Geography

    H1 : There is an increase in precipitation levels in urban areas relative to adjacent rural areas

    because of the heating differences of the two surface types (the urban area heats up more and

    has increased convective uplift).

    Step three involves the collection of data for hypothesis testing. It is assumed that this data is

    gathered in an unbiased manner. For some forms of analysis that useinferentialstatisticaltests the data must be collected randomly, data observations should be independent of each

    other, and the variables should be normally distributed.

    The fourth step involves testing the null hypothesis through predictive analysis or via

    experiments. The results of the test are then interpreted (acceptance or rejection of the null

    hypothesis) and a decision may be made about future investigations to better understand the

    system under study. In the example used here, future investigations may involve trying to

    determine the mechanism responsible for differences in precipitation between rural and urban

    land-use types.

    Inferential Statistics and Significance Levels

    Statisticians have developed a number of mathematical procedures for the purpose of testing

    hypotheses. This group of techniques is commonly known asinferential statistics(see

    sections3gand3hfor examples). Inferential statistics are available both for predictive and

    experimental hypothesis testing. This group of statistical procedures allow researchers to test

    assumptions about collected data based on the laws of probability. Tests are carried out by

    comparing calculated values of the test statistic to assigned critical values.

    For a given null hypothesis, the calculated value of the test statistic is compared with tables

    of critical values at specified significance levels based on probability. For example, if a

    calculated test statistic exceeds the critical value for a significance level of 0.05 then this

    means that values of the test statistic as large as, or larger than calculated from the data would

    occur by chance less than 5 times in 100 if the null hypothesis was indeed correct. In other

    words, if we were to reject the null hypothesis based on this probability value of the test

    statistic, we would run a risk of less than 5% of acting falsely.

    One-tailed and Two-tailed Tests

    When using some types of inferential statistics the alternative hypothesis may be directional

    or non-directional. A directional hypothesis (or one-sided hypothesis) is used when either

    only positive or negative differences are of interest in an experimental study. For example,when an alternative hypothesis predicts that the mean of one sample would be greater (but

    not less) than another, then a directional alternative would be used. This type of statistical

    procedure is known as aone-tailed test. A non-directional (or two-sided) hypothesis would

    be used when both positive and negative differences are of equal importance in providing

    evidence with which to test the null hypothesis. We call this type of testtwo-tailed.

    http://www.physicalgeography.net/physgeoglos/i.html#inferential_statisticshttp://www.physicalgeography.net/physgeoglos/i.html#inferential_statisticshttp://www.physicalgeography.net/physgeoglos/i.html#inferential_statisticshttp://www.physicalgeography.net/physgeoglos/i.html#inferential_statisticshttp://www.physicalgeography.net/physgeoglos/i.html#inferential_statisticshttp://www.physicalgeography.net/physgeoglos/i.html#inferential_statisticshttp://www.physicalgeography.net/fundamentals/3g.htmlhttp://www.physicalgeography.net/fundamentals/3g.htmlhttp://www.physicalgeography.net/fundamentals/3g.htmlhttp://www.physicalgeography.net/fundamentals/3h.htmlhttp://www.physicalgeography.net/fundamentals/3h.htmlhttp://www.physicalgeography.net/fundamentals/3h.htmlhttp://www.physicalgeography.net/physgeoglos/o.html#one_tailed_statistical_testhttp://www.physicalgeography.net/physgeoglos/o.html#one_tailed_statistical_testhttp://www.physicalgeography.net/physgeoglos/o.html#one_tailed_statistical_testhttp://www.physicalgeography.net/physgeoglos/t.html#two_tailed_statistical_testhttp://www.physicalgeography.net/physgeoglos/t.html#two_tailed_statistical_testhttp://www.physicalgeography.net/physgeoglos/t.html#two_tailed_statistical_testhttp://www.physicalgeography.net/physgeoglos/t.html#two_tailed_statistical_testhttp://www.physicalgeography.net/physgeoglos/o.html#one_tailed_statistical_testhttp://www.physicalgeography.net/fundamentals/3h.htmlhttp://www.physicalgeography.net/fundamentals/3g.htmlhttp://www.physicalgeography.net/physgeoglos/i.html#inferential_statisticshttp://www.physicalgeography.net/physgeoglos/i.html#inferential_statistics
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    12/25

    Physical Geography Textbook

    87 The Science of P hysical Geography

    (g). Inferential Statistics: Comparison of Sample Means

    Introduction

    In Physical Geographyit is often necessary to test whether two samples of the same natural

    event are statistically distinct under different conditions. For example, a research scientistmay want to determine if rainfall from convective storms differs in its intensity in rural and

    adjacent urban landscapes. Information from this investigation can then be used to test

    theories concerning the influence of the urban landscape on the development of

    thunderstorms. Further, this type ofhypothesis testingchanges our general comprehension

    in Physical Geography from simple description to process oriented understanding. A number

    of inferential statistical procedures have been developed to carry out this process. We will

    examine two of the most popular statistical techniques available to researchers.

    Mann-Whitney U test

    In research work it is often necessary to test whether two samples of the same phenomenonare statistically different. One test that is particularly useful for this type of test situation is

    theMann-Whitney U test. This technique isnon-parametricor 'distribution-free 'in nature.

    Non-parametric methods are particularly suited to data that are notnormally distributed.

    Setting up the Null Hypothesis

    This is the first stage of any statistical analysis and states the hypothesis that is to be tested.

    This is the assumption that will be maintained unless the data provide significant evidence to

    discredit it. The null hypothesis is denoted symbolically as H0. For our example, thenull

    hypothesiswould be:

    H0 : there is no difference in precipitation levels between urban and adjacent rural areas.

    It is also necessary to state thealternative hypothesis(H1). In this case the alternative might

    be:

    H1 : there is an increase in precipitation levels in urban areas relative to adjacent rural areas

    because of the heating differences of the two surface types (the urban area heats up more and

    has increased convective uplift).

    Calculation

    To calculate the U-statistic, the values for both sets of samples are ranked together in an

    ascending fashion. When ties occur, the mean rank of all the scores involved in the tie is

    entered for those observations. The rank values for each set of observations are then summed

    separately to determine the following values:

    r1 and r2

    These values are then entered in the formulae shown under Table 3g-1 for the calculation of

    U and U1.

    http://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/n.html#non_parametric_statistical_testhttp://www.physicalgeography.net/physgeoglos/n.html#non_parametric_statistical_testhttp://www.physicalgeography.net/physgeoglos/n.html#non_parametric_statistical_testhttp://www.physicalgeography.net/physgeoglos/n.html#normal_distributionhttp://www.physicalgeography.net/physgeoglos/n.html#normal_distributionhttp://www.physicalgeography.net/physgeoglos/n.html#normal_distributionhttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/a.html#alternative_hypothesishttp://www.physicalgeography.net/physgeoglos/a.html#alternative_hypothesishttp://www.physicalgeography.net/physgeoglos/a.html#alternative_hypothesishttp://www.physicalgeography.net/physgeoglos/a.html#alternative_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#null_hypothesishttp://www.physicalgeography.net/physgeoglos/n.html#normal_distributionhttp://www.physicalgeography.net/physgeoglos/n.html#non_parametric_statistical_testhttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testinghttp://www.physicalgeography.net/physgeoglos/h.html#hypothesis_testing
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    13/25

    Physical Geography Textbook

    88 The Science of P hysical Geography

    Table 3g-1: Analysis of convective precipitation levels per storm

    event (mm of rain) between urban and rural areas using the Mann-

    Whitney U test.

    Urban (n1) Rural (n2) Rank (n1) Rank (n2)

    28 14 26 5

    27 20 25 13.5

    33 16 28 8.5

    23 13 20 2.5

    24 18 23 11

    17 21 10 16

    25 23 24 20

    23 20 20 13.5

    31 14 27 5

    23 20 20 13.5

    23 20 20 13.5

    22 14 17 5

    15 11 7 1

    - 16 - 8.5

    - 13 - 2.5

    n1 = 13 n2 = 15 r1 = 267 r2 = 139

    U = n1n2 + {n1(n1 + 1)}/2 - r1U = 13 15 + {13(13 + 1)}/2 - 267 = 19

    U1 = n1n2 + {n2(n2 + 1)}/2 - r2U1 = 13 15 + {15(15 + 1)}/2 - 139 = 176

    where n1 is the number of observations in the first sample, and n2 is the number of

    observations in the second sample.

    The lower of these two values (U and U1) is then taken to determine the significance of the

    difference between the two data sets. Calculated from the data found on Table 3g-1, thevalue ofU is 19 and U1 is 176. The lower value is thus 19. This value is now compared to the

    critical value found on the significance tables for the Mann-Whitney U (Table 3g-2) at a pre-

    determined significance level for the given sample sizes. An important feature of this

    statistical test is that the greater the difference between the two sets of samples, the smaller

    will be the test statistic (i.e., the lower value ofU or U1). Thus, if the computed value is

    lower than the critical value in Table 3g-2, the null hypothesis (H0) is rejected for the given

    significance level. If the computed value is greater than the critical value, we then accept the

    null hypothesis.

    Using a significance level of 0.05 with sample sizes ofn1 = 13 and n2 = 15, the critical value

    in the table for a two-tailed test is 54. Note that this is a two-tailed test, because the directionof the relationship is not specified. The computed value of U is 19, which is much less than

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    14/25

    Physical Geography Textbook

    89 The Science of P hysical Geography

    the tabulated value. Thus, the null hypothesis (H0) is rejected and the alternative hypothesis

    (H1) is accepted.

    Table 3g-2: Critical values of U for the Mann-Whitney U test (P = 0.05).

    n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

    1 - - - - - - - - - - - - - - - - - - - -

    2 - - - - - - - 0 0 0 0 1 1 1 1 1 2 2 2 2

    3 - - - - 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8

    4 - - - - - - 3 4 4 5 6 7 8 9 10 11 11 12 13 13

    5 - 0 1 2 2 3 5 6 7 8 9 10 12 13 14 15 17 18 19 20

    6 - - - - - 5 6 8 10 11 13 14 16 17 19 21 22 24 25 27

    7 - - - - - - 8 10 12 14 16 18 20 22 24 26 28 30 32 34

    8 - - - - - - - 13 15 17 19 22 24 26 29 31 34 36 38 41

    9 - - - - - - - - 17 20 23 26 28 31 34 37 39 42 45 48

    10 - - - - - - - - - 23 26 29 33 36 39 42 45 48 52 55

    11 - - - - - - - - - - 30 33 37 40 44 47 51 55 58 62

    12 - - - - - - - - - - - 37 41 45 49 53 57 61 65 69

    13 - - - - - - - - - - - - 45 50 54 59 63 67 72 76

    14 - - - - - - - - - - - - - 55 59 64 67 74 78 83

    15 - - - - - - - - - - - - - - 64 70 75 80 85 90

    16 - - - - - - - - - - - - - - - 75 81 86 92 9817 - - - - - - - - - - - - - - - - 87 93 99 105

    18 - - - - - - - - - - - - - - - - - 99 106 112

    19 - - - - - - - - - - - - - - - - - - 113 119

    20 - - - - - - - - - - - - - - - - - - - 127

    Unpaired Student's t-Test

    Another statistical test used to determine differences between two samples of the same

    phenomenon is the Student's t-test. The Student's t-test, however, differs from the Mann-Whitney U test in that it is used with data that isnormally distributed(parametric).

    Table 3g-3 describes the data from two "treatments" of strawberry plants that were subjected

    to freezing temperatures over an equal period of days. The data displayed are the numbers of

    fruit produced per plant. The treatments consist of genetically engineered and control

    (normal) varieties.

    H0 : there is no difference in the number of strawberries produced by the control and

    genetically engineered varieties.

    H1 : there is a difference in the number of strawberries produced by the control and

    genetically engineered varieties.

    http://www.physicalgeography.net/physgeoglos/n.html#normal_distributionhttp://www.physicalgeography.net/physgeoglos/n.html#normal_distributionhttp://www.physicalgeography.net/physgeoglos/n.html#normal_distributionhttp://www.physicalgeography.net/physgeoglos/p.html#parametric_testshttp://www.physicalgeography.net/physgeoglos/p.html#parametric_testshttp://www.physicalgeography.net/physgeoglos/p.html#parametric_testshttp://www.physicalgeography.net/physgeoglos/p.html#parametric_testshttp://www.physicalgeography.net/physgeoglos/n.html#normal_distribution
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    15/25

    Physical Geography Textbook

    90 The Science of P hysical Geography

    Table 3g-3: Strawberry data.

    Control

    (Xa)

    (Xa)2 Engineered

    (Xb)

    (Xb)2

    10.7 114.49 10.0 100

    6.7 44.89 10.2 104.04

    8.7 75.69 12.0 144

    8.3 68.89 10.5 110.25

    10.6 112.36 10.3 106.09

    8.3 68.89 9.4 88.36

    10.0 100 9.7 94.09

    9.8 96.04 12.7 161.29

    9.1 82.81 10.4 108.16

    9.8 96.04 10.8 116.64

    8.9 79.21 12.3 151.29

    10.3 106.09 11.0 121

    8.3 68.89 12.3 151.29

    9.4 88.36 10.8 116.64

    8.8 77.44 10.6 112.36

    10.9 118.81 10.1 102.01

    9.4 88.36 10.7 114.49

    7.9 62.41 10.2 104.04

    8.3 68.89 9.5 90.25

    8.6 73.96 11.0 121

    11.1 123.21 9.4 88.36

    8.8 77.44 10.2 104.04

    7.5 56.25 11.2 125.44

    8.9 79.21 10.5 110.25

    7.9 62.41 11.9 141.61

    - - 12.3 151.29

    Xa = 227 Xa2 = 2091.04 Xb = 280 Xb2 = 3038.28( Xa)2 = 51,529 ( Xb)2 = 78,400

    = 9.08

    = 10.77

    na = 25 nb = 26

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    16/25

    Physical Geography Textbook

    91 The Science of P hysical Geography

    To test the hypothesis that there is no difference between strawberry varieties we compute:

    where : and are the arithmetic means for groups A and B, na and nb are the number of

    observations in groups A and B, and S2

    is the pooled within-group variance.

    To compute the pooled within variance, we calculate the corrected sum of squares (SS)

    within each treatment group.

    = 2091.04 - 2061.16 = 29.88

    = 3038.28 - 3015.38 = 22.90

    Then the pooled variance is

    = 1.077

    and,

    = 5.814

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    17/25

    Physical Geography Textbook

    92 The Science of P hysical Geography

    This value of t has (na - 1) + (nb - 1) degrees of freedom. If it exceeds the tabular value of t

    (Table 3g-4) at a pre-determined probability level, we can reject the null hypothesis, and the

    difference between the two means would be considered statistically significant (greater than

    would be expected by chance if there is actually no difference).

    In this case, the critical t value with49 degrees of freedom at the 0.01 probability level is

    approximately 2.682. Since our sample is greater than this, the difference is significant at the

    0.01 level and we can reject the null hypothesis.

    The Paired t-Test

    The previous description for the t-test assumed that the random samples are drawn from the

    two populations independently. However, there are some situations where the observations

    are paired. Analyzing paired data is done differently than if the two samples are independent.

    This modified procedure is known as apaired t-test. Most statistical software programs thatperform the Student's t-test have options to select for either a paired or unpaired analysis.

    Table 3g-4: Critical values of Student's t-distribution (2-tailed).

    Degrees

    of

    Freedom

    P=0.10 P=0.05 P=0.02 P=0.01 P=0.001 Degrees

    of

    Freedom

    1 6.314 12.706 31.821 63.657 636.619 1

    2 2.920 4.303 6.965 9.925 31.598 2

    3 2.353 3.182 4.541 5.841 12.924 3

    4 2.132 2.776 3.747 4.604 8.610 4

    5 2.015 2.571 3.365 4.032 6.869 5

    6 1.943 2.447 3.143 3.707 5.959 6

    7 1.895 2.365 2.998 3.499 5.408 7

    8 1.860 2.306 2.896 3.355 5.041 8

    9 1.833 2.262 2.821 3.250 4.781 910 1.812 2.228 2.764 3.169 4.587 10

    11 1.796 2.201 2.718 3.106 4.437 11

    12 1.782 2.179 2.681 3.055 4.318 12

    13 1.771 2.160 2.650 3.012 4.221 13

    14 1.761 2.145 2.624 2.977 4.140 14

    15 1.753 2.131 2.602 2.947 4.073 15

    16 1.746 2.120 2.583 2.921 4.015 16

    17 1.740 2.110 2.567 2.898 3.965 1718 1.734 2.101 2.552 2.878 3.922 18

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    18/25

    Physical Geography Textbook

    93 The Science of P hysical Geography

    19 1.729 2.093 2.539 2.861 3.883 19

    20 1.725 2.086 2.528 2.845 3.850 20

    21 1.721 2.080 2.518 2.831 3.819 21

    22 1.717 2.074 2.508 2.819 3.792 22

    23 1.714 2.069 2.500 2.807 3.767 23

    24 1.711 2.064 2.492 2.797 3.745 24

    25 1.708 2.060 2.485 2.787 3.725 25

    26 1.706 2.056 2.479 2.779 3.707 26

    27 1.703 2.052 2.473 2.771 3.690 27

    28 1.701 2.048 2.467 2.763 3.674 28

    29 1.699 2.045 2.462 2.756 3.659 29

    30 1.697 2.042 2.457 2.750 3.646 30

    40 1.684 2.021 2.423 2.704 3.551 40

    60 1.671 2.000 2.390 2.660 3.460 60

    120 1.658 1.980 2.358 2.617 3.373 120

    (h). Inferential Statistics: Regression and Correlation

    Introduction

    Regression andcorrelation analysis are statistical techniques used extensively in physical

    geography to examine causal relationships between variables. Regression and correlation

    measure the degree ofrelationship between two or more variables in two different but related

    ways. In regression analysis, a singledependentvariable, Y, is considered to be a function of

    one or more independent variables, X1, X2, and so on. The values of both thedependentand

    independentvariables are assumed as being ascertained in an error-free random manner.

    Further, parametric forms of regression analysis assume that for any given value of the

    independent variable, values of the dependent variable are normally distributed about some

    mean. Application of this statistical procedure to dependent and independent variables

    produces an equation that "best" approximates the functional relationship between the dataobservations.

    Correlation analysis measures the degree of association between two or more variables.

    Parametricmethods of correlation analysis assume that for any pair or set of values taken

    under a given set of conditions, variation in each of the variables is random and follows a

    normal distribution pattern. Utilization of correlation analysis on dependent and independent

    variables produces a statistic called thecorrelation coefficient(r). The square of this

    statistical parameter (thecoefficient of determinationor r2) describes what proportion of the

    variation in the dependent variable is associated with theregression of an independent

    variable.

    http://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/p.html#parametric_testshttp://www.physicalgeography.net/physgeoglos/p.html#parametric_testshttp://www.physicalgeography.net/physgeoglos/c.html#correlation_coefficienthttp://www.physicalgeography.net/physgeoglos/c.html#correlation_coefficienthttp://www.physicalgeography.net/physgeoglos/c.html#correlation_coefficienthttp://www.physicalgeography.net/physgeoglos/c.html#coefficient_of_determinationhttp://www.physicalgeography.net/physgeoglos/c.html#coefficient_of_determinationhttp://www.physicalgeography.net/physgeoglos/c.html#coefficient_of_determinationhttp://www.physicalgeography.net/physgeoglos/c.html#coefficient_of_determinationhttp://www.physicalgeography.net/physgeoglos/c.html#correlation_coefficienthttp://www.physicalgeography.net/physgeoglos/p.html#parametric_testshttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variable
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    19/25

    Physical Geography Textbook

    94 The Science of P hysical Geography

    Analysis of variance is used to test the significance of the variation in the dependent variable

    that can be attributed to theregression of one or more independent variables. Employment of

    this statitical procedure produces acalculated F-value that is compared to acritical F-values

    for a particular level of statistical probability. Obtaining a significant calculated F-value

    indicates that the results of regression and correlation are indeed true and not the consequence

    of chance.

    Simple Linear Regression

    In a simple regression analysis, onedependent variableis examined in relation to only one

    independent variable. The analysis is designed to derive an equation for the line that best

    models the relationship between the dependent and independent variables. This equation has

    the mathematical form:

    Y = a + bX

    where, Y is the value of the dependent variable, X is the value of the independent variable, a

    is the intercept of the regression line on the Y axis when X = 0, and b is the slope of the

    regression line.

    The following table contains randomly collected data on growing season precipitation andcucumber yield (Table 3h-1). It is reasonable to suggest that the amount of water received on

    a field during the growing season will influence the yield of cucumbers growing on it. We

    can use this data to illustate how regression analysis is carried out. In this table, precipitation

    is our independent variable and is not affected by variation in cucumber yield. However,

    cucumber yield is influenced by precipitation, and is therefore designated as the Y variable in

    the analysis.

    Table 3h-1: Cucumber yield vs precipitation data for 62

    observations.

    Precipitation

    mm (X)

    Cucumbers

    kilograms per m2

    (Y)

    Precipitation

    mm (X)

    Cucumbers

    kilograms per m2

    (Y)

    22 .36 103 .74

    6 .09 43 .64

    93 .67 22 .50

    62 .44 75 .39

    84 .72 29 .30

    14 .24 76 .61

    52 .33 20 .29

    http://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variable
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    20/25

    Physical Geography Textbook

    95 The Science of P hysical Geography

    69 .61 29 .38

    104 .66 50 .53

    100 .80 59 .58

    41 .47 70 .62

    85 .60 81 .66

    90 .51 93 .69

    27 .14 99 .71

    18 .32 14 .14

    48 .21 51 .41

    37 .54 75 .66

    67 .70 6 .18

    56 .67 20 .21

    31 .42 36 .29

    17 .39 50 .56

    7 .25 9 .13

    2 .06 2 .10

    53 .47 21 .18

    70 .55 17 .17

    6 .07 87 .63

    90 .69 97 .66

    46 .42 33 .18

    36 .39 20 .06

    14 .09 96 .58

    60 .54 61 .42

    X = 3,050

    = 49.1935

    Y = 26.62

    = 0.4294

    n = 62

    Often the first step in regression analysis is to plot the X and Y data on a graph (Figure 3h-

    1). This is done to graphically visualize the relationship between the two variables. If there isa simple relationship, the plotted points will have a tendancy to form a recognizable pattern (a

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    21/25

    Physical Geography Textbook

    96 The Science of P hysical Geography

    straight line or curve). If the relationship is strong, the pattern will be very obvious. If the

    relationship is weak, the points will be more spread out and the pattern less distinct. If the

    points appear to fall pretty much at random, there may be no relationship between the two

    variables.

    Figure 3h-1: Scattergram plot of the precipitation and cucumber yield data found in Table

    3h-1. The distribution of the data points indicates a possible positive linear relationship

    between the two variables.

    The type of pattern (straight line, parabolic curve, exponential curve, etc.) will determine the

    type of regression model to be applied to the data. In this particular case, we will examinedata that produces a simple straight-line relationship (see Figure 3h-1). After selecting the

    model to be used, the next step is to calculate the corrected sums of squares and products

    used in a bivariate linear regression analysis. In the following equations, capital letters

    indicate uncorrected values of the variables and lower-case letters are used for the corrected

    parameters in the analysis.

    The corrected sum of squares for Y:

    y2 = Y2 -

    = (0.362

    + 0.092

    + ... + 0.422) - (26.62

    2) / 62

    = 2.7826

    The corrected sum of squares for X:

    x2 = X2 -

    = (222

    + 62

    + ... + 612) - (3,050

    2) / 62

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    22/25

    Physical Geography Textbook

    97 The Science of P hysical Geography

    = 59,397.6775

    The corrected sum of products:

    xy = (XY) -

    = ((22)(.36) + (6)(.09) + ... + (61)(.42)) - ((26.62)(3,050)) / 62

    = 354.1477

    As discussed earlier, the general form of the equation for a straight line is Y = a + bX. In this

    equation, a and b are constants or regression coefficients that are estimated from the data set.

    Based on the mathematical procedure of least squares, the best estimates of these coefficients

    are:

    = (354.1477) / (59,397.6775) = 0.0060

    a = Y - bX = 0.42935 - (0.0060)(49.1935) = 0.1361

    Substituting these estimates into the general linear equation suggests the following

    relationship between the Y and X variables:

    = 0.1361 + 0.0060X

    where indicates that we are using an estimated value ofY.

    With this equation, we can estimate the the number of cucumbers (Y) from the measurements

    of precipitation(X) and describe this relationship on ourscattergram with a best fit straight-

    line (Figure 3h-2). Because Y is estimated from a known value ofX, it is called the

    dependent variableand X theindependent variable. In plotting the data in a graph, the

    values ofY are normally plotted along the vertical axis and the values ofX along the

    horizontal axis.

    http://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/i.html#independent_variablehttp://www.physicalgeography.net/physgeoglos/d.html#dependent_variable
  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    23/25

    Physical Geography Textbook

    98 The Science of P hysical Geography

    Figure 3h-2: Scattergram plot of the precipitation and cucumber yield data and theregression model best fit straight-line describing the linear relationship between the

    two variables.

    Regression Analysis and ANOVA

    A regression model can be viewed of as a type of moving average. The regression equation

    attempts to explain the relationship between the Y and X variables through linear association.

    For a particular value ofX,the regression model provides us with an estimated value ofY.

    Yet, Figure 3h-2 indicates that many of the plotted values of the actual data are observed to

    be above the regression line while other values are found below it. These variations are

    caused either by sampling error or the fact that some other unexplained independent variableinfluences the individual values of the Y variable.

    Thecorrected sum of squares for Y (i.e., y2) determines the total amount of variation thatoccurs with the individual observations ofY about the mean estimate of . The amount of

    variation in Y that is directly related with the regression on X is called theregression sum of

    squares. This value is calculated accordingly:

    Regression SS = = (354.1477)2 / (59,397.6775)= 2.1115

    As discussed above, the total variation in Y is determined by y2 = 2.7826. The amount ofthe total variation in Y that is not associated with the regression is termed theresidual sum of

    squares. This statistical paramter is calculated by subtracting theregression sum of squares

    from thecorrected sum of squares for Y ( y2):

    Residual SS = y2 - Regression SS

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    24/25

    Physical Geography Textbook

    99 The Science of P hysical Geography

    = 2.7826 - 2.1115

    = 0.6711

    The unexplained variation can now be used as a standard for testing the amount of variation

    attributable to the regression. Its significance can be tested with theF testfrom calculations

    performed in anAnalysis of Variancetable.

    Source of variation df 1 SS MS 2

    Due to regression 1 2.1115 2.1115

    Residual (unexplained) 60 0.6711 0.0112

    Total 61 2.7826 -

    1 There were 62 values ofY analyzed and therefore n = 62. The total sum of squaresdegrees

    of freedom (df) is determined as n-1 or 61. The regression ofY on X has 1degree of

    freedom. The residual or unexplaineddegrees of freedom is determined by subtracting

    regression df (1) from total sum of squares df (61).

    2 MS is calculated as SS / df.

    Using theAnalysis of Variance procedure, the regression is tested by determining the

    calculatedFstatistic:

    F =(Regression MS) / (Residual SS)=(2.1115) / (0.0112)= 188.86

    To test this statistic we use atable of F to determine acritical test value for a probability of

    0.01 or 1% (this relationship can occur by chance only in 1 out 100 cases) and with 1,60

    degrees of freedom. According to the table thecritical test value is 7.1. In this test, the

    relationship is deemed significant if the calculatedF statistic is greater than thecritical test

    value. This regression is statistically significant at the 0.01 level because 188.86 is greater

    than 7.1.

    Caution must be taken when interpreting the results of regression. In our example, we found a

    significant relationship between precipitation and cucumber yield. However, this conclusion

    may not be the result of a causal relationship between the two variables. A third variable that

    is directly associated to both precipitation and cucumber yield may beconfounding the

    interpretation of the analysis. Absolute verification of associations between variables can

    only be confirmed with experimental manipulation.

  • 8/8/2019 Chapter 3 - The Science of Physical Geography

    25/25

    100 The Science of P hysical Geography

    Coefficient of Determination

    To measure how strong the correlation is between the two variables, we can determine the

    amount of the total variation in Y that is associated with the regression model. This ratio is

    sometimes called thecoefficient of determinationand is represented by the symbol r2. The

    value of the coefficient of determination ranges from 1.00 to 0.00. The calculatedcoefficientof determination from the data set above was 0.76 or 76% (as calculated below). This value

    suggests that 76% of the variation in Y was associated with the change seen X from the data

    set observations.

    Coefficient of determination = (Regression SS) / (Total SS)

    = (2.1115) / (2.7826) = 0.7588

    Correlation Coefficient

    Another useful regression statistic that measures the strength of the correlation between to

    variables is thecorrelation coefficient.This statistic is often represented by the symbol "r"

    and is determined by taking the square-root of thecoefficient of determination. The value of

    the correlation coefficient ranges from 1.00 to -1.00. A value of 0.0 indicates that there is

    absolutely no relationship between the X and Y variables. The strength of the relationshipbetween the X and Y variables increases as the value of r approaches 1.00 and -1.00. Perfect

    correlation occurs if r equals either 1.00 (perfect positive) or -1.00 (perfect negative). Positive

    correlation coefficients indicate that an increase in the value of the X variable results in an

    increase in the value of the Y variable. Negative correlation coefficients indicate that an

    increase in the value of the X variable results in a decrease in the value of the Y variable.

    http://www.physicalgeography.net/physgeoglos/c.html#coefficient_of_determinationhttp://www.physicalgeography.net/physgeoglos/c.html#coefficient_of_determinationhttp://www.physicalgeography.net/physgeoglos/c.html#coefficient_of_determinationhttp://www.physicalgeography.net/physgeoglos/c.html#correlation_coefficienthttp://www.physicalgeography.net/physgeoglos/c.html#correlation_coefficienthttp://www.physicalgeography.net/physgeoglos/c.html#correlation_coefficienthttp://www.physicalgeography.net/physgeoglos/c.html#correlation_coefficienthttp://www.physicalgeography.net/physgeoglos/c.html#coefficient_of_determination