david j. scott department of statistics, university of
TRANSCRIPT
![Page 1: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/1.jpg)
RDavid J. Scott
��� �� � � � �� � �� � ��� � � � �Department of Statistics, University of Auckland
R – p.1/196
![Page 2: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/2.jpg)
Outline
Introduction: the S language
Basic S concepts
Practicalities: help, commands, running R, ESS
Data structuresNumbers and vectorsObjects, modes and attributesFactorsArrays and matricesLists and data frames
Vectors
Matrices
Text handling in R
Importing and exporting dataR – p.2/196
![Page 3: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/3.jpg)
Outline (continued)
Programming in S
Dates and times
MiscellaneaScopeCustomising the environmentInstalling packagesAccessing source codeMathematical expressionsProbability distributionsOptimisation
An extended example: the Weibull distribution
R – p.3/196
![Page 4: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/4.jpg)
Outline (continued)
S graphics
Trellis graphics and
� � � � � ��
Sweave
Writing functions and packages
S4 methods
R – p.4/196
![Page 5: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/5.jpg)
Resources
CRAN (Comprehensive R Archive Network), local mirror
�� � �� � ��� � � � � �� �� � � � � �� � �
An Introduction to R
�� � �� � ��� � � � � �� �� � � � � �� � � ��� � ��� � � �� � �� � � � � � � � �
A Reference Card for R from Tom Short
� � � � � �� �� � � � � �� � �� � � � � � � �� �� �� � � � � � � � � ��� � � � � �
The ESS Manual
�� � �� � � � � � � � �� �� � � � � �� � �� � � � � � � �� �� � � � � � � � �� � � �
A Reference Card for ESS
� � � � � �� �� � � � � �� � �� � � � � � � �� �� � � � � � � � � � � � � � �
The Sweave User Manual
� � � � � �� �� � � � � �� � �� � � � � � � �� �� � � � � � � � � � � �� � � ! !" !# !$ � � �
R – p.5/196
![Page 6: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/6.jpg)
Introduction: The S Language
R – p.6/196
![Page 7: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/7.jpg)
The S Language
S is a language and computational environment designedspecifically for carrying out “statistical” computations.
It was designed by statisticians so that they could easilydevelop and deploy new statistical methodology.
It brings good ideas on numerical methods, graphics andlanguage design together in a single useful package.
It represents a philosophy of turning ideas into programsquickly and easily.
R – p.7/196
![Page 8: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/8.jpg)
History
The S language was developed at AT&T/Lucent BellLaboratories by John Chambers, with the assistance ofRick Becker, Allan Wilks and Duncan Temple Lang.
The development of the language was aided by collaborationof the AT&T researchers with a much wide group ofacademics and practitioners.
The developers of S language recognise a number of distinctversions of S which came about during the development of thelanguage.
R – p.8/196
![Page 9: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/9.jpg)
S Version 1
Released in 1980.
The first version of S released outside of Bell Labs.
Used primarily in North American universities.
It is described in the bookS : A Language and Environment ForData Analysis and Graphics
(also known as the “brown” book).
R – p.9/196
![Page 10: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/10.jpg)
S Version 2
Released in 1987.
Based initial experiences developing S, Chambers developeda new environment called
�� �
.
This added the ability for users to define their own extensionsto the language. This version of S is described in the book
The NEW S Language(also known as the “blue” book).
R – p.10/196
![Page 11: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/11.jpg)
S Version 3
Released in 1990.
Added some basic object-oriented facilities, which were seenas essential for providing good modelling facilities.
Included “State of the Art” modelling software.
These extensions are described in the bookModels in S
also known as the “white” book.
R – p.11/196
![Page 12: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/12.jpg)
S Version 4
Released in 2000.
Contains a rather more sophisticated version ofobject-oriented programming and a number of otherdevelopments which support programming activities with S.
The initial versions of
��
ran very slowly, and this has tendeddelay its acceptance by the S programming community.
This version of S is described in the bookProgramming with Data
(also known as the “green book”).
R – p.12/196
![Page 13: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/13.jpg)
The ACM Software Systems Award
The importance of John Chambers’ work on S was recognisedin 1998, when he was awarded the ACM award for softwaresystems.
The ACM award is the world’s most prestigious award forsoftware engineering.
Other winners of the ACM award include Thompson andRitchie for the UNIX operating system and Don Knuth for theTEX typesetting system.
R – p.13/196
![Page 14: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/14.jpg)
S Implementations
S This was the version of S used internally for re-search purposes at Bell Labs. A small group of ex-ternal researchers had access to this version.
S-PLUS In the late 1980s AT&T sold the rights to sell acommercial version of S to Seattle-based StatisticalSciences. The company has since been renamedtwice and is now known as Insightful .
R In the early 1990s Robert Gentleman andRoss Ihaka University of Auckland developed an al-ternative free S implementation. It is called R af-ter its original developers: Robert Gentleman andRoss Ihaka.
R – p.14/196
![Page 15: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/15.jpg)
R
R is now developed by the R-Core Development Team
New releases are produced around every six months
In addition to base R there are a number of recommendedpackages which comprise the basic R distribution
There are around 1000 additional packages produced byusers of R
The source code and compiled versions of R for Windows,Macintosh and various Linux distributions (debian, redhat,ubuntu, . . . ) are available from CRAN
CRAN also has user-produced packages and documentation
R – p.15/196
![Page 16: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/16.jpg)
Basic S Concepts
R – p.16/196
![Page 17: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/17.jpg)
Basic S Concepts
S is a computer language which is processed by a specialprogram called an interpreter
The interpreter reads and evaluates S language expressions,and prints the values determined for the expressions
The interpreter indicates that it is expecting input by printingits prompt at the start of a line
By default the S prompt is a greater than sign �.
On UNIX or LINUX machines you can start Splus by typing“
� � ��� �” and R by typing “�
”
R – p.17/196
![Page 18: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/18.jpg)
Using S as a Calculator
A user types expressions to the S interpreter
S responds by computing and printing the answers
� � � �
�� � �
� � � �
�� � ��
� � � �
�� � �� �
R – p.18/196
![Page 19: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/19.jpg)
Grouping and Evaluation
Normal arithmetic rules apply,
� � � � � �
�� � �
But evaluation can be controlled with parentheses
� � � � � � � �
�� � �
R – p.19/196
![Page 20: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/20.jpg)
Built-in Functions
Many mathematical and statistical functions are available.
� � � � � � � �
�� � �
�� � � � �
� ��� � � � �
�� � �� � �� �
� � � �� � � � �
�� � �� � � �� �
R – p.20/196
![Page 21: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/21.jpg)
Assignment
Values are stored by assigning them a name
The resulting name-value pair is called a variable.
The statements:
� � � �
� � ��� �
store the value
� �
under the name �.
You can also write the assignment in the other direction� � � � �
R – p.21/196
![Page 22: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/22.jpg)
Using Variables
Variables can be used in expressions in the same way asnumbers
For example,
� � ��� �
� � ��� � � � �
� ��� � �
R – p.22/196
![Page 23: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/23.jpg)
Special Values
In addition to ordinary numbers, S has special codes whichindicate infinite and undefined numerical values
� � �
�� � � � �
� � � �
�� � � � � �
� �
�� � ��� �
R – p.23/196
![Page 24: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/24.jpg)
Special Values
Infinities and
� � �
s have expected properties
� � � � �
�� � � � �
� � � ��� �
�� � ��� �
� � � � �
�� �
� � � � � �
�� � ��� �
R – p.24/196
![Page 25: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/25.jpg)
Practicalities
R – p.25/196
![Page 26: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/26.jpg)
Help
Help using
��� � � or preceding with
�
��� � � � �� ��� � �
or
� �� ��� �
Other help from
��� � � � � ,
� � � � � � � � �
, and � � � � � � , e.g.
��� � � � � � � � �� � � � �� �
��� � � � � � � �� � � �� ��� � � �� � � � � � � � �
� � � � � � � � � � � � � �
R – p.26/196
![Page 27: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/27.jpg)
Commands
R is case-sensitive
Names are locale dependent, alphanumeric charactersallowed, plus ’ ’ and ’ �’
Commands are either expressions or assignments
Expressions are evaluated and printed
An assignment evaluates an expression and passes the valueto a variable: the result is not automatically printed
An assignment in brackets is printed
�
denotes the start of a comment
R – p.27/196
![Page 28: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/28.jpg)
Commands
� ��� �
� � � � � � � � � � � � �
� � �� �� �
� � � ��� � � � �
� � � � � � � � � � � � � � � � � � � � � � �
R – p.28/196
![Page 29: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/29.jpg)
Running R
Can run in a terminal window: command recall and editing ispossible
Best environment is XEmacs and ESS
On Windows WinEdt plays very nicely with R
Run commands from a file with �� � � �
� �� � � � �� �� � � � � �� �
Put output to a file with � � � �
� � � � � �� �� � � � � � � � �
Return to console output with � � � � � �
See objects in current workspace with � � � � � � � �
or
� � � �
Remove objects with
� � ���
��
� �� ��
� � �
R – p.29/196
![Page 30: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/30.jpg)
XEmacs and ESS
To start R from within XEmacs use M-x R
Most important is to evaluate a region C-c C-r
This leaves you in the commands file
C-c M-r evaluates the region but leaves you in R
Evaluate a line with C-c C-j
To ensure XEmacs uses ESS, put the line� � �� � � � � � � � � � � �
in the file��� � ��� � �
Note that
�� � �� � �
is accessible in XEmacs from the Optionsmenu: select
� �� � � � � � � � � � �
R – p.30/196
![Page 31: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/31.jpg)
Workspace, History, Batch Mode
All objects can be saved at the end of a session
Saved in � � � � in the current directory
Objects will be reloaded if a later session is started in thisdirectory
To avoid confusion of objects, use different directories fordifferent projects
To run in batch mode, on the Unix command line:� �� � � �� � � �� �� � �� �Output will be in
�� � � � �� �� � �
Can be done in the background or with � � � � or �� �� �
See
� � �� � �
There are other commands like this, e.g.
� � � � � � � � �� �
R – p.31/196
![Page 32: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/32.jpg)
Data Structures
R – p.32/196
![Page 33: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/33.jpg)
Data Structures
Vectors: can be numeric, character, or logical
Matrices: just vectors with dimensions added and possibly rowand column names—elements are all of the same type
Factors : for categorical data
Lists: like a general form of vectorelements can be of different typeselements can be vectors or listsused for returning results from functions
Data frames: like matrices, rectangularcolumns can be of different typescommon input to modeling functions
Functions
R – p.33/196
![Page 34: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/34.jpg)
Objects
R operates on objects
Vectors are atomic—all of one type or mode
Lists are recursive structures
Functions and expressions are also recursive structures
Mode and length are intrinsic attributes of objects
Can change mode—coerce objects
Once an object exists its length can be changed
Change attributes with � � �� � �
Finding out about objects: � , � � �� � �,
��� � �
, � � �
, andfor functions � � �
R – p.34/196
![Page 35: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/35.jpg)
Objects
Objects may have a class—used for object-orientedprogramming
Determines the behaviour of � �� , � � � � , and printing
These are generic functions
If no class, default is used, e.g. � �� ��� � � � � is used if youtry and plot the object
Remove the class using � � � � � � �See the available methods with � �� � �
See function code for non-visible functions by using
� � � � � �� � �� �� � �� � � �� ��
� � � � � � � �� � �
or
� � � �� � ��� � � � � � � � � � �� �
Alternatively if the namespace is known, use
� �� �� � �� � � � � � � � � � � ��
R – p.35/196
![Page 36: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/36.jpg)
Factors
Specifies a grouping
Can be ordered or unordered
Used in modeling to do traditional analysis of variance
Use
� � � � or � � � � �
to create factors
See examples from
� � � � �
� � � � allows construction of tables using factors
R – p.36/196
![Page 37: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/37.jpg)
Lists
An ordered collection of components
To create:
� � � � � � � � �� � �� � � ���
� � � � �� � � � ��
�� � � � � � � � � ��
� � � � � � � � � � � � ��
��
� � �
Refer to components using number in double brackets:
� � � � � � �
by name using the separator
�
:
� � � �� �
or by name using the quoted name in double brackets:
� � � �� �� � � � �
� � � � � � �
is different from� � �� �
Lists are constructed using
� � �
R – p.37/196
![Page 38: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/38.jpg)
Data Frames
A list of class
�� � � � �
Components are vectors, factors, numeric matrices, lists, orother data frames
Matrices, lists, and data frames provide as many variables tothe new data frame as they have columns, elements, orvariables, respectively
Numeric vectors, logicals and factors are included as is
Character vectors are coerced to factors
Vector structures in the data frame must all have the samelength
Matrix structures must have the same row size
R – p.38/196
![Page 39: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/39.jpg)
Data Frames
Create with
�� � � � �
� � �� � � � � � � � �� � � � � � �� � � � � � ��
�� � � � � �� � ��
� �� � � � �� � � �Coerce for example matrices using � � �� � � � �
Instead of using
�
notation, such as � � �� � � � � � � � � � �
use � � � � � �
and
��� � � � � �
� � � � � � �� � �
� � � � � � � � � �� � � � � � � � � � �� � � �� � � � � �
� � � � � � � � �� � � � � � �
� � � � � � �� � �
�� �
� � � � � � � � � � � � � � � � ��
� � � � � � � �
� ��� � � � � � � �� � � � � � �
R – p.39/196
![Page 40: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/40.jpg)
Vectors
R – p.40/196
![Page 41: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/41.jpg)
Vectors
S works naturally with vectors of values
The simple way to combine scalar values into a vector is usingthe � � �
function
� � ��� � � ���
��
��
��
� �
� �
�� � � � � � �
Vectors can be manipulated just like scalar values
� � � �
�� � � � � � �
� � � �
�� � �� � �� � �
� � � �
R – p.41/196
![Page 42: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/42.jpg)
Properties of Vectors
The functions
� � � � �
and � ��� return information about thevalues stored in vectors
� � ��� � � ���
��
��
��
� �
� ��� � � � � � �
�� � �
� � � �� � � �
�� � � �� � � ��R – p.42/196
![Page 43: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/43.jpg)
Simple Subsetting
Individual vector elements can be accessed with a subsettingmechanism
� � ��� � � ���
��
��
��
� �
� � �� �
�� � �
� � � � �
�� � �
R – p.43/196
![Page 44: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/44.jpg)
Vector Subsets
Extracting vector subsets:
� � ��� � � ���
��
��
��
� �
� � � � � ���
��
� � �
�� � � � �
� � �� � �
�� � � � � �
When subscripts are all negative, all elements except thoseare extracted
R – p.44/196
![Page 45: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/45.jpg)
Logical Subsetting
Values can be extracted using logical conditions
For example, the command � � � � � � extracts all elements from
� which are greater than 2
� � ��� � � ���
��
��
��
� �
� � � � � � �
�� � � �
Logical subsetting can be very expressive.
� � � � � � � �� � � � � � ��� � � � � � � � � �
R – p.45/196
![Page 46: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/46.jpg)
Modifying Subsets
It is possible to change the values in a subset of a vector byusing subsetting in conjunction with assignment
For example, the expression
� � � ��� � � ��� �
will change the first two elements of the vector � to
�
.
R – p.46/196
![Page 47: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/47.jpg)
Patterned Sequences
S has a number of ways of generating vectors containingspecial sequences of values
The one that is most commonly used is the sequence operator“ � ”
The expression ��� � ��� returns the sequence of integer valuesfrom �� to � � .
� �� �
�� � � � � � � � � � � �
� � � �
�� � � � � � � � � � � �
R – p.47/196
![Page 48: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/48.jpg)
Patterned Sequences Using �
The vectors created with � can be quite large, and whenprinted they may span several lines.
� �� �
�� � � � � � � � � � � � � � � �
�� � � � � � � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � �� � � � � � � � � � � � � � ��
� � � � � � �� � � � �� � � � � � � � � �� � � ��
� � � � � � �
The value in brackets at the start of each line gives the indexof the first value on the line
This can make it easier to locate particular values
R – p.48/196
![Page 49: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/49.jpg)
Patterned Sequences Using � �More general sequences can be created by using the “ � � �”function
The expression � � � � �
��
� � � � �
generates the sequence ofvalues from 0 to 5 in steps of 0.2
� �� � ��
��
��� � � � �
�� � � �� �
� �
� �� � �� �
� �
�
�� � �� � �� �
� �
� �� � �� ��
�
�� � � �
� �� � �� �
� �
� �� �
It is also possible to create sequences of a specified length
� �� � ��
��
� � � � � � �
�� � � � � �� � �� �� �� �
R – p.49/196
![Page 50: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/50.jpg)
Patterned Sequences Using � �Another function which is useful for creating patternedsequences is the function “ � �”, which repeats its firstargument according to the value of its second
The second argument can either be a single count, giving thenumber of times to repeat the first
� �� � � ��� ��
� �
�� � � � � � � � � � � � � � � � �
Or it can be a vector of counts, indicating how many times torepeat each element of the first argument
� �� � � ��� ��
� � ��
��
� � �
�� � � � � � � � � � � �
R – p.50/196
![Page 51: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/51.jpg)
Arithmetic on Vectors
S has very general capabilities for vector arithmetic
Arithmetic operations are carried out on vectors according toan element recycling rule
Under this rule, when vectors of different lengths arecombined in an arithmetic operation, the shorter vector is firstenlarged to match the length of the longer vector by recyclingits elements
Then the vectors are combined element by element
R – p.51/196
![Page 52: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/52.jpg)
Recycling Rule Example
Consider adding the vectors
� � �
and
� � �
��
��
��
� ��
��������
��
��
�
���������
recycle� �
���������
��
��
��
���������
��
��������
��
��
�
���������
add� �
���������
��
�
��
���������
This is how the result appears when computed with S
� ��� � � ��� �
�� � � � � � � �R – p.52/196
![Page 53: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/53.jpg)
Summary Operations on Vectors
The following functions return single-value numericalsummaries for the values in a vector or collection of vectors
� � � � �
the sum of the elements of �
� � � � � �
the product of the elements of �
� � � � �
the minimum of the elements of �
� � � � �
the maximum of the elements of �
The � � � � function computes a two value summary.
� � � �� � ��� � �
�� � � � R – p.53/196
![Page 54: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/54.jpg)
Examples
The � � �
function can be used to compute factorials
For example
�� �
� � �� � � �� � �
�� � �� �� �
Beware that very large factorials can’t be represented withfinite precision arithmetic (
�� � �
is too large)
� � �� � � �� � �
�� � � � �
The following computation shows how to compute binomialcoefficients.
� � �� � � �� � � � � �� � � ��� ��
�� �
�� � � �
R – p.54/196
![Page 55: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/55.jpg)
Cumulative Summaries
There are also cumulative versions called � � � � , � � � � �
,
� � � �, and � � � �
These produce a vector which consists of the summarycomputed for; the first element, first two elements, the firstthree elements and so on
� � � � � � � � ��� � �
�� � � � � � � � � � �� �� � � � �
� � � � � �� � � �� � �
�� � � � � � �
� � � � � � � � � � � �
� � � �� � � � �� �� �
R – p.55/196
![Page 56: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/56.jpg)
Statistical Summaries
There are a number of summaries which compute quantitiesof statistical interest
The most important of these are:
� � � � � �
the mean of the elements of �,
� � � � � � � �
the median of the elements of �,
� � � � �
the variance of the elements of �.
In addition to these, R also has
� � � � �
the s.d. of the elements of �.
R – p.56/196
![Page 57: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/57.jpg)
Sample Quantiles
The �� � � � � � function computes summaries based on thepercentiles of the values in its argument
The simplest use of �� � � � � � computes the median, upperand lower quartiles, and extremes
� � � � � � � � � ��� � �
� � � � � � � � � � �
� �� � �� �
� � � �
An optional second argument to �� � � � � � can be used tospecify a different set of quantiles
� � � � � � � � � ��� ��
� � � � �
� � � � � � � � � � � � � � �
� �� �� ��
� �
� �� �
� ��
� � � � � �
�� �� � �
R – p.57/196
![Page 58: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/58.jpg)
Character Strings
Any value in quotes is regarded by R as being a characterstring.
� � �� � � � � �� � �� � � �� �� � � � �
� � �� � � � �
�� � � ��� � ���
�� � ��
This is true, even when the value appears to be numeric
� � � �� � ��� � � � �
� � � �� �
�� � � � � �
Character strings can be combined to form vectors
� � � � �� � � � �� � � �� � �
�� � � ��� � ���
�� � �� � � � �
R – p.58/196
![Page 59: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/59.jpg)
Type Coercion
It is possible to combine numbers and character strings into asingle vector using “ �”
In this case, the numbers are first converted to strings.
� � � � �� � � � ��
� �
�� � � ��� � ���
�� � �� � � �
This an example of automatic type coercion
Arithmetic on a mixture of strings and numbers does not work,however
� � � � � �
� � � � � � � � � � �� � � �� � � � � �� � �
� � � �� � � � � � �
R – p.59/196
![Page 60: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/60.jpg)
Naming Vector Elements
Individual elements of a vector can be given names
� � ��� ��� �
� �
�� � � � � � �
� � ��� � � � � ��
� � ��
� � ��
� � ��
�� � �
� ��� � � � � � �� � �� � �� � � �
� � � � � � � � �� �
� �
� � � � �
� � � � �
Element names are printed above the elements, indexinginformation at the start of a line is dropped
R – p.60/196
![Page 61: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/61.jpg)
Names and Subsetting
Names are preserved during subsetting
� � � �� � �
� �� �
It is also possible to extract subsets by using the names assubscripts
� � � � � � � ��
� � � � �
� �
� �
R – p.61/196
![Page 62: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/62.jpg)
Matrices
R – p.62/196
![Page 63: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/63.jpg)
Matrices
In addition to vectors, S has a wide range of data structures.
� � ��� � � � � � � � ��� ��
�� � � ��
�� � � � �
creates a
� � �
matrix
The value can be viewed as follows:
� �
��� � ��
� �
���
� � �
� ��
� � �
� ��
� � �
Notice that the elements are inserted into the matrix in columnmajor order
R – p.63/196
![Page 64: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/64.jpg)
Row-Major Order
By default, values are inserted into matrices in column majororder
It is also possible to specify that the matrix be filled in rowmajor order
� � ��� � � � � � � � ��� �� �� � � �� �� � � ��
� ��� �� � � �� �� �
� �
��� � ��
� �
���
� � �
� ��
� � �
� ��
� � �R – p.64/196
![Page 65: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/65.jpg)
Dimensioning Information
Matrix dimensions can be obtained in a number of ways
� � ��� � � � � � � � ��� �� �� � � �� �� � � � �
� �� �� � �
�� � �
� �� � � � �
�� � �
� � � � � � �
�� � � �
R – p.65/196
![Page 66: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/66.jpg)
Diagonal Matrices
The function
� � � � can be used to create diagonal matricesand to extract the diagonals from matrices
� � ��� � � � � � ��� � �
� �
��� � ��
� � ��
� �
���
� �
� ��
� �
� ��
� �
� � � � � � � �
�� � � � �R – p.66/196
![Page 67: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/67.jpg)
Special Matrix Forms
There is a special form of call to
� � � � which can be used tocreate identity matrices
� � � � � � � �
��� � ��
� � ��
� �
���
� �
� ��
� �
� ��
� �
It is also easy to construct a block matrix of ones
� � � � � � � � ��� �� � � �� �� � � � �
��� � ��
� � ��
� �
���
� � � �
� ��
� � � �
� ��
� � � �
R – p.67/196
![Page 68: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/68.jpg)
Matrix Arithmetic
The standard arithmetic operations are all defined formatrices, and take place elementwise
� � ��� � � � � � � � ��� �� �� � � �� �� � � � �
� � ��� � � � � � � � ��� ��
�� � � ��
�� � � ��
� ��� �� � � �� �� �
� � � ���� � ��
� �
���
� � �
� ��
� � �
� ��
� � � �
Note that
� � �
is the elementwise product of
�
and
�
, not thematrix product
R – p.68/196
![Page 69: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/69.jpg)
Other Matrix Operations
The function computes the transpose of its argument
� � � � ���� � ��
� � ��
� �
���
� � � �
� ��
� � � �
Matrix multiplication can be performed with the
� � �
binaryoperator
� � � � � � � � �
��� � ��
� �
���
� � � � �
� ��
� � � � �
R – p.69/196
![Page 70: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/70.jpg)
Solving Linear Equations
The �� ��� � function an be used to solve systems of linearequations
For example the linear system
� �� �
���� �
�
�� �
can be solved as follows
� � ��� � � � � � � � � � ��
��
��
� �� �� � � � �
� � ��� � � ���
� �
� �� ��� � � ��
� �
�� � � � �
R – p.70/196
![Page 71: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/71.jpg)
Matrix Inversion
Called with a single argument, �� ��� � computes the matrixinverse
� �� ��� � � � �
��� � ��
� �
���
� � � �
� ��
� �� � ��
But computers only work to a finite precision—allcomputations are subject to roundoff error
� � � � � �� ��� � � � �
��� � ��
� �
���
� � �� � � � � � � � �
� ��
� � � �
R – p.71/196
![Page 72: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/72.jpg)
Regression
The general linear model can be written in matrix form as
� � � � ��� �
The least-squares estimates of the parameters are given by
� � � � � � � � � � � �
and the residuals by
�� � � � � � � �
The estimated dispersion matrix of
� �
is
� � � � ��� � ��
� � �� � � � � �
�
R – p.72/196
![Page 73: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/73.jpg)
Regression in S
The equations for regression analysis can be translateddirectly into S statements
� ��� �� �� � �
� � ��� �� � � � �
� �� � � � � � ��� �� ��� � � � � � � � � � ��
� � � � � � �
� � �
� � � � � � � � � � ��� � � � � � � �� � � � � �
� � � �� � � � � � ��� � � � �� � � � � � � � � � � � � � �
� � �
� � ��� � � �� � � � � � � �� ��� � � � � � � � � �
� � �
Warning: This is not the best way of computing the results
R – p.73/196
![Page 74: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/74.jpg)
Matrix Decompositions
There are many S functions which support computations onmatrices
The most useful are:
� � � � � eigenvalues and eigenvectors
� � �
singular-value decomposition
� QR decomposition
R – p.74/196
![Page 75: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/75.jpg)
Subsetting
It is possible to extract submatrices from matrices
� � ��� � � � � � � � ��� �� �� � � � �
� �
��� � ��
� � ��
� �
���
� � � �
� ��
� � � �
� � � ���
� �
�� � �
� � � ��� ��
��� � �
��� � ��
� �
���
� � �
� ��
� � �
R – p.75/196
![Page 76: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/76.jpg)
Subsetting Shorthands
An empty subscript field is equivalent to the full range ofpossible subscripts
For example:
� � ��
�� � �
��� � ��
� �
���
� � �
� ��
� � �
� � � ��
�
�� � � � �
Subscripting matrices by using logical indices and row orcolumn labels is also possible, but tends to be far lesscommon
R – p.76/196
![Page 77: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/77.jpg)
Row and Column Indices
The two functions � � and �� ��� � return matrices containingthe index for each row an column
� � ��� � � � � � � � ��� ��
�� � � � �
� �� �� � �
��� � ��
� � ��
� �
���
� � � �
� ��
� � � �
� �� � � � �
��� � ��
� � ��
� �
���
� � � �
� ��
� � � �
R – p.77/196
![Page 78: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/78.jpg)
Example: Zeroing A Lower Triangle
� � and �� �
can be used together with logical subscripting tooperate on upper and lower triangles of matrices
� � ��� � � � � � � � ��� � �� � � � �
� � � �� �� � � � �� � � � � � ���
� �
��� � ��
� � ��
� �
���
� � � �
� ��
� � �
� ��
� �
R – p.78/196
![Page 79: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/79.jpg)
Row and Column Labeling
Matrices can be made rather more useful by using row andcolumn labels
� � ��� � � � � � � � ��� �� �� � � � �
� � � � � � � � � � � �� � � � � � � � � �� � ��
� � � � � � ��
� � �� � � � �� � � � ��
� � � � � � � ��
� � � � � � � �
The primary benefit of labeling can be seen when the matrix isprinted
� �
� � � ��
� � � � �
� � � � � �
� � � � � � � � �
R – p.79/196
![Page 80: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/80.jpg)
Labeling The Labels
In R, but not S, it is possible to “label the labels.”
� � ��� � � � � � � � ��� �� �� � � � �
� � � � � � � � � � � �� � � � � �� � � � � � � � �� � ��
� � � � � � � ��
� �� � � � �� � � � �� � � � � � � � � � � � � � ��
� � � � � � � � �
� �
� � � � �
� �� � � � ��
� � � � �
� � � � � �
� � � � � � � � �
R – p.80/196
![Page 81: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/81.jpg)
Multiway Arrays
Multiway arrays generalise the notion of matrices.Arrays are created with the � � � function and theirsubsetting and labeling methods parallel those of matrices.The only major difference is that the notion of transposemust be generalized.The function � � � provides such a generalized transposeoperation.
R – p.81/196
![Page 82: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/82.jpg)
Multiway Arrays
� � � ��� �� � �� � � � � �� ���
� � �
� �
��
� ��
� � ��
� �
��
� � �
���
� � �
� �� � � � � � �
� �
��
� ��
� � ��
� �
��
� � �
� ��
��
� ��
� � ��
� �
��
� � � R – p.82/196
![Page 83: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/83.jpg)
Multiway Arrays
� �� � � � � �� � � � �
�
� � �
� �
��
� ��
� �
��
� �
���
� � �
� ��
� �
R – p.83/196
![Page 84: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/84.jpg)
Text Handling in R
R – p.84/196
![Page 85: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/85.jpg)
Printing
Complete S objects can be displayed by using the � � �
function
� � � � � � � � � � � ��� � �
�� � � � � � � � �
� � � � � � � � � � � � � �� �� � � � � �
��� � ��
� �
���
� � �
� ��
� � �
Only very basic control of formatting is possible when the
� � � function is used
Other functions can be used when finer control is necessary
R – p.85/196
![Page 86: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/86.jpg)
Control Of Printing
Print formating can be controlled by a number of optionalarguments
� � � � � � controls the number of digits printed to theright of the decimal point. The default is thevalue of the system option� � � � �.
�� � � � controls whether strings are printed with orwithout quotes. The default value is
� � � �
.
�� � � � � the character string used to indicate
� �
values. This is printed without quotes.
R – p.86/196
![Page 87: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/87.jpg)
Examples Using � � � �
� �� ��� �� ��� � ��� �� � � � � �� �� � � � � �� �� �
� ! "�# $ $�% & '# ( #
� �� ��� ) *),+ -� . �0/ � 21 � 3 �
� ! 54 6 6 64 7 6 64 8 8 64 9 7 64 9 6
� �� ��� �� )� : �� ;� < �� � =?> � � �� � > �
� ! 4 @BA C
R – p.87/196
![Page 88: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/88.jpg)
The � � � function
�� & is a function which can be used to concatenate and thenprint character strings passed to it as arguments
Objects other than character strings are automaticallyconverted to character strings by �� &By default, the strings being concatenated are separatedby a space character. Can be overridden with the � # ��
argument.Certain sequences of characters have a specialinterpretation (shared with C, C++, Java and otherlanguages)
� A represents a newline character,
� & represents a tabcharacter, and
� �
a backslash character
R – p.88/196
![Page 89: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/89.jpg)
Examples Using � � �
� � 3 ��� �> �� ��
� � = = = = � �� � �
� � � � � �
� � = = = = � �� � �� 1 � � � �
� � � � � �
� � = = �� �� � �� . �� � �� 1 � � � � �
� � � � � � �
� � = � � 1 �� =� � . � �� � 3� � �
� � �� �� ( # � 64 �� � 8
R – p.89/196
![Page 90: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/90.jpg)
Vector Arguments to � � �
When the arguments to �� & are vectors, their elements aretreated as though they were separate arguments.
� � � � � 1
� ! � � � � � � � � � � � � # � � C � � � � � ' � �� �
� 6 ! �� � � � � � $ � � � � � A � � % � � � � � � � � ( �
� � ! � � � � & � � � � ��� � �� � �� � ��� � ��� �
� � = � � = � � � = � + �� � � � � 1 �
� � �� 1 � � � �
� '# � $ � '� � # &�� � � � # C � '� � � $ �A % �� ( � & �� � �
R – p.90/196
![Page 91: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/91.jpg)
Rounding
There are a variety of S functions which help to formatnumbers for use with � � &
The function (% �A
rounds its argument to a specified numberof decimal digits
� � � � � . � � � � < - �� 3 �
� ! 64 � 64 9 � 64 8 � 64 9 7 64 7 8The function �� �A � C
rounds its argument to a specifiednumber of significant digits
� 1 �0/ � � < � � � � < - �� 3 �
� ! 64 6 64 64 � 8 64 � 7 64 9 �
R – p.91/196
![Page 92: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/92.jpg)
Formatting
The R function
C % ( �� & �
provides general formatting ofnumeric values as strings
The function is very flexible and provides better control thanC % ( �� &
C % ( �� & �
has arguments which control whether scientific orstandard formatting is used
It also has control for the number of digits appearing in results
� <� � � = � ) *��� <� � � = � < �� . �0/ � 1 � � �
� ! � 64 8 8 8 8 �
� <� � � = � ) *��� <� � � = � � �� . �0/ � 1 � � �
� ! � 84 8 8 8 8 # � 6 �It provides facilities for printing numbers in European formatsor for printing dollar amounts
R – p.92/196
![Page 93: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/93.jpg)
The � � � � Function
�� � & # provides a very flexible way of pasting strings together
Useful in conjunction with �� &
�� � & # obeys the vector recycling rule—useful for tasks likecreating labels
� � =1 � � =� � ) + � �
� ! �� � ( � � � � ( 9 � � � � ( 8 � � � � ( � �
� # �� and � % $ $� � � # � arguments control how strings are gluedtogether
They make it possible to glue all the results into a single string
� � =1 � � =� � ) + �� 1 � � � � � � � � � = �1 � � + �
� ! �� � ( � � � � ( � 9� � � ( � 8� � � ( � � �
R – p.93/196
![Page 94: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/94.jpg)
LATEX Tables
The next slide shows a very simple S function which can beused to print tables suitable as input for LATEX
This is just a sketch of how a real
$� & # 4 & � � $ # function couldbe written
� ��
� 8
� 9 �
� � = � �> = � � �
�
� � # �� A � & � � � $� ( � � $ ( ( !
� � � � �
� � � 8 � �
� � 9 � � � �
� # A � & � � � $� ( �
R – p.94/196
![Page 95: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/95.jpg)
Source Code
� � = � �> = � � � � <� � � � � � �
� �
� � � = 1 � � . � �� = � � 1 �
� � �) � �
� � � = 1 � � . � �� = � � 1 �
� � � 3 � �
� < � ��� � � �� � � � � � � � � / � �� � = 1 � � �
� � = � / ��� � = � � =� � � �� < � �
� � � �� 1 � � � �
� � = �� � � = 1 � 1 � � � � �
� � = � �
� <� � � �� ),+ � � � �
�� � �
� � = � � = 1 � � �� �� �� �� 1 � � � � �
� � = � �
� �
� � = � � . � = � � =� � � �
� �
R – p.95/196
![Page 96: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/96.jpg)
Generalisations
The
$� & # 4 & � � $ # function is very basic and can be enhancedin a number of ways
The function fails if its argument does not have row andcolumn labelsThe function should be changed so that the labels areoptional, or could be specified on the command lineIt might be useful to be able to be able to centre or leftjustify some columnsIt might be useful to round or format columns differently
Packages are available for converting R output into a formwhere it may be directly incorporated into a LATEX file
� ��� �� � will format tables
$� & # from the package
� � � � will format output fromsome R modeling functions
R – p.96/196
![Page 97: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/97.jpg)
Text Processing
There are a number of functions which allow manipulation ofcharacter strings in R
A � '� ( computes string lengths
� � � � & ( extracts substrings
� � � � & (� A � extracts substrings
� & ( � � $� & splits strings
& % $�% # ( converts to lower case
& % � � � # ( converts to upper case
�� � # C % $
character case conversion
� '� ( & ( carries out character conversions
R – p.97/196
![Page 98: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/98.jpg)
Substring Examples
� 1 � 1 � = � . � < �� 3� � �
� ! � � � �
� 1 � 1 � ��� / = � . � < �� ) + �� ) + � �
� ! � � � � � � � � � � � � # � � C �
� 1 � 1 � � � � = � . � < � � �� ) + �� �+ - �
� ! � � � � � � � � # � � � � � # �
� 1 � 1 � = � . � < �� ) + �� �+ - �
� ! � � � � �
� 1 � 1 � ��� / = � . � < �� ) + �� �+ - �
� ! � � � � � � � � # � � � � � # �
R – p.98/196
![Page 99: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/99.jpg)
Substring Replacement
� � � � ) 3� � - � �� �
� 1 � 1 � ��� / �� 3 � � � = = =
� �
� ! � � � � 7 � �� � �
� � � � ) 3� � - � �� �
� 1 � 1 � ��� / �� 3� � � � � = = =
� �
� ! � � � � 7 � �� � �
R – p.99/196
![Page 100: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/100.jpg)
Splitting Strings
� 1 � 1 � � � � � � �� � � � � �� � � . �� �
� � ! !
� ! � '# $ $ % � � & '# ( # � � % ( $ �
� 1 � 1 � � � � � � �� � � � � �� � � . ��
� �
� � ! !
� ! � '# $ $ % � � �
� 8 ! � & '# ( # � & % ( $ �
� 1 � 1 � � � � � � �� � � � � �� � � . ��
� � �+ 1 � = � �+ � � � �
� � ! !
� ! � '# $ $ % � � & '# ( # � � % ( $ �
R – p.100/196
![Page 101: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/101.jpg)
Pattern Matching
The � ( # � function can be used to carry out pattern matching
� � � � = � � � � � �� / � 1 � � �
� � � � � � � � 1 1 � 1 � � � �� � �
� �
� ! � � $ $ � � & '# � � �� A � � � � � �# A �
� / � � � � �� ��
� ! 9 �
� ��/ � � � � �� �� �
� ! � & '# � � �# A �
R – p.101/196
![Page 102: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/102.jpg)
Pattern Substitution
� � � � � � � � �� � � < �� � � � � � ��� � 1 � � . � � � 1 � � �
� 1 � � � ��� � � � � ��� � � � ��
� ! � � '# � % $�% � ( % C �% A # � � % $�% ( � �� # �� �� % A �
� / 1 � � � �� � �� � � �� � � �� ��
� ! � � '# � % $�% � ( % C �% A # � � % $�% � ( � �� # �� �� % A �
� � � � = � � � � � �� / � 1 � � �
� � � � � � � � 1 1 � 1 � � � �� � �
� / 1 � > � � � ) � ) �� ��
� ! � � $ $ � � $ $ � � & '# � & '# �
� 8 ! � �� A � � � � �� A � � � � � �# A � �# A �
R – p.102/196
![Page 103: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/103.jpg)
Importing and Exporting Data
R – p.103/196
![Page 104: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/104.jpg)
Exporting Data to Text Files
Use (� & #4 & � � $ #
� �� �� ��� � � �� � � �
��� � � ��� � ��� � � � � � � � � �� � � � � � �� ! � "� � � #$ %! �
� � � � � � � � � � & � � � � � � �' � � � � � � � � � �
� � � � � �( � � # $ %! � �� � � � �( � � #$ %! �
"( )� � � � � � � � �� � � � �� � � � � �
R prefers the header line to have no entry for the row names
� � � � � � ( � (
* � � ( � � � +� , - , . / -� .0 1
2 �� � )�3 -� . + , . . 40 � 1 , .
� � �
� % $4 A � �# � � 5 6
puts an empty entry in the space for the rownames
To transfer to Excel, the usual method is to write acomma-separated file: (� & #4 � �� is a wrapper which selectsthe appropriate options
R – p.104/196
![Page 105: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/105.jpg)
Reading Data from Text Files
Use ( # � 4 & � � $ #
� �� �� � � � �� � � �
��� � � ��� � � � � � � ) � � � � � �� ! � � � � � � � "� � � �� � � � � � � � � � �
� � � � � �( � � �� � � � �( � � �� � � � � � � � � � �� � � � � � �
� � � � � �� � � � � �� � �' � � �
�� � 2 � �� � � � ' � � � � � � � � � / � � � � � � . � � ) � �� � �( � � #$ %! �
� � � � � � � � � �� � � � � � � � � � � � � � � � � ) � � � � � ! � � � � �� � � � � � � � � � � # $ %! �
�� ( ( � � � ) �� � � � � � �� � � � ! � � �� � � � �� ! � � � � � ) � � �� ! �
� � � � �� � � � � � � � � � � � �� � � � � � � �� � � � � � � � � � � �
The header commonly has entries for the columns only, not forthe row labels, so one field is shorter than the remaining lines
If the file has (possibly empty) header field for the row labels,use
� � �� � � � � � � � �� � � � � ) � � � � #$ %! � � � � � � �( � � / �
Column names (which override the header line) can be givenwith � % $4 A � �# �
R – p.105/196
![Page 106: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/106.jpg)
Reading Data from Text Files
Files exported from a spreadsheet can have trailing emptyfields and their separators omitted. Use
C� $ $ � � � ��
( # � 4 & � � $ # tries to select a suitable class for each variable inthe data frame working through
$�% �� �� $��
� A & # �# (� A � �# (� �
and � % � � $ #
The final choice is
C � � & % (
� % $ � $� � � # � and � �4 � � provide greater control
� �4 � � suppresses conversion of character vectors to factors
There are special functions for comma-separated files( ( # � 4 � �� ) and fixed width format files ( ( # � 4 C C
)
Reading files uses � �� A which reads a character at a time,and can be used directly
See also ( # � � � A # � to read whole linesR – p.106/196
![Page 107: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/107.jpg)
Importing Data
The package
� �� � �� allows import of data written in otherformats
Functions include ( # � 4 # �� � A C % � ( # � 4 � & �� ( # � 4 �% ( & �
( # � 4 �� ( # � 4 � � � � and ( # � 4 �� � & � & for EpiInfo, Minitab
portable, SAS Transport, S-PLUS, SPSS, Stata and Systatfiles
A very useful option is to read data from a relational database
Several packages support database access such as
� �� � �
,� � � �� �, and
� �� � �
which is quite generic
A later section of the course introduces the database �� � �
and shows how to use it in conjunction with R
R – p.107/196
![Page 108: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/108.jpg)
Connections
Connections replace use of filenames with a flexible interfaceto file-like objects
For example �� C� $ # will read and write from �� � � files
� ( $
will read from URLs of types
' & & �� � �
,C & �� � �
, andC� $ # � � �
R – p.108/196
![Page 109: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/109.jpg)
Excel Spreadsheets
Best approach is to export the data in tab-delimited orcomma-separated form
You can cut and paste from the display of a spreadsheet with
� � �� � � � � � � � � � � � � � �� � � �� ) � � � � #$ %! � � � � � & � � � � � � � � �
You can also read the clipboard with ( # � � $� � �% � (
On Windows the function % � � �% A A # � & � � # $
in the package� �� �
can select rows and columns from sheets in an Excelspreadsheet file
R – p.109/196
![Page 110: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/110.jpg)
Data
There are data sets with R in the package
� � ��� � �See a list with
� & � ��
Load a data set with
� & � �� A C # ( &�
Find data or load data from other packages
� � � � � � � � � � � �� 3 � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � �� 3 � � � � � � � � � � �Edit data or create a data frame in a spreadsheet format with
# � &
� � � � ( �( "� �( � � � � � � � � �� 3 � � � � � � � � � � �
( �( "� �( ' � � � � � � ( �( "� �( �
� � � � � � � � � � � � �� � � � ( � � �Some statisticians advise against this usage since an ad hocchange to the data like this is not able to be done using batchprocessing and is thus not reproducible
R – p.110/196
![Page 111: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/111.jpg)
Programming in S
R – p.111/196
![Page 112: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/112.jpg)
Programming Concepts
We write programs to solve problems, e.g. to simulate a queue
A program is an organized list of instructions that, whenexecuted, causes the computer to behave in a predeterminedmanner
A program is like a recipe. It containsa list of ingredients—called variables; anda list of directions called statements or commands
Directions tell the computer what to do with the variables
Variables can represent numeric data, text, or graphicalimages
R – p.112/196
![Page 113: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/113.jpg)
Programming Concepts
A programming language is a vocabulary and a set ofgrammatical rules for instructing a computer to performspecific tasks. Examples are Matlab, R or C
A procedure is a section of a program that performs a specifictask, or a sequence of instructions for performing some action
A procedure is called an algorithm if it is finite, definite, andeffective, with some output
Computer programs should be algorithms, but there are manyprocedures not related to computers—knitting patterns,recipes, choreography
R – p.113/196
![Page 114: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/114.jpg)
Algorithms
An algorithm if it is finite, definite, and effective, with someoutput
finite means it ends after a finite number of stepsdefinite means each step is clearly defined—”beat untilfluffy” in a recipe doesn’t satisfy this criterioneffective means able to be carried outoutput is required otherwise the result of implementing thealgorithm will remain unknown.
To solve a problem, the clearest, most reliable way is to firstdevelop a suitable algorithm, then to program or code thealgorithm in a suitable programming language.
R – p.114/196
![Page 115: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/115.jpg)
Representing Algorithms
How is the algorithm to be specified, if not in a programminglanguage?
in natural language—Englishby using a flowchart—using diagrams is often helpfulby writing pseudocode—this is recommended by computerscientists and software developers
R – p.115/196
![Page 116: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/116.jpg)
Pseudocode
Pseudocode is an outline of a program, written in a form thatcan easily be converted into real programming statements
Enables the programmer to concentrate on the algorithmswithout worrying about the syntactical details of theprogramming language
Can be made progressively more detailed as the algorithm isdeveloped
Can be used as comments for the final program
R – p.116/196
![Page 117: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/117.jpg)
Flow of Control
To be useful, a programming language needs to have asufficiently rich collection of control structures
Control structures determine the order in whichprogramming instructions are executed, called flow of control
Programming languages typically allow forsequencealternation or selectioniteration, repetition or loopingjumpinginterrupting
R – p.117/196
![Page 118: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/118.jpg)
Flow of Control
sequence—this is the default order, instructions beingexecuted in the order in the order in which they appear
alternation or selection—implemented as
� C 4 4 4 & '# A
4 4 4 # $ � # 4 4 4 or �� � # or � � & � '
iteration, repetition or looping—implemented as '� $ # orC % ( or ( # � # � &
jumping—implemented as �% & % or� ( # � �
or � & % �
interrupting—used for mouse operations for example
R – p.118/196
![Page 119: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/119.jpg)
Flow of Control
Important R commands, or control structures which affectthe flow of control are:
� C
together with # $ � # executes one group of R commandsif some logical expression is true, and another set if it isnot true
C % ( executes a group of R commands for a particular setof values of some index
'� $ # executes a group of R commands an indefinitenumber of times while some expression is true.
� � & � '
is used to choose which of a number of differentgroups of code will be executed.
R – p.119/196
![Page 120: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/120.jpg)
Compound Expressions
Compound expressions give a way of treating a sequence ofexpressions as a single expression
The general form of a compound expression is:
� � �� � �� � ��� � � 4 4 4 � � � � � �� � ��� � �
�
A compound expression is evaluated by evaluating each of itscomponent expressions in turn and taking the value of the lastone as the value of the compound
Note that the statements here are separated by semi-colons,but newlines will also serve as separators
R – p.120/196
![Page 121: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/121.jpg)
� ��
� � � �
The syntax is
� C
( � �� � �
)
� � � � �
# $ � #� � � � �
Here � �� � �
must evaluate to a logical value
The other expressions may be a collection of R statementsenclosed in curly brackets
��
�Indentation is used to highlight flow of control
R – p.121/196
![Page 122: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/122.jpg)
Example of
� ��
� � � �
� �% $� � A � � � �� (� &� � # � �� &� % A
� � � # �� C� � % # C C� �� # A & � � � �� � � � % A � &
� # $ & � � � �� 9 � ��� � � � % A � & � & '# � � � (� �� A � A &
� C � � # $ & � � 6� �
� % $A � � � � 5 6�
5 6�
�� & � � 5% ( # � $ � % $ � &� % A � & % & '� � � �� (� &� � # � �� &� % A � A � �
� # $ � # �
� % $A � � � � � � � �� ( & � � # $ & � �� � � � �� ( & � � # $ & � � � � � 9 � � �
�� % $A
Informative names should be used
Some functions have one-letter names: � � & � �
� � � � often used for data,
��
��
�
for counters
R – p.122/196
![Page 123: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/123.jpg)
Example of
� ��
� � � �
For � � ��
� � �� � � �
� � � � � � � � � 9 � �� A � � � �
� � � � � � �� 9 � ��� � � �� A � � � � � � � �� � � �
� � � � � � � � �� �
� � � � � � � � � 5 6�
5 6�
� � � � � 5� � � � � � � � � � � � �� � � � � � � � � � � � � � � � � �
� � � � �
� � � � � ��� � � � � � �� � � � ��� � � � �� � � � �� � � �� � ��
� �� � � � �
� � � � � � �
R – p.123/196
![Page 124: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/124.jpg)
Example of
� ��
� � � �
For � � ��
� � �� � � �
� � �� ��� � �� � � �� � � �� �
� � � � �� �� � � ��� �� �� � � � � � � � � � � � � �
� � � � � � � � � �
� � � � � ��� � �� ��� � � �
� � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � �
� � � �
� � � � � ��� � � � � � �� � � � � � � � � � �� � � � � � �� �� � � �
� �
� � � � � � � � � � � � � � � � � � � � � � � � � � � � �
� � � � �
� � � � � � �
R – p.124/196
![Page 125: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/125.jpg)
Example of
� ��
� � � �
For � � ���
� � �� � � � �
� � �� ��� � �� � � �� � � �� � �
� � � � �� �� � � �� �� �� � � � � � � � � � � � �
� � � � � � � � � �
� � � � � ��� � �� ��� � � �
� � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � �
� � � �
� � � � � ��� � � � � � �� � � � � � � � � � �� � � � � � �� �� � � �
� �� � � � �
� � � � �
� �� � � � � ��
� � � � � �
R – p.125/196
![Page 126: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/126.jpg)
� � � � � �
Used for elementwise change to a vector
The syntax is
� � � � (
��� � �
, � � �, �� )
Here
�� � �
is a logical vector
� � � gives the value to be returned if the test value is true
�� gives the value to be returned if the test value is false
� �� � � �� � � �
� �� � � � ��� � � � � � � �
�� � ��
� �� �� � ��
�� �� � � ��
� � � � � � ��
� � � � � � � �� � �� � ��
� � � � � � � � � �
� �� � � � �� � � � � � �� � � �
� �� � � � � � � � � � ��� � � � � � � � � � � � � � �
�� � ��
� �� �� � ��
�� �� � � ��
� � � � � � ��
� � � � � � � � � � � �
R – p.126/196
![Page 127: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/127.jpg)
Relations and Operators
Relations are tests, or comparisons, which are either’TRUE’ ( = 1), or ’FALSE’ ( = 0)
Relational Operator Comparison
� less than
� greater than
��� less than or equal
��� greater than or equal
� � equal
� � not equal
Note that "� � " is used in relation statements
R – p.127/196
![Page 128: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/128.jpg)
Relations and Operators
Relation statements may be combined using logical operators
�
,
�
and
�
Relational Operator Resulting Comparison
�
logical and
�
logical or
�
logical not
For example
� � � � � � � �� � � �
, is
� �� �� � � �� �� � � �
and isthus
�� ��
, so R will have the value 1
Use brackets to make sure the relation specified is the oneintended
R – p.128/196
![Page 129: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/129.jpg)
� � �
The syntax is
� �� ��� �� � � �� � �� �
� �� � �
The index �� � � starts out equal to the first element of � � � � �
Each time the
� �� � loop is executed, �� � � is set equal to thenext entry in � � � � �
, etc.
R – p.129/196
![Page 130: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/130.jpg)
Example of Use of
� � �
� � ��� � � � � � � � �� � � � � � � � � � � � � � � � � �
� � � � � � � � � �� � � � � � � � �
� � � � �� � � �� � � � � � � � �
� � � � �� � � �� � ��� �
� � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � �� �� � � � �
� � � � �� � � �� � ��� �
� � � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � �� �� � � � �
� � � � � � �� � �
� � � � �� � � � � � �� � � � � �� � � � � � � � � � � � � �� � � � � � � �
� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � � � �� � � � �
�
R – p.130/196
![Page 131: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/131.jpg)
Example of Use of
� � �
� � � � � �� � � �� � � � � � � � �
� � � � � �� � � � � � �� �
� � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � � � �� � � � �
� � � � � � � � � � �� � � � � � � � � �
� � � � � �� � � �� � �� �
� � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � �� �� � � � �
� � � � � � � � � �� � � � � � � � � �
� � �� � � � �� � �
� � � � � �� � � � � � �� � � � � �� � � � � � � � � � � � � �� � � � � � � �
� � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � � � �� � �
� �� � � � � � � � � � �� � � � � � � � � �
� � � � � � � � � � �� � � � � � � � � �
� � � � � � � � � � �� � � � � � � � � �
� � � � � � � � � � �� � � � � � � � � �
R – p.131/196
![Page 132: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/132.jpg)
� � � �
The syntax is
� � � � � ��� �� � �� � �� �
� � � �
� � � � is executed while �� � � � � � remains true
Initialisation is usually required prior to entering the loop, andsome change which will affect the logical expression within theloop
R – p.132/196
![Page 133: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/133.jpg)
Example of Use of � � � �
� � � � � � � � � � � � � � � � � � � � � � �
� � � � ��� � � � � � � � � � � � � � � � � � � � � � � � � � � �
� �� � �� �
� � � � � � � � � � � � � �
� � � � ��� � � � � � � � � � � � � � � � �
� � � �� � � � � � � � �� � � � � � � � � � � � � � � � � � �
� �� � �� � �� � � �
�� � � �� � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � � � � � � � � � � � �
R – p.133/196
![Page 134: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/134.jpg)
Example of Use of � � � �
� � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � �
� � �� � �� �
� � � � � � � � � � � � � � �
� � � � � �� � � � � � � � � � � � � � � � �
� � � � �� � � � � � � � �� � � � � � � � � � � � � � � � � � �
� � �� � �� � �� � � �
� �� � � � � � � � � �
� � � � � � � � � �
� � � � � � � � � �
� � � � �� � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � �
R – p.134/196
![Page 135: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/135.jpg)
Functions
New functionality is added to S by defining new functions
As a very simple example, define a function which squares itsargument as follows:
� �� � �� � �� � � � ��� � ��� � � � �
The expression
� � � � � � � � � creates a function whichis assigned as the value of the variable �� � �� �
The function has a single argument x and the value ismultiplied by itself to provide the value of the function
We can use this function in exactly the same way as any otherS function
� �� � �� � �� � �
�� � � � �
R – p.135/196
![Page 136: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/136.jpg)
Vectorisation
Because the operation � acts element-wise on vectors, thenew square function will also
� �� � �� � �� � � � �
�� � � � � � � � � �� � � � � � �
�� � � � � �
Using this fact we can write a simple sum-of-squares function
� � � � �� �� � � � �� � �� � � � � � � � � �� � �� � �
� � � � �� �� � � � �
�� � � ��
R – p.136/196
![Page 137: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/137.jpg)
General Functions
In general, an S function has the form:
� � � � � � � � �� � � � � �� � �
where � �� � � �
is a (comma separated) list of variable namesknown as the formal arguments of the function, and
�� � � is asimple or compound expression known as the body of thefunction
The general rule for evaluating a call to a function is totemporarily create a set of variables by associating thearguments passed to the function with the variable names in
� �� � � �
, and then to use these variable definitions to evaluatethe function body
R – p.137/196
![Page 138: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/138.jpg)
Example
Consider the function defined by:
� ��� �� �� � � � �� � � ��� � � �
� � � � � � � � �
� �
and suppose we make a call to this function by typing:
� ��� �� �� � �
�� � �
This function call is evaluated as follows:Temporary variables, � and
�
, are created with the values3 and 4.These variable definitions are used to evaluate theexpression �� � � � �� � � �� � �
to obtain the value
�
.When the evaluation is complete the temporary definitionsof � and
�are removed.
R – p.138/196
![Page 139: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/139.jpg)
Optional Arguments
S has a notion of default argument values which makes itpossible for the writer of a function to specify reasonabledefault values for arguments, while still providing the flexibilityusers the option of overriding these defaults
As an example, consider the following sum-of-squaresfunction
� � � � �� �� � � � �� � �� � � �� � � � � �
� � � � � �� � � �� � � �
� �
The function definition provides a default definition for the
� � �� � argument
R – p.139/196
![Page 140: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/140.jpg)
Example
When invoked with just a single argument the function returnsthe sum of the squared values in that argument
� � � � �� �� � � � �
�� � � ��
When provided with a value for the � � �� � argument, thefunction computes the sum of squared deviations about thespecified value
� � � � �� �� � � �� � � � � �� � � � � �
�� � � ��
�
R – p.140/196
![Page 141: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/141.jpg)
Argument Matching
Because it is not necessary to specify all the arguments to Sfunctions, it is important to be clear about which argumentcorresponds to which formal parameter of the function
This can be done by providing a name for the argumentexplicitly.
� � � � �� �� � � �� � �� � � � � � � �� � � � � �
�� � � ��
�
When names are provided for arguments, they are used inpreference to position which matching up formal argumentsand arguments
For example
� � � � �� � � �� � � � � � � �� � � � �� � � � � �
�� � � ��
�
returns the same answer as the function call above R – p.141/196
![Page 142: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/142.jpg)
The Argument Matching Process
The general rule for matching formal and actual arguments isas follows.1. Use any names provided with the actual arguments to
determine the formal arguments associated with thenamed arguments. Partial matches are acceptable, unlessthere is an ambiguity.
2. Match the unused actual arguments, in the order given, toany unmatched formal arguments, in the order theyappear in the function declaration.
R – p.142/196
![Page 143: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/143.jpg)
Argument Matching Example
If � � �� is defined by
� � � � �� �� � � � �� � �� � � �� � � � � �
� � � � � �� � � �� � � �
� �
then all the following calls to � � � � are equivalent.
� � � �� � � � � � � � � � � � � �
� � � �� � � � � � � � �� � � � � � � � � � �
� � � �� � � � � � �� � � � � � � � �
� � � �� � � � � � � � � � � � � � �
� � � �� � � � � � � � � � � � � � �
R – p.143/196
![Page 144: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/144.jpg)
Dates and Times
R – p.144/196
![Page 145: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/145.jpg)
Dates and Times
Dates and times are needed for time series, survival analysisetc
Problems: time and date formats, time zones, daylight saving
Classes
��� ��� , � �� � ,
� � ��
(classes:
� � �� � ��� and
� � �� � �)
��� � � is only dates, no times, � �� � is a bit old,
� � ��
classesare comprehensive but complicated
Tasksread dates and times in different formats: � �� � � � �,
� �� � � �� �� �, � �� � � �� � � �, � � � � � � �
print dates and times in different formats:
� �� � �� � � � �,
� � � � � � �
operate on dates and times: normal functions such as
� � � � � � � � , plus
� � � � � � �
R – p.145/196
![Page 146: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/146.jpg)
Dates and Times
Origin��� � �� days since January 1, 1970
� �� � � days since January 1, 1970, fractions of days givetimes within days� � �� �� number of seconds since the beginning of 1970 inthe GMT timezone (includes 22 leap seconds, see theobject �
� � � �� � �� � � �)
��� � � has no times, � �� � has times and dates, no timezones,� � �� � has times, dates and timezones
Suggestion: use the simplest possible class which supportsyour needs
Time zones are very tricky
R – p.146/196
![Page 147: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/147.jpg)
Dates and Times
To read a date, use � �� � � � � � � �� � �� � � � �� � � �� �� �, withoptional format
Formats are provided by
� �� � �, and for
� � �� � � by
� � � � � � �: see
� � � � � � � � for these
Check what has been read in using � � � � � � and
� �� � � �� �� � � �
� � �� � is a super class containing� � �� � � and
� � �� � �
� � �� � � is a number of seconds
� � �� � � is the number of seconds broken down into a list of 9items: year, month, day of the month, hour, minute, secondand a daylight saving flag
R – p.147/196
![Page 148: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/148.jpg)
Miscellanea
R – p.148/196
![Page 149: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/149.jpg)
Scope
A major difference between S-PLUS and R
Symbols in the body of a function are:formal parameters: occurring in the argument list, valuesdetermined by binding actual function arguments to formalparameterslocal variables: values determined from expressions in thefunction bodyother variables called free variables: become local ifvalues are assigned to them
� � � � � � � � �� � �� �
� � � �� �
�� � � � ��
�� � � � �
�� � � � ��
� is a formal parameter, � a local variable, � a free variable
R – p.149/196
![Page 150: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/150.jpg)
Scope
In R the free variable bindings are resolved by looking in theenvironment in which the function was created: called lexicalscope
Consider a function � � � �
� � ��� � � � � � � � �� � � �
� � � � � � � � � �� � �� �
�� � �
�
Then � � � � � � �
in R gives the answer 8.
Useful for maximum likelihood determination using � � � � � �
or
� � �
(see later).
R – p.150/196
![Page 151: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/151.jpg)
Customizing the Environment
R can be invoked with a number of options: the followingassumes that it is started with just the defaults
For details, check the manual An Introduction to R
�� � �� � ��� � � � � �� �� � � � � �� � � ��� � ��� � � �� � �� � � � � � � � �
or use
� � � � � � � �� � � � �
Note that this area has changed recently and there may bedifferences between versions of R
There are also differences between operating systems
Customisation is possible at the site level, the user level, andthe directory level
Note that R is installed as a directory tree at a particularlocation in the file system which is referred to as
� � ��
andmay be pointed to by the environment variable
� � � ��
R – p.151/196
![Page 152: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/152.jpg)
Customizing the Environment
The startup routine is basically as follows:The site file is read—either as pointed to by
� � � � � � �
orat
��
� �� � � � � � � � � � � � � � � � � �
The site-wide startup profile is read—either as pointed toby
� � � � �� �
or at
��
� �� � � � � � � � �� � � � � �� � � � �
R searches for �� �� � � � � � in the current directory or in the
user’s home directory (in that order) and sources itIt loads a saved image from �
� � � � if there is oneIt executes a �
� � � � � function if there is one. This functioncan be specified in the startup profiles, or in �
� � � �
R – p.152/196
![Page 153: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/153.jpg)
Installing Packages
Details regarding installation are given in the manual RInstallation and Administration
�� � �� � ��� � � � � �� �� � � � � �� � � ��� � ��� � � �� � �� � � �� � � � �
Note that this area has changed recently and there may bedifferences between versions of R
There are also differences between operating systems
Packages are installed in libraries
R comes with a single library�
�� �� � � � � �� �� � which contains
the standard and recommended packages
The location of the main library is given in the character string
�� � �� �� �
� � � � ��� �� �
�� � � � � � � � �� � � � � � � ��� �� �
R – p.153/196
![Page 154: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/154.jpg)
Installing Packages
First of all you must create a directory where packages will beinstalled: suppose it is called
�� � ��
Then you must set the environment variable� � � ��
to point tothat directory: In
�� � �
use
� � �� � ��
� � � �� � � � � �
Include this in your �� � � �� � so that your
� � � ��
will beautomatically set when you start a shell
Now start up R. To install the package � �� � � � , use
� � � � �� � � � � � � �� � � � � � �� � � � � �� � � � � �� � � � � � � � � � � � � � �� � � � � � � ��
Note that this will install from the Statistics Department’s localCRAN mirror site so will not incur any costs for internet use
From 2.5.0, R can have a site-specific libraries and userlibraries
R – p.154/196
![Page 155: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/155.jpg)
Mathematical Annotation
Pass an expression rather than a character string to � � �,
� � � �, � � �, or � � � � �
Will give Greek and other symbols, super and subscripts
Like a poor man’s TEX
In an expression
� ��� � � ,
� � �� etc. will give the corresponding Greek letterSuperscripts obtained using ˆSubscripts obtained using square brackets
��
See the � � � � � � � �
function
R – p.155/196
![Page 156: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/156.jpg)
Probability Distributions
For many distributions there are functions to calculate thedensity function, cumulative distribution function, quantilefunction, and to generate random observations
The function name is comprised of the abbreviated distributionname and a prefix
The prefixes are
���
��
�� and �
For example for the Poisson we have
� � � � ��
� � � � ��
� � � � ��
and � � � � �R – p.156/196
![Page 157: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/157.jpg)
Optimisation
This is a common task: e.g. for maximum likelihood
Two functions � � � � � ��
and � � ��
� � � � � ��
has a number of optimisation methods, can usederivatives if available
R – p.157/196
![Page 158: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/158.jpg)
An Extended Example:the Weibull Distribution
R – p.158/196
![Page 159: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/159.jpg)
Weibull example from Devore (2000)
Let
������ � � � ��� be a random sample from a Weibull pdf
� �� � � � �
��� ��� ��� � ��� � � � � � �
�
otherwise
Writing the likelihood and ln(likelihood), then setting both� � �!" # � �$ � �
and
� � � �!" # � �$ � �yields the equations
�
�&%" # � % �
�'%
(" # � % �
)
� �
� �
�&%)
� � �
These two equations cannot be solved explicitly to give generalformulas for the mle’s
* and
*�
. Instead, for each sample � �� � � � � ,the equations must be solved using an iterative numericalprocedure.
R – p.159/196
![Page 160: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/160.jpg)
Weibull example in R
Regard the problem of calculating MLEs as an optimizationproblem. Usually it is easier to optimize the log-likelihoodrather than the likelihood itself. The parameter values thatoptimize the log-likelihood are also the MLEs.
Probability density functions (probability functions for discretedistributions) have names starting with d — dnorm, dbinom,dunif, dweibull.
All density functions take an optional argument log which,when TRUE causes evaluation of the log-density.
The recipe becomes:Regard the parameters as the variables and sum thelog density for your distribution using your data.
R – p.160/196
![Page 161: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/161.jpg)
An R session on the Weibull example
� � � ��� �� � �� �� �� �
� � � � ��� �� �� � � � � � �� �� � �� � � � �
� � � � �� �� �� � � � � �� � � ��� � �� � � � � � � � �� �
� � � � � � � � � � � � � � � � � � � � � � � � �! �"
# ! � � � � � � �" � � � $% $ � &� % � & $ � � �
� � ' � � � �� � �! � � � � � � ��� ( � � � � �� � � � � � � � ) � � � * ,+ � � � ! � * � � �
� � � � � � � �� ! � �� � ! �� � � �� � � � ) � � � � � � � � � � � � � � � � �
� � � � �- . � � � �! ! �� � � �� � � � # ! � � � � � � �+ � ) � � � * ,+ � � � ! � * � � �+ ! � ( * / '0 1
2 3 4 % � � � &5
� � 6 � � � � �87 � � � �� � �� � � � �� � � ��� � � �87 � � � � � � � � ( � � �� � ! � ( 4 ! �9 � ! � ) � �-
� ! ! � �� � : 4 � �� � � � �� �� ; � �� � � � � � � � � � �� � � � ��
< 4 � � � �- . � � � �! ! ��� �� �� � � � # ! � � � � � � �+ � ) � � � *� 2 3+ � � � ! � *� 2 $ 3 + ! � ( * / '0 1
< =� �! � : 4 � ! � � ! ! � �� �+ � � � ) � � � * ,+ � � � ! � * � � � + ) � � � � � � * / '0 1
R – p.161/196
![Page 162: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/162.jpg)
Results of the Weibull example
� � � � � �! � � � � � � � � �� � � � � ) � � � � �� � �- � � ! � �
� � � � � � 5
# � ��� � � � � " � � � & & �
# � � � � �� � � " � � � 2 " $ 3 $ � $ % � � ��
# (� � - � �� � " � � � 2 " $ 3 4 � � � 4 � $ � �% � 4 �%
# ) � � � � � � " � � � 2 " $+ " $ 3 � � & � < � � 4 � � $ � � 4 � � 4 � � $ � � 4 � � $ � &% � 4 �
# � �- � " ��� �
# � � � � � � � �� �" ��� � $ �
� � � ! � � � �! � # ) � � � � � � � � � � � �� � �� � � � � � � � � � � 4 � �� � � � � � � � �� � � ��
2+ 3 2+ $ 3
2 +3 � � $ � � 5 � $ �� � � 5 � $ $
2 $+3 � � � � 5 � $ $ � � � �% � � % � �%
We see that the maximum of the log-likelihood is ( � �� �
, achievedat
* � �� ��
and
*� � � �� �� � . The approximate standard errors of
the estimates are
�� � � �� � �� �
and
� � �� � � �� ��� � �
. Wecan use the standard errors to determine a grid of
� � �
values forcontouring the log-likelihood function.
R – p.162/196
![Page 163: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/163.jpg)
Plotting the density at the estimates
� � ! � � � � �� � � � �� �� - . � � � �! ! ��� + � ) � � � * $ � + � � � ! � * $% � � �� + �+ � � � �+
< � � ! * � � �- � + � ! � � * � ! � � � � � � � � )� � + �! � � * � - �� � � � � � +
< �� ��� * � � � � � �! ! - �� � � � � � � ��� ( � � 1 � � � � � � ) � ! � � � � � � � - � � � �
� � � ( �� �� �� � � � # ! � � � � � � �+ � � ! * � �! � � �
0 500 1000 1500 2000 2500 3000
0 e
+00
2 e
−04
4 e
−04
6 e
−04
Weibull density using MLEs from the lifetime data
lifetime (hr)
dens
ity
R – p.163/196
![Page 164: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/164.jpg)
Contouring the log-likelihood function
� (� �- : 4 �� � � �� � � � �+ � � � . * � ,+ � � � ! * �
� � �� � ! � : 4 � �� � � � + � � + ! �� * � � � � � ! � � � � � � � � � �
� � )� � ! � : 4 � �� � � �+ $ � �+ ! �� * � � � ) � � � � � � � � � � � �
� � �� � � � � � �� � � ! �� ( * � �� � ! � ;
< � �� �� ��� � �� � � ! �� ( * � )� � ! � ;
< (� �- 2 �+ � 3 : 4 ! ! � �� � � � � � �� � ! � 2 � 3 + � )� � ! � 2� 3
< =
< =� � �� � � �� � � �� � ! �+ � )� � ! �+ (� �- + ! �� � ! � * & &" %
� � � ��� � � � �! � # � � � � �� � � 2 3+ �! � # � � � � �� � � 2 $ 3 + � � ) * � < � + � �� * �
� � � � ! � �� ! � � * �� � � � � � � �� � � ! � ) � + �! � � * �� � � � � � � �� � � � � �
� � 6 � � � � ! �� � ! � � � ! � �! � � �- � � � � � ) � � ) � 4 �� �� � � - � � � � � � � � � ��
� � �� � � �� � � �� � ! �+ � )� � ! �+ (� �- +
< ! �� � ! � * �! � # � ��� < � � ) � �� � � � � � + � � % + � � �+ � � � + � � � � + $ +
< ! � � � ! � * � � � � � � � � �+ % �+ � �+ � + � � + � � � + � � � * � �
R – p.164/196
![Page 165: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/165.jpg)
Log-likelihood contours - Weibull
0.5 1.0 1.5 2.0 2.5 3.0 3.5
500
1000
1500
2000
2500
+
α
β
0.5 1.0 1.5 2.0 2.5 3.0 3.5
500
1000
1500
2000
2500
α
β
R – p.165/196
![Page 166: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/166.jpg)
Lessons from the Weibull example
The likelihood function is the same as the probability densitybut with the parameters varying and the data fixed.
For a random sample, the log-likelihood is
� � � �- :- � � � � � � � � � :- � � � �+ � � � + � � � $+ � � � + ! � ( * / '0 1
We minimize the negative of the log-likelihood
! ! � �� � : 4 � �� � � � �� ��
4 � � � �- :- � � � � � � � � � :- � � � �+ � � � * � 2 3+ � � � + ! � ( * / '0 1
�! � : 4 � ! � � ! ! � �� �+ : � � � � � ��� ( � � � � �� � � � �+ ) � � � � � � * / '0 1
The inverse of the hessian provides an estimate of thevariance-covariance matrix.
For two-parameter models we can evaluate a grid oflog-likelihood values and get contours.
Standard errors from the inverse hessian are not alwaysrealistic indications of the variability in the parameterestimates. R – p.166/196
![Page 167: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/167.jpg)
S Graphics
R – p.167/196
![Page 168: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/168.jpg)
Ross Ihaka on Graphics
Ross has a set of slides on graphics which I will use at this point
R – p.168/196
![Page 169: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/169.jpg)
Trellis Graphics with lattice
R – p.169/196
![Page 170: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/170.jpg)
Trellis Graphics
A new form of graphical display used especially for groupeddata
Trellis displays are plots with one or more panel arranged on aregular grid-like structure
Panels are the same type of graph for a subset of the data
Subsets are chosen by conditioning on continuous or discretevariables in the data
This account draws very heavily on A Tour of Trellis Graphicsby Richard A. Becker, William S. Cleveland, and Ming-JenShyu, and includes verbatim quoting from that document.
Cleveland is probably the driving force behind Trellis Graphics
In R Trellis Graphics are implemented using the package
� � � � � �� , which in turn is based on �� � �
, a new graphical engine.
R – p.170/196
![Page 171: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/171.jpg)
Trellis Graphics Example
Example showing the latitude and longitude at different depthsfor earthquakes near Fiji. See
� �� � ��� �.
long
lat
170 175 180 185
−35
−30
−25
−20
−15
Depth Depth
170 175 180 185
Depth Depth
Depth
170 175 180 185
Depth Depth
170 175 180 185
−35
−30
−25
−20
−15
Depth
R – p.171/196
![Page 172: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/172.jpg)
Trellis Graphics
Graphics is a strong point of R and the S language
Standard graphics consist of high-level plotting functions(
��� � � � � ��� � �� � � � � � � �
and low-level functions to augmentexisting plots
� � � �� � � �� � � � �� �� � � �� � � � �)
It is possible to produce multiple plots using � � � � �� � ��� � �
and
� �� � � �
A coordinated set of plots with control over aspect ratios andaxes is more difficult
Trellis provides a unifying framework for multipanel displaysbased on conditioning
Also excellent for single panel displays
Executing a trellis expression produces a trellis object which isusually displayed (unless assigned a name or used in furthercalculations)
R – p.172/196
![Page 173: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/173.jpg)
Example
A simple trellis plot can be produced as follows
� � � ! � � �� 6� � 1+ - � � � * � � ) � � � !
The � � � � � � �
data set relates the production of nitrous oxide(NOx) to the equivalence ratio (a measure of richness of thefuel/air mixture)
First argument is a formula
A modification produces a number of similar plots conditionalon the compression ratio (
�
) which has 5 values
� � � ! � � �� 6� � 1 �� + - � � � * � � ) � � � ! We can change the layout of the panels
� � � ! � � �� 6� � 1 �� + - � � � * � � ) � � � ! + ! � � � � � * � � $+ �+
produces 2 columns and 3 rows on 1 page
R – p.173/196
![Page 174: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/174.jpg)
Example
If we try and condition on a variable which takes a lot of valueswe get a lot of very small uninformative graphs
We can condition instead on intervals from the range of avariable
� �� � ���
� � � � � constructs shingles from the data with aspecified number of intervals, and a specified overlap betweensuccessive intervals
1 1 : 4 �� � � ! � � � �� � � � � ) � � � ! # 1+ � � � � � � * �+ �� � � ! � � * � �
We can then plot against
� �
� � � ! � � �� 6� � � � 1 1+ - � � � * � � ) � � � ! + � � � � � � * $
� � � � � � allows restriction of the plots to part of the data
� � � ! � � �� 6� � � � 1 1+ - � � � * � � ) � � � ! + � � � � � � *� � %
R – p.174/196
![Page 175: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/175.jpg)
Example
� �� � � � data gives yield of barley for 6 sites for 2 years
- � � � ! � � � � � � � � � � � � � � ! - � � � � � � � � � �+
- � � � * � � � ! � �+
� ! � � * � � � � ! � � � � � ! - � � � � ) � ! � � � � � � �
R – p.175/196
![Page 176: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/176.jpg)
Panel Functions
For standard graphs, panel functions are provided:
� �� � ���
� � � � � �� � �� � ��
�� � � � � ��� � �� � ���
� � � �� ��Panels can also be constructed specially, for example bycombining two panels
For example, here is a plot which has a loess smooth added
� � � ! � � �� 6� � 1 �� + - � � � * � � ) � � � ! +
� � � � ! * � �� � � � �� �� + �
� � � � ! � � � � ! � � �� + �
� � � � ! � ! � � � � ��� + �
R – p.176/196
![Page 177: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/177.jpg)
Practicalities
In a
� � � loop, printing is suppressed to avoid excessive output,explicit �� � � or � � � statements must be used
Likewise in a
� � � loop trellis plots are not produced unless anexplicit �� � � command is used
The colors for trellis plots have been optimised for screendisplay
If you intend to include a trellis plot in a printed document it isrecommended that other colours be used
The usual way to do this is via
� � � ! ! � � � � � � � � � � � � ) � � � * � � ! � . ) � � � � ( �
This is an example of setting parameters for trellis plots whichdiffers from ordinary graphics
See
� �� � � � ��
� ���
� � � or
� �� � � � ��
� ���
�� �
R – p.177/196
![Page 178: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/178.jpg)
Sweave
R – p.178/196
![Page 179: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/179.jpg)
Sweave
Traditionally, to produce a report of a statistical analysis wouldrequire the following steps:
Analyse the data, obtaining summaries, tables, graphs etcWrite the report in LATEX copying the numeric results intothe document, and using
� � � � � �� �� � � � � to incorporatethe graphs or figures
This process might be made easier by using � � � � � � to formattabular output to make it suitable for LATEX
Sweave enables reports to be prepared using a singledocument which contains the R code to carry out the analysis,and the LATEX formatting to produce the report
R – p.179/196
![Page 180: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/180.jpg)
Sweave
The source file containing the R code and LATEX formatting isrun through R first which results in all the data analysis outputbeing including in the resulting LATEX file
This file is then processed by LATEX or pdfLATEX in the usualmanner to produce the final formatted report
There are many advantages:The report can be readily updated, if for example new datais obtainedRegular reports can be easily run each month say, just bysupplying new dates and new dataResearch is readily reproducible with all the materialrequired in a single fileThe tedious and error-prone procedure of transferringoutput from R into a LATEX document is avoided
R – p.180/196
![Page 181: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/181.jpg)
Sweave
Sweave source files are noweb files with some additionalsyntax to allow control over the final output
The usual extension for them is �
�� �
Nonweb is a programming tool which allows program sourcecode and documentation to be included in a single file
Noweb files consist of segments of code and ofdocumentation called chunks
Different programs are used to extract the code (“tangle” ) ortypeset documentation together with the code (“weave”)
� �� � �
� ��� at the start of a line marks the start of a code chunk
�
at the start of a line marks the start of a documentationchunk
R – p.181/196
![Page 182: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/182.jpg)
Sweave
In the noweb syntax shown above, options can be included inthe angled brackets to either show or not show the code, andto show or not show the results
By default both code and results are shown
To hide the code, use � � � � � � � � � , to hide results, use
� � � � � � � � ��
Then both are hidden using
� � � � � � � � � � � � � � � � � � � � �� � �To include an S object in the text (as opposed to in displayedtext), use
� � � �� � �
R – p.182/196
![Page 183: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/183.jpg)
Sweave
As an alternative to the noweb sysntax described above forchunks of code, LATEX syntax can be used
Here is an example from the article by Friedrich Leisch who isresponsible for Sweave
� � � � � � �� � �� � � � � � � � � � � � � � � � � � � � � �� �
� �� �� � � � � � � � � �
� �� �� � � � � � � � � �
� � � � � � � � �� � �� � � �� � � � � � � � �
� � � � � �� � �� �
XEmacs provides support for ��� � files: most importantly, to
include a code chunk, use M-n i
R – p.183/196
![Page 184: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/184.jpg)
Processing Sweave Documents
In R use
� � ��� �� � �� ���
��� � �� � �� � � � �� � � � � ��� ��� � �
Run LATEX on the resulting document
It is possible to use a Makefile also which processes thedocument as a batch job submitted to R and then runs LATEX
� � � �� �� � �� �� �� ! � � ��� �
�" # � � � ��� �� � �� ��� %$ ��� � �� � �& � � �� �� �� ! � � ��� � & � � '
� ( ( � ( � �� � ( ( � ( � � � � � � ( ( � � �� �
� �� �) � �� �� �� ! �
* � �,+ � ( � �� �� �� ! � � + � � �� �� �� ! � � * � �
+ � -+ * � � �� �� �� ! � � + � � �� �� �� ! � � + * �
Note that the line beginning with � � � should not be brokenafter the pipe symbol in the actual file
R – p.184/196
![Page 185: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/185.jpg)
Writing R Functions and Packages
R – p.185/196
![Page 186: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/186.jpg)
Package Structure
Packages provide a set of functions along with documentationwhich can extend the capability of R.
Typically the functions provided form a coherent set ofprocedures: for example
��� � � � � � � � � � ��� � provides a set offunctions which dealt with the hyperbolic distribution
A package consists of a subdirectory containing a file called� � � � � �� � �
and the sub-directories
� � � � � � � � �� � � � �
� � � � � � � � � � � � � � �� � � � �� � � � �� � � � �� � � � � � � � �
not allof which need be present.
The subdirectory may also contain files
� � �� � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � �
and
� � �� � � �
Files like
� � � � � � � � � � �� � � � and
� � �� �� �� � � are ignored byR but are possibly useful to users
R – p.186/196
![Page 187: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/187.jpg)
Files
Files
� � � � � � � � � �
and
� � � � �� � � �
are Bourne shell script filesexecuted before and after installation on Unix
� � �� � � �
contains a copy of the license to the package:should be just a reference to copies elsewhere (in
� �� � � � � � � � � � �
)
The subdirectory name should be the same as the packagename
Names may not include
� � � � � � � � � � � � � � � � � � � � � � � � � � �
and� � �
The function � �� � � ���
� � � � � � � � can be used to create thestructure for the new package
R – p.187/196
![Page 188: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/188.jpg)
The
� � � �
File
This file should have the form
� �" � �� �� + � � � �� �
� � � � � � � �� (
� �� �� - � �� ( � ( �
�� � �� ! � � � � � � " � � �" � � � � ��� � " � � � �
� � # � � � � � � � � � + � � � � ��� � � � � � + � � �� � ��� * � � � � � � �� ��� � �� #
" � � � � �� � � � � � � � � �� � � � � �� � � �� # � � � � � � � � � �� ��
! � � � � � � � � � � � � � � � � � + � � � � ��� � � � � � + � � �� � ��� * � � � � � � �� �
� �+ � � *� � � � ��� � �� � � � � � �
�� � � � � � � � ! � �
� � � " � �,+ � � � � � # � � � � � + �� �� � �+ # * � � " � �,+ � � � � � # ��
� # � + �" � �� � * � � �� * � # � �� � � � � � � � � ��� ��
� � " � � � �� � �� � � � � � � - � � � � � �
� �� � #� � + � � � � � � � � (+ � � �" � � � � � #� � + � � � � � � � �� � # � � � � � �
R – p.188/196
![Page 189: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/189.jpg)
Package Subdirectories
The
�
subdirectory contains R code in one or more filespreferably with the extension �
�
It should be possible to read in the files using � � � � � � � �
so Robjects should be created by assignments
The � �� subdirectory should contain documentation files forthe objects in the package in R documentation (Rd) format
File names must start with a letter and have the extension �
� �
C, C++, and FORTRAN code goes in the
� �� � �
directory
Source code is compiled and linked using
� � �� when
� � � � � � � � �
is run
R – p.189/196
![Page 190: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/190.jpg)
Package Subdirectories
The
� � � � � � subdirectory is for data files which can be loadedusing
� � � � � �
Data files can be plain R code (with extension �
�), tables (with
extension
��
� � � � � ��
� � � � � or � � �� ), or � �� � � �images (with
extension
��
� � � � � � or
��
� � � � )
The
� �� � � �
subdirectory is for R scripts, and needs a� � � � �� � �
file with one line for each demo, giving its nameand description
Subdirectory
� � � � � � �
is for test code
R – p.190/196
![Page 191: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/191.jpg)
Checking Packages
� � � � � � �
is the package checker. It checks such things asmissing cross references and duplicate aliases in help filesvalid filenames (for all operating systems)valid syntax in R filescorrect syntax and meta data in Rd filesno missing documentation entries
Also various code is exercisedexamples in documentation are runtests are runif a working LATEXis present, a
��
�� �
version of thepackage’s manual is created
R – p.191/196
![Page 192: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/192.jpg)
Building Packages
� � � � � � �
builds a package from its sources
The package is created in gzipped tar format
Some checks and cleanups are performed
R – p.192/196
![Page 193: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/193.jpg)
Name Spaces
This is a mechanism to prevent interference of two packageswhich happen to use objects of the same name
Specifies which variables in the package are to be exported tomake them available to package users, and which are toimported from other packages
Implemented by placing a
� � � � � � � � � � �file in the top level
package directory
� � � � � � � � � � �
file contains name space directives
Packages with name spaces are loaded and attached to thesearch path by calling
� �� �� � but only the exported variablesare placed in the attached frame
R – p.193/196
![Page 194: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/194.jpg)
Name Spaces
The � � �� � � directive is of the form
� � �� � � � � � � �
where
�
and � are variables to be exported
A regular expression can be used with � � �� � � � � � � � � �
Imports are specified with
� � � � �: � �� � � � � � � � � �� �
which imports all variables from packages
� � � and
� ��
� �� � � �� � � imports from a named package: � �� � � �� � � � � � � � � � � �
The usual method of fully specifying a variable can also beused, as
� � � � � �
but this is inefficient and not recommended
To use S3 method dispatch, methods should be registered:
�� � � � � � � �� � �� � � � �
ensures that �� � ��
� � � will be used as a �� � � method forclass
� � �
R – p.194/196
![Page 195: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/195.jpg)
S4 Classes and Methods
R – p.195/196
![Page 196: David J. Scott Department of Statistics, University of](https://reader034.vdocuments.site/reader034/viewer/2022043006/626b2405ed8da0139c6d9b4b/html5/thumbnails/196.jpg)
S4 Classes and Methods
I will use the slides of Yong Wang for this topic
R – p.196/196