
The R Project for statistical computing

Eric Fouh, Christopher Poirel

CS 5604

Fall 2010


What is R?


Usages of R

• statistics system

• data handling and storage facility

• calculations on arrays, in particular matrices

• integrated collection of tools for data analysis

• graphical tool for data analysis

• programming language (a dialect of the S language)


Structure of R

• R functions and datasets are stored in packages

• Hundreds of contributed packages (written by different authors) are available

• R is provided with 25 “standard” packages, including:

Package     Description
base        Base R functions
datasets    Base R datasets
graphics    R functions for base graphics
stats       R statistical functions
utils       R utility functions
Matrix      Sparse and dense matrix classes and methods
class       Functions for classification
cluster     Functions for cluster analysis


R and Information Retrieval

IR concepts and the R packages that support them:

• Text preprocessing; term weighting and scoring: tm package. Constructs a term-document matrix using one of the weighting functions TF (weightTf) or TF-IDF (weightTfIdf), e.g.
  > library(tm)
  > data("crude")
  > tdm <- TermDocumentMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE))

• Vector space model for scoring: clv package. The dot.product function returns the cosine similarity of two vectors.

• Vector space classification: class package. Performs k-nearest-neighbour classification (knn) on a dataset.

• Hierarchical clustering: cluster package. Computes agglomerative hierarchical clusterings (agnes) of a dataset.

• Latent Semantic Indexing: base package. Performs the singular value decomposition (svd) of a matrix.
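The vector space rows of this table can be tried in base R alone. This is a minimal sketch under stated assumptions: the toy matrix `tdm`, the query, and the helper `cosine_sim` are hypothetical (the helper is not clv's dot.product), and `hclust` from the stats package stands in for the cluster package's functions. Documents are ranked by cosine similarity to the query, then clustered agglomeratively on the distance 1 - cosine similarity.

```r
# Toy term-document matrix (hypothetical data): rows = terms, columns = documents
tdm <- matrix(c(2, 0, 1,
                1, 1, 0,
                0, 3, 0,
                0, 1, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("oil", "price", "opec", "crude"),
                              c("d1", "d2", "d3")))

# Hypothetical helper: cosine similarity = dot product / (product of vector lengths)
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Vector space scoring: rank documents by cosine similarity to a query
query <- c(oil = 1, price = 1, opec = 0, crude = 0)
scores <- apply(tdm, 2, cosine_sim, b = query)
ranking <- sort(scores, decreasing = TRUE)
print(ranking)

# Pairwise document similarities: cross-product of the columns, length-normalized
norms <- sqrt(colSums(tdm^2))
sim <- crossprod(tdm) / outer(norms, norms)

# Agglomerative hierarchical clustering (average linkage) on 1 - cosine similarity
hc <- hclust(as.dist(1 - sim), method = "average")
groups <- cutree(hc, k = 2)   # cut the dendrogram into two clusters
print(groups)
```

With a real collection, `tdm` could instead be the `as.matrix()` of a tm term-document matrix.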


Getting started with R

• To start R (from the operating-system shell)
  R
• To quit R
  > q()
• To see installed packages
  > library()
• To load a package
  > library(class)
• To start the HTML help system
  > help.start()
• To create a vector
  > x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
• To create a matrix
  > x <- array(1:20, dim = c(4, 5))  # a 4-by-5 array filled column by column with the numbers 1 to 20
• To display an object
  > x
• To delete an object
  > rm(x)
• To load data from a file
  > HousePrice <- read.table("houses.data")
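The read.table command above needs a file on disk. A minimal sketch, assuming a whitespace-separated file laid out like the houses.data example in R's "An Introduction to R" manual; the temporary file and its values are hypothetical. Because the header line has one field fewer than the data lines, read.table treats the first line as a header and the first column as row names.

```r
# Write a small hypothetical data file: header has one field fewer than the rows
path <- tempfile(fileext = ".data")
writeLines(c("Price Floor Area",
             "01 52.00 111.0 830",
             "02 54.75 128.0 710"),
           path)

HousePrice <- read.table(path)   # header and row names detected from the layout
print(HousePrice)                # display the resulting data frame
mean(HousePrice$Price)           # average price of the listed houses

unlink(path)                     # delete the temporary file
```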


Examples (1)

• Term-Document Matrix
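A minimal base-R sketch of building a term-document matrix with raw TF and TF-IDF weights, assuming three hypothetical toy documents, a crude lower-case tokenizer, no stopword removal, and the plain weighting idf_t = log(N / df_t). tm's weightTfIdf may normalize term frequencies and use a different log base, so its numbers need not match this sketch.

```r
# Three hypothetical toy documents
docs <- c(d1 = "Oil prices rise as OPEC cuts oil output",
          d2 = "OPEC meets to discuss crude output",
          d3 = "Stock prices fall")

# Crude tokenizer: lower-case, split on non-letters, drop empty strings
tokens <- lapply(docs, function(d) {
  w <- strsplit(tolower(d), "[^a-z]+")[[1]]
  w[nzchar(w)]
})

# Sorted vocabulary and raw term-frequency matrix: rows = terms, columns = docs
terms <- sort(unique(unlist(tokens, use.names = FALSE)))
tf <- vapply(tokens,
             function(w) as.numeric(table(factor(w, levels = terms))),
             numeric(length(terms)))
rownames(tf) <- terms

# TF-IDF with idf_t = log(N / df_t), where df_t = number of docs containing t
N <- ncol(tf)
doc_freq <- rowSums(tf > 0)
idf <- log(N / doc_freq)
tfidf <- tf * idf   # idf is recycled down each column, so row t is scaled by idf_t

print(tf)
print(round(tfidf, 3))
```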



Examples (2)

• Eigenvalues and eigenvectors
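A minimal sketch of eigenvalues and eigenvectors with base R's eigen(), assuming a hypothetical 2-term by 3-document matrix C. It checks the defining property A v = λ v, rebuilds A from its symmetric diagonal decomposition Q diag(λ) Q^T, and confirms that the eigenvalues of C C^T are the squared singular values of C, which links this example to LSI.

```r
# Hypothetical 2-term x 3-document matrix C
C <- matrix(c(1, 0, 1,
              0, 1, 1), nrow = 2, byrow = TRUE)

# Term-term matrix A = C C^T is symmetric: here A = [2 1; 1 2]
A <- C %*% t(C)

e <- eigen(A, symmetric = TRUE)   # $values in decreasing order, $vectors as columns
print(e$values)                   # 3 and 1 for this matrix

# Check the defining property A v = lambda v for every eigenpair
for (k in seq_along(e$values)) {
  v <- e$vectors[, k]
  stopifnot(isTRUE(all.equal(as.vector(A %*% v), e$values[k] * v)))
}

# Symmetric diagonal decomposition: A = Q diag(lambda) Q^T
Q <- e$vectors
A_rebuilt <- Q %*% diag(e$values) %*% t(Q)
print(all.equal(A, A_rebuilt))

# The eigenvalues of C C^T are the squared singular values of C
print(all.equal(e$values, svd(C)$d^2))
```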


Examples (3)


• Low-rank approximation
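A minimal sketch of a rank-k approximation via the SVD, as used in Latent Semantic Indexing, assuming a hypothetical 4-term by 3-document matrix and target rank k = 2; the variable names are illustrative. Keeping the k largest singular values gives C_k = U_k S_k V_k^T, and its Frobenius-norm error equals the square root of the sum of the discarded squared singular values.

```r
# Hypothetical 4-term x 3-document matrix
C <- matrix(c(1, 0, 1,
              0, 1, 0,
              1, 1, 0,
              0, 0, 1), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("t", 1:4), paste0("d", 1:3)))

s <- svd(C)   # C = U diag(d) V^T, singular values d in decreasing order
k <- 2        # target rank

# Keep the k largest singular values and their singular vectors
U_k <- s$u[, 1:k, drop = FALSE]
S_k <- diag(s$d[1:k], nrow = k)   # nrow = k keeps this a matrix even when k = 1
V_k <- s$v[, 1:k, drop = FALSE]

# Rank-k approximation C_k = U_k S_k V_k^T
C_k <- U_k %*% S_k %*% t(V_k)
dimnames(C_k) <- dimnames(C)
print(round(C_k, 3))

# Frobenius-norm error equals the root of the discarded squared singular values
err <- sqrt(sum((C - C_k)^2))
print(all.equal(err, sqrt(sum(s$d[-(1:k)]^2))))

# LSI view: each document becomes a k-dimensional vector (a column of S_k V_k^T)
docs_k <- S_k %*% t(V_k)
colnames(docs_k) <- colnames(C)
print(round(docs_k, 3))
```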


Resources

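• IIR book: C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008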

• http://www.r-project.org/

Questions?