II. Introduction to Tools for Computational Communication Research: R
Introduction to R for CCR
Hai Liang 梁海
Fudan University, 2014 FIST course “Computational Communication Research”
Why Programming + Why R
Why use R?
Inexpensive
Cross-platform
Extensible
Graphics better than many
You already know it
Familiarity with matrix algebra
Must be explicit
Need integrated calculator
Why avoid R?
Steep learning curve
Data cleaning can be difficult
Support limited
Add-on packages often needed to do what you need
Data types can be confusing
Limits to “Big Data”
Must be explicit
Indexing starts at 1 (not 0)
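To make the “starts at 1” point concrete, here is a minimal sketch with a made-up vector. Note also that a negative index drops an element rather than counting from the end, which often surprises people coming from 0-based languages.

```r
x <- c("a", "b", "c")   # made-up example vector

x[1]        # "a": the first element has index 1
x[0]        # character(0): index 0 selects nothing (no error)
x[4]        # NA: indexing past the end returns a missing value
x[-1]       # "b" "c": a negative index drops that element
```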
Outline
Section I:
1. R + RStudio
2. I/O
3. Basic Syntax
4. Data Structure
5. Programming Tools
Section II:
1. Data Management
2. Statistics
3. Hands-On
Readings
1. Torfs & Brauer (2014). A (very) short introduction to R.
2. Kabacoff (2011). R in action: Data analysis and graphics with R.
Section I
Data Structure
o Vector
o Matrix
o List
Programming
o If-statement
o For-loop
o Function
1. R + RStudio
1) Install R
2) Install RStudio
1. R + RStudio
1) RStudio layout
2) Working directory
3) R packages
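A minimal sketch of the working-directory and package steps. R's temporary directory stands in for a real project folder, and the install line is commented out because it needs an internet connection; reshape is simply the package used later in these slides.

```r
old_wd <- getwd()          # remember the current working directory
setwd(tempdir())           # switch to another folder (here: R's temp dir)
getwd()                    # relative file paths are now resolved from here
setwd(old_wd)              # switch back

# install.packages("reshape")   # download and install once (needs internet)
library(stats)                  # attach an installed package in each session
"stats" %in% loadedNamespaces()
```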
2. Input/Output
1) TXT, CSV: read.table(file=""), read.csv(file=""); write.table(), write.csv()
2) SAV, DTA: library(foreign); read.spss(file="", to.data.frame=TRUE), read.dta(file="")
3) Save & load .Rdata files: save(data, file="data.Rdata"), load("data.Rdata")
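A minimal round trip through the CSV and .Rdata functions listed above, using a made-up data frame and temporary files. The SPSS/Stata readers in foreign need real .sav/.dta files, so they are not exercised here.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)        # made-up data

csv_path <- tempfile(fileext = ".csv")
write.csv(df, file = csv_path, row.names = FALSE) # row.names = FALSE: no extra index column
df_csv <- read.csv(file = csv_path, stringsAsFactors = FALSE)

rdata_path <- tempfile(fileext = ".Rdata")
save(df, file = rdata_path)
rm(df)                          # remove the object from the session ...
restored <- load(rdata_path)    # ... and bring it back; load() returns the object names
restored                        # "df"
```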
3. R Basic Syntax
1) +, -, *, /, ^, sqrt
2) Variables
height <- 180; weight <- 50; print(height * weight)
3) Using functions
sum(1, 2, 3)
mean(c(1, 2, 3))
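R is case-sensitive, so Height and height are two different variables. The sketch below also shows why mean() needs its numbers wrapped in c(): its first argument is a single vector, and extra positional values are matched to its other arguments (trim and na.rm) instead of being averaged.

```r
height <- 180
weight <- 50
print(height * weight)      # 9000

exists("Height")            # FALSE unless you created it: names are case-sensitive

sum(1, 2, 3)                # 6: sum() adds all of its arguments
mean(c(1, 2, 3))            # 2: mean() expects one vector
mean(1, 2, 3)               # 1, not 2: the 2 and 3 are taken as trim and na.rm
```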
3. R Basic Syntax—Operators
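Arithmetic Operators

| Operator | Description | Example |
|---|---|---|
| `<-` | Assign a value | `a <- 1+2` |
| `+` | Add | `x+y` |
| `-` | Subtract | `x-y` |
| `*` | Multiply | `x*y` |
| `/` | Divide | `x/y` |
| `**` or `^` | Exponentiation | `x^y` or `x**y` |
| `%%` | Modulus | `x%%y` |
| `%/%` | Integer division | `x%/%y` |

Logical Operators

| Operator | Description | Example |
|---|---|---|
| `<`, `>` | Less than, greater than | `x<y` |
| `<=`, `>=` | Less than or equal to, greater than or equal to | `x>=y` |
| `==` | Equal to | `x==y` |
| `!=` | Not equal to | `x!=y` |
| `!` | Not | `!x` |
| `\|` | Or | `x \| y` |
| `&` | And | `x & y` |
| `isTRUE()` | Test if true | `isTRUE(x==y)` |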
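A few of the operators above applied to two made-up numbers. isTRUE() is stricter than ==: it only returns TRUE for a single, non-missing TRUE.

```r
x <- 7
y <- 3

x %% y                  # 1: remainder of 7 / 3
x %/% y                 # 2: integer part of 7 / 3
x^2 == x**2             # TRUE: ** is an alternative spelling of ^

x > y & y > 0           # TRUE: AND
x < y | y == 3          # TRUE: OR
!(x == y)               # TRUE: NOT
isTRUE(x == y)          # FALSE
isTRUE(c(TRUE, TRUE))   # FALSE: only a length-one TRUE counts
```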
4. Data Types
There are three basic modes of data (examples in parentheses), plus a marker for missing data:
Strings (“Why, hi there”)
Numbers (5)
TRUE/FALSE (TRUE)
Missing data (NA): note there are no quotes; “NA” in quotes is just a string
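The three modes correspond to R's character, numeric and logical classes. A minimal check, including the difference between the missing value NA and the string "NA":

```r
class("Why, hi there")   # "character"
class(5)                 # "numeric"
class(TRUE)              # "logical"

is.na(NA)                # TRUE: NA is R's missing-value marker
is.na("NA")              # FALSE: "NA" in quotes is just a two-letter string
mean(c(5, NA))           # NA: missing values propagate through calculations
```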
4. Data Structure
1) Vector
2) Matrices
3) Data frames
4) Lists
Source: Kabacoff (2011)
4. Data Structure – Vector
A vector is a sequence of values of a single mode
[numeric, logical, or character]
Define a vector
V <- c()
V <- c(1, 2, "hi")  # mixed values are coerced to character
V <- seq(5, 9, 0.5)
V <- c(1:7)
Vector access: V[1], V[1:2], V[c(1,3)]
Vector names: names(V) <- c("first", "second", "third")
V["first"]
Vector math: V {+,-,*,/} 1
V+V == V*2
V*V == V^2
sqrt(V)
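A minimal sketch of the vector operations above on made-up values. Note how mixing a string into c() turns every element into a string.

```r
V <- c(1, 2, "hi")
class(V)                 # "character": the numbers were coerced to strings
V                        # "1" "2" "hi"

V <- c(10, 20, 30)
names(V) <- c("first", "second", "third")
V["first"]               # look up an element by name
V[c(1, 3)]               # first and third elements

V + 1                    # the scalar 1 is recycled over every element
all(V + V == V * 2)      # TRUE
all(V * V == V^2)        # TRUE
sqrt(V)
```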
4. Data Structure – Matrices
Data in rows and columns (all of the same mode)
Define a matrix
m <- matrix()
matrix(1,5,5)  # 5 x 5 matrix filled with 1
V <- c(1:9)
m<-matrix(V,3,3)
Matrix access
m[1,2]; m[1,]; m[,2]
m[,2:3]; m[,c(1,3)]
Matrix math
m {+,-,*,/} 1
m+m == m*2
m%*%m  # matrix multiplication
cbind(m,m); rbind(m,m)
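The matrix operations above on the 3 by 3 example matrix. The main trap is that * multiplies element by element, while %*% is true matrix multiplication.

```r
V <- c(1:9)
m <- matrix(V, 3, 3)   # filled column by column: first column is 1, 2, 3
m[1, 2]                # row 1, column 2: 4
m[1, ]                 # first row: 1 4 7
m[, c(1, 3)]           # first and third columns

all(m + m == m * 2)    # TRUE: +, -, *, / act element by element
m * m                  # element-wise product (1, 4, 9, ...)
m %*% m                # matrix product; its [1, 1] entry is 1*1 + 4*2 + 7*3 = 30

dim(cbind(m, m))       # 3 6: columns placed side by side
dim(rbind(m, m))       # 6 3: rows stacked
```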
4. Data Structure – Data Frames
Like a matrix, but columns can have different modes
Define a data frame
Weights <- c(1:8)
Prices <- c(2:9)
Types <- c(T,F,F,…)
Data <- data.frame(Weights, Prices, Types)
Data frame access: Data[1,2]
Data$Prices, Data[["Prices"]]
Data frame math: Data$Weights * Data$Prices
mean(Data$Prices)
merge(data1, data2, by="Prices", all=T)
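A runnable version of the data-frame slide. The Types values are made up (the slide elides them), and data1/data2 are small hypothetical tables to show what all = TRUE does in merge().

```r
Weights <- c(1:8)
Prices  <- c(2:9)
Types   <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)  # made-up values
Data <- data.frame(Weights, Prices, Types)

Data[1, 2]                      # row 1, column 2 (Prices): 2
Data$Prices                     # a whole column as a vector
Data[["Prices"]]                # the same column, selected by name
Data$Weights * Data$Prices      # element-wise product of two columns
mean(Data$Prices)               # 5.5

# merge() with all = TRUE is an outer join: rows matching in only one table are kept
data1 <- data.frame(Prices = c(2, 3, 4), Weights = c(1, 2, 3))
data2 <- data.frame(Prices = c(3, 4, 5), Store = c("A", "B", "C"))
merged <- merge(data1, data2, by = "Prices", all = TRUE)
merged                          # 4 rows; unmatched cells are NA
```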
4. Data Structure – Lists
A list can combine vectors, matrices, data frames, etc.
Define a list
v1<-c(1,6,7,8)
v2<-c(2,4)
m<-matrix(1,2,4)
L <- list(v1, v2, m)
List access
L[[1]], L[["name"]]  # access by name requires named components, e.g. list(a=v1)
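A sketch of the list example with names added (the names v1, v2 and m are an illustrative choice) so that access by name works. The key distinction is [[ ]], which returns a component, versus [ ], which returns a smaller list.

```r
v1 <- c(1, 6, 7, 8)
v2 <- c(2, 4)
m  <- matrix(1, 2, 4)
L <- list(v1 = v1, v2 = v2, m = m)   # named components

L[[1]]        # the first component itself: a numeric vector
L[1]          # a sub-list of length 1 that contains that vector
L[["m"]]      # a component selected by name
L$v2          # $ is shorthand for [["v2"]]
length(L)     # 3
```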
5. Programming Tools
1) If-statement
2) For-loop
3) Function
if (cond) statement else statement
ifelse(test, yes, no)  # vectorized: works element by element
if (cond) {
statement
} else {
statement
}
5. Programming Tools
1) If-statement
2) For-loop
3) Function
An example
x <- 30
if (x > 50) {
  x = 100
  print(x)
} else if (x <= 50 & x > 10) {
  x = 50
  print(x)
} else {
  print(x)
}
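The same branching wrapped in a small function (the name recode_x is made up for this sketch) so it can be tried on several inputs. It also contrasts if, which expects a single TRUE/FALSE, with the vectorized ifelse().

```r
# Hypothetical helper reproducing the if / else if / else logic above
recode_x <- function(x) {
  if (x > 50) {
    100
  } else if (x <= 50 && x > 10) {
    50
  } else {
    x
  }
}
recode_x(80)   # 100
recode_x(30)   # 50
recode_x(5)    # 5

# if() needs one TRUE/FALSE; ifelse() tests every element of a vector
v <- c(80, 30, 5)
ifelse(v > 50, "high", "not high")
```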
5. Programming Tools
1) If-statement
2) For-loop
3) Function
for (name in expr_1) {statements}
while (cond) {statements}
x=c("LH","Jonanthan","winson","Qinjie")
for (name in x) {
print (nchar(name))
}
for (i in 1:4) {
print (nchar(x[i]))
}
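Two common refinements of the loops above: storing results instead of only printing them, and a while loop, which the slide lists but does not illustrate. The running-total example is made up.

```r
x <- c("LH", "Jonanthan", "winson", "Qinjie")

# Pre-allocate a result vector, then fill it inside the loop
lens <- integer(length(x))
for (i in seq_along(x)) {   # seq_along() is safer than 1:length(x) when x is empty
  lens[i] <- nchar(x[i])
}
lens                        # 2 9 6 6

# The same result without a loop: nchar() is vectorized
nchar(x)

# A while loop repeats until its condition becomes FALSE
total <- 0
i <- 0
while (total < 10) {
  i <- i + 1
  total <- total + i        # 1, 3, 6, 10
}
c(i = i, total = total)
```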
5. Programming Tools
1) If-statement
2) For-loop
3) Function
myfunction <- function(arg1=default,arg2,…) {
statements
return (objects)
}
space <- function(len=5,wid=20){
sp<-len*wid
return (sp)
}
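Calling the space() function above shows how default, positional and named arguments combine:

```r
space <- function(len = 5, wid = 20) {
  sp <- len * wid
  return(sp)
}

space()                  # both defaults: 5 * 20 = 100
space(2)                 # positional: len = 2, wid stays 20 -> 40
space(wid = 3)           # named: len stays 5 -> 15
space(len = 2, wid = 3)  # 6
```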
Hands-On
Exercise 1.
http://tryr.codeschool.com/
Homework 2.1
a. Create a list of length 10 whose k-th component is a k*k matrix: list[[1]] is 1*1, list[[2]] is 2*2, list[[3]] is 3*3, list[[4]] is 4*4, and so on. Fill the matrices with values selected randomly from 1:100.
b. For each component in the list, select the values > 50.
c. Write a function that returns sd(values)/mean(values) when length(values) > 1, and 0 otherwise.
d. Loop over the components to get 10 values, then calculate the sum of these 10 values.
e. Repeat the process many times. Can you find any patterns?
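One possible way to approach Homework 2.1, offered as a sketch rather than the official solution. It assumes the random values are drawn with replacement and uses made-up helper names (make_list, cv_or_zero, one_run).

```r
set.seed(1)  # for reproducibility; any seed works

# a. Ten square matrices; the k-th is k x k, values drawn from 1:100 (with replacement)
make_list <- function() {
  lapply(1:10, function(k) matrix(sample(1:100, k * k, replace = TRUE), k, k))
}

# c. sd/mean when there are at least two values, otherwise 0
cv_or_zero <- function(values) {
  if (length(values) > 1) sd(values) / mean(values) else 0
}

# b + d. Keep values > 50 in each component, apply the function, sum the 10 results
one_run <- function() {
  lst <- make_list()
  out <- numeric(length(lst))
  for (k in seq_along(lst)) {
    big <- lst[[k]][lst[[k]] > 50]   # logical indexing returns a plain vector
    out[k] <- cv_or_zero(big)
  }
  sum(out)
}

# e. Repeat many times and look at the distribution of the sums
runs <- replicate(200, one_run())
summary(runs)
hist_data <- table(cut(runs, breaks = 5))  # rough shape of the distribution
```

Comparing summary(runs) across different seeds or numbers of repetitions is one way to start looking for the pattern the exercise asks about.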
Section II
Data Management
o Aggregating
o Reshaping
Statistics
o Descriptive
o Graphics
o Linear Model
6. Data Management
Basics
Creating new variables
Recoding variables
Renaming variables
Missing values
Merging datasets
Subsetting datasets
Advanced
Aggregating datasets
The reshape package
o install.packages("reshape")
o library(reshape)
o cast()
o melt()
6. Data Management – Basics I
Creating a new variable
load("sampleData.Rdata") # set the working directory first
sampleData$fn <- sampleData$User_followers_count+sampleData$User_friends_count # error?! fails if the counts are stored as text or factors
sampleData$User_followers_count<-as.numeric(as.character(sampleData$User_followers_count)) # as.character() first, in case the column is a factor
sampleData$User_friends_count<-as.numeric(as.character(sampleData$User_friends_count))
sampleData$fn <- sampleData$User_followers_count+sampleData$User_friends_count
Exercise: calculate a variable indicating favorites per post
Recoding a variable
sampleData <- within(sampleData,{
popcat <- NA
popcat[User_followers_count > 282] <- "Popular"
popcat[User_followers_count <= 282] <- "Unpopular" })
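A toy illustration of why the numeric conversion matters. The data are hypothetical; the point is that as.numeric() on a factor returns its internal level codes, so converting through as.character() first is the safe route.

```r
# Hypothetical toy data: followers stored as a factor (read.csv() did this
# by default before R 4.0), friends stored as character
d <- data.frame(followers = factor(c("500", "20", "282")),
                friends   = c("10", "5", "7"),
                stringsAsFactors = FALSE)

as.numeric(d$followers)                               # 3 1 2: level codes, NOT the counts
d$followers <- as.numeric(as.character(d$followers))  # 500 20 282
d$friends   <- as.numeric(d$friends)                  # character converts directly
d$fn <- d$followers + d$friends                       # 510 25 289

# Recode into categories with within(), as on the slide
d <- within(d, {
  popcat <- NA
  popcat[followers > 282]  <- "Popular"
  popcat[followers <= 282] <- "Unpopular"
})
d$popcat                                              # "Popular" "Unpopular" "Unpopular"
```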
6. Data Management – Basics II
Renaming variables
load("sampleData.Rdata")
colnames(sampleData)[c(2,3)]<-c("article_id","content")
Missing values
load("sampleData.Rdata")
sampleData$User_verified_reason[nchar(sampleData$User_verified_reason)==0]<-NA
is.na(sampleData$User_verified_reason)
sampleData$User_verified_reason[is.na(sampleData$User_verified_reason)]<-"Unknown"
sum(c(1,2,3,NA))  # returns NA ?!
sum(c(1,2,3,NA), na.rm=TRUE)  # returns 6
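The missing-value steps above, run on a made-up column, plus the renaming step on a toy data frame:

```r
# Hypothetical column where empty strings stand for a missing reason
reason <- c("journalist", "", "official account", "")

reason[nchar(reason) == 0] <- NA       # empty string -> NA
is.na(reason)                          # FALSE TRUE FALSE TRUE
reason[is.na(reason)] <- "Unknown"     # replace NA with a label

sum(c(1, 2, 3, NA))                    # NA: one missing input makes the sum missing
sum(c(1, 2, 3, NA), na.rm = TRUE)      # 6: NA is dropped before summing

# Renaming columns by position
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
colnames(df)[c(2, 3)] <- c("article_id", "content")
names(df)                              # "a" "article_id" "content"
```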
6. Data Management – Basics III
Subsetting datasets
o Selecting/keeping variables
newdata<-sampleData[,c(1,3)]
newdata<-sampleData[,c("created_at","mid")]
o Dropping variables
newdata<-sampleData[!(names(sampleData)%in%c("text","source"))]
newdata<-sampleData[c(-3,-4)]
sampleData$text<-NULL
o Selecting observations
newdata<-sampleData[c(2:30),]
newdata<-sampleData[which(sampleData$User_gender=="m" &
sampleData$User_verified=="FALSE"),]
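The subsetting idioms above on a small hypothetical data frame; subset() is shown as an equivalent base-R alternative that the slide does not use.

```r
# Hypothetical toy data standing in for sampleData
df <- data.frame(mid           = 1:4,
                 text          = c("a", "b", "c", "d"),
                 User_gender   = c("m", "f", "m", "m"),
                 User_verified = c(FALSE, FALSE, TRUE, FALSE),
                 stringsAsFactors = FALSE)

keep    <- df[, c("mid", "User_gender")]            # keep columns by name
dropped <- df[!(names(df) %in% c("text"))]          # drop columns by name
rows    <- df[which(df$User_gender == "m" & !df$User_verified), ]  # both conditions hold

# subset() gives the same rows with less repetition of df$
rows2   <- subset(df, User_gender == "m" & !User_verified)
```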
6. Data Management – Basics IV
Merging datasets
o Adding columns
load("netD.Rdata")
load("sampleData.Rdata")
newdata<-merge(netD,sampleData[,c("User_screen_name","User_gender")],
by.x="sender",by.y="User_screen_name")
newdata <-newdata[order(newdata$sender, newdata$receiver),] # sort dataset
o Adding rows
total <- rbind(dataframeA,dataframeB) # both data frames must have the same columns
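The merge and sort steps above on a made-up edge list and user table (the column names mirror the slide, the values are invented):

```r
# Hypothetical toy tables standing in for netD and sampleData
netD <- data.frame(sender   = c("bob", "amy", "amy"),
                   receiver = c("amy", "cat", "bob"),
                   stringsAsFactors = FALSE)
users <- data.frame(User_screen_name = c("amy", "bob", "cat"),
                    User_gender      = c("f", "m", "f"),
                    stringsAsFactors = FALSE)

# Attach the sender's gender; by.x / by.y name the key column in each table
newdata <- merge(netD, users, by.x = "sender", by.y = "User_screen_name")
newdata <- newdata[order(newdata$sender, newdata$receiver), ]  # sort rows

# rbind() stacks rows; the column names must match
more  <- data.frame(sender = "cat", receiver = "amy", stringsAsFactors = FALSE)
total <- rbind(netD, more)
```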
6. Data Management – Advanced I
Aggregating
o aggregate(data,by,FUN)
load("sampleData.Rdata")
sampleData$User_followers_count<-as.numeric(as.character(sampleData$User_followers_count))
aggregate(sampleData$User_followers_count,by=list(sampleData$User_verified),FUN="mean",na.rm=T)
aggregate(sampleData$User_followers_count,by=list(sampleData$User_verified),FUN="median",na.rm=T)
o aggregate(y~x,data,FUN)
aggregate(User_followers_count~User_verified,data=sampleData,FUN="median",na.rm=T)
o aggregate(cbind(y1,y2)~x,data,FUN)
aggregate(cbind(User_followers_count,User_friends_count)~User_verified,data=sampleData,FUN="median",na.rm=T)
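The three aggregate() forms above, run on a tiny hypothetical data set so the group results can be checked by hand:

```r
# Hypothetical toy data standing in for sampleData
d <- data.frame(verified  = c(TRUE, TRUE, FALSE, FALSE, FALSE),
                followers = c(100, 300, 10, 20, 30),
                friends   = c(50, 70, 5, 15, 25))

# Default interface: values, a list of grouping variables, and a function
agg1 <- aggregate(d$followers, by = list(verified = d$verified), FUN = mean)
agg1    # mean followers: 20 for FALSE, 200 for TRUE

# Formula interface: response ~ grouping variable
agg2 <- aggregate(followers ~ verified, data = d, FUN = median)

# Several responses at once: bind them with cbind() on the left of ~
agg3 <- aggregate(cbind(followers, friends) ~ verified, data = d, FUN = median)
agg3    # medians per group: followers 20 / 200, friends 15 / 60
```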
6. Data Management – Advanced II
Reshaping data with the melt() and cast() functions: see Kabacoff (2011), p. 115.
6. Data Management – Advanced II
With aggregation
load("netD.Rdata")
library("reshape")
netD$freq<-1
withagg<-cast(netD,sender~receiver,sum)
Without aggregation
withoutagg<-cast(netD,sender+receiver~issue)
Reverse aggregation
nda<-melt(withoutagg,id=c("sender","receiver"))
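The slide relies on cast() and melt() from the reshape package. As a swapped-in base-R alternative (not the reshape API), xtabs() can build the same kind of sender-by-receiver count matrix, and as.data.frame() turns it back into long format. The edge list is made up.

```r
# Hypothetical toy edge list standing in for netD
netD <- data.frame(sender   = c("amy", "amy", "bob", "amy"),
                   receiver = c("bob", "bob", "amy", "cat"),
                   stringsAsFactors = FALSE)
netD$freq <- 1

# Long -> wide with aggregation: sum of freq for each sender/receiver pair
withagg <- xtabs(freq ~ sender + receiver, data = netD)
withagg

# Wide -> long again: one row per sender/receiver combination, count in Freq
long <- as.data.frame(withagg)
```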
7. R for Statistics
Descriptive
Descriptive statistics
Chi-square test
T-test
Correlation
One-way ANOVA
Graphs
Bar plot
Histogram
Scatter plot
Linear Model
Estimation
Diagnostics
Model information
7. R for Statistics – Descriptive I
Descriptive statistics
via summary()
load("sampleData.Rdata")
sampleData$User_followers_count<-as.numeric(as.character(sampleData$User_followers_count))
sampleData$User_friends_count<-as.numeric(as.character(sampleData$User_friends_count))
sampleData$User_gender<-as.factor(sampleData$User_gender)
summary(sampleData[c("User_followers_count","User_friends_count","User_gender")])
via by()
by(sampleData[c("User_followers_count","User_friends_count")],sampleData$User_gender,summary)
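summary(), by() and the closely related tapply() on toy data (hypothetical values), where the group means are easy to verify by hand:

```r
# Hypothetical toy data standing in for sampleData
d <- data.frame(followers = c(100, 300, 10, 20, 30),
                gender    = factor(c("f", "m", "f", "m", "f")))

summary(d)                                # quartiles and mean for numbers, counts for factors
by(d["followers"], d$gender, summary)     # the same summary, computed per gender
group_means <- tapply(d$followers, d$gender, mean)
group_means                               # f: about 46.67, m: 160
```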
7. R for Statistics – Descriptive II
Descriptive statistics
via table()
table(sampleData$User_gender,sampleData$User_verified)
table(netD$sender,netD$receiver) # edgelist=>matrix
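Chi-square test (a significant result suggests the two variables are not independent)
install.packages("vcd")  # vcd provides assocstats() on the next slide; chisq.test() is in base R
library(vcd)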
mytable<-xtabs(~User_gender+User_verified, data=sampleData)
mytable<-table(sampleData$User_gender,sampleData$User_verified)
chisq.test(mytable)
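A self-contained chi-square test on a made-up 2 by 2 table. The counts are chosen so the result is clear-cut; with real data, chisq.test() warns when expected cell counts are small.

```r
# Hypothetical data: 80 users, gender by verified status
d <- data.frame(gender   = rep(c("f", "m", "f", "m"), times = c(30, 10, 10, 30)),
                verified = rep(c("no", "no", "yes", "yes"), times = c(30, 10, 10, 30)))

mytable <- xtabs(~ gender + verified, data = d)  # same counts as table(d$gender, d$verified)
mytable
res <- chisq.test(mytable)   # 2 x 2 tables get Yates' continuity correction by default
res$statistic                # 18.05 for these counts
res$p.value                  # far below 0.05: gender and verification look dependent
```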
7. R for Statistics – Descriptive III
Descriptive statistics
Association between categorical variables (assocstats() comes from the vcd package)
mytable<-xtabs(~User_gender+User_verified, data=sampleData)
assocstats(mytable)
Correlation
cor(sampleData[c("User_followers_count","User_friends_count")],method="spearman",use="complete.obs")
cor.test(sampleData$User_followers_count, sampleData$User_friends_count, alternative="two.sided", method="pearson")
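A small illustration of the difference between the two correlation methods above, on made-up data where y grows with x but not linearly: Pearson correlation falls below 1, while Spearman's rank correlation is exactly 1.

```r
# Hypothetical toy data: a monotone but curved relationship
x <- c(1, 2, 3, 4, 5, 6)
y <- exp(x)

cor(x, y, method = "pearson")    # below 1: the points do not lie on a straight line
cor(x, y, method = "spearman")   # 1: the ranks agree perfectly
ct <- cor.test(x, y, alternative = "two.sided", method = "pearson")
ct$estimate
ct$p.value
```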
7. R for Statistics – Descriptive IV
Descriptive statistics
T-test
t.test(User_followers_count~User_gender,sampleData) # gender difference in the number of followers
t.test(sampleData$User_followers_count, sampleData$User_friends_count) # followers vs. friends
One-way ANOVA
fit<-aov(User_followers_count~User_province,data=sampleData)
summary(fit)
TukeyHSD(fit)
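Self-contained versions of the t-test and one-way ANOVA calls, on made-up numbers whose group means are easy to check (a and b differ by 10; provinces x, y and z are 10 apart).

```r
# Hypothetical toy data for a two-group t-test
d2 <- data.frame(y = c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15),
                 g = rep(c("a", "b"), each = 5))
tt <- t.test(y ~ g, data = d2)   # Welch two-sample t-test by default
tt$statistic                     # t = -10 for these numbers

# Hypothetical toy data for a one-way ANOVA across three provinces
d3 <- data.frame(y = c(1, 2, 3, 11, 12, 13, 21, 22, 23),
                 province = factor(rep(c("x", "y", "z"), each = 3)))
fit <- aov(y ~ province, data = d3)
summary(fit)                     # overall F test
tk <- TukeyHSD(fit)              # pairwise differences with adjusted p-values
tk$province
```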
7. R for Statistics – Graphs I
count<-table(sampleData$User_verified)
barplot(count,main="Simple Bar Plot",
xlab="Verified", ylab="Frequency")
barplot(count,main="Simple Bar Plot",
xlab="Verified", ylab="Frequency", horiz=TRUE)
7. R for Statistics – Graphs II
hist(sampleData$User_followers_count,
breaks=20,
col="red",
xlab="Number of followers",
main="Colored histogram with 20 bins")
hist(sampleData$User_followers_count,
freq=FALSE, # new line: plot densities instead of counts
breaks=20,
col="red",
xlab="Number of followers",
main="Histogram, rug plot, density curve")
rug(jitter(sampleData$User_followers_count))
lines(density(sampleData$User_followers_count), col="blue", lwd=2)
7. R for Statistics – Graphs III
plot(sampleData$User_followers_count, sampleData$User_friends_count,
main="Basic Scatter plot of Followers vs. Friends", xlab="No. of Followers",
ylab="No. of Friends", pch=19)
abline(lm(User_friends_count~User_followers_count,data=sampleData), col="red", lwd=2, lty=1) # y ~ x, matching the plot's axes
lines(lowess(sampleData$User_followers_count,sampleData$User_friends_count), col="blue", lwd=2, lty=2)
?!
7. R for Statistics – Graphs III
plot(log(sampleData$User_followers_count),log(sampleData$User_friends_count), main="Basic Scatter plot of Followers vs. Friends",xlab="log_No. of Followers", ylab="log_No. of Friends", pch=19)
abline(lm(log(User_friends_count)~log(User_followers_count),data=sampleData), col="red", lwd=2, lty=1) # zero counts give log(0) = -Inf and must be handled first
lines(lowess(log(sampleData$User_followers_count),log(sampleData$User_friends_count)), col="blue", lwd=2, lty=2)
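A self-contained version of the log-log scatter plot, using simulated skewed counts (purely hypothetical) and drawing to a temporary PDF file. Adding 1 before taking logs avoids log(0) = -Inf, which would otherwise break lm() and lowess(); whether that adjustment suits the real data is an assumption.

```r
set.seed(42)
# Hypothetical skewed counts, loosely mimicking follower/friend numbers
followers <- round(rlnorm(200, meanlog = 5, sdlog = 1.5)) + 1  # +1 keeps log() finite
friends   <- round(sqrt(followers) * rlnorm(200, 0, 0.5)) + 1

pdf(tempfile(fileext = ".pdf"))   # draw to a temporary file instead of the screen
lx <- log(followers)
ly <- log(friends)
plot(lx, ly, pch = 19, xlab = "log No. of Followers", ylab = "log No. of Friends")
fit_xy <- lm(ly ~ lx)                        # y-axis variable on the left of ~
abline(fit_xy, col = "red", lwd = 2)         # so the fitted line matches the plotted axes
lines(lowess(lx, ly), col = "blue", lwd = 2, lty = 2)
dev.off()
```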
7. R for Statistics – Linear Model I
Estimation
Linear models are estimated using the lm() function. It is a good idea to assign the model to an object so that model information can be accessed later. The dependent variable is listed first; all independent variables follow the ~.
fit<-lm(log(User_followers_count)~log(User_friends_count)+User_gender+as.factor(User_verified), data=sampleData)
Let's see the results…
summary(fit)
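A minimal lm() sketch on simulated data with a known structure (all values hypothetical), so you can check that the estimates land near the true coefficients 2, 3 and 1.5. The factor predictor shows up as one dummy coefficient, groupb.

```r
set.seed(1)
# Hypothetical data: y = 2 + 3 * x + 1.5 * (group == "b") + noise
n <- 100
x <- runif(n, 0, 10)
group <- factor(sample(c("a", "b"), n, replace = TRUE))
y <- 2 + 3 * x + 1.5 * (group == "b") + rnorm(n, sd = 0.5)
d <- data.frame(y, x, group)

fit <- lm(y ~ x + group, data = d)   # response first, predictors after ~
summary(fit)                          # coefficients, standard errors, R-squared
coef(fit)                             # estimates close to 2, 3 and 1.5
```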
7. R for Statistics – Linear Model II
Post-estimation Plots
There is a series of diagnostic plots available with the plot() command.
plot(fit)
This is also accessible in a single chart
layout(matrix(c(1,2,3,4),2,2))
plot(fit)
[Figure: four diagnostic plots from plot(fit): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance); observations such as 47, 162 and 172 are flagged in several panels]
7. R for Statistics – Linear Model III
Information
In addition, quite a bit of information is available within the fit object (a comprehensive list is here http://www.inside-r.org/r-doc/stats/lm).
x<-residuals(fit) – Access the model residuals
plot(x) – Plot the residuals
abline(a=0, b=0, col="red") – Add a horizontal line: intercept (a) = 0, slope (b) = 0, in red (col)
confint(fit) – Confidence intervals for each coefficient
fitted(fit) – Predicted values
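The accessor functions above on a small simulated regression (hypothetical data). A handy sanity check is that fitted values plus residuals reproduce the observed response.

```r
set.seed(2)
# Hypothetical toy regression
x <- 1:30
y <- 1 + 0.5 * x + rnorm(30)
fit <- lm(y ~ x)

res  <- residuals(fit)   # observed minus fitted
pred <- fitted(fit)      # predicted values
ci   <- confint(fit)     # 95% intervals, one row per coefficient

pdf(tempfile(fileext = ".pdf"))     # draw to a temporary file
plot(res)
abline(a = 0, b = 0, col = "red")   # horizontal reference line at zero
dev.off()
```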
Hands-On
Homework 3.1
a. Read the CSV file “authorlist.csv”.
b. Select the columns “Author Name” and “Discipline”. The Discipline variable contains one word or a set of words.
c. Output: a co-occurrence matrix M whose cells count the authors that list both the row word and the column word, e.g., communication and health: 5 authors.
Test the hypothesis:
People are more inclined to follow users who are verified, controlling for gender and the number of friends.
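One possible approach to part c, sketched on a made-up stand-in for authorlist.csv. The separator between discipline words in the real file is unknown, so "; " is an assumption. Note that read.csv() renames “Author Name” to Author.Name unless check.names = FALSE.

```r
# Hypothetical stand-in for authorlist.csv
authors <- data.frame(`Author Name` = c("A", "B", "C"),
                      Discipline    = c("communication; health",
                                        "communication",
                                        "health; communication; sociology"),
                      check.names = FALSE, stringsAsFactors = FALSE)
# With the real file:
# authors <- read.csv("authorlist.csv", check.names = FALSE, stringsAsFactors = FALSE)

sel   <- authors[, c("Author Name", "Discipline")]
parts <- strsplit(sel$Discipline, ";\\s*")   # one character vector per author
words <- sort(unique(unlist(parts)))

# Author-by-word incidence matrix: 1 if the author lists the word
X <- t(sapply(parts, function(p) as.integer(words %in% p)))
colnames(X) <- words

# crossprod(X) = t(X) %*% X counts the authors sharing each pair of words;
# the diagonal counts the authors per word
M <- crossprod(X)
M
```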
THANK YOU & CONTACT US @
weblab.com.cityu.edu.hk