Computing for Research I, Spring 2012
Primary Instructor: Elizabeth Garrett-Mayer
Stata Graphics
February 16
Basic syntax for commands
• prefix: command varlist, options
• Examples:
– regress y x, level(90)
– by race: sum y x, detail
– ttest y, by(x) unequal
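The pattern prefix: command varlist, options can be tried on any dataset. The sketch below is only an illustration: it assumes Stata's shipped auto dataset rather than the course data, with foreign, mpg, and weight standing in for race, y, and x.

```stata
* load an example dataset that ships with Stata (an assumption; not the course data)
sysuse auto, clear

* command varlist, options
regress mpg weight, level(90)

* prefix: command varlist, options
* (bysort sorts by foreign first; plain "by" requires data that are already sorted)
bysort foreign: summarize mpg weight, detail

* two-sample t test that does not assume equal variances
ttest mpg, by(foreign) unequal
```

Note that by race: only runs if the data are already sorted by race, which is why the sketch uses bysort.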
Stata Graphics
• Maybe we can just end class now!
• Check out these links:
– http://www.ats.ucla.edu/stat/stata/library/GraphExamples/default.htm
– http://www.ats.ucla.edu/stat/stata/topics/graphics.htm
– http://data.princeton.edu/stata/graphics.html
– http://www.stata.com/capabilities/graphics.html
Basic univariate displays
• Boxplots
• Stem and leaf
• Histograms
• Density plots
Ceramide Data
• Let’s look at the ceramide markers
• What are their distributions?
• Are there outliers?
• Should we consider taking logs, or using % change?
Results of a phase II trial of gemcitabine plus doxorubicin in patients with recurrent head and neck cancers: serum C₁₈-ceramide as a novel biomarker for monitoring response. Saddoughi SA, Garrett-Mayer E, Chaudhary U, O'Brien PE, Afrin LB, Day TA, Gillespie MB, Sharma AK, Wilhoit CS, Bostick R, Senkal CE, Hannun YA, Bielawski J, Simon GR, Shirai K, Ogretmen B. Clin Cancer Res. 2011 Sep 15;17(18):6097-105. Epub 2011 Jul 26.
Histogram
• hist c18
[Figure: histogram of C18 ceramide. y-axis: Density; x-axis: C18 ceramide]
Let’s make it prettier
* prettier histograms
hist c18, freq xaxis(1 2) ylabel(0(2)24) xlabel(20 "Twenty" 40 "Forty")
hist c18, title("Histogram of C18 Ceramide") subtitle("PI: K. Shirai")
hist c18, ytitle("number of patients") freq yline(0(10)20)
hist c18, xaxis(1 2) xlabel(19.6 "mean" 11.9 "median", axis(2) grid)
Finding help on these options can sometimes be tricky! E.g., help axis_choice_options
[Figures: four histograms of C18 ceramide. (1) Frequency scale, y ticks 0 to 24 by 2, second x-axis labeled "Twenty" and "Forty". (2) Density scale, title "Histogram of C18 Ceramide", subtitle "PI: K. Shirai". (3) Density scale, second x-axis marking "mean" and "median". (4) Frequency scale, y-axis title "number of patients".]
Boxplots
• graph box c18
[Figure: boxplot of C18 ceramide. y-axis: C18 ceramide]
Boxplots
graph box c18, by(cycle)
graph box c18, over(cycle)

tab cycle
graph box c18 if cycle<7, over(cycle)

sort patient cycle
merge m:1 patient using "Ptdata.GemDox.dta"
graph box c18 if cycle<7, over(cycle) over(gender)
graph hbox c18, over(initial) capsize(5)
[Figures: boxplots of C18 ceramide. (1) Separate panels by cycle ("Graphs by Cycle"). (2) Over cycle, cycles 1 to 19. (3) Over cycle for cycle<7 (cycles 1, 3, 5). (4) Over cycle and gender (f, m). (5, 6) Horizontal boxplots over initial (SD, PR, PD, CR); x-axis: C18 ceramide.]
graph hbox c18, over(initial) capsize(5)
graph hbox c18, over(initial) medtype(marker) medmarker(msymbol(+) msize(large))
graph hbox c18, over(initial) ytitle("C18")
Labels
• Sometimes xlabels cannot be applied (e.g. boxplots)
• Need to label your values
• Example: cycle for boxplots
– label define cycle 1 "cycle 1" 3 "cycle 3" 5 "cycle 5" 7 "cycle 7"
– label values cycle cycle
– graph box c18 if cycle<7, over(cycle)
• (Hint: use this on the homework!)
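The same label define / label values pattern can be tried without the ceramide data. This is a minimal sketch assuming Stata's shipped auto dataset; the label name replbl and the label text are made up for illustration.

```stata
* example dataset shipped with Stata (an assumption; not the course data)
sysuse auto, clear

* rep78 is a numeric repair score (1 to 5) with no value labels attached
label define replbl 1 "rep 1" 2 "rep 2" 3 "rep 3" 4 "rep 4" 5 "rep 5"
label values rep78 replbl

* over() now prints the value labels on the category axis;
* rep78 < 4 also drops missing values, which Stata treats as larger than any number
graph box mpg if rep78 < 4, over(rep78)
```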
Stem and Leaf
. stem c18
Stem-and-leaf plot for c18ceramide (C18 ceramide)
c18ceramide rounded to nearest multiple of .1
plot in units of .1
  0** | 42,43,44,46
  0** | 57,57,67,81,89,90,96,98,99,99
  1** | 01,06,08,08,14,15,19,20,35,44
  1** | 62
  2** | 03,15,16,18,19,19,22
  2** | 82
  3** | 17
  3** |
  4** | 23,49
  4** | 58,68,68
  5** |
  5** |
  6** | 37
  6** | 86
Dotplot
• Excellent way to show data across groups when you have a relatively small dataset
• dotplot y, over(group)
dotplot c18, over(cycle)
dotplot c18, over(gender)
dotplot c18, over(gender) nogroup
dotplot c18, over(gender) nogroup jitter(3)
dotplot c18, over(gender) nogroup median center
Dotplot, by gender
[Figure: dotplot of C18 ceramide by gender (f, m). y-axis: C18 ceramide; x-axis: gender]
Scatterplots
• Two way graph
• Syntax:
– graph twoway scatter y x
– graph twoway scatter y1 y2 x (the last variable listed is always plotted on the x-axis)
• Example:
– graph twoway scatter c18 totalceramide
[Figure: scatterplot of C18 ceramide (y-axis) vs. total ceramide levels (x-axis)]
Regression example
• Scatterplot
• Residual plots
• Leverage
• Fitted line with raw data
Code
graph twoway scatter c18 totalcer
regress c18 totalcer

* residual plot
* (residual vs. fitted)
rvfplot

* the long way
* 1. generate a new variable from the regression, residuals
predict resid, res
* 2. generate a new variable from the regression, fitted values
predict fit
scatter resid fit, yline(0)

* leverage vs. residual plot
lvr2plot

* take transform of C18?
gladder c18
boxcox c18

* generate new variable
gen logc18=log(c18)
scatter logc18 totalcer
scatter logc18 totalcer, mlabel(gender)
scatter logc18 totalcer, mlabel(gender) s(i)
scatter logc18 totalcer, s(Oh)

* redo regression
regress logc18 totalcer
rvfplot, yline(0)
lvr2plot
predict logfit

* make plot of fitted model and raw data
scatter logfit logc18 totalcer
scatter logfit logc18 totalcer, s(i o) c(l .)
graph twoway scatter logfit totalcer, s(i) c(l) || scatter logc18 totalcer, s(o) c(.)
The next graph to create
Fancier way to put regression lines
infile str14 country setting effort change ///
    using http://data.princeton.edu/wws509/datasets/effort.raw

graph twoway scatter change setting
graph twoway (scatter change setting) (lfit change setting)
graph twoway (scatter change setting) (qfit change setting)
graph twoway (scatter change setting) (lfitci change setting)
• scatter makes a scatterplot of the two variables
• lfit plots the regression line of y on x
• qfit plots a fitted quadratic model of y on x
• lfitci plots the line AND a confidence interval!
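If the Princeton URL is not reachable, the same overlays can be tried offline. The sketch below assumes Stata's shipped auto dataset, with mpg and weight standing in for change and setting.

```stata
* example dataset shipped with Stata (an assumption; not the fertility data)
sysuse auto, clear

* raw data plus fitted regression line of mpg on weight
twoway (scatter mpg weight) (lfit mpg weight)

* raw data plus fitted quadratic
twoway (scatter mpg weight) (qfit mpg weight)

* fitted line with confidence band; lfitci is listed first so that
* the shaded band is drawn underneath the points instead of covering them
twoway (lfitci mpg weight) (scatter mpg weight)
```

Plots are drawn in the order listed, so putting lfitci before scatter keeps the markers visible on top of the shaded interval.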
[Figure: "Plot using qfit". Scatter of change vs. setting with fitted quadratic; legend: change, Fitted values]
[Figure: "Plot using lfitci". Scatter of change vs. setting with fitted line; legend: 95% CI, Fitted values, change]
[Figure: scatter of change vs. setting with each point labeled by country, plus fitted line; legend: 95% CI, Fitted values, change]
• One slight problem with the labels is the overlap of Costa Rica and Trinidad Tobago (and to a lesser extent Panama and Nicaragua).
• We can solve this problem by specifying the position of the label relative to the marker using a 12-hour clock (so 12 is above, 3 is to the right, 6 is below and 9 is to the left) and the mlabv() option.
• We create a variable to hold the position set by default to 3 o'clock and then move Costa Rica to 9 o'clock and Trinidad Tobago to just a bit above that at 11 o'clock (we can also move Nicaragua and Panama up a bit, say to 2 o'clock).
graph twoway (lfitci change setting) (scatter change setting, mlabel(country) )
gen pos=3
replace pos = 11 if country == "TrinidadTobago"
replace pos = 9 if country == "CostaRica"
replace pos = 2 if country == "Panama" | country == "Nicaragua"

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos) )
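The clock-position technique can also be tried on a built-in dataset. This is a minimal sketch assuming Stata's shipped auto dataset; the two cars that get moved are picked arbitrarily for illustration, not because they are known to overlap.

```stata
* example dataset shipped with Stata (an assumption; not the fertility data)
sysuse auto, clear
keep in 1/15                       // a small subset keeps the labels readable

* default label position: 3 o'clock (to the right of the marker)
gen pos = 3
* move two arbitrarily chosen labels to other clock positions
replace pos = 9  if make == "AMC Pacer"
replace pos = 12 if make == "Buick Regal"

* mlabvposition() (abbreviated mlabv() above) reads the position from a variable
twoway (lfitci mpg weight) ///
    (scatter mpg weight, mlabel(make) mlabvposition(pos))
```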
[Figure: scatter of change vs. setting with repositioned country labels, plus fitted line; legend: 95% CI, Fitted values, change]
Legends
[Figure: "Fertility Decline by Social Setting". Labeled scatter with fitted line; y-axis: Fertility Decline; x-axis: setting; legend inside the plot region: linear fit, 95% CI]
[Figure: "Fertility Decline by Social Setting". Same plot with the legend turned off]
graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos) ) ///
    , title("Fertility Decline by Social Setting") ///
    ytitle("Fertility Decline") ///
    legend(ring(0) pos(5) order(2 "linear fit" 1 "95% CI"))

graph twoway (lfitci change setting) ///
    (scatter change setting, mlabel(country) mlabv(pos) ) ///
    , title("Fertility Decline by Social Setting") ///
    ytitle("Fertility Decline") ///
    legend(off)
Spaghetti plots
Command available from UCLA: spagplot

* spaghetti plots
clear
insheet using "I:\MUSC Oncology\Shirai, Keisuke\October2010\ceramide.csv"
findit spagplot
spagplot c18 cycle, id(patient)
spagplot c18 cycle, id(patient) nofit

* remove patients who only have cycle=1
sort patient cycle
by patient: gen visit=_n
egen maxvis=max(visit), by(patient)
spagplot c18 cycle if maxvis>1, id(patient) nofit

* or, use c(L)
graph twoway scatter c18 cycle if maxvis>1, c(L)
help connectstyle
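The c(L) approach needs no user-written command, so it can be tried on a built-in long-format dataset. The sketch below assumes Stata's shipped bplong dataset (fictional blood pressure data, two measurements per patient), with bp, when, and patient standing in for c18, cycle, and patient.

```stata
* example long-format dataset shipped with Stata (an assumption; not the ceramide data)
sysuse bplong, clear

* connect(L) joins consecutive observations only while x is increasing,
* so sorting by patient and then time starts a new line for each patient
sort patient when

* one line per patient; only the first 20 patients are drawn to keep it readable
twoway scatter bp when if patient <= 20, connect(L) msymbol(o)
```

The sort matters here: without it, consecutive rows may belong to different patients and the lines would join the wrong points.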
Other neat stuff
• graph matrix
• saving graphs: click and save as desired format
• saving and combining (see Princeton site, section 3.3)
– http://data.princeton.edu/stata/graphics.html
• See GraphExamples on UCLA site:
– http://www.ats.ucla.edu/stat/stata/library/GraphExamples/
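Graphs can also be saved and combined from a do-file instead of by clicking. This is a minimal sketch assuming Stata's shipped auto dataset; the graph names and file names are made up for illustration, and PNG export may not be available in every Stata installation.

```stata
* example dataset shipped with Stata (an assumption; not the course data)
sysuse auto, clear

* scatterplot matrix of several variables
graph matrix mpg weight length

* keep two graphs in memory under hypothetical names
histogram mpg, name(hist_mpg, replace)
graph box mpg, name(box_mpg, replace)

* put them side by side in one figure
graph combine hist_mpg box_mpg, title("Distribution of mpg")

* save in Stata's own format, then export for use in a document
graph save "mpg_combined.gph", replace
graph export "mpg_combined.png", replace
```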