Massive disclaimer: I seriously do not know what I'm doing. I try stuff and it works at times. Please consider trying stuff too. It's fun.
I'm serious
01/05/2018
Bing
Dynamic documents in R (Xie, 2015)
knitr: Sweave + RMarkdown format, caching, + other language support (Python, C++, etc.)
knitr is the backbone of R Markdown -> focus of today
Solution 1: Statcheck
Solution 2: tidystats
Also: tidystats solves other problems than R Markdown does
Also: use tidystats and R Markdown together (Sleegers, 2018)
R Markdown offers some encompassing solutions:
Downside:
"God blessed them and said to them, 'Be fruitful and reproduce'" (Genesis 1:28; 9:1)
Reproducibility
Data package requirement (mandatory!): be able to reproduce all your results in reasonable amounts of time
Remember those methods sections where you seem to be unable to understand what the authors did?
Even if you have the syntax/code, would you be able to figure out which analysis produces which result in the paper?
What did I just read
Complete R Markdown file can be self-contained (including main text, data manipulation, analyses, formatting)
You always understand which number was produced by which analysis at the exact place in your manuscript
"we have to manually copy over statistical results into or out of a manuscript" (Sleegers, 2018)
Manual work situations that can be prevented:
That seems like a serious waste of time.
Several research-process related problems:
Dynamic documents/dynamic reporting solves this
Able to produce:
Can import:
R Markdown integrates text (Markdown / LaTeX), data manipulation (R), analyses (R), and formatting (Markdown / LaTeX)
Data analysis workflow becomes:
Draft intro, theory, methods within Markdown
Data preparation
Data analysis
Write results using code in Markdown file
Format document within Markdown file
Print and submit!
How does it work?
You write your normal text (similar to e.g. Word) but add 'code chunks' in your text.
Code chunks act like normal R analyses. I often write and test my code in a plain R script until it works, then copy and paste it into R Markdown.
When you save results to objects (like in R), you can extract the results from those objects in your main text or put them into graphs/tables.
Example
Today, I will show some examples of how to build an R Markdown document, how text drafting looks and how you can insert results and other statistical output in your document.
Fasten your seatbelts.
yaml header
main text
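As a rough sketch of what those two parts look like together, the R code below writes a tiny R Markdown file to a temporary folder. The file name, title, author, and chunk label are placeholders I made up, and rendering is guarded because it assumes the rmarkdown package and pandoc are installed.

```r
# Minimal sketch: build a small .Rmd file (YAML header + main text) from R.
# All field values and the file name are placeholders, not from a real paper.
fence <- strrep("`", 3)  # three backticks, used to open and close a code chunk

rmd_lines <- c(
  "---",                              # YAML header starts
  'title: "My paper"',
  'author: "Your name"',
  "output: html_document",
  "---",                              # YAML header ends
  "",
  "Normal text goes here, just like in a word processor.",
  "",
  paste0(fence, "{r analysis}"),      # code chunk starts
  "res <- t.test(extra ~ group, data = sleep, var.equal = TRUE)",
  fence,                              # code chunk ends
  "",
  "The test statistic is `r round(res$statistic, 3)`."
)

rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Rendering is optional here: it needs the rmarkdown package and pandoc.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

In practice you would just type these lines into an .Rmd file in RStudio and hit Knit; writing the file from R is only a way to keep the sketch self-contained.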
We'll start with the Student's sleep data (again)
res <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
Code:
We found a significant positive effect of drugs on hours of sleep, *t*(`r res$parameter`) = `r round(res$statistic,3)`, *p* = `r round(res$p.value,3)`.
Produces:
We found a significant positive effect of drugs on hours of sleep, t(18) = -1.861, p = 0.079.
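If you report many tests, retyping the rounding calls gets old quickly. Here is a small sketch of a formatting helper; `report_t()` is a name I made up for illustration and is not part of knitr or any other package.

```r
# Hypothetical helper (not from any package): turn the result of t.test()
# into a Markdown string that can be dropped into the main text.
report_t <- function(res, digits = 3) {
  sprintf(
    "*t*(%s) = %s, *p* = %s",
    format(unname(res$parameter)),                                  # degrees of freedom
    format(round(unname(res$statistic), digits), nsmall = digits),  # t statistic
    format(round(res$p.value, digits), nsmall = digits)             # p-value
  )
}

res <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
report_t(res)
```

Inside the document you would then write `r report_t(res)` in the sentence, which should give the same numbers as the line above.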
We conducted a meta-analysis on organizational performance feedback which tests to what extent firms change their strategies in response to negative performance feedback.
Obviously, we wanted to test some moderators and conducted a meta-regression.
spfb.wls <- rma(spfb , mods=~ public + finance*mean_age + service +
hightech + change + search, var, method="DL",
data=dat, test="knha")
res.spfb <- list(spfb.wls)
Code:
For social performance feedback, we find that firms are significantly more responsive to non-financial performance feedback below the aspiration level ($\beta$ = `r round(spfb.wls$b[3],3)`, *p*-value = `r round(spfb.wls$pval[3],3)`).
Produces:
For social performance feedback, we find that firms are significantly more responsive to non-financial performance feedback below the aspiration level (\(\beta\) = -0.081, p-value = 0.019).
Let's say you also want to produce a table with the results.
knitr allows for easy table production once your code is configured right (expect many trial-and-error runs).
knitr::kable(
  data.frame(
    name = c("Constant", "Public firm", "Finance", "Firm age", "Service",
             "High-tech", "Change", "Search", "Interaction MetricAge"),
    # pull estimates, confidence bounds and p-values out of the fitted model
    HPFBA = sapply(res.spfb, function(x) x$b),
    CIlb = sapply(res.spfb, function(x) x$ci.lb),
    CIub = sapply(res.spfb, function(x) x$ci.ub),
    Pvalue = sapply(res.spfb, function(x) x$pval)
  ),
  digits = 3,
  col.names = c("Variable", "PF < HA", "CI~lb~", "CI~ub~", "*p-value*"),
  caption = "Table 3: Meta-analytic weighted least squares for social performance feedback"
)
| Variable | PF < HA | CIlb | CIub | p-value |
|---|---|---|---|---|
| Constant | 0.057 | -0.070 | 0.185 | 0.367 |
| Public firm | -0.143 | -0.191 | -0.096 | 0.000 |
| Finance | -0.081 | -0.148 | -0.014 | 0.019 |
| Firm age | 0.002 | 0.000 | 0.003 | 0.015 |
| Service | -0.006 | -0.066 | 0.055 | 0.845 |
| High-tech | 0.033 | -0.017 | 0.083 | 0.192 |
| Change | 0.015 | -0.057 | 0.087 | 0.682 |
| Search | -0.040 | -0.122 | 0.043 | 0.337 |
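The same pattern works for any fitted model. As a sketch you can run without the meta-analysis data, the code below uses `lm()` on the built-in `mtcars` data as a stand-in; the variable labels and caption are made up for illustration.

```r
# Stand-in example: mtcars and lm() replace the meta-analysis data and rma().
fit <- lm(mpg ~ wt + hp, data = mtcars)
ci <- confint(fit)  # 95% confidence intervals, one row per coefficient

# Collect everything the table needs in one data frame.
tab <- data.frame(
  Variable = c("Constant", "Weight", "Horsepower"),
  Estimate = unname(coef(fit)),
  CIlb = unname(ci[, 1]),
  CIub = unname(ci[, 2]),
  Pvalue = unname(coef(summary(fit))[, "Pr(>|t|)"])
)

# kable() turns the data frame into a Markdown table (needs knitr installed).
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(tab, digits = 3, caption = "Illustrative regression table"))
}
```

Building the data frame first and passing it to `kable()` second makes the trial-and-error part easier: you can inspect `tab` in the console before worrying about formatting.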
So we wanted to plot some meta-analytic interaction. We interact our 'performance metric' variable with 'organizational age'.
require(metafor)
dat$spfb <-0.5*log((1+dat$spfb_dv)/(1-dat$spfb_dv))
spfb.wls <- rma(spfb,mods=~ public + finance*mean_age + service +
hightech + search + change,var,data=dat,
method="DL",test="knha")
wi <- 1/sqrt(dat$var)
size <- 0.5 + 1.5*(wi - min(wi))/(max(wi) - min(wi))
plot(dat$mean_age, dat$spfb_dv, pch=19, xlim=c(0,120), ylim=c(-0.5,0.5), cex=size,
xlab="Firm age (years)", ylab="Correlation", las = 1, bty="l")
preds.nonfinance <- predict(spfb.wls, newmods=cbind(1,0,0:120,0,0,0,0,0))
lines(0:120, preds.nonfinance$pred,col="red",lwd=2)
lines(0:120, preds.nonfinance$ci.lb,lty="dotted",col="red",lwd=2)
lines(0:120, preds.nonfinance$ci.ub,lty="dotted",col="red",lwd=2)
preds.finance <- predict(spfb.wls, newmods=cbind(1,1,0:120,0,0,0,0,0:120))
lines(0:120, preds.finance$pred,col="blue",lwd=2)
lines(0:120, preds.finance$ci.lb,lty="dotted",col="blue",lwd=2)
lines(0:120, preds.finance$ci.ub,lty="dotted",col="blue",lwd=2)
legend("topright", legend=c("Non-financial","Financial"),lty="solid",col=c("red","blue"),lwd=2)
title("P < SA for public")
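A side note on the line that computes `dat$spfb` in the code above: that formula is Fisher's r-to-z transformation, which base R also offers as `atanh()`. A quick sketch with made-up correlations (not values from the meta-analysis):

```r
# Made-up correlations, purely for illustration.
r <- c(-0.5, 0, 0.3)

# Fisher's r-to-z, written out as in the plotting code.
z <- 0.5 * log((1 + r) / (1 - r))

all.equal(z, atanh(r))  # the written-out formula matches atanh()
all.equal(tanh(z), r)   # tanh() converts z back to a correlation
```

This matters when reading the plot: the model is fitted on the z scale, while the points are plotted as correlations. For small effects the two scales are close, but they are not identical.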
Because all statistics are structured in a manner you devised, it is very easy to write functions that return whatever-style output.
Using code inside your text allows you to specify where you want statistical output, which can be individual tests, tables, graphs, etc.
Being smart with packages allows you to make a completely self-contained paper within your Rmd file
if(!require(devtools)){install.packages("devtools")}
require(devtools)
if(!require(osfr)){install_github("chartgerink/osfr",force=TRUE)}
require(osfr)
download_files('bj786', private=TRUE) #masterdata.csv
download_files('wqyz2', private=TRUE) #reference file for formatting
download_files('pz3kx', private=TRUE) #20170711studies.csv
download_files('3v9jf', private=TRUE) #csl file for AMR references
download_files('g6xur', private=TRUE) #references
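The install-if-missing lines at the top of that code can be wrapped in a small helper so every package is handled the same way. This is only a sketch: `ensure_package()` is a hypothetical name, it only covers CRAN packages, and it is not part of devtools or osfr.

```r
# Hypothetical helper: install a CRAN package only when it is missing,
# then attach it. GitHub-only packages would still need install_github().
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("stats")  # ships with R, so nothing gets installed here
```

Putting calls like this in the first chunk of the Rmd file means a co-author can knit the paper on a fresh machine without hunting for missing packages.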
Live example from a conference paper
The pesky learning curve:
For PhD students:
I think reproducibility is a daily process. Reproducible research starts today, not when your project is done.
When the TSB Science Committee requests to check your paper, you should be able to send them what they request within five minutes.
R Markdown is a great tool to help you do that.