Why I hope that Statcheck is useless within one year: A case for dynamic documents

01/05/2018

Dynamic documents in R Markdown

Massive disclaimer: I seriously do not know what I'm doing. I try stuff and it works at times. Please consider trying stuff too. It's fun.

I'm serious

A short history

Literate programming (Knuth, 1984)
Dynamic documents in R (Xie, 2015)
Sweave: Integration of R into LaTeX documents
knitr: Sweave + R Markdown format, caching, + support for other languages (Python, C++, etc.)
knitr is the backbone of R Markdown –> focus of today

Why everybody should document dynamically: a tripartite argument

Statistical inconsistencies are everywhere (Nuijten et al., 2016)
Reproducibility = life
Doing manual work in 2018 seems rather silly

Statistical inconsistencies are everywhere

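Inconsistencies in psychology: it's bad. Roughly half of papers contain at least one p-value that doesn't match its test statistic and degrees of freedom; roughly one in eight contains a gross inconsistency that may change the conclusion (Nuijten et al., 2016). Replication in management: same or worse (van Zelst et al., work in progress)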
Reasons are well-known: typos, copy-paste errors, incorrect rounding, etc.

Solution 1: Statcheck

But statcheck can only read results reported in APA style
Statcheck shows where you were wrong, but doesn't fix it
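The core idea behind such a check fits in a few lines of R: recompute the p-value from the reported test statistic and degrees of freedom, and flag the result if the reported p-value could not have come from rounding the recomputed one. This is a simplified sketch for two-sided t-tests only, using a hypothetical helper check_t(); it is not statcheck's actual implementation and it ignores rounding of the t-value itself.

```r
# Simplified statcheck-style check for two-sided t-tests (hypothetical helper,
# not statcheck's own code). Rounding of the reported t-value is ignored.
check_t <- function(t, df, p_reported, digits = 3) {
  # Two-sided p-value implied by the reported t and df
  p_computed <- 2 * pt(-abs(t), df)
  # Consistent if the computed p lies within rounding distance of the reported p
  consistent <- abs(p_computed - p_reported) <= 0.5 * 10^(-digits)
  data.frame(t = t, df = df, p_reported = p_reported,
             p_computed = round(p_computed, digits + 1),
             consistent = consistent)
}

# The first result fits its t and df; the second cannot be right
check_t(t = c(-1.861, 2.10), df = c(18, 30), p_reported = c(0.079, 0.01))
```

Note that a check like this can only flag the mismatch after the fact; it cannot tell you which of the three numbers is the wrong one.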

Statistical inconsistencies are everywhere (2)

Solution 2: tidystats

R-only, although I'm not sure why this is a problem (:
Does not yet support most analyses
Specific output (e.g. CIs) has to be added manually

Also: tidystats solves different problems than R Markdown does

Also: use tidystats and R Markdown together (Sleegers, 2018)

Statistical inconsistencies are everywhere (3)

R Markdown offers a more encompassing solution:

It's not after the fact: mistakes are prevented while writing
Supports all analyses (e.g., I managed to produce CIs in a table from meta-analytic structural equation modeling)
If R can analyze it, R Markdown can print it

Downside:

The good old steep learning curve

Reproducibility = problem then, problem now

"God blessed them and said to them, 'Be fruitful and reproduce'" (Genesis 1:28; 9:1)

Reproducibility

Reproducibility = life

Will you be capable of reproducing your own results half a year from now?
Will your co-authors be able to do this?
Will a colleague who is not a co-author?
Will an independent researcher in your field?

Data package requirement (mandatory!): be able to reproduce all your results within a reasonable amount of time

Reproducibility = life (2)

Remember those methods sections where you just could not figure out what the authors did?

Even if you have the syntax/code, would you be able to figure out which analysis produces which result in the paper?

Reproducibility = life (3)

What did I just read

Reproducibility = life (4)

A complete R Markdown file can be self-contained (main text, data manipulation, analyses, and formatting)

You always know which analysis produced which number, right at the place where that number appears in your manuscript

Don't forget to keep annotating your code: you will forget what you did

Doing manual work in 2018 seems rather silly

"we have to manually copy over statistical results into or out of a manuscript" (Sleegers, 2018)

Manual work situations that can be prevented:

Redoing analyses when new data come in (e.g., new papers for your meta-analysis)
"Can you try this other statistical specification and send me the table with output?"
"Let's submit to this other journal which has completely different formatting requirements"

That seems like a serious waste of time.
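A sketch of what makes the first two situations cheap: wrap the analysis in a function of the data and the model specification, so new data or a different specification means re-running a call rather than copy-pasting output. The helper run_model() and the use of lm() on R's built-in mtcars data are illustrative assumptions, not the analyses from this talk.

```r
# Hypothetical helper: the analysis is a function of its inputs, so it can be
# re-run on updated data or with another specification without manual edits
run_model <- function(data, formula, digits = 3) {
  fit <- lm(formula, data = data)
  round(coef(summary(fit)), digits)  # estimates, SEs, t-values, p-values
}

run_model(mtcars, mpg ~ wt)       # original specification
run_model(mtcars, mpg ~ wt + hp)  # "Can you try this other specification?"
```

In a dynamic document, the inline numbers and tables read from the returned object, so they change along with it.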

Conclusion

Several problems in the research process:

Not all statistics are correctly reported
Non-reproducible research is not research –> we cannot take it seriously
Wasting time on work that could be automated

Dynamic documents/dynamic reporting solve these

Some general statements about R Markdown

Able to produce:

Word / ODT
PDF (LaTeX)
HTML / GitHub-compatible Markdown
Slides for presentation
Shiny documents
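Switching between these formats is a single argument to rmarkdown::render(). A sketch, assuming a hypothetical source file paper.Rmd; the PDF output additionally needs a LaTeX installation, slides use presentation formats such as ioslides_presentation or beamer_presentation, and Shiny documents need runtime: shiny in the YAML header.

```r
# Render one R Markdown source ("paper.Rmd" is a hypothetical file name)
# to several of the document formats listed above
formats <- c("word_document", "odt_document", "pdf_document",
             "html_document", "github_document")

if (file.exists("paper.Rmd")) {
  for (fmt in formats) {
    rmarkdown::render("paper.Rmd", output_format = fmt)
  }
}
```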

Can import:

data / other R files / files from the internet / .bib files / .csl files

R Markdown functionality

R Markdown integrates text (Markdown / LaTeX), data manipulation (R), analyses (R), and formatting (Markdown / LaTeX)

Data analysis workflow becomes:

Draft intro, theory, methods within Markdown
Data preparation
Data analysis
Write results using code in Markdown file
Format document within Markdown file
Print and submit!

R Markdown functionality (2)

How does it work?

You write your normal text (as you would in, e.g., Word) but add 'code chunks' to it.

Code chunks run like normal R code. I often write my code in a plain R script first, tinker until it works, and then copy-paste it into R Markdown.

When you save results to objects (as you would in R anyway), you can pull the results out of those objects into your main text or put them into graphs and tables.
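A minimal, self-contained sketch of that mechanism, using knitr::knit() with its text argument so it runs from a plain R script. The string rmd_source is a stand-in for an .Rmd file, and the chunk reuses a t-test on Student's sleep data.

```r
library(knitr)

# Build the chunk delimiters programmatically to keep this snippet readable
fence <- strrep("`", 3)

# A tiny R Markdown source (stand-in for an .Rmd file): the chunk stores the
# t-test in 'res', and the inline `r ...` expressions read numbers from it
rmd_source <- c(
  paste0(fence, "{r, echo = FALSE}"),
  "res <- t.test(extra ~ group, data = sleep, var.equal = TRUE)",
  fence,
  "",
  "*t*(`r res$parameter`) = `r round(res$statistic, 3)`,",
  "*p* = `r round(res$p.value, 3)`."
)

# With text = ..., knit() returns the knitted Markdown instead of writing a file
knitted <- knit(text = rmd_source, quiet = TRUE)
cat(knitted, sep = "\n")
```

If the data change, re-knitting updates every number in the sentence; nothing is typed by hand.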

Example

So let's put this into practice #hoedan

Today, I will show how to build an R Markdown document, what drafting text looks like, and how to insert results and other statistical output into your document.

Fasten your seatbelts.

Creating a document

yaml header

Write your draft

main text

How about analyses?

We'll start with Student's sleep data (again)

res <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

Code:

We found no significant difference in extra hours of sleep between the two drugs, *t*(`r res$parameter`) = `r round(res$statistic, 3)`, *p* = `r round(res$p.value, 3)`.

Produces:

We found no significant difference in extra hours of sleep between the two drugs, t(18) = -1.861, p = 0.079.

Let's move it up a notch

We conducted a meta-analysis on organizational performance feedback that tests to what extent firms change their strategies in response to negative performance feedback.

Naturally, we also wanted to test some moderators, so we ran a meta-regression.

library(metafor)
spfb.wls <- rma(yi = spfb, vi = var,
                mods = ~ public + finance * mean_age + service +
                  hightech + change + search,
                data = dat, method = "DL", test = "knha")
res.spfb <- list(spfb.wls)  # list of models, so several can be tabulated at once

Writing the results section

Code:

For social performance feedback, we find that firms are significantly 
more responsive to non-financial performance feedback below the 
aspiration level ($\beta$ = `r round(spfb.wls$b[3], 3)`, 
*p*-value = `r round(spfb.wls$pval[3], 3)`).

Produces:

For social performance feedback, we find that firms are significantly more responsive to non-financial performance feedback below the aspiration level (β = -0.081, p-value = 0.019).
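Selecting results by position, as with spfb.wls$b[3], works until the model formula changes: the model used for the interaction plot lists search before change, which swaps their positions. A sketch of selecting by coefficient name instead, using metafor's built-in BCG vaccine data (dat.bcg) as a stand-in for the meta-analytic data; report_coef() is a hypothetical helper.

```r
library(metafor)

# Example data shipped with metafor; stands in for the meta-analytic 'dat'
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
              data = dat.bcg)
fit <- rma(yi, vi, mods = ~ ablat + year, data = dat,
           method = "DL", test = "knha")

# One row per coefficient, with row names taken from the model formula
est <- coef(summary(fit))

# Hypothetical helper: report a coefficient by name rather than by position
report_coef <- function(est, term, digits = 3) {
  paste0("b = ", formatC(est[term, "estimate"], format = "f", digits = digits),
         ", p = ", formatC(est[term, "pval"], format = "f", digits = digits))
}

report_coef(est, "ablat")
```

Inline, this becomes `r report_coef(est, "ablat")`, which keeps pointing at the right coefficient even if moderators are added or reordered.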

Producing table

Let's say you also want to produce a table with the results.

knitr makes table production easy, provided you get your code configured right (this will take many rounds of trial and error)

knitr::kable(
  data.frame(
    name   = c("Constant", "Public firm", "Finance", "Firm age", "Service",
               "High-tech", "Change", "Search", "Interaction MetricAge"),
    HPFBA  = sapply(res.spfb, function(x) x$b),      # coefficient estimates
    CIlb   = sapply(res.spfb, function(x) x$ci.lb),  # lower confidence bounds
    CIub   = sapply(res.spfb, function(x) x$ci.ub),  # upper confidence bounds
    Pvalue = sapply(res.spfb, function(x) x$pval)    # p-values
  ),  # close data.frame() before passing kable()'s own arguments
  digits    = 3,
  col.names = c("Variable", "PF < HA", "CI~lb~", "CI~ub~", "*p-value*"),
  caption   = "Table 3: Meta-analytic weighted least squares for social performance feedback"
)

Table time

Table 3: Meta-analytic weighted least squares for social performance feedback
Variable	PF < HA	CI_lb	CI_ub	p-value
Constant	0.057	-0.070	0.185	0.367
Public firm	-0.143	-0.191	-0.096	0.000
Finance	-0.081	-0.148	-0.014	0.019
Firm age	0.002	0.000	0.003	0.015
Service	-0.006	-0.066	0.055	0.845
High-tech	0.033	-0.017	0.083	0.192
Change	0.015	-0.057	0.087	0.682
Search	-0.040	-0.122	0.043	0.337

Next, we wanted to plot a meta-analytic interaction: our 'performance metric' variable (financial vs. non-financial) interacted with 'organizational age'.

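library(metafor)

# Fisher's r-to-z transformation of the correlations (equivalent to atanh(spfb_dv))
dat$spfb <- 0.5 * log((1 + dat$spfb_dv) / (1 - dat$spfb_dv))

# Moderator order differs from the table model (search before change); this
# order determines the column order of 'newmods' in predict() below
spfb.wls <- rma(yi = spfb, vi = var,
                mods = ~ public + finance * mean_age + service +
                  hightech + search + change,
                data = dat, method = "DL", test = "knha")

# Point size grows linearly with precision (1 / SE), rescaled to the range 0.5-2
wi   <- 1 / sqrt(dat$var)
size <- 0.5 + 1.5 * (wi - min(wi)) / (max(wi) - min(wi))

plot(dat$mean_age, dat$spfb_dv, pch = 19, xlim = c(0, 120), ylim = c(-0.5, 0.5),
     cex = size, xlab = "Firm age (years)", ylab = "Correlation", las = 1, bty = "l")

# newmods columns: public, finance, mean_age, service, hightech, search, change,
# finance:mean_age. transf.ztor back-transforms z-scale predictions to r so they
# match the y-axis.
ages <- 0:120
preds.nonfinance <- predict(spfb.wls, newmods = cbind(1, 0, ages, 0, 0, 0, 0, 0),
                            transf = transf.ztor)
lines(ages, preds.nonfinance$pred,  col = "red", lwd = 2)
lines(ages, preds.nonfinance$ci.lb, col = "red", lwd = 2, lty = "dotted")
lines(ages, preds.nonfinance$ci.ub, col = "red", lwd = 2, lty = "dotted")
preds.finance <- predict(spfb.wls, newmods = cbind(1, 1, ages, 0, 0, 0, 0, ages),
                         transf = transf.ztor)
lines(ages, preds.finance$pred,  col = "blue", lwd = 2)
lines(ages, preds.finance$ci.lb, col = "blue", lwd = 2, lty = "dotted")
lines(ages, preds.finance$ci.ub, col = "blue", lwd = 2, lty = "dotted")

# Fitted lines are solid for both groups; dotted lines are the confidence bounds
legend("topright", legend = c("Non-financial", "Financial"),
       lty = "solid", col = c("red", "blue"), lwd = 2)
title("P < SA for public")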
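The first line of the plotting code is Fisher's r-to-z transformation. The meta-regression is fitted on the z scale, so fitted values and confidence bounds have to be transformed back to correlations before they are drawn on a correlation axis (metafor's transf.ztor does this). For a correlation from a sample of n observations, the sampling variance of z is approximately 1/(n - 3):

```latex
z = \frac{1}{2}\ln\!\left(\frac{1 + r}{1 - r}\right) = \operatorname{artanh}(r),
\qquad
r = \tanh(z) = \frac{e^{2z} - 1}{e^{2z} + 1},
\qquad
\operatorname{Var}(z) \approx \frac{1}{n - 3}
```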

Plot

Reporting statistics

Because all statistics are stored in objects structured the way you devised, it is easy to write functions that return output in whatever style you need.
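For example, a small function can turn a t-test object into an APA-style string that you then call inline in your text. apa_t() is a hypothetical helper, shown for t-tests only: it uses two decimals for t and three for p, drops the leading zero of p as APA style does, and reports very small values as p < .001.

```r
# Hypothetical helper: format a t-test result (class "htest") in APA style
apa_t <- function(x, digits = 2) {
  stopifnot(inherits(x, "htest"))
  t_val <- formatC(unname(x$statistic), format = "f", digits = digits)
  df    <- format(round(unname(x$parameter), 2))  # Welch df can be fractional
  p     <- unname(x$p.value)
  # APA drops the leading zero for p and caps tiny values at "< .001"
  p_txt <- if (p < 0.001) "< .001" else
    paste("=", sub("^0", "", formatC(p, format = "f", digits = 3)))
  sprintf("t(%s) = %s, p %s", df, t_val, p_txt)
}

res <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
apa_t(res)
```

In the Rmd text you would then write `r apa_t(res)` wherever the result belongs.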

Using code inside your text allows you to specify where you want statistical output, which can be individual tests, tables, graphs, etc.

Self-contained files

With the right packages, you can make your paper completely self-contained within your Rmd file, including downloading the data and formatting files it needs

if(!require(devtools)){install.packages("devtools")}
require(devtools)
if(!require(osfr)){install_github("chartgerink/osfr",force=TRUE)}
require(osfr)
download_files('bj786', private=TRUE) #masterdata.csv
download_files('wqyz2', private=TRUE) #reference file for formatting
download_files('pz3kx', private=TRUE) #20170711studies.csv
download_files('3v9jf', private=TRUE) #csl file for AMR references
download_files('g6xur', private=TRUE) #references
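Downloading every file on every knit is slow and fails when you are offline. One option is a small guard that downloads a file only when no local copy exists yet; fetch_once() is a hypothetical helper built on base R's download.file(), and the URL in the usage line is a placeholder.

```r
# Hypothetical helper: download a file only if no local copy exists yet,
# so re-knitting the paper does not hit the network every time
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    utils::download.file(url, destfile, mode = "wb")  # binary-safe download
  }
  invisible(destfile)
}

# Usage (placeholder URL): the first knit downloads, later knits reuse the file
# fetch_once("https://example.org/masterdata.csv", "masterdata.csv")
```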

Live example from a conference paper

Limitations of R Markdown

The pesky learning curve:

Producing a text file with numbers: rather easy
Producing a formatted docx/PDF with tables: doable-ish
Producing a nicely formatted presentation with analyses and tables: ????

For PhD students:

"My supervisor doesn't understand this"

Concluding thoughts

I think reproducibility is a daily process. Reproducible research starts today, not when your project is done.

When the TSB Science Committee asks to check your paper, you should be able to send them what they need within five minutes.

R Markdown is a great tool to help you do that.