Reproducible Research

with Quarto

James Balamuta

May 1, 2024

Lecture Objectives

  • Understand and describe literate programming
  • Discuss Markdown, a text-driven authoring format for writing documents.
  • Create dynamic reports using Quarto (the next generation of R Markdown)

Code for Humans

Wrong Data

Hello James,

Prof. Toad* gave us an old copy of the dataset. Could you redo the analysis on the updated data? Let’s aim to meet tomorrow for coffee to discuss the results. Are you free at 9 AM?

Best,

Steven

In the Wild: Data Science Gone Wrong

  • Retraction Watch, run by Adam Marcus, Ivan Oransky, and Alison McCook, monitors papers that have been retracted from journals.

  • One high-profile case of an Excel error undermining a published paper is “Growth in a Time of Debt” by Reinhart & Rogoff.

    • The error was found by graduate student Thomas Herndon and his co-authors Michael Ash and Robert Pollin.
    • They published a critique highlighting the error.
    • Herndon appeared on The Colbert Report to discuss their findings.


How can we create a report that contains code and updates if the data changes?

Replicable

XKCD Comic that shows a person pulling a lever and being shocked. Then, the person pauses and contemplates not pulling the lever or pulling the lever again to verify they would be shocked.

Replicability is achieved only when the same experiment is performed at least twice and leads to the same conclusion. This requires each run of the experiment to use the same data collection and analysis procedures.

Reproducible

Reproducibility exists if there is a specific set of computational functions/analyses (usually specified in terms of code) that exactly reproduces all of the numbers in a published paper from raw data.
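One small, concrete ingredient of a reproducible computation is controlling randomness. The following is a minimal base-R sketch (the seed value 2024 is an arbitrary choice for illustration, not something from the lecture): calling set.seed() before drawing random numbers means a rerun produces exactly the same values, and therefore the same downstream numbers.

```r
# Draw three "random" values twice, resetting the seed before each draw.
# set.seed() fixes the state of R's random number generator, so the two
# draws are identical and the computation can be rerun exactly.
set.seed(2024)   # arbitrary seed chosen for this sketch
first_run = rnorm(3)

set.seed(2024)
second_run = rnorm(3)

identical(first_run, second_run)
#> [1] TRUE
```

Without the second set.seed() call, second_run would continue the random stream and differ from first_run.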

There has been a notable push toward reproducibility within statistics. In particular, the Journal of the American Statistical Association (JASA) recently created a formal guide for reproducibility and appointed its own Associate Editors for Reproducibility!

Elsewhere, the scientific community discussed reproducibility at length in a special issue of the journal Science.

Coverpage of the Literate Programming book by Donald E. Knuth

“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.”

— Donald Knuth in Literate Programming (1984) on pg. 1

Literate Programming

The notion of encouraging programmers to interleave code within narrative content that follows the natural logic and flow of human thought.

Text → Code → Output → Text

Quarto in the World: Books

https://r4ds.hadley.nz/ | GitHub

Quarto in the World: Website

https://mine-cetinkaya-rundel.github.io/quarto-tip-a-day/ | GitHub

Quarto in the World: Academic Papers

https://quarto-journals.github.io/jss/ | GitHub

Overview of a Quarto Document

Create a Quarto Document

In the top left, click the white plus icon and select “Quarto Document…”

Drop down menu containing Quarto Document creation button

Creating a new Quarto Document

In the new prompt, enter a title and author name, then press “Create”.

New quarto document wizard allowing a title and author information to be set.

New Document Options

Annotated Quarto Document

Annotated figure that describes the different sections of a Quarto document while in the source editor mode.

Annotated sections of the “Hello Quarto” document related to document information, text formatting, and code execution

Output of a Quarto Document

Image showcasing how the source code of the document translates into the rendered product.

Annotated source to output of the “Hello Quarto” document

Writing in Markdown

A World Without Markdown

A sample word document that contains text that has been accented with bold and italics as well as an unordered list.

Example of a Word document (word-document.docx)

A World Powered By Microsoft’s Markup

The markup Word uses to store formatting information.

Source of the example Word document (unzip word-document.docx)

Aside: Word’s Format

Note

The previous screenshot was taken after unzipping the .docx file and then opening the word/document.xml file, e.g. in Terminal:

unzip word-document.docx && open word/document.xml

A World With HTML

A sample HTML document that contains text that has been accented with bold and italics as well as an unordered list.

Sample HTML webpage mirroring the Word document

A World With HTML Document Source

The source code generating the HTML document with text that has been accented with bold and italics as well as an unordered list.

Source of the sample HTML webpage

A World With Markdown

Code

Welcome to my markdown document. 
We can have **bold**, _italic_, 
***bold and italics*** text.
Also, we have:

- An
- Unordered
- List

Not bad, right?

Output

Welcome to my markdown document. We can have bold, italic, bold and italics text. Also, we have:

  • An
  • Unordered
  • List

Not bad, right?




Markdown is the lingua franca for creating any kind of document

Markdown in the Wild: Reddit

Demo using markdown to make a post on the website Reddit.com

Writing a post using markdown on Stanford’s Subreddit

Markdown in the Wild: GitHub

Demo using markdown to create an issue on the website GitHub.com

Writing an issue using markdown on GitHub

Markdown Quick Reference In RStudio

Opening the built-in Markdown quick reference sheet inside of RStudio by going to the help menu and selecting 'Markdown Quick Reference'.

Accessing the “Markdown Quick Reference” Guide inside of RStudio

Text Formatting

Code

Writing text with emphasis in
*italics*, **bold**, 
***italics + bold***,
~~strikethrough~~,
superscript^2^ / subscript~2~,
and `code style`.

Output

Writing text with emphasis in italics, bold, italics + bold, strikethrough, superscript² / subscript₂, and code style.

Linking

Code

A blank line starts a new paragraph.

Links can be hidden, e.g.
[Stanford](https://stanford.edu/) or
not <https://stanford.edu/>.

Output

A blank line starts a new paragraph.

Links can be hidden, e.g. Stanford or not https://stanford.edu/.

Images

Code

Relative Path Image

![](img/stanford-cardinal-logo.svg)

Absolute Path Image

![](C:/jjb/img/stanford-cardinal-logo.svg)

Web Image

![](https://en.wikipedia.org/../File:stanford-cardinal-logo.svg)

Output

The Stanford University cardinal logo.

(Repeated 3 Times…)

Images: Path Warning

Important

Relative paths are the best choice for sharing your work with others because they do not depend on a particular computer's folder layout or operating system. For example, does your computer have a user called “jjb” with a folder “img”?

Quotes

Code

> "Never gonna give you up,
>  never gonna let you down..."
>
> --- Rick Astley

Output

“Never gonna give you up, never gonna let you down…”

— Rick Astley

Math

Code

Inline math: 
$a^2 + b^2 = c^2$

Display math (centered math):
$$1 - x = y$$

Output

Inline math: \(a^2 + b^2 = c^2\)

Display math (centered math): \[1 - x = y\]

Header Sizes

Markdown Syntax      Output
# Header 1           Header 1
## Header 2          Header 2
### Header 3         Header 3
#### Header 4        Header 4
##### Header 5       Header 5
###### Header 6      Header 6

Lists: Unordered

Code

My **un**ordered list:

- Write Selection Simulation
- Conference Abstracts
    - UseR
    - Learning at Scale

Output

My unordered list:

  • Write Selection Simulation
  • Conference Abstracts
    • UseR
    • Learning at Scale

Lists: Ordered

Code

My **ordered** list:

1. Apples
1. Bananas
1. Chobani
    1. Pineapple
    1. Everything else

Output

My ordered list:

  1. Apples
  2. Bananas
  3. Chobani
    1. Pineapple
    2. Everything else

List: Summary

Important

Make sure a blank line separates the preceding text from the first list item. For sublists (nested lists), indent four spaces to create a new level in the list.

Tip

To simplify ordered lists and make it easy to move items around, use 1. for each item. If a list is interrupted (e.g. by a paragraph), numbering only continues when the items are labeled explicitly using the 1., 2., 3., … format.

Tables

Code

| Left                    | Center          | Right   |
|-------------------------|:---------------:|--------:|
| Hey, check it out       | Colons provide  |    873  |
| it's **Markdown**       | alignment thus  |   1000  |
| right in the table      | *centered* text |         |

Output

Left                  Center             Right
Hey, check it out     Colons provide       873
it's Markdown         alignment thus      1000
right in the table    centered text

Table: Tip

Tip

Visual mode provides a Table menu to set up Quarto tables; alternatively, use a table generator website.
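Tables can also be generated from data instead of typed by hand. A sketch, assuming the knitr package is installed (Quarto uses knitr to run R chunks): knitr::kable() with format = "pipe" writes a Markdown pipe table, and its align argument produces the same alignment colons as the hand-written table above. The data frame simply re-enters that example table, with the empty cell stored as NA.

```r
library(knitr)

# Re-enter the example table as a data frame (the empty cell becomes NA)
tbl = data.frame(
  Left   = c("Hey, check it out", "it's **Markdown**", "right in the table"),
  Center = c("Colons provide", "alignment thus", "*centered* text"),
  Right  = c(873, 1000, NA)
)

# Show missing values as empty cells instead of "NA"
options(knitr.kable.NA = "")

# format = "pipe" emits a Markdown pipe table; align sets the colons in the
# separator row (l = left, c = center, r = right)
md_table = kable(tbl, format = "pipe", align = c("l", "c", "r"))
md_table
```

Inside an R chunk of a .qmd file, the kable() call alone is enough; the table is rendered in the output.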

Writing with Quarto

How Quarto Works

Quarto handles literate programming by using a series of programs:

How Quarto Works (Source)

  • knitr executes all code chunks and creates a new markdown (.md) file.
  • pandoc takes the generated markdown file and converts it to the desired format.
  • Render inside RStudio ties these steps together, so one click runs both.
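The knitr step can be tried on its own. A sketch, assuming the knitr package is installed; the tiny document in source_md is made up for illustration. When knitr::knit() is given its text argument, it runs the chunk and returns the resulting markdown instead of writing a file, which is the kind of .md content pandoc then converts. Render performs this step for you, so calling knitr by hand is only for exploration.

```r
library(knitr)

# A made-up markdown document containing one R code chunk
source_md = c(
  "We can add numbers:",
  "",
  "```{r}",
  "1 + 2",
  "```"
)

# knit() executes the chunk and returns markdown in which the code is
# followed by its output (prefixed with "##" by default)
knitted_md = knit(text = source_md, quiet = TRUE)
cat(knitted_md, sep = "\n")
```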

Source vs. Visual Mode

Figure showing what a Quarto document looks like in Source Editing Mode.

Source Editing Mode

Figure showing what a Quarto document looks like in Visual Editing Mode.

Visual Editing Mode

Render a Quarto Document

You can render a Quarto document by using this shortcut in RStudio:

  •  Mac: Cmd (⌘) + Shift (⇧) + K
  • ⊞ Win: Ctrl + Shift + K

Or, you can press the “Render” button in either Source or Visual Mode.

Press the render button to generate a new document.

Rendering a Quarto Document using “Render”
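Rendering can also be started from the R console. A sketch, assuming the quarto R package and the Quarto command-line tool are both installed; render_report() and hello.qmd are hypothetical names used only for illustration, while quarto::quarto_render() and its output_format argument come from the quarto package.

```r
# Hypothetical helper: render a .qmd file to one output format.
# Assumes the quarto R package (install.packages("quarto")) and the
# Quarto CLI are installed.
render_report = function(path = "hello.qmd", format = "html") {
  if (!file.exists(path)) {
    stop("Cannot find ", path, " in ", getwd())
  }
  # output_format picks the target, e.g. "html", "pdf", or "docx"
  quarto::quarto_render(input = path, output_format = format)
}

# Usage (commented out so the sketch does not render anything by itself):
# render_report("hello.qmd", format = "pdf")
```

The file check gives a clearer message than a failed render when the working directory is not where the .qmd file lives.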

Code Chunks: Text + Code + Output

Example

```{r}
#| label: chunk-label

# code here
```

Insert a chunk into a .qmd file by typing it or by using [⌘/Ctrl + ⌥/Alt + I]

Code

We're embedding _R_ code
**into** a report!!
```{r}
#| label: add_nums
1 + 2
```

Output

We’re embedding R code into a report!!

1 + 2
[1] 3

(Aside) Pets or Livestock

Important

Please make sure to label your code chunks! It helps with debugging.

Figure comparing labeled and unlabeled code chunks and how labels make errors easier to locate.

Sample of Code Chunk Options

Code

Let's hide the _R_ code from
showing up in the report!


```{r}
#| label: ex-hide
#| echo: false
x = 1:10
y = 11:20
plot(x, y)
```

Output

Let’s hide the R code from showing up in the report!

Customize Code Execution Options

Option    Description
eval      Evaluate the code chunk.
echo      Include the source code in the output.
output    Include code output results (true, false, or asis).
warning   Include warnings in the output.
error     Include errors in the output (execution continues if an error occurs).
include   Catch-all for preventing any output (code or results) from being included.
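To illustrate the warning option, here is a sketch of what could go inside an R code chunk in a .qmd file; the label ex-quiet-warning is hypothetical. as.numeric() on a non-numeric string returns NA and raises a warning. With warning: false, Quarto leaves that warning out of the rendered report while the code and its result still appear. Run directly in the R console, the #| lines are ordinary comments, so the warning is still printed there.

```r
#| label: ex-quiet-warning
#| warning: false

# Converting text that is not a number yields NA plus the warning
# "NAs introduced by coercion"; warning: false hides the warning in the
# rendered document but keeps the code and the NA result.
parsed = as.numeric(c("3.5", "not a number"))
parsed
```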

Customize Code Execution Options

Note

Demo: echo option

echo: false hides the code but shows the results.

Code

```{r}
#| label: ex-orig
x = 1:10; y = 11:20
plot(x, y)
```
```{r}
#| label: ex-hide-code
#| echo: false
x = 1:10; y = 11:20
plot(x, y)
```

Output

ex-orig Output:

x = 1:10; y = 11:20
plot(x, y)

ex-hide-code Output:

Demo: eval option

eval: false shows the code but does not run it, so no results are created.

Code

```{r}
#| label: ex-orig
x = 1:10; y = 11:20
plot(x, y)
```
```{r}
#| label: ex-not-run
#| eval: false
x = 1:10; y = 11:20
plot(x, y)
```

Output

ex-orig Output:

x = 1:10; y = 11:20
plot(x, y)

ex-not-run Output:

x = 1:10; y = 11:20
plot(x, y)

Inline code

Enclose the R expression using `r `.

Code:


There are `r nrow(cars)` observations in our data. 

Output:

There are 50 observations in our data.

Important

In both Visual and Source mode, the R expression is only replaced by its value when the Quarto document is rendered. That is, the computed value appears only in the output file, not in the editor.
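Because the inline expression is ordinary R code, it can be checked in the console before rendering. A minimal base-R sketch: cars is a dataset that ships with R, and the sprintf() call only approximates the sentence the rendered document would contain; it is not how Quarto performs the substitution.

```r
# nrow() counts the rows (observations) of the built-in cars dataset
n_obs = nrow(cars)

# Roughly what the rendered sentence contains once the value is substituted
sentence = sprintf("There are %d observations in our data.", n_obs)
sentence
```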

Reference Code Chunk Variables Inline

Code:

```{r}
#| label: calc-values
#| echo: false
x = 1:10
x_mu = mean(x)
x_sd = sd(x)
```

The _mean_ of **x** is `r x_mu` and
the _standard deviation_ is `r x_sd`.

Output:

The mean of x is 5.5 and the standard deviation is 3.02765.
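Inline values are shown with several decimal places by default (3.02765 above). A base-R sketch that repeats the calc-values computation and adds round() to control the displayed precision; in the document, the inline expression would then be written as `r round(x_sd, 2)`.

```r
# Same computation as the calc-values chunk
x = 1:10
x_mu = mean(x)
x_sd = sd(x)   # sample standard deviation (divides by n - 1)

# Round before displaying the value
round(x_sd, 2)
#> [1] 3.03
```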

Properties of a Quarto Document

Customization Header

The title, author, date, output format, and editor type are stored at the beginning (the head) of the Quarto document. This information is written in the YAML Ain’t Markup Language (YAML) format.

---
title: "Hello Quarto"
author: "JJB + Course"
format: html
editor: "visual"
---

Render Options

Render as an HTML document

---
title: "Hello Quarto"
author: "JJB + Course"
format: html
---

Render as a PDF

---
title: "Hello Quarto"
author: "JJB + Course"
format: pdf
---

Render as a Word document

---
title: "Hello Quarto"
author: "JJB + Course"
format: docx
---

Multi-format Render Options

Render one Quarto document to many output formats, such as HTML, Jupyter Notebook, PDF, and Word document.

---
title: "Hello Quarto"
author: "JJB + Course"
format: 
  html: default
  ipynb: default # new format!
  pdf: default
  docx: default
---

Note

Quarto supports many formats, including PowerPoint (PPTX), Revealjs, Beamer, Rich Text Format (RTF), and so on. For details, see All Formats.

Selecting a Single Render Format

Using the drop-down menu of the Render button in RStudio, we can select a single output format to create.

Dropdown menu showing the different render options.

Example of selecting a single output format from the Render menu dropdown.

Customizing Specific Output Formats

In this example, we customize the html and docx formats.

---
title: "Hello Quarto"
author: "JJB + Course"
format: 
  html:
    toc: true
    code-fold: true
  ipynb: default
  pdf: default
  docx: 
    number-sections: true
    highlight-style: github
---

Note

For individual format options, please find the format on the All Formats page of the Quarto user guide.

A brief history

Iteration 1: Sweave

  • Literate programming has been a huge focus of the R community.

  • Officially, the Sweave (.Rnw) system backed by R-core allowed for literate programming.

  • Championed by Fritz Leisch, an R Core member who recently passed away.

Iteration 1: Sweave

  • However, the system required extensive use of LaTeX, a markup language, to combine R code with prose.

  • Plus, some useful options were missing, such as caching the results of long-running code chunks to avoid recalculating the output.

    Sample Sweave code chunk:

<<plot1, eval=FALSE, fig=TRUE, width=8, height=6>>=
data(faithful) 
plot(faithful$eruptions, faithful$waiting)
@

Iteration 2: knitr & rmarkdown (.Rmd)

  • In response to the weaknesses of Sweave's feature set, the knitr package was created with a focus on improving these options within LaTeX.

  • A little later, the rmarkdown package arrived on the scene to lower the barrier to entry by allowing Markdown to be used to interweave R code and results.

    Sample Rmarkdown code chunk:

```{r plot1, eval=FALSE, fig.width=8, fig.height=6}
data(faithful)
plot(faithful$eruptions, faithful$waiting)
```

Iteration 3: Quarto (.qmd)

  • The focus on using rmarkdown to create reports drew widespread acclaim after its debut in 2014. (See J.J. describe rmarkdown in 2016.)

  • However, the name rmarkdown constrained the report format to just R.

  • As data science is a polyglot field, i.e. you need to speak more than one language (e.g. Python, R, Julia, SQL, C++, …), the idea for a language-agnostic framework was born.

Iteration 3: Quarto (.qmd)

  • Quarto makes it possible to work with multiple languages without needing R (e.g. you could use just a Jupyter kernel).

    Sample Quarto code chunk:

    ```{r}
    #| label: plot1
    #| eval: false
    #| fig-width: 8
    #| fig-height: 6
    data(faithful)
    plot(faithful$eruptions, faithful$waiting)
    ```

Summary: Part I

  • Code for Humans
    • Use literate programming to write code in a human-friendly way, for consumption by humans, not just computers.
    • Scientists aim for the gold standard of experiments being replicable.
    • Statisticians/Data Scientists aim for reproducible computations.

Summary: Part II

  • Markdown
    • Focuses on plain text that is human readable and customizable.
  • Quarto
    • Combine code with a narrative analysis in a reproducible manner.