Week 1 (R): R Markdown

If you’re new to R, work through the online ModernDive chapter: https://moderndive.netlify.app/1-getting-started.html

RStudio Intro

Using Projects

GIF for creating an RStudio Project


Locate the following panes:

  • Console
  • Source
  • History
  • Help

Install R Packages

See https://twitter.com/visnut/status/1248087845589274624

# Install the tidyverse "meta" package
# install.packages("tidyverse")
# Install the here package
# install.packages("here")

!!! Don’t you dare include any install.packages() statements in your homework !!!

Load a Package

# Uncomment the code below to load the tidyverse package
# library(tidyverse)
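
A document can still fail early when a package is missing without ever calling install.packages(). The sketch below uses base R's requireNamespace() for that check; the helper name `missing_pkgs` is made up for this illustration and is not part of the course materials.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() only checks for (and loads) a namespace; it never installs.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this week's notes
needed <- c("tidyverse", "here")
not_found <- missing_pkgs(needed)
if (length(not_found) > 0) {
  message("Install these from the console first: ",
          paste(not_found, collapse = ", "))
}
```

If the message appears, run install.packages() once in the console rather than in the Rmd file.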

Import Data

  1. Download the data file salary.txt from https://raw.githubusercontent.com/marklhc/marklai-pages/master/data_files/salary.txt

  2. Create a folder named data_files in your project

  3. Run the following

library(here)
# The `here()` function forces the use of the project directory
salary_dat <- read.table(here("data_files", "salary.txt"), header = TRUE)
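
Steps 1 and 2 can also be scripted rather than done by hand. The following is a minimal sketch using base R's dir.create() and download.file(); the helper name `fetch_salary` and its `dest_dir` argument are assumptions introduced here, and the download needs an internet connection.

```r
# Hypothetical helper: make sure `dest_dir` exists and holds salary.txt,
# downloading the file only if it is not already there.
fetch_salary <- function(dest_dir) {
  url <- "https://raw.githubusercontent.com/marklhc/marklai-pages/master/data_files/salary.txt"
  dest_file <- file.path(dest_dir, "salary.txt")
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  if (!file.exists(dest_file)) {
    download.file(url, destfile = dest_file)
  }
  dest_file
}

# Usage inside an RStudio project (run from the console):
# salary_path <- fetch_salary(here::here("data_files"))
# salary_dat <- read.table(salary_path, header = TRUE)
```

Because the helper skips the download when the file already exists, running it again does not overwrite a local copy.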

R Markdown

  • YAML metadata
  • Text (Markdown)
  • Code chunks

YAML

Ex1:

  • Update your name in the author field
  • Change the option from toc: false to toc: true
  • Insert today’s date using the date field
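
For reference, a header with the three Ex1 changes applied might look like the lines built below. The sketch writes them to a scratch file from R so the result can be inspected; the title, the author placeholder, and the html_document output format are assumptions, since the exercise file's actual header is not shown here.

```r
# Hypothetical YAML header with the Ex1 edits applied
yaml_header <- c(
  "---",
  'title: "Week 1 Exercise"',     # assumed title
  'author: "Your Name"',          # replace with your own name
  'date: "`r Sys.Date()`"',       # inline R inserts today's date at knit time
  "output:",
  "  html_document:",             # assumed output format
  "    toc: true",                # changed from toc: false
  "---"
)

# Write to a temporary file and print it back
rmd_path <- file.path(tempdir(), "ex1-header.Rmd")
writeLines(yaml_header, rmd_path)
cat(readLines(rmd_path), sep = "\n")
```

Indentation matters in YAML: `toc` is nested under the output format, which is itself nested under `output`.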

Text (Markdown)

  • Bold **Bold**

  • italic *italic*

  • code `code`

  • Link to USC [Link to USC](https://www.usc.edu)

  • Header

# Level 1

## Level 2

### Level 3

Unordered list

  • item 1
  • item 2
    • item 2a

Ordered list

  1. item 1
  2. item 2
    1. item 2a

Equations (LaTeX)

Inline: \(Y_i = \beta_0 + \beta_1 X_i + e_i\)

Display:

\[\rho = \frac{\tau^2}{\tau^2 + \sigma^2}\]
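
When a subscript has more than one character, it needs to be wrapped in braces; otherwise only the first character is lowered. A small illustration, using a two-index variant of the inline regression equation above (the index j is added here purely for demonstration):

```latex
% Without braces only the first character becomes the subscript:
%   Y_ij  renders as Y with subscript i, followed by a regular j
% With braces the whole group is the subscript:
$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$$
```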

Inline Code

The value of $\pi$ is `r pi`

The value of \(\pi\) is 3.1415927

Code Chunks

Content to be interpreted by the R engine

1 + 1
[1] 2
v1 <- c(1, 2, 6, 8)  # create a vector `v1`
v1[3]  # extract 3rd element of v1
[1] 6
# extract the `salary` column, and print the first six values
head(salary_dat$salary)
[1] 51876 54511 53425 61863 52926 47034
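
A few more indexing patterns on the same vector, as a quick sketch of how square brackets behave in base R:

```r
v1 <- c(1, 2, 6, 8)
length(v1)      # number of elements: 4
v1[c(1, 3)]     # 1st and 3rd elements: 1 6
v1[-1]          # everything except the 1st element: 2 6 8
v1[v1 > 2]      # elements larger than 2: 6 8
```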

Chunk Options

  • echo = FALSE: Do not show the input command
  • results = 'hide': Do not show the results
  • eval = FALSE: Do not run the code (and so no output)
  • include = FALSE: Do not show anything from the chunk

Knitting

  • Try also output: revealjs::revealjs_presentation

Note: Different R sessions are used for the console and for knitting


Ex2: Change the chunk option for the chunk below so that it only shows the code, but not the output

m1 <- lm(salary ~ pub, data = salary_dat)  # linear model
summary(m1)

Call:
lm(formula = salary ~ pub, data = salary_dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-20660.0  -7397.5    333.7   5313.9  19238.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 48439.09    1765.42  27.438  < 2e-16 ***
pub           350.80      77.17   4.546 2.71e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8440 on 60 degrees of freedom
Multiple R-squared:  0.2562,    Adjusted R-squared:  0.2438 
F-statistic: 20.67 on 1 and 60 DF,  p-value: 2.706e-05

The model suggests that each additional publication is associated with 350.8017794 dollars more in predicted salary.
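
Inline code prints a number at full precision unless it is rounded first. Below is a sketch of extracting and rounding a slope before reporting it, using a small made-up data frame (`toy_dat` and its values are hypothetical, not the salary data):

```r
# Hypothetical toy data, only to keep the example self-contained
toy_dat <- data.frame(
  pub = c(2, 5, 8, 12, 20),
  salary = c(49000, 50100, 51500, 52000, 55500)
)
toy_m <- lm(salary ~ pub, data = toy_dat)

slope <- coef(toy_m)[["pub"]]       # unrounded estimate
slope_rounded <- round(slope, 2)    # two decimals, suitable for reporting

# In the Rmd text, the rounded value would be inserted with inline code such as
#   `r round(coef(toy_m)[["pub"]], 2)`
slope_rounded
```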

Cheatsheet

https://raw.githubusercontent.com/rstudio/cheatsheets/main/rmarkdown.PDF

Exercise

Download the Rmd file for the exercise from Blackboard.

  1. Complete Ex1 above.
  2. Complete Ex2 above.
  3. Type this equation in LaTeX: https://wikimedia.org/api/rest_v1/media/math/render/svg/2898c190bd4d2bb0a4f53ebaf1e51d4c15de6fed. Make sure you get all the subscripts right.
  4. Install and load the modelsummary package, run the following, and find out what the average salary is for females in the sample (sex = 0 for males, 1 for females).
    (You need to remove the eval = FALSE chunk option.)
# Install and load the modelsummary package first; otherwise, it won't run
library(modelsummary)
datasummary_balance(~ sex, data = salary_dat)
  5. Run the following and find out what the code chunk does
    (You need to remove the eval = FALSE chunk option.)
fm1 <- lm(salary ~ pub, data = salary_dat)
fm2 <- lm(salary ~ pub + time, data = salary_dat)
fm3 <- lm(salary ~ pub * time, data = salary_dat)
msummary(list(`model 1` = fm1, `model 2` = fm2, `model 3` = fm3))

  6. Run the following and find out what this code chunk does. You’ll need to remove eval = FALSE so it runs.
ggplot(salary_dat, aes(x = pub, y = salary)) +
  geom_point() +
  geom_smooth()

  7. The following shows an interaction plot. Based on the plot, write a sentence to interpret the interaction between time (time since Ph.D.) and pub (number of publications) when predicting salary.
    (Hint: You need to install the interactions package. You can ignore statistical significance in your interpretation; focus on the pattern shown in the graph.)
interactions::interact_plot(fm3,
    pred = "pub",
    modx = "time",
    modx.values = c(1, 7, 15),
    modx.labels = c(1, 7, 15),
    plot.points = TRUE,
    x.label = "Number of publications",
    y.label = "Salary",
    legend.main = "Time since Ph.D."
)

  8. Knit the document to HTML, PDF, and Word. Which format do you prefer? If you run into an error when knitting to any one of the formats, record the error message.

  9. Submit the knitted document to Blackboard in your preferred format (HTML, PDF, or Word).
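
Knitting can also be done from the console with rmarkdown::render(), which makes it easy to try all three formats and keep any error messages. A sketch, assuming the rmarkdown package is installed; the helper name `knit_each` and the file name in the usage comment are hypothetical:

```r
# Hypothetical helper: try each output format and record "ok" or the error text
knit_each <- function(rmd_file,
                      formats = c("html_document", "pdf_document", "word_document")) {
  vapply(formats, function(fmt) {
    tryCatch({
      rmarkdown::render(rmd_file, output_format = fmt, quiet = TRUE)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Usage (file name is a placeholder):
# knit_each("week1_exercise.Rmd")
```

The result is a named character vector with one entry per format. Knitting to PDF additionally requires a LaTeX installation, so an error for that format alone usually points there.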