Week 1 (R): R Markdown
If you’re new to R, follow the online ModernDive chapter https://moderndive.netlify.app/1-getting-started.html
RStudio Intro
Using Project
Locate the following panes:
- Console
- Source
- History
- Help
Recommended Option
- Tools –> Global Options –> General: set “Save workspace to .RData on exit” to “Never”.
Install R Packages
See https://twitter.com/visnut/status/1248087845589274624
# Install the tidyverse "meta" package
# install.packages("tidyverse")
# Install the here package
# install.packages("here")
!!! Don’t you dare include any install.packages() statements in your homework !!!
Load a Package
# Uncomment the code below to load the tidyverse package
# library(tidyverse)
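One way to keep install.packages() out of a document while still failing early is to check that a package is available before loading it. The sketch below uses base R's requireNamespace(); the helper name pkg_available() is hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE if the package is installed, FALSE otherwise.
# requireNamespace() only checks for (and loads) the namespace; it never installs anything.
pkg_available <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this check should succeed on any installation.
if (pkg_available("stats")) {
  library(stats)
} else {
  stop("Package 'stats' is not installed; install it from the console first.")
}
```

The same pattern could be applied to tidyverse or here by swapping the package name.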
Import Data
- Download the data file salary.txt from https://raw.githubusercontent.com/marklhc/marklai-pages/master/data_files/salary.txt
- Create a folder named data_files in your project, and place salary.txt in it
- Run the following
library(here)
# The `here()` function forces the use of the project directory
salary_dat <- read.table(here("data_files", "salary.txt"), header = TRUE)
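To see what header = TRUE does without downloading anything, here is a small self-contained sketch in base R. The file contents, column names, and the object name toy_dat are invented for illustration and are not the real salary data.

```r
# Write a tiny whitespace-delimited file with a header row to a temporary location.
# The column names and values here are made up for illustration only.
tmp_file <- tempfile(fileext = ".txt")
writeLines(c("id salary pub",
             "1 50000 10",
             "2 62000 25"),
           tmp_file)

# header = TRUE tells read.table() to treat the first line as column names
toy_dat <- read.table(tmp_file, header = TRUE)

str(toy_dat)    # structure: column names and types
nrow(toy_dat)   # number of rows read
```

Calling str() right after an import is a quick check that the columns were read with the names and types you expected.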
R Markdown
- YAML metadata
- Text (Markdown)
- Code chunks
YAML
Ex1:
- Update your name in the author field
- Change the option from toc: false to toc: true
- Insert today’s date using the date field
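For orientation, a YAML header with these three fields might look something like the sketch below. The title and author values are placeholders, and the html_document output format is an assumption (the exercise file may use a different one). The inline R expression in the date field is one common way to insert the date at knit time.

```yaml
---
title: "Week 1 Exercise"   # placeholder title
author: "Your Name"        # replace with your own name
date: "`r Sys.Date()`"     # evaluated by R when the document is knit
output:
  html_document:           # assumed output format
    toc: true              # show a table of contents
---
```

Indentation matters in YAML: toc must be nested under the output format it applies to.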
Text (Markdown)
Bold
**Bold**
italic
*italic*
code
`code`
Link to USC
[Link to USC](https://www.usc.edu)
Header
# Level 1
## Level 2
### Level 3
Unordered list
- item 1
- item 2
  - item 2a
Ordered list
1. item 1
2. item 2
   - item 2a
Equations (LaTeX)
Inline: \(Y_i = \beta_0 + \beta_1 X_i + e_i\)
Display:
\[\rho = \frac{\tau^2}{\tau^2 + \sigma^2}\]
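The forms shown above are the rendered delimiters. In an .Rmd source file, math is usually typed between dollar signs: single for inline, double for display. A sketch of how the two equations above might be typed:

```latex
Inline: $Y_i = \beta_0 + \beta_1 X_i + e_i$

Display:
$$\rho = \frac{\tau^2}{\tau^2 + \sigma^2}$$
```

Subscripts use an underscore and superscripts a caret; wrap multi-character subscripts in braces, as in `\beta_{10}`.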
Inline Code
The value of $\pi$ is `r pi`
The value of \(\pi\) is 3.1415927
Code Chunks
Content to be interpreted by R engine
1 + 1
[1] 2
v1 <- c(1, 2, 6, 8)  # create a vector `v1`
v1[3]  # extract 3rd element of v1
[1] 6
# extract the `salary` column, and print the first six values
head(salary_dat$salary)
[1] 51876 54511 53425 61863 52926 47034
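The same extraction ideas can be tried without the salary file. The sketch below builds a small toy data frame (values invented for illustration) and shows a few equivalent ways to pull out columns and cells in base R.

```r
# Toy data frame; values are invented for illustration
toy <- data.frame(salary = c(50000, 62000, 58000),
                  pub    = c(10, 25, 18))

toy$salary        # column as a vector, using $
toy[["salary"]]   # same column, using [[ ]]
toy[2, "pub"]     # row 2 of the pub column
toy$salary[1:2]   # first two salaries
```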
Chunk Options
- echo = FALSE: Do not show the input command
- results = 'hide': Do not show the results
- eval = FALSE: Do not run the code (and so no output)
- include = FALSE: Do not show anything from the chunk
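In the .Rmd source, chunk options go inside the braces of the chunk header, after the engine name. A minimal sketch, assuming a chunk with the hypothetical label demo-chunk, that hides the code but still shows its result:

````markdown
```{r demo-chunk, echo = FALSE}
# With echo = FALSE, this code is hidden in the knitted document,
# but its output is still printed
1 + 1
```
````

Several options can be combined in one header by separating them with commas.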
Knitting
- Try also output: revealjs::revealjs_presentation
Note: Different R sessions are used for the console and for knitting
Ex2: Change the chunk option for the chunk below so that it only shows the code, but not the output
m1 <- lm(salary ~ pub, data = salary_dat)  # linear model
summary(m1)
Call:
lm(formula = salary ~ pub, data = salary_dat)
Residuals:
Min 1Q Median 3Q Max
-20660.0 -7397.5 333.7 5313.9 19238.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 48439.09 1765.42 27.438 < 2e-16 ***
pub 350.80 77.17 4.546 2.71e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8440 on 60 degrees of freedom
Multiple R-squared: 0.2562, Adjusted R-squared: 0.2438
F-statistic: 20.67 on 1 and 60 DF, p-value: 2.706e-05
The model suggests that each publication is worth 350.8017794 dollars in salary.
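The long decimal in the sentence above comes from printing the raw coefficient inline. One possible way to tidy it is to round the value before reporting. The sketch below uses R's built-in cars data as a stand-in (not the salary data), so the object name m_demo and the variable names differ from the example above.

```r
# Stand-in model on the built-in `cars` data (not the salary data)
m_demo <- lm(dist ~ speed, data = cars)

# Extract the slope by name and round it to two decimal places
slope <- coef(m_demo)[["speed"]]
slope_rounded <- round(slope, 2)
slope_rounded

# In an .Rmd file, the same expression could be used inline, for example:
#   `r round(coef(m_demo)[["speed"]], 2)`
```

Extracting the coefficient by name rather than by position keeps the inline text correct if predictors are later added to the model.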
Cheatsheet
https://raw.githubusercontent.com/rstudio/cheatsheets/main/rmarkdown.PDF
Exercise
Download the Rmd file for the exercise on Blackboard
- Complete Ex1 above.
- Complete Ex2 above.
- Type this equation in LaTeX: https://wikimedia.org/api/rest_v1/media/math/render/svg/2898c190bd4d2bb0a4f53ebaf1e51d4c15de6fed. Make sure you get all the subscripts right.
- Install and load the modelsummary package, run the following, and find out what the average salary is for females in the sample (sex = 0 for males, 1 for females). (You need to remove the eval = FALSE chunk option.)
# Install and load the modelsummary package first; otherwise, it won't run
library(modelsummary)
datasummary_balance(~ sex, data = salary_dat)
- Run the following and find out what the code chunk does. (You need to remove the eval = FALSE chunk option.)
fm1 <- lm(salary ~ pub, data = salary_dat)
fm2 <- lm(salary ~ pub + time, data = salary_dat)
fm3 <- lm(salary ~ pub * time, data = salary_dat)
msummary(list(`model 1` = fm1, `model 2` = fm2, `model 3` = fm3))
- Run the following and find out what this code chunk does. You’ll need to remove eval = FALSE so it runs.
ggplot(salary_dat, aes(x = pub, y = salary)) +
geom_point() +
geom_smooth()
- The following shows an interaction plot. Based on the plot, write a sentence to interpret the interaction between time (time since Ph.D.) and pub (number of publications) when predicting salary. (Hint: You need to install the interactions package. You can ignore statistical significance in your interpretation; focus on the pattern shown in the graph.)
interactions::interact_plot(fm3,
  pred = "pub",
  modx = "time",
  modx.values = c(1, 7, 15),
  modx.labels = c(1, 7, 15),
  plot.points = TRUE,
  x.label = "Number of publications",
  y.label = "Salary",
  legend.main = "Time since Ph.D."
)
Knit the document to HTML, PDF, and Word. Which format do you prefer? If you run into an error when knitting to any one of the formats, record the error message.
Submit the knitted document to Blackboard in your preferred format (HTML, PDF, or Word).