class: center, middle, inverse, title-slide

.title[
# Review of Regression Analysis
]
.subtitle[
## PSYC 575
]
.author[
### Mark Lai
]
.institute[
### University of Southern California
]
.date[
### 2020/08/04 (updated: 2022-08-27)
]

---

# Statistical Model

.pull-left[

<img src="02_review_regression_files/figure-html/stat-model-1.png" width="95%" style="display: block; margin: auto;" />

]

--

.pull-right[

## A set of statistical assumptions describing how data are generated

- Deterministic/fixed component

`$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots$$`

- Stochastic/random component

`$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \color{red}{e_i}$$`

`$$\color{red}{e_i \sim N(0, \sigma)}$$`

]

???

- It's only a review, so I won't go deep.
- You may check out the sections in the book by Gelman et al.
- Model in OpenBoard
- Statistical notation
    * Notation for normal distribution
    * Important for MLM

---
class: middle, center

# Why Regression?

--

### MLM is an extension of multiple regression to deal with data from multiple levels

---

# Learning Objectives

### Refresh your memory on regression

--

- Describe the statistical model

--

- Write out the model equations

--

- Simulate data based on a regression model

--

- Plot interactions

---
class: inverse, middle, center

# R Demonstration

???

Transition to RStudio

- Data Import
- Explain the variables

---

# Salary Data

From [Cohen, Cohen, West & Aiken (2003)](https://doi.org/10.4324/9780203774441)

Examine factors related to annual salary of faculty in a university department

--

- time = years after receiving degree
- pub = # of publications
- sex = gender (0 = male, 1 = female)
- citation = # of citations
- salary = annual salary

---

# Data Exploration

.pull-left[

<img src="02_review_regression_files/figure-html/salary_dat-1.png" width="95%" style="display: block; margin: auto;" />

]

???
Explain what the x axis, y axis, diagonals are

Citation vs salary as an example

--

.pull-right[

- How does the distribution of `salary` look?
- Are there more males or females in the data?
- How would you describe the relationship between number of publications and salary?

]

---

# Simple Linear Regression

### Sample regression line

### Confidence intervals

### Centering

???

- Regression line is only a sample estimate; there is uncertainty
- Uncertainty measured by standard errors and confidence intervals
    * Show animations on the varying regression slopes
    * A function of sample size
- Centering: Draw a picture on changing the x-axis
- Interpretations: unit increase in `\(x\)` associated with `\(\beta\)` unit increase in `\(y\)`

---
exclude: true
class: inverse, middle, center

# Simulation

See lecture and R code

---

# Categorical Predictors

.pull-left[

### Dummy Coding

With `\(k\)` categories, one needs `\(k - 1\)` dummy variables

The coefficients are differences relative to the reference group

]

--

.pull-right[

<img src="02_review_regression_files/figure-html/p2-1.png" width="95%" style="display: block; margin: auto;" />

]

---

# Categorical Predictors

.pull-left[

### Dummy Coding

With `\(k\)` categories, one needs `\(k - 1\)` dummy variables

The coefficients are differences relative to the reference group

Male = 0

`$$y = \beta_0 + \beta_1 (0) = \beta_0$$`

]

.pull-right[

<img src="02_review_regression_files/figure-html/p2-1-1.png" width="95%" style="display: block; margin: auto;" />

]

---

# Categorical Predictors

.pull-left[

### Dummy Coding

With `\(k\)` categories, one needs `\(k - 1\)` dummy variables

The coefficients are differences relative to the reference group

Male = 0

`$$y = \beta_0 + \beta_1 (0) = \beta_0$$`

Female = 1

`$$y = \beta_0 + \beta_1 (1) = \beta_0 + \beta_1$$`

]

.pull-right[

<img src="02_review_regression_files/figure-html/p2-2-1.png" width="95%" style="display: block; margin: auto;" />

]

---
class: inverse, middle, center

# Multiple Regression

---
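
# Simulating from the Model

A minimal sketch of the "simulate data based on a regression model" objective, assuming made-up population values (`beta0`, `beta1`, `beta2`, `sigma`) and made-up predictor distributions; only the sample size of 62 matches the salary data. It builds the fixed and the random components separately, then fits `lm()` to see how close the estimates come to the values used to generate the data.

```r
# All population values below are hypothetical, chosen only for illustration
set.seed(575)
n <- 62          # same number of observations as the salary data
beta0 <- 45000   # intercept
beta1 <- 100     # partial effect of pub_c
beta2 <- 1000    # partial effect of time
sigma <- 7500    # SD of the random component

# Made-up predictor distributions
pub_c <- rnorm(n, mean = 0, sd = 14)   # centered number of publications
time <- runif(n, min = 1, max = 20)    # years after receiving degree

# Deterministic/fixed component
fixed <- beta0 + beta1 * pub_c + beta2 * time
# Stochastic/random component: e_i ~ N(0, sigma)
e <- rnorm(n, mean = 0, sd = sigma)

sim_dat <- data.frame(salary = fixed + e, pub_c = pub_c, time = time)

# Fit the same model to the simulated data and inspect the estimates
fit <- lm(salary ~ pub_c + time, data = sim_dat)
coef(fit)
```

Rerun with a different seed, or a larger `n`, to see how much the estimates vary from sample to sample.

---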
.pull-left[

# Partial Effects

`$$\text{salary}_i = \beta_0 + \beta_1 \text{pub}^c_i + \beta_2 \text{time}_i + e_i$$`

![](https://github.com/marklhc/marklai-pages/raw/master/static/img/regression_venn.png)

]

???

Transition to R

--

.pull-right[

# Interpretations

### Every unit increase in `\(X\)` is associated with `\(\beta_1\)` unit increase in `\(Y\)`, **when all other predictors are held constant**

]

---

# Interactions

### Regression slope of a predictor depends on another predictor

`\begin{align}
  \widehat{\text{salary}} & = 54238 + 105 \times \text{pub}^c + 964 \times \text{time}^c \\
  & \quad + 15 (\text{pub}^c)(\text{time}^c)
\end{align}`

--

`time` = 7 `\(\Rightarrow\)` `time_c` = 0.21

`\begin{align}
  \widehat{\text{salary}} & = 54238 + 105 \times \text{pub}^c + 964 (0.21) \\
  & \quad + 15 (\text{pub}^c)(0.21) \\
  & = 54440 + 108 \times \text{pub}^c
\end{align}`

--

`time` = 15 `\(\Rightarrow\)` `time_c` = 8.21

`\begin{align}
  \widehat{\text{salary}} & = 54238 + 105 \times \text{pub}^c + 964 (8.21) \\
  & \quad + 15 (\text{pub}^c)(8.21) \\
  & = 62152 + 228 \times \text{pub}^c
\end{align}`

---

# Interactions

### Regression slope of a predictor depends on another predictor

.pull-left[

`\begin{align}
  \widehat{\text{salary}} & = 54238 + 105 \times \text{pub}^c + 964 \times \text{time}^c \\
  & \quad + 15 (\text{pub}^c)(\text{time}^c)
\end{align}`

`time` = 7 `\(\Rightarrow\)` `time_c` = 0.21

`\begin{align}
  \widehat{\text{salary}} & = 54238 + 105 \times \text{pub}^c + 964 (0.21) \\
  & \quad + 15 (\text{pub}^c)(0.21) \\
  & = 54440 + 108 \times \text{pub}^c
\end{align}`

`time` = 15 `\(\Rightarrow\)` `time_c` = 8.21

`\begin{align}
  \widehat{\text{salary}} & = 54238 + 105 \times \text{pub}^c + 964 (8.21) \\
  & \quad + 15 (\text{pub}^c)(8.21) \\
  & = 62152 + 228 \times \text{pub}^c
\end{align}`

]

.pull-right[

<img src="02_review_regression_files/figure-html/p4-1.png" width="95%" style="display: block; margin: auto;" />

]

---

# `modelsummary::msummary()`

.small[

<table class="table"
style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> M3 + Interaction </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:center;"> 54238.1 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (1183.0) </td> </tr> <tr> <td style="text-align:left;"> pub_c </td> <td style="text-align:center;"> 104.7 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (98.4) </td> </tr> <tr> <td style="text-align:left;"> time_c </td> <td style="text-align:center;"> 964.2 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (339.7) </td> </tr> <tr> <td style="text-align:left;"> pub_c × time_c </td> <td style="text-align:center;"> 15.1 </td> </tr> <tr> <td style="text-align:left;box-shadow: 0px 1px"> </td> <td style="text-align:center;box-shadow: 0px 1px"> (17.3) </td> </tr> <tr> <td style="text-align:left;"> Num.Obs. </td> <td style="text-align:center;"> 62 </td> </tr> <tr> <td style="text-align:left;"> R2 </td> <td style="text-align:center;"> 0.399 </td> </tr> <tr> <td style="text-align:left;"> R2 Adj. </td> <td style="text-align:center;"> 0.368 </td> </tr> <tr> <td style="text-align:left;"> AIC </td> <td style="text-align:center;"> 1291.8 </td> </tr> <tr> <td style="text-align:left;"> BIC </td> <td style="text-align:center;"> 1302.4 </td> </tr> <tr> <td style="text-align:left;"> Log.Lik. 
</td> <td style="text-align:center;"> −640.895 </td> </tr>
<tr> <td style="text-align:left;"> F </td> <td style="text-align:center;"> 12.817 </td> </tr>
<tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 7465.67 </td> </tr>
</tbody>
</table>

]

---

# Summary

### Concepts

- What is a statistical model
- Linear/Multiple Regression
    * Centering
    * Categorical predictor
    * Interpretations
    * Interactions

### Try replicating the examples in the Rmd file
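
---

# Simple Slopes in R

A minimal sketch for checking the interaction arithmetic by hand, using the rounded estimates shown on the interaction slides. The helper `simple_line()` is a hypothetical name, not a function from any package; it returns the intercept and the slope of `pub_c` at a chosen value of `time_c`.

```r
# Rounded estimates from the interaction model on the earlier slides
b_0 <- 54238     # intercept
b_pub <- 105     # pub_c
b_time <- 964    # time_c
b_int <- 15      # pub_c x time_c

# Hypothetical helper: regression line of salary on pub_c at a given time_c
simple_line <- function(time_c) {
  c(
    intercept = b_0 + b_time * time_c,
    slope = b_pub + b_int * time_c
  )
}

simple_line(0.21)   # time = 7
simple_line(8.21)   # time = 15
```

The slope of `pub_c` is `b_pub + b_int * time_c`, so it changes by `b_int` for each additional unit of `time_c`.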