There are times when the relationship between the dependent and independent variables is not linear. In these cases, it is useful to include polynomial terms to help explain the variance in our dependent variable. The polynomial terms are treated as main effects in our model, so an estimate and a p-value are calculated for each one, and from these results we can determine whether the curvature we see in the data is statistically significant. If we use the Multiple Regression module, the only way to include squared terms (or any higher-order or interaction terms) is to build them in separate variable columns in the data and then include those columns in the regression. However, if we use a slightly more advanced tool, for example General Linear Models, we can specify that the program build these terms for us. Below is a quick guide to fitting a response variable to an independent variable with a main effect, a squared term, and a cubic term. So, we will be estimating: y = B0 + B1x + B2x^2 + B3x^3 + e
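For readers working outside STATISTICA, the manual approach described above (building the squared and cubic terms as separate columns) can be sketched in Python with NumPy. This is only an illustration on made-up data, not the NFLData08.sta values; the variable names and the simulated coefficients are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical data, NOT the NFLData08.sta values: a noisy cubic in x.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.25 * x**3 + rng.normal(0.0, 0.1, x.size)

# Build the polynomial terms as separate columns, as one would do by hand
# in a data sheet before running a multiple regression.
X = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Ordinary least squares estimates of B0, B1, B2, B3.
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

The estimates in coef are ordered B0 through B3 and should land close to the values used to simulate y, which is the same kind of output the Coefficients spreadsheet reports.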
For this example, open the NFLData08.sta data set.
1. Select the Statistics tab. 2. In the Advanced/Multivariate group, click Advanced Models. 3. From the menu, select General Linear.
4. In the General Linear Models (GLM) dialog, General linear models is selected by default as the Type of analysis. 5. Click OK.
6. In the GLM General linear models dialog, click the Variables button. 7. In the variable selection dialog, select 6 - PtsFor as our Dependent variable. 8. And select 8 - RushY as our Continuous predictor. 9. Click OK to close the variable selection dialog.
10. Now, in the GLM General linear models dialog, click the Between effects button. 11. In the GLM Between Effects dialog, select the Use custom effects for the between design option button. 12. Then, select the factor for which we want to include the polynomial term, in this case, RushY. 13. Since we want a third degree polynomial for our model, change the Degree to 3. 14. Then, click the Poly. to deg. (Polynomial to the degree) button. 15. The effects are displayed in the Effects in between design box. (Note: There are many more useful options in this dialog, such as interactions, nesting, and fitting surface models, but there must be more than one independent factor included in the model for these options to be available.) 16. Click OK to save our between effects structure. 17. And click OK in the GLM General linear models dialog to run our model. Note that a message will be displayed warning that the design matrix is ill-conditioned. You can ignore this warning (i.e., click OK). As you will see later in this example, the message is warning that some of our parameters might be redundant.
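To get a feel for why a cubic design can trigger an ill-conditioning warning, the sketch below compares the condition number of a raw polynomial design matrix with that of a standardized one. The x values are hypothetical (chosen to be in the thousands, not taken from the data set), and the example does not claim to reproduce how STATISTICA decides when to issue its message.

```python
import numpy as np

# Hypothetical predictor values in the thousands (an assumption,
# not the actual RushY column).
x = np.linspace(1200.0, 2600.0, 32)

def cubic_design(v):
    """Return the design matrix with columns 1, v, v^2, v^3."""
    return np.column_stack([np.ones_like(v), v, v**2, v**3])

# Raw powers of a large-valued predictor are nearly collinear and sit on
# wildly different scales, so the condition number is huge.
cond_raw = np.linalg.cond(cubic_design(x))

# Centering and scaling the predictor before taking powers reduces it.
z = (x - x.mean()) / x.std()
cond_std = np.linalg.cond(cubic_design(z))

print(f"raw: {cond_raw:.3e}  standardized: {cond_std:.3e}")
```

A large condition number means the columns carry overlapping information, which fits the idea that some of the parameters might be redundant.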
18. In the GLM Results dialog, select the Summary tab. 19. Then, click the Coefficients button. Note that a message may be displayed describing how you can rerun or resume the analysis; click OK. 20. A spreadsheet is created with the parameter estimates. (Incidentally, the p-values reported for the PtsFor parameter estimates show that none of the effects is significant at the 0.05 significance level, i.e., no p-values are below 0.05.)
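The quantities in a coefficients table (estimate, standard error, t statistic, p-value) can be sketched as follows. The data are simulated and hypothetical, and the p-values use a normal approximation to keep the example dependency-light; statistical software would normally use the t distribution, so small differences are to be expected.

```python
import math
import numpy as np

# Hypothetical data (not the NFL data set): y depends on x linearly,
# so the squared and cubic terms carry no real signal.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = 3.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

X = np.column_stack([np.ones(n), x, x**2, x**3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance with n - p degrees of freedom.
resid = y - X @ coef
dof = n - X.shape[1]
sigma2 = resid @ resid / dof

# Standard errors from the diagonal of sigma^2 * (X'X)^-1.
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stat = coef / se

# Two-sided p-values via the normal approximation (an assumption made
# here; the exact test uses the t distribution with dof degrees of freedom).
p_val = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stat])

for name, b, s, t, p in zip(["B0", "B1", "B2", "B3"], coef, se, t_stat, p_val):
    flag = "significant" if p < 0.05 else "not significant"
    print(f"{name}: est={b:.3f} se={s:.3f} t={t:.2f} p={p:.3f} ({flag})")
```

An effect is flagged as significant at the 0.05 level only when its p-value falls below 0.05.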
So, our estimated model is y = 15.784 + 0.003x + 0.001x^2 - 0.000x^3.
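As a small usage sketch, the estimated model can be evaluated for a given x with Horner's rule. The coefficients below are the values as printed, which are rounded to three decimals (so the cubic term contributes nothing here); for real predictions, the full-precision estimates from the spreadsheet would be the better input. The function name predict is a hypothetical helper introduced for this example.

```python
# Coefficients as printed in the text, lowest order first.
# They are rounded to three decimals, so the cubic coefficient is 0 here.
COEFFS = [15.784, 0.003, 0.001, -0.000]

def predict(x, coeffs=COEFFS):
    """Evaluate b0 + b1*x + b2*x^2 + b3*x^3 using Horner's rule."""
    result = 0.0
    for b in reversed(coeffs):
        result = result * x + b
    return result

print(predict(0.0))
print(predict(10.0))
```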
Written by Shannon Dick