Written by: STATISTICA 8/30/2010 4:29 PM
It is often useful to look at various measures of odds, and this can be done with STATISTICA’s Generalized Linear/Nonlinear Models (GLZ). The overall odds ratio expresses how much more prevalent one response category is than the other. Odds can also be examined for the contribution of individual predictor variables. For example, in modeling the probability of heart disease in a population, the analyst may find that the odds of heart disease are 0.26, while the odds of heart disease given that the patient is a smoker are 1.75. These statistics are valuable for evaluating and understanding the relationships in the data.
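The difference between odds and an odds ratio can be sketched in a few lines of Python. The odds of an event with probability p are p / (1 - p). An odds ratio divides the odds in one group by the odds in a reference group. The function names and the probabilities below are hypothetical and chosen only for illustration. They are not taken from STATISTICA or from any real study.

```python
def odds(p: float) -> float:
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds in one group divided by the odds in a reference group."""
    return odds(p_group) / odds(p_reference)


# Hypothetical probabilities (illustrative only, not real data).
p_smoker = 0.5
p_nonsmoker = 0.2
print(odds(p_smoker))                     # 1.0
print(odds(p_nonsmoker))                  # 0.25
print(odds_ratio(p_smoker, p_nonsmoker))  # 4.0
```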
With STATISTICA GLZ, you can compute both types of odds ratios. This article illustrates their use with an example concerning horseshoe crabs. A biologist is modeling the probability that male horseshoe crabs, called satellites, are present in the vicinity of nesting female horseshoe crabs. The example uses Crabs.sta, found in the STATISTICA examples data folder. The biologist can calculate the odds of the presence of one or more satellites using an odds ratio. The odds of satellite presence, with confidence intervals, can also be computed for predictor variables such as spine condition and color.
The overall odds ratio is available for Generalized Linear/Nonlinear Model analyses that use a discrete distribution. Odds ratios for predictors in the model are available only for analyses that use the binomial distribution with the logit link and the over-parameterized model. When this model is appropriate, the odds ratios for predictor variables are very useful statistics. Note that the binomial distribution, the logit link, and the over-parameterized model are not the default settings in Generalized Linear Models.
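Predictor odds ratios are tied to the logit link because of a standard identity of logistic models. Under that link, log(p / (1 - p)) equals the linear predictor. Changing a 0/1 indicator from 0 to 1 therefore multiplies the odds by exp(b), where b is that indicator's coefficient. The sketch below checks this identity numerically with hypothetical coefficients. It illustrates the general identity only and makes no claim about how GLZ computes its output internally.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds_from_eta(eta: float) -> float:
    """Under the logit link, the odds p / (1 - p) equal exp(eta)."""
    p = inv_logit(eta)
    return p / (1.0 - p)


# Hypothetical intercept b0 and coefficient b1 for a 0/1 indicator x.
b0, b1 = -0.4, 1.1
odds_x0 = odds_from_eta(b0)       # odds when x = 0
odds_x1 = odds_from_eta(b0 + b1)  # odds when x = 1

# The ratio of the two odds equals exp(b1), about 3.004 here.
print(odds_x1 / odds_x0, math.exp(b1))
```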
To begin the example, open Crabs.sta: select the Home tab, click the Open arrow, and select Open Examples. The file is located in the Datasets folder.
Now, launch Generalized Linear/Nonlinear Models: select the Statistics tab. In the Advanced/Multivariate group, click the Advanced Models arrow. Select Generalized Linear/Nonlinear to display the Generalized Linear/Nonlinear Models Startup Panel.
On the Advanced tab, change the Distribution to Binomial and ensure that Logit is selected in the Link functions group box.
Click the OK button to display the GLZ General custom design dialog.
Click the Variables button, and specify variables as seen in the image below. The dependent variable, Y, has a binomial distribution. Y=1 means that one or more satellites are present in the vicinity of the nesting female horseshoe crab, and Y=0 means that no satellites are present. Color and Spine are categorical variables, and Weight and Catwidth are continuous measures of the size of the female horseshoe crab.
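The example data already contain the binary response Y. For clarity, the sketch below shows how such a response relates to a count of satellites per female. It assumes a hypothetical list of counts, and the variable names are illustrative, not columns of Crabs.sta.

```python
# Hypothetical satellite counts for six female crabs (illustrative only).
satellite_counts = [0, 3, 1, 0, 6, 2]

# Y = 1 when one or more satellites are present, Y = 0 otherwise.
y = [1 if count >= 1 else 0 for count in satellite_counts]
print(y)  # [0, 1, 1, 0, 1, 1]
```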
Click OK in the variable selection dialog.
By default, the model fits the probability of Y=0 (no satellites present). To model the probability of Y=1 (the presence of one or more satellites) instead, click the Response codes button to display the Select two codes for the binomial response dialog. The first code is the one that is modeled, so type 1 0 in the field.
Click the OK button.
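The choice of response code affects the sign of the results, not their substance. Swapping which code is modeled replaces the odds with their reciprocal, so the logit (log odds) changes sign. The sketch below shows this with hypothetical 0/1 responses and a simple count-based odds. It is not STATISTICA output.

```python
import math

# Hypothetical 0/1 responses (illustrative only).
y = [1, 1, 0, 1, 0, 1, 1, 0]

n1 = sum(y)       # number of cases coded 1
n0 = len(y) - n1  # number of cases coded 0

logit_y1 = math.log(n1 / n0)  # log odds when code 1 is modeled
logit_y0 = math.log(n0 / n1)  # log odds when code 0 is modeled

# Same magnitude, opposite sign: switching the modeled code flips the logit.
print(logit_y1, logit_y0)
```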
Additionally, the between-effects design should be modified to remove the interaction between categorical variables, which is not of interest. Click the Between effects button. In the GLM Between Effects dialog, select the Use custom effects for the between design option button. Select all four variables and click the Add button.
Click OK in the GLM Between Effects dialog.
The GLZ General custom design dialog should now look like this:
On the Advanced tab, clear the Sigma-restricted check box under Estimation. Now the over-parameterized method will be used.
Click OK in the GLZ General custom design dialog. A message is displayed stating that one of the parameters in the model was redundant and was zeroed out (effectively ignored). This is typical of an over-parameterized model that contains a categorical predictor. Click OK to advance to the GLZ - Results dialog. The overall odds ratio is available from the Resid. 1 tab.
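The redundancy message follows from how the over-parameterized design is built. That design has an intercept column plus one 0/1 indicator column for every level of a categorical predictor. Every case belongs to exactly one level, so the indicator columns add up to the intercept column and are linearly dependent. The sketch below demonstrates this with hypothetical level labels. Setting one level's parameter to zero is equivalent to dropping that column, which here is assumed to be the last level. Which parameter STATISTICA actually zeroes out is not shown here.

```python
# Hypothetical levels of a categorical predictor and values for six cases.
levels = ["a", "b", "c"]
color = ["a", "c", "b", "a", "c", "b"]

# Over-parameterized coding: an intercept plus one 0/1 indicator per level.
intercept = [1] * len(color)
indicators = {lvl: [1 if c == lvl else 0 for c in color] for lvl in levels}

# Each case falls in exactly one level, so the indicators sum to the intercept.
row_sums = [sum(indicators[lvl][i] for lvl in levels) for i in range(len(color))]
print(row_sums == intercept)  # True: the columns are linearly dependent

# Zeroing one level's parameter (assumed here: the last level) drops its column.
reduced = {lvl: col for lvl, col in indicators.items() if lvl != levels[-1]}
print(sorted(reduced))  # ['a', 'b']
```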
Click the Class & odds ratio button to output the results.
A female horseshoe crab is 1.78 times as likely to have satellites present as not; that is, the odds of satellite presence are 1.78. This is the overall odds ratio (not a log odds ratio). The same statistic is available for other combinations of discrete distributions and link functions.
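The overall odds can be converted back to a proportion, because p = odds / (1 + odds). Applying this to the reported value of 1.78 implies that roughly 64% of the female crabs have one or more satellites. The helper name below is hypothetical.

```python
def prob_from_odds(o: float) -> float:
    """Convert odds o = p / (1 - p) back into the probability p."""
    return o / (1.0 + o)


overall_odds = 1.78  # overall odds reported in the example output
p = prob_from_odds(overall_odds)
print(round(p, 3))  # 0.64
```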
The odds ratios for parameters in the model are created from the Summary tab.
Click the Estimates button for the results. Two results spreadsheets are produced. The first contains model parameter estimates and significance test results. The second contains odds ratios and their confidence intervals for the parameters.
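In the output, the odds ratio for color = lightmed is 5.25. This odds ratio is interpreted as follows. For a horseshoe crab with a light medium color, the odds of having one or more satellites are 5.25 times the odds for a crab in the reference color category, with the other predictors held constant. Because this is a ratio of odds rather than of probabilities, it should not be read as "5.25 times more likely."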
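An odds ratio for a parameter is exp(b), where b is the parameter estimate. A common way to get its confidence interval is the Wald interval exp(b ± z·se), where se is the standard error of b. The sketch below assumes this Wald construction, and whether GLZ uses exactly this formula is an assumption. The function name, coefficient, and standard error are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_wald_ci(b: float, se: float, level: float = 0.95):
    """Odds ratio exp(b) with a Wald confidence interval exp(b +/- z * se)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Hypothetical coefficient and standard error for one indicator parameter.
or_, lower, upper = odds_ratio_wald_ci(b=1.2, se=0.5)
print(f"OR = {or_:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric around b on the log scale, so it is asymmetric around the odds ratio itself.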
This result is illustrated with the means plot shown below. The mean predicted probability of the presence of one or more satellites (Y=1) is plotted, grouped by the color of the horseshoe crab. On average, the light medium crabs had the highest predicted probability of satellites. Instructions for producing the graph follow.
Producing the Graph
On the Resid. 1 tab of the GLZ - Results dialog, click the Predicted values button to produce the Y - Predicted Values spreadsheet.
Append the Color variable from Crabs.sta to this spreadsheet: right-click in the last variable header (Upper CL), and select Add Variables from the shortcut menu. In the Add Variables dialog, 1 is entered by default in the How many field. Double-click in the After field, select Upper CL, and click OK. In the Name field, enter Color. Click OK in the Add Variables dialog.
Then, in the Crabs.sta data set, right-click on the Color header and select Copy from the shortcut menu. In the Y - Predicted Values spreadsheet, right-click on the new Color variable header, and select Paste from the shortcut menu. A Text Label Warning dialog will be displayed. Click the Import with Text Labels button.
Now, to create a means with error plot from this spreadsheet, first ensure that the spreadsheet is specified as the input spreadsheet. Select the Data tab and in the Mode group, select the Input check box. Then, select the Graphs tab. In the Common group, click Means to display the Means with Error Plots dialog. Click the Variables button, and select Pred. as the Dependent variable and Color as the Grouping variable.
Click OK.
In the Means with Error Plots dialog, select the Integer mode option button under Grouping intervals.
Click the OK button to produce the Mean Plot of Pred. grouped by Color.
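Outside STATISTICA, the quantity shown in this plot is simply the mean of the predicted probabilities within each color group. The sketch below computes it from hypothetical predicted values and color labels. Only lightmed appears in the example output, and the other labels are placeholders. The sketch omits the error bars that the Means with Error Plots dialog also draws.

```python
from collections import defaultdict

# Hypothetical predicted probabilities paired with each crab's color label.
pred = [0.81, 0.74, 0.62, 0.55, 0.40, 0.66]
color = ["lightmed", "lightmed", "medium", "medium", "dark", "medium"]

totals = defaultdict(float)
counts = defaultdict(int)
for p, c in zip(pred, color):
    totals[c] += p  # running sum of predicted probabilities per group
    counts[c] += 1  # number of crabs per group

group_means = {c: totals[c] / counts[c] for c in totals}

# Print groups from highest to lowest mean predicted probability.
for c, m in sorted(group_means.items(), key=lambda kv: -kv[1]):
    print(f"{c}: {m:.3f}")
```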