Written by: STATISTICA 10/27/2010 8:58 AM
Selecting variables is typically one of the first steps within an analysis dialog in STATISTICA. When the data set contains a large number of variables, certain tools in STATISTICA, including Variable bundles and Show appropriate variables only, make this selection process easier.
A variable bundle is a selection of a set of variables within a spreadsheet used to facilitate repeated selections of that same set of variables in analyses. After a variable bundle is created, the bundle can be selected for analysis instead of individually selecting each required variable. This tool helps both to speed up the variable selection process and to ensure the proper selections are made each time the bundle is used.
To illustrate variable bundles, we will use the CreditScoring.sta example data file, located in the Examples/Datasets folder of STATISTICA. This data set has both continuous and categorical predictors that are scattered throughout the spreadsheet. Creating a bundle before analysis enables the selections to be made automatically.
With the data file open, select the Data tab.
In the Variables group, click Bundles to display the Variable Bundle Manager dialog.
Click the New button to display the New Bundle dialog. The default name is Untitled, which should be changed to an appropriate bundle name. For this example, name it Continuous Predictors.
Click OK, and the Select variables for bundle… dialog is displayed. For this example, select variables 3, 6, and 14, which are continuous predictor variables.
Click OK to continue.
Repeat the process of creating a new variable bundle, and name this one Categorical Predictors. Select variables 2, 4-5, 7-13, and 15-18.
Click OK.
When the bundles have been created, they are displayed in the Variable Bundle Manager dialog. These bundles can be edited, deleted, or renamed. Clicking the Output to Spreadsheet button outputs the bundle information to a spreadsheet.
Save the spreadsheet to retain the bundle information. When selecting variables for analyses, select the appropriate bundle listed at the top of the variable list to make the automatic selection.
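As a conceptual illustration only (this is not STATISTICA code or its API), the two bundles created above can be thought of as named lists of variable numbers. The Python sketch below expands selections written in the same style as the dialog entries, such as 2, 4-5, 7-13, and 15-18. The function name parse_variable_spec and the bundles dictionary are hypothetical names introduced for this example.

```python
def parse_variable_spec(spec: str) -> list[int]:
    """Expand a spec such as '2, 4-5, 7-13' into a sorted list of variable numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as '4-5' covers both endpoints.
            start, end = (int(p) for p in part.split("-"))
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


# Hypothetical in-memory stand-in for the two bundles from the walkthrough.
bundles = {
    "Continuous Predictors": parse_variable_spec("3, 6, 14"),
    "Categorical Predictors": parse_variable_spec("2, 4-5, 7-13, 15-18"),
}

for name, variables in bundles.items():
    print(f"{name}: {variables}")
```

Selecting a bundle by name then stands in for re-entering the same variable numbers each time, which is the time saving and consistency benefit described above.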
Show Appropriate Variables Only
Analysis tools are often intended for a specific kind of data. Regression, for example, requires that the dependent (Y) variable be continuous. ANOVA requires categorical grouping factors. Select the Show appropriate variables only check box in STATISTICA variable specification dialogs to narrow the list of variables available for selection to those appropriate for the analysis. Like the bundles tool, Show appropriate variables only makes the variable selection process for any analysis faster and easier.
When the Show appropriate variables only check box is not selected, all variables are available in all selection fields. When it is selected, only continuous variables are shown in selection fields where continuous variables are appropriate, and only categorical variables are shown in fields where categorical variables are expected.
STATISTICA has a method for determining if data are continuous or categorical. By default, variables’ measurement types are automatically determined. They are considered categorical if they have text labels, or are of type text, integer or byte. This setting can be changed in the Options dialog, in the Analysis/Graphs options pane.
Individual variables can be manually assigned a measurement type such as Continuous, Categorical, or Ordinal in the Variable dialog, accessed by double-clicking on a variable name in a data file. The default for all variables is Auto.
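The logic described above can be sketched in Python as follows. This is an illustration of the stated rules under assumptions, not STATISTICA's implementation: the Variable class, the function names, the "double" storage type, and the sample variable names are all hypothetical. How Ordinal variables are treated by the filter is not stated in this article, so the sketch simply leaves them unmatched.

```python
from dataclasses import dataclass

# Storage types that the article says are treated as categorical by default.
CATEGORICAL_STORAGE_TYPES = {"text", "integer", "byte"}


@dataclass
class Variable:
    name: str
    storage_type: str                # e.g. "double" (assumed), "integer", "byte", "text"
    has_text_labels: bool = False
    measurement_type: str = "Auto"   # "Auto", "Continuous", "Categorical", or "Ordinal"


def resolve_measurement_type(var: Variable) -> str:
    """A manual assignment wins; otherwise apply the default automatic rule."""
    if var.measurement_type != "Auto":
        return var.measurement_type
    if var.has_text_labels or var.storage_type in CATEGORICAL_STORAGE_TYPES:
        return "Categorical"
    return "Continuous"


def appropriate_variables(variables, expected, show_appropriate_only=True):
    """Return the names offered in a selection field that expects `expected` data."""
    if not show_appropriate_only:
        # Check box cleared: every variable is available in every field.
        return [v.name for v in variables]
    return [v.name for v in variables if resolve_measurement_type(v) == expected]


# Hypothetical variables, not taken from CreditScoring.sta.
variables = [
    Variable("Age", "double"),
    Variable("Gender", "text"),
    Variable("Duration", "integer", measurement_type="Continuous"),  # manual override
]

print(appropriate_variables(variables, "Continuous"))
print(appropriate_variables(variables, "Categorical"))
print(appropriate_variables(variables, "Continuous", show_appropriate_only=False))
```

In this sketch, the integer variable would be categorical under the automatic rule, but the manual Continuous assignment takes precedence, mirroring the per-variable setting in the Variable dialog.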
In all STATISTICA analyses, the variable selection dialog contains the Show appropriate variables only check box. Selecting this check box will modify the variables available for selection. In the Boosted Trees, Classification analysis shown in the next image, the variable selection dialog with the Show appropriate variables only check box selected shows categorical variables for both Dependent and Categorical pred. Continuous variables are shown for Continuous pred. and Count variable.
Clearing the check box would allow any variable to be selected in any field. Note that selecting inappropriate variables will likely cause warning or error messages.