Spreadsheets Introduction
Cases, Variables, and Formats
Text Labels
Operations on Blocks of Values
Yes. Moreover, you can simultaneously run different analyses on the same or different data files. To open (and use) more than one input data file simultaneously, open multiple data files and make sure each is specified as an input data file (by selecting Input Spreadsheet from the Data menu) as you start the respective analyses.
Analyses stay associated with the different data files when you switch between them (using the buttons next to the STATISTICA Start button, above the status bar). Note that you can also perform different analyses on the same or different data files.
While you can specify any number of spreadsheets as input data files, only one file (per workbook) can be designated as the active input data file.
This means that if all your spreadsheets are opened within one workbook, only one of them can be used for input data at a time. The active input data spreadsheet is highlighted in the workbook tree with a red square, as shown above.
Yes. To designate a spreadsheet (i.e., the results from one analysis) as an input spreadsheet, select the spreadsheet (e.g., ensure the spreadsheet has the focus) and select Input spreadsheet from the Data menu. Now you can specify an analysis, and STATISTICA uses the selected spreadsheet as the input spreadsheet. Note that if you switch back to another spreadsheet that has previously been designated as an input spreadsheet, it can still be used for analyses as well.
While you can specify any number of spreadsheets as input data files, only one file (per workbook) can be designated as the active input data file. This means that if all your spreadsheets are opened within one workbook, only one of them can be used for input data at a time. The active input data spreadsheet is highlighted in the workbook tree with a red square as shown above.
Yes. Select Custom Lists in the Options dialog (accessible by selecting Options from the Tools menu) to edit or create custom lists for STATISTICA to use.
The Custom lists box reports all the custom lists currently available in STATISTICA. To see the entries included in a custom list, select one, and the entries are displayed in the List entries box. To create a new list, select NEW LIST and enter your choice of items for it in the List entries box. Once you have added all the items for the new list, click the Add button and the list is added to those in the Custom lists box. When entering data in spreadsheets, you can enter the first few items in the list, and then use extrapolation to fill in the remaining cases in the spreadsheet (as shown below).
A STATISTICA Matrix Spreadsheet (*.smx) is a spreadsheet file with a strictly defined format that can be used as input in several modules of STATISTICA (e.g., Cluster Analysis, Multidimensional Scaling, and SEPATH). Matrix spreadsheets have to meet the following conditions:
1. The number of cases (rows) = the number of variables (columns) + 4.
2. The matrix must be a square matrix and the case names should be the same as the variable names.
3. The last four cases contain the following case names and information:
Yes. In Windows Explorer, you can simply drag any STATISTICA Spreadsheet (file name extension *.sta) or graphics file (file name extension *.stg) onto the STATISTICA icon to automatically open the file in STATISTICA.
You can undo most operations such as editing, moving, or copying blocks; random fill; recoding or ranking variables; etc. by selecting Undo from the spreadsheet Edit menu or pressing CTRL+Z. Multi-level undo is supported (with 32 buffers), so you can undo up to 32 of the most recent spreadsheet operations.
The file header is located immediately above the variable headers at the top of the spreadsheet window. You can use this header to include a summary title or other identification of the data file (or results). Double-click in the header area to edit/enter information.
A customized header is used when printing spreadsheets (or other STATISTICA documents, e.g., reports, graphs, etc.). This header can include a title, page numbers, time of printing, date, the name of the file, and any custom text you want to include. To create a header, select Header/Footer from the View menu.
Note that while the file header is associated only with spreadsheets, any printed STATISTICA document can have a customized header.
A variety of graphs is available from the spreadsheet shortcut menu (accessed by right-clicking in any cell or selected block of the spreadsheet). These graphs belong to two broad categories, Graphs of Block Data and Graphs of Input Data, and they produce either summary graphs from rows or columns of the currently highlighted block, or graphs as the original input data identified by the currently selected row and/or column of the spreadsheet (respectively).
You can insert either a new or an existing object into a spreadsheet. To do this, select Object from the Insert menu to display the Insert Object dialog.
To insert a new object, select the Create New option button, select the type of object to be created from the Object Type box, and click the OK button.
To insert an existing object into the spreadsheet, select the Create from File option button in the Insert Object dialog. Next, click the Browse button to display the Browse dialog, and select the file to be displayed in the spreadsheet.
Then, click the OK button.
Double-click on the object to edit it. It will be opened in the application in which it was created.
STATISTICA data files are organized into cases and variables. If you are not familiar with this notation, you can think of cases as the equivalent of records in a database management program (or rows of a spreadsheet) and variables as the equivalent of fields (columns of a spreadsheet). Each case consists of a set of values of variables.
The difference between text and numeric types is straightforward; however, the main reason for having three types of numeric values is the storage efficiency. For most data files, that is not important and, thus, using the default (Double) data type is recommended. However, for very large data files, being able to switch to a 2 (or even 8) times more efficient storage (by using different data types) could make a difference between being able to perform the necessary analysis on a specific computer system.
The easiest way to add variables to an existing data file is to double-click on the gray area of the spreadsheet. This will display the Add Cases and/or Variables dialog.
Specify the number of variables to add and click the OK button. The new variables are added to the far right of the data file, and the variable specification dialog is displayed where you can specify various options for the new variables, such as Name, Type, Display format, etc.
You can also select Add Variables from the Insert menu to display the Add Variables dialog, where you can specify the Number of variables to add, Type, etc.
Also, you can select the appropriate command (e.g., Add, Move, Copy, Delete, etc.) from the variables menu, accessed by clicking the Vars button on the spreadsheet toolbar.
Select Add Variables from the spreadsheet Insert menu to display the Add Variables dialog. In the After box, enter 0, which references the variable number. If the variable number is 0, it effectively means that you are inserting before the first variable.
After you click the OK button, a new variable is inserted before the first variable of your spreadsheet.
Note that the name of the new variable is the name specified in the Name box of the Add Variables dialog (in this example, NewVar).
The easiest way to add cases to an existing data file is to double-click on the gray area of the spreadsheet. This displays the Add Cases and/or Variables dialog.
Specify the number of cases you want to add and click the OK button. The new cases are added to the bottom of the data file.
You can also select Add Cases from the Insert menu to display the Add Cases dialog, where you can specify the Number of cases to add and where to insert them.
Also, you can select the appropriate command from Cases menu, accessed by clicking the Cases button on the spreadsheet toolbar.
Select Add Cases from the spreadsheet Insert menu. In the Insert after case box of the Add Cases dialog, enter 0, which references the case number. If the case number is 0, it effectively means that you are inserting before the first case.
After you click the OK button, a new case is inserted before the first case of your spreadsheet.
Note that the name of the new case is 1. To change this, double-click on the 1 and enter a new name.
Following standard spreadsheet conventions, click in the upper-left corner of the spreadsheet (in the Info Box). This shortcut is useful, for example, when you intend to copy the entire file to the Clipboard or reformat the entire spreadsheet. You can also press CTRL+A or select Select All from the spreadsheet Edit menu or shortcut menu (accessed by right-clicking in the Info Box).
Note that by clicking twice in the Info Box, you deselect (i.e., select and then deselect) the entire spreadsheet.
Double-click on the cell before you start typing. This action enters the edit mode and positions the cursor within the cell.
Alternatively, you can follow the "old" spreadsheet convention and press the F2 key, which also enters the edit mode for the currently highlighted cell.
Date values of variables are stored internally in Julian format, that is, as a single integer value that represents the number of days that have passed since January 1, 1900. For example, a date entered and displayed as 1/21/1968 is stored as the Julian date 24858; the (optional) decimals are interpreted as time. Date values stored in this manner can be used in subsequent analyses (e.g., in Survival Analysis in order to calculate survival times, see below) and transformed using arithmetic operations; at the same time, they can be displayed as dates in reports or graphs (e.g., used to label scale values).
Julian date values can be displayed in the spreadsheet in numeric (Julian) format or in one of several predefined date display formats (e.g., 1/6/64, 6-Jan-64, Jan-1964, 01/06/64, 01/06/1964, 6-Jan).
Select Date in the Display format group of the Variable specifications dialog (accessible by double-clicking on the variable name in the spreadsheet or by selecting Variable Specs from the Data menu) and choose one of the predefined display formats.
Alternatively, you can create a single date variable from two (month, year or day) or three (day, month, year) variables as well as split a single date variable into two or three variables via the Date Operations dialog (accessed by selecting Date Operations from the Data menu).
Time values of variables are stored as (optional) decimal values representing the fraction of the day since midnight; for example, 6:00 a.m. is stored as 0.25. Time values stored in this manner can be used in subsequent analyses and transformed using arithmetic operations; at the same time, they can be displayed as times in reports or graphs (e.g., used to label scale values).
STATISTICA formats the display of time values according to the current settings of the Windows Control Panel.
In some circumstances, it can be useful to create text labels with date information [for example, when using a date variable as a coding variable with codes greater than 32,000. In this case, you can transform the date variable into a variable containing date text labels with numeric equivalents in a range that allows them to be used as codes by all procedures of STATISTICA (i.e., numeric values less than 32,000)]. Use the options on the Text Dates tab of the Date Operations dialog (accessed by selecting Date Operations from the Data menu) to convert dates into codes.
To review specifications for a single variable, double-click on a variable name in a spreadsheet to display the Variable specifications dialog ,where you can change the variable name, font format, display format, missing data value, long label, or formula for the current variable.
In this dialog, click the Values/Stats button to access descriptive statistics and a listing of sorted numeric values and text labels for the current variable. Click the Text Labels button to display the Text Labels Editor, where you can review and change the assignments between the text and numeric values for the current variable.
To review specifications for all variables, click the All Specs button to display the Variable Specifications Editor, an editable, combined table of specifications of all variables in the current data file. The table format is convenient when you need to compare or edit specifications of several variables, especially when you need to paste and copy between variables. You can also access this dialog by selecting All Variable Specs from the Data menu
A missing data code is a value that signifies that there is no data for a particular case and variable (displayed as a blank cell in the spreadsheet). A missing data code can be specified separately for each variable.
To change the code, double-click on a variable name in the spreadsheet to access the Variable specifications dialog for that variable. Alternatively, click the Vars toolbar button and select All Specs from the resulting menu to display the Variable Specifications Editor dialog for all variables. The default Missing Data code in STATISTICA (used when new files are created, new variables are added, or data are imported) depends on the variable type being used.
Moreover, in each analysis, you can individually adjust the way in which missing data are handled when processing data c (see the MD deletion group in most analysis definition dialogs). Whenever applicable, you have the choice to eliminate them from calculations in a casewise or pairwise manner, substitute them with means, or reconstruct or interpolate them.
The value used to designate missing data values for individual variables can be changed in the Variable specifications dialog, accessible by double-clicking on a variable header.
To edit a current case name, double-click on the case name in the spreadsheet. To manage (e.g., change the width, copy from a variable, etc.) case names, right-click on any case name and select Case Names Manager from the Case Name Management submenu to display the Case Names Manager dialog. The Case Names Manager command is also available from the Cases toolbar button and from the Data menu.
If the current data file contains no case names, you can create them by double-clicking on the case number.
A format is a group of formatting options that you can save with an assigned name for future use. Formats can be assigned to numbers as well as text. To create a new format in a spreadsheet, follow these instructions:
Select Format Manager from the Format - Block submenu to display the Format Manager dialog.
Click the New button to display the Format Name dialog.
Enter a name in the Name box, and then click the OK button to display the Edit Format dialog.
This dialog contains four tabs (Number, Alignment, Font, and Border) that are used to specify the formatting components for the new format. Select the desired formatting options for your new format. In the illustration above, the Bold check box has been selected in the Style/Effect group box and the Text Color box has been changed to navy.
Click the OK button to close the Edit Format dialog. The format that you just created has been added to the list of formats available in the Format Manager dialog and to the Format - Block submenu.
You can now apply this format to new selections (see the next section).
Formats can be applied quickly and consistently to segments of data. Once a format has been created, it can be applied to any part (or all) of the data in a spreadsheet.
To apply an existing format, first select the cell(s) that you want to format in the spreadsheet. Then select Block from the Format menu to display the list of available formats. (Alternatively, right-click on the selected cell(s) and choose Format from the shortcut menu, as displayed below.) Finally, select the name of the format to apply to the selected cells.
A spreadsheet layout is a convenient way to combine sets of formatting options into one collection. This feature can be used to specify and apply different formats for different components of a spreadsheet. Once you have created a spreadsheet layout, it can be applied to any spreadsheet for a consistently formatted appearance. Follow these instructions to create a new spreadsheet layout:
Select Layout Manager from the Format - Spreadsheet submenu to display the Spreadsheet Layouts dialog.
Click the New button to display the Spreadsheet Layout Name dialog.
Enter a name in the Name box, and then click the OK button to display the Edit Spreadsheet Layout dialog.
In the Formats group box, a default Format 1 will be displayed. Select the area of the spreadsheet to be affected by Format 1 in the Areas of Spreadsheet affected by format group box. In the above illustration, only the Variable Header check box is selected. Note that you can control the range of variable headers to be affected by Format 1 via the options in the Range of variables group box.
Next, click the Edit button in the Format group box to display the Edit Format: Format 1 dialog.
Select any formatting options to be associated with Format 1. Above, Comic Sans MS has been chosen as the Font, 10 for the Size, white for the Text Color, and blue for the Background color. Note that the Preview box is used to view your specified format.
Click the OK button to accept the settings and return to the Edit Spreadsheet Layout dialog.
To define additional formats for other areas of the spreadsheet, click the New button to display the New Format: Format 2 dialog, and select any formatting options to be associated with Format 2. In this manner, you can add additional formats to the spreadsheet layout as necessary. In the illustration above, Format 2 (Font - Comic Sans MS, Size - 10, Text Color - white, and Background color - blue) has been applied to the Case Headers.
When you are satisfied with the appearance of your spreadsheet layout, which is displayed in the Preview box, click the OK button. The spreadsheet layout that you just created is added to the list of formats available in the Spreadsheet Layouts dialog and on the Format - Spreadsheet submenu.
You can now apply this spreadsheet layout to new selections (see the next section).
You can apply any existing spreadsheet layout to the active spreadsheet. To do this, select Spreadsheet from the Format menu to display the list of available spreadsheet layouts. Then, select the name of the spreadsheet layout to apply to the spreadsheet.
Case states are collections of additional case information that can be assigned to individual cases in a spreadsheet.
You can assign case states in order to customize the appearance of points in graphical displays making it easy to identify influential and interesting points. A wide selection of symbols and colors is available to customize the appearance of selected points. Case states can be assigned in the spreadsheet before a graph is created, as well as interactively in the graph via the Brushing tool. Graphs and spreadsheets are fully linked, so changes made to case states in the graph will propagate back to the spreadsheet and vice versa.
Select a case in the spreadsheet. Next, select Edit Case States from the Data - Cases - Case States submenu, or right-click on the case and select Case States - Edit Case States from the shortcut menu, to display the Case State dialog, which contains options that display palettes of colors and symbols from which to select. Once you have made your selection, click OK.
Double-click on the variable header to display the Variable specifications dialog, where you can select the measurement type of the variable from a drop down list. Measurement types will map directly to subsequent analyses and graphs, identifying appropriate variables in each case (e.g., variables of type Categorical will be present within the list of categorical predictors available in a Factorial ANOVA). Note that you can also define a variable as Unspecified, in which case no distinction will be made about the measurement type of the variable. You can also define a variable as Auto, in which case STATISTICA will automatically determine the measurement type of the variable based on the values of the variable.
Yes. Double-click on the desired variable header to display the Variable specifications dialog, and select the Label check box. The values of the selected variable will be used as point labels in the scatterplot.
First, create a new variable. Double-click on the new variable heading to display the Variable specifications dialog. Specify the Type as Integer, specify the variable as a case state variable by selecting the Case State check box, and click OK. Then, select Copy to Variable from the Data - Cases - Case States submenu to display the Select a variable dialog. Select the appropriate variable and click OK. The case states will now be copied directly into the new variable.
In the Variable specifications dialog, click the Properties button to display the Variable Properties dialog. In this dialog, click the New Property button to display the New Property dialog, where you can name the new property, and click the OK button to close the dialog and return to the Variable Properties dialog. Specify the value of the new property in the Value field. Click the Remove button to remove the selected property from the variable.
Yes. On the Variable specifications dialog, select the Excluded check box to exclude the variable from analyses/graphs.
Yes. STATISTICA will (optionally) limit the availability of variables based upon the type of each variable and the specified analysis. A Show appropriate variables only check box is located in the lower-left corner of variable selection dialogs that enables this functionality. If the check box is cleared, all the variables will be displayed.
Yes, in STATISTICA, each numeric value can have two identities: the actual numeric value (e.g., 1) and a text label (e.g., Male) assigned to it. Note that in addition to text labels for numeric values, STATISTICA also supports text variables.
Normally, the assignments between text labels and numeric values are handled automatically as you enter or edit data in the spreadsheet. However, in some circumstances, you may want to review, edit, or restructure all assignments of values for a particular variable or copy text/numeric assignments from one variable to another. These operations can be performed in the Text Labels Editor dialog, accessible by selecting Text Labels from the Vars toolbar button menu or by clicking the Text Labels button in the Variable specifications dialog for a particular variable.
Use the options in the Text Labels Editor dialog to sort the assignments by text labels or numeric values, perform the Fill Down operation (to automatically reassign numeric values to text labels), Copy numeric/text assignments from other variables or from case names, and perform other operations.
Each value of a particular variable can simultaneously have a numeric and text identity. You can toggle the display of text labels or numeric values by selecting Display Text Labels from the spreadsheet View menu.
Yes, you can. Click the Copy From button in the Text Labels Editor (see above) to copy to the current variable the text labels, numerical values, and descriptions from other variables or from the case names.
When you need to copy the text labels, numerical values, and descriptions from one file to another, select Merge from the Data menu to display the Merge Options dialog. On the Text Labels tab, you can concatenate or replace text labels with labels from another file.
Text labels. When you copy or move a block in the spreadsheet, the values copied to the Clipboard depend on the display mode of the spreadsheet. If the spreadsheet displays numeric values when the block is copied, only those numeric values are copied to the Clipboard. If the spreadsheet displays text labels when the block is copied, not only are the text labels copied to the Clipboard, but also the corresponding numeric values.
In the illustration above, text labels are assigned to numeric values that did not previously have text label equivalents.
Display formats. When you copy or move a block in a spreadsheet, the display format for that block is also copied. This means that the display format of the copied or moved block overwrites the display format for the block into which it is copied. For example, if you copy a block of currency-formatted values and paste them into a variable that is formatted as percentage, the block remains formatted as currency even though the rest of the values in that variable are formatted as percentage.
In the example above, the four currency values copied from Cost are still formatted as currency even though they were pasted into the variable Percent.
STATISTICA Spreadsheets support mouse pointer-controlled column width, variable-speed scrolling, split scrolling, and block extension operations.
They also support standard (Excel-style) drag-and-drop operations such as copy, move, insert, autofill, and clear.
There are three types of operations that you can use to rearrange blocks of data or ranges of cases and variables in a data file. They each operate differently and can produce different effects.
This example shows how the selected portion of the spreadsheet will be copied to the Clipboard when using Copy with Headers or Copy (both available from the Edit menu).
Yes, they do. Spreadsheets can be split into up to four sections (panes) by dragging the split box (the small rectangular button above the up arrow at the top of the vertical scrollbar or to the left of the left arrow on the horizontal scrollbar). This is useful if you have a large amount of information and you want to review results from different parts of the spreadsheet. When you point to the split box, the pointer changes to Vertical Resize (or Horizontal). To position the split, drag to the desired position.
You can change the position of the split by dragging the split box (now located between panes) to a new position. Note that vertically split panes scroll together when you scroll vertically; horizontally split panes scroll together when you scroll horizontally.