STATISTICA






StatSoft Home Page

SPREADSHEETS

General Introduction

Can I Open More Than One Input Datafile Simultaneously?

Yes. Moreover, you can simultaneously run different analyses on the same or different datafiles. To open (and use) more than one input datafile simultaneously, open multiple datafiles and make sure each is specified as an input datafile (by selecting Input Spreadsheet on the Data menu) as you start the respective analyses.

For example, open the file Adstudy.sta and select Basic Statistics/Tables from the Statistics menu. When the Basic Statistics and Tables (Startup Panel) is displayed, double-click Descriptive Statistics on the Quick tab and calculate summary statistics for the variable Measure01.

Next, open the file Aggressn.sta and once again select Basic Statistics/Tables from the Statistics menu. This time, double-click t-test, independent, by groups and perform an independent t-test with Gender as the grouping variable and Aggressn as the dependent variable.

You will notice that the analyses stay associated with the different datafiles when you switch between them (using the buttons next to the STATISTICA Start button, above the status bar). Note that you can also perform different analyses on the same or different datafiles. For example, you could also perform analysis of variance on Adstudy.sta or on some other datafile.

While you can specify any number of spreadsheets as input datafiles, only one file (per workbook) can be designated as the active input datafile.



This means that if all your spreadsheets are opened within one workbook, only one of them can be used for input data at a time. The active input data spreadsheet is highlighted in the workbook tree with a red square, as shown above.

Can I Use the Results of One Analysis to Perform Another Analysis?

Yes. To designate a spreadsheet (i.e., the results from one analysis) as an input spreadsheet, select the spreadsheet (e.g., ensure the spreadsheet has the focus) and select Input spreadsheet from the Data menu. Now you can specify an analysis, and STATISTICA uses the selected spreadsheet as the input spreadsheet. Note that if you switch back to another spreadsheet that has previously been designated as an input spreadsheet, it can still be used for analyses as well.

While you can specify any number of spreadsheets as input datafiles, only one file (per workbook) can be designated as the active input datafile. This means that if all your spreadsheets are opened within one workbook, only one of them can be used for input data at a time. The active input data spreadsheet is highlighted in the workbook tree with a red square as shown in Can I Open More Than One Datafile Simultaneously? (above).

Can I Create a Custom List to Use for Extrapolation?

Yes. Select the Custom Lists tab of the Options dialog (accessible via Tools - Options) to edit or create custom lists for STATISTICA to use.



The Custom lists box reports all the custom lists currently available in STATISTICA. To see the entries included in a custom list, select it here and the entries are shown in the List entries box. To create a new list, select NEW LIST and enter the list items for it in the List entries box. Once you have added all the items for the new list, click the Add button and the list is added to those in the Custom lists box (see Mon., Wed., Fri. in the above illustration). When entering data in spreadsheets, you can enter the first few items in the list, and then use extrapolation to fill in the remaining cases in the spreadsheet (as shown below).

 

What Is a Matrix Spreadsheet?

A STATISTICA Matrix Spreadsheet (*.smx) is a spreadsheet file with a strictly defined format that can be used as input in several modules of STATISTICA (e.g., Cluster Analysis, Multidimensional Scaling, SEPATH). Matrix spreadsheets have to meet the following conditions:

1. The number of cases (rows) = the number of variables (columns) + 4.
2. The matrix must be a square matrix and the case names should be the same as the variable names.
3. The last four cases contain the following case names and information:

Means: The mean of each variable is given in this row; this case can be left empty (i.e., do not enter anything in this row) for Similarities and Dissimilarities matrices.
Std.Dev.: The standard deviation of each variable is given in this row; this case can be left empty (i.e., do not enter anything in this row) for Similarities and Dissimilarities matrices.
No.Cases: This required entry is the number of cases from which the matrix was produced, not the number of cases (rows of data) in this matrix file.
Matrix: This required entry represents the type of matrix file; 1 = Correlation, 2 = Similarities, 3 = Dissimilarities, and 4 = Covariance.

When entering these last four cases into the matrix file manually, be sure to spell the case names exactly as they appear above (i.e., Means, Std.Dev., No.Cases, and Matrix).

While a matrix spreadsheet looks identical to a regular spreadsheet, analyses that require matrix input (such as Multidimensional Scaling) only consider input spreadsheets that are also matrix spreadsheets.

You can create a matrix spreadsheet from any place that allows you to create a matrix of results [e.g., Basic Statistics - Correlations, Multiple Regression, Cluster Analysis, Multidimensional Scaling, and Structural Equation Modeling (SEPATH)]. To create the matrix, click the Matrix button. Note that clicking this button does not cause STATISTICA to save the matrix file to disk, rather it causes STATISTICA to create the special matrix spreadsheet. This matrix spreadsheet is always displayed in an individual window (regardless of the settings on the Options - Output Manager tab or the Analysis/Graph Output Manager dialog). You can, however, add the spreadsheet to a workbook or report using the Add to Workbook or Add to Report button, respectively.

When you save a matrix spreadsheet (using Save or Save As from the File menu), STATISTICA detects that it is in this special matrix form and sets the default file extension to *.smx.

Does STATISTICA Support Drag-and-Drop Opening of Spreadsheets and Graphs?

Yes. In Windows Explorer, you can simply drag any STATISTICA Spreadsheet (file name extension *.sta) or graphics file (file name extension *.stg) onto the STATISTICA icon to automatically open the file in STATISTICA.

How Much Disk Space Is Necessary to Perform Database Management Operations?

In order to allow you to revert to the original datafile after file editing, STATISTICA creates temporary and backup files. Thus, in order to edit a datafile, STATISTICA needs at least twice as much free disk space as the size of the file to be edited. More disk space is necessary if you perform large-scale editing operations (e.g., changing large blocks) on very large datafiles. To allow for the Undo function, STATISTICA needs to store multiple copies (up to 32) of the modified sections of data. Some operations (e.g., import/export) use intermediate scratch files, so they may need additional space.

Can I Undo Editing Operations?

You can undo most operations such as editing, moving, or copying blocks; random fill; recoding or ranking variables; etc. by selecting Undo from the spreadsheet Edit menu or pressing CTRL+Z. Multi-level undo is supported (with 32 buffers), so you can undo up to 32 of the most recent spreadsheet operations.

What Is the Difference between the File Header and a Customized Header?

File Header. The file header is located immediately above the variable headers at the top of the spreadsheet window. You can use this header to include a summary title or other identification of the datafile (or results). Double-click in the header area (or press F2) to edit/enter information.

To select the header only (for formatting), click once in the upper-left corner (the mouse pointer will be an outlined plus sign). Press CTRL+ENTER or ALT+ENTER to enter a new line (note that you need to extend the height of the field to see new lines that you are adding).

In addition to adding new lines to the file header, STATISTICA Spreadsheets (and all other STATISTICA documents) can easily be annotated by entering notes into the Comments area of the Document Properties (on the Summary tab) dialog, which is accessed by selecting Properties from the File menu. The Comments area can hold any type of comments or supplementary information.

Customized header. A customized header is used when printing spreadsheets (or other STATISTICA documents, e.g., reports, graphs, etc.). This header can include a title, page numbers, time of printing, date, the name of the file, and any custom text you might want to include. To create a header, select Header/Footer from the View menu. For more information, see the Modify Header/Footer dialog in the Electronic Manual. Note that while the file header is associated only with spreadsheets, any printed STATISTICA document can have a customized header.

What Graphs Are Available from Spreadsheets?

A variety of graphs is available from the spreadsheet shortcut menu (accessed by right-clicking in any cell or selected block of the spreadsheet). These graphs belong to two broad categories, Graphs of Block Data and Graphs of Input Data, and they produce either summary graphs from rows or columns of the currently highlighted block, or graphs as the original input data identified by the currently selected row and/or column of the spreadsheet (respectively).

How Do I Insert an Object into the Spreadsheet?

You can insert either a new or an existing object into the spreadsheet. To do this, select Object from the Insert menu to display the Insert Object dialog. To insert a new object, select the Create New option button, select the type of object to be created from the Object Type box, and click the OK button. In the following example, a Bitmap Image will be inserted into the spreadsheet.



Note that STATISTICA will now emulate Microsoft Paint, and you can edit the new bitmap image using Paint utilities (as shown below).



To return to the standard STATISTICA user interface, click a cell in the spreadsheet. (Note that if the Display As Icon check box is selected on the Insert Object dialog, then the object is displayed as an icon in the spreadsheet, and when you double-click the icon, the application in which it was created will be opened.)

To insert an existing object into the spreadsheet, select the Create from File option button on the Insert Object dialog. Next, click the Browse button to display the Browse dialog and select the file to be displayed in the spreadsheet. You can also select the Display As Icon check box to display the selected file as the icon that is pictured below the Display As Icon check box. Click the Change Icon button to specify a new icon. To link the object to the original file (so that changes in that file will automatically be reflected in the object that is displayed in STATISTICA), select the Link check box.



Finally, click the OK button.



Double-click on the object to edit it. It will be opened in the application in which it was created (unless both the Link and the Display As Icon check boxes are cleared, in that case, STATISTICA will emulate the application in which the object was created, as shown in the insertion of the Bitmap Image above).

Cases, Variables, Formats

What Are Cases and Variables?

STATISTICA datafiles are organized into cases and variables. If you are not familiar with this notation, you can think of cases as the equivalent of records in a database management program (or rows of a spreadsheet) and variables as the equivalent of fields (columns of a spreadsheet). Each case consists of a set of values of variables.

Why Do You Need Different Variable Types?

The difference between the text and the numeric types is straightforward; however, the main reason for having three types of numeric values is the storage efficiency. For most datafiles, that is not important and, thus, using the default (Double) data type is recommended. However, for very large datafiles, being able to switch to a 2 (or even 8) times more efficient storage (by using different data types) could make a difference between being able to perform the necessary analysis on a specific computer system or not.

How Do I Add/Delete Variables (Columns of Data)?

Perhaps the easiest way to add variables to an existing datafile is to expand the spreadsheet beyond the existing columns and double-click on the gray area of the spreadsheet. This will display the Add Cases and/or Variables dialog.



Specify the number of variables to add and click the OK button. The new variables are added to the far right of the datafile.

Variables can also be added to and/or deleted from a datafile either by (1) using the drag-and-drop operation (insert to add columns), or (2) selecting the appropriate option from the spreadsheet menu of global operations on variables, accessed by clicking the Vars button on the Spreadsheet toolbar.

How to add variables before variable 1. The quickest way to add variables before variable 1 is to select Add Variables from the spreadsheet Insert menu. In the After box of the Add Variables dialog, enter 0, which references the variable number. If the variable number is 0, it effectively means that you are inserting before the first variable.



Once you click the OK button, a new variable is inserted before the first variable of your spreadsheet.



Note that the name of the new variable is the name specified in the Name box of the Add Variables dialog (in this example, NewVar).

How Do I Add/Delete Cases (Rows of Data)?

Perhaps the easiest way to add cases to an existing datafile is to double-click on the gray area of the spreadsheet. This displays the Add Cases and/or Variables dialog.



Specify the number of cases you want to add and click the OK button. The new cases are added to the bottom of the datafile.

Cases can also be added to and/or deleted from a datafile either by (1) using the drag-and-drop operation (insert to add cases), or (2) selecting the appropriate option from the spreadsheet menu of global operations on cases, accessed by clicking the Cases button on the Spreadsheet toolbar.

How to add cases before case 1. The quickest way to add cases before case 1 is to select Add Cases from the spreadsheet Insert menu. In the Insert after case box of the Add Cases dialog, enter 0, which references the case number. If the case number is 0, it effectively means that you are inserting before the first case.



After you click the OK button, a new case is inserted before the first case of your spreadsheet.



Note that the name of the new case is 1. To change this, double-click on the 1 and enter a new name.

Can I Select (Highlight) the Entire Spreadsheet?

Following standard spreadsheet conventions, click in the upper-left corner of the spreadsheet (in the Info Box). This shortcut is useful, for example, when you intend to copy the entire file to the Clipboard or reformat the entire spreadsheet. You can also press CTRL+A or select Select All from the spreadsheet Edit menu or shortcut menu (accessed by right-clicking in the Info Box).



Note that by clicking twice in the Info Box, you deselect (i.e., select and then deselect) the entire spreadsheet.

Can I Edit the Contents of a Cell (and Not Overwrite It)?

Follow the standard spreadsheet (e.g., Microsoft Excel) conventions, specifically:Double-click on the cell. To avoid clearing the contents of the cell at the point when you start entering a correction, double-click on the cell before you start typing. This action enters the edit mode and positions the cursor within the cell.

Press F2. Alternatively, you can follow the "old" spreadsheet convention and press the F2 key, which also enters the edit mode for the currently highlighted cell.

How Are Dates Represented in STATISTICA Datafiles?

Date values of variables are stored internally in Julian format, that is, as a single integer value that represents the number of days that have passed since January 1, 1900. For example, a date entered and displayed as 1/21/1968 is stored as the Julian date 24858; the (optional) decimals are interpreted as time. Date values stored in this manner can be used in subsequent analyses (e.g., in Survival Analysis in order to calculate survival times, see below) and transformed using arithmetic operations; at the same time, they can be displayed as dates in reports or graphs (e.g., used to label scale values).

Julian date values can be displayed in the spreadsheet in numeric (Julian) format or in one of several predefined date display formats (e.g., 1/6/64, 6-Jan-64, Jan-1964, 01/06/64, 01/06/1964, 6-Jan).



To change the date display format, select Date in the Display format group of the Variable specifications dialog (accessible by double-clicking on the variable name in the spreadsheet or selecting Variable Specs from the Tools menu) and choose one of the predefined display formats. When you enter data into a new variable using a format that is recognized as valid in the Display format group of the Variable specifications dialog (e.g., Time, Date, Currency, etc.), STATISTICA displays the Auto Format Cells dialog. In this dialog, you can either apply the format to the individual cell or to the whole column.



You can create a single date variable from two (month, year or day, month) or three (day, month, year) variables as well as split a single date variable into two or three variables via the Date Operations dialog (accessed from the Data - Date Operations menu).



Refer to the on-line Electronic Manual (press the F1 key or click the Help button) for examples of creating date variables from numeric variables and splitting date variables into numeric variables.

How Is Time Represented in STATISTICA Datafiles?

Time values of variables are stored as (optional) decimal values representing the fraction of the day since midnight; for example, 6:00 A.M. is stored as 0.25. Time values stored in this manner can be used in subsequent analyses and transformed using arithmetic operations; at the same time, they can be displayed as times in reports or graphs (e.g., used to label scale values).



STATISTICA formats the display of time values according to the current settings of the Windows Control Panel.



See the previous topic (How Are Dates Represented in STATISTICA Datafiles) for a discussion of the Auto Format Cells and the Date Operations dialogs. Refer to the on-line Electronic Manual (press the F1 key or click the Help button in the Date Operations dialog) for examples of creating time variables from numeric variables and splitting time variables into numeric variables.

How Do I Convert Date-Values into Text Labels and Vice-Versa?

In some circumstances, it can be useful to create text labels with date information [e.g., when using a date variable as a coding variable with codes greater than 32,000 (e.g., As Values of Grouping Variables)? above]. In this case, you can transform the date variable into a variable containing date text labels with numeric equivalents in a range that allows them to be used as codes by all procedures of STATISTICA (i.e., numeric values less than 32,000). Use the Dates Operations - Text Dates tab to convert dates into codes.



This dialog is accessed by selecting Date Operations from the Data menu.

How Do I Review and Edit Variable Specifications (Names, Formats, Notes, Formulas, etc.)?

Specifications of a single variable. Double-click on a variable name in a spreadsheet to display the Variable specifications dialog where you can change the variable name, font format, display format, missing data value, long label, or formula for the current variable. (As in most other facilities commonly used in STATISTICA, this dialog can also be accessed from the toolbar, shortcut menus, and menus.)



This dialog can also be used to access descriptive statistics and a listing of sorted numeric values and text labels for the current variable (click the Values/Stats button). From this dialog, you can also access the Text Labels Editor (click the Text Labels button) to review and change the assignments between the text and numeric values for the current variable. Note that you can click the Scroll left or Scroll right button to scroll through each variable in your datafile.

Specifications of all variables. You can also click the All Specs button in the Variable specifications dialog to display an editable, combined table of specifications of all variables in the current datafile. Note that selecting All Variable Specs from the Data menu or All Specs from the Vars button on the Spreadsheet toolbar can also access this dialog.

The table format is convenient when you need to compare or edit specifications of several variables, especially when you need to paste and copy between variables (e.g., comments, formulas, or links), or extend a format definition or missing data code from one variable to subsequent variables.



The shortcut menu contains commands to Add Vars, Delete Vars, Cut, Copy, Paste, and Fill/Copy Down.

How Are Missing Data Handled in STATISTICA?

Specifying missing data codes. A missing data code, that is, a value that signifies that there is no data for a particular case and variable (displayed as a blank cell in the spreadsheet), can be specified separately for each variable. To change the code, double-click on a variable name in the spreadsheet to access the Variable specifications dialog for that variable. Alternatively, click the Vars toolbar button and select All Specs from the resulting menu to display the Variable Specifications Editor dialog for all variables. The default Missing Data code in STATISTICA (used when new files are created, new variables are added, or data are imported) depends on the variable type being used.

Processing cases with missing data. The way in which missing data are handled when processing data can be adjusted individually for each analysis (see the MD deletion group in most analysis definition dialogs). Whenever applicable, you have the choice to eliminate them from calculations in a casewise or pairwise manner, substitute them with means, or reconstruct or interpolate them (e.g., in the Time Series module).

Press the F1 key or click the help button in the respective analysis-definition dialog to learn about the specific missing data handling options available for the procedures of interest.

How Do I Change the Missing Data Code for Individual Variables?

The value used to designate missing data values for individual variables can be changed in all Variable specifications dialogs.

How Do I Edit Case Names?

In order to enter or edit the current case names, double-click on any case name in the spreadsheet. To manage (e.g., change the width, copy from a variable, etc.) case names, right-click on any case name and select Case Names Manager from the resulting shortcut menu to display the Case Names Manager dialog. The Case Names Manager command is also available from the Cases toolbar button and the Tools menu.



If the current datafile contains no case names, you can create them by double-clicking on the case number column or resizing the case number column with the mouse pointer.

How Do I Ceate a New Cell/Block Format?

A format is a group of formatting options that you can save with an assigned name for future use. Formats can be assigned to numbers as well as text. To create a new format in a spreadsheet, follow these instructions:

Select Format Manager from the Format - Block menu to display the Format Manager dialog.



In this dialog, click the New button to display the Format Name dialog.



Enter a name in the Name box, and then click the OK button to display the Edit Format dialog.



This dialog contains four tabs (Number, Alignment, Font, and Border) that are used to specify the formatting components for the new format. Select the desired formatting options for your new format. In the illustration above, the Bold check box has been selected in the Style/Effect group and the Text Color box has been changed to navy.

Click the OK button to close the Edit Format dialog. The format that you just created is added to the list of formats available on the Format Manager dialog and the Format - Block menu.



You can now apply this format to new selections (see the next section).

How Do I Apply a Format?

Formats can be applied quickly and consistently to segments of data. Once a format has been created, it can be applied to any part (or all) of the data in a spreadsheet.

To apply an existing format, first select the cell(s) that you want to format in the spreadsheet. Then select Block from the Format menu to display the list of available formats. (Optionally, right-click on the selected cell(s) and choose Format from the shortcut menu, as displayed below.) Finally, select the name of the format to apply to the selected cells.

How Do I Create a New Spreadsheet Layout?

A spreadsheet layout is a convenient way to combine sets of formatting options into one collection. This feature can be used to specify and apply different formats for different components of a spreadsheet. Once you have created a spreadsheet layout, it can be applied to any spreadsheet for a consistently formatted appearance. Follow these instructions to create a new spreadsheet layout:

Select Layout Manager from the Format - Spreadsheet menu to display the Spreadsheet Layouts dialog.



In this dialog, click the New button to display the Spreadsheet Layout Name dialog.



Enter a name in the Name box, and then click the OK button to display the Edit Spreadsheet Layout dialog.



In the Formats group, a default Format 1 will be displayed. Select the area of the spreadsheet to be affected by Format 1 in the Areas of Spreadsheet affected by format group. In the above illustration, only the Variable Header check box is selected. Note that you can control the range of variable headers to be affected by Format 1 by using the Range of variables group.

Next, click the Edit button in the Format group to display the Edit Format: Format 1 dialog.



Select any formatting options to be associated with Format 1. Above, Comic Sans MS has been chosen as the Font, 10 for the Size, white for the Text Color, and blue for the Background color. Note that the Preview group is used to view your specified format.

Click the OK button to accept the settings and return to the Edit Spreadsheet Layout dialog.



To define additional formats for other areas of the spreadsheet, click the New button to display the New Format: Format 2 dialog and select any formatting options to be associated with Format 2. In this manner, you can add additional formats to the spreadsheet layout as necessary. In the illustration above, Format 2 (Font - Comic Sans MS, Size - 10, Text Color - white, and Background color - blue) has been applied to the Case Headers.

When you are satisfied with the appearance of your spreadsheet layout, which is displayed in the Preview group, click the OK button. The spreadsheet layout that you just created is added to the list of formats available on the Spreadsheet Layouts dialog and the Format - Spreadsheet command.



You can now apply this spreadsheet layout to new selections (see the next section).

How Do I Apply a Spreadsheet Layout?

You can apply any existing spreadsheet layout to the active spreadsheet. To do this, select Spreadsheet from the Format menu to display the list of available spreadsheet layouts. Then, select the name of the spreadsheet layout to apply to the spreadsheet.

Text Labels

Can I Label Numeric Values?

Yes, in STATISTICA, each numeric value can have two identities: the actual numeric value (e.g., 1) and a text label (e.g., Male) assigned to it.

Note that in addition to text labels for numeric values, STATISTICA also supports text variables.

How Do I Enter/Edit the Assignments between Numeric Values and Text Labels?

Normally, the assignments between text labels and numeric values are handled automatically as you enter or edit data in the spreadsheet. However, in some circumstances, you may want to review, edit, or restructure all assignments of values for a particular variable or copy text/numeric assignments from one variable to another. These operations can be performed in the Text Labels Editor dialog, accessible by selecting Text Labels from the Vars toolbar button menu or by clicking the Text Labels button in the Variable specifications dialog for a particular variable.

Use the Text Labels Editor dialog to sort the assignments by text labels or numeric values, perform the Fill Down operation (to automatically reassign numeric values to text labels), Copy numeric/text assignments from other variables or from case names, and perform other operations.

Can I Switch between Displaying Text Labels and Numeric Values in the Data Spreadsheet?

Each value of a particular variable can simultaneously have a numeric and text identity. You can toggle the display of text labels or numeric values by selecting Display Text Labels from the spreadsheet View menu.

Can I Copy a Set of Text Labels, Numeric Values, and Descriptions to Other Variables and Files?

Yes, you can. Click the Copy From button in the Text Labels Editor (see above) to copy to the current variable the text labels, numerical values, and descriptions from other variables or from the case names.

When you need to copy the text labels, numerical values, and descriptions from one file to another, select Text Labels from the Data - Merge menu. This allows you to concatenate or replace text labels with labels from another file.

Can Clipboard and Drag-and-Drop Operations in the Spreadsheet Affect Values Outside the Range that is Being Pasted or Dropped?

Text labels. When you copy or move a block in the spreadsheet, the values copied to the Clipboard depend on the display mode of the spreadsheet. If the spreadsheet displays numeric values when the block is copied, then only those numeric values are copied to the Clipboard. If the spreadsheet displays text labels when the block is copied, then not only are the text labels copied to the Clipboard, but also the corresponding numeric values.



This can result in the assignment of text labels to numeric values that did not previously have text label equivalents. Note that in the illustration above, even though only the (highlighted) block was moved, other numeric values in the target variable (in this case 1's and 2's), acquired new text identities.

Display formats. When you copy or move a block in a spreadsheet, the display format for that block is also copied. This means that the display format of the copied or moved block overwrites the display format for the block into which it is copied. For example, if you copy a block of values that are formatted as currency and paste them into a variable that is formatted as percentage, the block remains formatted as currency even though the rest of the values in that variable are formatted as percentage.



In the example above, the four currency values copied from Cost are still formatted as currency even though they were pasted into the variable Percent.

Operations on Blocks of Values

What Are the Drag-and-Drop Facilities?

STATISTICA Spreadsheets support all standard spreadsheet (Excel-style) block drag-and-drop operations (copy, move, insert, etc.) and block extension operations, as well as mouse pointer-controlled column width, variable-speed scrolling, split scrolling, etc.

Moving a block. You can move a block by pointing to the border of the selection (the mouse pointer changes to an arrow) and dragging it to a new location.



Copying a block. To copy a block, point to the border of the selection (the mouse pointer changes to an arrow), then drag the selection to a new location while pressing the CTRL key. Note that when you are dragging the selection, a plus sign is displayed next to the cursor to indicate you are copying the text rather than moving it (see the cursor in the illustration below).



Inserting a block. To insert a block between columns or rows, point to the border of the selection (the mouse pointer changes to an arrow), then drag the selection while pressing the SHIFT key. If you point between rows, an insertion bar appears between the rows, and when you release the mouse button, the block is inserted between those two rows [creating new case(s)]. If you point the cursor between columns, an insertion bar appears between the columns, and when you release the mouse button, the block is inserted between those two columns [creating new variable(s)]. Note that if you also press the CTRL key while you are dragging the selection, the block will be copied and inserted instead of moved and inserted; a plus will appear next to the arrow-cursor, as shown in the illustration below.

 

Note that all SHIFT + drag-and-drop operations change the size of the datafile; the same is true of simply dragging blocks outside the boundaries of the current datafile. However, dragging (without SHIFT) within the current datafile boundaries produces results identical to using the Clipboard (except that the Clipboard is not used).

Clearing a block. To clear the contents of a block of cells, drag the block Fill Handle (a small solid square located on the lower-right corner of the block) within the selected block. After selecting the block, point at the lower-right corner of the block displaying the Fill Handle. When the mouse pointer changes to a plus sign, drag within the selected block. As you are dragging, the values within the block are dimmed. When you release the left mouse button, the dimmed values are deleted.

 

Extrapolating a block (AutoFill). Values within a block can be extrapolated by using intelligent, Excel-style, series extension facilities. After selecting the block, point at the lower-right corner of the block displaying the Fill Handle. When the mouse pointer changes to a plus sign, drag to the last cell(s) to which you want to extrapolate the data.

 

Numeric values are extrapolated using linear regression; text values are extrapolated following meaningful sequences (e.g., Q1, Q2, Q3, ..., or JAN, FEB, MAR, ...).

Can I Rearrange Blocks of Data or Ranges of Cases and Variables in a Datafile?

There are three types of operations that you can use to rearrange blocks of data or ranges of cases and variables in a datafile: Clipboard-based, global, and drag-and-drop-based; they operate differently and can produce different effects.

Clipboard-based operations. The standard Clipboard-based operations of cut, copy, and paste (implemented via the standard Clipboard keyboard, toolbar, or menu commands) affect only the contents of blocks of data, rows, or columns. They do not influence the overall size of the datafile (e.g., they can empty a column but will not remove the column from the spreadsheet).

Global operations. Global operations (accessible from the Vars and Cases Spreadsheet toolbar buttons and the spreadsheet Insert menu) are performed on entire rows or columns as "units"; for example, they move or delete entire columns, not only their contents. Hence, the size of the datafile increases or decreases when you use these options.

Drag-and-drop operations. STATISTICA supports standard (Microsoft Excel-style) drag-and-drop facilities, allowing you to easily perform both Clipboard-style (but without using the Clipboard) and global (if the insert mode is implemented by pressing the SHIFT key) operations.

What Is the Difference Between Copy and Copy with Headers?

The example below shows how the selected portion of the spreadsheet will be copied to the Clipboard when using Copy with Headers or Copy (both available from the Edit menu).



Copy with Headers:


Copy:

Do STATISTICA Spreadsheets Support Split Display?

Yes, it does. Spreadsheets can be split into up to four sections (panes) by dragging the split box (the black rectangle at the top of the vertical scrollbar or to the left of the horizontal scrollbar). This is useful if you have a large amount of information and you want to review results from different parts of the spreadsheet. When you guide the mouse pointer to the split box, the pointer changes to Vertical Resize (or Horizontal). Now, to position the split, drag to the desired position.



You can change the position of the split by dragging the split box (now located between panes) to a new position. Note that vertically split panes scroll together when you scroll vertically; horizontally split panes scroll together when you scroll horizontally.



How Can I Expand a Block in the Spreadsheet Outside the Current Screen? Blocks can be selected by (1) selecting a block with the mouse pointer, (2) clicking in one corner of the block to be selected, scrolling to the desired opposite corner (the original cell remains selected), and SHIFT+click in that cell, or (3) pressing the SHIFT key while using the cursor keys on the keyboard. To expand a previously selected block, you can use the SHIFT+cursor key, or scroll the display with the mouse pointer and SHIFT+click in the desired corner of the block. To select a large block in "split-pane" mode (as shown below), click in a cell in one pane, and then scroll (in the same pane) to display the diagonally opposite corner of another pane and SHIFT+click to select the block.



Variable speed of block highlighting. Note that you can control the speed at which you scroll when you extend a block outside the current display window. By moving the cursor a short distance away from the spreadsheet, you can scroll one line at a time when a block is selected; you can scroll one page at a time by moving the cursor farther away from the spreadsheet.

Can I Expand the Datafile Using Drag-and-Drop?

You can use drag-and-drop operations to increase the size of the spreadsheet by moving or copying a block [either all or some of the variable(s) or case(s)] beyond the current spreadsheet boundaries.



You can also expand the spreadsheet using the Insert operation. This operation inserts the block between cases or variables, thereby increasing the size of the spreadsheet. Note that whether you move, copy, or insert new cases or variables, when the block contains only part of a variable or case, then STATISTICA fills the remaining values in the variable/case with missing data.

Splitting and Merging Files

How Do I Split a STATISTICA Datafile into Smaller Files?

Select Subset/Random Sampling from the Data menu to display the Create a Subset/Random Sampling dialog, where you can create a subset of your datafile based on the specified variables and either case selection conditions or random sampling.



Once your selections have been made, a new spreadsheet with the requested subset is created, and you can continue the analyses using the subset of data. See Subset/Random Sampling for further details.

Saving subsets. When you intend to save the subset, select either Save As or Save from the File menu to save the new spreadsheet. The original file is not in danger of being overwritten unless you use the same name.

Permanent vs. temporary subsets. Subset creation is designed mostly to create permanent subset files, because in order to temporarily select a subset of data to be analyzed, the on-line Case Selection Conditions are more convenient to use.

How Do I Merge Two STATISTICA Datafiles?

Select Merge from the Data menu to display a submenu of merge options.



Either cases (rows of data) or variables (columns of data) can be merged. In other words, either the second file is appended to the "bottom" of the first one, or it is appended to the "right side" of the first one.

If you select Merge - Variables & then additional options are available.

For example, you can select Relational or Relational hierarchical option buttons, for which a key variable (in each file) is used to "match" cases based on the corresponding values of the key.

Can I Merge the Text Labels/Numeric Values from Two Files?

Yes. Select Text Labels from the Data - Merge menu to display the Merge Files - Text Labels dialog.



In this dialog, you can select the way in which the text values from the two files are to be merged.

Formulas, Transformations, Recording, Verifying, Cleaning Data

How Do I Calculate (Transform) Values of a Variable?

Spreadsheet formulas. To perform data transformation and recoding operations on single variables using one (as opposed to a set of transformation formulas), you can use the data spreadsheet formulas. Double-click on the variable name in a spreadsheet variable that you want to transform to display the Variable specifications dialog (see below) where a data transformation or recoding formula can be typed directly into the Long name (label or formula with Functions) box.

Following the Windows spreadsheet formula conventions (e.g., Microsoft Excel), start the formula with an "=" sign (otherwise STATISTICA will not recognize that the text is to be interpreted as a formula). For example, enter =(v1+v2)/2. Variables can be referenced by their names (e.g., Income, profit, TEST1) or numbers (e.g., v1, v2, v3, ...); v0 is the case number. Logical operators can be used to define conditional transformation expressions (e.g., = ScoreA + ScoreB). Note that <, <= , >, > =, and <> can be used to specify logical conditions.

Variable names can include special characters (e.g., spaces, plus or minus signs). Variable names that include special characters must be enclosed in quotation marks (e.g., = 'Score A' + 'Score B'). If a quotation mark itself occurs in the variable name, use double quotation marks instead (e.g., = "A's Score" + "B's Score"). Note that if double quotation marks are used in the name, then the variable name must be enclosed in single quotation marks.

Following the logic of the concept of "missing data," the formula will evaluate to missing data if any variable used in the formula is missing. However, several functions are supported that calculate specific statistics for ranges and/or lists of data "adjusting" for missing values. For example, the formula = mean (v1:v5, v7, sqrt(v9), TIME) will calculate the mean of variables v1, v2, v3, v4, v5, v7, square root of v9, and variable TIME, even if some of them may have missing values.



Click the Functions button on the Variable specifications dialog to access the Functions Browser facility that can be used to select functions and other elements of the syntax (for the formulas) and insert them into the formula editor.



The Function Browser also is used to quickly access information on any of the functions and the complete on-line syntax documentation. After entering a formula and clicking the OK button on the Variable specifications dialog, you are given the option to recalculate the variable now. If you choose not to recalculate at this time, you can do so later by clicking the Spreadsheet toolbar Recalculate button, selecting Recalculate from the Data menu, or pressing F9.

The STATISTICA Visual Basic programming language. If you need to write more complex data transformation programs than those that can be entered via spreadsheet formulas, the integrated programming language STATISTICA Visual Basic can be used. The industry standard STATISTICA Visual Basic language offers incomparably more than just a "supplementary application programming language" that can be used to write custom extensions. STATISTICA Visual Basic takes full advantage of the object model architecture of STATISTICA and allows you to access programmatically every aspect and virtually every detail of the functionality of the program. STATISTICA Visual Basic adds an arsenal of more than 10,000 new functions to the standard comprehensive syntax of Microsoft Visual Basic, thus comprising one of the largest and richest development environments available.

STATISTICA Visual Basic programs can be executed from within STATISTICA, but because of the industry standard compatibility of STATISTICA Visual Basic, you can also execute its programs from any other compatible environment (e.g., MS Excel, MS Word, or a stand-alone Visual Basic language).

The STATISTICA Visual Basic environment includes a flexible program editor and powerful debugging tools. To display the STATISTICA Visual Basic editor, select New from the File menu to display the Create New Document - Macro (SVB) Program tab. When editing macro programs by typing in general Visual Basic commands or program commands specific to STATISTICA Visual Basic, the editor displays type-ahead help to illustrate the appropriate syntax.



Help on the members and functions for each class (object) is also provided in-line.



You can access examples and concise syntax summaries by pressing the F1 key or clicking the help button to display the Electronic Manual. Note that to learn more about a particular Visual Basic function, select the function in the STATISTICA Visual Basic editor and then press F1 on your keyboard. This will call the Interactive Syntax Help Engine and display the help for that keyword.

When executing a program, you can set breakpoints in the program, step through line by line, and observe and change the values of variables in the macro program as it is running.



To summarize, STATISTICA Visual Basic is not only a powerful programming language, but it represents a very powerful professional programming environment for developing simple macros as well as complex custom applications.

Can STATISTICA Automatically Recalculate All Spreadsheet Formulas When the Data Change?

Yes. Select the Auto-recalculate when the data change check box in the Recalculate dialog (accessed by clicking the Recalculate button on the toolbar or selecting Recalculate from the Data menu).



STATISTICA will automatically recalculate all spreadsheet formulas when data are changed in the spreadsheet. Alternatively, you can press F9 to manually recalculate the formulas in the spreadsheet.

What Is the Simplest Way to Recode Values of a Variable (e.g., Split a Continuous Variable into Categories)?

Recoding functions of practically unlimited complexity can be custom-defined in STATISTICA Visual Basic and used repeatedly in your data transformation programs.

However, a quick on-line recoding facility can also be accessed directly from the spreadsheet at any point. Select Recode from the Spreadsheet Vars toolbar button to display the Recode Values dialog. You can scroll the Recode Values dialog to define new values (text or numeric) of the current variable (see the groups New Value 1, New Value 2, New Value 3, etc., below) depending on the specific conditions that you define (see the Category 1, Category 2, Category 3, etc., groups below).



For example, the recoding conditions specified above would "translate" the values less than or equal to 3 of the current variable (Quality) into Low, the values greater than 3 and less than or equal to 6 into Medium, and the values greater than 6 and less than or equal to 10 into High. Note that any other values that occur in the Quality variable would be "translated" into the missing data code because the MD Code option button is selected in the Other group. Note also that STATISTICA first finds all cases that meet Category 1 and recodes them; then STATISTICA only searches through the remaining cases that have not be recoded to see if they meet Category 2, etc. Hence, you do not have to specify Quality > 3 AND Quality < = 6 as Category 2.

When specifying conditions, follow the standard syntax conventions common in STATISTICA to all those procedures that involve any operation of "selecting cases" based on their values. You can also save the current recoding conditions to a text file or open a file with previously saved conditions.

Note that recoding conditions can be much more complex (see the Electronic Manual by pressing the F1 key or clicking the Help button in this dialog), and they can be defined such that the new values of the current variable do not depend on the old values of that variable, but only on values of some other variables in the datafile. Thus, this facility can be used not only to recode existing data, but also to create values of a new variable based on conditions met by other variables.

How Do I Create Values of a New Variable Based on Conditions Met by Other Variables?

You can use either of the data transformation facilities: spreadsheet formulas or STATISTICA Visual Basic. However, often the quickest way to create values of a new variable based on conditions met by other variables is to use the on-line data recoding facility, which is accessible at any point from the spreadsheet by clicking the Vars button and selecting Recode or selecting Recode from the Data menu. Note that the currently selected variable does not even have to be included in the text of the recoding conditions. Thus, you can use this facility to create values of a variable based on conditions met by other variables.

For example, you can add a new (empty) variable to the datafile, and then use this facility to create the new values. For instance, the recoding conditions could be used to assign 1's to the new variable for all "male subjects 18 to 25 years old with cholesterol levels below 200," 2's to "male subjects 18 to 25 years old with cholesterol levels above 200," and assign the missing data value to all other subjects.

What Syntax Can Be Used to Create Case Selection/Verification/Recode Conditions (or to Calculate Variable Values Using Spreadsheet Formulas)?

Recoding variable values, performing case selection for analyses, or verifying quality of data involve a comparison of conditions or values encountered in the spreadsheet with conditions specified by an expression to determine a further action for each case (i.e., respectively assigning a new variable value, including the case in an analysis, or verifying that the data associated with a case are correct). Assigning variable values via a spreadsheet formula involves case-by-case calculation of values for a variable based directly on the values of other spreadsheet variables and/or parameters not included in the spreadsheet. For data transformations more extensive or more complex than can be accomplished using the variable recode and spreadsheet formula facilities provided, use the STATISTICA Visual Basic language.



Syntax conversions for spreadsheet formulas. Spreadsheet formulas (specified on the Variable Specifications dialog accessed via the Data - Variable Specs menu) must start with an equal sign. When you enter a label that starts with an equal sign, STATISTICA will assume that it is a formula and will verify it for formal correctness. Note that a semicolon after a formula starts a comment: e.g., = v1 + v2; this is a comment.

Operators. A number of arithmetic, relational, and logical operators are available for creating expressions for recoding, case selection, and data verification or for creating spreadsheet formulas.

Arithmetic: +, -, *, /, ** or ^ (exponentiation), ( )
Relational:
= (equal to)
<>, >< (not equal to)
< (less than)
> (greater than)
<= (less than or equal to)
>= (greater than or equal to)
Logical:
AND (equivalent to &)
OR (equivalent to !)
NOT (equivalent to ~)
IsMD (is missing value)
Note that a common error is caused by omitting parenthesis needed to adjust for the precedence of operators; for example, the expression x > 0 and x < 1 is incorrect and needs parentheses: (x > 0) and (x < 1) because relational operators(>, <) have a lower precedence than the conjunction (and).

Math functions. Math functions can be used in expressions for recoding, case selection or data verification as well as in spreadsheet formulas for calculating variable values. If the value of any variable used in the expression or formula is missing (in the current case), then the expression evaluates to missing data (for the current case). Function names are not case sensitive, i.e., Log(x) is the same as log(x) or LOG(x). As indicated in the function list above, math functions accept one to two arguments depending on the function. The position of the function in the expression or spreadsheet formula will be replaced by the return value of the function. Numeric values (e.g, Sqrt(155)), variable names (e.g., Max(SCORE1,SCORE9)), or variable numbers (e.g., Log(v8)) are acceptable arguments. Additional arguments acceptable to math functions are expressions that evaluate to a number (e.g., Max(v7,(v5+v8-BASELINEVALUE)/3)) or functions that return a numeric result (e.g., Sin(Sqrt(v5))). Some commonly used constants can also be specified in expressions and formulas by reference: e.g., Pi = 3.14... Euler (e) = 2.71...

Statistics functions. Nine statistics functions that accept lists of values and/or ranges and arguments, and adjust to missing data, are also available for use in recoding, case selection, and verification expressions as well as in spreadsheet formulas: Note that the Statmin and Statmax functions have the stat prefix to distinguish them from the arithmetical Max and Min (math) functions discussed above. Like math functions, the names of statistics functions are not case sensitive when used in expressions and formulas. All statistics functions ignore arguments that contain missing values, basing their results on non-missing data only. Unlike math functions, statistics functions will not evaluate to a missing result unless all arguments are missing values or an argument is encountered that evaluates to an undefined value (e.g., square root of a negative number or division by zero). Also unlike math functions, each statistics function accepts any number of arguments placed in parentheses and separated by commas.

The following are acceptable statistics function arguments:

Distribution functions and their integrals. STATISTICA provides a predefined broad selection of distribution functions, their integrals and inverse distribution functions that can be used in spreadsheet formulas and in recoding, case selection, and verification expressions like all other functions.

Below is a listing of all available distributions (parameters are given in parenthesis):


Distribution

Density/Probability Function

Distribution Function

Inverse Distribution Function


Beta
Binomial
Cauchy
Chi-square
Exponential
Extreme
F
Gamma
Geometric
Laplace
Logistic
Logonormal
Normal
Pareto
Poisso
Rayleigh
Student's t
Weibull


beta(x,n,w)
binom(x,p,n)
cauchy(x,h,q)
chi2(x,n)
expon(x,l)
extreme(x,a,b)
F(x,n,w)
gamma(x,c)
geom(x,p)
laplace(x,a,b)
logis(x,a,b)
lognorm(x,m,s)
normal(x,m,s)
pareto(x,c)
poisson(x,l)
rayleigh(x,b)
student(x,df)
weibull(x,b,c,q)


ibeta(x,n,w)
ibinom(x,p,n)
icauchy(x,h,q)
ichi2(x,n)
iexpon(x,l)
iextreme(x,a,b)
iF(x,n,w)
igamma(x,c)
igeom(x,p)
ilaplace(x,a,b)
ilobis(x,a,b)
ilognorm(x,m,s)
inormal(x,m,s)
ipareto(x,c)
ipoisson(x,l)
irayleigh(x,b)
istudent(x,df)
iweibull(x,b,c,q)


vbeta(x,n,w)

vcauchy(x,h,q)
vchi2(x,n)
vexpon(x,l)
vextreme(x,a,b)
vF(x,n,w)
vgamma(x,c)

vlaplace(x,a,b)
vlogis(x,a,b)
vlognorm(x,m,s)
vnormal(x,m,s)
vpareto(x,c)

vrayleigh(x,b)
vstudent(x,df)
vweibull(x,b,c,q)


See the Electronic Manual for detailed discussion of the distribution types and the parameters that are required by each function.

How Can I Verify and "Clean" Data?

Select Verify from the Data menu to access an interactive data-verification and cleaning facility. Use the Verify Data dialog to enter the conditions to be met by the data.



Follow the standard syntax conventions common in STATISTICA to all those procedures that involve any operation of "selecting cases" based on their values. You can also save the current verification condition to a text file or open a file with previously saved conditions.

The verification can be as simple as checking whether values in a variable are "legal" (e.g., only 1 and 2 might be allowed for Gender) or whether they fall within allowed ranges of values (e.g., Age must be more than 0 and less than 200). It can also be as complex as checking multiple logical conditions that some values must meet in relation to other values.

Consider the following example of conditional verification:

If a person is a male or less than 10 years old, then the number of pregnancies for that person cannot be more than zero.

In order to apply these conditions, you would specify (for example):

Invalid if: (v1='MALE' or AGE<10) and PREGN>0

Once you have entered your verification condition(s), click either the Find First button to select the first invalid case in your datafile (after this first case has been selected, you can find the next case by selecting Find Next Invalid Case from the Data - Verify Data menu) or click the Mark All button to mark all of the invalid ycases in the datafile according to the Marked Cells spreadsheet layout.

How Do I Perform a Multiple Sort?

Select Sort from the Data menu, Sort Cases from the Cases button, or click the toolbar button. All of these actions display the Sort Options dialog where you can specify the key variables and the type of sort.



If you need to sort based on more than three keys, click the More Keys button to switch to a larger dialog.

How Do I Rank-Order Values of a Variable (Replace Values with Their Ranks)?

Select Rank from the Vars button on the Spreadsheet toolbar or from the Data menu to display the Rank Order Values dialog.



Then click the Variables button to select the variables to be ranked. Note that this operation replaces the original values with their respective ranks based on the sorted series. Optionally, you can also specify a subset of cases to be affected by the operation (click the Cases button), use case weights (click the Weight button), and set a number of options to perform particular (non-default) types of ranking.

How Do I Transpose Data (Convert Cases Into Variables)?

Select Transpose from the Data menu, then select either Block or File.



Transposing a block of data. The block transposing command affects only the contents of cells in the block currently selected in the spreadsheet (the block must be square); the variable names and case names will not be affected. For example, the Transpose - Block command executed on the following square block of data:



produces the following result:



Transposing a datafile. Datafile transposing restructures the entire file. For example, transposing the datafile shown above (before the block was transposed), changes the number of cases and variables in the datafile:



Note that when you transpose a file, case names become variable names and variable names become case names.

How Do I Automatically Fill Ranges of Data in the Spreadsheet?

In addition to the custom-defined operations of filling specific ranges of data with values available via the data transformation options, STATISTICA can fill selected blocks with random values. STATISTICA also supports standard (Microsoft Excel-style) block extrapolation operations, which can be implemented by dragging a border of the block onto the adjacent area of the spreadsheet that is to be filled following the extrapolation rules.

Random value fill. Select a block, then right-click and select Fill/Standardize Block - Fill Random Values from the spreadsheet shortcut menu. The currently selected block will be filled with random values (following a uniform distribution) in the range of 0 to 1. An error message is displayed if a block is not selected.

Extrapolation of values (AutoFill). A selected series of values (consisting of at least two values) will be extended if a meaningful pattern is detected. For example, the following numbers: 1.00, 2.00, and 3.00 would be extended to 1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, and 8.00 as illustrated in the following spreadsheets:

If no meaningful pattern can be detected, predicted values from a linear trend regression will be used for the extrapolation. If a single value (or a repeated value) is selected, the value will be copied into the extrapolated block; also, sequences consisting of names of months, days, or quarters can be automatically extended.

Fill down or right. Finally, simple spreadsheet-style "fill block" (Down or Right) options are available that fill the currently selected block by duplicating the first row or column (respectively) of the block. To use this facility, first select a block to be filled in the spreadsheet. Then select Fill/Standardize Block from the spreadsheet Edit menu (or the spreadsheet shortcut menu, shown below). Use the hierarchical menu to select either Fill/Copy Down or Fill/Copy Right.

The former copies (duplicates) the first row of the block to all remaining rows of the block; the latter copies the first column of the block to the remaining columns of the block. Both options work in a manner similar to Microsoft Excel's Fill Right and Fill Down facility (also available in Excel from the Edit menu).

How Does the Extrapolation of Blocks (AutoFill) Work?

A horizontal or vertical series in a block can be extrapolated by dragging the block Fill Handle (a small solid square located on the lower-right corner of the block). After selecting the block, point at the Fill Handle. When the mouse pointer changes to a plus sign , drag to the last cell(s) to which you want to extrapolate the data. STATISTICA can create series of values such as sequential numbers, linear extrapolations, and dates (e.g., you can extend a series such as 1, 2, 3 to include 4, 5, 6).

You can extrapolate (AutoFill) a block in the following ways:

Months. The names of months can be spelled out or abbreviated in the following manner:

Days. The names of the days can be spelled out or abbreviated in the following manner:

Quarters. The names of quarters of the year can be specified as follows in the variable:

Custom lists. You can also create your own list of values to use in extrapolating values.

Extrapolating (auto filling) a block upwards or to the left. In the same manner as extrapolating a block down or right, a block can be extrapolated by dragging the Fill Handle up or to the left past the original start of the block.

If you drag the Fill Handle up or to the left and stop within the original selection without going past the top or left side of the selection, you will delete data within the selection (data to be deleted are indicated in gray as you drag within the selection). You can use the Fill/Standardize Block commands on the spreadsheet Edit menu to copy to adjacent cells within the currently selected block.

How Do I Standardize Values in a Block?

Select a block of values in the spreadsheet, right-click, and select either Fill/Standardize Block - Standardize Columns or Fill/Standardize Block - Standardize Rows from the shortcut menu. The standardized values are computed as follows:

Standard Value = (raw value - mean of highlighted row/column)/standard deviation

In the following illustration, the columns of the selected block have been standardized by columns:

You can also standardize selected variables by selecting Standardize from the Data menu (this procedure works independent of the currently selected block, but takes into account the current case selection conditions and weights).

Statistical Analyses

How Do I Select Variables for an Analysis?

Every analysis definition dialog in STATISTICA contains at least one Variables button that is used to specify variables to be analyzed. You can click the button (or press V). If you forget to specify variables and click the OK button to start the analysis, STATISTICA prompts you to specify the variables to be analyzed. The variable selection dialog that will be displayed supports various ways of selecting variables, and it offers various shortcuts and options to review the contents of the datafile.

Can Variables Be Selected for Analyses by Highlighting Them in the Spreadsheet?

Yes. If you select a block in the spreadsheet, the variables included in the block are automatically preselected for the next analysis. This shortcut is designed to limit the chance of producing unintended results in the following ways:

How Do I Select a Subset of Cases (Observations) to Be Included in an Analysis?

Before an analysis begins (i.e., before the data are processed), you can instruct STATISTICA to select only cases (i.e., rows in the spreadsheet) that meet some specific selection criteria. A facility to define and manage case selection conditions can be accessed from the Tools - Selection Conditions - Edit menu (see page 282) or by clicking on the status bar field Sel, which shows the current status of the Case Selection Conditions. Note that when case selection conditions are disabled, this field reads Sel: OFF, when they are enabled, it reads Sel: ON).

Note that there are two types of case selection conditions in STATISTICA - permanent (saved with the datafile) and temporary (associated with the current analysis only).

How Are Case Selection Conditions Stored and Saved?

Global case selection conditions (definitions of subsets of data) are attached to a spreadsheet and are called Spreadsheet Case Selection Conditions. They are accessed from the Tools - Selection Conditions menu or by clicking the status bar field Sel. Analysis/Graph Case Selection Conditions can also be locally connected to an analysis or graph. They are accessed by clicking the Select Cases button that is available on all analysis/graph definition dialogs. You can save and open either type of selection conditions (and maintain libraries of case selection conditions) using the Open or Save As buttons in the appropriate Case Selection Conditions dialog. Note that case weights are specified in a similar manner.

Can I Select Random Subsets of Data?

Yes, you can. Subsets of data can be created via both simple random sampling and systematic random sampling. Select Subset/Random Sampling from the Data menu to display the Create a Subset/Random Sampling dialog.



Select the Simple random sampling option button in the Subset Selection Rules group to obtain your subset via random sampling. You have two choices in regards to how the subset is created: either via the Percentage of cases within the original spreadsheet or an Approximate number of cases. If you select the With replacement check box, then once a case is selected to be included into the subset, that case will be placed back into the pool of available choices for the remaining cases in the subset (hence an individual case can appear more than once in the resulting subset). Additionally, if a weighting variable has been selected for the original spreadsheet, select the Use spreadsheet case weights check box to apply that case weight when the subset is being obtained.

Select the Systematic random sampling option button to obtain your subset via systematic random sampling. For instance, if you enter a 5 in the K= box, STATISTICA will randomly select a case within the first five cases and then finish obtaining the subset by selecting each fifth case in the spreadsheet after the originally selected case.

What Is the Quickest Way to Review Basic Descriptive Statistics for a Variable?

You can either select Descriptive Statistics from the Basic Statistics and Tables (Startup Panel) (accessible from the Statistics menu) or click the Values/Stats button in the Variable specifications dialog, as illustrated below. To display the Variable specifications dialog, double-click on the variable name in the spreadsheet. Alternatively, right-click on the variable name and select Variable Specs from the shortcut menu.



After you click the Values/Stats button, the resulting dialog displays information about the selected variable, a sorted list of its values, and descriptive statistics. The descriptive statistics can be copied to the Clipboard by clicking the Copy button (see the lower part of the window in the illustration below).

What Is the Quickest Way to Review a Sorted List of All Unique Values of a Variable?

Click the Values/Stats button in the Variable specifications dialog to display a sorted list of all unique numeric values and corresponding text labels (if there are any) for that variable. The Variable specifications dialog is accessed by double-clicking the variable name in the spreadsheet.



Note that the assignments between the numeric values and text labels for the variable can be edited in the Text Labels Editor.

What Are Statistics of Block Data?

Statistics for each row or column in a selected block can be computed and added to the spreadsheet (i.e., appended at the end) by selecting the desired Statistics of Block Data from the shortcut menu.



The selected statistics are added to the end of the spreadsheet as additional rows or columns. For example, the selection shown above computes the sums for each row in the selected block and appends them to the datafile as a new variable Sum. Note that in the illustration below the new variable Sum has been moved next to Measure02.



If you select Block Columns, the results of the selected statistic are appended as a new case, and the case name contains the name of the selected statistic and the range description, as shown below.



In addition, a (customizable) list of predefined graphs is also available from this shortcut menu.



These Graphs of Block Data allow you to visualize the data in the selected block either by row or by column.

Can Matrix Data Be Used for Input Instead of Raw Data?

Yes. STATISTICA supports a variety of matrix file types as input data (e.g., for Multiple Regression, Factor Analysis, Reliability/Item Analysis, and other modules). Matrix files can be edited in the spreadsheet (as if they were raw datafiles); however, in order to be properly interpreted as matrices, they need to meet specific content and format conditions (depending on the matrix type). STATISTICA also supports the multiple matrix datafile format for input in procedures (such as Structural Equation Modeling) that supports matrix data for groups. Note that matrix data that meet the specific content and format conditions are automatically saved in STATISTICA Matrix Spreadsheet (*.smx) format.

Import, Export, DDE, OLE

How Can I Access Data from Excel and Other Foreign Datafiles?

Clipboard. The quickest, and in many cases easiest, way to access datafiles from other Windows applications (e.g., spreadsheets) is to use the Clipboard, which in STATISTICA supports special Clipboard data formats generated by applications such as Microsoft Excel. For example, STATISTICA properly interprets formatted cells (such as 4/17/1999 or $10) and text values (including extensive in-cell formatting, e.g., RVStower120.3MHz).

File import facilities. Datafiles from a wide variety of Windows and non-Windows applications can be accessed and translated into the STATISTICA format using the file import facilities (available by opening the file using the File - Open command). You can even access formatted and free format text (ASCII) files.

The main advantages of using the file import facilities (over the Clipboard) are that:

STATISTICA Query. In addition to the file import facilities described above, STATISTICA provides access to virtually all databases via STATISTICA's flexible STATISTICA Query, accessible from both the File - Get External Data and the Data - Get External Data menus.

OLE links. STATISTICA supports the Object Linking and Embedding (OLE) conventions allowing you to link values or text in STATISTICA documents (e.g., spreadsheets, reports) to documents in other (Windows) applications. The procedure is in fact much simpler than it might appear and can be easily employed without technical knowledge about the mechanics of OLE, especially when you use the Paste Link (instead of script-entry) method.

DDE links. Finally, STATISTICA supports the Dynamic Data Exchange (DDE) conventions; thus, you can dynamically link a range of data in a spreadsheet to a subset of data in other (Windows) applications. Like OLE, the procedure is in fact much simpler than it might appear, and can be easily employed without technical knowledge about the mechanics of DDE. To set up a DDE Link, select DDE Links from the Edit menu.

Accessing datafiles larger than the local storage. Note that enterprise versions of STATISTICA offer options to query and access large remote datafiles in-place (i.e., without having to import the data and create a local copy).

How Do I Export Data from STATISTICA to Excel and Other Foreign Datafiles?

STATISTICA can export datafiles to a wide variety of Windows and non-Windows applications via the Save As command accessible from the File menu. You can even save spreadsheets as formatted and free format text (ASCII) files. These facilities allow you to specify the exact way in which the translation is to be performed (e.g., decide whether or not to export variable names, text labels, and case names). Also, STATISTICA Spreadsheets are OLE servers and clients, thus other applications can link to them.

Can STATISTICA Access Data from Enterprise Databases?

Yes. STATISTICA Query (including options to combine fields from multiple tables) is provided via both the File - Get External Data and the BI menus. As long as the database format supports a 32-bit ODBC driver or an OLE DB provider, accessing such data should present no problem.

Note that you can either take advantage of STATISTICA Query's built-in graphical user interface for querying:



or write SQL statements in a text editor.



For further details about setting up connections to external databases, consult the database vendor and/or your database administrator.

Can STATISTICA Process Files That Are Larger Than the Local Hard Drive?

Yes, the enterprise versions of STATISTICA offer options to query and access large remote datafiles in-place (i.e., without having to import the data and create a local copy).

Does STATISTICA Support ODBC?

Yes, via STATISTICA Query, an external data querying facility available from the Data - Get External Data menu and the File - Get External Data menu. The STATISTICA Query interface includes options to combine fields from multiple tables and provides flexible access to a wide variety of database management files, including the major enterprise database formats such as Microsoft SQL Servers and Oracle. Specific records (rows of tables) can be selected by entering SQL statements. STATISTICA Query automatically builds the SQL statement for you as you select the components of the query via a simple graphical interface and/or intuitive menu options and dialogs. Hence, an extensive knowledge of SQL is not necessary in order for you to create advanced and powerful queries of data in a quick and straightforward manner. Multiple queries based on one or many different databases can also be created to return data to an individual spreadsheet. Therefore, you can maintain connections to multiple external databases simultaneously.

How Do I Set Up OLE Links between STATISTICA and Other Windows Applications?

You can establish OLE (Object Linking and Embedding) links between a "source" (or server) file (e.g., a Microsoft Word document) and a STATISTICA document (the "client" file) so that when changes are made to the data in the source file, the data are automatically updated in the respective part of the STATISTICA document (client file). Additionally, STATISTICA can serve as a "source" (or server) file for other OLE compatible documents. In this way, you could link the values in a STATISTICA Spreadsheet to a table in a Microsoft Word document so that the Word document updates when data in the spreadsheet are changed.

OLE links are established using the quick, "paste-like" Paste Link option in the Paste Special dialog. To create an OLE Link within a STATISTICA document, first copy to the Clipboard the desired text (or values) from another Windows application. Then (in STATISTICA), select Paste Special from the Edit menu to display the Paste Special dialog. Select the Paste Link option button, specify the correct format in the As group, and click the OK button to establish the link using the source given in the Source field.

When a link is established, it can be managed using the Links dialog (select Links from the Edit menu). Use this dialog to update or break a link, open the source file or change to a different source file, and determine whether updates will be automatic or manual.

How Do I Set Up DDE Links between STATISTICA and Other Windows Applications?

You can establish DDE (Dynamic Data Exchange) links between a "source" (or server) file (e.g., a Microsoft Excel spreadsheet) and a STATISTICA datafile (the "client" file) so that when changes are made to the data in the source file, the data are automatically updated in the respective part of the STATISTICA Spreadsheet (client file).

A common application for dynamically linking two files would be in industrial settings, where the STATISTICA datafile would be dynamically linked with a measurement device connected to the serial port (e.g., in order to automatically update specific measurements hourly).

DDE links are established in the Manage DDE Links dialog (select DDE Links from the Edit menu). To create a new link, click the New Link button in this dialog. The dialog also is used to edit, update, delete, or disable any or all links.

Back to Top
Request Quote
StatSoft Home Page



If your question is not answered above, please e-mail us, or call (918) 749-1119 Monday through Friday, 9:00AM to 5:00PM Central Time. Please include your serial number (select About STATISTICA from the Help menu to view your serial number) and information about your hardware (the type of processor (CPU), the amount of memory (RAM), and disk space) and the version of the operating system that you are using. If you live outside North America, please contact your local StatSoft office. (Please note: Technical Support is only for users of the latest version of the program.)


[StatSoft]
2300 East 14th Street, Tulsa, OK 74104
Phone: (918) 749-1119; Fax: (918) 749-2217

[StatSoft]e-mail: info@statsoft.com

©Copyright StatSoft, Inc., 1984-2005.
StatSoft, StatSoft logo, STATISTICA, SEWSS, SEDAS, Data Miner, SEPATH and GTrees are trademarks of StatSoft, Inc.