When working with large data files, it pays to look for ways to make processing more efficient. How data is stored affects both file size and computation time.
Many variables can be stored more efficiently simply by changing a few default settings. In this brief article, we will explore the storage options that help make spreadsheet storage and computations more efficient.
To view and change the storage method of a given variable, click on the variable header in the spreadsheet. Then, select the Data tab and in the Variables group, click Specs to display the variable specification dialog box for the selected variable. You can also double-click on the variable header to display this dialog box. In the drop-down box labeled Type, you will find the data storage options. The default data storage method is double precision. In STATISTICA, it is called simply Double.
For variables stored with double precision, values are stored as 64-bit floating-point real numbers with 15-digit precision. The range of values supported by this data type is approximately ±1.7 × 10^308.
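The figures above are properties of the standard 64-bit floating-point format, so they can be checked outside STATISTICA. The following is a minimal Python sketch, not STATISTICA code, and assumes only that Python floats use that same 64-bit format, which is the case on standard platforms.

```python
import struct
import sys

# Size in bytes of one standard double-precision value ("<d" = 8-byte double).
double_size_bytes = struct.calcsize("<d")

# Number of decimal digits that can be stored and recovered reliably.
double_digits = sys.float_info.dig

# Largest finite value representable in double precision.
double_max = sys.float_info.max

print(double_size_bytes * 8, "bits per value")
print(double_digits, "reliable decimal digits")
print(double_max)
```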
The next option, Text, is used for storing text data. Set the Length to the number of characters the variable needs to hold. As you would expect, the longer the designated length of a text variable, the more storage space the data takes, so the Length should be the smallest value that still captures the full text.
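One way to find the smallest workable Length is to measure the longest value the variable actually contains. The sketch below is illustrative Python rather than a STATISTICA feature; the function name minimum_text_length and the sample values are hypothetical.

```python
def minimum_text_length(values):
    """Return the length of the longest string, or 0 for an empty column."""
    return max((len(value) for value in values), default=0)


# Hypothetical text column.
gender = ["Male", "Female", "Unknown"]
print(minimum_text_length(gender))
```

If longer values may be entered later, the Length should allow for them as well.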
For some types of numeric data, double precision storage is necessary. Any variable with values that have decimals, or that are extremely large or small, requires this storage type. But many variables are stored with far greater precision than necessary, and this is where we can change the data type and gain efficiency.
The integer data type stores whole numbers between -2,147,483,647 and +2,147,483,647. Variables stored this way take 4 bytes per cell, half the 8 bytes needed with double precision.
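To make the 4-byte versus 8-byte difference concrete, the following Python sketch packs the same hypothetical column of 1,000 whole numbers both ways and compares the sizes. It illustrates the raw cell storage only and assumes nothing about the STATISTICA file format itself.

```python
import struct

# Hypothetical column of 1,000 whole-number values (for example, case IDs).
values = list(range(1000))

# "<d" is an 8-byte double; "<i" is a 4-byte signed integer.
as_double = struct.pack(f"<{len(values)}d", *values)
as_integer = struct.pack(f"<{len(values)}i", *values)

print(len(as_double), "bytes as doubles")
print(len(as_integer), "bytes as integers")
```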
The byte data type stores whole numbers from 0 to 255 and is the most economical storage option, taking only 1 byte of storage per cell in the spreadsheet. Use it for variables that need only small, non-negative integer values.
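The choice among the three numeric types can be written as a simple rule: use the narrowest type whose range holds every value. Below is a minimal Python sketch of that rule. The function smallest_type is hypothetical, the ranges are the ones quoted in this article, and the sketch ignores missing data and any other considerations specific to STATISTICA.

```python
BYTE_RANGE = (0, 255)
INTEGER_RANGE = (-2_147_483_647, 2_147_483_647)


def smallest_type(values):
    """Return 'Byte', 'Integer', or 'Double' for a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    # Any value with a fractional part needs double precision.
    if not all(float(v).is_integer() for v in values):
        return "Double"
    low, high = min(values), max(values)
    if BYTE_RANGE[0] <= low and high <= BYTE_RANGE[1]:
        return "Byte"
    if INTEGER_RANGE[0] <= low and high <= INTEGER_RANGE[1]:
        return "Integer"
    # Whole numbers too large for the integer type fall back to double.
    return "Double"


print(smallest_type([0, 1, 1, 0]))       # a yes/no flag
print(smallest_type([-40, 1200, 35000])) # whole numbers beyond 0 to 255
print(smallest_type([1.5, 2.25]))        # values with decimals
```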
Using the most efficient storage method for your variables makes for smaller spreadsheet files and faster computing.